Editorial board

Hiroshi Azuma, Faculty of Education, Tokyo University, Tokyo, Japan
Paul Bertelson, Laboratoire de Psychologie Expérimentale, Université Libre de Bruxelles, Avenue Adolphe Buyl, 117, 1050 Bruxelles, Belgique
Ned Block, Dept. of Philosophy, M.I.T., Cambridge, Mass., U.S.A.
T. G. R. Bower, Dept. of Psychology, University of Edinburgh, Edinburgh, Great Britain
François Bresson, Laboratoire de Psychologie, E.P.H.E., Paris, France
Roger Brown, Dept. of Psychology, Harvard University, Cambridge, Mass., U.S.A.
Jerome S. Bruner, Center for Cognitive Studies, Harvard University, Cambridge, Mass., U.S.A.
Noam Chomsky, Dept. of Modern Languages and Linguistics, M.I.T., Cambridge, Mass., U.S.A.
Peter D. Eimas, Walter S. Hunter Laboratory of Psychology, Brown University, Providence, Rhode Island 02912, U.S.A.
Gunnar Fant, Lab. of Speech Transmission, Royal Institute of Technology, Stockholm, Sweden
Jerry Fodor, Dept. of Psychology, M.I.T., Cambridge, Mass., U.S.A.
Kenneth Forster, Dept. of Psychology, Monash University, Clayton, Melbourne, Australia
Merrill Garrett, Department of Psychology, M.I.T., E10-034, Cambridge, Mass. 02139, U.S.A.
Pierre Greco, Laboratoire de Psychologie, 54, bvd. Raspail, Paris 6e, France
Jean-Blaise Grize, 1, Chantemerle, Neuchâtel, Suisse
David T. Hakes, Department of Psychology, University of Texas, Austin, Texas 78712, U.S.A.
Henry Hécaen, Directeur d'Études, École Pratique des Hautes Études, Unité de Recherches Neuropsychologiques, I.N.S.E.R.M., 2, Rue d'Alésia, 75 Paris 14, France
Michel Imbert, Laboratoire de Neurophysiologie, Collège de France, 11 Place Marcelin Berthelot, Paris 5, France
Bärbel Inhelder, Institut des Sciences de l'Éducation, Palais Wilson, Genève, Suisse
Marc Jeannerod, Laboratoire de Neuropsychologie Expérimentale, 16 Av. Doyen Lépine, 69500 Bron, France
James Jenkins, Center for Research and Human Learning, University of Minnesota, Minneapolis, Minn. 55455, U.S.A.
Daniel Kahneman, Dept. of Psychology, The Hebrew University of Jerusalem, Israel
Jerrold J. Katz, Dept. of Philosophy, M.I.T., Cambridge, Mass., U.S.A.
Edward Klima, Dept. of Linguistics, La Jolla, University of California, San Diego, Calif. 92037, U.S.A.
Eric H. Lenneberg, Dept. of Psychology, Cornell University, Ithaca, N.Y., U.S.A.
Alexei Leontiev, Faculty of Psychology, University of Moscow, Moscow, U.S.S.R.
Wilhelm Levelt, Psychological Laboratory, Nijmegen University, Nijmegen, the Netherlands
A. R. Luria, University of Moscow, 13, Frunze Street, Moscow G. 19, U.S.S.R.
John Lyons, Dept. of Linguistics, Adam Ferguson Building, Edinburgh, Great Britain
Humberto Maturana, Escuela de Medicina, Universidad de Chile, A. Sanartu 1042, Santiago, Chile
John Morton, Applied Psychology Unit, Cambridge, Great Britain
George Noizet, Laboratoire de Psychologie Expérimentale, Aix-en-Provence, France
Domenico Parisi, Istituto di Psicologia, Consiglio Nazionale delle Ricerche, Rome, Italy
Michael Posner, Dept. of Psychology, University of Oregon, Eugene, Oregon, U.S.A.
Nicolas Ruwet, Dept. de Linguistique, Centre Univ. de Vincennes, Paris, France
Harris B. Savin, Dept. of Psychology, University of Pennsylvania, Philadelphia, Pa., U.S.A.
Robert Shaw, Center for Research and Human Learning, University of Minnesota, Minneapolis, Minnesota, U.S.A.
Hermina Sinclair de Zwart, Centre d'Épistémologie Génétique, Genève, Suisse
Dan I. Slobin, Department of Psychology, University of California, Berkeley, California 94720, U.S.A.
Jan Smedslund, Institute of Psychology, Universitetet i Oslo, Oslo, Norway
Sydney Strauss, Department of Educational Sciences, Tel Aviv University, Ramat Aviv, Israel
Alma Szeminska, Olesiska 513, Warsaw, Poland
Yoshihisa Tanaka, Dept. of Psychology, University of Tokyo, Bunkyo-ku, Tokyo 113, Japan
Hans-Lukas Teuber, Dept. of Psychology, M.I.T., Cambridge, Mass. 02139, U.S.A.
Peter Wason, University College London, Gower Street, London W.C.1, Great Britain
Editorial
When we accepted a paper from South Africa¹ on its scientific merits, we were at first tempted to justify its inclusion in Cognition, since many policies of that country are recognizably abhorrent. However, we concluded that it would be false liberalism to detail a defense of our editorial decision. Of course we deplore the regime currently governing South Africa, but then we are equally disturbed by the policies of many other countries, e.g. the killing of Indians with modern warfare techniques for the expropriation of land, the bombing of people fighting for their national independence, the penetration of underdeveloped countries by multinational trusts, the murder of political opponents, torture and persecution . . . and let us not forget those countries that publicly voice their unerring pursuit of justice and morality while they quietly sell arms to any buyer. 'Liberals' are not the only ones who consider condemning individuals because of the country or group they belong to. In some countries, intellectuals who have been banned from the ruling party are not allowed to publish their scientific results in professional journals or elsewhere.² Other examples could be brought to the reader's attention. In every case, we are confronted with actions that attempt to preserve the power of a State or cult without regard for individuals and the social nuclei in which they live and express themselves. It is evident that most nations participate in the exercise of power, which leads to acts that would be considered immoral if carried out by individuals. This does not release individuals from sharing responsibility for the acts of the geographically defined countries where they happen to reside. However, scientists have traditionally argued for distinguishing an individual's politics from an individual's behavior as a scientist. This is the essence of 'scientific neutrality'. Thus it is clearly a mistake to judge an individual by the country in which he or she lives.
However, we must raise the question of the scientist's social responsibility, of the conditions under which scientific endeavours are harmful, and of the way in which such questions relate to the image that we have of ourselves as scientists. We have all been taught to consider that great men of science are dedicated, totally devoted, generous and untarnished by base political interests. There is nothing new about this 'pasteurized' conception of the Professor. Society has always had a dynamic need to preserve an image of the academy as a realm ensuring the unbiased evaluation of cultural knowledge and its applications. This image simplifies the complex motivations and contradictions which are embodied within each scientist and within the scientific corporation. We do not deny that this conception may have been functional in the past; however, it is being examined with increasing mistrust. Today some people are so disgusted by the social uselessness of much research that they have turned against all science; others contend that science is totally political. Still others suggest that science primarily reflects its logistics, e.g. the structure of the laboratories, the funding system, and the manner in which promotions are decided. Whether we agree or disagree with these points, we must grant that this debate reflects a growing awareness that science does not function in a political vacuum, but in societies. The sociopolitical background is a major factor in determining how theoretical problems are conceived and formulated, and it also determines financial and administrative decisions. New research ideas that receive support in turn provide new models for society to consider. But the dependence of these models on existing social forms guarantees that they will not be 'new', but merely a superficial reorganization of pre-existing, socially determined structures. Some scientists have adopted an odd way of extricating themselves from any debate on such issues. They claim that rational, scientific knowledge is neutral and that all the current doubts about science are triggered only by irresponsible applications.

1. Miller, R., The use of concrete and abstract concepts by children and adults. Cognition, 2(1), 49-58.
2. Le Monde, December 27, 1972.
We even read that scientists themselves ought to oversee the technological applications of scientific knowledge and thereby play the role of apolitical and uninvolved judges. However, such a role would be false in many respects. In the first place, scientists work in the context of a competitive society, in competitive research centers where money is distributed according to goals that they rarely understand. In many such centers independence, criticism and questions are allegedly sought by all. However, 'new' ideas must fall within the pre-established limits of the subject under discussion and thus are emasculated from the outset. Questioning 'side issues', such as the motivations behind the choice of specific theoretical or practical problems, leads to the discovery that some questions are not as welcome as others. Clearly science is not as free to experiment as one would like. In making such statements we do not pretend that we have an answer as to what constitutes 'socially valuable or valueless' research. What we want to stress is the necessity of understanding what motivates scientists to follow certain lines of research. If one never questions why one pursues one activity rather than another, if one does not examine the social context and the ideology behind one's work, the inevitable outcome is manipulation by those who have a clear intention to lead the community in a particular direction. Social science is a particularly sensitive field when considered from this standpoint. It is characteristic that many of those who insist on an apolitical conception of science emerge as major government advisors. On closer consideration of the political views of scientists, however, it becomes clear that they are willing to use the alleged sanctity of science as a political argument. For example, some 'apolitical' scientists have signed a petition maintaining that all research is neutral, including research that explores 'racial' bases for individual capacity.³ They go so far as to claim that their most general interest is to protect academic freedom. First they alarm the reader with a reminder of Hitler (who, ironically, in addition to everything else, passed the Eugenic Sterilization Act of 1933). They then argue that many of those who attack them today are 'militants' and even 'anti-scientists', and suggest that to express hereditarian views or to recommend further studies of the biological basis of behavior today is similar to being a heretic in the Middle Ages. One can argue with the petitioners quite vigorously on these matters. Rather than deplore attacks by militants and anti-scientists, we are more impressed by the fact that even when racism attained the hideous summit the apolitical scientists refer to, geneticists rarely spoke out: though genetics was misused, geneticists remained silent. Why?⁴ 'The aversion of scientists for publicity and popularization is a long standing tradition in many countries. Those scientists who did go to the public were frequently suspected of charlatanism, weakness of character, and "irresponsibility".' It is still rare to find scientists questioning their own role. However, there is generally more awareness of, and response to, such proposals among a number of serious researchers. It is striking that such reactions are perceived by the petitioners as the proposals of 'militants' and 'anti-scientists'.
Furthermore, the claim that scientists who wish to do so are not free today to investigate the influence of 'race' on behavior is absolutely false. In the past few years there has been a great deal of such research, which presupposes racist and elitist ideas. Often critics of the position that heredity is the necessary basis for social distinction are at the same time philosophical nativists. Chomsky, for example, one of the most coherent critics of Skinner and Herrnstein, is also known to most psychologists for his nativist theories. This demonstrates that an interest in understanding the psychology of humanity can lead to a nativist assumption about what all people have in common. Others are interested in finding differences between groups of people, a use of psychology that serves an elitist ideology by presupposing it. But, as a counterpetition has stated,⁵ 'theories of racial inferiority are rendered untenable by the evidence of human history: every population has developed its own complex culture. Contrary to the supremacist view, the people of Africa and Asia have, at various times produced civilizations far more advanced than those existing simultaneously in Europe. Moreover, the constant geographical shift of centers of culture is in itself proof of equal capabilities of all people. It is nonsense to suppose genetic superiority wandering about the world.' In itself the notion of superiority pertains to value judgments rather than to scientific enquiry. Although there are certain current theoretical proposals to turn psychology into a science of values,⁶ such a science is not only premature today but will remain so until it is based on a theory of humanity. It is in the context of such a controversy that the possible social applications of hereditarian views become most poignant. For example, we may decide, as some societies have, that the ailing are to be suppressed, or, on the contrary, that since they are less active physically, they are to serve as the intellectuals. Other societies may decide that all heavy work should be done by children since they are more docile than adults, while another society assumes that children should not have to work until puberty or adolescence. Thus the utilization of differences among humans is entirely based on a choice of values and is not an issue that can be decided scientifically. But when any society is based on rewarding those who have the greatest socially defined 'ability' by placing them in a privileged position, social conflict is an internal necessity. To study the alleged genetic basis for social conflict simply diverts attention from the real causes. It is puzzling why there is so much interest today in this kind of research among otherwise serious scientists. There is no theory that unites the higher mental processes, so mental tests are apparently basically uninterpretable.

3. The resolution on scientific freedom (December 1972). Encounter.
4. Carlson, E. A. (1972) A tormented history (Book Review). Science, 180, 584-586.
5. The Committee Against Racism at the University of Connecticut, Storrs, Conn.
Even the most general properties governing language and thought are still unknown to us; accordingly, how is it possible to engage in secondary enterprises such as 'studying' whether one group of humans is intellectually superior to another, or asking whether 'race' (a concept difficult to define biologically) is a good indicator of success in this society for genetic reasons rather than for reasons of differential oppression? Given the current fundamental ignorance about cognition, a line of 'research' such as this can only be politically motivated, and it is with political arguments that it must be countered. To return to our original issue, the acceptance of a paper from a politically unfashionable country: the considerations raised make it clear that within every modern society the structure of social and scientific enterprises presupposes a form of elitism, just as all modern countries act unethically. This could lead to a sort of perverse censorship if we examined every article for geographical and ideological cleanliness; such is not our intention. Rather, we believe that the only hope of pulling social science out of the dilemma we have outlined is to increase public and private discussion of new ways of using science as a force for change rather than as a force for maintenance of the modus vivendi quo ante. No doubt we shall be accused of introducing political and social considerations into purely scientific debates. This does not disturb us; we hope that we have clarified why we cannot agree with the claim that science is apolitical and asocial. However, we have certainly not given concrete answers to the issues raised at the beginning of the editorial. One reason is that the social responsibility of scientists is a dynamic concept that evolves every day: the debate must be expanded and continued as an intrinsic part of all future science.

J. MEHLER
T. G. BEVER

6. Skinner, B. F. (1972) Cumulative record: A selection of papers. New York: Appleton-Century-Crofts.
N.B. The above-signed editorial does not necessarily reflect the opinions of the other members of the Editorial Board. We urge readers to consider the issues we raise and respond to them.
Seven principles of surface structure parsing in natural language*

JOHN KIMBALL
Indiana University

* I am indebted to Frank DeRemer for many discussions on the material in this paper, particularly for providing information on current computer work in parsing, and for suggestions concerning the principles discussed in section 4; also to Cathie Ringen, Jorge Hankamer, Paul Postal and the reviewers for reading and commenting on an earlier version.
Cognition 2(1), pp. 15-47

Abstract
In generative grammar there is a traditional distinction between sentence acceptability, having to do with performance, and sentence grammaticality, having to do with competence. This paper attempts to provide a characterization of the notion 'acceptable sentence' in English, with some suggestions as to how this characterization might be made universal. The method is to outline a set of procedures which are conjectured to be operative in the assignment of a surface structure tree to an input sentence. To some extent, these principles of parsing are modeled on certain parsing techniques formulated by computer scientists for computer languages. These principles account for the high acceptability of right-branching structures, outline the role of grammatical function words in sentence perception, describe what seems to be a fixed limit on short-term memory in linguistic processing, and hypothesize the structure of the internal syntactic processing devices. The operation of various classes of transformations with regard to preparing deep structures for input to parsing procedures such as those outlined in the paper is discussed.

1. Introduction

In grammar there is a distinction between those sentences which are rejected by speakers on grounds of grammaticality and those rejected for performance reasons. Thus, there is a quadripartite division of the surface structures of any language:
(a) those sentences which are both grammatical and acceptable, e.g. 'It is raining';
(b) those sentences which are grammatical but unacceptable, e.g. 'Tom figured that that Susan wanted to take the cat out bothered Betsy out';
(c) those which are ungrammatical but acceptable, e.g. 'They am running'; and
(d) those which are both ungrammatical and unacceptable, e.g. 'Tom and slept the dog'.
Part of the problem facing linguists, in particular psycholinguists, is to find a characterization of the processing of linguistic experience, i.e. performance, adequate to distinguish unacceptability from ungrammaticality. In the following paper I will present an interconnected set of principles of surface structure parsing in natural language which is intended to provide a characterization of the notion 'acceptable sentence'. In particular, these hypotheses will explain the difficulty experienced by native speakers of English when they encounter sentences of the classical problem types, such as those in (1).
(1) a. That that that two plus two equals four surprised Jack astonished Ingrid bothered Frank.
    b. Joe believes that Susan left to be interesting.
    c. The boat floated on the water sank.
    d. The girl the man the boy saw kissed left.
Further, these hypotheses will explain why a sentence like (2b) is not normally interpreted as (2a) in the same way that (3b) can be interpreted as (3a).
(2) a. The woman that was attractive took the job.
    b. The woman took the job that was attractive.
(3) a. The woman that was attractive fell down.
    b. The woman fell down that was attractive.
The next section (Section 2) will be concerned with the traditional hypotheses that have been presented to account for the unacceptability of sentences like (1). In Section 3 I will look at some of the parsing techniques designed by computer scientists for processing the sentences of programming languages.
Of particular interest will be those techniques which allow a string to be parsed top-down and left-to-right, building a phrase structure tree over the string as it is read, since this is the process employed by speakers of natural languages. We will find that there are restrictions on the class of languages which allow this kind of parsing, and it will be possible to examine the claim that part of the function of transformations is to put deep structures in a form that allows limited-memory left-to-right parsing. It will also be possible to examine the claim (cf. Chomsky, 1965, chapter 1) that transformations function to construct a surface tree which is optimally parsed by those techniques available to native speakers. In Section 5 we will see that the known transformations of English divide into three distinct classes with respect to the kinds of output structures that they produce.
2. Some previous accounts of surface structure complexity
It is useful to consider first the attempts that have been made to explain the low acceptability of a sentence like (1d) 'The girl the man the boy saw kissed left'. The surface parse tree for this sentence is shown in (4).
(4) [S [NP [Det the] [N girl]
             [S [NP [Det the] [N man]
                      [S [NP [Det the] [N boy]] [VP [V saw]]]]
                [VP [V kissed]]]]
       [VP [V left]]]
One hypothesis (Fodor and Garrett, 1967) is that such sentences are perceptually complex due to the low proportion of terminal symbols (words) to sentences. In general, one of the effects of transformations seems to be to reduce the non-terminal to terminal ratio in surface structures (Chomsky, 1965) as compared to deep structures; i.e., to map relatively tall structures such as (5a) into relatively flat structures such as (5b).
(5) a. [tall, deeply embedded tree; diagram not recoverable from the scan]
    b. [flat tree; diagram not recoverable from the scan]
It is argued that flat structures are perceptually less complex, although no explanation of this fact has been offered. Why flat structures should be easier to parse than deeply embedded structures is accounted for by Principle Two of Section 4 below. The Fodor and Garrett hypothesis seems to be a specific form of this more general hypothesis. This specific hypothesis is, however, easily falsified by considering sentences which have fewer terminals per S than (1d) but which are perceptually much less complex. (6) is such a sentence.
(6) [tree diagram not recoverable from the scan]
In (1d) the ratio of terminals to S nodes is 9:3; in (6) it is 7:3. Thus, the simple terminal-to-S ratio will not account for all cases of perceptual complexity, not even those it was designed to cover. The more general terminal to non-terminal ratio is also inadequate, although the principles presented below in Section 4 predict that this ratio is relevant to perceptual complexity. Chomsky and Miller (1963) claim that (1d) is complex because the perceptual strategy of assigning an N (noun) to the following V (verb) is interrupted. Thus, the subject-verb relation must be assigned to 'the boy' and 'saw' before it can be assigned to 'the man' and 'kissed', and this before it can be assigned to 'the girl' and 'left'. This hypothesis treats the difficulty of center embedding as one of assigning grammatical and, perhaps, semantic relations to the elements of a sentence. The principles discussed below treat the difficulty in (1d) in terms of surface tree configuration only. I conjecture, then, that the complexity of (1d) does not arise from the impediment of a principle of semantic interpretation. Rather, the complexity lies in the structure of the surface tree. The principles below attempt to outline the exact nature of this difficulty. Before presenting these principles, it will be useful to prepare the way for them by considering techniques of parsing designed for computer languages.
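The ratios at issue can be made concrete with a small counting routine. This is a sketch of our own, not part of Kimball's argument: it assumes a labelled-bracket encoding of tree (4) that includes pre-terminal nodes (Det, N, V), and the helper names `parse_bracketed` and `count_nodes` are hypothetical. Only tree (4) is encoded here.

```python
def parse_bracketed(text):
    """Parse a labelled bracketing such as "[S [NP she] [VP runs]]" into
    nested (label, children) tuples; bare words become string leaves."""
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        label, pos = tokens[pos + 1], pos + 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def count_nodes(tree):
    """Return (terminals, S nodes, all non-terminal nodes) for a parsed tree."""
    if isinstance(tree, str):
        return 1, 0, 0
    label, children = tree
    terminals, s_nodes, nonterminals = 0, int(label == "S"), 1
    for child in children:
        t, s, n = count_nodes(child)
        terminals += t
        s_nodes += s
        nonterminals += n
    return terminals, s_nodes, nonterminals


# Assumed bracketing of (4), 'The girl the man the boy saw kissed left'.
TREE_4 = ("[S [NP [Det the] [N girl] [S [NP [Det the] [N man] "
          "[S [NP [Det the] [N boy]] [VP [V saw]]]] [VP [V kissed]]]] "
          "[VP [V left]]]")

terminals, s_nodes, nonterminals = count_nodes(parse_bracketed(TREE_4))
print(f"terminals:S = {terminals}:{s_nodes}")  # prints terminals:S = 9:3
```

The terminal-to-S figure reproduces the 9:3 cited for (1d); the non-terminal count (18 under this encoding) depends on how many pre-terminal nodes one assumes, which is one reason a raw node ratio is a fragile measure.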
3. Parsing algorithms for programming languages
Programming languages are often based on context-free languages; the problem of designing parsers for such languages is solved by constructing a parser from the CF grammar for the language. Since the parser 'knows' the productions of the language, its job is then defined as that of selecting which productions must have applied to generate a particular input string. (A usual requirement for programming languages is that their grammar be unambiguous, so the solution to the parsing problem for any string is always unique.) The parser must reconstruct the derivational history of a string from context. For example, if a grammar contains two productions with identical right-hand sides, say $A \to X$ and $B \to X$, where X is an arbitrary string of terminals and non-terminals, then when the parser reads X, it must decide from context which production in fact applied. For example, A might be introduced by a rule $C \to D\,A$, $D \to d$, and B by the rule $C \to E\,B$, $E \to e$. Thus, when X is to be parsed as an A, it will always be preceded by a d, and when it is to be parsed as a B, it will be preceded by an e. Given that this is the general type of problem that must be solved by a parser, two types of questions may be asked concerning a particular language. First, is it possible to parse the string deterministically from left to right as it is read into the computer? And second, what is the size of the largest context that must be examined at any time by the parser to decide how to build the parse tree? For reasons of efficiency, an optimal programming language can be considered to be one in which it is possible to parse left-to-right, and for which there is a fixed finite bound on the size of the forward context which must be examined at any point. There are two general strategies used in parsing algorithms (cf. McKeeman et al., 1970). In the first, a tree is built for an input string by starting with the initial symbol of the grammar (that which is topmost in all trees generated by the grammar) and building a tree downwards to the terminal symbols. Such procedures are called top-down. The operation of such a procedure on an input string $a_{i_1} \ldots a_{i_n}$ for a grammar with an initial symbol S is illustrated below:
[Figure: a partial tree built top-down from S through a node B to the first input symbol $a_{i_1}$ of $a_{i_1} \ldots a_{i_n}$]
The first step of the algorithm is to build a tree down to the first symbol of the input string, $a_{i_1}$. If the parser is to be deterministic, the language must be such that it is always possible to do so uniquely by examining no more than k symbols ahead of $a_{i_1}$ (where k is fixed for the language). At this point, B acts as the new 'initial' symbol for some substring, and the parser operates to complete B, again working top-down. When B is filled out, the next higher node in the pathway down from S becomes the new initial symbol for the purposes of the parser. The class of grammars which permit parsing by an algorithm such as that outlined above is called the class of LL(k) grammars. For such languages trees can be built top-down, and it is never necessary to look more than k symbols ahead of the given input symbol to determine what action is to be taken by the parser. The second type of parsing procedure involves building a tree from the bottom up. The first action of such a parser is to assign the first m input symbols to some node, which is then placed at the top of a stack. Thus, in parsing $a_{i_1} \ldots a_{i_n}$, if the first m symbols are dominated by B, the parser will operate as illustrated below:
           Stack   Input string
Stage 1:           $a_{i_1} \ldots a_{i_m} a_{i_{m+1}} \ldots a_{i_n}$
Stage 2:   B       $a_{i_{m+1}} \ldots a_{i_n}$

The parse is completed when the initial symbol is the only symbol in the stack and the input string has been completely read in. Those languages which can be parsed bottom-up by looking ahead in the input string no more than k symbols are called LR(k) languages; the grammars which generate such languages are called LR(k) grammars (cf. Knuth, 1965). In general, parsing of computer languages differs from that of natural languages in two significant ways. The first involves the fact that programming language grammars are unambiguous; a parser for a programming language yields a unique tree for each string. Not only this, but the behavior of the computer parser must also be deterministic. That is, its action for any given string of input terminals and stack configuration must be uniquely determined. A model of parsing in natural language, by contrast, must allow for more than one parse and should predict, on the basis of considerations of surface structure complexity, which parse will most likely be offered as the first choice by a native speaker. Second, and most important, parsing in computer languages differs from that in natural languages in that a computer parser is allowed an essentially unrestricted memory. For example, in the case of a parser for an LR(k) grammar, it is possible to look ahead k symbols and then decide that the appropriate action is to read in the next symbol. This can be done until the whole string is read into memory before the parse tree starts being built. On the other hand, there is considerable evidence that short-term memory (STM) in humans is quite restricted, and that a tree must be built over an input string constantly so that the initial parsed string may be cleared from STM. Principle Four below concerns what seems to be a fixed limit on linguistic STM.
Because of the limitations on STM, the form of a parseable (acceptable) surface structure in natural language is quite restricted.

4. Six or seven principles of surface structure parsing

The following principles, although presented here as distinct, are closely linked and interact in various ways, as will be pointed out during the discussion. The central principle of the scheme discussed here is the last, and from it several of the others follow deductively.

4.1 Principle One (Top-Down): Parsing in natural language proceeds according to a top-down algorithm
The operation of such algorithms was outlined above in Section 3. The claim made of natural languages is that parsing is like that in LL(k) languages, with one variation noted below. Thus, the first node built in such parsing is the top S, while in bottom-up parsing this is the last node to be constructed. Before considering the consequences of such an assumption, let us consider it in operation in parsing a sentence like (7).
(7) That the boy and the girl left amazed us.
The first step upon hearing the initial terminal 'that' is to build a tree down from S of the form (8a).
(8) a. [tree diagram not recoverable from the scan]
We may ask why such a tree is justified, given that the initial terminal 'that' could also have begun a sentence with a totally different structure, e.g. 'That is a nice flower'. To reduce the number of false starts, we may add the assumption that English is a look-ahead language. That is, the speaker is allowed to hear symbols subsequent to a given terminal before making up his mind as to the appropriate or most probable tree structure. I have no immediate empirical justification for this assumption, other than general considerations of simplicity and efficiency of parsing. The assumption is based on the conjecture that it may require less computation to hold a symbol in storage without tree attachment while one, but probably no more than two, subsequent symbols are scanned, than to build a tree only to have to return to alter it. (We will see later that sentences in which large tree changes are required during parsing are indeed perceptually complex.) The reader will notice that complementizers are represented as Chomsky-adjoined (cf. Kimball, 1972a, for a discussion of Chomsky adjunction) to the left of the appropriate nodes, following Ross (1967). There is ample justification for this purely in terms of syntax. We will see later on that further justification arises from considering the role complementizers and other 'function' words play in perceptual routines. Let us continue now to parse (7). Upon reading 'the', (8a) becomes (8b), which becomes (8c) after 'boy' is read.
(8) b. [tree diagram]
John Kimball
At this point and is read, signalling a conjunction, and three new NP nodes are constructed, as shown in (8d).
(8) d. [tree diagram]
The and here is represented as Chomsky adjoined to the phrase following it. Again, justification for this can be found in Ross (1967). The circled NP is inserted between the lower S and the first NP. Insertion of such nodes is, I claim, possible, and such insertion is the only deviation from the general claim that trees are built top-down.¹ Furthermore, it seems to be the case that nodes that are so inserted are copies of already built nodes, so the structure is preserved. Once the NP the boy is completed, all subsequent material is taken as belonging to a different phrase. If there were a relative clause construction, such as 'the boy who kissed the girl', the NP would be closed as soon as boy is read, and reading who would occasion the building of a new phrase, as shown in (9).
1. As this type of parsing differs from the usual top-down procedures, we may seek a new name for it. De Remer suggests over-the-top (OTT). In fact, it may be suggested that the mechanism of parsing actually utilised in natural language is this: trees are not built down to single terminals but with regard to adjacent pairs of terminals (discriminant pairs). Given an initial member of a pair, a tree is built over-the-top down to the second member. This could be done in at least three ways: (1) the tree is built up only as far as the lowest common dominating node for the pair under consideration; (2) the tree is built up only as far as the lowest common dominating S node for the pair, and then down to the second member; or (3) the tree is built all the way up to the highest S node, and down to the second member. As I have given it in the paper, the parsing hypothesized for natural language corresponds to this third type of OTT parsing. Some testable consequences result from taking one or another of the positions outlined above. These have to do with what is maintained in STM and what is cleared into the syntactic processor. This question is discussed in more detail under Principle Seven below.
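The three variants of OTT parsing in the footnote differ only in how high the tree is built for a pair of adjacent terminals. The following Python sketch makes that difference concrete on a toy tree for 'Joe said that Martha left'. The tree, the node names (S1, VP1, and so on) and the function names are hypothetical illustrations, not part of the original proposal. The tree is stored as parent pointers, and each variant is read off the list of nodes that dominate both members of the pair.

```python
# Hypothetical toy tree for "Joe said that Martha left", stored as parent
# pointers. The complementizer "that" is Chomsky-adjoined (S2 -> that S3),
# following the convention used in the text.
PARENT = {
    "S1": None,
    "NP1": "S1", "VP1": "S1",
    "Joe": "NP1",
    "V1": "VP1", "said": "V1", "S2": "VP1",
    "that": "S2", "S3": "S2",
    "NP2": "S3", "VP2": "S3",
    "Martha": "NP2", "left": "VP2",
}


def ancestors(node: str) -> list[str]:
    """Nodes dominating `node`, nearest first, root last."""
    chain = []
    node = PARENT[node]
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def is_s(node: str) -> bool:
    """True for sentence nodes (S1, S2, ...) in this toy naming scheme."""
    return node[0] == "S" and node[1:].isdigit()


def ott_peak(first: str, second: str, variant: int) -> str:
    """Highest node built when parsing the adjacent pair (first, second).

    variant 1: lowest common dominating node
    variant 2: lowest common dominating S node
    variant 3: highest S node (the root)
    """
    above_second = set(ancestors(second))
    common = [n for n in ancestors(first) if n in above_second]  # lowest first
    if variant == 1:
        return common[0]
    if variant == 2:
        return next(n for n in common if is_s(n))
    if variant == 3:
        return common[-1]
    raise ValueError("variant must be 1, 2 or 3")


for pair in [("said", "that"), ("that", "Martha")]:
    print(pair, [ott_peak(*pair, v) for v in (1, 2, 3)])
# ('said', 'that') ['VP1', 'S1', 'S1']
# ('that', 'Martha') ['S2', 'S2', 'S1']
```

For the pair (said, that) the first two variants already diverge, while for (that, Martha) only the third variant climbs to the root, so each variant makes a different claim about how much structure is built per pair.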
(9) [tree diagram]
This observation concerning the closing of phrases illustrates a principle that operates generally in sentence parsing. This principle of Closure will be discussed in detail below; it is connected with another principle dealing with how semantic information is processed. Returning once more to (7), after the and girl are read, the tree looks like (8e).
(8) e. [tree diagram]
When left is read, the embedded sentence is closed, following the principle alluded to above, given look-ahead to make sure that no other possible parts of that sentence follow (e.g., early, or to NY). When amazed is read, a new VP in the main sentence is constructed, and an NP is added as us is read. The final parse tree is shown in (8f).
(8) f. [tree diagram]
The 'top-down' principle interacts in an interesting way with the following principle.
4.2
Principle Two (Right Association): Terminal symbols optimally associate to the lowest nonterminal node.
This principle is designed in part to explain the frequently observed fact that sentences of natural language generally organize themselves into right-branching structures such as (10a), and that these structures are perceptually less complex than left-branching structures, such as (10b), or center-embedded structures, such as (10c).
(10) a. [tree diagram: right-branching]
b. [tree diagram: left-branching]
c. [tree diagram: center-embedded]
There is considerable evidence for the existence of such a principle. First, consider a sentence such as (11).
(11) Joe figured that Susan wanted to take the train to New York out.
The surface structure of this sentence is shown in (12). It will be noticed that the particle out must be associated with a node other than the lowest (and, thus, rightmost). That is, 'out' is not associated with the node dominating 'New York' but with a higher node.
(12) [tree diagram]
This principle also explains why a potentially ambiguous sentence, such as 'Joe figured that Susan wanted to take the cat out', is naturally read by speakers in such a way that 'take the cat out' is a phrase. The status of such sentences constituted a puzzle for Bever (1970a), in that no known principle other than general 'memory limitation' would explain the difficulty of such sentences; he was unsure whether they should be marked ungrammatical or merely unacceptable. The sentence Bever gives is (37) (his numbering).
(37) I thought the request of the astronomer who was trying at the same time to count the constellations on his toes without taking his shoes off or looking at me over.
Such sentences should be counted as fully grammatical, in that they are generated by general syntactic mechanisms. Their perceptual complexity is explained by Right Association, which is related to the principle of Closure discussed below. Right Association also accounts for the difficulty of phrases like 'the boy who Bill expected to leave's ball', and for the preferred but incorrect interpretation of 'the boy who Sam gave the ball's book' (the incorrect reading is that it is the ball's book; the correct one is that it is the boy's book). The reason is that the possessive 's optimally associates with the lowest constituent instead of with the higher NP dominating the whole phrase. This principle also explains the preferred interpretation of a sentence like (13).
(13) The girl took the job that was attractive.
That interpretation is the one that is not synonymous with (14).
(14) The girl that was attractive took the job.
However, there is a general grammatical process, known as Extraposition From NP, that would form (13) out of (14). This transformation maps a sentence like (15a) into (15b).
(15) a. The girl that was attractive went to NY.
b. The girl went to NY that was attractive.
The reason for the preferred interpretation of (13) can be seen from its parse tree (16).
(16) [tree diagram]
Principle Two predicts that the relative clause will be perceived as associated with the lowest, rightmost node, i.e., the job, rather than as a daughter of the top S, where it would have to be interpreted as an extraposed relative. The seeming unambiguity of sentences like (13) has been taken as evidence for the existence of a special and probably quite powerful form of grammatical mechanism known as transderivational constraints. Briefly, according to the proponents of such constraints, a derivation in which (13) comes from (14) is to be blocked because there already exists a sentence of the same form with a different interpretation. The fact is that (13) can be read as (14) with a little effort; it is just that this reading is perceptually more difficult, due to Principle Two. Thus, this datum should be removed as evidence for the existence of transderivational constraints. Without going into the matter in detail, I conjecture that all putative evidence for these devices is of the same form as that above, namely, it can be explained in terms of preferred interpretation on the basis of established principles of perception. If so, there is no reason to include transderivational constraints among the stock of possible grammatical mechanisms in the theory of universal grammar. (But cf. Hankammer, 1973.) There seem to be grammatical mechanisms that avoid the generation of sentences that would be perceptually complex under Principle Two, i.e., sentences which would involve assigning terminals to other than the lowest, rightmost nonterminal. A transformation known as Heavy NP Shift is a case in point. This transformation moves heavy NPs to the right-hand side of sentences; the definition of 'heavy' is discussed in Ross (1967). Thus, a sentence like (17a) would be mapped into (17b).
(17) a. Joe gave a book that was about the skinning of cats in Alberta between 1898 and 1901 to Berta.
b. Joe gave to Berta a book that was about the skinning of cats in Alberta between 1898 and 1901.
The perceptual complexity of (17a) can be seen in its surface parse tree (18).
(18) [tree diagram]
(It will be noticed that what are traditionally called prepositional phrases [e.g., of cats] are here represented as NPs with Chomsky adjoined prepositions. Justification for this may be found in Ross, 1967.) We can use the principle of Right Association to gain a partial understanding of the complexity of sentences such as (1d) 'The girl the man the boy saw kissed left', which initiated the discussion of this paper. Part (but not all) of the difficulty resides in the fact that the verb kissed is optimally associated with the lowest, rightmost constituent of the tree. As this association is impossible on semantic grounds, kissed must instead be associated with a VP node in a higher sentence. The same goes for left. In this way (1d) violates Right Association. Another confirmation of this principle comes from observing the natural association of adverbs. In this respect, consider a sentence like (19).
(19) Joe said that Martha expected that it would rain yesterday.
[tree diagram: dotted lines mark the possible attachments of yesterday]
The dotted lines indicate the possible association of yesterday. Compare these with the natural associations of this adverb as the sentence is interpreted. The easiest reading is that in which the adverb is read as attached to the lowest VP, next easiest is that reading in which it hangs off the middle VP, and the hardest or least likely reading is that in which it is associated with said. This is exactly the prediction made by Right Association. In fact, we can define a metric such that the perceptual complexity of a sentence is proportional to the number of nodes above the lowest rightmost node to which a terminal is attached. It is to be noted that there is a syntactic device available in English to disambiguate (19), as shown in (20a-c).
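The metric just described can be stated as a small computation over the right frontier of the partial tree, i.e., the path from the root down to the lowest, rightmost node. The sketch below assumes a hypothetical labelling of that frontier for (19) just before yesterday is read (complementizer-adjunction nodes omitted), and takes the raw count of frontier nodes above the attachment site as the complexity score; the text itself only says complexity is proportional to this count.

```python
# Hypothetical right frontier (root first) of the partial tree for (19)
# just before "yesterday" arrives: "Joe said that Martha expected that
# it would rain". VP1 is headed by "said", VP2 by "expected", VP3 by "rain".
RIGHT_FRONTIER = ["S1", "VP1", "S2", "VP2", "S3", "VP3"]


def attachment_cost(site: str, frontier: list[str] = RIGHT_FRONTIER) -> int:
    """Number of frontier nodes between `site` and the lowest frontier node.

    0 means the terminal attaches to the lowest, rightmost node, which
    Right Association treats as the optimal (least complex) choice.
    """
    return len(frontier) - 1 - frontier.index(site)


costs = {vp: attachment_cost(vp) for vp in ["VP3", "VP2", "VP1"]}
print(costs)                         # {'VP3': 0, 'VP2': 2, 'VP1': 4}
print(sorted(costs, key=costs.get))  # ['VP3', 'VP2', 'VP1'], easiest first
```

The ordering matches the judgments reported for (19): attachment under the rain VP is easiest, then the expected VP, and attachment to said is hardest.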
(20) a. Yesterday Joe said that Martha expected that it would rain.
b. Joe said that yesterday Martha expected it would rain.
c. Joe said that Martha expected that yesterday it would rain.
Notice that a sentence like (21) is read most naturally such that the adverb is associated with the higher clause.
(21) Haastiin knew yesterday it rained.
That is, the most natural structure imputed to (21) is that shown in (22a) rather than (22b),
(22) [tree diagrams a and b]
even though it is quite possible for an adverb to hang at the beginning of a sentence, as must be the case, e.g., with (20a). That this should be the case is again predicted by the principle of Right Association: the adverb must associate with the lowest, rightmost node. Conceivably, the tree could be restructured when the new embedded sentence is built, but such restructuring is very costly, as discussed above and as will be elaborated below in the principle of Fixed Structure. Let us consider some apparent counter-examples to Right Association. For example, this principle requires that in 'Joe bought the book for Susan', 'the book for Susan' should be interpreted as a phrase more readily than 'bought', 'the book', and 'for Susan' being interpreted as separate constituents of a VP, because the new NP 'for Susan' should preferably be assigned to the lowest completed node. That this is not the case is a function of the interaction of parsing with semantic information accessible to the speaker during the sentence scan. In particular, the verb 'buy' carries lexical information with it such that the speaker would expect to hear a 'for' phrase in its VP. This interaction of semantics with parsing will be discussed further below when the principle of Processing is presented. In the same vein, notice that while 'Joe cooked the peas in the pot' is ambiguous, with either reading of equal complexity, 'Joe rode down the street in the car' does not carry the same ambiguity. That is, one does not read 'the street in the car' as a phrase, because of semantics or knowledge about the world: streets usually are not in cars. Again we see the role of outside information influencing parsing.
Thus, the above are not counter-examples to Right Association. Rather, this principle defines the optimal functioning of the parsing algorithm when no outside effects are relevant. Its operation may be superseded by mechanisms other than the parser. The principle of Right Association operates in connection with another principle of perception, that of New Nodes. This principle is needed to account for the following observation. In processing a sentence, when the speaker has constructed a node A, shown in (23a), and attached to it daughters a, b, ..., then upon reading the next terminal, c, Right Association demands that c be connected as shown in (23b). However, two other things may in fact happen. First, some new node, B, may appear and be subtended under A, as in (23c); or, second, B may be Chomsky adjoined to the right of A, as in (23d).
(23)
a. [A a b ...]  c
b. [A a b ... c]
c. [A a b ... [B c]]
d. [A [A a b ...] [B c]]
All three ways of assimilating the new terminal into the existing parse tree, shown in (23b-d), are observed to occur in natural language. Right Association predicts that (23a) should become (23b). New Nodes is designed to predict when (23a) will become either (23c) or (23d), i.e., when the terminal c occasions the construction of a new phrase. This principle is stated as follows:
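The three outcomes in (23b-d) can be written as operations on labelled brackets. In this minimal Python sketch a tree is a tuple (label, daughter, ...); the function names attach and brackets, the mode names, and the default label B are hypothetical. The code only builds the three structures and does not decide among them, which is the job that Right Association and New Nodes take on.

```python
from typing import Union

Tree = Union[str, tuple]  # a terminal, or (label, daughter1, daughter2, ...)


def attach(a: tuple, c: str, mode: str, new_label: str = "B") -> tuple:
    """Return a new tree in which terminal `c` has been added to node `a`.

    mode "right"   -> (23b): c becomes the last daughter of A
    mode "subtend" -> (23c): a new node B dominating c becomes A's last daughter
    mode "adjoin"  -> (23d): B dominating c is Chomsky-adjoined to the right of A
    """
    label, *daughters = a
    if mode == "right":
        return (label, *daughters, c)
    if mode == "subtend":
        return (label, *daughters, (new_label, c))
    if mode == "adjoin":
        return (label, a, (new_label, c))  # a copy of A now dominates old A and B
    raise ValueError(f"unknown mode: {mode}")


def brackets(t: Tree) -> str:
    """Labelled-bracket rendering, e.g. [A a b c]."""
    if isinstance(t, str):
        return t
    label, *daughters = t
    return "[" + " ".join([label, *map(brackets, daughters)]) + "]"


A = ("A", "a", "b")  # (23a): A already dominates a and b
for mode in ("right", "subtend", "adjoin"):
    print(mode, brackets(attach(A, "c", mode)))
# right [A a b c]
# subtend [A a b [B c]]
# adjoin [A [A a b] [B c]]
```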
4.3
Principle Three (New Nodes): The construction of a new node is signalled by the occurrence of a grammatical function word
There is a traditional grammatical distinction in the discussion of the parts of speech between what are called content words (nouns, verbs, adjectives, etc.) and function words (prepositions, conjunctions, etc.). In the literature of transformational grammar, this distinction surfaces as the difference between lexical formatives and grammatical formatives. For the time being I will focus on just prepositions, wh-words (e.g., what, where, who, how, when, why, etc.), conjunctions, and complementizers (that, for-to, and poss-ing). Later, other categories will be examined as to whether they work like function words for purposes of perception. There is syntactic evidence that grammatical formatives are Chomsky adjoined on surface structure (cf. Ross, 1967). (The assumption that this is the case is in fact not necessary to the correct operation of New Nodes, but I shall maintain the assumption in what follows.) Thus, what is traditionally called a prepositional phrase is in fact an NP, as in (24a), and the complementizers and conjunctions appear on surface structure as in (24b,c).
(24) a-c. [tree diagrams: Chomsky-adjoined preposition (a), complementizer (b), and conjunction (c)]
There is no direct proof that fronted wh-words, as in 'What did he say?' or 'The boy who you say', are Chomsky adjoined to the front of their clauses. As mentioned above, it makes no difference for New Nodes whether this is the case or not. Let us examine how New Nodes operates to correctly predict the parsing of a sentence like (25).
(25) She asked him or she persuaded him to leave.
After reading the first three words of (25), a tree such as that in (26a) is constructed.
(26)
a. [S [NP she] [VP [V asked] [NP him]]]
At this point a conjunction is reached, and the speaker must decide whether there is a conjunction of NPs (She asked him or her to leave), of VPs (She asked him or persuaded him to leave), or of Ss, as in (25). In this case a look-ahead of one word reveals that the latter is the case, and (26b) is constructed, where the new node is Chomsky adjoined to the right of the top S.
(26)
b. [tree diagram]
Right Association says that in this case the conjunction of NPs is easiest, as NP is the lowest node, that of VPs next easiest, and a conjunction of Ss hardest to perceive. This seems in fact to be the case for the sentences listed above: the higher in the tree the node from which a conjunction proceeds (that is, the deeper the constituent break it creates), the more perceptually complex the sentence. An extreme case would be: 'Everyone said that Bill thought that Max believed that she was, although no one in his right mind who had been the movie would expect that Fred had told Sally that she was pregnant.' The perceptual complexity arises from the large internal constituent breaks, of which there are two in the above sentence, one before the conjunction although, and one before pregnant.
When the to of to leave in (25) is reached, this is a signal that a new node, in this case a VP, is to be formed. The structure of the conjunction forces this to be Chomsky adjoined to the top S, yielding a final parse tree (26c).
(26) c. [tree diagram]
Thus, (25) illustrates how both conjunctions and complementizers occasion the construction of a new node. New Nodes predicts that sentences in which the complementizers and relative pronouns have been deleted by optional rules will be perceptually more complex than those in which they are present, i.e., that (27a) is more complex than (27b), and (28a) more complex than (28b).
(27) a. He knew the girl left.
b. He knew that the girl left.
(28) a. The boy the girl the man saw kissed left.
b. The boy who the girl who the man saw kissed left.
There is experimental evidence to support this contention. Hakes (1972) found that sentences with complementizers were processed faster than those without. He writes: 'When an optional cue to a sentence's underlying grammatical relations is deleted, the difficulty of comprehending is increased. These results taken together with the numerous results on relative pronoun deletion suggest that the cue deletion effect is general and not limited to a particular cue or structure' (pp. 283-284). Thus, New Nodes supplies a second piece in fitting together an explanation of the difficulty of a sentence like (1d). (The third, and perhaps crucial, piece comes from Principle Four below.) The particular example considered above illustrates only the nodes formed by a conjunction and the complementizer to. In sentences like (27a) or (27b), the complementizer that signals the existence of an embedded sentence. The occurrence of this complementizer here introduces a structure with three new nodes, as shown in (27c).
(27) a. Joe knew that it was a duck.
b. That it was a duck annoyed Joe.
c. [tree diagram]
A preposition introduces the node NP, as in (28).
(28) [S [NP Susan] [VP [V went] [NP to [NP Boston]]]]
Traditionally, articles (a, the) are included as function words, and they do in fact serve to introduce new phrases, although they are not Chomsky adjoined. We should perhaps include all words which fill the determiner slot on surface structure: several, all, each, every, few, etc. Finally, we may consider auxiliaries, which traditionally were counted as function words. There is a debate concerning the proper surface structure of the auxiliaries. Following the Chomsky (1957) analysis, it would be (29a); Ross' analysis (1967b) would predict (29b) as the correct structure.
(29) a-b. [tree diagrams for have been sleeping]
The evidence for auxiliaries occurring in a right-associative configuration as shown in (29b) is quite strong, having to do with deletion, the operation of There Insertion, and other matters. If Ross is right, then auxiliaries do fit the pattern of other function words in introducing new phrases.
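New Nodes makes a purely lexical prediction: a function word flags the start of a new phrase. The following Python sketch shows this with a small hypothetical lexicon drawn from the classes listed above (complementizers, wh-words, conjunctions, prepositions, determiners, auxiliaries), each word mapped to the node it is assumed to signal. It deliberately ignores ambiguity (e.g., that as a determiner, to as a preposition), which in the text is resolved by look-ahead.

```python
# Hypothetical toy lexicon: function word -> node whose construction it signals.
FUNCTION_WORDS = {
    # complementizers
    "that": "S", "to": "VP",
    # wh-words / relative pronouns
    "who": "S", "which": "S", "what": "S",
    # conjunctions (the conjoined category is fixed by look-ahead)
    "and": "conj", "or": "conj",
    # prepositions, treated as Chomsky-adjoined to an NP
    "in": "NP", "into": "NP", "over": "NP", "of": "NP",
    # determiners
    "the": "NP", "a": "NP", "every": "NP", "several": "NP",
    # auxiliaries, under the right-branching (Ross) analysis
    "have": "VP", "been": "VP", "will": "VP",
}


def new_node_signals(sentence: str) -> list[tuple[int, str, str]]:
    """(position, word, signalled node) for every function word in the sentence."""
    words = sentence.lower().rstrip(".").split()
    return [(i, w, FUNCTION_WORDS[w]) for i, w in enumerate(words) if w in FUNCTION_WORDS]


for s in ["He knew the girl left.", "He knew that the girl left."]:
    print(new_node_signals(s))
# [(2, 'the', 'NP')]
# [(2, 'that', 'S'), (3, 'the', 'NP')]
```

On 'He knew the girl left.' the only signal is the determiner the, whereas 'He knew that the girl left.' also yields an S signal at that, the cue whose deletion New Nodes predicts to be costly.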
This statement of the role of function words in sentence perception as signallers of new phrases includes no hypothesis concerning their semantic role or their syntactic origin in deep structure. Function words themselves are among the things learned later in the process of acquisition, the first stage being that of telegraphic speech (Brown and Bellugi, 1964). Likewise, they are generally absent from 'pidgins'. In both cases, the grammatical structures may be conjectured to be not sufficiently complicated (say, in terms of the occurrence of embedded sentences) to require cues to surface structure. We may hypothesize that there is a certain permissible complexity of surface structure below which no indicators of constituent organization are required. In a free word order language, not so much of the surface tree is relevant to a determination of the underlying syntactic relations, and surface structures in such languages may be flat and relatively uncomplicated compared with a language such as English; thus, such a language may not need overt cues to surface parsings. The operation of New Nodes in SOV languages needs further examination. In such languages grammatical formatives typically follow the constituents to which they are attached (as pointed out to me by Jorge Hankammer). For cases where the constituent is a simple NP with a postposition, the principle could be operative, as this NP could be stored until a look-ahead to the postposition gave a clue to its syntactic status. For large constituents, such as Ss with following complementizers, New Nodes is simply inoperative. Note, however, that such cases are not counter-examples, since New Nodes has the logical form of a conditional: if a grammatical function word occurs, it signals the construction of a new phrase. It is now possible to consider a principle which is pervasive in application, and which is the first principle bearing directly on what the limitations of short-term syntactic memory are.
4.4
Principle Four (Two Sentences): The constituents of no more than two sentences can be parsed at the same time
The first pieces of supporting evidence for this principle come simply from comparing the complexity of sentences like (30a,b) and (31a,b).
(30) a. That Joe left bothered Susan.
b. That that Joe left bothered Susan surprised Max.
c. That for Joe to leave bothers Susan surprised Max.
(31) a. The boy the girl kissed slept.
b. The boy the girl the man saw kissed slept.
In processing both (30b) and (31b), at some point the constituents of three different sentences must be held in memory. E.g., when the second that of (30b) is heard and recognized as a complementizer, the imputed structure is (30d), in which three unfinished sentences are being processed at once.
(30) d. [tree diagram]
(30c) shows that repetition of 'that' does not here add noticeable complexity. Two Sentences provides the final principle explaining the difficulty of (1d). Part of the complexity of this sentence is due to the absence of wh-words to indicate the surface structure under the principle of New Nodes; part of the difficulty is that Right Association is violated; but the major difficulty here seems to be that the third sentence simply requires short-term memory space beyond the bounds of inherent capacity. Two Sentences is an attempt to state what that inherent capacity is. When the sentences in (30b) are nominalized, the result is much easier to parse, as in (32a).
(32) a-b. [examples and tree diagram]
(32b) shows a large left-branching structure which is nevertheless not difficult to comprehend and which is within one S. On the other hand, when a fourth sentence is added to (30b), as in (33a), the result is totally incomprehensible, while the nominalized version, (33b), is not nearly as bad.
(33)
a. That that that Joe left bothered Susan surprised Max annoyed no one. b. Joe’s leaving’s bothering Susan’s surprising Max annoyed no one. One may conclude that the limitation is not on left branching, but rather on the number of Ss that must be processed at the same time. In a later discussion of semantic processing, I will discuss why this might be the case, and why it is permissible to string out relative clauses, embedded sentences, and prepositional phrases on the right of a sentence. The Two Sentences principle accounts also for why right branching relative clause structures are permissible, while center embedded structures are not. Consider a structure like (34). (34)
[S [NP the dog] [VP saw [NP [NP the cat] [S [NP which] [VP chased [NP the mouse] [NP into [NP [NP the house] [S that Jack built]]]]]]]]
It may be thought that such a sentence violates Two Sentences because the top S is not finished until the last word of the bottom S is processed. To see why this is not so, let us say that we will consider a constituent to be 'closed' when the last immediately dominated rightmost daughter of that constituent is introduced in the process of parsing. In this sense, the top S is closed when the VP is reached, and the same holds for the second S. The lowest S would not be closed if, say, the sentential adverb frankly appeared after built and were to be attached to that S. Why this definition of closure is appropriate, in that a phrase is through being parsed once it is closed, will be discussed below under the principle of Processing. First, however, it is necessary to consider the principle of Closure.
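Two Sentences, together with this definition of closure, can be checked mechanically on a parsed sentence: walk the tree and, at each word, count the S nodes that are still open because one of their non-final daughters is being parsed. The sketch below does this for simplified, hypothetical bracketings of the center-embedded (31b) and of the right-branching (34); the tuple encoding and the function name are assumptions made for illustration.

```python
from typing import Union

Tree = Union[str, tuple]  # a word, or (label, daughter1, daughter2, ...)


def max_open_sentences(tree: Tree) -> int:
    """Largest number of S nodes still open at any word of the sentence.

    Following the definition of closure in the text, an S counts as open
    while one of its non-final daughters is being parsed; once its last
    daughter has been introduced, it is closed.
    """
    best = 0

    def walk(node: Tree, open_s: int) -> None:
        nonlocal best
        if isinstance(node, str):          # a word: record the current load
            best = max(best, open_s)
            return
        label, *daughters = node
        for i, d in enumerate(daughters):
            still_open = label == "S" and i < len(daughters) - 1
            walk(d, open_s + int(still_open))

    walk(tree, 0)
    return best


# (31b) The boy the girl the man saw kissed slept.  (center-embedded)
man = ("S", ("NP", "the", "man"), ("VP", "saw"))
girl = ("S", ("NP", "the", "girl", man), ("VP", "kissed"))
center = ("S", ("NP", "the", "boy", girl), ("VP", "slept"))

# (34) the dog saw the cat which chased the mouse into the house that Jack built
jack = ("S", "that", ("S", ("NP", "Jack"), ("VP", "built")))
house = ("NP", ("NP", "the", "house"), jack)
which = ("S", ("NP", "which"),
         ("VP", "chased", ("NP", "the", "mouse"), ("NP", "into", house)))
right = ("S", ("NP", "the", "dog"),
         ("VP", "saw", ("NP", ("NP", "the", "cat"), which)))

print(max_open_sentences(center), max_open_sentences(right))  # 3 1
```

The center-embedded sentence reaches three open S nodes, one more than Two Sentences allows, while the right-branching (34) never has more than one, since each S is closed as soon as its VP begins.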
4.5
Principle Five (Closure): A phrase is closed as soon as possible, i.e., unless the next node parsed is an immediate constituent of that phrase
Closure explains in part the complexity of sentences like (1c) 'The boat floated on the water sank'. In such sentences, as soon as the end of a potential S is reached, it is closed, unless the next phrase is also part of that S. Thus, at the end of 'the boat floated on the water', the assumption is that the S is closed. The remaining causes of the perceptual complexity of (1c) are accounted for below by Fixed Structure. Also, the increased perceptual complexity of a sentence like (35b) over that of (35a) is explained by Closure, together with New Nodes and Fixed Structure.
(35) a. They knew that the girl was in the closet.
b. They knew the girl was in the closet.
Without the complementizer to signal the embedded sentence (New Nodes), the sequence 'they knew the girl' would optimally be interpreted as an S (Closure) (modulo look-ahead), but when the later words were presented, the discovery that this assumption was incorrect would require a restructuring of the presumed tree, adding to complexity (Fixed Structure). Evidence for Closure derives from experiments performed by Chapin, Smith, and Abrahamson (1972). They found that clicks were attracted to preceding surface structure constituent boundaries, even when these do not mark breaks between surface clauses. In particular, when a click was placed between a preceding surface break and a following clause break, the tendency was for the click to be perceived as closer to the preceding boundary. The authors conclude from this that in imposing a parse tree on a sentence, a subject 'attempts at each successive point to close off a constituent at the highest possible level. Thus, if a string of words can be a noun phrase, the subject assumes that it is a noun phrase and that the next element heard will be part of some subsequent constituent. Such a strategy would explain the strong preposing tendency observed in our experiment' (p. 171). Closure interacts closely with a number of the principles discussed above.
In particular, it is not clear whether this principle is distinct from the principle of Right Association, for when the latter is violated, a terminal must be placed as a daughter of a node that is not the lowest, rightmost in the tree. Thus, this higher node must be 're-opened' to have a constituent added to it, contrary to the optimal situation described in Closure. That is, consider an abstract tree like (36).
(36) [tree diagram]
In building such a tree, as soon as E was being built, Closure would require that A be finished. However, when k is reached, A must be re-opened to receive the new constituent. In this sense, Closure operates in the same way as Right Association. However, I think that there are genuine cases where Closure operates which could not be covered by Right Association. For example, a phrase like (38a) should by Closure be interpreted more readily as (38b) than as (38c), because of the tendency to close the phrase begun with old, even though Right Association predicts the opposite, because by it the second phrase will be conjoined to the lowest NP available to it.
(38) a. old men who have small annual pensions and gardeners with thirty years of service
b. [NP [NP old men [S who ...]] and [NP gardeners with ...]]
c. [NP old [NP [NP men [S who ...]] and [NP gardeners with ...]]]
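Closure applied to clauses amounts to a greedy procedure: close the first stretch that can be a complete clause and treat what follows as belonging elsewhere. The sketch below implements that idea in the spirit of Bever's N . . . V . . . (N) strategy, using a hypothetical one-tag-per-word lexicon; it is not a model of the full parser, and it simply reports the words left over once the first clause has been closed.

```python
# Hypothetical toy lexicon: one part of speech per word.
POS = {
    "the": "Det", "boat": "N", "water": "N", "horse": "N", "barn": "N",
    "floated": "V", "raced": "V", "sank": "V", "fell": "V",
    "on": "P", "past": "P",
}


def close_first_clause(words: list[str]) -> tuple[list[str], list[str]]:
    """Greedily close the first N ... V ... (N) clause.

    Words are absorbed into the clause until a verb arrives after the
    clause's own N ... V; that verb cannot extend the clause, so the
    clause is closed there and the remaining words are returned separately.
    """
    seen_noun = seen_verb = False
    for i, word in enumerate(words):
        tag = POS[word]
        if tag == "V" and seen_verb:
            return words[:i], words[i:]   # clause closed before this verb
        seen_noun = seen_noun or tag == "N"
        seen_verb = seen_verb or (tag == "V" and seen_noun)
    return words, []


print(close_first_clause("the boat floated on the water sank".split()))
# (['the', 'boat', 'floated', 'on', 'the', 'water'], ['sank'])
```

On (1c) the procedure closes 'the boat floated on the water' and strands sank, which can only be integrated by reopening the closed S, the situation that Fixed Structure marks as costly.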
Bever (1970b) accounts for the difficulty in sentences like (1c) ('The boat floated on the water sank') in terms of what he conjectures to be a general strategy of sentence perception. This principle (strategy B, p. 294) is that the first N . . . V . . . (N) sequence isolated in the course of parsing a sentence is interpreted as the main clause. This strategy is a particular case of Closure applied to sentences. Restated, it says that when an S node is 'opened' in the course of a parse, the first substring interpretable as an S (given some look-ahead) will be so interpreted. In general, when a terminal string can be interpreted as an X-phrase, it will be. With Closure established, we can now turn to Fixed Structure.
4.6
Principle Six (Fixed Structure): When the last immediate constituent of a phrase has been formed, and the phrase is closed, it is costly in terms of perceptual complexity ever to have to go back to reorganize the constituents of that phrase
This principle explains the complexity of sentences like (39a,b), as discussed above.
38
John Kimball
(39) a. The horse raced past the barn fell.
b. The dog knew the cat disappeared, was rescued.
The principle is connected with the look-ahead capacities of the sentence analyzer. Part of the function of this assumed capacity is to prevent having to return to reorganize previously assigned constituents. For example, a sentence beginning with that could be continued in at least three different ways, in each of which that would be the initial constituent of a different phrase ('That 2+2=4 is nice', 'that boy sang', 'that is a big camel'). Thus, the initial tree built down to that by Top-Down will not be determined until succeeding terminals have been scanned. From Fixed Structure we can conclude that English is a look-ahead language. The scanned but unconnected terminals occupy a certain portion of short-term memory, but not much, in that the biggest restriction here seems to be on the number of S nodes held and being processed, and the allocation of storage space is more than made up for by efficiency of parsing.
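The three continuations of sentence-initial that listed above can be told apart with a single word of look-ahead. The following sketch assumes a tiny hypothetical lexicon (the sets NOUNS and VERBS) and returns the role assigned to that before any tree is built; a realistic analyzer would need far richer lexical information and could need a second symbol of look-ahead.

```python
# Hypothetical toy lexicon; a realistic analyzer would need far more.
NOUNS = {"boy", "camel", "flower"}
VERBS = {"is", "was", "sang", "left"}


def classify_initial_that(words: list[str]) -> str:
    """Decide the role of a sentence-initial 'that' from the next word only."""
    if not words or words[0].lower() != "that":
        raise ValueError("sentence must begin with 'that'")
    nxt = words[1].lower() if len(words) > 1 else ""
    if nxt in VERBS:
        return "pronoun (NP)"        # that is a big camel
    if nxt in NOUNS:
        return "determiner (Det)"    # that boy sang
    return "complementizer (S)"      # That 2+2=4 is nice


for s in ["That 2+2=4 is nice", "that boy sang", "that is a big camel"]:
    print(s, "->", classify_initial_that(s.split()))
# That 2+2=4 is nice -> complementizer (S)
# that boy sang -> determiner (Det)
# that is a big camel -> pronoun (NP)
```

Holding that unattached for one word costs a single buffered symbol, whereas committing to the wrong tree would force the kind of reorganization that Fixed Structure treats as expensive.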
4.7
Principle Seven (Processing): When a phrase is closed, it is pushed down into a syntactic (possibly semantic) processing stage and cleared from short-term memory
By this principle when a chunk of the tree is finished (where by ‘a chunk’ is meant a node and all its immediate constituents), it is sent to processing. This principle requires the assumption that there are pointers in the processing unit to keep straight the original structure of the tree, but such devices are simple mechanisms of data organization and surely can be documented to occur in other kinds of data processing (for example, association) that humans perform. Under this assumption, consider how a sentence like (40) might be processed. At each stage, the contents of the processing unit (PU) are listed. (40) Tom saw that the cow jumped over the moon. a.
C.
b.
I Tom saw At this stage, e.g., there are pointers indicating that the first NP in the PU is that dominated by the S, and the VP dominated by the S is that currently still being worked on in short-term memory.
Seven principles of surface structure parsing in natural language
Again, there are two chunks, and by the pointers in the PU the relations among these tree chunks are kept straight. E.g., it is kept straight which is matrix and which subordinate. Notice, in fact, that in a right branching structure the matrix sentence will always appear to the left of the embedded sentence in the PU. This suggests that the form of the surface structure tree is relevant to the ease of its processing in the PU. I assume, further, that at any point semantic information in the PU is available for current decisions being made in constructing the parse tree, as in the different parsing possibilities for 'They cooked the peas in the pot', which are absent in 'They rode the street in the car'. Notice that at any given moment during the parse, not much more than a single phrase of one or at most two levels is held in short-term memory, a result of the fact that we have chosen for an example a sentence with a simple right branching structure. Sentences with center embedding require that a great deal more structure be held, simply because the higher phrases are not closed until the lower phrases are closed. Left branching structures do not present a problem, as each chunk of the tree is snipped off and placed in the PU as it is completed.
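One way to make Processing concrete is to count how many phrases are held in short-term memory at once. The sketch below rests on an assumed interpretation, not Kimball's own formalization: a phrase enters STM when it is begun (top-down) and is pushed into the PU as soon as its last immediate constituent has been begun, a pointer then standing in for that still-open daughter. Trees are hand-built nested tuples, `stm_profile` is a hypothetical helper, and the center-embedded sentence is a standard textbook example, not one from the text.

```python
from collections import Counter
from typing import List, Tuple, Union

# A tree is a word (str) or a (label, children) pair.
Tree = Union[str, Tuple[str, List["Tree"]]]


def stm_profile(tree: Tree) -> Tuple[int, int]:
    """Return (peak phrases held in STM, peak S nodes held in STM)."""
    held: Counter = Counter()  # labels of phrases currently in STM
    peak_all = 0
    peak_s = 0

    def visit(node: Tree) -> None:
        nonlocal peak_all, peak_s
        if isinstance(node, str):
            return  # a scanned word: nothing is held for it
        label, children = node
        held[label] += 1
        peak_all = max(peak_all, sum(held.values()))
        peak_s = max(peak_s, held["S"])
        for i, child in enumerate(children):
            if i == len(children) - 1:
                # Last immediate constituent begun: chunk goes to the PU,
                # leaving a pointer to the still-open daughter.
                held[label] -= 1
            visit(child)

    visit(tree)
    return peak_all, peak_s


# (40) Tom saw that the cow jumped over the moon.
right_branching = ("S", [
    ("NP", ["Tom"]),
    ("VP", ["saw", ("S'", ["that", ("S", [
        ("NP", ["the", "cow"]),
        ("VP", ["jumped", ("PP", ["over", ("NP", ["the", "moon"])])]),
    ])])]),
])

# A standard doubly center-embedded example (not taken from the text).
center_embedded = ("S", [
    ("NP", ["the", "rat", ("S", [
        ("NP", ["the", "cat", ("S", [
            ("NP", ["the", "dog"]),
            ("VP", ["chased"]),
        ])]),
        ("VP", ["bit"]),
    ])]),
    ("VP", ["died"]),
])

print(stm_profile(right_branching))  # (2, 1)
print(stm_profile(center_embedded))  # (4, 3)
```

Under these assumptions the right-branching (40) never holds more than two phrases or more than one S, whereas the center-embedded sentence holds three S nodes at once. Because nodes are entered top-down, the same count would overstate the load of left-branching trees, which suit a bottom-up parser better.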
John Kimball
From the principle of Processing, it is possible to deduce, and thus explain, some of the principles discussed above, as follows:
(a) Closure results from the fact that as soon as a phrase is completed, it is pushed into the PU and thus removed from short-term memory. The longer a phrase remains uncompleted, the more of STM it takes up as its pieces are assembled.
(b) Fixed Structure follows from Processing because once a phrase is formed and pushed into the PU, it should be difficult to reach down into the PU and pull the phrase (plus all related phrases) back out to rework its (their) structure.
(c) New Nodes is explained because the occurrence of a function word indicates when a new phrase is begun, and thus when the old one can be pushed into the PU.
(d) By Processing, what is held in STM are those phrases which have not been completed, in the sense of having their immediate constituents filled out. The statement of Two Sentences is thus made possible, for Processing establishes what is and is not in STM at any given time. That the specific limit should be two (versus one, three, four, etc.) does not follow from Processing.
In this connection it is interesting to consider the hypothesis that the sentence is not only the unit of semantic processing, but also that of syntactic processing. Under such a hypothesis, the units which are formed in and cleared from STM are only S units. This hypothesis bears the same relation to Processing as Bever's strategy B bears to Closure: namely, it is a special case. At this time, I see no reason to prefer the restricted form over the more general one. I conjecture, then, that syntactically all phrases are treated alike in the process of establishing the surface tree. Semantically, of course, S phrases occupy a special place. There is some evidence that the general Closure principle is correct; since Closure follows from Processing, this same evidence seems to support Processing over the hypothesis considered above.
If correct, this means that syntactically the unit of perception is the phrase, while semantically the unit of perception is the sentence. This question deserves empirical investigation. It is interesting to consider which among the seven principles adduced above are universal and which are particular to English. I would conjecture that Processing and all those principles that follow from it deductively are universal: that it is common to all perceptual routines for parsing surface structures that phrases are formed, closed, and pushed into a processing unit, and that the semantic manipulations occur in that unit. Likewise, the condition on memory limitations stated in Two Sentences is probably universal. Thus, except for Top-Down, which is an assumption for which it would be difficult to accrue evidence, it seems at first glance that all the principles above are universal. On the other hand, it would be most productive to look at a language like Japanese, which is 'backwards' from English in its order of constituents in the base, to determine which sentences are perceptually complex and why. Notice, again, that none of the principles predicts that left branching structures per se, as are found, e.g., in Japanese relative clauses, are difficult to parse. Center embeddings are difficult to process, as noted by everyone who has worked on the problem, and the principles above explain why. Notice that the explanation of the complexity of surface structures offered above does not refer to the transformational 'distance' of a surface structure from its deep structure. That is, these principles refer only to the tree pattern and not to how closely this resembles the tree pattern that represents the semantic relations in deep structure. In this way, these principles go against some of the earliest work on sentence recognition, which sought to explain the complexity of a surface form in terms of its transformational history and which has also influenced later writings. For example, Foss and Cairns (1970) write: 'It seems reasonable to assume that the more the surface structure of a sentence distorts the grammatical relations in the structures underlying it, the more complex is the comprehension process and, hence, the more STS (short term storage) required to complete the task of understanding' (p. 541). If the principles above are correct, then the distance between deep and surface structure bears no relation to sentence perception. It may bear on sentence comprehension and the nature of the computing process in the PU, but this is a different matter. So far I have said nothing concerning the internal structure of the PU, other than that its data input file consists of various tree chunks, plus an indication of how the pieces fit together. One model of the computations of the PU is that the surface tree is therein re-formed, and transformations are applied 'backwards' to reconstruct a deep structure, which is then mapped into the meaning (under the identity mapping for a generative semanticist). I would conjecture that the sentence units of surface structure are reconstructed, being the basic units of comprehension.
That is, in, say, ‘John was believed by Bill to have been seen by Susan’, part of the meaning is that John was believed and that John was seen.
5. Transformations and perceptual routines
Having a model of the perceptual mechanism, it is now possible to discuss how various transformations and classes of transformations arrange the form of surface structure with respect to optimal perception. One traditional explanation for the existence of transformations in natural language (Chomsky, 1965, chapter 1) is that they serve to arrange perceptually complex deep structures into perceptually simple surface structures. Given the definition of perceptual simplicity adduced above, it will now be possible to examine this claim in detail. Research in transformational grammar has resulted in the accumulation of an inventory of transformations. The transformations which have been discovered, however, seem to fall into two distinct classes, the cyclic versus the last-cyclic transformations, with respect to a number of properties (Kimball, 1972c). That this should be the case is in no way predicted by the general theory of transformations. The properties of transformations in these two classes are listed below.
    | Cyclic | Last-cyclic
(1) | Preserve form of input structure | Derange input structure
(2) | Make no essential use of variables | May make essential use of variables
(3) | May have lexical exceptions | No lexical exceptions
(4) | Several may apply within one S | Only one global transformation per S
(5) | Seem not to introduce structural ambiguities | May introduce structural ambiguities
(6) | Apply working upwards in tree | Apply only on top S
The last-cyclic transformations themselves can be divided into two groups, according to whether a transformation is global, moving constituents over a variable, or local. For example, a global transformation would be Wh-Fronting, which moves question words to the front of sentences from arbitrarily far to the right. Thus, (41a) becomes (41b) by this transformation.
(41) a. He told you to ask Jill to go to the store to find wh-book?
b. Wh-book he told you to ask Jill to go to the store to find?
c. What book did he tell you to ask Jill to go to the store to find?
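To illustrate what "moving constituents over a variable" amounts to, here is a hedged sketch of Wh-Fronting on a token list. Marking the wh-phrase as a single token with a 'wh-' prefix, as in (41a), is a simplifying assumption, and `wh_front` is a hypothetical name. The point is that the rule finds its target however far to the right it lies; a local rule such as Subject-Verb Inversion, which in (41c) also involves do-support, is not modelled.

```python
def wh_front(tokens):
    """Move the first 'wh-' token to the front, however far to the right it is."""
    for i, tok in enumerate(tokens):
        if tok.startswith("wh-"):
            # The material skipped over (tokens[:i]) is the unbounded variable.
            return [tok] + tokens[:i] + tokens[i + 1:]
    return list(tokens)  # no wh-word: the rule does not apply


a = "He told you to ask Jill to go to the store to find wh-book".split()
print(" ".join(wh_front(a)))
```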
(41b) becomes (41c) by Subject-Verb Inversion, which is an example of a local last-cyclic transformation. As it turns out, the local transformations are essentially ordered with respect to some global transformations. Perhaps the most interesting of the properties differentiating cyclic from last-cyclic transformations is the first: cyclic transformations preserve the form of the input structures, while last-cyclic transformations tend to distort structure. For example, Passive operates on a structure of the form NP V NP and produces an output structure of essentially the same form. Dative maps the structure V NP NP into one of the same form. An operation like Equi NP Deletion deletes an NP in an embedded sentence, but the form of the sentence on whose cycle it operates is unchanged. Likewise, Subject Raising operating on a sentence embedded in subject position of a matrix sentence results in extraposing the VP of the embedded sentence to the end of the VP of the matrix, as shown in (42a,b).
(42) a. [S [NP [S [NP she] [VP [V be] [ADJ pretty]]]] [VP [V seem]]]
b. [S [NP she] [VP [V seems] [VP to [V be] [ADJ pretty]]]]
(The 'to' is the remains of a for-to complementizer.) Notice, however, that the form of the top S remains unchanged. Thus, the input structures to cyclic transformations are preserved across the operation of these transformations. On the other hand, global as well as local last-cyclic transformations result in the production of structures unlike the input structures, structures which are also quite unlike those produced by the base rules of the grammar. For example, Wh-Fronting produces a structure like that shown in (43), leaving a hole where the wh-NP was extracted.
(43) [ wh-NP [S NP VP ] ]
Extraposition moves sentences to the right, producing a structure like (44b) from (44a).
(44) a. That 2+2=4 amused Susan.
b. It amused Susan that 2+2=4.
It is interesting now to consider the effect of the transformations of these various classes on the tree with respect to the principles of surface structure parsing. The cyclic transformations generally leave behind a tree with the same or less complexity than the input tree. E.g., the perceptual complexity of a passive sentence is not discernibly different from that of the active. (It is true that passive sentences tend to be remembered as actives over long-term recall, but this can be taken to reflect the fact that these sorts of memory processes are based on semantics, rather than any greater difficulty in parsing a passive than an active.) The output tree of Subject Raising in subject position, (42b), is perceptually simpler than the input, (42a). On the other hand, the effect of the operation of last-cyclic transformations requires some scrutiny. The local last-cyclic transformations have little effect. The global last-cyclic operations are best considered by dividing them into two classes, those that move constituents to the right and those that move constituents to the left, which we may label right global last-cyclic (RGLC) and left global last-cyclic (LGLC) for convenience. Some RGLCs are listed below:
(45) a. Extraposition from NP: the boy who was tall left → the boy left who was tall
b. Extraposition of PP: a review of this book will be coming out → a review will be coming out of this book
c. Extraposition: (44a) → (44b)
d. Right Dislocation: Joe gave the book that was about ducks to Susan → Joe gave it to Susan, the book that was about ducks
e. Heavy NP Shift (discussed in Section 4): he asked the girl with the bright blouse to leave → he asked to leave the girl with the bright blouse
All of these transformations hang constituents out on right branches. They may all simplify the sentence in terms of principles like Right Association and Closure, for constituents internal to a sentence are made 'lighter', in general permitting closure to occur earlier. The relation of Heavy NP Shift to Right Association was discussed earlier. Note, however, that blind application of these rules does not in every case lead to a perceptually simpler sentence, as (46a,b) show.
(46) a. He told the girl with the blonde eyelashes to go to the bank to ask the clerk to remove $100 from their account.
b. He told to go to the bank to ask the clerk to remove $100 from their account the girl with the blonde eyelashes.
Finally, notice that all RGLC transformations except Heavy NP Shift leave behind some place marker in the tree. That is, although some constituent is moved, a pronoun or some lexical material remains to mark the place of the removal, and this is generally not true of LGLCs. In terms of Processing, this does not add to the complexity: by the time the extraposed material is reached and parsed, the original material will most likely be in the PU, and a pointer can be attached to the extraposed constituent assigning it to that place in the PU.
Let us consider next some operations that move constituents to the left.
(47) a. Topicalization (moves an NP to the front of the main S): Joe told Martha to ask Susan to test the bagel for Will → the bagel Joe told Martha to ask Susan to test for Will
b. Wh-Fronting (discussed above)
c. Relative clause formation: Joe spanked the child Bill had seen Betty kiss wh-child → Joe spanked the child which Bill had seen Betty kiss
d. Left Dislocation (like Topicalization, except that it leaves a pronoun): Joe gave the book to Sally → the book, Joe gave it to Sally
Notice now that all these operations except (47d) leave no place marker to indicate the spot of removal. This may be an accident; however, from the point of view of the principles of perception none is required. How the moved material is assigned the correct place for semantic analysis is a problem solved in the PU. One could imagine the moved constituents as being placed in a special category in the PU, awaiting the first possible pointer assignment to a spot in the tree as it is parsed. But the difficulty of discovering the surface constituents of the tree is not increased by these operations that move elements leftwards. It could be conjectured that the RGLC rules leave a marker to indicate a place in the tree so that the extraposed material need not be assigned as a new constituent under some node already in the PU. That is, with Extraposition, for example, a place is opened and held for the extraposed sentence by the 'it'. Without some marker, to assign the extraposed sentence correctly to its place when it is encountered, a new NP would have to be entered under the S and a pointer assigned to that place. This would violate the principle of Fixed Structure, for some structure that had been placed in the PU would have to be altered.
Thus this property of RGLC rules can be explained with respect to the operation of the rules of perception. In summary, the cyclic transformations either effect no major change in structure from the point of view of perceptual complexity, or, in the case of Subject Raising, may operate to hang material on right branches. A right branching tree is not difficult to parse, as predicted by Right Association and Closure. As pointed out to me by DeRemer, a right associative structure is easier for a top-down mechanism, which is predictive; a left associative structure, on the other hand, is much easier for a bottom-up parser. One may conjecture that languages such as Japanese, with characteristic left branching, employ a mixed strategy of bottom-up and top-down parsing. RGLC transformations hang material on right branches, simplifying the tree. The fact that these transformations leave markers behind in the tree to indicate the spot of removal is significant; an empty place remains in the tree for the extraposed material to be reassigned in the PU. Finally, LGLC transformations do not produce a tree that is perceptually more complex than the input structure. Their operation does require that the moved material be relocated to its original place in the tree, but this is evidently performed in the PU, and so does not add to the perceptual complexity of the surface structure.
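The suggestion that a leftward-moved constituent waits in the PU for "the first possible pointer assignment" can be sketched as a first-resort filler strategy. In the hedged toy below, the lexicon sets TRANSITIVE and NP_START and the function `restore_topic` are hypothetical; the fronted NP of (47a) is held aside and attached after the first verb whose object position turns out to be empty. This is one possible reading, not a mechanism Kimball specifies.

```python
# Hypothetical mini-lexicon for this example only.
TRANSITIVE = {"told", "ask", "test"}
NP_START = {"the", "a", "Joe", "Martha", "Susan", "Will"}


def restore_topic(tokens, filler_len):
    """Hold a fronted NP aside and attach it at the first empty object slot.

    Returns the tokens in their pre-Topicalization order, or None if no
    empty object position is found.
    """
    filler, rest = tokens[:filler_len], tokens[filler_len:]
    for i, word in enumerate(rest):
        if word in TRANSITIVE:
            nxt = rest[i + 1] if i + 1 < len(rest) else None
            if nxt not in NP_START:
                # Object position after this verb is empty: point the filler here.
                return rest[: i + 1] + filler + rest[i + 1:]
    return None


topicalized = "the bagel Joe told Martha to ask Susan to test for Will".split()
print(" ".join(restore_topic(topicalized, 2)))
```

The filler passes over 'told Martha' and 'ask Susan', whose object slots are filled, and is assigned to the gap after 'test'.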
REFERENCES
Bever, T. G. (1970a) The influence of speech performance on linguistic structure. In G. B. F. d'Arcais and W. J. M. Levelt (Eds.) Advances in psycholinguistics. New York, American Elsevier.
Bever, T. G. (1970b) The cognitive basis for linguistic structures. In J. R. Hayes (Ed.) Cognition and the development of language. New York, John Wiley.
Brown, R. and Bellugi, U. (1964) Three processes in the child's acquisition of syntax. Harv. educ. Rev., 34 (2), 133-151.
Chapin, P. G., Smith, T. S., and Abrahamson, A. A. (1972) Two factors in perceptual segmentation of speech. J. verb. Learn. verb. Beh., 11, 164-173.
Chomsky, N. A. (1957) Syntactic structures. The Hague, Mouton.
Chomsky, N. A. (1965) Aspects of the theory of syntax. Cambridge, M.I.T. Press.
Chomsky, N. A. and Miller, G. (1963) Introduction to the formal analysis of natural languages. In R. D. Luce, R. R. Bush and E. Galanter (Eds.) Handbook of mathematical psychology, vol. 2. New York, Wiley.
De Remer, F. (1971) Simple LR(k) grammars. Communications of the ACM, 14, 453-460.
Fodor, J. A. and Garrett, M. (1967) Some syntactic determinants of sentential complexity. Percept. Psychophys.
Foss, D. J. and Cairns, H. S. (1970) Some effects of memory limitation upon sentence comprehension and recall. J. verb. Learn. verb. Beh., 9, 541-547.
Hakes, D. T. (1972) Effects of reducing complement constructions on sentence comprehension. J. verb. Learn. verb. Beh., 11, 278-286.
Hankamer, J. (1973) Unacceptable ambiguity. Ling. Inq., 4 (1).
Kimball, J. (1972a) The formal theory of grammar. New Jersey, Prentice-Hall.
Kimball, J. (1972b) The modality of conditions. In J. Kimball (Ed.) Syntax and semantics, vol. 1. New York, Seminar Press.
Kimball, J. (1972c) Cyclic and linear grammars. In J. Kimball (Ed.) Syntax and semantics, vol. 1. New York, Seminar Press.
Knuth, D. E. (1965) On the translation of languages from left to right. Information and Control, 8.
McKeeman, W. M., Horning, J. J., and Wortman, D. B. (1970) A compiler generator. New Jersey, Prentice-Hall.
Ross, J. R. (1967) Constraints on variables in syntax. M.I.T. Dissertation.
Ross, J. R. (1967b) Auxiliaries as main verbs. Mimeo.
In generative grammar there is a traditional distinction between the acceptability of a sentence, which belongs to the domain of performance, and the grammaticality of a sentence, which belongs to the domain of competence. The aim of this article is to provide a characterization of the notion of 'acceptable sentence' in English and to suggest how this characterization might have universal scope. The approach consists in giving a series of procedures thought to be operative in assigning the surface structure tree to an input sentence. These parsing principles are partly inspired by the formulas used in computer languages. These principles, which explain the high acceptability of right-branching structures, bring out the role of grammatical/function words in sentence perception, describe what appears to be a fixed limit of short-term memory in linguistic processing, and allow hypotheses to be made about the structure of the internal mechanisms of syntactic processing. Finally, the different classes of transformations that can be used to prepare deep structures as input to parsing procedures are discussed.
2
The use of concrete and abstract concepts by children and adults*
R. MILLER University of Witwatersrand
Abstract
The general aim of the present study was to test the hypothesis that the younger the child, the more perceptual/concrete are the concepts used. Two questions were posed. Firstly, is there a difference between children and adults in using both concrete and abstract concepts as opposed to only one kind of concept? Secondly, is there a difference between children and adults in using either concrete or abstract concepts for the first of two different kinds (concrete or abstract) of concepts used? Equivalence tasks of a forced-choice type were employed to test the use of concrete and abstract concepts. Only in a minority of cases were significant differences obtained between children and adults regarding (a) the use of both concrete and abstract concepts and (b) the first of two different kinds of concepts used.
* This paper is based on part of a thesis submitted for the degree of M.A. to the University of Witwatersrand. The work was carried out under the supervision of Professor J. W. Mann, to whom the writer expresses his gratitude.
Cognition 2(1), pp. 49-58
Introduction
The work of Bolles (1937), Goldstein and Scheerer (1941), Welch (1940), and Werner (1948) led Sigel (1953) to derive the hypothesis that the younger the child, the more perceptual are his organizations. This hypothesis has more recently been incorporated into Bruner's theory of cognitive development (1966). In the typical experiments conducted by Bruner and his co-workers, subjects at various ages have been required to group or classify different objects on the basis of some similar attribute or property common to all the objects. Such a task is referred to as an equivalence task, and the concept used, as an equivalence concept. The core findings in Bruner's experiments are those reported by Olver and Hornsby on the development of equivalence concepts. They found that six-year-old children group objects according to perceptible properties to a greater extent than older children, and that with increasing age there is a steady increase in functionally based equivalence. Bruner's central thesis is that the child uses different modes of representing the world at various stages of development. As a consequence of ikonic representation, the young child groups objects according to perceptual attributes, whereas the older child, using symbolic representation, uses more abstract attributes. It was with a view to testing some of Bruner's assertions that the present study was designed. Although it is by no means a replication of Bruner's work, either in breadth or in depth, similar but not identical equivalence tasks are employed, and performance on these tasks is assessed according to similar criteria. In this connection, the following definitions are employed. A concrete concept is defined as a classification based on an observable attribute common to a group of objects, e.g. colour, shape, etc. An abstract concept is defined as a classification based on a feature common to a group of objects but requiring an inference of some kind, e.g. function. These definitions are based on Bruner's notion of 'going beyond the information given' (1957) and are in accordance with the criteria used in his experiments. It is necessary to clarify one further term. The present experiment is concerned with the use of concepts by children and adults. This should not be confused with the formation or attainment of concepts. When discussing the use of concrete and abstract concepts, it is necessary to distinguish between two variables. Garner (1966) has expressed this as the difference between what a person 'can do' and 'does'.
More specifically, Price-Williams (1962) has pointed out that when prodded, children use different kinds of concepts from those they use when left to their own devices. It would appear, then, that three steps are necessary in investigating the use of concrete and abstract concepts. Firstly, to establish whether subjects of different ages do use both concrete and abstract concepts when given the opportunity to do so. This satisfies the 'can do' variable. Secondly, to establish whether subjects of different ages, using both kinds of concepts, differ with regard to the first of two different kinds of concepts used. This satisfies the 'does' variable. The results of both steps could lead to the assertion that children are more perceptual than adults, but the theoretical implications in each case would be different. The former situation could be regarded as a strong form of the assertion and the latter as a weak form. The third step involves an investigation of the nature of any differences which may be found in steps one and two, in terms of whether these differences are due to a greater or lesser use of concrete or abstract concepts by children and adults. In the present experiment the equivalence tasks were ideally of a forced-choice type, with the materials being selected to facilitate two alternative equivalence classifications and the subjects required to classify each set of objects twice. The experiment was designed to answer the following two questions, which represent the first two steps mentioned above.
1. In a task in which a set of objects may be formed into sub-sets using either concrete or abstract concepts, is there a difference between children and adults in using both kinds of concepts to form these sub-sets?
2. Do children and adults, using both concrete and abstract concepts to form two sub-sets, differ with respect to the kind of concept used to form the first sub-set?
Method
Design
Question 1: The variables relate, on the one hand, to children and adults and, on the other, to the use of two different kinds of concepts (concrete AND abstract) and the use of only one kind of concept (concrete OR abstract). A 2 x 2 design was utilized. The two column categories of Table 2 refer respectively to the use by a subject of two different kinds of concepts and of one kind of concept. The two row categories relate to children and adults.
Question 2: The variables relate to age and the kind of concept first used by those subjects who used both concrete and abstract concepts in forming two equivalence classifications. A 2 x 2 design was employed. The two column categories of Table 3 refer respectively to concrete concepts that were used first and abstract concepts that were used first. The two row categories refer to children and adults.
Subjects
Forty-five children were randomly selected from the grade 1 pupils at two Johannesburg primary schools, the mean age for the sample being 6 years 6 months. Forty-five students were randomly selected from the first-year psychology students at the University of the Witwatersrand, the mean age for the sample being 19 years 2 months.
Materials
Eight sets of four objects each comprised the test materials. In most cases real objects were used, but where this proved impractical miniature toy models were used (see Table 1). These sets of objects were so constructed that, by removing one of the four objects in each set, the remaining three would constitute a sub-set in terms of either a concrete or an abstract equivalence concept. The eight sets of objects and the concrete and abstract sub-sets made possible by the removal of one of the four objects are given in Table 1. In addition, the appropriate equivalence concept is indicated in brackets.
Table 1. Materials comprising the sets and sub-sets

Set | Complete set | Concrete sub-set (concept) | Abstract sub-set (concept)
1 | Banana, Orange, Ball, Plum | Orange, Ball, Plum (Circular shape) | Banana, Orange, Plum (Fruit, edible, etc.)
2 | Saw, Peg, Pliers, Hammer | Saw, Peg, Hammer (Made partly of wood) | Saw, Pliers, Hammer (Tools, work, etc.)
3 | Stool (toy), Cupboard (toy), Dresser (toy), Camel (toy) | Stool, Dresser, Camel (Four legs) | Stool, Cupboard, Dresser (Furniture, etc.)
4 | Spoon, Ball point pen, Foot-long pencil, Mapping pen | Spoon, Ball point pen, Mapping pen (Size: all exactly the same length) | Ball point pen, Foot-long pencil, Mapping pen (Writing, etc.)
5 | Blue aeroplane (toy), Red ship (toy), Red pen, Red car (toy) | Red ship, Red pen, Red car (Red colour) | Blue aeroplane, Red ship, Red car (Vehicles, transport, etc.)
6 | Colander, Tennis racquet (toy), Ball, Round wooden bat | Colander, Tennis racquet, Round wooden bat (Shape: handle with a round shape) | Tennis racquet, Ball, Round wooden bat (Sport, play, etc.)
7 | Suitcase, Two books, Two coins, Two pens | Two books, Two coins, Two pens (Number: duality) | Suitcase, Two books, Two pens (School, study, etc.)
8 | Small round plate, Small square plate, Record, Large round plate | Small round plate, Record, Large round plate (Circular shape) | Small round plate, Small square plate, Large round plate (Eating, etc.)
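The construction of the materials can be sketched as an odd-one-out check: in each set, one removal leaves a concrete sub-set and another leaves an abstract sub-set. The attribute coding below for Sets 1 and 8 is a hypothetical encoding of the sub-sets listed in Table 1, not data from the study, and `odd_one_out` is an illustrative helper.

```python
# Hypothetical attribute coding of two sets from Table 1.
SET_1 = {
    "banana": {"circular shape": False, "fruit": True},
    "orange": {"circular shape": True, "fruit": True},
    "ball": {"circular shape": True, "fruit": False},
    "plum": {"circular shape": True, "fruit": True},
}
SET_8 = {
    "small round plate": {"circular shape": True, "eating": True},
    "small square plate": {"circular shape": False, "eating": True},
    "record": {"circular shape": True, "eating": False},
    "large round plate": {"circular shape": True, "eating": True},
}


def odd_one_out(objects, concept):
    """Return the one object lacking `concept` when the other three share it, else None."""
    lacking = [name for name, attrs in objects.items() if not attrs[concept]]
    return lacking[0] if len(lacking) == 1 else None


print(odd_one_out(SET_1, "circular shape"), odd_one_out(SET_1, "fruit"))
print(odd_one_out(SET_8, "circular shape"), odd_one_out(SET_8, "eating"))
```

Each set thus supports exactly two different removals, one per kind of concept, which is what the forced-choice procedure relies on.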
Although the sets were constructed to yield only two possible sub-sets, it was decided prior to the experiment that, in the event of sub-sets being formed which were not anticipated by the experimenter, they would be judged on their merits. All subjects, irrespective of the nature of the sub-sets formed, were asked to furnish reasons for excluding a particular object. The four objects in each set were randomly ordered. The administration of each set with the given materials constituted a test.
Procedure
The subjects were tested individually and the materials were presented in the same order for each subject. All the subjects received the same instructions, which were as follows: 'I am going to show you four things. Three of these things are the same and one isn't. I want you to take away the thing that you think doesn't belong. This sounds very easy but it isn't really and I'll show you why. Let's look at these four things. (Blue triangle, blue triangle, green triangle, blue circle - all made of cardboard). These (pointing at the triangles) are all the same shape so we can take away the circle because it doesn't belong. But these three (pointing at the two blue triangles and blue circle) are also the same because they have the same colour, so we can take away this one because it doesn't belong. Do you understand? Now each time I show you four things I am going to let you have two turns. First, you must take away the thing you think doesn't belong and then we'll put it back and you must try and think of something else that doesn't belong. Let's do one more together before we start. Look at these four things (cup, mug, biscuit cutter, glass). These three (pointing at the cup, mug and glass) are the same because we can drink from them, so we take away this one (biscuit cutter). But these three (pointing at the cup, mug and biscuit cutter) are also the same because they all have handles, so we can take away this one (glass). Do you understand what to do?'
Results
Question 1
The observed frequencies of children and adults using two different kinds of concepts and one kind of concept are provided in Table 2. The results for each of the eight sets of materials were separately analyzed, using a chi-square test corrected for continuity. It was decided to reject the null hypothesis at, or beyond, the 0.05 level of significance, and in all cases two-tailed tests were used. Significant differences (p < .05) between children and adults were obtained only on Tests 1, 5, and 6.
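The Question 1 analysis can be recomputed from the counts in Table 2. The sketch below applies the textbook Yates-corrected chi-square to each 2 x 2 table (children/adults by two kinds/one kind), with one degree of freedom, and obtains the p-value from the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)). It is a re-analysis under those assumptions, not the author's own computation; clamping the corrected difference at zero is a common convention, and the helper names are illustrative.

```python
import math

# Table 2: set -> ((children: two kinds, one kind), (adults: two kinds, one kind))
TABLE_2 = {
    1: ((30, 15), (45, 0)),
    2: ((31, 14), (28, 17)),
    3: ((29, 16), (37, 8)),
    4: ((38, 7), (41, 4)),
    5: ((26, 19), (45, 0)),
    6: ((33, 12), (45, 0)),
    7: ((30, 15), (37, 8)),
    8: ((43, 2), (42, 3)),
}


def yates_chi_square(a, b, c, d):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]] with Yates' continuity correction."""
    n = a + b + c + d
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


significant = []
for test, ((a, b), (c, d)) in TABLE_2.items():
    chi2 = yates_chi_square(a, b, c, d)
    p = p_value_df1(chi2)
    print(f"Test {test}: chi2 = {chi2:.2f}, p = {p:.4f}")
    if p < 0.05:
        significant.append(test)

print("Significant at .05:", significant)
```

Run as written, it flags Tests 1, 5 and 6, matching the text; for Test 1, for instance, the corrected chi-square is 15.68.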
Table 2. Number and kinds of concepts used by children and adults for each set of materials

Set of materials | Subjects | Concepts used: two different kinds | Concepts used: one kind | Total
1 | Children | 30 | 15 | 45
1 | Adults | 45 | 0 | 45
1 | Total | 75 | 15 | 90
2 | Children | 31 | 14 | 45
2 | Adults | 28 | 17 | 45
2 | Total | 59 | 31 | 90
3 | Children | 29 | 16 | 45
3 | Adults | 37 | 8 | 45
3 | Total | 66 | 24 | 90
4 | Children | 38 | 7 | 45
4 | Adults | 41 | 4 | 45
4 | Total | 79 | 11 | 90
5 | Children | 26 | 19 | 45
5 | Adults | 45 | 0 | 45
5 | Total | 71 | 19 | 90
6 | Children | 33 | 12 | 45
6 | Adults | 45 | 0 | 45
6 | Total | 78 | 12 | 90
7 | Children | 30 | 15 | 45
7 | Adults | 37 | 8 | 45
7 | Total | 67 | 23 | 90
8 | Children | 43 | 2 | 45
8 | Adults | 42 | 3 | 45
8 | Total | 85 | 5 | 90
All sets | Children | 260 | 100 | 360
All sets | Adults | 320 | 40 | 360
All sets | Total | 580 | 140 | 720
With regard to the category ‘one kind of concept’, a distinction can be made between subjects who used the same kind of concept twice (by using a legitimate equivalence concept to form a sub-set not anticipated by the experimenter) and subjects who used only one concept, being unable to form a second sub-set at all. The results for Tests 1, 5, and 6 are shown in Figure 1. Inspection of this figure suggests clear differences between the children and the adults on the three tests in question.
The use of concrete and abstract concepts by children and adults
Figure 1. Graphic representation of the number of children and adults using two different kinds of concepts, two of the same kind of concept, and only one concept, for the sets of materials yielding significant differences. [Plot: number of children and adults (vertical axis, 0 to 50) against concepts used (two different kinds, two of the same kind, only one), plotted separately for children and adults on Tests 1, 5, and 6.]
For Test 1, the overall difference between the children and adults appears to be due mainly to the children’s greater use of the same kind of concept twice (see Fig. 1). For Tests 5 and 6, the differences appear to be due mainly to the children’s failure to use more than one concept (see Fig. 1). Although the numbers involved are small and no statistical technique is available to determine precisely the nature of the differences between the children and adults, ignoring this information would obscure some important psychological issues, which will be discussed shortly.

Question 2
The number of children and adults, for each of the eight sets of objects, using concrete or abstract concepts in forming the first of two sub-sets is given in Table 3. For each of the eight tests, the data were analyzed by means of chi-square, corrected for continuity. A significant difference between the children and adults was obtained only for Test 7 (p < .05).

Table 3.
The kinds of concepts used by children and adults to form the first of two sub-sets for each set of materials

Set of materials | Subjects | Concrete | Abstract | Total
1 | Children | 15 | 15 | 30
1 | Adults | 22 | 23 | 45
1 | Total | 37 | 38 | 75
2 | Children | 4 | 27 | 31
2 | Adults | 2 | 26 | 28
2 | Total | 6 | 53 | 59
3 | Children | 0 | 29 | 29
3 | Adults | 2 | 35 | 37
3 | Total | 2 | 64 | 66
4 | Children | 6 | 32 | 38
4 | Adults | 2 | 39 | 41
4 | Total | 8 | 71 | 79
5 | Children | 8 | 18 | 26
5 | Adults | 5 | 40 | 45
5 | Total | 13 | 58 | 71
6 | Children | 10 | 23 | 33
6 | Adults | 15 | 30 | 45
6 | Total | 25 | 53 | 78
7 | Children | 14 | 16 | 30
7 | Adults | 5 | 31 | 36
7 | Total | 19 | 47 | 66
8 | Children | 3 | 40 | 43
8 | Adults | 6 | 36 | 42
8 | Total | 9 | 76 | 85
All sets | Children | 60 | 200 | 260
All sets | Adults | 59 | 260 | 319
All sets | Total | 119 | 460 | 579
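As a check on the result reported for Question 2, the continuity-corrected test can be applied to every set in Table 3. This sketch uses the standard shortcut formula for a 2 x 2 table, N(|ad - bc| - N/2)^2 / ((a+b)(c+d)(a+c)(b+d)), and clips the corrected difference at zero when |ad - bc| is smaller than N/2. That clipping is a common convention assumed here, not something the paper specifies. The counts are those printed in Table 3, and the function and variable names are illustrative.

```python
# Counts from Table 3 (subjects who used two different kinds of concepts):
# test -> ((children concrete, children abstract), (adults concrete, adults abstract))
TABLE_3 = {
    1: ((15, 15), (22, 23)),
    2: ((4, 27), (2, 26)),
    3: ((0, 29), (2, 35)),
    4: ((6, 32), (2, 39)),
    5: ((8, 18), (5, 40)),
    6: ((10, 23), (15, 30)),
    7: ((14, 16), (5, 31)),
    8: ((3, 40), (6, 36)),
}

CRITICAL_05 = 3.841  # chi-square critical value for 1 df at the .05 level


def yates_chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Continuity-corrected chi-square for the 2 x 2 table [[a, b], [c, d]].

    Shortcut form: N * (|ad - bc| - N/2)^2 / ((a+b)(c+d)(a+c)(b+d)),
    with the corrected difference clipped at zero.
    """
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


significant = []
for test, ((a, b), (c, d)) in TABLE_3.items():
    chi2 = yates_chi_square_2x2(a, b, c, d)
    is_sig = chi2 > CRITICAL_05
    print(f"Test {test}: chi-square = {chi2:5.2f} {'(p < .05)' if is_sig else '(n.s.)'}")
    if is_sig:
        significant.append(test)

print("significant tests:", significant)  # [7]
```

Only Test 7 exceeds the critical value (about 7.05). Test 5 comes closest among the others (about 3.04), which agrees with the single significant result reported for Question 2.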
Discussion
The results for Question 1 indicate that children fail to use both concrete and abstract concepts in only some cases. This finding seems to negate a theory based on developmental modes of representation. The fact that significant differences were obtained for only three of the eight sets of materials seems to imply that a particular mode of representation is not responsible, since such a mode should produce more widespread differences. Furthermore, the differences obtained for two of the three significant results (Tests 5 and 6) seemed to be due to a failure to form a second sub-set. This, in turn, could be related to the intrinsic nature of the test materials, some of which may have been more difficult to form into sub-sets than others. For example, the concrete attribute of shape in Test 6 (handle plus round shape) may be more difficult than that in Test 1 (round shape). Similarly, the abstract attribute of transport in Test 5 may be more difficult than that of edibility in Test 1. But the difficulty in forming a second sub-set was not necessarily related to an inability to use an abstract concept. Of the children failing to form a second sub-set, 13 of 14 on Test 5, and 7 of 9 on Test 6, used an abstract concept to form the first sub-set. This indicates that the differences obtained with these two sets of objects were largely a result of failing to use a concrete concept in forming a second sub-set. Only on Test 1 did the difference appear to be due to children using the same kind of concept twice to a greater extent than adults. This particular result bears on the notion of rigidity, largely attributable to Goldstein and Scheerer (1941), according to which children have difficulty in switching from one kind of concept to another. Apart from Test 1, however, the other two significant results suggest that rigidity is not a major factor in the context of this experiment, because the children did not find it difficult to switch from one kind of concept to another. The results of Question 1 seem to negate the strong form of the hypothesis regarding developmental modes of representation.
But it is possible that a particular mode of representation may operate as a weak determiner of the kind of concept used, influencing only the immediate or spontaneous use of a concept but not necessarily limiting the kinds of concepts that may be used. However, the fact that children and adults using both kinds of concepts did not differ significantly in their first concepts on seven of the eight tests is not favourable even to the weak form of the hypothesis. Although the two questions posed were concerned not directly with the nature of the concepts used by children and adults but rather with the difference between them, the results for both questions are interesting in the light of the hypothesis that ‘the younger the child the more perceptual are his organizations’. Of the 360 concepts used by all the children over the eight sets of materials on the first correct trial, 267 (74%) were abstract; of the 360 concepts used by all the adults, 300 (83%) were abstract. Only with Test 1 did the children use more concrete than abstract concepts on the first trial, whereas the adults used 22 concrete and 23 abstract concepts on this trial. Similarly, of the 260 concepts used on the first trial by children who used two different kinds of concepts, 200 (77%) were abstract; of the 319 concepts used likewise by the adults, 260 (82%) were abstract. The results, in general, suggest that children are not necessarily more perceptual than adults, and that even when differences between them do occur, they are not related to a greater use of abstract concepts by adults. In conclusion, the limited differences between children and adults and the predominance of abstract concepts used by all subjects are inconsistent with Bruner’s theory of cognitive growth. Given the obvious limitations of the study, it would appear that an unqualified acceptance of the view that children are more perceptual (or concrete) than adults is unwarranted.
REFERENCES
Bolles, M. M. (1937) The basis of pertinence: A study of the test performance of aments, dements and normal children of the same mental age. Archives of Psychology, No. 212, New York.
Bruner, J. S. (1957) Going beyond the information given. In The Colorado symposium: Contemporary approaches to cognition. Cambridge, Harvard University Press.
Bruner, J. S., Olver, R. R., and Greenfield, P. M., et al. (1966) Studies in cognitive growth. New York, Wiley.
Garner, W. R. (1966) To perceive is to know. Amer. Psychol., 21, 11-19.
Goldstein, K. and Scheerer, M. (1941) Abstract and concrete behavior: An experimental study with special tests. Psychol. Monogr., 53, No. 2 (whole No. 239).
Price-Williams, D. R. (1962) Abstract and concrete modes of classification in primitive society. Brit. J. educ. Psychol., 32, 50-61.
Sigel, I. E. (1953) Developmental trends in the abstraction ability of children. Child Devel., 24, 131-144.
Welch, L. (1940) The genetic development of the associational structures of abstract thinking. J. genet. Psychol., 56, 172-206.
Werner, H. (1948) Comparative psychology of mental development (Rev. Ed.). Chicago, Follett.
Résumé
Two questions are posed in this article. First, is there a difference between the way in which children and adults use both concrete and abstract concepts, as opposed to using only one type of concept? Second, when concrete and abstract concepts are used within the same sentence, is the order in which these concepts are presented a function of the age of the subjects (child or adult)? To test the use of concrete and abstract concepts, a forced-choice equivalence method was employed. With regard both to (a) the use of both concrete and abstract concepts and (b) the first of the two different types of concepts used, notable differences between adults and children were found in only a minority of cases.
On the evolution of language: A unified view*
Philip Lieberman
University of Connecticut, Storrs
Abstract
Language can be operationally defined as a communications system that permits the exchange of new, unanticipated information. Different forms of language appear to have been present in earlier stages of hominid evolution. Human language is unique, at the present time, since it makes use of ‘encoded’ speech to achieve a rapid transfer of information. The supralaryngeal vocal tract of modern Homo sapiens is a useful factor in this encoding process, which also involves special neural mechanisms. Other factors, like cognitive ability and ‘automatization’, are also necessary for language. These factors are, however, important for many aspects of human and non-human behavior besides language. The evolution of language appears to have been a gradual process that first led to systems that relied on mixed gestural and vocal communication. Some hominids appear to have retained this system until comparatively recent times. Other hominids appear to have placed a greater reliance on vocal communication. Reconstructions of fossil supralaryngeal vocal tracts show that some forms, Australopithecines and ‘classic’ Neanderthal, lacked the supralaryngeal vocal tract that is necessary for the production of fully encoded human speech. Other fossil forms, Steinheim and Es-Skhūl V, had functionally modern vocal tracts. Others, like Broken Hill, represent intermediate forms. The evolution of human language can be viewed as a three-stage process that involved (a) increased reliance on vocal communication in activities like hunting, (b) the enhancement of the vocal repertoire with the evolution of the human supralaryngeal vocal tract, which produces acoustic signals that are both more distinct and more resistant to articulatory errors, and (c) the evolution of neural mechanisms that made use of the preadapted properties of the supralaryngeal vocal tract for rapid encoded speech communication.

* Paper prepared for the Ninth International Congress of Anthropological and Ethnological Sciences, Chicago, U.S.A., September 1973.

Cognition 2(1), pp. 59-94
I shall attempt to develop a unified theory for the evolution of human language in this paper. Though this theory centrally involves the comparative, ontogenetic, and evolutionary studies of speech production with which I and my colleagues are closely identified, it also crucially involves the consideration of other recent, and not so recent, studies of cognitive ability in non-human primates: Hunting, bipedal posture, the neural correlates of auditory perception, visual perception in adult and infant humans, speech perception in humans, play activity, gesture, etc. In short, I shall attempt to synthesize a great deal of data into what I hope is a coherent theory. I somewhat redundantly stress that this will hopefully result in a theory that is testable. Like all theories it cannot account for everything. This theory does, however, appear to ‘explain’ and relate a number of phenomena that otherwise appear to be quite unrelated. It, moreover, appears to point to a coherent evolutionary process that relates the communications systems of other animals to human language. It most importantly points out a number of questions that can be resolved through controlled experiments and careful observations. I have drawn on a number of seemingly disparate ethological, anatomical, psychological and anthropological sources because I think that it is obvious that there is no single factor that is, in itself, responsible for the evolution of human language. Evolution is a complex process that inherently involves all aspects of the life cycle and environment of a species and its relationships to other species. Though particular factors like, for example, gestural communication (Hewes, 1971), undoubtedly had an important role in the evolution of human language, no single factor can, in itself, provide, as it were, the ‘central key’ to the puzzle. Everything depends on everything else and the interaction is the ‘crucial’ factor if anything is. 
Gestural communication, for example, depends on the prior existence of visual pattern recognition, analysis, cognitive ability, and bipedal posture. Visual pattern identification probably depends, in turn, on natural selection for visual ability in an arboreal environment. Bipedal posture, in turn, probably again depends on prior selection for brachiation in an arboreal environment (Campbell, 1966). Note that I am not saying that we cannot analyze the factors that underlie the evolution of human language. I am proposing that the process involved many factors. One of these factors appears to be the process of ‘preadaptation’, that is, natural selection channeled development in particular directions because of previous modifications selected for some other role. Darwin’s (1859) comments concerning the evolution of the lung from the swim bladder perhaps constitute one of the first and most convincing examples of preadaptation. Let me begin by listing the evolutionary factors that I will discuss in this paper. There probably are more factors, but I propose that these are the central factors in the evolution of human language. I shall order the factors in terms of their probable role in differentiating the language of modern man from progressively earlier hominids and other animals. In other words, I shall first list the factors that I think were most important in the late stages of human evolution and proceed to factors that probably were more important in earlier stages. It is important to note that I am not categorically differentiating human language, i.e., the language of present-day Homo sapiens, from other languages, e.g., the possible language of present-day chimpanzees. Linguists have been somewhat anthropocentric in defining language to be necessarily human language. I will define a language to be a communications system that is capable of transmitting new information. In other words, I am operationally defining language as a communications system that places no inherent restriction on the nature or quality of the information transferred. It is obvious that this definition does not require that all languages have all of the properties of human language.
1. Factor 1 - Specialized speech encoding and decoding
Modern man’s communications achieve a high rate of transmission by means of a process of speech encoding and decoding. The rate at which meaningful sound distinctions are transmitted in human speech is about twenty to thirty segments per second. That is, phonetic distinctions that differentiate meaningful words, e.g., the sounds symbolized by [b], [æ], and [t] in the word bat, are transmitted, identified, and sorted at a rate of twenty to thirty segments per second. It is obvious that human listeners cannot simply transmit and identify these sound distinctions as separate entities. The fastest rate at which sounds can be identified is about seven to nine segments per second. Sounds transmitted at a rate of twenty per second indeed merge into an undifferentiable ‘tone’. The linguist’s traditional conception of phonetic elements as a set of ‘beads on a string’ clearly is not correct at the acoustic level. How, then, is speech transmitted and perceived? The results of the past twenty years of research on the perception of speech by humans demonstrate that individual sounds like [b], [æ], and [t] are encoded, that is, ‘squashed together’, into the syllable-sized unit [bæt] (Liberman et al., 1967). A human speaker producing this syllable starts with his supralaryngeal vocal tract, i.e., his tongue, lips, velum, etc., in the positions characteristic of [b]. He does not, however, maintain this articulatory configuration but instead moves his articulators towards the positions that would be attained if he were instructed to maintain an isolated, steady [æ]. He never reaches these positions, however, because he starts towards the articulatory configuration characteristic of [t] before he ever reaches the ‘steady state’ (isolated and sustained) vowel [æ]. The articulatory gestures that would be characteristic of each isolated ‘sound’ are never attained. Instead the articulatory gestures are melded together into a composite, characteristic of the syllable. The sound pattern that results from this encoding process is itself an indivisible composite. Just as there is no way of separating with absolute certainty the [b] articulatory gestures from the [æ] gestures (you can’t tell exactly when the [b] ends and the [æ] begins), there is no way of separating the acoustic cues that are generated by these articulatory maneuvers. The isolated sounds have a psychological status as motor control or ‘programming’ instructions for the speech production apparatus. The sound pattern that results is a composite, and the acoustic cues for the initial and final consonants are largely transmitted as modulations imposed on the vowel. The process is, in effect, a time-compressing system. The acoustic cues that characterize the initial and final consonants are transmitted in the time slot that would have been necessary to transmit a single isolated [æ] vowel. The human brain decodes, that is, ‘unscrambles’, the acoustic signal in terms of the articulatory maneuvers that were put together to generate the syllable. The individual consonants [b] and [t], though they have no independent acoustic status, are perceived as discrete entities. The process of human speech perception inherently requires ‘knowledge’ of the acoustic consequences of the possible range of human supralaryngeal vocal tract speech articulation (Liberman et al., 1967; Lieberman, 1970, 1972). The special speech processing involved appears to crucially involve the dominant hemisphere of the human brain (Kimura, 1964; Shankweiler and Kennedy, 1967; Liberman et al., 1967). We will discuss the process of human speech perception in more detail with respect to its interrelation with the anatomy of the human vocal tract.
For the moment, we will note that the special neural devices necessary for the ‘decoding’ of human speech may be comparatively recent evolutionary acquisitions.
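The need for encoding can be illustrated by converting the rates quoted in this section into the time available for each segment. In running speech the rate is twenty to thirty segments per second; the fastest rate at which isolated sounds can be identified is seven to nine per second. The following is a minimal sketch. The helper name is hypothetical, and comparing the slowest speech rate with the fastest identification rate is an illustrative choice, not a calculation from the paper.

```python
# Rates quoted in the text (segments per second).
SPEECH_RATES = (20, 30)          # running speech
IDENTIFICATION_RATES = (7, 9)    # fastest identification of isolated sounds


def segment_duration_ms(rate_per_s: float) -> float:
    """Time available for each segment, in milliseconds, at a given rate."""
    return 1000.0 / rate_per_s


for rate in SPEECH_RATES + IDENTIFICATION_RATES:
    print(f"{rate:>2} segments/s -> {segment_duration_ms(rate):6.1f} ms per segment")

# Conservative speed-up: slowest speech rate vs. fastest identification rate.
compression = SPEECH_RATES[0] / IDENTIFICATION_RATES[1]
print(f"speech outpaces the identification limit by at least {compression:.2f}x")
```

In running speech a segment occupies roughly 33 to 50 ms, while isolated identification needs roughly 111 to 143 ms per sound. The gap of more than a factor of two is the shortfall that syllable-sized encoding has to make up.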
2. Factor 2 - Special supralaryngeal vocal tract anatomy
Modern man’s speech-producing apparatus is quite different from the comparable systems of living non-human primates (Lieberman, 1968; Lieberman et al., 1969; Lieberman et al., 1972b). Non-human primates have supralaryngeal vocal tracts in which the larynx exits directly into the oral cavity (Negus, 1949). In the adult human the larynx exits into the pharynx. The only function for which the adult human supralaryngeal vocal tract appears to be better adapted is speech production. Understanding the anatomical basis of human speech requires that we briefly review the source-filter theory of speech production (Chiba and Kajiyama, 1958; Fant, 1960). Human speech is the result of a source, or sources, of acoustic energy being filtered by the supralaryngeal vocal tract. For voiced sounds, that is, sounds like the English vowels, the source of energy is the periodic sequence of puffs of air that pass through the larynx as the vocal cords (folds) rapidly open and shut. The rate at which the vocal cords open and close determines the fundamental frequency of phonation. Acoustic energy is present at the fundamental frequency and at higher harmonics. The fundamental frequency of phonation can vary from about 80 Hz for adult males to about 500 Hz for children and some adult females. Significant acoustic energy is present in the harmonics of the fundamental frequency to at least 3000 Hz. The fundamental frequency of phonation is, within wide limits, under the control of the speaker, who can produce controlled variations by varying either pulmonary air pressure or the tension of the laryngeal muscles (Lieberman, 1967). Linguistically significant information can be transmitted by means of these variations in fundamental frequency as, for example, in Chinese, where these variations are used to differentiate words. The main source of phonetic differentiation in human languages, however, arises from the dynamic properties of the supralaryngeal vocal tract acting as an acoustic filter. The length and shape of the supralaryngeal vocal tract determine the frequencies at which maximum energy will be transmitted from the laryngeal source to the air adjacent to the speaker’s lips. These frequencies, at which maximum acoustic energy will be transmitted, are known as formant frequencies. A speaker can vary the formant frequencies by changing the length and shape of his supralaryngeal vocal tract. He can, for example, drastically alter the shape of the airway formed by the posterior margin of his tongue body in his pharynx. He can raise or lower the upper boundary of his tongue in his oral cavity. He can raise or lower his larynx and retract or extend his lips.
He can open or close his nasal cavity to the rest of the supralaryngeal vocal tract by lowering or raising his velum. The speaker can, in short, continually vary the formant frequencies generated by his supralaryngeal vocal tract. The acoustic properties that, for example, differentiate the vowels [a] and [i] are determined solely by the shape and length differences that the speaker’s supralaryngeal vocal tract assumes in articulating these vowels. The situation is analogous to the musical properties of a pipe organ, where the length and type (open or closed end) of pipe determine the musical quality of each note. The damped resonances of the human supralaryngeal vocal tract are, in effect, the formant frequencies. The length and shape (more precisely, the cross-sectional area as a function of distance from the laryngeal source) determine the formant frequencies. The situation is similar for unvoiced sounds, where the vocal cords do not open and close at a rapid rate releasing quasi-periodic puffs of air. The source of acoustic energy in these instances is the turbulence generated by air rushing through a constriction in the vocal tract. The vocal tract still acts as an acoustic filter, but the acoustic source may not be at the level of the larynx as, for example, in the sound [s], where the source is the turbulence generated near the speaker’s teeth. The anatomy of the adult human supralaryngeal vocal tract permits modern man to generate supralaryngeal vocal tract configurations that involve abrupt discontinuities at its midpoint. These particular vocal tract shapes produce vowels like [a], [i], and [u], which have unique acoustic properties, as well as consonants like [g] and [k]. The acoustic properties of these particular sounds will be discussed in detail, but for the moment I will simply note that they are sounds that minimize the problems of precise articulatory control. A speaker can produce about the same formant frequencies for an [i], for example, while he varies the position of the midpoint area function discontinuity by one or two centimeters (Stevens, forthcoming). They are also sounds that are maximally distinct acoustically. They are, moreover, sounds that a human listener can efficiently use to establish the size of the supralaryngeal vocal tract that he is listening to. This last property relates to Factor 1, the specialized speech encoding and decoding that characterizes human language. The reconstructions of the supralaryngeal vocal tracts of various fossil hominids that my colleague Edmund S. Crelin has made indicate that some extinct hominids lacked the anatomical basis for producing these sounds, while other hominids appear to have had the requisite anatomical specializations for human speech. I will, of course, return to this topic.
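A deliberately simplified model makes the source-filter account concrete. The voiced source supplies energy at the fundamental frequency and its harmonics up to about 3000 Hz, and the tract's resonances (formants) shape that energy. This sketch assumes a uniform tube closed at the glottis and open at the lips, roughly the closed-end organ pipe of the analogy in this section. It also assumes a tract length of 17.5 cm and a speed of sound of 35,000 cm/s. These are textbook approximations for a neutral vowel and do not come from the paper. Vowels such as [a], [i], and [u] require non-uniform area functions that this model ignores.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, moist air (cm/s)


def source_harmonics(f0_hz: float, max_hz: float = 3000.0) -> list[float]:
    """Voiced source: energy at the fundamental f0 and its integer multiples up to max_hz."""
    n_max = int(max_hz // f0_hz)
    return [k * f0_hz for k in range(1, n_max + 1)]


def uniform_tube_formants(length_cm: float, count: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at one end (glottis) and open at the other (lips).

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for n in range(1, count + 1)]


formants = uniform_tube_formants(17.5)  # assumed adult tract length (cm)
print("formants (Hz):", formants)       # [500.0, 1500.0, 2500.0]

for f0 in (100.0, 500.0):  # a low adult voice and a high child's voice
    h = source_harmonics(f0)
    print(f"f0 = {f0:.0f} Hz: {len(h)} harmonics up to {h[-1]:.0f} Hz")
```

Under these assumptions the tube resonates at 500, 1500, and 2500 Hz. A 100 Hz source supplies 30 harmonics up to 3000 Hz, whereas a 500 Hz source supplies only 6. The formant peaks are therefore sampled far more sparsely in high-pitched voices. Changing the tube's length or shape moves the formants while leaving the source unchanged.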
3. Factor 3 - Cognitive ability and automatization
There are two interrelated aspects to the cognitive abilities that underlie language. One is the process that I will term automatization. Human language involves rapidly executing complex sequences of articulatory maneuvers or making equally complex perceptual decisions regarding the identity of particular sound segments. At a higher level, complex phonologic and morphophonemic relationships must be determined. None of these processes is, however, what the speaker or listener is directly concerned with. The semantic content of the message is the primary concern of the speaker or listener. The sending and receiving processes are essentially automatic. No conscious thought is expended in the process of speech production, speech perception, or any of the syntactic or morphophonemic stages that may intervene between the semantic content of the message and the acoustic signal. It is clear that ‘automatized’ skills are not unique to human language. Other aspects of human activity, dance, for example, involve similar phenomena. The novice dancer must learn the particular steps and movements that characterize a particular dance form. Once the steps have been learned they must become automatized; the dance itself then consists of complex sequences of these steps. Playing the piano or violin, skiing, or driving a car all involve automatized behavior.
The bases for the automatized behavior that is a necessary condition for human language may reside in cross-modal transfers from other systems of hominid and hominoid activity. Tool use, for example, requires a high degree of automatization if it is to be effective. You can’t stop to think how to use a hammer every time you drive a nail in. Hunting is perhaps a still stronger case. A successful hunter must be able to thrust his spear accurately without pausing to think about the mechanics of spear thrusting. Natural selection would quickly favor the retention of superior automatization. Automatized behavior pervades all aspects of culture. Indeed a cultural response is, to a degree, a special case of automatized behavior. In simpler animals cultural responses are perhaps less subject to environmental pressures. In humans they may be more subject to external forces than to innate mechanisms, but they are no less automatized once learned. A special factor that may be germane to automatized behavior is that a ‘plastic’ period appears to be involved. It is comparatively easy to shape behavior during the ‘plastic’ period. Afterwards it is either impossible or relatively difficult to modify automatized behavior. Puppies thus can be trained more readily than adult dogs. All humans can readily learn different languages in their youth. Most humans can learn a foreign, i.e., unfamiliar, language only with great difficulty (or not at all) during adult life. The same comments probably apply to learning to play the violin, tight-rope walking, etc., though no definitive studies have yet been made.
4. Cognitive ability
Cognitive ability is a necessary factor in human language. Linguists often tend to assume that cognitive ability is linguistic ability. Indeed, since the time of Descartes the absence of human language in other animals has been cited as a ‘proof’ of man’s special status and of the lack of cognitive ability in all other species. Human language has been assumed to be a necessary condition for human thought. The absence of human language has, conversely, been assumed to be evidence of the lack of all cognitive ability. It is clear, however, that cognitive, i.e., logical, abilities can be demonstrated or observed in many animals. Behavioral conditioning, for example, which can be applied with great success to pigeons and rats, can itself be viewed as a demonstration of logical ability on the part of the ‘conditioned’ animal. Pavlov’s dogs had to make a logical association between the bell and food. Calling the animal’s response a ‘conditioned reflex’ obscures the fact that the animal had to be able to connect the sound of the bell logically with food. The same ‘conditioned’ response often can be observed as a human gourmet regards the menu. In both cases cognitive ability must interpose between the token of the food that is anticipated and the observed physiologic response. The human gourmet is hopefully more flexible, adaptive and discriminating than Pavlov’s dogs; however, the basic process is similar. In Homo sapiens the cognitive abilities that underlie this particular aspect of behavior are simply more complex than is the case for Canis familiaris. The difference is, however, quantitative rather than qualitative. The particular cognitive abilities that are associated with presumably ‘unique’ human behavioral patterns like tool use have been observed in chimpanzees (Goodall, 1971) and sea otters (Kenyon, 1969). Some of the cognitive abilities that have been traditionally associated with human language have likewise been demonstrated by Gardner and Gardner (1969) and by Premack (1972). Premack’s experiments, in particular, clearly demonstrate that cognitive ability and human language cannot be regarded as the same biologic ability. Chimpanzees do not possess the phonetic apparatus of human language. They have available a subset of the phonetic distinctions that are available to modern man. Chimpanzees could, using the phonetic distinctions that are available to them, establish a language. This language’s phonetic system might not be as efficient as modern man’s, but it could form the basis of a language, where we have operationally defined a language to be a communications system capable of transmitting unanticipated, new knowledge. The difference, at the phonetic level, between human language and this hypothetical chimpanzee language would be quantitative rather than qualitative. Premack’s experiments demonstrate that the cognitive abilities of chimpanzees are, at worst, restricted to some subset of the cognitive abilities available to humans. The difference at the cognitive level is thus also probably quantitative rather than qualitative.
It is important to note, at this point, that quantitative functional abilities can be the bases of behavioral patterns that are qualitatively different. I think that this fact is sometimes not appreciated in discussions of gradual versus abrupt change. A modern electronic desk calculator and a large general purpose digital computer, for example, may be constructed using similar electronic logical devices and similar magnetic memories. The large general purpose machine will, however, have 1,000 to 10,000,000 times as many logical and memory devices. The structural differences between the desk calculator and general purpose machine may thus simply be quantitative rather than qualitative. The ‘behavioral’ consequence of this quantitative difference can, however, be qualitative. The types of problems that one can solve on the general purpose machine will differ in kind, as well as in size, from those suited to the desk calculator. The inherent cognitive abilities of humans and chimpanzees thus could be quantitative and still have qualitative behavioral consequences. The cognitive abilities that are typically associated with human language may have their immediate origins in the complex patterns of hominid behavior associated with
On the evolution of language: A unified view
tool use, tool making and hunting. Hewes (1971) makes a convincing case for the role of gestural communication in the earliest forms of hominid language and associates language with the transference of cognitive ability from these complex behavioral patterns. I would agree with Hewes, but I would not limit the earlier hominid languages to gestures, nor would I restrict the cognitive abilities that underlie language to hominids. Tool use and hunting certainly are not exclusively hominid patterns of behavior. We can get some insight into the neural abilities that non-human primates possess by taking note of the phylogenetic evolution of the peripheral systems involved in information gathering and communication. The acute color vision of primates, for example, would have had no selective advantage if it were not coupled with matching cognitive processes. Gestural communication is consistent with the evolution and retention of increasingly complex facial musculature in the phylogenetic order of primates. It is likewise unlikely that gestural communication was at any stage of hominid evolution the sole 'phonetic' medium. Negus (1949), by the methods of comparative anatomy, demonstrates that the larynges of non-human primates are adapted for phonation at the expense of respiratory efficiency. The far simpler larynx of the lungfish is better adapted for respiration and for protecting the lungs. Clearly, mutations that decreased respiratory efficiency would not have been retained over a phylogenetic order unless they had some selective advantage. The cognitive skills that underlie linguistic ability in hominids thus probably evolved from cognitive faculties that have functional roles in the social behavior and communications of other animals. Like automatization, these skills would appear to be part of the biologic endowment of many species, and their continued development in 'higher' species is concomitant with behavioral complexity.
The transference of these cognitive skills to human language thus could be viewed as yet another instance of ‘preadaptation’, the use of cognitive processes for language that originally evolved because of the selective advantages conferred on activities like hunting, evading natural enemies, food gathering, etc.
5.
The speech abilities of Neanderthal and other fossil hominids
As I noted before, it is apparent that no single factor can be in any reasonable way identified as the 'key' to language. The two factors that appear to be most recent in shaping the particular form of human language are, however, Speech Encoding and Speech Producing Anatomy. Certain neural mechanisms must be present for the perception of speech (Lenneberg, 1967). It is difficult to make any substantive inferences about the presence or absence of particular neural mechanisms in the
Philip Lieberman
brains of extinct fossil hominids since we can deduce only the external size and shape of the brain from a fossil skull. Also, we lack a detailed knowledge of how the human brain functions; we could not really assess the linguistic abilities of a modern man simply by examining his brain. Fortunately, we can derive some insights into the nature of speech perception in various fossil hominids by studying their speech-producing anatomy. The relationship between speech anatomy and speech perception is very much like that which obtains between bipedalism and the detailed anatomy of the pelvic region. The anatomy is a necessary condition, though neural ability is also necessary. The methodology that has enabled us (and I must emphasize that this research has been a joint enterprise) to reconstruct the speech-producing anatomy of extinct hominids is that proposed by Charles Darwin. Darwin, in chapters 10 and 13 of On the Origin of Species (1859), discussed both the 'affinities of extinct Species to each other, and to living forms', and 'Embryology'. We have applied the methods of comparative and functional anatomy to the speech-producing anatomy of present-day apes and monkeys and of normal human newborns. We first assessed the speech-producing abilities of these living animals in terms of their speech-producing anatomy. We found that their supralaryngeal vocal tracts inherently restricted their speech-producing abilities. We then noted that certain functional aspects of the morphology of the skulls of these living animals resembled similar features of extinct fossil hominids. The reconstructions of the supralaryngeal vocal tracts of the La Chapelle-aux-Saints, Es-Skhūl V, Broken Hill, Steinheim, and Sterkfontein 5 fossils were made by my colleague Edmund S. Crelin by means of the homologies that exist between these skulls and living forms, the marks of the muscles on the fossil skulls, and the general methods of comparative anatomy.
Crelin's (1969) previous experience with the anatomy of the newborn was especially relevant since we can see in the human newborn many of the relevant skeletal features associated with the soft tissue structures that must have occurred in certain of these now extinct hominid forms. In most cases we made use of casts of the fossil material made available by the Wenner-Gren Foundation. For the La Chapelle-aux-Saints and Steinheim fossils, casts made available by the University Museum, Philadelphia, Pennsylvania, were employed. The original La Chapelle-aux-Saints fossil, as well as the La Ferrassie fossil and the La Quina child's fossil, were also examined with the cooperation of the Musée de l'Homme in Paris and the Musée des Antiquités Nationales in Saint-Germain-en-Laye. We attempted to examine the original Steinheim fossil but were not successful. The details of the reconstructions are discussed in our published and forthcoming papers (Lieberman and Crelin, 1971; Lieberman et al., 1972b; Crelin et al., forthcoming). I will, however, note some of the salient points in the discussion of particular fossils. I will first discuss the computer modelling technique that we employed to arrive at a functional assessment of these
supralaryngeal vocal tracts. I think that it makes sense to approach the reconstructions by first discussing the modelling technique, because one of the points that I hope will emerge from that discussion concerns how much detail of the supralaryngeal vocal tract's morphology we need to know in order to make meaningful statements about speech ability. The answer is that we really need to know only a few, fairly gross aspects of the morphology of the supralaryngeal vocal tract. The reason that this is so is itself one of the functional characteristics of human speech. I'll begin the discussion of the modelling technique by returning to our studies of the speech capabilities of living non-human primates. This is a useful way to start since we can compare the results of our modelling with the actual phenomena. Figure 1 shows the left half of the head and neck of a young adult male chimpanzee sectioned in the midsagittal plane. Silicone rubber casts were made of the air passages, including the nasal cavity, by filling each side of the split air passages separately in the sectioned head and neck to ensure perfect filling of the cavities. The casts from each side of a head and neck were then fused together to make a complete cast of the air passages. In Figure 2 the cast of the chimpanzee airways is shown together with casts made, following the same procedures, for newborn human and adult human. A cast of the reconstructed supralaryngeal airways of the La Chapelle-aux-Saints fossil also appears in this figure. In Figure 3 equal-sized outlines of the air passages for these four vocal tracts are sketched. Note the high position of the larynx in the newborn human and adult chimpanzee vocal tracts, where the soft palate and epiglottis can be approximated. In the adult human vocal tract the soft palate and epiglottis are widely separated and cannot be approximated.
Likewise, in newborn human and chimpanzee the tongue at rest lies completely within the oral cavity, whereas in adult man the posterior third of the tongue is in a vertical position, forming the anterior wall of the supralaryngeal pharyngeal cavity. Note, in particular, that there is practically no supralaryngeal portion of the pharynx present in the direct airway out from the larynx when the soft palate shuts off the nasal cavity in newborn human and in chimpanzee. In adult man half of the supralaryngeal vocal tract is formed by the pharyngeal cavity. This difference between the chimpanzee and newborn supralaryngeal vocal tracts and that of adult Homo sapiens is a consequence of the opening of the larynx into the pharynx directly behind the oral cavity. In other words, the larynx opens almost directly into the oral cavity. This is the case for all living animals (Negus, 1949) with the exception of adult Homo sapiens. We really should use the term 'adult-like' rather than 'adult', since these differences appear to be fully developed by two years of age and are probably largely differentiated by six months of age (Lieberman et al., 1972a).
The functional distinctions that these anatomical differences confer on adult humans have been determined for respiration, swallowing, and the sense of smell. Kirchner (1970) notes that the respiratory efficiency of the adult human supralaryngeal airways is about half that of the newborn. The right-angle bend in the adult human supralaryngeal airway increases the flow resistance. The non-human supralaryngeal anatomy allows the oral cavity to be sealed from the rest of the airway during inspiration. This aids the sense of smell (Negus, 1949) and also allows an animal to breathe while its mouth contains a liquid (e.g. when a dog laps water). The adult human supralaryngeal airways also increase the possibility of asphyxiation. Food lodged in the pharynx can block the entrance to the larynx. This is not possible in non-humans, since only in adult Homo sapiens does the supralaryngeal pharynx serve both as a pathway for food and liquids and as an airway.¹ The functional distinctions that the differences in the anatomy of the supralaryngeal airways confer on speech production can be determined by modelling techniques. The source-filter theory of speech production, as I have noted before, states that speech is the result of the filtering action of the supralaryngeal vocal tract on the acoustic sources that excite it. Since the filtering properties are uniquely determined by the shape and length (the cross-sectional area function) of the supralaryngeal vocal tract, it is possible to assess the properties of a particular vocal tract once we know the range of shapes that it can assume. Note that this type of analysis will not tell us anything about the total range of phonetic variation; we would have to know the properties of the laryngeal source as well as the degree of motor control that a particular organism possessed. We can, however, assess the constraints that the supralaryngeal vocal tract itself imposes on the possible phonetic repertoire.
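As a minimal illustration of how the filter depends on the shape and length of the tract, consider the simplest possible area function: a uniform tube closed at the larynx and open at the lips. Under the usual lossless, plane-wave idealization its resonances (formants) fall at odd multiples of c/4L. The speed of sound (35,000 cm/s) and the 17.5 cm length used below are assumed round values for illustration, not figures taken from this paper.

```python
# Formants of an idealized uniform tube, closed at the glottis, open at the lips.
# Lossless quarter-wavelength model; an illustration only, not the authors' program.
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, moist air


def uniform_tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Return F_n = (2n - 1) * c / (4 * L) for n = 1..n_formants, in Hz."""
    if length_cm <= 0:
        raise ValueError("length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    print(uniform_tube_formants(17.5))  # assumed adult-like tract length
```

With these assumptions `uniform_tube_formants(17.5)` gives 500, 1500 and 2500 Hz. Halving the length doubles every formant, which is why formant data from vocal tracts of different sizes have to be scaled before they can be compared.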
The situation is similar to that which would occur if we found an ancient woodwind instrument made of brass. We would probably not be able to say very much about the reed, which would have decayed, but we would be able to determine some of the constraints that the instrument imposed on a performance. These constraints would inherently structure the musical forms of the period. You can't write music that cannot be performed. We would not know all of the constraints, and we could not say very much about the manual dexterity of the players or the general musical theory, but we would know more than would be the case if we had not found the ancient instrument.

1. The human vocal tract is also inferior to the vocal tracts of hominids like La Chapelle-aux-Saints with respect to chewing. The reduction in the body of the mandible in modern Homo sapiens has reduced the tooth area. Dental studies have determined (Manly and Braley, 1950; Manly and Shiere, 1950; Manly and Vinton, 1951) that chewing efficiency in primates is solely a function of swept tooth area. Hominid forms that have smaller tooth areas have less efficient chewing. The reduction of the mandible in modern man therefore cannot be ascribed to enhancing chewing efficiency.
We are in a somewhat better position when we study the reconstructed supralaryngeal vocal apparatus of an extinct hominid. We can tell something about the constraints on the phonetic repertoire. The interconnections that exist between the vocal apparatus and the perception of speech in Homo sapiens, moreover, allow us to make some more general inferences than would otherwise be the case. The technique that we have employed to assess the constraints imposed by the supralaryngeal vocal apparatus of an animal makes use of a computer model of the vocal tract. We really don’t have to make use of this model. It would be possible, though somewhat tedious, to make actual models of possible supralaryngeal vocal tract configurations. If these models, made of plastic or metal, were excited by means of a rapid quasiperiodic series of puffs of air (i.e., an artificial larynx) we would be able to hear the actual vowel-like sounds that a particular vocal tract configuration produced. If we systematically made models that covered the range of possible vocal tract configurations we could determine the constraints that the supralaryngeal vocal tract morphology imposed, independent of the possible constraints determined by limitations on motor control, etc. We would be, of course, restricted to steady-state vowels since we could not rapidly change the shape of the vocal tract but we could generalize our results to consonants since we could model the articulatory configurations that occur at the start and end of typical consonant-vowel sequences. Note that these modelling techniques allow us to assess the limits on the phonetic repertoire that follow from the anatomy of the supralaryngeal vocal tract, independent of muscular or neural control and independent of the dialect, habits, etc., of the animal whose vocal tract we would be modelling. The technology for making these mechanical models existed at the end of the eighteenth century. 
Von Kempelen’s (1791) famous talking machine modelled the human vocal tract by mechanical means. The method that we have employed simply makes use of the technology of the third quarter of the twentieth century.
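The artificial-larynx experiment described above has a simple digital counterpart: a train of unit impulses stands in for the quasiperiodic puffs of air, and one second-order resonator per formant stands in for the tube. The sketch below is an illustration under stated assumptions, not the method used in this research. The resonator is a standard digital design of the kind used in later formant synthesizers, an ideal impulse train replaces a real glottal waveform, and the formant and bandwidth values are hypothetical.

```python
import math


def resonator(signal, freq_hz, bandwidth_hz, fs):
    """Second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(f0_hz, duration_s, fs):
    """Idealized source: one unit impulse per pitch period (assumption)."""
    n = int(duration_s * fs)
    period = fs / f0_hz
    out = [0.0] * n
    k = 0
    while True:
        idx = int(round(k * period))
        if idx >= n:
            break
        out[idx] = 1.0
        k += 1
    return out


def synthesize_vowel(formants, bandwidths, f0_hz=120.0, duration_s=0.5, fs=10_000):
    """Filter the pulse train through one resonator per formant, in cascade."""
    x = impulse_train(f0_hz, duration_s, fs)
    for f, bw in zip(formants, bandwidths):
        x = resonator(x, f, bw, fs)
    return x


if __name__ == "__main__":
    # Hypothetical neutral-vowel formants and bandwidths, in Hz.
    samples = synthesize_vowel([500, 1500, 2500], [60, 90, 120])
    print(len(samples))
```

Changing only the formant list while keeping the same source is the digital analogue of swapping one plastic or metal tube model for another in front of the same artificial larynx.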
6.
Chimpanzee, newborn, and adult Homo sapiens
In Figure 4 three area functions are shown for the chimpanzee vocal tract derived from the sectioned head and neck shown in Figure 1. The silicone rubber casting and schematic drawing of this vocal tract are shown in Figures 2 and 3 respectively. The area functions shown in Figure 4 represent the best approximations that we could get to the human vowels [a], [i], and [u]. Using a light pen, we systematically drew area functions on an oscilloscope that served as input to a PDP-9 computer. The computer had been programmed to calculate the formant frequencies that corresponded to these area functions. The details of the computer program are discussed by Henke (1966). The
computer allowed us to make hundreds of possible supralaryngeal vocal tract models conveniently and rapidly. We thus could explore the acoustic consequences of all possible chimpanzee supralaryngeal vocal tract configurations without waiting for a chimpanzee to actually produce these shapes. We used the same procedure to explore the possible range of supralaryngeal vocal tract shapes for the newborn human supralaryngeal vocal tract shown in Figures 2 and 3. We were guided in these simulations by the morphology of the head and neck, i.e., the relative thickness and position of the tongue, the lips, the velum and the position of the pharynx relative to the larynx and oral cavity. We were also able to make use of cineradiographic pictures of newborn infants during cry and swallowing (Truby et al., 1965). The results of these simulations are shown in Figure 5. In Figure 5 the formant frequencies of the three area functions of Figure 4 are plotted, together with an additional data point (X) for human newborn. The loops labelled with phonetic symbols represent the data points for a sample of real utterances derived from 76 adult men, women and adolescent children producing American-English vowels (Peterson and Barney, 1952). In Figure 6 we have reproduced the actual data points for this sample of real human vowels. Note that the chimpanzee and newborn human utterances cover only a small portion of the adult human 'vowel space'. In other words, according to this modelling technique the chimpanzee and newborn vocal tracts inherently do not appear to be able to produce vowels like [a], [i], and [u]. All normal human speakers can inherently produce these vowels; any human, if he is raised in an American-English environment, will be able to produce them. The modelling of the chimpanzee and newborn vocal tracts indicates that chimpanzees and newborns could not, even if they had the requisite motor and neural abilities.
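The computer procedure described above (draw an area function, then compute its formants) can be sketched in a few lines. What follows is a generic lossless plane-wave "chain matrix" model, not a reconstruction of Henke's (1966) program, whose details are not given here. The assumptions are loud ones: rigid walls, no losses, no nasal coupling, an ideal open end at the lips (zero pressure), a speed of sound of 35,000 cm/s, and hypothetical area functions given as lists of (length, area) sections from larynx to lips.

```python
import math

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, moist air


def k22(freq_hz, sections, c_sound=SPEED_OF_SOUND_CM_PER_S):
    """(2,2) entry of the lossless chain matrix from larynx to lips.

    `sections` is a list of (length_cm, area_cm2) tubes ordered from the
    larynx to the lips. With zero pressure assumed at the lips, the
    volume-velocity transfer function is 1 / K22, so formants are the
    frequencies where K22 crosses zero. Each 2x2 matrix has the form
    [[a, j*b], [j*c, d]] with a, b, c, d real, so only those four reals are
    tracked; rho*c is set to 1 because it cancels out of K22.
    """
    k = 2.0 * math.pi * freq_hz / c_sound
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # identity matrix
    for length, area in sections:
        cos_kl, sin_kl = math.cos(k * length), math.sin(k * length)
        a2, b2, c2, d2 = cos_kl, sin_kl / area, area * sin_kl, cos_kl
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def formants(sections, n_formants=3, f_max=5000.0, step=5.0):
    """Lowest zeros of K22, found by a coarse scan refined with bisection."""
    found = []
    f_lo, v_lo = step, k22(step, sections)
    while f_lo < f_max and len(found) < n_formants:
        f_hi = f_lo + step
        v_hi = k22(f_hi, sections)
        if v_lo == 0.0 or v_lo * v_hi < 0.0:
            lo, hi = f_lo, f_hi
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if k22(lo, sections) * k22(mid, sections) <= 0.0:
                    hi = mid
                else:
                    lo = mid
            found.append(0.5 * (lo + hi))
        f_lo, v_lo = f_hi, v_hi
    return found


if __name__ == "__main__":
    uniform = [(17.5 / 4, 3.0)] * 4       # hypothetical neutral tract
    a_like = [(8.75, 1.0), (8.75, 7.0)]   # hypothetical narrow back, wide front
    print([round(f) for f in formants(uniform)])
    print([round(f) for f in formants(a_like)])
```

Under these assumptions the uniform tract yields formants near 500, 1500 and 2500 Hz, while narrowing the back half and widening the front half raises F1 to about 770 Hz and lowers F2 to about 1230 Hz, the direction of the [a] pattern. Exploring a species' range of possible shapes then amounts to running this calculation over every area function its anatomy permits.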
The question that we are addressing is thus not whether chimpanzees and newborns can speak American-English. It is whether they have the anatomical apparatus that would allow them to speak. The results of the modelling technique can, of course, be checked against the actual utterances of chimpanzees and newborn Homo sapiens. When this is done it is evident that the actual vowels of newborn Homo sapiens agree with the computer simulation (Irwin, 1948; Lynip, 1951; Lieberman et al., 1972a). The chimpanzee simulation appears to encompass a greater range than has been observed so far in the acoustic analysis of chimpanzee vocalizations (Lieberman, 1968). This may merely indicate that the acoustic analyses so far derived from chimpanzees do not represent the total chimpanzee repertoire. It is, however, apparent that the computer simulation is not showing a smaller vowel space than is actually the case. The computer simulation for adult Homo sapiens corresponds with the vowels observed (Chiba and Kajiyama, 1958; Fant, 1960; Peterson and Barney, 1952) and is not plotted here. The vowel diagrams in Figures 5 and 6 are really an indirect way of showing that the chimpanzee and newborn cannot generate supralaryngeal vocal tract area functions like those shown in Figure 7. These three configurations are the limiting articulations of a vowel triangle that is language universal (Troubetzkoy, 1939). It is not a question of the chimpanzee and newborn not being able to produce American-English vowels. They could not produce the vowel range that is necessary for any other language of Homo sapiens either. Particular modern languages may lack one of these articulations, but they always include at least one of these vowels and/or the glides [y] and [w], which are functionally equivalent to [i] and [u]. It is important to remember that we are discussing the phonetic level rather than the phonemic. Claims that a particular language, e.g. Kabardian (Kuipers, 1960), has only 'one' centralized vowel generally concern the phonemic level, i.e., the claim is that a particular language does not differentiate words at the phonemic level through vowel contrasts. At the phonetic level these languages make use of vowels like [i], [u], and [a], though these vowels' occurrences are conditioned by other segments. It is also important to note that a vocal tract that cannot produce the area functions necessary for [i], [a], and [u] also cannot produce velar consonants like [g] and [k]. These consonants also involve discontinuities at the midpoint of the supralaryngeal vocal tract. Dental and bilabial consonants like [d], [t], [b] and [p] are, however, possible.

Figure 4. Chimpanzee supralaryngeal vocal tract area functions modeled on computer. These functions were the 'best' approximations that could be produced, given the anatomic limitations of the chimpanzee, to the human vowels [i], [a], and [u]. The formant frequencies calculated by the computer program for each vowel are tabulated and scaled to the average dimensions of the adult human vocal tract (after Lieberman et al., 1972). [Plot: cross-sectional area against length from larynx (cm) for each vowel, each with a table of formant frequencies (Freq.) and scaled values (Freq./1.7).]

Figure 5. Plot of formant frequencies for chimpanzee vowels of Figure 4, data points (1), (2), and (3), scaled to correspond to the size of the adult human vocal tract. Data point (X) represents an additional point for human newborn. The closed loops enclose 90 percent of the data points derived from a sample of 76 adult men, women, and children producing American-English vowels (Peterson and Barney, 1952). Note that the chimpanzee and newborn vocal tracts cannot produce the vowels [i], [u], and [a] (after Lieberman et al., 1972). [Plot axes: formant frequencies in cycles per second.]

Figure 6. Formant frequencies of American-English vowels for a sample of 76 adult men, adult women and children. The closed loops enclose 90 percent of the data points in each vowel category (after Peterson and Barney, 1952). [Plot axes: frequency of F1 (Hz), horizontal; frequency of F2 (Hz), vertical.]
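The Figure 4 caption says the computed chimpanzee formants were scaled to the average dimensions of the adult human vocal tract, and the table headings in that figure divide each frequency by 1.7. Because enlarging every dimension of a tube by a factor s divides all of its resonances by s, such scaling keeps the formant pattern while removing the effect of overall size. The helper below is a hypothetical sketch of that step; reading the 1.7 as a ratio of tract sizes is an assumption, not something the text states.

```python
def scale_formants(formants_hz, size_ratio):
    """Divide each formant by size_ratio.

    For a tube whose every dimension is multiplied by size_ratio, all
    resonant frequencies are divided by size_ratio, so this maps formants
    of a small tract onto a tract size_ratio times larger (assumption:
    the shape is unchanged).
    """
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return [f / size_ratio for f in formants_hz]


if __name__ == "__main__":
    chimp_like = [850.0, 1700.0, 3400.0]     # hypothetical model output, Hz
    print(scale_formants(chimp_like, 1.7))   # compare on the adult human scale
```

Only after this rescaling do the plotted points in Figure 5 sit on the same frequency scale as the Peterson and Barney loops.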
Figure 7. Illustrations of approximate (a) midsagittal sections, (b) cross-sectional area functions, and (c) acoustic transfer functions of the vocal tract for the vowels [i], [a], and [u] (after Lieberman et al., 1972). [Panels: midsagittal section of the vocal tract; cross-sectional area function of the vocal tract against length from larynx (cm); magnitude of the vocal tract transfer function against frequency (Hz).]
Figure 7 shows a midsagittal outline of the vocal tract for the vowels [i], [a], and [u], as well as the cross-sectional areas of the vocal tract (Fant, 1960) and the frequency-domain transfer functions for these vowels (Gold and Rabiner, 1968). Ten-to-one discontinuities in the area function at the vocal tract's midpoint are necessary to produce these vowels. It is possible to generate these discontinuities with the 'bent' adult human supralaryngeal vocal tract since the cross-sectional areas of the oral and pharyngeal cavities can be independently manipulated in adult humans while a midpoint constriction is maintained. The supralaryngeal vocal tract in adult humans thus can, in effect, function as a 'two tube' system. The lack of a supralaryngeal pharyngeal cavity in the direct airway from the larynx, at a right angle to the oral cavity, restricts chimpanzees and newborn humans to 'single tube' resonant systems. In adult humans, muscles like the genioglossus can pull the pharyngeal portion of the tongue in an anterior direction, enlarging the pharyngeal cavity while the oral cavity is constricted, as in the production of [i]. In the production of [a] in adult humans, the pharyngeal constrictors reduce the cross-sectional area of the pharynx while the oral cavity is opened by lowering the mandible. It is impossible to articulate these extreme discontinuities in the chimpanzee and newborn supralaryngeal vocal tracts. They can only attempt to distort the tongue body in the oral cavity (see Figures 2, 3, and 4) to obtain changes in cross-sectional area. The intrinsic musculature and elastic properties of the tongue severely limit the range of deformations that the tongue body can undergo. This is evident in cineradiographic observations of newborn cry and swallowing (Truby et al., 1965) and of baboon cries (Zhinkin, 1963), and in the deformations of the oral and pharyngeal portions of the tongue in adult humans (Perkell, 1969).
Note that Figure 7 shows that the discontinuities in the [a], [i], and [u] area functions occur at or near the midpoint of the supralaryngeal vocal tract. Stevens (in press) has shown that the midpoint area discontinuity has an important functional value. It allows human speakers to produce signals that are acoustically distinct with relatively sloppy articulatory maneuvers. The first and second formant frequencies are maximally separated for [i], maximally centered for [a], and maximally lowered for [u]. When a human speaker wants to produce one of these vowels it is not necessary for him (or her) to be very precise about the position of the tongue. All that is necessary is an area function discontinuity within one cm or so of the midpoint. The formant frequencies will not perceptibly vary² (Flanagan, 1955) when the discontinuity shifts plus or minus one cm from the midpoint. This would not be the case for similar articulations if they were generated at any point other than the midpoint of the vocal tract. The vowels [a], [i], and [u] are thus optimal acoustic signals for communication. The speaker can produce maximally differentiated sounds without having to be terribly precise. All other vowels are both less distinct and less 'stabile'. The speaker must be more precise to produce acoustic signals that are not as distinct and separable. This factor is germane to one of the points that I raised earlier: How precise does the reconstruction of the supralaryngeal vocal tract of an extinct hominid have to be to yield meaningful data? The answer is that we can derive useful information without having to reconstruct fine detail, since the crucial factor is essentially the ability to generate area discontinuities at or near the midpoint.

2. Flanagan (1955) shows that human listeners are not able to discriminate stimuli that differ solely with respect to a single formant frequency unless the difference exceeds 60 Hz.
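Stevens' point about the midpoint can be illustrated with idealized tube arithmetic. The sketch below assumes a lossless two-tube tract (a back tube of area A1 and a front tube of area A2, closed at the larynx and open at the lips), a 17.5 cm total length, a ten-to-one area ratio for an [a]-like configuration, and a speed of sound of 35,000 cm/s. All of these are illustrative assumptions, and the model ignores losses and the gradual transitions of a real tract. Because the resonance condition is symmetric in the two tube lengths, F1 is stationary when the discontinuity sits at the midpoint, so a shift there changes F1 less than an equal shift elsewhere.

```python
import math

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed
TOTAL_LENGTH_CM = 17.5              # assumed adult-like tract length


def two_tube_f1(back_length_cm, back_area, front_area,
                total_length_cm=TOTAL_LENGTH_CM, c_sound=SPEED_OF_SOUND_CM_PER_S):
    """First resonance of a lossless two-tube tract, closed at the larynx.

    Resonances are the zeros of
        cos(k*l1)*cos(k*l2) - (A1/A2)*sin(k*l1)*sin(k*l2),
    with l1, A1 the back (laryngeal-end) tube and l2, A2 the front tube.
    """
    l1 = back_length_cm
    l2 = total_length_cm - back_length_cm
    ratio = back_area / front_area

    def g(f):
        k = 2.0 * math.pi * f / c_sound
        return (math.cos(k * l1) * math.cos(k * l2)
                - ratio * math.sin(k * l1) * math.sin(k * l2))

    f, step = 1.0, 1.0
    while g(f) * g(f + step) > 0.0:   # scan upward until the sign changes
        f += step
        if f > 10_000.0:
            raise ValueError("no resonance found below 10 kHz")
    lo, hi = f, f + step
    for _ in range(60):               # refine the crossing by bisection
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    mid_point = TOTAL_LENGTH_CM / 2
    for l1 in (mid_point - 1.0, mid_point, mid_point + 1.0, 4.75, 5.75):
        f1 = two_tube_f1(l1, back_area=1.0, front_area=10.0)
        print(f"discontinuity at {l1:5.2f} cm: F1 = {f1:6.1f} Hz")
```

With these assumptions, moving the discontinuity 1 cm to either side of the midpoint lowers F1 from about 805 Hz to about 788 Hz, a change of roughly 17 Hz (below the 60 Hz threshold of note 2), whereas moving it from 4.75 to 5.75 cm changes F1 by roughly 44 Hz. The toy model shows only the qualitative point; it makes no claim about the exact values in a real tract.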
6.1
La Chapelle-aux-Saints
In Figures 2 and 3 a silicone rubber model and a sketch of the supralaryngeal vocal tract of the La Chapelle-aux-Saints Neanderthal fossil are shown. It obviously was not possible to obtain this information directly from the soft tissue of this fossil hominid. The reconstruction of the supralaryngeal airways was effected by Edmund S. Crelin using the similarities that exist between this fossil and newborn human as a guide (Lieberman and Crelin, 1971; Lieberman et al., 1972b). The possible arthritic condition (Straus and Cave, 1957) of the La Chapelle-aux-Saints fossil has been raised in some criticisms of Crelin's reconstruction. Arthritic changes, however, could no more have affected this individual's supralaryngeal vocal tract than they do in modern man. Figure 8 shows a lateral view of the skull, vertebral column, and larynx of newborn and adult Homo sapiens and the reconstructed La Chapelle-aux-Saints fossil. Note that the geniohyoid muscle in adult Homo sapiens runs down and back from the symphysis of the mandible to the hyoid bone. This is necessarily the case because the hyoid bone is positioned below the mandible in adult Homo sapiens. The two anterior portions of the digastric muscles, which are not shown in Figure 8, also run down and back from the mandible for the same reason. When the facets into which these muscles are inserted at the symphysis of the mandible are examined, it is evident that the facets are likewise inclined to minimize the shear forces for these muscles. The human chin appears to be a consequence of the inclination of these facets. The outward inclination of the chin reflects the inclination of the inferior plane of the mandible at the symphysis. Muscles are essentially 'glued' in place to their facets. Tubercles and fossae in this light may be simply regarded as adaptations that increase the strength of the muscle-to-bone bond by increasing the 'glued' surface area.
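The argument about facet inclination rests on a simple piece of statics: a pull meeting a surface at an angle splits into a component normal to the surface and a component along it (shear), and the shear vanishes only when the pull is perpendicular to the surface. The sketch below makes that decomposition explicit. The 100 N force and the angles are hypothetical, and a real muscle attachment would call for a proper stress analysis rather than this single-force, rigid-body idealization.

```python
import math


def facet_force_components(pull_newtons, angle_from_normal_deg):
    """Resolve a muscle's pull into (normal, shear) parts at a bony facet.

    angle_from_normal_deg is the angle between the muscle's line of pull
    and the facet's surface normal; 0 degrees means the muscle pulls
    straight off the facet, so the bond carries no shear at all.
    """
    theta = math.radians(angle_from_normal_deg)
    return pull_newtons * math.cos(theta), pull_newtons * math.sin(theta)


if __name__ == "__main__":
    for angle in (0, 30, 60, 90):
        normal, shear = facet_force_components(100.0, angle)  # hypothetical 100 N pull
        print(f"{angle:2d} deg: normal {normal:6.1f} N, shear {shear:6.1f} N")
```

On this idealization, a facet that tilts to stay perpendicular to the line of pull keeps the shear term at zero, which is the sense in which the inclination of the facets follows the direction in which the muscles run.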
The inclination of the digastric and geniohyoid facets likewise serves to increase the functional strength of the muscle to bone bond by minimizing sheer forces. As Bernard Campbell (1966, p. 2) succintly notes, ‘Muscles leave marks where they are attached to bones, and from such marks we assess the form and size of the muscles’. This is no less true for living than for
On the evolution of language: A untjied view
Figure
79
8.
Skull, vertebral column and larynx of Newborn (A), and adult Man (C), and reconstruction of Neanderthal (II). G-Geniohyoid Muscle, H-Hyoid Bone, S-Stylohyoid Ligament, A4-Thyrohyoid Membrane, T-Thyroid Cartilage, CC-Cricoid Cartilage. Note that the inclination of the styloid process away from the vertical plane in Newborn and Neanderthal results in a corresponding inclination in the stylohyoid ligament. The intersection of the stylohyoid ligament and geniohyoid muscle with the hyoid bone of the larynx occurs at a higher position in Newborn and Neanderthal. The high position of the larynx in the Neanderthal reconstruction follows, in part, from this intersection (after Lieberman and Crelin, 1971)
Figure 9.
Inferior views of base of skull of Newborn (A), Neanderthal (B), and adult Man (C). D-Dental Arch, P-Palate, S-Distance Between Palate and Foramen Magnum, V-Vomer Bone, BO-Basilar Part of Occipital, OOccipital Condyle (after Lieberman and Crelin, 1971)
80
Philip Lieberman
Figure
10.
Plot of formant ,fkequencies for reconstructed La Chapeile-aux-Saints supralaryngeal vocal tract in attempts to produce the vowels [i], [u], and [a]. Note that none of the data points (N) falls into the vowel loops that specify these vowels (after Lieberman and Crelin, 1971)
4wo 3500 r-
FREQUENCY
OF
F,
IN
CYCLES
PER
SECOND
extinct forms. When the corresponding features are examined in newborn Homo sapiens (Figure 10), it is evident that the nearly horizontal inclination of the facets of the geniohyoid and digastric muscles is a concomitant feature of the high position of the hyoid bone (Crelin, 1969, pp. 107-110). These muscles are nearly horizontal with respect to the symphysis of the mandible in newborn Homo sapiens. The facets are therefore nearly horizontal to minimize shear forces. Newborn Homo sapiens thus lacks a chin.³ When the mandible of the La Chapelle-aux-Saints fossil is examined, it is evident that the facets of these muscles resemble those of newborn Homo sapiens. The inclination of the styloid process away from the vertical plane is also similar in newborn Homo sapiens and the La Chapelle-aux-Saints fossil. When the base of the skull is examined (Fig. 9) for newborn and adult Homo sapiens and the La Chapelle-aux-Saints fossil, it is again apparent that newborn Homo sapiens and the fossil form share many features that differ from those of adult Homo sapiens. The sphenoid bone, for example, is exposed between the vomer and the basilar part of the occipital in both newborn Homo sapiens and the La Chapelle-aux-Saints fossil. This skeletal feature provides room for the larynx, which is positioned high with respect to the mandible: there has to be room for the larynx behind the palate in newborn Homo sapiens and in the La Chapelle-aux-Saints fossil. The qualitative difference in the morphology of the base of the skull, i.e., the exposure of the sphenoid, is a skeletal consequence of this anatomical necessity. We do not claim that all the features of the La Chapelle-aux-Saints fossil are found in newborn Homo sapiens; this is definitely not the case. We are claiming that certain features, particularly those relating to the base of the skull and the mandible, are similar. These similarities make possible a reasonably accurate reconstruction of the supralaryngeal vocal tract of the La Chapelle-aux-Saints fossil. Our observations are in accord with the results of Vlček's (1970) independent 'onto-phylogenetic' study of the development of a number of fossil skulls of Neanderthal infants. Vlček notes the presence of skeletal characteristics, typical of both infant and adult Neanderthal fossils, that are manifested during particular phases of the ontogenetic development of contemporary man.

3. The human chin is sometimes stated to be a reinforcement for the mandible. This is probably not the case; it more likely is a stress concentration point. It would be rather simple to resolve this point using the methods of stress analysis common in mechanical and civil engineering.

Other features that characterize adult Neanderthal man never appear in the ontogenetic development of contemporary man, while still other features that characterize contemporary man are never manifested in the fossil skulls. I will return to these data when I discuss the status of classic Neanderthal man. For the moment, they are relevant as an independent replication of the similarities between newborn Homo sapiens and the La Chapelle-aux-Saints fossil. Crelin's reconstruction of the supralaryngeal vocal tract of this fossil is also in accord with earlier attempts, like that of Keith, which is discussed by Negus (1949), as well as with the inferences of Coon (1966). In Figure 10 the vowel space of the reconstructed La Chapelle-aux-Saints supralaryngeal vocal tract is presented. Each of the data points (N) represents attempts to produce vowels like [a], [i], or [u]. The labelled loops again refer to the Peterson-Barney (1952) data for actual human vowels. Note that the vowel space of the fossil is a subset of the human vowel space and that it is impossible to produce the 'extreme' vowels [a], [i], and [u]. It is likewise impossible to produce the glides [y] or [w] or velar consonants like [g] and [k]. The Neanderthal supralaryngeal vocal tract also probably is not capable of making nasal versus non-nasal contrasts; everything will tend to be nasalized. The modelled Neanderthal vowel space is probably too large, since we allowed articulatory maneuvers that would have been rather acrobatic in modern man (Lieberman and Crelin, 1971). We tried to err on the side of making this fossil's phonetic ability more humanlike whenever we were in doubt.
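The vowel-space comparisons above depend on computing the formant frequencies that a given vocal tract can produce. The reconstructions were modelled from their detailed shapes; the sketch below is not that procedure but a much simpler, hedged illustration of how tract dimensions fix formant frequencies. It assumes a uniform tube closed at the larynx and open at the lips, whose resonances fall at F_n = (2n − 1)c/4L, and an assumed speed of sound c = 35,000 cm/s; the function name is hypothetical.

```python
# A minimal sketch, not the modelling procedure used for the reconstructions:
# resonances of a uniform tube closed at the glottis and open at the lips.

SPEED_OF_SOUND_CM_PER_S = 35_000.0  # assumed value for warm, moist air


def uniform_tube_formants(length_cm: float, n_formants: int = 3,
                          c: float = SPEED_OF_SOUND_CM_PER_S) -> list[float]:
    """Return the first n quarter-wavelength resonances, F_n = (2n - 1) c / 4L."""
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    # 17.5 cm is roughly the length of an adult male vocal tract.
    print(uniform_tube_formants(17.5))  # [500.0, 1500.0, 2500.0]
    # A tube 20% shorter raises every formant by the same factor (1.25).
    print(uniform_tube_formants(14.0))  # [625.0, 1875.0, 3125.0]
```

Because every resonance is inversely proportional to L, a change in tract length shifts all formants by a common factor; the shape of the tract, not modelled here, is what separates one vowel from another.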
6.2
Sterkfontein 5
In Figure 11 a silicone rubber model of the airways of the reconstructed supralaryngeal vocal tract of the Sterkfontein 5 cranium (Mrs. Ples) is shown together with the chimpanzee airways that appeared in Figure 2. Note the similarities. Crelin's reconstruction follows from the similarities that exist between this fossil and present-day orangutan and, to a lesser degree, chimpanzee. The reconstructed vocal tract has the same phonetic limitations as those of present-day apes. This reconstruction and the others that follow will be discussed in detail in a separate paper (Crelin et al., forthcoming).
6.3
Es-Skhul V and Steinheim
In Figure 12 silicone rubber models of the reconstructed airways of the Es-Skhul V and Steinheim fossils are shown together with the supralaryngeal airways of adult Homo sapiens. Note that both reconstructed supralaryngeal airways have right-angle bends, that the pharyngeal cavity is part of the direct airway out of the larynx, and that both resemble the supralaryngeal airways of adult modern man. The reconstructed Es-Skhul V airway is completely modern. It would have placed no limits on its owner's phonetic repertoire had he attempted to produce the full range of human speech. The Steinheim supralaryngeal airway, though it has some pongid features, is also functionally equivalent to a modern supralaryngeal vocal tract, and it likewise would have placed no restrictions on its owner's phonetic repertoire.
6.4
Broken Hill (Rhodesian Man)
In Figure 13 a silicone rubber model of the reconstructed supralaryngeal airways of Rhodesian man is shown together with a casting of the supralaryngeal airways of adult Homo sapiens. Note that despite the large oral cavity, which follows from the large palate of this fossil, there is a right-angle bend in the supralaryngeal airway. This vocal tract appears to be an intermediate form. When it is modelled it can produce acoustic signals appropriate to the human vowels [a], [i], and [u], though the supralaryngeal vocal tract configurations that are needed are not as stable, i.e., resistant to articulatory sloppiness, as equivalent human vocal tract configurations. Note that the large palate in this fossil form occurs with a bent supralaryngeal vocal tract. The reduction of the palate in forms like Steinheim, Es-Skhul V and modern Homo sapiens therefore cannot be the factor that caused the larynx to descend.
7.
Significance of results
In Table 1 the results of the reconstructions and computer modelling discussed so far are presented, together with the results that would be obtained for various fossils similar to the ones that we have examined. I have not attempted to list all the similar forms. Note that we have divided the table into two categories: fossil hominids who had the anatomical specializations that are necessary for human speech, and fossils who lacked these specializations.

Table 1.
− Human supralaryngeal vocal tract
  Australopithecines: africanus, robustus, boisei
  Classic Neanderthal: Saccopastore I, Monte Circeo, Teshik-Tash (infant), La Ferrassie I, La Chapelle-aux-Saints, La Quina (infant), Pech-de-l'Azé
+ Human supralaryngeal vocal tract
  Broken Hill, Es-Skhul V, Djebel Kafzeh, Solo II, Shanidar I, Steinheim, Cro-Magnon, modern Homo sapiens

7.1
Neoteny
The first point that I want to make is that the anatomy needed to produce the full range of sounds of human speech represents a particular specialization that, at the present time, occurs only in normal adult Homo sapiens. It is clear that adult Homo sapiens does not particularly resemble newborn Homo sapiens.⁴ This is, in general, true of all primates (Schultz, 1968): the infantile forms of primates often do not resemble their adult forms. Schultz (1944, 1955), moreover, shows that the infantile forms of various non-human primates resemble newborn Homo sapiens, whereas the adult forms of these non-human primates diverge markedly from adult Homo sapiens. This, however, does not mean that adult Homo sapiens has evolved by preserving neonatal features (Montagu, 1962), since it is apparent that modern man has his own unique specializations. These include the anatomy necessary for the production of human speech. Table 1 shows that these specializations have evolved over at least the past 300,000 years and that until comparatively recent times various types of hominids existed, some of whom lacked the anatomical mechanisms necessary for articulate human speech.
7.2
The ‘Neanderthal problem’
Note that Table 1 places a number of fossil forms that lacked speech into a category labelled 'classic Neanderthal'. A view that has enjoyed some popularity in recent years is that Neanderthal fossils do not differ substantially from modern Homo sapiens: they form a subset of hominids whose characteristics grade imperceptibly into those typical of the modern population of Homo sapiens. An extreme formulation of this view is, for example, that '... no single measurement or even set of measurements can set Neanderthals apart from modern man' (Nett, 1973). On this view, Neanderthal man cannot be regarded as a separate species, or even as a separate variety distinct from Homo sapiens. This claim can be substantiated only if one includes fossils like Steinheim and Es-Skhul V in the same class as forms like La Chapelle-aux-Saints. Quantitative multivariate analysis like that of Howells (1968) demonstrates that fossils like La Chapelle and La Ferrassie form a class that is quite distinct from modern man. The measurements contained in Patte's (1955) comprehensive work, as well as Vlček's (1970) observations on the ontogenetic development of Neanderthal infants, indicate that this class of fossils, classic Neanderthal man, represents a specialization that diverged from the line (or lines) more directly ancestral to Homo sapiens. Fossils like Steinheim and Es-Skhul V, which are sometimes categorized as 'generalized' Neanderthal, are functionally distinct from classic Neanderthal: they exhibit the anatomical specializations necessary for human speech.

4. Benda (1969) shows that Down's syndrome (mongolism) involves the retention of infantile morphology. Victims of this pathology in some instances retain the general proportions of the newborn skull. Their supralaryngeal vocal tracts retain the morphology of the newborn, and they are unable to speak. They strikingly demonstrate that Homo sapiens has not evolved by retaining infantile characteristics.
A general overlap between modern man and Neanderthal man is possible only if forms like Steinheim and Es-Skhul V are put into the same class as La Chapelle, La Ferrassie, Monte Circeo, etc. Hominids who could have produced human speech would then have to be classified with hominids who could not. This would be equivalent to putting forms that had the anatomical prerequisites for bipedal posture into the same class as forms that lacked them. The question immediately arises: is this category, i.e., the set of fossils labelled 'classic Neanderthal', a separate species? It is useful to remember Darwin's definition of the term species. Darwin (1859, p. 52) viewed the term species '... as one arbitrarily given for the sake of convenience to a set of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and more fluctuating forms'. Darwin later notes (1859, p. 485), '... the only distinction between species and well-marked varieties is that the latter are known, or believed, to be connected at the present day by intermediate gradations, whereas species were formerly thus connected'. It is evident that intermediate fossil forms like Broken Hill man bridge the gap between classic Neanderthal man and modern Homo sapiens. We do not know, and probably never will be able to know, all the traits that may have differentiated various hominid populations that are now extinct. We do not, for example, know whether viable progeny would have resulted from the mating of forms like Cro-Magnon and La Quina. Even if we did know that viable progeny would result from the mating of classic Neanderthal and early Homo sapiens populations, we would not necessarily conclude that these forms were members of the same species. The term species, as Darwin noted, is simply a labelling device.
Canis lupus and Canis familiaris are considered to be separate species even though they may freely mate and have viable progeny. The behavioral attributes of wolves and dogs make it important for people, for example shepherds, to place these animals into different species, even though some dogs, e.g. Chihuahuas and St. Bernards, are more distinct from each other morphologically and behaviorally and cannot mate. The question of separate species labels for classic Neanderthal and other fossil hominid populations is thus probably not crucial. We can simply note that different types of hominids apparently coexisted until comparatively recent times and that some of these hominids do not appear to have contributed to the present human gene pool. Table 1 does have some bearing on the apparent absence in modern man of the specializations typical of classic Neanderthal man (e.g. La Chapelle, La Ferrassie, etc.). Animal studies (Capranica, 1965) have established the role of vocalizations in courtship and mating. The presence or absence of humanlike speech probably would have served as a powerful factor in assortative mating. In the present population of modern man it is evident that linguistic differences and affinities play a powerful role in mate selection. We would expect this phenomenon to be accentuated when different hominid populations were inherently unable to produce the sounds of other groups. Sexual selection determined by speech patterns may thus have played a significant role in the divergence of groups like classic Neanderthal in Western Europe from the ancestral forms of modern Homo sapiens.
7.3
The evolutionary sequence
There is, unfortunately, a large gap in Table 1, since we have not yet been able to examine specimens of Homo erectus that have intact skull bases. It is, however, likely that the situation that typifies later hominid forms will also characterize Homo erectus: most will probably not have the anatomy necessary for the production of the full range of human speech. Some forms, however, will undoubtedly be found that either had the necessary anatomy or were intermediate forms. Evolution goes in small steps, and forms intermediate between Steinheim and the Australopithecines must have existed. We still are, like Darwin, at the mercy of the 'Imperfection of the Geological Record'. We can, despite this gap, draw several inferences from Table 1. I would like to propose the following evolutionary sequence. The first phase of the evolution of human language must have relied on a system of gestures, facial expressions and vocal signals like those of present-day apes to communicate the semantic, i.e. cognitive, aspects of language. The Australopithecines must have had cognitive abilities that surpassed those of present-day apes. Although early Australopithecines may have had cognitive abilities near the levels of present-day apes, late forms would have developed superior abilities as evolution continued step by step and mutations favoring larger relative brain sizes were retained. The retention of mutations leading to larger relative brain sizes is itself a sign that cognitive ability had a selective advantage. We can reasonably infer that activities like tool making and collective social enterprises like hunting were important attributes of Australopithecine culture. Although the vocal apparatus of forms like Australopithecus africanus does not appear to differ significantly from that of present-day apes, vocal communication undoubtedly played a part in their linguistic system.
Our reconstructions can tell us nothing about the larynx; however, it is almost certain that the laryngeal mechanisms of these forms were at least as developed as those of present-day apes. As Negus (1949) observed, there is a continual elaboration of the larynx as we ascend the phylogenetic scale in terrestrial animals. The larynges of animals like wolves are capable of producing a number of distinct calls that serve as vehicles of vocal communication. The same is true for the larynges of chimpanzees and gorillas. Studies like that of Kelemen (1948), which have attempted to show that chimpanzees cannot talk because of laryngeal deficiencies, are not correct. Kelemen shows that the chimpanzee's larynx is different from the larynx of a normal adult human male. The chimpanzee's larynx will not produce the range of fundamental frequencies typical of adult human males; however, it can produce a variety of sound contrasts, many of which occur in human languages. A present-day chimpanzee, if it made maximum use of its larynx and supralaryngeal vocal tract, could, for example, produce the following sound contrasts:
a. Voiced versus unvoiced, i.e., excitation of the vocal tract by the quasi-periodic output of the larynx versus turbulent noise excitation generated by opening the larynx slightly and expelling air at a high flow rate.
b. High fundamental versus normal fundamental, i.e., adjusting the larynx so that phonation occurs in the falsetto register rather than the modal chest register (Van den Berg, 1960). The larynx has several modes of phonation which result in quite distinct acoustic signals. In falsetto the fundamental frequency is high and the glottal source has comparatively little energy at its higher harmonics.
c. Low fundamental versus normal fundamental, i.e., adjusting the larynx to a lower register. This lower register, termed 'fry', produces very low fundamental frequencies (Hollien et al., 1966) that are irregular (Lieberman, 1963).
d. Dynamic fundamental frequency variations, e.g., low to high, high to low. Variations like these occur in many human tone languages.
e. Strident high-energy laryngeal excitation, i.e., the high fundamental frequency, breathy output that can be observed in some chimpanzee vocalizations (Lieberman, 1968) as well as in the cries of human newborns (Lieberman et al., 1972a).
f. Continuant versus interrupted, i.e., the temporal pattern of laryngeal excitation can be varied. This can be observed in the calls of present-day monkeys and apes (Lieberman, 1968).
g. Oral versus non-oral, i.e., the animal can produce a call with its oral cavity sealed or open. This can be observed in present-day gorillas, where the low-energy, low fundamental frequency sounds that sometimes accompany feeding appear to be produced with the oral cavity sealed by the epiglottis (Lieberman, 1968).
h. Lip rounding and laryngeal lowering. Chimpanzees have the anatomical ability to round their lips and/or lower their larynges while they produce a call. Both of these articulatory gestures could produce a formant frequency pattern with falling transitions.
i. Flared lips and laryngeal raising. Chimpanzees could flare their lips and/or raise their larynges while they produce a call. This would generate a rising formant frequency pattern.
j. Bilabial closures and releases. Sounds like [b] and [p], as well as prevoiced [b] (like that occurring in Spanish, for example; Lisker and Abramson, 1964), could be produced by controlling the timing between the opening and closing of the larynx and of the lips.
k. Dental closures and releases. Sounds like [d] and [t] (Lisker and Abramson, 1964) could be produced by varying the timing between a closure effected by the tongue blade against the alveolar ridge and the opening and closing of the larynx.

Australopithecines could have generated all of the above sound contrasts if they had the requisite motor control and the neural ability to perceive the differences in sound quality that are the consequences of these articulatory maneuvers. Most of these phonetic contrasts, i.e., 'features' (Jakobson et al., 1952), have been observed in the vocal communications of present-day non-human primates. Present-day human languages make use of all of these sound contrasts. The combination of articulatory features like (j) and (k) with timing features like (f) could also generate sounds like [f], [v], [s], etc. It is quite probable that late Australopithecines and various forms of Homo erectus made use of these sound contrasts to communicate. The transfer of patterns of 'automatized' behavior, discussed earlier in this paper, from activities like tool making and hunting would have facilitated the acquisition of the motor skills necessary to produce these sounds. Hunting would also have placed a premium on communication out of the line of sight, communication that, furthermore, left the hunter's hands free. The neural mechanisms necessary for differentiating these sounds appear to exist in present-day primates. Wollberg and Newman (1972), for example, have shown that squirrel monkeys (Saimiri sciureus) possess auditory receptors 'tuned' to one of the vocal calls that these monkeys use in their communications. Similar results have been demonstrated for frogs (Capranica, 1965).
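Contrasts (j) and (k) depend on the timing between the release of an oral closure and the onset of laryngeal vibration, the voice onset time (VOT) measured by Lisker and Abramson (1964). The sketch below sorts bilabial stops into prevoiced, short-lag and long-lag categories. The two boundary values are illustrative assumptions, not measurements, and the labels follow English-like usage, in which short-lag stops are heard as [b]; the function name is hypothetical.

```python
# A minimal sketch of sorting bilabial stops by voice onset time (VOT):
# the interval (msec) from release of the lip closure to the onset of
# laryngeal vibration. Negative VOT means voicing starts before release.
# Both boundaries below are illustrative assumptions, not measured data.

PREVOICING_BOUNDARY_MS = 0.0   # assumed: voicing before release -> prevoiced
LONG_LAG_BOUNDARY_MS = 25.0    # assumed: short-lag / long-lag boundary


def classify_bilabial_stop(vot_ms: float) -> str:
    """Map a VOT value in milliseconds to one of three stop categories."""
    if vot_ms < PREVOICING_BOUNDARY_MS:
        return "prevoiced [b]"   # Spanish-like [b]
    if vot_ms < LONG_LAG_BOUNDARY_MS:
        return "[b]"             # short voicing lag
    return "[p]"                 # long voicing lag


if __name__ == "__main__":
    for vot in (-90.0, 10.0, 20.0, 30.0, 40.0):
        print(vot, classify_bilabial_stop(vot))
```

With these assumed boundaries, VOT values of 20 and 30 msec, only 10 msec apart, fall into different categories, while 30 and 40 msec do not.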
Although gestural communication (Hewes, 1971) undoubtedly played a more important role in the communications of these early hominids than it does for modern man, I think it most unlikely that vocal communication did not also play an important role. The crucial stage in the evolution of human language would appear to be the development of the 'bent' supralaryngeal vocal tract of modern man. Table 1 shows a divergence in the paths of evolution. Some hominids appear to have retained the communications system that was typical of the Australopithecines: a mixed system that relied on both gestural and vocal components. Other hominids appear to have followed an evolutionary path that resulted in almost total dependence on the vocal component for language, relegating the gestural component to a secondary 'paralinguistic' function. The process would have been gradual, following from the prior existence of vocal signals in linguistic communication. As I have noted before, the bent supralaryngeal vocal tract that appears in present-day Homo sapiens and in the Steinheim, Es-Skhul V, and Broken Hill fossils allows its possessors to generate acoustic signals that have very distinct acoustic properties and that are very easy to produce. These signals are, in a sense, optimal acoustic signals (Lieberman, 1970). If vocal communication was already part of the linguistic system of early hominids, then mutations that extended the range and the efficiency of the signalling process would have been retained in forms like Steinheim. The 'bent' supralaryngeal vocal tract is otherwise a burden for basic vegetative functions; it would not have been retained unless it had conferred an adaptive advantage. The initial adaptive value of the bent supralaryngeal vocal tract would have been in increasing the inventory of vocal signals and, moreover, in providing more efficient vocal signals. The neural mechanisms necessary to perceive these new signals would, in all likelihood, have been available to hominids like Steinheim. Recent electrophysiological data (Miller et al., 1972) show that animals like the rhesus monkey, Macaca mulatta, will develop neural detectors that identify signals important to the animal. Receptors in the auditory cortex responsive to a 200 Hz sinusoid were discovered after the animals were trained by the classic methods of conditioning to respond behaviorally to this acoustic signal. These neural detectors could not be found in the auditory cortex of untrained animals. The auditory system of these primates thus appears to be 'plastic': receptive neural devices can be formed to respond to acoustic signals that the animal finds useful. These results are in accord with behavioral experiments involving human subjects, in which 'categorical' responses to arbitrary auditory signals can be produced by means of operant conditioning techniques (Lane, 1965). They are also in accord with the results of classic conditioning experiments like those reported by Pavlov.
The dogs learned to identify and to respond decisively to the sound of a bell, an 'unnatural' sound for a dog. The dog obviously had to 'learn' to identify the bell. Hominids like Steinheim who had the potential to make 'new' acoustic signals would also have had the ability to 'learn' to respond to these sounds in an automatized way. The plasticity of the primate auditory system would have provided the initial mechanism for 'learning' these new sounds. Later stages in the evolution of human language probably involved the retention of mutations that had 'innately' determined neural mechanisms 'tuned' to these new sounds. By innately determined, I do not mean that the organism needs no interaction with the environment to 'learn' to perceive these sounds. The evidence instead suggests that humans are innately predisposed to 'learn' to respond to the sounds of speech. Experiments with 6-week-old infants (Eimas et al., 1971; Morse, 1971) show that they respond to the acoustic cues that differentiate sounds like [b] and [p] in the same manner as adults. These acoustic distinctions involve 10 msec differences in the delay between the start of the acoustic signal that occurs when a human speaker opens his lips and the start of phonation. It is most improbable that 6-week-old infants could 'learn' to respond to these signals unless there was some innate predisposition for this sound contrast to be perceived. This surely is not surprising. Human infants do not really 'learn' the complex physiological maneuvers associated with normal respiration; they have built-in 'knowledge'. The case for the neural mechanisms that are involved in the perception of human speech is not as simple as that for respiration. Some contact with a speech environment is necessary. Deaf children, for example, though they at first produce the vocalizations of normal children, become quiet after six months of age (Lenneberg, 1967). Nottebohm (1970) shows similar effects in birds. Some aspects of the bird's vocal behavior are manifested even when the bird is raised in isolation. Other important aspects of the bird's vocal behavior develop only when the bird is exposed to a 'normal' communicative environment. At some late stage, that is, late with respect to the initial evolution of the 'bent' supralaryngeal vocal tract, the neural mechanisms that are necessary for the process of speech encoding would have evolved. The human-like supralaryngeal vocal tract would initially have been retained for the acoustically distinct and articulatorily easy signals that it could generate. The acoustic properties of sounds like the vowels [a], [i], and [u] and the glides [y] and [w], which allow a listener to determine the size of the speaker's supralaryngeal vocal tract, would have preadapted the communications system for speech encoding. When a human listener hears a sound like the word bat, as it is produced by an intermediate-sized supralaryngeal vocal tract, it is indeterminate. Ladefoged and Broadbent (1957), for example, show that a listener will perceive this sound as the word bit if he is led to believe that it was produced by a long vocal tract.
The same listener will perceive the same sound as but if he is led to believe that it was produced by a small, i.e., short, vocal tract. The listener, in effect, 'normalizes' the signal to take account of the acoustic properties of different-sized vocal tracts. The listener responds as though he were interpreting the acoustic signal in terms of the articulatory gestures that a speaker would employ to generate the word. The perception of human speech is generally structured in terms of the articulatory gestures that underlie the acoustic signal (Liberman et al., 1967). This process, as I noted earlier, is the basis of the encoding which allows human speech to transmit information at the rate of 20 to 30 segments per second. Signals like the vowels [a], [u], and [i] and the glides [y] and [w] are determinate in the sense that a particular formant pattern could have been generated by only one vocal tract using a particular articulatory maneuver (Stevens and House, 1955; Lindblom and Sundberg, 1969). A listener can use these vowels to identify instantly the size of the supralaryngeal vocal tract that he (or she) is listening to (Darwin, 1971; Rand, 1971). These vowels can indeed serve the same function in the recognition of human speech by computer. Gerstman (1967), for example, derives the size of a particular speaker's vocal tract from these vowels in order to identify the speaker's other vowels. Without this information it is impossible to assign a particular acoustic signal to the correct vowel class: the computer, like a human, has to know the size of the speaker's supralaryngeal vocal tract. The process of speech encoding need not have followed the exact path that I have proposed. Other sounds, like [s], can provide a listener (or a computer) with information about the size of the speaker's vocal tract. As I noted before, the Australopithecines had the anatomical prerequisites for producing sounds like [s], so the process of speech encoding and the evolution of the human supralaryngeal vocal tract may have been coeval from the start. It is clear, however, that evolution goes by small steps, and what we have in present-day man is a fully encoded speech system with a speech-producing anatomy that is highly adapted to this function. Other, now extinct, hominids like classic Neanderthal man had speech-producing anatomy that clearly was not as well adapted for speech encoding. It is, therefore, reasonable to conclude that in these hominids speech encoding was either more rudimentary or not present. It is, however, important to conclude with the point that language does not necessarily have to involve the process of speech encoding and rapid information transfer. The remains of Neanderthal culture all point to the presence of linguistic ability. Conversely, birds may have the potential for rapid information transfer (Greenewalt, 1967); however, birds lack the cognitive ability that is also a necessary factor in language. It is most unlikely that birds could develop a complex language unless they also had larger brains. Human language is the result of the convergence of many factors: automatization, cognitive ability, and speech encoding.
The particular form that human language has taken, however, appears to be the result of the evolution of the human supralaryngeal vocal apparatus. The supralaryngeal vocal apparatus that differentiates present-day Homo sapiens from all other living animals is thus as important a factor in the late stage of hominid evolution as dentition and bipedal posture were in earlier stages.
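The computer normalization attributed above to Gerstman (1967), in which a speaker's extreme vowels fix the scale for his other vowels, can be sketched as per-speaker range normalization. This is a minimal sketch assuming the form usually given for Gerstman's procedure, F′ = 999(F − F_min)/(F_max − F_min) applied to each formant; the function name and all formant values are hypothetical illustrations, not data.

```python
import math

# A minimal sketch of per-speaker range normalization in the spirit of
# Gerstman (1967). Assumption: each formant is rescaled to 0..999 using the
# minimum and maximum values found among that speaker's vowels.


def range_normalize(vowels: dict[str, tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Rescale each vowel's (F1, F2) to a 0..999 range for one speaker."""
    f1_values = [f1 for f1, _ in vowels.values()]
    f2_values = [f2 for _, f2 in vowels.values()]
    f1_lo, f1_hi = min(f1_values), max(f1_values)
    f2_lo, f2_hi = min(f2_values), max(f2_values)
    if f1_hi == f1_lo or f2_hi == f2_lo:
        raise ValueError("the vowels must span a range on both formants")
    return {
        vowel: (999 * (f1 - f1_lo) / (f1_hi - f1_lo),
                999 * (f2 - f2_lo) / (f2_hi - f2_lo))
        for vowel, (f1, f2) in vowels.items()
    }


# Hypothetical (F1, F2) values in Hz: the extreme vowels [i], [a], [u] plus one other.
speaker_a = {"i": (300.0, 2300.0), "a": (750.0, 1100.0),
             "u": (300.0, 850.0), "e": (450.0, 2000.0)}
# Hypothetical shorter-tract speaker: every formant 20% higher.
speaker_b = {v: (1.2 * f1, 1.2 * f2) for v, (f1, f2) in speaker_a.items()}

if __name__ == "__main__":
    norm_a = range_normalize(speaker_a)
    norm_b = range_normalize(speaker_b)
    # True when both speakers' vowels land at the same normalized positions.
    same = all(math.isclose(norm_a[v][k], norm_b[v][k], abs_tol=1e-9)
               for v in norm_a for k in (0, 1))
    print(norm_a["e"], norm_b["e"], same)
```

Because speaker B's formants are speaker A's scaled by a constant factor, roughly as a shorter vocal tract would scale them, the normalized values for the two speakers coincide, and the same vowel boundaries can then be applied to both.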
REFERENCES
Benda, C. E. (1969) Down's syndrome, mongolism and its management. New York, Grune and Stratton.
Campbell, B. (1966) Human evolution: An introduction to man's adaptations. Chicago, Aldine.
Capranica, R. R. (1965) The evoked vocal response of the bullfrog. Cambridge, Mass., M.I.T. Press.
Chiba, T. and Kajiyama, M. (1958) The vowel: Its nature and structure. Tokyo, Phonetic Society of Japan.
92
Philip Lieberman
Coon, C. S. (1966) The origin of races. New York, Knopf. Crelin, E. W. (1969) Anatomy of the newborn: An atlas. Philadelphia, Lea and Febiger. Crelin, E. W., Lieberman, P., and Klatt, D. H. (In preparation) Anatomy and related phonetic ability of the Skhul V, Steinheim, and Rhodesian fossils and the Pleisianthropus reconstruction. Darwin, C. (1859) On the origin of species (Facsimile edition). New York, Atheneum. Darwin, C. J. (1971) Ear differences in the recall of fricatives and vowels. Q. J. exper. Psyckol., 23, 386-392. Eimas, P. D., Siqueland, E., R. Jusczyk, P., and Vigorito, J. (1971) Speech perception in infants. Science, 171, 303-306. Fant, G. (1960) Acoustic theory of speech production. The Hague, Mouton. Flanagan, J. L. (1955) A difference limen for vowel forrnant frequency. J. acoust. Sot. Am., 27, 613-617. Gardner, R. A. and Gardner B. T., (1969) Teaching sign language to a chimpanzee. Science, 165, 664-672. Gerstman, L. (1967) Classification of selfnormalized vowels. Proceedings of IEEE conference on Speed Communiction and Processing, 97-100. Gold, B. and Rabiner, L.R. (1968) Analysis of digital and analog formant synthesizers. IEEE-Trans. Audio Electroacoustics, A U16, 81-94. Goodall, J. v. L. (1971) In the shadow of man. New York, Dell. Grenewalt, C. A. (1967) Bird song: Acoustics andpkysiology. Washington, D.C., Smithsonian Institution. Henke, W. L. (1966) Dynamic articulatory model of speechproduction usingcomputer simulation, Ph.D. dissertation, (appendix B). M.I.T. Hewes, G. W. (1971) Language origins: A bibliography. Dept. of Anthropology, Boulder, Colo., Univ. of Colorado. Hollien, H., Moore P., Wendahl, R. W., and Michel, J.F. (1966) On the nature of vocal fry. J. Speech Hearing Res., 9, 245247. Howells, W. W. (1968) Mount Carmel man: Morphological relationships. Proceedings of tke VIIItk International Congress of
Anthropological and Ethnological Sciences, Vol. I. Anthropol. Tokyo. Irwin, 0. C. (1948) Infant speech: Development of vowel sounds. J. Speech Hearing Dis., 13, 31-34. Jakobson, R., Fant, C. G. M., and Halle, M. (1952) Preliminaries to speech analysis. Cambridge, Mass., M.I.T. Press. Kelemen, G. (1948) The anatomical basis of phonation in the chimpanzee. J. Morph., 82,229-256. Kempelen, W. R. von (1791) Mechanismus der menscklicken Spracke nebst der Besckreibung seiner spreckenden Masckine. Vienna, J. B. Degen. Kenyon, K. W. (1969) The sea otter in the eastern Pacific Ocean. U.S. Gov’t. Printing Office. Kimura, D. (1964) Left-right differences: The perception of melodies, Q. J. exper. Psyckol., 16, 355-358. Kirchner, J. A. (1970) Pressman and Kelemen’s physiology of the larynx (Rev. Ed.). Rochester, Minn., Amer. Acad. Opthal. and Otolar. Kuipers, A. H. (1960) Phoneme and morpheme in Kabardiun. The Hague, Mouton. Ladefoged, P. and Broadbent, D. E. (1957) Information conveyed by vowels, J. acoust. Sot. Amer. 29,98-104. Lane, H. (1965) Motor theory of speech perception: A critical review. Psyckol. Rev., 72, 275-309. Lenneburg, E. H. (1967) Biologicalfoundations of language. New York, Wiley. Liberman, A. M., Cooper F.S ., Shankweiler, D. P., and Studdert-Kennedy, M. (1967) Perception of the speech code. Psyckol. Rev., 74,431-461. Lieberman, P. (1963) Some acoustic measures of the periodicity of normal and pathologic larynges. J. acoust. Sot. Amer. 35, 344353. Lieberman, P. (1967) Intonation, perception, and language. Cambridge, Mass, M.I.T. Press. Lieberman, P. (1968) Primate vocalizations and human linguistic ability. J. acoust. Sot. Amer., 44, 1574-1584. Lieberman, P. (1970) Towards a unified phonetic theory. Ling. Znq., 1, 307-322. Lieberman, P. (1972) The speech of primates.
The Hague, Mouton. Lieberman, P., Klatt, D. H. and Wilson, W. A. (1969) Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science, 164, 1185-1187. Lieberman, P. and Crelin, E. S. (1971) On the speech of Neanderthal man. Ling. Inq., 2, 203-222. Lieberman, P., Harris, K. S., Wolff, P. and Russell, L. H. (1972a) Newborn infant cry and nonhuman primate vocalizations. J. Speech Hear. Res., 14, 718-727. Lieberman, P., Crelin, E. S. and Klatt, D. H. (1972b) Phonetic ability and related anatomy of the newborn, adult human, Neanderthal man and the chimpanzee. Amer. Anthrop., 74, 287-307. Lindblom, B. and Sundberg, J. (1969) A quantitative model of vowel production and the distinctive features of Swedish vowels. Speech transmission laboratory report I. Stockholm, Sweden, Royal Institute of Technology. Lisker, L. and Abramson, A. S. (1964) A cross-language study of voicing in initial stops: Acoustical measurements. Word, 20, 384-422. Lynip, A. W. (1951) The uses of magnetic devices in the collection and analysis of the preverbal utterances of an infant. Genet. Psychol. Mono., 44, 221-262. Manly, R. S. and Braley, L. C. (1950) Masticatory performance and efficiency. J. dent. Res., 29, 448-462. Manly, R. S. and Shiere, F. R. (1950) The effect of dental deficiency on mastication and food preference. Oral Surg., Oral Med., Oral Path., 3, 674-685. Manly, R. S. and Vinton, P. (1951) A survey of the chewing ability of denture wearers. J. dent. Res., 30, 314-321. Miller, J. M., Sutton, D., Pfingst, B., Ryan, A. and Beaton, R. (1972) Single cell activity in the auditory cortex of rhesus monkeys: Behavioral dependency. Science, 177, 449-451. Montagu, M. F. A. (1962) Time, morphology and neoteny in the evolution of man. In M. F. A. Montagu (Ed.), Culture and the evolution of man. New York, Oxford University Press. Pp. 324-342.
Morse, P. (1971) Speech perception in six-week-old infants. Unpublished Ph.D. dissertation, University of Connecticut, Storrs, Conn. Negus, V. E. (1949) The comparative anatomy and physiology of the larynx. New York, Hafner. Nett, E. G. (In press) A note on phonetic ability. Amer. Anth. Nottebohm, F. (1970) Ontogeny of bird song. Science, 167, 950-956. Patte, E. (1955) Les néanderthaliens, anatomie, physiologie, comparaisons. Paris, Masson et Cie. Perkell, J. S. (1969) Physiology of speech production: Results and implications of a quantitative cineradiographic study. Cambridge, Mass., M.I.T. Press. Peterson, G. E. and Barney, H. L. (1952) Control methods used in a study of the vowels. J. acoust. Soc. Amer., 24, 175-184. Premack, D. (1972) Language in chimpanzee? Science, 172, 808-822. Rand, T. C. (1971) Vocal tract size normalization in the perception of stop consonants. Haskins Laboratories status report on speech research, SR-26/26, 141-146. Schultz, A. H. (1944) Age changes and variability in gibbons. Amer. J. Phys. Anth., n.s. 2, 1-129. Schultz, A. H. (1955) The position of the occipital condyles and of the face relative to the skull base in primates. Amer. J. Phys. Anth., n.s. 13, 97-120. Schultz, A. H. (1968) The recent hominoid primates. In S. L. Washburn and P. C. Jay (Eds.) Perspectives on human evolution I. New York, Holt, Rinehart and Winston. Pp. 122-195. Shankweiler, D. and Studdert-Kennedy, M. (1967) Identification of consonants and vowels presented to left and right ears. Q. J. exper. Psychol., 19, 59-63. Stevens, K. N. (In press) Quantal nature of speech. In E. E. David and P. B. Denes (Eds.) Human communication: A unified view. New York, McGraw Hill. Stevens, K. N. and House, A. S. (1955) Development of a quantitative description of vowel articulation. J. acoust. Soc. Amer., 27, 484-493. Straus, W. L., Jr. and Cave, A. J. E. (1957)
Pathology and posture of Neanderthal man. Q. Rev. Biol., 32, 348-363. Troubetzkoy, N. S. (1939) Principes de phonologie (Trans. J. Cantineau, 1949). Paris, Klincksieck. Truby, H. M., Bosma, J. F., and Lind, J. (1965) Newborn infant cry. Uppsala, Almqvist and Wiksells. Van den Berg, J. W. (1960) Vocal ligaments versus registers. Curr. Prob. Phon. Logoped., 1, 19-34. Vlček, E. (1970) Étude comparative ontophylogénétique de l'enfant du Pech-de-l'Azé par rapport à d'autres enfants néandertaliens. In D. Ferembach et al. (Eds.) L'enfant du Pech-de-l'Azé. Paris, Masson et Cie. Pp. 149-186. Wollberg, Z. and Newman, J. D. (1972) Auditory cortex of squirrel monkey: Response patterns of single cells to species-specific vocalizations. Science, 175, 212-214. Zhinkin, N. I. (1963) An application of the theory of algorithms to the study of animal speech - Methods of vocal intercommunication between monkeys. In R. G. Busnel (Ed.) Acoustic behavior of animals. Amsterdam, Elsevier Publishing.
Language can be defined operationally as a communication system that permits the exchange of new, unanticipated information. Various kinds of language appear to have existed at earlier stages of hominid evolution. Human language is at present unique insofar as it uses ‘encoded’ vocal speech to permit rapid transmission of information. The supralaryngeal vocal tract of modern Homo sapiens is a useful factor in this encoding process, which also involves special neural mechanisms. Other factors, such as cognitive ability and ‘automatization’, are also necessary for language. These factors are, however, important for various aspects of human and nonhuman behavior outside of language. The evolution of language appears to have been a gradual process that first led to systems relying on a mixture of gestural and vocal communication. Some hominids appear to have retained this system until a relatively recent period. Others appear to have given much more importance to verbal communication. Reconstructions of fossil supralaryngeal vocal tracts show that certain forms, the Australopithecines and ‘classic’ Neanderthal man, did not possess the supralaryngeal vocal tract that is necessary for the production of fully encoded human speech. Other fossil forms, those of Steinheim and Es-Skhul V, possess functionally modern vocal tracts. Others, such as Broken Hill, represent intermediate forms. The evolution of human language can be conceived as a three-stage process that involved (a) an increasing importance of vocal communication in activities such as hunting, (b) an enlargement of the vocal repertoire with the evolution of the human supralaryngeal vocal tract, which produces acoustic signals that are both more distinct and more resistant to articulatory errors, and (c) the evolution of neural mechanisms that make use of the preadapted properties of the supralaryngeal vocal tract for rapid encoded verbal communication.
4
Retrieval of sentence relations: Semantic vs. syntactic deep structure*
C. A. PERFETTI University of Pittsburgh
Abstract Two experiments on unaided and cued recall of sentences presented in context are reported. Key nouns in the sentences were arranged to have uniform surface functions, but to vary independently in deep syntactic category and semantic function. Cued recall for sentences in which the semantic function of actor and recipient coincided with the syntactic function of deep subject and object, respectively, was better than for sentences which did not have this normal semantic-syntactic coincidence. Unaided recall was not different for the two types of sentences. Models of sentence processing may have to represent both types of information as available to the language user.
One of the psychologically significant aspects of transformational grammar is the representation of relational information that is not directly revealed in sentence surface structure. Studies by Blumenthal (1967) and Blumenthal and Boakes (1967) were especially important on this point because they clearly showed a divergence between deep and surface structure in either the storage or retrieval of sentences. The main result of these studies was the demonstration that, in sentences which were superficially similar but different in deep syntax, how well a word from a sentence cued recall of the entire sentence depended on that word's deep structure role. For example, comparing the two sentences The officers were eager to please and The officers were easy to please, officers proved to be a more potent prompt in the first sentence, where it is deep subject, than in the second sentence, where it is not.
* The research reported in this paper was carried out with the substantial assistance of Blaine Garson, who collected the data, and Robert Lindsey, who assisted with judging responses and analyzing data. The research was supported by the Learning Research and
Development Center, which is supported in part by funds from the U.S. Office of Education. Requests for reprints should be sent to the author, Learning Research and Development Center, University of Pittsburgh, Pittsburgh, PA. 15260. Cognition 2(1), pp. 95-105
There are, however, semantic relations that correspond in many cases to the syntactic relations that are revealed in transformational grammar. For example, most noun phrases which are subjects in deep structure are also actors or agents semantically. Thus in Blumenthal's sentences, and in most examples from the literature of transformational grammar, the deep subject also names the actor of the action described in the sentence. This is the case, for example, in the sentences above. However, this strict correspondence between the semantic notion of actor and the syntactic notion of deep subject does not always hold. Of course, there are many sentences that do not have action as their semantic property. Statives such as Horace is sympathetic are only one of many semantic relations other than that of action. More to the point, there are sentences which have an action semantic, but for which the actor cannot be identified with deep subject. For example, in a sentence such as Virgil experienced humiliation at the hands of the soldiers, it is not Virgil but the soldiers who are the actor; Virgil, however, is the subject. It was sentences of the latter type which were of interest in the present study. The immediate empirical question is whether, in sentences such as the one above, Virgil or the soldiers is the more potent cue for retrieving the entire sentence (when compared to a sentence in which Virgil is both the actor and the deep subject). The more general question is whether semantically based relationships such as action (actor, recipient) are cognitively more significant than syntactic relations such as subject of. Case grammars (Fillmore, 1968; Anderson, 1971) assign theoretical importance to these semantic concepts (although as parts of syntactic theory), and thus provide interesting alternatives to phrase-structure based grammars.
1. Method
1.1 Sentences
The experiments were designed to test the cued recall of sentences under four conditions which varied according to specific syntactic and semantic properties of the noun which served as a cue. The noun cue was (1) Deep Subject and Actor (SA), (2) Deep Subject and Recipient of Action (SR), (3) Deep Object and Actor (OA), or (4) Deep Object and Recipient of Action (OR).¹ These conditions are shown in the following sentences:
(a) The MAYOR publicly denounced the policies of the GOVERNOR.
(b) The MAYOR withstood great pressure from the GOVERNOR.
(c) The DEFENDANT reluctantly told his story to the PROSECUTOR.
(d) The DEFENDANT experienced embarrassment from the questions of the PROSECUTOR.

Sentences (a) and (b) form a paired experimental observation, and (c) and (d) form another. Sentences such as (a) and (c), in which actor and deep subject coincided, were called ‘Normal’. Sentences in which the actor was a syntactic object, such as (b) and (d), were called ‘Marked’. The capitalized nouns are the cues for recall and are the same words for both members of each pair. Ten such pairs (20 sentences in all) were constructed for the experiments. The cueing conditions are illustrated as follows: MAYOR is SA in (a) and SR in (b); GOVERNOR is OR in (a) and OA in (b). Two lists were constructed so that, for each sentence pair, the SA sentence occurred on one list and the OA sentence on the other. Half of the sentences in each list were cued by NP1, and the other half by NP2. By comparing the effectiveness of a cue word under its two conditions, the relative contribution of the two controlled cueing properties can be estimated. Other noteworthy characteristics of the sentences include the following: (1) The surface subject position was not varied. The first noun phrase (NP1) in surface structure was always deep subject. NP2 had various surface grammatical functions but was never deep subject. (2) There was a slight difference in the length of the two types of sentences comprising the experimental pairs. Normal (SA) sentences, exemplified by (a) and (c), averaged 8.9 words per sentence (range: 7-10), while Marked (OA) sentences, exemplified by (b) and (d), averaged 9.9 words (range: 7-12). The slight length difference was mainly due to an additional grammatical word required by sentences in which the deep subject is not the actor, and thus the lexical density (Perfetti, 1969) of the two types was about equal: .54 for SA and .52 for OA. Variations within types are of little consequence, since the main comparison was between the relative cue potency of the two words from the same sentence, compared across the two sentence types.

1. The ‘deep object’ varied somewhat in its surface relations, occurring, for example, sometimes as the object of a preposition and sometimes as the indirect object of a verb. Hence its exact role in deep structure was not uniform, and it was sometimes the subject of a sentence embedded in the verb phrase. Its defining attribute was that it occurred in deep structure in a constituent dominated by VP.

1.2 Experiment 1: Subjects and procedure
Twenty-four University of Pittsburgh undergraduates participated in Experiment 1, twelve for each of the two lists. Each S was tested individually.
Since the purpose was to test the retrieval of meaningful relations among sentences, as well as verbatim recall, it was important to ensure that the meaning of the sentences was processed. To this end, a procedure similar to that used by Blumenthal (1967) was used. Each experimental sentence was part of a ‘brief excerpt from a story’ which contained three sentences, the last one being the test sentence. E read the first two sentences to S, and then showed S a card with the third sentence, which S then read aloud. S was instructed that he would be tested for his understanding of the story excerpt and for his recall of the final sentence. The following is an example of an excerpt heard by S: The governor was the most powerful political figure in the state. The mayor was known to be independent. The mayor publicly denounced the governor or The mayor withstood great pressure from the governor. All Ss heard the first two sentences, while the final sentence varied according to whether it was an SA or OA condition. The above example illustrates a significant characteristic of the excerpts: each noun that was to be a cue occurred exactly once in the preceding context sentences, and always in the subject position. The input phase of the experiment was followed by a five-minute interval in which S made judgments of line drawings. Then followed three tests: (1) Free recall. S was asked to write in a booklet of blank pages each key sentence that he had heard, one sentence per page. (2) First cued recall. S was now required to go through another booklet, this time writing a sentence next to its cue word, one per page. For example, the word Governor would be in the booklet for one S, while another S would respond to Mayor. (3) Second cued recall. Finally, S went through a third booklet which had the alternate cue from each sentence. If Governor had occurred in the first cued recall test, Mayor occurred in the second.
1.3 Experiment 2
In order to observe a direct relationship between the two nouns of the sentence, a second experiment was conducted. The materials were identical to those of Experiment 1, and the procedure differed only in that Ss were asked to recall only the other noun from the sentence, given one noun as a prompt. This is essentially a replication of the two prompted recall conditions of Experiment 1 for the case in which S does not have to produce the entire sentence. The recall data should therefore be a fairly direct indication of the stored connection between the two nouns. Twenty-four Ss were assigned to cueing and sentence type conditions in the same fashion as in Experiment 1.
2. Results and Discussion
2.1 Recall measures
In these experiments, the concern was with the retention of meaning rather than with exact reproduction of the sentence. Accordingly, responses were considered to deviate from perfect reproduction in an ordinal manner which could be reliably scored and which roughly corresponded to an underlying scale of meaning retention. Each response was placed into one of the following categories, ordered from least to most retention:
1. Absent, or no response other than the prompt. (A)
2. Partial recall. Meaning not preserved. (P)
3. Inference recall. Meaning not preserved, but recall implicationally related to target sentence. (I)
4. Meaning-preserving recall. (M)
5. Verbatim recall. (V)
Categories M and V were of primary interest since they are measures of meaning-preserving recall. The criterion for M was that the recall directly revealed the relationship between the actor and the recipient; that is, the relationship ACTION (ACTOR, RECIPIENT) was maintained without respect to perfect lexical or syntactic recall. The categories are cumulative in the sense that any higher category included the amount of meaning preservation reflected in a lower category. Specifically, the meaning-preserving recall measure (M) included both categories M and V. Categories I and M do require judgment in scoring, and they are illustrated in the following response protocol:
Target sentence: The artist suffered insults from the pen of the critic.
(P) The artist drew the pictures.
(I) The critic gave a bad review to the performance.
(M) The artist was insulted by the pen of the critic.
It is easy to see that (P) is well off target, but the (I) and (M) judgments do require some explanation. In (M), although the words are not quite all there, it is clear that the relationship among elements in the target sentence has been preserved, viz. INSULT (ARTIST, CRITIC). In the case of (I), this is not so.
What has been produced is a relationship that is consistent with the meaning of the target sentence: if the artist was insulted at the pen of the critic, then it is psychologically quite consistent to infer that the critic gave some ‘performance’ a bad review. It does not quite preserve meaning, because it fails to reflect INSULT (ARTIST, CRITIC). In general, word substitutions which did not change meaning relationships in the sense outlined above were counted as (M). Responses which failed to produce the relationship but were otherwise inferentially consistent with the target were (I), and those that were neither were (P). Despite the potential difficulty of the classification task, there were few problems and very good agreement in classifying. The interjudge agreement between two judges was 90% overall, and 95% when based only on whether a given sentence was scored M or better vs. I or worse. In all subsequent reports of results the measure is the cumulative M category (M plus V), except where noted.
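The cumulative scoring rule can be made concrete with a short Python sketch. It assumes each response has already been assigned one of the five category codes listed above; the function name `cumulative_proportions` and the ten scored responses are hypothetical, invented only to show the arithmetic, and are not data from these experiments.

```python
from collections import Counter

# Ordinal scale from least to most retention, as listed in Section 2.1.
CATEGORIES = ["A", "P", "I", "M", "V"]


def cumulative_proportions(responses):
    """Proportion of responses scored at or above each category.

    Because the categories are cumulative, the value for "M" counts both
    M and V responses, the value for "I" counts I, M and V, and so on.
    """
    if not responses:
        raise ValueError("no responses to score")
    counts = Counter(responses)
    n = len(responses)
    result = {}
    running = 0
    # Walk from the strictest criterion (V) to the most lenient (A).
    for category in reversed(CATEGORIES):
        running += counts[category]
        result[category] = running / n
    return result


# Hypothetical scored responses (illustrative only).
scores = ["V", "M", "M", "I", "P", "A", "M", "I", "A", "V"]
print(cumulative_proportions(scores))
# {'V': 0.2, 'M': 0.5, 'I': 0.7, 'P': 0.8, 'A': 1.0}
```

Each step down the scale relaxes the criterion by one category, so the proportions can only stay the same or grow; this is why the V, M and I columns of a cumulative table never decrease from left to right.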
2.2 Experiment 1
The measures described above were applied to all response attempts of the first experiment. Cumulative proportions for three of the categories are shown in Table 1. These data show that as the criterion is relaxed, the relationship between sentence type and retention remains essentially unchanged. The nature of this relationship is most clearly seen in Figure 1, which shows the percentage of M responses on successive recall attempts for both Normal and Marked sentences.
Table 1. Recall data - Experiment 1: Cumulative recall categories

| Type | Condition | V | M | I |
|---|---|---|---|---|
| Normal | Unaided | .05 | .23 | .32 |
| Normal | Prompt N1 | .10 | .55 | .63 |
| Normal | Prompt N2 | .17 | .68 | .72 |
| Marked | Unaided | .03 | .23 | .30 |
| Marked | Prompt N1 | .05 | .42 | .53 |
| Marked | Prompt N2 | .07 | .43 | .58 |

Note: The column categories represent successively relaxed response criteria with cumulative proportions. The prompting conditions are for the first prompting only.
There are two rather interesting aspects to Figure 1. First, Normal sentences are clearly better retained than Marked sentences; second, this superiority appears only under conditions of prompting. When Ss are initially asked to recall all sentences, they are equally able to produce the meaning of Normal and Marked sentences. However, when first one noun and then the other is used to prompt S, there is a significantly greater improvement for Normal sentences than for Marked sentences. The low-level conclusion is that both subject and object (or actor and recipient) nouns somehow provide better access to the meaning of sentences in which the actor and the subject are the same than to sentences in which the actor is not the subject.
[Figure 1. Retrieval of meaning on successive recall attempts: percentage of M responses for Normal and Marked sentences; x-axis: successive recall attempts (unaided, first prompt, second prompt).]
Data on the other variable indicate that it did not much matter which noun was used as a prompt. Here the measures are taken only on the first prompted recall trial, since a second prompted recall is rather difficult to interpret. The improvement in M recall from the unaided trial to the first prompted trial actually provides the best indication of cue effectiveness. On the first prompted recall, for Normal sentences the proportion of M responses was .55 when the first noun was the prompt and .68 when the second noun was the prompt. For Marked sentences these figures were .42 and .43, respectively. A two-factor analysis of variance for repeated measures showed that this M improvement measure was affected significantly by sentence type (Normal better than Marked; F = 5.27, p < .05), but only marginally by prompt (first noun vs. second noun; F = 3.20, p < .10). The interaction was not significant (F < 1). A similar picture emerges if absolute performance on the first prompting trial is measured instead of improvement, except that the error variance is somewhat greater and the prompt factor has an F of less than one.
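The improvement measure can be illustrated with the group means in Table 1. The Python sketch below is a hypothetical illustration (the names `TABLE1_M` and `m_improvement` are not from the paper): it subtracts the unaided M proportion from the first-prompt M proportion for each sentence type and prompt. The analysis of variance reported above was computed on repeated-measures data, not on these means, so the sketch shows only the direction and size of the mean gains and does not reproduce the F tests.

```python
# M proportions transcribed from Table 1 (first prompting only).
TABLE1_M = {
    "Normal": {"unaided": 0.23, "N1": 0.55, "N2": 0.68},
    "Marked": {"unaided": 0.23, "N1": 0.42, "N2": 0.43},
}


def m_improvement(table):
    """Gain in M proportion from the unaided trial to the first prompt.

    Results are rounded to two decimals, matching the precision of the
    proportions in the table.
    """
    return {
        sentence_type: {
            prompt: round(row[prompt] - row["unaided"], 2)
            for prompt in ("N1", "N2")
        }
        for sentence_type, row in table.items()
    }


print(m_improvement(TABLE1_M))
# {'Normal': {'N1': 0.32, 'N2': 0.45}, 'Marked': {'N1': 0.19, 'N2': 0.2}}
```

The mean gains are larger for Normal than for Marked sentences under both prompts, which is the sentence-type effect; the smaller N1 versus N2 difference corresponds to the marginal prompt effect.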
2.3 Experiment 2
Experiment 2 was designed to get a relatively simple measure of noun recall given a noun prompt, without requiring S to produce the whole sentence. Thus, on the one hand, it provides a measure of which characteristic of the noun (syntactic category or semantic function) is important for its prompting effectiveness; on the other hand, it provides a type of control for the first experiment. That is, are differences due to sentence type and prompt strictly related to the production of meaning in recalling sentences, or are they present in noun-noun recall? It is possible, of course, to get similar data from Experiment 1. First, in unaided recall, what is the probability that one noun is recalled given that the other noun was also recalled? In prompted recall, the measure is the presence of the other noun, or an acceptable substitute, given one noun as a prompt, ignoring the rest of the recall content. This is strictly analogous to the data of Experiment 2. The conditional probabilities are shown in Tables 2 and 3. Table 2 shows the unaided recall trial of Experiment 1, where the measure is the presence of the exact noun or a meaning-preserving substitute. (This liberal measure is required because the main interest of this research is meaning preservation, not verbatim recall, although the relative figures are the same when the verbatim requirement is imposed.) The first two columns show that there was virtually no difference in the probability of recalling the first versus the second noun, and no difference for Normal versus Marked sentences. The last two columns show the very high degree of noun integration present in both types of sentences. Recall is essentially all or none, in the sense that if one noun is recalled, both are recalled. Differences between (N2/N1) and (N1/N2) are not significant. It is important to note this all-or-none characteristic, since the integration of a memory unit, as indicated by conditional recall probabilities of elements within the unit, has been related to which part of the unit is a good prompt for the whole unit (Horowitz and Prytulak, 1969). Specifically, for a well-integrated unit, the best prompt is the most available part of the unit, whereas for a poorly integrated unit the best prompt is the least available part. Integrated recall is also evident in this experiment when the whole sentence is considered. In the unaided condition, the recall of a whole sentence which preserves meaning or is implicationally related to the target is about 74% given any recall at all. And this underestimates the degree of integration, in the sense that it excludes many whole sentences which are sensible and related to the original context of the target sentence. Table 3 shows the conditional prompted recall of nouns alone for both Experiments 1 and 2. The fact that in Experiment 1 whole-sentence recall was required made no difference. When prompted to recall the whole sentence, S was as able to produce the other noun from the sentence as when he was asked to produce only the other noun. Since instructions at input were the same in the two experiments, it is a compelling conclusion that the stored meaning of the sentence mediated noun recall in Experiment 2: subjects recalled the sentence given the noun prompt, and then produced the other noun. There is no significant difference owing to sentence type, and there is no significant difference owing to prompt type (F < 1). The pattern for prompt type is the same as for the M measure, but with the difference even smaller.

Table 2. Unaided noun recall - Experiment 1

| Sentence type | (N1) | (N2) | (N2/N1) | (N1/N2) |
|---|---|---|---|---|
| Normal | .38 | .36 | .85 | .93 |
| Marked | .41 | .38 | .91 | .98 |

Note: The cell entries are proportions. The conditionalized proportions in the right half of the table are based on the recall of the first-named noun given that S recalled the second-named noun.

Table 3. Prompted noun recall

| Experiment | Sentence type | (N2/N1) | (N1/N2) | (Ave.) |
|---|---|---|---|---|
| Exper. 1 | Normal | .70 | .77 | (.735) |
| Exper. 1 | Marked | .70 | .73 | (.715) |
| Exper. 2 | Normal | .72 | .78 | (.750) |
| Exper. 2 | Marked | .72 | .75 | (.735) |

Note: The first column is recall of N2 given N1 as a prompt. The second column is the inverse.
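The conditional recall measures reported in Tables 2 and 3 can be sketched as follows. This Python function is an illustrative assumption, not the author's scoring procedure: each trial is represented as a record of whether each noun (or an acceptable substitute) appeared in the response, and the trial list is hypothetical, invented purely to show the computation.

```python
def conditional_recall(trials, target, given):
    """Return P(target recalled | given recalled) over a list of trials.

    Each trial is a dict such as {"N1": True, "N2": False}, recording
    whether each noun (or an acceptable substitute) was recalled.
    Returns None when the conditioning noun was never recalled, since
    the conditional probability is then undefined.
    """
    conditioning = [t for t in trials if t[given]]
    if not conditioning:
        return None
    # True counts as 1 and False as 0 when summed.
    return sum(t[target] for t in conditioning) / len(conditioning)


# Hypothetical unaided-recall trials (illustrative only).
trials = [
    {"N1": True, "N2": True},
    {"N1": True, "N2": True},
    {"N1": True, "N2": True},
    {"N1": True, "N2": False},
    {"N1": False, "N2": False},
]

print(conditional_recall(trials, target="N2", given="N1"))  # 0.75
print(conditional_recall(trials, target="N1", given="N2"))  # 1.0
```

In prompted recall the conditioning noun is the prompt itself and is always present, so the measure reduces to the proportion of trials on which the other noun was produced. The (Ave.) column of Table 3 is simply the mean of the two conditional columns; for example, (.70 + .77) / 2 = .735 for Normal sentences in Experiment 1.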
3. Conclusion
The results of these two experiments suggest a way to think about sentential relations and the effect of sentence probes. When sentences are well integrated, as they were in the present experiments, there is no superiority in noun prompt effectiveness owing to either surface or deep syntactic category, where the categories are restricted to subject and object. Nor is there any superiority owing to the semantic role of actor or recipient. Rather, the subject of a sentence is a better prompt if it is an agent, and the object is better if it is a recipient. The fact that there were no differences in either free sentence recall or free noun recall suggests that the subsequent differences in prompted recall are related to the structural relations to which the retrieval cues belong, rather than to the general availability of the stored meanings. The retrieval power of the nouns thus appears to be associated with the structural relations they enter into and, in particular, is greatest when they are associated with their normal semantic and syntactic functions. It is tempting, of course, to suggest that these results are significant for questions concerning the linguistic nature of deep structure. The interesting question of whether relational information contained in sentences is stored in a form more similar to case grammar relations or more similar to Chomskyan deep syntax can be raised, although no unequivocal answer can be given. Consider two of the examples cited previously: (a) The Mayor publicly denounced the Governor and (b) The Mayor withstood great pressure from the Governor. These sentences can be formally described by phrase structure grammars with NP in the first position, or as some configuration of cases (Fillmore, 1971), or as some other verb-first structure (McCawley, 1970). Without regard to the variety of possible structural descriptions, there are two prominent hypotheses of significance. The Case Hypothesis assumes that actor (agent) is a universal relation of significance for cognitive organization and predicts that it should provide the most powerful retrieval cue: Mayor should be best in (a), while Governor should be best in (b).
The Deep Syntax Hypothesis assumes the cognitive significance of the functional grammatical relations that result from the base rules of transformational grammar. It predicts that deep subject should be a more powerful prompt; thus Mayor should be better than Governor in both (a) and (b). The data do not support one of these hypotheses over the other. They do not even support the prediction the two hypotheses make in common, viz. that Mayor should be a more effective prompt than Governor in sentence (a). The fact is that neither case role nor grammatical function can independently predict this pattern of results. Thus, neither linguistic hypothesis is adequate without additional assumptions. A satisfactory model of the structural relations used in language processing may have to represent information that is usually sharply divided into semantic and syntactic kinds as available in some interactive (non-independent) form.
REFERENCES
Anderson, J. M. (1971) The grammar of case: Towards a localistic theory. Cambridge, Cambridge University Press. Blumenthal, A. L. (1967) Prompted recall of sentences. J. verb. Learn. verb. Beh., 6, 203-206. Blumenthal, A. L. and Boakes, R. (1967) Prompted recall of sentences. J. verb. Learn. verb. Beh., 6, 674-675. Fillmore, C. (1968) The case for case. In E. Bach and R. T. Harms (Eds.) Universals in linguistic theory. New York, Holt, Rinehart, and Winston. Horowitz, L. M. and Prytulak, L. S. (1969) Redintegrative memory. Psychol. Rev., 76, 519-531. McCawley, J. D. (1970) English as a VSO language. Language, 46, 286-299. Perfetti, C. A. (1969) Lexical density and phrase structure depth in sentence retention. J. verb. Learn. verb. Beh., 8, 719-724.
This article describes two experiments concerning, on the one hand, cued recall of sentences presented in a particular context and, on the other, uncued recall of the same sentences. Certain key words in these sentences were introduced so as to have the same surface role while belonging to independent syntactic categories and semantic functions. For sentences in which the semantic functions of actor and recipient coincided, respectively, with the syntactic functions of deep subject and object, cued recall was better than for sentences that did not show this normal semantic-syntactic coincidence. Uncued recall was the same in both cases. Both kinds of information normally used in everyday language would therefore appear to be indispensable in models of sentence processing.
Time, tense and aspect*
J. P. BRONCKART and H. SINCLAIR
Université de Genève
Abstract The present paper investigates the use of French verbal forms by children between the ages of 2,11 and 8,7. An experiment is presented demonstrating that these subjects do not use tenses only to indicate the relationship of posteriority, anteriority or simultaneity between the events described and the moment of enunciation, but that aspectual factors also intervene. Seventy-four subjects were asked to describe eleven actions performed by the experimenter with toys; these actions differed in type of result, frequence and duration. For all subjects the type of result influences the choice of the verb forms. More objective features (frequence and duration) exert an influence between the ages of 3 and 6; after that age, the use of tenses begins to resemble adult usage, in which the different verb forms are mainly employed to express temporal relationships. Other aspectual and temporal markers show a similar development with age.
Recently the study of language has acquired new epistemological importance, since it has become widely accepted that every problem of language use and acquisition is a problem of cognition as much as of the mastery of verbal expressions. Experimental work on psycholinguistic problems is often done from a particular epistemological point of view, and to clarify our own theoretical position a few general remarks seem to be in order. The basic assumptions underlying our psycholinguistic research have been mainly derived from Piaget’s theory of cognitive development and may be briefly stated as follows :
* This study was partly supported by the F.N.R.S. (Belgium) and by grant G69-469 from the Foundation’s Fund for Research in Psychiatry. Authors’ address: École de Psychologie, Université de Genève, Palais Wilson, 52 rue des Pâquis, 1211 Geneva 14, Switzerland.
Cognition 2(1), pp. 107-130
1) Starting from isolated action-patterns, more and more general coordinations are achieved, first on the level of pre-verbal practical intelligence, then on the level of intuitive thought, followed by a first period of thought operations based on a logically coherent, but still limited, system, and later on the level of hypothetico-deductive thought.
2) These action and thought patterns have two aspects as far as the knowledge resulting from them is concerned. On the one hand, any particular action (perceptual, manipulatory or mental) can lead to better knowledge of the outside world, i.e., of the properties of objects and their interactions. On the other hand, the coordinations of action and thought patterns have certain general characteristics, leading to reversibility, transitivity, commutativity, etc. These general features, not of the patterns themselves but of the ways in which they can be combined, are the source of logico-mathematical knowledge, that is, of the structure of the conceptual framework that permits coherent mental computations. The development of these two complementary aspects is itself contrastive and complementary. In fact, logical operations are more powerful the more general they are, whereas knowledge of object properties is more powerful the more specific it is. Classificatory patterns that can deal with the characteristics common to one class are less powerful than those that can deal with relationships between classes. By contrast, knowing that some objects are ‘throwable’ and others are not is a less powerful physical concept than that of weight or of density, which are more specific.
3) Verbal patterns are constructed in close connection with cognitive development (though their construction certainly cannot be reduced to cognitive development alone). Language proper starts after practical intelligence has reached both a certain level of knowledge of object properties (e.g., their ‘retrievability’, as in Piaget’s object-permanency task) and a certain level of action-patterns (the recapitulation and coordination of the different movements by which the object was hidden in the same task). What permits the child of whatever culture to start language acquisition is the fact that he has this heuristic model at his disposal, which leads him to approach the data in a certain manner, and the fact that every language belongs to a class of possible languages, these likewise being determined by universal characteristics of the human mind.
4) During the first period of language acquisition the child elaborates the most fundamental grammatical relationships (apparently universal, cf. Slobin, 1973) as they are reflected in his mother tongue. After the elaboration of these fundamental grammatical relationships, the particularities of the mother tongue become more important. As happens with other cognitive acquisitions, morpho-syntactic patterns may at first form isolated totalities, applicable only to certain sentence-types. The treatment of the inflections, endings, etc., present in the mother tongue demands a cognitive activity of pre-inferences and inferences, which depends on the child’s cognitive competence. Nevertheless, languages can present systems of very different complexity to express the same relationships; as Slobin (1973) remarks, the linguistic means can be extremely complex while the relationships expressed are simple (as in the Arabic plural system, cf. Omar, quoted by Slobin, 1973); other linguistic sub-systems, though structurally simple, are used to express complicated relationships (such as logical implication or quantitative concepts).
5) As his cognitive competence grows, the child will make hypotheses about the meaning of linguistic features he encounters in adult speech. However, when he starts producing these features, one cannot conclude that his idea about their function conforms to the function these features have for adult speakers. Even when he ‘understands’ utterances comprising such features in everyday conversation (which is always underpinned by pragmatic and situational data), and even if he produces them in grammatically correct sentences, the child may very well have ‘understood’ them in a different way from the adult, a fact that can only be brought to light by careful experimentation.
An example of this last point is provided by the way young children use the French verb system. In several experiments we observed that young children seem to use French conjugation in a rather peculiar manner, even though at a very early age (three, or even before) most of the frequently used forms are morphologically correct. Observational records show present, past and some future tenses (cf. Grégoire, 1947); the absence of lesser-used forms such as the pluperfect or the future perfect need not surprise us.
In experimental production tasks (description of certain events), however, we noted that children had a preference for certain tenses with certain verbs; e.g., they almost always said il lave la voiture (‘he is washing the car’) and il a poussé la balle (‘he kicked the ball’), though the description was always requested after the event had taken place. We also noted that in the description of simple events younger children seemed to single out features different from those most frequently described by older children. Three-year-olds, for example, after having been shown a red truck pushing a green car, would say ‘there’s a green car’; slightly older subjects were fond of announcing ‘there’s something wrong with the car’ or ‘the car is in the garage’. Such differences did not surprise us unduly; obviously any event has many features that can be chosen for description, and according to age and familiarity with the situation one of them may be much more interesting than another. The choice of specific linguistic means to describe the event, particularly the apparently deliberate choice of certain tenses, is more surprising. The French verb system is complex and provides the speaker with a choice of tense, voice and mood. No neutral form of the verb exists (in contrast with some other languages); a choice always has to be made among the different possible verb endings, and this choice almost always implies the indication of a temporal relationship (i.e., the speaker indicates the temporal relation between several events or between the moment of utterance and some event). By contrast, what are called aspectual functions are only rarely, and never obligatorily, expressed by verbal forms.
Time and aspect have traditionally been considered as constituting two distinct categories, the first accounting for relations of anteriority, posteriority and simultaneity, and the second being variously described as ‘accounting for relations of the type perfective-imperfective’ (Meillet, 1922) or ‘expressing temporal contours of actions’ (Hockett, 1958). Aspect comprises several distinctions (durative versus non-durative, perfective versus imperfective, etc.). Adult speakers of French use verb endings almost exclusively to express temporal relationships; the latter can, of course, also be expressed by many other means. Aspectual shadings are introduced by adverbs, adverbial phrases and, up to a point, also by certain verbal forms, such as finir de, être en train de, and certain uses of the passé simple and the imparfait. However, the specific temporal function of these endings (which allows native speakers to start a sentence with a temporally posterior event and finish it with an anterior event while still making the correct succession clear) is not established in children until relatively late, as has been shown by Ferreiro (1971). If the role of tenses as indicators of the temporal relationship between events only begins to be apprehended from the age of six onwards, what is their role before that age? Do they belong to the deictic category (linking the description to the moment of utterance) before they can be used to indicate the temporal link between two propositions? Or are these tenses used more or less haphazardly? Or is the use of tense by young children an example of Slobin’s (1973) dictum ‘New forms first express old functions, and new functions are first expressed by old forms’? Several observations, and especially some of Ferreiro’s results, seem to point to the possibility that verb forms are at first used to express aspectual features rather than temporal relationships. In certain situations, Ferreiro presented two simultaneous events: e.g., a boy-doll pushed a truck (long duration), and at some point during this action a cat knocked over a bottle (short duration). This was the only situation in which her youngest subjects systematically used two different tenses, a présent or imparfait for the long-duration event and a passé composé for the short-duration one. The children never added any connecting words such as pendant que (‘while’), and from their responses in the other, successive situations, it did not seem as if they were trying to express a temporal relationship of simultaneity between the two events. Rather, they appeared to express what Hockett calls the ‘temporal contour’ of each event separately.
We therefore decided to study in detail the use of verb forms in the description of events, presenting certain situations comparable to Ferreiro’s simultaneous situations, but mainly concentrating on single events and the descriptions the children gave of these events, without the experimenter introducing any specific constraints. The events were apprehended either visually (movements of toys) or auditorily (sounds produced by dolls or toy animals). They differed in certain characteristics: some actions led to a clear result, others did not; some took a certain time, others were almost instantaneous; some actions were repetitive, others were not. All actions presented could be expressed by commonly used verbs whose past tense formation presents no problem and whose passé composé or imparfait forms have often been observed in the spontaneous utterances of very young children. Though the full French verb system is complex, especially as regards agreement constraints in coordinate and subordinate clauses, our experimental procedure only concerned a small and simple part of this system, i.e., third person past or present indicatives.
1. Experimental procedure
1.1 Population
We interviewed 74 children between 2,11 and 8,7 from public nursery schools, kindergartens and primary schools of Onex (Geneva), constituting a homogeneous socio-economic group of working-class background. They can be divided into the following groups, corresponding generally to degrees of schooling:
Age group 1: 2,11-3,11 (average age 3,7); N = 15
Age group 2: 4,3-4,11 (average age 4,7); N = 15
Age group 3: 5,1-5,11 (average age 5,6); N = 16
Age group 4: 6,2-6,11 (average age 6,6); N = 15
Age group 5: 7,1-8,7 (average age 7,8); N = 13
1.2 Material
A collection of toys : Dolls (two boys and two girls and in addition one girl and one baby doll provided with a transistorized system to produce sounds of variable duration), five cars, two trucks, one cat, one sheep (which produces a sound when pressed), one duck, one fish, a basin of water, a farm with several animals (horse, cow, dog, etc.), several fences, a pram, a ball, several bottles and boxes, a football goal, a garage, etc.
1.3 Production task
The usual Genevan technique for obtaining utterances was employed, with a certain number of special precautions. The experimenter first shows the toys to the children, asks them to name the toys, and explains what they are if the children do not seem to recognize them. Secondly, the experimenter says: ‘I’m going to do something with the toys, and you are going to look carefully at what happens, and after that, you are going to tell me everything. Try not to forget anything.’ The experimenter then proceeds to perform some actions with the toys (e.g., the dog knocks over the bottle, the horse jumps over the fence, the boy plays with the ball). He induces the child to give an adequate description of the event, and he makes it quite clear that the child has to wait till the action is finished before starting his description. Once the children have got used to this method, the presentation of the experimental situation starts, and from then on the experimenter performs the action, puts down the toys, and simply says raconte (‘tell me’ or ‘tell me the story’; there is no current one-word English equivalent for this French expression). The systematic use of the simple imperative form avoids suggesting particular tenses, as instructions such as ‘What did you see?’ or ‘What happened?’ might. We noted, in fact, that young children tend to use the same tense as the experimenter when faced with such questions. After the child has given his first description, the experimenter asks whether he has anything else to add. If the first description does not mention the action (e.g., ‘the horse is near the farm’), the experimenter tries to obtain mention of the action (always taking care not to use any conjugated verb-forms). All utterances are tape recorded. Each child was also tested for his understanding of verbal forms in a comprehension task, but these results will be described and discussed in other papers.
1.4 Experimental design of the production task
Many of Piaget’s and Inhelder’s experiments (cf. 1946) have shown age differences in the way young children apprehend simple actions. Very often, only the end result is focused on by the youngest children, followed by a focus on the initial state of affairs; only later will their focus shift to the transforming action itself. A subsequent mental link between the three different points of centration will result in an understanding of the problem. This general phenomenon in cognitive development led to the choice of three types of events:
1) Six actions which give a clear result, cover a certain distance in space, and terminate at a predetermined spot. These actions will be called perfective events.
2) Two actions which do not have any result, consisting of more or less circular movements of animals in their natural habitat. These actions will be called imperfective events.
3) Three events that are perceived auditorily rather than visually, consisting of animal or human cries. Since for this type of action the distinction between perfective and imperfective is not pertinent, they will be called aperfective events.
Within these three categories, events varied in the following features:
- The duration of the action varied from ½ second to 15 seconds. Fraisse (1948) has demonstrated that actions lasting less than 3 seconds are apprehended in a very different manner from those lasting more than 3 seconds; the former produce a relatively simultaneous perception of duration, the latter lead to a quantitative or qualitative estimation of duration. Accordingly, events that last less than 3 seconds were called non-durative; those that last longer than 3 seconds were called durative.
- Actions were either repeated or continuous, and were called frequentative or non-frequentative respectively.
- Some actions attained an explicitly stated aim, others did not. This feature distinguishes events 3 and 4 from events 5 and 6 (see below), which in all other respects are similar.
We thus have eleven simple events. Eleven ‘double’ actions were also presented, but these will not be discussed in the present paper.
A. List of simple events¹
1) P/D (10 sec.)/nF; a truck slowly pushes a car towards a garage.
2) P/nD (1 sec.)/nF; a car hits a marble which very rapidly rolls into a pocket.
3) P/D (10 sec.)/F/S; the farmer jumps over ten fences and reaches the farm.
4) P/nD (2 sec.)/nF/S; the farmer’s wife jumps in one big jump over ten fences and reaches the farm.
5) P/D (5 sec.)/F/nS; the cow jumps over five fences and does not reach the stable.
6) P/nD (1 sec.)/nF/nS; the horse jumps over one fence and does not reach the stable.
7) I/nD (3 sec.)/nF; a fish swims in the basin (circular movement).
8) I/D (15 sec.)/nF; a duck swims in the basin (circular movement).
9) A/nD (½ sec.)/nF; the sheep bleats once.
10) A/D (8 sec.)/F; the cat cries eight times (cries: ½ sec., intervals: ½ sec.).
11) A/D (8 sec.)/nF; the baby emits a long wail.
B. Order of presentation
The eleven simple events are grouped into three series:
- actions 1, 2, 7, 8, i.e., non-frequentative perfective and imperfective events;
- actions 3, 4, 5, 6, i.e., frequentative or non-frequentative, successful or unsuccessful perfective events;
- actions 9, 10, 11, i.e., aperfective events.
All subjects are first presented with series 1, then with series 2, and finally with series 3. Within each series, however, the items are rotated.
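The presentation scheme can be sketched as a small function. This is a hypothetical illustration: the text says only that items are “rotated” within each series, so a cyclic rotation keyed to a subject index is assumed here, and the names `SERIES`, `presentation_order` and `subject_index` are ours, not the authors’.

```python
# Hypothetical sketch: series order is fixed (1, then 2, then 3); the paper does
# not say how items were rotated, so a cyclic shift per subject is ASSUMED here.
SERIES = [
    [1, 2, 7, 8],  # series 1: non-frequentative perfective and imperfective events
    [3, 4, 5, 6],  # series 2: (non-)frequentative, (un)successful perfective events
    [9, 10, 11],   # series 3: aperfective events
]


def presentation_order(subject_index: int) -> list[int]:
    """Return one subject's event order: series kept in order, items rotated."""
    order: list[int] = []
    for series in SERIES:
        shift = subject_index % len(series)  # assumed cyclic rotation
        order.extend(series[shift:] + series[:shift])
    return order


for subject in range(3):
    print(subject, presentation_order(subject))
```

Under this assumption every subject still sees all four events of series 1 before any event of series 2, which is the one constraint the text states explicitly.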
1. Key: P = perfective; I = imperfective; A = aperfective; D = durative; F = frequentative; S = success; n = not.
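The event labels in the list of simple events can be decoded mechanically with this key. The following Python sketch is illustrative only (the function names are hypothetical, not part of the study). It also encodes the 3-second boundary between non-durative and durative events, treating an event of exactly 3 seconds as non-durative; that is an assumption, chosen so that event 7’s label (I/nD, 3 sec.) comes out consistently.

```python
CATEGORIES = {"P": "perfective", "I": "imperfective", "A": "aperfective"}
FLAGS = {"D": "durative", "F": "frequentative", "S": "success"}


def decode(label: str) -> dict:
    """Decode a label such as 'P/nD/nF/S'; a leading 'n' negates a feature."""
    category, *features = label.split("/")
    result = {"category": CATEGORIES[category]}
    for feature in features:
        negated = feature.startswith("n")
        letter = feature[1:] if negated else feature
        result[FLAGS[letter]] = not negated
    return result


def is_durative(seconds: float, threshold: float = 3.0) -> bool:
    # Events longer than 3 s count as durative (after Fraisse, 1948, as cited);
    # exactly 3 s is treated as non-durative here, which is an assumption.
    return seconds > threshold


print(decode("P/nD/nF/S"))
print(decode("A/D/F"), is_durative(8))
```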
2. Results
A small number of descriptions did not mention the action; e.g. (3,10): le camion et la voiture (‘the truck and the car’). A few of the youngest subjects gave such ‘static’ descriptions for certain actions, and no repetitions or suggestions by the experimenter could bring them to give ‘dynamic’ descriptions. Twenty-six static descriptions were noted, and all analyses of verb forms concern the 788 ‘dynamic’ descriptions.
2.1 Analysis of tenses used for the three major categories of events
Four tenses appeared in the descriptions: passé composé (N = 465, 59%), présent (N = 286, 36%), and imparfait or plus-que-parfait (N = 37, 5%).
2.1.1 Perfective events
For these actions, we note a very frequent use of the passé composé: 78% of all verbs used are in this tense, 19% are in the présent and 3% in the imparfait. Table 1 shows the distribution of these tenses in the six perfective events.

Table 1. Distribution of tenses* in the description of the 6 perfective events

nD events         P-C   Pr.   I     T
2. P/nD/nF         61     8   3    72
4. P/nD/nF/S       61    10   0    71
6. P/nD/nF/nS      67     5   1    73
T nD events       189    23   4   216

D events          P-C   Pr.   I     T
1. P/D/nF          41    24   7    72
3. P/D/F/S         50    22   1    73
5. P/D/F/nS        56    13   1    70
T D events        147    59   9   215

Difference between the results for the 3 nD events: χ² = 1,89; p > .05
Difference between the results for the 3 D events: χ² = 8,72; p < .05
Difference between the results for T nD events and T D events: χ² = 22,98; p < .01
* Note: In this and all following tables, P-C = passé composé, Pr. = présent, I = imparfait.
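The percentages quoted at the start of section 2.1 follow directly from the counts. As a quick arithmetic check (a sketch; the dictionary and function names are ours), the overall figures use the totals given in the text, and the perfective figures pool the T nD and T D rows of Table 1.

```python
def percentages(counts: dict[str, int]) -> dict[str, int]:
    """Share of each tense, rounded to the nearest whole per cent."""
    total = sum(counts.values())
    return {tense: round(100 * n / total) for tense, n in counts.items()}


# All 788 dynamic descriptions (totals given in section 2.1).
overall = {"passé composé": 465, "présent": 286, "imparfait/plus-que-parfait": 37}

# Perfective events: T nD row plus T D row of Table 1 (216 + 215 = 431 verbs).
perfective = {"passé composé": 189 + 147, "présent": 23 + 59, "imparfait": 4 + 9}

print(percentages(overall))     # {'passé composé': 59, 'présent': 36, 'imparfait/plus-que-parfait': 5}
print(percentages(perfective))  # {'passé composé': 78, 'présent': 19, 'imparfait': 3}
```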
Passés composés are used more frequently for the description of nD events than for D events (the χ² test shows a very significant difference²). The differences between the results for the three D actions are significant, but those for the three nD actions are not. The analysis by age groups (percentages of tenses used by each of these groups) shows another difference between D actions and nD actions (cf. Fig. 1). For nD actions, we note no developmental trend; all age groups give the same type of answers and the proportion of passé composé to other tenses remains constant (90%). In the description of D actions, children of the first age group produce, generally speaking, the same proportion of passé composé and présent, but the use of the présent diminishes with age, whereas that of the passé composé increases.
2. All significance tests applied are χ²; differences are labelled non-significant when p > .05, significant when p < .05, and very significant when p < .01.
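The χ² values reported with Table 1 can be recomputed from its counts with Pearson’s statistic, Σ(O − E)²/E, where each expected count E is the row total times the column total divided by the grand total. The Python sketch below (function names ours; critical values from standard χ² tables) appears to reproduce, to within rounding, the pooled nD-versus-D value of 22,98 when all three tense columns are kept, and the value of 8,72 for the three D events when the imparfait is merged with the présent, the grouping the authors use for Tables 4 and 5. The labels follow the conventions of footnote 2.

```python
def chi_square(table: list[list[int]]) -> float:
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Critical chi-square values at p = .05 and p = .01, keyed by degrees of freedom.
CRITICAL = {1: (3.841, 6.635), 2: (5.991, 9.210)}


def label(stat: float, df: int) -> str:
    """Apply the paper's wording: non-significant / significant / very significant."""
    p05, p01 = CRITICAL[df]
    if stat > p01:
        return "very significant"
    if stat > p05:
        return "significant"
    return "non-significant"


# Rows: T nD events, T D events; columns: P-C, Pr., I (Table 1). df = 2.
pooled = chi_square([[189, 23, 4], [147, 59, 9]])
# Rows: events 1, 3, 5; columns: P-C, Pr+I. df = 2.
durative = chi_square([[41, 24 + 7], [50, 22 + 1], [56, 13 + 1]])
# Rows: events 2, 4, 6; columns: P-C, Pr+I. df = 2.
non_durative = chi_square([[61, 8 + 3], [61, 10 + 0], [67, 5 + 1]])

for name, stat in [("nD vs D", pooled), ("D events", durative), ("nD events", non_durative)]:
    print(f"{name}: chi2 = {stat:.2f} ({label(stat, 2)})")
```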
Figure 1. The proportion by age of the use of the P-C tense in the description of nD events (A) and D events (B). [Panel A: events 2, 4 and 6; panel B: events 1, 3 and 5; horizontal axis: age groups I-V; vertical axis: 0-100%.]
These results justify a subdivision of category A (perfective events): sub-category A1 comprises perfective non-durative actions, and A2 perfective durative actions. In A1, we thus have an almost exclusive use of the passé composé; this is particularly clear for event 6 (P/nD/nF/nS). Some examples of descriptions given for this event at different ages are the following: (2,11) il est là ... il a sauté (‘he’s there ... he jumped’). (4,5) le cheval, il a sauté après la barrière (‘the horse, he jumped over the fence’). (7,8) il a sauté la barrière d’un coup, et il attend un peu (‘he jumped the fence in one go, and now he waits a bit’). For event 4 (P/nD/nF/S) and event 2 (P/nD/nF), results are almost identical, with two exceptions: for event 2, the use of the présent increases to 30% in age group 4; for event 4, the use of the présent increases to 36% in age group 2. This second increase may be due to the fact that this action took objectively more time (2 sec.) than the others (1 sec.). However, this does not explain why the increase occurs only at that age.
In A2, the descriptions given are mostly in the passé composé, and sometimes in the présent. The following are some examples of descriptions for event 3 (P/D/F/S): (3,?) Y monte sur les barrières ... il est à la maison (‘he mounts the fences ... he is at home’). (4,11) Il va chez lui, et il est là, là, là et là (‘he goes home and he is there, there, there and there’). (3,5) Le monsieur, il a marché (‘the man, he walked’). (5,2) Il a sauté sur chaque barrière et il est allé dans la ferme (‘he jumped over each fence and he went into the farm’). Beyond the age of 6, all descriptions are of the following type: (6,8) il a sauté sur toutes les barrières (‘he jumped over all the fences’). For event 5 (P/D/F/nS) and event 1 (P/D/nF), the descriptions show the same pattern, with two exceptions:
- event 1 gives rise to a considerable proportion of présent and imparfait in age group 4 (45%); for action 2, we noted the same phenomenon (see below);
- event 5 brings fewer présents and imparfaits than the two other actions in this category in each age group; this could be explained by the fact that this action objectively takes a shorter time (5 sec.); further analysis will confirm this hypothesis.
2.1.2 Aperfective events
These events are described with almost equal frequency in the passé composé (52%) and the présent (45%); only 3% of all descriptions are in the imparfait. Table 2A shows the distribution of these tenses for the three aperfective events; the proportion of passé composé appears higher for event 9 (A/nD/nF) than for event 10 (A/D/F), and higher for event 10 than for event 11 (A/D/nF). However, these differences in the frequency of the passé composé are statistically non-significant.

Table 2. Distribution of tenses in the description of the 3 aperfective events (A) and the 2 imperfective events (B)

A. Aperfective     P-C   Pr.   I     T
9. A/nD/nF          44    28   2    74
10. A/D/F           39    30   2    71
11. A/D/nF          30    40   2    72
T Aperfective      113    98   6   217

B. Imperfective    P-C   Pr.   I     T
7. I/nD/nF           8    55   6    69
8. I/D/nF            8    51  12    71
T Imperfective      16   106  18   140

Difference between the results for the 3 aperfective events: χ² = 4,98; p > .05
Difference between the results for the 2 imperfective events: χ² = 2,08; p > .05
The analysis by age group again shows a trend towards more descriptions in the past tense for the older subjects, and a corresponding decrease in present tense descriptions (cf. Fig. 2).
Figure 2. The proportion by age of the use of the P-C tense in the description of aperfective events. [Events 9, 10 and 11; horizontal axis: age groups I-V; vertical axis: 0-100%.]
Examples of descriptions given for event 10: (3,10) Il crie plein des fois (‘he cries a lot of times’). (3,7) Il a parlé, il a crié (‘he said something, he cried’). (5,3) Y fait de la musique (‘he makes music’). (5,7) Il a miaulé beaucoup de fois (‘he miaowed many times’). (7,6) Le mouton a crié (‘the sheep cried’).
2.1.3 Imperfective events
These two events give rise to 76% présents, 13% imparfaits and 11% passés composés. Table 2B shows very small (statistically non-significant) differences between the descriptions of the two events. As Figure 3 shows, no developmental trends are discernible, except that from six years onwards some children use the imparfait.
Figure 3. The proportion by age of the use of the P-C tense in the description of imperfective events. [Horizontal axis: age groups.]
Examples of descriptions given for event 8 (I/D/nF): (3,7) Le canard, y flotte (‘he’s floating, the duck’). (?,7) Il y a un poussin qui est sur l’eau, il s’amuse sur l’eau (‘there’s a chicken on the water, he plays on the water’). (6,6) Le canard navigue dans l’eau (‘the duck sails in the water’). (7,?) Dans le lac, il y avait un canard qui nageait (‘in the lake, there was a duck that was swimming’).
2.1.4 Comparison between categories A, B and C
The division into categories A, B and C was based on the ‘perfective’, ‘aperfective’ and ‘imperfective’ character of the events presented, and different description patterns correspond to these different action features. Actions that in our experimental situations always achieve a certain result (e.g., push, jump over obstacles) are mainly described in the passé composé, whereas actions that are performed without any kind of observable result or final state (e.g., the gentle swimming of a duck or a fish) are described in the présent. The sound-producing situations are described in almost equal proportions in the présent and the passé composé.³ In category A, nD actions (A1) are to be distinguished from D actions (A2). Table 3 shows that the differences in the use of tenses between these four categories are very significant, especially for the two extreme categories (A1 and C).
Table 3. Distribution of tenses in the description of the 4 categories of events
Difference between the results for the 4 categories: χ² = 220,57; p < .01
Each cell gives the number of responses (Rp.) and, in parentheses, that cell’s contribution to χ².

Category    P-C Rp. (χ²)    Pr. Rp. (χ²)    I Rp. (χ²)    T
A1          189 (29,7)       23 (39,1)       4 (3,62)     216
A2          147 (3,2)        59 (4,6)        9 (0,2)      215
B           113 (0,2)        98 (4,7)        6 (1,7)      217
C            16 (53,7)      106 (60)        18 (19,9)     140
T           465             286             37            788
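To see which cells drive the overall χ² in Table 3, each cell’s contribution (O − E)²/E can be recomputed from the response counts. This is a sketch with illustrative names; for the P-C and Pr. cells of A1 and for all three cells of C it reproduces the printed contributions to within rounding, and it shows that these two extreme categories account for most of the statistic, as the text states.

```python
CATEGORIES = ["A1", "A2", "B", "C"]
TABLE_3 = [          # columns: P-C, Pr., I (response counts from Table 3)
    [189, 23, 4],    # A1: perfective, non-durative
    [147, 59, 9],    # A2: perfective, durative
    [113, 98, 6],    # B: aperfective
    [16, 106, 18],   # C: imperfective
]


def contributions(table: list[list[int]]) -> list[list[float]]:
    """Return (O - E)^2 / E for every cell, with E from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out.append((observed - expected) ** 2 / expected)
        cells.append(out)
    return cells


cells = contributions(TABLE_3)
for name, row in zip(CATEGORIES, cells):
    print(name, [round(x, 1) for x in row], "row sum:", round(sum(row), 1))
```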
Calculated by age groups, the differences between these categories remain very significant; we obtain the following χ² values: 39,86; 40,05; 24,04; 46,47 and 76,33 for age groups I, II, III, IV and V respectively. In addition, two developmental trends are observable: 1) a slow progression towards the exclusive use of the passé composé in perfective and aperfective situations (respectively 69% and 45% in the youngest age group, and 94% and 74% in the oldest one); 2) the appearance of imparfaits in the oldest group (V) for imperfective situations, for which the présent is used exclusively by all other age groups (see Fig. 4, p. 120).
2.2 Analysis of the role of duration, frequence, success or failure
Within each of our categories (perfective, aperfective and imperfective), events differ in their total duration, in their frequentative or continuous character, and in the failure or success of the action. What is the relative importance of these different dimensions?
3. Crying, shouting or wailing cannot be considered as actions that can either have a clear ending or not, except, perhaps, in the case of a sudden extinction of the shouter’s voice. Therefore these experimental situations cannot be defined as either ‘perfective’ or ‘imperfective’. However, the children often linked the action of crying to an ‘extrinsic’ aim, a result that could be obtained because crying may cause somebody else to do something. Many subjects expressed this idea, e.g. (5,11) Il appelle maman (‘he’s calling his mummy’). From this point of view one can suppose that the longer and the more differentiated the cry, the more noticeable the absence of a result will be; and indeed, for the long cries, the results are nearer to those for the imperfectives.
Figure 4. Proportion of passé composé, présent, and imparfait or plus-que-parfait tenses used by all subjects for each category. [Bars for perfective (A), aperfective (B) and imperfective (C) events; vertical axis: 0-100%.]
2.2.1 Perfective events
Duration. Three events are considered durative (D actions; 5 and 10 sec.), and three other events non-durative (nD actions; 1 or 2 sec.). Table 1 showed that the use of the passé composé is more frequent in nD actions than in D actions, and that this difference is very significant (χ² = 22,98) for the total population. The same statistical analysis shows a significant difference for age groups I, II and III, and no statistical difference for age groups IV and V (see Table 4). For reasons of statistical method, the rare imparfaits and extremely rare plus-que-parfaits had to be grouped with another tense. For all perfective events, both these rarely used tenses follow the developmental trend of the présent: if présent tenses are used, some imparfaits appear as well; when the présent disappears, the imparfait no longer occurs. In Tables 4 and 5, plus-que-parfaits and imparfaits are therefore grouped with the présent (Pr+I) and compared with the passé composé (P-C).

Table 4. Distribution of P-C and Pr+I tenses used in the description of nD and D events by each age group

Age group (mean age)   nD: P-C   Pr+I    D: P-C   Pr+I    χ²
I (3,7)                     39      4        21     23    19,81
II (4,7)                    36      7        27     15     4,19
III (5,6)                   39      7        30     16     4,70
IV (6,6)                    37      8        34     10     0,33
V (7,6)                     38      1        35      4     1,92

Two events can be compared which differ only in duration: event 1 (P/D (10 sec.)/nF) and event 2 (P/nD (1 sec.)/nF). In both cases, the action is ‘pushing’ and the distance covered is 100 cm, but in event 1 the pushing action takes 10 sec., as against 1 sec. in event 2. Responses are distributed as follows:
Table 5. Distribution of P-C and Pr+I tenses used in the description of P/D/nF and P/nD/nF events by each age group

Age group (mean age)   1. P/D/nF: P-C   Pr+I    2. P/nD/nF: P-C   Pr+I
I (3,7)                            6      9                    14     1
II (4,7)                           8      6                    13     1
III (5,6)                         10      6                    12     3
IV (6,6)                           8      7                    10     5
V (7,6)                           11      2                    13     0

Difference between the results for P/D/nF and P/nD/nF events: age group I: χ² = 11,84, p < .01; age group II: χ² = 4,70, p < .05; age group III: χ² = 1,53, p > .05; age group IV: χ² = 0,56, p > .05; age group V: χ² = 2,16, p > .05.
The differences are very significant for age group I, significant for age group II, and non-significant for the other three age groups. From these two analyses by age group, it appears that children below 5 or 6 use different verb forms to indicate differences in duration. From the age of 6 onwards, longer or shorter duration does not seem to influence the choice of verb forms for the description of perfective events. A different type of analysis confirms this conclusion; the six perfective events can be grouped as follows according to their duration:
J. P. Bronckart and H. Sinclair
events 2 and 6 (P/nD/nF and P/nD/nF/nS): 1 sec.
event 4 (P/nD/nF/S): 2 sec.
event 5 (P/D/F/nS): 5 sec.
events 1 and 3 (P/D/nF and P/D/F/S): 10 sec.
As shown by Figure 5, the use of present and imparfait increases with the increase of duration, and this increase is clearest between 3 and 6 years.

Figure 5. The proportions of P-C and Pr+I tenses are represented by dark and blank columns respectively, for actions of 1, 2, 5 and 10 sec. duration. Areas with a full upper limit indicate these proportions for age groups I, II and III; areas with a broken upper limit indicate the proportions for all age groups.
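Figure 5 itself is not reproduced here, but part of its trend can be checked from the counts in Tables 5 and 6, which cover only the 1-sec. event 2 and the 10-sec. events 1 and 3 (events 4, 5 and 6 are not tabulated in this section). The following sketch pools age groups I, II and III, as the full-line areas of the figure do, and computes the share of Pr+I answers for each duration. The names EVENT_COUNTS, DURATION_SEC and present_share_by_duration are illustrative, not the authors'.

```python
# Counts (P-C, Pr+I) for age groups I, II and III, copied from Tables 5 and 6.
EVENT_COUNTS = {
    1: [(6, 9), (8, 6), (10, 6)],    # event 1, P/D/nF
    2: [(14, 1), (13, 1), (12, 3)],  # event 2, P/nD/nF
    3: [(7, 8), (9, 5), (9, 6)],     # event 3, P/D/F/S
}
DURATION_SEC = {1: 10, 2: 1, 3: 10}  # duration of each event in seconds


def present_share_by_duration(counts, durations):
    """Pool events of equal duration; return {seconds: share of Pr+I answers}."""
    totals = {}
    for event, groups in counts.items():
        pc, pri = totals.get(durations[event], (0, 0))
        for p_c, pr_i in groups:
            pc += p_c
            pri += pr_i
        totals[durations[event]] = (pc, pri)
    # Sort by duration so the shortest actions come first.
    return {sec: pri / (pc + pri) for sec, (pc, pri) in sorted(totals.items())}


shares = present_share_by_duration(EVENT_COUNTS, DURATION_SEC)
for sec, share in shares.items():
    print(f"{sec:>2} sec.: {share:.0%} Pr+I")
```

With these counts the share of Pr+I rises from 5 of 44 answers (about 11%) for the 1-sec. action to 40 of 89 (about 45%) for the 10-sec. actions, in the direction the figure describes.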
Frequence. The influence of frequence can be analysed only for the durative events, two of which are frequentative and one non-frequentative; the three non-durative events are all non-frequentative. We saw (cf. Table 1) that differences in the use of tenses between the three D actions are significant. If we compare the tenses used for the two frequentative actions (3 and 5) with those used for the continuous action (1), we find a very significant difference (χ² = 6,6). The continuous versus frequentative feature can thus influence the choice of tenses. When actions 1 and 3, which differ only in the frequentative or continuous character of the action, are compared, a significant difference between the use of Pr+I and the use of P-C appears only for age group IV, i.e., between 6 and 7 (see Table 6); these children use the present tense more often for continuous events than for frequentative ones. Thus, duration influences the choice of tenses for children until the age of 6, whereas between 6 and 7 frequence seems to become predominant. This shift could explain the increase of present and imparfait in events 1 and 2 noted for age group IV (see p. 116).
Table 6.
Distribution of P-C and Pr+I tenses used in the description of P/D/nF and P/D/F/S events by each age group

Age group     1. P/D/nF       3. P/D/F/S
              P-C   Pr+I      P-C   Pr+I
I   (3,7)      6     9          7     8
II  (4,7)      8     6          9     5
III (5,6)     10     6          9     6
IV  (6,6)      8     7         13     2
V   (7,6)     11     2         12     1

Difference between the results for P/D/nF and P/D/F/S events: age group I: χ² = 0,14, p > .05; age group II: χ² = 0,16, p > .05; age group III: χ² = 0,02, p > .05; age group IV: χ² = 3,98, p < .05; age group V: χ² = 0,38, p > .05.
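The note to Table 6 does not say how χ² was computed. The printed values are consistent with an ordinary Pearson χ² on each 2 x 2 table (event by tense, one degree of freedom, no continuity correction), so the following Python sketch recomputes them from the table counts under that assumption. The function and variable names are illustrative; 3.841 is the standard 5% critical value of χ² for one degree of freedom.

```python
# Sketch: Pearson chi-square for a 2 x 2 table (1 d.f., no Yates correction).
CRITICAL_05 = 3.841  # chi-square critical value for 1 d.f. at p = .05


def chi_square_2x2(table):
    """table = [[P-C, Pr+I] for one event, [P-C, Pr+I] for the other event]."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of event and tense.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Counts from Table 6: event 1 (P/D/nF) versus event 3 (P/D/F/S), per age group.
TABLE_6 = {
    "I": [[6, 9], [7, 8]],
    "II": [[8, 6], [9, 5]],
    "III": [[10, 6], [9, 6]],
    "IV": [[8, 7], [13, 2]],
    "V": [[11, 2], [12, 1]],
}

for group, counts in TABLE_6.items():
    chi2 = chi_square_2x2(counts)
    verdict = "p < .05" if chi2 > CRITICAL_05 else "p > .05"
    print(f"age group {group}: chi2 = {chi2:.2f} ({verdict})")
```

Run as is, the sketch prints values that agree with the note to within about 0.01 (for example 3.97 against the printed 3,98 for age group IV), and only age group IV exceeds the critical value, in line with the p-values given.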
Success or failure. This distinction cannot be separated from other features in our investigation, since the two failure actions take half the total time of the two success actions. The small difference in tense use observed may therefore be attributed to the difference in duration (see above). Moreover, the children’s comments make it clear that to them the failure of an action (e.g., the cow jumping over five fences without reaching the farm) is just as important a result as success.

2.2.2 Aperfective events
These events differ both in duration and in frequence. As we noted before, in the description of the nD event (9: 3 sec.) the proportion of passés composés is higher than in the two D events (10 and 11: 8 sec. each), and between these two D actions the proportion of passés composés is higher in the frequentative event (10) than in the continuous one (11). This confirms the results obtained for perfective events, but none of the differences is statistically significant.

2.2.3 Imperfective events
The two imperfective events differ only in their total duration. At no age does the difference in duration induce the children to use different tenses in their descriptions; probably, imperfective events are considered to be of indeterminate duration precisely because of the absence of any result whatsoever.
2.3 Other means of coding the dimensions of events
Apart from tenses, children use several means of expressing the different aspectual dimensions of actions: Intonation, repetitions, gesture, choice of specific lexical items, and adverbs. Though it is very difficult to give a quantitative analysis of the use of these means, we can see a clear evolution, especially in the description of aperfective situations and of the four events of series B (i.e., the different jumps in actions 3, 4, 5 and 6). For the events of series B, children from 3 to 5 vary their descriptions by gestures, intonation and repetitions: gestures accompany all four events, a particular intonation is introduced in event 4 (i.e., the long jump is described as il saute, with a gesture imitating the movement and a long drawn-out vowel sound), and repetitions in events 3, 5 and 6 (il est là, là, là, là or il a sauté, sauté, sauté). Adverbs begin to appear in the descriptions from the age of 5, and after 6 typical descriptions are as follows: Il a sauté sur toutes les barrières et après il est rentré à la maison (‘he jumped over all the fences and then he went home’). For events 5 and 6, where the farmer jumps over only five of the ten fences and the horse over only one fence, numbers are already introduced in the descriptions from the age of 5 (Il a sauté sur quatre barrières, ‘he jumped over four fences’, or Il n’a sauté qu’une barrière, ‘he only jumped one fence’). In fact, the failure actions, far from being assimilated by the children to imperfective events, give rise to very detailed descriptions of the result: (7,6) Elle a sauté par trois barrières et s’est arrêtée à la quatrième (‘she jumped over three fences and stopped at the fourth’). In the descriptions of aperfective events, adverbs are rare before 6 (though expressions such as un peu, ‘a bit’, are used for event 9). After that age adverbs appear, e.g.
longtemps (‘for a long time’), and adverbial expressions such as une fois (‘once’) and beaucoup de fois (‘many times’): (7,4) event 9 (short cry): Il a crié une fois (‘he cried once’); (7,4) event 10 (series of short cries): Il a crié plus de fois (‘he cried more times’); (7,5) event 11 (long cry): Il a crié plus longtemps (‘he cried a longer time’). The youngest subjects, up to the age of 3,6, tend to use the same rather vague verbs in all their descriptions: Il marche (‘he walks’) or Il va (‘he goes’) for all six perfective actions; Il fait du bruit (‘he makes noise’) for the three aperfective actions. From 3,6 to 6, many different verbs are used: Il tire (‘he draws’), Il dépanne (‘he repairs and sets going’), Il roule (‘he drives’), Il pousse (‘he pushes’), Il avance (‘he moves forward’), etc., for the description of the same event 1. After 6, standard verbs appear: pousser (‘to push’) for events 1 and 2, sauter (‘to jump’) for events 3-6, and crier (‘to cry’) for events 9-11.
It seems important to note that the use of different tenses to express different aspectual features of the events appears at the same age as that of detailed lexical distinctions. Between 3,6 and 6, the information seems to be concentrated in the verb, sometimes reinforced by more ‘primitive’ means of expression (intonation, gestures, repetitions). After 6, part of the information is expressed by adverbs or adverbial phrases.

2.4 Focus of attention
Children’s attention can be retained by different parts of the events; the different focusings vary with age and with the type of event. Our results confirm the evolution found by Piaget (1946) and Ferreiro (1971); children of 3 to 5 start their descriptions with the result. (4,5) event 4: Elle se trouve devant la maison (‘she’s in front of the house’). (3,3) event 1: La voiture qui était dans le garage (‘the car that was in the garage’). (5,6) event 1: Une voiture avec un camion qui est dans un garage (‘a car with a truck that’s in a garage’). For event 3 (arriving at the farm after many short jumps) the first focus is often on the aim rather than on the result: (3,10) Y va à la ferme (‘he’s going to the farm’); (3,10) Il va aller à la maison (‘he’s going to go home’). This singling out of either the result or the aim as the first thing to be described (in the situations where this is possible) may already be followed by a different focus at an early age, even without encouragement by the experimenter; the sequence of description then no longer follows the actual time sequence of the event: e.g., (3,2) Elle a arrivé à la maison . . . elle a sauté (‘she came home . . . she jumped’). Older children either start immediately by describing the action itself, or, if they give a description in several parts, they meticulously follow the real time sequence: (7,5) Une voiture était sur la route, un camion est venu et l’a poussée jusqu’au garage et est entré (‘there was a car on the road, a truck came and pushed it up to the garage, and went inside’).
3. Discussion
The choice of tenses used in the descriptions shows that at all ages the subjects take into account the difference between perfective, aperfective and imperfective events. Actions that obtain a clear result are mostly described in the passé composé, actions without an intrinsic aim are described in the present or the passé composé, and actions that do not lead to any result are described in the present. Within the limits of the ages
observed, these ‘subjective’ aspects clearly influence the choice of tenses. Other, more ‘objective’ features of the events presented (duration and frequence) have no influence on the choice of tenses for imperfective events and little influence for aperfective events, but they determine significant differences for the perfective events. In the descriptions of perfective events, between the ages of 3 and 6, the use of passé composé decreases as the objective duration of the event increases. Possibly, duration is not an ‘aspect’ by itself, but combined with other characteristics of the action, such as the distance covered, it gives the subject an important cue as to the interval between the start and the result of an action. If the action gives an immediate result (nD actions), the use of passé composé reaches its maximum because the observer can only focus on the result; if a certain time elapses before the result is obtained (D actions), the observer can focus either on the result or on the action process itself. The longer it takes to complete the action, the more probable a focus on the action, and hence the use of the présent, becomes. The distance covered by a moving object may have a similar influence. With the 6-year-olds, continuous D actions still give rise to a large proportion (40 %) of présents, whereas frequentative D actions are described by passés composés. The frequentative feature also appears to favour a focus on the result rather than on the action itself. After the age of 7, all perfective events are generally described in the passé composé. The same trend towards an exclusive use of passé composé appears after the age of 6 for aperfective events, and simultaneously some imparfaits are introduced for imperfective events. How much light do these facts throw on the question of what determines the use of tenses in child language?
Spontaneous use of past and future tenses certainly indicates that relationships of posteriority and anteriority (i.e., of the event in relation to the moment of enunciation) play a part, but our results indicate that other factors are also important, at least until the age of 6. All descriptions were given about 7 seconds after the termination of the events; there was therefore a clear posteriority relationship, which adult French speakers generally express by the passé composé, imparfait, plus-que-parfait and the more recondite passé simple. In current use, the passé composé expresses perfective past actions, and the imparfait imperfective past actions. Consequently, we can conclude that from the age of 6, when the trend towards passés composés for all actions becomes clearly established and imparfaits begin to appear, children use tenses to express the same temporal relationships as adults. Before the age of 6, however, the distinction between perfective and imperfective events seems to be more important than the temporal relation between action and the moment of enunciation. Imperfective actions are almost never expressed by past
tenses, and for perfective actions the use of présents becomes more frequent as the probability of taking into account the unaccomplished part of the action increases. This probability is partly determined by duration, frequence and perhaps other objective features we have not investigated. However, the distinction between the importance of the result and that of the process of the action itself is the predominant, and may be the only, aspectual feature in the language of children below 6. Exclusive attention to the result of an action implies focusing on the ‘past’ character of an action; conversely, a focus on the process, without attention to the result, projects the action into a kind of perpetual present. For certain types of events, these early, incompatible focuses take on prime importance and lead the child to ignore the relationship of posteriority between enunciation and the termination of the action. Only when these privileged focuses lose their importance do children begin to express temporal posteriority in the manner of adults, shifting attention from the character of the action itself to its temporal relation with the moment of enunciation. Finally, older and younger subjects use different means to express aspectual features. Young children use gestures, intonation and tenses, whereas 6-year-olds use adverbial expressions; after a first period of using passe-partout words for the actions, our young subjects use different lexical items for the same actions, whereas older children use the same verb for the same type of action, even when the actions differ in duration and frequence. Though we do not wish to argue that this developmental trend is simply the result of the cognitive development of such notions as time and duration as described by Piaget (1946), there is a striking parallelism with cognitive development in general, rather than directly with such operations as the coordination of time, speed and distance.
In one of his latest works (1971), Piaget stresses the lack of differentiation between knowledge about physics and knowledge about logic that exists during the preoperational period (from 2 to 7 years approximately). Both logical knowledge and knowledge about properties of real objects stem from actions (mainly those performed, partially or totally, by the child himself, but also those observed by the child); and every action has two aspects. On the one hand there is the dimension of that which is generalizable, i.e., the way actions are coordinated, the way they can annul or compensate each other. On the other hand, there is a specific dimension, i.e., the particular characteristics of the action and its result. According to Piaget, the obstacles to the formation of the first fully coherent logical system are at least partly a result of the primacy of the particular character of each action over its general, coordinative aspect. The child is mainly interested in the physical outcome of his actions and does not yet differentiate this dimension from the general coordinations he can already perform. Since actions are not done to be undone immediately afterwards, and since their aim is some kind of change in reality, it is not surprising that the main characteristics of the first operational system, i.e., reversibility and conservation, are still absent during this period. Though the 4- or 5-year-old knows full well the different factors of a conservation experiment, for example, he can neither dissociate them nor reconcile the different kinds of information he infers from them. In problems of time and duration these characteristics of the young child’s thinking lead to typical errors. He cannot coordinate the times of departure and arrival when he has to compare two events as to duration; even when he can already recount the simultaneity of start and finish, he will still not conclude that the two events must therefore have taken the same amount of time, except if the two movements were of the same speed and covered the same distance. Focusing on the end result, he considers speed and time as co-varying with the distance covered; focusing on speed, he lets this factor overpower and deform the role of the others. For correct judgments, in which duration is ‘conserved’, the factors first have to be dissociated, and then their interaction can be understood: Equal duration, but different distances covered, can be explained by different speeds. Similarly, in the descriptions the child gives of the events performed in front of him, we observed first a frequent mention of aim or result, then a global apprehension of the different characteristics of the event expressed by the verb form itself, and finally a dissociation of features, expressed by adverbial locutions, together with a mainly temporal function of tenses. Our knowledge of language acquisition is still far too fragmentary to allow us to do more than point out very general mechanisms of cognitive development that appear to have explanatory value for language acquisition phenomena.
Nevertheless, several other experimental results also indicate the existence of underlying cognitive mechanisms which must be one of the factors that determine the often surprising course of linguistic acquisitions (cf. Ferreiro, 1971; Sinclair and Ferreiro, 1970). A parallel of a very different kind can be found in historical linguistics. In this field it is becoming possible to go beyond the ‘establishment of an arbitrary initial stage of a phenomenon’ and to ‘study the dynamic aspects of the process of linguistic development’ (Watkins, 1969, pp. 2-3). Several studies of the Indo-European verb system have led a number of authors (Kurylowicz, 1964; Watkins, 1969) to surmise that initially this system did not comprise any temporal oppositions. There is, however, evidence of a very early opposition between injunctives and indicatives, and of the existence of the aspectual opposition between perfective and imperfective. This opposition can shift to a temporal function, through the opposition of past and present, and aspectual forms such as the desiderative can acquire a temporal function and fill the place of the future, thereby completing the temporal axis. Other aspectual distinctions, such as accomplished versus generally ongoing, can in their turn cause a rebuilding of the system, and certain forms (for instance the -s aorist) may acquire a modal function. In this way, two distinct systems
can emerge, one aspectual and temporal, the other modal. In the historically attested languages, such rebuilding did not necessarily take place in the same manner or at the same time; but to all of them Slobin’s dictum (1973) about language acquisition can be applied: ‘New forms first express old functions, new functions are first expressed by old forms.’ Though such historical parallels are intriguing, we should evidently guard against attributing any explanatory value to them. Knowledge of cognitive development may help us to understand the course of language acquisition, but it can hardly be supposed that what we know about the way certain languages have changed in the course of their history can elucidate the acquisition process. In fact, if parallels such as the one we have referred to have a deeper significance than a chance resemblance, the relation may well take the opposite direction: The course of language acquisition may point towards some theory of historical development.
REFERENCES
Ferreiro, E. (1971) Les relations temporelles dans le langage de l’enfant. Genève, Droz.
Ferreiro, E. and Sinclair, H. (1971) Temporal relationships in language. Inter. J. Psychol., 6, 39-47.
Fraisse, P. (1948) Étude comparée de la perception et de l’estimation de la durée chez les enfants et les adultes. Enfance, 1, 199-211.
Grégoire, A. (1947) L’apprentissage du langage. Vol. II, Gembloux, Duculot.
Hockett, C. F. (1958) A course in modern linguistics. New York, Macmillan.
Kurylowicz, J. (1964) The inflectional categories of Indo-European. Heidelberg, Carl Winter Universitätsverlag.
Meillet, A. (1922) Introduction à l’étude comparative des langues indo-européennes. Paris, Hachette.
Piaget, J. (1946) Le développement de la notion de temps chez l’enfant. Paris, P.U.F.
Piaget, J. and Garcia, R. (1971) Les explications causales. Études d’épistémologie génétique. Vol. 26, Paris, P.U.F.
Sinclair, H. and Ferreiro, E. (1970) Compréhension, production et répétition de phrases au mode passif. Archives de Psychologie, 40, 1-42.
Slobin, D. I. (1973) Cognitive prerequisites for the development of grammar. In C. A. Ferguson and D. I. Slobin (Eds.), Studies of Child Language Development. New York, Holt, Rinehart and Winston. Pp. 175-208.
Watkins, C. (1969) Indo-European origins of the Celtic verb. Dublin, The Dublin Institute for Advanced Studies.
The aim of this paper is to investigate the use of French verb forms by children between 2,11 and 8,7 years of age. The experiment presented shows that children do not use tenses only to indicate the relation of posteriority, anteriority and simultaneity between the events described and the moment of enunciation, but that factors linked to aspect also intervene. 74 children were asked to describe 11 actions acted out by the experimenter with toys. These actions differed according to their result, their frequence and their duration. For all the children, the type of result influences the choice of verb form. The more objective features, such as frequence and duration, influence children from 3 to 6 years of age. Beyond that age, the use of tenses begins to resemble that of adults, who use different verb forms mainly to express temporal relations. Other markers of aspect and tense show a similar development with age.
6
Reductionism and the nature of psychology
H. PUTNAM Harvard University
1. Reduction
A doctrine to which most philosophers of science subscribe (and to which I subscribed for many years) is the doctrine that the laws of such ‘higher-level’ sciences as psychology and sociology are reducible to the laws of lower-level sciences - biology, chemistry, ultimately to the laws of elementary particle physics. Acceptance of this doctrine is generally identified with belief in ‘The Unity of Science’ (with capitals), and rejection of it with belief in Vitalism, or Psychism, or, anyway, something bad. In this paper I want to argue that this doctrine is wrong. In later sections, I shall specifically discuss the Turing machine model of the mind, and the conception of psychology associated with reductionism and with the Turing machine model. I want to argue that while materialism is right, and while it is true that the only method for gaining knowledge of anything is to rely on testing ideas in practice (and evaluating the results of the tests scientifically), acceptance of these doctrines need not lead to reductionism. I shall begin with a logical point and then apply it to the special case of psychology. The logical point is that from the fact that the behavior of a system can be deduced from its description as a system of elementary particles it does not follow that it can be explained from that description. Let us look at an example and then see why this is so. My example will be a system of two macroscopic objects: a board in which there are two holes, a square hole 1” across and a round hole 1” in diameter, and a square peg, a fraction less than 1” across. The fact to be explained is: The peg goes through the square hole, and it does not go through the round hole. One explanation is that the peg is approximately rigid under transportation and the board is approximately rigid. The peg goes through the hole that is large enough and not through the hole that is too small. Notice that the microstructure of the board and the peg is irrelevant to this explanation.
All that is necessary is that, whatever the microstructure may be, it be compatible with the board and the peg being approximately rigid objects.
Cognition 2(1), pp. 131-146
Suppose, however, we describe the board as a cloud of elementary particles (for simplicity, we will assume these are Newtonian elementary particles) and imagine ourselves given the position and velocity of each one at some arbitrary time t. We then describe the peg in a similar way. (Say the board is ‘cloud B’ and the peg is ‘cloud A’.) Suppose we describe the round hole as ‘region 1’ and the square hole as ‘region 2’. Let us say that by a heroic feat of calculation we succeed in proving that ‘cloud A’ will pass through ‘region 2’, but not through ‘region 1’. Have we explained anything? It seems to me that whatever the pragmatic constraints on explanation may or may not be, one constraint is surely this: That the relevant features of a situation should be brought out by an explanation and not buried in a mass of irrelevant information. By this criterion, it seems clear that the first explanation - the one that points out that the two macro-objects are approximately rigid and that one of the two holes is big enough for the peg and the other is not - explains why ‘cloud A’ passes through ‘region 2’ and never through ‘region 1’, while the second - the deduction of the fact to be explained from the positions and velocities of the elementary particles, their electrical attractions and repulsions, etc. - fails to explain. If this seems counterintuitive, it is for two reasons, I think. (1) We have been taught that to deduce a phenomenon in this way is to explain it. But this is ridiculous on the face of it. Suppose I deduce a fact F from G and I, where G is a genuine explanation and I is something irrelevant. Is G and I an explanation of F? Normally we would answer, ‘No. Only the part G is an explanation’.
Now, suppose I subject the statement G and I to logical transformations so as to produce a statement H which is mathematically equivalent to G and I (possibly in a complicated way), but such that the information G is, practically speaking, virtually impossible to recover from H. Then on any reasonable standard the resulting statement H is not an explanation of F; but F is deducible from H. I think that the description of the peg and board in terms of the positions and velocities of the elementary particles, their electrical attractions and repulsions, etc., is such a statement H: The relevant information (that the peg and the board are approximately rigid, and the relative sizes of the holes and the peg) is buried in this description, but in a way that is useless, practically speaking. (2) We forget that explanation is not transitive. The microstructure of the board and peg may explain why the board and the peg are rigid, and the rigidity is part of the explanation of the fact that the peg passes through one hole and not the other, but it does not follow that the microstructure, so to speak ‘raw’ - as an assemblage of positions, velocities, etc. - explains the fact that the peg passes through one hole and not the other. Even if the microstructure is not presented ‘raw’, in this sense, but the information is organized so as to give a revealing account of the rigidity of the macro-objects, a revealing explanation of the rigidity of the macro-objects is not an explanation of something which is explained by that rigidity. If I want to know why the peg passes through one hole and not the other in a normal context (e.g., I already know that these macro-objects are rigid), then the fact that one hole is bigger than the peg all around and the other isn’t is a complete explanation. That the peg and the board consist of atoms arranged in a certain way, and that atoms arranged in that way form a rigid body, etc., might also be an explanation, although one which gives me information (why the board and the peg are rigid) I didn’t ask for. But at least the relevant information (the rigidity of the board and the peg, and the relation of the sizes and shapes of the holes and the peg) is still explicit. That the peg and the board consist of atoms arranged in a certain way does not by itself explain why the peg goes through one hole and not the other, even if it explains something which in turn explains that. The relation between (1) and (2) is this: An explanation of an explanation (a ‘parent’ of an explanation, so to speak) generally contains information I which is irrelevant to what we want to explain, and in addition it contains the relevant information, if at all, in a form which may be impossible to recognize. For this reason, a parent of an explanation is generally not an explanation. What follows from this is that certain systems can have behaviors to which their microstructure is largely irrelevant. For example, a great many facts about rigid bodies can obviously be explained just from their rigidity and the principles of geometry, as in the example just given, without at all going into why those bodies are rigid.
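How little the macro-level explanation needs can be made concrete. As a sketch, assuming the peg is 0.95 inch across (the text says only ‘a fraction less than 1”’) and is pushed straight through, the whole explanation reduces to two comparisons of rigid sizes: the peg's side against the square hole, and the peg's diagonal (side times the square root of 2) against the diameter of the round hole. Nothing about particles enters. The constants and function names below are illustrative.

```python
import math

HOLE_SIDE = 1.0      # square hole, 1 inch across
HOLE_DIAMETER = 1.0  # round hole, 1 inch in diameter
PEG_SIDE = 0.95      # assumed value for "a fraction less than 1 inch"


def fits_square_hole(peg_side, hole_side):
    # A rigid square peg aligned with a square hole passes if its side is smaller.
    return peg_side < hole_side


def fits_round_hole(peg_side, hole_diameter):
    # A rigid square peg passes a round hole only if its diagonal is smaller
    # than the hole's diameter.
    return peg_side * math.sqrt(2) < hole_diameter


print(fits_square_hole(PEG_SIDE, HOLE_SIDE))     # True
print(fits_round_hole(PEG_SIDE, HOLE_DIAMETER))  # False: diagonal is about 1.34 in.
```

Any peg side just under 1 inch gives the same two answers, since the diagonal then exceeds 1.41 times the side; only the rigidity and the sizes matter.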
A more interesting case is the one in which the higher-level organizational facts on which an explanation turns themselves depend on more than the microstructure of the body under consideration. This, I shall argue, is the typical case in the domain of social phenomena. For an example, consider the explanation of social phenomena. Marx, in his analysis of capitalism, uses certain facts about human beings - for example, that they have to eat in order to live, and they have to produce in order to eat. He discusses how, under certain conditions, human production can lead to the institution of exchange, and how that exchange in turn leads to a new form of production, production of commodities. He discusses how production of commodities for exchange can lead to production of commodities for profit, to wage labor and capital. Assume that something like this is right. How much is the microstructure of human beings relevant? The case is similar to the first example in that the specifics of the microstructure are irrelevant: What is relevant is, so to speak, an organizational result of microstructure. In the first case the relevant organizational result was rigidity: In the present case, the relevant organizational result is intelligent beings
able to modify both the forces of production and the relations of production to satisfy both their basic biological needs and those needs which result from the relations of production they develop. To explain how the microstructure of the human brain and nervous system accounts for this intelligence would be a great feat for biology; it might or might not have relevance for political economy. But there is an important difference between the two examples. Given the microstructure of the peg and the board, one can deduce the rigidity. But given the microstructure of the brain and the nervous system, one cannot deduce that capitalist production relations will exist. The same creatures can exist in pre-capitalist commodity production, or in feudalism, or in socialism, or in other ways. The laws of capitalist society cannot be deduced from the laws of physics plus the description of the human brain: They depend on ‘boundary conditions’ which are accidental from the point of view of physics but essential to the description of a situation as ‘capitalism’. In short, the laws of capitalism have a certain autonomy vis-a-vis the laws of physics: They have a physical basis (men have to eat), but they cannot be deduced from the laws of physics. They are compatible with the laws of physics; but so are the laws of socialism and of feudalism. This same autonomy of the higher-level science appears already at the level of biology. The laws which collectively make up the theory of evolution are not deducible from the laws of physics and chemistry; from the latter laws it does not even follow that one living thing will live for five seconds, let alone that living things will live long enough to evolve. Evolution depends on a result of microstructure (variation in genotype); but it also depends on conditions (presence of oxygen, etc.) which are accidental from the point of view of physics and chemistry. 
The laws of the higher-level discipline are deducible from the laws of the lower-level discipline only together with ‘auxiliary hypotheses’ which are accidental from the point of view of the lower-level discipline. And most of the structure at the level of physics is irrelevant from the point of view of the higher-level discipline; only certain features of that structure (variation in genotype, or rigidity, or production for profit) are relevant, and these are specified by the higher-level discipline, not the lower-level one. The alternatives ‘mechanism’ and ‘vitalism’ are false alternatives. The laws of human sociology and psychology, for example, have a basis in the material organization of persons and things, but they also have the autonomy just described vis-a-vis the laws of physics and chemistry. The reductionist way of looking at science both springs from and reinforces a specific set of ideas about the social sciences. For example, human biology is relatively unchanging. If the laws of psychology are deducible from the laws of biology and (also unchanging) reductive definitions, then it follows that the laws of psychology are also unchanging. Thus the idea of an unchanging human nature - a set of structured
Reductionism and the nature of psychology
135
psychological laws, dependent on biology but independent of sociology - is presupposed at the outset. Also, each science in the familiar sequence - physics, chemistry, biology, psychology, sociology - is supposed to reduce to the one below (and ultimately to physics). Thus sociology is supposed to reduce to psychology which in turn reduces to biology via the theory of the brain and nervous system. This assumes a definite attitude towards sociology, the attitude of methodological individualism. (In conventional economics, for example, the standard attitude is that the market is shaped by the desires and preferences of individual people; no conceptual apparatus even exists for investigating the ways in which the desires and preferences of individuals are shaped by the economic institutions.) Besides supporting the idea of an unchanging human nature and methodological individualism, there is another and more subtle role that reductionism plays in one’s outlook. This role may be illustrated by the effect of reductionism on biology departments: When Crick and Watson made their famous discoveries, many biology departments fired some or all of their naturalists! Of course, this was a crude mistake. Even from an extreme reductionist point of view, the possibility of explaining the behavior of species via DNA mechanisms ‘in principle’ is very different from being able to do it in practice. Firing someone who has a lot of knowledge about the habits of, say, bats, because someone else has a lot of knowledge about DNA is a big mistake. Moreover, as we saw above, you can’t explain the behavior of bats, or whatever species, just in terms of DNA mechanisms - you have to know the ‘boundary conditions’. That a given structure enables an organism to fly, for example, is not just a function of its strength, etc., but also of the density of the earth’s atmosphere. 
And DNA mechanisms represent the wrong level of organization of the data - what one wants to know about the bat, for example, is that it has mechanisms for producing ultrasonic sounds, and mechanisms for ‘triangulating’ on its own reflected sounds (‘echolocating’). The point is that reductionism comes on as a doctrine that breeds respect for science and the scientific method. In fact, what it breeds is physics worship coupled with neglect of the ‘higher-level’ sciences. Infatuation with what is supposedly possible ‘in principle’ goes with indifference to practice and to the actual structure of practice. I don’t mean to ascribe to reductionists the doctrine that the ‘higher-level’ laws could be arrived at in the first place by deduction from the ‘lower-level’ laws. Reductionist philosophers would very likely have said that firing the naturalists was a misapplication of their doctrines, and that neglect of direct investigation at ‘the level of sociology’ would also be a misapplication of their doctrine. What I think goes on is this. Their claim that higher-level laws are deducible from lower-level laws, and are therefore explainable by lower-level laws, involves a mistake (in fact, two mistakes). It involves neglect of the structure of the higher-level explanations, which reductionists never talk about at all, and it involves neglect of the fact that more than one higher-level structure can be realized by the lower-level entities (so that what the higher-level laws are cannot be deduced from just the laws obeyed by the ‘lower-level’ entities). Neglect of the ‘higher-level’ sciences themselves seems to me to be the inevitable corollary of neglecting the structure of the explanations in those sciences.
2.
Turing machines
In previous papers,1 I have argued for the hypothesis that (1) a whole human being is a Turing machine, and (2) psychological states of a human being are Turing machine states or disjunctions of Turing machine states. In this section I want to argue that this point of view was essentially wrong, and that I was too much in the grip of the reductionist outlook just described. Let me begin with a technical difficulty. A state of a Turing machine is described in such a way that a Turing machine can be in exactly one state at a time. Moreover, memory and learning are not represented in the Turing machine model as acquisition of new states, but as acquisition of new information printed on the machine’s tape. Thus, if human beings have any states at all which resemble Turing machine states, those states must (1) be states the human can be in at any time, independently of learning and memory; and (2) be total instantaneous states of the human being: states which determine, together with learning and memory, what the next state will be, as well as totally specifying the present condition of the human being (‘totally’ from the standpoint of psychological theory, that is). These characteristics already establish that no psychological state in any customary sense can be a Turing machine state.2 Take a particular kind of pain to be a ‘psychological state’. If I am a Turing machine, then my present ‘state’ must determine not only whether or not I am having that particular kind of pain, but also whether or not I am about to say ‘three’, whether or not I am hearing a shrill whine, etc. So the psychological state in question (the pain) is not the same as my ‘state’ in the sense of machine state, although it is possible (so far) that my machine state determines my psychological state. Moreover, no psychological theory would pretend that having a pain of a particular kind, being about to say ‘three’, or hearing a shrill whine, etc., all belong to one psychological state, although there could well be a machine state characterized by the fact that I was in it only when simultaneously having that pain, being about to say ‘three’, hearing a shrill whine, etc. So, even if I am a Turing machine, my machine states are not the same as my psychological states. My description qua Turing machine (machine table) and my description qua human being (via a psychological theory) are descriptions at two totally different levels of organization. So far it is still possible that a psychological state is a large disjunction (practically speaking, an almost infinite disjunction) of machine states, although no single machine state is a psychological state. But this is very unlikely when we move away from states like ‘pain’ (which are almost biological) to states like ‘jealousy’ or ‘love’ or ‘competitiveness’. Being jealous is certainly not an instantaneous state, and it depends on a great deal of information and on many learned facts and habits. But Turing machine states are instantaneous and are independent of learning and memory. That is, learning and memory may cause a Turing machine to go into a state, but the identity of the state does not depend on learning and memory, whereas, no matter what state I am in, identifying that state as ‘being jealous of X’s regard for Y’ involves specifying that I have learned that X and Y are persons and a good deal about social relations among persons. Thus jealousy can neither be a machine state nor a disjunction of machine states.
1. Minds and machines, in Sidney Hook (Ed.), Dimensions of mind, New York University, 1960, pp. 148-179; Psychological predicates, in Merrill and Capitan (Eds.), Art, mind, and religion, University of Pittsburgh, 1965, pp. 37-48; The mental life of some machines, in Hector Castaneda (Ed.), Intentionality, minds, and perception, Wayne University, 1967, pp. 177-214.
2. For an exposition of Turing machines, see Martin Davis, Computability and unsolvability, New York, McGraw-Hill, 1958. There is also an attractive little monograph by Trachtenbrot on the subject.
One might attempt to modify the theory by saying that being jealous = either (being in State A and having tape $c_1$) or (being in State A and having tape $c_2$) or ... or (being in State B and having tape $d_1$) or (being in State B and having tape $d_2$) or ... or (being in State Z and having tape $y_1$) or ... or (being in State Z and having tape $y_n$); that is, one might define a psychological state as a disjunction, the individual disjuncts being not Turing machine states, as before, but conjunctions of a machine state and a tape (i.e., a total description of the content of the memory bank). Besides the fact that such a description would be literally infinite, the theory is now without content, for the original purpose was to use the machine table as a model of a psychological theory, whereas it is now clear that the machine table description, although different from the description at the elementary particle level, is as removed from the description via a psychological theory as the physico-chemical description is. I now want to make a different point about the Turing machine model. The laws of psychology, if there are ‘laws of psychology’ at all, need not even be compatible with the Turing machine model, or with the physico-chemical description, except in a very attenuated sense. And I don’t have in mind any version of psychism. As an example, consider the laws stated by Hull in his famous theory of rote learning. Those laws specify an analytical relationship between continuous variables. Since a Turing machine is wholly ‘discrete’, those laws are formally incompatible
with the Turing machine model. Yet they could perfectly well be correct. The reader may at this point feel annoyed, and want to retort: Hull’s laws, if ‘correct’, are correct only with a certain accuracy, to a certain approximation. And the exact law has to be compatible with the Turing machine model, if I am a Turing machine, or with the laws of physics, if materialism is true. But there are two separate and distinct elements to this retort. (1) Hull’s laws are ‘correct’ only to a certain accuracy. True. And the statement ‘Hull’s laws are correct to within measurement error’ is perfectly compatible with the Turing machine model, with the physicalist model, etc. It is in this attenuated sense that the laws of any higher-level discipline have to be compatible with the laws of physics: It has to be compatible with the laws of physics that the higher-level laws could be true to within the required accuracy. But the model associated with the higher-level laws need not at all be compatible with the model associated with the lower-level laws. Another way of putting the same point is this. Let L be the higher-level laws as normally stated in psychology texts (or texts of political economy, or whatever). Let L* be the statement ‘L is approximately correct’. Then it is only L* that has to be compatible with the laws of physics, not L. (2) The exact law has to be compatible with the Turing machine model (or anyway the laws of physics). False. There need not be any ‘exact’ law - any law more exact than Hull’s - at the psychological level. In each individual case of rote learning, the exact description of what happened has to be compatible with the laws of physics. But the best statement one can make in the general case, at the psychological level of organization, may well be that Hull’s laws are correct to within random errors whose explanation is beneath the level of psychology. The general picture, it seems to me, is this. 
Each science describes a set of structures in a somewhat idealized way. It is sometimes believed that a non-idealized description, an ‘exact’ description, is possible ‘in principle’ at the level of physics; be that as it may, there is not the slightest reason to believe that it is possible at the level of psychology or sociology. The difference is this: If a model of a physical structure is not perfect, we can argue that it is the business of physics to account for the inaccuracies. But if a model of a social structure is not perfect, if there are unsystematic errors in its application, the business of accounting for those errors may or may not be the business of social science. If a model of, say, memory in functional terms (e.g., a flow chart for an algorithm) fails to account for certain memory losses, that may be because a better psychological theory of memory (different flow chart) is called for, or because on certain occasions memory losses are to be accounted for by biology (an accident in the brain, say) rather than by psychology. If this picture is correct, then ‘oversimplified’ models may well be best possible at the ‘higher’ levels. And the relationship to physics is just this: It is compatible with physics that the ‘good’ models on the higher levels should be approximately realized
by systems having the physical constitution that human beings actually have. At this point, I should like to discuss an argument proposed by Hubert Dreyfus. Dreyfus believes that the functional organization of the brain is not correctly represented by the model of a digital computer. As an alternative he has suggested that the brain may function more like an analogue computer (or a complicated system of analogue computers). One kind of analogue computer mentioned by Dreyfus is the following: Construct a map of the railway system of the U.S. made out of string (the strings represent the railroad lines, the knots represent the junctions). Then to find the shortest path between any two junctions - say, Miami and Las Vegas - just pick up the map by the two corresponding knots and pull the two knots away from each other until a chain of strings between them becomes taut. That chain will represent the shortest path. When Dreyfus advanced this in conversation, I rejected it on the following grounds: I said that the physical analogue computer (the map) really was a digital computer, or could be treated as one, on the grounds that (1) matter is atomic, and (2) one could treat the molecules of which the string consists as gears which are capable of assuming a discrete number of positions vis-a-vis adjacent molecules. Of course, this only says that the analogue computer can be well approximated by a system which is ‘digital’. What I overlooked is that the atomic structure of the string is irrelevant to the working of the analogue computer. Worse, in order to carry through the re-description of the analogue device as a ‘digital’ device, I had to invent a microstructure (the idea of treating molecules as ‘gears’) which is just as fictitious as the idealization of a continuous string of constant length. 
The difference between my idealization (strings of ‘gears’) and the classical idealization (continuous strings) is that the classical idealization is relevant to the functioning of the device as an analogue computer (the device works because the strings are - approximately - continuous strings), while my idealization is irrelevant to the description of the system on any level.
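The string map is a special-purpose device for the single-pair shortest-path problem. For comparison, here is a minimal sketch of how a digital computer might solve the same problem, using Dijkstra's algorithm (a standard technique that the text itself does not mention). The junctions and track lengths are hypothetical, chosen only for illustration, and are not real rail mileages. The two devices compute the same function by unrelated mechanisms: what makes the string map work is that its strings are approximately continuous and of fixed length, not any digital redescription of its molecules.

```python
import heapq

# Hypothetical junctions and track lengths (illustrative only, not real mileages).
EDGES = [
    ("Miami", "Atlanta", 660),
    ("Miami", "Houston", 1250),
    ("Atlanta", "Dallas", 780),
    ("Atlanta", "St. Louis", 550),
    ("Houston", "Dallas", 240),
    ("St. Louis", "Denver", 850),
    ("Dallas", "Albuquerque", 640),
    ("Denver", "Las Vegas", 750),
    ("Albuquerque", "Las Vegas", 570),
]


def build_graph(edges):
    """Undirected adjacency map: each string can be pulled from either end."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, {})[b] = length
        graph.setdefault(b, {})[a] = length
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; all lengths must be non-negative."""
    dist = {source: 0}
    prev = {}
    frontier = [(0, source)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue  # stale queue entry: a shorter route was already settled
        visited.add(node)
        if node == target:
            break
        for nbr, length in graph[node].items():
            candidate = d + length
            if nbr not in dist or candidate < dist[nbr]:
                dist[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(frontier, (candidate, nbr))
    if target not in visited:
        return None, float("inf")  # no chain of strings connects the two knots
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


RAILWAY = build_graph(EDGES)
route, length = shortest_path(RAILWAY, "Miami", "Las Vegas")
print(route, length)
```

With these made-up lengths the route found runs Miami, Atlanta, Dallas, Albuquerque, Las Vegas, for a total of 2650 units; the string map would arrive at the same chain by being pulled taut.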
3.
Psychology
The previous considerations show that the Turing machine model need not be taken seriously as a model of the functional organization of the brain. Of course, the brain has digital elements - the ‘yes-no’ firing of the neurons - but whether the larger organization is correctly represented by something like a flow chart for an algorithm, or by something quite different, we have no way of knowing right now. And Hull’s model for rote learning suggests that some brain processes are best conceptualized in terms of continuous rather than discrete variables. In the first section of this paper we argued that psychology need not be deducible
from the laws describing the functional organization of the human brain, and in the last section we used a psychological state (jealousy) to illustrate that the Turing machine model cannot be correct as a paradigm for psychological theory. In short, there are two different questions which have gotten confused in the literature: (1) Is the Turing machine model correct as a model for the functional organization of the human brain? and (2) Is the Turing machine model correct as a model for psychological theory? Only on the reductionist assumption that psychology is the description of the functional organization of the brain, or something very close to it, can these two questions be identified. Our answer to these two questions so far is that (1) there is little evidence that the Turing machine model is correct as a model of the functional organization of the brain; and (2) the Turing machine model cannot be correct as a model for psychological theory - i.e., psychological states are neither machine states nor disjunctions of machine states. But what is the nature of psychological states? The idea of a fixed repertoire of emotions, attitudes, etc., independent of culture is easily seen to be questionable. An ‘attitude’ that we are very familiar with, for example, is the particular kind of arrogance that one person feels towards other people ‘because’ he does mental work and they do manual work. (The reason I put ‘because’ in shudder quotes is that really the causality is much more complicated - he feels arrogant because his society has successfully won him and millions of other people to the idea that the worker is ‘superior’ to the extent his work differs from the work of a ‘common’ laborer and resembles that of a manager, perhaps, or because it has won him and millions of other people to the idea that certain kinds of work are inherently ‘above’ most people - ‘they couldn’t understand’ - etc.). 
An ‘attitude’ we find it almost impossible to imagine is the following: One person feeling superior to others because the first person cleans latrines and the others do not. That we cannot imagine it is not because people who clean latrines are innately inferior, nor because latrine-cleaning is innately degrading. Given the right social setting, this attitude which we cannot now imagine would be commonplace. Not only are the particular attitudes and emotions we feel culture-bound, so are the connections between them. For example, in our society, arrogance of mental workers is associated with extreme competitiveness; but in a different society it might be associated with the attitude that one is ‘above’ competing, while being no less arrogant. This might be a reflection of the difference between living in a society based on competition and living in a society based on a feudal hierarchy. Anthropological literature is replete with examples that support the idea that emotions and attitudes are culture-dependent. For example, there have existed and still exist cultures in which private property and the division of labor are unknown. An Arunta cannot imagine the precise attitude with which Marie Antoinette said ‘let them
eat cake’, nor the precise attitude of Richard Herrnstein towards the ‘residue’ of low I.Q. people which he says is being ‘precipitated’, nor the precise attitude which made me and thousands of other philosophers feel tremendous admiration towards John Austin for distinguishing ‘Three Ways of Spilling Ink’ (‘intentionally’, ‘deliberately’, and ‘on purpose’). Nor can we imagine many of the attitudes which the Arunta feel, and which are bound up with their culture and religion. This suggests the following thesis: That psychology is as under-determined by biology as it is by elementary particle physics, and that people’s psychology is partly a reflection of deeply entrenched societal beliefs. One advantage of this position is that it permits one to deny that there is a fixed human nature at the level of psychology, without denying that Homo sapiens is a natural kind at the level of biology. Marx’s thesis that there is no fixed ‘human nature’ which people have under all forms of social organization was not a thesis about ‘nature versus nurture’.
4.
Intelligence
As an example, let us take a look at the concept of ‘intelligence’ - a concept in vogue with racist social scientists these days. The concept of intelligence is both an ordinary language concept and a technical concept (under the name ‘IQ’). But the technical concept has been shaped at every point to conform to the politically loaded uses of the ordinary language concept. The three main features of the ordinary language notion of intelligence are (1) intelligence is hard or impossible to change. When one ascribes an excellent or poor performance to high or low skill there is no implication that this was not acquired or could not be changed; but when one ascribes the same performance to ‘high intelligence’ or ‘low intelligence’ there is the definite implication of something innate, something belonging to the very essence of the person involved. (2) Intelligence aids one to succeed, where the criterion of ‘success’ is the criterion of individual success, success in competition. It is built into the notion that only a few people can have a lot of intelligence. (3) Intelligence aids one no matter what the task. Intelligence is thought of as a single ability which may aid one in doing anything from fixing a car or peeling a banana to solving a differential equation. These three assumptions together amount to a certain social theory: The theory of elitism. The theory says that there are a few ‘superior’ people who have this one mysterious factor - ‘intelligence’ - and who are good at everything, and a lot of slobs who are not much good at anything. The IQ test was constructed to preserve the elitist features of the concept in the following way. (1) The IQ test was standardized so that IQ scores would not change
much with age, thus preserving the illusion of a measure of something unchanging. One can do this with any test, as long as relative standings are fairly stable. For example, suppose one takes a test of French vocabulary. One’s score would presumably increase as one went through four years of college French. Now, suppose one standardized the scores so that on the average they did not increase (by simply giving less credit per item to people taking the test with one year of college French, still less credit to people with two years, etc.). Then a person with a score of 85 on ‘French vocabulary’ would, on the average, still score 85 after four full years. Finally, rebaptise the test, and call it a test of ‘French ability’ (‘French intelligence’?) and lo and behold a new social distinction is born! The distinction of having high or low ‘French ability’. But one is still, in the real world, just talking about high or low relative standing on plain old vocabulary tests. (2) The IQ test was ‘validated’ by selecting the items so that they would predict ‘success’ in college. The fact that IQ scores predict success in college - one of Herrnstein’s arguments for identifying IQ with ‘intelligence’ - is therefore 100% a statistical artifact of this method of validation. (3) The third feature of the ordinary language concept - that intelligence is a single factor - was harder to ensure. All of the statistical evidence turned out to be against this hypothesis. In fact, it turns out that over a hundred different factors contribute to one’s score on IQ tests. So one just takes an average, weighting the factors so that they predict success in school, and calls the result ‘Intelligence Quotient’. And again, lo and behold! One has people with ‘high IQ’ and people with ‘low IQ’, ‘gifted people’ (a term Herrnstein et al. use interchangeably with ‘high IQ’) and ‘dull people’ (a term used interchangeably with ‘low IQ’). In short, one recovers the full ordinary language use of the concept - but now with the appearance of ‘scientific objectivity’. 
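The French-vocabulary example can be made concrete. The following is a minimal sketch, assuming made-up raw scores for the same four students after one and after four years of study, and assuming that the norming step rescales each cohort to a fixed mean of 100 and a standard deviation of 15 (a common convention for IQ-style scales; the text itself fixes only the mean). Every student's raw score quadruples, yet the normed scores do not move, because after norming they encode nothing but relative standing within the cohort.

```python
from statistics import mean, pstdev


def norm_within_cohort(raw_scores, target_mean=100.0, target_sd=15.0):
    """Rescale one cohort's raw scores to a fixed mean and standard deviation."""
    values = list(raw_scores.values())
    m, s = mean(values), pstdev(values)
    return {name: target_mean + target_sd * (score - m) / s
            for name, score in raw_scores.items()}


# Hypothetical raw scores (words known) for the same students.
year1 = {"Ana": 400, "Ben": 500, "Cy": 600, "Dee": 700}
year4 = {"Ana": 1600, "Ben": 2000, "Cy": 2400, "Dee": 2800}

normed1 = norm_within_cohort(year1)
normed4 = norm_within_cohort(year4)
for name in year1:
    # Raw knowledge grows fourfold; the normed 'French ability' score stays put.
    print(f"{name}: raw {year1[name]} -> {year4[name]}, "
          f"normed {normed1[name]:.1f} -> {normed4[name]:.1f}")
```

Any scheme that rescales each cohort to the same mean and spread has this effect, which is why a stable normed score says nothing about how much a person has learned.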
Under a less competitive form of social organization, the theory of elitism might well be replaced by a different theory - the theory of egalitarianism. This theory might say that ordinary people can do anything that is in their interest and do it well when (1) they are highly motivated, and (2) they work collectively. After all, a long tradition of libertarian thought, including such different thinkers as Marx and Kropotkin, has held that the extreme separation of mental from manual labor and the top-down organization of society are neither forced upon us by human nature nor the only conceivable forms of advanced social organization - that a society based upon what Kropotkin called mutual aid rather than competition for profit and coercion by the state is conceivable and may well represent the setting in which human potential for ‘free conscious activity’ and ‘productive life’ (in the words of the early Marx) can best develop. Those who have conceived of such a society have always anticipated that it would bring enormous opportunities for ‘more science and art, more diffused knowledge and mental cultivation, more leisure for wage earners, and more capacity for intelligent pleasures’, as Russell wrote in the 1920s. Egalitarian estimates of human potential rest on a number of objective facts. That motivation
plays a decisive role in acquiring almost any skill is a matter of everyone’s experience. Take driving a car, for example. Anyone who is highly motivated can learn to drive exceptionally well, as a rule. One may be held back by certain emotional problems (themselves frequently connected with social attitudes), for example, nervousness or general insecurity. If one is trying to learn to drive in races, then fear or lack of fear will be a big factor. But if one is not insecure, not fearful, not so wrapped up in oneself or some problem that one lacks judgment, then one can learn to be far more skillful than average if one has the motivation. The same thing is true of any task. Again anthropological evidence is relevant. A Norwegian adolescent has ‘exceptional’ skiing ability by American standards. An Amazonian boy has ‘fantastic’ ability with a blowgun. A factory worker in Maine can sew two hundred collars in one hour - because she has to feed her family. The importance of working collectively is also evidenced in many ways. The Black and Latin prisoners in Attica Prison are presumably part of the low IQ ‘residue’. But they organized brilliantly. Every popular revolution in history makes the same point: ordinary people in a revolution can perform incredible feats of organization, planning, strategy, etc. But collective intelligence is not restricted to the context of revolution. Since the 1950s a series of studies have shown that even in the context of modern capitalist production, workers perform better, and find their jobs less dissatisfying, when the managerial hierarchy is reduced. E.g., studies in the coal mines showed that teams of workers working without a foreman (the jobs were divided up by the workers themselves; the workers rotated jobs at will, decided when they would take breaks, etc.; only the group as a whole had a production quota to meet, not the individual worker) produced better than when there was a foreman, that absenteeism was lower, etc. 
Today, when the automobile industry is experimenting with giving teams of workers a tiny bit of autonomy in this way, it is very evident that the reason for so much hierarchy, rigidity, and coercion was not, in fact, efficiency. Nor are the workers too lacking in intelligence to function without authoritarian discipline. Although it is a serious mistake, in my opinion, to consider China to be an egalitarian or libertarian society, similar experiments in China (possibly on a much larger scale, although it is hard to be sure) have had similar results. Of course, saying that people can solve any problem collectively implies something about the capacities of the individual person. Solving a problem collectively means many people individually doing many hard things. The fact that individual people can do things that are supposed to be ‘beyond’ them, when they are motivated to, is the basis of the fact that groups of people aiding each other can do anything, if they have to. But someone may be highly motivated in a collective struggle and ‘lack motivation’ when the context is one of individual competition - especially competition
in a situation which is loaded against him and his group. The same worker may be tremendously motivated in the context of organizing a strike or reorganizing a factory, given the opportunity, and tremendously ‘unmotivated’ when it comes to taking an IQ test. I am not arguing that the IQ test is simply a test of motivation. Among white, middle-class people - that is, among the sort of people the test was standardized on - there is little doubt that the test measures some sort of skill connected with reading and with interpreting written material and some sort of skill connected with abstract thinking, in the sense in which mathematics and logic are abstract. Nor am I concerned to deny that these skills have a significant genetic component. The high correlation between the IQ scores of identical twins is an impressive argument for the existence of such a component. But to use a test designed to show differences in the cognitive skills of white middle-class people to try to prove differences in the cognitive capacities of middle-class and working-class people, or white and black people, is unsound. Working-class people and especially disadvantaged people do not have the same motivation as middle-class people to acquire the ‘scholastic’ skills that these tests measure - not because they are ‘apathetic’ or ‘unmotivated’ in some vegetable sense, but because these skills will not help them get good jobs or decent lives in this society. (For example, Bowles and Gintis3 have shown that in the absence of years of schooling or high-status family, mere possession of high IQ has negligible effect on income. They also show that the effect of schooling on income is independent of its effects on the cognitive skills measured by the adult IQ test.) Even more important is the fact that these tests do not measure any kind of absolute cognitive capacity for anyone - white or black, middle class or working class. From the fact that Mr. Jones’ IQ is 100 you cannot infer how much he will know or be able to do; that depends on the cultural setting. This elementary point cannot be emphasized too strongly considering the uses currently being made of these tests. The absolute level of performance (i.e., level of knowledge displayed) on standardized intelligence tests has increased enormously since 1917. The mean IQ remains 100, of course, by definition. (Recall that IQ is a normalized measure; even if we were all as smart as Einstein, 50% of us would have IQ ‘below 100’.) At one time, reading and writing were skills confined to a tiny minority. If at, say, the time of Charlemagne one had suggested that someday almost everyone would be able to read and write, and that the descendants of the serfs would have a vocabulary of many thousands of words, one would have been looked at as one would be looked at today if he suggested that in twenty years symbolic logic may be taught in elementary school, or in 200 years everyone (including the 50% with ‘IQ below 100’) will know relativity theory.
3. See Bowles and Gintis, IQ and the U.S. class structure, Social Policy, Jan.-Feb. 1973, pp. 65-96.
But symbolic logic may well be taught in elementary school in twenty years, and I would be amazed if everyone didn’t know relativity theory in 200 years. That relative standing with respect to certain cognitive skills may be largely genetically determined is socially unimportant if no upper bound is thereby set on the absolute level of skill that the great majority of people can attain. Notice that, if someone maintains that IQ scores do set an upper bound on the absolute level of skill that people can attain - say, that people with IQ below 100 will never be able to learn accounting - not even if we change the teaching methods and the motivational situation - then he is committed to the claim that the analogous contention in Charlemagne’s time (with respect to literacy) was false because the ‘intellectual’ tasks of Charlemagne’s time were still remote from the boundary of absolute human capacity, whereas accounting is close to the boundary. I find it hard to believe that we have begun to even come close to any absolute boundary of human capacity. Thus, when Herrnstein, for example, writes⁴ that the 50% of the human race with IQ below 100 are ‘ineligible’ for accounting, it is somewhat difficult to know what he means. If he means that under present conditions, people with IQ below 100 can’t become accountants, this is uninteresting. But if he means that such people cannot become accountants even if we devise better teaching methods, and the culture is different in such a way that they are motivated to learn accounting, then this is highly implausible and in any case totally unproved. Yet it is this second, totally unproved contention that he needs to draw the conclusion that ‘low IQ’ people are destined to become an unemployable ‘residue’ in advanced industrial societies. Once again, we see the political assumptions that were built into the ordinary language notion of ‘intelligence’ operating.
The normalization of the IQ measure automatically makes it competitive: How ‘intelligent’ you are is defined to be not a function of how much you can learn on any absolute scale, but what percentage of the population you can beat. And then the measure is used as if it were, after all, an absolute measure, to legitimize stratification which has, in fact, nothing to do with IQ at all.⁵
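The competitive character of a normalized measure can be made concrete with a small computation. This is a minimal sketch, assuming IQ is normed by mapping each person’s percentile rank in the reference group onto a normal distribution with mean 100 and standard deviation 15; the standard deviation, the function name `deviation_iq`, and the raw scores are illustrative assumptions, not taken from the text. If every member of the group improves by the same large absolute amount, every IQ stays exactly the same, and half the group still falls ‘below 100’.

```python
from statistics import NormalDist


def deviation_iq(raw_scores, mean=100.0, sd=15.0):
    """Hypothetical norming procedure: convert each raw score to an IQ
    through its percentile rank within the reference group.

    Assumes a normal target distribution with mean 100 and SD 15;
    the SD is an illustrative assumption, not taken from the text.
    """
    n = len(raw_scores)
    target = NormalDist(mu=mean, sigma=sd)
    iqs = []
    for score in raw_scores:
        below = sum(1 for s in raw_scores if s < score)
        ties = sum(1 for s in raw_scores if s == score)  # includes score itself
        # Mid-rank percentile: always strictly between 0 and 1,
        # which NormalDist.inv_cdf requires.
        p = (below + 0.5 * ties) / n
        iqs.append(round(target.inv_cdf(p)))
    return iqs


# Hypothetical raw scores (e.g. items answered correctly) for a small group.
before = [10, 20, 30, 40, 50, 60]
# Everyone improves by the same large absolute amount.
after = [s + 1000 for s in before]

print(deviation_iq(before))
print(deviation_iq(after))
print(sum(iq < 100 for iq in deviation_iq(after)))
```

Both calls print the same list of IQs, and three of the six people score below 100 both before and after the improvement: under this assumed norming, the number records only what fraction of the group one can beat, not any absolute level of skill.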
4. P. 51 in his article ‘IQ’, Atlantic Monthly, Sept. 1971, 43-64.
5. For a lucid exposition of the ‘legitimizing’ function of IQ, cf. the article by Bowles and Gintis cited in note 3.
5. Psychology again
If these reflections are right, then it is worthwhile re-examining the nature of psychology. Reductionism asserts that psychology is deducible from the functional organization of the brain. The foregoing remarks suggest that psychology is strongly determined by sociology. Which is right? The answer, I suspect, is that it ‘depends on what you mean’ by psychology. Chomsky remarks that ‘so far as we know, animals learn according to a genetically determined program’. While scientific knowledge reflects the development of a socially determined program for learning, there can be little doubt that the possible forms of socially determined programs must in some ways be conditioned by the ‘genetically determined program’ and presuppose the existence of this program in the individual members of the society. The determination of the truth of this hypothesis, and the spelling out of the details, is the task of cognitive psychology. Nothing said here is meant to downgrade the importance of that task, or to downgrade the importance of determining the functional organization of the brain. Some parts of psychology are extremely close to biology - Hull’s work on rote learning, much of the work on reinforcement, and so on. It is no accident that in my own reductionist papers the example of a psychological state was usually ‘pain’, a state which is strongly biologically marked. On the other hand, if one thinks of the parts of psychology that philosophers and clinical psychologists tend to talk about - psychological theories of ‘aggression’, for instance, or theories of ‘intelligence’, or theories of ‘sexuality’ - then it seems to me that one is thinking of the parts of psychology which study mainly societal beliefs and their effects on individual behavior. That these two sides of psychology are not distinguished very clearly is itself an effect of reductionism. If they were, one might have noticed that none of the literature on ‘intelligence’ in the past 75 years has anything in the slightest to do with illuminating the nature and structure of human cognitive capacity.
Discussion
Professors and psychological researchers: Conflicting values in conflicting roles
H. B. SAVIN University of Pennsylvania
Psychology Professor Philip Zimbardo and his colleagues hired ten undergraduates to play at being prisoners, and eleven others to be their jailers. The psychologist-directors spared no pains in constructing a convincing jail. They persuaded the Palo Alto police to arrest the ‘volunteer’ prisoners in squad cars, charge them with felonies at the police station and deliver them blindfolded to Zimbardo’s jail, without informing them that this ‘arrest’ was the beginning of the experiment they had agreed to take part in. The principal result was that the guards behaved like prison guards and the prisoners like prisoners. Indeed, the guards seem to have been more brutal, and the prisoners more degraded, than one would expect them to have become after only a few days in an establishment jail; Zimbardo was outstandingly successful at simulating the most destructive aspects of prisons. Within five days, he reports, he had felt obliged to release four of his ten prisoners because of ‘extreme depression, disorganized thinking, uncontrollable crying and fits of rage’ (1973a, p. 48), even though he was not so squeamish as to have prevented his guards from forcing prisoners to clean toilets with their bare hands, spraying them with fire extinguishers and repeatedly making them ‘do pushups, on occasion with a guard stepping on them’ (p. 44). In short, one cannot make a prison a more humane institution by appointing Mr. Zimbardo its superintendent - a result no more surprising to him than to the rest of us because he knows all about man’s ‘dehumanizing tendency to respond to other people according to socially determined labels and often arbitrarily assigned roles’ (1973a, p. 58). The roles of prisoner and guard have been much discussed of late, and their characteristics are reasonably well known. But Mr. Zimbardo’s study also calls attention to the role of psychological researcher.
Cognition 2(1), pp. 147-149
Much research in psychology, and in medicine as well, cannot be done without subjecting people to injury, sometimes physical and sometimes psychological, sometimes temporary and sometimes permanent. When is such research justifiable? No one, surely, would object to embarrassing a few self-confident volunteers if the result were a cure for schizophrenia. But in a great many experiments it is by no means clear that the good outweighs the harm, and the balance one strikes will depend in part, as social psychologists know better than anyone else, upon one’s role. Consider again Zimbardo’s study. He has acknowledged that his results were ‘no surprise to sophisticated savants’ (1973b, p. 123), but feels that, even if it did not contribute anything to scientific knowledge, it was worth doing because it would help to enlighten those who had not learned about the importance of roles from less melodramatic research than his.¹ Is the degradation of thirty-two young men justified by the importance of the results of this research? That depends, obviously, on one’s point of view. In practice, the decision about whether to do an experiment in which the subjects are mistreated rests with the experimenter, who, of course, is likely to profit from whatever good comes of an experiment but does not experience the harm that it does to his subjects. Similar questions are raised by a great many other psychological experiments in which subjects are deceived, frightened, humiliated or maltreated in some other way. Like everyone else, psychological experimenters are bound by the code of criminal law, but the subject in a psychological experiment cannot assume that he will be treated any better than the law requires. Indeed, the police are apt to be somewhat lax in upholding the laws that might otherwise protect subjects because of a presumption that research of almost any sort is useful, or at least respectable. Professors and scientists have traditionally resisted any proposal that outsiders participate in judgments about what research ought not to be done because its objectionable side-effects outweigh its value.
(The American Psychological Association continues this tradition in its newly published Ethical principles in the conduct of research with human participants, where numerous ethical precepts are set forth but the only advice given to an investigator whose proposed experiments would violate these precepts is to weigh the harm to be done by the research carefully against its possible benefits.) But, when the experimenters themselves decide how much mistreatment the importance of their work justifies them in inflicting on their subjects, the result is exactly what social psychologists would predict: Simple lying becomes a perfectly commonplace feature of even students’ routine laboratory exercises; humiliation of subjects is not uncommon; on occasion there is a hell like Zimbardo’s. Society survives in spite of its used-car salesmen, its politicians’ assistants, and a host of other people whose roles tempt them to be as obnoxious as the law allows, and it will not be destroyed by a few dozen psychologists who are similarly overzealous in the pursuit of their careers, but this particular kind of morally obtuse zeal raises special problems for the university. Most of the psychologists whose experiments involve mistreatment of human subjects are university professors, and most of their subjects are university students. Professors who, in pursuit of their own academic interests and professional advancement, deceive, humiliate, and otherwise mistreat their students are subverting the atmosphere of mutual trust and intellectual honesty without which, as we are fond of telling outsiders who want to meddle in our affairs, neither education nor free inquiry can flourish.
1. Zimbardo makes one other claim about the value of this research: ‘From what we have learned by observing the process of dehumanization and causal matrix in which pathology was so easily elicited, we can help design not only more humanitarian prisons but help average people break out of their self-imposed or socially ascribed prisons. We have begun to do the former with correctional personnel, and the latter through a fuller exposition of the psychology of imprisonment in a forthcoming book’ (1973b, p. 123). Zimbardo does not explain how an experiment whose results were foreseeable can help us escape whatever metaphorical prisons we are in, nor how, except for its possible use in enlightening the ignorant, it can help him design more ‘humanitarian’ physical prisons.
REFERENCES
American Psychological Association (1973) Ethical principles in the conduct of research with human participants. Washington, D.C.
Zimbardo, Philip G., Banks, W. Curtis, Haney, Craig, and Jaffe, David (1973a) The mind is a formidable jailer. The New York Times Magazine, April 8, 38ff.
Zimbardo, Philip G. (1973b) Letter. The New York Times Magazine, May 20, 123.