Cognition, 8 (1980) 369-387 © Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
What young children think you see when their eyes are closed*
JOHN H. FLAVELL
SUSAN G. SHIPSTEAD
KAREN CROFT
Stanford University
Abstract
The common assumption that young children egocentrically believe you cannot see them when their own eyes are closed was investigated in two studies. It was found that 2.5-4-year-olds, but not 5-year-olds and adults, would indeed often give a negative reply to the experimenter’s question “Do I see you?” when their eyes were closed and covered with their hands. However, they would also correctly reply that the experimenter did see their arm and an object placed in front of them and did not see their eyes and back, indicating that they were making veridical, nonegocentric inferences about the experimenter’s visual experience. In addition, their eyes being visible to the experimenter did not prove to be either a necessary or a sufficient condition for their judgment that the experimenter could see “them” (“you”). It was concluded that, in this context, adults take “you” to mean their whole body while young children take it to mean primarily their face region. Speculations were made as to how young children could have acquired this meaning, and about possible similarities and differences between the self conceptions of young children and adults.
*This research was supported by National Science Foundation Grant no. BNS 76-16830. We wish to express our gratitude to the nursery school children and teachers whose cooperation made these studies possible; to Eleanor Flavell for her assistance in testing subjects; to Eleanor Flavell, Rachel Gelman, and Ellen Markman for their critical reading of the manuscript; and to Barbara Abrahams, Eleanor Flavell, Eleanor Maccoby, Ellen Markman, Sandra Starr, James Speer, and numerous others for their valuable suggestions about tasks and interpretations of results. Portions of this paper were presented at the meeting of the American Psychological Association, Toronto, August 1978. Requests for reprints should be sent to John H. Flavell, Department of Psychology, Stanford University, Stanford, Calif. 94305.
Knowledge concerning visual perception constitutes one form of social or psychological cognition (Shantz, 1975). Flavell and his co-workers have hypothesized that there are at least two developmental levels of such knowledge (Flavell, 1974, 1978; Lempers, Flavell, and Flavell, 1977; Masangkay, McCluskey, McIntyre, Sims-Knight, Vaughn and Flavell, 1974). At earlier-developing Level 1, the child can nonegocentrically infer what objects another person does and does not see, given adequate cues. At later-developing Level 2, the child further knows that an object simultaneously visible to both the self and the other person may nonetheless elicit different visual impressions or experiences in the two if their viewing circumstances differ (cf., Hughes, 1975).
A recent study by Flavell, Shipstead, and Croft (1978) illustrates how surprisingly nonegocentric and skillful Level 1 children can be at inferring whether an object is or is not visible to another person under various perceptual conditions (see also Hughes, 1975). Children of ages 2.5, 3, and 3.5 years were tested for their understanding of object hiding. Even the youngest subjects nonegocentrically hid an object from another person’s sight by placing it on the opposite side of a screen from that person, even though placing it there necessarily left it unconcealed from themselves. Most of them also correctly recognized that the other person could see the object when the screen was interposed between them and the object (thereby blocking their own view of it), but that the other person could not see it when the screen was interposed between that person and the object. In sum, they did not seem to mistake what they themselves did and did not see for what the other person did and did not see.
Thus, previous research would lead us to expect that children of this age would also do well on the following unusual type of Level 1 task: (a) the child and another person face one another, (b) the child’s eyes are closed and/or covered, (c) the child is told that the other person’s eyes are open and directed at the child’s face, (d) the other person then says, “Do I see you?” The child’s total lack of visual input or experience in this situation should provide an unusually powerful temptation to respond egocentrically.
Consistent with this, there seems to be a popular assumption that young children do often egocentrically assume that others cannot see them when their own eyes are closed. For instance, they are sometimes observed to merely close or cover their eyes rather than conceal their whole body when playing hide-and-seek. On the other hand, if it should turn out that young children respond nonegocentrically rather than egocentrically in this putatively egocentrism-tempting situation, it would suggest that their Level 1 knowledge is very solid indeed. It would also lead us to question what appears to be a folk belief about what young children think you see when their eyes are closed. The major purpose of Study 1 was therefore to test the solidity of 2- and 3-year-olds’ Level 1 knowledge.
Study 1
Method
Subjects
The subjects were 64 children from middle-class nursery schools and kindergartens, plus nine Stanford University students and staff. The age groups were categorized as 2.5 years (mean age 32.9 months, range 30-35 months), 3 years (mean age 39.4 months, range 36-41 months), 3.5 years (mean age 44.7 months, range 42-47 months), 5 years (mean age 63.3 months, range 60-67 months), and adult (mean age 23.0 years, range 21.2-26.0 years). There were eight girls and eight boys in each child group, four women and five men in the adult group.
Procedure
The experimenter and subject sat facing each other across a low table with a Snoopy dog toy on it. The adult subjects were told that the tasks were designed for young children and were therefore very simple. The adults were also told to answer each question quickly, giving their first, “gut-level” reaction; they were not to think before answering, just answer. The tasks described below were presented in random order, with the exception that the task Two Eyes Closed or Covered,¹ administered twice, was always the first (A) and the last (B) task given.
¹To aid discrimination, Study 1 task names will be italicized and Study 2 ones will not.
1. Two Eyes Closed or Covered A
After the child closed or covered both eyes, the experimenter said, “Now your eyes are closed, and my eyes are open.” Then she asked, “Do I see Snoopy?”, and then, “Do I see you?”. If the child indicated that the experimenter did not see him, the experimenter proceeded to ask, “Do I see your head?”, and then again, “Do I see you?”. (A number of children had difficulty keeping their eyes closed and so were asked to cover them with their hands instead. Regrettably, we did not record which or how many children covered rather than closed their eyes.)
2. One Eye Covered
The same procedure (minus the initial statement and the Snoopy question) was repeated while the child covered one eye with his hand.
3. Mouth Covered
The child was asked to cover his mouth with a hand. The experimenter asked, “Do I see your mouth?”, and then, “Do I see you?”.
4. Two Eyes Exposed
The child stepped away from the table and stood behind a long piece of material and looked at the experimenter through a small rectangular slot. The experimenter could thus see only the child’s eyes and the bridge of his nose. The experimenter asked, “Do I see your eyes?”, and then, “Do I see you?”.
5. Turn 180°
The child sat facing away from the experimenter with both eyes open. The experimenter said, “Your eyes are open, and I’m going to keep my eyes open too.” She then asked, “Do I see you?”. If the child responded in the negative, the experimenter followed with, “Do I see your head?”, and “Do I see you?”.
6. Experimenter Eyes Covered
A second experimenter faced the child, closed both eyes, and covered them with her hands. The first experimenter then asked the child, “Do you see ___ (name of second experimenter)?”.
7. Reflective Glasses
The child and experimenter took turns putting on the “special” glasses (silvered ski sunglasses) to show that the wearer of the glasses could see the other but the other could not see the wearer’s eyes. The experimenter verified the child’s understanding of these features of the glasses before posing the questions. The child put on the reflective glasses and was then asked, “Do I see your eyes?”, and “Do I see you?”.
8. Two Eyes Closed or Covered B
Same as 1.
Rationale
A curious and wholly unanticipated pattern of responding was observed during the pilot testing for this study. With only their eyes closed or covered, some young children would say that the experimenter did not see “them” (“you”), just as the popular assumption would predict. However, they would also correctly reply that she did see their head, arm, or other objects in her field of vision. The Two Eyes Closed or Covered task was included to find out how frequently this pattern would be observed in children of different ages.
More generally, the set of seven tasks was designed to identify the visual conditions of observed and observer which influence young children’s judgments about what the observer sees. One possibility is that young children egocentrically assume the other person cannot see anything at all when they themselves cannot. As suggested above, recognizing that others can see when the experience of not seeing anything is filling one’s own field of awareness may require more Level 1 ability to decenter from one’s own perspective than young children possess. Negative answers to all Two Eyes Closed or Covered questions would support this possibility; negative answers to the “you” questions only (the response pattern seen in pilot testing) would clearly rule it out. A second easily tested possibility is that they believe the other cannot see “them” unless both their eyes are open (One Eye Covered). A third is that she cannot see “them” if any important part of the face is concealed from her view, or if they engage in any sort of self-hiding gesture (Mouth Covered). The other tasks, together with Two Eyes Closed or Covered, could provide at least tentative evidence for other possibilities that will be considered below.
Results and discussion
Table 1 shows the percentages of correct answers to each task question in each of the five age groups. Recall that the questions which are most indented in Table 1 were asked only of subjects who had given an incorrect (negative) answer to the “you” question immediately preceding them. We shall first describe and discuss the adult response pattern, then the developmental trends leading to it, and finally the nature and possible meaning of immature patterns.

Table 1. Percentage of correct answers to each question in each age group (correct answers given in parentheses)

Tasks and Questions                        Ageᵃ
                                           2.5     3      3.5    5      Adults
1. Two Eyes Closed or Covered A
   Snoopy? (yes)                           100     100    100    100    100
   You? (yes)                              37      37     50     87     100
      Your head? (yes)ᵇ                    80      90     75     100    -
      You? (yes)                           10      30     12     0      -
2. One Eye Covered
   You? (yes)                              81      100    94     100    100
3. Mouth Covered
   Your mouth? (no)                        100     94     100    100    100
   You? (yes)                              81      94     100    100    100
4. Two Eyes Exposed
   Your eyes? (yes)                        100     100    100    100    100
   You? (yes)ᶜ                             81      75     81     31     33
5. Turn 180°
   You? (yes)                              50      31     50     69     100
      Your head? (yes)                     88      91     100    100    -
      You? (yes)                           25      18     12     20     -
6. Experimenter Eyes Covered
   Do you see (experimenter)? (yes)        44      50     56     87     100
7. Reflective Glasses
   Your eyes? (no)                         94      100    100    100    100
   You? (yes)                              44      31     62     63     100
8. Two Eyes Closed or Covered B
   Snoopy? (yes)                           100     100    100    100    100
   You? (yes)                              31      44     50     87     100
      Your head? (yes)                     91      89     100    100    -
      You? (yes)                           9       11     12     0      -

ᵃN = 16 for each child group and 9 for the adult group.
ᵇIndented questions were only asked of subjects who had responded incorrectly (i.e., negatively) to the preceding “you” question. The percentages in these rows are thus based on Ns of less than 16 in all cases. For example, 6 of the 16 2.5-year-olds answered the initial “you” question of Task 1 correctly (37%). Of the remaining 10, 8 (80%) correctly answered the subsequent “head” question and 1 (10%) correctly answered the subsequent “you” question.
ᶜThe “correct” answer to this question is somewhat arbitrarily set as yes here.

Adult pattern
The adults answered all object and body part questions correctly. They also seemed to construe “you” and “___” (experimenter’s name) as referring to each individual’s physical body taken more or less as a whole. Like body parts and external objects, “you” the body-as-a-whole was apparently experienced as “seen” to the extent that it was unconcealed and visible to the observer: definitely and unambiguously seen when most of it was visible; not seen or less certainly seen when only the eyes (or, for all one knows, any small portion of the body) were exposed to view, as in Two Eyes Exposed.
Developmental trends
The data in Table 1 suggest that there is considerable development towards the adult pattern between three and five years of age. Significant or near-significant decreases in negative answers across this age range obtained for task 1, χ²(3) = 10.79, p < 0.05; task 6, χ²(3) = 7.51, p < 0.10; and task 8, χ²(3) = 11.29, p < 0.05; the apparent decrease for task 5 is not significant. The age increase in negative answers to the “you” question of the Two Eyes Exposed task was also significant, χ²(3) = 12.69, p < 0.01. While not all 5-year-olds responded like the adults, a good many did: eight responded affirmatively to each of the five “you” questions on tasks 1, 5, 6, 7, and 8 and five more responded affirmatively to four of the five; the corresponding figures for the three younger groups were, from youngest to oldest, 3 and 1, 2 and 0, and 6 and 0.
Immature patterns
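As a rough check on the developmental-trend statistics, the reported chi-square values can be recomputed from Table 1. The sketch below rests on two assumptions that the paper does not state explicitly: that each test was a Pearson chi-square on a 2 × 4 table of correct versus incorrect answers across the four child groups, and that the counts are the Table 1 percentages converted back to numbers of children out of 16. The function and variable names are illustrative only.

```python
# Sketch only: assumes a Pearson chi-square on a 2 x 4 table
# (correct vs. incorrect answers by child age group), with counts
# recovered from the Table 1 percentages (N = 16 per child group).

N_PER_GROUP = 16

# Children (2.5, 3, 3.5, 5 years) giving the "correct" answer.
CORRECT_COUNTS = {
    "task 1, You?": [6, 6, 8, 14],
    "task 6, Do you see (experimenter)?": [7, 8, 9, 14],
    "task 8, You?": [5, 7, 8, 14],
    "task 4, You?": [13, 12, 13, 5],
}


def pearson_chi_square(table):
    """Return the Pearson chi-square statistic for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def chi_square_for(correct):
    """Build the correct/incorrect table for one question and test it."""
    incorrect = [N_PER_GROUP - c for c in correct]
    return pearson_chi_square([correct, incorrect])


for label, correct in CORRECT_COUNTS.items():
    # Degrees of freedom: (2 - 1) * (4 - 1) = 3, as in the text.
    print(f"{label}: chi-square(3) = {chi_square_for(correct):.2f}")
```

Under these assumptions the four statistics come out at 10.79, 7.51, 11.29, and 12.69, which agrees with the values reported in the text.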
There is no suggestion whatever in the data that even the youngest subjects egocentrically assumed that the experimenter could not see anything when they themselves could not see anything. As Table 1 indicates, all subjects said the experimenter saw the Snoopy doll on both administrations of Two Eyes Closed or Covered. In addition, of those children who were asked the “head” question on Tasks 1, 5, and 8 (by virtue of having just said no to the “you” question), the percentages responding correctly were 83%, 94%, and 93% respectively. These affirmative answers are significantly more numerous than would be expected by chance (all are p < 0.001 by Sign Test) and therefore, of course, also far more numerous than would be predicted by any total-inability-to-decenter hypothesis.
The data in Table 1 also indicate that almost all the children believed the other could see “them” when only one rather than both of their eyes was covered (One Eye Covered). The possibility that they would say no to the “you” question no matter what facial part was covered was also ruled out by the finding that most of the children also said yes to the “you” question when their mouth was concealed (Mouth Covered).
Only 10 subjects consistently gave incorrect answers to the five “you” questions of tasks 1, 5, 6, 7, and 8. We analyzed children’s patterns of yes and no answers to these “you” questions plus the “you” question of Two Eyes Exposed (task 4) to see if these patterns might at least provide clues about underlying beliefs in this area. We first excluded from these pattern analyses the 18 children who gave correct answers to the task 1, 5, 6, 7, and 8 “you” questions, plus two others who responded incorrectly to several “head” questions and may therefore have had unusual attention or comprehension problems. This left a sample of 44 subjects.
One imaginable childish belief is that you “see me” if and only if I see, i.e., am sighted. A child who believed this should say no to at least one of the two “you” questions where he is unsighted (tasks 1 and 8), but should say yes to all the “you” questions where he is sighted (tasks 4, 5, and 7). A second possible belief is that you “see me” if and only if I see you. A child who believed that should also respond as above, except to say no rather than yes on the task where he is sighted but cannot see the experimenter (task 5). A third possible belief is that you “see me” if and only if you see my eye(s). The response pattern consistent with this belief is a yes on the task where his eyes are visible to the experimenter (task 4) and no on those where they are not (tasks 1 or 8, 5, and 7).
Of the 44 subjects considered, one showed the first pattern, eight the second and 14 the third. Moreover, 11 of the latter 14 also said they could not see the experimenter in Experimenter Eyes Covered (task 6), a pattern consistent with the more general belief that anyone can be seen by an observer if and only if the observer can see the person’s eyes. The third belief differs from the first two in that it takes as the relevant consideration what the observer sees rather than what the observed person sees. The young subjects in this study obviously took what the observer sees as the relevant consideration when answering “head” and “Snoopy” questions. It is therefore reasonable to suppose that the same was also true when they answered the “you” questions.
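The three candidate beliefs and their predicted answer patterns can be written out as explicit rules. The following is a minimal sketch, not the authors’ scoring procedure: it assumes that each child’s answer to the first “you” question of tasks 1, 4, 5, 7, and 8 is coded as yes or no, and it reads “tasks 1 or 8” as a no on at least one of those two tasks. The function name and belief labels are hypothetical.

```python
# Sketch only: encodes the three candidate beliefs as rules over a
# child's yes/no answers to the first "you" question of tasks 1, 4, 5, 7, 8.
# Assumption: "tasks 1 or 8" means a "no" on at least one of the two.

def classify_belief(yes):
    """Return the belief whose predicted pattern matches, or None.

    `yes` maps a task number to True (child said yes) or False (said no).
    """
    no_when_unsighted = (not yes[1]) or (not yes[8])
    # All three patterns require yes on task 4 (eyes exposed) and at
    # least one no with eyes closed or covered.
    if not (yes[4] and no_when_unsighted):
        return None
    if yes[5] and yes[7]:
        return "you see me iff I see"
    if (not yes[5]) and yes[7]:
        return "you see me iff I see you"
    if (not yes[5]) and (not yes[7]):
        return "you see me iff you see my eyes"
    return None


# Example: no with eyes closed, facing away, and behind reflective
# glasses, but yes when only the eyes are exposed.
example = {1: False, 4: True, 5: False, 7: False, 8: False}
print(classify_belief(example))
```

Because the three rules differ on tasks 5 and 7, no child can match more than one of them under this coding.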
The overall pattern of results in Study 1 led us to the following conclusions and speculations. Consistent with their performance on other Level 1 tasks, 3-year-olds are quite capable of accurately and nonegocentrically inferring what physical objects the other does and does not see, even in the extreme condition when they themselves do not see anything. This suggests that their Level 1 knowledge is very robust and well consolidated, and thereby answers the question that originally motivated this study. If this is true, however, it implies that their negative answers to “you” questions were not usually caused by incorrect inferences concerning what or how much of their physical bodies were actually visible to the other. We are thus left with an intriguing puzzle that had not been anticipated when we undertook this study.
The most likely alternative cause of these answers seemed to be that “you” or “see you” means something different to young children than it does to adults; the fact that some children deny that they can see the experimenter when her eyes are closed (Experimenter Eyes Covered) clearly suggests that semantic rather than perceptual considerations must be important here. It looks as if the rumored tendency of young children to think that others do not see them when they avert or cover their eyes does have a factual basis, although its meaning appears to be very different from what most of us would have suspected. Perhaps young children really do believe you “see them” in some special, nonadult meaning of these terms if and only if you see at least one of their eyes (see the above pattern analysis). And if so, could it conceivably be because they (a) take “you” to refer to their inner-psychological rather than outer-physical self in these task settings, and (b) believe that their inner-psychological self is somehow visible to others through their eyes? A search through thesauruses revealed that many writers from Cicero on have spoken metaphorically of the eyes as “the windows of the soul” or the equivalent. Implausible as it may appear, perhaps young children entertain some literal version of this idea, especially when eyeball to eyeball with large, seemingly all-knowing and all-seeing grownups.
Study 2 was undertaken to obtain more and better evidence relevant to these possibilities than Study 1 afforded, as well as to see if the basic Study 1 results could be replicated.
Study 2
Method
Subjects
The subjects were 6 boys and 16 girls from middle-class nursery schools (mean age 46.6 months, range 39-52 months).
Procedure
The tasks described below were presented to the children in random order, with the exception that the Cognitive Self Interview was always administered last. Their rationales will become apparent in the Results and Discussion section.
1. Two Eyes Covered
The child and experimenter sat facing one another. The child closed his eyes and covered them with his hands. After making sure he could not see anything, the experimenter said: “My eyes are open and I’m looking. Do I see ___ right now?”. This question was asked five times in succession, with the blank filled by “you”, “you, ___” (child’s first name), “your eyes”, “your back” (a nonvisible body part), and “your arm” (a visible body part). These five questions were asked in a random order that was variable across subjects, with the constraint that the two “you” questions were always separated by at least one other question. This same questioning procedure was used in the next four tasks as well, except that the visible body part queried was not always the child’s arm.
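The ordering constraint on the five questions can be illustrated with a short sketch. It is only one way such an order could be drawn; the paper does not say how the randomization was actually carried out. The approach assumed here is rejection sampling: shuffle the five questions and redraw whenever the two “you” questions end up adjacent. All names are hypothetical.

```python
import random

# Sketch only: one possible way to draw a question order that keeps the
# two "you" questions apart. The redraw-until-valid approach is an
# assumption, not the procedure documented in the paper.

QUESTIONS = ["you", "you, <first name>", "your eyes", "your back", "your arm"]
YOU_QUESTIONS = {"you", "you, <first name>"}


def you_questions_separated(order):
    """True if at least one other question lies between the two 'you' items."""
    positions = [i for i, q in enumerate(order) if q in YOU_QUESTIONS]
    return abs(positions[0] - positions[1]) > 1


def draw_question_order(rng):
    """Shuffle the five questions until the separation constraint holds."""
    order = list(QUESTIONS)
    while True:
        rng.shuffle(order)
        if you_questions_separated(order):
            return list(order)


rng = random.Random()  # a fresh order would be drawn for each subject and task
print(draw_question_order(rng))
```

Of the 120 possible orders of five questions, 48 place the two “you” questions next to each other, so on average fewer than two shuffles are needed per draw.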
2. Card
The experimenter held a 5 × 8 inch white card perpendicularly about 20 cm in front of the child’s face, such that neither could see the other’s face. The visible body part was “your foot” in this task.
3. Turn 135°
A second experimenter sat 135° to the subject’s right rear, holding a puppet. The child continued to look at it over his shoulder while being questioned, turning his upper torso a greater or lesser amount in order to do so. The visible body part was “your arm.” “Your back” continued to be used as the supposedly nonvisible body part, although at least a portion of the child’s back was in fact usually visible to the first experimenter while the child looked at the puppet.
4. One Eye Exposed
The child stood behind a long piece of material with one eye pressed against a hole about the same size and shape as his eye. The questioning procedure was identical to that used in task 1, except of course that “your eye” was substituted for “your eyes”.
5. Reflective Glasses and Mirror
The properties of the silvered reflective glasses were demonstrated to the child much as in Study 1. The child then put the glasses on and the experimenter knelt behind him, holding a 33.5 × 23 cm mirror in front of them. The child could thus see in the mirror both his own face and that of the experimenter looking at him, but of course could not see his own eyes. Unlike in the Reflective Glasses task of Study 1, however, the child also “saw” that the experimenter could not see his eyes either. The visible body part queried was “the top of your head” and a sixth question followed the usual five, namely “Do you see yourself?”.
6. Where Experimenter Looks
The experimenter faced about 180° away from the child and said: “My eyes are open and I’m looking right here (pointing to an object across the room). Do I see you right now?”. The statement and question were then repeated with the experimenter pointing successively at (but not naming) the child’s shin, stomach, eyes, chin, and finally shin again. The order of eyes, stomach, and chin was randomized, however, thus making the order of the entire set of subtasks as follows: Away-Shin-(Chin, Eyes, Stomach)-Shin.
7. Experimenter and Doll Eyes Closed
The experimenter said “Now ___’s eyes are closed. Do you see ___ right now?”. The child’s visual targets were, in random order, the second experimenter who had just closed her eyes and a small doll whose eyes automatically closed when it was placed in a horizontal position.
8. Cognitive Self Interview
The interview dealt with the meaning, location, and potential visibility of the “cognitive self”, in that order. Using the abovementioned doll, the experimenter first explained that dolls are like people in some ways, namely, both have legs, arms, heads, etc. (pointing to corresponding body parts on the doll and on the child and experimenters). The experimenter then asked how dolls are different from people, and whether dolls know their names and think about things, as the child and other people do. The inflection of the questions and the nature of accompanying remarks suggested that people are in fact different from dolls in just these ways. The location questions then were: “Where is the part of you that knows your name and thinks about things? Where do you do your thinking and knowing?”. Every effort was made to get the child to listen very attentively to these questions and comprehend them as best she could. If the child did not indicate a location in response to these questions the experimenter gestured randomly and imprecisely towards different areas of the child’s body, asking “Is it here, here, here... where?” The visibility questions came next: “If I look here (points), at (in) your ___, do I see the part of you that knows things and thinks?”. Four body parts were named and inquired about in random order: stomach, foot, nose (but with the experimenter actually staring at the child’s eyes), and eyes. Ad lib follow-up questions about the location and visibility of the cognitive self were also asked in many cases, depending upon the child’s previous responses and responsivity to the standard questions.
Rationale
The subjects used in Study 2 were selected with the hope that they would still be young enough to give some immature responses to key “you” questions but also old enough to comprehend the ideas and questions presented in the Cognitive Self Interview. The questioning procedure of tasks 1-5 in this study was intended to be a methodological improvement over that used in Study 1.
As for specific tasks, Two Eyes Covered provides a replication of Study 1’s Two Eyes Closed or Covered, but with all subjects hiding their eyes in the same fashion. The Card task presents a condition in which the child can “see” that his face is not visible to the experimenter. If the visibility of his eyes to the observer is critical for the young child, this task should elicit a great many negative answers to its “you” questions. In Turn 135°, most of the front of the child’s body and a bit of the side of his face remains visible to the experimenter; as in Turn 180°, however, the child cannot see the experimenter. Moreover, in contrast to tasks like Two Eyes Covered, Two Eyes Closed or Covered, Two Eyes Exposed, One Eye Exposed, Card, and perhaps even Turn 180°, the child’s turning to look at the puppet in Turn 135° does not closely resemble any hiding-of-self action one could imagine young children of any culture performing in everyday life, e.g., in hiding games with parents. If young children also say that the experimenter does not see “them” in this task, therefore, it probably means they are not merely assimilating all our task conditions to culturally-acquired, stereotyped hiding games.
One Eye Exposed provides a more stringent test than Two Eyes Exposed of the hypothesis that, for young children, eye visibility is a sufficient condition for a judgment that they are seen. Similarly, Reflective Glasses and Mirror should be a better test than Reflective Glasses of the possibility that eye visibility is a necessary condition. Once the reflective glasses were put on them, a number of the younger children in Study 1 seemed to have trouble maintaining their just-established recognition that others cannot see the wearer’s eyes through the glasses. For such children, then, the “you” question came immediately after a hard won and perhaps merely token negative answer to the “eyes” question; this could have led to a similarly shallow negative answer to the “you” question. In contrast, the children in Study 2 had no difficulty in believing that the experimenter did not see their eyes. The reason is that they could not perceive their own eyes and could also “see” that the experimenter could not perceive them either.
The subtasks of Where Experimenter Looks might answer several questions. Would young children adopt the adult, whole-body-as-visual-target interpretation of “see you” in a situation designed to highlight it? The Away-Shin sequence should highlight it, since the experimenter first looks away from the child, then turns to look at a part of his body. When the experimenter does look at the child, will the child tend to say the experimenter sees him only when the experimenter looks at his eyes? Or might the tendency to reply affirmatively instead increase more or less continuously as the experimenter’s gaze approaches the eyes, for example from Shin to Stomach to Chin to Eyes? Will there be less tendency to say yes to the second Shin question than to the first one, since the immediate context is now being looked in the eye rather than not being looked at at all? Finally, as in Turn 135°, negative answers to, for example, Shin cannot be easily dismissed as generalizations from previous experience with hiding games.
The Experimenter and Doll Eyes Closed subtasks follow up the Study 1 Experimenter Eyes Covered task. The child sees no hands-over-eyes actions that could be assimilated to gamelike hiding rituals in the former, however. The Doll subtask was included simply to find out whether any tendency to say that one cannot “see” other people when their eyes are closed applies only to real people.
The principal motivation for appending the Cognitive Self Interview was to provide evidence for or against the windows-of-the-soul speculations advanced at the conclusion of Study 1. If the child localizes at least the cognitive part of the inner, psychological self (“soul”) in the head and also harbors this “windows” intuition, she ought to say the experimenter can see that part when he looks into her eyes. If this intuition depends upon actual eye contact as against the experimenter’s verbal and gestural specification of what he is looking at, the response to Nose and Eyes should be the same; if not, the two responses should differ. Finally, we were simply interested in finding an effective, methodologically adequate method for assessing whether and where young children locate at least one, fairly clearly specifiable part of the psychological self, namely, the thinking and knowing part. A search of the literature suggests that no such method has yet been devised (cf., Horowitz, 1935).
Results and Discussion Table 2 shows the percentage of subjects giving correct answers to body part and “you” questions. As in Study 1, the children did well on the body part questions. The one apparent exception (task 3, nonvisible part) is readily explained: as indicated earlier, part of the child’s back was in fact usually visible to the experimenter when the child turned to look at the puppet. Of the 14 other body part questions in tasks l-5, the mean number correctly answered was 12.9 1. The sturdiness of 3-yearolds’ Level 1 percept inference skills in the face of probable temptations to egocentrism is again demonstrated. The Two Eyes Covered “you” questions seem to have elicited roughly the same proportion of no answers in this study as the Two Eyes Closed or Covered “you” questions did for the Study 1 group most similar in age to the present sample, namely, the Study 1 3.5year-olds. The somewhat similar Card task “you” questions also elicited substantial percentages of negative answers. The curious tendency for this sort of task situation to elicit “you don’t see me” judgments in many young children thus appears to be quite
Table 2. Percentages of subjects giving correct answers to each question

                                         Body Part Questions          “You” Questions
Tasks                                  Eye(s)  Visible  Nonvisible   “You”  “You, ___”  Both
1. Two Eyes Covered                     100      86        100         54       45        36
2. Card                                  91      82        100         23       36        23
3. Turn 135°                             86      86         23         64       64        46
4. One Eye Exposed(a)                   100     100        100         41       59        32
5. Reflective Glasses and Mirror        100      82         77         86       82        77
6. Where Experimenter Looks
   a. Away                                -       -          -        100        -         -
   b. Shin                                -       -          -         45        -         -
   c. Stomach                             -       -          -         73        -         -
   d. Chin                                -       -          -         86        -         -
   e. Eyes                                -       -          -         86        -         -
   f. Shin                                -       -          -         50        -         -
7. Experimenter and Doll Eyes Closed
   a. Experimenter                        -       -          -          -        -         -
   b. Doll                                -       -          -          -        -         -

(a) As in Study 1’s Two Eyes Exposed task, the “correct” answer to “you” questions is arbitrarily set as yes here.
robust. There were nine no answers (41%) to whichever “you” question was asked first in Turn 135°, compared to 50% for Turn 180° in Study 1. More than is true for Turn 180°, negative answers to Turn 135° “you” questions cannot easily be explained as simple generalizations from self-hiding actions or games learned at home. The data from One Eye Exposed suggest that eye visibility is not in fact a sufficient condition for judged “you” visibility for at least a number of 3.5-year-olds. Although all the subjects said their eye was visible, 32% said no to both “you” questions and 68% said no to at least one. Notice that these results could hardly reflect a belief that the experimenter had to see both of their eyes in order to see “them.” That belief would generate negative answers to the Study 1 One Eye Covered “you” question, and such answers were very rare (Table 1). The data from Reflective Glasses and Mirror very strongly indicate that eye visibility is not usually a necessary condition either. Although all subjects said their eyes were not visible in this task, 77% said yes to both “you” questions and 91% said yes to at least one (all subjects also said that they could see themselves). These percentages of yes answers are similar to those for Where Experimenter Looks: Eyes, where the experimenter actually looks at the child’s completely visible eyes. Finally, a child who consistently believed eye visibility to be both a sufficient and a necessary condition for “you” visibility should give two yes answers in One Eye Exposed and two no answers in Reflective Glasses and Mirror. Not one child showed this response pattern, however. It is of course possible that eye visibility might be a sufficient and/or necessary condition for judged “you” visibility in children younger than 3.5 years, although we frankly doubt it.
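The “both” and “at least one” percentages quoted for One Eye Exposed and for Reflective Glasses and Mirror are linked by simple counting, and the link can be checked against the figures reported for the two “you” questions. The sketch below is only an illustrative consistency check, not part of the original analysis. It assumes a sample of 22 children (the number mentioned in connection with task 7) and reads the per-question percentages of yes answers from Table 2; the function name is hypothetical.

```python
N_SUBJECTS = 22  # assumed sample size ("five of the 22 subjects")

def at_least_one_and_neither(pct_q1, pct_q2, pct_both, n=N_SUBJECTS):
    """Return (% yes to at least one "you" question, % yes to neither)."""
    # Convert the rounded percentages back to whole numbers of children.
    q1, q2, both = (round(p * n / 100) for p in (pct_q1, pct_q2, pct_both))
    # Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
    either = q1 + q2 - both
    return round(100 * either / n), round(100 * (n - either) / n)

# Reflective Glasses and Mirror: 86% and 82% yes, 77% yes to both.
print(at_least_one_and_neither(86, 82, 77))  # (91, 9)

# One Eye Exposed: 41% and 59% yes, 32% yes to both.
print(at_least_one_and_neither(41, 59, 32))  # (68, 32)
```

Under these assumptions the first call reproduces the 91% who said yes to at least one question in Reflective Glasses and Mirror, and the second reproduces the 32% who said no to both questions in One Eye Exposed.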
It is apparent from the Where Experimenter Looks data that the Away-Shin sequence did not lead most of the 3.5-year-olds to adopt the adult, whole-body-as-physical-target interpretation of “see you”, as we thought it might: only 45% gave a yes response to the “you” question when the experimenter looked at their shins after having just looked away from them (first Shin question), with a similar (50%) rather than a significantly lower number giving the same response to the second Shin question. It is also clear from the Where Experimenter Looks data that yes answers were not given solely when the experimenter looked at the children’s eyes: they were as common or nearly so when their chins and stomachs were the visual targets. These, together with the yes answers to the Shin questions, constitute further evidence against the eye-visibility-as-necessary-condition hypothesis. The frequent no answers to the Shin questions, like those in Turn 135°, once again seem to argue against the supposition that children were merely assimilating our tasks to familiar hiding games. Finally, five of the 22 subjects
(23%) said they did not see the second experimenter when her eyes were closed (task 7a), but none said this when the doll’s eyes were closed (task 7b). Fourteen of the subjects met the following criteria in their responses to the Cognitive Self Interview. First, they unequivocally localized the “part of you that knows your name and thinks about things” in one specific place. Second, they did not do or say anything in addition that was inconsistent with that unique localization, such as later indicating that the experimenter could see that part of them at a location other than the one initially specified. Of these 14 subjects, 10 localized it in the head, three in the mouth, and one in the shoulders. Among the other eight subjects, there were three mentions of stomach, one each of face, foot, hand, and knee, and one failure to specify any location. Significantly, no subject in either group mentioned or pointed to her eyes as a location. Our general impression was that the 14 who met these criteria understood our questions quite well and that most of the remaining eight probably did not. The two subgroups did not differ consistently in their performance on other tasks. Of the 14 who met these criteria, one said that the experimenter saw the part in question only when he indicated he was looking into the child’s eyes, one only when he indicated that he was looking at the child’s nose (but, according to procedure, was actually looking at her eyes), three answered affirmatively to both questions, and the remaining nine answered negatively to both questions. However, an examination of the interview protocols of even the five subjects who responded affirmatively here revealed no evidence whatever that they entertained any “windows-of-the-soul” conception of their eyes. The subsequent interchange with one went like this: “Can I see you thinking? No. Even if I look in your eyes, do I see you thinking? No. Why not? Cause I don’t have any big holes. 
You mean there would have to be a hole there for me to see you thinking?” The child nods. Two others also subsequently denied that the experimenter could see them thinking (“Cause the skin’s over it”, said one), while the remaining two localized the thinking part in the mouth and shoulders, respectively. From the children’s responses to standard and follow-up questions, the modal intuition seemed roughly to be that thinking and knowing go on inside the head and are therefore not visible to others; in particular, others cannot see these activities or the part of the self that does them by looking into one’s eyes. Although the main purpose of the Cognitive Self Interview was to settle the visibility-of-the-inner-you question, it also appears to be a more promising method for learning about very young children’s concept of the self than previous ones of its kind (Horowitz, 1935).
How, then, to explain the results of the two studies? It is possible that the young child’s tendency to say yes in response to a given task’s “you” question partly depends upon what he thinks the observer sees in that task condition. What he may think the observer sees is characterized below in the form of an ordered series of categories. The Study 2 tasks that seem to belong in each category are also given, together with justifications where needed:

1. None of body - Where Experimenter Looks: Away.
2. None of face but some of body - Where Experimenter Looks: Shin, and Card. In pilot work with the Card Task, we found that a number of children did not think the experimenter saw their arm when he held the card between their faces. Many did think he could see their foot, however, and that was consequently selected as the visible body part. This explains the present classification of this task under “some of body.”
3. None of face but most of body - Where Experimenter Looks: Stomach.
4. Some of face - Two Eyes Covered, Turn 135°, and One Eye Exposed. In Two Eyes Covered, the child’s hands covered most of the rest of her face as well as her eyes.
5. Most of face - Reflective Glasses and Mirror.
6. All of face - Where Experimenter Looks: Eyes and Where Experimenter Looks: Chin.

Let us make the post hoc hypothesis that the child’s inclination to say yes to “you” questions increases as task conditions progress from category 1 to category 6. We can then compare the rank order of the 10 tasks based on their category membership with the rank order of these same tasks based upon children’s percentages of yes answers, as shown in Table 2 (where a task had two “you” questions, the average of the two percentages was used for the rank-ordering). The rank-order correlation between the two sets of ranks is 0.92, suggesting that the “dimensions” underlying this ordered categorization may in fact have affected the children’s judgments in the hypothesized way.
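The rank-order comparison just described can be retraced with a short script. What follows is a sketch under stated assumptions rather than a reconstruction of the authors’ own computation: it treats the “rank-order correlation” as a Spearman coefficient with tied values given their average rank, averages the two Shin questions into one task, and counts Away as 0% yes (every child gave the “correct” no answer). The percentages are read from Table 2; the function and variable names are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Pearson correlation computed on the two sets of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# task: (hypothesized category, mean percentage of yes answers)
TASKS = {
    "Looks: Away":       (1, 0.0),
    "Looks: Shin":       (2, (45 + 50) / 2),
    "Card":              (2, (23 + 36) / 2),
    "Looks: Stomach":    (3, 73.0),
    "Two Eyes Covered":  (4, (54 + 45) / 2),
    "Turn 135":          (4, (64 + 64) / 2),
    "One Eye Exposed":   (4, (41 + 59) / 2),
    "Glasses и Mirror".replace("и", "and"): (5, (86 + 82) / 2),
    "Looks: Eyes":       (6, 86.0),
    "Looks: Chin":       (6, 86.0),
}

categories = [cat for cat, _ in TASKS.values()]
pct_yes = [pct for _, pct in TASKS.values()]
rho = spearman(categories, pct_yes)
print(round(rho, 2))
```

With these inputs the coefficient lands close to the reported 0.92; a small discrepancy would not be surprising, since the treatment of ties and of the two Shin questions is assumed here rather than stated in the text.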
These and other findings in the two studies suggest the following speculations about the nature and development of the young child’s reactions to our “Do I see you?” questions. When adults (and children) refer to the child, to themselves, or to other people present by the appropriate personal pronoun, they are apt to look at or otherwise direct attention to the face of the person referred to. “Look at me” is usually correctly understood by the child listener to mean “Look at my face.” “I want to tell you something” is normally accompanied by looking at the child’s face, a co-occurrence he can readily observe. Moreover, should he fail to turn his face to meet the adult’s gaze under these circumstances for any reason (inattention, apprehension, etc.), the adult may effectively get each pronoun associated with its appropriate face by saying
something like “Look at me when I talk to you”, perhaps manually turning the child’s face towards her for good measure. When adults refer to “your arm”, “your leg”, etc., while speaking to the young child, the child usually sees them look at those parts of his body. On the other hand, when they refer to “you” while speaking to the child, he usually sees them look at his face. Such experiences might lead a child to think that the “you” that is sometimes visible to another and sometimes not (thus, precisely the “you” that our tasks must make salient) is roughly coextensive with his face. It might thus seem sensible to the child, although not to an adult, to say that he does not “see” another person whose hands cover her eyes and most of her face (Experimenter Eyes Covered task in Study 1). This “You, your face”, like the adult’s “You, your body”, is a wholly external, physical affair; like the adult, the child has learned that only external, physical entities normally vary in visibility from one observer circumstance to another. The Cognitive Self Interview data suggest that many young children may also have inklings of another “you”, one that knows and thinks. Interestingly, this “you” is situated quite close to the other one. However, it is wholly internal rather than external, and has no ocular windows through which it can be seen (although it might be conceived as material by some young children, and hence visible if one could only see inside somehow). Part of self concept development may therefore take the following form, at least in the subculture from which our subjects were drawn: Both adults and young children (circa 3-4 years of age) have intuitions about at least one kind of inner, psychological self, a cognitive one, and they both probably localize it in the same place: the head.
Both have also developed intuitions about at least one kind of outer, physical self, the self that is visible to others, but they probably localize it in different places: the entire body surface, in the case of the adults; largely the facial surface, in the case of the children. By age 5 or so, these differences in the conceived extension of this kind of physical self have largely disappeared.
References

Flavell, J. H. (1974) The development of inferences about others. In T. Mischel (Ed.), Understanding other persons. Oxford, England, Blackwell, Basil & Mott.
Flavell, J. H. (1978) The development of knowledge about visual perception. Nebraska Symposium on Motivation, 25, 43-76.
Flavell, J. H., Shipstead, S. G., and Croft, K. (1978) Young children’s knowledge about visual perception: Hiding objects from others. Child Devel., 49, 1208-1211.
Horowitz, E. L. (1935) Spatial localization of the self. J. Soc. Psychol., 6, 379-387.
Hughes, M. (1975) Egocentrism in preschool children. Unpublished doctoral dissertation, University of Edinburgh.
Lempers, J. D., Flavell, E. R., and Flavell, J. H. (1977) The development in very young children of tacit knowledge concerning visual perception. Genet. Psychol. Mono., 95, 3-53.
Masangkay, Z. S., McCluskey, K. A., McIntyre, C. W., Sims-Knight, J., Vaughn, B. E., and Flavell, J. H. (1974) The early development of inferences about the visual percepts of others. Child Devel., 45, 357-366.
Shantz, C. V. (1975) The development of social cognition. In E. M. Hetherington (Ed.), Review of child development research (Vol. 5). Chicago, University of Chicago Press.
Summary

The widely held idea that young children egocentrically believe they cannot be seen when their eyes are closed gave rise to two studies. Subjects aged 2.5 to 4 years often answer negatively to the experimenter’s question “Do I see you?” when their eyes are closed and covered by their hands. Neither the 5-year-old subjects nor the adults give this answer. However, the young subjects correctly answer that the experimenter can then see their hand and an object placed in front of them, and can see neither their eyes nor their back, thus indicating that they can make veridical, nonegocentric inferences about what the experimenter is able to see. In addition, the fact that their eyes are visible to the experimenter is neither a necessary nor a sufficient condition for their judgment that the experimenter can see “you”. It is therefore concluded that adults take “you” to mean their whole body, whereas young children take it to mean the region of their face. The question is raised of how young children have acquired this idea, and of the points of difference and resemblance between the self conceptions of young children and adults.
Cognition, 8 (1980) 389-416 © Elsevier Sequoia S.A., Lausanne
2 - Printed in the Netherlands
A deletion ahead of its time*
HENRY HAMBURGER
National Science Foundation
Abstract

An alleged defect in transformational treatment of syntax acquisition is the absence from child speech of certain predicted errors with ‘wh-’ constructions. In this paper, a theory of acquisition dynamics and intensive longitudinal data are brought to bear on this issue. The key observations involve an early precursor, at 24-28 months, of the relative clause. The analysis sheds light on two fundamental issues in transformational acquisition theory: the permissibility of simultaneous rule changes and whether a transformation can be acquired before the associated deep structure. The issues and analysis can be translated into non-transformational terms.
Introduction

Acquisition of the restrictive relative clause (RRC) is arguably one of the major achievements of the language responsible cognitive structure: the syntax of the RRC allows recursion, the basis of unlimited novelty in language; while its semantics involves presupposition and nonlocal construal.1 Perhaps such an achievement should be expected to come slowly, and indeed Tavakolian (1977) finds that in comprehension tasks five-year-olds typically have only partially mastered the construction. In contrast (though not in contradiction), Limber (1973) reports some relative clauses uttered by children under three. Part of this paper documents the emergence of RRCs in the spontaneous speech of a still younger child, who participated in an intensive longitudinal study. It will be argued that this child was developing a construction with key attributes of the RRC between the ages of 24 and 28 months.

Though this documentation is of interest in its own right (see Gelman’s (1978) work in the cognitive domain), its principal importance lies in its relation to more general issues, three of which deserve mention. First, the pinpointing of early stages in the emergence of a construction is only a means to the broader objective of establishing principles of language acquisition. For this reason, proposed changes in the child’s grammar are discussed in terms of an important aspect of the theory of language learnability: how radically a grammar can be altered in response to a piece of input information. Such issues have received little attention for want of a comprehensive theoretical framework suggesting their significance. A second issue arises from studying the first transition in this particular case of RRC acquisition. The rule change that will be posited leads to a possible explanation for a later ‘missing step’ in some children’s acquisition of wh-movement and to a more general puzzle-solving strategy for acquisition study. The third point of general interest is that the rules posited in the analysis of the corpus relate to important issues in syntactic and semantic theory.

These three items, acquisition principles, puzzle-solving, and general theory, receive considerable explicit attention in this paper and form the relevant context of the empirical work. The opening subsection in fact is a discussion of acquisition principles and their relationship to empirical methods. One particular principle is introduced in subsection 1.2 and shown to have played an important if implicit role in past discussions of acquisition of the RRC. Evidence is presented in section 2 for the construction that I claim is a precursor of the RRC. Its early appearances are in utterances like ‘That my did it’. Subsequent development of the construction, presented in subsection 3.2, is of theoretical interest in that it seems to be best accounted for by a deletion that is acquired ahead of its time, that is, before the correct adult base structure. The data and analysis shed new light on a long-standing problem for transformational accounts of the acquisition of wh-movement. Further, this analysis exemplifies a more general puzzle-solving technique, introduced in subsection 3.1. After an analysis of ensuing developments in Section 4, we turn, in Section 5, to possible origins of the construction, in which semantics apparently must play a key role. The relationship between the present analysis and work on the “basic operations hypothesis” is discussed in the final section.

*With heartfelt thanks to Emily, whose company was always a delight, and to Nancy, Wayne and Jeffrey for their interest, cooperation and friendship. For helpful comments at various stages I especially thank Sharon Klein, and also Stephen Crain, Judith Hamburger and an anonymous reviewer. This work was carried out at the University of California, Irvine. Reprint requests should be sent to Henry Hamburger, Division of Information Science and Technology, National Science Foundation, 1800 G Street N.W., Washington DC 20550, U.S.A.
1 The expression “language responsible cognitive structure” is used here in the theoretically neutral sense discussed in Walker (1978). “Nonlocal construal” is also intended to be theoretically neutral. It refers to the phenomenon of a noun phrase position going unfilled while its referent is verbally realized arbitrarily far away. Such a position is the one, for example, denoted by ∅ in the restrictive clause of the sentence “here is the man that you think I said you saw ∅.”
1. On Principles

1.1. What theory, what data
It is no secret that theory and data go hand in hand. What is less obvious is the interaction between particular kinds of theory and data and their potential joint significance for acquisition-based contribution to linguistic explanation. It will be contended that particular attention should be given to theories of dynamics considered in the light of longitudinal data. Arguments for or against such a position are rarely made explicit despite its significance for research directions.

To examine the significance of theories of dynamics, as opposed to theories of statics, first note that both child language and adult language are products of human minds acting on certain inputs for different lengths of time. If those minds and the class of inputs do not undergo fundamental change in the course of development,2 then it would follow that child and adult language are members of the same class of systems. Such a conclusion would place a severe limitation on what a theory of statics of child language could contribute to linguistic theory that is not already available (and, at that, from more cooperative subjects) from adult statics.3 Therefore I suggest that the crucial contribution of child studies is insight not into the structure of human representation and communications systems but into the process by which that structure changes.

If the theory under construction is to be one of dynamics, then longitudinal evidence becomes particularly important, by the following reasoning. If we are to study how grammars change, then an obvious starting point would seem to be changes in grammars, or, in the singular, a change in a grammar. Finding a change in the grammar means studying an individual child intensively enough for long enough to establish a set of her rules and then establish a change in that set. Only after this extent of within-child aggregation of data would one proceed to comparison across children.4,5

The data should ultimately relate to a comprehensive theory. This is not to deny the importance of considering individual principles of acquisition in the light of relevant data, as suggested and exemplified by Slobin (1973). Still we ultimately seek a set of such principles which, taken together, can be proved to constitute an internally consistent theory that accounts for the near-universality of children’s ultimate success. Such a theory would take account of much of what is known or conceivable about child output, adult input (to child), child processing and adult competence (the ‘goal state’). In view of these considerations, the present empirical study of a child’s language will draw upon the framework of ‘learnability theory’, an approach to the comprehensive objective just sketched.6

2 It is an unresolved question whether maturation somehow alters the acquisition device so that differing procedures are applied during early versus late childhood.
3 A reviewer notes that child statics might still shed light on linguistic theory, for example by virtue of relative simplicity.
4 Of course I claim that this stricture was adhered to in the current study. Bowerman (1973) presents detailed grammars for the same child at different points in time but does not pursue rule change in the manner suggested here.
5 Cross-sectional studies make a contribution to the study of dynamics that hinges on the similarity of development patterns across children. Matthei (1978), for example, remarks on this point.

1.2. The continuity principle
The particular aspect of the theory that relates to the empirical findings to be reported below is the question of how extensively the learner may alter her grammar in response to a single datum. By placing a constraint on the extent of this change, one takes account of the limits of mental computation. To be particularly undemanding on the learner’s disposition and capacity to compute, much work in learnability theory assumes that only one rule change can be made at a time.7 This one-change-only assumption will be referred to here as the ‘continuity principle’. Note that a less stringent limitation, say to allow up to two, or seven or an unlimited number of simultaneous changes (as in Gold, 1967), would be more difficult to disprove empirically and hence would be a weaker claim. The continuity principle, then, is relatively demanding on the data, while being relatively undemanding on the learner.

Despite the considerations favoring such a constraint, it appears, at least superficially, to be incompatible with the actual course of acquisition of wh-questions in English. Various researchers have noted the absence from children’s speech of echo-questions like “He can do what?” (Klima and Bellugi, 1966; Ravem, 1975; Prideaux, 1976; Maratsos and Kuczaj, 1978). Such forms would be anticipated (even if adults never used them) if the relevant parts of the base had to be restructured before a particular transformational rule could be acquired, as a one-change-at-a-time constraint would require. This paper is in part an effort to provide empirical support for the theoretically attractive continuity principle, using data on another construction with wh, the restrictive relative clause.

Although the possible existence of a constraint on the gradualness of change has not been an explicit issue in acquisition, it is of relevance to several significant empirical topics. Wh-questions have already been mentioned as one such topic, and, as noted, the empirical analysis in this paper will deal with the possibility of gradual acquisition of the restrictive relative clause. In addition, the debate over the redundancy of past tenses (and possibly other elements) in child language suggests that the rule of whether or not to mark past tense in the auxiliary is frequently learned non-simultaneously with the rule of whether or not to mark past tense in the main verb on those occasions when it is also marked in the auxiliary (Maratsos and Kuczaj, 1978; Mayer et al., 1978; Goodluck and Solan, 1979). Such non-simultaneity would be consistent with the continuity principle. Finally, the suggestion by Erreich et al. (1978) of “competing rules” can be thought of as arising in a natural way from the continuity principle. If there can only be one rule change at a time then one would expect that sometimes a new rule would be learned while an old one would not be instantaneously, simultaneously rejected. Instead the old rule would be rejected at some other time, possibly later.

6 Learnability theory is presented in Hamburger and Wexler (1975) and improved in Wexler and Culicover (1980). In that approach the learner is envisioned as hypothesizing and rejecting rules of a type permitted by universal grammar in an effort to bring her developing grammar into a state of compatibility with input information. Crucial to that work are proofs that the principles posited yield the fundamental result of language acquisition, that children do learn language. The notion of hypothesization is criticized in Braine (1971), pursued in a related context by Anderson (1976) and subjected to a detailed analysis in the context of syntactic theory by Baker (1979).
7 Some such assumption seems inescapable on psychological grounds (short-term memory limitation), not to mention efficiency (computation time for each datum). An alternative limitation might be formulated in terms of number of symbols involved. One might even envision a probabilistic favoring of relatively simple changes, rather than an inviolable proscription. Whatever particular form a constraint may take, it seems appropriate in empirical work to look for minimal changes.

2. A Non-Adult Construction
It was argued above that important theoretical issues are involved in whether or not complex constructions can be acquired in small steps. A particular case of gradual acquisition will now be documented. The data come from a year-long study of a two-year-old in which 8,000 utterances were collected in 70 sessions by audiotape, longhand and videotape, together with relevant context and utterances of other individuals present.

2.1. Nutshell version

I began regular sessions with Emily a few days after her second birthday. In our second session she twice referred to her artwork of the preceding day as my did it. Justification is provided below for not regarding the use of my here as confusion of first person pronouns and for regarding did and it as two separate words. The rule NP → Det + VP, posited to account for such phrases, also generates the non-adult noun phrase in There’s another wash hands, uttered in reference to a sink at age 26 months.
Utterances like Look at my did (in section 3) reflect a change in the construction by the age of 27 months. The crucial difference between such an utterance and the earlier ones is the absence of a direct object for did. Absence of direct objects is confined to a certain class of verbs, those that give rise to a particular structural pattern of co-reference that plays an important role in current syntactic and semantic theory. Evidence that these constructions are indeed ‘proto-RRCs’ comes not only from their evident semantic intent, but also from a self-correction (toward the end of section 5) from Look-a my made to the correct adult RRC, Look-a wha-I made.

2.2. Initial data and analysis

The first relevant datum is utterance (1).

(1) 7/14/77 My did it. (two instances, separated in time)

Its apparent intent in the situational context was to point out a referent. There was a painting on the wall, the product of her own artistic endeavors, and she was informing me that this was her ‘did it’: the thing that she did. It was not uncommon for her to use noun phrases as whole utterances at this time, frequently to point out a referent.

There is an obvious alternative analysis of this utterance which must be discussed at length because it is initially plausible and would, if correct, undermine everything. Fortunately, the arguments against it are, I believe, compelling. Suppose one thought that utterance (1) were a sentence and its last two words a verb phrase. The first word would then by adult standards be the subject of that sentence, a conclusion that can be accommodated by assuming an error in the child’s choice of pronoun: my in place of I. Although such a conclusion would be consistent with some observations of some children at some stages, it does not receive support from the relevant data in this case. In the first place nonsentential utterances, not unusual for young children anyway, are frequently exhibited by Emily. Those in (2), for example, occurred on the same day as (1) and gave no hint of prosodic incompleteness.

(2a) 7/14/77 my cake
(2b) 7/14/77 my nightgown
(2c) 7/14/77 hurt m’self
Once the possibility of a noun phrase as a whole utterance is established, the above hypothetical analysis in terms of pronoun confusion is no longer necessary, though still possible. A more direct argument against pronoun confusion comes from the pronoun data itself. Although many children use certain pronouns incorrectly at particular stages (cf. Brown, 1973, p. 210), and Emily herself used she’s as a possessive several months later, nevertheless she was using I and my almost flawlessly during the period under discussion. Of the 33 first person pronouns used on the same day as (1), seven cannot provide decisive evidence one way or the other: three were used in the construction itself, three were involved in a self-correction sequence and one was used in the sentence I did it, which differs from (1) only in the relevant pronoun. All of the remaining 26 are clearly resolvable into subjective and possessive uses, as in examples (2a, b) and (3). Of the 15 subjective uses all were expressed by I; similarly, the 11 cases that were evidently possessive were all expressed by my.

(3a) 7/14/77 My knee hurts
(3b) 7/14/77 I push a button
(3c) 7/14/77 I see baby
Thus the data indicate more than a statistical preference; such data are best described as rule-governed. Rules (4)-(8) account for the phenomena under discussion as well as a wide variety of other utterances in the corpus at this time. Rule (4) indicates that an utterance (U) may be a sentence or a noun phrase.8 Rules (5), (6a) and (7) are routinely posited for adults and make sense for many children. It is (6b) that represents the anomalous construction under study here, as indicated by the phrase-markers (9). Rule (8) reflects the systematic absence of the copula. The symbol Dx stands for a deictic element such as this or that.
(4a) U → S
(4b) U → NP
(5) S → NP + VP
(6a) NP → Det + N
(6b) NP → Det + VP
(7) VP → V + NP
(8) S → Dx + NP
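As an illustration of what rules (4)-(8) generate, they can be written down as a small context-free grammar and tested with a naive recognizer. This is a sketch in Python, not part of the analysis itself. The lexicon is an assumption made for the example; in particular, the pronouns it and I are entered directly as NPs, and my is entered only as a determiner.

```python
# Rules (4)-(8) as a context-free grammar (right-hand sides are unary or binary).
RULES = {
    "U": [["S"], ["NP"]],                   # (4a), (4b)
    "S": [["NP", "VP"], ["Dx", "NP"]],      # (5), (8)
    "NP": [["Det", "N"], ["Det", "VP"]],    # (6a), (6b)
    "VP": [["V", "NP"]],                    # (7)
}

# Hypothetical lexicon, assumed for illustration only.
LEXICON = {
    "my": "Det", "a": "Det",
    "cake": "N", "nightgown": "N",
    "did": "V",
    "it": "NP", "i": "NP",
    "that": "Dx", "this": "Dx",
}


def derives(cat, words):
    """Return True if category `cat` can span exactly the tuple `words`."""
    if len(words) == 1 and LEXICON.get(words[0]) == cat:
        return True
    for rhs in RULES.get(cat, []):
        if len(rhs) == 1:
            if derives(rhs[0], words):
                return True
        else:
            left, right = rhs
            # Try every split of the words into two non-empty parts.
            for i in range(1, len(words)):
                if derives(left, words[:i]) and derives(right, words[i:]):
                    return True
    return False


def parses(utterance):
    """Return True if the utterance is generable from the start symbol U."""
    return derives("U", tuple(utterance.lower().split()))
```

Under these assumptions `parses("my did it")` and `parses("That my did it")` both succeed, the first only through rule (6b), whereas `derives("S", ("my", "did", "it"))` fails, since my is available only as a determiner.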
(9a) [U [NP [Det my] [VP [V did] [NP it]]]]
(9b) [U [S [Dx That] [NP [Det my] [VP [V did] [NP it]]]]]

Utterance (10), assigned phrase-marker (9b), provides additional, stronger evidence for the earlier analysis of (1). Here, since the pronoun I is never, throughout the one-year study, used right after a deictic, the possibility of pronoun confusion can be eliminated altogether. Further, note that (1) is a substring of (10) and can therefore be judged a noun phrase on distributional grounds (in addition to situational context as earlier) by comparison to numerous other utterances at this time which are most straightforwardly accounted for in terms of rule (8).

(10) 7/14/77 That my did it.

⁸ In X-prime (or X-bar) notation, X‴(+Subj) can be taken as start symbol, eliminating rule (4). Rule (6b) does not fit Jackendoff’s (1977) scheme for base rules but is related to his de-verbalizing rules.
In summary, two claims are made for utterances like (1) and (10): (i) that the choice of my is deliberate and (ii) that my did it is a noun phrase. Each of these claims is consistent with and hence tends to support the other, provided that each can be independently supported. As independent evidence concerning the pronoun there is a consistently correct use of I and my in other circumstances. As independent evidence for the analysis of the entire phrase as a noun phrase there is situational context, linguistic intra-sentential context and discourse context. The situational context consists of the presence of an appropriate referent and the child’s overt attention to it. The intra-sentential evidence consists of the use of a deictic element in (10). Discourse evidence is provided by the interchange in (11).

(11) 7/15/77
H: What’s this? (pointing to yesterday’s drawing)
E: My did it.
The use of the construction continues, as sentences (12) show, though with little variety of linguistic context. Within the VP there is no variation at all at this time, and in fact it will be almost two months before there is any internal variation, or, to put it differently, before the VP becomes internally productive.

(12a) 7/26/77 This my did it.
(12b) 7/26/77 This is my did it.
(12c) 8/19/77 That my did it.
(12d) 8/19/77 This my did it.
Although the (alleged) verb phrase did it initially shows no internal variation when used to replace a noun, as in (12) for example, it is not the case that the verb did can occur only with the direct object it, as the utterances in (13) show.

(13a) 7/26/77 Catherine do this.
(13b) 8/22/77 I did this.
Since do and did are productive in other circumstances but not in the anomalous construction, a skeptic might argue that the alleged construction comprises no more than a novel lexical item, didit, belonging to the category of nouns. However, a further justification for an analysis in terms of rule (6b) is provided by utterances which occur substantially later in time and appear to be related to the earlier ones. We turn now to these later utterances. The earliest clear example of a use of rule (6b) without ‘did’ and ‘it’ is shown in (14).

(14a) 9/8/77 There’s a wash hands.
(14b) 9/8/77 There’s another wash hands. (emphasis in utterance)
If the earlier sentences are taken as evidence for rule (6b) then the sentences in (14), which would otherwise appear anomalous, are now actually predicted by grammatical rules already posited for the child at this time. Given the utterances in isolation one might imagine that “wash hands” refers to an activity in which some person is washing his or her hands. No such activity, however, was in progress at the time and it was clear from the situational context even for the first of these sentences that the “wash hands” was an object, specifically a sink. The second utterance dispels any possible doubt about this; Emily has just discovered that there is a second sink in the bathroom in which she is located. Being used to finding only one sink in a bathroom she is understandably surprised and makes the comment. Phrase-marker (15) shows that no new rules are required to generate (14).

(15) [U [S [Dx There’s] [NP [Det another] [VP [V wash] [NP hands]]]]]

Two more utterances, one of them on the same day and another about a week later, suggest the wide range of applicability of the rule NP → Det + VP. These are shown in (16).

(16a) 9/8/77 Here’s a f(l)ush-a-toi(let).
(16b) 9/14/77 This a close-a-door.
Both of these utterances were produced while Emily was indicating a relevant object, not an action, in one case a toilet handle and in the other a door handle. It is of interest to note that these sentences do not all have the same semantic structure, as is suggested by the ‘such-that’ sentences, (17a-d), corresponding to (12b), (14a), (16a) and (16b), respectively.

(17a) This is a (my) thing such that I did it.
(17b) There’s a thing such that one washes hands in it.
(17c) Here’s a thing such that one flushes a toilet with it.
(17d) This is a thing such that one closes a door with it.
The item being described in these sentences (the referent of it) may be said to express an objective, locative, or instrumental notion depending upon the sentence. There is also a syntactic difference: in (17a), it is a direct object, in (17b-d) an object of a preposition. Although it has been possible to ignore such distinctions in the above data, certain semantic differences, specifically in referential structure, will become crucial in the next section. Two months after the start of the study, the previously hidden distinctions in semantic structure begin to show up as differences in surface strings.
3. A Grammatical Change and Its Significance
The intensity of collection has made it possible, in the preceding subsection, to document a particular construction reliably despite its relatively infrequent use.⁹ The longitudinal nature of the study now enables it to reveal a key change in the construction. This change provides the ‘missing link’ that makes possible an interpretation compatible with the continuity principle. Thus both the intensiveness and the longitudinality of the data turn out to be crucial. The changed construction will be documented in subsection 3.2 after some background remarks in subsection 3.1.

⁹ This infrequency could contribute to the explanation of why the construction has never been documented before. Possible reasons for the infrequency are advanced in subsection 6.1. On the other hand, the construction may be idiosyncratic to one child’s development. On this point see subsection 6.3.

3.1. A puzzle and a solution approach

To broaden the potential applicability of this work I shall suggest a general approach to solving puzzles in the acquisition of language. First an alleged puzzle will be presented, then the solution technique, and finally a demonstration of how the technique works in the particular case. Early transformational accounts of adult English present a puzzle by apparently predicting the occurrence in child speech of sentences like (18) as well as the echo question mentioned in subsection 1.2, neither of which is in fact observed.¹⁰

(18) That’s we made what.

The theory called for the wh-word in the deep structure of (18) to be located at the position appropriate to its syntactic role, which in the case of (18) would be the direct object position in the embedded clause. A strong advantage of such an account is that the vacancy of that location in surface structure is thereby simply explained. Historically, acquisition arguments have been based on this formulation, so it is convenient to pursue the discussion in the same terms. However, the key finding here, that of gradualness, can survive a change in theory. Suppose that one of the things that children must acquire is a set of transformations beginning with the empty set (McNeil, 1970; Hamburger and Wexler, 1975).¹¹ Then (18) could occur if a child’s grammar contained the relevant base rules but not the wh-movement transformation.¹² Such an intermediate situation would be consistent with the gradualness required by the continuity principle.

¹⁰ It is only an ‘apparent’ prediction because a theory of adult structure does not predict child structure unless supplemented by an acquisition theory (as noted in section 1). That is why the next paragraph mentions the assumption about hypothesization of transformations.
¹¹ The assumption that children acquire transformational grammar by hypothesizing and rejecting transformations is only one possibility. Of interest in this connection is the following, from Limber (1973): “... generally if two constructions have been derived from a common base form within a transformational analysis, it is a good bet that the construction furthest from the base will be produced earlier”. Unfortunately this comment is not accompanied by any analysis of its implications for the nature of the acquisition device.
¹² An additional stage, between (18) and the adult form, can be inferred from the “basic operations hypothesis” of Mayer et al. (1978), which breaks movement into copying followed by deletion. Copying wh but failing to delete it would yield the intermediate form ‘That’s what we made what.’ Utterances like this have not been observed either. Goodluck and Solan (1979) argue that since wh phenomena involve unbounded context, they are different from the movement rules that give rise to the incorrect redundancies accounted for by the basic operations hypothesis.
The method of solution proposed for such puzzles is as follows: Upon encountering a ‘missing error’ (such as the absence of utterances like (18) above), (i) find an earlier comparable construction with the corresponding error (comparability will be discussed below); (ii) look for subsequent data on which this comparable error has been corrected; (iii) assume that the ‘correction’ (a rule or a constraint acquired by the child) is couched in simple, general terms; and (iv) thereby explain the instant correctness of the more advanced comparable structure, when it emerges, as the consequence of the immediate applicability of the general rule posited as a correction to the earlier construction. To apply this strategy to the particular case of the relativization puzzle, one would need an earlier construction that was “comparable” to relativization in some sense appropriate for step (i). This role will be filled by the constructions generated with rule (6b). The claim that these constructions are legitimate precursors to relative clauses finds support in both semantics and syntax. Semantically, both the VPs generated by (6b) and adult restrictive relatives restrict the range of possible reference of the noun phrase of which they are a part. Moreover each does so in terms of a presupposition. Syntactically, if X-prime notation is adopted (Jackendoff, 1977) then the VP of (6b) and the S in the adult base rule for relatives are each expressed as a V with some number of primes, so that only a minimal modification will be needed to move from one form to the other. The second step in the above strategy for solving an acquisition puzzle is to find a change in the construction under study; changes are in any case crucial to a theory of dynamics.

3.2. The ‘deletion’
About a week after the utterances (16), a change does take place: the did it phrases give way to did phrases, as can be seen in (19).

(19a) 9/22/77 Look-a my dih.
      H: Look-a my . . . what?
      Did.
(19b) 10/11/77 Look at my did.
(19c) 10/11/77 Look at my did (separately from b)
(19d) 10/11/77 Look at my doing.
(19e) 10/20/77 My drawed.
      H: (As if not realizing that she had spoken) What’s that?
      My drawed
(19f) 10/27/77 M’drawed outside (her painting is outside and we can both see it through a glass door).
(19g) 11/20/77 Look-a my made, . . . Henry

Each of these utterances contains a substring consisting of a determiner (the possessive my) and a transitive verb. Add to this a noun phrase (for example the pronoun it as in (12)) and the whole substring would be generable from NP, with the help of rule (6b). It therefore seems likely that these Det + V substrings are developmentally related to the earlier ones. Moreover, notice that each one of them fits into an NP position in its utterance, given a reasonable grammar, say, one consisting of rules (4)-(8), and (20). (Rules something like those of (20) are needed in order to account for numerous other utterances at this time. Rule (20b) is used in generating (19f).)

(20a) S → Look at + NP
(20b) S → NP + Loc
Suppose then that the substring my did at this stage is indeed an NP and is generated with the help of (6b). Then did comprises a VP and it becomes necessary to account for the absence of its direct object. There are several possibilities but most of them do not stand up under scrutiny. First, the three verbs used in the construction in (19) might be thought to be intransitive for the child, as a result of an incorrect syntactic feature attached to the particular lexical items. However, these verbs in other constructions are invariably accompanied by direct objects whereas in this construction during this time interval they never are. This observation also rules out the possibility that the child has come up with a rule like VP → V, which would allow all transitive verbs to appear without direct objects. Another possibility is that (6b) has metamorphosed into (21a). Certainly such a rule would enable us to generate all the utterances in (19).

(21a) NP → Det + V
(21b) NP → Det + Vt

It is worth noticing, however, that all of the verbs involved in the construction are transitive, whereas (21a) leads us to expect intransitives as well. Rule (21b) overcomes this latter objection, though it requires added baggage in the semantic domain where a new rule will be needed for interpreting a Vt that is not dominated by a VP. Moreover, a generalization would be missed in such a description, since the old rule (6b) is still needed in order to generate the utterances in (22), which, be it noted, all use a transitive verb in the VP generated by (6b).
(22a) 11/20/77 Here’s a flush it
(22b) 11/29/77 Find my brush teeth
(22c) 11/29/77 I have a brush teeth
All of the above proposals consist of alterations in the base grammar. Turning to transformational possibilities, it is always easy enough to account for a missing element by a deletion rule. What is not obvious here is how to express the rule in such a way that it will apply in (19) but not in (22). Unfortunately there is no syntactic distinction between the two sets of utterances, so that (23), for example, will not serve.

(23) X Det V NP Y ⇒ 1 2 3 ∅ 5
There is, however, a semantic difference. As a first pass at the semantics, suppose that (17) is fairly close to the kinds of meanings to be expressed. Putting (17a, b) into (adult) relative clause format yields (24).

(24a) This is a thing which I did
(24b) There’s a thing in which one washes hands

Revising the embedded subject into the corresponding possessive in the head of the relative and dropping subject-verb agreement for the now subjectless embedded verb yields (25).

(25a) This is my thing which did
(25b) There’s one’s (someone’s) thing in which wash hands

Stripping away the virtually contentless head noun thing and the wh-phrase (which or in which) leaves (26).

(26a) This is my did
(26b) There’s someone’s wash hands

By this sequence of operations, the alleged semantic distinctions in (17) between objective and locative noun phrases finally show up in (26) as presence versus absence of a direct object, just as in the speech of Emily in (19) versus (22). Note that presence versus absence of a direct object in the surface string is not a simple consequence of presence or absence of an objective argument to the verb in the corresponding meaning structure, since such an argument is present in each case. In (17b), for example, hands is an objective argument of wash, just as in (17a) it is so related to did. The critical distinction then is not whether an objective argument exists, but what it refers to. In (17a), it is co-referential with the entire complex NP, a thing such that I did it. In contrast, hands in (17b) is not co-referential with any other construction.
In view of the above, a simple¹³ account of the difference between (19) and (22) would be provided by a rule something like (27).

(27) Delete an NP if it is dominated by a co-referent.

The recoverability-of-deletions requirement is clearly satisfied by this rule. Other formulations are possible, for example, a variant of (27) with “dominated” replaced by “dominated or commanded”, but there is no motivation for such a complication at this point. It is also conceivable that the distinction could be expressed in terms of semantic structures suggested by (17). However (27) makes it unnecessary to work out the details of semantic structure since it depends only on simple phrase-markers generable by rules already posited, as shown in (28).¹⁴

(28) [NPi Det [VP V NPi]] ⇒ (by (27)) [NPi Det [VP V]]
Postulating rule (27) fulfills the third step in the four-step strategy outlined above and essentially completes the procedure. There remains only to show that rule (27) is sufficiently general that it will also be applicable to restrictive relative clauses as soon as the child begins to use them. But this is already clear from (24): in (24a) the NP which co-refers in the manner specified by (27) is deleted, whereas in (24b) there is no co-reference and no deletion.

¹³ On whether simplicity is a virtue here, see subsection 6.1.
¹⁴ It has been pointed out to me by Barbara Partee that if did it is assigned a straightforward λ-calculus interpretation, namely λx did(z, x), then rule (27) can be replaced by the following: (i) Delete the noun phrase on which you λ-abstract. Moreover, if, as she further suggests, my + N is generally interpreted as (ii), then my did (it) receives the interpretation (iii), provided (6b) is replaced by N → VP.
(ii) λx R(I, x)
(iii) λx R(I, x) and λx did(z, x), as daughters of a single node
Here I = first-person singular, z = a variable, and R = a relation more abstract than ‘possessive’ (subsuming possessiveness). The node dominating the two formulas in (iii) is one whose semantic interpretation is the intersection of those of its daughters. The intersection in (iii) can readily be computed since did is a special case of the general relation R, while I is a special case of the variable z. The intersection simply takes the more specialized items, did and I respectively. The resulting interpretation of my did (it) is thus ‘the thing that I did,’ which is presumably just what it means.

In summary of this section, we find that my did it in (12) gives way to phrases like my did, my drawed, and my made in (19) without the direct object, it, though at the same time the direct object persists in phrases like a flush it in (22). The presence or absence of direct objects in the latter two sets of phrases cannot be explained on purely syntactic grounds. The semantic rule (27), on the other hand, not only accounts for the current data but predicts instantaneous success in omitting the co-referring noun phrase inside restrictive relative clauses, when the latter emerge. Such instantaneous success is in fact observed and has constituted a source of perplexity in transformational accounts of acquisition for 15 years.
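The effect of rule (27) on phrase-markers can be sketched in a few lines of Python. This is an illustration only: the `Node` class, the `ref` field used as a referential index, and the two example trees are assumptions made for the sketch, with the indices chosen to reflect the co-reference facts described in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None
    ref: Optional[str] = None  # referential index; used only on NP nodes


def delete_coreferent(node, dominating=frozenset()):
    """Copy the tree, dropping any NP dominated by a co-referent NP (rule 27)."""
    is_indexed_np = node.label == "NP" and node.ref is not None
    if is_indexed_np and node.ref in dominating:
        return None  # deleted: a dominating NP bears the same index
    below = dominating | {node.ref} if is_indexed_np else dominating
    kept = []
    for child in node.children:
        copy = delete_coreferent(child, below)
        if copy is not None:
            kept.append(copy)
    return Node(node.label, kept, node.word, node.ref)


def surface(node):
    """Return the list of words at the leaves, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in surface(child)]


# "my did it": the object co-refers with the whole NP (index i).
my_did_it = Node("NP", ref="i", children=[
    Node("Det", word="my"),
    Node("VP", children=[Node("V", word="did"), Node("NP", word="it", ref="i")]),
])

# "a wash hands": the object (index j) does not co-refer with the whole NP.
a_wash_hands = Node("NP", ref="i", children=[
    Node("Det", word="a"),
    Node("VP", children=[Node("V", word="wash"), Node("NP", word="hands", ref="j")]),
])
```

Applying `surface(delete_coreferent(...))` to the first tree gives the words of my did, while the second tree comes through unchanged as a wash hands, mirroring the contrast between (19) and (22).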
4. Further Analysis and Further Development

The data in the preceding section, specifically the utterances in (19) with their missing direct objects, could be handled by a narrower version of rule (27), one applying only to direct objects, rather than to any NP that meets the co-reference requirement. On the other hand, such a restriction on applicability would complicate the rule. It is therefore important to examine the empirical consequences of selecting (27) which, it should be noted, is evidently an obligatory rule, there being not even one violation of it after the appearance of the first datum to which it applies. What then are the potential consequences of applying (27) to noun phrases other than direct objects? In the case of prepositional objects and datives, it could be invoked to explain the occurrence of phrases like a sit on, referring to a chair, and my gave or my gave it, referring to a recipient, should such constructions occur. In fact, they do not, at least not in this sequence of corpora; however, neither do their counterparts with noun phrases present (as in a sit on it). Thus datives and objects of prepositions provide no evidence either for or against constraining rule (27). Turning to subjects, we find that they cannot possibly be deleted because the rules of the base grammar, as formulated thus far, do not provide the requisite input. Although (27) is potentially applicable to subjects, it can operate only on a noun phrase that is dominated by another, hence only in the event of recursion in the base phrase-marker. But the only source of recursion is rule (6b) which introduces a VP, not an S, so that there is no subject available to be deleted.
Subject deletion would, however, be possible if S were recursive in the base, and we know it ultimately must become so in the adult grammar. Let us therefore speculate for a moment about the possibility that at some point rule (6b) undergoes a modification in which VP is replaced by S to yield (29).

(29) NP → Det + S

Such a change in the base grammar would make subject deletion possible, but note that any surface string so produced could equally well arise via the old rule, as a comparison of the underscored category strings in (30) shows.

(30a) (30b) [phrase-markers]
Nevertheless the existence of (29) could conceivably be verified in cases where the deletion rule was not applicable to the subject. Data bearing on these matters will be introduced shortly, after some formal consideration of the rule itself. Although a corpus from an individual child at a particular time should be dealt with in its own terms, it is also true that in the succession of grammars that we induce for a child, we expect the last one to be an account of the adult language. Therefore even though retrogressions are possible, we anticipate movement towards rules resembling those in the adult grammar. Since (29) resembles the adult rule NP → Det + N + S more closely than does (6b), an analysis claiming that there is a shift from (6b) to (29) would seem to be on the right track. Another formal consideration with respect to (29) is whether it would be introduced as a supplement to (6b) or as a modification of it. The modification of a rule seems a simple enough operation but it could be regarded as a twofold change: the rejection of an old rule coupled with the hypothesization of a new (modified) one. With two alterations of the grammar thus involved, the continuity principle would be at stake. It is clearly undesirable to let the success or failure of the continuity principle hinge on the definition of what constitutes one rule-change. Preferable to seizing on an arbitrary definition would be to compare potential alterations in the grammar in terms of how much processing activity they actually require in order to be implemented. Precise measurement of magnitude of change therefore ultimately requires specification of all the details of the theory of the acquisition process. Certainly, however, the alteration of a single symbol within a rule would seem to accord with the underlying notion of the continuity principle, that of minimizing the extent of grammatical change brought about by a single datum. Just such a modification (alteration of one symbol) is what is contemplated here: the replacement of VP in rule (6b) by S to form rule (29). These two rules are rewritten as (31a, b), respectively, in X-prime notation. (Since there is controversy, even for adult grammar, over the appropriate number of primes, I am using notation that is neutral on that point.) It can be seen from (31) that the rule modification consists of adding a single prime.

(31a) N^j → Det + V^k
(31b) N^j → Det + V^(k+1)
In trying to determine whether (27) is stated with the correct degree of generality, we have noted that although that rule would permit subordinate-clause subjects to be deleted, the base grammar does not generate them. Replacement of (6b) by (29) would allow generation of embedded subjects and would be an exceedingly small alteration in the grammar, apparently in the right direction. Let us examine whether there is any empirical justification for such a change. Adding a subject to the subordinate clause of utterance (12a) would produce This my I did it, while adding one to (19e) produces My I drawed. Consecutive coreferring pronouns such as my, I do not occur for Emily, nor have they been reported for any other child despite considerable discussion of redundancy in the literature. (See Maratsos and Kuczaj (1978), where the significance of redundancy in child language is sharply circumscribed.) The nonoccurrence of utterances like My I drawed could be accounted for by a rule which, like (27), deletes one of two coreferring nodes. One candidate for such a rule is (32).

(32) If two nodes have the same referent and one of them precedes and commands the other, then the second is deleted.

The conditions stated here, that an NP is preceded and commanded by a coreferent, result in compulsory pronominalization in adult English, according to Lasnik (1976). Further, the relationships “precede” and “command” seem to play a role in many languages in determining when an NP is reduced to a pronoun or to the null element in surface structure. It may well be that (32) is one member of a highly constrained set of such rules, hence readily hypothesized. In dialogues (33) an NP is omitted which corefers with an NP in the preceding question. This phenomenon seems relevant here, though analysis would require a grammar of discourse.

(33a) 11/1/77 Did children come to your house last night?
      E: Yeah. And knock on door.
(33b) 12/1/77 What did he do with his hat?
      E: Put on a head right here.
(33c) 12/6/77 What are you doing with the napkin?
      E: I’m putting on your muffin.
If rule (32) were present at the time of the utterances cited above ((12a) and (19e)), then each could be derived using (29) followed by subject deletion by (32). However, before subjects can be deleted they must first be generated and there is no evidence that they are, since the first few overt (undeleted) subordinate subjects, shown in (34), do not appear until shortly after the last of utterances (19).

(34a) 11/29/77 What’s that . . girl have.
(34b) 11/29/77 What’s that .. Snoopy have.
(34c) 12/7/77 Looket Mommy have on.

Let us therefore revert to (28) as the derivation of my did, etc. Sentences (34) and later (35) all have subordinate clauses with a subject but no direct object. All of (34) and (35b) can be generated using (29) provided that Det can generate the null element. Such an assumption is not needed if (36) is posited rather than (29).

(35a) 2/5/78 Looket that noise... you’re makin’ again.
(35b) 2/8/78 Let’s see... she’s doing.
(36) NP → S

Whichever of these may be the base rule, it is still (27) that deletes the direct object, and in either case (6b) is not needed. If (6b) has been superseded in the grammar by the time of (34) then it cannot be used to explain the occurrence, almost three months later, of (37). The analysis of the utterance in (38) uses (29) and suggests that (27) can indeed be applied to subjects. A similar analysis could have been given for utterances (22), but at the time they were uttered it was not clear that (6b) had been modified or displaced.

(37) 2/25/78 (This is not a sandwich.) This is a look like sandwich.
(38) [phrase-marker for This is a look like sandwich]
5. Origins

An important question that must be answered by a dynamics theory is how alterations in the grammar are related to input data. In Hamburger and Wexler (1975) it is assumed that rules are rejected only when, applied at the level of the main clause, they could make the learner’s grammar become compatible with the input datum. This relationship between alterations and input is also emphasized by Erreich et al. (1980). The issue is of particular importance in the case of a rule which is not considered to be part of a workable account of the adult language; alterations that are moves towards adult grammar are to be expected. Thus here it is rule (6b) rather than its successors that most deserves our attention. What follows is not a precise ontogeny but a plausible conjecture of an input sequence that is compatible with the analysis in sections 2 and 3.

My first record of did it phrases is so near the start of the data collection process that it is impossible to do more than speculate on what adult input might have triggered them. However, it seems eminently plausible that one of her parents might have shown a piece of Emily’s handiwork to the other while saying Emily did it. From the fact that Emily has no possessives other than my in her own speech production at this time, one might further speculate that on the comprehension side she would not distinguish whether the word Emily in this putative parental utterance is marked for possessive (apostrophe-s) or not. It would then be possible for her to construct phrase-marker (39), provided that she was willing to hypothesize rule (6b).

(39) [U [NP [Det Emily] [VP [V did] [NP it]]]]

It would then immediately become possible to generate (1) which differs from (39) only in that Emily instead of my is the possessive determiner. A possible objection to the suggestion that Emily induced rule (6b) from such an input is that since the correct subject-verb-object parse was allowed by Emily’s grammar at the time, there would have been no need for her to go hypothesizing new rules in response to this datum. To see how potentially damaging this objection might be and to appraise whether it is justified, let us rephrase it. In terms of learnability theory, the trouble with the above scenario, culminating in Emily’s hypothesizing rule (6b), is that it conflicts with a principle of acquisition processing which may be called the “don’t-mess principle” (Hamburger, 1979). This principle dictates that if the learner’s current guess at a correct grammar is compatible with the input datum then that guess is not altered. The role of this principle in the theory is to insure ultimate stability of the process. Since the learner’s task is to project an infinite language from a finite corpus, she can never be sure that she is finished and must consider each new datum as potentially bringing about an alteration in her guess at a correct grammar. If even data compatible with the correct guess could result in alterations then the learner would never stop making changes, so the process would never converge. Interpretation and evaluation of this argument hinges crucially on the nature of the data that the child acts upon. If the input were construed as consisting merely of surface strings (sentences) then the above scenario would indeed violate the “don’t-mess” principle. The same would be true even with
a stronger form of data, so-called “informant data” in which all strings over a given vocabulary are presented, each with an indication as to whether or not it is a member of the language. Such forms of data are introduced by Gold (1967) to study purely syntactic learnability. In contrast, Wexler and Hamburger (1973) present a formal argument that inclusion of semantic information in the data may be crucial to the acquisition process. Let us therefore turn to semantic considerations. In the hypothetical situation introduced above, Emily would presumably see that her picture is under discussion. Suppose that she therefore anticipates a comment on possession, beginning with a possessive, and that she accepts the word Emily as a possessive. Proceeding from left to right, she would next be faced with the verb phrase and would need either to “back up” in her comprehension analysis and “reparse” the first word or else to hypothesize a new rule. In either case, a simple, straightforward, left-to-right parse would elude her because of her preconception about the meaning of the utterance. A good case can be made for hypothesizing a new rule in this circumstance, by pointing out that the alternative strategy of reparsing involves difficulties more serious than merely additional computation. Reparsing can be carried out only if the original assumption about the meaning of the utterance is rejected, that assumption being, in learnability theory, part of the data of the acquisition process. Rejection of data cannot be taken lightly. For a theory to countenance such an act on the part of the child would oblige us to come up with a theory of rejections: a theoretical account of how the child decides, for each datum, whether to deal with it or whether it can safely be ignored. In this view of the matter, the hypothetical input here is an incorrect datum.
Even though the adult utterance is grammatical, the situation has led the child to a false conclusion about its meaning. Thus the complete datum, which is the utterance-meaning pair, is counter-productive to the child’s overall task of deducing the mapping between possible meanings and possible utterances of the adult language. The tolerance of the theory for such incorrect data is an open question. The inclusion of meaning as part of the datum makes it impossible for the hypothetical utterance-in-context to be handled by the existing grammar, so the “don’t-mess” principle is not violated. On the other hand, we must now face the fact that in this view the “don’t-mess” principle is not a guarantor of ultimate stability, since a correct grammar can be disturbed by an incorrect datum arising from a faulty guess at the meaning of an input. Such a problem may be countered by the child depending progressively more on the sentence itself to provide meaning, rather than the environment. On this point, see Roeper (1978).
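The role played by the form of the datum in the argument above can be made concrete with a small sketch. It is only an illustration under strong simplifying assumptions: a "grammar" is reduced to the finite set of utterance-meaning pairs it generates, the utterance and the meaning labels are invented for the example, and `add_pair` is a hypothetical stand-in for whatever rule induction the child actually performs.

```python
from typing import Callable, FrozenSet, Tuple

# Assumption: a datum is an (utterance, meaning) pair, and a toy grammar is
# simply the finite set of such pairs that it can generate.
Datum = Tuple[str, str]
Grammar = FrozenSet[Datum]


def dont_mess_update(grammar: Grammar, datum: Datum,
                     hypothesize: Callable[[Grammar, Datum], Grammar]) -> Grammar:
    """Alter the current guess only if it cannot handle the datum."""
    if datum in grammar:
        return grammar  # compatible datum: the guess is left alone
    return hypothesize(grammar, datum)


def add_pair(grammar: Grammar, datum: Datum) -> Grammar:
    """Hypothetical stand-in for rule induction: extend coverage to the datum."""
    return grammar | {datum}


def strings_only(grammar: Grammar) -> Grammar:
    """View a grammar as a generator of surface strings by erasing meanings."""
    return frozenset((utterance, "") for utterance, _ in grammar)


# Invented example: the grammar already allows the subject-verb-object parse.
current = frozenset({("Emily made this", "agent-action-object")})

# Surface-string datum: compatible, so no change is licensed.
surface = strings_only(current)
print(dont_mess_update(surface, ("Emily made this", ""), add_pair) == surface)

# Utterance-meaning datum with the mistaken possessive reading: not generated
# by the current grammar, so hypothesizing a new rule does not violate the principle.
mistaken = ("Emily made this", "possessor-possessed")
print(dont_mess_update(current, mistaken, add_pair) == current)
```

The first comparison prints `True` and the second `False`: the same utterance leaves the guess untouched when only the string counts as data, but triggers a change once the (incorrect) meaning is part of the datum.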
Turning from the early constructions to the later ones, let us consider the possible origins of the deletion rule, (27). A particularly suggestive utterance occurs right around the time of the emergence of (27). This is utterance (40a) which looks like a phonological deformation of adult sentence (40b).
(40a) 9/20/77 Look-a wy made.
(40b) (adult) Look at what I made.
Utterance (40a) suggests that Emily may have already mastered the headless relative clause. However, no other such utterances appear for two months, during which time, as detailed above, we find utterances (19). Although Emily was able to utter (40a), she was evidently incapable at that time of incorporating into her grammar all of the adult rules involved in it. To do so would have in any case violated the continuity principle. It is nevertheless plausible that Emily’s own analysis of (40a) was sufficient to allow a principled hypothesization of (27). Consider this scenario: she hears (40b), deduces its meaning from context, can’t parse it, remembers it only well enough to utter (40a) later on, partially analyzes this utterance to yield (41), and from (41) deduces rule (27).
(41) [Tree diagram: partial phrase-marker for “wy made”, including a V node over made and an NP node; the remaining structure is illegible in the source.]
While the particulars of the foregoing scenario are admittedly speculative, there is a related, more generalized proposition for which I think (19), (22), (41) and several other utterances provide rather convincing evidence. This general notion is that Emily received some exposure to adult utterances which like (40b) lacked subordinate direct objects; that without adopting all the relevant adult rules she managed to match up the adult verb phrase to that of her own phrase-marker; and that she was thereby enabled to deduce either (27) or something exceedingly similar to it. There is one piece of particularly compelling evidence of the intent of Emily’s objectless-VPs-within-NPs and their relationship to adult language. The very last of them is self-corrected, under minimal prompting, into the very construction that I have claimed to be its adult counterpart. The interchange is shown as (42).
(42a) 11/20/77 Look-a my made... Henny.
H: Huh?
(42b) 11/20/77 Look-a wha-I made.
Utterances (40a) and (42b) are almost identical to each other and closely resemble the adult form. Thus it would be reasonable to expect other utterances similar to them to occur during the two-month interval between them. We find instead that such utterances (containing wh- and a subordinate subject) are excluded in favor of utterances like (19), presumably because the existing grammatical rules persist. This section has shown how the grammatical stages documented in sections 2 and 3 might plausibly arise in a learner whose acquisition procedure conforms to the theoretical principles of acquisition discussed in section 1. An explicit (albeit speculative) scenario has made it possible to clarify the formal considerations involved. I have proposed that the original anomalous syntactic construction is traceable to the child’s use of meaning as well as surface string in making hypotheses. The development traced in section 3 shows how a child may learn something from an adult sentence even though she subsequently continues to make errors on that very same sentence.
6. Further Issues: Simplicity, Basic Operations, and Non-universality
6.1 Simplicity
The ‘deletion-ahead-of-its-time’ explanation of the missing direct objects in subsection 3.2 is based on the choice of a rule, (27) that is very simple in form. Such a choice is certainly open to question. The argument for it cannot simply be Occam’s razor cited in favor of simplicity in the linguist’s grammar for the child. Rather, the issue is a dual one, concerning the empirical matter of how simple children’s rules in fact tend to be, and Occam’s razor only at the comprehensive theoretical level sketched in subsection 1.1. The empirical observations that involve overgeneralization are relevant here because generality is a consequence of simplicity: restrictions on applicability are a form of complexity. A lucid and theoretically sound treatment of rule generality in acquisition as an empirical issue for syntactic theory can be found in Baker (1979). On the theoretical side, it is a reasonable hypothesis (by the linguist, not the child) that efficiency of acquisition will be served by a tendency of the (learner’s) hypothesizing device to favor simplicity in rules. On efficiency in the realm of dynamic acquisition theory, I know of only one paper, that by
Knaus (1975). He concludes that an acquisition device will do better to favor relatively general rules, since if wrong they will result in relatively frequent errors and be discarded relatively sooner. The issue of generality is of particular interest here because the range of data may be thought narrow. Specifically, the only verbs used with (27) are did, made, and drawed, although several others occur with (6b), without (27). An explanation of this restrictiveness may lie in the specifics of Emily’s vocabulary. She only used (6b), with or without (27), when she did not know the appropriate noun. She did not have such words as picture or deelybopper construction at the time she was using my did and my made. Moreover, these art objects were apparently non-representational so an attempt to name their denotata could have seemed hopeless to her. Still, one may ask why she was never observed to call a car Daddy’s drive, a wagon my pull, or a newspaper Mommy’s read. But only the last of these nouns would have challenged her vocabulary, and she also very frequently chose to say What’s_sis when she lacked a noun; therefore the opportunities for the construction were few. Still, one might think it appropriate to posit a more specific rule, a tack that has been suggested to me (along with Daddy’s drive etc.) by a thoughtful reviewer. One option for such a rule would be to restrict (27) by including in its specification that it may apply only if the verb belongs to a category limited to the three verbs used, perhaps semantically defined as verbs of ‘creating’. Before positing such a rule one must ask whether there exist languages in which rules of construal interact in this way with verb subcategorization, and, if not, whether such rules can be entertained in the acquisition process.
6.2 Basic operations
The conclusions reached here bear an interesting relationship to the basic operations hypothesis of Mayer et al. (1978).
According to this hypothesis a movement transformation can be broken down into two basic operations, copying followed by deletion. These authors suggest that certain errors in child speech are traceable to a child grammar containing one but not the other of the two basic operations. In their examples a child has erred by formulating a copying rule but failing to formulate a deletion rule. In this way they account for Hurford’s (1975) examples with a redundant auxiliary element and also for other data with redundant expression of tense. In the examples from Mayer et al., it can be inferred that the order of acquisition is copying followed by deletion. (They do not claim that this is typical.) This is also the apparent order of application in a correct adult grammar, because if deletion occurred first there would be apparently nothing
left to copy. We are thus led to the conclusion that the order of acquisition is the same as the order of application. The intuitive reasonableness of such a result can be expressed in terms of the kind of dynamic theory stressed earlier, specifically by assuming that acquisition conforms to the “final touch principle” (Hamburger, 1979). According to this principle, the learner applies the current transformational component, applying all transformations that are applicable, checks the result of doing so against the input datum, and if there is a discrepancy hypothesizes a new transformation to be applied at the last point in the derivation. For such a learner, transformations that must be applied only after others would have to be acquired later. Yet another reason to find the Mayer et al. results reasonable is that, as they point out, it would be difficult to find evidence that deletion was learned ahead of copying. This is because if there were a stage at which the child had acquired deletion but not copying it would be difficult indeed to distinguish the child’s output at that stage from simple incompleteness of phrase structure. (Bowerman, 1973, discusses this problem in a different context.) Despite all these arguments for the reasonableness of expecting and finding deletion rules relatively late in acquisition, we have been able to argue from longitudinal data that here is a case of early acquisition of a deletion rule. This result is not really counter-intuitive and need not violate the final-touch principle provided it is clear just what it is that is being deleted. Deletion must apply only to surface lexical formatives and leave referential indices in place in the syntactic structure.
6.3 Non-universality
How widespread is this phenomenon, and can it be important if it is not universal among two-year-old English-learners? To the first question, we simply do not know the answer.
As noted in section 3, it was only a rather intensive longitudinal effort that made it possible to establish the relevant grammatical rules. Several researchers have data of this quality but may not have looked for development of ‘proto-RRCs’ in their early data. The prospects for finding them are reasonable, since the use of verbs as nouns and the formation of primitive compounds is not uncommon in young children. It is possible that early ‘deletions’ of the sort produced by Emily have gone unnoticed or uncomprehended in the speech of other children. But suppose not. Suppose that only an occasional child treads this developmental path. Then what has been shown is but one solution out of two or several. Then these results can serve as an ‘existence proof’: there does exist a way to acquire this complicated construction in stages. It remains to resolve
the construction into its component demands upon the learner and to observe whether those demands tend to be met one at a time, in clusters or all at once. The work of Tavakolian (1978) can be seen in part as an attempt to deal with a later stage of this problem; the same remark applies to Hamburger and Crain (1979).
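As a supplement to the discussion of basic operations in subsection 6.2, the decomposition of a movement transformation into copying followed by deletion can be sketched in a few lines. This is a toy illustration only: sentences are flat word lists rather than phrase-markers, the function names are invented here, and the example sentence is hypothetical rather than drawn from the child data cited above.

```python
from typing import List


def copy_to_front(words: List[str], index: int) -> List[str]:
    """Basic operation 1: place a copy of the word at `index` at the front."""
    return [words[index]] + words


def delete_at(words: List[str], index: int) -> List[str]:
    """Basic operation 2: delete the word at `index`."""
    return words[:index] + words[index + 1:]


def move_to_front(words: List[str], index: int) -> List[str]:
    """Movement as copying followed by deletion of the original occurrence."""
    copied = copy_to_front(words, index)
    # After copying, the original occurrence sits one position to the right.
    return delete_at(copied, index + 1)


declarative = ["you", "can", "see"]

# Both operations in place: the auxiliary is fronted.
print(" ".join(move_to_front(declarative, 1)))

# Copying without deletion: the auxiliary is expressed redundantly.
print(" ".join(copy_to_front(declarative, 1)))

# Deletion applied first: the auxiliary is gone, so nothing is left to copy.
print(" ".join(delete_at(declarative, 1)))
```

The three lines printed are `can you see`, `can you can see` and `you see`. The second corresponds to the kind of redundancy attributed to a grammar with copying but no deletion, and the third shows why output from a deletion-only stage would be hard to tell apart from simple incompleteness.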
References
Anderson, J. R. (1976) Language, Memory, and Thought. Hillsdale, NJ, Lawrence Erlbaum Assoc.
Baker, C. L. (1979) Syntactic Theory and the Projection Problem. Ling. Inq., 10, 4.
Bowerman, M. (1973) Early Syntactic Development. Cambridge, England, at the University Press.
Braine, M. D. S. (1971) On two types of models of the internalization of grammars. In D. Slobin, (ed.), The Ontogenesis of Grammar. New York, Academic Press.
Brown, R. (1973) A First Language. Cambridge, Harvard University Press.
Erreich, A., Valian, V. and Winzemer, J. (1980) Aspects of a theory of language acquisition. J. Child Lang., 7, 157-179.
Gelman, R. (1978) Cognitive Development. An. Rev. Psychol., 30, 297-332.
Gold, E. M. (1967) Language identification in the limit. Information and Control, 10, 5, 447-474.
Goodluck, H. and Solan, L. (1979) A reevaluation of the basic operations hypotheses. Cog., 7, 1, 85-91.
Hamburger, H. (1979) Child data on subject-to-object raising. Proceedings of the Western Conference on Linguistics. Edmonton, Canada, Linguistic Research.
Hamburger, H. and Crain, S. (1979) Elicitation of relative clauses. Paper read at LSA, Salzburg, Austria, August 2-4.
Hamburger, H. and Wexler, K. (1975) A mathematical theory of learning transformational grammar. J. Math. Psychol., 12, 137-177.
Hurford, J. R. (1975) A child and the English question formation rule. J. Child Lang., 2, 299-301.
Jackendoff, R. (1977) X̄ Syntax: A Study of Phrase Structure. Cambridge, Mass., The MIT Press.
Klima, E. S. and Bellugi, U. (1966) Syntactic regularities in the speech of children. In Psycholinguistic Papers, J. Lyons and R. J. Wales, (eds.), Edinburgh, at the University Press.
Knaus, R. (1975) An incremental learner for transformational grammar. Technical Report No. 3, Contract No. N0001469-A-0201-9006 to ONR and ARPA.
Lasnik, H. (1976) Remarks on coreference. Ling. Anal., 2, 1, 1-22.
Limber, J. (1973) The genesis of complex sentences. In T. E. Moore, (ed.), Cognitive Development and the Acquisition of Language. New York, Academic Press.
Maratsos, M. and Kuczaj, S. A. (1978) Against the transformationalist account: a simpler analysis of auxiliary overmarkings. J. Child Lang., 5, 2, 337-345.
Matthei, E. H. (1978) Children’s interpretation of sentences containing reciprocals. University of Massachusetts Occasional Papers, Volume 4, Linguistic Department, University of Massachusetts, Amherst.
Mayer, J. W., Erreich, A. and Valian, V. (1978) Transformations, basic operations and language acquisition. Cog., 6, 1-13.
McNeill, D. (1970) The Acquisition of Language. New York, Harper and Row.
Prideaux, G. D. (1976) A functional analysis of English question acquisition. J. Child Lang., 3, 3, 417-422.
Ravem, R. (1975) The development of wh-questions in first and second language learners. In New Frontiers in Second Language Learning, J. Schumann and N. Stenson (eds.), Rowley, Mass., Newbury House.
Roeper, T. (1978) Linguistic universals and the acquisition of gerunds. University of Massachusetts Occasional Papers in Linguistics, 4, 1-36.
Slobin, D. I. (1973) Cognitive prerequisites for the development of grammar. In C. A. Ferguson and D. I. Slobin (eds.), Studies of Child Language Development. New York, Holt Rinehart & Winston.
Tavakolian, S. (1977) Structural Principles in the Acquisition of Complex Sentences. Ph.D. dissertation, Dept. of Linguistics, University of Massachusetts, Amherst.
Wexler, K. and Culicover, P. (1980) Formal Principles of Language Acquisition. Cambridge, Mass., MIT Press.
Wexler, K. and Hamburger, H. (1973) Insufficiency of surface data for the learning of transformational languages. In Approaches to Natural Language, K. J. J. Hintikka, J. M. E. Moravcsik and P. Suppes, (eds.), Dordrecht, Holland, D. Reidel.
A common criticism of the transformational treatment of syntax rests on the absence, in children’s productions, of certain errors predicted for “wh-” constructions. This article proposes a theory of the dynamics of acquisition, together with fine-grained longitudinal data bearing on the issue. The essential observations show, at 24-28 months, an early precursor of the relative clause. The analyses shed light on two fundamental points in the transformational theory of acquisition: the permissibility of simultaneous rule changes, and the question of whether a transformation can be acquired before the associated deep structure. These questions, as well as the analyses, can be restated in non-transformational terms.
Cognition, 8 (1980) 417-459 © Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Discussion
Is the human sentence parsing mechanism an ATN?*
JANET DEAN FODOR
University of Connecticut
LYN FRAZIER
University of Massachusetts
*This work was supported in part by National Science Foundation Grant No. BNS 79-17600. Requests for reprints should be addressed to Janet Dean Fodor, University of Connecticut U-145, Storrs, Conn. 06268.
†Augmented Transition Network.
1. The Sausage Machine as an ATN
Wanner (1980) has defended ATN† models of the human sentence parsing mechanism against our criticisms (Frazier & Fodor, 1978, henceforth FF) and has turned the tables by arguing that the Sausage Machine (SM) model is empirically inadequate. We believe that the SM emerges unscathed from Wanner’s objections to it. But before responding to these objections, we must establish some general points about the relation between the SM and ATNs. A cursory glance at our paper and Wanner’s might well leave a reader with this impression: Frazier and Fodor presented a model of the human sentence parsing mechanism which they claimed is superior to models which take the form of ATNs. Wanner showed that this claim is false by demonstrating that (insofar as the SM is empirically correct, which isn’t very far) the SM is an ATN. This is a picture that we would like to obliterate at the outset. This is not what we claimed, and it is not what Wanner has proved. The relation between the SM and ATNs is far more intricate than this snappy little summary implies. In making our criticisms of ATN models, we were careful to observe the distinction between ATNs in general, and the specific ATNs that have been singled out and explicitly offered as models of the human sentence parsing mechanism (see FF, footnote 2, p. 294). The general theory of ATNs defines an infinite class of particular ATNs. A particular ATN presented as a model of the human sentence parsing mechanism can be evaluated in just the same way as any other model. (Does it succeed in parsing all and only the sentences and non-sentences of the language that native speakers succeed in
parsing? Do its relative parsing times for different sentences match the relative parsing times of human subjects? When it makes parsing errors, do these resemble the errors made by human subjects? And so on.) However, if we are interested in the general question of whether the human sentence parsing mechanism is an ATN, the empirical evaluation of one particular ATN does not take us very far. Even if that ATN passes all the empirical tests we can put to it, this may not (for reasons that we will discuss below) constitute support for the general theory of ATNs. And very clearly, if one particular ATN fails to match up to the empirical data, this would not amount to a disconfirmation of ATN theory. Rather than investigating the merits of one individual ATN after another, we might characterize some property shared by a whole class of ATNs, and try to establish whether or not this is also a property of the human sentence parsing mechanism. If it is not, then we will have narrowed the class of viable ATN models considerably in one move. This is what we attempted to do (though very much in passing) in FF. We argued that people compute the phrasal structure of a sentence in two stages, that the first stage of analysis is performed by a component of the parser that has a very limited view of the sentence, that the decision-making element of the parser has access to parts of the partial phrase marker it has constructed for earlier portions of the sentence, that the phrase structure rules referred to by the parser are not built into its program of action plans but reside in an independent ‘rule library’ and are accessed as needed. Each of these claims is incompatible with some subset of the infinite set of ATNs; in particular it is incompatible with the particular ATNs that have been proposed in recent years as models of the human sentence parsing mechanism. We will defend these claims in our discussion below.
Our present concern is what implications these, or any other proposed empirical disconfirmations of current ATNs, have for ATN theory in general. Let us suppose, temporarily and just for the sake of simplifying exposition here, that every aspect of the SM that we specified parallels exactly an aspect of the human sentence parsing mechanism. (There are plenty of properties of the human sentence parsing mechanism that we did not attempt to specify. For example, we made no claims about the lexical access routines, or about how sentence prosody affects parsing decisions, or about how transformed sentences are processed, and so on.¹) Now the question is: does our (hypothesized) demonstration that the human sentence parsing mechanism is modelled by the SM constitute a demonstration that it is not modelled by any ATN? The answer, as so often, depends on exactly how the question is construed. On one way of taking it, it is almost certain that a demonstration
that the human sentence parsing mechanism is modelled by the SM would amount to a demonstration that it can be modelled by an ATN. The class of ATNs is extremely heterogeneous; particular models differ with respect to a great many parameters that affect their performance. So, at least if we consider only input-output relations (i.e., weak equivalence), it is very probable that some ATN could be devised whose performance characteristics are indistinguishable from those of the SM. Indeed this is necessarily so, if the version of ATN theory that we assume is as unconstrained as that which Woods (1970) has proved to be equivalent in computational power to a universal Turing machine.² Viewed this way, our arguments in favor of the SM model constitute a disagreement with proponents of ATN theory only with respect to where, within the infinite class of ATNs, we should concentrate our search for the one which models the human sentence parsing mechanism. They have been looking in one place, but we claim that the target is to be found in another. If this is what is going on, then we clearly have not mounted an attack on ATN theory, but might even be regarded as having made a contribution to it.
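The notion of weak equivalence invoked here, agreement on input-output relations only, can be illustrated with a deliberately tiny sketch. The two recognizers below are invented for the purpose and have nothing to do with the SM or with any actual ATN; the toy language (strings of a's followed by the same number of b's) and the function names are assumptions of the example. The point is only that two devices can accept exactly the same strings while being organized quite differently inside.

```python
from itertools import product
from typing import Callable


def accepts_by_halves(s: str) -> bool:
    """Recognize a^n b^n by splitting the string in the middle."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    return s[:half] == "a" * half and s[half:] == "b" * half


def accepts_by_peeling(s: str) -> bool:
    """Recognize the same language by repeatedly stripping an outer a...b pair."""
    while s:
        if len(s) < 2 or s[0] != "a" or s[-1] != "b":
            return False
        s = s[1:-1]
    return True


def weakly_equivalent(f: Callable[[str], bool], g: Callable[[str], bool],
                      alphabet: str = "ab", max_len: int = 8) -> bool:
    """Check agreement of f and g on every string up to max_len (a bounded
    check, not a proof of equivalence)."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if f(candidate) != g(candidate):
                return False
    return True


print(weakly_equivalent(accepts_by_halves, accepts_by_peeling))
```

The check prints `True`, yet nothing in it reveals which internal organization a given device has. That is why a demonstration of weak equivalence alone leaves open the questions about structure and resources taken up below.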
¹ It has been suggested to us in discussion (at the Joint Language Seminar, MIT, 1978) that our failure to make explicit the complete sequence of steps by which sentences are parsed makes it impossible to judge the validity of the SM model - more strongly, that it deprives the model of any empirical content. This strikes us as an odd view, as odd as the view that, for example, a linguistic claim about the structure of relative clauses has no content until it is supplemented by claims about every other aspect of the grammar of the language in question. It is true that a claim that appears plausible may be undermined as its explicitness and completeness are increased. But what research strategy exists as a reasonable alternative? We make claims about matters to which we have given some thought, and on which we believe we have some relevant evidence to bring to bear. There is no point in embedding those claims in a body of others which we have at present no evidence for and no worthwhile thoughts about. Indeed, to do so would almost certainly obfuscate the claims that we are serious about, and impede their empirical evaluation. It is certainly true that the SM cannot be implemented on a computer to parse actual sentences without filling in, in arbitrary fashion, all sorts of necessary bits of computational machinery that are as yet unspecified. But this does not entail that the model, insofar as it is specified, is false, or that there is nothing to be said about its truth or falsity. Computer implementations have a valuable role to play in the development of psycholinguistic theory, but it would be a real mistake to conclude from the fact that an incomplete program has no output that an incomplete theory has no value.
² The proponents of ATN theory have been very aware that it needs to be constrained, especially in view of the fact that its full power has not been utilized in modelling the human sentence parsing mechanism. As the theory develops, it may conceivably become restricted in such a way that there is no ATN that is even weakly equivalent to the SM. In this event, a demonstration that the SM is correct would be ipso facto a demonstration that ATN theory is incorrect. However, the arguments against ATN theory that we give below do not in any sense hang on this possibility; they go through even if all the overt behavior of the SM can be mimicked by an ATN.
But what would it be like to mount an attack on ATN theory? Is there any way in which it could be empirically disconfirmed? While emphasizing that this was not what we were about in our original paper, we would like to explore this general question here. ATN theory is often criticized as empirically empty. The goal of psycholinguistics is to identify, within the class of conceivable sentence parsing mechanisms, just that one which matches the sentence parsing mechanism embodied in the human brain. But it appears that the class of ATNs is more or less co-extensive with the class of conceivable parsing mechanisms and thus that ATN theory does not substantially contribute to the search for the human sentence parsing mechanism (HSPM).³ We believe that ATN theory can be defended against this charge of emptiness, for there is an interpretation of it under which it does make an empirical claim (though we don’t know whether this is a claim to which the proponents of the theory take themselves to be committed). The idea is that the behavior of the HSPM can be modelled by an ATN because the relevant part of the human brain has exactly the structure and capacities (computational resources, storage capacity, control structures, etc.) of an ATN. If this is what is intended by the claim that the HSPM is an ATN, then we can derive the empirical prediction that the HSPM corresponds not just to some ATN or other, but to a relatively simple and natural ATN - an ATN that is (more or less) optimal given the nature of the task to be performed and the nature of the resources available. (Total optimality - even if it could be satisfactorily defined - would surely be too much to expect. In fact the point is better phrased the other way around: to the extent that the HSPM can be mimicked only by an ATN which utilizes the resources available in an arbitrary and pointlessly inefficient manner, it is implausible that the HSPM is an ATN, in the strong sense.)
³ We do not intend here to attach any great weight to the idea that there is just one human sentence parsing device rather than a range of devices differing from each other with respect to certain parameters. If, for example, parsing mechanisms are individuated not only in terms of their structure and general operating characteristics but also in terms of the linguistic information they embody, then we must obviously recognize different parsing mechanisms for different languages. In any case, it is probably reasonable to allow for some individual variation even between speakers of the same language. The structure of the relevant part of the human brain certainly looks to be very strongly constrained by genetic factors, but possibly not to uniqueness. In what follows, we will use the abbreviation HSPM to refer to a range of mechanisms, each of which is or could be - in normal circumstances - realized in the brain of a person and used for parsing a natural language. For convenience, however, we will continue to make use of locutions such as “the HSPM is an ATN”; this is technically imprecise but its intended sense is clear enough.
Since the ATN framework is apparently coming to be regarded as a high level programming language in which parsers of all kinds can be
formulated (see, for example, Swartout’s (1978) reformulation of the Parsifal model of Marcus (1977)), we could also express this prediction as follows: if all the algorithms for parsing natural languages that could be formulated in this programming language were ranked according to some simplicity metric (simplicity of the network of arcs, or simplicity of the sequence of operations involved in the parsing of individual sentences), then the HSPM would correspond to a program high on this ranking. (Furthermore, since, as noted, what we are calling the HSPM probably consists of a range of slightly different parsing devices, we would expect this range to constitute a natural class of programs when formulated in ATN terms.) If there were some reason for confidence in the truth of ATN theory under this strong construal, then it would obviously be appropriate to formulate our current models of the HSPM in ATN terms. We might be quite sure that, in the present state of the art, the particular models we propose will be deficient in many respects. But we would certainly expect that the most fruitful research strategy for arriving at the correct model would consist in characterizing some ATN and then gradually modifying it in response to internal efficiency considerations as well as new empirical data. From this perspective, it is quite appropriate that Wanner’s defense of ATN theory takes the form of an argument to the effect that certain observations that we made about the human sentence parsing mechanism can be captured within an ATN in a very simple and general fashion. If the same could be demonstrated for every fact that is discovered about how people parse sentences, this would be most impressive and would certainly commend the ATN computational framework as an important contribution to psychology.
In particular, if the SM model (continuing to assume for the sake of argument that it is empirically correct) could be shown to be equivalent not just to some ATN but to an optimal ATN in the sense defined, then our research would not invalidate ATN theory but would provide strong support for it. Correspondingly, of course, the demonstration that the SM is an ATN would not in any sense invalidate the SM. We hope it is clear, then, that in principle there need be no conflict between the SM and ATN theory, even if the latter is construed under the strong interpretation. In practice, everything turns on what sort of ATN the SM turns out to be. Our response to Wanner will be that the SM is indeed empirically correct (as far as it goes), and that it corresponds to a complex and ‘unnatural’ ATN. (This latter point should, after all, not be too surprising in view of the differences between the SM and current ATN models, if we make the plausible assumption that current ATN models have not been picked at random but have been favored just because they represent more or less optimal designs for an ATN device that is to be used for parsing natural
language sentences.) Together, these two claims entail that the HSPM, though it can no doubt be modelled by some ATN, cannot be modelled by any ATN that ranks high on the implicit evaluation metric defined by ATN theory. The inevitable conclusion is that, insofar as ATN theory is not empirically empty, it is empirically incorrect.⁴ To put it succinctly: this time around, we are indeed challenging not only particular ATN models but the whole ATN framework.
⁴ At the risk of multiplying alternatives beyond necessity, we would point out that there is a possible middle road that defenders of ATN theory might take. The theory is empty if it claims only that there is an ATN weakly equivalent to the HSPM, without making any claims about what the actual structure and resources of the HSPM are. The theory is false (we argue) if it claims that there is an ATN strongly equivalent to the HSPM that has all the resources that ATN theory defines as available - for then it would be incomprehensible that the HSPM goes about sentence parsing in the idiosyncratic way it does. But the theory would not be quite empty, and would not be obviously false, as far as it goes, if it were construed as claiming that there is an ATN that is strongly equivalent to the HSPM and uses only a subset of the resources made available by the general definition of an ATN. This interpretation of the claim that the HSPM is an ATN is not as interesting, of course, as the strong interpretation discussed above, which takes the definition of an ATN to establish both lower and upper limits on how the HSPM goes about its task. But even if it doesn’t say which resources the HSPM is supposed to share with ATNs, it does have some substance on the assumption that not every imaginable parsing device can be strongly simulated by an ATN. It is much more difficult, in this construal of the ATN claim, to determine the consequences of a demonstration that the SM model is correct, for these will be highly sensitive to the exact definition of the structure and resources of an ATN. If the set of possible ATNs includes one that has the structure and resources of the SM (rather than merely having the power to emulate the SM design), then ATN theory would necessarily be true if the SM theory is true - but it would be superseded by the SM theory since the latter draws the net around the HSPM much more closely. If there is no ATN strongly equivalent to the SM, then the validity of the SM model would entail the falsity of the ATN theory on this construal. Wanner does not indicate whether the definition of ATNs that he is assuming is compatible, for example, with a two-stage design of the kind embodied in the SM, and so we don’t know whether to argue merely that his new model is incorrect or that the whole theory is incorrect. Perhaps it is time for a clear statement of just what does and does not qualify as an ATN. As long as the debate is about the merits of particular models, this is unimportant; but if the focus of discussion is to be raised to questions about ATN models in general, as opposed to all others, we do need to know exactly which models fall on which side of the line.
It is worth noting here a subtle but important distinction drawn by Pylyshyn (1980) between simulation and emulation. Pylyshyn considers as one example the relation between “the original binary-coded Turing machine introduced by Alan Turing” and a machine with “what is called a register architecture (in which retrieving a symbol by name or by ‘reference’ is a primitive operation)”. He comments: A register architecture can execute certain algorithms (e.g., the hashcoding lookup algorithm) which are impossible in the Turing machine in spite of the fact that the Turing machine can be made to be weakly
equivalent to this algorithm. In other words, it can compute the same lookup function, but not with the same complexity profile, and hence not by using an algorithm which is complexity-equivalent to the hashcoding algorithm. Of course it could be made to compute the function by simulating the individual steps of the register machine’s algorithm, but in that case the Turing machine would be emulating the architecture of the register machine and executing the algorithm in the emulated architecture, a very different matter from computing it directly by the Turing machine. (Pylyshyn, 1980, p. 124. Author’s emphases.Y In Pylyshyn’s terms, our claim that ATN theory is empirically incorrect amounts to the claim that the HSPM is so structured that it can be simulated naturally by an ATN only insofar as that ATN first emulates the architecture of the SM. If this is so, it implies that there is nothing (except, at present, explicitness) to be gained methodologically by casting our tentative models of the HSPM in ATN terms. Rather, our efforts should be directed towards devising computational mechanisms with a very different architecture, an architecture within which the kinds of computations that come naturally are just the kinds that come naturally to the human brain. As Chomsky has emphasized in discussion of linguistic competence, the structure of the human brain is the result of innumerable accidents of evolution; all the signs are that it is very idiosyncratic, and that the simplicity ranking that it imposes on the range of possible mental states and operations is very different from any ranking that would emerge from a very general computing device deliberately designed for efficient performance. The gist of our arguments for the SM model is that once we make the tiorking assumption that the HSPM has the rather quirky structure that the model ascribes to it, the further assumption that it operates with maximum efficiency by its own rights will predict many
5As Pylyshyn notes elsewhere in his paper, the matching of complexity profiles is not by itself sufficient evidence for the strong equivalence of two computational processes. For example, a putative model of the HSPM that in fact does not capture its structure at all might be engineered to exhibit the same complexity profile by building in arbitrary time delays or by assigning costs to various sub processes in an unprincipled way. Such a model would, of course, not be at all plausible precisely because of the ad hoc nature of the means by which it mimics the behavior of the HSPM. This is why it is important to consider the performance characteristics of a program that is natural given the functional architecture that is being assumed, i.e., a program that makes sensible use of the resources available for the task to be performed.
424
J. D. Fodor and L. Frazier
aspects of its performance. If we were to assume that the HSPM had some other structure, its performance characteristics might still be describable, but they would not be explicable.
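The contrast that Pylyshyn draws between weak equivalence and complexity-equivalence can be illustrated with a small sketch. It is only an analogy, under stated assumptions: the function names `linear_lookup` and `hashed_lookup`, the 1000-entry lexicon, and the convention of counting a retrieval by name as a single step are all introduced here for illustration and do not come from Pylyshyn or from the SM model. Both procedures compute the same lookup function, but they do so with different step counts.

```python
def linear_lookup(pairs, key):
    """Find key by scanning (key, value) pairs in order; return (value, steps)."""
    steps = 0
    for k, v in pairs:
        steps += 1
        if k == key:
            return v, steps
    return None, steps


def hashed_lookup(table, key):
    """Find key by name; the retrieval is idealized as one primitive step."""
    return table.get(key), 1


# Hypothetical lexicon of 1000 entries, built only for this illustration.
pairs = [(f"w{i}", i) for i in range(1000)]
table = dict(pairs)

value_a, steps_a = linear_lookup(pairs, "w999")
value_b, steps_b = hashed_lookup(table, "w999")

print(value_a == value_b)  # both procedures return the same value
print(steps_a, steps_b)    # but the step counts differ
```

On this toy measure the two procedures are weakly equivalent (same input-output behavior) without being complexity-equivalent, which is the sense in which a model can mimic the HSPM's outputs without sharing its architecture.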
2. Outline of the Arguments
We must now face up to Wanner’s empirical objections to the SM model, and his claim that insofar as it is correct it is equivalent to a simple and natural ATN. The points he makes are as follows.
(a) There are data that indicate that Right Association governs the decisions made by the PPP.
(b) Therefore we were wrong in claiming that Right Association can be explained as a consequence of the PPP’s inability to ‘see’ relevant nodes of the phrase marker.
(c) Therefore we were wrong in claiming that there are no ad hoc parsing strategies but only general decision tendencies attributable to the structure and general operating principles of the parser.
(d) Furthermore, there is no longer any positive evidence in favor of a two-stage parser.
(e) Even if we were somehow to build Right Association into the PPP, the Sausage Machine model would be unable to explain the interaction of Right Association and Minimal Attachment.
(f) Right Association and Minimal Attachment can be captured within an ATN by general scheduling principles which determine the order in which arcs are attempted.
(g) Therefore we were wrong in claiming that ATNs cannot capture parsing preferences based on the geometry of the phrase marker.
(h) When Right Association and Minimal Attachment are reformulated as ATN scheduling principles the interaction between them falls out automatically.
(i) Right Association and Minimal Attachment contribute to the efficiency of parsing in an ATN.
(j) This efficiency contribution promises to provide an explanation of why the human sentence parsing mechanism (assumed to be an ATN) should abide by Right Association and Minimal Attachment.
Our response to these points will be as follows:
(a’) We accept both the data and the conclusion.
(b’) This is not what we claimed. Wanner’s misunderstanding seems to be due to a terminological confusion in FF for which we are clearly to blame. We started with Kimball’s Right Association principle, argued that there were data it could not account for, revised its content, and renamed it Local Association. Unfortunately (not realizing that anyone would want to resuscitate the original Right Association principle, as Wanner has), we alternated freely between the terms Right Association and Local Association in referring to our new principle. In the present paper we will be careful to use Right Association to refer to the principle proposed by Kimball and endorsed by Wanner, and Local Association to refer to the principle proposed in FF. It was this latter principle, Local Association, that we attributed to the limited capacity of the PPP. And we are still prepared to stand by this explanation.
(c’) With Minimal Attachment and Local Association attributed to the structure and general operating principles of the Sausage Machine, the only question that arises is whether we can account for Right Association (the principle that we wrongly rejected in FF) in a similar way. We will argue here that we can.
(d’) Minimal Attachment and Right Association are not assumed to be related to the two-stage structure of the Sausage Machine but to other basic properties of it. But Local Association, which Wanner does not discuss, does motivate the two-stage structure.
(e’) The interaction of Right Association and Minimal Attachment is predicted exactly by the Sausage Machine model. Their interaction is governed by Local Association.
(f’) Right Association can be captured by a general scheduling principle framed in ATN terms, but Local Association cannot, and only some instances of Minimal Attachment can.
(g’) In view of (f’), our original claim was correct. Some such parsing preferences can be recast in terms of the ordering of arcs in an ATN, but others cannot.
(h’) Wanner’s claim (h) is incorrect. Because the ATN does not contain any analogue to Local Association, it severely oversimplifies the interaction between Right Association and Minimal Attachment.
(i’) Minimal Attachment contributes to efficiency in an ATN just as it does in the Sausage Machine. But Right Association does not contribute to efficiency in an ATN, though it does in the Sausage Machine.
(j’) Because of (i’), there is no plausible explanation of Right Association within an ATN. There is also no plausible explanation of how the efficiency contribution of Minimal Attachment could have led to a human sentence parsing mechanism that abides by Minimal Attachment.
We should make it quite clear that the criticisms of ATNs embodied in (f’)-(j’) are criticisms of current ATN models and of models that differ from
these only in relatively superficial ways. If there is an ATN that is equivalent to the Sausage Machine, it would of course not be open to these criticisms. In what follows we will try to suggest at each point how current ATNs could be modified to meet our objections, but the necessary revisions are quite radical and we believe, as noted above, that they amount to having the ATN emulate the structure of the SM. Our discussion will be organized around the three parsing principles: Minimal Attachment, Local Association, and Right Association.
3. Minimal Attachment
The principle of Minimal Attachment (MA) states that an incoming word or phrase is to be attached into the phrase marker using the smallest possible number of nonterminal nodes linking it with the nodes that are already present. Wanner claims that this is equivalent to the ATN scheduling principle: schedule all CAT arcs and WORD arcs before all SEEK arcs. He comments that this scheduling rule “enforces [the MA] strategy by providing that any input element will be analysed as a category or a word of the current phrase before any SEEK to a lower phrase is attempted”. But this ATN principle (which we will refer to roughly but conveniently as the CAT-before-SEEK principle) is not equivalent to MA, because it does not establish any preference in cases of a choice between two SEEK arcs (i.e., between two non-lexical phrasal nodes). In FF, we cited seven kinds of temporary ambiguity in which MA determines the preferred analysis. Only three of these are clearly handled by the CAT-before-SEEK principle, one is clearly not handled by it, and the other three are arguable. For sentence (1), as Wanner notes, CAT-before-SEEK and MA both force the attachment of the book as shown in (2), rather than with an extra NP node over it.
(1) John bought the book for Susan.
(2) [S [NP John] [VP [V bought] [NP [Det the] [N book]]]]
This choice then influences the attachment of for Susan. Suppose we posit another principle which requires that the partial phrase marker that has been constructed on the basis of previous words in the sentence is not to be changed in response to subsequent words unless there is no other way of proceeding. This principle (henceforth, the revision-as-last-resort or RALR principle) is a very natural one for almost any parsing model. It implies that having found a legitimate analysis of the initial portion of a sentence, the parser will hang onto it and try to continue it as further words are received and processed. The alternative would be for the parser to switch from one analysis to another in the absence of any good reason to do so. Like Wanner, we have taken it for granted that this would be very inefficient and that the HSPM abides by RALR.6 Applied to (2), RALR determines the attachment of for Susan as in (3a) rather than as in (3b).
(3) a. [S [NP John] [VP [V bought] [NP [Det the] [N book]] [PP [P for] [NP Susan]]]]
(3) b. [S [NP John] [VP [V bought] [NP [NP [Det the] [N book]] [PP [P for] [NP Susan]]]]]
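The node counts that distinguish the two attachments of for Susan can be checked mechanically. The following is a minimal sketch, assuming that the two analyses are written as labelled bracketings; the bracketed strings and the helper name `count_nonterminals` are introduced here for illustration and are a reconstruction, not the authors' own notation. Every opening bracket introduces one labelled (nonterminal) node, so counting brackets counts nodes.

```python
def count_nonterminals(bracketing: str) -> int:
    """Count labelled nodes: each '[' opens exactly one nonterminal node."""
    return bracketing.count("[")


# Assumed labelled bracketings for the two analyses of sentence (1).
ANALYSIS_3A = (
    "[S [NP John] [VP [V bought] [NP [Det the] [N book]] "
    "[PP [P for] [NP Susan]]]]"
)
ANALYSIS_3B = (
    "[S [NP John] [VP [V bought] [NP [NP [Det the] [N book]] "
    "[PP [P for] [NP Susan]]]]]"
)

nodes_3a = count_nonterminals(ANALYSIS_3A)
nodes_3b = count_nonterminals(ANALYSIS_3B)

# The analysis with fewer nonterminal nodes is the minimal attachment.
preferred = "3a" if nodes_3a < nodes_3b else "3b"
print(nodes_3a, nodes_3b, preferred)
```

Under these assumptions the verb-phrase attachment uses one nonterminal node fewer than the noun-phrase attachment, the difference being the extra NP node that would have to be inserted over the book.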
Notice that MA does not have to be explicitly invoked again at this point in the analysis, even though the end result is an attachment of the PP using the fewest possible nonterminal nodes. RALR interacts with the CAT-before-SEEK formulation of MA in just the same way in sentences (4) and (5).
(4) They told the boy that the girl liked the story.
(5) The horse raced past the barn fell...
Sentence (4) is ambiguous between the complement structure (6a) and the relative clause structure (6b).
6 The only room for disagreement between models concerns what is to count as evidence that the current analysis is unsuccessful and should be changed. Semantic incoherence or pragmatic implausibility might be sufficient (cf. Frazier, 1978). We will continue to assume, with Wanner, that whatever else may be involved, syntactic ill-formedness is a sufficient condition for rejecting an analysis.
(6) a. [S [NP they] [VP [V told] [NP [Det the] [N boy]] [S' [Comp that] [S the girl liked the story]]]]
(6) b. [S [NP they] [VP [V told] [NP [NP [Det the] [N boy]] [S' [Comp that] [S the girl liked]]] [NP [Det the] [N story]]]]
CAT-before-SEEK forces attachment of the boy immediately beneath the VP node, as in (6a); RALR requires the continuation of this analysis by ruling out the subsequent insertion of the extra NP node needed for analysis (6b). This accounts for human subjects’ preference for the (6a) analysis (Wanner, Kaplan and Shiner, ms). Sentence fragment (5) is ambiguous between the simple main clause analysis (7a), and the reduced relative clause analysis (7b).
(7) a. [S [NP [Det the] [N horse]] [VP [V raced] [PP [P past] [NP [Det the] [N barn]]]]]
(7) b. [S [NP [NP [Det the] [N horse]] [(S') [VP [V raced] [PP [P past] [NP [Det the] [N barn]]]]]] ...]
CAT-before-SEEK forces attachment of the horse immediately beneath the S node, as in (7a); RALR requires the continuation of this analysis, by ruling out the subsequent insertion of the extra NP node needed for analysis (7b). Two other cases falling under MA are handled by CAT-before-SEEK without the aid of RALR, but only on certain assumptions about the details of the phrase marker, i.e., about the correct grammar for English. On other assumptions, CAT-before-SEEK cannot handle these examples at all. The sentence fragment (8) is ambiguous between a simple direct object noun phrase analysis, as in (9a), and a complement clause analysis as in (9b).
(8) We knew the girl...
(9) a. [VP [V knew] [NP [Det the] [N girl]]]
(9) b. [VP [V knew] [(NP) [S [NP [Det the] [N girl]] ...]]]
The NP node that we have parenthesized in (9b) is crucial. Whether or not it should be present is a matter of dispute in linguistics. Certain phenomena (e.g., passivization, pronominalization) suggest that a sentential complement to a verb like know is a noun phrase. Other phenomena (e.g., extractability of constituents from within the complement) could be described most generally (e.g., in terms of Subjacency) if these sentential complements were not dominated by NP. Let us consider first how parsing would proceed if this NP node were present. Notice that the need for a parsing decision would then arise after the structure (10a) had been formed; the parser’s choice would be between developing the NP node as in (10b) and developing it as in (10c).
(10) a., b., c. [tree diagrams: the structure with an NP node following knew (a), and the two alternative developments of that NP node (b, c)]
Clearly, analysis (10b) is favored by the CAT-before-SEEK principle. But now let us consider how parsing would proceed if there were no NP node over the S node. The need for a parsing decision would then arise after the structure (11a) had been formed; the parser’s choice would be between two different nonlexical nodes following the verb, as shown in (11b) and (11c).
(11) a. [S [NP we] [VP [V knew] ...]]
(11) b. [VP [V knew] [NP ...]]
(11) c. [VP [V knew] [S ...]]
Because both nodes are nonlexical, they would both correspond to a SEEK arc in the ATN model, and neither would be favored by the CAT-before-SEEK principle. However, the choice of NP rather than S permits the determiner the to be attached into the phrase marker with the fewest nonterminal nodes, and is thus favored by MA. In the ATN network that he presents (diagram 1, p. 220) Wanner in fact orders the SEEK NP arc before the SEEK S arc in the VP network. This correctly predicts the parsing preference exhibited by human listeners, but it is entirely ad hoc; none of the general principles that Wanner has proposed determines this ordering rather than its opposite. Thus, in this case, the adequacy of the CAT-before-SEEK principle as a simulation of MA is highly sensitive to the exact details of English grammar.
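The gap in the CAT-before-SEEK principle can be made concrete with a small sketch. It assumes, purely for illustration, that the arcs leaving a state are held in a list and that the principle is applied as a stable sort; the `Arc` type, the function `cat_before_seek`, and the hypothetical CAT ADV arc are not taken from Wanner's network. The sort moves CAT and WORD arcs ahead of SEEK arcs, but it leaves the two SEEK arcs in whatever order the grammar writer happened to list them.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    kind: str   # "CAT", "WORD", or "SEEK"
    label: str  # category or phrase type named on the arc


def cat_before_seek(arcs):
    """Stable sort: non-SEEK arcs first, SEEK arcs last, listed order otherwise."""
    return sorted(arcs, key=lambda arc: arc.kind == "SEEK")


# Two hypothetical listings of the arcs leaving the post-verb state.
listing_1 = [Arc("SEEK", "NP"), Arc("SEEK", "S"), Arc("CAT", "ADV")]
listing_2 = [Arc("SEEK", "S"), Arc("SEEK", "NP"), Arc("CAT", "ADV")]

order_1 = [arc.label for arc in cat_before_seek(listing_1)]
order_2 = [arc.label for arc in cat_before_seek(listing_2)]

print(order_1)  # the CAT arc comes first; NP precedes S only because it was listed first
print(order_2)  # the CAT arc comes first; here S precedes NP
```

Both outputs satisfy CAT-before-SEEK, so the principle by itself cannot be what places SEEK NP ahead of SEEK S; that ordering has to be stipulated in the listing.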
A comparable case is that of sentence fragment (12).
(12) That silly old-fashioned...
This is ambiguous between a simple noun phrase analysis, as in (13a), and a complement clause analysis, as in (13b).
(13) a., b. [tree diagrams: that silly old-fashioned... analysed as a simple noun phrase with that as determiner (a), and as the beginning of a complement clause (b)]
(Note: we have illustrated this ambiguity for phrases in subject position, but in fact it is quite general and thus adds another dimension of potential ambiguity to examples of the general form of (4) and (8) above.) For this example too, the presence of the NP node above the S node is absolutely crucial if the CAT-before-SEEK principle is to predict the preference for analysis (13a). Without this node, the parser’s choice is between two nonlexical nodes, NP and S, and both will correspond to SEEK arcs in the ATN network. The assumption of an NP node over subject complements is certainly common in linguistics, but it is by no means indisputable. (For example, once the rule NP → S is admitted into the grammar, one loses the simplest account of the ungrammaticality of an S following a preposition.) To summarize: in order to defend the CAT-before-SEEK principle against the more general MA principle, Wanner must take a stand on a number of very delicate issues about the exact details of phrasal structure. He is committed to there being no cases of alternation between two nonlexical nodes that can dominate word strings beginning with the same lexical item - or at least, to there being no systematic parsing preference in such cases. The next two instances of MA show that this claim is untenable. The sentence fragment (14) might continue in either of the ways shown in (15).
(14) The leader of the gang of thieves...
(15) a. ...was captured by the police.
(15) b. ...having been captured by the police, the others quickly divided up the loot and left the country.
Thus, the noun phrase (14) might either be the subject of the main clause, as in (15a), or be the subject of a subordinate adverbial clause, as in (15b). The exact structure of such adverbial clauses is unclear, but they do seem to be constituents and hence must be dominated by some nonterminal node; for the purpose of our argument, it doesn’t matter whether this is AdvP, S', or S or all three at once.
(15) a., b. [tree diagrams: the leader of the gang of thieves attached as the subject of the main clause (a), and attached as the subject of an adverbial clause beneath AdvP and the parenthesized sentential nodes (b)]
Clearly, MA favors the simpler analysis (15a), but CAT-before-SEEK makes no prediction. Equally clearly, human listeners favor analysis (15a). However, we don’t want to hang too much on this example, because there is such a big difference in the frequency with which these two constructions occur in ordinary discourse, and we cannot rule out the possibility that the parser’s preferences are sensitive to frequency of occurrence. If so, the preference for (15a) over (15b) need not be attributed to MA. The seventh instance of MA that we cited in our earlier paper is not open to this objection, and it does support MA as opposed to CAT-before-SEEK. An NP NP sequence as in (16) is ambiguous between a conjunction analysis, as in (17a), and a relative clause analysis, as in (17b).
(16) The man the girl...
(17) a., b. [tree diagrams: the man the girl... analysed as a conjunction of two noun phrases (a), and as a noun phrase containing a relative clause (b)]
Note that both of these analyses require revision of the partial phrase marker constructed at an earlier stage. Because of MA or CAT-before-SEEK, the parser would attach the first two words as a noun phrase immediately beneath the S node, as in (18).
(18) [S [NP [Det the] [N man]]]
There is now nowhere for the second noun phrase to be attached, and so (despite RALR) an extra node must be inserted into the previously formed structure. The preference for the (17a) analysis cannot be attributed to the nature of this extra node, since in both cases it is an NP. The only question at issue for the parser is whether or not it should include an S' and/or an S between this inserted NP node and the NP node immediately dominating the lexical nodes. MA determines, correctly, that it should not. But CAT-before-SEEK does not determine any preference. In order to capture human subjects’ preference for the conjunction analysis, the order of arcs in the NP network of an ATN would have to be as in (19). (Note: we have omitted proper nouns and other realizations of NP that are irrelevant to the point at hand.)
(19) [network diagram: the NP network, with the CAT DET arc and two SEEK NP arcs leaving state NP0, and a CAT NOUN arc]
The priority of the CAT DET arc over the two SEEK NP arcs leading from NP0 is guaranteed by the CAT-before-SEEK principle, but the proper ordering of the two SEEK NP arcs relative to each other is not established by CAT-before-SEEK or by any of the principles that Wanner has formulated. Unless it can be shown that the preference for conjunction over relative clause constructions has a quite different source than the other preferences we have examined (and also that there is some further generalization that could be formulated in ATN terms to cover it), we must conclude that the greater generality of MA is empirically warranted. As we noted in FF, the absence of any general scheduling principle that orders the arcs correctly in an ATN does not mean that the arcs cannot be ordered correctly. This conclusion would follow only if it could be shown that an ordering needed for one kind of construction is directly contradicted by an ordering needed for another, i.e., if the necessary arc orderings are internally inconsistent. We shall come to a case of this sort shortly in our discussion of RA and LA and their interaction with MA. But as long as only MA is at issue, we know of nothing that would prevent the arcs simply being listed in an order that corresponds to human parsing preferences. However, the MA generalization would then simply be an external generalization about the network; it would not in any sense follow from general principles governing the structure or operation of the parser. In FF, we implied that this lack of any internal generalization is a defect in current ATN models. We will now spell out the reasoning behind this judgment. Taking a step in the direction of explanation rather than mere description, we can ask why the human parsing mechanism should abide by MA, and how MA is imposed in practice on its operations.
In the SM, the how question is answered by the assumption that the rules (= paths in the subnetworks of an ATN) are not extrinsically ordered, but are accessed in parallel and selected in terms of the outcome of a ‘race’ -- the first rule or combination of rules that successfully relates the current lexical item to the phrase marker dominates subsequent processing.7 The why question is answered (rightly or wrongly) by the claim that this mode of operation is innate; either it is simply an evolutionary accident, or else (quite plausibly) it is the result of selection for an optimally efficient parsing mechanism. In current ATN models, it appears that the how question can be answered only by the assumption that the arcs in the network are extrinsically ordered. But this makes it very much more difficult to answer the why question. In particular, it cannot plausibly be claimed that this ordering is innate. We have assumed that MA is innate because we have assumed that it is a genuine generalization about the parsing of English (not a mere coincidence), and also that there is nothing about the structure of English sentences that makes it plausible to suppose that in this respect the operation of the parser is shaped by the properties of the input it has to work on. But if it is innate, it must be universal. (Though we haven’t examined the parsing of other languages for evidence that MA is at work there too, we are quite prepared to commit ourselves to the claim that it is.) The SM model clearly predicts that it is universal. Differences between languages are reflected only in the well-formedness rules that constitute the data structure of the SM. The rules for English could be replaced by the rules for Swahili without any effect on the rest of the mechanism. And however much the rules for Swahili differed from the rules for English, MA would still apply in the parsing of Swahili - since the staggered parallel mode of rule access will result in MA for any body of rules at all. But in the ATN, MA is captured within the data structure, i.e., within the network of arcs.8 Given that MA cannot be imposed on a network by any general scheduling principle, it follows that an MA ordering will be inseparable from the particular set of arcs representing the rules for the language in question. Substituting a network of arcs appropriate to Swahili for a network of arcs appropriate to English would result in the loss of any generalization about how the arcs are ordered. (It should be remembered that the grammar encoded in the ATN is intended to be a surface structure grammar, so it cannot plausibly be maintained that the network for Swahili is, apart from its lexical items, simply identical with that for English). To put this point another way: even if, by chance, a child learning English were to establish a network in which arcs were ordered in accord with MA, there would be no reason to expect that a child learning Swahili would also establish a network in which arcs were ordered in accord with MA. The crucial ordering is tied up in the language-specific data structure of the ATN, rather than in its universal operating principles.
7 This search through the rules will not be random. If the node to be attached into the phrase marker is A, and the node in the phrase marker to which it must be connected, directly or indirectly, is B, then the parser will be looking simultaneously for rules of the form X → YAZ and rules of the form B → UVW. If these searches coincide in some rule B → YAZ, the parser will make use of this rule. If they do not immediately coincide, the parser will begin to search for linking rules of the form U → PAQ, B → SXT and so forth. This kind of search would probably be facilitated by the use of some non-standard format for representing the rules, such as a connectivity matrix, a multidimensional tree structure, etc.
8 This is true of the model that Wanner presents, but other discussions of ATNs (e.g. Kaplan, 1975) indicate that it is not intended to be a general defining characteristic; the role of the network in specifying the grammatical rules for the language can be formally distinguished from its role in specifying the sequence of steps by which the rules are applied to input sentences. Instead of using the vertical ranking of the arcs that leave a state to indicate the order in which they are to be attempted, the ranking of arcs can be left uninterpreted, the alternative actions they represent can be recorded in an agenda, and the order in which actions are taken from the agenda and executed can be governed by an independent body of scheduling principles. The agenda can even free the parser in principle from the fixed left-to-right and top-to-bottom sequence of actions that is otherwise implied by the state transitions depicted in the network. (The ‘state names’ in the network then would not in any sense name states of the parser; the network would be nothing but a grammar, expressed in a different notation than linguists standardly employ.) Separating the grammar and the schedule in this way would not necessarily affect the general lines of our argument, however. What is needed is some way of characterizing the MA scheduling principle independently of the particular pattern of nodes in the grammar for a given language, and the notion of an agenda does not in itself contribute to this.
There seem to be two possible responses to this argument against current ATNs (see Wanner’s footnote 8, p. 224). One alternative is to assume that feedback during language learning will result in an MA ordering of arcs regardless of which language is being learned. The other alternative is to assume that there are innate constraints on the kinds of data structure that the language learner can hypothesize. The feedback solution may look to be plausible in view of the practical benefits bestowed by MA, on which Wanner and we are in agreement. (As noted in FF, MA minimizes rule accessing and node storage, and permits orderly revision procedures when incorrect structures have been computed.)
That is, it looks as if one could assume that the language learner would be encouraged to order arcs in accord with MA by positive reinforcement of the decisions made at all choice points in a successfully parsed sentence, and negative reinforcement of decisions made during an unsuccessful parse. But in fact the reinforcement contingencies for inducing MA could not be as straightforward as this. This is because MA contributes to the overall efficiency of the parser without contributing to the chances that any particular structural choice that is made is correct for the sentence at hand. The minimal attachment may of course happen to be the correct attachment in many cases. But there is no guarantee that it will always be the correct attachment, or even that it will be the correct attachment more often than not. MA confers general advantages even where it leads temporarily to the wrong analysis. But for these advantages to act as reinforcement for a certain ordering of arcs in the network, the parsing device would have to contain machinery for working out what would have happened if it had made a different set of decisions in the parsing of a sentence than the ones it did
make, and for integrating the effects of these alternative choices over a large range of sentence types. We cannot prove that this is not what is going on during human language learning, but we do regard it as considerably less plausible than the explanation of MA provided by the SM model. The second approach assumes that MA is innately determined in the form of a general schema for language networks, into which facts about the particular language being learned must be integrated. The best way to illustrate this is to suppose, temporarily, that the CAT-before-SEEK generalization is the correct one. Then we could imagine that the language centers of the brain of a human infant contain a data structure consisting of a skeleton network of arcs, in which all CAT arcs (though not yet specified in more detail) precede all SEEK arcs (also not yet specified in detail). Learning to parse sentences would consist in adding specific details to these arcs in accord with the structure of the target language. But changing the properties of the innately specified skeleton network would be impossible, and so a network which violated the CAT-before-SEEK principle would be unlearnable. It is here that the importance of a general ATN statement of MA becomes apparent. If there were such a generalization, this account of the innateness of MA would be plausible enough. But it looks as if there is simply no generalization that does the work of MA other than one couched frankly in terms of the number of SEEK arcs that must be traversed before a CAT arc is reached, i.e., in terms of the number of nonlexical nodes dominating a lexical item. And this means that there is no plausible mechanism for imposing MA on networks. MA cannot be captured by means of some straightforward sort of master pattern for networks that the child must respect in forming his hypotheses about the language.
Instead, there would have to be some extra machinery for tracking through the network, counting the number of SEEK arcs on alternative paths, and evaluating hypothesized additions to the network in terms of their impact on these counts. Evolution might have resulted in there being such a mechanism in the human brain, but in the absence of direct evidence, we regard the SM assumptions as more plausible. The term “explanatory adequacy” is sometimes used in an overly casual way, without any clear statement of the underlying logic of the argument in which it is being invoked. We have done our best here to delineate the respect in which (we believe) the SM model provides a simpler and more plausible explanation of the MA phenomenon than any current ATN model. The qualification “current” is important here, however. ATN theory could quite directly incorporate an explanation of MA if current ATN models were superseded by ATN models in which the scheduling of arcs was determined by a ‘race’ between alternative routes through the grammar. That is, unless
such a modification somehow (in ways that we cannot envisage) violated the general defining characteristics of ATNs, the SM account of MA could simply be incorporated into an ATN. Before leaving this topic, we should note that there is one way in which a small compromise could be struck between current ATNs and the SM. The SM achieves its explanation of MA by assuming that MA is the result of on-line computations rather than being hardened into the data structure: temporal variations in these on-line rule accessing operations achieve the effect of counting nonterminal nodes in alternative derivations without the need to assume any explicit counting device as part of the innate equipment of the parser. But it is, of course, quite conceivable that rules that are accessed frequently (whether or not they are the right rules for the sentences to which they are applied) become to that extent more accessible in the future. Then, without doing much damage to our original conception of the SM, we could admit that the rules favored by MA gradually acquire a headstart over the others, so that the staggered parallel processing becomes more and more staggered until, with sufficient parsing experience, it becomes effectively indistinguishable from serial processing. The MA choices between rules would then have become hardened into the system, much as in current ATNs. What this amounts to is a possible mechanism by which the arcs in an ATN network could have become ordered in accord with MA. In other words, the proponents of ATNs could hold to their current description of the adult human parsing mechanism if they were prepared to adopt the SM as the correct model of the parsing mechanism of the language learner. It is extremely difficult in general to determine empirically whether or not a phenomenon that would result automatically from the interplay of on-line processes has also (redundantly) been coded into memory.
We have no positive evidence in the case of MA for supposing that this has occurred, but we would not be dismayed if further research should show that it has. However, we take our earlier arguments to have established that this minor modification of the SM’s treatment of MA is the most that is even likely to be necessary.
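The quantity that MA is sensitive to, the number of nonterminal nodes in alternative analyses, can be made concrete with a small sketch. This is only an illustration of the metric, not a claim about the parser's mechanism: the SM is argued above to obtain the effect without any explicit counting device. The tuple encoding of phrase markers, the function names, and the two sample analyses (built for an illustrative sentence, Joe bought the book for Susan) are all assumptions introduced here.

```python
# A phrase marker is encoded (hypothetically) as a nested tuple:
# (label, child, child, ...). A bare string is a word (terminal).

def count_nonterminals(tree):
    """Return the number of labelled (nonterminal) nodes in a phrase marker."""
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + sum(count_nonterminals(child) for child in children)


def minimal_attachment(candidates):
    """Pick the candidate analysis that uses the fewest nonterminal nodes."""
    return min(candidates, key=count_nonterminals)


PP = ("PP", ("P", "for"), ("NP", ("N", "Susan")))

# PP attached as an immediate constituent of the verb phrase.
vp_attachment = (
    "S",
    ("NP", ("N", "Joe")),
    ("VP", ("V", "bought"), ("NP", ("Det", "the"), ("N", "book")), PP),
)

# PP attached inside the object noun phrase, which needs one extra NP node.
np_attachment = (
    "S",
    ("NP", ("N", "Joe")),
    ("VP", ("V", "bought"), ("NP", ("NP", ("Det", "the"), ("N", "book")), PP)),
)

extra_nodes = count_nonterminals(np_attachment) - count_nonterminals(vp_attachment)
preferred = minimal_attachment([np_attachment, vp_attachment])
```

Under this encoding the noun-phrase attachment costs exactly one more node than the verb-phrase attachment, so `minimal_attachment` returns the latter.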
4. Local Association
Wanner has presented data which suggest that human sentence parsing is governed by a Right Association (RA) principle (“Terminal symbols optimally associate to the lowest nonterminal node”; Kimball, 1973). We accept this point, and will offer an account of RA within the SM in the next section of
this paper. We also agree with Wanner that RA does not constitute motivation for assuming that parsing is divided into two stages. What we dispute is that this leaves the assumption of two-stage parsing without any empirical motivation at all. (Wanner says: “There is nothing about FF’s observations which would require a parser with properties (A) and (B)”, where property (A) is “the existence of 2 separate stages of parsing.”) As noted in the summary of our arguments in section 2, it was our own Local Association principle, not Kimball’s original Right Association principle, that we took as evidence for two-stage parsing. Indeed, ‘Local Association’ just is the term that we coined (unfortunately without honoring it with capital letters in FF) to refer to attachment preferences of the kind that would follow from the first stage parser’s limited view of the input sentence. Wanner’s demonstration that the HSPM does after all abide by something like Kimball’s RA principle does not touch on the question of whether it also exhibits LA phenomena. In fact, LA shows up quite clearly in the way that the basic principles MA, RA, and RALR interact with each other. We will now review the nature of this interaction. What we will show is that it is much more intricate than Wanner has recognized, that it cannot be captured by any minor modification of current ATNs, but that it is a predictable consequence of two-stage parsing. Wanner claims that when MA and RA conflict, it is MA that wins. For some sentences this is true, and Wanner’s account of why it is true is perfectly acceptable. For example, in analyzing sentence (20), the parser will first construct the partial phrase marker (21) in accord with MA.
(20) Joe bought the book for Susan.
(21) [S [NP Joe] [VP [V bought] [NP [Det the] [N book]]]]
RA would favor the attachment of for Susan as a modifier to the book as in (22b), rather than as an immediate constituent of the verb phrase, as in (22a).
(22) a. [S [NP Joe] [VP [V bought] [NP [Det the] [N book]] [PP for Susan]]]
     b. [S [NP Joe] [VP [V bought] [NP [NP [Det the] [N book]] [PP for Susan]]]]
MA would favor the simpler phrase marker (22a), and the conflict is in fact resolved in favor of (22a). It should be noticed, however, that there is no need to invoke the MA principle itself in order to resolve this conflict, since RALR will select (22a). That is, (22a) represents a straightforward continuation of the structure (21), while (22b) would constitute a revision of this structure by the introduction of an extra NP node into the middle of an already established branch of the tree. It is this fact that Wanner’s account trades on. The relevant arcs of the ATN network are ordered as in (23).
(23) [ATN network diagram showing the ordering of arcs in the VP and NP networks; arc labels include CAT V, SEEK NP, SEEK PP, JUMP and SEND.]
The SEEK NP arc in the VP network sends the processor off to the NP network. Here the CAT DET arc is ordered first, in accord with CAT-before-SEEK or MA. Having found the determiner and the following noun, the only move open to the processor is to SEND this noun phrase constituent to the VP network (i.e., to attach it as an immediate constituent of the verb phrase), and then proceed with the search for other constituents at the VP level. The words for Susan can therefore only be attached as an independent PP within the VP, as in (22a). The alternative of attaching them within the
NP, as in (22b), would require that the processor postpone the SEND arc until it has explored the alternative SEEK NP SEEK PP route through the NP network. But this it will not do, because of RALR. That is, having found one successful route through the NP network, it will pursue this analysis unless or until it blocks for lack of any way to attach some later words of the sentence: only in these desperate circumstances will it consider the possibility that an earlier decision, which seemed to work well enough at the time, was in fact an incorrect decision. We have already noted that this RALR principle is a very plausible one, and we have no quarrel with Wanner’s application of it to this example. We do want to show, however, that this account of the interaction between MA and RA is not sufficient. It entails that MA will always take precedence over RA, whereas in fact RA is dominant in some sentences. This is not to say that their interaction is random. In fact it is orderly and fully predictable on the assumption that parsing proceeds in two stages. There are two kinds of examples to be considered. In one kind, which Wanner discusses in detail, RA is dominant because MA does not determine any choice at all between alternative structures. Examples (24) and (25) are of this kind.
(24) Tom said that Bill had taken the cleaning out yesterday.
(25) Joe bought the book that I had been trying to obtain for Susan.
In the case of sentence (25), for example, attachment of for Susan will require exactly the same number of nodes whether it is positioned within the highest clause as a sister to bought, in the middle clause as a sister to trying, or in the lowest clause as sister to obtain. (All that MA excludes is the attachment of for Susan as a modifier to the head noun phrase the book that I had been trying to obtain.) RA is therefore free to select the attachment as sister to obtain.
In the second kind of example, which Wanner did not consider, RA is dominant even though it conflicts with MA. Sentence (26) from FF is an example of this kind, and so is sentence (27).
(26) John read the note, the memo and the letter to Mary.
(27) Joe took the book that I had wanted to include in my birthday gift for Susan.
For these examples, the preferred analysis has the final prepositional phrase to Mary or for Susan attached as a modifier to a head noun phrase (the letter to Mary; my birthday gift for Susan), even though this requires one more node than an attachment as sister to a verb (John read NP to Mary; Joe took NP for Susan). In (27), for example, MA would ensure that the words my
birthday gift were attached at first with only one NP node above them. Then the attachment of for Susan as a modifier to my birthday gift would involve the revision of this structure by the introduction of an extra NP node to dominate the whole of the complex noun phrase my birthday gift for Susan. This is a violation of both MA and RALR. The parser is not in fact in a last resort situation when it makes this revision; it could have ploughed on with the original analysis, and found a simpler attachment for for Susan as sister to one of the verbs. Clearly, it will not do to abandon RALR and MA in order to account for these examples, for then there would be no explanation for the observed preference in cases like (20). Instead, we must look for some principled division of cases into those in which MA and RALR do apply, and those in which they do not apply and allow RA to win. In the SM, this division follows automatically from the limited capacity of the first stage parser (the PPP). We can hold to MA and RALR as absolute constraints with no exceptions, governing the operations of both the PPP and the SSS. The appearance of violations stems from the fact that the PPP can only abide by these principles within its own very limited view of the sentence. Because it cannot ‘see’ much of the phrase marker, it will not detect certain attachment possibilities for incoming constituents. It will therefore sometimes have to give up on an analysis it has attempted because it cannot find any way of continuing it; it will conclude that it has reached its last resort, and will therefore quite properly revise the phrase marker that it has formed. But sometimes when the PPP concludes correctly that it is in a last resort situation, this will be false from the point of view of the phrase marker as a whole - there will be legitimate ways of continuing the initial analysis by attaching incoming words to nodes in the phrase marker that the PPP cannot see.
In other words, even though the PPP honestly tries to abide by MA and RALR, parsing as a whole will not invariably abide by them. This account predicts that in cases of conflict, MA will take precedence over RA if the MA attachment associates incoming words with nearby words in the sentence (since the PPP will be able to see all of these words simultaneously), but RA will take precedence over MA if the MA attachment associates incoming words with distant words in the sentence (since the PPP will be unable to see the distant words). This is exactly what is observed in the contrast between sentences like (20) in which MA wins, and sentences like (27) in which RA wins. And it is this tendency to associate nearby words together, even at the expense of other principles, that we called Local Association. (We would emphasize that in the SM model LA does not have to be stipulated as an explicit principle that guides the parser’s decisions. It is an automatic consequence, as we have just indicated, of the limited capacity of the PPP.)
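The prediction just stated can be sketched as a simple visibility test. The sketch below is a deliberately crude illustration, not the SM itself: it assumes that the PPP's view is a fixed window of six words ending at the first word of the incoming constituent, and that the MA attachment site can be identified with a single host word (here the verb). The function names, the `window` parameter, and its value are assumptions introduced for illustration.

```python
def tokenize(sentence):
    """Split a sentence into words, dropping commas and the final period."""
    return sentence.replace(",", "").rstrip(".").split()


def predicted_winner(sentence, ma_host, incoming, window=6):
    """Return 'MA' if the word hosting the MA attachment is still inside the
    first-stage parser's window when the incoming phrase begins, else 'RA'.

    window is an assumed size for the first-stage view (in words).
    """
    words = tokenize(sentence)
    host_index = words.index(ma_host)
    incoming_index = words.index(incoming)
    # The window covers the incoming word and the (window - 1) words before it.
    host_is_visible = incoming_index - host_index < window
    return "MA" if host_is_visible else "RA"


short_case = predicted_winner("Joe bought the book for Susan.", "bought", "for")
long_case = predicted_winner(
    "John read the note, the memo and the letter to Mary.", "read", "to"
)
```

With these assumptions the verb is visible in the short sentence, so the MA attachment is available, while in the long conjoined sentence the verb has fallen outside the window and only the local, RA-style attachment remains.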
We do not see any way of incorporating Local Association into an ATN other than by making RALR sensitive to the number of words within a constituent. One way (the best way?) of achieving this would be to stipulate that after five or six words had been processed within a subnetwork, the processor would ‘forget’ that this subroutine had been called by some higher network. For sentence (26), for example, the processor would be operating within the NP network in order to attach the note, the memo and the letter. By the time it reached to Mary it would have forgotten that it had gotten into the NP network as the result of a SEEK NP instruction from the VP network. The SEND arc out of the NP network would therefore lead nowhere. Taking this SEND arc before all the words of the input had been accommodated would thus constitute failure. So the processor would have to look for a way of incorporating to Mary within the noun phrase. As in the SM, this would not constitute a violation of RALR, but rather a mistake (due to the forgetting of earlier structure) about when a last resort situation has arisen. Of course, a parser that completely forgot the SEEK instructions that called its subroutines would simply be unable to parse most sentences of the language. So it would also be necessary to assume that some part of the processor does remember that the NP network was entered because of a SEEK arc in the VP network. But this now begins to look very much like an ATN emulation of the SM. Some part of the ATN processor, just like the PPP, loses access to higher structure within the space of half a dozen words or so; another part of the processor, just like the SSS, keeps track of this higher structure. To summarize: Wanner’s ATN, in which RALR alone governs the interplay of MA and RA, is not rich enough to account for all of the data. The reversal of the interaction of MA and RA must be explained. It appears to be triggered by constituent length.
Thus RALR must somehow be rendered inoperative for long distance attachments. We, at least, have been unable to think up any plausible way of achieving this within an ATN other than by isolating lower level parsing from higher level parsing by some sort of ‘forgetting’, during lower level operations, of what higher level operations have been performed, just as in the SM. Notice in particular that the isolation involved is not a matter of making a single cut across an ATN network such that subnetworks on one side of the cut have no access to subnetworks on the other. Whether or not there is access between subnetworks has to do with the properties of the sentence being parsed (the lengths of its constituents). Thus the loss of access that puts RALR out of action must be the result of some on-line phenomenon, rather than being built into the permanent data structure.
The length sensitive discontinuity in the interaction of MA and RA is by no means the only indication that there is a low level parsing device which has no access to the structure of the sentence beyond its most recent six or seven words. In FF, we cited three other discontinuities in parsing that can be explained in the same way. Wanner does not refer to these arguments, though they are the heart of the motivation for the SM. In each case, his single stage ATN parser makes the wrong predictions. We will not discuss these phenomena in great detail here, for they are all quite explicitly described in the earlier paper. But it may be worth reminding readers of the points we made. First, there is a discontinuity in the correlation between height of attachment and unnaturalness of the analysis. A low and local right attachment of an incoming constituent is strongly favored, and all high and distant attachments are more or less equally unfavored. (See the discussion of sentence (16) in FF.) We criticized Kimball’s Right Association principle for predicting a steady increase in unnaturalness as a function of attachment height. Readers can trace through Wanner’s ATN and see that it makes the same incorrect prediction. The SM predicts the discontinuity, on the grounds that all distant attachments fall outside the PPP’s limited view of the sentence. Second, there is a discontinuity in the preference for attachments as a right sister to constituents already present in the phrase marker, over attachments as a left sister to subsequent items in the input. Where the right attachment to prior constituents is local, it is preferred over a left attachment to subsequent constituents. But when the right attachment would be a distant one, the parser prefers to make a local left attachment. (See the discussion of example (17) in FF.) 
This is so even when the local left attachment results in a higher attachment in the phrase marker as a whole than the distant right attachment would have resulted in. Like Kimball’s Right Association principle, Wanner’s ATN incorrectly predicts that a higher attachment is always less favored than a lower one, regardless of how local it is. In the SM, the priority of local attachment over low attachment follows from the fact that the PPP must group each word with others either immediately to its left or immediately to its right, and that it cannot ‘see’ what eventual effect these local attachments will have on the height of that word in the total phrase marker. (Note that this presupposes that bottom-up parsing is permitted. In current ATNs, by contrast, all parsing is top-down.) Third, there is a discontinuity in the strength of the tendency towards local association depending on the length of the constituent to be attached. A short constituent strongly attaches itself to neighboring words, even though the result is often a nonsensical interpretation of the sentence. A longer phrase or clause can quite readily be attached to more distant constituents. (See the discussion of examples (25) - (28) in FF.) The parsing preferences of Wanner’s ATN are fixed and independent of the length of the constituent to be attached. In the SM, short constituents are attached by the PPP and hence can only be attached locally, while long constituents are attached by the SSS which has access to the whole phrase marker and is thus not subject to any pressure towards local attachment. These four discontinuities, all sensitive to at least roughly the same length parameter, still seem to us to add up to very strong evidence for a fundamental discontinuity in the parsing program, with structure assigned first on a very local basis to clumps of neighboring words in the input sentence. We have sketched here the beginnings of an account of these phenomena within an ATN. We don’t know whether the ‘forgetting’ mechanism that we have suggested for ATNs will prove to be the best approach. But we are prepared to bet that any ATN that could cope with all of these phenomena (in anything less than a totally ad hoc fashion) would in effect incorporate two-stage parsing.
5. Right Association
The one observation in Wanner’s critique that does seem to demand some addition to the SM model is that RA apparently applies within the PPP. As Wanner notes, the preferred interpretation of a sentence like John said Bill died yesterday has the adverb attached within the lower clause rather than within the higher clause. On the assumption that the PPP can take account of six or seven words simultaneously, this preference for low right attachment cannot be due to LA; it must be due to some structural principle guiding the PPP’s attachment operations, rather than to the discontinuity between the PPP’s operations and those of the SSS. We might as well concede, while we are about it, that there is also some tendency towards RA within the SSS. For example, in sentence (29) the adverbial clause after he had finished his chores seems to attach more naturally as a modifier to the lower verb taken than as a modifier to the higher verb said. (Cf. example (24) in FF.)
(29) Grandfather said that he had taken a long hot bath after he had finished his chores.
In the heat of our demonstration that Kimball’s Right Association principle is insufficient to account for all attachment preferences, we seem to have overlooked the evidence that RA - as well as LA - does govern the parser’s decisions.
The fact that RA holds within both the PPP and the SSS makes things easier for us, because we can assume that it is a general property of the parsing routines, just like MA and RALR. As noted in FF (p. 314), there seems to be no need to postulate any differences between the PPP and the SSS with respect to the kinds of operations they can perform or how they perform them, other than differences which follow inevitably from the basic division of labor between them - differences concerning the size of the sentence chunks they work on, the rate at which attachment decisions must be made, and so on. But if we are to meet our original goal of attributing all parsing tendencies to the fundamental structure and operating characteristics of the parser, rather than to explicit ‘strategies’ which tell the parser what to do at choice points, then we need to provide some sort of plausible story about how and why RA constrains the parser’s operations. We will argue that RA attachments are very natural in the SM, in view of the way in which it builds phrase markers for sentences, and also that RA attachments lead to computational savings. In this latter respect, RA is much better motivated in the SM than in current ATNs. Let us consider first how and why RA applies in Wanner’s ATN. The answer to the how question is quite straightforward. Wanner observes that postponing SEND and JUMP arcs as long as possible (within the tolerance of RALR) will ensure that an incoming constituent is attached into the lowest phrase that is currently open that contains a legitimate attachment position for it. Thus there does exist a simple general characterization of RA in ATN terms. And hence it is at least conceivable that RA is innately encoded into the human sentence parsing mechanisms in the form of a constraint on the order of arcs in the network that the language learner constructs.
The only further question to be answered, then, is whether there is any reason why evolution should have favored a sentence parsing mechanism that is innately constrained in this way, rather than one with the opposite constraint on arc orderings, or no constraint at all. Wanner’s suggestion is that in a parser with RA, “shifts between constituents will be minimized”. We will return to this claim shortly, but first we should consider why minimizing shifts between constituents might be advantageous. Wanner claims that minimizing shifts between levels of the phrase marker would minimize garden paths (which are extremely wasteful of computational effort). The idea is that “syntactic structure is generally more predictable within constituents than across constituent boundaries”. As it happens, this is true only for parsers resembling the SM, and not for Wanner’s ATN. Suppose, for example, that the initial portion of a sentence contains two incomplete verb phrases, one subordinated to the other, and that the incoming constituent is a prepositional phrase. A parser that abides
by RA will attach this prepositional phrase within the lower of the two verb phrases. But the existence of a prepositional phrase will be equally predictable (or unpredictable) at both levels. Indeed, in this case, it is the very same subnetwork of the ATN (the VP network) that will be the source of both predictions. There is only one respect in which future constituents are more predictable within a level of the phrase marker than across levels. This is that the immediately preceding words of the sentence, which have just been processed, will be a better basis for predicting what might appear next at the same level than for predicting what might appear next at some higher level. Thus in sentence (30), the words that immediately precede the final prepositional phrase are Mary had been reading, and these do establish that a to-phrase is possible and even likely in the lower clause.
(30) John was reading a book that Mary had been reading to Susan.
But these same words contain no indication at all about what, if anything, is likely to occur next in the higher clause. It is the earlier words John was reading a book that are a useful basis for predicting what will appear at the higher level, but a parser that was unable to take distant words into account in making its predictions could not benefit from this information. Thus, predictability considerations would provide a motive for avoiding level shifts only in a parser that ‘forgets’ earlier parts of the sentence as it processes later parts. The PPP of the SM has this property, but no current ATN does. Unless we have overlooked some other source of motivation, it looks as if the only reason why an ATN would be structured so as to avoid level shifting must be that level shifting is inherently costly. This is plausible enough, since each shift constitutes a redirection of attention, and requires the parser to keep track of where it is coming from and where it can legitimately go to next.
Shifting thus increases the housekeeping chores and also perhaps increases the chances of error. So now we must return to the question of whether RA does in fact minimize shifts between levels in an ATN. The answer is that it does not. One way in which RA might minimize level shifts is by favoring the correct analysis of a sentence more often than not; then the processor would not have to shift up and down between levels of the phrase marker correcting the positions at which it had attached incoming constituents. For this to be true, it would have to be the case that sentences with RA attachments occur more frequently than sentences with other attachments. This probably is the case. But since there is no grammatical reason for it, the reason probably has to do with the greater processing complexity of other
attachments. Hence, this presumed frequency difference provides no independent motivation for the parser’s avoidance of non-RA attachments. Another way in which RA might minimize level shifts is by favoring errors whose correction requires fewer shifts than other kinds of errors would. We won’t go through a detailed demonstration, but it does seem clear that RA errors are no less costly to correct than errors of the opposite kind. An RA error would be an error of shifting up to a higher level of the phrase marker too late; correction would require shifting down to detach a constituent from the lower level, and then shifting up again to attach it at the higher level. The opposite kind of error would be an error of shifting up to a higher level too soon; correction would require shifting down to insert an extra constituent at the lower level, and then shifting up again to continue the analysis of the sentence. The only desirable outcome of a preference for the lowest possible attachment would be that this permits a uniform correction strategy: if the current attachment proves untenable, try the next highest attachment. However, a preference for the highest possible attachment would have permitted an equally orderly correction strategy. A third way in which RA might minimize level shifts is much more direct, and it deserves more detailed attention. It turns on the principle: a shift postponed (by an RA attachment) is a shift saved. As it happens, this explanation of why a parser should abide by RA is not available in current ATNs, though it is in the SM - and, of course, in ATNs that resemble the SM in relevant respects. The issue is whether a parser that has attached the final word of a sentence low down on the right of the phrase marker, in accord with RA, can terminate its computations there, or whether it still has to shift up level by level through the phrase marker to the top S node even though there are no more lexical items to be attached at those levels. 
If it need not shift up to the top level, then the level shift that was avoided by the RA attachment of the last word will never have to be made at all; hence RA will result in computational economies. But if the parser must shift up to the top level, regardless of whether there are any attachments to be made on the way, then the total number of level shifts will be exactly the same for the RA attachment of the last word as for any higher attachment of it; hence RA will not result in any computational economies. (We should point out that this question of level shifts after the last word is attached is just a special case of a much more general question, viz., whether the parser can move in one fell swoop from a level at which it has just attached an item to a higher level at which it will attach the next item, or whether it must shift from one to the other passing through all intermediate levels on the way. To simplify the exposition, we will restrict attention, in what follows, to the special case of
sentence-final shifts. But we would like it to be clear that the advantage of the SM, which does avoid shifts to levels where there is nothing to be attached, is not restricted to sentence endings.) ATNs are so structured that each SEEK action is paired with a SEND action. If, for example, the VP subroutine is interrupted by a SEEK instruction activating the NP subroutine, this NP subroutine must end with a SEND action which switches control back to the VP subroutine. If the VP subroutine was called by the S subroutine, the VP subroutine must end in a SEND action back to the S subroutine. In current ATNs, there is no way of combining these two SEND actions so that the NP subroutine can feed back directly to the S subroutine, even if there are no more constituents to be attached at the intermediate VP level. It is because of this that an ATN must complete its analysis of a sentence with a sequence of SEND actions leading, in effect, up the right hand side of the phrase marker from the node at which the last lexical item was attached to the level of the topmost S. And it is because of this that, in a sentence such as (31), attaching to Susan at the lower VP level, in accord with RA, results in no saving of computational effort compared with attaching it at the higher VP level. (31)
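[Tree diagram: phrase marker in which a higher verb phrase headed by said dominates a lower verb phrase headed by lied, with to Susan attached at the lower VP level.]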
This characteristic of current ATNs may seem to be a quite superficial one, which could easily be modified. But this is not so. One problem in modifying it is that the interplay of SEEK and SEND actions is part of the ATN ‘language’ itself, rather than part of the program formulated in that language; in Pylyshyn’s terms (op. cit.), it is buried within the ‘virtual machine’. (This is why SEND arcs do not have to be tagged with explicit
specifications of the destinations of their SEND actions.) Thus the pattern of SEND actions is not readily accessible for alteration. But in any case, this step-by-step level shifting is deeply entrenched in current ATNs, since it is essential to the treatment of syntactic prediction. When an ATN subroutine is interrupted by another, there is no looking ahead to see what will be involved in the completion of the first subroutine once the interruption is over. In particular, there is no mechanism for noting the presence of obligatory attachment arcs (i.e. SEEK, CAT or WORD arcs that cannot be by-passed by JUMP or SEND arcs) following the arc at which the interruption occurred. The parser therefore must transfer control back to the interrupted subroutine before terminating its analysis of the sentence, because otherwise it would have no way of establishing that all the constituents that the grammar requires to be present in the sentence have actually been identified in it. In other words, even if there are no lexical items to be added at some intermediate level of the phrase marker, there must be a SEND action up to that level in order to establish that there are no more constituents needed at that level. If the parser did not do this, it would have no way of determining that the input sentence was ungrammatical for lack of an obligatory constituent (e.g., *John put the book); it would have no way of determining that it had been garden-pathed into an incorrect analysis of the input that makes it appear to be ungrammatical for lack of an obligatory constituent (e.g., *John put the book [that Mary had been reading in the study]); and it would have no way of recognizing the ‘gaps’ in sentences created by transformational movement or deletion of constituents (e.g., Where did John put the book __?).
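The bookkeeping behind this argument can be illustrated with a toy simulation. It is a sketch under stated assumptions, not an implementation of any published ATN: the level names, the action labels, and both functions are hypothetical. The first function pairs every open subroutine with exactly one SEND, as described above; the second models a parser that is simply allowed to stop once the final phrase is attached, on the assumption that nothing obligatory remains to be found at higher levels.

```python
def paired_send_trace(open_levels, attach_at):
    """Finish a sentence when every SEEK must be matched by a SEND.

    open_levels lists the open subroutines, outermost first (assumed labels).
    """
    if attach_at not in open_levels:
        raise ValueError("attachment site must be an open level")
    stack = list(open_levels)
    actions = []
    # Leave every level below the attachment site.
    while stack[-1] != attach_at:
        actions.append(("SEND", stack.pop()))
    # Attach the final PP: call its subnetwork and return from it.
    actions.append(("SEEK", "PP"))
    actions.append(("SEND", "PP"))
    # Every remaining subroutine must still return control to its caller.
    while stack:
        actions.append(("SEND", stack.pop()))
    return actions


def early_stop_trace(open_levels, attach_at):
    """Finish a sentence when the parser may halt once the PP is attached."""
    if attach_at not in open_levels:
        raise ValueError("attachment site must be an open level")
    stack = list(open_levels)
    actions = []
    while stack[-1] != attach_at:
        actions.append(("SHIFT_UP", stack.pop()))
    actions.append(("ATTACH", "PP"))
    return actions


def shifts(actions):
    """Count upward level shifts in a trace."""
    return sum(1 for kind, _ in actions if kind in ("SEND", "SHIFT_UP"))


levels = ["S", "VP-high", "S-low", "VP-low"]
paired_low = shifts(paired_send_trace(levels, "VP-low"))
paired_high = shifts(paired_send_trace(levels, "VP-high"))
early_low = shifts(early_stop_trace(levels, "VP-low"))
early_high = shifts(early_stop_trace(levels, "VP-high"))
```

In the paired scheme the low and high attachments cost the same number of shifts, so RA saves nothing; in the early-stopping scheme the low attachment needs no upward shift at all, while the high attachment needs two.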
We will now show how the SM can avoid these redundant level shifts, i.e., shifts to levels at which there is no action to be performed other than that of checking that there is no action to be performed. We have argued (in connection with MA) that the grammatical rules for the language should be stored separately from the specification of computational operations involved in the application of those rules. They will reside in a special ‘rule library’, and will be accessed by the executive component of the parser as needed. This means that the SM cannot rely on a built-in pattern of SEEK and SEND actions, as in an ATN, to ensure that the rules are accessed and applied in the proper sequence in building up a phrase marker. In FF we pointed out that rule applications can be properly sequenced in this model if we assume that “the human parsing mechanism not only processes what it does receive but also makes predictions concerning what it is about to receive.” We elaborated this (pp. 316-317) as follows:
We propose to permit both the PPP and the SSS to postulate obligatory nodes in the phrase marker as soon as they become predictable, even if their lexical realizations have not yet been received... If these predicted nodes should continue to dangle for lack of any corresponding lexical items in the sentence, they will signal ungrammaticalities of omission [or ‘gaps’ from which constituents were moved or deleted by transformations, cf. footnote 15]... They will also sometimes serve to resolve what would otherwise be temporary ambiguities in sentences.
The role of the word was in the sentence fragment That the youngest of the children was proved to... is unambiguous. The complement clause must contain a verb phrase, and its position in the phrase marker requires that in the lexical string this verb phrase should precede the verb phrase of the main clause. Therefore was proved to... can only be attached within the subordinate clause, not within the main clause. But either attachment would appear to be legitimate to a parser which did not enter the predictable subordinate VP node before attempting to connect was into the phrase marker. This is only one of the innumerable examples in which node prediction can save a parser from the danger of being garden pathed by potential attachment ambiguities. The idea, then, was that in parsing a sentence such as this, the SM would compute the structure (33) rather than (32).
(32) [Tree diagram not recoverable: partial phrase marker over "that the youngest of the children"]
(33) [Tree diagram not recoverable: partial phrase marker over "that the youngest of the children", with node labels S, NP and VP]
Partial phrase marker (33) makes explicit the need for a VP at each clausal level, and also explicitly indicates the relative ordering of these VPs. All that is needed, therefore, to ensure that the words are attached under the nonterminal nodes in the right places is that the parser should be constrained to move around the bottom of the partial phrase marker in an orderly fashion, expanding nonterminal nodes in sequence from left to right, without skipping any. (Or, as noted, if there is no alternative to skipping a node, the parser will recognize a transformational gap or an ungrammaticality of omission.) The important point here is that in the SM model, predictable nodes are prefigured in the phrase marker. Like the paired SEEK and SEND actions in the ATN, this ensures that rule applications are properly scheduled; and it is also independently motivated in any parser whose rules are stored separately from its action plans, since it minimizes rule accessing. Each rule is accessed just once, and all the information it contains is entered into the phrase marker simultaneously. In building the structure (33), for example, the parser must access the rule S → NP VP in order to attach the subject noun phrase; and as it does so, it enters not only the NP node but also the VP node into the phrase marker. The alternative would be for the parser to enter just the NP node, and then subsequently re-access the rule to extract the information about the VP node. This re-accessing of rules (as many times as there are nodes to the right of the arrow) would presumably be costly (especially as the parser would have to keep track of how much of the rule it had already entered into the phrase marker). And it would also call for a considerable amount of record-keeping in order to ensure that the rules were re-accessed in the right order, e.g., that the lower application of S → NP VP in (33) was completed before the higher application.
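The single-access policy just described can be illustrated with a minimal sketch. Everything named here (`Node`, `RULE_LIBRARY`, `expand`, `access_log`) is a hypothetical illustration rather than part of the SM as specified in the text; the sketch assumes only that a rule is a left-hand symbol with an ordered list of daughters, and that parenthesized daughters are optional.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    optional: bool = False  # True for daughters written in parentheses
    children: List["Node"] = field(default_factory=list)


# Hypothetical rule library: left-hand side -> right-hand side symbols.
RULE_LIBRARY: Dict[str, List[str]] = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP", "(PP)"],
}


def expand(node: Node, access_log: List[str]) -> List[Node]:
    """Access the rule for `node` once and enter all its daughters at once."""
    access_log.append(node.label)  # one library access per rule application
    for symbol in RULE_LIBRARY[node.label]:
        is_optional = symbol.startswith("(") and symbol.endswith(")")
        node.children.append(Node(symbol.strip("()"), optional=is_optional))
    return node.children


log: List[str] = []
s = Node("S")
np_node, vp_node = expand(s, log)  # NP and VP are entered together
daughters = expand(vp_node, log)   # V, NP and the optional PP in one access
print([(d.label, d.optional) for d in daughters])
print(log)
```

Because every daughter is entered on the first access, the log holds one entry per rule application, and no record of how much of a rule has already been used is needed.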
Efficiency considerations in the SM (though not in current ATNs) therefore favor the node prediction alternative. It is the fact that the SM enters obligatory nodes into the partial phrase marker that guarantees that it does not need to terminate its analysis of a sentence by shifting up level by level through the tree checking the rules at each step to ensure that all obligatory constituents have indeed been found in the lexical string. Instead, it can simply look to see whether there are any dangling nodes in the phrase marker it has constructed so far. If there are none, it can safely terminate its computations in the knowledge that the analysis it has computed for the sentence is complete. The SM model therefore does explain how RA is advantageous for the human sentence parsing mechanism: an RA attachment can avoid a shift of attention from one level of the tree to another, and the level shift avoided by an RA attachment has a good chance of being avoided altogether.
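The termination check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' mechanism: `Node`, `dangling_nodes` and `is_complete` are hypothetical names, a node counts as dangling when it has neither a word nor daughters, and the sketch walks the tree for simplicity. The point it is meant to show is only that the check consults the phrase marker and never the grammar rules.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    label: str
    optional: bool = False
    word: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def dangling_nodes(node: Node) -> Iterator[Node]:
    """Yield, left to right, nodes that have neither a word nor daughters."""
    if node.word is None and not node.children:
        yield node
        return
    for child in node.children:
        yield from dangling_nodes(child)


def is_complete(root: Node) -> bool:
    """True if no obligatory node is left dangling; optional ones are ignored."""
    return not any(not n.optional for n in dangling_nodes(root))


# "*John put the book": the PP is assumed obligatory here and stays unfilled.
incomplete = Node("S", children=[
    Node("NP", word="John"),
    Node("VP", children=[
        Node("V", word="put"),
        Node("NP", word="the book"),
        Node("PP"),
    ]),
])

# "Bill lied": only an optional PP is left without lexical material.
complete = Node("S", children=[
    Node("NP", word="Bill"),
    Node("VP", children=[Node("V", word="lied"), Node("PP", optional=True)]),
])

print(is_complete(incomplete), is_complete(complete))
```

The same left-to-right generator could also supply the next node to work on during the parse, in which case optional nodes would be offered as attachment sites rather than ignored.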
A brief digression: we have described the SM as simply ‘looking to see’ whether the phrase marker contains any dangling nodes, but clearly this metaphor must be cashed out in the form of some explicit mechanism. The kinds of representation and computation generally made use of in the modelling of sentence parsing may not lend themselves to an appropriate implementation of this idea⁹; we suspect that models of visual skills (e.g., Kosslyn and Shwartz, 1977) may come closer to what we have in mind. What we want to capture is the notion of scanning the phrase marker, spotting the next node, and jumping across to it. This is surely the appropriate way to characterize subjects’ performance in the corresponding visual task - for example, to focus on node A in diagram (34) and then to shift attention as rapidly as possible to the first node that is not connected to the string of words at the bottom.
(34) [Tree diagram not recoverable: node labels Y, X, B, C and A]
⁹Given the computational devices in standard use for sentence parsing at present, the easiest way to implement this in practice might well be to have the parser track through the tree structure itself, passing from the node at which the last attachment was made to the node at which the next attachment is to be made via the nodes at intermediate levels. This would be like the step-by-step level shifting that we have criticized in ATNs. Nevertheless, it could still save the SM some computational effort when there is no material to be attached at a given level. Because the SM explicitly encodes its predictions into the phrase marker in the form of dangling nodes, the device that searches the tree for the next dangling node to be dealt with (or any dangling node left at the end of the parse) could be a very superficial look-ahead device, distinct from the routines which actually do the work of parsing the sentence. It would not have to have access to any grammatical information about sentence structure, and it would have no parsing decisions to make; its only job would be to distinguish between the presence of a node and the absence of a node. Thus, unlike current ATNs, the SM would save the effort of checking the grammar against the phrase marker at every level up the right hand side of the tree. Our argument about the benefits of RA to the SM therefore goes through even on this assumption. As noted, however, we hold out some hope of finding a more direct implementation of the idea of ‘looking to see’ whether a dangling node is present. Indeed this is essential to us if the SSS can receive a package from the PPP that contains a dangling obligatory node. If the SSS were to attach this package and then select the next node to work on by moving up the tree from the point at which it made its most recent attachment, it would overlook this dangling node within the package it just attached.
What it must do instead is ‘stand back’ from the partial phrase marker it has constructed so far, and identify the leftmost dangling node in the whole structure.
We would certainly not expect subjects to find node B by searching up through nodes X and Y and then down the branch to B. Until a better understanding of the mechanism of visual scanning has been achieved, the visual metaphors in our description of how the parser scans the mental representation that it has constructed for the sentence are likely to remain no more than metaphors. But we would suggest that they are at least the right metaphors, and that the human sentence parsing mechanism can shift its attention directly to the next node at which some action is called for without having to compute its way through all intervening nodes. To summarize: because the SM extracts information from its grammatical rules and encodes its syntactic predictions explicitly into the phrase marker that it is constructing, it needs only a simple node detection device to determine whether the analysis it is computing meets all the syntactic obligations imposed by the grammar. It therefore does not have to check the phrase marker against the rules level by level all the way to the top; if there is nothing to be attached at some level, it does not have to shift to that level. Therefore, RA attachments will minimize the number of computations involved in parsing sentences. The SM thus provides an answer to the why question about RA, and it remains only to show that the how question can also be answered. The argument from economy of rule accessing for entering nonterminal nodes into the phrase marker before their lexical realizations have been encountered in the input implies that optional nodes as well as obligatory nodes should be established in the tree as soon as the relevant rule is accessed, i.e., as soon as the leftmost daughter node introduced by the rule is needed for the attachment of some lexical item.¹⁰ For example, the optional PP node introduced by the rule VP → V NP (PP) would be entered into the
¹⁰The insertion of optional nodes into the phrase marker before the input word string has provided any evidence that they are needed might seem to be hopelessly inefficient since there are such a vast number of options defined by the grammar. Winograd (1972) has argued that building options such as conjunction into an ATN network is extremely inefficient, but building them into the phrase marker that is constructed for each sentence is surely even worse. However, MA severely limits the number of optional nodes that will in fact be entered by the SM. It is true that anywhere that there is a noun phrase there is the possibility that the noun phrase will consist of a conjunction of smaller noun phrases, the possibility that it will consist of a head noun phrase followed by a relative clause or a prepositional phrase, and so on. But MA guarantees that these possibilities will not be contemplated by the SM except in response to lexical items in the input; they will never be prefigured in the phrase marker being constructed but will arise only as a consequence of a revision, enforced by the input, of a simpler phrase marker. Only the options represented by parentheses and curly brackets within a rule will be prefigured within the phrase marker; the options stemming from the optional application of a rule will not be. The appearance of massive inefficiency is therefore an illusion.
phrase marker as soon as the verb was encountered in the lexical string. This optional node would obviously have to be distinguished from obligatory nodes, since the absence of any lexical realization for an optional node does not constitute an ungrammaticality (or a ‘gap’) in the sentence; unlike an obligatory node, an optional node can be by-passed in the course of fitting the words around the bottom of the phrase marker, without this being an indication that something is missing from the sentence. (How optional nodes are distinguished in mental representations from obligatory ones is of no concern to our argument. We could suppose that they are entered with parentheses around them, or in pink rather than in blue.) At the point at which the final prepositional phrase is to be attached in the sentence (31), the partial phrase marker that has been constructed will therefore be (35). (35)
[Tree diagram not recoverable: partial phrase marker for sentence (31), containing a lower and a higher PP node]
The parser will, as argued, scan the phrase marker and attend to its dangling nodes in sequence as it moves around the bottom of the structure. It will therefore encounter the lower PP node in (35) before it encounters the higher one. Since this node is marked as optional, the parser could in principle skip it and move on around the phrase marker to the next one. But it seems reasonable to suppose that the parser will take advantage of the opportunity that this node affords for the attachment of the words to Susan.¹¹ ¹² (This situation is quite different, of course, from when the parser
¹¹What we have been arguing is that within the SM, RA could be favored by selection pressures because it minimizes computational effort. But the question arises whether RA would, in any case, result automatically from the natural functioning of the SM. There are reasons for thinking that it would. First, RA will tend to increase the size of the phrasal packages composed by the PPP, thus reducing the total number of packages in the sentence and hence the rate at which the SSS has to make its decisions. As noted in FF, the SM operates most efficiently if the PPP does its fair share of the work.
Second, the SM is not restricted to top-down parsing. Therefore, in deciding whether or not to bypass an optional node in the phrase marker it can take account of the nodes over the word or phrase that needs to be attached - for example, of the P node that the lexicon supplies over a preposition, and even of the inevitable PP node above that. Since the parser is under pressure to attach incoming elements into the phrase marker as quickly as possible, we would perhaps expect that whenever it detects a match between the incoming phrase and the attachment possibility offered by the optional node already in the tree, it will take advantage of it. (As noted in FF, the SSS is under considerably less time pressure in making its attachments than the PPP. So it might have the leisure to scan ahead and notice that it can afford to relinquish a low attachment opportunity because there is another equally good one coming along. Together with the fact that the SSS has more information about how constituent meanings fit together, this would explain why the RA tendency is apparently somewhat weaker within the SSS than within the PPP.) In other words, although it is conceivable that an SM parser could be explicitly programmed to systematically by-pass optional nodes prefigured in the phrase marker, this does not look to be a natural way for it to function.
¹²It should be noted that the SM’s favoring of the nearest optional node that is prefigured in the phrase marker does not predict violations of MA in sentences such as Joe bought the book for Susan. It is true that the less preferred attachment point for for Susan in this sentence (as modifier within the object noun phrase) appears earlier on a path around the bottom of the phrase marker than the preferred attachment point (as daughter to the VP). But MA entails that before for Susan is encountered in the input, the parser will have constructed the partial phrase marker (i) rather than (ii).
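(i) [Tree diagram not recoverable: partial phrase marker for "Joe bought the book", with an optional PP prefigured at the verb phrase level only]
(ii) [Tree diagram not recoverable: partial phrase marker for "Joe bought the book", with an optional PP prefigured both at the verb phrase level and within the object noun phrase]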
In other words, the optional PP at the verb phrase level will be prefigured in the phrase marker, because it is introduced by a rule that has already had to be accessed for the attachment of prior words; but the optional PP within the object noun phrase will not be prefigured in the phrase marker, because it would have had to be introduced by the unmotivated application of an optional rule. Thus our account of how RA applies in the SM is quite compatible with the dominance of MA over the preference for low right attachments (in cases where this is not reversed by the limited view of the PPP).
has run out of words in the input sentence and needs to know whether there remain any dangling nodes that would invalidate the analysis. In the latter case, optional nodes would be ignored as the tree is scanned.) Our argument has been that RA can be imposed just as easily within the SM as within an ATN, and furthermore that the SM model makes it at least comprehensible that RA should have been favored by evolutionary selection. We should now consider whether there is any way in which ATNs could be modified to incorporate some such explanation of RA. Several ideas come to mind. One way would be to annotate SEEK instructions with information about obligatory arcs that must still be traversed after the SEEK action is complete. The conventional S network, for example, could be modified as in (36), with a tag on the SEEK NP arc reminding the parser of the need to return to the SEEK VP arc. (36)
[Network diagram: S network with arcs SEEK NP [SEEK VP], SEEK VP and SEND in sequence]
In the normal course of events, the tag would be returned to the S network by the SEND action that terminates the SEEK NP subroutine, and would be cancelled as soon as the SEEK VP action was initiated. But at the end of the sentence, the parser could stop its computations without engaging in the usual sequence of vacuous SEND actions, as long as no current SEEK action was tagged for an obligatory constituent. Alternatively, since obligatory constituents are, in a perfectly good sense, anticipated in the structure of the network, there could be a device for tracking ahead through the network, recording in a special memory store the existence of obligatory arcs that have yet to be traversed, and cancelling them as and when they are in fact traversed. As far as we can see, there are no real objections to such mechanisms. But they are certainly complications in an ATN system. The tags on SEEK actions, or the arcs listed in the special memory store, would duplicate information that is already in the network without putting it where it must eventually end up, i.e., in the phrase marker being constructed for the sentence. In the SM, all this record-keeping is done in the phrase marker itself; information is taken from the rules and entered into the phrase marker, without being duplicated anywhere else on the way. An ATN could be designed to do the same, of course. But we would emphasize that such an ATN would differ from current ATNs in just the way that we pointed out in FF: it would have to have access to the phrase marker it has constructed while making its decisions (e.g., the decision whether or not to terminate its
parse of the sentence). We must confess that our way of expressing this general point in FF was misleading, in a way that Wanner (p. 223) has quite properly drawn attention to. Though we didn’t quite say so, we did imply that the parser needs to be able to view the phrase marker it is constructing in order to be able to detect several alternative attachment possibilities and compare their merits with respect to some general geometric principle. This is actually inconsistent with our goal of dispensing with overt ‘strategies’ which tell the parser what to do when there is a choice between alternative actions. Instead, as Wanner points out, the general structural preferences of the parser result precisely from its not looking at alternative attachment possibilities and deliberately choosing between them, but simply doing the first (or only) thing that presents itself as a possibility. We hope we have made it clear in the present discussion that, despite this expository imprecision, it is still true that the most plausible explanations for some of these general preferences (specifically, RA and LA) presuppose that the current partial phrase marker is at least partially ‘visible’ to the executive component of the parser.
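The tagged-SEEK modification considered above in connection with (36) can be sketched roughly as follows. This is a simplified illustration, not an implementation of any existing ATN system: `Frame`, `seek`, `start_arc`, `send` and `can_stop` are hypothetical names, and the sketch assumes that a tag is simply held on the calling network's frame until the tagged arc is initiated.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Frame:
    network: str
    pending: List[str] = field(default_factory=list)  # obligatory arcs still owed


def seek(stack: List[Frame], network: str, tag: Optional[str] = None) -> None:
    """Enter a subnetwork; `tag` names an obligatory arc the caller must return to."""
    if tag is not None:
        stack[-1].pending.append(tag)
    stack.append(Frame(network))


def start_arc(stack: List[Frame], arc: str) -> None:
    """Cancel a tag as soon as the tagged arc is actually initiated."""
    if arc in stack[-1].pending:
        stack[-1].pending.remove(arc)


def send(stack: List[Frame]) -> None:
    """Return control to the calling network."""
    stack.pop()


def can_stop(stack: List[Frame]) -> bool:
    """At the end of the input, stop at once if no obligatory arc is outstanding."""
    return all(not frame.pending for frame in stack)


stack = [Frame("S")]
seek(stack, "NP", tag="SEEK VP")   # SEEK NP [SEEK VP], as in (36)
stop_inside_np = can_stop(stack)   # a VP is still owed at the S level
send(stack)                        # the NP subroutine terminates
start_arc(stack, "SEEK VP")        # tag cancelled when SEEK VP begins
seek(stack, "VP")
stop_inside_vp = can_stop(stack)   # nothing owed: no vacuous SENDs needed
print(stop_inside_np, stop_inside_vp)
```

Note that the `pending` lists duplicate information already present in the network, which is exactly the complication pointed out in the text.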
6. Conclusion
The most striking (and, we think, the best motivated) property of the SM model of the HSPM is its two-stage structure. But there are two other properties that we also regard as important which are not shared by current ATNs. One is the separation of grammatical information about the language from the action plans that determine how the grammatical information is to be used. The other is the accessibility of the current partial phrase marker to the decision making routines. In FF we argued for the equation:
HSPM = SM ≠ current ATNs
We have argued here that this equation still holds, even if the revised ATN that Wanner has proposed is included among current ATNs. Our own attempts to devise ATNs that do simulate the SM and hence the HSPM suggest further that the functional architecture of ATNs in general does not match that of the human sentence parsing mechanism. This conclusion must of course be a tentative one, for it is always possible that those who are more experienced than we are in working within the ATN framework will be able to construct an ATN which ranks high on the implicit evaluation metric, has the same performance characteristics as the SM, and yet does not share these three basic properties of the SM. We think
that it is at least worthwhile, however, to get these issues out into the open, for ATNs seem to have a considerable appeal in psycholinguistics - perhaps because they look to be so well-tailored to the demands of natural language sentences with their recursive embedding of phrases within phrases. But it may be that what this amounts to is nothing more than that an ATN embodies a phrase structure grammar and an efficient means of applying it to word strings; it thus avoids all the complications of analysis-by-synthesis routines, or ‘backwards’ transformational derivations, and also the imprecision of ‘detective’ models such as that proposed by Fodor, Bever and Garrett (1974). But it must be borne in mind that these advantages are not exclusive to ATNs.
References
Fodor, J. A., Bever, T. G. and M. F. Garrett. 1974. The Psychology of Language, McGraw-Hill, New York.
Frazier, L. 1978. On Comprehending Sentences: Syntactic Parsing Strategies. Ph.D. dissertation, University of Connecticut. Distributed by Indiana University Linguistics Club.
Frazier, L. and J. D. Fodor. 1978. The sausage machine: a new two-stage parsing model. Cog., 6, 291-325.
Kaplan, R. 1975. On process models for sentence analysis. In D. A. Norman and D. E. Rumelhart (Eds.), Explorations in Cognition, W. H. Freeman and Co., San Francisco.
Kimball, J. 1973. Seven principles of surface structure parsing in natural language. Cog., 2, 15-47.
Kosslyn, S. M. and S. P. Shwartz. 1977. A data-driven simulation of visual imagery. Cog. Sci., 1, 265-296.
Marcus, M. 1977. A theory of syntactic recognition for natural language. Unpublished Ph.D. dissertation, M.I.T.
Pylyshyn, Z. W. 1980. Computation and cognition: issues in the foundations of cognitive science. Behavioral and Brain Science, 3, 111-169.
Swartout, W. R. 1978. A comparison of PARSIFAL with Augmented Transition Networks. MIT Artificial Intelligence Laboratory, AI Memo 462.
Wanner, E. 1980. The ATN and the Sausage Machine: which one is baloney? Cog., 8, 209-225.
Wanner, E., Kaplan, R. and S. Shiner. (ms.) Garden paths in relative clauses.
Winograd, T. 1972. Understanding Natural Language, New York, Academic Press.
Woods, W. 1970. Transition network grammars for natural language analysis. Communications of the ACM, 13, 591-602.