Cognition, 4 (1976) 125-153 © Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Reference: In memorial tribute to Eric Lenneberg*

ROGER BROWN
Harvard University

*This is Part I of a book in progress entitled The New Paradigm of Reference. This research is supported by National Science Foundation Grant No. GSOC-7309150.
Eric Lenneberg and I started our research careers together. "A Study in Language and Cognition", the paper we co-authored, stands near the beginning of Eric's list of publications as it does of mine. As matters turned out, we never worked as co-authors again, but for nearly a decade our experiments, though independently planned, seemed always to be closely related. Both of us were studying linguistic reference, the process of naming. Not many other psychologists were interested in reference in the late 1950's, so it was not unreasonable of a potential publisher of my manuscript Words and Things to ask me: "Just what readership do you have in mind for this book?" I had never thought of such a question and when I had to, my answer was: "Well ... there's Eric Lenneberg ..." At the time we did most of our work on reference, neither of us thought of it as such. Our main stimulus was the Sapir-Whorf hypothesis that differences of structure between languages are coordinated with differences of cognition between native speakers of the languages in question. On occasion, I think we imagined ourselves to be studying linguistic meaning. In fact, it was at a conference on "Linguistic Meaning" at Yale in 1956 that Noam Chomsky gently put me wise to the fact that what Eric and I concerned ourselves with would be more properly called 'linguistic reference' ('morning star' and 'evening star' and all that). We were more pleased than not, since 'reference' had a solid manageable sound that 'meaning' lacked, and we used it thereafter. As a research topic, reference was largely laid to rest when Syntactic Structures came out in 1957 since that book showed that many profound things could be discovered about language with no reliance whatever on reference. Then in 1963, Katz and Fodor published their paper "The Structure of a Semantic Theory" and proved that a theory of the way in which aspects of the socio-physical world controlled the understanding of utterances was impossible since it would "have to represent all the knowledge speakers have about the world" (488-489).
That was a bit of a dampener for people interested in reference. All the more so since Katz and Fodor developed the most interesting sketch of a semantic theory any of us had ever read, and they did it by limiting their domain to a small set of judgments concerning anomaly, periphrasis, ambiguity, and so on, all made with respect to purely linguistic objects, entirely without recourse to referents. Even the Katz and Fodor version of semantics did not discourage all interest in reference. Within anthropology, there developed a new set of methods and a new conception of the ethnographic task which relied on reference as well as questions and answers. The 'new ethnography' was (not entirely accurately) often called 'ethnoscience'. The term is accurate insofar as the new studies aimed at describing folk classifications and folk taxonomies for domains corresponding to recognized academic arts and sciences, like botany, geography, medical diagnostics, and so on. In aim, the approach was much more general and aspired to nothing less than a study of a given society's ways of classifying its total material and spiritual universe. The important emphases were on -emic description, that is, description from a point of view internal to the culture, and on explicit, rigorous behavioral methods. Some gifted anthropologists discovered a number of interesting things (e.g., Berlin and Romney, 1964; Conklin, 1962; Frake, 1964; Hymes, 1964), but I think it is fair to say that ethnoscience has not found a clear solution to the problem posed by the Katz and Fodor approach. Since one cannot describe all the knowledge speakers have about the world, how does one know where to begin or when an ethnography is complete? These shortcomings today seem less embarrassing than they once did. The Katz and Fodor theory, while it managed to define a finite task that might, in principle, someday be finished, neglected to show how that task might be begun. So far as I know, the theory has inspired no empirical work at all. One can see why. How do you look for just the semantic markers that are needed for the interpretation, disambiguation, and periphrasis of all the sentences in English and which will mark as semantically anomalous those that are so? Even a grammar of English, one that is both complete and generally satisfying to grammarians, still does not exist, though, in our first enthusiasm for transformational grammar, it seemed not far off. Today, we are grateful for the insights we have been granted into the nature of leviathan and content to let future generations worry about how they will be able to tell when they know it all. In the early 1960's, intensive study began of the child's acquisition of language in his preschool years. And at first, the principal method was naturalistic: transcription of the child's speech and of speech to the child.
Conceive if you can the psycholinguistic world's chagrin at learning that most very young children spend an awful lot of time pointing at referents and asking "What dat." Furthermore, when these same very young children are not involved in explicitly denotative games, many of their short sentences (e.g., Fall floor; Hit ball) are spoken in close temporal coordination with events which are well described by the sentences. The problem of reference, inescapable in developmental studies, central to contemporary ethnographies, and no longer so embarrassingly overshadowed by generative grammar and semantics, stirred and came to life within psychology a few years ago. Some of the work done since then derived inspiration from both the good points and the mistakes of "A Study in Language and Cognition" and from other work done 20 years ago. I think several genuine advances in knowledge have already been achieved; not just changes, mind you, but changes that will not be turned back. That would have been good news to Eric. Tracing the connections as best I can is the tribute I think most appropriate to a pioneer of science.

A Study in Language and Cognition

Come with me, if you will, to Harvard and M.I.T. in the early 1950's. American linguistics is still structural; Noam Chomsky is a Junior Fellow at Harvard, and we are all unaware of the surprise he is preparing. Skinner's behaviorism is strong at Harvard, George Miller is still interested in communication theory, and there is a lot of excitement in Jerome Bruner's Cognition Project about the work on concept formation that would be described in A Study of Thinking (1956). In 1954, Osgood and Sebeok published their monograph on psycholinguistics, which then took the form of a very promising merger between behavioristic psychology and structural linguistics. A few of us supposed we were psycholinguists, but we were interested most of all by a set of articles written by Benjamin Lee Whorf (reprinted in 1956). Whorf was a man who had been trained in chemical engineering at M.I.T., who worked for 22 years for the Hartford Fire Insurance Company, but who, from 1924 until his death in 1941 (when he was 44 years old), pursued linguistics as an avocation. Whorf's avocation was not casually pursued. He studied Mayan hieroglyphics, modern Aztec (Nahuatl), and, with one native-speaker informant, Hopi. He also took Edward Sapir's famous course in American Indian linguistics at Yale. Cognitive relativism and empiricism were almost axiomatic in those days, but not in the radical form that Whorf introduced. "The categories and types that we isolate from the world of phenomena we do not find there
because they stare every observer in the face; on the contrary the world is presented in a kaleidoscopic flux of impressions which has to be organized in our minds - and that means largely by the linguistic system in our minds. We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are partners to an agreement to organize it in this way - an agreement that holds throughout our speech community and is codified in the patterns of our language." (Whorf, 1956, p. 213) Of course, linguistic relativism had a history, including statements by Sapir and Humboldt, but no one had brought forward such vivid examples as Whorf nor made such bold claims. There is a recurrent metaphor in Whorf's papers: Nature is a formless mass or expanse (etherized upon a table?) which each language with its grammatical categories and its lexicon 'cuts up' or 'dissects' in some arbitrary way. In fact, his most mind-stretching examples all involve inter-language comparisons of grammatical categories, but he did also provide some lexical examples. The most famous of these is: "We have the same word for falling snow, snow on the ground, snow packed hard like ice, slushy snow, wind-driven flying snow - whatever the situation may be. To an Eskimo, this all-inclusive word would be almost unthinkable; he would say that falling snow, slushy snow, and so on, are sensuously and operationally different, different things to contend with; he uses different words for them and for other kinds of snow." (Whorf, 1956, p. 216) The notion of language as a categorical system laid like a grid upon an unformed reality was easily grasped and very congenial to those of us who played a part in the 'categorization' studies in Bruner's group. Eric and I picked a lexical contrast rather than a grammatical one to work with because it looked simpler and because we had neither the means nor the impulse to travel to one of the Indian reservations in the Southwest. We planned to test Whorf's hypothesis within one language: English. How on earth could that be done? There is a prior question. Why test Whorf's ideas at all? He had presented his data. Why was there any need for additional data? Whole conferences mulled over the question of the adequacy of Whorf's data to the conclusions he seemed to draw, but Lenneberg in 1953 really said all that was necessary. Whorf appeared to put forward two hypotheses: 1. Structural differences between language systems will, in general, be paralleled by non-linguistic cognitive differences, of an unspecified sort, in the native speakers of the two languages. 2. The structure of anyone's native language strongly influences or fully determines the world-view he will acquire as he learns the language. The problem with Whorf's data is simply that they are entirely linguistic; he neither collected nor reported any non-linguistic cognitive data and yet all
of his assertions (which seem to reduce to the two hypotheses stated above) imply the existence of non-linguistic cognitive differences. As the case stands, in Whorf's own writings, differences of linguistic structure are said to correspond with differences of a non-linguistic kind, but the only evidence for these latter is the linguistic evidence with which he began. So Whorf seems to have been guilty of circularity, not the vicious kind, but the pleasantly stimulating kind. What his case lacks is non-linguistic cognitive data. This is where Brown and Lenneberg step in. Careful analysis of Whorf's examples of linguistic contrast always shows that the contrast is not absolute. It is never the case that something expressed in Zuni or Hopi or Latin cannot be expressed at all in English. Were it the case, Whorf could not have written his articles as he did entirely in English. The linguistic axiom that anything that can be expressed in one language can be expressed in any other language does seem to hold. Nevertheless, there are differences. What is expressed easily, rapidly, briefly, uniformly, perhaps obligatorily in one language may be expressed in another only by lengthy constructions that vary from one person to another, take time to put together, and are certainly not obligatory. But now this is a condition which is also to be found within one language with respect to different referents. All of which suggests that the notion of a parallel between language and cognition should be susceptible of an intra-cultural test. More easily named or coded referents might be, for instance, more memorable, as measured by recognition, for speakers of English than would less easily coded referents. We allowed ourselves to imagine a universal law relating referent codability to recognition (and perhaps other aspects of cognition), with each different language having its own codability scores for given referents and the speakers of that language having corresponding recognition skills. On this grand scale it seemed a matter of indifference whether the first test were intra-cultural or inter-cultural and, of course, convenience favored the former. What should be the semantic domain? As it has turned out, this was a very consequential decision for reasons we did not know about. We chose the domain of color, and we had two reasons for doing so. 1. The -etic level of description for color, by which we meant a culture-free, finely-differentiated description, already existed. More than a century of psychophysical study of absolute thresholds and difference limens in terms of the dimensions of hue, brightness, and saturation had given us the basic description we needed. Furthermore, a large sample of precision-manufactured color chips intended to be psychologically evenly spaced was already in existence and could be purchased from the Munsell Color Company. We could not afford to buy the full Munsell Book of Color but
someone in the area had a copy and once we had seen that fabulous array I do not think anything could have dissuaded us from using the color domain. Using the Munsell Book as a catalog, we could pick out the particular chips we wanted to order for our study. 2. The cross-cultural files contained very many instances of differences between languages in color lexicon. For instance, the large region of color space that is labeled in English as either blue or green is, in numerous languages, probably the majority of the New World languages, named by a single word; call it 'grue'. Quite often, 'grue' is also the name of the sea. Presumably, the range of hues through which the sea passes has often, by a process of abstraction, come to be named 'grue' wherever any of the colors appeared. It seemed reasonable to suppose that a color transitional between blue and green in English and so not very codable would fall near the center of the range covered by 'grue' and so in that other language be highly codable. We thought we could realize the essence of this inter-cultural contrast in an intra-cultural experiment using colors of varying codability in English.

The Colors

The Munsell system conceives of color as a three-dimensional cylindrical space with hue running around the perimeter, brightness running vertically from various near-blacks at the bottom to various near-whites at the top, and saturation, minimal at the achromatic core of the column, gaining strength with each step outward and maximal on the perimeter (see Figure 1).
[Figure 1. The three-dimensional color space in the Munsell system: hue runs around the perimeter, brightness runs vertically from black to white, and saturation increases outward from the achromatic core.]
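The cylindrical arrangement sketched in Figure 1 can be made concrete with a few lines of code. The sketch below is purely illustrative (the coordinate ranges and the two sample chips are invented, and the real Munsell renotation is far less regular than this); it simply treats hue as an angle around the column, brightness as height, and saturation as distance from the achromatic core.

    import math

    def cylindrical_to_cartesian(hue_deg, brightness, saturation):
        """Map a (hue, brightness, saturation) triple, read off a Munsell-like
        cylinder, to Cartesian coordinates: hue is an angle around the column,
        brightness is the vertical axis, saturation is distance from the core.
        (Illustrative only; not the actual Munsell renotation.)"""
        theta = math.radians(hue_deg)
        return (saturation * math.cos(theta), saturation * math.sin(theta), brightness)

    # A hypothetical highly saturated "best instance" sits on the outer shell ...
    focal_red = cylindrical_to_cartesian(hue_deg=0, brightness=5, saturation=14)
    # ... while a hypothetical washed-out filler chip sits near the achromatic core.
    pale_tan = cylindrical_to_cartesian(hue_deg=60, brightness=7, saturation=3)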
As one leafs through the Book of Color it is apparent that most of the highly codable colors are on the perimeter where one finds the maximally saturated version of each hue. We thought it important to include in our array some colors of very high codability and, therefore, some of high saturation. There were, however, 240 chips at maximum saturation and we could hardly use them all. So we asked five judges to examine a systematic array of 240 colors and pick out the best red, orange, yellow, green, blue, purple, pink, and brown. These terms are the most frequent color terms in English (Thorndike-Lorge, 1944). The agreement among judges was good, considering how closely spaced the Munsell colors are, and it was possible to identify eight chips which were the favorites for the eight names. To these we added another 16, selected so that in conjunction with the original eight the color space should be as evenly sampled as possible. I remember thinking that there was something uncanny about the eight best instances and, indeed, about all the chips in the immediate neighborhood of each. I can see them still, that Good Gulf orange, that Ticonderoga pencil yellow, that blood red; they all of them shine through the years like so many jewels. And the same is not true of the filler colors. Looking them up in the Munsell Book today, I see one that I would call 'French mustard', also a 'tan', an 'avocado', a 'light sky blue', and so on. But they had been long forgotten and do not now have the salience of the others. Eric and I saw clearly the distinctiveness of the best instances in the array of 24, and while we believed that language could cause cognitive effects, I think we were not quite prepared for so strong an effect on perception itself. Nevertheless, we thought of no other reason for the salience of the 'best instances' and so went ahead and mounted one set of the 24 colors in a random arrangement on a single large chart and mounted a duplicate set on small white cards, one chip to a card.

The Names

Twenty-four Harvard and Radcliffe students were first shown the full array of 24 colors and asked to look it over for about five minutes. When the chart had been removed, they were told that the colors would appear one at a time in a tachistoscope and that the S's task was to name each one as quickly as possible. 'Name' was defined as "the word or words one would ordinarily use to describe the color to a friend". In retrospect, one notices an ambiguity in these verbal instructions. Was S to name each chip so that a friend could find the chip in the array, or was he simply to name categorically and approximately, with no particular task in mind? Of course, we had first exposed S to the full array of chips and that action probably suggested
that distinctive names were called for. But it does not do so unequivocally and it is quite possible that some Ss took the task one way and some another. Why did we use these oddly ambiguous naming instructions? I can only guess at the answer, but the guess is worth making in view of later developments. We intended our experiment to be relevant to the theories of an anthropological linguist, specifically Benjamin Whorf. Anthropologists then, and most of them now, do not work with 'subjects,' but with 'informants.' The difference in the two words epitomizes a difference of professional goals. An informant is intended to be a pipeline and is presumed able to transmit the shared, learned life patterns of a community or what is generally called a culture. One may check an informant's statements for internal consistency and reliability and, in some cases, use a second or third informant. Still, it is important to realize that in the Weltanschauung of Whorf and most anthropological linguists it makes sense to think of each referent as having a name or not having one, for all members alike of a linguistic community, and so Lenneberg and I felt some pull on us just to ask that each color be named. However, we were psychologists and we were dealing with subjects, 24 of them. Subjects are not presumed to be informants. They are members of a population and from a sample of members one hopes to be justified in drawing conclusions about the population. But the behavior of subjects always varies and is expected to, and the variation is as important as any constancy found. In fact, we expected that the degree of individual variation in naming different colors would be our best index of codability. The instructions now seem to me a kind of compromise unconsciously reflecting the several orientations of this study: one foot in anthropology, one in psychology, and a couple of others in our mouths. However the subjects may have resolved our ambiguous directions, they produced names which for some colors were single common color words, given quickly, and with little variation. For other colors, the names given were, above all, not the same from subject to subject and not speedily supplied. One subject might use an unqualified color word; others, color terms variously modified (e.g., light green, blue-green, etc.); others, especially girls, used low-frequency terms linked with particular interests, like turquoise, aqua, chartreuse, puce, and so on; others named a chip by naming a familiar exemplar such as lime, avocado, even plum. While name length and reaction time and interpersonal agreement all varied together across chips, the best index was shown to be the degree of individual consensus. We worked out a formula for assessing the degree of consensus on each color, and the greater the consensus the higher the codability was taken to be for that color in our linguistic community.
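The consensus formula itself is not reproduced in this paper, so the following sketch should be read only as a stand-in for the idea: one simple way to index inter-subject agreement is the share of subjects who give a chip its single most common (modal) name. The subject counts and color names below are hypothetical.

    from collections import Counter

    def naming_agreement(names):
        """Share of subjects who gave the single most common name for a chip.

        A stand-in consensus index, not the formula actually used in the
        original study: 1.0 means every subject used the same name; values
        near 1/len(names) mean nearly everyone disagreed."""
        counts = Counter(names)
        return max(counts.values()) / len(names)

    # Hypothetical responses for two chips from 8 subjects each.
    focal_red_names = ["red"] * 7 + ["dark red"]
    filler_names = ["tan", "beige", "light brown", "khaki", "sand",
                    "tan", "camel", "buff"]

    print(naming_agreement(focal_red_names))   # 0.875 -> high codability
    print(naming_agreement(filler_names))      # 0.25  -> low codability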
The Recognition Task
The Ss who were asked to recognize colors were not the same Ss as those from whom the codability data were taken. The general idea was to correlate codability scores from one sample of a language community with recognition scores from another sample of the same population. It was our expectation that codability, a property of individual colors for English speakers, would be directly related to color recognition. We used four recognition conditions of which it is sufficient to describe the two extremes. In the easiest case, only one color was exposed, and recognition was tested after only a seven-second interval. In the most difficult case, four colors were exposed together and there was a three-minute interval during which the S was occupied with irrelevant tasks before recognition. In all four conditions, codability was positively related to recognition accuracy and the relationship was significant in all but the simplest condition. The relation grew larger as the storage problem from exposure to recognition grew more formidable; in the extreme case it was 0.523. Once again, a retrospective reading makes me wonder why we did what we did at one point. The array in which the test colors were to be recognized was not simply the full set of 24 colors as one might have expected. It was a large array of 120 colors, all at maximum saturation and including the 24 test colors, but varying in hue and brightness. Why did we use this large array instead of the smaller array of 24? I suspect once again that it was the gravitational pull of linguistic anthropology on two psychologists. If our Ss in the naming task did, as I think most of them did, understand the directions to call for distinctive names and if the recognition array had been made up of the original 24 colors, then the two tasks, naming and recognizing, would have been very like one another. Even with different Ss doing the two tasks, I think the correlations between codability and recognition would have been much higher - too high, in an odd way. It was fairly obvious that colors hard to code distinctively in a given contrast array would be hard to recognize in the same array. The tasks would have been almost the same. What made Whorf's claims so stimulating was the implication that linguistic structure, an idealized formulation, determined each native speaker's construction of reality in all circumstances and not just in some very carefully circumscribed circumstances. There is one last point about "A Study in Language and Cognition" which has turned out to have been of the greatest importance. In order to think of our intra-cultural task as relevant to Whorf's inter-cultural hypotheses, we had to be making a crucial assumption. And we were. And almost everyone then would have said that it was reasonable. We had to suppose that some other language communities would produce very different codability scores for the same array of 24 colors and that the recognition scores of these other communities would follow their various codability scores. Who would then have doubted it?
The Calm

"A Study in Language and Cognition" caused no great stir. Van de Geer and Frijda at the University of Leyden confirmed our results with color coding (1960) and with a new array - human faces (1960). A few friends took notice, and it became fairly common to hear that the Whorf thesis had been confirmed in its weak form but not in its strong form. Presumably, the weak form was a correlation between linguistic structure and cognition, and the strong form was a causal developmental relation - which had certainly not been tested. Lenneberg published a report of his doctoral research in Behavioral Science (1957) under the title "A Probabilistic Approach to Language Learning". The right context for it did not exist at the time and so it was little noted. The context is right today, and the prescience of the study is difficult to exaggerate. Lenneberg begins by scoring two points against most of the research that had been done under the rubric of concept formation. 1. Subjects are usually in possession of the concepts before the experiment begins. They are told that some stimuli are called, say, 'piffles' and some, 'puffles'. The subject is to learn to sort stimuli accurately into two piles and to define the two classes. In fact, almost always his task is not to learn new concepts but to substitute 'piffles' and 'puffles' for English words or phrases. Lenneberg thought such experiments could not be accurate models of the child's learning of color classes or of any other reference-making terms. 2. In laboratory experiments, concepts are always fully determinate classes: A referent is either a 'piffle' or not a 'piffle'; there is no indeterminacy, no such thing as membership varying in degree. Lenneberg demonstrated that English color terms did not name such determinate classes at all. In this work he used a new array of colors, the Farnsworth-Munsell 100 Hue Test of Color Discrimination, about which I shall say more later. Using every fourth color, he asked 27 Wellesley girls to name each color. The most appropriate, fairly common names for these very unsaturated colors are: pink, brown, rose, tan, lavender, light blue, and light green. The important line to the future is that Lenneberg's data show that these English terms name classes of a sort seldom discussed. His results, plotting hue on the baseline and percentage of girls giving each color name for each
chip, appear as a set of roughly symmetrical curves varying in height and, in range, always overlapping with other curves (see Figure 2). Thus, the curve for blue is tall, rising to 100% at its peak which means there was a color everyone called blue, what is nowadays referred to as an ideal or prototypical blue. From this peak the curve falls off gradually and symmetrically, overlapping on one side with aqua and green and on the other with violet and lavender.
[Figure 2. Frequency distribution of color names elicited from 177 Ss (from Lenneberg, 1957). The abscissa is the stimulus continuum in Farnsworth-Munsell numbers; the ordinate is the percentage of Ss giving each color name.]
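The overlapping naming curves of Figure 2 can be read as graded membership functions, with membership rising toward 1.0 at a prototypical hue and falling off on either side. The sketch below illustrates the idea with invented peak locations and spreads, using a Gaussian curve as a stand-in for the empirical naming distributions; it is not Lenneberg's data, only a picture of graded category membership along a hue continuum.

    import math

    def membership(hue_step, peak, spread):
        """Graded membership of a chip in a color category: 1.0 at the
        prototypical hue, falling off smoothly on either side (a simple
        Gaussian stand-in for the empirical curves; values are invented)."""
        return math.exp(-((hue_step - peak) ** 2) / (2 * spread ** 2))

    # Hypothetical peaks and spreads on the Farnsworth-Munsell hue continuum.
    categories = {"green": (30, 8), "blue": (45, 6), "violet": (60, 7)}

    chip = 38  # a chip lying between the green and blue peaks
    grades = {name: round(membership(chip, *p), 2) for name, p in categories.items()}
    print(grades)  # {'green': 0.61, 'blue': 0.51, 'violet': 0.01}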
Without expanding as yet on the importance of this discovery, let me simply quote Lenneberg: "Concepts are best characterized as areas of waxing and waning typicality on a stimulus continuum" (1957, p. 2). If we change nothing but the terminology, Lenneberg said that colors and probably most reference classes are not proper sets at all, not sets in which membership is an all-or-none matter. They might be called, and now have been called (Zadeh, 1965; Zadeh, 1971; Kay and McDaniel, 1975), 'fuzzy sets' and there might be, and now is, a fuzzy-set theory in which the membership of any individual x would be expressed as a number between 0 and 1. The demonstration that color categories are fuzzy sets was not Lenneberg's chief point in this article. He was primarily interested in how learnable various sorts of fuzzy sets might be. His method was to teach six versions of a nonsense color lexicon to different groups of Radcliffe girls. Well, actually, the words were not nonsense, they were the Zuni color terms: hek, sossona, ashena, and lhiana. However, they were not used as the Zuni
used them; Lenneberg located them on the hue dimension and assigned them the usage probabilities he was interested in. A subject was supposed to be learning a color term from an exotic language by listening to Lenneberg (as stand-in for members of the imaginary language community) name each color chip a number of times and perhaps with a variety of words. There was a control language which used the Zuni words exactly in the manner of English brown, green, blue, and pink and then there were five variant imaginary languages. The results are too complex to review in detail. As would be expected, there was much evidence in the truly invented lexicons of negative interference from English. Perhaps the most important result is the most obvious: Sets or concepts were, in general, easy to learn insofar as they were determinate, and so approached being proper sets, and they were difficult to learn insofar as they were indeterminate or fuzzy. Which is perhaps one reason why academics spend so much time trying to make proper sets out of naturally evolving fuzzy sets, such as those called Psychology or Anthropology or Jew or Liberal or Schizophrenic or just about any set that evolves without the special constraints that operate in logic and mathematics and parts of science. In 1956, Eric Lenneberg teamed up with the anthropologist John Roberts to write the monograph "The Language of Experience: A Study in Methodology." This is largely a manual of procedures for the anthropologist, in the field, who would map the color terms of the native language on a standard color space. Without going into details, it is recommended that the field worker use as his standard domain a large representative sample of chips from the Munsell color space. Next, he should elicit as full a list as possible of color terms without exposing any actual colors at all. The long list he is sure to obtain is then to be reduced by excluding terms not known by nearly everyone or used with no clear consistency or unsatisfactory with respect to one or another purely linguistic criterion. Most prophetically, the authors write: "... the color terminologies of most languages will not greatly exceed ten or a dozen terms" (1957, p. 17). Berlin and Kay in their splendid study of the universality and evolution of color terms (1969) give the number eleven as the upper value identifying languages with the most highly developed color lexicons. Lenneberg and Roberts go on to recommend that the full set of colors be systematically mounted on charts and that the charts be covered by clear acetate sheets. Each informant is then invited to represent his use of the color words, one at a time, by using a soft china marker to map the full domain of the term. He is also asked to mark with an 'X' the one color chip which seems to him most typical of the term in question. In short, it was
always evident from "A Study in Language and Cognition" through Lenneberg's thesis on the learning of color terms to this monograph on the "Language of Experience" that color concepts were not proper sets at all, but included small prototypical or focal color areas and also larger areas of color chips less certainly falling under the dominion of a given term. The Lenneberg and Roberts procedure was an improvement of some magnitude on the older procedure in which the field linguist simply tried to identify the local translation-equivalents, if any, for the color terms in his native European language. The new procedure, including the Munsell colors, the acetate sheets, and the identification of a focal instance, as well as a boundary, became the method of the very fruitful work in later years of Berlin and Kay (1969) and of Eleanor Heider (now Eleanor Rosch) (1972a, 1972b, and others). Lenneberg and Roberts even noted that there was a surprising amount of agreement on the foci of color terms and very little agreement on the boundaries of the terms. The time was not right for them to take the powerful next step and think of the possibility that the foci might conceivably be human universals though the boundaries were not. Still under the spell of Whorfian relativism, The Language of Experience reported that for monolingual Zunis there was no lexical distinction between orange and yellow as there is in English. One notes in passing that differentiating orange from yellow has since turned out to be one of the last steps in the evolution of color terms in a language. Lenneberg and Roberts took advantage of their field experience to make a small cross-cultural test of the relation found intra-culturally between codability and recognition in "A Study in Language and Cognition." They report: "... not a single monolingual Zuni recognized correctly either orange or yellow, thus completely bearing out the expectations based on this hypothesis" (1957, p. 31). You may be sure that Brown and Lenneberg, collectively, treasured this modest result since they had begun to doubt that "A Study in Language and Cognition" meant quite what they thought it meant. In 1958, Brown, from his Cambridge armchair, wrote a paper called "How Shall a Thing be Called?" which has turned out to have a role in the contemporary research enthusiasm for linguistic reference. The argument begins with the observation that though we are accustomed to think of each individual entity in the world as having a name, just one, it is always, in fact, the case that any individual (person, thing, action, whatever) can be correctly named in many different ways. "The dime in my pocket is not only a dime. It is also money, a metal object, a thing, and, moving to subordinates, a 1952 dime, in fact a particular 1952 dime with a unique pattern of scratches, discolorations, and smooth places" (p. 14). To a philosopher this is very obvious stuff. All it means is that any individual entity can be
conceived of as a member of a variety of extensions (or sets or classes) and in the act of naming it, we reveal the intension (or set-naming term) we have in mind. Happily, the article said a bit more than that. It suggested that for most individuals we have a feeling that one name from all the possible names is the name, the name truest to its nature. Sometimes the name is very generic as with spoon or dime (as opposed to, for instance, metal artifacts). Sometimes the name is very specific as in the proper names of persons. Sometimes the name is very specific for one community and general for another. The dog that is just a dog to most of the world is Prince to the family that owns him. If there is, indeed, something like a level of truest classification for every referent in every community, what determines that level? To me, it seemed that the truest name was the one that assigned the individual referent to its level of 'usual utility'. What did I mean by that rather opaque expression? You will notice that the individual called a dime is normally exactly equivalent, for an endless set of economic exchanges, with all the members of the extension or set that can be called dimes. A dime and a spoon have almost no functions in common; they are equivalent for almost no conceivable purpose, and so we cannot feel that metal artifact is their true name. Similarly, persons must have proper names since there are always some people for whom they are unique. The dog Prince has no equivalents for his loving family who house, feed, bathe, and play with him, but for much of the world he is exactly equivalent to all other individuals in the set called dog. Getting up from his armchair in order to eavesdrop on the nursery, Brown thought he noticed a couple of other things. Mothers, in naming things for children, seemed sometimes to take the child's point of view and name from his level of presumed usual utility. Thus, for a very young child, a dime might well be called money in that it is, like all money, something not to be put in the mouth, but also not to be thrown away. Give the child a few years so that he can go out to buy Dad a newspaper, and money gives way to dime, nickel, penny, etc. Parents did not always take the child's point of view, however, but named a pineapple a pineapple, not fruit. Why? Well, probably just because one feels that is what it truly is. The truest name can shift with age sometimes, but it is not only the truest name that is given to the developing child. Eventually, he gets a variety of appropriate possible names. Sometimes, the first name given is fairly abstract like dog (not to be petted) and flower (to be sniffed, but not picked) and car (to watch out for). Later, names are then mostly more concrete in the sense of being the names of subordinate classes - the whole welter of poodles, schnauzers, and chihuahuas, etc., and of bougainvilleas, rhododendrons, camellias, etc., and of Cadillacs, Mercedeses, Alfa-Romeos, etc. Sometimes
the first name for an individual learned by a child is very concrete whereas later names are superordinates and in that sense more abstract: John Simon, then a critic, then a journalist, or should it be author? From observations of this kind, it follows that the developmental course of names for a designated individual cannot be described as having a uniform direction either from the more concrete to the more abstract or vice versa. What rather seems to be the case is that the first name is the truest name, the name of usual utility, and later names may be either increasingly abstract, increasingly remote, or either. What is, in any case, true is that the assignment of names to referents of all kinds entails the assignment of a particular texture to the world, fine-grained at some points and coarse-grained at others. Brown, still something of a linguistic determinist, was inclined to think that the names given the child fixed the texture of his world in its various parts, but, of course, that need not be so. The language may largely reflect the given structure of reality (Rosch et al., 1975). One important clarification: "How Shall a Thing be Called?" speaks of the direction of development of equally appropriate names for a given individual entity. It is the referent that is fixed in the discussion of the direction of development in this paper. There is quite another question to be asked about the development of reference-making terms. Suppose we hold the term constant, say dog, and ask whether the direction of development of that term is toward increasing abstraction or toward increasing concreteness. When the child first uses dog he may use it too generally; perhaps as a term for any four-legged animal. That is a very abstract usage of dog in the sense that it applies to too large an extension and follows an intension with too few attributes. As the child grows older, he may refine his use of dog so as to exclude cats and guinea pigs and rabbits and others. His intension has grown more elaborate and so more concrete as his extension for dog has narrowed. One might guess that his intension now includes a certain range of sizes, a range of barking-type noises, and so on. In this case, with the name held constant, we would say that the meaning of a fixed term had become increasingly concrete as defining features were added. If the development went the other way and features were eliminated to embrace a larger extension, we should say the meaning was growing more abstract. Probably it is development with the term held constant that people most often have in mind when they speak of semantic development. But in which direction does it move? Nowadays young researchers are brilliantly and assiduously exploring and extending all the questions raised in "How Shall a Thing be Called?" There are data on how parents name for children and what names children
know and how the development progresses with referent held constant (Anglin, 1976). There are data on how meaning and reference develop with the name held constant and also a strong empirical generalization (Clark, 1973). Eleanor Rosch Heider has improved out of all recognition the idea of a truest name and replaced it with what she calls the ‘basic object’, and ‘usual utility’ has been transformed into ‘attributes’ and ‘motor movements’, and Heider has argued that there is plenty of structure in the physical world and that this is more likely to affect language than the other way around.
Color Naming: Confusion and Resolution
This section concerns only “A Study in Language and Cognition” and the later studies that took it as their point of departure. Variations were introduced in all the major variables - the color array, codability, and the nonlinguistic cognitive task. These changes in various combinations define all the principal experiments. It may be best not to begin by reviewing these experiments in chronological order since their results are, on the face of it, quite chaotic. If, however, we first familiarize ourselves with the major changes rung on the variables, we may then review the experiments and share the thinking that determined their succession.
Major Color Arrays
The array we already know well is the Brown-Lenneberg array (and we shall call it by that name) which is made up of 24 highly saturated colors (from the outer shell of the Munsell space) including eight colors identified as "best instances" of eight English color terms. Lenneberg's study of the learning of color terms has already introduced a second array: the Farnsworth-Munsell 100 Hue Test of Hue Discrimination (hereafter the Farnsworth-Munsell array). These colors are from inside the Munsell color space. They vary only in hue with brightness at a moderately low level and saturation very low. These 100 hues are spaced in perceptually equidistant steps, and they come mounted in small black caps of about the size of the top to a bottle of Jergen's lotion. To convey the quality of all these 100 hues, one thinks of words like 'washed out' or 'insipid' or, to speak correctly, 'low saturation'. There is not a 'good' instance of any color in the lot. Berlin and Kay (1969), for their study of the cultural 'universality' of colors, used 320 chips, all at maximum saturation, but representing 40 hues and 8 degrees of brightness. To these they added 9 achromatic hues (shades
of white, black, and grey). As Berlin and Kay note, this array is identical with that used by Lenneberg and Roberts in their classical study of Zuni color terminology. We shall call this large array of maximally saturated Munsell chips the Berlin-Kay array. Several other color arrays have been used in this literature, but none of them, I think, more than once. In addition, the remaining arrays were so carefully tailored to particular experimental hypotheses that we shall have to describe them when their time comes and there is no point in describing them first here. In sum, three different color arrays have been used repeatedly in color-naming research - the Brown-Lenneberg, Farnsworth-Munsell, and Berlin-Kay arrays. Others have been used but just once and with a purpose so particular that their nature is best described in connection with the experiment.

Major Naming Tasks
Two naming tasks have already been described. The Brown-Lenneberg task asked Ss to name individual colors after they had had ample time to examine the full array, "to name it as they would to describe it to a friend". From these data, an index of 'Codability', as we shall call it, was derived which was responsive to the degree of inter-subject agreement. The Lenneberg and Roberts procedure separates, as Codability does not, the process of eliciting basic terms from the application of these terms to actual colors. So also does the closely related method of Berlin and Kay. These latter authors are more explicit and more consistently linguistic in the procedures they use for reducing the original, always very large, array to true color terms, never more than 11 in number. For instance, a basic term must be 'monolexemic', which means its meaning cannot be derived from its parts. Its extension must not be entirely embraced by the larger extension of some other term. Recent foreign loan words are suspect and so are object-naming terms like adz or lime. And there are additional criteria. In the application of basic terms to a large color array (320 and 329), Lenneberg and Roberts and Berlin and Kay proceeded identically. On clear acetate sheets placed over the color array, an informant was asked to indicate all chips which under any conditions might be called by term 'X' and, in addition, he was asked to mark out the best, the most typical examples of 'X'. This procedure we will call the 'Mapping Procedure'. There is one other naming task that has been used often enough to warrant definition here. It was intended to be an improvement on the original Codability procedure and was first used by Lantz and Stefflre. You may recall that I found it puzzling in our Codability procedure that Lenneberg
and I had used the slightly ambiguous direction "to name a color as you would in describing it to a friend". Since the full array had been seen, the verbal directions probably suggested that distinctive names were in order, but we did not explicitly say so. The measure I will here call 'Communicability' made it perfectly clear that distinctive names were required. Two groups of subjects were used: 'encoders' and 'decoders'. The directions to encoders began: "I'm going to show you some colors, one at a time, through this opening, and I'd like you to name each color using the word or words you would use to name it to a friend so that he or she could pick it out" (Lantz and Stefflre, 1964). The encoders looked first at the array of colors to be named so that they knew the difficulty of the task. In addition, however, there was a second group of Ss, decoders, to each of whom E read 4 messages from each encoder's 20 messages, or 80 messages in all, randomly selected for each decoder. The task? "I'm going to read some color names to you, one at a time. After I say a name, look at the colors in front of you and point to the color that seems to be the color that name refers to" (p. 476). The Communicability score of a color was simply the mean error score in decoding that chip. There are other naming tasks to be described, but since none of them is used more than once, I will reserve the description for my account of the experiment.
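As a rough sketch of the scoring idea (Lantz and Stefflre's exact error metric is not reproduced here), a chip's Communicability can be summarized from the decoders' choices: the smaller the share of decoders who point to the wrong chip when given an encoder's name, the more accurately that chip is communicated. The data below are hypothetical.

    def decoding_error_rate(decoder_choices, target_chip):
        """Share of decoders who, given an encoder's name for the target chip,
        pointed to some other chip. A lower error rate means the chip was
        communicated more accurately. (A sketch of the scoring idea only;
        not Lantz and Stefflre's exact error metric.)"""
        errors = sum(1 for choice in decoder_choices if choice != target_chip)
        return errors / len(decoder_choices)

    # Hypothetical data: ten decoders heard encoders' names for chip 17 and
    # pointed at the chip in the array they thought was meant.
    choices_for_chip_17 = [17, 17, 17, 12, 17, 17, 19, 17, 17, 17]
    print(decoding_error_rate(choices_for_chip_17, target_chip=17))  # 0.2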
Major Cognitive Tasks
Only one of these has thus far been described: Recognition. In "A Study in Language and Cognition" there were four versions of the Recognition task, varying the number of colors exposed and the length of the post-exposure retention interval. The single term 'Recognition' will serve for all procedures of this class. In some cases, however, it will be important to specify intervals and also the nature of the color arrays in which a test chip has to be recognized. Heider and Olivier had the Dani (a Stone-Age agricultural people of Indonesian New Guinea) name each of 40 colors and also recognize exactly the same 40 colors. As a result, they had two 40 × 40 matrices, one for recognition and one for naming. In the recognition array, for example, a cell defined by the conjunction of colors 'i' and 'j' would contain the number of times that color 'j' had been mistakenly recognized as (or confused with) color 'i'. In the naming matrix, an 'i, j' cell would contain the number of Ss who called chips 'i' and 'j' by the same name. From these two 'confusion matrices', as they are called, it was possible to do a multidimensional scaling analysis. The two analyses yielded three-dimensional spaces defining, respectively, the naming data and the recognition data. Scaling, as I shall call it, is really a distinctive way of operating on data which, as such, are obtained by the familiar naming and Recognition procedures.
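To make the procedure concrete, here is a minimal sketch of how such matrices can be built and scaled. The co-naming count and the classical (Torgerson) multidimensional scaling shown below are standard techniques, not necessarily the exact variants Heider and Olivier used, and the data-handling details (the namings structure, the conversion of counts to dissimilarities) are assumptions for illustration.

    import numpy as np

    def naming_confusion_matrix(namings, n_chips):
        """Cell (i, j) counts the subjects who gave chips i and j the same name.
        `namings` is assumed to be a list with one entry per subject, each a
        sequence of length n_chips giving that subject's name for every chip."""
        m = np.zeros((n_chips, n_chips))
        for subject_names in namings:
            for i in range(n_chips):
                for j in range(n_chips):
                    if i != j and subject_names[i] == subject_names[j]:
                        m[i, j] += 1
        return m

    def classical_mds(dissimilarity, n_dims=3):
        """Classical (Torgerson) MDS: turn a symmetric dissimilarity matrix into
        a low-dimensional configuration, one row of coordinates per chip."""
        d2 = np.asarray(dissimilarity, dtype=float) ** 2
        n = d2.shape[0]
        center = np.eye(n) - np.ones((n, n)) / n      # centering matrix
        b = -0.5 * center @ d2 @ center               # double-centered Gram matrix
        eigvals, eigvecs = np.linalg.eigh(b)
        order = np.argsort(eigvals)[::-1][:n_dims]    # keep the largest eigenvalues
        return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

    # Hypothetical use: more co-naming means more similar, hence less dissimilar.
    # confusions = naming_confusion_matrix(namings, n_chips=40)
    # dissim = confusions.max() - confusions
    # np.fill_diagonal(dissim, 0.0)               # a chip is not dissimilar to itself
    # coords = classical_mds(dissim, n_dims=3)    # one point per chip in 3-D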
The Critical Experiments

The research done on color naming and cognition does not simply consist in all logically possible combinations of the color arrays, naming procedures, and cognitive tasks described above. There was more direction than that in the course of events.

Burnham and Clark, 1955
This study was only concerned with hue Recognition, not with naming. Burnham and Clark used the Farnsworth-Munsell array of unsaturated, washed-out colors and exposed one chip at a time for 5 seconds and after a 5-second interval asked to have it identified in an array. It was Lenneberg who related the Burnham and Clark recognition data to naming data. His report of his thesis research in which 27 Wellesley girls named a sample of chips from the Farnsworth-Munsell array provided the link. Lining up his naming curves with the Burnham and Clark recognition curves, Lenneberg saw that there was a relation, but not of the sort discovered in "A Study in Language and Cognition". In his words: "... while Brown and Lenneberg found a positive correlation between naming and recognition, we are now confronted with a very obvious negative relationship between the two variables" (p. 377). To cite the most deplorable example, the color chip most accurately remembered in the Burnham and Clark study was the color with the most uncertain name in English. So Lenneberg had managed to uncover what looked like a decisive disconfirmation of our earlier work.

Lenneberg, 1961
In this paper, Lenneberg described the contradiction noted above and then undertook to reconcile the two sets of results. The disagreement was surely a consequence of the difference between the two color arrays used: the Brown-Lenneberg array with its maximally saturated hues, including the 'best instances' of eight color terms, and the unsaturated Farnsworth-Munsell hues which included no really good instances of any color terms. In the Farnsworth-Munsell array, it would be worse than useless to code any chip as green since there were very many greens, some more typical than others. The most distinctive points in the naming curves for this array were the
points of intersection of two names; green-blue would be a more distinctive coding than either green or blue. Eric put it this way: "This paradox is solved if we abandon the notion that it is merely 'high codability' that facilitates recognition and assume, instead, that it is our habit of structuring color material semantically which provides a number of anchoring points in a recognition task" (p. 378). That seems to have been a pretty good synthesis but, in retrospect, there is reason to think that the inclusion of highly saturated focal colors in one array and their total exclusion from the other may have been a more important consideration.

Lantz and Stefflre, 1964

Volney Stefflre came out of the West to become a graduate student and, later, a Junior Fellow at Harvard, and he brought an intensive enthusiasm for the work Lenneberg and I were doing. Except that he thought we were using the wrong operation for Codability, and he teamed up with DeLee Lantz, another Harvard student, to write a paper called "Language and Cognition Revisited" which made a powerful case. Not naming agreement, but 'communication accuracy', or, as I have called it, 'Communicability' was the measure that would show what language can do to Recognition. The contradiction between the Burnham and Clark data and the Brown and Lenneberg data was on the record by this time and Lantz and Stefflre designed a neat experiment to resolve the contradiction more effectively than Lenneberg's 'anchoring points' principle had done. Lantz and Stefflre used the two color arrays that had thus far been involved: the Farnsworth-Munsell and the Brown-Lenneberg sets. They proceeded to get both Codability and Communicability scores for all colors in both arrays and to obtain Recognition data, at three levels of difficulty, again for all colors. Their prediction was straightforward: whereas Codability scores should be positively related to the Brown-Lenneberg array and negatively to the Burnham-Clark array, Communicability scores should prove to be related positively to both. Their expectations were richly fulfilled. Consider only the Recognition task of intermediate difficulty which involved four colors exposed for five minutes and with a five-minute delay. Communicability correlated with Recognition for the Brown-Lenneberg array at the level +0.86 and with the Farnsworth-Munsell array at the level +0.71. Codability correlated with Recognition for the Brown-Lenneberg array at the level +0.40 and with Recognition for the Farnsworth-Munsell array at the level -0.05. All other results were comparable, and most correlations were significant at the 0.001 level. The Communicability scores for the Brown-Lenneberg array actually correlated more highly with Recognition scores than the original Codability scores had done.
Lantz and Stefflre certainly had made their case for Communicability and it was soon bolstered with additional data. Stefflre et al. provided a cross-cultural replication (Spanish and Yucatec) in Yucatan in 1966. Lantz and Lenneberg, also in 1966, reported a study of color Communicability and Recognition in deaf children and adults as well as in normals. This was Lenneberg's first use of the Communicability index and presumably indicates that he had accepted its superiority to Codability. All correlations were positive and of good size. There was an incidental effect of considerable interest that remains unexplained. The deaf Ss, both children and adult, distributed their Recognition errors over different loci in the color spectrum than did the hearing Ss. Koen (1966) found that Communicability predicted recognition for arrays other than color. Probably it is a correct general principle that for a given language community the Communicability score of a referent in an array is highly correlated with the memorability of that same referent in that same array. The principle seems to me to be established beyond reasonable doubt and to be important. There is, however, a certain irony in the outcome. "A Study in Language and Cognition" set out, rather ineptly, to relate a structural fact about a language to a cognitive process in members of the linguistic community, but that is not really the sort of principle which the research tradition has ultimately discovered. A fact about language structure would be a fact about culture, and one thing that the various definitions of culture agree about is that culture is shared knowledge or behavior. Not every sort of behavior is cultural. Whorf, writing about Hopi or Navaho or English, intended to describe features of an idealized construction, a language, which is one system within culture. He used single informants and showed no interest in individual differences. The Codability measure was neither fish nor fowl, being somewhere between a cultural fact and an index of individual differences. Communicability is very clearly a psychological and not a cultural variable. It is directly concerned with individual differences. One person might very well be a highly skilled encoder (or decoder) for a given domain and another very unskilled, though both have the same native language. Communicability scores of individual colors would depend partly on the language, but also on salient objects in the environment. The most reasonable way to think of the relation between Communicability and Recognition is as a relation between Communicability between persons in a community and Communicability within one person over time (Recognition). As Lantz and Stefflre suggest, memory can be thought of as the encoding of a referent for the purpose of recovering it after a period of time. This is not to depreciate the importance of the generalization, but only to say that it is an entirely psychological generalization.
Heider and Olivier, 1972

The work of Heider and Olivier is an example of an anthropological or cultural investigation. Their Ss were, in all their work, drawn from two very unlike cultures: the American and the Dani. The Dani, as I have said, are a Stone Age agricultural people of Indonesian New Guinea. Heider and Olivier used a small array of just 40 Munsell chips, composed of 4 brightnesses and 10 hues, all at rather low saturation. This is the only array other than the Farnsworth-Munsell that completely excludes highly saturated colors. Heider and Olivier could not use the Farnsworth-Munsell set because it does not vary brightness, and there was reason to believe that brightness was a major dimension of Dani color terms.

Each S was asked to perform two tasks. 1. To name each of the 40 chips (no indication being given that discrimination was important). The Dani, for instance, were told: "This here". (E points to a chip.) "What is it called?" And U.S. subjects were just told "to give a name" to each chip. 2. In Recognition, every S was shown a single test chip for 5 seconds, waited 30 seconds, and then was asked to pick out the one he had seen from the array of 40.

But now the question asked of the data was not how well naming predicts Recognition; it was a quite new question. The Scaling procedure was used on the two 40 × 40 matrices to determine how similar were the memory structure of colors and the naming structure of colors. It was really a question about the structures of two cultural systems. There was good reason to anticipate differences since earlier work by K. Heider (1970) had found the Dani color lexicon limited to just two terms: mili for dark and 'cold' hues; mola for the complementary set of bright and 'warm' colors. Of course, individual Dani found ways of designating other colors, but only mili and mola were used by all.

Both the naming and recognition data yielded structures roughly cylindrical in form with hue, in effect, wrapped around each achromatic core. Apart from the cylindrical characteristic, the naming structures were quite unlike. The Dani scaling 'belled out' at either end where almost all colors were called, respectively, mili or mola. Intermediate chips were widely separated, presumably because Ss differed on the precise boundaries of mili and mola. The U.S. naming structure was the kind of rather even column to which we have become accustomed from work with Munsell colors. Here, then, was a cultural difference, a difference of language structure, very much the sort of thing Whorf had in mind.

In addition, however, Heider and Olivier produced 3-dimensional structures for non-verbal recognition from Dani data and from U.S. data. The results are strong and portentous. The Recognition scales were not different from one
another in form, though the Dani were generally less successful at the Recognition task. It is a corollary of these facts that the differences in naming structure for the two communities were not paralleled by differences of cognitive structure.

Heider and Olivier carried out one further task which constitutes a beautiful test of a Whorfian idea. They selected hues immediately adjacent to one another in color space, matched for two dimensions and differing in only one. In some cases both members of a hue pair lay on the same side of a color name (whether English or Dani) and in some cases the two hues lay on opposite sides of a lexical line and so in different color categories. An excellent task, I think, for testing the simplest case of the Whorfian hypothesis: Is recognition better for perceptually adjacent hues when a language line separates them than it is when both lie on the same side of a language line? There was no difference.

I think Whorf would agree that these two cultural studies comparing speakers of Dani with speakers of English constituted the most representative operationalization of his ideas to date. He would also have had to agree that they did not support his ideas, not even in their so-called 'weak form'.
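The analysis can be pictured concretely: the naming responses yield a 40 × 40 matrix of how often two chips receive the same name, the recognition errors yield a 40 × 40 confusion matrix, and each matrix is submitted to multidimensional scaling so that the two resulting configurations can be compared. The sketch below is only a schematic illustration under those assumptions, with invented data; it uses scikit-learn's MDS rather than whatever scaling program Heider and Olivier actually employed.

```python
# Minimal sketch of the naming-vs-memory comparison, with made-up data;
# not Heider and Olivier's matrices or their exact scaling method.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
n_chips = 40

# Hypothetical naming data: the name each of 20 subjects gives each chip.
names = rng.integers(0, 4, size=(20, n_chips))
# Co-naming similarity: proportion of subjects giving chips i and j the same name.
naming_sim = (names[:, :, None] == names[:, None, :]).mean(axis=0)

# Hypothetical recognition data: confusion counts (chip shown x chip chosen).
confusions = rng.integers(0, 5, size=(n_chips, n_chips)).astype(float)
confusions = (confusions + confusions.T) / 2          # symmetrize
memory_sim = confusions / confusions.max()

def scale(similarity):
    """Turn a similarity matrix into a 3-D configuration via metric MDS."""
    dissimilarity = 1.0 - similarity
    np.fill_diagonal(dissimilarity, 0.0)
    return MDS(n_components=3, dissimilarity="precomputed",
               random_state=0).fit_transform(dissimilarity)

naming_structure = scale(naming_sim)   # 40 x 3 coordinates
memory_structure = scale(memory_sim)   # 40 x 3 coordinates
# The two configurations can then be compared, e.g., after Procrustes alignment.
```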
Berlin and Kay, 1969

Here we have an example of a powerful payoff from ethnoscience and so a demonstration that important results can be achieved even when one has no good way of identifying the right place to begin nor any idea what a complete theory might look like. The important thing is to begin - somewhere. Empirical work often picks up theoretical direction after it has been started.

Berlin and Kay, of course, used the Berlin-Kay array of 329 colors, all at maximum saturation, but varying in hue and brightness. They also used the Mapping Procedure, first eliciting basic color terms and then having them mapped on clear acetate sheets, with the 'best instances' or 'focal' colors indicated for each lexical item. In fact, just the procedure for which Lenneberg and Roberts wrote a kind of 'how to do it' monograph. But with a formidable expansion of scope. Berlin and Kay worked with informants from 20 genetically diverse languages; for example: Arabic (Lebanon), Cantonese (China), Hebrew (Israel), Hungarian (Hungary), Ibibio (Nigeria), Thai (Thailand), and 14 others. In general, they had single informants for each language, and these were individuals residing in the San Francisco Bay area. In the case of one language, however, Tzeltal (a Mayan language of southern Mexico), they had 40 informants.
Berlin and Kay, you see, had begun to suspect that the domain of color was not an 'etherized patient' infinitely accepting of each language's arbitrary 'dissections'. They thought there might be a universal aspect to color, not in the marginal areas, but in the location of foci. Remember the point Lenneberg and Roberts made about the Zuni - very stable focal regions, but much variability in the margins. Of course, not all of the 20 languages had the same number of color terms. They ranged from a maximum of 11 (like English) to a minimum of 2 (like Dani), and one could only ask for as many focal areas as there were color terms. When Berlin and Kay had boundaries and focal areas for their 20 languages, they superimposed the acetate sheets. Even inspecting the resultant composite by eye, it is clear that the focal areas are approximately invariant or universal. This is far from the case for the marginal boundaries. One especially telling result concerns the speakers of Tzeltal, of whom there were 40. The inter-individual differences for speakers of one language were actually not greater than the inter-individual differences for individuals speaking different languages.

Once direct study of color reference in a sample of the languages of the world made it clear that focal colors were approximately universal, it was possible to make some use of the much less exact data scattered through the anthropological literature on color terms in various languages. It is not unreasonable to suppose that the European anthropologist got his color lexicon for the native language by noting the words applied to the various focal areas, which he carried in his head. The imaged focal areas could function as a standard set of color chips. Berlin and Kay expanded their basic sample (of 20 languages directly studied) to a total of 98 languages by including written reports on comparative color terminology. This larger sample was especially useful for working out the second great claim of the study: a universal evolutionary order among color terms.

The observation crucial to an evolutionary sequence is the detection of cumulative scaling in color terms. One might notice, for instance, that every language having a translation-equivalent of red had also terms for black and white. How could one know that the foreign words in question were, in fact, properly glossed as English white, black, and red? Ideally, from the fact that the terms embraced the respectively appropriate focal chips. But where there had been no use of color chips, one could reasonably assume that the ethnographer arrived at his translations by noting that the foreign terms applied to the focal colors as he knew them.

The color terms in languages revealed a consistent cumulative scale. This appears if foci are arranged from left to right in the order of most commonly appearing items to least commonly appearing. Then every language should
have terms for all the foci to the left of its rightmost term. In fact, the terms of the 20 languages directly studied and of the full sample of 98 languages did prove to be distributed like a cumulative scale. The most natural interpretation to make of such data is that priority or commonality on the scale represents historical priority in the evolution of language, and if that is correct, it would appear that languages have added color terms in a fixed universal order. The proposed order is pictured in Figure 3.

Figure 3. The Berlin and Kay Proposed Universal Order of Evolution of Color Terms
[white, black] → [red] → [green] or [yellow] → [yellow] or [green] → [blue] → [brown] → [four further, mutually unordered terms]

A dichotomous system, like that of the Dani, using only white and black is the most primitive, with red being the next addition and then either yellow or green, followed by whichever of the two has not evolved, then blue and brown, and finally a set of 4 that cannot be ordered among themselves. This partially ordered sequence is offered by Berlin and Kay as a universal sequence of linguistic evolution. From the extreme relativism of Whorf and the anthropologists of his day, we have come to an extreme cultural universality and presumptive nativism. Lenneberg, after his several postdoctoral years which were spent studying neurology as a Russell Sage Fellow, became much more of a nativist than he had been at the time of his first papers, and I suspect he found this Berlin and Kay result congenial and convincing.
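The cumulative-scale claim can be stated as a simple check: rank the focal categories from most to least commonly named, and verify that every language's inventory contains all categories up to and including its 'highest' term. The sketch below illustrates that check with invented inventories and a simplified, unbranched ordering; it is not Berlin and Kay's data or procedure.

```python
# Illustrative check for a cumulative (Guttman-type) scale of color terms.
# The order is a simplified, unbranched version of Figure 3, and the language
# inventories are invented examples, not Berlin and Kay's data.
ORDER = ["white", "black", "red", "green", "yellow", "blue", "brown"]

def is_cumulative(inventory, order=ORDER):
    """A language fits the scale if, for every term it has, it also has every
    term earlier in the order."""
    ranks = {term: i for i, term in enumerate(order)}
    present = [term in inventory for term in order]
    highest = max((ranks[t] for t in inventory if t in ranks), default=-1)
    return all(present[: highest + 1])

languages = {
    "A": {"white", "black"},                                    # two-term system
    "B": {"white", "black", "red"},
    "C": {"white", "black", "red", "green", "yellow", "blue"},
    "D": {"white", "black", "blue"},                             # violates the scale
}
for name, terms in languages.items():
    print(name, is_cumulative(terms))
```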
Heider, 1972b
It was left to Eleanor Rosch Heider to make a retrospective analysis of "A Study in Language and Cognition" in the light of the new data on universal focal colors and show what that old study really meant. In effect, she would turn it on its head. Just two of the four experiments in her article are needed to undo "A Study in Language and Cognition". Perhaps you recall that I said our interpretation of the study presupposed the existence of languages in which the 24 test colors would have very different Codability scores than they had in English. Heider, no longer prepared to assume this, undertook to determine the Codability scores of colors, comparable to those we had used, in a variety of languages. Her color array included 8 focal colors like those that had turned out to be most highly codable in our study.
Heider’s array was filled out with ‘internominal’ colors chosen from the centers of arrays having no clear focal colors and also a set of ‘boundary’ colors lying between areas. Heider could not use as her index of Codability the measure of inter-subject agreement that we had chiefly relied upon because there were not sufficient numbers available of native speakers of all the languages she wanted to use. Therefore, she used two other indices: length of name and latency of response. In “A Study of Language and Cognition” we had also used these indices and found them so highly correlated with inter-subject agreement that they could have been used as alternatives. I see no objections whatever to Heider’s procedure in this respect. The study included speakers of 7 Indo-European languages, 7 Austronesian languages, 4 Sino-Tibetan languages, 4 Afro-Asiatic languages, and 3 others, of which each was the sole representative of yet another language family. The results are not reported language-by-language but, rather, in terms of mean Codability scores across speakers and languages. And Codability in this study meant either length of name or latency of response. Comparing mean ‘Codability’ scores for the focal colors with mean scores for the two kinds of filler colors, it is the case that the focal colors had significantly better Codability scores on the average than either ‘internominal’ or ‘boundary’ colors. Since data for individual languages are not reported, we do not positively know that there was none in which the focal colors had a lower Codability than the latter two kinds of filler colors, but clearly the average trend was strongly the other way. In short, the evidence favors the proposition that in all languages focal colors will have the best Codability scores. English now seems not one arbitrary color lexicon among an infinite set of possible ‘dissections’ of the spectrum, but rather one lexicon which, like the lexicons of all languages, is partly determined by the universal set of focal colors. Can it be right then to interpret the correlation we found between Codability and Recognition as an example of the determining power of language? Heider did a second experiment to settle this question. She used the color array described just above, an array made up of 8 focal colors and filler colors of two kinds. The task was Recognition; one chip at a time, exposed for 5 seconds and, following a 30 second interval, to be identified in an array of 160 colors, selected to represent the full Munsell color space. This Recognition task, though not identical with that used in “A Study in Language and Cognition”, was, in all important respects, comparable. There were two groups of Ss. There were 20 native speakers of English comparable to the Ss we had used for whom Codability scores could have been calculated. But for the second group, Codability scores were quite impossible to obtain. For they were 21 monolingual Dani whose only color terms were mili and
mola and who, asked to name the colors in the test array, simply chanted
these two words at a constant rate, permitting no differential color Codability scores of any kind: not latency of response, nor length, nor degree of interpersonal agreement. All of these indices yielded the same scores for all colors. The impossibility of getting Codability scores for the Dani was not a flaw in the study, but its very point. What Heider wanted to find out was whether the superior memorability of focal colors differed when there was, as in English, a superior Codability and when there was no Codability at all.

The results are perfectly clear. There is an overall higher recognition performance by American Ss than by the Dani which seems to turn up on every task comparing the two groups and is hardly surprising in view of their unequal acquaintance with researchers and the kinds of things they think up. This is not the result that matters. What matters is the memorability of the focal colors relative to the filler colors for the two kinds of speakers. As it turned out, focal colors were recognized far more often than either sort of filler color for both speakers of English and speakers of Dani. In short, the differential 'Codability' which had been invoked to explain the original Recognition results could not be invoked to explain the differential results for the Dani; for the Dani all colors were the same with respect to 'Codability'.

What now has become of "A Study in Language and Cognition"? W. S. Gilbert would have put it something like this: "Reduced - to a special case of a completely misconceived lemma". Focal colors are human universals and linguistic codes are all designed to fit them; English among the rest. It takes some of the sting out of the wound that Eleanor Rosch Heider was my doctoral student. And there is no sting at all for anyone when we review the principles that have been added to behavior science:

1. The Communicability of a referent in an array and for a particular community is very closely related to the memorability of that referent in the same array and for members of the same community.
2. In the total domain of color there are 11 small focal areas in which are found the best instances of the color categories named in any particular language. The focal areas are human universals, but languages differ in the number of basic color terms they have; they vary from 2 to 11.
3. Color terms appear to evolve in a language according to the universal partial ordering that appears as Figure 3.
4. Focal colors are more memorable, easier to recognize, than any other colors, whether the subjects speak a language having a name for the focal colors or not.
5. The structure of the color space as determined by multidimensional scaling of perceptual data is probably the same for all human communities and unrelated to the space yielded by naming data.
The fascinating irony of this research is that it began in a spirit of strong relativism and linguistic determinism and has now come to a position of cultural universalism and linguistic insignificance. Eleanor Rosch Heider puts it well: “In short, far from being a domain well suited to the study of the effects of language on thought, the color space would seem to be a prime example of the influence of underlying perceptual-cognitive factors on the formation and reference of linguistic categories” (1972, p. 20). There is even an independently conceived theory of color vision on the neurological level (De Valois and Jacobs, 1968) which seems likely to provide the physiological base for the universal focal colors (McDaniel, 1974; Kay and McDaniel, 1975). Eric Lenneberg would have welcomed all these results. He never sought to prevail, but hoped only to be understood and built upon. Prevailing is for truth.
References

Anglin, J. M. (1975) On the extension of the child's first terms of reference. Paper presented at SRCD Meetings.
Anglin, J. M. (1976) From reference to meaning. Chapter 7 of Word, Object, and Conceptual Development. In progress.
Berlin, B., and Kay, P. (1969) Basic Color Terms: Their Universality and Evolution. Berkeley, University of California Press.
Brown, R. W. (1958) How shall a thing be called? Psych. Rev., 65, 14-21.
Brown, R. W., and Lenneberg, E. H. (1954) A study in language and cognition. J. abnorm. soc. Psychol., 49, 454-462.
Bruner, J. S., Goodnow, J. J., and Austin, G. A. (1956) A Study of Thinking. New York, Wiley.
Burnham, R. W., and Clark, J. R. (1955) A test of hue memory. J. app. Psychol., 39, 164-172.
Chomsky, N. (1957) Syntactic Structures. The Hague, Mouton.
Clark, E. V. (1973) What's in a word? On the child's acquisition of semantics in his first language. In T. E. Moore (Ed.), Cognitive Development and the Acquisition of Language. New York, Academic Press, pp. 65-110.
Conklin, H. C. (1962) Lexicographical treatment of folk taxonomies. In F. W. Householder and S. Saporta (Eds.), Int. J. Amer. Ling., 28.2, Part IV.
De Valois, R. L., and Jacobs, G. H. (1968) Primate color vision. Science, 162, 533-540.
Frake, C. O. (1964) Notes on queries in ethnography. In A. K. Romney and R. G. D'Andrade (Eds.), Transcultural studies in cognition. Amer. Anthrop., 66, No. 3, Part 2, 132-145.
Heider, E. R. (1972) Universals in color naming and memory. J. exp. Psychol., 93, 10-20.
Heider, E. R., and Olivier, D. C. (1972) The structure of the color space in naming and memory for two languages. Cog. Psychol., 3, 337-354.
Hymes, D. (1964) Directions in (ethno-) linguistic theory. In A. K. Romney and R. G. D'Andrade (Eds.), Transcultural studies in cognition. Amer. Anthrop., 66, No. 3, Part 2, 6-56.
Katz, J. J., and Fodor, J. A. (1963) The structure of a semantic theory. Lang., 39, 170-210.
Kay, P., and McDaniel, C. K. (1975) Color categories as fuzzy sets. Working Paper No. 44, Language Behavior Research Laboratory, University of California, Berkeley.
Lantz, D., and Stefflre, V. (1964) Language and cognition revisited. J. abnorm. soc. Psychol., 69, 472-481.
Lenneberg, E. H. (1953) Cognition in ethnolinguistics. Lang., 29, 463-471.
Lenneberg, E. H. (1957) A probabilistic approach to language learning. Beh. Sci., 2, 1-12.
Lenneberg, E. H. (1961) Color naming, color recognition, color discrimination: A reappraisal. Percept. Mot. Skills, 12, 375-382.
Lenneberg, E. H., and Roberts, J. M. (1956) The language of experience: A study in methodology. Intern. J. Amer. Ling. (Memoir No. 13).
McDaniel, C. K. (1974) Basic color terms: Their neurophysiological bases. Paper presented to the American Anthropological Association Annual Meeting, Mexico.
Romney, A. K., and D'Andrade, R. G. (Eds.) (1964) Transcultural studies in cognition. Amer. Anthrop., 66, No. 3, Part 2, 1-253.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., and Boyes-Braem, P. (1975) Basic objects in natural categories. Working Paper No. 43, Language Behavior Research Laboratory, University of California, Berkeley.
Stefflre, V., Castillo, V. V., and Morley, L. (1966) Language and cognition in Yucatan: A cross-cultural replication. J. Pers. soc. Psychol., 4, 112-115.
Van de Geer, J. P., and Frijda, N. H. (1960) Studies in codability: II. Identification and recognition of facial expression. Report No. E002-60, State University of Leyden, Psychological Institute, The Netherlands.
Van de Geer, J. P. (1960) Studies in codability: I. Identification and recognition of colors. Report No. E001-60, State University of Leyden, Psychological Institute, The Netherlands.
Zadeh, L. A. (1965) Fuzzy sets. Inform. Contr., 8, 338-375.
Zadeh, L. A. (1971) Quantitative fuzzy semantics. Inform. Sci., 3, 159-176.
Cognition, 4 (1976) 155-176
@Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
What’s what: talkers help listeners hear and understand by clarifying sentential relations*
VIRGINIA VALIAN
CUNY Graduate Center, New York

ROGER WALES
University of St. Andrews
Abstract

It was predicted that a talker would clarify the sentential relations of an utterance if a listener indicated difficulty in hearing and understanding. Subjects read syntactically clear and distorted sentences to a listener (experimenter) in an adjoining room. The experimenter often asked "What?" Subjects changed distorted versions to clear versions, while repeating clear versions essentially as first read. Other subjects were asked to make the sentences clear and simple to understand. The same basic results were obtained. Talkers thus seem to interpret a "What?" partly as a request for clearer sentential relations and respond accordingly. The results indicate that talkers have knowledge of underlying structure. Several alternate explanations can be rejected. A relative derivational theory of complexity is presented.

*The experiments were begun while the senior author was Research Associate and the junior author Visiting Professor at the Psychology Department, Massachusetts Institute of Technology, and supported in part by the Public Health Service, Grant No. MD 0516844 to M.I.T.; we particularly thank M. Garrett for his help. The research was also supported by a grant from the City University of New York Faculty Research Award Program. R. Fiengo and D. T. Langendoen gave linguistic advice; L. Goldstein helped with the apparatus.

The present experiment systematically explores the talker's knowledge and use of syntactic structure within the context of an everyday speech situation. Every talker has experienced the phenomenon of saying something to someone who is having difficulty both hearing and understanding what has been said. This can occur, for example, in a noisy restaurant, or when the talker and listener are in different rooms. Listeners commonly signal their difficulty by asking "What?" The talker must then decide how, if at all, to
change the original utterance. We call this a What? situation. The talker's change options are phonetic, syntactic, semantic, or any combination of the three. The syntactic options are the focus of the present experiment.

If it is true that clearly displayed syntactic relations within a sentence will aid a listener's semantic analysis (Fodor, Bever & Garrett, 1974), and if talkers in a What? situation want to maximize ease of comprehension on the part of listeners, then one can hypothesize uniform syntactic behavior by talkers when listeners query "What?" Talkers should clarify the sentential relations of their original utterance.

In many cases, clarification of sentential relations is equivalent to production of a sentence less transformationally removed than the original sentence from the deep structure representation. For example, in (1), the verb and its particle are separated by the noun phrase 'the number'. The fact that 'the number' is the object of 'look up' is obscured.

(1) Molly forgot to look the number up before leaving the house
(2) Molly forgot to look up the number before leaving the house

Placement of the particle next to the verb, as in (2), clarifies this syntactic relation and also results in a sentence which is less transformationally distorted than (1). If a talker had originally uttered (1), (s)he should utter (2) after being queried; if (s)he had originally uttered (2), (s)he should repeat the sentence essentially as first spoken. Behaviour of this sort would indicate that the talker's syntactic knowledge includes the relation between sentences which differ by whether an optional transformation has applied. This relation is not specified at the level of surface structure.

However, not all cases of clarification of sentential relations need be equivalent to removal of one or more transformations.

(3) Why not finish your homework now?
(4) Why don't you finish your homework now?

For example, although (4) has clearer sentential relations than (3), (3) and (4) probably
derive from different deep structures, one of which contains more empty nodes than the other. If a queried talker changed (3) to (4), the only syntactic knowledge that could be imputed is that a sentence with an explicit subject presents clearer (because fuller) syntactic relations than a sentence without an explicit subject. (When an explicit subject is used, 'do' support is obligatory.)

Although there is evidence to support the claim that listeners are sensitive to underlying syntactic relations (Blumenthal & Boakes, 1967; Bever, Lackner & Kirk, 1969; Levelt, 1970) and that certain tasks are facilitated if listeners are presented with less distorted sentences (Hakes, 1972; Fodor &
Garrett, 1967; but see Bock & Brewer, 1974 for contrary results), there is no experimental evidence that talkers have implicit knowledge of deeper syntactic structure (though see Jarvella, 1972; Garrett, in press), nor that this knowledge could be employed when listeners signal a need for more explicit syntactic relations.

In Experiment I subjects read sentences exemplifying many different linguistic constructions. Half the sentences had relatively clearly displayed syntactic relations, half were relatively distorted; all were grammatical and acceptable. Subjects were told that the experimenter wanted to simulate a What? situation, which was briefly described. All experimental sentences were then queried. The prediction was that subjects would change distorted versions to clear ones and would repeat clear versions.

In Experiment II the What? situation was not simulated, but subjects were asked explicitly to clarify and simplify what they had read in order to make it easier to understand. This change of instructions was used to determine if the behavior evoked in Experiment I could also be evoked by more self-conscious instruction. There was the same prediction as in Experiment I because it was hypothesized that talkers interpret a What? in part as a request for syntactic simplification and clarification.

In Experiment III subjects were given both forms of each sentence and asked to choose the simpler. This control condition tested whether an unspecified notion of simplicity would yield the same results as Experiments I and II; in these instructions there was also no mention of talkers or listeners. The prediction was that subjects would show a different pattern of results from Experiments I and II, because the task neither engages the subjects' natural mode of responding nor delineates the relevant dimension of simplicity.

The formulation presented here can be seen as a relative, rather than absolute, derivational theory of complexity. An absolute derivational theory of complexity states that one sentence is more psychologically complex than another if there are more transformations in its derivational history (Miller, 1962; Mehler, 1967). The two sentences being compared are not required to have the same deep structure representation, nor is a distinction made between optional and obligatory transformations. A relative derivational theory of complexity, on the other hand, would require of the sentences to be compared that they be derived from the same deep structure, and would claim that the psychologically more complex sentence had one or more optional transformations in its history. As sketched, the theory does not specify what psychological complexity is, and is in any event a theory about listening. Further elaborations of the theory are needed to account for talkers' behavior. One hypothesis is that talkers tacitly know that more
transformed sentences are harder for listeners to process; when listeners signal difficulty, talkers produce less transformed sentences to aid listeners' processing.

Experiment I

Method

Procedure and apparatus
Subjects were run individually. A subject was seated in a sound-free chamber, fitted with earphones and microphone and given a face-down stack of 177 cards, on each of which one sentence was typed. The subject was told that the experimenter wanted to simulate a situation of everyday life in which the subject would say something that a listener failed clearly to hear and understand, resulting in the listener saying "What?" The subject was told that the experimenter would be seated in the outer room, listening through earphones to the subject reading each sentence, and that there would be a varying level of noise present. As a result, the experimenter would often have to ask the subject "What?", at which time the subject should try to act as (s)he would in that situation in real life, repeating the sentence verbatim or changing it in any way (s)he chose, whichever seemed most natural. The subject was also told that the experimenter was interested in the changes the subject might make.

The sequence of events was as follows:
The experimenter said "OK".
The subject turned over a card, read it aloud, and turned it face down.
On 146 of the trials the experimenter asked "What?" in a natural, questioning intonation.
The subject repeated the sentence with or without changes.
The experimenter said "OK" 15 sec after (s)he said "What?"
The subject went on to the next card.

On the 31 occasions when the experimenter did not say "What?" (s)he said "OK" and the subject went directly to the next card. Subjects took a 5-minute break midway. They were asked at the end of the experiment if they thought they had responded naturally and all said they had, assuming they would have uttered the initial typed sentence.
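Stated as a procedure, each trial is a short fixed loop. The following sketch merely restates that loop in Python for concreteness; the function names are hypothetical stand-ins, and nothing of the sort was used in the experiment itself.

```python
# Schematic restatement of the trial sequence. read_aloud is a hypothetical
# stand-in for the subject's spoken rendition; no real apparatus is modeled.
import time

def read_aloud(sentence):
    return sentence  # placeholder: the subject reads the card aloud

def run_session(cards, queried_indices):
    """cards: the 177 typed sentences; queried_indices: the 146 experimental
    sentences (the 31 fillers were never queried)."""
    responses = []
    for i, sentence in enumerate(cards):
        print("E: OK")                         # experimenter signals the next trial
        first_reading = read_aloud(sentence)   # subject reads and turns the card over
        if i in queried_indices:
            print("E: What?")                  # query in a natural, questioning intonation
            repetition = read_aloud(sentence)  # subject repeats, with or without changes
            time.sleep(15)                     # "OK" follows 15 sec after "What?"
            responses.append((first_reading, repetition))
    return responses
```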
Table 1. Examples of clear and distorted sentence versions for each linguistic construction type

1a Subject relative
    Clear:     The treasure that she found was valuable.
    Distorted: The treasure she found was valuable.

1b Object relative
    Clear:     Tony watered the plant that the florist had sold him.
    Distorted: Tony watered the plant the florist had sold him.

2 Relative with copula
    Clear:     The people who were criticizing the politician were angry.
    Distorted: The people criticizing the politician were angry.

3 Object NP complement
    Clear:     Roger insisted that William is going to Chicago.
    Distorted: Roger insisted William is going to Chicago.

4 Subject NP complement - trans verb
    Clear:     It gratified Marcy that her thesis was a success.
    Distorted: It gratified Marcy her thesis was a success.

5 Subject NP complement - intrans verb
    Clear:     It appears that he had the right of way.
    Distorted: It appears he had the right of way.

6 Tag questions
    Clear:     The chef hasn't called our order yet, has he?
    Distorted: Has the chef called our order yet?

7 Manner adverbials
    Clear:     Ginny persuasively argued her case.
    Distorted: Ginny argued her case persuasively.

8 Deleted questions
    Clear:     Why don't you finish your homework now?
    Distorted: Why not finish your homework now?

9 Permuted relatives
    Clear:     Somebody who loves me called me.
    Distorted: Somebody called me who loves me.

10 Verb plus particle
    Clear:     Jesse put on his shirt.
    Distorted: Jesse put his shirt on.

11 To-dative
    Clear:     The salesman sold a watch to Jerry.
    Distorted: The salesman sold Jerry a watch.

12 Regular passive
    Clear:     The spy divulged the secret to Emma.
    Distorted: The secret was divulged to Emma by the spy.

13 Double-agent passive
    Clear:     Tom took advantage of Lou. (4 only)
    Distorted: Lou was taken advantage of by Tom. Advantage of Lou was taken by Tom.
Materials
Eight sentence pairs were created for each of 12 linguistic constructions so that the syntactic relations were clear in one version of a sentence and distorted in the matching version. A sentence was held to display clear sentential relations relative to its semantically- and lexically-equivalent mate under the following conditions. If both sentences were derived from the same deep structure, the sentence closer to the deep structure representation was the clear form (as in 1, 2, 3, 4, 5, 6, 9, 10, 11 and, arguably, 7 and 8). If the two sentences were derived from different deep structures, in the clearer sentence the surface structure topic was the deep structure subject rather than deep structure object (as in 12 and 13), or the clearer sentence was derived from a deep structure with fewer empty nodes than the deep structure of the distorted one (as in 8). (The linguistic structure of each construction is discussed at greater length on pages 165-168.) Table 1 lists each construction and gives an example of a clear and distorted version.

For a 13th construction, double-agent passives, four clear forms and eight distorted forms were created. Double-agent passives are such that either of the two object noun phrases can be the subject (see Table 1). Thus, an active form has two corresponding passive forms. Data are reported for these thirteen constructions. Thirty-one filler sentences, which were not queried, were also constructed.

Sentences were not controlled for length. The clear versions ranged between three and twelve words, with a mean of 7.52 and standard deviation of 1.83. Seventy-five percent of all clear sentences were between six and nine words long. The distorted versions ranged between one and twelve words, with a mean of 6.92 and standard deviation of 1.83. Seventy-eight percent of all distorted sentences were between six and nine words long.

Data are not reported for an additional six constructions. For five constructions the sentence pairs differed along different sets of dimensions than the clear-distorted dimension, such as nouns vs. gerunds. The sixth construction consisted of four non-sentences which represented putative deep structure strings. A total of 319 sentences was constructed.
Subjects were divided into two groups of 10 each (5 female, 5 male). One group read four clear and four distorted sentences from each of Constructions 1-13 and the other group read the eight complementary versions. For Construction 12 (double-agent passives) one group read two active and four passive sentences and the other group read the six complementary sentences. In two of the passives the direct object was surface subject and in the other two the prepositional object was surface subject. (The remaining 44 queried sentences were divided among the six additional constructions: eight for each of five constructions, four for the sixth construction.) Each subject received a different random order of the 146 queried sentences. The 31 filler sentences occurred in one of two orders, under the constraint that no fewer than two and no more than eight experimental sentences intervened between each filler.

The design allowed computation of a 2-factor repeated measures analysis of variance with subjects repeated across construction, of which there were thirteen types, and across syntactic form, which was either clear or distorted.
The design also allowed for computation of a one-between one-within analysis, with sentences repeated across syntactic form and nested within construction.

Scoring

Subjects' responses were divided into three categories: 1) same in critical respects as the original sentence; 2) different in critical respects from the original sentence; 3) unscorable. Substitutions of one lexical item for another or elimination of lexical items were ignored if a paraphrastic relation existed between the original and repeated sentence. A response was labeled unscorable if there was no paraphrastic relation (very rare) or if the subject chose a new sentence type that did not contain the syntactic construction being investigated. For example, if the subject removed the manner adverb from "Ginny persuasively argued her case", the position of the adverb relative to the verb could not be assessed and the response was called unscorable. Unscorable responses accounted for 13% of the data.

The criteria according to which a repetition conformed in critical respects with the original varied depending on the construction. A brief summary of the criteria for each construction is presented below. In all cases the scoring procedure for the distorted versions was the inverse of the procedure for the clear versions. Thus, only the procedure for the clear versions is described.

(1) Relative. A same response required presence of the relative marker 'that', or division into two independent clauses. A different response required absence of the relative marker. An unscorable response occurred if the relative clause was changed to an adjective or if the relative clause was deleted. Twenty-nine percent of the responses were unscorable. The scoring is based on Smith (1964); Bever & Langendoen (1971) suggest that 'that' is introduced transformationally. If their analysis is correct, the relative marker is a case where clearer sentential relations are present in sentences transformationally more distant from the base.

(2) Relative and copula. A same response required the presence of the relative marker and copula, or presence of the relative marker and a tense change of the verb to past or present (instead of the progressive), or division into two separate clauses joined by a connective. A different response was scored if the marker and copula were absent, or if the marker, copula and verb were absent. An unscorable response occurred if the relative clause was deleted or if it was permuted to the end of the sentence and changed into an adverbial (e.g., "the children made a lot of noise by chewing gum"). Thirteen percent of the responses were unscorable.

(3) Object noun phrase complement. A same response required presence of the complementizer 'that' or some equivalent such as 'like'; a different
response required absence of a complementizer. A response was unscorable if the matrix clause was deleted or if the subordinate clause was converted into a non-sentential noun phrase (Rosenbaum, 1967). Fourteen percent of the responses were unscorable.

(4) Subject noun phrase complement with transitive verb. A same response occurred if the complementizer 'that' was present, even if subjects changed the sentence to an object noun phrase complement either by passivizing the verb or by using an adjective instead of the psychological verb. A different response was scored if the complementizer was absent, again even if an object noun phrase complement was used. This criterion was based on the grounds that passivization retains the complement clause as subject of the sentence. Forty-three percent of the responses were passivizations or used a predicate adjective. An unscorable response occurred in the same conditions as for object noun phrase complements, or if the sentence was converted into a 'for-to' complement, or if the complement clause was changed into an adverbial by exchanging 'because', 'when', etc., for 'that'. Nineteen percent of the responses were unscorable.

(5) Subject noun phrase complement with intransitive verb. A same response was scored if the complementizer 'that' or 'like' was present. A different response was scored if a complementizer was absent. An unscorable response occurred in the same conditions as for subject noun phrase complements with transitive verbs. Fifty percent of the responses were unscorable.

(6) Yes-no (tag) questions. A same response required presence of the tag; a different response required absence of the tag (Katz & Postal, 1964). An unscorable response occurred if the repetition was not in the form of a question. One percent of the responses were unscorable.

(7) Manner adverbials. A same response was scored if the adverb was either directly before or directly after the verb. A different response was scored if the adverb was placed after the object noun phrase at the end of the sentence or if placed at the beginning of the sentence. An unscorable response occurred if the adverb was deleted. Eleven percent of the responses were unscorable.

(8) Deleted noun phrase-verb questions. A same response was scored if the (surface) subject noun phrase and its verb or copula were present; a different response was scored if either the noun phrase or verb was absent. An unscorable response occurred if a non-question form was used. Four percent of the responses were unscorable. Post-hoc linguistic analysis indicated that one sentence had been inappropriately included: the deleted elements of the other sentences could all be plausibly argued to be present on a designated list of deletable elements, but the deleted elements of the excluded sentence could not be. Therefore, this sentence was not included in the data analysis.
(9) Permuted relatives. A same response was scored if the relative clause was placed alongside the subject noun phrase (Ross, 1967). Compression of the relative to an adjective was also allowed. A different response was scored if the relative clause and subject noun phrase were separated by the main verb phrase, or if the main verb phrase was converted into the relative clause and the relative converted into the main verb phrase. An unscorable response occurred if the relative clause was eliminated, or if two separate sentences were created. Eight percent of the responses were unscorable.

(10) Verb plus particle. A same response was scored if the particle was placed directly after the verb (Chomsky, 1964). Passive constructions, such as "the clerk was bawled out by his supervisor", were allowed. A different response was scored if the particle was placed after the noun phrase. An unscorable response occurred if a verb which does not take a particle was substituted for the original verb, or if the object noun phrase was deleted so that the particle could appear in no position other than directly after the verb. Six percent of the responses were unscorable.

(11) To-dative. A same response was scored if the 'to' was present; a different response was scored if the 'to' was absent (Fillmore, 1965; Jackendoff & Culicover, 1971). An unscorable response occurred if the dative was eliminated. Eight percent of the responses were unscorable.

(12) Double-agent passives. A same response occurred if the sentence was repeated in the active voice; a different response was scored if the passive voice was used (Chomsky, 1957). An unscorable response occurred if a middle form ("Philip and Emily got out of touch") was used. Four percent of the responses were unscorable.

(13) Regular passives. The same criteria were used as for (12). Four percent of the responses were unscorable.

A same response was arbitrarily scored with an 8, a different response with a 2.

Subjects

Subjects were linguistically naive paid volunteers with normal hearing. Ten subjects were eliminated from the experiment either halfway through or immediately after testing because of their failure to change more than 25% of their utterances; one was eliminated because he was a poor reader; one because she did not follow instructions. This left a total of 20 subjects whose data were analyzed.

Results

As Table 2 shows, there was an overall tendency to repeat the sentence essentially as read, but this tendency was stronger for the clear versions than for distorted versions, as predicted. When changes were made in the clear versions, they were only 1½ times more likely to be a distorted version than to be an unscorable response. When changes were made in the distorted versions, however, they were three times more likely to be a clear version than to be an unscorable response.
Table 2. Percent response types to clear and distorted sentence versions collapsed across construction type

                              Response type
    Version              stay    different    unscorable

    Experiment I
      clear               69        18            13
      distorted           42        45            14
      x̄                   55        32            13

    Experiment II
      clear               60        20            20
      distorted           31        53            15
      x̄                   45        31            18
The major prediction that the ratio of same to different responses would be greater for the clear versions than for the distorted versions was strongly confirmed. Table 3 gives the mean response scores to clear and distorted versions for each construction type; the higher the score, the greater the proportion of stay responses. In one set of scores the scores are averaged across subjects, in the other set across sentences. With subjects as the repeated measure across construction and syntactic form, the effect of syntactic form was significant beyond the 0.001 level, F1 (1,19) = 87.02. With sentence items as the measure repeated across syntactic form and nested within construction, syntactic form was also highly significant, F2 (1,91) = 136.39, p < 0.001. These Fs were used to compute F'min (1,47) = 53.13, p < 0.001. (See Clark, 1973, for the formulae.) There was a significant effect of construction type: the absolute score obtained by ignoring syntactic form and averaging the clear and distorted versions varied by construction, F1 (12,228) = 5.58; F2 (12,91) = 4.7; F'min (12,241) = 2.55, p < 0.005. The interaction between construction type and syntactic form was also significant. That is, the ratio between the clear and distorted version scores varied by construction, F1 (12,228) = 18.59; F2 (12,91) = 14.62; F'min (12,233) = 8.18, p < 0.001.
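F'min (Clark, 1973) combines the by-subjects F (F1) and the by-items F (F2) into a single conservative statistic. The sketch below restates the formula and checks it against the values reported in this paragraph; it is offered only as a reader's aid.

```python
# Clark's (1973) min F': a conservative test combining the by-subjects F (F1)
# and the by-items F (F2). The inputs are the values reported above.
def min_f_prime(f1, df1_num, df1_den, f2, df2_num, df2_den):
    assert df1_num == df2_num, "numerator df should agree"
    value = (f1 * f2) / (f1 + f2)
    # Denominator degrees of freedom, rounded to the nearest integer.
    den_df = (f1 + f2) ** 2 / (f1 ** 2 / df2_den + f2 ** 2 / df1_den)
    return value, df1_num, round(den_df)

print(min_f_prime(87.02, 1, 19, 136.39, 1, 91))
# -> roughly (53.1, 1, 47), matching the reported F'min (1,47) = 53.13
```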
Table 3. Mean scores for clear and distorted sentence versions presented by construction

Construction*:           1     2     3     4     5     6     7     8     9     10    11    12    13

Experiment I: What?
  Subjects   Clear      6.27  5.65  6.13  6.95  7.33  5.35  5.65  7.90  7.37  7.40  6.10  8.00  8.00
             Distorted  5.55  4.95  4.50  3.63  3.43  7.93  6.55  4.27  3.65  5.50  6.53  3.95  2.45
  Sentences  Clear      5.85  5.81  6.13  7.09  7.41  5.35  5.97  7.93  7.47  7.38  6.18  8.00  8.00
             Distorted  5.55  4.90  4.45  3.60  2.96  7.93  6.54  4.39  3.79  5.49  6.48  3.91  2.47

Experiment II: Simplify
  Subjects   Clear      6.53  5.60  7.13  6.83  7.65  4.53  4.85  7.80  6.73  7.05  5.53  7.47  7.85
             Distorted  4.97  4.67  4.23  3.23  3.65  7.77  5.25  3.77  2.65  4.35  6.45  2.45  2.15
  Sentences  Clear      6.47  5.56  7.33  7.19  8.00  4.49  5.07  7.82  6.69  6.91  5.46  7.44  7.85
             Distorted  5.37  5.17  4.11  2.97  3.06  7.77  5.33  3.83  2.62  4.28  6.26  2.46  2.16

*See Table 1 for name of each construction and examples.
As can be seen from Table 3, ten of the thirteen constructions showed the predicted effect of syntactic form, for both the subjects and items analyses. The effect was in the opposite direction for manner adverbials, to-datives and yes-no questions. By treating each construction as a one-factor repeated measures analysis, F'mins were computed for those constructions which had significant F1s and F2s. An acceptable alpha level was set at 0.005 one-tail (0.05/10). Seven constructions showed a significant effect of syntactic form in the predicted direction: subject noun phrase complement with transitive verb, F'min (1,22) = 16.46, p < 0.001; subject noun phrase complement with intransitive verb, F'min (1,20) = 28.23, p < 0.001; deleted noun phrase-verb, F'min (1,15) = 24.14, p < 0.001; permuted relatives, F'min (1,14) = 29.66, p < 0.001; double-agent passive, F'min (1,24) = 434, p < 0.001; regular passive, F'min (1,26) = 51.95, p < 0.001; verb plus particle, F'min (1,18) = 9.26, p < 0.005. Yes-no questions showed a significant effect of syntactic form in the opposite direction, F'min (1,15) = 11.22, p < 0.005 (two-tailed). The remaining constructions showed no significant effect.

Discussion
The results strongly confirmed the prediction that talkers would clarify sentential relations when in a simulated What? situation. In some of the constructions, the clearer version was also closer to the deep structure representation of the sentence than was the distorted version. Thus, the
results can also be interpreted to suggest that the talker's processing mechanism has access to deeper levels of representation than surface structure. One implication of this interpretation is that some transformations are psychologically real, and that a relative derivational theory of complexity is a viable theory. For these conclusions to stand, however, it is necessary 1) to assess the linguistic status of each construction and to establish the relation between the clear version and the deep structure representation, 2) to assess the significance of the unscorable responses, and 3) to consider alternative explanations.

In the case of constructions (1)-(5), the relatives and complements, it is generally agreed that the complementizer is optionally deletable, so that the deleted version is one transformation more removed from the deep structure representation than is the undeleted, clear version (but see Bever & Langendoen, 1971). The other relatively non-controversial constructions are 6) tag questions, 9) permuted relatives, and 10) verb plus particle, where the clear version is less transformationally distant from the deep structure than the distorted version is. For these eight constructions, then, preference for the clear version is preference for a version closer to deep structure. In the present experiment, talkers showed a preference for the clear version in all but one of these constructions, 6) tag questions. One possible explanation for the extreme preference for the distorted version in this construction is that a tag question is not recognizable as a question until the tag is reached; by changing to a yes-no question, talkers emphasize the interrogative status of the sentence.* A related possibility is that subjects flesh out the question aspect and drop the statement aspect, since only one half is needed and the sentence as a whole is a question.** The shorter length of the distorted version cannot be the responsible factor, since in many of the other constructions the distorted versions are shorter than the clear versions but are not correspondingly preferred. Another possible explanation is that the linguistic analysis used here (based on Katz & Postal, 1964) is incorrect, that yes-no and tag questions are not derived from the same disjunctive deep structure, such that an additional transformation deletes the second half of the disjunction. Rather, tag questions and negative yes-no questions (but not positive yes-no questions) could be derived from the same deep structure, with the tag formed by an additional optional copying rule, and all other rules in common between the two types (Akmajian & Heny, 1975). This would make the tag question more distant from the deep structure than the negative question. However, since half the yes-no questions in the present experiment were positive and half negative, this explanation would account for only a portion of the data.

*We thank S. Cohen Leehey for this suggestion.
**We thank D. T. Langendoen for this suggestion.

The remaining constructions are discussed individually.

7) Manner adverbials were treated as if they are in immediate construction with the verb, rather than the entire verb phrase. Therefore, placing the adverb on either side of the verb was held to clarify its modifying relation to the verb and conform to its deep structure placement. However, if the adverb qualifies the entire verb phrase its deep structure position is unfixed, since it can be attached to the verb phrase either before the verb, directly after the verb, or at the end of the verb phrase. The only position ruled out is sentence initial. The very slight preference [F1 (1,19) = 1.67, p = 0.2; F2 (1,7) = 0.45, n.s.] in the present experiment for the sentence final position would seem to rule out the hypothesis that the adverb modifies only the verb, on the assumption that subjects are in fact clarifying sentential relations. The preference for sentence final position is larger than it seems, since it contrasts with the combined total for 2 other positions, directly before and directly after the verb, both of which were scored as conforming to the clear version. When these positions are treated separately the preference for final position becomes apparent: 49% of all responses were in sentence final, 36% were before the verb, 4% were between the verb and subsequent object noun phrase. Indeed, this last position sounds quite awkward unless the object noun phrase is a prepositional phrase. One percent of the responses were in sentence initial position and 11% were unscorable. Thus, subjects do seem to prefer sentence final position. There is one linguistic treatment of manner adverbs which can account for the verb phrase final position. Chomsky (1965) locates manner adverbs at the end of the verb phrase, though without discussing other possible positions within the verb phrase.

8) Deleted noun phrase-verb questions can easily be argued to have more clearly marked sentential relations in the clear form, because the clear form includes the surface subject and verb or auxiliary. Any sentence which specifies this information marks sentential relations more clearly than a sentence which does not. But it is a matter of controversy how much deletion should be allowed in a grammar, with recent theory (Fiengo, 1974) eliminating as much deletion as possible, in order to constrain the weak generative capacity of the grammar. Thus, the clear and distorted sentences in this construction would not be transformationally related, but derived from different deep structures and semantically related. If deletion of a small list of designated elements were allowed (Chomsky, 1965), however, then
the sentences used in 8), all of which delete only pronouns, auxiliary verbs and the verb 'go', could be argued to be transformationally related, with the deleted versions more transformationally distant than the non-deleted versions, with the additional assumption that a rule which would delete just these elements could be motivated. Since deletion is so controversial, it seems more conservative to account for subjects' responses only in terms of clarifying sentential relations and not in terms of a deep structure hypothesis.

11) To-datives have recently been argued (Langendoen, personal communication) to be generated in both forms in deep structure, rather than generated with a 'to' which is then optionally deleted. (In Burt, 1971, the 'to' is transformationally inserted; this alternative is objectionable because it builds up structure.) The very marginal preference in the present experiment for the distorted form would be most compatible with dual base generation. The preference was not significant, F1 (1,19) = 0.58, F2 (1,7) = 0.297, indicating that both forms were viewed as being equally clear.

12) Regular passive and 13) double-agent passive are both marked as passive in deep structure, so that changing the sentence from a passive to an active form is not conforming with one aspect of the deep structure representation. The switch to the active form does clarify sentential relations, however, by removing the discrepancy between the surface subject and the deep subject: in the active form the surface subject is also the deep subject and the surface object is also the deep object. Thus, the passive constructions are an example where clarifying sentential relations results, in one sense, in a form which is closer to deep structure, but in another sense is a choice for a different deep form altogether.

Summary of discussion by construction

For the three constructions where subjects preferred the version labeled as distorted, future linguistic analysis may suggest that the label was applied to the wrong version [6) tag questions] or that neither version is distorted relative to the other [7) manner adverbials and 11) to-datives]. In seven of the remaining eleven constructions which conformed to predictions, the clear version can reasonably be identified as the version which is closer to deep structure.

Alternate explanations

Memory difficulties can be eliminated as possible explanations: a separate group of eleven subjects was asked simply to repeat back each sentence after (s)he had read it. No subject made more than a total of 6 errors and all errors were small in scope. A more serious candidate objection, mentioned earlier, is that by excluding the unscorable responses from the analysis the
prediction is vacuously confirmed: responses at variance with the hypothesis are eliminated. However, the hypothesis does not demand that subjects maintain the same constructional type. It is a measure of the perceived awkwardness or redundancy of a construction that subjects eliminate it, but this perception and behavior by subjects is orthogonal to the hypothesis, which is only in effect if the construction is maintained. For example, subject noun phrase complements with intransitive verbs received the largest percentage of unscorable responses, 50%. In 78% of these responses subjects dropped the matrix clause "it happened that" or "it seemed that", etc. The hypothesis being tested does not predict that subjects will maintain the matrix clause, but that if they do the complementizer 'that' will also be present.

Another objection might be that subjects' responses to sentences of one version were contaminated by the presence of similar sentences in the other version. For example, it might be argued that subjects would not spontaneously have thought of manipulating the presence of complementizers had there not been sentence versions with and without the complementizer. There are cases, however, where the alternative form was almost never used, such as the two passive constructions and tag questions. Further, the occasional high percentage of unscorable responses indicates that subjects felt free to choose different constructions when the alternative form was not congenial. Finally, even if subjects' behavior were contaminated as suggested, the objection does not explain why the clear version was preferred to the distorted version.

A final objection might be that subjects try to maximize redundancy in a What? situation and that in general the clear version was also the more redundant version, as well as the longer version. Constructions 1), 2), 6), 8) and 11) are all more redundant and longer in the clear version, but two of the three constructions which did not show the predicted effect are in this group: 6) tag questions and 11) to-datives. Thus, neither redundancy nor length can account for the results.

Theoretical interpretation
One important theoretical question is whether there are several different factors which can be involved in clarifying sentential relations or whether there is only one unifying factor. If there are several ways in which sentential relations can be clarified, one of which is to produce a sentence version which is closer to the deep structure representation, then the present results provide evidence that subjects have knowledge of deeper levels of syntactic representation than the surface level. If there is only one factor that is involved, the present results cannot be interpreted in this way, because there
exist examples where clarifying sentential relations is not the same as producing a transformationally less distorted sentence. Although it seems plausible that clarifying sentential relations should be multifactorial, this experiment cannot decide between the two possibilities.

A separate matter of interpretation concerns the task used in Experiment I. The instructions to subjects did not give any explicit directions about how to insure that their listener would hear and understand them, because the methodological goal was to simulate an actual What? situation as closely as possible. It was assumed that subjects would interpret the instructions and the subsequent queries as requests to make the sentence clearer and simpler to understand. The assumption was tested by attempting to replicate the findings of Experiment I with different instructions, which explicitly asked the subject to make the sentence as clear and simple to understand as possible. A similar pattern of results would suggest that subjects do interpret the What? query in part as a request for syntactic clarification and simplification.
Experiment II

Materials, apparatus, design

These were identical to Experiment I.
Procedure
Only the instructions and the experimenter’s queries differed from Experiment I. Subjects were told that the experimenter was interested in the everyday situation that occurred when the listener asked for repetition because what was said was not as simple and clear as it could have been. Thus, the experimenter would often say “Again” to the subject, at which time the subject “should take a few seconds and think of how to say the sentence more simply and clearly”. If the sentence was already clear and simple then the subject should repeat it as first read. Finally, the subject was reminded that the goal was to make the sentence as clear and simple to understand as possible.
Subjects

Subjects were twenty linguistically naive paid volunteers with normal hearing.
Scoring
This was identical to Experiment I.
Results
As Table 2 shows, there was a smaller tendency than in Experiment I to repeat the sentence as read. Overall in Experiment I, 55% of the responses were stay responses, whereas in Experiment II, 45% were stay responses. When changes were made in the clear versions, they were equally likely to be an unscorable response as a distorted version, which is an exaggerated version of the Experiment I results. When changes were made in the distorted versions, they were 3½ times more likely to be a clear version than an unscorable response. This is again an exaggeration of the Experiment I results.

The major prediction, that the main effects and interaction from Experiment I would be duplicated in Experiment II, was confirmed. Table 3 gives the mean response scores to clear and distorted versions for each construction type, separately for sentences and items. The main effect for syntactic form was highly significant, F1(1,19) = 93.27; F2(1,91) = 199.09; F'min(1,39) = 63.51, p < 0.001. The effect of construction type just missed significance, F1(12,228) = 3.75, p < 0.001; F2(12,91) = 3.48, p < 0.001; F'min(12,252) = 1.8, p > 0.1. The interaction between construction type and syntactic form was significant, F1(12,228) = 26.53; F2(12,91) = 19.98; F'min(12,228) = 11.4, p < 0.001. As can be seen from Table 3, the same ten constructions showed the predicted effect of syntactic form in both experiments, but there are higher scores for both the clear and distorted versions in Experiment I. Individual F'mins are not reported because the effects are so similar to those of Experiment I. The only difference of note was that an additional construction, 3) object noun phrase complement, was significant, F'min(1,22) = 14.16, p < 0.005.

The effect of experiment was tested both with subjects repeated across syntactic form and construction type and nested within experiment, and with items repeated across syntactic form and experiment and nested within construction. The effect of experiment was significant, F1(1,38) = 6.45; F2(1,91) = 17.67; F'min(1,67) = 4.73, p < 0.05. There were no significant interactions involving experiment. The main effects and interaction reported above for Experiments I and II were significant.
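The F'min values reported here follow Clark's (1973) min F' procedure for combining the by-subjects (F1) and by-items (F2) analyses. As an illustrative sketch (our own, not the authors' computation; the function and variable names are assumptions for the example), the statistic and its denominator degrees of freedom can be recovered from the two F values:

```python
# Illustrative sketch of Clark's (1973) min F' statistic, computed from the
# by-subjects F (F1) and the by-items F (F2) and their denominator df.
# The function name and example below are ours, not the authors' own code.

def min_f_prime(f1, df1, f2, df2):
    """Return (min F', denominator df) given F1 with df1 and F2 with df2."""
    min_f = (f1 * f2) / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df2 + f2 ** 2 / df1)
    return min_f, int(denom_df)   # df conventionally rounded down

# Main effect of syntactic form in Experiment II: F1(1,19) = 93.27, F2(1,91) = 199.09.
print(min_f_prime(93.27, 19, 199.09, 91))   # approximately (63.5, 39), i.e., F'min(1,39) = 63.51
```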
Discussion

The results confirmed the prediction that the effect of syntactic form would also be present under different instructions, suggesting 1) that subjects are
interpreting “What?” in part as a request for syntactic clarification and 2) that the syntactic knowledge is available when explicitly demanded as well as when implicitly requested. One consequence of the instructions which is not reflected in the pattern of results presented above was the change in subjects’ intonation patterns. In Experiment I subjects raised their voices and spoke more distinctly; that behavior was completely absent in Experiment II. This suggests that talkers interpret a “What?” as a request for clarity at all levels, but a request to simplify is interpreted only structurally and lexically. The significant difference between the two instructional conditions is understandable in light of the emphasis placed in Experiment II on making the sentence as clear and simple to understand as possible, rather than on responding naturally, as was the case for Experiment I. The fact that no interactions involving.experiment were significant confirms this interpretation. Given the similarity between Experiments I and II, it might be objected that subjects would prefer the clear version in any task involving simplicity, reducing the interest of the results from Experiments I and II. A similar objection might be that subjects could be operating under a much cruder notion of simplicity than that proposed here, so that asking them just to choose the simpler of two sentences would produce the same pattern of results*. This was tested in Experiment III. It was expected that some, but not all, constructions would show the same effect of syntactic form as was found in Experiments I and II.
*This suggestion is due to J. Fodor.

Experiment III

Method

Materials

The same materials, minus the filler sentences, were used from Experiment I. For construction 13), double-agent passives, four additional actives were constructed to contrast with the passives; they were constructed by changing the lexical items in the subject and prepositional noun phrases but otherwise repeating the sentence. The clear and distorted versions of each sentence were typed on index cards. Half the time the top sentence was the clear version; half the time it was the distorted version.
Procedure and design
Each subject was given a stack of cards in the same random order and asked to indicate on a score sheet which of the two sentences on each card was simpler, by placing a T (for top) or B (for bottom). They were also given the option of placing an S, indicating that the two versions were the same, neither simpler than the other. No further instructions were given. The task required 10 to 15 minutes.

Scoring
Subjects' scores were computed by adding the number of times each version was preferred for each construction; a constant of 1 was added to each sum. Item scores were computed similarly. Thus, for each subject for each construction there were two numbers, representing the number of times (s)he chose the clear version (plus 1) and the number of times (s)he chose the distorted version (plus 1). S responses were eliminated from the analysis. The same was true for item scores.
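A small sketch may make this scoring rule concrete (the responses below are invented, not the study's data; 'C', 'D', and 'S' stand for choices of the clear version, the distorted version, and a "same" judgment):

```python
# Sketch of the Experiment III scoring rule: count choices of the clear and of
# the distorted version for a construction, drop S ("same") responses, and add
# a constant of 1 to each count.  The response string is invented.

from collections import Counter

responses = ['C', 'D', 'C', 'S', 'C', 'D', 'C', 'S']   # one subject, one construction

counts = Counter(r for r in responses if r != 'S')      # S responses eliminated
clear_score = counts['C'] + 1                            # constant of 1 added
distorted_score = counts['D'] + 1

print(clear_score, distorted_score)                      # 5 and 3 for this invented subject
```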
Subjects

Subjects were 20 linguistically naive volunteers, some of whom were paid.
Results
Table 4 presents the mean preference frequencies for clear and distorted versions for each construction type, for both subjects and items.

Table 4. Mean frequency preferences for clear and distorted sentence versions in Experiment III

Construction*:    1      2      3      4      5      6      7      8      9     10     11     12     13

Subjects
  Clear         3.50   3.25   4.25   6.65   5.30   1.60   4.30   3.70   6.30   4.55   2.85   7.80   8.95
  Distorted     5.30   5.65   4.50   2.65   3.75   7.25   3.40   4.10   1.65   3.00   5.55   2.00   1.00

Sentences
  Clear         7.25   6.63   9.13  15.13  11.25   2.37   8.50   8.71  14.25   9.87   5.37  18.13  20.87
  Distorted    11.75  12.63   9.75   5.13   7.50  16.25   6.50   9.71   2.63   6.00  11.75   3.50   1.00

*See Table 1 for name of each construction and examples.
The overall effect of syntactic form was significant for subjects, F1(1,19) = 4.34, p < 0.05, and items, F2(1,91) = 19.01, p < 0.001, but not when F'min was computed, F'min(1,28) = 3.53, n.s. There was a significant effect of construction type, F1(12,228) = 4.54; F2(12,91) = 11.4; F'min(12,317) = 3.25, p < 0.001.
There was also a significant interaction between form and construction, F1(12,228) = 31.99; F2(12,91) = 20.01; F'min(12,205) = 12.31, p < 0.001. Although the results cannot be compared directly with the results of Experiments I and II, because frequency rather than mean score is used, the directional differences can be compared. In 5 of the 13 constructions the clear/distorted ratio differs from Experiments I and II. For constructions 1) relative, 2) relative with copula, 3) object noun phrase complement and 8) deleted questions, the distorted version is preferred to the clear; this preference is at variance with that shown in Experiments I and II. For construction 7) manner adverbials, the clear version is preferred to the distorted version, again unlike Experiments I and II. Thus, the effect of syntactic form is considerably different in Experiment III.

Another measure of similarity is extent of correlation. In each of Experiments I, II and III, each construction was ranked according to the difference in score between the clear and distorted versions. The difference ranks were used to compute correlations between Experiments I and II and between Experiments I and III*. Although the Kendall rank correlation coefficient between Experiments I and II was highly significant (τ subjects = 0.90, z = 4.29, p < 0.001; τ items = 0.87, z = 4.14, p < 0.001), the weaker correlation between Experiments I and III was also significant (τ subjects = 0.73, z = 3.48, p < 0.001; τ items = 0.67, z = 3.17, p < 0.001). A Kendall partial rank correlation coefficient was calculated to determine whether the correlation between Experiments I and II could be due to effects of Experiment III. If this were the case, the new correlation coefficient should be quite small. Instead, the coefficient was large (τ xy.z = 0.74 for subjects, 0.73 for items). Although the coefficient cannot be tested for significance (Siegel, 1956), the amount of remaining correlation is more than would be expected if the variables determining Experiment III results were the only source of commonality between Experiments I and II.

*We thank an anonymous reviewer for this suggestion.
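The rank-correlation analysis can be sketched briefly; the difference ranks below are invented placeholders, scipy's kendalltau is assumed for the ordinary coefficient, and the partial coefficient follows the formula given in Siegel (1956):

```python
# Sketch of the correlation analysis: constructions are ranked by the
# clear-minus-distorted score difference in each experiment, Kendall taus are
# computed between experiments, and a Kendall partial tau removes the
# contribution of Experiment III.  The rank vectors below are invented.

from scipy.stats import kendalltau

exp1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]    # difference ranks, Experiment I
exp2 = [2, 1, 3, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13]    # difference ranks, Experiment II
exp3 = [4, 2, 6, 1, 3, 5, 9, 8, 7, 12, 10, 13, 11]    # difference ranks, Experiment III

t_xy, _ = kendalltau(exp1, exp2)   # Experiments I and II
t_xz, _ = kendalltau(exp1, exp3)   # Experiments I and III
t_yz, _ = kendalltau(exp2, exp3)   # Experiments II and III

# Kendall partial rank correlation of I and II with III held constant (Siegel, 1956).
t_xy_z = (t_xy - t_xz * t_yz) / ((1 - t_xz ** 2) * (1 - t_yz ** 2)) ** 0.5
print(round(t_xy, 2), round(t_xy_z, 2))
```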
Although not significant by the F'min statistic, there was nevertheless a small effect of syntactic form. Inspection of the individual constructions suggests, however, that this is due to very large effects for some constructions, rather than being a consistent phenomenon as in Experiments I and II. For example, in Experiments I and II only three constructions out of thirteen failed to show the predicted effect, whereas in Experiment III six
out of thirteen failed. More significant than the overall results, however, is the fact that the pattern of results apparent in Experiment I and mimicked exactly in Experiment II is quite different in Experiment III. The correlation results confirm this interpretation. Subjects in Experiment III seemed to equate shorter length with greater simplicity. In the ten cases where one version was shorter than the other, subjects chose the shorter version as simpler eight times. This contrasts with the behavior of subjects in Experiments I and II who chose the shorter version four out of ten times. Thus, it seems clear that the results of Experiments I and II are not obtainable under any and all conditions. Nor does a crude notion of simplicity explain the behavior of subjects in Experiments I and II.
Conclusions

Taken together, the three experiments suggest that talkers' behavior in a What? situation is syntactically uniform: talkers interpret the What? query as a request for clearer sentential relations and modify their speech accordingly. In most cases the clarification is equivalent to the production of a sentence less transformationally derived than the original, and the results therefore indirectly support a relative derivational theory of complexity. The new paradigm presented here seems successful in bringing out talkers' structural knowledge by employing a formalized version of a natural situation. The experiments demonstrate that the apparently diverse responses that subjects could make in such a situation are in fact systematically ordered. Finally, the experiments demonstrate that abstract linguistic generalizations play an important role in everyday speech.
References

Akmajian, A. & Heny, F. (1975) An introduction to the principles of transformational syntax. Cambridge, M.I.T. Press.
Bever, T. G., Lackner, J. R., & Kirk, R. (1969) The underlying structures of sentences are the primary units of immediate speech processing. Perc. Psychophy., 5, 225-231.
Bever, T. G. & Langendoen, D. T. (1971) A dynamic model of the evolution of language. Ling. Inq., 2, 433-463.
Blumenthal, A. L. & Boakes, R. (1967) Prompted recall of sentences, a further study. J. verb. Learn. verb. Beh., 6, 614-616.
Bock, J. K. & Brewer, W. F. (1974) Reconstructive recall in sentences with alternative surface structures. J. exp. Psychol., 103, 831-843.
Burt, M. K. (1971) From deep to surface structure. New York, Harper & Row.
Chomsky, N. (1957) Syntactic structures. The Hague, Mouton.
Chomsky, N. (1964) A transformational approach to syntax. In J. A. Fodor & J. Katz (Eds.), The structure of language. Englewood Cliffs, N.J., Prentice-Hall.
Chomsky, N. (1965) Aspects of the theory of syntax. Cambridge, M.I.T. Press.
Clark, H. H. (1973) The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. J. verb. Learn. verb. Beh., 12, 335-359.
Fiengo, R. (1974) Semantic conditions on surface structure. Unpublished doctoral dissertation, M.I.T.
Fillmore, C. (1965) The indirect object construction in English and the ordering of transformations. The Hague, Mouton.
Fodor, J. A., Bever, T. G., & Garrett, M. (1974) The psychology of language. New York, McGraw-Hill.
Fodor, J. A. & Garrett, M. F. (1967) Some syntactic determinants of sentential complexity. Perc. Psychophy., 2, 289-296.
Garrett, M. F. (1975) The analysis of sentence production. In G. Bower (Ed.), The psychology of learning and motivation: advances in research and theory, vol. 9. New York, Academic Press.
Hakes, D. T. (1972) Effects of reducing complement constructions on sentence comprehension. J. verb. Learn. verb. Beh., 11, 278-286.
Jackendoff, R. & Culicover, P. (1971) A reconsideration of dative movements. Foundations of Language, 7, 397-412.
Jarvella, R. Starting with psychological verbs. Paper presented at the Midwestern Psychological Association, Cleveland, May, 1972.
Katz, J. J. & Postal, P. M. (1964) An integrated theory of linguistic descriptions. Cambridge, M.I.T. Press.
Levelt, W. J. M. (1970) A scaling approach to the study of syntactic relations. In G. B. Flores d'Arcais & W. J. M. Levelt (Eds.), Advances in psycholinguistics. New York, American Elsevier.
Mehler, J. (1967) Some effects of grammatical transformations on the recall of English sentences. J. verb. Learn. verb. Beh., 6, 335-338.
Miller, G. A. (1962) Some psychological studies of grammar. Amer. Psychol., 20, 15-20.
Rosenbaum, P. (1967) The grammar of English predicate complement constructions. Cambridge, M.I.T. Press.
Ross, J. (1967) Constraints on variables in syntax. Unpublished doctoral dissertation, M.I.T.
Siegel, S. (1956) Non-parametric statistics. New York, McGraw-Hill.
Smith, C. (1964) Determiners and relative clauses in a generative grammar of English. Lang., 40, 37-52.
Résumé

We hypothesized that a talker clarifies the sentential relations of an utterance if a listener indicates difficulty in hearing or understanding it. The subjects of the experiment read distorted sentences and syntactically clear sentences to the experimenter, who was seated in an adjoining room. The experimenter often asked "What?". The subjects changed the distorted sentences but kept the originally read version of the clear sentences. Other subjects were asked to make the sentences clear and simple to understand. The results obtained were the same. Talkers seem to interpret the "What?" as a request for clarification of sentential relations. These results show that the talker knows the underlying structure of sentences. Other explanations of this process can be rejected. A 'relative' theory of sentence complexity is presented.
Semantic bias effects on the outcomes of verbal slips
MICHAEL T. MOTLEY
Department of Speech Communication, California State University, Los Angeles

BERNARD J. BAARS
Department of Psychology, University of California, Los Angeles
Abstract

It has been shown previously that spoonerisms (such as barn door → darn bore) can be elicited by having subjects attempt to articulate a target (barn door) preceded by bias items which contain at least the initial phoneme (/d/) of the desired error outcome. Since certain linguistic characteristics of the error outcomes differ from those of their targets, variables which affect only these 'outcome' properties in a systematic way can be shown to be the result of prearticulatory output processes, independent of perceptual 'target' properties. The present study shows that the base-rate of errors produced by the phonetic bias technique can be increased dramatically by adding, to the word-pairs preceding the target, some items which are semantically synonymous to the error outcomes of the target. In this way, it is demonstrated rigorously that semantic bias increases the likelihood of slips of the tongue, which is one of the defining properties of so-called 'Freudian slips'. Implications are discussed.

The study of verbal slips has become a fruitful approach to an understanding of the cognitive processes of speech/language production (Boomer and Laver, 1968; Fromkin, 1973; Wickelgren, 1969). The basic rationale for this approach is that verbal slips represent breakdowns in normal language encoding; thus, if we can determine the nature of the specific encoding breakdowns responsible for these errors, we can better determine the kinds of processes operative in normal encoding.

Ever since Freud's (1965) discussion of verbal slips, psychologists and linguists have wondered whether semantic variables can influence the production of speech errors. Freud's view of verbal slips was, in effect, a
prediction that semantic influences which are independent of the intended utterance can effect a distorted utterance which more closely represents the meaning of the semantic interference than of the intended verbal output. Although the notion of the 'Freudian slip' has enjoyed intuitive popularity, there has been no replicable empirical evidence of this phenomenon, and there has been very little evidence of the specific speech encoding processes which could account for this type of verbal slip. More specifically, there has been no experimental evidence that the kinds of semantic considerations present in higher and more central stages of encoding (e.g., semantic stages) can distort the phonological and/or articulatory stages. Since evidence of Freudian slips has always been anecdotal, corresponding theories have been necessarily post hoc. In recent years, researchers have succeeded in artificially eliciting slips of the tongue (Baars and Motley, 1974; Motley and Baars, 1975b), now allowing a precisely replicable investigation of the potential of semantic bias to influence verbal slips.

We have investigated this question via a laboratory technique which elicits spoonerisms. A spoonerism is a verbal slip in which speech sounds (phonemes) are switched with one another; for example, the intended utterance blue chip stocks accidentally spoken as blue chop sticks. The spoonerism has been an especially popular type of verbal slip for psycholinguistic research, partially because of the clarity of its mutilation (see MacKay, 1970, 1971; Motley, 1973).

The technique for laboratory elicitation of spoonerisms (to be detailed below) consists of a tachistoscopic presentation of a word-pair list. The word pairs are read silently by the subject, with the exception of certain word pairs which are cued to be spoken aloud; these being target word pairs designed by the experimenter to elicit spoonerisms. The target word pairs are preceded by 'interference' word pairs (read silently) which are designed to more closely resemble the phonology of the desired spoonerism error than the phonology of the subject's intended target. (For example, the target word pair fruit fly - to elicit the spoonerism flute fry - might be immediately preceded by interference words such as flat freight and flag fraud.) This technique elicits spoonerisms on approximately 30% of the target word pairs attempted by the subject. (For details and variations of this technique, see Motley and Baars, 1975b.)

Previous research with this procedure has demonstrated that spoonerism frequencies are affected by certain characteristics of the spoonerism error itself, independent of the characteristics of the target. Motley and Baars (1975a) demonstrated, for example, that spoonerism frequencies increase according to the transitional probability of the initial phoneme sequence of the error: spoonerism frequencies increase for errors with higher word-initial phonotactic probabilities. Baars, Motley, and
MacKay (1975) demonstrated that spoonerism frequencies increase according to the lexical legitimacy of the error, independent of the lexical characteristics of the targets: spoonerism frequencies are greater for lexically legitimate errors than for lexically anomalous errors, regardless of their targets. The results of these earlier studies indicate that in this laboratory task, the cognitive processing which precedes articulation involves not only a consideration of the target, but also a consideration of its recoded (spoonerized) phoneme sequence. Moreover, these recoded, or restructured, phoneme sequences appear to be evaluated ('edited') according to their phonotactic and lexical characteristics. Thus, this laboratory technique has provided evidence of certain kinds of editing operations upon phoneme sequences destined for articulation (see Motley and Baars, 1975c).

The present study examines the possibility that in addition to phonotactic and lexical editing, semantic editing may also exist in the prearticulatory phase of speech encoding. A semantic analog to our earlier research would be a prearticulatory evaluation of the semantic characteristics of target and recoded phoneme strings, promoting those phoneme sequences which are semantically appropriate, and/or inhibiting the phoneme sequences which are semantically inappropriate. We sought evidence of such prearticulatory semantic evaluations via an adaptation of the spoonerism-generating laboratory technique. Specifically, the procedure was adapted to our present concern by adding, to the interference word pairs, semantic relatives of the spoonerized version of the target. That is, to the original phonologically interfering word pairs, we simply added interference word pairs similar in meaning to the expected spoonerism. (Pilot studies indicated that with semantic interference alone, the task can produce spoonerisms, but with a very low yield. Apparently, phonological interference instigates the phonological recodings upon which phonological, lexical and potential semantic edits may operate. For further discussion, see Motley and Baars, 1975c.) Thus, the hypothesis: Frequencies of spoonerisms will be significantly greater for word-pair targets preceded by both semantic and phonological interference than for targets preceded by phonological interference only.
Procedure
Subjects were 44 experimentally naive students of a required introductory communication course at California State University, Los Angeles. Ages ranged from 17-43 (median age = 21). The native language of all students was English.
The subjects' task was to participate in the spoonerism elicitation procedure outlined above. A list of 264 word pairs was tachistoscopically presented by a memory drum (Lafayette, Model 303BB). Each word pair was exposed for one second, with less than 0.10 sec between exposures. The cue for subjects to speak aloud the target word pairs (and certain neutral control word pairs) was a buzzer which followed the conclusion of each target's exposure by about 0.5 sec. That is, the subjects were instructed to read the word pairs silently; but upon hearing the buzzer, to speak aloud the word pair which had immediately preceded the buzzer. The subjects were also asked to attempt to retain the entire word-pair list for an ensuing (fictitious) recall task. The post-exposure cue and the fictitious recall task were designed to maximize the subjects' attention to all exposures. Pilot study post-experiment interviews indicated that subjects of this task are aware of producing errors, but are usually unaware of the precise nature of their errors (i.e., spoonerisms), are totally unaware of the relationship of those errors to earlier word pairs (i.e., phonological or semantic interference items), and usually do not suspect that their errors are a primary variable of the study. (By design, subjects tend to assume that the dependent variable is retention.) Subjects also report an inability to predict the buzzer. Typically, the subjects' strategy is to attend to and retain each word-pair exposure, independent of the others, just long enough to determine whether the buzzer will cue that word pair; and then to switch attention to the next word pair.
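The presentation schedule can be outlined schematically; the sketch below is our own summary of the protocol just described (the study used a memory drum, not software), and the word pairs shown are placeholders taken from the earlier fruit fly example:

```python
# Schematic sketch of the presentation protocol: each word pair is exposed for
# one second with a very short inter-exposure interval, and a buzzer sounds
# about 0.5 s after the offset of a cued pair, signalling the subject to speak
# that pair aloud.  Timings are from the text; the original apparatus was a memory drum.

import time

EXPOSURE = 1.0       # seconds each word pair is visible
INTERVAL = 0.09      # less than 0.10 s between exposures
BUZZER_DELAY = 0.5   # buzzer follows the cued pair's offset by about 0.5 s

word_pairs = [("flat", "freight", False),   # (word 1, word 2, cued to be spoken aloud?)
              ("flag", "fraud", False),
              ("fruit", "fly", True)]       # target pair

for first, second, cued in word_pairs:
    print(f"display: {first} {second}")
    time.sleep(EXPOSURE)
    if cued:
        time.sleep(BUZZER_DELAY)
        print("buzzer -> subject speaks the pair that just preceded it")
    time.sleep(INTERVAL)
```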
Word lists

Two word-pair lists were constructed; one for the Semantic Interference treatment, and one for the Semantically Neutral treatment; identical except for semantic bias items. The word-pair list for the Semantic Interference treatment contained 20 target word pairs as potential spoonerisms, each preceded by eight interference word pairs (actually, 4 interference pairs, each presented twice). Spoonerism target word pairs were designed such that a spoonerism switch of the words' initial consonants would result in a new meaningful English word pair. (For example, the target word pair light rake would be expected to spoonerize to right lake.) Interference preceded the target in the form of two word pairs (each presented twice) containing phonological interference, and two word pairs (each presented twice) containing semantic interference (see Figure 1). The phonological interference words were constructed such that the initial consonants and subsequent vowels of the first and second interference words were identical to the corresponding phonemes of the expected spoonerism. (For example, the
target word pair pine fig - expected to spoonerize to fine pig - might be preceded by interference word pairs such as fire pit and five pills.) The semantic interference word pairs were constructed such that their meaning would be similar to that of the expected spoonerisms, while independent of the meaning of their target word pairs. (For example, the target word pair get one - expected to spoonerize to wet gun - might be preceded by interference word pairs such as damp rifle and moist pistol.) Each of the 20 interference and target word-pair sets was arranged in the following order of presentation: 1) & 2) a semantic interference pair and its repetition, 3) & 4) a phonological interference pair and its repetition, 5) & 6) a semantic interference pair and its repetition, 7) & 8) a phonological interference pair
Figure 1. Sample from word-pair lists, with explanation.

[Figure 1 shows a sample stretch of the two word-pair lists side by side, the Semantic Interference treatment and the Semantically Neutral treatment, with columns giving each word pair in order of presentation, whether it was cued, and the expected oral response. In the sample, the target pair is 'rage weight' (expected spoonerism 'wage rate'); it is preceded in the Semantic Interference list by the semantic interference pairs 'pay bracket' and 'salary scale', in the Semantically Neutral list by their phonologically similar but semantically neutral mates 'pale braggart' and 'celery scare', and in both lists by the phonological interference pairs 'wait race' and 'waste rain', with neutral control pairs such as 'golf ball' and 'pencil point' interspersed. Designed function of word pairs: A & a, neutral control words to minimize predictability of cues and patterns; B, semantic interference toward the spoonerism of F; b, phonological relative of B, semantically neutral with respect to the spoonerism of f; C & c, phonological interference toward the spoonerisms of F and f; D & d, same as B & b, respectively; E & e, same as C & c; F & f, spoonerism target pair (the hypothesis predicts a greater frequency of spoonerisms for F than for f).]
and its repetition, 9) the spoonerism target word pair. Items #2, #5, and #9 were cued to be spoken aloud. Each set of interference and target items was separated by four to seven neutral control word pairs, some of which were randomly assigned to be repeated and/or cued to be spoken aloud. (These neutral control word pairs served as 'fillers' to prevent the subjects from noticing a pattern by which to predict the cue for the spoonerism target word pairs.) The word-pair list for the Semantically Neutral treatment was identical to the word pairs of the Semantic Interference treatment in every respect except one: Semantic interference word pairs of the Semantic Interference list were replaced in the Semantically Neutral list with word pairs semantically unrelated to the expected spoonerisms. These semantically neutral word pairs were designed to be quite similar in phonology to their semantic interference 'mates' on the word list of the Semantic Interference treatment (to control for any unknown phonological interference from the semantic interference word pairs). Figure 1 provides a detailed exemplification of the word-pair lists for both treatments. The Appendix provides a complete list of the targets, their expected spoonerisms, and their semantic interference items. A counter-balanced within-subjects design was employed. Each subject's trial consisted of a performance on the first half of the word list of one treatment followed immediately by the second half of the word list of the other treatment.

Analysis

Analysis consisted of recording and comparing, for each subject, the number of spoonerism errors committed on target word pairs in each of the two treatments. That is, each subject's spoonerism frequency on his/her 10 targets of the Semantic Interference word list was compared with his/her spoonerism frequency on the 10 targets of the Semantically Neutral word list. Two kinds of spoonerism errors were recorded: 'complete' spoonerisms, in which the initial consonants of both target words switched with one another (e.g., bad sum → sad bum); and anticipations, or 'partial' spoonerisms (e.g., bad sum → sad ...).
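The within-subject comparison just described can be sketched as follows; the per-subject error counts are invented, and scipy's wilcoxon is assumed for the signed-ranks comparison reported in the Results below:

```python
# Sketch of the analysis: for each subject, count spoonerisms (complete plus
# partial) on the 10 targets of each treatment, then compare the paired counts
# with a Wilcoxon signed-ranks test.  The counts below are invented.

from scipy.stats import wilcoxon

semantic_interference = [3, 2, 1, 2, 0, 1, 2, 3, 1, 0, 2, 1]   # errors per subject
semantically_neutral  = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]   # errors per subject

# zero_method='wilcox' discards tied (zero-difference) subjects, matching the report of ties.
stat, p = wilcoxon(semantic_interference, semantically_neutral,
                   zero_method='wilcox', alternative='greater')
print(stat, p)
```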
Results

The 44 subjects committed a significantly greater number of spoonerism errors for the Semantic Interference treatment than for the Semantically Neutral treatment. Specifically, 75 spoonerisms (completes and partials) occurred under the Semantic Interference treatment, versus 27 spoonerisms for the Semantically Neutral treatment, with 31 subjects performing as predicted (T(34) = 30, 10 ties, p < 0.001; Wilcoxon Signed Ranks test). Comparing complete spoonerisms only, these frequencies were 27 and 9, respectively, with 20 subjects performing as predicted (T(25) = 60, 19 ties, p < 0.01). For partial spoonerisms only, 48 versus 18, with 23 subjects performing as predicted (T(26) = 25.5, 18 ties, p < 0.001). We may thus accept the hypothesis.

A post hoc analysis of the number of spoonerisms generated by each of the 20 targets indicated that the above results may be generalized to all of the target pairs. For example, of the 75 spoonerisms occurring in the Semantic Interference treatment, 6 spoonerisms were yielded by each of 2 targets, 5 spoonerisms for each of 3 targets, 4 spoonerisms for 5 targets, 3 spoonerisms for 8 targets, and 2 spoonerisms for 2 targets; χ²(19) = 6.87, p > 0.05. The Appendix lists these frequencies for each target word pair.
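This post hoc check can be reproduced directly from the frequencies just given; scipy's chisquare is assumed here, with a uniform expectation of 75/20 spoonerisms per target:

```python
# The 75 Semantic Interference spoonerisms distributed over the 20 targets, as
# reported above: two targets yielded 6, three yielded 5, five yielded 4,
# eight yielded 3, and two yielded 2.  Tested against a uniform expectation,
# these counts reproduce chi-square(19) = 6.87.

from scipy.stats import chisquare

observed = [6] * 2 + [5] * 3 + [4] * 5 + [3] * 8 + [2] * 2   # sums to 75
stat, p = chisquare(observed)                                 # default expectation: uniform
print(round(stat, 2), round(p, 3))                            # 6.87, p well above 0.05
```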
Discussion

The results of the study demonstrate that the speech encoding systems of the experimental subjects were sensitive to semantic influence from the 'semantic interference' word pairs. Notice that the increase in frequency for semantically biased spoonerisms must be due to some semantic process occurring after the target has been recoded into the corresponding slip, since the semantic bias relates only to the outcome of the error, not the target itself. That is, these results could occur only if a semantic evaluation were performed on the spoonerized version of the targets. (Motley and Baars, 1975b, have demonstrated that these laboratory generated spoonerisms are not the result of reading errors or other errors of the targets' input into the encoding process, but rather are clearly the result of errors in attempting the targets' articulatory output.) These semantic evaluations of the recoded targets served either to inhibit the eventual articulation of semantically anomalous phoneme sequences, or to facilitate the articulation of semantically appropriate phoneme sequences, or both.

The implications of these results fall into two categories: implications regarding the causes of naturally-occurring verbal slips, and implications regarding normal, error-free language and speech encoding. Semantic influence such as that which facilitated our laboratory-induced spoonerisms might be responsible for some naturally-occurring verbal slips, as well. It seems unlikely, however, that such semantic influence is a primary
catalyst in most naturally occurring spoonerisms. Opportunities for the manifestation of semantic influence in spoonerisms would be rare, since this would call for a very exceptional situation in which a phoneme switch between the two words would accidentally create two new words which are closer to the semantic intent (or context) than were the originally intended words. Nevertheless, it now seems possible that, at least under optimal conditions, semantic influences might facilitate a natural spoonerism.

We would also expect that some sort of semantic interference similar to that evidenced in this study might be operative in at least some naturally-occurring verbal slips of the type discussed by Freud. That is, we see the present study as empirical support for the notion of 'Freudian slips'. There are, however, a number of differences between the type of verbal slip discussed by Freud and those generated in the present study: generally, in Freud's verbal slip examples, the source of phonological mutilation is not readily apparent from the phonological context of the intended utterance; and this is certainly not the case with spoonerisms. Moreover, the semantic interference in Freud's examples is supposed to originate from 'outside' of the total semantic context of the intended utterance; which was only partially the case for our semantic interference in the experiment, since our interference was within the context of the task and in close proximity to the targets although semantically independent of the targets. (We have recently completed a study which generated errors more analogous to Freudian slips in this respect. Using a single word-pair list containing phonological interference only, male subjects performing the task under threat of electric shock were more likely to make errors such as shad bock → bad shock; while male subjects performing the task administered by a physically attractive, provocatively attired female confederate experimenter were more likely to make errors such as goxi furl → foxi girl. See Motley and Baars, forthcoming.)

These differences are less impressive, we feel, than are the similarities between Freud's verbal slips and those generated in the present study: as in Freud's theory, the laboratory procedure demonstrates psycholinguistic interference contributing to a distortion of the speaker's intended utterance. Specifically, the procedure allowed semantic interference to contribute to a distortion of the phonology and subsequent articulation of the intended oral output. Moreover, as in Freud's view, the phonological distortion results in a meaningful utterance whose meaning is in fact closer to that of the semantically interfering information than to the intended utterance. We have seen evidence that prearticulatory decisions may be altered by semantic interference. The discovery that a certain psycholinguistic
manipulation can be manifested in the laboratory suggests that the same process occurs in normal language encoding. That is, we are proposing that semantically oriented prearticulatory adjustments of the phonology and articulation of impending speech might take place in the very final stages of psycholinguistic processing. The precise role and operation of this semantic sensitivity is yet unclear. It appears, however, that among the complex of operations responsible for language encoding is a prearticulatory operation which monitors the semantic legitimacy or appropriateness of information destined for immediate articulation. Presumably, the results of this check (via feedback loops, and perhaps feedforward) are typically positive, giving the articulation phase a 'go ahead' for articulatory output. What happens in the case of a prearticulatory semantic mismatch is for the present unknown. An intuitively attractive notion, however, is that there are a variety of mismatch types. Some would require that the encoder return to his semantic phase, 'starting over' in effect; and some might require a return to the syntactic phase for semantico-syntactic adjustments. On certain occasions, however, the mismatch may consist of relatively minor phonological and/or articulatory discrepancies between semantic intent and the 'readied' articulation. The study suggests that such semantico-phonological discrepancies may be corrected on the spot, presumably in prearticulation.

To assert the presence of such semantic editing is not necessarily inconsistent with the fact that some natural spoonerisms violate lexical and semantic restrictions (Motley, 1973). As discussed by Motley and Baars (1975c), we may assume that the function of phonological, lexical, and semantic prearticulatory edits in natural speech encoding is to insure that the speech output is (as intended) phonologically, lexically, and semantically appropriate. Similarly, it appears that the prearticulatory edits of subjects engaged in the task of this and similar experiments also function to facilitate output which is phonologically, lexically, and semantically appropriate - appropriate for the language, albeit at variance with the target. Spontaneous spoonerisms may be phonologically, lexically, and semantically appropriate; or phonologically and lexically appropriate, while semantically inappropriate; or phonologically appropriate, while lexically and semantically inappropriate; or (very rarely) phonologically, lexically, and semantically inappropriate (Motley, 1973). Inappropriate output presumably results from a failure in the performance of one or more edits. The causes of these edit failures are as yet unknown. There are indications, however, that one potential source of edit breakdowns is a prearticulatory 'timing schedule' which may force an output prior to its completion of the edit operations.

Although we cannot yet be certain of the precise manner of operation of prearticulatory semantic editing, we can at least fit one more piece into the
puzzle concerning the process and efficiency of language and speech encoding. Somewhere, somehow, manipulations may be performed on impending articulatory output to alter that output (if necessary) in the direction of semantic information available to the communicator. It is convenient for speakers of the language that these manipulations are rarely noticed; for this implies that either they are rarely necessary, or that when necessary, the manipulations are carried out to the advantage of the encoder. It is convenient for investigators of language processing, however, that these manipulations are occasionally responsible for speech errors. Natural speech errors led Freud to suspect that such operations were present, and through laboratory-induced speech errors those suspicions appear to be confirmed.
References

Baars, B. J., and Motley, M. T. (1974) The artificial induction of spoonerisms. In Proceedings of the Milwaukee symposium on automatic control (and autonomous computing). Milwaukee, University of Wisconsin Robotics and Artificial Intelligence Laboratory.
Baars, B. J., Motley, M. T., and MacKay, D. G. (1975) Output editing for lexical status in artificially elicited slips of the tongue. J. verb. Learn. verb. Beh., 14, 382-391.
Boomer, D. S., and Laver, J. D. (1968) Slips of the tongue. Brit. J. Dis. Comm., 3, 2-12.
Freud, S. (1965) In A. Tyson (Trans.), Psychopathology of everyday life. New York, Norton.
Fromkin, V. A. (1973) Slips of the tongue. Sci. Amer., 229, 110-117.
MacKay, D. G. (1970) Spoonerisms: the structure of errors in the serial order of speech. Neuropsychol., 8, 323-350.
MacKay, D. G. (1971) Stress pre-entry in motor systems. Amer. J. Psychol., 84, 35-51.
Motley, M. T. (1973) An analysis of spoonerisms as psycholinguistic phenomena. Speech Mono., 40, 66-71.
Motley, M. T., and Baars, B. J. (1975a) Encoding sensitivities to phonological markedness and transitional probability: evidence from spoonerisms. Hum. Comm. Res., 2, 351-361.
Motley, M. T., and Baars, B. J. (1975b) Laboratory induction of verbal slips: a new methodology for psycholinguistic research. Paper presented to the Western Speech Communication Association, Seattle.
Motley, M. T., and Baars, B. J. (1975c) Toward a model of integrated editing processes in prearticulatory encoding: evidence from laboratory generated verbal slips. Paper presented to the Speech Communication Association, Houston.
Motley, M. T., and Baars, B. J. (forthcoming) Effects of cognitive set upon laboratory induced verbal (Freudian) slips.
Wickelgren, W. A. (1969) Context-sensitive coding, associative memory, and serial order in (speech) behavior. Psychol. Rev., 76, 1-15.
Appendix

Target word pairs: Targets precede the colon, expected spoonerisms follow the colon, and both semantic interference word pairs are in parentheses. The digit in parentheses represents the number of spoonerisms yielded by the target in this experiment. Targets are listed in order of their presentation in the experiment.

(5) pick soap: sick pope (ill bishop, stricken priest)
(3) get one: wet gun (damp rifle, moist pistol)
(3) tame soon: same tune (familiar melody, similar song)
(5) gain mole: main goal (prime target, chief purpose)
(3) mice knob: nice mob (good group, pleasant gang)
(4) sat feet: fat seat (large rear, obese posterior)
(5) gum dies: dumb guys (foolish boys, stupid fellows)
(2) light rake: right lake (proper river, correct pond)
(6) pine fig: fine pig (quality sow, good hog)
(4) rage weight: wage rate (pay bracket, salary scale)
(6) flute fry: fruit fly (lemon beetle, apple bug)
(2) queer dean: dear queen (beloved monarch, precious princess)
(4) dig bait: big date (grand event, great day)
(3) bad sum: sad bum (moody tramp, gloomy hobo)
(3) witch run: rich one (wealthy man, affluent person)
(3) ball toy: tall boy (big kid, long lad)
(4) bad mug: mad bug (angry insect, irate wasp)
(4) keen mutt: mean cut (bad gash, nasty wound)
(3) song rack: wrong sack (false bag, incorrect pouch)
(3) leak mad: meek lad (gentle son, shy boy)
Résumé

It has previously been shown that spoonerisms of the type barn door → darn bore can be elicited by having subjects articulate a target (barn door) preceded by biased items containing at least the initial phoneme (/d/) of the expected error. Since certain linguistic characteristics of the error differ from those of the target, variables that systematically affect only this outcome can be shown to be induced by prearticulatory processes, independent of the perceptual properties of the target itself. The present study shows that the base rate of errors produced by the phonetic bias technique can be increased dramatically when word pairs semantically synonymous with the expected error are added to the pairs preceding the target. In this way it is rigorously demonstrated that semantic bias, which constitutes one of the defining properties of the so-called 'Freudian' slip, markedly increases its occurrence. The implications of this phenomenon are then discussed.
Language in the two-year old
SUSAN GOLDIN-MEADOW
MARTIN E. P. SELIGMAN
ROCHEL GELMAN

University of Pennsylvania
Abstract

Two stages in the vocabulary development of two-year-olds are reported. In the earlier Receptive stage, the child says many fewer nouns than he understands and says no verbs at all although he understands many. The child then begins to close the comprehension/production gap, entering a Productive stage in which he says virtually all the nouns he understands plus his first verbs. Frequency and length of word combinations correlate with these vocabulary stages.

*This research was conducted while the first author was an NICHHD graduate trainee supported by Grant No. HD-00337. It was supported in part by PHS Grant No. MH-19064 to M. Seligman and NICHHD Grant No. 04598 to R. Gelman. We thank E. Kohn, W. Postlewaite, K. R. Seligman, and E. Webber for their help in data collection; and H. Feldman, L. R. Gleitman, J. Jonides, J. McClelland, W. Meadow, and M. Shatz for reading earlier versions of this manuscript.
†It should be understood that publication of this article in Cognition is not an endorsement of the editors' policy for using space in a scientific journal for political editorials (cf. January 1974).

Young children are widely believed to understand a great deal more than they say (Ervin, 1964; Lenneberg, 1966; McNeill, 1966, 1970). This commonplace may mean a variety of things. For instance, parents frequently report that their children follow verbal instructions like "Go get the diaper and bring it to Mommy" long before the children actually say such sentences. More systematically, psycholinguists have reported that children understand utterances containing certain grammatical constructions before they say these utterances (Fraser, Bellugi & Brown, 1963; Brown & Bellugi, 1964; Lovell & Dixon, 1967; Shipley, Smith & Gleitman, 1969). For example, three-year-olds can point to the picture illustrating "the car is
bumped by the train" before they themselves produce passive sentences. Because comprehension of such complex utterance types outstrips production, it has been assumed, though not demonstrated, that this is also true for vocabulary. Here we examine the two-year-old's comprehension and production of nouns and verbs, and the relationship of his vocabularies to the word combinations he produces. We find that our subjects can be divided into two groups on the basis of their vocabulary comprehension-production data. A member of the Receptive group says many fewer nouns than he understands and says no verbs at all although he understands many. A child in the Productive group says virtually all the nouns he understands, and produces some verbs as well. The Productive child also tends to produce more and longer multi-word combinations than the Receptive child. Longitudinal data suggest that the child is a member of the Receptive group before he enters the Productive group. As a result of these data, we postulate two consecutive stages of language development in two-year-old children.
Cross Sectional Study

Method

Subjects and general procedure
The subjects were twelve white middle-class children ranging in ages from 14 to 26 months. Each child was tested individually in his own home. The testing was usually completed in three two-hour sessions, all held within a one-week period. Throughout the testing the experimenter encouraged the child and told him he was playing nicely, but no specific feedback was provided. We used two basic procedures, the comprehension and the production tests, to determine each child’s knowledge of both nouns and verbs. Comprehension and production questions for nouns and verbs were randomly distributed throughout the session, with the restriction that comprehension and production questions for a given word never occurred in immediate succession. Two experimenters were usually present during each session. The first experimenter carried out the vocabulary procedures. The second recorded the child’s spontaneous utterances along with their nonverbal contexts and noted general information such as the child’s articulation skills and responsiveness to verbal probes. A taperecorder was operated during the session if the second experimenter was not present. The child’s mother often remained in the room during the experiment in order to put the child at ease.
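The ordering restriction on the test questions can be illustrated with a small sketch (the word set is a toy example of ours, and rejection sampling is only one simple way to satisfy the constraint, not necessarily the authors' method):

```python
# Sketch of the ordering restriction described above: comprehension and
# production questions are shuffled together, but the two questions for the
# same word must never occur in immediate succession.  The word list is a toy example.

import random

words = ["ball", "hat", "eat", "jump", "clock"]
questions = [(w, task) for w in words for task in ("comprehension", "production")]

def constrained_order(items):
    while True:
        order = random.sample(items, len(items))
        # Reject any ordering in which adjacent questions test the same word.
        if all(a[0] != b[0] for a, b in zip(order, order[1:])):
            return order

for word, task in constrained_order(questions):
    print(task, word)
```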
Stimuli
There were 70 nouns and 30 verbs presented at least twice to each child, once in the comprehension test and once in the production test (see Table 1). The sample words were selected as representative of vocabularies of two-year-old children on the basis of pilot work done by one of us (M.E.P.S.). In this at-home study, several children were followed for five to eight consecutive days while the experimenter assessed as exhaustively as possible their entire receptive and productive vocabularies. We further restricted the set of test items by including a noun in the study only if a toy could be found to represent that noun. Each of the nouns on the sample list was consequently represented by a familiar toy. The verb list was limited to verbs which could be portrayed in action either by a person or by a toy.

Comprehension Test

For each of the test nouns the child was asked, "Where's (point to, show me, or bring me) the ___?", where the designated item was one of a large set of toys or part of the body. During the session, all of the toys were randomly spread on the floor around the child. Thus, for each comprehension question the child had to choose one object from among approximately 70 different objects. For verbs, we asked the child to perform the action indicated by the verb or to make a toy perform the action. For example, the child was asked to "Make the doll lie down", or to "Lie down" himself. Which of these instructions was given depended on the child's willingness to co-operate and the appropriateness of the agent for the action in question. Furthermore, for transitive verbs we asked the child to perform the action on an atypical object in order to prevent him from guessing the meaning of the verb from the accompanying noun. Thus the child was asked to "Eat the bear" or to "Drink the barrel"*. The response was counted correct if the child performed the action referred to by the verb, regardless of his choice of object or agent. Consequently, the child could either eat the bear or have the bear eat something, in order to receive credit for comprehension of the verb "eat". The items in the comprehension test were usually presented only once. However, an item was repeated at least twice** whenever the child either
*We realize that we are complicating the task by asking the child to perform the action on an unlikely object. However, Shatz (1975) has shown that young children are quite willing to respond with action to such bizarre requests as "Why don't you put your shoes on your ears?" Indeed, we found the children eager to co-operate and to respond to our somewhat unusual questions.
**Our intent was to continue retesting an item for as many as five trials. In practice, we seldom repeated the test item this often because the child usually lost interest in the item.
Table 1.
Number of cross-sectional children responding to each vocabulary item. The first number in parentheses shows how many children out of 12 knew the item receptively; the second number shows how many knew the item productively.
A. Nouns

Parts of the Body: foot (12,5), head (12,5), hair (11,7), mouth (11,5), hand (10,5), teeth (10,3), finger (9,4), arm (9,3), lips (7,1), tongue (7,1), knee (5,2), elbow (4,2), thumb (4,1), armpit

Articles of Clothing: hat (12,9), sock (11,7), button (9,8), belt (9,5), pocket (7,6), scarf (4,1), badge (1,0)

Animals: fish (11,8), cat (10,10), rabbit (9,9), bear (9,8), cow (8,7), pig (7,6), giraffe (5,5), butterfly

Parts of the House: clock (12,9), chair (12,9), table (12,9), door (11,7), window (11,5), house (12,6), floor (10,1), wall (8,5), pot (11,10), couch (12,9), sink (7,4), lamp (5,4)

Letters and Shapes: A (5,5), star (5,4), M (4,4), heart (0,0)

Vehicles: airplane, train

Food: banana (12,9), orange (10,8), grape (10,6), cake (9,7), cereal (8,5), sugar (8,4), mustard (4,2)

Miscellaneous Articles: ball (12,11), pillow (11,6), scissors (10,8), flower (10,7), crayon (10,4), money (9,9), paper (9,8), plate (9,5), mirror (8,4), ladder (8,3), broom (7,6), ring (6,4), cigarette (3,3), flag (3,2), tire (1,1), stamp

B. Verbs

Transitive Verbs: eat (12,7), throw (12,3), open (11,7), close (11,6), kiss (11,4), drink (11,3), blow (11,1), drop (10,2), hug (10,1), pick up (10,1), shake (9,2), touch (9,1), wash (8,4), step on (8,0), kick (6,2), push (6,1), pull (5,1), point to (4,0)

Intransitive Verbs: sit (11,6), jump (11,5), run (11,3), stand (11,2), lie down (11,1), fall (9,5), turn around (9,1), dance (8,1), fly (7,2), cry (5,2), smile (5,0), crawl (3,0)
failed to respond to the item, or responded with the wrong object or action. If the child was then correct on one of the retests, he was given credit for that item.

Production Test
To test the child's ability to produce each of the test nouns the experimenter pointed to the object in question and asked, "What's this?" To test the child's ability to produce each of the action verbs the experimenter asked, "What am I (the experimenter) doing?" or "What is the doll doing?" while the experimenter either performed the action herself or maneuvered a doll to do so. If a child failed to give a conventional response to a test item, he typically did one of two things: either he remained silent or he gave a non-standard answer. In either case, the item was subsequently retested on at least two trials. If the child who was initially silent gave a standard response on a subsequent trial, he received production credit for the item. If, over repeated trials, the child appeared to consistently use his own 'idiosyncratic' word which we could interpret, he was likewise given production credit. As an example, one subject consistently said "nite-nite" for pillow and thus was considered to have a production word for the object pillow. In addition, the child received production credit no matter what form he used in producing an item. For example, the child who said "eating" was treated the same as the child who said "eat".
Results
Table 1 shows the number of children correctly responding to each productive and receptive item. In general, we found very few incorrect responses and most of the children’s errors were those of omission rather than commission. There was no child who was correct on any given item on the production task and who failed that same item on the receptive task. Word frequency (as measured by Thorndike-Lorge lists from juvenile books) was found to correlate with the number of cross-sectional children who understood each of the 70 nouns (rs = 0.346, p < 0.005) and with the number of children who produced these nouns (rs = 0.220, p < 0.05). Parenthetically, our longitudinal vocabulary data show the same word frequency patterns as these cross-sectional data for both reception and production. That is, the words known by many cross-sectional children (the high frequency words on the T-L list) were the same words acquired early by the longitudinal children. Conversely, the words known by few cross-sectional children (the
low frequency words on the T-L list) were acquired later by the longitudinal children.
Table 2 shows the results of the vocabulary tests for each of the 12 children. The children were divided into two separate groups on the basis of these data. The Receptive group of children had ratios of noun comprehension to noun production of 2.7:1 or greater; that is, they understood almost three or more times as many nouns as they said. In addition, they produced no verbs at all although they understood many.

Table 2. Vocabulary ratios of the children in the cross-sectional study

Name        Age (months)   Noun Ratio (No. Comprehended/No. Produced)   Verb Ratio (No. Comprehended/No. Produced)
Michael     26             7.7:1 (46/6)                                 -      (22/0)
Lexie 1     22             5.0:1 (35/7)                                 -      ( -/0)
Melissa 1   21             4.4:1 (22/5)                                 -      (14/0)
Jenny 1     14             2.7:1 (27/10)                                -      ( 9/0)
Ray         27             1.5:1 (49/32)                                5.3:1  (21/4)
Sarah       23             1.3:1 (41/31)                                5.0:1  (20/4)
Perry       24             1.3:1 (49/38)                                2.8:1  (28/10)
Leah        26             1.3:1 (56/43)                                2.7:1  (27/10)
Harry       25             1.2:1 (56/45)                                2.9:1  (23/8)
Chris       26.5           1.2:1 (54/45)                                3.3:1  (23/7)
Peter       23             1.1:1 (49/43)                                1.7:1  (26/15)
Lee         26             1.0:1 (54/52)                                1.8:1  (28/16)
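The vocabulary ratios in Table 2 and the rank-order correlations reported throughout the Results are simple computations, and a minimal sketch may help make them concrete. The Python fragment below is our illustration, not the authors' procedure; it assumes the scipy library is available, and it uses the comprehension/production counts of four of the Productive children from Table 2 to show how a noun or verb ratio and a Spearman rank correlation (rs) between the two sets of ratios would be obtained.

```python
# Illustrative sketch only; counts are taken from Table 2 for four Productive children.
from scipy.stats import spearmanr

# (nouns comprehended, nouns produced, verbs comprehended, verbs produced)
children = {
    "Ray":   (49, 32, 21, 4),
    "Sarah": (41, 31, 20, 4),
    "Leah":  (56, 43, 27, 10),
    "Lee":   (54, 52, 28, 16),
}

def ratio(comprehended, produced):
    """Comprehension/production ratio, e.g. 49/32 is roughly 1.5:1."""
    return comprehended / produced

noun_ratios = [ratio(nc, np_) for nc, np_, _, _ in children.values()]
verb_ratios = [ratio(vc, vp) for _, _, vc, vp in children.values()]

# Spearman rank correlation between noun and verb ratios across children,
# analogous in kind to the rs values reported in the Results section
# (the published values were of course computed over all eight Productive children).
rs, p = spearmanr(noun_ratios, verb_ratios)
print("noun ratios:", [round(r, 1) for r in noun_ratios])
print("verb ratios:", [round(r, 1) for r in verb_ratios])
print(f"Spearman rs = {rs:.3f}, p = {p:.3f}")
```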
These children stand in contrast to the Productive children who had noun ratios of 1.5:1 or less; that is, they said almost every noun they understood. They produced verbs although not as many as they understood, and consequently verb ratios were not as low as noun ratios in this stage. There was, however, a correlation between the Productive children's noun and verb ratios (rs = 0.695, p < 0.05), indicating that relatively low noun ratios occurred in children with relatively low verb ratios.
In addition to the vocabulary differences between the Receptive and Productive groups, the children differed in their production of word combinations*.

*Previous research supports our findings of coincident vocabulary and syntax changes. See Nelson (1973) who found in her data a similar correlation between a jump in word production and the beginnings of phrase constructions.

Table 3 shows the average length of each child's multi-word com-
binations, and the longest utterance produced by each child during the testing sessions. The Receptive children’s longest utterances ranged from one to three words, while the Productive children’s longest utterances varied from four to eight words. Furthermore, the noun ratio, which defines the Receptive and Productive groups, was systematically related to the average combination length of these children’s utterances. Specifically, a decrease in noun ratio correlated with an increase in the average length of word combinations (rs = 0.874, p < 0.01). In other words, a prolific producer of nouns tended to be a producer of relatively long utterances. Utterance length has long been accepted as a gross measure of language development. Our data suggest that noun ratio can also be used to measure language development, at least at these early stages. Table 3.
Measures of utterance length correlated with noun ratio rank

Name        Rank (noun ratio)   Average length of combinations   Longest utterance
Michael     1                   2                                2
Lexie 1     2                   2.09                             3
Melissa 1   3                   -
Jenny 1     4                   -                                1
Ray         5                   2.14
Sarah       7                   2.71
Perry       7                   2.48
Leah        7                   2.87
Harry       9.5                 3.18
Chris       9.5                 2.34
Peter       11                  4.48
Lee         12                  4.20
In addition, we found that the Productive child tended to produce more multi-word combinations than the Receptive child: the Receptive children produced between 0 and 6 multi-word combinations per session (mean = 3.25 combinations), while the Productive children produced between 17 and 181 per session (mean = 58.12 combinations). Thus, the Receptive child not only produced shorter, but also fewer word combinations than the Productive child. The two groups of children also differed in their production of ‘idiosyncratic’ words. All of the Receptive children responded with at least one ‘idiosyncratic’ word in the production tests, while very few of the Productive children did. These same Receptive children did, however, respond to the
conventional word in the comprehension test. For example, Michael responded correctly to ‘clock’, ‘cat’ and ‘train’ on the comprehension test; however, in the production test he called each toy, “tick-tack”, “meow”, and “choochoo”, respectively.
Longitudinal Study

In order to demonstrate that these two groups constitute two consecutive stages rather than two different types of language learners, we studied three of the children longitudinally over a period of months. The testing procedure was identical to that described above.

Results
Table 4 presents the longitudinal data. All three children showed a shift from the Receptive to the Productive characteristics which define the groups found in the cross-sectional study. The three children began at a stage in which comprehension of nouns far exceeded their production, and verbs were understood but not produced. Over time, noun production scores started to catch up to comprehension scores and some of the tested verbs were produced for the first time. As in the cross-sectional study, the longitudinal data suggest that a decline in noun ratios correlates with a decline in verb ratios. Around the time the child began to produce nouns and verbs in earnest, he also increased his production of multi-word combinations. For example, while still in the Receptive stage (Session 2), Lexie produced only five multi-word combinations during one session. He then increased his production to 36 multi-word combinations during Session 4 (one week after he began to produce verbs) and finally produced 111 combinations during Session 6. All the sessions were of approximately equal duration. Melissa said 20 combinations during Session 3, the session when she first began to speak verbs, as compared to two combinations the session before. Jenny went from uttering no combinations to producing some during the session when she first spoke verbs. Thus, around the time the child becomes a full-fledged vocabulary producer he also becomes a producer of a substantial number of word combinations. However, a newly Productive child does not necessarily become an immediate producer of long combinations, as we might have predicted from our cross-sectional data. Lexie did not produce his first four-word utterance until Session 6 (5 weeks after verb production began), and Melissa and Jenny
Table 4. Vocabulary ratios of three children followed longitudinally

Name        Age (Mos. Wks.)   Noun Ratio (Comprehended/Produced)   Verb Ratio (Comprehended/Produced)
Lexie 1a    22.0              5.0:1 (35/7)                         -      (22/0)
Lexie 2     24.2              3.1:1 (54/17)                        -      (26/0)
Lexie 3     25.0              2.1:1 (58/28)                        9.0:1  (27/3)
Lexie 4     25.1              1.5:1 (61/40)                        3.9:1  (27/7)
Lexie 5     25.3              1.3:1 (61/48)                        (not recorded)
Lexie 6     26.1              1.0:1 (61/59)                        (not recorded)
Melissa 1   19.1              4.4:1 (22/5)                         -      (14/0)
Melissa 2   21.1              4.4:1 (40/9)                         -      (16/0)
Melissa 3   22.0              1.6:1 (46/29)                        (not recorded, but produced verbs spontaneously at this time)
Jenny 1     14.0              2.7:1 (27/10)                        -      ( 9/0)
Jenny 2     16.0              1.7:1 (33/19)                        -      (14/0)
Jenny 3     17.0              1.3:1 (38/29)                        4.5:1  (18/4)
Jenny 4     17.1              1.3:1 (45/34)                        3.0:1  (18/6)

a An exhaustive study was done by author M.E.P.S. on Lexie's noun vocabulary at age 20 months, 1 week. Applying our sample list to these data, Lexie's comprehension/production noun ratio at this age was 4.9:1 (34/7). Thus Lexie's noun ratio remained relatively stable during the two month interval between the exhaustive study and Session 1.
never produced more than two-word combinations during our entire testing period. Our data suggest that the increases in numbers of single- and multiword productions do not coincide with, but rather herald increases in utterance length. We can well imagine that long utterances are difficult to produce without a large vocabulary. Thus, a good sized productive vocabulary may be necessary for increases in utterance length. However, the developmental delay between vocabulary production and length increases in our longitudinal study indicates that a relatively large store of productive nouns and verbs is not sufficient to bring about increases in utterance length. Along with the beginnings of noun and verb production came a decline in idiosyncratic responses on the production test. As each child began to increase his production of nouns and verbs, he also abandoned his idiosyncratic words in favor of more conventional labels. Thus, Lexie in Session 3 not only responded to the word ‘pillow’ on the comprehension test, but also said “pillow” and no longer “nite-nite” on the production test. Two incidental findings bear comment. Despite all of these simultaneous developments in the child’s language, one which was not observed was a con-
current improvement in the child’s articulation. For example, Melissa was a poor speaker during Sessions 1 and 2 and remained so after she began to produce vocabulary words in Session 3. Lexie, on the other hand, was a clear speaker from the very beginning. Lexie’s constant receptive vocabulary size between Sessions 4 and 6 raises the possibility of a ceiling effect; that is, that the change in noun ratios is caused by the child reaching asymptote on the receptive noun list and not by a new productive strategy of vocabulary acquisition. In order to discount the ceiling hypothesis of diminishing ratios, we tested Lexie on a supplementary sample of 40 nouns during the longitudinal study at times 5 and 6. Lexie’s noun comprehension did not approach the ceiling of this new list at either of the additional testings, and his ratios were identical to those obtained with the standard 70 noun sample. These identical results from the new list suggest that presenting the same test lists to one child many times in succession does not produce spuriously low vocabulary ratios.
Discussion

Our study substantiates the commonplace with which this paper began: at a certain stage in development children do understand many more nouns and verbs than they say. We then find that, at a subsequent stage in development, children become reasonably competent language producers; that is, they develop a productive skill which works to close the gap between their comprehension and production vocabularies. This skill is tied neither to a particular lexical category nor to single unit utterances, an observation that we now focus on.

The Productive Skill
Lexical Categories

We have evidence of the productive skill in both of the two lexical categories included in this study, nouns and verbs. The children increased their noun production almost to the level of their noun comprehension at the same time as they began verb production. In addition, noun and verb production was correlated after the onset of the productive skill; the high noun producers also tended to be relatively high verb producers. Additionally, we found quantitative differences in the extent to which the productive skill affected the two lexical categories. All of the children's noun ratios were considerably lower than their verb ratios: nouns were easier
to produce than verbs. It is possible that this discrepancy between a child's noun and verb ratios may be due to a fundamental difference between simple nouns and simple verbs, i.e., we can point to a concrete noun's referent, but not to any instantaneous referent of a verb. Of course, we realize that such differences between nouns and verbs also lead to task differences in our testing procedures. These task differences, however, seem unavoidable and to a certain extent are part of the phenomenon itself.

Multi-Unit Combinations

The appearance of the productive skill in single word vocabulary was not an isolated development in the child's language. At about the time the child became a proficient single-word vocabulary producer, he increased his production of multi-word utterances as well. Thus, the productive skill that develops during the transition from the Receptive to the Productive stages is evidenced in multi-unit, as well as in single-unit utterances.

Origin of the Skill

We know from observations of our longitudinal subjects that this newly developed productive skill is not merely an outgrowth of motor development. The motor hypothesis maintains that the child develops sufficient control of his articulatory motor apparatus and thus is able to produce nouns, verbs, and combinations. However, we found no noticeable change in articulation over the course of our longitudinal study. Consequently, the onset of production cannot be explained at a motor coordination level, but must find explanation at another level of analysis, e.g., a linguistic or cognitive level.

The Nature of the Production Vocabulary
The productive skill not only affects the size of the child’s production vocabulary, but the nature of the production vocabulary as well. Before the children became language producers, their production vocabularies were qualitatively different from their comprehension vocabularies. The Receptive children, who understood and responded to the conventional labels for objects, concurrently produced idiosyncratic labels for many of the same objects. Thus, we found that in the Receptive stage the child’s lexical comprehension did not necessarily correspond to his lexical production. This lack of lexical correspondence also appears in the literature on overextension in child language (the extension of a label beyond the conventional definition limit). Huttenlocher (1974) found overextension in the child’s production of words but not in his word comprehension.
We find that with the advent of the productive skill, the lexical disparity between comprehension and production disappears; that is, idiosyncratic responses disappear in favor of conventional labels. Bloom (1973) also notes development during the early stages of language acquisition from a noncumulative production lexicon to one that is relatively permanent. Clark (1973) reports similar results in a brief survey of the diary data on child language. According to Clark, the diary studies show that overextension in child language ends as the children experience increases in production vocabulary. Thus, the children’s comprehension and production vocabularies appear to align during the production spurt. We can conclude from these data that, during the Receptive stage, the child’s production vocabulary is not merely a subset of his comprehension vocabulary, but is in part qualitatively different from his comprehension vocabulary. The onset of the productive skill heralds the elimination of this disparity.
Our original commonplace now appears to be somewhat incomplete. Children not only understand more words than they say, but at a certain stage they also understand different words than they say, even for the same referent. This finding has implications for theories of language acquisition in general, and for any particular theory which purports to account for a comprehension/production gap during the two-year-old stage. In general, our data suggest that we cannot easily infer comprehension knowledge from production data, nor can we readily infer production knowledge from comprehension data. If comprehension and production of vocabularies can be out of phase at one period of development, it is entirely possible that comprehension and production of other linguistic skills will not be synchronous at other stages. Thus, we require a description of both comprehension and production knowledge and of the relationship between the two in order to theorize about language acquisition.
In particular, a theory to explain the onset of vocabulary production during the two-year-old period must account not only for the changes in production, but also for the lack of changes in comprehension. For example, Bloom's (1973) explanation of the onset of naming production is based on production data alone; the hypothesis appears insufficient when comprehension data are considered. Bloom would explain the appearance of object and action naming (the productive spurt) in terms of the child's acquisition of a belief in object permanence. However, any child who is a vocabulary comprehender (regardless of whether or not he is a vocabulary producer)
must possess some permanent representation of objects and actions in order to understand words. The acquisition of object permanence must therefore coincide with, or precede, the beginnings of vocabulary comprehension. We have found that single word comprehension, and therefore some sort of object permanence, precedes single word production. For these reasons we conclude that the development of object permanence cannot account for the appearance of the child's new productive skill.
Our data suggest that the child's early production vocabulary is not merely a deficient comprehension vocabulary, but is to some extent a different vocabulary. Two questions are crucial for language acquisition theorists: (1) Why are comprehension and production vocabularies out of alignment during the early stages of language acquisition? and (2) How does alignment of the two vocabularies eventually come about? Any theory professing to account for the comprehension/production gap in early vocabulary acquisition must address these two questions.

References

Bloom, L. (1973) One word at a time. The Hague, Netherlands, Mouton & Co.
Brown, R. (1973) A first language. Cambridge, Mass., Harvard University Press.
Brown, R., & Bellugi, U. (1964) Three processes in the child's acquisition of syntax. In E. Lenneberg (Ed.), New directions in the study of language. Cambridge, Mass., MIT Press.
Clark, E. V. (1973) What's in a word? On the child's acquisition of semantics in his first language. In T. E. Moore (Ed.), Cognitive development and the acquisition of language. New York, Academic Press.
Ervin, S. M. (1964) Imitation and structural change in children's language. In E. Lenneberg (Ed.), New directions in the study of language. Cambridge, Mass., MIT Press.
Fraser, C., Bellugi, U., & Brown, R. (1963) Control of grammar in imitation, comprehension and production. J. Verb. Learn. Verb. Beh., 2, 121-135.
Huttenlocher, J. (1974) The origin of language comprehension. In R. L. Solso (Ed.), Theories in cognitive psychology. Potomac, Md., Lawrence Erlbaum Associates.
Lenneberg, E. H. (1966) The natural history of language. In F. Smith & G. A. Miller (Eds.), The genesis of language. Cambridge, Mass., MIT Press.
Lovell, K. & Dixon, E. M. (1967) The growth of the control of grammar in imitation, comprehension and production. J. Child Psychol. Psych., 8, 31-39.
McNeill, D. (1966) Developmental psycholinguistics. In F. Smith & G. A. Miller (Eds.), The genesis of language. Cambridge, Mass., MIT Press.
Nelson, K. (1973) Structure and strategy in learning to talk. Mono. Soc. Res. Child Devel., 38 (1-2), Serial No. 149.
Shatz, M. (1975) Towards a developmental theory of communicative competence: The comprehension of indirect directives. Unpublished doctoral dissertation, University of Pennsylvania.
Shipley, E., Smith, C. S., & Gleitman, L. R. (1969) A study of the acquisition of language: Free responses to commands. Lang., 45, 322-342.
Thorndike, E. L., & Lorge, I. (1944) The teacher's word book of 30,000 words. New York, Teachers College Bureau of Publications.
We analyze here the two-year-old child's vocabulary development in terms of two stages. During the first, 'receptive' stage, the child says far fewer nouns than he understands and produces no verbs, although he understands many. There then follows a coordination of comprehension and production: this is the 'productive' stage, in which the child says almost every noun he understands and uses his first verbs. Both the frequency and the length of word combinations are a function of the two stages described.
Cognition, 4 (1976) 203-214
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussions

Task-specificity and species-specificity in the study of language: A methodological note*

DANIEL N. OSHERSON
University of Pennsylvania

THOMAS WASOW
Stanford University
Some linguists and some psychologists find the following questions to be at the heart of their professional interests. (A) What is distinctively human about human intelligence? That is, in what ways is human intelligence the same as, and in what ways is it different from, other logically possible kinds of intellect, including but not restricted to the other biological species and existing computers? (B) In what ways do the several human faculties resemble each other and in what ways do they diverge? That is, what revealing comparisons can be made among the mental systems that constitute the competencies underlying language, logic, ethics, and aesthetics, among others? (A) and (B) are intimately related, both to each other, and to questions of ontogenesis. In this paper we wish to make some methodological remarks about the kind of answers they may receive. Both questions ask for comparisons, in the case of (A) among species of intelligence, in the case of (B) among human faculties. We take it to be obvious that such comparisons, to be revealing, must be made at a sufficiently theoretical level. Superficial differences in two phenomena can obscure deeper similarities that show up only in light of adequate theories for each; the same goes for superficial similarities and deeper differences. So (A) and (B) can be reformulated as follows.
*Preparation of this paper was supported in part by a National Science Foundation grant to the first author and a National Endowment for the Humanities Summer Stipend to the second author. We wish to thank Harris Savin, Lila Gleitman, Julius Moravcsik, and Francis Keil for helpful comments on earlier drafts.
(A*) In what ways is an adequate theory for human intelligence the same as, and in what ways is it different from, adequate theories for other logically possible kinds of intellect?
(B') In what ways do adequate theories for the several human faculties resemble each other, and in what ways do they diverge?
So as not to prejudge the answer to (B') in our statement of (A*), it is best to replace the latter with (A'):
(A') In what ways is an adequate theory for the linguistic capabilities of humans the same as, and in what ways is it different from, adequate theories for the linguistic capabilities of other logically possible kinds of intellect, and similarly for the other human faculties?*
The burden of this paper is that both (A') and (B') are multiply ambiguous as stated, and that failure to appreciate this can result in confusion. The expression 'adequate theory' is responsible for the ambiguity; for nothing has yet been said about the criteria for adequacy. It is to be expected that such criteria can be given only relative to a particular scientific discipline; an 'ultimate' sense of adequacy for theories of a human faculty is a chimera. As a consequence, questions within (A') and (B') will not, in general, admit of a single answer; rather, they will potentially be answered differently depending on one's criteria of theoretical adequacy, i.e., on one's discipline.
These complexities are especially relevant to the study of human language. Theories of this faculty may originate within at least three disciplines, which may be designated 'physiology', 'psychology', and 'linguistics'.** Each has its own goals, and thus each has its peculiar criteria for the adequacy of an account of human language. A physiological theory of language would specify the physical structure and processes that mediate language use. A psychological theory would describe the real-time sequence of decisions and information exchanges that figure in the production and processing of language. Within linguistics, such a theory would specify the class of
*To simplify formulation, we stipulate that all species of intellect have the same kinds of faculties, but that species may have zero capabilities within a given faculty. Also, we beg potentially troublesome questions concerning how to individuate faculties in any species (including our own).
**These sciences constitute levels of reduction in the following sense: to every event at a 'higher' (i.e., more abstract) level there corresponds an event at each 'lower' level. The implications of this fact, if any, will occupy us shortly. Note that we do not wish to exclude the possibility that there are other levels at which a theory of language could be constructed. Moreover, we recognize that what we characterize as levels may in fact be different points on a continuum of abstractness of descriptions. Finally, we admit to construing the goals of physiology, psychology, and linguistics more narrowly than participants in these disciplines will welcome. This caricature will speed our argument but not contribute to it substantively.
automata (called 'grammars') that are sufficient to provide a formal characterization of natural languages.
Each of these disciplines will thus have potentially different answers to (A') and (B') with respect to language. Physiology will speak to (A') by comparing the kinds of physical events needed to account for language in humans with those needed to account for language in such other species as Honeybee, Chimpanzee, and IBM 360/70. It will speak to (B') by comparing the kinds of physical events needed to account for language in humans with those needed to account for, say, ethical judgment and aesthetic appreciation in humans. Psychology makes the same kind of comparisons, except with respect to principles expressed in flow charts and information theory. Linguistics speaks to (A') by comparing the classes of automata sufficient to formalize all human languages (including the computationally weakest such class, from which 'optimal' grammars for human languages are drawn) with those sufficient to formalize nonhuman languages like machine codes and the Waggle Dance of the bee. Linguistics speaks to (B') by contrasting the classes of automata sufficient to formalize human languages with those sufficient to formalize, say, ethical judgment and aesthetic appreciation. Within all three disciplines the criterion for being a human language is learnability by children in the relatively careless fashion that characterizes normal language acquisition.*
These distinctions make a difference because there is likely to be little correlation between the answers for (A') and (B') that are supplied by different disciplines. This is true even when both disciplines are concerned with comparisons among the same species and faculties, and even when the disciplines are arranged in a reductionist hierarchy, as are physiology, psychology, and linguistics. To support this claim we invoke a computer metaphor. Imagine two IBM computers of identical model. Both are programmed in IBM Assembly language. However, one machine is programmed to evaluate polynomial functions of degree less than 10, whereas the other is programmed to evaluate polynomial functions of degree between 10 and 20 (assume that these functions are all defined over the same domain, which is bounded so as to prevent overflow; similarly, coefficients are to be bounded). Are the mathematical faculties of these two creatures the
706
Daniel N. Osherson
atId Thotnas
same? At
Wasma
th e ‘physlolog~cal’ level of circuits and transistors the machines are virtually identical. Similarly, at the psychological level of Assembly steps, both employ the same basic vocabulary and probably share a variety of subroutines. As one moves away from the details of the machine language to increasingly succinct flowcharts, differences appear. Finally, at the ‘linguistic’ level, where the two devices can be compared in terms of the functions they can compute, the differences are substantial. By altering the example, the correspondences and failures of correspondence will be differentjy patterned. In general, the connections between reductionist levels arc quite indirect, and care must be exercised in passing between them. In going from the abstract to the concrete, this is evident: there are many different algorithms for computing any recursive function, and there arc, in principle at least, many different ways in which any given algorithm could be implemented physically. The fact that a computer is, say, grinding out perfect squares tells us little about the program it is using and even less about the hardware. Similarly, a grammar tells us little about the actual mental computations involved in language production and processing, and still less about the central nervous system. * It is less evident, but we think true, that one cannot generally draw conclusions by going from the concrete to the abstract either. Though it may, in principle, be possible to determine an abstract description from the corresponding concrete one, when one is dealing with complex phenomena like language, this is generally not possible in practice. Our hypothetical computer’s behavior, which can be characterizcd abstractly by a mathematical function, may be fully determined by the circuits inside the machine; nevertheless, a description of the circuits in however much detail one desires would not significantly enhance our understanding of the function being computed. Indeed, even the program will not be of much help, if the function is nontrivial. Anyone who has tried to find simple arithmetical functions that correspond to fully specified Turing machines (or vice versa) will recognize how obscure the connection can be between our intuitions about a function and an algorithm for computing it. In the case of language, this means that neither a description of the behavior of the organs of language (including the brain) nor a r&-time process model for language use is likely to go very far toward providing a characterization of the classes of abstract automata that can formalize the set of humanly possible languages. The latter might be logically deducible from the former, but it would be surprising if this deduction were feasible. All of this is of course rather hypothetical, given how little is known about language at any level; it is intcndcd only to dispel the apparently common misapprehension *I his point
is made also by Kac (1974)
Task-specificity and species-specificity in the study of language 207
that real-time process models for language or physiological theories of language would be superior to grammatical theories on the grounds that they would account for everything linguistics accounts for, and then some. Behind the tenuousness of the connections between different levels of the reductionist hierarchy lies the fact that theoretical results on one level cannot be translated straightforwardly into results on another. A ‘straightforward’ translation is one in which the natural kind terms figuring in theoretical statements at one level are mapped into natural kind terms figuring in the corresponding theoretical statements at another level; such translation is not, in general, to be expected.* Thus, there is no reason to expect that a component of a theory at one level should correspond to an identifiable component of a theory at another level. For example, a program for calculating the function f (x, JJ) = x2 - 4.’ need not have a chunk that corresponds to either exponent 2; instead, the program could mutiply (x + v) with (x - J)). Analogously. even a transformation as well motivated linguistically as subject-auxiliary inversion in English may not, and probably will not, be localizable in a specific area of the English-speaker’s brain. Nor need a process model of English production or perception contain a recognizable correlate of this transformation. With these points in mind, let us return to the questions with which we started. For a given faculty, (A’) is often called the species-specificity question, and (B’) is often called the tusk-specificity question. We are now in a position to be unhappy with any discussion of these issues that is not sensitive to the multiplicity of criteria that may be invoked as part of the answers to these questions. The relevance of such multiplicity to the speciesspecificity question is obvious. Consider the current dispute over the linguistic competence of certain culturally advantaged chimpanzees (Gardener & Gardener, 1969; Premack, 1971; Brown, 1970; Fodor, Bever, & Garrett, 1974). At least three questions can be asked: (a) Is the neural substrate for these animals’ linguistic capabilities similar to that of humans? (b) Are the psychological processes similar? And (c) Are the formal principles sufficient for characterizing their languages similar to those that are sufficient for human languages? Our point is that these questions are empirically and conceptually distinct, and all are relevant to the species-specificity issue. The situation is the same with respect to task-specificity, but here we wish to dwell awhile since it is of much concern at present within developmental
*I:odor indebted.
(1975, pp. 9 - 26) provides an illuminating See also Putnam (1973) for a similar view.
discussion
of this point,
to which
we are
208 Daniel N. Osherson and Thomas Wasow
psychology (e.g., De-Zwart, 1974). In this debate (B’) is construed as a question about nativism, i.e., the question of what dispositions are built into an infant that allow him to learn language. It is beyond dispute that some innate equipment figures in the acquisition of language (otherwise, the baby’s rattle would learn language as well as the baby, since they have comparable linguistic environments). The only question at issue is whether this innate structure has significant components that subserve the development of no other faculty than language.* In what follows we shall confound the following two questions: (a) does the mature linguistic faculty rest upon intellectual components that are specific to language?; and (b) does the acquisition mechanism for language include components that are specific to language?. It is logically possible that (a) must be answered differently than (b). But we suspect that as a matter of fact, task-specific components of language in the adult result from task specificities in the acquisition mechanism. However, nothing more than expositional comfort is at stake here. If this empirical assumption is false, our arguments for the multiplicity of approaches to (B’) are unaffected; they simply need to be applied to yet a third question, parallel to (B’), that concerns the acquisition mechanisms underlying the several faculties. It is time now to be somewhat more substantive about the task-specificity question within the reductionist hierarchy. At the physiological level, the question becomes: do humans have biological adaptations whose only function is linguistic? It seems fairly clear that the answer is affirmative. The shape of the human vocal tract differs markedly from that of other primates (Lieberman, 1972), and what is more important for present purposes these differences appear to have no other use than to increase the number of version of the task-specificity possible speech sounds. ** The physiological issue is more interesting in its relation to adaptations in the central nervous system. The human brain may have certain areas whose sole function is the processing of linguistic material. The high degree of localization of linguistic functions in the brain (Whitaker, 1971) supports this idea, but leaves open the possibility that such areas might have nonlinguistic functions as well (Luria, 1975). The picture is further complicated by the ability of other portions of the brain to assume responsibility for language processing in case of injury (Lenneberg, 1967). A case can be made for the claim that people are genetically programmed to develop ‘language centers’ in the brain, but
*The same considerations apply if the question is narrowed to a specific comparison between language and some other, given, faculty. itself to speech the human pharynx has **lndccd, Bolingcr (1975, p. 3) claims that “by adapting of choking on food. created a hazard that did not exist before,” viz., incrcascd probability
Task-specificity and species-specificity in the study of language 209
that their precise location is flexible. See Lenneberg & Lenneberg (1975) for discussion of this and related matters. On the other side of the ledger, it is completely evident that there are biological adaptations with both linguistic and nonlinguistic functions. On the psychological level, the task-specificity question becomes: are the sorts of strategies people use to process and produce language different in kind from the sorts of strategies used in other cognitive tasks? In virtually all respects the matter is still quite open (see Fodor, Bever, & Garrett, 1974, pp. 462 - 468). Only the study of speech perception has yielded evidence for language specific psychological processes. For example, the perception of stop consonants imposes categorical distinctions on a number of physically defined continua for both adult and infant listeners (Eimas, 1973). Such auditory processing has been taken to be language-specific in view of the fact that physical continua involved in non-speechlike sounds are not generally perceived categorically. That is, for non-speech sounds listeners can discriminate considerably more stimuli on a given continuum than they can identify absolutely, whereas this is not always true of the physical continua involved in speech sounds. However, recent work has complicated this picture as well. Cutting and Rosner (1974) and Jusczyk et al. (1975) have provided evidence for categorical perception in both adults and infants for the ‘rise time’ of square waves. Rise time is the physical continuum that underlies the perceptual contrast between plucking and bowing a stringed instrument. Inasmuch as plucking and bowing are nonspeech stimuli, the task-specificity of categorical perception is brought into question.* For a thorough review of the evidence on the language-specific nature of speech perception, see Stevens and House, 1972. That there are psychological processes mediating both linguistic and nonlinguistic functions is not open to much doubt, The disambiguation of sentences in context, for example, probably requires mechanisms of memory and attention that are ubiquitous in the human faculties. On the linguistic level the task-specificity question becomes: are the formal devices needed to characterize the class of possible human languages different from those needed to characterize other human faculties?; that is, does the class of automata that contains grammars for every natural larlguage but no grammars for nonnatural languages make use
*Morse and Snowdon (1975) have observed categorical perception in rhesus monkeys for human speech sounds. This finding is obviously relevant to the species-specificity question. However, it bear> on txk-specificity only in conjunction with an assumption about the evolutionary course of this perceptual adaptation, namely, that when it was recruited for human speech it retained its former nonlinguistic functions as well. Although this assumption is not implausible, it is not obviously true either.
2 10
Daniel N. Osherson and Thomas Wasow
of formal devices that play no role in similar formalizations of other human faculties? Again, in all likelihood there are many nonlanguage specific properties of grammars that will figure in a true linguistic theory; an example is rule ordering, which will likely be required in the formalization of deductive intuition (see Osherson, 1975, 1976). That there are at least some taskspecific properties of language at the linguistic level is clearest in the domain of phonology. It is widely agreed that a characterization of the class of possible sound systems for natural languages must include a set of ‘distinctive features,’ i.e., criteria for distinguishing between sounds. A number of candidate sets have been proposed (e.g., Trubetzkoy, 1958; Jakobson, 1941; Ladefoged, 1975; Chomsky & Halle, 1968), and there is controversy over the membership. However, there is little question that these features are specifically linguistic, i.e., that they play no role in characterizing other human capacities. The task-specificity issue is more controversial in the domain of syntax. We know of only one attempt to deal with this question directly, viz., Wasow (1974). To understand Wasow’s argument, let us agree to construe the language faculty more narrowly than usual, as concerned exclusively with systems of sound-meaning correlation. Nonspoken ‘languages’, such as Israeli Sign and Classical Chinese (which has only a written form) are thus excluded from the class of natural languages. Wasow compared these nonspoken languages to spoken languages with respect to their resources for expressing thematic relations like agent and patient.* Spoken languages invariably use syntactic devices like word order or case marking for this purpose, but these devices are not generally available in nonspoken languages.** Thematic relations participate in various nonlinguistic abilities (e.g., the attribution of causality). Wasow’s observations suggest that the syntactic devices just mentioned are not manifestations of a general cognitive
“The thematic relations are to be distinguished from the purely syntactic notions of subject, direct object, and indirect object. Thematic relations determine the argument places to be occupied by the Grammatical relations (as we use the NP’s in what Jackendoff (1972) calls the ‘functional structure’. term here) are purely syntactic notions, and a clause may change its grammatical relations (though not itsthematic relations) in the course of a derivation. Hence, it makes sense to talk of the ‘surface subject’ of a clause, but not of the ‘surface agent’, We ignore here the distinction Chomsky (1965) makes between ‘grammatical relations’ and ‘grammatical functions’, using the former term ambiguously to encompass both concepts. **Wasow relied on a report by Schlesinger (1971) that Israeli Sign language does not use word order or case-marking to express thematic relations. There is currently controversy, however, over Schlesinger’s claim, and more generally over the syntax of sign languages. If word order or inflection turn out to have thematic consequences in sign systems, then this characteristic will not be peculiar to spoken language but to a broader class of expressive communication systems.
Task-specificity and species-specificity in the study of language 2 11
requirement for employing thematic relations in thought or communication (since thematic relations appear without them in Israeli Sign and Classical Chinese), but instead are peculiar to spoken language: for, neither the ordering of a set of elements, nor changes in their morphology are employed to represent thematic relations in other cognitive systems (so far as we know), not even in nonlinguistic systems like Israeli Sign and Classical Chinese that share with natural language essentially identical purposes, namely, the expression and communication of thought. Wasow’s argument raises interesting questions for psychology and physiology; it is presently unclear, for example, to what extent the auditory/ vocalic modality, per se, dictates the use of order and inflection to express thematic relations. But note that whatever answers this kind of question receives from physiologists and psychologists, Wasow’s argument is unaffected. At the linguistic level of abstract automata, the use of element order and inflection to represent thematic relations is specific to natural language if and only if it occurs in optimal formalizations of natural language (which it does), and it does not occur in optimal characterizations of other human faculties (which it seems not to, in light of its not showing up in Classical Chinese and Israeli Sign despite the overlap in function between spoken language and these nonspoken systems). The claim that the use of order and inflection to express thematic relations is language specific in the above sense gains credibility from the fact that a closely related syntactic phenomenon also appears to be specific to language. This phenomenon concerns the grammatical relations of subject, direct object, and indirect object (c.f., footnote** on page 210). Although there is disagreement among linguists regarding the role these notions should play in grammatical descriptions (Postal, 1975) there seems to be little question that they do play a role in linguistic theory, that is, in the characterization of the class of possible human languages. The following is a sample of the proposed linguistic universals that center on grammatical relations (for many others, see Greenberg, 1963): (1) The Specified Subject Condition: No rule can involve X, Y in the structure . ..X... [ a . ..Z...-WYV...] . . . where Z is the specified subject of WYV in @. (Chomsky, Y973, p. 239) (2) The Fixed Subject Constraint: No movement rule T may delete a (part of a) subject if it lies next to a complementizer not mentioned in the structural description of T. (Bresnan, 1972, p. 308) (3) The Agreement Law: Only subjects, direct objects, and indirect objects can trigger verbal agreement. (Perlmutter & Postal, 1974, p. 1) The notions of subject and object employed here are purely syntactic: grammatical subjects that are not agents (or ‘logical subjects’) obey these
2 12 Daniel N. Osherson and Thomas Wasow
constraints, whereas agents that have ceased to bear grammatical relations do not. Now in view of the extent to which linguists of different theoretical persuasions have found it necessary to refer to grammatical relations, it is not unlikely that an optimal characterization of the class of possible human languages will make use of them. Further, there is no indication (so far as we know) that anything formally analogous to subject, direct object, and indirect object plays a role in other human faculties.* Though thematic relations like ‘agent’ may be ‘part and parcel of our way of viewing the world’ (Schlesinger, 197 1, p. 98), the related but distinct grammatical relations like ‘subject’ seem to be purely linguistic. This provides further evidence for task-specific components of language at the linguistic level. To be clear: We are convinced that the grammatical notion of ‘subject’ owes its existence, ultimately, to psychological and neurophysiological circumstances (in that order), and we cheerfully admit the possibility that this reductionist story has affinities to the story for, say, chess. Nonetheless, at the linguistic level, there are grounds for believing that grammatical relations like ‘subject’ are unique to the faculty of language. We have advocated pluralism. Different disciplines, at a given point in their development, will have different answers to (A) and (B). This is as it should be, both in terms of the insight yielded by such diversity and in terms of the progress that can result in seeking whatever reductionist ties are possible between sciences. Still, excesses are possible, so we are disposed to end on a cautionary note. A science is defined both by its concepts and by its analytic tools; but not any combination of these things constitutes a discipline whose answers to (A) and (B) are worthy of attention. For example, if the linguist were to make use of only the concepts and methods of recursive function theory, the lheory of language that resulted would not include the notion of subject, which, as we saw, will likely figure in an adequate theory of language at the linguistic level. Similarly, if psycholinguists limit themselves to the concepts and techniques used in psychological studies of, say, memory or perception, then for this reason alone they may not discover anything unique to the human faculty of language, and will consequently answer both the task- and species-specificity questions in the negative.** More generally, we believe that any inquiry whose direction is determined solely by the available research techniques is unlikely to tell us much of value. The methods and concepts employed must be tailored to
*This is an empirical claim, of course, so further research into the other faculties may eventually falsify it. **Much the same sentiment is expressed, albeit more forcefully, by Chomsky (1975, pp. 26 - 27).
Task-specificity
and species-specijicity
in the study
of language
2 13
the questions one seeks to answer (in our case, questions (A) and (B) above). As we have seen, they must also fit the level at which one chooses to approach the phenomenon.
References Bolingcr, D. (1975), Aspecrs of I,anKuaXe, (2nd Edition), New York, Harcourt, Brace and Jovanovich. Bresnan, J. (1972), Theory of Complementation in English Syntax, unpublished M. 1. T. dissertation. Brown, R. (197(J), The First Sentences of Child and Chimpanzee, in PsvcholinKuistics. Free Press. Chomsky, N. (1965),Aspects qfthe 7%eo[v ofSyntax, Cambridge, Mass., M. 1. T. Press. Chomsky, N. (1973) Conditions on Transformations, in S. Anderson & P. Kiparsky (eds.), A Restschrift for Morris Halle, New York, Holt, Rinehart, and Winston. Chomsky, N. (1975), Re,jlection on I,anRuaKe, New York, Pantheon Books. Cutting, J. & Rosner, B. (1974) Categories and boundaries in speech and music, Percept. PsJjchophvs, 16, pp. 564 - 571. dc Zwart, H. (1973), Language Acquisition and Cognitive Development, in Moore, T. (ed.), Cognitive Development and the Acquisition of Laquage, New York, Academic Press. Eimas, P. (1973), Developmental Studies of SpeechPerception, Brown University Technical Report. I:odor, J. (1975), The I,angua~e of Thoupht, Crowell. Fodor, J., Bever, T., and Garrett, M. (1974), The PsycholoR.l, of‘f,anXua~e, New York, McGraw-ffill. Gardener, R. & Gardener, B. (1969), Teaching Sign Language to a Chimpanzee, Science, vol. 165, pp. 664 - 672. Greenberg, J. (ed.) (1963), Universals in Language, Cambridge, Mass., M. 1. T. Press. Jackendoff, R. (1972), Semantic Interpretation in Generative Grammar, Cambridge, Mass., M. I. T. Press. Jakobson, R. (1941), Kindersprache, Aphasie, und all~emeinr Lautgesetze, The Ilague, Mouton. Jusczyk, P., Rosner, B. Cutting, J., I:oard, C. & Smith, L (1975), Categorical Perception of Nonspeech Sounds in the Two-Month-Old Infant, Paper prcscnted at the Society for Research in Child Dcvelopment Meeting, April I 1, 1975. Kac, M. (1974), Autonomous Linguistics and Psycholinguistics, paper given at the Linguistics Society of America Meeting, Amherst, Mass. Ladefoged, P. (1975), A Course in Phonetics, New York, Harcourt, Brace and Jovanovich. Lenneberg, E. (1967), Biological Foundations of Language, New York, Wiley & Sons. Lenncberg, E. H. & Lenneberg, E. (Eds.) (1975), Foundations ofI.anguage Development, New York, Academic Press. Lieberman, P. (1972), The Speech ofprimates, The Hague, Mouton. Luria, A. (1975), Basic Problems of Language in the Light of Psychology and Neurolinguistics, in Lenneberg & Lenneberg. Morse, P. & Snowdon, C. (1975), An investigation of categorical speech discrimination by rhesus monkeys, Percepf. Psychophys., 17, pp. 9 - 16. Osherson, D. (1975), LogicalAbilities in Children, Vol. 3, New York, Erlbaum. Osherson, D. (1976), LogicalAbilities in Children, Vol. 4, New York, Erlbaum. Perlmutter, D. & Postal, P. (1974), Some laws of grammar, mimeo for the Summer Linguistics Institute, Amherst, Mass. Postal, P. (1975), Avoiding reference to subject, manuscript. Premack, D. (1971), Language in Chimpanzee? Science, Vol. 172, pp. 808 - 822.
2 I4
Daniel N. Osherson and Thomas Wasow
Putnam, H. (1973), Reductionism Schlesinger, 1. (1971), Production
and the nature of psychology, Co,e., Vol. 2, No. 1, pp. 131 - 146. of Utterances and Language Acquisition, in D. Slohin (ed.), 7%~ Ontogenesis o,f (irammar, New York, Academic Press. Stevens, K. & House, A. (1972), Speech Perception, in J. Tobias (Ed.) Foundations of’Modcrn Auditory T/zcor_v, Vol. II, New York, Academic Press. Trubetzkoy, N. (1958), Gnrndz~ge der Pkonologie, Gottingcn, Vandenhoeck and Ruprecht. Wasow. T. (1974), The Innateness Hypothesis and Grammatical Relations, .S.rznfhese, 26.1. Whitaker, II. (1971), On the Representation of Language in the IIuman Brain, Linguistics Research, Inc.