Cognition, Vol. 7, No. 4



Cognition, 7 (1979) 323-331
© Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands

Does awareness of speech as a sequence of phones arise spontaneously?*

JOSÉ MORAIS, LUZ CARY, JÉSUS ALEGRIA and PAUL BERTELSON
Université libre de Bruxelles

Abstract

It was found that illiterate adults could neither delete nor add a phone at the beginning of a non-word; but these tasks were rather easily performed by people with similar environment and childhood experiences, who learned to read rudimentarily as adults. Awareness of speech as a sequence of phones is thus not attained spontaneously in the course of general cognitive growth, but demands some specific training, which, for most persons, is probably provided by learning to read in the alphabetic system.

Introduction

Alphabetic writing in first approximation represents speech at the level of units such as phone and phoneme.¹ Both spelling and reading in an alphabetic system imply, in addition to the ability to perceive minimal phonetic distinctions, an explicit knowledge of the phonetic structure of speech. For example, the reader/writer must not only be able to distinguish between cat and bat, but must also know that cat and bat consist of three units and differ only in the first. An important question is how this knowledge is attained.

In normal communication, people pay attention to meaning, not to the structural characteristics of the speech they hear and produce. However, conscious reflection on language and therefore explicit knowledge of the linguistic structures do occur. Awareness of speech as a sequence of phones, for instance, might appear spontaneously at some age, as a normal outcome of cognitive growth, through maturation and/or linguistic experience. Alternatively, it may require some specific training, which for most children is usually provided by reading instruction itself. The question is important not only from a theoretical point of view but also from a practical one: under the cognitive growth hypothesis, failures in learning to read can best be avoided by adjusting the age at which reading instruction is started to individual rates of development, while under the specific training hypothesis the solution should be sought in the improvement of educational practices.

*Reprints may be obtained from José Morais, Laboratoire de Psychologie expérimentale, Université libre de Bruxelles, 117 av. Ad. Buyl, B-1050 Bruxelles, Belgium.

¹While the term phone is generally used to indicate the more elementary units of speech that are perceptibly different, there is considerable disagreement in the literature about the definition of phoneme. In the traditional perspective, the phoneme is any collection of phones whose differences are irrelevant to meaning distinctions; in the generative-transformational perspective, the phoneme is an abstract representation that depends on morphemic information and relates to pronunciation through a set of rules. For a discussion of the distinction between phone and phoneme, from the latter point of view, in relation to the alphabetic system, see Gleitman and Rozin (1977). In the present text we shall refer to analysis into phones rather than into phonemes, because the experimental task simply required our subjects to manipulate different sounds without regard for meaning.

That the ability to manipulate phones is related to success in learning to read has been extensively documented. For instance, Savin (1972) reported that children who failed to learn to read by the end of the first grade were generally unable to learn Pig Latin. This “secret language” requires the shifting of the initial consonant cluster of each word to the end of the word and the addition of the sound [ei]. This fact, however, may reflect either a delay in the spontaneous acquisition of the ability to analyse speech into phones or the inability to make abstract inferences about the sound system of language from its alphabetic representation.

Some observations on the linguistic behavior of preschool children would suggest that insight into the phonetic structure of language may be possible before formal learning to read and write. Read (1978) could elicit phonetically correct judgments of similarity for vowels in kindergarteners. Slobin’s (1978) daughter engaged in rhyming play and noticed sound similarities in her own speech at 3;1: “eggs are beggs; more-bore”.
Preschool children apply the plural inflection to new words and appreciate the pronunciation of a sound in a word. However, the conscious manipulation of a particular phone or class of phones (like vowels, which are important in rhyme) does not necessarily imply awareness of speech as a sequence of phones. Phones that can be uttered in isolation may be more accessible, i.e., brought more easily to our awareness, than highly encoded ones. Awareness of such phones may be an example of awareness of a linguistic performance, rather than of a linguistic structure. The problem we consider here is how awareness of the phonetic structure, not of this or that phone, is attained. The few studies in which the development of the ability to make an explicit analysis of utterances into phones has been investigated do not permit one to choose between the cognitive growth and the specific training hypotheses. In one of those studies (Zhurova, 1973), children were shown dolls with colored jackets and told, for instance, “the boy with the


yellow jacket is Yan, the boy with the green jacket is Gan, the boy with the white jacket is Whan”, etc. Then, they were tested for the retention of names and questioned about other dolls with colored jackets that had not been shown before (pink, violet, etc.). The rule for new jackets was used successfully by 12%, 39% and 100% of the children in the 4 to 5, 5 to 6 and 6 to 7 years age groups. In another study (Liberman, Shankweiler, Fischer and Carter, 1974), children were asked to play a tapping game, in which segments of a word spoken by the experimenter had to be indicated by the number of taps. The segments were either syllables or phones. The authors found that none of the nursery school children (mean age: 4 years 10 months) could segment by phone (i.e., reach a criterion of six consecutive errorless trials) while 46% could segment by syllable. The percentage of children who were able to segment by phone increased in the other groups: 17% of the kindergarteners (mean age: 5 years 10 months) and 70% of the first graders (mean age: 6 years 11 months).

In both the Russian and the American studies the most dramatic progress in segmentation performance occurred between ages 5 and 6. As the Haskins workers pointed out, this increase “might result from the reading instruction that typically begins between ages five and six. Alternatively it might be a manifestation of cognitive growth not specifically dependent on training” (Shankweiler and Liberman, 1976). A test of the issue, they suggested, would be provided by a developmental study of segmentation skills in children learning to read in a logographic system, such as Chinese, which does not demand explicit phonetic analysis. However, such a study, they pointed out later (Liberman, Shankweiler, Liberman, Fowler and Fischer, 1977), can no longer be carried out in China, because children now learn to read alphabetic text before they start studying the logographic characters.
Fortunately, testing readers of non-alphabetic systems is not the only possibility. In communities where the writing system is alphabetic, there remains a minority of adults who either have never been taught to read or have dropped out of school at a very early stage. Illiterate people should be unable to perform tasks requiring conscious phonetic analysis, if the improvement observed between ages 5 and 6 is related to reading instruction. On the contrary, if the improvement is the result of some cognitive growth process, independent of reading, they would, of course, succeed.
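The Pig Latin rule mentioned above in connection with Savin (1972) can be made concrete with a minimal sketch. It operates on spelling rather than on sounds, which is a simplification: the vowel set, the function name `pig_latin` and the written suffix "ay" standing for the sound [ei] are assumptions made for illustration, not part of the original text.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: letters stand in for vowel sounds


def pig_latin(word: str) -> str:
    """Move the initial consonant cluster to the end and append 'ay'.

    The rule follows the description in the text: shift the initial
    consonant cluster to the end of the word, then add the sound [ei]
    (written here as 'ay').
    """
    lower = word.lower()
    for i, letter in enumerate(lower):
        if letter in VOWEL_LETTERS:
            # Everything before the first vowel letter is the onset cluster.
            return lower[i:] + lower[:i] + "ay"
    # No vowel letter at all: nothing to shift, only the suffix is added.
    return lower + "ay"


if __name__ == "__main__":
    for w in ("cat", "bat", "string"):
        print(w, "->", pig_latin(w))
```

The point of the illustration is that the transformation presupposes a segmentation of the word into an onset and a remainder, which is exactly the kind of explicit analysis at issue.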

Method

The present experiment was run in a poor agricultural area of Portugal (Mira de Aire, district of Leiria). Subjects were all of peasant origin, but most were now working in the textile industry. Thirty illiterate people (I subjects) and 30 people who learned to read beyond the usual age (R subjects) were tested. I subjects, 6 males and 24 females, were aged 38 to 60 and R subjects, 13 males and 17 females, were aged 26 to 60. Among I subjects, twenty had never received any instruction at all, four had been taught by their children to identify letters, and six had been in school for 1 to 6 months in childhood (some of them could “draw” their names). R subjects had attended classes for illiterate people organized by the government, by the Army or by industry. All were at that time 15 years old or more. Twenty-two, as a result, had received some kind of certificate and eight had failed to obtain any.

Two tasks were administered. In the “deletion” task, the subject had to delete the first phone from an utterance provided by the experimenter. In the “addition” task, he had to introduce an additional phone at the beginning of the utterance. In each group, half the subjects were assigned to each task. For each task, five subjects worked with the phone [p], five with the phone [ʃ], and five with the phone [m]; three different groups of consonants (plosives, fricatives and nasals) were thus represented in the experiment.

The test consisted of 15 introductory trials to illustrate the rule, and 20 experimental trials. The subjects were told that their task was to add one “sound” to (or delete one “sound” from) the utterances produced by the experimenter. In the introductory trials, these utterances were non-words which became words by adding (deleting) the phone assigned to the subject. For instance “alhaço” became “palhaço” (clown) and “purso” became “urso” (bear). A correction procedure was used at that stage: when the subject failed to produce the correct response, the experimenter provided it.
The experimental trials were of two types: in W trials, the experimenter uttered a word which, by the transformation rule, would become another word, for instance “uva” (grape) became “chuva” (rain), and vice versa; in NW trials, the experimenter uttered a non-word which would become another non-word, for instance “osa” became “posa”, “chosa” or “mosa” depending on phone condition. In both types of experimental trials, no information was provided after the subject’s response. The subject had been told beforehand that on some experimental trials the correct response might be a non-word. All the words were of current use and, in all probability, were known by the subjects.
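The two transformation rules can be sketched as follows, using the stimuli quoted in the text. This is only an orthographic approximation of tasks that were administered orally: the function names are hypothetical, and the spelling "ch" is used here as a stand-in for the phone [ʃ].

```python
def add_phone(utterance: str, phone: str) -> str:
    """Addition task: put the assigned phone at the beginning of the utterance."""
    return phone + utterance


def delete_phone(utterance: str, phone: str) -> str:
    """Deletion task: remove the assigned phone from the beginning of the utterance."""
    if not utterance.startswith(phone):
        raise ValueError(f"{utterance!r} does not begin with {phone!r}")
    return utterance[len(phone):]


if __name__ == "__main__":
    # Introductory trials: a non-word becomes a word.
    print(add_phone("alhaço", "p"))      # palhaço (clown)
    print(delete_phone("purso", "p"))    # urso (bear)
    # W trial: a word becomes another word.
    print(add_phone("uva", "ch"))        # chuva (rain)
    # NW trial: a non-word becomes another non-word, one per phone condition.
    print([add_phone("osa", ph) for ph in ("p", "ch", "m")])
```

Only the NW case forces the subject to apply the rule itself, since no existing word is available as a shortcut to the correct response.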

Results

In interpreting the results account must be taken of the fact that only NW trials provide unambiguous information regarding segmentation and fusion abilities. In W trials, the correct response might be found by searching the lexicon for a similarly sounding word. W trials yielded in fact better performances than NW ones. On NW trials, I subjects gave a very poor performance and R subjects quite a good one: mean correct responses were respectively 19% and 72%. The pattern of results is nearly identical for the two tasks (Table 1).

Table 1. Mean percentages of correct responses for each type of trial, task, and group of subjects. In parentheses, the percentage of subjects who attained 100% of correct responses.

              Addition task             Deletion task
Subjects      W trials    NW trials     W trials    NW trials
I             46 (13)     19 (0)        26 (7)      19 (0)
R             91 (33)     71 (13)       87 (47)     73 (27)
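As a small worked check, the overall NW means quoted in the text (19% and 72%) can be recovered from the NW cells of Table 1. The sketch below assumes that the two tasks carry equal weight, which follows from the Method section (half of each group worked on each task); the dictionary layout and names are illustrative only.

```python
# Mean percentage of correct responses on NW trials, copied from Table 1.
NW_MEANS = {
    "I": {"addition": 19, "deletion": 19},
    "R": {"addition": 71, "deletion": 73},
}


def overall_nw_mean(group: str) -> float:
    """Average the two task means, assuming equal numbers of subjects per task."""
    by_task = NW_MEANS[group]
    return sum(by_task.values()) / len(by_task)


if __name__ == "__main__":
    for group in ("I", "R"):
        print(group, overall_nw_mean(group))
```

Both values agree with the figures reported in the running text.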

Fifty percent of I subjects failed on all NW trials, while no R subject did. More than 50% of R subjects and only one of the I subjects gave 8 correct responses or more on the 10 NW trials (Figure 1). I subjects failed whatever the target phone: mean correct responses on NW trials were 17%, 19% and 20%, for [p], [ʃ] and [m] respectively. I subjects who had been in school for some time in childhood or who had been taught the names of letters (n = 10) performed somewhat better on NW trials (30%) than the remaining subjects (13%). The difference approached significance at p < 0.05 by a one-tailed t test (t = 1.696; df = 28). Within the R group, the mean percentage of correct responses on NW trials was 55% for the 8 subjects without a course certificate and 79% for the other 22. The difference is significant at p < 0.025 (t = 2.41; df = 28). On the other hand, R subjects who learned to read before age 25 (n = 10) did not perform significantly better than those who learned beyond that age (75% and 71% respectively; t = 0.384; df = 28).

The analysis of errors on NW trials revealed that only 19% of the incorrect responses made by I subjects involved the correct deletion or addition of the required phone plus some other transformation, while these kinds of responses represented 56% of the R subjects’ errors.² A tendency to produce words in response to non-words was present in both I and R groups and accounted for, respectively, 46% and 32% of the errors; however, the proportion of wrong responses that both were words and involved the required phone³ was much smaller in group I (6%) than in group R (28%). The great majority of errors made by I subjects can thus be linked to lack of awareness of phonetic structure, while an important portion of the errors made by R subjects were apparently due to some other cause.

²An example is the response pili instead of pécli.
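The subgroup means reported above can be checked against the overall NW means (19% for group I, 72% for group R) with a weighted average. This is a rough consistency sketch, not a reanalysis: the subgroup sizes of 20 are derived here as 30 minus the stated n of 10, and since the published percentages are rounded, only approximate agreement can be expected.

```python
def weighted_mean(parts):
    """Weighted mean of (n_subjects, mean_percent) pairs."""
    total_n = sum(n for n, _ in parts)
    return sum(n * pct for n, pct in parts) / total_n


# (number of subjects, mean % correct on NW trials), as reported in the text.
I_BY_SCHOOLING = [(10, 30), (20, 13)]    # some schooling or letter names vs. the rest
R_BY_CERTIFICATE = [(8, 55), (22, 79)]   # without vs. with a course certificate
R_BY_AGE = [(10, 75), (20, 71)]          # learned to read before vs. after age 25

if __name__ == "__main__":
    print(round(weighted_mean(I_BY_SCHOOLING), 1))    # about 18.7, reported 19
    print(round(weighted_mean(R_BY_CERTIFICATE), 1))  # about 72.6, reported 72
    print(round(weighted_mean(R_BY_AGE), 1))          # about 72.3, reported 72
```

All three splits reproduce the corresponding group mean to within one percentage point.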

Figure 1. Number of subjects at the different levels of performance in the I and R groups (for NW trials only). [Figure not reproduced. Axis label: number of correct responses (0 to 10); separate panels for I subjects and R subjects.]

Table 2 shows the errors that occurred twice or more (over a maximum of five) in NW trials for each combination of group, task and phone. The most frequent errors were generally words (except bli, go and the repetitions mosa and maguto). The items in italics are those for which the phone to be deleted (or added) has not been deleted (or added). This more frequent type of error was made by the subjects of group I, not by those of group R.

³An example is the word poda instead of the non-word posa.

Table 2. Frequent errors in NW trials for each combination of group, task and phone. The first item is the stimulus and the second the response. The first number inside the brackets indicates the number of occurrences of the response; the second number indicates the total number of errors in the trial.

Deletion task, I subjects
[p]: Puada - Ada (2/5); Pobli - Pobre (2/4); Pécli - Pé (3/4)
[ʃ]: Chuada - Ada (2/5); Chube - Chuva (2/4); Chimá - Má (3/5); Chigó - Ó (3/5); Chabatá - Batata (2/5)
[m]: Muada - Amuada (3/5); Mobli - Móvel (3/5); Mimá - Má (3/5); Mosa - Mosa (2/5); Migó - Amigo (3/5); Maguto - Maguto (2/5); Mabatá - Batata (3/5)

Deletion task, R subjects
[p]: Puada - Ada (2/2)
[ʃ]: Chuada - Ada (2/3); Chobli - Bli (2/2); Chimá - Má (3/3); Chigó - Gó (3/4); Chabatá - Tá (2/3)

Addition task, I subjects
Imá - Irmã (2/5); Abatá - Batata (2/5)

Addition task, R subjects
Imá - Mãe (2/4); Aguto - Poço (2/3); Aguto - Chuto (2/4)

Discussion

Illiterate adults were unable to delete or add a phone at the beginning of a non-word, while adults from the same environment who learned to read in youth or as adults had little difficulty. It is interesting to note that the performance of the I subjects was slightly inferior to that of Belgian first graders aged 6 years who were tested in the third month of the school year with similar tasks (18% correct responses for deletion, 29% for addition). The performance of the R subjects was at about the same level as that of Belgian second graders aged 7 years and tested in the fourth month of the school year (73% correct responses for deletion and 79% for addition) (Alegria and Morais, 1979).

The extremely poor performance of the I subjects cannot be explained in terms of some general inability to manipulate speech segments or to understand an inductive instruction. Cary and Morais (1979) have tested a group of 12 illiterates, from the same origin as those of the present experiment, with a more complex task which consisted in reversing the order of either phones or syllables (for instance, cha for ach, or chave for vecha, respectively) after inductive training. In the reversing phones condition the mean percentage of correct responses was 9% (ranging from 0% to 20%), while in the reversing syllables condition it was much higher: 48% (ranging from 13% to 93%).

The present results clearly indicate that the ability to deal explicitly with the phonetic units of speech is not acquired spontaneously. Learning to read, whether in childhood or as an adult, evidently allows the ability to manifest itself. Thus, it is incorrect to say that awareness of the phonetic structure of speech is a precondition for starting to learn to read and write. The precondition for the acquisition of these skills is not phonetic awareness as such but the cognitive capacity for “becoming aware” during the first stages of the learning process.

Of course, the present results do not mean that cognitive growth plays no part in the development of phonetic awareness. Specific training may not be effectual before some critical developmental stage. Although awareness depends on instruction, it does not necessarily follow from it. Successful instruction, on the other hand, depends on awareness. There is a reciprocal relationship between learning to read and the developmental changes in phonetic awareness.

Two important questions should now be examined. The first is to what extent phonetic awareness can be provoked by other stimulating experiences. Although for most children learning to read constitutes the exercise that renders the analysis of speech into its phonetic elements imperative, it is not necessarily unique to that function, and other kinds of training might presumably achieve the same effect.
The second question is to what extent the procedures used in recognizing and producing speech can be affected by awareness of speech as a sequence of phones. The fact that illiterates are not aware of the phonetic structure of speech does not imply, of course, that they do not use segmenting routines at this level when they listen to speech. But that fact should remind us of the risk we may incur in studying the mechanisms of speech perception through tasks that require conscious, explicit segmentation. Under the pressure of modern developments in linguistics and phonetics some psychologists were led to consider the so-called “psychological reality” of, for example, transformational grammars, or phones and phonemes. It is not always clear whether this kind of inquiry concerns implicit (tacit) or explicit knowledge (cf. the discussion of this point by Seuren, 1978). If the question concerns how we perceive speech, by first segmenting it either in phones


(phonemes) or in syllables - the question apparently considered by Savin and Bever (1970) and other authors - then it refers to tacit knowledge. The present results with illiterates are irrelevant to this question, but they urge us to distinguish between the prevalence of such or such a unit in segmenting routines at an unconscious level and the ease of access to the same units at a conscious, metalinguistic level.

References

Alegria, J. and Morais, J. (1979). Le développement de l’habileté d’analyse phonétique consciente de la parole et l’apprentissage de la lecture. Archives de Psychologie, in press.
Cary, L. and Morais, J. (1979). A aprendizagem da leitura e a consciência da estrutura fonética da fala. Revista Portuguesa de Psicologia, in press.
Gleitman, L. R. and Rozin, P. (1977). The structure and acquisition of reading. I: Relations between orthographies and the structure of language. In A. S. Reber and D. L. Scarborough (Eds.), Toward a Psychology of Reading. Hillsdale, Lawrence Erlbaum Associates.
Liberman, I. Y., Shankweiler, D., Fischer, F. W. and Carter, B. (1974). Reading and the awareness of linguistic segments. J. Exper. Child Psychol., 18, 201-212.
Liberman, I. Y., Shankweiler, D., Liberman, A. M., Fowler, C. and Fischer, F. W. (1977). Phonetic segmentation and recoding in the beginning reader. In A. S. Reber and D. L. Scarborough (Eds.), Toward a Psychology of Reading. Hillsdale, Lawrence Erlbaum Associates.
Read, C. (1978). Children’s awareness of language, with emphasis on sound systems. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Savin, H. B. (1972). What the child knows about speech when he starts to learn to read. In J. F. Kavanagh and I. G. Mattingly (Eds.), Language by ear and by eye. Cambridge, Mass., MIT Press.
Savin, H. B. and Bever, T. G. (1970). The non-perceptual reality of the phoneme. J. Verb. Learn. Verb. Behav., 9, 295-302.
Seuren, P. (1978). Grammar as an underground process. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Shankweiler, D. and Liberman, I. Y. (1976). Exploring the relations between reading and speech. In R. Knights and D. J. Bakker (Eds.), The neuropsychology of learning disorders: Theoretical approaches. Baltimore, University Park Press.
Slobin, D. I. (1978). A case study of early language awareness. In A. Sinclair, R. J. Jarvella and W. J. M. Levelt (Eds.), The Child’s conception of language. Berlin, Springer-Verlag.
Zhurova, L. Y. (1973). The development of analysis of words into their sounds by preschool children. In C. A. Ferguson and D. I. Slobin (Eds.), Studies of child language development. New York, Holt, Rinehart and Winston, Inc.

Summary

A group of illiterate adults was unable to delete or add a phone at the beginning of a non-word, but these tasks were easily performed by a group of people whose environment and childhood experience were similar and who had learned to read rudimentarily as adults. Awareness of speech as a sequence of phones is therefore not acquired spontaneously in the course of cognitive development but requires specific training which, for most people, is probably provided by learning to read in the alphabetic system.

Cognition, 7 (1979) 333-362
© Elsevier Sequoia S.A., Lausanne. Printed in the Netherlands

Intentional communication in the chimpanzee: The development of deception*

GUY WOODRUFF
Primate Facility, University of Pennsylvania

DAVID PREMACK
Department of Psychology, University of Pennsylvania

Abstract

Communication about the location of a hidden incentive was studied in chimpanzee-human dyads, in which each member of a pair served alternately as “sender” and “recipient” of information. When the human cooperated with the chimpanzee in finding the goal, from the very beginning the chimpanzees were able to produce and comprehend behavioral cues which conveyed accurate locational information. When the human and chimpanzee competed for the goal, the chimpanzees learned both to withhold information or mislead the recipient, and to discount or controvert the sender’s own misleading cues. The chimpanzee’s ability to convey and utilize both accurate and misleading information, by taking into account the nature of the sender or recipient, provides evidence of a capacity for intentional communication in this nonhuman primate species.

A central issue in the comparative study of animal communication concerns the concept of intentionality. A good deal of research has shown that the social behavior of many species from diverse phyletic levels can serve a communicative function, by transferring information from one individual to another (Altmann, 1967; Hinde, 1972; Marler, 1965; Smith, 1977). However, the extent to which different species communicate intentionally, i.e., understand and control the transfer of information, is largely unexplored. By definition, any communicative event involves a “sender”, a “recipient”, and behavioral signals that convey information between the two (MacKay, 1972). A particular instance of communication is intentional if, in addition, the sender (i) appreciates the fact that his behavior transmits information, (ii) recognizes that the recipient also knows that his behavior is informative, and (iii) is able to choose from a set of alternatives that course of action (or inaction) which will provide (or suppress) a given bit of information. Intentional communication is thus more than a simple transfer of information; it is a purposive transfer, based on the sender’s knowledge about the effect that his actions can have on the recipient.

*The research was supported by National Science Foundation grants BNS 75-19748 and BNS 77-16853, and by a facilities grant from the Grant Foundation. We thank G. Dank, R. Glick, Z. Goldfinger, S. Goldsmith, E. Kaufmann, K. Kennel, W. Langbauer, D. Liner, F. Massa, J. Moselsky, B. Pfeffer, R. Reisman, A. Samuels, E. Wier, B. Yarczowei, and ?. Zwicker for assistance as trainers and aides. We are also indebted to E. Menzel for advice and assistance during preparation of the project. Address reprint requests to G. Woodruff, University of Pennsylvania Primate Facility, Honey Brook, Pennsylvania, 19344 USA.

Although intentionality no doubt plays a pervasive role in human communication, especially human language (Lyons, 1972), there is no firm evidence for this level of complexity in the communication systems of nonhumans. Indeed, some authors have foreclosed the issue, denying the possibility for intentionality outside the human species. But the present paucity of evidence makes such dismissals inadvisable.

Even in the case of language, intentionality often remains hidden. The mere fact that communication is symbolic in form does not establish that it is intentional (and conversely, the fact that it is not symbolic does not rule out intentionality). Often the underlying intentionality is unveiled only when there is a breakdown in the assumptions shared by speaker and listener. For example, an important element of ordinary conversation is the assumption of truth that is shared by both parties: the listener assumes the speaker tells the truth, and the speaker assumes the listener considers him truthful (Grice, 1967).
When this assumption is violated, intentionality may be revealed by the speaker’s ability to suppress or otherwise alter the information he conveys, and by the listener’s ability to adjust his response to false information provided by a devious speaker. Even very young children show evidence for this level of control and understanding of their communication (Flavell, Botkin, Fry, Wright and Jarvis, 1968). Students of animal communication suggest that evidence for deception would provide the best indication of intentional communication in a given species (Hinde, 1972; Marshall, 1970). In keeping with this suggestion, field observations of animals (typically, primates) engaging in behavior that misleads another individual (e.g., Kohler, 1925; Menzel, 1974; van Lawick-Goodall, 1971) have been cited as evidence of intentionality. Unfortunately, these provocative observations must remain at best only suggestive; not all instances of misleading behavior qualify as deceit. For instance, a misleading


signal is not intentional if it results from an occasional “error” on the part of the sender, who otherwise always conveys accurate information. Nor is it intentional if the behavior is always triggered by a particular stimulus situation (e.g., the instinctive behavior pattern of a bird “feigning” a broken limb in the presence of a predator (Simmons, 1951)). A claim for intentionality requires demonstration that an individual can reliably use his communicative behavior to convey either accurate or misleading information, as the situation demands. In the present study we systematically explored the potential for deception in a nonhuman primate, the chimpanzee. In one test, a chimpanzee was informed of the location of hidden food, but was denied direct access to it by a physical barrier. The animal could obtain the food only by imparting information about its location to an uninformed human positioned outside the enclosure, in the vicinity of the goal. One human was friendly and cooperative; if he found the food, he gave it to the chimpanzee, but if he failed the animal received nothing. Another human was hostile and competitive; if he found the food he kept it for himself, but if he failed the chimpanzee was allowed to leave the enclosure and obtain the food. Thus, the chimpanzee’s success in procuring the goal depended upon his ability to convey accurate locational information to a cooperative partner on the one hand, and suppress or convey misleading information to a competitive individual on the other. In a second test, we reversed the roles of sender and recipient played by chimpanzee and human. The human was informed of the goal location, and the chimpanzee was now required to find the food by using the behavior of the human as a source of information.
The humans modelled the chimpanzees’ behavior patterns observed in the previous test, and in addition, one human was cooperative (he always indicated the correct location) whereas the other was competitive (he consistently indicated an incorrect location). Here we assessed the chimpanzee’s ability to comprehend accurate and misleading information, and to adjust his search accordingly.
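The reward contingencies of the first (production) test can be summarized in a minimal sketch. It only restates the outcome rule given above; the function names, the string labels for the two trainers and the labels for the two signalling strategies are hypothetical and introduced for illustration.

```python
def chimp_gets_food(trainer: str, trainer_found_food: bool) -> bool:
    """Outcome of a production trial for the chimpanzee.

    cooperative trainer: the chimpanzee is fed only if the trainer finds the food.
    competitive trainer: the chimpanzee is fed only if the trainer fails to find it.
    """
    if trainer == "cooperative":
        return trainer_found_food
    if trainer == "competitive":
        return not trainer_found_food
    raise ValueError(f"unknown trainer type: {trainer!r}")


def rewarded_strategy(trainer: str) -> str:
    """Signalling strategy that the contingencies reward for each trainer type."""
    # Accurate cues help the trainer find the food; withheld or misleading cues hinder him.
    helps_trainer = chimp_gets_food(trainer, trainer_found_food=True)
    return "convey accurate information" if helps_trainer else "withhold or mislead"


if __name__ == "__main__":
    for trainer in ("cooperative", "competitive"):
        print(trainer, "->", rewarded_strategy(trainer))
```

The same behavior is thus rewarded with one partner and penalized with the other, which is what allows the design to separate flexible signalling from a fixed response to the hidden food.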

Method

Subjects

The subjects were four African-born chimpanzees (Pan troglodytes), one male (Bert) and three females (Sadie, Luvie, and Jessie). The animals arrived in the laboratory at ages estimated to be one to one and a half years. They had lived in the laboratory as a group for ten months at the start of the experiment. The animals were fed two meals daily (fresh fruit and Purina monkey chow), and received a variety of cookies and candies throughout the day in other experiments run concurrently with the present one.

Human trainers

Laboratory assistants and undergraduate student volunteers participated as trainers and aides in the experiment. The present tests were conducted over a period of three years, during which time the identity of trainers in the tests changed periodically. However, all persons involved were familiar with the chimpanzees (sharing in caretaking duties and assisting in other experiments) before serving in the present tests. There were two basic types of trainers, distinguished by sartorial appearance and behavioral dispositions. The “cooperative” trainer wore the usual green laboratory scrub suit, behaved in a friendly manner toward the animals, and vocalized in a soothing tone of voice one would use with a young child. The “competitive” trainer wore black boots, white coat and hat, dark sunglasses, and a cloth over his mouth (after the fashion of a bandit). He behaved in a hostile manner toward the animals, occasionally swatting them as he passed in the hallway and vocalizing in a low, gruff tone of voice. A third person, the “passive” aide, always accompanied the chimpanzees during the tests, in order to reduce the animal’s distress over being separated from his/her companions. The passive aide interacted minimally with the animal during trials, only allowing the animal to cling to him if he/she so desired.

Materials

The tests were conducted in a 10 by 11 by 10 foot (height by width by length) laboratory room. Just inside the door of the room was a 4 by 5 foot enclosure, bounded on two sides by heavy gauge wire mesh. The mesh ran from floor to ceiling, except for a space of six inches between the bottom of the mesh and the floor. A cage door allowed passage through the enclosure into the testroom, and a one-way observation window was located in the wall to the left of the doorway. Small containers were used to conceal food in the room, and were selected from a set of common laboratory items (cardboard boxes, tin cans, plastic cups, coffee pots, and so on). Foods used during the experiment were small pieces of fresh or dried fruit, berries, candies, and cookies. Each session was videotaped by means of a Panasonic WV-I8OP studio camera with wide-angle lens, located in one corner of the testroom outside the enclosure, and a Sony VO-1800 videocassette recorder, located in a booth behind the observation window. An observer in the booth provided running commentary about trial events on the videotape soundtrack.

Procedure

The experiment consisted of three phases: the first lasted five months and was followed by a six-month hiatus, the second lasted 14 months and was followed by a ten-month hiatus, and the third lasted one month.

Phase 1. In the first phase, only the production test was administered. Sessions consisted of three to six trials with intertrial intervals of approximately two minutes, and were conducted five times per week. Prior to the start of a trial, an aide concealed food under one of two containers located in the testroom, about three feet from the mesh enclosure. The position of the baited container on the left or right side of the room (from the point of view of the subject in the enclosure) was randomized across trials. Only two containers (cardboard box and plastic cup) were used in this phase and each contained food equally often across trials. The aide removed a chimpanzee from the home cage or the outdoor field, carried him/her into the room, and closed the door. The aide lifted the baited container, gave the subject a direct view of the food, and then concealed the food by replacing the container. The aide then carried the animal out of the testroom and down a hallway to an adjacent room. At a signal from the aide, either the “cooperative” or “competitive” trainer and the “passive” aide left the adjacent room and the chimpanzee was turned over to the passive aide. The trainer entered the testroom through the enclosure, shut the cage door, and positioned himself in the far corner of the room, equidistant from the two containers. Finally, the passive aide carried the subject into the enclosure, shut the room door, sat down in the corner by the door, and relaxed his grasp of the subject, at which point the trial officially began. The delay between the subject’s view of the food and the start of the trial ranged from 20 to 30 seconds.

The trainer’s task was to use the behavior of the informed chimpanzee to determine which one of the two containers held the food. Hence, several precautions were taken to ensure that other cues could not reveal the goal location. The exact location of each container changed from trial to trial, but always remained within an area of radius two feet. This prevented cues from changes in the position or orientation of the correct container during baiting. In addition, during baiting the testroom door was closed and the trainer and passive aide were stationed in an adjacent room with the door closed.


This prevented cues from the sound of the correct container being shifted, lifted, and replaced during baiting. Thus, of the three organisms in the testroom, only the chimpanzee knew the location of food at the start of each trial.

The cooperative and competitive trainers were given several general instructions about their roles in the experiment. They were urged to use the chimpanzee’s behavior to locate the baited container as often as possible, encouraged to take as much time as they needed to be certain of their choice, and to move about freely in the room in order to solicit cues, approaching one or the other container without necessarily overturning it. In addition, each trainer was urged to develop an accurate choice strategy, and then maintain that strategy as much as possible; if a subject later began to mislead him on repeated trials, he did not then change his strategy to “outwit” the subject (e.g., by choosing the opposite container). During initial trials the trainers were of course naive to the task. When experienced trainers later departed from the laboratory, their replacements were given the same instructions, as well as extensive experience viewing trials through the observation window and on videotape.

The outcome of each trial depended upon which container the trainer chose and whether he was playing the cooperative or competitive role. If the cooperative trainer chose the baited container, he gave the food to the chimpanzee. If he chose the unbaited container, however, the chimpanzee was consoled by the passive aide and led out of the room without food. On the other hand, if the competitive trainer chose the baited container, he kept the food for himself. If he chose the unbaited container, however, he retired to the far corner of the room in disgust, and sulked as the passive aide allowed the chimpanzee to leave the enclosure to find the food.
The chimpanzees were first given 24 trials with only the cooperative trainer at the outset of Phase 1 (the Pretest), in order to be sure that locational information could indeed be transferred in this situation. Thereafter, the chimpanzees were exposed to both trainers in random order across sessions, with only one type of trainer appearing in any given session. Testing in Phase 1 continued for each subject until both trainers’ choice performance appeared stable, with no obvious increasing or decreasing trends in accuracy.

Phase 2. In the second phase, the animals were given tests of both production and comprehension. The production test described for Phase 1 was modified in two ways. First, the containers were selected from a set of four new items, and all possible pairwise combinations of two containers were presented in random order. Second, a one-minute time limit was imposed on all trials. Now the trainers could not take all the time they wished, but were instructed to try to make a choice as soon as possible. (However, the trainers were not to simply guess if they remained uncertain about the location of the food, a strategy that was allowed in Phase 1.) If the cooperative trainer failed to choose within the time limit, the chimpanzee was carried from the room without food. In contrast, if the competitive trainer was unable to choose within one minute, he retired to the far corner of the room and the chimpanzee was allowed to leave the enclosure to find the food. This time limitation was designed to foster rapid and efficient communication with the cooperative trainer, and to allow withholding of information for a specified period of time in the presence of the competitive trainer.

We also administered a test of comprehension, in which the roles of informed sender and uninformed recipient played by chimpanzees and humans were reversed. Prior to the start of a trial, an aide gave the trainer a direct view of the food concealed beneath one of the containers. At a signal from the aide, the passive aide then carried a chimpanzee from an adjacent room into the testroom, through the enclosure, to the far corner of the room. The informed trainer then entered the enclosure, shut the door, and performed a set of responses in a prescribed manner. Approximately five seconds after the trainer entered the room, the passive aide relaxed his grasp of the chimpanzee, at which point the trial officially began. The chimpanzee’s task was to use the behavior of the informed human to determine which one of the two containers held food. The same precautions as those described for the production test were taken to ensure that other cues could not reveal the goal location. As an additional precaution, the passive aide averted his gaze toward the floor beneath him throughout each trial.
Since he did not look at and “read” the cooperative or competitive trainer’s cues, the passive aide could not inadvertently cue the chimpanzee about the goal location. The cooperative trainer always oriented toward the baited container, whereas the competitive trainer always oriented toward the unbaited container. The behavior pattern performed by the trainers was derived from that commonly observed in the subjects on cooperative trials by the end of Phase 1. The trainer approached the wire mesh nearest one container, sat on the floor with head and torso oriented toward the container, extended a foot or hand under the mesh approximately six inches in the direction of the container, and looked back and forth from the chimpanzee to the container. The trainer thus served as a crude “model” of the chimpanzees’ behavior. Most subjects showed similar types of responses, although they were often embedded in a stream of behavior including other responses not oriented toward a container (jumping, climbing, scratching, clinging to and grooming the passive aide, and so on).


The outcome of each trial depended upon which container the chimpanzee chose. If he approached and overturned the baited item, he was allowed to eat the food. If he chose the unbaited container, he was consoled by the passive aide and carried out of the room without food. However, a one-minute time limit was also imposed on comprehension test trials. If the animal failed to inspect a container on a cooperative trial, he was carried from the room without food; but if the subject failed to choose on a competitive trial, the trainer left the room and the animal was given the food by the passive aide.

In both the production and comprehension tests of Phase 2, sessions consisted of six trials with intertrial intervals of approximately one minute. Only one type of trainer appeared in any given session. Testing consisted of two cycles of exposure to the following order of conditions: comprehension (cooperation, and then competition), followed by production (competition, and then cooperation). Testing in each condition was continued for each subject until at least one significant “run” (Grant, 1947) of responses was recorded, with the restriction that subjects received a minimum of 24 trials and a maximum of 144 trials per condition in each cycle.

Phase 3. In the third and last phase, the chimpanzees were given a final brief test of both production and comprehension. The procedures were the same as those described for Phase 2, with two modifications. First, containers were selected from a new and larger set of items, such that each trial contained a unique pair of baited and unbaited items. Second, trials with the cooperative and competitive trainers were counterbalanced within sessions, such that each trainer appeared on three of the six trials per session. Both production and comprehension tests entailed a total of 48 trials, 24 trials with each trainer.

Video tape analysis

At the conclusion of the experiment, videotapes of the chimpanzees’ behavior during Phases 1 and 2 of the production test were used to analyze response topographies. The first and last 24 trials with the cooperative and competitive trainers in each phase were edited onto new videotapes. The editing eliminated a view of the subject’s reaction to the outcome of the trainer’s choice (receipt or loss of food). The edited trials were played for a panel of three laboratory assistants, who were familiar with the chimpanzees but had neither observed nor participated in the experiment proper. The tapes were played at normal speed and in slow motion, repeatedly if necessary, and the observers recorded the frequency (and direction, if applicable) of the following types of behavior: (i) general movement in space toward one side of the room or the other (e.g. walking, running, jumping, somersaulting, sliding on the floor, and so on), (ii) more specific orientation of parts of the body (torso, head, limbs), (iii) visual behavior (glances toward the containers and the trainer), (iv) intensifications of a response as the trainer approached a container (a change in rocking motions was virtually the only response recorded in this category), and (v) responses which had the effect of moving or keeping the subject away from the containers (climbing the mesh to the ceiling, clinging to the passive aide). The observers discussed their observations and recorded a response only when a majority agreed upon a decision (interobserver agreement was 85% or more for each chimpanzee).

Results

Production test

The cooperative trainer found enough information in the animals’ behavior to choose correctly almost from the beginning of the experiment. Table 1 presents an analysis of “runs” of consecutive correct or incorrect choices by the trainers. The table shows that within the first 24 trials (Pretest), the cooperative trainer showed a statistically significant run of consecutive correct choices beginning on Trial 17, 4, 8, and 6 for Sadie, Bert, Luvie, and Jessie, respectively. The left-hand column of Figure 1 presents the number of trials on which the trainer chose correctly during the Pretest, and these data reveal a significant proportion of trials with a correct choice for each animal (17 or more correct choices per 24 trials, p < 0.05, binomial test).

Having established in the Pretest that information could be transferred from chimpanzee to human in this situation, we contrasted cooperative with competitive trials and examined whether or not the animals could exert some degree of control over the flow of information. At first, there was no evidence that any animal differentially controlled the locational information imparted by his behavior. Table 1 indicates that the competitive trainer readily showed a significant run of correct choices, beginning on Trial 1, 25, 4, and 1 for Sadie, Bert, Luvie, and Jessie, respectively. Moreover, there was no significant difference between the trainers’ choice performance during the first 24 trials with each trainer (Phase 1i, Figure 1) for any subject. Thus, in the beginning the competitive trainer was as successful in denying food to the animals as the cooperative trainer was in giving it to them.
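The Pretest criterion quoted above (17 or more correct choices in 24 trials, p < 0.05) can be checked with an exact binomial tail under a chance rate of 0.5. The following is a minimal sketch and not the authors' own computation: the function name `binomial_upper_tail` is introduced here for illustration, and a one-tailed test is assumed.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, chance: float = 0.5) -> float:
    """Probability of at least `successes` correct choices in `trials` trials
    when every choice is correct with probability `chance`."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# One-tailed probability of 17 or more correct choices out of 24 by guessing.
p_17 = binomial_upper_tail(17, 24)
# For comparison, 16 or more correct choices out of 24.
p_16 = binomial_upper_tail(16, 24)
print(f"P(X >= 17) = {p_17:.4f}")
print(f"P(X >= 16) = {p_16:.4f}")
```

Under these assumptions, 17 is the smallest number of correct choices in 24 trials whose one-tailed probability falls below 0.05, which agrees with the criterion stated in the text.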


Table 1. Significant “runs” of consecutive correct or incorrect choices by the trainers in the production test, and by the chimpanzees in the comprehension test. Each entry shows the number of trials in each run per total trials to the end of the run.

Columns: Subject (Sadie, Bert, Luvie, Jessie); Phase (1, 2, 3); Production, Cooperation (correct choices, incorrect choices); Production, Competition (correct choices, incorrect choices); Comprehension, Cooperation (correct choices, incorrect choices); Comprehension, Competition (correct choices, incorrect choices).

[Table entries are not recoverable from this copy.]

Probabilities were computed with Grant’s “runs” test (Grant, 1947). a: p < 0.05; b: p < 0.01; c: p < 0.001.

The introduction of the competitive trainer was not entirely without effect, however. The competitive trainer took longer to “read” the behavior of several of the animals and decide which was the baited container, even at the start of Phase 1, before the two trainers differed in their ability to choose correctly (Figure 1). Figure 1 shows that choice accuracy generally declined for both trainers during Phase 1. Whereas the cooperative trainer later regained his ability to

Figure 1. Number of trials with a correct choice (left-hand panels) and mean trial duration (right-hand panels) at various stages of the production test with each subject. Solid lines show data for the cooperative trainer, broken lines for the competitive trainer. Data are shown for blocks of 24 trials with each trainer, in this order: initial Pretest with only the cooperative trainer; 1i, initial trials of Phase 1 with alternating trainers; 1f, final trials of Phase 1; 2i, initial trials of Phase 2; 2f, final trials of Phase 2. Vertical dotted lines connecting points for the cooperative and competitive trainers indicate a statistically significant difference (p < 0.05) between performances for the trainers. Horizontal axis: Phase (blocks of 24 trials).

choose correctly, the decrease in the competitive trainer’s performance level was permanent for all subjects. Indeed, accuracy for the cooperative trainer clearly exceeded that for the competitive trainer by the end of Phase 1 for three chimpanzees, and the magnitude of the difference increased over the

Table 2. Accuracy scores for the trainers in the production test and for the chimpanzees in the comprehension test. Each score shows the number of trials with a correct choice (the baited container) per total trials on which a choice was made. Also shown are the number of additional trials in which the trainers or chimpanzees failed to make a choice within the time limit imposed in Phases 2 and 3.

Columns: Subject (Sadie, Bert, Luvie, Jessie); Phase (1, 2, 3); Production, Cooperation (proportion correct, number of trials with no choice); Production, Competition (proportion correct, number of trials with no choice); Comprehension, Cooperation (proportion correct, number of trials with no choice); Comprehension, Competition (proportion correct, number of trials with no choice).

[Table entries are not recoverable from this copy.]

Probabilities were computed with the normal approximation to the binomial distribution (Courts, 1966). Entries are marked as significantly above or below chance at p < 0.05, p < 0.01, or p < 0.001.

course of Phase 2. Choice performance for the two trainers showed a statistically significant difference (p < 0.05; z-test for a difference between proportions) by the end of Phase 1 for Luvie, at the start of Phase 2 for Sadie, and by the end of Phase 2 for Bert and Jessie (Figure 1). The right-hand panels of Figure 1 show the course of change in the trainers’ latency to choose during the experiment. The competitive trainer generally took more time to make his choice, and the difference between trainers attained statistical significance (p < 0.05; t-test for a difference between means) at some point during Phases 1 or 2 for all animals.

Table 2 presents the overall performance levels for both trainers in each phase of the experiment. Proportion of trials with a correct choice by the cooperative trainer attained levels significantly above chance (p < 0.05; binomial test) for all subjects in all phases except Bert in Phase 1. In contrast, overall accuracy for the competitive trainer decreased during testing for all animals. The competitive trainer was able to choose correctly on a majority of trials in Phase 1 with Luvie and Jessie, but performed at chance levels in Phases 2 and 3 with these two animals. On the other hand, the competitive trainer showed chance levels of performance in Phase 1 with Sadie and Bert, and then accuracy declined to levels significantly below chance during Phases 2 and 3. Finally, the table shows that when the one-minute time limit was imposed in Phases 2 and 3, the competitive trainer was unable to make a choice on a substantial number of trials with Bert, Luvie, and Jessie, whereas the cooperative trainer never experienced this difficulty with any subject. Thus, all animals learned to convey or suppress information about the location of food, depending upon whether the trainer was cooperative or competitive.
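The comparison of the two trainers' accuracy relies on a z-test for a difference between proportions. A minimal sketch of one common form of that test follows. It assumes a pooled standard error and a two-tailed probability, neither of which is specified in the text, and the function name and the counts are hypothetical, chosen only to illustrate a block of 24 trials per trainer.

```python
from math import erfc, sqrt


def two_proportion_z(correct_a: int, total_a: int, correct_b: int, total_b: int):
    """Return (z, two-tailed p) for the difference between two independent
    proportions, using the pooled estimate of the standard error."""
    prop_a = correct_a / total_a
    prop_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (prop_a - prop_b) / std_err
    # Two-tailed probability under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical block of 24 trials per trainer (illustrative counts only,
# not values taken from Figure 1 or Table 2).
z, p = two_proportion_z(20, 24, 10, 24)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these illustrative counts the difference would count as significant at the 0.05 level; the sketch does not handle the degenerate case in which both trainers are always right or always wrong, where the pooled standard error is zero.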
However, the results show a further development for Sadie and Bert; these subjects demonstrated an ability to misinform the competitive trainer. This trainer eventually showed a significant run of incorrect choices (Table 1) and chose the unbaited container on a significant proportion of trials (Table 2) in Phase 2 for both animals. This ability to mislead the competitive trainer appeared first for Sadie at the outset of Phase 2, and for Bert by the end of Phase 2. Finally, the data from Phase 3 (Table 2) show that even after a ten-month hiatus in testing, the subjects were quite flexible in choosing to convey accurate information, withhold information, or convey misleading information when the type of recipient changed from trial to trial within sessions.

Response forms

Although these results establish that information was transmitted or suppressed by the chimpanzees in the production test, they tell us nothing about the actual behavior that was the source of information, nor how that behavior developed and changed during the experiment. Figures 2, 3 and 4 present the results of the videotape analysis performed on each chimpanzee’s behavior during the first and last 24 trials of Phases 1 and 2 with each trainer. Figure 2 presents the mean frequency (responses per minute) of each of the various types of response scored by the observers. Figure 3 shows the incidence of directional bias (left versus right) for those types of response that were oriented toward the containers. Lastly, Figure 4 shows a measure of the correlation between the direction of each type of response and the actual location of food.

Figure 2. Mean response rate for components of the subjects’ behavior patterns in the production test. The responses are: A, approach one side of mesh; T, orient torso toward container; Gc, gaze at container; P, “point” with extended arm or leg toward container; Gt, gaze at trainer while other parts of the body orient toward a container; R, change in intensity of rocking motions when trainer approaches a container; Cb, climb mesh to ceiling; Cl, cling to and/or groom passive aide. Solid lines connect data points from trials with the cooperative trainer, broken lines those from trials with the competitive trainer.

Figure 3. Proportion of all orientational responses which were directed toward the container on the left side of the room, irrespective of the location of the food. See Figure 2 for explanation of details.

Figure 2 provides an indication of the similarities in the subjects’ response forms, as well as differences in their respective styles. In one way or another, all animals tended to orient parts of their body toward the containers. At first, however, these responses were embedded in substantially different behavior patterns for each subject. For example, Sadie often sat with her back to one wall of the room in the enclosure, rocking from side to side for much of the time during early trials. Bert and Jessie tended to cling to the

aide, making only brief sojourns to one side of the mesh or the other, and then returning to the aide. Only Luvie showed a relatively well-organized pattern of behavior which appeared “deliberately” informative at the start. On most trials, she left the aide immediately, walked to one side of the mesh, sat facing the near container, extended one leg under the mesh toward the container (“pointed”), and then glanced back and forth from trainer to container.

Figure 4. Proportion of all orientational responses which were directed toward the baited container, irrespective of the position of the container on the left or right side of the room. See Figure 2 for explanation of details.

Over the course of the experiment, similarities in the subjects’ response patterns became more pronounced. Orientational responses increased in frequency, and what was perhaps the most explicit cue, “pointing” with outstretched arm or leg, emerged for the remaining three animals. The following sequence of responses was observed regularly in all subjects by the end of Phase 1: approach one side of the mesh, sit with torso oriented toward the near container, “point” with extended limb, glance back and forth from trainer to container. Figure 5 shows a series of photographs of Sadie “pointing” in this fashion (in this case deceitfully, for she is directing the competitive trainer to the unbaited container). The emergence of this form of behavior varied over the subjects: Luvie showed it from the start, in Sadie and Bert it developed more gradually, and in Jessie it appeared quite abruptly late in testing. For much of Phase 1, Jessie remained on the aide’s lap and glanced at the containers from that position. However, on Trial 115 with the cooperative trainer, and on each trial thereafter, she performed the following sequence: she left the aide’s lap, somersaulted toward one side of the mesh, lay on her side, extended an arm under the mesh toward the container, and glanced back and forth from trainer to container. Figure 2 shows this progression toward more organized patterns of behavior for each subject during the experiment; the frequency curves tend to flatten out for cooperative trials, orientational responses occurring at approximately the same rate because they were performed as a unit.

At first there were few differences between behavior patterns performed on cooperative and competitive trials. However, all subjects eventually suppressed information in the presence of the competitive trainer (Figure 1 and Table 2), and the data in Figure 2 reveal one of the ways this effect was achieved. All subjects showed a decline in the frequency of some or all of their orientational responses on competitive trials. Approach and “pointing” were the first responses to drop out, whereas glances toward the containers and the trainer declined very slowly, if at all (e.g., Jessie, Figure 2).
For the two subjects (Luvie and Jessie) who only suppressed information by the end of the experiment, the data in Figure 2 show a lower frequency of all orientational responses and a higher frequency of responses that directly competed with orientational ones (e.g., climbing the mesh to the ceiling, clinging to the aide) on competitive rather than on cooperative trials. For the other two subjects (Sadie and Bert), the data show a similar difference in response frequencies midway through the experiment. However, when these animals later misled the competitive trainer in Phase 2, there were fewer differences in response rates between the two trainers. Figure 3 reveals another means by which information was suppressed. During some portions of the experiment, subjects showed a tendency to orient toward the same location (generally on the left side) across trials, irrespective of the actual location of food. Since food was located on each side of the room on 50% of all trials, this “position bias” had the effect of reducing the trainer’s accuracy to chance levels (50%). Note that for some

Figure 5. Four photographs of Sadie, the first two (a and b) showing her “point” deceitfully toward the unbaited container on a competitive trial, the third (c) showing her watch the trainer make a choice, and the fourth (d) showing her behavior just after the trainer lifts the empty container. After directing the trainer to the unbaited container, her head snaps abruptly toward the baited one. Sadie’s ability to suppress glancing at the baited container, until after the competitive trainer has chosen the unbaited one, is part of what makes her deceit possible.

subjects a position bias was observed only on competitive trials (e.g., Luvie at the start of Phase 2, Jessie at the end of Phase 2).

Figure 4 summarizes how the informativeness of each type of response (that is, the correlation between the direction of each response and the actual location of the food) changed over the course of the experiment. The figure shows the proportion of all responses of each type that were directed toward the baited container. A value substantially above 0.5 indicates a strong positive correlation between the direction of the response and the location of food: the response conveyed largely accurate information. A value at or near 0.5 indicates that the response conveyed little or no differential locational information. A value below 0.5 shows a negative correlation between the direction of the response and the location of food; in this case, the response was most often directed toward the unbaited container, and thus conveyed inaccurate, misleading information.

Figure 4 shows that at first most subjects directed their behavior predominantly toward the baited container.¹ The type of response that was the most informative differed between subjects (changes in rocking motions for Sadie, “pointing” for Bert, glances at the trainer for Luvie, approach for Jessie), and was not the most frequent type of response in any case (compare Figures 2 and 4). By the end of the experiment, however, the results show that “pointing” provided the most accurate information to the cooperative trainer for all subjects. Whereas there was little difference in the informativeness of responses performed on cooperative and competitive trials at the beginning, the correlation between certain cues and the location of food soon declined on competitive trials for all animals. This difference appeared at the end of Phase 1 for Sadie, Bert, and Luvie, and at the end of Phase 2 for Jessie. For most responses, the proportions shown in Figure 4 declined to levels near 0.5 for all animals when the trainer was competitive, and declined still further to levels substantially below 0.5 for two subjects (Sadie and Bert). Thus, by the end of Phase 2 Luvie’s and Jessie’s behavior patterns conveyed virtually no reliable cues about the location of food for the competitive trainer; in contrast, Sadie’s and Bert’s behavior imparted information that was consistently misleading on competitive trials.

¹ The trainers commented that Jessie’s behavior was the most difficult to read, and Figure 4 shows why. None of her responses showed a value much different from 0.5 early in the experiment. The cooperative trainer indicated that he often based his decision on a subtle difference in the duration of Jessie’s glances toward the two containers. When we reanalyzed the videotapes from the 24 trials with the cooperative trainer in Phase 1i for Jessie, we found that she spent more time looking at the baited container than at the unbaited one on 17 trials. This figure closely agrees with the trainer’s accuracy score: he chose the baited container on 18 of these trials.
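The reading of Figure 4 described above can be sketched as a small calculation: for one response type, count how often the response was directed toward the baited container and compare that share with 0.5. The exact two-tailed binomial test and the 0.05 cut-off used below are assumptions made for illustration, since the text gives no numerical criterion for "substantially" above or below 0.5; the function name and the counts are likewise hypothetical.

```python
from math import comb


def classify_cue(toward_baited: int, total: int, alpha: float = 0.05) -> str:
    """Label a response type as 'accurate', 'misleading', or 'uninformative'
    from the share of responses directed toward the baited container."""
    if total == 0:
        return "uninformative"
    # Exact two-tailed binomial probability against a chance share of 0.5:
    # double the smaller tail, capped at 1.
    lower = sum(comb(total, k) for k in range(0, toward_baited + 1)) / 2**total
    upper = sum(comb(total, k) for k in range(toward_baited, total + 1)) / 2**total
    p_value = min(1.0, 2 * min(lower, upper))
    if p_value >= alpha:
        return "uninformative"
    return "accurate" if toward_baited / total > 0.5 else "misleading"


# Hypothetical counts of "pointing" responses in a 24-trial block.
print(classify_cue(21, 24))  # mostly toward the baited container
print(classify_cue(12, 24))  # evenly split between the two containers
print(classify_cue(4, 24))   # mostly toward the unbaited container
```

The three labels correspond to the three cases distinguished in the text: largely accurate information, little or no differential information, and misleading information.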

It should be emphasized that the types of response presented in the figures were those the trainers reported that they used to make their choices. These responses almost certainly do not exhaust all possible ways in which information was conveyed or suppressed; more subtle cues also may have played some role in the trainers’ decision to choose a container. However, the types of response depicted in Figure 4 can account for a good deal of the trainers’ choice behavior. We calculated a mean proportion of accurate orientational cues with each trainer for each block of trials shown in Figure 4 (by averaging the proportions for all response types for each subject in a trial block) and compared these proportions with the trainers’ accuracy scores during the same blocks of trials shown in Figure 2 (proportion of trials with a correct choice for each subject). For each animal we found a statistically significant correlation between the directional accuracy of the animal’s overall behavior pattern and the trainer’s accuracy in choosing the baited container (Pearson r = 0.91, 0.93, 0.78, and 0.85 for Sadie, Bert, Luvie and Jessie, respectively; all p < 0.05, n = 8).

Finally, it may be noted that these significant correlations rule out the possibility that certain extraneous factors were responsible for the competitive trainer’s poor choice performance by the end of the experiment. One might suppose that since the person playing the competitive role was friendly with each chimpanzee outside the test situation, he would be unlikely to choose correctly and thereby deny food to the animals. On the contrary, these data show that the competitive trainer’s eventual inability to choose correctly was based on a decrease in the amount and/or accuracy of information provided by the subjects’ behavior.

Comprehension test

When the roles played by chimpanzees and trainers were reversed in the comprehension test, three of four subjects were able to “read” the behavioral cues provided by the cooperative trainer and choose correctly almost from the beginning of the test. Table 1 shows a significant run of correct choices beginning on Trials 8, 11, and 9 for Sadie, Bert, and Jessie, respectively. Luvie, however, showed a significant run of incorrect choices beginning on Trial 2, and only gradually learned to choose correctly over the course of subsequent trials. Nevertheless, the overall data for Phase 2 (Table 2) indicate that all animals were able to choose the baited container on a significant proportion of trials with the cooperative trainer.

At first, most animals chose the container toward which the trainer oriented regardless of whether he was cooperative or competitive. For Sadie and Bert, this behavior led to a significant run of incorrect choices with the competitive trainer at the outset of Phase 2 (Table 1). Over the course of testing, however, the animals’ choice behavior changed on competitive trials. Performance levels generally began at levels below chance (50% correct) and then rose to chance levels midway through the phase. At the same time, each subject failed to make a choice within the allotted one minute on some trials with the competitive trainer (Table 2). By the end of Phase 2, three subjects showed a further change; Sadie, Luvie and Jessie consistently avoided the container indicated by the competitive trainer, and Table 1 shows the appearance of one or more significant runs of correct choices with the competitive trainer for these subjects. Thus, three subjects learned to controvert the competitive trainer’s misleading cues by the end of Phase 2. In contrast, Bert persisted in choosing the unbaited container on competitive trials throughout Phase 2. Table 1 shows a significant run of incorrect choices at the start and again midway through Phase 2 for this subject. At best, and then only inconsistently, Bert was able to discount the competitive trainer’s cues and thus choose at random for short blocks of trials.

Finally, Table 2 shows that performance levels at the end of Phase 2 were maintained ten months later in Phase 3, even when the type of trainer varied from trial to trial within sessions. Sadie, Luvie, and Jessie chose correctly on the majority of cooperative trials, and chose correctly or not at all on competitive trials. Bert continued to choose the container indicated by the trainers, regardless of whether they played the cooperative or competitive roles.

Although comprehension test trials were not videotaped, several informal observations may be of added interest. At first, the chimpanzees walked or ran directly to the container toward which the trainers oriented, and then immediately overturned it. This behavior continued throughout testing on cooperative trials, but soon changed over the course of testing on competitive trials. At the start of a competitive trial, the subjects began to cautiously approach either the trainer or the container toward which he oriented, but did not necessarily choose that container. Instead, on a number of occasions they walked to the opposite (correct) container and inspected its contents. On a few trials, Sadie and Bert walked repeatedly back and forth from one container to the other, but failed to choose within the allotted time (see Table 2). In contrast, Luvie and Jessie failed to choose on a substantial number of trials in Phase 2 or 3; these subjects typically remained sitting beside the passive aide for the full minute. By the end of the experiment in Phase 3, only Sadie consistently walked directly and without hesitation to the (correct) container not indicated by the competitive trainer and immediately inspected its contents.

Discussion

These results show the risk of inferring intentionality from field observations alone, and the need for tests to determine when the inference is justified. Generally, when the human observer sees an animal orient toward a goal, and a second animal respond to that orientation, he assumes intentionality: the communication looks that way to him, the more so if the animals are primates. Additionally, when he sees a primate who does not orient toward a goal in a situation in which this benefits him, the human observer again finds the assumption of intentionality irresistible. In both cases, however, a simple test is needed. Is the sender’s behavior sensitive to the difference between recipients, e.g., those who do and do not share the goal with him? If it is not, that is, if he is as likely to convey information to one recipient as to the other, the assumption of intentionality is unfounded. In the field it is rarely possible to make this test, for the field seldom provides that special case in which the same sender faces the two kinds of recipient in the same time period. Instead, in the field we observe one of the two cases and simply assume the outcome of the other. The present results show this to be a dubious practice: each animal conveyed information effectively to the cooperative trainer, but in the beginning showed no sensitivity to the difference between the two kinds of recipient, and transmitted information just as effectively to the competitive trainer. Of course, even if the sender does not respond differentially, we could maintain that he was communicating intentionally to both parties, hoping to modify the behavior of the competitive trainer by treating him as though he were not hostile. Strategies of this complexity are not unknown in humans. But we cannot defend such assumptions, even for the human case, without additional evidence. In starting the study of intentionality, we must put aside such cases.
As a first step, we need to show that the sender is sensitive to the behavior of the recipient, such that he can adjust the information he transmits more or less flexibly according to the demands of the situation. In addition, it is important to know whether he can do this only in the training situation, or more generally as well. Although in the beginning the animals behaved in the same way in the presence of both the cooperative and competitive trainers, and consequently without regard for whether or not it gained them access to the food, changes in their performance over the course of the experiment suggest the development of intentional communication. When serving as recipients, three of four subjects ultimately learned to controvert the competitive trainer’s cues by avoiding the location toward which he oriented. When serving as senders, their behavior patterns soon changed in form: some responses which provided relatively little information (e.g., changes in rocking motions) disappeared, while more explicit cues either increased in frequency (e.g., approach, glancing at a container) or appeared de novo, and in one case this latter outcome happened quite suddenly (“pointing” for Jessie). More revealing was the development of a difference in the amount of information conveyed to the two trainers. All subjects learned to convey or withhold information, depending upon whether their goal in obtaining the food for themselves was in agreement (cooperation) or at odds (competition) with that of the trainer. Thus, the chimpanzees demonstrated an ability to take into account the nature of the recipient in choosing whether or not to impart information. Finally, two subjects consistently misinformed the competitive trainer, and these instances of deceit meet the most stringent behavioral criteria for intentional communication. Since the development of intentional communication in human children depends upon a variety of maturational and experiential factors (Flavell et al., 1968), it should come as no surprise that clear evidence of this capacity appeared here after many months and numerous trials of testing, and then only in some of these young chimpanzees.

How did the subjects ultimately learn to control the transfer of information? In the production test, the first behavioral change was a suppression of information in the presence of both trainers; this outcome was attained by a variety of means.
Orientational responses either declined in frequency, were directed nondifferentially toward both containers on each trial, or were directed to a particular location on every trial (a position habit). All subjects showed at least one, and often more than one, of these methods for suppressing locational information. In the comprehension test, the first change was a decline in the animals’ tendency to inspect the container toward which the competitive trainer oriented. The subjects eventually responded without regard for the competitive trainer’s cues, by choosing the containers at random or adopting a position habit, or simply failed to respond at all. In both tests, some components of the animals’ behavior patterns were suppressed more rapidly than others. When the animals served as senders, approach and “pointing” quickly disappeared, while glances at the containers declined in frequency very slowly or not at all. As recipients, the chimpanzees often approached the container indicated by the competitive trainer at the start of a trial, but did not necessarily inspect its contents. This aspect of the results supports a common view that different response systems show different degrees of susceptibility to voluntary control (Kimble and Perlmuter, 1970).

Our youngest subject, Jessie, was the last to show evidence even for simple withholding of information in the production test, whereas the oldest animal, Sadie, was the first to show this ability. This result hints at a developmental trend in the chimpanzee’s control over its communicative behavior, a trend that may be linked to a more general change with age in behavioral inhibition. Suppression of information about the location of food was made possible, at least in part, by the subject’s ability to inhibit his behavioral predispositions, and this capacity develops relatively slowly in young organisms (Riccio, Rohrbaugh and Hodges, 1968; White, 1965). It is also possible that observational learning played some role in these behavioral changes. Although the animals had repeated opportunities to “lie” or redirect their behavior to the unbaited container, no animal did so until after having observed the competitive trainer behave in this manner. Observational learning, too, may show a developmental trend.

Some features of the design of the experiment suggest an interpretation in terms of traditional learning principles. By this view, successful performance in each test required that the subjects learn a “conditional discrimination”, i.e., several stimulus-response associations, or “if ... then ...” rules. For example, chimpanzees who deceived the competitive trainer in the production test may have simply learned “if the trainer wears green, then orient toward the baited container to receive reward” and “if the trainer wears white, then orient toward the unbaited container to receive reward”. Although associative learning may have played some role in these tests, numerous aspects of the results deviate from this kind of learning. First, consider the difference in rate of learning with the two types of trainer.
The chimpanzees responded correctly in the presence of the cooperative trainer almost from the beginning of each test, whereas the correct response in the presence of the competitive trainer was learned after hundreds of trials and one or more years of testing, if it was learned at all. Why was one “association” learned so readily (with the cooperative trainer), and the other with such difficulty (with the competitive trainer)? Lack of exposure to reward for the correct response on competitive trials cannot be the answer, for all subjects experienced numerous trials during which they oriented toward the unbaited container in the presence of the hostile trainer and thereby received food. We might explain the difficulty with the competitive trainer by noting that cue (trainer), response, and reward (food in the container) were spatially separated, an arrangement known to retard or prevent discrimination learning in many species, including primates (Meyer, Treichler and Meyer, 1965). However, the separation held for both cooperative and
competitive trials, and thus the chimpanzees’ immediate success with the cooperative trainer would remain both surprising and unexplained. Alternatively, the results might be viewed as a special case of complex discrimination learning, requiring the inhibition or redirection of a strong behavioral predisposition (e.g., approach and orient toward food) in the presence of a potent social stimulus (the hostile trainer). In contrast, traditional discrimination learning involves initially neutral cues (lights, tones) and an arbitrary, topographically simple response (keypeck, leverpress). However, this approach fails to account for an important part of the data, specifically, the response forms. Animals who learned to deceive the competitive trainer showed distinctly different response topographies in the presence of the two trainers. Contrary to traditional examples of “conditional reactions” (Carter and Werner, 1978; Lashley, 1938), behavior observed on competitive trials by the end of the experiment was never simply a redirected version of that on cooperative trials (see Figures 2 and 4). In addition, the chimpanzees who did not learn to mislead the competitive trainer, but only withheld information from him, had ample exposure to immediate reward for orienting toward the unbaited container, yet they ultimately responded in such a way as to delay reward for up to one minute. This result, too, runs contrary to predictions based on associative learning principles. The emergence of “pointing” in all animals is likewise difficult to explain in terms of learning principles. Although in some animals pointing emerged and increased in frequency over the course of many trials, suggesting a process of gradual “shaping”, this was not always the case. Jessie’s pointing response suddenly emerged at full strength quite late in training. Pointing also cannot be accounted for by a principle of “least effort”. 
All subjects eventually developed pointing, even though merely sitting by the mesh and glancing at a container had gained them access to food on a large proportion of previous trials. Rather than less effort, the development of pointing may have been related to the rich feedback it provided; a subject who extended a limb could at the same time visually monitor the direction of his response, together with the effect it had on the recipient. Thus, pointing may have furnished the subjects with greater self-awareness of the fact that their behavior was informative.

Although the data provide evidence for the chimpanzee’s ability to control the flow of locational information at its source (as sender) and at its endpoint (as recipient), most animals showed less than a complete understanding of both aspects of the communication process. For example, Luvie showed the best performance as sender of accurate information, but was initially poorest at “reading” the cooperative trainer’s accurate cues. Moreover, there was little or no correlation between the chimpanzees’ performances as sender and recipient of misleading information. Only Sadie both engaged in deception and adjusted to deception on the part of the competitive trainer. The remaining animals showed a deficiency in one or the other capacity. Luvie and Jessie adjusted to misleading cues in the comprehension test, but failed to produce misleading cues of their own. Bert showed the opposite pattern of results; he was unable to adjust his search to the competitive trainer’s misleading cues, but succeeded in deceiving him in the production test. Thus, comprehension and production developed independently in these subjects, and neither ability showed developmental priority.

The independent development of production and comprehension in the present data parallels acquisition of language in the child, as well as in the “language”-trained chimpanzee. Chimpanzees taught independent lexicons in comprehension and production initially failed to show comprehension for items learned in production, and production of items learned in comprehension. Only later were the two modalities unified such that lexical items learned in either mode transferred to the other (Premack, 1976). The child’s performance is evidently comparable. In the beginning, the child appears to comprehend words that he does not produce, and conversely, produce words that he does not respond to appropriately when they are addressed to him (Bloom, 1973; de Villiers and de Villiers, 1978). It will be of interest to see whether further data support this parallel between verbal and nonverbal communication, and whether it may be understood in terms of deeper principles. In the meantime, the parallel would appear to have negative implications for the view that language develops from general cognitive factors (e.g., Bruner, 1974/1975; Sinclair, 1971).
Surely, by the time the child acquires language, production and comprehension must be unified in the case of nonverbal communication. If so, what the child learns in the nonverbal case would seem not to benefit him in the verbal case, for when he acquires language, production and comprehension once again develop as separate competences.

On the one hand, it is important to identify the factors that contribute to the development of intentional communication (inhibition, observational learning, emergence of responses with rich feedback, and so on), but it is equally important to establish the limits of the phenomenon. How abstract does the chimpanzee’s understanding of his own communication become? Can he modulate the information he sends in a flexible manner, in circumstances that go beyond those in which he was trained? In tests conducted at the end of the present study, we addressed this question with Sadie, our oldest and most successful subject. She was able to communicate both accurate and misleading information on appropriate occasions when the location of food was changed from the horizontal dimension (left-right) to the vertical (up-down). In addition, when another hostile agent (the familiar laboratory guard dog) was substituted for the competitive trainer, she communicated misleading information to him from the start. Despite this evidence for generality, however, “pointing” has not been observed outside the testroom. The animals are observed throughout the day, and “pointing” in the presence of humans or conspecifics has never been observed in the settings which the animals share as a group (home cage and outdoor field). Observations and formal tests of this kind should serve to establish the limits of the chimpanzee’s ability to communicate intentionally.

The extent to which any species is capable of generalized intentional communication would appear to be limited by (i) its ability to exert control over the full range of its responses which convey different kinds of information, and (ii) its ability to make inferences about the motivational, perceptual, and cognitive attributes of other individuals. As regards the first factor, we presently do not know whether the chimpanzee can control behavior that imparts information about anything other than location. Although apes can also convey the relative quantity or quality (e.g., food versus snake) of hidden objects (Menzel, 1971), it is not known whether they do so intentionally. Indeed, to the extent that the behavior which communicates quantity or quality is predominantly autonomic or affective (e.g., piloerection, species-specific vocalizations, facial expressions, see Miller, 1967), intentional communication of this type of information may prove more difficult than that of location. (Still more difficult than the suppression of affect is its simulation: witness the training stage actors must have in order to be able to feign a convincing emotional reaction on demand (Stanislavski, 1961).)
With respect to the second factor, the ability to make inferences about others, data on the human child show a developmental trend in his ability to take on another’s “point of view” in communication situations (Flavell, 1974; Flavell et al., 1968). Previous work in our laboratory suggests that a chimpanzee as well can make some kinds of inferences, for example, about the purposes of another individual (Premack and Woodruff, 1978). Further comparative research on the development and interaction of these factors may be an especially revealing approach to understanding the large gap in complexity between the communication systems of humans and other species.

References

Altmann, S. A. (1967) Social communication among primates. Chicago, Chicago University Press.
Bloom, L. M. (1973) One word at a time: the use of single word utterances before syntax. The Hague, Mouton.
Bruner, J. (1974/1975) From communication to language: a psychological perspective. Cog., 3, 255-287.
Carter, D. E., and Werner, T. J. (1978) Complex learning and information processing by pigeons: a critical analysis. J. exper. Anal. Behav., 29, 565-601.
de Villiers, J. G., and de Villiers, P. A. (1978) Language acquisition. Cambridge, Massachusetts, Harvard University Press.
Flavell, J. H. (1974) The development of inferences about others. In T. Mischel (Ed.), Understanding other persons. Oxford, England, Blackwell, Basil, and Mott.
Flavell, J. H., Botkin, P. T., Fry, C. L., Wright, J. W., and Jarvis, P. E. (1968) The development of role-taking and communication skills in children. New York, Wiley.
Grant, D. A. (1947) Additional tables of the probability of “runs” of correct responses in learning and problem-solving. Psychol. Bull., 44, 216-279.
Grice, H. P. (1967) Logic and conversation. William James Lectures, Harvard University. In P. Cole and J. L. Morgan (Eds.), Studies in syntax. (Vol. 3) New York, Academic Press, 1975.
Hinde, R. A. (1972) Non-verbal communication. London, Cambridge University Press.
Kimble, G. A., and Perlmuter, L. C. (1970) The problem of volition. Psychol. Rev., 77, 361-384.
Kohler, W. (1925) The mentality of apes. New York, Harcourt, Brace.
Lyons, J. (1972) Human language. In R. A. Hinde (Ed.), Non-verbal communication. London, Cambridge University Press, pp. 49-85.
MacKay, D. M. (1972) Formal analysis of communicative processes. In R. A. Hinde (Ed.), Non-verbal communication. London, Cambridge University Press, pp. 3-25.
Marler, P. (1965) Communication in monkeys and apes. In I. De Vore (Ed.), Primate behavior. New York, Holt, Rinehart, and Winston, pp. 544-584.
Marshall, J. C. (1970) The biology of communication in man and animals. In J. Lyons (Ed.), New horizons in linguistics. Harmondsworth, England, Penguin.
Menzel, E. W. (1971) Communication about the environment in a group of young chimpanzees. Folia Primatologica, 15, 220-232.
Menzel, E. W. (1974) A group of young chimpanzees in a one-acre field. In A. M. Schrier and F. Stollnitz (Eds.), Behavior of nonhuman primates. (Vol. 4) New York, Academic Press, pp. 83-153.
Meyer, D. R., Treichler, F. R., and Meyer, P. M. (1965) Discrete-trial training techniques and stimulus variables. In A. M. Schrier, H. F. Harlow, and F. Stollnitz (Eds.), Behavior of nonhuman primates: Modern research trends. (Vol. 1) New York, Academic Press.
Miller, R. E. (1967) Experimental approaches to the physiological and behavioral concomitants of affective communication in rhesus monkeys. In S. A. Altmann (Ed.), Social communication among primates. Chicago, Chicago University Press, pp. 125-134.
Premack, D. (1976) Intelligence in ape and man. Hillsdale, New Jersey, Erlbaum.
Premack, D., and Woodruff, G. (1978) Does the chimpanzee have a theory of mind? The Behavioral and Brain Sciences, 1, 515-526.
Riccio, D. C., Rohrbaugh, M., and Hodges, L. A. (1968) Developmental aspects of passive and active avoidance learning in rats. Develop. Psychobiol., 1, 108-111.
Simmons, K. E. L. (1951) The nature of the predator-reactions of breeding birds. Behav., 4, 161-171.
Sinclair, H. (1971) Sensorimotor action patterns as a condition for the acquisition of syntax. In R. Huxley and E. Ingram (Eds.), Language acquisition: models and methods. New York, Academic Press.
Smith, W. J. (1977) The behavior of communicating: an ethological approach. Cambridge, Massachusetts, Harvard University Press.
Stanislavski, C. (1961) Creating a role. New York, Theatre Arts Books.
van Lawick-Goodall, J. (1971) In the shadow of man. Boston, Houghton-Mifflin.
White, S. H. (1965) Evidence for a hierarchical arrangement of learning processes. In L. P. Lipsitt and C. C. Spiker (Eds.), Advances in child development and behavior. (Vol. 2) New York, Academic Press, pp. 188-220.

The study concerns communication between chimpanzees and humans aimed at locating a hidden object. Each member of the chimpanzee-human pair was alternately “sender” and “recipient” of the information. When the human cooperates with him in searching for the object, the chimpanzee very quickly produces and comprehends the behavioral cues that allow the target to be located precisely. When the human competes with him in searching for this target, the chimpanzee learns on the one hand to withhold information from the recipient or to deceive him, and on the other hand to disregard the misleading cues supplied by the sender. The chimpanzees’ capacity to supply and to use correct as well as false information, taking into account the nature of the recipient and of the sender, is evidence in favor of a capacity for intentional communication in nonhuman primates.

Cognition, 7 (1979) 363-383
©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Syntactic presupposition in sentence comprehension*

J. LANGFORD and V. M. HOLMES
Department of Psychology, University of Melbourne, Parkville, Australia**

Abstract

Two experiments investigated the role of syntactic presupposition in sentence comprehension. In Experiment I subjects verified cleft, pseudocleft and factive complement sentences with respect to preceding context paragraphs, which contradicted either the assertion or the presupposition of the target sentence. Subjects took significantly longer to verify sentences with false presuppositions than sentences with false assertions. In Experiment II subjects verified cleft and pseudocleft sentences with respect to subsequently presented pictures. Once again, verification times for sentences with false presuppositions were significantly longer than verification times for sentences with false assertions. It was argued that these findings are more adequately explained by a “structural” hypothesis than in terms of strategies designed to locate given and new information.

Introduction

Although a large body of psycholinguistic research has been devoted to the study of sentences in isolation, it is now widely recognized that any approach which ignores the role of context is severely limited. One way of formulating the relationship between sentences and their contexts is in terms of their presuppositional content. For this reason the phenomenon of presupposition has received considerable attention from psychologists concerned with sentence comprehension (e.g., Haviland and Clark, 1974; Hornby, 1974), sentence memory (e.g., Offir, 1973; Singer, 1976; Hupet and Le Bouedec, 1977) and sentence production (e.g., Osgood, 1971; Bock, 1977). Similarly, by studying presupposition, the present research aimed to further elucidate the mechanisms by which sentences are understood in context. Our particular interest was in the type of presupposition which is created by the sentence’s surface structure.

*This research was partly supported by an Australian Research Grants Committee award to V. M. Holmes.

**Requests for reprints should be addressed to: J. Langford, Department of Psychology, University of Melbourne, Parkville, Victoria 3052, Australia.

A syntactic presupposition may be identified as that part of a sentence’s meaning which is not affected by negation of the sentence. It may be distinguished from the focus, the part of the sentence falling within the scope of negation, and from the assertion, the message produced by the focus in combination with the presupposition. In both sentences (1) and (2), the presupposition is John embroidered something and the focus is the napkin. The assertions of the two sentences concern whether the napkin was or was not embroidered by John. The relationship of a sentence to a given context may be specified in terms of the nature of the information contained within the presupposition and focus. A contextually appropriate sentence is one which presupposes established or given information and which focusses new or contrasting information. For example, in the context of the question What did John embroider? (1) and (2) would be appropriate replies (though (2) is unhelpful), while (3) and (4) would be inappropriate. Because of its obvious association with contextual antecedents, presupposition is often referred to as given information, while focus and assertion are referred to as new information.

(1) It was a napkin that John embroidered.
(2) It was not a napkin that John embroidered.
(3) It was John who embroidered a napkin.
(4) The one who embroidered a napkin was John.

The most fully developed account of the role of presupposition in the comprehension process is the Given-New strategy of Haviland and Clark (1974). This strategy, which Clark and Haviland (1977) have characterized as “a three step procedure for relating the current sentence to ... [a] knowledge base”, involves the following stages: “At Step 1, the listener isolates the given and the new information in the current sentence. At Step 2, he searches memory for a direct antecedent, a structure containing propositions that match the given information precisely. Finally, at Step 3 the listener integrates the new information into the memory structure by attaching it to the antecedent found in Step 2” (Clark and Haviland, 1977, p. 5). The Given-New strategy was originally based on a series of experiments which investigated the processing of sentences containing lexical presuppositions (i.e., presuppositions produced by individual word meanings rather than by syntactic structure). Haviland and Clark (1974) found that these sentences were understood more rapidly when preceded by a context sentence which established a direct, as opposed to indirect, antecedent for the presupposition. Thus, the test sentence (7) was understood faster when it was preceded by the context sentence (5) than when it was preceded by (6).

Syntactic presupposition in sentence comprehension

(5) (6) (7)

365

Ed was given an alligator for his birthday. Ed wanted an alligator for his birthday. The alligator was his favorite present.

These results may be taken as evidence that people find it more difficult to integrate a sentence with its context when the sentence’s presuppositions are not established directly by the context, and that some additional inferential processing is necessary to understand such sentences. However, because they only considered the processing of unfulfilled presuppositions, Haviland and Clark have not directly established that asserted and presupposed information are processed differently. It would seem likely that if the sentence’s assertion did not follow directly from the context, then a similar increase in comprehension time would be observed. For example, in the context of (8), sentence (9) would follow directly but (10) would not, even though in both cases the presupposition is fulfilled.

(8) Ed wanted an alligator for his birthday and was given one.
(9) The alligator was his favourite present.
(10) The alligator was his worst present.

Presumably, (10) would take longer to integrate with the context because it would require an additional bridging inference, for example, that Ed changed his mind about alligators. Since there is no evidence in Haviland and Clark’s experiment that new information is treated any differently from given, their Given-New model remains unsubstantiated. Quite a different model has been proposed to describe how people process presupposition and focus when verifying sentences. Presumably, assigning truth to a statement might introduce different strategies from those used in comprehension without verification. Hornby (1974) and Clark and Clark (1977) have suggested that, because speakers generally assign presupposition and focus appropriately, listeners are likely to assume that the presupposition is true (since normally this contains information they already know) and to examine more critically the focus, where new information is normally located. Thus, while the Given-New strategy suggests that the listener first corroborates the presupposition and then proceeds to assimilate the assertion, Hornby’s account suggests that the listener critically examines the focus while taking for granted the truth of the presupposition. The evidence for this model, which might perhaps be designated the New-Given strategy, is not particularly convincing. Hornby’s experiment investigated the processing of syntactic, rather than lexical, presupposition in a sentence verification task. In this task each acoustically presented sentence was followed by the tachistoscopic exposure of a picture. Hornby found that more errors were


made in recognizing a discrepancy between sentence and picture when the discrepancy involved a presupposed noun than when it involved a focussed noun. While this result appears to demonstrate a differential effect of presupposition and focus, it is open to alternative interpretations. Firstly, the extremely short presentation time ensured that not more than a single aspect of the picture could be attended to. In this situation all that the observed difference indicates is that subjects tended to examine focussed information first. This result does not, however, directly establish that subjects were “taking for granted” the truth of the presupposition, since there may have been no time left to examine the presupposed information. More importantly, in the experiment sentential focus coincided with the locus of heaviest stress. It is therefore quite possible that the superior recognition of discrepant focussed information was due to its dominant acoustic trace in short-term memory, rather than to a selective search for new information. The marking of focus by acoustic stress also leaves open the question of whether it is the syntax of the sentence that was used as a basis for distinguishing asserted and presupposed information.

It seems, then, that the evidence does not unequivocally implicate syntax as a means by which people distinguish between presupposition and assertion in sentence comprehension. Nor does either of the hypothesized Given-New or New-Given strategies have very strong supporting evidence. The experiments reported below thus aimed to determine whether the structural distinction between presupposition and assertion really is utilized in the processing of sentences in context. They also aimed to evaluate the relevance of the given-new distinction to the comprehension process.

Experiment I

Most previous studies concerned with presupposition have examined the performance consequences of presupposition failure. Similarly, this experiment was designed to compare the processing of contradicted presuppositions with the processing of contradicted assertions. The comprehension task chosen was a paragraph-sentence verification task. With this task it was possible to set up the two experimental conditions by constructing for each test sentence two contexts -- one contradicting the sentence’s assertion and the other the sentence’s presupposition. By comparing the verification times for a given sentence in the two context conditions it was hoped to determine whether presupposition and assertion are in fact processed differently. In order to avoid any confounding between acoustic salience and structural marking of presupposition and assertion, all sentences were visually presented. To ensure that the results would not be limited to any one particular sentence structure,


two groups of sentences were used. One group comprised cleft and pseudocleft sentences and the other, factive complement sentences.

Because the task in the present experiment necessitated relating a target sentence back to a preceding paragraph, the Given-New strategy would seem an appropriate model of the processing involved. Yet, since the task was one of verification, the New-Given strategy is also applicable. The two models predict rather different outcomes of the experiment. Subjects using a Given-New strategy would first search memory for information corresponding to the presupposition in the target sentence and then would proceed to verify the assertion. These subjects would presumably detect information contradicting the presupposition before they would detect information contradicting the assertion. The Given-New strategy, therefore, would predict longer verification times for items with false assertions. Subjects using a New-Given strategy, on the other hand, would tend to ignore the presupposed information, assuming it to be true, and would selectively search memory for information relevant to the assertion. These subjects would be expected to succeed in detecting false assertions but highly likely to overlook false presuppositions. The New-Given strategy, then, would predict more errors on items with false presuppositions than on items with false assertions.

Materials and design

Materials consisted of 24 true and 24 contradicted, or false, items. Each set contained 12 cleft-pseudocleft sentences and 12 factive complement sentences. To control for the order in which the target sentence mentioned assertion and presupposition, two target versions were constructed for each item. One version, either a cleft or an object complement sentence, mentioned the assertion first and the other version, a pseudocleft or a subject complement, mentioned the presupposition first. Examples of target versions mentioning the assertion first are It was the coffee that ruined our carpet and The keeper was annoyed by their feeding the monkeys. Target versions which mention the presupposition first are What ruined our carpet was the coffee, and Their feeding the monkeys annoyed the keeper.

The major experimental manipulation was achieved by constructing two contexts for each pair of target sentences. The two contexts were similar in length and content, but differed in that one contained information which contradicted the assertion in the target sentence while the other contained information which contradicted the target’s presupposition. There were thus four related versions of a given item. For the cleft-pseudocleft items, both inconsistencies involved the nouns in the target, while in the factive complements the inconsistencies involved the verbs in the target. To control for


where in the context the discrepant information occurred, half of the items were constructed so that information relevant to the assertion was mentioned last and the other half were constructed so that information relevant to the target’s presupposition was mentioned last. The 24 true items were as similar as possible to the false items. Table 1 shows examples of the context and target conditions for two false items used in Experiment I.

In sum, the item design consisted of one between-items factor, referring to whether the last information in the context related to the assertion or the presupposition (Context Order). There were also two within-item factors, one referring to the type of proposition, assertion or presupposition, contradicted by the context (Proposition Type), and one referring to whether the target sentence mentioned the assertion or the presupposition first (Target Order). To prevent subjects seeing more than one of the four versions of a given item, four lists were prepared containing one of each of the four conditions obtained from crossing Target Order and Proposition Type. The assignment of conditions was systematically varied so that the four lists contained an equal number of items in each condition. The same random ordering of true and false items was used for the four lists. Each list was given to an independent group of subjects. Thus, the subject design included a between-subjects factor (Group), as well as three within-subject factors (Context Order, Proposition Type and Target Order). As well as performing subject and item analyses of variance, minimum F’ was calculated in order to permit simultaneous generalization to new subject and new item populations (cf. Clark, 1973). The level of significance for all statistical decisions was set at α = 0.05.

Procedure

All stimulus materials were presented on the oscilloscope terminal of a PDP11 computer. Subjects began each trial by pressing a button marked Go. A context paragraph appeared on the screen, which subjects had to read carefully, taking as much time as they needed. They then pressed the Go button again and a target sentence appeared on the screen. The subjects’ task was to decide as quickly and accurately as possible whether the target sentence was consistent or not consistent with the context, and then to press a Yes or a No button accordingly. The terms “consistent” and “inconsistent” were used in preference to “true” and “false” because of the logical problem that sentences with false presuppositions cannot themselves be false. The time taken to read the context and the time taken to verify the target sentence were measured to the nearest millisecond. Each subject received six practice trials during which the experimenter provided feedback about the correctness of each response.
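The counterbalancing of item versions across lists, described under Materials and design, can be illustrated with a short sketch. The paper states only that the assignment of conditions was “systematically varied”; the simple rotation used here, and the names CONDITIONS and build_lists, are assumptions made for illustration and are not the authors’ procedure.

```python
from collections import Counter
from itertools import product

# The four versions of an item: Target Order crossed with Proposition Type.
CONDITIONS = list(product(
    ("assertion first", "presupposition first"),
    ("false assertion", "false presupposition"),
))


def build_lists(n_items=24, n_lists=4):
    """Assign one version of every item to each list by simple rotation.

    Hypothetical scheme: item i appears on list k in condition (i + k) mod 4.
    """
    lists = {k: [] for k in range(n_lists)}
    for item in range(n_items):
        for k in range(n_lists):
            condition = CONDITIONS[(item + k) % len(CONDITIONS)]
            lists[k].append((item, condition))
    return lists


lists = build_lists()
for k, entries in lists.items():
    # Number of items per condition on list k.
    counts = Counter(condition for _, condition in entries)
    print(k, sorted(counts.values()))
```

Under this assumed rotation each of the four lists holds six of the 24 false items in each condition, and no list contains the same item in more than one version.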

Table 1. Examples of contexts and targets for false items from Experiment I

Cleft-pseudocleft item

False Assertion Context: Jane and Mary are flatmates. They get on well together but often find themselves bored in the evenings. They already have a radio but Mary would like to buy a television as well.

False Presupposition Context: Jane and Mary are flatmates. They get on well together but often find themselves bored in the evenings. They already have a television but Jane would like to buy a radio as well.

Target Sentences:
a) It is Jane who wants to get a television.
b) The one who wants to get a television is Jane.

Factive complement item

False Assertion Context: Linda’s maths teacher is a defensive, discouraging person. He likes to prove his superiority by giving his students problems which are too hard for them. He was quite cross today when Linda, the brightest in the class, managed to solve the problem he set.

False Presupposition Context: Linda’s maths teacher is a defensive, discouraging person. He likes to prove his superiority by giving his students problems which are too hard for them. He was delighted today when not even Linda, the brightest in the class, could solve the problem he set.

Target Sentences:
a) Linda’s teacher was delighted that she could solve the problem.
b) The fact that Linda could solve the problem delighted her teacher.

Note: Target a) mentions the assertion first and b) the presupposition first.

Subjects

Forty undergraduate students at the University of Melbourne were paid for participating in the experiment. All were native speakers of English.

Results and discussion

The mean and standard deviation of each subject’s response distribution were calculated and, in order to minimize the influence of exceptionally long or short times, any observed verification time which exceeded two standard deviations from the mean was set at that value. This procedure affected 5.5% of verification times for false items and 3.8% of verification times when true and false items were combined. Data for incorrect responses were excluded from the verification time analyses.

Table 2 shows the means for the adjusted verification times for the test (inconsistent) items. In the analysis of variance on these means, the main effect of Proposition Type was highly significant, with F1(1,36) = 47.34, F2(1,22) = 16.21 and min F’(1,37) = 12.08, showing that sentences with false assertions were verified significantly faster than sentences with false presuppositions. The main effect of Target Order was significant by subjects, with F1(1,36) = 8.24. However, this effect was not significant in the item analysis, with F2(1,22) = 1.33, and therefore this result cannot be considered typical of all items. From inspection of Table 2, it can be seen that false assertions were detected faster when they appeared first rather than second in the target sentence (a difference of 176 milliseconds) but false presuppositions were detected no faster when the presupposition was first (a reverse difference of 28 milliseconds). However, this interaction between Proposition Type and Target Order did not approach significance in either the subject or the item analyses. Neither the main effect of Context Order, nor any of the other possible interaction effects, approached significance in either the subject or the item analyses.

Table 2. Mean verification times in milliseconds for false items in Experiment I

                       Target Order
                       Assertion First                        Presupposition First
                       Context Order                          Context Order
Type of Proposition    Assertion Last   Presupposition Last   Assertion Last   Presupposition Last
False Assertion        2597             2462                  2629             2782
False Presupposition   3047             2912                  3039             3036

In order to compare the two types of target sentence structure, an analysis contrasted the verification times for cleft-pseudocleft and for factive complement sentences. There was no evidence that these two structural types differed in overall verification time as the main effect of Sentence Type was not significant, with F1(1,36) = 2.96 and F2 < 1. Nor did Sentence Type interact significantly with either of the other factors in the analysis, Proposition Type and Target Order.

In a further analysis the means of the test items were compared with the means of the distractor items where the required response was consistent. The means for correct false and true items were 2826 milliseconds and 3387 milliseconds respectively. False items were verified significantly faster than true items, with F1(1,36) = 53.59, F2(1,46) = 25.95 and min F’(1,78) = 17.48.

Analyses were also performed on the mean numbers of errors. Table 3 shows the mean percentage error for the test items. A trend was observed for there to be more errors when presuppositions were falsified than when assertions were falsified, although inspection of Table 3 reveals that this effect differed in magnitude for the four order conditions, being entirely absent when the assertion was last in the context and the presupposition first in the target. In the subject analysis there were significant main effects of Proposition Type, F1(1,36) = 12.25, and of Context Order, F1(1,36) = 7.20, and significant interactions between Proposition Type and Context Order, F1(1,36) = 6.33, and between Proposition Type, Target Order and Context Order, F1(1,36) = 4.65. However, not one of these effects approached significance in the item analysis, suggesting that they would not be generalizable to another set of items. That the main effect of Proposition Type was not representative of the items used was confirmed by inspection of the individual item means; only 7 out of the 24 false items exhibited the effect.

Table 3. Mean percentage of errors for false items in Experiment I

                       Target Order
                       Assertion First                        Presupposition First
                       Context Order                          Context Order
Type of Proposition    Assertion Last   Presupposition Last   Assertion Last   Presupposition Last
False Assertion        5.8              5.0                   6.7              5.0
False Presupposition   18.3             11.6                  4.2              9.2

The means of the context inspection times for the two experimental conditions, i.e., for the false assertion and the false presupposition conditions, were 15,827 milliseconds and 15,669 milliseconds respectively. These were not significantly different in either the subject or the item analysis, with F1 < 1 and F2 < 1.

The major finding of this experiment was that verification times for items with false assertions were significantly faster than verification times for items with false presuppositions. There was also a tendency for there to be more


errors on false presupposition items than on false assertion items, although this was only true of a subset of the items. The non-significant interaction between Sentence Type and Proposition Type suggests that the assertion-presupposition distinction was created just as strongly by the factive complement sentence structures as by the clefts and pseudoclefts. The absence of an interaction between which proposition was false and which was mentioned first in the target sentence rules out the possibility that subjects simply verified sentences in left to right sequence. The non-significant interaction between Proposition Type and Context Order indicates that verification times were no faster for items where the discrepant proposition was mentioned last in the context, suggesting that, at least in this task, recency of mention did not systematically affect the salience of contextual antecedents in working memory.

A possible weakness of Experiment I is that the experimental contrast necessitated a comparison between verification times for quite different fact-counterfact pairs. It is conceivable that the contradictions involved in the false presuppositions happened for some reason to be more difficult to detect than those involved in the false assertion condition. To determine whether or not this was the case, a control experiment was run where each target sentence was separated into two simple “component” sentences. For example, a factive complement sentence from Experiment I, Basil’s failing physics upset his parents, was separated into Basil failed physics (the presupposition component) and Basil’s parents were upset (the assertion component). Cleft-pseudocleft sentences were separated by inserting indefinite pronouns. For example, It was the coffee that ruined our carpet was separated into The coffee ruined something (the assertion component) and Something ruined our carpet (the presupposition component). The procedure of the control experiment was identical to that of Experiment I. Each component sentence appeared with the Experiment I context which contradicted it. It was found that component sentences which had originally been presuppositions were no more difficult to verify than component sentences which had originally been assertions, with F1(1,18) = 3.91 and F2 < 1. In fact the difference between the means for the two conditions was in the opposite direction. The results of this control experiment therefore rule out the possibility that the outcome of Experiment I was simply due to a confounding of difficulty of contradiction with type of proposition contradicted.

One other finding that deserves comment at this point is the fact that true items in Experiment I took significantly longer to verify than false items. This result is atypical of the general finding in verification tasks that true responses, at least for explicitly affirmative sentences, are faster than false


responses (e.g., Clark and Chase, 1972). A simple explanation of this finding is that the true items necessitated an exhaustive search of the context representation, whereas the false items permitted the search to be terminated as soon as a discrepancy was located. A further, artifactual reason for the finding may have been that subjects had difficulty in deciding whether some supposedly equivalent expressions were actually consistent or not. In fact, in some of the true items, the expression in the target sentence was more general than the corresponding expression in the context. Subjects reported that they sometimes found it difficult to decide whether terms such as “oyster” and “seafood”, “godfather” and “man”, and “relations with China” and “foreign policy” were meant to be consistent.

Experiment II

This experiment was designed to investigate whether the assertion-presupposition effect obtained in Experiment I would also be present when target sentences are processed in the absence of prior context. Presumably, any view based on the idea that people use the marking of assertion and presupposition as directions to new and given information would predict that the assertion-presupposition effect would not be present in such a situation. People would adjust to a situation where there is no prior context (and ipso facto, no given information) and would have no need to distinguish structurally between assertion and presupposition.

A verification task was again employed but this time with the order of target sentence and context reversed. Accordingly, subjects were required to judge the relevance of a picture to a previously presented sentence. Picture, rather than paragraph, contexts were used, on the assumption that a nonverbal context would provide minimal interference with memory for the target sentence. The sentences were all clefts and pseudoclefts, factive complements being excluded because their semantic content proved too difficult to depict unambiguously.

Materials and design

There were 16 true and 16 false items. For each item there was one picture context and four possible sentence structures: cleft agent, cleft object, pseudocleft agent and pseudocleft object. Since all the target sentences were clefts or pseudoclefts, the inconsistency between picture and target always involved a noun. For any given false item, the discrepancy involved the same noun in all treatment conditions. As in Experiment I, the cleft and pseudocleft structures controlled for whether the target mentioned the assertion or


the presupposition first. The agent and object sentences allowed the discrepant noun to be either focussed or presupposed. Table 4 gives examples of the target sentences used for two of the test items in Experiment II.

Pictures were simple black line drawings. Half of the false items were constructed so that the discrepancy involved the logical subject of the action in the picture and the other half were constructed so that the discrepancy involved the logical object of the depicted action. To ensure that all lexical presuppositions were fulfilled, the discrepant noun was always present somewhere in the picture. For instance, in the first example in Fig. 1, the discrepant noun woman is depicted, but not in the appropriate relationship with the cupboard. In the pictures for the true distractor items there was always a third irrelevant object present, to prevent these items being noticeably different from the false items. Figure 1 shows the pictures which were used for the two false items exemplified in Table 4.

To summarize, for the test items the design consisted of one between-items factor (Role of False Entity) and two within-item factors (Proposition Type and Target Order). Role of False Entity referred to whether the discrepant object was the logical subject or the logical object of the depicted action. As in Experiment I, Proposition Type referred to whether the target sentence asserted or presupposed the discrepant noun and Target Order referred to whether the target sentence mentioned the assertion or the presupposition first. Again, all four treatment conditions for each item (corresponding to the four sentence structures) were assigned to different lists and the assignment was varied over the items so that overall each list contained the same number of each sentence type. The same random ordering of true and false items was used for the four lists, which were given to four independent groups of subjects. The inspection times for all items (i.e., the times taken to read the target sentence) were classified according to two factors, Structure and Case of Clefted Noun. Structure referred to whether the sentence was a cleft or a pseudocleft structure and Case of Clefted Noun referred to whether the sentence asserted the logical subject or the logical object. Once again, subject and item means were analysed, and min F’ was calculated for all analyses.

Table 4. Target sentences for two false items from Experiment II

a) A false logical subject item
Cleft agent: It’s the woman who is pushing the cupboard.
Cleft object: It’s the cupboard that the woman is pushing.
Pseudocleft agent: The one who is pushing the cupboard is the woman.
Pseudocleft object: What the woman is pushing is the cupboard.

b) A false logical object item
Cleft agent: It’s the man who is washing the floor.
Cleft object: It’s the floor that the man is washing.
Pseudocleft agent: The one who is washing the floor is the man.
Pseudocleft object: What the man is washing is the floor.

Figure 1. Picture contexts for (a) a false logical subject item and (b) a false logical object item in Experiment II.

Procedure

The stimuli, which were black and white transparencies, were projected onto a light grey wall by a carousel projector. Subjects were seated at a response table. On each trial they pressed an Advance button to bring on the target sentence, and then, when ready, pressed it again to bring on the picture. Subjects then had to decide whether the picture and target were consistent or not, and to press the Yes or the No button accordingly. A digital printout timer, connected to a photocell in the projector, recorded to the nearest millisecond the times taken for inspection of the target sentence and the verification time from the onset of the picture. At the beginning of each session there were seven practice trials during which the experimenter provided feedback as to the correctness of the subjects’ responses.

Subjects

Forty undergraduate students at the University of Melbourne were paid for participating in the experiment. All were native speakers of English.


Results and discussion
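The cut-off procedure applied to the verification times (any time lying more than two standard deviations from a subject’s mean is set to that boundary value) can be sketched as follows. This is a minimal illustration rather than the authors’ program: the function name clamp_times, the sample times and the use of the sample standard deviation are all assumptions, since the paper does not say which estimator was used.

```python
from statistics import mean, stdev


def clamp_times(times, n_sd=2.0):
    """Set times beyond n_sd standard deviations from the mean to that bound.

    `times` holds one subject's verification times in milliseconds.
    """
    m = mean(times)
    sd = stdev(times)  # sample SD; an assumption, not stated in the paper
    low, high = m - n_sd * sd, m + n_sd * sd
    return [min(max(t, low), high) for t in times]


# Hypothetical times for one subject, with one exceptionally long response.
times = [2400, 2450, 2500, 2550, 2600, 2650, 2700, 2750, 2800, 9000]
adjusted = clamp_times(times)
print(adjusted[-1])  # the long time is pulled back to mean + 2 SD
```

Only the extreme value changes here; times within two standard deviations of the subject’s mean are left as observed.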

The cut-off procedure described above was used in each verification time analysis. This affected 4.6% of verification times for false items and 5.5% of times for true and false items combined. Table 5 shows the means of the adjusted verification times for false items in Experiment II. In the analysis of variance, the main effect of Proposition Type was significant, with F1(1,36) = 19.58, F2(1,14) = 31.57 and min F’(1,48) = 12.09. Sentences with false assertions were verified significantly faster than sentences with false presuppositions. The main effects of Target Order and Role of False Entity were non-significant in both the subject and item analyses, as were all the possible interactions between the three factors. The mean verification times for false and true items were 1226 milliseconds and 1195 milliseconds respectively. These were not significantly different, with F1(1,36) = 2.20 and F2(1,30) < 1. Table 6 shows the means of the percentage error for false items. Analyses of the mean numbers of errors revealed that, although there was a tendency, once again, for there to be more errors on false presupposition items than on false assertion items, this difference was not significant, with F1(1,36) = 3.00 and F2(1,14) = 2.74. None of the other main or interaction effects was significant in either the subject or the item analysis. A preliminary analysis revealed that the means of the target inspection times for true and false items, which were 2146 milliseconds and 2344 milliseconds respectively, were not significantly different in either the subject or the item analysis. Thus, true and false inspection times were combined, the means being presented in Table 7. In the analyses, the main effect of Structure was significant by subjects, with F1(1,36) = 6.54, and by items, with F2(1,31) = 4.44, but min F’ failed to reach significance, with min F’(1,62) = 2.63.
There was a strong trend, therefore, for cleft sentences to be processed faster than pseudocleft sentences. The interaction between Structure and Case of Clefted Noun was significant in the subject analysis, with F1(1,36) = 5.68, but not by items, with F2 < 1. This interaction was a cross-over, whereby cleft agent sentences were inspected faster than cleft object sentences, but pseudocleft agent sentences were inspected slower than pseudocleft object sentences.

Table 5. Mean verification times in milliseconds for false items in Experiment II

                       Role of False Entity
                       Logical Subject                          Logical Object
                       Target Order                             Target Order
Type of Proposition    Assertion First   Presupposition First   Assertion First   Presupposition First
False Assertion        1208              1142                   1145              1131
False Presupposition   1321              1267                   1297              1356

Table 6. Mean percentage error on false items in Experiment II

                       Role of False Entity
                       Logical Subject                          Logical Object
                       Target Order                             Target Order
Type of Proposition    Assertion First   Presupposition First   Assertion First   Presupposition First
False Assertion        5.00              2.50                   1.25              3.75
False Presupposition   7.50              3.75                   5.00              5.00

Table 7. Mean inspection times in milliseconds for all items in Experiment II

                       Sentence Structure
Case of Clefted Noun   Cleft   Pseudocleft
Agent                  2100    2244
Object                 2162    2175

Experiment II has demonstrated that, even when sentences have no prior context, and therefore contain no given information, they are represented in a form which distinguishes between assertion and presupposition. Sentences which presupposed discrepant information took significantly longer to verify than sentences which asserted it. In contrast with Hornby’s findings, the overall error rate for false presupposition items was low, 5.3%. Furthermore, there was no significant difference between the numbers of errors made on items with false presuppositions and on items with false assertions.
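The min F’ values reported alongside the subject (F1) and item (F2) analyses can be recomputed from the published F ratios with the formula given by Clark (1973). The sketch below is illustrative only; the function name min_f_prime and its argument names are assumptions introduced here.

```python
def min_f_prime(f1, df1_error, f2, df2_error):
    """Return (min F', denominator df) following Clark (1973).

    f1, df1_error: F ratio and error df from the by-subjects analysis.
    f2, df2_error: F ratio and error df from the by-items analysis.
    """
    value = (f1 * f2) / (f1 + f2)
    # Denominator df; each squared F is divided by the other analysis's error df.
    df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return value, df


# Proposition Type in Experiment II: F1(1,36) = 19.58, F2(1,14) = 31.57.
value, df = min_f_prime(19.58, 36, 31.57, 14)
print(round(value, 2), round(df))
```

With the rounded F ratios reported for Proposition Type in Experiment II this gives a value of about 12.08 on roughly 48 degrees of freedom, against the reported min F’(1,48) = 12.09; the small discrepancy presumably reflects rounding in the published F ratios.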


A superior feature of Experiment II was that the two experimental conditions involved exactly the same contradiction between sentence and picture. Therefore the difference in verification times can only be attributed to differences between the target surface structures in the two conditions. The strength of the assertion-presupposition effect is quite remarkable in view of the unlimited inspection time, and of subjects’ own impressions that they were merely recoding the target sentences into a simple form. The fact that target inspection times for true and false items did not differ suggests that subjects could not have anticipated whether an item would be true or false on the basis of the target sentence alone.

When true and false inspection times were pooled and analysed in terms of surface structure features, it was found that cleft sentences were processed more rapidly than pseudocleft sentences. In addition, the interaction effect suggested that, at least for some items, sentences which mentioned the logical subject, verb and logical object in that order tended to be processed faster than the sentences with non-S-V-O orders. Thus, cleft agent sentences, It is the S that is V-ing the O, were inspected on average faster than cleft object sentences, It is the O that the S is V-ing, and pseudocleft object sentences, What the S is V-ing is the O, were inspected faster than pseudocleft agent sentences, The one that is V-ing the O is the S. The fact that inspection times tended to be sensitive to the different ways of expressing the same basic meaning justifies the removal of time constraints from the processing of the target sentence. If only a limited amount of time is allowed, as was the case in Hornby’s experiment, then more complex surface structures may be encoded less adequately. In the present experiment, it may be fairly safely assumed that by the verification phase of the trial, the four surface structure types had been encoded in equivalent form. This is borne out by the absence of any effect of surface structure type per se on subsequent verification times and error rates.

As in Experiment I, neither of the control factors was related to verification time. Thus there was no evidence either of a serial left-to-right verification strategy, or of any primacy or recency effects on memory for the target sentence. Similarly, the results ruled out the possibility that subjects were using a systematic strategy to search the picture for logical subject before logical object. There was no difference between verification times for items with discrepant logical subjects and for items with discrepant logical objects.

In contrast with Experiment I, verification times for true and false items were not significantly different. There are several aspects of the procedure in Experiment II which might explain this result. Firstly, the picture contexts were simpler than the paragraphs, and thus any exhaustive search in the true items would end much sooner. Secondly, the pictures were present in front of the subject, rather than being held in memory, so that any search involved


would be much more efficient. Finally, the confirming instances were less equivocal in Experiment II, where there were no problems with the intended equivalence of sentences and pictures. Thus a search of the picture could be terminated by either a confirming or a disconfirming instance.

General discussion

The major finding of the experiments reported was that sentence verification times were significantly longer when a discrepancy between target sentence and context was located in the syntactic presupposition than when the discrepancy was in the assertion, a result which has not been demonstrated before. As was pointed out above, previous studies of presupposition have either failed to make a direct comparison between the processing of assertion and presupposition, or have made the comparison but have confounded the syntactic distinction between assertion and presupposition with other nonsyntactic factors such as acoustic stress. The present experiments were not open to either of these criticisms. The result provides confirmation that once the surface structure of a sentence is processed, not only does it influence the memory representation of the sentence meaning, but it also serves to direct subsequent verification processes. Returning to the two psycholinguistic accounts of presupposition outlined above, the present findings reveal that the Given-New Strategy of Haviland and Clark (1974) is inadequate as a description of the processing of presupposition and assertion in sentence verification tasks. If subjects had been using this strategy to integrate target sentences with contexts, then they should have detected false presuppositions more rapidly than false assertions. It would probably be argued by Haviland and Clark that the present findings do not constitute a refutation of their model since the model was never intended as a description of verification tasks. However, it is surprising that a “procedure for relating the current sentence to ... 
[a] knowledge base” was not evident at least in Experiment I, where subjects had to compare a target sentence to a previously assimilated paragraph context. The New-Given strategy, i.e., the model proposed by Hornby (1974) and by Clark and Clark (1977), is also unacceptable as an explanation of the present findings. In Clark and Clark’s formulation, the New-Given model was based on the assumption that the semantic representation of a target sentence contains not only propositional, or underlying logical information, but also thematic, or Given-New specifications (cf., Clark and Clark, 1977, p. 89). According to this model, subjects first encode the sentence’s thematic structure, and then adopt a search strategy based on their expectations about which parts of this thematic structure are likely to be true. To account for Hornby’s finding that subjects were less likely to detect false presuppositions


than false assertions, the New-Given model proposes that subjects assume that given information is true, and only search for facts relating to the new information. This account founders when confronted with the results of the present experiment: given enough time, subjects rarely failed to detect false presuppositions. This difficulty may be dealt with by modifying the New-Given model, and postulating that subjects do eventually search the context for information corresponding to the given part of the sentence, but only after they have searched for, and failed to detect, discrepant new information. This revised version of the New-Given strategy is superficially more consistent with the results of the present experiments. However, closer scrutiny reveals that there is a major problem with the idea that subjects may search in serial order first for new and then for given information. In Hornby’s experiment, where the discrepancy involved an unfulfilled lexical presupposition in either the focus or the presupposition of the target sentence, it was possible for subjects to search for new information before searching for given. However, for the cleft and pseudocleft sentences in the present experiment, where the lexical presuppositions for focussed words were always fulfilled, it was logically impossible to detect new information without reference to the content of the presupposition. In the experiments reported above, the discrepant new information involved not the focus, but the combination of focus and presupposition, i.e., the assertion of the target sentence. Thus, the idea that subjects search first for new information, and then for given, cannot account for the findings of the present experiments. Clearly, an alternative explanation is called for. In what follows, we will outline an alternative account of our results which we shall designate the Structural hypothesis. 
A basic premise of the Structural hypothesis is that the encoding of the target sentence contains no specific marking of given and new information, but simply retains some representation of the hierarchical organization already present in the sentence’s surface structure. A second feature of the hypothesis is that it does not explain the assertion-presupposition effect in terms of ordered search strategies, and hence makes no assumptions about the order in which the context is searched. Instead, it assumes that the effect is attributable to the number of mental operations required to express, or to make explicit, the discrepancy. Finally, the hypothesis assumes that the expression of a discrepancy involves the construction of either a negated proposition or of a Yes/No question, and that this process is necessary before a subject can be confident of a No response. When subjects attempt to construct a negated proposition, the simplest procedure is simply to predicate a negative to the sentence representation as it stands. Because the target sentence is represented according to its surface structure form, the first constructed negation will correspond to a denial of


the assertion but not of the presupposition. If it is the assertion which is discrepant with the context, then a No response will be appropriate and immediate. However, if the discrepancy is in the presupposition then the first constructed denial, which does not extend to the presupposition, will not correspond to the mismatch exactly. In this case, subjects will not be able to respond No immediately but will have to reformulate the target sentence. This reformulation will involve stripping off the main clause and isolating the subordinate clause so that the presupposition may be directly negated. If, on the other hand, subjects verify sentences by generating Yes/No questions, then a similar explanation can be provided for why they take longer to verify sentences with false presuppositions. On the assumption that surface structure form determines the representation of the target sentence, the most easily generated question is one which manipulates elements of the main clause. For example, It is the boy who is chasing the cow is most simply transformed into Is it the boy who is chasing the cow? As with sentential negation, then, the first-generated question interrogates only the assertion. If the discrepant fact is asserted, then the question will be appropriate and an immediate No response possible. However, in order to locate a discrepancy in the presupposition, another question will be required which specifically interrogates the subordinate clause. Once again, the need to generate a second question accounts for the longer times associated with verifying false presuppositions. Clark and Clark (1977) have already pointed to the similarity between verifying a sentence and answering a Yes/No question. Their reason for making this comparison was that they wished to establish, by analogy, a plausible reason for why subjects should assume the presupposition is true and selectively search for New information. 
However, our proposal is that subjects may actually generate Yes/No questions, or alternatively, negated propositions, in the course of verifying sentences. It is possible to make some tentative suggestions as to the stage at which these structural manipulations occur within the overall comprehension process. In this regard, the distinction drawn by Cutler (1976) between Stage A and Stage B processing is pertinent. In her formulation, Stage A processing involves the parsing-plus-lexical-look-up activity necessary to construct a literal interpretation of the sentence, while Stage B processing involves the subsequent enrichment and modification of this interpretation in the light of extra-sentential factors. In Experiment I, where the verification times included the time taken to read the target sentence, it was not possible to isolate Stage A and Stage B processing. However, in Experiment II, where target inspection times were recorded separately, the two types of processing were distinguishable. The inspection times presumably reflected Stage A processing; they were sensitive to differences in the structural complexity of the four


sentence types, and resulted in a semantic interpretation adequate for the accurate verification of the sentence. On the other hand, the verification times may be assumed to reflect Stage B processing. Thus, the structural reformulations needed to apprehend a discrepancy between presupposition and context may be considered a part of Stage B activity. Finally, the major conclusion which may be drawn from the experiments described here is that the structural distinction between assertion and presupposition has a real effect on the processes by which sentences are integrated with their contexts. Although this effect is normally undetected, it becomes strikingly apparent when there is a discrepancy between a sentence and its context. In seeking an explanation for this effect we have rejected the proposal that it is due to strategies based on subjects’ expectations about where Given and New information are normally located in the sentence. Instead, it has been proposed that the effect is primarily due to the position of the presupposition within the subordinate clause which renders it inaccessible to negative or interrogative predicates, both of which are implicated in the apprehension of discrepant information. In their discussion of Hornby’s findings, Clark and Haviland (1977) have noted that English has only clumsy and indirect devices for qualifying presuppositions. However, they conclude that the major reason for subjects having failed to detect false presuppositions was their assumption that the speaker was adhering to the Given-New contract. The difference between their account and ours lies in the emphasis, not so much on the listener’s assumptions about adherence to a Given-New contract, as on the unavoidable consequences of processing surface structure. 
Although we have not embraced the notion that listeners use assertion and presupposition in a deliberate, strategic way, it is clear that the structure in which a message is conveyed may facilitate the process by which the listener can reconstruct that message. Conversely, it is apparent that the same structure may obstruct the processing which attempts to integrate the sentence with contextual knowledge. Whether a structure will be facilitative or disruptive depends on the nature of the information placed in its presupposition. It appears that the content of the presupposition must be that information which requires minimal processing if sentence comprehension is not to be obstructed. It should be noted that the preceding discussion is not inconsistent with the idea that assertion and presupposition serve a crucial function in the communicative process. However, the emphasis of our Structural hypothesis is that this function is determined by the effect of sentence structure on the language processing mechanisms, rather than by strategies based on the listener’s pragmatic expectations.


References

Bock, J. K. (1977) The effect of pragmatic presupposition on syntactic structure in question answering. J. verb. Learn. verb. Behav., 16, 723-735.
Clark, H. H. (1973) The language-as-fixed-effect fallacy: a critique of language statistics in psychological research. J. verb. Learn. verb. Behav., 12, 335-359.
Clark, H. H. and W. G. Chase (1972) On the process of comparing sentences against pictures. Cog. Psychol., 2, 101-111.
Clark, H. H. and E. V. Clark (1977) Psychology and Language. New York, Harcourt Brace Jovanovich, Inc.
Clark, H. H. and S. E. Haviland (1977) Comprehension and the Given-New contract. In R. O. Freedle (ed.), Discourse Production and Comprehension. Norwood, N.J., Ablex.
Cutler, A. (1976) Beyond parsing and lexical look-up; an enriched description of auditory sentence comprehension. In R. J. Wales and E. Walker (eds.), New Approaches to Language Mechanisms. Amsterdam, North Holland.
Haviland, S. E. and H. H. Clark (1974) What’s new? Acquiring new information as a process in comprehension. J. verb. Learn. verb. Behav., 13, 512-521.
Hornby, P. A. (1974) Surface structure and presupposition. J. verb. Learn. verb. Behav., 13, 530-538.
Hupet, M. and B. Le Bouedec (1977) The Given-New contract and the constructive aspect of memory for ideas. J. verb. Learn. verb. Behav., 16, 69-75.
Offir, C. E. (1973) Recognition memory for presuppositions in relative clause sentences. J. verb. Learn. verb. Behav., 12, 636-643.
Osgood, C. E. (1971) Where do sentences come from? In D. D. Steinberg and L. A. Jakobovits (eds.), Semantics. Cambridge, Cambridge University Press.
Singer, M. (1976) Thematic structure and the integration of linguistic information. J. verb. Learn. verb. Behav., 15, 549-558.

Summary

Two experiments were carried out to study the role of syntactic presupposition in sentence comprehension. In the first experiment, subjects had to verify cleft sentences, pseudocleft sentences and sentences with factive complements against contexts presented before the sentences. The contexts could contradict either the assertion or the presupposition of the target sentence. Subjects took significantly longer to verify sentences with false presuppositions than to verify sentences with false assertions. In Experiment II, subjects verified cleft and pseudocleft sentences against pictures presented after the sentences. Here too, verification times for sentences with false presuppositions were significantly longer than verification times for sentences with false assertions. These data are better accounted for by a “structural” hypothesis than in terms of strategies aimed at locating given or new information.

Cognition, 7 (1979) 385-407
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussion

On the psychology of prediction: Whose is the fallacy?*

L. JONATHAN COHEN**
Oxford University

We are all undeniably prone to certain perceptual illusions. Heat creates mirages, water distorts the appearance of shape, poor visibility leads us to over-estimate distances, trick diagrams (like the Müller-Lyer) promote errors about comparative size. Are we also prone to certain intellectual fallacies? Is it experimentally demonstrable that, unless specifically instructed about the matter at issue, we are systematically inclined to make certain sorts of mistakes in our reasonings? Psychologists have claimed this, and they are undoubtedly right in some cases. Indeed, to demonstrate human proneness to certain kinds of intellectual fallacy, psychological experiment is scarcely needed. For example, an uneducated person’s belief in a flat Earth can hardly result from anything else but a tendency to over-generalise from immediate appearances. But in at least one case it seems more likely that the fallacy has been in the experimenters’ interpretations of their data, rather than in the minds of the experimental subjects. Kahneman and Tversky (1973 and 1974) and Tversky and Kahneman (1974) have argued that intuitive judgments of probability are biassed towards predicting that outcomes will be similar to the evidence. But the Tversky-Kahneman argument is based on the assumption that the human mind has only one legitimate framework within which to reason about uncertain predictions - viz. the calculus of chance that was gradually made explicit by Pascal and others in the seventeenth and eighteenth centuries and has long formed the mathematical basis of statistical theory. When that erroneously restrictive assumption is discarded, the relevant experiments may be construed instead as confirming the common use of a non-Pascalian concept of probability for tasks of the kind in question. The Pascalian theory of probability is admittedly the only explicit and systematic theory of probability that currently forms part of a scientist’s educational curriculum. 
But, even on the subject of probability, we ought to be sufficiently open-minded to envisage the possibility that current educational curricula may not as yet reflect every legitimate mode of reasoning.

*I am grateful to Steve Stich for some helpful comments on an earlier draft of the present paper. **Requests for reprints should be sent to L. Jonathan Cohen, The Queens College, Oxford University, Oxford, England.


The illusoriness of a perceptual illusion is normally easy to demonstrate to anyone. Touch comes to the aid of vision, perhaps, and the stick in water that looks bent is felt to be straight. Or a closer look corrects a more distant one, and the mirage disappears. Here the methods of checking first impressions are familiar to every sane adult in our culture, and the criteria of correct judgement are universally accepted. But how can the illusoriness of an alleged intellectual illusion be demonstrated? Tversky and Kahneman claim that their subjects are guilty of ignoring calculable inconsistencies between the information given and the probability-judgments produced. Thus the fallacies that they allege to be committed are computational howlers. But to accuse someone of computational error within a logical or mathematical system you need first to be quite sure that you have correctly interpreted what system he is in fact using. The relativity theorist’s calculations are not erroneous just because they are non-Euclidean. So, in order to establish their own contentions about human fallibility, Tversky and Kahneman need to exclude the possibility of a non-Pascalian theory of probability within which their subjects’ calculations would be quite legitimate. But, so far from such a theory’s being impossible, it can be shown to be implicit even in the standard norms of experimental reasoning - norms so familiar in everyday scientific practice that they rarely arouse the reflective attention of those who use them. Francis Bacon (1620) was the first to recognise the existence of these norms in modern science, and his exposition of them was later amplified by Robert Hooke (1705), J. F. W. Herschel (1833), William Whewell (1847) and J. S. Mill (1843). 
But these norms have lacked a systematic theoretical development until recently (Cohen, 1970) and in consequence it has been easy for psychologists, like Tversky and Kahneman, to misclassify certain human reasoning processes as being Pascalian and invalid, rather than as being Baconian and valid.

The alleged fallacy of representativeness

According to Tversky and Kahneman, people typically judge the probability that A originates from, or belongs to, B as high when A is highly representative of B (i.e., is highly similar to it) and as low when the opposite is the case. On their view this is a heuristic that, though sometimes harmless, tends to foster two important computational errors. It has the consequences both that prior probabilities, or base-rate frequencies, tend not to be taken into account where they should be, and also that the significance of differences in sample-size tends to be ignored. Two experiments reported by Tversky and Kahneman (1974) will show what is meant here.


In the first experiment the subjects were shown brief personality descriptions of several individuals, allegedly sampled at random from a group of 100 professionals - engineers and lawyers. The subjects were asked to assess, for each description, the probability that it belonged to an engineer rather than to a lawyer. In one experimental condition, subjects were told that the group from which the descriptions had been drawn consisted of 70 engineers and 30 lawyers. In another condition, subjects were told that the group consisted of 30 engineers and 70 lawyers. The odds that any particular description belongs to an engineer rather than to a lawyer should be higher in the first condition, where there is a majority of engineers, than in the second condition, where there is a majority of lawyers. Specifically, it can be shown by applying Bayes’ rule that the ratio of these odds should be $(0.7/0.3)^2$, or 5.44, for each description. In a sharp violation of Bayes’ rule, the subjects in the two conditions produced essentially the same probability judgments. Apparently, subjects evaluated the likelihood that a particular description belonged to an engineer rather than to a lawyer by the degree to which this description was representative of the two stereotypes, with little or no regard for the prior probabilities of the categories. The subjects used prior probabilities correctly when they had no other information. In the absence of a personality sketch, they judged the probability that an unknown individual is an engineer to be 0.7 and 0.3, respectively, in the two base-rate conditions. However, prior probabilities were effectively ignored when a description was introduced, even when this description was totally uninformative.

In a second experiment subjects failed to appreciate the role of sample size even when it was emphasized in the formulation of the problem. Consider the following question: ‘A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know about 50% of all babies are boys. However the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower. For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?’ [The numbers of subjects who gave the various possible answers were]: The larger hospital (21). The smaller hospital (21). About the same (that is, within 5 percent of each other) (53). Most subjects judged the probability of obtaining more than 60% boys to be the same in the small and in the large hospital, presumably because these events are described by the same statistic and are therefore equally representative of the general population. In contrast, sampling theory entails that the expected number of days on which more than 60% of the babies are boys is much greater in the small hospital than in the large one, because a large sample is less likely to stray from 50%.
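The sampling-theory claim in the hospital problem can be checked numerically. The following is a minimal sketch, assuming that each birth is an independent event with probability 0.5 of being a boy and that exactly 15 or 45 babies are born each day (the problem itself only says "about"). The function name `prob_more_than_60_percent_boys` is introduced here purely for illustration.

```python
from math import comb


def prob_more_than_60_percent_boys(n_births: int, p_boy: float = 0.5) -> float:
    """Binomial probability that strictly more than 60% of n_births are boys."""
    # Smallest whole number of boys that strictly exceeds 60% of the births.
    # Integer arithmetic avoids floating-point trouble at exact multiples.
    threshold = (6 * n_births) // 10 + 1
    return sum(
        comb(n_births, k) * p_boy**k * (1 - p_boy) ** (n_births - k)
        for k in range(threshold, n_births + 1)
    )


# Assumed daily birth counts: 15 (smaller hospital) and 45 (larger hospital).
small = prob_more_than_60_percent_boys(15)
large = prob_more_than_60_percent_boys(45)

print(f"smaller hospital: p = {small:.4f}, expected days per year = {365 * small:.1f}")
print(f"larger hospital:  p = {large:.4f}, expected days per year = {365 * large:.1f}")
```

Under these assumptions the daily probability for the smaller hospital is 4944/32768, roughly 0.15, and the figure for the larger hospital is well below that, which is the Pascalian answer the quoted passage has in mind.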


If in such experiments as these Tversky and Kahneman’s subjects are to be thought of as reasoning in terms of Pascalian probabilities, they are indisputably committing gross fallacies. But before imputing gross fallacies to his fellows a psychologist always needs to be sure that no more charitable interpretation of their responses is available. Some attention must first be devoted to clarifying the content, scope, and credentials of the various norms that might act as arbiters of fallaciousness: and, of course, this clarification is something to be accomplished initially by logical or philosophical argument rather than by the study of experimental findings.

The structure of Baconian probability

God, as we are plausibly informed by John Locke, did not make man barely two-legged and leave it to Aristotle to make him logical. Nevertheless, though even those untutored in formal logic can make some kinds of valid deductions, it is an enterprise of considerable difficulty to construct an explicit statement of the principles governing demonstrative validity. Just the same is true of probabilistic reasoning. The seventeenth century saw the first steps taken towards an explicit theory of probability, but these steps led in two rather different directions. Along one main line of advance the initial ideas of Pascal and Fermat were taken up by Leibniz and Bernoulli and were soon refined and developed for a vast variety of self-conscious scientific purposes (cf., Hacking, 1975). Judgments of probability in this sense constrain one another in accordance with the familiar principles of complementation for negation ($p[B/A] = 1 - p[\text{not-}B/A]$) and multiplication for conjunction ($p[B \& C/A] = p[B/A] \times p[C/A \& B]$): and conditional, or posterior, probabilities are related to unconditional, or prior, ones by the Bayesian principle

$$p[B/A] = \frac{p[A/B] \times p[B]}{p[A]} \quad \text{where } p[A] > 0$$
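As a hedged illustration of how the Bayesian principle constrains judgments in a case like the engineer-lawyer experiment, the sketch below uses the odds form of the rule, obtained by dividing the principle for B by the same principle for not-B. The likelihood ratio used is a made-up placeholder, not a figure from the experiments, and the function name `posterior_odds` is hypothetical.

```python
def posterior_odds(prior_b: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule.

    p[B/A] / p[not-B/A] = (p[A/B] / p[A/not-B]) * (p[B] / p[not-B])
    """
    if not 0.0 < prior_b < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    return likelihood_ratio * prior_b / (1.0 - prior_b)


# Hypothetical likelihood ratio for one personality description:
# how much more typical the description is of an engineer than of a lawyer.
likelihood_ratio = 2.0

odds_70_engineers = posterior_odds(0.7, likelihood_ratio)
odds_30_engineers = posterior_odds(0.3, likelihood_ratio)

# The description-specific likelihood ratio cancels, leaving (0.7/0.3)^2.
ratio = odds_70_engineers / odds_30_engineers
print(round(ratio, 2))
```

Whatever likelihood ratio is assumed for a given description, it cancels when the two posterior odds are divided, so the ratio depends only on the two base rates.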

The other seminal ideas about non-demonstrative inference were given their classical exposition by Francis Bacon, and were concerned with the establishment of causal laws. On the one hand, it was in effect insisted, there must be controls in order for us to be sure that A itself, and not some phenomenon occurring alongside A, is necessary for causing B: on the other hand, the A-B sequence must be observed to occur in a variety of relevant circumstances, in order for us to be sure that A alone is sufficient to cause B. But the method of difference and the method of agreement, as J. S. Mill called these two requirements, have traditionally been catalogued rather than


systematised. They have been seen as separate and independent criteria. Indeed the Baconian tradition did not render explicit any well-regulated method of treating the extent of experimental support for a causal hypothesis as a matter of degree rather than all-or-nothing. Nor did it have any conception of the rational constraints that one such judgment of experimental support may place on another: for example, it had no principles relating the support for a hypothesis to the support for its negation, or the support for hypotheses considered separately to the support for their conjunction. As a theoretical explication of intuitive practices in non-demonstrative inference it was thus markedly inferior to the Pascalian tradition. One can understand therefore why Tversky and Kahneman ignore it altogether and identify the Pascalian account with what they call ‘the normative theory of prediction’, making no allowance for the possibility that probabilities may also be graded in a different way and probability-judgments place correspondingly different kinds of constraints on one another. A systematic and sufficiently precise development of Baconian probability is now available, however, and a brief, informal outline of this development will suffice to show its bearing on the issues investigated by Tversky and Kahneman. The theory has four key ideas. (i) The traditionally distinct methods of agreement and difference are generalised into a single ‘method of relevant variables’ for grading the inductive reliability of generalisations about natural phenomena in any domain that is assumed to obey causal laws. (ii) The (Baconian) probability of an A’s being a B is identified with the inductive reliability of the generalisation that all A’s are B’s. (iii) Judgments of Baconian probability are seen to constrain one another in accordance with principles that are derivable within a certain modal-logical axiom-system but not within the classical calculus of chance. 
(iv) Baconian probability-functions are seen to deserve a place alongside Pascalian ones in any comprehensive theory of non-demonstrative inference, since Pascalian functions grade probabilification on the assumption that all relevant facts are specified in the evidence, while Baconian ones grade it by the extent to which all relevant facts are specified in the evidence. I shall sketch each of these four ideas in turn: fuller details and arguments are available elsewhere (in Cohen, 1970 and 1977). (i) The method of relevant variables grades generalisations by their capacity to resist interference from factors which sometimes interfere with the operation of other generalisations that have been, or could be, formulated in the same field of enquiry. A relevant variable is defined, roughly, as a (not logically exhaustive) set of such factors that are co-ordinate with one another, in the way that, for example, different kinds of previous medical history might be some of the relevant circumstances for clinical tests on


hypotheses about a drug’s efficacy. We may take it that an investigator approaches the task of testing a particular causal hypothesis in his field with an appropriate list of such inductively relevant variables in mind, and that this list is ordered in accordance with the supposed importance of the different variables and preceded by a pairing of the hypothesised cause with some appropriate control. Experimental tests of increasing severity may then be constructed by varying more and more of these factors in combination with one another. The inductive reliability of a causal or non-causal generalisation may be ranked in accordance with the complexity-grade of the most complex test that it succeeds in passing. No measure-function is possible here because the severity of an experimental test, thus construed, is a non-additive property of it. But this is clearly the framework within which, with greater or less thoroughness, a great deal of assessment and criticism is carried out in experimental science. A successful hypothesis is seen to hold good over a wider and wider range of inductively relevant circumstances. Of course, the list of appropriate controls and relevant variables in a particular field is always subject to modification or enlargement as we learn more about the subject-matter. For example, the thalidomide tragedy established the hitherto insufficiently recognised importance of the pregnancy factor in toxicity-tests. (Also, the tests may well be carried out on samples of an appropriate size, rather than just on individuals, if it is thought that several unknown variables may be operating: the hypothesis is then that the drug has a certain Pascalian probability of producing the specified result.) But the method of relevant variables itself, in one realisation or another, is an invariant factor in the rational competence of an experimental scientist. Even if he does not wish to rank-order hypotheses by it,
he must at least be prepared to draw comparisons of greater or less reliability in accordance with the rankings that it generates.

(ii) The laws we discover are of most use to us as licenses to carry out the corresponding inferences, but where the reliability of the law is not known to be complete the inference can be known only as a probable one. If the hypothesis that all A are B resists a fairly severe level of test, we have a license to infer with a correspondingly high level of inductive or Baconian probability from the premiss that a particular thing is A to the conclusion that it is B. But note the difference here from mathematical or Pascalian probability. The Baconian probability never depends (except in the limiting cases) on the ratio of A that are B to those that are either B or not-B. It depends instead on the extent of causally relevant factors which are powerless to interfere in any particular case with the A-B connection. It is thus a property that belongs distributively to each A rather than collectively to the totality of A. So we have here two different ways of gauging probability and we should no more expect judgments of Baconian probability to coincide with judgments of Pascalian probability than we expect judgments of the academic value of a lecture to coincide with judgments of its value for attracting large numbers of people into the lecture-hall. Indeed mere numbers of favourable instances have no close bearing on the inductive reliability of a generalisation, since the replicability of a test-result is nothing but a guarantee for its genuineness. What determines the extent of the inductive support that identically favourable test-results give to a hypothesis is the structure of the test carried out, not the number of times that the same test-result has in fact been repeated. However the reliability of a generalisation may be maintained in the face of unsuccessful test-results if it is so modified as to exclude the presence of the circumstances that falsify it. Thus if ‘All A are B’ is falsified by variant $V_1$ of a relevant variable, but not by $V_2$, and if $V_2$ excludes $V_1$, then ‘All things that are both A and $V_2$ are B’ has greater reliability than ‘All A are B’. In such a case $p_I[B/A \,\&\, V_2]$ will have a higher value than $p_I[B/A]$ (where $p_I$ is a Baconian or inductive probability-function). In other words, if a Baconian probability is favourable, it increases with the weight of evidence.

(iii) Judgments of Baconian probability impose certain systematic constraints on one another, because of their connection with the method of relevant variables. For example, any test that is passed by both ‘All A are B’ and ‘All A are C’ will also be passed by ‘All A are both B and C’. Hence the latter generalisation is as reliable as either of the former, if they are equally reliable, or as the less reliable of the two, if they are not.
It follows that, if $p_I[B/A] \geq p_I[C/A]$, then $p_I[B \,\&\, C/A] = p_I[C/A]$. The Baconian probability of a conjunction does not have to be less than the probability of either of its conjuncts, in contrast with the familiar rule for Pascalian probability. Again, in some cases being A is quite irrelevant to being B and then both ‘All A are B’ and ‘All A are not-B’ will have zero-grade inductive reliability. Hence it is quite possible for both $p_I[B/A]$ and $p_I[\text{not-}B/A]$ to be zero. Equally, whatever test ‘All A are B’ has passed, ‘All A are not-B’ must have failed. If the former has any greater than zero reliability, the latter has none. It follows that for any normal A, if $p_I[B/A] > 0$, then $p_I[\text{not-}B/A] = 0$. So the negation principle for Baconian probability is not complementational, as is that for Pascalian probability. Again, because of their dependence on covering generalisations, Baconian probabilities are invariant under contraposition, though this is not true of Pascalian ones. That is, $p_I[B/A] = p_I[\text{not-}A/\text{not-}B]$. These and other constraining principles for Baconian probability-judgments are all derivable within a suitable generalisation of the modal logic known as

S4. And to understand why this is so we need to reflect that the ideal of Baconian induction is to establish causal laws - necessities of nature. To ascribe complete inductive reliability to a generalisation is to ascribe it that kind of necessity: to ascribe it a lesser degree of reliability is to ascribe it something resembling, but inferior to, natural necessity. It turns out, moreover, that there is no appropriate way of mapping this modal logic onto the calculus of chance. Normal Baconian probabilities are not merely not equivalent to Pascalian ones, but are not even any kind of function of the latter. We can also define a concept of prior probability for Baconian reasoning, by putting $p_I[B]$ equal to $p_I[B/B\text{-or-not-}B]$, analogously to the definition of prior in terms of posterior Pascalian probability. It then turns out that we can have $p_I[B/A] > 0$ even when $p_I[B] = 0$: there is no analogue of Bayes’ law in the theory of Baconian probability. In Baconian reasoning prior probabilities always set a floor to posterior ones, but never a ceiling.

(iv) Though we are here concerned with two logically disparate systems of reasoning, there is nevertheless an important sense in which not only are both systems equally entitled to claim that they issue in judgments of probability but also each is entitled to claim that it complements the other. Notoriously even the Pascalian calculus is open to several alternative interpretations as a theory of probability. According to the semantics provided, it constitutes a theory of relative frequencies (Reichenbach, Von Mises), of belief intensities (Ramsey, de Finetti, Savage), of natural propensities (Peirce, Popper), of logical relations (Keynes, Carnap), or of some other appropriately structured domain (cf., Nagel, 1938 and Mackie, 1973).
In each of these probabilistic interpretations, however, it may be regarded as a theory about gradations of provability (Cohen, 1977): what differ from one interpretation to another are the criteria of gradation. Now systems of demonstrative proof may be divided into two categories - ‘complete’ or ‘incomplete’ - in accordance with whether or not the non-provability of B is equivalent to the provability of not-B. Clearly any Pascalian theory must be regarded as a gradation of provability in a complete system, in this sense, because it incorporates a complementational principle for negation. But what about incomplete proof-systems, of which there are very many (e.g., Newtonian mechanics)? Any gradation of provability in an incomplete system must allow the possibility that, for some B, neither B nor not-B has any positive probability. In other words, when provability is put on a scale, there are two ways of doing it. One kind of scale runs from provability at the upper extreme to disprovability at the lower, and the further anything gets from being provable, the nearer it gets to being disprovable - i.e., to the provability of its negation. So the task of the probability-function here is to determine where on the scale the balance lies between proof and disproof. This kind of

scale requires a complementational principle for negation. The other kind of scale - a non-Pascalian one - runs from provability at the upper extreme to non-provability at the lower. Here we obtain a non-zero gradation of provability for B only if the premisses are already on balance in favour of B, and the level of the gradation depends just on the weight of the premisses, i.e. on the amount of relevant facts they include. Only if the premisses were, on balance, in favour of not-B would we instead, by grading their weight, obtain a non-zero gradation of provability for not-B. So if the probability of B on A is greater than zero, that of not-B must be zero; and it is also possible for both B and not-B to have zero probability, since the premisses A might be wholly indecisive or irrelevant in relation to the issue of B or not-B. Therefore the fact that Baconian probability obeys a negation principle of this kind, as we saw earlier, is in no way a mark against it. Rather, we learn thus that Baconian criteria fill a legitimate niche, which is otherwise unoccupied, in any sufficiently comprehensive scheme for the gradation of provability. For example, the evidence that a North American male is thirty years old provides a high Pascalian probability for his survival till the age of fifty. But it provides only a low Baconian one, since the weight of the evidence is rather light: the man might also be a rock-climber, amateur pilot, heavy smoker, etc. The two probability-judgments differ substantially from one another. Nevertheless, neither implies the falsity of the other. Both judgments may be true, because each supplies us with a quite different kind of information from the other. The Pascalian judgment grades probability on the assumption that all relevant facts are specified in the evidence, while the Baconian one grades it by the extent to which all relevant facts are specified in the evidence.
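
The contrast between the conjunction and negation principles of the two calculi can be put as a small Python sketch. It is only an illustration under stated assumptions: Baconian grades are encoded here as integers from 0 (no test passed) upwards, the function names are hypothetical rather than part of the theory's notation, and the Pascalian conjunction assumes that the two conjuncts are independent given the evidence.

```python
def pascalian_conjunction(p_b: float, p_c: float) -> float:
    """Multiplicational rule; assumes B and C are independent given A."""
    return p_b * p_c


def pascalian_pair_is_admissible(p_b: float, p_not_b: float) -> bool:
    """Complementational negation: p[B/A] + p[not-B/A] must equal 1."""
    return abs(p_b + p_not_b - 1.0) < 1e-12


def baconian_conjunction(grade_b: int, grade_c: int) -> int:
    """The conjunction is as reliable as its less reliable conjunct."""
    return min(grade_b, grade_c)


def baconian_pair_is_admissible(grade_b: int, grade_not_b: int) -> bool:
    """Non-complementational negation: at least one of the two grades is
    zero, and both may be zero when A is irrelevant to B."""
    return min(grade_b, grade_not_b) == 0


# Hypothetical grades on an assumed ordinal scale (illustrative values only).
print(baconian_conjunction(4, 2))              # grade of the weaker conjunct
print(pascalian_conjunction(0.9, 0.8))         # lower than either conjunct
print(baconian_pair_is_admissible(0, 0))       # both zero is allowed
print(pascalian_pair_is_admissible(0.0, 0.0))  # both zero is not allowed
```

The use of `min` needs only a ranking of grades, which fits the earlier remark that no measure-function is available for inductive reliability.
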

Some familiar uses of Baconian probability

The theory of Baconian probability is highly important for the interpretation of Tversky and Kahneman’s results. But, before examining what emerges from its application to some of those results, we should note the existence of several other, independent grounds for supposing that a good deal of intuitive reasoning about uncertain outcomes has a Baconian structure. This will serve to confirm that there is nothing at all odd or surprising about the conformity of Tversky and Kahneman’s subjects to Baconian norms in appropriate cases. The first point has already been mentioned. The experimental method of reasoning in modern science is seen to have an essentially Baconian structure, if we consider how it operates in the assessment of hypotheses about causal laws, in abstraction from any statistical issues involved. Wherever we use

those assessments as a basis for judging the validity of our predictions, we are invoking Baconian standards. Accordingly, so far as scientific enquiry uses only methodised commonsense, we should expect to find that even the lay subjects of Tversky and Kahneman’s experiments have a capacity for Baconian reasoning which they will exercise in appropriate cases.

The second point concerns the relation between probability and rational belief. We naturally tend to suppose that, even where we cannot achieve certainty, there is some level of probability, on the total evidence available, that justifies rational belief. Only the neurotically sceptical always require absolute certainty in order to justify belief. But what kind of probability may this be? Pascalian or non-Pascalian? If the belief is about a particular matter of fact, like whether our friend John Smith will survive till the age of fifty, we are surely wise, in general, to insist on a high Baconian probability as well as on a high Pascalian one. If the total evidence available is that John Smith is now thirty years old, we may well have a high Pascalian probability of his survival to fifty. But it would be unwise to believe in this survival, because the evidence actually available forms so small a part of the facts about a man that are relevant to his life expectancy.

Thirdly, attempts to found a criterion for rational belief on Pascalian probability have to adopt ad hoc evasive stratagems in order not to be hit by the well-known paradox of the lottery. A criterion based on Baconian probability avoids this paradox altogether (Cohen, 1977).

Fourthly, besides the above kind of paradox in regard to a Pascalian criterion for predictive beliefs, there is also a whole group of paradoxes in regard to retrodictive beliefs. These paradoxes emerge with particular clarity in relation to the norms of Anglo-American jurisprudence for forensic proof.
Proof of an issue in criminal cases has to be at a level of probability that puts the matter beyond reasonable doubt, while in civil cases proof on the balance of probability suffices. So on a Pascalian account we might suppose the civil standard to require a probability of 0.6, perhaps, or 0.7. But then, if the plaintiff in a civil suit had to establish his case on two relatively independent issues (for example the terms of two separate contracts) in order to win, his case as a whole would not always be established on the balance of Pascalian probability even if each of the two component issues were. What bars this is the multiplicational principle for the Pascalian probability of a conjunction. Similarly we might suppose the required level of proof in a criminal case to be a Pascalian probability of $1 - \varepsilon$, where $\varepsilon$ is as small as you like. But on such a Pascalian analysis, even if all the legally distinguishable elements in a crime (for example, both the wounding and the malicious intention) have been established at the required level, it does not necessarily follow that the crime as a whole has. Now the courts see nothing particularly

problematic or paradoxical about applying the ordinary standards of proof to cases involving more than one issue. Presumably, therefore, if it is sufficient in their eyes for each component issue to be proved at the required level, the conjunction of these issues must also be taken to be proved thereby at the required level. And this is just what the conjunction principle for Baconian probability implies, as we have already seen. So the Anglo-American legal system endorses a Baconian, rather than a Pascalian, framework of reasoning in this context, and the lay jurors who adjudicate on the strength of the evidence put before them must be capable of operating within such a framework. Here is another paradox that would be generated by interpreting the normal standard of forensic proof for civil cases in Pascalian terms. Imagine a rodeo into which 400 people are known to have been admitted through an automatic turnstile after paying the proper sum. Then 1,000 people are counted on the seats, and a hole is discovered in the fence. A man is picked at random from the seats, who turns out to be John Smith. That is all the evidence before the court when John Smith is sued by the management of the rodeo for non-payment of his entry-money. Now, if the balance of probability involved is a Pascalian one, the management should win their case. But, in reality, of course, no jury would award it to them, or at least we should consider it highly unjust if they did. Why so? Because instead a jury would need to know particular facts about John Smith that establish a balance of Baconian probability against him (with a non-complementational negation-principle) - e.g., that he was seen near the hole in the fence, that threads from his clothing were caught on the wire, and so on. Finally, consider the rule that an accused person in a criminal prosecution should be presumed innocent at the outset and judged only on the facts before the court.
What this rule seems to require is that a jury should be capable of reasoning within a certain kind of framework. They must be capable of reconciling the thesis that the probability of the accused’s guilt, on the evidence presented, is greater than zero, with the thesis that, prior to the presentation of this evidence, the probability of the accused’s guilt is zero. That is to say, lay juries must be capable of reasoning within a framework in which zero-level prior probability does not compel zero-level posterior probability. In Pascalian reasoning such a framework is not obtainable, because of Bayes’ law. But Baconian reasoning, as we have seen, allows this, because of its quite different foundations: zero-level Baconian probability means non-provability, not disprovability. So here is yet another reason for supposing that the ordinary laymen, who constitute British or American juries, must be fully capable of thinking about probabilities in Baconian terms. To suppose otherwise would be to assume a politically unacceptable

lack of fit between legal norms and social reality. Of course, Pascalian probabilities often enter into forensic proofs, in connection with issues about identity, occupational disease, actuarial risk, and innumerable other matters. But when such probabilities are important they enter into a proof as part of the premisses for its conclusion, not as gradations of the extent to which its premisses establish the conclusion. They are given in evidence by expert witnesses, not estimated by jurors in the process of deciding on their verdict. And they establish facts about whole categories of individual people or events, rather than just about the particular people or events that are involved in the case currently before the court.
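
The arithmetic behind the two forensic paradoxes discussed in this section can be checked with a short Python sketch. It rests on assumptions that the text leaves open: the balance of probability is read as a Pascalian threshold of 0.5, the two contract issues are treated as fully independent, and the function names are hypothetical.

```python
from math import prod

BALANCE_OF_PROBABILITY = 0.5  # assumed Pascalian reading of the civil standard


def pascalian_case_probability(issue_probabilities):
    """Multiplicational rule for component issues treated as independent."""
    return prod(issue_probabilities)


def share_of_non_payers(seated: int, paid: int) -> float:
    """Pascalian probability that a randomly picked spectator did not pay."""
    return (seated - paid) / seated


# Two contract issues, each proved at 0.7 (one of the figures in the text).
two_issues = pascalian_case_probability([0.7, 0.7])
print(two_issues > BALANCE_OF_PROBABILITY)  # does the case as a whole succeed?

# Rodeo: 1,000 people seated, 400 paid admissions.
rodeo = share_of_non_payers(seated=1000, paid=400)
print(rodeo > BALANCE_OF_PROBABILITY)       # would bare numbers favour the management?
```

On these assumptions each contract issue clears the threshold while their product does not, and the bare proportion of non-payers clears it although nothing specific is known about John Smith.
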

A re-interpretation of some experimental results

If we bear in mind these normative points about Baconian probability, we shall be in a better position to make sense of Tversky and Kahneman’s experimental data. Look back now at the first experiment described above. In it the fallacy imputed to the subjects was that of failing to take prior probabilities into account when judging the probability that a particular description belonged to an engineer rather than to a lawyer. But this would only have been a fallacy if the subjects were reasoning in terms of Pascalian probabilities. If their judgments were Baconian, there was no fallacy. The prior Baconian probability of a particular man’s being an engineer, or of his being a lawyer, is presumably very low indeed, even if not actually zero. So the floor it sets to either posterior Baconian probability is of negligible importance. In assessing the latter all that needs to be done is to determine the inductive reliability of the generalisation that all people of the given description are engineers, or lawyers, as the case may be. I.e. the question to be answered is: how well guarded is the description against those factors that create exceptions to some appropriate rule-of-thumb for inferring a man’s profession? Or, to put the point yet another way, we can understand what Tversky and Kahneman call the subject’s ‘stereotype’ of an engineer, or lawyer, as the description that the subject implicitly takes to guarantee membership of the appropriate profession. He attributes more or less complete inductive reliability to the generalisation that anyone satisfying the engineer-stereotype is in fact an engineer. So the Baconian probability that such-or-such a description betokens an engineer is judged by the extent to which that description approximates the full stereotype. Representativeness, as Tversky and Kahneman call it, is thus not just a heuristic here, as they regard it, but rather the rationally appropriate criterion. Mutually similar causes produce mutually similar

effects. It follows that if the subject commits any fallacy at all, the fallacy is not a logical or mathematical one, such as would arise from ignoring priors in the computation of Pascalian probabilities. If there were a fallacy, it would lie rather in the choice of stereotype. The subject might have made an empirical error about the factors that more or less guarantee a man to be an engineer.

But how, you may ask, does a subject decide whether to assess the probabilities at issue by Pascalian standards or by Baconian ones? Well, the experimenters certainly give him no guidance in this dilemma. They are like someone who asks about the value of a particular lecture and leaves it to his hearers to decide whether he means its academic value or its value for attracting crowds. So, if we charitably assume the subjects to be at least as rational as ourselves, we can hope to construe at least some such experiments as revealing how subjects do decide between Pascalian and Baconian responses. That is to say, we should interpret their answers, where possible, in whichever of the two ways does not involve them in committing any logical or mathematical fallacy. And, if we can thus discover when their judgments are Pascalian, when Baconian, and when irredeemably fallacious, we can go on to look for the features of experimental set-ups which correlate with these various results. One hypothesis which seems to fit the available evidence is that people inexpert in statistical theory tend to apply Baconian patterns of reasoning instead, and to apply these correctly wherever they have an opportunity to make the probability in question depend on the amount of inductively relevant evidence that is offered. Thus it is reasonable to suppose that adoption of a particular profession is causally connected (in some as yet imperfectly known way) with a person’s character, interests, abilities, opportunities, etc.
So in the experiment about the engineers and lawyers the subjects naturally tended to go by the weight of evidence and use Baconian reasoning. A similar explanation covers the second experiment described above. The subjects were asked about two particular hospitals in a particular town, during a particular year. They were asked whether they thought that the size of the hospital affected the frequency with which its ratio of male births to female ones diverged markedly from the human average. Since a majority of them neglected sample-size and answered in the negative, we must infer this majority to have applied Baconian standards. Why did they do so? Well, it would have been quite reasonable to argue as follows (though it is not necessary to assume that conscious reasoning of this kind actually took place): maybe doctors can decide on a deliberate policy of impeding births that they know will be of a particular sex; but there is no particular reason to expect this more in a small hospital than in a larger one, since the size

of a hospital, so far as we know, does not affect its obstetricians’ attitudes to the sex-ratio of the babies born in it; so there is zero weight of evidence in favour of either hospital’s having more days with an exceptional sex-ratio of births. Such a way of arguing is fully rational in the same sense as the prospective mother, who is concerned to avoid any interference with the biological processes that determine the ratio of boys to girls within her family, is fully rational in paying no attention whatever to the long-run Pascalian probabilities about larger and smaller hospitals when she decides where to lie in. Of course, if the subjects had been familiar with Bernoulli’s theorem they might well have inferred from the information supplied, not that the smaller hospital actually recorded more days with an exceptional sex-ratio - since this inference too, pace Tversky and Kahneman, would have been a fallacy - but that such days had a higher Pascalian probability of occurrence (within a certain interval of approximation) in the smaller hospital. Suppose therefore that the experiment were re-run and the question were put unambiguously: in which hospital is there a higher Pascalian probability (within a specified interval of approximation) that any day picked at random would be one with an exceptional sex-ratio? No doubt subjects ignorant of Bernoulli’s theorem would not score well in their replies. But if they are ignorant about Bernoulli’s theorem - i.e., about certain crucial implications of the Pascalian concept of probability - have they fully understood the question? At best such an experiment only exposes the indeterminacy of the experimental situation here, which is like that of many other experiments about proneness to logical or mathematical error.
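
The Pascalian claim at issue, that exceptional days are more probable in the smaller hospital, can be illustrated with an exact binomial calculation. The sketch below is hedged throughout: the daily birth counts (15 and 45), the 60% cut-off for an "exceptional" day and the assumption that each birth is independently a boy with probability 0.5 are illustrative figures that this section does not restate, and the function name is hypothetical.

```python
from math import comb


def exceptional_day_probability(births: int, percent: int, p_boy: float = 0.5) -> float:
    """Exact P(more than `percent`% of `births` are boys) for a binomial model.

    The comparison is done in integers to avoid floating-point edge cases
    when the share of boys equals the cut-off exactly.
    """
    return sum(
        comb(births, boys) * p_boy**boys * (1 - p_boy) ** (births - boys)
        for boys in range(births + 1)
        if 100 * boys > percent * births
    )


# Assumed illustrative sizes: about 15 and about 45 births per day.
small = exceptional_day_probability(births=15, percent=60)
large = exceptional_day_probability(births=45, percent=60)
print(small, large)
print(small > large)  # the smaller hospital has the higher Pascalian probability
```

This computes only a long-run Pascalian probability for a day picked at random; it says nothing about what either hospital actually recorded, which is the distinction drawn in the passage above.
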
The more inexpert the subjects are in probability theory, the less they can be expected to understand the question properly: the more expert they are, the less they can be expected to argue fallaciously (except through carelessness or other adventitious causes of performance-error). So an experimenter should not expect that his or her subjects’ systematic proneness to logical or mathematical error can be demonstrated unambiguously even from their answers to well-phrased questions about Pascalian probability. The most that such an experiment can really show is the extent to which its subjects fail to speak as if they understand the question correctly. On the other hand, where the question is ambiguous between Pascalian and Baconian probability, the hypothesis already offered seems to fit the facts. Indeed it fits Tversky and Kahneman’s results about representativeness even where the problem posed to their subjects concerns a random process, like coin-tossing, despite the fact that such processes are often treated as a paradigm field for Pascalian analysis. According to Tversky and Kahneman (1974)

People expect that a sequence of events generated by a random process will represent the essential characteristics of that process even when the sequence is short. In considering tosses of a coin for heads or tails, for example, people regard the sequence H-T-H-T-T-H to be more likely than the sequence H-H-H-T-T-T, which does not appear random, and also more likely than the sequence H-H-H-H-T-H, which does not represent the fairness of the coin.

But though this kind of estimate is fallacious, if understood as a judgment of Pascalian probability, it is not at all fallacious if interpreted in Baconian terms. Suppose you encounter a particular sequence of just six items, where the items belong to just two different kinds. These might be black and white rocks on a path, boy- and girl-births in a family, men and women going through a doorway, ... or coin-tosses. You are asked how probable is it that the sequence is random, and the only information you have is the relative frequency of the two kinds in the given sequence and their pattern of distribution. Now the word ‘random’, outside the theory of Pascalian probability, means ‘without aim or purpose or principle’. So you are being asked about the probability that the sequence was not composed as a result of a single causal factor but through the undesigned interaction of several such factors. Well, you have not got much evidence to go on, if you can look only at the sequence’s own constituent structure and not at its surrounding circumstances. But the relative frequency of the two kinds of items, and their symmetry of distribution, would obviously be relevant variables. Designed, non-random sequences tend to be regular and symmetrical throughout, even though undesigned, random sequences often exhibit occasional stretches of regularity and symmetry. Hence you would be correct to infer a slightly higher Baconian probability of non-randomness for the sequence H-H-H-T-T-T than for the sequence H-T-H-T-T-H. And at least in the case of the men and women going through a doorway the more probable conclusion might well be the true one. Moreover, it should be remembered here that Baconian probabilities are invariant under contraposition. So, if $p_I[\text{not-}R/S] > p_I[\text{not-}R/\text{not-}S]$ (with R for random and S for symmetrical), it must also be the case that $p_I[\text{not-}S/R] > p_I[S/R]$.
And the latter inequality corresponds to the judgment that Tversky and Kahneman’s subjects actually produced. Therefore on the more charitable, Baconian interpretation these subjects committed no fallacy and were the victims of no illusion. (Analogous reasoning will account for their belief that H-T-H-T-T-H is more likely than H-H-H-H-T-H.) A similar point emerges with particular clarity in regard to a remark that Tversky and Kahneman make about interview-based predictions. According to them people are prone to experience much confidence in highly fallible judgments, a phenomenon that may be termed the illusion of validity. Like other perceptual and

judgmental errors, the illusion of validity often persists even when its illusory character is recognised. When interviewing a candidate, for example, many of us have experienced great confidence in our prediction of his future performance, despite our knowledge that interviews are notoriously fallible.

It is to be hoped that not too many candidates have suffered injustice because of Tversky and Kahneman’s confusion here. For even if the results of interviews in general do not significantly correlate, in the long run, with future performance, we can still be quite rational in having a lot of confidence about our predictions in certain particular cases. The fallibility of interviews is a collective property of an indefinitely large class of events. It excludes a high Pascalian probability of success for any interview-based prediction that is selected at random. Justifiable confidence in an interview, however, where such confidence exists, is a property of that particular interview, and is grounded on the weight of evidence - the amount of inductively relevant facts - revealed at the particular interview and thus on the high Baconian probability which that evidence validates.

If the hypothesis proposed above is correct, what Tversky and Kahneman call “the representativeness heuristic” is the outcome of an underlying assumption that, where mutually similar causes operate, mutually similar effects occur. And similarity here is properly to be judged, of course, in terms of inductively relevant characteristics. But in some problems there are no features that may be appropriately treated as inductively relevant characteristics. A typical problem of this nature, studied by Olson (1976), may be stated as follows:

Consider two Quebec towns. Anglophones are a majority (65%) of the voters in Town A, but are a minority (45%) in Town B. There is an equal number of electoral ridings in each town. You have the voters’ lists from all ridings in both towns. You randomly select a list from one riding, and observe that exactly 55% of the voters are Anglophones. What is your best guess - is the riding in Town A? or in Town B?
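
For comparison, the Pascalian treatment of this problem can be sketched in Python as a likelihood-ratio calculation. Several assumptions are added here that the problem statement does not give: a riding is modelled as a binomial sample of voters drawn at its town's Anglophone rate, the riding size of 100 voters is purely hypothetical, and the equal number of ridings is read as equal prior odds for the two towns.

```python
from math import log


def log_likelihood_ratio(n_voters: int, n_anglophone: int,
                         share_a: float = 0.65, share_b: float = 0.45) -> float:
    """log[ P(list | Town A) / P(list | Town B) ] under binomial sampling.

    The binomial coefficient is the same for both towns and cancels out.
    """
    n_other = n_voters - n_anglophone
    return (n_anglophone * log(share_a / share_b)
            + n_other * log((1 - share_a) / (1 - share_b)))


# Hypothetical riding of 100 voters, 55 of them Anglophone.
llr = log_likelihood_ratio(n_voters=100, n_anglophone=55)
best_guess = "Town A" if llr > 0 else "Town B"
print(llr, best_guess)
```

With equal priors the sign of the log-likelihood ratio settles the best guess. Although 55% lies midway between 65% and 45%, the two likelihoods are not equal under this model, which is part of what makes the problem a useful test of untutored intuition.
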

In dealing with such problems Baconian reasoning has nothing to get a proper grip on, and the inductivist search for salient similarities may then be guided in fact by various intuitive strategies that are normatively indefensible. For example, as Olson has shown, subjects may respond to similarities in the absolute size of numbers rather than in minority-majority distribution. Other writers too have detected a tendency in statistically untutored individuals to base their predictions on intuitive beliefs about causality. Thus Ajzen (1977), investigating people’s estimates of a student’s grade point averages in relation to information of different kinds, found that when asked to make a prediction his subjects looked for factors that would cause the behaviour or event under consideration. Other items of information,

even though important by normal principles of statistical prediction, tended to be neglected if they had no apparent causal significance: statistical information was used mainly when no causal information was available. Again base-rate information tended to be neglected where it provided no evidence concerning the presence of factors that were perceived to stand in a causal relation to the phenomena considered. But where base-rate information did provide causally relevant evidence it was considered more seriously. (The same point is made by Tversky and Kahneman (1977), with a further wealth of examples). Indeed this overall human tendency to prefer reasoning from causal rather than statistical data is hardly surprising if we bear in mind that the former type of reasoning seems to develop much earlier in the child and to be much more deeply rooted. According to Piaget and Inhelder (1975) “the idea of chance and the intuition of probability constitute almost without a doubt secondary and derived realities, dependent precisely on the search for order and its causes”. However, what I am arguing for in the present paper is not only the factual point that causal inferences underlie much of Tversky and Kahneman’s data, but also the normative point that such inferences, and their gradation by appropriate criteria, should not necessarily be regarded as irrational.

Are people particularly prone to fallacies in diagnostic reasoning?

Tversky and Kahneman (1977) have also drawn attention to a type of phenomenon that was originally investigated by Turoff (1972). Tversky and Kahneman describe their result as follows:

Let C be the event that within the next 5 years Congress will have passed a law to curb mercury pollution, and let D be the event that within the next 5 years, the number of deaths attributed to mercury poisoning in the U.S. will exceed 500. Let C̄ and D̄ denote the complements of C and D respectively.
Question: Which of the two conditional probabilities, p(C/D) and p(C/D̄), is higher?
Question: Which of the two conditional probabilities, p(D/C) and p(D/C̄), is higher?
The overwhelming majority of respondents state that Congress is more likely to pass a law restricting mercury pollution if the death toll exceeds 500 than if not, i.e., p(C/D) > p(C/D̄). Most people also state that the death toll is less likely to reach 500 if a law is enacted within the next five years than if it is not, i.e., p(D/C) < p(D/C̄). However, this seemingly plausible pattern of judgments violates the most elementary rules of probability, ... provided p(C) and p(D) are non-zero.

And the explanation of this particular piece of alleged irrationality is said to lie in a more general form of alleged intellectual error:

When told to assume that a particular conditioning event has occurred, people are prone to focus on the causal impact of this event on the future and to neglect its significance as a source of information about the past.

Now a point to notice about Baconian reasoning is that the inductive probability of one event, given another event, cannot be determined unless the temporal relationship of the two events is specified. This is because the probability depends on the inductive reliability of the covering generalisation, as we have seen, and that reliability is in turn determined by the extent of the generalisation’s resistance to causal interference by relevant variables. Because inductive probability is rooted in the operation of causal factors, and causation works from earlier to later, it is necessary to specify which of the pair of events is supposed to come first. However, the design of Tversky and Kahneman’s experiment requires not only that a question be put to the subjects which is ambiguous or unspecific as to the temporal priority of one event to another but also that the subjects be not told that this ambiguity or lack of specificity is essential to the design of the experiment. In the circumstances therefore the subjects may perhaps be forgiven their apparent tendency to read temporal orderings into the event-pairs about which they are asked to make probability-judgments. Without such interpretations of what they are told, they cannot properly apply Baconian standards of evaluation. And there may be at least two reasons why the interpretations which are made should all in fact assume that what is at stake is the extent to which an earlier event probabilifies a later one, rather than vice versa.
The first is that the basic form of Baconian generalisation - the form that can be tested experimentally by the manipulation of relevant factors - runs naturally from earlier to later, if there is any time difference at all: the question is whether the later event is affected when the relevant circumstances of the earlier one are altered. The second is that special care with verb-tenses is required if conditionalisation is not to hint at this order. For example, instead of stating that “Congress is more likely to pass a law restricting mercury pollution if the death toll exceeds 500 than if not” it would be necessary to state “Congress is more likely to pass a law, or to have passed a law, restricting mercury pollution if the death toll exceeds 500 than if not”. But if the unspecificity of temporal order were made apparent in this way the point of the experiment would be destroyed. It follows that the judgments made by Tversky and Kahneman’s subjects are not in conflict with the logic of probability. On the one hand these subjects tend to hold, for any time t within the next five years, that p(C after t/D before t) > p(C after t/D̄ before t); on the other hand they also tend to hold that p(D after t/C before t) < p(D after t/C̄ before t).
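A minimal sketch, under invented numbers, of why the time-indexed reading is coherent in the ordinary probability calculus. The toy model treats "D before t", "C before t", "C after t" and "D after t" as four distinct binary variables, lets deaths before t raise the chance of later legislation, and lets legislation before t lower the chance of later deaths. All parameter values and function names are hypothetical illustrations; none is taken from Tversky and Kahneman's data.

```python
from itertools import product

# Hypothetical parameters, chosen only for illustration (not from the text).
P_D_BEFORE = 0.3                                    # death toll exceeds 500 before t
P_C_BEFORE = 0.2                                    # law passed before t
P_C_AFTER_GIVEN_D_BEFORE = {True: 0.8, False: 0.2}  # law after t, given deaths before t
P_D_AFTER_GIVEN_C_BEFORE = {True: 0.1, False: 0.5}  # deaths after t, given law before t


def bernoulli(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p


def joint():
    """List worlds (d_before, c_before, c_after, d_after) with their probabilities."""
    worlds = []
    for d1, c1, c2, d2 in product([True, False], repeat=4):
        p = (bernoulli(P_D_BEFORE, d1)
             * bernoulli(P_C_BEFORE, c1)
             * bernoulli(P_C_AFTER_GIVEN_D_BEFORE[d1], c2)
             * bernoulli(P_D_AFTER_GIVEN_C_BEFORE[c1], d2))
        worlds.append(((d1, c1, c2, d2), p))
    return worlds


def conditional(event, given):
    """Conditional probability p(event / given) over the enumerated worlds."""
    worlds = joint()
    numerator = sum(p for w, p in worlds if event(w) and given(w))
    denominator = sum(p for w, p in worlds if given(w))
    return numerator / denominator


p_c_after_d = conditional(lambda w: w[2], lambda w: w[0])
p_c_after_not_d = conditional(lambda w: w[2], lambda w: not w[0])
p_d_after_c = conditional(lambda w: w[3], lambda w: w[1])
p_d_after_not_c = conditional(lambda w: w[3], lambda w: not w[1])

# Both comparative judgments hold within one and the same distribution.
print(p_c_after_d > p_c_after_not_d, p_d_after_c < p_d_after_not_c)
```

The point of the sketch is only that once the two events in each pair are indexed to different times, the two inequalities concern four different events, so no rule of the calculus forces them to point the same way.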

Indeed these two comparative judgments are not only consistent with one another according to the logical principles constraining relations between different judgments of Baconian probability. They are also quite consistent with one another in terms of the Pascalian calculus. So Tversky and Kahneman’s results supply no reason to suppose that ordinary people have some systematic proneness to irrationality in regard to causally diagnostic reasoning as contrasted with causally predictive reasoning. Nor, on a Baconian interpretation, can Tversky and Kahneman’s subjects be said to be averse to committing themselves - at least by implication - on questions of diagnostic probability, since the contraposition of inductive probability-judgments (though not of Pascalian ones) is equivalence-preserving. The inductive probability that the annual death-toll will exceed 500 after t if Congress does not pass anti-pollutant legislation before t is equal to the inductive probability that Congress will have passed anti-pollutant legislation before t if the annual death-toll does not exceed 500 after t.

In fact there seems no reason to believe that irrationality is particularly common in diagnostic reasoning even where the occurrence of percentage-figures as evidence in the statement of the problem makes it quite clear that Baconian probabilities are not involved. And it is necessary to demonstrate this in order to establish that such experiments provide no confirmation for Tversky and Kahneman’s theory that people are particularly prone to irrationality in diagnostic reasoning. Tversky and Kahneman (1977) seem to have drawn the wrong conclusion from two experiments of that kind. In the first causally predictive experiment subjects were instructed as follows:

Consider the following hypotheses concerning the causes of death.
(i) The chance of death from heart failure is 5% among males.
(ii) The chance of death from heart failure is 10% among males who are heavy smokers.
(iii) The chance of death from heart failure is 45% among males with congenital high blood pressure.
Dick is a heavy smoker with congenital high blood pressure.
Question: What is the probability that Dick will die of heart failure?

The experiment was designed to test whether subjects would recognise that the correct figure was one that was higher than 45%, in order to reflect the incremental force of two independent pieces of evidence. In the event a significant majority of respondents did in fact recognise this. In a second causally diagnostic experiment subjects were given the following information:

Bill has been referred by his physician to the hospital with suspicion of a malignant tumour. Following the examination the following data were obtained.
(i) The chance of a malignant tumour is 5% among patients referred to the hospital for such examinations.
(ii) The haematologist who examined Bill’s blood test estimated the chance of a malignancy to be 10%.
(iii) The radiologist who examined Bill’s X-ray estimated the chance of a malignancy to be 45%.
Question: What is the probability that Bill has a malignant tumour?

According to Tversky and Kahneman this second problem “is structurally similar” to the first one. “In both problems each datum provides support for the hypothesis in question, and it appears reasonable to assume the incremental property.” But in the second experiment “a significant majority of the subjects . . . violated the incremental property” and were inclined to average the estimates.

But did the majority of subjects really commit a fallacy in the second of these two experiments? In the instructions for the first experiment clauses (ii) and (iii) state conditional probabilities. But in the instructions for the second experiment clauses (ii) and (iii) state that certain unconditional probabilities have been arrived at, presumably by detachment from conditional ones. Now any such detachment presupposes that no other evidence is available. A haematologist (or radiologist, or anyone else) cannot properly infer p(H) = x from knowledge of E and of p(H/E) = x, unless E is all the available evidence. So we recover a conditional probability p(H/E1), from the information given in (ii) only on the assumption that there is no other available evidence than Bill’s blood test, E1; and we recover a conditional probability p(H/E2), from the information given in (iii) only on the assumption that there is no other available evidence than Bill’s X-rays, E2. We are therefore not entitled to compound these two conditional probabilities into a value for p(H/E1&E2) from which we then detach an increased value for p(H) on the assumption that both E1 and E2 fall within the available evidence. The best that we can do is to derive the weighted average of the two estimates for p(H) that were arrived at separately, since these two estimates were reached on mutually incompatible presuppositions. The fallacy here has been committed by Tversky and Kahneman, not by their subjects.
It is therefore worth while to design a third experiment that is indeed structurally similar to the first, rather than to the second, but concerns the same subject-matter as the second experiment. The subjects are now given the following information:

Bill has been referred by his physician to the hospital with suspicion of a malignant tumour. Certain facts are known about such referrals:

(i) Among patients referred to the hospital for this reason, the probability of a malignant tumour is 5%.
(ii) Among patients who have positive blood tests after referral, the probability of a malignant tumour is 10%.
(iii) Among patients who have positive X-ray results after referral, the probability of a malignant tumour is 45%.
Bill turns out to have both a positive blood test and a positive X-ray result. What is the probability that he has a malignant tumour?

When this experiment was performed on a mixed-ability group of 25 people in the age-range 17-60, only one responded with a probability that was less than 45%. The remainder were evenly divided between those who responded with a figure greater than 45% and those who responded with just a 45% probability. And a charitably-minded psychologist could well interpret this division as stemming from an underlying factual disagreement about whether the particular probabilities (ii) and (iii) should be construed as being independent of one another or not. What emerges, therefore, when the design of the second experiment has been straightened out in such a way as to preserve a genuine structural similarity with the first experiment, is that Tversky and Kahneman’s results are not replicated. It is just not the case that a significant majority is inclined to average the probabilities given in (ii) and (iii). So here too no support whatever emerges for the Tversky-Kahneman theory that judgments about the probability of causal diagnoses are more prone to irrationality than judgments about the probability of causal predictions. We have no reason to suppose that people are prone to what Tversky and Kahneman call “a major underestimation of the impact” of diagnostic evidence, “which could have severe consequences in the intuitive assessment of legal, medical, or scientific evidence”, as they claim.
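The difference between the "greater than 45%" and the averaging responses can be made concrete with a small calculation. The sketch below is not part of the experiment: it assumes, purely for illustration, that a positive blood test and a positive X-ray are conditionally independent given malignancy and given its absence, and it combines the figures in (i)-(iii) through likelihood ratios. The function names are hypothetical.

```python
def odds(p):
    """Convert a probability into odds in favour."""
    return p / (1.0 - p)


def prob(o):
    """Convert odds in favour back into a probability."""
    return o / (1.0 + o)


def combine_independent(prior, posteriors):
    """Combine single-evidence posteriors into one posterior.

    Assumption (ours, not the text's): the pieces of evidence are
    conditionally independent given the hypothesis and given its negation,
    so their likelihood ratios multiply.
    """
    prior_odds = odds(prior)
    combined = prior_odds
    for p in posteriors:
        combined *= odds(p) / prior_odds  # likelihood ratio of this piece of evidence
    return prob(combined)


prior = 0.05              # clause (i)
blood, xray = 0.10, 0.45  # clauses (ii) and (iii)

independent = combine_independent(prior, [blood, xray])
averaged = (blood + xray) / 2

print(round(independent, 3), round(averaged, 3))
```

On these assumptions the combined figure is 19/30, roughly 63%, whereas simple averaging gives 27.5%. A respondent who takes a positive X-ray to make the blood test redundant would instead stay at 45%, which is one way of reading the even split reported above.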
It would be more appropriate to be alarmed about the possibility that Tversky and Kahneman’s widely publicised “findings” may have led many people to distrust some of their fellow human beings’ probability-judgments unnecessarily.

Of course, certain sorts of fallacy do commonly occur in the reasoning of laymen about probabilities. For example, Tversky and Kahneman claim to have shown that when subjects are already familiar with some instances of a class it tends to appear more numerous than a class of equal frequency but less familiarity. This claim is in no way refuted by what I have been saying about Baconian probability, and the same is true for their claim that the ease with which subjects can imagine a class affects their conception of its size. (Cf. also Smedslund, 1963). Nor would it be surprising if Baconian and Pascalian modes of reasoning - in their intuitive, inexplicit forms - are sometimes confused with one another, with the result that nothing rational emerges. Such a confusion may well account for the tendency of ordinary

people to overestimate the combinatorial probability of conjunctive events - a tendency that John Cohen, E. I. Cheswick and D. Haran (1972) claim to have found. For in Baconian, inductive reasoning a conjunctive event always has as high a probability as the less probable of its conjuncts and a disjunctive event need have no higher an inductive probability than the more probable of its disjuncts. Perhaps in any case such confusions are especially to be expected in the artificially constrained circumstances of psychological experiment. In a natural situation, when you are confused by a problem, you can make further enquiries so as to determine what would be an appropriate framework within which to think about the issues involved. But as a subject in a laboratory experiment you would not normally have an opportunity for supportive investigations of this kind. Very great care is therefore needed in drawing any conclusions from such experiments about the ability of ordinary people to reason validly about probabilities; and experimenters must also be obliged to cast out the motes in their own eyes. Above all “the normative theory of prediction” must be taken to include Baconian as well as Pascalian modes of reasoning. For, on the assumption which is shared by all investigators of causes and effects, that like causes produce like effects, it is undeniably reasonable to use the degree of relevant likeness of the cause as one kind of criterion for the probability of the effect. And an assumption that is so widely made, in such reputable contexts as those of forensic proof and scientific experiment, can hardly be denied normative credentials.

References

Ajzen, I. (1977). Intuitive theories of events and the effects of base-rate information on prediction. J. Pers. Soc. Psychol. 35, 303-314.
Bacon, F. (1620). Novum Organum. London.
Cohen, J., Cheswick, E. I. and Haran, D. (1972). A confirmation of the inertial-ψ effect in sequential choice and decision. Brit. J. Psychol. 63, 41-46.
Cohen, L. J. (1970). The Implications of Induction. London, Methuen.
Cohen, L. J. (1977). The Probable and the Provable. Oxford, Clarendon Press.
Hacking, I. (1975). The Emergence of Probability. Cambridge, Cambridge University Press.
Herschell, J. F. W. (1833). A Preliminary Discourse on the Study of Natural Philosophy, London, Longmans, Green.
Hooke, R. (1705). A General Scheme or Idea of the Present State of Natural Philosophy and How its Defects may be Remedied by a Methodical Proceeding in the Making of Experiments and Collecting Observations. In R. Waller (ed.), The Posthumous Works of Robert Hooke, London, pp. 6-65.
Kahneman, D. and Tversky, A. (1973). On the psychology of prediction. Psychol. Rev. 80, 237-251.
Kahneman, D. and Tversky, A. (1974). Subjective probability: a judgment of representativeness. In C.A.S. Stael von Holstein (ed.), The Concept of Probability in Psychological Experiments, Dordrecht-Holland, Reidel. Pp. 25-48.

Mackie, J. L. (1973). Truth, Probability and Paradox. Oxford, Clarendon Press.
Mill, J. S. (1843). A System of Logic, Ratiocinative and Inductive. London.
Nagel, E. (1938). Principles of the Theory of Probability. In O. Neurath, R. Carnap, and C. Morris (eds.), Foundations of the Unity of Science, Chicago, University of Chicago Press. Vol. I, 341-422.
Olson, C. L. (1976). Some apparent violations of the representativeness heuristic in human judgment. J. Exp. Psychol.: Hum. Percep. and Perf. 2, 599-608.
Piaget, J. and Inhelder, B. (1975). The Origin of the Idea of Chance in Children. London, Routledge. Introduction p. xv.
Smedslund, J. (1963). The concept of correlation in adults. Scandin. J. Psychol. 4, 165-173.
Turoff, M. (1972). An alternative approach to cross-impact analysis. Technol. Forecast. Soc. Change 3, 309-339.
Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science 185, 1124-1131.
Tversky, A. and Kahneman, D. (1977). Causal thinking in judgment under uncertainty. In R. Butts and J. Hintikka (eds.), Basic Problems in Methodology and Linguistics, Dordrecht-Holland, Reidel. Pp. 167-190.
Whewell, W. (1847). The Philosophy of the Inductive Sciences. London, J. W. Parker.

Cognition, 7 (1979) 409-411
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussion

On the interpretation of intuitive probability: A reply to Jonathan Cohen

DANIEL KAHNEMAN
University of British Columbia

AMOS TVERSKY
Stanford University

In our discussion of probability judgment and intuitive prediction we have described as errors some judgments and inferences that violate basic principles of probability and statistics. Cohen argues that our subjects’ answers should not be so viewed because they can be construed as compatible with an alternative normative system which he has recently developed. Cohen claims that his system has a sound normative basis, on a par with the standard probability calculus, and that it provides a viable interpretation of the responses of our subjects.

It is easy to see, however, that Cohen’s system does not provide a viable explication of the intuitive notion of probability. In this system “the (Baconian) probability of an A being a B is identified with the inductive reliability of the generalization that all A’s are B’s”. Although Cohen does not describe explicitly how to evaluate inductive reliability, he specifies formal rules that govern inductive (or Baconian) probabilities. First, he demands that P_I(A/B) = P_I(Not B/Not A). Thus, according to Cohen, the inductive probability that a bird which has just been sighted is white if it is a raven must be equal to the probability that the bird is not a raven if it is not white. We believe that most people would judge the former probability to be vanishingly small and the latter to be substantial. Second, Cohen proposes that if P_I(A/B) > 0 then P_I(Not A/B) = 0. Thus, if there is non-zero inductive probability that the defendant in a trial is guilty, then the inductive probability that he is innocent must be zero, contrary to legal usage and common sense. More generally, Cohen’s system cannot assign non-zero probability to more than one member of a set of mutually exclusive hypotheses. For example, consider a murder investigation in which there are several suspects and the murderer is known to have acted alone. According to Cohen, the probability of guilt can be non-zero for only one suspect.
For all other suspects, the inductive probability of guilt is zero, just as for the rest of mankind.
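The raven example can be illustrated with ordinary relative frequencies. The sketch below uses invented counts of sighted birds (they are hypothetical and carry no empirical weight) to show that, in the standard calculus, the probability of "white given raven" and the probability of "not raven given not white" can differ widely.

```python
from fractions import Fraction

# Hypothetical counts of sighted birds, invented purely for illustration.
counts = {
    ("raven", "white"): 1,
    ("raven", "not white"): 999,
    ("not raven", "white"): 3000,
    ("not raven", "not white"): 6000,
}


def p_white_given_raven(c):
    """Relative frequency of white birds among ravens."""
    ravens = c[("raven", "white")] + c[("raven", "not white")]
    return Fraction(c[("raven", "white")], ravens)


def p_not_raven_given_not_white(c):
    """Relative frequency of non-ravens among birds that are not white."""
    not_white = c[("raven", "not white")] + c[("not raven", "not white")]
    return Fraction(c[("not raven", "not white")], not_white)


former = p_white_given_raven(counts)
latter = p_not_raven_given_not_white(counts)
print(float(former), float(latter))
```

With these made-up counts the former is 1/1000 and the latter is 6000/6999, so contraposition does not preserve the value of a conditional probability in the standard calculus.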

Whatever notion may be captured by Cohen’s formalism, it clearly does not conform to common usage of “the probability of an A being a B” or “the degree of belief in A, given evidence B”. For an attempt to model the relation between evidence and belief, which also departs from the standard calculus but is free of the above defects, see Shafer (1976).

Cohen’s critique of our position is based on a reinterpretation of the questions that were answered by our subjects. In order to rationalize the neglect of base-rate, for instance, Cohen argues that a question such as “what is your probability that a person who owns a programmable calculator is an engineer rather than a lawyer?” is interpreted by subjects as “what is your confidence in the generalization that no lawyer owns a programmable calculator?”. We propose that Cohen’s claim regarding the equivalence of the two questions is incorrect, and we expect most people to give a fairly high probability as an answer to the former question and an extremely low probability as an answer to the latter. Incidentally, we have found that the judged probability that Mr. X (whose personality is briefly sketched) is an engineer rather than a lawyer and the judged probability that he is a lawyer rather than an engineer typically add up to unity both in a within-subject and in a between-subject design. This observation is inconsistent with Cohen’s formalism, which requires the smaller of the two probabilities to be zero.

An even less plausible interpretation is introduced by Cohen to explain common answers to problems such as “which hospital (the large or the small) do you think recorded more days during the year in which more than 60% of the babies born were boys?”. We have suggested that subjects correctly attribute daily variations of sex-ratio to chance factors, but fail to appreciate the effect of sample size on sampling variability.
In contrast, Cohen argues that subjects attribute any imbalance of sex-ratio to some causal intervention by the obstetricians in the hospital. Because such intervention is presumably unrelated to hospital size, the subjects’ neglect of the variable can be justified. The conflict between the interpretations could be resolved empirically, e.g., by asking subjects whether occasional imbalances of sex-ratio reflect chance factors or hospital policy. We do not believe that Cohen’s hypothesis would survive such a test. We hope that these examples suffice to show that Cohen’s system has little normative or descriptive appeal, and that his interpretation of our findings is hardly compelling. We accept Cohen’s objection to the problem of the malignant tumor, which was indeed deleted in our subsequent treatment of causal and diagnostic reasoning (Tversky & Kahneman, 1979). This objection, however, does not bear on the interpretation of subjective probability. In conclusion, we can only invite the reader to look at the data presented in our papers and to judge whether the observed insensitivity to sample size, prior

probability and reliability of evidence should be viewed as mistakes, which many of us are prone to make but would wish to correct, or as opinions which should be held with pride and confidence because they may be construed as compatible with Cohen’s Baconian formalism.

References

Shafer, Glen (1976) A Mathematical Theory of Evidence, Princeton, N.J.: Princeton University Press.
Tversky, A. and Kahneman, D. (1979) Causal schemas in judgments under uncertainty. In M. Fishbein (ed.), Progress in Social Psychology. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Cognition, 7 (1979) 413-420
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussion

Developmental and acquired dyslexia: Some observations on Jorm (1979)

ANDREW W. ELLIS*
Department of Psychology, University of Lancaster

In a recent paper on the cognitive and neuropsychological bases of developmental dyslexia, Jorm (1979) has attempted to unite the study of developmental dyslexia (defined as a specific reading disorder occurring in otherwise intelligent children provided with an adequate background and educational opportunities) with the study of acquired dyslexia (the term given to the reading problems encountered by hitherto normal readers as a consequence of brain damage). It is undoubtedly a matter for regret that psychologists studying either one of these two aspects of reading have tended to be ignorant of work in the other area, despite the obvious scope for comparison. Jorm is therefore to be congratulated on being one of the few who have tried to bring the two fields together in an integrated manner.

However, in the course of his paper Jorm (1979) reiterates a claim made in Jorm (1977) that the symptoms of developmental dyslexia are closely similar to those of a particular variety of acquired dyslexia known either as deep dyslexia (Marshall and Newcombe, 1973) or phonemic dyslexia (Shallice and Warrington, 1975). An alternative proposal has been made by Holmes (1973; 1978) who argues that developmental dyslexia may be likened, not to deep dyslexia, but to another variety of acquired dyslexia which Marshall and Newcombe (1973) term surface dyslexia. In this paper I shall argue that Jorm’s grounds for comparing developmental dyslexia to deep dyslexia are ill-founded, and that such evidence as is available, though not unequivocal, tends to support Holmes’ position.

On the basis of the studies of deep dyslexia by Marshall and Newcombe (1966; 1973), Patterson and Marcel (1977), Richardson (1975a, b), Saffran and Marin (1977), and Shallice and Warrington (1975), the following list of symptoms may be drawn up:

*Requests for reprints should be addressed to Andrew W. Ellis, Department of Psychology, University of Lancaster, Lancaster LA1 4YF, England.

(1) Severe problems in nonlexical grapheme-to-phoneme conversion, as evidenced by an almost complete inability to read nonwords such as blurg or gem-k.
(2) Errors when reading single words without time constraints. These paraphasias may be visual (shallow → “shadow”; mclk~~w → “melon”), derivational (prefer → “preference”; faith → “faithful”) or semantic (speak → “talk”; berry → “grapes”).
(3) Pronounced effects of word characteristics on naming such that nouns are read better than adjectives or verbs which are, in turn, read better than function words. Also, imageable/concrete nouns are read better than abstract nouns.

Deep dyslexia is one of three varieties of acquired dyslexia discussed by Marshall and Newcombe (1973). A second variety is visual dyslexia, characterised by purely visual errors, while in a third variety, termed surface dyslexia, the vast majority of errors can be described as partial failures of grapheme-phoneme conversion (see also Newcombe and Marshall, 1973). Ambiguous consonant letters such as s, c or g, in which the choice of the correct phonemic counterpart depends upon the graphemic context, create particular problems, leading to errors such as guest → “just” (where g is assigned a value as in ‘gin’), or recent → “rikunt”. Other errors involve assigning a phonetic value to silent graphemes (listen → “liston”; island → “izland”), failure to apply the e-lengthening rule (lace → “lass”; describe → “describ”), or stress shifting (begin → “beggin”; omit → “ómmit”).

Holmes (1973, 1978) has argued that the misreadings of developmental dyslexics are closely comparable to the errors made by surface dyslexics.
Thus, the errors made by the four 9 to 13-year-old boys studied by her included exemplars of all the above categories, for example failures to apply the e-rule (wage → “wag”; quite → “kwit”), or mispronouncing ambiguous graphemes, as when ch is pronounced /tʃ/ (as in “church”) when reading words such as anchor or monarch where the correct realization of ch is /k/.

Marshall and Newcombe (1973) ascribe surface dyslexia to a moderate-to-severe impairment of the “direct route” from visual written forms to semantic representations (a route for which there is now overwhelming evidence - see Coltheart, 1978; Marshall, 1976), combined with a lesser deficit in knowledge of grapheme-phoneme regularities. (One might note that, given the complexities of English spelling-to-sound relations, even normal skilled readers might be expected to make sizeable numbers of reading errors if forced to rely entirely upon grapheme-phoneme correspondence rules). Holmes (1978) concurs with Marshall and Newcombe’s (1973) interpretation of surface dyslexia and extends that interpretation to incorporate developmental dyslexia. This view is opposed to Jorm’s (1979) claim that the direct visual-to-semantic route functions normally in developmental dyslexics.

Jorm (1979) adduces four lines of evidence in support of his thesis that developmental dyslexia might be regarded as a genetic form of deep dyslexia. The validity of these four points will be examined one at a time.

(1) Impairment of grapheme-phoneme correspondence

Jorm cites unpublished work by Firth (1972) which apparently found that a test in which children had to read nonsense words like nate and iston discriminated successfully between dyslexic and normal subject groups. Unfortunately, Jorm provides no further information concerning the behavior of the developmental dyslexics. Do they, like deep dyslexics, sit mute in front of printed nonwords, or do they attempt pronunciations which are subsequently deemed by the experimenter to be incorrect? If the latter is the case, do the errors of the dyslexics result from random guesses, or from the application of inappropriate, irregular letter-to-sound correspondence? To illustrate the last point, consider the nonword ghoti. A “correct” reading would, presumably, be something like /goti/ and yet, as George Bernard Shaw observed, if gh is pronounced as in “tough”, o as in “women” and ti as in “nation”, then ghoti may be given an alternative reading as “fish”! Is it, then, that developmental dyslexics fail, in part, because of the application of such irregular grapheme-phoneme correspondences? Certainly, other data available on the reading and spelling performance of developmental dyslexics calls into question any claim that they lack all knowledge of grapheme-phoneme conversion rules (see below).
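The ghoti example can be sketched as two small grapheme-to-sound tables applied by the same procedure. The tables, the rough respellings used in place of phonetic symbols, and the function name are hypothetical and deliberately tiny; this is not a model of how any reader, normal or dyslexic, actually converts print to sound.

```python
# Hypothetical, deliberately tiny correspondence tables, for illustration only.
# Outputs are rough respellings, not phonetic transcriptions.
REGULAR = {"gh": "g", "o": "o", "t": "t", "i": "i"}
SHAW = {"gh": "f", "o": "i", "ti": "sh"}  # as in "tough", "women", "nation"


def read_aloud(word, table):
    """Greedy left-to-right conversion, preferring the longest matching grapheme."""
    out = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No grapheme in the table matches at this position.
            raise ValueError(f"no correspondence for {word[i]!r}")
    return "".join(out)


print(read_aloud("ghoti", REGULAR))
print(read_aloud("ghoti", SHAW))
```

The same conversion procedure yields "goti" with the first table and "fish" with the second, which is the sense in which an incorrect nonword reading might stem from applying real but contextually inappropriate correspondences instead of from guessing.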

(2) The effect of imageability on word reading

Imageability of words is a major factor in predicting the reading performance of deep dyslexics (Richardson, 1975a; 1975b; Shallice and Warrington, 1975). Jorm (1977) showed that poor readers between 8 and 11 years of age were more successful at reading one or two syllable concrete nouns than abstract nouns matched for length and Thorndike-Lorge word frequency. One possible explanation of this result is that although Jorm’s (1977) concrete and abstract nouns were matched on overall word frequency in samples of written English, as assessed by Thorndike and Lorge (1944), nevertheless concrete nouns might plausibly be expected to predominate over abstract nouns in the reading experience of poor readers, particularly in the sorts of books commonly employed which contain abundant illustrations, with single words or short sentences describing the illustrations and naming objects depicted in them.

An alternative explanation of Jorm’s (1977) finding is that imageability may affect readability for all readers, normal or dyslexic. (Here one would have to propose that Jorm’s failure to find an effect of concreteness on the reading performance of good readers is due to a ceiling effect). This proposition is in line with the demonstrations by Holmes, Marshall and Newcombe (1971) and Marshall, Newcombe and Holmes (1975) that the effect of syntactic word class (noun, verb or adjective) on the reading performance of deep dyslexics also predicts the reading success of 10 to 11-year-old children, and the tachistoscopic recognition thresholds of adult skilled readers for the same classes of word. Also relevant here is Spreen, Borkowski and Gordon’s (1966) demonstration using normal subjects that auditory recognition of words in noise is better for concrete than abstract nouns. A final, related point is that we do not have information about the effects of imageability on word recognition in other varieties of acquired dyslexia which one might wish to compare with developmental dyslexia (though see the discussion of ‘surface dyslexia’ below).

(3) Pattern of errors made

As noted above, deep dyslexics make some visual errors when attempting to read single words. Jorm (1977) noted that his poor readers also tended to make errors in which the response was visually similar to the stimulus word, and claims this as evidence for functional similarities between developmental and deep dyslexia (Jorm, 1979, p. 26). This is weak evidence, however, for two reasons. First, visual errors occur in all the recognised types of acquired dyslexia. Thus, their occurrence in developmental dyslexia tells us nothing about which of the specific forms of acquired dyslexia is closest in symptomatology to the developmental variety. Furthermore, visual errors are also produced by normal readers under conditions of rapid reading (e.g., Morton, 1964), or very brief presentations (Allport, 1977; Vernon, 1929): in other words, wherever reading difficulties are encountered, visual errors will be found also.

A second objection to an argument based on error patterns is that the single most salient feature of “classical” deep dyslexia is the presence of semantic reading errors. In this context, Wells (1906, pp. 77-8) reports an intriguing case of a child taught to read entirely by the look-and-say method who made semantic errors such as corn → “wheat”, locomotive → “engine” and dog → “cat”. These errors apparently disappeared after the child was taught to read phonically. Here is a clear case of semantic errors occurring in an apparently normal child who lacked knowledge of grapheme-phoneme correspondences. Jorm (1977), however, was unable to discover any evidence of semantic relatedness between stimulus words and the error responses of poor readers. This apparent absence of semantic errors in developmental dyslexia must surely count strongly against any claim to functional similarity between developmental and deep dyslexia (and, by the same token, may be taken to be corroborative support for Holmes’ (1973; 1978) position).

(4) Short-term memory impairment

Jorm (1979) devotes considerable space to a discussion of the relationship between developmental dyslexia and short-term memory impairment. There are a number of inconsistencies in his account which could be mentioned. Hardest to reconcile are the juxtaposed claims (Jorm, 1979, p. 23) that short-term memory difficulties in developmental dyslexics can account for a) their susceptibility to order errors in immediate recall (suggesting a “deficit in the auditory-verbal short-term store”) and b) their relative immunity to phonological (‘acoustic’) confusions in immediate recall (which apparently “suggests that dyslexics are not relying on the auditory-verbal short-term store to the same extent as normal readers”). This “heads-I-win-tails-the-dyslexics-lose” logic is made even more disquieting when one recalls that phonological similarity is far and away the most potent cause of order errors in immediate recall (Conrad, 1965; Ellis, 1979; Watkins, Watkins and Crowder, 1974; Wickelgren, 1965).

To return to the main argument, however, Jorm (1979) notes that one of Marshall and Newcombe’s (1973) two patients (the patient G.R.) and the patient K.F. of Shallice and Warrington (1975) both had reduced memory spans. Short-term memory data is not provided for the other deep dyslexic patients in the literature. One problem here is that both G.R. and K.F. show dysphasic as well as dyslexic symptoms, and their reduced memory spans may be a concomitant of their aphasia rather than a contributory cause of their dyslexia. Also, although several of the studies cited by Jorm (1979) found differences in group means on immediate recall tasks between normal readers and poor or dyslexic readers, it has yet to be demonstrated that an impoverished memory span is a necessary condition for the occurrence of either developmental or acquired (deep) dyslexia. (For what it is worth, one of Marshall and Newcombe’s (1973) two surface dyslexics - the patient J.C. - also displayed a reduced memory span, and is reported as having been more successful at reading concrete nouns than abstract nouns.)

In the most thorough investigation yet to be carried out on developmental dyslexia within the information-processing, dual-coding framework, Seymour and Porpodas (in press) conclude that the four dyslexic boys they studied possessed operational lexical (direct) and non-lexical (grapheme-phoneme) routes, but that both systems were to a greater or lesser degree impaired. If anything, this conclusion supports Holmes’ (1978) theory over Jorm’s (1979); however, the two adult dyslexics studied by Seymour and Porpodas possessed generally efficient direct routes with impaired non-lexical routes, a finding more in line with Jorm’s (1979) interpretation. One might hypothesize on the basis of Seymour and Porpodas’ (in press) findings that the direct visual-to-semantic word recognition channel improves with age in developmental dyslexics whilst the non-lexical route remains impaired. An alternative possibility is that there exist individual differences in developmental dyslexics such that some individuals are impaired on both modes of word recognition whilst others are disproportionately impaired on the non-lexical mode. Seymour and Porpodas’ results would then be explicable if the former group tend to achieve a more or less tolerable level of reading ability in adulthood - perhaps because of their dyslexia being due to a ‘maturational lag’ - whilst the problems of the latter group persist into maturity. Jorm (1979) peremptorily dismisses the notion of varieties of developmental dyslexia, but the reader should also consult Vernon (1979) and the references contained therein for a more open-minded assessment.

In summary, then, Jorm’s (1979) proposal that developmental dyslexics resemble brain-damaged deep dyslexics in their characteristics is not grounded on firm evidence. Holmes’ (1978) likening of developmental dyslexia to acquired surface dyslexia at least has the merit of demonstrating a clear similarity between the errors made by the two groups.

Reading is an exceedingly complex skill; one which may be affected in a wide variety of ways by brain damage, producing a number of clinically distinguishable ‘pure’ forms of acquired dyslexia. Such pure forms, however, are rare, and the average patient with reading problems will manifest a mixed symptomatology showing characteristics of several of the pure forms. By analogy, though developmental deep dyslexics and developmental surface dyslexics may exist - and, given that one has no reason to believe that genetic neuropsychological syndromes will be less varied than those resulting from brain injury, it would be surprising if this were not the case - one should not expect all developmental dyslexics to fall neatly into one or other of the postulated categories. In the field of acquired dyslexia, studies of heterogeneous groups of patients have been far less informative than intensive case studies of particular interesting and theoretically-relevant individuals. The present author is of the firm opinion that real advances in the understanding of developmental reading difficulties will occur only if the same approach is adopted.

References

Allport, D. A. (1977) On knowing the meaning of words we are unable to report: The effects of visual masking. In S. Dornic (ed.), Attention and Performance VI. New York, Academic Press.
Coltheart, M. (1978) Lexical access in simple reading tasks. In G. Underwood (ed.), Strategies of Information Processing. London, Academic Press.
Conrad, R. (1965) Order errors in immediate recall of sequences. J. verb. Learn. verb. Behav., 4, 101-109.
Ellis, A. W. (1979) Speech production and short-term memory. In J. Morton and J. C. Marshall (eds.), Psycholinguistics Series Vol. 2: Structures and Processes. London, Elek.
Firth, I. (1972) Components of Reading Disability. Unpublished doctoral dissertation, University of New South Wales.
Holmes, J. M. (1973) Dyslexia: a neurolinguistic study of traumatic and developmental disorders of reading. Unpublished Ph.D. thesis, University of Edinburgh.
Holmes, J. M. (1978) “Regression” and reading breakdown. In A. Caramazza and E. B. Zurif (eds.), Language Acquisition and Language Breakdown: Parallels and Divergencies. Baltimore, Johns Hopkins University Press.
Holmes, J. M., Marshall, J. C. and Newcombe, F. (1971) Syntactic class as a determinant of word-retrieval in normal and dyslexic subjects. Nature, 234, 416.
Jorm, A. F. (1977) Effect of word imagery on reading performance as a function of reader ability. J. educ. Psychol., 69, 46-54.
Jorm, A. F. (1979) The cognitive and neurological basis of developmental dyslexia: a theoretical framework and review. Cog., 7, 19-32.
Marshall, J. C. (1976) Neuropsychological aspects of orthographic representation. In R. J. Wales and E. Walker (eds.), New Approaches to Language Mechanisms. Amsterdam, North-Holland.
Marshall, J. C. and Newcombe, F. (1966) Syntactic and semantic errors in paralexia. Neuropsychol., 4, 169-176.
Marshall, J. C. and Newcombe, F. (1973) Patterns of paralexia: a psycholinguistic approach. J. Psycholinguist. Res., 2, 175-199.
Marshall, J. C., Newcombe, F. and Holmes, J. M. (1975) Lexical memory: a linguistic approach. In A. Kennedy and A. Wilkes (eds.), Studies in Long-term Memory. London, J. Wiley.
Morton, J. (1964) A model for continuous language behaviour. Lang. Speech, 7, 40-70.
Newcombe, F. and Marshall, J. C. (1973) Stages in recovery from dyslexia following a left cerebral abscess. Cortex, 9, 319-332.
Patterson, K. E. and Marcel, A. J. (1977) Aphasia, dyslexia and the phonological coding of written words. Quart. J. exp. Psychol., 29, 307-318.
Richardson, J. T. E. (1975a) The effect of word imageability in acquired dyslexia. Neuropsychol., 13, 281-288.
Richardson, J. T. E. (1975b) Further evidence on the effect of word imageability in dyslexia. Quart. J. exp. Psychol., 27, 445-449.
Saffran, E. M. and Marin, O. S. M. (1977) Reading without phonology: Evidence from aphasia. Quart. J. exp. Psychol., 29, 515-525.
Seymour, P. H. K. and Porpodas, C. D. (in press) Lexical and non-lexical processing of spelling in developmental dyslexia. In U. Frith (ed.), Cognitive Processes in Spelling. London, Academic Press.
Shallice, T. and Warrington, E. K. (1975) Word recognition in a phonemic dyslexic patient. Quart. J. exp. Psychol., 22, 261-273.
Spreen, O., Borkowski, J. G. and Gordon, A. M. (1966) Effects of abstractness, meaningfulness, and phonetic structure on auditory recognition of nouns. J. Speech Hear. Res., 9, 619-625.
Thorndike, E. L. and Lorge, I. (1944) The Teacher’s Wordbook of 30,000 Words. New York, Columbia University, Teachers College, Bureau of Publications.
Vernon, M. D. (1929) The errors made in reading. Medical Research Council Reports of the Committee upon the Physiology of Vision, Special Report Series, No. 130. London, His Majesty’s Stationery Office.
Vernon, M. D. (1979) Variability in reading retardation. Br. J. Psychol., 70, 7-16.
Watkins, M. J., Watkins, O. C. and Crowder, R. G. (1974) The modality effect in free and serial recall as a function of phonological similarity. J. verb. Learn. verb. Behav., 13, 430-447.
Wells, F. L. (1906) Linguistic lapses. In J. McK. Cattell and F. J. E. Woodbridge (eds.), Archives of Philosophy, Psychology and Scientific Methods, No. 6 (Columbia University Contributions to Philosophy and Psychology, Vol. 14, No. 3). New York, Science Press.
Wickelgren, W. A. (1965) Short-term memory for phonemically-similar lists. Amer. J. Psychol., 78, 567-574.

Cognition, 7 (1979) 421-433
©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussion

The nature of the reading deficit in developmental dyslexia: A reply to Ellis*

ANTHONY F. JORM
Deakin University, Australia

Ellis (1979) credits me with the view that “developmental dyslexia might be regarded as a genetic form of deep dyslexia” and then proceeds to argue that my grounds for this view are ill-founded. He offers an alternative proposal, that developmental dyslexia may be likened to (acquired) surface dyslexia. In this paper I will attempt to answer Ellis’ (1979) criticisms and provide a critique of his alternative proposal.

Developmental and Phonemic Dyslexia

The major point I would like to make is that Ellis has misrepresented my position somewhat. I did not, as he implies, propose that developmental dyslexia and (acquired) phonemic/deep dyslexia are functionally equivalent disorders. Rather, I pointed out certain functional similarities between the two disorders. This distinction is a crucial one. Because Ellis’ criticisms are directed at the view that developmental and phonemic dyslexia are functionally equivalent disorders, they are largely irrelevant to the theory I proposed.

In order to clarify the matter, I will reiterate my original argument. What I proposed was that developmental dyslexia results from a genetically-based dysfunction of the left inferior parietal lobule. One of the lines of evidence cited to support this view was that this region of the brain plays a crucial role in reading, in particular reading using grapheme-phoneme rules. Evidence was cited to support the view that lesions to this region produce a deficit in reading via this phonological route. Furthermore, it was pointed out that “this form of acquired dyslexia has certain functional similarities to developmental dyslexia” (p. 26), namely difficulty in reading nonsense words, greater difficulty in reading low-imagery words than high-imagery words, and a tendency to make visual errors in reading. In short, what I argued was that developmental and phonemic dyslexia both involve a difficulty in reading via the phonological route and that this results in some similarities in the sort of words these dyslexics find difficult to read and in the sort of reading errors they make.

However, I did not argue that the reading deficit is functionally identical in both disorders. The distinction may seem slight, but it is an important one. In phonemic dyslexia, brain damage has produced a total or near total blockage of the phonological reading route, whereas in developmental dyslexia the delayed or incomplete maturation which I hypothesized would lead only to poorer (rather than nonexistent) performance at reading via this route. I did not claim, as Ellis suggests, that developmental dyslexics “lack all knowledge of grapheme-phoneme conversion rules”. A second and more obvious difference between the two disorders is that phonemic dyslexia involves the loss of the ability to carry out grapheme-phoneme conversion after reading skills are already well developed, while developmental dyslexia involves a difficulty in grapheme-phoneme conversion which is present from the start of reading instruction. Both of these factors will produce differences between the reading performances of the two groups. However, despite these differences, there should be some similarities between the reading performances of developmental and phonemic dyslexics if in both disorders the phonological reading route is impaired and the direct visual route is intact. More particularly, words which are more easily handled by the direct visual route (e.g., high-imagery words) should be read better than words which are more easily handled by the phonological route (e.g., nonsense words, low-imagery words). Furthermore, reading errors which are characteristic of the direct visual route should predominate in both disorders.

Having set the record straight on this matter, I will now deal with some of Ellis’ specific criticisms of the evidence I cited to support the theory.

*The author wishes to thank B. A. Kitchener for helpful comments on the manuscript and P. H. K. Seymour for providing a prepublication copy of the Seymour & Porpodas article. Requests for reprints should be sent to A. F. Jorm, Cognitive Psychology Research Group, School of Education, Deakin University, Geelong, Victoria, 3217, Australia.

Ellis questions whether developmental dyslexics are like phonemic dyslexics and “sit mute in front of printed nonwords” and suggests that they may fail at this task because of “the application of inappropriate, irregular letter-to-sound correspondences”. The evidence I cited on the ability of developmental dyslexics to read simple nonsense words comes from an unpublished thesis by Firth (1972). Firth gave a 170-item nonsense word reading test to 8-year-old children who were classified as being bad or average readers and of average or low IQ. His results are presented in Table 1. As can be seen from the table, the difference in performance between the bad and average readers is very large indeed. When this test was used to classify the sample (a total of 96 children) as bad or average readers, it produced 98% correct classifications. This result indicates that the notion of a phonological recoding deficit is sufficient in itself to account for the reading disabilities of these children. By contrast, the ability to associate spoken words with strings of letter-like visual symbols (a measure of individual differences in the direct visual route) did not discriminate the groups at all. The F ratios for this contrast were both less than 1.0 and the task was found to produce 52% misclassifications. If, as Ellis suggests, dyslexic children have difficulties in reading via the direct visual route, we would expect this task to be a somewhat better discriminator than this.

Table 1. Performance of bad and average eight-year-old readers on a nonsense word reading test (from Firth, 1972)

Group         Bad readers        Average readers    F ratio
              Mean     S.D.      Mean     S.D.
Low IQ        18.5     19.8      119.0    17.8      356.80
Average IQ    35.4     25.6      118.0    24.5      127.54

Firth (1972) offers no quantitative data on the sorts of errors made by children in attempting to read the nonsense words. However, he has this to say:

The average readers sailed through the nonsense word test very rapidly, sometimes so fast that it was difficult to keep pace in recording answers. Usually they did not “sound out” these unfamiliar words, but pronounced them without hesitation. The bad readers, by contrast, found the task very difficult. The errors made by the bad readers were usually failures to produce any pronunciation at all, rather than the production of incorrect pronunciations. The worst of the bad readers, although able to read a few Schonell R1 words, could not produce any pronunciation at all for these nonsense words. Even with explanations, examples, coaching, and sounding of some letters by the tester, they still found the task impossible. The best of the bad readers had some idea of what phonics was about, and could produce some correct pronunciations. However, these pronunciations were produced very slowly and laboriously and with much sounding out of the letters (pp. 123-4).

However, data I have collected with 12 older children (mean age 11-2, mean reading age 8-2, mean IQ 111) showed that failures to produce a correct pronunciation are more common than omissions. These children were asked to read 15 nonsense words which were generated by taking some high-imagery nouns and altering the initial letter (e.g., doctor became foctor, and letter became retter). These children could read an average of 58% of the nonsense words correctly, whereas a matched control group read an average of 92% correctly. Of the dyslexics’ reading errors, 36% were real words, 57% were neologisms, and only 7% were omissions. It is interesting to note that despite the fact that the dyslexics could read only an average of 58% of the nonsense words, they managed to read an average of 88% of the real words from which the nonsense words were derived. For each of the 12 dyslexic children, performance on the real words was better than performance on the nonsense words, suggesting that these children must have been relying to a large extent on the direct visual route to read these words correctly. This latter finding argues against the position, which Ellis defends, that developmental dyslexics have an impairment of the visual-to-semantic route.

Taken together, these results indicate that in the early years of primary school the dyslexic child is almost totally unable to generate pronunciations for written nonsense words, while by the final years of primary school nonsense words can be read but not very accurately.

The effect of imagery on word reading

Ellis has a number of criticisms of my (1977a) work showing that word imagery is a strong predictor of how easy a word is to read for disabled readers. In his criticisms, Ellis has misrepresented this study in a number of respects. Firstly, this study did not show, as Ellis claims, that concrete nouns are read more successfully than abstract nouns, but rather that high-imagery nouns are read more successfully than low-imagery nouns. Experiment 1 of the study showed that concreteness did not correlate with reading success at all when the effects of imagery, frequency, and length were partialled out, whereas imagery did correlate significantly with reading success when the effects of concreteness, frequency, and length were partialled out. Furthermore, in Experiment 2 of the study, the words were selected on the basis of their imagery values rather than their concreteness. A second misrepresentation is Ellis’ claim that:

...although Jorm’s (1977) concrete and abstract nouns were matched on overall word frequency in samples of written English, as assessed by Thorndike and Lorge (1944), nevertheless concrete nouns might plausibly be expected to predominate over abstract nouns in the reading experience of poor readers, particularly in the sorts of books commonly employed which contain abundant illustrations, with single words or short sentences describing the illustrations and naming objects depicted in them.

What Ellis does not mention is that Experiment 2 matched the high-imagery and low-imagery nouns not only for Thorndike-Lorge (1944) frequency, but also for frequency according to Carroll, Davies and Richman’s (1971) word count on children’s books. Furthermore, the nouns used in Experiment 1 were taken from a series of elementary reading books and all had frequencies of occurrence of greater than 4 in these books. The frequency estimates used in Experiment 1 for correlational purposes were again taken from the Carroll et al. (1971) word count. However, even if Ellis were correct that children get greater exposure to words which are easily illustrated in books, this might explain a concreteness effect (since concrete nouns are defined as those with sensory referents), but could not explain an imagery effect.

Ellis goes on to argue that even if imagery does affect readability, this effect may apply to all readers, both normal and dyslexic, and my failure to find such a result with good readers may have been due to a ceiling effect. Some evidence relevant to this question comes from studies by Richardson (1976) and Whaley (1978) examining the effects of word attributes on response latencies in simple reading tasks. Because these studies used response times as their dependent measure, they are not subject to the criticism that ceiling effects may have been operating. Using undergraduate students as subjects, Richardson (1976) found that word imagery had no significant effect on either pronunciation latency or word-nonword classification time. With a similar sample of subjects, Whaley (1978) looked at the correlations between various word attributes and word-nonword classification time. Using his data, there is a significant correlation of 0.29 between word imagery and reciprocal response time when the effects of concreteness, length, and frequency are partialled out. (However, with this type of data analysis, we do not know whether this result holds across subjects as well as across words.)

More generally though, I believe that Ellis’ criticism may be missing the whole point. I do not deny that imagery affects the reading processes of good readers. In fact, the whole thrust of the third experiment of my (1977a) study was to show that this is the case. What I would suggest is that word imagery affects the ease with which a word can be read via the direct visual route. Since both normal and dyslexic readers use this route, word imagery will affect their processing. However, when the phonological route is operating efficiently, word imagery does not affect overall reading accuracy because words which cannot be handled by the direct visual route alone are handled by the phonological route (or perhaps some interaction of the two routes). Thus, good readers can read high-imagery and low-imagery nouns equally well and make very few reading errors. However, in developmental dyslexics where the phonological route is not operating efficiently, and in phonemic dyslexics where this route is not operating at all, the reader is forced to rely on the visual route and consequently word imagery predicts reading performance.

Pattern of errors made

One of the points of evidence I cited to support the notion that developmental and phonemic dyslexia share functional similarities was that both involve a tendency to make visual errors in reading. Ellis notes that visual errors occur in all types of acquired dyslexia and hence this finding tells us nothing about which type of acquired dyslexia developmental dyslexia is closest to. I must agree with Ellis on this point, although it should be noted that Marshall and Newcombe (1973) claim that only about 2% of the errors of their surface dyslexic patients were visual confusions.

Ellis also points out that developmental dyslexia is clearly different from phonemic dyslexia in that the latter involves a tendency to make semantic errors whereas the former does not. Again, I must agree with Ellis that developmental dyslexics do not make pure semantic errors (without a visual component). However, I would also point out that the frequency of pure semantic errors varies considerably from one phonemic dyslexic to another, and in some cases is very low. For example, Marshall and Newcombe’s (1973) patient K.U. produced only 2 pure semantic errors in reading 170 words. Similarly only 4% of Shallice and Warrington’s (1975) patient’s errors were purely semantic (without a visual or derivational component). What does seem to be a general characteristic of phonemic dyslexics is a tendency to make derivational semantic errors (which in most cases are also visually related to the stimulus word). However, developmental dyslexics also sometimes make semantic errors which have a visual component. Some examples from the errors recorded in my 1977a study are: Christmas → “Christian”, prince → “princess”, life → “live”, sunshine → “sunny”, sheep → “shepherd”. Baron (1979) also reports some reading errors of this type in a group of dyslexic children. However, it may be more parsimonious to regard this type of error in dyslexic children as being visual rather than semantic.

One obvious reason for the failure to find pure semantic errors in developmental dyslexia is that generally the dyslexic child has some ability to apply grapheme-phoneme rules to written words and this rudimentary ability is sufficient to rule out any pure semantic errors. For example, a child has only to be able to sound out the first letter of corn to know that it is not pronounced “wheat” - a full decoding is not necessary. Pure semantic errors are really only possible (at least in reading single words) where the ability to phonologically recode using rules is totally absent.

Short-term memory impairment

Ellis finds it hard to reconcile my juxtaposed claims that a deficit in the auditory-verbal short-term store of developmental dyslexics can account for both their susceptibility to temporal order errors and their immunity to phonological confusions in immediate recall. However, there is not necessarily any contradiction here at all. If we assume that the auditory-verbal short-term store is specialised to hold information in a phonological code and to store information about the temporal order of events, then it is to be expected that a child who has a deficiency of this store will show both little evidence of using a phonological code (i.e., phonological confusions) and poor retention of order information. The evidence which Ellis cites to show that phonologically confusable stimuli produce more order errors in immediate recall is quite irrelevant to the matter of how individual differences in the incidence of various types of errors arise.

Ellis goes on to point out that it has yet to be shown that a poor memory span is necessary for the occurrence of either developmental or phonemic dyslexia. Yet again, I must agree with Ellis. However, it must be remembered that memory span is an imperfect measure of the functioning of the auditory-verbal short-term store, since it is also affected considerably by control processes and the extent of a person’s long-term memory knowledge base (Chi, 1976, for example, goes so far as to argue that these factors are solely responsible for age differences in memory span). A strict test of the notion that dyslexia is necessarily associated with deficiencies of short-term storage requires a more satisfactory measure than memory span.

Surface Dyslexia and Developmental Dyslexia

I will now turn to Ellis’ proposal that developmental dyslexia may be likened to surface dyslexia. Ellis’ evidence for this view comes from the work of Holmes (I 973, 1978). As I have not had access to Holmes’ (1973) unpublished thesis, I will confine my comments to her published (1978) work. Holmes (1978) reports that the majority of the reading errors made by both developmental and surface dyslexics can be classified as partial failures of grapheme-phoneme correspondence. I would agree that dyslexic children sometimes make reading errors of this kind; such errors are particularly evident when dyslexic children give neologisms as responses. However, whether such errors are in the majority and whether they implicate a deticit of the direct visual route is debatable. A major problem in evaluating Holmes (1978) evidence is that she only presents selected examples of errors to illustrate her conclusions rather than a quantitative analysis of all the data or a complete corpus of her subjects’ errors. Furthermore, in the examples she cites, Holmes (1978) generally does not clearly designate those errors made by the children and those made by the adult patients. Thus, the reader is left with little recourse but to accept Holmes’ own interpretation of her findings. Yet despite the difficulties which her paper presents to the critical reader, Holmes’ analysis of her subjects’ errors can be seen to be inadequate in some respects. Many of the errors she cites as partial failures of grapheme-phoneme

428

A. F. Jom

correspondence could be equally plausibly classified as visual e.g., certain --f “carton”, beggar -+ “badger”, muscle + “musical”, reign + “region”, revise -+ “rivers”. Holmes is aware of this possibility and comments at one point: “Some readers may argue that at least some of these errors can be attributed to the visual similarity of stimulus and response words. To some extent such a judgment must remain a matter of personal bias (p. 93)“. A more general problem is that, even if Holmes’ (1978) error analysis is valid, the sort of reading deficit implied by such errors is by no means clear. If, as Holmes suggests, the reading errors of developmental dyslexics are characterised by a partial failure of grapheme-phoneme correspondence rules, then it seems odd for Ellis to argue that their primary deficit is in the direct visual route. It seems more plausible that a failure of rules would indicate a deficit of the phonological route. Ellis cites the work of Seymour and Porpodas (in press) as evidence against my position that developmental dyslexia involves a deficit of the phonological route with the direct route intact. Seymour and Porpodas concluded that the four dyslexic boys they studied showed some impairment of both routes. However, a careful look at Seymour and Porpodas’ findings and conclusions reveals that there is no incompatibility with my position. Seymour and Porpodas assessed the functioning of the direct route by tasks which compared performance on words versus nonwords, high-frequency words versus lowfrequency words, and irregular words versus regular words. These tasks were all measures of the achievement of the direct route (i.e., the extent of the subject’s sight-word vocabulary) rather than the ability of the direct route (i.e., the subject’s capability at forming symbol-meaning or symbol-sound associations). 
As Seymour and Porpodas themselves point out, the development of an extensive sight-word vocabulary may depend partly on having an adequate phonological route. This is exactly what I proposed in my original paper. Children with good phonics skills have a built-in teacher which they can use to add new words to their sight-word vocabulary, whereas children with poor phonics skills must rely on an external teacher for increasing their sight-word vocabulary. Poor achievement of the direct route is thus predicted from my theory, but poor ability of this route is not.

Evidence recently reported by Baron (1977) and Brooks (1977) shows quite dramatically the dependence of the direct route on the phonological route. They taught adult subjects to associate strings of printed symbols from an artificial alphabet with spoken responses. In one condition of the experiment (the orthographic condition), the symbols could be related to the responses using grapheme-phoneme correspondence rules, while in the second condition (the paired-associate condition) there was only an arbitrary relationship between the symbols and responses. The surprising result was that
after several hundred practice trials with the artificial words, the words in the orthographic condition were read faster than the words in the paired-associate condition. Even after extensive practice, response times were influenced by the possibility of using grapheme-phoneme correspondence rules. Thus, a deficit in the phonological route in dyslexic children would produce not only a limited sight-word vocabulary, but also slow performance with highly familiar words.

The Notion of Sub-types of Developmental Dyslexia

Ellis suggests that there may be subtypes of developmental dyslexia, with some dyslexics being impaired on both the visual and phonological routes and others impaired disproportionately on the phonological route. He claims that I peremptorily dismiss the notion that there are subtypes and cites the references in Vernon’s (1979) article as evidence in support of this notion. I would wish to make it clear that I do not deny the possibility that subtypes of developmental dyslexia exist. The theory I have presented might really only be applicable to one subgroup of dyslexics. However, I would argue that at present there is no satisfactory evidence to support the notion of subtypes and that it is therefore better to adopt the more parsimonious view that developmental dyslexia is a unitary disorder.

Let us take, for example, the studies of Naidoo (1972), Boder (1973), and Doehring and Hoshko (1977) which Vernon (1979) cites as indicating the existence of subtypes. Although these three studies are amongst the best available on the subject, I would argue that they do not provide any evidence to support the notion of distinct subtypes.

Naidoo (1972) attempted to identify subgroups by carrying out a single linkage cluster analysis on data collected from over 90 dyslexic boys. Variables relating to “developmental history, neurological status, speech, language, auditory memory, visuo-spatial function, arithmetic, perinatal status, and familial factors (p. 99)” were used as the basis for the cluster analysis. The results of this study give little comfort to those who believe in subtypes. Naidoo (1972) concluded:

The fact that clusters did not emerge naturally does not support the existence of clearly defined types of dyslexia in this sample. One probable reason why no clusters were evident is that some features, for example, low scores on sound blending, Digit Span and Coding and lack of hand-eye-foot concordance, occur so frequently that they are unlikely to differentiate one group from another. Another reason may be that this highly selected sample was too homogeneous to include subgroups (p. 107).
Boder’s (1973) work is often cited to support the notion that there are subtypes. She concluded that the vast majority of dyslexic children can be clearly fitted into one of three groups which are respectively characterised by a disorder of the direct visual route (dyseidetic dyslexics), a disorder of the phonological route (dysphonetic dyslexics), and a disorder of both routes (mixed dysphonetic-dyseidetic dyslexics). However, Boder (1973) presented no evidence to support this conclusion apart from a few illustrative examples of the spelling and reading errors of each type of child. She simply asserted that the children can be fitted into one of these three groups and provided no quantitative analysis to support this view. From the general descriptions which Boder (1973) gives of the three types of dyslexia, it seems equally plausible that they could be regarded as points along a single continuum of reading deficit, with dyseidetic dyslexics at one end having a mild disorder and mixed dyslexics at the other having a very severe disorder.

There is also reason to doubt the ability of Boder’s testing procedure to distinguish these groups if they do exist. Central to Boder’s classification of her subjects is the use of a word reading test in which words are classified as being in the child’s “sight vocabulary” if they are read within one second, and are classified as being read by “word analysis-synthesis skills” if they are read within 1-10 seconds. The assumption that words read within one second are in a child’s sight vocabulary is a doubtful one. With 8-year-old children, I have found (Jorm, 1977b) that the pronunciation latency for reading a nonsense word is quite often less than a second. By Boder’s (1973) testing procedure, such nonsense words would be regarded as being part of the child’s sight vocabulary. Furthermore, in the majority of cases, I found that nonsense words had pronunciation latencies of less than 1.5 seconds.
Since Boder did not use any reaction time recording equipment to aid her judgements, it seems doubtful that she could accurately discriminate words read within one second from words read within 1.5 seconds. It is thus possible that many of the words she classifies as being in a child’s sight vocabulary are in fact being read by grapheme-phoneme conversion.

Doehring and Hoshko’s (1977) study used Q-technique factor analysis to derive dimensions of test profile types for a group of 34 children with reading problems and a group of 31 children with mixed educational problems. It is only the results of the former group which are of interest in the present context. Doehring and Hoshko classified these subjects into groups on the basis of which of three factors they had highest loadings on. Three of the 34 children could not be placed in any group and a further four children fell into more than one group. The three groups of children were characterised by having special difficulties in oral word and syllable reading for Group 1, auditory-visual letter matching for Group 2, and auditory-visual matching of
words and syllables for Group 3. However, all groups were similar in that they tended to perform consistently poorly on the majority of the tests used.

The major limitation of Doehring and Hoshko’s study is that it did not include the results of a normal control group in the factor analysis. The point of using a factor analysis technique should be to show that some disabled readers differ from normal readers on certain components of the reading process while other disabled readers differ from normals on other components. The factors so derived should discriminate normal readers at one end from disabled readers at the other. Since Doehring and Hoshko did not include a normal group in the factor analysis, we do not know whether the three factors they derived represent test profile differences which distinguish disabled and normal readers. They left out the source of variance (between the test profiles of disabled and normal readers) which the factor analysis should have been attempting to analyse.

Other more specific criticisms can be made of this study. The tests used were very brief (31 tests in approximately 1 hour) and consequently may have had low reliability, in which case the factors derived could have been highly contaminated with error variance. Another criticism is that a few of their reading disabled subjects could hardly be described as having reading difficulties (e.g., a boy aged 9-2 with a reading grade level of 5-2, and a boy aged 14-9 with a reading grade level of 9-2).

Undoubtedly, other studies could be quoted to support the notion of subtypes, but I would argue that in all cases to date they are fraught with inadequacies and cannot be taken as positive evidence for the existence of subtypes. What I would agree these sorts of studies show is that dyslexics are a quite varied group. However, the presence of variability does not necessarily imply the existence of subtypes.
The reading processes and cognitive abilities of dyslexics can differ considerably from one child to another without these differences being related to their reading retardation. For example, there may be individual differences among dyslexics in the ability to associate spoken words with strings of visual symbols, but the evidence suggests that such individual differences are not a source of variance in reading achievement (Firth, 1972; Jorm, 1977a). It would, therefore, not be useful to say that there are two subgroups of developmental dyslexics, with one group being good at associating spoken words with visual symbols and the other group being poor at this skill, any more than it would be useful to argue for the existence of corresponding subgroups amongst normal readers.

Intensive Case Studies as a Research Strategy

Ellis’ final point is that “in the field of acquired dyslexia, studies of heterogeneous groups of patients have been far less informative than intensive case
studies of particular, interesting and theoretically-relevant individuals”. He argues that therefore “real advances in the understanding of developmental reading difficulties will occur only if the same approach is adopted”. I would argue to the contrary that a case study approach is unwise unless the subject population of interest is such a small one that only single cases can be obtained. It is undoubtedly because of the rarity of brain-damaged patients in whom dyslexia is the predominant problem (see Marshall & Newcombe, 1973, p. 175) that a case study approach has been used extensively in this area. I think it should be kept in mind that group studies are really case studies with replications. By replicating findings over a group of cases, we are able to sort out those characteristics which are common to all cases from those which are idiosyncratic to particular individuals. By using a case study approach we are in danger of being misled by the idiosyncrasies of the particular case we are studying and by the imperfect nature of our tests. However, I would agree with Ellis that we need “intensive” studies, by which I mean studies using multiple dependent variables, if meaningful progress is to be made in the area.

Let me say in conclusion that I certainly would not claim that the theory of developmental dyslexia I proposed is correct in all respects. Undoubtedly it shares the characteristic of all scientific theories of being in conflict with some evidence from the moment of its birth. I would not deny for a moment that there is some current evidence which the theory cannot easily accommodate. However, I believe that the theory accounts for the current evidence much better than any other theory which has been proposed. It is far easier to pick holes in the theory than to propose a better alternative.

References

Baron, J. (1977) Mechanisms for pronouncing printed words: Use and acquisition. In D. La Berge and S. J. Samuels (eds.), Basic Processes in Reading: Perception and Comprehension. Hillsdale, Lawrence Erlbaum.
Baron, J. (1979) Orthographic and word-specific mechanisms in children’s reading of words. Child Dev., 50, 555-666.
Boder, E. (1973) Developmental dyslexia: A diagnostic approach based on three atypical reading-spelling patterns. Dev. Med. Child Neurol., 15, 663-687.
Brooks, L. R. (1977) Visual pattern in fluent word identification. In A. Reber and D. Scarborough (eds.), Towards a Psychology of Reading. Hillsdale, Lawrence Erlbaum.
Carroll, J. B., Davies, P. and Richman, B. (1971) The American Heritage Word Frequency Book. Boston, Houghton Mifflin.
Chi, M. T. H. (1976) Short-term memory limitations in children: Capacity or processing deficits? Mem. Cog., 4, 559-572.
Doehring, D. G. and Hoshko, I. M. (1977) Classification of reading problems by the Q-technique of factor analysis. Cortex, 13, 281-294.
Ellis, A. W. (1979) Developmental and acquired dyslexia: Some observations on Jorm (1979). Cog., 7, 413-420.
Firth, I. (1972) Components of Reading Disability. Unpublished doctoral dissertation, University of New South Wales.
Holmes, J. M. (1973) Dyslexia: A Neurolinguistic Study of Traumatic and Developmental Disorders of Reading. Unpublished doctoral dissertation, University of Edinburgh.
Holmes, J. M. (1978) “Regression” and reading breakdown. In A. Caramazza and E. B. Zurif (eds.), Language Acquisition and Language Breakdown: Parallels and Divergencies. Baltimore, Johns Hopkins University Press.
Jorm, A. F. (1977) Effect of word imagery on reading performance as a function of reader ability. J. Educ. Psychol., 69, 46-54 (a).
Jorm, A. F. (1977) Children’s reading processes revealed by pronunciation latencies and errors. J. Educ. Psychol., 69, 166-171 (b).
Marshall, J. C. and Newcombe, F. (1973) Patterns of paralexia: A psycholinguistic approach. J. Psycholinguist. Res., 2, 175-199.
Naidoo, S. (1972) Specific Dyslexia. London, Pitman.
Richardson, J. T. E. (1976) The effects of stimulus attributes upon latency of word recognition. Brit. J. Psychol., 67, 315-325.
Seymour, P. H. K. and Porpodas, C. D. (in press) Lexical and non-lexical processing of spelling in developmental dyslexia. In U. Frith (ed.), Cognitive Processes in Spelling. London, Academic Press.
Shallice, T. and Warrington, E. K. (1975) Word recognition in a phonemic dyslexic patient. Quart. J. Exp. Psychol., 27, 187-199.
Thorndike, E. L. and Lorge, I. (1944) The Teacher’s Wordbook of 30,000 Words. New York: Columbia University, Teachers College, Bureau of Publications.
Venezky, R. L. (1970) The Structure of English Orthography. The Hague, Mouton.
Vernon, M. D. (1979) Variability in reading retardation. Brit. J. Psychol., 70, 7-16.
Whaley, C. P. (1978) Word-nonword classification time. J. verb. Learn. verb. Behav., 17, 143-154.

Contents of Volume 7

Number 1

Editorial, 1

MARY SUE AMMON and DAN I. SLOBIN (University of California, Berkeley)
A cross-linguistic study of the processing of causative sentences, 3

ANTHONY F. JORM (Deakin University, Australia)
The cognitive and neurological basis of developmental dyslexia: A theoretical framework and review, 19

Brief Reports

HUGO VAN DER MOLEN and JOHN MORTON (MRC Applied Psychology Unit, Cambridge)
Remembering plurals: Unit of coding and form of coding during serial recall, 35

ANNE CUTLER and JERRY A. FODOR (Massachusetts Institute of Technology)
Semantic focus and sentence comprehension, 49

Discussions

JOHN KLOSEK (Graduate Center, CUNY)
Two unargued linguistic assumptions in Kean’s “phonological” interpretation of agrammatism, 61

MARY-LOUISE KEAN (University of California, Irvine)
Agrammatism: A phonological deficit?, 69

HELEN GOODLUCK (University of Wisconsin, Madison) and LAWRENCE SOLAN (University of Massachusetts, Amherst)
A reevaluation of the basic operations hypothesis, 85

JERRY A. FODOR (Massachusetts Institute of Technology)
In reply to Philip Johnson-Laird, 93

Books Received, 97

Number 2

THOMAS R. SHULTZ, ARLENE DOVER and ERIC AMSEL (McGill University)
The logical and empirical bases of conservation judgements, 99

ANAT NINIO (The Hebrew University, Jerusalem)
Piaget’s theory of space perception in infancy, 125

FRANCESCO ANTINUCCI (CNR, Rome), ALESSANDRO DURANTI (University of Rome) and LUCYNA GEBERT (University of Genoa)
Relative clause structure, relative clause perception, and the change from SOV to SVO, 145

Discussions

MARK S. SEIDENBERG (Columbia University) and LAURA A. PETITTO (New York University)
Signing behavior in apes: A critical review, 177

Number 3

STEVEN PINKER (Harvard University)
Formal models of language learning, 217

TIMOTHY E. MOORE (Glendon College, York University) and IRVING BIEDERMAN (State University of New York)
Speeded recognition of ungrammaticality: Double violations, 285

Discussions

STEPHEN P. SCHWARTZ (Ithaca College)
Natural kind terms, 301

ANNE ERREICH, JUDITH WINZEMER MAYER and VIRGINIA VALIAN (CUNY Graduate Center)
Language acquisition hypotheses: A reply to Goodluck and Solan, 317

Number 4

JOSE MORAIS, LUZ CARY, JESUS ALEGRIA and PAUL BERTELSON (Université Libre de Bruxelles)
Does awareness of speech as a sequence of phones arise spontaneously?, 323

GUY WOODRUFF and DAVID PREMACK (University of Pennsylvania)
Intentional communication in the chimpanzee: The development of deception, 333

J. LANGFORD and V. M. HOLMES (University of Melbourne)
Syntactic presupposition in sentence comprehension, 363

Discussions

L. JONATHAN COHEN (Oxford University)
On the psychology of prediction: Whose is the fallacy?, 385

DANIEL KAHNEMAN (University of British Columbia) and AMOS TVERSKY (Stanford University)
On the interpretation of intuitive probability: A reply to Jonathan Cohen, 409

ANDREW W. ELLIS (University of Lancaster)
Developmental and acquired dyslexia: Some observations on Jorm (1979), 413

ANTHONY F. JORM (Deakin University)
The nature of the reading deficit in developmental dyslexia: A reply to Ellis, 421

Author Index of Volume 7

Alegria, Jesus, 323
Ammon, Mary Sue, 3
Amsel, Eric, 99
Antinucci, Francesco, 145
Bertelson, Paul, 323
Biederman, Irving, 285
Cary, Luz, 323
Cohen, Jonathan, L., 385
Cutler, Anne, 49
Dover, Arlene, 99
Duranti, Alessandro, 145
Ellis, Andrew, W., 413
Erreich, Anne, 317
Fodor, Jerry, A., 49, 93
Gebert, Lucyna, 145
Goodluck, Helen, 85
Holmes, V. M., 363
Jorm, Anthony, F., 19, 421
Kahneman, Daniel, 409
Kean, Mary-Louise, 69
Klosek, John, 61
Langford, J., 363
Mayer, Judith, W., 317
Moore, Timothy, E., 285
Morais, Jose, 323
Morton, John, 35
Ninio, Anat, 125
Petitto, Laura, A., 177
Pinker, Steven, 217
Premack, David, 333
Schwartz, Stephen, P., 301
Seidenberg, Mark, S., 177
Shultz, Thomas, R., 99
Slobin, Dan, I., 3
Solan, Lawrence, 85
Tversky, Amos, 409
Valian, Virginia, 317
van der Molen, Hugo, 35
Woodruff, Guy, 333

Erratum to Volume 7

Mark S. Seidenberg and Laura A. Petitto, Signing behavior in apes: A critical review, Cog., 7, 2, 177-215.

Unfortunately, two major errors were introduced in this paper before printing. The printers offer their sincere apologies to the authors and readers.

On page 206, the sentence starting at the end of the 25th line should have read:

“If Y was a flat surface, they typically placed the object on it.”

and not:

“If Y was a flat surface, they typically placed the object in it.”

On page 211, in the penultimate paragraph, lines 23 and 24 were inadvertently added. This paragraph should have read:

Second, the source of many of the problems in the existing literature may be traced to the Gardners’ statement that their analyses “do not depend on any special theory of linguistics or psycholinguistics” (1975, p. 256). Their analyses depend upon a special theory that is created de facto by their acceptance of a simplistic set of assumptions about language structure and language learning. It is possible that Washoe could have accomplished more if her trainers had possessed a richer conception of language and communication.