The Processing and Acquisition of Reference
edited by Edward Gibson and Neal J. Pearlmutter
A Bradford Book The MIT Press Cambridge, Massachusetts London, England
© 2011 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
For information about quantity discounts, email [email protected].
Set in Times New Roman and Syntax on InDesign by Asco Typesetters, Hong Kong. Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
The processing and acquisition of reference / edited by Edward A. Gibson and Neal J. Pearlmutter.
p. cm.
This volume presents papers from the special session at the CUNY Sentence Processing conference that was hosted by MIT and Northeastern University in 2003.
Includes bibliographical references and index.
ISBN 978-0-262-01512-7 (alk. paper)
1. Reference (Linguistics)--Congresses. 2. Language acquisition--Congresses. I. Gibson, Edward, 1962– II. Pearlmutter, Neal J., 1967–
P325.5.R44P76 2011
2010022680
401'.456--dc22
10 9 8 7 6 5 4 3 2 1
Contents

1 Introduction
Edward Gibson and Neal J. Pearlmutter

I Children’s Acquisition and Processing of Reference

2 Cues Don’t Explain Learning: Maximal Trouble in the Determiner System
Ken Wexler

3 Children’s Use of Context in Ambiguity Resolution
Luisa Meroni and Stephen Crain

4 Referential and Syntactic Processes: What Develops?
John C. Trueswell, Anna Papafragou, and Youngon Choi

5 Parsing, Grammar, and the Challenge of Raising Children at LF
Julien Musolino and Andrea Gualmini

6 A Cross-Linguistic Study on the Interpretation of Pronouns by Children and Agrammatic Speakers: Evidence from Dutch, Spanish, and Italian
Esther Ruigendijk, Sergio Baauw, Shalom Zuckerman, Nada Vasić, Joke de Lange, and Sergey Avrutin

7 Processing or Pragmatics? Explaining the Coreference Delay
Tanya Reinhart

II Adults’ Processing of Reference: Evidence from the Visual-World Eye-Tracking Paradigm

8 Disfluency Effects in Comprehension: How New Information Can Become Accessible
Jennifer E. Arnold and Michael K. Tanenhaus

9 It’s Not What You Said, It’s How You Said It: How Modification Conventions Influence On-Line Referential Processing
Jodi D. Edwards and Craig G. Chambers

10 The Effect of Speaker-Specific Information on Pragmatic Inferences
Daniel Grodner and Julie C. Sedivy

11 Referential Processing in Monologue and Dialogue with and without Access to Real-World Referents
Simon Garrod

III Adults’ Processing of Reference: Evidence from Corpora and Reading Experiments

12 Noun-Phrase Anaphor Resolution: Antecedent Focus, Semantic Overlap, and the Informational Load Hypothesis
H. Wind Cowles and Alan Garnham

13 Investigating the Interpretation of Pronouns and Demonstratives in Finnish: Going beyond Salience
Elsi Kaiser and John C. Trueswell

14 Not All Subjects Are Born Equal: A Look at Complex Sentence Structure
Eleni Miltsakaki

15 Complement Focus and Reference Phenomena
Anthony J. Sanford and Linda M. Moxey

16 The Binding Problem for Language, and Its Consequences for the Neurocognition of Comprehension
Peter Hagoort

Index
1 Introduction
Edward Gibson and Neal J. Pearlmutter
This volume presents papers from a special session at the Sixteenth Annual CUNY Conference on Human Sentence Processing, hosted by the Massachusetts Institute of Technology and Northeastern University in 2003. The goal of the special session was to bring together researchers in the fields of language processing and language acquisition to discuss topics of common interest: how people refer to objects in the world, how people comprehend such referential expressions, and how children acquire the abilities to refer and understand reference. Linguistic reference is particularly well suited to connecting the fields of acquisition and processing because it is an active area of research in both linguistics and psycholinguistics, because questions related to producing and understanding reference are already under investigation in both adults and children, and because it is particularly amenable to investigation using head-mounted eye-tracking methods (Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy 1995; Trueswell, Sekerina, Hill, and Logrip 1999), which have enabled direct investigation of referential processes using identical designs with adults and children.
The papers in the volume are divided into three parts. Part I focuses on issues related to children’s acquisition and processing of reference. Parts II and III are dedicated to issues related to adults’ processing of referential information, part II focusing on work using the visual-world paradigm and part III focusing on work based on corpora and reading experiments.
Part I: Children’s Acquisition and Processing of Reference
In chapter 2, Ken Wexler provides a novel account of the pattern of data from the eye-tracking study by Trueswell et al. (1999), which showed that children aged 5 behave very differently from adults when following instructions like Put the frog on the napkin into the box. In particular, children seem unable to use the presence of multiple elements of the same type in the context (e.g., two frogs) in order to disambiguate the prepositional phrase on the napkin as a modifier of the frog. Wexler proposes that the difficulty that children have in this task stems from their lack of knowledge of the meaning of the word the. He summarizes evidence from the literature that suggests that young children don’t know what the means, and then goes on to show how this lack of knowledge can account for the pattern of data that Trueswell et al. observed. Wexler also discusses the proposal of Grodzinsky and Reinhart (1993), another account of children’s difficulties in the acquisition of referential expressions, which postulates processing complexity as a cause, and suggests that it cannot explain the available acquisition data from the literature either. And then, in a spirited discussion, Wexler argues that the proposals that Trueswell et al. put forward to account for their data are inadequate in other ways. The result is a thoughtful challenge for processing-based theories of aspects of language development.
In chapter 3, Luisa Meroni and Stephen Crain also focus on theories of the data pattern from the eye-tracking study by Trueswell et al. (1999). Meroni and Crain argue against aspects of the interpretation of these findings by Trueswell et al. In particular, although they agree with Trueswell et al. that children overcommit to early parsing decisions, they take issue with the claim that children are less able than adults to use the referential context to guide their online parsing decisions. Rather, Meroni and Crain argue that the behaviors shown by the children are completely explainable in terms of early commitment together with an adult-level processing system, where referential information and lexico-syntactic information are used to resolve ambiguity. In order to demonstrate this, Meroni and Crain present four experiments in which children’s actions are tabulated with respect to displays like the ones that Trueswell et al. first tested.
Meroni and Crain first show that children, when faced with the instruction Put the frog on the napkin in contexts with one frog on a napkin and a second not on a napkin, overwhelmingly interpret this instruction to mean that they should put the frog that is not on a napkin onto a napkin. Meroni and Crain therefore argue that the context that Trueswell et al. had used in their experiments contains a pragmatic bias to interpret the frog that is not on a napkin as the one that the experimenter is referring to. They then show that when the context is changed so that both frogs are already on napkins (one on a blue napkin, one on a red napkin) the instruction Put the frog on the red napkin is infelicitous, so that children want to know which frog the experimenter is talking about. In these contexts, it is then demonstrated that children can easily use the prepositional phrase on the red napkin as a modifier in the instruction Put the frog on the red napkin into the box. The results of these experiments are useful additions to the literature on children’s processing of referential information.
In chapter 4, John Trueswell, Anna Papafragou, and Youngon Choi discuss theories of adult and child sentence processing as constrained by the available data. They focus some of their attention on the aforementioned finding by Trueswell et al. (1999) that 5-year-old children did not seem to be able to use contextual cues to resolve ambiguity in sentences like Put the frog on the napkin into the box. Trueswell et al. (ibid.) explained this finding in terms of development of processing resources or working memory: Children’s limited working memory capacity does not allow them to use information from the context to disambiguate linguistic input. In the current paper, Trueswell et al. have refined this claim to suggest that children’s gradual improvement in their ability to use referential context is related to developmental changes in general executive-function processes, specifically the ability to select among competing representations (Novick, Trueswell, and Thompson-Schill 2005; Trueswell and Gleitman 2004). Trueswell et al. relate this proposal to evidence from the literature on adult language processing, in which it has been shown that normal adults sometimes hold on to meanings consistent with an initially considered but eventually abandoned interpretation of a temporarily ambiguous linguistic signal (see, e.g., Christianson, Hollingworth, Halliwell, and Ferreira 2001). In addition, Trueswell et al. provide evidence and arguments against the view offered here by Wexler and they respond to the data presented here by Meroni and Crain.
In chapter 5, Julien Musolino and Andrea Gualmini summarize their investigation of how sentences with ambiguous scope are resolved by children. They focus on an ambiguity involving the universal quantifier every in subject position along with a negated main predicate, as in Every horse didn’t jump over the fence.
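The two readings of this sentence differ only in the relative scope of the universal quantifier and negation. As a sketch in first-order notation (the predicate names horse and jump are illustrative shorthand, not notation taken from the chapter), they can be written as follows:

```latex
% Surface scope ("every" takes scope over "not"): no horse jumped.
\forall x\,[\mathrm{horse}(x) \rightarrow \neg\,\mathrm{jump}(x)]

% Inverse scope ("not" takes scope over "every"): not all horses jumped.
\neg\,\forall x\,[\mathrm{horse}(x) \rightarrow \mathrm{jump}(x)]
```

Provided there is at least one horse, the first formula entails the second, so any situation that verifies the surface-scope reading also verifies the inverse-scope reading, but not the other way around.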
In one interpretation of this sentence (the “surface-scope” interpretation), the expression every horse refers to all the horses in the set, such that for each horse in the set it is true that it did not jump over the fence. In the other interpretation of this sentence (the “inverse-scope” interpretation), the expression every horse refers to a subset of the horses, such that it is not true that each horse jumped over the fence (i.e., there were some horses that did not jump over the fence). Whereas adults appear to be able to obtain the inverse-scope interpretation in relevant contexts, Musolino and Gualmini summarize work showing that children often seem to prefer the surface-scope interpretation, and cannot obtain the inverse-scope interpretation. This finding suggests a stage of development where children have a strong bias in interpreting these kinds of ambiguities that disappears by adulthood. Musolino and Gualmini speculate on factors that might determine the initial preference and might make it disappear by adulthood. This interesting data pattern (children’s inability to interpret every in these constructions as referring to only a subset of entities in the set) remains a puzzle for accounts of acquisition of reference.
Chapters 6 and 7 evaluate different theories of binding and coreference in human language. In an influential paper published in 1983, Tanya Reinhart proposed a division between variable-binding aspects of anaphoric dependencies (such as between himself and every bear in the sentence Every bear washed himself.) and coreference between referring expressions (such as between him and John in a sentence like John wants Mary to love him.). Chien and Wexler (1990) later showed that young children’s understanding of variable-binding was much better than their understanding of coreference, thus offering additional empirical support for Reinhart’s proposed dissociation between the two kinds of anaphoric dependencies. Chien and Wexler’s observation that young children can understand variable-binding but not coreference has since been replicated many times. There are several theories of what kind of knowledge is acquired by young children so that they eventually achieve adult performance on coreference. Chapters 6 and 7 present research programs and data relevant to distinguishing among some of the theories of coreference development from the literature.
In chapter 6, Esther Ruigendijk, Sergio Baauw, Shalom Zuckerman, Nada Vasić, Joke de Lange, and Sergey Avrutin present data from children and agrammatic aphasic patients whose native languages were Dutch, Spanish, and Italian that are relevant to distinguishing among some theories of coreference. In particular, they tested transitive and Exceptional Case Marking constructions (e.g., The boy saw him dance.) in order to distinguish between Chomsky’s (1981) binding hypothesis, on the one hand, and Reinhart and Reuland’s (1993) and Reuland’s (2001) computational-complexity-based theories, on the other.
The results showed that the interpretation of pronouns was more difficult in Exceptional Case Marking constructions than in transitive constructions in both children and agrammatic aphasics. Ruigendijk et al. argue that their results are more compatible with Reinhart and Reuland’s (1993) and Reuland’s (2001) hypotheses than with Chomsky’s (1981) binding theory.
In chapter 7, Tanya Reinhart describes a research program that seeks to discover the nature of coreference in human language by using evidence from adult acceptability judgments on possible sentences and evidence from child language acquisition. In 1983, Reinhart had proposed that children are delayed in acquiring a pragmatic principle relevant to understanding appropriate coreference. In more recent work, Grodzinsky and Reinhart (1993) proposed that the computation involved in coreference is more complex than that involved in variable-binding, and that this computational complexity may explain children’s relative difficulties in the relevant tasks. Thornton and Wexler (1999) then provided arguments against the processing account. They maintained that the coreference delay reflects a pragmatic deficiency, and they developed an analysis of the relevant pragmatic factors. In the current paper, Reinhart summarizes this earlier work and presents further arguments in favor of the computational complexity account.
On a very sad note, Tanya Reinhart passed away in the spring of 2007. We will all miss her greatly.
Part II: Adults’ Processing of Reference: Evidence from the Visual-World Eye-Tracking Paradigm
In chapter 8, Jennifer Arnold and Mike Tanenhaus use a novel method in order to measure the relative accessibility of different referents to a particular speaker during language comprehension. Rather than relying on production data or reading times to different kinds of noun-phrase referents, Arnold and Tanenhaus recorded people’s eye movements to different physical objects in a scene. They manipulated how fluently the referent was pronounced in the utterance and found strong evidence that the fluency of production of a referent affected interpretation of that referent. In particular, listeners preferred to look at discourse-given objects when the productions were fluent, but they looked at discourse-new objects when the productions were slower with suggestions of disfluency. The results from these experiments suggest that the notion of discourse “accessibility” is complex. In particular, people may expect reference to non-given referents when they notice that their conversation partner is having difficulty accessing the words for the target referent. If the accessibility of a referent from a speaker’s point of view is determined by whether the speaker has that referent activated in his or her mind, we can measure that accessibility by cues in their speech, such that non-fluent speech may indicate a shift to talking about something that has not been talked about recently. Furthermore, the results suggest that naturalistic productions can be very informative: speech “errors” such as disfluent speech can be used to understand how the human production and comprehension mechanisms work.
In chapter 9, Jodi Edwards and Craig Chambers present two head-mounted eye-tracking experiments investigating the use of probabilistic information in resolving noun-phrase reference in the comprehension of spoken language. They focus on comprehenders’ ability to use information about the likelihood of prenominal versus postnominal modification to pick out a referent in visual arrays.
The first experiment used simple spoken instructions with a postnominal noun-phrase modifier (e.g., Click on the square with the diamonds) and compared visual displays with a target object that required a postnominal modifier and one of four competitor objects. The competitor could be a different category from the target (e.g., a circle instead of a square), or could be of the same category as the target (e.g., a square) but marked with a different property that is typically expressed with a prenominal modifier (e.g., the green square), marked with a different property that could be specified either postnominally or prenominally (e.g., the starred square / the square with the star), or marked with a property that required a postnominal modifier (e.g., the square with the happy face). In the experiment, comprehenders showed clear sensitivity to the likelihood that a postnominal modifier would be used: The proportion of looks to the competitor object from the onset of the postmodifier increased steadily with the likelihood that a postnominal modifier would be used for the particular contrast present in the environment. This is the first demonstration of sensitivity to this rather subtle property of the relationship between the internal syntactic structure of the referring noun phrase and specific properties of the potential referents in the environment. In the second experiment, however, Edwards and Chambers demonstrate at least one limit on this sensitivity: Within a paradigm similar to the original examination of contextual disambiguation of a syntactic prepositional-phrase-attachment ambiguity by Tanenhaus et al. (1995), comprehenders showed no sensitivity to the relative likelihood of prenominal versus postnominal modification; difficulty with the ambiguity was not affected by this potentially useful constraint. Edwards and Chambers conclude with a discussion of a variety of possible explanations for the limitation on the use of this likelihood information, and they expand on how information about modifier form might fit with other sources of constraint on reference related to common ground, conversational inference, and pragmatics.
In chapter 10, Daniel Grodner and Julie Sedivy continue a research program that investigates the time course of the use of pragmatic information in language comprehension. Sedivy’s earlier research demonstrated that when comprehenders encounter a referential form including a modifier that typically indicates contrastiveness (e.g., the tall glass in a set of glasses), they assume that the referential form is being used contrastively, and consequently they look to the relevant object in the scene very early when processing the referential expression (before the noun glass is encountered). Sedivy has argued that people are implicitly following Grice’s (1975, p. 26) maxim of quantity: “Don’t make your contribution more informative than is required for the purposes of the present exchange.” In the current volume, Grodner and Sedivy explore whether speaker-specific information affects listeners’ expectations about how speakers will refer to an item in the context. One prediction of the Gricean approach is as follows: If comprehenders think that a speaker is not using referring expressions in an appropriate (contrastive) way, they may adapt their responses to productions from such speakers. Grodner and Sedivy find evidence consistent with this prediction in an experiment manipulating the informativeness of the speaker. In particular, they demonstrate that comprehenders stop using contrastive information early in resolving reference when speakers consistently do not use contrastive elements appropriately. Grodner and Sedivy then speculate on how a Gricean approach to understanding reference (in which perceivers must reason counterfactually about what the speaker could have said but did not) might be implemented in the human mind.
In the final chapter in part II, Simon Garrod provides an overview of how the role of the situation model in referential processing can link seemingly incompatible results from studies of monologue and dialogue and from studies of reading and visual-world eye tracking. Garrod notes how results from studies of pronoun resolution in reading suggest a two-step model, in which candidate antecedents for an anaphor are first identified on the basis of gender matching and number matching, then evaluated with respect to the overall situation model. In similar visual-world studies, these stages appear to be collapsed together, and Garrod suggests an explanation in terms of differences in how the situation model is related to the text versus the visual display of potential referents. Similarly, in the case of monologue versus dialogue, Garrod focuses on how the relationship among the situation model, the preceding discourse, and the current utterance enforces different constraints on reference processes, leading to different patterns of results. He thus shows how a comparison among these different types of studies can shed new light on the processes of reference resolution in general.
Part III: Adults’ Processing of Reference: Evidence from Corpora and Reading Experiments
In chapter 12, Wind Cowles and Alan Garnham examine the behavior of full noun-phrase anaphors in discourse contexts in an attempt to determine how focus status and semantic overlap between an anaphor and its antecedent interact. Following up on Almor’s (1999) Informational Load Hypothesis, which predicts that an anaphor’s semantic overlap with its antecedent should trade off against the accessibility of the antecedent, Cowles and Garnham report three experiments using noun-phrase category anaphors and manipulating the focus status of the antecedent and its typicality within the anaphor’s category (e.g., oak vs. palm as antecedent for tree), where greater typicality yields more semantic overlap. The first experiment shows a pattern supporting Almor’s hypothesis, the anaphor being read more quickly after typical than atypical antecedents when the antecedent was not focused, and the reverse pattern when the antecedent was focused. However, the latter difference is relatively weak, and Cowles and Garnham attempt to obtain clearer patterns by adjusting the instructions, the presentation, and the comprehension questions in the second and third experiments. Surprisingly, however, not only do these manipulations not strengthen the focus-by-typicality interaction, they eliminate it. Noting that these results raise more questions than they answer, Cowles and Garnham go on to discuss possible explanations for the apparent instability of the focus-typicality relationship and to consider what other factors may need investigation within the Informational Load Hypothesis.
Antecedent accessibility and salience in reference are also examined in chapters 13 and 14. In chapter 13, Elsi Kaiser and John Trueswell argue that the salience notion commonly employed in the accessibility/salience hierarchy for referential forms is oversimplified. They compare two third-person anaphoric forms referring to humans in Finnish: the gender-neutral pronoun hän (s/he) and the demonstrative tämä (this). Pronominal and demonstrative forms are located at different points in the accessibility hierarchy, with pronominals hypothesized to refer to entities more salient than demonstratives. But Kaiser and Trueswell show in two off-line sentence-completion experiments and an on-line eye-movement monitoring experiment that hän and tämä depend differently on factors that affect saliency, specifically word order (information-structure constraints) and grammatical role (subject vs. object). In particular, hän is used largely to refer to subjects regardless of the previous sentence’s word order, whereas tämä typically refers to the object in a preceding subject-verb-object sentence but refers about equally often to the subject and the object when the preceding sentence’s word order is object-verb-subject.
Thus reference for the two anaphoric forms cannot depend on a single measure of salience, because that measure would sometimes have to depend on word order and sometimes not. Kaiser and Trueswell suggest instead that pronominals like hän refer to the most salient element in the available syntactic/semantic representation, while demonstratives like tämä refer to the most salient element in the current discourse representation or mental model. These data not only provide a specific result of interest to understanding reference but also illustrate the importance of investigating such questions cross-linguistically.
Eleni Miltsakaki, in chapter 14, makes use of Centering Theory as a framework for another examination involving antecedent salience. She assumes the traditional accessibility/salience hierarchy for referential forms in investigating the effect of subordinate clauses on topichood and salience. In particular, she examines whether subordinate clauses form independent utterance domains and whether entities referenced within such clauses interact with those in main clauses for the purposes of determining salience. She discusses several experiments on adverbial subordinate clauses, the results of which suggest that entity topic status is updated sentence-by-sentence rather than clause-by-clause and that entities in subordinate clauses are relatively less available as candidate topics. She then investigates relative clauses in a pair of corpus studies. She contrasts use of pronouns versus full noun-phrase referring expressions to refer to entities within preceding relative clauses, in light of the accessibility hierarchy. The data show that pronouns are rarely used, and that when they are used the salience of an antecedent entity is based on a mention outside the relative clause in addition to within it. In the second corpus study, Miltsakaki computes Centering-Theory-based topic shifts from sentences containing a sentence-final non-restrictive relative clause into the following sentence and shows that these topic shifts are less coherent if the relative clause is treated as an independent utterance than when it is treated as a part of the main clause. Both of these corpus results are consistent with the salience results from adverbial subordinate clauses. They suggest that entities in subordinate clauses are treated differently with respect to salience and topichood than those in main clauses, and that subordinate clauses ought not to be treated as independent utterances for purposes of computing these properties.
In chapter 15, Tony Sanford and Linda Moxey describe a range of experimental data that they have gathered over the course of almost two decades of research investigating how different determiners and quantifiers make different sets of entities available for reference. For example, in the sentence Every student passed the physics test., the quantifier every has two sets as its semantic arguments: the set of students and the set of physics-test-passers.
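The set relations at issue in this discussion can be stated compactly. As a sketch, let S be the set of students and P the set of those who passed the physics test (the symbols S and P and the labels "reference set" and "complement set" in the comments are illustrative choices, not notation taken from the chapter):

```latex
% Truth condition for "Every student passed the physics test"
S \subseteq P

% Reference set: the students who passed
S \cap P

% Complement set: the students who did not pass
S \setminus P
```

When the sentence with every is true, the reference set coincides with S and the complement set is empty.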
Under one analysis of the semantics of every, the complete set of students has to be a subset of the set of test-passers in order for the sentence to be true. We can then refer to this subset in later sentences using a pronoun like they (e.g., as in They were very happy.). A strong claim from the linguistics literature is that possible meanings of determiners and quantifiers are constrained such that the only kinds of sets that determiners and quantifiers make available cross-linguistically are subset relationships, but never complement sets. This claim is made explicit in Discourse Representation Theory (Kamp and Reyle 1993). If this generalization is right, there should never be a case in which we can use a pronoun to refer to the students who did not pass the test after a sentence like Every student passed the physics test. However, Sanford and Moxey demonstrate that negative quantifiers like few seem to permit reference to a complement set, contrary to Discourse Representation Theory. For example, in the following sequence of sentences, the pronoun they refers to the set of students that did not pass the test:
Few students passed the physics test. They couldn’t be bothered to study the night before.
This is reference to the complement set. Sanford and Moxey discuss possible objections to their analysis and provide further empirical evidence in support of their claim that complement-set relationships can be made available by certain determiners and quantifiers. They then speculate on what possible constraints there are for reference with respect to quantifiers, if any, in the semantics of discourse more generally.
In the closing chapter, Peter Hagoort outlines a broad-coverage model of language comprehension (based on Vosse and Kempen 2000) and examines how it fits with neuropsychological data on sentence processing. He first summarizes the main language-relevant waveforms identified in the literature on recording of event-related potentials (ERPs), then points out the contrast in the literature on syntactic processing between syntax-first and more interactive models of sentence processing, then discusses ERP data that present difficulties for syntax-first models. Arguing for an interactive model in which information from multiple sources can be used to guide processing, he shows how cases in which syntactic processing fails in the Vosse and Kempen model can be mapped onto different ERP waveforms. With respect to semantic processing, Hagoort focuses on the question of whether computing the fit of newly processed material to existing sentential versus discourse context involves two separate stages. He describes ERP data that show largely identical patterns for violations in the two cases and argues for a model in which linking to a discourse model (built up both within and across sentences) is crucial. Finally, he links elements of the cognitive model he proposes to particular regions of the brain.
He suggests that the left posterior superior temporal gyrus may be responsible for storage and retrieval of lexical information (including syntactically relevant information stored with lexical items), while the left posterior inferior frontal cortex is responsible for integrating the retrieved information. Whereas Hagoort’s paper does not focus specifically on reference (and, indeed, the “binding problem” he discusses is not the linguistic one that arises for referentially dependent forms, but rather the more general problem of how the brain is able to link together various pieces of an utterance which arrive spread out over time), anaphora is of course a paradigmatic case of the requirement that language imposes on a comprehension system to track and combine information across time; consequently, a better understanding of the psychological and neural bases of human cognitive processes involved in combining linguistic elements more generally should inform models of reference resolution in particular.
References
Almor, A. 1999. Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review 106, 748–765.
Chien, Y. C., and Wexler, K. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1, 225–295.
Chomsky, N. 1981. Lectures on Government and Binding. Foris.
Christianson, K., Hollingworth, A., Halliwell, J., and Ferreira, F. 2001. Thematic roles assigned along the garden path linger. Cognitive Psychology 42, 368–407.
Grice, H. P. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and Semantics 3: Speech Acts. Academic Press.
Grodzinsky, Y., and Reinhart, T. 1993. The innateness of binding and coreference. Linguistic Inquiry 24, 69–101.
Kamp, H., and Reyle, U. 1993. From Discourse to Logic: Introduction to Model-Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer.
Novick, J. M., Trueswell, J. C., and Thompson-Schill, S. 2005. Toward the neural basis of parsing: Prefrontal cortex and the role of selectional processes in language comprehension. Journal of Cognitive, Affective, and Behavioral Neuroscience 5, 263–281.
Reinhart, T. 1983. Anaphora and Semantic Interpretation. Croom Helm.
Reinhart, T., and Reuland, E. 1993. Reflexivity. Linguistic Inquiry 24, 657–720.
Reuland, E. 2001. Primitives of binding. Linguistic Inquiry 32, 439–492.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634.
Thornton, R., and Wexler, K. 1999. Principle B, VP Ellipsis, and Interpretation in Child Grammar. MIT Press.
Trueswell, J. C., and Gleitman, L. 2004. Children’s eye movements during listening: Developmental evidence for a constraint-based theory of sentence processing. In J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press.
Trueswell, J. C., Sekerina, I., Hill, N. M., and Logrip, M. L. 1999. The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition 73, 89–134.
Vosse, T., and Kempen, G. A. M. 2000. Syntactic structure assembly in human parsing: A computational model based on competitive inhibition and lexicalist grammar. Cognition 75, 105–143.
I
Children’s Acquisition and Processing of Reference
2
Cues Don’t Explain Learning: Maximal Trouble in the Determiner System
Ken Wexler
Development of the Computational System of Language
This paper is devoted to a theory (and discussion of some relevant data) of the development of the definite and indefinite determiner in children, with special attention paid to the distinction between competence (knowledge) and performance (processing in this case). In psycholinguistics, the study of performance is dependent on the study of competence. Ever since Chomsky formulated the competence/performance distinction in modern generative grammar, empirical research has yielded a large amount of evidence for the fruitfulness, even necessity, of such a distinction. The reason it is so easy to forget this distinction in adult psycholinguistics is that the competence basis underlying research comes largely by assumption: competence is determined by linguistic theory, and psycholinguists routinely start from a linguistic description (e.g. a description of the structure of a relative clause) and try to understand performance factors (e.g. difficulties, or errors) by adding performance assumptions to the linguistic theory. Of course, this is an oversimplification; sometimes the correct linguistic description is unknown, or at issue, sometimes performance considerations help in determining the competence, etc. Nevertheless, it is a truism to say that, by and large, the competence assumed in psycholinguistic studies is the competence as determined by linguistic theory. Such an easy assumption does not pass over to child psycholinguistics. For children, it is an empirical issue whether or not the competence described by linguistic theory applies to them, especially whole cloth. This is why most studies in the generative tradition (maybe most studies in all traditions) have started by asking what the competence of children is. For only when we have some notion of competence can we begin to investigate performance. In many areas of syntactic development, a field that has seen great progress in the last 15 years (see Guasti’s 2002 textbook for an introduction),
psycholinguists have established many results about children’s competence. In broad outline, children know Universal Grammar (UG) and much about the particular properties (parameters being the most central case) of their own language. But there are properties in which children’s knowledge (computational systems, CS) differs from adults’. In particular, there are slight differences between children’s representation of UG and that of adults. The idea is that the child’s CS grows into the adult’s. For example, there is general agreement that children pass through the Optional Infinitive stage of grammatical development because of a restriction on their grammars; proposals include the Tense Omission Hypothesis of Wexler (1990, 1994), the Truncation Hypothesis of Rizzi (1994), the ATOM model of Schütze and Wexler (1996), and the Unique Checking Constraint (UCC) of Wexler (1998). The latter predicts many syntactic phenomena through about age 2;6 or 3;0, including which languages go through the optional infinitive stage, properties of case, etc. The child has set the correct relevant parameter values, but the restriction causes the CS to take finite sentences as ungrammatical. All the papers I’ve just referenced state a syntactic (maturational) restriction on the CS and attempt to explain the phenomena via this restriction. Another example is the conclusion that verbal passive and unaccusative sentences are taken as ungrammatical by the child’s CS up to about age 5; again, this is not due to a parameter mis-setting, but rather to a restriction on the child’s grammar, for example the A-Chain Deficit Hypothesis of Borer and Wexler (1987, 1992), its re-statement by Babyonyshev, Ganger, Pesetsky, and Wexler (2001), or the Full Phase Hypothesis of Wexler (2004). There are several other examples, but this is not the place to review them.
The point to remember is simply that in syntactic development, a field that has made great strides, the conclusion is that the child’s competence, her CS, has developmental restrictions that are not present in adult grammar. The assumption is that the growth of the child’s brain eliminates these restrictions, expanding the range of syntactic competence.1 In contrast, the study of the acquisition of semantics has for the most part not undergone such a development. The major approach within generative studies, at least, has been to assume that the child knows everything about semantics but that pragmatic development is delayed (e.g. Wexler and Chien 1985; Chien and Wexler 1990; Wexler 1999; Thornton and Wexler 1999). Stephen Crain has argued likewise that children know the universal properties of semantics; see, for example, Crain and Thornton 1998. The paradigmatic example is the argument by Wexler and Chien (1985) and Chien and Wexler (1990) that the Apparent Delay of Principle B effect is due to pragmatic difficulties with reference, rather than to lack of knowledge of syntactic or semantic principles. Briefly, Wexler and Chien (1985) showed that children to about age 5 accept
sentences like Mama Bear pointed to her as meaning that Mama Bear pointed to herself. This appears to be a violation of Principle B, but Wexler and Chien (1985) proposed that Principle B was not violated by the children. Rather, a pragmatic principle of reference was violated. Chien and Wexler (1990) showed that indeed it was a pragmatic/referential principle that was at stake by experimentally demonstrating that in bound variable contexts like every bear pointed to her (where the referential principle was irrelevant) children did not violate Principle B.2 We can say that the standard “working strategy” of semantic development (to date) is the External Hypothesis: Semantic properties of UG are known to the child, but “external” properties may be delayed. That is, the External Hypothesis says that all aspects of the computational system of semantics are known to the child but that linguistic properties external to the computational system (e.g. pragmatic properties, or properties of processing) may not be fully developed in the child. In this paper, I challenge the External Hypothesis, at least for the development of determiners. In this regard, I seek to bring syntactic development and semantic development closer together, and to provide a motivation for semantic development other than the “confirmation” of linguistic theory in the child. The title of this paper is a pun — it uses maximal in two senses. On the one hand, the working strategy of the paper is to cause maximal trouble to the standard External Hypothesis, by arguing that children actually have a different representation of determiners than do adults; there is a difference in the semantic representations of determiners. On the other hand, maximal trouble also refers to the difficulties that I will claim children have with determiners, namely with the assumption of maximality for the definite determiner. 
I will discuss the classic empirical problems with use of determiners, discuss the standard “pragmatic” solution, illustrate why this solution is not complete, propose a new solution based on difficulties with the semantic feature of maximality, and show how this analysis captures the acquisition facts. I will then discuss a potential processing/memory explanation and show that it fails to capture the data. Lastly, I will show how “on-line” data using eye tracking cannot be explained using the hypotheses suggested in the literature, but follow from the hypothesis of this paper.
The the → a Error in Child Language
There is a large empirical literature on the development of definite and indefinite determiners; I will mention only a few of the studies, but the generalizations I discuss are widely accepted, if not always explicit.
Karmiloff-Smith (1979) (KS subsequently) is a major study of the development of French determiners. These experiments involve many children and age groups and give us a very clear idea of the overall patterns of development. Results for English (e.g. Warden 1974, 1976) confirm the French results, and I’ll discuss some English results also. For convenience, I’ll use the and a instead of the French translations. Consider when a N or the N is used, for the restrictor noun N. Intuitively, when there is a unique salient instance of N in the discourse (overt or assumed discourse), the definite determiner the is required. Otherwise, a is used.3 I will provide the standard formal definition later in the paper. KS’s Experiment 1 varies whether the object to which the child has to refer is a singleton (requiring the X), one of several different-colored objects (requiring the plus adjective, e.g. the blue X), or one of several identical objects (requiring a X). Here is a sketch of the set-up: The child is told (and sees) that there is a girl-doll’s playroom and a boy-doll’s playroom, with various toys in the playrooms. Here is one relevant instance:
girl-doll’s playroom: one blue book, three multicolored balls, one baby-bottle
boy-doll’s playroom: three books (each a different color), one multicolored ball, one car
The experimenter touches an object (e.g. the blue book in the girl-doll’s playroom) and the child is told to “ask the girl to lend you that.” The child replies and the form the child uses is noted; e.g. the child might say “lend me the blue book” (the definite response; in fact the correct adult response, since the girl-doll has exactly one blue book) or “lend me a blue book” (the indefinite response). On the other hand, if the experimenter touched one of the three identical multicolored balls in the girl-doll’s playroom, we would expect an adult to use the indefinite, that is, to say something like “lend me a ball.” Demonstratives (e.g. that book) were discouraged. The age range of participants was 3;0–9;11, with six to eight children in each year (five in the last year). For most age ranges, the total number of responses over the age range was 50–70, so there is a substantial amount of data. For simplicity, I’ll summarize results here for only three age groups, ignoring the case of several different-colored objects (which supports the conclusions drawn), and I’ll add together KS’s separate counting of whether the referring expression has a modifier or not, simply counting definite or indefinite. The percentages of indefinite and definite responses for a particular stimulus (single object or several objects) for the 3-year-olds do not total
Table 2.1
Percentage of definite and indefinite determiners (based on Karmiloff-Smith 1979).

Age        Type of determiner   Singleton object   Several identical objects
3;0–3;11   Definite                    62                     48
           Indefinite                   0                      3
6;0–6;11   Definite                    92                     56
           Indefinite                   8                     44
9;0–9;11   Definite                   100                      0
           Indefinite                   0                    100
100 because, despite efforts, these children produced some demonstrative responses, which make up the remainder of the 100 percent. In table 2.1, we see that 9-year-olds are “perfect,” i.e., they agree with adult judgments, always using the definite (the) for the singleton and the indefinite (a) for the non-singleton. So adult controls aren’t necessary; the 9-year-olds produce behavior that agrees with adult judgments (and the results of linguistic theory). They can serve as controls, showing that in fact the theory’s predictions (and adult judgments) are perfectly replicated in the behavior of children who are old enough. Turning to the younger children, we see that they almost never produce an indefinite for the singleton situation (0 percent for the 3-year-olds, 8 percent for the 6-year-olds), so they know that singletons take the. But younger children (including 6-year-olds) very often use the definite for the “several identical objects” case, where the indefinite is required; e.g., asking the girl-doll for one of the balls, the child will say “lend me the ball” instead of “lend me a ball” (48 percent for the 3-year-olds, 56 percent for the 6-year-olds). This is the classic a → the child response (still in existence in this experiment at age 6, completely resolved by age 9). That is, a required a is instead produced as the. Crucially, the error is not a random response; children know to use the for the singleton. And it’s not lack of knowledge of a. Children use a correctly when there is no context set, as other experiments show.
Now let’s briefly consider KS’s Experiment 5: girl / boy acting. There are three identical objects, e.g. pigs. A girl-doll knocks over one of them and the child is asked “what did the girl do?” Alternatively, there is a singleton object (e.g. a pig), the girl-doll knocks it over, and the child is asked “what did the girl do?” Five-year-olds and 6-year-olds responded 100 percent with the X to the singleton case (e.g.
saying the girl knocked over the pig). For the three identical objects, they used the definite about 60 percent of the time, showing overuse of
the definite article. Ten-year-olds used the definite article only about 13 percent of the time in the case of three identical objects. Again, we see the a → the error for the 5-year-olds and 6-year-olds, with perfect responses when the was required, i.e. the non-existence of a the → a error. In this same experiment, the girl then knocked over another X (e.g. another pig), and the child was asked what she had done. Ten-year-olds used the X here only about 11 percent of the time and used various kinds of adequate expressions (mostly another X, sometimes one of the X’s) about 83 percent of the time. Five-year-olds used the X about 56 percent of the time and the indefinite article about 36 percent of the time. Again, we see the a → the error. To emphasize: Five-year-olds (and 6-year-olds too) very often used the X to refer to a different identical item; e.g. they had just talked about a pig, and then they referred to a different pig as the pig. It seems as if the first use of the pig would have established that pig as the context set, disallowing use of the to refer to another pig (as the 10-year-olds showed), but the 5-year-olds and the 6-year-olds referred to this other pig as the pig. Clearly something is amiss. There are many other experiments in the literature confirming the asymmetry of the errors (i.e., the great prevalence of the a → the error) in children through at least age 5, but I don’t have the space to review the literature here. Let me just point out some of the evidence that the error isn’t due to lack of knowledge or to inability to use a. In KS’s Experiment 2, Hide and Seek, children have to talk about what is in a bag under various conditions. Five-year-olds overuse the in the case of several identical objects under conditions similar to the previous experiments.
But when the experimenter asked children “What’s in the bag?” under conditions in which a discourse set wasn’t established, they almost always used an X (not the X) (rates of the were 9 percent, except 1 percent for one group and 3 percent for one other group). Sometimes the younger children omitted the article, but the correct use of a was never less than 80 percent for any of the groups. So children know a, and use it in many situations, despite the fact that they often use the X in the case of a context involving several objects X. This particular situation was a kind of naming situation, but there are also other kinds of situations in which a is used correctly, namely situations in which no context set of X’s has been set up, in what are called de dicto contexts. In Maratsos’s (1976) stories experiment (examining English), children have to say or guess something about a category X. In some particular conditions of the experiment, situations are set up such that no specific example of X has been introduced, but the child has to say that any instance of X is what is being sought. In these situations, a X is the correct way to introduce X. For example, in the looking-for story, it is made clear to the child that a man is looking for a lion or a zebra,
but no particular lion or zebra is introduced (essentially these are de dicto contexts). The man is looking for any old lion or zebra. There is no context set of lions or zebras. The child has to say (guess) what the man is looking for. Two groups of 4-year-olds answered correctly with the indefinite a lion or a zebra 91 percent and 98 percent of the time, respectively. Clearly these children (who overuse the in contexts where there is a salient discourse set of N’s) know and use a appropriately in other contexts. We can conclude, then, that there is a specific problem with the choice of a or the in particular semantic situations; the problem is not a lexical gap in the child.
The Egocentric Theory
Why do children make the a → the error? The traditional explanation for the error is that it is an “egocentric response.” Maratsos (1976) used this term, following Piaget (1955), although Karmiloff-Smith has essentially the same explanation. The idea is that children have trouble taking the listener’s point of view. Because the X is used when the discourse context makes exactly one X salient, this discourse context must be known to both the speaker and the listener. The child, though, might use the so long as the child has one salient referent in mind, ignoring whether the listener does also. Piaget (1955, p. 115, as quoted by Maratsos, p. 11) writes: “. . . the explainer always gave us the impression of talking to himself, without bothering about the other child [to whom he or she was telling a story or explaining mechanical workings.] Very rarely did he succeed in placing himself at the latter’s point of view.” Maratsos (p. 63) writes: “Our analyses have uncovered a developmental stage where egocentric definite responding is quite common. The children fail to take into account that even if they have established for themselves a particular boy or girl, or monkey or pig that does something, that referent is not yet uniquely specified for their listener, and must be introduced to the listener with an indefinite expression.” Similarly, Karmiloff-Smith (p. 72) writes: “It is suggested that . . . the use of definite referential expression [by young children] is in fact deictic. Like the demonstrative, the definite article points to an object under focus of attention and is not yet an exophoric reference to the singleton characteristic of the referent in relation to its current context.” Maratsos’s explanation is essentially pragmatic; the child ignores whether or not the listener knows which object is under discussion.
Karmiloff-Smith’s explanation is similar; so long as there is a single object in the focus of attention of the child, the is used no matter whether that single object is in the “current context,” i.e., is known to the listener.
Wexler and Chien’s explanation of the Principle B effect, as discussed earlier, took the error to be due to the child’s not paying attention to what the listener knew. Avrutin and Wexler (1992) explicitly related this property to the overuse of the and the pragmatic (egocentric) explanation for that use. Vinnitskaya and Wexler (2001) related exactly the same property to overuse of imperfect aspect by children speaking Russian. So the standard view in developmental studies of semantics has come to involve the External Hypothesis, discussed earlier, that all semantic properties (properties of the computational system of language) are known to the child but that external properties (e.g. paying attention to the listener) may not have developed.
The Egocentric Theory Doesn’t Account for All the Data
Thus, although the egocentric view has been standard in both generative-based and non-generative-based developmental psycholinguistics, we may have to awake from a dogmatic slumber. Can the pragmatic view explain all the developmental results on determiners? Let’s try to apply the egocentric theory to Maratsos’s stories experiment, which Karmiloff-Smith replicated with many more participants and age groups. The experimenter tells a child a story, eliciting a response. There were either several boys and girls (in the I (Indefinite) version, Xs → a X) or one boy and one girl (in the D (Definite) version, a X → the X). For example, one of Maratsos’s stories, the Making Noise story, is as follows:
Once there was a lady. She had (I version: lots of girls and boys, about four girls and three boys; D version: a boy and a girl). They were very noisy, and they kept her awake all the time. One night she went to bed. She told them to be very quiet. She said, ‘If anyone makes any noise, they won’t get any breakfast tomorrow.’ Then she went to bed. But do you know what happened? One of them started laughing and giggling. [If the I version was being told, the experimenter said something like ‘Now let’s see, there were four girls and three boys.’] . . . Who was laughing and giggling like that?
The nineteen 4-year-olds were divided into 4-High and 4-Low groups (with roughly the same number of children in each) on the basis of an imitation task. Their mean accuracies are presented in table 2.2. When there was a single boy and a single girl, all the children gave close to 100 percent correct answers, i.e. the X. When there were several boys and girls, the 4-Low group used the X (incorrectly) about 58 percent of the time. This is the classic a → the error; when the is required, by contrast, it is used correctly. We can take the 4-High group to have already developed the adult pattern. Karmiloff-Smith replicated Maratsos’s study (in French), but with a much
Table 2.2
Percent correct (modified from Maratsos 1975).

Age group   a X expected   the X expected
4-Low            42              94
4-High           98              97
Table 2.3
Percent correct (based on Karmiloff-Smith 1979).

            Expected response definite        Expected response indefinite
Age         Definite   Indefinite   Other     Definite   Indefinite   Other
6 years        83          9          8          63          37         0
10 years      100          0          0           0          89        11
larger population (68 children, from 3 to 11 years old) and a large number of items. A sampling of results is presented in table 2.3. In KS’s study, the 6-year-olds give 63 percent definite responses, a much larger percentage of error than Maratsos’s 4-year-old group.4 Nevertheless, both studies agree on the strong overuse of the in the story experiment. Maratsos and Karmiloff-Smith take it that the egocentric theory explains these results. But on inspection, there seems to be a problem for this view. I thank Irene Heim for pointing out (in our seminar on Semantics and Acquisition; also see the talk she gave at the Sixteenth Annual CUNY Conference on Human Sentence Processing) that the egocentric/pragmatic hypothesis doesn’t seem capable of explaining the results. Consider the Making Noise story. The child hears a story about “lots of boys and girls” but isn’t shown the boys or the girls. No discussion of any of the boys or girls takes place; no individual properties are discussed. In short, no individuation of the boys or the girls takes place at all. The child has no boy or girl to concentrate on, or, in KS’s words, to put “in focus.” So when the child answers the boy, there can’t be any boy the child has focused on; at least, there can be no boy that the child can infer from what has been presented. For the Egocentric Theory to explain what the child does by saying that the child is concentrating on a particular boy in the set, the child would have to have invented this boy from whole cloth. While not necessarily impossible, it seems unlikely, and it is at least against the spirit of the explanation that says the child sees or has a concept of a particular boy, one of many, perhaps, but the one the child is concentrating on. This idea might explain, say, Karmiloff-Smith’s playroom data (table 2.1), but not the story experiments (tables 2.2 and 2.3).
The Child’s Problem with Maximality
Let’s go back to square one and rethink what the child might be missing, so that the overuse of the can be understood both when the child has an object in focus and when she doesn’t. It seems natural to think that the error the child makes is to think that the X is to be used if there are any salient elements in the discourse; there doesn’t have to be exactly one. Such an analysis is natural as soon as one looks at the actual semantic-theory definition of the definite determiners, and why the and a are used in particular contexts. The standard (Fregean) analysis of the definite determiner in semantic theory (for now concentrating only on the case of singular NP’s) posits that the use of the N presupposes that there is exactly one N in the relevant context. If there are no N’s in the context, or if there are multiple N’s, the expression is infelicitous. The formal definition of the Fregean analysis in (1) below is taken from (47) of Heim (1991, English translation). The index i refers to the situational context.5
(1) Regardless of the utterance context, [the x] P expresses that proposition which is:
true at an index i, if there is exactly one x at i, and it is P at i;
false at an index i, if there is exactly one x at i, and it is not P at i; and
truth-valueless at an index i, if there isn’t exactly one x at i.
How can a x be used, e.g. a lecturer arrived? The standard analysis of a x is simply existential, as in the formal definition in (2) below, taken from Heim’s (112):
(2) A sentence of the form [a x] P expresses that proposition which is true if there is at least one individual which is both x and P, and false otherwise.
Note that there is no “context” index specified in the definition of a; it is simply existential, i.e. at least one x with property P must exist.
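The contrast between (1) and (2) can be made concrete with a small computational sketch. The Python below is only an illustration under stated assumptions: entities are plain strings, a predicate P is a Boolean function, the index i is represented simply by the list of x's present in the context, and "truth-valueless" is encoded as None. The function names the_adult and a_existential are hypothetical labels introduced here; they do not come from Heim (1991).

```python
from typing import Callable, Optional, Sequence

Entity = str
Predicate = Callable[[Entity], bool]


def the_adult(xs_at_i: Sequence[Entity], P: Predicate) -> Optional[bool]:
    """Sketch of definition (1): [the x] P at a single index i.

    xs_at_i is an assumed stand-in for the x's present at index i.
    None encodes 'truth-valueless' (presupposition failure).
    """
    if len(xs_at_i) != 1:
        return None  # zero x's or several x's: no truth value
    return P(xs_at_i[0])  # exactly one x: true iff that x is P


def a_existential(xs: Sequence[Entity], P: Predicate) -> bool:
    """Sketch of definition (2): [a x] P is true iff some x is P."""
    return any(P(x) for x in xs)


# Hypothetical context with two lecturers, one of whom arrived.
lecturers = ["lecturer_1", "lecturer_2"]


def arrived(e: Entity) -> bool:
    return e == "lecturer_2"


print(the_adult(lecturers, arrived))       # None: uniqueness fails
print(the_adult(["lecturer_2"], arrived))  # True: exactly one lecturer, who arrived
print(a_existential(lecturers, arrived))   # True: at least one lecturer arrived
```

The three-valued return type of the_adult versus the two-valued return type of a_existential mirrors the asymmetry in the definitions: only the definite carries a presupposition whose failure leaves the sentence without a truth value.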
A classic question in semantic theory is “Why can’t a X be used when it is known that there is exactly one x?” For example, (3) (Heim’s (118)) is not acceptable; we have to say the father, not a father. Yet (2) would seem to make (3) perfectly acceptable.
(3) ?I interviewed a father of the victim.
Semantic theory concludes that (3) is true, but that there is a rule of the form (4) (Heim’s (123)).
(4) In utterance situations where the presupposition for [the x] P is already known to be satisfied, it is not permitted to utter [a x] P.
Since it is generally known that a person has only one father, the father of the victim is appropriate (presuppositions are satisfied), so (4) prevents a father from being used. Heim (p. 28) suggests that the general principle from which (4) can be deduced is (5) (her (62)), a pragmatic principle that, Heim argues, is different from Gricean maxims (and different from whatever is responsible for scalar implicature).
(5) Make your contribution presuppose as much as possible.
I’ll call this principle Maximize Presupposition. Thus, the adult use of the definite and indefinite has three parts: the semantic analysis of the (1), the semantic analysis of a (2), and the pragmatic principle Maximize Presupposition (5). So what is the child missing? Crucially, [the x] P has no truth-value (i.e. can’t be used) if there are no x’s at i, or if there are multiple x’s at i. We know from the experiments that children use the X when there are multiple X’s. On the other hand, when there are no X’s in the context (as in the naming experiment of KS, and as in the looking for (de dicto) type story of Maratsos), the child does not use the. We want the child to correctly use a X where there are no X’s in the context. Children seem to know that there must be a context set of X’s in order for the X to be used, but they don’t seem to know that there has to be exactly one X; they seem to think that the X can refer to an X even if there are multiple X’s. Here is the idea for my analysis: Children take the N to presuppose the existence of an N. But they don’t have the uniqueness condition on N. So in Maratsos’s Making Noise story, the lady has lots of boys and lots of girls and one of them cries. The child answers the boy because the child presupposes that there is a boy (at least one) and asserts that a boy is crying.
In other words, the child has a presupposition of existence of an N for the N to be useable, but doesn’t have a presupposition of uniqueness. The major proposal of my analysis is that the child has a different lexical entry for the, which I will call theC (the child’s analysis of adult the) and which is pronounced the for the child.6 A formal definition is given in (6). (6) Regardless of the utterance context, [theC x] P expresses that proposition which is true at an index i, if there is an x at i, and it is P at i;
false at an index i, if there is an x at i, and there is no x such that x is P at i; and
truth-valueless at an index i, if there is no x at i.
This is modeled directly on the definition of the adult analysis of the x in (1), but presupposes only that there exists at least one x at i; the x doesn’t have to be unique. If there is at least one x that is P, the sentence is true. If there is at least one x but no x’s are P, the sentence is false. If there are no x’s, the sentence is truth-valueless. In terms of truth-value, theC X = one of the X’s when there are multiple X’s in the context, and theC X = the X when there are no X’s or exactly one X in the context. The child’s analysis of the definite and indefinite determiner system has three parts. In contrast to the adult analysis in (1, 2, and 5), the child has the semantic analysis of theC (6), the semantic analysis of a (2) and the pragmatic principle Maximize Presupposition (5). The child differs only in the semantic analysis of the; otherwise the analysis is identical to the adult’s; a receives the existential definition and Maximize Presupposition holds. The consequences of this analysis are as follows. When there are multiple X’s in the context, the child will use the X, in accordance with the experiments discussed above. This follows from the definition of theC plus Maximize Presupposition, which induces use of theC instead of a. This is the a → the error. When there are no X’s in the situation, the child will (correctly) use a X, since the presupposition of existence of X for theC is not satisfied. This too is in accordance with experimental results. When there is exactly one X in the situation, the child will (correctly) use theC X, in accordance with the experimental results on singletons.
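The three predictions just listed can be checked mechanically with a short sketch. The Python below is an illustration under explicit assumptions, not a model taken from the literature: the context is reduced to the number of X's it contains, Maximize Presupposition (5) is simplified to "use the whenever the speaker's presupposition for the is satisfied, otherwise use a", and "truth-valueless" is encoded as None. The names the_child, the_presupposition_met, and choose_determiner are hypothetical labels introduced here.

```python
from typing import Callable, Optional, Sequence

Entity = str
Predicate = Callable[[Entity], bool]


def the_child(xs_at_i: Sequence[Entity], P: Predicate) -> Optional[bool]:
    """Sketch of definition (6): [theC x] P at a single index i.

    Presupposes existence only, not uniqueness.
    None encodes 'truth-valueless'.
    """
    if len(xs_at_i) == 0:
        return None  # no x at i: presupposition failure
    return any(P(x) for x in xs_at_i)  # true iff at least one x is P


def the_presupposition_met(n_xs: int, speaker: str) -> bool:
    """Is the presupposition of 'the' satisfied with n_xs X's in the context?"""
    if speaker == "adult":
        return n_xs == 1  # (1): exactly one X
    if speaker == "child":
        return n_xs >= 1  # (6): at least one X
    raise ValueError("speaker must be 'adult' or 'child'")


def choose_determiner(n_xs: int, speaker: str) -> str:
    """Simplified Maximize Presupposition (5): prefer 'the' when licensed."""
    return "the" if the_presupposition_met(n_xs, speaker) else "a"


# Predicted productions for no X's, a singleton, and several identical X's.
for n_xs in (0, 1, 3):
    print(n_xs, choose_determiner(n_xs, "adult"), choose_determiner(n_xs, "child"))
# prints:
# 0 a a
# 1 the the
# 3 a the
```

On these assumptions the two speakers diverge only in the several-X's row, which is the a → the pattern; the sketch produces no the → a row because the child's presupposition is strictly weaker than the adult's.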
If there is no context set at all, and the child is simply expressing a wish, the prediction is that the child will say, e.g., I want a cone (answering a question, such as What would you like to eat?, in which no choices are given or assumed). This is the right empirical result, as in Maratsos’s (1976) looking for story. (The prediction follows from the definition of theC cone, which presupposes at least one cone in the context.) The assumed difficulties with uniqueness capture all the facts I have discussed. In semantic theory, uniqueness generalizes to maximality when plural sets are considered (Heim 1991). Informally, the oranges are large is true if the maximal group of oranges in the context set of oranges (the ones in the situation) has the property that they all are large. The sentence is false if the oranges in this maximal set aren’t all large. And the sentence lacks truth-value if there is no maximal set (e.g. there are no oranges at all in the context). It can be
shown that in the singular case, the property of maximality reduces to existence and uniqueness (see Heim 1991). The natural move for the current developmental theory is to generalize the definition of theC to maximality, as is done in Wexler 2003. This predicts errors by the child on plural uses of the, which can be shown to occur empirically. There are many more results in the empirical literature, including comprehension experiments. In Wexler 2003, I discuss many of these results and show that troubles with maximality underlie these errors in exactly the way we have shown for the experiments discussed in the present paper.

For completeness I should mention that there is at least one other theory in the literature that claims that the a → the error is due to a semantic error rather than to egocentrism. Matthewson and Schaeffer (2000) suggest that children assume that the refers to a specific entity rather than to a definite entity. That is, children do not have a presupposition of existence. Matthewson had earlier argued that Salish contains an article with this property, so Matthewson and Schaeffer suggest that children learning English might mis-set the parameter to the Salish value. For a discussion of predictions made by this hypothesis, see Ionin, Ko, and Wexler (2004), which investigates whether this property might be relevant to second-language acquisition of determiners. Most of the empirical tests appropriate to this hypothesis have not been carried out for children. I don’t see how the hypothesis can explain Maratsos’s stories experiment, but it is worth noting that there might be a variety of hypotheses concerning semantic properties of determiners in children.

A Possible External Hypothesis Analysis
On the theory I proposed in the preceding section, the semantic analysis of the is problematic for the child; the pragmatic principle of Maximize Presupposition must be in place. There is no processing difficulty, for example, with respect to Maximize Presupposition. So the External Hypothesis is wrong, on this analysis. Is there an External Hypothesis alternative? Let us consider one idea in the literature about the child’s processing difficulties that one might use to attempt to explain the empirical results on determiners. Grodzinsky and Reinhart (1993) attempt to modify Wexler and Chien’s (1985) and Chien and Wexler’s (1990) pragmatic explanation (difficulty with the referential principle) of the Apparent Principle B Delay discussed above. They accept the claim of the previous authors that there is a basic split between the syntactic and referential behavior. But instead of the pragmatic difficulty with reference (ability to infer what other minds know; Wexler (1999)), Grodzinsky and
Reinhart claim that there is a processing difficulty with reference. In particular, they claim that the poor referential behavior stems from the difficulty of keeping two representations in mind. When two representations have to be compared (necessary on their account for Principle B contexts), the child breaks down and guesses (see Reinhart, this volume). Call this No Comparisons. Thornton and Wexler (1999) present a discussion of difficulties with this processing account for Principle B effects. Here let us just see what happens when we apply this idea to determiners. Let us accept the standard semantic theory of determiners as embodied in (1), (2), and (5) above. The only aspect of this theory that demands comparison of representations is (5), Maximize Presupposition. Suppose that a should be used; that is, the presuppositions of (2) are satisfied (there are no presuppositions in (2)), and a unique entity doesn’t occur in the context set, so the presuppositions of (1) aren’t satisfied, and thus the can’t be considered. A child who has the adult semantic theory will not make an error on a given No Comparisons, because there is no situation in which two representations have to be compared. Now suppose that there is a unique entity in the context set. This means that the presuppositions of (1) are satisfied and the can be used. But the presuppositions of (2) (none) are also satisfied. So a can be used. The choice between (1) and (2) is determined by Maximize Presupposition (5), which chooses the. But a child subject to No Comparisons will not always be able to correctly compare the two representations (for a and the) to see which has greater presuppositions. Thus, if we apply Grodzinsky and Reinhart’s analysis, the child should guess in these situations and should use a half the time and the half the time.
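The reasoning of this paragraph can be laid out step by step in a short sketch. It is only an illustration under stated assumptions: the function names (`adult_choice`, `no_comparisons_choices`, `possible_errors`) are hypothetical, a context is reduced to the number of X's it contains, and "guessing" is modeled simply as the set of determiners the child might produce.

```python
from typing import Set


def adult_choice(n_entities: int) -> str:
    """Adult target under (1), (2), and (5): 'the' only when X is unique."""
    return "the" if n_entities == 1 else "a"


def no_comparisons_choices(n_entities: int) -> Set[str]:
    """Determiners a child with adult semantics but No Comparisons may produce."""
    the_ok = n_entities == 1  # presupposition of (1): a unique X
    a_ok = True               # (2) carries no presupposition
    candidates = {d for d, ok in (("the", the_ok), ("a", a_ok)) if ok}
    # One candidate: nothing to compare, so no error is possible.
    # Two candidates: Maximize Presupposition would need a comparison,
    # so the child guesses between them.
    return candidates


def possible_errors(n_entities: int) -> Set[str]:
    """Error types (target->produced) predicted for a context of this size."""
    target = adult_choice(n_entities)
    return {f"{target}->{d}" for d in no_comparisons_choices(n_entities) if d != target}
```

Under these assumptions, errors can arise only where the is the target, never where a is the target.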
In other words, applying Grodzinsky and Reinhart’s analysis predicts that children will make no errors in contexts where a is correct, but will make errors in contexts where the is correct. This predicts an asymmetry, with many the → a errors but no a → the errors, exactly the opposite of the pattern that holds. Thus one well-known version of the External Hypothesis for grammatical representations (a particular version of a processing difficulty, No Comparisons) makes the wrong predictions for the determiner case. I don’t know of any other version of the External Hypothesis that might work, but of course any proposals would have to be considered on their empirical merits. How does Maximality develop in children? The most reasonable explanation seems to lie along the lines of brain-based linguistic maturation (Borer and Wexler 1987; Wexler 1999, 2002; Babyonyshev et al. 2001). If it were a question of learning, Borer and Wexler’s Triggering Problem would exist: Why
doesn’t the evidence that allows for triggering at a later age work at the much younger age? Similarly, Babyonyshev et al. (2001) present their Argument [for maturation] from the Abundance of the Stimulus: Since there is so much evidence available to the child, why does it take so long for the child to learn if there is no maturational factor?7 It is possible that maximality itself is a notion that develops maturationally, in which case we would expect any construct that uses maximality to be delayed in the same age range as the. There is some evidence (Chien and Wexler 1990) that children until age 5 will apply every X to a subset of all the instances of X in the situation, rather than limiting its application to the entire set. It is possible that this error is due to an error on maximality.

Do Children Fail to Process Interpretations On-Line? Some Problems with a Multiple-Cue Theory
Trueswell et al. (1999) claim that they have uncovered a special problem in children around age 5 that prevents them from accessing interpretation on-line in processing. They call this the “Kindergarten-path Effect.” Briefly, the argument goes as follows: Sentences like (7) are temporarily ambiguous.

(7) Put the frog on the napkin in the box.

Before in the box is heard and processed, it is possible that on the napkin is the Destination (VP attachment) or on the napkin is a Modifier (N attachment) (frog on the napkin). Of course, once in the box is processed, only the Modifier analysis is possible; the sentence is not ambiguous. Adults (and children) have a famous processing tendency to take the Destination reading at first in (7), therefore garden-pathing. There are many explanations for this in the literature. (For some examples, see Frazier and Fodor 1978; Frazier 1987; Trueswell and Tanenhaus 1994.) Choosing between them is not relevant to the present discussion; we simply assume the best explanation. It is known that in adults the Destination Preference can be suppressed by the referential situation (Tanenhaus et al. 1995). Trueswell et al. replicate the result with an eye-tracking experiment with the following set-up:

One-frog situation: one frog on a napkin, one empty napkin, one cow, one box
Two-frog situation: one frog on a napkin, another frog (not on a napkin), one empty napkin, one box

In the one-frog situation, adults very often garden-path (following Destination preference); their eyes look at the wrong destination (the napkin) very often. In
the two-frog situation, adults garden-path hardly at all; they take the Modifier reading, frog on the napkin. For standard semantic theory, as I described it earlier, the reason that the two-frog situation eliminates the garden path is clear, if we assume only that the participant uses his or her semantic knowledge while responding. Use of the frog, via maximality, presupposes that there is a unique frog in the context set. Taking the relevant N to be frog on the napkin ensures that this presupposition of maximality (uniqueness) is met, since there is only one frog on a napkin. If instead, the Destination reading were taken, and the relevant N was frog, since there are two frogs in the situation, the presupposition of uniqueness for the is not satisfied. In order to interpret the sentence in accordance with their semantic knowledge (concerning the), adults overcome their processing tendency to take the Destination reading. In the one-frog situation, on the other hand, the presupposition of maximality (uniqueness) is satisfied even if the N is frog, unmodified by on the napkin. Thus the natural processing tendency leads to the Destination reading, a garden path. Clearly, semantic properties (the definition of the) are computed on-line and interact with processing tendencies to produce adults’ interpretations and parsing patterns. Trueswell et al. show that 5-year-olds, on the other hand, garden-path in both the one-frog and two-frog situations. Varying the referential context (number of frogs) does not influence their parsing decisions with respect to garden-pathing; this is seen in both eye-tracking results and behavioral (i.e. how they obey the command in (7)) results. Trueswell et al. do not provide an explanation for why children are different, but they seem to suggest that it has something to do with verb biases used in on-line processing. But verb biases do not explain why children garden-path in the two-frog situation.
The question is why children don’t use the relevant semantic information provided by the and the context (two frogs) to determine that there is likely a modifier coming after frog. Trueswell et al. suggest that perhaps children can’t consider two alternative syntactic analyses, and thus they simply use the probabilistic cue associated with the likely Destination phrase. But this would be a really major assumption about the processing abilities of a child — that they can’t consider two alternatives. The theory, if believed seriously, would be likely to make major mispredictions about the ability of a child, though there is no room to discuss this here. Moreover, the conceptual idea is hardly well grounded in the processing theory that the authors advocate, the integration of “multiple cues.” What does it mean to not be able to consider alternatives in such a theory? Presumably only one cue could
be considered at a time by the child. If this idea could be worked out, it seems likely that it would lead to a total breakdown of the child’s ability to comprehend a sentence. Another potential explanation that Trueswell et al. seem to be offering is that children don’t make use of referential or semantic properties in the course of processing. This is extremely unlikely to be true, given many experiments in which children of the relevant age interpret sentences using contextual information. For example, Wexler and Chien (1985) showed that children at age 5, when presented with two pictures, one of Cinderella’s sister washing herself and one of Cinderella’s sister washing Cinderella, and told Cinderella’s sister is washing herself, would select the correct picture (the one in which the sister is washing herself) more than 90 percent of the time. The point of the experiment was to test whether children know the property that a reflexive must be c-commanded by its antecedent. But the children could attain success only if they not only knew the appropriate syntactic property but also understood and took account of context. If children didn’t take account of context in parsing the sentence, they couldn’t appropriately match the sentence to the picture. There are hundreds of experiments in the literature, on many topics, that would make the same point: that children pay attention to context. Trueswell et al., working within an undifferentiated statistical “cue” model, attribute the difference between adults and children to many different properties, but not to the child’s knowledge or use of the. They (at least implicitly, to some extent explicitly) assume that children know all the relevant properties of the and behave perfectly on the in off-line experiments. But of course, the classic developmental literature shows that children make many errors on the, as I have briefly summarized. To their credit, Trueswell et al.
(this volume) do see (in response to this paper) that knowledge of the is centrally involved, and suggest that children have difficulty learning the properties of the because of unclear input. (See note 6 to the present paper for a comment on this suggestion.) Consider another attempt to explain the effect: Hurewitz et al. (2000, p. 263) suggest that the child first makes an attempt to go with the most “reliable” cue (the preference of put for taking a Destination) and “considers this cue first . . . He never returns from the end of the garden path, because of the inability to maintain alternative analyses over an extended period of time.” It isn’t clear what isn’t reliable about the uniqueness property of the. The input to the child is quite clear on the fact that the X presupposes a unique X in the context set (where context set is partially determined by internal goals and purposes). So it isn’t clear that there is an objective relevant sense in which the preference of put for taking a Destination is any more “reliable” than the
uniqueness of the entity named by the X (singular). Furthermore, since adults use the uniqueness property of the (which they do, in the experimental results), this must be a reliable “cue.” Perhaps what Trueswell et al. mean is not that the cue isn’t “reliable,” but that it isn’t “accessible.” Perhaps they think that children have difficulties with referential/interpretive properties of certain kinds, but not with syntactic properties. If so, this would be getting close to the “egocentric” theory that the kindergarten-path papers have ignored. Basically, no explanation along these lines seems to me to be possible because of what I see as the fundamental difficulty of the underlying model, the multiple-cue model. What the authors in this tradition (in particular the authors of the frog experiments) argue is that the “reliability” of a cue is the determining factor in whether it is “learned” early. Very reliable cues are learned very early, and less reliable cues are learned later. I know of no evidence for this fundamental assumption in the developmental literature, and it isn’t clear that a definition of “cue” that is empirically valid could be created in order to test the notion. To illustrate in just one relevant domain, let us try to predict the result on de dicto contexts in Maratsos’s stories experiment. In input to the child, does the object exist in the context or not? The answer is “sometimes.” Consider the de dicto verb look for. Suppose we say I’m looking for a pretty gift. This might be used when there are no gifts around (just telling somebody about the situation). Or it might be used when there are several objects which are potential gifts in a store, and the speaker is looking at them. (Though of course the speaker isn’t referring to one of these objects. But that doesn’t matter from the standpoint of cue theory. So far as I understand, cue theory is concerned only with what is in the environment.)
So from the standpoint of cue theory, look for doesn’t provide a reliable cue concerning the existence of the referent of the direct object. Sometimes the referent object is there, sometimes it isn’t. The prediction would then follow that the ability to use look for as a de dicto verb would have to be quite late in children. How could they learn this from a non-reliable cue? (The analogy is to the “explanation” that the properties of the are late because of the “unreliability” of the relevant cues.) But recall that Maratsos (1976), in the looking-for story, showed that when it was made clear to 4-year-olds that a man was looking for a lion or a zebra, but no particular lion or zebra was introduced, the children answered correctly with the indefinite a lion or a zebra 91 percent to 98 percent of the time (two groups). Clearly these children had learned the “cue,” in the sense of this theory, despite the fact that the cue isn’t “reliable” in the relevant sense.
Other examples could be constructed at will by anybody familiar with the developmental language acquisition literature. The knowledge of the child is a complex internal system, not a system of learned “cues.” There is learning to be sure: parameter setting, learning of lexical properties, learning how to map phonetic forms of words to syntactic and semantic properties. But there is no reliable sense of “cue” in the external world that can be used to construct a theory along the lines of multiple-cue theory. Such a theory can make no predictions; it is in the position of Utility Theory in the social sciences, which allows one to always say that there must be some utility that is being maximized (yet no theory of utility is given). When a child learns something, multiple-cue theory will say “Well, that’s a reliable cue.” And if the child doesn’t learn something, it will say “Well, that’s not a reliable cue.” There is no theory of cues that has an empirical testing domain in the child. In my opinion, no such theory can be given. The kindergarten-path effect follows from the theory in this paper, with no recourse to special difficulties in child processing. If children are missing the presupposition of maximality from their analysis of the (or if they can’t compute maximality), then in the two-frog situation, the child can take the frog to refer to one of the frogs (either of the frogs). Thus there is no semantic reason to prefer the modifier analysis, no presupposition to be satisfied by taking a particular analysis. The only presupposition of the frog for the child is that there be at least one frog, and this is true in both the one-frog situation and the two-frog situation. Thus there is no reason for the child to prefer the modifier analysis in the two-frog situation. The natural tendency for the child (as for the adult) to prefer the Destination analysis wins out. 
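The account of the kindergarten-path effect just given can be made concrete in a brief sketch. Everything in it is an assumption made for illustration: the function names (`presupposition_met`, `reading_for_the_frog`), the encoding of a scene as a list of booleans (one per frog, true if that frog is on a napkin), and the simplification that the Destination reading is kept whenever the presupposition of the frog is satisfied with the bare noun frog.

```python
from typing import List, Optional


def presupposition_met(n_matching: int, child: bool) -> bool:
    """Adult 'the N' needs exactly one N; child theC needs at least one N."""
    return n_matching >= 1 if child else n_matching == 1


def reading_for_the_frog(frogs_on_napkin: List[bool], child: bool) -> Optional[str]:
    """Reading of 'on the napkin' in 'Put the frog on the napkin ...'."""
    # Default parsing preference: take N to be bare 'frog' (Destination reading).
    if presupposition_met(len(frogs_on_napkin), child):
        return "Destination"
    # Otherwise try N = 'frog on the napkin' (Modifier reading).
    if presupposition_met(sum(frogs_on_napkin), child):
        return "Modifier"
    return None  # neither choice of N satisfies the presupposition


one_frog = [True]          # one frog, sitting on a napkin
two_frog = [True, False]   # one frog on a napkin, one frog not on a napkin
```

On these assumptions only the adult entry in the two-frog scene yields the Modifier reading; the child entry yields the Destination reading in both scenes, with no difference in processing between child and adult.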
None of the papers on the kindergarten-path effect refer to the rather extensive and useful experimental literature on the child’s competence concerning the,8 and for the most part the papers don’t even recognize that the difficult issue for the child concerns the. This brings us back to the point made at the beginning of this paper. The study of natural language processing in adults starts from a competence model; the competence model is derived from linguistic theory. It is this competence model that allows processing studies to proceed. We also need a competence model for children before we can start processing studies. Research on language acquisition has provided, in many areas, this competence model. Work on processing in children can proceed only if it starts from this competence model. In conclusion, with respect to kindergarten-path studies, children process language just like adults; the large differences in behavior follow from one small difference in analysis of the determiner, a difference for which there is extensive evidence. The processing studies are quite valuable; they confirm, using
on-line methods (eye tracking), that children have essentially the same processing strategies or abilities as adults (e.g. whatever causes the garden path), and that they have the difficulties with the computation of the properties of the (maximality, as I have claimed) that have been discovered in off-line studies. It is very nice to have such an integration of results from different methods.

The Revised Kindergarten-Path Explanation of Trueswell, Papafragou, and Choi
Elsewhere in this volume, Trueswell, Papafragou, and Choi (henceforth TPC) respond to my paper and modify their theory accordingly. The editors asked me to take their paper into consideration in preparing the final version of my paper. In this section I consider the modified ideas presented by TPC. My paper is an investigation of the child’s knowledge of and use of determiners. I pointed out that “none of the papers on the kindergarten-path effect refer to the rather extensive and useful experimental literature on the child’s competence concerning the, and for the most part the papers don’t even recognize that the difficult issue for the child concerns the.” TPC accept this point and refer extensively to the developmental literature on the. Basically, the authors now invoke the “egocentric” hypothesis as part of the explanation for the kindergarten-path effect, which, as I noted, is the standard explanation for overuse of the. Trueswell and his colleagues had never referred to these studies or ideas in their earlier papers on the topic, attempting to explain the results as some kind of cue-learning, and assuming that children had no problems with the. Thus the TPC paper is an advance over earlier versions that Trueswell and his colleagues presented, bringing the kindergarten-path effect into line as simply one more instance of the child’s pattern of errors that I called the a → the error. As I have argued, there is nothing special about the kindergarten-path effect; one can simply assume that children have the same processing abilities as adults (that is, they produce the garden-path effect for the same reasons as adults do), but they have difficulties with the. That was the main thrust of my discussion of Trueswell’s studies in the preceding section, and to their credit TPC have basically accepted the point. In my view, the point provides a major new (really old) perspective on the kindergarten-path effect, so it is good to have agreement.
However, TPC go on to make other points, and these are important to comment on. My paper presented a new theory of children’s difficulties with determiners: troubles with maximality. I pointed out that egocentricity was the standard idea, that I had often used the idea, and that I had attempted in several papers to put it together with many other phenomena in the development of language (for example, problems with certain kinds of binding situations).
However, I argued that there were phenomena (especially as reflected in Heim’s observation) that it didn’t seem possible for egocentrism to handle, but that the new theory could handle. My main motivation was to attempt to find a theory that could handle the phenomena. It is possible that egocentrism and troubles with maximality are both true, although of course it would be a more elegant theory if only one of them had to be invoked. TPC concentrate on trying to show that the Maximality Troubles (let’s call it MT) theory that I suggest is wrong. It is important to realize that even if that is true, it doesn’t vitiate my basic argument about Trueswell’s earlier attempts to explain the kindergarten-path effect. But TPC have conceded on that major point, so now we need only discuss the interesting empirical issue: Is egocentrism or MT more correct? TPC attempt to vindicate the “pragmatic” account against my “semantic” account. In this regard, it is worth observing that the brunt of my argument in the preceding section was that Trueswell had missed the pragmatic account, so we are now proceeding from a completely different space of ideas than Trueswell’s earlier papers, one starting off from the conclusions in my paper. But let’s see what the particular arguments are. TPC’s first and major point is logically false. They say that “Wexler’s semantic account appears to miss intuitions about referential errors. . . . On his account, in a referential domain of several boys (e.g. one crying, one laughing, one sleeping) the sentence The boy is crying is false, rather than simply infelicitous. That is, if there are multiple boys in the domain of reference (and the other boys are not crying), the predicate is false because it is not true of all members of the set boy.
However, the present authors all have the strong intuition that the utterance in this situation is true but infelicitous; that is, it should have been said in a different way.” Their claim, that my account predicts that the sentence is false, is itself false. “My” account of the determiner the is the standard Fregean account; I gave Heim’s formalization of this in (1). The Fregean theory has three values of truthfulness: true, false, and truth-valueless. The standard interpretation of truth-valueless in semantic theory is that such a value for an utterance results in an “infelicity.” (1) says that false is not the value in the current situation, since to derive false there has to be exactly one boy at index i, that is, exactly one boy in the current situation. Since there is more than one boy, false is not the value of the sentence the boy is crying. In fact, (1) predicts quite straightforwardly that in this situation the boy is crying is truth-valueless. Therefore the sentence results in an infelicity.9 This account of the semantics of the has been well known for more than 100 years, ever since Russell’s famous the present king of France is bald
example. Russell considered this sentence false, but Frege argued that it had no truth-value, introducing a crucial concept to semantic theory. Russell’s example involves what we would now call the presupposition of existence (not satisfied since there is no present king of France). TPC’s example involves the presupposition of uniqueness. I explicitly invoked the Fregean (not the Russellian) theory, so there is no way the theory would predict false for this sentence in this situation. The main point of my small comments on the Trueswell studies was that they didn’t take into account a large body of developmental psycholinguistics. Taking this work into account, I claimed, one could well maintain that there was nothing special about the Trueswell processing results; they would simply follow from the child’s difficulties with the plus the child having the same processing abilities as the adult. Nothing had to be said about statistics of locative phrases following put, etc., at least nothing different than what was said for the adult. So I offered a quite parsimonious account, which seems to stand up well. As I pointed out, TPC have accepted the relevance of this literature. I now have one other suggestion, something I didn’t think I would have to argue for. In addition to taking into account a large experimental and analytic literature on psycholinguistic development, it would be very useful to take account of standard semantic theory with regard to the constructions under development. Never in the earlier papers did Trueswell and his colleagues seem to talk about the properties of the or their development. Now that they take the developmental properties to be relevant, it would be a further advance to incorporate the adult properties of the. From a semantic point of view, the Fregean theory does this. If TPC were to accept such a theory (or give us a well-reasoned alternative), the discussion could be carried out on a much surer footing.
In view of TPC’s misunderstanding of standard semantic theory, it is difficult to know what to make of their further points about my theory. Their second point is that there seem to be pragmatic issues involved in the use and development of the. I didn’t deny that; in fact, I pointed it out and mentioned that Trueswell and colleagues weren’t appreciating them. I gave an empirical argument (Heim’s observation) in favor of the troubles with Maximality theory. I believe that theory does account for the empirical literature that I know about (quite a large literature). Furthermore, it is possible that pragmatic difficulties also have to be added to the semantic problem. I will only make two points here. First, TPC have ignored much of the relevant literature, for example the de dicto experiments that I discussed above. Second, I don’t have to be reminded of the importance of the development of pragmatic issues of just this kind. It is a standard theme in my research (see the title and discussion from
Chien and Wexler 1990). Avrutin and Wexler (1992) tied together the standard empirical results on determiners (Karmiloff-Smith’s especially) and the egocentric analysis with results on the Delay of Principle B. Wexler 1999 and Thornton and Wexler 1999 contain extensive, detailed discussions of the relevant pragmatic issues. I am writing in the spirit of seeing if there is an alternative to the standard egocentric theory, since there appear to be some empirical difficulties with that hypothesis. I am not sure what TPC’s primary goal is. Is it to say that there is something special about the kindergarten-path effect, in addition to the standard understanding in the literature, a claim that I deny? Is it to defend a theory? The multiple-cue theory? It is an advance that they recognize that children have some difficulties other than those related to the probability that a locative phrase follows put, but I am not sure what else they intend to say, other than to accept the fact that there is pragmatics as well as semantics. TPC’s third point is that the data in one of the two studies I referred to (Maratsos 1976) concerning Heim’s observation weren’t unequivocal. I’ll leave it up to others to read the studies. TPC report the response of two out of four adults on a translation of Karmiloff-Smith’s (1979) materials. Six 10-year-olds produced 0 percent definite articles in the experiment on stories in which there were many X’s (Karmiloff-Smith 1979, table 27, p. 144). Eight 9-year-olds produced 14 percent definite articles. But children of ages 4 to 7 (eight to twelve children in each group) produced between 48 percent and 63 percent definite articles. There is a huge “error” in the younger children, going down to zero in the older children. The fact that two of four adults in anecdotal reporting asked a question like “I don’t know, Mary” doesn’t mean anything other than that they guessed when they thought more information was required.
It has nothing to do with the fact that young children (but not older children, nor, presumably, adults) will use the. This returns us to my main criticism, in the preceding section, of the work by Trueswell et al. on the kindergarten-path effect: that it ignores the experimental developmental literature in favor of preformed ideas. There is a large literature on this topic. TPC’s fourth point is about an experiment by Papafragou and Tantolou (a work not available to me at the time of writing) that claims to show (as I understand it from TPC) that children know that the oranges must refer to the entire set of oranges. Karmiloff-Smith (1979) reports an experiment that shows that isn’t so, that children of this age (5 years) in fact will refer to a subset of the entire set of oranges in the situation as the oranges. It will be interesting to see what the difference between the two experiments is. My first thought is that the Papafragou and Tantolou result is due to the child’s being told by the elephant “I ate some.” To understand the result, we need a precise theory of the semantics
of some and of the relevant pragmatics (including scalar implicature and/or Maximize Presupposition).
I don’t understand TPC’s fifth and final point. The theory I suggested was that 5-year-olds have difficulties with the (of the type that are standardly understood) plus the same processing capacities (at least the relevant ones) as the adult. This yields the results of the kindergarten-path experiments so far as I can tell. I didn’t claim (quite the contrary) that the children have no processing limitations. I suggested that they could be taken to be whatever the limitations were in the adult, whatever caused the garden path. TPC mistakenly say that my claim that children have difficulties with the is related to the results presented in Chien and Wexler 1990. I do not know what to make of this. In that paper, we claimed that children had pragmatic difficulty with reference of a particular kind involving intensions of participants, the type of difficulty that was familiar from the egocentric theory of children’s determiner errors. There was nothing in Chien and Wexler about children’s lack of full semantic knowledge of the.
I’ll lay out my bias here. I find the “multiple cue” idea either empty or false, depending on how it is interpreted. On the one hand it is a truism; how else can a sentence be analyzed and interpreted than by integrating lots of different kinds of information? But if it is interpreted, as I fear Trueswell and his colleagues interpret it, to mean that there is nothing in the mind except a set of undifferentiated probabilistic cues, dictated by the “environment,” then I find it false on its face. To be blunt, I don’t understand what is claimed to be the “complicated” theory in TPC.
To be concrete, in TPC’s own words, their idea is that “there exist multiple probabilistic sources of evidence for constraining the grammatical structure of an utterance, and the child must discover, weigh, and combine this evidence.” Under one interpretation, as discussed, this is a truism, not a theory. Under another interpretation, if the statement is taken to mean, as I fear the authors intend, that there are no structural conditions, no well-defined syntax and semantics, no genetically programmed properties that the child is looking for, then the statement is false and they provide no reasons to revise this assessment. I hope TPC will think about the following: How could their theory be shown to be false? What kind of result would show that? I would like to stress that I find the kindergarten-path experiment lovely and useful; it confirms what we know about the development of determiners (the overuse of the in certain types of contexts, although so far it hasn’t helped us in differentiating accounts of that phenomenon), and it offers evidence that 5-year-olds have the same type of processing capacity and limitations as do adults. I welcome the results. But statements about uniqueness and specialness
of the results aren’t optimum when instead the results can fit into a broader and more experimentally supported set of ideas. We need understandable precise theories; this is what at least some of the developmental psycholinguistic literature has attempted to provide. If the kindergarten-path literature joins this attempt at clear analysis, and integration with a broader literature, it might be able to contribute substantially to our understanding. I look forward to such an outcome.
Notes
1. “Learning” explanations are unlikely, since the empirical results are very strong and clear: children learn properties that vary across languages almost always right from the start (e.g., Wexler 1998). The kinds of syntactic properties that develop late seem to be part of UG, not variable properties. We’ll try to study semantic development in similar terms.
2. Grodzinsky and Reinhart (1993) follow in essence the development of Chien and Wexler, with a revision concerning the details of the pragmatics, and claim that the child difficulties are due to processing/memory overload in comparing representations rather than to pragmatics. Thornton and Wexler (1999) argue that this processing account can’t explain the empirical results. Wexler (2003) shows that this processing account mispredicts the empirical results concerning determiners that will be the central issue studied in this paper. But see Reinhart (this volume) for an alternative perspective.
3. Here I’m only discussing singular nouns. I’ll briefly discuss plurals later.
4. KS provides a discussion of why she gets somewhat more such responses than Maratsos. KS’s study seems more reliable, since it is much larger. Also, Maratsos simply didn’t test older ages. It might be worth noting that Maratsos’s participants were recruited from ads in such publications as the Harvard Crimson, suggesting that the academic family nature of his participants might have resulted in a more advanced verbal profile than the average population.
5. The situational index i refers to the world in which the utterance is evaluated; for example, if that world is one in which the singleton pig that is being talked about is fat, the pig is fat is true at i; if the single pig in the world is not fat, the sentence is false.
6. An alternative is that the child has the same lexical entry as the adult, but has trouble computing the uniqueness (maximality more generally) requirement specified in the presupposition of the in (1). This could account for why the child has poor but not completely reversed behavior from the adult in contexts in which uniqueness doesn’t hold, e.g. table 1, in which the 6-year-olds provide 56 percent (incorrect) use of the in the several identical objects condition rather than 100 percent. (I am assuming that the 56 percent figure means that each child has a figure quite different from 0 percent or 100 percent, which might not be the case. I’m not sure if enough individual subject analysis has been done to confirm this versus the alternative that each individual child is close to 0 percent or 100 percent.) The lexical entry theory given in the text (6) could also account for the alternating behavior under the assumption that uniqueness (more generally maximality) is a weak/variable property for the child. There is no space here to consider what empirical measures could distinguish the two alternatives. It is important to note that either alternative relies on the assumption that the difficulty has to do with uniqueness (more generally, maximality), not with egocentrism or with other semantic conditions or with the computation of Maximize Presupposition.
7. Trueswell et al. (this volume) suggest that the evidence for uniqueness for the is unclear, since the X can be used, for example, when there are two X’s, but one is much more accessible (e.g. much closer to the hearer). But their argument assumes that the child doesn’t know that context situations are determined by internal states, purposes and goals. If the child does know this, then uniqueness is apparent in the input. If the child doesn’t know it, then the question is why not? And we would need a maturational explanation to explain how that knowledge is attained. I know of no reason to believe that children think that context sets are whatever is available in the visual input, rather than being partially determined by goals and purposes.
8. To their credit, Trueswell et al. (this volume), responding to this paper, are an exception. They add the egocentric hypothesis that I reviewed earlier to their set of factors that help to cause the kindergarten-path effect, referring to the experimental literature. See the next section for a discussion of this paper.
9. When the authors claim that their intuitions are that the sentence is “true but infelicitous,” they refer only to their intuition; they don’t mention any semantic theory to support this intuition nor say what the analysis would be. The standard Fregean theory gives an explicit analysis of the intuition. There is presupposition failure. Perhaps TPC want to somehow return to a kind of pre-Fregean Russellian analysis (see the next paragraph in the text for what Russell said), but they give no details of what they have in mind.
References
Avrutin, S., and Wexler, K. 1992. Development of principle B in Russian: Coindexation at LF and coreference. Language Acquisition 2, 259–306.
Babyonyshev, M., Ganger, J., Pesetsky, D., and Wexler, K. 2001. The maturation of grammatical principles: Evidence from Russian unaccusatives. Linguistic Inquiry 32, 1–44.
Borer, H., and Wexler, K. 1987. The maturation of syntax. In T. Roeper and E. Williams (eds.), Parameters in Language Acquisition. Reidel.
Borer, H., and Wexler, K. 1992. Bi-unique relations and the maturation of grammatical principles. Natural Language and Linguistic Theory 10, 147–189.
Chien, Y. C., and Wexler, K. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1, 225–295.
Crain, S., and Thornton, R. 1998. Investigations in Universal Grammar: A Guide to Experiments in the Acquisition of Syntax and Semantics. MIT Press.
Frazier, L. 1987. Sentence processing: A tutorial review. In M. Coltheart (ed.), Attention and Performance XII. Erlbaum.
Frazier, L., and Fodor, J. D. 1978. The sausage machine: A new two-stage parsing model. Cognition 6, 291–325.
Grodzinsky, Y., and Reinhart, T. 1993. The innateness of binding and coreference. Linguistic Inquiry 24, 69–101.
Guasti, T. 2002. Language Acquisition. MIT Press.
Heim, I. 1991. Articles and definiteness. In A. Stechow and D. Wunderlich (eds.), Handbook of Semantics: An International Handbook of Contemporary Research. Walter de Gruyter.
Hurewitz, F., Brown-Schmidt, S., Thorpe, K., Gleitman, L., and Trueswell, J. 2000. One frog, two frog, red frog, blue frog: Factors affecting children’s syntactic choices in production and comprehension. Journal of Psycholinguistic Research 29, 597–626.
Ionin, T., Ko, H., and Wexler, K. 2004. Article semantics in L2 acquisition: The role of specificity. Language Acquisition 12, 3–69.
Karmiloff-Smith, A. 1979. A Functional Approach to Child Language. Cambridge University Press.
Maratsos, M. 1976. The Uses of Definite and Indefinite Reference in Young Children: An Experimental Study of Semantic Acquisition. Cambridge University Press.
Matthewson, L., and Schaeffer, J. 2000. Grammar and pragmatics in the acquisition of article systems. In J. Gilkerson, M. Becker, and N. Hyams (eds.), UCLA Working Papers in Linguistics: Language Development and Breakdown.
Piaget, J. 1955. The Language and Thought of the Child. Meridian Books.
Rizzi, L. 1994. Some notes on linguistic theory and language development: The case of root infinitives. Language Acquisition 3, 371–393.
Schütze, C., and Wexler, K. 1996. Subject case licensing and English root infinitives. In A. Stringfellow, D. Cahana-Amitay, E. Hughes, and A. Zukowski (eds.), Proceedings of the 20th Annual Boston University Conference on Language Development. Cascadilla.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634.
Thornton, R., and Wexler, K. 1999. Principle B, VP Ellipsis, and Interpretation in Child Grammar. MIT Press.
Trueswell, J. C., and Tanenhaus, M. K. 1994. Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In C. Clifton, K. Rayner, and L. Frazier (eds.), Perspectives on Sentence Processing. Erlbaum.
Trueswell, J. C., Sekerina, I., Hill, N. M., and Logrip, M. L. 1999. The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition 73, 89–134.
Vinnitskaya, I., and Wexler, K. 2001. The role of pragmatics in the development of Russian aspect. First Language 21, 143–186.
Warden, D. A. 1974. An Experimental Investigation into the Child’s Developing Use of Definite and Indefinite Referential Speech. Ph.D. thesis, University of London.
Warden, D. A. 1976. The influence of context on children’s use of identifying expressions and references. British Journal of Psychology 67, 101–112.
Wexler, K. 1990. Optional infinitives, head movement and the economy of derivation in child grammar. Paper presented at annual meeting of Society of Cognitive Science.
Wexler, K. 1993. Optional infinitives, head movement and the economy of derivations. In D. Lightfoot and N. Hornstein (eds.), Verb Movement. Cambridge University Press.
Wexler, K. 1998. Very early parameter-setting and the Unique Checking Constraint: A new explanation of the Optional Infinitive stage. Lingua 106, 23–79.
Wexler, K. 1999. Maturation and growth of grammar. In W. C. Ritchie and T. K. Bhatia (eds.), Handbook of Language Acquisition. Academic Press.
Wexler, K. 2002. Lenneberg’s Dream: Learning, Normal Language Development and Specific Language Impairment. In J. Schaeffer and Y. Levy (eds.), Towards a Definition of Specific Language Impairment. Erlbaum.
Wexler, K. 2003. Maximal Trouble. Talk given at Maryland Mayfest: Semantics and Acquisition.
Wexler, K. 2004. Theory of phasal development: Perfection in child grammar. MIT Working Papers in Linguistics 48, 159–209.
Wexler, K., and Chien, Y. C. 1985. The development of Lexical Anaphors. Papers and Reports on Child Language Development 24, 138–149.
3
Children’s Use of Context in Ambiguity Resolution
Luisa Meroni and Stephen Crain
Many experimental investigations of human sentence processing have shown that listeners do not wait until they reach the end of a sentence before they begin to compute an interpretation. Rather, listeners incrementally make commitments to an interpretation as the linguistic input unfolds in real time. A consequence of this feature of sentence comprehension is that it sometimes gives rise to so-called garden-path effects. In the presence of a temporary ambiguity, listeners may assign an interpretation that later turns out to be unworkable and must, therefore, be abandoned in favor of an alternative interpretation.
Various explanations have been proposed to account for the garden-path effects that have been documented in certain experimental contexts (e.g. Frazier and Rayner 1982; Trueswell and Tanenhaus 1994; MacDonald 1994). One line of research has claimed, however, that the referential contexts in which sentences ordinarily appear often mitigate, or even eliminate, garden-path effects. This is the Referential Theory proposed by Crain and Steedman (1985) and extended by Altmann and Steedman (1988). According to the Referential Theory, listeners experience garden-path effects primarily when sentences are interpreted outside any referential context or in infelicitous contexts. If the Referential Theory is correct, garden-path effects are largely experimental artifacts.
Recent work by Trueswell, Sekerina, Hill, and Logrip (1999) suggests that children, in resolving temporary syntactic ambiguities, may not be as sensitive to features of the referential context as adults are. If so, children would be expected to experience garden-path effects that are not experienced by adults. This finding would represent a serious setback to the Referential Theory, as it pertains to children; it would also be a setback to the Continuity Assumption, which maintains that children and adults engage the same cognitive mechanisms in language processing (see, e.g., Pinker 1984).
In this paper we re-examine the conclusion that children suffer from garden-path effects to a greater extent than adults do in (on-line) sentence processing.
We offer an alternative account of children’s non-adult responses in previous research, and we provide experimental evidence for the conclusion that children actively use the referential context.
The Role of Context in Resolving Ambiguity
A central question in recent studies of on-line sentence processing is whether or not children rely on the same parsing strategies as adults do in resolving temporary syntactic ambiguities. In particular, several studies have focused on the ability of children and adults to resolve ambiguities involving two prepositional phrases (PPs) in sentences like (1) (see Trueswell and Gleitman 2004).
(1) Put the frog on the napkin into the box.
Upon encountering the first PP ‘on the napkin,’ the listener is not sure whether additional linguistic material will follow. A temporary ambiguity arises at this point, because the PP ‘on the napkin’ can be attached ‘high’ to the verb, specifying the destination of the action (i.e., where to move the frog), or can be attached ‘low’ to the noun phrase, indicating which frog to move (i.e., the frog that is on the napkin). Previous research in adult sentence processing has demonstrated that, in the absence of context, the psychological parser prefers high attachment, which yields the interpretation in which ‘on the napkin’ is the destination of the action. This conclusion is supported by evidence of telltale signs of garden-path effects at the second PP ‘into the box’ (Ferreira and Clifton 1984; Britt 1994).
Different explanations have been proposed for the source of this attachment preference outside of context.1 One possibility, which we endorse, is that the VP-attachment preference follows from a parsing principle that instructs the parser to assign the theta roles associated with a verb as soon as possible, all other things being equal. Let us call this the Theta Assignment Principle (e.g. Gibson 1991; Gorrell 1995; Weinberg 1992). Adherence to the Theta Assignment Principle creates the garden-path effects witnessed in sentences like (1), according to the following scenario.
The English verb put assigns two theta roles to its internal arguments: a Theme role and a Destination role.2 These theta roles are usually realized by a noun phrase and a prepositional phrase, respectively. In accordance with the Theta Assignment Principle, the parser immediately assigns the Theme theta role to the NP ‘the frog’ and assigns the Destination theta role to the PP ‘on the napkin’. These theta roles are assigned before the parser encounters the second PP, ‘into the box’. The theta roles associated with the verb having been discharged, the subsequent appearance of the PP ‘into the box’ informs the parser that it has been led down a garden path.
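As an illustration only, the scenario just described can be rendered as a small Python sketch. The names PUT_ROLES, ParseState, and process are hypothetical and belong to no published parsing model. The sketch assumes that phrases arrive one at a time, already labeled NP or PP, and that each is given the next open theta role of put as soon as it arrives; the reanalysis step is included only schematically.

```python
from dataclasses import dataclass, field

# Theta roles that 'put' assigns to its internal arguments, in order,
# paired with the phrase type that usually realizes each role.
PUT_ROLES = [("Theme", "NP"), ("Destination", "PP")]


@dataclass
class ParseState:
    # Roles still waiting to be assigned (hypothetical bookkeeping).
    pending: list = field(default_factory=lambda: list(PUT_ROLES))
    # Roles already discharged, mapped to the phrase that received them.
    assigned: dict = field(default_factory=dict)
    # Set to True when a phrase arrives after all roles are discharged.
    garden_path: bool = False


def process(phrases):
    """Assign each incoming phrase the next open theta role as early as possible."""
    state = ParseState()
    for kind, text in phrases:
        if state.pending and state.pending[0][1] == kind:
            # Theta Assignment Principle: discharge the role immediately.
            role, _ = state.pending.pop(0)
            state.assigned[role] = text
        elif kind == "PP" and "Destination" in state.assigned:
            # All roles are discharged, yet another PP arrives: garden path.
            state.garden_path = True
            # Schematic reanalysis: the first PP becomes a modifier of the NP,
            # and the new PP takes over the Destination role.
            first_pp = state.assigned["Destination"]
            state.assigned["Theme"] = f"{state.assigned['Theme']} {first_pp}"
            state.assigned["Destination"] = text
    return state


if __name__ == "__main__":
    result = process([("NP", "the frog"),
                      ("PP", "on the napkin"),
                      ("PP", "into the box")])
    print(result.garden_path, result.assigned)
```

The sketch deliberately ignores referential context; it models only the context-free preference for early role assignment.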
Figure 3.1
The one-referent context.
In response, the parser attempts to revise its initial decision about the Destination theta role. The necessary revisions include a reanalysis of the first PP, ‘on the napkin’, as a modifier of the NP ‘the frog.’3,4
The effects of the Theta Assignment Principle can also be observed when sentences are presented in certain referential contexts. One such context is depicted in figure 3.1, where there is one frog on a napkin, a chick, an empty napkin, and a box. Following Trueswell et al. (1999), we call this the one-referent context. In response to the instruction in (1), both children and adults are expected to experience a garden path in this context when they encounter the second PP, ‘into the box’ (Tanenhaus et al. 1995; Spivey and Tanenhaus 1998).
In other referential contexts, according to the Referential Theory, garden-path effects are not anticipated. In such contexts, the temporary ambiguity involved in attaching the PP ‘on the napkin’ (as a modifier) should be resolved without recourse to the Theta Assignment Principle. One such context is depicted in figure 3.2, where there is one frog on a napkin, a second frog not on a napkin, an empty napkin, and an empty box. Following Trueswell et al. (1999), we call this the two-referent context. In response to the instruction in (1), in the two-referent context the parser should immediately respond by analyzing the PP ‘on the napkin’ as a modifier.
According to the Referential Theory, the elimination of garden-path effects in the two-referent context follows from the modular architecture of the language apparatus. A parsing principle, called the Principle of Referential Success, is hypothesized to take precedence over the Theta Assignment Principle in the two-referent context (Crain and Steedman 1985; Altmann and
Figure 3.2
The two-referent context.
Steedman 1988). The Principle of Referential Success directs the parser to abandon structural analyses that do not refer to entities represented in its current model of the domain of discourse, and to maintain analyses that succeed in referring to entities in the model. In the two-referent context, the presence of two frogs in the domain of discourse prevents the application of the Theta Assignment Principle, which, as we have seen, would otherwise direct the parser to analyze the PP ‘on the napkin’ as the destination of the ‘putting’ event. This attachment of the PP, however, would prevent the parser from successfully identifying the referent of the initial definite NP ‘the frog’ in the discourse. The Principle of Referential Success preempts the application of the Theta Assignment Principle and leads the parser to attach the PP ‘on the napkin’ as a modifier because, on this analysis, the parser succeeds in referring to an entity in the discourse (the frog on the napkin), thus satisfying the presupposition of uniqueness triggered by the definite NP ‘the frog’. Once the PP ‘on the napkin’ has been attached low, the Theta Assignment Principle becomes operative again and guides the parser’s assignment of the Destination theta role to the PP ‘into the box’. On this analysis, the second definite NP, ‘the napkin’, succeeds in referring, despite the presence of the empty napkin; it refers to the napkin associated with the frog under consideration. On the Referential Theory, no worries are anticipated for the parser in the two-referent context for either children or adults.
Children’s Lack of Sensitivity to the Referential Context
In a recent series of studies using a free-head eye-tracking system, Trueswell et al. (1999) examined the parsing strategies used by children and adults in
resolving the attachment ambiguity in sentences like (1) (see also Spivey-Knowlton and Sedivy 1995 for adults). Trueswell et al. considered contexts that supported the early application of the Theta Assignment Principle (the one-referent context) and contexts in which the Principle of Referential Success was expected to forestall the application of the Theta Assignment Principle (the two-referent context). In the one-referent context, the Principle of Referential Success and the Theta Assignment Principle can both be satisfied, because these principles are not in conflict when there is a unique referent for the initial NP ‘the frog’. The parser should experience a garden-path effect, however, upon encountering the second PP, ‘into the box’, in the one-referent context. In the two-referent context, by contrast, the Principle of Referential Success should obviate the application of the Theta Assignment Principle at the initial PP, ‘on the napkin’. Hence, no garden-path effects are expected to occur in the two-referent context.
The results for adults were exactly as expected in the Trueswell et al. (1999) study.5 In the one-referent context, adults showed a preference for VP-attachment of the PP ‘on the napkin’. Although adults typically performed the correct actions (e.g., moving the frog directly into the box) in the one-referent context, they nevertheless moved their eyes to the “empty” napkin as they were processing the PP ‘on the napkin.’ Taken together, the findings are interpreted by Trueswell et al. (1999) as evidence that adults initially mis-analyzed the PP ‘on the napkin’ as filling the Destination theta role (hence the glances to the empty napkin), but adults were able to revise their analysis (hence the correct actions). The findings for adults in the two-referent context were also consistent with the Referential Theory. Adults not only performed the correct actions in this context; they did not even move their eyes to the empty napkin.
By inference, adults immediately interpreted the first PP, ‘on the napkin’, as a modifier of the NP ‘the frog,’ thereby avoiding garden-path effects. This pattern of behavior conforms to the prediction of the Referential Theory.
The responses of the children in the Trueswell et al. study tell a different story. In the one-referent context, children’s responses were consistent with the Referential Theory, but children were less able than adults to overcome the garden-path effects they experienced. Not only did children look at the empty napkin, as did adults; they also moved the frog onto that napkin on more than half of the trials. From this, one difference between children and adults is apparent: children are less able than adults to recover from their initial misanalysis of temporary ambiguities in sentences like (1).
A second difference between children and adults emerged in the two-referent context. In contrast to adults, children continued to make “errors” in performance, just as they had in the one-referent context.6 Children performed “incorrect” actions on more than half of the trials. In particular, on trials where
children produced non-adult responses, instead of moving the frog that was on the napkin to the empty napkin (for example), they moved the “wrong” frog (the one that was not on the napkin) 90 percent of the time. Presumably, on these trials, children interpreted the PP ‘on the napkin’ as the destination of the putting event, rather than as a modifier of the NP ‘the frog’. If so, it would seem that for children the Principle of Referential Success did not effectively block the application of the Theta Assignment Principle.7 And once again children were unable to revise their initial attachment of the PP ‘on the napkin’ as the destination of the putting event. These findings led Trueswell et al. (1999) to conclude (a) that children are less able than adults to use the referential context to guide their on-line parsing decisions, at least under certain processing conditions, and (b) that children are unable to retrace their footsteps once led down a garden path.8
What Lies Beneath
In this section we present our own analysis of the experimental findings reported by Trueswell et al. (1999). Many of the features of our analysis were discussed by Trueswell et al., but the conclusions they reached were quite different from ours. On our account, not only do children consistently adhere to the Principle of Referential Success in resolving temporary structural ambiguities, but they adhere to all the other parsing principles that adults do, and they schedule these principles in the same way as adults do. Obviously children differ from adults, but we would emphasize the limits on these differences, on learnability grounds (see Crain and Wexler 1998).
We pursue two features of the Trueswell et al. experimental findings. We wish to acknowledge, first, that children are less able than adults to revise their initial interpretation when they are confronted with linguistic input that cannot be incorporated into the structural analysis they are entertaining. This difference between children and adults is well documented in the literature as the source of children’s non-adult responses to several linguistic constructions. In the study by Trueswell et al., it is readily apparent that once children begin to execute a response, in both the one-referent and the two-referent contexts, they are unable to revise the cognitive algorithms (i.e., plans for acting out the meanings of linguistic expressions) that they have already begun to execute. On the basis of previous research, we assume that children only gradually develop the performance routines needed to delay the immediate execution of plans.
As for children’s non-adult responses, Trueswell et al. conclude that the vast majority of children resort to a guessing paradigm in the two-referent context.
This claim is based on an analysis of the responses of all children, not just the responses of children who committed errors. According to Trueswell et al., children who produced the “right” sequence of actions did so for the wrong reasons. Their argument goes roughly as follows: Most children initially guessed which of the two frogs to move, and sometimes a child guessed “right,” choosing to move the frog that was on a napkin. Once a child had selected the “right” frog, however, he or she was reluctant to put it on the empty napkin, because the frog the child had selected was already on a napkin. So children ignored the initial PP on these trials and responded only to the second PP, ‘into the box’.
We propose another account of children’s adult-like responses, according to which children draw upon the same parsing principles that account for adult responses. To see this, we need to compare children’s responses in the two-referent context with their responses in the one-referent context. Trueswell et al. (1999) observed that whenever children selected the “right” frog in the two-referent context (the one on the napkin) they performed an “incorrect” sequence of actions on only 10 percent of trials. By contrast, whenever children selected the “wrong” frog (the one that was not on the napkin) 90 percent of their responses were “incorrect.” Clearly, this is a significant difference in behavior.
To explain the difference, Trueswell et al. argue that children who guessed “right” ignored the PP ‘on the napkin’ because the frog they had selected was already on a napkin. Although this sounds plausible at first, there is ample evidence against it. If it were true, then children should have made the same inference in the one-referent context. Recall that in that context there was a single frog on a napkin, and there was an empty napkin.
By parity of reasoning, the child subjects should have ignored the PP ‘on the napkin’ in the one-referent context just as often as the child subjects did in the two-referent context, putting the frog directly into the box, rather than on the empty napkin, on the grounds that it was already on a napkin.9 However, children moved the frog (from the napkin it was on) to the “empty” napkin on 56 percent of the trials in the one-referent context. This casts doubt on the proposal by Trueswell et al. that children ignored the PP ‘on the napkin’ in the two-referent context as a result of a pragmatic inference. For this reason, we disagree with the account proposed by Trueswell et al. We maintain that children who performed the right actions in the two-referent context did so for the right reasons, not because of a pragmatic inference.
This brings us to the children who chose the “wrong” frog. The critical observation is that children who chose the “wrong” frog (the one that was not on the napkin) performed “incorrect” sequences of actions on 90 percent of the trials. We contend that in this case children’s non-adult responses were due to
a pragmatic inference, and that the pragmatic inference was the children’s attempt to comply with the Principle of Referential Success. The story is simple. As children heard the sentence fragment “Put the frog on the napkin,” they (implicitly) reasoned as follows: “The experimenter is asking me to put the frog on the napkin. But there are two frogs. Which one is she referring to? I know: the one that is not already on a napkin.” In our view, the children were making a reasonable pragmatic inference that the experimenter’s intended referent for the NP ‘the frog’ was clearly discernable in the context: it was the frog that was not already on a napkin.10 If this is correct, then children were complying with both the Theta Assignment Principle and the Principle of Referential Success concurrently. By applying the pragmatic inference, children were able to interpret the PP ‘on the napkin’ as the destination of the action (Theta Assignment Principle) and, at the same time, were able to identify a uniquely salient frog in the context (Principle of Referential Success). In sum, the application of a pragmatic inference together with children’s inability to revise their initial interpretation explains children’s non-adult actions without supposing that children’s language apparatus violates any tenet of the Referential Theory.
Plans and the Structure of Linguistic Behavior
The process of acting out instructions like “Put the frog on the napkin into the box” draws upon a cognitive algorithm, or plan. Plans are created and used in concert with other aspects of language processing. These include the outputs of the syntactic and semantic components, as well as the components of the language apparatus that are responsible for accessing linguistic principles and coordinating their use, e.g., verbal working memory. Devising a plan and executing it may be separated in time, or they may be interleaved in time. These two possibilities are analogous to the distinction between compiling and interpreting computer languages. The computer analogy has implications for understanding the differences that children experience in responding to various linguistic structures. Children are less “automated” or “compiled” than adults, and children may have less verbal working memory capacity. As a consequence, children sometimes act out the meanings of sentences in an order-of-mention fashion (Amidon and Carey 1972; Crain 1982), whereas adults act them out in an order that is conceptually correct (e.g., “given” information first). Because children interleave planning and execution, they tend to act out parts of the plans they generate, before all planning has been completed. The interpret-mode behavior of children can be rendered infeasible in certain situations, for example by withholding the experimental display until
Figure 3.3
the child has listened to the entire instruction. In such circumstances, the child can (mentally) devise a plan in the compile mode, as adults do. If so, children are expected to perform conceptually correct sequences of actions, just as adults do. (Later in this paper, we present an experiment demonstrating that children do not commit errors in responding to instructions like “Put the frog on the napkin into the box” in the “phrase-and-then-display” condition, presumably because this experimental manipulation prevents children from prematurely executing the plans they generate on-line.) There are several examples of children’s premature execution of linguistic instructions in the literature on child language (e.g., relative clauses, sentences with before/after). The example we focus on here involves the interpretations of prenominal modifiers, such as second striped ball. In a study of children’s command of phrase structure, Matthei (1982) reported systematic non-adult behavior by children when they were asked to perform the instruction in (2) using the array of objects depicted in figure 3.3.

(2) Point to the second striped ball.

When adults were given the same instruction (counting from the left), they consistently pointed to the second of the striped balls, i.e., the third object in the array. By contrast, children 4 to 6 years old often pointed to the second ball in the array, which happens to be striped. Hamburger and Crain (1984) raised the possibility that “the child subject . . . might start to plan and even act while the sentence is being uttered, possibly making a premature and incorrect decision.” Consistent with this hypothesis, children’s “errors” vanished if the experimenter produced the test sentence before the child had a chance to execute any action. This feature of experimental design is called phrase-and-then-display.
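The contrast between the two modes can be made concrete with a toy sketch. The Python fragment below is an illustration under stated assumptions only: the array contents, the function names interpret_mode and compile_mode, and the word-by-word commitment policy are hypothetical and are not taken from Matthei (1982) or Hamburger and Crain (1984). It assumes an array of balls in which the first is plain and the second and third are striped, which is consistent with the responses described above.

```python
# Toy illustration only: the array and both procedures are hypothetical.
ARRAY = ["plain", "striped", "striped", "plain", "striped"]  # balls, left to right


def interpret_mode(balls):
    """Execute each word's contribution as soon as it is heard.

    'second' is acted on immediately by counting to the second ball in the
    whole array; 'striped' arrives too late to change what is counted and is
    only checked against the ball already chosen.
    """
    chosen = 1  # zero-based index of the second ball in the array
    return chosen if balls[chosen] == "striped" else None


def compile_mode(balls):
    """Assemble the full plan before acting.

    'striped ball' first restricts the set, and 'second' then counts within it.
    """
    striped = [i for i, pattern in enumerate(balls) if pattern == "striped"]
    return striped[1] if len(striped) >= 2 else None


print(interpret_mode(ARRAY) + 1)  # one-based position chosen in interpret mode
print(compile_mode(ARRAY) + 1)    # one-based position chosen in compile mode
```

With this assumed array, the interpret-mode procedure selects the second object and the compile-mode procedure selects the third, mirroring the child and adult responses respectively.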
Returning to the findings of Trueswell et al., here is what we are suggesting: Children’s inability to revise their initial interpretation in both the one-referent context and the two-referent context may be the result of interpret-mode behavior. In particular, children may have formulated a plan and started to execute it before the entire instruction “Put the frog on the napkin into the box” had been uttered by the experimenter. Specifically, children may have planned a specific course of action that interpreted the first PP, ‘on the napkin’, as referring to the destination. Having committed themselves to putting the frog that
was not on a napkin on the empty napkin, it proved untenable for children to revise their interpretation of the instruction, because the action they were committed to perform could not be integrated into any appropriately revised plan. We call this the bird-in-the-hand problem. It is time to take stock. In our view, the experimental findings reported by Trueswell et al. do not support the conclusion that children are less sensitive than adults to contextual factors in the resolution of ambiguities. But children do differ from adults, because children assemble plans and begin to execute them in the interpret mode, rather than (as adults do) in the compile mode. However, in the discussion of children’s interpretation of phrases like second striped ball we have seen how to render the interpret mode inoperative: The experimental maneuver of phrase-and-then-display sufficed to elicit compile-mode behavior from children. It is expected that children will perform like adults, then, if the phrase-and-then-display methodology is implemented in studies of PP attachment. In a later section, we report the findings of an experiment using the phrase-and-then-display methodology to encourage children to process linguistic input with a temporary ambiguity in the compile mode. But we are only halfway home. It remains to substantiate the hypothesis that children who selected the “wrong” frog did so because of a pragmatic inference.

A Pragmatic Inference
In this section we scrutinize children’s non-adult responses. As we noted in an earlier section, when children made “incorrect” responses they chose the “wrong” frog 90 percent of the time. We hypothesized that children’s non-adult behavior was due to a pragmatic inference that they made to satisfy the Principle of Referential Success. The pragmatic inference enabled children to identify a uniquely salient referent for the NP ‘the frog’ in the experimental workspace. Upon hearing the fragment “Put the frog on the napkin,” children inferred that the unique referent of the NP ‘the frog’ was the one that was not already on a napkin. Experiment I was designed to evaluate this hypothesis.

Experiment I
This experiment used contexts like the one depicted in figure 3.4, which includes two frogs. One frog is on a napkin, the other is not. In addition, there is an empty napkin in the experimental workspace. We call this the on/off context, because the two frogs can be distinguished in virtue of being on or off a napkin. Children were presented with contexts similar to the one depicted in
Figure 3.4
The on/off context.
figure 3.4 and were asked, by means of instructions like (3), to perform relevant actions.

(3) Put the frog on the napkin.

The experimental hypothesis was that, in response to (3), children’s predominant response would be to move the frog that was not already on a napkin. Twelve children participated in the experiment. They ranged in age from 3;10 to 5;9, with a mean age of 4;7. The experimenter arranged the toys in the experimental workspace as illustrated in figure 3.4. Before each trial, the experimenter ensured that the child knew the name of each object. If the child had a non-standard name for an object, the child’s name for that object was used by the experimenter thereafter. The task was presented to the child as a game in which she had to help the experimenter in getting everything ready for a story. The experimenter instructed the child subject to act out (“Do what I say”) a sentence using the toys available in the experimental workspace. Each child was presented with four target trials, four fillers, and one warm-up. All and only the target trials were instructions containing the verb put as illustrated in (3). As predicted, children moved the frog that was not already on a napkin on 92 percent of trials (44 out of 48). It is worth noting that children never asked the experimenter which frog they should move. Moreover, when they were asked, on the very last trial, to explain why they had moved the frog not on a napkin, they consistently answered “because that one was already on a napkin.” We interpret the findings as confirming our hypothesis that children could make a pragmatic inference in the two-referent context in previous research in
order to single out a salient frog, as dictated by the Principle of Referential Success. The preceding discussion suggests that the experimental results reported by Trueswell et al. can be understood without abandoning either the Referential Theory or the Continuity Assumption. If a pragmatic inference is responsible in part for children’s non-adult behavior, the Principle of Referential Success is not at issue. It also follows that, if the pragmatic inference that allows the hearer to identify a “salient” frog can be blocked (or at least delayed), the Principle of Referential Success will become operative in children’s processing.

Blocking the Pragmatic Inference
If the present account of children’s non-adult behavior is on the right track, then children should refrain from making the pragmatic inference under discussion if, in the context, neither frog is more salient than the other, i.e., if both frogs are “equally” salient. In such a circumstance, children would be compelled to attach the PP ‘on the napkin’ as a modifier, just as adults do in the two-referent context. This attachment decision about the PP ‘on the napkin’ would enable children to identify a uniquely salient frog, in accordance with the Principle of Referential Success. The second step towards evoking adult-like performance from children, then, is to identify a referential context in which neither of the frogs is more salient than the other. We pursued this by constructing a context in which the two frogs could be distinguished only by a non-contrastive expression. Color adjectives fit the bill. Previous research has shown that color adjectives (e.g., blue), as compared to scalar adjectives (e.g., tall, big), are not interpreted contrastively (Sedivy, Tanenhaus, Chambers, and Carlson 1999; Sedivy 2003). For example, in a series of eye-tracking experiments (Sedivy et al. 1999), subjects were shown displays with four objects and were instructed to move various objects. One set of instructions contained scalar adjectives (example: “Pick up the tall glass”). The finding was that subjects distinguished the target object (the tall glass in the display) from a competitor object (an object that could have been described using the same adjective tall) more quickly when a contrasting object of the same category as the target object (a short glass in the display) was present than when a contrasting object was not present. The findings were interpreted as evidence that participants systematically used the contrastive interpretation of prenominal scalar adjectives.
Color adjectives were not found to exhibit the same contrastive function as scalar terms in similar contexts (Sedivy 2003). For example, when a subject was presented with a display containing different objects and asked to “Pick up the blue cup,” the presence in
the display of a contrastive object (a red cup) did not facilitate the identification of the target object. With this in mind, we return to the pragmatic inference that we claim children made (in identifying the referent of the NP ‘the frog’) in the two-referent context of the study by Trueswell et al. In the previous section, we showed that children presumably identified the unique referent of ‘the frog’ using a pragmatic inference: it was the one that was not on a napkin. Having satisfied the Principle of Referential Success, children were then able to apply the Theta Assignment Principle. Inhibiting the pragmatic inference, however, should prevent application of the Theta Assignment Principle, forcing children to use the PP ‘on the napkin’ as a modifier to satisfy the Principle of Referential Success. Recent work by Sedivy and her colleagues serves the stated goal of finding a situation that blocks the pragmatic inference in the two-referent context.

Experiment II
We conducted an act-out task to investigate the hypothesis that the use of a non-contrastive expression in the instructions, such as a color adjective, would suffice to inhibit the pragmatic inference. On a typical trial, children were presented with a context with two frogs, each one on a napkin of a different color (one red and one blue). In addition, there was an empty red napkin, as illustrated in figure 3.5. Children were given instructions such as (4).

(4) Put the frog on the red napkin.
Figure 3.5
The color adjective context.
The two frogs could, in principle, be distinguished by the colors of the napkins on which they were placed. This could foster the application of a pragmatic inference similar in spirit to the one illustrated in experiment I. That is, in response to “Put the frog on the red napkin,” children could reason as follows: “The experimenter is asking me to put the frog on the red napkin. But there are two frogs. Which one is she referring to? I know: the one that is not already on a red napkin.” The question is whether children perform this reasoning to the same extent as they did without the color adjective, in experiment I. Twelve children participated in the experiment. They ranged in age from 3;11 to 5;8 years, with a mean age of 4;08.[11] The procedures were similar to those in experiment I, with two differences: the two frogs were placed on napkins of different colors, and the target instructions contained a color adjective, as in (4). Children were tested using an act-out task. Each child was presented with four target trials, in addition to four fillers and one warm-up trial. All and only the target trials were instructions containing the verb put. As expected, all the children experienced difficulty in identifying the frog that was the intended referent of the initial NP, “the frog.” In the first two trials, children consistently displayed confusion. They queried the experimenter about which frog to move on 18 of 24 trials. Across all the trials, moreover, children behaved at chance in selecting one of the two frogs to move. Interestingly, when asked in the last trial to explain the reason for selecting the frog they had moved on that trial, children answered “it doesn’t matter” or “it’s the same.” This contrasts with the findings of experiment I, in which children experienced no difficulty in identifying a salient frog in the experimental workspace, presumably owing to a pragmatic inference.
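The reasoning attributed to children in the two experiments can be sketched as a simple filtering step. The Python fragment below is a toy illustration under stated assumptions: the scene encoding, the frog labels, and the function names satisfies and inferred_referents are all hypothetical, and the fragment is not a model proposed in this chapter. It only shows which frogs remain once those already at a location matching the destination description are set aside.

```python
# Toy illustration only: scene encoding and function names are hypothetical.

def satisfies(location, destination):
    """True if a frog's current location already fits the destination description."""
    if location is None:  # the frog is not on anything
        return False
    return all(location.get(key) == value for key, value in destination.items())


def inferred_referents(frogs, destination):
    """Frogs left over after discarding those already at a matching destination."""
    return [name for name, location in frogs.items()
            if not satisfies(location, destination)]


# Experiment I (on/off context): "Put the frog on the napkin."
on_off = {"frog_a": {"kind": "napkin"}, "frog_b": None}
print(inferred_referents(on_off, {"kind": "napkin"}))

# Experiment II (color context): "Put the frog on the red napkin."
color = {"frog_a": {"kind": "napkin", "color": "red"},
         "frog_b": {"kind": "napkin", "color": "blue"}}
print(inferred_referents(color, {"kind": "napkin", "color": "red"}))
```

Under this encoding both calls leave a single frog, so the inference is formally available in each context. The sketch does not capture why children draw it only in the on/off context; on the present account, that depends on color adjectives not being interpreted contrastively.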
Children did not make a similar inference in the present experiment, when such an inference would have been contingent on the use of a color adjective to distinguish between the two frogs. To recap: On the basis of observations from the study by Trueswell et al., we proposed that the majority of children’s “errors” in the two-referent context involved a pragmatic inference. In the preceding sections we focused on two features of the findings presented by Trueswell et al., namely (a) children’s inability to revise their initial interpretation (i.e., interpret-mode behavior) and (b) children’s selection of the “wrong” frog on the vast majority of trials in which they produced non-adult behavior (i.e., a pragmatic inference). In discussing these findings, we considered two experimental maneuvers that could lead to improved behavior by children by erasing the effects of these two impediments to children’s successful performance. In particular, we discussed how to render the interpret mode inoperative. For example, we hypothesized, on the basis of previous empirical evidence, that the experimental maneuver of
phrase-and-then-display should suffice to elicit the compile mode from children in response to sentences like “Put the frog on the napkin into the box.” Similarly, we discussed how we might go about inhibiting children’s pragmatic inference in response to sentences like “Put the frog on the napkin” by using color adjectives. This brings us, at last, to an experiment designed to encourage children to process linguistic input with a temporary ambiguity, as in “Put the frog on the (red) napkin into the box.”

Experiment III
To follow up on these observations, we conducted an experiment similar to the two-referent context experiment employed by Trueswell et al. (1999) but introducing two changes in the design. One change was made to inhibit the pragmatic inference, using color terms; the second was to implement the phrase-and-then-display condition, to impede children’s premature execution of a response. The experimental hypothesis was that these experimental procedures would result in “correct” adult interpretations. To inhibit the pragmatic inference, both the relevant objects were positioned on a platform of the same kind but of a different color, so that only a color adjective could be used to distinguish between them. For example, one trial was about two frogs, each of which was on a napkin (one red, one blue), and there was an empty red napkin (figure 3.6). To implement the phrase-and-then-display maneuver, children were introduced to the scene depicted in figure 3.6 in the most neutral way, avoiding lead-ins that could have biased
Figure 3.6
The two-referent context.
children toward one interpretation over the other. For example, we never mentioned that one object was on a platform while the other was misplaced. We also avoided the use of the preposition on in describing the context, in order to block earlier inferences. After being introduced to the scene, children were asked to turn away while they were listening to the instructions, which contained a temporary ambiguity, as in (5). After hearing the entire sentence, children were allowed to look at the scene again and perform the action they were asked to perform. We interviewed 22 children in the same age range as those tested in the study by Trueswell et al. (The children who participated in our experiment ranged in age from 3;09 to 5;09, with a mean age of 4;09.[12]) Each child was presented with four target trials, four fillers, and one warm-up. All and only the target trials were instructions containing the verb put, as illustrated in (5).

(5) Put the frog on the red napkin into the box.[13]

As predicted, the children performed the correct action on more than 92 percent of trials (81 of 88). The findings suggest that children’s failure to use referential information in previous research was due to their tendency to execute response plans on the fly, which in turn exploited their ability to make a pragmatic inference in order to satisfy the Principle of Referential Success. Before drawing firm conclusions about the results of this experiment, we need to rule out the possibility that the experimental maneuvers we introduced by using the two-referent context might have improved children’s responses even in the one-referent context. That is, the changes that were introduced (phrase-and-then-display, use of prenominal adjectives) might have increased children’s willingness to interpret the initial PP as a modifier regardless of the specific context. If so, then the present study simply shows that children are capable of interpreting PPs as modifiers.
To control for this possibility, we conducted an experiment incorporating these features, but using a one-referent context.

Experiment IV
The one-referent context employed in the present experiment is similar to the two-referent context in experiment III. The only difference is that a frog has been replaced by a different animal in the experimental workspace. For example, one trial was about one frog and one chick, each of which was on a napkin (one napkin was red, one blue), and there was an empty red napkin (see
Figure 3.7
The one-referent context.
figure 3.7). After children were introduced to the scene, they were asked to turn away. While unable to see the display, they were given a verbal instruction that contained a temporary ambiguity, as in (6). Having heard the entire sentence, they were allowed to turn back to the scene and to comply with the instruction.

(6) Put the frog on the red napkin into the box.

In the study by Trueswell et al. (1999), children were found to experience garden-path effects in responding to instructions like “Put the frog on the napkin into the box” in the one-referent context. By hypothesis, the changes introduced in experiment III were not expected to increase children’s tendency to interpret PPs as modifiers in the one-referent context exemplified in figure 3.7. To the contrary, we expected children to be led down the garden path to (more or less) the same extent in the present experiment as they had been in previous research. We interviewed twelve children (ages 3;05–5;04; mean age 4;10).[14] Each child was presented with four target trials, four fillers, and one warm-up. All and only the target trials were instructions containing the verb put followed by the preposition on, as in (6) above. As predicted, children manifested the signs of having been led down a garden path: they moved the frog onto the empty napkin and then into the box on 52 percent of the trials (25/48).[15] The findings of experiment IV suggest that the changes introduced in experiment III do not simply facilitate children’s interpretation of the initial prepositional phrase as a modifier. If they had, children would have performed as successfully in the one-referent context as in the two-referent context. This
invites the following conclusion about experiment III: When the relevant pragmatic inference was blocked, and when on-line execution of children’s response plans was blocked, children demonstrated adherence to referentially based parsing principles. We interpret the findings as indicating that children generate a structural analysis that conforms to the Principle of Referential Success, so long as they are prevented from immediately executing the response dictated by the Theta Assignment Principle.

Conclusion
Based on the findings of experiment IV, we feel confident in maintaining our default hypothesis. First, children and adults share all the core properties of the performance system in which linguistic knowledge is embedded, as well as sharing knowledge of the linguistic principles themselves. We conclude, in particular, that both children and adults adhere to fundamental parsing principles that have been proposed in the literature: the Theta Assignment Principle and the Principle of Referential Success. The difference between children and adults is that the processing of linguistic input is less automatic for children than it is for adults. This conclusion enables us to maintain the Continuity Assumption.

Acknowledgments
We wish to thank Andrea Gualmini, Carson Schütze, and Amy Weinberg for helpful discussion, and Megan Gilliver, Martin Hackl, Utako Minai, Beth Rabbin, and Eileen Rivers for help in conducting the experiments. We also thank John Trueswell for his detailed comments on an earlier draft. Most of all, we thank the children, teachers, and staff at the Center for Young Children at the University of Maryland.

Notes

1. See Spivey-Knowlton and Sedivy 1995 and Trueswell et al. 1999 for a complete review and references.
2. The verb put assigns a third theta role to its external argument, the subject NP. However, this argument has no empirical consequences for the process of ambiguity resolution, so we do not consider it further (cf. Ni, Crain and Shankweiler 1996).
3. The example is open to another interpretation, in which the PP ‘in the box’ is interpreted as a modifier of the preceding NP ‘the napkin’. This meaning can be paraphrased as “Put the frog on the napkin that is in the box.” Although this interpretation may be
accessed in certain contexts, it is ruled out in the one-referent context by the Principle of Referential Success, since there is no napkin in the box.
4. Trueswell et al. adopt the Lexicalist Parsing model to explain the VP-attachment preference (MacDonald et al. 1994; Trueswell and Tanenhaus 1994). According to the model, the preference for VP-attachment is based on the lexical properties of the linguistic input (i.e., the verb put) and the frequency with which particular syntactic constructions are used. The verb put frequently associates the first PP with the Destination theta role, so the Lexicalist Parsing model predicts a VP-attachment preference for the sentences under consideration.
5. This is our interpretation of the study by Trueswell et al. We refer the reader to the original paper for further discussion of a particular version of the Referential Theory, and for presentation of the Lexicalist Parsing Model.
6. We use quotation marks to refer to the pattern of children’s non-adult behavior to avoid committing to the view that these responses were errors.
7. For the purposes of this paper, we focus on the actions performed by children in both contexts under consideration. We refer the reader to Trueswell et al. 1999 for a complete and detailed discussion of the on-line eye-movement patterns.
8. The results of Trueswell et al. (1999) have been replicated by Hurewitz et al. (2000).
9. The logic behind this argument was sketched but not pursued by Trueswell et al. (1999). One qualification about our account is needed, however. We do not think that a visual difference in the context alone determines the salience of one frog over the other. Rather, what is relevant is how that difference is encoded in the linguistic instructions.
10. In a later section, we present experimental evidence that children do make a pragmatic inference to identify a salient frog.
11. The children who participated in experiment II did not participate in experiment I.
12. The children who participated in experiment III did not participate in experiment I or experiment II.
13. Despite the occurrence of a color adjective in the instruction, a temporary ambiguity emerges in (5) after the first PP, as in the study by Trueswell et al.
14. The children who participated in experiment IV did not participate in experiment I, experiment II, or experiment III.
15. Trueswell et al. (1999) reported roughly 60 percent incorrect actions.

References

Altmann, Gerry, and Steedman, Mark. 1988. Interaction with context during human sentence processing. Cognition 30, 191–238.
Amidon, Arlene, and Carey, Peter. 1972. Why five-year-olds cannot understand before and after. Journal of Verbal Learning and Verbal Behavior 11, 417–423.
Britt, Anne M. 1994. The interaction of referential ambiguity and argument structure. Journal of Memory and Language 33, 251.
Clark, Eve V. 1971. On the acquisition of the meaning of before and after. Journal of Verbal Learning and Verbal Behavior 10, 266–275.
Crain, Stephen. 1982. Temporal terms: Mastery by age five. Papers and Reports on Child Language 21, 33–38.
Crain, Stephen. 1991. Language acquisition in the absence of experience. Behavioral and Brain Sciences 14, 597–650.
Crain, Stephen. 2002. The continuity assumption. In I. Lasser (ed.), The Process of Language Acquisition. Peter Lang.
Crain, Stephen, and Steedman, Mark. 1985. On not being led up the garden-path: The use of context by the psychological parser. In D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing. Cambridge University Press.
Crain, Stephen, and Wexler, Kenneth. 2000. Methodology in the study of language acquisition. In W. C. Ritchie and T. K. Bhatia (eds.), Handbook on Language Acquisition. Academic Press.
Ferreira, Fernanda, and Clifton, Charles. 1986. The independence of syntactic processing. Journal of Memory and Language 25, 348–368.
Frazier, Lyn, and Rayner, Keith. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14, 178–210.
Gibson, Edward. 1991. A Computational Theory of Human Sentence Processing: Memory Limitations and Processing Breakdowns. Ph.D. thesis, Carnegie Mellon University.
Gorrell, Paul. 1995. Syntax and Parsing. Cambridge University Press.
Hamburger, Henry, and Crain, Stephen. 1982. Relative acquisition. In S. Kuczaj (ed.), Language Development II. Erlbaum.
Hamburger, Henry, and Crain, Stephen. 1984. Acquisition of cognitive compiling. Cognition 17, 85–136.
Hamburger, Henry, and Crain, Stephen. 1987. Plans and semantics in human processing of language. Cognitive Science 11, 101–136.
Hurewitz, Felicia, Brown-Schmidt, Sarah, Thorpe, Kirsten, Gleitman, Lila, and Trueswell, John. 2000.
One frog, two frog, blue frog: Factors affecting children’s syntactic choices in production and comprehension. Journal of Psycholinguistic Research 29, 567–626.
MacDonald, Maryellen C. 1994. Probabilistic constraints and syntactic ambiguity resolution. Language and Cognitive Processes 9, 157–201.
MacDonald, M. C., Pearlmutter, N., and Seidenberg, M. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101, 676–703.
Matthei, Edward M. 1982. The acquisition of prenominal modifier sequences. Cognition 11, 301–332.
Ni, Weijia, Crain, Stephen, and Shankweiler, Donald. 1996. Sidestepping garden paths: Assessing the contributions of syntax, semantics and plausibility in resolving ambiguities. Language and Cognitive Processes 11, 283–334.
Pinker, Steven. 1984. Language Learnability and Language Development. Harvard University Press.
Sedivy, J. C. 2003. Pragmatic versus form-based accounts of referential contrast: Evidence for effects of informativity expectations. Journal of Psycholinguistic Research 32 (1), 3–23.
Sedivy, J. C., Chambers, C., Tanenhaus, M., and Carlson, G. 1999. Achieving incremental semantic interpretation through contextual representation. Cognition 71, 109–147.
Spivey, Michael, and Tanenhaus, Michael. 1998. Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. Journal of Experimental Psychology: Learning, Memory and Cognition 24, 1521–1543.
Spivey-Knowlton, Michael, and Sedivy, Julie. 1995. Resolving attachment ambiguities with multiple constraints. Cognition 55, 227–267.
Tavakolian, Susan L. 1981. The conjoined-clause analysis of relative clauses. In S. Tavakolian (ed.), Language Acquisition and Linguistic Theory. MIT Press.
Tanenhaus, Michael K., Spivey-Knowlton, Michael J., Eberhard, Kathleen M., and Sedivy, Julie. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634.
Trueswell, John C., and Gleitman, Lila. 2004. Children’s eye movements during listening: Developmental evidence for a constraint-based theory of sentence processing. In J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press.
Trueswell, John C., Sekerina, Irina, Hill, Nicole M., and Logrip, Marian L. 1999. The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition 73, 89–134.
Trueswell, John, and Tanenhaus, Michael K. 1994. Towards a lexicalist framework for constraint based syntactic ambiguity resolution. In Charles Clifton, Lyn Frazier, and Keith Rayner (eds.), Perspectives on Sentence Processing. Erlbaum.
Weinberg, Amy. 1992.
Parameters in the theory of sentence processing: Minimal commitment theory goes east. Journal of Psycholinguistic Research 22, 339–364.
4 Referential and Syntactic Processes: What Develops?
John C. Trueswell, Anna Papafragou, and Youngon Choi
Introduction
A primary purpose of language is to permit individuals to communicate their perceptions and conceptions of the world. The linguistic system that underlies this communication must therefore be designed for fairly intricate interactions with the human perceptual and conceptual machinery. The study of sentence-comprehension abilities in adults shows quite clearly that this is the case. For instance, it has been found that the recognition of a word includes accessing detailed linguistic information about how that word is likely to combine syntactically and semantically with the current representation of the sentence. In addition, the referential implications of these analyses are computed in real time and appear to exert a simultaneous influence on the ongoing structural analyses, allowing the listener to pursue referentially plausible parses and exclude referentially implausible ones. This rapid “dance” among syntactic, semantic, and referential factors over the course of interpreting a sentence has led researchers to the conclusion that the recognition of a word in a sentence exerts immediate effects on multiple tiers of linguistic and nonlinguistic representation, i.e., phonological, syntactic, semantic, and referential. These representational systems, though distinct, mutually constrain each other in a dynamic fashion (Jackendoff 2002; MacDonald, Pearlmutter, and Seidenberg 1994; Trueswell and Tanenhaus 1994). In this paper, we explore the question of how this sentence-processing machinery develops in children. The study of the development of sentence processing has attracted renewed interest with the advent of a methodology for studying children’s comprehension abilities in real time. In these studies, children’s eye-gaze patterns to objects in the world are recorded as they hear spoken utterances about this world, with the measure providing a moment-by-moment window into their interpretation process.
We will suggest from these and other data that the system is incremental and interactive at a relatively early stage in development, showing sensitivity to a variety of constraints on computing sentential meaning. But at the same time we will argue that there are systematic changes over developmental time in the reliance on certain sources of linguistic and nonlinguistic evidence, changes that depend upon the validity and reliability of the evidence as derived from the learner’s experience. The dynamic abilities of this processing system are also found to change and mature in time; this interface system, like many others, is subject to developmental changes in the control of information processing, especially changes in cognitive and attentional control. In order to begin our discussion of this theory of parsing development, we must first discuss what is known about the adult end state, that is, what we believe to be true about the sentence-comprehension abilities of adults and how they use multiple constraints to shape their on-line structuring of the input. With this account in hand, we will use it to motivate a developmental theory of sentence processing. We will then turn to experimental evidence that we believe justifies our claims.
Real-Time Sentence Processing in Adults
Let us first consider what are likely to be inescapable truths about sentence comprehension in adults. First, given the way natural languages work, it is almost certainly the case that listeners must recover much or all of the intended syntactic structure of an utterance. This is because the structural characteristics of an utterance, when combined with the semantics of verbs and other lexical items, convey essential role assignments (i.e., who is doing what to whom). This, of course, has been the bread and butter of not only linguistics but also of most psycholinguistic research on sentence processing carried out over the last thirty years. Less often discussed, but arguably equally important, is the fact that the structure of an utterance simultaneously conveys intended discourse operations (e.g., discourse status and focusing). Grammatical choices made by a speaker (whether to passivize, whether to include a restrictive modifier, etc.) reflect discourse considerations and are designed to communicate what the speaker is referring to. Listeners therefore need this syntactic information to infer the intended meaning of an utterance and its reference to the world. Thus listeners must look for evidence in the linguistic input about the syntactic operations that gave rise to the utterance. Exactly how syntactic and semantic structure is recovered by a listener or a reader has been a topic of considerable debate (e.g., Altmann and Steedman 1988; Frazier and Fodor 1978; Frazier 1987; MacDonald et al. 1994; Trueswell and Tanenhaus 1994). In this paper, we will sketch only our own view, the view that motivates our developmental account. Specifically, we assume that during the comprehension of a sentence, listeners are engaged in the recovery of phonological, syntactic, and semantic characterizations of the input, each of which is maintained within partially independent representational systems (representational modularity). These representational systems dynamically constrain each other as the sentence unfolds (dynamic interactive processing). These three characterizations of the input (phonological, syntactic, and semantic) should be thought of as interim representations whose primary use is to allow listeners to update their mental model of the world (including what they believe speakers are trying to communicate). Importantly, we assert that the recovery of these interim representations is done in real time via probabilistic mechanisms. The process of recognizing a word within a sentence activates probable phonological, syntactic, and semantic structures in parallel, including if necessary multiple alternatives within each subsystem. In turn, interface mechanisms act in real time as the sentence is unfolding to converge on the most consistent and probable solution across these domains (see Trueswell and Tanenhaus 1994; Kim, Srinivas, and Trueswell 2002). It follows from this account that the frequency-based accessibility of structural alternatives will play an important role in a comprehender’s ability to converge on the intended meaning of an utterance. Perhaps the best evidence for this claim comes from adult studies of temporary ambiguity during reading and listening. For instance, in the fragments below, temporary ambiguities arise as readers or listeners attempt to rapidly structure the input:
(1) The man sliced the loaf with ...
(2) The child believed the doctor ...
In (1), with could be linked to the verb sliced, thereby denoting an instrument (e.g., with the sharp knife), or be linked to the noun phrase the loaf, thereby denoting possession of a property (e.g., with the burnt crust). In (2), the noun phrase the doctor could be structured as the direct object of believed (e.g., so that the sentence could end there) or as the start of an embedded sentence that is a complement of the verb (... believed the doctor was lying). A wealth of experimental findings now suggest that the structural and semantic analyses that comprehenders assign at the point of ambiguity are determined in part by detailed lexical factors. These factors include the probability that a verb takes particular complements, as well as the semantic fit of constituents into the intended roles assigned by the verb (e.g., Britt 1994; Garnsey, Pearlmutter, Myers, and Lotocky 1997; Trueswell, Tanenhaus, and Kello 1993; Trueswell, Tanenhaus, and Garnsey 1994). For instance, the tendency for the verb slice to include an Instrument role in the form of a PP and the tendency for believe to include a Patient in the form of an NP predict initial parsing preferences by readers and listeners encountering these phrases (e.g., Garnsey et al. 1997; Taraban and McClelland 1988).¹ All of this suggests that word-recognition processes often drive the structuring of input; indeed, some studies show that covert priming of a verb with different syntactic and semantic tendencies can unconsciously affect comprehenders’ parsing preferences for ambiguous phrases (Novick, Kim, and Trueswell 2003; Trueswell and Kim 1998). This probabilistic recovery of structure is sensitive to other contingencies as well. In particular, the referential implications of these interim representations are computed in real time and can serve as an important top-down constraint on sentence processing. For instance, Altmann and Steedman (1988) found that readers structure phrases like sliced the loaf with the ... differently depending on the contents of the story leading up to this sentence. For example, when there are two different loaves in the story, readers prefer to interpret the with-PP as a modifier of the preceding NP (the loaf). The idea here is that a definite NP must uniquely specify a referent within the current referential domain (Crain 1980; Crain and Steedman 1985). If the simple NP the loaf fails to do this, further linguistic information is expected in the form of a post-NP modifier. Indeed, indefinite NPs (a loaf) alter this parsing preference (Spivey-Knowlton and Sedivy 1995), and even other referential factors contribute to parsing decisions (Trueswell and Tanenhaus 1991).
It seems, at least from this evidence, that comprehenders must also be dynamically tracking what is under discussion and what is within the current referential domain, since these factors rapidly influence the structuring of the input.² Importantly, however, reading studies of this sort also indicate that the effectiveness of this contextual factor depends on the availability of the structural options at issue. For instance, in (1) above, the effectiveness of the two-loaf story in supporting a modifier interpretation depends on the kind of verb that is used in the stimuli: verbs that often include an Instrument role show substantially delayed and reduced contextual effects (Britt 1994; Spivey-Knowlton and Sedivy 1995; see also Garnsey et al. 1997 and Trueswell 1996 for lexically determined accessibility in other structures). In the picture that emerges from these data, the recognition of a word within a sentence automatically triggers linguistic representations at multiple levels. But this triggering is probabilistic in nature: given the evidence at hand, a listener is engaged in a “guessing game” in which the linguistic procedures that gave rise to the utterance are recovered. The referential implications of these representations are also computed in real time and, when possible, used to constrain the representational hypotheses that the listener is considering.
It cannot be emphasized enough that local ambiguity is ubiquitous in real-time language comprehension. Indeed, computational linguists have recognized the pervasiveness of ambiguity, especially since they began to implement “wide-coverage” parsers and interpreters that were designed to handle naturally produced text (e.g., Marcus, Santorini, and Marcinkiewicz 1993). It has even been claimed that local ambiguity of the sort found in highly lexicalized formalisms can provide a processing advantage because it permits greater flexibility in recovering structure and meaning (e.g., Srinivas and Joshi 1999; Steedman 2000; see also Kim, Srinivas, and Trueswell 2002). The implication here is that the accessibility of structure is an unavoidable issue in the study of language comprehension, whether one is interested in syntax, in semantics, or in reference. We strongly suspect the same is true for the study of language comprehension in children.
A Developmental Account of Sentence Comprehension
Let us now turn to how the child learns to implement these dynamic sentence-processing abilities, focusing especially on how referential contingencies are learned and used. To begin, we must spell out some basic assumptions of our account. First, we will assume a great deal of processing continuity over development. That is, the types of processes used for language comprehension remain constant throughout language learning and into adulthood. In particular, we will assume the following.
1. Real-time processing continuity: From the outset, a language learner or a listener is attempting real-time incremental processing of the input speech stream.
2. Probabilistic processing continuity: From the outset, the detection from the speech stream of already acquired linguistic elements (including syntactic and phrasal elements) is achieved via probabilistic pattern-recognition and pattern-completion processes.
There is good reason to believe that these assumptions hold, especially as they pertain to the processing of sub-lexical and lexical elements. In particular, experimental results from Aslin, Newport, and Saffran indicate that 8–12-month-olds are sensitive to the distributional properties of syllables, which allows them to discover likely lexical or morphological candidates from continuous speech (e.g., Aslin, Saffran, and Newport 1998; Saffran 2001, 2002; Saffran, Aslin, and Newport 1996). These results suggest that from the outset, language learners are attempting to extract potentially relevant linguistic elements from the input via probabilistic mechanisms. In turn, these elements serve as candidates for word learning. That is, language learners attempt to map these newly discovered elements onto known conceptual representations.
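One way to make the idea of sensitivity to the distributional properties of syllables concrete is to compute transitional probabilities between adjacent syllables and to posit a boundary wherever that probability dips. The sketch below is only an illustration under stated assumptions: the nonsense words, the word order, the threshold, and the function names are all hypothetical, and the thresholding rule is a simplification chosen for clarity rather than a procedure taken from the studies cited above.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Count each syllable only where it has a successor.
    first_counts = Counter(syllables[:-1])
    return {
        (a, b): count / first_counts[a]
        for (a, b), count in pair_counts.items()
    }


def segment(syllables, threshold=0.75):
    """Posit a word boundary wherever the transitional probability is below threshold."""
    probs = transitional_probabilities(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if probs[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical nonsense words, concatenated with no pauses between them.
LEXICON = {"A": ["bi", "da", "ku"], "B": ["pa", "do", "ti"], "C": ["go", "la", "bu"]}
ORDER = "ABCACBABCBACBCA"  # arbitrary word order, assumed for illustration
stream = [syllable for word in ORDER for syllable in LEXICON[word]]

print(segment(stream))
```

Within a word each syllable fully predicts the next, whereas across a word boundary the next syllable is uncertain, so the dips in transitional probability line up with the word edges in this toy stream.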
As language learners build up a repository of word-meaning pairs, they are faced with temporary ambiguity from the start. They deal with this ambiguity in an adult-like manner, i.e., in real time, as the speech unfolds. For instance, the eye-tracking research of Fernald, Swingley, and colleagues shows that 18–24-month-olds process phonological word cohorts (dog/doll; tree/truck) in much the same way as adults, the major difference being that adults know more words (Swingley, Pinto, and Fernald 1999; Allopenna, Magnuson, and Tanenhaus 1998). Upon hearing doll in a sentence like Look at the doll, 18–24-month-olds will look to cohort referents of doll, such as a picture of a dog, but not to non-cohort referents, such as a picture of a mouse. Suggestive evidence also exists indicating that children in this age range are beginning to engage in real-time syntactic and semantic structuring of these utterances. For instance, 27-month-olds hearing Let’s roll the ball restrict looks to a ball, excluding a non-rollable object in view, as the word ball is being heard (Fernald 2004, as reported in Fernald, Zangl, Thorpe, Hurtado, and Williams 2008; see also Nation, Marshall, and Altmann 2003 for similar studies with older children). All these patterns have been observed in real-time studies of adult listeners (e.g., Allopenna et al. 1998; Altmann and Kamide 1999), suggesting considerable continuity of processing abilities over development. The question of interest here, though, is whether children’s sentence-processing abilities show the same sort of continuity over development, which is exactly what we wish to assert. One obvious obstacle, of course, is that sentence comprehension is arguably orders of magnitude more complex than word recognition. We therefore must make the following further assumptions about the development of language processing.
3. Representational modularity: The language-processing system is innately predisposed to organize linguistic input into three partially independent representational domains: phonological, syntactic, and semantic.
4. Representational interfacing: The language learner expects systematic correspondences between these representational systems. For instance, the number and type of phrasal constituents present in an utterance will have a systematic mapping onto the number and type of participants denoted in the conceptual representation of an event (Gleitman 1990; Gleitman, Cassidy, Nappa, Papafragou, and Trueswell 2005).
5. Interactive processing: From the outset, the language system is detecting and taking advantage of probabilistic tendencies among phonological, syntactic, and semantic elements computed from the input stream so as to constrain possible analyses of this input.
6. Assume reference: The language learner is innately predisposed to assume that communicative acts refer to the world. Thus, from the outset, the language learner attempts to compute the referential implications of the linguistic characterizations of the input.
These assumptions, when combined with what we believe are properties of the adult comprehension system, allow us to derive some predictions about how child listeners ought to resolve temporary syntactic ambiguity during sentence comprehension. First, like the adult system, the child sentence-comprehension system is engaged in the recovery of known syntactic and phrasal categories from the input, which is accomplished via pattern-recognition processes. These higher-order syntactic and phrasal elements are likely to be discovered via distributional and statistical mechanisms similar to those proposed for lexical discovery by Saffran, Newport, and colleagues (i.e., Mintz, Newport, and Bever 2002; Gerken 2002; Gómez 2002; Gómez and Gerken 2000; cf. Harris 1957). Crucially, though, we assert that certain sorts of categories are preferred by the linguistic processing system and are assumed to map onto semantic and conceptual representations in systematic ways (assumptions 3 and 4 above). Once a repository of syntactic representations has been learned, we would expect a processing situation somewhat similar to the one characterized in early lexical processing (and documented by Fernald, Swingley, and colleagues): namely, a situation in which the child parsing system must also deal with syntactic ambiguities and must resolve these ambiguities in real time. Since the adult syntactic parsing system is a probabilistic device that weighs multiple contingencies, it follows that the child processing system, though organized and operating in the same way, must gradually discover and learn these contingencies. For illustration, consider the Prepositional Phrase (PP).
The PP in English is associated with a range of semantic functions, often ambiguously for any particular lexical head. PPs can be used for temporal and spatial specification of events or entities (e.g., sang on Tuesday, sang on the stage, the book on your left). PPs are also used as arguments of events and entities (e.g., given to Susan, put on the table, governor of California). Consider a child who has already learned that PPs serve these semantic functions.³ As the child detects a particular instance of a PP in the input stream (e.g., I really like your doll with the ...), how does he or she decide which of these semantic (and syntactic) operations to compute? Based on the adult literature outlined above, we can list the possible sources of evidence:
• lexical head (e.g., with, on, in) and lexical semantics/syntax
• syntactic structure up to that point (e.g., verb-syntactic-projections)
• semantic structure up to that point (e.g., verb-thematic-projections)
• referential operations up to that point (e.g., currently insufficient referential specificity)
• prosodic structure up to that point (e.g., currently open intonational phrase)
Figure 4.1
The syntactic alternatives associated with the ambiguous word with, shown using the LTAG formalism (Srinivas and Joshi 1999).
First, the presence of a particular lexical head (with, on, of) constrains the range of syntactic and semantic alternatives. But this is only probabilistic evidence for the listener. For instance, with is an ambiguous word that denotes, roughly, either ‘accompaniment’ or ‘instrumental-manner,’ and is associated with several syntactic operations (see figure 4.1). Like any ambiguous word, the probability of activating any of these linguistic characterizations will depend on the word’s dominant and subordinate meanings. Second, the current syntactic structure and semantic structure (points 2 and 3) constrain alternatives probabilistically as well: Let’s color with ... requires VP-attachment but either as an accompaniment (Let’s color with your friends) or as an instrument (Let’s color with your crayons). Also, the presence of an NP object permits other syntactic-semantic options (e.g., Let’s color the book with the torn cover). The probability of these alternatives is determined largely by the subcategorization and thematic tendencies of the verb, again as discussed above. Third, referential implications up to that point will play a role in resolving a particular PP occurrence, since, as was discussed above, the lack of referential success of a definite NP supports modification (e.g., the book with ... when there is more than one book under discussion).
Finally, although not discussed above, prosody serves as probabilistic evidence, since the presence or absence of a major prosodic break has a correspondence to phrasal breaks, though again probabilistically. Now for the developmental question. Given these possible constraints on parsing and interpreting PPs, which of these constraints are going to be more valid predictors of semantic/syntactic choice, and hence appear to dominate child parsing and interpretation processes? If the literature on adult sentence processing is any guide, we should expect that lexical constraints on structural analyses (points 1–3) will play an early and potent role developmentally. Adults track subcategorization and thematic preferences to such a great extent that they immediately constrain parsing options. If children build such databases as they learn words, it follows that this information will appear as an early determinant of child parsing. Beyond the adult literature, there is additional evidence for this conclusion. Research on verb learning strongly suggests that children track the number and types of phrases that occur with verbs so as to assist in learning the meaning of these verbs (e.g., Fisher, Hall, Rakowitz, and Gleitman 1994; Gillette, Gleitman, Gleitman, and Lederer 1999; Gleitman 1990; Yuan and Fisher 2009). Said another way, children from an early age track subcategorization and argument-taking properties of verbs as they learn them. Our assertion here is that this probabilistic evidence, which was tracked and developed so as to discover the meanings of verbs, isn’t “thrown away” after the verb is learned. Rather it is used to recognize the intended structure of an utterance every time that particular verb is encountered again later in life. Moreover, children, like adults, deploy this knowledge of probabilities as a sentence unfolds. What about potential top-down influences of referential implications (point 4)?
Will children show similar early influence of such factors? Given the assumptions sketched above, the answer should be yes, as long as the particular contextual evidence is valid, accurately computed by the child, and reliably constrains the structural analysis. After all, we argue that it is a great advantage for a listener (adult, child, or infant) to discover the referential conditions under which instances of particular linguistic elements are occurring. However, we have reason to believe from the above-mentioned literature on adult parsing that the referential constraints on PP modification are substantially weaker than relevant lexical predictors of these same structures (e.g., Britt 1994). Moreover, it seems reasonable to assume that the ability to track what is currently relevant in the contextual setting of an utterance could be hindered in children, since this often requires dynamically tracking what the speaker is thinking about (e.g., Clark 1993). From children’s difficulties in taking other individuals’ perspectives, as studied in the theory of mind literature, it follows that the referential evidence children build for the purposes of sentence comprehension is going to be noisy and even contaminated. As a result, it should be expected, somewhat counter-intuitively, that lexical predictors of structure will exert an early and potent influence on child processing, whereas the sorts of contextual factors thus far studied in the adult literature will be developmentally delayed in children’s ambiguity-resolution abilities.
The Kindergarten-Path Effect
Much of the initial impetus for developing this account comes from the results of a study of child sentence processing by Trueswell, Sekerina, Hill, and Logrip (1999). In that study, 5-year-olds, 8-year-olds, and adults were given spoken instructions to move objects around on a table while their eye movements were tracked.⁴ Eye position was used to infer listeners’ ongoing referential commitments, which are believed to be derived from provisional syntactic and semantic analyses of the spoken utterance (Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy 1995; Spivey, Tanenhaus, Eberhard, and Sedivy 2002). On critical trials, participants were presented with one of two types of visual scenes, examples of which are shown in figure 4.2. For both scene types, the child was given an instruction that was designed to refer to a particular stuffed animal, in this case the frog that was sitting on a napkin, as in (3).
(3) Put the frog that’s on the napkin into the box. (Unambiguous Modifier)⁵
Consider hearing this sentence in the context of the scene in figure 4.2a, where there is only one frog but there are two napkins, one of which is under the frog. If children were simply looking to possible referents based on the individual words they heard, we might expect them to look to the frog upon hearing frog but to look to either napkin upon hearing napkin. However, if referential commitments (and the resulting eye movements) are derived from real-time structural analyses, and children of all ages are engaged in these real-time processes (assumptions 1 and 6 above), we should expect that both children and adults would not consider the empty napkin as a referent, since the word napkin is part of a relative clause that must modify the NP the frog. Indeed, this latter pattern was exactly what was found: children and adults looked to the frog within a few hundred milliseconds of hearing frog, and continued looking at the frog (and the napkin under it) when hearing napkin, rarely if ever looking over to the empty napkin. Notice, however, that when sentence (3) is heard in the context of the scene in figure 4.2b, the structural position of the frog in the sentence permits reference to either frog, and it is only upon hearing that’s on the napkin that a listener could compute the correct referent. Indeed, eye movements in two-frog scenes supported this expectation. Upon hearing frog, listeners, regardless of age, launched an eye movement to a frog, but they were at chance as to which frog they looked at. This state of affairs remained until hearing napkin, upon which participants shifted gaze to the frog on the napkin if they were at that moment looking at the wrong frog. Again, participants rarely if ever considered the empty napkin. It should also be noted that participants were essentially flawless at carrying out the instruction, regardless of age: they moved the intended frog into the empty box. Thus the data from syntactically unambiguous trials provide compelling evidence that syntactic and semantic analyses are engaged in real time for the ages we have looked at here, and that the referential implications of these structural analyses are reflected in eye movements.
Figure 4.2
One-referent (a) and two-referent (b) scenes from Trueswell, Sekerina, Hill, and Logrip 1999.
In principle, this pattern could be explained as children engaging in some sort of intersecting-set strategy (hearing frog causes looks to frogs, hearing napkin causes looks to frog-plus-napkin). However, data from another variant of the instruction rule out this possibility and confirm other assumptions and hypotheses sketched above. In particular, on certain trials participants heard instructions like (4).
(4) Put the frog on the napkin into the box. (Temporary Ambiguity)
Here the PP on the napkin becomes temporarily ambiguous. It could be interpreted as a modifier of the NP the frog or as a Goal for the verb put, i.e., where to put the frog. If participants are engaged in real-time probabilistic estimation of the intended structure regardless of age (assumption 2) and they use lexical evidence to compute these estimations (as hypothesized above), we should expect a strong initial preference to interpret on the napkin as a goal (where to put the frog) rather than a modifier. This is because the semantics of the verb put requires a goal role, and the goal role is typically realized in the form of a PP headed by on or in. Even 5-year-olds are expected to have ample linguistic experience to know these structural tendencies for this verb.⁶ If in fact the NP the napkin is interpreted as part of a goal PP, the empty napkin in the scene now becomes a possible referent, since one could move a frog to it. Thus we should expect increased looks to the empty napkin upon hearing the word napkin in temporarily ambiguous sentences like (4). Indeed, for one-frog scenes (figure 4.2a) looks to the frog were immediately followed by looks to the empty napkin upon hearing napkin. This occurred for all ages, suggesting that the syntactic and semantic tendencies of verbs are used to estimate structure and compute referential implications in real time. Two-frog scenes, however, were designed such that referential implications of hearing the frog discouraged interpreting on the napkin as a goal. This is because the frog could refer to either frog. If listeners can use this referential analysis in real time to constrain the structuring of further linguistic input, we ought to expect them to prefer to interpret on the napkin as a modifier of the NP the frog, allowing them to specify a unique referent. This top-down constraint, however, must battle against the lexical biases that support the goal analysis of that same phrase. Thus, two-frog scenes provide the potential for a top-down constraint on structuring on the napkin that ought to reduce looks to the empty napkin in the scene, rendering the pattern more like what was seen for unambiguous modifiers like that’s on the napkin.
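The competition just described, in which a lexical bias toward the goal analysis is pitted against a referential constraint favoring the modifier analysis, can be illustrated with a toy calculation. The following is a minimal sketch and not the model proposed in this chapter: it assumes a simple weighted sum of cue support passed through a softmax, and every number, cue label, and function name in it is hypothetical, chosen only to show how the outcome could shift with the weight given to the referential cue.

```python
import math


def attachment_probabilities(evidence, weights):
    """Softmax over the weighted cue support for each candidate analysis."""
    scores = {
        analysis: sum(weights[cue] * support for cue, support in cues.items())
        for analysis, cues in evidence.items()
    }
    normalizer = sum(math.exp(score) for score in scores.values())
    return {a: math.exp(s) / normalizer for a, s in scores.items()}


def evidence_for(scene):
    """Hypothetical cue support for 'Put the frog on the napkin ...'."""
    ambiguous_referent = 1.0 if scene == "two-frog" else 0.0
    return {
        # The verb "put" calls for a goal, which supports the goal analysis.
        "goal": {"lexical": 1.0, "referential": 0.0},
        # Two frogs leave "the frog" underspecified, which supports a modifier.
        "modifier": {"lexical": 0.0, "referential": ambiguous_referent},
    }


# Invented weights: not estimated from any data set.
STRONG_REFERENTIAL = {"lexical": 2.0, "referential": 4.0}
WEAK_REFERENTIAL = {"lexical": 2.0, "referential": 0.5}

for label, weights in [("strong", STRONG_REFERENTIAL), ("weak", WEAK_REFERENTIAL)]:
    for scene in ("one-frog", "two-frog"):
        probs = attachment_probabilities(evidence_for(scene), weights)
        print(label, scene, {k: round(v, 2) for k, v in probs.items()})
```

With these invented weights, the modifier analysis wins in the two-frog scene only when the referential cue is weighted heavily; when that cue is weighted lightly, the lexical bias toward the goal analysis prevails in both scenes.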
However, we hypothesized above that less reliable evidence for estimating the structure of the input should result in a developmental delay in using this evidence. We also suggested above that this particular sort of referential constraint on structure is less reliable than lexical constraints, in part because referential factors appear to be less effective in adult parsing behaviors. Moreover, we suggest that in order to discover this sort of referential-syntactic contingency children must be fairly skilled at tracking what is currently under discussion (i.e., what is relevant to the speaker). Any difficulty in computing the referential domain of an utterance would contaminate the ability to discover this contingency. Thus, if any developmental progression is expected, it would be that younger children would fail to take into account that the two-frog scene supports a modifier analysis. Indeed, only older children and adults interpreted the ambiguous on the napkin as a modifier in two-frog scenes. That is, they looked randomly to either frog in the scene upon hearing frog but rapidly converged on the intended frog upon hearing napkin, rarely if ever looking at the empty napkin. In contrast, 5-year-olds showed a strong preference to interpret on the napkin as a goal, just as in one-frog scenes. Upon hearing frog, they were at chance looking at either frog. And upon hearing napkin they were still at chance looking at either frog, which suggests that they often failed to realize that on the napkin could be a modifier of the NP the frog. Instead, hearing napkin triggered increased looks to the empty napkin just as much as in one-frog scenes. Thus, 5-year-olds tended to think on the napkin was the goal of put rather than a modifier, despite the presence of two frogs. This was in striking contrast to how these same children behaved on unambiguous sentences like (3), where they realized the modifier analysis. We are fairly confident of our account of 5-year-olds’ interpretations because of the resulting overt actions that were made by these children. In particular, 5-year-olds were so strongly committed to the goal analysis on ambiguous trials that they often carried out actions that involved moving a frog to the empty napkin. On over 60 percent of the trials, they performed such an action, regardless of the type of referential scene. On unambiguous trials, they were nearly flawless, making errors on only about 5 percent of trials. Thus the second PP, into the box, did not always block the goal analysis of on the napkin. It did, however, for older children and adults: they were nearly flawless on all trials regardless of ambiguity or referential scene, though, as expected, one-referent scenes did induce some confusion and errors on ambiguous trials, even for adults.
Questions about Our Account of the Kindergarten-Path Effect
Although the data presented in Trueswell et al. (1999) are consistent with our account, certain aspects of this account are as yet inadequately motivated. We articulate these concerns here by posing two questions. The first question pertains to children’s insensitivity to the referential-scene manipulation:
• Why are two-referent scenes less reliable and less effective at resolving this modifier-argument ambiguity?
We have explained developmental differences in the ability to use two-referent scenes by appealing to the poorer reliability of this evidence. In particular, we suggested that this situation (i.e., hearing a definite NP that could refer to multiple visually co-present objects) is not as good a predictor of modifier use as the local lexical evidence (e.g., put ... on) which favors an argument analysis. Moreover, we suggested that the poor reliability of this referential-scene evidence arises in part from the difficulty of discovering that it can be informative.
However, we have not offered much in the way of an explanation for why such a referential setting would be difficult for a child to discover as informative. Moreover, the claim that this information is less reliable than certain lexical information leaves partially unexplained the end state of this developmental pattern. After all, if this referential situation only weakly predicts the need for linguistic modification, why did Trueswell et al. (1999) and Tanenhaus et al. (1995) find that adults could use the implications of a two-referent scene to override the strong lexical bias to interpret on the napkin as a goal of put? Trueswell et al. (1999) did note that adults had some difficulty with temporary ambiguities in two-referent scenes, but this difficulty was surprisingly small in view of our claims.

We will address this issue below in the section titled Referential Scenes, Definite Reference, and Restrictive Modifiers. Specifically, we will look to the literature on definite reference and discuss findings from a recent referential communication study, all of which suggest that a speaker’s choice about linguistic specification (e.g., saying “the frog” vs. “the frog on the napkin”) is not strongly determined by the mere presence of multiple identical objects of the relevant type (e.g., multiple frogs). With this knowledge in hand, we will take a second look at the adult listener’s ability to use these referential situations to resolve syntactic ambiguity. We discuss findings indicating that this contextual evidence is not as effective as the original put studies might suggest (Snedeker and Trueswell 2004; Novick, Thompson-Schill, and Trueswell 2008). Moreover, these findings show that young children, though exquisitely sensitive to experimental manipulations of the lexical biases, are insensitive to the referential-scene manipulation even under conditions of weakly biased lexical evidence.
Taken together, the data paint a very reasonable picture about the listener’s use of referential situations to constrain the structuring of linguistic input.

Reference and parsing will also be discussed in relation to a recent alternative account of the findings of Trueswell et al. (Wexler, this volume). Based on earlier data from young children’s misuse of the definite determiner the in their own productions (Maratsos 1976; Karmiloff-Smith 1979), Wexler proposes that children in the relevant age range lack a complete understanding of the semantics of the. In particular, he proposes that children lack the notion of maximality: the requirement that the definite determiner must apply maximally to the current referential domain. In the section of the present paper titled Pragmatic vs. Semantic Accounts, we evaluate this hypothesis with respect to a broader range of data from Maratsos and Karmiloff-Smith and suggest that the real problem lies in a child’s understanding of what the referential domain is at any given moment (roughly, the original conclusions of Maratsos and Karmiloff-Smith). Thus, Wexler is correct in concluding that the data of Maratsos and Karmiloff-Smith shed light on the studies of child parsing, though perhaps not in the way he proposes.

A second important question that we wish to address here involves the large difference Trueswell et al. observed between younger children and adults in their final interpretation of these temporary ambiguities:

• Why did younger children fail to revise initial parsing commitments?
Trueswell et al. (1999) offered a developmental reason for why 5-year-olds failed to revise their goal interpretation upon hearing into the box. In particular, the authors suggested that children’s difficulty with revising was the result of limited processing resources or limited working memory, both of which expand with age. Since then, our research group has refined this claim to suggest specifically that this change in revision abilities is related to developmental differences in general executive-function processes, specifically the ability to inhibit competing representations (Trueswell and Gleitman 2004, 2007; Novick, Trueswell, and Thompson-Schill 2005). In the section titled Revision and Lingering Garden-Paths, we discuss this proposal. We begin by pointing out that children are not the only ones who sometimes fail to revise their interpretation of a temporarily ambiguous phrase. Christianson, Hollingworth, Halliwell, and Ferreira (2001) have found that normal adults can also hold onto beliefs consistent with a rejected interpretation. And Mendelsohn (2002, 2003) has found that individual differences in measures of general inhibition and executive control correlate with this ability to revise parses.

Referential Scenes, Definite Reference, and Restrictive Modifiers
Given the issues discussed above, a crucial question becomes whether a listener can deduce from scene information alone the need for restrictive modification, specifically, the need for a speaker to provide restrictive modification of a definite NP. Can a person look out into the visual world and anticipate from this information alone a speaker’s need to utter the little star (and not the star), the toy closest to you (and not the toy), or the frog on the napkin (and not the frog)? The pragmatics and psycholinguistics literature on linguistic reference indicates that this is not the case. For instance, Lyons (1980) sketches a hypothetical situation in which two people are working on a motorcycle and one says “Pass me the spanner” (spanner being British English for wrench). In a situation in which there are two wrenches present, one near the speaker and one near the listener, the listener is likely to infer that the wrench closest to the listener is the one intended. And it seems unlikely that the speaker would have said “Pass the spanner that is closer to you” (see also Lyons 1999). Similarly, consider figure 4.3 (adapted from Stone and Webber 1998). A person could refer to a particular rabbit in this scene by saying “Pull the rabbit out of the hat,” but it would be ludicrous to say “Pull the rabbit that’s in the hat out of the hat.” Thus, the referential domain, when applied to a scene, is ultimately determined by what is relevant given the goals of the interlocutors. As a result, a referentially ambiguous NP can be disambiguated by these factors rather than linguistically via restrictive modification.

Figure 4.3

In fact, a recent study (Brown-Schmidt, Campana, and Tanenhaus 2002) suggests that these “Lyons-esque” situations are relatively common in conversations about visually present objects. Brown-Schmidt et al. observed that adults do not utter restrictive modifiers every time there are multiple potential referents; nearly half of all definite NPs uttered (48 percent) did not have a unique referent in the scene (e.g., “Okay, pick up the square” might be uttered in the presence of multiple squares). However, listeners’ eye movements, actions, and vocal responses all showed that they routinely achieved referential success under these conditions (e.g., picking up the correct square). Obviously, this success isn’t evidence for psychic abilities on the part of the subjects. Rather, success occurred because the shape of the discourse and the goals of the task had narrowed the field of possible referents down to one (e.g., only one of the squares was currently a plausible referent). Definite NPs containing restrictive modifiers were uttered only when multiple potential referents were currently under discussion.

Now consider the child, who is trying to learn how deictic reference works for definite and indefinite NPs. If the child has difficulty understanding how the goals of an interlocutor restrict referential domains (a reasonable assumption), the child should have specific difficulty with these Lyons-esque situations. Perhaps not surprisingly, developmental studies of definite reference show that young children (3–6 years) tend to behave egocentrically in these situations. In particular, in the absence of information that might guide a child’s referential domain to the one intended by the speaker, young children’s comprehension and productions suggest that what they assume to be the referent must also be what their interlocutor assumes to be the referent (Maratsos 1976; Karmiloff-Smith 1979). We will return to this issue in more detail when we discuss Wexler’s (2003) recent reinterpretation of these data, but here we will simply relate the egocentric account to the current put-instruction data. In particular, we now have better reason to believe our account of the child put-study. That is, children receive only sporadic (probabilistic) evidence that a definite NP (the frog) that deictically refers to a member of a set of objects of the same type will require restrictive modification (e.g., the yellow frog, the frog on the napkin). Moreover, children’s discovery of the actual contribution of restrictive modifiers requires an understanding of common ground, the dynamics of discourse, and a decent model of shared goals with interlocutors (see Clark 1993; Tanenhaus, Hanna and Chambers 2004). The findings of Maratsos (1976) and Karmiloff-Smith (1979) suggest that a child understands that a definite NP must apply to the current domain of reference, but his or her estimation of the domain of reference can become misaligned with the interlocutors’, certainly more often than an adult’s estimation in a similar situation.

With this in mind, it becomes easier to understand the child-adult differences found in the put study. Adults, when presented with a set of objects that have been labeled (a frog, a napkin, another frog, a box, another napkin), understand that this set of objects reflects the current referential domain.
Young children in such contexts should be expected to behave egocentrically, thinking when hearing the frog that the frog they are thinking of is likely to be the referent. After all, this is true in the common deictic referential situation of one entity, and in situations where other factors allow the child to be guided toward an understanding of a referential domain that is the same as the speaker’s, i.e., the correct subset. Indeed, as Trueswell et al. (1999) noted, children’s eye-fixation patterns show this egocentricity; the frog they looked to first is a fairly good predictor of which frog they return to, and act upon, in their action of putting.

Lexical and Referential Evidence in Interaction in Adults and Children
We have suggested that the referential-scene manipulations were inadequate: simply placing multiple objects of the same type within view doesn’t mean that reference to a particular member of this set will be referentially disambiguated via linguistic means (such as adding a restrictive modifier). Unfortunately, this
account does not completely explain the behavior of adults in the put study. That is, even in two-referent scenes we should have expected adults to show some temporary consideration of the goal interpretation of on the napkin, because the semantics and syntax of put support this interpretation (contra the referential context). We know from years of research in adult sentence comprehension that strong lexical biases that run against contextual/plausibility biases result in temporary misanalysis, or garden-pathing (e.g., Britt 1994; Garnsey et al. 1997; Spivey and Tanenhaus 1998; Trueswell 1996). Indeed, this is a fundamental prediction of constraint-satisfaction accounts of comprehension, which motivated our account of the developmental patterns.

One possibility worth considering quite seriously is that adults in fact did experience some difficulty with “Put the frog on the napkin into the box” in two-frog scenes, but other factors conspired to reduce this difficulty and perhaps even measurement of this difficulty. The presence of a second prepositional phrase, into the box, increases the likelihood that the first PP is an NP modifier and not a goal. Adults may be especially good at using late post-ambiguity information to revise parses. Also, the primary measurement of garden-pathing in adults (looks to the empty napkin) occurs during and after the perception of this second prepositional phrase, into the box. These factors alone could have reduced signs of considering the goal interpretation across the board in ambiguous trials (in both one-referent and two-referent scenes), with two-referent scenes approaching the floor of the measure: essentially no looks to the empty napkin. Indeed, Novick, Thompson-Schill and Trueswell (2008) have found evidence to support these conclusions. In a slightly modified visual setting in which one frog was on a napkin and another frog was in a bowl, Novick et al.
observed clear signs of consideration of the goal interpretation of on the napkin in two-referent contexts during the ambiguous phrase itself, which disappeared immediately upon hearing into the box.

The results of Novick et al. (2008) suggest that even adults may have trouble using two-referent visual scenes to completely determine a modifier interpretation of a PP when disambiguating evidence is not provided in the sentence itself. The prediction is that when lexical information supports a particular parsing preference of a globally ambiguous sentence (as in 5a below), adults ought to show difficulty and possibly even a total failure to take the referential-scene implications into account. Snedeker and Trueswell (2004) addressed this question. In the study, college-age adults heard sentences like those in (5).

(5) a. Tickle the pig with the fan. (Instrument-biased Verb)
b. Feel the frog with the feather. (Equi-biased Verb)
c. Choose the cow with the stick. (Modifier-biased Verb)
Figure 4.4
Clip-art illustrations of one-referent and two-referent scenes from Snedeker, Thorpe, and Trueswell 2001 and Snedeker and Trueswell 2004. Physical objects were used.
Here with the X could attach to the verb as an instrument or could attach to the noun phrase as a modifier. Verbs were selected on the basis of a separate sentence-completion study that evaluated how often a with-phrase would be used for these verbs as an instrument. As a result, verbs were operationally defined as likely to mention an instrument (Instrument-bias), unlikely to mention an instrument (Modifier-bias), or somewhere in between (Equi-bias). The nouns in the PP (fan, feather, stick) were normed in advance for thematic fit as an instrument for the corresponding verb. Nouns were selected that were rated as poor-to-adequate instruments in each verb class, such that average thematic fit was the same across these verb comparisons. The type of visual scene was also manipulated: one-referent vs. two-referent scenes, as shown in figure 4.4 (see note 7).

The Instrument-biased sentences are most like the put items above. This is because the verb biases support a VP-attach assignment of the PP. Just as the empty napkin was a potential goal in the put-studies, the large stand-alone object (e.g., the feather) serves as the potential instrument here; looks to and use of this object provided a measure of the VP-attach (Instrument) interpretation of the PP (e.g., with the feather).

The off-line action and eye-gaze data from 24 adults are presented in figure 4.5. On the left is plotted the proportion of trials in which an instrument action was performed by the subject (for example, the proportion of trials in which the subject picked up the potential instrument, e.g., the large feather, and used it to act on one of the animals, e.g., the frog wearing a party hat). The right bar graph shows the proportion of trials in which subjects looked at the potential instrument during the course of the trial, regardless of whether they picked it up. Consider first the action data from two-referent scenes.
If these scenes had required an NP-modifier interpretation of with the X, and adult listeners were aware of this fact, we should have seen essentially no use of instruments in these scenes. That is, subjects should have opted to use their hand to act upon the target animal, e.g., the frog holding the feather. However, we see this occurs only for Equi-bias and Modifier-bias verbs (0–10 percent instrument actions). For Instrument-biased verbs in two-frog scenes, adults showed nearly 70 percent instrument actions, far greater than the expected 0–10 percent. It appears that most adults didn’t realize that with the X could be an NP modifier in this condition, precisely because the verb so strongly suggested the Instrument analysis. If this is so, it means that the definite NP (e.g., the frog) in this condition was referentially ambiguous. Indeed, on these instrument action trials, subjects half the time acted on the target animal (e.g., the frog holding the feather) and half the time acted on the other animal (e.g., the frog wearing the party hat). Subjects who were given these items would sometimes ask “Which one?” in response to the first two-referent target trial (to which the experimenter said “Please do your best.”). Moreover, post-experiment interviews showed that subjects were behaving in a pragmatically appropriate manner in response to this referential ambiguity, often in the way Lyons (1980) suggested. When explaining (at the end of the experiment) why they acted on a particular frog, they offered the spanner strategy (“I thought you must have meant the frog closer to me”) or some other strategy (“I picked the frog not holding the feather just to be symmetrical” or “I picked randomly”). Averaging across all observations, subjects went for either animal with equal frequency.

Figure 4.5
Data from adults (N = 24). Source: Snedeker and Trueswell 2004.
Adults are not oblivious to the scene constraint on the need for a restrictive modifier. If the linguistic evidence (the verb information) isn’t heavily biased in favor of the instrument interpretation, listeners consider and choose the modifier interpretation (as evidenced by two-referent Equi and Modifier-biased conditions). And of course, one-referent scenes increased the rate of using and looking at the potential instrument in all verb types. This should be expected, since singular definite NP reference within a scene containing multiple potential referents only partially predicts whether the NP will be contrastively marked with a modifier.

What if the data we just presented had come from 5-year-olds rather than from adults? One might have concluded that 5-year-olds do not fully understand definite reference, e.g., that they don’t understand the meaning of the definite determiner the (Wexler, this volume). But these are adults, not children. The logical conclusion is that syntactic accessibility of forms, as determined by lexical factors, has a potent effect on interpretation and reference.

Of course, the question worth considering is how 5-year-olds behave in these very settings. Given that definite reference in these settings poorly constrains the structuring of an upcoming PP, it follows that 5-year-olds might be even less influenced by these contextual manipulations than adults. In addition, these same children ought to be quite sensitive to the lexical manipulations, since this information is highly predictive of structure and is hypothesized to be easy to track. The data from 5-year-olds (as reported in Snedeker and Trueswell 2004) are plotted in figure 4.6. Indeed, children perfectly matched the verb biases (they know which verbs invite an instrument interpretation) but the referential-scene manipulation had no effect on their actions and a small (non-significant) effect on their eye fixations.
These data bode well for the developmental account sketched above. In adults, hearing definite reference in these two visual settings only partially contributes to parsing commitments, and verb biases strongly constrain structuring of the linguistic input. Children have trouble using this sort of deictic referential information, but have little trouble using the lexical information (see note 8).

Notice also from the data in figure 4.6 that one gets a very different picture of what the child finds referentially ambiguous depending on the lexicosyntactic properties of the utterance. This clearly indicates that the study of what children “know” and “don’t know” about language must be embedded within a theory of how linguistic information is dynamically processed by children. This point is often overlooked in the traditional study of language acquisition (see, e.g., Wexler, this volume).
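The qualitative pattern described above, with a strong lexical cue and a weaker referential cue jointly determining attachment, can be illustrated with a toy cue-combination sketch. This is not the model used in any of the studies cited here. The cue codings, the weights, the logistic combination rule, and the names `VERB_BIAS`, `SCENE`, `LISTENERS`, `p_vp_attachment`, and `scene_effect` are all hypothetical and chosen only for illustration; none of the numbers are estimated from data.

```python
import math

# Hypothetical cue codings (illustration only, not estimated from any data).
# Positive values favor VP-attachment (instrument reading);
# negative values favor NP-attachment (modifier reading).
VERB_BIAS = {"instrument": 1.0, "equi": 0.0, "modifier": -1.0}
SCENE = {"one_referent": 0.5, "two_referent": -0.5}

# Hypothetical weights: both listeners weight the verb heavily, while the
# "child" listener gives the referential-scene cue almost no weight.
LISTENERS = {
    "adult": {"w_lexical": 3.0, "w_referential": 2.0},
    "child": {"w_lexical": 3.0, "w_referential": 0.1},
}


def p_vp_attachment(verb, scene, listener):
    """Sum the weighted cues in log-odds space and return P(VP-attachment)."""
    weights = LISTENERS[listener]
    log_odds = (weights["w_lexical"] * VERB_BIAS[verb]
                + weights["w_referential"] * SCENE[scene])
    return 1.0 / (1.0 + math.exp(-log_odds))


def scene_effect(verb, listener):
    """Change in P(VP-attachment) when moving from two- to one-referent scenes."""
    return (p_vp_attachment(verb, "one_referent", listener)
            - p_vp_attachment(verb, "two_referent", listener))


if __name__ == "__main__":
    for listener in LISTENERS:
        for verb in VERB_BIAS:
            for scene in SCENE:
                p = p_vp_attachment(verb, scene, listener)
                print(f"{listener:5s} {verb:10s} {scene:12s} P(VP-attach) = {p:.2f}")
```

Under these assumed weights, verb bias moves the predicted attachment for both listeners, while the scene cue has a visible effect only for the adult setting. The sketch is meant to show how a probabilistic weighting of cues could produce that asymmetry, not to reproduce the reported proportions.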
Figure 4.6
Data from 5-year-olds (N = 48). Source: Snedeker and Trueswell 2004.

The Effects of Discourse and Pragmatics on Child Parsing
The linguistic observations and experimental findings sketched in the previous section strongly suggest that the shape of the discourse and the goals of the interlocutors ought to be a far better predictor of the referential domain of a referential expression and hence a better predictor of the level of specificity needed for that expression. For instance, recall that Brown-Schmidt et al. (2002) found that the conversational content, when combined with the scene, helps to shape the referential domain for a listener and predicts quite well the specificity of definite reference. If a discourse guides a child listener toward conceiving of the situation in the same way as the speaker, we might expect a child to use this contextual information to guide parsing commitments (egocentrically or otherwise).

To this end, Hurewitz (2001; see also Trueswell and Gleitman 2004, 2007) asked whether potentially potent evidence from a spoken discourse can influence 5-year-olds’ parsing decisions. In the study, a preceding discourse, established by two conversing puppets, provided the goal to isolate one referent from among multiple referents in the scene before hearing an ambiguous PP. If these discourse goals provide a strong constraint on the need for the otherwise ambiguous PP to be a modifier, one might expect even 5-year-olds to be sensitive to this fact, and to combine it with lexical constraints on structure. In other words, Hurewitz (2001) asked if she could turn children into adults with this manipulation.

In the study, children (N = 24; age 4;0–5;6) were tested in a modified version of the Truth Value Judgment task (Crain and Thornton 1998). On each trial, the child heard a story acted out in the presence of a puppet (Mr. Walrus, known to be not terribly bright). At the end of the story, a second puppet (the clever Ms. Rabbit, who had been hiding under the table listening to the story), appeared and asked Mr. Walrus questions about the story. The child’s job was to evaluate and if necessary correct Mr. Walrus’s answers to her questions. On critical trials, each child was always presented with a two-referent scene (as in figure 4.7, two cats, one on a book, one on a fence, a toy barn, another fence, and a turtle; again, real-world objects, not clip-art images, were used). The story, pre-recorded and acted out by the experimenter (E), deictically referred to each animal and established the pair of cats in distinct events. It is paraphrased here:

This cat [E grabs the cat on the book] and this turtle [E grabs turtle] decided to go for a walk, and met up on top of the barn [E moves animal to barn]. Suddenly, the turtle tickled the cat. ‘Tickle, tickle tickle!’ ‘Hee! Hee! Hee!’ [E performs appropriate actions with the animals.] And then they went home. [E returns each animal to original starting place.] And, this cat [E grabs cat on fence] saw all this and laughed and laughed as well.

Figure 4.7
Illustration of objects presented to child in example stimuli from Hurewitz et al.
With each animal back in its original location, Ms. Rabbit returns to ask Mr. Walrus a question. In all conditions, Mr. Walrus’s answer contains an attachment ambiguity: I know, the turtle tickled the cat on the fence. Here on the fence can indicate where the tickling happened (Locative VP-attachment) or can indicate a particular cat (Locative NP-attachment). Mr. Walrus’s utterance, however, was preceded by a question from Ms. Rabbit that either supported the need to contrast the cats (the Contrastive Question condition, Which cat did the turtle tickle?) or did not support that goal (the Non-Contrastive Question
condition, Can you tell me something about the story?). In all cases, both interpretations of the ambiguous sentence are false because the story actually involved the cat on the book being tickled by the turtle in a different location, i.e., when they both had been on the barn. Hence, however the child parsed the sentence, he or she must still correct Mr. Walrus. It is the child’s particular correction of Mr. Walrus that can reveal the implicit parse choice (No! It happened OVER HERE on the barn! or No! THIS CAT was tickled, the one on the book!).

The Question-Type factor (Contrastive vs. Non-Contrastive) was crossed with a verb manipulation. Half the trials involved eventive verbs (such as tickle), which easily allow for locative (VP-attach) modifiers such as on the barn. The other half involved stative verbs, where the story and the critical sentence involved, e.g., liking (The turtle liked the cat on the fence.). Stative verbs do not usually permit locative modifiers, because states typically are not connected to a particular place. Given the importance of conversation constraints in modifier use (see above), the multiple-constraint account predicts that this sort of discourse manipulation (here, Q-type) and lexical information (V-type) should both influence parsing preferences, even in 5-year-olds. That is, Contrastive questions and Stative verbs should both induce greater modifier interpretations and resulting corrections by the child.

Figure 4.8 plots the proportion of NP modification interpretations exhibited by children in the four conditions. The pattern of these corrections across conditions supports our account. In particular, Contrastive questions led to many more NP modification corrections than Non-Contrastive questions did. There was also an effect of Verb type: Stative verbs led to more NP modification interpretations than eventive verbs did. This resulted in reliable effects of Question type and Verb type.
And, interestingly, adult controls exhibited an even stronger reliance on the discourse needs of the questions, with adults even coercing stative verbs into eventive readings. It should be emphasized that our account of child parsing is that it is the product of a system that automatically combines multiple evidential sources, all to serve the purpose of coming up with an estimate of the intended meaning of an utterance. If this is the case, we should expect to observe striking differences in children’s use of structures in production and comprehension tasks under particular comprehension conditions. That is, children might correctly produce a particular structure in the right referential context, but then fail to understand this same structure during comprehension, when the lexical sources support an alternative analysis.
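The crossing of Question type with Verb type in the Rabbit-Walrus study can be laid out as a small enumeration. The sketch below simply lists the four cells of the design and counts how many of the two cues in each cell are predicted, on the multiple-constraint account, to favor an NP-modifier reading. The names `QUESTION_TYPES`, `VERB_TYPES`, and `design_cells` are hypothetical, and the counts are a bookkeeping device, not reported results.

```python
from itertools import product

# Factor levels as described in the text. The boolean marks whether a level is
# predicted to support an NP-modifier reading of the ambiguous PP.
QUESTION_TYPES = {"Contrastive": True, "Non-Contrastive": False}
VERB_TYPES = {"Stative": True, "Eventive": False}


def design_cells():
    """Return the four design cells with a count (0-2) of modifier-supporting cues."""
    cells = []
    for (question, q_supports), (verb, v_supports) in product(
        QUESTION_TYPES.items(), VERB_TYPES.items()
    ):
        cells.append({
            "question": question,
            "verb": verb,
            "modifier_cues": int(q_supports) + int(v_supports),
        })
    return cells


if __name__ == "__main__":
    # List cells from most to least support for the NP-modifier reading.
    for cell in sorted(design_cells(), key=lambda c: -c["modifier_cues"]):
        print(cell)
```

On this bookkeeping, the Contrastive/Stative cell has both cues favoring NP modification and the Non-Contrastive/Eventive cell has neither, which is the ordering the account predicts for the extremes of the design.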
Figure 4.8
Proportion of trials in which 5-year-olds gave NP Modification response (e.g., “No, it was this cat, the one on the book”) (data from first block only.) Source: Hurewitz et al.
To test these claims, Hurewitz, Brown-Schmidt, Thorpe, Gleitman, and Trueswell (2000) examined 5-year-olds’ production and comprehension abilities in two-referent scenes. Children heard a story, acted out by the experimenter, that introduced salient differences between the two frogs by having them doing different things. Afterwards, they were tested by asking them a Specific question: Which frog visited Mrs. Squid’s house? Answering this question required (a) understanding the story, (b) understanding that the question requires an answer that distinguishes the frogs via locative modification, as these frogs were otherwise identical, and (c) producing in the answer a restrictive modifier; namely The frog/one on the napkin. Immediately after answering the question, the same child was asked to perform the put-task of Trueswell et al. (1999): Very good. Touch the Smiley Face. Now put the frog on the napkin into the box. As a control, another group of children were asked a General question (Can you tell me something about the story?) before doing the put task.

Children’s production performance on the Specific question showed that they were able to perform all the relevant non-linguistic and linguistic acrobatics necessary to specify a unique referent through restrictive modification: 72 percent of all answers to the Specific question were correct, providing answers like The frog on the napkin. In striking contrast, these same children’s
responses to the put instruction showed the same mis-analysis effects as those reported in Trueswell et al. 1999. They performed incorrect actions, which involved the incorrect destination, in over 70 percent of the trials. And children were at chance in selecting between the two frogs. That is, the very same child who had just correctly responded to the story question by producing a PP-modified NP (the frog on the napkin) might now, in response to Put the frog on the napkin into the box, pick up the other frog, move it over to the empty napkin, and put it into the box.

The sheer differences in complexity between the two sentences cannot account for the findings, as we know from earlier experimentation (the same children have no difficulty with unambiguous control sentences of equal complexity, e.g., Put the frog that’s on the napkin into the box). A further experiment in this line (Hurewitz et al. 2000, experiment 2) investigated the possibility that children just weren’t inclined to notice napkins as salient components of scene description. Making the platforms on which frogs were ensconced more salient (frilly umbrellas and royal thrones) generally increased performance in production (87 percent restrictive modifiers in production), but still the striking asymmetry between production and comprehension was preserved (60 percent errors in comprehension). In addition, in this version of the experiment we eye-tracked the young subjects, and the on-line results replicated those of Trueswell et al. (1999).

So, in both of these experiments, we observe, as in the Rabbit-Walrus study, children understanding how the discourse can specify the need for an NP restrictive modifier.
In particular, in the case of the Rabbit-Walrus study, we see this discourse-syntax knowledge at work in comprehension: a Contrastive question generates an increased chance of interpreting an ambiguous PP as an NP modifier, though this knowledge must battle against lexical evidence that may support an alternative interpretation (e.g., Eventive verbs generated some Question/Discourse-inappropriate responses in children). The experiments in the present section demonstrate this discourse-syntactic knowledge in children’s own productions: Contrastive questions generated a need for referential specificity in the form of a modifier (The frog/one on the napkin), which the children often uttered in response to this question type. However, when we then pull out a put sentence from our lexical arsenal, we see that we can tip the scales back to VP attachment, even in the same child who had just a moment ago demonstrated knowledge of the discourse-syntax facts in his or her own productions.

It should be noted, though, that the discourse conditions are indeed slightly different between our production and comprehension test conditions. The distinction between the frogs had just been made by the child in his or her utterance, and thus the discourse goal of contrasting the frogs had been achieved by the time we tested for comprehension abilities in our put instruction. We strongly suspect, however, that put was exerting detrimental effects, since unpublished work from our lab has examined put sentences as part of an answer to a contrastive question. (Rabbit: Which frog should I move? Walrus: I know, put the frog on the napkin into the box.) Here we still find strong VP-attachment preferences despite the immediately preceding contrastive question (Hurewitz 2001). Thus, the data strongly support the automatic use of verb preferences in the young child parsing system.

Pragmatic vs. Semantic Accounts
Our account of the child parsing data depends heavily on the pragmatics literature on deictic reference and on the conclusions drawn by Maratsos and Karmiloff-Smith regarding the development of definite reference. Specifically, faced with a definite NP, children have difficulty calculating the relevant referential domain and frequently do so egocentrically, thinking that what they take to be the referent of a definite NP must also be what their interlocutor believes to be the referent. Wexler (2003 and this volume) has offered a reinterpretation of the data of Maratsos and Karmiloff-Smith and has gone on to apply his explanation to the kindergarten-path phenomena. For motivation of his position, Wexler points to a specific study by Maratsos (replicated later by Karmiloff-Smith in French). In this experiment, children heard a story that introduced either multiple potential referents of the same type (i.e., several boys and several girls) or singletons (i.e., one boy and one girl). The story ended with an intentionally vague assertion (e.g., Someone started giggling and laughing) and a question (Who was giggling and laughing?). It was found, especially in the Karmiloff-Smith version of the study, that younger children (ages 3–7) overused definite NPs in answering the question. That is, even in a story containing multiple, undifferentiated boys and girls, these children tended to answer “The boy” or “The girl” rather than the expected “I don’t know.” On the surface, these data seem consistent with an egocentric account: a younger child has a particular boy or a particular girl in mind, and, without realizing that the addressee might not have a particular boy or a particular girl in mind, utters “the boy” or “the girl.” Wexler questions this account because the stories used in these particular studies did not introduce individual boys and girls but simply established an undifferentiated set of boys and an undifferentiated set of girls.
This makes it unlikely that the children could have focused on a particular boy or girl in their mental model.
Wexler proposes an alternative explanation of these findings, according to which the child’s deficit is semantic rather than pragmatic. He suggests that children in this age range (3–7 years) lack a full understanding of the meaning of the, and hence systematically misuse (and misinterpret) it. In particular, it is proposed that children lack the notion of maximality, which is a crucial part of the linguistic meaning of the definite determiner. Maximality as it applies to the semantics of the definite determiner can be expressed logically as follows: Given a predicate P that takes an argument X, P must be true for each and every member of the set of type X that is in the current referential domain. For instance, in order for the sentence The boy laughed to be true, each and every boy in the current referential domain must have laughed. In case there is more than one boy in this set, the definite NP must be pluralized (The boys laughed) and all the boys in the set must have laughed. Wexler claims that the child version of the lacks this notion of maximality: for children, the only requires that the predicate applies to some element in the referential domain. Thus, in the Maratsos and Karmiloff-Smith stories, children say that the character who laughed was “the boy,” even though only one of several boys in the referential domain laughed. In the case of Put the frog on the napkin into the box, the number of frogs doesn’t matter to 3–7-year-olds, and the NP the frog on its own can refer to any member of the set of frogs. Wexler’s proposal has some merit because it draws explicit connections between developmental data and semantic theories of definiteness and aims at providing a detailed account of referential development.
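The contrast between the two proposed meanings of the can be made concrete with a small sketch. The Python below is only an illustration of the maximality condition as it is paraphrased above; the function names, the dictionary representation of entities, and the toy domain are all assumptions introduced here, not part of Wexler's formalization.

```python
# Hypothetical toy model: a referential domain is a list of entities,
# each a dict with a "kind" and some properties.

def adult_the(domain, noun, predicate):
    """Maximal reading: the predicate must hold of every member of the
    noun set in the current referential domain (and the set is non-empty)."""
    members = [e for e in domain if e["kind"] == noun]
    return len(members) > 0 and all(predicate(e) for e in members)

def child_the(domain, noun, predicate):
    """Non-maximal reading attributed to children on the semantic-deficit
    account: the predicate need only hold of some member of the noun set."""
    members = [e for e in domain if e["kind"] == noun]
    return any(predicate(e) for e in members)

def laughed(entity):
    return entity["laughed"]

# Assumed story domain: several boys, only one of whom laughed.
several_boys = [
    {"kind": "boy", "laughed": True},
    {"kind": "boy", "laughed": False},
    {"kind": "girl", "laughed": False},
]

# Assumed singleton domain: one boy, who laughed.
one_boy = [
    {"kind": "boy", "laughed": True},
    {"kind": "girl", "laughed": False},
]

print(adult_the(several_boys, "boy", laughed))  # maximal reading fails
print(child_the(several_boys, "boy", laughed))  # non-maximal reading succeeds
print(adult_the(one_boy, "boy", laughed), child_the(one_boy, "boy", laughed))
```

In the singleton domain the two readings agree, which is why, on this view, the difference only surfaces in stories with several referents of the same type.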
However, we believe that the theory is unable to capture key intuitions about referential errors and does not handle the range of developmental data already collected about definite reference and resolution of syntactic ambiguity.9 First, Wexler’s semantic account appears to miss intuitions about referential errors of the sort we are interested in. On his account, in a referential domain of several boys (e.g., one crying, one laughing, one sleeping) the sentence The boy is crying is false, rather than simply infelicitous. That is, if there are multiple boys in the domain of reference (and the other boys are not crying), the predicate is false because it is not true of all members of the set boy. However, the present authors all have the strong intuition that the utterance in this situation is true but infelicitous; that is, it should have been said in a different way. Even though adult intuitions about truth vs. felicity are not always robust, it is an advantage for an independently motivated theory to be able to capture such adult judgments. Second, even though Wexler’s account moves away from pragmatic issues in the calculation of the referential domain for definite descriptions, such issues seem inescapable for anyone trying to account for the interpretation of
definites in both children and adults. (For pointers to a large linguistic and philosophical literature, see Larson and Segal 1995.) From a psycholinguistic viewpoint, we have already seen numerous cases in which the referential domain is calculated not only from the discourse context but also from the pragmatic implications of the semantic content of the utterance itself. For instance, consider again the scene in figure 4.3. What is the maximal domain for Pull the rabbit out of the hat, and why is it different for Pet the rabbit? In the latter case, we respond Which one? — why not in the former case? Furthermore, there is now good evidence that the referential domain of an utterance changes for an adult listener based on contextual demands as that utterance unfolds. In an online comprehension study with adults, Chambers, Tanenhaus, Eberhard, Filip, and Carlson (2002) showed that adult listeners found the sentence Put the duck inside the can to be felicitous in the presence of more than one can so long as one and only one can was large enough to hold the duck — so felicitous in fact, that they applied this pragmatic restriction in real time, moving their eyes to the appropriately sized can upon hearing . . . inside the . . . . Chambers et al. demonstrated that this effect changes with the goals of the speaker. In the same scene, the sentence Are you able to put the duck inside the can? resulted in looks to all the cans, often accompanied by a vocal response of Which one? In order to capture this flexibility in interpreting referential expressions, any account of reference must include what is plausible given the goals of the interlocutors as a matter of course. Such an account should be able to explain how mismatches of referential domain may give rise to puzzlement or misunderstandings between interlocutors, even when no semantic problems with definite determiners exist (as in the adult case).
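The duck-and-can pattern just described can be sketched as a referent search over a domain that is first narrowed by what the utterance makes plausible. The Python below is a minimal illustration under stated assumptions: the scene, the sizes, the names, and the function resolve_definite are all hypothetical, and treating the question form as imposing no restriction at all is a simplification made here, not a claim from Chambers et al.

```python
# Hypothetical scene: one duck and two cans of different sizes.
scene = [
    {"name": "duck", "kind": "duck", "size": 3},
    {"name": "small can", "kind": "can", "size": 2},
    {"name": "large can", "kind": "can", "size": 5},
]

def resolve_definite(scene, noun, is_plausible=lambda entity: True):
    """Return the name of the unique referent for 'the <noun>' within the
    pragmatically restricted domain, or None when the domain does not
    single out one entity (the listener would ask 'Which one?')."""
    candidates = [
        e for e in scene if e["kind"] == noun and is_plausible(e)
    ]
    if len(candidates) == 1:
        return candidates[0]["name"]
    return None

duck = scene[0]

def can_hold_duck(entity):
    # Assumed plausibility constraint for the command form:
    # only a container larger than the duck is a sensible goal.
    return entity["size"] > duck["size"]

# Command (Put the duck inside the can): the action goal restricts the domain.
print(resolve_definite(scene, "can", can_hold_duck))

# Question (Are you able to put the duck inside the can?): no such
# restriction is applied in this sketch, so both cans remain candidates.
print(resolve_definite(scene, "can"))
```

The same scene and the same noun phrase yield a unique referent under one speaker goal and none under another, which is the sense in which the referential domain is not fixed by the scene alone.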
A reasonable (and parsimonious) interpretation of children’s errors with definite descriptions, then, would be to attribute such errors mostly to pragmatic-referential factors. After all, even the youngest children do not consistently fail in referential studies of production and comprehension; sometimes they have calculated the correct referential domain. Third, the full range of developmental data on definite reference does not seem to be explained by the semantic-deficit account, even for the specific study Wexler used to motivate his account. The stories in Maratsos 1976 enumerate the members of the set, which may encourage the establishment of individuals of these sets of boys and girls. This makes it more likely that children had a particular entity in mind when using definite descriptions (e.g., the boy) in their responses. There are also important puzzles in the data: only a subset of the 4-year-olds behaved in the way predicted by Wexler (those who performed poorly on a separate sentence repetition task). This correlation with memory abilities was found only in this age group, and younger children performed
better, correctly uttering a boy in the context of several boys 83 percent of the time. Karmiloff-Smith (1979), in a very similar study in French, finds a different developmental pattern, more in line with Wexler’s account: when asked to indicate a particular boy or girl in a response to a question about a story containing several boys and several girls, young children (ages 3–7) answered le garçon/la fille (the boy/the girl) whereas older children (ages 8–10) responded predominantly with un garçon/une fille (a boy/a girl). But interestingly, two out of four adults we asked reacted to translations of these materials by answering with a specific referent (“I don’t know, Mary?”). These subjects volunteered a name despite the fact that there was no correct answer, presumably because the instructions (Guess who it was) required a single, specific referent. Indeed, Karmiloff-Smith noted that children and adults sometimes responded this way (e.g., giving their own name or the name of one of their school mates such as Juliette). This shows that even adults may make quite specific guesses about the identity of referents they cannot truly individuate. Fourth, under Wexler’s view, we would expect children to make more errors than they do with definite descriptions. In recent work that looked at children’s comprehension of implicatures, Papafragou and Tantalou (2004) presented children with stories in which a character was supposed to perform a certain action to win a prize. In one case, an elephant was given three oranges and was told that he had to eat the oranges. When the elephant came back and was asked if he had eaten the oranges, he answered I ate some. A vast majority of 5-year-olds decided the elephant should not get a prize and justified their response by indicating, e.g., that the elephant had not eaten all the oranges. This reveals that children’s preferred interpretation for the oranges obeys maximality; otherwise the contrast with some would be lost.
The full range of data on syntactic ambiguity resolution is difficult to capture under a purely semantic account. In particular, Wexler suggests that a semantic deficit accounts for the 5-year-olds’ failure to realize that a scene containing multiple frogs requires a modifier interpretation of an ambiguous phrase. However, if this is true, adults too seem to have this semantic deficit. Recall that we showed that adults frequently failed to realize that a modifier interpretation was needed in Tickle the frog with the feather in two-frog scenes (Snedeker and Trueswell 2004). Syntactic accessibility of the alternative meanings of with the feather also drives reference patterns, in both adults and children. Another way of stating this is that the semantic-deficit account must assume that very similar patterns in adults and children result from performance factors in one case and from competence factors in the other, and that the similarity between the adult and child data sets is accidental.
Instead, the picture emerging from this literature on reference and ambiguity resolution is quite different: success on reference assignment requires, among other things, rapid tracking of shifting and flexible referential domains. Children by age 4 or 5 have trouble aligning their referential domain with that of their interlocutor even though they understand the semantics of definiteness, including the maximality assumption. Indeed, such mismatches of referential domains can occur in adults (we return to this issue, and to individual differences in the adult population, in the next section). We should be clear here; we are not claiming that maximality is innate and pragmatic factors mask this fact. The pragmatic facts themselves suggest that learning how definite reference behaves requires some work. We do claim, however, that by age 5 children understand the semantics of definiteness, including the maximality assumption. Success and failure in child reference appear to be driven by successful match or mismatch in calculations of referential domain.
Conflict Resolution and Garden-Path Lingering
Here we consider briefly one of the most striking findings reported in Trueswell et al. 1999 which we have not discussed so far. In the original put study and the follow-up studies, children appear to behave impulsively, not revising their initial referential and syntactic commitments. More concretely, after hearing the sentence Put the frog on the napkin into the box, children often act upon the frog that they looked to first upon hearing the frog. Moreover, the tendency to initially interpret on the napkin as a goal is often not rescinded upon hearing the PP into the box. Five-year-olds tend to carry out actions in which a frog goes to the empty napkin. Adults rarely do this, and neither do 5-year-olds when the ambiguity is removed (Put the frog that’s on the napkin . . .). Other labs have replicated this developmental change in the ability to revise parses (Weighall 2008), and the same pattern has been observed in a language that has a substantially different grammar, Korean, which is a verb-final language (Choi and Trueswell 2010). We believe that this developmental change in revision abilities is related to developmental changes in executive function, specifically the ability to select a subordinate analysis under conditions of representational conflict (Trueswell and Gleitman 2004; Novick, Trueswell, and Thompson-Schill 2005). In particular, Novick et al. (2005) propose that these patterns represent changes in executive-function abilities generally over the course of development, especially those associated with the selection of representations under conditions of conflict (e.g., Diamond and Taylor 1996; Zelazo and Reznick 1991). Frontal-lobe regions (e.g., Left Prefrontal Cortex) appear to be implicated in this ability
(e.g., Thompson-Schill, Jonides, Marshuetz, Smith, D’Esposito, Kan, Knight, and Swick 2002). And indeed, these regions are known to be late developing anatomically, well into years 5 and 6 (e.g., Huttenlocher and Dabholkar 1997). It seems quite plausible then to consider that the re-ranking of interpretations in garden-path phenomena (inhibiting an initial interpretation, and promoting a new interpretation) would involve these very systems. And indeed, perseveration in response is the hallmark of patients with severe frontal-lobe damage — and normally developing children. For instance, in the Wisconsin Card Sorting task, participants first sort cards by one criterion (e.g., color) but then are asked to switch to sorting by another criterion (e.g., shape). Children and frontal-lobe patients have great difficulty switching between criteria, presumably because one sort of evidence was developed (color) as relevant and then must be overridden by some other evidence (shape). To address this account of garden-path recovery, we begin by noting that children are not unique in this failure to rescind garden-path interpretations. Christianson, Hollingworth, Halliwell, and Ferreira (2001) have shown this in comprehension questions about garden-path sentences. The questions were specifically designed so that the correct answer would be No for the intended meaning of the sentence as a whole, but would be answered Yes if based (erroneously) on the temporarily considered but rejected interpretation. These questions took significantly longer to answer and showed more errors than the same questions about unambiguous versions of the sentences. That is, the garden path appears to linger, as if subjects sometimes failed to completely inhibit or reject the unintended interpretation (Christianson et al. 2001). One might expect individual differences in this ability to be related to executive function generally.
Indeed, Mendelsohn (2002, 2003) has offered evidence in favor of this claim. In her study, she found that the size of subjects’ lingering garden-path effect correlated with several linguistic and nonlinguistic measures that arguably involve inhibition/selection mechanisms. Perhaps the most compelling observation was that a completely non-linguistic task, the so-called anti-saccade task, correlated with lingering garden-path measures. In this task, subjects were to look in a direction opposite of a flash of light (thus inhibiting the reflex to look to the light); performance on this task correlated with the ability to reject garden-path interpretations of sentences. Our own research group, in collaboration with Sharon Thompson-Schill, has been exploring similar issues (Novick, Kahn, Trueswell, and Thompson-Schill 2009; January, Trueswell, and Thompson-Schill 2009). For instance, Novick et al. (2009) report on an adult patient with focal damage to the Left Inferior Frontal Gyrus (LIFG), who not only has trouble with executive function tasks
but also fails to recover from parsing commitments in the put-task of Trueswell et al. (1999). That is, in response to Put the frog on the napkin into the box, he will move the frog over to the incorrect goal (the empty napkin) and then into the box, just like 5-year-old children. Likewise, he is perfectly normal in his actions for unambiguous trials, suggesting his deficit is associated with inhibiting the initial analysis of an ambiguous phrase. Indeed, he also has similar trouble with other ambiguities, such as lexical ambiguity. January et al. (2009) have found converging evidence in the form of fMRI data in normal adults (see also Ye and Zhou 2009, for similar findings) and other labs have recently reported correlations between individual differences in executive-function abilities and the ability to resolve ambiguities, for both children and adults (Brown-Schmidt 2009; Khanna and Boland 2010; Nilson and Graham 2009). (See Mazuka, Jincho, and Oishi 2009 for a review of the relationship between executive function and child language; see Novick, Trueswell, and Thompson-Schill, in press, for the literature on adults.) We believe executive function issues are deeply related to some interesting experimental findings that point to differences in deictic and anaphoric reference in young children. We mentioned above that in some studies by Maratsos (1976) young children behaved much better in their use of definite and indefinite NPs. In these studies, Maratsos again compared referential situations in which either multiple entities were present (several boy dolls and several girl dolls) or singletons (one boy and one girl). All the dolls were placed at the top of a slide, and children played a game with the experimenter in which the experimenter could send a doll down the slide in a toy car. The child had to decide which doll to send down, and tell the experimenter.
Two conditions were compared: one in which the dolls could be seen by the child and the experimenter, and one in which the dolls were shown but then placed out of view of the child. When the dolls were in view, many younger children made definite NP errors of the familiar sort (e.g., Okay, send down the boy! in the presence of multiple boy dolls). Maratsos reported that when children used a definite NP they clearly had a particular doll in mind and were looking directly at it. They also made explicit comments on the experimenter’s choice of referents, such as That’s the one! or That one’s all right, I guess. However, when the dolls were out of sight the use of indefinites increased significantly (Okay, send down a boy!). Maratsos took this as a sign that children were acting egocentrically, though it was not made entirely clear why hiding objects would discourage egocentric behavior. We suggest there may in fact be an underlying connection between visual attention and discourse focus that better explains the Maratsos out-of-sight
phenomena. In particular, if one takes seriously the informal analogy that the current referential domain in a discourse model is like an attentional mechanism, these patterns are expected. In the case of focusing attention on entities in a discourse model, attention is not spatially restricted: we can think of two boys even if they are not near each other spatially. However, looking at a particular boy in the world necessarily focuses our attention on that boy and not other boys that are spatially distant (the fovea subtends 2–3 degrees visual angle, and material that is foveated is typically what is being attended to). Deictic reference reflects an interface between a mental model of the world and these co-present objects, whose perception and visual recognition point to these same entities in the mental model. Thus, these findings could be recast as arising from differences in executive function: visual attention and discourse attention can conflict; younger children may not easily deal with this sort of conflict. Visual inspection of an object should be expected to activate this entity in the discourse model, but in many cases the current referential domain is larger or different than this attentional space, and hence such effects on the model must be inhibited. Maratsos noted that even some adults behaved in a childlike fashion when the dolls were visually co-present, saying “send down the boy” in the presence of multiple boys. This is to be expected since this sort of disambiguation by eye gaze or other means can occur. However, we would predict that some of this behavior could very well correlate with individual differences in executive function, as described above for ambiguity resolution. Interestingly, Meroni and Crain (this volume) report a related out-of-sight phenomenon in a kindergarten-path put study. 
They report that children in two-frog scenes do quite well (around 90 percent correct actions) when they are first asked to close their eyes before hearing the utterance. The authors suggest a processing explanation at the motor-planning level: closing one’s eyes gets one out of an “interpret-mode” of language comprehension and children may have trouble inhibiting motor plans made incrementally during speech perception. We are skeptical of these results and their interpretation, especially since Meroni and Crain did not compare their results to children who had their eyes open during utterance interpretation. Indeed, Weighall (2008), in a study of 60 children, reports replicating all the findings of Trueswell et al. and none of the findings of Meroni and Crain; instead, children showed signs of garden-pathing whether or not they were initially looking at the scene. We suspect that the high number of successful interpretations in the study by Meroni and Crain reflects specific properties of their study; the discourse context and the goals for the children were different in the Meroni and Crain study, and, we believe, further supported the correct interpretation.
Closing Remarks
On the basis of the work reviewed here, it should be clear that young children who are trying to comprehend language are faced with a processing problem of considerable complexity. From a sequence of words, a child must rapidly glean detailed grammatical information in order to determine not only who is doing what to whom but also how an utterance relates to their current conception of the world. There exist multiple probabilistic sources of evidence for constraining the grammatical structure of an utterance, and the child must discover, weigh, and combine this evidence. We have suggested elsewhere (e.g., in Trueswell and Gleitman 2004, 2007) that the child tracks and builds detailed syntactic and semantic representations of words that allow for the efficient recovery of the structure of the sentence as a whole. However, this gets a child or adult listener only so far. The referential implications of these analyses must also be considered, since some analyses are very unlikely given the referential setting. Here the developmental theorist and the child face the same problem: they must grapple with the fact that the interlocutors’ goals and shared conception of the world guide what is considered to be the relevant referential domain. Within the realm of deictic reference, the scene itself only partially constrains the choice of referential expressions (i.e., the level of specificity or linguistic economy offered in an utterance). What matters more is which aspects of this world are under discussion and are relevant to the intended goals of the utterance. We discussed numerous instances of adult conversation and adult comprehension in which this was shown to be the case. We then turned our attention to the implications these observations might have for a child who is learning how deictic reference works, and how reference might impact relevant syntactic choices during comprehension. 
The story we have offered is long, and admittedly complicated, but how could it be any other way? After all, reference is amazingly sensitive to a vast array of considerations, from the linguistic to the nonlinguistic, and the child must learn these facts presumably bit by bit. Indeed, we believe the complexity of the referential process explains in part why “bottom-up” sources of structure (i.e., the lexical evidence) so strongly constrain syntactic ambiguity resolution at earlier stages of development. Even after a child understands how reference works within a referential domain, the computation of that domain, which changes rapidly, is expected to be frequently misaligned with the domain currently entertained by his or her interlocutor. Thus, fairly systematic errors can be observed in definite reference both in production and comprehension, especially with regard to the syntactic concomitants of reference.
Nevertheless, we have documented several instances in which the child successfully computes the referential domain, and understands the level of specificity needed for reference within this domain. Most of these successes, we would argue, arise from circumstances in which there is a clear and reliable indicator from the conversation about the shape of the current referential domain and the goal of the utterance (e.g., questions beginning with “Which . . .” need to contrast members of a set). Under a view in which comprehension is a “guessing game” in which listeners are continuously guessing the intentions of the speaker, we might expect such a pattern: highly constraining, easy-to-discover sources of evidence trump, in developmental time, less reliable and more complex sources. We strongly suspect that other demonstrations of non-egocentric referential behaviors in young children also arise out of simpler mechanisms that unambiguously predict the attentional state of the interlocutor (e.g., eye gaze and physical constraints on perception; see Baldwin 1991 and Nadig and Sedivy 2002). Indeed, there is now growing evidence that “theory of mind” abilities emerge out of converging cues of this sort, and that simpler use of some of these cues arises in part in other species and grows during development in children (Leslie 2000; Call and Tomasello 2005). Finally, we have considered attentional development in children, and have suggested that during the presence of physical objects under discussion children are overly sensitive to their own visual/perceptual attentional state when it comes to calculating what to attend to in their discourse model. We have concluded that children between the ages of 4 and 6 years understand that a definite NP (produced or heard) must apply maximally to the current referential domain, but this domain is skewed by their attentional state.
This explains a range of phenomena in their own use of definite and indefinite reference, but it also explains their difficulty applying referential facts to parsing procedures. Moreover, developmental changes in a listener’s ability to rescind syntactic commitments may reflect this same processing change, in which competing structure(s) must be rapidly inhibited and the correct structure must be activated. This points to an explanation of individual differences in adults regarding errors in production and comprehension, which are skewed in the direction of the child-like behaviors. It is clear that we are only at the early stages of understanding how the processing demands of language production and language comprehension impinge on the language-learning process. However, methods like those discussed here, which examine how children interpret speech in real time, offer new insight into these developing processing abilities. The data suggest that there is hope for building a more unified theory of language acquisition and language use
over a lifetime — a theory in which we recognize that multiple sources of evidence must be discovered (and sometimes built) by the child, in a way that allows for immediate integration of this evidence into comprehension and production mechanisms.
Acknowledgments
Some of the research reported herein was supported by grants from the National Institutes of Health (R01-HD37507 to John Trueswell and Lila Gleitman and 5F32-MH6502 to Anna Papafragou). We would like to thank Peter Gordon of Columbia University’s Teachers College and Colin Phillips of University of Maryland at College Park for insightful comments, specifically about the development of definite reference and Wexler’s semantic account of these phenomena. We also thank David January, Samantha Crane, and Alon Hafri for their assistance in the preparation of this paper. Finally, we thank Lila Gleitman for her support and helpful comments on earlier drafts.
Notes
1. The explanation of lexically specific effects on parsing is more complicated than this. For instance, one could alternatively say that listeners (and language learners) use known semantic properties of the verb slice to compute the likelihood of an instrument phrase, rather than saying that listeners and learners track the syntactic contingencies given the word itself. It is very difficult to disentangle these two explanations since semantic and syntactic properties of verbs are so intimately related (e.g., Fisher, Gleitman, and Gleitman 1992). Also, one might argue that an individual’s compiled ‘corpus’ of their native tongue is too small to offer lexically based biases. However, even a conservative estimate of language exposure (two words a second for 5 hours a day) yields 13.1 million words per year (see Kelly and Martin 1994 for a similar estimate). This estimate is clearly conservative when one considers that television exposure alone averages 3.6 hours/day by age 3 for American children (Christakis, Zimmerman, DiGiuseppe, and McCarty 2004). Nevertheless, it is an obvious necessity for any theory of language comprehension to include semantic-contingent predictors of structures given the way that language works creatively.
Another way of stating this is that the learned semantic classes and properties of known words are also used to constrain parsing and are especially important when a word is uncommon. That is, the parser must “back off” to larger categories when subcategories have a small N and even infer a likely category when the word is novel to the language user (i.e., N = 1). For further discussion and experimentation, see Juliano and Tanenhaus 1994; Naigles, Gleitman, and Gleitman 1993; Naigles, Fowler, and Helm 1992, 1995; Trueswell, Tanenhaus, and Kello 1993; Trueswell and Gleitman 2004.
2. Research suggests that adults, when engaged in conversations, coordinate and track the referential domain of utterances, especially when this domain is determined by such
factors as common knowledge/ground. Debates exist, however, about the timing of these effects (e.g., Hanna et al. 2003; Keysar et al. 2000; Keysar et al. 1998).
3. This of course is a miracle in our theory as laid out here. See Gleitman et al. (2005) for a theory of learning “hard words,” among which we include on, in, under, of, and within. See also Nappa, Wessel, McEldoon, Gleitman, and Trueswell (2009) for how parsing and visual attention may combine to aid in the learning of such words.
4. We realize that the decision to study 5-year-olds skips over much of language learning. And indeed the initial choice of this age range was driven largely by methodological limitations (younger children were less willing to wear the eye-tracking visor). Nevertheless, several findings suggest that important developmental changes in parsing and interpretation extend well through this age range (e.g., Chomsky 1969), findings which have not been adequately studied or explained.
5. Trueswell et al. (1999) used in the box rather than into the box. The use of the preposition in introduces another ambiguity, which we wish to avoid. All follow-up studies (e.g., Hurewitz et al. 2000) have used into and have replicated the results reported here. For simplicity we adopt in our examples the use of into.
6. Put is a very common verb in child-directed speech and is frequently accompanied by a Goal PP (see Trueswell et al. 1999 for corpus evidence).
7. Verb type was manipulated between subjects such that one group of subjects got Instrument-biased stimuli, another group got Equi-biased stimuli, and a third got Modifier-biased stimuli (embedded in numerous filler trials). For each verb-type, half the trials were two-referent scenes and half were one-referent scenes (with item-condition pairings counterbalanced across subjects).
8. It appears that the earlier put studies may have had additional factors that reduced garden-pathing generally in adults but not in children: specifically post-ambiguity information and prosodic information. Consistent with this, Choi and Mazuka (2003) provided evidence that prosodic information has little effect on children’s interpretation of syntactic ambiguity in Korean, and Snedeker et al. (2003) found significant but weak effects of prosody on PP-attachment ambiguity resolution in children. Also, post-ambiguity (i.e., disambiguating) information might be especially effective for adults in parsing, an issue we return to later in the paper. Put the X on the ... and Tickle the X with the ... also differ in the type of semantic properties that are competing with each other in the ambiguity. It is possible that the put ambiguity is easier to revise because both interpretations involve the location of an object (see Snedeker and Trueswell 2004).
9. Also, in a recent test of Wexler’s maximality account, Munn, Miller, and Schmitt (2006) offer some experimental data that do not support the maximality account and instead support a more traditional analysis of child definite reference.
References
Allopenna, P. D., Magnuson, J. S., and Tanenhaus, M. K. 1998. Tracking the time course of spoken word recognition: Evidence for continuous mapping models. Journal of Memory and Language 38, 419–439.
Altmann, G., and Kamide, Y. 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73, 247–264.
Altmann, G., and Steedman, M. 1988. Interaction with context during human sentence processing. Cognition 30, 191–238.
Aslin, R. N., Saffran, J. R., and Newport, E. L. 1998. Computation of conditional probability statistics by 8-month old infants. Psychological Science 9, 321–324.
Baldwin, D. A. 1991. Infant contribution to the achievement of joint reference. Child Development 62, 875–890.
Britt, M. A. 1994. The interaction of referential ambiguity and argument structure in the parsing of prepositional phrases. Journal of Memory and Language 33, 251–283.
Brown-Schmidt, S. 2009. The role of executive function in perspective-taking during on-line language comprehension. Psychonomic Bulletin & Review 16, 893–900.
Brown-Schmidt, S., Campana, E., and Tanenhaus, M. K. 2002. Reference resolution in the wild: On-line circumscription of referential domains in a natural, interactive problem-solving task. In Proceedings of the 24th Annual Conference of the Cognitive Science Society. Erlbaum.
Call, J., and Tomasello, M. 2005. Reasoning and thinking in nonhuman primates. In K. Holyoak and B. Morrison (eds.), Cambridge Handbook of Thinking and Reasoning. Cambridge University Press.
Christianson, K., Hollingworth, A., Halliwell, J., and Ferreira, F. 2001. Thematic roles assigned along the garden path linger. Cognitive Psychology 42, 368–407.
Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., and Carlson, G. N. 2002. Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language 47 (1), 30–49.
Choi, Y., and Mazuka, R. 2003. Young children’s use of prosody in sentence parsing. Journal of Psycholinguistic Research 32 (2), 197–217.
Choi, Y., and Trueswell, J. C. 2010. Children’s (in)ability to recover from garden-paths in a verb-final language: Evidence for developing control in sentence processing. Journal of Experimental Child Psychology 106 (1), 41–61.
Chomsky, C. 1969. Acquisition of Syntax in Children from 5 to 10. MIT Press.
Christakis, D. A., Zimmerman, F. J., DiGiuseppe, D. L., and McCarty, C. A. 2004. Early television exposure and subsequent attentional problems in children. Pediatrics 113 (4), 708–713.
Clark, H. H. 1993. Arenas of Language Use. University of Chicago Press.
Crain, S. 1980. Pragmatic Constraints on Sentence Comprehension. Ph.D. dissertation, University of California, Irvine.
Crain, S., and Steedman, M. 1985. On not being led up the garden path: The use of context by the psychological parser. In D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives. Cambridge University Press.
Crain, S., and Thornton, R. 1998. Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. MIT Press.
Diamond, A., and Taylor, C. 1996. Development of an aspect of executive control: Development of the abilities to remember what I said and to “Do as I say, not as I do.” Developmental Psychobiology 29, 315–334.
Fernald, A. 2004. The search for the object begins at the verb. Talk presented at 29th Annual Boston University Conference on Language Development.
Fernald, A., Zangl, R., Portillo, A. L., and Marchman, V. A. 2008. Looking while listening: Using eye movements to monitor spoken language comprehension by infants and young children. In I. Sekerina, E. Fernandez, and H. Clahsen (eds.), Developmental Psycholinguistics: On-Line Methods in Children’s Language Processing. John Benjamins.
Fisher, C., Hall, G., Rakowitz, S., and Gleitman, L. 1994. When it is better to receive than to give. Lingua 92, 333–375.
Fisher, C., Gleitman, H., and Gleitman, L. 1992. On the semantic content of subcategorization frames. Cognitive Psychology 23 (3), 331–392.
Frazier, L. 1987. Sentence processing: A tutorial review. In M. Coltheart (ed.), Attention and Performance XII: The Psychology of Reading. Erlbaum.
Frazier, L., and Fodor, J. D. 1978. The sausage machine: A new two-stage parsing model. Cognition 6, 291–325.
Garnsey, S. M., Pearlmutter, N. J., Myers, E., and Lotocky, M. A. 1997. The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language 37, 58–93.
Gerken, L. A. 2002. Early sensitivity to linguistic form. Annual Review of Language Acquisition 2, 1–36.
Gillette, J., Gleitman, L., Gleitman, H., and Lederer, A. 1999. Human simulations of vocabulary learning. Cognition 73, 153–190.
Gleitman, L. 1990. The structural sources of verb learning. Language Acquisition 1, 3–35.
Gleitman, L. R., Cassidy, K., Nappa, R., Papafragou, A., and Trueswell, J. C. 2005. Hard words. Language Learning and Development 1 (1), 23–64.
Gómez, R. L. 2002. Variability and detection of invariant structure. Psychological Science 13, 431–436.
Gómez, R. L., and Gerken, L. A. 2000. Infant artificial language learning and language acquisition. Trends in Cognitive Sciences 4, 178–186.
Hanna, J., Tanenhaus, M. K., and Trueswell, J. C. 2003. The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language 49, 43–61.
Harris, Z. 1957. Co-occurrence and transformation in linguistic structure. Language 33, 283–340.
Hurewitz, F. 2001. Developing the Ability to Resolve Syntactic Ambiguity. Ph.D. dissertation, University of Pennsylvania.
Hurewitz, F., Brown-Schmidt, S., Thorpe, K., Gleitman, L., and Trueswell, J. 2000. One frog, two frog, red frog, blue frog: Factors affecting children’s syntactic choices in production and comprehension. Journal of Psycholinguistic Research 29, 597–626.
Huttenlocher, P. R., and Dabholkar, A. S. 1997. Regional differences in synaptogenesis in human cerebral cortex. Journal of Comparative Neurology 387, 167–178.
Jackendoff, R. 2002. Foundations of Language. Oxford University Press.
January, D., Trueswell, J. C., and Thompson-Schill, S. L. 2009. Co-localization of Stroop and syntactic ambiguity resolution in Broca’s area: Implications for the neural basis of sentence processing. Journal of Cognitive Neuroscience 21 (12), 2434–2444.
Juliano, C., and Tanenhaus, M. 1994. A constraint-based lexicalist account of the subject/object attachment preference. Journal of Psycholinguistic Research 23 (6), 459–471.
Karmiloff-Smith, A. 1979. A Functional Approach to Child Language: A Study of Determiners and Reference. Cambridge University Press.
Kelly, M. H., and Martin, S. 1994. Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language. Lingua 92, 105–140.
Keysar, B., Barr, D. J., Balin, J. A., and Brauner, J. S. 2000. Taking perspective in conversation: The role of mutual knowledge in comprehension. Psychological Science 11, 32–37.
Keysar, B., Barr, D. J., and Horton, W. S. 1998. The egocentric basis of language use: Insights from a processing approach. Current Directions in Psychological Science 7, 46–50.
Khanna, M. M., and Boland, J. E. 2010. Children’s use of language context in lexical ambiguity resolution. Quarterly Journal of Experimental Psychology 63 (1), 160–193.
Kim, A., Srinivas, B., and Trueswell, J. C. 2002. The convergence of lexicalist perspectives in psycholinguistics and computational linguistics. In P. Merlo and S. Stevenson (eds.), Sentence Processing and the Lexicon: Formal, Computational and Experimental Perspectives. John Benjamins.
Larson, R., and Segal, G. 1995. Knowledge of Meaning. MIT Press.
Leslie, A. M. 2000. Theory of mind as a mechanism of selective attention. In M. Gazzaniga (ed.), The New Cognitive Neuroscience, second edition. MIT Press.
Lyons, C. G. 1980. The meaning of the English definite article. In J. Van der Auwera (ed.), The Semantics of Determiners. Croom Helm.
Lyons, C. G. 1999. Definiteness. Cambridge University Press.
MacDonald, M. C., Pearlmutter, N. J., and Seidenberg, M. S. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101, 676–703.
Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19, 313–330.
Maratsos, M. P. 1976. The Use of Definite and Indefinite Reference in Young Children: An Experimental Study of Semantic Acquisition. Cambridge University Press.
Mazuka, R., Jincho, N., and Oishi, H. 2009. Development of executive control and language processing. Language and Linguistics Compass 3 (1), 59–89.
Mendelsohn, A. 2002. Individual Differences in Ambiguity Resolution: Working Memory and Inhibition. Ph.D. dissertation, Northeastern University.
Mendelsohn, A. 2003. The role of inhibition in sentence processing as demonstrated by a new task. Paper presented at CUNY Conference on Human Sentence Processing, Cambridge.
Mintz, T. H., Newport, E. L., and Bever, T. G. 2002. The distributional structure of grammatical categories in speech to young children. Cognitive Science 26, 393–424.
Munn, A., Miller, K., and Schmitt, C. 2006. Maximality and Plurality in Children’s Interpretations of Definites. In D. Bamman, T. Magnitskaia, and C. Zaller (eds.), Boston University Conference on Language Development (BUCLD) Proceedings 30, pp. 377–387.
Nadig, A. S., and Sedivy, J. C. 2002. Evidence of perspective-taking constraints in children’s on-line reference resolution. Psychological Science 13 (4), 329–336.
Naigles, L. G., Gleitman, L. R., and Gleitman, H. 1993. Children acquire word meaning components from syntactic evidence. In E. Dromi (ed.), Language and Cognition: A Developmental Perspective. Ablex.
Naigles, L. G., Fowler, A., and Helm, A. 1992. Developmental shifts in the construction of verb meanings. Cognitive Development 7 (4), 403–427.
Naigles, L. G., Fowler, A., and Helm, A. 1995. Syntactic bootstrapping from start to finish with special reference to Down syndrome. In M. Tomasello and W. Merriman (eds.), Beyond Names for Things: Young Children’s Acquisition of Verbs. Erlbaum.
Nappa, R., Wessel, A., McEldoon, K. L., Gleitman, L. R., and Trueswell, J. C. 2009. Use of speaker’s gaze and syntax in verb learning. Language Learning and Development 5 (4), 203–234.
Nation, K., Marshall, C., and Altmann, G. T. M. 2003. Investigating individual differences in children’s real-time sentence comprehension using language-mediated eye movements. Journal of Experimental Child Psychology 86, 314–329.
Nilsen, E. S., and Graham, S. A. 2009. The relations between children’s communicative perspective-taking and executive functioning. Cognitive Psychology 58, 220–249.
Novick, J. M., Kahn, I., Trueswell, J. C., and Thompson-Schill, S. 2009. A case for conflict across multiple domains: Memory and language impairments following damage to ventrolateral prefrontal cortex. Cognitive Neuropsychology 26 (6), 527–567.
Novick, J. M., Kim, A., and Trueswell, J. C. 2003. Studying the grammatical aspects of word recognition: Lexical priming, parsing and syntactic ambiguity resolution. Journal of Psycholinguistic Research 32 (1), 57–75.
Novick, J. M., Trueswell, J. C., and Thompson-Schill, S. 2005. Toward the neural basis of parsing: Prefrontal cortex and the role of selectional processes in language comprehension. Journal of Cognitive, Affective, and Behavioral Neuroscience 5 (3), 263–281.
Novick, J. M., Thompson-Schill, S., and Trueswell, J. C. 2008. Putting lexical constraints in context into the visual-world paradigm. Cognition 107 (3), 850–903.
Novick, J. M., Trueswell, J. C., and Thompson-Schill, S. In press. Broca’s area and language processing: Evidence for the cognitive control connection. Language and Linguistics Compass.
Papafragou, A., and Tantalou, N. 2004. Children’s computation of implicatures. Language Acquisition 12, 71–82.
Saffran, J. R. 2001. Words in a sea of sounds: The output of statistical learning. Cognition 81, 149–169.
Saffran, J. R. 2002. Constraints on statistical language learning. Journal of Memory and Language 47, 172–196.
Saffran, J. R., Aslin, R. N., and Newport, E. L. 1996. Statistical learning by 8-month old infants. Science 274, 1926–1928.
Snedeker, J., Thorpe, K., and Trueswell, J. C. 2002. On choosing the parse with the scene: The role of visual context and verb bias in ambiguity resolution. In Proceedings of the 22nd Annual Conference of the Cognitive Science Society, Edinburgh.
Snedeker, J., and Trueswell, J. C. 2004. The developing constraints on parsing decisions: The role of lexical-biases and referential scenes in child and adult sentence processing. Cognitive Psychology 49 (3), 238–299.
Spivey-Knowlton, M., and Sedivy, J. 1995. Resolving attachment ambiguities with multiple constraints. Cognition 55, 227–267.
Spivey, M. J., and Tanenhaus, M. K. 1998. Syntactic ambiguity resolution in discourse: Modeling the effects of referential context and lexical frequency. Journal of Experimental Psychology: Learning, Memory and Cognition 24 (6), 1521–1543.
Spivey, M. J., Tanenhaus, M. K., Eberhard, K. M., and Sedivy, J. C. 2002. Eye movements and spoken language comprehension: Effects of visual context on syntactic ambiguity resolution. Cognitive Psychology 45, 447–481.
Srinivas, B., and Joshi, A. K. 1999. Supertagging: An approach to almost parsing. Computational Linguistics 25 (2), 237–265.
Steedman, M. 2000. The Syntactic Process. MIT Press.
Stone, M., and Webber, B. 1998. Textual economy through close coupling of syntax and semantics. In Proceedings of INLG.
Swingley, D., Pinto, J. P., and Fernald, A. 1999. Continuous processing in word recognition at 24 months. Cognition 71, 73–108.
Taraban, R., and McClelland, J. 1988. Constituent attachment and thematic role assignment in sentence processing: Influences of content-based expectations. Journal of Memory and Language 27, 1–36.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634.
Tanenhaus, M. K., Hanna, J., and Chambers, C. 2004. Referential domains in spoken language comprehension: Using eye movements to bridge the product and action traditions. In J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press.
Thompson-Schill, S. L., Jonides, J., Marshuetz, C., Smith, E. E., D’Esposito, M., Kan, I. P., Knight, R. T., and Swick, D. 2002. Effects of frontal lobe damage on interference effects in working memory. Cognitive, Affective and Behavioral Neuroscience 2, 109–120.
Trueswell, J. C. 1996. The role of lexical frequency in syntactic ambiguity resolution. Journal of Memory and Language 35, 566–585.
Trueswell, J. C., and Gleitman, L. 2004. Children’s eye movements during listening: Developmental evidence for a constraint-based theory of sentence processing. In J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press.
Trueswell, J. C., and Gleitman, L. R. 2007. Learning to parse and its implications for language acquisition. In G. Gaskell (ed.), Oxford Handbook of Psycholinguistics. Oxford University Press.
Trueswell, J. C., and Kim, A. E. 1998. How to prune a garden-path by nipping it in the bud: Fast-priming of verb argument structures. Journal of Memory and Language 39, 102–123.
Trueswell, J. C., Sekerina, I., Hill, N. M., and Logrip, M. L. 1999. The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition 73, 89–134.
Trueswell, J. C., and Tanenhaus, M. K. 1991. Tense, temporal context and syntactic ambiguity resolution. Language and Cognitive Processes 6 (4), 303–338.
Trueswell, J. C., and Tanenhaus, M. K. 1994. Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In C. Clifton, K. Rayner, and L. Frazier (eds.), Perspectives on Sentence Processing. Erlbaum.
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. 1994. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33, 285–318.
Trueswell, J. C., Tanenhaus, M. K., and Kello, C. 1993. Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory and Cognition 19 (3), 528–553.
Weighall, A. R. 2008. On still being led down the kindergarten path: Children’s processing of structural ambiguities. Journal of Experimental Child Psychology 99, 75–95.
Wexler, K. 2003. Maximal trouble. Paper presented at CUNY Human Sentence Processing Conference, Boston.
Ye, Z., and Zhou, X. 2009. Conflict control during sentence comprehension: fMRI evidence. NeuroImage 48, 280–290.
Yuan, S., and Fisher, C. 2009. “Really? she blicked the baby?”: Two-year-olds learn combinatorial facts about verbs by listening. Psychological Science 20 (5), 619–626.
Zelazo, P. D., and Reznick, J. S. 1991. Age-related asynchrony of knowledge and action. Child Development 62 (4), 719–735.
5
Parsing, Grammar, and the Challenge of Raising Children at LF
Julien Musolino and Andrea Gualmini
In this paper, we tie together two lines of psycholinguistic research represented by work on sentence processing and language acquisition. Until very recently, these two lines of inquiry have mostly proceeded independently from one another. That is, investigators concerned with the operation of the language-comprehension system have rarely asked when or how the processing mechanisms uncovered in adults develop in the course of language acquisition (but see Trueswell, Sekerina, Hill, and Logrip 1999; Meroni and Crain, this volume). Conversely, researchers concerned with the study of grammatical development, who often rely on language-comprehension paradigms, have traditionally focused on children’s grammatical competence, overlooking the study of linguistic performance in young children (but see Crain, Ni, and Conway 1994; Crain and Thornton 1998). In this paper, we observe that when the study of language development is brought together with the study of sentence processing, the picture that emerges is substantially more interesting for both sides. First, we show that the study of grammatical development cannot make progress without a careful consideration of the child’s developing processing abilities. Moreover, we argue that differences between children and adults in sentence comprehension can be used to uncover important aspects of children’s grammatical knowledge. Second, we show that data from child language can be used to broaden the empirical basis upon which hypotheses on adult sentence processing can be evaluated. In order to accomplish these goals, we follow a traditional approach in the field of psycholinguistics which consists in studying ambiguity resolution as a means to investigate the operations underlying language comprehension (e.g., Fodor, Bever, and Garrett 1974; Frazier and Rayner 1982; Frazier and Fodor 1978).
Here, we extend this approach to the study of child language by focusing on the case of ambiguous sentences containing multiple scope-bearing operators, i.e., negation and quantified noun phrases (e.g., some dogs, every boy). The discussion is organized as follows. The second section introduces scope
ambiguity as it pertains to negation and quantified NPs. The third section presents the results of previous studies by Musolino (1998) and Musolino, Crain and Thornton (2000) designed to investigate the way preschoolers (and adults) resolve the scope ambiguities discussed in the second section. The main developmental finding emerging from the work of Musolino and colleagues is the observation that children systematically rely on the surface position of the relevant quantificational elements in their calculation of scope relations. In other words, if a quantificational NP occurs within the scope of negation in the surface syntax, it will correspondingly be interpreted in the scope of negation, and vice versa for an NP occurring outside the scope of negation. This is described by the so-called Observation of Isomorphism (Musolino 1998). The remainder of the chapter is devoted to an exploration of the causes, limits, and implications of this observation. The fourth section discusses whether children’s interpretation is constrained by the linear order between the relevant quantificational elements or the c-command relations holding between them. The fifth section is concerned with the causes of isomorphism. First, we review previous accounts of the phenomenon suggesting that isomorphism reflects a grammatical difference between children and adults. Then we turn to more recent experimental evidence which challenges the grammatical accounts of isomorphism. The sixth section investigates the limits of the phenomenon further, by showing that not all quantificational NPs give rise to isomorphism effects, and in particular that preschoolers are sensitive to the presuppositional status of the NPs involved. Finally, the seventh section considers the implications of the research and findings discussed throughout the paper for both theories of language acquisition and theories of language processing.
Scope Relations: The Interaction of Negation and Quantified NPs
An often-discussed property of quantificational expressions is their ability to interact with one another and with other scope-bearing elements to create scope ambiguity (e.g., Horn 1989; Jackendoff 1972; May 1977). Consider for example the sentence in (1).
(1) Every student can’t afford a new car.
a. ∀x [student (x) → ¬ can afford a new car (x)]
b. ¬∀x [student (x) → can afford a new car (x)]
On one reading, (1) can be paraphrased as Every student is such that he or she cannot afford a new car. In this case, the universally quantified subject is interpreted outside the scope of negation (abbreviated ‘every > not’), as indicated by the representation in (1a). We call this an isomorphic interpretation, since in
this case the scope relation between every student and negation can be directly read off their surface position. On another reading, (1) can be paraphrased as Not every student can afford a new car. Here, every student is interpreted within the scope of negation (abbreviated ‘not > every’), as shown in (1b). We call this a non-isomorphic interpretation, since in this case the surface position of every student and negation does not coincide with their relative position at the level of semantic interpretation. The example in (2) shows that the availability of non-isomorphic interpretations depends in part on the lexical nature of the quantificational element involved. That is, replace every student in (1) with some students and the sentence is no longer perceived to be ambiguous.
(2) Some students can’t afford a new car.
∃x [students (x) ∧ ¬ can afford a new car (x)]
The most natural, if not the only, interpretation of (2) is an isomorphic interpretation on which the sentence can be paraphrased as There are some students who cannot afford a new car (i.e., ‘some > not’). Another factor determining the availability of non-isomorphic interpretations is the syntactic position of the quantified expression. This can be seen by comparing (1), which finds the universally quantified NP in subject position and is perceived to be ambiguous, and (3), which finds the universally quantified NP in object position and is not perceived to be ambiguous.
(3) The professor didn’t talk to every student.
¬∀x [student (x) → talked to (professor, x)]
Indeed, the most natural interpretation of (3) is one on which the professor talked to some of the students but not to others; in other words, every student is interpreted within the scope of negation (i.e., ‘not > every’), an isomorphic interpretation. As in the case of quantified subjects, the lexical nature of a quantified object affects its scopal properties with respect to negation.
If every student in (3) is replaced by some students, as in (4), the most natural interpretation becomes a non-isomorphic one on which (4) can be paraphrased as There are some students to whom the professor didn’t talk (i.e., ‘some > not’).
(4) The professor didn’t talk to some students.
∃x [students (x) ∧ ¬ talked to (professor, x)]
Finally, a numerally quantified object gives rise to scopal ambiguity and hence to both an isomorphic and a non-isomorphic interpretation, as shown by the indefinite two students.
(5) The professor didn’t talk to two students.
a. ¬∃2x [students (x) ∧ talked to (professor, x)]
b. ∃2x [students (x) ∧ ¬ talked to (professor, x)]
The example in (5) can be paraphrased as meaning that it is not the case that the professor talked to two students (for instance, the professor may have talked to only one student). In this case, two students receives a narrow-scope interpretation with respect to negation (abbreviated ‘not > two’), which corresponds to an isomorphic interpretation. Alternatively, (5) can be paraphrased as meaning that there are two particular students to whom the professor didn’t talk. Here, two students receives wide scope with respect to negation (abbreviated ‘two > not’), which corresponds to a non-isomorphic interpretation.
In sum, quantified expressions interact with other logical operators such as negation. These interactions result in complex interpretive patterns determined in part by the nature of the quantificational expressions involved and by their syntactic position.
In order to investigate children’s (and adults’) interpretation of sentences containing negation and quantified NPs, the studies presented in this paper employ an experimental technique known as the Truth Value Judgment Task (TVJT) (see Crain and Thornton 1998). The TVJT was specifically developed to investigate the meanings that children assign to the sentences of their language. The TVJT typically involves two experimenters. The first experimenter acts out short stories in front of the subjects using small toys and props. The second experimenter plays the role of a puppet who watches the stories alongside the subjects. At the end of the story, the puppet makes a statement about what he thinks happened in the story. The subjects’ role is to determine whether the puppet is “right” or “wrong” about the story.
Finally, the subjects are asked to justify their answers by explaining why they think the puppet was right or wrong.1 In all the experiments described in this paper, children witnessed stories that were acted out in front of them using small toys and props. Participants typically heard four test sentences (i.e., four different examples of the construction under investigation) and four filler/control sentences. Adult controls were shown the same stories as the ones witnessed by children, but instead of having the stories acted out in front of them they were shown video recordings of the stories.
The Observation of Isomorphism
In order to determine when and how preschool children become aware of the complex mappings between form and meaning involved in sentences containing quantified expressions and negation, Musolino (1998) and Musolino
et al. (2000) tested children’s (and adults’) comprehension of sentences like (6)–(9).
(6) Every horse didn’t jump over the fence.
(7) The Smurf didn’t buy every orange.
(8) Some girls didn’t ride on the merry-go-round.
(9) The detective didn’t find someone.
One of the stories used to test subjects’ comprehension of sentences like (6) involved three horses trying to jump over a fence. Two of the horses jumped over the fence but the third one didn’t. At the end of the story, a puppet described the situation as in (6). Notice that (6) is true on the non-isomorphic (i.e., ‘not > every’) interpretation, since it is true that not all of the horses jumped over the fence. However, the puppet’s statement is false on the ‘every > not’ (isomorphic) interpretation, since two horses did jump over the fence. A ‘yes’ response to the puppet’s statement (along with appropriate justification) would therefore indicate that subjects are accessing the ‘not > every’ (non-isomorphic) interpretation; a ‘no’ response (along with appropriate justification) would indicate that they are accessing the ‘every > not’ (isomorphic) interpretation. Musolino et al. (2000) tested a group of 20 English-speaking children between the ages of 4;0 and 7;3 (mean age 5;11) and a control group of adults and found that, whereas adults always accepted the puppet’s statement (showing that they could easily assign these sentences a non-isomorphic interpretation), children accepted the puppet’s statements only 7.5 percent of the time. When asked to justify their answers, children typically said that the puppet was wrong because two of the horses did jump over the fence.
Children, therefore, unlike adults, systematically accessed the isomorphic interpretation of sentences like (6).²
One of the stories used to test subjects’ interpretation of sentences like (7) involved a Smurf who went to the grocery store to buy some fruit.³ The Smurf considered buying apples and oranges and ended up buying one of three oranges. At the end of the story, the puppet described the situation as follows: “The Smurf didn’t buy every orange.” Notice that this statement is true on an isomorphic (‘not > every’) interpretation, since the Smurf didn’t buy all the oranges. However, the statement is false on a non-isomorphic (‘every > not’) interpretation, since it is false that the Smurf bought none of the oranges. Therefore, a ‘yes’ response to the puppet’s statement would indicate that subjects interpret every orange in the scope of negation, an isomorphic interpretation. A ‘no’ answer, on the other hand, along with appropriate justification, would indicate that subjects assign the sentence an ‘every > not’ interpretation.
What Musolino et al. found in the case of sentences like (7) is that a group of 20 English-speaking children ranging in age between 3;11 and 6;0 (mean age 4;10) accepted the puppet’s statements 85 percent of the time, and therefore systematically assigned sentences like “The Smurf didn’t buy every orange” a ‘not > every’ interpretation. Moreover, when asked to justify their answers, children pointed to the fact that the Smurf had bought only one of the three oranges.⁴
In one story used to test sentences like (8), each of three girls considered riding on a merry-go-round but only one of them ended up doing so. At the end of the story, the puppet described the situation as follows: “Some girls didn’t ride on the merry-go-round.”⁵ In this case, the puppet’s statement is true on an isomorphic interpretation (‘some > not’), since there were indeed some girls who didn’t ride on the merry-go-round. On the other hand, a ‘not > some’ (non-isomorphic) interpretation falsifies the puppet’s statement, since one of the girls did ride on the merry-go-round. Musolino (1998) tested a group of 20 English-speaking children ranging between the ages of 4;0 and 6;2 (mean age 4;10) and found that they assigned sentences like “Some girls didn’t ride on the merry-go-round” a ‘some > not’ (isomorphic) interpretation 100 percent of the time.
One story used to test sentences like (9) involved a detective playing hide-and-seek with two of his friends. As the story unfolded, the detective found one of his friends but failed to find the other. At the end of the story, the puppet described the situation as follows: “The detective didn’t find someone.” Notice that this statement is true on a non-isomorphic interpretation, since there is indeed someone that the detective didn’t find. However, the puppet’s statement is false on an isomorphic interpretation, since it was not the case that the detective didn’t find anyone.
Therefore, a ‘yes’ response to the puppet’s statement would indicate that subjects access a non-isomorphic interpretation, while a ‘no’ response would indicate an isomorphic interpretation. The subjects were 30 English-speaking children ranging from 3;10 to 6;6.⁶ The experiment also involved a control group of adults. Adult subjects always accepted statements like (9), showing that they were systematically assigning such sentences a non-isomorphic interpretation. By contrast, children accepted the puppet’s statements significantly less often (i.e., 65 percent of the time for the 5-year-olds and 35 percent of the time for the 4-year-olds). When asked to justify their negative answers, children typically explained that the puppet was wrong because the detective had indeed found someone. Thus, children, unlike adults, often assigned statements like (9) an isomorphic interpretation.
Children’s interpretation of sentences like (6)–(9) led Musolino (1998) to propose “The Observation of Isomorphism” as a descriptive generalization. This observation states that children, unlike adults, have a strong tendency to
interpret negation and quantified NPs on the basis of their surface syntactic position. See (10).
(10) Sentence type⁷                                   Children   Adults
     Every horse didn’t jump over the fence           ∀¬         ¬∀
     The Smurf didn’t buy every orange                ¬∀         ¬∀
     Some girls didn’t ride on the merry-go-round     ∃¬         ∃¬
     The detective didn’t find someone                ¬∃         ∃¬
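The truth conditions behind (10) can be checked mechanically. The following Python sketch is an illustration only and is not part of the original studies: it encodes each story described above as a small set-based model and evaluates both scope readings of the test sentence. The names (`Trial`, `evaluate`, the individual labels) are hypothetical, and treating "some"/"someone" as a plain existential ("at least one") is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# Each reading takes the quantifier's domain and the set of individuals
# for which the un-negated predicate holds in the story.
def every_not(domain, holds):   # every > not: none of them did it
    return all(x not in holds for x in domain)

def not_every(domain, holds):   # not > every: not all of them did it
    return not all(x in holds for x in domain)

def some_not(domain, holds):    # some > not: at least one did not do it
    return any(x not in holds for x in domain)

def not_some(domain, holds):    # not > some: none of them did it
    return not any(x in holds for x in domain)

READINGS: Dict[str, Callable] = {
    "every>not": every_not, "not>every": not_every,
    "some>not": some_not, "not>some": not_some,
}

@dataclass(frozen=True)
class Trial:
    sentence: str
    domain: FrozenSet[str]   # e.g. the horses, oranges, girls, friends
    holds: FrozenSet[str]    # those that jumped / were bought / rode / were found
    isomorphic: str          # reading matching surface position
    non_isomorphic: str      # inverse reading

def evaluate(trial: Trial) -> Tuple[bool, bool]:
    """Return (truth of isomorphic reading, truth of non-isomorphic reading)."""
    iso = READINGS[trial.isomorphic](trial.domain, trial.holds)
    non_iso = READINGS[trial.non_isomorphic](trial.domain, trial.holds)
    return iso, non_iso

TRIALS = [
    Trial("(6) Every horse didn't jump over the fence",
          frozenset({"h1", "h2", "h3"}), frozenset({"h1", "h2"}),
          "every>not", "not>every"),
    Trial("(7) The Smurf didn't buy every orange",
          frozenset({"o1", "o2", "o3"}), frozenset({"o1"}),
          "not>every", "every>not"),
    Trial("(8) Some girls didn't ride on the merry-go-round",
          frozenset({"g1", "g2", "g3"}), frozenset({"g1"}),
          "some>not", "not>some"),
    Trial("(9) The detective didn't find someone",
          frozenset({"f1", "f2"}), frozenset({"f1"}),
          "not>some", "some>not"),
]

for t in TRIALS:
    iso, non_iso = evaluate(t)
    print(f"{t.sentence}: isomorphic={iso}, non-isomorphic={non_iso}")
```

In this sketch, a "yes" response to the puppet is consistent only with the reading that comes out true, which is why the stories for (6) and (9) can separate isomorphic from non-isomorphic responders.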
The Observation of Isomorphism and (more generally) the existence of any systematic difference in the linguistic behavior of children and adults raise a number of questions. The first question involves the traditional Chomskyan distinction between linguistic competence and linguistic performance (Chomsky 1965). In the case at hand, this question asks whether children’s overly isomorphic interpretations reflect a stage in linguistic development during which children do not have implicit knowledge of the fact that the grammar of their language can generate non-isomorphic interpretations (a competence account). Alternatively, the Observation of Isomorphism may be due to limitations on the computational resources that otherwise grammatically competent children deploy during language comprehension (a performance account). This is what we call the competence question.
A second question raised by the findings of Musolino et al. is whether Isomorphism obtains as a consequence of the linear arrangement of the scope-bearing elements involved, or whether children’s interpretation is constrained by the c-command relations holding between these elements.⁸ This is what Lidz and Musolino (2002) call the structural question. This question arises because linear and hierarchical (i.e., c-command) relations are systematically confounded in the constructions investigated by Musolino et al. Specifically, owing to the canonical SVO order of English, the object position follows and falls within the c-command domain of sentential negation, whereas the subject position precedes and falls outside the c-command domain of negation.
A third question, related to the scope of the phenomenon observed by Musolino et al., is whether the isomorphism effect can be observed in the acquisition of languages other than English, provided of course that such languages manifest the same kind of scope phenomena as English with respect to negation and quantified NPs.
Following Lidz and Musolino (2002), we call this the cross-linguistic question. In the following sections, we present a series of experiments designed to refine our understanding of children’s developing quantificational competence by addressing the questions raised above.
The Structural Basis of Isomorphism
The role of structure-dependent notions in child grammar has been mostly discussed with respect to syntactic principles (see, e.g., Crain and Nakayama 1987). More recently, however, students of child language have also turned to semantic phenomena. For example, Crain, Gardner, Gualmini, and Rabbin (2002) have reported evidence that preschoolers compute scope relations between logical operators on the basis of c-command relations, rather than linear order. These researchers studied children’s interpretation of the disjunction operator or in negative sentences. As Partee, ter Meulen, and Wall (1990) observed, when the disjunction operator occurs within the scope of negation (or within the scope of any downward entailing operator), one can infer that the negative statement holds of each disjunct separately. For example, (11) entails both sentences in (12).
(11) The girl who stayed up late will not get a dime or a jewel.
(12) a. The girl who stayed up late will not get a dime.
     b. The girl who stayed up late will not get a jewel.
Interestingly, these inferences can be drawn only if disjunction is c-commanded by negation. Thus, a sentence like (13), in which negation precedes but does not c-command the disjunction operator or, does not entail either of the sentences in (14).
(13) The girl who did not go to sleep will get a dime or a jewel.
(14) a. The girl who did not go to sleep will get a dime.
     b. The girl who did not go to sleep will get a jewel.
To determine whether English-speaking children know that the sentences in (11) and (13) yield different patterns of entailment, Crain et al. (2002) conducted a Truth Value Judgment Task. Thirty children participated in the experiment, ranging in age between 3;11 and 5;9 (mean age 5;0). In one of the trials, children were told a story about two girls who had lost a tooth and were waiting for the tooth fairy. One of the girls decided to stay up late (i.e., not to go to bed) to see what the tooth fairy looked like.
The tooth fairy was very disappointed by this girl’s behavior and decided to give her a jewel but no dime. Children accepted sentences like (13) 87 percent of the time in a context in which the girl who did not go to sleep received only a jewel, which suggests that they did not interpret (13) as entailing (14a). By contrast, children rejected (11) in the same context 92 percent of the time, which suggests that they did interpret (11) as entailing (12a).
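The entailment contrast between (11) and (13) follows from elementary propositional logic, and can be verified by brute force. The sketch below is an illustration under simplifying assumptions: the relative clauses are ignored, "gets a dime" and "gets a jewel" are treated as two independent propositions, and the function names are hypothetical.

```python
from itertools import product

def entails(premise, conclusion):
    """True if every valuation of (dime, jewel) that satisfies the
    premise also satisfies the conclusion."""
    return all(conclusion(d, j)
               for d, j in product([False, True], repeat=2)
               if premise(d, j))

# (11): negation takes scope over the disjunction.
def s11(dime, jewel): return not (dime or jewel)
def s12a(dime, jewel): return not dime
def s12b(dime, jewel): return not jewel

# (13): negation sits inside the relative clause, so the main clause
# is a plain disjunction.
def s13(dime, jewel): return dime or jewel
def s14a(dime, jewel): return dime
def s14b(dime, jewel): return jewel

print(entails(s11, s12a), entails(s11, s12b))  # both entailments hold
print(entails(s13, s14a), entails(s13, s14b))  # neither entailment holds

# The story context: the girl received a jewel but no dime.
dime, jewel = False, True
print(s11(dime, jewel), s13(dime, jewel))
```

In the jewel-but-no-dime context, the sketch makes (11) false and (13) true, which matches the pattern of rejections and acceptances reported for the children.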
Returning to isomorphism effects, the question is whether structural relations are also relevant for children’s interpretation of quantified noun phrases in negative sentences. In order to address the structural and the cross-linguistic questions, Lidz and Musolino (2002) tested speakers of English and also went to India to test child and adult native speakers of Kannada (a Dravidian language spoken by approximately 40 million people in the state of Karnataka in southwestern India). The canonical word order in Kannada is Subject-Object-Verb (SOV), and Kannada displays the same kind of scope ambiguities as English with respect to negation and quantified NPs (Lidz 1999). The crucial difference between Kannada and English for our purposes is that in Kannada linear order (qua precedence) and c-command relations are not confounded. Consider the tree diagrams in (15).
(15) [Tree diagrams for English and Kannada not reproduced.]
In English, negation both precedes and c-commands the object position, as discussed earlier. In Kannada, however, negation c-commands the object but does not precede it. This means that in Kannada, a c-command account of isomorphism would predict a preference for the narrow-scope reading of the object with respect to negation whereas a linear account of isomorphism would predict a preference for the wide-scope reading of the object NP. To test the predictions of precedence vs. c-command, Lidz and Musolino (2002) tested 4-year-olds and adults on their interpretation of sentences like (16) in both English and Kannada.
(16) Cookie Monster didn’t eat two slices of pizza.
     a. ¬∃2x [slices of pizza (x) ∧ eat (Cookie Monster, x)]
     b. ∃2x [slices of pizza (x) ∧ ¬ eat (Cookie Monster, x)]
The experiment conducted by Lidz and Musolino had two conditions. In the first, called the narrow-scope condition, sentences like (16) were true on the narrow-scope interpretation (i.e., ‘not > two’) but were false on the wide-scope interpretation (i.e., ‘two > not’). One of the stories used in this condition
involved Cookie Monster and two slices of pizza. As the story unfolded, Cookie Monster ate one of the slices but not the other. At the end of the story, the puppet described the situation by stating that Cookie Monster didn’t eat two slices of pizza. In this case, the puppet’s statement was true on the narrow-scope interpretation, since Cookie Monster ate only one slice of pizza (and not two), and was false on the wide-scope interpretation, since there weren’t two slices of pizza that Cookie Monster didn’t eat (there was in fact only one such slice). In the second condition, called the wide-scope condition, sentences like (16) were true on the wide-scope interpretation but were false on the narrow-scope interpretation. One of the stories used in this condition involved a situation in which Cookie Monster tried to eat four slices of pizza but managed to eat only two. As before, the puppet described the situation by stating that Cookie Monster didn’t eat two slices of pizza. In this case, the wide-scope interpretation was true, since there were indeed two slices of pizza that Cookie Monster didn’t eat; however, the narrow-scope interpretation, which stated that it was not the case that Cookie Monster ate two slices of pizza, was false, since he ate exactly two slices.
The English-speaking participants were 24 children between the ages of 3;11 and 4;11 (mean age 4;4) and 24 adults. The Kannada-speaking subjects were 24 4-year-olds between the ages of 4;0 and 4;11 (mean age 4;5) and 24 adult native speakers of Kannada. Lidz and Musolino (2002) found that adult subjects overwhelmingly accepted sentences like (16) in both the narrow-scope and the wide-scope condition, and displayed no significant preference for one scope interpretation over the other (i.e., 97 percent acceptance rate in the narrow-scope condition vs. 93 percent in the wide-scope condition for English-speaking adults, 87.5 percent acceptance rate in the wide-scope condition vs.
85.4 percent in the narrow-scope condition for Kannada-speaking adults). Four-year-olds, on the other hand, displayed a significant preference for the narrow-scope interpretation of sentences like (16) in both languages (i.e., 81 percent acceptance rate in the narrow-scope condition vs. 33 percent in the wide-scope condition for English-speaking 4-year-olds and 75 percent vs. 22.9 percent for Kannada-speaking 4-year-olds). When asked to justify their answers, children who rejected the puppet’s statements in the wide-scope condition typically explained that the puppet was wrong because Cookie Monster had indeed eaten two slices of pizza. By contrast, in the narrow-scope condition, children almost always said that the puppet was right, and they justified their answers by pointing out that Cookie Monster had eaten only one slice of pizza.
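The logic of the two conditions, and of the competing predictions for Kannada, can be made concrete in a few lines. This Python sketch is illustrative only: the function names are hypothetical, "two" is read as "at least two" (an assumption that does not affect the truth values in these stories), and the language table simply restates the structural facts given in the text.

```python
def narrow_scope(total, eaten, n=2):
    """(16a) not > two: it is not the case that n slices were eaten."""
    return not (eaten >= n)

def wide_scope(total, eaten, n=2):
    """(16b) two > not: there are n slices that were not eaten."""
    return (total - eaten) >= n

# (total slices, slices eaten) in the two stories described above.
CONDITIONS = {
    "narrow-scope condition": (2, 1),
    "wide-scope condition": (4, 2),
}

for name, (total, eaten) in CONDITIONS.items():
    print(name,
          "| narrow-scope reading:", narrow_scope(total, eaten),
          "| wide-scope reading:", wide_scope(total, eaten))

# Structural facts from the text: does negation precede / c-command the object?
LANGUAGES = {
    "English": {"neg_precedes_object": True, "neg_c_commands_object": True},
    "Kannada": {"neg_precedes_object": False, "neg_c_commands_object": True},
}

def predicted_preference(language, account):
    """Scope of the object preferred by an isomorphic child under the
    'linear' or the 'c-command' account."""
    key = "neg_precedes_object" if account == "linear" else "neg_c_commands_object"
    return "narrow" if LANGUAGES[language][key] else "wide"

for lang in LANGUAGES:
    print(lang,
          "| linear:", predicted_preference(lang, "linear"),
          "| c-command:", predicted_preference(lang, "c-command"))
```

Only Kannada pulls the two accounts apart in this sketch, which is why the children's narrow-scope preference in Kannada bears on the structural question.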
These results provide an answer to both the cross-linguistic question and the structural question. First, the Observation of Isomorphism is not limited to the acquisition of English. Second, isomorphism does not follow from a one-to-one mapping between linear order and semantic scope, but rather from a one-to-one mapping between surface c-command relations and semantic scope.⁹
The Status of Isomorphism
The competence question as it pertains to the Observation of Isomorphism was addressed by Musolino (1998) and Musolino et al. (2000). In their discussion of the competence/performance distinction, Musolino et al. (2000), following Musolino (1998), write: “The available evidence favors the grammatical hypothesis. . . . We argue, moreover, that isomorphism can be derived from the interaction of a learning principle, the subset principle, along with fundamental properties of UG. . . .” (p. 20)
To illustrate the argument of Musolino et al., let us first consider sentences like “Every horse didn’t jump over the fence.” Musolino et al. observe that, although such sentences are ambiguous between an isomorphic (i.e., ‘every > not’) and a non-isomorphic (i.e., ‘not > every’) interpretation in some languages (e.g., English), they only permit an isomorphic (i.e., ‘every > not’) interpretation in others (e.g., Chinese). Languages like Chinese therefore allow only a subset of the interpretations available in languages like English (i.e., {every > not} vs. {every > not; not > every}, respectively). Following the logic of the subset principle proposed by Berwick (1985), Musolino et al. argue that English-speaking children go through a “Chinese-speaking” stage during which their grammar generates only the ‘every > not’ interpretation of sentences like “Every horse didn’t jump over the fence.”
Similarly, Musolino et al. argue that the subset principle can also be used to explain children’s isomorphic interpretations of sentences like “The detective didn’t find someone.” Their approach is based on a distinction, originally due to Hornstein (1984), between two kinds of quantified NPs (QNPs). According to Hornstein, some QNPs are assigned scope via a movement-based mechanism, i.e., type 1 QNPs (see Hornstein 1995 for a specific proposal).
In addition to the movement-based mechanism, other QNPs can also be assigned scope via a mechanism that does not involve syntactic movement, i.e., type 2 QNPs (see Reinhart 1997 for a specific proposal in terms of choice functions). Moreover, according to Hornstein’s proposal, QNPs can receive non-isomorphic interpretations with respect to negation only when they are not assigned scope via movement. In other words, when the scope of a QNP is derived via (covert)
syntactic movement, the resulting interpretation is always an isomorphic one. Thus, the options available to type 1 QNPs (i.e., taking scope via a movement-based mechanism) are a subset of those available to type 2 QNPs (i.e., taking scope via a movement-based or a non-movement-based mechanism). The logic of the subset principle therefore dictates that children initially hypothesize that all QNPs are of type 1, and hence that they must be interpreted isomorphically with respect to negation.
Since Musolino’s original findings, children’s interpretation of quantifiers in negative sentences has been the subject of numerous studies. One line of research has focused on the properties of negative sentences discussed by research in formal pragmatics and on the role of “felicity conditions” associated with the use of negative statements. We can introduce the role of felicity conditions for the interpretation of negative sentences through the examples in (17) and (18), which are from Wason 1972. The sentence in (17) is true. Still, (17) appears to be more difficult for people to evaluate than a positive sentence like (18), although (17) and (18) seem to express the same proposition.
(17) 5 is not an even number.
(18) 5 is an odd number.
Interestingly, the difficulty associated with (17) is mitigated considerably if the sentence is preceded by a positive statement, as in (19).
(19) 4 is an even number, and/but 5 is not an even number.
Similar considerations about the difficulty associated with negative sentences have been offered by De Villiers and Tager-Flusberg (1975). Consider (20).
(20) I didn’t drive to work.
As De Villiers and Tager-Flusberg (1975, p.
279) point out, the statement in (20) “is more plausible, and consequently easier to comprehend, if it is made by someone who normally drives rather than by someone who commutes by train,” and this property of negative sentences results from the fact that “negative statements are generally used to point out discrepancies between a listener’s presumed expectations and the facts.” The observation that negative sentences present a specific set of felicity conditions led De Villiers and Tager-Flusberg (1975) to an experimental investigation of children’s comprehension of negation. In the experiment, children were asked to complete negative questions (i.e., ‘This is not a ___?’) in contexts in which the use of a negative sentence was or was not plausible (e.g., in the plausible context, the experimenter had pointed to various instances of a particular object). The results showed that children as young as two responded significantly faster when the negative question was presented in the plausible context. De Villiers and Tager-Flusberg interpret this finding as evidence that very young children know that the use of negative sentences is subject to the satisfaction of specific felicity conditions.
To sum up, previous studies show that subjects have difficulty processing negative sentences. The same studies, however, suggest that this difficulty can be mitigated if the target sentence is preceded by a positive lead-in or used to point out that an expectation went unfulfilled. Interestingly, recent investigations have studied the role of both factors for children’s interpretation of quantified NPs in negative sentences.
The role of felicity conditions in children’s interpretation of negative sentences containing some was investigated by Gualmini (2003). In particular, Gualmini attempted to determine whether children’s difficulty with negative sentences containing some would persist when those negative sentences were used to point out a contrast between what happened and what was expected to happen. Thirty children ranging in age from 4;01 to 5;08 (mean age 4;11) participated in a Truth Value Judgment Task. The children were divided into two groups. In a typical trial, children were told a story about a firefighter playing hide-and-seek with four dwarves. During the story, it was repeatedly pointed out that the firefighter was supposed to find all of the dwarves. In the end, the firefighter managed to find only two of the four dwarves. The two groups of children were asked to evaluate (21) and (22) respectively.
(21) The firefighter didn’t find some dwarves.
(22) The firefighter didn’t miss some dwarves.
Both sentences were true, but only (21) pointed out a discrepancy between what happened and what was expected to happen.
Consistently, children as young as 4;01 accepted sentences like (21) 90 percent of the time but accepted sentences like (22) only 45 percent of the time. In short, the results documented by Gualmini (2003) show that 4-year-olds can indeed access the non-isomorphic interpretation of negative sentences containing some in object position.¹⁰
In a similar vein, the predictions of Musolino, Crain, and Thornton’s (2000) grammatical account of isomorphism were tested in an experiment by Musolino and Lidz (2006) based on Musolino (2000). The experiment had two conditions. In both conditions, children were presented with a context in which two horses jumped over a fence and a third horse did not jump over the fence. The first condition was designed to replicate Musolino’s (1998) finding regarding children’s isomorphic interpretation of sentences like (23). In the second condition, children were presented with sentence (23) but, in order to satisfy the
felicity conditions on the use of negative statements described above, this sentence was preceded by an affirmative statement that differed from (23) only in the object NP, as shown in (24).
(23) Every horse didn’t jump over the fence.
(24) Every horse jumped over the log but every horse didn’t jump over the fence.
In the story corresponding to (24), the three horses all jumped over the log, and then two of them also proceeded to jump over the fence. Notice that if children interpret “Every horse didn’t jump over the fence” in (24) to mean that none of the horses jumped over the fence, an isomorphic interpretation, they should reject the puppet’s statement and should explain, as they did in the Musolino et al. (2000) study, that the puppet was wrong because two horses did jump over the fence. On the other hand, if children assign “Every horse didn’t jump over the fence” in (24) a non-isomorphic (i.e., ‘not > every’) interpretation, they should accept the puppet’s statement, since it is true that not all the horses jumped over the fence.
The subjects were 20 English-speaking children between the ages of 5;0 and 5;11 (mean age 5;4) and 20 adult native speakers of English. Musolino and Lidz found that, whereas adult subjects accepted sentences like (23) and (24) 92.5 percent and 100 percent of the time respectively, children, who accepted sentences like (23) only 15 percent of the time, accepted sentences like (24) significantly more often, i.e., 60 percent of the time. Moreover, when asked to justify their affirmative answers to (24), children typically said that the puppet was right because one of the horses didn’t jump over the fence while the two others did. Such justifications show that children were indeed accessing a non-isomorphic interpretation.
In addition to replicating Musolino’s original finding, these results demonstrate that children’s ability to access the non-isomorphic interpretation of sentences like “Every horse didn’t jump over the fence” improves dramatically when such a sentence is preceded by an affirmative statement.
That children’s ability to access the non-isomorphic interpretation of sentences like “The firefighter didn’t find some dwarves” or “Every horse didn’t jump over the fence” improves under certain contextual manipulations casts serious doubt on the grammatical account proposed by Musolino (1998). Such a competence account makes a strong, falsifiable prediction: If children’s non-adult interpretations reflect lack of grammatical knowledge, contextual support should have no effect on the child’s ability to access non-isomorphic interpretations. Clearly, the facts presented here indicate otherwise, and so the grammatical account must be abandoned.
The results presented in the preceding paragraphs provide compelling evidence against a grammatical account of isomorphism. The strategy so far has been to show that under certain conditions children can show adult-like behavior. Another way to minimize the differences between children and adults would be to show that adults can be “turned into children” (Freeman, Sinha, and Stedmon 1982). This is what Musolino and Lidz (2003) did with respect to sentences containing negation and numerally quantified NPs. First, Musolino and Lidz showed that although adult speakers of English can easily access either scope interpretation of sentences like (25), they display a preference for the isomorphic interpretation in a forced-choice situation.
(25) Cookie Monster didn’t eat two slices of pizza.
Moreover, Musolino and Lidz showed that the isomorphism effect typically displayed by children can also be seen in adults in the case of sentences like (26).
(26) Two frogs didn’t jump over the rock.
Notice that this example can receive an isomorphic interpretation, i.e., ‘two > not’, on which the sentence can be paraphrased as “There are two frogs that didn’t jump over the rock.” Alternatively, (26) can receive a non-isomorphic interpretation, i.e., ‘not > two’, on which it can be paraphrased as “It is not the case that two frogs jumped over the rock.” Musolino and Lidz tested a group of 20 adult native speakers of English on their interpretation of sentences like (26) in a design similar to the one described in the fourth section of the present paper (i.e., one condition in which the isomorphic interpretation is true and the non-isomorphic interpretation is false and another condition in which this pattern is reversed). Musolino and Lidz found that adults displayed a significant preference for the isomorphic interpretation of sentences like (26), i.e., 100 percent acceptance in the isomorphic condition vs. only 27.5 percent in the non-isomorphic condition.
Thus, the isomorphism effect typically seen in children can also be observed in adults, which invites us to consider whether that preference for surface scope could also account for the child data.
Beyond Isomorphism
In the previous section, we discussed two lines of evidence against the grammatical account of isomorphism originally proposed by Musolino (1998) and Musolino et al. (2000). Yet, despite having shown that isomorphism is unlikely to reflect a grammatical difference between children and adults, we haven’t really said anything about what gives rise to the effect in the first place. In other
words, why do children, whose grammars we now know can generate both isomorphic and non-isomorphic interpretations, nevertheless so often resort to isomorphic interpretations? In this section, we focus on the case of sentences containing indefinite objects and negation. We propose an account of children’s isomorphic behavior based on independently motivated principles regulating the interpretation of indefinite NPs. This account, in turn, will allow us to make specific predictions regarding the limits of the phenomenon, i.e., situations in which isomorphism is expected to occur and situations in which it isn’t. These predictions are then tested, and supported, by new experimental evidence.
Our account draws on theoretical work on the interpretation of indefinites. In his seminal work on existential constructions, Milsark (1977) observes that whereas strong determiners (e.g., every, the, most) can only have a quantificational reading, weak determiners (e.g., some, many, two) are ambiguous between a “cardinal reading” and a “quantificational reading” (see also Postal 1966). To see this, consider this example from Milsark (1977):
(27) Some salesmen walked in.
This sentence can mean either that an undetermined number of people have the property of (a) being a salesman and (b) having walked into the room (the cardinal reading) or that, given a set of salesmen, some members of this set (and not others) walked into the room (the quantificational reading). On the first reading, the property of being a salesman is attributed to the entities under discussion; on the second reading, the existence of entities who have that property is presupposed.
More recently, Diesing (1992) proposed that the cardinal and the quantificational interpretation of indefinites are associated with two different logical forms.
According to Diesing, an indefinite used on its cardinal reading is not quantificational, in the sense that it only introduces a variable, which is then bound through the existential closure operation proposed by Heim (1982). On this view, the cardinal interpretation of (27) can be represented as in (28).
(28) ∃x [salesmen (x) and walked in (x)]
By contrast, the quantificational reading, which Diesing (1992) refers to as a presuppositional reading, involves a tripartite structure. For Diesing, the presuppositions associated with quantified sentences are represented in a restrictive clause (RC). This means that the use of a quantifier that triggers a presupposition will prompt the tripartite structure illustrated in (29).
(29) ∃x [salesmen (x)]_{RESTRICTIVE CLAUSE} [walked in (x)]_{NUCLEAR SCOPE}
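One way to see how (28) and (29) can come apart is to model presupposition failure explicitly. The Python sketch below is a deliberately simplified illustration and not Diesing's formalism: it assumes a three-valued treatment in which the presuppositional reading returns `None` (undefined) when no set of salesmen is given in the context, and it does not encode the "and not others" implication mentioned for the quantificational reading. All names and the toy individuals are hypothetical.

```python
from typing import Iterable, Optional, Set

def cardinal_reading(entities: Iterable[str],
                     salesmen: Set[str],
                     walked_in: Set[str]) -> bool:
    """(28): existential closure over a variable; being a salesman is
    asserted of the entity, not presupposed."""
    return any(x in salesmen and x in walked_in for x in entities)

def presuppositional_reading(presupposed_salesmen: Set[str],
                             walked_in: Set[str]) -> Optional[bool]:
    """(29): the restrictive clause supplies a presupposed set; the
    nuclear scope is then evaluated over that set.
    Returns None if the presupposition fails (modeling assumption)."""
    if not presupposed_salesmen:
        return None
    return any(x in walked_in for x in presupposed_salesmen)

entities = {"al", "bo", "cy"}
walked_in = {"al", "cy"}

# A context that supplies salesmen: both readings are true.
print(cardinal_reading(entities, {"al", "bo"}, walked_in))
print(presuppositional_reading({"al", "bo"}, walked_in))

# A context with no salesmen: the cardinal reading is simply false,
# while the presuppositional reading is undefined in this sketch.
print(cardinal_reading(entities, set(), walked_in))
print(presuppositional_reading(set(), walked_in))
```

The design choice here is only to make visible that the restrictive clause is checked before the nuclear scope; any other way of modeling presupposition would serve the same expository purpose.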
Coming back to sentences containing negation and indefinite objects, we can view the wide-scope reading of indefinites with respect to negation as associated with the projection of a tripartite structure, (30b). On this interpretation, the indefinite is treated quantificationally. By contrast, the narrow-scope reading of the indefinite with respect to negation corresponds to the cardinal reading, (30a), on which the indefinite is treated as a variable.
(30) The Smurf didn’t catch two birds.
     a. ¬ [∃x [Smurf caught 2(x) and bird(x)]]_{NS} (indefinite = variable)
     b. ∃2x [bird(x)]_{RC} ¬[Smurf caught (x)]_{NS} (indefinite = quantifier)
On this view, one can hypothesize that children’s preference for the (isomorphic) narrow-scope reading of the indefinite in sentences like “The Smurf didn’t catch two birds” correlates with a preference for the representation in (30a). In other words, children’s “isomorphic” behavior corresponds to the selection of a representation of indefinite objects that does not involve the projection of a tripartite structure.
A consequence of this line of reasoning is that the wide-scope interpretation of indefinites in negative sentences should be facilitated if a tripartite structure is independently required. In turn, this suggests that the wide-scope interpretation of indefinites in negative sentences should be facilitated if the indefinite NP also triggers a presupposition. One class of expressions that fits this description is partitive constructions, as in (31).
(31) The Smurf didn’t catch two of the birds.
The strongly preferred interpretation of (31) is the presuppositional, wide-scope reading. An explanation consistent with Diesing’s view would point to the presupposition triggered by the definite article the.
On Diesing’s view, the need to represent the presupposition of existence triggered by the definite article will lead to a tripartite structure for (31), thereby generating the structure associated with the wide-scope interpretation of the indefinite object. Thus, if children’s preference for the narrow-scope reading of sentences like The Smurf didn’t catch two birds arises when they do not treat the object NP as being presuppositional, we predict that the presence of an inherently presuppositional NP, such as the partitive in (31), should lead children to assign such sentences a wide-scope interpretation much more often than in the former case. In order to test this prediction, Musolino and Gualmini (2004) tested 22 English-speaking children between the ages of 3;9 and 4;11 (mean age 4;4) and 22 English-speaking adults on their interpretation of sentences like (32) and (33) in contexts that made the wide-scope reading true and the narrow-scope reading false.
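To make the contrast between the two readings in (30) concrete, the following is a minimal sketch, not the authors' materials. It assumes a hypothetical scenario with four birds, two of which the Smurf caught, and it assumes an "at least two" interpretation of the numeral. The function names and the scenario are illustrative only.

```python
# Hypothetical scenario: four birds, the Smurf caught two of them.
birds = {"b1", "b2", "b3", "b4"}
caught = {"b1", "b2"}


def narrow_scope(birds, caught, n=2):
    """(30a), 'not > two': it is not the case that the Smurf caught n birds."""
    return not (len(birds & caught) >= n)


def wide_scope(birds, caught, n=2):
    """(30b), 'two > not': there are n birds that the Smurf did not catch."""
    return len(birds - caught) >= n


print(narrow_scope(birds, caught))  # False: he did catch two birds
print(wide_scope(birds, caught))    # True: two birds remain uncaught
```

In a scenario of this shape the wide-scope reading is true and the narrow-scope reading is false, so a participant who accepts the sentence must have accessed the non-isomorphic reading.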
(32) The Smurf didn’t catch two birds.
(33) The Smurf didn’t catch two of the birds.
Musolino and Gualmini found that, whereas adults overwhelmingly accepted the puppet’s statements in both the partitive and the non-partitive condition (96 percent and 95 percent of the time, respectively), children’s tendency to assign such sentences a wide-scope, non-isomorphic reading was significantly higher in the partitive condition (33) than in the non-partitive condition (32): 75 percent and 25 percent, respectively. As predicted, the presence of a partitive in sentences like (33) yielded a significant increase in children’s ability to access the non-isomorphic interpretation. In order to see whether this result would extend to sentences containing some, Musolino and Gualmini tested a group of 15 English-speaking children between the ages of 3;6 and 5;1 (mean age 4;4) and 15 adult native speakers of English on their comprehension of sentences like (34) (see note 11).
(34) The fireman didn’t find some of the guys.
Sentences like (34) were used in contexts in which the wide-scope, non-isomorphic reading (i.e., ‘some > not’) was true but the narrow-scope, isomorphic reading (i.e., ‘not > some’) was false. Musolino and Gualmini found that children and adults did not differ significantly in how often they accepted the puppet’s statements, i.e., 83.3 percent and 73.3 percent of the time, respectively (see note 12).

Implications for Language Development and Sentence Processing
The research discussed in the present paper indicates continuity of representation as well as continuity of processes between children and adults, thereby lending support to the Continuity Assumption proposed by Pinker (1984, p. 9): “In sum, I propose that the continuity assumption be applied to accounts of children’s language in three ways: in the qualitative nature of the child’s abilities, in the formal nature of the child’s grammatical rules, and in the way that those rules are realized.” Moreover, the findings discussed here concerning children’s interpretation of sentences containing negation and quantified NPs can be brought to bear on recent theoretical proposals in the literature on sentence processing. A question that has begun to attract the attention of researchers in the field of sentence processing is what we might call “semantic processing” or, more specifically, how the parser constructs LF representations on-line from available syntactic representations (e.g., Kurtzman and MacDonald 1993; Tunstall 1997; Frazier 1999; Villalta 2003). In order to address such questions, psycholinguists have often turned to the phenomenon of scope
ambiguity, for reasons which should by now be clear. As Villalta (2003) observes, several recent studies have converged on the notion that the parser prefers to build LF representations that differ minimally from the corresponding surface syntactic representations: Indeed, the few recent studies that have been explicit about how corresponding LF representations are associated to surface representations, argue that the parser first chooses to construct the LF that requires minimal changes from the surface representation (e.g., the Principle of Scope Interpretation by Tunstall 1997 and the Minimal Lowering Principle by Frazier 1999). Of particular interest is the proposal made by Tunstall (1997), which explicitly addresses the question of how LF representations for sentences with multiple quantifiers are constructed. Her Principle of Scope Interpretation states that the preferred interpretation of a sentence corresponds to the LF that differs minimally from the surface structure. (2003: 123)
This captures the essence of the notion of isomorphism that we have discussed throughout the present paper. In other words, the findings presented here regarding children’s interpretation of sentences containing multiple quantificational operators (i.e., negation and quantified NPs) seem to comport rather well with the predictions of principles such as Tunstall’s Principle of Scope Interpretation and Frazier’s Minimal Lowering Principle. The challenge for such accounts, of course, is to explain why this preference isn’t always present in adults (e.g., sentences like Every horse didn’t jump over the fence, in which children and adults assign different interpretations). Moreover, such accounts would also need to explain why in certain cases (e.g., sentences containing partitives, as discussed in the previous section), children (and adults) arrive at interpretations that differ from the surface structure. Finally, processing accounts of the kind described above would also need to explain why and how certain contextual manipulations lead to significant improvements in performance and hence to a relaxation of isomorphism (see the fifth section). In sum, although processing accounts relying on principles minimizing differences between surface structure and LF capture the broad pattern described in this paper (i.e., the observation of isomorphism), many of the details remain to be accounted for. Thus, data from child language broaden the empirical basis upon which hypotheses on adult sentence processing can be evaluated. Let us now consider the main implications of our findings for theories of language acquisition. A theorist concerned with accounts of children’s linguistic competence, and initially intrigued by the observation of isomorphism, might be disappointed by the conclusion that we ultimately reached, namely that isomorphism does not reflect a grammatical difference between children and adults.
Seen from this perspective, the discovery of isomorphism may look like an unnecessary detour en route to a proper characterization of preschoolers’
grammatical knowledge. However, such a pessimistic view misses two important points. First, recall from the fourth section that we were able to determine that children compute scope relations on the basis of c-command relations (and hence that they know c-command) precisely because they behaved isomorphically. Thus, a “mere” performance difference between children and adults nevertheless has important implications for accounts of children’s grammatical competence and, consequently, for theories of language acquisition. Notice that the same point can be made regarding children’s ability to distinguish between partitive and non-partitive NPs (see the preceding section). Here too, we were able to find that children have knowledge of a subtle linguistic property having to do with the presuppositional status of NPs, precisely because of their isomorphic behavior in the case of plain NPs and lack thereof in the case of partitive NPs. Second, recall that the observation of isomorphism allowed us to demonstrate, in the case of sentences containing numerally quantified NPs (e.g., two frogs) and negation (see the fourth section), that the effect observed in children represents an exaggerated interpretive preference also observable in adults (see Musolino and Lidz 2006). Furthermore, Musolino and Lidz showed that under certain conditions, adults can be “turned into children” and can display the isomorphism effect seen in children. Finally, Musolino and Lidz showed that the same contextual factors that enable children to override their isomorphic preferences have a similar effect on adults “turned into children.” These results underscore the fact that although the sentence-processing abilities of children and adults may differ quantitatively, they do not differ qualitatively, at least within the confines of the phenomena under investigation.

Notes
1. For a more detailed description of the TVJT, see Crain and Thornton 1998.
2. All the differences between children and adults reported in this paper as significant are significant at the .05 level.
3. Since children’s responses to sentences like (7) and (8) comport so well with grammatical judgments (which are extremely clear in these cases), adult data were not collected for these sentences.
4. Recall that when the universally quantified NP occurs in subject position, as in sentences like Every horse didn’t jump over the fence, children assigned a ‘not > every’ interpretation only 7.5 percent of the time.
5. In fact, the puppet’s statements in this condition used the future tense instead of the past tense (e.g., Some girls won’t ride on the merry-go-round instead of Some girls didn’t ride on the merry-go-round). This difference, however, is not relevant for the purposes of the present discussion.
6. In fact, one group of children (n = 18) was tested on their comprehension of expressions like something/someone and a second group (n = 12) was tested on their comprehension of expressions like some N.
7. For sentences like Every horse didn’t jump over the fence (which is ambiguous) and sentences like The detective didn’t find someone (which may be ambiguous), the readings that adults accessed are reported in Musolino (1998).
8. A c-commands B if and only if (i) the first branching node dominating A also dominates B, (ii) A does not dominate B, and (iii) A is different from B (see Reinhart 1976).
9. For further evidence that this preference is based on c-command relations and for a discussion of the implications of this result, see Lidz and Musolino 2002 and Lidz and Musolino 2003.
10. It is worth observing that children’s ability to access non-isomorphic readings of negative sentences that point out a discrepancy between an expected outcome and the actual outcome also emerged from an experiment investigating sentences containing three scope-bearing elements, such as Every farmer didn’t clean some animal (see Gualmini 2003).
11. The effect of the partitives in minimal pairs such as (32) and (33) having been established, controls of the form The fireman didn’t find some guys were not used in the case of (34). Another reason not to include such controls is that previous findings (e.g., Musolino 1998) demonstrate that children behave isomorphically in such cases.
12. The findings are consistent with the hypothesis that different scope assignments correlate with different semantic representations. However, we would not necessarily conclude that the difference in semantic representation dictates what scope assignment children select. It is possible that the projection of tripartite structure is too costly for young children and that, when they can, children resort to the semantic representation that does not involve such a structure.
It is also possible, however, that children are perfectly capable of constructing a semantic representation involving a tripartite structure, but for independent reasons they do so only in limited circumstances. References Berwick, R. 1985. The Acquisition of Syntactic Knowledge. MIT Press. Chomsky, N. 1965. Aspects of the Theory of Syntax. MIT Press. Crain, S., and Nakayama, M. 1987. Structure dependence in grammar formation. Language 63, 522–543. Crain, S., Ni, W., and Conway, L. 1994. Learning, parsing and modularity. In C. Clifton, L. Frazier, and K. Rayner (eds.), Perspectives on Sentence Processing. Erlbaum. Crain, S., and Thornton, R. 1998. Investigations in Universal Grammar: A Guide to Research on the Acquisition of Syntax and Semantics. MIT Press. Crain, S., Gardner, A., Gualmini, A., and Rabbin, B. 2002. Children’s command of negation. In Proceedings of the Third Tokyo Conference on Psycholinguistics.
De Villiers, J., and Tager-Flusberg, H. 1975. Some facts one simply cannot deny. Journal of Child Language 2, 279–286. Diesing, M. 1992. Indefinites. MIT Press. Fodor, J. A., Bever, T., and Garrett, M. 1974. The Psychology of Language. McGraw-Hill. Frazier, L., and Fodor, J. D. 1978. The sausage machine: A new two-stage parsing model. Cognition 6, 291–328. Frazier, L., and Rayner, K. 1982. Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14, 178–210. Frazier, L. 1999. On Sentence Interpretation. Kluwer. Freeman, N. H., Sinha, C. G., and Stedmon, J. A. 1982. All the cars – which cars? From word to meaning to discourse analysis. In M. Beveridge (ed.), Children Thinking Through Language. Edward Arnold. Gualmini, A. 2003. The Ups and Downs of Child Language. Ph.D. dissertation, University of Maryland, College Park. Heim, I. 1982. The Semantics of Definite and Indefinite Noun Phrases. Ph.D. dissertation, University of Massachusetts, Amherst. Horn, L. R. 1989. A Natural History of Negation. University of Chicago Press. Hornstein, N. 1984. Logic as Grammar. MIT Press. Hornstein, N. 1995. Logical Form: From GB to Minimalism. Blackwell. Jackendoff, R. 1972. Semantic Interpretation in Generative Grammar. MIT Press. Kurtzman, H. S., and MacDonald, M. C. 1993. Resolution of quantifier scope ambiguities. Cognition 48, 243–279. Lidz, J. 1999. The morphosemantics of object case in Kannada. In Proceedings of WCCFL 18. Lidz, J., and Musolino, J. 2002. Children’s command of quantification. Cognition 84, 113–154. Lidz, J., and Musolino, J. 2003. Continuity in linguistic development: The scope of syntax and the syntax of scope. Manuscript, Northwestern University and Indiana University. May, R. 1977. The Grammar of Quantification. Doctoral dissertation, Massachusetts Institute of Technology. Milsark, G. L. 1977.
Toward an explanation of certain peculiarities of the existential construction in English. Linguistic Analysis 3 (1), 1–29. Musolino, J. 1998. Universal Grammar and the Acquisition of Semantic Knowledge: An Experimental Investigation of Quantifier-Negation Interaction in English. Doctoral dissertation, University of Maryland. Musolino, J. 2000. Universal quantification and the competence/performance distinction. Presented at Boston University Conference on Language Development. Musolino, J., Crain, S., and Thornton, R. 2000. Navigating negative quantificational space. Linguistics 38 (1), 1–32.
Musolino, J., and Gualmini, A. 2004. The role of partitivity in child language. Language Acquisition 12, 97–107. Musolino, J., and Lidz, J. 2003. The scope of isomorphism: Turning adults into children. Language Acquisition 11 (4), 277–291. Musolino, J., and Lidz, J. 2006. Why children aren’t universally successful with quantification. Linguistics 44 (4), 817–852. Partee, B., ter Meulen, A., and Wall, R. 1990. Mathematical Methods in Linguistics. Kluwer. Postal, P. 1966. On so-called pronouns in English. In F. Dineen S. J. (ed.), Report of the 17th Annual Round Table Meeting on Languages and Linguistics. Georgetown University Press. Reinhart, T. 1976. The Syntactic Domain of Anaphora. Doctoral dissertation, Massachusetts Institute of Technology. Reinhart, T. 1997. Quantifier scope: How labor is divided between QR and choice functions. Linguistics and Philosophy 20, 335–397. Trueswell, J. C., Sekerina, I., Hill, N., and Logrip, M. 1999. The kindergarten-path effect: Studying on-line sentence processing in young children. Cognition 73, 89–134. Tunstall, S. 1997. Interpreting Quantifiers. Doctoral dissertation, University of Massachusetts, Amherst. Villalta, E. 2003. The role of context in the resolution of quantifier scope ambiguities. Journal of Semantics 20, 115–162. Wason, P. 1972. In real life negatives are false. Logique et Analyse 15, 17–38.
6
A Cross-Linguistic Study on the Interpretation of Pronouns by Children and Agrammatic Speakers: Evidence from Dutch, Spanish, and Italian
Esther Ruigendijk, Sergio Baauw, Shalom Zuckerman, Nada Vasić, Joke de Lange, and Sergey Avrutin
Introduction
Young children and agrammatic aphasic speakers show atypical interpretation of pronouns, but not of reflexive elements. When confronted with sentences such as (1), English and Dutch children four to six years of age exhibit chance-level performance on (1a) but have no problems with sentences like (1b) (e.g., Chien and Wexler 1990; Koster 1993).
(1) a. The boy washes him. (chance performance)
b. The boy washes himself. (good performance)
The same pattern has been found for English-speaking agrammatic aphasics (Grodzinsky et al. 1993). Apparently, the interpretation of pronouns is more difficult than the interpretation of reflexives for both populations. In the literature on language acquisition, this phenomenon is known as the delay of Principle B effect (Chien and Wexler 1990; Philip and Coopmans 1996). Studies on the comprehension of sentences like (1) demonstrate that the interpretation of pronouns is non-adult-like for children and is disturbed in agrammatic aphasia. Nevertheless, there are also studies which show that pronouns are not always difficult to interpret for these populations, and that, importantly, the interpretation seems to be governed by linguistic principles (Avrutin and Thornton 1994; Avrutin et al. 1999; Grodzinsky and Reinhart 1993; Grodzinsky et al. 1993; Thornton and Wexler 1999). In this paper, we explore further the linguistic principles that underlie the interpretation of pronouns and reflexives by examining two sentence types: simple transitive sentences, as in (2a), and Exceptional Case Marking (ECM) constructions, as in (2b).
(2) Transitive versus ECM
a. *John said that the boy_i touched him_i.
b. *John said that the boy_i saw him_i dance.
We use these two different structures to distinguish between important linguistic theories: Government and Binding Theory (GB, Chomsky 1981) on the one hand and Reflexivity (REFL, Reinhart and Reuland 1993) and Primitives of Binding (PoB, Reuland 2001) on the other hand. The paper is organized as follows. First, the relevant aspects of the linguistic approaches to binding are summarized. Next, relevant data from earlier studies on language acquisition and agrammatism are presented. This is followed by a description of the methods used in the study. Finally, the results are reported and discussed. We present an alternative account of data from earlier studies and the new data from the present study. Linguistic Background
According to Government and Binding theory (Chomsky 1981), the interpretation of pronouns and reflexives is governed by two complementary principles:
Binding Principles (GB, Chomsky 1981)
Principle A: An anaphor is bound within its governing category (see note 1).
Principle B: A pronoun is free in its governing category.
Principle B rules out the interpretation in which the pronoun ‘him’ in (1a) (and (2a) and (2b)) refers to the subject ‘the boy’. Conversely, the reflexive anaphor ‘himself’ must refer to the subject ‘the boy’ in (1b), following Principle A. Reinhart and Reuland (1993), among others, have pointed out that there are some problems with this description. One problem for GB arises when it is applied to languages with more than one reflexive element. Dutch, Spanish, and Italian have both simple and complex anaphors, or SE- (Simplex Expression) and SELF-anaphors (Reinhart and Reuland 1993). These elements do not have the same distribution. For example, Dutch zich and zichzelf cannot always occur in the same position:
(3) a. De jongen_i gedraagt zich_i/*zichzelf_i
The boy behaves SE/*himself
b. De jongen_i ziet *zich_i/zichzelf_i
The boy sees *SE/himself
In order to account for the distribution of pronouns and both SE- and SELF-anaphors, Reinhart and Reuland (1993) propose an alternative approach to Binding theory: Reflexivity (REFL). In this model, the referential behavior of pronouns and reflexives is constrained by two interacting modules: the (alternative) binding theory, consisting of two binding conditions, and a (generalized) A-Chain condition.
Binding Conditions (REFL, Reinhart and Reuland 1993)
Condition A: A reflexive-marked predicate must be interpreted reflexively.
Condition B: A reflexive predicate must be reflexive-marked.
General Condition on A-Chains
A maximal A-Chain (α_1 ... α_n) has exactly one link: α_1, which is both +R and marked for structural Case (where an element is +R when it is referentially independent and specified for all phi-features).
The distribution of SE- and SELF-anaphors in (3) follows from Condition B: a reflexive predicate must be reflexive-marked. According to Reinhart and Reuland, reflexive marking can take place at two different levels: in the lexicon, as a result of particular lexical-semantic properties of the verb, or in the syntax, with the help of SELF-anaphors in object position. Verbs like gedragen ‘behave’ (3a) belong to a class of inherently reflexive verbs whose lexical-semantic properties allow them to be reflexive-marked. This means that SELF-anaphors are not needed to reflexive-mark the predicate (see note 2). Verbs like zien ‘see’, on the other hand, are not inherently reflexive. Therefore, if they are interpreted reflexively, they require the presence of a SELF-anaphor in the object position to reflexive-mark the verb, as in (3b). The ungrammaticality of (2a), repeated here as (4), can be accounted for in the same way: ‘touch’ is not an inherently reflexive predicate; therefore Condition B is violated when (4) is interpreted reflexively.
(4) *John said that the boy_i touched him_i.
However, Condition B cannot account for the ungrammaticality of (2b), repeated here as (5).
(5) *John said that the boy_i saw him_i dance.
Here ‘the boy’ and ‘him’ are not coarguments of the same verb, ‘the boy’ being the subject of ‘saw’ and ‘him’ being the subject of ‘dance’. This means that no reflexive predicate is formed and therefore Condition B is irrelevant. Reinhart and Reuland (1993) claim that (5) is ungrammatical because it violates the A-Chain Condition.
They argue that pronominal elements establish an A-Chain with local binders. The Chain Condition states that the tail of an A-Chain must be [−R], i.e., referentially deficient. According to Reinhart and Reuland, pronouns and DPs are [+R], whereas SE-anaphors (and SELF-anaphors) are [−R]. The [+R] specification is dependent on the specification of phi-features and structural case. In his Primitives of Binding, Reuland (2001) restates the Reflexivity approach in terms of the Minimalist Program. There are two important differences
between Reflexivity and Primitives of Binding. First, Reuland shows how the different conditions on binding can be derived, rather than postulated as was done in GB and REFL; second, he proposes to replace the Chain Condition by an economy hierarchy for establishing referential relationships. Reuland (2001) proposes that Condition B should be viewed as a filter on arity reduction in syntax. Reflexivization involves the reduction of one theta role by identifying it with another theta role, i.e., it turns a two-place predicate (λxλy (xRy)) into a one-place predicate (λx (xRx)). Concretely, arity reduction is not possible in the computational system or in the course of the interpretational process, unless the verb is reflexive-marked in the lexicon. In some languages this is done by adding a reflexive-marking morpheme to the verb; in other languages, including Dutch, reflexive marking is tied to specific lexical-semantic properties of the verb. Reuland’s (2001) additional contribution in fine-tuning the concept of binding concerns the nature of Chains. He argues that Chain formation is a by-product of a feature-checking operation between a pronominal element and a local DP. This operation leads to the deletion of the phi-features of the pronominal element and to the recovery of these features by the local DP. As a result of this operation, a referential dependency (an A-Chain) is established between the pronominal element and the local DP. Reuland further claims that [number], unlike [person], cannot be recovered by the local DP, owing to its different ontological status (see note 3). This means that no Chain can be created between a pronoun, such as hem ‘him’, and a local DP (as in (4) and (5)), since, unlike SE-anaphors such as zich, pronouns such as hem ‘him’ are specified for both [person] and [number]: zich can be singular or plural, whereas hem ‘him’ is only singular.
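The arity-reduction operation described above can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not Reuland's formalization: the toy lexicon, the toy model of who touches whom, and the names `reflexivize` and `reduce_arity` are all hypothetical. The only facts taken from the text are that gedragen ‘behave’ is inherently reflexive and that aanraken ‘touch’ is not reflexive-marked in the lexicon.

```python
from typing import Callable, Optional

Entity = str
TwoPlace = Callable[[Entity, Entity], bool]
OnePlace = Callable[[Entity], bool]

# Hypothetical toy lexicon: is the verb reflexive-marked in the lexicon?
REFLEXIVE_MARKED = {"gedragen": True, "aanraken": False}


def reflexivize(relation: TwoPlace) -> OnePlace:
    """Turn the two-place predicate (λxλy. xRy) into the one-place (λx. xRx)."""
    return lambda x: relation(x, x)


def reduce_arity(verb: str, relation: TwoPlace) -> Optional[OnePlace]:
    """Apply arity reduction only if the lexicon reflexive-marks the verb."""
    if not REFLEXIVE_MARKED.get(verb, False):
        return None  # filtered out: reduction is not licensed for this verb
    return reflexivize(relation)


# Hypothetical toy model of who touches whom.
TOUCH_PAIRS = {("boy", "boy"), ("boy", "dog")}


def touch(x: Entity, y: Entity) -> bool:
    return (x, y) in TOUCH_PAIRS


touch_self = reflexivize(touch)
print(touch_self("boy"))                        # True
print(touch_self("dog"))                        # False
print(reduce_arity("aanraken", touch) is None)  # True
```

The sketch separates the semantic operation itself (`reflexivize`) from the lexical filter on it (`reduce_arity`), which is the division of labor the text attributes to Condition B under Primitives of Binding.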
In our view the most important modification of PoB is the introduction of an economy hierarchy on referential dependencies, since this hierarchy relates theoretical linguistic notions to (psycholinguistic) processing aspects of establishing dependencies. According to Reuland (2001), referential dependencies can be established at different linguistic levels:
(i) Narrow syntax (feature checking)
(ii) C(onceptual)-I(ntentional) interface (bound-variable interpretation)
(iii) Knowledge based (discourse) (coreference) (see note 4)
Referential dependencies established at different grammatical levels have different computational costs attached to them. The cost depends on the number of cross-modular operations necessary for interpreting a referential element. The most economical dependency is formed by checking features in narrow syntax, thus establishing an A-Chain between an element underspecified for
[number], for example an SE-anaphor such as Dutch zich, and a local DP, as in (6).
(6) Het meisje wast zich
The girl washes SE
‘The girl washes herself’
This possibility is not available to pronouns, because they are specified for [number] and cannot be involved in a feature-checking operation. However, pronouns can in principle enter into dependencies formed outside narrow syntax, at the C-I interface (a bound-variable interpretation) or at the discourse level (coreferential interpretation). Both types of dependencies are assumed to be less economical than dependencies formed in narrow syntax, since a bound-variable interpretation requires one extra cross-modular operation and a coreferential interpretation requires two, thus demanding extra computational resources. These other types of dependencies are possible only if no cheaper options are available; otherwise the economy hierarchy would be violated. Syntactic chain formation is therefore the only possible way of establishing a referential dependency between two local elements, if the feature-checking operation is available in a language and the alternative options (bound-variable and coreferential interpretation) do not yield a different interpretation. Thus, according to Reuland (2001), (7a) is ruled out, because it violates the economy hierarchy that favors Chain formation (as in (7b)) over referential dependencies created outside narrow syntax.
(7) a. *Maria_i voelde haar_i wegglijden (bound-variable/coreference: Maria, haar)
Mary felt her slide-away
b. Maria_i voelde zich_i wegglijden (Chain formation: Maria, zich)
Mary felt SE slide-away
To conclude, REFL and PoB differ in one important aspect from the GB approach to binding. REFL and PoB treat the ungrammaticality of (8a) and (8b) as having a different source, whereas the GB approach considers them both Principle B violations in that the pronoun hem ‘him’ should be free in its governing category.
(8) Pronouns in transitive versus ECM constructions
a. *De jongen_i heeft hem_i aangeraakt
The boy has him touched
*‘The boy_i touched him_i’
b. *De jongen_i zag hem_i dansen
The boy saw him dance
*‘The boy_i saw him_i dance’
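The economy hierarchy introduced earlier in this section can be sketched as a simple cost comparison. This is an illustrative sketch only: the function `licensed`, the route labels, and the way availability is encoded are assumptions made here, not part of Reuland's (2001) proposal. The costs encode only what the text states, namely that a bound-variable dependency needs one extra cross-modular operation and a coreferential dependency needs two, relative to Chain formation.

```python
# Extra cross-modular operations relative to Chain formation, as stated in the text.
EXTRA_COST = {"chain": 0, "bound_variable": 1, "coreference": 2}


def licensed(route, routes_offered, yields_different_interpretation=False):
    """A route is licensed if it is the cheapest one on offer, or if it
    yields an interpretation that the cheaper routes cannot express."""
    if yields_different_interpretation:
        return True
    return all(EXTRA_COST[route] <= EXTRA_COST[other] for other in routes_offered)


# Assumption: for a local dependency, Dutch offers Chain formation
# (through the SE-anaphor zich) next to the two costlier routes.
dutch_local = ["chain", "bound_variable", "coreference"]

print(licensed("chain", dutch_local))           # True: zich, as in (7b)
print(licensed("bound_variable", dutch_local))  # False: haar, as in (7a)
print(licensed("coreference", dutch_local))     # False: haar, as in (7a)
```

Under these assumptions, a pronoun in a local configuration is blocked not because it violates a dedicated principle but because a cheaper way of expressing the same dependency exists in the language.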
According to REFL, only (8a) involves a Condition B violation: aangeraakt ‘touched’ is interpreted reflexively, but it is not reflexive-marked. In addition, (8a) violates the chain condition, since the pronoun hem is fully referential. The sentence in (8b) only violates the chain condition; Condition B does not apply, since de jongen ‘the boy’ and hem ‘him’ are not coarguments of the same verb. PoB also makes a distinction between (8a) and (8b), but it differs from REFL in one important respect: the economy hierarchy of establishing referential dependencies. According to Reuland (2001), (8a) and (8b) are ruled out since they violate the economy hierarchy: dependencies created in narrow syntax are cheaper than dependencies created outside narrow syntax. The difference between (8a) and (8b) under PoB is that in (8a) (and not in (8b)) de jongen and hem are coarguments and thus there is an arity-reduction violation. Arity reduction is not permitted here, since the Dutch verb aanraken ‘to touch’ is not reflexive-marked in the lexicon. The different status of pronouns in ECM sentences (such as (8b)) and transitive sentences (such as (8a)) makes it possible to investigate the roles that different modules of our language system (lexicon, narrow syntax, C-I interface, discourse) play in the interpretation of pronominal anaphora. Importantly, the cross-modular approach to pronominal reference assignment as proposed by Reuland (2001) also has implications for language acquisition and breakdown. This approach enables us to pinpoint the linguistic level or levels at which children’s and agrammatic aphasics’ problems with the interpretation of pronouns occur. Language Acquisition and Agrammatic Aphasia
It has repeatedly been shown that children make more mistakes in interpreting pronouns than in interpreting reflexive elements (e.g., Chien and Wexler 1990; Koster 1993; Philip and Coopmans 1996; Thornton and Wexler 1999). At first sight, children’s acceptance of the local subject as the antecedent of the pronoun could be interpreted as the result of children’s failure to acquire Principle B. Alternatively, both Principle A and Principle B may be innate, but binding domains may be parameterized (Wexler and Manzini 1987). Concretely, children may initially consider the VP as the relevant binding domain of pronouns, allowing them to be bound by the local subject freely. Children stop allowing this after they extend the binding domain of pronouns to IP (McKee 1992). Grodzinsky et al. (1993) studied the interpretation of pronouns and reflexives by agrammatic aphasic patients and found the same pattern in their experiment with aphasics that Chien and Wexler (1990) had found with children: the interpretation of reflexives did not cause any problems, whereas the interpretation of pronouns in transitive sentences was at chance level. Interestingly, both Chien and Wexler (1990) and Grodzinsky et al. (1993) showed that children and agrammatic aphasics did not violate Principle B of the (standard) Binding Theory. They found that when the local subject was quantified, as in (9a), both populations scored well above chance level.
(9) a. Every boy touched him (above chance performance)
b. The boy touched him (chance performance)
This suggests that children and agrammatic aphasics do not have difficulties with binding but with coreference (see note 5). Local coreference rests on the interpretation of pronouns as free variables. It is not completely impossible in languages such as English and Dutch, but it is limited to special contexts (see note 6). Apparently children and agrammatics allow local coreference in contexts where non-brain-damaged adults reject it. Grodzinsky and Reinhart (1993) have proposed that local coreference is constrained by a syntax-discourse interface principle called Rule I. This principle states that local coreference is possible only if it yields a different interpretation than the bound-variable construal. Grodzinsky et al. (1993) further claim that children and agrammatic aphasics allow local coreference in sentences like (9b) because they have trouble applying Rule I. This rule requires the speaker to keep two different possible interpretations of the same sentence in short-term memory (the coreference and the bound-variable interpretation) in order to check whether applying coreference yields a different interpretation than variable binding. Grodzinsky et al. (1993) argue that children and agrammatic patients often fail to do so because of their more limited processing resources. As a result, they resort to guessing in order to determine the reference of the pronoun. Unlike referential DPs, quantified DPs cannot establish coreference relations with pronouns; they can only bind them.
Since a binding construal of (9a) leads to a Principle B violation, children and agrammatic aphasics reject a reflexive interpretation.

This account of the errors with pronouns is not complete, since it is mainly based on the interpretation of pronouns in simple transitive sentences. Recently, Philip and Coopmans (in Dutch and English, 1996) and Baauw (in Spanish, 2002) also investigated children’s interpretation of pronouns in ECM constructions. Philip and Coopmans (1996) found that 5-year-old Dutch children performed at chance level on sentences such as (10a), but far below chance level (only 20 percent correct) on sentences like (10b).

(10) a. De jongen heeft hem aangeraakt
        The boy has him touched
        ‘The boy touched him’
     b. De jongen zag hem dansen
        The boy saw him dance
        ‘The boy saw him dance’

Baauw (2002) also found a strong contrast in performance between simple transitive sentences and ECM sentences for Spanish children. The difference between the two types of sentences is that in ECM sentences only the Chain condition (Reinhart and Reuland 1993)7 prevents the pronoun from being bound by the main clause subject. Condition B is irrelevant. Philip and Coopmans (1996) proposed that as a result of an incomplete morphosyntactic acquisition Dutch children often misanalyze third-person pronouns as [−R] elements, treating them as SE-anaphors such as Dutch zich (see Baauw 2002 for a similar but not identical proposal). When the pronoun is interpreted as a [−R] element, the reflexive interpretation of (10b) becomes grammatical, but not the reflexive interpretation of (10a), since this interpretation would still violate Condition B.

To summarize, the errors that children and agrammatic speakers make with pronouns in simple transitive sentences have been attributed to (i) a problem in the syntax-discourse interface, not in syntax proper, and have been proposed to result from (ii) a language-processing problem, not from incomplete acquisition or loss of knowledge. The former explains why neither children nor agrammatic aphasics violate syntactic principles such as Principle B, as is shown by their good performance on sentences with quantified subjects. The higher error rate on ECM constructions could be related to the incomplete acquisition of morphosyntactic features and thus have a morphosyntactic source. We would like, however, to propose an alternative account, and to show that it is also possible to explain these data (and our own data) assuming a lack of processing resources for both children and agrammatic aphasic speakers.

The aim of the present study is to investigate the nature of the difficulties with the interpretation of pronouns in both simple transitive and ECM sentences.
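The contrast between (10a) and (10b) under the feature-misanalysis account can be laid out as a small decision table. The Python sketch below is only an illustration under simplifying assumptions: it reduces Condition B to a check for coargumenthood and the Chain condition to a check for a [+R] pronoun. The function name and the boolean encoding are hypothetical and are not part of Reinhart and Reuland's (1993) formulation.

```python
from typing import List


def conditions_blocking_local_binding(coarguments: bool, pronoun_plus_r: bool) -> List[str]:
    """List the conditions that rule out binding of the pronoun by the local subject.

    Simplifying assumptions (illustrative only):
    - Condition B applies whenever pronoun and subject are coarguments.
    - The Chain condition applies whenever the pronoun is analyzed as [+R].
    """
    blocking = []
    if coarguments:
        blocking.append("Condition B")
    if pronoun_plus_r:
        blocking.append("Chain condition")
    return blocking


# (10a) is a simple transitive (coarguments); (10b) is ECM (not coarguments).
CONSTRUCTIONS = {"transitive (10a)": True, "ECM (10b)": False}

for name, coarguments in CONSTRUCTIONS.items():
    for plus_r in (True, False):
        blocking = conditions_blocking_local_binding(coarguments, plus_r)
        feature = "[+R]" if plus_r else "[-R]"
        verdict = "blocked by " + " and ".join(blocking) if blocking else "allowed"
        print(f"{name}, pronoun {feature}: reflexive reading {verdict}")
```

On this simplified encoding, analyzing the pronoun as [−R] removes the only obstacle in the ECM case, while Condition B still blocks the reflexive reading in the transitive case.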
We will present experimental results of children and agrammatic speakers of Dutch, Spanish, and Italian. The reason for this comparative approach is that the acquisition data show interesting cross-linguistic differences. It has been reported that Spanish and Italian 5-year-olds do not show a delay in Principle B effect in simple transitive sentences like (11), performing almost like adults (Baauw, Escobar, and Philip 1997; McKee 1992).

(11) La niña la señala
     the girl her touches
     ‘The girl is touching her’

Baauw (2002) argues that the absence of the delay in Principle B effect in the Romance languages is due to the fact that Romance weak pronouns, being syntactic clitics, cannot establish local coreference relations (see Baauw 2002 and Baauw and Cuetos 2003 for details). This entails that the reflexive interpretation of (11) always involves binding, which then leads to a violation of Principle B. If the cause of the difficulties with pronouns is similar in children and agrammatic aphasics (i.e. as has been proposed: a lack of processing resources, Grodzinsky et al. 1993), it is expected that the same cross-linguistic differences found in children will also show up in agrammatic aphasics. In other words, as reported in earlier studies, Spanish and Italian children are not expected to show a delay in Principle B effect, and Spanish agrammatic speakers are expected to perform above chance level with the interpretation of pronouns in transitive sentences such as (11). Crucially, we expect to find a difference in the performance of children and agrammatic speakers of all three languages on pronouns in transitive and ECM constructions, where the latter will be more problematic (in line with Philip and Coopmans 1996 and Baauw 2002). These data will be used to distinguish between Government and Binding on the one hand and Reflexivity and Primitives of Binding on the other hand and to show which of the theories, in our view, can account best for the experimental results.

Methods

Participants
We examined 28 Dutch children (ages 4;3–6;2, mean age 5;7), 15 Dutch non-brain-damaged adults (ages 35–84, mean age 56;6) and 8 Dutch agrammatic aphasic speakers (ages 38–82, mean age 56;8). We also examined 38 Spanish children (ages 5;1–5;11, mean age 5;6), 19 non-brain-damaged adults (ages 19;0–22;0, mean age 21)8, and 3 agrammatic aphasic speakers (ages 58;7–68;6, mean age 63;10), as well as 24 Italian children (ages 4;0–5;7, mean age 4;7) and 24 Italian adults (ages 21;8–23;6, mean age 23). The Dutch aphasic speakers were examined with the standard Dutch aphasia examination battery, the Aachen Aphasia Test (AAT, Graetz et al. 1992), before this study. All the patients except one (a non-fluent, non-classifiable patient) were classified as Broca’s aphasics. The Spanish patients were examined with a Spanish version of the Boston Diagnostic Aphasia Examination (BDAE) and were all classified as agrammatic Broca’s aphasics. All aphasic patients spoke agrammatically as determined by the characteristics of agrammatism proposed by Menn and Obler (1990): low number of (finite) verbs, omission of pronouns and determiners, short utterances. Only patients for whom both the speech therapist and an experienced aphasiologist agreed on the diagnosis of agrammatism were included in the study.
Materials and Procedure
The Dutch and the Italian participants were tested with a picture-selection task, whereas the Spanish participants were examined with a truth-value judgment task (TVJT). The reason for using TVJT in the Spanish study was that one of the goals of the experiment was to investigate whether Spanish agrammatic speakers show the same high performance on (11) as was found (with TVJT) for Spanish children by Baauw (2002). Since there is evidence that picture-selection tasks lead to better performance across the board (Baauw et al., submitted), the use of this task would make it impossible to determine whether possible near perfect performance on (11) is due to similar performance of children and agrammatics or to properties of the task. Both tasks consisted of more conditions than discussed here, eight in all. The other conditions included filler sentences as well as sentences with pronouns in subject or object position, with or without contrastive stress. Results on these conditions are discussed in Zuckerman et al. 2002 and in Ruigendijk et al. 2002. Both methods are described below.

Dutch and Italian Picture-Selection Task
The picture-selection task was designed as follows: Each item consisted of two conjoined sentences, always starting with a sentence like ‘First the boy and the man ate something and ...’ This first conjunct was followed by a sentence of one of the conditions. An example of the Dutch and Italian conditions (with an English translation) is given in (12) and (13). Note that for Dutch we included reflexives in transitive sentences (12a) so as to be able to compare our data to patterns found in earlier studies,9 such as that of Grodzinsky et al. (1993). In a later study with four agrammatic speakers, we also tested reflexives in ECM sentences such as (12e) in order to control for a possible sentence complexity effect (see Ruigendijk et al. 2006).10 This condition is reported here as well.

(12) Dutch Conditions
a. reflexives in transitive sentences:
   ... daarna heeft de man zichzelf geknepen
   ... then has the man himself pinched
   ‘... then the man pinched himself’
b. pronouns in transitive sentences:
   ... daarna heeft de man hem geknepen
   ... then has the man him pinched
   ‘... then the man pinched him’
c. pronouns in ECM sentences:
   ... daarna zag de man hem voetballen
   ... then saw the man him playing soccer
   ‘... then the man saw him playing soccer’
d. filler sentences:
   ... daarna hebben ze gevoetbald
   ... then have they played soccer
   ‘... then they played soccer’
e. reflexives in ECM sentences:
   ... daarna zag de man zichzelf voetballen
   ... then saw the man himself playing soccer
   ‘then the man saw himself playing soccer’

(13) Italian Conditions
a. pronouns in transitive sentences:
   ... poi l’uomo l’ha pizzicato
   ... then the man him has pinched
   ‘... then the man pinched him’
b. pronouns in ECM sentences:
   ... poi l’uomo l’ha visto giocare a calcio
   ... then the man him has seen play soccer
   ‘... then the man saw him playing soccer’
c. reflexives in ECM sentences:
   ... poi l’uomo si è visto giocare a calcio
   ... then the man himself is seen play soccer
   ‘... then the man saw himself playing soccer’
d. filler sentences:
   ... poi hanno giocato a calcio
   ... then have played soccer
   ‘... then they played soccer’

The examiner, who was a native speaker of the language of the task, read the sentences, while the subject was presented with four pictures on two A4-sized sheets of paper for each item. (See figure 6.1 for some examples of the pictures.) One picture on the left side represented the first conjunct of the sentence, introducing the possible antecedents. On the right side, three pictures were presented, one depicting the action in the sentence, one depicting a direct distracter (a reflexive interpretation in the case of a pronoun condition and vice versa), and one depicting an indirect distracter with the same actors but a different action. The subject was asked to listen carefully and choose the picture that fit the sentence best.

For the Dutch control group and the Dutch aphasic group we included 15 items per condition. Together with the four other conditions (included for another study on the effect of contrastive stress) this resulted in a battery of 120 items.
To be able to examine this large number of items, the agrammatic speakers were tested in two sessions with one or two weeks between them. For the Dutch children we included six items and they were tested twice with the same test, resulting in 12 testing points per condition. The Italian children and controls were tested with six items per condition, and could unfortunately be tested only once (due to limited availability of the participants). For the Dutch and the Italian children the task was presented as a game, using a puppet. (See figure 6.1.)

Figure 6.1
An example of the picture-selection task. Testing ECM: “First the boy and the man ate and then the man saw him playing soccer.” The middle picture on the right depicts the correct response.

Spanish Truth-Value Judgment Task
The truth-value judgment task differed only in the way the items were presented. Again each item was introduced by a sentence (Primero la mujer y la niña rieron ... ‘First the woman and the girl were laughing ...’), with a picture. The experimental sentence was presented with another picture. The participant was asked to judge whether the sentence that was presented described the picture.11 We included as many ‘yes’ items as ‘no’ items, and each sentence occurred in both ‘yes’ and ‘no’ conditions. Six items were included per condition (see (14) for an example of each condition). The test consisted of 60 items in total.

(14) Spanish Conditions
a. reflexives in transitive sentences:
   ... la niña se pellizcó
   ... the girl SE pinched
   ‘... the girl pinched herself’
b. pronouns in transitive sentences:
   ... la niña la pellizcó.
   ... the girl her pinched.
   ‘... the girl pinched her’
c. reflexives in ECM sentences:
   ... la mujer se vio reir
   ... the woman SE saw laughing
   ‘... the woman saw herself laughing’
d. pronouns in ECM sentences:
   ... la niña la vio bailar
   ... the girl her saw dancing
   ‘... the girl saw her dancing’
e. Filler
   ... la niña pellizcó al hombre
   ... the girl pinched acc.-the man
   ‘... the girl pinched the man’

For the children the experiment was presented as a game (with a puppet that “produced” the experimental sentences), and the examiner was a native speaker of Spanish. During testing it was noted whether the participant said ‘yes’ or ‘no’ to the experimental sentence. For additional details on this method, see Baauw and Cuetos 2003.

Results
Table 6.1 shows the results of the experiments. Both children and agrammatic aphasics made significantly fewer errors with the interpretation of pronouns in simple transitive sentences than in ECM constructions. This held for all languages that were examined (Wilcoxon Signed Ranks Test, Dutch agrammatic aphasics: Z = −2.379, p < .05; Spanish agrammatic aphasics: χ² = 18.778, p < .001; t-test for: Dutch children t(27) = 7.018, p < .001; Spanish children t(37) = 5.281, p < .001; Italian children t(23) = 3.601, p < .001). Moreover, both populations performed worse than their control group with regard to the interpretation of pronouns in general (see table 6.2). Children and agrammatic speakers scored at ceiling in the interpretation of filler sentences.

Table 6.1
Percentages correct on the different conditions. n.a. = not available. For the Spanish data, only scores on the ‘no’ conditions are reported.

                      Transitive sentence     ECM construction
Language  Group       Reflexive   Pronoun     Reflexive   Pronoun     Fillers
Dutch     Controls     98.7        98.2       100a         97.8       100
          Children     92.3        75.0       n.a.         46.7        99.7
          Aphasics     93.3        91.7        96.7b       56.7        99.2
Spanish   Controls    100          96.3        94.4        96.3       100
          Children     88.6        90.4        86.0        59.7        94
          Aphasics     83.3        88.7        83.3        11.3       100
Italian   Controls    n.a.         98.6       100          99.3       100
          Children    n.a.         78.5        75.7        57.6        97.2

a. Data from another control group of 13 Dutch speakers (mean age 42;6).
b. Data from four of the eight agrammatic subjects.

There are some differences between the groups. Dutch aphasics, for example, did not perform differently than the Dutch control group on the interpretation of reflexive elements, whereas Dutch children scored lower than the adult control group on the interpretation of reflexive elements. Spanish children did not perform differently from the control group on the condition with pronouns in transitive and reflexives in ECM constructions, whereas Italian children performed differently from adults on all three conditions. The unexpected low score on pronouns in transitive sentences by Italian children (78.5 percent) may be caused by their low mean age. When looking at the performance of the 5-year-olds only, percentage correct on pronouns in transitive sentences becomes 87.5 percent, which is comparable to the Spanish data. Another important finding is that the Spanish data of the children and aphasics show that there is no difference in the interpretation of reflexives in transitive and ECM sentences (children: t(37) = 0.650, p = .52). The Italian children exhibit a significantly different performance between reflexives and pronouns in ECM sentences, the latter being the most difficult to interpret (t(23) = −3.535, p < .001). Finally, four Dutch agrammatics made only two errors in the interpretation of reflexives in ECM sentences, whereas these patients performed at chance level on the interpretation of pronouns in ECM sentences (and made significantly more errors than on reflexives in ECM sentences, χ² = 23.130, p < .001).

Table 6.2
Significant comparison between each group vs. controls for each condition, with Mann-Whitney U test. n.a. = not available.

                     Transitives                                   ECM
Group vs. controls   Reflexive              Pronouns               Reflexive              Pronouns
Dutch aphasics       Z = –1.676, p = .19    Z = –2.586, p < .05    Z = –2.098, p = .214   Z = –4.194, p < .01
Dutch children       Z = –3.225, p < .01    Z = –4.357, p < .01    n.a.                   Z = –5.206, p < .01
Spanish aphasics     Z = –4.472, p < .01    Z = –2.561, p < .01    Z = –1.906, p = .06    Z = –3.635, p < .01
Spanish children     Z = –2.078, p < .05    Z = –1.226, p = .220   Z = –1.286, p = .199   Z = –3.970, p < .01
Italian children     n.a.                   Z = –4.670, p < .01    Z = –5.358, p < .01    Z = –6.042, p < .01

Discussion
Interpreting pronouns in ECM sentences such as (15b) is more difficult than interpreting pronouns in simple transitive sentences such as (15a) for children and agrammatic speakers in the languages we tested.

(15) a. transitive: the boy tickles him
     b. ECM: the boy sees him dancing

This supports earlier results on these types of sentences reported by Philip and Coopmans (1996) and Baauw (2002). This difference in performance cannot be explained by differences in sentence complexity. As the Italian and Spanish results and the results from the Dutch agrammatics show, reflexives in ECM sentences are, unlike pronouns, relatively unproblematic, although the two types of elements occupy the same structural position. Since the different performance on (15a) and (15b) is not caused by a complexity effect, there must be a difference in the principles that govern the interpretation of pronouns in transitive and ECM sentences, as proposed by Reinhart and Reuland (1993)
and Reuland (2001) but not by Chomsky’s Government and Binding theory (1981).

Before we discuss the possible explanations for this pattern, we would like to indicate that there is a subtle difference between our results and results from earlier studies on Germanic languages. Recall that Grodzinsky et al. (1993) and Chien and Wexler (1990) found that their participants performed at chance level on the interpretation of pronouns in simple transitive sentences. Our Dutch subjects performed well above chance level for this condition. We think this can be explained by comparing the different methodologies. Studies examining pronoun comprehension by using a truth-value judgment task (TVJT) in general elicit lower scores than studies that report data from a picture-selection task (PST). In a TVJT (which was used in Grodzinsky et al. 1993 and in Chien and Wexler 1990) subjects have to process a sentence and check whether there is a possible interpretation of that sentence that fits the picture, that is, they have to check all legal and illicit but theoretically possible interpretations of the sentence that is depicted. This is not necessary for a PST, where a correct picture and a correct interpretation of the sentence is always given. In this task, the subjects can avoid the more difficult and illicit coreference interpretation; therefore their performance will be better than on a TVJT.12

Let’s consider the different explanations for the results obtained in this and earlier studies. As a complexity-based explanation can be ruled out on the basis of the results with reflexives in ECM constructions, our results cannot be explained in terms of GB. Both REFL and PoB are compatible with the obtained pattern. According to Reinhart and Reuland’s Reflexivity (1993), the interpretation of pronouns in transitive sentences is guided by two properties, coargumenthood (i.e. condition B) and referentiality of the pronoun (i.e. 
the chain condition), whereas in ECM constructions a reflexive interpretation of the pronoun is ruled out by the referentiality of the pronoun only. This indicates that the chain condition must play an important role, since it is this condition that makes the difference between the two sentence types if it is somehow affected or not applied. In transitive sentences but not in ECM sentences, a reflexive interpretation of a pronoun can still be ruled out on the basis of coargumenthood (Condition B). In earlier studies, it has been proposed that children interpret pronouns as −R, because they have not yet fully acquired the features of the pronouns (Philip and Coopmans 1996; Baauw 2002). This, then, has consequences for the interpretation of pronouns in ECM sentences. When children interpret the pronoun as −R, the chain condition is not violated, and a referential dependency between the pronoun and the matrix subject becomes possible in ECM sentences, whereas it is still ruled out by Condition B in simple transitive sentences. Although this may be an explanation for the problems children have with ECM sentences, there is no independent evidence, and no reason to assume, that agrammatic speakers as a result of their brain damage unlearn or lose a specific syntactic feature (e.g. the number feature) and interpret pronouns as −R. To our knowledge there is no evidence that they lack a specific feature, such as number, gender, or person.13 There must be another reason why they fail to interpret pronouns in ECM sentences.

Alternative Explanation
We would like to propose an alternative explanation that can account for both child language data and the agrammatic aphasics’ results. We believe the cross-linguistic, cross-population results are better accounted for by a lack of processing resources. Accounts based on the assumption that agrammatic speakers do not have enough processing resources to be able to produce complete sentences or to comprehend complex sentences have been proposed by many (e.g., Kolk and Heeschen 1992). Crucially, evidence from on-line processing studies for this type of account has been provided recently by Burkhardt, Piñango, and Wong (2003), Zurif (2003), and Swinney (2003), who all have used a cross-modal lexical-priming design to examine the reactivation of moved elements in sentences. The rationale behind this paradigm is that moved elements are reactivated at their gap and that this reactivation can be measured with a priming effect, i.e. non-brain-damaged speakers are faster in lexical decision to a word that is related to the moved constituent at the gap, compared to an earlier or later position in the sentence. Agrammatic speakers do not show this reactivation of the moved element at the gap (Zurif et al. 1993). However, these recent studies all have found that agrammatic aphasics have the same activation pattern as non-brain-damaged controls, but some milliseconds later. In other words, their syntactic system is slow (cf. Piñango 1999).

Studies examining sentence processing in children using on-line techniques have recently provided evidence that goes in the same direction. In an ERP study, Hahne and Friederici (2001) examined children’s processing of syntactically anomalous sentences. They found that children process these sentences in a similar fashion to adults; nevertheless, this process is more laborious for children than it is for adults. Another interesting on-line study was conducted by Sekerina et al. (2003), who examined children’s processing of referentially ambiguous pronouns in an eye-movement study. As already discussed, in off-line studies, such as ours, children overwhelmingly prefer the sentence internal (local) antecedent for the pronoun, even when this is not a legal option. Sekerina et al. (2003) found that children are implicitly aware of the referential ambiguity of the pronoun: they know that the pronoun can refer to a sentence external referent. The knowledge is there but its use lags behind because their system has not yet reached its optimal capacity to be able to use this knowledge in real
time. Children, just like agrammatic aphasic speakers, seem to lack the processing resources that are needed to use the linguistic operations on time. Interestingly, a lack of processing resources has already been used to account for problems with establishing pronominal reference by other authors (e.g. Grodzinsky et al. 1993; Piñango and Burkhardt 2001) as we have seen in the introduction. Grodzinsky et al. (1993) argued that both agrammatic aphasics and young children have problems computing Rule I due to a lack of processing resources.

The next question to answer is how a lack of processing resources can account for the different performance on the two structures. Primitives of Binding can distinguish between the two sentence types under discussion. This distinction is based on the fact that in ECM sentences the pronoun and its antecedent are not coarguments, whereas they are in transitive sentences (as in Refl). Apart from this, the economy hierarchy plays an important role. Syntactic operations are cheaper for the language processor than extra-syntactic operations (e.g. bound variable or coreference) and are applied first. Only if syntactic operations are not available can other operations be applied. So, normally when the language system “encounters” a pronoun like ‘him’ in (15b), it notices its referential deficiency and “wants” to find an antecedent. The system immediately tries to establish a dependency in Narrow Syntax, which is not possible for pronouns because of their number feature. Then other operations (bound variable and coreference) become available for establishing a dependency. However, in our sentences (15a) and (15b), local dependencies are still ruled out, because these can in principle be established in Narrow Syntax. Thus there is a cheaper option available (i.e. cheaper than bound variable and coreference), and ‘the boy’ in (15a) and (15b) will not be available as an antecedent for the pronoun ‘him’.
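The reasoning so far, together with the proposal developed in the remainder of this section, can be summarized as a toy decision procedure. The Python sketch below is a hedged illustration, not an implementation of Primitives of Binding: the function name, the two boolean parameters, and the all-or-nothing outcome are assumptions made for exposition, whereas the reported error rates are graded rather than categorical.

```python
def local_subject_available(coarguments: bool, syntax_on_time: bool) -> bool:
    """Toy model: can the local subject serve as antecedent of the pronoun?

    coarguments    -- True for simple transitives such as (15a), False for ECM (15b)
    syntax_on_time -- True for the unimpaired system, False when syntactic
                      operations arrive too late (assumed for children and
                      agrammatic speakers)
    """
    # Step 1: narrow syntax is tried first, but the pronoun's number feature
    # blocks a syntactic dependency in both constructions.
    # Step 2: extra-syntactic routes (bound variable, coreference) are considered.
    if syntax_on_time:
        # The economy check can be run: a local dependency could in principle
        # be encoded in narrow syntax, so the costlier routes are rejected.
        return False
    if coarguments:
        # Syntax is late, but lexical restrictions on the verb (no arity
        # reduction) still protect the transitive case in this simplification.
        return False
    # Syntax is late and nothing else intervenes: the ECM case lets an
    # extra-syntactic dependency with the local subject through.
    return True


for label, coarguments in (("transitive (15a)", True), ("ECM (15b)", False)):
    for on_time in (True, False):
        outcome = local_subject_available(coarguments, on_time)
        print(f"{label}, syntax on time={on_time}: local subject available={outcome}")
```

Under these assumptions, only the combination of an ECM construction and delayed syntax yields the reflexive reading, which is the qualitative asymmetry the account is meant to capture.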
Note that to be able to rule out a reflexive reading of both (15a) and (15b) syntactic operations should be available on time, otherwise there is no way for the system to find out that there is a cheaper option available and rule out extra-syntactic dependencies. Suppose now (as has been argued before) that for children and agrammatic speakers syntactic operations are not available on time because of their limited processing resources (as has been shown in several on-line studies). This would make other extra-syntactic ways of establishing dependencies, such as bound variable interpretation (C-I interface) or coreference (discourse), possible in environments where they are normally ruled out. What happens if the language system of children and agrammatics encounters a pronoun? First, the number feature blocks the local dependency through narrow syntax, as it does in the “normal” system. Then other operations become available for establishing a dependency. Since the syntactic operations are not ready on time, the
economy hierarchy will not be violated. Or in other words, because syntax is too slow, there is no way for the system to check whether cheaper options would in principle be available and therefore rule out bound variable or coreference dependencies. Crucially, pronouns in transitive sentences cause fewer problems because of lexical restrictions on the verb, which prohibit arity reduction (i.e. the dependency between the subject and the object of the same predicate will result in reducing the number of semantic arguments from two to one. Thus, a two-place predicate will end up having only one argument; such arity reduction however is not allowed for transitive verbs (in most cases)). These restrictions do not apply in ECM constructions (the pronoun and subject are not coarguments) and since syntax is not ready on time, an extra-syntactic referential dependency through discourse, or at the C-I interface between ‘him’ and ‘the boy’ in (15b) becomes possible for agrammatic speakers and children. All featural specifications and knowledge about when syntactic dependencies can and cannot be established are preserved (syntactic knowledge in general is available), but a reduced capacity to use syntax often leads children and agrammatic speakers to code dependencies outside (narrow) syntax. Note that the good performance of agrammatic speakers and children on pronouns with a quantified subject (e.g. Chien and Wexler 1990; Grodzinsky et al. 1993) can also be accounted for with our alternative explanation. These cases require a bound-variable dependency, which is relatively cheap for agrammatics and children (as compared to syntactic dependencies). In summary, we propose that the observed pattern in child and agrammatic comprehension is not the result of a lack of knowledge, but of the lack of processing resources that are needed to carry out syntactic computations in real time. Syntactic knowledge as such is not impaired, as is shown by e.g. 
the (relatively) good performance on the interpretation of pronouns in transitive sentences. However, whenever a child or an agrammatic aphasic fails to use syntax to convey meaning, s/he will resort to extra-syntactic ways to interpret or produce language, which can result in an aberrant interpretation or production pattern.

Acknowledgments
This publication was supported by the project “Comparative Psycholinguistics,” which is funded by the Dutch Organization for Scientific Research (NWO). The authors wish to thank the rehabilitation centers De Hoogstraat in Utrecht, de Trappenberg in Huizen (Netherlands), and ADACEN in Pamplona and Tudela (Navarra, Spain) for the opportunity to examine patients. We would also like to thank Fernando Cuetos (University of Oviedo), Gerardo Aguado (University of Pamplona), and Loli García (ADACEN) for their help in organizing the Spanish agrammatism study and Maria Teresa Guasti for her help with the Italian study. We thank the teachers, parents, and children of Colegio Aldeafuente, Madrid and Collegio San Carlo di Milano.

Notes

1. Where α is bound by β if β c-commands α and is coindexed with it; and γ is a governing category for α if and only if γ contains α, a governor for α, and an accessible subject.

2. In fact, since inherently reflexive predicates are one-place predicates and gedragen ‘behave’ has no transitive counterpart, the use of a SELF-anaphor would violate the theta-criterion.

3. Reuland (2001) argues that [number] on hen ‘them’ cannot be recovered because the plurality of hen in (a) may in principle refer to another group of entities than the plurality expressed by de jongens ‘the boys’. Similarly, the singularity of hem in (b) may be a different one than the singularity expressed by de jongen ‘the boy’.
   a. *De jongens_i hoorden [hen_i zingen].
      the boys heard them sing
   b. *De jongen_i hoorde [hem_i zingen].
      the boy heard him sing

4. For a more detailed discussion of the hierarchy, see Reuland 2001, paragraph 5.

5. As observed by Bloom et al. (1994), Principle B delays do not show up in spontaneous productions of children. This supports the claim that children observe Principle B. Apparently, when children want to express reflexivity, they will normally use reflexive pronouns. Only when they are “forced” to judge the possibility of local coreference, as in an experimental setting, do they often allow this reading.

6. Local coreference is possible. Examples:
   Do you know what Mary and John have in common? Mary admires him and John admires him too.
   A: Is this speaker Zelda? B: How can you doubt it? She praises her to the sky. No competing candidate would do that. (Reinhart and Reuland 1993)

7. Or the economy hierarchy on referential dependencies in terms of Primitives of Binding (Reuland 2001).

8. As one reviewer pointed out, it would have been better if we had compared the results of the Spanish agrammatic speakers with age-matched controls. Currently, we are running a new experiment, with partly the same conditions, with elderly non-brain-damaged subjects. Preliminary results show no difference on these conditions between elderly and younger non-brain-damaged speakers.

9. We chose not to include this condition for Italian. Earlier studies showed that Italian children do not have any problems interpreting reflexives in transitive sentences (McKee 1992). Therefore, to avoid an overload of test items for the children, we decided not to include this condition in the Italian version.
10. Unfortunately, the Dutch children were not available for examination of this condition anymore. But see Philip and Coopmans (1996) and Baauw (2002), who also examined sentences of this type.
11. In the child experiment the task was disguised as a guessing game, in which one experimenter, who could not see what was happening in the picture presented to the child by a second experimenter, who acted as a “helper,” had to “guess” what was happening in the picture (e.g. “Hmm ... a girl, a woman and a big mirror. Did the girl see her dance?”). The child had to judge whether the guess was correct or not.
12. See also Crain and Thornton 1998 for a discussion of the different methods. Moreover, preliminary results from Zuckerman and Baauw show that Dutch children, when presented with the same item but different methods, indeed make many more errors with the truth-value judgment task than in the picture-selection task.
13. In fact, preliminary results by Vasić and Ruigendijk (2004) show the opposite: (Dutch) agrammatic speakers are perfectly able to use syntactic features like gender and number for the interpretation of pronouns.
References
Avrutin, S. 1999. Development of the Syntax-Discourse Interface. Kluwer.
Avrutin, S., Lubarsky, S., and Greene, L. 1999. Comprehension of contrastive stress by agrammatic Broca’s aphasics. Brain and Language 70, 163–186.
Avrutin, S., and Thornton, R. 1994. Distributivity and binding in child grammar. Linguistic Inquiry 25, 165–171.
Baauw, S. 2002. Grammatical Features and the Acquisition of Reference: A Comparative Study of Dutch and Spanish. Routledge.
Baauw, S., Escobar, L., and Philip, W. 1997. A delay of principle B effect in Spanish speaking children: The role of lexical feature acquisition. In A. Sorace, C. Heycock, and R. Shillcock (eds.), Proceedings of GALA ’97. Human Communication Centre.
Baauw, S., and Cuetos, F. 2003. The interpretation of pronouns in Spanish language acquisition and breakdown: Evidence for the ‘Delayed Principle B Effect’ as a nonunitary phenomenon. Language Acquisition 11, 219–275.
Baauw, S., Zuckerman, S., Ruigendijk, E., and Avrutin, S. Submitted. Principle B Delays as a processing problem: Evidence from task effects. To appear in M. Grimm, E. Ruigendijk, and C. Hamann (eds.), Production-Comprehension Asymmetries in Child Language.
Bloom, P., Barss, A., Nicol, J., and Conway, L. 1994. Children’s understanding of binding and coreference: Evidence from spontaneous speech. Language 70, 53–71.
Burkhardt, P., Piñango, M., and Wong, K. 2003. The role of the anterior left hemisphere in real-time sentence comprehension: Evidence from split intransitivity. Brain and Language 86, 9–22.
Chien, Y.-C., and Wexler, K. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1, 225–295.
Chomsky, N. 1981. Lectures on Government and Binding. Foris.
Crain, S., and Thornton, R. 1998. Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. MIT Press.
Graetz, P., De Bleser, R., and Willmes, K. 1992. Akense Afasie Test. Swets en Zeitlinger.
Grodzinsky, Y., and Reinhart, T. 1993. The innateness of binding and coreference. Linguistic Inquiry 24, 69–102.
Grodzinsky, Y., Wexler, K., Chien, Y.-C., Marakovitz, S., and Solomon, J. 1993. The breakdown of binding relations. Brain and Language 45, 396–422.
Hahne, A., and Friederici, A. 2001. Development patterns of brain activity. In J. Weissenborn and B. Höhle (eds.), Approaches to Bootstrapping: Phonological, Lexical, Syntactic and Neurophysiological Aspects of Early Language Acquisition, volume 2. John Benjamins.
Kolk, H., and Heeschen, C. 1992. Agrammatism, paragrammatism and the management of language. Language and Cognitive Processes 7, 89–129.
Koster, C. 1993. Errors in Anaphora Acquisition. Doctoral dissertation, Utrecht University.
McKee, C. 1992. A comparison of pronouns and anaphors in Italian and English acquisition. Language Acquisition 2, 21–54.
Menn, L., and Obler, L. 1990. Agrammatic Aphasia: A Cross-Language Narrative Sourcebook. John Benjamins.
Philip, W., and Coopmans, P. 1996. The role of lexical feature acquisition in the development of pronominal anaphora. In W. Philip and F. Wijnen (eds.), Amsterdam Series on Child Language Development, volume 5. Instituut Algemene Taalwetenschap 68, Amsterdam.
Piñango, M. 1999. Syntactic displacement in Broca’s aphasias comprehension. In R. Bastiaanse and Y. Grodzinsky (eds.), Grammatical Disorders in Aphasia: A Neurolinguistic Perspective. Whurr.
Piñango, M., and Burkhardt, P. 2001. Pronominals in Broca’s aphasia comprehension: The consequences of syntactic delay. Brain and Language 79 (1), 167–168.
Reinhart, T., and Reuland, E. 1993. Reflexivity. Linguistic Inquiry 24, 657–720.
Reuland, E. 2001. Primitives of binding. Linguistic Inquiry 32, 439–492.
Ruigendijk, E., Vasić, N., and Avrutin, S. 2002. The comprehension of pronouns and contrastive stress in Dutch agrammatism. Brain and Language 83, 182–184.
Ruigendijk, E., Vasić, N., and Avrutin, S. 2006. Reference assignment: Using language breakdown to choose between theoretical approaches. Brain and Language 96, 302–317.
Sekerina, I. A., Stromswold, K., and Hestvik, A. 2004. How do adults and children process referentially ambiguous pronouns? Journal of Child Language 31, 123–152.
Swinney, D. 2003. Psycholinguistic approaches. Presented at Fourth Science of Aphasia Conference.
Thornton, R., and Wexler, K. 1999. Principle B, VP Ellipsis and Interpretation in Child Grammar. MIT Press.
Vasić, N., and Ruigendijk, E. 2004. Morphosyntactic features do matter. Brain and Language 91 (1), 110–111.
Wexler, K., and Manzini, R. 1987. Parameters and learnability in Binding Theory. In T. Roeper and E. Williams (eds.), Parameter Setting. Reidel.
Zuckerman, S., Vasić, N., Ruigendijk, E., and Avrutin, S. 2002. Experimental evidence for the subject rule. In Proceedings of Israel Association for Theoretical Linguistics 18.
Zurif, E. B. 2003. The neuroanatomical organization of some features of sentence comprehension: Studies of real-time syntactic and semantic composition. Psychologica 32, 31–24.
Zurif, E. B., Swinney, D., Prather, D., Solomon, J., and Bushell, C. 1993. An on-line analysis of syntactic processing in Broca’s and Wernicke’s aphasia. Brain and Language 45, 448–464.
7
Processing or Pragmatics? Explaining the Coreference Delay
Tanya Reinhart
Chien and Wexler (1990) were pioneering in establishing the basic generalization regarding the acquisition of anaphora. Based on experiments with a large number of children (177, aged 2;6 to 7;0), they showed that acquisition delays are found only with coreference, and not with the binding theory in general: Children performed well on the variable-binding aspects of the binding theory, including condition B, but they performed poorly on coreference in condition B environments. This conformed with Reinhart’s (1983) theoretical conclusion that variable binding and coreference are governed by different types of linguistic conditions. The conditions on binding are absolute output conditions, while the conditions on coreference are relative and context dependent. The acquisition question has been why this difference should entail a delay in children’s performance on coreference.
In Reinhart 1983 the condition on coreference was perceived as belonging to pragmatics. It involves an inference based on knowledge of grammar, meaning, and appropriateness to context, and I believed it could be viewed as an instance of Gricean generalized implicatures, another area where poor performance of children has been recently discovered (Chierchia et al. 2001; Gualmini et al. 2001). Chien and Wexler formulated a similar intuition and argued that children’s coreference performance reflects a delay in acquiring the context considerations underlying a pragmatic principle.
Grodzinsky and Reinhart (1993) (henceforth G&R) took a different perspective on this question. Their point of departure was another seminal result of Chien and Wexler’s study. Virtually all studies on the acquisition of coreference have found not just vaguely poor performance, but results ranging around 50 percent adult-like answers. Such figures are, in principle, consistent with chance performance. Chien and Wexler conducted careful statistical analyses, including individual data, and found that many of the children perform at chance level individually, namely they sometimes answer “yes”
and sometimes “no” on the same condition. If so, then this is a pattern of guessing, which is not known to be common in acquisition. G&R argued that the account for the coreference delay should also explain this specific pattern.
G&R’s account rests on a later development of the coreference condition, stated as rule I (Intrasentential Coreference). Whereas in the 1970s and the 1980s everything that was not governed by syntax proper was lumped together as “pragmatics,” in the 1990s the concept of the interface was beginning to emerge. Rule I views the coreference restriction as belonging to the context interface, where all the components required for the coreference inference are available (syntax, semantics, and context). In today’s terminology, rule I is a procedure involving reference-set computation, namely an optimality-type procedure comparing two competing representations. To determine whether coreference is permitted in a given derivation, another representation with a bound variable should be constructed. Coreference is permitted only if the two are not equivalent in the given context. The computation involved in coreference is, thus, more complex than that involved in binding, and G&R argue that it is the computational complexity, rather than just the appeal to context, that explains children’s difficulties in the relevant tasks. The processing puts too big a load on their working memory, which is known to be less developed than that of adults. Failing the execution, they may resort to guessing.
Thornton and Wexler (1999) adopt rule I, under its reformulation in Heim 1998, but they raise several arguments against the processing account, and conclude that there is no reason to assume children have any difficulties in processing reference-set computation of the type required by rule I. They maintain that the coreference delay reflects a pragmatic deficiency, and develop an analysis of the pragmatic factors underlying rule I, which children have not acquired yet.
The broader question underlying this debate is the role of processing considerations in acquisition. This factor has hardly been considered in studies on the acquisition of syntactic competence. However, it has been independently established that working memory limitations exist in children; for extensive surveys of the findings see Gathercole and Hitch 1993 and Gathercole and Baddeley 1993. It would make sense, therefore, to determine which effects of acquisition delays can be traced to this factor. Here I will survey this debate and argue in favor of the processing approach. This requires, first, a more detailed presentation of rule I and its relations to the binding theory. As always, the same intuitive idea can be implemented in various ways, and my presentation of rule I will follow its implementation in Reinhart 2000.
Rule I (Intrasentential Coreference)
It is by now well established that intrasentential pronominal anaphora has two interpretations: binding and covaluation (coreference). In the first, the pronoun (originally a free variable) is bound by some operator; in the second, the pronoun picks up the same value (reference) as some other argument in the sentence. The most obvious instance of covaluation is coreference, where the value is a referential discourse entity, but other instances are discussed in Heim 1998 and in Reinhart 2000. Quantified DPs cannot serve as antecedents for coreference (having no reference), so normally they can enter only bound anaphora relations. But referential DPs allow both relations. E.g., there are two anaphoric construals for (1) that can be represented as in (2).
(1) Lucie thinks she is smart.
(2) a. Lucie (λx (x thinks x is smart))
    b. Lucie (λx (x thinks she is smart)) & she = Lucie
In (2b) the pronoun corefers with Lucie. In (2a), the pronoun is bound by the λ operator. In the framework of syntactic binding theory, the conditions on binding must be stated in terms of relations between arguments (DPs). Hence, Lucie is said to bind the pronoun in this representation. However, this means that syntactic binding must be defined as in (3) (from Reinhart 2000; see also Heim 1998).
(3) Binding: α binds β iff α is an argument of a λ-predicate whose operator binds β.
(4) Lucie thinks she is smart and Lily does too.
Of course, (2a) and (2b) are equivalent, in isolation. But, as was discovered in the 1970s (since Keenan 1971), certain contexts show that there is a real ambiguity here. E.g. assuming that she = Lucie, the elliptic second conjunct of (4) can mean either that Lily thinks that Lucie is smart (the “strict” reading) or that Lily thinks Lily herself is smart (the “sloppy” reading). The first is obtained if the elided predicate is construed as in (2b), and the second if it is construed as in (2a).
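The way the two construals in (2) come apart under ellipsis can be illustrated with a small sketch. It is only a toy model added for illustration, not part of the original analysis: a "thought" is encoded as a (thinker, person judged smart) pair, and all function and variable names are assumptions made up for the example.

```python
# Toy model of (4): "Lucie thinks she is smart and Lily does too."
# A "thought" is encoded as a pair (thinker, person judged smart).
# All names are illustrative assumptions, not notation from the text.

def bound_predicate(x):
    """(2a): lambda x (x thinks x is smart); the pronoun is a bound variable."""
    return (x, x)

def make_covalued_predicate(referent):
    """(2b): lambda x (x thinks she is smart), with she = referent."""
    def predicate(x):
        return (x, referent)
    return predicate

covalued_predicate = make_covalued_predicate("Lucie")

# First conjunct: in isolation the two construals yield the same thought.
first_bound = bound_predicate("Lucie")
first_covalued = covalued_predicate("Lucie")

# Elided conjunct: the same predicate is re-applied to Lily.
sloppy = bound_predicate("Lily")      # Lily thinks Lily is smart
strict = covalued_predicate("Lily")   # Lily thinks Lucie is smart

print(first_bound == first_covalued, sloppy, strict)
```

The two predicates agree on Lucie but diverge as soon as they are applied to a different subject, which is the sense in which the ellipsis context reveals the ambiguity.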
The conditions under which bound-variable anaphora is possible are pretty much agreed upon, and they are summarized in (5).
(5) (Variable) binding condition
β can be construed as a variable bound by α iff
a. α c-commands β, and
b. β is a free variable, and
c. in the local domain of α, β is not a pronoun. (Condition B)
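The three clauses of (5) can be read as a simple conjunction of checks. The sketch below is a hedged illustration only: the structural facts (c-command, local domain) are supplied by hand as booleans rather than computed from a syntactic tree, and the class name, field names, and the reflexive test sentence are hypothetical additions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    """Hand-annotated facts about a potential binder (alpha) and bindee (beta).

    Hypothetical encoding: a real implementation would derive these
    facts from a syntactic structure instead of stipulating them.
    """
    alpha_c_commands_beta: bool   # clause (5a)
    beta_is_free_variable: bool   # clause (5b)
    beta_is_pronoun: bool         # False for a reflexive anaphor
    beta_in_local_domain: bool    # beta is in the local domain of alpha

def can_bind(c: Candidate) -> bool:
    """Return True iff beta can be construed as a variable bound by alpha."""
    if not c.alpha_c_commands_beta:                    # (5a)
        return False
    if not c.beta_is_free_variable:                    # (5b)
        return False
    if c.beta_in_local_domain and c.beta_is_pronoun:   # (5c), condition B
        return False
    return True

# "Lucie thinks she is smart": the pronoun is outside the local domain.
nonlocal_pronoun = Candidate(True, True, True, False)
# A pronoun in the local domain of its binder, e.g. "Lucie praised her".
local_pronoun = Candidate(True, True, True, True)
# A reflexive in the local domain, e.g. "Lucie praised herself".
local_reflexive = Candidate(True, True, False, True)

print(can_bind(nonlocal_pronoun), can_bind(local_pronoun), can_bind(local_reflexive))
```

Only the locally bound pronoun is rejected, and it is rejected by the third check alone; the first two checks pass for all three candidates.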
(5a) defines the structural configuration for binding assumed since Reinhart 1976.¹ (5b) does not need to be stated as a specific condition, but it is the obvious condition imposed by logic: Only free variables can be bound by an operator. Pronouns and anaphors are commonly viewed as variables, so these are the candidates for being bound (leaving aside here more complex instances of free variables). (5c), by contrast, is a specific condition of natural language. Condition B of the binding theory determines that pronouns cannot be bound in the local domain of the binder. Only anaphors can be bound in that domain. Thus, in (6a) the first two conditions of (5) are met: A free variable is c-commanded by a potential binder. Nevertheless, the interpretation (6b) is ruled out by (5c). There are various views on the formulation of condition B, as well as the question why it should exist in natural language, but this topic is irrelevant for the present discussion.
(6) a. Every lady praised her.
    b. *Every lady (λx (x praised x))
(7) a. Lucie praised her.
    b. *Lucie (λx (x praised x))
    c. *Lucie (λx (x praised her)) & her = Lucie
The same condition rules out, obviously, the binding construal with a referential DP, as (7b) for (7a). However, this is not sufficient to rule out anaphora in (7a). In principle, a free pronoun can pick up its value anywhere, so nothing so far rules out its picking up Lucie, as in (7c). In fact, however, the sentence cannot have this covaluation reading. So we need also to define the conditions on covaluation. These conditions are more complex. On the one hand, covaluation is much freer than binding, and it does not require c-command (The woman who praised him hates Max). On the other, the two anaphora types still obey some shared restrictions. Specifically, covaluation appears to also obey condition B (7c). This presently means that condition B has to be stated so that it restricts both binding and covaluation (coreference).
How this could be done is not a trivial question. As we saw, anaphora can mean two very different things (binding and covaluation). For this reason, it is unreasonable to assume that both interpretations can be captured by one and the same coindexing mechanism, as in the classical binding theory. (For a survey of the problems, see Reinhart 1983 and Reinhart 2000.) Let us assume such problems can be solved. (E.g. binding and covaluation are captured by different types of indices, or condition B is stated twice, in slightly different terms, for binding and for covaluation.) Still we should note that there is a way to approach this problem that avoids such questions.
An observational generalization which emerges so far is that covaluation is generally free, except in the c-command domain. Recall that this is the domain that enables variable binding. In this domain, it turns out that if binding is excluded, covaluation is excluded as well. This is stated in (8).
(8) Covaluation condition (Temporary)
α and β cannot be covalued if
a. α is in a configuration to bind β (namely, α c-commands β) and
b. α cannot bind β.
Suppose that in the processing of (7a) (Lucie praised her) we are considering assigning the pronoun the value of Lucie, namely the covaluation of her and Lucie (7c). For that, (8) needs to be consulted. Lucie is in a configuration enabling it to bind the pronoun, but actual binding is ruled out by condition B (5c). Hence, (8b) determines that the covaluation in (7c) is also disallowed.
In their empirical coverage the two approaches to condition B effects are precisely identical. But in the second, rather than checking a direct structural restriction on covaluation, we need to consider an altogether different question, namely whether binding is possible in our given derivation. On this view, covaluation is not directly governed by a condition of the computational system, but by an interface strategy that takes into account the options open for the computational system in generating the given derivation.
At this point, this second approach may seem a weirdly indirect way to capture the given facts. But this becomes less so when we consider another set of facts. It was noted in Reinhart 1983 that in the case of coreference we can find systematic violations of condition B, as in (9). (Coreference is indicated by italics.)
(9) a. Despite the big fuss about Felix’ candidacy, when we counted the votes, we found out that, in fact, only Felix himself voted for him. (Reinhart 1983)
    b. I dreamt that I was Brigitte Bardot and I kissed me. (George Lakoff, discussed in Heim 1998)
    c. You are you and she is she. Don’t lose your ego!
(10) *Oscar is depressed these days. He almost seems t to hate him. (Meaning: to hate himself.)
Contextually similar examples were noted in Evans 1980 for (what became known as) condition C.² Evans argued that the reason why his equivalent examples are permitted is that although the pronoun ends up coreferring with a c-commanding NP, it is not referentially dependent on that NP, but rather it
picks up its value from the previous mention of this referent in the discourse. If this is the correct explanation, it is not clear why condition B violations are not always possible. It is known that actual discourse tends to maintain referential continuity, so in a large majority of cases, a potential antecedent has already been mentioned. Still, an arbitrary instance of referential continuity, like (10), does not allow a condition B violation.
I argued in Reinhart 1983 that the reason why coreference is possible in (9) is that the coreference interpretation is clearly distinguishable from the interpretation that would be obtained by variable-binding. No such distinction can be found in (10). (The conditions under which the two readings are distinguishable are discussed further in G&R, and, at greater depth, in Heim 1998. I will illustrate them briefly below.) If a comparison with the bound interpretation is relevant for deciding whether covaluation is possible in a condition B environment, it is very difficult to see how this could even be stated by a purely structural condition. The covaluation condition (8), by contrast, easily enables stating this by adding a clause, as in (11). (11) is rule I (Intrasentential Coreference) of G&R (Grodzinsky and Reinhart 1993), as reformulated in Reinhart 2000.
(11) Covaluation Rule I
α and β cannot be covalued if
a. α is in a configuration to bind β (namely, α c-commands β) and
b. α cannot bind β and
c. The covaluation interpretation is indistinguishable from what would be obtained if α binds β.
In (9a) (Only Felix voted for him), Felix is in a configuration to bind him, (11a); hence (11b) needs to be consulted for covaluation. Felix cannot bind him; hence (11c) needs to be consulted. But covaluation and binding are distinguishable here. Hence the third conjunct of (11) does not hold, so (11) does not rule out covaluation.
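The step-by-step consultation of (11) can be sketched as a small procedure. This is an illustrative simplification under stated assumptions, not the formulation of G&R or Reinhart 2000: clause (11c) is approximated purely truth-conditionally, by asking whether the covaluation reading and the binding reading differ in at least one toy scenario, whereas the text allows context to matter as well. The three-person universe, the generic relation standing in for "voted for" or "praised", and all function names are hypothetical.

```python
from itertools import combinations

# Toy universe; a scenario is a set of (x, y) pairs meaning "x R y",
# where R stands in for "voted for" or "praised". Purely illustrative.
PEOPLE = ("Felix", "Max", "Lucie")
PAIRS = [(a, b) for a in PEOPLE for b in PEOPLE]

def all_scenarios():
    """Yield every possible relation over PEOPLE as a frozenset of pairs."""
    for size in range(len(PAIRS) + 1):
        for subset in combinations(PAIRS, size):
            yield frozenset(subset)

def only(subject, predicate):
    """'Only subject P': P holds of subject and of nobody else."""
    return predicate(subject) and not any(
        predicate(p) for p in PEOPLE if p != subject
    )

def distinguishable(reading_a, reading_b):
    """Simplified clause (11c): do the readings differ in some scenario?"""
    return any(reading_a(s) != reading_b(s) for s in all_scenarios())

def rule_i_blocks(c_commands, can_bind, covalued, bound):
    """Rule I (11): covaluation is excluded iff clauses a, b and c all hold."""
    return c_commands and not can_bind and not distinguishable(covalued, bound)

# (9a) "Only Felix voted for him", him = Felix, versus the bound construal.
covalued_9a = lambda s: only("Felix", lambda x: (x, "Felix") in s)
bound_9a = lambda s: only("Felix", lambda x: (x, x) in s)

# (7a) "Lucie praised her", her = Lucie, versus the bound construal.
covalued_7a = lambda s: ("Lucie", "Lucie") in s
bound_7a = lambda s: ("Lucie", "Lucie") in s

# In both cases alpha c-commands beta (11a) and condition B bars binding (11b).
blocked_9a = rule_i_blocks(True, False, covalued_9a, bound_9a)
blocked_7a = rule_i_blocks(True, False, covalued_7a, bound_7a)
print(blocked_9a, blocked_7a)
```

A scenario in which both Felix and Max voted for Felix is enough to separate the two readings of (9a), so clause (11c) fails there and covaluation is not excluded; for (7a) no scenario separates them, so all three clauses hold.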
In the case of (9a), the distinction is clearly truth conditional: the reading obtained by covaluation (Only Felix (λx (x voted for Felix))) is true only when no one else voted for Felix, while the reading that would have been obtained by binding (Only Felix (λx (x voted for x))) may be true if many people voted for Felix, but he is the only person who voted for himself. More broadly, in all the examples of (9), applying (11c) would show that the bound variable interpretation is distinguishable from the covaluation reading.
Heim (1998) points out that defining the notion “distinguishable interpretation” is not a trivial matter, and develops the notion of guises to account for some of the contexts that she believes are not captured this way. However, in many of her cases, the contexts remain the same, namely those where a contrast can be found with the bound-variable interpretation. (I return below to the area where our views differ.) For (9b), G&R argue that this is because most likely Lakoff’s dream did not involve an act of self-kissing. Heim’s concept of guises can capture this intuition in a different way. In the identity cases like (9c), Reinhart (1983) argued that the bound variable interpretation (You (λx (x is x))) is a tautology, while the intended covaluation reading (You (λx (x is you))) is an empirical statement. Heim (1998) developed the concept of “structured meaning” to handle such cases.
In (10), as in (9), (the trace of) Oscar c-commands him (11a) and cannot bind it (11b). But here, neither the internal semantics of the sentence, nor the context, provide any possible distinction between the covaluation interpretation and the binding interpretation. Since all conjuncts of (11) hold, it rules out covaluation.
It is because of clause c of (11) that the covaluation rule cannot be a simple structural condition on coindexation outputs. Rather, it is an optimality-type condition. To compute (11c), a reference (comparison) set must be constructed. E.g. for (12a), the binding construal (12c) is ruled out by condition B, which is an absolute, non-negotiable, condition. But suppose we consider assigning the free pronoun the value of Oscar, namely (12b). For this, we need to construct the reference set (12b,c). Although the derivation at hand does not allow the interpretation (12c), it needs to be constructed, and compared with (12b). Only if the two are distinct in the relevant context is (12b) allowed.
(12) a. Oscar hates him.
     Reference set for covaluation:
     b. Oscar hates him and him = Oscar
     c. Oscar (λx (x hates x))
There are several approaches to the question why the computation of covaluation requires a comparison of representations in a reference set, namely what is behind rule I.
Initially I assumed (Reinhart 1983) that what governs the covaluation condition is the fact that in configurations of c-command one could also opt for binding. A predecessor of this view was Dowty (1980), who proposed that the underlying principle was “avoid ambiguity.” The scope of his proposal was only instances of (what became) condition B: For (12a), he observed that replacing the pronoun with the reflexive anaphor would yield an unambiguous anaphora interpretation, while the choice of a pronoun allows both an anaphoric and non-anaphoric interpretation. I argued that this is a more general phenomenon (also when opting for variable binding still leaves the derivation ambiguous). Assuming that binding is in general a more explicit way to express anaphora than covaluation, then avoiding it with no interpretative reason suggests non-coreference. This is also the approach taken by G&R,
stated there in more general terms of economy. There are several more sophisticated lines attempting to explain rule I in terms of “least effort” economy, most notably Fox (1998) and Reuland (2001).³
However, there have always been some problems with this interpretation of the economy requirement underlying rule I. If variable binding is always preferred over coreference we would expect a sentence like (13a) to allow only the bound variable construal of the pronoun. In fact, it allows both construals, as witnessed by the classical ambiguity of the VP ellipsis in (13b). The problem for this economy view is how the construal (13c) is generated for (13b), given that (13a) allows also variable binding.
(13) a. Max likes his mother.
     b. Max likes his mother and Felix does too.
     c. Max [likes his mother (his = Max)], and Felix does [like his mother (his = Max)] too.
     d. *Max praised him and Lucie did too (him = Max).
     e. *He likes Max’s mother, and Felix does too (he = Max).
At first glance, it may be suggested that (13c) is licensed by the ellipsis context (namely, that the more economical variable binding construal can be avoided, because the covaluation reading is distinct in this context). But this is not so. Although ellipsis contexts enable the two construals to be distinct, they crucially do not license covaluation in and of themselves. In (13d), the fact that we want to use the predicate (λx (x praised him, him = Max)) in the elided conjunct, does not enable covaluation of Max and him in the first. The same point is illustrated for condition C environments in (13e). More generally, evaluating whether the bound reading is distinct from the covaluation reading can be based only on information in the derivation itself (perhaps relative to its previous context), but not on considerations of how it would affect upcoming discourse. See Fox 1995 for an extensive discussion of this point, in the case of quantifier raising in ellipsis structures.
Thus, sentences like (13b), which have served as the classical illustration of the availability of two construals of anaphora, remain unexplained within this view of economy. (G&R could address this problem only with a stipulation (in their note 13). Other attempts are discussed in Reinhart 2000.)
In view of this and other problems, I proposed (Reinhart 2000, 2006) that the type of economy involved here is different. Rule I as stated in (11) prohibits free covaluation only when binding is disallowed in a given derivation (clause b of (11)). Since in (13a) binding is permitted, covaluation is also free, without consulting clause c of (11). I argued that rule I is an instance of a broader interface strategy that can be labeled Minimize interpretative options. The problem for users of linguistic derivations is how to minimize the set of
possible interpretations of a given phonological form. The more options there are, the more mysterious is the fact that speakers manage to understand each other. Hence, for an efficient use, an interpretative option ruled out by the computational system should not be sneaked back arbitrarily by procedures available at the discourse level. In our case, if binding is ruled out, namely the set of interpretative options of the pronoun is restricted by the computational system, rule I determines that one cannot obtain precisely the same interpretation by using the discourse option of covaluation. Other instances of Minimize interpretative options are discussed in Reinhart 2006.
One may wonder, then, why rule I applies just in the case of condition B environments. The answer is that it does not. In fact, rule I is a general restriction on covaluation. The other environment where covaluation is not allowed is when the pronoun c-commands the antecedent. This was originally perceived as a structural condition on coreference. The condition has essentially remained unchanged since its first formulation in Reinhart 1976, as in (14a), and it is presently known as condition C (14b).
(14) a. “A given NP cannot be interpreted as coreferential with a non-pronoun in its c-command domain.” (Reinhart 1976)
     b. Condition C (Chomsky 1981): An R-expression is free.
     Definitions:
     i. An NP is bound iff it is coindexed with a c-commanding NP.
     ii. An NP is free iff it is not bound.
     iii. An R-expression is any DP which is not a free pronoun or anaphor.
The term pronoun in (14a) was defined to include also reflexive pronouns, so a non-pronoun is neither pronoun nor reflexive. If coreference is to be captured by coindexation, (14a) determines that a non-pronoun cannot be coindexed with a c-commanding NP. (14b) captures the same generalization, by use of the definitions (i–iii). ‘Free’ is defined as not coindexed (i.e. not coreferential) with a c-commanding NP. An R-expression is a non-pronoun (or anaphor). The term has nothing to do with reference: bound variables, like wh-traces, are also defined as R-expressions. The two formulations in (14) thus express the same condition, based on the view that both binding and coreference (or more broadly covaluation) are guided by a structural rule.
Let us assume, for the moment, that condition C, just like condition B, is a condition on (variable) binding, namely, that it should be added to the binding conditions in (5). This means that binding is impossible in (15a), namely that (15b) is ruled out. But just as in the case of condition B (exemplified in (7)), this is not sufficient to rule out the covaluation construal in (15c).
(15) a. She thinks that Lucie is smart.
     b. *Lucie (λx (x thinks that x is smart))
     c. *She (λx (x thinks that Lucie is smart)) & she = Lucie
However, once rule I (11) is assumed, nothing needs to be added to it to rule out (15c). In (15a), she c-commands Lucie. Hence, if we are considering covaluation of the two, (11b) needs to be consulted. By Condition C, she cannot bind Lucie (15b). The fate of (15c) now depends on clause c of rule I. In the given context, the covaluation reading (15c) is equivalent to (i.e. indistinguishable from) the bound reading (15b). Hence (15c) is ruled out. In (16), by contrast, the two readings are truth-conditionally distinct. (In (16b) considering oneself smart is said to hold only of Lucie; in (16c) considering Lucie smart holds only of Lucie.) Hence rule I allows covaluation here.
(16) a. Only she thinks that Lucie is smart.
     b. Only Lucie (λx (x thinks that x is smart))
     c. Only she (λx (x thinks that Lucie is smart)) & she = Lucie
I will return to other instances where rule I permits coreference in apparent violation of condition C. In fact, the evidence for clause c of rule I is much stronger in the case of condition C than with condition B. (See note 2 for a potential reason.) All of Evans’s (1980) original examples were with apparent condition C violations, and historically my major motivation in Reinhart 1983 was capturing the interpretative and conceptual problems posed by condition C.
So far we assumed that, similarly to condition B, condition C is still needed as a structural condition on binding, independently of covaluation. Let us now turn to the question whether this is indeed so. To check this, let us incorporate condition C into the binding condition we assumed before in (5), as clause d of (17).
(17) (Variable) Binding condition (with condition C)
β can be construed as a variable bound by α iff
a. α c-commands β, and
b. β is a free variable, and
c. In the local domain of α, β is not a pronoun (Condition B), and
d. β is not an R-expression. (Condition C)
Recall that the definition of R-expression covers anything which is not a free pronoun or anaphor, namely cannot be construed as a free variable. (Bound variables, like wh-traces, are R-expressions.) Thus, (17d) just repeats (17b). Clause b is, of course, crucial for binding, but as I mentioned, it is just a prerequisite of logic, that does not have to be stated as a specific linguistic condition. (One cannot imagine that natural language would be useable at the
interface, if it had a concept of variable-binding undefined in logic.) Our original binding condition (5) is thus sufficient to capture the restrictions on variable binding under consideration. (As in the standard theory, weak crossover is not captured by what has been stated here; see note 1.) Condition C is thus superfluous as far as binding goes. However, since the linguistics community seems attached to condition C, we may leave open here the question whether clause d of (17) is needed independently of clause b, namely, whether condition C exists. Either way, condition C, just like condition B, only restricts variable binding, and the crucial point here is that covaluation (coreference) in condition C environments is governed by the covaluation rule I. Let us examine a further example of how clause b of (17), or condition C, interacts with rule I in classical cases where it is assumed that condition C is at work.
(18) a. Who does she think t is smart?
b. Who (λz (she think z is smart))
c. *Who (λz (z think z is smart))
In the strong-crossover structure (18b), the pronoun she c-commands the trace z. But by (17b) (namely 5b), it cannot bind it, because the trace is a bound, rather than a free, variable. Who can still bind the free pronoun she. In this case we would obtain (18c), where the pronoun and the trace are covalued: both get the value z. (That the pronoun z does not bind the trace z can be verified with definition (3).) However, since she both c-commands the trace z and cannot bind it, rule I (11) determines that this covaluation is ruled out. Condition C effects in quantification contexts (*She_i thinks that every lady_i is smart) are precisely analogous, assuming that the quantified DP undergoes quantifier raising.
Turning now to the acquisition of rule I, as established since Chien and Wexler (1990), there is a sharp contrast between children’s performance on bound anaphora and on coreference.
Specifically, children rule out variable binding construals in condition B contexts at a rate of 80–90 percent, but they perform at around 50 percent on ruling out coreference in these contexts. G&R assume that both the binding conditions and rule I (or the broader strategy behind it) are innate, and fully available to the child; that is, there is no deficiency of information, nor any factor that awaits acquisition. The child also masters innately the basic laws of logic, and has the tools to compute logical equivalence, as required by clause c of rule I. But the difference in children’s performance on binding and coreference follows from the different types of computation involved in resolving these two types of anaphora. Computing clause c of rule I requires constructing, holding, and carrying a semantic
comparison of a reference set with two representations, and G&R argue that the amount of processing required by this step exceeds children’s working memory capacity, which is not as developed yet as that of adults. In The MIT Encyclopedia of the Cognitive Sciences, Smith (1999, p. 888) defines the working memory system as follows: “Cognitive scientists now assume that the major function of the system in question is to temporarily store the outcomes of intermediate computations when problem solving, and to perform further computations on these temporary outcomes (e.g. Baddeley 1986).” It is obvious that reference set computation relies heavily on this ability to store and perform further computation on temporary outcomes. Independently of this task, it is by now fairly well established in psychology that children’s working memory is not yet fully developed. For an extensive survey of the literature and findings in this area, see Gathercole and Hitch 1993 and Gathercole and Baddeley 1993. (For examples of experiments on the linguistic effects of this limitation with pre-school children, see Gathercole and Adams 1993 and Adams and Gathercole 1996 [note 4].) Given this limitation, one may assume that children know precisely what they have to calculate in order to answer the questions in rule I tasks, but they fail to execute the required procedure. I will return to more details of how this works in the subsection titled Processing Load. G&R argue that the crucial indication for working memory failure is in the statistics of children’s performance. What the repeated experiments on coreference in condition B environments confirmed is that in the relevant experimental setting, the results are at approximately 50 percent of adult-like performance. As mentioned, Chien and Wexler (1990) showed that, in these circumstances, chance performance is found also in individual children (conflicting answers on the same condition), which indicates a guess pattern.
The same results were confirmed by Thornton and Wexler (1999), who showed that (for most children in their experiments) the analysis of the individual results corresponds to the binomial model, that is, the probability of arbitrary choices between two options. I will turn to their statistical findings and their significance in the section titled Explaining Chance Performance, where I will also discuss the experimental conditions under which chance performance is to be expected. Performance at 50 percent, consistent with a guess pattern, is not something that is commonly found in acquisition. If children don’t know a given rule, one may still expect a uniform performance pattern for an individual child. But if the source of the difficulty is a processing failure, these results are explained: To resort to a guess, the child has to know that he is missing something. (Otherwise he would operate uniformly based on his assumptions on what the relevant rule is, or, in case of default strategies, according to the default.) This
condition is met here because the child knows innately that he has to execute the comparison required by clause c of rule I. Since he gets stuck in the execution, and there is a pressure to answer either “yes” or “no,” one of these is chosen arbitrarily.
Thornton and Wexler’s Arguments against the Processing Account
Thornton and Wexler (1999) (henceforth T&W) argue against the processing account. They assume a version of rule I that follows its reformulation in Heim 1998. As mentioned, on Heim’s view, there are some contexts where what enables a coreference reading is that the two NPs pick up the shared referent under distinct guises. In all other contexts, Heim’s analysis is the same as outlined above. But T&W extend her analysis to all contexts and argue that it is not the need to construct and evaluate a comparison set that hinders children’s performance on coreference, but rather a pragmatic deficiency in identifying the use of guises. Thus, children’s difficulties do not reflect a processing limitation, but problems with contextual orientation that develops with age. Let us first examine T&W’s main arguments against the processing account of G&R. One argument is conceptual:
On G&R’s account, the processing bottleneck that children encounter is ‘of the sort known to diminish with age’ (1993, 91). Thus, they do not share the assumption that children have access to a universal parser (see Crain and Wexler 1999; Crain and Thornton 1998). Rather, the child’s processing system has different properties from adults, and Rule I remains problematic until this system matures. (T&W 1999, 47)
I definitely share the theoretical assumption of a universal parser, in the references cited by T&W. But G&R’s point of departure is precisely that the children’s parser, being innate, is identical to that of adults. They argue that “there is no known reason to assume that any of the steps [of rule I] requires knowledge that surpasses children’s innate endowment. ... But the execution of all these steps, in the specific case of structures like [Oscar touched him] puts a much heavier burden on working memory than do other rules (e.g. the binding conditions). ... If this is so, then presented with [such sentences], children know exactly what they are required to do by Rule I, but getting stuck in the execution process, they give up and guess.” (G&R, 88) The difference between children and adults, in this case, is only in the size of their working memory. It is commonplace wisdom that precisely one and the same parser (software), applying in two hardware systems differing only in the size of their memory, may fail at some tasks in the one, but not in the other. A difference in memory space cannot be described as a different parser, nor precisely as a different
processing system. Rather, acknowledging commonplace wisdom in psychology, that children’s working memory is smaller than adults’, enables us to explain how the same innate computational system and parser can still fail in children’s processing. Conceptual issues aside, T&W raise two arguments against G&R’s analysis, which they summarize as follows: “There are two main problems with G&R’s account. ... First, there is little or no evidence to support the proposal that some sentences containing pronouns (e.g. Mama bear is washing her) cause a processing overload whereas others (e.g. Mama bear is washing her face ...) do not. Second, there are reliable experimental findings showing that whereas children misinterpret pronouns in principle B structures, they do not have difficulty with parallel principle C structures (i.e. Mama bear is washing her vs. She is washing Mama Bear). On a rule I account, both should be equally difficult to process” (p. 52). Let us examine each of these arguments.
Processing Load
The empirical prediction of G&R is that all tasks that require the processing of clause c of rule I (as stated here) would lead to chance performance by children, which G&R take as the evidence for a processing load. In the present formulation, clause c is the step that requires a semantic reference-set computation. Let us see, first, how this works. Rule I is repeated below.
(11) Covaluation Rule I
α and β cannot be covalued if
a. α is in a configuration to bind β (namely, α c-commands β) and
b. α cannot bind β and
c. The covaluation interpretation is indistinguishable from what would be obtained if α binds β.
Suppose the child is considering coreference assignment in a given derivation. This means rule I must be consulted. If either clause a or clause b of (11) does not hold, the assessment ends here, with nothing complex about it.
(19) a. Max’s mother loves him (& he = Max).
b. The woman next to Max praised him (& him = Max).
(20) a. Mama Bear is washing her face (& her = Mama Bear).
b. Mama Bear is washing herself (& herself = Mama Bear).
In (19), clause a of rule I (11a) does not hold: neither of the candidates for covaluation c-commands the other. Hence, clause b (11b) need not even be consulted, and the covaluation goes through. In (20a), clause a holds, so clause b must be checked. However, under the present formulation of rule I (see the discussion of (13) above), clause b does not hold since binding conditions B
and C do not rule out the binding of the anaphoric element. So assessment ends here, and coreference is permitted. The same is true of (20b). There is no evidence that the need to check clause b of (11) poses any processing difficulties to children. But if both clause a and clause b hold, assessment must go through clause c, which is the costly step. These are the cases of coreference in apparent violation of conditions B (21) and C (22, 23).
(21) *Mama Bear is washing her (& her = Mama Bear).
(22) *She is washing Mama Bear (& she = Mama Bear).
(23) Only she is washing Mama Bear (& she = Mama Bear).
In these cases, a comparison representation must be constructed and compared to the intended coreference representation. In terms of processing it does not matter whether the final verdict of rule I is “allow,” as in (23), or “disallow,” as in (21) and (22). In both cases, the decision requires a complex computation. A question that arises is what it is precisely about step c of (11) that exceeds the processing ability of children. In fact, two procedures take place in applying this clause. It is easiest to spell them out from the perspective of the comprehension side of the parser. First, in order to determine whether coreference is distinct from binding, the binding representation needs to be constructed. This representation is not available at the input derivation (which is associated with the phonological input the parser receives), since the input derivation does not allow binding. So the parser has to construct an alternative representation with variable binding. (The details of the procedure of constructing the alternative derivation are discussed in Reinhart 2000.) The next step is semantic computation: The two representations need to be compared against the context, and only if they are distinct, coreference is allowed.
The second procedure seems similar in nature to that involved in semantic disambiguation, where two representations need to be compared in order to select the one appropriate to the context. It is known that semantic disambiguation itself imposes a processing load, because it requires holding two (or more) representations in working memory. So one may ask which of the parsing procedures involved in reference-set computation surpasses the capacity of children’s working memory. G&R assumed that the latter task is already beyond children’s ability, and cited children’s performance on lexical disambiguation as evidence. (Faced with a lexically ambiguous word, children select the reading that is statistically more frequent, rather than comparing the competing readings against the context.) As pointed out in T&W, in the case of lexical disambiguation, alternative analyses of children’s performance are available. Still, it is known that children (like adults, in fact) tend to develop defaults to bypass the parsing of semantic
disambiguation, which is some evidence for the greater processing load posed by this task (see e.g. Crain, Ni, and Conway 1994). Nevertheless, I suggested in Reinhart 1999 that the conclusion of G&R was mistaken, and it is only the full complex involved in reference-set computation which leads to a processing crash of the child’s parser. More empirical work is needed on children’s performance on semantic disambiguation, but the hypothesis put forth here is that we expect a processing failure only when the computation also requires the first step of constructing a derivation not available at the parser’s input. The theoretical expectation of the present framework is that reference-set computation (of the relevant global type) is always associated with group performance at the 50 percent range in dual choice tasks in acquisition, namely, that if other areas of language are found that require reference-set computation, with properties similar to clause c of rule I, then (under the appropriate experimental setting) we should find children’s performance at the 50 percent range in these areas as well. The proposed explanation is that children are aware of the innately required computation, but they cannot carry it out because of their limited memory resources, and they resort instead to strategies enabling bypassing it. In Reinhart 2006 I argue that two such bypassing strategies are possible: One is simple guessing, witnessed by individual performance at the range of 50 percent. The other, dominant in tasks involving semantic disambiguation, is the selection of an arbitrary default, which may be fixed for a given child across tasks. But since the choice of the default is, itself, arbitrary, the group results remain at the 50 percent range. As we will see, in the area of rule I, the guess strategy is found in all anaphora tasks involving step c of this rule, including in tasks similar to (23), where this clause rules coreference in. 
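The contrast between the two bypassing strategies can be made concrete with a small simulation. The following Python sketch is only an illustration under stated assumptions: the function names, the numbers of children and trials, and the assumption that each child's fixed default is equally likely to be "yes" or "no" are all hypothetical and are not taken from any of the experiments cited. It shows that both strategies leave the group rate in the 50 percent range, while only the default strategy makes every individual child answer uniformly.

```python
import random


def simulate_child(strategy, n_trials, rng):
    """Return how many 'yes' answers one simulated child gives on n_trials yes/no items."""
    if strategy == "guess":
        # Guess strategy: every item is answered arbitrarily.
        return sum(rng.random() < 0.5 for _ in range(n_trials))
    if strategy == "default":
        # Default strategy: the child picks one answer arbitrarily (assumed 50/50)
        # and then keeps it fixed across all items.
        says_yes = rng.random() < 0.5
        return n_trials if says_yes else 0
    raise ValueError(f"unknown strategy: {strategy}")


def group_summary(strategy, n_children=200, n_trials=8, seed=0):
    """Return (group 'yes' rate, share of children who answered uniformly)."""
    rng = random.Random(seed)
    counts = [simulate_child(strategy, n_trials, rng) for _ in range(n_children)]
    group_rate = sum(counts) / (n_children * n_trials)
    uniform_share = sum(c in (0, n_trials) for c in counts) / n_children
    return group_rate, uniform_share


if __name__ == "__main__":
    for strategy in ("guess", "default"):
        rate, uniform = group_summary(strategy)
        print(f"{strategy}: group rate {rate:.2f}, uniform children {uniform:.2f}")
```

On this sketch, group averages alone cannot separate the two strategies; it is the individual response patterns (mixed answers versus uniform answers) that do.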
However, T&W offer an alternative account for children’s difficulties in all these cases (except (23)). Hence it is appropriate for them to raise the question what independent evidence exists that the source of difficulty here is indeed the processing load. In principle, it should be possible to test the processing load directly in rule I tasks by standard measurement of processing time, or more sophisticated eye-tracking experiments. To my knowledge, this has not been done. Another type of possible independent evidence is if performance at the 50 percent range is also found in areas other than anaphora, where, on the one hand, reference-set computation has been established, and, on the other, T&W’s pragmatic analysis cannot apply. I argue in Reinhart 2006 that such evidence has indeed emerged in the area of stress shift for focus. Chierchia et al. (2001) and Gualmini et al. (2001) found the same in the area of implicatures, which also involve semantic reference-set computation. (Since semantic disambiguation is at work in both these areas, the dominant strategy in both is the arbitrary fixed default [note 5].)
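The assessment sequence for rule I (11) walked through in this subsection can be summarized as a short decision procedure. The Python sketch below is a schematic rendering only, not G&R's formalization: the class and field names are hypothetical, and the three Boolean inputs (c-command, blocked binding, contextual distinguishability) are simply taken as given, although establishing them is where the real linguistic and processing work lies. The second return value marks whether the costly clause c comparison had to be performed at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    """Hypothetical summary of a candidate covaluation of alpha and beta."""
    c_commands: bool        # clause a: alpha c-commands beta
    binding_blocked: bool   # clause b: alpha cannot bind beta (conditions B/C)
    distinguishable: bool   # clause c: covaluation differs from binding in context


def rule_i(cfg):
    """Return (covaluation_allowed, needed_reference_set_computation)."""
    if not cfg.c_commands:
        # Clause a fails: assessment ends, covaluation goes through, as in (19).
        return True, False
    if not cfg.binding_blocked:
        # Clause b fails: binding is available, assessment ends, as in (20).
        return True, False
    # Clauses a and b both hold: the costly comparison of clause c is required,
    # whatever the final verdict turns out to be, as in (21)-(23).
    return cfg.distinguishable, True


if __name__ == "__main__":
    examples = {
        "(19a) Max's mother loves him": Configuration(False, False, False),
        "(20a) Mama Bear is washing her face": Configuration(True, False, False),
        "(21) Mama Bear is washing her": Configuration(True, True, False),
        "(23) Only she is washing Mama Bear": Configuration(True, True, True),
    }
    for label, cfg in examples.items():
        print(label, rule_i(cfg))
```

The point of returning two values is that the processing account ties children's chance performance to the second one (whether clause c had to be computed), not to the first (whether coreference is ultimately allowed).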
Condition C
The second major argument of T&W regards condition C. The theoretical stand underlying their argument is that condition C must be assumed as an independent syntactic condition. (Heim (1998) modified rule I to apply only in condition B environments, leaving the covaluation problems with condition C for future research.) However, as we saw above, whether condition C is needed for binding or not is independent of the question under consideration, of children’s performance on the coreference aspects of condition C. Be that as it may, it seems to me that T&W should expect exactly the same behavior in condition C environments as G&R do. This is because they state that, as in Chien and Wexler 1990, they continue to assume that the pragmatic generalization governs both the coreference aspects of condition B and of condition C (p. 31), and they provide an alternative account of why children’s performance on condition C coreference appears better than that on condition B (see below). This is not surprising, since both G&R and Chien and Wexler (1990, further developed by T&W) share the assumptions of Reinhart (1983) that variable binding and coreference are governed by different types of rules. Chien and Wexler’s study was pioneering in establishing that, correspondingly, children perform well on binding tasks (in both condition B and C environments) but perform at around 50 percent in coreference tasks. So under both analyses one may expect the same with the coreference aspects of condition C. Nevertheless, T&W also argue that given that condition C is innate, children should not have problems with its coreference aspects, and use this as an argument against G&R’s analysis. Let us follow this argument. With the exception of Grimshaw and Rosen (1990), who found near-chance performance on sentences like their (24a) [note 6], most studies found that children rule out coreference disallowed by condition C at a much higher rate than coreference disallowed by condition B.
(24) a. *He said that Bert touched the box. (he = Bert)
b. Because he heard a lion, Tommy ran fast. (he = Tommy)
However, G&R argued that the apparent improved performance on condition C might reflect an independent factor: In a right branching language, the most frequent instances of condition C violations also involve backward anaphora. With the exception of Crain and McKee (1985), studies that found a high rate of rejection of anaphora in (24a) also found that children reject backward anaphora in structures like (24b), where it is permitted by condition C. These studies (e.g., Tavakolian 1977; Solan 1978; Lust and Clifford 1982) attribute both results to an independent directionality factor and conclude that children reject backward anaphora regardless of condition C.
In their chapter 2, T&W argue in reply that the findings regarding directionality effects are just a product of the experimental setting, specifically, of using act-out or elicited-imitation tasks. Thus, they conclude that there is no evidence for children’s independent difficulties with backward anaphora, which means that the reason they reject anaphora in structures like (24a) can only be adherence to condition C (p. 49). Interestingly, however, in their chapter 3, T&W encounter a directionality problem for their own analysis. Their account for children’s performance on condition B is that they have not yet mastered the use of guises. Thus, they may obtain “local coreference” under two different guises in (25a), where the adult conditions for obtaining distinct guises are not met. One would expect, then, that children should also allow local coreference in the condition C environment (25b), by precisely the same procedure of assigning different guises to the relevant entity. Still, in T&W’s experiments, children rejected local coreference in sentences like (25b) at a high rate of 92 percent, while they rejected coreference in sentences like (25a) only at the rate of 57 percent, with mostly chance performance.
(25) a. Mama Bear washed her.
b. She washed Mama Bear. (T&W, 25, p. 106)
T&W explain that “the crucial difference between sentences subject to principle B and those subject to principle C is the obvious one: In the former, the pronoun is in object position, and in the latter, the pronoun is in subject position” (p. 106), namely the crucial factor is directionality. They proceed to offer two reasons why when the pronoun precedes the potential antecedent (as also in 24a), anaphora computation will be blocked independently of the guises options. One is in terms of processing: Pronouns are assigned a reference as soon as they are encountered.
If she in (25b) has been assigned the reference of Mama Bear (from the previous discourse), then deciding whether it could corefer with the next occurrence of Mama Bear, under a different guise, would require backtracking this step and starting a new guise-computation. It is this backtracking which is difficult for children, or as T&W conclude, “obviously, an on-line incremental parser would find this amount of computation burdensome” (p. 107) [note 7]. Presumably, then, children do not even consider the coreference option in such contexts. Whether this is the precise formulation of the directionality factor or not, it confirms G&R’s conclusion that with backward anaphora there is an independent factor that disables the application of rule I for children. Possibly, the factor is not just any directionality, as G&R assumed, following previous studies, but only directionality involving the subject, as proposed by T&W. In any case, it is this directionality factor that explains why children reject coreference in the common condition C contexts, independently of rule I. As it turns out in chapter 3, also within T&W’s analysis, children’s improved performance on condition C tasks does not provide any evidence for their mastery of the computation of the coreference aspects of condition C. In fact, they assume, just like G&R, that children bypass rule I, or the guises computation, in these environments. G&R suggest that to circumvent the directionality factor, children’s performance on the coreference aspects of condition C should be checked in the few instances where this factor is absent, most notably in reconstruction contexts such as (26).
(26) *Near Ann, she saw a lion (she = Ann).
Under the reconstruction analysis, condition C (and hence rule I) blocks coreference here because, once reconstructed to its original position, the PP is c-commanded by the pronoun. On the other hand, there is no directionality effect here, since during processing the pronoun follows the antecedent. Indeed, in such environments, experiments reached a clear consensus: children perform poorly, with exact figures varying according to the experimental method (e.g., Ingram and Shaw 1981; Taylor-Brown 1983; Lust, Loveland, and Kornet 1980). T&W dismiss these findings as well (chapter 2), based on the claim that one of the studies (Taylor-Brown 1983) used what they consider deficient methodology. (They ignore in this chapter the question of how children suddenly master guise computation in these contexts.) They argue further that a more recent study (Chierchia and Guasti 2000) has proved unequivocally children’s mastery of condition C in reconstruction contexts. In fact, however, Chierchia and Guasti’s study focused on bound variable anaphora, such as (the Italian version of) (27).
(27) *In the barrel of every pirate_i, he_i carefully put a gun.
Indeed, children rejected anaphora in such sentences 90 percent of the time.
But this is precisely the expected result for both Chien and Wexler (1990) and G&R. Rule I is not involved in the processing of (27). Bound variable anaphora is governed directly by the binding conditions, whether the relevant condition here is condition C, or clause b of (17) (the logical requirement that only free variables can be bound). The crucial assumption of G&R (as of T&W) is that children should face no problem in the processing of variable binding. In G&R’s terms this is because a heavy computational load is only involved in coreference tasks, where rule I needs to be consulted to determine whether coreference is still permitted although binding is ruled out. Chierchia
and Guasti, in fact, emphasize this point in that same paper, stating explicitly that they did not study coreference in these structures, but their theoretical expectation is that in coreference tasks, the same difference would be found between children’s performance on bound-variable (quantified) anaphora and on coreference, as found in condition B environments. It remains the case that to properly check children’s performance on rule I in condition C environments, one should abstract away from possible directionality factors. An unexpected further confirmation that children perform at chance level if directionality is controlled comes from T&W’s own experiments on coreference in VP ellipsis, with sentences like (28).
(28) The kiwi bird cleaned Flash Gordon and he did too.
In the story context, Kiwi bird and Flash Gordon fell in the mud. Flash Gordon asked a third participant to help clean him, but that participant refused. Kiwi bird helped clean Flash Gordon, but mostly, Flash Gordon had to clean himself on his own. Children accepted (28) as true in this context 54 percent of the time. In other words, they allowed the pronoun he to corefer with Flash Gordon at chance level. Let us first examine the type of computation required to determine whether coreference is permitted here. At the interpretation stage, the λ predicate formed in the first conjunct (29a) is present also in the second conjunct (29b) (whether copied from the first or just deleted at phonological form but present at logical form).
(29) a. The kiwi bird (λx (x cleaned Flash Gordon)) and
b. He did (λx (x cleaned Flash Gordon)) too (& he = Flash Gordon)
Now, we are considering assigning the pronoun the value of Flash Gordon, which would result in a covaluation configuration in clause (29b). The first clause of rule I (11) holds: the pronoun c-commands Flash Gordon in (29b), hence rule I must be further checked.
Clause b holds as well: the pronoun cannot bind Flash Gordon, by condition C, or its equivalent logical prohibition (clause b of 17). So clause c of rule I must be applied, namely the representation (30) should be constructed and compared with (29b).
(30) He did (λx (x cleaned x)) too (& he = Flash Gordon).
The coreference construal (29b) is permitted only if, in the context of (29a), (29b) is distinguishable from (30). The fact of the matter is that it is. The parallelism requirement is that the predicates in the two conjuncts be identical (under the relevant definition). The predicate in (30) does not occur in (29a). The only candidate for parallelism is the predicate as construed in (29b). In
more intuitive terms, the property shared by the two events is that of cleaning Flash Gordon, not of cleaning oneself. The type of meaning distinctness this example shows is similar to that in (31), observed by Evans (1980) and discussed by Reinhart (1983), Grodzinsky and Reinhart (1993), and Heim (1998). Heim labeled such contexts “structured meaning” contexts.
(31) I know what Ann and Bill have in common: She thinks that Bill is terrific, and he thinks that Bill is terrific. (adapted from (49) in Evans 1980)
(32) a. She (λx (x thinks that Bill is terrific)) and
b. He (λx (x thinks that Bill is terrific)) (& he = Bill).
(33) He (λx (x thinks that x is terrific)) (& he = Bill).
The last conjunct in (31) violates condition C. Nevertheless, the coreference interpretation (32b) is permitted. In this case, parallelism is not imposed by ellipsis, but by the content of the preceding context, which requires identifying a shared property of Ann and Bill. Although the proposition (32b) is equivalent to (33), the properties attributed to their subjects are not identical (they denote different sets). It is only the property in (32b) that is indeed shared by (32a) and (32b), or by Bill and Ann. This suffices for rule I to allow the coreference construal in (32b). Typical of parallelism configurations like both (28) and (31) is that the shared material must be destressed (in 31) or fully suppressed (in 28), which entails that in both the subject pronoun is stressed. By this computation, then, (28) comes out as an instance of coreference ruled in by rule I [note 8]. This contrasts with some claims in the theoretical literature on coreference that (28) is ruled out for adults (e.g. Fiengo and May 1994).
But that it is indeed the correct verdict is witnessed by the results in the adult-control experiment of T&W, where adults accepted (28) at a rate of 83 percent. The less than 100 percent acceptance rate here may be typical of rule I computations, which require more effort from adults, but it is still in sharp contrast to their full rejection of coreference in condition C environments ruled out by rule I, such as He cleaned Flash Gordon, with no ellipsis context. Recall that, unlike adults, children performed here at chance level, allowing coreference 54 percent of the time. For G&R, this is the expected result whenever clause c of rule I needs to be processed. Whether coreference is ruled in or ruled out by rule I cannot be relevant, since children cannot complete the computation anyway. The question then is why this expectation is confirmed only in the ellipsis context (28) and not in the experiments with the same sentences
as matrix clauses, as in (25b) (She washed Mama Bear or He cleaned Flash Gordon). T&W provide an answer: In the VP ellipsis context, the on-line processing factor, which blocks even considering coreference in the matrix-clause cases, does not play a role, because the full predicate attributed to the subject pronoun in the elliptic conjunct is available to the child from the previous conjunct. So when the reference of the pronoun is decided, the full proposition is available for computation (p. 129). Thus, the offensive backtracking is not required, or, in our terms, the directionality factor is avoided. This, then, is a novel direct confirmation of G&R’s prediction: In contexts where the directionality factor is neutralized, children’s performance on condition C aspects of rule I should be at chance level. Nevertheless, T&W, in two chapters of their book, present this experimental finding as a major and decisive argument against G&R’s analysis: “On Grodzinsky and Reinhart’s view, the proposed asymmetry between matrix and VP ellipsis structures is not to be expected. For Grodzinsky and Reinhart, children should respond at chance levels to matrix sentences governed by principle C because rule I requires two representations to be compared.” (p. 129) “Thus, G&R’s account cannot be correct in its present form.” (p. 201) This same experimental finding also sheds some light on another argument of T&W against G&R’s analysis. Under G&R’s account, the crucial factor leading to chance performance is that children are unable to carry out the computation required by rule I because of their underdeveloped working memory. This means that what the actual verdict of rule I is (for adults) cannot have an effect on children’s performance. Whether rule I permits coreference or not, they will not be able to complete the computation to decide this.
In T&W’s account, by contrast, the explanation for children’s delay rests on their extending adults’ conditions for the creation of guises: “Children create guises in a superset of the contexts in which adults do.” ((18), p. 102) It follows from this principle that children should allow coreference under distinct guises wherever adults do, though they extend this also to areas where adults don’t. Thus, when rule I permits coreference in condition B and C environments, T&W predict that children should perform like adults. T&W acknowledge that these environments have not been studied experimentally (overlooking the relevance of their own experiment on sentences like (28), where, as we saw, children performed at chance level in a condition C environment which happens to be ruled in by rule I). Nevertheless, they expect that these are the results that would be found, once the experiments are done, and explain that such findings would be a further argument against G&R’s analysis that predicts the opposite (p. 48).
Questions of Learnability
In the absence of any actual argument against a processing account, the two accounts for coreference delay in acquisition appear, so far, equivalent in their empirical coverage, with the exception of one area: derivations ruled in by rule I. Assuming that T&W’s analysis can be modified to handle such cases, this is an interesting situation, where two different accounts appear equally possible for the same phenomenon. This is particularly interesting since the two accounts are based on essentially the same view of the linguistic background, namely on the division of labor between binding theory and the coreference restrictions. Both follow the view, advanced in Reinhart 1983, that binding theory (under its various formulations) restricts only variable binding, and its operations are of the familiar type of output conditions of the computational system. Coreference (or covaluation), on the other hand, is governed by a different type of procedure, which is based on context-dependent inference. In principle, it is just as reasonable to assume that what causes coreference delay in acquisition is the type of computation involved, the execution of which requires larger working memory than children have (G&R), or that it is some deficiency of the relevant context-dependent factors, which children have not mastered yet (T&W). Our next question, in this section and the following one, is whether it is possible, nevertheless, to decide between these two possible accounts.
Note, first, that for the processing account, no question of learnability arises. G&R assume that children know innately everything that is required for coreference computation. So as soon as their working memory matures, they will be able to execute it. T&W’s account is based on some deficiency in knowledge, which has to be acquired. So the question of how it is acquired is relevant. To assess how T&W answer this question, more details of their analysis of guises are needed.
What T&W find particularly attractive in Heim’s (1998) reformulation of rule I is that in some areas it enables coreference computation to depend only on the identification of guises, without applying rule I, namely, with no comparison of representations. The clearest example is what Heim labels “identity debate contexts,” as in the exchange reproduced here as (34).
(34) Speaker A: Is this speaker Zelda?
Speaker B: How can you doubt it? She praises her to the sky. No competing candidates would do that.
In such contexts, it can be argued that her refers to the person that A and B identify as Zelda, while she refers to the person who is the speaker (say on
stage), and whom B does not manage to identify. The same entity (Zelda) is then presented here under two guises. Heim argues that in this case a comparison of a coreference representation with the bound variable representation must identify them as logically equivalent. So rule I as stated would wrongly rule coreference out. She proposes that, in fact, when entities are represented under different guises, their relation does not count as coreference (roughly, since pronouns have guises as their denotation, and not individuals, the pronouns do not have the same denotation here). Thus, they are not subject to rule I at all.
Though this is not crucial for the present discussion, I should mention that I do not share Heim’s intuition that the reason why speaker B’s utterance in (34) is appropriate is that no coreference is involved. Though postulating non-coreference may solve a technical problem, my own intuition is that the fact that she and her corefer is crucial for the interpretation: it is the inference that speaker B wants speaker A to draw. Instances of “debated identity” seem to me related to what Heim labeled “structured meaning” cases. Namely, what matters in the context is the difference of the properties, rather than the full propositions, which are equivalent in both cases. In the identity-debate contexts, the property one ascribes to an individual under discussion should help establish his identity. Praising oneself (λx (x praises x)) and praising her = Zelda (λx (x praises her)) are distinct properties. If we identify someone as belonging to the set of those who praise themselves to the sky, this cannot help in establishing this person’s identity, in the given context, but locating her in the set of those who praise Zelda to the sky enables the inference that she is Zelda. (In Heim’s example, the context also spells out that Zelda is probably the only member in this set. But the same inference would also be licensed without this addition.)
Obviously, we do not yet have the formal tools to describe precisely this type of inference (which may rest on notions like relevance). This is the worry that led Heim to exclude such problems from the range of rule I. Doing so enables us to keep the term “distinguishable interpretation,” which is used in rule I, purely truth-conditional. But it is not clearly getting us closer to understanding either the inference in question, or the conditions under which speakers are allowed to opt for coreference rather than variable binding.
Let us, however, assume with Heim that identity-debate contexts are instances of distinct guises. If so, then the coreference task for the child is just to determine whether it is possible that two referring expressions represent, in the given context, different guises of a discourse entity. If this happens, then coreference is permitted. T&W argue that given that guises are coded, they must be innate. What children have not mastered yet are the conditions under which speakers associate different guises with the same discourse entity,
since children have a general deficiency in identifying speakers’ contextual intentions. However, for the analysis to work, T&W extend Heim’s analysis much further, e.g. to the contexts Heim labeled “structured-meaning” that we examined in (31), changed in (35) so it illustrates condition B environments.
(35) I know what Ann and Bill have in common: She adores him passionately and he adores him passionately.
For Heim, he and him in the italicized clause cannot possibly pass as two distinct guises of Bill. Allowing this would deprive the intuitive concept of guises of any content, since there is nothing here that suggests speakers’ uncertainty concerning identity, or a dual perception of the same individual. Heim assumes that in this case, rule I applies as in Reinhart 1983, or in G&R, namely a semantic comparison of representations is needed, though she offers further refinement of the conditions under which they are distinguishable. T&W, by contrast, argue that two guises are involved here as well. There is Bill “in the guise of the individual, in the flesh” who adores someone, and there is Bill in the guise of the person that Ann adores (p. 94, adopted here from T&W’s examples (6) and (9)).9
Bearing different guises, as used here, seems to mean just bearing different θ-roles. T&W label this type of guise distinction “role reversal guise” (p. 101). The same of course can be said of any instance of permitted coreference. E.g. in Bill adores himself, there is Bill the agent, and Bill the patient. But it also holds for all instances of blocked coreference. In Mama bear washed her we have automatically two guises. T&W appear aware of this, and they add another condition on the identification of guises. They note that in the relevant clause in (35) the subject pronoun is stressed, and propose the generalization that “stress on the pronoun has the effect of presenting [Bill] in a different guise, in virtue of the unexpected property of self-admiration” (p. 93).
More generally, stress is a major clue for identifying guises in T&W’s analysis, and they argue that except for identity-debate contexts, it is required in all cases of local coreference under distinct guises. The heavy stress marks that there is something surprising, and noncharacteristic about the situation expressed in the sentence. With this, then, T&W can identify one of the factors that children have to acquire when they eventually reach adult coreference use.
(36) “Children must learn that stress ... marks the speaker’s intention to convey the local coreference interpretation by bringing [the stressed element] into focus.” (T&W, p. 205)
While it is true that in many instances of coreference approved by rule I in condition B contexts there is heavy stress on one constituent or the other (in
(35) it is on the subject; in (37) it is on the object), the same stress pattern is found in many other instances where it does not have the effect of allowing coreference. E.g. (37) is a context that T&W believe allows coreference for Mama Bear washed her, with the help of the heavy stress (a judgment not shared by all). But the same stress pattern, with the same sentence, in the contexts of (38) has precisely the opposite effect of enforcing a non-coreference interpretation (Mama Bear could only wash Daisy Duck).
(37) Mama Bear did not wash Miss Piggy. Mama bear washed her.
(38) a. First Daisy Duck washed Mama Bear and then Mama Bear washed her.
b. First Daisy Duck washed Miss Piggy and then Mama Bear washed her.
(39) Children must learn that stress marks the speaker’s intention to convey non-coreference interpretation, by bringing the stressed element into focus.
By the same logic, then, we should add to the conditions the child must learn the one in (39). A theory equipped with both (36) and (39) can never fail, because it covers the whole domain of options (stress means either coreference or non-coreference). In a sense, it captures the facts accurately: the child eventually knows that heavy stress is sometimes associated with coreference, and sometimes with non-coreference, as is the state of affairs in the adults’ world. Nevertheless, it is not clear that this is the type of theory we want.10 A more appealing conclusion for such a state of affairs would be that stress is not the factor that determines coreference options in condition B contexts.
If it were indeed possible to reduce all instances of coreference approved by rule I to distinct guises, then there would be no motivation to assume any reference-set computation for coreference to begin with.
It would be necessary only to determine for a given sentence whether the two referential occurrences are under the same or a different guise, which does not involve constructing and comparing semantic representations. Indeed, T&W mention, in passing, that possibly, “rule I can be dispensed with entirely” (p. 104). If so, then the processing account is of course unmotivated, and we are indeed left only with pragmatic considerations.
This would not be the first attempt to dismiss the problem posed by coreference computation by enriching the set of referential distinctions, namely, to capture it directly by properties of the participating arguments, rather than by properties of the full representations. A whole family of accounts, starting with
Evans 1980, attempted to distinguish coreference from ‘referential dependence’ and to argue that the binding conditions restrict only the latter. E.g. Fiengo and May (1994) argue that coreference is always possible for two given NPs, as long as “it is not part of the meaning of the sentence that they are covalued” (their linking rule). Like Evans, they do not consider why, then, coreference is not simply always possible (see the discussion of (10) above). The apparent success of such attempts rests on using undefined notions. Thus, as I mentioned, Evans’s insight was in exposing and illustrating virtually all contexts that allow coreference in apparent violation of the binding conditions. But his description of the distinction that he assumes would equally allow the same everywhere else. (For a more detailed survey of this point, see Reinhart 1983.) A theory based on undefined distinctions is always true, by virtue of being unfalsifiable.
T&W are probably aware of the danger of unfalsifiability posed by their description of guises as signaled by heavy stress. So they appear to view this just as a necessary condition (in all but identity-debate contexts). Stress alone is not sufficient to determine interpretation of guises. In addition, they assume that there are special contextual cues that speakers use as “markers of the speaker’s intended interpretation” (p. 105). It is only when these special cues are used that the sentence can be associated with what T&W view as surprising, or non-characteristic traits of the situation expressed by the sentence, which in turn allow coreference. In other words, these cues determine when the speaker, by using heavy stress, actually intends to use the expressions as different guises. It is these cues that the child has to learn. Regarding what these cues are, T&W do not say much but rather refer the reader to Heim (1998). As we saw, however, Heim argues that no guises are involved at all in the examples under consideration.
On the view of Heim (1998) and Reinhart (1983), determining that coreference is possible here is not based on any cues, but rather on applying logic: If the coreference representation is logically distinguishable from the bound one, coreference is permitted.
This set of presently unspecified contextual conditions (cues) is, then, what children are missing at the age of the experiments, and will acquire in the next couple of months or years. “Children learn from experience that specific contextual cues accompany the local coreference interpretation, such as the factor of ‘surprise’....” (T&W, p. 103) How does this learning from experience happen, in the absence of negative evidence? T&W suggest that “once children have witnessed a sufficient number of examples of the local coreference interpretation in contexts that contain the relevant contextual cues, they will thereafter refrain from assigning this interpretation in the absence of these special markers of the speaker’s intended interpretation” (ibid.).
The underlying assumption of T&W is probably that the acquisition of contextual abilities and identification of speakers’ intentions is of a different type than found with innate UG principles. Nevertheless, the same learnability questions still arise. As mentioned, actual examples of coreference in condition B environments are quite rare in discourse. One may wonder if by the age of about 6 all children get sufficient exposure to such uses. One may also wonder how each child decides at this age that the examples in the corpus he has encountered so far cover the whole set of options of use. Let us assume that once the set of “special markers of the speaker’s intended interpretation” is defined with some greater precision, these questions can be answered.
Explaining Chance Performance
Assuming that the pragmatic and the processing accounts fare roughly the same in predicting the areas of delay in the acquisition of coreference, and even if it turned out that they are equally plausible in terms of learnability, we may still ask whether they both, indeed, explain the experimental findings. To address this, we need to be clearer about what the problem is that requires an explanation. Though it is standard to describe the experimental findings as indicating a delay in the acquisition of coreference, the findings are much more specific than that. Acquisition delay can take several forms. If children don’t know a given rule, or have set the parameter incorrectly, the most natural result to expect is in the vicinity of 90–100 percent non-adult performance. (A variety of different group-level statistical results are to be expected if children’s performance differs individually.) But in all experiments on condition B coreference (of the relevant type; see below), the group performance of children is around 50 percent. This in and of itself is a curious result, but it becomes more puzzling once it is established that this is indeed chance performance, taking into consideration the performance of individual subjects. As mentioned, Chien and Wexler (1990) provide statistical analyses of individual performance, showing that many children perform individually at chance level (sometimes rejecting and sometimes accepting coreference under the same experimental conditions). G&R’s point of departure was that chance performance of this kind indicates guessing, which requires an explanation.
Let me first clarify the experimental conditions at which 50 percent performance is found, as explained in G&R. The target sentence is preceded by or embedded in another sentence which also provides an antecedent for the pronoun, as in (40a).
(40) a. This is A. This is B. Is A washing him?
Picture/story context:
b. A washes A
c. A washes B
The story or picture accompanying the sentence includes either the situation in (40b) or that in (40c). In both Chien and Wexler 1990 and Grimshaw and Rozen 1990, children had no problem answering “yes” in the vicinity of 90 percent of the time in the context (40c), but they had around 50 percent performance in the context of (40b). It is this condition (40b) (the “mismatch” condition) that is relevant for our discussion. By comparison, Chien and Wexler found that in the same context, if A is a quantified DP, like every bear, rather than a referential DP, children at the age of 5 gave the adult answer, “no,” 85 percent of the time.
On G&R’s account, the reason why chance performance occurs only for (40b) is that it is only in this context that (clause c of) rule I needs to be consulted. Though G&R do not explain this, rule I applies when a coreference interpretation is considered. In the context (40c) the option of coreference is not suggested by the context, so there is no reason for the child to even examine the option in deciding on an answer. In (40b), by contrast, a coreference interpretation corresponds to the context situation. So rule I has to determine whether the target sentence allows coreference (in which case the answer to (40a) is “yes”) or not (with the answer “no”).11 Since in this sentence binding is disallowed, clause c of rule I needs to be processed. Adults would complete the task successfully and answer “no,” but children cannot complete the execution and hence they perform at chance, or guess.
Not all subsequent experiments confirmed chance performance at the level of individual children, but Thornton and Wexler (1999) point out that usually the experiments’ results have not been sufficiently analyzed to determine that (no comparison with the binomial model of results expected based on guessing out of two choices).
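The response pattern that the processing account leads one to expect in the two contexts of (40) can be put as a small simulation. This is only an illustrative sketch, not G&R's own formalization: the function name `answer`, the context labels, and the idealization that guessing is a fair coin flip are all assumptions introduced here. The sketch also idealizes the match condition (40c) to a uniform "yes", whereas the reported rate is in the vicinity of 90 percent.

```python
import random

MATCH = "A washes B"     # context (40c): coreference is not suggested
MISMATCH = "A washes A"  # context (40b): coreference fits the picture

def answer(context: str, is_adult: bool, rng: random.Random) -> str:
    """Answer to 'Is A washing him?' under the (assumed, simplified) account."""
    if context == MATCH:
        # Rule I is never consulted, so no costly computation is needed.
        return "yes"
    if context == MISMATCH:
        if is_adult:
            # Adults complete the clause c computation and reject coreference.
            return "no"
        # Assumption: a child who cannot complete the computation
        # picks one of the two answers arbitrarily.
        return rng.choice(["yes", "no"])
    raise ValueError(f"unknown context: {context!r}")

rng = random.Random(2011)  # fixed seed so the run is repeatable
child_yes = sum(answer(MISMATCH, False, rng) == "yes" for _ in range(1000))
print(f"simulated child 'yes' rate in (40b): {child_yes / 1000:.2f}")
```

The point of the sketch is only that guessing arises in exactly one cell of the design, the child in the mismatch context, which is where the roughly 50 percent performance is reported.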
In the detailed statistical analysis of T&W of their own experiments, a similar pattern to that of Chien and Wexler 1990 was found. By their own conclusions, the binomial model was confirmed in about 75 percent of the children (15 out of 19 subjects). The group’s performance on condition B sentences like Bert brushed him was approval of coreference 58 percent of the time.12 The individual subject data reveal that out of the 19 subjects, 8 children accepted 3/4 or 4/4 trials, 7 children accepted 2/4 trials, 1 child accepted 1/4 trials, and 3 children accepted 0/4 trials. Note that seven children showed an equal number of “yes” and “no” responses on the four trials in the same condition. But this is not the only
indication of individual chance performance (since chance allows different individual numbers). The combined group results are almost consistent with the binomial model for guessing between two options. T&W point out that the number of children with 3/4 or 4/4 correct answers (adult-like rejection of coreference) is a bit higher than the probability in a binomial model: The model predicts 2 such children, while there are 4 (p. 175). T&W propose to identify these 4 children as a separate subgroup. For the other 15 children (or, statistically, for 17 of the 19 children), the response pattern is fully consistent with the binomial model of pure chance, or guessing.
As for the subgroup of 4 children that rejected coreference in condition B environments, T&W assume that they have reached adult knowledge. In their terms, this means they have mastered early the cues to guise-identification, or (if correct), in G&R terms, this would mean that their working memory has developed early, so they are able to execute the computation. Technically, only 2 children diverge from the binomial statistics, as we just saw. But T&W followed this group in all conditions of the sequence of experiments, and they found out that the same children performed equally well in all conditions involving coreference in condition B environments. (E.g. in the ellipsis condition with sentences like Bert brushed him and the Tin Man did too, this subgroup permitted coreference incorrectly in 1/16 trials, precisely the same result as in the non-ellipsis condition Bert touched him, that we have just examined.) This uniform behavior across conditions justifies singling out all four children as a separate group. However, T&W’s conclusion that the reason they are singled out is that, unlike the other children, they have reached adult knowledge does not follow automatically. In fact, it is not consistent with another of T&W’s findings.
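The remark that chance allows different individual numbers can be made concrete with a minimal sketch of a binomial guessing model. The parameters below are assumptions made for illustration (four independent trials per child, probability 0.5 of answering "yes"); the text does not spell out how T&W parameterized their model, so this sketch is not meant to reproduce their expected counts.

```python
from math import comb

def guess_distribution(trials: int, p_yes: float = 0.5) -> list[float]:
    """Probability of k 'yes' answers (k = 0..trials) for a child who
    guesses independently on every trial with probability p_yes."""
    return [
        comb(trials, k) * p_yes**k * (1 - p_yes) ** (trials - k)
        for k in range(trials + 1)
    ]

dist = guess_distribution(4)
for k, prob in enumerate(dist):
    print(f"{k}/4 acceptances: {prob:.4f}")
```

Under these assumptions an evenly split 2/4 pattern is the single most likely outcome for a guesser, but it accounts for well under half of the distribution, so children who accept coreference on 1/4 or 3/4 trials are equally compatible with guessing.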
In the VP ellipsis tests of condition C, that we discussed as (28), repeated in (41), the children as a group performed at chance level, allowing the construal he cleaned Flash Gordon 54 percent of the time. But on this task, unlike the condition B tasks, there was no significant difference between the two groups of children, as seen in (42) (T&W p. 200).
(41) The kiwi bird cleaned Flash Gordon and he did too.
(42) Acceptance of the interpretation ‘He cleaned Flash Gordon and he = Flash Gordon’:
a. Group I (4 children): 44 percent (7/16)
b. Group II (15 children): 57 percent (32/56)
The adult control group accepted coreference here 83 percent of the time, but the group of 4 children that are presumably “little adults” in their knowledge of guises (or rule I) accepted coreference 44 percent of the time, which is in the range of chance performance. (T&W do not provide the individual data for this
group of 4 children on this condition.) Recall that this “structured-meaning context” is an instance of coreference ruled in by rule I, although binding is ruled out here by condition C. In T&W’s analysis this is a case of distinct guises. If the four children in question mastered adults’ guise understanding, which T&W assume to explain their performance on condition B tasks, they should have manifested this also in the present task. It is in principle possible that, when children are unable to execute a given task, some of them develop some sort of a strategy (default) to deal uniformly with such tasks without applying the difficult procedure. Children operating by a strategy end up performing uniformly across different tasks, and, depending on the default strategy and the experimental condition, it can happen to be the adult-like response. It is not crucial for the present discussion to determine what strategy could explain the full data of the performance for these four children.13 Still the facts suggest that a strategy is at play, for these children, in the case of condition B. It seems rather less likely that they would have adult knowledge in one case and not in the other. In any case, abstracting away from the group of four, the crucial finding confirmed again in T&W’s experiments is that at least for the majority of children, performance in rule I environments is at chance level, consistent with individual guessing. So, a crucial question about a given analysis of coreference delay is whether it can explain this specific guess pattern, which, as mentioned, is not a common finding in all areas of acquisition. Even if not all children show this pattern, those who do need explaining. 
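As a quick arithmetic check on the figures in (42), the two subgroup counts can be pooled to see whether they agree with the overall acceptance rate of 54 percent reported for the ellipsis condition. The sketch below uses only the counts given in (42); the variable names are introduced here for illustration.

```python
from fractions import Fraction

# Counts from (42): accepted trials / total trials per subgroup.
group_1_accepted, group_1_trials = 7, 16
group_2_accepted, group_2_trials = 32, 56

group_1 = Fraction(group_1_accepted, group_1_trials)
group_2 = Fraction(group_2_accepted, group_2_trials)
pooled = Fraction(
    group_1_accepted + group_2_accepted,
    group_1_trials + group_2_trials,
)

for label, rate in [("Group I", group_1), ("Group II", group_2), ("Pooled", pooled)]:
    print(f"{label}: {float(rate) * 100:.1f} percent")
```

The pooled rate is 39/72, which rounds to the 54 percent cited for the children as a group, so the subgroup figures and the group figure are mutually consistent.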
The processing analysis of G&R has taken this finding as its point of departure, and it provides a straightforward answer: A guess pattern is found when, on the one hand, the child knows what needs to be computed to provide an answer, and on the other hand, he is not able to complete the task. So, given that there are two options to choose from (“yes” or “no”), the choice is arbitrary, resulting in guessing.
On the pragmatic account of T&W, it is hard to see how chance performance could be derived even for a minority of the children. On their account, the source of children’s coreference delay is their extension of the conditions allowing distinct guises: They permit distinct guise-interpretation in a superset of the conditions under which adults permit them (T&W’s “extended guise creation,” p. 102). Their performance, then, should be determined by the size and properties of the superset they adopt. Suppose, e.g. children accept freely what T&W labeled “role reversal guises,” namely they allow every thematic role to correspond to a separate guise. In this case they should always allow coreference in condition B environments, because (like in any other instance of coreference) the two occurrences have different thematic roles. So their
performance should be close to 100 percent acceptance. Suppose they take heavy stress as always allowing distinct guises. Then their performance should depend on the experimental conditions. In sentences where heavy stress is used, they should allow, again, coreference at the range of 100 percent. But if heavy stress is avoided (as in most of the experiments), they should perform “like adults” and disallow coreference to the same extent. So the guises-superset analysis can indeed predict correctly that children’s performance would differ from adults, but it cannot predict the specific way it differs, namely the actual findings of individual chance performance.
Notes
1. Note, however, that (5a), as stated here, does not rule out cases of weak crossover, because it does not specify at which stage of the derivation c-command should hold. In His mother loves every boy, every boy can c-command the pronoun after quantifier raising, and (5a) will not rule out binding in this derivation. The covaluation conditions I turn to directly would also not rule this out. As is standard, I assume now that weak crossover is handled by a different generalization. (In my earlier work I assumed that c-command must hold at the overt structure, hence the same condition also rules out weak crossover.)
2. As is noted in Reinhart 1983, coreference where principle B blocks binding is much harder to find than coreference in condition C environments. E.g. in the context of (9a), it would be more natural to express the idea with When we counted the ballots ... Only Felix voted for Felix. The reason suggested there is that using the full proper name is the more explicit way to capture the intended meaning. (The pronoun requires the further task of identifying its value.) Nevertheless, examples like (9) are possible, with effort.
3. Within this view of economy, rule I could be stated without clause b of (11), as in (i), which is essentially how it was viewed in Grodzinsky and Reinhart 1993 (modulo technical changes introduced in Reinhart 2000).
(i) α and β cannot be covalued if
a. α is in a configuration to A-bind β, and
b. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β.
That variable binding is more economical is possibly defendable, in terms of semantic processing. Compare the two interpretations of (ii).
(ii) Max loves his mother
a. Max (λx (x loves x’s mother))
b. Max (λx (x loves z’s mother) and (z = Max))
In (iia), where the pronoun is bound, the VP forms a set, and we just have to check whether Max is in it. In (iib), the pronoun remains a free variable. The VP remains an open property, and it has to be held open until the pronoun is assigned a value. Only when this happens, assessment can take place. If it turns out that the intended value
is, anyway, Max, then it is not obvious why we had to go through assignment at all. The economy requirement would be, then, “get rid of free variables (i.e. close open properties) as soon as possible.” So this appears to be an instance of the ‘least effort’ principle of economy. This view of the economy requirement is developed, under a different terminology, in Fox 1998. Reuland (2001), assuming a generalization like (i), offers a different rationale for why “least effort” requires that (i) should hold. On his analysis, variable binding is a procedure taking place within the computational system (forming a chain), while coreference is a discourse procedure. He argues, roughly, that in general, procedures applying during the derivation are more economical than those applying at the interface. So when the first is available, an interpretation based on the second is excluded. Nevertheless, in view of problems surveyed partially below, I argued (Reinhart 2002) that these potential economy considerations do not, in fact, play a role in anaphora resolution, and there is no general preference for variable binding over coreference or covaluation, when both are allowed by the computational system.
4. The working memory system should not be confused with memory resources in general (long-term memory). E.g., an anonymous reviewer of this paper argues against the claim put forth here that “children are capable of memorizing a large number of new words; they are capable of learning rules of new games, etc.” and that the same is true of aphasic patients. But these tasks concern memory resources in general, regarding which I am not aware of evidenced limitation in children. Smith (1999) explains that the view of working memory as a gateway to long term memory has been undermined by neuropsychological studies that found that there are patients who are impaired on working memory tasks but perform normally on long-term memory tasks.
Note also that the precise details of how working memory develops (whether memory capacity itself increases, or only efficiency in allowing more resources to be employed in storage) is a subject of debate. But these details are not important for the present discussion, because either way children’s working memory was found not to operate as efficiently as adults’.
5. T&W also try to provide empirical counter-evidence to G&R’s claim that it is the complexity of the computation which is responsible for children’s difficulties. This is based on the assumption that there are other areas of anaphora that involve equally complex computations, and still, they pose no difficulties to children. As they put it, “indeed, there are several empirical findings in the literature showing that for many complex structures, children can hold two representations in memory and compare them for the purposes of computing the reference of a pronoun” (T&W, 46). However, the two examples they discuss of such complex computations do not, in fact, involve any reference-set computation, nor can it be argued that they pose a comparable complexity of computation. One example concerns discourse anaphora instances as in (i) (T&W’s (39), p. 46).
(i) a. No mouse/every mouse came to Simba’s party. He wore a hat.
b. A mouse came to Simba’s party. He wore a hat.
The pronoun can refer to the indefinite in (ib), but not to the quantified DP in (ia). T&W cite experiments of Crain and of Conway that found that children performed almost like adults on anaphora tasks in such sentences, and they conclude that this is despite the
Reinhart
fact that “clearly children must be able to hold both sentences in memory in order to apply the relevant constraint” (T&W, 47). It is not obvious why this is so clear. I am not aware of an analysis that requires a reference-set computation in such tasks, and if one exists, it is unmotivated. In this case, there is no need to hold two representations at all. It has been established in Discourse Representation Theory (and other frameworks) that indefinites introduce a discourse referent that can be picked up in subsequent discourse (ib), while quantified NPs normally do not (special circumstances, absent in (ia), aside). This generalization can be stated under many theoretical formulations, but the task involved is establishing which item in the discourse referent storage is available for the pronoun to get its value from. The mouse entity is available in this storage for (ib) but not for (ia). In any case, the task requires looking at the discourse storage, rather than retaining two representations, let alone comparing them. The other example is with quantified (bound) anaphora in the domain of condition B. T&W cite experiments by Crain (1991) and Thornton (1990), which checked sentences like (iia).
(ii) a. I know who scratched him — Bert.
     b. Every turtle scratched him and Bert did too.
Children correctly rejected (iia) if Bert was shown scratching himself, which means that they had no difficulties in processing the sentence. But this is hardly surprising. T&W view (iia) as an instance of ellipsis, namely a VP needs to be copied or reconstructed for Bert. If so, this is precisely equivalent to the type of task in (iib), which was the focus of T&W’s own experiments (although T&W did not experiment with sentences precisely like (iib), but rather with sentences with the reverse ordering of the conjuncts, which, they note, may have been an oversight). The construal of the second conjunct is determined by the construal of the first.
But in the first, there is no coreference option to begin with, because the antecedent is quantified. Binding condition B disallows construing the pronoun as bound, namely forming the predicate λx (x scratched x), hence this predicate cannot be reconstructed in the second conjunct. (As shown already in Chien and Wexler (1990) and reiterated in T&W, children do not have difficulties with the variable binding aspects of condition B.) Next, the parallelism condition determines that a coreference interpretation in the second (elided) conjunct is only possible if it is available in the first. According to T&W, children essentially master this condition. In any case, it seems that in this example, T&W’s analysis and the reference-set analysis have precisely the same predictions. So if this experiment poses a problem, it is a problem for both.
6. In the reported experiments, children allowed coreference in (23) in 37.5 percent of the cases, which was not significantly different from their performance on condition B violations.
7. In fact, T&W make a stronger claim that “backtracking in order to reconsider the interpretation of the pronoun, is not likely to be within the parser’s capacity, either for children or for adults” (p. 107). This cannot be true, since the adult parser can clearly deal with backward anaphora, as well as with the apparent condition C violations permitted by rule I, such as (16) above, or (i) (Evans 1980):
(i) I know what Ann and Bill have in common. She thinks that Bill is a genius, and he thinks that Bill is a genius.
Processing or Pragmatics?
Needless to say, if T&W’s generalization also holds for adults, then there is very little evidence for condition C in right-branching languages, since most of what it rules out would also be ruled out by the special parser limitation.
8. As explained in Fox 1995, an ellipsis context is not sufficient, by itself, to license coreference by rule I. Thus, in the reverse order in (i), coreference in the first conjunct is not allowed, even though this would enable the interpretation that both the kiwi bird and Flash Gordon himself cleaned Flash Gordon.
(i) He cleaned Flash Gordon, and the kiwi bird did too.
However, the prohibition stated by Fox is (roughly) against letting future discourse affect the processing of a given derivation. In (28), by contrast, the computation of rule I applies after the relevant predicate has been formed in the previous context. That apparent condition C violations are possible in the given ellipsis context has been noted before. T&W mention that Fiengo and May (1994) suggested, for sentences like Mary likes John and he thinks that Sally does too, that an operation of “vehicle change” (roughly) changes the status of John in the second conjunct to that of a pronoun. As T&W point out, however, this would not work for the local context of (28), where a pronoun is ruled out as well. (Fiengo and May argue that sentences like (28) are indeed ruled out, but given the adults’ answers in the experiment (accepting (28) 83 percent of the time), this cannot be true.) The account T&W offer for why coreference is permitted in (28) is that since the pronoun is stressed, it is taken as a different guise of Flash Gordon. Hence this is an instance of coreference under different guises.
9. T&W’s example is (i), along with a similar example with the predicate vote for him.
(i) You know what Mary, Sue and John have in common? Mary admires John, Sue admires John, and he admires him too.
10. This either/or condition is reminiscent of the original Principle P that Chien and Wexler (1990) offered to account for coreference: Assuming that binding conditions B and C always enforce contraindexing, the principle says that “contraindexed NPs are noncoreferential unless the context explicitly forces coreference.” In other words, contraindexed NPs are either coreferential or not, depending on unspecified context considerations.
11. This is in general the case with interface reference-set computation. In Reinhart 2006, I argue that the same computation is found with quantifier raising and stress shift for focus. In these cases as well, the computation needs to be carried out only if the relevant interpretation is considered. This means, e.g., that not all interpretations of quantifier scope are equally complex. To compute whether in (i) a woman can take scope over every bear no special computation needs to be applied.
(i) A woman washed every bear.
But if the option of a quantifier-raising interpretation is considered (wide scope for every bear) a semantic reference-set needs to be constructed, so the computation is more costly.
12. In the VP ellipsis sentences like Bert brushed him and the Tin man did too, the coreference acceptance rate was 43 percent. Their combined rate was 50.5 percent.
Usually, the more results are combined in a chance pattern of performance, the closer it gets to precisely 50 percent.
13. It is possible, in fact, to formulate such a strategy, but it is hard to see where it could come from: It would be to disallow coreference whenever a pronoun can be bound, skipping rule I altogether. This would rule out coreference in condition B environments but not in (40), where the pronoun cannot be bound. Hence, rule I still needs to be processed, leading to the familiar failure and guessing. There is another interesting finding of T&W that appears consistent with such a strategy. This regards the strict interpretation of reflexives in VP ellipsis contexts such as (ib), namely, the question whether children allow a coreference interpretation for the reflexive in (ia), as opposed to its interpretation as a bound variable. On this issue, there is no reason to expect 50 percent performance, under either theory, and indeed it was not found. But the two groups performed dramatically differently, as summarized in (ii) (T&W p. 195).
(i) a. Hawkman fanned himself.
    b. Hawkman fanned himself and the baby boy did too.
(ii) Acceptance of the strict interpretation of (ib) (The baby boy fanned the Hawkman):
     a. Group I (4 children): 13 percent (1/16)
     b. Group II (15 children): 81 percent (21/26)
The four children T&W identified as little adults disallowed it; the others allowed it. From the perspective of rule I alone, coreference should be permitted in (ia), because clause b of rule I does not hold — Hawkman can bind himself. (See the discussion of (20). The question why this is not an option taken by adults more commonly is an independent issue, to which I do not know the answer.) Children who apply rule I will be able to get this far, and since this clause does not hold, they will allow coreference here. Children who bypass rule I and operate by the strategy just outlined will rule out coreference in (ia) because the reflexive can be bound.
References
Adams, A.-M., and Gathercole, S. E. 1996. Phonological working memory and spoken language development in young children. Quarterly Journal of Experimental Psychology 49A, 216–233.
Baddeley, A. D. 1986. Working Memory. Oxford University Press.
Chien, Y.-C., and Wexler, K. 1990. Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics. Language Acquisition 1, 225–295.
Chierchia, G., Crain, S., Guasti, M. T., Gualmini, A., and Meroni, L. 2001. The acquisition of disjunction: Evidence for a grammatical view of scalar implicatures. In Anna H.-J. Do et al. (eds.), BUCLD 25 Proceedings: Proceedings of the 25th annual Boston University Conference on Language Development. Cascadilla.
Chierchia, G., and Guasti, M. T. 2000. Backward vs. forward anaphora: Reconstruction in child grammar. Language Acquisition 8 (2), 129–170.
Chomsky, N. 1981. Lectures on Government and Binding. Foris.
Crain, S., and McKee, C. 1985. The acquisition of structural restrictions on anaphora. In Proceedings of NELS 16, University of Massachusetts, Amherst.
Crain, S., Ni, W., and Conway, L. 1994. Learning, Parsing, and Modularity. In C. Clifton et al. (eds.), Perspectives on Sentence Processing. Erlbaum.
Crain, S., and Thornton, R. 1998. Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. MIT Press.
Dowty, D. 1980. Comments on the paper by Bach and Partee. In K. J. Kreiman and A. E. Ojeda (eds.), Papers from the Parasession on Pronouns and Anaphora. Chicago Linguistic Society.
Evans, G. 1980. Pronouns. Linguistic Inquiry 11 (2), 337–362.
Fiengo, R., and May, R. 1994. Indices and Identity. MIT Press.
Fox, D. 1995. Economy and scope. Natural Language Semantics 3, 283–341.
Fox, D. 1998. Locality in variable binding. In P. Barbosa et al. (eds.), Is the Best Good Enough? Optimality and Competition in Syntax. MIT Press.
Gathercole, S. E., and Adams, A. 1993. Phonological working memory in very young children. Developmental Psychology 29, 770–778.
Gathercole, S., and Baddeley, A. 1993. Working memory and language. In Essays in Cognitive Psychology. Erlbaum.
Gathercole, S., and Hitch, G. 1993. Developmental changes in short-term memory: A revised working memory perspective. In A. F. Collins et al. (eds.), Theories of Memory. Erlbaum.
Grimshaw, J., and Rosen, S. T. 1990. Knowledge and obedience: The developmental status of the binding theory. Linguistic Inquiry 21, 187–222.
Grodzinsky, Y., and Reinhart, T. 1993. The innateness of binding and coreference. Linguistic Inquiry 24 (1), 69–102.
Gualmini, A., Crain, S., Meroni, L., Chierchia, G., and Guasti, M. T. 2001. At the semantics/pragmatics interface in child language. In Proceedings of SALT X. CLC.
Heim, I. 1998. Anaphora and semantic interpretation: A reinterpretation of Reinhart’s approach. In U. Sauerland and O. Percus (eds.), The Interpretative Tract, MIT Working Papers in Linguistics 25.
Ingram, D., and Shaw, C. 1981. The comprehension of pronominal reference in children. Manuscript, University of British Columbia, Vancouver.
Keenan, E. 1971. Names, quantifiers and a solution to the sloppy identity problem. Papers in Linguistics 4 (2), 211–232.
Lust, B., and Clifford, T. 1982. The 3D study: Effects of depth, distance and directionality on children’s acquisition of Mandarin Chinese. In J. Pustejovsky and P. Sells (eds.), Proceedings of NELS 12. University of Massachusetts, Amherst.
Lust, B., Loveland, K., and Kornet, R. 1980. The development of anaphora in first language. Linguistic Analysis 6, 217–249.
Reinhart, T. 1976. The Syntactic Domain of Anaphora. Ph.D. dissertation, Massachusetts Institute of Technology.
Reinhart, T. 1983. Anaphora and Semantic Interpretation. Croom Helm and University of Chicago Press.
Reinhart, T. 1999. The processing cost of reference-set computation: Guess patterns in acquisition. OTS Working Papers in Linguistics, University of Utrecht.
Reinhart, T. 2000. Strategies of anaphora resolution. In H. Bennis et al. (eds.), Interface Strategies. Royal Netherlands Academy of Arts and Sciences.
Reinhart, T. 2006. Interface Strategies. MIT Press.
Reuland, E. 2001. Primitives of binding. Linguistic Inquiry 32 (3), 439–492.
Smith, E. E. 1999. Working memory. In R. A. Wilson and F. C. Keil (eds.), The MIT Encyclopedia of the Cognitive Sciences I. MIT Press.
Solan, L. M. 1978. Anaphora in Child Language. Doctoral dissertation, University of Massachusetts, Amherst.
Tavakolian, S. L. 1977. Structural Principles in the Acquisition of Complex Sentences. Doctoral dissertation, University of Massachusetts, Amherst.
Taylor-Browne, K. 1983. Acquiring restrictions on forwards anaphora: A pilot study. Working Papers in Linguistics 9, University of Calgary.
Thornton, R. 1990. Adventures in Long-Distance Moving: The Acquisition of Complex Wh-Questions. Doctoral dissertation, University of Connecticut, Storrs.
Thornton, R., and Wexler, K. 1999. Principle B, VP Ellipsis, and Interpretation in Child Grammars. MIT Press.
II
Adults’ Processing of Reference: Evidence from the Visual-World Eye-Tracking Paradigm
8
Disfluency Effects in Comprehension: How New Information Can Become Accessible
Jennifer E. Arnold and Michael K. Tanenhaus
Spontaneous speech is rarely fluent. This results in hesitations, filler words (“um,” “uh”), repeated or repaired words, and pronouncing “the” as /thee/ (rhyming with “tree”) or “a” as /ey/ (rhyming with “say”) (Fox Tree and Clark 1997). It is often assumed that these phenomena are “performance errors” (Chomsky 1965) that merely hinder uncovering the linguistic properties of the input. However, disfluency occurs systematically with respect to other features of the discourse, situation, and speaker’s intentions (Clark and Fox Tree 2002; Clark and Wasow 1998). This means that the presence, the type, and the location of a disfluency carry potentially useful information for the listener (Arnold, Tanenhaus, Altmann, and Fagnano 2004; Arnold, Hudson Kam, and Tanenhaus 2007; Corley, MacGregor, and Donaldson 2007; Clark and Fox Tree 2002; Bailey and Ferreira 2003; Barr 2001a,b; Barr and Seyfeddinipur 2010; Brennan and Schober 2001; Fox Tree 1995, 2001). In this paper we describe ongoing research into the question of how speaker disfluency might affect on-line reference comprehension. Our research is motivated by an interest in understanding the basic processes underlying reference comprehension. In particular, we believe that disfluency can provide information about how and why comprehenders perceive some things in a discourse as more accessible than others. Most researchers, including Almor (2000), Brennan et al. (1987), Dahan et al. (2002), Gernsbacher (1990), McDonald and MacWhinney (1995), and Sanford and Garrod (1981), agree that the comprehension of referring expressions is affected by the accessibility of potential referents. Accessibility is generally assumed to be related to focus of attention, where speakers and listeners focus on some things in the discourse situation more than others. However, it seems necessary to assume that discourse participants coordinate the accessibility of entities jointly, at least partially.
A speaker may have focused all her attention on an entity, but if that entity is not available to her addressee then it is not felicitous to use referring expressions that are normally reserved for highly accessible entities, such as pronouns or deaccented noun phrases.
How, then, do interlocutors coordinate accessibility? The traditional approach to this question is to appeal to the discourse record, and especially to linguistic, textual properties (Ariel 1990; Chafe 1976, 1994; Givón 1983; Grosz and Sidner 1986; Grosz, Joshi, and Weinstein 1995; Gundel et al. 1993; Prince 1992). The way in which different entities have been treated in the textual history of the discourse is information that is available to all discourse participants. At the most basic level, entities that are discourse-old or given are considered more accessible than those that are discourse-new (Chafe 1976, 1994; Gundel 1988; Prince 1992). Though there are many definitions of these terms, given information is often considered that which is known to the discourse participants, whereas new information is that which is not known. However, there are different domains over which “known” can be calculated. Prince (1992) distinguishes between information that is known to the hearer (“hearer-old”) and that which is both hearer-old and has also been mentioned in the discourse (“discourse-old”). Some studies limit “given” status to only that which is both known and highly accessible. There are also different modes in which information can become given, e.g. through linguistic mention, visual presentation, and inferring something through association with something else that has been mentioned or presented (Prince 1992; Haviland and Clark 1974; Hankamer and Sag 1976). Nevertheless, many studies operationalize givenness in terms of whether something has been referred to linguistically. Among given entities, discourse properties also make some entities more accessible than others. For example, a first-mentioned entity in subject position is more accessible than entities mentioned later in non-subject positions (Arnold, Eisenband, Brown-Schmidt, and Trueswell 2000; Gernsbacher 1990; Gordon, Grosz, and Gilliom 1993; Hudson-D’Zmura and Tanenhaus 1998).
Other factors that have been shown to be important are the thematic roles that entities have played (Arnold 2001; Stevenson, Crawley, and Kleinman 1993), the implicit causality of verbs (Garvey and Caramazza 1974; Ehrlich 1980; Garnham, Traxler, Oakhill, and Gernsbacher 1996; McDonald and MacWhinney 1995), focus constructions (Almor 1999; Arnold 1998), recency of mention (Givón 1983), and parallelism (Chambers and Smyth 1998). Other temporal, spatial, and causal properties of a discourse can also make certain entities more accessible than others (Bower and Morrow 1990; Zwaan, Graesser, and Magliano 1995).
The Expectancy Hypothesis
The discourse-history approach to accessibility has isolated a variety of factors that affect reference comprehension, but it does not explain why this diverse
set of factors should affect the focus of attention of discourse participants. A potential answer to this question can be found in what we have termed the “expectancy hypothesis.” The expectancy hypothesis was first proposed by Arnold (1998, 2001), who used corpus studies to show that many of the above-mentioned textual characteristics are correlated with a higher likelihood for a particular entity to be mentioned in the following utterance. For example, if an entity is mentioned recently, and in particular in the position of grammatical subject of the preceding clause, it has a higher likelihood of occurring in the incoming clause than a non-recently mentioned or non-subject entity. That is, it has a higher expectancy. The same is true for recently mentioned vs. not recently mentioned entities, for entities that just played the thematic role of goal, compared to theme, for entities in focus position of clefts, and for entities in parallel syntactic position to the noun phrase that is currently being encountered (Arnold 1998, 2001). Similarly, Gernsbacher and Jescheniak (1995) argue that indefinite this and contrastive stress make referents more accessible by indicating that they are more likely to be mentioned in the following discourse. The expectancy hypothesis suggests that the textual history of the discourse can be a strong determinant of referent accessibility, insofar as it affects the perceived likelihood that a particular entity will be mentioned in the upcoming discourse. When the comprehender perceives that an entity is likely to be mentioned, it is likely that the speaker is also focusing attention on this entity. The expectancy hypothesis is based on the idea that accessibility is linked to the comprehender’s assessment of what is considered the most important to the task at hand at each moment.
This would have to be a dynamic assessment, as well as multi-dimensional, since different things can be “important” at the same time for different reasons. In this sense, the expectancy hypothesis focuses on what the textual approach originally intended to capture: the cognitive status of discourse entities (Gundel et al. 1993). Even though the above-mentioned features are meant to affect accessibility as a cognitive phenomenon, many studies focus instead on the effects of the linguistic features themselves. In fact, Grosz and Sidner (1986, p. 179) explicitly define the attentional state component of their model as an “abstraction of the participants’ focus of attention . . . a property of the discourse itself, not of the discourse participants.” The expectancy hypothesis contrasts with the discourse-history view of accessibility in that it suggests that other, non-textual, factors should also affect referent accessibility, if they affect the likelihood of that entity’s being mentioned. Here we test this idea by looking at how disfluency affects reference comprehension. While the discourse-history account suggests that given information is always more accessible than new information, the investigation of
disfluency effects in comprehension offers an opportunity to investigate conditions under which new entities may become relatively more accessible, perhaps even more accessible than given entities. Disfluency tends to occur when speakers are having trouble during language production (Clark and Fox Tree 2002; Clark and Wasow 1998; Fox Tree and Clark 1997). There are many aspects of production the speaker could have trouble with, including deciding what to say, selecting a lemma, generating a phonetic plan, and executing a phonetic plan (Levelt 1989). All these processes are more difficult for new entities than given ones. This means that disfluent phrases like “thee uh . . .” may signal comprehenders that the speaker is about to refer to something new — i.e., something that has not already been mentioned. The research described in this paper builds directly upon Dahan, Tanenhaus, and Chambers’s (2002) findings that comprehenders are biased toward given referents when interpreting fluent definite noun phrases, with different preferences for accented and deaccented noun phrases. Listeners were asked to follow instructions such as “Put the candy above the diamond. Now put the candy/CANDY. . . .” The target noun phrase was the theme of the second instruction, and it was either accented or deaccented. The visual display included four objects, two of which were cohort competitors (for example, a candy and a candle). These objects had names with overlapping initial segments, which meant that the target noun was temporarily ambiguous during the initial part of the word (“cand . . .”). Eye-movement data revealed that for deaccented nouns participants quickly converged on the target, which was the given and most highly accessible entity from the preceding instruction, and experienced little competition from the unmentioned cohort.
For accented nouns, participants initially fixated more on the cohort object, revealing a preference for accented noun phrases to refer to an entity that was discourse-new, but visually present, in comparison with a given and highly accessible one. A subsequent experiment revealed that comprehenders preferred to link accented nouns with referents that had been mentioned but were not highly accessible, as in “Put the necklace above the candy. Now put the CANDY . . . ,” as opposed to referents that had not been mentioned at all. That is, even though all the objects were visually available to the experiment participants, both accented and deaccented nouns were preferentially associated with entities that were linguistically given. This research joins other studies showing that reference comprehension is facilitated for given, especially linguistically given information (e.g., Haviland and Clark 1974; Clark and Haviland 1977). Like Dahan et al. (2002), we present listeners with natural-sounding instructions in the context of concrete, real-world referents, and use free-head eye tracking to observe the time course of listeners’ hypotheses about what the
speaker is referring to. The use of this methodology forms part of an additional goal of our research, which is to study language comprehension in naturalistic situations, investigating speech characteristics that are normally absent from psycholinguistic experiments. While most laboratory research ignores “messy” speech characteristics such as disfluency, overlapping speech, and sentence fragments, these features are characteristic of natural language use. In these ways, this research represents an attempt to shed light on the issues around which this volume is organized. Like Dahan et al., we adopt a technique of including cohort competitors (e.g., candle/camel) in the visual situation, which introduces a temporary ambiguity of the referent of the target word. Research with the eye-tracking visual-world paradigm has shown that comprehenders tend to look at the entities they are considering as referents of temporarily ambiguous expressions, revealing a fine-grained picture of the competitor set for particular expressions and discourse conditions (Allopenna et al. 1998; Dahan et al. 2002; Tanenhaus et al. 1996). Although these cohort competitor effects seem subtle, there is a growing body of evidence that a variety of manipulations can change the magnitude of competitor effects, including target and competitor frequency (Dahan, Magnuson, and Tanenhaus 2001), and accent (Dahan et al. 2002). Here we investigate the relative proportion of looks to the referents of cohort competitors (e.g., a picture of a candle or camel) to investigate the relative availability of each entity for fluent and disfluent referring expressions. This line of research promises to not only address questions about reference comprehension and accessibility, within a natural setting, but also to provide a rich testing ground for other questions about language processing.
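The "relative proportion of looks" measure mentioned above can be illustrated with a minimal sketch. It assumes, hypothetically, that each eye-tracker sample in an analysis window has already been labeled with the object being fixated (or None for looks elsewhere); the function name, the labels, and the sample data are illustrative assumptions and not the authors' actual analysis procedure.

```python
from collections import Counter

def proportion_of_looks(samples, objects):
    """Share of gaze samples that fall on each named object.

    Samples on other objects, or on nothing (None), still count in the
    denominator, so the returned proportions need not sum to 1.
    """
    if not samples:
        return {obj: 0.0 for obj in objects}
    counts = Counter(samples)
    return {obj: counts[obj] / len(samples) for obj in objects}

# Hypothetical gaze record for one trial window, one label per sample.
window = ["camel", "camel", "candle", None, "candle", "candle", "grapes", "candle"]
looks = proportion_of_looks(window, ["candle", "camel"])
print(looks)
```

Comparing the two values within a condition gives a rough picture of how strongly each cohort member is being considered during the ambiguous portion of the word; real analyses would also need time-locking to word onset, which this sketch leaves out.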
Experiments 2 and 3 (described below) begin to explore the different cues that lead to the perception of disfluency, in particular the relationship between traditional manifestations of disfluency and the prosodic characteristics of the utterance. Subsequent research, described briefly below, asks what kind of mechanism underlies the effects of disfluency, and in particular whether comprehenders make attributions about the cause of speaker disfluency. We hope that these studies will shed light on the more general question of whether language comprehension takes into account estimations of the speaker’s production processes and intentions.
The Systematic Distribution of Disfluency in Production
The reason that disfluency provides a potential cue to comprehenders is that it reflects normal production processes, and therefore occurs systematically with respect to features of the discourse and speech situation. Research on the
production of disfluency has suggested that speakers are more likely to be disfluent when they are having trouble with some aspect of language production and need extra planning time (Clark and Wasow 1998; Fox Tree and Clark 1997). Clark and colleagues argue that disfluency results from the tension between the time needed to plan upcoming speech (the “formulation imperative”) and from the need to avoid long delays or silences, which may signal that the speaker is no longer participating in the conversation (the “temporal imperative”; Clark 1996; Clark and Wasow 1998). There are many ways in which production difficulty changes the speech signal. Many of these are identified as forms of disfluency, including the disfluent words “um” and “uh” (Clark and Fox Tree 2002), repeats or repairs (Clark and Wasow 1998), and pronouncing “the” as “thee” (rhyming with “tree”) (Fox Tree and Clark 1997). Notably, any particular problem in production may result in many such changes to the speech signal, which leads to the tendency for multiple disfluent productions to co-occur (Fox Tree and Clark 1997). Production difficulty also affects the prosodic characteristics of speech. It is well documented that words tend to be pronounced with longer durations in situations of production difficulty, or when speakers need an extra moment to plan (Bell et al. 2003; Gregory, Joshi, and Sedivy 2003; Griffin and Garton 2003). Pauses can also constitute a form of disfluency, although they are often ignored due to the difficulty in distinguishing disfluent from nondisfluent pauses (Fox Tree 1995). But even the placement of non-disfluent prosodic breaks is influenced by the difficulty of just-completed and upcoming constituents. Watson (2002; see also Watson and Gibson, 2004) found that the likelihood of a speaker’s producing an intonational phrase boundary depends on the complexity of the following constituent, perhaps because speakers need this break to plan upcoming material.
Since disfluent productions tend to occur when the speaker is having difficulty, it stands to reason that they would occur more often in some situations than in others. Research on language production shows that indeed this is the case. For instance, disfluent speech and pausing are most likely to occur at the onsets of syntactic constituents (Clark and Wasow 1998; Ferreira 1991) and intonational phrases (Clark and Fox Tree 2002), suggesting that these are the units over which speakers do much of their planning. The likelihood of disfluency is also related to the complexity of the constituent (Clark and Wasow 1998), suggesting that disfluency occurs under situations of cognitive load during planning and production. We should therefore also expect a higher rate of disfluency when speakers are referring to something that requires an extra second of planning, for example something that is new to the discourse.
A corpus analysis reveals that indeed speakers are more likely to be disfluent when producing a noun phrase that refers to something new than when producing one that refers to something given. We analyzed data that were collected in an experiment in which speakers gave instructions to addressees (both the speakers and the addressees were naive participants) about giving objects to animals, e.g. “Give the corn to the pink duck” (Arnold, Wasow, Losongco, and Ginstrom 2000). The objects were physically present on a table in front of both participants, and the animals were represented by pictures glued onto boxes. Cue cards led the addressee to begin each trial with a question about either an animal or an object, so either the animal or the object was “given.” The objects were either simple ones, generally described with short phrases (e.g., “the scissors”), or sets of objects that required longer descriptions (e.g., “the cup with blue and red spots”). The speakers’ instructions were transcribed, and the noun phrases referring to each referent (animal or object) were coded for whether the object was given or new within the trial, whether the NP was “long” (three or more words) or “short” (one or two words), and whether the speaker was disfluent during or immediately before the noun phrase. “Disfluent speech” included “um,” “uh,” “huh,” repeated or repaired words, “thee” for “the” or “ay” for “a,” and phrases indicating repair like “I mean.” Analyses showed that a noun phrase was more likely to be disfluent when it was new (21 percent) than when it was given (16 percent). Disfluencies were also more common for long NPs than for short ones, and for theme NPs (objects), which were mentioned only once, than for goal NPs (animals), which were mentioned repeatedly throughout the experiment. However, neither length nor NP type (theme or goal) accounted for the correlation between disfluency and new referents.
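The coding scheme just described (NP length class and presence of disfluency) can be sketched in a few lines. This is only an illustration under stated assumptions: the original coding was presumably done by hand on transcripts, the function names and marker list are hypothetical, the determiner is assumed to count toward NP length, and the repetition check is a crude stand-in for "repeated or repaired words."

```python
import re

# Hypothetical marker inventory, loosely following the categories in the text:
# fillers, "thee" for "the", and "ay" for "a".
FILLERS = {"um", "uh", "huh", "thee", "ay"}

def tokenize(text):
    """Lowercase the transcript text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def np_length_class(np_text):
    """'long' for three or more words, 'short' for one or two (determiner counted)."""
    return "long" if len(tokenize(np_text)) >= 3 else "short"

def is_disfluent(text):
    """Rough check for a filler, an immediately repeated word, or 'I mean'."""
    tokens = tokenize(text)
    if any(tok in FILLERS for tok in tokens):
        return True
    bigrams = list(zip(tokens, tokens[1:]))
    if any(first == second for first, second in bigrams):
        return True  # e.g. "the the corn"; will also flag fluent repeats
    return ("i", "mean") in bigrams

print(np_length_class("the cup with blue and red spots"), is_disfluent("thee uh corn"))
```

A heuristic like this would overcount in places (a fluent "that that" is flagged as a repeat) and cannot see repairs that do not repeat a word, which is one reason hand coding remains the safer choice for a small corpus.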
ANOVAs showed main effects of both given/new status (F1(1,45) = 19.489, p < 0.001; F2(1,23) = 5.496, p < 0.05) and NP length (F1(1,45) = 16.801, p < 0.001; F2(1,23) = 8.245, p < 0.01), but no interaction (p’s > 0.5). A separate ANOVA considered given/new status with respect to NP type and revealed main effects of given/new status (F1(1,45) = 15.210, p < .001) and NP type (F1(1,45) = 16.821, p < .001), but no interaction (p > .3).1
Does Disfluency Affect Comprehension?
Given that disfluency correlates with reference to new objects, we wanted to know whether comprehenders make use of this information during comprehension. We tested this idea in two on-line experiments (experiments 1 and 2),
Arnold and Tanenhaus
Table 8.1
Sample instructions.
Given context instruction: Put the grapes below the candle.
New context instruction: Put the grapes below the camel.
Fluent target instruction: Now put the CANDLE below the salt shaker.
Disfluent target instruction: Now put thiy, uh, CANDLE below the salt shaker.
Figure 8.1
Sample visual display for all experiments.
and one off-line experiment (experiment 3) (Arnold, Tanenhaus, Altmann, and Fagnano 2004). In experiments 1 and 2, participants followed fluent and disfluent instructions (e.g., “Now put thee, uh . . .” vs. “Now put the . . .”; see table 8.1) to move objects on a computer screen (figure 8.1) while their eye movements were recorded with a head-mounted eye tracker. On each trial the scene contained two cohort competitor objects (e.g., candle and camel) and two distracters (e.g., grapes and salt shaker). The names of the distracters had no phonetic overlap with those of the cohorts. Participants heard one of the four pairs of instructions resulting from the cross of the context and target phrases in table 8.1. The context instruction established the target as either given or new, and the second (critical) instruction was either disfluent or fluent. All target words were accented, which meant that the fluent NPs were predicted to lead to a bias toward the given but less accessible entity from the context sentence, i.e., the second-mentioned object (Dahan et al. 2002). By contrast, if comprehenders are sensitive to the correlation between disfluency and reference to new objects, the disfluent NP (which was also accented) should lead to a bias toward the discourse-new entity.
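The crossing of context and target instructions, and the prediction attached to each cell, can be laid out explicitly. This is only an illustrative sketch built on the sample item from table 8.1; the dictionary layout and the function name are assumptions made here, not part of the experimental software.

```python
from itertools import product

# Sample item from table 8.1. The context sentence determines whether the
# target (candle) is given or new; the cohort competitor is the camel.
CONTEXT = {
    "given": "Put the grapes below the candle.",
    "new": "Put the grapes below the camel.",
}
TARGET = {
    "fluent": "Now put the CANDLE below the salt shaker.",
    "disfluent": "Now put thiy, uh, CANDLE below the salt shaker.",
}


def predicted_bias(fluency):
    """Predicted initial bias: given for fluent, new for disfluent."""
    return "given" if fluency == "fluent" else "new"


conditions = []
for status, fluency in product(CONTEXT, TARGET):
    conditions.append({
        "target_status": status,
        "fluency": fluency,
        "instructions": (CONTEXT[status], TARGET[fluency]),
        # The target is favored when its status matches the predicted bias;
        # otherwise more looks to the cohort competitor are expected.
        "favored": "target" if predicted_bias(fluency) == status else "competitor",
    })
```

The four resulting cells are given/fluent, given/disfluent, new/fluent, and new/disfluent; on the hypothesis stated above, the competitor should draw more early looks in the given/disfluent and new/fluent cells.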
One of the challenges of studying features of naturally occurring language, such as disfluency, is to maintain experimental control and still present participants with natural-sounding stimuli. Though disfluency is a frequent part of language, it may sound out of place in a laboratory setting, where scripted, fluent speech is the norm. We overcame this by telling participants that the instructions had been generated by another subject in the context of the same visual scene, when in fact they had been recorded by the experimenter. It was emphasized that the speaker was shown what to say by graphic cues but had to come up with her own words. A post-experiment questionnaire confirmed that most participants believed the story, and those who expressed any doubts about the natural production of the stimuli (<6 percent) were excluded from the analyses.
Experiment 1
Sixteen participants’ eye movements were monitored as they viewed scenes like that shown in figure 8.1 and followed instructions like those in table 8.1. Each of the 16 experimental items was rotated through the four conditions resulting from the cross of the context and target instructions, and combined with 32 fillers. The target item (camel vs. candle) was also manipulated as a control variable; this resulted in eight lists, with both forward and backward versions. All filler items contained cohorts; half began like the target items, but did not mention either cohort in the second utterance. The other half mentioned no cohort in the first utterance. Half the fillers contained disfluencies of various types and in various locations. The visual stimuli were versions of the pictures used by Snodgrass and Vanderwart (1980), colored and normed for frequency, visual complexity, and familiarity (Rossion and Pourtois 2001). These dimensions were counterbalanced across items, so on average the properties were the same for the target and the competitor (Dahan et al. 2001). The locations of the cohorts were also counterbalanced across items. The instructions were recorded to achieve a natural-sounding fluent or disfluent instruction. As in naturally occurring speech, there were a number of features that differentiated fluent items from disfluent items: disfluent “thee” vs. fluent “thuh,” presence vs. absence of “uh,” longer durations for “Now” and “put” in the disfluent condition, and a larger pitch excursion on “Now” in the disfluent condition. The prosodic characteristics of the disfluent “Now put” resulted in the impression that the speaker was thinking (henceforth termed “thinking prosody”). In the “fluent” condition, the fluent, accented NP provided an initial bias toward the given but nonfocused object, so we expected faster looks to the
target in the given condition, and a bigger cohort effect in the new condition (replicating Dahan et al. 2002). If the disfluency shifts attention to discourse-new entities, the “disfluent” condition should lead to an increase in looks to the target when it was new, and a bigger cohort effect when it was given. The results supported these predictions. Starting 200 msec after the onset of the head noun (e.g., “candle”), there was an interaction between disfluency and referent, such that there were more looks to the competitor (e.g., the camel) in the disfluent/given and fluent/new conditions. That is, the disfluent condition gave rise to a preference for the new cohort, and the fluent condition to a preference for the given cohort. These results show that the previously established bias toward given entities holds only for fluent referring expressions. Disfluent expressions disrupt this pattern, instead facilitating reference resolution for discourse-new objects (Arnold, Tanenhaus, Altmann, and Fagnano 2004). These results demonstrate that disfluency affects on-line reference resolution. But they raise the question of what features of disfluency contribute to the effect, since our fluent and disfluent conditions differed on multiple dimensions. Of particular interest is the different pitch contour on the words “Now put.” Pitch movement can signal different patterns of accenting, which have also been linked to information status (Dahan et al. 2002; Terken and Hirschberg 1994). Experiment 2 investigated the role of pitch in this effect.
Experiment 2
We used the same materials and methods as in experiment 1, except that the pitch contour on “Now put” was manipulated explicitly, resulting in a 2 (given vs. new) × 2 (fluent vs. disfluent) × 2 (large pitch excursion vs. small) design. Target identity (e.g., camel vs. candle) was not manipulated because there were no main effects or interactions with this variable in experiment 1. The given/new conditions were created by cross-splicing each target sentence with one of the two context sentences, as in experiment 1. Fluency and pitch were manipulated by creating a single “Now put” for each condition, and cross-splicing these into each item (disfluent/large excursion, disfluent/small excursion, fluent/large excursion, fluent/small excursion). These stimuli were further controlled by cross-splicing a single target word into each condition of an item. Acoustic analyses with Praat (Boersma and Weenink 2002) confirmed that the “large” conditions had greater pitch ranges than the “small” conditions (disfluent/large: 120 Hz; disfluent/small: 21 Hz; fluent/large: 108 Hz; fluent/small: 38 Hz). Duration was maintained as a feature of disfluency, since it tends to co-occur with other forms of disfluency (Gregory et al. 2003; Bell et al. 2003; Shriberg 2001). That is, the “Now put” segments in the disfluent
conditions were longer (disfluent/large: 965 msec; disfluent/small: 1,026 msec) than in the fluent conditions (fluent/large: 642 msec; fluent/small: 531 msec). The amplitude of “Now” and “put” was roughly the same for all conditions. The results replicated those for experiment 1. Again there was an interaction between disfluency and referent beginning 200 msec after the onset of the target noun. Participants fixated on the competitor more in the fluent/given condition than in the fluent/new condition, but this bias disappeared for the disfluent stimuli. This finding is reflected in analyses of variance, conducted on the proportions of competitor looks in each condition for a time slice from 200 to 600 msec after the target noun. These analyses revealed an interaction between disfluency and referent (given vs. new) (F1(1,31) = 7.211, p < .05; F2(1,8) = 7.394, p < 0.05).2 There were no significant effects of the pitch manipulation or interactions between it and other variables. However, figure 8.2 shows that the disfluency effect was numerically greatest when a large pitch excursion supported the idea that the speaker was having some kind of trouble. Furthermore, separate analyses of the two pitch conditions revealed that the disfluency by referent interaction was only reliable in the large-pitch-excursion condition (although it was marginal in the participants analysis, F1(1,31) = 3.633, p = .066; F2(1,8) = 9.255, p = .016), and not in the small-pitch-excursion condition (F1(1,31) = 2.677, p = .112; F2(1,8) = 2.116, p = .184).3 These results suggest that comprehenders perceive disfluency through multiple cues. When disfluency was signaled through only the presence of “thee uh” and longer durations on “Now put,” the presence of disfluency removed the given bias that occurs with fluent stimuli.
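The dependent measure in these analyses, the proportion of competitor looks in the window from 200 to 600 msec after target-noun onset, can be illustrated with a short sketch. The fixation record format (start time, end time, and object label, with times relative to target-noun onset) and the function name are assumptions made for illustration; the chapter does not describe the analysis code, and the original measure may have been computed over samples or trials rather than fixation durations.

```python
def competitor_proportion(fixations, window=(200, 600)):
    """Share of the analysis window spent fixating the competitor.

    `fixations` is a list of (start_ms, end_ms, label) tuples for one
    trial, with times measured from the onset of the target noun.
    """
    win_start, win_end = window
    on_competitor = 0
    for fix_start, fix_end, label in fixations:
        # Length of the part of this fixation that falls inside the window.
        overlap = max(0, min(fix_end, win_end) - max(fix_start, win_start))
        if label == "competitor":
            on_competitor += overlap
    return on_competitor / (win_end - win_start)


# Hypothetical trial: a look to the competitor from 300 to 500 msec.
trial = [(0, 300, "target"), (300, 500, "competitor"), (500, 900, "target")]
share = competitor_proportion(trial)
```

Averaging such per-trial values within each cell of the design would give condition means of the kind plotted in figure 8.2.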
When a large pitch excursion contributed to the impression that the speaker needed a moment to think, the disfluent stimuli led to an initial bias toward the new cohort. These findings underscore the need to think of disfluency as multifaceted; many features of the signal may reflect a single underlying source of production difficulty. The more the speech signal supports the impression of disfluency, the higher the likelihood that we will see disfluency effects on reference resolution. In particular, when “thinking prosody” occurs early on in an utterance, it may foreshadow later production difficulty. This information may build over time, as it is supported by other speech characteristics, and lead listeners to generate expectations about what the speaker is likely to refer to. We cannot, however, reduce the disfluency effect to pitch. If that were the case, we would expect similar results in both “large excursion” conditions, whether fluent or disfluent. Instead, in both fluent conditions comprehenders fixated on the competitor more when it was given than when it was new. While pitch contributed to the new bias in disfluent conditions, the most reliable
Figure 8.2
Percent fixations on the competitor from 200 to 600 msec after onset of target noun (e.g., candle).
information about disfluency came from the words “thee uh” and longer word durations. In sum, when the speaker was fluent, listeners were biased toward given objects. But when listeners perceived that the speaker was being disfluent, new objects became relatively more accessible. There were many cues that may have signaled disfluency, but pitch did not play as strong a role as the clear cues to disfluency from longer word durations and the words “thee uh.” These results are consistent with the expectancy hypothesis, which suggests that accessibility is influenced by the comprehender’s perception of the likelihood that the speaker will refer to that entity. The results from experiments 1
and 2 show that disfluency affects the accessibility of entities during reference comprehension. The correlation between disfluency and reference to new objects further suggests that disfluency may provide information that the speaker is relatively more likely to be referring to something new than something given. We tested this idea more specifically in experiment 3.
Experiment 3
If disfluency leads comprehenders to perceive a higher likelihood of new entities being mentioned, we should see this reflected in listeners’ guesses about what the speaker is likely to say, at different points in the utterance. We tested this idea in an off-line experiment, where we played instructions from experiment 2 that were truncated either after “Now put,” or after “Now put the/thee uh,” in both “large-pitch-excursion” and “small-pitch-excursion” conditions. This experiment provided an explicit test of listeners’ off-line judgments of expectancy. It also further investigated the role of prosodic information in the new bias, in that the shorter “Now put” stimuli contained only prosodic indicators of disfluency. Twenty-four participants were presented with eight of the experimental items in experiment 2, combined with four fillers. The experimental items occurred in one of two length conditions, where listeners heard either “Now put,” which contained the pitch (and duration) information, but no other cues to disfluency, or “Now put the/thee uh,” which provided more definitive evidence of fluency or disfluency. Their task was to choose which object on the screen they thought the speaker was about to mention. The filler items were truncated at some point in the middle of the target word, rendering the filler target words uniquely identifiable. This was done to encourage participants to pay attention to the stimuli, rather than make guesses before hearing the entire fragment. Results showed that after only “Now put,” participants chose one of the new objects 63 percent of the time (68 percent in the disfluent condition; 58 percent in the fluent condition). But after “Now put the/thee uh,” the proportion of new objects chosen rose to 83 percent for the disfluent stimuli, and dropped to 35 percent for the fluent stimuli. Again, there were no main effects or interactions with the prosody manipulation, i.e., whether the “Now put” had a large pitch excursion or a flatter pitch. These results suggest that listeners’ expectations are not influenced by the thinking prosody on “Now put” alone. Although “thinking prosody” foreshadows other manifestations of disfluency in one condition, it was not in and of itself enough to create a new bias. However, the occurrence of definitive
information about disfluency leads listeners to probabilistically expect upcoming reference to something new. By contrast, definitive information about fluent speech increases the expectancy of the given objects.
Discussion
The results from experiments 1 and 2 provide strong support for the hypothesis that disfluency affects reference comprehension. Listeners showed a preference to fixate the discourse-new cohort when the instruction was disfluent, and to fixate the discourse-given cohort when the instruction was fluent. These biases occurred before the target word could be disambiguated on the basis of speech information alone. These results, along with those from experiment 3, suggest that disfluency influences listeners’ expectations about what the speaker is referring to. In fact, the difference between fluent and disfluent conditions occurred beginning 200 msec after the onset of the target word. Since it takes approximately 200 msec to program and launch an eye movement, this is the earliest we could expect to observe any effects from the speech signal. Thus, fluency information is driving reference resolution as early as the phonetic information from the input. These data also show that, under some circumstances, given (mentioned) entities are no more accessible than ones that have not been mentioned, and may be less accessible. This suggests that accessibility cannot be purely described in terms of how an entity has been treated in the textual history of the discourse. In our stimuli, given entities with an identical discourse history could be either more or less accessible, depending on the fluency of the target instruction.
What Mechanism Underlies the Disfluency Effect?
While the above findings show that disfluency affects language comprehension, they raise the question of exactly how it does so. Of particular interest is the question of the degree to which listeners make attributions about why the speaker is being disfluent, and whether they use this information to guide reference comprehension. On one hand, listeners may use an attributional mechanism. They may use disfluency to infer that the speaker is having trouble with production, and identify possible sources of difficulty. It is more difficult to refer to new objects than to given ones, which makes new referents more plausible sources of difficulty than given ones. At a more intuitive level, it would seem strange for a speaker to be disfluent when referring to something that was just mentioned (“Do you see the candle? Pick up, thee, uh, candle”) unless there was some
additional reason for the speaker to be distracted. Do comprehenders make these inferences quickly enough to drive reference comprehension? If so, it would be dramatic evidence that comprehension processes are influenced by representations of the mental processes of the speaker. Such evidence would be relevant to the current theoretical debate over the degree to which language processing is influenced by representations of the mental processes, goals, and perspectives of one’s interlocutor. On the other hand, the disfluency effects described here are also consistent with a correlational mechanism that involves no direct calculation of the cause of the disfluency. Since disfluent NPs are correlated with new referents, one possibility is that listeners store and use this correlational information automatically. When they hear signs of disfluency, it may increase the activation of discourse representations for referents that are evoked in the discourse situation (e.g., visually), but have not been mentioned yet. At the same time, activation may fall for discourse representations that have been recently mentioned, perhaps proportional to the accessibility of the entity. Critically, this correlational information can be calculated on the basis of information about the referents themselves (e.g. whether they have recently been mentioned), without concern for either why the speaker is being disfluent, or whether new objects are difficult to name. In this way, correlational information might provide a “short cut,” allowing listeners to make calculations about what a speaker is likely to say without taking into account the speaker’s mental state or intentions. Evidence for this sort of mechanism would parallel some recent results from research on language production, which has found that some production decisions are made without consideration of whether they might result in ambiguity for the listener (Arnold, Wasow, Asudeh, and Alrenga 2004; Ferreira and Dell 2000). 
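To make concrete what such correlational information could amount to, the corpus rates reported earlier (21 percent disfluent for new NPs, 16 percent for given NPs) can be turned into a probability that a disfluent NP refers to something new. This is a sketch under the assumption that new and given referents are equally likely beforehand, which the chapter does not state, and it is not a claim about how listeners actually compute expectations; the function and variable names are hypothetical.

```python
def probability_new(p_cue_given_new, p_cue_given_given, prior_new=0.5):
    """Bayes' rule: P(new | cue) from the cue rates for new and given NPs.

    prior_new = 0.5 is an assumption of this sketch, not a corpus figure.
    """
    new_evidence = p_cue_given_new * prior_new
    given_evidence = p_cue_given_given * (1.0 - prior_new)
    return new_evidence / (new_evidence + given_evidence)


# Disfluency rates from the corpus analysis: 21% for new, 16% for given.
after_disfluent = probability_new(0.21, 0.16)
# Fluent NPs are the complement: 79% of new NPs, 84% of given NPs.
after_fluent = probability_new(0.79, 0.84)
```

With equal priors the shift is modest: roughly 0.57 for a new referent after a disfluent NP, against roughly 0.48 after a fluent one. The corpus correlation alone would therefore support only a weak expectation in this simple calculation.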
Evidence from a second line of research sheds light on this question by looking at what other kinds of entities become accessible in the presence of disfluency (Arnold, Hudson Kam, and Tanenhaus 2007). It is possible that disfluency also leads to biases towards other types of entities, for example those that are difficult to name or identify for other reasons. If this is so, then a correlational mechanism would require the processing system to keep track of correlations between disfluency and all such entities. In a series of experiments, we investigated whether disfluency also introduces an expectation that the speaker is referring to a novel, complex object (e.g., a funny squiggly shape) rather than a known object (e.g., an ice cream cone). Participants viewed a scene, like that illustrated in figure 8.3, with two novel objects and two known objects, each in two colors. They heard instructions like “Click on {the/thee uh} red ice cream cone,” or “Click on
Figure 8.3
Sample visual display in the novel/known experiments. The top two items here were in black in the actual experiment, the bottom two in red.
{the/thee uh} red funny squiggly shape that looks kind of like a monkey.” We examined fixations during production of the color word, that is, before the target noun was heard. In the fluent condition, where there was no expectation for either a known or novel target, there were increased fixations on both color-matched objects (e.g., the red ice cream cone and the red squiggly shape). By contrast, disfluency created an expectation for a novel target, resulting in increased fixations on only the novel color-matched object. Disfluency therefore facilitates reference comprehension for at least two kinds of entities: those that are new to the discourse and those that are novel. Although these results are also consistent with both attributional and correlational mechanisms, they complicate the kind of statistics that would be needed for a correlational mechanism. Listeners would have to store information about two correlations (disfluency with new referents, disfluency with novel referents). Alternatively, they might learn that disfluency correlates with “difficult-to-name objects.” This, however, would require making decisions about which objects are difficult to name, thus narrowing the distinction between the attributional and statistical mechanisms. A critical question is whether disfluency leads to biases for new and novel objects in all circumstances, or whether these biases disappear in situations where the cause of disfluency can be attributed to something other than naming difficulty. Arnold et al. (2007) tested this by telling participants that the instructions were produced by a speaker with a cognitive disorder that interfered with ordinary object recognition. This led participants to believe that from the speaker’s perspective, familiar objects were functionally unfamiliar. As predicted, the disfluent novel bias was significantly reduced, compared with a condition where participants were told that the speaker was not disordered. These results suggested that disfluency biases do involve situation-specific inferences. However, the results of an additional experiment showed that such attributions may be limited. When the instructions included evidence that the speaker was distracted (for example by a loud noise), we still observed the bias toward novel objects with disfluent instructions. In sum, the above results clearly establish that disfluency affects on-line reference comprehension, increasing the accessibility of both new (unmentioned) referents and novel objects with no common name. These results support the need for language-comprehension research to consider disfluency and other features of naturally occurring discourse in order to fully understand the processes of language comprehension (Ferreira, Lau, and Bailey 2004). The results also suggest that the impression of production difficulty may stem from a variety of speech characteristics, including prosodic ones. Finally, this line of research offers a vehicle for exploring some fundamental questions about the mechanisms of language comprehension. In particular, disfluency research can shed light on questions about whether listeners make attributions about why speakers say things in a particular way, which is related to questions about whether listeners model the thought processes, intentions, and goals of the speaker.
Acknowledgments
We are grateful to Rebecca Altmann, Maria Fagnano, and Dana Subik for their help in collecting and coding the data, to Katherine Crosswhite for her help with intonational analyses, and to Bob McMurray for his help with the setup and analysis of the novelty experiments. We also thank Sarah Brown-Schmidt, Mikhail Masharov, and Duane Watson for helpful discussion about the experimental design. This research was partially supported by NIH grants HD-41522 to J. Arnold and HD-27206 to M. Tanenhaus.
Notes
1. The items analysis could not be performed, because of missing cells.
2. The items analysis also included item group (i.e., those items that rotated together through the lists) as an independent variable.
3. The analyses of variance with participant as the random factor also included disfluency and referent (given vs. new) as independent variables. The items analyses also included disfluency, referent, and item group.
References
Allopenna, P. D., Magnuson, J. S., and Tanenhaus, M. K. 1998. Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language 38, 419–439.
Almor, A. 1999. Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review 106 (4), 748–765.
Ariel, M. 1990. Accessing Noun-Phrase Antecedents. Routledge.
Arnold, J. 1998. Reference Form and Discourse Patterns. Ph.D. dissertation, Stanford University.
Arnold, J. E. 2001. The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes 31 (2), 137–162.
Arnold, J. E., Eisenband, J. G., Brown-Schmidt, S., and Trueswell, J. C. 2000. The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition 76, B13–B26.
Arnold, J. E., Hudson Kam, C., and Tanenhaus, M. K. 2007. If you say thee uh- you’re describing something hard: The on-line attribution of disfluency during reference comprehension. Journal of Experimental Psychology: Learning, Memory and Cognition 33, 914–930.
Arnold, J. E., Tanenhaus, M. K., Altmann, R. J., and Fagnano, M. 2004. The old and thee, uhh, new: Disfluency and reference resolution. Psychological Science 15 (9), 578–582.
Arnold, J., Wasow, T., Losongco, T., and Ginstrom, R. 2000. Heaviness vs. newness: The effects of structural complexity and discourse status on constituent ordering. Language 76 (1), 28–55.
Arnold, J. E., Wasow, T., Asudeh, A., and Alrenga, P. 2004. Avoiding attachment ambiguities: The role of constituent ordering. Journal of Memory and Language 51 (1), 55–70.
Bailey, K. G. D., and Ferreira, F. 2003. Disfluencies affect the parsing of garden-path sentences. Journal of Memory and Language 49, 183–200.
Barr, D. J. 2001a. Trouble in mind: Paralinguistic indices of effort and uncertainty in communication. In C. Cavé, I. Guaïtella, and S. Santi (eds.), Oralité et gestualité: Interactions et comportements multimodaux dans la communication. L’Harmattan.
Barr, D. J. 2001b. Paralinguistic correlates of discourse structure. Poster presented at the 43rd Annual Meeting of the Psychonomic Society, Orlando.
Barr, D. J., and Seyfeddinipur, M. 2010. The role of fillers in listener attributions for speaker disfluency. Language and Cognitive Processes 25, 441–455.
Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., and Gildea, D. 2003. Effects of disfluencies, predictability, and utterance position on word form variation in English conversation. Journal of the Acoustical Society of America 113, 1001–1024.
Boersma, P., and Weenink, D. 2004. Praat acoustic analysis software. www.praat.org.
Bower, G. H., and Morrow, D. G. 1990. Mental models in narrative comprehension. Science 247 (4938), 44–48.
Brennan, S. E., and Schober, M. E. 2001. How listeners compensate for disfluencies in spontaneous speech. Journal of Memory and Language 44 (2), 274–296.
Brennan, S. E., Friedman, M. W., and Pollard, C. J. 1987. A centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics.
Chafe, W. 1994. Discourse, Consciousness, and Time. University of Chicago Press.
Chafe, W. L. 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In C. N. Li (ed.), Subject and Topic. Academic Press.
Chambers, C. G., and Smyth, R. 1998. Structural parallelism and discourse coherence: A test of centering theory. Journal of Memory and Language 39 (4), 593–608.
Chomsky, N. 1965. Aspects of the Theory of Syntax. MIT Press.
Clark, H. H. 1996. Using Language. Cambridge University Press.
Clark, H. H., and Fox Tree, J. E. 2002. Using uh and um in spontaneous speaking. Cognition 84, 73–111.
Clark, H. H., and Haviland, S. E. 1977. Comprehension and the given-new contract. In R. O. Freedle (ed.), Discourse Production and Comprehension. Ablex.
Clark, H. H., and Wasow, T. 1998. Repeating words in spontaneous speech. Cognitive Psychology 37, 201–242.
Corley, M., MacGregor, L. J., and Donaldson, D. I. 2007. It’s the way that you, er, say it: Hesitations in speech affect language comprehension. Cognition 105, 658–668.
Dahan, D., Magnuson, J. S., and Tanenhaus, M. K. 2001. Time course of frequency effects in spoken word recognition: Evidence from eye movements. Cognitive Psychology 42, 317–367.
Dahan, D., Tanenhaus, M. K., and Chambers, C. G. 2002. Accent and reference resolution in spoken language comprehension. Journal of Memory and Language 47, 292–314.
Ehrlich, K. 1980. Comprehension of pronouns. Journal of Experimental Psychology 32, 247–255.
Ferreira, F. 1991. Effects of length and syntactic complexity on initiation times for prepared utterances. Journal of Memory and Language 30, 210–233.
Ferreira, V., and Dell, G. 2000. Effect of ambiguity and lexical availability on syntactic and lexical production. Cognitive Psychology 40, 296–340.
Ferreira, F., Lau, E. F., and Bailey, K. G. D. 2004. Disfluencies, parsing, and tree-adjoining grammars. Cognitive Science 28, 721–749.
Fox Tree, J. E. 1995. The effects of false starts and repetitions on the processing of subsequent words in spontaneous speech. Journal of Memory and Language 34, 709–738.
Fox Tree, J. E. 2001. Listeners’ uses of um and uh in speech comprehension. Memory and Cognition 29 (2), 320–326.
Fox Tree, J. E. 2001. Pronouncing “the” as “thee” to signal problems in speaking. Cognition 62, 151–167.
Garnham, A., Traxler, M., Oakhill, J., and Gernsbacher, M. A. 1996. The locus of implicit causality effects in comprehension. Journal of Memory and Language 35, 517–543.
Garvey, C., and Caramazza, A. 1974. Implicit causality in verbs. Linguistic Inquiry 5, 459–464.
Gernsbacher, M. A. 1990. Language Comprehension as Structure Building. Erlbaum.
Gernsbacher, M. A., and Jescheniak, J. D. 1995. Cataphoric devices in spoken discourse. Cognitive Psychology 29, 24–58.
Givón, T. 1983. Topic continuity in discourse: An introduction. In T. Givón (ed.), Topic Continuity in Discourse: A Quantitative Cross-Language Study. John Benjamins.
Gordon, P. C., Grosz, B. J., and Gilliom, L. A. 1993. Pronouns, names, and the centering of attention in discourse. Cognitive Science 17, 311–347.
Gregory, M., Joshi, A., and Sedivy, J. 2003. Adjectives and processing effort in production: So, uh, what are we doing during disfluencies? Paper presented at CUNY Conference on Human Sentence Processing, Boston.
Griffin, Z. M., and Garton, K. L. 2003. Procrastination in speaking: Ordering arguments during speech. Paper presented at CUNY Conference on Human Sentence Processing, Boston.
Grosz, B., Joshi, A., and Weinstein, S. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 2 (21), 203–225.
Grosz, B., and Sidner, C. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12, 175–204.
Gundel, J. K., Hedberg, N., and Zacharaski, R. 1993. Cognitive status and the form of referring expressions. Language 69 (2), 274–307.
Hankamer, J., and Sag, I. 1976. Deep and surface anaphora. Linguistic Inquiry 7, 391–426.
Haviland, S. E., and Clark, H. H. 1974. What’s new? Acquiring new information as a process in comprehension. Journal of Verbal Learning and Verbal Behavior 13, 512–521.
Hudson-D’Zmura, S., and Tanenhaus, M. K. 1998. Assigning antecedents to ambiguous pronouns: The role of the center of attention as the default assignment. In M. Walker, A. Joshi, and E. Prince (eds.), Centering Theory in Discourse. Oxford University Press.
Levelt, W. J. M. 1989. Speaking. MIT Press.
McDonald, J., and MacWhinney, B. 1995. The time course of anaphor resolution: Effects of implicit verb causality and gender. Journal of Memory and Language 34, 543–566.
Prince, E. F. 1992. The ZPG letter: Subjects, definiteness, and information-status. In W. C. Mann and S. A. Thompson (eds.), Discourse Description: Diverse Linguistic Analyses of a Fund-Raising Text. John Benjamins.
Rossion, B., and Pourtois, G. 2001. Revisiting Snodgrass and Vanderwart’s object database: Color and texture improve object recognition [abstract]. Journal of Vision 1 (3), 413a.
Disfluency Effects in Comprehension
217
Sanford, A. J., and Garrod, S. C. 1981. Understanding Written Language. Wiley. Shriberg, E. 2001. To ‘errr’ is human: Ecology and acoustics of speech disfluencies. Journal of the International Phonetic Association 31 (1), 153Â�–169. Snodgrass, J. G., and Vanderwart, M. 1980. A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory 6 (2), 174–215. Stevenson, R. J., Crawley, R. A., and Kleinman, D. 1994. Thematic roles, focus and the representation of events. Language and Cognitive Processes 9 (4), 473–592. Terken, J., and Hirschberg, J. 1994. Deaccentuation of words representing ‘given’ information: Effects of persistence of grammatical function and surface position. Language and Speech 37, 125–145. Watson, D., and Gibson, E. 2004. The relationship between intonational phrasing and syntactic structure in language production. Language and Cognitive Processes 19, 713– 755. Watson, D. 2002. Intonational Phrasing in Language Production and Comprehension. Dissertation, Massachusetts Institute of Technology. Zwaan, R. A., Magliano, J. P., and Graesser, A. C. 1995. Journal of Experimental Psychology: Learning, Memory, and Cognition 21 (2), 386 –397.
9
It’s Not What You Said, It’s How You Said It: How Modification Conventions Influence On-Line Referential Processing
Jodi D. Edwards and Craig G. Chambers
To interpret a definite noun phrase, a listener must map the components of the phrase to a uniquely identifiable referent in the relevant referential domain. Evidence from a range of studies has shown that this mapping process occurs continuously as the phrase unfolds in time. One example comes from a visual-world study by Eberhard, Spivey-Knowlton, Sedivy, and Tanenhaus (1995) in which listeners followed instructions containing modified definite noun phrases (e.g., Touch the starred yellow square). The results revealed that eye movements to the intended referent in a visual display were closely time-locked to the point in the speech stream where there was sufficient information to distinguish it from the other possible candidates. For example, when the target item was the only starred item in the display, fixations to this object were initiated upon hearing starred. When the display contained multiple objects that were both starred and yellow, identification of the target was delayed until the final noun. This outcome suggests a process whereby the semantics of each successive term are used to continuously narrow the set of referential candidates to a unique entity.
In a subsequent study, Sedivy, Tanenhaus, Chambers, and Carlson (1999) investigated whether this continuous evaluation process is also observed for definite noun phrases containing scalar adjectives, which do not denote a stable property independently of the noun they modify. For example, the degree of width conveyed by wide is greater in wide river than in wide belt. In light of this apparent dependency, it is plausible that incremental mapping processes might be delayed until the head noun is encountered. However, the results showed that incrementality is also evident in the interpretation of scalar adjectives. This is possible because listeners recognize that the typical function of a modifier is to distinguish the intended referent from other candidates of the same class.
Thus the word tall in the unfolding sentence Pick up the tall . . . is understood to signal a contrast between objects that would otherwise be denoted by the same term. As a result, the adjective directs attention to the tallest
member of this contrast set (e.g., the taller of two drinking glasses in a visual scene). Of particular interest is that this effect occurs even in the presence of a visibly taller candidate that is not a member of the contrast set (e.g., a single water pitcher). This outcome demonstrates an important point, namely that the semantics of a modifier are not “neutrally” applied in the course of evaluating referents in real time. That is, the candidates judged compatible with a modifier are not simply those possessing the properties that are in some way compatible with the modifier’s semantics. Rather, the evaluation is further contoured by the listener’s understanding of why the modifying information is being provided (see also Sedivy 2003).
In the studies reported here, our goal is to further clarify how the incremental interpretation of complex noun phrases in spoken language is shaped by factors beyond the basic semantics of their constituents. Our point of departure, however, is not a consideration of why certain information is provided, but rather how this information is expressed.
The Linguistic Realization of Conceptual Properties
As was mentioned above, the interpretation of definite noun phrases involves the rapid and ongoing evaluation of each successive term against the properties of candidate referents, with the goal of narrowing the candidate set to the intended unique referent. However, in the studies conducted to date there has been little consideration of the fact that a speaker can use a range of lexical and/or syntactic forms to express information in a complex noun phrase. As a starting point, consider that in order to refer to a particular entity a speaker must decide among a range of alternative forms that encode similar or related meanings (e.g., dog, canine, collie, etc.). A similar decision is required for the expression of properties by means of modifiers. In this case, however, the decision involves not only choosing a term with the appropriate semantics but also choosing an appropriate syntactic construction. For example, the property of “furriness” can be realized by means of a prenominal adjective (e.g., the furry dog), or a postnominal construction such as a relative clause (e.g., the dog that is furry). As is the case for selecting a noun to denote a particular object, certain modifier constructions appear to be more typical or natural for realizing particular kinds of properties. For instance, while some visual characteristics such as color and size tend to be expressed via prenominal adjectives (e.g., the green box / the tall glass), others are more naturally realized postnominally, using constructions such as prepositional phrases (e.g., the sign with the arrows). These conventions can be understood as part of the specific linguistic system that language users acquire, a point supported by the existence of cross-linguistic differences. For example, in French, while relative size is expressed prenominally, color tends to be expressed by means of postnominal adjectives, e.g., la grosse pomme / la pomme rouge (“the big apple” / “the red apple”). It is important to stress that these regularities by no means constitute inviolable rules. In fact, it is not even the case that all properties are typically expressed in one position or the other. Rather, the preference for a particular position is a matter of degree, with some properties occupying a central area of a continuum and readily allowing the use of either a prenominal or postnominal construction (e.g., the striped bag / the bag with the stripes).
What relevance might these conventions have for models of real-time processing? It is obvious that this knowledge must be taken into account during the production of referring expressions in order to avoid circumlocutions or otherwise awkward-sounding expressions (e.g., #the box that is green, #the arrowed sign). However, there is no obvious reason why this knowledge should be explicitly considered in the course of comprehension. The available evidence suggests that interpretive systems opportunistically draw information from lexical terms regardless of where they occur in an utterance. This is evident from the previously discussed case of head-dependent scalar adjectives, as well as from studies showing that referential candidates are evaluated against verb or preposition information encountered before the noun phrase (Altmann and Kamide 1999; Chambers, Tanenhaus, Eberhard, Filip, and Carlson 2002). Nevertheless, research in other areas such as syntactic processing has shown that listeners can anticipate forms that are likely to occur at particular points in a sentence. For example, although the verb say can occur with either a noun phrase or a sentential complement as its grammatical object (e.g., The condemned prisoner said . . .
[his last words] / [justice had failed him]), there is a bias to interpret unfolding postverbal material as a sentential complement owing to the high probability of co-occurrence of sentence complements with this specific verb. These co-occurrence probabilities (drawn from the comprehender’s language experience) are graded in character and have been observed to produce processing biases that vary in degree depending on the specific verb encountered (Garnsey, Pearlmutter, Myers, and Lotocky 1997; Jurafsky 1996; MacDonald 1993; Trueswell, Tanenhaus, and Kello 1993). Although the disambiguation of syntactic complements does not bear directly on the position of modifiers or general issues of reference, it does reveal comprehenders’ capacity to rapidly generate expectations for a particular form (drawn from a set of alternatives) based on utterance-specific information. It is at least plausible, then, that conventions governing the placement of modifiers may in some way guide on-line referential interpretation. If so, how would this knowledge be manifested? The possibility we pursue here is that the candidacy
of referential alternatives may depend in part on whether a compatible modifier occurs in its expected position. Consider, for example, the partial utterance Fred is going to buy the car . . . , spoken in a dealer’s lot. As the head noun car is heard, the set of compatible candidates will presumably be narrowed to include cars and to exclude trucks, motorcycles, and other vehicles. However, it may also be possible to exclude, to some degree, any cars that would naturally be differentiated by means of color or relative size. This is because a color or size adjective would normally have been encountered before the head noun. Thus, even though the available information is still semantically compatible with all cars in the context, consideration may be limited to the subset of cars whose properties would most naturally be encoded using a postnominal construction. It is not likely, however, that this process would result in absolute inclusion in or exclusion from the candidate set. Rather, the graded nature of the correspondences between property types and modifier construction would be expected to produce continuous distinctions in the degree to which a candidate is or is not considered.
Experiment 1
The goal of experiment 1 was to conduct an initial test of whether knowledge of conventions governing the form and position of modifiers is used on-line to evaluate candidate referents belonging to the same conceptual class. The study adopts the basic design of the visual-world study by Eberhard et al. (1995), described earlier, in which characteristics of the alternative candidates in the display were varied while holding constant the properties and description of the target referent. On critical trials, the visual displays in the current study contained a target object, a “competitor” object of the same category as the target, and two unrelated objects (see figure 9.1). The target object was always distinguished from the competitor by means of information expressed in a postnominal phrase (e.g., Click on the square with the diamonds). The properties of the competitor (which was never referred to) were varied along a four-step continuum. In the postnominal condition, the competitor would be most naturally distinguished using a postnominal prepositional phrase (e.g., the square with the happy face). In the “either” condition, the competitor’s distinguishing property could be realized either prenominally or postnominally (e.g., the starred square / the square with the star). In the prenominal condition, the competitor would normally be distinguished using a prenominal color adjective (e.g., the green square), rather than a postnominal phrase. In the “different shape” condition, the competitor was an object from another shape category (e.g., a circle with a happy face), which allowed it to be identified on the basis of noun information. A separate production task confirmed that the conditions reflected the decreasing probability that the competitor object would be distinguished with postnominal modification.
Figure 9.1
Example display for experiment 1. Corresponding instruction: “Click on the square with the diamonds.”
If expectations about the position of a modifier modulate referential hypotheses, the likelihood of describing the competitor with a postnominal construction should be reflected in the degree to which the competitor is considered (i.e., attracts eye movements) upon hearing the head noun (e.g., Click on the square . . .). For example, in the postnominal condition, the competitor continues to be a viable candidate for the unfolding noun phrase at this point because its distinguishing property is likely to be mentioned in a postnominal modifier. In the “either” condition, the expectation that the competitor will be distinguished via a postnominal modifier is weaker, and as a result the listener may be less apt to consider the competitor object after hearing the head noun. In the prenominal condition, there is even less expectation for postnominal modification, and so consideration of the competitor should be further reduced. Finally, in the “different shape” condition, there should be only minimal consideration of the competitor object, since the head-noun information should differentiate the target and competitor objects.
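The ordering predicted above can be written down as a small sketch. The probabilities are hypothetical placeholders chosen only to respect the four-step continuum; they are not values from the production task, and the function name is an illustrative assumption. The sketch shows only that predicted consideration of the competitor after the head noun follows the likelihood that its distinguishing property would be expressed postnominally.

```python
# Hypothetical probabilities that the competitor's distinguishing property
# would be expressed AFTER the head noun. These numbers are illustrative
# assumptions; the chapter reports the ordering of conditions, not values.
P_POSTNOMINAL = {
    "postnominal": 0.9,      # e.g., the square with the happy face
    "either": 0.5,           # e.g., the starred square / the square with the star
    "prenominal": 0.1,       # e.g., the green square
    "different shape": 0.0,  # competitor is already ruled out by the noun
}


def predicted_ranking(p_postnominal):
    """Order conditions from most to least predicted competitor consideration."""
    return sorted(p_postnominal, key=p_postnominal.get, reverse=True)


ranking = predicted_ranking(P_POSTNOMINAL)
print(ranking)
```

Any set of values that decreases across the four conditions yields the same ranking, which is all the graded hypothesis commits to at this stage.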
Method
Participants
Twelve undergraduate students at the University of Calgary participated in exchange for bonus credit in a psychology course. All had normal or corrected-to-normal vision and reported that English was their native language.
Materials
The visual materials for this experiment consisted of geometric objects occupying the inner cells of a grid pictured on a computer screen. For each display, participants heard two pairs of prerecorded instructions that were presented via desktop speakers. Each instruction pair was of the form Click on the (object). Now place it (at location). The visual display also contained four fixed objects in the outer grid cells that served as reference points for the location specified in the instruction. These objects were consistent across trials and could not be clicked and dragged to different locations.
On critical trials, the display contained a target object, a competitor, and two unrelated objects. Six critical trials were assigned to each of the four experimental conditions described above. Each array of objects and its accompanying instruction were cycled through each of the four conditions to counterbalance any effects stemming from differences in the array-instruction pairings. In addition to the critical trials, 36 filler trials were included to prevent participants from recognizing linguistic or visual contingencies on experimental trials. A number of filler instructions contained unmodified noun phrases or noun phrases with various kinds of prenominal and postnominal modifiers. The structure of the arrays of objects was also varied such that some displays did not contain multiple exemplars from the same category while others contained two exemplars that were not referred to over the course of the trial.
Procedure
Participants were seated in front of a computer screen and were told that they would be required to follow instructions to click on an object in the display and place it in a different area of the grid. They were then fitted with an ASL Model 501 head-mounted eye-tracking device that consisted of a monocle, an eye camera, and a scene camera mounted to a headband. After a brief calibration procedure, participants first completed four practice trials to ensure that they understood the experimental procedure. Each trial was recorded on a Sony DSR-30 digital video cassette recorder. The experimental session lasted approximately 30 minutes.
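The counterbalancing described under Materials, in which each array and its instruction are cycled through the four conditions, can be sketched as a Latin-square rotation. This is a minimal illustration. It assumes four presentation lists (the chapter mentions a list factor but does not state the number of lists), and the function and variable names are hypothetical.

```python
CONDITIONS = ["postnominal", "either", "prenominal", "different shape"]
N_ITEMS = 24  # 6 critical trials per condition x 4 conditions (from the text)


def build_lists(n_items, conditions):
    """Rotate conditions over items so that each list contains every item once.

    Assumption: one list per condition (a standard Latin-square rotation).
    """
    k = len(conditions)
    return [
        {item: conditions[(item + shift) % k] for item in range(n_items)}
        for shift in range(k)
    ]


lists = build_lists(N_ITEMS, CONDITIONS)

# Each list pairs every item with exactly one condition, 6 items per condition.
for number, assignment in enumerate(lists, start=1):
    per_condition = {c: list(assignment.values()).count(c) for c in CONDITIONS}
    print(number, per_condition)
```

Under this rotation every array-instruction pairing appears in all four conditions across lists, so item-specific differences cannot be confounded with condition.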
Results and Discussion
Data were analyzed using frame-by-frame playback of the video recording. The onsets and offsets of critical words were recorded along with the sequence of eye movements, beginning at the onset of the instruction and ending when participants began moving the screen cursor toward the target object. As described above, if expectations about the linguistic realization of modifier information are used to assess referential candidacy, we should observe a decreasing trend to fixate the competitor object across the conditions at the point after the head noun (e.g., square) has been heard. To assess this possibility, we calculated the proportion of trials containing a saccade to the competitor object within the first 500 msec after the midpoint of the head noun (see figure 9.2). The data show that the competitor object was considered most often in the postnominal condition, with fewer saccades to the competitor in the “either” condition, still fewer in the prenominal condition, and only minimal saccades to the competitor in the “different shape” condition. This outcome indicates that listeners can anticipate the modifier most likely to distinguish a candidate referent, and that this evaluation in turn modulates referential candidacy. A repeated-measures analysis of variance testing for linear trend (including a factor for list) indicated that the predicted pattern of differences was significant: F(1, 8) = 21.31, p < .001.
Figure 9.2
Proportion of trials containing a saccade to the competitor object in 500-msec window after midpoint of head noun in experiment 1.
It may be relevant to consider whether differences in fixations to the competitor could be driven in part by purely visual characteristics, rather than by linguistic factors such as correspondences between modifier position and property type. For example, the relatively bland visual characteristics of the solid-colored competitor in the prenominal condition may be less likely to capture visual attention, thereby reducing the likelihood of generating a saccade to this object. However, we suspect that this is unlikely. First, recall that our measurement window begins at the critical noun. The visual display had been present for at least 5 seconds at this point, and fixations to most display objects had typically occurred by this time. It is therefore doubtful that attentional capture plays a significant role in the effect. Second, this explanation cannot account for the difference between the “either” and postnominal conditions, where the visual complexity of the competitor objects is quite comparable.1
In sum, the results of this experiment indicate that the candidacy of a given referent is influenced by the expectation that its distinguishing property will be realized by means of a prenominal versus a postnominal modifier. For example, when a head noun is encountered with no preceding modifier, listeners decrease their consideration of candidates that would normally be distinguished using a prenominal form.
The degree to which consideration is decreased is in proportion to the likelihood with which prenominal modification is expected for the property in question. At this point, our findings do not reveal the precise mechanism linking a listener’s knowledge about the realization of modifiers to the eventual referential consequences. One possibility is that the effect described above reflects a process of optimizing “referential success” (or alternatively, minimizing “referential failure”) during on-line interpretation. Listeners may draw on correspondences between property types and modifier position to evaluate the potential for an unfolding noun phrase to succeed or fail at individuating specific referents. If this is the case, then expectations about modifier position may inform other linguistic processes such as syntactic ambiguity resolution, which has previously been shown to reflect considerations of referential success or failure. This possibility was addressed in experiment 2.
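The dependent measure used in experiment 1, the proportion of trials with a saccade to the competitor within 500 msec of the head-noun midpoint, can be sketched as follows. The trial records are invented toy data, the field names are hypothetical, and the treatment of the window boundaries (start inclusive, end exclusive) is an assumption that the chapter does not specify.

```python
WINDOW_MS = 500  # window length stated in the text


def saccade_in_window(saccade_onsets_ms, noun_midpoint_ms, window_ms=WINDOW_MS):
    """True if any competitor saccade starts within the window after the noun midpoint."""
    start, end = noun_midpoint_ms, noun_midpoint_ms + window_ms
    return any(start <= onset < end for onset in saccade_onsets_ms)


def proportion_by_condition(trials):
    """Proportion of trials per condition with at least one in-window saccade."""
    totals, hits = {}, {}
    for trial in trials:
        cond = trial["condition"]
        totals[cond] = totals.get(cond, 0) + 1
        if saccade_in_window(trial["competitor_saccades_ms"], trial["noun_midpoint_ms"]):
            hits[cond] = hits.get(cond, 0) + 1
    return {cond: hits.get(cond, 0) / n for cond, n in totals.items()}


# Toy data (made up): times are msec from instruction onset.
toy_trials = [
    {"condition": "postnominal", "noun_midpoint_ms": 1000, "competitor_saccades_ms": [1200]},
    {"condition": "postnominal", "noun_midpoint_ms": 1000, "competitor_saccades_ms": [1499, 1700]},
    {"condition": "prenominal", "noun_midpoint_ms": 900, "competitor_saccades_ms": []},
    {"condition": "prenominal", "noun_midpoint_ms": 900, "competitor_saccades_ms": [400, 950]},
]

proportions = proportion_by_condition(toy_trials)
print(proportions)
```

Scoring each trial as a binary event, rather than counting saccades, matches the "proportion of trials containing a saccade" wording; saccades launched before the noun midpoint do not count toward the measure.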
Experiment 2
A substantial body of evidence suggests that the number of candidates available for reference influences expectations about the syntactic structure of an unfolding utterance (Altmann and Steedman 1988; Britt 1994; Chambers, Tanenhaus, and Magnuson 2004; Crain and Steedman 1985; Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy 1995; van Berkum, Brown, and Hagoort 1999). Here we illustrate this effect with reference to a visual-world study by Tanenhaus et al. (1995) in which listeners followed instructions containing definite noun phrases with postnominal modification, such as Put the apple on the towel in the box. In this sentence, the phrase on the towel is temporarily ambiguous between a “goal” analysis, in which this phrase specifies the intended location for the apple, and a “modification” analysis, in which the phrase denotes a property of the apple. When the instruction accompanied a visual display containing a single apple (located on a towel), the results showed that the ambiguous phrase was initially misinterpreted as specifying the goal location for the apple, as reflected by the high proportion of eye movements to a second “empty” towel in the display. The goal analysis was then abandoned in favor of a modification analysis when the rest of the sentence was encountered. In contrast, when there were two apples present in the visual array, the ambiguous phrase was initially interpreted as a modifier, as shown by the finding that the empty towel was fixated no more often than when the instruction was linguistically unambiguous (e.g., Put the apple that’s on the towel in the box). In the presence of multiple “apple” candidates, the listener expected additional information about the intended apple, recognizing that the expression would otherwise result in “referential failure” (Altmann and Steedman 1988).
The goal of experiment 2 was to conduct a preliminary test of whether conventions governing noun-phrase modifiers are used to gauge the potential for referential failure to occur. If so, this information would presumably modulate referential effects in syntactic ambiguity resolution. To evaluate this possibility, we asked whether the goal bias typically observed in one-referent contexts is reduced when the listener receives an early cue that the syntactically ambiguous phrase may not adequately describe the intended goal location. The experiment used linguistic stimuli that were similar to those used by Tanenhaus et al., such as Put the apple on the plate in the sand bucket. The corresponding display (see figure 9.3) consisted of a target object (an apple on a plate), a “true goal” (a sand bucket), an unrelated item (e.g., a toy train), and a “false-goal” region (consisting of two empty plates). Because the display contained only a single apple, listeners should generate an initial goal analysis for the phrase on the plate. This analysis will be rejected in favor of a modification analysis when the rest of the sentence (in the sand bucket) is encountered.
Figure 9.3
Example display for experiment 2. Corresponding instruction: “Put the apple (that’s) on the plate in the sand bucket.”
An important feature of the display is that there are two potential plate candidates that could constitute the goal referent for the ambiguous phrase on the plate. As a result, modifying information would be necessary to distinguish which object is intended. Of interest is whether the initial goal (mis)analysis of the phrase is rejected more quickly when it becomes clear that the head noun within this phrase is not preceded by a modifier that would typically be used to distinguish the intended goal. This is illustrated in the prenominal condition in the example display. In this condition, the two objects in the false-goal region would be most naturally distinguished in terms of relative size, which is routinely expressed using prenominal adjectives (Quirk, Greenbaum, Leech, and Svartvik 1985). Thus, if listeners anticipate the position of the likely modifier as the sentence unfolds, the word plate, occurring without an adjective (e.g., small or large), may provide an early signal that the current goal analysis of the prepositional phrase will entail referential failure for the noun phrase the plate. In turn, listeners may reduce their commitment to a goal analysis. The panel on the right shows the corresponding postnominal condition. In this case the distinguishing properties of the plates are their inherent visual features, such as a flower pattern or a heart pattern. These represent properties at the other end of the modification continuum, namely properties that are most naturally described using modifiers that follow the head noun. As a result, encountering the head noun provides no early cue that might reduce the commitment to a goal analysis for the ambiguous phrase. As in the analyses of Tanenhaus et al. (1995), differences in the degree to which a goal (mis)analysis is pursued should be reflected in the extent to which objects in the false-goal region attract fixations.
Method
Participants
Twenty-eight new participants were recruited from the same population used in experiment 1.
Materials
The visual materials for this experiment consisted of a variety of real objects placed on a tabletop display. For each array of display objects, participants heard two pairs of auditory instructions to move or manipulate objects. Each critical trial contained the four object types described above. The properties of the false-goal objects were varied such that their contrasting properties would be described with either a prenominal size adjective or a postnominal prepositional phrase. This manipulation was crossed with a second manipulation varying whether the critical phrase in the instruction was linguistically disambiguated by the complementizer that’s, as in Put the apple that’s on the plate in the sand bucket. The unambiguous conditions provide a baseline for evaluating the degree to which the goal misanalysis was or was not initially adopted in the conditions with ambiguous prepositional phrases. Each of the four conditions was paired with four critical trials, yielding 16 critical trials in total. The various object arrays were cycled through each of the four conditions.
In addition to the critical instructions, 48 filler instructions were constructed. Sixteen of these followed the critical instructions on trials with critical displays. The remaining 32 filler instructions were assigned in pairs to 16 filler displays that were randomly interspersed with the critical displays. As in experiment 1, the filler instructions and displays were included to ensure that participants did not recognize recurring characteristics of the critical displays and instructions. For example, because critical displays were always “one-referent” contexts, some of the filler trials were designed to contain two potential target referents, where modification is genuinely necessary to distinguish which is intended.
A variety of linguistic constructions were used on filler trials, including cases where put the X was followed by a single prepositional phrase, or two prepositional phrases where the second one was a modifier (e.g., put the duck beside the [cup [with the spoon]]).
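The design arithmetic in this section can be checked with a short sketch. It enumerates the 2 × 2 crossing and tallies critical and filler instructions from the counts given in the text. The variable names are hypothetical, and the count of one filler instruction per critical display is inferred from the statement that sixteen fillers followed the critical instructions.

```python
from itertools import product

# Factor labels are paraphrases of the manipulations described in the text.
AMBIGUITY = ["ambiguous", "unambiguous"]    # without / with "that's"
FALSE_GOAL = ["prenominal", "postnominal"]  # how the two plates would be distinguished
TRIALS_PER_CONDITION = 4

conditions = list(product(AMBIGUITY, FALSE_GOAL))
critical_trials = len(conditions) * TRIALS_PER_CONDITION

# Fillers: one follows each critical instruction (inferred), and the rest
# are assigned in pairs to 16 filler displays.
fillers_after_critical = critical_trials
filler_displays = 16
fillers_on_filler_displays = filler_displays * 2
total_fillers = fillers_after_critical + fillers_on_filler_displays

total_displays = critical_trials + filler_displays
total_instructions = critical_trials + total_fillers

print(conditions)
print(critical_trials, total_fillers, total_displays, total_instructions)
```

The totals agree with the 16 critical trials and 48 filler instructions reported above, and they imply two instructions for each of 32 displays.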
Procedure
The procedure for this experiment was similar to the procedure in experiment 1 with the exception that participants manipulated real objects rather than objects pictured on a video display. The entire experiment lasted approximately 45 minutes.
Results and Discussion
The measure of interest for this study is the degree to which objects in the false-goal region were considered when the first prepositional phrase was encountered, which indicated whether the phrase was initially understood as specifying the goal argument for the verb put. Figure 9.4 shows the proportion of trials containing a saccade to the false-goal region after onset of the ambiguous phrase. Across conditions, the proportions ranged from 0.35 to 0.55. (Note that some fixations to the false-goal region are expected even with unambiguous sentences, because listeners may anticipate that a false-goal object will be referred to in the eventual goal phrase, e.g., Put the apple that’s on the plate on one of the other plates.) The data pattern shows that the false-goal objects were fixated more often when the phrase was ambiguous than when it was preceded by the copular complementizer that’s, which explicitly disambiguates the phrase’s syntactic role. However, within the ambiguous phrase conditions, there is no apparent evidence that the initial commitment to a goal analysis was reduced when the critical head noun was encountered without the expected preceding adjective. This would have been reflected in a lower probability of fixating the false-goal region following the ambiguous phrase in the prenominal condition than in the postnominal condition. The data were submitted to a 2 (ambiguous / unambiguous instruction) × 2 (prenominal / postnominal competitor property) repeated-measures analysis of variance (including a between-participants factor for list). The analysis revealed only a main effect of ambiguity (F(1, 24) = 5.42, p < .05). Neither the main effect of false-goal property nor the interaction was significant (F < 1).
Figure 9.4
Proportion of trials containing a fixation to the false-goal region after onset of ambiguous phrase in experiment 2.
In summary, then, it does not appear that expectations about modifier position provide a sufficiently strong signal to influence on-line syntactic hypotheses. We reasoned that the non-occurrence of an expected prenominal modifier in the goal phrase might lead the listener to infer that this phrase might result in referential failure, leaving a unique referent unspecified. This could, in turn, prompt a reconsideration of the goal analysis assigned to the phrase, producing a modest yet detectable modulation in the degree to which eye movements were made to objects in the false-goal region. In this experiment, our interest was clearly in a “strong” version of this hypothesis, where the effects of modifier-property correspondences would be detected even when pitted against informational constraints that biased the ambiguous phrase towards a goal analysis.
In the current study, these constraints included the argument structure requirements of the verb put, the frequency of a goal phrase occurring after a put + noun-phrase construction, memory resource constraints, and the minimal referential motivation for modification of the theme noun phrase. Although effects of modifier expectations were not detected in the current study, we do not know whether they could modulate syntactic processing in situations where there is less consistent evidence supporting a particular structural analysis. However, because referential effects are often detected in studies using sentence materials resembling those in the current study, we believe our results are informative for understanding the circumstances in which referential factors are most likely to influence other processing tasks, such as resolving syntactic ambiguity. In particular, we suspect that the differences between our study and earlier ones are directly related to the “non-absolute” nature of conventions for modifier placement. As was stated earlier, even with prenominally disposed property types such as size or color, the expectation for the property to be linguistically realized in a particular position is far from absolute. As a result, the failure to provide distinguishing information in a premodifier does not entail that the speaker has no options for realizing this information after the head
noun is reached. A postnominal construction, though somewhat awkward and unexpected, can still be used. This contrasts with the scenario in other studies of referential effects on syntactic processing where the failure to encounter postnominal information will clearly lead to indeterminate reference for the critical noun phrase. The potential for referential factors to influence syntactic processes might therefore be limited to cases where a particular syntactic analysis would lead to inevitable referential failure.
General Discussion
Previous studies of noun-phrase interpretation have shown that the semantic constraints of noun modifiers are used to rapidly narrow the set of candidate referents as each modifier is encountered. The current study shows that this incremental process is further contoured by an evaluation of the structural alternatives available for describing the properties of referential candidates. When the syntax of an unfolding phrase reaches a point where particular modifier forms can no longer occur, candidates whose properties would typically be realized by such forms receive less consideration in referential hypotheses (experiment 1). This is the case even though the candidates are compatible with the available information on semantic grounds. For example, given a visual context containing two squares and the instruction Click on the square with the lightbulb, listeners reduced their consideration of a solid green “competitor” square once the head noun square was heard. This is because its distinguishing property would typically have been mentioned using a prenominal adjective by this point (e.g., the green square). Importantly, the degree to which candidates are considered is graded in nature and reflects the likelihood that their distinguishing properties could still be realized in the remaining portion of the noun phrase. For example, when the competitor’s distinguishing feature is a stripe pattern, the competitor continues to receive some consideration after the head noun is encountered because stripedness is readily described using prenominal or postnominal forms (striped square / square with the stripes). In our second experiment, we found that the incremental effects observed in experiment 1 could not be readily understood as a process of optimizing “referential success” given the available noun-phrase information, at least on a strong version of this claim where the potential for success is observed to influence other processing tasks. 
In particular, we found that the failure to encounter an expected prenominal modifier does not prompt reconsideration of the syntactic role assigned to an ambiguous constituent, even when the eventual form of the constituent would not conform to conventions for noun modification. This outcome may reflect the circumstances under which referential factors are able to influence other linguistic processes. Specifically, effects such as syntactic reanalysis may be triggered only when referential failure appears to be inevitable. If the necessary referential information could still in principle be encountered postnominally, there may not be sufficient reason to provoke revision of the current structural analysis.
Our experiments represent only a preliminary foray into the question of how expectations about the structural realization of object properties influence real-time processing. However, we believe the results raise a number of issues pertinent to the interpretation of noun phrases and semantic interpretation more generally. To begin with, although we have been treating correspondences between modifier position and property types as rather arbitrary conventions, this is an oversimplification. For example, there are well-known and fairly consistent correlations between modifier position and the degree to which the modifier describes a stable versus temporary property of an entity (Bolinger 1967). Though the statement I’d like to have the corner table is appropriate in a restaurant (where a table’s location is relatively fixed), the same statement sounds odd when uttered in a furniture store (where items are routinely moved as they are stocked and sold). In fact, the use of a prenominal construction to realize a property that is intrinsically transient in nature often results in ungrammaticality (e.g., ?*the afraid passenger / the passenger who is afraid). Again, though it is clear that such constraints must be taken into account during the planning and production of noun phrases, their role in comprehension is unclear. It is possible, for example, that the failure to hear a prenominal adjective in an unfolding noun phrase may bias a listener to consider objects that would normally be distinguished in terms of transient properties. 
If so, expectations about modifier position may be understood to reflect semantic distinctions that are encoded as part of the “meaning” of a syntactic construction (Goldberg 1997). The remaining question would be whether this proposal could subsume the effects we have described in terms of more coarse-grained and arbitrary correspondences between modifier position and various “types” of properties (color, size, etc.).
A second issue, which is relevant for most experimental studies using restrictive modifiers, concerns the degree to which listeners can anticipate the properties that speakers will use to differentiate similar entities. For referents that have already been linguistically evoked, such predictions are simplified by the “conceptual pacts” established between speakers and listeners in the course of a conversation (Brennan and Clark 1996). Once a description is established for a given referent, listeners assume that the speaker will use this description for subsequent reference rather than a description that mentions other properties
of the referent (Metzing and Brennan 2003). For entities that have yet to be referred to, listeners might be expected to be neutral on the issue of which properties will be encoded. However, the results of our first experiment suggest otherwise. In order to efficiently anticipate where a distinguishing modifier will occur, it would seem that listeners must already have some idea of which property will be encoded. To some extent, this ability may reflect the limited ways in which the objects on critical trials differed from one another. Indeed, displays of this sort appear to be characteristic of the majority of visual-world studies conducted to date. One question for future study is how incremental interpretation is affected in scenarios where scene objects contrast on more than one dimension and where their distinguishing properties are not necessarily grounded in perception (e.g., the shirt from your aunt or the new CD). Although the resulting decrease in predictability could in principle moderate rapid incremental and anticipatory effects, we suspect that even in these cases the range of relevant properties is highly restricted by factors such as the behavioral context of communication and the actions evoked by semantic predicates (Altmann and Kamide 1999; Chambers et al. 2002; Chambers et al. 2004).
A final consideration concerns the degree to which the on-line comprehension of modified noun phrases draws on pragmatic factors such as “common ground” and mechanisms of extralogical inference. A number of studies have demonstrated that a comprehender’s referential hypotheses reflect the information or knowledge attributed to the speaker (Keysar, Barr, and Horton 1998). 
In addition, an increasing body of evidence suggests that this knowledge and other information sources are coordinated by means of fast-acting inferential mechanisms in the early moments of comprehension (Hanna and Tanenhaus 2004; Hanna, Tanenhaus, and Trueswell 2003; Grodner and Sedivy, this volume). These outcomes appear to be compatible with theoretical perspectives challenging the existence of a sharp distinction between linguistic versus pragmatically enriched meaning and questioning the assumption that all inferences are likely to incur significant processing demands (Levinson 2000). One compelling demonstration comes from a study by Sedivy (2003). This study showed that listeners are more likely to assume a color adjective signals a contrast among two or more entities when the color is highly predictable for the object in question (and consequently would be uninformative for simply identifying the object). Thus, the term yellow is more likely to be understood as contrastive in the phrase the yellow banana than in the yellow book, where redundant specification of the object’s arbitrary color might be helpful for object identification. Most generally, this result shows that the potential to signal contrast depends on pragmatic judgments of informativity in the specific situation,
rather than semantic presuppositions encoded as part of an adjective’s meaning. In light of this and other evidence regarding the early involvement of pragmatic processes, one question for the current study is whether listeners adjust referential hypotheses based simply on their own knowledge about conventions for modifier position, or whether these conventions are judged against the speaker’s perceived ability to make particular distinctions. On the latter account, the reduced consideration of a solid-colored competitor at the head noun (experiment 1) would involve a pragmatic judgment that the speaker would use a prenominal color adjective as the most effective means to identify the competitor object. If the speaker was thought to lack the knowledge relevant for encoding this distinction, then the listener would respond differently. For example, if the speaker was known to be color blind or was observing the listener’s scene environment via a black-and-white video monitor, then the listener would be less likely to reduce consideration of a colored competitor simply because the speaker failed to use a prenominal modifier. It seems likely, then, that future investigation into the speed and efficiency of referential interpretation will require relatively rich and complex conversational contexts. By carefully evaluating the environmental and interpersonal determinants of referential expectations, we can better understand how on-line comprehension depends on not only the semantics of individual expressions, but also the listener’s judgments of why and how this information is expressed.
Note
1. Results from other studies in our laboratory further suggest the effect is not due to differences in the complexity of visual patterns. First, preliminary evidence shows that the early exclusion of the competitor (e.g., in the prenominal condition) is attenuated when the plain, unpatterned competitor is of a nonfocal color that corresponds to no commonly used color term. 
In this situation, listeners apparently have weaker expectations that a prenominal modifier would be used to differentiate this object. Second, we have observed early exclusion of a competitor in separate studies where a size adjective, rather than a color adjective, would most naturally be used to differentiate this object. In this case, visual differences between targets and competitors are limited to differences of scale and not the complexity of visual patterns.
References
Altmann, G., and Kamide, Y. 1999. Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73, 247–264.
Altmann, G., and Steedman, M. 1988. Interaction with context during human sentence processing. Cognition 30, 191–238.
Bolinger, D. 1967. Adjectives in English: Attribution and predication. Lingua 18, 1–34.
Brennan, S. E., and Clark, H. H. 1996. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22, 482–493.
Britt, M. A. 1994. The interaction of referential ambiguity and argument structure in the parsing of prepositional phrases. Journal of Memory and Language 33, 251–283.
Chambers, C. G., Tanenhaus, M. K., and Magnuson, J. S. 2004. Actions and affordances in syntactic ambiguity resolution. Journal of Experimental Psychology: Learning, Memory, and Cognition 30, 687–696.
Chambers, C. G., Tanenhaus, M. K., Eberhard, K. M., Filip, H., and Carlson, G. N. 2002. Circumscribing referential domains during real-time language comprehension. Journal of Memory and Language 47, 30–49.
Crain, S., and Steedman, M. 1985. On not being led up the garden path: The use of context by the psychological parser. In D. R. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives. Cambridge University Press.
Eberhard, K. M., Spivey-Knowlton, M. J., Sedivy, J. C., and Tanenhaus, M. K. 1995. Eye movements as a window into real-time spoken language processing in natural contexts. Journal of Psycholinguistic Research 24, 409–436.
Garnsey, S., Pearlmutter, N., Myers, E., and Lotocky, M. 1997. The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language 37, 58–93.
Goldberg, A. 1997. Construction grammar. In E. K. Brown and J. E. Miller (eds.), Concise Encyclopedia of Syntactic Theories. Elsevier Science.
Hanna, J. E., and Tanenhaus, M. K. 2004. Pragmatic effects on reference resolution in a collaborative task: Evidence from eye movements. Cognitive Science 28, 105–115.
Hanna, J. E., Tanenhaus, M. K., and Trueswell, J. C. 2003. The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language 49, 43–61.
Jurafsky, D. 1996. A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science 20, 137–194.
Keysar, B., Barr, D. J., and Horton, W. S. 1998. The egocentric basis of language use: Insights from a processing approach. Current Directions in Psychological Science 7, 46–50.
Levinson, S. C. 2000. Presumptive Meanings. MIT Press.
MacDonald, M. C. 1993. The interaction of lexical and syntactic ambiguity. Journal of Memory and Language 32, 692–715.
Metzing, C., and Brennan, S. E. 2003. When conceptual pacts are broken: Partner-specific effects on the comprehension of referring expressions. Journal of Memory and Language 49, 201–213.
Quirk, R., Greenbaum, S., Leech, G., and Svartvik, J. 1985. A Comprehensive Grammar of the English Language. Longman.
Sedivy, J. C. 2003. Pragmatic versus form-based accounts of referential contrast: Evidence for effects of informativity expectations. Journal of Psycholinguistic Research 32, 3–23.
Sedivy, J. C., Tanenhaus, M. K., Chambers, C. G., and Carlson, G. N. 1999. Achieving incremental interpretation through contextual representation. Cognition 71, 109–147.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., and Sedivy, J. C. 1995. Integration of visual and linguistic information during spoken language comprehension. Science 268, 1632–1634.
Trueswell, J., Tanenhaus, M., and Kello, C. 1993. Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory, and Cognition 19, 528–553.
van Berkum, J. J. A., Brown, C. M., and Hagoort, P. 1999. Early referential context effects in sentence processing: Evidence from event-related brain potentials. Journal of Memory and Language 41, 147–182.
10
The Effect of Speaker-Specific Information on Pragmatic Inferences
Daniel Grodner and Julie C. Sedivy
It is commonplace to observe that utterances can convey more information than they explicitly encode. Indeed, the field of pragmatics has grown from the insight that speakers exploit communicative conventions in order to say more with less. Less attention has been paid to the burden this places on perceivers. Because speakers’ meanings are underspecified, perceivers must infer implicit content in order to interpret, and successfully situate, utterances within the current discourse. These inferences¹ appeal to many types of knowledge, including entailment relations, world knowledge, and discourse context. The last of these presents a particular challenge to investigators because contexts are dynamic and utterance meaning can be sensitive to context. For instance, the utterance in (2) implies something like (2a) if it is a response to (1a) and something like (2b) if it is a response to (1b).²
(1) a. What time is it?
    b. How good is the party?
(2) Some guests are already leaving.
    a. It must be late.
    b. The party is not much fun.
The ease and the prevalence of such inferences facilitate efficient communication but create difficulty for models of language understanding. A central puzzle is how the extrasentential context is combined with intrasentential information to ultimately yield an interpretation of a sentence. The present paper addresses this issue by exploring a particular dependency between the referential environment and linguistic form. Suppose a speaker wishes to refer to one member of a set of entities belonging to the same nominal category in the current discourse. The speaker must use a modified expression in order to refer successfully. For instance, if one cup is the intended referent in a context containing two cups, the speaker must use a restrictive
modifier, such as “the cup on the left” or “the red cup.” Significantly, this dependence appears to be bidirectional. When a listener encounters a restrictively modified referential description, such as the tall cup, two sets are invoked in the immediate discourse: a target set corresponding to the literal denotation of the expression (e.g., a tall cup) and a contrast set containing an entity of the same type as the noun but differing along the dimension picked out by the adjective (e.g., a short cup).
Indirect evidence for the contrastive inference comes from studies of structural ambiguity resolution. In general, the sentence-processing mechanism prefers syntactic alternatives that contain simple unmodified descriptions when given the option. Crain and Steedman (1985) proposed that this preference stems from the fact that modified structures occasion costly changes to the discourse model. As an illustration, consider the string The horse raced past the barn ..., which is ambiguous between a main clause containing a simple NP subject and a reduced relative clause (RC) modifying the subject. The simple NP reading presupposes the existence of a single referent set, namely a horse, in the current discourse model. The complex NP reading involves the projection of an additional contrasting set of entities that share the properties denoted by the head noun but differ by virtue of the property expressed in the modifier. For the present example this corresponds to a non-empty set of horses that were not raced past the barn. In a null context, the modified NP reading requires the greatest number of additions to the current discourse model in order to be felicitous. Thus, the simple NP is preferred. There is evidence that by establishing appropriate referent and contrast sets it is possible to alter parsing preferences (Altmann, Garnham, and Dennis 1992; Altmann and Steedman 1988; Sedivy 2002; Spivey-Knowlton and Tanenhaus 1994). 
More direct evidence of contrastive inference comes from studies that monitor perceivers’ eye movements as they rearrange objects according to spoken instructions (Tanenhaus, Spivey-Knowlton, Eberhard, and Sedivy 1995; Sedivy, Tanenhaus, Chambers, and Carlson 1999; Sedivy 2003). Sedivy et al. had individuals respond to instructions containing a prenominally modified phrase, such as “Pick up the tall cup.” Object arrays consisted of four entities: a target object (e.g., a tall cup), a competitor from a different nominal category that shared the modifier property of the target (e.g., a tall pitcher), an irrelevant distracter object, and either a contrasting object of the same category as the target but possessing a distinct modifier property (e.g., a short cup) or a second distracter. Figure 10.1 depicts a sample display. Note that an indeterminacy is introduced when the modifier is uttered. The input to this point is compatible with reference to either the target or the competitor. If perceivers are driven by the literal meaning of the modifier, their attention could be directed to either
Figure 10.1
Sample contrast condition display for the target instruction “Pick up the tall glass.” Each display contained four objects. In the “no contrast” display the contrast was replaced by a second distracter object.
referent. However, if they interpret the modifier contrastively, they should attend to the target object, as this object requires a modified description in order to distinguish it from the contrasting object of the same category. Consistent with this, individuals made earlier fixations on the target and fewer fixations on the competitor in the presence of a contextual contrast. The influence of referential contrast on eye movements has been observed extremely early in the speech stream, within 200 msec of the onset of the head noun (Sedivy 2003). In light of estimates that saccadic eye movements are planned 150–200 msec before being launched (Matin, Shao, and Boff 1993), restrictive modifiers can generate the expectation for referential contrast well before the nominal head can be identified. Because the speech input does not uniquely isolate the target before utterance of the noun, perceivers’ fixations must be guided by inferential content.
Two types of accounts have been forwarded to explain the effect of discourse contrast: form-based and pragmatic. Form-based accounts attribute the contrastive inference to the conventional content of restrictively modified descriptions. This type of explanation has the advantage that it requires consideration of information contained only within the sentence. Perceivers would not have to deliberate over various types of extrasentential information. This comports well with the speed and the automaticity of contrast effects. One way for the conventional meaning to engender a contrastive inference is if definite modified NP structures conventionally presuppose contrasting
entities in much the same way that a definite description might be said to conventionally presuppose uniqueness of reference. This idea was suggested by Steedman and Altmann (1989). However, this account is difficult to maintain in light of examples like (3), which indicate that the contrastive inference is cancelable in a way that conventionally expressed content is not (Grice 1975; Sadock 1978).
(3) Use the tall cup because there are no other cups.
Perhaps a more plausible way to play out a form-based presuppositional account is to propose that modificational content is by default placed in focus. Focused material is typically analyzed as foregrounded against a background of contrasting alternatives (Rooth 1985, 1992). Thus, for instance, focusing the modifier for the string above as in The horse RACED PAST THE BARN ... would make prominent a single horse that was raced past the barn against a presupposed background of a set of horses that were not raced past the barn. Note that a focus-mediated presuppositional account is consistent with the occurrence of modifiers in the absence of referential contrast. This is because modifiers need not always be placed in focus, and unfocused material is not predicted to generate a set of alternatives. Reading-time results from studies examining syntactic ambiguities lend some support to the focus-mediated interpretation of referential effects. In particular, the presence of the overt focus operator “only” can induce a preference for ambiguous material to be analyzed as a nominal modifier (Ni, Crain, and Shankweiler 1996; Sedivy 2002; but see Clifton, Bock, and Radó 2000). However, it is difficult to extend this explanation to spoken stimuli involving prenominal modification (e.g., “Pick up the tall cup”). For these utterances, focus marking would result in greatest prominence on the adjective. 
However, referential contrast effects were found with spoken stimuli in which greatest prominence occurred on the head noun, according to rules of nuclear stress placement. Hence, the referential-contrast effect does not seem to depend on focus marking.
Another form-based possibility is that the effect of referential contrast is lexically driven. For instance, scalar adjectives, such as tall, are inherently relational and must be evaluated relative to a contextually salient comparison class. Clearly what counts as tall for a child is very different from what counts as tall for an elephant (Siegel 1980; Bierwisch 1987). It might be a lexical idiosyncrasy of scalar modifiers that they cause perceivers to attend inordinately to the discourse context and, thereby, receive contrastive interpretations. This provides an account for the referential-contrast effect with scalar adjectives, but does not extend to referential effects with other modifiers, such as the
postnominal modification involved in syntactic ambiguity resolution, or even referential effects found for non-scalar modifiers, such as modifiers denoting material properties (Sedivy 2001). Thus, at the very least, any form-based approach seems to require a combination of at least two somewhat distinct mechanisms to account for the range of effects observed to date.
Perhaps the most striking evidence against any of the above accounts, either singly or in combination, is evidence that whether a modifier invokes a contrast can depend on what it modifies. Sedivy (2003) examined the processing of descriptions that contained prenominal color modifiers in the presence or absence of a contrasting object. In one condition, the adjective denoted a property that was highly predictable from the noun category, as in “the yellow banana.” In another, the color property was not strongly associated with the noun, as in “the yellow cup.” The presence of a contextual contrast aided individuals in identifying objects whose colors were predictable. For instance, individuals more quickly identified a yellow banana, and directed fewer eye movements to other yellow objects in the display, in the presence of a green banana. However, there was no benefit of contextual contrast conferred when identifying objects whose color properties were unpredictable. Individuals’ eye movements did not converge more quickly on a yellow cup in the presence of a green cup. The contrastive inference is therefore not an inherent part of the meaning of the color modifier but rather is related to its informativity with respect to the entity being modified.
Such informativity effects are predicted to arise within a pragmatic explanation of referential contrast (Clifton and Ferreira 1989; Sedivy 2003). The pragmatic account claims that contrastive inferences arise because the use of a restrictive modifier is embedded in a collaborative communicative exchange. 
There is an understanding between discourse participants that speakers are only as informative as they need to be. This follows from Grice’s second maxim of quantity (1975, p. 26): “Don’t make your contribution more informative than is required for the purposes of the present exchange.” A simple NP would suffice to pick out the intended referent in a context with only one entity. The use of a more prolix or unusual form indicates that a different state of affairs prevails (see Levinson 2000 for an extended treatment of this idea). The most natural way to make the inclusion of the modifier informative is to ascribe a distinguishing function to it. To apply this logic, perceivers must reason counterfactually about what the speaker could have said but chose not to say. In support of the pragmatic view, contrastive inferences arise only for adjective types that are not used to label objects in isolation. Scalar modifiers and predictable color modifiers are rarely encoded in default descriptions, whereas unpredictable color modifiers are frequently encoded in default descriptions
(Sedivy 2000, 2002). The connection between inferential patterns in comprehension and distributions in production suggests that perceivers actively generate expectations as to what referential form a speaker will opt for in selecting an entity. Deviations from those expectations result in pragmatic inferences.
The appeal of this Gricean mechanism is that it is capable of adapting appropriately to a wide range of stimulus conditions. It also provides a unified explanation for all the referential-context effects found to date. On the other hand, it requires a set of as yet unspecified computations that appeal to heterogeneous and potentially unconstrained knowledge bases. Many different factors might be weighed in deciding whether a modified form should be interpreted contrastively. These include the intrinsic properties of a referent, the discourse context, the reliability of the speaker, the intentions of the speaker, the common background of the interlocutors, the goals of the communicative situation, and expectations about alternative forms. It is unlikely that perceivers could consider all of these in the limited time frame in which contextual contrast effects have been observed. In fact, Clifton and Ferreira argued that such conversational implicatures would be too cumbersome for individuals to engage in on-line, and that, hence, an expeditious parser would not consider such information in initial interpretive processes.
The studies described above establish only that listeners are sensitive to the use of default descriptions when inferring a contrast. This need not imply that individuals are engaged in full-blown Gricean reasoning. It may be that certain inferential steps are short-circuited. This resembles the position of the so-called neo-Griceans (Gazdar 1979; Horn 1984; Levinson 2000). 
These researchers propose a principled division between conversational implicatures that are generalized across situations and those that are specific to certain situations. For instance, the implicatures in (2a) and (2b) are dependent on the particular context in which (2) is uttered. However, the scalar implicature in (2c) is present across the vast majority of discourses.
(2) c. Not all of the guests are leaving.
Levinson (2000) hypothesizes that generalized implicatures like this one form part of the meaning for certain utterance types, whereas particularized implicatures are associated with utterance tokens. For quantity-based implicatures, the form type selected by a speaker is located on an informativity scale with respect to some anticipated reference type (Horn 1971, 1989). This comparison process can generate an implicature when the selected form is less informative than an alternative. For instance, the selection of “some” in (2) triggers a comparison with the more informative scalar alternative “all.” The failure to use the more constraining alternative results in the implicature (2c). Analogously,
contrastive inferences arise when the form selected is more informative than an anticipated alternative. The generality and simplicity of this comparison mechanism suggest that quantity-based inferences could arise through automatic processes rather than being computed on the fly. If the routines for projecting a contrast are precompiled in this way, this could explain the immediacy with which the visual context influences reference resolution. It also predicts that generalized inferences might be harder to block or suspend than particularized ones such as (2a) and (2b).
Motivated by computational constraints, Jurafsky (2003) outlines a similar idea to explain how perceivers interpret what speech act a speaker intended to perform. In this model, perceivers use surface cues probabilistically to realize the illocutionary force of an utterance. Because the implicit content arises from statistical associations, cues that are highly reliable indicators of implicit content should be harder to ignore. In the present example, this reasoning might predict an insensitivity to the situation in which the modifier is used. Because default descriptions are statically linked with individual referents, contrastive inferences do not require that perceivers consider the particular circumstances of the immediate discourse. Perceivers might reflexively infer a contrast whenever the speaker deviates from a stored default form.
An alternative is proposed by Carston (1998). Building on the work of Sperber and Wilson (1986), she claims that all conversational inferences are actively constructed and therefore particularized to their circumstances. Such nonce inference is clearly a necessary component of language comprehension in order to explain the robustness of the inferences made in (2a) and (2b). What is at issue is whether this nonce mechanism is involved in the earliest stages of interpretive processing. 
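The comparison over an informativity scale described above for quantity-based implicatures can be illustrated with a small sketch. It is only a toy rendering of the idea, not a model proposed in any of the work cited: the contents of the scale, the function name `scalar_implicatures`, and the string format of the output are all assumptions made for illustration.

```python
# Toy illustration of a Horn-scale comparison (all names are hypothetical).
# A scale lists alternatives from least to most informative.
HORN_SCALE = ["some", "most", "all"]  # assumed example scale


def scalar_implicatures(chosen, scale=HORN_SCALE):
    """Return the stronger alternatives that the speaker passed over.

    Choosing a weaker term when a stronger one was available invites the
    inference that the stronger statement does not hold ("some" -> "not all").
    """
    if chosen not in scale:
        return []  # the term is not on the scale, so no comparison is triggered
    position = scale.index(chosen)
    return [f"not {stronger}" for stronger in scale[position + 1:]]


print(scalar_implicatures("some"))  # ['not most', 'not all']
print(scalar_implicatures("all"))   # []
```

In this sketch the inference is a fixed lookup triggered by the form itself, which is roughly what a precompiled routine would amount to. A nonce mechanism would instead have to rebuild the set of alternatives from the discourse situation each time.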
The experiment described below attempts to distinguish the short-circuited and nonce inference positions by manipulating characteristics idiosyncratic to a particular speaker: specifically, the degree to which the speaker could be considered to be adhering to normative conversational principles. If contrastive inferences are generalized implicatures triggered by deviations from a default form, the effect of contrast should be impervious to this manipulation, at least with respect to initial referential commitments. On the other hand, if contrastive inferences are the product of pragmatic reasoning about particular discourse situations, there should be a noticeable effect when the circumstances are altered in this critical way.
Experiment
There is a great deal of evidence that speakers make many accommodations that demonstrate their sensitivity to the specific needs of a listener. For example, experts adjust referential forms appropriately to the knowledge state of the addressee (Isaacs and Clark 1987), and speakers hyper-articulate to make speech clearer under conditions where intelligibility is likely to be reduced for a particular listener (Bradlow 2002). Knowledge specific to a conversational partner is also a potentially powerful constraint in comprehension. For instance, suppose (4) is uttered discourse initially.
(4) He’s such a jerk.
Resolving the antecedent of the pronoun depends critically on discerning what potential referent is most likely to be salient for the individual speaker. In extreme cases, an utterance such as (4) may be effortlessly resolved even when uttered out of the blue as the first interaction in days or even weeks between interlocutors, and with no particular supporting visual context, if the interlocutors share knowledge of some male person extraordinarily inclined to behave badly. Few studies have investigated the moment-to-moment application of speaker-related knowledge in comprehension. Those that have done so have employed head-mounted eye tracking to monitor visual attention as listeners perform a target-identification task. One important finding that emerges from this work is that listeners are sensitive to the speaker’s perspective in the referential discourse. Listeners exhibit preferential consideration for entities in the shared referential environment over items to which they have privileged access (Hanna, Tanenhaus, and Trueswell 2003; Nadig and Sedivy 2002; but see Barr and Keysar 2002 and Barr 2003). Hanna et al. (2003) provide a particularly powerful illustration of this ability. They show that listeners are sensitive to the speaker’s discourse model even when it is at odds with the perceptually available referential environment. These findings illustrate that speaker-based information can be recruited very quickly to resolve potential referential indeterminacies.
It is interesting to note, however, that these studies exploit the conventional properties of the referring expression (in these cases, the uniqueness requirement of definite descriptions) to signal the need to constrain reference. It has been argued that circumscribing a referential domain is central to communication (Chambers et al. 2004; Hanna et al. 2003; Tanenhaus 2003). The requirement that definite descriptions refer to a uniquely identifiable entity, when combined with an experimental scenario in which satisfying this requirement necessitates accessing speaker-based information, creates optimal conditions for observing powerful and rapid speaker-based effects. It is interesting to extend such research to the question of whether perceivers access speaker-based information even when the conventional properties of the referring expression can be satisfied without recourse to speaker-based
information. Metzing and Brennan (2003) reported the use of speaker-based knowledge to interpret referring expressions in just such a situation. They found that perceivers were sensitive to the way a particular speaker had chosen to refer to an object in the past, and that they used this information inferentially. When the same speaker used a different but equally plausible label for an item, looks to that object were delayed, and there were more fixations to other objects in the display before converging on the target referent. Participants expected the new label to refer to a new object and interrogated the scene in an attempt to find it. This may have indicated that a pragmatic inference was arrived at when individuals tried to find a relevant function for the departure from an entrained label (cf. Grice’s maxim of manner). No such penalty was observed when a new speaker used the novel label. Participants were equally quick to respond to the new or the old label when the label was uttered by a new speaker. However, one must be cautious before concluding that these findings demonstrate immediate use of speaker identity in inferencing. On average, the first fixations to the target object were launched approximately one second after the onset of the referring expression. This is slower than the latencies seen in comparable tasks, and it raises the possibility that participants adopted specialized strategies or that eye movements were being driven by non-primary processes. Still, it demonstrates that individuals are capable of applying the pragmatic expectation that a speaker will use a specific entrained description at some point in reference resolution. The present experiment extends the result of Metzing and Brennan by looking at whether perceivers generate expectations about the type of referring expression a speaker will use and then use this as the basis for generating inferences of referential contrast.
We constructed a scenario in which a particular speaker does not obey the standard communicative conventions. Specifically, a number of cues were given to indicate that the speaker’s use of restrictive modification was not a reliable signal of the presence of a contextual contrast. If the contrastive inference is predicated on the perceiver’s belief that the speaker rationally chose one linguistic form over another, the contrast effect should be defeated in this situation. If instead the inference is a reflex of generalized and automatic informativity expectations, this manipulation should not affect the projection of a contrast, and the presence of a contextual contrast should aid listeners in identifying the target. A further manipulation in the present study was to look at the differential properties of material and scalar modifiers. Material properties are not context dependent in the same way as scalar properties. Nevertheless, just as for scalar adjectives and redundant color adjectives, the referents for descriptions containing material modifiers are easier to identify in the presence of a contextual
contrast (Sedivy 2001). One possibility is that the different modifiers engender a contrastive inference in disparate ways. Adjectives that denote inherently relational properties, such as scalar terms, might engender a contrast effect as a result of their lexical semantics. Modifiers that have no special relation to the context of their use, such as material and color terms, might instead convey contrast pragmatically. If so, then the expectation for contextual contrast given by scalar terms might be less susceptible to being blocked by the present pragmatic manipulation than the expectation for contextual contrast given by material items.
Participants
Thirty-one members of the Brown University community were paid for their participation in the experiment. Each was a native speaker of English and had normal or corrected-to-normal vision. All were naive with respect to the goals of the experiment.
Materials and Design
The methodology and the design were adapted from Sedivy et al. (1999). Participants were asked to manipulate arrays of four objects according to a set of prerecorded instructions. Twenty stimulus sets similar to that illustrated in figure 10.1 were constructed. For each display array there were two to four instructions. A critical instruction containing a prenominally modified noun (e.g., “Pick up the tall cup”) always occurred first in the series. In ten of the twenty critical phrases, the prenominal modifier referred to a scalar property. For the other ten, the modifier described a material property (e.g., “the glass mug”). Table 10.1 lists the contrast, competitor, and target objects for the stimuli. For the scalar items, the competitor object was selected so that it was a better exemplar of the modifier property than the target (e.g., an unusually tall pitcher). Thus, if there was an early bias toward the literal interpretation of the adjective, it should have resulted in elevated erroneous looks to the competitor rather than the target. This could not be done for the material items. Half of the experimental items in each session included a contrasting object, and half did not. Each participant saw only the contrast variant or the no-contrast variant of any stimulus set. An additional ten trials contained displays that were like those in the contrast condition except that a modified noun was used to refer to the competitor item. These counterbalancing trials were intended to ensure that individuals would not be cued to expect a modified noun to refer to a member of the contrasting pair. Stimuli and counterbalancing trials were pseudorandomly ordered with 26 filler trials.
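The trial composition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_trial_list` and the dictionary keys are hypothetical, the even split of contrast and no-contrast items within each modifier type is assumed (the text says only that half of the experimental items included a contrasting object), and a plain shuffle stands in for whatever pseudorandomization constraints were actually applied.

```python
import random

def build_trial_list(seed=0):
    """Assemble one participant's trial list (hypothetical layout, not the authors' script)."""
    rng = random.Random(seed)
    trials = []
    for modifier in ("scalar", "material"):
        # Ten stimulus sets per modifier type; ASSUMPTION: five of each type
        # are shown with a contrasting object and five without.
        conditions = ["contrast"] * 5 + ["no-contrast"] * 5
        rng.shuffle(conditions)
        for item, condition in enumerate(conditions, start=1):
            trials.append({"kind": "critical", "modifier": modifier,
                           "item": item, "condition": condition})
    # Ten counterbalancing trials: contrast-style display, but the modified
    # noun refers to the competitor rather than a member of the contrast pair.
    trials += [{"kind": "counterbalancing", "item": i} for i in range(1, 11)]
    # Twenty-six filler trials.
    trials += [{"kind": "filler", "item": i} for i in range(1, 27)]
    # ASSUMPTION: an unconstrained shuffle; the real ordering rules are not given.
    rng.shuffle(trials)
    return trials

trials = build_trial_list()
print(len(trials))  # 56 = 20 critical + 10 counterbalancing + 26 filler
```

The total of 56 trials agrees with the trial count mentioned later in the chapter.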
Table 10.1
Experimental items employed in the study.
Modifier type | Target | Contrast | Competitor | Distracter 1 | Distracter 2
Scalar | Small pad | Large pad | Small doll | Razor | Shampoo
Scalar | Thin marker | Thick marker | Thin brush | Potato masher | Crayons
Scalar | Thick notebook | Thin notebook | Thick book | Rag | Horseshoe
Scalar | Tall cup | Short cup | Tall pitcher | Eraser | Diskette
Scalar | Narrow Post-its | Wide Post-its | Narrow ribbon | Peanut butter | Pink bow
Scalar | Long envelope | Short envelope | Long spatula | Toy shovel | Lotion
Scalar | Tall doll | Short doll | Tall mug | Black pen | Yellow folder
Scalar | Long spoon | Short spoon | Short pencil | Peach | Quarter
Scalar | Wide tape | Narrow tape | Wide belt | Banana | Red pencil
Scalar | Fat crayon | Thin crayon | Fat marker | Egg | Tupperware
Material | Brass frame | Wood frame | Brass candle holder | Fork | Thread
Material | Leather glove | Wool glove | Leather wallet | Blue bow | Tape measure
Material | Paper plate | Plastic plate | Paper bag | Salt shaker | Shot glass
Material | Porcelain bowl | Plastic bowl | Porcelain saucer | Candle | Sunglasses
Material | Plastic spoon | Metal spoon | Plastic comb | Tie | Battery
Material | Styrofoam ball | Rubber ball | Styrofoam cup | Comb | Lego
Material | Metal ladle | Plastic ladle | Metal pan | Orange | Toy octopus
Material | Wool sock | Cotton sock | Wool cap | Duct tape | Plastic ruler
Material | Wood cutting board | Plastic cutting board | Wood ruler | Ribbon spool | Mirror
Material | Glass mug | Ceramic mug | Glass vase | Pencil | Floss
All participants were told that the experimental instructions had been generated by “an individual who was asked to direct a listener through a sequence of object configurations” and that the experiment had been designed to test how effectively individual speakers were able to convey instructions by observing a perceiver’s responses. Fifteen participants were assigned to the reliable-speaker condition. The remaining sixteen participants were assigned to the unreliable-speaker condition. The impression of unreliability was conveyed in three ways. First, participants were told that the speaker who had recorded the instructions had an “impairment that caused language and social problems.” Second, the speaker described objects and locations erroneously. Five times over the course of the experiment an object was mislabeled. (For instance, a toothbrush was called a “hairbrush.”) On three occasions the instructions directed the perceiver to move an object to a location that did not exist. (For instance, a destination might be described as above object A and below object B, when in fact object B was below object A.) Both of these error types occurred in a minority of the nearly 200 instructions. Third, the speaker consistently used over-informative descriptions. There were 234 nonpronominal references where an unmodified form would have sufficed to indicate the object of interest. Of these, 197 contained a superfluous modifier. The remaining 37 descriptions were unmodified nominal descriptions. Note that the presence of the modifier was not a reliable cue to the presence of a contextual contrast for the reliable speakers either. Overall, participants in the reliable-speaker condition heard thirty prenominally modified descriptions: the twenty stimuli and ten counterbalancing trials (the latter included with the specific intent of eliminating any within-experiment correlations between modification and contrastive reference).
For each participant, only ten of these (the number of stimuli presented in the contrast condition) were uttered in the presence of a contextual contrast for the target. Although the reliable speaker generated far fewer modified forms than the unreliable speaker (30 versus 207), there was no contingency between modification and contextual contrast over the course of the experiment for either speaker. To avoid placing prominence on the prenominal modifier, nuclear stress was placed on the head noun of the NP in critical instructions. Durations of the adjective and noun are listed in table 10.2. Note that critical regions were comparable across the unreliable-speaker conditions and the reliable-speaker conditions.
Table 10.2
Duration of modifier and noun in target instructions across conditions, in milliseconds. (Standard errors in parentheses.)
Word | Scalar, unreliable | Scalar, reliable | Material, unreliable | Material, reliable
Adjective | 288 (18) | 289 (8) | 351 (42) | 372 (39)
Noun | 378 (37) | 373 (33) | 332 (36) | 355 (42)
Procedure
Display changes took approximately 5 seconds, and participants were permitted to watch the display as it was being changed. Every display contained a centrally located fixation cross. Each trial began with a request for the subject to look at the cross, and participants were instructed to rest their eyes on the central cross between instructions. This was done so that eye movements to the target objects could be measured from a default position that was equidistant from all of the objects in the display. Participants were told to follow the instructions as quickly and accurately as they could. While the participant followed instructions to move objects in the workspace, eye-movement data were recorded using a lightweight ISCAN head-mounted video-based tracking system. The camera provided an infrared image of the eye, and determined monocular eye position by monitoring the locations of the center of the pupil and the cornea reflection. A scene camera was mounted on the side of the helmet, providing an image of the subject’s field of view. Calibration was carefully monitored throughout each trial, and minor adjustments were occasionally made between trials. A VCR record consisting of the instructions recorded with a microphone and the participant’s moment-by-moment gaze fixation superimposed over the scene-camera image, with time-code stamps at 30 Hz, was made for each experimental trial. Because the scene camera was mounted onto the helmet itself, and moved with the participant’s head, the VCR record took into account any head movements made by the participant, allowing for unrestricted head and body movements throughout the experiment. The entire experiment, including introduction to the equipment and task, practice and experimental trials, and debriefing, took approximately 35 minutes. Eye movements were analyzed by playing the audio-video record back for each 33-millisecond frame on a Sony DSR-30 digital VCR. For the experimental trials, critical points in the speech stream were identified, corresponding to the onsets of the adjective and head noun, and the offset of the head noun.
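The frame-based timing described above (time-code stamps at 30 Hz, hence frames of roughly 33 msec) can be illustrated with a short sketch. The function name `frame_to_ms` and the frame numbers in the example are hypothetical; the only values taken from the text are the 30 Hz rate and the idea of measuring eye movements relative to a critical point in the speech stream.

```python
FRAME_RATE_HZ = 30  # time-code stamps on the video record, as stated in the text

def frame_to_ms(frame_index, reference_frame, frame_rate_hz=FRAME_RATE_HZ):
    """Time of a video frame in msec relative to a reference frame
    (for example, the frame containing adjective onset)."""
    return (frame_index - reference_frame) * 1000.0 / frame_rate_hz

# Hypothetical trial: adjective onset coded at frame 120,
# a saccade to the target launched at frame 131.
latency_ms = frame_to_ms(131, 120)
print(round(latency_ms))  # 11 frames at 30 Hz, about 367 msec
```

One frame corresponds to 1000/30, or about 33.3 msec, which is the temporal resolution of every latency reported from this kind of record.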
Continuous eye movements occurring from the beginning of the instruction were noted until the participant reached for the target object. Thus, the joint identification of critical points in the speech stream and the eye-movement data allowed for the alignment of eye movements to speech, as presented in the Results section below. For purposes of analysis, the work surface was divided into a 3 × 3 grid with visibly demarcated boundaries. Eye movements to an object were coded from the first frame in which a saccade was launched to the square containing that object. Occasionally, poor calibration or eye blinks resulted in a temporary loss of tracking. If tracking resumed less than five video frames later and reappeared on the same location, the track was treated as continuous; otherwise, the eye-movement record noted the loss of tracking and, for that time period, treated the fixation as falling on none of the objects in the display.
Figure 10.2
Eye-movement record in real time for scalar items. Horizontal axis depicts msec after adjective onset. In the no-contrast conditions the data labeled “contrast” indicate the second distracter item in the display.
Results
Trials in which a participant reached for the incorrect object or fixated on the target at the beginning of the adjective were omitted from analyses. The latter restriction was intended to exclude fixations on the target that were not initiated on the basis of speech input. Further, data points more than two standard deviations away from the mean for each condition and time frame were replaced with the mean for that condition. This affected 3.2 percent of the data. Figure 10.2 depicts the proportion of trials including fixations to each of the objects in the display from the onset of the adjective. To correct for variability in the auditory duration of the modifier across stimuli, each trial was aligned to the offset of the adjective. Average offsets for the noun and the adjective in each condition are indicated. Critical comparisons were conducted over the 500-msec window beginning 200 msec after adjective offset. This corresponds to where manipulations of discourse contrast have been observed in previous work. Analyses were performed for each modifier type separately. For scalar items, target advantage scores were computed by subtracting fixations to the competitor from fixations to the target over the critical interval. This provided a composite measure of the relative proportion of fixations to the entities that should be affected by the presence of a referential contrast. Figure 10.3 depicts target advantage scores for the scalar items. A 2 × 2 ANOVA crossing speaker type and the presence or absence of a contrasting object in the display resulted in an interaction significant in the participants analysis (F1(1,29) = 5.05, MSE = .043, p < .05), but not in the items analysis (F2(1,9) = 2.24, MSE = .032, p = .17).
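The outlier treatment and the target advantage score described above can be sketched as follows. This is a minimal illustration, not the authors’ analysis code: the function names are hypothetical, the use of the sample standard deviation and of a mean computed before replacement are assumptions (the text does not specify either), and the numbers in the example are made up.

```python
from statistics import mean, stdev

def replace_outliers(values, n_sd=2.0):
    """Replace values more than n_sd standard deviations from the mean with the mean.
    ASSUMPTION: sample SD, and the mean is computed with the outliers included."""
    m, sd = mean(values), stdev(values)
    return [m if abs(v - m) > n_sd * sd else v for v in values]

def critical_window(times_ms, start=200, length=500):
    """Indices of frames in the window [start, start + length) msec after adjective offset."""
    return [i for i, t in enumerate(times_ms) if start <= t < start + length]

def target_advantage(target_props, competitor_props, indices):
    """Mean proportion of target fixations minus mean proportion of
    competitor fixations over the selected frames."""
    return (mean(target_props[i] for i in indices)
            - mean(competitor_props[i] for i in indices))

# Made-up fixation proportions on a coarse 100-msec grid (illustration only).
times_ms = [i * 100 for i in range(10)]
target = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
competitor = [0.1] * 10
window = critical_window(times_ms)
print(window)
print(target_advantage(target, competitor, window))
```

A positive score means that the target attracted more fixations than the competitor over the critical interval; a contrast effect shows up as a higher score when a contrasting object is present.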
Planned comparisons for each type of speaker revealed that perceivers responding to the reliable speaker benefited from the presence of a contextual contrast (F1(1,14) = 17.98, MSE = .024, p < .001; F2(1,9) = 4.6, MSE = .041, p < .05) but those responding to the unreliable speaker did not (F’s < 1). To establish the relative contributions to this pattern of looks to the target and the competitor, additional analyses were conducted for fixations to each of these objects separately. Analysis of the proportion of fixations to the target patterned similarly to target advantage scores. There was a trend toward an interaction of speaker and contrast marginal by participants (F1(1,29) = 3.36, MSE = .01, p = .08), though not by items (F2(1,9) = 1.41, MSE = .01, p = .26). Independent comparisons established that reliable speakers elicited significantly more looks to the target in the presence of a contrast (F1(1,14) = 11.9, MSE = .007, p < .01; F2(1,9) = 6.4, MSE = .01, p < .05) and unreliable speakers did not (F’s < 1). Proportions of fixations to the competitor also appear to have contributed to the interaction of target advantage scores. There was a marginal interaction by participants (F1(1,29) = 4.07, MSE = .019, p = .05), but not by items (F2(1,9) = 1.44, MSE = .021, p = .26). For the reliable speaker, there were fewer spurious looks to the competitor in the presence of a contrast (F1(1,14) = 15.2, MSE = .008, p < .01; F2(1,9) = 2.2, MSE = .026, p = .09). The manipulation of contrast did not affect fixations to the competitor for the unreliable speaker (F’s < .03).
Figure 10.3
Target advantage scores for trials containing scalar adjectives over the 500-msec interval beginning 200 msec after adjective offset.
Figure 10.4
Eye-movement record for the items containing material adjectives. Horizontal axis depicts msec after adjective onset. In the no-contrast conditions the data labeled “contrast” indicate the second distracter item in the display.
Figure 10.5
Target advantage scores for trials containing material adjectives over the 500-msec interval beginning 200 msec after adjective offset.
Figure 10.4 portrays looks to objects in the display in response to instructions containing material modifiers. Target advantage scores were computed and submitted to a 2 × 2 ANOVA crossing speaker reliability and contextual contrast (see figure 10.5). This revealed a significant interaction by items (F2(1,9) = 9.03, MSE = .011, p < .05), but not by participants (F1(1,29) = 1.92, MSE = .048, p = .18). Planned comparisons demonstrated a marginally reliable trend for higher target advantage scores in the presence of a contrast for reliable speakers (F1(1,14) = 2.63, MSE = .036, p = .06; F2(1,9) = 2.3, MSE = .024, p = .08). In contrast, target advantage scores were numerically lower in the unreliable-speaker condition, though this trend was not reliable (F’s < 2.8). Just as for the scalar conditions, looks to the target and the competitor were analyzed separately. There were no clear effects or interactions for fixations to the target (F’s < .5). The absence of these effects and the relatively high target advantage scores likely reflect the tendency for individuals to identify the target extremely rapidly for the material modifiers. Looks to the competitor therefore provide a more sensitive indicator of the effect of contrast. Speaker type and contrast interacted reliably (F1(1,29) = 5.6, MSE = .011, p < .05; F2(1,9) = 18.4, MSE = .003, p < .01). In response to reliable speakers, there were fewer looks to the competitor when a contrasting object was in the display (F1(1,14) = 7.64, MSE = .013, p < .05; F2(1,9) = 18.1, MSE = .004, p < .01).
This was not the case for individuals in the unreliable-speaker condition (F’s < .75).³ It was possible that perceivers in the unreliable-speaker condition might have delayed interpretive processes in light of the irregular instructions. To ensure that this was not the case, analyses were performed to establish when combined fixations to the competitor and the target, each of which matched the modifier property, diverged from looks to other objects in the display. In response to the reliable speaker, participant fixations isolated the target and the competitor with marginal reliability between 67 and 100 msec after adjective offset (F(1,14) = 2.36, MSE = .009, p = .07). For instructions containing scalar modifiers, this divergence first occurred in the window between 200 and 233 msec after adjective onset (F(1,14) = 2.43, MSE = .009, p = .07). For material modifiers, the difference was first observed between 33 and 66 msec (F(1,14) = 2.04, MSE = .011, p = .09). Adjective-linked eye movements in response to the unreliable speaker were initially observed between 67 and 100 msec over all stimulus items (F(1,15) = 2.87, MSE = .005, p = .06), between 100 and 133 msec for scalar conditions (F(1,15) = 2.04, MSE = .012, p = .09), and between 67 and 100 msec for the material conditions (F(1,15) = 3.53, MSE = .033, p < .05). For every comparison given here, fixations to the target and the competitor were significantly higher than to other objects in the display at each subsequent 33-msec analysis frame (F’s > 3.5, p’s < .05). In view of estimates that programming and launching a saccade takes approximately 200 msec (Matin, Shao, and Boff 1993), these comparisons indicate that participants in both conditions are using the speech input to incrementally fix reference before the offset of the modifier.
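The reasoning behind this last step is simple arithmetic, sketched here for the overall divergence windows. The function name is hypothetical, and the 200-msec figure is the approximate saccade-programming estimate cited in the text rather than a measured value for these participants.

```python
SACCADE_PROGRAMMING_MS = 200  # approximate estimate cited in the text

def latest_driving_input(window_start_ms, programming_ms=SACCADE_PROGRAMMING_MS):
    """Latest point in the speech stream (msec relative to adjective offset)
    that could have triggered a saccade observed at window_start_ms."""
    return window_start_ms - programming_ms

# Overall divergence windows reported above begin 67 msec after adjective offset
# for both the reliable and the unreliable speaker.
for speaker, window_start in [("reliable", 67), ("unreliable", 67)]:
    print(speaker, latest_driving_input(window_start))  # -133 in both cases
```

A negative value means that the speech driving the eye movement was heard before the adjective had ended, which is the sense in which reference is being fixed incrementally.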
Importantly, participants in the unreliable-speaker condition were at least as rapid to respond to the literal denotation of the modifier (and, in the case of the scalar adjectives, perhaps quicker) as those in the reliable-speaker condition. The unnaturalness of the speaker did not cause interpretation to be any less incremental. Note too that eye movements were not retarded by the slightly faster material adjectives in the unreliable-speaker condition.
Discussion
The results demonstrate that speaker-specific attributes influence whether a restrictive modifier will be interpreted contrastively. In line with previous work, individuals in the reliable-speaker condition were aided in determining the referent for a restrictively modified nominal description by the presence of a contextual contrast. With scalar modification, there were more early looks to the target, and fewer looks to a competing object that matched the scalar property. For the unreliable speaker, neither effect was observed. When the reliable
speaker uttered descriptions containing material modifiers, fewer looks were elicited to the competing object in the presence of a contrasting object. This too was not observed with the unreliable speaker. Thus, individuals were not aided by a contextual contrast when they had reason to believe a speaker did not use modification cooperatively. This is in accord with the view that participants in the unreliable-speaker condition did not take the presence of a modifier to imply the existence of a contrast set. These participants were just as responsive to the literal meaning of the adjective as those responding to the reliable speaker. The speaker manipulation selectively eliminated the generation of the contrastive implicature licensed by the adjective. That contrastive inferences can be eliminated by manipulating speaker reliability provides strong support for a pragmatic interpretation of referential contrast effects, and furthermore implies that early inferencing admits episodic information. The speed of contrastive inferences therefore cannot be explained as an automatic reflex of deviating from an immutable default form. This is at odds with the most straightforward reading of the short-circuited implicature proposal introduced above. However, this result does not imply that individuals are engaging in an overt counterfactual deductive reasoning process in order to use the contextual contrast appropriately. We return to this point in the general discussion. Each modifier condition was similarly affected by speaker reliability and by contextual contrast. The presence of a contrast improved target identification for both scalar and material conditions in response to reliable speakers. In both cases, the effect of referential contrast vanished with unreliable speakers. Thus the contrastive inference is pragmatic in origin for relational (scalar) and non-relational (material) modifiers alike.
This rules out the possibility that scalar terms convey the expectation for contextual contrast as part of their literal meaning. If they did, then the contrast effect should not have been canceled by manipulating the pragmatic context of the modifier’s use. This is not to say that scalar and material adjectives were interpreted identically. Individuals were somewhat slower to map scalar meanings to the subset of items that matched the modifier denotation. It is possible that the relational component of scalar meaning complicates the task of converging on a target. A recent production study conducted in our laboratory yielded parallel findings: more disfluencies occur before prenominal scalar modifiers than before material or color modifiers (Gregory, Grodner, Joshi, and Sedivy 2003). This is particularly striking in light of the fact that scalar terms are more frequent in both spoken and written corpora. Taken together, these observations buttress the thesis that scalar denotations are more conceptually complex than non-comparative adjective denotations. Investigating how the lexical semantics of various modifiers mediates reference resolution is an interesting direction for future research.
General Discussion
The results reported herein strengthen the case for a pragmatic explanation of referential contrast effects by demonstrating their defeasibility. In addition, they indicate that there is a limit to the generality of quantity-based inferences and that characteristics particular to a speaker are taken into account in the generation of referential contrast inferences. A number of questions remain open. For example, the results reviewed above do not address how perceivers’ expectations are updated or represented in a way that can influence early inferencing. It is interesting to speculate which cue to unreliability, or combination of cues, caused the attenuation of the contrast effect. One possibility is that perceivers’ inferences hinge directly on their overt beliefs about the degree to which the speaker is conforming to principles of rational, orderly communication. Hence, an awareness of the mistakes made by the speaker and the explicit identification of the speaker as non-normal defuse the contrastive inference. Perceivers may believe that the speaker’s impairment causes him to be an uncooperative or unreliable communicative partner and thus suspend any inferences made on that basis. A second possibility is that pragmatic inferencing reflects a more implicit assessment of the communicative proclivities of the speaker. For instance, perceivers may attend to the statistics of modifier use for a particular speaker. Over the course of the experiment they would note that the presence of the adjective is not a reliable cue to contrast. As a result of the overuse of the modifier, they might recalibrate the anticipated default description to include a modifier. This view predicts that the effect of contrast might get weaker with repeated exposure to over-descriptive referring expressions (though it is unclear how much local experience might be necessary to override the prepotent contingency between modification and contextual contrast).
Anecdotal evidence from post-experimental debriefing suggests that perceivers do not register conscious awareness of the extent to which speakers exhibit optimally informative communicative behavior. When queried whether they noticed anything unusual about the experiment, participants frequently mentioned trials in which objects were mislabeled or in which object destinations made reference to impossible configurations. They rarely if ever mentioned the over-explicit object labels. These observations hint that the speaker’s overspecified descriptions did not have a large impact on the overt assessment of the reliability of
the speaker. Still, the speaker-specific modulations of the contrast effect could be attributable to high-level beliefs about speaker characteristics, lower-level statistical properties of the speaker’s output, or some combination of both. If only high-level cues are needed to indicate speaker unreliability, then perceivers need not learn about the particular way in which the present speaker is uninformative, and should suspend the computation of a broad range of implicit meanings. Further, there should be no evidence of greater reduction in the contrast effect as the perceiver accumulates evidence of over-informativity. On the other hand, sensitivity to lower-level cues should result in incremental changes to how perceivers respond to the descriptive patterns used by a speaker as more tokens of speaker descriptions are encountered. To evaluate these alternatives, exploratory analyses were conducted comparing performance on items in the first and the second halves of the experiment. The hypotheses differ with respect to their predictions for the reliable-speaker conditions. To see why this is so, note that reliable speakers uttered modified forms to refer to competitors as often as they did to pick out a member of a contrasting pair (for ten items across the 56 experimental trials in each case). Hence, there was no reliable contingency between modification and contextual contrast for the reliable-speaker condition. Thus, over the course of the experiment, perceivers may have come to adjust their expectations about the information conveyed by modification as a result of encountering a significant number of modified forms in the absence of contrast. The unreliable speaker, on the other hand, consistently used over-specific forms on a much larger number of trials, with 15 modified forms occurring in the absence of contrast even before the first experimental item.
Therefore, for the unreliable speaker, expectations about the referential form may already have been adjusted to reflect the overuse of the modifier by the point at which inferential processes could be assessed in the first half of the experiment. The most interesting measure for the reliable speaker was looks to the competitor for scalar items, depicted in figure 10.6. The interaction of contrast and block order was marginal (F(1,13) = 3.61, MSE = .021, p = .08). Consistent with the statistical tuning hypothesis, looks to the competitor were reduced in the presence of a contrast in the first block of the experiment (F(1,13) = 7.41, MSE = .031, p < .05), but not in the second (F < 1). There were no effects of contrast or interactions with block order for the unreliable speaker (F’s < 1.5). Two caveats are in order: The present experiment was not specifically designed to test these hypotheses, and no other measure differed reliably across blocks. Still, this trend suggests that overuse of the modifier contributes to the elimination of the contrast effect.
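A worked calculation may help show why the tuning account predicts a block effect for the reliable speaker only. Suppose, purely as an assumption, that the expectation that a modifier signals contrast is a smoothed proportion, where m is the number of modified descriptions heard so far from a speaker, c is the number of those accompanied by a contrast, and a and b are prior pseudo-counts. The values a = 9 and b = 1 and the even split of the reliable speaker’s modified trials across the two blocks are hypothetical; only the trial counts (ten modified forms with a contrast and ten without for the reliable speaker, 15 modified forms without a contrast before the first experimental item for the unreliable speaker) come from the design described above.

```latex
\[
\hat{p} = \frac{a + c}{a + b + m}, \qquad a = 9,\quad b = 1
\]
\[
\text{Reliable speaker, end of block 1 } (c = 5,\ m = 10):\qquad
\hat{p} = \frac{9 + 5}{10 + 10} = \frac{14}{20} = 0.70
\]
\[
\text{Reliable speaker, end of block 2 } (c = 10,\ m = 20):\qquad
\hat{p} = \frac{9 + 10}{10 + 20} = \frac{19}{30} \approx 0.63
\]
\[
\text{Unreliable speaker, before the first experimental item } (c = 0,\ m = 15):\qquad
\hat{p} = \frac{9 + 0}{10 + 15} = \frac{9}{25} = 0.36
\]
```

On these assumed values the reliable speaker’s modifiers lose their predictive value only gradually over the session, whereas the unreliable speaker’s modifiers have lost most of it before testing begins. Different prior values would change the numbers but not the ordering.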
Figure 10.6
Looks to competitor object over interval 200–700 msec after noun onset for scalar items across first and second blocks of experiment. Top and bottom panels represent data from the reliable and unreliable speaker conditions, respectively.
A follow-up study delved further into the source of the pragmatic effect. A reliable-speaker condition was compared against two unreliable-speaker conditions. In one of these unreliable conditions, the speaker overused prenominal modifiers consistently, just as in the present experiment. In the other, the speaker encoded the same redundant content as in the prenominal condition, but did so with a postnominal modifier (e.g., “Pick up the cup that’s tall”). As in the previous experiment, critical trials tested instructions containing prenominal modifiers for all three conditions. But in contrast with the first experiment, none of the explicit cues to speaker irregularity were given. If perceivers are sensitive to the redundancy of content, then both unreliable speaker types should result in a reduction of the contrast effect. If perceivers are especially sensitive to a redundancy of a particular form, then the excessive use of the prenominal modifier should result in a greater reduction of the contrast effect than the postnominal condition. It is also possible that high-level knowledge of the speaker’s impairment would be necessary to draw attention to low-level redundancies in the present study. If so, then there might not be a marked reduction in the effect of contrast. Figure 10.7 depicts target advantage scores for instructions containing scalar terms over the same temporal region analyzed in the present experiment. For all three conditions there was an apparent benefit of the presence of a contrast in identifying the target. This was significant by participants and items for the reliable speaker (F1(1,18) = 5.16, MSE = .05, p < .05; F2(1,9) = 5.48, MSE = .02, p < .05), and by participants for the condition with excessive prenominal modification (F1(1,18) = 3.87, MSE = .03, p < .05; F2(1,9) = 1.28, MSE = .03, p = .14).
Though a similar trend was observable for the condition with excessive postnominal modification, it was marginally reliable only in the items analysis (F1(1,16) = 1.59, MSE = .09, p = .11; F2(1,9) = 2.44, MSE = .04, p = .08). There was no hint of an interaction between speaker type and contrast (F’s < .5). This work demonstrates that overusing restrictive modification is not sufficient by itself to eradicate the contrastive interpretation of scalar modifiers. This does not immediately imply that the high-level cues to speaker reliability were solely responsible for defusing the expectation of contrast in the earlier experiment. It is possible that high-level cues were necessary to draw the listener’s attention to the low-level redundancy. It is also possible that the pattern observed in the follow-up experiment was unique to scalar modifiers, which are inherently comparative. Another open question is what kind of pragmatic mechanism could be responsible for the above findings. A Gricean explanation for contrastive interpretations of restrictive modifiers claims that perceivers make the following inferential steps when a speaker utters a modified description:
Figure 10.7
Target advantage scores elicited by instructions containing scalar terms uttered by a speaker who used modification reliably (N = 19), a speaker who used prenominal modification excessively (N = 19), and a speaker who used postnominal modification excessively (N = 17).
(i) If the speaker means to use this utterance to pick out an intended referent in a context with only one entity of that type, then a default description would be the most natural means to do so.
(ii) The speaker chose a more specific description than the default by using a modified referential phrase.
(iii) If the speaker is behaving cooperatively, he should not be more informative than is necessary. Overspecification must have some purpose.
(iv) Because of (ii) and (iii), the conditions for uttering a default form in (i) must not hold. That is, it must not be the case that the speaker intends the utterance of the modified form to pick out an entity in isolation.
One possible conclusion of (i)–(iv) is (v):
(v) There are multiple entities in the context of the same type.
There are a number of challenges inherent to implementing this sort of reasoning in a real-time processing system. For one, it is not clear what kind of reasoning system would permit arriving at (v) as rapidly as perceivers do. That is,
given step (iii), what ranges of “purposes” for the inclusion of modification are considered, and what weight is each of them given? It is certainly not the case that distinguishing between like entities is the only or even the most frequent function of modification. (See Fox and Thompson 1990 for a taxonomy of functions of relative-clause modifiers.) Second, a critical step is embedded in (ii). How does the system decide the appropriate alternative to compute and compare to a given referential expression? For instance, “the plastic cup” is a more specific description than “the plastic entity,” yet intuition tells us that the contrastive entity invoked is another cup and not another plastic object. The comparison of alternative forms is even more impressive when we consider that it is occurring incrementally even before the noun is encountered. A partial answer to both of these questions might be that statistical regularities constrain the consideration of alternative expressions and their functions, and indeed that inferential steps may be statistically linked. This would permit the processor to bypass the complexity inherent in reasoning about alternatives counterfactually. Let us sketch one way this approach might work to account for the currently known facts about contrastive inferences. First, determining the referential forms to be compared could be a function of patterns of co-occurrence among properties in referential descriptions. Modifiers are more promiscuous than nouns. “Plastic” will be used to modify a wide variety of artifact labels, whereas “cup” will be used to describe a narrow set of object referents. The most accessible default for “the plastic cup” is “the cup” rather than “the plastic thing” because the descriptions in which “cup” participates form a more coherent set of properties than the descriptions in which “plastic” participates.
This would explain our intuitions about the dimension along which a contrast set differs from the referent set. Whether or not an expression that is actually used deviates from the expected default could be computed from the likelihood of using a modifier encoding some specific property together with a particular noun. For instance, the ratio of expressions encoding yellowness as a property given the total occurrences of a noun such as “banana” would presumably be lower than the ratio of encoding “yellow” given “notebook.” Thus, “yellow banana” would represent more of a deviation from the expected default than “yellow notebook,” triggering a search for some function for the modifier. The preferred function (i.e., referentially contrasting) could be arrived at via a statistical link between steps (ii) and (v) (though note that this statistical link remains to be empirically established, given the observations in Fox and Thompson). Clearly this is an extremely partial sketch of how the aforementioned challenges might be met. Our present purpose is merely to point to potential directions for realizing a reasoning system that is flexible enough to
consider situation-based information, but sufficiently constrained to operate in real time. There are at least two points in the process outlined above at which speaker-specific effects could have influenced the present experiment. One is to directly recalibrate the default form at step (ii) for a particular speaker on the basis of statistical regularities in the recent episodic record. An alternative that seems more computationally cumbersome is to generate expectations of defaults more generally in step (ii) as a function of global experience across many speakers, and then to invoke the criterion in (iii) as a prerequisite to identifying the most likely function for the modifier. In the former case, speaker-particular regularities for the speakers in the experiment reported above might be used to calibrate expectations about default expressions. That is, modified descriptions would count as deviations from the expected default for reliable speakers, but not for unreliable speakers. This type of rapid calibration to characteristics of a speaker’s speech could be similar to the low-level acoustic calibration that is automatically achieved to take into account speaker characteristics such as age and gender, allowing perceivers to cope efficiently with speaker variability. The second alternative posits that the use of a modified phrase would count as a deviation from the default in both speaker conditions; however, step (iii) would then be invoked, and the determination of a speaker’s unreliability with respect to communicative norms would suspend a search for an appropriate function for the modifier. This latter explanation is more computationally complex, and is not able to exploit direct statistical links between steps (ii) and (v). Our follow-up experiment attempts to distinguish between these two explanations. Thus far, the results suggest that a determination of speaker reliability (i.e.,
step (iii)) may not be entirely dispensable in accounting for contrast set inferences. Further research is needed to determine whether overt signaling of speaker unreliability or uncooperativeness is both necessary and sufficient to suspend the process of forming contrastive inferences, as well as what sorts of evidence can be used to make this determination. Another way of exploring the explanatory power of statistical regularities in generating pragmatic inferences is to examine whether perceivers’ generation of a contrastive implicature depends on the identification of plausible alternative functions of modifiers. To test this theory, it would be interesting to see if a pragmatic manipulation that did not alter the cue validity of a modifier could defuse the contrastive implicature. Consider the sentence in (5) uttered within a discourse where a woman is trying and failing to reach a cup on a shelf.
(5) The short woman could not reach the cup on the top shelf.
The modifier provides a causal explanation for the prominent event. Intuitively no contrasting woman is conjured by this example. It is possible that the prepotent identifying function of the modifier initially results in a contrastive interpretation, which is later retracted on the basis of secondary deliberative processes. Alternatively, one might see no evidence of contrast inferences in initial referential commitments, which would suggest that the identification of alternative functions comes into play before the conclusion reached in step (v). This could be implemented either at step (ii), in allowing relevance-based considerations to constrain expected default expressions, or at step (iii), once a deviation from the expected expressions triggers a search for plausible functions for that deviation. Careful temporally sensitive experimentation along these lines has the potential to clarify the computation of pragmatic inferences that have hitherto been addressed primarily by theoretical linguists. Representational distinctions have been posited between encoded content and implicit content, and between generalized and particularized implicature. These formal distinctions provide useful starting points for formulating processing hypotheses. Further investigation along these lines will allow us to gain ground in understanding which parts of the inferential process are generated automatically and which parts are computed ad hoc.
Acknowledgments
This work profited from helpful comments and feedback given by the audiences at the 16th CUNY conference on Sentence Processing in Cambridge and the 8th AMLaP Conference in Glasgow. We are grateful to Anjula Joshi, Natasha Trentacosta, and Michele Hebert for assistance in collecting and coding data. This work was supported in part by NIH grant F32 MH65837-01 awarded to the first author and NIH grant R01 MH62566-01 awarded to the second author.
Notes
1. Here and throughout, we use the term “inference” to refer to information that is communicated to the perceiver via the utterance of an expression, but which is not part of its asserted content. This would include both accommodated presuppositions and implicatures. We adopt this term because it is neutral with respect to whether the inferred content arises from the conventional or implicated meaning of the critical expression.
2. This example is adapted from Levinson 2000.
3. Figure 10.4 suggests that there were frequent looks to the second distracter (labeled as “contrast”) before adjective offset in the unreliable-speaker no-contrast condition. Though proportion of fixations appears to be elevated relative to looks to the other
distracter, this difference is not significant (F’s < 1.7). There are also no such elevated fixations for the reliable-speaker no-contrast condition. Further, the elevation occurs extremely early (200 msec before adjective offset), which makes it unlikely that it was affected by the critical description.
References
Altmann, G., Garnham, A., and Dennis, Y. 1992. Avoiding the garden-path: Eye movements in context. Journal of Memory and Language 31, 685–712.
Altmann, G., and Steedman, M. 1988. Interaction with context during human sentence processing. Cognition 30, 191–238.
Barr, D. J. 2003. Listeners are mentally contaminated. Poster presented at 44th annual meeting of Psychonomic Society, Vancouver.
Barr, D. J., and Keysar, B. 2002. Anchoring comprehension in linguistic precedents. Journal of Memory and Language 46, 391–418.
Bierwisch, M. 1987. The semantics of gradation. In M. Bierwisch and E. Lang (eds.), Dimensional Adjectives. Springer-Verlag.
Bradlow, A. R. 2002. Confluent talker- and listener-related forces in clear speech production. In C. Gussenhoven and N. Warner (eds.), Laboratory Phonology 7. Mouton de Gruyter.
Carston, R. 1998. Pragmatics and the Explicit-Implicit Distinction. Ph.D. thesis, University College London.
Chambers, C. G., Magnuson, J. S., and Tanenhaus, M. K. 2004. Actions and affordances in syntactic ambiguity resolution. Journal of Experimental Psychology: Learning, Memory and Cognition 30, 687–696.
Clifton, C., Jr., Bock, J., and Radó, J. 2000. Effects of the focus particle ‘only’ and intrinsic contrast on comprehension of reduced relative clauses. In A. Kennedy, R. Radach, D. Heller, and J. Pynte (eds.), Reading as a Perceptual Process. Elsevier.
Clifton, C., Jr., and Ferreira, F. 1989. Ambiguity in context. Language and Cognitive Processes 4 (special issue), 77–103.
Crain, S., and Steedman, M. 1985. On not being led up the garden path: The use of context by the psychological parser. In D. Dowty, L. Karttunen, and A. Zwicky (eds.), Natural Language Parsing. Cambridge University Press.
Fox, B. A., and Thompson, S. A. 1990. A discourse explanation of the grammar of relative clauses in English conversation. Language 66, 297–316.
Gregory, M., Grodner, D., Joshi, A., and Sedivy, J. 2003. Adjectives and processing effort in production: So, uh, what are we doing during disfluencies? Paper presented at 16th annual CUNY conference on human sentence processing, Boston.
Grice, H. P. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and Semantics 3: Speech Acts. Academic Press.
Hanna, J. E., Tanenhaus, M. K., and Trueswell, J. C. 2003. The effects of common ground and perspective on domains of referential interpretation. Journal of Memory and Language 49, 43–61.
Isaacs, E. A., and Clark, H. H. 1987. References in conversations between experts and novices. Journal of Experimental Psychology: General 116, 26–37.
Jurafsky, D. 2003. Pragmatics and computational linguistics. In Laurence R. Horn and Gregory Ward (eds.), Handbook of Pragmatics. Blackwell.
Levinson, S. 2000. Presumptive Meanings. MIT Press.
Metzing, C., and Brennan, S. E. 2003. When conceptual pacts are broken: Partner-specific effects in the comprehension of referring expressions. Journal of Memory and Language 49, 201–213.
Nadig, A. S., and Sedivy, J. C. 2002. Evidence of perspective taking constraints in children’s on-line reference resolution. Psychological Science 13, 329–336.
Ni, W., and Crain, S. 1989. How to resolve structural ambiguities. In Proceedings of the 20th North East Linguistic Society Conference.
Ni, W., Crain, S., and Shankweiler, D. 1996a. Sidestepping garden paths: Assessing the contributions of syntax, semantics and plausibility in resolving ambiguities. Language and Cognitive Processes 11 (3), 283–334.
Rooth, M. 1985. Association with Focus. Ph.D. Dissertation, University of Massachusetts, Amherst.
Sadock, J. 1978. On testing for conversational implicature. In P. Cole (ed.), Syntax and Semantics, volume 9: Pragmatics. Academic Press.
Schober, M. F. 1998. Different kinds of conversational perspective-taking. In S. R. Fussell and R. J. Kreuz (eds.), Social and Cognitive Psychological Approaches to Interpersonal Communication. Erlbaum.
Sedivy, J. C. 2001. Evidence of Gricean expectations in on-line referential processing. Paper presented at 14th CUNY Conference on Sentence Processing.
Sedivy, J. C. 2002. Invoking discourse-based contrast sets and resolving syntactic ambiguities. Journal of Memory and Language 46 (2), 341–370.
Sedivy, J. C. 2003. Pragmatic versus form-based accounts of referential contrast: Evidence for effects of informativity expectations. Journal of Psycholinguistic Research 32 (1), 3–23.
Sedivy, J. C. 2005. Evaluating explanations for referential context effects: Evidence for Gricean mechanisms in on-line language interpretation. In J. Trueswell and M. Tanenhaus (eds.), World Situated Language Use: Psycholinguistic, Linguistic, and Computational Perspectives on Bridging the Product and Action Traditions. MIT Press.
Sedivy, J. C., Chambers, C., Tanenhaus, M., and Carlson, G. 1999. Achieving incremental semantic interpretation through contextual representation. Cognition 71, 109–147.
Siegel, M. 1980. Capturing the Adjective. Garland.
Sperber, D., and Wilson, D. 1986. Relevance. Harvard University Press.
Spivey-Knowlton, M., and Tanenhaus, M. 1994. Referential context and syntactic ambiguity resolution. In C. Clifton, L. Frazier, and K. Rayner (eds.), Perspectives on Sentence Processing. Erlbaum.
Steedman, M., and Altmann, G. 1989. Ambiguity in context: A reply. Language and Cognitive Processes 4 (special issue), 211–234.
Tanenhaus, M. 2003. Referential domains in language processing. Paper given at 2003 Architectures and Mechanisms of Language Processing conference, Glasgow.
Tanenhaus, M., Spivey-Knowlton, M., Eberhard, K., and Sedivy, J. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268, 1632–1634.
11
Referential Processing in Monologue and Dialogue with and without Access to Real-World Referents
Simon Garrod
In psycholinguistics there is a long tradition of studying referential processing: how language users produce and interpret referring expressions. It began with an interest in anaphora resolution in written or spoken narrative (Garnham 2001). Recently it has encompassed research on referential processing using the visual-world paradigm (Tanenhaus, Spivey-Knowlton, and Eberhard 1995) and work on reference in interactive communication (Brennan and Clark 1996; Garrod and Anderson 1987). These recent studies focus on situations that are quite different from the traditional reading or listening tasks. In contrast to the reading research, visual-world experiments provide an extralinguistic context against which utterances are interpreted, and referential communication tasks concentrate on how dialogue participants achieve referential alignment (i.e., come to a common understanding of what is being referred to) in even richer contexts. This paper sets out to evaluate the contribution of the recent work in relation to that of the earlier reading studies. The paper is organized into three main sections. The first considers the general nature of reference and the role of situation models in theories of referential language processing. The second contrasts the recent work on pronoun resolution using either eye tracking in reading or a visual-world paradigm. The third explores referential alignment in dialogue and shows how a recent visual-world-paradigm study throws new light on such alignment processes.
The Role of Situation Models
Reference features in a wide range of communication situations both linguistic and otherwise (Evans 1982). At one extreme you might refer to something simply by pointing to it. For example, consider the situation in which you respond to a shop assistant’s request by pointing to the item that you want to purchase. At the other extreme you might refer to something with a reflexive pronoun in a written text.1 So referential processing may call on a wide range
of information from quite different sources. When you point to an item in a shop, the crucial information relates to what you and the shop’s assistant can both see. However, when you are interpreting the reflexive pronoun, the crucial information may come from the syntactic analysis of the sentence in which the pronoun appears. This means that referential processing, even when restricted to linguistic reference, may be influenced by a range of factors. We need to bear this in mind when comparing studies using different paradigms (e.g., in a visual-world experiment as compared to a reading experiment). I will argue that one important difference is in the construction and use of situation models to support reference. Situation models are assumed to be multi-dimensional representations containing information about space, time, causality, intentionality, and currently relevant individuals (Zwaan and Radvansky 1998). They capture what people are “thinking about” while they understand a text, and therefore are in some sense within working memory (they can be contrasted with linguistic representations and with general knowledge). Situation models play an important role in most theories of referential processing (Garnham 2001; Johnson-Laird 1983; Sanford and Garrod 1981). The basic idea is that models of the “discourse world” mediate between references in the language and the entities they refer to. Under some circumstances, the reference may be to something that is not immediately accessible and may not even exist (e.g., a fictional character in a novel). Under other circumstances, the reference may be to something that is immediately accessible (e.g., a picture before you and your audience). I shall argue that models are needed in both cases to keep a record of the mapping between expression and referent, otherwise it would not always be possible to interpret anaphoric references to entities previously introduced into the discourse. 
Nevertheless, it is also important to recognize that the situation model plays a different role in these two cases. When there is no “available” real-world referent, as in the first case, the model acts as a “surrogate” representation, in a sense standing in for the world of discourse. This means that referential processing in this situation has two aspects: constructing the model as a representation of the world portrayed in the discourse and mapping references onto their referents in the model. As an example, consider sentences (1) and (2) below (adapted from Stenning 1975).
(1) In the morning Harry let out his dog Fido.
(2) In the evening he returned to find a starving beast.
The first sentence introduces two entities into the situation model, one for Harry and the other for his dog Fido. With the second sentence most readers
Figure 11.1
Visual context for interpreting sentences (1) and (2).
then interpret a starving beast as coreferential with Fido. In other words, the reader maps a starving beast onto the entity in the model corresponding to Fido. Stenning (1975) suggested that readers do this because the interpretation leads to the most parsimonious discourse model. In other words, rather than introduce a third referent into the discourse model, the interpretation keeps the model simple by merging the two referents into one; Fido becomes identified with the starving beast. What is interesting about the example is that readers make the coreferential interpretation despite the absence of cues in the text to support it. His dog Fido is not semantically associated with the description starving beast, nor is there a syntactic cue to indicate anaphora (i.e., starving beast is introduced with an indefinite rather than a definite NP). Now consider sentences (1) and (2) in the context of figure 11.1. Here most readers or viewers would interpret Fido in (1) as referring to the dog, but would interpret a starving beast in (2) as referring to the tiger. So listening to the sentence while viewing the pictures would change the referential interpretation. This simple illustration points to an important difference between referential processing in the two situations. With a visual context, what you see dominates the referential interpretation: what Stephen Dedalus called “the ineluctable modality of the visible” in Ulysses (Joyce 1932). So rather than go with the one linguistic antecedent, Fido, viewers now prefer to treat starving beast as referring to the most salient entity in the picture that matches the description given (i.e., the tiger). In the light of this example, it might be tempting to argue that once there is a real-world context, even just pictures of potential
referents, a situation model is superfluous for referential processing. With a picture the viewer simply links the linguistic reference directly to what is accessible from the picture. But this alternative account runs into problems with anaphora. For example, if the short text in (1) and (2) were to continue with (3), the reader would have to be able to work out that It referred either to the tiger or the dog. There is no straightforward way of doing this without access to a situation model that reflects, in some way, mappings between previous expressions and their referents. Consider how difficult it would be to interpret the pronoun in (3) given the picture without the preceding text in (1) and (2).
(3) It had had nothing to eat all day.
In addition, mapping the expression a starving beast onto the tiger in the picture has the effect of indexing the reference in memory (Glenberg and Robertson 1999). In other words, it leads to a special kind of memory representation that links the reference to features of the real-world situation as seen in the picture. This enables the viewer to represent the starving beast in memory as a particular and ferocious tiger. The difference between the two situations (that is, reading with and without a visual world) relates to how different sources of information define the situation model. Whereas someone reading a novel has to conjure a model (based on information in the text together with knowledge of what the text is about), someone listening to descriptions of a visual scene does not have to do this, because he already has a model based on what he can see in the picture. This means that we might expect to find different processes dominating referential interpretation in the two situations. In the first case, these will be processes aimed at constructing a coherent and parsimonious situation model on the basis of what has been read so far.
In the second case, they will be processes aimed at establishing mappings between the utterances under interpretation and a model based on the visual world in view. The role of the situation model also differs in monologue and dialogue processing. When one is interpreting monologue (e.g., reading a novel), the primary goal is to come up with a coherent model that is consistent with what is being read. If the novel is well written and one is a competent reader, then one should be able to come to an interpretation that matches roughly what the author intended. However, this does not depend on establishing any kind of consensus with the author. After all, she may well be long dead and gone. By contrast, establishing consensus is a primary goal of dialogue, and consensus requires alignment of situation models (see Pickering and Garrod 2004). Consider the short extract of dialogue shown in table 11.1, in which two players in a collaborative maze game are trying to establish precisely where each
Table 11.1
Example of maze-game dialogue from Garrod and Anderson 1987. Colons within dialogue mark noticeable pauses of less than 1 second. The positions A and B are describing are illustrated in figure 11.2.
1. B: O.K. Stan, let’s talk about this. Whereabouts--whereabouts are you?
2. A: Right: er: I’m: I’m extreme right.
3. B: Extreme right.
...
8. A: You know the extreme right, there’s one box.
9. B: Yeah right, the extreme right it’s sticking out like a sore thumb.
10. A: That’s where I am.
11. B: It’s like a right indicator.
12. A: Yes, and where are you?
13. B: Well I’m er: that right indicator you’ve got.
14. A: Yes.
15. B: The right indicator above that.
16. A: Yes.
17. B: Now if you go along there. You know where the right indicator above yours is?
18. A: Yes.
19. B: If you go along to the left: I’m in that box which is like: one, two boxes down, OK?
player’s token is located in a maze whose configuration is known to both of them (see figure 11.2).² Take B’s reference to A’s position in utterance (11) with the description right indicator. Clearly, the description is idiosyncratic but sufficient to establish consensus about where A’s token is located at that point. Furthermore, the consensual description is so effective for the interlocutors that right indicator is then used as the basis for subsequent location descriptions (see utterances 13–16 in table 11.1). What has happened here is that the interlocutors have aligned on a model in which the maze is seen as a collection of figures or patterns and right indicator refers to any pattern sticking out on the right (for a detailed discussion see Garrod and Anderson 1987). Alignment of reference and situation model seems to be essential for successful dialogue (Brennan and Clark 1996; Garrod and Clark 1993; Garrod and Doherty 1994; Pickering and Garrod 2004). In the same way that different kinds of processing may dominate referential resolution with vs. without a visual world, we would expect different kinds of processing to dominate reference resolution in dialogue vs. monologue. In particular, we would expect processes based on the interaction between interlocutors to be primary in dialogue (Clark 1996; Clark and Wilkes-Gibbs 1986).
Figure 11.2
A schematic representation of the maze being talked about in the transcript reproduced in table 11.1. The two positions being described in the transcript are indicated with the arrows.

Reference in Monologue: Results from Eye-Tracking Studies with and without a “Visual World”
As a background for evaluating the recent visual-world experiments with monologue, I begin by considering comparable studies in reading. In particular, I concentrate on studies investigating the time course of interpretation of pronouns. Because both the reading studies and the visual-world studies use eye-tracking techniques, it is possible to make direct comparisons of the time course of processing in the two cases. This is important because it turns out that many of the studies using the visual-world paradigm point to earlier reference resolution than is found in the reading studies. The reading studies that I consider here are by Garrod, Freudenthal, and Boyle (1994) on the time course of pronoun resolution and by Sturt (2003) on resolution of reflexives. I choose these because they are similar in certain key respects to visual-world experiments on pronoun resolution by Arnold, Eisenband, Brown-Schmidt, and Trueswell (2000) and on resolution of reflexives by Runner, Sussman, and Tanenhaus (2003).

Garrod et al. (1994) used materials containing contextual anomalies to determine when during reading participants integrate information associated
with the interpretation of pronouns with material from the preceding discourse. We used passages such as the following:

An incident in the pool
Female-agent context: Elizabeth was an inexperienced swimmer and would not have gone in if the male lifeguard had not been standing by the pool. However, she soon got out of her depth and began to wave her hands in a frenzy.
Male-agent context: Alexander was an inexperienced swimmer and would not have gone in if the male lifeguard had not been standing by the pool. However, he soon got out of her/his depth and began to wave his hands in a frenzy.
(a) Within seconds she jumped into the pool
(b) Within seconds she sank into the pool
(c) Within seconds he jumped into the pool
(d) Within seconds he sank into the pool

Notice that when the focused character is Elizabeth, as in the female-agent context, sentences (a) and (d) are implausible continuations, whereas sentences (b) and (c) are not. Furthermore, the implausibility of (a) and (d) comes from what we know about the relative contextual state of Elizabeth and the lifeguard; whereas Elizabeth can sink but not jump, the lifeguard can jump but not sink at that point in the story. So the question is whether readers are sensitive to this contextual anomaly at the point of encountering the critical verb. In other words, do readers spend longer on the verb jumped in (a) than sank in (b) and longer on sank in (d) than jumped in (c)?

The materials also contained a second manipulation in which the focused character Elizabeth was replaced by Alexander, as exemplified in the male-agent context. This meant that there were two context conditions: one in which the antecedents of the pronouns were gender differentiated (e.g., Elizabeth/the male lifeguard) and one in which they were not (e.g., Alexander/the male lifeguard). This manipulation was included to establish the influence of focus alone on resolution of the critical pronoun.

The reading times showed two effects.
First, there was evidence of an immediate influence of pronoun gender matching. This was reflected in the first-pass reading times on the verbs when the gender of the preceding pronoun mismatched the gender of the focused antecedent. In other words, readers spent longer on jumped in (c) than on sank in (b) when the focused character was the gender-differentiated Elizabeth. The second effect related to the point at which readers detected the anomaly. When the pronoun unambiguously
identified the focused character in the story (e.g., Elizabeth in the example shown), readers spent longer in first-pass reading of the anomalous verb [i.e., longer on jumped in (a) than sank in (b)]. This indicates an immediate integration of the context into the interpretation of the sentence. However, when the pronoun either identified only the non-focused character or was ambiguous, the anomaly (as indexed by the difference in reading time between contextually appropriate verb and anomalous verb) was detected only later in the sentence. This indicated a delayed integration of the context.

Taken together, the two effects point to two stages in the processing of the pronouns. First, there is a stage in which the gender information is taken into account. This yields the immediate effect on reading times of a mismatch between the gender of the pronoun and the gender of the focused character. Second, there is an integration stage in which the putative antecedent of the pronoun (either the focused antecedent if it matches in gender, otherwise a non-focused antecedent that matches in gender) is then evaluated in relation to the material in the rest of the sentence. Crucially, this second integration stage produces an immediate commitment to one antecedent only when the pronoun is unambiguous and identifies the currently focused antecedent. This finding is consistent with a two-stage account that separates what Sanford, Garrod, Lucas, and Henderson (1984) called bonding from resolution of the pronoun. Sanford et al. argued that the initial bonding stage identifies possible antecedents on the basis of gender or number, which are later evaluated in relation to the overall situation model.

A similar two-stage process has been identified in resolving role anaphors, such as when identifying the pen while reading sentence (5) as referring to the instrument used in writing a letter in sentence (4).

(4) The teacher was writing a letter of complaint to a parent.
(5) She was annoyed when the pen dropped to the floor.

Garrod and Terras (2000) found evidence that readers initially link the anaphor (e.g., the pen) to the verb (e.g., writing) on the basis of a memory matching process similar to the bonding stage in pronoun interpretation. At a subsequent stage, readers evaluate the match on the basis of the context as a whole. Garrod and Terras found evidence for bonding for a dominant instrument, such as pen for writing, even when the instrument was inappropriate in the context (e.g., writing an exercise on the blackboard). Experiments using antecedent probe-recognition techniques also suggest that encountering a pronoun suppresses the memory activation of non-matching antecedents (Gernsbacher 1989). It has been suggested that suppression of non-antecedents may arise from bonding (Garrod et al. 1994).
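The two-stage account of pronoun resolution described above can be illustrated with a toy sketch. This is not a model proposed in the studies cited; it only mirrors the logic of bonding (candidate antecedents selected by gender, with the focused character considered first) followed by resolution (evaluation against the context). The `Entity` class, the `plausible` sets, and the function names are all hypothetical and were invented for the swimming-pool passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    gender: str            # "f" or "m"
    focused: bool          # currently focused character in the discourse?
    plausible: frozenset   # hypothetical: verbs that fit this character in context


def bond(pronoun_gender, entities):
    """Stage 1 (bonding): gender-matching candidates, focused character first."""
    matches = [e for e in entities if e.gender == pronoun_gender]
    return sorted(matches, key=lambda e: not e.focused)


def resolve(candidates, verb):
    """Stage 2 (resolution): first candidate for which the verb is plausible."""
    for entity in candidates:
        if verb in entity.plausible:
            return entity
    return None  # no candidate fits: the continuation is anomalous


# Female-agent context from the swimming-pool passage (plausibility sets are invented).
elizabeth = Entity("Elizabeth", "f", True, frozenset({"sank"}))
lifeguard = Entity("lifeguard", "m", False, frozenset({"jumped"}))
context = [lifeguard, elizabeth]

she = bond("f", context)            # only Elizabeth bonds to "she"
anomalous = resolve(she, "jumped")  # sentence (a): no plausible antecedent
fine = resolve(she, "sank")         # sentence (b): Elizabeth
```

In the male-agent context, where Alexander replaces Elizabeth, `bond("m", ...)` would return two candidates with the focused one first, which is one way to picture why commitment to a single antecedent might be delayed when the pronoun is ambiguous.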
Figure 11.3
Visual context scenes for example (6). Reprinted from Arnold et al. 2000 with permission from Elsevier.
These results raise the question of whether interpreting pronouns in relation to entities in a visual scene also leads to a two-stage process. So let us compare the results from the study by Garrod et al. (1994) with those from a visual-world experiment that used very similar materials. Arnold et al. (2000, experiment 2) had participants evaluate spoken passages like (6) below against pictures of the characters (e.g., Donald Duck, Mickey Mouse, Minnie Mouse) while tracking their eye gazes (an example set of the pictures is shown in figure 11.3).

(6) Donald is bringing some mail to Mickey/Minnie. He’s sauntering down the hill, while a violent storm is beginning. He’s/She’s carrying an umbrella and it looks like they’re both going to need it.

Arnold et al. were interested in establishing when listeners fixated the appropriate referent for the pronoun (she/he) in the third sentence in relation to the point at which the pronoun had been presented. Like Garrod et al. (1994),
Arnold et al. manipulated both the focus of the potential antecedent and the influence of gender matching. There was always one focused antecedent (in this case Donald) and either a gender-matched (Mickey) or a gender-mismatched (Minnie) alternative antecedent. To measure the point of resolution, Arnold et al. compared proportions of fixations to the antecedents as a function of the time after hearing the critical pronoun.

For three of the conditions the pattern of fixations was essentially identical. Within 400 milliseconds of the onset of the pronoun, viewers were more likely to fixate the intended referent either when the antecedents were differentiated in terms of gender matching (i.e., when the antecedents were Donald and Minnie in the picture) or when the gender-ambiguous pronoun was consistent with the focused character (i.e., he in the context containing Donald and Mickey). However, when the ambiguous pronoun turned out to be appropriate only for the non-focused character (e.g., when Mickey rather than Donald was carrying the umbrella) viewers would initially tend to fixate the focused character Donald and subsequently show no preference for either character. In fact, in this condition 41 percent of viewers responded that the passage did not match the picture even though it did. In other words, it would match the picture if the pronoun were taken to refer to the non-focused antecedent (e.g., he referring to Mickey).

So in two respects the results of Arnold et al. contrast with those of Garrod et al. Whereas in the reading experiment immediate resolution occurred only when the pronoun unambiguously matched the focused antecedent, in the visual-world study resolution was just as fast when the unambiguous pronoun referred to the non-focused individual. The second difference relates to the viewers’ unwillingness to entertain a non-preferred interpretation of the pronoun as referring to a non-focused antecedent when the pronoun was ambiguous.
In the reading-time study the question-answering results indicated that readers were able to adopt the non-preferred interpretation even though this occurred only after they had encountered the disambiguating verb.

What is striking about the visual-world results is that focus appears to have a strong effect when there is no gender differentiation between antecedents but to have no effect whatsoever when there is gender differentiation. One possible explanation for this is that viewers have already coded the characters in the scene on the basis of gender (Donald = he, Minnie = she) and simply look at the gender-appropriate character as soon as they hear the relevant gender-marked pronoun. This would amount to treating the pronouns as deictic devices not dependent for their interpretation on anything in the previous discourse. In terms of the proposed distinction between bonding and resolution this would mean that in the visual-world situation the bonding would be to the character
identified in the scene rather than on the basis of any previous mention in the text. Furthermore, it looks as if bonding and resolution coincide in the visual-world situation but occur as distinct processes in reading. The reason for this difference might lie in specific attentional factors that come into play when viewing a scene. It has often been claimed that attending to a particular element in a visual scene inhibits attention to alternatives (see e.g. Desimone and Duncan 1995). If this is the case in the visual-world paradigm, it may be that once a particular referent has been fixated other potential referents are automatically made less accessible and hence less likely to be considered as possible antecedents for the pronoun. This would relate to the proposal, made in the first section of the present paper, that in situations where referents are visually accessible our attention to the scene exerts more influence on the situation model than the language can.

A possible alternative explanation for the results is that the pattern of fixations in the visual-world task reflects only the bonding process (i.e., the initial mapping of pronoun onto potential antecedents). Hence, it would not be sensitive to subsequent dynamic shifts in interpretation. However, this seems unlikely in the light of other results from visual-world tasks in which shifts in fixation can reflect syntactic reanalysis (Tanenhaus et al. 1995). It would also not explain why listeners in the visual-world task have so much difficulty in adopting a non-preferred interpretation that they can adopt when reading text.

Finally, we also need to recognize that viewers in the visual-world study are responding to a spoken utterance that contains prosodic cues not present in the written material used by Garrod et al. (1994). So one factor that might particularly affect interpretation of the ambiguous pronoun is whether or not it receives stress.
It might be expected that an unstressed pronoun would be taken to refer to the default or focused antecedent whereas a stressed pronoun would be taken to refer to the alternative or non-focused antecedent. If the spoken stimuli all used unstressed pronouns in the crucial sentence, this would always bias the listener toward the default focused antecedent reading. Nevertheless, prosodic factors could not explain the absence of any focusing effect for the interpretation of the unambiguous pronouns.

In the second reading study that I consider, Sturt (2003) looked at the interpretation of reflexive pronouns. Sturt was interested in examining the on-line application of binding constraints in interpreting reflexives by tracking participants’ eye movements while they read materials like those in (7).

(7) a. Jonathan was pretty worried at the City Hospital. He remembered that the surgeon had pricked himself with a dirty needle. . . .
b. Jennifer was pretty worried at the City Hospital. She remembered that the surgeon had pricked himself with a dirty needle. . . .
c. Jonathan was pretty worried at the City Hospital. He remembered that the surgeon had pricked herself with a dirty needle. . . .
d. Jennifer was pretty worried at the City Hospital. She remembered that the surgeon had pricked herself with a dirty needle. . . .

Sturt reasoned that if binding constraints applied immediately then readers should have problems when they encounter the pronoun herself in (c) and (d) because its gender does not match the stereotypical gender of the grammatically accessible antecedent surgeon. And, in fact, he found evidence of difficulty as indicated in the first-pass reading time on the reflexives in (c) and (d) as compared to (a) and (b). But this was not all that occurred. Sturt also found evidence of difficulty later in the sentence associated with the gender of the grammatically inaccessible antecedent (Jennifer vs. Jonathan in (a) and (b)). Readers showed significantly longer second-pass reading times both at the end of the sentence and while re-reading the beginning of the sentence when the inaccessible antecedent did not match the gender of the pronoun (e.g., Jennifer/himself) than when it did match (e.g., Jonathan/himself). A follow-up study also showed that readers commonly misinterpreted the reflexive pronoun when it matched the inaccessible antecedent in gender while mismatching the accessible antecedent (e.g., with 31 percent of readers interpreting herself as referring to Jennifer in condition (d)). Taken together, these results again point to an extended process of resolution after an initial immediate bonding of the pronoun onto potential antecedents.
Interestingly, in this case the bonding process seemed to be sensitive to binding constraints, as indicated in the immediate pattern of eye fixations, whereas the later resolution process could be influenced by binding-theory inaccessible antecedents as well. Kennison (2003) reported similar interference from binding-theory inaccessible antecedents for regular pronouns and possessive pronouns in self-paced reading.

Again, this pattern of results seems to conflict with the pattern found in a visual-world study by Runner et al. (2003) of the interpretation of reflexive pronouns. Although the study by Runner et al. was primarily concerned with interpretation of reflexive pronouns in the special context of possessive picture noun phrases, certain features of it are comparable with features of Sturt’s study. A participant had to pick up named dolls and follow instructions to have the doll touch a picture of himself or other dolls confronting the participant. A typical instruction might be “Look at Ken. Have Joe touch Harry’s picture of
him/himself.”³ Runner et al. recorded the participants’ eye movements in order to establish when they looked at the putative antecedent for the pronoun in relation to hearing it in the instruction. Like Arnold et al., they found immediate preferential looking to the chosen antecedent and no evidence of any subsequent shift in preference. In other words, in this case there was no evidence for any delayed reinterpretations of the kind suggested in Sturt’s reading results. In this respect, both the results of Arnold et al. and those of Runner et al. are consistent with the idea that in a visual-world situation there is no distinction between bonding and resolution.

To summarize, the few visual-world studies on pronominal reference to date point to an interesting difference between referential processing in reading and the visual-world situation. Whereas readers and listeners show evidence of very early mapping between pronouns and potential antecedents in both cases, only in the reading studies is there evidence of a separation between initial mapping processes and final resolution of the pronoun. As I suggested above, I suspect that the difference arises from the way in which a visual world captures attention. Once a reference is indexed onto a referent in the visual scene, other potential referents are in some sense made less accessible. As a consequence, the initial referential choice tends to dominate subsequent processing.

Experiments on Referential Processing in Dialogue
At the end of the first section, I pointed out that successful reference in dialogue depends on establishing consensus. It is not sufficient for a speaker to produce an accurate description of a referent if his interlocutor fails to understand the description in the same way. What is crucial is that interlocutors align their situation models (Garrod and Pickering 2004; Pickering and Garrod 2004). This process of alignment is nicely illustrated by the maze dialogue shown in table 11.1. Over the first 11 utterances the two players come to a consensus that speaker A is located at “the right indicator.” Notice that this reference would be infelicitous in most monologues, unless the speaker or writer had spent some time explicating the reference in the context. Garrod and Anderson (1987) found that such idiosyncratic referential descriptions were very common in maze dialogues. But they also showed that both interlocutors invariably aligned on the same kind of idiosyncratic description. For instance, after the dialogue extract shown in table 11.1 both A and B went on to talk of top, middle, and bottom right indicators.

But how does this kind of alignment of reference in terms of situation models come about? Pickering and Garrod (2004) propose an interactive alignment account whereby interlocutors align their representations at many linguistic levels (e.g.,
phonological, lexical, syntactic, semantic, and the level of the situation model). According to our account, each level of representation is aligned between interlocutors via an automatic process that we treat as a form of priming, with alignment at one level automatically strengthening alignment at other levels. One of the consequences of interactive alignment is that production and comprehension become interdependent because they operate on a single theme created by both interlocutors at the same time. Hence, there is a high degree of overlap in the utterances produced, and the computations that lead to these utterances are reused (Schenkein 1980; Tannen 1989). As the conversation proceeds, this alignment leads to routinization, in which computations in production and comprehension become increasingly fixed. So utterances involve an increasing proportion of expressions whose form and interpretation are partly or completely frozen for the purposes of the conversation, as is well illustrated by the expression right indicator in table 11.1. Such routinized expressions are similar to stock phrases and idioms (Jackendoff 2002), except that they relate to the particular interaction (Pickering and Garrod 2005). Routinization greatly simplifies the production process (Kuiper 1996) and gets around problems of ambiguity resolution in comprehension.

To date the evidence for routinization has come mainly from examining the incidence of reuse of information in dialogue utterances, either in constrained dialogue tasks (Brennan and Clark 1996; Garrod and Anderson 1987; Garrod and Clark 1993; Garrod and Doherty 1994) or in corpora of naturally occurring dialogue. In relation to corpus analysis, Aijmer (1996) estimates that up to 70 percent of words in the London-Lund speech corpus occur in recurrent word combinations (see Altenberg 1990). In other words, they are likely to occur in routine expressions in the dialogues.
However, recently the visual-world paradigm has been extended to dialogue settings and enables some interesting tests of the processing consequences of routinization. The tests relate to lexical cohort competitor effects and point-of-disambiguation effects that have been observed in monologue visual-world experiments. When listeners were instructed to “pick up a racket,” fixations to the target object began as early as 200 msec after the onset of the noun (Allopenna, Magnuson, and Tanenhaus 1998). Eye movements launched at this point in the speech stream are equally likely to be directed to the eventual referent and to other objects in the context with names that are also consistent with the speech signal, such as ‘raccoon’. (A picture of a racket and a picture of a raccoon were present.) However, fixations on these cohort competitors are reduced or even eliminated when the context makes the competitor an implausible referent. This effect of cohort competitors on referential processing is similar to that found in spoken-word recognition in which cohort competitors are activated as the word is being heard (Marslen-Wilson 1987). Point-of-disambiguation effects have also been observed in visual-world experiments. Eberhard, Spivey-Knowlton, Sedivy, and Tanenhaus (1995) presented displays containing a variety of differently colored shapes as listeners followed instructions such as Click on the red triangle. In some of the trials the color of the target item was different from the color of all other items in the array, so on these trials the color alone served to disambiguate the reference. On the remaining trials the target item was the same color as other items in the array, so only the shape information disambiguated the reference. Eberhard et al. found that eye movements to the target increased dramatically immediately after the point of disambiguation (POD), whether it was at the adjective (i.e., the color) or at the noun (i.e., the shape). Taken together, these results indicate that listeners in monologue situations quickly integrate information from the speech with that from the visual scene. They also indicate that the listeners are sensitive to alternative possible interpretations of both the acoustic information in the speech stream (as indexed by the cohort-competitor effects) and the compositional semantic information in the noun phrase itself (as indicated by sensitivity to the disambiguation afforded by the adjective in the referential description).

How can these effects be used to test predictions about the comprehension of routinized referential expressions in dialogue? The crucial point about routines is that they enable listeners to short-circuit various stages of the process of production and comprehension because they fix the links between different levels of analysis. For example, in the routinized description right indicator (see table 11.1) this whole sub-phrase maps directly onto a particular element in the interlocutors’ situation model of the maze.
Pickering and Garrod (2004) argue that this information (i.e., the direct mapping) should form part of what we call the implicit common ground — the body of aligned information that represents a common ground between interlocutors at any point in a dialogue. This means that speakers can directly generate the description without having to entertain alternative expressions for adjective or noun (i.e., speakers do not consider rightmost or pointer as alternative lexicalizations of right and indicator). It also means that listeners can quickly identify the referent by directly retrieving the interpretation for the whole expression, in effect treating it like any other lexical item. So comprehension should proceed differently in dialogue situations from monologue situations to the extent that the interlocutors are using routinized referential expressions. Both cohort-competitor effects and point-of-disambiguation effects should disappear or be substantially reduced under such circumstances, because the expressions have taken on a direct interpretation in relation to the particular dialogue.
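The cohort-competitor and point-of-disambiguation notions used above can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: spelling stands in for the unfolding acoustic signal, display items are reduced to (color, shape) pairs, and the function names `cohort` and `point_of_disambiguation` are hypothetical rather than taken from the studies cited.

```python
def cohort(heard_so_far, names):
    """Names still consistent with the input heard so far (spelling stands in for sound)."""
    return [name for name in names if name.startswith(heard_so_far)]


def point_of_disambiguation(words, display, target):
    """Index of the first word after which only the target remains, else None."""
    candidates = list(display)
    for index, word in enumerate(words):
        # Keep only the display items that match every word heard so far.
        candidates = [item for item in candidates if word in item]
        if candidates == [target]:
            return index
    return None  # the description never singles out the target


target = ("red", "triangle")
unique_color = [target, ("blue", "triangle"), ("green", "square")]
shared_color = [target, ("red", "square"), ("blue", "circle")]

early = point_of_disambiguation(["red", "triangle"], unique_color, target)  # at the adjective
late = point_of_disambiguation(["red", "triangle"], shared_color, target)   # at the noun
competitors = cohort("rac", ["racket", "raccoon", "candle"])
```

On the routinization account, a routinized expression would not need this word-by-word narrowing at all, since the whole expression is taken to be retrieved with its interpretation already fixed; the sketch models only the incremental monologue case.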
Brown-Schmidt, Campana, and Tanenhaus (2004) carried out a dialogue version of a visual-world experiment that affords a test of these predictions. They had pairs of interlocutors instruct each other about where to place blocks in an array such that the blocks on each of their boards matched up by the end of the experiment. The whole arrangement was designed so that the items being described contained pictures similar to those used in the above-described study by Allopenna et al. So it was possible to investigate cohort-competitor effects. The arrangement was also designed to encourage complex NP descriptions to make it possible to investigate point-of-disambiguation effects.

Looking first at the cohort-competitor effects, they were surprised to find no evidence of such effects in the dialogue setting. In two and a half hours of conversation they found only one look to a cohort competitor in the 300 cases where it might have been expected (i.e., when the competitor was less than 3.5 inches from the target). In other words, interlocutors treated descriptions as uniquely identifying the intended referent.

In terms of the point-of-disambiguation effects, the results were again quite striking. Brown-Schmidt et al. were able to compare fixations on target objects while listeners were interpreting either disambiguated descriptions (i.e., those having a clear point at which the description identified a unique item in the visual scene) or ambiguous descriptions (i.e., those that remained consistent with more than one item in the visual scene when completed). For the disambiguated descriptions they found evidence of a POD effect. Viewers were more likely to look at the target after the point of disambiguation than before the point of disambiguation. However, there was also evidence that looks to the target were more likely than looks to the competitor even before the point of disambiguation (personal communication).
The interesting finding related to the ambiguous descriptions, which represented 55 percent of all the descriptions produced in the dialogues. First, ambiguous descriptions elicited more looks to the target than the disambiguated (i.e., unambiguous) descriptions. Second, for ambiguous descriptions fixations converged on the target within 300 msec of the onset of the NP (this compared with convergence 900 msec after NP onset for the disambiguated descriptions).

On the surface the results seem surprising. After all, one would expect listeners to have more difficulty identifying referents for ambiguous than unambiguous descriptions: they should be less likely to fixate the correct target and should take longer to start fixating it. However, on closer examination the results are consistent with the interactive alignment account once we consider how the implicit common ground established by the previous conversation can restrict the interpretation of references.

Consider first the lack of cohort-competitor effects. If the description is represented in implicit common ground as a routine, then its semantic interpretation is going to dominate the access to the meaning of any cohort competitor in the same way that discourse context interacts with meaning access in the classic Cohort Model of word recognition (Marslen-Wilson and Tyler 1980). Now consider the absence of a POD effect. Again, if the description is a routine, its interpretation will be fixed with respect to the preceding dialogue and the entities represented in implicit common ground at that point. Hence, it is reasonable to expect earlier fixations to the target for these descriptions (faster interpretation) as well as more fixations (functional uniqueness of interpretation).

This way of interpreting the results depends on the assumption that the ambiguous referential descriptions corresponded to routinized expressions. There was some evidence in favor of this assumption, but there were also differences between ambiguous and disambiguated descriptions that pointed to an additional influence of interactive alignment when referring to things in a dialogue. Brown-Schmidt et al. carried out a subsequent analysis of the factors that might have enabled listeners to interpret the ambiguous references. They considered recency of previous mention, whether or not the reference used a “collaborative term” (i.e., was idiosyncratic with a history of introduction and acceptance by both interlocutors, as with right indicator in table 11.1), the proximity of the alternatives to the target reference, and general task constraints (e.g., physical limitations on where a block could be placed at the time of the reference). It turned out that recency of mention and use of collaborative terms accounted for 66 percent of the ambiguous references (49 percent and 17 percent respectively). However, important additional factors included whether the competitor was closer to the last-mentioned referent than the target and whether the competitor also fit the current task constraints.
Recency of mention and use of collaborative terms are consistent with routinization of the referential expression because they indicate a recent dialogue history sufficient to support routinization. However, the other two factors point to an additional feature of collaborative tasks that may have influenced referential processing: the degree to which interlocutors align on particular ways of referring to things in the context of the task itself. Brown-Schmidt et al. (2004) noted that their participants adopted particular strategies for describing the blocks they were manipulating (i.e., blocks that had to be moved in order to make the boards match each other). For instance, they tended to sort out one section of the board at a time, and they tended to talk about blocks that were close to the ones most recently mentioned. This meant that many references were predictable so long as both interlocutors were aligned on the same strategy. In effect, the block that was most salient for the speaker would also tend to be most salient for the listener on the basis of
an aligned strategy. Hence, less specific (i.e., ambiguous) references could be treated as unambiguous in the context of the aligned strategy. The visual-world paradigm, when used in conjunction with the more interactive dialogue task, uncovers special principles that apply to making successful reference in dialogue. In particular, these principles relate to the degree to which dialogue, through the interaction, enables interlocutors to align on particular referential expressions (i.e., routines) and to align on particular strategies for reference in relation to the interactive task at hand.
Final Discussion
I began the paper by discussing the nature of referential processing in various communicative settings. I argued that in all settings situation models mediate between the language and the discourse world. This is true whether or not there is a real or a visual world to which the expressions in the language refer. However, I also argued that the model plays different roles in the different settings. In the absence of a visual world the model acts as a surrogate representation, whereas in the context of a visual world it acts as an interface between the language and a visual scene. In dialogue, in which consensus is so important, models play a different role again, because interlocutors must align their models if reference is to succeed. To illustrate these points, I considered a sample of the recent research on referential processing in a range of settings, from reading to dialogue. Despite the many similarities in the time course of referential processing in the different settings, the research points to interesting differences. In reading tasks there is evidence for an extended process of pronoun resolution with two stages: bonding, in which potential antecedent referents are entertained, and resolution, in which antecedent information is integrated into discourse interpretation. However, in monologue-based visual-world tasks it seems that bonding and resolution occur together. I argued that this reflects the degree to which the visual world drives the construction of the situation model. In particular, I argued that attending to specific referents in the visual scene may affect the accessibility of alternative antecedent referents and so dominate the resolution process. In relation to dialogue, the recent interactive visual-world study by Brown-Schmidt et al. (2004) points to interesting differences in referential processing from the more traditional monologue situation.
In particular, findings on the lack of cohort-competitor effects and on the rapid interpretation of elliptical and ambiguous references support Pickering and Garrod’s (2004) claims about routinization in dialogue processing. They also point to the importance of
aligned referential strategies in both generating and interpreting references in a dialogue setting. Taking all these findings together, it is interesting to observe how much the study of referential processing has developed since the earlier work on anaphora.
Notes
1. It has been argued that reflexive pronouns are not, strictly speaking, referential, because they act like syntactically bound variables (Bosch 1983). However, reflexive pronouns can also take on logophoric interpretations in which they do not act as syntactically bound variables (Reinhart and Reuland 1993).
2. In more detail, the procedure was as follows. Two players are confronted with two computer-controlled mazes that do not differ in relevant ways. They are seated in different rooms but communicate via an audio link. Each player has a token representing his current position in the maze that is only visible to him, and they take turns moving the tokens through the maze one position at a time until both players have reached their respective goal positions. At any time approximately half of the paths in each maze are closed. The closed paths are in different positions for each player and are only visible to that player. What makes the game collaborative is that the mazes are linked in such a way that when one player lands in a position where the other player’s maze has a “switch” box, all his closed paths open and all his open paths close. This means that the players have to keep track of each other’s positions to successfully negotiate their mazes. The dialogue shown in table 11.1 is taken from a conversation that occurred at the beginning of a game.
3. The viewer was asked to look first at Ken so that there would be an antecedent for the pronoun that was not in the same sentence as the pronoun.
References
Aijmer, K. 1996. Conversational Routines in English: Convention and Creativity. Longman.
Allopenna, P. D., Magnuson, J. S., and Tanenhaus, M. K. 1998. Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language 38, 419–439.
Altenberg, B. 1990. Speech as linear composition. Paper presented at Fourth Nordic Conference for English Studies, University of Copenhagen.
Arnold, J. E., Eisenband, J. G., Brown-Schmidt, S., and Trueswell, J. C. 2000. The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition 76, B13–B26.
Bosch, P. 1983. Agreement and Anaphora: A Study of the Role of Pronouns in Syntax and Discourse. Academic Press.
Brennan, S. E., and Clark, H. H. 1996. Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition 22, 1482–1493.
Brown-Schmidt, S., Campana, E., and Tanenhaus, M. K. 2004. Real-time reference resolution by naive participants during a task-based unscripted conversation. In J. C. Trueswell and M. K. Tanenhaus (eds.), Approaches to Studying World-Situated Language Use: Bridging the Language-as-Product and Language-as-Action Traditions. MIT Press.
Clark, H. H. 1996. Using Language. Cambridge University Press.
Clark, H. H., and Wilkes-Gibbs, D. 1986. Referring as a collaborative process. Cognition 22, 1–39.
Desimone, R., and Duncan, J. 1995. Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18, 193–222.
Eberhard, K. M., Spivey-Knowlton, M. J., Sedivy, J. C., and Tanenhaus, M. K. 1995. Eye movements as a window into spoken language comprehension in natural contexts. Journal of Psycholinguistic Research 24, 409–436.
Evans, G. 1982. Varieties of Reference. Oxford University Press.
Garnham, A. 2001. Mental Models and the Interpretation of Anaphora. Psychology Press.
Garrod, S., and Anderson, A. 1987. Saying what you mean in dialogue: A study in conceptual and semantic co-ordination. Cognition 27, 181–218.
Garrod, S., and Clark, A. 1993. The development of dialogue co-ordination skills in schoolchildren. Language and Cognitive Processes 8 (February), 101–126.
Garrod, S., and Doherty, G. 1994. Conversation, co-ordination and convention: An empirical investigation of how groups establish linguistic conventions. Cognition 53, 181–215.
Garrod, S. C., Freudenthal, D., and Boyle, E. 1994. The role of different types of anaphor in the on-line resolution of sentences in a discourse. Journal of Memory and Language 33, 39–68.
Garrod, S., and Pickering, M. J. 2004. Why is conversation so easy? Trends in Cognitive Sciences 8, 8–11.
Garrod, S., and Terras, M. 2000. The contribution of lexical and situational knowledge to resolving discourse roles: Bonding and resolution. Journal of Memory and Language 42, 526–544.
Gernsbacher, M. A. 1989. Mechanisms that improve referential access. Cognition 32, 99–156.
Glenberg, A. M., and Robertson, D. A. 1999. Indexical understanding of instructions. Discourse Processes 28, 1–25.
Jackendoff, R. 2002. Foundations of Language. Oxford University Press.
Johnson-Laird, P. N. 1983. Mental Models: Toward a Cognitive Science of Language, Inference, and Consciousness. Harvard University Press.
Joyce, J. 1932. Ulysses. Odyssey.
Kennison, S. M. 2003. Comprehending the pronouns her, him, his: Implications for theories of referential processing. Journal of Memory and Language 49, 335–352.
Kuiper, K. 1996. Smooth Talkers: The Linguistic Performance of Auctioneers and Sportscasters. Erlbaum.
Marslen-Wilson, W. D. 1987. Functional parallelism in spoken word-recognition. Cognition 25, 71–102.
Marslen-Wilson, W. D., and Tyler, L. K. 1980. The temporal structure of spoken language comprehension. Cognition 8, 1–71.
Pickering, M. J., and Garrod, S. 2004. Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences 27, 169–225.
Pickering, M. J., and Garrod, S. 2005. Routinization in the interactive-alignment model of dialogue. In A. Cutler (ed.), Twenty-First Century Psycholinguistics: Four Cornerstones. Erlbaum.
Reinhart, T., and Reuland, E. 1993. Reflexivity. Linguistic Inquiry 24, 657–720.
Runner, J. T., Sussman, R. S., and Tanenhaus, M. K. 2003. Assignment of reference to reflexives and pronouns in picture noun phrases: Evidence from eye movements. Cognition 89, B1–B13.
Sanford, A. J., and Garrod, S. C. 1981. Understanding Written Language. Wiley.
Sanford, A. J., Garrod, S. C., Lucas, A., and Henderson, R. 1984. Pronouns without antecedents? Journal of Semantics 2, 303–318.
Schenkein, J. 1980. A taxonomy of repeating action sequences in natural conversation. In B. Butterworth (ed.), Language Production, volume 1. Academic Press.
Sturt, P. 2003. The time-course of the application of binding constraints in reference resolution. Journal of Memory and Language 48, 542–562.
Tanenhaus, M. K., Spivey-Knowlton, M. J., and Eberhard, K. M. 1995. Integration of visual and linguistic information in spoken language comprehension. Science 268 (5217), 1632–1634.
Tannen, D. 1989. Talking Voices: Repetition, Dialogue and Imagery in Conversational Discourse. Cambridge University Press.
Zwaan, R. A., and Radvansky, G. A. 1998. Situation models in language comprehension and memory. Psychological Bulletin 123, 162–185.
III
Adults’ Processing of Reference: Evidence from Corpora and Reading Experiments
12
Noun-Phrase Anaphor Resolution: Antecedent Focus, Semantic Overlap, and the Informational Load Hypothesis
H. Wind Cowles and Alan Garnham
One area of language research that has received a great deal of attention, both theoretical and empirical, is the use of anaphoric expressions. Such expressions can be thought of as serving two functions. The primary function is to refer back to a referent from previous discourse; the secondary, but no less important, function is to help provide discourse coherence and structure. Third-person pronouns such as he or she are anaphoric expressions par excellence, but fuller anaphoric expressions, including demonstrative and definite noun phrases (NPs) such as that woman and the woman, are also used in natural discourse. In this paper we shall focus primarily on definite NP anaphor resolution, and in particular we shall examine the interaction of two factors that are related to the identification of antecedents: the focus status of the antecedent and the semantic relationship between the antecedent and the anaphor (including semantic overlap). After presenting these factors, we will discuss one particular approach to anaphor resolution, Almor’s (1999) Informational Load Hypothesis (ILH), and present three experiments that examined the findings presented by Almor (ibid.). The results of these experiments will lead us to consider in more detail the secondary, discourse-structuring function of anaphoric expressions.
Factors in Identifying and Mapping Antecedents
All anaphors pose the question of how to correctly identify and map onto an antecedent. A primary question for pronominal anaphors is how this identification process occurs in the absence of semantic information that could allow the identification of a single antecedent when several possible antecedents exist. However, NP anaphors can provide additional semantic information, increasing the likelihood that there will be only one unambiguous antecedent for an NP anaphor at any given time. At first, this might appear to make the question of identification trivial for NP anaphors (and thus uninteresting). It certainly suggests that antecedent identification should be easier for NP anaphors, and
Garrod (1994) argued that, at least for the low-level matching process of anaphor bonding (Sanford 1985), fuller forms of anaphors have an advantage over pronouns because their increased semantic specificity allows them to more easily identify their antecedents. However, despite the relative ease of antecedent identification for NP anaphors, the resolution process is not trivial, and in fact it appears to be sensitive (as it is for pronoun anaphors) to non-semantic properties of the antecedent, including its current discourse status (in particular, whether the antecedent is in focus) and the number of antecedent competitors. Garnham (1989) found longer reading times for anaphor-containing sentences when the antecedent was placed in a conjunct with a semantically similar noun phrase. The longer reading times indicated competition between the two possible antecedents. Thus, while it appears that extra semantic information may allow for a more unambiguous mapping, the available evidence does not preclude a process in which other possible antecedents are taken into account before the correct antecedent is selected. In the following two subsections we will discuss two major factors involved in the process of antecedent identification and mapping: the focus status of the antecedent and the semantic relationship between the antecedent and the anaphor.
Focus Status of the Antecedent
The focus status of the antecedent appears to play an important role in anaphor resolution. The term ‘focus’ has been used to denote several (sometimes overlapping, sometimes conflicting) concepts in the linguistics and psycholinguistic literatures (cf. Gundel 1999). In this paper, we use the term in its psychological sense to mean that the representation of a referent is at the center of attention and is highly activated. According to several accounts of anaphor resolution, a focused antecedent should be easier to map onto an anaphor than a non-focused, less accessible antecedent. For pronouns, which have very little semantic content to help in antecedent identification, the focus status of potential antecedents has been argued to be especially relevant (e.g., in Centering Theory; see Grosz, Joshi, and Weinstein 1995). Two syntactic positions in particular have been shown to increase the accessibility of their referents and thus place them in focus: grammatical subjects (Grosz et al. 1995) and clefts (Arnold 1998; Birch, Albrecht, and Myers 2000; Cowles 2003). Examples of these positions are given below, with subject positions (in (1)) and cleft (in (2) and (3)) positions underlined.
(1) Sally went to the store with Jill.
(2) It was Sally that went with Jill to the store.
(3) The one who went to the store with Jill was Sally.
Speaking only in terms of focus status in these examples, it should be easier to refer back to Sally than to refer back to Jill with a pronominal anaphor in a subsequent sentence because the representation of Sally is more highly activated than that of Jill. In this paper we shall focus principally on clefts as a means to focus antecedents. Whereas in spoken language prosody can influence the focusing properties of clefts (Cowles 2003), in written texts the clefted entity becomes the most salient (i.e. accessible) entity for the following text (Almor 1999; Arnold 1998; Birch et al. 2000). Thus, clefts appear to be well suited to manipulations of focus when written materials are used. There are other means of causing a referent to be at the center of attention. For example, it could be the current topic of the discourse. Another possibility is that the information encoding or content of the antecedent could capture attention. It could well be the case that having an unexpected entity or an entity that is inappropriately specific or informative could draw attention to that entity, which could lead to a high activation and focus status for the entity. Antecedent activation, considered an important factor in pronoun resolution, has also been proposed as an important factor in NP anaphor resolution (Almor 1999). In one experiment, Almor found that category NP anaphors referring back to typical antecedents (e.g. robin-bird) were read faster when the antecedent was clefted (and thus in focus) than when it was not. The design of the experiment did not include pronoun anaphors, so it is difficult to say whether the difference in reading time between focus and non-focus antecedents would have been comparable between pronouns and NP anaphors, but Almor’s results nonetheless suggest that NP anaphors are sensitive to antecedent focus status, and further, that being in focus facilitates the processing of NP anaphors.
Almor accounted for these focus results by arguing that anaphors serve to reactivate the representation of the antecedent in working memory, and if this representation is already highly activated (as it would be if the antecedent were in focus) then reactivation is easier, independent of the form of the anaphor. However, it may well be that the effect of antecedent focus status is somewhat complicated in the case of NP anaphors by their additional role in a discourse. We shall address this possibility in more detail in our General Discussion. The focus status of the antecedent is also related to the timing of the resolution process, that is, to when the antecedent is selected and the anaphor mapped onto it. Sanford and Garrod (1989) argued that the processes of identification and matching, especially for full NP anaphors, are started and often completed very quickly. One piece of evidence for their argument came from Garrod and Sanford (1985), who found that spelling errors for verbs following
a repeated anaphoric NP (or name) were detected more quickly when the action denoted by the verb was predictable from the anaphor’s role in the discourse. Other evidence came from Dell, McKoon, and Ratcliff (1983), who not only found priming for a more-specific antecedent at an NP anaphor (burglar-criminal) but also found priming for other words in the sentence containing the antecedent. Still more evidence for rapid resolution of repeated name anaphors came from Gernsbacher (1989). In summary, despite the greater amount of semantic information available during NP anaphora resolution, the focus status of possible antecedents appears to play an important role. In particular, there is evidence that the increased availability of the referent of a focused antecedent causes faster NP anaphor resolution when compared with non-focused antecedents (Almor 1999). Focus status can be attained via structural/grammatical position (i.e. grammatical subject or cleft position) or possibly via other discourse means (i.e. any attention-gaining mechanism).
Relationship between the Antecedent and the Anaphor
Independent of antecedent focus is the issue of deciding that a particular possible antecedent is an appropriate match to the anaphor. Even in the absence of clear antecedent competitors for an NP anaphor, this decision process can be complicated. For example, while pronouns must take their reference from an antecedent, noun phrases have no such requirement and need not be anaphoric. Thus, one question that arises with NP anaphors is whether the use of a given noun phrase is anaphoric. There are certain formal cues that can cause a preference for anaphoric interpretation, such as the use of definite determiners, but, unlike pronominal anaphors, even definite NP anaphors need not be strictly coreferential with a previous discourse entity. This is the case, for example, with uniquely identifiable entities such as “the sun was shining yesterday” and “the night before last we went to a pub.” One can then ask: If there is a possible antecedent that is a poor match, at what point does the system decide against an anaphoric reading and opt instead to posit the existence of a new entity to which the noun phrase refers? However, setting aside the issue of how to decide whether a noun phrase is anaphoric, there are a number of factors that have been implicated in antecedent mapping that have to do with the semantic relationship of the antecedent and the anaphor. This relationship has been discussed in terms of semantic feature overlap between the antecedent and anaphor (Garrod and Sanford 1977) as well as in terms of the conceptual distance (Almor 1999). It is thought that the amount of semantic overlap between an antecedent and an anaphor influences how easily the anaphor is matched to the antecedent. The more
overlap there is, the easier the matching process. For example, one way that there can be greater overlap between the antecedent and anaphor is to have a typical antecedent for a category anaphor (e.g. an antecedent like car for an anaphor like vehicle). Garrod and Sanford (1977) found that sentences containing NP anaphors were read faster when the anaphor referred to a typical antecedent than when it referred to an atypical one (e.g. boat). They found that this effect was more robust for antecedent-anaphor relations, but could also be found in sentences in which the anaphor noun phrase was made non-anaphoric (e.g. some vehicles). They also found evidence that the effect of typicality was not due to priming effects per se, and suggested that it was due to a checking process in which possible antecedents within the current discourse episode were checked against the anaphor and semantic features were matched. The greater the overlap in semantic features between the antecedent and the anaphor, the greater the ease of matching the antecedent to the anaphor, and thus the faster the reading.1 The typicality effect for NP anaphors found by Garrod and Sanford (1977) has been replicated several times, using both reading-time and eye-tracking methodologies (Duffy and Rayner 1990; Rayner, Kambe, and Duffy 2000). With respect to antecedent focus, Rayner et al. (2000) used antecedents that occurred in the predicate rather than the subject of a sentence. Thus, the typicality effect for category NP anaphors has been found not only in different studies using different methodologies, but also with antecedents that had different focus statuses, suggesting that the processing advantage for anaphors with typical antecedents (that thus have greater semantic overlap) does not interact with effects of antecedent focus.
Antecedent typicality also appears to have an effect on anaphor resolution when the anaphor is more specific than the antecedent, for example when the antecedent is vehicle and the anaphor is car (Garrod and Sanford 1977; Myers, Cook, Kambe, Mason, and O’Brien 2000). While typicality is a well-studied way to manipulate the degree of semantic overlap between an antecedent and an anaphor, it is not the only way. For example, one can have identity in semantic overlap by simply repeating the antecedent as the anaphor (e.g. vehicle-vehicle). In this case it appears that the focus status of the antecedent does have an influence on how quickly or slowly a repeated anaphor is read, with a repeated-name penalty (Gordon, Grosz, and Gilliom 1993) for repetitive anaphors back to focused antecedents. Gordon et al. (1993) found that when a repeated name was used to refer back to the grammatical subject of a previous sentence, the reading time for the anaphor-containing sentence was increased compared to when the repeated-name anaphor referred back to the grammatical object. This interaction with antecedent
focus, which appears to contradict the previous (albeit somewhat indirect) findings for typicality and focus, suggests that either the use of names (as opposed to noun phrases) or the complete semantic overlap between antecedents and anaphors in the case of repetitive anaphors causes a somewhat different set of processes to operate. Further evidence that semantic overlap and focus interact during NP anaphor resolution came from Almor (1999), who found an inverse typicality effect in which focused, atypical antecedents caused faster anaphor reading than focused, typical antecedents. His results, and our efforts to replicate them, will be discussed in greater detail below. However, in addition to manipulations of typicality, antecedent-anaphor pairs can differ in terms of their conceptual distance (if not semantic overlap) owing to the specificity of the antecedent (e.g. car-vehicle vs. hatchback-vehicle). Cowles and Garnham (2003, 2005) tested the interaction of antecedent focus and antecedent specificity by comparing reading times to category anaphors with antecedents that were either close in a conceptual hierarchy (car-vehicle) or one category away (hatchback-vehicle). They found an interaction with focus such that for non-focused antecedents, conceptually closer antecedents resulted in faster anaphor reading times, in keeping with the idea that a greater overlap of semantic features with an antecedent (or perhaps proportion of semantic features) aids in anaphor processing and converging with the results of previous studies showing an advantage for typical antecedents. However, they found that the reverse was true for focused antecedents: conceptually more distant antecedents (hatchback-vehicle) resulted in shorter anaphor reading times. In summary, the semantic relationship between an antecedent and an anaphor, and in particular the amount of semantic overlap between them, appears to be an important and robust factor in NP anaphor processing.
However, the precise effect of this relationship and its interaction with focus status is still not fully understood. On the one hand, there is evidence that typical antecedents cause faster anaphor processing when compared to atypical antecedents. However, some studies show that when an antecedent is focused, complete semantic overlap (as in the case of repeated anaphors) actually slows processing. Further, other evidence also suggests that greater overlap does not speed anaphor reading when the antecedent is in focus. We shall now turn to one account of this interaction between focus and semantic overlap.
The Informational Load Hypothesis
Almor (1999) proposed the Informational Load Hypothesis (henceforth ILH), a theory of anaphor processing that is based on Grice’s (1975) maxim of quantity: speakers should make their contribution exactly as informative as necessary. According to the ILH, an anaphor (pronominal or NP) is resolved under a set of influences that are principally driven by the role of working memory in anaphor resolution. Essentially, while increased conceptual similarity (i.e. semantic overlap) aids in anaphor resolution, it comes at a cost: an increased working-memory burden for greater semantic overlap. This burden means that there is a processing pressure to use the least degree of conceptual similarity necessary in order to identify an antecedent. In cases where the antecedent is relatively inaccessible, greater similarity should be an aid because it is necessary in order to correctly identify the antecedent; however, when the antecedent is easily identified (for example, by being in focus), relatively little conceptual similarity should be required, and too much similarity will cause an additional processing burden. This general notion of conceptual similarity and its costs is formalized in Almor’s definitions of C-difference and informational load. C-difference is a measure of the semantic distance between an anaphor and antecedent. When the anaphor is more general than its antecedent (e.g. when the antecedent is car and the anaphor is vehicle), the C-difference of the antecedent-anaphor pair is negative and becomes more negative as the semantic distance increases. Thus, the C-difference for boat-vehicle is lower (i.e. more negative) than the C-difference for car-vehicle. When an anaphor is more specific than its antecedent, as when the antecedent is vehicle and the anaphor is car, the C-difference is positive and increases with increasing semantic distance between the antecedent and anaphor. Thus, the C-difference for vehicle-boat is greater than that for vehicle-car. For repetitive anaphors, the C-difference for the pair is zero.
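The sign conventions for C-difference can be illustrated with a minimal sketch in Python. The numerical distances and the names DISTANCE_TO_CATEGORY and c_difference are illustrative assumptions introduced here, not part of Almor’s (1999) formulation; only the signs and the relative orderings are meant to mirror the definition given in the text.

```python
# Hypothetical, hand-assigned semantic distances between a category member
# and its category. The numbers are placeholders chosen only so that the
# atypical member (boat) is farther from the category than the typical one.
DISTANCE_TO_CATEGORY = {
    ("car", "vehicle"): 1.0,   # typical member: small distance
    ("boat", "vehicle"): 2.0,  # atypical member: larger distance
}


def c_difference(antecedent: str, anaphor: str) -> float:
    """Return a signed distance for an antecedent-anaphor pair.

    Negative when the anaphor is more general than the antecedent,
    positive when it is more specific, zero for a repetitive anaphor.
    """
    if antecedent == anaphor:
        return 0.0  # repetitive anaphor (e.g. vehicle-vehicle)
    if (antecedent, anaphor) in DISTANCE_TO_CATEGORY:
        # Anaphor is the category term, i.e. more general: negative value.
        return -DISTANCE_TO_CATEGORY[(antecedent, anaphor)]
    if (anaphor, antecedent) in DISTANCE_TO_CATEGORY:
        # Anaphor is the member term, i.e. more specific: positive value.
        return DISTANCE_TO_CATEGORY[(anaphor, antecedent)]
    raise KeyError(f"no distance listed for {antecedent}-{anaphor}")


for pair in [("boat", "vehicle"), ("car", "vehicle"),
             ("vehicle", "vehicle"), ("vehicle", "car"), ("vehicle", "boat")]:
    print(pair, c_difference(*pair))
```

Under these placeholder distances, boat-vehicle receives a lower value than car-vehicle, both are negative, the repeated pair is zero, and vehicle-boat exceeds vehicle-car, which is the ordering the definition requires.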
The informational load of an antecedent-anaphor pair is a function of the C-difference such that a larger C-difference leads to a larger informational load. Conversely, the smaller the C-difference, the lower the informational load. Thus, for anaphors that are more general than their antecedents, informational load decreases as the semantic distance between the anaphor and the antecedent increases, including when the anaphor becomes more general (i.e. less semantically specified, as in the case of pronouns). Informational load imposes a processing burden, with larger loads causing a larger working-memory load. Therefore, the informational load of an antecedent-anaphor pair should be only as large as is necessary, and must be functionally justified either by aiding antecedent identification or by adding new information about the antecedent. Almor argued that focused antecedents, because their representations are highly activated, should require less semantic overlap in order to be identified than antecedents that are not in focus, and whose representations are activated to a lesser degree. This is based on the
premise that identifying the antecedent involves reactivating the representation of the antecedent in working memory, and that the representation of a focused discourse referent will be stronger, and thus more easily reactivated, than a discourse referent that is not in focus. Informational load thus interacts with antecedent focus because when the anaphor does not provide new information greater informational load will be justified only in those cases where the antecedent is not in focus. When the antecedent is in focus, greater informational load is not justified, and should impose a greater processing burden, meaning that lower informational loads should result in faster processing times for focused antecedents. The advantage of ILH is that it provides a unified approach to anaphora that is based on independently motivated psychological processing. However, there are at least two potential weaknesses. First, despite the unifying approach to pronouns and NP anaphors, the ILH does not provide a detailed account of how anaphors (and pronouns in particular) identify their antecedents, stating only that closer conceptual distance aids in identification. The ILH does not appear to provide an account of effects of antecedent competitors for NP anaphors, such as those found by Garnham (1984). In some sense, correct identification of the antecedent is assumed; thus, given that there is a known (correct) antecedent for the anaphor, the ILH then accounts for why one antecedent-anaphor pair should be easier to process than another. The ILH does recognize that pronouns, which almost by definition will have little semantic overlap with their antecedents, will be more difficult to map onto antecedents than fuller forms of reference, but it is not clear what this extra difficulty might mean, especially when the correct antecedent is not in focus. 
The second weakness is that the ILH largely ignores the role that discourse coherence (and form appropriateness) might play in the resolution process. We shall consider this weakness in more depth in our General Discussion. The ILH makes a number of empirical predictions based on the interaction of semantic overlap and focus status of the antecedent, two of which are of particular interest in this paper. First, it predicts that for antecedent-anaphor pairs with low informational loads referring back to a focused antecedent should be faster than referring back to a non-focused antecedent. This is true not only for pronoun anaphors, but also for NP anaphors when they are more general than their antecedents. This is to some degree in conflict with the discourse role that NP anaphors can take. As we have already seen, a number of studies have suggested that the discourse purpose of NP anaphors is to shift discourse frames, or episodes. Thus, one could predict the opposite: that NP anaphors refer best back to non-focus antecedents. Almor’s (1999) results are
somewhat mixed. First, in Almor’s experiment 1, category NP anaphors with typical antecedents were read faster when the antecedent was focused than when it was not, supporting the ILH’s prediction. However, in a different study that included both typical and atypical antecedents, this advantage for focus was found only for atypical antecedents. The second prediction has two parts: (1) When the antecedent is not in focus, increased overlap should aid in antecedent identification, and so greater overlap should make processing easier and conversely. (2) When the antecedent is in focus, any increase in semantic overlap is not justified, and thus antecedent-anaphor pairs with less overlap should be processed faster than antecedent-anaphor pairs with more overlap. The first part of this prediction accounts for typicality effects. The second part of this prediction is far from trivial, because it predicts, somewhat strangely, inverse typicality effects: when the antecedent is in focus, less overlap is better, and thus atypical antecedents should cause faster anaphor processing than typical ones. In keeping with previous findings, Almor (1999) found (somewhat weak) typicality effects in self-paced reading times for NP anaphors in an experiment (experiment 5) that used cleft constructions to manipulate the focus status of the antecedent. Interestingly, in support of the ILH, Almor found a typicality effect only when the antecedent was not in focus, and found an inverse typicality effect when it was. An example of his materials is given below in (4).
(4) 1. The professor and her student arranged the transportation for their field trip.
    2. a. It was the student that rented the car/boat.
       b. What the student rented was the car/boat.
    3. The vehicle was necessary for getting to the exploration site.
Almor found a statistically reliable interaction between typicality and focus, the anaphor of the typical antecedent pair (car-vehicle) being read faster than that of the atypical antecedent pair (boat-vehicle) in the non-focus, it-cleft condition (404 vs. 426 msec), with the reverse result in the focus, wh-cleft condition (413 vs. 388 msec). In pairwise comparisons, the inverse typicality effect was reliable but the typicality effect only marginally so. These results are somewhat surprising, but they are predicted by the ILH by virtue of the fact that when the antecedent is in focus, increased semantic overlap is no longer necessary for identification, and thus the increase in overlap only causes an increase in working-memory burden, making the pair with the least overlap (i.e. the atypical antecedent) the easiest to process.
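The interaction just described can be read directly off the four condition means. The following is a minimal sketch that uses only the means quoted above; the variable and function names are our own illustrative choices, not Almor's. It computes the typicality effect within each focus condition and the contrast between the two effects.

```python
# Mean anaphor reading times (msec) quoted above for Almor's (1999) experiment 5.
# Keys are (focus status, typicality); all names here are illustrative only.
almor_exp5 = {
    ("non-focus", "typical"): 404,
    ("non-focus", "atypical"): 426,
    ("focus", "typical"): 413,
    ("focus", "atypical"): 388,
}


def typicality_effect(means, focus):
    """Atypical minus typical reading time; positive means typical is faster."""
    return means[(focus, "atypical")] - means[(focus, "typical")]


non_focus_effect = typicality_effect(almor_exp5, "non-focus")  # 426 - 404 = 22
focus_effect = typicality_effect(almor_exp5, "focus")          # 388 - 413 = -25

# The interaction contrast is the difference between the two simple effects.
interaction_contrast = non_focus_effect - focus_effect         # 22 - (-25) = 47

print(non_focus_effect, focus_effect, interaction_contrast)
```

A positive value corresponds to a typicality effect and a negative value to an inverse typicality effect. The sketch is purely descriptive and says nothing about statistical reliability.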
Antecedent Focus and Semantic Relationships Reexamined
Almor’s (1999) finding of an anaphor processing benefit for atypical antecedents is somewhat striking in the face of many previous findings of typicality effects. Almor accounted for this lack of convergence with previous results by arguing that in previous studies that examined typicality effects the antecedent was not in focus to the extent that it was in his study. While this is certainly a possibility, studying the processing of typicality effects seems worthy of further consideration, given that in some studies (e.g., Garrod and Sanford 1977) the antecedent was in subject position (as well as first mentioned), which is a prominent position often associated with increased accessibility of its referent. Thus, despite the theoretical support provided by the ILH, the results of experiment 5 were unexpected enough to warrant further investigation, and so we repeated Almor’s experiment 5 in three studies, with mixed results. While these results do not provide conclusive evidence, they are nonetheless illuminating with respect to the interaction of factors like focus and semantic overlap, and suggest that a more complicated set of influences is actually at work in these results.
Experiment 1
In this experiment we used Almor’s (1999) design and materials from his experiment 5. An example of one of our stimuli is given in table 12.1, with the antecedent underlined. Following Almor, the logic of this design is that the different antecedent focus and typicality statuses could alter how easily the category NP anaphor is processed. Any resulting increase in difficulty for such processing should result in longer reading times when the anaphor is encountered in subject position of the target sentence. Like Almor, then, in this experiment we were interested in the reading times in response to these anaphoric subject NPs.
Table 12.1
Example of material used in experiment 1.
Context sentence:         The businessman and his wife admired the gardens of their five-star hotel.
Antecedent sentence
   Non-focus, typical:    It was the businessman that admired the oak.
   Non-focus, atypical:   It was the businessman that admired the palm.
   Focus, typical:        What the businessman admired was the oak.
   Focus, atypical:       What the businessman admired was the palm.
Target sentence (Subject NP Anaphor // Predicate):
                          The tree // had been standing there for over 100 years.
Question:                 Did the businessman admire anything?
Method
Subjects
Twenty-nine students at the University of Sussex participated in exchange for £3.00.
Design and materials
Two factors were crossed by manipulating the setup sentences: Antecedent Focus (clefted vs. non-clefted) and Typicality (typical vs. atypical), giving the experiment a 2 × 2 design with both factors within participants and within items. The 24 items from Almor’s (1999) experiment 5 were used with minor alterations. First, we lacked the original comprehension questions that Almor used and so created our own. Our comprehension questions were similar to what Almor reported, i.e., the correct answer was not always straightforward. However, and we believe importantly, the comprehension questions in this first study did not require fully processing the antecedent-anaphor relationship in order to provide correct answers. Second, we made minor modifications to a few of the items in order for them to conform with British spelling and, where necessary, to sound more natural to speakers of British English (e.g. using footballer instead of ballplayer). Four lists of experimental items were created such that each experimental item appeared exactly once in each list and every list had the same number of items from each condition. Thus, no participant saw any item more than once, and each item appeared in each list in a different condition. Forty-eight filler items were created which had the same number of sentences as the critical stimuli and included an anaphoric relationship between the setup sentence and the target, but used alternate syntactic structures and anaphors in order to prevent participants from generating strategies while reading. All filler items were included in each list. The items in a list were presented in random order to each participant.
Procedure
A self-paced reading task was used in which participants were shown passages on a computer screen and instructed to read each passage carefully in order to answer the questions at the end of each passage correctly. Unlike Almor (1999), we did not strictly monitor participants’ responses to practice items or make them repeat practice items until they got 90 percent of them correct. At the
beginning of each trial, “$$ READY $$” appeared on the screen and participants pressed a button that corresponded to their dominant hand (self-reported) to indicate that they were ready to begin the trial and to advance through each part of the passage in the following way: The setup sentence of the passage appeared on the screen all at once, and then was replaced by the antecedent sentence. This in turn was replaced, in all critical and most filler trials, by the subject NP of the target sentence. Next, the rest of the sentence would replace the subject NP. Finally, a yes/no question replaced the final sentence of the passage, and subjects responded “yes” with their dominant hand or “no” with their non-dominant hand. The time participants took to read the subject NP by itself was recorded along with their response to the yes/no question. All materials were presented, in black on a white background, in an 18-point sans-serif font (Geneva).
Results
Participants’ accuracy on the comprehension questions in critical trials was calculated, and any participant scoring below 80 percent accuracy was dropped from further analysis. One subject was excluded on the basis of this criterion, and the mean accuracy of all participants was 92 percent. Participants’ reading times for the subject NP were calculated, and any participant whose average reading time was more than two standard deviations above or below the overall mean of all subjects was dropped from further analysis. No subjects were excluded on the basis of this criterion. Following Almor, we removed data points for each condition that fell more than two standard deviations from the mean across all subjects for that condition. This removed 5 percent of the data in this experiment (which is identical to the amount of the data removed in Almor’s experiment 5), with no condition affected by the screening process more than others. The reading times for the NP anaphor were submitted to a 2 × 2 analysis of variance (ANOVA) of antecedent focus status (non-focus vs. focus) and typicality (typical vs. atypical) with participants and items as random factors. The mean reading times for the NP anaphor are reported in table 12.2.
Table 12.2
Experiment 1, mean anaphor reading times in msec (standard errors in parentheses).
Focus status             Typical antecedent    Atypical antecedent
Non-focus (it-cleft)     703 (24.14)           775 (36.16)
Focus (wh-cleft)         749 (32.77)           717 (27.64)
In the effects of focus, there does not appear to be an overall advantage for focused antecedents: whereas anaphors to atypical antecedents were read faster when the antecedent was in focus, the reverse was true for anaphors to typical antecedents. There also does not appear to be an overall advantage for typical antecedents over atypical ones: whereas anaphors to typical antecedents were read faster in the non-focus condition (a typicality effect), the reverse was true when the antecedents were in focus (an inverse typicality effect). These observations are borne out statistically. We found that there was no overall difference in reading time between anaphors with focused and non-focused antecedents (F’s < 1), nor was there any difference between typical and atypical antecedents (F1 (1, 27) = 1.783, p = .193; F2 < 1). We did, however, find an interaction of the two factors (F1 (1, 27) = 8.031, p < .009; F2 (1, 23) = 4.201, p < .052). Planned comparisons of focus effects within typical and atypical antecedent conditions revealed that when the antecedent was typical, anaphor reading was marginally slower by participants when the antecedent was focused (t1 = 1.963, p < .06, t2 = 1.673, p = .11). When the antecedent was atypical, anaphor reading was significantly faster by participants when the antecedent was focused (t1 = 2.499, p < .019, t2 = 1.625, p = .12). Pairwise comparisons of typicality within each focus status revealed that the typicality effect for the non-focus condition was significant by subjects (t1 = 2.912, p < .007) but not items (t2 = 1.552, p = .13) and that the inverse typicality effect for focus conditions was not significant (t1 = 1.438, p = .16, t2 = 1, p = .33). Thus, in numerical pattern, these results replicate the results of Almor’s (1999) experiment 5, a reversal of typicality effects for focused antecedents. However, these results are somewhat weaker statistically than Almor’s results. Almor reported a reliable difference by participants and items for the inverse typicality effect. Further, the reading times in this experiment were quite high; given that they were recorded for a short, two-word phrase (e.g. the vehicle), this meant that participants were, on average, spending 368 msec per word, which is considerably longer than Almor’s average of 204 msec per word (although not completely out of line with standard reading times of roughly 300 words per minute, which would predict roughly 200 msec per word, plus time to press the response button).
These longer reading times could have been caused by the instructions to read carefully. Because of concerns about the processes behind these longer reading times, we decided to repeat the study using different instructions.
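Two steps from the Results above lend themselves to a short sketch: the per-word reading times quoted in the text, and the two-standard-deviation screening of data points. The per-word figures use only the condition means from table 12.2 and the means quoted earlier for Almor's experiment 5. The `trim` function is a hypothetical illustration of the screening rule applied to the data of one condition; it assumes the sample standard deviation, which the text does not specify, and the function names are our own.

```python
from statistics import mean, stdev

# Condition means (msec) for the two-word anaphor, from table 12.2.
exp1_means = [703, 775, 749, 717]
# Condition means (msec) quoted earlier for Almor's (1999) experiment 5.
almor_means = [404, 426, 413, 388]
WORDS_IN_ANAPHOR = 2  # e.g. "the vehicle"


def per_word_ms(condition_means, n_words=WORDS_IN_ANAPHOR):
    """Average reading time per word across the condition means."""
    return mean(condition_means) / n_words


def ms_per_word_at(words_per_minute):
    """Reading time per word implied by a given reading rate."""
    return 60_000 / words_per_minute


def trim(times, n_sd=2.0):
    """Drop data points more than n_sd standard deviations from the mean.

    Assumption: the sample standard deviation is used; the chapter does not
    say which estimator was applied.
    """
    m, sd = mean(times), stdev(times)
    return [t for t in times if abs(t - m) <= n_sd * sd]


print(per_word_ms(exp1_means))   # 368.0
print(per_word_ms(almor_means))  # 203.875
print(ms_per_word_at(300))       # 200.0
```

In the screening described above, a function like `trim` would be applied separately to the data points of each of the four conditions, pooled across subjects.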
Experiment 2
The first experiment replicated Almor’s finding of typicality and inverse typicality effects based on the focus status of the antecedent. Nonetheless, we were concerned about the rather dramatic difference in overall reading times between our first experiment and Almor’s. We also had concerns about the display of the sentence fragments that both we and Almor had used, and so decided to replicate the study with two small changes. First, we changed the wording of the instructions to the subjects to omit any mention of reading carefully and instead encouraged them to read the passages as they would read a book or magazine. Second, instead of having the predicate of the target sentence appear where the subject NP had been, we used a cumulative display in which the predicate appeared after the subject NP, and the subject NP remained on the screen. Unlike in experiment 1, we collected reading times in response to these predicates (which were identical across conditions within an item) because we believed that they could also indicate processing difficulty for the anaphor, and further, these reading times are more similar to the whole-sentence measures used by Garrod and Sanford (1977).
Method
The procedure and materials were identical to those for experiment 1 except for the difference in the wording of the instructions discussed above. Twenty-nine students at the University of Sussex participated in exchange for £3.00. All were native speakers of English, and none had participated in experiment 1.
Results
Participants’ accuracy for the comprehension questions in critical trials was calculated, and any participant scoring below 80 percent accuracy was dropped from further analysis. No subjects were excluded on the basis of this criterion, and the mean accuracy of all participants was 93 percent. Participants’ reading times for the subject NP were calculated, and any participant whose average reading time was more than two standard deviations above or below the overall mean of all subjects was dropped from further analysis. One subject was excluded on the basis of this criterion; the data from the remaining 28 participants were used. As before, we removed data points for each condition that fell more than two standard deviations from the mean across all subjects for that condition. This removed 5 percent of the data in this experiment, again in keeping with the amount of data so removed in Almor’s study. Also as before, reading times for the subject anaphor were submitted to a 2 × 2 analysis of variance (ANOVA) of antecedent focus status (non-focus vs. focus) and typicality (typical vs. atypical) with participants and items as random factors. In addition, reading times for the added predicate were recorded. To control for differences in length between items, we examined the residuals from a linear regression analysis of reading times based on the character length of the added predicate for each participant (Trueswell, Tanenhaus, and Garnsey 1994). The results for the subject anaphor are reported in table 12.3; the added predicate results are reported in table 12.4.
Table 12.3
Experiment 2, mean subject anaphor reading times in msec (standard errors in parentheses).
Focus status             Typical antecedent    Atypical antecedent
Non-focus (it-cleft)     545 (23.06)           566 (26.28)
Focus (wh-cleft)         539 (16.13)           569 (23.91)
Table 12.4
Experiment 2, mean added predicate residual reading times in msec (standard errors in parentheses).
Focus status             Typical antecedent    Atypical antecedent
Non-focus (it-cleft)     –14.80 (32.98)        26.06 (39.04)
Focus (wh-cleft)         .85 (36.43)           –16.84 (33.46)
The changes in our results for the subject anaphor as a result of the minor differences in the instructions were surprising. The inverse typicality effect for the focused antecedent condition disappeared, and in fact was replaced numerically with faster reading times for typical antecedents. The effects of focus seen in experiment 1 were also different, with a slight advantage for focus in typical antecedent conditions, and no difference for atypical antecedents. As before, there was no main effect of focus status (F’s < 1). However, typical antecedents did cause faster reading times overall by participants (F1 (1, 27) =
5.162, p < .031, F2 (1, 23) = 2.453, p < .13). There was no interaction of focus and typicality (F’s < 1). Planned pairwise comparisons for focus status within typical and atypical antecedents revealed no differences (all t’s < 1). Similarly, planned comparisons of typical versus atypical antecedents within each focus condition also found no significant differences (non-focus: t1 = 1.408, p < .17, t2 = 1.286, p < .21; focus: t1 = 1.644, p < .11, t2 = 1.223, p < .23). In effect, in experiment 2, the effects of focus and inverse typicality that were seen in experiment 1 disappear, while the typicality effect seen in the non-focus condition of experiment 1 was replicated overall. Residual reading times for the added predicate show a pattern similar to that found at the anaphor in experiment 1, with faster reading for the typical condition than for the atypical one when the antecedent was not focused, but slower reading for the typical condition when the antecedent was focused. However, none of these differences were statistically significant (all F’s < 1). The results of experiment 2 support the view that greater semantic overlap aids in antecedent identification in general, though this perhaps interacts with focus in some way, but do not support the idea that when focused, conceptually distant antecedents (i.e. atypical ones) should cause faster anaphor reading times due to decreased informational load. However, it is not clear why the results for experiment 2 are different from those of both Almor’s (1999) experiment and our experiment 1. One possible reason for the difference in these results compared to experiment 1 is that the cumulative display caused subjects to press the button to continue before they were fully ready, knowing that they could continue to read the subject NP after the predicate had appeared.
If this were the case, then we would expect to find effects that were at the NP anaphor in experiment 1 to be present in the predicate reading times in experiment 2. However, this was not the case. With these results, we were forced to reconsider our initial interpretation of experiment 1 and ask why changing the instructions would eliminate the effects of focus and replace the inverse typicality effect found in experiment 1 with a weak effect of typicality. Perhaps the results of experiment 2 were due to some kind of semantic priming between the antecedent and target noun phrases rather than the intended anaphoric processes. This could be true if, for example, participants failed to fully process the target NP anaphorically. Reconsidering the parameters of experiment 2, we decided that this was a distinct possibility, as we had encouraged “natural” reading while not providing any motivation for the subjects to process the target NPs anaphorically; that is, we had not asked comprehension questions that required anaphoric processing. So we conducted a third replication in which we used the same materials and procedure as experiment 2, changing only the comprehension questions.
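The length correction applied to the predicate reading times in this experiment (residuals from a per-participant linear regression of reading time on character length) can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the chapter: the function name is our own, ordinary least squares is assumed, and the example data are invented.

```python
def residual_reading_times(lengths, times):
    """Residuals from a least-squares line predicting reading time from length.

    Intended to be called once per participant, with that participant's
    predicate lengths (in characters) and reading times (in msec).
    Assumes the lengths are not all identical, so the slope is defined.
    """
    n = len(lengths)
    mean_len = sum(lengths) / n
    mean_rt = sum(times) / n
    sxx = sum((x - mean_len) ** 2 for x in lengths)
    sxy = sum((x - mean_len) * (y - mean_rt) for x, y in zip(lengths, times))
    slope = sxy / sxx
    intercept = mean_rt - slope * mean_len
    # Observed time minus the time predicted from length alone.
    return [y - (intercept + slope * x) for x, y in zip(lengths, times)]


# Hypothetical data for one participant (invented for illustration only).
lengths = [30, 42, 36, 48]
times = [1500, 2150, 1780, 2370]
residuals = residual_reading_times(lengths, times)
print(residuals)
```

A negative residual means the predicate was read faster than its length alone would predict for that participant, and a positive residual means it was read more slowly, which is why the condition means in table 12.4 can fall on either side of zero.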
Experiment 3
When we asked subjects to read more naturally, we saw a decrease in reading times that placed them somewhat closer to those found by Almor (1999), but failed to find an inverse typicality effect. We reasoned that by reading more quickly subjects may not have been making the anaphoric connection necessary to trigger inverse typicality effects, especially in light of the fact that our comprehension question made no demands for anaphoric processing of the NP anaphor. So we altered the comprehension questions in such a way that in order to answer them correctly, the subjects must have made the anaphoric connection. An example of this change for the item in table 12.1 is given below in (5). No other changes were made to the materials, and no changes were made to the design, the instructions, or the procedure.
(5) Experiments 1 and 2: Did the businessman admire anything?
    Experiment 3: Had the palm been there a long time?
Method
The procedure and materials were identical to those for experiment 2 except for the different comprehension questions discussed above. Thirty-two students at the University of Sussex participated in exchange for £3.00. All were native speakers of English, and none had participated in experiments 1 or 2.
Results
Participants’ accuracy for the comprehension questions in critical trials was calculated, and any participant scoring below 80 percent accuracy was dropped from further analysis. Two subjects were excluded on the basis of this criterion. The mean accuracy of all remaining participants was 91 percent. Participants’ reading times for the subject NP were calculated, and any participant whose average reading time was more than two standard deviations above or below the overall mean of all subjects was dropped from further analysis. Two subjects were excluded based on this criterion, and data from the remaining 28 participants were used. As before, we removed data points for each condition that fell more than two standard deviations from the mean across all subjects for that condition. This removed 4 percent of the data in this experiment, with all conditions affected equally. The data from the subject anaphor and added predicate were analyzed in the same way as experiment 2. The results are presented in tables 12.5 and 12.6.
Table 12.5
Experiment 3, mean anaphor reading times in msec (standard errors in parentheses).
Focus status             Typical antecedent    Atypical antecedent
Non-focus (it-cleft)     576 (20.94)           587 (21.53)
Focus (wh-cleft)         593 (15.95)           596 (16.72)
Table 12.6
Experiment 3, mean added predicate reading times in msec (standard errors in parentheses).
Focus status             Typical antecedent    Atypical antecedent
Non-focus (it-cleft)     90.13 (48.52)         30.31 (56.84)
Focus (wh-cleft)         –52.93 (33.08)        –88.13 (48.27)
At the subject anaphor, the results of experiment 3 numerically resemble those of experiment 2, with typical antecedents causing somewhat faster reading than atypical antecedents. However, this effect was not reliable (F’s < 1),
nor was there a difference in reading time due to focus status (F1 < 1, F2 (1, 23) = 1.01, p < .33) or an interaction of focus and typicality (F’s < 1). In short, participants did not read any of the conditions reliably faster at the NP anaphor. This could indicate that participants were not performing the task properly, but there are two reasons to believe this was not the case. First, despite the fact that the comprehension questions were slightly more complicated than in previous experiments, the correct answer rate was roughly the same, suggesting that participants were reading carefully enough to answer the questions correctly. The second reason comes from the analysis of the added predicate residual reading times, which shows an influence of antecedent focus status on reading times. An analysis of variance of the predicate reading times revealed a significant effect of focus status by participants, such that focused antecedents caused faster overall predicate reading than non-focused antecedents (F1 (1, 27) = 4.73, p < .032, F2 (1, 23) = 2.50, p = .128). No other effects were significant (F’s < 1). This suggests that participants found NP anaphor reference back to focused antecedents easier than that back to non-focused antecedents, which would be in keeping with the idea that focus aids in anaphoric processing, although it is not clear why in this experiment the results would be delayed until the predicate. Further, the question remains as to why we failed to find evidence of typicality effects or their interaction with focus in this experiment.
General Discussion
Taken together, the results of these attempts at replication do not provide a completely clear picture of the NP resolution processes involved in either these studies or Almor’s (1999) original experiment 5. In this section we shall discuss some of the implications of these results with respect to Almor’s findings and with respect to factors known to influence anaphor processing. In the results concerning antecedent focus, the reading times for the anaphor in experiment 1 and the predicate reading times in experiment 3 bring up a point regarding the interaction of focus and typicality that remains unresolved. Almor (1999) predicted that for otherwise identical, low-informational-load antecedent-anaphor pairs, having a focused antecedent should result in faster anaphor reading than when the antecedent is not in focus. He found evidence supporting this prediction in an experiment (experiment 1) that used typical antecedent-anaphor pairs, such as car-vehicle. However, when he tested the same prediction in an experiment that also contained a typicality manipulation, Almor found an odd result: reading was faster in the focused antecedent condition only when antecedents were atypical for the category anaphor; for typical antecedents there was no effect one way or the other. Almor suggested that this lack of focus effect for typical antecedents could be due to their relatively high informational load, and that, perhaps because of the larger semantic overlap between the typical antecedent and the anaphor, the anaphor behaved in a fashion similar to repetitive anaphors. In effect, this means that there was a focus effect as predicted, but it was greatly mitigated by the effect of having an informational load that is close to that of repeated anaphors, which disprefer reference to focused antecedents. However, this does not address the fact that Almor found focus effects for typical antecedents in a separate experiment that did not manipulate typicality of the antecedents.
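To make the comparison across our three experiments concrete, the following minimal sketch computes the numerical focus effect at the anaphor (non-focus minus focus) for typical and atypical antecedents, using only the condition means from tables 12.2, 12.3, and 12.5. The names are illustrative, and the differences are descriptive only; their statistical reliability is as reported in the Results sections.

```python
# Mean anaphor reading times (msec) from tables 12.2, 12.3, and 12.5.
# Keys are (focus status, typicality); all names here are illustrative only.
anaphor_means = {
    "experiment 1": {
        ("non-focus", "typical"): 703, ("non-focus", "atypical"): 775,
        ("focus", "typical"): 749, ("focus", "atypical"): 717,
    },
    "experiment 2": {
        ("non-focus", "typical"): 545, ("non-focus", "atypical"): 566,
        ("focus", "typical"): 539, ("focus", "atypical"): 569,
    },
    "experiment 3": {
        ("non-focus", "typical"): 576, ("non-focus", "atypical"): 587,
        ("focus", "typical"): 593, ("focus", "atypical"): 596,
    },
}


def focus_effect(means, typicality):
    """Non-focus minus focus reading time; positive means focus is faster."""
    return means[("non-focus", typicality)] - means[("focus", typicality)]


focus_effects = {
    name: (focus_effect(means, "typical"), focus_effect(means, "atypical"))
    for name, means in anaphor_means.items()
}

for name, (typical, atypical) in focus_effects.items():
    print(f"{name}: typical {typical:+d} msec, atypical {atypical:+d} msec")
```

The sign of the difference changes both across experiments and across typicality conditions, which is the numerical pattern behind the inconsistency in focus effects discussed in this section.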
In effect, across our experiments and Almor’s, we see that the effects of antecedent focus status are not consistent, and appear to change depending on a number of different experimental factors, including the mix of antecedent types (all typical vs. typical/atypical), subject instructions, and comprehension questions. Further evidence supporting the idea that effects of antecedent focus are unstable comes from the fact that it is in the focus conditions that the interaction
of typicality and focus appears most sensitive to the differences between experiments. That is, for the most part, the typicality effects of the non-focus condition were stable (experiment 3 being the exception) while the influence of typicality on anaphors with focused antecedents appeared to shift from experiment to experiment. This raises several possibilities. First, the results could be due to the use of the kind of cleft constructions that were used in the antecedent focus conditions. Almor considers the contrastive nature of clefts as a factor in one experiment, and appears satisfied that his results are not due to the contrastive aspect of the clefts alone. Additionally, Van Gompel, Liversedge, and Pearson (2003), in a partial replication (including only focused antecedents) of Almor’s experiment 5 using eye tracking, found that it made no difference in their results whether focus was achieved via cleft position or grammatical subjecthood. Though their results converge somewhat with Almor’s, they do not provide clear evidence in favor of the ILH: typicality effects on first-pass times were found only in a post-anaphor region, and inverse typicality effects were found only in regression-path times for a sentence-final region. Finding inverse typicality effects in what would appear to be end-of-sentence processing suggests that, although the somewhat surprising inverse typicality effects found by Almor (1999) and replicated in our experiment 1 are reliable in some sense, they may not be due to the processes (and factors) that Almor proposed. Van Gompel et al. proposed a two-stage resolution model in which the antecedent is first identified with the aid of semantic overlap, then the specificity of the anaphor is checked for semantic appropriateness. In some sense, as Van Gompel et al. themselves point out, their approach is to pull apart two processes that Almor had combined. The proposal of Van Gompel et al.
accounts for the apparent temporal ordering of the two effects in their data, which Almor’s approach cannot do. However, neither approach fully accounts for why both Almor’s results and our own show that the influence of semantic overlap is more unstable when the antecedent is in focus. If the results are not simply due to the use of clefts, one possibility is that antecedent focus status has particular properties regarding anaphoric processing that make it less sensitive to the kinds of factors involved in anaphoric reference to non-focused antecedents. If it is the case that being in focus causes a general increase in referent availability that makes anaphoric reference easier (regardless of reference form), then reference back to the referents of focused antecedents may not be as sensitive to other factors in identification, such as semantic overlap or conceptual distance. It may be the case that focus status provides an initial “default” antecedent that is chosen, and that information concerning semantic overlap becomes relevant only when this default mapping is not the correct one. This would explain why focus conditions do
not show robust facilitatory effects for semantic overlap, and thus why neither Almor’s experiments nor ours found such effects. However, by itself it does not explain why the difference between typical and atypical antecedents should sometimes be completely inverse from typicality effects. It would appear that the interaction of focus and semantic overlap is more complicated than the ILH suggests, and further work is clearly necessary to explain not only why inverse typicality effects were found in focused antecedent conditions but also why these effects appear to be so fragile with respect to differences in instructions and in comprehension questions. One likely source of additional complexity is the role that reference form plays in providing cohesion and structure to a discourse: the ease or difficulty with which an anaphor is processed should reflect not only antecedent mapping processes of the kind that we have already discussed but also how well the form of the anaphor itself fits into the discourse structure. There is a large body of evidence from both corpus analysis and experimental studies that more attenuated forms of (co)reference are used to maintain reference within a discourse episode, while more specified forms are used during shifts to a new topic or episode (Clancy 1980; Fox 1984; Marslen-Wilson, Levy and Tyler 1982; Fletcher 1984). In two written sentence-production tasks, Vonk, Hustinx, and Simons (1992) found that participants tended to produce a shift in discourse theme more often when they were prompted to use an overspecified form of anaphor (i.e. a proper name) rather than a pronoun. In addition, participants produced more overspecified forms of reference when they produced a shift in discourse theme compared to when they continued it. Further, in a probe-recognition task Vonk et al.
found that reading times for content-word targets that were in the same clause as the antecedent of either an overspecified (name plus modifier) or a non-overspecified (pronoun) anaphor were longer when the anaphor was overspecified, suggesting that the anaphor caused a decrease in the availability of the surrounding antecedent context (as would happen with a change of theme in the discourse), and in a further study they found that this was an immediate effect, found shortly after the anaphor. Thus, there is evidence that pronouns are more appropriately used as anaphors to referents at the current center of attention (i.e., the focus, or the current discourse topic) and are thus used to maintain reference. NP anaphors, on the other hand, are more appropriately used to refer back to referents that are not in focus, and are used to reintroduce reference or shift discourse topic. This same point has been made in theoretical proposals by Ariel (1990) and Gundel, Hedberg, and Zacharski (1993). It has also been argued that there is a specific kind of anaphor that is preferentially used to refer to a referent that is out of current focus: the demonstrative NP anaphor. Evidence for this comes from
French (Fossard and Rigalleau 2002), Finnish (Kaiser and Trueswell 2003), and English (Fossard, Garnham, and Cowles 2003). If NP anaphors are dispreferred for maintaining reference to antecedents that are currently under discussion (i.e., in focus), then this dispreference is in conflict with the idea that all anaphors should refer more easily to focused antecedents. From the perspective of discourse function, only pronouns should be good anaphors for such antecedents, and thus for focused antecedents the discourse function of NP anaphors could be considered a factor that competes with the higher referent accessibility conferred by antecedent focus status. In other words, it could be true both that antecedent focus aids in anaphor resolution generally and that NP anaphors are inappropriate anaphors to refer back to focused elements. This is because these factors serve two different purposes, one relevant to antecedent identification and one relevant to discourse coherence. These factors may interact during NP anaphor processing, possibly diminishing any processing benefits of focused antecedent status for NP anaphors. The ILH’s predictions, especially for antecedent focus but also for inverse typicality effects, do not really take into account the discourse function of reference form, and so do not include it as a competing factor in NP anaphor resolution. If antecedent focus aids in NP anaphor resolution, this benefit is in opposition to dispreferences for coreference to focused antecedents by nonpronominal anaphors, including NP anaphors. Thus, the lack of consistent results for focused antecedents could be due to the interaction (and competition) between these factors. Neither Almor’s work nor our experiments presented here directly compared NP anaphor resolution and pronoun resolution, and so we cannot compare the interaction of antecedent focus and semantic distance for pronouns with that for NP anaphors, but this may be one direction to pursue.
Conclusions
We have discussed three factors in NP anaphor resolution: antecedent focus status, semantic overlap, and anaphor discourse function. The first two of these factors are incorporated into Almor’s (1999) ILH by appealing to the working-memory burden that greater semantic overlap between an antecedent and an anaphor imposes. The ILH predicts that when an antecedent is in focus greater semantic overlap should cause more difficulty in processing the anaphor, an inverse typicality effect. Almor (ibid.) presented results from one experiment showing such an effect, and in this paper we reviewed these results and presented three experiments that attempted to replicate them. Our results were mixed: one study showed the same effect, but the other two failed to replicate the inverse typicality effect.
The ILH also makes a second prediction: that under certain circumstances NP anaphors should be processed faster when they refer to focused antecedents than when they refer to non-focused ones. Almor found evidence in favor of this prediction in a study in which only typical antecedents were presented (1999, experiment 1), but he did not find the same results in a different experiment, in which atypical antecedent conditions were included (experiment 5). We examined this prediction, and, in keeping with the results of Almor’s experiment 5, did not find evidence that NP anaphors with typical antecedents were processed faster when the antecedents were focused. These results, along with Almor’s, suggest that effects of focus are sensitive in some way to the mix of antecedent types in the experiments. Our results also suggest that whereas greater semantic overlap plays a facilitatory role for NP anaphor processing with non-focused antecedents, it does not clearly do so in the case of focused antecedents. However, the inverse typicality effect found by Almor and by our experiment 1 is not robust, in the sense that it is not easily replicated. In fact, for focused antecedents in our experiments, the results appeared to depend on participant instructions and on comprehension questions. We suggest that the discourse function of the form of an anaphor is important and is undervalued in Almor’s approach, and that the role of antecedent focus with respect to NP anaphors is complicated in particular by the fact that NP anaphors are believed to have a discourse function of shifting focus rather than maintaining focus on a previously focused discourse element. The conflict between these two factors for NP anaphors may explain why the results for the focus conditions were inconsistent across experiments, although it does not explain, without further study, why each study obtained the results that it did.
Another possibility is that semantic overlap may be less important in the focus condition if the (NP) anaphor initially maps onto the focused antecedent without taking into account any overlap in semantic features. Semantic features could then confirm this mapping or, when the mapping is wrong, be used to find the correct antecedent. This would explain why, for non-focused antecedents, increased semantic overlap appears to benefit anaphor reading while it does not provide a clear effect for focused antecedents.
Acknowledgments
This work was supported by ESRC grant R000239362 (“Local Focus and NP Interpretation: Testing the Informational Load Hypothesis”) to the second author. The authors thank Julia Simner, who collaborated on experiment 1, and Amit Almor, who shared his materials from the experiments reported in Almor 1999.
Note
1. One experimental detail worth noting about their work is that the comprehension questions that they used contained an even mix of questions that required the anaphoric relationship to be identified (and information between the sentences integrated) and questions that did not. This will become relevant in the discussion of some of our own results that follow.
References
Almor, A. 1999. Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review 106 (4), 748–765.
Ariel, M. 1990. Accessing Noun-Phrase Antecedents. Routledge.
Arnold, J. 1998. Reference Form and Discourse Patterns. Ph.D. dissertation, Stanford University.
Birch, S. B., Albrecht, J. E., and Myers, J. L. 2000. Syntactic focusing structures influence discourse processing. Discourse Processes 30, 285–304.
Clancy, P. M. 1980. Referential choice in English and Japanese narrative discourse. In W. L. Chafe (ed.), The Pear Stories: Cognitive, Cultural, and Linguistic Aspects of Narrative Production, volume 3: Advances in Discourse Processes. Ablex.
Cowles, H. W. 2003. Processing Information Structure: Evidence from Comprehension and Production. Ph.D. dissertation, University of California, San Diego.
Cowles, H. W., and Garnham, A. 2003. Effects of antecedent focus, specificity and adjective modification in NP anaphor resolution. Paper presented at 16th Annual CUNY Conference on Human Sentence Processing, Cambridge, Massachusetts.
Cowles, H. W., and Garnham, A. 2005. Antecedent focus and conceptual distance effects in category noun-phrase anaphora. Language and Cognitive Processes 20 (6), 725–750.
Dell, G. S., McKoon, G., and Ratcliff, R. 1983. The activation of antecedent information during the processing of anaphoric reference in reading. Journal of Verbal Learning and Verbal Behavior 22, 121–132.
Duffy, S. A., and Rayner, K. 1990. Eye movements and anaphor resolution: Effects of antecedent typicality and distance. Language and Speech 33, 103–119.
Fletcher, C. R. 1984. Markedness and topic continuity in discourse processing. Journal of Verbal Learning and Verbal Behavior 23, 487–493.
Fossard, M., and Rigalleau, F. 2002. Cognitive aspects of pronominal anaphors: The case of the French hybrid demonstrative pronouns ‘celui-ci/celle-ci’. In A. Branco, T. McEnery, and R. Mitkov (eds.), Proceedings of the 4th Discourse Anaphora and Anaphor Resolution Colloquium.
Fossard, M., Garnham, A., and Cowles, H. W. 2003. Referential accessibility and anaphoric resolution: The case of the demonstrative noun-phrase ‘that N’. Poster presented at 9th Annual Conference on Architectures and Mechanisms for Language Processing, Glasgow.
Fox, B. A. 1984. Anaphora in popular written English narratives. In R. S. Tomlin (ed.), Coherence and Grounding in Discourse, volume 11: Typological Studies in Language. John Benjamins.
Garnham, A. 1984. Effects of specificity on the interpretations of anaphoric noun phrases. Quarterly Journal of Experimental Psychology 36A, 1–12.
Garnham, A. 1989. Integrating information in text comprehension: The interpretation of anaphoric noun phrases. In G. Carlson and M. Tanenhaus (eds.), Linguistic Structure in Language Processing. Kluwer.
Garrod, S. C. 1994. Resolving pronouns and other anaphoric devices: The case for diversity in discourse processing. In C. Clifton, L. Frazier, and K. Rayner (eds.), Perspectives on Sentence Processing. Erlbaum.
Garrod, S. C., and Sanford, A. J. 1977. Interpreting anaphoric relations: The integration of semantic relations while reading. Journal of Verbal Learning and Verbal Behavior 16, 77–90.
Garrod, S. C., and Sanford, A. J. 1985. On the real-time character of interpreting during reading. Language and Cognitive Processes 1, 43–61.
Gernsbacher, M. A. 1989. Mechanisms that improve referential access. Cognition 32, 99–156.
Gordon, P. C., Grosz, B. J., and Gilliom, L. A. 1993. Pronouns, names, and the centering of attention. Cognitive Science 17, 311–347.
Grice, H. P. 1975. Logic and conversation. In P. Cole and J. Morgan (eds.), Syntax and Semantics, volume 3: Speech Acts. Academic Press.
Grosz, B. J., Joshi, A., and Weinstein, S. 1995. Centering: A framework for modelling the local coherence of discourse. Computational Linguistics 21, 203–226.
Gundel, J. 1999. On different kinds of focus. In P. Bosch and R. van der Sandt (eds.), Focus: Linguistic, Cognitive and Computational Perspectives. Cambridge University Press.
Gundel, J., Hedberg, N., and Zacharski, R. 1993. Cognitive status and the form of referring expressions in discourse. Language 69, 274–307.
Kaiser, E., and Trueswell, J. 2003. Dividing up referential labor: Finnish pronouns and demonstratives in on-line processing. Paper presented at 9th Annual Conference on Architectures and Mechanisms for Language Processing, Glasgow.
Marslen-Wilson, W., Levy, E., and Tyler, L. K. 1982. Producing interpretable discourse: The establishment and maintenance of reference. In R. J. Jarvella and W. Klein (eds.), Speech, Place and Action: Studies in Deixis and Related Topics. Wiley.
Myers, J. L., Cook, A. E., Kambe, G., Mason, R. A., and O’Brien, E. J. 2000. Semantic and episodic effects on bridging inferences. Discourse Processes 29, 179–200.
Rayner, K., Kambe, G., and Duffy, S. A. 2000. The effect of clause wrap-up on eye movements during reading. Quarterly Journal of Experimental Psychology 53A, 1061–1080.
Sanford, A. J., and Garrod, S. C. 1989. What, when and how? Questions of immediacy in anaphoric reference resolution. Language and Cognitive Processes 4, S1235–S1262.
Sanford, A. J. 1985. Aspects of pronoun interpretation: Evaluation of search formulations of inference. In G. Rickheit and H. Strohner (eds.), Inference in Text Processing. North-Holland.
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. 1994. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33, 285–318.
Van Gompel, R. P. G., Liversedge, S. P., and Pearson, J. 2004. Antecedent typicality effects in the processing of noun phrase anaphors. In M. Carreiras and C. Clifton Jr. (eds.), The On-Line Study of Sentence Comprehension: Eyetracking, ERP, and Beyond. Psychology Press.
Vonk, W., Hustinx, L. G. M. M., and Simons, W. H. G. 1992. The use of referential expression in structuring discourse. Language and Cognitive Processes 7, 301–333.
13
Investigating the Interpretation of Pronouns and Demonstratives in Finnish: Going beyond Salience
Elsi Kaiser and John C. Trueswell
There is a general consensus that the more reduced an anaphoric expression is, the more salient its referent has to be. As Arnold (1998, p. 4) points out, “Loosely speaking, all researchers have observed that pronouns are used most often when the referent is represented in a prominent way in the minds of the discourse participants, but more fully specified forms are needed when the representation of the referent is less prominent.” In this paper, following Arnold and others, we take as our starting point the view that the term ‘salient’ means referents that are prominently represented in the mental models that the speakers and listeners construct in the course of a discourse. One could also think of salient referents as possessing a more activated cognitive status than non-salient referents (Gundel, Hedberg, and Zacharski 1993). The initial impetus for connecting reduced referential forms and salient referents is obvious, since “an expression that has little semantic content (e.g., a definite pronoun) or even none (a zero form) can contribute little or nothing to the identification process, and can only be used where identification of the referent is either straightforward or not an issue” (Garnham 2001, p. 55). However, many languages have referential forms that do not appear to differ in terms of the information they provide. For example, in English the pronoun it and the demonstratives this and that “are indistinguishable with respect to the description they provide for the intended referent (an inanimate object)” (Ariel 2001, p. 29). Nevertheless, it is widely agreed that demonstratives are used for less salient (i.e., less prominent) referents than pronouns. 
In fact, the claim that pronouns are used to refer to more salient referents than demonstratives is part of a larger accessibility/salience hierarchy of referential forms that has been proposed by a range of researchers, including Gundel, Hedberg, and Zacharski (1993), Givón (1983), and Ariel (1990).1 Part of the hierarchy is shown below in (1), where forms further to the left are used for more salient referents.
(1) null > unstressed/bound pronoun > stressed/independent pronoun > demonstrative > full NP . . .
The connection between reduced referential forms (e.g., ‘he’) and salient referents on the one hand, and fuller referential forms (e.g., ‘the man with the straw hat’) and non-salient referents on the other hand, is intuitively appealing and plausible. But what about different kinds of “uninformative” referential forms, such as pronouns (e.g., ‘it’) and demonstratives (e.g., ‘that’)? The assumption that pronouns are used for more salient referents than demonstratives does not follow directly from any constraints imposed by the identification process, since a full pronominal form or a demonstrative is not any more informative (in terms of the description of the referent) than a regular or reduced pronoun.2 One might thus ask whether differences in the referential properties of such “equally uninformative” forms involve the same kind of salience distinction that distinguishes ‘a man’ from ‘the man with the straw hat’, or whether something else is going on with the “uninformative” referential expressions. In this paper, in order to learn more about what differentiates the referential properties of pronouns and demonstratives, we investigate the referential properties of pronouns and demonstratives in Finnish, a language whose pronominal system has interesting informational properties that make it a revealing test bed for exploring saliency hierarchies. Before turning to the details of our research, let us begin by discussing previous research relevant to salience and referential form.
Previous Work
If the most reduced referential forms in a language are used to refer to the most salient referents, what makes a referent salient and endows it with a high degree of prominence? In other words, what makes a referent a good candidate for subsequent reference with a reduced form? This question has received a lot of attention in the literature, and a number of factors have been claimed to influence salience, including syntactic role, discourse status/information structure, word order, anaphoric form, discourse connectives, and verb semantics (see Arnold 1998 for an overview). Researchers differ as to whether they believe referent salience to be determined by a single factor or whether they regard it as a “compound” notion that arises from the interaction of multiple factors (see Arnold 1998 and Strube and Hahn 1999 for differing views). We will discuss this in more detail later in the paper. The experiments reported here focus specifically on the effects of grammatical role and information structure (as encoded by word order), while aiming to control for the other factors. In this section we briefly review work on the effects of, and the relationships between, grammatical role, information structure, and word order. The discussion in this section takes as its starting
point the prevailing assumption that the most reduced anaphoric form refers to the most salient referent. According to this approach, pronouns can be used to test which entity mentioned in the preceding discourse is the most salient. As we will see later, this assumption is not sufficient to capture the referential properties of Finnish pronouns and demonstratives, and we will propose a new way of thinking about the connection between reduced referential forms and salience. However, the discussion in this section is formulated in terms of the prevailing assumption, since much of the work that we summarize here relies on it. Previous research on the effects of syntactic role indicates that subjects are more salient than non-subjects (Chafe 1976; Brennan, Friedman, and Pollard 1987; Matthews and Chodorow 1988; Stevenson et al. 1994; McDonald and MacWhinney 1995). For example, Crawley and Stevenson (1990) conducted an experiment in which the participants’ task was to write continuations for stories ending with a pronoun prompt, e.g., “Shaun led Ben along the path and he . . .” The results showed that participants interpreted the pronoun as referring back to the subject significantly more often than they interpreted it as referring back to the object. It is important to note at this point that terms such as ‘subject/object’, ‘syntactic function’, and ‘grammatical role’ are slight oversimplifications, because there are cases in which the grammatical subject is not the agent of the sentence, for example in the case of psychological predicates with experiencer objects, such as ‘The movie frightened Lisa’. It has been claimed that in such cases the experiencer object, e.g., ‘Lisa’, is more salient than the subject, e.g., ‘the movie’ (Turan 1995, 1998; Di Eugenio 1998).
However, the experiments presented in this paper focus on sentences in which the subject is agentive, and thus in the subsequent discussion we can assume that there is no mismatch between syntactic and thematic roles. Other areas of research, including corpus work and reading-time studies, have also found that subjects are more salient than objects (Brennan, Friedman, and Pollard 1987; Gordon, Grosz, and Gilliom 1993; Stevenson and Urbanowicz 1995; Tetreault 2001). Importantly, sentence-continuation studies without pronoun prompts (Crawley and Stevenson 1990) showed the same results, i.e., participants were more likely to continue writing about the subject than about the object. In other words, the subject preference arises even in the absence of a “prompt pronoun.” Thus, there is a large body of data suggesting that a referent’s grammatical function is related to its salience. This brings up the question of why subjects are more salient than objects. In English, with its relatively rigid subject-object order, there are at least two possible explanations for the special prominence of subjects over objects: (i) that subjects are located linearly before objects and
(ii) that agentive subjects differ thematically/semantically from objects.3 To better understand the reasons behind the “subjecthood advantage” (and to see if it persists even in the absence of linear precedence), researchers have investigated languages with flexible word order. Existing work reveals considerable cross-linguistic variation, some researchers finding that referent salience is affected by word order and others finding that it is affected by grammatical role. For example, Rambow (1993) claims that in German word order in the Mittelfeld (the positions between the finite and the nonfinite verb) correlates with salience, entities mentioned first being more salient than those mentioned later (see Gernsbacher and Hargreaves 1988 and Gordon, Grosz, and Gilliom 1993 for order-of-mention effects in English). In contrast, Turan (1998) and Hoffman (1998) claim that in Turkish referent salience correlates with grammatical (or semantic) role and is not influenced by word order. Prasad and Strube (2000) make the same claim for Hindi. However, when discussing the effects of word order, we also need to keep in mind the information-structure reasons that trigger word-order variation. A constituent can occur in a noncanonical position for various reasons, e.g., because it has already been mentioned in the discourse or because it contrasts with something else in the discourse model (Kiss 1995; Vilkuna 1995). In fact, Rambow (1993) claims that in German topicalized word orders sometimes affect salience and sometimes do not. According to him, whether or not salience is determined by word order depends on the discourse function of the topicalized structure. In other words, information-structure considerations can play an important role.
In detailed studies of the relation between information structure and salience, Hoffman (1998) and Strube and Hahn (1996, 1999) address the relationship between discourse-driven or context-driven word-order variation and salience, but reach rather different conclusions. Hoffman’s corpus-based work on Turkish is done within the framework of Centering Theory, a model of the local-level component of the attentional state in discourse that looks at how focus of attention, choice of referring expression, and local coherence within a discourse segment are connected (Grosz, Joshi, and Weinstein 1995, pp. 4–5). According to Centering Theory, the entities mentioned in a sentence (the “centers”) are ranked according to their salience in a so-called Cf list. The Cf ranking depends on various things, including syntactic, morphological, and prosodic criteria, and appears to be language-specific (Prince 1994). For English, the ranking is usually assumed to be based on grammatical roles and to have the order subject >> direct object >> indirect object (Walker, Joshi, and Prince 1998, p. 7). This ordering fits with findings by Chafe, by Crawley and Stevenson, and by others that subjects are more salient than objects.
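The grammatical-role ranking assumed for English can be illustrated with a simple sort. The following Python sketch is illustrative only and is not taken from the Centering literature: the `Center` record, the `ROLE_RANK` table, and the function `rank_cf` are hypothetical names, and the sketch covers only the three roles named above.

```python
from dataclasses import dataclass

# Assumed ranking for English: subject >> direct object >> indirect object.
# Lower number = more salient. The table itself is a hypothetical encoding.
ROLE_RANK = {"subject": 0, "direct_object": 1, "indirect_object": 2}


@dataclass(frozen=True)
class Center:
    """A discourse entity mentioned in an utterance (hypothetical record)."""
    name: str
    role: str  # one of the keys of ROLE_RANK


def rank_cf(centers):
    """Order centers from most to least salient by grammatical role.

    sorted() is stable, so centers sharing a role keep their order of mention.
    """
    return sorted(centers, key=lambda c: ROLE_RANK[c.role])


# "Shaun led Ben along the path": the subject outranks the object
# regardless of the order in which the centers are listed.
utterance = [Center("Ben", "direct_object"), Center("Shaun", "subject")]
cf_list = rank_cf(utterance)
print([c.name for c in cf_list])
```

Under this assumption the highest-ranked center (here the subject) would be the preferred antecedent for a following pronoun; a language-specific ranking would amount to replacing the sort key.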
Hoffman finds that Turkish word-order variation is guided by the discourse status of the referents (i.e., whether they have already been mentioned in the discourse or whether they are being mentioned for the first time). In other words, in Turkish word order is used for purposes of information structuring (IS); it indicates how the utterance is connected to the larger discourse context. More specifically, if we think of the hearer’s knowledge store as a filing system and each discourse referent as a file card that contains information about that entity and its connection to other referents (Heim 1982, 1983), then the information structure carried by word order tells us which file card to update. Thus, according to Hoffman, there is a tight connection between information structure and word order. However, she finds in her corpus work that the Cf ranking in Turkish is not influenced by word order and in fact depends on thematic role. She concludes:

[Reference resolution] and information structure have different purposes in discourse processing. . . . The sentence topic instructs the hearer to go to a certain file card in order to update it with the information in the sentence; it does not tell the hearer whether that file card is in the center of attention and makes no predictions about what will be talked about in the next sentence. While the referential form of NPs indicates how accessible file cards are, the information structure indicates what to do with the file cards with respect to information updating. [Reference resolution], on the other hand, is used to link each utterance to the prior discourse by keeping track of which file card is at the center of attention. (Hoffman 1998, p. 270)
Hoffman’s view that reference resolution and information structure are separated from each other mirrors Vallduví’s (1993, p. 77) claim that the topic of a sentence (in his terminology, the link) has the function of “pointing to a given address,” i.e., telling the hearer which referent the sentence is providing information about, and that the referential form of the link (pronoun, full NP, etc.) has a separate function. According to these approaches, the fact that word order does not affect the referential properties of anaphoric expressions is to be expected: the information-structure factors driving word-order variation are independent of reference resolution. In contrast, Strube and Hahn (1996, 1999) do not regard information structure and reference resolution as separate. Instead, they propose an anaphor-resolution algorithm for German and other free-word-order languages that uses Prince’s (1992) notion of hearer status (i.e., whether the entity is old or familiar to the hearer) to rank the centers in the Cf list. According to their algorithm, “hearer-old discourse entities are ranked higher than hearer-new discourse entities” (1999, p. 320). Strube and Hahn do not focus specifically on word order, but it follows from their proposal that if word-order variation is guided by information-structure factors, in particular hearer status, then it will
influence salience. (For a somewhat different approach that also incorporates the claim that information structure influences salience, see Hajičová and Vrbová 1982 and Hajičová, Kuboň, and Kuboň 1990.) In sum, two main claims regarding the relation between salience and word order have been put forth in the literature: (i) that anaphor resolution and the information-structure factors driving word-order variation are separate from each other, and salience is determined by grammatical role (e.g., Hoffman 1998) and (ii) that information structure determines referent salience and hence also anaphor resolution (e.g., Strube and Hahn 1996, 1999). According to Hoffman, in Turkish discourse-driven word-order variation does not influence salience, because anaphor resolution is not dependent on information structure. But according to an approach along the lines of Strube and Hahn, word-order variation that is driven by information-structure factors has a crucial impact on anaphor resolution.
Finnish
There are two main typological characteristics of Finnish that make it a good language for investigating the relation between information structure and reference resolution. First, Finnish displays discourse-driven word-order variation, which enables us to investigate the effects of word order and grammatical role in a language that is historically unrelated to either German or Turkish. Second, standard Finnish has two kinds of third-person anaphors that can be used to refer to humans (the gender-neutral pronoun hän ‘s/he’ and the demonstrative tämä ‘this’), which, given the claims of accessibility-hierarchy-based approaches, can be used to test the salience of referents. Previous work on the relation between information structure and reference resolution has not methodically investigated the referential properties of different anaphoric forms. In the next two subsections, we will take a closer look at Finnish word order and anaphoric forms.
Discourse Factors and Word Order
Finnish has no definite or indefinite article.4 The canonical word order is SVO, but all six permutations of S, V, and O are grammatical in the appropriate contexts (Vilkuna 1989, 1995). In this paper, we focus on the information-structure properties of the SVO-OVS variation. The choice between SVO and OVS is guided by whether or not the arguments have been mentioned in the preceding discourse (Hiirikoski 1995; Chesterman 1991; see also Helasvuo 2001 on pronominal subjects). Let us begin by considering subjects. Subjects in a noncanonical, postverbal position introduce discourse-new referents, as in
(2a).5 In contrast, preverbal subjects usually refer to entities that have already been mentioned in the discourse, as in (2b). In a discourse-initial “all new” utterance, a preverbal subject NP can also be interpreted as discourse-new.
(2) a. Pylvääseen nojasi solakka tummahiuksinen nainen (Remes 1997, p. 369)
column-ILL leaned slim-NOM dark-haired-NOM woman-NOM
‘A slim, dark-haired woman leaned against the column.’
b. Nainen puhui hänen kanssaan saksaa. (Remes 1997, p. 343)
woman-NOM spoke he-GEN with German-PART
‘The woman spoke German with him.’
Let us now consider the pattern for objects. An object in a noncanonical preverbal position in an OVS sentence, as in (3a), is discourse-old information. Objects in their canonical postverbal positions can be interpreted as new or old information, as shown in (3b).
(3) a. Tiedon julkaisi ‘Nature’.
Information-ACC published Nature-NOM
‘Nature [a science magazine] published the information.’
(from a news story published 3/20/04 on the website of the science magazine Tiede)
b. Tyttö osti auton.
girl-NOM bought car-ACC
‘The girl bought a/the car.’
Anaphoric Forms of Finnish
In Finnish, third-person human referents can be referred to with hän ‘s/he’ or tämä ‘this’. The pronoun hän has been claimed to refer to the most important character in a particular situation or context (Vilppula 1989) or to the character in the foreground, the most central character (Kalliokoski 1991). According to Saarimaa (1949), the subject of a sentence is more in the foreground than other constituents and thus hän tends to refer to entities realized in subject position. Supporting evidence for this claim comes from Halmari (1994), who finds in her corpus study that hän is usually used to refer back to a preceding subject.6 An example is given in (4). The demonstrative tämä ‘this’ has multiple uses in Finnish. Just as in English, it can function as a demonstrative pronoun or as a discourse deictic (Etelämäki 1996), as in ‘This is my sister’ and ‘Julius pushed Julianna. This surprised her.’ In addition, and in contrast with English, tämä can also function as an anaphor referring to third-person human referents. In this paper we focus on this particular use of tämä. Whereas hän has been observed to refer to
Kaiser and Trueswell
salient referents, tämä refers to characters in the background (Varteva 1998). According to Sulkala and Karjalainen (1992, pp. 282–283), tämä is “used to indicate the last mentioned out of two or more possible referents.” An example is given in (5). Crucially, we cannot replace hän in (4b) with tämä, as shown in (4c). This is because the use of tämä in these kinds of contexts hinges on the presence of a second possible referent, and using tämä to refer to the only salient referent results in infelicity.7
(4) a. Koskela marssitti joukkueensa parakin eteen.
       Koskela-NOM marched troops-ACC-his barrack-GEN in-front-of
       ‘Koskela had his troops march to the front of the barracks.’
    b. Kotvan hän seisoskeli aivan kuin miettien miten aloittaisi. (Linna 1954/1999, p. 10)
       moment-GEN he-NOM stood exactly as-if thinking how start-CONDITIONAL
       ‘He stood there for a moment, as if thinking about how he should start.’
    c. # Kotvan tämä seisoskeli aivan kuin miettien miten aloittaisi.
(5) a. Koskela alkoi tuijottaa laulavaa vänrikkiä.
       Koskela-NOM started to-stare singing-PART second-lieutenant-PART
       ‘Koskela started to stare at the singing second lieutenant.’
    b. Tämä jatkoi aluksi lauluaan, mutta alkoi sitten vaivautua . . . (Linna 1954/1999, p. 285)
       This-NOM continued in-beginning song-his, but started then to-be-bothered
       ‘At first he (the second lieutenant) continued singing, but then started to be a bit bothered . . .’
However, what about the effects of word order (and the information structure it encodes) on the referential properties of hän and tämä? Is Finnish similar to German in that the information structuring of the preceding sentence, as reflected in the word order, will influence what hän and tämä are used to refer to (Strube and Hahn 1996, 1999)? Or does Finnish resemble Turkish, in which information-structure word-order variation has no effect on reference resolution, and anaphor resolution is sensitive to grammatical role (Hoffman 1998)?
The existing work on hän and tämä does not provide a conclusive answer to this question. According to Saarimaa (1949), grammatical role is what matters. He states that tämä refers to a recent non-subject, and hän to a subject, presumably regardless of word order. However, the validity of this claim cannot be evaluated by means of the existing corpus studies on hän and tämä in standard
Pronouns and Demonstratives in Finnish
Finnish (Halmari 1994; Kaiser 2000), owing to the difficulties of finding sufficiently large numbers of the relevant types of examples in a non-tagged corpus. To solve this problem, we decided to conduct psycholinguistic experiments in order to investigate how word order and information structure affect the referential properties of the pronoun hän and the demonstrative tämä. In this paper, we discuss the results of two sentence-completion studies and an eye-tracking experiment.
Predictions
Before turning to the sentence-completion experiments, it is worthwhile to spell out in detail some predictions that we can make about the presence or absence of the effects of information structure and word order on referent salience and on the referential properties of hän and tämä. If we think back to the claims made by Hoffman (1998), Vallduví (1993), and Strube and Hahn (1996, 1999) concerning the connections between reference resolution and information structure, we can use them to formulate two main predictions:
(i) If we follow Strube and Hahn, we predict that word order, and in particular the information structure it encodes, determines the referential properties of hän and tämä. If we assume that old information is more salient than new information, and that pronouns are ranked higher than demonstratives on the hierarchy of referential forms, we predict that hän refers to entities that are old information and tämä to entities that are new information,8 regardless of the grammatical role of those entities. In other words, focusing on the SVO-OVS alternation in Finnish, we predict that hän and tämä will show sensitivity to word order but not grammatical role.
(ii) If we follow Hoffman (1998) and Vallduví (1993), we predict that since the process of anaphor resolution is separate from the domain of information structure, hän and tämä should not show any sensitivity to word order. Instead, as Hoffman (1998) and Turan (1998) found for Turkish, we predict that salience is determined by the grammatical role of the referent, with subjects ranked above objects. Thus, focusing on the SVO and OVS orders in Finnish, we predict that hän will refer to an entity whose most recent antecedent was in subject position9 and tämä to entities whose most recent antecedent was in object position, regardless of information structure and word order.
Experiments
In this section we discuss three experiments which investigate the effects of word order/information structure and grammatical role on the referential properties of hän ‘s/he’ and tämä ‘this’: two sentence-completion studies and an
eye-tracking experiment. In the first sentence-completion experiment, the critical SVO and OVS sentences were presented without a discourse context, but in the second experiment they were embedded in supportive discourse contexts. In other words, the preverbal argument is explicitly discourse-old (previously mentioned) in experiment 2, whereas in experiment 1 discourse status is signaled only by word order and is not supported by context. The third experiment uses eye-tracking methodology to investigate the incremental aspects of reference resolution and to see whether the findings of the sentence-completion studies can be replicated in a more on-line, incremental setting. In all three experiments, both the subject and object in the SVO and OVS sentences were full NPs, and thus in this paper we focus primarily on the effects of grammatical role, word order, and information structure. There is also a brief discussion of the effects of the referent’s anaphoric form in the general discussion section (for research investigating the role of anaphoric form see Kaiser 2003, 2005), but unfortunately an investigation of the effects of other potentially salience-influencing factors is beyond the scope of this paper.
Experiment 1: Investigating hän ‘s/he’ and tämä ‘this’ in a Sentence-Completion Study
This experiment tested the effect of word order and grammatical role on the referential properties of hän and tämä. The stimuli consisted of written SVO and OVS sentences, each of which was followed by the first word of the next sentence, either hän ‘s/he’ or tämä ‘this.’ Anaphor type and word order were crossed to create four conditions: (a) SVO/Hän, (b) OVS/Hän, (c) SVO/Tämä, (d) OVS/Tämä. Thirty-two native Finnish speakers participated in this experiment. Each participant was asked to complete 38 items, whose order was randomized: 8 critical items and 30 fillers. The nouns used for the subject and object in the critical items were professions or other kinds of “occupational roles” (e.g., cook, teacher, author). We used these semantically rich referents to make participants’ continuations easier to code. In addition, to control for possible verb-bias effects, all verbs were action/agent-patient verbs (as defined by Stevenson et al. 1994). Participants’ continuations were coded according to which of the referents in the preceding sentence (subject or object) the participants chose as the referent of the pronoun. When it was not clear which referent the participant had interpreted as being the referent of the pronoun or demonstrative, the continuation was marked as “unclear.” In addition, if the demonstrative tämä was not used as an anaphor for one of the characters mentioned in the preceding sentence (e.g., if it was used as a discourse deictic, as in ‘This was rather strange’), the continuation was coded as a “demonstrative” use.
Table 13.1
To which referent does the anaphor refer?
Interpretation   SVO/Hän   OVS/Hän   SVO/Tämä   OVS/Tämä
Subject          62.5%     61%       1.5%       33%
Object           22%       25%       83%        37%
Demonstrative    0         0         12.5%      16%
Unclear/other    15.5%     14%       3%         14%
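As a quick consistency check on table 13.1, the following Python sketch transcribes the percentages, confirms that each condition's column sums to 100, and computes how much the share of a given interpretation shifts between SVO and OVS order for each anaphor. The names `TABLE_13_1`, `column_total`, and `word_order_shift` are ours and purely illustrative. The shift is a descriptive difference only; it is not the statistical test used in the chapter.

```python
# Percentages transcribed from table 13.1 (experiment 1).
TABLE_13_1 = {
    "SVO/Hän":  {"subject": 62.5, "object": 22.0, "demonstrative": 0.0,  "unclear": 15.5},
    "OVS/Hän":  {"subject": 61.0, "object": 25.0, "demonstrative": 0.0,  "unclear": 14.0},
    "SVO/Tämä": {"subject": 1.5,  "object": 83.0, "demonstrative": 12.5, "unclear": 3.0},
    "OVS/Tämä": {"subject": 33.0, "object": 37.0, "demonstrative": 16.0, "unclear": 14.0},
}


def column_total(condition):
    """Sum of all coding categories for one condition (should be 100)."""
    return sum(TABLE_13_1[condition].values())


def word_order_shift(anaphor, response="subject"):
    """Percentage-point change in `response` interpretations from SVO to OVS.

    Illustrative descriptive measure (our assumption), not the authors' analysis.
    """
    return TABLE_13_1[f"OVS/{anaphor}"][response] - TABLE_13_1[f"SVO/{anaphor}"][response]


for condition in TABLE_13_1:
    print(condition, column_total(condition))

for anaphor in ("Hän", "Tämä"):
    print(anaphor, word_order_shift(anaphor, "subject"), word_order_shift(anaphor, "object"))
```

On these figures the subject share for hän barely moves between the two orders, whereas for tämä it moves by some thirty percentage points, which is the asymmetry the results section describes.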
Results
An analysis of the continuations reveals that hän and tämä are affected in different ways by the word-order manipulation. The pronoun hän ‘s/he’ shows no sensitivity to word order, and is interpreted as referring to the subject of both SVO and OVS sentences. In the SVO/Hän condition, the pronoun was interpreted as referring to the preceding subject in 62.5 percent of the cases and to the object in 22 percent of the cases. In the OVS/Hän condition, there were 61 percent subject interpretations and 25 percent object interpretations. In contrast, the demonstrative tämä is very sensitive to the change in word order. In the SVO/Tämä condition, tämä has a strong preference for the object (83 percent object interpretations), but in the OVS/Tämä condition tämä is fairly evenly divided between subject interpretations (33 percent) and object interpretations (37 percent). (See table 13.1.) Overall, there are significant effects of anaphor type and word order, as well as a significant interaction, on reference to both subjects and objects (p’s < 0.05). Thus, whether an anaphoric expression is interpreted as referring to the preceding subject or to the preceding object depends on whether the anaphor is hän or tämä and whether the word order of the preceding sentence is SVO or OVS. The interaction between word order and anaphor type also shows that one has an effect on the other. More specifically, the effect of word order depends on the anaphor: the pronoun hän does not display sensitivity to word order, but with the demonstrative tämä changing the word order has a big effect on the referential patterns.
Discussion
Let us consider again the predictions discussed above, to see how they compare to the actual results. One prediction, based on work by Strube and Hahn (1996, 1999), was that word order, and in particular the information structure it encodes, determines salience and thereby also determines which referents hän and tämä refer to. In other words, we predicted that hän and tämä will
show sensitivity to word order but not to grammatical role. However, this prediction clearly does not match what we saw for the pronoun hän, which was interpreted as referring to the preceding subject regardless of word order. In fact, it looks like hän patterns as predicted by Hoffman’s (1998) claim that the process of anaphor resolution is separate from the domain of information structure. Recall that we predicted on the basis of this approach that hän refers to subjects and tämä to objects, regardless of information structure and word order. This fits with the data we obtained for hän, but the demonstrative tämä does not fit this prediction. Let us look at the referential properties of tämä in more detail, in order to see the implications for the two predictions. We saw that tämä prefers postverbal objects with SVO order and is divided between subject and object in OVS order. This pattern does not fit either of the predictions; in fact, it seems that with tämä we are dealing with additive effects of both word order and grammatical role. In other words, when there is a potential referent that is both postverbal and a non-subject, tämä shows a strong preference for that referent (as we saw in the SVO/Tämä condition). However, if being postverbal does not coincide with being a non-subject, tämä is split between subject and object (as in the OVS/Tämä condition). In sum, tämä appears to be sensitive to both word order and grammatical role. As a whole, the results of experiment 1 show that for hän only grammatical role is relevant and that for tämä both grammatical role and word order are relevant. These results pose challenges for any approach which assumes (i) that pronouns are used to refer to highly salient referents and demonstratives for less salient referents and (ii) that hän and tämä are sensitive to the same kind of salience (whether it be determined by a single factor, such as grammatical role or information structure, or by a set of factors).
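The comparison made in this discussion can be laid out mechanically. The sketch below encodes the two predictions as functions from condition to predicted referent, reads the observed preference off the subject and object percentages in table 13.1, and reports which conditions each prediction fits. All function names are ours, and the 10-point `margin` used to call a pattern "split" is an illustrative threshold that we assume for the sketch; it does not come from the chapter or its statistical analyses.

```python
# Grammatical role of the preverbal and postverbal argument in each word order.
PREVERBAL = {"SVO": "subject", "OVS": "object"}
POSTVERBAL = {"SVO": "object", "OVS": "subject"}

# Subject/object percentages from table 13.1 (experiment 1).
OBSERVED = {
    ("SVO", "hän"): {"subject": 62.5, "object": 22.0},
    ("OVS", "hän"): {"subject": 61.0, "object": 25.0},
    ("SVO", "tämä"): {"subject": 1.5, "object": 83.0},
    ("OVS", "tämä"): {"subject": 33.0, "object": 37.0},
}


def predict_information_structure(order, anaphor):
    """Prediction (i): hän picks the old (preverbal) referent, tämä the new (postverbal) one."""
    return PREVERBAL[order] if anaphor == "hän" else POSTVERBAL[order]


def predict_grammatical_role(order, anaphor):
    """Prediction (ii): hän picks the subject, tämä the object, whatever the word order."""
    return "subject" if anaphor == "hän" else "object"


def observed_preference(order, anaphor, margin=10.0):
    """Preferred referent, or 'split' if the two shares differ by less than `margin`.

    `margin` is a hypothetical threshold chosen for illustration only.
    """
    cell = OBSERVED[(order, anaphor)]
    diff = cell["subject"] - cell["object"]
    if abs(diff) < margin:
        return "split"
    return "subject" if diff > 0 else "object"


def fit(prediction):
    """Map each condition to True if the prediction matches the observed preference."""
    return {cond: prediction(*cond) == observed_preference(*cond) for cond in OBSERVED}


print("information structure:", fit(predict_information_structure))
print("grammatical role:     ", fit(predict_grammatical_role))
```

Under these assumptions the grammatical-role prediction fits both hän conditions and SVO/Tämä, the information-structure prediction fails for OVS/Hän, and neither prediction fits OVS/Tämä, where the observed pattern is split.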
Counterintuitively, the results of experiment 1 show that not all referential forms within a single language are sensitive to the same salience-influencing factor(s) to the same degree. But does this mean that we must abandon the notion of salience completely? The most parsimonious explanation might be that hän and tämä cause comprehenders to preferentially probe different types of representations when trying to locate the most likely referent in their mental model of the discourse. We would like to suggest that, when a comprehender processes the sentence containing hän or tämä, two different representations of the current linguistic input remain activated and are relevant for computing reference:
(i) the syntactico-semantic representation of the prior sentence, which we assume includes information about grammatical and thematic roles
(ii) the comprehender’s mental model of the preceding discourse, which we assume includes information about the situation or event being described, as well as the entities involved in it.
The mental model is not a representation of a text (Glenberg, Kruley, and Langston 1994), but rather a discourse model that the comprehender has constructed on the basis of the preceding discourse (Johnson-Laird 1983; Van Dijk and Kintsch 1983). The idea that both of these representations are available and needed for processing is by no means new. For example, if we accept the claim of Fiengo and May (1994) that the elided verb phrase in VP-ellipsis is a syntactic copy of the antecedent, it follows that it is necessary to hold on to the syntactico-semantic representation of the preceding sentence at least temporarily (see also Shapiro and Hestvik 1995; Shapiro, Hestvik, Lesan, and Garcia 2003, for experimental evidence). The importance of various kinds of mental models has been investigated by a range of researchers, including Van Dijk and Kintsch (1983), Glenberg, Kruley, and Langston (1994), and Johnson-Laird (1983). Furthermore, the two levels we hypothesize to be relevant for computing reference for hän and tämä are also related to the claims put forth by Hoffman (1998) regarding the importance of grammatical role, and to the claims of Strube and Hahn (1996, 1999) concerning the role of information structure. We will discuss these connections more in the General Discussion section. We would like to suggest that at each of these levels of representation the relevant entities can be ranked in terms of their salience. More specifically, in light of existing research, we conclude that at the syntactico-semantic level agentive subjects are more salient than non-agentive objects and oblique arguments.
At the level of the mental representation of the discourse, we hypothesize, the salience of referents is influenced by a range of factors, especially information-structure factors such as old versus new information (Strube and Hahn 1996, 1999). This approach enables us to capture the referential properties of hän and tämä straightforwardly without having to give up the notion of salience. We claim that a comprehender, when interpreting the pronoun hän, preferentially looks to the syntactico-semantic level, and, when interpreting the demonstrative tämä, looks to the mental discourse model that she has constructed. It follows from this that hän is sensitive to grammatical role and that tämä is sensitive to both discourse status (encoded here by word order) and grammatical role. Furthermore, the subject preference that we observe for hän is compatible with the claims that pronouns prefer highly salient referents, and the preference that tämä exhibits for discourse-new non-subjects also fits with claims that demonstratives prefer less salient referents. In sum, by assuming that hän
and tämä prompt comprehenders to attend to different sorts of representations, we can account for the results of experiment 1.10 We explore this hypothesis further in experiment 2 by examining how the interpretation of hän and tämä is influenced by stronger manipulations of discourse status.
Experiment 2: Additional Effects of Discourse Status on hän and tämä
This experiment addresses a specific question left open by experiment 1: whether the same results are obtained if we situate the SVO and OVS sentences in an appropriate discourse context. In other words, what happens if we strengthen the discourse-oldness of the preverbal argument? As was discussed earlier, SVO-OVS variation in Finnish is driven by the discourse status of the arguments, with discourse-old arguments preceding discourse-new arguments. In SVO order the subject is usually discourse-old, and in OVS order the object is discourse-old. However, in experiment 1 we presented the sentences without any preceding context. As a result, the discourse statuses of the arguments and the information-structure contribution of the sentences were not presented relative to any real discourse context; rather, they were only signaled by the word order. Remember that in experiment 1 hän ‘s/he’ showed a strong grammatical-role effect and clearly preferred subjects over objects, whereas tämä ‘this’ demonstrated sensitivity to both word order and grammatical role, which we hypothesized to be a result of hän and tämä causing comprehenders to preferentially attend to different types of representations (a syntactico-semantic level and the mental-discourse-model level, respectively) when trying to locate the most likely referent in the preceding discourse. If this is the case, the OVS/Tämä configuration should be most affected by the changes between experiment 1 and experiment 2. We expect that making the preverbal NP explicitly discourse-old in experiment 2 will increase the salience of this referent in the comprehender’s mental model of the discourse. As a result, the salience contrast between the preverbal and postverbal NPs should be greater, and this should influence the interpretation of the demonstrative tämä, which is proposed to be sensitive to this factor.
Thus, tämä following the OVS sentence should now show a greater preference for the postverbal, less salient subject, rather than showing no preference. (For details see Kaiser and Trueswell 2008.) In experiment 2, as in experiment 1, the participants’ task was to provide natural-sounding completions for sentence fragments. In contrast to experiment 1, the critical SVO or OVS sentence was now preceded by a brief, two-sentence context mentioning the preverbal argument of the critical sentence (i.e., S in SVO, O in OVS). The postverbal argument (i.e., O in SVO, S in OVS) was introduced for the first time in the critical sentence. Thus, in experiment 2 both SVO and OVS sentences are felicitous, because in both orders the preverbal argument is discourse-old and the postverbal argument is discourse-new. The nouns used for the subject and object were occupational roles, and the verbs were agent-patient verbs. We had the same four conditions as in experiment 1: SVO/Hän, OVS/Hän, SVO/Tämä, OVS/Tämä. Sixteen native Finnish speakers participated in the experiment. The number of critical items was increased to 16, and there were 32 filler items.
Results
The results of this experiment complement those of experiment 1. As in experiment 1, hän continues to show a strong preference to refer to the grammatical subject of the previous sentence regardless of its position in the sentence and despite the fact that this experiment now establishes one referent as discourse-old. (SVO/Hän and OVS/Hän both show the same pattern of continuations, which can be summarized as follows: >60 percent subject interpretations, <15 percent object interpretations, <25 percent unclear or other.) This suggests that hän is indeed influenced by what is in the most salient grammatical position (i.e., the subject position). In the SVO/Tämä condition, tämä has a very strong preference to refer to the postverbal argument of the preceding sentence (88 percent object interpretations, 0 percent subject interpretations; the remainder of continuations were coded as demonstrative or unclear or other), just as in experiment 1. In contrast, in the OVS/Tämä condition the preferences for tämä differ from those found in experiment 1. In experiment 2, tämä shows a significant preference for the postverbal subject (there are almost five times as many subject interpretations as object interpretations). Recall that with OVS order in experiment 1 tämä showed no clear preference for one argument over the other (33 percent subject interpretations vs. 37 percent object interpretations). This difference between experiments 1 and 2 suggests that tämä does indeed prompt comprehenders to consider the mental model that they computed on the basis of the preceding discourse, in particular the information-structure properties of the referents. Overall, as in experiment 1, there are significant effects of anaphor type and word order (as well as a significant interaction) on reference to both subjects and objects.
The referential properties of tämä in experiment 2 can be summarized by saying that tämä displays a preference for the postverbal referent over the preverbal referent in both SVO and OVS orders. Nevertheless, this pattern is more pronounced with SVO than with OVS order. In the OVS condition, in more than one-third of the continuations tämä was not used as a third-person anaphor, but rather in some other way, e.g., as a discourse deictic (as in ‘This was
a strange thing to do’). A potential explanation for this pattern is discussed below. For more details concerning the results of experiment 2, see Kaiser 2003.
Discussion
A comparison of experiments 1 and 2 shows that the results are basically the same, except for the OVS/Tämä condition. In experiment 1 OVS/Tämä shows no clear preference for either argument, but in experiment 2 this condition reveals a significant preference for the postverbal subject. This difference is presumably due to the addition of a discourse context, thus supporting the information structure signaled by the word order. The observed difference provides further support for our claim that tämä causes comprehenders to probe a level of representation at which referent salience is influenced by information structure (which, we hypothesize, is the comprehender’s general mental model of the discourse) when trying to locate the most likely referent for tämä. However, the effects of grammatical role that we saw for tämä in experiment 1 have not completely disappeared, as is indicated by the differences in experiment 2 between the SVO/Tämä condition (with a postverbal, discourse-new object) and the OVS/Tämä condition (with a postverbal, discourse-new subject).11 Tämä has a stronger postverbal preference in the SVO/Tämä condition than in the OVS/Tämä condition, and the OVS/Tämä condition also prompted a much higher number of “demonstrative” continuations.12 The lack of a clear subject preference in the OVS/Tämä condition suggests that grammatical role also plays a role in influencing referent salience at the level of the mental-discourse model, with subjects being more salient than objects. Given our claim that tämä prefers low-salience referents, the continuation patterns suggest that a postverbal, discourse-new subject is not as well suited for tämä as a postverbal, discourse-new object. The high number of demonstrative continuations in the OVS/Tämä condition may well be a result of neither the preverbal object’s nor the postverbal subject’s being of sufficiently low salience to “qualify” as a good referent for tämä.
As a whole, the results of experiment 2 corroborate the conclusion we reached, based on experiment 1, that not all referential forms within a single language are sensitive to the same salience-influencing factor(s) to the same degree. In addition, the data from experiment 2 support the hypothesis we formulated using the data from experiment 1, namely that the pronoun hän prompts comprehenders to consider a particular level of representation (the syntactico-semantic level) and the demonstrative tämä prompts them to look at a different level of representation (the comprehender’s mental model of the discourse). This, of course, raises a question: Why should different referential forms trigger consideration of different levels of representation? We
address this question in the General Discussion section after experiment 3, where we also discuss the implications of our findings for existing theories of reference resolution.
Experiment 3: Eye-Tracking Study
In this subsection we report the results of a study that investigates people’s interpretation of anaphoric expressions in a highly incremental, on-line manner, by following their eye movements. This experiment has two aims. First, we would like to find out whether the patterns observed in experiments 1 and 2 also occur in real-time, on-line processing, or whether they are delayed effects that take longer to kick in. Second, in light of the findings that the referential properties of hän and tämä are not reducible to a common single factor, we would like to find out how our hypothesis about the two different kinds of representations, namely those at the syntactico-semantic level and those at the level of the mental discourse-model, fits with data from on-line processing.
Method
Sixteen native Finnish-speaking participants, most of them students at the Helsinki University of Technology or at the University of Helsinki, took part in this experiment. This experiment used a paradigm of eye movement during listening. Participants saw large color pictures (made with clip art) and listened to short pre-recorded stories about these pictures. They were told that in some cases the story might not match the picture, and that in such cases their task was to correct (verbally) the story according to what they saw in the picture (e.g., in example (6) below, the fact that neither character is standing next to a photocopier). Typically, the pictures contained two to four characters (people or animals) and other objects that made up a coherent scene. The stories the participants heard as they viewed the pictures described actions carried out by characters in the picture. The sound files were recorded using the Syntrillium CoolEdit program on a laptop PC. The same female native Finnish speaker’s voice was used for all sound files. A digital camera was used to record participants’ eye movements during the experiment. On each trial, the participant was shown a large color picture, and above this picture was a Sony DVcam digital camcorder with audio-lock recording. The DVcam recorded the participant’s face and eyes, the auditory stimuli, and the participant’s spoken responses. Later, we analyzed participants’ eye movements, using a digital VCR with jog-shuttle control.
Materials and Coding
There were 16 critical items (picture-story pairs) in the experiment and 32 filler items. The picture in every critical item contained two human characters, positioned
on opposite sides of the picture, one on the left and one on the right (e.g., the secretary and the businessman in (6)). Each story began with an opening sentence that described what a character called Liisa was doing. Then, in the second sentence, a new referent was introduced (here, a secretary). This referent was mentioned again in the third sentence, which had SVO or OVS order. In both SVO and OVS conditions, the preverbal argument was discourse-old, as it was mentioned in the preceding sentence. Thus, both SVO and OVS sentences were felicitous. The critical sentence was the final sentence, which began with the anaphoric expression (hän or tämä). This sentence was incorrect with respect to both of the characters in the picture, since both were standing next to something but neither was standing next to a photocopier. This was intentional, since we did not want to bias participants towards either interpretation. The second clause of this sentence mentioned some other objects present in the picture, in order to encourage participants to look away from the two mentioned characters. The stimuli were constructed such that the entities mentioned in this second clause were not potential referents for hän or tämä, because of number and/or animacy.
(6) Liisa steps into the main office of a big company. She notices a secretary who is talking on the phone.
    After a moment the secretary-SUBJ criticizes a businessman-OBJ who has just walked in.
    After a moment the secretary-OBJ criticizes a businessman-SUBJ who has just walked in
    . . . while the printers are churning out the day’s reports.
    S/he // This is standing next to the photocopier.
All stories started with Liisa being mentioned in the first sentence. We included her in the story because, in order to refer felicitously to the discourse-old referent in the SVO or OVS sentence (here, secretary) using a full NP, we needed to have another entity present in the story.
The videotapes of the participants’ eyes were coded frame by frame for whether the participant was looking to the left, to the right, to the middle, or elsewhere. Since the sound was turned off, the coders were blind to experimental condition. Eye-movement coding was used to establish which characters participants had looked at.
Predictions
Before we look at the results of this experiment, it will be useful to consider what we predict will happen in each of the four conditions. In light of what we saw in the sentence-completion experiments, we predict that in the conditions with the pronoun hän participants will interpret hän as referring to the preceding subject, regardless of whether the order is SVO or OVS. For the demonstrative tämä, we predict that when it is preceded by an SVO sentence, participants will interpret tämä as referring to the postverbal object, and that when it is preceded by an OVS sentence, the pattern might be less clear, but participants will still prefer the postverbal argument over the preverbal one. In other words, if we think of these predictions in terms of linear order, we predict that SVO/Hän is the only condition that prompts looks to the first-mentioned referent. The other three conditions (SVO/Tämä, OVS/Hän, and OVS/Tämä) are all expected to trigger more looks to the second-mentioned, postverbal referent than to the first-mentioned referent. We might expect to see a slight delay in participants’ responses to the demonstrative tämä, since it will not be clear until the verb that tämä is being used anaphorically rather than as a prenominal modifier (e.g., this man) or a discourse deictic (e.g., this was fun). In contrast, with the pronoun hän, there is no ambiguity and its anaphoric function is clear right away, so we do not have any reason to expect delays.
Results
Because of space limitations, we present the time-course eye-gaze data in abbreviated form in figure 13.1. (A more detailed discussion is available in Kaiser and Trueswell 2008.) In particular, we calculated the first-mention-looking-advantage scores for three time windows, each lasting 2/3 of a second, starting at the onset of the anaphoric expression and continuing for the next 2 seconds (i.e., from 0 to 59 frames; there are 30 frames per second). First-mention looking advantage was calculated by taking the proportion of time that participants spent looking at the referent of the preverbal NP in a given time slice and subtracting the proportion of time spent looking at the referent of the postverbal NP during that same time slice.
Figure 13.1
First-mention advantage for each of the four conditions during three time slices, with the first time slice starting at the onset of the anaphoric expression.
As figure 13.1 shows, for SVO contexts participants showed an anticipation effect; they expected the first NP of the upcoming utterance to refer to the subject of the previous utterance. That is, in both SVO conditions (hän or tämä) participants show a first-mention advantage during the 0–19 frame window (from 0 to 667 msec). In fact, as Kaiser (2003) discussed, this anticipatory effect starts to appear before the onset of the anaphoric expression, and is probably due to SVO order and its discourse properties. We then see a shift in the SVO/Tämä condition over the next two time slices (20–39 frames, i.e., 668–1333 msec, and 40–59 frames, i.e., 1334–2000 msec): hearing the demonstrative tämä triggers assignment of the referent to the second-mentioned NP, i.e., the less salient object in the previous sentence. In contrast, the SVO/Hän condition shows a strong first-mention advantage during all three time slices. OVS contexts show a different pattern. Here we do not see a clear anticipatory preference for considering either previous NP as the referent (frames 0–19, 0–667 msec). It is true that this time slice shows a slight preference for the second NP in the hän condition, but inspection of more detailed time-course plots shows that this is not an anticipatory effect. Rather, in the OVS/Hän condition there is a shift toward looking to the referent of the second NP approximately five frames (167 msec) into this first time slice.
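The first-mention looking advantage defined above can be sketched for a single trial as follows. This is a minimal illustration under our own assumptions: the per-frame labels `"first"`, `"second"`, and `"other"` are hypothetical recodings of the left/right/middle/elsewhere coding described earlier, the `trial` data are invented to exercise the function, and the scores reported in the chapter are aggregated over participants and items rather than computed per trial.

```python
FRAMES_PER_SECOND = 30
# Three time windows as inclusive frame ranges, with frame 0 at anaphor onset.
WINDOWS = [(0, 19), (20, 39), (40, 59)]


def first_mention_advantage(gaze, start, end):
    """Proportion of frames on the first-mentioned (preverbal) referent minus the
    proportion on the second-mentioned (postverbal) referent, within one window.

    `gaze` is a list of per-frame labels: "first", "second", or "other"
    (hypothetical labels, assumed for this sketch).
    """
    window = gaze[start:end + 1]
    if not window:
        raise ValueError("empty time window")
    looks_first = sum(1 for label in window if label == "first")
    looks_second = sum(1 for label in window if label == "second")
    return (looks_first - looks_second) / len(window)


def window_end_msec(end_frame):
    """Time in msec at which an inclusive frame window ends."""
    return round((end_frame + 1) * 1000 / FRAMES_PER_SECOND)


# Invented 60-frame trial: early looks to the first-mentioned referent,
# then a shift to the second-mentioned one, then mixed looks.
trial = ["first"] * 15 + ["other"] * 5 + ["second"] * 20 + ["first"] * 10 + ["second"] * 10

for start, end in WINDOWS:
    print(start, end, window_end_msec(end), first_mention_advantage(trial, start, end))
```

Positive scores indicate more looking to the preverbal referent and negative scores more looking to the postverbal referent, so a score near zero reflects no clear preference within that window.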
This preference for the second NP is also apparent in the two subsequent time slices (668–1333 msec, 1334–2000 msec). However, detailed time-course plots (not shown here) revealed that the demonstrative tämä does not have such a clear early preference, and takes longer to exhibit a preference for the second-mentioned NP. This suggests that in the OVS condition the pronoun hän allows participants to refer rapidly and unambiguously to the subject of the previous sentence (the second NP), whereas tämä, which, we hypothesize, is sensitive to a discourse-based notion of salience (a more graded and variable constraint), shows a less systematic pattern.
Statistical analyses were conducted on the proportion of looks to the first-mentioned referent in 400-msec time slices in order to obtain detailed information about the time-course patterns. The analyses show that during the first 400 msec after the onset of the anaphoric expression an effect of word order begins to emerge, and this effect is significant in all segments from 400 to 1600
msec (p’s < 0.05). There is also a significant interaction between word order and anaphor type in the 400–800-msec segment (p’s < 0.05). After this point, the interaction remains significant by items only for one more segment, until 1200 msec. An effect of anaphor type begins to emerge during the 800–1200-msec segment and strengthens during the 1200–1600-msec segment (p’s < 0.05).
Discussion
Taken as a whole, the results of the eye-tracking study support the predictions that in the SVO/Hän condition participants will interpret the pronoun as referring to the first-mentioned referent (the subject), and that in the other three conditions, OVS/Hän, SVO/Tämä, and OVS/Tämä, there will be no such first-mention bias in the interpretation. In other words, these results corroborate the off-line findings that (i) the pronoun hän is sensitive to grammatical role and (ii) the demonstrative tämä is sensitive to discourse status (correlated here with word order) and grammatical role. The eye-tracking findings are compatible with our hypothesis that the pronoun hän prompts comprehenders to preferentially attend to the syntactico-semantic level of representation, and that the demonstrative tämä prompts comprehenders to probe the mental-discourse-model level. Thus, the eye-tracking results show that, even on an incremental level, we cannot maintain an approach which assumes that pronouns and demonstratives are sensitive to a single notion of salience. Hän and tämä should not display different sensitivities to different factors if what they correspond to are simply two different rankings on a one-dimensional salience hierarchy.
General Discussion
The sentence-completion experiments and the eye-tracking study indicate that the pronoun hän ‘s/he’ and the demonstrative tämä ‘this’ are sensitive to different factors. In the kinds of contexts investigated here, hän is used to refer to subjects and thus seems to be sensitive to grammatical role, whereas tämä prefers postverbal, discourse-new referents, especially objects, revealing a sensitivity to word order/information structure and grammatical role.
Given that these results are problematic for an approach that treats hän and tämä as sensitive to the same kind of salience (whether it be determined by a single factor, such as grammatical role or information structure, or by a set of factors), what conclusions should we draw concerning the notion of salience? We would like to suggest that, rather than abandoning the notion of salience completely, researchers should investigate whether hän and tämä cause comprehenders to preferentially probe different types of representations when
trying to locate the most likely referent for the anaphoric form. We hypothesize that two different representations of the previous linguistic input are relevant for computing reference: (i) For hän, what matters most is the syntactico-semantic representation of the prior sentence, which includes information about grammatical and thematic roles. (ii) For tämä, what matters is the comprehender’s mental model of the discourse. According to our approach, in each of these representations referents can be ranked in terms of their salience. At the syntactico-semantic level, grammatical and thematic roles influence salience; at the level of the mental discourse model, a number of factors contribute to referent salience, especially information structure (encoded here in word order). Thus, we hypothesize that hän is used to refer to referents that are high in salience at the syntactico-semantic level, and tämä is used for referents that are low in salience at the discourse-model level.
It is worth noting that the two representations that we hypothesize to be relevant for computing the referents of hän and tämä are related to the arguments of Hoffman (1998) and Strube and Hahn (1996, 1999). Hoffman finds that in Turkish, anaphor resolution and the information-structure factors driving word-order variation are separate from each other, and salience is determined by grammatical role. In contrast, Strube and Hahn argue that in German, information structure determines referent salience and hence also anaphor resolution. These claims, combined with our findings for Finnish, raise interesting questions about the range of cross-linguistic variation. In light of Hoffman’s and Strube and Hahn’s work, one might have thought that perhaps languages fall into two main groups: languages in which information-structure-motivated word-order variation is irrelevant for reference resolution, and languages in which it matters.
However, the Finnish data presented in this paper suggest that both options can be present in one language. In light of the range of variation displayed by Finnish, German, and Turkish, it would be interesting to know more about the information structure–anaphor resolution relation in other languages, in order to see whether other languages instantiate both options and, if they don’t, which of the two remaining options is more common.
Our approach raises the question why different referential forms are associated with different kinds of representations. For example, why is tämä sensitive to the discourse model and to information structure or word order when hän does not seem to be affected by these factors? We do not offer a definitive answer to this question in this paper, but we suggest that perhaps it has something to do with the generally discourse-bound nature of the demonstrative tämä, which, in addition to functioning as an anaphor for human referents, can be used as a proximal demonstrative and a discourse deictic. Often, in such uses, the referent of tämä is very much context-dependent and not at all
influenced by the grammatical role of its referent, since tämä, when used as a proximal demonstrative or discourse deictic, often does not even have an antecedent that is a linguistic constituent. This is in striking contrast to the pronoun hän, which is used to refer to human entities. In other words: Unlike hän, tämä is used in many contexts in which the entity to which it refers has no grammatical role whatsoever. It does not seem so strange, then, to posit that grammatical role is not the most highly ranked factor for tämä.
Related work investigating pronouns and demonstratives in English was done by Brown-Schmidt, Byron, and Tanenhaus (2004, 2005). Using eye-tracking methodology, Brown-Schmidt et al. investigated the interpretation of the pronoun it and the demonstrative that in English, and found that both forms are sensitive to extra-linguistic information, for example to how easily two objects could be viewed as a composite. For instance, given (7), participants interpreted that as referring to the composite object ‘cup-and-saucer’ 88 percent of the time.
(7) Put the cup on the saucer. Now put that over by the shovel.
The frequency of such composite interpretations depended on the prepositions used, as well as on the kinds of objects used (everyday objects or children’s toy blocks), and they were more frequent with everyday objects that formed likely composites. Moreover, the demonstrative that was interpreted as referring to a composite more often than the pronoun it was. In sum, these results show that differences between it and that are not adequately captured by a linguistically based salience scale, since extra-linguistic factors such as “composite status” also play a role. Perhaps the preference of that to refer to composite entities (i.e., entities that do not correspond to previously mentioned linguistic constituents) is related to the information-structure sensitivity of tämä, another form that can be used to refer to entities that do not have linguistic antecedents.
Let us now consider how our findings can be integrated with existing approaches to reference resolution. First of all, we would like to emphasize that one should not abandon the basic observation that more informative referential forms (e.g., ‘the man with the straw hat’, ‘that man’, ‘the man’) can be used to refer to less salient, less prominent referents than very informationally poor referential forms can (e.g., ‘he’) (Givón 1983; Ariel 1990; see also Gundel et al. 1993). Rather, the implications of our findings are most relevant in cases where we encounter two or more anaphoric forms that cannot be distinguished on the basis of such informational grounds. For example, in Finnish ‘hän called yesterday’ and ‘tämä called yesterday’ both tell us that the referent of the anaphoric expression is singular, human, and at least somewhat salient (since it can be identified on the basis of such limited information). However, as existing
corpus studies and native speaker intuitions show, hän and tämä do not possess the same referential properties. What, then, is the difference? We would like to suggest that the difference lies in what kind of representation the referential expression is associated with. In other words: When a comprehender is processing a sentence with hän or tämä, what type of representation does the comprehender preferentially probe in order to find a referent? Thus, our findings are not compatible with any approaches that regard hän and tämä as sensitive to a unified, monolithic notion of salience. This incompatibility persists whether salience is regarded as determined by a set of factors or as determined by a single factor (e.g., grammatical role, as Hoffman (1998) claims, or information structure, as Strube and Hahn (1999) argue). However, a salience-scale-based approach that is capable of accommodating the idea that referential forms may differ in the kind of representation they primarily access is compatible with our findings.
It is worth contrasting our account with a more extreme interpretation, namely that hän accesses its referents solely via its linguistic antecedent whereas tämä accesses its referents using all available information sources. For a number of reasons, we do not want to make such a strong claim here. First, we think it is likely that all referential expressions attempt direct access to the representation of the discourse model, but that some (such as hän) rely more heavily on mediating representations (such as the syntactico-semantic representation of the preceding sentence). Thus, we assert that Finnish listeners implicitly know that the syntactico-semantic properties of the preceding utterance are relevant for computing the referent of hän, but this does not prevent them from using other information sources.
In other words, our claim that hän relies heavily on the syntactico-semantic representation of a sentence does not imply that access to other kinds of representations is entirely impossible. For example, hän can be used without a linguistic antecedent if the extralinguistic context is sufficient. Consider a situation in which someone I have never seen before just rode by on a bicycle very quickly. In such a context, I could exclaim to you “Boy, was he fast!” and could use hän without a linguistic antecedent, that is, without a linguistically derived syntactico-semantic representation.
The second reason why we do not claim that speakers are limited exclusively to the syntactico-semantic representation when interpreting hän is related to how listeners process repeated occurrences of a pronoun. Kaiser (2003, 2005) investigated the effects of anaphoric form. In two sentence-completion studies, she investigated what happens if the SVO-OVS sentence (which is followed by hän or tämä, just as in experiments 1 and 2) contains a pronominalized argument (in other words, has hän either in subject or object position). The SproVO, OproVS, and SVOpro configurations were tested, and the results reveal
that in the OproVS condition a subsequent hän shows somewhat of a preference for the pronominalized, preverbal object over the postverbal subject. However, this preference disappears in the SVOpro condition, in which a subsequent occurrence of hän is split evenly between the subject and the object. In the case of tämä we see a preference for the postverbal referent in all conditions, but this preference is at its weakest in SVOpro. In light of these results, Kaiser hypothesizes that a chain of pronouns patterns differently from a single use. More specifically, she suggests that the different findings can be reconciled if we make a distinction between the first occurrence of hän (i.e., when hän is used for a referent for the first time) and a second occurrence (i.e., when a referent picked out with hän is referred to with hän again). Kaiser (2003) discusses how the difference between first-occurrence and second-occurrence uses, as well as the differences between OproVS and SVOpro, could be modeled using a referent-tracking system in which a second use of a pronoun can be interpreted as being anaphoric on the first use and not directly on the referent itself (see Kaiser 2003 for a detailed analysis). The distinction between pronoun chains and first-occurrence uses may well be due to some kind of repetition priming, and this phenomenon and its cognitive properties merit further research.
Of course, many questions regarding the referential properties of pronouns and demonstratives remain open. For instance, the grammatical-subject preference of hän needs to be investigated in more detail, to see how hän behaves with experiencer/psych verbs and passive constructions. In the experiments presented here, the subject of the preceding sentence was always agentive. Thus, we cannot yet tell whether hän is sensitive to subjecthood or to agentivity or a combination of both.
In future work, by looking at different constructions and verb types, we plan to investigate these kinds of questions in detail. How verb semantics, connectives, and global discourse structure figure in the interpretation of hän and tämä also needs further investigation.
In sum, our investigation of the referential properties of the pronoun hän ‘s/he’ and the demonstrative tämä ‘this’ in Finnish suggests that not all referential forms within a single language are sensitive to the same salience-influencing factors to the same degree. We hypothesize that hän and tämä prompt comprehenders to preferentially probe different sorts of representations when trying to locate the most likely referent within the discourse: hän preferentially accesses the syntactico-semantic representation of the preceding sentence, whereas tämä preferentially accesses the current discourse representation directly. Thus, our results suggest that to understand the referential properties of different forms we need to consider the sorts of representations that a referential form preferentially accesses as well as the salience of the entities within those representations.
Acknowledgments
Thanks to Cassie Creswell, Eleni Miltsakaki, Kimiko Nakanishi, Ritva Laury, Ellen Prince, Maribel Romero, Jennifer Venditti, and the psycholinguistics lab group at Penn for many useful comments and suggestions. This research was partially supported by a grant from the National Institutes of Health (1-R01HD37507) awarded to the second author. A fuller description of experiments 2 and 3 can be found in Kaiser and Trueswell 2008.
Notes
1. These approaches resemble each other in that they all propose a ranking of referential forms depending on the salience or accessibility of the referent. However, there are other important differences between them, such as their claims concerning the nature of the relation between particular referential expressions and accessibility statuses (see Ariel 2001 for a detailed discussion).
2. The fact that some referential forms in some languages can provide information about things such as number, gender, animacy or “humanness” (e.g., English ‘it’ vs. ‘she/he’) etc. is, in our opinion, incontrovertible, and thus we do not address it here. (For work on gender and number, see Arnold, Eisenband, Brown-Schmidt and Trueswell 2000; Greene, McKoon, and Ratcliff 1992; Garrod and Sanford 1982; Albrecht and Clifton 1998.) We focus here on choices in referential form that cannot be explained by these kinds of factors.
3. In this paper we focus on the effects that grammatical role has on what happens in subsequent discourse. This does not mean that preceding discourse is irrelevant. It has often been noted that referents appear in subject position because they are salient in the preceding discourse (Chafe 1976, 1994; Prince 1992). The relationships between subjecthood and a referent’s role in the preceding and the subsequent discourse are interconnected.
If a referent is salient in the preceding discourse and is therefore realized as the subject of an utterance, it is not surprising that, from the perspective of the subsequent utterance, that referent can be more salient than, say, the object of the preceding utterance (see also Walker, Joshi, and Prince 1998). Thus, when we discuss effects that the syntactic role of potential referents has on subsequent referential forms, we do not mean to imply that preceding discourse is not relevant. Thanks to Ritva Laury for bringing this question to our attention.
4. In dialects of spoken Finnish, the demonstrative pronoun se ‘it’ is evolving into a kind of definite article (see Laury 1997). However, this is not the case in Standard Finnish.
5. Corpus data show that in Finnish the alternation between SVO and OVS depends on discourse status, not hearer status. In other words, whether an entity counts as “old” or “new” depends on its discourse status (whether it has been mentioned in the preceding discourse), not on whether it is known or old to the hearer (hearer status). Thus, names of family members or famous people (which are hearer-old) can be postverbal subjects if they are discourse-new (see also (3a)). See Prince 1992 for further discussion of discourse status and hearer status. It is worth noting at this point that a referent that is discourse-old is necessarily also hearer-old, whereas a discourse-new entity can be hearer-old or hearer-new.
6. In this paper we focus on standard Finnish. Dialects of colloquial Finnish have somewhat different anaphoric systems (Laitinen 1992; Seppänen 1998).
7. As discussed in Kaiser 2008, there are certain special contexts, involving logophoricity (i.e., from the perspective of the person whose speech, thoughts or feelings are being reported), in which tämä can be used to refer to what seems at first blush to be a salient referent. However, as Kaiser (ibid.) shows, a closer look at these kinds of examples shows that even in these contexts (where the logophoric referent is very salient) the defining characteristic of tämä is that it is used to refer to characters that are not the most salient, not at the center of attention.
8. In the 1999 version of their paper, Strube and Hahn focus on hearer status (i.e., whether an entity is familiar to the hearer), whereas Finnish word order is guided by discourse status (i.e., whether the entity has already been mentioned in the current discourse; see note 5). Hearer status and discourse status often overlap, but not always: a referent can be hearer-old and discourse-new, for instance ‘the president’. In formulating predictions for Finnish on the basis of Strube and Hahn’s claims, we use discourse status instead of the hearer-status criterion that Strube and Hahn used for German. Using discourse status instead of hearer status is in some ways reminiscent of Strube and Hahn’s 1996 paper, which distinguished context-bound and context-unbound elements.
9. In the rest of this paper, we will often say, for the sake of brevity, that the pronoun/demonstrative refers to the subject/object, even though this is not strictly speaking correct, since the referential form actually, in the end, picks out the entity whose most recent antecedent was in subject/object position.
10. Another hypothesis regarding the interpretation of hän and tämä is that working-memory factors, specifically recency/locality (Gibson 1998, 2000), play a role, with comprehenders showing a preference for the most local/most recent referent. According to this approach, the recency preference can be overridden only under certain conditions, namely in the SVO/Hän configuration. However, it is not clear how, in the domain of anaphor resolution, a locality/recency preference would fit with the findings indicating that being sentence-initial makes a referent more salient.
11. The influence of grammatical role on tämä is also shown by the fact that corpus data indicate that tämä can also refer to discourse-old referents. If it is preceded by a transitive sentence that contains two discourse-old arguments, which in Finnish will normally occur in S-O order, it prefers the object.
12. Interestingly, it seems that a demonstrative interpretation may not have been as easy to use as an “escape hatch” in experiment 1 as in experiment 2, perhaps because of the lack of a preceding discourse context in the first experiment.
References
Albrecht, J. E., and Clifton, C., Jr. 1998. Accessing singular antecedents in conjoined phrases. Memory and Cognition 26, 599–610.
Ariel, M. 1990. Accessing NP Antecedents. Routledge, Croom Helm.
Ariel, M. 2001. Accessibility Theory: An overview. In T. Sanders, J. Schilperoord, and W. Spooren (eds.), Text Representation: Linguistic and Psycholinguistic Aspects. John Benjamins.
Arnold, J. 1998. Reference Form and Discourse Patterns. Ph.D. dissertation, Stanford University.
Arnold, J. E., Eisenband, J. G., Brown-Schmidt, S., and Trueswell, J. C. 2000. The rapid use of gender information: Evidence of the time course of pronoun resolution from eyetracking. Cognition 76, B13–B26.
Birner, B., and Ward, G. 1998. Information Status and Noncanonical Word Order in English. John Benjamins.
Brennan, S., Friedman, M., and Pollard, C. 1987. A Centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics.
Brown-Schmidt, S. B., Byron, D., and Tanenhaus, M. K. 2004. That’s not it and “it’s” not “that”: The role of conceptual composites in in-line reference resolution. In M. Carreiras and C. Clifton, Jr. (eds.), On-Line Sentence Processing: ERPS, Eye Movements, and Beyond. Psychology Press.
Brown-Schmidt, S., Byron, D. K., and Tanenhaus, M. K. 2005. Beyond salience: Interpretation of personal and demonstrative pronouns. Journal of Memory and Language 53, 292–313.
Chafe, W. L. 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In C. N. Li (ed.), Subject and Topic. Academic Press.
Chafe, W. L. 1994. Discourse, Consciousness, and Time. University of Chicago Press.
Chesterman, A. 1991. On Definiteness. Cambridge University Press.
Crawley, R., and Stevenson, R. 1990. Reference in single sentences and in texts. Journal of Psycholinguistic Research 19 (3), 191–210.
Di Eugenio, B. 1998. Centering in Italian. In M. A. Walker, A. K. Joshi, and E. F. Prince (eds.), Centering Theory in Discourse. Oxford University Press.
Etelämäki, M. 1996. Keskutelu tarkoitteesta kamerataiteen oppitunnilla – ja pronominit tuo, se, tämä. Thesis, University of Helsinki.
Fiengo, R., and May, R. 1994. Indices and Identity. MIT Press.
Garnham, A. 2001. Mental Models and the Interpretation of Anaphora. Psychology Press.
Garrod, S. C., and Sanford, A. J. 1982. The mental representation of discourse in a focused memory system: Implications for the interpretation of anaphoric noun phrases. Journal of Semantics 1, 21–41.
Gernsbacher, M. A., and Hargreaves, D. 1988. Accessing sentence participants: The advantage of first mention. Journal of Memory and Language 27, 699–717.
Gibson, E. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition 68, 1–76.
Gibson, E. 2000. The dependency locality theory: A distance-based theory of linguistic complexity. In Y. Miyashita, A. Marantz, and W. O’Neil (eds.), Image, Language, Brain. MIT Press.
Givón, T. 1983. Topic Continuity in Discourse: A Quantitative Cross-Language Study. John Benjamins.
Glenberg, A., Kruley, P., and Langston, W. E. 1994. Analogical processes in comprehension. In M. A. Gernsbacher (ed.), Handbook of Psycholinguistics. Academic Press.
Gordon, P., Grosz, B., and Gilliom, L. 1993. Pronouns, names, and the Centering of attention in discourse. Cognitive Science 17, 311–347.
Greene, S., McKoon, G., and Ratcliff, R. 1992. The role of implicit causality and gender cue in the interpretation of pronouns. Language and Cognitive Processes 73 (4), 231–255.
Grosz, B., Joshi, A., and Weinstein, S. 1995. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics 21 (2), 203–225.
Gundel, J., Hedberg, N., and Zacharski, R. 1993. Cognitive status and the form of referring expressions in discourse. Language 69, 274–307.
Hajičová, E., Kuboň, V., and Kuboň, P. 1990. Hierarchy of salience and discourse analysis and production. In Proceedings of the 13th Conference on Computational Linguistics.
Hajičová, E., and Vrbová, J. 1982. On the role of the hierarchy of activation in the process of natural language understanding. In J. Horecký (ed.), COLING 82. North-Holland.
Halmari, H. 1994. On accessibility and coreference. Nordic Journal of Linguistics 17, 35–59.
Heim, I. 1982. The Semantics of Definite and Indefinite Noun Phrases. Ph.D. dissertation, University of Massachusetts, Amherst.
Heim, I. 1983. File change semantics and the theory of definiteness. In R. Bauerle, C. Schwarze, and A. von Stechow (eds.), Meaning, Use, and the Interpretation of Language. Walter de Gruyter.
Helasvuo, M.-L. 2001. Syntax in the Making: The Emergence of Syntactic Units in Finnish Conversation. John Benjamins.
Hiirikoski, J. 1995. Correlations between some morphosyntactic features and word order in Finnish and English: Some preliminary results of testing the transitivity hypothesis. In B. Wårwik, S. K. Tanskanen, and R. Hiltunen (eds.), Organization in Discourse, Proceedings from the Turku Conference.
Hoffman, B. 1998. Word order, information structure and Centering in Turkish. In M. A. Walker, A. K. Joshi, and E. F. Prince (eds.), Centering Theory in Discourse. Oxford University Press.
Johnson-Laird, P. 1983. Mental Models: Towards a Cognitive Science of Language, Inference, and Consciousness. Harvard University Press.
Kaiser, E. 2000. Pronouns and demonstratives in Finnish: Indicators of referent salience. In P. Baker, A. Hardie, T. McEnery, and A. Siewierska (eds.), Proceedings of the Discourse Anaphora and Reference Resolution Conference.
Kaiser, E. 2003. The quest for a referent: A crosslinguistic look at reference resolution. Ph.D. dissertation, University of Pennsylvania.
Kaiser, E. 2005. Different forms have different referential properties: Implications for the notion of ‘salience’. In A. Branco, T. McEnery, and R. Mitkov (eds.), Anaphora Processing: Linguistic, Cognitive and Computational Modelling. John Benjamins.
Kaiser, E. 2008. (Anti)logophoricity in Finnish. In Proceedings from the Main Session of the Chicago Linguistic Society’s Fortieth Meeting.
Kaiser, E., and Trueswell, J. C. 2008. Interpreting pronouns and demonstratives in Finnish: Evidence for a form-specific approach to reference resolution. Language and Cognitive Processes 23, 709–748.
Kalliokoski, J. 1991. Empathy as motivation for style shifting in narrative. In J. Verschueren (ed.), Levels of Linguistic Adaptation. Selected Papers of the International Pragmatics Conference, volume 2. John Benjamins.
Kiss, K. 1995. Discourse Configurational Languages. Oxford University Press.
Laitinen, L. 1992. Välttämättömyys ja persoona. Suomen murteiden nesessiivisten rakenteiden semantiikkaa ja kielioppia. Suomalaisen Kirjallisuuden Seura.
Laury, R. 1997. Demonstratives in Interaction. John Benjamins.
Linna, V. 1954/1999. Tuntematon Sotilas. Werner Söderström Oy.
Matthews, A., and Chodorow, M. 1988. Pronoun resolution in two-clause sentences: Effects of ambiguity, antecedent location, and depth of embedding. Journal of Memory and Language 27, 245–260.
McDonald, J., and MacWhinney, B. 1995. The time course of anaphor resolution: Effects of implicit verb causality and gender. Journal of Memory and Language 34, 543–566.
Prasad, R., and Strube, M. 2000. Discourse salience and pronoun resolution in Hindi. University of Pennsylvania Working Papers in Linguistics 6.3, 189–208.
Prince, E. F. 1992. The ZPG letter: Subjects, definiteness and information status. In S. Thompson and W. Mann (eds.), Discourse Description: Diverse Analyses of a Fund-Raising Text. John Benjamins.
Rambow, O. 1993. Pragmatic aspects of scrambling and topicalization in German. Paper presented at Workshop on Naturally-Occurring Discourse, University of Pennsylvania.
Remes, I. 2000. Pääkallokehrääjä. Werner Söderström Oy.
Saarimaa, E. A. 1949. Kielemme käytäntö. Pronominivirheistä. Virittäjä 49, 250–257.
Seppänen, E.-V. 1998. Läsnäolon pronominit. Tämä, tuo, se ja hän viittaamassa keskustelun osallistujaan. Suomalaisen Kirjallisuuden Seura.
Shapiro, L. P., and Hestvik, A. 1995. On-line comprehension of VP-ellipsis: Syntactic reconstruction and semantic influence. Journal of Psycholinguistic Research 24 (6), 517–532.
Shapiro, L. P., Hestvik, A., Lesan, L., and Garcia, A. R. 2003. Charting the time-course of VP-ellipsis sentence comprehension: Evidence for an initial and independent structural analysis. Journal of Memory and Language 49 (1), 1–19.
Stevenson, R., Crawley, R., and Kleinman, D. 1994. Thematic roles, focus and the representation of events. Language and Cognitive Processes 9, 519–548.
Stevenson, R., and Urbanowicz, A. 1995. Structural focusing, thematic role focusing and the comprehension of pronouns. In Proceedings of the 17th Annual Conference of the Cognitive Science Society.
Strube, M., and Hahn, U. 1996. Functional Centering. In Proceedings of ACL ’96.
Strube, M., and Hahn, U. 1999. Functional Centering: Grounding referential coherence in information structure. Computational Linguistics 25 (3), 309–344.
Sulkala, H., and Karjalainen, M. 1992. Finnish. Routledge.
Tetreault, J. 2001. A corpus-based evaluation of Centering and pronoun resolution. Computational Linguistics.
Turan, Ü. D. 1998. Ranking forward-looking centers in Turkish. In M. A. Walker, A. K. Joshi, and E. F. Prince (eds.), Centering Theory in Discourse. Oxford University Press.
Turan, Ü. D. 1995. Null vs. Overt Subjects in Turkish Discourse: A Centering Analysis. Ph.D. dissertation, University of Pennsylvania.
Vallduví, E. 1990. The Informational Component. Ph.D. dissertation, University of Pennsylvania.
Van Dijk, T. A., and Kintsch, W. 1983. Strategies of Discourse Comprehension. Academic Press.
Varteva, A. 1998. Pronominit hän ja tämä tekstissä. Virittäjä 2, 202–223.
Vilkuna, M. 1989. Free Word Order in Finnish. Suomalaisen Kirjallisuuden Seura.
Vilkuna, M. 1995. Discourse configurationality in Finnish. In K. Kiss (ed.), Discourse Configurational Languages. Oxford University Press.
Vilppula, M. 1989. Havaintoja hän- ja he-pronominien käytöstä suomen murteissa. Virittäjä 93, 389–399.
Walker, M. A., Joshi, A. K., and Prince, E. F. 1998. Centering Theory in Discourse. Oxford University Press.
14
Not All Subjects Are Born Equal: A Look at Complex Sentence Structure
Eleni Miltsakaki
There is a voluminous literature in linguistics, psycholinguistics, and computational linguistics on the relationship among topics, subjects, and pronominalization. Across disciplines, the relationship arising among the three is, loosely, the following: pronouns refer to salient entities, salient entities are often topics, and topics tend to appear in subject positions. For the most part, the relationship among topics, salience, and subjecthood has been investigated in the context of simple main clauses. Simple discourses, consisting primarily of a succession of main clauses, mask potential processing effects of complex sentence structure. Complex sentences, on the other hand, raise a host of interesting theoretical and empirical questions on the salience statuses of different types of subjects, the lifespan of topic assignments, and the choice of linguistic expression for reference to entities evoked in various positions in a variety of clauses. This paper addresses the effects of complex structure on topichood, subjecthood, and pronominal interpretation from an empirical point of view. Specifically, it asks the following related questions:
• Are all subjects born equal?
• Do subordinate clauses establish their own topics?
• Is pronominal interpretation sensitive to complex sentence structure?
Empirical investigation of the above questions promises to offer crucial insights into the interaction of topicality, subjecthood, and pronominalization and to improve our understanding of the use of subordination in discourse. In what follows, I will first review predominant accounts of the relationship among salience, reference, and choice of referring expression and sketch out what their predictions might be for the salience status of entities in complex sentences. In the Centering framework of discourse coherence, Kameyama’s (1998) proposal on how to break up complex sentences into topic-update units will also be reviewed. The rest of this paper is devoted to experimental and
corpus studies on entity salience in adverbial and relative clauses. The primary findings from these studies reveal a distinction between main and subordinate subjects: main-clause subjects maintain their discourse salience over subordinate-clause subjects. Relatedly, main-clause subject positions appear to be favored over their subordinate counterparts for establishing shifts to new topics. I conclude with some thoughts on why entities evoked in subordinate clauses tend to be marked with low salience.
Previous Work on Topics, Subjects, and Pronouns
The Gradient View of Topics
In the functional literature, subjects were viewed as grammaticalized topics (Givón 1976; Chafe 1976). Later, Givón (1983) defended the view that all entities are topical to a greater or lesser degree. The relationship between topicality and the typology of referring expressions was seen as another instance of iconicity in language, with different types of linguistic expressions reflecting different degrees of topicality. In Givón’s universal “grammar of topic identification,” zero anaphora marks the most topical entities and full NPs mark the least topical entities: zero anaphora > unstressed/bound pronouns > stressed/independent pronouns > full NPs. For the empirical testability of the grammar of topic identification, Givón (1983) proposes the following three measures of topicality: referential distance, potential interference, and persistence. With respect to complex sentences, these measures predict that referents evoked in a recent subordinate clause are more likely to be referenced with a pronoun than referents evoked in a more distant main clause. Further, it is expected that the presence of multiple entities in a recent subordinate clause should hinder pronominal reference to the main-clause subject.1 As we will see, these expectations are challenged.
The Centering View
Centering theory was developed as a model of the relationship among discourse coherence, discourse structure, and choice of referring expression (Grosz, Joshi, and Weinstein 1995). What we perceive as the topic of an utterance, at least in the sense of Reinhart 1981 and Horn 1986, is formally defined as the Backward-looking center. Each utterance evokes a list of Forward-looking centers, ranked according to degree of salience. The highest-ranked entity on the list of Forward-looking centers is called the Preferred center. The Backward-looking center is the highest-ranked entity of the preceding utterance that is realized in the current utterance.
Table 14.1
Centering transitions.

                 Cb(Ui) = Cb(Ui–1)    Cb(Ui) ≠ Cb(Ui–1)
Cb(Ui) = Cp      Continue             Smooth shift
Cb(Ui) ≠ Cp      Retain               Rough shift
The Centering model is designed to capture those aspects of processing that are responsible for differences in the perceived coherence of discourses such as those demonstrated in (1) and (2) below (Grosz et al. 1995).
(1) a. John went to his favorite music store to buy a piano.
b. He had frequented the store for many years.
c. He was excited that he could finally buy a piano.
d. He arrived just as the store was closing for the day.
(2) a. John went to his favorite music store to buy a piano.
b. It was a store John had frequented for many years.
c. He was excited that he could finally buy a piano.
d. It was closing just as John arrived.
Discourse (1) is intuitively more coherent than discourse (2). This difference may be seen to arise from the different degrees of continuity in what the discourse is about. Discourse (1) centers in on a single individual, ‘John’, whereas discourse (2) seems to center in and out on different entities, ‘John’, ‘store’, ‘John’, ‘store’. Degrees of continuity are reflected in four Centering transitions: Continue, Retain, Smooth Shift, and Rough Shift. Centering transitions are computed according to table 14.1. In that table, the Backward-looking center is designated as Cb and the Preferred center as Cp. The current utterance is shown as Ui and the preceding utterance as Ui–1. The most coherent transition is a Continue, identified when the topic of the current utterance, Cb(Ui), is the same as the topic of the previous utterance, Cb(Ui–1), and is realized in a prominent position (Cp), e.g., in subject position. The least coherent transition is a Rough Shift, identified when the topic of the current utterance, Cb(Ui), is not the same as the topic of the preceding utterance, Cb(Ui–1), and it is not realized in a prominent position (Cp).
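The lookup described by table 14.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name classify_transition and the representation of centers as plain strings are introduced here, and the case of an undefined Backward-looking center (as in a discourse-initial utterance) is not covered.

```python
def classify_transition(cb_current: str, cb_previous: str, cp_current: str) -> str:
    """Return the Centering transition for utterance Ui, following table 14.1.

    cb_current  -- Cb(Ui), the Backward-looking center of the current utterance
    cb_previous -- Cb(Ui-1), the Backward-looking center of the preceding utterance
    cp_current  -- Cp, the Preferred center of the current utterance
    All three are hypothetical string labels for discourse entities.
    """
    same_cb = cb_current == cb_previous   # column of table 14.1
    cb_is_cp = cb_current == cp_current   # row of table 14.1
    if same_cb:
        return "Continue" if cb_is_cp else "Retain"
    return "Smooth shift" if cb_is_cp else "Rough shift"


if __name__ == "__main__":
    # Hand-assigned centers, for illustration only.
    print(classify_transition("John", "John", "John"))    # topic kept, in prominent position
    print(classify_transition("John", "John", "store"))   # topic kept, not prominent
    print(classify_transition("store", "John", "store"))  # new topic, prominent
    print(classify_transition("store", "John", "John"))   # new topic, not prominent
```

The two Boolean tests correspond to the column and the row of the table, so the four return values exhaust its cells.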
Interpreting a pronominal reference as the Backward-looking center in an utterance is captured in Centering’s Pronoun rule, which predicts that if there is a single pronoun in an utterance then this pronoun refers to the current topic. Of course the Pronoun rule holds in utterances with more than one pronoun, but the rule makes a prediction for only one of these pronouns. So in Centering,
subjects, topics, and pronouns are related via the notions of Preferred center and Backward-looking center. For computation of topic transitions as well as for empirical evaluation of the Pronoun rule, the definition of the utterance, i.e., the topic-update unit, is crucial. In the original formulation of Centering, the question of the extent of the utterance was left open to empirical investigation. Here, I will sketch out the predictions that will be tested to determine whether subordinate clauses are processed as an utterance. If each tensed clause, main or subordinate, determines the extent of an utterance, we expect that a succeeding pronoun, whether in a main or a subordinate clause, will co-specify with the current topic, which in experimental conditions can be expressed by the subject of the preceding clause. In a corpus, the topic of an utterance cannot be identified independently. The extent of an utterance will be tested by comparing Centering transitions in two conditions: processing each tensed clause as a unit and processing the complex sentence as a unit. The condition yielding more coherent transitions will be taken to reflect the appropriate extent of an utterance, assuming that written text is maximally coherent. Prior Centering analyses of corpora (e.g., Di Eugenio 1998) have indeed shown that Rough-Shift transitions, for example, do not occur in written text.
For the ranking of entities in the list of Forward-looking centers, entities are ranked according to grammatical role as suggested by Brennan, Walker-Friedman, and Pollard (1987) and Walker, Joshi, and Prince (1998). Subjects rank higher than objects, which rank higher than other entities. For reasons discussed in detail in Reinhart 1981, quantificational expressions, non-specific indefinite phrases, and impersonal references (Prince 1999b) are either not included in the list of Forward-looking centers or ranked low.
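The grammatical-role ranking and the definition of the Backward-looking center given above can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the names ROLE_RANK, rank_forward_centers, and backward_center, the three role labels, and the hand-annotated roles for the first two utterances of discourse (2).

```python
# Lower number = higher rank: subjects > objects > other entities.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


def rank_forward_centers(mentions):
    """Order the entities of one utterance by grammatical role.

    mentions -- list of (entity, role) pairs in surface order.
    Returns the list of Forward-looking centers, highest-ranked first.
    sorted() is stable, so entities with the same role keep surface order.
    """
    ordered = sorted(mentions, key=lambda m: ROLE_RANK.get(m[1], ROLE_RANK["other"]))
    centers = []
    for entity, _role in ordered:
        if entity not in centers:  # keep only the highest-ranked mention
            centers.append(entity)
    return centers


def backward_center(previous_cf, current_entities):
    """Highest-ranked entity of the preceding utterance realized in the current one."""
    for entity in previous_cf:
        if entity in current_entities:
            return entity
    return None  # no entity of the preceding utterance is realized


if __name__ == "__main__":
    # (2a) John went to his favorite music store to buy a piano.
    cf_a = rank_forward_centers([("John", "subject"), ("store", "other"), ("piano", "object")])
    # (2b) It was a store John had frequented for many years.
    cf_b = rank_forward_centers([("store", "subject"), ("John", "other")])
    cb_b = backward_center(cf_a, set(cf_b))
    print("Cf(2a):", cf_a)
    print("Cf(2b):", cf_b, "Cp(2b):", cf_b[0], "Cb(2b):", cb_b)
```

Under these hand annotations the Backward-looking center of (2b) is ‘John’ while its Preferred center is ‘store’, which is the kind of mismatch that the transitions in table 14.1 are designed to register.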
In the Centering framework, complex sentence structure was studied by Kameyama (1998, 1993). Kameyama (1998) suggests breaking up complex sentences according to the following hypotheses: Conjoined and adjoined tensed clauses form independent center update units.2 Tenseless subordinate clauses, report complements, and relative clauses belong to the update unit containing the main clause. With regard to the tensed adjunct hypothesis, which treats tensed adjunct clauses as independent units, Kameyama (1998) brings support from backward anaphora. She argues that the tensed adjunct hypothesis predicts that a pronoun in a preposed subordinate clause is anaphorically dependent on an entity already introduced in the immediate discourse and not on the subject of the main clause it is attached to. However, this argument is challenged by empirical data. Carden (1982), Hoek (1997), and Tanaka (2000) demonstrate that pronouns are sometimes the first mention of their referents in discourse. The data presented here on adverbial clauses will also challenge Kameyama’s
tensed adjunct hypothesis. Our data suggest that adverbial clauses are not processed as an independent center update unit but, instead, belong to the unit containing the main clause.
Referent Accessibility
The Centering model of topics presented above makes specific claims about the relationship among topics, salience, and discourse coherence, but its predictions on choice of referring expression are limited to one pronoun per utterance. Centering makes no other claims regarding referring expressions. Detailed mappings between type of referring expression and cognitive status, however, have been proposed by researchers working on accessibility hierarchies, which model the cognitive status of evoked entities. The work by Gundel, Hedberg, and Zacharski (1993) is representative of this line of research. They have identified six cognitive statuses ranked in the givenness hierarchy shown below and mapped to appropriate linguistic expressions.

in focus  >  activated           >  familiar  >  uniquely identifiable  >  referential        >  type identifiable
it           that, this, this N     that N       the N                     indefinite this N     a N

Pronouns are associated with the in focus end of the scale. An entity is in focus when it is in the short-term memory and is also in the current center of attention. Such entities are likely to be topics of subsequent utterances. Note that in this account both subjects and objects bring entities in focus which can then be referenced with a pronoun. Given that more than one entity can be simultaneously in focus, the givenness hierarchy does not make predictions for the interpretation of the pronouns in (3) and (4). Presumably, in such cases the interpretation of the pronoun is guided by factors other than salience.
(3) John_i criticized George_j because he_j . . .
(4) John_i criticized George_j. Then, he_i . . .
Other researchers whose work focuses on the effects of implicit causality (Caramazza and Gupta 1979; McDonald and MacWhinney 1995) and on the effects of verb semantics on pronominal interpretation (Stevenson, Crawley, and Kleinman 1994; Stevenson, Knott, Oberlander, and McDonald 2000) have proposed some such factors. Specifically, Stevenson et al. (1994) have argued that verbs project their own focusing preferences. The interpretation of the pronoun in (3) is expected due to the semantics of the verb ‘criticize’, which directs attention to the entity associated with the endpoint of the event, i.e., the object of the preceding clause in this case. Stevenson et al. (2000) have also
argued that, depending on its semantics, the presence of a connective such as ‘because’ or ‘so’ can reinforce the salient status of an entity. The semantic focusing account is challenged by data such as in (5), in which the focusing preferences projected by the verb do not persist. In a sentence-continuation task (Stevenson et al. 2000), it was found that the pronoun in (5) was interpreted as ‘John’ (the subject of the preceding clause) and not ‘Bill’ (the entity associated with the endpoint of the event).
(5) a. John_i criticized Bill_j.
b. Next, he_i insulted Susan.
Experiments on Adverbial Clauses
Work on the effects of adverbial clauses on topic continuity and pronominal interpretation has been done by Miltsakaki (2001, 2002a,b) and Cooreman and Sanford (1996). Space restrictions do not allow for a complete review of this work. I will selectively focus on two experiments (Miltsakaki 2002a; Cooreman and Sanford 1996) that specifically address the salience status of subjects in main and adverbial clauses. Relevant data on because clauses from Suri, McCoy, and DeCristofaro (1999) will also be presented.
The first experiment was designed to evaluate the interpretation of a subject pronoun in main-main and main-subordinate conditions. In a sentence-completion task, participants were asked to read sets of two clauses. Each set contained two main clauses (main-main condition) or a main clause and a subordinate clause (main-subordinate condition). Twenty adult native speakers of English were asked to write free continuations of the second clause that contained a subject pronoun. In both conditions, the main clause evoked two same-gender referents, one in subject position and the other in object position. The predicate of the first main clause was an action verb (‘hit’, ‘kick’, ‘kiss’, ‘hug’, etc.). To counterbalance the potential semantic effect of the subordinate conjunctions, in the main-main condition the second main clause contained a clause-initial adverb. The following connectives were included: the subordinate conjunctions ‘although’, ‘because’, ‘while’, ‘when’, and ‘so that’ and the clausal adverbials ‘however’, ‘then’, period, ‘as a result’, and ‘what is more’. Each connective appeared in three items yielding a total of 30 critical items which were combined with 90 fillers. Sample critical items are shown in examples (6)–(9).
(6) The groom hit the best man. Moreover, he . . .
(7) The beggar pushed the gentleman so that he . . .
(8) The boxer kicked the referee. As a result, he . . .
Figure 14.1
Percentage of references to subject in main and subordinate clauses in English.
(9) The policeman shot the burglar because he . . .
The interpretation of the subject pronoun as the referent of the subject in the preceding main clause was quantified and converted into percentages. The scores were submitted to an ANOVA. The results of the ANOVA showed a strong main effect of clause type (F(1,19) = 79.33, p < 0.001). Figure 14.1 shows the percentages of reference to the subject of the first main clause in each condition. The percentages for the main-subordinate condition show that the interpretation of the subject pronoun varied between the subject and the object of the main clause. In contrast, in the main-main condition the subject pronoun showed a very strong tendency to be interpreted as the subject of the main clause. The main finding of this experiment is that pronominal interpretation appears to be sensitive to the distinction between main and subordinate clauses. Across main clauses, subjects are more salient than objects and their referents are consistently picked for the interpretation of the subsequent subject pronoun. However, when a subject pronoun is in a subordinate clause, its interpretation varies. We conclude that subject salience is strong across main clauses but that, within a sentence, subject salience may not be the primary factor in pronominal interpretation. These results are consistent with the hypothesis that, in Centering, topics are updated across sentences, i.e., that main and subordinate clauses are processed as one unit. In the main-main condition, the highest-ranked entity in the first
main clause is the subject, and most likely topic of the next sentence. The interpretation of the succeeding pronoun is then correctly predicted to be the subject of the first main clause. If, indeed, a main clause and a subordinate clause form a single unit of topic update, the next question to be addressed concerns the salience status of entities evoked in a subordinate clause after the unit is processed. This question is equivalent to asking what determines the relative salience of entities within the complex structure.
Cooreman and Sanford (1996) investigated the interpretation of a subject pronoun following a main clause and an adverbial clause, each introducing a same-gender referent. In a sentence-completion task, they presented participants with a complex sentence containing a main clause and an adverbial clause. Participants were prompted to start a continuation with a pronoun, which could be interpreted either as the entity introduced in the main clause or the entity introduced in the adverbial clause. To check for clause-order effects, the adverbial clause appeared both after and before the main clause. Three sets of subordinate conjunctions were used: ‘after/before’, ‘when/while’, and ‘because/since’. Sample items are shown in examples (10) and (11).
(10) After the tenor opened his music score the conductor sneezed three times. He . . .
(11) The conductor sneezed three times after the tenor opened his music score. He . . .
The results of this experiment revealed that for all three sets of connectors the referent of the main clause was the preferred choice for the interpretation of the pronoun in the continuation: 92.9 percent for ‘after/before’, 80.3 percent for ‘when/while’, and 79.8 percent for ‘because/since’.
The order in which the main and adverbial clauses were presented did not make a difference except for the subordinate conjunction ‘because’: the referent of the main clause was the preferred choice for the interpretation of the pronoun in the continuation in 75.2 percent of the instances of main-subordinate order versus 85.4 percent of the instances of subordinate-main order. Cooreman and Sanford (1996) report that there was no such effect for any other subordinate conjunction, including ‘since’. Relatedly, Suri et al. (1999) studied the interpretation of a subject pronoun following a complex sentence with a main clause and a because clause. In a series of experiments, they found that in discourses such as (12) the pronoun in (12d) picks its referent from the subject of the preceding main clause and not from the subject of the because clause, which was already pronominalized and which also was evoked more recently. They also found that manipulating the
semantics in the second main clause to make resolution to ‘Dodge’ the most plausible choice was not sufficient to warrant felicitous pronominalization, as in (13). The low salience of the entity in the because clause was further supported by another experimental study in which subjects judged that a natural way to refer to ‘Dodge’ in (14c) was by name repetition.
(12) a. Dodge was robbed by an ex-convict the other night.
b. The ex-convict tied him up
c. because he wasn’t cooperating.
d. Then he took all the money and ran.
(13) a. Dodge was robbed by an ex-convict the other night.
b. The ex-convict tied him up because he wasn’t cooperating.
c. #Then he started screaming for help.
(14) a. Dodge was robbed by an ex-convict the other night.
b. The ex-convict tied him up because he wasn’t cooperating.
c. Then Dodge started screaming for help.
In sum, the findings of the above experiments converge on one main point: that subjects of main clauses are more likely to be referenced with a pronoun than subjects of adverbial clauses. To the extent that a subject pronoun is expected to refer to the most topical entity, these findings challenge the measures of topicality proposed by Givón (1983). Subjects of adverbial clauses are less salient than subjects of main clauses, even when they appeared more recently. The effect of recency was also challenged by Clifton and Ferreira (1987), who found that topic antecedents were preferred to non-topic antecedents regardless of distance. Clifton and Ferreira increased the distance between the anaphoric expression and the antecedent primarily by adding gerunds, which were assumed to not introduce or establish a new topic. However, this assumption cannot be made for tensed subordinate clauses. In fact, we quite often see Centering-based anaphora-resolution algorithms process every tensed clause as an independent topic-update unit.
The studies on subject salience and pronominal interpretation in adverbial clauses presented here offer additional support for the predominance of topichood over recency. They also indicate that new topics are not established in subordinate clauses even when these are fully tensed clauses.
A Few Thoughts on Adverbial Clauses and Coherence Relations
The results from the experiments on adverbial clauses raise new questions that need to be addressed. First, it is not clear what property of the type of subordinate clauses is responsible for the attested patterns. Subordinate clauses
are introduced with subordinate conjunctions that express a discourse relation between the main clause and the subordinate clause. Adverbial clauses express a variety of relations, including causality, concession, and temporal sequence. To control for the effect of the relation established by subordinate conjunction, the main-main condition of the experiment on adverbial clauses contained several adverbials that also expressed discourse relations. Still, the effect of the type of clause was strong. On the other hand, closer inspection of the discourse relations expressed with connectives shows that not all discourse relations can be expressed with both a subordinate connective and a sentence adverb.3 Kehler (2002) proposed a promising account of the relationship between coherence relations and reference. According to the proposed account, pronouns are treated as variables whose interpretation is contingent on the type of coherence relation established in the ongoing discourse and falls out of the semantic representation. Three basic relations are identified: resemblance, cause-effect, and contiguity. Skipping the details of the proposed theory, the establishment of a resemblance relation basically accounts for data that show subjects to be the preferred antecedents. A cause-effect relation accounts for data that may show an object to be the most preferred antecedent. Interestingly, representative connectives of resemblance relations belong to the class of sentence adverbials, such as ‘however’ and ‘for example’. Representative connectives of cause-effect relations are subordinate conjunctions, such as ‘because’ and ‘even though’, with the exception of ‘and as a result’ and ‘and therefore’. It would be useful to see if the effects of the main/subordinate distinction can be uniformly attributed to coherence relations as defined by Kehler (ibid.). 
Relative clauses, though, would still require special investigation, as in the current formulation of Kehler’s theory they remain unexplored territory.
Entity Salience in Relative Clauses
In this and the following section, we turn our attention to relative clauses. Two new corpus studies are reported. The first study was designed to evaluate the salience of entities evoked in relative clauses. The second study looks at the contribution of non-restrictive relative clauses to topic continuity as evaluated in the Centering framework.
Corpus Annotation
For the investigation of the salience of entities in relative clauses, a set of features typically associated with salience status was annotated on a corpus of 300 relative clauses, some restrictive and some non-restrictive. In particular, the frequency of subsequent reference to the referent of the head noun was compared and contrasted with the frequency of reference to other entities evoked in
the relative clause. Crucially, the type of referring expression used for reference to either the referent of the head noun or the other entities evoked in the relative clauses was also recorded. Referring expressions were classified as (a) non-applicable (N/A) when there was no subsequent reference to either the head noun or other entities evoked in the relative clause, (b) NP when a full noun phrase was used, (c) NP-associative when a referential link was established by inference (e.g., house–door, teams–the Eagles), (d) pronoun, and (e) other. Cases of NP-associative anaphora will not be discussed further, because the use of a noun phrase for the type NP-associative is in most cases obligatory. A few cases of first-person and second-person anaphora that were identified in the corpus will not be discussed either.
The 300 tokens of relative clauses extracted from the corpus included 100 tokens of who-relatives, 100 tokens of which-relatives, and 100 tokens of that-relatives. The corpus used for the annotation of who- and which-relative clauses included The Adventures of Sherlock Holmes by Arthur Conan Doyle (104,693 words) and a small part of the Brown corpus (1,000,000 words). For that-relatives the following sources were added: Increasing Human Efficiency in Business by Walter Dill Scott (61,608 words), an excerpt from The Discovery and Settlement of Kentucky by John Filson (8,843 words), and an excerpt from Minnesota Historical Society by Solon J. Buck (43,850 words). The Brown Corpus is available through the Linguistic Data Consortium at www.ldc.upenn.edu. All other sources are available through Project Gutenberg at www.gutenberg.net. The search for relative-clause tokens started with The Adventures of Sherlock Holmes. The rest of the corpora were added as needed in the order mentioned above until 100 tokens of each type of relative were identified.
Results
Each of the 300 relative clauses was annotated with the set of features described in the preceding subsection. The results of this annotation are summarized in table 14.2. The columns “Ref. to the head noun” and “Ref. to ‘other’” show raw figures for the number of instances in which the head noun and ‘other’ than the head noun referents were referenced in the subsequent discourse. “Subsequent discourse” is defined as a single but possibly complex sentence, including one main clause and all its dependent clauses. The columns marked “Referring expression” show the type of referring expression used for reference to the head nouns and ‘other’ entities. The row “N/A” under “Ref. to ‘other’” gives the count of relative clauses that did not evoke any ‘other’ entities. Three tokens of it-clefts were dropped for the analysis, which included a total of 297 tokens.

Table 14.2
Reference and referring expression in relative clauses.

Ref. to head noun        who   which   that   Total
Yes                       47      17     18      82
No                        52      82     81     215
Total                     99      99     99     297

Referring expression     who   which   that   Total
N/A                       52      82     81     215
NP                        19       7      5      31
NP-assoc                   2       1      7      10
Pronoun                   14       7      3      24
first/second person        4       0      0       4
Other                      8       2      3      13
Total                     99      99     99     297

Ref. to ‘other’          who   which   that   Total
Yes                        7      34     27      68
No                        88      58     55     201
N/A                        4       7     17      28
Total                     99      99     99     297

Referring expression     who   which   that   Total
N/A                       92      65     72     229
NP                         5      13      9      27
NP-assoc                   0       1     15      16
Pronoun                    0       7      1       8
first/second person        0      11      0      11
Other                      2       2      2       6
Total                     99      99     99     297

For who-relatives, the head noun was subsequently referenced in almost 50 percent of the tokens. Reference to the head noun with a pronoun occurred 14 times, that is, 29 percent of the total number of references to the head noun. In 7 of the 14 instances of pronominal reference to the head noun, the head noun was the subject of the clause in which it appeared. An example is given in (15).
(15) a. Barber_i, who_i is in his 13th year as a legislator, said there are “some members_j of our congressional delegation in Washington who_j would like to see it (the resolution) passed.”
b. But he_i added that none of Georgia’s congressmen specifically asked him to offer a resolution.
Of the remaining seven tokens of pronominal reference to the head noun, in two cases the reference was in the same sentence, in one case the subject of the main clause was also referenced with a pronoun, and in the remaining four cases there was no competing antecedent in the main clause and syntactic constraints made the realization of the head referent in subject position either impossible or awkward. A typical example of this last type is given in (16).
(16) a. A special presentation was made to Mrs. Geraldine Thompson of Red Bank, who is stepping down after 35 years in the committee.
b. She was also the original GOP national committeewoman from New Jersey in the early 1920s following adoption of the women’s suffrage amendment.
If we put syntactic constraints aside, the annotation of who-relatives shows that a pronoun was used for reference to the head noun either when the head noun was the subject of the main clause or, as we saw in one case, when the subject of the main clause had already been referenced with a pronoun (cf. Centering’s Pronoun rule). For 40 percent of the remaining references, the preferred referring expression was a full NP. Closer inspection of the relevant tokens reveals that a full NP was used primarily when the head referent was a non-subject in the clause in which it was evoked. This was the case for 16 out of the 19 instances of NP reference to the head noun. A representative example is given in (17). That example is especially interesting because it contains two male referents. The first one is introduced as the main-clause subject. The other one is introduced as the object of the main clause and is also the head noun of the relative clause. Crucially, the referent of the head noun, ‘Mr. Breeden’, is the subject of the relative clause and the only third-person male entity evoked in the relative clause. In addition, the relative clause is sentence final, thus immediately preceding the succeeding main clause, which contains the NP reference to ‘Mr. Breeden’. Further, the referent of the head noun is referenced with a pronoun in the embedded complement clause. If relative-clause subjects are marked as salient on a par with main-clause subjects, we would expect subsequent reference to the subject of the relative clause with a pronoun. This expectation would only be strengthened by the fact that the relative-clause subject receives additional mention in the complement clause with a pronoun. Contrary to expectation, if we replace the full NP ‘Mr. Breeden’ with the pronoun ‘he’ in (17c), the pronoun is interpreted as ‘Mr. Brady’, the subject of the main clause.
(17) a. In testimony to the Senate securities subcommittee, Mr. Brady disputed the view of SEC Chairman Richard Breeden,
b. who told a House panel Wednesday that he doesn’t want the ability to halt the market.
c. Mr. Breeden contended that discretionary power could have an impact on the markets if rumors were to circulate about when the exchanges might be closed.
d. He added that the president already has the power to close the markets in an emergency.
The remaining three cases of NP reference to the head noun are less informative, as in two instances the NP expression appeared in a sentence crossing a paragraph boundary, and one instance involved NP reference in a parenthetical phrase.⁴ In the “Ref. to ‘other’” results, we observe that ‘other’ entities were rarely referenced. Reference to an ‘other’ entity occurred in seven of the 95 tokens (excluding N/A tokens). There were no instances of pronominal reference to an ‘other’ entity.

In which-relatives, the referent of the head noun was subsequently referenced in approximately 17 percent of the tokens. Comparing these results with those obtained from who-relatives, we observe that reference to the head noun was much less frequent for which-relatives, probably reflecting the tendency of which-relatives to modify non-subject (and non-human) referents. Which-relatives modified a subject in only six cases.

Now let us turn to type of referring expression. A pronoun was used for reference to the head noun in seven of the 17 cases. Looking closer at these seven tokens, we observe the following distribution. In four cases the head noun was the subject in the sentence. A typical example is shown in (18), in which the antecedent of the pronoun ‘it’ in (18b) is the subject of the preceding sentence (18a). In another case, the pronoun appeared in the same sentence containing the relative clause. The remaining two pronominal references were harder to analyze. In one case, the pronoun appeared in an elliptical utterance; in the other, it occurred in a discourse containing complement clauses and switches from indirect to quoted speech, shown in (19). Analyzing this pronominal use requires a better understanding of topic management and of its interaction with discourse structure in indirect and quoted speech.

(18) a. The road_i in which_i we found ourselves as we turned round the corner from the retired Saxe-Coburg Square_j presented as great a contrast to it_j as the front of a picture does to the back.
     b. It_i was one of the main arteries which conveyed the traffic of the City to the north and west.

(19) a. The only day they “have a chance to compete with large supermarkets is on Sunday,” the council’s resolution said.
     b. The small shops “must be retained, for they provide essential service to the community,” according to the resolution_i, which_i added that they “also are the source of livelihood for thousands of our neighbors.”
     c. It_i declares that Sunday sales licenses provide “great revenue” to the local government.
A full NP was used for reference to the head noun in seven tokens. In six of these, the head referent was a non-subject. In the remaining token, the head referent was a subject embedded in a complement clause.

An ‘other’ entity was subsequently referenced in 34 cases. A pronoun was used in seven tokens; in five of these, the ‘other’ entity was already pronominalized in the relative clause (i.e., the relative clause contained a reference to an ‘other’ entity already evoked in the preceding discourse). In each of these cases, the discourse contained other first-person pronominal references and no competing antecedents. A typical example of this category is shown in (20).

(20) a. Indeed, apart from the nature of the investigation which my friend had on hand, there was something in his masterly grasp of a situation, and his keen, incisive reasoning, which made it a pleasure to me to study his system of work, and to follow the quick, subtle methods by which he disentangled the most inextricable mysteries.
     b. So accustomed was I to his invariable success that the very possibility of his failing had ceased to enter into my head.

Of the remaining two cases, in one the pronoun appeared intra-sententially, as shown in (22). In the other, shown in (21), the main clause in (21a) contains a there construction, and the main clause as a whole provides the setting against which the main character in the story is introduced. Note that the only entities evoked in the main clause are furniture items. This type of example is stylistically marked and is often encountered in literary text. What is marked about it is that the main clause serves only to present the background against which the main character is introduced.
An expectation that the ‘small man’ is going to be central in the subsequent discourse is facilitated by the fact that no other character is introduced in the sentence and is probably cued by the non-canonical post-verbal position of the subject in the relative clause.⁵ In the remaining 13 cases of reference to an ‘other’ entity, the referring expression used was a full NP.

(21) a. There was nothing in the office but a couple of wooden chairs and a deal table, behind which sat a small man_i with a head that was even redder than mine.
     b. He_i said a few words to each candidate as he came up, . . .

(22) The law_i which_i governs home rule charter petitions_j states that they_j must be referred to the chairman of the board of canvassers for verification of the signatures within 10 days.
As with which-relatives, in that-relatives reference to the head noun was low, approximately 18 percent. An anaphoric expression other than an NP was used three times, all intra-sententially. In two cases, the anaphoric expression was the null subject of a participial form occurring in the same sentence that contained the relative; the third was a special case of ‘one’ anaphora, shown in (23). There were no pronominal references to the head noun in the subsequent sentence.

(23) Frequently it is not the team_i with the greater muscular development or speed of foot that_i wins the victory, but the one_i-assoc with the more grit and perseverance.

The most frequent expression for reference to the head noun was a full NP. Consistent with the results on who- and which-relatives, a full NP was used when the head noun had a non-subject grammatical role in the main clause, as in (24). This was the case for all five instances of NP reference to the head noun.

(24) a. Mr. Doherty kept only those muscles tense that were used in the game.
     b. The muscles especially necessary for tennis were also, so far as possible, kept lax except at the instant for making the stroke.

Turning to ‘other’ referents in that-relatives, again we observe that the reference pattern for ‘other’ referents is similar to that observed with which-relatives, with approximately 27 percent reference to ‘other’ in subsequent discourse. Of those cases, one instance contained pronominal reference to the head noun intra-sententially, as in (25), another contained zero anaphora intra-clausally, and a third case included anaphora to the head noun with a relative pronoun (with no further reference in the following sentence), again intra-sententially. So, as in the case of the head noun, there were no pronominal references to an ‘other’ entity in the subsequent sentence. For the remaining nine tokens, the referring expression used was a full NP.
(25) And being informed, by two of their number_i that_i went to their town_j, that the Indians had entirely evacuated it_j, we proceeded no further and . . .

The corpus analysis of entity salience and choice of referring expression for reference to entities evoked in who-, which-, and that-relative clauses shows a strong tendency for entities evoked in relative clauses to be subsequently referenced with a full noun phrase rather than a pronoun. Pronominalization of an entity mentioned in a relative clause was attested when that entity had been introduced earlier in the discourse and was already referenced with a pronominal expression in the relative clause. Further, we identified instances of subsequent pronominalized reference to entities evoked in the relative clause, but in
those instances the subject of the main clause was also referenced with a pronominal expression, a finding that offers support for Centering’s Pronoun Rule, which posits that topical entities are pronominalized first. Apparently, once the topic of the current unit is pronominalized, pronominal reference to other entities evoked in the preceding discourse is possible. These findings support Centering’s Pronoun Rule to the extent that the main clause and the relative clause are processed as a single “utterance.”

With respect to the status of subjects in relative clauses, we saw cases in which pronominal reference to a relative-clause subject was not possible despite the fact that the relative-clause subject was the most recent subject. Instead, a full noun phrase was used in subject position, which most likely was perceived as a topic switch. If this is the case, we conclude that reference to an entity in the subject position of a relative clause is not sufficient to establish it as the most salient entity in the ongoing discourse. These findings are inconsistent with Givón’s theory of topicality, which predicts that a pronoun is licensed when the referent is evoked at a short distance from the anaphoric expression, especially when no competing antecedents are evoked to hinder entity accessibility. To the extent that we want to retain recency as a valid factor for referent accessibility, our data indicate that this factor is sensitive to the structural configuration of the preceding discourse rather than to absolute distance measured linearly. In terms of Centering, the pronominalization results of the current study indicate that entities evoked in relative clauses rank lower than the set of entities evoked in the main clause and therefore do not contribute a Preferred Center unless none is available in the main clause.

Topic Continuity in Relative Clauses
In the previous section, the main criterion for the evaluation of entity salience in relative clauses was the choice of referring expression for subsequent references to those entities. The observed reference patterns lend support to the Centering view of the relationship among topics, subjects, and choice of referring expression, provided that the relative clause is processed as a single unit with the main clause. In this section, we test that proviso explicitly. Centering’s definitions of center transition are exceptionally well suited for this task. If relative clauses contribute a topic-update unit, we expect to see more “coherent” transitions when they are processed as such. If, on the other hand, relative clauses belong with the main clause and together form a topic-update unit, we expect to see more “coherent” transitions when they are processed together. In this condition, entities evoked in relative clauses should not hinder continuation on the topic established in the main clause.
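To make the notion of a center transition concrete, the following is a minimal sketch in Python rather than part of the original study. It assumes the four-way scheme widely used in the Centering literature, in which a transition depends on whether the backward-looking center (Cb) of the current unit equals that of the previous unit and whether it equals the current Preferred Center (Cp). The function name, the use of `None` for a missing Cb, and the treatment of a unit with no preceding Cb are illustrative assumptions.

```python
from typing import Optional


def classify_transition(prev_cb: Optional[str],
                        cb: Optional[str],
                        cp: str) -> str:
    """Classify the Centering transition into the current unit.

    prev_cb: Cb of the previous unit, or None if it has none
    cb:      Cb of the current unit, or None if it has none
    cp:      Preferred Center (highest-ranked entity) of the current unit
    """
    # Assumption: when the previous unit has no Cb (e.g., it opens the
    # discourse), the center is treated as unchanged.
    same_cb = prev_cb is None or cb == prev_cb
    cb_is_cp = cb is not None and cb == cp

    if same_cb:
        return "Continue" if cb_is_cp else "Retain"
    return "Smooth Shift" if cb_is_cp else "Rough Shift"


# Placeholder entities "A", "B", "C" stand for discourse referents.
print(classify_transition("A", "A", "A"))   # Continue
print(classify_transition("A", "A", "B"))   # Retain
print(classify_transition("A", "B", "B"))   # Smooth Shift
print(classify_transition("A", None, "B"))  # Rough Shift
```

Under this sketch, a unit that shares no entity with its predecessor has no Cb and therefore comes out as a Rough Shift, which is how a disconnected discourse would be modeled.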
For this study, 100 tokens of non-restrictive relative clauses were extracted from the Wall Street Journal corpus. Only non-restrictive relative clauses were extracted for this study, because restrictive relative clauses participate in the verbal predicate argument structure and are therefore less likely to form independent topic-update units. Non-restrictive relative clauses, on the other hand, have been treated as independent clauses, by some accounts even at the syntactic level (e.g., McCawley 1981). The extraction of non-restrictive relative clauses was done according to the following criteria: (a) the relative clause was preceded by a comma (to exclude restrictive relative clauses); (b) the sentence following the relative clause included reference to at least one entity evoked in the sentence containing the relative clause, either in the main clause or in the relative clause; and (c) the relative clause was in sentence-final position (to ensure that the relative clause is adjacent to the following unit).

For each token, Centering transitions were computed in two versions. In version A, two Centering transitions were computed: one for the sentence containing the relative clause and one for the sentence following the relative clause. In other words, in version A the center update unit is the complex sentence. Let’s call this the complex-sentence version. In version B, three Centering transitions were computed: one for the first sentence, excluding the relative clause; one for the relative clause; and one for the sentence following the relative clause. Version B, then, assumes that each clause, either main or relative, is an independent unit. Let’s call it the single-clause version. The results of the computation of Centering transitions in the two conditions are shown in table 14.3. Specifically, table 14.3 shows the results for the single-clause version.
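The difference between the two versions lies only in how a token is cut into update units. The sketch below illustrates this in Python; the `Token` structure and the function names are hypothetical, and the text of example (17a-c), abridged, is reused purely as an illustration rather than as one of the 100 tokens.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """A sentence ending in a non-restrictive relative clause, plus the
    sentence that follows it (hypothetical structure for illustration)."""
    main_clause: str
    relative_clause: str
    following_sentence: str


def complex_sentence_units(token: Token) -> List[str]:
    """Version A: main clause and relative clause form one update unit."""
    return [f"{token.main_clause} {token.relative_clause}",
            token.following_sentence]


def single_clause_units(token: Token) -> List[str]:
    """Version B: each clause, main or relative, is its own update unit."""
    return [token.main_clause,
            token.relative_clause,
            token.following_sentence]


# Abridged text of example (17a-c), used only as an illustration.
token = Token(
    main_clause="Mr. Brady disputed the view of SEC Chairman Richard Breeden,",
    relative_clause="who told a House panel Wednesday that he doesn't want "
                    "the ability to halt the market.",
    following_sentence="Mr. Breeden contended that discretionary power "
                       "could have an impact on the markets.",
)

# One transition is computed per unit: two in version A, three in version B.
print(len(complex_sentence_units(token)))  # 2
print(len(single_clause_units(token)))     # 3
```

In both versions the last unit is the same sentence; what changes is the unit that precedes it, and hence the Cb against which its transition is computed.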
The column “more ‘coherent’ transition” contains the number of cases in which a more “coherent” transition was computed in the final sentence in the single-clause condition by comparison to the complex-sentence condition. The column “less ‘coherent’ transition” shows how many times a less “coherent” transition was computed in the final sentence, and the column “no effect” shows how many times the same transition was computed in both conditions. The relevant degree of coherence was specified according to Centering’s transitions rule: Continue > Retain > Smooth Shift > Rough Shift. So, for example, if the transition computed for the unit following the relative clause was Continue in the single-clause condition but Rough Shift in the complex-sentence condition, then the transition was identified in table 14.3 as more “coherent.” Conversely, if the transition computed for the last unit was, for example, Smooth Shift in the single-clause condition and Continue in the complex-sentence condition, the transition was identified as less “coherent.”

Table 14.3
Effect of non-restrictive relatives on Centering transitions.

More “coherent” transition    Less “coherent” transition    No effect    Total
13                            46                            41           100

Examples are shown below. A typical example of the category “less ‘coherent’ transition” is given in (26) and (27). The computation of transitions in the single-clause condition shown in (26) yields a Rough-Shift transition, which is classified as less “coherent” than the Continue transition computed in the complex-sentence condition, shown in (27).

(26) (A disaffected, hard-drinking, nearly-30 hero_i sets off for snow country in search of an elusive sheep with a star on its back at the behest of a sinister, erudite mobster with a Stanford degree.)
     SINGLE-CLAUSE CONDITION
     a. He_i has in tow his prescient girlfriend_j,
        Cb = hero, Cp = hero, Tr = Continue
     b. whose_j sassy retorts mark her as anything but a docile butterfly.
        Cb = girlfriend, Cp = girlfriend, Tr = Smooth Shift
     c. Along the way, he_i meets a solicitous Christian chauffeur who offers the hero God’s phone number;
        Cb = none, Cp = hero, Tr = Rough Shift

(27) COMPLEX-SENTENCE CONDITION
     a. He_i has in tow his prescient girlfriend, whose sassy retorts mark her as anything but a docile butterfly.
        Cb = hero, Cp = hero, Tr = Continue
     b. Along the way, he_i meets a solicitous Christian chauffeur who offers the hero God’s phone number;
        Cb = hero, Cp = hero, Tr = Continue

Examples such as the above are supportive of the hypothesis that relative clauses are not processed as topic-update units. Processing the relative clause as a unit by itself yields three problems. First, we would process ‘girlfriend’ as the most likely topic of the subsequent discourse, an expectation that is not met. In fact, this entity is not mentioned at all in the following sentence. Second, counter to intuition, the discourse would be modeled as disconnected. Disconnected discourses are predicted to be hard to process because they place on the hearer the extra burden of inferring the intended link. Third, the use of the pronoun would be puzzling. If the most salient entity after processing the relative clause is ‘girlfriend’, then the pronominalized reference to the ‘hero’, which was evoked two units before, is unexpected.
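The classification into more “coherent,” less “coherent,” and no effect amounts to comparing two transitions on the ordering Continue > Retain > Smooth Shift > Rough Shift. A minimal Python sketch of that comparison follows; the function and label names are illustrative and not taken from the study itself.

```python
# Centering's transition ordering, from most to least "coherent".
RANKING = ["Continue", "Retain", "Smooth Shift", "Rough Shift"]


def compare_conditions(single_clause: str, complex_sentence: str) -> str:
    """Label the final-unit transition of the single-clause condition
    relative to the one computed in the complex-sentence condition."""
    single_rank = RANKING.index(single_clause)
    complex_rank = RANKING.index(complex_sentence)
    if single_rank < complex_rank:
        return "more coherent"
    if single_rank > complex_rank:
        return "less coherent"
    return "no effect"


# Final-unit transitions of examples (26c) and (27b).
print(compare_conditions("Rough Shift", "Continue"))  # less coherent
```

Applied to the final units of (26) and (27), the comparison returns the “less coherent” label, matching the classification of that token in the text.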
On the other hand, if the discourse is processed according to the complex-sentence hypothesis, none of the problems above arises. The highest-ranked entity in that unit, ‘hero’, is processed as the most likely topic of the discourse, an expectation that is met as indicated by the pronominal reference in the subject position. The discourse now “coheres” in that it is perceived as being about the same entity.

In the results reported in table 14.3, the single-clause condition yields a more “coherent” transition in 13 of the 100 cases, whereas for 41 cases it yields a less “coherent” transition. So, overall these findings lend support to the complex-sentence hypothesis. But what about the 13 instances in which the single-clause hypothesis appears to yield more “coherent” transitions? The examples shown in (28) and (29) are representative of the cases in which the single-clause condition yields a more “coherent” transition. In this case, processing the relative clause as an independent unit yields a Continue transition, which is more coherent than the Smooth-Shift transition computed in the complex-sentence condition. Closer inspection of this example, however, reveals that the head noun is referenced in the subsequent discourse with a full NP, despite the fact that it appears in subject position in the relative clause. This pattern of reference in which an entity is promoted to a subject position with an NP form has been observed in other languages (Miltsakaki 2003; Turan 1995) as a strategy used by speakers to signal a shift to a new topic. An entity first evoked in a non-salient position is then promoted to a subject position with a full NP and is established as the new topic. Pronominal reference is avoided in this case, despite the accessibility of the referent, presumably because the referent was not the topic of the sentence in which it was evoked but rather is intended to be a new topic.
Other independent factors that could account for the use of an NP do not hold here. Specifically, the use of the NP in this case is not dictated by the grammar, does not provide any further information about the referent (Fox 1987), and does not appear on a segment boundary (Passonneau and Litman 1993). In any event, other factors licensing the use of an NP perform functions that are independent of referent accessibility, so we should still be able to use a pronoun to refer to ‘Mr. Kilpatrick’ successfully. This is not the case. According to native speakers’ judgment, the preferred interpretation for a subject pronoun in the last sentence would be ‘Wilson Taylor’, the subject of the main clause. It seems, then, that in this example, we do, in fact, have a Smooth-Shift transition that correctly reflects processing of ‘Mr. Kilpatrick’ as the new topic.

(28) SINGLE-CLAUSE CONDITION
     a. Wilson H. Taylor_i, president and chief executive officer of this insurance and financial services concern, was elected to the additional post of chairman.
     b. Mr. Taylor_i, 45 years old, succeeds Robert D. Kilpatrick_j, 64,
        Cb = Taylor, Cp = Taylor, Tr = Continue
     c. who_j is retiring, as reported earlier.
        Cb = Kilpatrick, Cp = Kilpatrick, Tr = Smooth Shift
     d. Mr. Kilpatrick_j will remain a director.
        Cb = Kilpatrick, Cp = Kilpatrick, Tr = Continue

(29) COMPLEX-SENTENCE CONDITION
     a. Wilson H. Taylor_i, president and chief executive officer of this insurance and financial services concern, was elected to the additional post of chairman.
     b. Mr. Taylor_i, 45 years old, succeeds Robert D. Kilpatrick_j, 64, who is retiring, as reported earlier.
        Cb = Taylor, Cp = Taylor, Tr = Continue
     c. Mr. Kilpatrick_j will remain a director.
        Cb = Kilpatrick, Cp = Kilpatrick, Tr = Smooth Shift

The design of the corpus study described in this section was based on basic principles of Centering theory. However, the significance of the findings goes beyond the Centering framework. Centering formalizes basic intuitions that we have about entity-based coherence. Discourses that are carefully planned around a single entity and smoothly shift our attention to new topics are perceived as more coherent than discourses that either move entities in and out of focus or appear to be disconnected (captured with Rough Shifts). So, the notion of the topic-update unit is not theory internal. Topic-update units may or may not turn out to be defined on structural grounds (as is suggested here), but they need to be identified somehow. What the studies reported here have revealed is that there is a distinction between main and subordinate clauses, which challenges the tacit assumption that all clauses are processed as single units. At the very minimum, topic identification, entity salience, and choice of referring expression appear to be sensitive to the syntactic choices made by speakers when they organize discourse.

Conclusions for Topics, Subjects, and Pronouns
The present study has addressed a number of questions and offered some preliminary answers. The basic conclusion is that the choice of clause type in which an entity is evoked affects the entity’s salience status in the discourse. Specifically, on the issue of subjecthood, we saw that subjects in adverbial and relative clauses exhibit significant differences in their behavior when compared with their main-clause counterparts. We saw that when two competing antecedents for a pronominal expression appear in subject position in a main
clause and a subordinate clause, the subject of the main clause takes the lead even if the subordinate clause is linearly closer to the pronominal expression. We also saw that subject pronouns in main clauses exhibit greater sensitivity to structural focusing than subject pronouns in adverbial clauses. Finally, the Centering study of relative clauses suggested that topic assignment is also sensitive to sentence structure. Entities introduced in structurally salient positions in relative clauses seem to leave the topicality status of entities in the main clause unaffected.

The main/subordinate distinction raises further questions about the nature of subordination itself. Syntactic subordination of the type we are concerned with here seems to be a universal property of languages. Still, little is understood as to why subordinate clauses exist in grammar. With the exception of complement clauses and restrictive relative clauses, subordinate clauses such as adverbial and non-restrictive relative clauses do not participate in the predicate-argument structure of the verb. As a toy experiment, one could successfully rewrite any text as a succession of main clauses without changing propositional content or discourse relations, as in (30) and (31). While the investigation of the nature and purpose of subordination in grammar will await further research, the fact remains that grammars allow speakers to choose between using a subordinate clause and using a non-subordinate clause to express propositions and relations between propositions.

(30) a. Mary was late this morning because she missed the 8 a.m. bus.
     b. Mary missed the 8 a.m. bus this morning. As a result, she was late.
     c. Mary was late this morning. She missed the 8 a.m. bus.

(31) a. Mary was hired by Mr. Brown, who is the director of NBC.
     b. Mary was hired by Mr. Brown. Mr. Brown is the director of NBC.
Earlier studies of the relationship between choice of linguistic form and discourse function showed that speakers use syntactic variability to express a variety of discourse functions which contribute a range of meanings that are not derived compositionally from the syntactic representation. Prince (1999a) showed that syntactic “topicalization” has two discourse functions, one being to trigger an inference on the part of the hearer that the “topicalized” entity stands in a partially ordered set relation to some other entities evoked in the preceding discourse. Other word-order phenomena have also been shown to serve a variety of discourse functions and information-packaging needs (Ward and Birner 1996; Vallduvi and Vilkuna 1998). In searching for an explanation of the current findings on the salience status of entities in main and subordinate clauses, I would like to suggest that one of the factors driving the choice of subordinate clauses in discourse might be to
mark entities with low salience. Strategies for marking low salience have received little attention in the literature, presumably because low salience can be seen to fall naturally and effortlessly out of a model of discourse salience. Strategies of marking low salience, however, can reduce the complexity of inferencing required in processing discourse. Joshi and Kuhn (1979) proposed a logic approach to discourse processing according to which one of the entities in the discourse is singled out to form a special argument, the discourse center. A complex predicate is then constructed, including other entities; it is predicated of the centered entity. In such a model, subordinate clauses can be seen as delimiting the boundaries of the internal structure that will be temporarily hidden in the complex predicate. In this way, it is possible to retain a single center while introducing multiple entities that are propositionally related to the center.

As I warned at the beginning of my conclusions, the research discussed in this paper raises more questions than it answers. A successful theory that will account for the complexities of reference and salience in discourse is still missing. However, I hope that the current findings will spur new interest in the effects of complex structure and, in particular, the role of subordination in discourse.

Notes

1. Similar measures of reference accessibility have been proposed in Ariel (1990). However, in Ariel (1990), topicality is a binary property of an entity and is viewed as one of the factors determining referent accessibility.

2. The study of subordinate clauses has received much attention in the narrative literature (Reinhart 1984; Labov 1972; Thompson 1987; Talmy 2000, among others). This line of research is primarily concerned with the potential correlation of subordination and backgrounding in discourse. Since this work does not make any explicit claims with respect to topicality, subjects, and pronominal interpretation, it is not discussed here, but the interested reader is referred to Miltsakaki 2003 for a review.

3. In fact, only temporal and contrastive relations can be expressed with both a subordinate conjunction and an adverbial connective (‘then’-‘when’, ‘however’-‘although’).

4. To accurately reflect the frequency of reference to the head noun entity, table 14.2 reports cases of first-person and second-person reference to the head entity. Tokens classified as “other” include reference with quantificational expressions and other relative pronouns. These cases are excluded from the discussion of the results, as the contribution of each of these expressions to entity salience and reference is poorly understood.

(i) a. From north, south, east, and west every man_i who_i had a shade of red in his hair had tramped into the city to answer the advertisement.
    b. Fleet Street was choked with red-headed folk_i, and Pope’s Court looked like a coster’s orange barrow.
5. It is, of course, possible that the post-verbal position of the subject is due to the heaviness of the NP.

References

Ariel, M. 1990. Accessing Noun-Phrase Antecedents. Routledge.
Brennan, S., Walker-Friedman, M., and Pollard, C. 1987. A Centering approach to pronouns. In Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics.
Caramazza, A., and Gupta, S. 1979. The roles of topicalization, parallel function and verb semantics in the interpretation of pronouns. Linguistics 3, 497–518.
Carden, G. 1982. Backwards anaphora in discourse context. Journal of Linguistics 18, 361–387.
Chafe, W. 1976. Givenness, contrastiveness, definiteness, subjects, topics and point of view. In C. N. Li (ed.), Subject and Topic. Academic Press.
Clifton, C., and Ferreira, F. 1987. Discourse structure and anaphora: Some experimental results. In Attention and Performance XII. Erlbaum.
Cooreman, A., and Sanford, A. 1996. Focus and Syntactic Subordination in Discourse. Technical report, Human Communication Research Center.
Di Eugenio, B. 1998. Centering in Italian. In M. Walker, A. Joshi, and E. Prince (eds.), Centering Theory in Discourse. Clarendon.
Fox, B. 1987. Discourse Structure and Anaphora. Cambridge University Press.
Givón, T. 1976. Topic, pronoun and grammatical agreement. In C. Li (ed.), Subject and Topic. Academic Press.
Givón, T. 1983. Topic continuity in discourse: A quantitative cross-language study. In Topic Continuity in Discourse: An Introduction. John Benjamins.
Grosz, B., Joshi, A., and Weinstein, S. 1995. Centering: A framework for modeling local coherence in discourse. Computational Linguistics 21 (2), 203–225.
Gundel, J., Hedberg, N., and Zacharski, R. 1993. Cognitive status and the form of referring expressions in discourse. Language 69, 274–307.
Horn, L. 1986. Presupposition, theme and variations. In Proceedings of 22nd Annual Meeting of the Chicago Linguistics Society.
Joshi, A., and Kuhn, S. 1979. Centered logic: The role of entity centered sentence representation in natural language inferencing. In Proceedings of 6th International Joint Conference on Artificial Intelligence.
Kameyama, M. 1993. Intrasentential Centering. In Proceedings of Workshop on Centering, University of Pennsylvania.
Kameyama, M. 1998. Intrasentential Centering: A case study. In M. Walker, A. Joshi, and E. Prince (eds.), Centering Theory in Discourse. Clarendon.
Kehler, A. 2002. Coherence, Reference, and the Theory of Grammar. CSLI Publications.
Labov, W. 1972. The transformation of experience in narrative syntax. In Language in the Inner City. University of Pennsylvania Press.
McCawley, J. 1981. The syntax and semantics of English relative clauses. Lingua 53, 99–149.
McDonald, J., and MacWhinney, B. 1995. The time course of anaphor resolution: Effects of implicit causality and gender. Journal of Memory and Language 34, 543–566.
Miltsakaki, E. 2001. On the interpretation of weak and strong pronominals in Greek. In Proceedings of the 5th International Conference in Greek Linguistics, Sorbonne.
Miltsakaki, E. 2002a. Effects of subordination on referential form and interpretation. In Proceedings of the 26th Penn Linguistics Colloquium.
Miltsakaki, E. 2002b. Towards an aposynthesis of topic continuity and intra-sentential anaphora. Computational Linguistics 28 (3), 319–355.
Miltsakaki, E. 2003. The Syntax-Discourse Interface: Effects of the Main-Subordinate Distinction on Attention Structure. Doctoral dissertation, University of Pennsylvania.
Passonneau, R., and Litman, D. 1993. Intention-based segmentation: Human reliability and correlation with linguistic cues. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics.
Prince, E. 1999a. How not to mark topics: ‘Topicalization’ in English and Yiddish. In Texas Linguistic Forum. University of Texas.
Prince, E. 1999b. Subject pro-drop in Yiddish. In P. Bosch and R. van der Sandt (eds.), Focus: Linguistic, Cognitive, and Computational Perspectives. Cambridge University Press.
Reinhart, T. 1981. Pragmatics and linguistics: An analysis of sentence topics. Philosophica 27 (1), 53–93.
Reinhart, T. 1984. Principles of gestalt perception in the temporal organization of narrative texts. Linguistics 22, 779–809.
Stevenson, R., Crawley, R., and Kleinman, D. 1994. Thematic roles, focusing and the representation of events. Language and Cognitive Processes 9, 519–548.
Stevenson, R., Knott, A., Oberlander, J., and McDonald, S. 2000. Interpreting pronouns and connectives: Interactions among focusing, thematic roles and coherence relations. Language and Cognitive Processes 15 (3), 225–262.
Suri, L., McCoy, K., and DeCristofaro, J. 1999. A methodology for extending focusing frameworks. Computational Linguistics 25 (2), 173–194.
Talmy, L. 2000. Toward a Cognitive Semantics. MIT Press.
Tanaka, I. 2000. Cataphoric personal pronouns in English news reportage. In Proceedings of the 3rd International Conference of Discourse Anaphora and Anaphor Resolution Conference.
Thompson, S. 1987. In Coherence and Grounding in Discourse, volume 11. John Benjamins.
Turan, U. 1995. Null vs. overt subjects in Turkish discourse: A Centering analysis. Doctoral dissertation, University of Pennsylvania.
380
Miltsakaki
Vallduvi, E., and Vilkuna, M. 1998. On rheme and kontrast. Syntax and Semantics 29, 79–108. van Hoek, K. 1997. Anaphora and Conceptual Structure. University of Chicago Press. Walker, M., Joshi, A., and Prince, E. (eds.). 1998. Centering Theory in Discourse. CÂ�larendon. Ward, G., and Birner, B. 1996. On the discourse function of rightward movement in English. In A. Goldberg (ed.), Conceptual Structure, Discourse and Language. Center for the Study of Language and Information.
15
Complement Focus and Reference Phenomena
Anthony J. Sanford and Linda M. Moxey
In this chapter we review various accounts of a phenomenon that Moxey and Sanford (1987) termed complement set reference (compset reference for short). We show that the compset reference pattern is part of a cluster of phenomena that have to do with the perspectives induced by negative and positive quantifiers. We then discuss various attempts to integrate the phenomenon into discourse semantics, and present a summary of psychological experimentation that suggests the phenomenon may be best understood in terms of the association between aspects of negativity and denial of a supposition. The paper reflects a shift in what complement set focus actually means.
A quantified statement like Some boys are scouts may be described in terms of the following sets:
1. A set of boys.
2. Those boys who are scouts (a necessarily non-empty set).
3. Those boys who are not scouts (a possible set, pragmatically implicated).
For positive quantifiers, set 2 may be referred to by a plural pronoun, as in (1) and (2) below.
(1) Some boys are scouts. They are always prepared for anything.
(2) Eight of the ten marbles are in the jar. They were found when Mum vacuumed the floor.
But set 3 may not be referred to:
(3) Some of the boys are scouts. They are in the Sea Cadets instead.
(4) Eight of the ten marbles are in the jar. They are under the sofa.
These observations form the basis for the claim in Discourse Representation Theory (Kamp and Reyle 1993) that under normal circumstances set 2 is the set that is accessible by pronouns. We shall call this set the Reference Set, or
refset for short. Within DRT, a process of abstraction following a quantified statement can make various sets available for reference, but there is no operation corresponding to the subtraction of set 2 from set 1, which would derive the compset (see Kamp and Reyle 1993).
Although considered to be typical of all quantifiers, these observations hold only for positive quantifiers. Negative quantifiers have been shown to behave quite differently (Moxey 1986; Moxey and Sanford 1987). As example (5) shows, acceptable pronominal anaphoric reference may be made to something other than set 2:
(5) Not many of the boys are scouts. They are in the Sea Cadets instead.
Moxey and Sanford (1987) proposed that in (5) they is taken as referring to the set of boys who are not scouts, i.e., to set 3. This they called the complement set. As we shall argue, examples like (5) show reference to sets other than set 1 and set 2. We shall examine the possibilities and present a theory of how a set closely related to set 3 becomes available. In this theory, there is no need to posit the subtraction procedure that DRT rules out.
Empirically, when participants are invited to write continuations to quantified sentences containing negative quantifiers, and when continuations begin with the pronoun they, many of the continuations are of the complement reference type exemplified by (5). Subsequent studies have confirmed this pattern and expanded our knowledge of what types of continuations are typical of a whole range of quantifiers (Sanford, Moxey, and Paterson 1996; Moxey, Sanford, and Dawydiak 2001). Examples of quantifiers and the proportions of continuations that were compset references, from a number of studies, are shown in table 15.1.
The focus difference between positive and negative quantifiers has also been demonstrated in on-line reading tasks, including self-paced reading (Sanford et al. 1996, experiment 1; see also Percus, Gibson, and Tunstall 1997), and eye tracking (Paterson et al. 1998). For instance, given (6), the patterns for reading times on (7) are influenced by focus. (A slash delimits options.)
(6) Prospective air-traffic controllers had to take a series of aptitude tests. A few/Few of the applicants passed.
(7) Their success/failure confirmed the organizer’s expectations.
If A few occurred in (6), then (7) was read faster in the success version. If Few occurred in (6), then (7) was read faster in the failure version. The focus difference is thus robust, and occurs in several different paradigms (see also Sanford, Williams, and Fay 2001).
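Returning to the three sets introduced at the start of the chapter, the relation between them can be written out as plain set operations. The following is only a toy sketch: the names and the membership of each set are invented for illustration, and nothing here is meant as a model of how readers actually compute reference. It simply makes explicit that set 3 is what remains when set 2 is subtracted from set 1, which is the operation that DRT does not provide.

```python
# Hypothetical toy domain; all names are invented for illustration.
boys = {"Alan", "Ben", "Carl", "Dave", "Eric"}   # set 1: the boys
scouts = {"Alan", "Ben", "Zoe"}                  # everyone who is a scout

refset = boys & scouts    # set 2: the boys who are scouts
compset = boys - refset   # set 3: the boys who are not scouts

print(sorted(refset))     # ['Alan', 'Ben']
print(sorted(compset))    # ['Carl', 'Dave', 'Eric']
```

On this encoding the refset and the compset are disjoint and jointly exhaust set 1, which is the assumption the chapter goes on to question.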
Table 15.1
Examples of positive and negative quantifiers, with complement set production rates. (These are illustrative, showing that compset bias for the negatives has a considerable range. In all cases here, data are based on pronominal reference patterns.)

Negative (monotone decreasing)
Quantifier                   % compset   Source
Not many (x)                 83          a
Not quite all (x)            63          a
Few (x)                      66          a
Less than half (x)           60          a
No more than 10 / 10% (x)    63          a
Not quite 90% (x)            33          b
At most 10 / 10% (x)         16          b

Positive
Quantifier                   % compset   Source
A few (x)                    0           a
Nearly all (x)               0           a
More than half (x)           0           a
No less than 80% (x)         2.5         a
Nearly 90% (x)               0           b
At least 10 / 10%            1           b

a. From Sanford et al. 1996.
b. From Moxey et al. 2001.

The Broader Setting
Within semantics, the established view is that the different focus patterns set up by negative quantifiers represent an unusual or abnormal state of affairs (see especially Nouwen 2003). Indeed, even the reality of complement set patterns of reference has been doubted by some people, and it has received scant attention from formalists. However, in this section we want to show that the focus difference between negative and positive expressions is not a trivial phenomenon, but is a key manifestation of perspective differences that have broad-ranging implications for communicating with quantifiers. For us, complement focus is a central aspect of negative quantifiers, and is part of the more general story of negativity. We give two examples from the broader setting as an argument that complement focus is a real phenomenon that requires theoretical treatment.
Affective Polarity
In a quantified statement concerning an event that is negative, negative and positive quantifiers show a contrast in the affective evaluations that they allow:
(8) A few people were killed in the ferry disaster, which is a terrible thing.
(9) Few people were killed in the ferry disaster, which is a good thing.
The endings terrible thing and good thing cannot be interchanged, though the appropriateness of the ending can be reversed if the event is changed to a
positive one, as in Quant people survived the . . . . Over a wide range of situations, these intuitions have been shown to hold in people’s judgments (Sanford, Fay, Stewart, and Moxey 2002, experiment 1). This particular phenomenon is very important for communication, since it generalizes beyond quantifiers to frequency adverbs and risk portrayal (Sanford and Moxey 2000; Teigen and Brun 1999). In (10) and (11), slight risk and little risk denote about the same level of risk, but little risk puts focus on the large non-risk, while slight risk puts focus on the risk that does exist, with clearly different effects:
(10) There is a slight risk of ill-effects from the power lines, which is a bad thing.
(11) There is little risk of ill-effects from the power lines, which is a good thing.
Attribution of Cause
Given an event, attributing the cause of that event to one thing or another depends crucially on base-rate information. For instance, in (12) and (13), the base-rate information indicates that most people other than Mary do not trip over Ralph:
(12) Mary trips over Ralph on the dance floor.
(13) Few other people trip over Ralph on the dance floor.
Here we form the impression that there must be something about Mary that caused the trip, since so few others trip over Ralph. Now compare this with a few substituted for few:
(14) Mary trips over Ralph on the dance floor.
(15) A few other people trip over Ralph on the dance floor.
Here there is no impression that Mary is the problem; indeed, maybe Ralph is clumsy. It is known from psychometric studies that a few and few denote the same amount in a given context (Moxey and Sanford 1993; Sanford et al. 1996), so the difference is not due to the number of people who trip over Ralph, but to the polarity of the quantifier and the perspective that it induces. These intuitions hold up experimentally when participants are asked to make explicit attributions (Barton and Sanford 1990).
The literature of social cognition includes a large number of studies testing how people make attributions, many of which have used simple vignette descriptions (e.g., Cheng and Novick 1990; Hilton and Slugoski 1986; McArthur 1972). The aim has been to observe how the presentation of various types of base-rate information might influence the pattern of attribution. According to the main theoretical framework of covariation theory, what matters is the number of cases in which some base-rate event co-occurs with the target event in question (Cheng and Novick 1990; Kelley 1967). In these studies, covariation information is nearly always depicted through the use of quantifiers (or corresponding frequency adverbs), and there is a confound. Small degrees of covariation are depicted through negative quantifiers, while large amounts are depicted by positive quantifiers. Here we have shown that positive and negative quantifiers produce different patterns of attribution, even when they denote closely similar amounts. Indeed, for some methods of assessing attribution, it turns out that focus explains the entire effect, nothing being attributable to amount (Majid, Sanford, and Pickering 2006).
We have chosen these two examples of affective polarity and attribution of cause to illustrate why the communicative impact of negative and positive quantifiers is pervasive and certainly worth consideration. In each case the relationship to focus is clear. What matters is whether the refset or the compset is the object of attention, because it is this that sets the perspective induced by a quantified sentence. We now go on to look at theoretical issues relating to complement focus, beginning with a reaction to the very idea of compset focus as a theoretical possibility.
Sidestepping Complement Set Focus
One of the earliest reactions to the claim that negative quantifiers allow complement set reference was that complement set reference simply does not occur. If we take the production and comprehension data seriously, this reaction amounts to explaining complement set reference as something else.
Generalizations and the MaxSet
Perhaps the most obvious route to take is that reference is not to the complement set proper, but to the superset. So, given (16), rather than taking They in (17) as referring to the complement set, one could take it as referring to the whole set of boys:
(16) Few boys in the class passed the physics test.
(17) They couldn’t be bothered to do any revision.
It could be true of the boys in general, or of all the boys, that they did no revision, and as a result, few passed. This prevalent view (e.g., Nelken and Francez 1997, p. 408) is thoroughly articulated by Corblin (1996; see Geurts 1997 and Percus et al. 1997 for related arguments). Kamp and Reyle (1993) defined a
process of abstraction that can occur given a quantified sentence. So, given (16), three levels of abstraction are deemed possible:
1. total abstraction: the set of boys from the class who passed the test
2. intermediate abstraction: boys in the class
3. minimal abstraction: boys in general (generic)
It is assumed that the complement set (boys in the class who did not pass the test) is inaccessible. Corblin’s idea is that cases of apparent compset reference can be understood as anaphoric to one of the more general sets, thus preserving Kamp and Reyle’s admissible reference sets and removing the need for set subtraction. So-called complement set reference reduces to what Corblin calls the MAXSET (the set generally). However, if we suspend all theoretical prejudices, they could just as easily refer to the Compset, as the participants in our experiments clearly believe. Moxey and Sanford (1987; see also Sanford et al. 1996 and Moxey et al. 2001) asked participants to indicate “who they were” in the continuations they had produced. The following alternatives were offered:
the boys who passed the physics test
the boys who did not pass the physics test
the boys in general
all of the boys
other (please specify).
For examples like (16), the overwhelming response was The boys who did not pass. So participants believe that that is what they are referring to, even if we have some reason to doubt the participants’ beliefs about what they were talking about. Participants’ beliefs have been rejected as evidence of what they actually referred to, on the grounds that there is no reason to suppose that participants know to what they are referring (Nouwen 2003). However, we still need to explain participants’ intuitions regarding the appropriate referent for they, even if we go along with that rejection.
Some empirical data cast doubt on the Maxset thesis (Moxey and Sanford 2000). For example, in the extensive data of Sanford et al. (1996), comprising hundreds of continuations, there are virtually no cases of explicit mention of generalization, as in (18):
(18) They mostly/generally/typically/on the whole preferred geometry.
Furthermore, although some examples, like (16), seem to us to be essentially undecidable, the argument against compset seems to us unsustainable in many instances. For instance, complement pattern continuations occur when the complement set is in the minority.
(19) Not quite all of the students were at the orienting meeting.
(20) They couldn’t get there because of the bus strike.
This example cannot be given a collective interpretation (Geurts 1997), and so there is no way that they could refer to the Max Set in this instance, since there is no way the students in general could be unable to be at the meeting. (See Moxey and Sanford 2000 for more detailed arguments.) We argue on this basis that although there may be some cases where the reference is undecidable between Maxset and Compset, there are clear cases where there does appear to be Compset reference.
Perhaps the strongest evidence for the possibility of complement set focus comes from a study by Sanford, Williams, and Fay (2001). This line of argument uses the includes(x, y) relation. If x is an individual, and y is a set, then there is an unambiguous mapping of an individual into a set. In (21), there is a clear mapping of John into the set of individuals who went mountain climbing:
(21) A few people went mountain climbing, including John.
In contrast, most people would agree that in (22) John did not go mountain climbing:
(22) Not many people went mountain climbing, including John.
In fact, in a judgment experiment (Sanford et al. 2001, experiment 1), this was precisely what was found for not many(x). Although these studies do not force participants to resolve a plural pronoun to some salient subset, they do force participants to include John in the most salient subset. Participants reveal which subset John belongs to, and hence which subset is in focus, by their answer to the question. Sanford et al. (2001, experiment 2) carried out a self-paced reading task using materials like (23).
(23) Not many of the MPs attended the meeting, and that included John. His presence/absence helped the meeting to run smoothly.
Although no overt judgment was called for in this task, set-mapping preferences were revealed through the relative reading times for the two versions of the second sentence. For the version with his presence (focusing on the reference set), reading times were longer than reading times for the version with his absence (which focuses on the complement set). The opposite pattern was found if a few was used instead of not many. So, even when no explicit judgment is called for, reading-time data support our claim that attachment patterns for a few and not many are different.
With the set-inclusion judgments illustrated by (21) and (22), there tends to be a little individual variation, especially in the judgments for negative
expressions, such as (22). However, measures of the confidence with which such judgments were made by naive participants showed no variation between positive and negative quantifiers, or among quantifiers in general. Variability in itself is not problematic, and we trade on it in the development of the Supposition-Denial Theory outlined later in the paper.
Despite the evidence described above, some still question the reality of compset reference. It is, as we shall see, difficult to explain in terms of existing formal semantic accounts, and this theory-driven concern is no doubt one of the reasons for continued questioning. One point to note is that the processes that lead to compset reference may be quite different from those that lead to refset reference, and we have no argument with this view. Indeed our interest is in finding what processes do lead to successful reference and successful understanding. The data as they stand are indisputable, and regardless of whether one believes that apparent reference to the compset is real reference to the compset, it is clear that compset reference (as a phenomenon) is different from refset reference, and that quantifiers vary systematically with respect to these reference types. The most effective account of the data, without theoretical prejudice, is one that explains both compset and refset reference.
Attempts to Treat the Problem Semantically
It is obviously desirable to develop a semantic treatment of complement anaphora and the related phenomena, but, while clarifying some theoretical issues, the attempts up to now have generally failed to engage the data adequately. We outline two approaches, and discuss the more obvious problems that they encounter.
In Dynamic Semantics
There have been two main attempts to treat complement reference licensing within semantic frameworks. The earliest, due to Kibble (1997a,b), assumes, as we did at the time, that complement set reference is licensed by a semantic feature, downward monotonicity.1 Kibble’s aim is to modify van den Berg’s (1996) Generalized Dynamic Quantifier Logic (GDQL), which itself is an attempt to incorporate Generalized Quantifier Theory into dynamic plural logic. Other treatments of quantifiers focus on the relationships expressed within the quantified sentence, and hence do not account for inter-sentential relationships and anaphoric reference. Hence, Kibble’s aim is to explain complement set reference by slightly modifying current formal theories of the semantics of quantified text. According to GDQL, monotone decreasing quantifiers must be modeled as negations of monotone increasing counterparts. Kibble (1997a) argues that a
monotone increasing quantifier can be negated in two ways that have different dynamic effects, although they are equivalent in their truth conditions. The two kinds of negation proposed by Kibble are related to forms of negation identified by Zwarts (1996). Informally, the complement of a quantifier Q (−Q) looks like the opposite of the quantity itself, for instance less than 20 percent of the As → 20 percent or more of the As; the contradual of a quantifier Q (Q−) is the opposite of the subset identified by the quantifier, for instance less than 20 percent of the As → 80 percent or more of the As. Kibble’s (1997a) external negation is based on the complement (−Q), which he calls Qd, and which effectively negates the quantity, so that few MPs went to the meeting → the set of MPs who came to the meeting was not a set of many of the MPs; internal negation (Kibble 1997a), on the other hand, is based on the contradual of Q (Q−), is called Q′, and effectively attaches the negation of Q to the complement of the subset identified in the quantified noun phrase, so that few MPs went to the meeting → the set of MPs who did not come to the meeting was a set of many of the MPs.
Kibble does not specify the circumstances under which a monotone decreasing quantifier will lead to either external or internal negation; rather, he focuses on what must be represented in a formal model of these forms, and on how these lead to the natural availability of both refset and compset for subsequent reference. The definitions of external and internal negation as given by Kibble (1997a) are such that external negation leads naturally to representation of the refset and to the overall set (the MPs who went to the meeting and all of the MPs in the above example), whereas internal negation leads naturally to the representation of the compset and to the overall set (the MPs who did not go to the meeting and all of the MPs).
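The claim that the two negations agree in truth conditions can be checked on a small domain. The sketch below is an illustration only: it treats quantifiers as relations between a set A and a set B in the usual generalized-quantifier style, it is not Kibble's GDQL formalization, and every function name in it is hypothetical. It takes "less than 20 percent" as the stand-in for a monotone decreasing quantifier and compares it with an externally and an internally negated increasing counterpart over every subset of a ten-member domain.

```python
from itertools import combinations

def less_than(pct):
    """'Less than pct percent of the As are Bs' (monotone decreasing)."""
    return lambda A, B: 100 * len(A & B) < pct * len(A)

def at_least(pct):
    """'pct percent or more of the As are Bs' (monotone increasing)."""
    return lambda A, B: 100 * len(A & B) >= pct * len(A)

def more_than(pct):
    """'More than pct percent of the As are Bs' (monotone increasing)."""
    return lambda A, B: 100 * len(A & B) > pct * len(A)

def complement(Q):
    """-Q: negate the quantity."""
    return lambda A, B: not Q(A, B)

def contradual(Q):
    """Q-: apply Q to the As that are not Bs."""
    return lambda A, B: Q(A, A - B)

def same_truth_conditions(Q1, Q2, domain):
    """Compare two quantifiers on every subset B of a small domain A."""
    A = set(domain)
    subsets = (set(c) for r in range(len(A) + 1) for c in combinations(A, r))
    return all(Q1(A, B) == Q2(A, B) for B in subsets)

mps = range(10)  # hypothetical domain of ten MPs
few = less_than(20)

# External negation: it is not the case that 20 percent or more attended.
external = complement(at_least(20))
# Internal negation: more than 80 percent did not attend.
internal = contradual(more_than(80))

print(same_truth_conditions(few, external, mps))  # True
print(same_truth_conditions(few, internal, mps))  # True
```

Integer arithmetic is used to avoid rounding problems at the boundary. Under this encoding the internally negated form needs the strict bound "more than 80 percent" to match exactly, whereas the informal gloss above speaks of "80 percent or more". The sketch says nothing about the dynamic differences between the two forms, which is where Kibble locates the availability of the refset or the compset.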
However, crucially from our point of view, the definition for internal negation is also such that the availability of the compset for reference depends on the cardinality of the quantifier. Specifically, the quantifier must be proportional because the compset is derived from the subtraction of the refset from the whole set. (See Kibble 1997a for formal definitions.)
In Optimality Theoretic Semantics
More recently, Nouwen (2003) attempted a treatment of Complement reference as an extension of work on quantification in Optimality-Theoretic Semantics (Hendriks and de Hoop 2001). According to Optimality Theory (Prince and Smolensky 1997), a syntactic string leads to several possible interpretations, the set of which is called the candidate set. Within the system there is a hierarchy of constraints that allow each possible interpretation to be more or less favored. Any particular interpretation is optimal if it violates fewer constraints than all the other interpretations available in the candidate set.
Nouwen (2003) describes three constraints that influence our choice of referent for a plural pronoun following a quantified statement:
1. Emptiness: As the antecedent of an anaphoric expression, do not choose a set which is or may be empty.
2. Avoid Contradiction: Do not choose a set that will lead to a contradiction.
3. Forward Directionality: The topic range induced by the domain of quantification of a determiner is reduced to the topic range induced by the intersection of the two argument sets of this determiner.
These three constraints do not have equal strength, but are ordered from stronger (1) to weaker (3). Thus if there are two interpretations, one of which violates constraint 1 while the other violates constraint 3, the interpretation that violates constraint 3 will be chosen, as this is the optimal interpretation (creating less of a violation than the other interpretation, which violates a stronger constraint). Forward directionality sets the default interpretation of a plural pronoun at the refset. Avoid Contradiction means “If this interpretation leads to a contradiction, as in (24) and (25), then go for the compset.”
(24) Not many of the boys went to the party. They went to the beach instead.
(25) Many of the boys went to the party. They went to the beach instead.
Emptiness, however, is the strongest constraint, and since in (25) many does not exclude the possibility of an empty compset, we end up with a plural pronoun referring to a set that leads to a contradiction. In fact, it can be argued that not many allows the possibility of an empty refset, so that (26) sounds odd.
(26) Not many of the boys went to the party. They enjoyed it very much.
Example (26) is consistent with Forward Directionality, and this interpretation does not lead to a contradiction. However, the potential emptiness of the refset is a violation of the Emptiness constraint.
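The ranking logic just described can be made concrete with a small sketch. It assumes strict ranking in the usual Optimality-Theoretic sense, so that candidates are compared constraint by constraint from strongest to weakest; the constraint labels, function names, and violation sets are hypothetical encodings chosen for illustration and are not taken from Nouwen's formal system. The first call encodes the two-interpretation case above, and the second encodes the violations that the discussion of (26) attributes to each reading under the unmodified constraints.

```python
# Constraint names in ranking order, strongest first (hypothetical labels).
RANKING = ("Emptiness", "AvoidContradiction", "ForwardDirectionality")

def profile(violated):
    """Map a set of violated constraints to a 0/1 tuple in ranking order."""
    return tuple(int(name in violated) for name in RANKING)

def optimal(candidates):
    """Pick the candidate with the lexicographically smallest profile."""
    return min(candidates, key=lambda c: profile(candidates[c]))

# Two interpretations: one violates constraint 1, the other constraint 3.
abstract_case = {"X": {"Emptiness"}, "Y": {"ForwardDirectionality"}}
print(optimal(abstract_case))  # Y

# Example (26) under the unmodified constraints.
example_26 = {
    "refset": {"Emptiness"},
    "compset": {"AvoidContradiction", "ForwardDirectionality"},
}
print(optimal(example_26))  # compset
```

Because the comparison is lexicographic, a single violation of Emptiness outweighs two violations lower in the ranking, which is why the compset reading comes out as optimal for (26) in this sketch.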
If they in (26) refers to the compset, then Forward Directionality and Avoid Contradiction are violated, but not Emptiness. Hence the system should choose a compset reference. Yet Nouwen implies that they is interpreted as a reference to the refset, because he modifies the emptiness constraint as follows (seemingly to preserve the a priori assumption that the refset must be the default): 1′ Modified Emptiness: As the antecedent of an expression do not choose a set which is potentially empty, except when this set is the reference set of a sentence. In fact, data on reading time (Sanford et al. 1996) and eye tracking (Paterson, Sanford, Moxey, and Dawydiak 1998) showed that materials such as (26) are
very difficult for participants to process, so perhaps Nouwen could argue that the constraints are weighted such that in some instances there is no outright winner. According to Nouwen’s analysis, it seems that compset focus is possible only if a reference to the refset would lead to a contradiction and if the compset necessarily exists (i.e., the quantifier cannot be interpreted as including the whole set in its denotation).
Nouwen argues that whereas refset pronominal anaphora (in common with other types of anaphora) select salient referents, compset pronominal anaphora are extraordinary. He argues that their antecedents are not salient and that hence acceptability of the anaphoric link is dependent on semantic and pragmatic constraints. These constraints are as follows:
(A) The compset must be inferable.
(B) The compset must be a uniquely inferable set, not sharing all of its semantic features with other possible referents.
(C) Only inferences using semantic information are allowed. That is, the effort involved in making the anaphoric link cannot be too great.
(D) The anaphoric link must support discourse coherence.
Nouwen argues that the inferability of compsets (constraint A above) is dependent on their status with respect to the emptiness constraint. That is, for a compset to be inferable, it must exist. In order to check for this, Nouwen provides the formula D(A)(B) → ∃x (A(x) ∧ ¬B(x)). In other words, there must be some set which is a subset of the noun-phrase set but not the verb-phrase set. Again, the compset is defined by the difference between the whole set and the refset, entailing that compset reference is not possible with cardinal quantifiers. In (27), anaphoric reference to the compset is to all of those MPs who do not attend the meeting.
(27) Few of the MPs attended the meeting. They were busy elsewhere.
Thus, if it turns out that some of the MPs are neither at the meeting nor busy elsewhere, the second sentence of (27) would be false.
Though we agree that the referent of a plural pronoun must be highly salient, if not unique, it is simply not the case that it must correspond to the whole set minus the refset. Consider (28).
(28) Few single men expect to father children. They want to date many women, and children would tie them down. Others are gay and know it simply won’t be possible for them.
Here They does not refer to all of those single men who do not expect to father children, yet the referent of They is clear and does not seem awkward.
Although Nouwen argues that only semantic information can lead to inferences that support compset reference, it seems to us that it has to do with the amount of effort. It is widely known in psycholinguistics that language understanders limit their inferential activity in various ways and to various extents in processing text. Inferences supporting compset reference are no exception.
Finally, if compset reference is one type of non-salient reference, then it must support discourse coherence. Nouwen argues that the complement set is used as a resolution for pronominal reference only when there is a semantic reason to do so, that is, when it is necessary to avoid contradiction. In other words, in (27) above resolving the plural pronoun as referring to the refset would lead to a contradiction (the MPs cannot both be at the meeting and elsewhere), and so it is resolved to the compset. It follows that when we first encounter they in reading (27) we either delay the assignment of a referent or resolve it inappropriately and have to backtrack after the second VP is processed (having worked out that it is a contradiction). Both our theory and our data support the claim that compset reference occurs later in processing, after the VP of the sentence containing the pronoun.
Problems with Extant Semantic Treatments
For Kibble, a more substantial problem is that some monotone decreasing quantifiers do not seem to produce complement set references. Thus, it has been suggested, on the basis of intuition, that At most N of the (x) would not produce compset references (remarks by Krifka cited in Kibble 1997a; also see Devlin 1997). If this is true, then either Kibble’s semantic theory as it stands does not properly account for when complement set focus will and will not occur, or it is wrong. Moxey, Sanford, and Dawydiak (2001) tested these conjectures using a continuation task and found that continuations to sentences containing At most N of the (x) led to a much lower incidence of complement set references than its close relative No more than N of (x), although there were some complement pattern continuations. What is important is not so much that this monotone decreasing quantifier does not produce complement set references at all as that different quantifiers seem to lead to different proportions of complement set references. In the study by Sanford et al. (2001), the value for at most N of the (x) was 19 percent, as assessed by the includes(x, y) method described above. In the study of Sanford et al. (1996), which used pronominal continuations as the dependent variable, values ranged from around 30 percent for expressions like less than X percent of the (x) to more than 95 percent for not many(x), with few (x) taking an intermediate position. These values are typical and reasonably stable. The purely semantic approaches have difficulty explaining the variation in complement set reference pattern with respect to
individual quantifiers. Explaining this variation was one of the principal objectives behind the development of the theory we describe below.
There are problems common to both of these semantic accounts. The empirical difficulties spring from a simple failure to explain the facts. First, according to both accounts, complement set focus is possible for proportional quantifiers, but not for cardinal quantifiers, such as the one in (29).
(29) Less than 10 fans went to the big football match.
This is because technically there is no way to define complement set where there is no superset. For both semantic accounts, compset = superset − refset. However, there is ample evidence that complement set references do occur with negative cardinal quantifiers (Moxey and Sanford 1998; Sanford, Dawydiak, and Moxey 2007). Both Nouwen (2003) and Kibble (1997a) assumed that complement set references would not occur with cardinal quantifiers.
Perhaps it can be argued that Nouwen (2003) can account for some variability in the incidence of compset after negative quantifiers such as At most N of the (x). According to his account, the default resolution of plural pronouns after quantified statements is to the refset. Only if such a resolution leads to a contradiction can resolution settle in favor of the compset. The problem with this account is that if we assume that a language-production system operates under similar constraints as the comprehension system, or even that it operates in a way that is compatible with the comprehension system, then referring to the compset by pronouns in a production task would never occur, although production has been the main method used to demonstrate compset focus effects. According to Nouwen’s account, the system would first have to produce a VP that is inconsistent with the reference.
If we suppose that a comprehension system has these defaults, then we still have to account for a different set of constraints for the production system, and in addition we have to assume that the comprehension system is only ever successful after some backtracking. Even if Nouwen can explain compset reference as dependent on context, and hence not all-or-none, he can only (in principle) explain why values for a particular quantifier vary from context to context. He still has no way to explain why not many(x), few(x), less than half of the (x), and at most N of the (x) chronically produce different incidences of compset reference.

A further problem for Nouwen’s account is that it is not clear how or why the compset becomes one of the sets the system tries to prioritize. That is, while the constraints act as filters on possible sets for reference, leading to a definite choice of referent, the processor functions on a set of sets. It is not clear what constrains the production of possible sets for this set of sets. Clearly it is not possible to
generate all possible sets, so how does the processor decide which sets to apply the constraints to? And does this mechanism always generate the compset?

The Supposition-Denial Theory of Complement Set Focus
More generally, we believe that part of the problem with trying to explain complement set focus in the ways discussed above is that the explanations assume that the sets available for reference are drawn from an over-restricted range of possibilities. As we have seen, this is partly because of the sets assumed to be made available within the standard semantic treatments. Our approach derives from cognitive psychology and psycholinguistics, which focus more on human understanding than on some abstract notion of linguistic meaning per se. That is, in cognitive psychology, existing knowledge, goals, and intentions are seen to interact with linguistic information to produce understanding of utterances, so the role of pragmatic information must be central and irrepressible if we are to have a full account of meaning.

For this reason, the approach to complement set focus developed by the Glasgow group (Sanford et al. 1996; Moxey et al. 2001) suggests a very different way of making a relevant set available. In essence, rather than thinking of a complement set as set 3 in our opening example, we suggest how a set closely related to this set is brought to mind. Our approach is based on the idea that negations tend to be associated with denials of suppositions, and on the idea that, through denial, a set similar to but not the same as the complement sets discussed up to now comes into prominence. The most developed and tested manifestation of the Supposition-Denial Theory is presented in Sanford, Dawydiak, and Moxey 2007 (see also Moxey 2006). It is outlined here.

Negation and Denial
The starting point of this account is the link between negation and denial of suppositions. It has been widely noted that negation is associated with the denial of some sort of expectation (Clark 1976; Horn 1993; Wason 1965). Consider (30).

(30) I didn’t have a sandwich for lunch.

According to the denial thesis, this sentence is understood as implicating that someone relevant to the discourse arena might suppose that I would have a sandwich for lunch, and this supposition (to use Clark’s (1976) handy phrase) is then denied by the content of the utterance. By the same token, (31), when uttered, implicates that more students were expected by someone.
(31) Not many of the students came to the linear algebra lecture.

In a direct test, Moxey and Sanford (1993) presented participants with sentences like (31). One group was asked “What proportion of the students did the writer think the reader might have expected to be the case before the reader encountered the statement?” It turned out that this proportion was given as higher for the negative determiners few, very few, and not many than for the positive a few. But the proportions denoted by these expressions were virtually indistinguishable.

More indirect tests of denial are also possible. The diagnostics of sentential negation (S-negation) suggested by Klima (1964) have been construed as diagnostic of denial (Clark 1976; Horn 1989, chapter 3). S-negation has the form of a denial: for instance, John didn’t have a sandwich for lunch → Not (John had a sandwich for lunch). Klima (1964) suggested several diagnostics, including the use of tags, as in (32) and (33). In (32) the tag don’t they? is diagnostic of an affirmation, while in (33) the tag do they? is diagnostic of a denial.

(32) A few people believe in Santa Claus, don’t they / *do they?
(33) Not many people believe in Santa Claus, do they / *don’t they?

Other diagnostics include the choice of either/too in constructions where either two affirmations or two denials are combined:

(34) A few people believe in Santa Claus and a few people believe in the Tooth Fairy too / *either.
(35) Not many people believe in Santa Claus and not many people believe in the Tooth Fairy either / *too.

A similar contrast may be made with neither/so, the third diagnostic considered in our own research. Sanford, Dawydiak, and Moxey (2007) used people’s judgments of these diagnostics to determine whether particular quantifiers led to denials (S-negations) or to affirmations. To get a sense of this, the reader is invited to apply the tests. For instance, (36) and (37) seem intuitively acceptable.
(36) At most 10 people went to the meeting, didn’t they?
(37) No more than 10 people went to the meeting, did they?

Thus, At most 10 seems to lead to an affirmation, while No more than 10 leads to a denial. Moxey et al. (2001) observed that this corresponds closely to what is observed about compset reference, and conjectured that affirmations lead to refset reference whereas denials lead to compset reference. The next question is whether those quantifiers that produce lower rates of complement set
references are judged as indicating a denial less frequently than those that produce high rates of complement reference.

Mr. In-Between
To paraphrase a classic song, Mr. In-Between is judged as not something to be messed with. One of the complex aspects of complement set data is that some negative quantifiers seemed to produce quite low proportions of complement set references in continuation tasks. Sanford et al. (1996) examined many quantifiers in the same experiment and obtained the proportions of compset responses in a continuation task (table 15.1). It is striking that while all of the negative expressions yielded some complement reference, this could be as low as 30 percent. Data like that look decidedly messy, and require explanation. That explanation is at the core of the Supposition-Denial Theory.

Sanford, Dawydiak, and Moxey (2007) examined the proposition that the probability of a quantifier yielding complement set reference depends upon its generating representations in sentences that are construed as denials (S-negations). Sanford et al. (2004) examined a wide range of quantifiers using the three denial diagnostics, asking people to decide which tags (affirmation-signaling or denial-signaling) fitted best, as discussed above. By taking the number of denial-signaling endings checked as a proportion of the total number of endings checked, they obtained a Denial Index for each quantified sentence. The first observation that they made was that, while some quantifiers led to very high values on the denial index (e.g., not many), others were checked just about equally for denial and affirmation (e.g., less than N of the (x)), and others were checked more clearly as affirmations (e.g., At most N of the (x)). Participants were also asked to choose between complement and standard reference set attachment, as in (38).

(38) At most 10 of the fans went to the match, including John. Did John go to the match? (yes/no)

By having a large number of people make judgments diagnostic of attachment and of denial, it was possible to determine whether the extent of complement pattern focus correlated with the extent of denial.
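The Denial Index computation described above can be illustrated with a minimal sketch. The judgment data, the diagnostic labels, and the function name `denial_index` are hypothetical, introduced here only for illustration; they are not the materials or the scoring procedure of the original experiments.

```python
# Hypothetical illustration: each judgment records which ending a participant
# checked for one diagnostic frame (tag, either/too, neither/so).
DENIAL = "denial"
AFFIRMATION = "affirmation"


def denial_index(judgments):
    """Return the proportion of checked endings that signal a denial."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    denial_count = sum(1 for _, choice in judgments if choice == DENIAL)
    return denial_count / len(judgments)


# Invented judgments for one quantified sentence (not data from the study).
example_judgments = [
    ("tag", DENIAL),
    ("tag", DENIAL),
    ("either_too", DENIAL),
    ("either_too", AFFIRMATION),
    ("neither_so", DENIAL),
    ("neither_so", AFFIRMATION),
]

print(denial_index(example_judgments))
```

On this reading, an index near 1 means the sentence was almost always treated as a denial, an index near 0 means it was almost always treated as an affirmation, and intermediate values reflect mixed judgments.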
Some quantifiers gave very high values on the denial index and also gave very high proportions of complement set attachments. Others gave intermediate values on the denial index, indicating that sometimes they were interpreted as denials and other times not; these gave intermediate proportions of complement set responses. Others produced virtually no complement set attachments, and also were almost always interpreted as affirmations rather than denials. In their experiments, Sanford et al. (2007) used a wide range of quantifiers under a range of conditions, but
for illustrative purposes four of the quantifiers, with their compset attachment values and denial values, were the following:

Quantifier                         Compset   Denial
Hardly any of the (x)              .92       .92
Not many of the (x)                .89       1.0
Less than 50 percent of the (x)    .48       .373
At most 90 percent of the (x)      .08       .067
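As a rough illustration of how closely compset attachment tracks the denial index, the four values listed above can be fitted with an ordinary least-squares line. This is only a sketch over these four illustrative data points, not a reproduction of the analysis in the original experiments; the function name `fit_line` and the dictionary layout are assumptions made here.

```python
from statistics import mean

# (denial index, compset attachment proportion) for the four quantifiers
# listed in the text; the data structure itself is an assumption.
ILLUSTRATIVE_VALUES = {
    "Hardly any of the (x)": (0.92, 0.92),
    "Not many of the (x)": (1.0, 0.89),
    "Less than 50 percent of the (x)": (0.373, 0.48),
    "At most 90 percent of the (x)": (0.067, 0.08),
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys on xs: slope, intercept, r squared."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


denial = [d for d, _ in ILLUSTRATIVE_VALUES.values()]
compset = [c for _, c in ILLUSTRATIVE_VALUES.values()]
slope, intercept, r_squared = fit_line(denial, compset)
print(f"slope={slope:.2f} intercept={intercept:.2f} r^2={r_squared:.2f}")
```

With only four points the fit is indicative at best; any conclusion about the relationship should rest on the full set of quantifiers and conditions tested, not on this subset.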
Overall, the relationship between denial values for various quantifiers and the proportion of people choosing complement set attachment was linear with a slope of unity, accounting for more than 90 percent of the variance. Furthermore, this held whether the quantifiers were cardinal quantifiers or proportional quantifiers, and whether they were in partitive or nonpartitive constructions. In this way, rather than being simply messy, the in-between values gave some of the best data supporting the idea that complement reference depends on denial (see note 2).

Processing and the Supposition-Denial Theory
According to the Supposition-Denial Theory (Sanford, Dawydiak, and Moxey 2007), complement references become possible when a quantified sentence is interpreted as a denial (S-negation). For monotone decreasing quantifiers, a denial-inducing quantifier will lead to a state of affairs in which the amount asserted is less than the amount someone is supposing to be the case: for instance, if someone asserts that not many people went to Fred’s party, the interpretation will be that fewer people went than was supposed by someone (possibly the speaker). The difference between supposition and assertion is called a shortfall.

The next part of the theory is that a shortfall, being unexpected, draws the attention of the listener. There is evidence for this in that a frequent form of continuation that contains a complement set reference is what we have termed a “reason-why-not,” that is, a reason why the predicate of the quantified sentence should apply to less of the set being quantified over than is expected. For instance:

(39) Few people went to the meeting. They thought it would be too boring.

It is commonplace for reasons to be given for things that violate expectation, and we view the prevalence of reason-why-not responses as good evidence for the denial argument (Turnbull 1986). The comprehender’s inferential activity is thus centered on the shortfall, and these inferences concern a set of people that corresponds to what we have called the compset. That is, for example, the understander wants to explain why those who didn’t go to the meeting didn’t go. Hence reference to the compset is in fact reference to the shortfall. The significance of this shortfall is not in the size of the difference between these
amounts; it is simply in the fact that a supposition has been triggered and then denied and a shortfall has been introduced. Thus, the main influence on whether a shortfall set is generated is the quantifier.

Note that some positive quantifiers may also be associated with denial, but for these the amount depicted is taken to be more than the supposed amount, and the result is a focus on the surplus, not the shortfall. No less than N of the (x) is such a quantifier. If I say “No less than 15 of the students got a distinction in creative writing,” I am introducing the possibility (supposition) that fewer than 15 may have got a distinction, and then denying this. In such cases, the focus is on the surplus set, which is of course simply an extension of the refset.

In summary, the Supposition-Denial Theory is based on the idea that when a negatively quantified statement leads to a denial of an expectation, there is a shortfall between what was expected and the amount being asserted. Because there is something special about a denial in the mind of the listener, the shortfall is what becomes the focus of attention. The shortfall is equivalent to the compset: because it is focused, it predominates in pronominal reference, and in attachment when using the including(x, y) relation.

Conclusion
Rather than construe complement set reference as a reference to the Max Set, or even to the whole set minus the refset, we suggest that complement reference phenomena are based on reference to the set of individuals for which the predicate could have been true but isn’t, that is, the shortfall. In some cases this may be the whole set minus the refset, but it need not be, as example (28) shows. That is, in (28) we do not expect that all single men expect to father children (we probably don’t desire it either), and so the they refers to those we expected to expect to father children, but who it turns out don’t expect to father children. The key to the generation of a compset is focus on a shortfall, a difference between the value denoted by a quantifier and some other supposed or expected value.

The purpose of semantic theories like DRT is to try to capture, in a formalism, the essentials of which sets are available for anaphoric reference for a wide range of cases. We believe that such approaches are importantly limited if they cannot adequately capture the occurrence of complement-type reference. We are unconvinced that attempts to minimize this shortcoming have succeeded. At one level, we would argue that semantic treatments of complement reference have failed because they simply cannot give an adequate account of the facts. Kibble’s account gives a detailed description of what semantic feature is associated with compset reference and of how successful compset reference is
achieved. However, this feature is not as good a predictor as denial. Furthermore, denial is not an all-or-none property, so that some other variable must be included to account for the data. We suggest that, although this is possible, the feature required is not simply a semantic feature associated with the quantifier, but a feature that functions alongside pragmatic information. Nouwen accounts for preference in the comprehension of pronouns following quantified statements through a series of filters, but it is not clear how the set of possible referents is constructed, and the filters create a hierarchy that not only deviates from our data but is also inconsistent with a sympathetic language-production system.

The semantic accounts assume that the sets available are linguistically given, that is, are part of the meaning of the quantifiers. Other work has shown that non-linguistic antecedents may take precedence over linguistic ones (Oakhill, Garnham, Gernsbacher, and Cain 1992):

(40) I need a plate, where do you keep *it / them?
(41) I need an iron, where do you keep it / *them?

The Supposition-Denial Theory is similarly based on the generation of sets that are not exclusively linguistic. It is possible to think of compset reference as a form of mental deixis and to sidestep its inclusion in any formal theory of reference, claiming that it is not anaphora. However, given the general importance of quantifier-induced perspective effects, and the close relation of these to compset reference, an alternative strategy might be to include denial through negation in formal theories of reference.

Notes

1. Downwards entailment, or the property of being monotone decreasing, is a characteristic of quantifiers that are intuitively negative. The property is described in some detail by Barwise and Cooper (1981; see also Keenan and Stavi 1986).
Essentially, a quantifier (a determiner plus a noun) is monotone decreasing if what is true of a superset is also true of a subset. For example, it is true that if less than 10 people went to the meeting, then less than 10 people went to the meeting early. So, less than 10 people is a monotone decreasing quantifier. In the case of some quantifiers, this test can yield positive results analytically (e.g., less than 10(x)). In other cases (e.g., few, not many), truth is based on intuition and depends on the assumption that the quantifier includes the null set in its semantics. Downward monotonicity occurs when two of the De Morgan conditions that specify full negation are true, so monotone decreasing quantifiers can be thought of as “weakly negative” (Zwarts 1991; 1998).

Some positive quantifiers are monotone increasing. These do not fit the test shown above, but rather fit a test for whether what is true of a subset holds for a superset. For example, it is true that if many people went to the party early, then many people went to
the party, so many(x) is monotone increasing. Some positive quantifiers fit neither frame, and are non-monotonic (e.g., exactly 7).

2. These findings lead to the question of what it is that creates in-between values or extreme values for the denial index. In fact, a number of linguists have also noted that tests of S-negation do not yield clean results in all cases, and that one of the strongest cues (in English) is the presence of a negative particle, such as not (see Ross 1973; Horn 2001, p. 185).

References

Barton, S. B., and Sanford, A. J. 1990. The control of attributional patterns by the focusing properties of quantifying expressions. Journal of Semantics 7, 81–92.
Barwise, J., and Cooper, R. 1981. Generalized quantifiers and natural language. Linguistics and Philosophy 4, 159–219.
Cheng, P. W., and Novick, L. R. 1990. A probabilistic model of causal induction. Journal of Personality and Social Psychology 58, 545–567.
Clark, H. H. 1976. Semantics and comprehension. Mouton.
Corblin, F. 1997. Quantification et anaphore discursive: la reference aux complementaires. Languages 123, 51–74.
Dawydiak, E. J., Sanford, A. J., and Moxey, L. M. 2004. A cognitive theory of quantifier perspective effects. Unpublished manuscript.
Devlin, N. 1997. Pronominal Anaphora Resolution and the Quantified Phrase. MA thesis, University of California, Santa Cruz.
Guerts, B. 1997. Review of L. M. Moxey and A. J. Sanford (1993), Communicating quantities. Journal of Semantics 18, 87–94.
Hendriks, P., and de Hoop, H. 2001. Optimality theoretic semantics. Linguistics and Philosophy 13, 273–324.
Hilton, D. J., and Slugoski, B. R. 1986. Knowledge-based causal attribution: The abnormal conditions focus model. Psychological Review 93, 75–88.
Horn, L. R. 1989. A Natural History of Negation. University of Chicago Press.
Kamp, H., and Reyle, U. 1993. From Discourse to Logic: Introduction to Model-Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic.
Keenan, E. L., and Stavi, J. 1986. A semantic characterization of natural language determiners. Linguistics and Philosophy 9, 253–326.
Kelley, H. H. 1967. Attribution theory in social psychology. Nebraska Symposium on Motivation 15, 192–238.
Kibble, R. 1997a. Complement anaphora and monotonicity. In G. J. M. Kruiff, G. V. Morrill, and R. T. Oehrle (eds.), Formal Grammar. CSLI.
Kibble, R. 1997b. Complement anaphora and dynamic binding. In A. Lawson (ed.), Proceedings of SALT VII. Cornell University Press.
Klima, E. S. 1964. Negation in English. In J. A. Fodor and J. J. Katz (eds.), The Structure of Language. Prentice-Hall.
Majid, A., Sanford, A. J., and Pickering, M. 2006. Covariation and quantifier polarity: What determines causal attribution in vignettes? Cognition 99 (1), 35–51.
McArthur, L. A. 1972. The how and what of why: Some determinants and consequences of causal attributions. Journal of Personality and Social Psychology 22, 171–193.
Moxey, L. M. 1986. Ph.D. thesis, University of Glasgow.
Moxey, L. M. 2006. Effects of what is expected on the focussing properties of quantifiers: A test of the presupposition-denial account. Journal of Memory and Language 55, 422–439.
Moxey, L. M., and Sanford, A. J. 1987. Quantifiers and focus. Journal of Semantics 5, 189–206.
Moxey, L. M., and Sanford, A. J. 1993. Prior expectation and the interpretation of natural language quantifiers. European Journal of Cognitive Psychology 5, 73–91.
Moxey, L. M., and Sanford, A. J. 1997. Choosing the right quantifier: Usage in the context of communication. In T. Givón (ed.), Conversation. John Benjamins.
Moxey, L. M., and Sanford, A. J. 1998. Complement set reference and quantifiers. In M. A. Gernsbacher and S. J. Derry (eds.), Proceedings of the Twentieth Annual Conference of the Cognitive Science Society.
Moxey, L. M., and Sanford, A. J. 2000. Focus effects with negative quantifiers. In M. Crocker, M. Pickering, and C. Clifton (eds.), Architectures and Mechanisms of Language Processing. Cambridge University Press.
Moxey, L. M., Sanford, A. J., and Barton, S. B. 1990. The control of attentional focus by quantifiers. In K. J. Gilhooly, M. Keane, R. H. Logie, and G. Erdos (eds.), Lines of Thinking, volume 1. Wiley.
Moxey, L. M., Sanford, A. J., and Dawydiak, E. J. 2001. The role of denial in negative quantifier focus. Journal of Memory and Language 44, 427–442.
Nelken, R., and Francez, N. 1997. The analogy between nominal and temporal anaphora revisited. Journal of Semantics 14, 369–416.
Nouwen, R. 2003. Complement anaphora and interpretation. Journal of Semantics 20, 73–113.
Oakhill, J., Garnham, A., Gernsbacher, M. A., and Cain, K. 1992. How natural are conceptual anaphors? Language and Cognitive Processes 7, 257–280.
Paterson, K. B., Sanford, A. J., Moxey, L. M., and Dawydiak, E. 1998. Quantifier polarity and referential focus during reading. Journal of Memory and Language 39, 290–306.
Percus, O., Gibson, T., and Tunstall, S. 1997. Antecedenthood and the evaluation of quantifiers. Poster presented at Tenth CUNY conference.
Prince, A., and Smolensky, P. 1997. Optimality: From neural networks to universal grammar. Science 275, 1604–1610.
Ross, J. R. 1973. Slifting. In M. Gross, M. Halle, and M.-P. Schutzenberger (eds.), The Formal Analysis of Natural Languages. Mouton.
Sanford, A. J., Dawydiak, E., and Moxey, L. M. 2007. A unified account of quantifier perspective effects in discourse. Discourse Processes 44, 1–32.
Sanford, A. J., and Moxey, L. M. 2000. Risk portrayal and risk appreciation as a problem in language use. In L. Lindquist and R. J. Jarvella (eds.), Language, Text, and Knowledge. Walter de Gruyter.
Sanford, A. J., Fay, N., Stewart, A. J., and Moxey, L. M. 2002. Perspective in statements of quantity, with implications for consumer psychology. Psychological Science 13, 130–134.
Sanford, A. J., Moxey, L. M., and Paterson, K. B. 1996. Attentional focusing with quantifiers in production and comprehension. Memory and Cognition 24, 144–155.
Sanford, A. J., Williams, C., and Fay, N. 2001. When being included is being excluded: A note on complement set focus and the inclusion relation. Memory and Cognition 29, 1096–1101.
Teigen, K. H., and Brun, W. 1999. The directionality of verbal probability expressions: Effects on decisions, predictions, and probabilistic reasoning. Organizational Behavior and Human Decision Processes 80, 155–190.
Turnbull, W. 1986. Everyday explanation: The pragmatics of puzzle resolution. Journal for the Theory of Social Behavior 4, 7–11.
Van den Berg, M. 1996. Dynamic generalized quantifiers. In J. van der Does and J. Van Eijck (eds.), Quantifiers, Logic and Language. CSLI.
Wason, P. C. 1965. The contexts of plausible denial. Journal of Verbal Learning and Verbal Behavior 4, 7–11.
Zwarts, F. 1991. Negation and generalized quantifiers. In J. van der Does and J. van Eijck (eds.), Generalized Quantifier Theory and Applications. Dutch Network for Language, Logic and Information.
Zwarts, F. 1998. Three types of polarity. In E. Hinrichs and F. Hamm (eds.), Plural Quantification. Kluwer.
16 The Binding Problem for Language, and Its Consequences for the Neurocognition of Comprehension

Peter Hagoort
For a linguist and a psycholinguist, this paper takes a slightly odd starting point. It considers the organization of sentence and discourse processing from the vantage point of the brain sciences. This does not entail a change of the explanans, which remains the same, namely to provide an adequate account of the architecture of language processing. However, it assumes that useful additional constraints can be derived from our understanding of brain organization.

The domain of interest in this paper is neither single-word processing nor language production. The paper focuses on the interpretation processes beyond single-word recognition, at the level of the utterance and beyond (discourse). It is generally agreed among language-comprehension researchers that a characterization of interpretation as a concatenation of lexically stored single-word information is insufficient. Instead, incoming sound or orthographic information triggers a cascade of memory-retrieval operations that make available the relevant basic ingredients for understanding. These include the morphophonological, semantic, and syntactic features of lexical items, which have to be combined in a principled way to bring about a coherent interpretation of the full input string. In analogy to the visual neurosciences, I will refer to the unification of different language-relevant feature types as the binding problem for language. Binding in this context refers to a problem that the brain has to solve, not to a concept from a particular linguistic theory.

The view that I develop in this chapter is strongly biased by research in which I was involved. I do not claim to do justice to the field as a whole. Nevertheless, I hope that I will succeed in sketching a picture of on-line language comprehension at the level of the sentence and beyond that is sufficiently motivated by the available empirical evidence.

The Binding Problem
One of the central questions in neuroscience is referred to as the binding problem. This problem is particularly well studied in the domain of vision. In short,
it is the explanatory gap between the knowledge of relatively specialized brain areas for particular visual features (such as edges, color, motion, etc.) and the unified representation of the visual world that dominates awareness. How are the different attributes of an object, which are known to be processed in different cortical areas within visual cortex, brought together so that they result in a unified visual percept? One solution that has gained popularity in recent years, although it is still controversial, is that the mechanism of visual binding is related to the synchronicity of firing in the cell assemblies that code for the individual visual features (Varela et al. 2001).

A fair amount of data suggest that synchronicity of neuronal firing might be an important mechanism for visual binding. However, this does not guarantee that the same mechanism can solve the binding problem for language. In fact, I believe that synchronicity of firing cannot contribute to binding in the domain of language processing to the extent that it presumably does in visual perception. One major reason is that visual binding is more or less instantaneous. The relevant areas in visual cortex deliver their specific outputs (color information, motion information, etc.) within a very narrow time window. On the basis of the available experimental evidence, it is assumed that synchronous networks emerge and disappear at time scales between 100 and 300 msec (Varela et al. 2001). In contrast, one of the hallmarks of language processing is that information is spread out over relatively extended time periods. For instance, in parsing the auditory sentence “Noam thought of a couple of nice example sentences for his linguistics class but by accident wrote them down in his political diary,” the information of Noam as the subject of the sentence still has to be available a second or so later when the acoustic information encoding the finite verb form ‘wrote’ has reached auditory cortex.
In addition, the inherently hierarchical nature of language processing creates problems for a feature-binding account (such as when the same lexical features have to be bound into two different entities, as holds for the lexical features of “dog” in the phrase “the little, but not the big dog”). A feature-binding account does not seem to be able (at least in a straightforward way) to prevent the interpretation of this phrase as “the little big dog.” This is known as the problem of 2 (Jackendoff 2002). Crucially, the binding problem for language is how information that is processed not only in different parts of cortex, but also at different time scales and at relatively widely spaced parts of the time axis, can be unified into a coherent representation of a multi-word utterance.

One requirement for solving the binding problem for language is, therefore, the availability of cortical tissue that is particularly suited for maintaining information on-line while binding operations take place. Prefrontal cortex (PFC) seems to be especially well suited for doing exactly this (Mesulam 2002). It
has reciprocal connections to almost all cortical and subcortical structures, which puts it in a unique neuroanatomical position for binding operations across time, both within and across different domains of cognition. In human evolution, PFC has shown a massive expansion. It occupies roughly one-third of the neocortical mantle in humans. Two major areas within PFC are lateral prefrontal cortex and orbitofrontal cortex. Lateral PFC includes portions of the inferior, middle, and superior frontal gyri. The posterior portions of the lateral PFC (roughly involving Brodmann’s areas 9, 44, 45, and 46) are especially involved in various cognitive tasks (Knight and Stuss 2002). A core function of these areas is related to working memory; that is, to maintaining information over time and manipulating the contents during the maintenance period. Whether or not domain-specific subdivisions exist within lateral PFC is currently under debate. Another relevant area is orbitofrontal cortex, which is crucial for emotional and social control of cognitive function (Petrides and Pandya 2002).

Activations related to sentence and discourse comprehension have been found in lateral prefrontal cortex, mainly in the left hemisphere. As I will argue below, this part of the brain is crucial for binding of phonological, syntactic, semantic, pragmatic, and presumably also non-linguistic contextual information (e.g. visuo-spatial, as in gestures) into a coherent discourse or situational model. In addition, the left temporal cortex is suggested to play a critical role in storage and retrieval of linguistic information that language acquisition has laid down in memory. Thus, a major subdivision in the left-hemisphere temporofrontal language network is between the retrieval of lexically stored information (temporal cortex) and the on-line integration/binding of this information into the current context.
How this constraint from considerations of brain organization fits with an explicit computational model, and with empirical data on language processing, will be discussed in more detail below for two crucial binding operations, namely syntactic binding and semantic binding.

Syntactic Binding
Recent accounts of the human language system (Jackendoff 1999, 2002; Levelt 1999) assume a cognitive architecture that consists of separate processing levels for conceptual/semantic information, orthographic/phonological information, and syntactic information. Based on this architecture, most current models of language processing agree that, in on-line sentence processing, different types of constraints are very quickly taken into consideration during speaking and listening/reading. Constraints on how words can be structurally combined operate alongside qualitatively distinct constraints on the
combination of word meanings, on the grouping of words into phonological phrases, and on their referential binding into a discourse model. Moreover, in recent linguistic theories, the distinction between lexical items and traditional rules of grammar is vanishing. For instance, Jackendoff (2002) proposes that the only remaining rule of grammar is UNIFY PIECES, “and all the pieces are stored in a common format that permits unification” (p. 180). The unification operation clips together lexicalized patterns with one or more variables in them. The operation MERGE in Chomsky’s (1995) Minimalist Program has a similar flavor. Thus, phonological, syntactic, and semantic/pragmatic constraints determine how lexically available structures are glued together. In models of language processing, there exists fairly wide agreement on the types of constraints that are effective during the formulation and the interpretation of sentences and beyond. However, disagreement prevails with respect to exactly how these are implemented in the overall sentence-processing architecture. One of the defining issues is when and how the assignment of a syntactic structure to an incoming string of words and the semantic integration of single-word meanings interact during listening or reading. The by-now-classical view is that in sentence comprehension the syntactic analysis is autonomous and initially not influenced by semantic variables (Frazier 1987). Semantic integration can be influenced by syntactic analysis, but it does not contribute to the computation of syntactic structure. An alternative view maintains that lexical-semantic information and discourse information can guide or contribute to the syntactic analysis of the utterance. This view is mainly supported by studies showing that the reading of syntactically ambiguous sentences is immediately influenced by lexical information or by more global semantic information (e.g., Altmann and Steedman 1988; Trueswell et al.
1993; 1994; Tyler and Marslen-Wilson 1977). Some of the discrepancies between the different views on this topic are due to the fact that no clear distinction is made between cases in which the syntactic constraints are (at least temporarily) indeterminate with respect to the structural assignment (syntactic ambiguity) and cases in which these constraints are sufficient to determine the syntactic analysis. In the former case, there is a substantial body of evidence for an immediate influence of non-syntactic context information on the structure that is assigned (Tanenhaus and Trueswell 1995; Van Berkum et al. 1999a). However, for the latter case, although it has not been studied as intensely, the available evidence seems to provide support for a certain level of syntactic autonomy (Hagoort 2003; O’Seaghdha 1997). A more recent version of the autonomous syntax view is that proposed by Friederici (2002). Based on the time course of different language-relevant ERP
effects, Friederici proposes a three-phase model of sentence comprehension. The first phase is purely syntactic in nature. An initial syntactic structure is formed on the basis of information about the word category (noun, verb, etc.). During the second phase, lexical-semantic and morphosyntactic processes result in assignment of thematic roles. In the third phase, integration of the different types of information takes place, and the final interpretation results. This proposal is based mainly on findings in ERP studies on language processing. The last 15 years have seen an increasing number of ERP studies on syntactic processing, triggered by the discovery of an ERP effect to syntactic violations that was clearly different from the well-known N400 effect to semantic violations (Hagoort et al. 1993; Osterhout and Holcomb 1992; figure 16.1). These studies have been followed up by a large number of ERP studies on syntactic processing that have provided a wealth of data. Here I will connect the known syntax-related ERP effects to a computational model of parsing (Vosse and Kempen 2000) that was developed to account for a large portion of behavioral findings in the parsing literature and for deficit patterns in aphasic patients. In the context of considerations based on brain organization, it makes the right distinction between lexicalized patterns and a unification component. However, before discussing the model, I will first discuss the relevant ERP results, then present some data that are incompatible with a syntax-first model. Later in this chapter, I will indicate how the model connects to relevant brain areas for syntactic processing, and to data from lesion studies.

Language-Relevant ERP Effects
The electrophysiology of language as a domain of study started with the discovery by Kutas and Hillyard (1980) of an ERP component that seemed especially sensitive to semantic manipulations. Kutas and Hillyard observed a negative-going potential with an onset at about 250 msec and a peak around 400 msec (hence the N400), whose amplitude was increased when the semantics of the eliciting word (i.e., socks) mismatched with the semantics of the sentence context, as in “He spread his warm bread with socks.” Since 1980, much has been learned about the processing nature of the N400 (for extensive overviews, see Kutas and Van Petten 1994 and Osterhout and Holcomb 1995). As Hagoort and Brown (1994) and many others have observed, the N400 effect does not depend on a semantic violation. Subtle differences in semantic expectancy, as between mouth and pocket in the sentence context “Jenny put the sweet in her mouth/pocket after the lesson,” can modulate the N400 amplitude (figure 16.2; Hagoort and Brown 1994). The amplitude of the N400 is most sensitive to the semantic relations between individual words, or between words and their sentence and discourse
Figure 16.1
ERPs to visually presented syntactic prose sentences. These are sentences without a coherent semantic interpretation. A P600/SPS is elicited by a violation of the required number agreement between the subject-noun phrase and the finite verb of the sentence. The averaged waveforms for the grammatically correct and the grammatically incorrect words are shown for electrode site Pz (parietal midline). The word that renders the sentence ungrammatical is presented at 0 msec on the time axis. The waveforms show the ERPs to this and the following two words. Words were presented word by word, with an interval (SOA) of 600 msec. Negativity is plotted upwards. (adapted from Hagoort and Brown 1994; copyright 1994 Erlbaum; reprinted by permission)
context. The better the semantic fit between a word and its context, the more reduced the amplitude of the N400. Modulations of the N400 amplitude are generally viewed as directly or indirectly related to the processing costs of integrating the meaning of a word into the overall meaning representation that is built up on the basis of the preceding language input (Brown and Hagoort 1993; Osterhout and Holcomb 1992). This holds equally when the preceding language input consists of a single word, a sentence, or a discourse, indicating that semantic binding operations might be similar in word, sentence, and discourse contexts (Van Berkum et al. 1999b). In addition, recent evidence indicates that sentence verification against world knowledge in long-term memory modulates the N400 in the same way (Hagoort et al. 2004). In recent years a number of ERP studies have been devoted to establishing ERP effects that can be related to the processing of syntactic information. These studies have found ERP effects to syntactic processing that are qualitatively different from the N400. Even though the generators of these effects are not yet well determined and not necessarily language specific (Osterhout and Hagoort 1999), the existence of qualitatively distinct ERP effects to semantic and syntactic processing indicates that the brain honors the distinction between semantic and syntactic binding operations. Thus, the finding of qualitatively distinct ERP effects for semantic and syntactic processing operations supports the claim that these two levels of language processing are domain specific.

Figure 16.2
Modulation of the N400 amplitude as a result of a manipulation of the semantic fit between a lexical item and its sentence context. The grand-average waveform is shown for electrode site Pz (parietal midline), for the best-fitting word (high cloze), and a word that is less expected in the given sentence context (low cloze). The sentences were visually presented word by word, every 600 msec. In the figure the critical words are preceded and followed by one word. The critical word is presented at 600 msec on the time axis. Negativity is up. (adapted from Hagoort and Brown 1994; copyright 1994 Erlbaum; reprinted by permission)
However, domain specificity should not be confused with modularity (Fodor 1983). The modularity thesis makes the much stronger claim that domain-specific levels of processing operate autonomously without interaction (informational encapsulation). Although domain specificity is widely assumed in models of language processing, there is much less agreement about the organization of the cross-talk between different levels of sentence processing (e.g., Boland and Cutler 1996). ERP studies on syntactic processing have reported a number of ERP effects related to syntax (for an overview, see Hagoort et al. 1999). The two most salient syntax-related effects are an anterior negativity, also referred to as LAN, and a more posterior positivity, here referred to as P600/SPS.

LAN
A number of studies have reported negativities that differ from the N400 in that they usually show a more frontal maximum (but see Münte et al. 1997) and are sometimes larger over the left hemisphere than over the right, although in many cases the distribution is bilateral (Hagoort et al. 2003b). Moreover, the conditions that elicit these frontal negative shifts seem to be more strongly related to syntactic processing than to semantic integration. Usually, LAN effects occur within the same latency range as the N400, that is, between 300 and 500 msec post-stimulus (Friederici et al. 1996; Kluender and Kutas 1993; Münte et al. 1993; Osterhout and Holcomb 1992; Rösler et al. 1993). But in some cases the latency of a left-frontal negative effect is reported to be much earlier, between approximately 100 and 300 msec (Friederici 2002; Friederici et al. 1993; Neville et al. 1991). In some studies, LAN effects have been reported to violations of word-category constraints (Friederici et al. 1996; Hagoort et al. 2003b; Münte et al. 1993). That is, if the syntactic context requires a word of a certain syntactic class (e.g., a noun in the context of a preceding article and adjective), but in fact a word of a different syntactic class (e.g., a verb) is presented, early negativities are observed. Friederici (1995) and colleagues (Friederici et al. 1996) have tied the early negativities specifically to the processing of word-category information. However, in other studies similar early negativities are observed with number, case, gender, and tense mismatches (Münte and Heinze 1994; Münte et al. 1993). In these violations, the word category is correct but the morphosyntactic features are wrong. Friederici (2002) has attributed the very early negativities that occur approximately between 100 and 300 msec (labeled ELAN) to violations of word category, and the negativities between 300 and 500 msec to morphosyntactic processing.
LAN effects have also been related to verbal working memory in connection to filler-gap assignment (Kluender and Kutas 1993). This working-memory account of the LAN is compatible with the finding that lexical, syntactic, and referential ambiguities seem to elicit very similar frontal negativities (Hagoort and Brown 1994; Van Berkum et al. 1999a; Kaan and Swaab 2003b; King and Kutas 1995). Lexical and referential ambiguities are clearly not syntactic in nature, but can be argued to tax verbal working memory more heavily than sentences in which lexical and referential ambiguities are absent. Syntactic ambiguities may also tax working memory more strongly than their unambiguous counterparts. Future research should indicate whether or not these two functionally distinct classes of LAN effects can be dissociated at a finer grain of electrophysiological analysis.

P600/SPS
A second ERP effect that has been related to syntactic processing is a later positivity, nowadays referred to as P600/SPS (Coulson et al. 1998; Hagoort et al. 1999; Osterhout et al. 1997). One of the antecedent conditions of P600/SPS effects is a violation of a syntactic constraint. If, for instance, the syntactic requirement of number agreement between the grammatical subject of a sentence and its finite verb is violated (see (1), with the critical verb form in italics; the * indicates the ungrammaticality of the sentence), a positive-going shift is elicited by the word that renders the sentence ungrammatical (Hagoort et al. 1993).

(1) *The spoiled child throw the toy on the ground.

This positive shift starts about 500 msec after the onset of the violation and usually lasts for at least 500 msec. Because of the polarity and the latency of its maximal amplitude, this effect was originally referred to as the P600 (Osterhout and Holcomb 1993) or, on the basis of its functional characteristics, as the Syntactic Positive Shift (Hagoort et al. 1993). An argument for the independence of this effect from possibly confounding semantic factors is that it also occurs in sentences in which the usual semantic/pragmatic constraints have been removed (Hagoort and Brown 1994). This results in sentences like (2a) and (2b), where one is semantically odd but grammatically correct and the other contains the same agreement violation as in (1).

(2) a. The boiled watering-can smokes the telephone in the cat.
    b. *The boiled watering-can smoke the telephone in the cat.

If one compares the ERPs to the italicized verbs in (2a) and (2b), a P600/SPS effect is visible to the ungrammatical verb form (figure 16.1). Though these sentences do not convey any conventional meaning, the ERP effect of the
violation demonstrates that the language system is nevertheless able to parse the sentence into its constituent parts. Similar P600/SPS effects have been reported for a broad range of syntactic violations in different languages (English, Dutch, German), including violations of phrase structure (Hagoort et al. 1993; Neville et al. 1991; Osterhout and Holcomb 1992), of subcategorization (Ainsworth-Darnell et al. 1998; Osterhout et al. 1997; Osterhout et al. 1994), of agreement of number, gender, and case (Coulson et al. 1998; Hagoort et al. 1993; Münte et al. 1997; Osterhout 1997; Osterhout and Mobley 1995), of subjacency (McKinnon and Osterhout 1996; Neville et al. 1991), and of the empty-category principle (McKinnon and Osterhout 1996). A P600/SPS has also been reported in relation to thematic-role animacy violations (Kuperberg, Sitnikova, Caplan, and Holcomb 2003). Moreover, a P600/SPS can be found with both written and spoken input (Friederici et al. 1993; Hagoort and Brown 2000a; Osterhout and Holcomb 1993). In summary, two classes of syntax-related ERP effects have been consistently reported. These two classes differ in polarity, in topographic distribution, and in latency characteristics. In terms of latency, the first class of effects is an anterior negativity. Apart from LANs related to working memory, anterior negativities mainly appear in response to syntactic violations. In a later latency range, positive shifts occur that are elicited not only by syntactic violations, but also by complexity variation in grammatically well-formed sentences (Kaan et al. 2000), or as a function of the number of alternative syntactic structures that are compatible with the input at a particular position in the sentence (syntactic ambiguity) (Osterhout et al. 1994; Van Berkum et al. 1999a).
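The distinctions drawn so far (polarity, approximate latency range, scalp distribution) can be collected in a small lookup structure. The sketch below is only a reading aid: the class and function names are hypothetical, the latency windows are the approximate ranges mentioned in the text, and the end of the P600/SPS window is an assumption derived from "starts about 500 msec ... lasts for at least 500 msec".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErpEffect:
    name: str
    polarity: str       # "negative" or "positive"
    window_ms: tuple    # approximate latency range after the critical word
    distribution: str   # typical scalp distribution as described in the text


# Approximate values taken from the chapter text; not a normative definition.
EFFECTS = [
    ErpEffect("ELAN", "negative", (100, 300), "left anterior"),
    ErpEffect("LAN", "negative", (300, 500), "anterior, often bilateral"),
    ErpEffect("N400", "negative", (300, 500), "more posterior than the LAN"),
    ErpEffect("P600/SPS", "positive", (500, 1000), "posterior"),
]


def candidates(polarity, latency_ms):
    """Names of effects whose polarity and latency window fit an observation."""
    return [
        e.name
        for e in EFFECTS
        if e.polarity == polarity and e.window_ms[0] <= latency_ms <= e.window_ms[1]
    ]


# A negativity around 400 msec fits both the LAN and the N400 window, so
# polarity and latency alone do not separate them; distribution is needed too.
ambiguous = candidates("negative", 400)
```

The design point is simply that no single attribute identifies an effect: the LAN and the N400 share polarity and latency range and are told apart by their scalp distribution and eliciting conditions.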
Since these two classes of effects are now well established in the context of language processing, and are clearly different from the N400 effect, the need arises to account for these effects in terms of a well-defined model of language processing. Broadly speaking, models of sentence processing can be divided into two types. One type of model assumes a precedence of syntactic information. That is, an initial syntactic structure is constructed before other information (e.g., lexical-semantic, discourse information) is taken into account (Frazier 1987). I will refer to this type of model as a syntax-first model. The alternative broad set of models claims that the different information types (lexical, syntactic, phonological, pragmatic) are processed in parallel and influence the interpretation process incrementally, that is, as soon as the relevant pieces of information are available (Jackendoff 2002; Marslen-Wilson 1989; Zwitserlood 1989). I will refer to this type of model as the immediacy model. Overall, the behavioral data, although not decisive, favor the second type of model more than the first.
I will first present some recent ERP data that are more compatible with the immediacy model.

Evidence Against the Syntax-First Principle
The strong version of a syntax-first model of sentence processing assumes that the computation of an initial syntactic structure precedes semantic binding operations, because structural information is necessary as input for thematic role assignment. In other words, semantic binding will be impaired if no syntactic structure can be built up. Certain electrophysiological evidence has been taken as evidence for this syntax-first principle (Friederici 2002). Alternative models (Marslen-Wilson and Tyler 1980; MacDonald et al. 1994) claim that semantic and syntactic information are immediately used when they become available without a priority for syntactic information over other information types. ERP evidence for an autonomous syntax-first model for sentence processing is derived from a series of studies in which Friederici and colleagues found an ELAN in response to auditorily presented words whose prefix is indicative of a violation of word category. For instance, Hahne and Jescheniak (2001) and Friederici et al. (1993) had their subjects listen to sentences such as “Die Birne wurde im gepflückt” (“The pear was being in-the plucked”) or “Der Freund wurde im besucht” (“The friend was being in-the visited”), where the prefixes “ge-” and “be-” in combination with the preceding auxiliary “wurde” indicate a past participle where the preposition “im” requires a noun. In this case a very early (between 100 and 300 msec) left anterior negativity is observed that precedes the N400 effect. Although this evidence is compatible with a syntax-first model, it is not necessarily incompatible with an immediacy model of sentence processing. As long as word-category information can be derived earlier from the acoustic input than semantic information, as was the case in the above-mentioned studies, the immediacy model predicts that it will be used as it comes in.
The syntax-first model, however, predicts that even in cases where word-category information comes in later than semantic information, the syntactic information will nevertheless be used earlier than semantic information in sentence processing. Van den Brink and Hagoort (2004) designed a strong test of the syntax-first model in which semantic information precedes word-category information. In many languages, information about the word category is often encapsulated in the suffix rather than the prefix of a word. In contrast to an immediacy model, a syntax-first model would, in such a case, predict that semantic processing (more specifically, semantic binding) is postponed until after the information about the word category has become available.
Figure 16.3
A waveform of an acoustic token of the Dutch verb form “kliederde” (messed). The suffix “-de” indicates past tense. The total duration of the acoustic token is approximately 450 msec. The onset of the suffix “-de” is at approximately 300 msec. After 300 msec of signal, the acoustic token can be classified as a verb. Thus, for a context that does not allow a verb in that position, the Category Violation Point (CVP) is at 300 msec into the verb.
Van den Brink and Hagoort (2004) compared correct Dutch sentences (see (3a)) with their anomalous counterparts (see (3b)) in which the critical word (italicized in (3)) was a semantic violation in the context and also had the incorrect word category. However, in contrast to the experiments by Friederici and colleagues, word-category information was encoded in the suffix ‘-de’.

(3) a. Het vrouwtje veegde de vloer met een oude bezem gemaakt van twijgen
       (The woman wiped the floor with an old broom made of twigs)
    b. *Het vrouwtje veegde de vloer met een oude kliederde gemaakt van twijgen
       (The woman wiped the floor with an old messed made of twigs)

Figure 16.3 shows the waveform of the spoken verb form ‘kliederde’ (messed). This verb form has a duration of approximately 450 msec. The stem already contains part of the semantic information. However, the onset of the suffix ‘-de’ is at about 300 msec into the word. Only at this point will it be clear that the word category is a verb, not a noun as required by the context. We define this moment of deviation from the correct word category as the Category Violation Point (CVP), because only at this time is information provided on the basis of which it can be recognized as a verb (the incorrect word category in the syntactic context). Although in this case semantic information can be extracted from the spoken signal before word-category information, the syntax-first model predicts that this semantic information cannot be used for semantic binding until after the assignment of word category.
Figure 16.4
Connected speech. Grand-average ERPs from two frontal electrode sites (F7, F8) and three posterior electrode sites (Pz, P3, P4) to critical words that were semantically and syntactically congruent with the sentence context or semantically and syntactically incongruent. Grand-average waveforms were computed after time locking on a trial-by-trial basis to the moment of word-category violation (CVP: Category Violation Point). The baseline was determined by averaging in the 180–330-msec interval, corresponding to a 150-msec interval preceding the CVP in the incongruent condition. The time axis is in milliseconds. Negativity is up. The ELAN is visible over the two frontal sites, the N400 and the P600/SPS over the three posterior sites. The onset of the ELAN is at 100 msec after the CVP; the onset of the N400 effect precedes the CVP by approximately 10 msec. (after Van den Brink and Hagoort 2004)
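The time-locking and baseline procedure described in this caption can be made concrete with a short sketch. It is an illustration under stated assumptions, not the analysis pipeline of Van den Brink and Hagoort (2004): the function name, the trial dictionary keys, the 10-msec sampling step, and the 300-msec post-CVP window are all hypothetical. Only the 150-msec pre-CVP baseline comes from the caption.

```python
from statistics import mean


def cvp_locked_average(trials, sample_ms=10, baseline_ms=150, post_ms=300):
    """Average single-trial EEG after re-aligning each trial to its own CVP.

    Each trial is a dict with two hypothetical keys:
      "samples": voltages measured from word onset, one every `sample_ms` msec
      "cvp_ms":  latency of the Category Violation Point within that word
    """
    n_base = baseline_ms // sample_ms   # samples in the pre-CVP baseline
    n_post = post_ms // sample_ms       # samples kept after the CVP
    epochs = []
    for trial in trials:
        cvp_index = trial["cvp_ms"] // sample_ms
        start, stop = cvp_index - n_base, cvp_index + n_post
        if start < 0 or stop > len(trial["samples"]):
            continue  # not enough data around the CVP; skip this trial
        epoch = trial["samples"][start:stop]
        baseline = mean(epoch[:n_base])             # mean of the 150 msec before the CVP
        epochs.append([v - baseline for v in epoch])
    if not epochs:
        raise ValueError("no trial had enough samples around its CVP")
    # Average sample by sample across trials; index n_base corresponds to the CVP.
    return [mean(column) for column in zip(*epochs)]


# Two toy trials whose CVPs fall at different latencies (300 and 200 msec).
# Each shows a 2-unit negative step exactly at its own CVP.
toy_trials = [
    {"samples": [5.0] * 30 + [3.0] * 30, "cvp_ms": 300},
    {"samples": [1.0] * 20 + [-1.0] * 30, "cvp_ms": 200},
]
average = cvp_locked_average(toy_trials)
```

Because the suffix onset differs from token to token in connected speech, averaging relative to word onset would smear any effect tied to the CVP; re-aligning each trial first keeps such an effect sharp in the grand average.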
Figure 16.4 shows the averaged waveforms that are time-locked to the CVP for two frontal sites where the ELAN is usually observed, and three posterior sites that are representative of N400 effects. As can be seen, the N400 effect clearly precedes the ELAN in time. Whereas the ELAN started approximately 100 msec after the CVP, the N400 effect was already significant before the CVP. To my knowledge, this is the clearest evidence so far for the claim that semantic binding can start before word-category information is provided. This is strong evidence for the immediacy assumption: information available in the signal is immediately used for further processing. In contrast to what a strong version of the syntax-first model predicts, semantic binding need not wait until
an initial structure is built on the basis of word-category information. A weaker syntax-first model, which allows prediction of word category, could claim that this prediction was only falsified at the CVP, and thus that semantic binding could be started in advance. However, this weaker version gives up the characteristic of bottom-up priority and assumes an interaction between syntactic context and lexical processing. One can then ask which feature of the processing architecture guarantees that interaction between context and lexical processing is restricted to syntax. In summary, the evidence so far indicates that distinct ERP effects are observed for semantic integration (N400) and syntactic analysis ((E)LAN, P600/SPS). The ERP data presented are evidence against a syntax-first model of sentence processing. Rather, as soon as semantic or syntactic information is available, it is used for the purpose of interpretation. This is in line with the assumptions of the immediacy model. The triggering conditions of the syntax-related ERP effects are becoming clearer. Apart from the LAN effects related to working memory, so far (E)LAN effects have mainly been seen in response to syntactic violations. These violations can be word-category violations that are sometimes seen early (ELAN), but they can also be morphosyntactic violations that are usually observed within the same time frame as the N400 effects (300–500 msec). The anterior negativities are normally followed by a P600/SPS. In contrast to the (E)LAN, the P600/SPS is not only seen in response to syntactic violations, but also to syntactically less preferred structures (i.e., in the case of syntactic ambiguity; Van Berkum et al. 1999a; Osterhout et al. 1994), and to syntactically more complex sentences (Kaan et al. 2000). In many cases, the P600/SPS occurs without a concomitant early negativity.
For straightforward syntactic violations, the distribution of the P600/SPS seems to be more posterior than the P600/SPS reported in relation to syntactic ambiguity resolution and syntactic complexity (Hagoort et al. 1999; Kaan and Swaab 2003a,b).

The Unification Model
The increasing number of ERP studies on syntactic processing in the last 15 years has resulted in a substantial amount of data that are in need of a coherent overall account. I will propose an explicit account of syntax-related ERP effects based on a computational model of parsing developed by Vosse and Kempen (2000), here referred to as the Unification Model. This proposal is certainly not the final version, but only a beginning. The model needs to be adapted, and the account of the ERP data needs to be refined. Nevertheless I believe that progress will be made only if we attempt to connect not only the behavioral data but also data from electrophysiology and neuroimaging to explicit computational accounts. I will first describe the general architecture of this model.
Figure 16.5
Syntactic frames in memory. These frames are retrieved on the basis of incoming word-form information for the example sentence “The woman sees the man with the binoculars.” DP: determiner phrase. NP: noun phrase. S: sentence. PP: prepositional phrase. art: article. hd: head. det: determiner. mod: modifier. subj: subject. dobj: direct object.
According to the Unification Model each word form in the lexicon is associated with a structural frame. This structural frame consists of a three-tiered unordered tree specifying the possible structural environment of the particular lexical item (see figure 16.5; for details concerning the computation of word order, see Harbusch and Kempen 2002). The top layer of the frame consists of a single phrasal node (e.g., NP). This so-called root node is connected to one or more functional nodes (e.g., Subject, Head, Direct Object) in the second layer of the frame. The third layer contains phrasal nodes to which lexical items or other frames can be attached. This parsing account is “lexicalist” in the sense that all syntactic nodes (S, NP, VP, N, V, etc.) are retrieved from the mental lexicon. That is, chunks of syntactic structure are stored in memory. There are no syntactic rules that introduce additional nodes. In the on-line comprehension process, structural frames associated with the individual word forms incrementally enter the unification workspace. In this workspace, constituent structures spanning the whole utterance are formed by a unification operation. This operation consists
Figure 16.6
The unification operation of two lexically specified syntactic frames. The unification takes place by linking the root node NP to an available foot node of the same category. The number 2 indicates that this is the second link that is formed during on-line processing of the sentence “The woman sees the man with the binoculars.”
of linking lexical frames that have matching root and foot nodes (see figure 16.6) and checking agreement features (number, gender, person, etc.). It specifies what Jackendoff (2002) refers to as the only remaining “grammatical rule”: UNIFY PIECES. The resulting unification links between lexical frames are formed dynamically, which implies that the strength of the unification links varies over time until a state of equilibrium is reached. Because of the ambiguity that is inherent in natural language, alternative binding candidates will usually be available at any point in the parsing process. That is, a particular root node (e.g., PP) often finds more than one matching foot node (i.e., PP) with which it can form a unification link (see figure 16.7). Ultimately, one phrasal configuration results. This requires that only one of the alternative binding candidates remain active. The required state of equilibrium is reached through a process of lateral inhibition between two or more alternative unification links. In general, owing to gradual decay of activation, more recent foot nodes will have a higher level of activation than those that entered the unification space earlier. This is why the likelihood of an attachment of the PP into the syntactic frame of the verb ‘sees’ is higher than into the syntactic frame for ‘woman’ (figure 16.7). In addition, the strengths of the
Figure 16.7
Lateral inhibition between three different PP-foot nodes that are candidate unification sites for the PP-root node of the preposition with. The three possible unification links are indicated by arrows. Lateral inhibition between these three possible unifications (6, 7, and 8) ultimately results in one unification that wins the competition and remains active.
unification links can vary as a function of plausibility (semantic) effects. For instance, if instrumental modifiers under S nodes have a slightly higher default activation than instrumental modifiers under an NP node, lateral inhibition can result in overriding the recency effect. For our example sentence (figure 16.7) it means that the outcome of lateral inhibition is that the PP may be linked to the S-frame (Unification link 7) rather than to the more recent NP node of ‘man’ (U link 8) (for details, see Vosse and Kempen 2000). The Unification Model accounts for sentence-complexity effects known from behavioral measures such as reading time. In general, sentences are harder to analyze syntactically when more potential unification links of similar
strength enter into competition with one another. Sentences are easy when the number of U links is small and the links are of unequal strength. The Unification Model has these advantages: it is computationally explicit, it accounts for a large series of empirical findings in the parsing literature (but presumably not for all the locality phenomena in Gibson 1998) and in the neuropsychological literature on aphasia, and it belongs to the class of lexicalist parsing models that have found increasing support in recent years (Bresnan 2001; Jackendoff 2002; Joshi and Schabes 1997; MacDonald et al. 1994). This model also nicely accounts for the two classes of syntax-related ERP effects reported in this paper and in many others. In the Unification Model, binding (unification) is prevented in two cases: when the root node of a syntactic building block (e.g., NP) does not find another syntactic building block with an identical foot node (i.e., NP) to bind to, and when the agreement check finds a serious mismatch in the grammatical feature specifications of the root and foot nodes. The claim is that a (left) anterior negativity (AN) results from a failure to bind, as a result of a negative outcome of the agreement check or a failure to find a matching category node. For instance, the sentence “The woman sees the man because with the binoculars” does not result in a completed parse, since the syntactic frame associated with ‘because’ does not find unoccupied (embedded) S-root nodes that it can bind to (see figure 16.8). As a result, unification fails. In the context of the Unification Model, I propose that the P600/SPS is related to the time it takes to establish unification links of sufficient strength. The time it takes to build up the unification links until the required strength is reached is affected by ongoing competition between alternative unification options (syntactic ambiguity), by syntactic complexity, and by semantic influences.
The amplitude of the P600/SPS is modulated by the amount of competition. Competition is reduced when the number of alternative binding options is smaller, or when lexical, semantic or discourse context biases the strengths of the unification links in a particular direction, thereby shortening the duration of the competition. Violations result in a P600/SPS as long as unification attempts are made. For instance, a mismatch in gender or agreement features might still result in weaker binding in the absence of alternative options. However, in such cases the strength and build-up of U links will be affected by the partial mismatch in syntactic feature specification. Relative to less complex or syntactically unambiguous sentences, in more complex and syntactically ambiguous sentences it takes longer to build up U links of sufficient strength. The latter sentences, therefore, result in a P600/SPS in comparison to the former ones. In summary, it seems that the Unification Model provides an acceptable preliminary account for the collective body of ERP data on syntactic processing.
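The qualitative claim above, that U links take longer to reach sufficient strength when rival links are of similar strength, can be illustrated with a toy simulation. The sketch below is not the Vosse and Kempen (2000) implementation: the update rule, the function name settle_time, and the parameters threshold, excite, and inhibit are all hypothetical choices made only to show winner-take-all competition under lateral inhibition.

```python
from typing import Optional, Sequence


def settle_time(
    initial_strengths: Sequence[float],
    threshold: float = 0.9,   # hypothetical "sufficient strength" criterion
    excite: float = 0.1,      # hypothetical self-excitation rate
    inhibit: float = 0.2,     # hypothetical lateral-inhibition rate
    max_steps: int = 10_000,
) -> Optional[int]:
    """Return the number of update steps until one candidate unification
    link reaches `threshold`, or None if no link gets there in time."""
    strengths = list(initial_strengths)
    for step in range(1, max_steps + 1):
        total = sum(strengths)
        updated = []
        for s in strengths:
            rivals = total - s  # summed strength of the competing links
            # Growth toward 1.0, damped by inhibition from the rivals.
            delta = excite * s * (1.0 - s) - inhibit * s * rivals
            updated.append(min(1.0, max(0.0, s + delta)))
        strengths = updated
        if max(strengths) >= threshold:
            return step
    return None


if __name__ == "__main__":
    print("no competitor:     ", settle_time([0.5]))
    print("weak competitor:   ", settle_time([0.5, 0.2]))
    print("similar competitor:", settle_time([0.5, 0.45]))
```

Under these assumed parameters, a link with no competitor settles fastest, a link facing a much weaker rival settles more slowly, and a link facing a rival of nearly equal strength takes longest. In this toy rule two exactly tied links never resolve, which is an artifact of the simplification rather than a claim about the model.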
Figure 16.8
A dangling syntactic frame for the conjunction element because. This syntactic frame cannot be attached into the phrasal configuration for the remaining parts of the sentence.
Moreover, it does not assume a syntax-first architecture. It is, therefore, a better account of the empirical data, both behavioral and electrophysiological, than models that assume a syntax-first phase.
Semantic Binding
Along with syntactic binding, semantic binding operations have to take place. Studies of neuropsychological patients and data from neuroimaging studies suggest that semantic representations may be distributed, with the involvement of brain areas that support the most salient aspects of a concept (e.g., visual, kinesthetic, linguistic or propositional) (Allport 1985; Saffran and Sholl 1999). Context can differentially activate or select the saliency of meaning aspects (as in “The girl gave a wonderful performance on the old piano” vs. “Four men were needed to transport the old piano”). At the same time, the semantic
aspects retrieved on the basis of lexical access have to be integrated into a coherent interpretation of a multi-word utterance. This I will refer to as semantic binding. It turns out that left lateral prefrontal cortex is also crucial for semantic binding (see below). Binding-relevant areas within the left prefrontal cortex (LPC) may overlap, at least to some degree, for syntactic and semantic binding. But there is also evidence that semantic binding may involve more ventral areas (especially Brodmann’s area 47) than syntactic binding. More research is needed to determine commonalities and differences in LPC between areas involved in phonological, syntactic, and semantic binding. However, the qualitative differences between ERP effects of semantic (N400) and syntactic (LAN, P600) binding suggest that the brain honors the distinction between these two types of binding operations.
The Level of Semantic Binding: Sentence vs. Discourse
A central issue for semantic binding is whether or not a semantic representation at the sentence level is built up first, before semantic information is integrated into a discourse model in a second step. For instance, in their blueprint of the listener, Cutler and Clifton (1999) assume that utterance interpretation on the basis of syntactic analysis and thematic processing takes place first, before integration into a discourse model. Kintsch (1998; see also Ericsson and Kintsch 1995) has made similar claims. We conducted an ERP study to investigate how and when the language-comprehension system relates an incoming word to semantic representations of the unfolding local sentence and the wider discourse (Van Berkum et al. 1999b). In the first experiment, subjects were presented with short stories, of which the last sentence sometimes contained a critical word that was semantically anomalous with respect to the wider discourse (e.g., “Jane told the brother that he was exceptionally slow” in a discourse context where he had in fact been very quick). Relative to a discourse-coherent counterpart (e.g., ‘quick’), these discourse-anomalous words elicited a large N400 effect (i.e., a negative shift in the ERP that began about 200 to 250 msec after word onset and peaked around 400 msec). In addition to the discourse-related anomalies, sentence-semantic anomaly effects were elicited under comparable experimental conditions. We found that the ERP effects elicited by both types of anomalies were highly similar. Relative to their coherent counterparts, discourse-anomalous and sentence-anomalous critical words elicited an N400 effect with an identical time course and identical scalp topography (figure 16.9). The similarity of these effects, particularly in polarity and scalp distribution, is compatible with the claim that they reflect the activity of a largely overlapping or identical set of underlying neural generators, indicating similar functional processes.
Figure 16.9
N400 effects triggered by discourse-related and sentence-related anomalies. Waveforms are presented for a representative electrode site. The latencies of the N400 effect in discourse and sentence contexts (both onset and peak latencies) are the same. (after Van Berkum et al. 1999b)
In summary, there is no indication that the language-comprehension system is slower in relating a new word to the semantics of the wider discourse than in relating it to local sentence context. Our data clearly do not support the idea that new words are related to the discourse model after they have been evaluated in terms of their contribution to the semantics of the sentence. The speed with which discourse context affects processing of the current sentence appears to be at odds with recent estimates of how long it would take to retrieve information about preceding discourse from long-term memory. In the materials of Van Berkum et al., the relative coherence of a critical word usually
hinged on rather subtle information that was implicit in the discourse and that required considerable inferencing about the discourse topic and the situation it described. Kintsch (1998; see also Ericsson and Kintsch 1995) has suggested that during on-line text comprehension such subtle discourse information is not immediately available and must be retrieved from “long term working memory” when needed. This is estimated to take some 300–400 msec at least. However, the results of our experiments suggest that the relevant discourse information can be brought to bear on local processing within at most 200–250 msec. The observed identity of discourse-level and sentence-level N400 effects is most parsimoniously accounted for by a processing model that abandons the distinction between sentence-level and discourse-level semantic binding. This is compatible with the notion of common ground (Stalnaker 1978; Clark 1996). Clark’s analysis clearly demonstrates that the meaning of linguistic utterances cannot be determined without taking into account the knowledge that the speaker and the listener share and mutually believe they share. This common ground includes a model of the discourse itself, which is continually updated as the discourse unfolds. If listeners and readers always immediately evaluate new words relative to the discourse model and the associated information in common ground (i.e., immediately compute “contextual meaning”), the identity of the ERP effects generated by sentence and discourse anomalies has a natural explanation. With a single sentence, the relevant common ground includes only whatever discourse and world knowledge has just been activated by the sentence fragment presented so far. With a sentence presented in discourse context, the relevant common ground will be somewhat richer, now also including information elicited by the specific earlier discourse.
But the process that maps incoming words onto the relevant common ground can run into trouble either way. The N400 effects observed by Van Berkum et al. (1999b) reflect the activity of this unified binding process. Of course, this is not to deny the relevance of sentential structure for semantic interpretation. In particular, how the incoming words are related to the discourse model is co-constrained by sentence-level syntactic devices (such as word order, case marking, local phrase structure, or agreement), and by the associated mapping onto thematic roles. However, this is fully compatible with the claim that there is no separate stage during which word meaning is exclusively evaluated with respect to “local sentence meaning,” independent of the discourse context in which that sentence occurs.
Binding Plasticity
It is often assumed in language-comprehension research that all information has to be available at the right moment for binding operations to occur. However, in the reality of daily communication it is not uncommon that the system works under noisy conditions. Therefore, it might well be that the system works with what it gets, which is very often non-optimal. With a noisy signal or an underspecified context, one type of binding operation might be more easily achieved than another. The comprehension system adapts to changing circumstances, and can change the weights assigned to the different binding operations that run in parallel accordingly. Some evidence for this idea came from a recent ERP study with agrammatic aphasics in which Hagoort, Wassenaar, and Brown (2003a) investigated the ERP effects of syntactic violations in aphasic patients and their elderly controls. The most interesting finding was that the ERP response to one type of syntactic violation (a violation of word order in adverb-adjective-noun sequences) was qualitatively different from the ERP effect in the non-agrammatic subjects. In these latter subjects (elderly controls and a group of non-agrammatic aphasics), the word-order violation resulted in a P600/SPS. In contrast, the ERP of the agrammatic aphasics was dominated by the N400 effect that is usually observed for semantic binding operations during on-line language processing. Thus, whereas word-order violations triggered a syntax-related ERP response in normal controls and non-agrammatic comprehenders, the same violations triggered an ERP response related to semantic binding in Broca’s aphasics with agrammatic comprehension. Interestingly, a similar shift can be seen in early second-language learners (Osterhout, personal communication). We offered the following explanation for this semantic ERP response in agrammatic aphasics: The absence of a P600/SPS suggests that the agrammatic aphasics are no longer able to exploit syntactic information during sentence comprehension.
The N400 effect for the word-order violations suggests that these sentences were processed through another (compensatory) processing route. The lack of a syntax-related ERP effect suggests that the agrammatic comprehenders did not interpret these sentences through a hierarchically organized phrase-structure representation. Instead, word meanings were incrementally integrated in the semantic representation of the linear string of preceding words, where the interpretation process was more difficult when the adjective preceded the adverb (thief steal expensive very ...) than in the reverse order (thief steal very expensive ...). In the adjective-before-adverb word order, the internal event structure is less coherent than in the correct order, owing to the reversal of the semantic arguments of the denoted event. That is, in the semantic context of thief steal expensive, the canonical structure of events is better matched by mentioning what is being stolen than by further expanding the meaning of expensive (as in thief steal expensive very). The results indicate that agrammatic patients still have access to this level of semantic information and are able to use this during real-time processing. This is not to say that their use
of semantic information is optimal, but it is certainly less affected than syntactic processing operations. As such, the relative preservation of a semantic processing route presumably results in the N400 effect. The data therefore address the real-time functioning of the language system under impairment: the way in which different sources of linguistic information are combined to derive an interpretation seems to be tailored to the processing options that are still available to the impaired language-comprehension system. The results we obtained suggest that a semantic processing stream provides an optimization of language comprehension within the limitations imposed by a syntactic deficit resulting from brain damage. Although this does not imply that semantic processing is fully optimal in agrammatic aphasics, it is relatively preserved compared to syntactic processing. Under impairment, the comprehension system seems to weigh the remaining information differently or more strongly. This multiple-route plasticity instantiates the potential for on-line adaptation to impairments in the language-comprehension system. What holds for the system under impairment might also hold for the intact system working under conditions of some external noise. It just works with what it gets, and is, to some degree, capable of flexibility and adaptation to what is the most salient information under the current circumstances. Language comprehension is characterized by processing that is “opportunistic” rather than rigidly regulated (Jackendoff 2003).
The Neural Implementation of Binding in Language
In the context of the language system, the binding problem refers to the following question: How is information that is incrementally retrieved from the mental lexicon unified into a coherent overall interpretation of a multi-word utterance? Most likely, unification must take place at the conceptual, syntactic, and phonological levels, as well as across these levels (Jackendoff 2002). So far I have discussed the features of the cognitive architecture for syntactic and semantic binding. In this section I will argue that the left inferior prefrontal cortex may have the characteristics necessary for performing the unification operations at the different levels of the language system. One requirement for solving the binding problem for language is the availability of cortical tissue that is particularly suited for maintaining information on-line while binding operations take place. Prefrontal cortex seems to be especially well suited for doing exactly this. Areas in prefrontal cortex are able to hold information on-line (Mesulam 2002) and to select among competing alternatives (Thompson-Schill, D’Esposito, and Kan 1999). Electrophysiological recordings in the macaque have shown that this area is important for sustaining information triggered by a transient event for many seconds (Miller 2000). This allows prefrontal cortex to establish unifications between pieces of information that are perceived or retrieved from memory at different moments in time.
Figure 16.10
Common areas of activation (shaded) in a meta-analysis of 28 imaging studies on the processing of syntactic information during language comprehension. The activated areas are shown on a lateral view of the left hemisphere. They were restricted to the temporal and frontal lobes of that hemisphere. (after Indefrey 2003)
I will make some tentative suggestions about how the different components of the Unification Model for syntactic binding that I discussed above could be connected to our knowledge about the neural architecture. This proposal is not yet explicitly tested, but, as I will argue, it makes good sense in the light of our current knowledge about the contributions of the areas involved. In a recent meta-analysis of 28 neuroimaging studies, Indefrey (2003) found two areas that were critical for syntactic processing, independent of the input modality (visual in reading, auditory in speech). These two supramodal areas for syntactic processing were the left posterior superior temporal gyrus and the left posterior inferior frontal cortex (see figure 16.10). As is known from lesion studies in aphasic patients, lesions in different areas of left perisylvian cortex can result in deficits in syntactic processing in sentence comprehension (Caplan, Hildebrandt, and Makris 1996). The idea that modality-independent grammatical knowledge was mainly represented in Broca’s area (Zurif 1998) has thus been proved incorrect. At the same time, the left posterior temporal cortex is known to be involved in lexical processing (Indefrey and Cutler 2004). In connection to the Unification Model, this part of
the brain may be important for the retrieval of the syntactic frames that are stored in the lexicon. The Unification Space, where individual frames are connected into a phrasal configuration for the whole utterance, may be localized in the left frontal part of the syntax-relevant network of brain areas. One of the main specializations of prefrontal cortex is the holding on-line and binding of information (Mesulam 2002). It may be the right area for providing the computational resources for binding together lexical-syntactic frames through the dynamics of creating unification links between them (cf. Duncan and Miller 2002). It thus seems that the components of the Unification Model and the areas known to be crucial for syntactic processing can be connected in a relatively natural way, with left superior temporal cortex relevant for storage and retrieval of syntactic frames, and with the left prefrontal cortex important for binding these frames together. The need for combining independent bits and pieces into a single coherent percept is not unique for syntax. Models for semantic/conceptual unification and phonological unification could be worked out along similar lines as the Unification Model for syntax. Recent neuroimaging studies suggest that parts of prefrontal cortex in and around Broca’s area may be involved in conceptual and phonological unification, with Brodmann Areas (BA) 47 and 45 involved in semantic binding, BA 45 and 44 in syntactic binding, and BA 44 and ventral BA 6 in phonological binding (see figure 16.11).
Six Principles of the Processing Architecture
In analogy to other domains of cognitive neuroscience, for language comprehension I have made the distinction between memory retrieval and unification or binding. I have discussed features of the processing architecture for syntactic and semantic binding. Evidence from neuroimaging studies seems to support the distinction between brain areas recruited for memory retrieval and brain areas crucial for binding. Based on the evidence discussed in the preceding sections, I propose the following six general architectural principles for comprehension beyond the single-word level:
(i) The brain honors the distinction between syntactic and semantic binding. However, both involve contributions from the left prefrontal cortex (in and around Broca’s area), it being the workspace where unification operations take place. It is very well possible that this area is not language-specific but also subserves other functions (e.g. binding in music; see Patel 2003). Left prefrontal cortex is suggested to maintain the activation state of representational structures retrieved from memory (the mental lexicon), and to provide the necessary neuroanatomical space for binding operations.
Figure 16.11
The gradient in left inferior frontal cortex for activations and their distribution, related to semantic, syntactic, and phonological processing, based on the meta-analysis in Bookheimer 2002. Centers represent the mean coordinates of the local maxima; radii represent the standard deviations of the distance between the local maxima and their means (courtesy of Karl Magnus Petersson). The activation shown is from artificial grammar violations (Petersson et al. 2004).
(ii) Immediacy is the general processing principle of binding. Semantic binding does not wait until relevant syntactic information (such as word-class information) is available, but starts immediately with what it derives on the basis of the bottom-up input and the left context. The corollary of immediacy is incrementality: output representations are built up from left to right in close temporal contiguity to the input signal.
(iii) There does not seem to be a separate stage during which word meaning is exclusively integrated at the sentence level. Incremental interpretation is, for the most part, done by an immediate mapping onto a discourse model (Clark 1996).
(iv) In parsing, lexically specified structures enter the unification space. Lexical information (e.g. animacy), discourse information, and (recent data suggest) inputs from other modalities (e.g., visual world, gesture) immediately influence the competition between alternative binding options, and can change the binding links. However, in the absence of competing binding sites, assignment of structure is not influenced by non-syntactic information.
(v) There is no evidence for a privileged position of syntax and/or a processing priority for syntax, as is assumed in syntax-first models. The different processing levels (phonological, syntactic, semantic/pragmatic) operate in parallel, and to some degree independently. Where necessary, cross-talk takes
place, which is again characterized by the immediacy principle. That is, cross-talk takes place more or less moment-to-moment.
(vi) Within certain limitations, the language-comprehension system can adapt the weight of evidence in the light of system-internal or system-external noise. The degrees of freedom in language comprehension are much greater than in language production.
References
Ainsworth-Darnell, K., Shulman, H., and Boland, J. 1998. Dissociating brain responses to syntactic and semantic anomalies: Evidence from event-related potentials. Journal of Memory and Language 38, 112–130.
Allport, D. A. 1985. Distributed memory, modular subsystems and dysphasia. In S. K. Newman and R. Epstein (eds.), Current Perspectives in Dysphasia. Churchill Livingstone.
Altmann, G. T. M., and Steedman, M. 1988. Interaction with context during human sentence processing. Cognition 30, 191–238.
Boland, J. E., and Cutler, A. 1996. Interaction with autonomy: Multiple output models and the inadequacy of the great divide. Cognition 58, 309–320.
Bresnan, J. W. 2001. Lexical-Functional Syntax. Blackwell.
Bookheimer, S. 2002. Functional MRI of Language: New approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience 25, 151–188.
Brown, C., and Hagoort, P. 1993. The processing nature of the N400: Evidence from masked priming. Journal of Cognitive Neuroscience 5, 34–44.
Brown, C., and Hagoort, P. (eds.). 1999. The Neurocognition of Language. Oxford University Press.
Caplan, D., Hildebrandt, N., and Makris, N. 1996. Location of lesions in stroke patients with deficits in syntactic processing in sentence comprehension. Brain 119, 933–949.
Chomsky, N. 1995. The Minimalist Program. MIT Press.
Cutler, A., and Clifton, C. E. 1999. Comprehending spoken language: A blueprint of the listener. In C. M. Brown and P. Hagoort (eds.), The Neurocognition of Language. Oxford University Press.
Chwilla, D. J., Brown, C. M., and Hagoort, P. 1995. The N400 as a function of the level of processing. Psychophysiology 32, 274–285.
Clark, H. H. 1996. Using Language. Cambridge University Press.
Coulson, S., King, J. W., and Kutas, M. 1998. Expect the unexpected: Event-related brain response to morphosyntactic violations. Language and Cognitive Processes 13, 21–58.
Duncan, J., and Miller, E. K. 2002. Cognitive focus through adaptive neural coding in the primate prefrontal cortex. In R. T. Knight (ed.), Principles of Frontal Lobe Function. Oxford University Press.
Ericsson, K. A., and Kintsch, W. 1995. Long-term working memory. Psychological Review 102, 211–245.
Fodor, J. D. 1983. The Modularity of Mind. MIT Press.
Frazier, L. 1987. Sentence processing: A tutorial review. In M. Coltheart (ed.), Attention and Performance XII. Erlbaum.
Friederici, A. D. 1995. The time course of syntactic activation during language processing: A model based on neuropsychological and neurophysiological data. Brain and Language 50, 259–281.
Friederici, A. D. 2002. Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences 6, 78–84.
Friederici, A. D., Hahne, A., and Mecklinger, A. 1996. Temporal structure of syntactic parsing: Early and late event-related brain potential effects. Journal of Experimental Psychology: Learning, Memory, and Cognition 22, 1219–1248.
Friederici, A. D., Pfeifer, E., and Hahne, A. 1993. Event-related brain potentials during natural speech processing: Effects of semantic, morphological and syntactic violations. Cognitive Brain Research 1, 183–192.
Gibson, E. 1998. Linguistic complexity: Locality of syntactic dependencies. Cognition 68, 1–76.
Hagoort, P. 2003. Interplay between syntax and semantics during sentence comprehension: ERP effects of combining syntactic and semantic violations. Journal of Cognitive Neuroscience 15, 883–899.
Hagoort, P., and Brown, C. M. 1994. Brain responses to lexical ambiguity resolution and parsing. In K. Rayner (ed.), Perspectives on Sentence Processing. Erlbaum.
Hagoort, P., and Brown, C. M. 2000a. ERP effects of listening to speech compared to reading: the P600/SPS to syntactic violations in spoken sentences and rapid serial visual presentation. Neuropsychologia 38, 1531–1549.
Hagoort, P., and Brown, C. M. 2000b. ERP effects of listening to speech: Semantic ERP effects. Neuropsychologia 38, 1518–1530.
Hagoort, P., Brown, C. M., and Groothusen, J. 1993. The Syntactic Positive Shift (SPS) as an ERP measure of syntactic processing. Language and Cognitive Processes 8, 439–483.
Hagoort, P., Brown, C., and Osterhout, L. 1999. The neurocognition of syntactic processing. In P. Hagoort (ed.), Neurocognition of Language. Oxford University Press.
Hagoort, P., Hald, L., Bastiaansen, M., and Petersson, K. M. 2004. Integration of Word Meaning and World Knowledge in Language Comprehension. Science 304, 438–441.
Hagoort, P., Wassenaar, M., and Brown, C. 2003a. Real-time semantic compensation in patients with agrammatic comprehension: electrophysiological evidence for multiple-route plasticity. Proceedings of the National Academy of Sciences of the United States of America 100, 4340–4345.
Hagoort, P., Wassenaar, M., and Brown, C. M. 2003b. Syntax-related ERP-effects in Dutch. Cognitive Brain Research 16, 38–50.
Hahne, A., and Jescheniak, J. D. 2001. What’s left if the Jabberwock gets the semantics? An ERP investigation into semantic and syntactic processes during auditory sentence comprehension. Cognitive Brain Research 11, 199–212.
Harbusch, K., and Kempen, G. 2002. A quantitative model of word order and movement in English, Dutch and German complement constructions. Paper presented at 19th International Conference on Computational Linguistics, San Francisco.
Indefrey, P. 2003. Hirnaktivierungen bei syntaktischer Sprachverarbeitung: Eine Meta-Analyse. In G. Rickheit and H. M. Mueller (eds.), Neurokognition in der Sprache. Stauffenburg.
Indefrey, P., and Cutler, A. 2004. Prelexical and lexical processing in listening. In M. D. Gazzaniga (ed.), The Cognitive Neurosciences, third edition. MIT Press.
Jackendoff, R. 1999. The representational structures of the language faculty and their interactions. In P. Hagoort (ed.), The Neurocognition of Language. Oxford University Press.
Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press.
Jackendoff, R. 2003. Précis of foundations of language: Brain, meaning, grammar, evolution. Behavioral and Brain Sciences 26, 651–707.
Joshi, A. K., and Schabes, Y. 1997. Tree-adjoining grammars. In A. Salomma and G. Rosenberg (eds.), Handbook of Formal Languages and Automata, volume 3. Springer-Verlag.
Kaan, E., Harris, A., Gibson, E., and Holcomb, P. 2000. The P600 as an index of syntactic integration difficulty. Language and Cognitive Processes 15, 159–201.
Kaan, E., and Swaab, T. Y. 2003a. Electrophysiological evidence for serial sentence processing: A comparison between non-preferred and ungrammatical continuations. Cognitive Brain Research 17, 621–635.
Kaan, E., and Swaab, T. Y. 2003b. Repair, revision and complexity in syntactic analysis: An electrophysiological differentiation. Journal of Cognitive Neuroscience 15, 98–110.
King, J. W., and Kutas, M. 1995a. A brain potential whose latency indexes the length and frequency of words. Newsletter of the Center for Research in Language 10, 3–9.
King, J. W., and Kutas, M. 1995b. Who did what and when? Using word- and clause-level ERPs to monitor working memory usage in reading. Journal of Cognitive Neuroscience 7, 376–395.
Kintsch, W. 1998. Comprehension: A Paradigm for Cognition. Cambridge University Press.
Kluender, R., and Kutas, M. 1993. Subjacency as a processing phenomenon. Language and Cognitive Processes 8, 573–633.
Knight, R. T., and Stuss, D. T. 2002. Prefrontal cortex: The present and the future. In R. T. Knight (ed.), Principles of Frontal Lobe Function. Oxford University Press.
Kuperberg, G. R., Sitnikova, T., Caplan, D., and Holcomb, P. J. 2003. Electrophysiological distinctions in processing conceptual relationships within simple sentences. Cognitive Brain Research 17, 117–129.
Kutas, M., and Hillyard, S. A. 1980. Reading senseless sentences: Brain potentials reflect semantic anomaly. Science 207, 203–205.
Kutas, M., and Van Petten, C. K. 1994. Psycholinguistics electrified: Event-related brain potential investigations. In M. A. Gernsbacher (ed.), Handbook of Psycholinguistics. Academic Press.
Levelt, W. J. M. 1999. Producing spoken language: A blueprint of the speaker. In C. M. Brown and P. Hagoort (eds.), The Neurocognition of Language. Oxford University Press.
MacDonald, M. C., Pearlmutter, N. J., and Seidenberg, M. S. 1994. Lexical nature of syntactic ambiguity resolution. Psychological Review 101, 676–703.
Marslen-Wilson, W. 1989. Access and integration: Projecting sound onto meaning. In W. Marslen-Wilson (ed.), Lexical Representation and Process. MIT Press.
Marslen-Wilson, W., Brown, C. M., and Tyler, L. K. 1988. Lexical representations and language comprehension. Language and Cognitive Processes 3, 1–16.
Marslen-Wilson, W., and Tyler, L. K. 1980. The temporal structure of spoken language understanding. Cognition 8, 1–71.
McKinnon, R., and Osterhout, L. 1996. Constraints on movement phenomena in sentence processing: Evidence from event-related brain potentials. Language and Cognitive Processes 11, 495–523.
Mesulam, M.-M. 2002. The human frontal lobes: Transcending the default mode through contingent encoding. In R. T. Knight (ed.), Principles of Frontal Lobe Function. Oxford University Press.
Miller, E. K. 2000. The prefrontal cortex and cognitive control. Nature Review Neuroscience 1, 59–65.
Münte, T. F., and Heinze, H. J. 1994. ERP negativities during syntactic processing of written words. In H. J. Heinze, T. F. Münte, and G. R. Mangun (eds.), Cognitive Electrophysiology. Birkhäuser.
Münte, T. F., Heinze, H. J., and Mangun, G. R. 1993. Dissociation of brain activity related to syntactic and semantic aspects of language. Journal of Cognitive Neuroscience 5, 335–344.
Münte, T. F., Matzke, M., and Johannes, S. 1997. Brain activity associated with syntactic incongruities in words and pseudo-words. Journal of Cognitive Neuroscience 9, 300–311.
Neville, H., Nicol, J. L., Barss, A., Forster, K. I., and Garrett, M. F. 1991. Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience 3, 151–165.
O’Seaghdha, P. G. O. 1997. Conjoint and dissociable effects of syntactic and semantic context. Journal of Experimental Psychology 23, 807–828.
Osterhout, L. 1997. On the brain response to syntactic anomalies: Manipulations of word position and word class reveal individual differences. Brain and Language 59, 494–522.
Osterhout, L., Bersick, M., and McKinnon, R. 1997. Brain potentials elicited by words: Word length and frequency predict the latency of an early negativity. Biological Psychology 46, 143–168.
Osterhout, L., and Hagoort, P. 1999. A superficial resemblance doesn’t necessarily mean you’re part of the family: Counterarguments to Coulson, King, and Kutas (1998) in the P600/SPS debate. Language and Cognitive Processes 14, 1–14.
Osterhout, L., and Holcomb, P. J. 1992. Event-related brain potentials elicited by syntactic anomaly. Journal of Memory and Language 31, 785–806.
Osterhout, L., and Holcomb, P. J. 1993. Event-related potentials and syntactic anomaly: Evidence of anomaly detection during the perception of continuous speech. Language and Cognitive Processes 8, 413–438.
Osterhout, L., and Holcomb, P. J. 1995. Event-related potentials and language comprehension. In M. G. H. Coles (ed.), Electrophysiology of Mind. Oxford University Press.
Osterhout, L., Holcomb, P. J., and Swinney, D. A. 1994. Brain potentials elicited by garden-path sentences: Evidence of the application of verb information during parsing. Journal of Experimental Psychology: Learning, Memory, and Cognition 20, 786–803.
Osterhout, L., and Mobley, L. A. 1995. Event-related brain potentials elicited by failure to agree. Journal of Memory and Language 34, 739–773.
Patel, A. D. 2003. Language, music, syntax and the brain. Nature Neuroscience 6, 674–681.
Petersson, K. M., Forkstam, C., and Ingvar, M. 2004. Artificial syntactic violations activate Broca’s region. Cognitive Science 28, 383–407.
Petrides, M., and Pandya, D. N. 2002. Association pathways of the prefrontal cortex and functional observations. In R. T. Knight (ed.), Principles of Frontal Lobe Function. Oxford University Press.
Rösler, F., Friederici, A. D., Pütz, P., and Hahne, A. 1993. Event-related brain potentials while encountering semantic and syntactic constraint violations. Journal of Cognitive Neuroscience 5, 345–362.
Saffran, E., and Sholl, A. 1999. Clues to the functional and neural architecture of word meaning. In C. M. Brown and P. Hagoort (eds.), The Neurocognition of Language. Oxford University Press.
Stalnaker, R. C. 1978. Assertion. In P. Cole (ed.), Syntax and Semantics 9: Pragmatics. Academic Press.
Tanenhaus, M. K., and Trueswell, J. C. 1995. Sentence comprehension. In P. D. Eimas (ed.), Speech, Language, and Communication. Academic Press.
Thompson-Schill, S. L., D’Esposito, M., and Kan, E. P. 1999. Effects of repetition and competition on activity in left prefrontal cortex during word generation. Neuron 23, 513–522.
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. 1994. Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33, 285–318.
Trueswell, J. C., Tanenhaus, M. K., and Kello, C. 1993. Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory, and Cognition 19, 528–553.
Tyler, L. K., and Marslen-Wilson, W. D. 1977. The on-line effects of semantic context on syntactic processing. Journal of Verbal Learning and Verbal Behavior 16, 683–692.
Van Berkum, J. J., Brown, C. M., and Hagoort, P. 1999a. Early referential context effects in sentence processing: Evidence from event-related brain potentials. Journal of Memory and Language 41, 147–182.
Van Berkum, J. J., Hagoort, P., and Brown, C. M. 1999b. Semantic integration in sentences and discourse: Evidence from the N400. Journal of Cognitive Neuroscience 11, 657–671.
Van den Brink, D., and Hagoort, P. 2004. The influence of semantic and syntactic context constraints on lexical selection and integration in spoken-word comprehension as revealed by ERPs. Journal of Cognitive Neuroscience 16, 1068–1084.
Varela, F., Lachaux, J.-P., Rodriguez, E., and Martinerie, J. 2001. The brainweb: Phase synchronization and large-scale integration. Nature Reviews Neuroscience 2, 229–239.
Vosse, T., and Kempen, G. A. M. 2000. Syntactic structure assembly in human parsing: A computational model based on competitive inhibition and lexicalist grammar. Cognition 75, 105–143.
Zurif, E. B. 1998. The neurological organization of some aspects of sentence comprehension. Journal of Psycholinguistic Research 27, 181–190.
Zwitserlood, P. 1989. The locus of the effects of sentential-semantic context in spoken-word processing. Cognition 32, 25–64.
Index
Accessibility/salience hierarchy, 8, 9, 197, 198, 323, 324, 328, 331, 343, 359 A-Chain condition, 134–140, 148 Almor, A., 7, 197, 198, 297–319 Altmann, G., 43, 45, 68, 221, 227, 234, 240, 242, 406 Ambiguity resolution, 52, 74, 92, 94, 95, 99, 109, 226, 227, 240, 243, 286 Anaphor, 4, 7, 297–319, 323, 328, 329, 337, 341, 344, 346 SE-, 134–137, 140 SELF-, 134, 135 Anaphora, 159, 160, 163, 173, 174 backwards, 173, 174, 358 bound variable, 159, 160, 163, 164, 167, 175, 176 Anaphoric expression. See Anaphor Anderson, A., 277, 285 Anti-saccade task, 96 Ariel, M., 198, 317, 323, 324, 345 Arity reduction, 136, 138, 151 Arnold, J., 5, 198, 199, 203, 204, 206, 211, 278, 281, 282, 285, 299, 323, 324 Aslin, R. N., 69 Assume reference, 71 Audience design, 210–213 Baauw, S., 4, 139–142, 145, 147, 148 Berwick, R., 119 Binding conditions, 134, 135, 138, 140, 148. See also Binding theory Binding plasticity, 424–426 Binding principles. See Binding conditions, Binding theory
Binding problem in language, 10, 403–405, 426 in neuroscience, 403, 404 Binding theory, 4, 134, 139, 157–160, 167, 169, 175, 179, 182. See also Government and Binding Theory condition A (principle A), 134, 135, 138 condition B (principle B), 134, 137–141, 157–178, 181–187 condition C (principle C), 161, 164–167, 170–178, 186, 187 Bound variable (binding) interpretation, 158, 162–166, 173, 175, 176, 180. See also Variable binding Boyle, E., 278 Brennan, S. E., 247 Britt, M. A., 44, 67, 68, 73, 82, 227 Broca’s area, 427, 428 Brodmann’s areas, 405, 422, 428 Brown, C., 227, 407, 408, 411, 412, 425 Brown-Schmidt, S., 80, 86, 89, 97, 198, 278, 288–290, 345 Burkhardt, P., 149, 150 Byron, D., 345 Campana, E., 288–290 Caplan, D., 412, 427 Carlson, G., 54, 93, 219, 221, 240 Carston, R., 245 C-command, 31, 110, 115–119, 128, 159–167, 170, 175, 176 Centering Theory, 8, 9, 298, 326, 327, 355–361, 364, 367, 371, 372, 375–377 Chafe, W. L., 326
Chain formation, 136, 137 Chambers, C. G., 54, 93, 200, 201, 205, 206, 219, 220, 240 Chien, Y.-C., 4, 31, 38, 133, 138, 139, 148, 151, 157, 167, 168, 173, 175, 184, 185 Chierchia, G., 175, 176 Choi, Y., 34–39 Chomsky, N., 4, 115, 134, 165, 406 Christianson, K., 3, 79, 96 Clark, H. H., 73, 197, 200, 202, 424, 429 Clifton, C. E., 363, 442 Cohort competitor effects, 70, 200, 201, 204–207, 286–290 Cohort Model, 289 Common ground, 234, 424 Communicative conventions, 239, 243–248 Competence/performance distinction, 15, 109, 115, 127, 128 Complement set focus, 382–399 Continuity Assumption, 43, 54, 60, 126 Contrastive inference, 240–245, 264, 265 Conversational implicatures, 244–246 Coopmans, P., 139–141, 147, 148 Cooreman, A., 362 Corblin, F., 385, 386 Coreference delay, 157, 158, 179, 184, 187 Coulson, S., 411 Covariation theory, 385 Cowles, W., 302 Crain, S., 43, 45, 51, 68, 86, 98, 112, 116, 240 Crawley, R., 325, 326 Cross-modal lexical priming, 149 Cue multiple-, theory, 30–33, 38 reliability, 31, 32 Cuetos, F., 141, 145 Cutler, A., 422, 427 Dahan, D., 200, 201, 205, 206 Dawydiak, E. J., 382, 390–397 DeCristofaro, J., 362–365 Delay of Principle B, 16, 17, 27, 37, 133, 138, 140, 141
Dell, G., 300 Demonstrative. See Pronoun, demonstrative D’Esposito, 426 Determiner, 9, 10 definite, 24 development of, 17, 18, 25, 26 indefinite, 24, 25 De Villiers, J., 120, 121 Dialogue, 273, 276, 277, 285–291 Diesing, M., 124, 125 Discourse coherence, 355–359, 363, 364, 372, 375 Discourse deictic. See Pronoun, discourse deictic Discourse model, 7, 405, 406, 424. See also Situation model Discourse Representation Theory (DRT), 9, 381, 382, 398 Disfluency, 5, 197–217 Domain specificity, 409, 410 Dowty, D., 163 Dynamic Semantics, 388, 389 Early left anterior negativity (ELAN), 413, 416 Eberhard, K. M., 1, 93, 219, 222, 240 Economy, derivational, 164 Economy hierarchy, 136–138, 150, 151 Egocentrism, 21–23, 32–35 Escobar, L., 140 Evans, G., 161, 166, 177, 182 Event-related potentials (ERPs), 407, 415 Exceptional Case Marking, 4, 133, 137–142, 145–151 Expectancy Hypothesis, 198, 199, 208–210 Fay, N., 387 Felicity conditions on negative statements, 120–122 Fernald, A., 70 Ferreira, F., 3, 79, 96, 363 Fiengo, R., 177, 182, 335 Filip, H., 93 Focus, 7, 8, 297–319 Fodor, J. D., 410
Form-based discourse contrast, 241–243 Fox, D., 164 Fox Tree, J. E., 197, 200, 202 Frazier, L., 406, 412 Free variable, 159 Frege, G., 24, 35, 36 Freudenthal, D., 278 Friederici, A., 149, 406, 410, 413 Garden-path effect, 43–47 Gardner, A., 116 Garnham, A., 298, 302, 323 Garnsey, S. M., 68, 82 Garrod, S., 276–291, 298–301 Generalized Dynamic Quantifier Logic, 388 Generalized implicature, 157 Generalized Quantifier Theory, 388 Gibson, E., 202 Givenness hierarchy, 359 Givón, T., 323, 345, 356, 363, 371 Gleitman, L., 86, 89, 95, 99 Glenberg, A., 335 Gordon, P., 301, 302 Government and Binding Theory, 4, 134, 137, 139, 141, 148 Grice, H. P., 6, 242–246, 302 Grimshaw, J., 173, 185 Grodzinsky, Y., 2, 4, 27, 28, 133, 138, 139, 141, 142, 148, 150, 151, 157, 158, 162, 163, 167–181, 184–187 Grosz, B., 298, 326, 356, 357 Gualmini, A., 116, 121, 125, 126 Guasti, M. T., 175, 176 Guises, 162, 163, 169, 174, 175, 178–183, 187 Gundel, J., 323, 345, 359 Hagoort, P., 407–411, 414, 416, 425 Hahne, A., 149, 413 Hahn, U., 326–335, 344, 346 Halliwell, J., 3, 79, 96 Halmari, H., 329, 331 Hamburger, H., 51 Hän, 328–347 Hanna, J., 246 Head-mounted eye-tracking, 1, 5
Hedberg, N., 323, 359 Heim, I., 24, 35, 158, 159, 162, 169, 173, 177–183, 327 Heinze, H. J., 410 Henderson, A., 280 Hildebrandt, N., 427 Hill, N. M., 1, 74 Hillyard, S. A., 407 Hoffman, B., 326–331, 334, 335, 344, 346 Holcomb, P., 408, 411 Hollingworth, A., 3, 79, 96 Hornstein, N., 119 Hurewitz, F., 31, 86, 89–91 Hustinx, L., 317 Identity debate context, 179–183 Immediacy model, 412–416, 429, 430 Implicit causality, 359 Incremental interpretation, 219, 220, 232–234 Indefrey, P., 427 Informational Load Hypothesis (ILH), 7, 8, 302–305, 312, 315, 316 Information structure, 324–347 Interactive alignment, 273, 276, 277, 285–291 Intrasentential coreference. See Rule I Jackendoff, R., 65, 404, 406, 412, 418, 426 Jescheniak, J. D., 413 Johnson-Laird, P., 335 Joshi, A., 298, 326, 356 Jurafsky, D., 245 Kaan, E., 412, 416 Kaiser, E., 332, 338, 342, 346, 347 Kameyama, M., 355, 358, 359 Kamp, H., 9, 381, 382 Kan, E. P., 426 Karjalainen, M., 330 Karmiloff-Smith, A., 18–23, 37, 78, 81, 91, 92, 94 Keenan, E., 159 Kehler, A., 364 Kello, C., 67
Kempen, G., 10, 407, 416 Kennison, S., 284 Kibble, R., 388, 389, 392, 393, 398, 399 Kim, A., 68 Kindergarten-path effect, 29, 30, 33, 34, 74–77, 91 Kintsch, W., 335, 422, 424 Klima, E. S., 395 Knight, R. T., 405 Kruley, P., 335 Kutas, M., 407 Lambda (λ)-predicate (-operator), 159, 176 Langston, W. E., 335 Left anterior negativity (LAN), 410, 412, 416, 420, 422 Left inferior frontal gyrus, 96 Levelt, W. J. M., 405 Levinson, S., 243–245 Lidz, J., 117, 118, 121–123, 128 Liversedge, S., 316 Logrip, M., 1, 74 Lotocky, M. A., 67 Lucas, A., 280 Lyons, C. G., 79, 84 MacDonald, M. C., 65, 413 Makris, N., 427 Manzini, R., 138 Maratsos, M., 20–23, 32, 78, 81, 91–93, 97, 98 Marslen-Wilson, W., 412, 413 Matthei, E. M., 51 Matthewson, L., 27 Maturation in child language, 16, 28, 29 Maximality, 17, 25, 28–30, 33–37, 78, 92, 94, 95 Maxim of manner, 247 Maxim of quantity, 6, 243, 302, 303 May, R., 177, 182, 335 McClelland, J., 68 McCoy, K., 362–364 McKee, C., 138, 140 McKoon, G., 300 Mendelsohn, A., 79, 96
Mental model, 8, 323, 336–339, 343, 344, 346, 347 Meroni, L., 98 Mesulam, M.-M., 404, 426, 427 Metzing, C., 247 Miller, E., 427 Milsark, G. L., 124 Minimize interpretive options, 164, 165 Modularity, 410 Monologue, 276–278, 285–290 Moxey, L., 381–387, 390–397 Münte, T. F., 410 Musolino, J., 112–128 Myers, E., 67 N400, 407–416, 422–426 Neo-Gricean, 244 Newport, E. L., 69 Nouwen, R., 383, 389–393 Novick, J. M., 68, 82, 95, 96 Observation of Isomorphism. See Quantifier scope, isomorphic interpretation of Operator disjunction, 116 scope, 109 Optimality, 158, 163 Optimality Theoretic Semantics, 389–392 Optional infinitive stage, 16 Osterhout, L., 408, 409, 411, 412 P600, 408–412, 416, 420, 422, 425 Pandya, D. N., 405 Papafragou, A., 34–39, 94 Partee, B., 116 Partitive, 125, 126, 128 Paterson, K., 390, 391 Pattern recognition, 71 Pearlmutter, N. J., 65, 67 Pearson, J., 316 Petrides, M., 405 Philip, W., 139–141, 147, 148 Piaget, J., 21 Pickering, M., 276, 277, 285–287, 290, 291
Picture-selection task, 143, 148 Piñango, M., 149, 150 Pinker, S., 126 Pinto, J. P., 70 Pragmatic-based discourse contrast (Gricean inference), 243–248, 260–267 Prasad, R., 326 Prefrontal cortex (PFC), 404, 405, 426–428 inferior, 426 lateral, 405, 422 left, 95 orbitofrontal, 405 Primitives of Binding, 134–138, 141, 148, 150 Prince, E. F., 198, 326, 327 Principle A. See Binding theory, condition A Principle B. See Binding theory, condition B Processing continuity, 69 Pronoun. See also Anaphor, Anaphora and salience, 323–325, 328, 334, 338, 342–347 chain, 347 demonstrative, 323–347 discourse deictic, 329, 332, 337, 341, 344 prompt, 325 Quantifier, 9, 10 negative, 381–385, 388, 393–397 positive, 381–385, 395, 398 Quantifier scope ambiguity, 110, 111, 126, 127 isomorphic interpretation of, 110–128 and negation, 110–112, 120, 121, 125–127 Rabbin, B., 116 Radvansky, G. A., 274 Rambow, O., 326 Ratcliff, R., 300 Reference anaphoric, 97 deictic, 91, 97–99
Reference set, 168–172, 182 Referent accessibility, 371, 374 Referential contrast. See Contrastive inference Referential domain, 68, 93, 99, 100 Referential processing, 273–291 Referential Success, Principle of, 45–55, 58, 60, 226 Referential Theory, 43, 45, 47, 54 Reflexivity, 134–138, 141, 148, 150 Reinhart, T., 2, 4, 27, 28, 134, 135, 139, 140, 147, 148, 157–187 Repeated-name penalty, 301, 302 Repetition priming, 347 Representational interfacing, 70 Representational modularity, 67, 70 Restrictive modifiers, 233, 234, 239–243, 247, 259, 264 Reuland, E., 4, 134–140, 147, 148 R-expression, 165, 166 Reyle, U., 9, 381, 382 Rooth, M., 242 Rosen, S. T., 173, 185 Rule I, 139, 150, 158, 159, 162–182, 185–187 Runner, J. T., 284, 285 Saarimaa, E. A., 329, 330 Saffran, J. R., 69, 71 Salience, 8, 335, 338 Sanford, A., 280, 299–301, 362, 381–397 Scalar modifiers, 242–244, 247–251, 254–255, 258–265 Schaeffer, J., 27 Scope ambiguity. See Quantifier scope ambiguity Sedivy, J., 1, 54, 55, 68, 219, 220, 234, 235, 240, 241, 243, 248 Seidenberg, M., 65 Sekerina, I., 1, 74, 149 Semantic binding, 405, 408, 413, 415, 416, 421–428 Simons, W., 317 Situation model, 273–277, 280–291. See also Discourse model Smith, E. E., 168 Snedeker, J., 82, 94
Spivey-Knowlton, M., 1, 68, 82, 219, 240 Stalnaker, R. C., 424 Starving beast, 274–276. See also Fido Steedman, M., 43, 45, 46, 68, 240, 242 Stenning, K., 274, 275 Stevenson, R., 325–327, 359, 360 Strong crossover, 167 Strube, M., 326–335, 344, 346 Structural ambiguity resolution. See Ambiguity resolution Structured meaning, 163, 177, 180, 181, 187 Sturt, P., 283, 284 Stuss, D. T., 405 Subset principle, 119, 120 Sulkala, H., 330 Superior temporal gyrus, 427, 428 Supposition-Denial theory, 388, 394–399 Suri, L., 362, 363 Swaab, T. Y., 416 Swingley, D., 70, 149 Synchronicity of neuronal firing, 404 Syntactic binding, 405, 420, 422, 426, 428 Syntactico-semantic level, 339, 343, 344, 346, 347 Syntactic positive shift (SPS). See P600 Syntax-first model, 407, 412–416, 421, 429 Tager-Flusberg, H., 120, 121 Tämä, 328–347 Tanenhaus, M. K., 1, 6, 45, 54, 65, 67, 68, 78, 82, 93, 200, 201, 205, 206, 273, 287, 288, 219, 220, 227, 229, 240, 345 Tantalou, N., 94 Taraban, R., 68 Ter Meulen, A., 116 Terras, M., 280 Theta Assignment Principle, 44–50, 55, 60 Thompson-Schill, S., 82, 95, 426 Thornton, R., 4, 86, 112, 158, 168–187 Thorpe, K., 89 Topicality, 356, 363, 371
Trueswell, J. C., 1–3, 29–39, 43–51, 54–59, 65, 67, 68, 77–82, 86, 89, 90, 94, 95, 99 Truth Value Judgment task (TVJT), 86, 112, 116, 121, 142, 144, 145, 148 Turan, U. D., 326, 331 Tyler, L., 413 Typicality effect for NP anaphors, 301, 302, 305–319 Unification model, 416–420, 427, 428 Uniqueness, 26 Universal Grammar (UG), 17 Vallduví, E., 327, 331 Van Berkum, J. J., 408, 412, 422–424 Van den Berg, M., 388 Van den Brink, D., 414 Van Dijk, T. A., 335 Van Gompel, R., 316 Varela, F., 404 Variable binding, 4, 157, 159–167, 171, 175, 179. See also Bound variable interpretation Verb phrase (VP) ellipsis, 164, 176, 178, 186 Villalta, E., 127 Visual binding, 404 Visual-world eye-tracking paradigm, 195–293 Vonk, W., 317 Vosse, T., 10, 407, 416 Walker, M. A., 326 Wall, R., 116 Wason, P., 120 Wasow, T., 197, 200, 202 Wassenaar, M., 425 Watson, D., 202 Weak crossover, 167 Weinstein, S., 298, 326, 356 Wexler, K., 4, 16, 17, 22, 27, 28, 31, 37, 38, 78, 85, 91–94, 133, 138, 139, 148, 151, 157, 158, 167–187 Williams, C., 387 Wisconsin Card Sorting task, 96 Wong, K., 149
Working memory, 3, 50, 79, 158, 168–171, 178, 179, 186, 274, 299, 303–305, 318, 405, 411, 412, 416, 424 Zacharski, R., 323, 359 Zurif, E. B., 149 Zwaan, R., 274 Zwitserlood, P., 412