Reference
NEW DIRECTIONS IN COGNITIVE SCIENCE Series Editors FRANCIS JEFFRY PELLETIER Simon Fraser University ANDREW BROOK Carleton University
Metarepresentations: A Multidisciplinary Perspective Dan Sperber Common Sense, Reasoning, and Rationality Edited by Renée Elio Reference: Interdisciplinary Perspectives Edited by Jeanette K. Gundel and Nancy Hedberg
Reference: Interdisciplinary Perspectives
Edited by Jeanette K. Gundel and Nancy Hedberg
2008
Oxford University Press, Inc., publishes works that further Oxford University’s objective of excellence in research, scholarship, and education. Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Copyright © 2008 by Oxford University Press, Inc. Published by Oxford University Press, Inc. 198 Madison Avenue, New York, New York 10016 www.oup.com Oxford is a registered trademark of Oxford University Press All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press. Library of Congress Cataloging-in-Publication Data Reference : interdisciplinary perspectives / edited by Jeanette K. Gundel and Nancy Hedberg. p. cm. — (New directions in cognitive science) Includes bibliographical references and index. ISBN 978-0-19-533163-9 1. Reference (Linguistics) 2. Reference (Philosophy) I. Gundel, Jeanette K. II. Hedberg, Nancy A. P325.5.R44R395 2008 415—dc22 2007011888
Printed in the United States of America on acid-free paper
Contents

Contributors

1. Introduction
Jeanette K. Gundel and Nancy Hedberg

PART I: What Is Reference?
2. On Referring and Not Referring
Kent Bach

PART II: What Is the Appropriate Linguistic Analysis of Different Forms of Referring Expression?
3. Issues in the Semantics and Pragmatics of Definite Descriptions in English
Barbara Abbott
4. Equatives and Deferred Reference
Gregory Ward

PART III: How Is Reference Resolved?
5. Rethinking the SMASH Approach to Pronoun Interpretation
Andrew Kehler
6. Good-Enough Representation in Plural and Singular Pronominal Reference
Sungryong Koh, Anthony J. Sanford, Charles Clifton Jr., and Eugene J. Dawydiak

PART IV: How Do We Select Forms of Referring Expression?
7. The Overlapping Distribution of Personal and Demonstrative Pronouns
Donna K. Byron, Sarah Brown-Schmidt, and Michael K. Tanenhaus
8. Reference, Centers, and Transitions in Spoken Spanish
Maite Taboada
9. Linguistic Claims Formulated in Terms of Centering
Massimo Poesio
10. Looking Both Ways
Alan Garnham and H. Wind Cowles

Index
Contributors
Barbara Abbott: Department of Linguistics and Germanic, Slavic, Asian and African Languages, and Department of Philosophy, Michigan State University
Kent Bach: Department of Philosophy, San Francisco State University
Sarah Brown-Schmidt: Department of Brain and Cognitive Sciences, University of Rochester
Donna K. Byron: Department of Computer Science and Engineering, The Ohio State University
Charles Clifton Jr.: Department of Psychology, University of Massachusetts at Amherst
H. Wind Cowles: Program in Linguistics, University of Florida
Eugene J. Dawydiak: Department of Psychology, University of Glasgow
Alan Garnham: Department of Psychology, University of Sussex
Jeanette K. Gundel: Department of Linguistics, University of Minnesota
Nancy Hedberg: Department of Linguistics, Simon Fraser University
Andrew Kehler: Department of Linguistics, University of California, San Diego
Sungryong Koh: Seoul National University, Korea
Massimo Poesio: Center for Mind/Brain Sciences, Università di Trento, Department of Computer Science, University of Essex
Anthony J. Sanford: Department of Psychology, University of Glasgow
Maite Taboada: Department of Linguistics, Simon Fraser University
Michael K. Tanenhaus: Department of Brain and Cognitive Sciences and Department of Linguistics, University of Rochester
Gregory Ward: Department of Linguistics, Northwestern University
1
Introduction
Jeanette K. Gundel and Nancy Hedberg
This volume grew out of a conference on “Discourse Processing: Reference,” which was held in Vancouver, British Columbia, Canada, in February 2003 and was cosponsored by the Cognitive Science Program of Simon Fraser University and the Cognitive Systems Program of the University of British Columbia. The conference was the twelfth in a series of “Vancouver Studies in Cognitive Science” conferences. Jeanette Gundel, who was guest coordinator of the conference, chose the theme and invited the speakers. We owe a great debt of gratitude to Martin Hahn, of Simon Fraser University’s Philosophy Department, for helping to organize the conference. Some of the speakers at that conference are not represented here (Jennifer Arnold, David Beaver, David Braun, Craig Chambers, and Ron Zacharski) since they did not submit their papers for inclusion in the volume. Andrew Kehler’s paper was not presented at the conference, as he was unable to attend. The volume also includes papers from researchers who were not invited speakers at the conference, but whose contributions were invited for inclusion in the volume (Alan Garnham and H. Wind Cowles; Sungryong Koh, Tony Sanford, Charles Clifton Jr., and Eugene J. Dawydiak; and Massimo Poesio).
The Study of Reference as Cognitive Science

The production and understanding of reference is an essential feature of human cognition. It comprises the ability to think of and represent objects
(both real and imagined/fictional), to indicate to others which of these objects we are talking about, and to determine what others are talking about when they use a (pro)nominal expression. The subject of reference is one of the best examples of an area of study that Cognitive Science as an interdisciplinary field of study excels in exploring. It has been the focus of study in virtually all the cognitive sciences—philosophy (including philosophy of language and mind, logic, and formal semantics), theoretical and computational linguistics, and cognitive psychology. It is impossible to discuss reference without bringing into play perspectives from more than one discipline. The study of reference in semantics and pragmatics essentially involves linguistics and philosophy; the study of psychological reference processing involves linguistics as well as psychology, and the study of computational reference processing involves linguistics and computer science, as well as quite often triggering questions about the psychological reality of the mechanisms proposed. Researchers from different disciplines study reference from different perspectives and are informed by different research traditions. While the participants in this volume were primarily trained in one of the four represented disciplines—computer science, linguistics, philosophy, and psychology—and use methodologies typical of that discipline, each one of them bridges more than one discipline in their work and their approach. Thus, Kent Bach and Barbara Abbott apply primarily philosophical methods and pose traditional philosophical questions, but both are influenced by linguistics. Conversely, Gregory Ward applies primarily linguistic methods but is influenced and informed by work in philosophy of language. 
Andrew Kehler and Maite Taboada also come from a linguistic perspective but focus their attention on computational algorithms, while Massimo Poesio comes from a computational perspective but focuses his attention on linguistic ramifications of a computational algorithm. The team of Sungryong Koh, Anthony J. Sanford, Charles Clifton Jr., and Eugene J. Dawydiak were primarily trained in psychology and apply experimental methods, but focus on psycholinguistic models that bridge psychology and linguistics. Some of the papers are coauthored by people trained in different disciplines. Thus H. Wind Cowles, who was trained as a linguist, coauthors with Alan Garnham, a psychologist; and Donna Byron, a computer scientist/computational linguist, coauthors with psycholinguists Michael Tanenhaus and Sarah Brown-Schmidt. The papers in the volume also employ methodologies characteristic of a number of different cognitive science disciplines. The chapters by Abbott, Bach, Ward, and Kehler rely primarily on intuitive judgments of the investigators, a methodology characteristic of theoretical linguistics and philosophy of language; the papers by Koh, Sanford, Clifton, and Dawydiak, and by Garnham and Cowles employ an experimental methodology characteristic of psychology and psycholinguistics; and the papers by Poesio, Taboada, and Byron, Tanenhaus and Brown-Schmidt employ a corpus-analytic methodology that is characteristic of work in natural language processing/ computational linguistics as well as linguistics.
Questions about Reference Posed in This Volume

The chapters in this volume are organized according to the primary questions about reference that they attempt to answer. The most general question is “what is reference?” The next most general question is “what is the appropriate linguistic analysis of different referring forms?” More specific questions are “how is reference resolved?” and “how do speakers/writers select appropriate expressions with which to encode referents?” These latter two questions involve the two complementary sides of the reference processing issue: the interpretation or comprehension side and the generation or production side, respectively.
What Is Reference?

In his chapter, “On Referring and Not Referring,” Kent Bach takes a philosophical perspective on basic questions such as “what is reference?” “what kind of expression can be used to refer?” and “how is reference accomplished?” Bach agrees with Russell and Strawson that expressions, including singular terms such as definite descriptions, are not linguistically (semantically) referential. He also accepts Strawson’s position that referring is something a speaker does, not something an expression does, although he takes a more conservative and interpersonally oriented position about what kinds of expressions can be used to refer and what counts as speaker reference. Bach views speaker reference as part of an act of communication, that is, as inherently audience-directed, whereby a speaker uses an expression to refer an audience to an individual. He proposes that any expression that can be used to refer can also be used non-referentially and that descriptive reference (Donnellan’s attributive use) is not genuine reference because to refer to something or to understand a reference to it requires being able to have singular thoughts about it and we cannot have a singular thought about an individual we can “think of” only under a description. Having a singular thought about a referent requires perceiving it, being informed of it, or remembering it; that is, there must be a representational connection between thought and object. Bach does not, however, take the more restrictive view associated with Bertrand Russell that limits this connection to personal acquaintance. Among the expressions that cannot be used to refer, Bach includes indefinites, specific as well as nonspecific uses. 
In Bach’s view, definite descriptions, demonstratives, proper names, and definite pronouns can all be used referentially, but they can also be used non-referentially, for example, in attributive uses of definite descriptions and when the speaker intends to refer to something but there is no such thing. Bach argues that context does not determine reference. Rather, it is exploited by the audience to ascertain the reference, partly by being so intended.
What Is the Appropriate Linguistic Analysis of Different Forms?

In her chapter “Issues in the Semantics and Pragmatics of Definite Descriptions in English,” Barbara Abbott addresses the question of which meanings associated with the English definite article the are an essential part of the semantics of this determiner and which are pragmatically derived. She presents arguments supporting the view that uniqueness is part of the conventional meaning of the English definite article, while familiarity is pragmatically derived, and argues that approaches which take familiarity as essential are incorrect.

In “Equatives and Deferred Reference,” Gregory Ward is concerned with a special metonymic type of reference where an expression is used to refer to an entity related to, but not directly denoted by, the conventional meaning encoded in the expression. The chapter focuses specifically on such “deferred reference” in identity statements such as “I’m the Pad Thai,” which he calls “deferred equatives.” Ward argues that deferred equatives constitute a distinct subtype of deferred reference with its own specific properties and that current theories of deferred reference will have to be revised to take account of this fact. While he agrees with previous researchers that felicitous use of deferred reference requires a contextually licensed correspondence from one object to another, he claims that such a correspondence applies differently to equatives and non-equatives. He also challenges previous claims that deferred reference involves meaning or sense transfer rather than reference transfer and argues that in deferred equatives the deferred interpretation results in a shift in meaning of the copula alone. He argues that in the case of equatives the pragmatic mapping between set members is explicitly encoded by the two NPs of the equative construction and no reference transfer is involved. 
For non-equatives, the mapping is implicit and only one of the mapped members is explicitly evoked in the discourse. Since both mapped entities are explicitly evoked within the equative, their meanings remain intact.
How Is Reference Resolved?

Andrew Kehler’s chapter, “Rethinking the SMASH Approach to Pronoun Interpretation,” is concerned with how reference is resolved and specifically with the interpretation of pronouns. Kehler argues on both empirical and conceptual grounds that the SMASH approach (Search, Match, and Select with Heuristics) that has dominated the psychological and computational linguistics literature is incapable of explaining or even adequately describing pronoun interpretation within the human processing mechanism. He argues that the amount of processing that would have to occur in the Search and Match components and the morphologically based preferences of the Select phase is at odds with the fact that pronouns actually facilitate processing. He also discusses examples that cast doubt on the Search and Match paradigm, since a pronoun cannot be used felicitously even when it is the only one that
satisfies the Match constraints. Kehler suggests that a variety of surface level preferences that have been posited in the literature are either epiphenomenal (e.g., the preference for grammatical role parallelism) or derivative (e.g., the preference for subject assignment) and proposes an account that embeds pronoun interpretation within a larger model of discourse processing. This model takes into account crucial interactions between information structure (including attentional state) and inferential processing mechanisms that underlie the establishment of coherence in discourse. Since the complexity in the data results from these processes and not from pronoun interpretation itself, such an account would explain how pronouns can actually facilitate processing by expressing topic continuance. Kehler also provides a critical overview of Centering Theory, an example of a SMASH account, and shows that it fails to make correct predictions in some cases because the notion of coherence it captures is primarily entity based and because it does not provide an incremental method for pronoun interpretation, as the preferred interpretation for a pronoun can’t be determined until the whole sentence is processed. Koh, Sanford, Clifton, and Dawydiak, in their chapter, “Good-Enough Representation in Plural and Singular Pronominal Reference: Modulating the Conjunction Cost,” explore the question of how we can resolve anaphoric singular pronouns whose referents were introduced in a conjoined noun phrase. Processing is slower for singular pronouns referring to one of the individuals introduced in a conjoined antecedent NP than for plural pronouns referring to the conjoint referent or for singular pronouns referring to an individual introduced in a separate simple NP. This effect is known as the “conjunction cost.” Koh et al. 
find that the conjunction cost disappears when the role played by the referent of the singular pronoun is in an action that is construed as being carried out on behalf of the pair (a number-indifferent action). They suggest that the mental representation of the text may fail to distinguish whether one or two individuals carry out the action in such cases. Koh et al. also touch on the question of selection of form of referring expression in that in number-sensitive cases, the conjunction cost disappears when a proper name is used instead of a pronoun. They suggest that the full NP, unlike the pronoun, cues the shift in discourse theme (topic) that results from the shift from plural to singular reference and so facilitates the shift.
How Do We Select Forms of Referring Expressions?

Byron, Brown-Schmidt, and Tanenhaus, in “The Overlapping Distribution of Personal and Demonstrative Pronouns,” explore the question of what governs the choice of personal pronouns such as it as opposed to demonstrative pronouns such as that. Byron et al. build on the claims of the Givenness Hierarchy of Gundel, Hedberg, and Zacharski, which predicts that referents of personal pronouns are assumed to be in the focus of the addressee’s attention,
while referents of demonstrative pronouns are only assumed to be activated (in working memory), and thus may be in the addressee’s focus of attention but most often are not. Byron et al. conduct both a corpus study and a psycholinguistic experiment to examine what factors allow a demonstrative pronoun to refer to an already in-focus entity and what factors affect whether a personal pronoun is used to refer to an entity that is not in focus according to syntactic criteria. For the corpus study, the group examined the TRAINS93 corpus, a transcription of a spoken dialogue, problem-solving task. They found that personal pronouns are typically used for in-focus referents but that factors such as referring to a higher order discourse topic or the global problem-solving task permit personal pronouns to be used for referents that are not brought into focus by the syntax of the preceding utterance. Demonstrative pronouns can be used for in-focus entities when the previous mention was itself a demonstrative and when the referent is a composite entity constructed from heterogeneous parts. The psycholinguistic experiment used an object-manipulation methodology. It was found that a personal pronoun was most often interpreted as referring to the previously mentioned direct object referent but was also often interpreted as referring to the composite created by the previous instruction when the task had been to place the first object on top of the second object rather than beside it. Byron et al. hypothesize that in this case the structure of the task resulted in attention on the composite, thus placing the composite in focus. Demonstrative pronouns were more often interpreted as referring to the composite, but were also often interpreted as referring to the direct object entity. 
This latter finding is clearly in line with Givenness Hierarchy predictions, since demonstrative pronouns only require activation and are thus predicted to be useable when the referent is in focus as well as when it is merely activated. Maite Taboada, in “Reference, Centers, and Transitions in Spoken Spanish,” focuses on the question of how choice of form is determined by transition types as specified in Centering Theory (Grosz, Joshi, and Weinstein 1995, inter alia.). Taboada examines two corpora of spoken dialogue in Spanish and reports on her analysis of almost 1,500 utterances, which she classified according to transition type: Continue, Retain, Smooth Shift, Rough Shift, Null Backward-Looking Center (Cb). She found that Continues and Smooth Shifts are encoded by zero pronouns in the majority of cases, with another one-fifth realized as clitics. An interesting contrast arises with written Italian as reported by Di Eugenio 1998, where Smooth Shifts were most often realized as stressed pronouns. Retains were realized as zero pronouns in about a third of the cases and as clitics in another third. There were very few Rough Shifts in the data. Taboada finally examines transition realizations that are contrary to expectation. Proper names are sometimes repeated in the telephone corpus and full noun phrases and dates are sometimes repeated in the travel-
arrangements corpus due perhaps to contrast or to establishment of the referent as mutually known. Taboada concludes that Centering Theory may need to be revised to account for spoken language phenomena, both with regard to Cb realization and with regard to the global structure of conversational discourse. Massimo Poesio, in “Linguistic Claims Formulated in Terms of Centering: A Re-examination using Parametric CB-tracking Techniques,” also focuses on the selection of particular forms of referring expression from a Centering Theory perspective. He reports on the GNOME Corpus—a corpus of museum descriptions, pharmaceutical leaflets, and tutorial dialogues, for a total of about 900 finite clauses. He investigates several linguistically interesting claims, relating, for example, to “topic,” in terms of Centering Theory. The first issue is when do we use this-NPs, both pronominal and determiner. Gundel, Hedberg, and Zacharski propose that this-NPs must be at least “activated” (i.e., in working memory), and Poesio and Modjeska defined a notion “Active” to algorithmically identify this notion of “activated.” They conclude from their analysis of the GNOME corpus that this-NPs are used for Active entities as well as for entities other than the Cb of the previous utterance. Poesio also examines the type of transition compared to the form of the subject. He finds that pronouns (vs. demonstrative or full NP) are by far the most common with Continues and Smooth Shifts, where the Cb is equal to the Cp (the highest item on the Cf list), and are very rare with Retain, Rough Shift, Zero, or Null Transitions (the latter two being two types of utterances with no Cb). An intermediate case is the Establishment transition, where the current utterance has a Cb but the previous utterance did not. Establishments thus differ from Continues, which contradicts the conclusion of previous researchers. 
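The transition types named in the two Centering chapters can be made concrete with a short sketch. It is only an illustration, assuming the standard formulation of Centering transitions: Continue and Retain keep the Cb of the previous utterance, the two Shifts change it, and Continue and Smooth Shift are the cases where the Cb equals the Cp. The function name `classify_transition` and the single label "No Cb" (which merges the Zero and Null cases, since the text does not spell out how they differ) are hypothetical choices made for this example and are not taken from Poesio's or Taboada's own procedures.

```python
from typing import Optional, Sequence


def classify_transition(
    cb_prev: Optional[str],
    cb_cur: Optional[str],
    cf_cur: Sequence[str],
) -> str:
    """Label the transition into the current utterance.

    cb_prev -- backward-looking center (Cb) of the previous utterance, or None
    cb_cur  -- Cb of the current utterance, or None
    cf_cur  -- forward-looking centers (Cf list) of the current utterance,
               ranked; its first element is the preferred center (Cp)
    """
    # Assumption: utterances without a Cb are grouped under one label,
    # because the Zero/Null distinction is not defined in the text above.
    if cb_cur is None:
        return "No Cb"

    # Establishment: the current utterance has a Cb but the previous one did not.
    if cb_prev is None:
        return "Establishment"

    cp_cur = cf_cur[0] if cf_cur else None
    same_cb = cb_cur == cb_prev   # is the Cb carried over?
    cb_is_cp = cb_cur == cp_cur   # is the Cb also the highest-ranked Cf?

    if same_cb and cb_is_cp:
        return "Continue"
    if same_cb:
        return "Retain"
    if cb_is_cp:
        return "Smooth Shift"
    return "Rough Shift"
```

For instance, under these assumptions `classify_transition("john", "mary", ["mary", "john"])` is labeled a Smooth Shift, because the Cb has changed but coincides with the Cp of the current utterance.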
At the level of global discourse structure, Poesio finds that most long-distance pronouns had already been Cb’s earlier in the discourse. He also finds that continuous transitions (transitions where there is a Cb in both the current utterance and the previous one) are less likely than average to occur at discourse segment boundaries, whereas Establishment, Zero, and Null transitions are more likely than average to occur there.

The Garnham and Cowles chapter, “Looking Both Ways: The JANUS Model of Noun Phrase Anaphor Processing,” provides a critical review of the psycholinguistic literature on structural and semantic factors that influence both the production and comprehension of anaphoric expressions, and outlines the basic assumptions of their own JANUS model of coreferential NP anaphor processing, showing how it has been derived from previous models and empirical findings. Garnham and Cowles agree with researchers who posit that purely structural strategies such as subject assignment should be regarded as fallback strategies that are used when other strategies fail. The fundamental assumption of their model is that a proper psychological account of coreferential NP anaphora must take account of both how the anaphor relates back to previous text and what function the anaphor
performs in its own clause. JANUS also acknowledges that the type of a coreferential NP anaphor (e.g., pronoun, full definite NP) may influence how the process of finding the NP takes place. The basic principle of the JANUS model is that anaphoric expressions have two types of functions: those that involve looking backward to the previous text and those that involve looking forward to the upcoming text. With respect to looking backward, the JANUS model predicts that a coreferential anaphoric NP should have enough content to avoid indeterminacy of reference, and when this fails, other material in the clause is used for disambiguation. Looking forward is looking forward to the consequence of using a particular expression, that is, how it contributes to signaling the future direction of the text. The authors agree with Almor and others that unnecessary content in an anaphor slows processing, but do not agree that this effect arises through semantic interference in working memory. Rather, they propose that the processing of unnecessary content itself causes problems because the processing system is following functional principles, specifically the Gricean principle of quantity, that an expression (or in this case an expression along with other material in the clause) should contain no more or less information than necessary. They criticize the Informational Load Hypothesis of Almor and associates as too simplistic in considering only the relative salience of antecedents and relations between an anaphor and its actual antecedent, and note that some types of expression may preferentially seek nonfocused, or nonsalient antecedents.
PART I: What Is Reference?
2
On Referring and Not Referring
Kent Bach
Referring is not something an expression does; it is something that someone can use an expression to do. —P. F. Strawson (1950)
Even though it’s based on a bad argument, there’s something to Strawson’s dictum. He might have likened ‘referring expression’ to phrases like ‘eating utensil’ and ‘dining room’: just as utensils don’t eat and dining rooms don’t dine, so, he might have argued, expressions don’t refer. Actually, that wasn’t his argument, though it does make you wonder. Rather, Strawson exploited the fact that almost any referring expression, whether an indexical, demonstrative, proper name, or definite description, can be used to refer to different things in different contexts. This fact, he argued, is enough to show that what refers are speakers, not expressions. Here he didn’t take seriously the perfectly coherent view that an expression’s reference can vary with context. So, he concluded, what varies from context to context is not what a given expression refers to but what a speaker uses it to refer to. Strawson went on to suggest that there are several dimensions of difference between various sorts of referring expressions: degree of dependence on context, degree of “descriptive meaning,” and being governed by a general convention versus an expression-specific one. But despite these differences,
he insisted that regardless of kind, referring expressions don’t themselves refer—speakers use them to refer. Strawson’s dictum flies in the face of common philosophical lore. It is generally assumed, and occasionally argued, that there is indeed a class of referring expressions—indexicals, demonstratives, and proper names—and that they aren’t just eminently capable of being used to refer, which nobody can deny, but that they themselves refer, albeit relative to contexts. There is general consensus that at least some expressions do this, but there is considerable dispute about which ones. It is rare to find a philosopher who includes indefinite descriptions among referring expressions, but some are liberal enough to include definite descriptions. Some reject definites but include demonstrative descriptions (complex demonstratives) on their list. Some balk at descriptions of any kind referring but have no qualms about proper names. Some have doubts about proper names referring, but readily include indexicals and simple demonstratives. Anyhow, I can’t recall anyone actually responding to Strawson’s argument. Instead, what I’ve observed is that philosophers slide down a verbal slippery slope. Suppose Madonna says, referring to Britney Spears, “She is ambitious.”
Slippery Slide
Madonna is using ‘she’ to refer to Britney.
Madonna’s use (or utterance) of ‘she’ refers to Britney.
The token of ‘she’ produced by Madonna refers to Britney.
‘She’, as used by Madonna, refers to Britney.
‘She’, relative to the context of its use by Madonna, refers to Britney.
The slide goes from a person using a term to refer, to a use referring (as if uses refer), to a token referring, to a use-relative reference by the term, to a context-relative reference.1 With this slippery slide in mind, from now on (except when discussing others’ views) instead of using ‘referring expression’ I’ll use the marginally better phrase ‘singular term’ for expressions that can be used to refer. This phrase is only marginally better because there is also a tradition to use ‘singular term’ for the natural-language counterparts of individual constants in logic. This tradition excludes definite descriptions from counting as singular terms, at least from the perspective of anyone who has learned the lesson of Russell’s (1905) theory of descriptions (however problematic the details of his formulation of it), but using ‘singular term’ at least has the advantage that I won’t have to say that some referring expressions don’t refer. By ‘reference’ I will mean singular reference only (I will not be considering whether general terms refer and, if so, to what), and when I describe a use as nonreferential, I will not mean that reference fails but that there is no attempt to refer.2
In this chapter, I will be making a number of points about reference, both speaker reference and linguistic (or semantic) reference. The bottom line is simple: reference ain’t easy—at least not nearly as easy as commonly supposed. Or so it seems to me. Much of what speakers do that passes for reference is really something else, and much of what passes for linguistic reference is really nothing more than speaker reference. But here’s a running disclaimer: I do not pretend that the data, observations, or even the arguments presented here are conclusive. I do think they support what might fairly be regarded as default hypotheses about speaker reference and linguistic reference. So if you think these hypotheses are wrong, you need to show that. You need to argue against them and to find a way to accommodate or explain away the data and the observations. We’ll take up speaker reference first. Referring is one of the basic things we do with words, and it would be a good idea to understand what that involves and requires before worrying about the linguistic means by which this is done. Then we’ll focus on expressions that are used to refer. Rather than start with intuitions about the semantic values or propositional contributions of various singular terms and proceed from there, we’re going to start with common uses of singular terms. By going from speaker reference to linguistic reference, we’ll be in a position to raise questions about the semantics of singular terms that take these various uses into account. Here are the main points to be made:
Speaker Reference
S0 Speaker reference is a four-place relation, between a speaker, an expression, an audience, and a referent: you use an expression to refer someone to something.
S1 To be in a position to refer to something (or to understand a reference to it) requires being able to have singular thoughts about it, and that requires perceiving it, being informed of it, or (having perceived or been informed of it) remembering it.
S2 To refer an audience to something involves expressing a singular proposition about it.
S3 In using a certain expression to refer someone to something, you are trying to get them, via the fact that you are using that expression, to think of it as what you intend them to think of.
S4 We generally choose the least informative sort of expression whose use will enable the hearer to identify the individual we wish to refer to, but this is not a matter of convention.
S5 Often the only way to refer to something is by using a definite description.
S6 Just as an object can be described without being referred to, so a singular proposition can be described without being grasped.
ON REFERRING AND NOT REFERRING
S7 Descriptive ‘reference’, or singling out, is not genuine reference.
S8 With a specific use of an indefinite description, one is not referring but merely alluding to something.
S9 So-called discourse reference is not genuine reference.
S10 Fictional “reference” is pseudo-reference.
Linguistic (Semantic) Reference and Singular Terms
L0 If an expression refers, it does so directly, by introducing its referent into the proposition semantically expressed by sentences in which it occurs (so ‘direct reference’ is redundant).
L1 So-called singular terms or referring expressions—indexicals, demonstratives (both simple and complex), proper names, and definite descriptions—can all be used in nonreferential ways too.
L2 A given singular term seems to mean the same thing whether it is used referentially or not, and an adequate semantic theory should explain this or else explain it away.
L3 When meaning doesn’t fix reference, generally ‘context’ doesn’t either.
L4 The speaker’s referential intention determines speaker reference, but it does not determine semantic reference, except in a pickwickian way.
L5 There is no such thing as descriptive “reference-fixing” (not because something isn’t fixed, but because it isn’t reference).
L6 Pragmatic arguments of the same sort used to defeat objections against Millianism (such as those based on fictional and empty names and on occurrences of names in attitude contexts) can also be used against Millianism itself.
1. Speaker Reference
Here’s a platitude for you. We commonly talk about particular persons, places, or things. We refer to them and ascribe properties to them. In so doing, we are able to accommodate the fact that an individual can change over time (as to properties, relations, and parts), that our conception of it can also change over time, that we can be mistaken in our conception of it, and that different people’s conceptions of the same individual can differ. This suggests something less platitudinous: the feat it describes is possible because in thinking of and in referring to an individual we are not constrained to represent it as that which has certain properties. This may smack of direct reference but, as we will see shortly, it is really indicative of something else. First we need to consider what it is to refer to something.
S0 Speaker reference is a four-place relation, between a speaker, an expression, an audience, and a referent: you use an expression to refer someone to something.
What referring is depends on whether expressions do it or speakers do it. The reference relation between singular terms and individuals (objects, persons, times, places, etc.) is a two-place relation.3 However complicated the explanation for what makes it the case that a certain term refers to a certain thing, the relation itself is between the term and the thing. If ‘Mt. Everest’ refers to Mt. Everest, this is a simple relation between a linguistic expression and a thing, regardless of what explains the fact that this relation obtains. On the other hand, when a speaker uses an expression to refer, the relation in question is a four-place relation: a speaker uses an expression to refer his audience to an individual. Communication is essentially an interpersonal affair, and reference by a speaker is part and parcel of an act of communication.4 So whereas expressions just refer to things, speakers don’t just refer but use expressions to refer audiences to things. I am claiming, then, that speaker reference is essentially audience-directed.5 One might object and suggest there is another, more basic sort of speaker reference that has nothing directly to do with an audience. This more basic sort involves a specifically semantic intention regarding a singular term (except for “pure” indexicals like ‘I’ and ‘today’) that endows it with its reference. For example, in using ‘that’ to refer to a certain thing, the speaker intends ‘that’ to stand for that thing (for the nonce) and, in addition, intends his audience to recognize that it does. So this view implies that speakers have two referential intentions, one semantic and one pragmatic (communicative). One primarily concerns the referring expression, and the other concerns how the audience is to interpret it. Though consistent with the deep-seated tendency to treat singular terms used to refer as themselves referring, this view implausibly multiplies intentions beyond necessity.
Moreover, it overlooks the fact that in choosing a singular term to use, the speaker does so with the audience in mind. One chooses it to enable one’s audience to think of or focus on the intended object. So I question whether speakers have referential intentions that are not part of their communicative intentions. As I see it, a speaker has one referential intention which is essentially audience-directed, an intention to use a certain expression to refer his audience to a certain thing. Indeed, part of what enables them to think of or focus on what one intends them to is the pragmatic fact that one is using that expression. This information is not carried by the expression itself, not even in a context-relative way (see Point L3). There is a different and psychologically more plausible way of thinking of the connection between a person’s demonstrative thought (a thought he would express using a demonstrative) and the linguistic means by which he expresses the thought. Say you look at a lamp near you and think a singular thought that you would express by uttering “That is black.” The connection between your having this thought and its linguistic manifestation is not a matter of intention but a matter of expression. You think of the lamp by way of a percept, which functions as a mental indexical, and you use a demonstrative to express that constituent of your thought. But your thought does
not itself have a demonstrative constituent but has merely an indexical one. That is, there is nothing by means of which you are calling your own attention to the object you’re attending to. You’re just attending to it. You don’t think of the lamp as “that” but, rather, think of it under the percept involved in your attending to it. The fact that you are inclined to express your thought by uttering “That is black” does not show that you have any independent intention to use ‘that’ for the lamp, apart from your communicative intention to refer your audience to it. What happens, rather, is that you form an intention to refer to a certain thing and choose an expression whose use by you, under the circumstances, will enable your audience to figure out that this is what you intend to refer to (see the following Points S3 and S4).
S1 To be in a position to refer to something (or to understand a reference to it) requires being able to have singular thoughts about it, and that requires perceiving it, being informed of it, or (having perceived or been informed of it) remembering it.
Obviously you can’t refer to something unless you’re in a position to refer to it. So what does that involve? Here I will sketch but not defend a view on singular thought, according to which we have singular thoughts about objects we are perceiving, have perceived, or have been informed of (Bach 1987/1994: ch. 1). We do so by means of non-descriptive, ‘de re modes of presentation’, which connect us, whether immediately or remotely, to an object. The connection is causal-historical, but the connection involves a chain of representations originating with a perception of the object. Which object one is thinking of is determined relationally, not satisfactionally. That is, the object one’s thought is about is a matter not of satisfying a certain description but of being in a certain relation to that very thought (token). We cannot form a singular thought about an individual we can “think of” only under a description. So, for example, we cannot think of the first child born in the twenty-second century because we are not suitably connected to that individual (see Point S7). We cannot think of it but merely that there will exist a unique individual of a certain sort. Our thought “about” that child is general in content, not singular. We cannot think of the first child born in the fourth century BC either. However, we can think of Aristotle, because we are connected to him through a long chain of communication. We can think of him even though we could not have recognized him, just as I can think of the bird that just flew by my window. Being able to think of an individual does not require being able to identify that individual by means of a uniquely characterizing description.6 So on my conception of singular thought, there must be a representational connection, however remote and many-linked, between thought and object.
A more restrictive view, though not nearly as restrictive as Russell’s (1917, 1918), limits this connection to personal acquaintance (via perception and perception-based memory), and disallows singular thoughts about unfamiliar objects. A more liberal view would allow singular thought via
uniquely identifying descriptions of special sorts. In any case, although I am assuming this conception of singular thought, the questions to be asked and the distinctions to be drawn, such as the distinction between referring to something and merely alluding to or merely singling out something, do not essentially depend on that conception (of course, how one uses these distinctions to divide cases does depend on one’s conception). I’ll mainly rely on the assumption that one can have singular thoughts about at least some objects one has not perceived and that only certain relations one can bear to an object put one in a position to have singular thoughts about it. S2 To refer an audience to something involves expressing a singular proposition about it.
If the expression (normally a noun phrase) one uses to refer to something itself refers to that thing, that expression must introduce an object into what is semantically expressed by the sentence in which it occurs. If that sentence semantically expresses a proposition (it might not—see Bach 1994), it expresses a singular proposition with respect to that object.7 The referent of that expression is a constituent of that proposition. But whether or not that expression itself refers, when a speaker uses it to refer, he uses it to indicate which thing he is speaking about. If he is making an assertive utterance, he is asserting a singular proposition about that object. What does it take to refer to an individual?8 In particular, can you refer to something if, as Russell would say, you “know it only by description”? Suppose you use a description and believe there to be a unique individual that satisfies the description, but you are not in a position to think of that individual. Can you refer to that individual anyway? If descriptions are quantificational and the propositions semantically expressed by sentences containing them are general, it would seem that you cannot use such a sentence to convey a singular proposition involving whichever individual satisfies the description (see Point S7). For example, if you said, “The Sultan of Brunei is fabulously wealthy” but had no idea who the sultan of Brunei is, you would be stating a general proposition, albeit one that is made true by a fact about a particular individual (to wit, Haji Hassanal Bolkiah Mu’izzaddin Waddaulah). Of course, your audience, if they were in a position to think of that individual and thought that you were, too, might mistakenly take you to be conveying a singular proposition, but that’s another matter. Here’s a different situation. Suppose you are in a position to think of a certain individual, but you do not wish to indicate which individual that is. 
You might say, for example, “A special person is coming to visit.” You intend your audience to realize that you have a certain individual in mind, but you do not intend them to figure out who it is. Indeed, you intend them not to. You are not referring but merely alluding to that individual (see Point S8). In my view, neither alluding to an individual nor singling one out descriptively counts as referring to it: in neither case are you expressing a singular proposition about it.
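The contrast between a general proposition and a singular one, as in the Sultan of Brunei example, can be made explicit in first-order notation. What follows is an editorial gloss using the standard Russellian analysis of definite descriptions, not a formula taken from the text. The predicate letters S (‘is Sultan of Brunei’) and W (‘is fabulously wealthy’) and the constant b (standing for the individual himself) are labels assumed here purely for illustration.

```latex
% General (uniqueness) proposition: what is stated by someone who
% knows the Sultan only by description.
\exists x\,\bigl(S(x) \land \forall y\,(S(y) \rightarrow y = x) \land W(x)\bigr)

% Singular proposition: has the individual b himself as a constituent.
W(b)
```

On this rendering the first formula has no constituent corresponding to any particular individual, even though a fact about one individual makes it true. Only the second formula involves that individual, which is why a speaker who is in no position to think of him would be stating the first rather than the second.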
S3 In using a certain expression to refer someone to something, you are trying to get them, via the fact that you are using that expression, to think of it as what you intend them to think of.
In using a noun phrase to refer to a certain individual, your aim is to get your audience to think of that individual by way of identifying that individual as the one you are thinking of, hence referring to. How referring works and what it involves depends on whether the referent is already the subject of discussion, is at least an object of the audience’s attention, is at least capable of being called immediately to their attention, is at least familiar to them, or is not even familiar to them. Which situation obtains constrains what sort of singular term you need to use to enable them to think of, or at least to direct or keep their attention on, the object you intend to be referring to. Also, what it takes to refer your audience to something depends on whether it has a name and whether you and they know its name. Reference succeeds only if your audience identifies the individual you are talking about as the individual you intend to be talking about. Your audience must think of the right thing in the right way, of the individual intended in the way intended. If your audience identifies the individual in some other way, that’s a matter of luck, not of successful communication. It is rather like having a justified true belief that p without knowing that p. There are different ways in which a speaker can fail to refer. In the case just considered, there is a certain thing he intends to refer to, but his listener does not identify the intended individual (in the intended way). More interesting is the case in which the speaker intends to refer to something but there is no such thing. In that case there is no singular proposition about that individual to be expressed or conveyed. The speaker can have a referential intention, and his audience can recognize that he has such an intention, but nothing counts as getting it right. The speaker’s referential intention cannot be fulfilled, and full communication cannot be achieved. 
Since there is nothing for the hearer to identify, and no singular proposition for her to entertain, the best the hearer can do is recognize that the speaker intends to convey a singular proposition of a certain sort. The speaker has the right sort of intention, to be speaking of some particular thing, but there is no thing for him to succeed in referring to. A different situation would arise if the speaker merely made as if to refer to something, perhaps to deceive the hearer (“See that spider over there?”) or perhaps to play along with the hearer’s mistaken belief in the existence of something (“Bigfoot was seen in Montana last night”). In this case, although the speaker does not intend to refer to something, he does intend to be taken to be. He can succeed in that if he is taken to be referring to the individual the hearer mistakenly believes in. But since there is no such thing, there is no singular proposition to be grasped. S4 We generally choose the least informative sort of expression whose use will enable the hearer to identify the individual we wish to refer to, but this is not a matter of convention.
Suppose you want to refer to your boss. In some circumstances, it may be enough to use the pronoun ‘she’ (or ‘he’, as the case may be). The only semantic constraint on what ‘she’ can be used to refer to is that the referent be female (ships and countries excepted). So its use provides only the information that the intended referent is female. If it is to be used successfully to refer the hearer to a certain female, there must be some female that your audience can reasonably suppose you intend to be referring to. If out of the blue you said, “She is insufferable,” intending with ‘she’ to refer to your boss, you could not reasonably expect to be taken to be referring to her. However, if she were already salient, say by being visually prominent or by having just been mentioned, or you made her salient in some way, say by pointing to her office or to a picture of her, then using ‘she’ would suffice. In other circumstances, you would have to use some more elaborate expression. For example, to distinguish her from other women in a group you could use ‘that woman’, with stress on ‘that’ and an accompanying demonstration. Or, assuming your audience knows her by name, you could refer to her by name. Otherwise, you would have to use a definite description, say ‘my boss’. This example suggests that a speaker, in choosing an expression to use to refer the hearer to the individual he has in mind, is in effect answering the following question: given the circumstances of utterance, the history and direction of the conversation, and the mutual knowledge between me and my audience, how informative an expression do I need to use to enable them to identify the individual I have in mind? Note that informativeness here can depend not only on the semantic information encoded by the expression but also on the information carried by the fact that it is being used.
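The question the speaker is in effect answering in the boss example can be pictured as a simple decision procedure. The sketch below is only a toy illustration. The `Context` fields, the function name `choose_expression`, and the relative order of the demonstrative and proper-name cases are assumptions made here, and nothing in it is meant to suggest that the choice is fixed by convention rather than by what the hearer can reasonably work out.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical summary of what speaker and hearer mutually know."""
    referent_salient: bool     # just mentioned, visually prominent, or made salient
    can_demonstrate: bool      # speaker can single the referent out with a gesture
    name_mutually_known: bool  # hearer knows the referent by name


def choose_expression(ctx: Context) -> str:
    """Return the least informative kind of expression likely to suffice."""
    if ctx.referent_salient:
        return "pronoun"               # e.g. 'she'
    if ctx.can_demonstrate:
        return "demonstrative"         # e.g. 'that woman' plus a demonstration
    if ctx.name_mutually_known:
        return "proper name"
    return "definite description"      # e.g. 'my boss'


# Out of the blue, with no salient referent and no shared name:
print(choose_expression(Context(False, False, False)))
```

The function tries the less informative options first and falls back on a definite description only when none of them would let the hearer identify the intended individual.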
Some linguists have suggested that which sort of expression is most appropriately used depends, as a matter of convention, on the degree of “givenness” (or “familiarity” or “accessibility”) of the intended referent. For example, Gundel, Hedberg, and Zacharski (1993) distinguish being in focus (being the unique item under discussion or current center of mutual attention), being activated (being an item under discussion or being an object of mutual awareness), being familiar (being mutually known), and being uniquely identifiable (satisfying a definite description). They suggest that different degrees of givenness are not merely associated with but, as a matter of linguistic convention, are encoded by different types of singular terms. Perhaps they suggest this because, taking their accessibility scale to concern the cognitive status of representations in the mind of the hearer, they think this status has to be linguistically marked if it is to play a cognitive role. As I see it, however, this scale concerns the mutual (between speaker and hearer) cognitive status of the intended referent. After all, in using an expression to refer the speaker aims to ensure that the hearer thinks of the very object the speaker is thinking of, and what matters is that the expression used to refer, and the fact that the speaker is using it, provide the hearer with enough information to figure out what he is intended to take the speaker to be thinking of, hence to think of it himself. The parsimonious alternative to Gundel et al.’s conventionalist
view is that the different degrees of givenness associated with different types of singular terms are not encoded at all; rather, the correlation is a by-product of the interaction between semantic information that is encoded by these expressions and general facts about rational communication. On this, the null hypothesis, it is because different expressions are more or less informative that the things they can be used to refer to are less or more given or accessible. That is, the more accessible the referent is, the less information needs to be carried by the expression used to refer to it to enable the hearer to identify it. Notice that not only is it enough to use the least informative sort of expression needed to enable your audience to identify the individual you have in mind, it is normally misleading to use a more informative one (or at least odd, as when Michael Jordan used to refer to himself as “Michael Jordan”). So, in general, when you can use an indexical to refer to something, you should. And when you can use a short definite or demonstrative description to refer to something rather than a long one, you should. For example, in talking to a student, normally you would refer to yourself with ‘I’. Only if your capacity as, say, his adviser needed to be stressed, would you use ‘your adviser’ to refer to yourself. Normally you would only use it to refer to someone else. Similarly, you wouldn’t refer to the previous day by its date or even as ‘last Thursday’ when you could use ‘yesterday’. To refer to something that has just been mentioned, you would use ‘it’ if nothing else is also salient. Otherwise, you would use a definite description, say ‘the car’, but not ‘the car that Jones rented last week to drive to Lake Tahoe’, even if, indeed especially if, it had just been said that Jones rented a car the previous week to drive to Lake Tahoe. 
In telling a story about a particular person, it is always sufficient, once the individual is introduced, to use a personal pronoun—provided, of course, that no other individual of the same gender has been introduced in the meantime. There are stylistic or other literary reasons to use their name or a definite description every so often, but unless it is obvious that this is the name or a description of the individual in question, it would be inferred that reference is being made to some other individual. This inference would be made on the charitable assumption that one is not being needlessly informative (and violating Grice’s (1989, 26) second maxim of quantity). The basic point here is that to refer to something you need to use an available singular term that is as informative as necessary but no more. S5 Often the only way to refer to something is by using a definite description.
If Russell’s theory of descriptions is basically right, which I think it is (see Bach 2004b), then definite descriptions are the paradigm of singular terms that can be used to refer but are not linguistically (semantically) referential.9 So we should not be overly impressed by the fact that a given class of singular terms is commonly used referentially. Suppose you want to refer to some thing (or someone). Suppose it is not perceptually present, has not just come up in the conversation, and is
not otherwise salient. Suppose that it does not have a name or that you are unaware of its name or think your audience is unaware. Then you cannot use an indexical, a demonstrative (pronoun or phrase), or a proper name to refer to it. If you want to refer to it, what are you going to do? Unless you can find it or a picture or some other nonlinguistic representation of it to point to, you need to use a linguistic expression, some sort of singular noun phrase (what else?), to call it to your audience’s attention. You must choose one that will provide your audience with enough information to figure out, partly on the supposition that you intend them to figure out, which object you’re talking about. Your only recourse is to use a definite description. This raises the question, when you use a description, how does your audience know that you are referring to something and expressing a singular proposition, rather than making a general statement and expressing a kind of existential proposition? Although the presence of a description does not signal that you are referring—semantically, descriptions are not referring expressions—what you are saying might not be the sort of thing that you could assert on general grounds, that is, as not based on knowledge of some particular individual (see Ludlow and Neale 1991). This will certainly be true whenever it is mutually evident which individual satisfies the description in question and what is being said regarding the individual that satisfies the description can only be supposed to be based on evidence about that individual. For example, if Claire says to me, “The decanter is broken,” I can’t not take her to be talking about the actual decanter of ours. 
On the other hand, if before we decided on a decanter she said, “The decanter had better not cost more than $100,” clearly she would be making a general statement pertaining to whichever decanter we buy (notice that the corresponding demonstrative description, ‘that decanter’, is usable only in the latter, non-referential case). Also, its being mutually evident which individual satisfies a description will generally be sufficient for a referential use, since there will usually be no reason for the speaker not to be taken as making a singular statement about that individual. This applies especially to descriptions of occupiers of social positions or practical roles, such as ‘the boss’ or ‘the freezer’. Moreover, if the description is incomplete, as in these cases, and there is no mutually salient or obviously distinctive completion in sight, then the hearer, at least if he is mutually familiar with the boss or freezer in question, can only take the description to be used referentially.10 But if ‘the F’ is incomplete and it is obvious that the hearer is unfamiliar with the relevant F, then a (referential) use of ‘the F’ must be preceded by an introduction of the relevant F. Now according to Russell’s theory, a sentence of the form ‘The F is G’ semantically expresses a general (uniqueness) proposition. So if you utter such a sentence but use the description referentially, what you say is a general proposition but what you mean is a singular one.11 But how and why does the hearer take you to be doing that? For example, if you uttered, ‘The plumber is pernicious’, I would take you to be asserting not a general proposition but a singular one, about the plumber. Why would I do that? Well, I am acquainted
with the plumber and presumably so are you. Besides, that a certain individual is a plumber has nothing to do with his being pernicious. To suppose that it does would be to take you to be stating something for which you have no evidence (you would be violating Grice’s (1989, 27) second maxim of quality). So I have no reason to suppose, as if you were unfamiliar with the plumber, that you are making a general statement, the content of which is independent of who the plumber is. So I have positive reason to think that you have in mind, and intend me to think you have in mind, a certain individual who satisfies the description you are using. If you are using the description to refer and I am taking you to be doing so, we must have ways of thinking of the individual in question, the plumber, in some other way than as the plumber. Presumably we both remember him by way of a memory image derived from seeing him. In thinking of him via that image, you take him to be the plumber and use the description ‘the plumber’ to identify him for me, which triggers my memory of him. We both think of him, via our respective memories of him, as being the plumber. This fits with how Mill describes the functioning of a proper name in thought as an “unmeaning mark which we connect in our minds with the idea of the object, in order that whenever the mark meets our eyes or occurs to our thoughts, we may think of that individual object” (1872, 22). Though not “unmeaning,” a definite description can play a similar role. In using a description referentially, you are using it in lieu of a sign for the object. S6 Just as an object can be described without being referred to, so a singular proposition can be described without being grasped.
It is one thing to entertain a singular proposition and another thing merely to know that there exists a certain such proposition. Russell’s famous discussion of Bismarck illustrates how this can be. He operates with a notoriously restrictive notion of acquaintance, but this is not really essential to the distinction he is drawing. I agree with Russell that we cannot have singular thoughts about individuals we “know only by description,” but I will not assume that the ones we can have singular thoughts about are limited to those with which we are acquainted in his highly restrictive sense. They include individuals we are perceiving, have perceived, or have been informed of and remember. So although Russell’s choice of example (Bismarck) would have to be changed to be made consistent with a much more liberal notion of acquaintance, I will use it to illustrate his distinction. Russell contrasts the situation of Bismarck himself, who “might have used the name [‘Bismarck’] directly to designate [himself] . . . to ma[k]e a judgment about himself” having himself as a constituent (1917, 209), with our situation in respect to him:
[W]hen we make a statement about something known only by description, we often intend to make our statement, not in the form involving the description, but about the actual thing described. That is, when we say anything about Bismarck, we should like, if we could, to make the judgment which Bismarck alone can make, namely, the judgment of which he himself is a constituent. [But] in this we are necessarily defeated. . . . What enables us to communicate in spite of the varying descriptions we employ is that we know there is a true proposition concerning the actual Bismarck and that, however we may vary the description (as long as the description is correct), the proposition described is still the same. This proposition, which is described and is known to be true, is what interests us; but we are not acquainted with the proposition itself, and do not know it, though we know it is true. (1917, 210–11)
The proposition that “interests us” is a singular proposition, but we cannot actually entertain it—we can know it only by description, that is, by entertaining a general (uniqueness) proposition which, if true, is made true by a fact involving Bismarck. But this general proposition does not itself involve Bismarck, and would be thinkable even if Bismarck never existed.12 S7 Descriptive ‘reference’, or singling out, is not genuine reference.
David Kaplan suggests that one can use a description to refer to something even if one is not in a position to have a singular thought about it or, as he would say, even if one is not “en rapport” with it. He asks rhetorically, “If pointing can be taken as a form of describing, why not take describing as a form of pointing?” (1979, 392). Well, there’s a reason why not. First consider the following example of Kaplan’s “liberality with respect to the introduction of directly referring terms by means of ‘dthat’,” which “allow[s] an arbitrary definite description to give us the object” (1989a, 560).13
(1) Dthat [the first child to be born in the twenty-second century] will be bald.
‘Dthat’ is a directly referential term and, as Kaplan explains, “the content of the associated description is no part of the content of the dthat-term” (1989b, 579); it is “off the record (i.e., off the content record)” (1989b, 581). So ‘dthat’ is not merely a rigidifier (like ‘actual’) but a device of direct reference.14 What gets into the proposition is the actual object (if there is one) that uniquely satisfies the description, not the description itself (i.e., the property expressed by its matrix).15 Not only does Kaplan’s “liberality” impose no constraint on the definite description to which ‘dthat’ can be applied to yield a “directly referential” term, it imposes no epistemological constraint on what one can “directly refer” to. As Kaplan puts it, “Ignorance of the referent does not defeat the directly referential character of indexicals,” from which he infers, “a special form of knowledge of an object is neither required nor presupposed in order that a person may entertain as object of thought a singular proposition involving that object” (1989a, 536). However, even if we concede that any definite description can be turned into a directly referential term, so that a sentence containing the ‘dthat’ phrase expresses a singular proposition about the actual object (if there is one) that uniquely satisfies the description,
it is far from obvious that the user of such a phrase can thereby refer to, and form singular thoughts about, that object. Kaplan seems to think this ability can be created with the stroke of a pen. Consider, for example, whether one can refer to the first child born in the twenty-second century. Assume that nearly one hundred years from now, this description will be satisfied (uniquely). Then there is a singular proposition involving that individual, as expressed by (1).16 Without ‘dthat’ (and the brackets), (1) would express a general (uniqueness) proposition, the one expressed by (1'), (1')
The first child to be born in the 22nd century will be bald.
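As a rough illustration, the general (uniqueness) proposition expressed by (1') can be given a Russellian rendering. The predicate letters F (‘is a first child to be born in the twenty-second century’) and B (‘will be bald’) are labels assumed here only for the sketch, and tense is ignored:

```latex
% Sketch only: F and B are illustrative predicate letters, not notation from the text.
\[
\exists x \, \bigl( F(x) \land \forall y \, ( F(y) \rightarrow y = x ) \land B(x) \bigr)
\]
```

No term standing for any particular child occurs in this formula; only the properties expressed by F and B do, which is one way of seeing why the proposition counts as general rather than singular.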
Now can one use the description ‘the first child to be born in the twenty-second century’ referentially, to refer to that child? Kaplan thinks there is nothing to prevent this, that it is a perfectly good example of pointing by means of describing. However, what enables one to form an intention to refer to the individual who happens to satisfy that description? If one is prepared to utter (1') assertively, surely one is prepared to do so without regard to who the actual such child will be—one’s grounds are general, not singular. For example, one might believe that the first child born in the 22nd century is likely to be born in China and that Chinese children born around then will all be bald, thanks to China’s unrestrained use of nuclear power. But this only goes to show that one’s use of the description is likely to be taken to be attributive. Unless one were known to be a powerful clairvoyant, one could not plausibly be supposed to have singular grounds for making the statement. Nevertheless, Kaplan thinks that one could intend to use the description referentially anyway, as if putting the description in brackets and preceding it with ‘dthat’ could not only yield a term that refers to whoever actually will be the first child born in the twenty-second century but could enable a speaker to refer to that child. It seems, though, that one is in the same predicament as the one Russell thought anyone other than Bismarck would be in if he wanted to refer to Bismarck. Would it help to have the tacit modal intention of using the description rigidly, or even to insert the word ‘actual’ into the description? Referring to something involves expressing a singular proposition about it, but rigidifying the description or including the word ‘actual’ would not make its use referential. 
Even though the only individual whose properties are relevant to the truth or falsity of the proposition being expressed (even if that proposition is modal) is the actual F (if it exists), still that proposition is general, not singular. This proposition may in some sense be object-dependent, but it is not object-involving. The property of being the actual F may enter into the proposition, but the actual F does not. The fact that there is something that satisfies a certain definite description does not mean that one can refer to it. One can use a description to describe or, as I will say, single out something without actually referring to it. For if a different individual satisfied the description or you were discussing a hypothetical situation in which that would be the case, you would have singled
out that individual instead. Nevertheless, you could use the description just as though you were introducing the thing that satisfies it into the discourse. You could, for example, use pronouns to “refer” back to it. You could say, “The first child to be born in the twenty-second century will be bald. It will be too poor to use Rogaine.” Giving it a name won’t help. You could dub this child ‘Newman 2’ (just as Kaplan dubbed the first child to be born in the twenty-first century ‘Newman 1’), but this would not enable you to refer to it or to entertain singular propositions involving it. In this, as Russell would have said, “we are necessarily defeated.” Even though (1') does not express a singular proposition, the proposition which “interests us” but which we cannot entertain, one can still use the sentence to describe that proposition. It might be objected that in characterizing descriptive “reference” as singling out rather than referring to an object, I am not making a substantive claim but am merely engaging in terminological legislation. I would reply that anyone who insists on calling this reference should either show that a singular proposition is expressed or explain why, when one conveys a general, object-independent proposition, this should count as referring. One possible reason is taxonomic: if we are to maintain that indexicals and demonstratives are inherently referring expressions and not merely expressions that are often used to refer, allowances must be made for the fact that we sometimes use them to do what I would describe as merely singling out an object, that is, an object that the speaker is not in a position to have singular thoughts about. For example, one could use ‘he’ or ‘that child’ to single out the first child born in the twenty-second century. But the question is whether this counts as genuine reference. 
Indeed, one can use such expressions without even singling out an individual, as in, “If a child eats a radioactive Mars bar, he/that child will be bald.” The mere fact that some philosophers are in the habit of calling indexicals and demonstratives “referring expressions” does not justify cultivating this habit. In summing up his account of the referential-attributive distinction, Donnellan concedes that there is a kind of reference, reference in a “very weak sense,” associated with the attributive use of a definite description (1966, 304). Since he is contrasting that use with the referential use, this is something of a token concession. Reference in this very weak sense is too weak to count as genuine reference, for one is “referring” to whatever happens to satisfy the description, and one would be “referring” to something else were it to have satisfied the description instead. This is clear in modal contexts, such as in (2): (2)
The next president, though probably a man, could be a woman.
The speaker is not likely to be asserting of some one possible president that he or she will probably be a man but could be a woman, say if he had a sex-change operation before her inauguration. Here the description is taken to fall within the scope of ‘could’. The speaker is allowing for different possible presidents, some male, some female, only one of whom will actually be the next president. Surely this is not reference, not even in a very weak sense.
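The two readings at issue in (2) can be sketched as a difference of scope, assuming a Russellian treatment of the description. The symbols are labels introduced only for this illustration: P for ‘is a next president’, W for ‘is a woman’, and the diamond for ‘could’:

```latex
% Sketch only: P, W, and \Diamond are illustrative; requires amsmath and amssymb.
\begin{align*}
\text{Narrow scope:}\quad & \Diamond \, \exists x \, \bigl( P(x) \land \forall y \, ( P(y) \rightarrow y = x ) \land W(x) \bigr) \\
\text{Wide scope:}\quad & \exists x \, \bigl( P(x) \land \forall y \, ( P(y) \rightarrow y = x ) \land \Diamond W(x) \bigr)
\end{align*}
```

On the narrow-scope rendering, different individuals may satisfy P in the different possibilities that ‘could’ ranges over, which corresponds to the likely reading of (2). The wide-scope rendering is the unlikely one, on which it is said of one individual that he or she could be a woman.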
S8 With a specific use of an indefinite description, one is not referring but merely alluding to something.
Indefinite descriptions can be used nonspecifically, referentially, or specifically.17 In the very common nonspecific (or purely quantificational) use, there is no indication that the speaker has any particular thing in mind; one is expressing a general proposition. With the referential use, which is relatively rare, as it is with quantificational phrases generally, one does express a singular proposition (see Ludlow and Neale 1991, 176–180), but this is about an individual that is already the focus of mutual attention. Here I will consider the specific use of indefinite descriptions. What is distinctive about the specific use is that the speaker communicates that he has a certain individual in mind, but he is not communicating which individual that is—he doesn’t intend you to identify it. Suppose a man says to his wife, (3)
An old girlfriend will call today.
Unless he thinks this is the sort of day for a call from an old girlfriend, presumably he has a particular one in mind. He could have made this clear by including the word ‘certain’ (or ‘particular’), as in “A certain old girlfriend will call today.”18 He could even elaborate on why he is not specifying which old girlfriend it is by continuing “An old girlfriend will call today” with “but I can’t tell you who” or “but only to discuss Russell.” In a specific use, the speaker indicates that he is in a position to refer to a certain individual, but is not actually doing so. He is not identifying or trying to enable the hearer to identify that individual—he is merely alluding to her. He has a certain singular proposition in mind but is not trying to convey it. So what must the hearer do in order to understand the utterance? It would seem that she must merely recognize that the speaker has some singular proposition in mind, about a certain individual of the mentioned sort, in this case an old girlfriend. It might be objected that a specific use of an indefinite description is a limiting case of a referential use, not mere allusion but what might be called ‘unspecified’ reference. After all, can’t the hearer, recognizing that the speaker has some individual in mind, at least think of that individual under the description ‘the individual the speaker has in mind’? But, as we have already seen, descriptive ‘reference’ is not genuine reference. Besides, the speaker is not really referring the hearer to that individual and, in particular, does not intend her to think of the individual he has in mind under the description ‘the individual you (the speaker) have in mind’ or in any similar way. He is merely indicating that he has a certain unspecified individual in mind. That is, he is not referring but merely alluding to that individual. 
To appreciate why this is, consider a situation in which the speaker has in mind one F among many and proceeds to say something not true of that individual. Suppose a group of unsavory men crash a party late at night and start a fight. Later an elderly partygoer utters (4) to the police,
(4)
A big hoodlum had a concealed weapon.
She has a particular hoodlum in mind when she says this, but does not specify which one. Obviously the words ‘a big hoodlum’ do not refer to the hoodlum she had in mind, for if some big hoodlum had a concealed weapon but she was mistaken about which one, (4) would still be true. So (4) semantically expresses a general proposition. Even so, since the elderly partygoer does have a certain hoodlum in mind, is she using this indefinite description to refer to that hoodlum? Even if what she said is a general proposition, is what she meant a singular proposition, about the hoodlum she had in mind? No, because the police could understand her perfectly well without having any idea which hoodlum she has in mind. They understand merely that she has a certain hoodlum in mind, the one she is alluding to.
S9 So-called discourse reference is not genuine reference.
It is well-known that unbound pronouns, as well as definite descriptions, can be used anaphorically on indefinite descriptions, as in these examples: (5)
Russell met a man today. He/The man was bald.
(6)
A plumber bought a lottery ticket yesterday, and he/the plumber won $1,000,000.
(7)
If there were a mermaid there, Merlin would have seen her/the mermaid.
(8)
Every farmer owns a donkey. He feeds it/the donkey popcorn.
In (5) and (6), the pronoun (and the definite description) can be used to refer. If the speaker is using ‘a man’ specifically in uttering (5), he could even use ‘he’ (or ‘the man’) to refer to the man he thinks Russell met that day, although success at that would require his audience knowing who that was. If, on the other hand, the speaker is not in a position to refer to such a man, he can only use ‘a man’ nonspecifically and could not use ‘he’ (or ‘the man’) referentially; the most he could intend to convey is the general proposition that Russell met a man that day who was bald (for a plausible account of such examples, see King 1987). Similar points apply to (6). However, it seems that the pronouns and the definite descriptions in (7) and (8) cannot be used to refer at all. In (7) ‘her’ (or ‘the mermaid’) is not being used to refer to an unspecified (and presumably nonexistent) mermaid, and in (8) ‘it’ (or ‘the donkey’) functions quantificationally, ranging over the different donkeys owned by the different farmers. Despite the fact that the pronouns and the definite descriptions in cases like (5) and (6) need not, and in cases like (7) and (8) cannot, be used to refer, many semanticists have attributed ‘discourse referents’ to them. I am not suggesting that these semanticists seriously believe that discourse referents are real referents, but this only makes it puzzling why they use this locution. Here is how Karttunen (1976) introduced the phrase:
Let us say that the appearance of an indefinite noun phrase establishes a discourse referent just in case it justifies the occurrence of a coreferential pronoun or a definite noun phrase later in the text. . . . We maintain that the problem of coreference within a discourse is a linguistic problem and can be studied independently of any general theory of extra-linguistic reference. (Karttunen 1976, 366; my emphasis)
I agree, except that what Karttunen regards as coreference need not be reference at all. He goes on to explain, In simple sentences that do not contain certain quantifier-like expressions,19 an indefinite NP establishes a discourse referent just in case the sentence is an affirmative assertion. By ‘establishes a discourse referent’ we meant that there may be a coreferential pronoun or definite noun phrase later in the discourse.20 Indefinite NPs in Yes-No questions and commands do not establish referents. (383)
So the “coreferential pronoun or definite noun phrase later in the discourse” can, thanks to the discourse referent “established” by the indefinite NP, have a discourse referent even if, as in our examples, it is not used to refer. The notion of discourse referent has inspired a great deal of theorizing in semantics, including discourse representation theory (DRT) and so-called dynamic semantics. However, as is implicitly conceded by Karttunen’s definition (“[it] can be studied independently of any general theory of extra-linguistic reference”), discourse reference is no more reference than the relation we bear to our perceptual experiences (as opposed to objects perceived) is perception. The basic problem is simply this: a chain of “reference” isn’t a chain of reference unless it is anchored in an actual (“extra-linguistic”) referent. Despite the widespread use of the phrase ‘discourse referent’ in some semantic circles, so-called discourse referents are not literally referents. The pronouns in the examples like those we’ve just considered are not used as referential terms. They are used as surrogates for definite descriptions, descriptions which if present in place of the pronouns would not be referential. These pronouns are what Neale (1990, ch. 5) calls “D-type pronouns.” The basic idea is that the pronoun is used elliptically for a definite description recoverable from the matrix of the antecedent indefinite description (see Bach 1987/1994, 258–61). Neale develops a detailed account of how D-type pronouns work in a wide variety of cases. It is essential to this account that the descriptions implicit in the use of D-type pronouns not be construed as referential, even when they are used referentially. To see the significance of looking at such pronouns in this way, let us return to two of our examples. Consider (6) again: (6)
A plumber bought a lottery ticket yesterday, and he/the plumber won $1,000,000.
Suppose that the speaker merely heard that a plumber had bought a lottery ticket the day before and was now partying. In that case, the speaker
does not have any particular plumber in mind and is not using ‘a plumber’ specifically. Even so, it might be thought that ‘he’ refers to a certain unspecified plumber (the ‘discourse referent’) who bought a lottery ticket the day before. But how could this be? Obviously, the first clause of (6) semantically expresses a general proposition, but what about the second clause? Does it express a singular proposition about a certain plumber who bought a lottery ticket the day before? Well, suppose that no plumber who bought a lottery ticket the day before won $1,000,000 or came even close. Then which plumber would the second clause of (6) be about, and what singular proposition would it express? Answer: no plumber, and no proposition. On the view that ‘he’ is referential in the second sentence of (6), that sentence would express a singular proposition if and only if it is true! Surely, though, which proposition a sentence semantically expresses, or that it expresses any, cannot depend on whether or not it is true. The situation is similar with a quantified sentence, as in (8), (8)
Every farmer owns a donkey. He feeds it popcorn.
Suppose there are farmers who own more than one donkey. In that case, what does it take for the second sentence in (8) to be true? Its truth does not require every farmer to feed popcorn to every donkey that he owns, but also it doesn’t require merely that every farmer feed popcorn to one donkey that he owns (theorists think that it must be one or the other, but they disagree as to which). So it is not clear what it would take for the second sentence in (8) to be true when there are farmers who own more than one donkey. However, DRT and dynamic semantics have been motivated by the thought that this is clear, and they have tried to make sense of sentences like the second one in (8) in terms of the notion of discourse referent. If an indefinite description is used to introduce a so-called discourse referent into a conversation and repeated back “reference” is made to this item, that doesn’t mean that the item is actually being referred to. It (the F that the speaker has in mind if indeed he has any in mind) wasn’t referred to when it was introduced, and repeated use of pronouns or definite descriptions to “refer” back to it doesn’t make any of what is going on count as reference. This is no more reference than if you used a formula with an unbound variable and then conjoined more and more atomic formulas with the same unbound variable. Any interpretation must give all the succeeding occurrences of the variable the same value, but pronouncing the variable as ‘it’ doesn’t make it a referential pronoun.
S10 Fictional “reference” is pseudo-reference.
This is far too big a topic to take up in any detail here. Any serious discussion has to distinguish fictional reference (reference in a fiction) and reference (outside the fiction) to fictional entities. Reference in a fiction does not count as genuine reference, at least if it is to fictional persons, places, and things (in a fiction, there can be genuine reference to real persons, places, and things).
If Salmon (1998) is right in claiming that fictional entities are real, albeit abstract entities, then we can genuinely refer to them. Otherwise, we can only pretend to. In my view, both fictional reference and reference to fictional entities involve special sorts of speech acts, but there is nothing special about fictional language itself (Bach 1987/1994, 214–18). In particular, words in fictional discourse do not have special meanings, roles, or references just because they occur in fictional discourse. Obviously, writers of fiction can introduce special meanings for particular words, or even introduce new lexical items, as in Tolkien, or, indeed, a whole new language, such as Nadsat in Anthony Burgess’s A Clockwork Orange. But this is irrelevant to the question of whether fictional names and pronouns in fiction refer to anything and to whether an author who uses such singular terms is referring. So far as I can tell, the desire to maintain some sort of Millian view about proper names and direct-reference view about indexicals is the only motivation for maintaining that genuine referring is going on in fiction. In my opinion, which I won’t try to defend here, facts about language and language use provide little support for substantive metaphysical theses.
2. Linguistic (Semantic) Reference and Singular Terms
Strawson’s dictum was that expressions don’t refer, speakers do. The basis for it had nothing to do with Russell’s strange contention that the only “logically proper” names of ordinary language, of English in particular, are the demonstratives ‘this’ and ‘that,’ but only as used to refer to one’s current sense-data, and the pronoun ‘I’ (1917, 216). Russell based this contention on his highly restrictive doctrine of acquaintance, according to which the only particulars one can be acquainted with are one’s current sense-data and oneself. Everything else one can know only by description. Accordingly, Russell denied that ordinary proper names, like ‘Plato’ and ‘Pluto’, are logically proper names. That is, ordinary proper names cannot be understood on the model of individual constants of formal logic, which are Millian in the sense that their meanings are their references. Combining Strawson’s dictum with Russell’s contention yields an extremely restrictive answer to the question of which expressions are capable of referring. I will defend this answer, but on different grounds than Strawson’s and Russell’s. Strawson’s grounds were that virtually any expression that can be used to refer to one thing in one context can be used to refer to something else in another context. Even if that is correct, it is not a good reason for denying that expressions refer. It’s a good reason only for denying that they refer independently of context. Perhaps many expressions do refer, but do so relative to a context. So Strawson’s dictum needs the support of a better argument. Here’s a very simple one: almost any term that can be used to refer can also be used not to refer, and without any difference in meaning. This argument may seem too simple to be credible, but it does call into question philosophers’ knee-jerk tendency to view singular terms on the model of individual constants of formal logic. As for Russell’s contention, we don’t need to accept his highly restrictive conception of acquaintance to insist that for an expression to refer to something it must introduce that thing into propositions semantically expressed by sentences in which it occurs. But what does it take for an expression to do that?
L0 If an expression refers, it does so directly, by introducing its referent into the proposition semantically expressed by sentences in which it occurs (so ‘direct reference’ is redundant).
Contrary to Frege, Russell insisted that the relation of a description to what it denotes is fundamentally different from the relation of a name to what it refers to. Whereas a genuine, “logically proper” name introduces its referent into the proposition, a description introduces a certain quantificational structure, not its denotation. The denotation of a description is thus semantically inert: the semantic role of a description does not depend on what, if anything, it denotes. But a genuine name “directly designat[es] an individual which is its meaning” (1919, 174). Notice Russell’s use here of the adverb “directly” in characterizing how names designate their objects, just as Kaplan (1989a) characterizes indexicals and demonstratives as “directly referential.” However, given the distinction between denotation and reference, the occurrence of ‘directly’ in ‘directly referential’ is redundant; and ‘indirectly referential’ would be an oxymoron. So if we distinguish reference from denotation as two different species of what Kripke calls “designation,” then all expressions that (semantically) refer are rigid designators and all denoting expressions are non-rigid designators, except those, like ‘the smallest prime’, that are rigid de facto, that is, rigid for nonsemantic reasons (Kripke 1980, 21). This leaves open which expressions fall into which category.
L1 So-called singular terms or referring expressions—indexicals, demonstratives (both simple and complex), proper names, and definite descriptions—can all be used in nonreferential ways, too.
To repeat the platitude from the beginning of part 1, we commonly talk about particular persons, places, or things, and in so doing we are able to accommodate the fact that they can change over time, that our conceptions of them can also change over time, that we can be mistaken about them, and that different people’s conceptions of them can differ. Moreover, it seems that all this is possible if in thinking of and in referring to an individual we are not constrained to represent it as having certain properties. This was Mill’s idea about proper names. In his view, their function is not to convey general information but “to enable individuals to be made the subject of discourse”; names are “attached to the objects themselves, and are not dependent on . . . any attribute of the object” (1872, 20). Similarly, according to Russell, a proper name, at least when “used directly,” serves “merely to indicate what we are speaking about; [the name] is no part of the fact asserted. . . : it is merely part of the symbolism by
which we express our thought” (1919, 175). In contrast, the object a definite description describes “is not part of the proposition [expressed by a sentence] in which [the description] occurs” (170). Nevertheless, Russell allowed that proper names can not only be “used as names” but also “as descriptions,” adding that “there is nothing in the phraseology to show whether they are being used in this way or as names” (175). Interestingly, Russell’s distinction regarding uses of names is much the same as Donnellan’s famous distinction regarding uses of definite descriptions. If the property expressed by the description’s matrix (the ‘F’ in ‘the F’) enters “essentially” into the statement made, the description is used attributively;21 when a speaker uses a description referentially, the speaker uses it “to enable his audience to pick out whom or what he is talking about and states something about that person or thing” (1966, 285). Donnellan’s distinction clearly corresponds to Russell’s. Whereas an attributive use of a definite description involves stating a general proposition, as with the use of a proper name “as a description,” a referential use involves stating a singular proposition, just as when a proper name is used “as a name.” And just as Russell comments that “there is nothing in the phraseology” to indicate in which way a name is being used, so Donnellan observes that “a definite description occurring in one and the same sentence may, on different occasions of its use, function in either way” (281). If Russell and Donnellan are right, respectively, about proper names and definite descriptions, then expressions of both sorts can be used referentially (as a name, to indicate what we are speaking about) or attributively (as a description).
This leaves open whether either sort of expression is semantically ambiguous (or maybe underdeterminate) or whether, in each case, one use corresponds to the semantics of the expression and the other use is accountable pragmatically from that use.22 For Russell, a definite description, whichever way it is used, is inherently a quantifier phrase, whereas a “logically proper” name is a referring term.23 Evidently Donnellan was unsure whether to regard the referential-attributive distinction as indicating a semantic ambiguity or merely a pragmatic one. However, it seems highly implausible that a given description-containing sentence should be semantically ambiguous, expressing a singular or a general proposition depending on whether the description is being used referentially or attributively. And very few philosophers are so moved by the referential-attributive distinction as to defend this rather implausible ambiguity.24
Proper Names
Philosophers have hardly noticed Russell’s observation about the dual use of proper names. Like it or not, proper names do have non-referential uses, including attributive uses and predicative uses. Before discussing such uses and their significance for the Millian view of names, consider Russell’s rationale for his very narrow view regarding which singular terms qualify
as logically proper names. It is based on a highly restrictive conception of acquaintance—there can be no doubt about the existence of an object of one’s acquaintance. Russell insisted that a logically proper name refer to an object of acquaintance in this highly restrictive sense. Although Russell is often ridiculed for his highly restrictive conception of acquaintance, it is interesting to note that he also had a more plausible, logical reason for his restrictive view of proper names. Consider that in standard, first-order logic the role of proper names is played by individual constants and existence is represented by the existential quantifier. So there is no direct way to use that notation to say that a certain object exists, say the one assigned the name ‘n’. In standard logic, we can’t straightforwardly say that n exists. We have to resort to using a formula like ‘∃x(x = n)’, which is to say that there exists something identical to n. And, when there is no such thing as n, we can’t use the negation of a formula of that form, ‘~∃x(x = n)’, to express the truth that there isn’t anything to which n is identical, because standard first-order logic disallows empty names. Free logic allows this, but either it has to represent existence as a predicate or else invoke some dubious distinction, such as that between existence and being. Anyway, my only point here is that Russell had a logical motivation for insisting that a genuine name be one which is (epistemically) guaranteed to have a referent. This is important because it eliminates the familiar problems for Millians discovered by Frege and by Russell. Millians who address these problems (Braun 1993, 2005; Salmon 1998; and Soames 2002, 89–95) have to use some fancy footwork to handle them in a way that comports with their Millianism.25 I won’t examine their treatments of these problems here but simply mention a few problems, some more familiar than others. 
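The logical point about individual constants can be spelled out in a short derivation, given here as a sketch in classical first-order logic with identity. The rule names are the usual textbook ones and are not drawn from the text:

```latex
% Sketch only: n is an arbitrary individual constant; requires amsmath.
\begin{align*}
1.\quad & n = n && \text{(reflexivity of identity)} \\
2.\quad & \exists x \, (x = n) && \text{(existential generalization on 1)}
\end{align*}
```

Since line 2 is derivable for any constant n, its negation, ‘~∃x(x = n)’, is false under every classical interpretation, which is why that formula cannot be used to state a true negative existential.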
First, there is the problem of existential statements, both positive and negative, containing terms that semantically refer (this problem is related to Kant’s famous contention, in connection with the ontological argument for the existence of God, that existence is not a property). Ordinarily, to ascribe a property to an object presupposes that it exists. To assert of something that it exists rather than not is not to presuppose that it exists. Indeed, it is rather odd to characterize this as asserting of something that it exists. Even more problematic is the case of negative existentials, and the related problem of empty names. To assert, for example, that Hamlet does not exist is surely not to assert of Hamlet that he does not exist, much less to presuppose that he exists. It is possible to argue that Hamlet is a fictional character, specifically an abstract entity created by Shakespeare (presumably), and that when one uses ‘Hamlet does not exist’ to deny that Hamlet exists one is not speaking literally and does not mean what one says but rather that Hamlet the fictional character is not a real person but only a character (Soames 2002, 94, endorsing Salmon 1998). This view seems less plausible with such names as ‘Santa Claus’, ‘Bigfoot’, and ‘Vulcan’. However, a Millian who denies that such names refer altogether must deny that sentences containing such names are capable of expressing propositions. Yet it seems that, to
take the most obvious example, the sentence ‘Santa Claus (Bigfoot, Vulcan) does not exist’ does express a proposition, indeed a true one.
Since Millianism entails that proper names contribute only their bearers to the semantic contents of sentences in which they occur, it has to deny that there is any semantic difference between co-designating proper names. This, of course, gives rise to Frege’s problem (see Salmon 1986) about the informativeness of identity sentences like ‘Marshall Mathers is Eminem’ and to the further problem about substituting co-designating names in attitude ascriptions, as here:
(9) George believes that Eminem is a great musician.
(10) George believes that Marshall Mathers is a great musician.
It seems to many that (9) could be true while (10) is false, hence that they have different semantic contents. That is because, it seems, these two belief sentences ascribe to George belief in two different things. Millians must reject that, and explain away the appearance of substitution failure as based on some sort of pragmatic or psychological confusion (see Salmon 1986, Braun 1998, and Soames 2001). However, many philosophers find such explanations, however ingenious, to be implausible.
There are also ascriptions like these to consider:
(11) Nimrod thinks that Michael Jackson is a great basketball player.
(12) Bozo thinks that Michael Jackson is Michael Jordan.
The contents of these ascriptions, in their most likely uses, obviously do not accord with Millianism. In uttering (11) a speaker would probably not be attributing to Nimrod a belief about Michael Jackson. And it is unlikely that a speaker uttering (12) would be using ‘Michael Jordan’ to refer to Michael Jordan, much less to attribute to Bozo a belief in the false identity proposition which, according to Millianism, is semantically expressed by ‘Michael Jackson is Michael Jordan’. The Millian’s only way around these examples would be to argue that when used in the ways described, these sentences are not being used literally.
Millians tend to neglect the fact that names can be used as predicates (Burge 1973, Lockwood 1975), as in these examples:
(13) Dan Quayle is no Jack Kennedy.
(14) Leningrad became St. Petersburg in 1991.
Indeed, names can be pluralized and combined with quantifiers, as in (15) and (16):
(15) Many Kennedys have died tragically.
(16) There are hundreds of O’Learys in Dublin.
You could argue that (13) is not being used literally but is used to mean that Quayle was not like Kennedy, and you could argue that in (15) the name ‘Kennedy’ is not being used literally, since it is used to mean member of the famous Kennedy family, but the intended uses of (14) and (16) seem perfectly literal, at least to people I’ve informally queried. Clearly the Millian treatment of proper names on the model of individual constants can’t handle such examples, which suggests that proper names are more like other nominals than is commonly supposed. In general, nominals occur with determiners (as in ‘the man’, ‘an animal’, ‘few tigers’, ‘all reptiles’, and ‘some water’), and so-called bare nominals, such as ‘reptiles’ and ‘water’, are treated by syntacticians as constituents of noun phrases with covert determiners. In fact, noun phrases are now generally classified as determiner phrases, which include a position for a determiner even if there is no overt one. As some of these examples illustrate, proper names can occur with overt determiners, but even singular proper names (in the context of a sentence) are constituents of noun phrases, despite generally occurring, at least in English, without an overt determiner. Indeed, in some languages, such as Italian and German, singular proper names are often used with the definite article. I could go into much greater detail, but examples like (11), (12), (14), and (16) suffice to suggest that proper names can be used nonreferentially yet apparently literally.26 At the very least, Millians need to show that these are not literal uses or else that proper names are systematically ambiguous as between referential and non-referential uses. A further consideration is that proper names seem to be able to function as variable binders in just the same way as noun phrases that are clearly quantificational. 
Compare the following two sentences, in which the relation between the pronoun and the noun phrase that syntactically binds it seems to be the same:
(17) Bob1 hates his1 boss.
(18) Every employee1 hates his1 boss.
It might seem that the pronoun ‘his1’ is a referentially dependent anaphor when bound by a singular term like a proper name and is a variable when bound by a quantificational phrase. However, it is difficult to see what the relevant difference could be (for a detailed argument to this effect, see Neale 2004). Notice further that there are readings of the following sentences in which the proper name is coordinate with a quantifier phrase, as in (19) and (20), or occurs as part of a quantifier phrase, as in (21), that binds the pronoun:
(19) [Bob and every other employee]1 hates his1 boss.
(20) [Bob and most other employees]1 hate their1 boss.
(21) [Only Bob]1 hates his1 boss.
Against the suggestion that a proper name is a variable binder it could be argued, I suppose, that in (19) and (20) it is the entire phrase in which the proper name occurs that binds the pronoun, but consider the following example, involving verb-phrase ellipsis:
(22) Bob hates his boss, and so does every other employee.
If the pronoun is not a bound variable, then (22) could only mean that every other employee hates Bob’s boss. It could not have a reading on which it says that every other employee hates his respective boss.
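The contrast between the two candidate readings of (22) can be made explicit in a rough first-order rendering. This is only a sketch: the symbols b (Bob), E (is an employee), H (hates), and boss(x) (x's boss) are notational assumptions introduced here for illustration and do not appear in the original discussion.

```latex
% Bound-variable reading: the elided verb phrase is \lambda x.\, H(x, \mathrm{boss}(x))
\[
H(b, \mathrm{boss}(b)) \;\wedge\; \forall y\,\bigl((E(y) \wedge y \neq b) \rightarrow H(y, \mathrm{boss}(y))\bigr)
\]
% Referential reading: the elided verb phrase is \lambda x.\, H(x, \mathrm{boss}(b))
\[
H(b, \mathrm{boss}(b)) \;\wedge\; \forall y\,\bigl((E(y) \wedge y \neq b) \rightarrow H(y, \mathrm{boss}(b))\bigr)
\]
```

Only the first formula lets each employee's boss vary with the employee. That is the reading that would be unavailable if ‘his’ in the first conjunct simply referred to Bob.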
Indexicals
We have already seen (in Point S9) that indexicals can be used nonreferentially but literally.27 The most obvious example is when they are anaphoric on but not bound by an indefinite description or other quantifier phrase and are used as short for a definite description recoverable from the nominal contained in that phrase, as in these earlier examples:
(5) Russell met a man today. He was bald.
(6) A plumber bought a lottery ticket yesterday, and he won $1,000,000.
(7) If there were a mermaid there, Merlin would have seen her.
(8) Every farmer owns a donkey. He feeds it popcorn.
Here are two more examples, involving simple and complex demonstratives:
(23) I thought I saw a dagger, but this was only a hallucination.
(24) Everyone who survives a heart attack never forgets that moment.
King (2001) has investigated various nonreferential uses of complex demonstratives and develops a unitary semantic account on which both their referential and nonreferential uses can be understood as literal. What about so-called descriptive uses of indexicals, as documented by Nunberg (1993, 2004; see also Recanati 1993, ch. 16), as well as quantificational uses? Here are some examples:
(25) [Answering the phone after 10 rings] I thought you were a telemarketer.
(26) Any time she gives you her phone number, she’s interested.
(27) [bumper sticker] If you can read this, you’re getting too close.
(28) He who hesitates is lost.
(29) Never put off to tomorrow what you can do today.
These uses are clearly not referential, but are they literal? If so, an adequate semantic account of each of these indexicals, unless it posits outright ambiguity, would have to characterize their meanings in a way that is compatible with their having both nonreferential and referential uses. It is arguable, however, that these uses are not literal.28 By contrast, the nonreferential uses of the indexicals and demonstratives in (5)–(8), (23), and (24) do seem clearly to be literal and do not suggest any semantic ambiguity.
L2 A given singular term seems to mean the same thing whether it is used referentially or not, and an adequate semantic theory should explain this or else explain it away.
Philosophers may disagree about which particular sorts of expressions are capable of referring, but there is general consensus that at least some deserve the label ‘referring expression’. For example, it is widely supposed that proper names, indexicals, and demonstratives are referring expressions, with allowances made for reference failure if not for nonreferential uses. It is almost as widely supposed that definite descriptions are not referring expressions, even though they can be used to refer, and are, rather, quantifier phrases. A more controversial case is that of complex demonstratives, which have the form of quantifier phrases but often seem to behave like referring expressions.29 So what should we say about Strawson’s dictum? Do some expressions, at least in some of their uses, qualify as referring expressions and not merely as expressions that can be used to refer? Or was he right to insist that referring is not something an expression does and is merely something that speakers can use expressions to do? If Strawson was right, it was not for the right reasons. He relied heavily on the fact that an alleged referring expression can be used to refer to different things on different occasions and took that to be sufficient for his conclusion. He did not consider the possibility that, à la Kaplan, an expression can have different referents with respect to different contexts. So, for example, relative to a context ‘I’ would seem to refer to whoever uses it, and ‘now’ would seem to refer to the time at which it is used (but see Smith 1989 and Predelli 1998). Even so, the question remains, given that some expressions, notably definite descriptions, which are clearly not referring expressions can be used to refer, why suppose that expressions of any sort that can be used to refer can be so used only because they themselves (semantically) refer? Here’s an embarrassingly simple argument:
ESA
1. Virtually any expression that can be used to refer can also be used literally but not referentially.
2. No variation in meaning (semantic ambiguity or underspecification, indexicality, or vagueness) explains this fact.
3. So the meaning of such an expression is compatible with its being used nonreferentially.
4. So virtually any expression that can be used to refer is not inherently referential.30
It remains to be seen who this argument embarrasses. If it is a bad argument, even if put more rigorously than stated here, it should embarrass me. However, whether good or bad, it should embarrass anyone who endorses an account according to which expressions of a given type are referring expressions and who does not address the case of nonreferential uses of those expressions, much less reconcile their pet account with those uses. Referentialists about definite descriptions are the only ones who regularly face up to the fact that the expressions they’re concerned with have non-referential uses. They may have to resort to the claim that definite descriptions are systematically ambiguous, indexical somehow, or semantically underspecified, but at least they confront the problem. Referentialists about indexicals, demonstratives, and proper names try to survive on a lean diet of examples and, to stay on their diet, keep non-referential uses out of sight. Direct-reference theorists about indexicals and demonstratives rarely consider descriptive uses of those expressions, and when they do treat of such uses, tend to engage in special pleading to avoid abandoning their referentialist predilections. Similarly, Millians, who think of proper names on the model of individual constants, do not bother reckoning with predicative uses of proper names. They implicitly dismiss predicative uses as marginal cases. Long ago Tyler Burge (1973) deplored such an attitude, with its “appeal to ‘special’ uses whenever proper names do not play the role of individual constants,” as “flimsy and theoretically deficient” (1973/1997, 605). Much preferable is a unified account of names, one that handles their various uses instead of marginalizing those uses which, according to one’s pet theory, count as deviant. As the examples in the last section illustrate, indexicals seem to have literal uses that do not fall into the referentialist paradigm. 
These uses do not seem to be explained by some special sort of semantic ambiguity or underspecification, and they do seem literal. So how can they be explained? The ESA suggests that whatever their explanation, a purely referentialist account can’t provide it.
L3 When meaning doesn’t fix reference, generally “context” doesn’t either.
At the outset I mentioned a verbal slippery slide that seems to lead philosophers from the trivial claim that singular terms can be used to refer to the conclusion that these terms are semantically referring expressions. Here’s another verbal slippery slide I’ve noticed. It starts from the platitude that what an expression can be used to refer to can vary from one context to another or, in the case of an ambiguous expression, that what an expression can be used to mean can so vary. People slide from contextual variability to context relativity to context sensitivity to context dependence to contextual determination. This leads people to conclude that context somehow manages to “provide” or “supply” semantic values to expressions, resolve ambiguities, and work other semantic miracles. That’s why I call this an appeal to “context ex machina” (the title of Bach 2004c). Context does have a role to play in semantics, but its
role is limited. There are a few expressions which really do refer as a function of context, but in general it’s not the context that does the trick. Later (Point L4) I defend the obvious alternative, that the speaker’s referential intention does the trick, and that it is a mistake to treat the speaker’s intention as part of context, as just another contextual parameter.
Indexicals and demonstratives are often casually described as “context-sensitive” or “context-dependent.” Taken literally, this means that the reference of such a term is determined by its linguistic meaning as a function of a contextual variable (call this the semantic context). But is the reference of indexicals and demonstratives really context-dependent in this sense? It is not obvious that indexicals in general, including demonstratives, should be assimilated to the special case of “pure” indexicals. The reference of pure indexicals, such as ‘I’ and ‘today’, may be determined by their linguistic meanings as a function of specific contextual variables (this is context in the narrow, semantic sense), but other indexicals, and demonstratives, are different. Their meanings merely impose constraints on how they can be used to refer (Bach 1987/1994, 186–92), and context doesn’t finish the job. That’s why John Perry describes their reference as “discretionary” rather than “automatic,” as depending on the speaker’s intention, not just on “meaning and public contextual facts” (2001, 58–59). That is, the speaker’s intention is not just another contextual variable, not just one more element of what Kaplan calls “character” (1989a, 505). If this is correct, then demonstratives and most indexicals suffer from a character deficiency.31 Context does not determine reference, in the sense of constituting it, of making it the case that the reference is so-and-so; rather, it is something for the speaker to exploit to enable the listener to determine the intended reference, in the sense of ascertaining it.
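The difference between “automatic” and “discretionary” reference can be pictured with a small toy model. This is a minimal sketch under stated assumptions, not a piece of semantic theory: the class NarrowContext, the table PURE_INDEXICALS, and the functions semantic_referent and speaker_referent are all hypothetical names invented for illustration. The sketch treats the meaning of a pure indexical as a rule from narrow context to a referent and shows that no such rule is available for ‘that’.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class NarrowContext:
    """Hypothetical stand-in for narrow, semantic context: public parameters only."""
    speaker: str
    day: str


# Toy rules for pure indexicals: each maps a narrow context to a referent.
# (Assumption: only 'I' and 'today' are modeled, following the examples in the text.)
PURE_INDEXICALS: Dict[str, Callable[[NarrowContext], str]] = {
    "I": lambda ctx: ctx.speaker,
    "today": lambda ctx: ctx.day,
}


def semantic_referent(expression: str, ctx: NarrowContext) -> Optional[str]:
    """Return the referent fixed by meaning plus narrow context, or None if none is fixed."""
    rule = PURE_INDEXICALS.get(expression)
    return rule(ctx) if rule is not None else None


def speaker_referent(intended: str) -> str:
    """Speaker reference is settled by the referential intention.

    The intention is passed in separately because it is not a field of NarrowContext.
    """
    return intended


ctx = NarrowContext(speaker="Ann", day="Monday")

# Automatic: meaning plus narrow context suffices.
print(semantic_referent("I", ctx))
print(semantic_referent("today", ctx))

# Discretionary: the same context leaves 'that' without a referent,
# while two different intentions yield two different speaker referents.
print(semantic_referent("that", ctx))
print(speaker_referent("the lamp"), "/", speaker_referent("the desk"))
```

The design choice worth noting is that the speaker's intention is an argument to a separate function rather than a field of the context object, which mirrors the claim that the intention is not just another contextual parameter.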
Accordingly, although it is often casually remarked that what a speaker says in uttering a given sentence “depends on context,” is “determined” or “provided” by context, or is otherwise a “matter of context,” this is not literally true.32 What Perry describes as “public contextual facts” is not context in the narrow, semantic sense but context in a broad, cognitive or evidential sense. It is the mutually salient common ground, and includes the current state of the conversation (what has just been said, what has just been referred to, etc.), the physical setting (if the conversants are face-to-face), salient mutual knowledge between the conversants, and relevant common background knowledge. Its role is epistemic, not constitutive; pragmatic, not semantic. Because it can constrain what a hearer can reasonably take a speaker to mean in saying what he says, it can constrain what the speaker could reasonably mean in saying what he says. But it is incapable of determining what the speaker actually does mean. That is a matter of the speaker’s referential intention and his communicative intention as a whole, however reasonable or unreasonable it may be. To appreciate this point, first consider an example involving ambiguity. Suppose a dinner host utters the ambiguous sentence ‘The chicken is ready to
eat’. Presumably she is not saying and does not mean that a certain chicken (one of the guests!) is hungry. Even so, given the ambiguity of the sentence, she could, however bizarrely, say and mean that. Context doesn’t make it the case that she does not. But, of course, she could not reasonably expect such a communicative intention to be recognized. Now consider an example involving demonstrative reference. Suppose you see a group of ducks sitting quietly by a pond and one duck starts quacking furiously. You say, “That duck is excited.” I naturally take you to be referring to the duck that’s quacking. But is it the context that makes it the case that this is the duck that you are referring to? Not at all. For all I know, and contrary to what I can reasonably suppose, you could be referring to a quiet duck that you recognize by its distinctive color. I won’t identify which duck you’re referring to, and you haven’t done enough to enable me to, but still you could be trying to refer to that duck, however ineffectually. So if ‘that duck’ refers (relative to this context), what does it refer to? The quacking duck or the distinctively colored duck? Given the story I have just told, it is clear which duck you intend to be referring to (the distinctively colored one) and which duck I take you to be referring to (the quacking one). But is there any determinate fact of the matter as to which duck ‘that duck’ refers to? I don’t think so, and I don’t think there is any reason to expect so. So philosophers can casually describe context as “providing” or “supplying” the references of demonstratives and discretionary indexicals, but these expressions do not refer as a function of the contextual variables given by their meanings, that is, narrow, semantic context. But broad, cognitive context does not determine reference either, in the sense of making it the case that the expression has a certain reference. It merely enables the audience to figure out the reference. 
That’s why I say that demonstratives and discretionary indexicals suffer from a “character deficiency”: they do not refer as a function of context. It is only in an attenuated sense that these expressions can be called ‘referring’ expressions. Besides, as we have seen, they have clearly nonreferential but perfectly literal uses, for example, as proxies for definite descriptions and as something like bound variables. That’s why it’s a real challenge to give a fully general account of the meaning of indexicals. I wish I could meet that challenge here and do something like what King (2001) has done for complex demonstratives.
L4 The speaker’s referential intention determines speaker reference, but it does not determine semantic reference, except in a pickwickian way.
The fact that the speaker’s intention picks up the slack in determining reference might suggest that the specification of the meaning of a discretionary indexical or a demonstrative contains a parameter for the speaker’s intention. However, I am unaware of any direct argument for that. There is talk about how the reference of indexicals and demonstratives is “determined by context” but no argument as to why the speaker’s referential intention should
count as part of the context. I think there’s reason to think that it shouldn’t. If context were defined so broadly as to include anything other than linguistic meaning that is relevant to determining what a speaker means, then of course the speaker’s intention would be part of the context. But if the context is to play the explanatory role claimed of it, it must be something that is the same for the speaker as it is for his audience, and obviously the role of the speaker’s intention is not the same for both. Context can constrain what the speaker can succeed in communicating given what he says, but it cannot constrain what he intends to communicate in choosing what to say. Of course, in implementing his intention, the speaker needs to select words whose utterance in the context will enable the hearer to figure out what he is trying to communicate, but that is a different matter.
To illustrate the role of speakers’ intentions, let’s look at some simple examples involving pronouns used to make anaphoric reference.33 Compare (30a) and (30b):
(30) a. A cop arrested a robber. He was wearing a badge.
     b. A cop arrested a robber. He was wearing a mask.
It is natural to suppose that in (30a) ‘he’ refers to the cop and in (30b) to the robber. It is natural all right, but not inevitable. The speaker of (30a) could be using ‘he’ to refer to the robber, and the speaker of (30b) could be using it to refer to the cop. Such speakers would probably not be understood correctly, at least not without enough stage setting to override commonsense knowledge about cops and robbers, but that would be a pragmatic mistake. Nevertheless, the fact that ‘he’ can be so used indicates that it is the speaker’s intention, not the context, which determines that in (30a) it refers to the cop and in (30b) to the robber. The same point applies to these examples with two pronouns used anaphorically:
(31) a. A cop arrested a robber. He took away his gun.
     b. A cop arrested a robber. He used his gun.
     c. A cop arrested a robber. He dropped his gun.
     d. A cop arrested a robber. He took away his gun and escaped.
In (31a), presumably ‘he’ would be used to refer to the cop and ‘his’ to the robber, whereas in (31b) both would be used to refer to the cop, in (31c) both would be used to refer to the robber, and in (31d) ‘he’ would be used to refer to the robber and ‘his’ to the cop. However, given the different uses of the pronouns in what is essentially the same linguistic environment, clearly it is the speaker’s intention, not the context, that explains these differences in reference. It is a different, pragmatic matter how the audience resolves these anaphoric references; the broad, communicative context does not literally determine them but merely provides the extralinguistic information that enables the audience to figure them out.
Similar points apply to demonstrative reference. Reference is not determined by acts of demonstration or any other features of the context of utterance. Rather, these features are exploited by the audience to ascertain the reference, partly on the basis of being so intended. Indeed, they are exploited by the speaker in choosing what expression to utter to carry out his referential intention, since, as part of his communicative intention, he intends his audience to take into account the fact that he intends them to recognize his intention. His referential intention determines the reference, but this is not to suggest that it succeeds by magic or is somehow self-fulfilling. You cannot utter any old thing and gesture in any old way and expect to be taken to be referring to whatever you have in mind. You do not say something and then, as though by an inner decree (an intention), determine what you are using the expression to refer to. You do not just have something in mind and hope your audience is a good mind-reader. Rather, you decide to refer to something and try to select an expression whose utterance will enable your audience, under the circumstances, to identify that object (see Point S4). If you utter ‘that duck’ and the duck you intend to be referring to is the only one around or is maximally salient in some way, you won’t have to do anything more to enable your audience to identify it. Otherwise, you will need to point at it and make it salient, hence make it obviously the one you intend to be referring to.34 Here are a few more examples. Suppose you point at a Ferrari and say, “That belongs to me.” Presumably you’re referring to that particular car. Suppose you say instead, “That’s my favorite color.” Presumably you’re referring to the color of that car. Suppose you say instead, “That’s my favorite sports car.” Presumably you’re referring to that type of car, Ferrari, or perhaps that particular model, say a Spider. 
In each case, what enables your audience to figure out what you’re referring to is the content of the predicate. In each case, that’s what you can expect them to take into account in figuring this out, and they can reasonably assume that this is what you expect. But nothing is to prevent you from intending to refer to something else. For example, you could be referring to that particular car when you say, “That’s my favorite sports car” (you might have a big car collection that includes sports cars). And you could be referring, however incoherently, to that model of Ferrari when you say, “That’s my favorite color.” In this last case, you’d have to say something much more elaborate in order to succeed in communicating what you mean. With a personal pronoun or a complex demonstrative, more remote references are possible. You could say, “He/That guy spends all his money on cars” and be referring to the owner of that Ferrari, or you could say, “She/That woman is going to leave him” and be referring to his wife. In each of these cases, it is not literally the context but the speaker’s referential intention that determines the reference. And, as I have been suggesting, it’s only in an attenuated sense that the expression used to refer, whether a demonstrative or a personal pronoun, does the referring. Now I have been supposing all along that speaker reference is essentially an audience-directed affair, that you use an expression to refer someone to
something. This was Point S0, that speaker reference is a four-place relation, between a speaker, an expression, an audience, and a referent. One might agree that a referential intention in that sense, which is inherently pragmatic, does not determine semantic reference, but insist that speakers have specifically semantic intentions that do. After all, it might be argued, when a speaker utters a sentence containing a lexical or a structural ambiguity, it is the speaker’s intention that resolves the ambiguity. For example, if someone utters, “My lawyer is lying on the bench” or “The turkey is ready to eat,” how it is to be taken is a matter of the speaker’s intention. So why not suppose that when we use a demonstrative word or phrase, a discretionary indexical, or a proper name belonging to more than one individual, we use it with an intention that genuinely gives it a reference relative to that context?35 The short answer is that it’s one thing to select from properties that an expression or a string of words already has and quite another thing to endow an expression with a new property. (Indeed, in the former case it is arguable, on the assumption that linguistic items are form-meaning pairs, that the relevant linguistic intention is simply to utter a certain sentence, rather than another, like-sounding one.) To appreciate the difference, imagine that you utter a sentence containing a common expression you intend to use in a new, unprecedented way. Say you utter “My dog has a deleterious tail” and mean that your dog has a curly tail. Even though this is what you mean, your intention to use ‘deleterious’ to mean curly does not endow ‘deleterious’ with a new meaning. Even if your audience figures out what you mean, in much the way they would if you used a word unfamiliar to them, ‘deleterious’ doesn’t acquire a new meaning. 
The situation is more like that of using an expression metaphorically, where you say one thing and mean something else instead, except that the literal meaning plays no role in enabling the audience to figure out what the speaker means. In both cases, the audience has to figure out that the expression is not being used in a normal way, but in the case of ‘deleterious’ its conventional meaning is merely a distraction. Now I am not suggesting that using an expression (such as a demonstrative) to refer to something is just like using a familiar word in an unfamiliar way. Obviously, a referential use of a demonstrative is consistent with its meaning. The relevant similarity is that in both cases the putative property of the expression plays no role. So if you say, referring to your desk lamp, “That is black,” your audience does not figure out that you are using ‘that’ to refer to your lamp by way of determining that ‘that’ refers to it. Rather, they figure out what you could plausibly intend and reasonably expect them to take you to be using ‘that’ to refer to. So even if, contrary to what I am suggesting, ‘that’ does refer to the lamp, this would play no role in how your audience recognizes what you’re using ‘that’ to refer to. Except perhaps for the case of pure indexicals, semantic reference by singular terms is an otiose property. Attributing this property to singular terms across the board commits a version of what Barwise and Perry call the “fallacy of misplaced information,”
that is, “that all the information in an utterance must come from its interpretation” (1983, 34), and ignores the essentially pragmatic fact that the speaker is making the utterance.
I am well aware of our deep-seated inclination to think of demonstratives and singular terms generally as expressions that refer. This inclination is especially strong in the context of modal logic, formal semantics, and model theory, where it’s customary to speak of “assignments” and not worry about where they come from. We think this way when proving that a certain proposition is possibly true, that a certain proposition is necessarily true, that one proposition entails another, and so on, but what is the rationale for this way of thinking when theorizing about natural language and its use? I don’t deny that for formal purposes one can assign referents to singular terms, but this is a matter of pure stipulation. However, the singular terms of natural language, with the possible exception of pure indexicals, all have nonreferential uses, and at least some of these uses are perfectly literal. So however deep-seated our tendency to think that singular terms refer and are not merely used to refer, there is still a need for an argument for why we need the notion of reference made by a singular term and for why we can’t make do with the notion of reference made by a speaker in using it. What do we need the former notion for? What is added by saying not merely that the speaker is using ‘that’, for example, to refer to something but also that the word ‘that’, as used by the speaker, refers to it?
L5 There is no such thing as descriptive “reference-fixing” (not because something isn’t fixed, but because it isn’t reference).
This point is a corollary of a point made earlier, Point S7, that descriptive singling out does not count as genuine reference. Using a description like ‘the planet that is perturbing Uranus’ to “fix” the reference of ‘Neptune’ to a certain planet, or using a description like ‘the serial killer terrifying the people of London’ to “fix” the reference of ‘Jack the Ripper’, where the description is treated as rigidified (as “Dthat-ed”), is to do nothing more than to make the names equivalent to rigidified descriptions. It does not enable such a name to introduce the individual described into propositions semantically expressed by sentences in which the name occurs. I am not denying that, when the names ‘Neptune’ and ‘Jack the Ripper’ were introduced, there were singular propositions containing Neptune or Jack the Ripper. I am merely denying that sentences containing those names expressed such propositions. I am not denying that Neptune was given the name ‘Neptune’ or that Jack the Ripper, whoever he was, was given the name ‘Jack the Ripper’. In denying that so-called descriptive reference-fixing manages to fix reference, I am denying that these names functioned as referring terms. As we saw earlier, Kaplan’s liberality about direct reference imposes no constraint, beyond the requirement of unique satisfaction, on the definite description to which ‘dthat’ can be applied to yield a “directly referential” term. As he wrote, “ignorance of the referent does not defeat the directly
referential character of indexicals” (1989a, 536). Evidently, thought Kaplan, this character can be created with the stroke of a pen. He thought it could be done not only with his ‘dthat’ operator but also with proper names, such as the name ‘Newman 2’ for the first child born in the twenty-second century. I don’t deny that this child, assuming there will be one, can be given that name now. What I deny is that this act of dubbing makes it the case that we can thereby form singular thoughts about Newman 2. Furthermore, we understand sentences containing that name, for example, ‘Newman 2 will be bald’, and do so without having a singular thought about Newman 2. So the proposition that sentence expresses can’t be a singular proposition (about Newman 2). The closest that “descriptive reference-fixing” comes to enabling a name to refer is that the description involved can be taken as rigidified. But this doesn’t mean that sentences containing such a name express singular propositions. Even though rigidifying the description (‘the actual F’) means that the only individual whose properties are relevant to the truth or falsity of the proposition being expressed (even if that proposition is modal) is the actual satisfier of the description (if it should exist), still that proposition is general, not singular. This proposition may in some sense be object-dependent, but it is not object-involving. The property of being the actual F may enter into the proposition, but the actual F does not.

L6 Pragmatic arguments of the same sort used to defeat objections against Millianism (such as those based on fictional and empty names and on occurrences of names in attitude contexts) can also be used against Millianism itself.
There is an interesting irony about Millianism. To defend the claim that proper names semantically refer and that sentences containing them semantically express singular propositions about their bearers, Millians indulge in fancy footwork, typically of the pragmatic variety, to meet objections based on Frege’s and Russell’s puzzles. These efforts, which I won’t review here, are often charged with being counterintuitive, but Millians meet that charge by exploiting the distinction between the semantic content of a sentence (relative to a context) and what a speaker would normally convey in uttering the sentence. A typical strategy is to explain away this counterintuitiveness by claiming that the relevant intuitions pertain to the truth conditions of what people ordinarily use the sentences in question to convey. This is essentially the strategy which Kripke (1977) used to explain away the apparent semantic significance of the referential-attributive distinction regarding definite descriptions (he relied on a distinction between semantic reference and speaker’s reference) and which two of the most prominent Millians, Salmon (1986) and Soames (1988, 2002), use to explain away the anti-substitution intuition about names in attitude contexts.36 The irony is that the same strategy might be effective against Millianism itself, which, after all, is based mainly on what Kripke describes as “direct intuitions of the truth conditions of particular sentences” (Kripke 1980, 14).37 Perhaps the intuitive basis for Millianism is not as strong as it seems.
I agree that our intuitions are often insensitive to the difference between the semantic contents of sentences and what speakers normally use them to convey. There is a deep explanation for their insensitivity, which reflects the fact that for efficient and effective communication people rarely make fully explicit what they are trying to convey and rarely need to. Most sentences short enough to use in everyday conversation do not literally express things we are likely ever to mean, and most things we are likely ever to mean are not expressible by sentences we are likely ever to utter.38 Moreover, in the course of speaking and listening to one another, we generally do not need to make conscious intuitive judgments about the semantic contents of the sentences we utter or hear. We focus instead on what we are communicating or on what is being communicated to us. We do not need to be able to make accurate judgments about what information is semantic and what is not in order to have real-time access to semantic information. For this reason, seemingly semantic intuitions cannot be assumed to be driven by, or to be reliable about, what we take them to be about. Consider an analogy with the case of proper names. The number or, rather, the numeral on an athlete’s jersey is often used to refer to that player. In special cases, this can occur even outside of the context of a particular game. For example, Willie Mays was for decades referred to as ‘24’ and, more recently, Michael Jordan as ‘23’ (LeBron James may now have something to say about that). Yet nobody would suggest these numbers themselves refer to those players. So why does it seem to most philosophers that proper names aren’t merely used to refer to their bearers but do so themselves? When you use a name to refer, generally the property of bearing the name does not enter into what you are trying to convey. 
For example, if you say, “Aristotle was the greatest philosopher of antiquity,” presumably you are not suggesting that having the name ‘Aristotle’ had anything to do with being a great philosopher. Rather, you intend the property of bearing that name merely to enable your audience to identify who you are talking about. In this respect proper names are like most definite descriptions, which are incomplete and are also generally used referentially.39 And when we use them to refer to specific individuals, the properties they express are incidental to what we are trying to convey. Suppose that the property of bearing a certain name did matter. Suppose we cared about the proper names people had regardless of whose names they were. An employer might want to hire someone because his name was ‘Peter Pepperoni’, a tourist might visit a city because its name was ‘Cincinnati’, and a diner might be tempted to try a restaurant called ‘Colestra’. However frivolous such sentiments might be, people could attach great importance to names and come to regard bearing a certain name as a noteworthy property, regardless of who or what the name belongs to. In such a world, proper names would commonly have attributive uses. The intuition of rigidity has the same source as the intuition of referentiality. According to Kripke, “We have a direct intuition of rigidity, exhibited in our understanding of the truth conditions of particular sentences. In addition, various secondary phenomena, about ‘what we would say’, . . . give
indirect evidence of rigidity” (1980, 14). However, he does not show that it is the truth conditions of sentences, rather than of what people try to convey in uttering them, that drive our intuitions. Of course, it is not true that Aristotle might not have been Aristotle. Obviously he could not have been somebody else. But it does not follow that sentence (32) semantically expresses this (false) singular proposition.
(32) Aristotle might not have been Aristotle.
If one takes this sentence to express that proposition, one takes both occurrences of the name ‘Aristotle’ to refer to a particular Greek philosopher. Then of course the sentence will seem false. How could the sentence be true? Well, suppose that Aristotle’s parents decided to name their first two sons ‘Aristotle’ and ‘Aristocrates’ but hadn’t decided in which order. Then, when their first son was born, they made up their minds and named him ‘Aristocrates’, saving ‘Aristotle’ for their second son, the future student of Plato. They could have made the reverse decision. In this scenario, sentence (32) is true: Aristotle would have been Aristocrates, not Aristotle. On the (predicative) reading on which it is true, it does not mean that Aristotle might have been somebody else but merely that he might not have had the property of bearing ‘Aristotle’. This reading is perfectly intuitive, at least to anyone not taken in by Millianism. On that reading, at least the second occurrence of ‘Aristotle’ does not refer to Aristotle.40
3. The Bottom Line

Referring is not as easy as is commonly supposed. Much of what speakers do that passes for referring really isn’t but is merely alluding or describing. And it is far from clear that so-called referring expressions (aside from the few pure indexicals) really refer, except in a Pickwickian sense. But I must repeat my running disclaimer: I do not pretend that the data, observations, or even the arguments presented here, especially what I dubbed the “Embarrassingly Simple Argument,” are conclusive. I do think they support what might fairly be regarded as default hypotheses about speaker reference and linguistic reference, for example, that a demonstrative has the same meaning whether or not it is used referentially and is used literally either way. The ESA poses the challenge of refuting these hypotheses. So if you think these hypotheses are wrong, you need to show that. You need to argue against them and to find a way to accommodate or explain away the data and observations. You can’t just appeal to intuitions about truth or falsity of certain sentences in various circumstances unless you make a case that this is what the intuitions are really responsive to. And you can’t make that slippery slide from speaker reference to linguistic or semantic reference by blindly attributing referential properties to uses of linguistic expressions or to tokens of them. It is one thing for a speaker, when using an expression in a certain way, to express a thought about a certain object and quite another for the expression to stand
for that object, even relative to the context. Pure indexicals may do this, but other singular terms do not, or at least we have not seen any reason to suppose that they do, however deep-seated our tendency to think that they do.

This chapter is an expanded version of “What Does It Take to Refer?” in The Oxford Handbook of Philosophy of Language, edited by Ernest Lepore and Barry C. Smith (Oxford: Oxford University Press, 2006).

NOTES
1. In my view, only the last notion, context-relative reference by an expression, has a genuine place in semantics. In particular, I think utterance reference is a bastard notion, as is the notion of utterance content considered as semantic. The only respect in which an utterance has content over and above the semantic content (relative to the context) of the uttered sentence is as an intentional act performed by a speaker. In that respect, the content of an utterance is really the content of the speaker’s communicative intention in making the utterance. Focusing on the normal case of successful communication, where the listener gets the speaker’s communicative intention right, can make it seem as though an utterance has content in its own right, independently of that intention. But this is illusory, as is evident whenever communication fails. In that case, in which the speaker means one thing and his audience thinks he means something else, there is what the speaker means (and what he could reasonably mean) and what his listener takes him to mean (and what she could reasonably take him to mean), but there is no independent utterance content. For further discussion, see Bach 2004c, sec. 1.
2. Although our topic is singular reference, there is a broad sense in which every expression refers (or at least every expression that has a semantic value that contributes to the propositional content of sentences in which it occurs). There is also the question of which expressions have such semantic values or, to put it differently, which syntactic units are semantic units. The most famous instance of this question concerns definite descriptions. Russell’s answer was that they are not semantic units. Although he granted that definite descriptions have denotations of sorts, according to his theory of descriptions they “disappear on analysis” and are therefore semantically inert. This does not follow from the fact (or alleged fact, if Graff (2001) is correct) that they are quantifier phrases, because quantifier phrases can be, and nowadays often are, treated as semantic units whose semantic values are properties of properties (with the determiners they contain having two-place relations between properties as their semantic values). In any case, the phrase ‘referring expression’ is ordinarily limited to any expression whose propositional contribution is its referent (if it has one).
3. One could argue that linguistic reference is not really a two-place relation, in that (i) (some) expressions, namely indexicals and demonstratives, refer only relative to a context, so that the same expression can have different referents in different contexts, and (ii) it is only as belonging to a particular language that an expression refers, so that the same expression could have different referents in different languages. In reply one could argue, first, that even if linguistic reference is context-relative, this shows only that the relation that obtains between an expression and its referent is context-bound, not that it is really a three-term relation, and,
second, that the same expression cannot literally occur in more than one language (that expressions are individuated partly by the languages they belong to). I think that nothing substantive hinges on either question—both seem merely terminological.
4. The view I am alluding to, inspired largely by Austin (1962), Strawson (1964), and Grice (in the papers on meaning and conversation collected in his 1989 volume), was expounded and defended in Bach and Harnish 1979 and is sketched in Bach 2004a.
5. Evans takes a similar view. He conceives of referring as part of communicating, and thinks that “communication is essentially a mode of the transmission of knowledge” (1982, 312), whereby the addressee comes to know of the individual to which the speaker refers.
6. There is the further question of whether a singular proposition can comprise the complete content of a singular thought. Schiffer (1978) argued that it cannot. In my view, de re modes of presentation are also involved (Bach 1987/1994, ch. 1). Moreover, I have argued that a belief ascription whose ‘that’-clause expresses a singular proposition does not fully individuate the belief being ascribed (Bach 1997, 2000). I point out that, for example, the one ‘that’-clause (assuming it expresses a singular proposition) in the two ascriptions, ‘Peter believes that Paderewski had musical talent’ and ‘Peter disbelieves that Paderewski had musical talent’, does not fully characterize something that Peter both believes and disbelieves. And, as I say, every case is potentially a Paderewski case.
7. Note that a proposition, e.g., the proposition that I eat anchovies, can be singular with respect to one argument place and general with respect to another.
8. Our discussion here is limited to reference to spatio-temporal things.
9. It is odd that Kripke (1977), in defending Russell’s theory against the claim that Donnellan’s (1966) distinction has semantic significance, contrasts “speaker’s reference” with a definite description’s “semantic reference.” But by this he can only mean the description’s denotation, the individual that uniquely satisfies it. See Point L0, which follows.
10. Why mutually salient or familiar? Obviously it is not enough for the intended referent to be salient or familiar merely to the speaker, if it is not salient or familiar to the audience and if this is not evident to the speaker (etc.). So, in general, when I say that something is salient or familiar, I will mean that it is mutually so.
11. Here and throughout I am assuming a distinction between saying and meaning or stating, a distinction which I have tried elsewhere to vindicate (Bach 2001). It corresponds to Austin’s distinction between locutionary and illocutionary acts. This distinction is often blurred, e.g., by Donnellan (1966), whenever he suggests that in using a description referentially rather than attributively, one is saying something different, allegedly because the content of the description does not enter into what is said.
12. The difference in type of proposition is clear from Russell’s observations about the use of the indefinite description ‘a man’:
What do I really assert when I assert “I met a man”? Let us assume, for the moment, that my assertion is true, and that in fact I met Jones. It is clear that what I assert is not “I met Jones.” I may say “I met a man, but it was not Jones”; in that case, though I lie, I do not contradict myself, as I should do if when I say I met a man I really mean that I met Jones. It is clear also
that the person to whom I am speaking can understand what I say, even if he is a foreigner and has never heard of Jones. But we may go further: not only Jones, but no actual man, enters into my statement. This becomes obvious when the statement is false, since there is no more reason why Jones should be supposed to enter into the statement than why anyone else should. Indeed, the statement would remain significant, though it could not possibly be true, even if there were no man at all. (1919, 167–68)
13. Since I am discussing Kaplan, I will here use his term ‘direct’ to modify ‘reference’, although it is redundant (see Point S0). An expression, like a definite description, that merely denotes an object does not refer to that object, in the sense that the object is not a constituent of propositions expressed by sentences in which the expression occurs.
14. As Kaplan explains (1989b: 579–82), certain things he had previously said, and even his formal system (in Kaplan 1989a), could have suggested that ‘dthat’ is an operator on definite descriptions, with the content of the associated description included in the content of the whole phrase. This would make ‘dthat’ a rigidifier rather than a device of direct reference.
15. To stipulate that any phrase of the form ‘dthat [the F]’ refers “directly” does not, of course, mean that it is guaranteed a referent. It means only that the referent, if there is one, is a constituent of the singular proposition (if there is one) expressed by a sentence in which the phrase occurs.
16. A singular proposition is not only object-involving but object-dependent, in that it would not exist if its object-constituent did not exist (at some time or other). So a singular proposition exists contingently. This does not imply that it exists only when its object-constituent exists. Existing contingently does not make singular propositions temporal.
17. They can also be used predicatively, as when one refers to an object and then describes it as a such-and-such, as when you say of the thing in your hand, ‘This is a pomegranate’. It is arguable that this is not really a quantificational use but a distinctively predicative use, no different in kind from saying, ‘This is red’. It is merely because phrases containing singular common nouns require (in English) an article that one cannot say, ‘This is pomegranate’ (one could say the equivalent of this in Russian). Graff (2001) has argued that all uses of indefinite descriptions are actually predicative, and boldly extends her arguments to definite descriptions. Her account also covers generic uses of definite and indefinite descriptions, as in ‘The tiger has stripes’ and ‘A philosopher is not in it for the money’.
18. I do not mean to suggest that ‘a certain F’ is always used to indicate that the speaker has a particular unspecified individual in mind. He might have in mind merely some unexpressed restriction on ‘F’. For example, one might say, ‘A certain contestant will go home happy’, without specifying that whoever wins the contest in question will go home happy. Similarly, an utterance of the quantified ‘Every author loves a certain book’ could be made true if every author loves, say, the first book he wrote.
19. Karttunen is excluding negative quantificational phrases like ‘no donkey’ and negative cases like ‘Bill didn’t see a unicorn. *It/The unicorn had a gold mane’.
20. In other words, each link in an ‘anaphoric chain’ (Chastain 1975) is treated as having a discourse referent, even if intuitively it does not refer. It should not be supposed, as Chastain (1975) and many others have, that when the links
in the chain (the expressions anaphoric on the indefinite description) are used to refer, the indefinite description itself refers.
21. Unless otherwise indicated, when discussing definite descriptions I will assume that the description occurs in a simple sentence of the form ‘the F is G’. On Russell’s theory, the type of general proposition is what Strawson called a “uniquely existential” or what I call simply a “uniqueness” proposition.
22. Alternatively, an expression could be semantically unspecified with respect to each use—each is compatible with, but neither is determined by, the meaning of the expression. Recanati (1993, ch. 15) and Bezuidenhout (1997) take this line with definite descriptions. They deny that descriptions are semantically ambiguous but do not treat one use as literal and explain the other pragmatically. They do this because intuitively they find referential uses to be no less literal than attributive ones. Accordingly, they suggest that the existence of both uses is symptomatic of semantic underdetermination or what Recanati calls, borrowing a phrase from Donnellan, “pragmatic ambiguity” (perhaps this is what Donnellan had in mind by that phrase). However, from this it implausibly follows that a sentence like ‘The discoverer of X-rays was bald’ does not express a determinate proposition. If we wish to maintain that such a sentence does express a determinate proposition, and does so univocally, the obvious choice is a general proposition, in which case the description functions as a quantifier phrase and only its attributive use is the strictly literal one.
23. Of course, Russell held that ordinary proper names are ‘disguised’ or ‘truncated’ descriptions, in which case they, too, contrary to appearances, are quantificational.
24. For the latest defense of “referential descriptions,” see Devitt 2004. I reply to his main arguments in the final section of Bach 2004b.
25. Soames is not a strict Millian, since he attributes additional descriptive content to proper names of certain sorts. I do not believe that this has any bearing on any points made here.
26. No doubt my own intuitions are as theory-driven as those of the Millians, for in my view, which I defend in Bach 2002, a proper name expresses the property of bearing that very name. This was not Mill’s view, of course, but, interestingly enough, he did write, “When we refer to persons or things by name, we do not convey any information about them, except that those are their names” (1872, 22; my emphasis).
27. As promised by his title, “The Multiple Uses of Indexicals,” Quentin Smith (1989) identifies various unorthodox ways in which indexicals can be used. Accordingly, he rejects the view, such as Kaplan’s theory of character, according to which reference is determined as a simple function of context. No simple rule can directly account for this variety of uses. However, he still thinks that each use is rule-governed and proposes that for each indexical there is a “meta-rule” that determines, as a function of a context, which reference-determining rule is operative (at least in cases where the indexical is used to refer). Unfortunately, his statement of these meta-rules is too sketchy and schematic to be very helpful. Moreover, he makes no attempt to show that all the uses he identifies for a given indexical are literal uses. It seems that some are not, in which case there is no need for a rule of the sort he imagines to cover them. On the other hand, since indexicals can be used literally but nonreferringly, if there is such a meta-rule, it would have to take those uses into account, in which case it would not be limited
to determining, as a function of a context, which reference-determining rule is operative. However, it is not clear that there is any rule that determines semantic content as a function of context (see Point L3, which follows).
28. In his commentary on an earlier version of this paper (APA Pacific Division, March 26, 2004), Jeff King offered strong reasons to suppose that these uses are not literal, and thus require a partly pragmatic explanation.
29. Being of the form ‘that F’, these may also be called ‘demonstrative descriptions’. See Braun 1994 for a critical comparison of various referential and non-referential approaches and their respective accounts of the semantic role of the ‘F’ in ‘that F’. See King 2001 for a thoroughgoing defense of the claim that complex demonstratives are quantifier phrases.
30. Two qualifications here. First, to say that an expression is used to refer does not entail that it is successfully used to refer. For example, a use of the description ‘the dagger I see before me’ could count as referential even if there is no dagger before the speaker. Also, premise 1 in the ESA says ‘virtually any expression’ to allow for the case of ‘I’, ‘today’, and a few others (“pure” indexicals). ‘You’ might be added to that list, despite the fact that it has an impersonal use, for it is not generally true that the second-person pronoun has an impersonal use. For example, French has ‘on’ rather than an impersonal ‘tu’ or ‘vous’, and German has ‘man’ rather than an impersonal ‘du’ or ‘sie’. Also, I wouldn’t argue that ‘he’ and ‘she’ have nonreferential uses because they are colloquially used as count nouns (“It’s a she!”).
31. In the case of demonstratives, Kaplan points out the need for “completing demonstrations” and recognizes some of the problems this poses for his framework of character and content. David Braun (1996) has made the best effort I know of to solve these problems broadly within that framework, but it requires an additional level of meaning and requires that demonstrations be explicitly represented. Leaving aside nonreferring uses of demonstratives, it is not clear to me how Braun’s account can be extended to handle referring uses of demonstratives that do not involve demonstrations or cases in which what is referred to is not what is demonstrated, as in many of Nunberg’s (1993) well-known examples (see also Borg 2002). Nunberg (2004) now disavows describing these as cases of “deferred reference.”
32. Points similar to those of this paragraph are made incisively by Schiffer (except that he invokes the notion of token reference):
Meaning-as-character may initially seem plausible when the focus is on a word such as ‘I’, but it loses plausibility when the focus is on other pronouns and demonstratives. What “contextual factors” determine the referent of the pronoun ‘she’ in a context of utterances? . . . Evidently, the meaning of ‘she’ (very roughly speaking) merely constrains the speaker to refer to a female. We do not even have to say that it constrains the speaker to refer to a contextually salient female, since the speaker cannot intend to refer to a particular female unless he expects the hearer to recognize to which female he is referring, and the expectation of such recognition itself entails that the speaker takes the referent to have an appropriate salience. What fixes the referent of a token of ‘she’ are the speaker’s referential intentions in producing that token, and therefore in order for Kaplan to accommodate ‘she’, he would have to say that a speaker’s referential intentions constitute one more component of those n-tuples that he construes as ‘contexts’. The trouble with this is that there is no work for Kaplanian contexts to do once
one recognizes speakers’ referential intentions. The referent of a pronoun or demonstrative is always determined by the speaker’s referential intention. (Schiffer 2005, sec. 2)
33. Our examples are limited to non-reflexive pronouns, which can also be used to make deictic reference. Linguists reserve the term ‘anaphor’ for reflexives and reciprocals.
34. It might seem that the property of being salient, which has figured in our discussion of the pragmatics of reference, somehow figures in the meaning of demonstrative phrases. For example, on John Perry’s account of the content of a demonstrative phrase, the “basic content of [an utterance of ‘that f’] is the identifying condition, being the salient f to which the speaker of [that utterance of ‘that f’] directs attention” (2001, 77). But why suppose the role of salience is anything more than pragmatic? A speaker who wishes to use a simple demonstrative or demonstrative phrase to refer to something needs to make sure that the intended referent is salient not because the meaning of ‘that’ requires this but because otherwise his audience would not be able to figure out what he is referring to. If he uses the demonstrative to (try to) refer to something that isn’t salient, he is not misusing the word ‘that’, in the sense of using it to mean something it doesn’t mean (as he would if, say, he thought ‘honorary’ meant what ‘honorable’ means). Rather, he would be committing the pragmatic mistake of trying to refer to something that his listener would have no reason to take him to be referring to. It would be like correctly using arcane words knowing full well that one’s audience was unfamiliar with them. Obviously it is not part of the meaning of arcane words that they be uttered only to people who understand them.
35. Jason Stanley posed this objection in his commentary on an earlier version of this paper (APA Pacific Division, March 26, 2004).
36. For example, if it is true that Hammurabi believed that Hesperus is visible only in the evening, then it is true that he believed that Phosphorus is visible only in the evening. We may prefer to say the former, because what we say is sensitive to a pragmatic “requirement that the reporter be maximally faithful to the words of the agent unless there is reason to deviate” (Soames 1988, 123). However, the anti-substitution intuition, insofar as it pertains to what the belief sentence itself says, betrays an implicit confusion between what the sentence says and what uttering it conveys, namely, that the subject believes the proposition expressed by the ‘that’-clause by taking that proposition in a way that is pragmatically associated with its wording. From a Millian point of view, any difference between two co-referring proper names cannot be semantic.
37. I should cancel any implication that Kripke 1980 is a defense of Millianism. As Soames points out, “nowhere in Naming and Necessity, or anywhere else, does Kripke say what the semantic content of a name is” (2002, 5).
38. I develop this picture in Bach 2004c. It is set against the background of the neo-Gricean framework presented in previous work (Bach and Harnish 1979 and Bach 1994, 2001, and 2002). Also, see Levinson 2000 for a comprehensive discussion of regularities in speaker meaning that go beyond linguistic meaning. His numerous examples illustrate the error of supposing that the semantic content of an indicative sentence is what it is most likely to be used to assert. Most utterances involve what I call “conversational impliciture,” in which what the speaker means is not made fully explicit.
39. An incomplete definite description is one whose matrix is satisfied by more than one individual. Most definite descriptions we use are incomplete. Usually we use ‘the book’ and ‘the car’, for example, to refer to a certain readily identifiable book or car.
40. Similarly, it seems to me that sentence (a) has a reading such that on the above scenario, where Aristotle were his parents’ first born, it would be true.
(a) Aristotle might have been Aristocrates.
I am not committing the vulgar mistake of suggesting that (a) has a true reading because it could have semantically expressed a different proposition than the one it does express. This would indeed be to “confuse use and mention,” which Kripke warns against. Kripke notes the difference between these three theses: “(i) that identical objects are necessarily identical, (ii) that true identity statements between rigid designators are necessary; (iii) that identity statements between what we call ‘names’ in actual language are necessary” (1980, 4). And, as he points out, thesis (iii) follows from (ii) only if ordinary names are rigid. However, he does not consider the possibility that a sentence like (29) has a reading on which it is not an identity statement at all, and involves the ‘is’ of predication rather than identity.
REFERENCES
Austin, J. L. 1962. How To Do Things with Words. Oxford: Oxford University Press.
Bach, K. 1987/1994. Thought and Reference, paperback ed., revised with postscript. Oxford: Oxford University Press.
Bach, K. 1994. “Conversational Impliciture,” Mind and Language 9: 124–62.
Bach, K. 1999. “The Semantics-Pragmatics Distinction: What It Is and Why It Matters,” in K. Turner (ed.), The Semantics-Pragmatics Interface from Different Points of View. Oxford: Elsevier, 65–84.
Bach, K. 2000. “A Puzzle about Belief Reports,” in K. Jaszczolt (ed.), The Pragmatics of Propositional Attitude Reports. Oxford: Elsevier, 99–109.
Bach, K. 2001. “You Don’t Say?” Synthese 128: 15–44.
Bach, K. 2002. “Giorgione Was So-called Because of His Name,” Philosophical Perspectives 16: 73–103.
Bach, K. 2004a. “Speech Acts and Pragmatics,” in L. Horn and G. Ward (eds.), The Handbook of Pragmatics. Oxford: Blackwell, 463–87.
Bach, K. 2004b. “Descriptions: Points of Reference,” in A. Bezuidenhout and M. Reimer (eds.), Descriptions and Beyond. Oxford: Oxford University Press.
Bach, K. 2004c. “Context ex Machina,” in Z. Szabó (ed.), Semantics vs. Pragmatics. Oxford: Oxford University Press.
Bach, K. and Harnish, R. M. 1979. Linguistic Communication and Speech Acts. Cambridge, MA: MIT Press.
Barwise, J. and Perry, J. 1983. Situations and Attitudes. Cambridge, MA: MIT Press.
Bezuidenhout, A. 1997. “Pragmatics, Semantic Underdetermination, and the Referential-Attributive Distinction,” Mind 106: 375–410.
Borg, E. 2002. “Pointing at Jack, Talking about Jill: Understanding Deferred Uses of Demonstratives and Pronouns,” Mind and Language 17: 489–512.
Braun, D. 1993. “Empty Names,” Noûs 27: 449–69.
Braun, D. 1994. “Structured Characters and Complex Demonstratives,” Philosophical Studies 74: 193–219.
Braun, D. 1996. “Demonstratives and Their Linguistic Meanings,” Noûs 30: 145–73.
Braun, D. 1998. “Understanding Belief Reports,” Philosophical Review 107: 555–95.
Braun, D. 2005. “Empty Names, Fictional Names, and Mythical Names,” Noûs 39: 596–631.
Burge, T. 1973. “Reference and Proper Names,” Journal of Philosophy 70: 425–39; reprinted in P. Ludlow (ed.), Readings in the Philosophy of Language. Cambridge, MA: MIT Press, 593–608.
Chastain, C. 1975. “Reference and Context,” in K. Gunderson (ed.), Minnesota Studies in the Philosophy of Science, vol. 7, Language, Mind, and Knowledge. Minneapolis: University of Minnesota Press, 194–269.
Devitt, M. 2004. “The Case for Referential Descriptions,” in A. Bezuidenhout and M. Reimer (eds.), Descriptions and Beyond. Oxford: Oxford University Press.
Donnellan, K. 1966. “Reference and Definite Descriptions,” Philosophical Review 75: 281–304.
Evans, G. 1982. The Varieties of Reference. Oxford: Oxford University Press.
Graff, D. 2001. “Descriptions as Predicates,” Philosophical Studies 102: 1–42.
Grice, H. P. 1989. Studies in the Way of Words. Cambridge, MA: Harvard University Press.
Gundel, J., Hedberg, N., and Zacharski, R. 1993. “Cognitive Status and the Form of Referring Expressions in Discourse,” Language 69: 274–307.
Kaplan, D. 1979. “Dthat,” in P. French, T. Uehling, and H. Wettstein (eds.), Contemporary Perspectives in the Philosophy of Language. Minneapolis: University of Minnesota Press, 383–400.
Kaplan, D. 1989a. “Demonstratives,” in J. Almog, J. Perry, and H. Wettstein (eds.), Themes from Kaplan. Oxford: Oxford University Press, 481–563.
Kaplan, D. 1989b. “Afterthoughts,” in J. Almog, J. Perry, and H. Wettstein (eds.), Themes from Kaplan. Oxford: Oxford University Press, 565–614.
Karttunen, L. 1976. “Discourse Referents,” in J. D. McCawley (ed.), Syntax and Semantics 7. New York: Academic Press, 363–86.
King, J. 1987. “Pronouns, Descriptions, and the Semantics of Discourse,” Philosophical Studies 51: 341–63.
King, J. 2001. Complex Demonstratives: A Quantificational Account. Cambridge, MA: MIT Press.
Kripke, S. 1977. “Speaker’s Reference and Semantic Reference,” Midwest Studies in Philosophy 2: 255–76.
Kripke, S. 1980. Naming and Necessity. Cambridge, MA: Harvard University Press.
Levinson, S. 2000. Presumptive Meanings: The Theory of Generalized Conversational Implicatures. Cambridge, MA: MIT Press.
Lockwood, M. 1975. “On Predicating Proper Names,” Philosophical Review 84: 471–98.
Ludlow, P. and Neale, S. 1991. “Indefinite Descriptions: In Defense of Russell,” Linguistics and Philosophy 14: 171–202.
Mill, J. S. 1872. A System of Logic, definitive 8th edition. 1949 reprint. London: Longmans, Green and Company.
Neale, S. 1990. Descriptions. Cambridge, MA: MIT Press.
Neale, S. 2004. “Pragmatics and Binding,” in Z. Szabó (ed.), Semantics vs. Pragmatics. Oxford: Oxford University Press.
Nunberg, G. 1993. “Indexicality and Deixis,” Linguistics and Philosophy 16: 1–43.
Nunberg, G. 2004. “Descriptive Indexicals and Indexical Descriptions,” in A. Bezuidenhout and M. Reimer (eds.), Descriptions and Beyond. Oxford: Oxford University Press.
Perry, J. 2001. Reference and Reflexivity. Stanford: CSLI Publications.
Predelli, S. 1998. “I am not here now,” Analysis 58: 107–115.
Recanati, F. 1993. Direct Reference: From Language to Thought. Oxford: Blackwell.
Russell, B. 1905. “On Denoting,” Mind 14: 479–93.
Russell, B. 1917/1957. “Knowledge by Acquaintance and Knowledge by Description,” in Mysticism and Logic, paperback edition. Garden City, NY: Doubleday Anchor, 202–24.
Russell, B. 1918/1956. “The Philosophy of Logical Atomism,” in Logic and Knowledge. London: George Allen and Unwin, 175–281.
Russell, B. 1919. “Descriptions,” ch. 16 of Introduction to Mathematical Philosophy. London: George Allen and Unwin, 167–80.
Salmon, N. 1986. Frege’s Puzzle. Cambridge, MA: MIT Press.
Salmon, N. 1998. “Nonexistence,” Noûs 32: 277–319.
Schiffer, S. 1977. “Naming and Knowing,” Midwest Studies in Philosophy 2: 28–41.
Schiffer, S. 2005. “Russell’s Theory of Descriptions,” Mind 114: 1135–1183.
Smith, Q. 1989. “The Multiple Uses of Indexicals,” Synthese 78: 167–91.
Soames, S. 1988. “Substitutivity,” in J. J. Thomson (ed.), On Being and Saying: Essays in Honor of Richard Cartwright. Cambridge, MA: MIT Press, 99–132.
Soames, S. 2002. Beyond Rigidity. Oxford: Oxford University Press.
Strawson, P. F. 1950. “On Referring,” Mind 59: 320–44.
Strawson, P. F. 1964. “Intention and Convention in Speech Acts,” Philosophical Review 73: 439–60.
II
WHAT IS THE APPROPRIATE LINGUISTIC ANALYSIS OF DIFFERENT FORMS OF REFERRING EXPRESSION?
3
Issues in the Semantics and Pragmatics of Definite Descriptions in English
Barbara Abbott
1. Introduction
As is well known, Russell assigned indefinite and definite descriptions the interpretations represented schematically in (1) and (2), respectively, where “CNP” stands for “Common Noun Phrase” in the sense used by Montague (1973), that is, as standing for the constituent which a determiner combines with to form a noun phrase (NP).
(1) a. . . . a/an CNP . . .
    b. ∃x[CNP(x) & . . . x . . .]
(2) a. . . . the CNP . . .
    b. ∃x[CNP(x) & ∀y[CNP(y) → y=x] & . . . x . . .]
Examples (3) and (4) are illustrations.
(3) a. Mary bought a car that she liked.
    b. ∃x[Car(x) & Liked(m, x) & Bought(m, x)]
(4) a. Mary bought the car that she liked.
    b. ∃x[Car(x) & Liked(m, x) & ∀y[[Car(y) & Liked(m, y)] → y=x] & Bought(m, x)]
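As an illustration only, the formulas in (3b) and (4b) can be evaluated mechanically over a small finite model. The Python sketch below is not part of Russell's analysis or of this chapter; the three-car domain, the entity names `c1` to `c3`, and the helper `make_model` are all hypothetical choices made for the example. It simply checks each formula against scenarios in which Mary liked no car, exactly one car, or two cars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """A toy model: a domain plus the extensions of three predicates."""
    domain: frozenset
    cars: frozenset            # Car(x)
    liked_by_mary: frozenset   # Liked(m, x)
    bought_by_mary: frozenset  # Bought(m, x)


def cnp(model, x):
    """The CNP 'car that she liked': Car(x) & Liked(m, x)."""
    return x in model.cars and x in model.liked_by_mary


def indefinite_3b(model):
    """(3b): some x satisfies the CNP and was bought by Mary."""
    return any(cnp(model, x) and x in model.bought_by_mary
               for x in model.domain)


def definite_4b(model):
    """(4b): as (3b), plus the clause that every y satisfying the CNP is x."""
    return any(
        cnp(model, x)
        and all((not cnp(model, y)) or y == x for y in model.domain)
        and x in model.bought_by_mary
        for x in model.domain
    )


def make_model(liked, bought):
    """Hypothetical helper: three cars, with the given liked/bought sets."""
    domain = frozenset({"c1", "c2", "c3"})
    return Model(domain, cars=domain,
                 liked_by_mary=frozenset(liked),
                 bought_by_mary=frozenset(bought))


no_liked_car = make_model(liked=[], bought=["c1"])
one_liked_car = make_model(liked=["c1"], bought=["c1"])
two_liked_cars = make_model(liked=["c1", "c2"], bought=["c1"])

for name, m in [("no liked car", no_liked_car),
                ("one liked car", one_liked_car),
                ("two liked cars", two_liked_cars)]:
    print(name, indefinite_3b(m), definite_4b(m))
```

On this toy model the two formulas agree when Mary liked no car or exactly one, and come apart only in the two-car scenario, where (3b) holds and (4b) does not. Whether that last verdict should count as falsity or as some other kind of infelicity is a question the formulas themselves do not settle.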
The difference, as is obvious, is the underlined clause expressing uniqueness (the conjunct beginning with ∀y in (2b) and (4b)), that is, exhaustive possession by the entity in question of the property expressed by the CNP. Szabó (2000) and Ludlow and Segal (2004) (following Kempson (1975), Breheny (1997), and others) defend analyses on which definite descriptions are assigned the same quantificational interpretation as Russell assigned to indefinite descriptions. Thus, on both accounts (3a) as well as (4a) would be given the quantificational analysis in (3b). Both proposals acknowledge that definite descriptions differ from indefinites in their implications, where “implication” is to be understood as neutral between semantic and pragmatic conveyance. One of these implications is what is commonly termed “familiarity”: an assumption that the denotation of the NP¹ has already been introduced, as such, to the addressee of the utterance. The other is commonly termed “uniqueness,” in its simplest form as expressed by the underlined clauses in examples (2b) and (4b), but frequently relativized to context and addressee. (We will return to this issue shortly.) Both analyses take familiarity to be more central to the interpretation of definite descriptions, and attempt to derive the uniqueness implication as a conversational implicature.
I agree that, as far as contribution to truth conditions goes, (3b) may suffice for both (3a) and (4a). However, my position differs from that in both papers in holding that uniqueness is a part of the conventional import of definite descriptions and that familiarity is not conventionally, but only conversationally, associated with them. (Compare also the account in Gundel et al. 1993, 2001.)
Let me be more specific about the view I want to defend. It is that use of a definite description presupposes the existence and uniqueness clauses of Russell’s analysis (as in (2b) and (4b)), where “presupposes” means roughly ‘conveys as something not needing assertion’ (cf. Abbott 2000). The first of these presuppositions (the existence one) is also an entailment (as encoded in (1b) and (3b)). The second (uniqueness) seems aptly described as a conventional implicature: an implication conveyed semantically, but whose falsehood would not be felt as critically damaging to the truth of the utterance as a whole. Thus, (4a) conveys the information that Mary had liked one and only one car as something not needing assertion, whereas (3a) asserts that Mary had liked a car (and bought it) but says nothing about how many she liked. If Mary had not liked any car, then neither (3a) nor (4a) would be true. However, if Mary had (recently) liked two cars (and bought one of them), then (4a) might be felt to be true although anomalous in conventionally implicating that she had only liked one car.
Before plunging forward, it is necessary to clear up some points surrounding the relevant concept of “uniqueness.” First of all, Russell’s analysis was couched in a logic geared to speak only of discrete entities one at a time. Of course in natural language we do not so confine ourselves, so at the very least if something like this analysis is to be maintained it must be amplified
accordingly. Fortunately, Sharvy (1980) has shown a way to maintain the spirit of Russell’s analysis while extending it to deal with CNPs headed by mass and plural nouns, namely, by restating uniqueness as exhaustiveness (cf. also Hawkins 1978, 1991). Incidentally, one could hold, with Sharvy, that “the primary use of ‘the’ is not to indicate uniqueness. Rather, it is to indicate totality; implication of uniqueness is a side effect” (Sharvy 1980, 623). Or one could argue that the primary use of ‘the’ is to indicate uniqueness, and that totality (or exhaustiveness of application of the descriptive content of the relevant CNP) is a side effect of achieving uniqueness with mass and plural CNPs. In any case, I will continue to use the term “uniqueness” for whatever it is we’re talking about.
But there is another, even more pressing issue: the problem of incomplete definite descriptions. Someone can say, baldly, ‘Please put the book on the table’, in full knowledge that the world is littered with books and tables. Probably the preferred way to deal with this problem currently is to modify the notion of uniqueness to something like “identifiability in context,” the idea being that use of ‘the’ signals an assumption that the addressee can individuate the speaker’s intended referent from among the potential referents in that particular context of utterance. (See, e.g., Birner and Ward 1998, 122.) As a statement of the net effect, this is fine, although ultimately I would want to try to argue that it is something Russellian, as elaborated by Sharvy, which is conventionally encoded in definite descriptions, and that the part to do with what the addressee is expected to be able to do can be derived from that plus Grice’s rules of conversation (Grice 1975). I won’t attempt to make the argument here in full, though I will try to sketch something partial along these lines.
The remainder of this paper is organized as follows. In the next two sections I give evidence supporting my claims that familiarity is not conventionally associated with ‘the’ and that uniqueness is conventionally associated with ‘the’. Following that, I look at the case of stressed ‘the’, and then look into the derivation of the familiarity implication as a conversational implicature. The penultimate section sketches very briefly the direction of my response to some of the other arguments presented in the papers of Szabó and Ludlow and Segal, and the final section concludes.
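The restatement of uniqueness as exhaustiveness mentioned above can be given a rough computational gloss. The following Python sketch rests on several assumptions that are not in the text: pluralities are modelled as frozensets of atoms, ‘part of’ is modelled as the subset relation, and the function names `singular`, `plural_closure`, and `the` are hypothetical. On this simplified picture, ‘the F’ picks out the F of which every F is a part, if there is one. The optional `context` argument is a crude stand-in for restricting attention to the entities in a given context of utterance; it is not meant as an account of how such restriction actually arises.

```python
from itertools import combinations


def singular(atoms):
    """Extension of a singular count CNP: each atom taken on its own."""
    return {frozenset([a]) for a in atoms}


def plural_closure(atoms):
    """Extension of a plural CNP: every non-empty sum of the atoms."""
    atoms = list(atoms)
    return {frozenset(c)
            for r in range(1, len(atoms) + 1)
            for c in combinations(atoms, r)}


def the(extension, context=None):
    """Return the member of `extension` that has every member as a part.

    Returns None when no such member exists (including when the
    extension is empty). If `context` is given, only members made up
    entirely of atoms in `context` are considered.
    """
    if context is not None:
        extension = {x for x in extension if x <= context}
    for candidate in extension:
        if all(other <= candidate for other in extension):
            return candidate
    return None


books = {"b1", "b2"}

print(the(singular(books)))        # None: neither book contains the other
print(the(plural_closure(books)))  # the sum of both books
print(the(singular(books), context=frozenset({"b1", "t1"})))  # just b1
```

With a singular CNP the exhaustiveness requirement reduces to ordinary uniqueness, since a single atom can contain every member of the extension only if it is the sole member. With a plural CNP it selects the totality. Restricting the context to one book and one table lets an incomplete description such as ‘the book’ succeed.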
2. The Familiarity Implication Is Not Conventional
Ludlow and Segal make the explicit suggestion that familiarity (their term is “givenness”) is a conventional implicature of definite descriptions, but they seem somewhat unclear on what that suggestion entails. For one thing, they seem to believe that conventional implicatures are inferred, and for another, that they can be overridden.² Perhaps sensing the inconsistency here, they note in a footnote that they are not “wedded” to the idea of givenness as a conventional implicature, commenting that “all of what we say here can be recast in terms of explicature or inference by one’s favorite account of how
the new/given distinction is to be understood.” (Puzzlingly, they continue, “Anne Bezuidenhout has observed that it might be possible to build the new/given information into the semantics of the determiner. In principle we don’t have a problem with that move either. . . .” (Ludlow and Segal 2004, 425n5, italics in original; the puzzlement is because conventional implicatures are semantic).) There is a suggestion here, in the use of the term “inference,” that familiarity, as well as uniqueness, could be derived as a conversational implicature. (Szabó also often sounds as though he had something like this in mind, although he has said elsewhere (Szabó 2003) that that was not his intention.) In any case, there would be a fundamental problem with proposing to derive familiarity or givenness as a pragmatic inference, given that one is also proposing that definite descriptions are assigned the same conventional meaning as indefinites. The problem is that if there is no conventional distinction between ‘the’ and ‘a’, then there is no way to derive familiarity in one case and not the other. Any pragmatic distinction of this type would have to be based on some kind of conventional difference. (This difficulty is discussed in more detail in Abbott 2003.)
So let us turn to the idea that familiarity or givenness is a conventional implicature. It is a characteristic of conventional (as opposed to conversational) implicatures that they cannot be canceled or “overridden.” This follows from their nature: if these implications are linguistically encoded, then there is no way to deny them on the spot without sounding like you are contradicting yourself. Consider (5):
(5) # Even Kim, who is very smart, could solve that problem.
‘Even’ conventionally implicates a relatively unexpected instance. This implicature (which is not inferred but semantically encoded in the word) conflicts with the expressed assumption that Kim is very smart, and hence would be expected to solve problems. The result is anomaly. Now contrast the case of the familiarity implication of definites, using (6):
(6) The new curling center at MSU, which you probably haven’t heard of, is the first of its kind.
In (6) any assumption that the addressee is familiar with the curling center is explicitly denied and yet the result is perfectly felicitous. (The denial must be appropriately hedged, of course, since it would be infelicitous for independent reasons to make a bald assertion about the knowledge state of the addressee.) If familiarity or givenness were a part of the conventional meaning of definite descriptions, (6) should be anomalous, but it is not. (See also the research reported in Gundel et al. 2001 and the works cited there, suggesting that in some contexts, as many as a third to a half of occurrences of definite descriptions may introduce unfamiliar referents.)
It should be acknowledged that some definite descriptions are used anaphorically. Standard examples are of the type that Heim gave in presenting
her version of the familiarity theory of definiteness, examples like that in (7) (cf. Heim 1982, 275).
(7) A woman and a man met on the street. The woman said ‘Hi’.
Even if not all definite descriptions convey the familiarity idea, it might be suggested, still some, like the one in (7), do convey that as a part of their conventional meaning. I want to resist this move for several reasons. One is that, given that familiarity is not in general conventionally associated with definite descriptions (as indicated by (6) as well as the research cited in the last paragraph), this move would seem to require that we regard definite descriptions, or the definite article, as ambiguous. This is both contrary to speaker intuitions and methodologically distasteful. Another reason for resisting the claim that definite descriptions at least sometimes convey familiarity conventionally is that it may not be necessary, if we can give a sufficiently convincing account of how the uniqueness approach can account for such examples. We will return to this issue later.
3. The Uniqueness Implication Is Conventional
Both Szabó, and Ludlow and Segal, attempt to treat the uniqueness implication as a conversational implicature, derived pragmatically with the help of the element of familiarity or givenness, however it is obtained. There are both similarities and differences in the two suggested derivations. Szabó’s approach, which is adapted from the “file card” approach of Heim 1982, involves a potential conflict between familiarity and a pragmatic principle he calls ‘Non-arbitrariness’:
(8) Non-arbitrariness: When filing an utterance, don’t make arbitrary choices.
The idea behind (8) is that it should be clear where the addressee is to look for a referent, in other words that an addressee shouldn’t have to make an arbitrary choice between two potential referents. (Ideally perhaps (8) should be rephrased to apply to speakers, and enjoin them from forcing addressees to make arbitrary choices. It might then be collapsed with Grice’s rules of Manner.) Suppose a speaker has made an assertion using a definite description ‘the F’. The addressee, Szabó explains, will reason as follows (p. 41; an F card is a file card representing an entity with the property denoted by the CNP F):
(9) Suppose I had two private F cards with incompatible conditions. Then I could not have filed A’s utterance . . . : either Familiarity or Non-arbitrariness would have been violated.
So the speaker can conclude that the addressee does not have two private incompatible F cards, that is, that he or she knows of at most one F. There are a couple of problems with this line of derivation. One is that it is not so clear that Familiarity and Non-arbitrariness must conflict in this way. Part of the
stipulation for the conflict is that the two potential F cards have incompatible conditions. That means the two potential F cards must have different information on them, which in turn suggests that choosing one of them over the other need not be an arbitrary choice after all. Another problem is presented by definite descriptions with plural or mass head nouns, which Szabó does not address in his paper. It is not clear how this derivation will extend to those cases.
The derivation sketched by Ludlow and Segal is also potentially liable to the difficulties presented by plural and mass NPs, but suffers from a more basic problem as well. They use the traditional Gricean format for displaying the calculability of conversational implicatures (p. 427):
(10) a. S has expressed the proposition that [∃x: Fx](Gx). [Recall that this is Ludlow and Segal’s (and Szabó’s) analysis of the truth-conditional contribution of definite descriptions.]
     b. There is no reason to suppose that S is not observing the CP and maxims.
     c. S could not be doing this unless he thought that [ιx: Fx](Gx).
     Gloss: By invoking the determiner ‘the’, S intends to communicate that whatever F or Fs he is talking about is/are given in the conversational context. [This is the conventional implicature of givenness, or familiarity.] By refraining from using a plural noun, S intends to communicate that just one F is given in the conversational context. If there were more than one F given in the context, S would have used the plural definite description (otherwise S would flout the maxim of quantity). . . .
But it is not a violation of the rule of quantity to speak indiscriminately about one of a number of familiar entities. One can do this without any conversational strain using a phrase like ‘one of these students’. It would only be a violation of Quantity if we assumed that the descriptive content of F were sufficient to determine a unique entity, but that would be assuming a uniqueness implication, rather than deriving it.
In any case there is a further problem with any proposal to derive the uniqueness implication pragmatically, as a conversational implicature or some other type of non-conventional inference. The problem is that we would then expect this implication of uniqueness to be cancelable, unlike the case with the implication of unexpectedness with ‘even’ (as in (5) above), but like the case with familiarity (as in (6)). The problem for both sets of authors is that this implication behaves more like a conventional implicature than a conversational implicature, as shown by (11).
(11) # Russell was the author of Principia Mathematica; in fact, there were two.
In (11) there has been an attempt to cancel the implication of uniqueness, but the result is anomaly, just as in other cases of attempted cancellation of conventional implicatures. And we get the same effect with a plural definite description, as shown in (12).
(12) # Akmajian and Demers were the authors of Linguistics; in fact, there were three.
The anomaly of (11) and (12) supports the claim that uniqueness/exhaustiveness is conventionally encoded in the definite article, and not derived conversationally.
A major claim of Szabó’s paper is that the implication of uniqueness for definite descriptions and the implication of nonuniqueness for indefinite descriptions are symmetrical and should be treated as such. However, this does not seem to be the case. Given that definite descriptions encode uniqueness, the nonuniqueness associated with indefinite descriptions may be readily derived as a conversational implicature: a choice of ‘a’ rather than ‘the’ conveys that ‘the’ is not appropriate, hence that the descriptive content of the NP does not apply uniquely (within the local discourse context). This analysis, which makes a number of subtle predictions concerning contexts in which the implication of nonuniqueness will and will not occur, is defended in some detail in Hawkins 1991; see also Gundel et al. 1993. Note too that the implication of nonuniqueness in the case of indefinites is cancelable:
(13) Russell is an author of Principles of Mathematics, in fact the only one.
Unlike (5), (11), and (12), (13) does not sound self-contradictory, confirming the claim that nonuniqueness is only a conversational implicature of indefinite descriptions.³
4. Stressed THE
There are other arguments for taking uniqueness rather than familiarity to be the essence of definite descriptions (see, e.g., Hawkins 1991, Birner and Ward 1994). I want to mention one argument here that was given in Abbott 1999, and replied to by Ludlow and Segal. That argument involved noting that when ‘the’ is stressed, it is an implication of uniqueness, rather than familiarity, which is fronted. One example is (14):
(14) That wasn’t a reason I left Pittsburgh, it was the reason. [= Abbott 1999, ex. 2]
Concerning this example, Ludlow and Segal remark:
We take it that someone may very well utter [14] with stress as indicated despite having several reasons for leaving Pittsburgh. Stressing ‘the’ indicates that the reason in question was not merely one of many reasons for leaving but rather the causally determinate reason—the big reason. (Ludlow and Segal 2004, 435)
In so remarking, Ludlow and Segal echo comments of Epstein (1996), who described the stressed ‘the’ in (15) as signaling ‘prominence’ or ‘great importance’.
(15) In other countries, soccer is the sport. If the national team loses, there could be a coup. [Los Angeles Times 6/5/94, p. C9; italics in the original] [= Epstein 1996, ex. 2]
However, as I argued in my 1999 paper in reply to Epstein,
it seems both more specific and more accurate, to describe [15] as conveying this prominence through hyperbole. Obviously soccer isn’t the only sport in any country, but to describe it as such in forceful terms, as is done in [15], is to convey its prominence in a specific way. So on my account the speaker of [15] says literally that soccer is the only sport. This is almost certainly false, invoking standard Gricean mechanisms to arrive at the hyperbolic understanding. In support of this claim notice that one could replace THE sport in [15] with the only sport, achieving the same effect although in a slightly more heavy-handed way. . . . [16] below is another naturally occurring example of stressed the where the referent is not actually the unique satisfier of the description. . . . In this example the hyperbole is tacitly acknowledged.⁴ (Abbott 1999, 3)
(16) ‘People say it’s the night that the movers and shakers are going,’ a mischievous-sounding Gehry said when he was asked about his Thursday invitation. ‘I was told, of course, that every night is the most important one,’ Gehry . . . added. ‘But I was told that this was the most important one.’ [The New Yorker 12/22 and 29/97, p. 50; italics in original] [= Abbott 1999, ex. 8]
Note too the number of other cases where expressions literally expressing uniqueness have come through hyperbole to convey prominence or some other kind of specialness: e.g., ‘the one and only X’, ‘X is very unique’, ‘they broke the mold’. . . . Of course none of these examples seem to have anything to do with the familiarity of the referents involved, and hence are problematic for the familiarity theory.
Ludlow and Segal do provide one kind of case which might look at first as though it were familiarity rather than uniqueness which was being emphasized with stressed ‘the’:
Ludlow’s third grade teacher had a husband named William Faulkner. On vacation in the South, he was asked, “are you the William Faulkner”? Presumably, the questioner was not asking if he was the unique individual named William Faulkner . . ., but was asking whether this individual was the famous—i.e. given or familiar—William Faulkner. (Ludlow and Segal 2004, 435, italics in original)
One problem with this analysis is that both William Faulkners are given, in the sense of known to the addressee, in this example. Furthermore we have to assume that to the addressee, he himself is the more given or familiar of the two. Nevertheless, of course, his answer should be No. There is also the fact
that the phrase the William Faulkner imparts a sense of luster to the referent. Ludlow and Segal would like to see this as an extrapolation from givenness. But one could just as well see the prominence of the author as granting him a unique salience, and the stressed the as expressing that idea. That would have the additional advantage of making this example consistent with those above, where prominence is obviously seen as a species of uniqueness and not givenness.
5. Deriving Familiarity as a Conversational Implicature I have argued above that familiarity or givenness is not conventionally encoded in definite descriptions, and that uniqueness is. It remains to say something more about where the implication of familiarity comes from, in those cases where it does arise. The prototypical cases are anaphoric uses as in (7) above, repeated here. (7)
A woman and a man met on the street. The woman said ‘Hi’.
As noted at the outset, I am assuming that a definite description will convey as a presupposition that there is one and only one entity meeting the descriptive content of the CNP. In a case such as (7), with a very incomplete definite description like the woman, it is clear that this uniqueness must hold within a narrowly circumscribed circumstance and, handily enough, the preceding sentence gives exactly such a circumstance. Furthermore it follows from Grice’s principles that sequences of sentences will be relevant to each other, unless the subject is explicitly changed. An inference that the speaker intends to refer to the female entity introduced in this preceding sentence is almost unavoidable, a fact of which both speaker and addressee must be assumed to be mutually aware. Despite the naturalness of this derivation let us for a moment consider a claim that familiarity is conventionally encoded in definite descriptions used anaphorically. Could one then try to supplement such a claim by showing that the familiarity is not cancelable? The problem is that in any normal circumstances the utterer of (7) will intend to refer to the woman they have introduced into the conversation. As suggested above, the discourse would be incoherent otherwise. But this does not mean that we need to regard familiarity (or anaphoricity) as encoded in the definite article, any more than we need to regard it as encoded in proper names. The implication of familiarity comes from general principles for coherent discourse, which probably follow from even more general principles as Grice suggested. So although (7′) is definitely anomalous: (7′)
# A woman and a man met on the street. The woman (not the one I just mentioned) said ‘Hi’.
The anomaly is one of incoherent discourse and not self contradiction.
6. Brief Replies to Some Other Arguments Szabó and Ludlow and Segal give two similar kinds of arguments in favor of their approaches. One kind involves citing general facts about determiner interpretation that seem to show that, were the uniqueness implication to be conventionally encoded in an article, English would be out of line, in some sense, with the languages of the world. The other involves the explanation for the definiteness effect in existential sentences. I can only briefly sketch the direction a complete reply to these arguments would take. On the general claims about possible determiner interpretations, I would suggest that there are a number of quite specific types of information commonly encoded in determiners in some, but not all, of the languages of the world. Some of these refer to aspects of referents such as animacy, shape, visibility, and so on. Of course, it might be claimed that quantificational meanings are special, and there we should expect to find uniformity across languages. However, even here there are notable lacks of uniformity. For example, some languages encode a dual number (as English used to), but clearly not all languages have that category. The issue of the definiteness effect in existential sentences I have addressed elsewhere (e.g., Abbott 1993, 1997). Briefly, my position is that it is less stipulative, as well as empirically more adequate, to see the awkwardness of definites in nonenumerative existentials as a result of a clash between the presupposition of existence grammatically encoded in definites and the assertion of existence of an existential sentence. Ludlow and Segal find this unconvincing in view of the fact that “one can perfectly well say ‘The mayor is a mayor’ ” (p. 433, n. 14). But in this example the conflict is at the level of content morphemes, not grammatical constructions as in the case of existential sentences and definite DPs.
7. Concluding Remarks
In this chapter I have tried to support the uniqueness view of definite descriptions, and to argue that the approaches of Szabó 2000 and Ludlow and Segal 2004, which take familiarity as essential to definite descriptions and uniqueness as derivable, are not correct. There remain plenty of problems with viewing uniqueness as the essence of definiteness, but, in general, they do not support the position that familiarity is the essence instead.
Portions of this paper were read at the annual meeting of the Linguistic Society of America (Atlanta, January 2003) under the title “The Difference between Definite and Indefinite Descriptions.” I am grateful to Larry Horn and Kent Bach for reading drafts of the LSA paper and providing me with extensive and very helpful comments. I am also grateful to Jeanette Gundel and Martin Hahn for organizing a terrific workshop, and for inviting me. I thank the audiences in Atlanta and Vancouver for their comments, and
especially Chris Barker, Michael Israel, Christopher Potts, Ori Simchen, and Gregory Ward. Finally I owe a special debt to Kent Bach for suggesting I think about replying to the papers of Szabó and Ludlow and Segal that are addressed here. Despite all of this high-class help, I must acknowledge the possibility of remaining flaws and accept responsibility for them.
NOTES
1. Strictly speaking, it might be considered to be inconsistent with Russell’s analysis to talk in terms of a denotation for definite descriptions.
2. “. . . when we implicate that something is given information we are not explicitly saying: ‘this is given information.’ It is simply something that competent users of English can infer from the conventional implicature inherent in ‘the’.” “When the predicate by itself ensures uniqueness, the implicature [of givenness] gets overridden” (Ludlow and Segal 2004, 425; emphasis added).
3. The point in this paragraph was also made in Abbott 2003.
4. See Apostolou-Panara 1994 for description of a construction in Greek which may have been influenced by these hyperbolic constructions in English.
REFERENCES
Abbott, Barbara. 1993. A pragmatic account of the definiteness effect in existential sentences. Journal of Pragmatics 19, 39–55.
Abbott, Barbara. 1997. Definiteness and existentials. Language 73, 103–108.
Abbott, Barbara. 1999. Support for a unique theory of definite descriptions. SALT 9: Proceedings from Semantics and Linguistic Theory IX. Ithaca, NY: CLC Publications, 1–15.
Abbott, Barbara. 2000. Presuppositions as nonassertions. Journal of Pragmatics 32, 1419–1437.
Abbott, Barbara. 2003. A reply to Szabó’s “Descriptions and uniqueness.” Philosophical Studies 113, 223–231.
Apostolou-Panara, Athena. 1994. Language change underway?: The case of the definite article in Modern Greek. In Irene Philippaki-Warburton, Katerina Nicolaidis, and Maria Sifianou, eds., Themes in Greek linguistics. Amsterdam: John Benjamins, 397–404.
Birner, Betty and Gregory Ward. 1994. Uniqueness, familiarity, and the definite article in English. BLS 20, 93–102.
Birner, Betty and Gregory Ward. 1998. Informational status and noncanonical word order. Philadelphia: John Benjamins.
Breheny, Richard. 1997. A unitary approach to the interpretation of definites. UCL Working Papers in Linguistics 9, Department of Phonetics and Linguistics, University College London, 1–27.
Epstein, Richard. 1996. Viewpoint and the definite article. In Adele E. Goldberg, ed., Conceptual structure, discourse, and language. Stanford: CSLI Publications, 99–112.
Grice, H. Paul. 1975. Logic and conversation. In Peter Cole and Jerry L. Morgan, eds., Syntax and semantics, vol. 3: Speech acts. New York: Academic Press, 41–58.
Grice, H. Paul. 1981. Presupposition and conversational implicature. In Peter Cole, ed., Radical pragmatics. New York: Academic Press, 183–198.
Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69, 274–307.
Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. 2001. Definite descriptions and cognitive status in English: Why accommodation is unnecessary. English Language and Linguistics 5, 273–295.
Hawkins, John A. 1978. Definiteness and indefiniteness. Atlantic Highlands, NJ: Humanities Press.
Hawkins, John A. 1991. On (in)definite articles: Implicatures and (un)grammaticality prediction. Journal of Linguistics 27, 405–442.
Heim, Irene. 1982. The semantics of definite and indefinite noun phrases. Amherst, MA: University of Massachusetts doctoral dissertation.
Kempson, Ruth M. 1975. Presupposition and the delimitation of semantics. Cambridge: Cambridge University Press.
Ludlow, Peter and Gabriel Segal. 2004. On a unitary semantical analysis for definite and indefinite descriptions. In Marga Reimer and Anne Bezuidenhout, eds., Descriptions and beyond. Oxford: Oxford University Press, 420–436.
Montague, Richard. 1973. The proper treatment of quantification in ordinary English. In Jaakko Hintikka, Julius Moravcsik, and Patrick Suppes, eds., Approaches to natural language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics. Dordrecht: Reidel, 221–242.
Prince, Ellen F. 1992. The ZPG letter: Subjects, definiteness, and information status. In William C. Mann and Sandra A. Thompson, eds., Discourse description: Diverse linguistic analyses of a fund-raising text. Philadelphia: John Benjamins, 295–326.
Russell, Bertrand. 1905. On denoting. Mind 14, 479–493.
Sharvy, Richard. 1980. A more general theory of definite descriptions. Philosophical Review 89, 607–624.
Szabó, Zoltán Gendler. 2000. Descriptions and uniqueness. Philosophical Studies 101, 29–57.
Szabó, Zoltán Gendler. 2003. Definite descriptions without uniqueness: A reply to Abbott. Philosophical Studies 114, 279–291.
4
Equatives and Deferred Reference
Gregory Ward
Introduction
One of the most creative, but still poorly understood, features of natural language is the possibility of deferred reference (Nunberg 1977, 1979, 1995): the metonymic use of an expression to refer to an entity related to, but not directly denoted by, the conventional meaning of that expression. Various types of deferred reference, and the various linguistic mechanisms available for such reference, have been identified and discussed in the literature. A few classic examples, taken from Nunberg 1995, are given in (1): (1)
a. [server to coworker in deli] The ham sandwich is at table 7. [= Nunberg’s (1995) ex. (19)]
b. [restaurant patron to valet, holding up a key] This is parked out back. [= Nunberg’s (1995) ex. (1)]
c. Yeats is still widely read. [= Nunberg’s (1995) ex. (49)]
In (1a), the speaker is referring indirectly to the person who ordered the ham sandwich via the ham sandwich itself. In (1b), the speaker refers indirectly to her car by an ostensive reference to the car’s key, while in (1c) the speaker is referring to the works of the poet via the poet himself.
The phenomenon of deferred reference has been intensively investigated in a number of frameworks and under a variety of labels (e.g., Sag 1981, Fauconnier 1994; see Nunberg 1995 for a review). Often overlooked is the possibility of deferred reference with copular sentences of the form NP-be-NP, so-called equative sentences or identity statements, illustrated by the naturally occurring examples in (2): (2)
a. [customer to server holding tray full of orders] I’m the Pad Thai. [BL, 8/10/02]
b. When it comes to allergies, I’m a grass-ragweed-pet dander. [Zyrtec TV ad]
c. Samir Abd al-Aziz al-Najim is the four of clubs. [Chicago Tribune, 4/19/03]
With these sentences, the speaker is equating the referents of the two NPs to convey a particular correspondence between them. In (2a), the speaker identifies himself with his lunch order to convey indirectly that he is the person who ordered Pad Thai, while in (2b) the speaker is indirectly identifying her allergens. Lastly, in (2c), the reporter is identifying which playing card bears the picture of al-Najim, the former Iraqi Ba’th Party Regional Command Chairman for East Baghdad. Henceforth, I shall refer to such sentences as deferred equatives.
In this chapter, I present a pragmatic analysis of deferred equatives, showing how they constitute a distinct type of deferred reference. I argue that previous accounts, in not distinguishing between deferred equatives and non-equatives, have failed to consider the distinctive properties of the former, and that current theories of deferred reference will have to be revised in light of these facts. Specifically, I claim that the felicitous use of deferred equatives requires that a contextually salient correspondence, or pragmatic mapping, hold between sets of relevant discourse entities (cf. Nunberg’s (1995) notion of functional correspondence). In (2a), for example, the relevant mapping is between restaurant customers (set 1) and their orders (set 2). A crucial difference between the equative in (2a) and the non-equative in, say, (1a), is that with deferred equatives both of the mapped entities are explicitly evoked within the equative and, consequently, their meanings remain intact. Contrary to previous studies that suggest that all deferred reference requires a shift in sense, that is, nominal or predicate meaning, I argue that in the case of deferred equatives the deferred interpretation is the result of a shift in meaning of the copula alone, and that in the case of certain deferred non-equatives (e.g., (1a)) the relevant transfer is one of reference and not sense.
In addition, the felicitous use of a deferred equative requires the presence of a contextually licensed open proposition whose instantiation encodes the particular mapping between entities.
1. Background
Nunberg (1977) provided the first systematic and comprehensive account of the sort of linguistic metonymy illustrated in (1), arguing that such deferred interpretations are the result of a transfer of reference from one discourse referent (e.g., a ham sandwich) to another (the orderer of that sandwich). In more recent work, Nunberg (1994, 1995) argues that the transfer mechanism applies not to the referents of linguistic expressions, but to their senses. He bases this account on the notion of meaning transfer: “[t]he name of a property that applies to something in one domain can sometimes be used as the name of a property in another domain, provided the two properties correspond in a certain way” (1995, 111). Linguistically, these properties can be supplied by predicates of any semantic kind and in any syntactic position. Nunberg identifies two basic linguistic mechanisms for meaning transfer: predicate (VP) transfer (3a) and common noun (N) transfer (3b). (3)
a. Predicate transfer: I’m parked out back. ⇒ ‘I’m the owner of a car parked out back.’
b. Common noun transfer: The ham sandwich is at table 7. ⇒ ‘The ham sandwich orderer is at table 7.’
In predicate transfer, the predicate supplies the property to which the transfer applies: the VP be parked out back in (3a) provides a property of ‘being parked out back’, the meaning of which undergoes a transfer to the property of ‘being someone whose car is parked out back’. Such a transfer would be licensed, say, in the context of a valet locating a car belonging to a customer. In the case of nominal transfer, the property is supplied by a common noun (ham sandwich in (3b)), which then undergoes transfer to the property of ‘being the person who ordered a ham sandwich’. This transfer would be relevant in the context of a server referring to a particular customer. Thus, under Nunberg’s account, the relevant relationship is between predicates and properties and not between (sets of) discourse entities. Consistent with that view, Nunberg explicitly claims that the referent of the NP is “not involved in the interpretation of the utterance” (1995, 115), and that only the relevant property or predicate associated with the common noun within that NP is: the transfer is one of sense and not reference. Nunberg does not distinguish between equatives and non-equatives within his theory of deferred reference; indeed, he provides (4) as an unexceptional example of nominal transfer applying to the definite description in predicate position: (4)
I am the ham sandwich and I’d like it right now.
Note, then, that one consequence of Nunberg’s system is that, in the case of deferred equatives consisting of two definite descriptions, one in subject
position, the other in predicate position, the relevant property could, in principle, result from either nominal transfer on the subject N or predicate transfer on the VP, as seen in (5):1 (5)
a. The man at table 7 is the ham sandwich. ⇒ ‘The order of the man at table 7 is the ham sandwich.’ (nominal transfer)
b. The man at table 7 is the ham sandwich. ⇒ ‘The man at table 7 is the ham sandwich orderer.’ (predicate transfer)
However, in the case of deferred equatives with no definite descriptions, Nunberg’s nominal transfer mechanism is presumably unavailable given the absence of a common noun upon which to base the transfer. Consider the proper name equative in (6): (6)
[coworkers discussing Secret Santa assignments] Who’d you get? I’m Anne.
Here, neither the pronominal subject NP (I) nor the predicate proper name NP (Anne) lends itself to a nominal transfer. Thus, the only mechanism available would be predicate transfer, as in (7), in which the predicate be Anne would supply the relevant property of ‘being Anne’ and would undergo a meaning transfer to provide the property of ‘being the family member assigned to Anne’. (7)
I’m Anne. ⇒ ‘I’m the family member assigned to Anne.’
For Nunberg, then, the deferred equative in (4), I am the ham sandwich, is taken to be the result of nominal transfer, whereas the deferred equative in (7), I’m Anne, is the result of predicate transfer. Thus, under a theory of meaning transfer, very similar constructions and uses receive quite different semantic analyses. In what follows, I will present evidence in favor of the notion of reference transfer, pace Nunberg, as well as of a non-transferred analysis for deferred equatives.
2. Sense Transfer Versus Reference Transfer
Nunberg (and others) have offered a number of arguments in support of the notion of meaning or sense transfer and against the notion of reference transfer; these are listed in (8): (8)
• number clash
• reference to the (non-)deferred referent
• reflexives, bound anaphora, and quantification
In this section, I review some weaknesses of these arguments and relate them specifically to deferred equatives.
2.1 Number Clash
Nunberg (1995) provides examples of number clash as in (9) (judgments in original): (9)
That (*those) french fries is (*are) getting impatient.
Recall that under Nunberg’s analysis, there are two possible deferred interpretations: one involving nominal transfer on french fries (to ‘orderer of french fries’), and one involving predicate transfer on be getting impatient (to ‘be the order of the person who is getting impatient’), analogous to the two interpretations of the ham sandwich example in (5). However, Nunberg rules out predicate transfer in (9) on the grounds that the (non-transferred) plural subject (french fries) would force plural agreement with the predicate, which, as (10) shows, is infelicitous: (10)
# Those french fries are getting impatient. ⇒ ‘Those french fries are the order of the person who is getting impatient.’
That is, to get the deferred interpretation with predicate transfer, the subject nominal (those french fries) would have to remain untransferred (and plural) and with plural verb agreement the result is infelicitous. With predicate transfer ruled out, Nunberg must analyze (9) as an instance of nominal transfer, in which the presence of the demonstrative determiner, grammatically marked for number, forces agreement with the head noun. In the case of french fries, the head noun is plural, but its corresponding deferred meaning (‘orderer’) is singular. This leads to the possibility of a number clash between the deferred and non-deferred interpretations, as illustrated in (11): (11)
a. That french fries is getting impatient. ⇒ ‘That orderer of french fries is getting impatient.’
b. *Those french fries are getting impatient. ⇒ *‘Those orderer of french fries are getting impatient.’
Since the grammatical number of both the determiner (that) and the verb (is) in (9) agrees with that of the transferred property (11a) and not with that of the plural nominal (11b), Nunberg concludes that transfer applies to the meaning of the common noun—and not the full NP. However, there is another—simpler—explanation for the (relative) acceptability of (11a): french fries is being used here to denote a unit of food and not a plurality of objects, as illustrated in the following service encounter: (12)
Server: May I take your order?
Customer: A large fries, two milks, and three chicken nuggets.
⇒ ‘A large order of fries, two cartons of milk, and three orders of chicken nuggets.’
As is well known, nouns that are canonically collective and plural (e.g., french fries, chicken nuggets) or mass and singular (e.g., milk) can be unitized
into countable (and grammatically singular) nouns. Thus, a noun like french fries is not a good candidate for seeing whether meaning transfer has applied. A better test case would be pluralia tantum nouns like tongs, glasses, and pajamas, that is, grammatically plural nouns that denote single objects. Unlike unitized nouns like french fries, pluralia tanta in deferred reference constructions do display a number clash between a singular demonstrative and a plural noun, contrary to the predictions of an account based solely on meaning transfer: (13)
a. # I think that that shorts over there is pretty cute. ⇒ ‘I think that that person wearing shorts over there is pretty cute.’
b. # That pliers you were talking to now wants to know where we keep the power tools. ⇒ ‘That person you were talking to who bought the pliers now wants to know where we keep the power tools.’
Although acceptability judgments vary,2 deferred reference here to a singular referent should be perfectly felicitous if meaning transfer applies to the underlined common nouns in these examples. That it is the number clash and not plurality per se that accounts for the infelicity of these examples is illustrated by (14): (14)
a. I think that the shorts over there is pretty cute. ⇒ ‘I think that the person wearing shorts over there is pretty cute.’
b. The pliers you were talking to now wants to know where we keep the power tools. ⇒ ‘The person you were talking to who bought the pliers now wants to know where we keep the power tools.’
Here, there is no number clash between the article and the plural noun, and the resulting examples are felicitous. Thus, it would appear that what blocks felicitous deferred reference in (13) is the number clash between the singular demonstrative determiner and the plural common noun and not between the plural common noun and the singular target. In contrast, note that in the case of deferred equatives, plural demonstratives are possible: (15)
A: Who’s responsible for delivering which sandwiches?
B: I’m the sandwiches on the table. And you’re those sandwiches that John put in the refrigerator, remember?
The possibility of a plural demonstrative in this example confirms an analysis of equatives in which the nominal within the postverbal NP does not undergo a transfer of sense. Nunberg concedes that the felicity of examples involving the demonstrative depends on “specific principles of English morphosyntax” (1995, 115). However, whatever these principles turn out to be, it is clear that the number facts do not support the theory of meaning transfer.
2.2 Reference to the (Non-)Deferred Referent
Another argument often presented in discussions of deferred reference is the alleged unavailability of the NP that forms the basis for the transfer to serve as an antecedent for subsequent anaphora. Recall that Nunberg does concede the existence of examples like (4), repeated in (16), in which the felicity of the pronominal reference might suggest that transfer applies to the full NP (the ham sandwich), rather than just the common noun (ham sandwich): (16)
I am the ham sandwich and I’d like it right now.
About this, Nunberg (1995, 129ff.) says:
I don’t find this an odd thing to say, but I don’t think this means that the referent of the NP the ham sandwich in [16] is itself the antecedent of the subsequent pronoun. Rather, we should think of this token of it as a pronoun of laziness, analogous to the anaphors in [17a–c]: (17)
a. They enjoy eating rabbit, when there are any to be found.
b. I don’t speak Italian, but I’d love to go there.
c. You’d better put on some mosquito repellent, just in case there are any around.
In each of these sentences, a pronoun is used to refer to an entity that is semantically or materially connected to the referent of an expression in a previous clause—as rabbits to rabbit meat, Italy to Italian, mosquitoes to mosquito repellent. This phenomenon is well known, as are the constraints on this sort of usage. And [16] permits the same kind of analysis: when we use the ham sandwich to identify a person who has ordered a ham sandwich, we introduce a ham sandwich into the discourse context that is available for pronominal reference.
Thus, for Nunberg, reference to the non-deferred referent is possible only via a pronoun of laziness (Geach 1962, Partee 1978); that is, a pronoun that occurs as a type of syntactic substitution device to avoid repetition of its antecedent. As further support of this argument, Nunberg (1995) argues that, unlike genuinely anaphoric pronouns, pronouns of laziness disallow cataphora, as seen in (18) (judgments in original): (18)
a. If ever you get there, you’ll find Italy a lovely country.
b. ? If ever you get there, you’ll find Italian surprisingly easy to learn.
This argument raises a number of issues regarding reference and anaphora, and it is to those issues that I now turn.
2.2.1 PRONOUNS OF LAZINESS AND OUTBOUND ANAPHORA
First, I agree with Nunberg that when the ham sandwich is used to refer to its orderer, a ham sandwich has been evoked in the discourse model; however, I disagree that subsequent reference to the sandwich is via a pronoun of laziness.
Nunberg is arguing for two distinct sources for discourse referents: those evoked directly by referring expressions and those supplied indirectly via some kind of pragmatic inference. Interestingly, exactly the same argument was made to account for the alleged ungrammaticality of reference to entities evoked by word-internal elements, as illustrated in (19): (19)
# John is a truck-driver but doesn’t live in it. [cf. John drives a truck but doesn’t live in it.]
However, as Ward et al. (1991) have shown in their work on so-called anaphoric islands, reference to entities evoked by compound-internal elements (i.e., what Postal (1969) termed outbound anaphora) is fully grammatical and subject only to pragmatic, and not morphosyntactic, constraints. In particular, they argued that the felicity of a pronoun is sensitive to the salience of its referent in context; the morphosyntactic form of a pronoun’s antecedent in discourse (if any) is but one of several factors that affect the salience of that pronoun’s referent. Thus, any attempt to use the (in)felicity of a pronoun to argue definitively for one syntactic or semantic structure over another is highly suspect. Consider the minimal pair in (20): (20)
a. John is a big Jane Fonda fan. He has every single one of her workout videos.
b. John is a big fan of Jane Fonda. He has every single one of her workout videos.
Under the (now discredited) outbound anaphora view of pronouns (Postal 1969; Lieber 1990, inter alia), one would have to argue that the interpretation of the pronoun in (20b) is the result of a direct coreferential relation with its antecedent NP Jane Fonda, while the interpretation of the pronoun in (20a) is the result of accessing the relevant entity in the discourse model that had been evoked by the compound-internal element. This is exactly the same distinction that Nunberg argues for with respect to meaning transfer in his discussion of example (16), repeated below as (21a) for convenience: (21)
a. I am the ham sandwich and I’d like it right now. [= (16)]
b. I am the orderer of the ham sandwich and I’d like it right now.
Nunberg would have to argue that the pronoun in (21b) has as its antecedent the NP the ham sandwich, but that in the case of (21a), the pronoun has no linguistic antecedent but can be used to refer to the inferrable ham sandwich evoked by the NP the ham sandwich via meaning transfer. But as Ward et al. (1991) point out, there is in fact no basis for making such a distinction: All intersentential (unbound) anaphora can be accommodated under the same general mechanism of discourse model reference.
2.2.2 FELICITY OF DUAL REFERENCE TO DEFERRED / NON-DEFERRED REFERENT
A crucial question in studies on deferred reference has been whether reference to the non-deferred referent is possible. Nunberg uses the
(rather awkward) sentence in (16) to make his case; however, this example does not provide sufficient context to license reference to the customer’s order. Consider a more natural-sounding example of reference to the nondeferred entity in the same restaurant context: (22)
Hey, Shirley, the filet mignon at table 7 says it’s delicious.
In fact, it is possible to construct examples in which reference to both the deferred and non-deferred entities within the same utterance is felicitous, as in (23), in which the indefinite NP is hearer-new (in the sense of Prince 1992) and serves as the only available antecedent for the subsequent pronoun: (23)
Hey, Shirley, I’ve got a filet mignon at table 7 that says it’s the best steak he’s ever eaten.
Here, Nunberg would have to argue that the first pronoun (it) is a pronoun of laziness, while the second pronoun (he) is the result of meaning transfer, with the transferred meaning serving as that pronoun’s antecedent. Under the discourse model view of reference, both pronouns are interpreted in the same way, by reference to entities that have been rendered salient in the discourse model through reference transfer.
2.2.3 CATAPHORA
Cataphora is in fact possible with deferred reference. Consider the examples in (24), which my informants and I judge to be uniformly felicitous: (24)
a. Fearing that it would get broken into, the BMW shelled out $15 for valet parking.
b. [valet to valet] Because he’s afraid it will get stolen, the BMW always insists on parking it himself.
c. [server to server] Right after he ordered it, the filet mignon at table 7 decided he wanted it well-done rather than rare.
It may be true that cataphora is more difficult with deferred interpretations, but that could simply be the result of the added complexity of having to hold off on interpreting the pronoun until the reference transfer has been processed.
2.2.4 ASSOCIATIVE ANAPHORA
A fourth problem for Nunberg’s account of meaning transfer is the existence of deferred reference with associative or indirect anaphora (also known as inferrables, in the sense of Prince 1981, 1992). Consider the examples in (25): (25)
a. Look: the ham sandwich is tearing off the crust!
b. Look: the tomato soup is spitting out the croutons!
As is well known, there must be an anchoring referent in the discourse model (i.e., a salient soup and sandwich) for these inferrables to be licensed and interpreted appropriately (Clark 1977; Hawkins 1978, 1991; Prince 1981, 1992; inter alia). That is, the crust that the speaker is referring to in (25a)
is not the crust that is plausibly associated with sandwiches in general, but rather it is the crust of the particular ham sandwich that the ham sandwich orderer has ordered. So, if the meaning of ham sandwich is transferred and there is no ham sandwich evoked in the discourse, then the felicity of the associative anaphora in (25) remains a mystery. Note that under Nunberg’s analysis, we would expect the same forms that license a pronoun of laziness to license associative anaphora, but that is not what we find. As seen by the infelicity of (26), it is not the case that associative anaphora is generally possible wherever we would find a pronoun of laziness (cf. (17b)). (26)
I speak Italian. #The weather is great.
Thus, an account based on meaning transfer alone cannot account for the well-formed examples of associative anaphora with deferred reference.
2.3 Reflexives, Bound Anaphora, and the Scope of Quantification
The last argument that Nunberg offers in support of nominal transfer, and against reference transfer, is one that involves reflexives, bound anaphora, and the scope of quantification. First, he argues that for deferred reference with definite descriptions such as the ham sandwich, the definite article has scope over the transferred property (‘ham sandwich orderer’) and not the (non-transferred) common noun (ham sandwich). Consider the example in (3b), repeated in (27): (27)
The ham sandwich is at table 7.
Thus, for Nunberg, the uniqueness presupposition, entailment, or implicature associated with definite NPs (e.g., Russell 1905; Hawkins 1978, 1991; Lewis 1979; Kadmon 1990, 2001; Roberts 2003; Gundel et al. 1993; Birner and Ward 1994; Lambrecht 1994; Abbott 2004; inter alia) holds of the transferred meaning, not the conventional one. As Nunberg (1995, 116) puts it: “Example [27] doesn’t presuppose the existence of a unique ham sandwich (think of a waiter in a fast-food restaurant who is standing in front of a table piled with ham sandwiches), but does presuppose the existence of a unique ham-sandwich orderer.” I agree with Nunberg that the meaning of the definite article is associated with the deferred referent, that is, that it is the orderer of the ham sandwich that must be uniquely identifiable in context and not the ham sandwich itself. However, it does not follow from this that transfer must occur on the common noun ham sandwich instead of on the referent of the full NP. Instead, what the scope facts show is that the reference transfer operation (like predicate/nominal transfer) is sensitive to the form and meaning of the NP in establishing the referent upon which the transfer applies. Consider the examples in (28): (28)
a. {The/A} ham sandwich left me a big tip.
b. Every ham sandwich I waited on today left me a big tip.
Under the theory of discourse model reference being adopted here, the subject NPs in these examples evoke a semantically coherent (i.e., non-anomalous) discourse entity that forms the basis for subsequent reference transfer. In the case of (28a), for example, the definite NP the ham sandwich evokes a coherent discourse entity (a uniquely identifiable ham sandwich) that then provides the basis for the transfer in the discourse model to a uniquely identifiable ham sandwich orderer. Similarly, the indefinite NP a ham sandwich evokes a coherent discourse entity (a non-uniquely identifiable ham sandwich) that provides the basis for a transfer to a non-uniquely identifiable ham sandwich orderer. In sum, the coherence of an entity evoked by the NP will affect the transparency of the subsequent transfer operation: Those NPs that evoke a coherent referent will facilitate reference transfer, while those that do not will inhibit it. Nunberg provides a similar argument based on reflexive anaphors and binding theory. Consider (29a), originally from Fauconnier (1994), with the interpretation provided in (29b):3 (29)
a. # The mushroom omelet was eating itself with chopsticks. b. ⇒ The mushroom omelet orderer was eating the actual mushroom omelet with chopsticks.
At issue for Nunberg is the lack of a deferred reference reading: Why can’t (29a) mean (29b): ‘The orderer of the mushroom omelet was eating it with chopsticks’? The answer, according to Nunberg, is that the subject NP in (29a) has undergone nominal transfer to ‘the mushroom omelet orderer’ (with a human referent) and is therefore no longer a possible binder for the (neuter) reflexive anaphor. Indeed, we find the reflexive anaphor in (30a) agreeing with the transferred meaning in (30b): (30)
a. The mushroom omelet was thoroughly enjoying himself. b. ⇒ The mushroom omelet orderer was thoroughly enjoying himself.
Here, the gender feature of the reflexive agrees with the deferred referent, that is, the person who ordered the omelet, not the omelet itself. If nominal transfer did not apply, Nunberg argues, the mushroom omelet would remain the neuter subject NP of (29a) and (30a) and, under normal binding conditions, the reflexive anaphor itself—c-commanded by and coindexed with the subject NP—would be required and the reflexive anaphor himself disallowed. Since, under the deferred interpretation, the neuter form is ungrammatical and the masculine form grammatical, Nunberg concludes that nominal transfer must have applied. There is, however, an alternative explanation for the ill-formedness of (29a) and the well-formedness of (30a): the gender feature of a reflexive anaphor agrees with the referent—deferred or non-deferred—of a c-commanding NP in its local domain (see Burzio 1992, Lidz 2001,
inter alia). As exemplified in (31), this holds just in case the NP itself is unspecified for that feature: (31)
a. That drag queen is very impressed with himself/herself. b. Someone is very impressed with himself/herself.
In this way, on independent grounds, the interpretation of reflexives in general requires access to information about the referent of an expression and the same applies to the interpretation of deferred referents. No special appeal to meaning transfer is required, thus eliminating it, too, as an argument against reference transfer.
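The agreement generalization just stated can be illustrated with a minimal Python sketch. It is an informal illustration under stated assumptions and not part of the analysis itself: the class names, the three-way gender feature, and the `lexical_gender` field (standing in for an NP that is itself specified for gender) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical feature inventory for English reflexives.
REFLEXIVE_GENDER = {"himself": "masc", "herself": "fem", "itself": "neut"}


@dataclass(frozen=True)
class Referent:
    description: str
    gender: str  # "masc", "fem", or "neut"


@dataclass(frozen=True)
class NounPhrase:
    form: str
    referent: Referent  # deferred or non-deferred referent in the discourse model
    lexical_gender: Optional[str] = None  # None: the NP itself is unspecified for gender


def reflexive_agrees(binder: NounPhrase, reflexive: str) -> bool:
    """Check gender agreement between a reflexive and its local binder.

    If the NP is unspecified for gender, agreement is computed against the
    referent of the NP; otherwise the NP's own feature is used (an assumption
    of this sketch).
    """
    required = binder.lexical_gender or binder.referent.gender
    return REFLEXIVE_GENDER[reflexive] == required


# (30a): 'the mushroom omelet' used to refer to a (male) orderer.
orderer = Referent("the orderer of the mushroom omelet", "masc")
omelet_np = NounPhrase("the mushroom omelet", orderer)

# (31b): 'someone' used here to refer to a particular woman.
someone_np = NounPhrase("someone", Referent("a particular woman", "fem"))
```

On this sketch, `reflexive_agrees(omelet_np, "himself")` holds while `reflexive_agrees(omelet_np, "itself")` does not, mirroring (30a) and the unavailable deferred reading of (29a), and no transfer of the nominal's sense is consulted at any point.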
3. An Alternative Account
In light of these problems with previous accounts of deferred reference based on meaning transfer, I believe an alternative account merits consideration. I agree with Fauconnier (1994) and Nunberg (1995) that the felicitous use of deferred reference requires a contextually licensed correspondence from one object to another. However, I claim that such a correspondence applies to equatives and non-equatives differently. In the case of equatives, the pragmatic mapping between set members is explicitly encoded by the two NPs of the equative construction and no reference transfer is involved. For non-equatives, the mapping is implicit and only one of the mapped set members is explicitly evoked in the discourse, yet as a result of reference transfer, both set members may be available for subsequent anaphoric reference under the right pragmatic conditions. That is, the referential availability of both mapped set members is, in the case of equatives, the result of explicit evocation, and in the case of non-equatives, of reference transfer. I am assuming here a view of reference that makes crucial use of the notion of discourse model (Sidner 1979, Webber 1979, Grosz and Sidner 1986). Under this view, reference is seen as an interactive, dynamic process between speakers and hearers: specifically, the use of a linguistic expression to induce a hearer to access or create some entity in his or her mental model of the discourse. A discourse entity represents the referent of a linguistic expression, that is, the actual individual (or event, property, relation, situation, etc.) that the speaker has in mind and is saying something about. Within philosophy, the traditional view has been that reference is a direct relationship between linguistic expressions and the objects in the real world that they denote.
However, the discourse model approach takes a different perspective, viewing this relation as mediated through the (assumed) mutual beliefs of the participants. Under this view, the form of referring expression depends on the assumed information status of the referent, which in turn depends on the assumptions that a speaker makes regarding the hearer’s knowledge store, as well as what the hearer is attending to in a given discourse context (Ariel 1990, Gundel et al. 1993, Prince 1992 inter alia). Entities that are more salient are more available for subsequent reference. No single factor accounts
for the accessibility of discourse entities, but among the factors that determine salience are recency of mention, contrast, topicality, argument position, and morpho-syntactic form of the antecedent (if any) (Ward et al. 1991, Kehler and Ward 2004 inter alia). In a similar vein, Recanati (2003) argues for an account of deferred reference in terms of the dual notions of activation and accessibility: When the words the ham sandwich are uttered, we may [. . . ] suppose that the representation of the ham sandwich is more active than the ‘derived’ representation of the ham sandwich orderer because it is linguistically encoded and has some form of priority over the ham sandwich orderer (derived value). [This] initial ranking is reversed when further linguistic material comes into play. After the predicate in the sentence The ham sandwich has left without paying has been processed, the ham sandwich is no longer a more accessible candidate than the ham sandwich orderer—the order of accessibility is reversed . . . The predicate has left without paying demands a person as argument; this raises the accessibility of all candidates who are (represented as) persons. In this way the representation of the ham sandwich orderer gains some extra activation which makes him more accessible than the ham sandwich, after the predicate has been processed. Under this view, both representations are available in the discourse model; they differ only in terms of their relative accessibility as determined by the discourse context. One of the factors that affects accessibility is the salience of the correspondence between the referents. To illustrate, consider again example (1a), repeated here as (32), in which the speaker’s use of the ham sandwich to refer to his customer presupposes a contextually licensed mapping between customers and orders: (32)
The ham sandwich is at table 7.
For such non-equatives, this correspondence is implicit. The server who utters (32) asserts not that his customer has ordered the ham sandwich, but that she is at table 7. His use of deferred reference presupposes that his hearer can access the relevant correspondence between customers and orders upon which to base the transfer. For those cases of deferred reference involving correspondences that are highly context-specific (as in (32) ), the transfer operation applies to the referent of the NP and not to the denotation of the bare nominal. That is, there is a (pragmatic) transfer of reference and not a (semantic) transfer of sense in these cases, and both deferred and non-deferred referents are available in the discourse model for subsequent reference, given sufficient accessibility.4 In the case of deferred equatives, a rather different situation obtains. First, note that for canonical equatives, two distinct NPs are used to refer to the same discourse entity, as when a speaker uses (33) to assert that Chris and the guy in question are the same individual:
(33)
Chris is that guy I was telling you about.
In deferred equatives, on the other hand, the two NPs are not themselves coreferential; rather, the equative encodes the mapping between members of distinct sets of discourse entities. That is, the referential mapping that is characteristic of extralexical deferred reference in general is explicitly realized in a deferred equative. For example, a diner could use the equative in (2a) (repeated here as (34a) ) to assert that she is the person who ordered the Pad Thai, while at the same time presupposing that there exists a salient mapping between restaurant customers (set 1) and their orders (set 2): (34)
a. I’m the Pad Thai. b. I map onto the Pad Thai.
Crucially for deferred equatives, both of the mapped set members—“I” and “the Pad Thai” in example (34)—are explicitly represented, with the copula interpreted as linking the two set members rather than literally equating them, as informally represented in (34b). This extension, or coercion, of copular meaning from ‘be’ to ‘map onto’ is similar to Nunberg’s notion of predicate transfer; however, it is much more constrained. First, it applies only to the copula be of equatives (for Nunberg, any verb can undergo predicate transfer); second, it applies only to the verb and not to its arguments. Now compare the equative in (34a) with the non-equative in (32), in which the extralexical mapping between customers and orders is left implicit, triggering reference transfer on the evoked ham sandwich. In fact, what distinguishes deferred equatives from deferred non-equatives is precisely the explicitness of the encoding of the mapping operation, which we can represent by means of an open proposition (OP).
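The contrast drawn above between non-equatives (implicit mapping plus reference transfer) and deferred equatives (explicit mapping, no transfer) can be sketched informally in Python. This is only an illustration under simplifying assumptions: the `DiscourseModel` class, its method names, and the string labels for entities are hypothetical, and accessibility is ignored entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DiscourseModel:
    # Contextually licensed correspondence, here from orders to customers.
    mapping: Dict[str, str]
    # Entities currently available for subsequent reference, in order of evocation.
    entities: List[str] = field(default_factory=list)

    def evoke(self, entity: str) -> None:
        if entity not in self.entities:
            self.entities.append(entity)

    def non_equative(self, np_referent: str) -> str:
        """'The ham sandwich is at table 7': one NP, implicit mapping.

        The NP evokes its literal referent; reference transfer then makes the
        mapped entity available and returns it as what the predicate is about.
        """
        self.evoke(np_referent)
        deferred = self.mapping[np_referent]
        self.evoke(deferred)
        return deferred

    def deferred_equative(self, subject: str, post_copular: str) -> bool:
        """'I'm the Pad Thai': both set members are explicitly evoked.

        No transfer applies to either NP; the copula is read as 'maps onto',
        so the utterance is true if the salient mapping links the two.
        """
        self.evoke(subject)
        self.evoke(post_copular)
        return self.mapping.get(post_copular) == subject


model = DiscourseModel(
    mapping={
        "the ham sandwich": "the customer at table 7",
        "the Pad Thai": "the speaker",
    }
)
```

In both cases two entities end up in the model; what differs is how they get there. `model.non_equative("the ham sandwich")` adds the customer by transfer, whereas `model.deferred_equative("the speaker", "the Pad Thai")` adds both entities by explicit mention.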
3.1 Deferred Equatives and Open Propositions
An OP is a proposition with one or more variables or underspecified elements, corresponding to that aspect of information structure that constitutes backgrounded or presupposed information. Consider the examples of OP-sensitive constructions in (35) and (36): (35)
a. I plan to discuss several topics. What I’ll discuss first is the notion of political correctness. b. OP: I’ll discuss X first. c. FOCUS: the notion of political correctness
(36)
a. I’m not really into sports. Baseball I like, but more for the scene at Wrigley Field than the actual game. b. OP: I have disposition X toward {sports}. c. FOCUS: like
In (35a), the first sentence (I plan to discuss several topics) makes accessible the proposition that I will discuss some topic first. Thus, the open proposition “I’ll discuss X first” is salient, in which X is a variable ranging over the set {topics}. It is the salience of this OP that licenses the use of the wh-cleft in the second sentence; the expectation that the speaker will discuss some topic first renders the use of the cleft felicitous. The instantiation of the variable selected from this set—here, the notion of political correctness—constitutes the focus of the utterance and is realized prosodically with a pitch accent. This packaging of information into an open proposition and a focus corresponds closely to the focus/presupposition (or focus/focus frame) distinction of Chomsky (1971), Jackendoff (1972), Rochemont (1986), Vallduví (1992), Lambrecht (1994), Gundel and Fretheim (2004), inter alia. For further discussion of OPs and the constructions that are sensitive to them, see Prince (1986), Ward (1988), Birner and Ward (1998), Birner, Kaplan, and Ward (2001), inter alia. Similarly, the first sentence in (36a) evokes the notion that I have certain likes and dislikes regarding sports. The OP ‘I have disposition X toward sports’ is thus salient and licenses the preposing in the second sentence (Baseball I like). The instantiation of the variable here is the focused element like. Without a salient OP, the preposing would be infelicitous: (37)
My father-in-law is visiting this weekend. #Baseball I like, so at least we’ll have something to talk about.
In this example, the initial sentence does not make salient the notion that the speaker has various likes and dislikes toward different sports, and therefore the preposed variant is infelicitous. In the case of deferred equatives, the relevant OP contains two variables, corresponding to the two sets from which the instantiations of these variables are drawn. Consider (2a), repeated here as (38a): (38)
a. I’m the Pad Thai. b. OP: X maps onto Y (where X is a member of the set {customers} and Y is a member of the set {orders}). c. FOCI: I, the Pad Thai
The OP corresponding to (38a) is formed in the usual way by replacing the two foci (I, the Pad Thai) with variables, as in (38b). The instantiation of the variables must be drawn from the two sets involved in the mapping. For this example, we might gloss the instantiation informally as: “I, a member of the set of customers, correspond to the Pad Thai, a member of the set of orders.” What makes the utterance in (38a) deferred is not a transfer of sense or reference from either of the equative NPs; rather, it is the coercion of be to map onto as represented in the OP. In this way, there is no need to invoke any kind of predicate, nominal, or reference transfer mechanism for deferred equatives.
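The decomposition in (38) can be made concrete with a small Python sketch. It is offered as an informal illustration only: the `OpenProposition` class, the `$X`/`$Y` template notation, the toy sets of customers and orders, and the felicity check are all assumptions introduced here, not a formalization proposed in the text.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict, List, Set


@dataclass
class OpenProposition:
    template: str            # e.g. "$X maps onto $Y"
    domains: Dict[str, str]  # variable name -> name of the set it ranges over

    def instantiate(self, foci: Dict[str, str]) -> str:
        """Fill each variable with its focus to recover the full proposition."""
        return Template(self.template).substitute(foci)


def deferred_equative_felicitous(
    op: OpenProposition,
    salient_ops: List[OpenProposition],
    foci: Dict[str, str],
    sets: Dict[str, Set[str]],
) -> bool:
    """A deferred equative needs (i) a salient double-variable OP and
    (ii) foci drawn from the two sets involved in the mapping."""
    if op not in salient_ops:
        return False
    return all(foci[var] in sets[op.domains[var]] for var in op.domains)


# (38b): X maps onto Y, with X a customer and Y an order.
op_38b = OpenProposition("$X maps onto $Y", {"X": "customers", "Y": "orders"})
# Hypothetical context: who is at the table and what was ordered.
sets = {"customers": {"I", "John"}, "orders": {"the Pad Thai", "the Nam Sod"}}
foci_38c = {"X": "I", "Y": "the Pad Thai"}
```

With the OP salient, `deferred_equative_felicitous(op_38b, [op_38b], foci_38c, sets)` succeeds; with no salient OP (an empty list) it fails, even though the foci and sets are unchanged.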
In fact, it is this coercion of the copula’s meaning that requires the presence of a salient OP. Note that without the OP in (38b) being salient, the deferred equative is infelicitous, as seen in (39): (39)
A: How was your meal? B: Good. #I was the Pad Thai.
Here, the OP in (38b), namely that various customers map onto various orders, is not salient, and the deferred equative is therefore infelicitous. In contrast, note that the corresponding non-deferred reference is felicitous, as in (40): (40)
A: How was your meal? B: Good. I had the Pad Thai.
From this we can conclude that the infelicity observed in (39) is not the result of one’s answering the question “How was your meal?” with a description of what one ate. Nor is it the result of referring to one’s lunch order with a definite article, as long as that order is uniquely identifiable in context. Rather, the infelicity of the deferred equative in (39) can be attributed to the absence of a contextually salient double-variable OP. Note that non-deferred equatives, illustrated in (41), are not subject to this constraint: (41)
a. I think that guy over there is my next-door neighbor Sam. b. This painting is the same as the one hanging in the Louvre.
Neither of these equatives requires that any particular OP be salient to ensure felicity. A crucial consequence of the difference between deferred equatives and non-equatives is that in the case of equatives, both of the NPs can retain their literal interpretation even under a deferred interpretation. For example, in the case of I’m the Pad Thai, only the copula undergoes a transfer of meaning; the referent of the post-copular NP is the actual Pad Thai. This difference can be teased out with the addition of a relative clause, as in (42) and (43): (42)
a. # John is the Pad Thai, who drives a Rolls Royce. b. John is the Pad Thai, which looks delicious. c. John is talking to the Pad Thai, who drives a Rolls Royce.
(43)
a. # The Pad Thai, who drives a Rolls Royce, is John. b. The Pad Thai, which looks delicious, is John. c. The Pad Thai, who drives a Rolls Royce, is talking to John.
In (42a), the intended interpretation—that John, who ordered the Pad Thai, drives a Rolls Royce—is unavailable, suggesting that the referent of the predicate NP is the actual Pad Thai and not John. The felicity of the relative
clause in (42b)—with the Pad Thai as head of the relative clause—confirms this analysis. In the non-equative in (42c), on the other hand, the intended interpretation—that the person who ordered the Pad Thai drives a Rolls Royce—is well formed, suggesting that the referent of the relevant NP in the non-equative sentence is not the Pad Thai, but the one who ordered it. The same contrast between equatives and non-equatives applies to NPs in subject position, as seen in the corresponding examples in (43). We see the same pattern emerge from wh-questions with deferred interpretations, as exemplified in (44): (44)
a. Let’s see … You’re what, the Pad Thai or the Nam Sod? b. # Let’s see … You’re who, the Pad Thai or the Nam Sod?
(45)
a. # Tell me honestly, what do you like more, the Pad Thai or the Nam Sod? b. Tell me honestly, who do you like more, the Pad Thai or the Nam Sod?
Again, the preferred form for the wh-question corresponding to the equative in (44) is what, showing agreement with the non-human (and non-deferred) referent. Conversely, the preferred form for the wh-question corresponding to the non-equative in (45) is who, showing agreement with the [+human] feature of the deferred referent. Thus, for deferred equatives, an account based on mapping with a concomitant shift in copula meaning correctly predicts that the copular NPs are interpreted literally and that their (non-deferred) referents are available in the discourse model for subsequent reference. Of course, it is possible for one of the copular NPs to itself undergo a sense transfer while still participating in a pragmatic mapping. Consider the utterance in (46): (46)
[Physician assigning interns to patients] You and you are shortness of breath. You and you take vertigo. [ER, 4/25/03]
This utterance was produced in a context in which the attending physician of an emergency room was running through a list of symptoms and assigning each to a particular intern. The equatives serve to encode the pragmatically salient mapping between interns and symptoms. However, the post-copular NP shortness of breath, describing a patient’s symptom, itself undergoes a sense transfer from symptom to patient. Thus, the equative maps an intern to a given symptom, the latter undergoing a sense transfer to the corresponding patient displaying that particular symptom.
4. Conclusion
The findings presented here support an account of deferred equatives based on the notion of pragmatic mapping. This mapping is explicitly encoded by the two NPs of the equative construction, neither of which undergoes
a transfer of sense or reference, and both of whose referents are available in the discourse model for subsequent reference. Moreover, deferred equatives require that an open proposition be salient in the discourse at the time of utterance. For non-equatives, it is argued that the pragmatic mapping is implicit and that only one of the mapped set members is explicitly evoked in the discourse. For certain extralexical mappings, both set members may nonetheless be available for subsequent anaphoric reference as a result of reference transfer, subject to accessibility. It remains to be seen to what extent this account can be extended to the other deferred reference constructions.
I am indebted to Barbara Abbott, David Beaver, Tonia Bleam, Ann Bradlow, David Braun, Ann Bunger, David Dowty, Dan Grodner, Larry Horn, Brian Joseph, Nikki Keach, Chris Kennedy, Bill Lachman, Jeff Lidz, Yoshiko Matsumoto, Geoff Nunberg, Scott Schwenter, Elisa Sneed, Jason Stanley, Sam Tilsen, and especially Betty Birner for their valuable comments and assistance on earlier versions of this work. This paper represents an earlier and much abridged version of Ward 2004. I am grateful to the LSA for permission to include it in this volume.
NOTES
1. There is, of course, a third possibility that is not relevant to the current discussion: nominal transfer on the predicate N within the VP, as in (i): (i) The man at table 7 is the ham sandwich. ⇒ ‘The man at table 7 is the ham sandwich orderer.’
2. One of the anonymous reviewers finds the examples in (13) to be acceptable, while I and my informants categorically reject them. At issue, I believe, is the salience (and consequent acceptability) of the number clash (e.g., that shorts) and not the possibility of reference transfer.
3. Another possible, but irrelevant, reading for (29a) is the bizarre (non-deferred) interpretation that an actual omelet was literally eating itself with chopsticks.
4. See Sag (1981) for a number of proposals on how such a mapping might be formalized.
REFERENCES
Abbott, Barbara. 2004. Definiteness and indefiniteness. In Laurence R. Horn and Gregory Ward (eds.), Handbook of Pragmatics. Oxford: Basil Blackwell, 122–149.
Ariel, Mira. 1990. Accessing Noun-Phrase Antecedents. London: Routledge.
Birner, Betty J., Jeffrey P. Kaplan, and Gregory Ward. 2001. Open propositions and epistemic would. Paper presented at the LSA Annual Meeting, Washington, D.C., January.
Birner, Betty J. and Gregory Ward. 1994. Uniqueness, familiarity, and the definite article in English. BLS 20:93–102.
Birner, Betty J. and Gregory Ward. 1998. Information Status and Noncanonical Word Order in English. Amsterdam: John Benjamins.
Burzio, Luigi. 1992. On the morphology of reflexives and impersonals. In Christiane Lauefer and Terrell Morgan (eds.), Theoretical Analyses in Romance Linguistics (LSRL XIX). Amsterdam: John Benjamins.
Chomsky, Noam. 1971. Deep structure, surface structure, and semantic interpretation. In Danny Steinberg and Leon Jakobovits (eds.), Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology. Cambridge: Cambridge University Press, 183–216.
Clark, Herbert H. 1977. Bridging. In Philip Johnson-Laird and Peter Wason (eds.), Thinking: Readings in Cognitive Science. Cambridge: Cambridge University Press, 411–420.
Fauconnier, Gilles. 1994. Mental Spaces. Cambridge, MA: MIT Press.
Geach, Peter T. 1962. Reference and Generality. Ithaca, NY: Cornell University Press.
Grosz, Barbara and Candace L. Sidner. 1986. Attention, intentions, and the structure of discourse. Computational Linguistics 12:175–204.
Gundel, Jeanette and Thorstein Fretheim. 2004. Topic and focus. In Laurence R. Horn and Gregory Ward (eds.), Handbook of Pragmatics. Oxford: Basil Blackwell.
Gundel, Jeanette, Nancy Hedberg, and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69:274–307.
Hawkins, John A. 1978. Definiteness and Indefiniteness. Atlantic Highlands, NJ: Humanities Press.
Hawkins, John A. 1991. On (in)definite articles: Implicatures and (un)grammaticality prediction. Journal of Linguistics 27:405–442.
Jackendoff, Ray. 1972. Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press.
Kadmon, Nirit. 1990. Uniqueness. Linguistics and Philosophy 13:273–324.
Kadmon, Nirit. 2001. Formal Pragmatics. Oxford: Basil Blackwell.
Kehler, Andrew and Gregory Ward. 2004. Constraints on ellipsis and event reference. In Laurence R. Horn and Gregory Ward (eds.), Handbook of Pragmatics. Oxford: Basil Blackwell, 383–403.
Lambrecht, Knud. 1994. Information Structure and Sentence Form. Cambridge: Cambridge University Press.
Lewis, David. 1979. Scorekeeping in a language game. Journal of Philosophical Logic 8:339–359.
Lidz, Jeffrey. 2001. Condition R. Linguistic Inquiry 32:123–140.
Lieber, Rochelle. 1990. On the Organization of the Lexicon. New York: Garland.
Nunberg, Geoffrey. 1977. The Pragmatics of Reference. Ph.D. dissertation, City University of New York.
Nunberg, Geoffrey. 1979. The non-uniqueness of semantic solutions: Polysemy. Linguistics and Philosophy 3:143–184.
Nunberg, Geoffrey. 1995. Transfers of meaning. Journal of Semantics 12:109–132.
Nunberg, Geoffrey. 2004. The pragmatics of deferred interpretation. In Laurence R. Horn and Gregory Ward (eds.), Handbook of Pragmatics. Oxford: Basil Blackwell.
Partee, Barbara H. 1978. Bound variables and other anaphors. In David Waltz (ed.), Proceedings of TINLAP-2, University of Illinois, 79–85.
Postal, Paul. 1969. Anaphoric islands. CLS 5:205–239.
Prince, Ellen F. 1981. Toward a taxonomy of given/new information. In Peter Cole (ed.), Radical Pragmatics. New York: Academic Press, 223–254.
Prince, Ellen F. 1986. On the syntactic marking of presupposed open propositions. CLS 22:208–222.
Prince, Ellen F. 1992. The ZPG Letter: Subjects, definiteness, and information-status. In Sandra Thompson and William Mann (eds.), Discourse Description: Diverse Analyses of a Fundraising Text. Amsterdam: John Benjamins, 295–325.
Recanati, François. 2003. Literal Meaning. Cambridge: Cambridge University Press.
Roberts, Craige. 2003. Uniqueness in definite noun phrases. Linguistics and Philosophy 26:287–350.
Rochemont, Michael. 1986. Focus in Generative Grammar. Amsterdam: John Benjamins.
Russell, Bertrand. 1905. On denoting. Mind 14:479–493.
Sag, Ivan. 1981. Formal semantics and extralinguistic context. In Peter Cole (ed.), Radical Pragmatics. New York: Academic Press, 273–294.
Sidner, Candace. 1979. Towards a Computational Theory of Definite Anaphora Comprehension in English Discourse. Ph.D. dissertation, MIT.
Vallduví, Enric. 1992. The Informational Component. New York: Garland.
Ward, Gregory. 1988. The Semantics and Pragmatics of Preposing. New York: Garland.
Ward, Gregory. 2004. Equatives and deferred reference. Language 80:262–289.
Ward, Gregory, Richard Sproat, and Gail McKoon. 1991. A pragmatic analysis of so-called anaphoric islands. Language 67:439–474.
Webber, Bonnie L. 1979. A Formal Approach to Discourse Anaphora. New York: Garland Press.
III
HOW IS REFERENCE RESOLVED?
5
Rethinking the SMASH Approach to Pronoun Interpretation
Andrew Kehler
1. Introduction
The last three decades of research in psycholinguistics and computational linguistics have produced an extensive body of work on pronoun interpretation. Despite the diversity of viewpoints, research methodologies, and ultimate conclusions provided by this literature, a majority of these studies have assumed (either explicitly or implicitly) that a particular type of process underlies pronoun interpretation. I will call this process (for lack of a better term) the SMASH paradigm, for Search, Match, and Select using Heuristics, which is characterized as follows:
1. Search: Collect possible referents within some suitable contextual window (usually the current utterance and 1–3 utterances prior).
2. Match: Filter out those referents that fail ‘hard’ morphosyntactic constraints such as number, gender, and person agreement, and intrasentential syntactic binding constraints.
3. Select using Heuristics: Select a referent from those that remain by applying a set of heuristically based (‘soft’) preferences. These are usually based on surface-level morphosyntactic factors, such as grammatical role ranking and grammatical role parallelism, among others.
The particular way this abstract procedure is instantiated varies in its details across different theories and algorithms, of course. The Centering algorithm of Brennan et al. (1987), studied psycholinguistically by various authors (Hudson-D’Zmura 1989; Gordon, Grosz, and Gilliom 1993; Brennan 1995; inter alia), performs Selection with respect to a ranking of centering transitions, and a Centering rule (“Rule 1”) is added as a filter in the Match phase along with the more standard constraints mentioned earlier. The Select phase of Lappin and Leass’s (1994) computational algorithm uses a combination of weighted preference factors to determine the salience of potential referents, from which the referent with the highest value is chosen. Hobbs’s (1978) well-known syntactic search mechanism, some predictions of which were psycholinguistically tested by Matthews and Chodorow (1988), uses an ordered Search phase that renders the Selection phase trivial: The referent chosen is the first one encountered during the search that satisfies the Match tests. Many other examples exist, not only of computer algorithms like these, but also with respect to the manner in which hypotheses are formulated and tested in the greater psycholinguistics literature. In this chapter I argue that the SMASH way of framing the problem must be abandoned by any theory that seeks to explain pronoun interpretation within the human language processing mechanism.1 In so doing, I will offer a set of adequacy criteria that any analysis should at least be compatible with, if not ultimately explain. I will provide an outline of the form which I believe an adequate solution will ultimately take, although several questions will be left open for future work. The remainder of the chapter is organized as follows. In the next section, I present a set of facts that are problematic for the Search and Match aspects of the SMASH paradigm. In section 3, I present facts that are problematic for the manner in which superficial morphosyntactically based ‘preferences’ are typically used in the Select phase.
In light of these problematic facts, I will argue in section 4 that any adequate account of pronoun interpretation will have to be embedded in a larger model of discourse processing that accounts for the interaction between information structure2 and inferential processing mechanisms that underlie the establishment of coherence in discourse.
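For concreteness, the three SMASH steps characterized in this introduction can be sketched as a short Python pipeline. This is a deliberately minimal sketch under stated assumptions and does not reproduce any of the published algorithms cited above: the `Mention` and `Pronoun` classes, the feature labels, the three-utterance window, and the grammatical role ranking in `ROLE_RANK` are hypothetical, and syntactic binding constraints are omitted from the Match step.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mention:
    text: str
    utterance: int  # index of the utterance containing the mention
    number: str     # "sg" or "pl"
    gender: str     # "masc", "fem", or "neut"
    role: str       # "subject", "object", or "oblique"


@dataclass(frozen=True)
class Pronoun:
    text: str
    utterance: int
    number: str
    gender: str


# Illustrative 'soft' preference: a grammatical role ranking (assumed values).
ROLE_RANK = {"subject": 3, "object": 2, "oblique": 1}


def search(pronoun: Pronoun, mentions: List[Mention], window: int = 3) -> List[Mention]:
    """Step 1: collect mentions from the current utterance and up to `window` prior ones."""
    return [m for m in mentions if 0 <= pronoun.utterance - m.utterance <= window]


def match(pronoun: Pronoun, candidates: List[Mention]) -> List[Mention]:
    """Step 2: drop candidates that fail 'hard' number and gender agreement."""
    return [
        m for m in candidates
        if m.number == pronoun.number and m.gender == pronoun.gender
    ]


def select(candidates: List[Mention]) -> Optional[Mention]:
    """Step 3: choose by role rank, breaking ties in favor of more recent mentions."""
    if not candidates:
        return None
    return max(candidates, key=lambda m: (ROLE_RANK.get(m.role, 0), m.utterance))


def smash(pronoun: Pronoun, mentions: List[Mention], window: int = 3) -> Optional[Mention]:
    return select(match(pronoun, search(pronoun, mentions, window)))
```

Note that when only one candidate survives `match`, `select` returns it regardless of how it was mentioned; nothing in the pipeline can reject a sole surviving candidate.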
2. Problems for the Search and Match Approach
The third step of the SMASH approach uses heuristic preferences to choose a referent from the potential referents identified in Step 1 that survive a hard-constraint check in Step 2. If only one possible referent remains after the first two steps, that will be the referent trivially; in this case Step 3 has no work to do. At first blush, this seems perfectly reasonable. In most cases the inability of a pronoun to refer to a locally introduced entity is due to the fact that a more salient compatible entity exists in the discourse context. For instance, Bush will (in most contexts) be preferred to Blair as the referent of He in passage (1).
(1)
At the summit in Lisbon last Tuesday, George W. Bush met with Prime Minister Blair to discuss Middle East policy. He had to call the meeting short, however, due to a crisis that arose back home.
This accords with the predictions of most theories: even though Bush and Blair are both semantically plausible referents, the subject grammatical role is treated as a position that affords its occupant more salience than the object of a with-PP. This fact does not mean that a reference to Blair is inherently unpronominalizable, however, as can be seen by considering passage (2). (2)
At the summit in Lisbon last Tuesday, National Security Advisor Condoleezza Rice met with Prime Minister Blair to discuss Middle East policy. He had to call the meeting short, however, due to a crisis that arose back home.
In this case, the person mentioned in subject position does not agree with the pronoun in gender. Thus, the reference to Blair is unambiguous (he is the only potential referent left when Step 3 is reached), and it will be correctly resolved by a SMASH procedure. However, cases exist in which a pronoun cannot be felicitously used to refer to an entity even when it is the only one that satisfies the Match constraints. This unexpected fact casts serious doubt on the tenability of the Search and Match steps of the SMASH paradigm. The following subsections discuss three different types of example.
2.1 Not All Entities Are Salient Enough to Pronominalize
The first case I consider is example (3), which is a variant of an example from Gundel et al. (1993): (3)
Two Sears employees delivered some new appliances to my neighbors with the Doberman pinscher. a. # It’s the same dog that bit Susan last summer. b. That’s the same dog that bit Susan last summer.
The entity denoted by the Doberman pinscher does not license a subsequent pronominal reference; a demonstrative must be used per (3b). This failure to license a pronoun occurs despite the fact that the Doberman pinscher (i) is the most recently mentioned entity, (ii) occurs only one sentence back, and (iii) is in fact the only entity that satisfies the number restriction on the pronoun. As such, a SMASH procedure predicts such reference to be unambiguous and unproblematic. Clearly this failure to pronominalize is not due to anything intrinsic to Doberman pinschers; placement in subject position allows for both pronouns
and demonstratives, as witnessed in (4), again adapted from an example used by Gundel et al.: (4)
My neighbor’s Doberman pinscher bit a girl on a bike.
a. It’s the same dog that bit Susan last summer.
b. That’s the same dog that bit Susan last summer.
The grammatical position of the mention of the Doberman pinscher therefore appears to be of relevance; indeed, a natural argument to make here is that the syntactic position of the mention in (3), a sentence-final noun phrase that is embedded in a prepositional phrase modifier to another noun phrase, is accorded too low a degree of salience to permit subsequent pronominalization. If that were the case, then a smash procedure could be consistent with these facts as long as it incorporated a lower bound on referent accessibility. There is more to the story than that, however; compare passage (3) to passage (5): (5)
Lenox delivered new expensive china to my neighbors with the wild child. He’ll have it all broken within a week!
The pronoun He in this case is felicitous, despite the fact that its referent is mentioned from the same grammatical position as the Doberman pinscher in (3). So, as Gundel et al. point out using similar examples (which I have modified so as to make each pronoun morphosyntactically incompatible with all but one entity-level referent), factors other than syntax must come into play. It is tempting to speculate that the crucial difference between these examples lies in the speaker’s purpose in uttering the prepositional phrases in each case. In (3), the most readily inferred speaker purpose behind the utterance of with the Doberman pinscher is that it restricts the reference of the NP it modifies to a unique set of neighbors. Once these neighbors are identified, the Doberman pinscher appears to have no further contribution to the overall proposition. While the prepositional phrase with the wild child may serve the same purpose in the first sentence of (5), the image that this sentence conjures up suggests that this particular choice for restricting the reference to the neighbors is not random—it is instead interpretable as a conversationally relevant description in the sense of Kronfeld (1990). That is, this choice may have been intended to create an expectation that the ensuing discourse will address the inadvisability of a couple with a wild child buying expensive china, which gives the wild child a role in the discourse that goes beyond restricting the reference to a unique set of neighbors. If this line of reasoning is correct, it suggests that a discourse process as high-level as reasoning about the intentions that underlie a speaker’s choice of linguistic expression is necessarily intertwined with the seemingly lower level process of pronoun interpretation.
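The point of this subsection can be made concrete with a minimal sketch, which is not drawn from any published system. It assumes a hypothetical `Mention` record, an invented numeric salience scale keyed on grammatical position, and a Search-and-Match filter extended with the lower bound on accessibility considered above. Because the Doberman pinscher in (3) and the wild child in (5) are mentioned from the same position, any floor on such a scale treats the two alike.

```python
from dataclasses import dataclass

# Hypothetical salience scale keyed on grammatical position (values are illustrative only).
SALIENCE = {"subject": 3, "object": 2, "pp_object": 1, "np_modifier": 0}


@dataclass(frozen=True)
class Pronoun:
    number: str   # "sg" or "pl"
    gender: str   # "m", "f", or "n"


@dataclass(frozen=True)
class Mention:
    text: str
    number: str   # "sg" or "pl"
    gender: str   # "m", "f", "n", or "any" when the description leaves it open
    role: str     # a key of SALIENCE


def agrees(pronoun, mention):
    """Match step: 'hard' number and gender agreement."""
    gender_ok = mention.gender in (pronoun.gender, "any")
    return mention.number == pronoun.number and gender_ok


def candidates(pronoun, mentions, floor=0):
    """Search and Match, keeping only mentions at or above the salience floor."""
    return [m.text for m in mentions
            if agrees(pronoun, m) and SALIENCE[m.role] >= floor]


# Example (3): the pronoun is "it".
EX3 = [
    Mention("two Sears employees", "pl", "any", "subject"),
    Mention("some new appliances", "pl", "n", "object"),
    Mention("my neighbors", "pl", "any", "pp_object"),
    Mention("the Doberman pinscher", "sg", "n", "np_modifier"),
]
# Example (5): the pronoun is "he".
EX5 = [
    Mention("Lenox", "sg", "n", "subject"),
    Mention("new expensive china", "sg", "n", "object"),
    Mention("my neighbors", "pl", "any", "pp_object"),
    Mention("the wild child", "sg", "any", "np_modifier"),
]
IT = Pronoun("sg", "n")
HE = Pronoun("sg", "m")

if __name__ == "__main__":
    for floor in (0, 1):
        print(floor, candidates(IT, EX3, floor), candidates(HE, EX5, floor))
```

With no floor, the filter licenses the infelicitous (3a) along with the felicitous pronoun in (5); with a floor high enough to exclude the Doberman pinscher, it also excludes the wild child. Under these assumptions, no setting of the floor reproduces the contrast in judgments, which is the sense in which factors other than syntax must come into play.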
2.2 Apparent Interference between ‘Soft’ Preferences and ‘Hard’ Constraints
In the last section, we saw an example of a reference that succeeded when the referent was mentioned in subject position (ex. 4), but did not when it was mentioned in an embedded, and therefore less salient, position (ex. 3). The success of the reference in (4) is unsurprising, since subject position is generally assumed to be associated with the highest degree of salience among grammatical roles. However, there are examples in which even a subject referent cannot be felicitously pronominalized in the subsequent utterance. (To my knowledge, examples of this sort were first noticed by Oehrle (1981).) Consider example (6): (6)
?? Margaret Thatcher admires Ronald Reagan, and George W. Bush absolutely worships her.
All of my informants have agreed that this example is infelicitous, assuming the pronoun is deaccented.3 These informants generally report a feeling that the pronoun should be assigned to Reagan, as if the speaker is confused about his gender. This is particularly striking for reasons similar to those cited with respect to example (3): not only is Thatcher evoked from the subject position of the previous sentence but (i) this pronoun assignment results in a completely plausible interpretation, and (ii) Thatcher is the only potential referent that meets the gender restriction of the pronoun. Again, a smash algorithm would happily identify Thatcher as the referent regardless of the preferences it employs, since she is the only potential referent that survives the morphosyntactic constraint check in the Match phase. As one might expect, the same effect occurs for mismatches in number, as in (7). (7)
?? Republicans admire Ronald Reagan, and George W. Bush absolutely worships them.
Again, the pronominal reference is infelicitous even though there is only one suitable referent available. It may appear on the surface that these are examples in which a ‘soft’ preference for grammatical role parallelism actually trumps ‘hard’ constraints like gender and number agreement. This would in turn suggest that the Match step of the smash procedure cannot come strictly before the Select step; they would instead need to be integrated in some manner. However, I will argue against this conclusion in section 4.3, where I claim that there are information structural factors governing accent placement at work in such examples that are completely orthogonal to whether a referring expression is realized as a pronoun. I will then argue that this fact denies the existence of any grammatical role parallelism preference at work in pronoun interpretation.
2.3 Conjunction
The final set of examples I consider in this section involves reference to an entity mentioned within a conjoined NP. First consider example (8): (8)
?? Bush and Blair gave a press conference, and a reporter asked him a rude question.
The pronominal reference is infelicitous for reasons that should be apparent. Bush and Blair are evoked from equivalent grammatical positions, and are thus presumably accorded the same degree of salience in the discourse state. As such, they are indistinguishable as possible referents of the pronoun. Most smash theories would predict this fact, unless they incorporate a preference for (i) first mention, which if construed linearly (as opposed to hierarchically) would favor Bush, or (ii) recency, which if construed linearly would favor Blair. A related fact that is less expected on the smash model is that a change in gender in one of the conjuncts does little to improve the felicity of the pronominal reference. Consider (9): (9)
?? Rice and Blair gave a press conference, and a reporter asked him a rude question.
At a minimum, accent is required on him to make the reference felicitous. Repeating Blair’s name is more felicitous yet: (10)
Rice and Blair gave a press conference, and a reporter asked Blair a rude question.
Yet again we have a case in which one would expect reference with an unaccented pronoun to be unambiguous since there is only one male mentioned in the discourse, and as such a smash procedure would happily identify Blair as the referent. Note that the situation regarding (8) and (9) is in marked contrast to the difference between passages (1) and (2). Changing the subject referent from Bush in (1) to Rice in (2) made pronominal reference to Blair perfectly felicitous, whereas the same change from (8) to (9) did not. Further complicating the situation are examples such as (11) and (12): (11)
Bush gave a circumspect answer to a question on Iraq from Wolf Blitzer. He then brought up the hunt for Bin Laden.
(12)
Bush and Blair each gave a circumspect answer to a question on Iraq from Wolf Blitzer. He then brought up the hunt for Bin Laden.
Most of my informants identify Bush as the referent of he in (11) and Blitzer as the referent of he in (12). We have already seen that the result for (11) is expected, since entities evoked from subject position are typically accorded greater salience than those evoked from within a PP modifier to another NP. The results for (12) are more interesting. Since all three possible
referents are mentioned from embedded positions, it would be reasonable to expect that Bush and Blair would be more salient than Blitzer, since the former are embedded within the subject position.4 The “tie” in salience between Bush and Blair does not appear to cause infelicity as it did in example (8), but instead allows the pronoun to be assigned to a lower ranked entity, Blitzer. As before, the situation does not change appreciably if we change the gender of one of the conjuncts in subject position: (13)
Rice and Blair both gave a circumspect answer to a question on Iraq from Wolf Blitzer. He then brought up the hunt for Bin Laden.
These data might be taken as evidence that there is more to determining the referent of a pronoun than degree of salience, that is, that there is a topichood requirement at play (Gundel, Hedberg, and Zacharski 1993, inter alia). In each of the cases, the entities introduced in the conjoined subject NP presumably must either serve as topic together or not at all—it cannot be that one entity is the topic and the other is not.5 However, while topics are preferentially realized in subject position (in English), they need not be. Thus, a hearer trying to infer a potential topic from the first sentence of (13) as the referent of he will identify Blitzer, even though a more salient referent that agrees with the pronoun is (arguably) available. Analogous reasoning applies to the inability to identify a suitable topic as the referent of him in (9). To summarize this section, the smash procedure predicts that if only one referent in the search window survives the application of ‘hard’ morphosyntactic constraints, then that entity is the referent trivially. We have seen here, however, three scenarios in which that is not the case. I am sure that others exist. It should be acknowledged that one might object to the line of argumentation I have taken, citing the fact that since pronoun interpreters (human or machine) will generally not be confronted with passages like these (under the assumption that actual discourses are for the most part felicitous), the smash algorithm is doing the most reasonable thing by still selecting a referent. There are at least two reasons that I would not be swayed by such an objection. First, because these data reveal that the smash paradigm is a flawed way to conceive of pronoun interpretation, I would expect an analysis that is immune to these problems to have advantages in accounting for other data as well. Second, the problem with smash becomes more tangible when we look at it from a generation perspective. 
If left unenriched with additional information structural constraints, such algorithms will predict that pronominalization can occur during discourse production in all of the preceding problematic examples (again, on the basis of there being only one potential referent in the search window that satisfies Match constraints), thereby resulting in the generation of infelicitous discourse.
3. Problems for Selection Using Superficial Preferences
In the last section, I argued against the search-and-match aspects of the smash procedure (Steps 1 and 2) by appealing to three different types of example. In each case, infelicitous pronominal reference is predicted to be unambiguous and unproblematic by the smash approach. In this section I consider Step 3 of the smash procedure, in particular its use of superficial morphosyntactic preferences to select pronominal referents. A variety of such preferences have been posited in the literature, based on orderings on grammatical roles, grammatical role parallelism, orderings on thematic roles, verb semantics, and the referential form of the antecedent, for example. Here I will focus on two of these: the preference for entities evoked from the subject position of the previous clause (henceforth the subject assignment preference), and the preference for entities evoked from the parallel grammatical role of the previous clause (henceforth the grammatical role parallelism preference). I will argue that these preferences should not be blindly applied to all examples as smash procedures invariably do. Instead, we will see that each preference is in force only in a particular class of cases. I will argue that these preferences are in fact epiphenomena of deeper interpretation processes that only operate in particular contextual circumstances.
3.1 The Status of the Subject Preference
Many approaches to pronoun interpretation encode a preference for a pronoun to refer to the subject of the previous sentence. This preference is usually based on one of two related (but not equivalent) claims. The first is that subject position accords a greater degree of salience to its occupant than do other grammatical roles; pronouns are thus presumed to be sensitive to the degree of salience of their possible referents. The second is that the subject position is the canonical place from which to mention a discourse topic (Chafe 1976; Gundel, Hedberg, and Zacharski 1993; Lambrecht 1994; inter alia); pronouns in this view are considered to be indicators of topic continuation. We have already seen examples that support this preference; example (1) is repeated as (14): (14)
At the summit in Lisbon last Tuesday, George W. Bush met with Prime Minister Blair to discuss Middle East policy. He had to call the meeting short, however, due to a crisis that arose back home.
There are two possible referents of He, Bush and Blair, which are equally plausible semantically. The fact that Bush was placed in subject position appears to create the preference for him to be the referent. As such, if we were to switch the mentions of Blair and Bush, Blair becomes the preferred referent:
(15)
At the summit in Lisbon last Tuesday, Prime Minister Blair met with George W. Bush to discuss Middle East policy. He had to call the meeting short, however, due to a crisis that arose back home.
Likewise, mentioning both Bush and Blair in a conjoined subject renders the reference infelicitous, as we saw for similar examples in section 2.3: (16)
At the summit in Lisbon last Tuesday, George W. Bush and Prime Minister Blair met to discuss Middle East policy. # He had to call the meeting short, however, due to a crisis that arose back home.
Since the first sentence in each of these three passages describes the same situation, the choice of syntactic form—which in turn determines which referents are mentioned from which grammatical roles—appears to be the crucial factor in determining the reference assignments. There are a variety of situations in which the subject assignment preference is mysteriously neutralized, however. One such class includes constructions that indicate a transfer of possession or change of mental state. Stevenson et al. (1994), for instance, argue that occupants of different thematic roles in such constructions have different degrees of centrality within a hearer’s mental representation of the end state of an event, based on examples such as (17a–b) and (18a–b). (17)
a. John seized the comic from Bill. He . . . [began reading]
b. John passed the comic to Bill. He . . . [began reading]
(18)
a. Ken impressed Geoff. He . . . [knows a lot about cars]
b. Ken admired Geoff. He . . . [knows a lot about cars]
In a set of sentence completion experiments, Stevenson et al. found that hearers are more likely to continue passage (17a) in a way that requires he to refer to John (as suggested by the bracketed text that I have added), whereas their completions of (17b) more often required he to refer to Bill. This result for (17b) holds even though Bill is embedded within a sentence-final prepositional phrase, a position normally accorded less salience than the subject position. The property that these examples share is that the preferred referent occupies the goal thematic role of its respective predication, whereas the dispreferred entity occupies the source role. The analogous pattern was found for passages like (18a) and (18b), in which the referent assignments favored the occupant of the stimulus role (Ken and Geoff, respectively) over the occupant of the experiencer role (Geoff and Ken, respectively). Another set of counterexamples manifests a causal relation between the clauses, as in (19) and (20). (19)
Bush blamed CIA director Tenet for the mistake. He had not properly vetted the speech.
(20)
Colin pushed Don. He tumbled to the ground.
In each case the preferred referent is the occupant of the object position, even though the subject position entity is morphosyntactically compatible with the pronoun. In these cases, it would appear that semantic and world knowledge considerations are the determining factors. The final class of counterexamples that I will discuss includes cases in which a pronoun favors the occupant of a parallel grammatical role, which is the topic of the next section.
3.2 The Status of the Grammatical Role Parallelism Preference
A variety of authors have argued for a basic preference in pronoun interpretation based on grammatical role parallelism (Kameyama 1986; Smyth 1994; Chambers and Smyth 1998; inter alia). This heuristic states that a pronoun will preferentially be associated with an antecedent in a parallel grammatical role. Various examples appear to support the existence of such a strategy. First, the grammatical role parallelism preference is supported by examples in which a subject pronoun corefers with the subject of the previous sentence, as in examples (14) and (15). However, because this preference makes the same prediction as the subject assignment preference, examples with pronouns in other grammatical positions need to be considered to differentiate between the two. A canonical example of this sort is given in (21): (21)
Margaret Thatcher admires Hillary Clinton, and George W. Bush absolutely worships her.
There is an extremely strong preference to assign the (unaccented) pronoun her to Clinton. This example thus counterexemplifies the subject assignment preference, since Thatcher—which is not only consistent with the pronoun but preferred with respect to semantic plausibility—occupies that position. Intuitively, there appears to be a parallelism effect at play here that supersedes the subject preference. We have already seen a number of examples, however, that call this preference into question. These include Stevenson et al.’s examples (17b) and (18a), and similarly the causally related examples in (19) and (20), in which a subject pronoun is preferentially assigned a nonsubject referent. Likewise, examples in which a nonsubject pronoun prefers a subject referent are not hard to find. For instance, Kameyama (1996) reports that a majority of native informants prefer to identify the object pronoun in (22a) and (22b) with the subject of the preceding clause: (22)
a. John kicked Bill. Mary told him to go home. [= John]
b. Bill was kicked by John. Mary told him to go home. [= Bill]
c. John kicked Bill. Mary punched him. [= Bill]
In contrast, her study revealed a preference for grammatical role parallelism in (22c), creating an inconsistency among these data. Thus, both the subject assignment and the grammatical role parallelism preferences are associated with examples that appear to definitively support their respective existences while counterexemplifying the other. In the next section I discuss how this confusing state of affairs can be resolved.
3.3 Do Preferences Compete?
So how are we to reconcile this contradictory evidence? Some authors have proposed that some type of competition among these preferences is at play. For instance, in ultimately arguing for the primacy of a grammatical role parallelism preference, Smyth (1994) acknowledges a role for the subject assignment preference:
I conclude from these observations that pronoun interpretation in conjoined sentences involves an obligatory search for a morphologically compatible antecedent which meets the binding theory (Chomsky, 1981) criteria for coreference and which, in addition, has the same grammatical role as the pronoun. If a match is found, the parallel interpretation is obligatory, unless the pronoun is stressed, in which case it is selectively blocked. If no match is found, resolution is less certain, but will most often result in SA [= subject assignment], although if the pronoun or the first clause verb is stressed, alternative strategies govern the selection of an antecedent. On this view, SA is a default strategy for sentences in which the degree of nonparallelism exceeds some limit; PF [= parallel function] is a specific outcome of the more general principle that the probability of parallel resolution depends on the number of features shared by the pronoun and the candidate antecedents. Retaining SA in the model allows us to account for an otherwise mysterious asymmetry between subject and nonsubject pronouns. (204–205)
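One crude reading of the strategy in the quoted passage can be sketched as follows. This is an illustration only, not Smyth's model: the `NP` record and the function name are invented here, morphological compatibility is reduced to gender, and binding-theory filtering, stress, and the graded "degree of nonparallelism" are all left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NP:
    text: str
    gender: str   # "m" or "f"
    role: str     # "subject" or "object"


def resolve_pf_then_sa(pronoun, antecedents) -> Optional[str]:
    """Parallel function (PF) first; subject assignment (SA) as the fallback."""
    # Keep only morphologically compatible antecedents (gender only, by assumption).
    compatible = [a for a in antecedents if a.gender == pronoun.gender]
    # PF: prefer a compatible antecedent in the same grammatical role as the pronoun.
    parallel = [a for a in compatible if a.role == pronoun.role]
    if parallel:
        return parallel[0].text
    # SA: otherwise default to a compatible subject, if there is one.
    subjects = [a for a in compatible if a.role == "subject"]
    return subjects[0].text if subjects else None


HER = NP("her", "f", "object")
EX21 = [NP("Thatcher", "f", "subject"), NP("Clinton", "f", "object")]
EX6 = [NP("Thatcher", "f", "subject"), NP("Reagan", "m", "object")]

if __name__ == "__main__":
    print(resolve_pf_then_sa(HER, EX21))
    print(resolve_pf_then_sa(HER, EX6))
```

On this reading the sketch picks Clinton for (21) and falls back to Thatcher for (6). Whether a fallback of this kind is the right account of cases like (6), which informants reject, is part of what the remainder of the chapter takes up.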
Similarly, Stevenson et al. (1995) carried out a set of question-answering experiments to determine whether the subject assignment and parallel grammatical role preferences jointly contribute to the interpretation of pronouns. They found that subjects more often resolved the pronoun to the subject position entity when both strategies indicated that preference, as compared to when only the subject assignment strategy applied. Furthermore, they found that nonsubject assignments were actually preferred when the two strategies disagreed. They conclude from this that the subject assignment and parallel function heuristics operate jointly, and that this in turn “implies a model of discourse processing in which a number of constraints compete in the interpretation of noun phrases.” As I argue in Kehler (2002), however, these studies miss a crucial factor that distinguishes the data, in particular the coherence relation that is manifest between the clauses.6 (My discussion here will of necessity be quite brief; see Kehler (2002) for further details.) In that work, I categorize
coherence relations into three classes: Resemblance relations, Contiguity relations, and Cause-Effect relations. Examples of relations in these three categories, respectively, follow; the relation definitions are taken or adapted from those of Hobbs (1990):
Parallel: Infer p(a1, a2, . . .) from the assertion of S1 and p(b1, b2, . . .) from the assertion of S2, where ai and bi are similar for all i.
Occasion: Infer a change of state for a system of entities from each of S1 and S2, where the final state of S1 provides the initial state for S2.
Explanation: Infer P from the assertion of S1 and Q from the assertion of S2, where normally Q → P.
Examples in which these different relations are operative show a characteristic pattern with respect to what surface-level interpretation preferences they provide support for. For instance, cases that support the subject assignment preference, such as (22a–b), are instances of the Occasion relation. (Certain cases that are problematic for this preference, such as example (17b), are instances of Occasion also. I return to this point in section 4.3.) Occasion is the relation typically operative in a narrated sequence of events; it allows one to express a situation centered around a system of entities by using intermediate states of affairs as points of connection between partial descriptions of that situation. The hearer’s job, therefore, is to infer any information necessary to identify the initial state of the eventuality described by an utterance with the final state of the one that came before it. On the other hand, the examples that support a grammatical role parallelism preference, such as (21), are typically instances of the Parallel relation. This relation and others in the Resemblance category (Contrast, Exemplification, etc.) have quite a different character than Occasion; coherence with these relations requires that commonalities and contrasts among corresponding sets of parallel relations and entities be recognized using inference processes based on comparison, analogy, and generalization. Finally, the examples that were problematic for both approaches in (19) and (20) are instances of Cause-Effect relations, specifically Explanation (defined earlier) and Result (which is essentially the same relation except with the clause order reversed), respectively. These relations are established in yet a third way, as they require the identification of a causal chain that connects the propositions denoted by the utterances.
This pattern suggests that the characteristics of the different inference processes that underlie the establishment of these different types of coherence relation are likely to be in part responsible for the otherwise puzzling distribution of the data. Perhaps the most definitive evidence for this claim can be seen from examples that have more than one possible coherence interpretation. Consider example (23): (23)
Colin Powell defied Dick Cheney, and George W. Bush punished him.
This passage has both a Parallel and a Result reading, and crucially, the assignment for the unaccented pronoun him is parasitic on this choice of interpretation. Under the Parallel interpretation, in which coherence is licensed by the similarity between defying and punishing, the pronoun must be interpreted as referring to Cheney. This result accords with the grammatical role parallelism preference. On the other hand, under the Result reading of (23), in which coherence is licensed by the causal knowledge that a person who defies someone might get punished for it, him will be interpreted to refer to Powell. This result accords with the subject assignment preference. It is therefore hard to see how a preference-based system could predict this ambiguity (at least without predicting that all cases in which these two preferences compete are equally ambiguous, which we have already seen is not the case), since obviously the morphosyntactic properties of the passage remain constant for the two interpretations. In the next section, I discuss analyses that integrate coherence and pronoun interpretation with respect to these and other data. I then revisit the status of the subject assignment and grammatical role parallelism preferences, and conclude that (i) the status of the subject preference is considerably more complicated than it might first appear, and (ii) there is in fact no basic preference for parallelism at the level of grammatical roles in pronoun interpretation.
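The dependence of the assignment in (23) on the coherence relation can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the clause encoding, the function name, and the one-entry table standing in for causal world knowledge. The relation is also supplied by hand, so the sketch says nothing about how a hearer arrives at one reading rather than the other.

```python
from typing import Optional

# First clause of (23) as (verb, subject, object); "him" is the object of "punish".
CLAUSE_23 = ("defy", "Powell", "Cheney")

# Toy stand-in for causal world knowledge (an assumption of this sketch): for a
# (cause verb, effect verb) pair, the role in the cause clause whose occupant
# is the object of the effect clause. Here: whoever defies may get punished.
CAUSAL_KNOWLEDGE = {("defy", "punish"): "subject"}


def resolve_object_pronoun(first_clause, second_verb, relation) -> Optional[str]:
    """Pick the referent of an object pronoun given a hand-supplied coherence relation."""
    verb, subject, obj = first_clause
    by_role = {"subject": subject, "object": obj}
    if relation == "Parallel":
        # Corresponding arguments of the two clauses are aligned.
        return by_role["object"]
    if relation == "Result":
        # The referent is whoever the causal knowledge says is affected.
        role = CAUSAL_KNOWLEDGE.get((verb, second_verb))
        return by_role[role] if role else None
    return None


if __name__ == "__main__":
    for relation in ("Parallel", "Result"):
        print(relation, resolve_object_pronoun(CLAUSE_23, "punish", relation))
```

The input clause is identical in both calls; only the relation argument changes. A function of the morphosyntactic properties alone would have to return the same referent both times, which is the difficulty for preference-based systems noted above.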
4. The View from Above
In the previous two sections, I argued on empirical grounds that the smash model cannot be correct. The data pose problems for both the search-and-match component of the model (section 2), and the reliance on superficial preferences (and combinations thereof) to rank candidate referents (section 3). Beyond these empirical deficiencies, the smash approach also presents a larger conceptual problem. The amount of processing that has to occur at the time the pronoun is encountered in many smash algorithms (consider, for instance, Smyth’s (1994) and Stevenson et al.’s (1995) preference-combining systems from the last section, or the Centering algorithm described in section 4.2) would seem to be at odds with a very basic fact: that the appropriate use of a pronoun generally has the effect of facilitating discourse interpretation, and not hindering it. After all, in choosing to use a pronoun, a speaker elects to use a potentially ambiguous expression that, under the smash paradigm, may require a computationally intensive effort for the hearer to resolve, rather than a less ambiguous or even unambiguous one that would presumably not (such as a proper name). Yet, as shown in the experiments of Gordon et al. (1993), for instance, discourses may be read more slowly if a proper name is used to refer to a focused entity instead of a pronoun. This suggests a rather obvious, but often ignored, facilitation paradox for a theory of pronoun interpretation. That is, if pronoun interpretation were really as hard as some smash approaches suggest, why would a speaker do her hearer the disservice of using one?
The holy grail of a theory of pronoun interpretation, then, is to provide an answer to the facilitation paradox and at the same time explain the complex patterns of behavior one finds in the data. Picking up where we left off in section 3.3, I will now explore analyses that address the interaction of pronoun interpretation and coherence establishment with an eye toward these desiderata. In section 4.1 I discuss Hobbs’s well-known treatment, in which pronoun interpretation is viewed purely as a side-effect of coherence establishment. While this account offers a potential explanation for the facilitation paradox, several facts regarding pronominal behavior are left unexplained. I then consider Centering theory in section 4.2. Although I believe Centering is correct in modeling the relationship between attentional state and pronoun interpretation explicitly (unlike Hobbs’s account), it does not capture the type of interactional effects between information structure and coherence that the data suggest need to be modeled. In section 4.3 I begin to develop a theory that posits such an integration. My claim is that by properly accounting for the division of labor between information structure, coherence establishment, and pronoun interpretation, we will find that much of the complexity that some smash procedures consider to be part of pronoun interpretation should actually be accounted for elsewhere in the discourse processing mechanism. I will ultimately argue that not only does a consideration of coherence lead to a (perhaps partial) resolution of the facilitation paradox, but it also explains a number of otherwise contradictory facts that have to this point resisted a satisfactory analysis.
4.1 Coherence and Inference
To my knowledge, Hobbs (1979) was the first to develop a theory of pronoun interpretation specifically based on the establishment of coherence relations. In fact, in his analysis pronoun interpretation is not an independent process at all, but instead results as a by-product of more general reasoning about the most likely interpretation of an utterance. Pronouns are modeled as free variables in logical representations that become bound during these inference processes; potential referents of pronouns are therefore those which result in valid proofs of coherence. Let us illustrate with passages (24a) and (24b), adapted from an example from Winograd (1972). (24)
The city council denied the demonstrators a permit because . . .
a. . . . they feared violence.
b. . . . they advocated violence.
In Hobbs’s account, the correct assignment for the pronoun in each case falls out as a side-effect of the process of establishing the Explanation relation (here signaled by because), the definition of which is repeated as follows:
Explanation: Infer P from the assertion of S1 and Q from the assertion of S2, where normally Q → P.
Oversimplifying considerably, I will code the world knowledge necessary to establish Explanation for (24) within a single axiom, given in (25). (See Hobbs et al. (1993, p. 111) for a more detailed analysis of a similar example.) (25)
fear(X, V) ∧ advocate(Y, V) ∧ enable_to_cause(Z, Y, V) ⊃ deny(X, Y, Z)
This axiom says that if some X fears some V, some Y advocates that same V, and some Z would enable Y to bring about V, then X may deny Z to Y. To make this more concrete, the instantiation of this rule that is relevant for example (24) would say that if the city council fears violence, the demonstrators advocate violence, and a permit would enable the demonstrators to bring about violence, then this might cause the city council to deny the demonstrators a permit. The first sentence in (24) can be represented with the predication given in (26). (26)
deny(city_council, demonstrators, permit)
This representation matches the consequent of axiom (25), triggering a process of abductive inference that can be used to establish Explanation. At this point, X will become bound to city_council, Y to demonstrators, and Z to permit. Each of the follow-ons (24a–b) provides information that can be used to help ‘prove’ the predications in the antecedent of the axiom, thereby establishing a connection between the clauses. Clause (24a) can be represented with predicate (27), in which the unbound variable T represents the pronoun they. (27)
fear(T, violence)
When this predicate is used to match the antecedent of axiom (25), the variables T and X are necessarily unified. Since X is already bound to city_council, the variable T representing they also receives this binding, and the pronoun is therefore resolved. Likewise, clause (24b) can be represented as predicate (28). (28)
advocate(T, violence)
This predicate also matches a predicate within the antecedent of axiom (25), but in this case, the variables T and Y are unified. Since Y is already bound to demonstrators, the representation of they also receives this binding. Thus, the correct referent for the pronoun is identified as a by-product of establishing Explanation in each case. The crucial information determining the choice of referent is semantic in nature, based on the establishment of the relationship between the predication containing the pronoun and the predication containing the potential referents. The fact that coreference came “for free” captures the effortlessness with which people appear
to be able to interpret pronouns, offering a potential explanation for how the choice to use a pronoun can actually facilitate, rather than hinder, the process of discourse comprehension. As perhaps the most elegant proposal out there for pronoun interpretation, it would certainly be nice if this was all there was to it. There are a variety of facts that are left unaccounted for, however. First, and most obviously, are sentence pairs in which pronominal assignments switch based on nonsemantic factors, such as in passages (22a–b), repeated here as (29a–b): (29)
a. John kicked Bill. Mary told him to go home. [= John]
b. Bill was kicked by John. Mary told him to go home. [= Bill]
Since Hobbs’s inference processes are carried out on representations that are purely semantic, syntactic distinctions such as voice are lost and thus are of no use in predicting this variation. Indeed, examples (24a–b) and (29a–b) create an interesting contrast: Whereas (29a–b) suggest that grammatical role is the primary determinant of pronominal interpretation (these examples keep semantics roughly constant), (24a–b) seem to suggest that semantics is the primary determinant, since these examples keep the relevant syntactic relationships constant, differing only with respect to the verb that occurs after the pronoun. In fact, however, even variants of (24a–b) can be used as evidence that there is more to pronoun interpretation than coherence-driven reasoning. Consider the same passage except where the initial clause is passivized, as in (30): (30)
The demonstrators were denied a permit by the city council because . . .
a. . . . they feared violence.
b. . . . they advocated violence.
A strong majority of my informants assign the pronoun they to the subject (the demonstrators) in both follow-ons, even though the semantics of sentence (30a) might be expected to cause the pronoun to be identified with the city council in the same way that was just depicted for (24a). (Caramazza and Gupta (1979), in fact, performed experiments that showed this distinction using other stimuli.) The alternation displayed by (24a–b) is simply absent in (30a–b). This is a problem for a purely coherence-based approach, since the logical relationships expressed by the first sentences of (24) and (30) are the same. I take this as evidence that information structure is still important for pronoun interpretation, and that any adequate model will have to integrate it with coherence in a suitable way. Surely coherence-external factors were never fully dispensable to begin with, as there are a variety of phenomena involving pronominal reference for which a coherence-based theory offers little help, such as intraclausal anaphora (John likes his mother) and exophora (Look at him!).
Likewise, a model of information structure and its effect on attention will surely prove to be crucial for explaining certain garden-path effects that suggest that pronouns are interpreted incrementally. This phenomenon has in part motivated a framework that has been argued to integrate such factors with discourse coherence, namely Centering theory. In the next section I examine Centering and show that it does not provide answers to the kind of problems identified here.
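Before turning to Centering, the binding process described above for (24a–b) can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: the function names (unify, resolve) and the convention that capitalized strings are variables are choices of the sketch, and the simple literal matching shown here stands in for the much richer abductive inference of Hobbs's actual system.

```python
# Minimal sketch of the variable binding described for (24a-b).
# Assumption of this sketch: capitalized strings are variables,
# lowercase strings are constants.

def is_var(term):
    return term[:1].isupper()

def walk(term, bindings):
    """Follow variable bindings until a constant or unbound variable is reached."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term

def unify(p, q, bindings):
    """Unify two predications (name, arg1, ...); return new bindings or None."""
    if p[0] != q[0] or len(p) != len(q):
        return None
    new = dict(bindings)
    for s, t in zip(p[1:], q[1:]):
        s, t = walk(s, new), walk(t, new)
        if s == t:
            continue
        if is_var(s):
            new[s] = t
        elif is_var(t):
            new[t] = s
        else:
            return None  # two different constants clash
    return new

# Axiom (25): antecedent literals and consequent.
ANTECEDENT = [
    ("fear", "X", "V"),
    ("advocate", "Y", "V"),
    ("enable_to_cause", "Z", "Y", "V"),
]
CONSEQUENT = ("deny", "X", "Y", "Z")

def resolve(first_clause, second_clause, pronoun="T"):
    """Bind the axiom to the first clause, then match the second clause
    against an antecedent literal; return the pronoun's binding, if any."""
    bindings = unify(CONSEQUENT, first_clause, {})
    if bindings is None:
        return None
    for literal in ANTECEDENT:
        extended = unify(literal, second_clause, bindings)
        if extended is not None:
            referent = walk(pronoun, extended)
            return None if is_var(referent) else referent
    return None

DENY = ("deny", "city_council", "demonstrators", "permit")  # predication (26)
print(resolve(DENY, ("fear", "T", "violence")))      # (24a): city_council
print(resolve(DENY, ("advocate", "T", "violence")))  # (24b): demonstrators
```

Because the passive first clause of (30) corresponds to the same predication as (26), the sketch returns the same referents for (30a–b) as for (24a–b). That is precisely the limitation noted above: nothing in a purely semantic representation can register the effect of voice.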
4.2 Centering as a Theory of Pronoun Interpretation
The Centering theory of Grosz et al. (1995, henceforth GJW) is largely motivated by two related facts about language that are not explained by purely content-based models of reference and coherence, such as that of Hobbs (1979). The first of these is that the coherence of a discourse does not depend only on semantic content but also on the type of referring expressions used. GJW illustrate this point with passage (31), which is meant to be interpreted as part of a longer segment that is currently centered on John. (31)
a. He has been acting quite odd. [He = John]
b. He called up Mike yesterday.
c. John wanted to meet him quite urgently.
The third sentence in this passage is odd, despite the fact that the pronoun him in (31c) unambiguously refers to Mike. The oddness of this sentence stems from the choice of referring expressions used, in particular, the fact that the entity that is more central to the discourse (John) is not referred to with a pronoun whereas the less central element (Mike) is. As such, the same passage is perfectly felicitous if the final reference to John is pronominalized instead. The second motivating fact is the existence of garden-path effects in pronoun interpretation, in which a pronoun appears to be interpreted before adequate semantic information has become available. GJW discuss passage (32): (32)
a. Terry really goofs sometimes.
b. Yesterday was a beautiful day and he was excited about trying out his new sailboat.
c. He wanted Tony to join him on a sailing expedition.
d. He called him at 6 a.m.
e. He was sick and furious at being woken up so early.
The passage is perfectly acceptable until sentence (32e), which causes the hearer to be misled. Whereas semantic plausibility considerations indicate that the intended referent for He is Tony, hearers tend to initially assign Terry as its referent, creating a garden-path effect. Such examples provide evidence
TABLE 5.1. Transitions in the BFP Algorithm

                                       Cb(Un+1) = Cp(Un+1)    Cb(Un+1) ≠ Cp(Un+1)
Cb(Un+1) = Cb(Un) or unbound Cb(Un)    Continue               Retain
Cb(Un+1) ≠ Cb(Un)                      Smooth-Shift           Rough-Shift
that more is involved in pronoun interpretation than simply reasoning about semantic plausibility. In fact, they suggest that hearers assign referents to pronouns at least in part based on other factors, before interpreting the remainder of the sentence.
In what follows, I will focus on Brennan et al.’s (1987, henceforth BFP) centering-based algorithm for pronoun interpretation.7 Each utterance in a discourse has exactly one backward-looking center (denoted Cb) and a partially ordered set of forward-looking centers (Cf1, . . ., Cfn). The notation Cb(Un) is used to refer to the Cb of utterance n, and Cf(Un) to refer to the Cf list of utterance n. As a shorthand, the highest ranked forward-looking center Cf1 of utterance n is called the preferred center, or Cp(Un). Roughly speaking, Cf(Un) contains all entities that are referred to in utterance n; amongst this list is Cb(Un). Cb(Un+1) is the most highly ranked element in Cf(Un) that is realized in Un+1. The entities are ranked on the Cf list with respect to a grammatical role obliqueness hierarchy: subject, object, indirect object, other subcategorized functions, and adjuncts. BFP define four transitions between utterances depending on the relationship between Cb(Un), Cb(Un+1), and Cp(Un+1), shown in table 5.1. Now we can give the two rules of centering:
Rule 1: If any element of Cf(Un) is realized by a pronoun in Un+1 then the Cb(Un+1) must be realized by a pronoun also.
Rule 2: Transition states are ordered. Continue is preferred to Retain, which is preferred to Smooth-Shift, which is preferred to Rough-Shift.
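The constructs just defined can be sketched in a few lines of Python. This is an illustrative sketch rather than BFP's implementation: the function names are hypothetical, Cf lists are given by hand as entity names already ranked by grammatical role, only the two human referents of passage (32) are listed, and Rule 1 and agreement filtering are assumed to have been applied beforehand.

```python
# Rule 2 ordering, most preferred first.
PREFERENCE = ["Continue", "Retain", "Smooth-Shift", "Rough-Shift"]

def backward_center(cf_prev, cf_cur):
    """Cb(Un+1): the highest-ranked element of Cf(Un) realized in Un+1."""
    for entity in cf_prev:
        if entity in cf_cur:
            return entity
    return None  # nothing carried over (not covered by the definitions above)

def transition(cb_prev, cb_cur, cp_cur):
    """Classify a transition following table 5.1."""
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cp_cur:
        return "Continue" if same_cb else "Smooth-Shift"
    return "Retain" if same_cb else "Rough-Shift"

def classify(cf_prev, cb_prev, cf_cur):
    """Transition from Un to Un+1, given Cf(Un), Cb(Un), and Cf(Un+1)."""
    cb_cur = backward_center(cf_prev, cf_cur)
    return transition(cb_prev, cb_cur, cf_cur[0])  # cf_cur[0] is Cp(Un+1)

def rank_assignments(cf_prev, cb_prev, candidates):
    """Order candidate Cf lists for Un+1 by Rule 2."""
    scored = [(classify(cf_prev, cb_prev, cf), cf) for cf in candidates]
    return sorted(scored, key=lambda pair: PREFERENCE.index(pair[0]))

# (32c) "He wanted Tony to join him ...": Cf = [Terry, Tony], Cb = Terry.
CF_32C, CB_32C = ["Terry", "Tony"], "Terry"

# (32d) "He called him at 6 a.m.": two candidate assignments.
for label, cf in rank_assignments(CF_32C, CB_32C,
                                  [["Tony", "Terry"], ["Terry", "Tony"]]):
    print(label, cf)
# Continue ['Terry', 'Tony']
# Retain ['Tony', 'Terry']

# (32e) "He was sick ...": He = Tony versus He = Terry.
print(classify(["Terry", "Tony"], "Terry", ["Tony"]))   # Smooth-Shift
print(classify(["Terry", "Tony"], "Terry", ["Terry"]))  # Continue
```

Representing each candidate interpretation as a complete Cf list keeps the classification a pure function of the three constructs in table 5.1. It also makes visible that the whole utterance must be available before a transition can be computed.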
The BFP algorithm is a smash procedure, in which (i) Rule 1 is used during the Match phase, along with the other morphosyntactic constraints mentioned in the introduction, and (ii) the Select phase chooses the pronoun assignment(s) that result in the most preferred transition per Rule 2. This strategy correctly predicts that He and him in sentence (32d) refer to Terry and Tony, respectively, since this assignment results in a Continue relation, whereas the Tony/Terry assignment results in a less-preferred Retain relation. The algorithm also accounts for the oddness of sentence (32e), since assigning he to Tony results in a Smooth-Shift, whereas assigning he to Terry results in a Continue. There are a few misconceptions about centering floating around the literature that I believe are worth clearing up. For instance, several previous authors (Hudson-D’Zmura 1989; Chambers and Smyth 1998) have
claimed that the centering algorithm incorporates a subject assignment preference of the sort discussed in section 3.1. It does not; it instead incorporates what one might call a topic continuation preference, in which the Cb is taken to represent the sentence topic. Indeed, it is exactly this property that distinguishes a centering-based account from a purely salience-based strategy. Consider, for instance, example (33a–b), with possible follow-ons (33c) and (33c'): (33)
a. Terry is always willing to go sailing.
b. Tony dropped by his house yesterday.
c. He knocked but no one answered.
c'. He wasn’t home.
Whereas a subject preference predicts that He in (33c) refers to Tony, centering predicts that it will refer to Terry, since that assignment leads to a Continue transition (Tony gives rise to a Smooth-Shift). Whereas the subject preference gets it right in this case, it doesn’t always turn out that way: the preferred referent for He in follow-on (33c') is Terry, as predicted by centering. The reality is that discourses that end in a Retain, like (33a–b), often create an ambiguity for a following subject pronoun, since it is unclear whether or not the placement of a new entity in subject position was motivated by an intention to shift topic. A second misconception is that centering provides an incremental mechanism for pronoun interpretation, modeling “a speaker’s immediate tendency to interpret a pronoun” (Brennan 1995). This also is not the case. Detailed discussion of this issue would take us too far afield; the reader is referred to Kehler (1997) for examples and further discussion. Briefly, the problem is that the preferred assignment for a pronoun in the BFP algorithm cannot necessarily be determined until the entire sentence has been processed. The reason stems from two properties of the algorithm: that determining the transition type between a pair of utterances Un and Un+1 requires the identification of Cb(Un+1), and a noun phrase (pronominal or not) can occur at any point in the utterance that will alter the assignment of Cb(Un+1). This fact compromises the algorithm’s ability to model the effects that are a result of incremental pronoun interpretation, unlike systems that use preference-driven strategies more directly (e.g., the subject assignment strategy). 
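The point about incrementality can be illustrated with a short sketch. The follow-on "He called him." after (33a–b) is a constructed example used only for this illustration, not one taken from the literature cited above; the function names are likewise hypothetical, and Cf lists are supplied by hand. The sketch compares the assignment Rule 2 would favor when only the subject pronoun has been seen with the one it favors once the whole utterance is available.

```python
# Rule 2 ordering, most preferred first.
ORDER = ["Continue", "Retain", "Smooth-Shift", "Rough-Shift"]

def transition(cf_prev, cb_prev, cf_cur):
    """Transition type per table 5.1, given Cf(Un), Cb(Un), and Cf(Un+1)."""
    # Cb(Un+1): highest-ranked element of Cf(Un) realized in Un+1.
    cb_cur = next((e for e in cf_prev if e in cf_cur), None)
    same_cb = cb_prev is None or cb_cur == cb_prev
    if cb_cur == cf_cur[0]:  # Cb(Un+1) = Cp(Un+1)
        return "Continue" if same_cb else "Smooth-Shift"
    return "Retain" if same_cb else "Rough-Shift"

def preferred_subject(cf_prev, cb_prev, candidates):
    """candidates maps a subject referent to the Cf list under that assignment;
    return the referent whose transition Rule 2 ranks highest."""
    return min(candidates,
               key=lambda s: ORDER.index(transition(cf_prev, cb_prev, candidates[s])))

# State after (33b): Cf = [Tony, Terry], Cb = Terry (a Retain).
CF_PREV, CB_PREV = ["Tony", "Terry"], "Terry"

# Only "He" has been processed: one entity realized so far.
AFTER_HE = {"Terry": ["Terry"], "Tony": ["Tony"]}
# Whole utterance "He called him." processed: both entities realized.
FULL = {"Terry": ["Terry", "Tony"], "Tony": ["Tony", "Terry"]}

print(preferred_subject(CF_PREV, CB_PREV, AFTER_HE))  # Terry (Continue beats Smooth-Shift)
print(preferred_subject(CF_PREV, CB_PREV, FULL))      # Tony (Smooth-Shift beats Rough-Shift)
```

Under these assumptions the later pronoun changes Cb(Un+1), and with it the ranking of the candidate assignments for the subject, so a choice made at the subject pronoun could not be guaranteed to match the algorithm's final output.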
The main problem with centering for my purposes here, however, is that the notion of coherence it captures is primarily entity-based, for instance of the type needed to account for the behavior of example (31), and not of the type based on semantic relations and inference that is necessary to address the reference patterns we have seen thus far. Indeed, the constructs centering uses to assign referents are quite restricted: the Cb of the current and previous sentence and the Cp of the current sentence, none of which is determined by semantic considerations. As such, centering offers no handle with which to explain why pronoun interpretations are different in several minimal pairs for which
these constructs are invariant, including (17a–b), (18a–b), (22a) and (22c), and (24a–b). I therefore conclude that while attentional state and topic continuation are likely to be important influences in pronoun interpretation, others ultimately need to be accounted for also. This leads us to the next section.
4.3 Integrating Information Structure and Inference
The data we have discussed so far offer contradictory evidence about the mechanisms that underlie pronoun interpretation. In section 3.3, I argued that previous approaches have failed to control for the type of coherence relation that is operative in the examples they consider. In section 4.1, however, I also claimed that it is not enough to only consider coherence. Instead, I argue (as I have elsewhere (Kehler 2002)) that an adequate analysis must capture the interaction between coherence establishment and information structural constraints on pronouns and their referents. A complete account of this interaction is no doubt a complex matter, and this short section will leave much open for future work. My goal here is instead to outline what I hope to convince the reader is a more promising direction for research than the smash paradigm. As I indicated in section 3.3, different types of coherence are associated with different types of inference processes used to establish them. One might therefore expect their interactions with information structure to vary as well. Since herein will lie the answer to why different pronoun interpretation preferences appear to be in force when different coherence relations are operative, I will consider this question in the context of some of the pronoun interpretation data we have discussed thus far, particularly with respect to the three coherence relations introduced in section 3.3. Let us begin with passage (21), repeated as (34), which is an instance of the Parallel relation: (34)
Margaret Thatcher admires Hillary Clinton, and George W. Bush absolutely worships her.
Recall that hearers universally assign Clinton as the referent of her in (21) if it is unaccented, even though world knowledge would strongly suggest Thatcher. Such examples initially appear to provide support for a grammatical role parallelism preference as advocated by Smyth (1994) and Chambers and Smyth (1998). However, as we saw in section 3.2, there are a variety of other examples that refute the existence of such a general preference, particularly ones characterized by coherence relations other than Parallel. It turns out that the strong bias toward grammatical role parallelism evident in examples like (34) is a result of the interaction between information structural constraints imposed by Parallel relations and rules of accent placement in English, and thus cannot be profitably attributed to a pronoun-specific interpretation preference. As I indicated in section 3.3, the inference process associated with Parallel and other relations in its class first identifies
pairs of parallel entities and predicates as arguments to the relation, and then attempts to identify points of similarity and contrast among the members of each pair. The commonalities in such constructions create a common topic in the sense of R. Lakoff (1971), which in turn serves as the background against which focal elements are introduced. For instance, the common topic for (34) could be paraphrased roughly as how politicians feel about one another. As I describe in Kehler (2005), the mapping among parallel elements creates a situation in which any element in the second clause that is not coreferential with its parallel element in the first becomes part of the focus of the sentence, even if it denotes given information. General rules governing focus and accent placement in English then require that such constituents contain an accent, which in this case means placing accent on the pronoun. So if the referent of her is intended to be Thatcher, it must be accented as in (35a): (35)
a. Margaret Thatcher admires Hillary Clinton, and George W. Bush absolutely worships HER.
b. Margaret Thatcher admires Hillary Clinton, and George W. Bush absolutely worships THATCHER/#Thatcher.
The fact that this accent requirement is not the result of a pronoun-specific strategy is easily seen by replacing the pronoun in (35a) with the proper name Thatcher as in (35b), for which accent is still required. Deaccenting Thatcher here is infelicitous because Thatcher and Clinton are not coreferent. Thus, the fact that a pronoun is used in (35a) and likewise in (34) is simply irrelevant: Constraints on pronominalizing the non-parallel referent Thatcher are met in (34), but independent constraints on deaccentuation are not. We can now explain the effect we saw in section 2.2, in which a ‘soft’ preference for grammatical role parallelism appeared to interfere with a ‘hard’ gender agreement constraint in example (6), repeated as (36): (36)
?? Margaret Thatcher admires Ronald Reagan, and George W. Bush absolutely worships her.
It is now clear that there is nothing ‘soft’ about this parallelism effect. For the pronoun to remain unaccented, it has to corefer with its parallel element, regardless of agreement considerations. The clash between gender and information structural constraints is irreconcilable in (36), hence its infelicity. These same constraints on deaccentuation do not apply for relations outside of the Resemblance class, which is why felicitous examples that violate the parallel grammatical role preference are readily found.8 Recall from section 3.3 the class of examples from Stevenson et al. (1994), repeated here as (37a–b), which participate in the Occasion relation: (37)
a. John seized the comic from Bill. He . . .
b. John passed the comic to Bill. He . . .
Recall that Stevenson et al. found that hearers are more likely to interpret he to refer to John in passage (17a) and to Bill in (17b), despite the fact that in (17b) Bill is mentioned from within a sentence-final prepositional phrase. This lent support for a preference for occupants of the Goal thematic role over occupants of the Source role, since the Goal is presumably more central to the final state of the eventuality. That is, such constructions create the expectation that the recipient of the goods will be focused on next (cf. Arnold (2001) ). Recall that the inference process used to recognize Occasion attempts to connect the initial state of the eventuality described by an utterance with the final state of the one that came before it. As such, at the time that a pronoun is encountered, the referent most attended to should be the one that is most prominent with respect to the hearer’s conceptualization of the end state of the previous eventuality. While this will often be the subject of the preceding sentence, this is not always so, as we see from example (37b). We therefore predict that specific evidence for a Goal preference will only be found when an Occasion relation is operative; no such preference is expected for examples of other coherence relations, such as the Parallel relation in (38a): (38)
a. John passed the comic to Bill. He threw the book to Fred. (Parallel, He = John)
b. John passed the comic to Bill. He didn’t want it anymore. (Explanation, He = John)
As expected, the unaccented subject pronoun in (38a) can felicitously refer only to John. As such, there can be no general preference for Goals over Sources, as we only find a preference for Goals where we would expect it: in Occasion relations, which crucially involve connecting the content of an utterance to the end state of the previous eventuality. The same point is demonstrated by example (38b), an instance of the Cause-Effect relation Explanation. The inference process associated with such relations attempts to identify a causal chain between the semantics of the clauses being related. In this case, the manner in which the referent is determined depends on how the contributions to the discourse instantiate pre-existing causal knowledge; in (38b), this knowledge favors John as the referent. We saw how different semantic predications can lead to different pronoun assignments in otherwise identical passages in our discussion of examples (24a–b) in section 4.1. However, while there is no explicit model of information structure or attention in Hobbs’s analysis, we have seen two reasons to believe that such a model is necessary. The first reason is the existence of garden-path effects, such as we saw in example (32). While that example involved an Occasion relation, garden-paths can occur when Cause-Effect relations are operative also. Consider the adapted version of (24b) given in (39).
(39)
The city council denied the demonstrators a permit because they decided that the best way to draw attention to issues is to advocate violence.
As in (24b), the pronoun they in (39) is intended to refer to the demonstrators, but in this case the crucial information that leads to the eventual coherence relation realized—and with it, the intended pronominal referent—comes too late after the pronoun to shift attention away from the city council in time. Therefore, since there is an initial information structural bias toward the subject the city council, the hearer interprets it as the intended referent and a garden-path results. The second reason is the effect of passivization that we saw in examples (30a–b), repeated here as (40a–b): (40)
The demonstrators were denied a permit by the city council because . . .
a. . . . they feared violence.
b. . . . they advocated violence.
Recall that a purely coherence-driven strategy does not explain why hearers interpret they to refer to the demonstrators in both follow-ons, unlike the alternation seen in the active voice versions in (24a–b). There are crucial differences between the active and passive forms that are no doubt responsible. First, they are not on equal footing attentionally. In (24a–b), the nonsubject potential referent is in object position, which is still a relatively salient position. In (40a–b), on the other hand, the non-subject potential referent is embedded in a sentence-final adjunct by-phrase, and thus much less salient. Second, they are not equivalent with respect to topichood. Whereas the subject and object positions of active clauses can both serve as topics (with perhaps a moderate preference for the subject), the subject is the presumed topic in a passive clause. Under the assumption that pronouns refer primarily to (potential) topics, the difference between (24a–b) and (40a–b) is consistent. To summarize to this point, we have seen that a variety of surface-level preferences that have been posited in the literature are actually epiphenomena of interactions between information structure and coherence establishment processes. When the predictions of preferences such as grammatical role parallelism and those of the coherence analysis diverge, the predictions of the coherence analysis win. Note that we can now explain the previously mentioned contrasts between Kameyama’s examples (22a–c), repeated here in (41a–c). (41)
a. John kicked Bill. Mary told him to go home. [= John]
b. Bill was kicked by John. Mary told him to go home. [= Bill]
c. John kicked Bill. Mary punched him. [= Bill]
As we might now expect, in the cases of Occasion in (41a) and (41b) we see the appearance of a subject assignment preference, whereas in the case of Parallel in (41c) we see the appearance of a grammatical role parallelism preference. Whereas I have argued that the grammatical role parallelism preference is epiphenomenal, it is more appropriate to characterize the subject assignment preference as being derivative. The reason stems from its status as the default position from which to mark topic, coupled with the influence of topichood on pronoun interpretation. Consider again the difference between examples (41a) and (41b). These are both instances of Occasion; the only difference is which entity is more topical going into the second clause, that is, the entity in subject position. Importantly, however, occupants of other syntactic positions in active voice clauses can serve as potential topics to which coherence establishment can shift attention, whereas we have seen that this is not so readily done for passive clauses. Thus, it may not be appropriate to reduce the role of subjecthood to a single subject assignment preference, as the effect of being in subject position is almost certainly dependent on the particular syntactic construction that the subject participates in. While I have argued that surface-level morphosyntactic cues are not the ultimate determinants of pronominal assignments, it must be pointed out that they do appear to have an indirect role, in that they may affect what coherence relations are recognized (in ways, I might add, that are frankly not well-understood). For instance, strong syntactic parallelism may bias an interpretation toward the Parallel relation more than a less parallel structure would. The crucial point is that the connection between these superficial cues and pronoun interpretation is not direct, but mediated by the recognition of coherence. 
This observation, in fact, underlies my primary criticism of a more recent study of pronoun interpretation in the context of coherence establishment by Stevenson et al. (2000). Stevenson et al. compare two hypotheses, one based on ‘semantic focusing’ and the other termed ‘relational’. In semantic focusing, focus is placed on an entity central to the final state of an event as posited in Stevenson et al. (1994), but an intervening connective can alter it. The ‘relational’ hypothesis is basically the Hobbs analysis; the view I have presented falls into this category also. At issue is whether connectives directly affect focus, or whether they affect coherence establishment, which in turn affects focus. Based on their results (which space concerns preclude me from discussing in detail), Stevenson et al. conclude that the semantic focusing account is superior. However, there are a variety of facts discussed herein with which their account is inconsistent. First, they assume that there is a one-to-one mapping between coherence relations and predicted referents, which as we have already seen is not the case. Passages (24a–b), for instance, are both instances of the Explanation relation, but the preferred pronominal assignments are different as predicted by default world-knowledge relationships. Second, Parallel relations are conspicuously absent from their study, and it is hard to see how they could be accounted for with semantic and connective
focusing. For instance, their analysis predicts that the pronoun assignments for passages (37b) and (38a–b) will be the same: semantic focusing starts with Bill, and there is no connective to change it. Third, and relatedly, since there cannot be a shift from the semantic focus without a connective, their analysis makes the wrong predictions for the garden-path in example (32e). In their analysis, Patients receive semantic focus over Agents, and thus reference to Tony is predicted to be preferred (and therefore felicitous). Fourth, the analysis does not account for the differences between active and passive voice in examples like (24a–b) and (40a–b), since the thematic roles and connectives are constant in these passages. Finally, their analysis cannot predict the ambiguity of passages like (23), repeated here as (42), since the factors that determine semantic focusing are obviously invariant between the two readings: (42)
Colin Powell defied Dick Cheney, and George W. Bush punished him.
In the analysis presented in this section, the ambiguity is predicted by the existence of two distinct coherence construals for this passage. All these problems notwithstanding, Stevenson et al.’s model rightly distinguishes the dynamic effects that linguistic structure has on discourse state from the process of pronoun interpretation itself, and thus it is not a smash procedure.
5. Conclusions
To conclude, an adequate model of pronoun interpretation must explain a variety of facts, including (i) why in some cases a pronoun cannot be felicitously used even when only one suitable referent is available, (ii) why we find evidence for so-called interpretation ‘preferences’, and why different preferences appear to prevail in different contextual circumstances, and (iii) how pronouns could have the effect of facilitating pronoun interpretation within a broader theory of discourse processing and comprehension. The commonly assumed smash paradigm fails at all three, and thus should be abandoned as a framework for theorizing about pronoun interpretation. I believe that much of the confusion in the literature is a direct result of casting pronoun interpretation in terms of this untenable paradigm. Furthermore, no matter what paradigm is assumed, future psycholinguistic studies should take great care in controlling for the operative coherence relations in their stimuli, since different relations will provide support for different preferences. Much of the complexity that some smash procedures consider to be part of pronoun interpretation undoubtedly originates from sources independent of it. In this chapter I have focused on one: the interaction between information structure and coherence establishment. Because coherence establishment processes fundamentally differ with respect to the type of coherence they establish, we can explain why we see evidence for various (mostly epiphenomenal)
preferences only in certain contextual circumstances. Furthermore, because the complexity in the data results from these processes and not pronoun interpretation itself, we can explain how pronouns can actually facilitate comprehension by expressing topic continuance, as opposed to more elaborated referential forms that would require additional processing. Several of the phenomena I have discussed as problematic for previous approaches raise questions about the respective roles of activation, subjecthood, and topichood in pronoun interpretation, including the examples discussed in sections 2.1 and 2.3, and the active-passive incongruences discussed in section 4.3. A full investigation of these and other remaining questions will require a more precise and elaborated model of discourse processing than I have provided here, and thus are subjects for future work.
NOTES
1. Thus, I will not argue that the smash paradigm should be abandoned in computer algorithms developed with engineering goals in mind (for instance, to achieve high accuracy on a pronoun interpretation task). Limits of current technology (to parse accurately, to model world knowledge, etc.) and the imperative to handle frequently occurring patterns reliably (which often means ignoring rarer phenomena that might nonetheless be the most illustrative for human language processing) are two of several reasons why such systems should not necessarily strive to model human language processing.
2. For rhetorical convenience, I will use the term ‘information structure’ in a particularly general way for the remainder of the paper, to include not only standard pragmatic notions such as topic but also linguistic factors that bear on the cognitive notion of attention in language processing. The proper division of labor among these concepts is not always clear, but it is not my goal here to go into such questions in any detail.
3. All of the analyses of pronouns that I discuss are restricted to unaccented cases, unless otherwise indicated. In the case of example (6), accent must be placed on the predicate absolutely worships instead. The reason for this will become clear when we consider this example again in Section 4.3.
4. This is the case for the Lappin and Leass algorithm and Hobbs algorithm, for instance. Centering does not specify its ranking in enough detail to determine a prediction for this example.
5. Indeed, a reference to Rice in (13) would make it a so-called contrastive topic (Büring 1999, inter alia), which would in turn require it to be accented.
6. A more recent paper by Stevenson and colleagues (Stevenson et al. 2000) does address coherence; I discuss their paper in section 4.3.
7. All of the centering constructs described here are from GJW, except that BFP split out GJW’s Shift transition into two subcases and modify Rule 2 accordingly. The algorithm for interpreting pronouns using these constructs is BFP’s.
8. To be clear, this does not mean that we will never see grammatical role parallelism in examples involving coherence relations like Occasion, Explanation, and Result. We clearly do, for instance, in any case in which a subject pronoun is identified with a subject antecedent. The point is that there is always an explanation for the particular choice of reference assignments other than the existence of a grammatical role parallelism preference.
REFERENCES

Arnold, Jennifer. 2001. The effect of thematic roles on pronoun use and frequency of reference continuation. Discourse Processes, 21(2):137–162.
Brennan, Susan E. 1995. Centering attention in discourse. Language and Cognitive Processes, 10:137–167.
Brennan, Susan E., Marilyn W. Friedman, and Carl J. Pollard. 1987. A centering approach to pronouns. In Proceedings of the 25th Meeting of the Association for Computational Linguistics, pages 155–162, Stanford, CA.
Büring, Daniel. 1999. Topic. In Peter Bosch and Rob van der Sandt, editors, Focus: Linguistic, Cognitive, and Computational Perspectives. Cambridge University Press, Cambridge, pages 142–165.
Caramazza, A. and S. Gupta. 1979. The roles of topicalization, parallel function, and verb semantics in the interpretation of pronouns. Linguistics, 3:497–518.
Chafe, Wallace L. 1976. Givenness, contrastiveness, definiteness, subjects, topics, and point of view. In Charles N. Li, editor, Subject and Topic. Academic Press, New York, pages 25–55.
Chambers, Craig C. and Ron Smyth. 1998. Structural parallelism and discourse coherence: A test of centering theory. Journal of Memory and Language, 39:593–608.
Gordon, Peter C., Barbara J. Grosz, and Laura A. Gilliom. 1993. Pronouns, names, and the centering of attention in discourse. Cognitive Science, 17(3):311–347.
Grosz, Barbara J., Aravind K. Joshi, and Scott Weinstein. 1995. Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21(2):203–225.
Gundel, Jeanette K., Nancy Hedberg, and Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69(2):274–307, June.
Hobbs, Jerry R. 1978. Resolving pronoun references. Lingua, 44:311–338.
—— . 1979. Coherence and coreference. Cognitive Science, 3:67–90.
—— . 1990. Literature and Cognition. CSLI Lecture Notes 21, Stanford, CA.
Hobbs, Jerry R., Mark E. Stickel, Douglas E. Appelt, and Paul Martin. 1993. Interpretation as abduction. Artificial Intelligence, 63:69–142.
Hudson-D’Zmura, Susan. 1989. The Structure of Discourse and Anaphor Resolution: The Discourse Center and the Roles of Nouns and Pronouns. Ph.D. thesis, University of Rochester.
Kameyama, Megumi. 1986. A property-sharing constraint in centering. In Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, pages 200–206, New York.
—— . 1996. Indefeasible semantics and defeasible pragmatics. In M. Kanazawa, C. Piñón, and H. de Swart, editors, Quantifiers, Deduction, and Context. CSLI, Stanford, CA, pages 111–138.
Kehler, Andrew. 1997. Current theories of centering for pronoun interpretation: A critical evaluation. Computational Linguistics, 23(3):467–475.
—— . 2002. Coherence, Reference, and the Theory of Grammar. CSLI Publications.
Kehler, Andrew. 2005. Coherence-driven constraints on the placement of accent. In Proceedings of the 15th Conference on Semantics and Linguistic Theory (SALT-15), Los Angeles, CA.
Kronfeld, Amichai. 1990. Reference and Computation. Cambridge University Press, Cambridge.
Lakoff, Robin. 1971. If ’s, and’s, and but’s about conjunction. In Charles J. Fillmore and D. Terence Langendoen, editors, Studies in Linguistic Semantics. Holt, Rinehart, and Winston, New York, pages 114–149.
Lambrecht, Knud. 1994. Information Structure and Sentence Form. Cambridge University Press, Cambridge.
Lappin, Shalom and Herbert Leass. 1994. An algorithm for pronominal anaphora resolution. Computational Linguistics, 20(4):535–561.
Matthews, A. and M.S. Chodorow. 1988. Pronoun resolution in two-clause sentences: Effects of ambiguity, antecedent location, and depth of embedding. Journal of Memory and Language, 27:245–260.
Oehrle, Richard T. 1981. Common problems in the theory of anaphora and the theory of discourse. In Herman Parret, Marina Sbisá, and Jef Verschueren, editors, Possibilities and Limitations of Pragmatics. John Benjamins, Amsterdam, pages 509–530. Studies in Language Companion Series, Volume 7.
Smyth, R. 1994. Grammatical determinants of ambiguous pronoun resolution. Journal of Psycholinguistic Research, 23:197–229.
Stevenson, Rosemary J., Rosalind A. Crawley, and David Kleinman. 1994. Thematic roles, focus, and the representation of events. Language and Cognitive Processes, 9:519–548.
Stevenson, Rosemary J., Alistair Knott, Jon Oberlander, and Sharon McDonald. 2000. Interpreting pronouns and connectives: Interactions among focusing, thematic roles, and coherence relations. Language and Cognitive Processes, 15(3):225–262.
Stevenson, Rosemary J., Alexander W. R. Nelson, and Keith Stenning. 1995. The role of parallelism in strategies of pronoun comprehension. Language and Speech, 38(4):393–418.
Winograd, Terry. 1972. Understanding Natural Language. Academic Press, New York.
6
Good-Enough Representation in Plural and Singular Pronominal Reference: Modulating the Conjunction Cost

Sungryong Koh, Anthony J. Sanford, Charles Clifton Jr., Eugene J. Dawydiak
This chapter concerns the nature of representations set up when two individuals can be referred to by means of plural pronouns. The conditions under which plural pronominal reference is possible, or preferred from a processing perspective, have been the subject of recent psycholinguistic research (Albrecht and Clifton, 1998; Koh and Clifton, 2002; Eschenbach, Habel, Herweg, and Rehkampser, 1989; Kaup, Kelter, and Habel, 2002; Moxey, Sanford, Sturt, and Morrow, 2004; Sanford and Moxey 1995). As part of the attempt to understand plural reference, the concept of complex reference object has been used as an explanatory construct: plural reference is deemed possible if a plural reference object can be formed (see Eschenbach et al., 1989, Kaup et al., 2002, and Koh and Clifton, 2002 for descriptions). The issue then becomes translated into what permits a complex reference object to be formed. In this chapter, we are concerned with singular and plural anaphoric reference to one or more individuals evoked by a previous sentence, as in (1):

(1) Stan and Pam asked the usherette for assistance.
Such a sentence, containing a noun phrase (NP) with two conjoined individual names (denoting atomic individuals), can be followed by a pronominal reference either to the pair of people (They appreciated the help) or to one individual (e.g., He appreciated the help), and still be intelligible. The same applies
to a sentence like Stan mowed the lawn with Pam. There has been some interest in the conditions that make plural reference possible, and in just what swings the balance in favor of plural or singular pronominal reference during processing. One major factor favoring plural pronominal reference to two “atomic” referents is their occurrence in a conjoined NP, as in (1). Thus Albrecht and Clifton (1998; see also Garrod and Sanford, 1982; Moxey, Sanford, Sturt, and Morrow, 2004) demonstrated slower reading times for a sentence in which the pronoun referred to one individual than for one in which it referred to the pair, and slower times for a sentence containing a singular pronoun that referred to one of the conjoined pair than for a sentence containing a singular pronoun that referred to a third individual. The effect, known as the Conjunction Cost, appeared both in whole-sentence reading times in self-paced reading experiments and in the region of the pronoun or shortly afterwards in eye-tracking experiments. Furthermore, Moxey et al. (2004) showed that as singular reference became disadvantaged through the use of a conjoined NP, plural reference became facilitated.
Basis of the Conjunction Cost

Two possible factors bringing about the conjunction cost have been identified. The first is syntactic. Albrecht and Clifton (1998) argued that it reflected a mechanism of splitting the conjoined NP apart to gain access to its constituent individuals. Since, in the Albrecht and Clifton eye-tracking study, the effect appeared on the pronoun (plus a following adverb) when the pronoun unambiguously took one of the conjoined names as its antecedent, it was supposed that the splitting mechanism operated on the syntactic phrase structure of the sentence, or some structure closely related to surface structure. However, it is not simply association through being in a conjoined NP that leads to the conjunction cost. Sanford and Moxey (1995) suggested that what was important for plural reference to be felicitous was that two individuals played a common role in a discourse. Thus, whether or not atomic referents are evoked by a conjoined NP, a conjunction cost should occur if they share a common role.1 According to their theory, a complex reference object is effectively a mapping of individuals into a common role in background knowledge. Moxey et al. (2004) compared the conjunction cost for three types of sentence, in which the atomic individuals were introduced in three different ways:

(2) John and Mary painted the room. {syntactic-conjoined NP; common role (agent)}
(3) John painted the room with Mary. {Common role (coagents)}
(4) John painted the room for Mary. {Benefactive; different roles (agent/beneficiary)}
In an eye-tracking study, the largest conjunction cost occurred for the conjoined NP, and the smallest for the benefactive, but the co-agent condition
fell in between, and showed a conjunction cost also. Moxey et al. argued that the conjoined NP is a strong cue for the atomic individuals being mapped into a common role, whereas although the co-agent condition would lead mainly to co-agent interpretations, small differences in role (e.g., difference in who had the most control, or expertise, John or Mary, or even construing Mary as an instrument, as in body-action painting) might dilute the effect of being mapped into a common role. So in addition to superficial phrase-splitting, role-mapping is a major cue leading to the conjunction cost, and through syntactic joining in a conjoined NP, common role mapping is maximized. In this chapter, we suggest that a third, discourse-based factor influences the conjunction cost: discourse theme, or thematic subject. Specifically, if two individuals have been introduced in a conjoined phrase, then the cost associated with using a pronominal reference to one of the individuals may be associated with a shift in thematic subject, from a text about both individuals to one about just one of them. From a communication perspective, a text in which two individuals are introduced as playing a common (undifferentiated) role is very different from a text in which two individuals play completely different roles. An unprincipled or unheralded break from a discourse about both to one about just one of them could disrupt processing. If, on the other hand, the shift from both individuals to just one is justified in some obvious way, then one would expect no disruption, and so no observed conjunction cost. We conjecture that there are two possible justifying conditions for changing from a plural thematic subject to a singular one. First, if an action that is typically carried out by one person is done on behalf of the pair, then this should not be perceived as being a thematic shift.
Second, if a shift is to be made, it could be signaled by a device in the discourse, for example, referring to one person with a proper name. We examine each of these in turn, reporting experiments that test our conjectures.
Experiment 1: Singular Actions on Behalf of Two, and Maintaining Thematic Unity

We propose that a simple contrast can be drawn that embodies the first condition discussed above. First, consider example (5):

(5a) Last night John and Mary went to an Italian restaurant. He really enjoyed the food.
(5b) Last night John and Mary went to an Italian restaurant. They really enjoyed the food.
The intuition that the use of he in (5a) is strained fits with a further intuition: that going on to use she as well somehow renders the split into singulars more acceptable:
(6) Last night John and Mary went to an Italian restaurant. He really enjoyed the food, but she thought it was pretty mediocre.
Of course, what has happened in (6) is that the thematic shift to John, which we claim here is a major source of processing disruption, is repaired by reintroducing Mary, in this case in a contrastive manner. In this way, from a communication point of view, the shift is justified, since the very point is that we might have expected them to play a common role (i.e., both react in the same way). Another way in which a shift may be signaled (in spoken language) is by putting stress on a pronoun. For instance, if the second sentence of (6) were to be HE really enjoyed the food, then because the stress brings about a contrast between what he did and what she must have done, the continuation becomes more acceptable. This line of reasoning opens up a further possibility. If the use of a singular pronoun is related to an action that does not lead one to question whether the two characters are reacting or acting in the same way (preserving a common role), then it should not lead to a conjunction cost. One striking case where this appears possible is where one of the individuals carries out some action that is consistent with the two of them being in a common role. Such an example is where an individual does something on behalf of the two of them. Consider the following:

(7a) Last night John and Mary went to an Italian restaurant. He asked for a table.
(7b) Last night John and Mary went to an Italian restaurant. They asked for a table.
In (7a), the action of asking for a table, although typically (physically) carried out by one person, is an action that may be carried out by one on behalf of both protagonists. We propose that when someone reads (7a), their natural interpretation will be that John asked for a table for both of them. In (7b), the act of asking for the table is also ascribed to both individuals, because they both want the table, but in all probability the enactment of this would have been carried out by only one of them. We propose that with this example, it simply doesn’t matter whether the action is depicted as being carried out by one or both individuals. It is a number-indifferent action in the present context. Later we shall discuss the notion of number-indifference in more detail. Interestingly, such materials are not typical of the ones that have been used in experiments showing the conjunction cost. On the other hand, examples like (5), where the shift to a singular is unmotivated, are typical of these experiments. Our proposal is that in number-sensitive instances, a conjunction cost will be observed, whereas with number-indifferent instances, it will not, because it does not constitute a true shift in thematic subject. We first demonstrate that it is possible to develop materials that by consensus fit these two types, identified here only intuitively. We shall discuss later what these intuitions may be tapping into.
Validation of Materials

Sixteen materials based on examples like (5) and (7) were constructed. Each discourse began with a sentence that introduced one couple (one male, one female name). The discourse continued with a singular (he or she) pronoun reference to the first member of the couple. The sentence where this reference occurred described an action that was of either the number-indifferent or the number-sensitive type, according to experimenter intuitions. To validate the experimenters’ judgments, sixteen University of Massachusetts undergraduates completed a questionnaire that contained both versions of all 16 discourses. One version (number-indifferent or number-sensitive) was randomly chosen to appear in the first half of the questionnaire, and the other appeared in the second half. Participants were instructed to rate the second sentence of each discourse on a four-point scale, ranging from 1 (“almost certainly done by just one person on behalf of the couple”) to 4 (“almost certainly done just for the person himself or herself; both people could do the same thing”). They were given several examples intended to illustrate the decision, and then asked to evaluate all 32 discourses. The mean rating of the number-indifferent versions was 1.34, while the mean rating of the number-sensitive versions was 3.10 (t(15) = 12.31, p < .001), confirming our intuitions. Most of the items clearly satisfied our criterion of differing in whether the actions reflected being number-indifferent or number-sensitive.
Method

PARTICIPANTS
A whole-sentence self-paced reading procedure was used to measure the speed with which 16 of the short pre-tested discourses were read. Sixty undergraduates at the University of Massachusetts participated for course credit. Each was tested individually in a session that lasted less than 25 minutes. One-quarter of the participants were tested on each of the four counterbalanced lists described in the following section.

MATERIALS
The 16 discourses from the materials evaluation were used.2 As shown in table 6.1, each came in four versions: two with a singular pronoun, as examined in the materials evaluation, and two with the plural pronoun they. Thus, the four versions of each sentence were determined by the factorial combination of singular versus plural pronoun and actions that were number-indifferent or number-sensitive. In half of the items the singular pronoun referred to the female and in half, to the male. Standard counterbalancing procedures ensured that each participant saw an equal number of sentences in each of the four conditions (balanced over gender of pronoun) and that, over the entire experiment, each sentence was tested equally often in each condition.
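The counterbalancing scheme just described can be illustrated with a minimal sketch. It assumes a simple Latin-square rotation, which is one standard way to satisfy the stated constraints and is not necessarily the exact assignment used in the experiment. The names CONDITIONS and build_lists are hypothetical, and the additional balancing over pronoun gender is not modeled.

```python
from collections import Counter

# The four versions of each discourse (pronoun number x action type).
CONDITIONS = [
    ("singular", "number-indifferent"),
    ("plural", "number-indifferent"),
    ("singular", "number-sensitive"),
    ("plural", "number-sensitive"),
]


def build_lists(n_items=16, conditions=CONDITIONS):
    """Return one {item: condition} mapping per counterbalanced list.

    Rotating the condition index by the list index (a Latin square) is an
    assumed scheme, chosen because it guarantees the two properties the
    text requires.
    """
    n_lists = len(conditions)
    return [
        {item: conditions[(item + list_index) % n_lists] for item in range(n_items)}
        for list_index in range(n_lists)
    ]


lists = build_lists()

# Property 1: every list contains an equal number of items per condition.
for assignment in lists:
    print(Counter(assignment.values()))

# Property 2: across lists, every item appears once in every condition.
print(all({a[item] for a in lists} == set(CONDITIONS) for item in range(16)))
```

With 16 items and four lists, each list holds four items per condition, and one-quarter of the participants would be tested on each list.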
TABLE 6.1. A Sample Item from Experiment 1

Sentence Condition             Example
Lead-in (Sentence 1)           Jerry and Lisa entered the opera house on time.
Singular/Number Indifferent    He gave the tickets to the usher.
Plural/Number Indifferent      They gave the tickets to the usher.
Singular/Number Sensitive      He looked around the beautiful lobby.
Plural/Number Sensitive        They looked around the beautiful lobby.
These 16 discourses were embedded in a list of 64 short passages. Half of the items were followed by a wh-question or a yes-no question. Participants were to choose the correct one of two answers that appeared on the video screen by pulling a trigger under the correct answer. A 6-item practice list was also constructed, with three sentences followed by questions.

PROCEDURE
After participants were tested on the practice list, the 64 passages were presented to each participant in an individually randomized order. Each sentence was presented individually, at the left edge of the screen, and remained on the screen until the participant pulled a trigger to advance to the next sentence. On half the trials, the second sentence in a discourse was followed by the word “QUESTION” for 500 ms and then by the presentation of a question and two answers. The time to read each of the two sentences in a discourse and the accuracy of the question answer were recorded.
Results and Discussion

The questions that appeared after half the items were answered with 95% or better accuracy in each condition. The primary data of interest are the reading times for the second sentence of a discourse. (No differences in reading times for the first sentence approached significance; all F < 1.) The means of these times appear in table 6.2. Individual times over 5000 ms were eliminated (1.3% of the data). An analysis of variance was carried out with the factors singular versus plural pronoun and number-indifferent versus number-sensitive, plus the factor of counterbalancing groups or counterbalancing item sets, as recommended by Pollatsek and Well (1995). As predicted, this indicated a significant interaction between pronoun plurality and number-sensitivity: F1(1, 56) = 4.33, p < .05; F2(1, 10) = 6.15, p < .04. Sentences with a plural pronoun were read a significant 156 ms faster than ones with a singular pronoun when the discourse was number-sensitive [t1(59) = 2.62, p < .03; t2(13) = 2.31, p < .03], but the difference for number-indifferent discourses was nonsignificant (t < 1). In other words, the conjunction cost appears to be present in the case of the number-sensitive materials, but absent in the case of the number-indifferent materials, as expected.

TABLE 6.2. Mean Reading Times in Ms for Sentence 2, Experiment 1, as a Function of Condition, Including Standard Errors

                      Pronoun
Content               Singular     Plural
Number-indifferent    1891 (76)    1899 (87)
Number-sensitive      1898 (73)    1742 (67)

A 156 ms conjunction cost was observed for number-sensitive actions like (5). Sentences with a plural pronoun were read more rapidly than sentences with a singular pronoun whose antecedent was inside a conjoined NP. This effect basically replicates Garrod and Sanford (1982) and Albrecht and Clifton (1998). However, the conjunction cost disappeared for number-indifferent actions, where the relevant role was one typically played by a single individual on behalf of both.

One apparent complication in the data is that, while reading times for a plural pronoun sentence depicting a number-sensitive action were shorter than for a plural pronoun sentence depicting a number-indifferent action, reading times for singular pronouns were equally long in both number-sensitive and number-indifferent scenarios. In principle, the reading time for the singular pronoun sentence should be longer in the number-sensitive condition. However, it is inappropriate to treat this comparison too seriously, since largely different content is being compared, and the sentences differ in length.
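As a worked check on the figures above, the conjunction cost in each content condition can be recomputed as the singular-minus-plural difference in the cell means of table 6.2. The short sketch below does only this arithmetic; the dictionary layout and the function name conjunction_cost are illustrative assumptions, and the block says nothing about statistical reliability, which rests on the analyses reported in the text.

```python
# Cell means (ms) copied from table 6.2; the data layout is an assumption
# made for illustration only.
MEAN_RT = {
    ("number-indifferent", "singular"): 1891,
    ("number-indifferent", "plural"): 1899,
    ("number-sensitive", "singular"): 1898,
    ("number-sensitive", "plural"): 1742,
}


def conjunction_cost(content):
    """Singular-minus-plural mean reading time (ms) for one content type."""
    return MEAN_RT[(content, "singular")] - MEAN_RT[(content, "plural")]


cost_sensitive = conjunction_cost("number-sensitive")      # 1898 - 1742
cost_indifferent = conjunction_cost("number-indifferent")  # 1891 - 1899

# The interaction corresponds to the difference between the two costs.
interaction_contrast = cost_sensitive - cost_indifferent

print(cost_sensitive, cost_indifferent, interaction_contrast)
```

The number-sensitive difference reproduces the 156 ms cost reported in the text, while the number-indifferent difference is slightly negative (−8 ms), in line with the nonsignificant comparison.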
A second complication in the data comes from the possibility that making plural reference in a number-indifferent predicate is implausible or infelicitous.3 We claim that the disappearance of conjunction cost in number-indifferent scenarios reflects an increased plausibility of using a singular reference in these scenarios, compared to singular reference in number-sensitive scenarios. However, this difference could reflect a decreased plausibility of the plural reference in number-indifferent situations. The pattern of data is not precisely what this suggestion would entail. Reading times were not notably long for plural pronouns in number-indifferent situations. However, one could argue that plural pronoun sentences were actually processed faster across-the-board than any singular pronoun sentences, but that the advantage of plural reference was offset by the implausibility of the plural pronoun sentence in a number-indifferent discourse. The following plausibility rating study was conducted to assess this possibility.
Experiment 2: Plausibility Ratings

Method

The 16 items of Experiment 1 were used in a written questionnaire study, in which 44 University of Massachusetts undergraduates (one was eliminated for failure to complete the questionnaire) rated each item on a five-point scale from “odd or clumsy” (1) to “perfectly OK” (5). Four forms of the questionnaire were prepared. In each form, four items appeared in each version used in Experiment 1 (number-indifferent vs. number-sensitive × singular vs. plural pronoun). The four questionnaire forms were counterbalanced, so that each item occurred in each version in one form. Eight items of questionable plausibility were added to the questionnaire to define the range (e.g., Susan and Keith were cooking up a big holiday dinner. She decided to go and exercise.) Each form was separately randomized.

TABLE 6.3. Mean Plausibility Ratings, Experiment 2

                      Pronoun
Content               Singular    Plural
Number-indifferent    4.19        4.31
Number-sensitive      3.56        4.40
Results and Discussion

The mean plausibility ratings appear in table 6.3. The interaction was highly significant (F1(1, 42) = 18.1, p < .001; F2(1, 15) = 12.56, p < .01), as were both main effects, reflecting the low rated plausibility of a singular pronoun sentence in a number-sensitive context. There was no support for the concern that implausibility of plural pronoun sentences in number-indifferent discourses masked an underlying across-the-board advantage of plural reference. Indeed, the plausibility ratings reflect the same factor that affected reading times: There is a penalty for referring to one of the individuals in a conjoined NP if that individual is not performing an action on behalf of both of the individuals. So, Experiments 1 and 2 support the view that when an action by an individual is done on behalf of both characters, it does not disrupt the coherence of the text if the thematic subject is changed from both people to one person.
Experiment 3: Type of Anaphor and Thematic Shift

We conjectured that if part of the Conjunction Cost is due to a shift in thematic subject, then it should be possible to eliminate it by signaling that such a shift will take place. Experiment 3 examined the speed of reading essentially the same scenarios used in Experiment 1, except that sentences with a singular pronoun were compared with sentences containing the proper name of the referent. A substantial literature demonstrates slower reading time when a repeat of the proper name is used to refer to an individual that may be felicitously referred to by means of a pronoun (the “repeated name penalty”; e.g., Gordon, Grosz, and Gilliom 1993). In the present experiment, we applied
a similar idea to the status of singular pronouns in number-sensitive and number-indifferent settings. In a number-sensitive setting, a singular pronoun is inappropriate, since it is not bound to something that can be classed as being done on behalf of both. We have argued that in such cases, there is effectively a shift in thematic subject from both individuals to just one. Vonk, Hustinx, and Simons (1992) showed that using full noun-phrases or proper names as anaphors served to reintroduce the denoted individuals as the new topic. So, in the number-sensitive case, we would expect reading to be facilitated when a repeat-name is used rather than the corresponding singular pronoun, since it accommodates the change in discourse topic. On the other hand, we would expect the proper name to be either of no particular advantage, or even to be a disadvantage to processing in the number-indifferent case (as in the repeated-name-penalty effect), since there is no shift in topic.
Method

PARTICIPANTS AND PROCEDURE
Eighty-four University of Massachusetts undergraduates were tested using the same procedure described in Experiment 1.

MATERIALS
The 16 2-sentence quadruples of Experiment 1 were modified by replacing the plural pronoun with the name of the person referred to by the singular pronoun, as illustrated in (8) and (9). These 16 discourses were embedded in a total of 64 sentences, as was done in Experiment 1. Half were followed by questions, as in Experiment 1.
(8a) Last night John and Mary went to an Italian restaurant. He asked for a table. [Singular pro; number indifferent]
(8b) Last night John and Mary went to an Italian restaurant. John asked for a table. [Name; number indifferent]
(9a) Last night John and Mary went to an Italian restaurant. He really enjoyed the food. [Singular pro; number sensitive]
(9b) Last night John and Mary went to an Italian restaurant. John really enjoyed the food. [Name; number sensitive]
Results and Discussion

The questions that appeared after half the items were answered with 93% or better accuracy in each condition. The mean reading times in ms for the second sentences appear in table 6.4. Because the names were systematically longer than the pronouns, we present times linearly adjusted for length, following the regression equation procedures described by Ferreira and Clifton (1986), and also Trueswell, Tanenhaus and Garnsey (1994). The regression parameters were based on all 64 passages used in the experiment.
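The general idea behind a linear length adjustment can be sketched as follows: reading time is regressed on sentence length, and each observed time is replaced by its deviation from the time predicted for a sentence of that length. This is only a minimal illustration of the technique and not a reconstruction of the exact procedure of Ferreira and Clifton (1986). The function names are hypothetical, the lengths and times are invented for illustration and are not experimental data, and measuring length in characters is an assumption.

```python
def fit_line(lengths, times):
    """Ordinary least-squares slope and intercept for time ~ length."""
    n = len(lengths)
    mean_x = sum(lengths) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def adjusted_times(lengths, times):
    """Observed minus predicted time; negative means faster than predicted."""
    slope, intercept = fit_line(lengths, times)
    return [y - (intercept + slope * x) for x, y in zip(lengths, times)]


# Hypothetical sentence lengths (characters) and reading times (ms),
# invented purely to show the computation.
lengths = [30, 34, 38, 42, 46]
times = [1500, 1700, 1750, 2000, 2050]

residuals = adjusted_times(lengths, times)
print([round(r, 1) for r in residuals])
```

On this scale a negative value indicates a sentence read faster than its length predicts, which is how the negative entry in table 6.4 should be read.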
TABLE 6.4. Mean Adjusted Reading Times (with SE), Ms, Sentence 2, Experiment 3

                      Pronoun    Name
Number-indifferent    23 (30)    79 (34)
Number-sensitive      35 (27)    −41 (28)
Times longer than 5,000 ms were eliminated (0.3% of the data). Analyses of variance were conducted with the factors name vs. pronoun and number-indifferent vs. number-sensitive. The critical interaction between pronoun versus name and type of scenario was significant [F1(1, 80) = 5.14, p < .03; F2(1, 12) = 5.66, p < .05]. As predicted, sentences with a name were read faster than sentences with a pronoun following number-sensitive actions, albeit only marginally so by materials (41 ms faster than predicted on the basis of length for the name data, vs. 35 ms slower than predicted on the basis of length for the pronoun data; t1(83) = 2.10, p < .04; t2(15) = 1.87, p = .081). For number-indifferent actions, there is no reliable difference [t1(83) = 1.15; t2(15) = 1.65, NS]. Thus, the experiment offers support for the idea that using a proper name facilitates processing of a shift to one of the individuals under the number-sensitive condition, but not under the number-indifferent condition. We suggest that this is because singling out one individual’s action results in an unprincipled, unheralded, and unaccounted for change of thematic subject when this cannot be accommodated as “on behalf of both.” The use of a repeat name accommodates this shift in the way a pronoun cannot, as predicted on the basis of the observations of Vonk et al. (1992), that specifying an NP more fully than through a pronoun serves to signal a forthcoming change in thematic subject (i.e., which individual is in focus). Taken together, the results of Experiments 1, 2, and 3 support our ideas regarding boundaries on the Conjunction Cost: There is no global cost associated with singular pronouns when the action occurs on behalf of both protagonists, and the use of a repeat name serves to cue the shift in thematic subject in the number-sensitive cases.
Indifference and Good-Enough Representation

The experiments described up to now suggest that the processing system is effectively indifferent to a shift to a single individual if, from a narrative point of view, this does not force a shift in thematic subject. An interesting question is whether under this condition of acceptable shift the mental representation of the text actually fails to distinguish between both protagonists taking an action, and just one protagonist taking an action. There is evidence from several sources showing that language input is not necessarily represented uniformly, but may sometimes result in a shallow, or incomplete (underspecified) representation (Sanford and Sturt 2002), and that the level of specification achieved during interpretation may be just that which is “good enough” for the tasks of interpretation at hand (Ferreira, Ferraro, and Bailey 2002). One way of thinking about the representation of an action by an individual that is taken on behalf of a pair is that possible pronominal reference is not really differentiated between singular and plural. For instance, once we have learned that Jack and Jill drove to the station, it simply doesn’t matter whether They or He parked the car at the useful level of granularity of the event representation. To see how this might work, consider the level of granularity required to understand the sentence Jack and Jill drove to the station. What is not specified is who actually operated the pedals and the steering wheel, yet clearly only one of them could actually do it at a time. It’s simply the case that the sentence does not need that level of specification to be understood. By the same token, They parked the car does not need to be specified either. Clearly they didn’t both work the controls to get the car into a parked position, though they might have taken it in turns in some frustratingly difficult situation, or one might have “directed” while the other steered, and so on. The thing is that outside of some guiding context, there is simply no need to go to that level of specification. So, at the level of granularity adopted during reading these sentences, whether he, she, or they parked doesn’t matter, because the attributes necessary to make the appropriate distinctions simply aren’t activated. If a continuation to both driving to the station was he wanted to pick up his niece, then there is a clear need to represent that fact at a level of granularity that differentiates he/him from them. The shift in grain is part and parcel of the shift in thematic subject and discourse topic.
In Experiment 4, we tested this idea using a text-change procedure (e.g., Sanford and Sturt, 2002), in which a text is shown under self-paced reading conditions, and then shown a second time after a brief delay. On the second exposure, one of the words might be changed. Such a technique has been shown to detect influences of contrastive focus (Sturt, Sanford, Stewart and Dawydiak, 2004) and syntactic load (Sanford, Bohan, Sanford, and Molle, 2003) on detection of changes. In the present context, we can make the prediction that if an action can be carried out by one person on behalf of two, then a change from a plural to a singular reference would be noticed less easily than would a change to singular when the action was judged not to be on behalf of two.
Experiment 4: Change Detection and Representation
Method
DESIGN AND MATERIALS
The test materials consisted of the 16 two-sentence materials of Experiment 1, each adapted by the addition of a further sentence. In all cases in the test materials, the first presentation consisted of the plural anaphor version, while in the second
presentation the pronoun was changed to a singular pronoun corresponding to a reference to the first individual in the conjoined NP that introduced the characters. An example material for the number-indifferent case is shown in (10): (10)
Jared and Kathleen went to the mall on Sunday. They (⇒ He) found a parking space near the entrance. Suddenly, it started to rain heavily.
Here, (⇒ He) denotes a change from They to He in the second exposure. For the number-sensitive case, the corresponding example had the second sentence shown in (11): (11)
They (⇒ He) walked toward the south entrance.
In addition, various fillers were incorporated. There were 16 fillers with the same general form as the 16 test materials, except that they were used in a no-change condition: the pronoun was the same (plural) for both presentations. A further 16 fillers based on simple 3-line narratives were used, these not having conjoined NPs. These contained singular and/or plural pronouns, but did not include a change to a pronoun. Rather, in this set, changes to nouns were made in the second presentation, and these were distributed over a very wide range of locations, from the third word of the first sentence to near the end of the third sentence. The word changes used here were semantically large, and designed to be easily detected, so that the participants would expect changes to occur at any physical location. Finally, a set of 32 further items from another study using changes in verbs was incorporated, 16 of which had changes in sentences 2 and 3, and 16 of which had no changes. Thus the overall proportion of trials on which a change took place was 60%.
PARTICIPANTS
Sixteen native-English-speaking undergraduates from the University of Glasgow took part. They were naïve with respect to the aims of the study.
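The trial composition described under Design and Materials can be tallied as a quick check on the stated 60% figure. The sketch below simply adds up the five sets of trials named in the text; the labels and the function name are hypothetical, and it assumes that every trial in each set counts equally toward the proportion.

```python
# Trial sets as described in the text:
# (label, number of trials, whether a word changed on the second presentation)
TRIAL_SETS = [
    ("test materials, They -> He", 16, True),
    ("matched fillers, pronoun unchanged", 16, False),
    ("narrative fillers, noun changed", 16, True),
    ("verb-study items, verb changed", 16, True),
    ("verb-study items, unchanged", 16, False),
]


def change_proportion(trial_sets):
    """Return (changed trials, total trials, proportion of trials with a change)."""
    total = sum(n for _, n, _ in trial_sets)
    changed = sum(n for _, n, has_change in trial_sets if has_change)
    return changed, total, changed / total


changed, total, proportion = change_proportion(TRIAL_SETS)
print(f"{changed} of {total} trials contained a change ({proportion:.0%})")
# prints: 48 of 80 trials contained a change (60%)
```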
PROCEDURE
The materials were presented under computer control using the PsyScope experimental package (Cohen, MacWhinney, Flatt, and Provost, 1993). Participants pressed a button on a button-box to initiate a trial, which began with a display of the complete paragraph. They then read it at their own pace, being instructed to read as they would normally for meaning, and not to re-read after the first pass through. The presence of the experimenter in the room encouraged compliance with this request. They pressed the button again when they had finished reading, and the paragraph was replaced by a gray screen for 500 ms, after which the passage re-appeared. They were instructed to re-read at their own pace, primarily for meaning, but were told that a word might be changed on some occasions, and that if they noticed a change, they should indicate it, again without systematically re-reading. The trial ended when the participant pressed the button again. If they believed that they had
detected a change, they were instructed to say what the change was. It was stressed that the primary objective was simply to read through the text for meaning, without going over the text more than once, and that we were interested in the time it took them to read the passages. Spotting changes was portrayed as a secondary task. Participants were credited with detecting a change if they correctly stated which word had changed and which word it had changed to. For the experimental materials, as anticipated, the detection rate for changes from a plural to a singular pronoun was lower for the number-indifferent materials, at 38.7% (7.1 items, se = .258) than for the number-sensitive materials, at 44% (6.19 items, se = .368). Although numerically small, this is a reliable difference [t1(15) = 2.448, p < .027; t2(15) = 2.210, p < .043]. Overall detection rate for the non-experimental materials was 41%, and false alarms were low at a mean of 4.3 per participant. False alarms were distributed unsystematically with respect to position in the paragraphs. There are reasons to expect the observed difference to be small. First, overall detection rate was rather low, suggesting that the task was difficult. More crucially, it is possible that some of the critical detections occurred based on the change in form from they to he rather than being based on a difference in the discourse representation (e.g., the identity of the thematic subject). Only the latter detections should reflect the number-sensitivity of the predicate. Nonetheless, the observed difference was reliable and in the direction predicted. We therefore conclude that readers are more indifferent to the choice of plural or singular pronoun in the cases where the action could be carried out by one on behalf of both, or be depicted as being carried out by both, than in the cases where the singular pronoun singles out an action carried out by just one individual.
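The crediting criterion described above (a detection counts only if the participant names both the original word and the word it changed to) can be sketched as a small scoring routine. This is an illustrative sketch, not the scoring procedure actually used: the class and function names are hypothetical, and treating an incorrect report on a change trial as a miss is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

WordPair = Tuple[str, str]  # (word on first presentation, word on second presentation)


@dataclass
class Trial:
    change: Optional[WordPair]   # the actual change, or None on a no-change trial
    report: Optional[WordPair]   # what the participant reported, or None if nothing


def _normalize(pair: WordPair) -> WordPair:
    return (pair[0].lower(), pair[1].lower())


def score(trial: Trial) -> str:
    """Classify one trial as hit, miss, false_alarm, or correct_rejection."""
    if trial.change is None:
        return "false_alarm" if trial.report is not None else "correct_rejection"
    if trial.report is None:
        return "miss"
    # Credit requires both the original and the replacement word to be correct.
    # Assumption: an incorrect report on a change trial is scored as a miss.
    return "hit" if _normalize(trial.report) == _normalize(trial.change) else "miss"


def detection_rate(trials) -> float:
    """Proportion of change trials that were credited as detected."""
    change_trials = [t for t in trials if t.change is not None]
    hits = sum(score(t) == "hit" for t in change_trials)
    return hits / len(change_trials)


# Hypothetical responses from one participant (NOT data from the experiment).
trials = [
    Trial(change=("They", "He"), report=("they", "he")),
    Trial(change=("They", "He"), report=None),
    Trial(change=None, report=("car", "van")),
    Trial(change=None, report=None),
]
print([score(t) for t in trials], detection_rate(trials))
```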
Discussion
The conditions that lead to preferences for plural reference over singular reference have been the subject of recent enquiry. In this chapter, we assume that if two protagonists play a common role in an action or episode, then that constitutes grounds for using a plural pronoun for those individuals, provided they are properly focused (cf. Moxey et al., 2004; Sanford and Moxey, 1995). Syntactic structure, in the form of conjoined NPs, also leads to a preference for plural reference, though this is as yet not properly untangled from the common role constraint. An important experimental procedure for looking at preferences is to see when a singular pronoun is infelicitous: under conditions where the atomic individuals play a common role (or are in a conjoined NP), use of a singular pronoun is disadvantaged, the so-called conjunction cost. In the present chapter, we claim that at least some of the conjunction cost may be due to factors at the discourse level. That is, if a discourse about two individuals in a common role suddenly becomes about just one of them in an unmotivated way, then there will be a cost in terms of processing effort while the reader accommodates this shift. Our experiments are a test of this idea. We distinguished between shifts to singulars in situations that are number-sensitive and those that are number-indifferent. A number-indifferent situation is one in which an action that is undertaken by just one of two individuals can be construed as being carried out on behalf of both individuals, while a number-sensitive situation is one where the actions of a single individual cannot be construed as being on behalf of both. Experiment 1 showed that the conjunction cost is absent in the first case, and present in the second case, as predicted. Experiment 2 showed that this pattern is present in the judgments of plausibility, which we take as being indicative of felicity. A second issue we addressed is whether an appropriate cue might facilitate the transition from a plural to a singular thematic subject. Our argument is that the conjunction effect comes about typically because a shift from a plural thematic subject to a singular one is not properly motivated or accommodated. In Experiment 3, we showed that using a proper name in the place of a singular pronoun is an advantage when the action is number-sensitive, and suggest that this is because the use of proper names serves to indicate a shift in thematic subject, and so facilitates the shift. What then is the conjunction cost? We suggest that it is a complex phenomenon. Certainly, eye-tracking evidence suggests that there is a small conjunction cost quite early on in processing, shortly after the relevant pronoun is encountered.
If text is set up with two conjoined individuals, and a singular pronoun is introduced before the action in which the character it denotes is encountered, then there is a small, syntactically based conjunction cost (Albrecht and Clifton, 1998; Moxey, Sanford, Sturt, and Morrow, 2004; Sanford, Sturt, Moxey, Morrow, and Emmott, 2004). The effects we have demonstrated show another factor at work, that of shift in thematic subject. Being number-indifferent only becomes obvious at the predicate of the critical sentence, and so can only reduce or eliminate the conjunction cost after the predicate has been read, not before. The global absence of a conjunction cost in the number-indifferent condition of Experiment 1 should therefore be seen as an accommodation and swamping of this early effect. The effects we obtained are, we believe, related to the granularity of representations resulting from an interpretation of the materials. Consider (12): (12)
John and Mary went for a meal.
We assume that when this sentence is interpreted, John and Mary are mapped into a common role in an elicited background-knowledge structure (Sanford and Moxey, 1995). But this means that the level of granularity afforded the representation can only be at a level where it is possible for them both to be in the same role. This might seem very straightforward in a case like (12), where the background situation is unlikely to be much more than having a goal of getting to a restaurant. It is more obviously an issue with a case like (13): (13)
John and Mary drove to the station.
If they are mapped into a common role, it simply cannot be one that specifies that they both sat in the driving seat, both worked the pedals and the steering wheel, both kept the car on track, and both watched the road and the rearview mirror to control the trajectory of the vehicle. The interpretation, we suggest, is at a cruder level of grain, where they both used a car, with one of them in control (unspecified), to get to a destination. Further actions will be interpreted against these scenarios, unless that cannot be done. So, if after (12), the action is He asked the waiter for a table, this will easily be accommodated as meeting a goal of both protagonists, and so is the equivalent of both of them asking for a table (i.e., both individuals are effectively mapped into the role of asking for a table). Similarly, if after (13), the action is He parked the car near the concourse, this action again realizes something that is part of both driving, and so by default leads to both protagonists being mapped into the role of agent in parking. The new actions thus preserve previously established role-mappings. Experiment 4, in which we used text-change detection, showed a reduced capacity to detect a change from they to he/she when the action was number-indifferent, showing that on some occasions at least, an action by one individual led to a representation that was not different from the representation for both individuals carrying out the action. The argument put forward above is the most extreme version of role-mapping, in that the idea is that he/she and they can be completely interchangeable. In practice, we believe that they probably are not, and that some differentiation may occur, such that when a singular pronoun is used, there will always be a mapping into the fact that the protagonist was male or female and singular. But at the same time, both individuals will be mapped into the principal scenario.
For this reason, the effect in Experiment 3 is not absolute, although it is perfectly reliable.
Summary
• Conjoining individuals into complex NPs through the use of a conjunction leads to a preference for reference to the individuals through a plural pronoun (i.e., to both) rather than a singular pronoun (i.e., to just one of them) under many conditions.
• The preference shows itself in longer reading times to sentences with singular pronominal grammatical subjects rather than plural ones. This is the Conjunction Cost.
• Part of the conjunction cost results from a shift in thematic subject from a plurality to a singular. If such a shift is not motivated by a cue in the discourse, then it constitutes an infelicitous change, and causes a processing difficulty.
• A singular pronoun can be used felicitously if the action carried out by the person singled out by the pronoun can be construed as being “on behalf of both”.
• If it is so construed, then at the level of granularity of the representations involved in understanding the discourse, there is effectively no discrimination between one person doing the action and both doing the action.
This research is intended to add to our understanding of plural reference, specifically with respect to instances when individuals are effectively individuated and when they are not.
This research was supported in part by grants HD-18708 from NIH to the University of Massachusetts. The third and fourth authors would also like to acknowledge support from the ESRC through grants R000223622, R000223497, and R000239888. Portions of these data were presented at AMLaP, Leiden, The Netherlands, September 2000.
NOTES
1. This is a simplification. In their theory, Sanford and Moxey (1995) argue that different mappings can occur even if different individuals are mapped into the same main role, if background knowledge permits other mappings into different roles too. For instance, in our examples with John and Mary, John, being a male-name, carries a mapping into knowledge structures tagging this name as male; with Mary the mapping is to knowledge about female names. The balance between same-role mappings and different role mappings for any pair of individuals determines how differentiated the individuals are in the mental representation, and hence how appropriate a plural or a singular referential pronoun is to refer to one or both individuals.
2. The materials are available from the authors on request.
3. We thank Lyn Frazier for this suggestion.
REFERENCES
Albrecht, J., and Clifton, C., Jr. (1998). Accessing singular antecedents in conjoined phrases. Memory and Cognition, 26(3), 599–610.
Cohen, J. D., MacWhinney, B., Flatt, M., and Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychological experiments. Behavior Research Methods, Instruments, and Computers, 25, 257–271.
Eschenbach, C., Habel, C., Herweg, M., and Rehkamper, K. (1989). Remarks on plural anaphora. In Proceedings of 4th European Chapter of the Association for Computational Linguistics (pp. 161–167). University of Manchester.
Ferreira, F., and Clifton, C., Jr. (1986). The independence of syntactic processing. Journal of Memory and Language, 25, 348–368.
Ferreira, F., Ferraro, V., and Bailey, K. G. D. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11, 11–15.
Garrod, S., and Sanford, A. (1982). The mental representation of discourse in a focused memory system: Implications for the interpretation of anaphoric noun phrases. Journal of Semantics, 1, 21–41.
Gordon, P. C., Grosz, B. J., and Gilliom, L. A. (1993). Pronouns, nouns, and the centering of attention in discourse. Cognitive Science, 17, 311–349.
Kaup, B., Kelter, S., and Habel, C. (2002). Representing referents of plural expressions and resolving plural anaphors. Language and Cognitive Processes, 17, 405–450.
Koh, S., and Clifton, C., Jr. (2002). Resolution of the antecedent of a plural pronoun: Ontological categories and predicate symmetry. Journal of Memory and Language, 46, 830–844.
Moxey, L. M., Sanford, A. J., Sturt, P., and Morrow, L. (2004). Constraints on the formation of plural reference objects: The influence of role, conjunction, and type of description. Journal of Memory and Language, 51, 346–364.
Pollatsek, A., and Well, A. D. (1995). On the use of counterbalanced designs in cognitive research: A suggestion for a better and more powerful analysis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 785–794.
Sanford, A. J., and Garrod, S. (1998). The role of scenario mapping in text comprehension. Discourse Processes, 26, 159–190.
Sanford, A. J., and Moxey, L. (1995). Notes on plural reference and the scenario-mapping principle in comprehension. In G. Rickheit and C. Habel (Eds.), Focus and coherence in discourse processing (pp. 18–34). New York: Walter de Gruyter.
Sanford, A. J., and Sturt, P. (2002). Depth of processing in language: Not noticing the evidence. Trends in Cognitive Sciences, 6, 382–386.
Sanford, A. J., Sturt, P., Moxey, L. M., Morrow, L., and Emmott, C. (2004). Production and comprehension measures in assessing plural object formation. In M. Carreiras and C. Clifton (Eds.), The On-line Study of Sentence Comprehension: Eye-tracking, ERP, and Beyond. Brighton: Psychology Press.
Sanford, Alison, Bohan, J., Sanford, A. J., and Molle, J. (2003). Detecting changes as a function of load and extent of change: What’s the mechanism? Poster at the conference Architectures and Mechanisms of Language Processing (AMLaP), Glasgow, August 25–27.
Sturt, P., Sanford, A. J., Stewart, A., and Dawydiak, E. J. (2004). Linguistic focus and good-enough representations: An application of the change-detection paradigm. Psychonomic Bulletin and Review, 11, 882–888.
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994). Semantic influences on parsing: Use of thematic role information in syntactic disambiguation. Journal of Memory and Language, 33, 285–318.
Vonk, W., Hustinx, L. G. M. M., and Simons, W. H. G. (1992). The use of referential expressions in structuring discourse. Language and Cognitive Processes, 7, 301–333.
IV
HOW DO WE SELECT FORMS OF REFERRING EXPRESSION?
7
The Overlapping Distribution of Personal and Demonstrative Pronouns
Donna K. Byron, Sarah Brown-Schmidt, Michael K. Tanenhaus
1. Introduction
When a speaker wishes to refer to a particular entity in the world, a variety of different forms can be utilized for the referring expression. A speaker might refer to a particular building, for example, with the expression “my house,” “42 Main Street,” “the house next door,” or “it,” depending on the situation. Although various expressions of different forms can all refer to the same referent, at a specific point in any given discourse, the speaker is not completely at liberty to employ any alternative he chooses. Using the wrong form at the wrong time can make a sentence confusing or even incomprehensible (for a review, see Garnham (2001)). The distinction between descriptive NPs and reduced forms such as pronouns has been accounted for by various models in terms of the cognitive status, accessibility, or information status of the NP’s referent (Prince (1981), Garrod and Sanford (1982), Ariel (1990), Gundel et al. (1993)). While descriptive phrases can introduce new referents into a discourse, reduced forms are used to refer to items that are recoverable from the previous discourse or the discourse setting. Descriptive phrases can re-mention a referent that has receded in the discourse participants’ memory, while pronouns, due to their limited ability to convey descriptive details about their referents, refer to an item with high attentional prominence at the point in the discourse where the pronoun appears.
The distribution of different pronominal forms is affected by felicity conditions that have yet to be completely understood. In English, for example, both personal pronouns such as it and them, and demonstrative pronouns such as that/these, can be used to refer to entities with neuter gender. Grammatically, these two types of pronouns differ based on animacy¹ and numeric agreement features (Channon (1980)). However, in most cases these variations do not explain the distribution of the two pronoun types in naturally occurring discourse. Thus, their usage must be explained by other pragmatic factors. In several previous studies, their distribution has been characterized as a distinction along the dimension of attentional salience: personal pronouns are used to refer to items that are currently topical or at the center of attention, while demonstrative pronouns can refer to other attentionally prominent items that are not among the most salient items (Linde (1979), Schuster (1988), Passonneau (1985), Gundel et al. (1993)). Although this distinction serves as a good general guideline, it does not fully account for the distribution of these two types of pronouns found in naturally occurring discourse. Personal pronouns can sometimes be used to refer to items that are difficult to characterize as having high attentional salience, and demonstrative pronouns sometimes refer to items that would be considered topical by most criteria. Although some previous studies have acknowledged that the line between personal and demonstrative pronouns is blurry (Linde (1979), Passonneau (1985), Gundel et al. (1993)), no in-depth analysis of data in the boundary area has been offered in the literature. Our motivation for exploring this topic is based on the need for robust automated processes for interpreting pronouns.
Both personal and demonstrative pronouns occur frequently in spoken dialogue, and therefore the usage of these two different types of pronouns must be understood in more detail to support automated systems that comprehend and produce spoken language. Toward that end, this chapter presents an analysis of the data collected in two different experiments that investigated the meanings assigned to personal and demonstrative pronouns in task-oriented discourse. We will present data that diverges from the findings of previous studies and discuss the properties that, we believe, allowed these cases to be inconsistent with the typical usage patterns found in previous studies. This analysis leads us to conclude that a variety of factors interact to determine which pronominal form is appropriate to use for a particular referent at a particular point in a discourse. The attentional prominence of the referent is only one property among many that drive a speaker’s choice of surface form.
2. Related Background
2.1 Pronouns and Attentional Salience
Often, when a pronoun is encountered in discourse, there are multiple items evoked by the preceding discourse that the pronoun could potentially refer to,
yet in many cases people do not find the pronoun to be ambiguous. This phenomenon is said to occur because certain elements of the event described in a sentence are more attentionally prominent than others. They are brought into focus by either the actions described in the sentence, by their appearance in particular syntactic positions, by prosodic marking, and so on. The more prominent entity is selected as the referent for a pronoun unless its semantic properties are incompatible with the predicative context of the pronoun (Sidner (1983) ). For example, in discourse (1) from Sidner (pg. 373), the pronoun He in sentence (b) can refer to either the dog or the bull, but the dog is chosen because, Sidner claims, it is more attentionally prominent based on sentence (a). (1)
(a) Pam walked her dog near a bull one day. (b) He trotted quietly along.
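The selection principle just described (take the most prominent entity whose properties are compatible with the pronoun) can be sketched for discourse (1) as follows. This is a minimal illustration under stated assumptions: prominence is approximated here by a subject > object > oblique ordering of grammatical roles, compatibility is reduced to gender and number agreement, and the class, table, and feature labels are all hypothetical rather than part of any published algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed prominence ordering: a lower number means more prominent.
ROLE_RANK = {"subject": 0, "object": 1, "oblique": 2}

# Assumed agreement features for a few English pronouns: (gender, number).
PRONOUN_FEATURES = {
    "he": ("masc", "sg"),
    "she": ("fem", "sg"),
    "it": ("neut", "sg"),
}


@dataclass
class Entity:
    name: str
    role: str    # grammatical role in the preceding sentence
    gender: str  # "masc", "fem", or "neut" (hypothetical labels)
    number: str  # "sg" or "pl"


def resolve(pronoun: str, entities: List[Entity]) -> Optional[Entity]:
    """Return the most prominent entity that agrees with the pronoun, if any."""
    gender, number = PRONOUN_FEATURES[pronoun.lower()]
    compatible = [e for e in entities if e.gender == gender and e.number == number]
    if not compatible:
        return None
    return min(compatible, key=lambda e: ROLE_RANK[e.role])


# Entities evoked by sentence (1a); treating the dog as masculine follows the
# reading of the example in which He can refer to either animal.
sentence_1a = [
    Entity("Pam", "subject", "fem", "sg"),
    Entity("dog", "object", "masc", "sg"),
    Entity("bull", "oblique", "masc", "sg"),
]
referent = resolve("He", sentence_1a)
print(referent.name if referent else None)  # prints: dog
```

Pam is the highest-ranked entity but is filtered out by agreement, so the pronoun falls to the next most prominent compatible entity, the dog, in line with the interpretation reported for (1).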
Computational linguists working to find automated methods of pronoun resolution put this observation into use by creating algorithms that judge which item is at the center of attention at each point in a discourse. The one most prominent item in a sentence cannot be determined by looking at the sentence in isolation. Rather, it is commonly calculated by taking sequences of sentences into account. As Sidner (1983, 366) explains “Focusing is a discourse phenomenon rather than one of single sentences.” Beaver (2004) defines the most prominent item as “the entity referred to in both the current and the previous sentence, such that the relevant referring expression in the previous sentence was minimally oblique,” where obliqueness is determined by grammatical role. Beaver adopts the term topic for this entity, which is taken to be the preferred referent of a personal pronoun appearing in the current sentence. When there is no entity mentioned in a sentence that was also in the previous sentence, the sentence has no topic. Therefore, the first sentence of a discourse has no topic by definition. Because grammatical role is used to determine the topic, this definition allows only entities mentioned in the sentence as base noun phrases.2 Using grammatical role ranking to estimate the relative salience of items in a sentence is a common feature among algorithms that need to determine which items in a sentence should be judged to be most salient. The preference ordering often used for pronoun resolution in English is determined by the surface form alone and applies only to base noun phrases. For example, Baldwin’s cogNIAC system (1997), the S-list algorithm by Strube (1998), and the influential series of pronoun resolution studies inspired by the Centering Framework (Grosz et al. (1995), Brennan et al. 
(1987), Tetreault (2001) ), all share the common trait of using surface linguistic properties such as grammatical role or NP form to sort base noun phrases into a list of potential antecedents ordered by salience. A common technique for resolving personal pronouns is to associate the pronoun with the most salient item in this list that matches its agreement features. Each of the algorithms listed here is reasonably successful at determining the meanings of personal pronouns
when they co-refer with a base noun phrase, but cannot suggest meanings for pronouns that do not have a base noun phrase antecedent. This limitation has only a small impact on performance when the language to be analyzed is formal written text,3 but poses more of a problem for spoken discourse. Unlike personal pronouns, demonstrative pronouns tend to refer to entities other than the expected topic. Demonstrative pronouns are under-studied compared to personal pronouns, perhaps because they occur infrequently in written discourse. For example in a collection of Wall Street Journal articles annotated by Ge et al. (1998), only 2% of the pronouns are demonstratives. However, demonstrative pronouns occur much more frequently in spoken discourse. For example, in the TRAINS93 corpus of problem-solving dialogues, demonstrative pronouns occur in equal proportion to personal pronouns. In the Switchboard dialogues, Strube and Muller (2003) calculated that roughly 25% of the third-person pronouns are demonstratives. Therefore, our ability to develop pronoun resolution software for dialogue-based systems depends on accurate models for both demonstrative and personal pronouns. Previous studies exploring demonstrative pronouns have found very consistent usage patterns. Linde (1979) studied demonstrative pronouns in a corpus of apartment descriptions, and found that it was preferred for entities within the current local focus, while that was typically used for items outside the current focus of attention. However, she did note that a small number of pronouns in her study violated this pattern, and therefore she was not able to account for the entire distribution of pronouns in her corpus in a precise way. A similar pattern was found in a corpus of career counseling interviews (Schiffman (1985), Passonneau (1989) ). 
Based on this corpus, Passonneau formulated the distinction between personal and demonstrative pronouns in syntactic terms:
• If both the pronoun and the antecedent are in subject position, a personal pronoun⁴ is highly preferred, while if either the antecedent or the pronoun is not in subject position, a demonstrative pronoun is preferred.
• If the antecedent expression was more noun-like, the subsequent pronoun chosen tends to be personal, whereas if it was more clause-like, a demonstrative pronoun is preferred. The further the form of the antecedent was from being a simple NP, the more likely it was to be pronominalized as a demonstrative. When the antecedent was a pronoun (so-called pronoun chains), the pronoun chosen was far more likely to be a personal pronoun, as long as it was in subject position.
Both Linde’s and Passonneau’s studies involved post-hoc analysis of discourse transcripts, in which the analysis relies on third-party observers, who were not party to the original discourse, to assign meanings to the pronouns in the corpus. Such corpus-based studies are not able to study the alternation of demonstrative and personal pronouns occurring in an identical prior context.
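The two syntactic preferences attributed to Passonneau above can be written down as a small lookup that reports, for each cue separately, which pronoun type it favours. This is only a sketch: the function name, argument names, and the three-way coding of antecedent form are hypothetical simplifications, and the cues are deliberately not combined into a single decision, since the text describes them as tendencies rather than a procedure.

```python
from typing import Dict, Optional


def cue_preferences(antecedent_is_subject: bool,
                    pronoun_is_subject: bool,
                    antecedent_form: str) -> Dict[str, Optional[str]]:
    """Report which pronoun type each cue favours.

    antecedent_form is one of "pronoun", "np", or "clause" (an assumed coarse
    coding of the noun-like to clause-like scale described in the text).
    """
    if antecedent_form not in ("pronoun", "np", "clause"):
        raise ValueError(f"unknown antecedent form: {antecedent_form}")

    # Cue 1: grammatical position of antecedent and pronoun.
    if antecedent_is_subject and pronoun_is_subject:
        position = "personal"
    else:
        position = "demonstrative"

    # Cue 2: form of the antecedent expression.
    if antecedent_form == "np":
        form = "personal"
    elif antecedent_form == "clause":
        form = "demonstrative"
    else:
        # Pronoun chains favour a personal pronoun when it is in subject
        # position; the text states no preference otherwise.
        form = "personal" if pronoun_is_subject else None

    return {"position": position, "form": form}


print(cue_preferences(True, True, "pronoun"))
print(cue_preferences(False, True, "clause"))
```

Keeping the cues separate makes it easy to see cases where they agree and cases where they pull in different directions, which is where a corpus analysis would have to say how the conflict is resolved.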
In contrast, Schuster (1988) created a survey in which she presented participants with different versions of a sentence pair where only the form of the pronoun was altered. An example of her stimuli is given in discourse (2): (2)
(a) John thought about {becoming a street person}ᵢ. (b(1)) Itᵢ would hurt his mother and itᵢ would make his father furious. (b(2)) {Itᵢ would hurt his mother}ⱼ and thatⱼ would make his father furious.
Her informants interpreted the pronoun in the second conjunct of the (b) sentence differently depending on its form. Use of the personal pronoun it in option b(1) maintains the reference established by the first it (John’s becoming a street person). In contrast, use of the demonstrative pronoun (alternative b(2)) changes the interpretation to something like (John’s mother being hurt would make his father furious). Notice that the subject of the first conjunct in the (b) sentence is a pronoun which refers to (John’s becoming a street person). This establishes entity i as the topic before the critical point of pronoun alternation, making that entity less compatible with the demonstrative pronoun. As Passonneau had observed, demonstratives tend not to refer to items in subject position, and many authors have noted that demonstratives rarely co-specify an item that has just been mentioned via a personal pronoun. Schuster’s test sentences therefore force the reader to find an alternative interpretation for the demonstrative pronoun in the second conjunct, and her findings apply only to an entity that has been established as the discourse focus through use of a personal pronoun. They should not be interpreted to mean that the most salient entity of any sentence cannot be re-mentioned in the next sentence with a demonstrative pronoun. There is some variation in how strongly the predicted topic from a particular sentence is focused, and in some constructions the predicted topic can be pronominalized as either a personal or demonstrative pronoun, as the discourses that follow here demonstrate. In (3), the demonstrative works fairly well to re-mention the chair, which is not the most salient item in sentence (3-a) according to grammatical role rankings. In (4), the chair is the most salient item in sentence (4-a) in terms of grammatical role, yet referring to it with a demonstrative pronoun in sentence (4-b) is still felicitous.
Although the chair is the most salient item in sentence (4-a), it is not in subject position. In (5), after the chair has been repeated and has thereby become the topic in sentence (5-b), using the demonstrative pronoun to refer to it in sentence (5-c) is awkward but the sentence is still interpretable. Finally, in sequence (6), after the chair has been mentioned in subject position and the personal pronoun has been used to refer to it, it is no longer felicitous to refer to it with a demonstrative pronoun in (6-c).
(3)
(a) John went to pick up his favorite chair from the upholsterer’s.
(b) That’s what he was giving his nephew for a wedding present.
(4)
(a) Load the wingback chair into the car.
(b) That’s what I’m giving John for his wedding present.
(5)
(a) John went to pick up his favorite chair from the upholsterer’s.
(b) This upgrade was long overdue, since the chair was very old.
(c) ?That’s what he was giving his nephew for a wedding present.
(6)
(a) John went to pick up his favorite chair from the upholsterer’s.
(b) It was long overdue for an upgrade, since it was very old.
(c) ?/*That’s what he was giving his nephew for a wedding present.
Passonneau captured this preference in her definition of local center (Passonneau (1993), ∼204). The local center is stronger than the Cb and is determined by the local center establishment rule: “Two utterances U1 and U2 that are adjacent in their segment establish an entity E as a local center only if U1 contains a third person, singular, non-demonstrative pronoun N1 referring to E, U2 contains a co-specifying third person, singular, non-demonstrative pronoun N2, and N1 and N2 are both subjects or both non-subjects, in that order of preference.” This rule captures the cohesive effect of using a personal pronoun for the same item in two successive utterances—there is an extremely high preference for the second it to co-refer to the same entity. This distinction between different strengths of salience plays an important role in the distribution of personal and demonstrative pronouns. In the remainder of this chapter, we use Beaver’s definition for topics, and Passonneau’s term local center for a topic that is a personal pronoun, and the term syntactic focus to mean the item from a particular sentence that would be highest ranked using an obliqueness ranking on the arguments of the main verb. Schuster’s study found that the demonstrative pronoun is incompatible with a local center because the entity (John’s becoming a street person) is repeated in the second sentence, is in subject position, and is a pronoun. Whether demonstrative pronouns are incompatible with other types of foci will be explored in our results. In addition to their use to access less salient items, Channon (1980) suggested that demonstrative pronouns can also be used to refer to complex, heterogeneous referents that a speaker wishes to refer to, when numeric agreement is difficult to calculate. Consider the example in (7), in which a demonstrative pronoun is used to refer to the speaker’s own hamburger and coke. 
Using a plural pronoun for this entity doesn’t seem quite right, and a singular personal pronoun is definitely infelicitous. However, use of a demonstrative pronoun allows the speaker to hedge the agreement features for the composite entity, hamburger and coke.
(7)
Larry had a hamburger and a coke, and I had that, too.
However, it should be noted that Channon’s example could also be explained in terms of salience. The combination of the hamburger + coke might be less salient than either the hamburger or the coke taken individually; and that could explain why the demonstrative that is more appropriate than unstressed it. In the computational linguistics literature, demonstrative pronouns have been explored primarily for their ability to perform discourse deixis: reference to the meaning of non-NP constituents. Consider example (8), inspired by Webber (1988):
(8)
(a) Each Fall, penguins migrate to Fiji.
(b(1)) That’s where they wait out the winter.
(b(2)) That’s when it’s too cold even for them.
(b(3)) That’s why I’m going there next month.
Demonstrative pronouns often appear in copular constructions such as these. In copular constructions, a demonstrative pronoun in the subject serves as a placeholder for a high-order referent in the predicate complement. Since the demonstrative pronoun has a strong affinity with high-order referents such as propositions, events, and situations, and in most cases there is not a default interpretation for the demonstrative pronoun in a given context, its meaning must be constrained through predication. This predication is often provided through copular constructions or verb selectional restrictions.
2.2 The Givenness Hierarchy
The Givenness Hierarchy (henceforth GH) formally defines the distinctions between a number of referring expression forms (Gundel et al. (1993)), including formalizing the distinction between demonstrative and personal pronouns, which was characterized less precisely by the studies described earlier. The GH has been found to account for the distinction between personal and demonstrative pronominal forms in a number of languages, and has been successfully incorporated into several algorithms for pronoun resolution (Eckert and Strube (2001), Byron (2002a), Strube and Muller (2003)). The GH partitions all possible referents into six categories (shown in table 7.1) according to the attentional status that the referent is assumed to have in the memory of the addressee at the time the referring expression is spoken. This hierarchy is different from others (e.g., Givon (1983), Ariel (1990)) that considered statuses to be mutually exclusive. In the GH, each status entails all lower statuses (statuses to the right). Therefore, the set of focused entities is a subset of the set of activated entities, and all entities are ultimately included in the type-identifiable status. The authors claim that an entity’s classification in one of these categories is both a necessary and sufficient condition for referring to it appropriately with the forms listed under that category. Therefore, the form used for reference can be taken by the addressee to be a signal indicating where in his or her prior knowledge to search for the referent.

TABLE 7.1. The Givenness Hierarchy
Status:  In Focus      >  Activated          >  Familiar  >  Uniquely Identifiable  >  Referential              >  Type Identifiable
Form:    it/them/they     this/that, this N     that N       the N                     this N (indefinite use)     a N

In the GH, the in focus status is characterized as including entities that are likely to be continued as the topic in the next utterance, for example “subjects and direct objects of matrix sentences” (Gundel et al. (1993), 279). The GH predicts that unstressed personal pronouns will only be used to refer to in-focus items, while demonstrative pronouns can refer to a broader category of entities, called activated entities. Activated entities are those that are estimated to be currently in the short-term memory of the addressee, having been evoked into short-term memory either by the discourse or by the physical setting. Because the in-focus entities are a subset of the activated entities, this model predicts that demonstrative pronouns will sometimes be used to refer to in focus entities, although, according to Grice’s maxim of Quantity (Q1), speakers should use the highest form that obtains for the referent. Therefore, if an entity is in focus, a personal pronoun is more appropriate than a demonstrative in most cases. The in focus status is necessary for felicitous reference by personal pronouns. However, the authors of the GH do not commit to any particular method of determining which entities are in focus. They state that in focus status is appropriate for the local discourse focus as well as “still-relevant higher-order topics” (Gundel et al. (1993), 279). As we mentioned above in section 2.1, many computational models utilize surface linguistic features such as grammatical role to determine which items are considered in focus, but the authors of the GH point out that inclusion in the in-focus set depends ultimately on pragmatic factors, and is not always uniquely determinable from the syntax (pg. 280)5.
The GH does not specify a process for determining whether a particular referent is in focus at a particular point in a discourse. Rather, this is left as a matter for further research. For our discussion, the important points of this model are as follows: (1) Unstressed personal pronouns are said to refer only to items in focus, but determining which items are in focus is still an open question. (2) Demonstrative pronouns refer to items that are activated, either in focus or not, although they are preferentially used for items not in focus.
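The entailment relation among statuses, and the Quantity-based preference for the highest applicable form, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the names Status, REQUIRED_STATUS, licensed_forms, and preferred_forms are hypothetical, the form inventory is abbreviated from table 7.1 (it stands in for it/them/they), and nothing here is taken from the GH authors' own formalization.

```python
from enum import IntEnum


class Status(IntEnum):
    """Cognitive statuses; a higher value entails every lower one."""
    TYPE_IDENTIFIABLE = 1
    REFERENTIAL = 2
    UNIQUELY_IDENTIFIABLE = 3
    FAMILIAR = 4
    ACTIVATED = 5
    IN_FOCUS = 6


# Minimum status each form requires (abbreviated from table 7.1).
REQUIRED_STATUS = {
    "it": Status.IN_FOCUS,
    "this": Status.ACTIVATED,
    "that": Status.ACTIVATED,
    "this N": Status.ACTIVATED,
    "that N": Status.FAMILIAR,
    "the N": Status.UNIQUELY_IDENTIFIABLE,
    "indefinite this N": Status.REFERENTIAL,
    "a N": Status.TYPE_IDENTIFIABLE,
}


def licensed_forms(status):
    """Forms whose required status is entailed by the referent's status."""
    return [form for form, needed in REQUIRED_STATUS.items() if status >= needed]


def preferred_forms(status):
    """Licensed forms that require the highest status that obtains (Q1)."""
    forms = licensed_forms(status)
    best = max(REQUIRED_STATUS[form] for form in forms)
    return [form for form in forms if REQUIRED_STATUS[form] == best]


print(licensed_forms(Status.IN_FOCUS))   # includes both "it" and "that"
print(preferred_forms(Status.IN_FOCUS))  # only the personal pronoun
print(preferred_forms(Status.ACTIVATED)) # the demonstrative forms
```

Under this encoding an in-focus referent licenses a demonstrative as well as a personal pronoun, while the preference function selects the personal pronoun, which mirrors points (1) and (2) above. How a referent comes to have a given status is deliberately left outside the sketch.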
The formulation of this model leaves open two intriguing questions. First, the model does not explain the conditions that must obtain for a demonstrative pronoun to refer to in focus entities. It is allowed in the model, but other authors, as already described, have found that demonstrative pronouns tend not to be used for focused entities. What parameters influence the speaker’s choice of pronominal form when a focused entity is re-mentioned? Second, what non-syntactic factors affect salience, and do these factors account for the instances we see of personal pronouns referring to an entity that is not in focus according to the surface structure? These questions were examined in two related experiments, described later.
3. Pronouns in Problem-Solving Dialogue
Conversational systems that interact in natural dialogue have been under development within the computer science community for some time. Building algorithms to interpret pronouns in these systems has proved quite challenging. In contrast to written discourse, spoken discourse contains many more demonstrative pronouns, and many pronouns whose referent was not previously mentioned as a base noun phrase. Both of these characteristics of spoken dialogue create problems for language-understanding software. For example, Eckert and Strube (2001) note that 22% of pronouns in a set of Switchboard dialogues had other-than-noun-phrase antecedents, and 33% had no discernible antecedent. Byron and Allen (1998) report that only 50% of pronouns in the TRAINS93 corpus have a noun phrase antecedent, and Botley (1996) found 20% of pronouns in his corpus to have no noun phrase antecedent. Nonetheless, due to the density of pronouns in problem-solving dialogue, creating a robust pronoun resolution technique is vital for computer applications to be able to converse in this or a similar domain of collaborative problem solving.
This section investigates the demonstrative and personal pronouns in a set of problem-solving dialogues from the TRAINS93 spoken dialogue corpus (Heeman and Allen (1995)). In each TRAINS93 dialogue, two human participants, who were separated by a partition (to prevent their sharing a visual context), collaborate to solve a problem in a transportation logistics domain. Byron (2003) annotated the referent of each personal and demonstrative pronoun for a set of TRAINS93 dialogues. Section 3.1 examines how well this data fit the predictions of the GH and other previous research. Then, section 3.2 describes the development of an automated algorithm to interpret these pronouns, and what the success of the algorithm reveals about personal and demonstrative pronouns in this corpus.
Finally, section 3.3 describes the cases for which the algorithm failed to match the pronoun with the correct referent, and what these cases reveal about the assumptions it incorporates from the prior studies described earlier.
3.1 Pronouns in the TRAINS93 Corpus
A quick glance at the TRAINS93 dialogues reveals that these dialogues abound with demonstrative pronouns (see example dialogue fragments quoted in the following pages for samples of the corpus). In fact, there are roughly an equal number of demonstrative and personal pronouns in the corpus. Therefore, it is unlikely that a conversational system could interact in this domain without successfully interpreting and producing demonstrative pronouns. This makes these dialogues a perfect vehicle for comparing and contrasting the usage of personal and demonstrative pronouns using the predictions of the GH model.
In these dialogues, a pair of participants collaborate to build a plan to deliver cargo to specific cities via an imaginary set of trains. The assigned task often includes constraints such as delivering the cargo within a specified time limit or avoiding travel through certain cities. Therefore, in these dialogues, the participants discuss not only individual items in the task world such as cities and train cars but also actions that they might perform, the effects of those actions on the resulting schedule, and characteristics of the evolving plan. Such entities are high-order semantic objects, as opposed to individual referents. Whereas individual referents are associated with variables in a logical representation, high-order entities express some relation between individuals or collections of individuals (Asher (1993)). For example, high-order entities express a property that holds for some individual at a spatio-temporal location, an event in which the individuals were involved, a fact or proposition, and so on. High-order entities in the TRAINS93 dialogues include situations, propositions, actions, facts, and so on. Discourses (9) through (12) show representative examples of pronouns with high-order referents, taken from the TRAINS93 corpus.6 These examples illustrate the versatility of pronominal referring expressions.
As these examples show, reference to high-order entities is often performed by demonstrative pronouns, but can also be performed using personal pronouns.
(9)
Reference to an event type [d92a–1.4]
utt73: s: oh let me just check that we don’t have two trains uh trying to cross each other on the same track
utt74: u: okay
utt75: s: um but I don’t think that’s happening
(that = the event type of two trains crossing each other on the same track)
The discourse fragment (9) is an example of pronominal reference to a high-order entity. The that in utterance 75 refers to the event of two trains crossing each other on the same track, which was not previously mentioned in a noun phrase. Rather, it was introduced in the sentential complement in utterance 73. The reference is not to a particular event, but rather to the kind of event in general.
(10)
Reference to an action type [d93–14.2]
utt37: u: how long does it take to convert the oranges into juice?
utt38: s: it takes one hour
(it = the action of turning oranges into orange juice)
Pronouns can also refer to kinds of actions. In Example (10), it in utterance 38 refers to the action of converting oranges into orange juice, not to a specific instance of the action type. Converting oranges into orange juice was previously mentioned as an infinitive phrase complement to the verb take in utterance 37. Utterance 37 also includes an instance of a non-referential it that looks just like a pronoun, but functions merely as a syntactic placeholder so that the clausal subject can be moved to the predicate of the sentence.
(11)
Reference to a property of an action [d92a–1.3]
utt49: s: okay so that’ll take two hours to get to Corning an hour to load the oranges and two hours to get to Bath
utt50: s: so that’ll be another five hours
(that = the time required to perform the action of going to Corning, loading, and going to Bath)
Example (11) demonstrates a reference to an entity that requires slightly more reasoning to construct. The correct interpretation of the pronoun that in utterance 50 is the amount of time required to complete an action; in other words, it is an attribute of that action rather than the action itself.
(12)
Reference to a fact [d93–14.2]
utt75: u: okay and we need to pick up a boxcar of bananas in Avon
utt76: s: okay um there are boxcars that are closer to Avon if that helps any
utt77: u: um it doesn’t really matter but okay . . .
(that = it = the fact that there are closer boxcars)
In Example (12), an entire assertion introduces the concept that subsequently becomes the referent for two pronouns. The sentence “There are boxcars that are closer to Avon” introduces the fact, re-mentioned by that in utterance 76, which in turn is the antecedent for it in utterance 77. This example contains both a personal pronoun and a demonstrative used to refer to the same high-order entity. The first pronominalization of the concept is expressed as a demonstrative pronoun and subsequent mentions take the form of personal pronouns. This is a common pattern observed by several previous authors (e.g. Schuster (1988), Webber (1990), Passonneau (1985)).
Table 7.2 shows the characteristics of antecedents of the two pronoun types in this corpus, which we tabulated in order to investigate how well the corpus fit the predictions of the previous studies described in section 2.1 (see note 7). The antecedent of each pronoun was defined as the linguistic constituent that describes the same entity as the pronoun, and is the closest prior mention of that entity to the pronoun in linear order. As the table shows, more personal pronouns than demonstrative pronouns have an antecedent in subject position, and demonstrative pronouns tend not to refer to items that were previously mentioned with base noun phrases. This data, therefore, conforms to the patterns observed by other authors.
3.2 Automated Pronoun Resolution
We used the tendencies shown in table 7.2, and the well-established practice in the computational linguistics literature of using grammatical role ranking as an indication of salience, to develop a working definition of cognitive status for use in a pronoun resolution algorithm. Our operationalized definitions of each status may not conform precisely to the GH as originally envisioned by its authors, but rather represent our attempt to formulate a procedure for organizing potential referents that could be used in an automated pronoun resolution process. At each timestep, the process must first assign a cognitive status to each available referent, which results in a partial ordering of the available referents, and then, if possible, define an ordering criterion that describes a full ordering on the resulting sets. Based on GH definitions, each entity evoked by the discourse or represented on the maps or instructions provided to the subjects at the beginning of the experiment should have at least activated status, since they could be expected to be in the discourse participants’ short-term memory. Entities arising from stretches of discourse that are larger than one base noun phrase were considered to have activated but not in focus status, and one entity from each discourse unit, based on grammatical role ranking, was assigned in-focus status. The focused item is the first third-person base noun phrase in this order: Subject > Direct Object > Indirect Object > other arguments. This is a syntactic rather than a semantic judgment, and is based on the surface constituent that gives rise to the referent rather than on its meaning or the information structure of the discourse.

TABLE 7.2. Antecedents of Pronouns in the TRAINS93 Corpus
                                    All pronouns   Personal   Demonstrative
Antecedent is subject               29%            38%        11%
Antecedent is a base noun phrase    50%            73%        23%
Antecedent is a clause or larger    50%            26%        70%
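The focus-assignment step described in this section can be sketched as follows. This is a minimal illustration rather than the Phora implementation: the role labels, the dictionary keys, and the function name syntactic_focus are assumptions introduced for the example.

```python
# Grammatical roles in descending order of preference.
ROLE_RANK = ["subject", "direct_object", "indirect_object", "other"]


def syntactic_focus(noun_phrases):
    """Return the referent of the highest-ranked third-person base NP.

    Each noun phrase is a dict with the hypothetical keys "entity",
    "role", "third_person", and "base_np". Returns None when the clause
    contains no eligible noun phrase.
    """
    candidates = [np for np in noun_phrases
                  if np["third_person"] and np["base_np"]]
    if not candidates:
        return None
    # min() keeps the earliest candidate when two share a role.
    best = min(candidates, key=lambda np: ROLE_RANK.index(np["role"]))
    return best["entity"]


# "Load the wingback chair into the car." (example 4-a): the understood
# subject is second person, so the direct object becomes the focus.
clause = [
    {"entity": "addressee", "role": "subject",
     "third_person": False, "base_np": True},
    {"entity": "wingback chair", "role": "direct_object",
     "third_person": True, "base_np": True},
    {"entity": "car", "role": "other",
     "third_person": True, "base_np": True},
]
print(syntactic_focus(clause))
```

Because the choice depends only on surface role, the sketch shares the limitation noted above: it takes no account of meaning or of the information structure of the discourse.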
These simplified formulations of cognitive status were the basis of an automated pronoun resolution technique, called Phora, built as part of the TRIPS spoken dialogue system at the University of Rochester (Ferguson and Allen (1998)). The technique was evaluated using a portion of the TRAINS93 dialogues. This algorithm differs from previous implemented algorithms because we do not assume that personal pronouns refer only to in-focus entities.8 After each utterance, the algorithm partitions activated entities into those most suited for reference by personal pronouns and those most appropriate for reference by demonstrative pronouns. After each utterance, items having only activated (not focused) status are those that were either triggered by linguistic constituents larger than a base noun phrase or inferrable entities that result from domain actions. For example, squeezing oranges results in orange juice. After an orange-squeezing action is mentioned, both the squeezing event and the orange juice have activated status, whether or not the orange juice was explicitly mentioned. Only entities mentioned as base noun phrases are classified as in-focus. The algorithm addresses the problem of how to classify all of the noun phrases in a sentence, assuming only a small number of them should be labeled as in focus. There may be many entities introduced by noun phrases in a sentence, yet only a ‘small set’ of them should be considered to be in focus, according to the GH. Additionally, there is the problem of how to classify the entities introduced by noun phrases two, three, or four sentences ago. Although they are not likely as predicted topics of the next sentence, these older entities are still more likely to be re-mentioned as a personal pronoun rather than as a demonstrative pronoun. To solve these problems, this algorithm takes the extreme position that only one entity at a time, the syntactic focus, is classified as having in-focus status.
This entity has the privileged status of being the default referent for personal pronouns. To accommodate all of the other entities introduced as base noun phrases, we created a new status between focused and activated called mentioned entities. This allowed us to segregate the high-order entities in the activated set from other nonfocused entities that were the referents of noun-phrases in each sentence.9 Mentioned entities remain in the context and are available for reference throughout the entire discourse, while activated entities from clause N are deleted unless re-mentioned in clause N+1. After the system assigns an interpretation to each independent clause, the referents of all referential NP’s, 10 including proper names, descriptive NP’s, and demonstrative and 3rd-person personal, possessive, and reflexive pronouns, are initially assigned mentioned status. Then, one mentioned entity from the clause is identified as the focused entity, using a simple grammatical role ranking. However, only 27% of the pronouns in this corpus refer to the syntactic focus, so the algorithm must be allowed to consider possible referents with other statuses, and to perform an expanded search, additional criteria must be applied. The additional criteria applied by Phora are semantic type restrictions on the predicate-argument positions in which the pronoun’s referent appears.
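The status bookkeeping just described can be sketched as a small data structure. This is one possible reading, not the TRIPS code: the class DiscourseContext and its method are hypothetical, and the sketch assumes that a re-mentioned activated entity survives because the referent of the re-mentioning pronoun is itself recorded as a mentioned entity.

```python
class DiscourseContext:
    """Toy bookkeeping for focused, mentioned, and activated entities."""

    def __init__(self):
        self.mentioned = []   # one list per clause, oldest clause first
        self.focused = None   # syntactic focus of the most recent clause
        self.activated = []   # activated-only entities of the latest clause

    def end_of_clause(self, np_referents, focus, activated):
        """Update the context after one independent clause is interpreted.

        np_referents: referents of the clause's referential NPs, left to
                      right, including the referents of its pronouns.
        focus:        the one referent chosen by grammatical role, or None.
        activated:    entities evoked by constituents larger than a base
                      NP, or inferred from domain actions in this clause.
        """
        self.mentioned.append(list(np_referents))  # kept for the whole discourse
        self.focused = focus
        # Activated entities from the previous clause are dropped here.
        self.activated = list(activated)


ctx = DiscourseContext()
# Clause 1 mentions the oranges and describes a squeezing action.
ctx.end_of_clause(["the oranges"], "the oranges",
                  ["squeezing event", "orange juice"])
# Clause 2 re-mentions the juice with a pronoun; the event is not re-mentioned.
ctx.end_of_clause(["orange juice", "Avon"], "orange juice", [])
print(ctx.mentioned, ctx.focused, ctx.activated)
```

After the second clause the squeezing event is no longer available, while the orange juice persists as a mentioned entity and the oranges remain reachable from the first clause.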
Although it has long been acknowledged that semantics plays a role in pronoun resolution (Charniak (1972), Carbonell and Brown (1988)), implemented algorithms have avoided the use of semantics because developing a semantic resource that will work in the general case is an unsolved problem. However, in problem-solving discourse in a constrained domain, the knowledge that must be represented is limited, and is already encoded in a system designed to collaborate on the task. Problem-solving discourse thus provides the perfect genre for the development of algorithms that incorporate semantics into pronoun resolution. Other proposed techniques for resolving demonstrative pronouns in dialogue also rely heavily on semantics (Eckert and Strube (2003)). The algorithm also exploits the observation mentioned earlier that semantics is vital for resolving pronouns that refer to activated entities. A particular stretch of discourse might evoke several activated entities, and there is no known method for judging their level of salience relative to one another. Therefore, for a speaker to clearly indicate which activated referent he wishes to refer to, he must provide constraining clues in the semantic content of the sentence. Since demonstrative pronouns often refer to activated entities, developing an algorithm to resolve demonstrative pronouns is impossible without utilizing semantics. To select the referent for a demonstrative pronoun, more information than simply cognitive status must be employed.
In Phora, semantic type constraints for each pronoun are determined as the first step of pronoun resolution. Semantic constraints come from the following:
• Verb senses have associated semantic restrictions for each argument position (e.g., “Load them into the boxcar” produces the constraint that the theme of LOAD must be CARGO).
• Predicate NPs: Be-verbs and other near-copular constructions that are interpreted as EQUAL constrain the subject to be the same type as the complement. For example: “That’s the best route” constrains the referent of the subject pronoun to be a ROUTE.
• Predicate Adjectives: Copular constructions containing adjectives also constrain the possible semantic type of a pronoun in subject position. For example, “It’s right” is interpreted as CORRECT(X) and the argument of the predicate CORRECT() must be a PROPOSITION.
In some contexts, the pronoun’s type is left unconstrained. For example, in “That’s good” the pronoun is the argument of ACCEPTABLE(X), which can apply to any semantic type. To resolve each pronoun, the algorithm first calculates the most general semantic type that satisfies constraints from the constructions listed above. Only the remaining discourse entities that meet this semantic criterion are considered as potential candidates for the pronoun, and each pronoun type implements a unique search order. The search through entities with mentioned status works backward from the most recent clause to the beginning of the discourse. Within each clause, mentioned entities are searched in left-to-right order to find a referent that matches the pronoun’s agreement features and the semantic constraints. The first entity encountered that matches the predication context for the pronoun is the one chosen as the pronoun’s referent. This entity might be the focused entity, but it can also be a less salient entity when the predication context is incompatible with all items of higher salience.
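The semantic filter and the search through mentioned entities described above can be sketched as follows. The type hierarchy PARENT, the entity dictionaries, and the function names are hypothetical; only the type names CARGO, ROUTE, and PROPOSITION come from the text, and the sketch simplifies by applying a single constraint per pronoun.

```python
# Hypothetical fragment of a domain type hierarchy (child -> parent).
PARENT = {
    "CARGO": "PHYSICAL",
    "ENGINE": "PHYSICAL",
    "CITY": "PHYSICAL",
    "PHYSICAL": "ANY",
    "ROUTE": "ANY",
    "PROPOSITION": "ANY",
}


def is_a(sem_type, constraint):
    """True if sem_type equals the constraint or lies beneath it."""
    while sem_type is not None:
        if sem_type == constraint:
            return True
        sem_type = PARENT.get(sem_type)
    return False


def search_mentioned(clauses, constraint, number):
    """Search mentioned entities, most recent clause first.

    Within a clause, entities are tried left to right; the first one that
    matches the agreement feature and the semantic constraint is returned.
    """
    for clause in reversed(clauses):
        for entity in clause:
            if entity["number"] == number and is_a(entity["type"], constraint):
                return entity["name"]
    return None


clauses = [
    [{"name": "the oranges", "type": "CARGO", "number": "plural"},
     {"name": "Corning", "type": "CITY", "number": "singular"}],
    [{"name": "engine E1", "type": "ENGINE", "number": "singular"}],
]
# "Load them into the boxcar": the theme of LOAD must be CARGO.
print(search_mentioned(clauses, "CARGO", "plural"))
```

In this toy context the most recent clause offers only an engine, so the search falls back to the earlier clause. An unconstrained pronoun, as in “That’s good”, corresponds here to passing the top type ANY.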
Personal Pronouns Search Order
(1) Mentioned entities to the left of the pronoun in the current clause are searched in right-to-left order
(2) The Focused entity from the previous clause
(3) The remaining Mentioned entities from all clauses
(4) Activated entities from the previous clause
All activated entities have singular agreement except for Semantic Kinds, which have plural agreement.

Demonstrative Pronouns Search Order
(1) Activated entities from the previous clause
(2) The Focused entity only if it refers to a semantic Kind
(3) Mentioned discourse entities (DEs) from the entire discourse, preferring semantically heterogeneous entities such as an engine+boxcars in each clause
This/that have no numeric agreement constraint, while these/those must match plural referents.

This technique was evaluated on a set of 180 test pronouns from 10 of the TRAINS93 dialogues that had been annotated with pronoun meanings. LRC, a leading pronoun resolution algorithm described in Tetreault (2001), was used as the baseline for comparing our technique. LRC correctly resolved only 37% of the pronouns in the evaluation set. The LRC algorithm utilizes linguistically determined salience ranking and selects the first antecedent it finds that matches agreement features of the pronoun. Using Phora, the pronoun resolution precision increased from 66 correct (37%) to 130 correct (72%). However, this improvement was largely gained by resolving demonstrative pronouns, which LRC is not designed to resolve properly. Resolution of personal pronouns improved from 65% to 79%, while resolution of demonstrative pronouns improved more dramatically from 14% in the baseline to 67% with Phora. The syntactic and semantic analysis module sending input to Phora was able to automatically calculate semantic constraints from the three types of constructions listed earlier. An informal analysis shows that a majority of the pronouns that the algorithm fails to resolve correctly could be resolved correctly by developing more sophisticated techniques for determining the semantic constraints (see Byron (2002a) for a discussion).
The large number of personal pronouns in this corpus that do not refer to the syntactic focus might surprise some readers familiar with the computational pronoun resolution literature, which has shown that a large majority of personal pronouns can be resolved correctly using only grammatical role rankings or other syntactic methods for determining the expected topic. In most evaluations, pronouns are removed from the evaluation if they do not have a co-referential noun-phrase antecedent. This leaves only pronouns with co-referential noun phrases in the evaluation, and a large portion of those pronouns normally have antecedents that were the expected topic, such as the subject NP of the previous sentence. Algorithms that define their search order by preferring the syntactic focus or other preference orderings based on surface structure work well on this subset of pronouns. However, other pronouns with other types of antecedents do not get resolved properly using these techniques. Even for the personal pronouns, syntactically determined salience alone does not provide sufficient information to select the correct referent. When forced to resolve all the pronouns, the performance of traditional techniques such as LRC is somewhat lower than the levels reported from traditional evaluation procedures.
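The two search orders given in this section can be sketched as generators over a discourse context. This is an illustrative reconstruction, not Phora itself: the context layout, the helper ent, and the compatible callback (standing in for agreement and semantic type checks) are all assumptions.

```python
def ent(name, sem_type, kind=False, heterogeneous=False):
    """Build a toy discourse entity (hypothetical representation)."""
    return {"name": name, "type": sem_type,
            "kind": kind, "heterogeneous": heterogeneous}


def personal_order(ctx):
    """Candidates in the personal-pronoun search order."""
    yield from reversed(ctx["current_left"])      # (1) right to left
    if ctx["focused"] is not None:                # (2) previous focus
        yield ctx["focused"]
    for clause in reversed(ctx["mentioned"]):     # (3) mentioned entities
        yield from clause                         #     (repeats are harmless)
    yield from ctx["activated"]                   # (4) activated entities


def demonstrative_order(ctx):
    """Candidates in the demonstrative-pronoun search order."""
    yield from ctx["activated"]                               # (1)
    if ctx["focused"] is not None and ctx["focused"]["kind"]: # (2)
        yield ctx["focused"]
    for clause in reversed(ctx["mentioned"]):                 # (3)
        # Stable sort: heterogeneous entities first, otherwise left to right.
        yield from sorted(clause, key=lambda e: not e["heterogeneous"])


def resolve(pronoun_type, ctx, compatible):
    """Return the first candidate accepted by the compatibility check."""
    order = personal_order if pronoun_type == "personal" else demonstrative_order
    for entity in order(ctx):
        if compatible(entity):
            return entity["name"]
    return None


corning = ent("Corning", "CITY")
ctx = {
    "current_left": [],
    "focused": corning,
    "mentioned": [[corning, ent("the oranges", "CARGO")]],
    "activated": [ent("trip duration", "DURATION")],
}
print(resolve("demonstrative", ctx, lambda e: True))
print(resolve("personal", ctx, lambda e: True))
print(resolve("personal", ctx, lambda e: e["type"] == "DURATION"))
```

With no semantic constraint the two pronoun types pick different referents from the same context. When the constraint excludes every mentioned entity, the personal pronoun reaches the activated entity, which is the behavior that distinguishes this design from approaches restricting personal pronouns to the syntactic focus.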
3.3 Discussion of Findings
The success of Phora on this corpus of problem-solving dialogue shows that personal pronouns are not restricted to referring only to referents with in focus status based on their syntactic properties. In designing the algorithm, we chose to determine the cognitive status using only surface properties of the discourse, but this was purely an engineering decision. The GH acknowledges that other factors can lead to an entity being in focus. Our findings suggest that personal pronouns are sensitive to the relative salience of potential referents, but they can refer to an item that is not in the highest cognitive status as long as enough distinguishing semantic information is supplied by the predicative context of the pronoun to preclude its referring to higher ranked referents. This outcome is in line with the observations of the earliest discussions of pronouns and salience, which described personal pronouns as referring to the most salient item whose properties match the predication context of the referent (Sidner (1983)). This sentiment is echoed in later work by Kameyama (1998), who suggests that salience calculations provide a good preference ordering when multiple ambiguous candidates for a pronoun remain after all other possibilities have been examined. For these authors, calculating salience is not meant to be the sole process for determining the meanings of pronouns. Rather, it is a heuristic for ordering referents that remain after other processes have been applied. This is exactly what Phora does: first eliminate all entities that are incompatible with the semantics of the pronoun, then search the remaining candidates in salience order.
Another important result from this study is that the resolution accuracy for both personal pronouns and demonstratives was improved by adding semantic type filtering. This implies that both types of pronouns are sensitive to the kinds of conceptual information that are utilized to determine selectional restrictions on verbs and predicate adjectives. The data in this corpus show that personal and demonstrative pronouns do not contrast quite so strongly as one might have expected. Many personal pronouns refer to activated entities and some demonstrative pronouns refer to focused entities. Although we cannot offer any definitive explanation for this distribution, this section describes some of the factors that we believe are relevant.
3.3.1 PERSONAL PRONOUNS
The GH claims that personal pronouns are used to refer to items expected to be the topic based on the local linguistic context and also to “still active high order topics.” Our impoverished operational definitions of in focus and activated cognitive statuses defined earlier placed the referent of many personal pronouns in this corpus in the activated category, and Phora was able to associate the correct high-order referent with a few personal pronouns when sufficient constraining semantic information was present in the utterance. Many more pronouns would be resolved correctly if Phora made use of a more sophisticated method of determining the local topic, rather than simply ranking the referents of noun phrases in each utterance, such as marking the expected answer to a question as a focused item likely to be phrased as a personal pronoun. In addition to the local discourse structure, extralinguistic information arising from the problem-solving nature of these dialogues introduces other entities whose relevance to the solution under development makes them salient throughout the problem-solving session. In these problem-solving dialogues, the interaction is initiated by posing the question of how to construct a plan to achieve the desired results. Walker (1989) points out that, in dialogue, once a question is asked, it remains salient until answered. Roberts (1989) formalized this as the Question Under Discussion (QUD), and proposes that a stack of open QUDs serves to structure dialogue.
As a result, in problem-solving conversations, particular aspects of the solution that are important for determining how well the proposed solution addresses the overall goal are salient even if they were not explicitly mentioned. The particular aspects of the solution that become salient after each action depend on the task underway. If the problem assigned to the participants requires them to deliver as much cargo as possible in a specified amount of time, then the open QUD is “How much cargo can be delivered?” On the other hand, if the assigned task is to deliver a specific quantity of cargo as quickly as possible, the open QUD is “How long will it take to do the task?” For example, in the TRAINS93 domain, when the discourse participants must determine how much time the plan will take, the following exchange is typical:
(13)
(a) Send engine 1 to Elmira.
(b) So that’ll/it’ll be six p.m.
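The idea that open questions stay salient until they are answered can be illustrated with a very small data structure. The sketch below is only an illustration of the stack-of-QUDs idea described above; the class name `QUDStack`, its methods, and the sub-question string are hypothetical and are not taken from Phora or from the cited work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QUDStack:
    """Hypothetical container for open Questions Under Discussion."""
    _open: List[str] = field(default_factory=list)

    def ask(self, question: str) -> None:
        # A newly posed (or task-implied) question is pushed on top.
        self._open.append(question)

    def answer(self, question: str) -> None:
        # A question is removed only once it has been answered.
        self._open.remove(question)

    def salient(self) -> List[str]:
        # Open questions, most recently opened first.
        return list(reversed(self._open))


quds = QUDStack()
quds.ask("How long will it take to do the task?")  # implied by the assigned task
quds.ask("Where should engine 1 go?")              # assumed local sub-question
quds.answer("Where should engine 1 go?")           # "Send engine 1 to Elmira."
still_open = quds.salient()
```

After the local question is settled, the task-level question is still open, which is one way to picture why a pronoun in "So that'll/it'll be six p.m." can pick out the elapsed time even though no noun phrase introduced it.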
In the TRAINS93 dialogues, we observed that these inferred QUD entities were compatible with both the personal and the demonstrative pronoun, though they were more often mentioned with a demonstrative pronoun. These entities typically occur in highly constrained predication contexts, such as copular constructions, which severely restrict the referential possibilities of the pronoun. However, due to the fact that they were more often mentioned with demonstrative pronouns, it is unclear whether they should be considered to have activated or in focus status. Other entities that served as referents of personal pronouns in this corpus are the elements of the problem statement itself, rather than of the evolving solution. These entities could be considered part of the global topic. An example is shown in utterance 106 of example (14), where the pronoun it refers to the objective stated in the instructions given to the participants of the experiment. Entities in this category often appear in highly constraining contexts in which information provided in the utterance leads the addressee to the correct referent of the pronoun, since the local linguistic context does not provide it. (14)
A high-order topic that cannot be expressed as that [d93–12.4]
utt101 u: we’re done
utt102 s: oh but I thought we had to get <sil> the orange juice to Avon
utt103 u: no
utt104 s: oh okay
utt105 u:
utt106 s: oh I thought it was <sil> two tankers of OJ to Avon and three boxcars of bananas to Elmira
Another example, referring to the problem assigned to the participants, is shown in utterance 101 of example (15). In this case, the semantics is not sufficient to constrain the pronoun’s meaning, and the use of a personal pronoun itself is the cue that reference is being made to the global topic. In many of these cases of reference to the global topic, the referent cannot be accessed with a that. Discourse (15) provides one such example. Using a demonstrative pronoun in utterance 101 seems infelicitous. The pronoun it in this utterance refers to the problem that the participants were assigned to complete. Substituting that for it changes the meaning of the sentence, and it seems impossible to construct an alternative sentence for utterance 101 that uses a that to refer to the entity referred to by it in the original sentence. Substituting that for the personal pronoun in utterance 101 switches the meaning to the action just completed, rather than to the entire plan. This appears to be related to the demonstrative pronoun’s use as a marker of contrast.
(15)
Reference to the global topic [d93–10.4]
utt96 s: that’s six a.m. + to Dansville +
utt97 u: + and then to + Avon
utt98 s: so that’ll be nine sorry nine p.m.
utt99 u: right
utt100 s: that they’ll get there
utt101 u: and it’s done
To summarize our observations about the personal pronouns in this corpus:
• Many (65%) refer to an item predicted to be the local topic because it is the syntactic focus of the prior utterance.
• The majority of the remaining instances refer to an item that could be characterized as the high-order topic, although that is also a felicitous choice in many of these instances.
• It is the only felicitous choice for some high-order referents related to the global problem-solving task.
3.3.2 DEMONSTRATIVE PRONOUNS Based on the previous research that had found the demonstrative pronoun to be dispreferred for re-mentioning an entity already in focus, Phora employed a search order that did not allow demonstrative pronouns to refer to the in focus entity unless it was a semantic Kind. However, 10% (17) of the demonstrative pronouns in the corpus referred to the syntactic focus. In several cases, the demonstrative pronoun was even used to refer to the entity that was the pronominal subject of the previous sentence. Although the GH predicts that demonstrative pronouns can be used to mention in focus entities, it does not specify the conditions under which this happens. An informal subsequent analysis of the data reveals that demonstrative pronouns were used in this corpus to refer to the in focus entity in at least two conditions:
(1) When the previous mention of the referent was itself a demonstrative form, even a demonstrative pronoun in subject position.
(2) When the entity is a composite object that has been constructed from heterogeneous parts.
The corpus contains several instances of a topic expressed as a demonstrative pronoun when the previous mention of the same referent was a demonstrative form, either a demonstrative pronoun or a demonstrative determiner such as “those boxcars.” This may be an instance of a naming effect, where repeating the referent with the same lexical form as the one previously used seems natural. Another possibility is that using the demonstrative pronoun as the subject of a sentence does not have the same effect on topical structure
as using a personal pronoun in subject position. Passonneau’s local center establishment rule required the anaphoric mention to be a personal pronoun. Perhaps the criteria for labeling an item as the topic should also be modified to disallow demonstrative pronouns from being the topic. Some examples of this phenomenon are shown in (16) and (17). (16)
Demonstrative referring to the syntactic focus [d93–17.1]
utt18 u: are we starting from Avon?
utt19 s: we can start anywhere
utt20 u: anywhere we want
utt21 s: we just have to get <pause> that’s where the um the banana warehouse is
utt22 u: right so that’d probably be the best place to start
(17)
Demonstrative pronoun referring to the topic [d93–14.2]
utt64 u: Um go to Corning from Elmira
utt65 u: pick up the oranges and bring them back to Elmira to convert it into orange juice and get that to Avon
utt67 s: Okay, That’ll be there at noon
Out of the 17 cases of demonstrative pronouns referring to the syntactic focus, 9 cases utilize this repetition of the demonstrative form. Six of the remaining instances refer to an entity that has been conglomerated through a sequence of actions, and may no longer match the singular numeric agreement required for the personal pronoun. Because demonstrative pronouns have loose agreement features, they can be used when the speaker wishes to refer to a composite entity such as an engine plus its attached boxcars or tankers that have just been loaded with cargo. It is difficult to determine whether this conglomeration of objects is singular or plural, but the demonstrative pronoun allows the speaker to hedge the numeric agreement. However, the precise meaning of some of these pronouns is difficult to pin down in a post-hoc analysis. For example, in (17), it is difficult to determine whether the entity the speaker intended to refer to was the orange juice by itself or the entire train transporting the orange juice. The same problem is encountered in creating a post-hoc analysis of example (18). Even though the explicit antecedent for the pronoun in utterance 89 is “engine E3” in utterance 87, the meaning the speaker intended to convey may be the entire train and not just its engine.
(18)
Demonstrative pronoun referring to focused entity [d93–13.1]
utt87 s: So engine E3 to Dansville
utt88 u: Yep
utt89 s: Okay, that gets there at 3 a.m.
3.4 Summary of TRAINS93 Findings Task-oriented dialogue contains a high density of pronominal expressions, allowing us to compare and contrast the usage of personal and demonstrative pronouns in a constrained domain. The usage of pronouns in this corpus tended to follow the predictions of the Givenness Hierarchy, although we find some overlap in their use: unstressed personal pronouns can occasionally be used to refer to items with only activated status, and 10% of demonstrative pronouns refer to a focused item with linguistically based salience sufficient to license the use of a personal pronoun. Both pronouns were used to refer to the most important attributes of the evolving problem-solving state, such as the number of hours consumed by a proposed sequence of deliveries. In many cases where the pronoun is used contrary to the expected pattern, additional information is provided explicitly in the semantics of the utterance to sufficiently constrain the pronoun’s referent.
The innovative pronoun resolution algorithm described in this section uses the observations of the Givenness Hierarchy authors, as well as observations of other authors, to partition the available referents into those that might preferentially be re-mentioned with a personal pronoun and those more compatible with a demonstrative pronoun. This partitioning allows the algorithm to introduce additional high-order referents that most pronoun resolution algorithms would not attempt, but these additional referents do not degrade the performance of basic personal pronoun resolution since the additional referents are marked as having activated, but not in-focus, status. Through the addition of semantic filtering, the algorithm also allows personal pronouns to refer to high-order topics introduced by constituents other than noun phrases, which are present in the activated entity set. The technique performs far better than prior techniques, and it resolves demonstrative pronouns as well as personal pronouns.
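The two-step strategy summarized above (filter candidates by semantic compatibility, then search what remains in an order that depends on the pronoun's form) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Phora itself: the `Entity` fields, the `resolve` function, the status labels, and the fallback of demonstratives to in-focus entities are all hypothetical simplifications. In particular, the text states that Phora barred demonstratives from the in-focus entity unless it was a semantic Kind, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    name: str
    sem_type: str   # coarse semantic type, e.g. "engine" or "time" (assumed labels)
    status: str     # "in_focus" or "activated"
    salience: int   # higher means more salient within its status group


def resolve(pronoun_kind: str, required_type: str,
            entities: List[Entity]) -> Optional[Entity]:
    """Filter by semantic type, then search in a form-specific status order."""
    # Step 1: discard entities incompatible with the predication context.
    compatible = [e for e in entities if e.sem_type == required_type]
    # Step 2 (assumed orders): personal pronouns try in-focus entities first,
    # demonstratives try activated entities first.
    if pronoun_kind == "personal":
        order = ["in_focus", "activated"]
    else:
        order = ["activated", "in_focus"]
    for status in order:
        pool = [e for e in compatible if e.status == status]
        if pool:
            return max(pool, key=lambda e: e.salience)
    return None


entities = [
    Entity("engine E1", "engine", "in_focus", 2),
    Entity("Elmira", "city", "in_focus", 1),
    Entity("arrival time", "time", "activated", 1),
]
# "it'll be six p.m.": the copular predicate only fits a time-typed entity.
chosen = resolve("personal", "time", entities)
```

Here the personal pronoun reaches an activated entity because the semantic filter has already eliminated the more salient in-focus candidates, which mirrors the pattern reported for the corpus.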
4. Linguistic versus Task Salience The study described in section 3 used examples from naturally occurring problem-solving discourse. While such discourses provide an excellent source of evidence for our analysis, working with naturally occurring language data has the disadvantage that the number of referents available at any point in the discourse is very large and uncontrolled between one speaker and the next. As a result, no two utterances are spoken under the same conditions or with the same number of referents available. Different attributes that might be modulating the speaker’s choice of pronoun form in each instance
are difficult to identify. Moreover, we can only infer what interpretation the addressee assigned to the pronoun. Previous studies exploring the alternation of demonstrative and personal pronouns have also analyzed naturally occurring discourse (Linde (1979), Passonneau (1985), Passonneau (1989), Gundel et al. (1993), Borthen (1997), Byron and Allen (1998), Eckert and Strube (2001)) and suffer from the same problem. One exception is the study by Schuster, described in section 2.1, which explored only one type of topic-creating sequence.
In order to augment our corpus analysis, we designed a controlled experiment to investigate the interpretation of the pronouns it and that in a simple task performed with controlled stimuli. The small size of the task world used in this experiment allows us to precisely control the number of available referents and their cognitive status. In this study, we were able to explore one of the questions we introduced at the end of section 2: how does linguistically determined salience interact with other non-linguistic factors to modulate the interpretation of pronouns by human subjects? The study employed the visual world eye-tracking paradigm, which has been developed to explore the incremental, online processing of language by humans (Tanenhaus et al. (1995)). In a previous report (Brown-Schmidt et al. (2004)), we focused on the pattern and timing of the eye movements as they relate to online processing differences between personal and demonstrative pronouns. Here, we concentrate on the final interpretation chosen for each pronoun.
4.1 Overview of the Experiment In this experiment, four items were placed on a table in front of the participant, who then heard pre-recorded audio instructions directing him to move the items around on the table. By directing the participant to move an item, we could observe which item was chosen as the referent of a referring expression in the instruction. For example, if the direction is “Move it beside the cup,” the object that the participant moves provides clear evidence about the interpretation he assigned to the pronoun it in the instruction. This method of collecting observations provides a clearer picture of the assignment of pronoun meanings than conducting post-hoc analysis of discourse transcripts, in which the intended meaning of each utterance must be inferred from the transcript, as in our previous study. Another difference between this study and the corpus study described in section 3 is that this study does not include pronouns that refer to high-order entities. Because the subject is directed to move an object, only physical objects from the scene were feasible referents for the pronouns. The items used in the study were small toy objects that were easy to grasp and easy to distinguish visually. In half of the trials, the objects were small wooden building blocks and in the remaining half the objects were other sorts of everyday small objects such as a cup, flowerpot, and doll. The target instructions directed the subject to place an item either next to or on top of another item. In these trials,
the objects were chosen so that, when put together, they created a composite object: something that would look natural when formed from its components. For example, a bird would be placed next to or on top of a nest. Other object pairs include a butterfly and a flower, a cup and a saucer, and a lamp and table from a dollhouse. The participant’s world knowledge should allow him to conceive of the two items as individuals or as parts of a composite. We hypothesize that the on top of instruction should encourage more composite interpretations. The initial on top of or next to instruction is immediately followed by an instruction containing a pronoun, either it or that. The purpose of the experiment was to see whether the presence of this composite object would affect the interpretation assigned to pronouns in the instruction that followed. The next section describes the experimental stimulus in greater detail and presents the hypotheses about the interpretation of the pronouns in this study. Then, section 4.3 discusses the results.
4.2 Experimental Method Participants in this study followed instructions to manipulate small objects placed in front of them on a table. The instructions were pre-recorded to ensure that prosodic features of the instructions were kept consistent for all participants. Participants were told that the task was fairly easy and to simply ‘do the first thing that comes to mind’ upon hearing the instructions. A video record of the subject’s behavior was captured using the video stream from a light-weight head-mounted eye-tracker. Each trial consisted of four objects, as illustrated in figure 7.1. Two classes of stimuli were used: children’s blocks and everyday objects. The children’s blocks were six small, brightly colored blocks (yellow, red, purple, blue, orange, green) of different cuboid shapes. The everyday objects were slightly larger than the blocks, and more variable in size and shape. The objects included some items that form a coherent whole when placed together, such as a toy hamburger and a plate. Example instruction sequences from trials using the children’s blocks (ex. (19)) and the objects (ex. (20)) are presented here.
FIGURE 7.1. A. Artist’s Rendering of Blocks Condition. B. Artist’s Rendering of Objects Condition.
Sixty-four sets of four instructions were used in the study. Half of the subjects heard the thirty-two blocks trials first, and half heard the objects instructions before working with the blocks. Half of the instruction sequences contained a pronoun. The entire experiment lasted approximately one hour. (19)
(Scene includes a red, blue, green, and yellow block.)
a. Put the red block next to the blue block.
b. Now put that on the green block.
c. Put the green block on the yellow block.
d. Now put the blue block on the yellow block.
(20)
(Scene includes cup, saucer, sock, shovel.)
a. Put the cup on the saucer.
b. Now put it over by the shovel.
c. Put the sock on the shovel.
d. Now put the cup in front of the sock.
An important feature of the experimental design is that the syntactic construction of each sentence is identical. In each set of instructions, the first (a) sentence mentions two of the objects in the scene: the object that the participant is instructed to move, appearing in the grammatical role of Direct Object (indicated in italics) and the object used as the destination for the moved object (henceforth the Goal), which is mentioned in an adjunct position as the object of a prepositional phrase. Therefore, using only grammatically determined salience to assign the cognitive status of the objects, when the pronoun is encountered in the (b) sentence, the moved item has in focus status since it is the most salient among the mentioned items, and the Goal object should be slightly less salient. The other two items in the scene, as well as the imagined composite object, have activated status, since they are present in short-term memory but they have not been brought into focus linguistically. Before each set of four instructions, all objects were removed from the table and the scene was re-set for the next instruction sequence, so there should be no carry-over of discourse or task salience effects from one trial to the next.
The study manipulated the following three factors:
(1) Pronoun type (it or that)
(2) Object type (everyday objects or children’s blocks)
(3) Item location (whether the moved item was placed on top of or next to the Goal object)
Crucially, these manipulations do not modify the expected effects of syntactic structure on the cognitive status of referents available for the interpretation of the pronoun at the onset of the (b) instruction. In all conditions, the linguistically determined salience ranking of items at the onset of the (b) sentence is Direct Object > Goal > other unmentioned objects in the scene. Within each instruction set, pronoun type and location were both manipulated using a Latin square design so that each participant was presented a total of 32 target pronoun sentences: four in each of the eight different conditions.
The (b) instruction asks the participant to move either it or that. There are at least five possible interpretations of this instruction that are consistent with the semantics of the sentence: each of the four items individually or the composite. Since the semantics of the instruction allow the participant to select any of these five interpretations, he has no information with which to select an interpretation other than the salience structure established at the beginning of the sentence and the form of the pronoun used in the instruction. If the claims of the GH and other theories are correct, the listener should use the form chosen for the pronoun as an indication of the intended referent. Our hypothesis is that the personal pronoun it will be interpreted as referring to the Focus (the item that was just moved in the (a) instruction). Also, the parallelism between the (a) and (b) instructions (“Put the cup . . . Put it”) should provide an additional reason for the item moved in the first sentence to be moved again in the second instruction.
But what about the demonstrative pronoun? At the point in each trial when the pronoun is encountered, all five referents have activated status, but our participants should take the form of this expression (demonstrative rather than personal pronoun) as a signal to select an item with activated but not in focus status. That still leaves them with four objects to choose between. Previous experiments have shown that participants fixate task-relevant objects, and they may disregard other possible referents with similar semantic properties (Brown-Schmidt et al.
(2002), Dzikovska and Byron (2000)). So the two items in the scene that were not mentioned in the (a) sentence should be considered to be ‘out of play’; therefore, we do not expect participants to select them. The remaining two items were mentioned in the first sentence and together form the composite object. As Channon (1980) showed, demonstrative pronouns, due to their loose agreement features, can be used to refer to conglomerations of items with unclear agreement features, when neither a singular nor a plural pronoun seems to match the referent. We found several such examples in the TRAINS93 dialogues. Also, as we discussed in section 3, demonstrative pronouns are often used to refer to the salient outcomes or results of actions during task-oriented discourse. For both of these reasons, we expected the composite object to be the preferred referent of demonstrative pronouns. One might expect the Goal object to be the preferred referent of the demonstrative because it has been mentioned but is not the Focus. However, the Goal object is difficult to classify in terms of its cognitive status. It is not the most salient item, but it should have fairly high salience since it was mentioned in the (a) instruction. Therefore, if the listener interprets the demonstrative pronoun as referring to the item that is the most salient without being the Focus, the Goal object would be a reasonable choice.
Our use of two sets of stimuli (blocks and objects), as well as the location manipulation, was intended to vary the degree to which the participants were able to conceive of the composite object as a coherent whole. If the (a) instruction had the participant set one object next to another, this should not result in a clear composite, while setting one object on top of another forms a stronger composite. The objects should invoke a stronger composite effect than the blocks, since they were intentionally chosen to form strong composites (e.g., the cup on the saucer). In summary, we predicted that personal pronouns would be interpreted as referring to the linguistically focused entity in all conditions. In contrast, demonstrative pronouns refer either to the Goal object or the composite, with the proportion of composite interpretations varying according to the salience of the composite object.
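The factorial structure of the design and the predictions just summarized can be written out explicitly. The following sketch simply enumerates the three two-level factors named in the text and checks the stated trial count; the variable and function names are illustrative, and the Latin square assignment of items to conditions is not modeled.

```python
from itertools import product

# The three manipulated factors, each with two levels, as listed in the text.
PRONOUNS = ("it", "that")
OBJECT_TYPES = ("blocks", "objects")
LOCATIONS = ("next-to", "on-top")
TRIALS_PER_CONDITION = 4  # "four in each of the eight different conditions"

conditions = list(product(PRONOUNS, OBJECT_TYPES, LOCATIONS))
total_target_trials = len(conditions) * TRIALS_PER_CONDITION


def predicted_referents(pronoun: str) -> list:
    """Candidate referents predicted in the text, most expected first."""
    if pronoun == "it":
        return ["Focus"]
    # For "that": the composite is expected to be preferred, the Goal is a
    # reasonable alternative, and unmentioned items are treated as out of play.
    return ["composite", "Goal"]
```

Crossing the factors yields 2 x 2 x 2 = 8 conditions, and 8 x 4 = 32 target pronoun sentences per participant, which matches the thirty-two pronoun-bearing instruction sets among the sixty-four used in the study.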
4.3 Results The data included here were collected from 16 paid undergraduate students recruited from the University of Rochester community. To tabulate the participants’ responses, research associates viewed a videotape of each participant’s behavior and noted the object that the participant moved in each target trial. The object the participant moved as a response to the (b) instruction was considered to be the referent he assigned to the pronoun. Table 7.3 indicates the referent choice of our participants in each condition. The Blocks/next-to condition is the one with the smallest expected composite effect, and the Objects/on-top is the condition in which the composite effect is expected to be strongest. The interpretation preferences suggest that we successfully modulated the salience of the composite object: there are more composite interpretations in on-top than in next-to trials, and there are also more composite interpretations in objects trials than in blocks trials.
TABLE 7.3. Referent of Pronouns in the Object Manipulation Study
              Next-to               On-top
              Focus    Composite    Focus    Composite
“It”
  Blocks      95%      5%           83%      17%
  Objects     98%      2%           60%      40%
“That”
  Blocks      52%      39%          21%      75%
  Objects     50%      43%11        12%      88%
Results for the interpretation of personal pronouns, shown in table 7.3, show that in nearly all of the next to trials, participants interpreted the pronoun it as referring to the Focus object, as predicted by the linguistically determined
topical structure. Based on the discourse alone, the composite object must be considered to have at most activated status, and therefore to be inappropriate as the referent of an unstressed personal pronoun. However, even in this condition, the composite was chosen a small percentage of the time. In the on-top condition, participants sometimes interpreted the personal pronoun as referring to the composite. This happened much more often in the objects trials (40%) than in the blocks trials (17%). This large number of composite interpretations of the personal pronoun is surprising given that, based on the form of the instruction sequences, the Focus is a strong default referent due to linguistic clues, and the composite is also a fairly poor match for the singular numeric agreement of the pronoun it. There are two possible ways to interpret this data. Either 1) personal pronouns can sometimes refer to entities that are at most activated but not in focus, or 2) on some trials, the participant experienced the composite entity as being more salient than either of its components. Although the TRAINS93 data establishes the fact that personal pronouns can refer to activated entities, because of their strong preference for the highest salience item, they tend only to refer to non-focused entities when they appear in a predicative construction that makes their semantics incompatible with the more salient entities. In this study, however, the predicative constraints of ‘Put X . . .’ are an equally good match for either the Focus or the composite. Therefore, we attribute our findings to the effects of salience that arise from the task structure. In the on top condition, it would be reasonable for the participant to infer that the intent of the (a) instruction was to direct him to build a composite from two individual objects. Construction of the composite may keep attention on the composite as participants continue the task in the (b) instruction.
Therefore, in the on-top condition, we observed that the linguistic and the task structure compete to bring either the Focus or the composite into highest salience. This provides one clue about how linguistically and pragmatically determined salience interact. The objects that result from actions in the task are highly salient and compete as referents for personal pronouns with the linguistically determined topic. In future work it will be important to evaluate this hypothesis by systematically varying the strength of the task-based focus.
Interpretation of the demonstrative was modulated by the location manipulation as expected (see table 7.3). We observed more composite interpretations for the objects than for the blocks, and more composite interpretations in the on-top condition than in the next-to condition. We also observed a large number of Focus interpretations of the demonstrative pronoun. Although it is a surprise to have so many Focus interpretations for the demonstrative pronoun, this is allowed by the GH model since the set of activated entities does include the in-focus entities. Across all of the next-to trials, approximately 50% of the demonstrative pronouns were interpreted as referring to the in-focus item. This stands in stark contrast to the findings of the authors surveyed in section 2, based on which we would expect the majority of demonstrative pronouns to refer to non-focused items.
In the next-to condition, where it was relatively easy to conceive of the Focus and the Goal as separate objects, one might expect participants to interpret the demonstrative pronoun as referring to the nonfocused Goal object, but surprisingly this happened in less than 4% of trials. Participants rarely interpreted either pronoun as the Goal object alone. In summary, we can see that in this simplified referential world, the expected function of the demonstrative pronoun to select a nonfocused referent did not obtain for a large portion of trials. An alternative explanation is that our artificially constructed sentences were awkward or unnatural. Another interpretation is that the parallelism of the (a) and (b) instructions might have led some participants to have a strong expectation to move the same entity in the (b) instruction as they had in the (a) instruction. When interpreting the demonstrative, participants may have maintained this assumption without strong evidence to the contrary. Thus, a noun phrase with a definite article would be the preferred form for referring to the Goal in discourses like these (Dahan (2002) ). In summary, the results from this experiment show that the interpretation of both it and that was modulated by world knowledge that affected how easily the Theme + Goal could be conceived as a composite entity. This effect was strongest for the objects conditions, but can also be seen in the blocks condition. These observations were confirmed by analyses of variance,11 which suggest that both classes of pronoun were sensitive to manipulations of the extra-linguistic context. Apart from their contrastive use in terms of attentional salience, demonstrative and personal pronouns are similar in that pragmatic factors clearly guided the interpretation of both types of pronouns. The strongest evidence comes from the increase in composite interpretations for the objects/on-top condition compared to the blocks/on-top condition. 
This effect can only be due to the knowledge that allows the participant to conceive of the two items as one composite object.
5. Summary and Implications Both of the studies described in this chapter examined the similarities and differences between personal and demonstrative pronouns. The first study examined natural conversational data from problem-solving dialogues, and the second examined the interpretation of pronouns in constructed sentences in a controlled experimental setting. Previous studies had consistently described personal and demonstrative pronouns as serving contrastive functions across the dimension of attentional salience, but these studies described that distinction as only a tendency. Although it has long been known that demonstratives and personal pronouns have overlapping referential domains, as the Givenness Hierarchy predicts, no previous studies had concentrated on shedding light on the cases where personal pronouns refer to less salient items and demonstrative pronouns refer to focused items. Pronouns in our data again exhibited these tendencies, with demonstratives most often referring to an item not previously placed in focus, and personal pronouns often referring to the predicted topic as determined by local linguistic structure.
HOW DO WE SELECT FORMS OF REFERRING EXPRESSION ?
However, the referents of both types of pronouns ranged widely over the spectrum of activated entities. These references succeed because the pronoun’s form is just one clue that an addressee utilizes during the interpretation process. Another powerful source of information applied during pronoun interpretation is conceptual structure. The meaning of both personal and demonstrative pronouns was modulated by conceptual structures: either task-based semantics in the TRAINS93 data, or the composition of individual items into complex entities in the blocks/objects study. This sensitivity to conceptual structures has a direct implication for automated models of pronoun resolution. Past algorithms have relied almost exclusively on using surface linguistic features such as word order or grammatical roles to determine a salience structure that orders the search for a pronoun’s antecedent. Algorithms designed in this way are limited to resolving the pronouns that co-refer with a syntactically focused noun phrase. Both personal and demonstrative pronouns can refer to less salient referents as long as sufficient distinguishing semantic constraints are supplied in the utterance. By utilizing this semantic information, an algorithm can successfully resolve pronouns that do not refer to the most salient item. Our data also provides evidence for ways in which computational models of salience should be refined to improve pronoun resolution performance. Rather than expecting the most salient item from a sentence (as determined by syntactic features) to be re-mentioned as a personal pronoun and other items to be re-mentioned with the demonstrative, the picture is more complex. As the GH specifies, entities made salient by the task structure are compatible with the personal pronoun. Whether these items should be considered to have the same in focus salience status as other items made salient by local discourse structure is a point that merits further exploration. 
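The point about semantic constraints can be illustrated with a minimal Python sketch. It is not the algorithm described in Byron (2002a); the entity names, the semantic types, the salience scores, and the `resolve` helper are all hypothetical and serve only to show how a constraint supplied by the utterance can select a candidate that is not the most salient one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    name: str
    sem_type: str   # hypothetical semantic type, e.g. "engine" or "city"
    salience: int   # higher means more salient (assumed to be precomputed)


def resolve(candidates: list[Entity], required_type: Optional[str]) -> Optional[Entity]:
    """Return the most salient candidate that satisfies the constraint.

    With required_type=None this reduces to a pure salience search.
    """
    ranked = sorted(candidates, key=lambda e: e.salience, reverse=True)
    for entity in ranked:
        if required_type is None or entity.sem_type == required_type:
            return entity
    return None


# Hypothetical discourse context, ordered by an assumed salience score.
context = [
    Entity("engine E1", "engine", salience=3),
    Entity("Corning", "city", salience=2),
    Entity("oranges", "cargo", salience=1),
]

print(resolve(context, None).name)    # salience alone picks the top entity
print(resolve(context, "city").name)  # a type constraint picks a less salient one
```

In this sketch the constraint acts as a filter over a salience-ordered list; a fuller model would also have to decide where such constraints come from and how the two pronoun types interact with them.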
Our data also suggests that the demonstrative pronoun is incompatible with most topics (especially local centers), but can be used to re-mention a topic that was itself a demonstrative form. The demonstrative pronoun can also be used to refer to the most salient entity from a sentence with no topic, such as the first sentence of a discourse. Besides providing these additional details about the interaction between pronominal forms and the salience of their referents, our data also highlights the fact that agreement features of the referent must also be considered when discussing which pronominal form is appropriate in a particular instance. Some composite entities do not match the singular numeric agreement of the personal pronoun, and therefore a demonstrative pronoun must be used even if the entity is currently in focus. Other activated entities are most compatible with the specifically singular agreement feature of it. These findings all lead to the conclusion that in addition to the cognitive status of the referent, many other factors come into play to determine which pronominal form will be used to express a particular meaning in a particular sentence. The usage conditions on these two types of pronominal form are more complex than previous studies had revealed. Although the addressee will take the use of a pronominal form as a signal that the referent has at least activated status, determining the referent of the pronoun requires accessing and using many other types of knowledge.
NOTES
1. Referring to an animate entity with a demonstrative pronoun is felicitous only as an exophoric reference or anaphorically by using a copulative construction (Fillmore, 1982).
2. A base noun phrase is a noun phrase that does not contain another noun phrase. to be considered as the sentence’s topic
3. For example, in the Wall Street Journal evaluation corpus used by Tetreault (2001) to evaluate the LRC algorithm, 1,699 pronouns co-refer with another noun phrase, but another 834 pronouns in the corpus were excluded from testing for a variety of reasons: 311 plurals required set construction, 135 referred to an action or event, 52 were demonstrative pronouns, and 336 appeared in indirect speech. These types of data exclusions are typical for all computational pronoun resolution studies.
4. Passonneau used the term ‘definite pronoun’ for personal pronouns such as it.
5. Although, of course, there are syntactic constructions, such as clefts, in which the focused item is unambiguously indicated by syntax.
6. In the examples, numbers of the form d93–10.4 represent the dialogue identification number, uttx means utterance number x in sequence in the dialogue, and S/U indicates the dialogue participants, both humans. S is the one assigned to play the role of the system, and U is the participant assigned to play the role of the user. The notation <sil> indicates a pause and indicates a breathing noise.
7. See Byron (2003) for inter-annotator agreement on these attributes.
8. A more detailed account of the algorithm’s design is provided in Byron (2002a).
9. These entities would be labeled as having familiar cognitive status using the original GH definitions.
10. Constituents that function syntactically as NPs, but that do not refer, do not evoke mentioned entities. Examples are predicate complements, frequency adverbials and expletives (as in “It’s good that you cleaned up”).
11. An ANOVA including pronoun type (it/that), object location (on top/next to) and object type (objects/blocks) was significant for all three main effects, and also showed a significant object type by object location interaction. The three-way interaction and the remaining two-way interactions were not significant (all F’s < 1.5) (Brown-Schmidt et al., 2003).
REFERENCES Ariel, Mira. (1990). Accessing Noun-Phrase Antecedents. Routledge. Asher, Nicholas. (1993). Reference to Abstract Objects in Discourse. Kluwer Academic Publishers. Baldwin, Breck. (1997). Cogniac: High precision coreference with limited knowledge and linguistic resources. In Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts (ACL-97 workshop), pages 38–45. Beaver, David I. (2004). The optimization of discourse anaphora. Linguistics and Philosophy, 27(1):3–56.
Borthen, Kaja; Fretheim, Thorstein; and Gundel, Jeanette K. (1997). What brings a higher-order entity into focus of attention? Sentential pronouns in english and norwegian. In Mitkov, Ruslan and Boguraev, Branimir, editors, Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts, pages 88–93. Association for Computational Linguistics. Botley, Simon. (1996). Comparing demonstrative features in three written English genres. In Approaches to Discourse Anaphora: Proceedings of the Discourse Anaphora and Resolution Colloquium (DAARC96), pages 86–105. Brennan, Susan E.; Friedman, Marilyn W.; and Pollard, Carl J. (1987). A centering approach to pronouns. In Proceedings of ACL ’87, pages 155–162. Brown-Schmidt, Sarah; Campana, Ellen; and Tanenhaus, Michael K. (2002). Reference resolution in the wild. In Proceedings of the 24th Annual Meeting of the Cognitive Science Society, Fairfax, VA. Brown-Schmidt, Sarah; Byron, Donna K. ; and Tanenhaus, Michael K. (2004). That’s not it and it is not that: Reference resolution and conceptual composites. In Carreiras, Manuel and Clifton, Chuck, editors, The online study of sentence comprehension: Eyetracking, ERP, and beyond, pages 209–228. Psychology Press. Byron, Donna K. and Allen, James F. (1998). Resolving demonstrative pronouns in the TRAINS93 corpus. In New Approaches to Discourse Anaphora: Proceedings of the Second Colloquium on Discourse Anaphora and Anaphor Resolution (DAARC2), pages 68–81. Byron, Donna K. (2002a). Resolving pronominal reference to abstract entities. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL ’02), pages 80–87. —— . (2003). Analysis of pronominal reference in two spoken language collections. Technical Report 703, University of Rochester Computer Science Dept. Carbonell, Jaime G. and Brown, R. D. (1988). Anaphora resolution: a multistrategy approach. 
In Proceedings of the 12th International Conference on Computational Linguistics (COLING ’88), pages 96–101. Channon, Robert. (1980). Anaphoric that: A friend in need. In Kreiman, J. and Ojeda, A., editors, Papers from the Parasession on Pronouns and Anaphora, pages 98–109. Chicago Linguistic Society. Charniak, Eugene. (1972). Toward a model of children’s story comprehension. Technical Report AI TR-266, Massachusetts Institute of Technology. Dahan, Daphne; Tanenhaus, Michael K.; and Chambers, Craig G. (2002). Accent and reference resolution in spoken-language comprehension. Journal of Memory and Language 47:292–314. Dzikovska, Myroslava O. and Byron, Donna K. (2000). When is a union really an intersection? Problems resolving reference to locations in a dialogue system. In Proceedings of the Fourth workshop on the semantics and pragmatics of dialogue (GOTALOG). Eckert, Miriam and Strube, Michael. (2000). Dialogue acts, synchronising units and anaphora resolution. Journal of Semantics, 17(1):51–89. Ferguson, George and Allen, James F. (1998). TRIPS: An intelligent integrated problem-solving assistant. In Proceedings of the National Conference on Artificial Intelligence (AAAI ’98). Fillmore, Charles J. (1982). Towards a descriptive framework for spatial deixis. Speech, Plans and Action, 3(4):31–59.
Garnham, A., editor. (2001). Mental models and the interpretation of anaphora. Psychology Press. Garrod, S. C. and Sanford, A. J. (1982). The mental representation of discourse in a focused memory system: Implications for the interpretation of anaphoric noun phrases. Journal of Semantics, 1:21–41. Ge, Niyu; Hale, John; and Charniak, Eugene. (1998). A statistical approach to anaphora resolution. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 161–170. Givon, Talmy. (1983). Topic continuity in discourse: a quantitative cross-language study. John Benjamins, Amsterdam. Grosz, Barbara J.; Joshi, Aravind K.; and Weinstein, Scott. (1995). Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):203–226. Gundel, Jeanette K.; Hedberg, Nancy; and Zacharski, Ron. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69(2):274–307. —— . (2005). Pronouns without NP antecedents: How do we know when a pronoun is referential? In Branco, A.; McEnery, T.; and Mitkov, R., editors, Anaphora Processing: Linguistic, cognitive, and computational modelling, pages 351–364. John Benjamins. Heeman, P. and Allen, J. (1995). The Trains spoken dialog corpus. CD-ROM, Linguistics Data Consortium. Kameyama, Megumi. (1998). Intrasentential centering: A case study. In Walker, Marilyn; Joshi, Aravind; and Prince, Ellen, editors, Centering Theory in Discourse, pages 89–112. Clarendon. Linde, Charlotte. (1979). Focus of attention and the choice of pronouns in discourse. In Givon, Talmy, editor, Syntax and Semantics 12: Discourse and Syntax, New York. Academic Press. Passonneau, Rebecca J. (1989). Getting at discourse referents. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL ’89), pages 51–59. —— . (1993). Getting and keeping the center of attention. In Bates, M. and Weischedel, R., editors, Challenges in natural language processing, pages 179–226. 
Cambridge University Press. Prince, Ellen F. (1981). On the inferencing of indefinite this NPs. In Joshi, Aravind K.; Webber, Bonnie Lynn; and Sag, Ivan, editors, Elements of Discourse Understanding, pages 231–250. Cambridge University Press. Roberts, Craige. (1989). Modal subordination and pronominal anaphora in discourse. Schiffman, Rebecca. (1985). Discourse constraints on ‘it’ and ‘that’: A study of language use in career-counseling interviews. Ph.D. thesis, University of Chicago. Schuster, Ethel. (1988). Anaphoric reference to events and actions: A representation and its advantages. In Proceedings of the 12th International Conference on Computational Linguistics (COLING ’88), pages 602–607. Sidner, Candace L. (1983). Focusing in the comprehension of definite anaphora. In Brady, M. and Berwick, R., editors, Computational Models of Discourse, pages 363–394. Strube, Michael and Muller, Christoph. (2003). A machine learning approach to pronoun resolution in spoken dialogue. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL ’03), Sapporo, Japan.
Strube, Michael. (1998). Never look back: An alternative to centering. In Proceedings of ACL’98, pages 1251–1257. Tanenhaus, M. K.; Spivey-Knowlton, M. J.; Eberhard, K. M.; and Sedivy, J. E. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268:1632–1634. Tetreault, Joel. (2001). A corpus-based evaluation of centering and pronoun resolution. Computational Linguistics, 27(4):507–520. Walker, Marilyn A. (1989). Evaluating discourse processing algorithms. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics (ACL ’89), pages 251–261. Webber, Bonnie Lynn. (1988). Discourse deixis: Reference to discourse segments. In Proceedings of the 26th Annual Meeting of the Association for Computational Linguistics (ACL ’88), pages 113–122. —— . (1990). Structure and ostension in the interpretation of discourse deixis. Technical Report MS-CIS-90-58, Department of Computer and Information Science, University of Pennsylvania.
8
Reference, Centers, and Transitions in Spoken Spanish
Maite Taboada
1. Introduction The question that much of the research on anaphora attempts to answer is: how does a speaker choose which referring expression to use? One assumption is that the speaker uses the referring expression that conveys the exact amount of information that the hearer will need in order to interpret the current utterance correctly. Given a possible choice between he, this man, the man, and John, it is plausible that a speaker will choose one that will help the hearer link to the intended referent with the minimum amount of effort. If the conversation has been about John throughout, with no other male referent intervening, he is probably the most common choice. If the speaker uses John instead, she might indicate that the hearer is to pay attention to the referent, or that a new John has been introduced in the conversation. Any explanation needs to not only account for the most typical realization (i.e., the expected realization) but also explain what factors are involved when the choice is contrary to expectation. Bolinger formulates the question in the following terms: “At X location, what reason might the speaker have for using a word that is leaner in semantic content rather than one that is fuller, or vice versa?” Usually this means “Why use a pronoun?” or “Why repeat the noun?” (Bolinger, 1979: 290)
Different explanations have been proposed to account for how the choices are made, and for the effects of such choices, such as Gundel et al.’s (1993) Givenness Hierarchy or Ariel’s (1996) accessibility marking scale. In these, the form of the referring expression is linked to the salience of the referent. Other explanations emphasize the importance of first mention (Carreiras et al., 1995; Gernsbacher and Hargreaves, 1988), or syntactic organization (Gordon et al., 1999). In this chapter, I explore a different way to explain the form of a referring expression, by applying Centering Theory (Grosz et al., 1995). Centering Theory is a theory of local focus in discourse that proposes different transition types between any pair of utterances. Those transitions are based on salience, but also on the expectations that the hearer might have about the focus of the next utterance. Researchers within Centering Theory have already proposed that there is a relation between the form of a referring expression in a given utterance and the transition linking that utterance to the previous one, or that Centering structures guide the interpretation of pronouns in discourse (Brennan, 1995; Di Eugenio, 1998; Gordon et al., 1993; Hudson-D’Zmura and Tanenhaus, 1998; Roberts, 1998; Walker, 1998). I extend that research by applying Centering to Spanish spoken discourse. It should be obvious that transition type is not the only factor involved: Centering proposes four transition types; most languages number more than four choices in their repertoire of referring expressions, meaning that more than four referring forms are possible for a given entity. For example, Gundel et al. (1993) propose six cognitive statuses and at least seven different referring expressions in English that denote them. That means that other factors must be at play in the choice. In this chapter, I also explore some of those factors. The study was carried out on two corpora of spoken Spanish. 
The first one, the Interactive Systems Lab corpus, is a collection of task-oriented conversations between two speakers. The second one is the CallHome corpus, a set of telephone conversations between relatives or friends. A total of fourteen conversations from the two corpora were annotated according to Centering Theory. The chapter is structured as follows: section 2 briefly introduces Centering Theory; section 3 describes its application to spoken discourse, in particular as regards segmentation; section 4 explains the process of constructing the list of entities, the Cf list. The results of the corpus analysis are presented and discussed in section 5, with section 6 providing conclusions.
2. Centering Theory Centering (Grosz et al., 1995; Walker et al., 1998) was developed within a theory of discourse structure (Grosz and Sidner, 1986) that considers the interaction between (i) the intentions, or purposes, of the discourse and the discourse participants, (ii) the attention of the participants and (iii) the structure of the discourse. Centering is concerned with the participants’ attention
and how the global and local structures of the discourse affect the referring expressions and the overall coherence of the discourse. It models the structure of local foci in discourse, that is, foci within a discourse segment. Centers are semantic entities that are part of the discourse model of each utterance in the segment. For each utterance, Centering establishes a ranked list of entities mentioned or evoked, the forward-looking center list (Cf). The list is ranked according to salience, defined most often in terms of grammatical relations (see section 4). The first member in the Cf list is the preferred center (Cp). Additionally, one of the members of the Cf list is a backward-looking center (Cb), the highest ranked entity from the previous utterance that is realized in the current utterance. Example (1) illustrates these concepts.1 Let us assume that the utterances in the example constitute a discourse segment. In the first utterance, (1a), there are two centers: Harry and snort. (1a) does not have a backward-looking center (the center is empty), because this is the first utterance in the segment. In (1b), two new centers appear: the Dursleys and their son, Dudley. The lists include centers ranked according to two main criteria: grammatical function and linear order. (Ranking will be further discussed in section 4.) The Cf list for (1b) is: Dursleys, Dudley.2 The preferred center in that utterance is the highest ranked member of the Cf list, that is, Dursleys. The Cb of (1b) is empty, since there are no common entities between (1a) and (1b). In (1c), a few more entities are presented, and they could be ranked in a number of ways. To shorten the discussion at this point, I will rank them in linear order, left to right. The most important entities seem to be the Subject, which is the same as in (1b), Dursleys; and Dudley, realized in the possessive adjective his (twice). 
The Cp is Dursleys, since it is the highest ranked member of the Cf list, and the Cb is also Dursleys, because it is the highest ranked member of (1b) repeated in (1c). The new utterance, (1d), reintroduces Harry to the discourse, and links to (1c) through Dudley, which is the Cb in (1d). (1)
a. Harry suppressed a snort with difficulty.
b. The Dursleys really were astonishingly stupid about their son, Dudley.
c. They had swallowed all his dim-witted lies about having tea with a different member of his gang every night of the summer holidays.
d. Harry knew perfectly well that Dudley had not been to tea anywhere;
e. he and his gang spent every evening vandalising the play park, [ . . . ]
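The definitions of Cf, Cp, and Cb can be made concrete with a short Python sketch over example (1). The Cf lists are taken as given (they are the ones the chapter lists for this example), so the ranking step itself is not modeled; the function names are illustrative assumptions, not part of Centering Theory.

```python
# Cf lists for utterances (1a)-(1e), already ranked by salience.
cf_lists = {
    "1a": ["Harry", "snort"],
    "1b": ["Dursleys", "Dudley"],
    "1c": ["Dursleys", "Dudley", "lies", "tea", "member", "gang", "night", "holidays"],
    "1d": ["Harry", "Dudley", "tea"],
    "1e": ["Dudley", "gang", "evening", "park"],
}


def preferred_center(cf):
    """Cp: the highest ranked member of the Cf list."""
    return cf[0] if cf else None


def backward_center(prev_cf, cur_cf):
    """Cb: the highest ranked entity of the previous Cf list that is
    realized in the current utterance (None if there is no such entity,
    or if the utterance is segment-initial)."""
    if prev_cf is None:
        return None
    realized = set(cur_cf)
    for entity in prev_cf:  # prev_cf is ranked, so the first hit is the highest
        if entity in realized:
            return entity
    return None


prev = None
for label, cf in cf_lists.items():
    cb = backward_center(prev, cf)
    print(label, "Cp:", preferred_center(cf), "Cb:", cb if cb is not None else "Ø")
    prev = cf
```

Only the immediately preceding utterance is consulted when looking for the Cb, which is what makes Centering a model of local rather than global focus.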
In (2) we see the Cf, Cp, and Cb for each of the utterances in the segment: (2)
a. Cf: Harry, snort Cp: Harry – Cb: Ø
b. Cf: Dursleys, Dudley Cp: Dursleys – Cb: Ø
c. Cf: Dursleys, Dudley, lies, tea, member, gang, night, holidays Cp: Dursleys – Cb: Dursleys
d. Cf: Harry, Dudley, tea Cp: Harry – Cb: Dudley
e. Cf: Dudley, gang, evening, park Cp: Dudley – Cb: Dudley
In addition to the different types of centers, Centering proposes transition types, based on the relationship between the backward-looking centers of any given pair of utterances, and the relationship of the Cb and Cp of each utterance in the pair. Transitions, shown in table 8.1, capture the introduction and continuation of new topics. Cb_i and Cp_i refer to the centers in the current utterance. Cb_{i-1} refers to the backward-looking center of the previous utterance. Thus, a continue occurs when the Cb and Cp of the current utterance are the same and, in addition, the Cb of the current utterance is the same as the Cb of the previous utterance. Transitions capture the different ways in which a conversation can progress: from an utterance that refers to a previous topic, the Cb_{i-1}, and is still concerned with that topic, the Cp_i, in a continue, to one that is not linked at all to the previous topic, in a rough shift. Transitions are one explanation3 for how coherence is achieved: a text that maintains the same centers is perceived as more coherent. In example (1), the first utterance has no Cb, because it is segment-initial, and therefore no transition (or a zero-Cb transition). The transition between (1a) and (1b) is also zero. Between (1b) and (1c) there is a continue transition, because the Cb of (1b) is empty, and the Cp and Cb of (1c) are the same, Dursleys.4 Utterance (1d) has a different Cb from (1c), and it also shows different Cb and Cp, thus producing a rough shift in the transition between (1c) and (1d). Finally, (1d) and (1e) are linked by a continue transition, since the Cb of (1e), Dudley, is both its Cp and the Cb of (1d). Because transitions capture topic shifts in the conversation, they are ranked according to the demands they pose on the reader. 
The ranking is: continue > retain > smooth shift > rough shift. This transition ranking is often referred to as Rule 2 in the Centering paradigm. Centering predicts that continue will be preferred to retain, and retain to shifts, all other things being equal. The preference applies both to single transitions and to sequences of transitions.

TABLE 8.1. Transition Types
                 Cb_i = Cb_{i-1} or Cb_{i-1} = Ø     Cb_i ≠ Cb_{i-1}
Cb_i = Cp_i      continue                            smooth shift
Cb_i ≠ Cp_i      retain                              rough shift

Rule 1 captures the preference for pronouns when the same topic of discourse is continued. The formulation of Rule 1 is as follows: For each U_i in a discourse segment D consisting of utterances U_1, . . . , U_m, if some element of Cf(U_{i-1}, D) is realized as a pronoun in U_i, then so is Cb(U_i, D).
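The four transition types of table 8.1 and the Rule 2 ranking can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the label "zero" for utterances without a Cb are choices made here, and an empty center (Ø) is represented as `None`.

```python
# Rule 2 ordering, from most to least preferred.
RANKING = ["continue", "retain", "smooth shift", "rough shift"]


def transition(cb, cp, prev_cb):
    """Classify the transition into the current utterance.

    cb, cp  -- backward-looking and preferred center of the current utterance
    prev_cb -- backward-looking center of the previous utterance (None if empty)
    """
    if cb is None:
        return "zero"  # no Cb, hence no transition in the sense of table 8.1
    same_cb = prev_cb is None or cb == prev_cb
    if cb == cp:
        return "continue" if same_cb else "smooth shift"
    return "retain" if same_cb else "rough shift"


def preferred(t1, t2):
    """Return whichever of two (non-zero) transitions Rule 2 ranks higher."""
    return t1 if RANKING.index(t1) <= RANKING.index(t2) else t2


# (1b) -> (1c): Cb = Cp = Dursleys, previous Cb empty
print(transition("Dursleys", "Dursleys", None))
# (1c) -> (1d): Cb = Dudley, Cp = Harry, previous Cb = Dursleys
print(transition("Dudley", "Harry", "Dursleys"))
```

The two tests (same Cb as before, Cb equal to Cp) are independent, which is why the four transition types fall out of a two-by-two table.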
Rule 1 is sometimes referred to as the Pronoun Rule. It captures the fact that a topic that is continued from a previous utterance does not need to be signaled by more explicit means than a pronoun (or a zero pronoun, in languages that allow those). Other pronouns are of course allowed in the same utterance, but the most salient entity must be realized by the least marked referring expression. In (1c), the backward-looking center, Dursleys, is realized as a pronoun, following Rule 1, since other pronouns are also present in the utterance (his to refer to Dudley). Relationships have been established between the transition type between a pair of utterances, and the type of referring expression chosen to realize entities in the second utterance in the pair. Di Eugenio (1998) found that continue transitions, because they keep the same center, often encode the subject as a zero pronoun in Italian. Shifts (smooth or rough) result in less pronominalization. We will see that these relationships are quite complex, and different factors come into play in the choice of referring expression.
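A check for Rule 1 can be sketched in the same way. The sketch assumes that each entity realized in the current utterance has been annotated with the form of its referring expression; the labels "pronoun", "zero", and "np", and the function name, are hypothetical. Treating zero pronouns as pronominal is an assumption that follows the parenthetical remark above.

```python
PRONOMINAL = {"pronoun", "zero"}  # assumed labels for pronominal forms


def violates_rule1(prev_cf, realizations, cb):
    """Return True if the current utterance violates Rule 1.

    prev_cf      -- Cf list of the previous utterance
    realizations -- maps each entity realized in the current utterance to
                    the form used for it ("pronoun", "zero", "np")
    cb           -- backward-looking center of the current utterance
    """
    if cb is None:
        return False
    some_pronominalized = any(
        realizations.get(entity) in PRONOMINAL for entity in prev_cf
    )
    return some_pronominalized and realizations.get(cb) not in PRONOMINAL


# (1c): "They" realizes Dursleys (the Cb), "his" realizes Dudley.
forms_1c = {"Dursleys": "pronoun", "Dudley": "pronoun", "lies": "np"}
print(violates_rule1(["Dursleys", "Dudley"], forms_1c, "Dursleys"))

# Hypothetical variant: the Cb is a full NP while Dudley is still a pronoun.
forms_variant = {"Dursleys": "np", "Dudley": "pronoun"}
print(violates_rule1(["Dursleys", "Dudley"], forms_variant, "Dursleys"))
```

Note that the rule is conditional: an utterance with no pronouns at all cannot violate it, whatever form the Cb takes.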
3. Centering and Spoken Language The Centering framework has been applied to both constructed examples and naturally occurring discourse, but not widely to spontaneous conversation. There are a number of issues involved in such application, namely the segmentation into Centering units (utterances), the presence of false starts and backchannels, linearity and overlap, and the presence of first and second person pronouns. I discuss each one of those in this section. The approach taken here to apply Centering to spoken dialogue owes much to the work done by Byron and Stent (1998). They report experiments on different variations of segmentation, false starts, inclusion of first and second person pronouns, and linearity. The model for dialogue adopted here is Byron and Stent’s Model 1, that is, a model where both first and second person pronouns are included in the Cf list. In addition, utterances are consecutive: in the search for Cb_n, only Cf_{n-1} is searched, whether it was produced by the same speaker or not. Byron and Stent (1998) found that this model performed better than models that discarded first and second person pronouns, and models that considered previous or current speaker’s previous utterance.5
3.1 Utterance Segmentation The first step in a Centering analysis involves deciding on the minimal units of analysis, commonly referred to as ‘utterances’. The notions of discourse
segment and utterance are very important: Centering predicts the behavior of entities within a discourse segment; centers are established with respect to the utterance. In this chapter, I use the term ‘utterance’ or ‘segment’ to refer to the units of analysis in Centering Theory. In other applications, ‘segment’ or ‘discourse segment’ refers to the broad parts into which a discourse can be divided (e.g., introduction, thesis statement), or to discourse segments that achieve a purpose each (Grosz and Sidner, 1986). I am not concerned with those higher level discourse segments here, but only with minimal units of analysis, typically interpreted to be either entire sentences or finite clauses. These concerns are general to Centering applications, but even more pressing when dealing with spoken language, where the notion of sentence is more difficult to instantiate. That is why, in spoken language, traditional notions of clause and sentence are abandoned in favor of the idea of an utterance (Schiffrin, 1994). In general, an utterance is an intonation unit. In the corpora studied, utterances are already marked in the transcripts. For the ISL corpus, an utterance is defined as an intonation unit marked by either a period or a question mark. Note that a comma does not always define an utterance. In example (3), the period after Miriam indicates falling intonation, as in the end of a sentence. There are, therefore, two Centering units in (3).6 (3)
a. Miriam. ‘Miriam.’
b. yo creo que /uh/ no nos va a alcanzar el tiempo. ‘I believe that, uh, we won’t have enough time.’
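The ISL criterion just described (a period or question mark closes an utterance; a comma does not) can be sketched as a simple splitter. This assumes that the transcript punctuation reliably marks intonation units, as stated above; the function name is hypothetical, and the sketch does not handle pauses, dialogue acts, or the clause-level splitting discussed later in this section.

```python
import re


def split_isl_utterances(transcript: str) -> list[str]:
    """Split a transcribed turn into Centering units.

    A boundary is placed after every period or question mark that is
    followed by whitespace; commas never create a boundary.
    """
    parts = re.split(r"(?<=[.?])\s+", transcript.strip())
    return [part for part in parts if part]


turn = "Miriam. yo creo que /uh/ no nos va a alcanzar el tiempo."
for unit in split_isl_utterances(turn):
    print(unit)
```

Applied to the turn in (3), the splitter yields the two units shown there, with the vocative forming a unit of its own.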
In the CallHome corpus, utterances, at the first level of granularity, are equivalent to dialogue acts, which were assigned to the Spanish CallHome corpus (Levin et al., 1999). In this corpus, the speech act was more important than intonation when it came to segmenting speech into utterances. The following example was segmented into two dialogue acts, which also correspond to two tensed clauses. (4)
a. Se supone que hay mucho ganado, ‘Supposedly there are a lot of animals,’
b. pero yo no vi nada. ‘but I didn’t see any.’
Pauses also indicate a new segment, whether a segment was introduced already in the transcripts or not. Example (5) was one unit, but since a pause exists after de él, the second part was considered to be a new Centering unit. (5)
a. claro, pero, o sea, él, según él, soy el socio de él [pause] ‘right, but, I mean, he, according to him, (I) am his partner’
b. según él, ¿no es cierto? ‘according to him, right?’
Segmentation into utterances has been a topic of study in the Centering literature. In the analysis, I have followed Kameyama’s (1998) proposals for intra-sentential Centering. They consist of separating any tensed coordinate or subordinate clauses from their matrix, and of including report complements and reported speech together with the reporting units. Tenseless subordinate clauses are part of the matrix clause.7 In addition to the segmentation already in the corpora (utterances and dialogue acts), complex clauses are broken up according to Kameyama’s rules. Tensed adjuncts are separated from the main clause, as in example (6). (6)
a. No compro nada, no nada, nada ‘(I) don’t buy anything, nothing, nothing’
b. porque quiero irme a ver a mi hermana. ‘because (I) want to go see my sister.’
Kameyama (1998) considers reported speech a hierarchical unit, embedded with the reporting unit, and I followed that approach. That is, in cases where reported speech appears, the reported unit is processed, and Centering structures are created within it. But once it has been processed, the next unit looks back to the reporting unit for antecedents, and for Cb comparison purposes. I also included relative clauses together with their antecedent NP, that is, relative clauses were treated as embedded. Poesio et al. (Poesio et al., 2000; 2004) report that this produces fewer violations of Centering constraints (specifically, of Constraint 1, that all utterances of a segment, except the first one, have one Cb). The final issue in segmentation was the speech addressed to a third party. In CallHome conversations, which are on the telephone, one of the interlocutors sometimes directs speech to another person on his or her side of the line. This was recorded, and quite likely audible to the other interlocutor. I considered speech directed to a third party as a separate Centering unit, and included it in the Centering analysis, because entities mentioned in the speech to the third party often appear in the conversation between the main interlocutors. We can see an illustration in (7). The speakers, A and B, are debating how long they have been on the phone (7a and 7b). Speaker B then asks somebody else (mamá), and reports back the answer. The vocative mamá is included in the Cf list of (7c).8 A Centering analysis including (7c) shows that speech directed to a third party must be included in the analysis since it contains the antecedent for the null pronoun in (7d), which is speech directed at A, and as a consequence part of the main conversation. Without (7c), the transition between (7b) and (7d) is a zero transition (no Cb). (7)
A: a. ¿Te late que como quince? ‘Does fifteen (minutes) sound about right?’
B: b. Pues no sé yo. ‘Well, I don’t know.’ c. llevamos como quince minutos, mamá? ‘Have (we) been (talking) for about fifteen minutes, Mom?’ d. dice que más o menos. ‘(She) says that more or less.’ The segmentation was performed by two annotators separately. We first segmented one CallHome and four ISL conversations as training, compared the results, and refined the coding manual (Hadic Zabala and Taboada, 2004). Then an evaluation was performed, segmenting four additional CallHome conversations, which amounted to 895 segments in the final agreement. The disagreement in those 895 segments was 18.7% of the total. This included any instance of disagreement (two instead of one segments, or vice versa, or disagreements in the inclusion of segments for the analysis). The high disagreement rate is due to problems in interpreting spoken data (boundaries are not clear), deciding on whether to include inferables (if an utterance contains no entities, it is not considered a unit for the analysis), and, to a lesser extent, human error. Current efforts are directed toward making the coding manual more transparent, and devising a training process, which might include segmenting on-line, without looking ahead, as Brennan (1995) suggests.
3.2 False Starts and Backchannels An utterance may not be complete syntactically, but still include referential information that affects the rest of the discourse. Some of these incomplete utterances are referred to as false starts. In the analysis, I considered false starts that included some referential information, whether the utterance was complete or not, which was also the approach followed by Eckert and Strube (1999). Most of those false starts were not utterances in themselves. For instance, in (8), the speaker introduces te (‘you’), but then changes her mind and produces a different sentence. The entity you, however, has already been introduced, and therefore it has to be considered as part of the Cf list. (8)
bueno. te /mm/ entonces quedamos así. ‘Good. you mm then (we) agree on that.’
Following Byron and Stent (1998), “empty utterances”, that is, utterances that contain no discourse entities, are attached to their preceding or following utterance, according to context. This applies to empty utterances across turns as well, so that backchannels (Yngve, 1970) are ignored for Centering purposes. In the following example, (9b) is a backchannel signal, making (9a) and (9c) the adjacent utterances for Centering.
(9)
A: a. Me levanto a las siete ‘(I) get up at seven’ B: b. Sí. ‘Yes’ A: c. empiezo las clases de ocho a nueve cuarenta ‘(I) start class from eight till nine forty’
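The adjacency rule illustrated in (9) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Utterance` class, the function names, and the hand-assigned entity lists are hypothetical, and the sketch simply skips empty utterances rather than attaching them to a neighbouring utterance according to context, as the coding scheme does.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str
    text: str
    # Discourse entities realized in the utterance (hand-assigned here).
    entities: list = field(default_factory=list)


def centering_units(utterances):
    """Keep only utterances that contain at least one discourse entity."""
    return [u for u in utterances if u.entities]


def adjacent_pairs(utterances):
    """Pair each Centering unit with the next one, across speaker turns."""
    units = centering_units(utterances)
    return list(zip(units, units[1:]))


# Example (9): the backchannel in (9b) has no entities and is skipped.
example_9 = [
    Utterance("A", "Me levanto a las siete", ["A", "seven"]),
    Utterance("B", "Sí.", []),
    Utterance("A", "empiezo las clases de ocho a nueve cuarenta",
              ["A", "classes", "eight to nine forty"]),
]

pairs = adjacent_pairs(example_9)
for previous, current in pairs:
    print(previous.text, "->", current.text)
```

With this input, (9a) and (9c) form the only adjacent pair, which is the configuration described above.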
3.3 Linearity and Overlapping A conversation is the combined effort of two or more participants. Reference passes back and forth between speakers, producing a sense of coherent whole for the entire conversation. As a consequence, I considered that Centering transitions applied from one utterance to the next, regardless of whether the two utterances were produced by the same speaker or by different speakers, in line with Byron and Stent’s (1998) proposal. This applies when the turns are actually floor-holding (Edelsky, 1981), rather than backchannel signals, as in example (9) above. Example (10) shows two turns. The centers in B’s turn include an entity in A’s turn, a reference to B herself. (10)
A: a. qué tal te viene? ‘how is (that) for you?’
Cf: meeting (null), B (te, ‘you’)
B: b. no. te contesté recién que /eh/ hoy viernes yo no puedo. ‘no. (I) just told you that uh today Friday I can’t.’
Cf: B (‘I’, null), A (te, ‘you’), Friday
Cb: B
3.4 First and Second Person Pronouns Spoken language usually contains a high number of first and second person pronouns. Centering was devised explicitly with third person pronouns in mind, and most applications of Centering do not take first and second person pronouns into account. Byron and Stent (1998) found that it was necessary to include them in the Cf list. This is certainly the case in the data, where the antecedent for null first and second person pronouns is to be found in previous utterances. In the following example, I and you in (11b) are linked to we in (11a). Of course, part of that reference is situational, but it can certainly be included in a Centering analysis. (11)
a. Mónica. /eh/ te parece que nos juntemos algún día en la mañana, toda la mañana entera? y trabajemos?
‘Monica uh what do you think (we) get together some day in the morning, all morning? and work?’
b. así que querría saber si vos el miércoles diecisiete podés. ‘So (I)’d like to know if you can Wednesday the 17th in the morning.’
First and second person pronouns, in this data, constitute a large number of the entities for each utterance; in fact, the only entities in many cases. Were they not included in the Cf list, we would find many more instances of transitions with no backward-looking center.
4. What Is Salient? And How Much? Once the conversations have been segmented into utterances to be considered for the Centering analysis, the next step is to assign a Centering structure to each one of them. This involves (i) building the Cf list (the list of forward-looking centers), (ii) determining the Cb, and (iii) establishing which transition holds between two consecutive utterances. The thorniest of those tasks is the construction of the Cf list. In this section, I discuss the different issues involved in populating the Cf list.
4.1 Entity Realization In Centering, the list of forward-looking centers is a partial ordering of the entities realized in the utterance. Precisely what the definition of ‘realized’ is, and what criteria we should use for that ordering are the two problems in ranking the Cf list, that is, in deciding which entities are salient in the discourse, and how salient they are in relation to each other. The definition of ‘realize’ depends, according to Walker, Joshi and Prince (1998: 4), on the semantic theory one chooses. But, in general, “realize describes pronouns, zero pronouns, explicitly realized discourse entities, and those implicitly realized centers that are entities inferable from the discourse situation.” Cornish (2005) argues, in general, that entities in focus are not only those that have been explicitly introduced in the discourse. We need to consider, then, inferable entities. Inferable entities are of particular importance in dialogue because it relies more than monologue on the context outside the text proper. To populate the Cf list, indirect realization of entities was permitted: null subjects; member-set relations (Mom–Mom and Dad) and part-whole relations (branches-trees). A strict direct realization (where the entities have to be mentioned explicitly in the utterance) resulted in a large number of empty Cbs. What exactly an indirectly realized entity is may, of course, not be obvious. I used the relations identified by Halliday and Hasan (1976) as lexical cohesion (synonymy, hyponymy, superordinate, but not collocation, which does not necessarily involve reference to the same entity). Particularly
difficult in this respect were decisions having to do with dates and times, and how those are related to each other. I considered mostly ‘include’ relations (Hurewitz, 1998), such that, for instance, a date was deemed to be related to the previous utterance’s Cf list if it was part of a date range mentioned there. However, when the date was not within the time frame established, it is plausible to think that the hearer had to construct a new model for it. In example (12), speaker A proposes the week of the fourth, after having discussed the previous week. However, speaker B returns to the previous week, and mentions Friday, October 1st, that is, a date not in the week of the fourth. This is a new entity, and cannot be related to the immediately preceding utterance. As it happens, this results in an empty Cb, since there are no entities in common between the two utterances. (12)
A: . . . quieres tratar la semana de cuatro? ‘. . . do you want to try for the week of the 4th?’ B: qué te parece el viernes primero de octubre, luego de las once de la mañana? ‘what do you think of Friday October 1st, after 11am?’
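The ‘include’ relation for dates can be made concrete with a small check. The sketch below is illustrative only: the function name is hypothetical, the year is an assumption (1993 is chosen because October 1st falls on a Friday in that year; the source does not give the year), and the week of the 4th is assumed to run from Monday to Friday.

```python
from datetime import date


def included_in(day, start, end):
    """True if `day` falls inside the date range [start, end]."""
    return start <= day <= end


YEAR = 1993  # assumption: a year in which October 1st is a Friday

# Assumed working week of the 4th, Monday to Friday.
week_of_the_4th = (date(YEAR, 10, 4), date(YEAR, 10, 8))
friday_oct_1 = date(YEAR, 10, 1)

# The date proposed by speaker B lies outside the range proposed by A,
# so no 'include' relation links the two utterances.
related = included_in(friday_oct_1, *week_of_the_4th)
print(related)
```

Under these assumptions the check fails for (12), which corresponds to treating Friday, October 1st as a new entity and leaving the Cb empty.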
Spoken language tends to leave much unsaid. That characteristic poses further problems for an account of the ‘realize’ constraint in Centering. It has been proposed that bridging inferences (Clark, 1977) can be used to relate entities between utterances. In example (13a), speaker A mentions Internet, which is continued in (13b) and (13c), in two null subjects. In (13d), speaker B does not refer to Internet at all, but introduces computer in the conversation, with a definite article. Usually, there would be no connection between (13c) and (13d): the Cf list for (13c) includes only Internet, and the Cf list for (13d) is: b (the speaker), computer. However, computer is an inferable (Prince, 1981), a computer being needed to access the Internet, and it can therefore become the Cb of (13d), picking up on Internet in (13c). (13)
A: a. estoy conectado con Internet y todo ‘(I)’m connected to Internet and all.’ B: b. qué tal ‘How’s (that)?’ A: c. es bárbaro ‘(It)’s great.’ B: d. yo no me pude comprar la máquina todavía, loco ‘I haven’t been able to buy the computer yet, man.’
Example (14) shows another instance of an inferable entity. The speaker in (14a) says that he wrote ‘a lot’ (muchísimo is an adverb). In the next utterance, he says that ‘(they) don’t arrive.’ The plural null pronoun can be interpreted as being
a reference to the product of his writing, probably letters. The two utterances were considered to have letters in common, which is then the Cb of (14b). (14)
a. Escribí muchísimo,
write:1sg.past very.much
‘I wrote a lot,’
b. lo que pasa es que no llegan.
the what happen:3sg.pres is that not arrive:3pl.pres
‘what happens is that (they) don’t arrive.’
Null, or zero, subjects, are common in Spanish, but always recoverable from the context and the morphology of the verb, and are always added to the Cf list. Ambiguous cases do occur, just as pronouns in English can be ambiguous. Those are disambiguated to the most plausible referent when creating the Cf list. There exist other instances of implicit entities, beyond null subjects. In example (15), the conversation is clearly about children, those of both interlocutors. However, children are only mentioned once, in the first turn. We have to assume that they are implicit in the rest of the exchange, as are the subjects, so that the sentences read: Do you have children? and We don’t have children yet. The summary in (16) represents the two lists of entities of the exchange, depending on a literal interpretation, or one that allows inferable entities.9 I decided to use the one on the right, which includes all the entities inferable from the context. It is plausible to assume that those entities are in the focus of attention throughout the exchange. (15)
B: a. . . . ¿Y chicos? ‘And children?’ A: b. Sí. Todavía no ‘Yes. Not yet.’ B: c. ¿Ah? ‘Huh?’ A: d. Todavía no ‘Not yet.’ B: e. ¿Todavía no? ‘Not yet?’ A: f. ¿Ustedes? ‘You (plural)?’
B: g. Ah bueno, dos ya ‘Ah well, two already.’ (16)
Dialogue                   Cf list without inferables   Cf list with inferables
B: And children?           children                     A, children
A: Yes, not yet            -                            A, children
A: Not yet                 -                            A, children
B: Not yet?                -                            A, children
A: And you?                B                            B, children
B: Ah well, two already    2 (children)                 B, 2 children
4.2 Cf Ranking The ranking of the entities in the Cf list is most often performed by following grammatical relations. Thus, subjects are ranked higher than objects, and these higher than adverbials. In English, this results in the following order (Walker et al., 1998): (17)
Subject > Object(s) > Other
The ranking is, however, not fixed, and considered to be language-dependent. When a new language is considered, a Cf template (Cote, 1998) for that language needs to be developed. Several languages have been studied using Centering, and thus different templates exist. For instance, the template for Japanese includes topic markers (wa) and empathy markers on verbs, resulting in the following template (Walker et al., 1994): (18)
(Grammatical or zero) Topic > Empathy > Subject > Object2 > Object > Others
Di Eugenio (1998) also ranks empathy highest in her template for Italian, following Turan’s (1995) for Turkish. Turan and Di Eugenio take the notion of empathy from Japanese, and view it as reflected in psychological verbs (interest, seem), perception verbs (feel, appear), and certain expressions that refer to point of view (in her opinion). There are proposals to incorporate other factors in the Cf template, such as Strube and Hahn’s (1999) use of discourse status, whether hearer-old or hearer-new (Prince, 1981), to analyze German. Cote (1998) uses Jackendoff’s (1990) Lexical Conceptual Structures. Gordon, Grosz, and Gilliom (1993) discovered that both grammatical function and surface order had a role in giving an entity prominence within the Cf. In the next few sections I discuss some of the factors that affect Cf ranking in Spanish.
4.3 Empathy and Animacy Spanish is a pro-drop language; subjects do not need to be realized as pronouns if they are known in context. Additionally, it has direct and indirect object
clitics (unstressed pronouns). Corresponding stressed object pronouns are possible for animate entities only. I mainly follow grammatical relations as the basis for ordering the Cf list in Spanish. Therefore, subjects are ranked higher than objects, whether they appear as full pronouns, or as null pronouns. There are two other criteria that play a role in the Cf ordering in Spanish: empathy and animacy. Following Di Eugenio (1998), I take empathy with the speaker or hearer over strict word order as a ranking criterion. Empathy, as defined by Kuno (1987: 206), “is the speaker’s identification, which may vary in degree, with a person/thing that participates in the event or state that he describes in a sentence.” There are no studies, to my knowledge, of how empathy and point of view are expressed in Spanish, in general.10 The main place where I observe empathy-related effects is in the argument structure of psychological verbs. In those, the point of view taken is that of the experiencer, regardless of whether it is the subject or not (e.g., ‘it seems to me’, ‘I think’, and the like). In (19) the speaker is the highest ranked entity, because it is the experiencer of a psychological verb (parece). In this case, the experiencer is encoded with clitic doubling (Fernández Soriano, 1999; Suñer, 1988): the PP a mí, plus the clitic me. In example (20), the clitic me refers to the speaker, for whom Thursday is a better date.11 (19)
a mí me parece también, bueno de hacer una reunión,
to me cl.1sg seem:3sg.pres too good of do:inf a meeting,
‘It also seems good to me to have a meeting,’
Cf: i (a mí, me), to have a meeting, meeting
(20)
me viene mejor el jueves,
cl.1sg come:3sg.pres better the Thursday
‘Thursday is better for me.’
Cf: i (me), it (the meeting, null), thursday
However, the point of view criterion need not apply to the speaker only. In (21), the point of view is that of the interlocutor. (21)
este qué tal para ti, del quince al diecinueve.
so how- for you:sg from.the fifteen to.the nineteen
‘So, how is it for you from the fifteenth to the nineteenth?’
Cf: you (para ti), it (the meeting, null), from the 15th to the 19th
A number of verbs in Spanish follow this pattern (‘me conviene’, ‘me viene mejor’, ‘se me hace que’; it’s good for me, it’s better for me, it seems to me). Thus, for these verbs, the thematic role of experiencer takes precedence over the grammatical function of subject. Empathy also includes verbs with clausal
grammatical subjects, but with an animate experiencer, or person from whose point of view the statement is to be interpreted. In (22), there is a displaced clausal subject, ‘to meet with you that day’. The subject is included in the Cf list as a single entity. The speaker is the most salient entity, represented in para mí ‘for me’. (22)
así que para mí sería imposible juntar-me con vos /eh/ ese día
so that for me be:pres.cond impossible join:inf-cl.1sg with you:sg uh that day
‘So it would be impossible for me to meet with you that day.’
Cf: i (para mí), to meet with you, that day
Not all experiencers, however, seem to be good candidates for higher placement. In a sentence like Juan asusta a María, ‘John frightens Mary’, the subject Juan seems to me to be more prominent than María, although María is an experiencer. It is possible that experiencers are ranked higher only when they are first and second person, which also happen to be higher in most hierarchies of animacy.12 Animacy is a relevant feature in the ordering of clitics and reflexive pronouns that refer to participants in the discourse. Animacy is considered relevant in general for salience and topicality (Givón, 1983). Stevenson et al. (1994) found that animacy has a role in deciding which entity will be in focus, and it was also found to have an effect in pronominalization (GNOME, 2000).13 Clitics and reflexive pronouns, in addition to conveying empathy, are also placed before the verb, linearly before (clitic) direct objects (whether empathy is involved or not).14 It is usually the case that indirect objects are animate, whereas direct objects may not be. In summary, three reasons speak for ordering the objects as indirect before direct: (i) indirect objects can convey empathy; (ii) indirect object clitics are always placed before direct object clitics; (iii) indirect objects tend to be animate. Wanner (1994) argues that clitic sequences in Spanish obey constraints of empathy and animacy. An illustration is to be found in (23), where the indirect clitic se ‘to her’ precedes the direct lo ‘it’, which refers to a scholarship for a program that was given to the speaker’s sister. Notice that the null subject is arbitrary (see section 4.4), and thus ranked last. (23)
a. Mi hermana solicitó un programa de arqueología y antropología en Grecia.
‘My sister applied to a program in archeology and anthropology in Greece.’
b. ¡Y que se lo dan!
and that cl.3sg.dat cl.3sg.masc.acc give:3pl.pres
‘And they give (gave) it to her!’
Cf: Sister (se, ‘to her’), program (lo, ‘it’), they (null)
4.4 Cf Proposal for Spanish Subjects take precedence in the Cf list in most other cases (i.e., when they are not clausal, and when there are no experiencers). Accordingly, the elements of the Cf list follow the order in (24).15 This ranking applies first to main (matrix) clauses, and then to subordinate clauses, when the two are within the same Centering unit (usually, because the subordinate clause is non-finite; see section 3.1 on segmentation). (24)
Experiencer > Subj > Animate IObj > DObj > Other > Impersonal/Arbitrary pronouns
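The template in (24) can be read as a sort key. The following Python sketch is only an illustration under stated assumptions: the role labels and the function name are hypothetical, roles are assigned by hand rather than derived from a parse, and the sketch covers a single clause (the precedence of matrix over subordinate clauses is not modelled).

```python
# Order of the template in (24), highest-ranked category first.
TEMPLATE = [
    "experiencer",
    "subject",
    "animate_iobj",
    "dobj",
    "other",
    "impersonal",  # impersonal or arbitrary pronouns
]


def rank_cf(entities):
    """Order (entity, role) pairs by the template.

    `sorted` is stable, so entities with the same role keep the
    order in which they were listed.
    """
    ordered = sorted(entities, key=lambda pair: TEMPLATE.index(pair[1]))
    return [name for name, _ in ordered]


# Example (23b): arbitrary null subject, dative clitic se, accusative clitic lo.
example_23b = [("they", "impersonal"), ("sister", "animate_iobj"), ("program", "dobj")]

# Example (25): impersonal se and accusative clitic te.
example_25 = [("one", "impersonal"), ("you", "dobj")]

print(rank_cf(example_23b))
print(rank_cf(example_25))
```

With the hand-assigned roles above, the output order matches the Cf lists given for (23) and (25).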
At the end of the ranking are null arbitrary subjects (Jaeggli, 1986), as in (23), and subjects in impersonal constructions with se, as in example (25). The word se in this example indicates a non-specific subject in an impersonal middle voice construction (Mendikoetxea, 1999), meaning “one can hear that you are well.” (25)
Ya se te oye muy bien.
already se cl.2sg.acc hear:3sg.pres very well
‘You already sound very well.’
Cf: you (te), one (se)
Also included as impersonal pronouns are instances of the second person singular, which can be used impersonally (Butt and Benjamin, 2000). It is interesting to note that this second person form is often used as an indirect form of reference to the speaker. In example (26), the speaker is implying that he has to take one exam every year. The tú form might indicate simply that that’s the norm, and he is no exception. If we were to consider that the second person form has some reference to the speaker, its ranking in the Cf list would have to change to: i (speaker), exams, every year, since the subject is the second person singular. The sentence, however, seems to be more about the exams than about who has to submit them. (26)
a. Son, son los tutoriales.
‘(They) are the exams.’
b. Tienes que presentar uno cada año.
have:2sg.pres that submit:inf one every year
‘(You) have to submit one every year.’
Cf: exams (uno), every year, one/you (null subject)
4.5 Noun Phrases with More Than One Entity A few other issues need to be addressed in the Cf ranking. The first is related to noun phrases that contain more than one referent or entity, whether possessives (my brother, my letter), nouns with a prepositional phrase (the census of the city), or conjoined NPs (Juan and María). For possessives I follow Di Eugenio (1998): the possessor is ranked before the possessed, if the possessed is inanimate, and the possessor after the possessed, if the possessed is animate.16 In (27), the ranking of mi examen (‘my exam’) is speaker > exam. However, in (28), the ranking of mi mamá (‘my Mom’) is mom > speaker. (27)
Una maestra este, me tuvo que venir a hacer mi último examen aquí.
a teacher eh cl.1sg have:3sg.pret that come:inf to make:inf my last exam here
‘A teacher uh, had to come and give me my last exam here.’
Cf: teacher, i (mi), exam, here
(28)
mi mamá posiblemente llegue la otra semana
my Mom possibly arrive:3sg.pres.subj the other week
‘My Mom will probably arrive next week.’
Cf: mother, i (mi), next week
The same principle applies to noun phrases with a PP modifier usually headed by ‘of ’ (de in Spanish). In most of those constructions, the meaning is that of a genitive (las cartas de Marta = Marta’s letters). The approach taken here is different from Walker and Prince’s (1996) Complex NP Assumption, which ranks NPs with a possessive determiner in linear order, left to right. Since I am considering animacy as a relevant feature, I prefer to follow Di Eugenio’s ranking for possessives, and to expand it to other NPs that include more than one entity. Thus, in example (29), una de Marta refers to one (letter) from Marta. Since Marta is animate, it is ranked higher than letter. (29)
Y una de Marta.
and one of Marta
‘And one (letter) from Marta.’
Conjoined NPs activate as most salient entity the group denoted by the conjoint. Thus, in John and Mary, the most salient entity is the group John and Mary. The individual entities, John and Mary, are less salient than the group (Gordon et al., 1999). In that same paper, Gordon and colleagues suggest that the individual entities are equally salient. The mention of either John or Mary results in the same processing time in a psycholinguistic experiment. It could be argued that this result would lead to multiple entities in the same position within the Cf list, as in (30), where the separate entities John and Mary occupy the same place in the Cf list. However, I feel that allowing multiple entities in the same position would make ranking too complex, and would also complicate future attempts at implementing these methods in an anaphora resolution system,17 and prefer to use linear order to sort the two entities (31). (30)
John and Mary went to the store.
Cf: John and Mary, {John, Mary}, store
(31)
Cf: John and Mary, John, Mary, store
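The two rules of this section, for possessives and for conjoined NPs, can be sketched as small helper functions. The function names and the hand-supplied animacy flag are hypothetical; this is an illustration of the ordering described above, not an implementation taken from the study.

```python
def expand_possessive(possessor, possessed, possessed_is_animate):
    """Order the two entities of a possessive (or 'de' genitive) NP.

    An inanimate possessed entity follows its possessor; an animate
    possessed entity precedes it.
    """
    if possessed_is_animate:
        return [possessed, possessor]
    return [possessor, possessed]


def expand_conjoined(members):
    """Rank the group first, then its members in linear order, as in (31)."""
    group = " and ".join(members)
    return [group, *members]


print(expand_possessive("speaker", "exam", possessed_is_animate=False))
print(expand_possessive("speaker", "mom", possessed_is_animate=True))
print(expand_conjoined(["John", "Mary"]) + ["store"])
```

The three calls correspond to mi examen in (27), mi mamá in (28), and the Cf list in (31).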
4.6 Wh-Pronouns Wh-pronouns, qué (‘what’), quién (‘who’), cuándo (‘when’), are included in the list of forward-looking centers, and are ranked according to the syntactic role they have in the clause. Although wh-pronouns do not have a specific referent, they do serve as antecedents for other referring expressions. According to Halliday (Halliday, 1967; Halliday and Matthiessen, 2004), wh-words can be Themes in a clause, and I believe that they can establish cohesive ties throughout a text.18 In (32b), qué ‘what’ is included in the Cf list, and used as an antecedent for ecología in (32c), thus becoming the Cb of that utterance. (32)
B: a. se va a la Universidad de Gales, del Sur, donde estudió Sarucán, también. ‘She is going to the University of South Wales, where Sarucán studied as well.’ A: b. A hacer qué. ‘To do what?’ B: c. Este. A hacer ecología. ‘Eh, to do environmental science.’
4.7 Reference through More Than One Expression An utterance may contain reference to the same entity through more than one referring expression. For instance, the utterance in (33) contains reference to the subject both through the null subject pronoun and through a clitic (nos). In Centering we are usually concerned with the entities mentioned in the
utterance, not so much with the referring expression(s) used to evoke them. However, since my concern in this chapter is the link between Centering transitions and referring expressions, this was an important issue. The ranking of such entities is straightforward: the most salient grammatical function (or other criterion that may apply) is used to list the entity in the Cf list. The problem is which form should be used to categorize the form of the Cb in that utterance (see table 4). I have, for the time being, categorized such examples under the most marked form of reference. In example (33), the referring expression used to denote the entity “first person plural” is listed as a clitic, not as a null pronoun (clitics are considered more marked than null pronouns). It could be argued that the least marked form should be used to classify the Cb, but that would not show the fact that the Cb is, in a way, reinforced by another referring expression, by being referred to twice in the same utterance. (33)
nos vamos con mi madre
cl.1pl go:1pl.pres with my mother
‘(We) are going with my mother.’
The verb be (ser and estar in Spanish) functions as a linking verb, so subjects and predicates (nominal and adjectival) of the verb to be are coreferential and only need to be listed once in the Cf list. In (34), there are two references to the person the speaker is talking about, his teacher. The first reference is through the null subject, and the second through the predicate noun, amiga. The case is similar to the one in (33), where two referring expressions are used to refer to the same entity. As above, I classified the most marked one (NP in this case). (34)
porque aparte es mi amiga,
because besides be:3sg.pres my friend:fem.sg
‘because (she)’s also my friend,’
It is possible to have only a predicate (elliptical subject and predicator) in an utterance. In these cases, since the predicate is coreferential with the elliptical subject of the elliptical predicator, I include the subject in the list of forward-looking centers. In example (35), the speaker refers to himself with ‘covered’. Although there is no predicator in the sentence, reference to the speaker is included as if a null subject were present. (35)
Lleno de granitos, no, este
full:masc.sg of zits no eh
‘(I’m) covered in zits.’
In most cases, the subject and the nominal predicate have exactly the same reference. In some cases, the reference may be slightly different: The dinner choice is pasta.19 Miltsakaki and Kukich (2004) label these predicates as specificational (and predicates such as the one in example (34) as predicational). They
rank specificational predicates higher than their corresponding subjects. I did not make such a distinction, and treated all linking verb predicates in the same manner, as described earlier: the first (subject) reference determines the location in the Cf list; the predicate determines the type of referring expression used to refer to the Cb, if the entity in question is the Cb of the utterance.
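The convention of classifying an entity under its most marked form of reference can be sketched as a lookup on a markedness scale. The scale below is partly an assumption: the text only states that clitics are more marked than null pronouns and that the NP is the most marked form in (34); the position of stressed pronouns between clitics and NPs, and all names in the sketch, are hypothetical.

```python
# Assumed markedness scale, least marked first.
MARKEDNESS = ["null", "clitic", "pronoun", "np"]


def most_marked(forms):
    """Return the most marked form among those used for one entity."""
    return max(forms, key=MARKEDNESS.index)


# Example (33): null subject plus clitic 'nos' for the same entity.
print(most_marked(["null", "clitic"]))

# Example (34): null subject plus predicate NP 'mi amiga'.
print(most_marked(["null", "np"]))
```

The entity's position in the Cf list is decided separately, by its most salient grammatical function; this lookup only settles which form is recorded for the Cb.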
4.8 Right- and Left-Dislocation The ordering of the Cf list is affected by other factors, among them right- and left-dislocation. I have not, for the moment, dealt with those, but a closer look at the data suggests that the ranking will be affected by dislocated elements. In example (36), two different rankings are possible. The first one (37) ranks modem according to its grammatical function, object. The alternative (38) is to rank it higher than the pro subject we, because it is left-dislocated. The usual ranking produces a retain transition from (36a) to (36b), and a smooth shift from (36b) to (36c). The alternative ranking, with modem higher, results in a continue followed by a retain20. (36)
A: a. ¿módem? ‘Modem?’ B: b. módem, los tenemos ‘Modems, (we) have them’ c. pero no los instalamos todavía ‘but (we) haven’t installed them yet.’
(37)
Grammatical ranking
a. Cf: modem Cb: 0
b. Cf: we, modems Cb: modems – Transition: retain
c. Cf: we, modems Cb: we – Transition: smooth shift
(38)
Alternative ranking
b. Cf: modems, we Cb: modems – Transition: continue
c. Cf: we, modems Cb: modems – Transition: retain
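The effect of the two rankings on the transitions can be checked with a short sketch of the standard Centering definitions: the Cb is the highest-ranked element of the previous Cf list that is realized in the current utterance, and the transition depends on whether the Cb is maintained and whether it is also the highest-ranked element of the current Cf list. The function names are hypothetical, and treating an undefined previous Cb as "same Cb" is an assumption adopted here so that (37b) comes out as a retain, as in the text.

```python
def backward_center(prev_cf, current_cf):
    """Highest-ranked element of the previous Cf list realized in the current one."""
    for entity in prev_cf:
        if entity in current_cf:
            return entity
    return None


def transition(cb, prev_cb, cf):
    """Classify the transition into an utterance with Cf list `cf`."""
    if cb is None:
        return "zero"
    cp = cf[0]  # preferred center: the highest-ranked forward-looking center
    same_cb = prev_cb is None or cb == prev_cb  # assumption for undefined prev Cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


def analyze(cf_lists):
    """Return (Cb, transition) for each utterance; the first has neither."""
    results, prev_cf, prev_cb = [], None, None
    for cf in cf_lists:
        cb = backward_center(prev_cf, cf) if prev_cf else None
        label = None if prev_cf is None else transition(cb, prev_cb, cf)
        results.append((cb, label))
        prev_cf, prev_cb = cf, cb
    return results


# Cf lists for (36a-c) under the two rankings in (37) and (38).
grammatical = [["modems"], ["we", "modems"], ["we", "modems"]]
alternative = [["modems"], ["modems", "we"], ["we", "modems"]]

print(analyze(grammatical))
print(analyze(alternative))
```

Run on the two sets of Cf lists, the sketch yields retain followed by smooth shift for the grammatical ranking, and continue followed by retain for the alternative ranking.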
4.9 Unresolved Issues There are a number of unresolved issues in the ranking of the Cf list. The first one is the use of prosody in addition to the other factors that affect the ranking. A number of researchers have pointed out that prosody and stress affect the order of elements in the list when dealing with spoken language (Brennan, 1995; Cornish, 1999). This remains a task to be addressed in future research. Another difficulty within Centering is the treatment of pronouns that refer to discourse segments, or to abstract entities (Asher, 1993; Byron, 2002). I have excluded them from analysis for the time being.
5. Which Referring Expression? Anaphora resolution, and the form of the anaphoric term itself have long been linked to the relative prominence of entities in the discourse (Gundel et al., 1993; Prince, 1981; Sidner, 1983). Rule 1 of Centering Theory establishes that the Cb of an utterance must be a pronoun, if other pronouns are present. That is, the Cb will be realized by the most reduced form (a pronoun) if other pronouns are present. Centering does not suggest any other rules for what will happen in other situations, that is, when there are no pronouns at all. However, researchers have proposed a relation between the transition type, that is, the progression of local discourse topics, and either the form of referring expressions used to realize the subject (Di Eugenio, 1998), or the Cb of an utterance (Taboada, 2002a). The main purpose of this study is to determine what relationship there is between Centering transitions and referring expressions. For that purpose, I carried out a corpus analysis of two types of spoken language corpora in Spanish. The corpora are the ISL corpus and the CallHome corpus. The ISL corpus is a large collection (a total of about 500 conversations) of task-oriented conversations recorded in a lab, with externally controlled turns. The participants, who were native speakers of Spanish,21 had to press the ‘Enter’ key on a keyboard to yield the turn, which makes the conversations similar to one-way radio, although the speakers are present in the same room. The task was to arrange for a two-hour meeting within a time period that ranged from two to four weeks. The speakers had conflicting agendas, and usually proposed a number of dates before an agreement was reached. Nine conversations from this corpus were analyzed, three each of dyads of female-female, male-male, and female-male speakers. The nine conversations amounted to 262 utterances, as defined in section 3.1, and a total of 2,798 words. 
The CallHome corpus is a collection of telephone conversations lasting up to 30 minutes between native speakers of Spanish. One party was given a free long-distance call, free choice of who to call, and no restriction on topics. Most participants called relatives or friends.22 For this study, five conversations were used, a total of 1,198 utterances and 8,694 words.
HOW DO WE SELECT FORMS OF REFERRING EXPRESSION ?
TABLE 8.2. Centering Transitions in Two Corpora
            Continue     Retain       Smooth shift   Rough shift   Total
ISL         121 (67%)    27 (15%)     22 (12%)       11 (6%)       181
CallHome    515 (65%)    129 (16%)    116 (15%)      30 (4%)       790
The conversations were first segmented according to the guidelines outlined in section 3.1. Then each utterance was coded according to Centering principles, including Cf list and type of transition. Table 8.2 shows the number of non-zero transitions for both corpora. The results are as predicted in Centering: continue transitions are preferred (overwhelmingly) over other types of transitions, and retains are preferred over shifts. Rough shifts are relatively rare. It is interesting to see that the two corpora have similar percentages of all types. Although the corpora are both spontaneous spoken conversations, they are somewhat different, in that the ISL conversations are task-oriented, whereas the CallHome recordings are casual. Those differences do not seem to affect the distribution of Centering transitions.
The numbers shown in table 8.2 are for transitions that had a backward-looking center. A large number of transitions had an empty Cb, and were not included in the analysis. Their numbers are presented in table 8.3. There exist a number of reasons for the high occurrence of utterances with an empty backward-looking center. Some of those utterances do introduce completely new entities in the discourse, thus beginning a new discourse segment: Centering operates at the local discourse level; transitions between discourse segments are part of the global structure, and strictly not part of a Centering analysis.23 In a number of cases, however, the entities were inferable from the context, but the inference seemed a bit far-fetched, and I decided not to establish it. That is the case in (39b), where the speaker refers to the days she has mentioned in utterance (39a). The utterance could read “check if you can meet on Tuesday the 16th after 12 noon,” but instead it is “check if you can.” This is not just a question of a null object, but a null VP. I decided not to include the date in the Cf list for (39b).
TABLE 8.3. Utterances with Empty Backward-Looking Centers
            Utterances   Cb=0   %
ISL         262          81     30.92%
CallHome    1198         408    34.06%
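The counts in tables 8.2 and 8.3 are linked: the utterances that remain once those with an empty Cb are set aside should equal the number of transitions reported in table 8.2. The short Python sketch below checks that relation and recomputes the empty-Cb percentages. The counts are copied from the two tables; the variable and function names are illustrative only and are not part of the original study.

```python
# Counts copied from tables 8.2 and 8.3; all names here are illustrative.
utterances = {"ISL": 262, "CallHome": 1198}
empty_cb = {"ISL": 81, "CallHome": 408}
transitions_with_cb = {"ISL": 181, "CallHome": 790}


def empty_cb_rate(corpus: str) -> float:
    """Percentage of utterances whose backward-looking center is empty."""
    return 100 * empty_cb[corpus] / utterances[corpus]


for corpus in utterances:
    # Utterances with a non-empty Cb are the ones counted in table 8.2.
    assert utterances[corpus] - empty_cb[corpus] == transitions_with_cb[corpus]
    print(f"{corpus}: {empty_cb_rate(corpus):.2f}% of utterances have an empty Cb")
```

Both corpora pass the check, so roughly a third of the utterances in each corpus fall outside the transition counts.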
REFERENCE , CENTERS , AND TRANSITIONS IN SPOKEN SPANISH
(39)
a. así que recién podría el martes /eh/ dieciséis después de las doce del mediodía.
   ‘So I could on Tuesday, uh, the 16th after 12 noon.’
b. fijate si vos podés.
   ‘Check if you can.’
As I pointed out in section 4.1, the issue here is what kind of inferables can be included in the Cf list of an utterance. Hurewitz proposes to include entities that are in a functional dependency with previously mentioned entities or that are subsets of other entities (Hurewitz, 1998), and also discourse deictic pronouns, that is, pronouns that refer to a part of the discourse, such as events or clauses (Webber, 1981). In Hurewitz’s account, utterances joined by one of those relations constitute a new type of transition, a partial shift. Fais (2004) links entities in the discourse to other previously mentioned entities using cohesive relations (Halliday and Hasan, 1976). In some other cases, empty Cbs resulted from problems with the segmentation (Poesio et al., 2000), or from the strict adjacency constraint in Centering: only entities in the previous utterance can become the Cb of the current one. Some empty Cbs were as predicted by Centering, that is, they initiated a new discourse segment; for instance, a new topic is being discussed, or a new date is being proposed,24 and therefore contained no link to the previous utterance. The new discourse segments are often a completely new ‘push’ onto the focus stack (Grosz and Sidner, 1986), but they can also be insertion or side sequences (Jefferson, 1972) or corrections (Schegloff et al., 1977).
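The strict adjacency constraint can be made concrete with a small sketch. The Python code below is only an illustration, assuming the standard Centering definitions: the Cb is the highest-ranked member of the previous Cf list that is realized in the current utterance, and the Cp is the top of the current Cf list. The function names, the entity labels, and the decision to treat an empty previous Cb as compatible with any current Cb are assumptions made for the example, not the coding scheme used in this study.

```python
from typing import List, Optional


def backward_center(prev_cf: List[str], current_cf: List[str]) -> Optional[str]:
    """Highest-ranked entity of the previous Cf list realized in the current
    utterance, or None. Only the immediately preceding utterance is consulted,
    which is the strict adjacency constraint."""
    for entity in prev_cf:  # prev_cf is ordered from most to least salient
        if entity in current_cf:
            return entity
    return None


def transition(prev_cb: Optional[str], cb: Optional[str], current_cf: List[str]) -> str:
    """Classify a transition from the previous Cb, the current Cb, and the current Cf list."""
    if cb is None:
        return "zero"  # empty Cb: left out of the transition counts
    cp = current_cf[0]  # preferred center: top of the current Cf list
    # Assumption: an empty previous Cb is compatible with any current Cb.
    same_cb = prev_cb is None or cb == prev_cb
    if same_cb:
        return "continue" if cb == cp else "retain"
    return "smooth shift" if cb == cp else "rough shift"


# Hypothetical Cf lists (entities, not words), ranked by salience.
cf_lists = [
    ["SPEAKER", "TUESDAY"],
    ["SPEAKER", "MEETING"],
    ["MEETING", "SPEAKER"],
    ["MEETING"],
    ["FRIDAY"],
]
prev_cb = None
labels = []
for prev_cf, cf in zip(cf_lists, cf_lists[1:]):
    cb = backward_center(prev_cf, cf)
    labels.append(transition(prev_cb, cb, cf))
    prev_cb = cb
print(labels)
```

In this toy sequence the last utterance shares no entity with the one before it, so its Cb is empty even though FRIDAY could in principle be related to earlier material. That is the situation that arises at a new discourse segment or when an inferable entity is not added to the Cf list.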
5.1. Referring Expressions and Transitions
The Cb of each utterance was coded for the type of referring expression that realized it, and those types of expressions were then related to the transition types. The referring expressions are illustrated in (40) to (46). The referring expression in question is in bold.
zero pronoun (40)
a. Conozco, en serio, un doctor que hizo su doctorado en Japón,
   ‘Seriously, I know a doctor who did his Ph.D. in Japan,’
b. acabó            y    [pause]
   finish:3sg.pret  and
   ‘(He) finished and’
c. no   [-]   encontró       chamba,
   not  null  find:3sg.pret  employment
   ‘didn’t find a job,’
clitic (41)
a. Llega a Atenas
   ‘(She) will arrive in Athens’
b. y va a estar ahí tres semanas
   ‘and (she) is going to be there for three weeks’
c. y    luego  la              andan        paseando  de    isla    en  isla
   and  then   CL.3SG.FEM.ACC  go:3pl.pres  walk:ger  from  island  to  island
   ‘and then (they) are going to take her around from one island to the next.’
pronoun (42)
a. No, no. Si de hambre no me muero.
   ‘No, no. (I)’m not going to starve.’ (lit., ‘die of hunger’)
b. Pero  yo  quiero         ser     astrofísica
   but   I   want:1sg.pres  be:inf  astrophysicist
   ‘But I want to be an astrophysicist.’
demonstrative pronoun (43)
B: a. Aquí le llaman tutorial.
      ‘Here they call it tutorial.’
A: b. Sí,  pues  ha             de  ser     eso.
      yes  then  have:3sg.pres  of  be:inf  that
      ‘Yes, then it must be that.’
full noun phrase (44)
B: a. ¿Y tu hermana?
      ‘And your sister?’
A: b. Mi  hermana  está         bie-
      My  sister   be:3sg.pres  we-
      ‘My sister is well.’
other (45)
Wh- pronoun
A: a. Ay, pero no muchos días más.
      ‘Ah, but not many more days.’
B: b. Cuánto más.
      ‘How much more?’
(46)
Adverbial (NP or PP)25
A: a. no. el lunes en la mañana <no> no puedo.
      ‘No. Monday morning (I) can’t.’
   b. tal vez el lunes en la tarde, después de las doce?
      ‘Maybe Monday in the afternoon, after twelve?’
B: c. bueno  el   lunes   tengo          una
      well   the  Monday  have:1sg.pres  a
      reunión  de         uno  a   cuatro
      meeting  from  tw-  one  to  four
      ‘Well, on Monday (I) have a meeting from tw- one to four.’
TABLE 8.4. Referring Expressions for the Cb of Each Utterance, according to Transition
                Continue       Retain        Smooth Shift   Rough Shift
Zero pronoun    350 (55.0%)    44 (28.2%)    74 (53.6%)     9 (21.9%)
Clitic          114 (17.9%)    48 (30.8%)    24 (17.4%)     14 (34.1%)
Pronoun         53 (8.3%)      8 (5.1%)      9 (6.5%)       4 (9.8%)
Demonstr. pr.   15 (2.4%)      4 (2.6%)      4 (2.9%)       4 (9.8%)
Full NP         86 (13.5%)     26 (16.7%)    22 (15.9%)     9 (21.9%)
Other           18 (2.8%)      26 (16.7%)    5 (3.6%)       1 (2.4%)
N               636            156           138            41
Table 8.4 shows that, overall, the Cb tends to be expressed through a zero pronoun. This is the least marked form available in Spanish. For that reason, it is to be expected that the Cb will be coded as a zero pronoun when the transition is a continue. Such is the case: out of the 636 continue transitions (for both corpora together), 55% had a zero pronoun as Cb. When we move on to retain, where the Cb is continued from the previous utterance, but will likely not be continued further, the percentage of zero pronouns decreases. However, it grows again in the smooth shifts, to almost the same percentage as for continue (53.6%). Di Eugenio (1990; 1998) found that in Italian,26 speakers typically encode center continuation with zero subjects, and center retention and shift with stressed pronouns. She also found that instances of retain and shift with null pronoun subjects are possible if the utterance that constitutes the change contains syntactic features that force the zero subject to refer to an entity other than the Cb of the previous utterance. Indeed, I found many cases of null pronouns in subject position that made the referent clear, when it was other than the Cb_{i-1}. In example (47c), the number agreement on the verb links the null subject to the object in the previous utterance (‘mountains’), not its subject and Cb, the Yosemite National Park that the speakers have been discussing.
(47)
a. Sí. Sí, es un parque nacional
   ‘Yes, yes, (it)’s a national park’
b. y es, tiene así montañas,
   ‘and (it)’s got like mountains,’
c. no, no son muy grandes,
   ‘(they) are not, not very big,’
Di Eugenio also found that speakers encode center retention or shift with a stressed subject pronoun (presumably in the cases when syntactic factors do not exclude reference resolution to the previous Cb). If we look at Table 8.4, we can see that pronouns are not used very often, across all four transition types. They actually occur less often in retain and smooth shift transitions than in continue, and only increase within rough shift, to 9.8%, which amounts to only four instances, given the low number of rough shifts. More numerous are full noun phrases (definite noun phrases or proper nouns), whose share rises from 13.5% in continue to 21.9% in rough shift, with retain (16.7%) and smooth shift (15.9%) in between. It is possible that center change is expressed more often in (spoken) Spanish via a full noun phrase. For instance, in (48), the conversation has been about B’s activities, and she is then the Cb in (48a). When B takes her turn, she shifts and talks about Cristina, previously introduced. She could have used a stressed personal pronoun (ella), especially given that there is no competing referent, but instead chose to repeat the proper name.
(48)
A: a. Mary, tú fuiste por tu vestido rojo donde Cristina.
      ‘Mary, did you go get your red dress from Cristina’s?’
B: b. Mmm. Ay, sí, pero Cristina está en Bogotá
      ‘Mmm. Oh, yes, but Cristina is in Bogotá.’
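The shares discussed for table 8.4 can be recomputed directly from the raw counts. The following Python sketch is offered only as an illustration: the counts are copied from table 8.4, while the names FORMS, COUNTS, and shares are hypothetical labels introduced for the example.

```python
# Counts of Cb realizations per transition type, copied from table 8.4.
FORMS = ["zero pronoun", "clitic", "pronoun", "demonstrative pronoun", "full NP", "other"]
COUNTS = {
    "continue": [350, 114, 53, 15, 86, 18],
    "retain": [44, 48, 8, 4, 26, 26],
    "smooth shift": [74, 24, 9, 4, 22, 5],
    "rough shift": [9, 14, 4, 4, 9, 1],
}


def shares(transition: str) -> dict:
    """Percentage of each referring expression within one transition type."""
    row = COUNTS[transition]
    total = sum(row)
    return {form: round(100 * n / total, 1) for form, n in zip(FORMS, row)}


for name in COUNTS:
    pct = shares(name)
    print(f"{name:13s} N={sum(COUNTS[name]):3d}  pronoun={pct['pronoun']:4.1f}%  full NP={pct['full NP']:4.1f}%")
```

Because the sketch rounds to one decimal place, a value may differ from the printed table in the last digit (9 out of 41, for instance). The column totals match the N row of the table.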
Clitics are, after null pronouns, the preferred form of realization across transition types. They are used in continue to refer to the speaker quite often, with psychological verbs (49), or other verbs, as indirect objects (50). (49)
me      parece         lo   mejor  dejar-lo
CL.1SG  seem:3sg.pres  the  best   leave:inf-cl.3sg.masc.acc
para  la   otra   semana,
for   the  other  week
‘(It) seems better to me to leave it for next week,’
(50)
para  que   me      lo               arregle            ahora  para  diciembre.
so    that  CL.1SG  cl.3sg.masc.acc  fix:3sg.pres.subj  now    for   December
‘So that (she) can fix it for me now for December.’
Clitics do not always refer to the speaker. They can refer to the interlocutor (51) or to a third party, as in (52), with a pronominal verb, se vino (‘came’).27
(51)
correcto  Mónica,  te          decía          el   viernes
correct   Mónica   CL.2SG.ACC  tell:1sg.pret  the  Friday
por  aquello  de  la   muerte  de  Gaitán,
for  that     of  the  death   of  Gaitán
‘Right Mónica, (I) was telling you Friday because of Gaitán’s death,’
(52)
Sí,  se      vino           para     acá   estar   conmigo.
yes  CL.3SG  come:3sg.pret  towards  here  be:inf  with.me
‘Yes, (she) came to be here with me.’
Demonstrative pronouns are not very frequent in general: there are only four instances each for retain and both shifts. They are more numerous in continue transitions (15 instances), but still only account for 2.4% of the Cbs in those. They are used to refer to both things (53) and people (54). In some cases, they also refer to abstract entities, as in (55). Gundel et al. (1993) found that demonstrative pronouns are rarely used for referents that are familiar or in focus, which would be the case with most Cbs in this study, and Ariel (1988) also found a very low level of demonstratives in her corpus analysis.
(53)
B: a. lo que sí son buenos, y no los sé usar, son los los enlaces para estar así platicando
      ‘What’s good, and (I) don’t know how to use them, are the the links to be like chatting.’
A: b. Ahá
      ‘Uh-huh.’
B: c. esos son buenos
      ‘Those are good.’
(54)
B: a. No, eh, ay, mami, el viernes se viene Alicia, la que tú tenías, para acá.
      ‘No, uh, uh, mami, on Friday comes Alicia, the one you used to have, here.’
B: b. Vamos a ver qué tal me resulta.
      ‘(We)’ll see how she turns out.’
A: c. Esa es una fiera.
      ‘That (one) is amazing.’ (lit. ‘She’s an animal.’)
(55)
A: a. Aquí aprovechando las llamaditas estas que nos dan, ah
      ‘Here, taking advantage of these calls (they) give us, uh’
B: b. Ay, sí, claro, pero eso cómo es, Chipi,
      ‘Ah, yes, right, but how’s that, Chipi,’
   c. eso cómo funciona.
      ‘How does that work?’
Finally, the category “Other” includes a number of other realizations: wh-pronouns, adverbial NPs, and possessive determiners and pronouns. Some of these appear frequently in retain transitions, in preparation for a change of topic. In (56), for instance, speaker A is expressing despair, and ends his turn with a rhetorical question that includes reference to himself in a null pronoun. Speaker B continues talking about speaker A, but uses a possessive determiner (tu ‘your’), trying to steer the conversation toward what exactly the problem is (resentimiento ‘resentment’).
(56)
A: . . . qué voy a hacer?
   ‘What am (I) going to do?’
B: ya, hay mucho resentimiento en tu voz, no?
   ‘I see, there’s a lot of resentment in your voice, isn’t there?’
In summary, a continue transition most often realizes the Cb as a zero pronoun, with a clitic as the next most frequent form. Retain transitions are also realized mainly through a zero pronoun or a clitic, although other possibilities exist. These realizations are as expected, and reflect the types of situations that the different transitions were meant to encode. The next section deals with realizations that appear to be contrary to expectation.
5.2. Realization against Expectation
The descriptions in the previous section are all of the type ‘x transition tends to encode the Cb in y form’. We have seen there are some clear tendencies. My concern here is the realizations that do not follow those tendencies. The most clearly stated tendency, in this chapter and in the literature, is that the Cb of a continue transition is realized via a reduced expression: zero pronoun, clitic, unaccented pronoun, and so on. Other realizations are said to make processing more difficult. For instance, Gordon et al. (1993: 341) establish that there exists a ‘repeated name penalty’, where repeating a name that continues to be the Cb in the discourse deprives the reader28 of an important cue that the current utterance is coherent with the previous one. And yet, 13.5% of the Cbs in continue transitions are realized as full noun phrases, many of them proper names.29 The explanations for repeated noun phrases all have to do with spoken language phenomena. For instance, in the CallHome corpus, speakers frequently ask about other friends or relatives. These exchanges typically involve one speaker mentioning the name of the person, and the other repeating the name, as in (57). (Proper names are included in the full NP category.)
(57)
A: Qué han sabido de Eddie.
   ‘What have you heard from Eddie?’
B: De Eddie nada,
   ‘From Eddie, nothing,’
This is quite frequent when the turn changes, but it also happens within a speaker’s turn. In (58), the speaker repeats Mónica, although the referent should be clear, and the clitic la would have sufficed. (58)
a. y Mónica sin embargo ha crecido un montón.
   ‘And Mónica, however, has grown a lot.’
b. Tu    papá  se      asombra            de  ver-la                  a   Mónica,
   your  Dad   cl.3sg  surprise:3sg.pres  of  see:inf-cl.3sg.fem.acc  to  Mónica
   ‘Your Dad is surprised to see her, Mónica,’
Brennan (1995) found that referents introduced in object position were then re-introduced in subject position with a full noun phrase. Only after that were they referred to with a pronoun. Brennan believes that the referent needs to be in subject position so that it can become a backward-looking center, and thus a candidate for pronominalization. This is the case in some of the examples, as in example (57), where the repeated NP/proper name becomes the backward-looking center of the utterance. I also found in the corpus instances of entities in subject position, but left-dislocated (Y Juan, ¿cómo está? ‘And Juan, how’s he?’). The proper name is repeated in subject position before it is pronominalized. It is possible that a neutral subject position is necessary before pronominalization takes place. In general, proper name repetition might be a device to establish common ground between the interlocutors. Downing (1996) points out that proper names are used very often in conversation: to introduce individuals in the conversation, as the most easily identifiable form of reference; and to refer again to those individuals, as a marker of true familiarity with the referent denoted by the proper noun.
In the ISL corpus, repeated referents across turns are either the participants or the dates being discussed. In (59), speaker B refers to herself with a full pronoun at the beginning of her turn. Amaral and Schwenter (2005) discuss cases like (59), and propose that the pronoun is obligatory, because it establishes a contrast.30
(59)
A: puedes reunirte conmigo en mayo?
   ‘can you meet with me in May?’
B: a   ver  yo  estoy  de  viaje   del       treinta  y    uno  hasta  . . .
   to  see  I   am     of  travel  from.the  thirty   and  one  until  . . .
   ‘let’s see, I am away from the 31st until . . . ’
In (60), speaker B uses a full NP, el jueves, to refer to the date being discussed, present in the immediately preceding utterance as a null pronoun. Note that in this case, contrast does not play a role.
(60)
A: a. creo que el jueves veintisiete, que lo tengo totalmente libre podría ser.
      ‘I think Thursday the 27th, which (I) have completely free, it could be.’
   b. qué te parece?
      ‘What do you think (of that date)?’
B: c. bueno. el jueves realmente es un día ocupado para mí.
      ‘Well, Thursday is actually a busy day for me.’
The presence or absence of the personal pronoun subject in Spanish has received a great deal of attention (e.g., Alonso-Ovalle et al., 2002; Cameron, 1992; Davidson, 1996; Enríquez, 1984). Stewart (1999) proposes that the use of the first person singular pronoun is a politeness resource, which helps contrast the speaker with other individuals or groups. Luján (1999) also points out the contrastive character of first and second person pronouns. This seems to be the case in the ISL corpus, where the speaker’s agenda is contrasted with the interlocutor’s. Davidson (1996) finds that the personal pronoun is used for emphasis and to negotiate conversational turns (to claim the floor for an extended period of time). He also found that the first person pronoun was used more frequently than second or third person pronouns. Those three factors might account for the presence of yo in examples such as (59).
Conversely, using a zero pronoun when something else is expected could result in more difficulty in processing. In example (61), speaker A has been talking about visiting his sister in Greece. The Cb at the end of A’s turn is sister. B then replies with a question, ‘isn’t (that) very expensive?’ There is no repeated entity across the turns, but B uses a zero for the third person singular subject of (61e). One possible referent is the idea of sightseeing, which A used at the end of his turn. It is possible that B realizes this potential misinterpretation, and reformulates in (61f), to specify that she is referring to the cost of the flight, not of doing tourism.31
(61)
A: a. Sí. Sí pues, es que se va a ir a Grecia
      ‘Yes. Yes, so she’s going to Greece’
   b. y luego se queda las tres últimas semanas
      ‘and then she’s staying the last three weeks’
   c. tres, tres semanas más, se queda
      ‘three, three more weeks, she’s staying’
   d. y    no   más   se      pasea
      and  not  more  cl.3sg  walk:3sg.pres
      ‘and she’s just going to do tourism.’ (lit. ‘she’s just going to walk around’)
B: e. Pues,  ¿no  te      sale               carísimo?
      but    not  cl.2sg  come.out:3sg.pres  very.expensive
      ‘Well, isn’t (that) very expensive for you?’
   f. o sea el avión, yo digo.
      ‘I mean the plane, I mean.’
6. Conclusions
I have presented an application of Centering theory to two corpora of spoken Spanish. The study contributes to an understanding of the relationship between Centering transitions and choice of referring expression. The analysis shows that, when the topic stays constant, that is, when a continue transition is present, the most common realization of the backward-looking center is a null pronoun. Null pronouns are also used in the other three transition types, likely because their referents are clearly identifiable from context, through person or number marking. Full noun phrases and pronouns are used quite often to encode the backward-looking center. This is contrary to the expectation that the topic of the utterance is encoded with the minimum amount of information. According to Gordon and colleagues (Gordon et al., 1993), there is a ‘repeated name penalty’ when using a more informative referring expression than necessary. It was found that speakers tend to repeat pronouns referring to themselves, and proper names referring to third persons. This occurs most often when there is a change of turn, but also within the turn. In spontaneous conversation, Downing (1996) found that proper names are often used, even when pronouns would ensure correct identification of the referent, to establish the referent in the discourse as common to both speakers.
It is possible, then, that we may be forced to revise Centering predictions as to center realization, to take into account spoken language phenomena. Another source of evidence in support of this view is that a number of empty backward-looking centers were attributed to the presence of side or insertion sequences, which are characteristic of conversation. This ties in with the relationship of Centering and the global structure of the discourse. Centering was designed as a model for the local focus of attention. It is not clear how Centering can relate the two levels. For instance, in (62), there is a global story about how speaker B’s boss was quite proud of his work in a particular situation, because speaker B and his boss had issued 2,700 notices for back taxes on cars. Speaker B then starts a small story about how his boss came to know that he had done well in comparison to others. The story covers utterances (62c) to (62j). In utterance (62k), speaker B refers again to his boss, which is part of the global focus of discourse, as part of the ‘we’ in (62a). However, in (62k) the subject el tipo ‘the guy’ cannot be linked to the immediately preceding utterance, and thus results in an empty backward-looking center.
(62)
B: a. en en en cinco días - hicimos dos mil setecientas citaciones
      ‘in in five days, (we) did two thousand seven hundred notices’
A: b. ahá
      ‘uh huh’
B: c. cuando él fue a la reunión de los abogados
      ‘when he went to the meeting with the lawyers’
   d. todos habían hecho cien, ciento veinte
      ‘(they) had all done a hundred, a hundred and twenty’
   e. no lo podían creer, viste
      ‘(they) couldn’t believe it, you know’
A: f. mirá, vos
      ‘really?’
B: g. y, pero así también fue la gente que empezó a caer
      ‘and, also that’s how people realized’
   h. imagínate, la mitad de la gente, toda caliente
      ‘imagine, half the people, all mad’
   i. porque le pedían impuestos que ya se, autos de hace treinta años que se transfirieron
      ‘because (they) were being asked for taxes for cars that already, cars that had been transferred thirty years ago’
   j. [pause] que no existen más, viste, ¡una goma! [A: { laugh } ] tremenda,
      ‘that don’t exist any more, you see, what a situation!’
   k. entonces viste, el tipo vino calentón, así
      ‘then, you see, the guy came back all excited, you know.’
Future research will be focused on the relationship between Centering and the discourse structure of the conversations, paying attention to conversational phenomena such as side sequences and turn-taking. I will also study the relationship between the local focus of attention (which Centering was devised to model) and the global structure of the conversations.
NOTES
1. From J. K. Rowling (2003) Harry Potter and the Order of the Phoenix. Vancouver: Raincoast Books (p. 8).
2. Small capitals indicate that the list contains entities, not their linguistic realization. The reference to Dudley is conveyed by two different referring expressions: their son and Dudley.
3. Centering transitions are just one explanation for coherence. A text can be coherent without repeating or referring to the same entities (Brown and Yule, 1983: 195–199; Poesio et al., 2000).
4. Other proposals suggest that transitions for utterances after an empty Cb should be different: if Cb_i is not empty, but Cb_{i-1} is, the transition is a center establishment; if Cb_i is empty and it follows an also empty Cb_{i-1}, the transition is null. It is only when Cb_i is empty, and Cb_{i-1} is not that we have a zero transition (Kameyama, 1986; Poesio et al., 2004).
5. Their performance measures were based on (i) number of zero Cbs, (ii) whether the Cb that Centering found corresponded with a loose notion of sentence topic, and (iii) number of cheap vs. expensive transitions.
The cheap/expensive distinction refers to inference load on the hearer (Strube and Hahn, 1999), according to whether Cp_{n-1}, expected to be Cb_n, is actually realized as such.
6. Spanish examples are glossed word-by-word only when the gloss provides information considered relevant. In all other cases, they are translated as close to the original as possible, which may sometimes make them sound awkward. Parentheses around a pronoun in the translation indicate that it is null in Spanish. Slashes (/eh/) indicate filled pauses or backchannels. Angle brackets (<de>) indicate false starts.
7. For a more detailed explanation of the segmentation, see Hadic Zabala and Taboada (2004) and Taboada and Hadic Zabala (2005).
8. I believe vocatives should be part of the Cf list (see Lambrecht, 1994 about vocatives being topics, and therefore referential), but I am not sure where they belong in the Cf ranking. The current coding includes them in the highest position, following Lambrecht’s (1994) suggestion that they are topics.
9. There is one further complication in (15), a request for repetition in (15c). In (16) I have excluded that turn, since it does not contain entities, under either view (with or without inferable entities).
10. Although Wanner (1994) and Heap (1998) discuss how empathy affects the ordering of clitics in Spanish.
11. Abbreviations used in the examples: 1/2/3 – first/second/third person; cl – clitic; nom – nominative; acc – accusative; dat – dative; sg – singular; pl – plural; fem – feminine; masc – masculine; poss – possessive; pres – present; pret – preterite; inf – infinitive; ger – gerund; subj – subjunctive; cond – conditional.
12. Thanks to Jeanette Gundel and Nancy Hedberg for bringing up this point and suggesting the example.
13. Zaenen et al. (2004) discuss previous literature on the importance of animacy in a number of areas, including the choice between Saxon genitive and the of-genitive, which may affect ranking in Centering.
14. See Heap (1998) for an Optimality Theory account of how empathy is also involved in non-standard rearrangements of clitics.
15. This Cf template is slightly different from previous proposals (Taboada, 2002a, 2002b).
16. Gordon et al. (1999) suggest that the head of the NP (i.e., the possessed) is always the most salient. However, their experiments were based on NPs with animate possessor and possessed. The experiments were designed to test (and debunk) a linearity hypothesis (Gernsbacher and Hargreaves, 1988; Walker and Prince, 1996), but they were all conducted in English. Further crosslinguistic experiments would be desirable: in Spanish, and in other languages, possessives with two full NPs (e.g., Mary’s letters) have a different word order (las cartas de María). Tetreault (2001) shows that an anaphora resolution algorithm performs better using Gordon and colleagues’ ranking—though the corpus used was English as well.
17. Poesio et al. (2004) discuss the need for a second criterion when two entities may be ranked in the same place. They use linearity.
18. Pesetsky (1987; 2000) proposes that some wh-words are D(iscourse)-linked; that is, they ask a question whose answer is drawn from a salient set. However, he says that only which questions are D-linked. I think that all wh-words establish a link between the question and its answer.
19. Thanks to Laurie Fais for this point and for the example.
20. Transition preference for individual utterances is perhaps not enough of a reason to consider the alternative. Rule 2 is mostly about preference for sequences of certain transitions. Another complicating factor is that left-dislocation may not signal salience: Givón’s (1983) topic accessibility scale ranks left-dislocated NPs as less accessible than neutral-ordered NPs. It is not clear whether less accessible in Givón’s scale means more salient in Centering terms.
21. The speakers came from all corners of the Spanish-speaking world. For more details on the corpus, see Taboada (2004).
22. Participants were also speakers of different dialects. Details about the transcriptions are available at: http://www.ldc.upenn.edu/Catalog/docs/LDC96T17/ch_span.txt.
23. Identifying discourse segments is not a trivial matter. My observations here about when discourse segments start are impressionistic; rigorous analysis and annotation needs to be done to integrate Centering into the global structure of the discourse.
24. See Taboada (2000; 2004, ch. 6) for a discussion of discourse segments in the ISL conversations. A new discourse segment was always initiated when a new date is being proposed, as evidenced by a break in the chain of cohesive links in the conversation.
25. Adverbials that are added to the Cf list are mostly those that denote times and places.
26. Di Eugenio analyzed excerpts from two novels, newspaper articles, short stories, and a bulletin board post. There is a difference between Di Eugenio’s analysis and mine: she studied the realization of the subject; I examine the Cb.
27. The word se in this example is a clitic, co-referential with the subject, and different from the se in example (25). This se is in a paradigm with other clitics: me for first person singular subject; te for second person singular subject, etc. These constructions are referred to as pseudo-reflexive or middle voice constructions (Mendikoetxea, 1999). They appear to be reflexive, but are used with intransitive verbs, some of which have both an intransitive and a pseudo-reflexive use (hence the term ‘pronominal verbs’ when used pseudo-reflexively). See also Sharp (2005) for a unified account of all instances of se in Spanish.
28. Gordon et al.’s (1993) experiments were written. My explanations for the lack of ‘repeated name penalty’ are all related to the fact that the data analyzed here is spoken.
29. Di Eugenio (1998) found a few instances of strong pronouns in subject position with continue transitions. She relates it to the transition type preceding the continue, a possibility I have not yet explored in my data.
30. Dimitriadis (1996) proposes that a pronoun is chosen when the antecedent is not the Cp of the previous sentence (i.e., it is not the most salient entity in the previous sentence). It is possible that that is the case in many situations, but not in example (59), where tú (‘you’), the null pronoun from the first utterance, is realized as a strong pronoun (yo) in the second utterance, of course with the change in person due to the change of speaker. Contrast and the change of turn seem to be the decisive factors here.
31. Geluykens (1994) attributes this type of repair to a conflict between principles of Clarity and Economy, derived from Grice’s (1975) maxims.
REFERENCES Alonso-Ovalle, Luis, Susana Fernández-Solera, Lyn Frazier and Charles Clifton. (2002). Null vs. overt pronouns and the topic-focus articulation in Spanish. Rivista di Linguistica (Italian Journal of Linguistics), 14 (2), 151–170. Amaral, Patrícia Matos and Scott A. Schwenter. (2005). Contrast and the (non-) occurrence of subject pronouns. In D. Eddington (Ed.), Selected Proceedings of the 7th Hispanic Linguistics Symposium (pp. 116–127). Somerville, Mass.: Cascadilla Press. Ariel, Mira. (1988). Referring and accessibility. Journal of Linguistics, 24, 65–87. —— . (1996). Referring expressions and the +/− coreference distinction. In T. Fretheim and J. K. Gundel (Eds.), Reference and Referent Accessibility (pp. 13–25). Amsterdam and Philadelphia: John Benjamins. Asher, Nicholas. (1993). Reference to Abstract Objects in Discourse. Dordrecht: Kluwer.
Bolinger, Dwight L. (1979). Pronouns in discourse. In T. Givón (Ed.), Syntax and Semantics, Vol. 12: Discourse and Syntax (pp. 289–309). New York: Academic Press. Brennan, Susan E. (1995). Centering attention in discourse. Language and Cognitive Processes, 10 (2), 137–167. Brown, Gillian and George Yule. (1983). Discourse Analysis. Cambridge: Cambridge University Press. Butt, John and Carmen Benjamin. (2000). A New Reference Grammar of Modern Spanish. Chicago: McGraw-Hill, 3rd ed. Byron, Donna K. (2002). Resolving pronominal reference to abstract entities, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL-02) (pp. 80–87). Philadelphia, Pa. Byron, Donna K. and Amanda Stent. (1998). A preliminary model of Centering in dialog, Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL-98) (pp. 1475–1477). Montréal, Canada. Cameron, Richard. (1992). Pronominal and Null Subject Variation in Spanish: Constraints, Dialects, and Functional Compensation. Unpublished Ph.D. dissertation, University of Pennsylvania. Carreiras, Manuel, Morton Ann Gernsbacher and Victor Villa. (1995). The advantage of first mention in Spanish. Psychonomic Bulletin and Review, 2 (1), 124–129. Clark, Herbert H. (1977). Bridging. In P. N. Johnson-Laird and P. C. Wason (Eds.), Thinking: Readings in Cognitive Science (pp. 411–420). Cambridge: Cambridge University Press. Cornish, Francis. (1999). Anaphora, Discourse, and Understanding: Evidence from English and French. Oxford: Clarendon. —— . (2005). Degrees of indirectness: Two types of implicit referents and their retrieval via unaccented pronouns. In A. Branco, T. McEnery and R. Mitkov (Eds.), Anaphora Processing: Linguistic, Cognitive and Computational Modelling (pp. 199–220). Amsterdam and Philadelphia: John Benjamins. Cote, Sharon. (1998). Ranking forward-looking centers. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 55–69). 
Oxford: Clarendon. Davidson, Brad. (1996). ‘Pragmatic weight’ and Spanish subject pronouns: The pragmatic and discourse uses of ‘tú’ and ‘yo’ in spoken Madrid Spanish. Journal of Pragmatics, 26 (4), 543–565. Di Eugenio, Barbara. (1990). Centering theory and the Italian pronominal system, Proceedings of the 13th International Conference on Computational Linguistics (COLING-90) (pp. 270–275). Helsinki, Finland. —— . (1998). Centering in Italian. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 115–137). Oxford: Clarendon. Dimitriadis, Alexis. (1996). When pro-drop languages don’t: Overt pronominal subjects and pragmatic inference. In L. M. Dobrin, K. Singer and L. McNair (Eds.), CLS 32: Papers from the Main Session (pp. 33–47). Chicago: Chicago Linguistics Society. Downing, Pamela A. (1996). Proper names as a referential option in English conversation. In B. A. Fox (Ed.), Studies in Anaphora (pp. 95–143). Amsterdam and Philadelphia: John Benjamins.
Eckert, Miriam and Michael Strube. (1999). Resolving discourse deictic anaphora in dialogues, Proceedings of the 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL-99) (pp. 37–44). Bergen, Norway. Edelsky, Carole. (1981). Who’s got the floor? Language in Society, 10, 383–421. Enríquez, Emilia. (1984). El pronombre personal sujeto en la lengua española hablada en Madrid. Madrid: Consejo Superior de Investigaciones Científicas. Fais, Laurel. (2004). Inferable centers, Centering transitions and the notion of coherence. Computational Linguistics, 30 (2), 119–150. Fernández Soriano, Olga. (1999). El pronombre personal: Formas y distribuciones. Pronombres átonos y tónicos. In I. Bosque and V. Demonte (Eds.), Gramática descriptiva de la lengua española (Vol. 1: Sintaxis básica de las clases de palabras, pp. 1209–1273). Madrid: Espasa. Geluykens, Ronald. (1994). The Pragmatics of Discourse Anaphora in English: Evidence from Conversational Repair. Berlin: Mouton de Gruyter. Gernsbacher, Morton Ann and David J. Hargreaves. (1988). Accessing sentence participants: The advantage of first mention. Journal of Memory and Language, 27 (6), 699–717. Givón, Talmy. (1983). Topic continuity in discourse: An introduction. In T. Givón (Ed.), Topic Continuity in Discourse: A Quantitative Cross-Language Study (pp. 1–41). Amsterdam and Philadelphia: John Benjamins. GNOME. (2000). GNOME Project Final Report. Edinburgh: University of Edinburgh. Gordon, Peter C., Barbara J. Grosz and Laura A. Gilliom. (1993). Pronouns, names, and the Centering of attention in discourse. Cognitive Science, 17 (3), 311–347. Gordon, Peter C., Randall Hendrick, Kerry Ledoux and Chin Lung Yang. (1999). Processing of reference and the structure of language: An analysis of complex noun phrases. Language and Cognitive Processes, 14 (4), 353–379. Grice, H. Paul. (1975). Logic and conversation. In P. Cole and J. L. Morgan (Eds.), Speech Acts. 
Syntax and Semantics, Volume 3 (pp. 41–58). New York: Academic Press. Grosz, Barbara J., Aravind K. Joshi and Scott Weinstein. (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21 (2), 203–225. Grosz, Barbara J. and Candace L. Sidner. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12 (3), 175–204. Gundel, Jeanette K., Nancy Hedberg and Ron Zacharski. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69, 274–307. Hadic Zabala, Loreley and Maite Taboada. (2004). Centering Theory in Spanish: Coding Manual. Unpublished manuscript, Simon Fraser University. Halliday, Michael A. K. (1967). Notes on Transitivity and Theme in English. Part II. Journal of Linguistics, 3, 199–244. Halliday, Michael A. K. and Ruqaiya Hasan. (1976). Cohesion in English. London: Longman. Halliday, Michael A. K. and Christian M.I.M. Matthiessen. (2004). An Introduction to Functional Grammar (3rd ed.). London: Arnold.
Heap, David. (1998). Optimalizing Iberian clitic sequences. In J. Lema and E. Treviño (Eds.), Theoretical Analyses on Romance Languages (pp. 227–248). Amsterdam and Philadelphia: John Benjamins. Hudson-D’Zmura, Susan and Michael K. Tanenhaus. (1998). Assigning antecedents to ambiguous pronouns: The role of the center of attention as the default assignment. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 199–226). Oxford: Clarendon. Hurewitz, Felicia. (1998). A quantitative look at discourse coherence. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 273–291). Oxford: Clarendon. Jackendoff, Ray. (1990). Semantic Structure. Cambridge, Mass.: MIT Press. Jaeggli, Osvaldo A. (1986). Arbitrary plural pronominals. Natural Language and Linguistic Theory, 4, 43–76. Jefferson, Gail. (1972). Side sequences. In D. Sudnow (Ed.), Studies in Social Interaction (pp. 294–338). New York: Free Press. Kameyama, Megumi. (1986). A property-sharing constraint in Centering, Proceedings of the 24th Annual Meeting of Association for Computational Linguistics (ACL-86) (pp. 200–206). New York, USA. —— . (1998). Intrasentential Centering: A case study. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 89–112). Oxford: Clarendon. Kuno, Susumu. (1987). Functional Syntax: Anaphora, Discourse and Empathy. Chicago: University of Chicago Press. Lambrecht, Knud. (1994). Information Structure and Sentence Form: Topic, Focus, and the Mental Representation of Discourse Referents. Cambridge: Cambridge University Press. Levin, Lori, Klaus Ries, Ann Thyme-Gobbel and Alon Lavie. (1999). Tagging of speech acts and dialogue games in Spanish Call Home, Proceedings of ACL-99 Workshop on Discourse Tagging. College Park, Md. Luján, Marta. (1999). Expresión y omisión del pronombre personal. In I. Bosque and V. Demonte (Eds.), Gramática descriptiva de la lengua española (Vol. 
1: Sintaxis básica de las clases de palabras, pp. 1275–1315). Madrid: Espasa. Mendikoetxea, Amaya. (1999). Construcciones con se: medias, pasivas e impersonales. In I. Bosque and V. Demonte (Eds.), Gramática descriptiva de la lengua española (Vol. 2: Las construcciones sintácticas fundamentales; Relaciones temporales, aspectuales y modales, pp. 1631–1722). Madrid: Espasa. Miltsakaki, Eleni and Karen Kukich. (2004). Evaluation of text coherence for electronic essay scoring systems. Natural Language Engineering, 10 (1), 25–55. Pesetsky, David. (1987). Wh-in-situ: Movement and unselective binding. In E. Reuland and A. G. B. ter Meulen (Eds.), The Representation of (In)definiteness (pp. 98–129). Cambridge, Mass.: MIT Press. Pesetsky, David. (2000). Phrasal Movement and Its Kin. Cambridge, Mass.: MIT Press. Poesio, Massimo, Hua Cheng, Renate Henschel, Janet Hitzeman, Rodger Kibble and Rosemary Stevenson. (2000). Specifying the parameters of Centering Theory: A corpus-based evaluation using text from application-oriented domains, Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000) (pp. 400–407). Hong Kong.
Poesio, Massimo, Rosemary Stevenson, Barbara Di Eugenio and Janet Hitzeman. (2004). Centering: A parametric theory and its instantiations. Computational Linguistics, 30 (3), 309–363. Prince, Ellen F. (1981). Towards a taxonomy of given-new information. In P. Cole (Ed.), Radical Pragmatics (pp. 223–255). New York: Academic Press. Roberts, Craige. (1998). The place of Centering in a general theory of anaphora resolution. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 359–399). Oxford: Clarendon. Schegloff, Emmanuel, Gail Jefferson and Harvey Sacks. (1977). The preference for self-correction in the organization of repair in conversation. Language, 53, 361–382. Schiffrin, Deborah. (1994). Approaches to Discourse. Malden, Mass.: Blackwell. Sharp, Randy. (2005). A unified treatment of Spanish se. In A. Branco, T. McEnery and R. Mitkov (Eds.), Anaphora Processing: Linguistic, Cognitive and Computational Modelling (pp. 113–136). Amsterdam and Philadelphia: John Benjamins. Sidner, Candace L. (1983). Focusing in the comprehension of definite anaphora. In M. Brady and R. C. Berwick (Eds.), Computational Models of Discourse (pp. 267–330). Cambridge, Mass.: MIT Press. Stevenson, Rosemary, Rosalind A. Crawley and David Kleinman. (1994). Thematic roles, focus and the representation of actions. Language and Cognitive Processes, 9, 519–548. Stewart, Miranda. (1999). Hedging your bets: The use of yo in face-to-face interaction. Web Journal of Modern Language Linguistics, 4–5, http://wjmll.ncl. ac.uk/issue04-05/stewart.htm. Strube, Michael and Udo Hahn. (1999). Functional Centering: Grounding referential coherence in information structure. Computational Linguistics, 25 (3), 309–344. Suñer, Margarita. (1988). The role of agreement in clitic-doubled constructions. Natural Language and Linguistic Theory, 6, 391–434. Taboada, Maite. (2000). Cohesion as a measure in generic analysis. In A. Melby and A. Lommel (Eds.), The 26th LACUS Forum (pp. 
35–49). Chapel Hill, N.C.: The Linguistic Association of Canada and the United States. Taboada, Maite. (2002a). Centering and pronominal reference: In dialogue, in Spanish, Proceedings 6th Workshop on the Semantics and Pragmatics of Dialog, EDILOG (pp. 177–184). Taboada, Maite. (2002b). Foco y pronominalización en la lengua hablada: Una primera aproximación. Documentos de español actual, 3–4, 173–200. Taboada, Maite. (2004). Building Coherence and Cohesion: Task-Oriented Dialogue in English and Spanish. Amsterdam and Philadelphia: John Benjamins. Taboada, Maite and Loreley Hadic Zabala. (2005). What are the units of discourse structure? Segmenting discourse within Centering Theory. Unpublished manuscript, Simon Fraser University. Tetreault, Joel R. (2001). A corpus-based evaluation of Centering and pronoun resolution. Computational Linguistics, 27 (4), 507–520. Turan, Ümit Deniz. (1995). Null vs. Overt Subjects in Turkish Discourse: A Centering Analysis. Unpublished Ph.D. dissertation, University of Pennsylvania, Philadelphia.
Walker, Marilyn A. (1998). Centering, anaphora resolution, and discourse structure. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 401–435). Oxford: Clarendon. Walker, Marilyn A., Masayo Iida and Sharon Cote. (1994). Japanese discourse and the process of Centering. Computational Linguistics, 20 (2), 193–232. Walker, Marilyn A., Aravind K. Joshi and Ellen F. Prince. (1998). Centering in naturally occurring discourse: An overview. In M. A. Walker, A. K. Joshi and E. F. Prince (Eds.), Centering Theory in Discourse (pp. 1–28). Oxford: Clarendon. Walker, Marilyn A. and Ellen F. Prince. (1996). A bilateral approach to givenness: A hearer-status algorithm and a Centering algorithm. In J. K. Gundel and T. Fretheim (Eds.), Reference and Referent Accessibility (pp. 291–306). Wanner, Dieter. (1994). El orden de los clíticos agrupados en castellano. Thesaurus, 49 (1), 1–57. Webber, Bonnie Lynn. (1981). Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes, 6 (2), 107–135. Yngve, Victor H. (1970). On getting a word in edgewise. In Papers from the Sixth Regional Meeting of the Chicago Linguistics Society (pp. 567–577). Chicago: University of Chicago. Zaenen, Annie, Jean Carletta, Gregory Garretson, Joan Bresnan, Andrew Koontz-Garboden, Tatiana Nikitina, M. Catherine O’Connor and Tom Wasow. (2004). Animacy encoding in English: Why and how. In B. Webber and D. K. Byron (Eds.), Proceedings of ACL-2004 Workshop on Discourse Annotation (pp. 118–125). Barcelona, Spain.
9
Linguistic Claims Formulated in Terms of Centering: A Re-Examination Using Parametric CB-Tracking Techniques
Massimo Poesio
1. Introduction
The notion of ‘discourse topic’1 plays an important role in a variety of linguistic theories: for example, in theories about the factors that affect the choice of np form (Givon, 1983; Ariel, 1990; Gundel et al., 1993), especially as concerns the use of empty subjects or objects in languages that allow them (Kameyama, 1985; Walker et al., 1994; Di Eugenio, 1998; Prince, 1998), and in theorizing about languages in which ‘topics’ occupy fixed positions or topichood licenses certain types of movement, such as scrambling (Vallduvi, 1990; Rambow, 1993; Portner and Yabushita, 1998). One of the reasons for the interest in Centering (Grosz et al., 1995) among linguists is the hope that the theory may make this elusive notion of ‘discourse topic’ more precise, and such claims easier to verify (see, e.g., Gundel, 1998; Beaver, 2004). In fact, however, a number of the key concepts of Centering are not fully defined, and behave more like ‘parameters’ to be ‘instantiated’ in different ways for each language (Walker et al., 1994; Poesio et al., 2004). In previous work, we attempted to identify more precisely the main claims of the theory, and compared several of its instantiations by developing reliable techniques to annotate a corpus with the information required to test Centering, and scripts that could be used to automatically compute the cfs and cb of each utterance so as to find which instantiation led to the greater number of violations of the theory’s claims. In this chapter, we use these results to provide an independent evaluation of several claims made in the literature concerning the linguistic impact of topics. Conversely, we hope that these claims may serve as a different type of evaluation of the theory, and thus as a different way of identifying the ‘best’ parameter instantiation, the ‘best’ instantiation being the one under which more of these claims are verified.
2. Centering: A Parametric Theory
The notion of ‘topic’ or ‘discourse focus’ is notoriously difficult to formalize. We used as the basis for our investigation of this notion the terminology and ideas introduced in Centering Theory by Grosz et al. (1995) and Walker et al. (1998), in particular the notions of Backward-Looking Center (cb) and Preferred Center (cp). In the ‘mainstream’ version of Centering by Grosz et al. (1995), it is assumed that each utterance introduces new discourse entities (or Forward-Looking Centers, abbreviated as cfs) into the discourse, and in so doing, updates the ‘local focus’. For example, (1) mentions at least five discourse entities / cfs: the corner cupboard, the drawing, an engraving of the cupboard, Branicki, and Branicki’s attention, plus possibly abstract objects such as events and states.
(1) The drawing of the corner cupboard, or more probably an engraving of it, must have caught Branicki’s attention.
One of the results of the update of the local focus will be to single out these five entities as the most recently mentioned ones. In addition, according to Centering, the discourse entities / cfs realized by an utterance such as (1) are ranked; the most highly ranked entity in an utterance is called the Preferred Center, or cp. The cb is Centering’s equivalent of the notion of ‘topic’ or ‘focus’, and is defined as follows:
CB: cb(Ui), the backward-looking center of utterance Ui, is the highest ranked element of cf(Ui−1) that is realized in Ui.
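The definition of the cb can be read as a simple search over the ranked cf list of the previous utterance. The following Python fragment is only a sketch of that reading: the function name, the entity labels, the particular ranking assumed for sentence (1), and the follow-up utterance are all hypothetical and are not taken from the scripts described in this chapter.

```python
from typing import Optional, Sequence, Set


def backward_looking_center(prev_cfs_ranked: Sequence[str],
                            realized_now: Set[str]) -> Optional[str]:
    """Return cb(U_i): the highest-ranked element of cf(U_{i-1})
    that is realized in U_i, or None if there is no such element."""
    for entity in prev_cfs_ranked:      # ordered from highest to lowest rank
        if entity in realized_now:
            return entity
    return None


# Hypothetical ranking of the five cfs of sentence (1); the real ranking
# depends on how the 'ranking' parameter is instantiated.
cfs_u1 = ["drawing", "cupboard", "engraving", "attention", "branicki"]
cp_u1 = cfs_u1[0]                       # the Preferred Center of (1)

# Invented follow-up utterance that realizes two of those entities.
realized_u2 = {"branicki", "cupboard"}
cb_u2 = backward_looking_center(cfs_u1, realized_u2)
```

Under these assumptions the cb of the follow-up utterance is the cupboard, because it outranks Branicki in the cf list of (1); with a different ranking the result could change, which is exactly why ranking behaves as a parameter.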
Note that the theory provides no definition of the notions of ‘ranking’, ‘utterance’ and ‘realization’; researchers using the theory have to specify their own. In Poesio et al. (2004), several ways of instantiating Centering Theory’s parameters were studied, using the theory’s claims as a way of comparing these instantiations. We will assume the results of Poesio et al. (2004), and only consider in this chapter those instantiations of the ranking, utterance, and realization parameters found to result in the fewest violations of the claims of Centering in that earlier study. One of the findings of Poesio et al. was that Centering’s claim that most discourse segments are locally coherent (Constraint 1, see later discussion) crucially depends on defining realization so as to allow for indirect realization: that is, a discourse entity counts as realized in an utterance even if that utterance only contains an associative reference to that entity, as in (2), where the door in (u2) is an associative reference to the house. Even though the house is not directly mentioned in (u2), it will be counted as realized in it; in fact, it will be its cb.
(2) (u1) John walked toward the house.
    (u2) The door was open.
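The effect of allowing indirect realization can be illustrated on example (2). The sketch below is an assumption-laden illustration rather than the procedure actually used in the study: the `bridges` mapping, the entity labels, and the ranking of John above the house are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def realized_entities(mentions: Set[str],
                      bridges: Dict[str, str],
                      allow_indirect: bool) -> Set[str]:
    """Entities realized by an utterance: its direct mentions and,
    if indirect realization is allowed, the anchors of any bridging
    (associative) references among those mentions."""
    realized = set(mentions)
    if allow_indirect:
        realized |= {bridges[m] for m in mentions if m in bridges}
    return realized


def cb(prev_cfs_ranked: List[str], realized: Set[str]) -> Optional[str]:
    """Highest-ranked cf of the previous utterance realized in this one."""
    return next((e for e in prev_cfs_ranked if e in realized), None)


u1_cfs = ["john", "house"]      # assumed ranking for (u1)
u2_mentions = {"door"}          # (u2) only mentions the door directly
bridges = {"door": "house"}     # 'the door' is associated with 'the house'

cb_direct = cb(u1_cfs, realized_entities(u2_mentions, bridges, False))
cb_indirect = cb(u1_cfs, realized_entities(u2_mentions, bridges, True))
```

With direct realization only, (u2) has no cb; once the bridging link is counted, the house becomes its cb, as described in the text.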
Two more parameters are relevant for the discussion in this chapter: the definition of ‘utterance’ and the ranking function. In this study, we tested for each of these two parameters two definitions proposed in the literature which, according to the results of Poesio et al., could yield instantiations satisfying the theory’s claims. As concerns the definition of utterance, we studied the results obtained by identifying utterances either with sentences, as implicitly done in much work on Centering, or with finite clauses, as suggested by Kameyama (1998). For ranking, we considered, first of all, ranking based on grammatical function (Kameyama, 1985; Grosz et al., 1995), in which entities realized as subjects rank higher than entities realized as objects, and these higher than entities realized as adjuncts. More precisely, we tested the ranking function that Poesio et al. called gftherelin, which adds a disambiguation factor based on linear order to grammatical function ranking, so that, for example, in a clause containing two entities realized in adjunct position, the entity realized in the leftmost position has the higher rank. Secondly, we tested ranking based on givenness, as proposed by Strube and Hahn (1999). According to Strube and Hahn, the ranking of entities depends on their givenness status as defined by Prince (1992): HEARER-OLD entities rank higher than MEDIATED entities, which in turn rank higher than HEARER-NEW entities.
One of the most important claims of Centering is that packaging information in a certain way results in utterances that are easier to process than others. Two distinct dimensions affect how easily an utterance is processed. First of all, according to Grosz et al., following an utterance about a certain topic (cb) with a second utterance about the same topic is easier to process than if the second utterance is about a different topic. Secondly, utterances in which the topic (cb) is realized in the most prominent position (i.e., as a cp) are easier to process than utterances in which the cb is not also the cp. The result of this dual classification is the 2×2 classification of utterances into different types of transitions shown in table 9.1.2
Traditional presentations of Centering do not distinguish clearly between definitions (such as the definition of cb given here) and the actual claims of the theory. Poesio et al. (2004) identified three main such claims.
TABLE 9.1. Classification of Transitions in Centering Theory

                      cb(Un) = cb(Un−1)         cb(Un) ≠ cb(Un−1)
                      or cb(Un−1) = NIL
cb(Un) = cp(Un)       CONTINUE (con)            SMOOTH-SHIFT (ssh)
cb(Un) ≠ cp(Un)       RETAIN (ret)              ROUGH-SHIFT (rsh)
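The 2×2 classification in table 9.1 amounts to two Boolean tests. The function below is a minimal sketch of that table, assuming that the current utterance has a cb and that a missing previous cb is represented by `None`; the function name and the short labels are illustrative choices.

```python
from typing import Optional


def classify_transition(cb_prev: Optional[str], cb_now: str, cp_now: str) -> str:
    """Classify a transition following table 9.1.

    cb_prev is None when cb(Un-1) = NIL; cb_now is assumed to exist.
    """
    same_topic = cb_prev is None or cb_now == cb_prev   # column of the table
    cb_is_cp = cb_now == cp_now                         # row of the table
    if same_topic:
        return "CON" if cb_is_cp else "RET"
    return "SSH" if cb_is_cp else "RSH"


examples = [
    classify_transition("house", "house", "house"),   # same cb, cb is cp
    classify_transition("house", "house", "john"),    # same cb, cb is not cp
    classify_transition("house", "john", "john"),     # new cb, cb is cp
    classify_transition("house", "john", "mary"),     # new cb, cb is not cp
]
```

The four calls exercise the four cells of the table in the order continue, retain, smooth-shift, rough-shift.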
Constraint 1 (Strong): All utterances of a segment except for the first have exactly one cb.
Rule 1 (GJW95): If any cf is pronominalized, the cb is.
Rule 2 (BFP): Transition states are ordered. The continue transition (con) is preferred to the retain transition (ret), which is preferred to the smooth-shift transition (ssh), which is preferred to the rough-shift transition (rsh).
Constraint 1 expresses the original claim from Joshi and Kuhn (1979) and Joshi and Weinstein (1981) that discourses with exactly one (or no more than one) ‘topic’ at each point are easier to process. A weaker version of the Constraint, allowing also for utterances with no topic, can also be found in the literature:
Constraint 1 (Weak): All utterances of a segment except for the first have at most one cb.
(Note that both versions of the Constraint express a dispreference for utterances having more than one topic.)
Rule 1 is the main claim of Centering about pronominalization. In the version presented here, it states a preference for pronominalizing the cb, if anything is pronominalized at all. (Other versions also exist.) Finally, Rule 2 (bfp) is a claim about coherence: it states a preference for preserving the cb over changing it, and for preserving it as the most salient entity over changing its relative ranking.3 A point about the theory that should be kept in mind in what follows is that Centering does not state ‘hard’ facts about language (i.e., the kind of facts whose violation leads to ungrammaticality judgments) but preferences which, when followed, lead to texts that are easier to process. The mere presence of a few exceptions to a claim does not, therefore, count as a falsification. For one thing, we should expect these preferences to interact with other constraints (a point not always emphasized enough in the Centering literature). And secondly, there may be no way of expressing a particular piece of information without violating some such preferences. So, at best, we can expect the three claims to be verified in a statistical sense: that is, that the number of utterances that verify such claims will be significantly higher than the number of utterances that violate them—and in fact, we may find that for some claims, even statistical significance will not be achieved.
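Because the claims are preferences to be checked statistically, a natural first step is to count, for each claim, the proportion of utterances that violate it. The following sketch shows one way this could be done for Constraint 1 (both versions) and Rule 1 (GJW95). The `Utterance` record, the toy segment, and the decision to treat Rule 1 as not applicable when an utterance has no cb are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Utterance:
    """Hypothetical record for one non-initial utterance of a segment."""
    cbs: List[str]                                   # computed cb candidates
    pronominalized: Set[str] = field(default_factory=set)


def violates_strong_c1(u: Utterance) -> bool:
    """Strong Constraint 1: exactly one cb."""
    return len(u.cbs) != 1


def violates_weak_c1(u: Utterance) -> bool:
    """Weak Constraint 1: at most one cb."""
    return len(u.cbs) > 1


def violates_rule1(u: Utterance) -> bool:
    """Rule 1 (GJW95): if any cf is pronominalized, the cb is.
    Assumption: the rule is not applicable when there is no cb."""
    if not u.pronominalized or not u.cbs:
        return False
    return not any(cb in u.pronominalized for cb in u.cbs)


def violation_rate(utterances: List[Utterance],
                   check: Callable[[Utterance], bool]) -> float:
    """Proportion of utterances for which the check reports a violation."""
    if not utterances:
        return 0.0
    return sum(check(u) for u in utterances) / len(utterances)


# Invented toy segment (first utterance of the segment already excluded).
segment = [
    Utterance(cbs=["house"], pronominalized={"house"}),   # satisfies all three
    Utterance(cbs=[]),                                    # no cb at all
    Utterance(cbs=["john"], pronominalized={"mary"}),     # pronoun is not the cb
]
rates = {
    "strong_c1": violation_rate(segment, violates_strong_c1),
    "weak_c1": violation_rate(segment, violates_weak_c1),
    "rule1": violation_rate(segment, violates_rule1),
}
```

Rates of this kind are only raw counts; whether a claim is verified ‘in a statistical sense’ would still require an appropriate significance test on top of them.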
3. A Corpus-Based Investigation of Centering
The recent development of reliable annotation techniques for discourse, and the increased availability of discourse annotated corpora4 make it possible to subject the claims of seminal theories of salience and focus such as Centering to rigorous empirical testing. One of the main motivations for the work discussed in this chapter was the feeling that a variety of claims concerning the influence of topicality on language use could be given a more rigorous evaluation by building on the results of our already mentioned corpus-based study
of Centering (Poesio et al., 2004). In that work, we annotated a corpus with information claimed by theoreticians to affect the computation of the cb, and developed scripts that could simulate a variety of ways of updating the local focus, depending on their input parameters. These methods were used to test several definitions of cb and ways of setting the ‘parameters’ of Centering (in the sense discussed in the previous section) proposed in the literature, and to identify those which resulted in fewer violations of the claims of Centering. In this section, we discuss the data used in that study, how they were annotated, how the data were used to compute a variety of statistics about the claims of Centering, and the main results of the study.
3.1 The GNOME Corpus
We begin with a brief discussion of the corpus we used for this study, the gnome corpus, and of its annotation.
TEXTS CONTAINED IN THE GNOME CORPUS
The gnome corpus currently includes texts from three domains. The museum subcorpus consists of descriptions of museum objects and brief biographical texts discussing the lives of the artists that produced them.5 The pharmaceutical subcorpus is a selection of leaflets providing the patients with mandatory information about their medicine.6 The crucial property of these texts is that entity coherence (Poesio et al., 2004) is the main device to ensure their cohesion; relational coherence (i.e., coherence ensured by rhetorical relations, in the sense of Rhetorical Structure Theory (Mann and Thompson, 1988)) is less important. The third subcorpus consists of tutorial dialogues from the Sherlock corpus collected at the University of Pittsburgh, in which relational coherence plays a central role. Each subcorpus contains about 6,000 nps; in this study we used texts from the first two domains, for a total of about 3,000 nps, including about 500 pronouns, 1,100 other definite nps, 1,100 indefinite nps, and around 300 other nps including quantifiers, coordinated nps, gerunds, and so on. Among potential utterances, the corpus includes about 500 sentences and 900 finite clauses; the actual number of utterances used in the study is one of the parameters that we varied, as discussed later.
ANNOTATION AND MARKUP SCHEME
The annotation of the gnome corpus followed a systematic manual, available from the gnome corpus’s home page at http://cswww.essex.ac.uk/Research/nle/corpora/gnome/. (See Poesio (2004b, a) for details about the annotation; only the most important aspects of the scheme are discussed here.) A number of compromises are still necessary in discourse annotation, as it is often difficult to reach satisfactory levels of agreement for many types of information one would want to annotate.
One of the compromises we had to make in this work has to do with segmentation. Although a proper study of Centering would require segments to be identified, we couldn’t reach a
satisfactory level of agreement, so a heuristic approach was adopted. Instead of annotating segments, we relied on heuristics based on layout information, a great deal of which was annotated in our texts, including information about sections and paragraphs. The simplest such heuristics identified segments with certain types of text division (sections or paragraphs). We also tested the heuristic proposed by Walker (1989): treat every paragraph as a separate segment unless its first sentence contains a pronoun in subject position, or a pronoun whose agreement features are not matched by any other cf in the same sentence. These heuristics were all tested and compared by making the choice of segmentation heuristic one of the parameters of the scripts computing local focus information (see later discussion).
All sentences were also marked as <s> elements; all sub-sentential units of text that might be identified with utterances (in the Centering sense) were also marked, using a separate tag, <unit>. A variety of attributes of <s> and <unit> elements were also annotated (e.g., to identify finite and non-finite clauses, main clauses, etc.). Next, each np was marked with a <ne> (‘nominal expression’) tag and with a variety of attributes capturing syntactic and semantic properties. Important attributes for our purposes are cat (specifying the type of an np), gf (specifying its grammatical function), the agreement features, and the deix feature (whether the object is a visual deictic reference or not). Finally, a separate <ante> element was used to mark anaphoric relations; the <ante> element itself specifies the index of the anaphoric expression and the type of semantic relation (e.g., identity), whereas one or more embedded <anchor> elements indicate possible antecedents (the presence of more than one <anchor> element indicates that the anaphoric expression is ambiguous). All markup was done in xml. In (3) parts of the xml markup of sentence (1) are illustrated.
(3) <s ...> The drawing of the corner cupboard, or more probably an engraving of it, ...
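The paragraph-based segmentation heuristic attributed to Walker above can be sketched as follows. This is only one possible reading of the prose description: the `NP` record, the representation of agreement as a tuple, and the interpretation of ‘matched’ as identity of agreement features are assumptions, and each paragraph is represented simply by the nps of its first sentence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NP:
    """Hypothetical view of an annotated np in a paragraph's first sentence."""
    is_pronoun: bool
    gf: str                              # grammatical function, e.g. "subj"
    agreement: Tuple[str, str, str]      # (person, number, gender)


def continues_previous_segment(first_sentence: List[NP]) -> bool:
    """True if the paragraph should NOT start a new segment: its first
    sentence has a subject pronoun, or a pronoun whose agreement features
    are matched by no other cf in that sentence."""
    for i, np in enumerate(first_sentence):
        if not np.is_pronoun:
            continue
        if np.gf == "subj":
            return True
        others = first_sentence[:i] + first_sentence[i + 1:]
        if not any(o.agreement == np.agreement for o in others):
            return True
    return False


def segment_paragraphs(paragraphs: List[List[NP]]) -> List[List[int]]:
    """Group paragraph indices into segments using the heuristic."""
    segments: List[List[int]] = []
    for idx, first_sentence in enumerate(paragraphs):
        if segments and continues_previous_segment(first_sentence):
            segments[-1].append(idx)
        else:
            segments.append([idx])
    return segments


paragraphs = [
    [NP(False, "subj", ("3", "sg", "n"))],                  # full np subject
    [NP(True, "subj", ("3", "sg", "n"))],                   # subject pronoun
    [NP(False, "subj", ("3", "sg", "m")),
     NP(True, "obj", ("3", "sg", "m"))],                    # pronoun matched locally
]
segments = segment_paragraphs(paragraphs)
```

In this toy input the second paragraph is attached to the first because it opens with a subject pronoun, while the third starts a new segment because its only pronoun has a matching cf in the same sentence.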
3.2 Automatic Computation of Centering Information
Perl scripts working off the annotated corpus automatically compute utterances, cfs and cb according to the particular parameter instantiation chosen, find violations of Constraint 1, Rule 1, and Rule 2 (according to several versions of Rule 1 and Rule 2), and evaluate the claims using statistical tests. The behavior of the scripts is controlled by a number of parameters, including:
CBdef: which definition of cb should be used among those proposed in the literature, including those discussed in Grosz et al. (1983, 1995), Passonneau (1993), and Gordon et al. (1993).
uttdef: identify utterances with sentences, finite clauses, or verbed clauses.
realizes: which definition of realization to use: only allow direct realization, or allow indirect realization via bridging references as well.
cfselect: treat all nps as introducing cfs, or exclude certain classes. At the moment it is possible to omit first and second person nps, and/or nps in predicative position (e.g., a policeman in John is a policeman).
ranking: rank cfs according to grammatical function, linear order, or a combination of the two as in Gordon et al. (1993), or information status as in Strube and Hahn (1999).
prodef: only consider for the purposes of Rule 1 third person personal pronouns (it, he, she, they), or also demonstrative pronouns (that, these), and/or the second person pronoun (you).
segment(ation): identify segments using Walker’s heuristics, or with paragraphs, sections, or whole texts.
Among the many other script parameters whose effect will not be discussed here, we will just mention those that determine whether implicit anaphors in bridging references should be treated as cfs; the relative ranking of entities in complex nps; and how to handle ‘preposed’ adjunct clauses. The algorithm used to compute the statistics concerning the violations of the claims is fairly straightforward, and we will therefore omit it here.7
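To make the idea of a ‘parameter instantiation’ concrete, the parameters listed above can be thought of as fields of a configuration record. The Python sketch below is not the Perl code used in the study; the field names follow the parameter names in the text, but the string values, the defaults, and the validation logic are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical value sets, loosely following the options named in the text.
ALLOWED = {
    "uttdef": ("sentence", "finite_clause", "verbed_clause"),
    "realizes": ("direct", "indirect"),
    "ranking": ("gf", "linear", "gf_linear", "info_status"),
    "segment": ("walker", "paragraph", "section", "text"),
}


@dataclass(frozen=True)
class Instantiation:
    """One setting of (some of) the parameters that control the analysis."""
    uttdef: str = "finite_clause"
    realizes: str = "direct"
    ranking: str = "gf"
    segment: str = "walker"

    def __post_init__(self) -> None:
        # Reject values outside the assumed option sets.
        for name, options in ALLOWED.items():
            value = getattr(self, name)
            if value not in options:
                raise ValueError(f"{name}={value!r} is not one of {options}")


# A small grid: indirect realization, two utterance definitions,
# two ranking functions (grammatical function plus linear order,
# and information status).
grid = [
    Instantiation(uttdef=u, realizes="indirect", ranking=r)
    for u, r in product(("sentence", "finite_clause"),
                        ("gf_linear", "info_status"))
]
```

Representing an instantiation as an immutable record makes it straightforward to run the same violation counts over every cell of such a grid and compare the results.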
3.3 The Effect of Parameter Instantiation
As mentioned earlier, Poesio et al. identified three main claims of the theory: Constraint 1, Rule 1, and Rule 2 (each of which comes in several variants). The scripts just discussed compute the percentage of violations of each claim under each instantiation, which makes it possible both to test whether these claims are verified and to compare different instantiations. The first result obtained by Poesio et al. was that if the parameters are set in the most ‘mainstream’ way, the so-called Vanilla instantiation (identifying utterances with finite clauses, using grammatical function as ranking, and only allowing for direct realization), only Rule 1 (GJW 95 and GJW 83) is clearly verified. The results concerning Constraint 1 are especially negative: with this instantiation only 35% of utterances are continuous in the sense of Karamanis (2003) (i.e., cf(Un) ∩ cf(Un−1) ≠ ∅), so that only the weak version of Constraint 1 (requiring an utterance to have at most one cb) is verified. Strong C1, the best-known formulation of the constraint requiring every utterance to have exactly one cb, and the one that in our view best captures the idea of ‘entity coherence’, clearly doesn’t hold. Another interesting observation is that if ranking is only required to be partial, some utterances end up with more than one cb: the percentage of such utterances is only 1% with the Vanilla instantiation, but can be as high as 6% with some instantiations. This is perhaps obvious, but to our knowledge had not been previously discussed. The violations of Strong C1 can be eliminated by augmenting grammatical function ranking in the sense discussed earlier (subjects rank more highly than objects, which rank more highly than adjuncts) with a linear disambiguation factor, thus obtaining the ranking function gftherelin already mentioned. As for Rule 2, with the Vanilla instantiation the version proposed by Brennan et al.
is verified by a Page Rank test (Siegel and Castellan, 1988, pp. 184–188), but arguably, the most striking fact with this instantiation as far as transitions are concerned is the prevalence in the corpus of non-continuous transitions seldom or not at all discussed in the Centering literature: null transitions, in which an utterance without a cb is followed by a second one (47.9%); Establishments (est), in which an utterance without a cb is followed by an utterance which does have one (18.8%); and zeros, the opposite case of est (16.7%). All together, the four types of transitions falling under the remit of Rule 2 as classically formulated account for only 16% of utterances; and if Smooth Shifts and Rough Shifts are counted together, with the Vanilla instantiation there are more shifts than retains. Other classifications and versions of the rule do not correlate much better with the observed frequencies: for example, only 39% of entity-coherent transitions (139 out of 357), and 14% of the total, are cheap in the sense of Strube and Hahn (1999) (i.e., cp(Un−1) predicts cb(Un)). One of the goals of this study is to use linguistic claims such as the proposed correlation between form of the subject and type of transition to investigate proposals in the literature to ‘collapse’ some of these distinctions by, for example, merging est and con (Walker et al., 1994); we return to this topic in section 5.
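The transition types counted above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function names, the use of `None` for an utterance without a cb, and the string labels are hypothetical and are not taken from the scripts of Poesio et al. The tests for con, ret, ssh, and rsh follow the usual definitions of Brennan et al.; est, zero, and null follow the descriptions given in the text, and the last two helpers restate the ‘cheap’ and ‘continuous’ conditions quoted above.

```python
from typing import Iterable, Optional


def classify_transition(cb_prev: Optional[str],
                        cb_cur: Optional[str],
                        cp_cur: Optional[str]) -> str:
    """Label the transition from the previous utterance to the current one.

    cb_prev / cb_cur: backward-looking centers (None if the utterance has no cb).
    cp_cur: the most highly ranked forward-looking center of the current utterance.
    """
    if cb_cur is None:
        # The current utterance has no cb: zero if the previous one had a cb,
        # null if neither utterance has one.
        return "zero" if cb_prev is not None else "null"
    if cb_prev is None:
        # Establishment: a cb appears after an utterance without one.
        return "est"
    same_cb = cb_cur == cb_prev
    cb_is_cp = cb_cur == cp_cur
    if same_cb:
        return "con" if cb_is_cp else "ret"
    return "ssh" if cb_is_cp else "rsh"


def is_cheap(cp_prev: Optional[str], cb_cur: Optional[str]) -> bool:
    """Cheap transition: cp(Un-1) predicts cb(Un)."""
    return cb_cur is not None and cp_prev == cb_cur


def is_continuous(cf_prev: Iterable[str], cf_cur: Iterable[str]) -> bool:
    """Continuity: the two cf lists share at least one entity."""
    return bool(set(cf_prev) & set(cf_cur))
```

With labels of this kind assigned to every utterance, the percentages reported in this section reduce to simple counts over the labelled corpus.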
These findings concerning the Vanilla instantiation do not mean, however, that the claims of Centering do not hold, because it turns out that parameters do matter. That is, it is possible to define ‘utterance’, ‘ranking’, and ‘realization’ in such a way that all three claims come out verified (in a statistical sense). Because Strong C1 is the claim with the largest percentage of violations, the parameters whose setting matters the most when trying to find an instantiation in which all claims are satisfied are those controlling utterance definition and cf realization. Considering a center as realized in an utterance which contains a bridging reference to that center is sufficient for Strong C1 to be verified; identifying utterances with sentences instead of finite clauses also has a strong positive effect. With the resulting families of instantiations, which we called IF (for Indirect realization, Finite clauses) and IS (for Indirect realization, Sentences), Strong C1 is verified, as well as the two ‘basic’ versions of Rule 1. As stated earlier, in this study we only considered members of the IF and IS families of instantiations of the theory; that is, we always assumed that discourse topic could be maintained even by indirect realization with bridging references, as in example (2). Poesio et al. also found, however, that there is a tradeoff between Strong C1, on one side, and Rule 1 and Rule 2, on the other: the changes to the utterance and realization parameters just mentioned reduce the violations of Strong C1, but increase those of Rule 1 and Rule 2. Identifying utterances with sentences, or (to a lesser extent) allowing indirect realization, results in statistically significant increases in the number of violations of Rule 1 (up to a total of 7.4% in the IS instantiation), although Rule 1 (GJW 95) and Rule 1 (GJW 83) are so robust that they are still verified even in these instantiations.
These changes to the utterance and realization parameters have an even greater impact on Rule 2 (BFP), a claim only weakly verified with the Vanilla instantiation. With the IF and IS instantiations, and grammatical function ranking, we find many more rsh than ssh, and many more ret than ‘pure’ con (i.e., without counting est); indeed, in the IS instantiation with gftherelin ranking, ret are the second most common transition. As a result, Rule 2 (BFP) is only verified with IS instantiations at the .05 level, and with IF instantiations only if second person pronouns are counted as realizations of cfs. On the positive side, with these instantiations a much greater percentage of utterances (45%) is classified as either con, ret, ssh or rsh, and a further 16% as est. Poesio et al. found that these results can be further strengthened by adopting the ranking function proposed by Strube and Hahn (1999), instead of gftherelin. With this instantiation, Rule 2 (BFP) is verified at the .01 level, rather than only at the .05 level. This is because although the strube-hahn ranking function has no effect on Strong C1 (obviously) or R1 (more surprisingly), it does result in some of the ret becoming con, and some of the rsh becoming ssh. Even though we still find more ret than con and more rsh than ssh, these changes are enough to make Rule 2 (BFP) verified at the .01 level by a Page Rank test with the IS instantiation. Strube and Hahn’s own version of Rule 2 still isn’t verified, but this version of the rule is not verified by any of the instantiations we evaluated. In other words, with the IS or IF instantiation and strube-hahn ranking, all three claims of the theory are verified at the .01 level.
3.4 Investigating Linguistic Claims about Topic via Automatic Topic Tracking
As we said earlier, the notion of cb was put forward as a formalization of the notion of discourse topic, but the fact that central notions were only partially specified made it difficult to use it as a tool except by adopting often arbitrary definitions of notions such as utterance. The results just discussed suggest that by adopting either the IF or the IS setting (i.e., allowing for indirect realization, and identifying utterances either with finite clauses or with sentences), and either the ranking function that Poesio et al. called gftherelin or the ranking function proposed by Strube and Hahn, which we will call strube-hahn, we obtain instantiations of the theory which are sufficiently precise to allow for computer implementations, and such that all three main claims are verified. We believe that further narrowing down of the range of options, or indeed, refinements such as defining ranking functions that conflate grammatical function and hearer status, are not desirable at this stage, and best left to investigations using online experiments with human subjects. (And anyway, there is some debate as to whether the same ranking function would be ‘best’ for all languages; see, for example, Walker et al., 1994; Turan, 1998; Prasad and Strube, 2000; Miltsakaki, 2002.) Nevertheless, the results of Poesio et al. already narrow down the range of plausible definitions of the parameters of Centering enough to make it possible to use Centering theory for one of the purposes for which it was conceived: to investigate a variety of correlations between topichood and linguistic usage by running, over an (annotated) corpus, computer programs that simulate local focus update and then compute correlations between, say, cb-hood and subjecthood in an automatic fashion. (A correlation holding with all of the four variants would, of course, be especially promising.)
A systematic investigation of such correlations will require a much larger corpus, also including texts from genres such as fiction and conversational dialogue. For the moment, we can only run pilot studies of the type of investigation that, in our view, methods like those we are suggesting will make possible in the future. In the rest of the chapter, we discuss four such studies.
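The kind of program referred to in this section can be sketched briefly. The Python below assumes that each utterance has already been reduced to a ranked list of forward-looking centers (cf), most highly ranked first; the cb of an utterance is then taken to be the highest-ranked element of the previous cf list that is also realized in the current utterance, and the cp is the first element of the current list. The data layout and the names `compute_cb` and `track_centers` are assumptions made for this illustration. Utterance segmentation, ranking, and (indirect) realization, the parameters discussed in section 3.3, are presumed to have been settled before the lists are built, and the ranking is assumed to be total, so that at most one cb is returned.

```python
from typing import List, Optional, Tuple


def compute_cb(cf_prev: List[str], cf_cur: List[str]) -> Optional[str]:
    """Return the highest-ranked entity of cf_prev that is realized in cf_cur."""
    realized = set(cf_cur)
    for entity in cf_prev:  # cf_prev is ordered from highest to lowest rank
        if entity in realized:
            return entity
    return None  # no shared entity: the current utterance has no cb


def track_centers(utterances: List[List[str]]) -> List[Tuple[Optional[str], Optional[str]]]:
    """Simulate local focus update, returning (cb, cp) for each utterance."""
    results = []
    previous: List[str] = []  # the first utterance has no predecessor
    for cf in utterances:
        cb = compute_cb(previous, cf)
        cp = cf[0] if cf else None
        results.append((cb, cp))
        previous = cf
    return results


# Toy input, invented for illustration only (not an annotation of the corpus).
toy_discourse = [
    ["carlin", "germany", "paris"],
    ["carlin", "paris", "oeben"],
    ["inventories", "carlin", "faubourg"],
    ["clientele", "faubourg"],
]
for cb, cp in track_centers(toy_discourse):
    print(cb, cp)
```

Correlations such as the one between cb-hood and subjecthood can then be computed by joining these (cb, cp) pairs with whatever grammatical annotation the corpus provides.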
4. A First Pilot Study: Demonstratives
The first example study we discuss, already reported in Poesio and Modjeska (2005), is an investigation of the suggestion that this-Noun Phrases (demonstrative pronouns this and these, and full nps with this as a determiner) are primarily used to refer to entities that are somehow ‘salient’, but without being the ‘focus’ or ‘topic’ of the discourse (Linde, 1979; Gundel et al., 1993; Passonneau, 1993). Examples of entities that are felicitously realized by means of this-nps are entities in the visual situation, or ‘deixis’, such as the room mentioned in (4) (Kaplan, 1979; Jarvella and Klein, 1982; André et al., 1999); abstract objects such as propositions, facts, or types, implicitly introduced in the discourse without being explicitly mentioned, as in (5) (Asher, 1993; Webber, 1991); and entities mentioned in a discourse, but not in most salient position, such as the area mentioned in (6) (Linde, 1979; Gundel et al., 1993; Passonneau, 1993).
(4) A [inside a room, looking around]: This room is incredibly dirty.
(5) For example, binocular stereo fusion is known to take place in a specific area of the cortex near the back of the head. Patients with damage to this area of the cortex have visual handicaps but they show no obvious impairment in their ability to think. This suggests that stereo fusion is not necessary for thought. (Webber, 1991)
(6) a. In spite of his French name, Martin Carlin was born in Germany and emigrated to Paris to become an ébéniste.
    b. He settled there with other German and Flemish craftsmen and took employment in the workshop of Jean-Francois Oeben, whose sister he married.
    c. Inventories made after Carlin’s death show that the ébéniste and his wife lived modestly in a five-room apartment in THE FAUBOURG SAINT-ANTOINE, an unfashionable quarter of Paris, with simple furniture, a few pastel portraits, and a black lacquer clock.
    d. Few of Carlin’s wealthy clientele would have cared to venture into this area.
Gundel et al. (1993) proposed that the np form chosen to realize discourse entities results from the interaction of two factors: the speaker’s assumptions about the status of such entities in the addressee’s cognitive/mental state, and Grice’s Maxim of Quantity (Grice, 1975) requiring speakers to be as informative as consistent with their knowledge, but no more informative. Gundel et al. propose that there are six possible ‘cognitive statuses’, organized in a ‘Givenness Hierarchy’ reflecting increasing mental salience. The statuses relevant to the distribution of this-nps are the highest two: activated and in focus. In order for a this-np to be felicitous, the discourse entity it realizes is required to be at least activated: Gundel et al. define ‘activated’ as ‘being in short-term memory’. On the other hand, if an entity were ‘in focus’, we would expect the speaker to be as informative as possible and therefore use a pronoun, the type of np typically used to realize such entities, unless other factors made the use of a pronoun infelicitous. Gundel (1998) already proposed that notions from Centering can be used to specify more precisely which entities may be ‘in focus’, although she also
calls for a modification of the theory, arguing that more than one entity may be ‘in focus’ (we return to this topic in the discussion section). Poesio and Modjeska (2005) identified three ‘natural’ ways of formalizing the notion of entity ‘in focus’ used by Gundel et al. in terms of the conceptual vocabulary of Centering. An entity may be said to be ‘in focus’ if it is
1. cb(Ui), the cb of the present utterance; or perhaps
2. cb(Ui−1), the cb of the previous utterance; or perhaps
3. cp(Ui−1), the most highly-ranked entity of the previous utterance.
As Gundel et al. did not provide details concerning the identification of active entities,8 Poesio and Modjeska devised their own, reliable scheme, introducing, however, a new term, active, to identify the notion of ‘activation’ specified by this scheme. Poesio and Modjeska analyzed every this-np in the subset of the gnome corpus also used for the other studies in this chapter, using the scripts developed by Poesio et al. (2004) to compute utterances and their cb and cp according to the four instantiations of Centering we are considering: IF + gftherelin, IF + strube-hahn, IS + gftherelin, and IS + strube-hahn. That analysis revealed that virtually all this-nps in the corpus are active, and that between 90 and 93% of this-nps (depending on the instantiation) were used to refer to entities other than cb(Ui−1); between 75 and 80% to refer to entities other than cp(Ui−1); and between 61 and 65% to refer to entities other than cb(Ui). They concluded that the distribution of this-nps in the corpus used for both this study and theirs is best characterized by what they called the this-np hypothesis:
THIS-NP Hypothesis: this-nps are used to refer to entities which are active (in the sense of Poesio and Modjeska). this-nps are preferred for entities other than cb(Ui−1).
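The three candidate formalizations of ‘in focus’ listed above lend themselves to a simple tabulation of the kind just reported. The sketch below assumes that each this-np has been recorded together with its referent and with the cb(Ui), cb(Ui−1), and cp(Ui−1) computed for its utterance. The record layout and all names are hypothetical, and the toy records in the usage example are invented purely to exercise the function; they are not drawn from the gnome corpus.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class ThisNP:
    referent: str            # discourse entity realized by the this-np
    cb_cur: Optional[str]    # cb(Ui), None if the utterance has no cb
    cb_prev: Optional[str]   # cb(Ui-1)
    cp_prev: Optional[str]   # cp(Ui-1)


def share_not_in_focus(items: Iterable[ThisNP]) -> Dict[str, float]:
    """For each candidate definition of 'in focus', return the proportion
    of this-nps whose referent is NOT the entity that definition picks out."""
    records = list(items)
    if not records:
        raise ValueError("no this-nps to analyze")
    counts = {"cb(Ui)": 0, "cb(Ui-1)": 0, "cp(Ui-1)": 0}
    for item in records:
        if item.referent != item.cb_cur:
            counts["cb(Ui)"] += 1
        if item.referent != item.cb_prev:
            counts["cb(Ui-1)"] += 1
        if item.referent != item.cp_prev:
            counts["cp(Ui-1)"] += 1
    return {name: count / len(records) for name, count in counts.items()}


# Invented records, for illustration only.
toy_records = [
    ThisNP("area", "carlin", "carlin", "inventories"),
    ThisNP("room", None, None, None),
    ThisNP("x", "x", "y", "y"),
    ThisNP("z", "q", "z", "z"),
]
print(share_not_in_focus(toy_records))
```

Running the same tabulation once per instantiation would give the kind of ranges (for example, 90 to 93% for cb(Ui−1)) quoted above.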
5. A Second Pilot Study: Type of Transition versus Form of Subject
Many types of correlations between topicality and subjecthood have been proposed in the linguistic literature (e.g., Li, 1976; Givon, 1983). In the Centering literature, this correlation, which is of particular interest for Natural Language Generation, has been studied with respect to languages such as Japanese (Kameyama, 1986), Italian (Di Eugenio, 1998), and Turkish (Turan, 1998), resulting in the hypothesis that in languages with both a ‘weak’ and a ‘strong’ pronominal form, the form of the subject of an utterance is affected by the type of transition (con, ret, etc.) that the utterance realizes. Typically, it was argued, weak pronominal forms are preferred for the subjects of utterances expressing continuations (con), whereas strong pronominal forms are preferred for the subject when the utterance expresses a center shift (ssh, rsh) or center retain (ret). In the case of English, Passonneau (1998) and others found a similar correlation between con and personal pronouns, whereas other transitions correlated more with demonstrative pronouns.
In this section we present results concerning the correlation between uses of pronouns and full nps in subject position and transition types obtained with our corpus, with all four instantiations that we are considering in this chapter. In part the intention is simply to reexamine previous results from the Centering literature using a different genre, reliably annotated data, and automatic cb-tracking techniques allowing for different instantiations of some of the crucial parameters. In addition, however, we hope that this investigation may shed some light on the role of the non-continuous transitions (est, null, and zero) in Centering. The reader should again keep in mind that the results can only be viewed as preliminary, given the low frequency of some events.9
IS+GFTHERELIN
If we treat sentences as utterances, we have 669 utterances, of which 418 have a subject.10 The full contingency table for the IS instantiation (indirect realization, sentences as utterances) with ranking function gftherelin is given in table 9.2. This contingency table cannot be used for a χ2 test, because of the low or zero counts in some of the cells; we need, therefore, to collapse some of the distinctions between transitions. First of all, we could collapse demonstrative pronouns with full nps, under the assumption that demonstratives are not weak forms (Passonneau, 1993). This results in table 9.3, where the percentages of cases in which the subject is realized with a given form for that type of transition are also indicated. Nancy Hedberg suggested to us a very interesting way of analyzing these figures. Table 9.3 indicates that with this configuration, transitions can be grouped in three classes for the purposes of predicting the form of the subject. One class includes con and ssh (the transitions in which cb=cp), in which the chance of using a pronoun in subject position is about four times the average for the whole set of transitions (40% vs. 11.2%).
The second class includes ret, rsh, zero, and null, with which, on the contrary, the chance of using a pronoun (2.7%) is about a quarter of the average use of pronouns in subject position across all transitions. Finally, there are Establishments, in which we find pronouns in subject position with almost exactly the same frequency (11.4%) as we find them in the entire set of transitions (11.2%). This grouping is summarized in table 9.4. This table has an extremely high value of χ2 = 79.03 (with 2df, p < 0.001). We will present the results obtained with the other configurations before discussing these results further.

TABLE 9.2. Contingency Table for the Correlation between Form of the Subject and Transitions, with IS+GFTHERELIN

          Pers Pronoun   Dem Pronoun   Full NP   Total
EST                 10             1        77      88
CON                 21             3        30      54
RET                  2             3        79      84
SSH                  9             0        13      22
RSH                  3             1        60      64
ZERO                 0             3        47      50
NULL                 2             1        53      56
Totals              47            12       359     418

TABLE 9.3. Correlation between Form of the Subject and Transitions, IS+GFTHERELIN

          Pronoun        Non-Pronoun    Total
EST       10 (11.4%)      78 (88.6%)       88
CON       21 (38.9%)      33 (61.1%)       54
RET        2 (2.4%)       82 (97.6%)       84
SSH        9 (40.9%)      13 (59.1%)       22
RSH        3 (4.7%)       61 (95.3%)       64
ZERO       0 (0%)         50 (100%)        50
NULL       2 (3.6%)       54 (96.4%)       56
Totals    47 (11.2%)     371 (88.8%)      418

TABLE 9.4. Similar Types of Transitions for Subject Form Prediction

                     Pronoun        Non-Pronoun    Total
CON+SSH              30 (39.5%)      46 (60.5%)       76
RET+RSH+ZERO+NULL     7 (2.7%)      247 (97.2%)      254
EST                  10 (11.4%)      78 (88.6%)       88
Total                47 (11.2%)     371 (88.8%)      418

IF+GFTHERELIN
We consider next the results obtained with the IF+GFTHERELIN configuration: identifying utterances with finite clauses instead of sentences, while still ranking according to grammatical function with linear order as tie breaker. If we identify utterances with finite clauses, we obtain 972 utterances instead of 669. However, only 585 of these utterances have subjects (see earlier discussion), and of these, 32 are relative clauses in which the subject is grammatically constrained to realize an entity of the previous utterance (the clause in which the relative occurs). We excluded these relative clauses from the analysis. The result is contingency table 9.5, in which we have already collapsed demonstrative pronouns and full nps, and indicated the percentages of utterances performing a certain transition whose subject is realized as a pronoun.

TABLE 9.5. Contingency Table for the Correlation between Form of the Subject and Transitions, IF+GFTHERELIN

          Pers Pronoun   Non-Pronoun   Total
EST       25 (20.8%)              95     120
CON       36 (51.4%)              34      70
RET        0 (0%)                 99      99
SSH       17 (65.4%)               9      26
RSH        4 (6.2%)               60      64
ZERO       0 (0%)                 80      80
NULL       8 (8.5%)               86      94
Totals    90 (16.3%)             463     553

These data exhibit a pattern very similar to that found by identifying utterances with sentences. Again, the transitions in which cb=cp, con and ssh, have similar frequencies of pronouns in subject position, much higher than average; in fact, with this configuration, the majority of utterances performing these transitions have a pronoun in subject position. (These percentages, particularly for Smooth shifts, are now very similar to those reported by Di Eugenio (1998) for zero pronouns in subject position in Italian, using a very similar configuration; see later discussion.) Again, with a second group of transitions consisting of ret, rsh, zero, and null, we find that pronouns in subject position are much rarer than average (overall, around 3%). Finally, we find the est transitions, in which the frequency of occurrence of pronouns in subject position is again very similar to the average frequency of pronouns in subject position across the entire set of transitions. These results are summarized in table 9.6. Next, we examine the results obtained using the other ranking function giving the best results, strube-hahn.

TABLE 9.6. Similar Types of Transitions for Subject Form Prediction, IF+GFTHERELIN

                     Pronoun        Non-Pronoun   Total
CON+SSH              53 (55.2%)              43      96
RET+RSH+ZERO+NULL    12 (3.6%)              325     337
EST                  25 (20.8%)              95     120
Total                90 (16.3%)             463     553

IS+STRUBE-HAHN
As already reported in Poesio et al. (2004), using as a ranking function strube-hahn, instead of gftherelin, has the effect of turning a significant number of ret into con, and of rsh into ssh. (Changing the ranking function, of course, doesn’t affect the total number of utterances, or of continuous utterances.) The contingency table obtained using strube-hahn is table 9.7.

TABLE 9.7. Correlation between Form of the Subject and Transitions, IS+STRUBE-HAHN

          Pers Pronoun   Non-Pronoun   Total
EST       10 (11.4%)              78      88
CON       22 (28.6%)              55      77
RET        2 (3%)                 63      65
SSH       10 (28.6%)              25      35
RSH        1 (2.1%)               46      47
ZERO       0 (0%)                 50      50
NULL       2 (3.6%)               54      56
Totals    47 (11.2%)             371     418

Again we observe that transitions naturally group in three classes as far as the frequency of pronouns in subject position is concerned: con and ssh, est (whose percentage of pronouns in subject position is again almost exactly the same as the average occurrence of pronouns in subject position across all transitions), and everything else. The main differences are that with strube-hahn we find more con than ret and more ssh (still fewer than rsh, though), which seems to lead to a reduced percentage of pronouns in subject position for con and ssh: on average, about 28% as opposed to almost 40% with gftherelin.

IF+STRUBE-HAHN
Finally, we come to the last of the four ‘best’ configurations: again using strube-hahn, but identifying utterances with finite clauses as opposed to sentences. The ‘transformed’ contingency table for this instantiation is table 9.8. These percentages are very similar to those obtained with gftherelin, apart from a change in the relative frequency of con and ret, and of ssh and rsh; and again, we find the same three natural groups of transitions.

TABLE 9.8. Correlation between Form of the Subject and Transitions, IF+STRUBE-HAHN

          Pers Pronoun   Non-Pronoun   Total
EST       25 (20.8%)              95     120
CON       35 (38.0%)              57      92
RET        0 (0%)                 83      83
SSH       19 (47.5%)              21      40
RSH        3 (6.8%)               41      44
ZERO       0 (0%)                 80      80
NULL       8 (8.5%)               86      94
Totals    90 (16.3%)             463     553

SUMMARY AND PRELIMINARY DISCUSSION
A clear pattern emerges from this analysis. Irrespective of the way utterance and ranking are defined, transitions can be classified into three groups as far as the frequency of occurrence of pronouns in subject position is concerned. On one side, we have the transitions in which the cb always equals the cp, con and ssh: with these transitions the frequency of pronouns in subject position is three to four times the average with gftherelin ranking (and about two and a half times the average with strube-hahn), and, in fact, if we identify utterances with finite clauses and rank with gftherelin, more than half of such transitions have pronouns in subject position. On the opposite side, we have the transitions where the cb is never the same as the cp: ret, rsh, zero, and null. With these transitions, pronouns in subject position are at least four times less frequent than average. Establishments stand right in the middle: with this type of transition, in which the cb may or may not be the same as the cp, the frequency of occurrence of pronouns in subject position is almost exactly the same as the average frequency. All of these correlations are highly significant by a χ2 test. These results suggest that at least from this perspective, it is not a good idea to conflate est and con, contrary to what is suggested in Walker et al. (1994); establishments appear to behave differently from other types of transitions. The results also suggest that simply deciding the form of np to be used in subject position depending on the transition wouldn’t result in good algorithms for determining np type. This reflects a general finding of Poesio et al. (2004): when deciding how to express certain information, coherence conflicts with what we called a principle of variety: do not use the same forms over and over again. This point will be illustrated again when we discuss the correlation between transitions and segment boundaries. Finally, it is interesting to compare these results with the results obtained by Di Eugenio when analyzing the correlation between transition and pronoun type (weak or strong) for Italian (Di Eugenio, 1990, 1998), particularly in light of the proposal in Di Eugenio (1990) that null subjects signal con, whereas strong pronouns signal ret or shift. In the later paper, Di Eugenio (1998) found the percentages in table 9.9, using a configuration very close to that called here IF+GFTHERELIN. (Di Eugenio also used in her analysis a Center-Establishment transition similar to our est.)
We think there is at least one interesting point emerging from this comparison: again, est behaves very differently from con—in this case, it patterns almost exactly like ret and ssh.
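The χ2 value reported in this section for table 9.4 can be recomputed directly from the counts in that table. The sketch below implements the ordinary Pearson χ2 test of independence in plain Python. The function name `pearson_chi2` is introduced here for illustration, and it is assumed (the chapter does not say) that no continuity correction was applied.

```python
from typing import List, Tuple


def pearson_chi2(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Table 9.4: rows are CON+SSH, RET+RSH+ZERO+NULL, EST;
# columns are Pronoun, Non-Pronoun.
table_9_4 = [[30, 46], [7, 247], [10, 78]]
chi2, dof = pearson_chi2(table_9_4)
print(f"chi2 = {chi2:.2f}, df = {dof}")
```

On these counts the statistic agrees with the value of 79.03 (2 df) given for table 9.4. The same function can be applied to the collapsed tables for the other three instantiations.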
TABLE 9.9. Correlation between Form of the Pronoun in Subject Position and Transitions for Italian in (Di Eugenio, 1998)

          Zero           Strong         Total
EST       12 (54.5%)     10 (45.5%)        22
CON       56 (81.2%)     13 (18.8%)        69
RET        4 (50%)        4 (50%)           8
RSH        0              0                 0
SSH        6 (54.5%)      5 (45.5%)        11
OTHER      2 (66.6%)      1 (33.3%)         3
Total     80 (70.8%)     33 (29.2%)       113

6. Third Pilot Study: The Interaction between Local and Global Focus
In the intentions of Grosz and colleagues (Grosz and Sidner, 1986; Grosz et al., 1995), Centering was only part of an overall theory of the ‘Attentional State’, that is, of what makes discourses coherent and what is salient in discourse. Specifically, Centering is the part of the Grosz et al. framework formalizing the ‘local’ aspects of the attentional state (or local focus): local coherence (i.e., coherence between utterances) and local salience (i.e., which entities are most salient after processing an utterance). Grosz and Sidner also developed a second theory formalizing what they called the ‘global’ focus: how discourse segments must be related in order for a discourse to be (globally) coherent, and which segments are most salient at a given point in time. This distinction between local focus and global focus is broadly consistent with the distinction between different ‘cognitive statuses’ underlying the Givenness Hierarchy hypothesis discussed earlier (Gundel et al., 1993; see also Poesio and Modjeska, 2005). This distinction also finds some support in psychological research: for example, there is evidence for a distinction between expressions whose interpretation is preferentially to be found in the local focus, such as pronouns, and expressions that tend to be interpreted with respect to the global focus, such as definite descriptions (Garrod, 1993). However, Grosz and Sidner did not provide a fully worked out theory of the relation between the two aspects of the attentional state, which could provide answers to questions such as: do segment boundaries also represent discontinuities at the local level? (I.e., do discourse segments correspond to switches in discourse topic/cb?) Are segments supposed to be fully locally coherent? (I.e., can there be any utterances without a cb?) Finally, there is a need to clarify the predictions of the theory as far as the use of different forms is concerned: it has long been known, for example, that pronouns can also be used to refer to entities last mentioned two or more utterances before (Fox, 1987; Hitzeman and Poesio, 1998). Further progress will be needed on the integration between the Grosz et al.
framework and the Givenness Hierarchy framework of Gundel et al. (preliminary suggestions can be found in Gundel, 1998; Poesio and Modjeska, 2005). In this section we discuss preliminary findings in two areas related to these questions: the interaction between global and local coherence, and the use of pronominal forms to refer to entities introduced two or more utterances before. The reader should keep in mind that these results are even more preliminary than those discussed in previous sections, as the gnome corpus isn’t properly annotated for segments. As discussed earlier, segmentation heuristics were used: we tested identifying segments with paragraphs, sections, and the heuristic proposed in Walker (1989). As the latter gave the best results concerning the claims of the theory, we used it for the computations discussed in this section.
6.1 The Correlation between Transitions and Segment Boundaries
Two studies relevant to the questions explored in this section were carried out by Passonneau (1998) and Walker (1998), who studied whether transitions predict segment boundaries, that is, whether establishments and shifts occur more at segment boundaries, and continuations prevail within a segment.
Passonneau (1998) used data from the Pear stories (Chafe, 1980) annotated for segments by Passonneau and Litman (1993) to measure the usefulness of Centering transitions as predictors of segment boundaries. Passonneau tested two classification systems for transitions: the scheme due to Grosz et al. (1995) that distinguishes between con, ret, and shift (that she called ‘Version A’), and the one proposed in Kameyama et al. (1993) that classifies them into ret1 (= con+ret), est (our est), and null (our null) (that she called ‘Version B’). Passonneau measured the accuracy of the shift Centering transition (for Version A) and the null transition (for Version B) as boundary predictors. The measures she used were the two measures from Information Retrieval traditionally used for evaluation in nlp, precision P (the percentage of instances of the chosen Centering transitions actually corresponding to boundaries) and recall R (the percentage of boundaries correctly predicted by the chosen Centering transitions), together with a new measure, error rate, defined as the percentage of incorrect associations between transitions and boundaries: for example, for Version A, E = (con/ret at boundary + shift at non-boundary) / total, and similarly for Version B (replacing shift with null, etc.). Passonneau found that neither type of transition was a good predictor of segment boundaries, primarily because precision was very low. For shift as boundary predictor, she found R=.78, P=.25, E=.41; for null, R=.86, P=.26, E=.40. Walker (1998) classified 98 utterances containing the discourse cue now (a fairly good indicator of the beginning of a new segment), as well as an equal number of other utterances, according to the scheme in table 9.1.
Walker found that even though the distribution of transitions at segment boundaries was significantly different from that found with the other utterances (e.g., only about 2% of boundaries were con, as opposed to 43% for other utterances), again the correlation between segment boundaries and cb changes was very imperfect: for example, two thirds of segment-initial boundaries had a cb. In these two studies only one instantiation of the theory was considered. The goal of the study reported here was to test whether a better result could be obtained using the ‘best’ instantiations identified in Poesio et al. (2004) and already used for the two previous studies.

IF+GFTHERELIN
As seen earlier, with this configuration we have 972 utterances. Of these, 225 are segment boundaries if segments are identified using the Walker heuristic. The relation between transitions and boundaries with this instantiation is shown in the contingency table in table 9.10.

TABLE 9.10. Correlation between Transition Types and Segment Boundaries: IF+GFTHERELIN

          Boundary        Not Boundary    Total
CON        18 (19.4%)      75 (80.6%)        93
RET        16 (11.8%)     119 (88.2%)       135
SSH         4 (6.9%)       54 (93.1%)        58
RSH         9 (12.1%)      65 (87.9%)        74
EST        54 (32.0%)     115 (68.0%)       169
ZERO       46 (31.9%)      98 (68.1%)       144
NULL       78 (26.1%)     221 (73.9%)       299
Total     225 (23.2%)     747 (76.8%)       972

This table suggests a number of interesting observations. First of all, although the results of the χ2 test for this table are highly significant (with 6df, χ2 = 39.1, p < .001), it is nevertheless clear that the correlation between cb continuations and segment continuations is imperfect at best: the percentage of con that are boundaries (about 19%) is not that much lower than the overall percentage of boundaries (about 23%). Secondly, the table gives further support to the view that est behave differently from con, already emerging from the earlier discussion about the correlation between transition type and subject type. Whereas slightly less than 1/5 of continuations are segment boundaries, almost 1/3 of est are, a higher percentage than that for utterances overall (1/4). In fact, est are more frequently boundaries than the two shifts or ret; the only other transitions that correlate as highly with segment boundaries are zero and null (1/3 of zero and 26% of null are boundaries, as well). (The fact that the two types of shift are much less likely to occur at boundaries than continuations is also worth pointing out.) These results suggest a way of using transitions to predict segment boundaries that is different both from what would be suggested by a simpleminded view of the relation between the local focus and the global focus, according to which we get a new segment every time we find a new cb (i.e., use ssh+rsh+est+zero+null to predict segment boundaries), and from the two methods studied by Passonneau (use shift or null to predict boundaries). Instead, the best results at predicting boundaries appear to be obtained by dividing transitions into, on the one hand, the four continuous transitions, con, ret, ssh, and rsh; and on the other, the three transitions in which at least one of the two utterances doesn’t have a cb. The four continuous transitions are less likely to be found at boundaries than average, whereas the three non-continuous ones are more likely. The distribution obtained by partitioning transitions in this way is shown in table 9.11.
Using est+zero+null as boundary predictors, and the three measures used by Passonneau, we find results very comparable with hers: P = 178/612 (29%), R = 178/225 (79.1%), F = 54.0%, and E = (47+434)/972 (49.5%). Using the continuous transitions as predictors of non-boundaries we find worse recall and better precision, but better overall results: P = 313/360 (86.9%), R = 313/747 (41.9%), and F = 64.4% (E obviously stays the same).
TABLE 9.11. Two Classes of Transitions with Respect to Segment Boundary Prediction

                   Boundary      Not Boundary   Totals
CON+RET+SSH+RSH    47 (13.0%)    313            360
EST+ZERO+NULL      178 (29.1%)   434            612
Totals             225 (23.1%)   747            972
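The precision, recall, and error figures quoted for table 9.11 follow directly from its four cells. The sketch below recomputes them. The function and parameter names are illustrative assumptions, not taken from the chapter's scripts, and only P, R, and E are computed.

```python
def prediction_scores(hits, false_alarms, misses):
    """Precision, recall and error rate for one predicted class.

    hits:         items predicted to be in the class that really are
    false_alarms: items predicted to be in the class that are not
    misses:       items in the class that were not predicted
    (The error rate is computed over the hits, false alarms and misses
    passed in, plus the correct rejections supplied separately below.)
    """
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    return precision, recall


def error_rate(false_alarms, misses, total):
    """Proportion of all utterances that are classified wrongly."""
    return (false_alarms + misses) / total


# Cells of table 9.11 (IF+GFTHERELIN).
CONT_BOUNDARY, CONT_NOT_BOUNDARY = 47, 313        # CON+RET+SSH+RSH
NONCONT_BOUNDARY, NONCONT_NOT_BOUNDARY = 178, 434  # EST+ZERO+NULL
TOTAL = 972

# EST+ZERO+NULL used to predict boundaries.
p_b, r_b = prediction_scores(NONCONT_BOUNDARY, NONCONT_NOT_BOUNDARY, CONT_BOUNDARY)
# Continuous transitions used to predict non-boundaries.
p_nb, r_nb = prediction_scores(CONT_NOT_BOUNDARY, CONT_BOUNDARY, NONCONT_NOT_BOUNDARY)
# Both predictors make the same mistakes, so E is shared.
e = error_rate(NONCONT_NOT_BOUNDARY, CONT_BOUNDARY, TOTAL)

print(f"boundaries:     P = {p_b:.1%}, R = {r_b:.1%}")
print(f"non-boundaries: P = {p_nb:.1%}, R = {r_nb:.1%}")
print(f"error rate:     E = {e:.1%}")
```

The two predictors are complementary: every utterance the first one misclassifies is also misclassified by the second, which is why E is the same in both cases.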
IS+GFTHERELIN The transition/boundary distribution with this instantiation (with which we only get 669 utterances) is summarized in table 9.12. The distribution is also significant (with 6 df, χ2 = 28.7, p < 0.001), and the results are very similar to those obtained with if+gftherelin. Again, we find that the three non-continuous transitions (est, zero, and null) occur at boundaries with greater frequency than continuous transitions, and that est are much more frequently boundaries than con. Again we find that the percentage of con occurring at boundaries is almost the same as the average (32% vs. 33.6%), and much higher than the percentages of ret, rsh, and ssh occurring at boundaries. The best way of collapsing these classes into two categories for the purposes of predicting boundaries or non-boundaries is also that observed with if+gftherelin: the transitions occurring at boundaries more frequently than average are est+null+zero, whereas the transitions occurring less frequently at boundaries are the continuous transitions. The occurrences are shown in table 9.13. Using the non-continuous transitions as segment boundary predictors, we get higher precision than we obtained when identifying utterances with finite clauses, P = 41.6% (153/368), but lower recall, R = 68% (153/225), with F = 54.0%. The error rate is a bit lower, E = 42.9% ((72+215)/669). Using the continuous transitions to predict non-boundaries, we find P = 76.1% (229/301) and R = 51.6% (229/444), for an F = 63.9%.
TABLE 9.12. Correlation between Transition Types and Segment Boundaries: IS+GFTHERELIN

         Boundary      Not Boundary   Total
CON      23 (31.9%)    49             72
RET      26 (21.5%)    95             121
SSH      7 (21.9%)     25             32
RSH      16 (21.0%)    60             76
EST      49 (45.8%)    58             107
ZERO     45 (44.6%)    56             101
NULL     59 (36.9%)    101            160
Total    225 (33.6%)   444            669
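The χ2 value reported for table 9.12 can be checked from the counts alone. The following is a minimal sketch assuming that the statistic is the ordinary Pearson chi-square for a contingency table, without any continuity correction. The function name is illustrative, and the p-value is not computed because that would need a chi-square distribution from a statistics library.

```python
def pearson_chi_square(rows):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            # Count expected in this cell if transition and boundary were independent.
            expected = row_total * col_total / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df


# (Boundary, Not Boundary) counts from table 9.12.
TABLE_9_12 = {
    "CON": (23, 49),
    "RET": (26, 95),
    "SSH": (7, 25),
    "RSH": (16, 60),
    "EST": (49, 58),
    "ZERO": (45, 56),
    "NULL": (59, 101),
}

chi2, df = pearson_chi_square(list(TABLE_9_12.values()))
print(f"chi2 = {chi2:.1f}, df = {df}")  # chi2 = 28.7, df = 6
```

With seven transition types and two outcomes the table has (7 − 1) × (2 − 1) = 6 degrees of freedom, matching the figure given in the text.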
TABLE 9.13. Two Classes of Transitions with Respect to Segment Boundary Prediction

                   Boundary      Not Boundary   Totals
CON+RET+SSH+RSH    72 (23.9%)    229            301
EST+ZERO+NULL      153 (41.6%)   215            368
Totals             225 (33.6%)   444            669
IF+STRUBE-HAHN The division of transitions into those occurring more frequently than average at boundaries and those occurring less frequently found with this instantiation is again very similar to that found with the previous two instantiations, and the percentages are similar to those observed with if+gftherelin. The overall distribution is shown in table 9.14. This distribution is just as unlikely to be due to chance as that obtained with if+gftherelin: χ2 = 37.5, p ≤ 0.001. The ‘collapsed’ distribution is shown in table 9.15. The effectiveness of transitions computed according to this scheme for predicting segment boundaries is again similar to that obtained with if+gftherelin. Using discontinuous transitions as predictors of segment
TABLE 9.14. Correlation between Transition Types and Segment Boundaries: IF+STRUBE-HAHN

         Boundary      Not Boundary   Totals
CON      19 (16.2%)    98             117
RET      15 (12.6%)    104            119
SSH      4 (7.0%)      53             57
RSH      9 (13.4%)     58             67
EST      54 (32.0%)    115            169
ZERO     46 (31.9%)    98             144
NULL     78 (26.0%)    221            299
Total    225 (23.1%)   747            972
TABLE 9.15. Two Classes of Transitions with Respect to Segment Boundary Prediction

                   Boundary      Not Boundary   Totals
CON+RET+SSH+RSH    47 (13.0%)    313            360
EST+ZERO+NULL      178 (29.1%)   434            612
Totals             225 (23.1%)   747            972
TABLE 9.16. Correlation between Transition Types and Segment Boundaries: IS+STRUBE-HAHN

         Boundary      Not Boundary   Totals
CON      28 (29.8%)    66             94
RET      22 (21.4%)    81             103
SSH      9 (20.4%)     35             44
RSH      13 (21.7%)    47             60
EST      49 (45.8%)    58             107
ZERO     45 (44.5%)    56             101
NULL     59 (36.8%)    101            160
Total    225 (33.6%)   444            669
TABLE 9.17. Two Classes of Transitions with Respect to Segment Boundary Prediction, IS+STRUBE-HAHN

                   Boundary      Not Boundary   Totals
CON+RET+SSH+RSH    72 (23.9%)    229            301
EST+ZERO+NULL      153 (41.6%)   215            368
Totals             225 (33.6%)   444            669
boundaries, we obtain P = 29% (178/612), R = 79.1% (178/225), F = 54.0%, and E = 49.5% ((47+434)/972). Using continuous transitions as predictors of non-boundaries, we get P = 86.9% (313/360), R = 41.9% (313/747), F = 64.4%.
IS+STRUBE-HAHN Finally, with the is+strube-hahn instantiation the results are again similar to those obtained with the IS parameter setting and gftherelin ranking, except that the values of χ2, while still significant, are lower. The full contingency table is shown in table 9.16. For this table χ2 = 28.01, p < 0.001. The correlation between boundaries and transitions reflects that found with gftherelin ranking, and again suggests collapsing rows as in the contingency table 9.17. Using these collapsed classes for predicting boundaries as discussed earlier, we find P = 41.6% (153/368), R = 68% (153/225), F = 54.8%, E = 42.9% ((72+215)/669) using est+zero+null to predict boundaries, and P = 76.1% (229/301), R = 51.6% (229/444), F = 63.8% using continuous transitions to predict non-boundaries.
SUMMARY
The figures reported here indicate, on the one hand, that it is very unlikely that the variables transition and boundary are independent. Irrespective of the instantiation, we found that transitions divide into two classes: the continuous transitions are less likely than average to occur at boundaries, whereas est, zero, and null are more likely than average. The correlation between local focus and global focus shifts is, however, complex: changes in cb are poor predictors of segment changes, and even non-continuous transitions do not always occur at segment boundaries. The best predictor we found was using continuous transitions as predictors of non-boundaries: with IS configurations, this predictor gives us an F of around 64% and an error rate E of around 43%.9. Our results also support the finding reported earlier in the chapter that est utterances behave differently from con utterances; more generally, our findings support the position that it is important to take into account non-continuous transitions in theorizing on Centering.
6.2 Long-Distance Pronouns
Hitzeman and Poesio (1998) claimed that it is not sufficient for an antecedent to be available on the stack for the use of a long-distance pronoun (a pronoun whose antecedent is not in the previous utterance) to be licensed; it is necessary for the entity to have been a cb. Again, we tested this claim using the four best instantiations already introduced.11 The first, perhaps obvious, finding is that the importance of this issue greatly depends on the definition of utterance. Hitzeman and Poesio assumed that each finite clause was a separate utterance, as suggested by Kameyama; if we adopt this definition, then 17 pronouns out of 217 are long distance, which is the same percentage (8%) found in the corpus used by Hitzeman and Poesio.12 If we identify utterances with sentences, however, we only get 5 long-distance pronouns.13 Hitzeman and Poesio’s claim is verified in our corpus as well, both with the IF instantiation and the IS instantiation. With IF, 13 long-distance pronouns out of 17 (76.5%) had been cbs and 4 had not, p < .02 by the sign test with gftherelin ranking; with strube-hahn, 14 (82.3%) and 3, respectively. With IS, we find +3, −2 with gftherelin ranking and strube-hahn ranking, but there is not enough data for a significance test. An even better result, however, was found by weakening the licensing condition to having been a cp rather than a cb: in this case, with IF we have +17, 0, p < .01 by the sign test with gftherelin ranking, and +16, −1 with strube-hahn ranking. With IS, the results are +4, −1 with gftherelin ranking, and +5, 0 with strube-hahn. In other words, whereas between 76% and 82% of long-distance pronouns were cbs, virtually all have been at least cps.
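The two licensing conditions compared above (having been a cb versus having been a cp) can be stated as a simple check over a sequence of utterances. The sketch below is only an illustration under stated assumptions: the `Utterance` record, its field names, and the toy discourse are all hypothetical, and the cb and cp values are supplied by hand rather than computed by any Centering instantiation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Utterance:
    entities: Set[str]          # discourse entities realized in the utterance
    cb: Optional[str] = None    # backward-looking center, if there is one
    cp: Optional[str] = None    # preferred center, if there is one
    pronouns: List[str] = field(default_factory=list)  # entities realized as pronouns


def licensing_counts(utterances):
    """Count long-distance pronouns whose referent was earlier a cb or a cp."""
    counts = {"long_distance": 0, "was_cb": 0, "was_cp": 0}
    for i, utt in enumerate(utterances):
        if i == 0:
            continue  # nothing precedes the first utterance
        earlier = utterances[: i - 1]  # everything before the previous utterance
        for entity in utt.pronouns:
            if entity in utterances[i - 1].entities:
                continue  # antecedent in the previous utterance: not long distance
            if not any(entity in u.entities for u in earlier):
                continue  # no earlier mention, e.g. antecedent in the same utterance
            counts["long_distance"] += 1
            counts["was_cb"] += any(u.cb == entity for u in earlier)
            counts["was_cp"] += any(u.cp == entity for u in earlier)
    return counts


# Invented four-utterance discourse, for illustration only.
toy = [
    Utterance({"vase", "potter"}, cb=None, cp="vase"),
    Utterance({"vase", "glaze"}, cb="vase", cp="vase"),
    Utterance({"glaze", "kiln"}, cb="glaze", cp="glaze"),
    Utterance({"vase", "potter"}, pronouns=["vase", "potter"]),
]
print(licensing_counts(toy))
```

In the toy discourse both pronouns in the last utterance are long distance; only the one referring to the vase satisfies either licensing condition, because the potter was never a cb or a cp.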
7. Discussion This work was conceived as complementary to the work discussed in Poesio et al. (2004), in which the comparison between instantiations of Centering was based on theory-internal criteria (minimizing the number of violations of the three main claims)—the goal here being to use linguistic evidence as
an additional source of insights into versions of Centering. Although looking at more data, from a greater variety of genres including at least spoken dialogue and narrative, will be necessary to ensure the generality of our findings, we believe that a methodological point can already be raised: that is, we hope we have convinced at least a few readers that recent developments in annotation methodology and Centering theory make it possible to investigate the notion of discourse topic in a way that is less dependent on subjective judgments, by using reliably annotated corpora, and using the cb as an approximation of the notion of discourse topic. The main findings of the work at the current stage are as follows. The first pilot reported in the chapter tested some of the claims of Gundel et al. (1993) concerning demonstratives. Its results suggest that it is possible to identify (by hand) ‘activated’ entities in a corpus using reliable guidelines, and that by combining this information with a definition of ‘in focus’ as ‘being cb(Ui−1),’ it is possible to predict when demonstratives are used to realize a discourse entity in our corpus with great accuracy using a simplified form of the proposal by Gundel et al. Our second pilot tested the correlation between type of transition and the decision to use a pronoun to realize the subject of an utterance, finding that the transitions in which cb = cp (con and ssh) are most predictive with if+gftherelin: for about 55% of con and ssh the subject is realized with a pronoun. The last two studies were concerned with the interaction between local focus and global focus. We found a clear correlation between segment boundaries and the use of ‘discontinuous’ transitions in which one or both of the utterances have no cb (est, zero, and null), but also that this correlation could not be reliably used to predict boundaries.
We also found confirmation for the claim by Hitzeman and Poesio (1998) that only entities that had been cb could be the antecedent of long-distance pronouns. Many of these results replicate the results of earlier work, and will have to be confirmed by larger scale investigations, but to our knowledge ours is the first study to test these claims using a combination of reliable annotation techniques and automatic cb tracking methods; also, this is the first time that such results are shown to be robust across instantiations—that is, to hold whether utterances are identified with sentences or finite clauses, and whether grammatical function or ‘information status’ ranking is assumed. This invariance across instantiations is one of the most interesting results from a Centering perspective: the four instantiations that proved ‘best’ from the perspective adopted in Poesio et al. (2004) give very similar results when used to predict which entity will be realized as a demonstrative, and results that ‘go in the same direction’ concerning when a pronoun should be used for a subject (although predictions are much more accurate if utterances are identified with finite clauses) and whether a particular transition is a nonboundary (although the predictions are much more accurate when utterances are identified with sentences). This invariance makes any such findings much more robust. We also found evidence that est behave very differently from con both in terms of their correlation with segment boundaries and when trying
to predict whether a pronoun should be used in subject position, which goes against the suggestion in Walker et al. (1994) that est and con should be treated as the same transition. Finally, our results concerning segment boundaries suggest that more attention should be paid by the Centering literature to non-continuous transitions, as they seem to be key to understanding the relation between local and global focus.
Acknowledgments
Thanks to my collaborators Barbara di Eugenio, Janet Hitzeman, and Rosemary Stevenson for many discussions on the topics discussed in this paper, and to the other members of the gnome project. Many, many thanks to Nancy Hedberg, who at least in two cases suggested much better ways of analyzing the data than those I had originally proposed. The gnome project was supported by epsrc grant GR/L51126⁄01.
NOTES
1. One of the goals of work on Centering is to introduce precisely defined notions in order to avoid the terminological confusions afflicting work in this area of discourse, but some term is needed to refer to the pretheoretical notions that Centering attempts to capture. Broadly speaking, we use the term ‘(discourse) topic’ to refer to the notion often referred to using the term ‘focus’ in psycholinguistics: the ‘most salient entity’ of a stretch of utterances. For good discussions of the notions of topic and focus, see Reinhart, 1981; Vallduvi, 1990; Portner and Yabushita, 1998; Gundel, 1998; Beaver, 2004.
2. Several classifications of utterances into transitions have been proposed in the literature. The one discussed here is a refinement, due to Brennan et al. (1987), of the classification originally proposed by Grosz et al. (1995).
3. A great many versions of Rule 2 have been proposed in the literature (Grosz et al., 1995; Brennan et al., 1987; Strube and Hahn, 1999; Kibble, 2001). The version presented here is due to Brennan, Friedman, and Pollard, hence the name Rule 2 (bfp).
4. For a survey of recent work in discourse annotation, see the proceedings of the 2004 workshop on discourse annotation, held at acl and edited by D. Byron and B. Webber.
5. The museum subcorpus extends the corpus collected to support the ilex and sole projects at the University of Edinburgh. ilex generates Web pages describing museum objects on the basis of the perceived status of its user’s knowledge and of the objects she previously looked at (Oberlander et al., 1998). The sole project extended ilex with concept-to-speech abilities, using linguistic information to control intonation (Hitzeman et al., 1998).
6. The leaflets in the pharmaceutical subcorpus are a subset of the collection of all patient leaflets in the UK which was digitized to support the iconoclast project at the University of Brighton, developing tools to support multilingual generation (Scott et al., 1998).
7. Readers interested in trying the scripts can access them from the page http://cswww.essex.ac.uk/staff/poesio/cbc/.
8. Detailed guidelines were later developed (Gundel, 2003).
9. Not all cells of our contingency table contain at least 5 elements, which increases the chances of incorrectly rejecting the Null Hypothesis, i.e., committing a Type 1 error (Woods et al., 1986). A variety of ways of reducing the dimensionality of the contingency table will be considered.
10. Many utterances are non-sentential, even with this configuration. The two main cases of non-sentential utterances are titles, which often are constituted of a single np (e.g., Jewelry); and instructions in imperative form (e.g., Don’t forget to contact your doctor if you have any problem).
11. There is no overlap between the texts used in this study and the texts used for the Hitzeman/Poesio study, which were spoken dialogues.
12. Overall, with this definition of utterance, 1,158 anaphoric expressions have their antecedents in the current or previous utterance, and 455 at a distance 2–6; none is farther away.
13. With this definition, 1,242 have their antecedent in the same or previous utterance; for 385 it is farther away.
REFERENCES
André, E., M. Poesio, and H. Rieser, editors. Proceedings of the ESSLLI Workshop on Deixis, Demonstration and Deictic Belief in Multimedia Contexts, Utrecht, 1999. University of Utrecht.
Ariel, M. Accessing Noun-Phrase Antecedents. Croom Helm Linguistics Series. Routledge, 1990.
Asher, N. Reference to Abstract Objects in English. D. Reidel, Dordrecht, 1993.
Beaver, D. The optimization of discourse anaphora. Linguistics and Philosophy, 27(1):3–56, 2004.
Brennan, S. E., M. W. Friedman, and C. J. Pollard. A centering approach to pronouns. In Proceedings of the 25th ACL, pages 155–162, Stanford, CA, June 1987.
Chafe, W. L. The Pear Stories: Cognitive, Cultural and Linguistic Aspects of Narrative Production. Ablex, Norwood, NJ, 1980.
Di Eugenio, B. Centering theory and the Italian pronominal system. In Proceedings of the 13th COLING, Helsinki, Finland, 1990.
——. Centering in Italian. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 7, pages 115–138. Oxford, 1998.
Fox, B. A. Discourse Structure and Anaphora. Cambridge University Press, Cambridge, UK, 1987.
Garrod, S. C. Resolving pronouns and other anaphoric devices: The case for diversity in discourse processing. In C. Clifton, L. Frazier, and K. Rayner, editors, Perspectives in Sentence Processing. Lawrence Erlbaum, 1993.
Givon, T., editor. Topic continuity in discourse: A quantitative cross-language study. J. Benjamins, Amsterdam and Philadelphia, 1983.
Gordon, P. C., B. J. Grosz, and L. A. Gilliom. Pronouns, names, and the centering of attention in discourse. Cognitive Science, 17:311–348, 1993.
Grice, H. P. Logic and conversation. In P. Cole and J. Morgan, editors, Syntax and Semantics, vol. 3: Speech Acts, pages 41–58. Academic Press, New York, 1975.
Grosz, B. J., and C. L. Sidner. Attention, intention, and the structure of discourse. Computational Linguistics, 12(3):175–204, 1986.
Grosz, B. J., A. K. Joshi, and S. Weinstein. Centering: A framework for modeling the local coherence of discourse. Computational Linguistics, 21(2):202–225, 1995. (The paper originally appeared as an unpublished manuscript in 1986.)
——. Providing a unified account of definite noun phrases in discourse. In Proceedings of ACL-83, pages 44–50, Cambridge, MA, 1983.
Gundel, J. K. Coding protocol for statuses on the givenness hierarchy. Unpublished manuscript, June 2003.
——. Centering theory and the givenness hierarchy: Towards a synthesis. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 10, pages 183–198. Oxford University Press, 1998.
Gundel, J. K., N. Hedberg, and R. Zacharski. Cognitive status and the form of referring expressions in discourse. Language, 69(2):274–307, 1993.
Hitzeman, J., and M. Poesio. Long-distance pronominalisation and global focus. In Proceedings of ACL/COLING, vol. 1, pages 550–556, Montreal, 1998.
Hitzeman, J., A. Black, P. Taylor, C. Mellish, and J. Oberlander. On the use of automatically generated discourse-level information in a concept-to-speech synthesis system. In Proceedings of the International Conference on Spoken Language Processing (ICSLP98), page 591, Australia, 1998.
Jarvella, R. J., and W. Klein, editors. Speech, Place and Action: Studies in Deixis and Related Topics. John Wiley, Chichester and New York, 1982.
Joshi, A. K., and S. Kuhn. Centered logic: The role of entity centered sentence representation in natural language inferencing. In Proceedings of IJCAI, pages 435–439, Tokyo, 1979.
Joshi, A. K., and S. Weinstein. Control of inference: Role of some aspects of discourse structure–centering. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 385–387, Vancouver, CA, 1981.
Kameyama, M. A property-sharing constraint in centering. In Proceedings of the 24th ACL, pages 200–206, New York, NY, 1986.
——. Intra-sentential centering: A case study. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 6, pages 89–112. Oxford, 1998.
——. Zero Anaphora: The case of Japanese. Ph.D. thesis, Stanford University, Stanford, CA, 1985.
Kameyama, M., R. Passonneau, and M. Poesio. Temporal centering. In Proceedings of the 31st ACL, pages 70–77, Columbus, OH, 1993.
Kaplan, D. Dthat. In P. Cole, editor, Syntax and Semantics, vol. 9: Pragmatics, pages 221–243. Academic Press, New York, 1979.
Karamanis, N. Entity coherence for descriptive text structuring. Ph.D. thesis, University of Edinburgh, Informatics, 2003.
Kibble, R. A reformulation of Rule 2 of Centering Theory. Computational Linguistics, 27(4):579–587, 2001.
Li, C. N. Subject and Topic. Academic Press, New York, 1976.
Linde, C. Focus of attention and the choice of pronouns in discourse. In T. Givon, editor, Syntax and Semantics 12. Academic Press, 1979.
Mann, W. C., and S. A. Thompson. Rhetorical Structure Theory: Towards a functional theory of text organization. Text, 8(3):243–281, 1988.
Miltsakaki, E. Towards an aposynthesis of topic continuity and intrasentential anaphora. Computational Linguistics, 28(3):319–355, 2002.
Oberlander, J., M. O’Donnell, A. Knott, and C. Mellish. Conversation in the museum: Experiments in dynamic hypermedia with the intelligent labelling explorer. New Review of Hypermedia and Multimedia, 4:11–32, 1998.
Passonneau, R. J. Getting and keeping the center of attention. In M. Bates and R. M. Weischedel, editors, Challenges in Natural Language Processing, chapter 7, pages 179–227. Cambridge University Press, 1993.
——. Interaction of discourse structure with explicitness of discourse anaphoric noun phrases. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 17, pages 327–358. Oxford University Press, 1998.
Passonneau, R. J., and D. Litman. Intention-based segmentation: reliability and correlation with linguistic cues. In Proceedings of the 31st Annual Meeting of the ACL, pages 148–155, Columbus, OH, 1993.
Poesio, M. Discourse annotation and semantic annotation in the gnome corpus. In Proceedings of the ACL Workshop on Discourse Annotation, pages 72–79, Barcelona, July 2004a.
——. The MATE/gnome scheme for anaphoric annotation, revisited. In Proceedings of SIGDIAL, Boston, May 2004b.
Poesio, M., and N. N. Modjeska. Focus, activation, and this-noun phrases: An empirical study. In A. Branco, R. McEnery, and R. Mitkov, editors, Anaphora Processing, pages 429–442. John Benjamins, 2005.
Poesio, M., R. Stevenson, B. Di Eugenio, and J. M. Hitzeman. Centering: A parametric theory and its instantiations. Computational Linguistics, 30(3):309–363, 2004.
Portner, P. H., and K. Yabushita. The semantics and pragmatics of topic phrases. Linguistics and Philosophy, 21:117–157, 1998.
Prasad, R., and M. Strube. Discourse salience and pronoun resolution in Hindi. In Penn Working Papers in Linguistics, vol. 6, pages 189–208, 2000.
Prince, E. F. The ZPG letter: Subjects, definiteness, and information status. In S. Thompson and W. Mann, editors, Discourse Description: Diverse Analyses of a Fund-Raising Text, pages 295–325. John Benjamins, 1992.
——. Subject-prodrop in Yiddish. In P. Bosch and R. van der Sandt, editors, Focus: Linguistic, Cognitive, and Computational Perspective, pages 82–104. Cambridge, 1998.
Rambow, O. Pragmatic aspects of scrambling and topicalization in German. In Proceedings of the Workshop on Centering Theory in Naturally-Occurring Discourse, Philadelphia, 1993. Institute for Research in Cognitive Science (IRCS).
Reinhart, T. Pragmatics and linguistics: An analysis of sentence topics. Philosophica, 27(1), 1981. Also distributed by Indiana University Linguistics Club.
Scott, D., R. Power, and R. Evans. Generation as a solution to its own problem. In Proceedings of the 9th International Workshop on Natural Language Generation, Niagara-on-the-Lake, CA, 1998.
Siegel, S., and N. J. Castellan. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill, 2nd edition, 1988.
Strube, M., and U. Hahn. Functional centering–grounding referential coherence in information structure. Computational Linguistics, 25(3):309–344, 1999.
Turan, U. Ranking forward-looking centers in Turkish: Universal and language-specific properties. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 8, pages 139–160. Oxford University Press, 1998.
Vallduvi, E. The Informational Component. Ph.D. thesis, University of Pennsylvania, Philadelphia, 1990.
Walker, M. A. Evaluating discourse processing algorithms. In Proceedings of ACL89, pages 251–261, Vancouver, CA, June 1989.
——. Centering, anaphora resolution, and discourse structure. In M. A. Walker, A. K. Joshi, and E. F. Prince, editors, Centering Theory in Discourse, chapter 19, pages 401–435. Oxford University Press, 1998.
Walker, M. A., M. Iida, and S. Cote. Japanese discourse and the process of centering. Computational Linguistics, 20(2):193–232, 1994.
Walker, M. A., A. K. Joshi, and E. F. Prince, editors. Centering Theory in Discourse. Clarendon Press/Oxford, 1998.
Webber, B. L. Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes, 6(2):107–135, 1991.
Woods, A., P. Fletcher, and A. Hughes. Statistics in Language Studies. Cambridge University Press, 1986.
10
Looking Both Ways: The JANUS Model of Noun Phrase Anaphor Processing
Alan Garnham and H. Wind Cowles
Introduction In 1996 one of us (Garnham, 1996) suggested that, in the domain of language processing, the mental models “theory” was better viewed as a framework in which detailed accounts of specific processes might be developed, rather than as a theory of language processing itself. In addition, Garnham (1996) suggested that the mental models framework had some of the characteristics of what Marr (1982) called a computational theory. By “computational theory,” Marr meant a formal specification of the (mathematical) function that a cognitive system computes and an account of why it computes that function. Such a theory is typically derived, at least in part, from a formal analysis of the task (or tasks) that the cognitive system in question performs. So, in the domain of language comprehension, task analysis suggests that one part of that process is to extract information from discourse or text, and that this information typically takes the form of a specification of a situation, or set of situations, in the real world, an imaginary world, or an abstract domain. Situations comprise entities, their properties, and relations between them, though there has been an intense debate in formal semantics about which of these components are “basic semantic types.” However, everyone agrees that entities are a basic semantic type and that some linguistic expressions
(though again there is a debate about which expressions) refer directly to entities. It is usual in discourse or text to refer to entities more than once and it is an empirical observation (more of which later) that second and later references typically use different referring expressions from initial references. The linguistic expressions that are usually taken to refer to entities are noun phrases (NPs) of various kinds (full definite NPs, with or without modifiers, demonstratives of various kinds, and pronouns), and at least for second and later references these NPs are usually definites. In this chapter, we take for granted the basic tenets of the mental model theory of comprehension, and language processing more generally, in particular that texts are about situations, that entities are components of situations, and that linguistic expressions refer and re-refer to entities. We further assume that clauses refer to eventualities (events, states, changes of state, and processes) and that relations between individuals and between eventualities are expressed by a mixture of implicit and explicit means. We attempt to develop, within that framework, a detailed account of coreferential NP anaphora. Thus, JANUS is a model of coreferential NP anaphora within the mental model framework. The fundamental assumption of the JANUS model is that in order to understand how coreferential NP anaphora works, whether in production or in comprehension, it is necessary to look both ways1 in the text: backward to what has gone before and forward to what is yet to come. We will describe in detail later what this looking back and looking forward entails. A crucial issue, which is determined by the processes of looking forward and looking back, is the form of the anaphoric expression itself: how is it chosen in production and, given a particular form, how easy is that form to understand in comprehension?
Background Within psycholinguistics there has been a great deal of interest in coreferential NP anaphora, with attention primarily focused on definite pronominal anaphora and, to a lesser extent, on full definite noun phrase anaphora (see Garnham, 1987, 2001, for an overview of this work). The question, therefore, naturally arises as to what kinds of accounts have been developed to date, and why a new one is needed. The answer is that, for the most part, a somewhat unsystematic approach has been taken within psycholinguistics itself. A range of factors affecting coreferential NP anaphor processing has been identified, but only partial attempts have been made to systematize these factors. Clearly morphosyntactic factors influence the interpretation of coreferential NP anaphors: number and gender are important, and case, even though it is determined by the role of the anaphor in its own clause, could also play a role. Similarly, at least within a sentence, constraints usually defined syntactically in terms of c-command or related notions operate (see, e.g., Reinhart, 1981). It should be noted, however, that there have been attempts to explain these effects semantically (in terms of function-argument relations) rather than syntactically (e.g., Bach and Partee, 1980; Keenan and Faltz, 1985).2
Turning to so-called discourse-level factors, a distinction can also be drawn in this domain between factors that are structure based and those that are meaning based. However, in this domain, the two types of factor appear to be distinct, and not merely different perspectives on the same basic influence (as in the case of the within-clause constraints on backward anaphora). In this domain, the influence of meaning-based factors (i.e., their effects on processing) is typically associated with knowledge about the world, and such factors ensure that anaphors are preferentially interpreted so as to produce plausible interpretations. For example, in (1) “he” is usually taken to refer to John, because someone who is confessing something is likely to be a candidate for needing a prison sentence to be reduced.
(1) John confessed to Bill because he wanted a reduced sentence.
Within the literature, there is no real agreement about what the structural effects at the discourse level are or how they are related to other types of effect. One view is that the knowledge-based effects are the most important ones. This view is related to a hypothesis propounded by Hobbs (1979) that coreference relations often fall out of the computation of coherence relations. This view is, in turn, related to one proposed by Stevenson, Nelson, and Stenning (1995) that (“heuristic”) structural strategies, such as subject assignment, are only used when knowledge-based strategies (including gender matching) fail to produce a referent for an NP anaphor. There are also effects that appear to be structural but may actually be determined by information structure rather than syntactic structure. So, for example, what is often referred to as a preference for sentential subjects as antecedents for definite pronouns might really be a preference for local topics, or locally focused items, in the psychological rather than the linguistic sense (Arnold, 1998; Cowles, 2003).
Indeed, such effects are related to what is called local focusing in the Grosz and Sidner framework (see e.g. Grosz and Sidner, 1986). Local focusing effects of the kind defined in the Grosz and Sidner framework are described in Centering Theory, which is discussed in more detail later. More generally, both local and global focusing effects are important in coreferential NP anaphor processing, and have been studied in the psycholinguistic literature. In that literature, global focusing effects typically come under the head of foregrounding. Lesgold, Roth, and Curtis (1979) first clearly demonstrated that the effect of the distance between a noun phrase anaphor such as “the forest” and its antecedent was largely or entirely associated with changes in (psychological) focus in the intervening material. They compared noun phrase anaphors in passages in which the material between the antecedent and the anaphor either kept the referent foregrounded, as in (2), or did not, as in (3), or in which there were no intervening clauses (2b and 2c or 3b and 3c omitted).
(2a) A thick cloud of smoke hung over the forest.
(2b) The smoke was thick and black, and began to fill the clear sky.
(2c) Up ahead Carol could see a ranger directing the traffic to slow down.
(2d) The forest was on fire.
(3a) A thick cloud of smoke hung over the forest.
(3b) Glancing to the side, Carol could see a bee flying around the back seat.
(3c) Both of the kids were jumping around, but made no attempt to free the insect.
(3d) The forest was on fire.
The final sentence, containing the anaphoric noun phrase “the forest,” was read more slowly in (3) than in (2). Furthermore, the reading time for (2d) was not significantly faster when the intervening sentences (2b) and (2c) were omitted. Comparable studies with definite pronominal anaphors are more difficult to carry out, because with distant defocused antecedents such anaphors are usually inappropriate and may be either incomprehensible (in the sense of having no clear referent) or naturally taken as referring to a closer antecedent.
In the psycholinguistic literature, the evidence relating to more local effects is somewhat confused, though it has been clarified to some extent in recent years. “Local” coreference is typically, but not always, achieved by the use of definite pronouns. As has already been indicated, there is a long-standing claim that such anaphors prefer to have as their antecedents the grammatical subjects of preceding clauses. However, preference for antecedents in subject position is often conflated with other things. For example, when the pronoun is the subject of its clause, preference for subjects is conflated with a preference for parallelism of grammatical function (Sheldon, 1974), in this case between anaphor and antecedent. And if the antecedent is in a simple active main clause, preference for subjects is often conflated with preference for first mentioned entities (Gernsbacher and Hargreaves, 1988, 1992). Gernsbacher (e.g., 1990) argues that first mention effects are predicted by her structure-building framework, and has presented empirical evidence for very strong effects of this kind, even when first mention is not confounded with other factors such as subjecthood. However, the very strong effects are all reported from studies using a probe recognition task, in which people are asked to judge whether a word presented after a sentence or passage has been read occurred in that passage.
Gordon, Hendrick, and Foster (2000) have shown that such effects might arise from a strategy of maintaining in working memory a list of words that are likely to be probed, in the order that they appear in the sentence. On this view, first mention effects are primacy effects in list recall. Therefore, from the point of view of text comprehension, there is no good evidence that entities mentioned in preposed phrases are favored over grammatical subjects as antecedents for coreferential NP anaphors. Indeed, there is some evidence (from both an offline antecedent choice task and an online naming task) suggesting that grammatical subjects are preferred as antecedents over entities in such preposed phrases (Cowles, 2003).
The relation between parallel function and subject preference has been studied in more detail by Stevenson et al. (1995) and by Smyth (1992, 1994; Chambers and Smyth, 1998). Both sets of authors showed that preferences for parallel grammatical function of antecedent and anaphor are enhanced by strict syntactic parallelism between the two clauses. Stevenson et al. argue that subject preference and the preference for parallelism are separate heuristic strategies that are weighted differently in different situations. Stevenson et al.’s idea that subject preference is a fall-back strategy, used when content-based strategies and other factors do not help with anaphor resolution, may go some way to explaining why it is not reliably found, for example in reading time studies. Preferences for antecedents that are grammatical objects, often close antecedents, are sometimes reported. The first author has long suspected that factors such as animacy and personhood are important here. For example, in parts of narratives where the action is moving forward, and where there is a single main character, that character is likely to be the local topic and the grammatical subject of most sentences. And it will often be referred to pronominally. Inanimate objects and non-human animals (except where personified) often play different types of roles in narrative and these roles may affect where an anaphor preferentially finds its antecedent. Stevenson’s own view (see Stevenson, Crawley, and Kleinman, 1994) is that the preferred antecedents for pronouns are determined not by grammatical roles but by thematic roles, and that those preferences are modulated by relations among clauses expressed by connectives such as “and,” “but,” “because,” and “so.” This view is related to the previously mentioned notion, propounded by Hobbs, that coherence often determines coreference.
It can also be seen as a precursor of our own view, discussed in more detail later, that to understand the functioning of coreferential NP anaphors it is necessary to look forward as well as backward in the text. The parallel function strategy might also be connected to a discourse or coherence relation of parallelism. The parallelism between eventualities will be most striking when everything about their descriptions, and in particular their syntactic structure and grammatical/thematic roles, is also parallel.
How can order be imposed on the rather confusing view of the factors governing coreferential NP anaphora that comes from the psycholinguistic literature? In the next sections we will look at two attempts to say something more systematic about this type of anaphora, and judge how well they succeed.
Centering Theory
Centering Theory (Grosz, Joshi, and Weinstein, 1995; Walker, Joshi, and Prince, 1998) is an account of local focusing and its effects on coreferential NP anaphora, developed within the overall framework suggested by Grosz and Sidner (1986). On this view, discourses are divided into segments, with coherence relations holding between the segments. These coherence relations define the global structure of the discourse and explain how and where global focus changes. Within a discourse segment, local coherence must be maintained. Attentional focus changes from moment to moment and determines, among other things, the most appropriate way of referring to the entities that are mentioned in the discourse.
According to Centering Theory, each utterance has a set of forward-looking centers (of attention). These centers are entities that are likely to be referred to again in the upcoming part of the discourse, and they are partially ordered, in terms of how likely they are to be talked about immediately. Each utterance also has one backward-looking center (Cb), the current center of attention, which should be one of the forward-looking centers of the previous utterance. Centers can be continued, retained, or shifted. Shifts can be rough or smooth. These transitions are defined in terms of whether the backward-looking center remains the same from one utterance to the next (continuation and retention) or changes (smooth and rough shift), and whether the current backward-looking center is the same as the highest-ranked forward-looking center (the preferred center) (continuation and smooth shift) or different (retention and rough shift). The preferred order of transitions, with the more (locally) coherent transitions being given before the less coherent ones, is as follows: continuation, retention, smooth shift, rough shift.
Centering Theory has one rule about the form that a coreferential NP anaphor should take: the so-called pronominalization (or pronoun) rule. This rule states that if any element of an utterance (strictly any element of the set of forward-looking centers of the previous utterance) is pronominalized, the backward-looking center of the current utterance must be. Centering Theory thus makes the strong assumption that pronominalization requires the antecedent of the pronoun to be explicitly mentioned.
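The four transition types and the pronoun rule described above can be illustrated with a minimal Python sketch. It is not taken from the Centering Theory literature: the function names, the representation of centers as plain strings, and the simplifying assumption that the forward-looking centers are given as a fully ordered list (so that the first element is the single preferred center) are all hypothetical choices made for illustration.

```python
from typing import Iterable, Sequence

# Preferred order of transitions, most locally coherent first (as listed in the text).
TRANSITION_PREFERENCE = ("continuation", "retention", "smooth shift", "rough shift")


def classify_transition(cb_prev: str, cb_curr: str, cf_curr: Sequence[str]) -> str:
    """Classify the transition between two adjacent utterances.

    cb_prev -- backward-looking center of the previous utterance
    cb_curr -- backward-looking center of the current utterance
    cf_curr -- forward-looking centers of the current utterance, highest-ranked
               first (assumption: no ties, so cf_curr[0] is the preferred center)
    """
    same_cb = cb_curr == cb_prev
    cb_is_preferred = cb_curr == cf_curr[0]
    if same_cb:
        return "continuation" if cb_is_preferred else "retention"
    return "smooth shift" if cb_is_preferred else "rough shift"


def obeys_pronoun_rule(cb_curr: str, pronominalized: Iterable[str],
                       cf_prev: Sequence[str]) -> bool:
    """Check the pronoun rule for one utterance.

    Only pronominalized entities that belong to the previous utterance's
    forward-looking centers count; if any such entity exists, the current
    backward-looking center must be among them.
    """
    relevant = set(pronominalized) & set(cf_prev)
    return not relevant or cb_curr in relevant


if __name__ == "__main__":
    print(classify_transition("John", "John", ["John", "Bill"]))
    print(classify_transition("John", "Bill", ["John", "Bill"]))
    print(obeys_pronoun_rule("John", {"Bill"}, ["John", "Bill"]))
```

Because the rule is stated over the previous utterance's forward-looking centers, a pronoun whose referent was never explicitly mentioned is simply ignored by `obeys_pronoun_rule` in this sketch.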
However, there is some empirical evidence (e.g., Cornish et al., 2005) that non-Cb pronouns without explicit antecedents are sometimes both acceptable and readily interpretable. Grosz et al. (1995) do consider this possibility: something that is not explicitly mentioned is not part of the set of forward-looking centers of the previous utterance, and so referring to that entity using a pronoun is not, strictly speaking, a violation of the pronoun rule.
One obvious question raised by Centering Theory is what determines the (partial) order of the ranking of the forward-looking centers for any particular utterance. The theory itself, as outlined so far, does not specify the factors that determine the ordering. Brennan, Friedman, and Pollard (1987) suggested, in accounting for the facts about pronominalization, that grammatical subjects are ranked higher than grammatical objects, which in turn are ranked higher than indirect objects. Anything else is ranked below entities introduced in these grammatical positions. However, they do not believe that this scheme represents the final word on the ordering of forward-looking centers in Centering Theory. For example, it may be that thematic roles rather than, or as well as, grammatical roles affect the ordering of forward-looking centers.
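The grammatical-role ranking attributed above to Brennan, Friedman, and Pollard (1987) can be sketched as follows. This is only an illustration under stated assumptions: the role labels, the `ROLE_RANK` table, and the function name are hypothetical, and entities that share a role are treated as tied, which is one simple way of representing a partial rather than a total order.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# Assumed role labels; lower number means higher rank.
ROLE_RANK = {"subject": 0, "object": 1, "indirect_object": 2}
OTHER_RANK = 3  # anything else ranks below the three roles above


def _rank(entity: Tuple[str, str]) -> int:
    """Rank of a (name, role) pair under the grammatical-role scheme."""
    return ROLE_RANK.get(entity[1], OTHER_RANK)


def rank_forward_centers(entities: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Return tiers of entity names, highest-ranked tier first.

    Entities with the same role land in the same tier (a tie), keeping
    their order of mention within the tier because sorted() is stable.
    """
    ordered = sorted(entities, key=_rank)
    return [[name for name, _ in tier] for _, tier in groupby(ordered, key=_rank)]


if __name__ == "__main__":
    utterance = [("Bill", "object"), ("John", "subject"),
                 ("the park", "adjunct"), ("Mary", "indirect_object")]
    for tier in rank_forward_centers(utterance):
        print(tier)
```

Swapping `ROLE_RANK` for a table keyed on thematic roles would give the alternative ordering mentioned at the end of the paragraph, without changing the rest of the sketch.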
If one takes psycholinguistic data on how pronouns are interpreted, and how easy particular interpretations are to process, as indicators of ranking in the set of forward-looking centers for the antecedent-containing clause, several problems arise. For example, if there are genuine cases of object preference, as opposed to subject preference, on the natural interpretation of Centering Theory, there must be something about the antecedent-containing clause that induces a different-from-normal order of the centers. More dramatically, the interactions between antecedent-containing clause type (as determined by, say, the thematic grid for that clause) and connective between antecedent-containing and anaphor-containing clauses seem hard to incorporate within Centering Theory, because the ordering of centers has to be determined by the clause in which those centers occur. For example, Stevenson et al. (1994) suggest that the entity with thematic role X may be preferred over that with role Y as the antecedent of a pronoun when the connective to the anaphor-containing clause is “because” introducing a causal clause, whereas Y may be preferred over X when “and so” leads to a consequence (where X and Y are the thematic roles of the two main protagonists in the antecedent-containing clause). Whether these effects can be explained in terms of types of transition has yet to be seen.
Finally, most writings on Centering Theory suggest that each utterance has one highest-ranked forward-looking center. However, given that the ranking of forward-looking centers is partial, this need not be the case. Two or more entities might tie for highest-ranked center. The notion of a highest-ranked forward-looking center is linked to the stronger claim (e.g., Greene, McKoon, and Ratcliff, 1992) that at any point in a well-written discourse there is only one entity that can be referred to pronominally, and that if a pronoun is used in other circumstances, it will not be interpreted.
Gordon and Scearce (1995) have argued that the results Greene et al. use to support their claim are based on a methodological artifact. And it is certainly true that there are entirely straightforward passages where there are two or more possible candidates for pronominal reference, before the continuation of the passage chooses between them, as in (1), repeated here as (4a), which should be compared with (4b).
(4a) John confessed to Bill because he wanted a reduced sentence.
(4b) John confessed to Bill because he offered a reduced sentence.
The Informational Load Hypothesis
Although Centering Theory draws on psychological notions such as attention and intention, and has inspired some psycholinguistic research, it is rooted in computational linguistics. In addition, its scope is narrow. Its primary aim is to account for local focusing phenomena, within a discourse segment, and almost inevitably the type of coreferential NP anaphora that it has most to say about is definite pronominal anaphora. A more recent theory,
with aims more closely related to our own, is Almor’s (1999) Informational Load Hypothesis (ILH). The ILH is an account of the processing of coreferential NP anaphors in discourse. In one sense the ILH gives a uniform account of coreferential NP anaphora, in that all instances are governed by the same general principles. However, Almor stresses that NP anaphors do not behave uniformly. Their interpretation is affected by their processing cost (defined in terms of the activation of semantic information) and their discourse function (identifying the antecedent and/or adding new information about it). Almor’s starting point, in formulating the ILH, is Grice’s (1975) maxim of quantity: a speaker’s contribution (in this case the form of a coreferential NP anaphor) should be as informative as is required, but no more so. In fact, Almor’s interpretation of this Gricean maxim is similar to that of Relevance Theory (Sperber and Wilson, 1995). Speakers (and writers) do not intentionally follow a cooperative principle or its submaxims, but the architecture of the language processing system makes them appear to be following such a principle. Almor’s view, then, is that any processing cost associated with interpreting an anaphoric expression must be justified by the way it aids identification of the antecedent or the way it adds new information about the referent. From a more strictly Gricean perspective, an (apparently) overspecific anaphoric expression should give rise to an implicature. As we discuss later, such an implicature might be to the effect that there is a change in perspective on the referent of the anaphoric expression. As will emerge later in this chapter, we are in general agreement with Almor at this level of description. 
However, we believe he is incorrect in his account of what contributes to the processing cost of an anaphoric expression, and that he takes too narrow a view both of the processes of identifying the antecedent of an anaphoric expression and of what constitutes “adding new information.” In Almor’s view, processing anaphoric expressions makes use of verbal working memory and, more particularly, requires information to be both temporarily stored in that memory and worked on, or processed. Although there are different views about the nature of working memory and, indeed, about whether it is functionally and anatomically separate from long-term memory, no one would deny, and we certainly would not, that the simultaneous short-term storage and processing of information is an essential part of language processing in general, and of anaphor processing in particular. Almor feels he can remain agnostic about whether the classic Baddeley and Hitch (1974) view of working memory is correct. Nevertheless, he argues, by analogy with the well-established phonological interference effect in working memory (see Baddeley, 1992, for a summary), that holding semantically similar items in working memory should be more difficult than holding semantically dissimilar items in working memory and, indeed, the greater the semantic similarity, the greater the difficulty. In linking an anaphoric expression with its antecedent this effect works against another well-established effect: the more semantically similar one item is to another, the better a cue
it will be for retrieving that item from memory. Almor is therefore led to the view that where an anaphor’s antecedent is difficult to access from memory, perhaps because it was mentioned a good while ago and it is not currently salient (or “in focus”), then processing difficulty caused by overlap of content (say between the anaphor “the bird” and the antecedent “the robin”) will be justified. However, when the antecedent is readily available, the overlap will not be justified, and indeed will cause problems. These considerations lead to the prediction of an inverse typicality effect for focused antecedents (“. . . the ostrich . . . the bird . . .” should be easier than “. . . the robin . . . the bird . . .”) because the atypical category member overlaps less with the category term than does the typical category member. Almor (1999: Experiment 5) provides evidence for this effect when the antecedents are focused by clefting. Cowles and Garnham (2005) showed a parallel effect in category hierarchies (e.g., reptile-snake-cobra) and also showed that more straightforward ways of focusing the antecedent than clefting, such as making it a grammatical subject, produced similar effects.
Is there any evidence for semantic interference in working memory? Almor, Kempler, MacDonald, Andersen, and Tyler (1999) provide evidence from Alzheimer patients that working memory plays an important role in pronoun resolution, but their evidence does not directly demonstrate semantic interference. Baddeley (1966) reported a small 6% reduction in memory performance for semantic overlap, compared with a 72% effect for phonological overlap. In two recent experiments with more carefully designed stimuli, carried out in collaboration with Jools Simner, we found effects of 18% and 25% for phonological similarity, but no effect at all (−2% and −1%) for semantic similarity.
In support of this finding, Haarmann and Usher (2001; Haarmann, Davelaar, and Usher, 2003) have argued that there are separate semantically based and phonologically based working memory systems. Furthermore, their evidence for the semantically based system comes from semantic facilitation effects in short-term memory. Haarmann et al. (2003) make the plausible argument that the semantically based system is the more important one in text comprehension.
If there is no good evidence for semantic interference effects in working memory, how can inverse semantic distance effects be explained? In the ILH, standard semantic distance effects are eliminated because mapping onto a focused antecedent is a default that does not depend on the relation between the content of the anaphor and the content of the antecedent. They are inverted because, with no identification function to perform, semantic overlap just causes interference. In attempting to replicate Almor’s own result with a typicality manipulation (Cowles and Garnham, in press), we did not find consistent results, and might well have concluded that typicality effects were eliminated but not reversed. However, with category hierarchies, as reported earlier, we did find inverse effects. An alternative to Almor’s account is that processing unnecessary semantic information causes problems. On this view, Almor’s results, but not ours, suggest that “bird” is more unnecessary after “robin” than after “ostrich” (when the robin or the ostrich is salient),
and our results suggest that “reptile” is more unnecessary after “snake” than after “cobra.” We will return to this issue later, when we consider what use is made of redundant semantic information in an anaphor.
Turning to the broad outline of Almor’s ILH, Almor argues that “the ease of processing NP anaphors can be described by the interaction of three factors: discourse focus, the amount of new information added by the anaphor, and the informational load of the anaphor-antecedent pair” (1999: 753). We will consider each of these factors separately. By using the term “discourse focus,” Almor might seem to have had in mind what in the Grosz and Sidner framework is called global focus. However, Almor has relatively little to say about focus. For his experimental investigations he talks in terms of salience, and operationalizes focus as clefting of the antecedent in the sentence immediately prior to the one containing the anaphor. So, in the Centering Theory framework, it would appear that Almor is manipulating local focus in such a way that a clefted element appears either first or near the beginning of the set of partially ordered forward-looking centers. Indeed, the use of clefts, as opposed to simple active sentences in which the subject would typically be the preferred center, raises the question of whether ordering is the only thing that is important. Sentences such as (5) and (6) have the same set of forward-looking centers, the dog and the cat, presumably ordered in the same way (dog, cat), but in our judgment, the dog appears to be more salient in (6) than in (5), and hence is a more likely candidate for subsequent reference. In (5) two factors, order and syntactic prominence, contribute to salience. In (6) three factors contribute: order, syntactic prominence, and focal stress.
(5) The dog chased the cat.
(6) It was the dog that chased the cat.
The issue of focusing is a complex one and we will return to it in the context of the JANUS model. When Almor talks about anaphors adding new information, what he appears to have in mind is that a coreferential NP anaphor may have a more specific content than its antecedent, for example in the anaphor-antecedent sequence “. . . the bird . . . the robin. . . .” The effects of the relative specificity of antecedent and anaphor have been investigated by Garrod and Sanford (1977; Sanford and Garrod, 1980) and by Garnham (1981, 1984, 1989), though the two series of studies produced different results, with no resolution of the differences being offered. Almor did not study in detail cases where the anaphor is more specific than the antecedent. His reason was that he did not have any way to predict the relative sizes of different factors that he claimed should influence anaphor resolution in the two cases. Our own view, which we will develop in more detail later, is that the information content of an anaphoric NP needs to be considered not only in relation to the information content of its antecedent but also in relation to other information carried by the text, information about the referent of both the antecedent and the anaphor, but also about other potential antecedents for the anaphor.
We turn now to the most important idea in Almor’s theory or, at least, the one that gives it its name: informational load. Informational load is (a monotonic increasing function of) the semantic or conceptual difference between the anaphor and the antecedent. Conceptual difference (called C-difference by Almor because of its somewhat counterintuitive properties) is negative when the anaphor is more general (and hence has less conceptual content) than the antecedent. So for the typical case in which a more specific antecedent is followed by a more general anaphor (e.g., “. . . the robin . . . the bird . . .,” “. . . the robin . . . it . . .”) C-difference is greater (i.e., less negative) when the anaphor has more content and is less general (“the bird”) rather than when it has less content and is more general (“it”). Going back to Almor’s original conceptualization of anaphor resolution in terms of processing cost and its functional justification, informational load is a measure of cost. Justification is in terms of adding new information (if any is added) and identifying the antecedent. In Almor’s view, this is where focus plays a role. If there is a single strongly focused (highly salient) element in a text, an anaphoric NP will naturally map onto that element with little effort. Thus there is no justification for that anaphor having any content if it remains more general than the antecedent, since content will decrease its absolute C-difference from the antecedent and increase (its negative C-difference, and hence its) informational load.
As should be apparent from this discussion, in his exposition Almor considers only the relation between the anaphor and its actual antecedent in calculating informational load and hence, more neutrally, the processing cost of the anaphor. We believe that it is an interesting empirical question in what circumstances only one possible antecedent is considered. Our view is considerably more complex than Almor’s.
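The sign conventions in this description of C-difference can be made concrete with a toy Python sketch. It is a deliberate simplification and not Almor's actual measure: the `CONTENT` scores are invented specificity values, C-difference is taken to be a simple subtraction, and the identity function stands in for the unspecified monotonic increasing function.

```python
# Hypothetical specificity scores: a higher score means more conceptual content.
# These numbers are invented for illustration and are not from Almor (1999).
CONTENT = {"it": 0, "the animal": 1, "the bird": 2, "the robin": 3}


def c_difference(anaphor: str, antecedent: str) -> int:
    """Toy C-difference: content of the anaphor minus content of the antecedent.

    Negative whenever the anaphor is more general than its antecedent.
    """
    return CONTENT[anaphor] - CONTENT[antecedent]


def informational_load(anaphor: str, antecedent: str) -> float:
    """Toy informational load: a monotonic increasing function of C-difference.

    The identity function is used here purely as the simplest such function.
    """
    return float(c_difference(anaphor, antecedent))


if __name__ == "__main__":
    for anaphor in ("it", "the bird"):
        print(anaphor, c_difference(anaphor, "the robin"),
              informational_load(anaphor, "the robin"))
```

In this toy setting both anaphors have a negative C-difference from “the robin,” but “the bird” is less negative than “it” and therefore carries the higher load, which is the pattern the paragraph describes. The sketch says nothing about typicality, which would need a graded measure of semantic distance rather than a single specificity score.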
We believe that which possible antecedents are considered depends on, possibly among other things, the type of anaphoric expression, what has already been mentioned in the text, and the extent to which one or more of those things is particularly salient when the anaphoric expression is encountered. In this respect our view is more akin to that of Ariel (1990), who identifies competition (from other potential antecedents) as one of four factors influencing anaphoric reference. Rather than considering competitor antecedents, Almor focuses on a “matching” process that establishes the link between the anaphor and its actual antecedent. For focused antecedents, this process relies relatively little on the content of the anaphor and the antecedent, since a focused antecedent is a default referent. Here Almor’s implicit assumptions about focus seem broadly compatible with Centering Theory. However, because full noun phrases are often used to refer to referents that are out of focus, such anaphors cannot ignore every possible antecedent except the most active/salient/focused one. Also, with the additional conceptual information that many full noun phrase anaphors contain, correct resolution cannot simply depend on mapping to the focused antecedent. The additional information may help uniquely identify an antecedent, but by the same token, it also places greater
constraints on what the antecedent may be. Thus, the influence/relevance of competitor antecedents should not be lightly dismissed. Garnham (1989) provides some evidence that the presence of competitor antecedents influences processing under certain circumstances. Our overall view of the ILH is that it represents an interesting attempt to treat all instances of coreferential NP anaphor within a single framework, and to identify the factors that influence the ease with which such anaphors are interpreted. However, we believe it to have a number of shortcomings, which we attempt to rectify in the JANUS framework, to which we now turn.
JANUS—Looking Both Ways
Like Almor, in formulating JANUS we start from a functional perspective, in fact the same functional perspective that Almor started from, namely, the assumption that the form and content of an anaphoric expression can be explained by the need to link it to its antecedent and establish its referent and the possibility of adding new information about the referent. However, as will become apparent, we take a broader view of “adding new information” than Almor did. Although we suspect that in reality the workings of anaphoric reference cannot always be explained functionally, we believe it makes sense to formulate, in the first instance, a strong and testable theory that says that they can. So we assume that the content of an anaphoric expression is no more and no less than is required to do the work that it is intended to perform. The question is: what is that work?
The basic principle of the JANUS model is that anaphoric expressions have two types of function that will influence both how they are produced by speakers and how they are resolved by comprehenders:
• Those that involve “looking” “backward” to the previous text.
• Those that involve “looking” “forward” to the upcoming text.
“Looking” is in scare quotes for reasons explained in note 1. Both “backward” and “forward” are also in scare quotes because the functions that we have in mind do not always strictly involve looking back to the previous text or forward to the upcoming text. For example, the “backward” functions are mainly concerned with identifying antecedents, and some types of anaphora (backward anaphora or cataphora) have “antecedents” that follow the anaphoric expression. And some of the information that people look back to is not, strictly speaking, in the text.
In addition to considering these two types of function we believe that a theory of anaphoric processing probably has to allow that different types of coreferential NP anaphor function in different ways.
This idea has been a common one in the history of research into anaphoric processing. For example, Sanford and Garrod (e.g., 1981) proposed that pronouns trigger a search for an antecedent among items in explicit focus only, whereas fuller noun phrases might find their antecedents in implicit focus. In theoretical accounts (e.g., Gundel et al., 1993; Ariel, 1992), different forms are
attributed to different cognitive statuses, which may also result in different processing responses. On the other hand, the ILH claims that such facts fall out of the interaction between focus, added information, and informational load, without having to ascribe specific functions to different types of anaphor. However, Almor does not give a detailed account of how this happens.
JANUS—Looking Back
JANUS is a model of coreferential NP-anaphor processing. So it should account for the choice of anaphoric expression in production, the possible interpretations of such expressions in comprehension, and the ease with which each interpretation can be made. Our focus here will be primarily on comprehension, because comprehension rather than production has been the subject of most psycholinguistic studies of anaphor processing. Given that a linguistic expression is the realization of a coreferential NP anaphor, the (usually) preceding linguistic and non-linguistic context relevant to its interpretation has to be determined by the comprehender, and its interpretation must then be derived (see Garnham, 1987). In fact, for some noun phrases it may not be apparent whether they are anaphoric until an attempt is made to interpret them anaphorically. So, the processes of deciding whether a noun phrase is an anaphor, and interpreting it correctly if it is, may well be interleaved.
The JANUS model predicts, from its basic functional assumption, that, as far as the looking back function is concerned, a coreferential anaphoric NP should have enough content to avoid indeterminacy of reference, though sometimes this will be in conjunction with other material in its own clause, as we discuss later. Cases such as “Jane talked to Sue and then she left” are straightforwardly predicted to be problematic, because neither the anaphor alone, nor the anaphor together with the rest of the material in its clause, provides referential determinacy. The ILH focuses on the actual antecedent in determining how easy an anaphoric noun phrase is to understand. However, there are clearly circumstances in which there is more than one possible antecedent (that the speaker or writer might refer to in production, or that the hearer or reader might take the anaphor to refer to in comprehension).
So, the content of the anaphoric expression must depend on what other potential antecedents are available and how they might be referred to. Any other content must be justified in other ways, and in particular in terms of the looking-forward function of the anaphor. Centering Theory and the ILH both assume that, at any point in a text, there is one entity that is most likely to be referred to. In Centering Theory this entity is the highest-ranked forward-looking center of the utterance that has just been completed. The ILH is vaguer on this issue, though Almor does assume that there is a (most highly) focused item that is the default referent for any coreferential NP anaphor. And as was previously mentioned, Greene et al. (1992) suggested that if there were not just one possible referent for a pronoun, that pronoun would not be interpreted.
Whatever the case for pronouns—and neither Centering Theory nor the ILH assumes that pronouns can only refer to the most salient entity at any point in a discourse—other coreferential NP anaphors can refer to less salient entities. The question therefore arises as to how a coreferential NP anaphor becomes associated with its actual antecedent and how the relations between the anaphor and that antecedent and the anaphor and other possible antecedents affect the mapping process. The partial ordering of forward-looking centers in Centering Theory and the assumption of, at least, a most salient entity in other frameworks suggest that there could be an ordered search through a set of possible antecedents. If this ordered search did take place, then when the most likely referent was the actual referent, only the actual referent and not other possible referents should affect the mapping process. However, even in the case of pronouns, there is evidence that potential referents that turn out not to be the actual referent are considered (e.g., Corbett and Chang, 1983; Gernsbacher, 1989). If at least some consideration is given to potential antecedents that turn out not to be the actual antecedent, the questions are: which ones are considered and how? There is general agreement that focus, in the psychological sense, is important in determining which entities are considered as candidates for anaphoric reference. Both psychological theories (e.g., Sanford and Garrod, 1981) and computational linguistic theories (e.g., Grosz, 1981; Sidner, 1983) make this claim. It is also implicit in linguistically based theories such as those of Ariel (1990) and Gundel, Hedberg, and Zacharski (1993). In the ILH, focus status simply affects the salience of potential referents and hence whether the information load produced by a particular anaphor, given its antecedent, is justified. In our view, this idea is too simplistic. 
Some types of anaphoric expression may preferentially seek nonfocused, or nonsalient, antecedents. We shall discuss such anaphors in more detail later. In terms of an ordered search, this type of effect might change the order in which potential antecedents are searched. Because Almor only considers the anaphor and its actual antecedent, he fails to consider these possibilities. From the point of view of production, the writer or speaker knows which entity he or she wishes to refer to, and has to choose an appropriate referring expression. The question in production is, therefore: to what extent does the writer or speaker consider potential confusions on the part of the listener or reader? It is obvious that gross confusions are avoided. If someone has been talking about John Brown and John Robinson, he or she is unlikely to try to refer to one of them just using the name “John.” But what about more subtle confusions? This question can be addressed empirically, for example in pencil-and-paper continuation tasks, though we have not done so. People can be asked to produce continuations of texts that introduce a number of different entities with overlapping semantic properties. By looking to see which potential confusions speakers and writers avoid in their choice of referring expressions in their continuations, it should be possible to determine what referents they think that their audience will consider for particular anaphoric expressions. Recent
work by Ferreira, Slevc, and Rogers (2005) suggests that speakers will notice and avoid conceptual ambiguities. For example, when faced with pictures of a large and small cassette tape, speakers will use adjectives to avoid ambiguity. However, speakers do not appear to notice lexical ambiguities caused by homophones: when presented with pictures of a roll of sticky tape and a cassette tape, speakers will use the unmodified word “tape” to refer to both. Likewise, studies of comprehension should be able to determine the potential referents that readers and listeners are actually considering. Probe word and cross-modal priming tasks can indicate that referents are or are not being considered. For these techniques to be applied usefully, it will be necessary to have clear hypotheses about which antecedents might be considered, and why. For example, if a pronoun refers to the most salient entity in recent discourse (the highest-ranked forward-looking center of the previous utterance), is there any evidence that other possible referents are considered? Is a probe related to a nonreferent primed? Is there an effect of whether the nonreferent does or does not have the same gender as the referent, in reading times? The issue of how other potential referents might be referred to has, in fact, been addressed indirectly in the literature on gender cueing in pronoun resolution (e.g., Carreiras, Garnham, and Oakhill, 1993; Garnham, Oakhill, Ehrlich, and Carreiras, 1995). Even when grammatical gender has no conceptual counterpart, as is usually the case when a gender-marked expression refers to an inanimate, it is often easier to resolve a pronoun when, on grounds of gender it can only refer to one of two things, as in (8), compared with when it can refer to both, and the actual referent is determined by context, as in (7). (7)
La cape a protégé la veste parce qu’elle était imperméable.
The-FEM cape protected the-FEM jacket because it-FEM was waterproof.
(8)
La cape a protégé le manteau parce qu’elle était imperméable.
The-FEM cape protected the-MASC coat because it-FEM was waterproof.
Are there more subtle effects of how things might be referred to? For example, following a sentence about a snake and a rat (in English), “it” could potentially refer to either creature. In an otherwise similar passage about a snake and a hiker, “it” could only be the snake, and the hiker would be “she” or “he.” We believe that the “repeated name penalty,” the difficulty in comprehension caused by the repeated use of a proper or common name instead of a pronoun (Gordon et al., 1993), which would be expected in the passage about the snake and the hiker, should be eliminated in the corresponding passage about the snake and the rat. However, direct comparison of a repeated name and a pronoun is not sensible in this case, because of the referential indeterminacy of the pronoun in the snake/rat case.
Almor has argued, and his Informational Load Hypothesis predicts, that the “repeated name penalty” is really a penalty for overspecificity, which need not necessarily involve repetition. If he is right, then the penalty for using “reptile” in the snake/hiker case, where “it” is unambiguously the snake and not the hiker, should disappear in the snake/rat case in which “it” would be referentially indeterminate between the snake and the rat. We are planning experiments to test this prediction, which contrasts with that of Centering Theory where the repeated name penalty is a penalty for failing to pronominalize. In addition, the ILH predicts a penalty for overspecificity because of interference between the additional content and the content introduced by the antecedent into working memory. JANUS claims that overspecificity itself is problematic, because it has no functional explanation. The issue of what is a genuine candidate for anaphoric reference at any point in a text is a complex one, and there may be preferences among the genuine candidates, as suggested by Centering Theory in its partial ordering of forward-looking centers. And certainly not all genuine candidates need have been explicitly mentioned in the preceding discourse. Centering Theory identifies, for a particular purpose, the set of entities mentioned in the immediately previous “utterance” as candidates for reference in the present utterance, but even from one utterance to the next, entities mentioned indirectly, as in (9), or not at all, as in (10), may become candidates for reference. (9)
Tom dreams a lot, but he never remembers them.
(10)
John could not control the car. The front offside tire had burst.
Looking at broader aspects of the discourse, it is clear that knowledge of the world determines the structure of what might be called “focus spaces.” A good example of this phenomenon is found in the classic “mental models” experiment of Glenberg, Meyer, and Lindem (1987). Glenberg et al. showed that whether a runner put on or took off a sweatshirt before starting to run determined how available that sweatshirt was later in the text. World knowledge tells us that if the sweatshirt was taken off, it was left behind, whereas if it was put on, it went with the runner. If the focus of the text remains on the runner, the sweatshirt remains more focused when it is with the runner. Another place where world knowledge plays a role in anaphor resolution is when the anaphor itself does not contain enough information to determine its referent. Under these circumstances, as we mentioned previously, Hobbs (1979) suggested that coreference relations often fall out of establishing coherence relations. As pointed out by Garnham (1991), coherence relations are best thought of as depending on, on the one hand, knowledge of linguistic conventions about the use of such things as sequences of tenses and conjunctions, and, on the other hand, knowledge of the world. This general idea encompasses Stevenson et al.’s observations about the effects of conjunctions on coreference assignment and about the effect of thematic roles (in the antecedent containing clause). Patterns of thematic roles (e.g., agent-theme vs.
source-goal) will affect the coherence relations between a clause expressing them and subsequent references to protagonists, in a way that will be modified by conjunctions. For example, source-goal relations typically occur in narratives where the default continuation is one that moves events forward, focusing on the consequences of the first event (and hence the goal). However, a following clause that begins with “because” will switch attention to the cause and hence the source. It is also possible that differences in behavior between anaphors referring back to humans (or animates) and anaphors referring back to nonhumans (or inanimates) could be explained by the different types of coherence relations that are likely to hold between eventualities whose participants come from these different classes. We also agree with Stevenson (Stevenson et al., 1995) that strategies such as subject assignment should be regarded as fallback strategies that will only be used when other strategies fail, for example when someone doesn’t have the knowledge to recover a particular causal relation. So, for example, in (11) the “because” signals a causal relationship, but whether John or Bill is the likely seller depends on whether a magnamometer is an instrument used by doctors or not. (11)
John sold his magnamometer to Bill because he wanted to become a doctor.
If doctors use magnamometers, “he” will be Bill. If a magnamometer is a very expensive piece of equipment, the sale of which might help a person to fund medical school, “he” will be John. Someone who doesn’t know what a magnamometer is will probably take “he” to be John, using subject assignment as a fallback. Parallel function is a tricky case, because it works best when it is very strongly signaled by linguistic parallelism. The principle that in its backward-looking function an anaphoric expression should have enough content to avoid indeterminacy of reference, but no more, answers a question we raised in Garnham and Cowles (in press): why is repetition not the favored form of anaphoric reference, as it is likely to be the least ambiguous? Complete semantic matching between the content of the antecedent and the anaphor is not the overriding consideration, though there must be sufficient overlap to avoid indeterminacy. The problem with (12) will readily be noticed, as no mapping is possible. (12)
Bob gave Bill a present because she was a kind person.
However, unnecessary content is potentially confusing, because it raises the question of what role it is playing. It should, therefore, be avoided, so as not to confuse comprehenders. A further issue about focus, which is raised by Almor’s experimental work, is that of contrastive focus. Almor used clefts to manipulate psychological focus, but such clefts are used most felicitously under contrastive focus contexts and have been argued to have a specific role in marking contrast (Kiss, 1998). Chafe (1976) defined contrastiveness using the following three factors: (1) shared background knowledge, (2) a set of possible candidates for
the contrasted element, and (3) the assertion that one of those candidates is the correct one, to the exclusion of all the others (exhaustive contrast). Thus, in contrastive contexts, there are limits on the identity of the referent of an anaphor—it must come from the set of candidates given by context. There is empirical evidence that, under certain discourse conditions, the existence of a contrast set in the discourse places an extra burden on working memory (Cowles, 2003; Cowles, Polinsky, Kutas, and Kluender, 2004). The question therefore arises as to whether any of the effects that Almor reports are specifically associated with contrastive focusing through clefting, or whether they are effects of focus per se. In our studies of inverse semantic distance effects in categories (Cowles and Garnham, 2005) we compared such effects when the antecedents were in clefted sentences with cases where the antecedents were in simple active sentences or in passive sentences. We obtained the same effects in both cases, suggesting that they are general effects of psychological focus status, and not specific to the case of contrastive focus. Nevertheless, contrastive focusing does imply a more complex focus structure, and might be expected to have effects on the processing of coreferential NP-anaphora. We turn now to cases where an anaphor has either too little or too much content to identify its referent. When it has too little content, disambiguation must be brought about by other means, and we claim that it is almost always by the remaining content of the anaphor-containing clause. The idea that, in cases of indeterminacy of the anaphor itself, the reader or listener looks to the other content of the anaphor-bearing clause is supported by data from Vonk (1984). 
In an eye movement study of implicit causality sentences, using only sentences with endings consistent with the verb’s own bias, Vonk found no overall effect of gender cueing, where gender cueing made an otherwise indeterminate pronoun determinate. However, people spent less time looking at the pronoun, and more time looking at the rest of the second clause, when there was no gender cue. We would expect this effect to be a general one: where an anaphor does not contain enough information to determine its reference, more time is spent processing the rest of its clause. The other type of case is when the pronoun contains more than enough information to determine its referent. In order to determine when this condition holds it is necessary to answer the question of what is a genuine candidate for the referent of a pronoun, since it is among the genuine candidates that the anaphor has to select. Ideally, if we could assume that anaphors always have just sufficient content to perform their forward- and backward-looking functions, and we could be sure of the circumstances in which they had no forward-looking functions, we could manipulate the content of the anaphor to determine what the genuine candidates for its referent were. More generally, we predict that when an anaphor has more content than it needs to identify its referent, for example, when an anaphor with less content is referentially determinate, processing will be disrupted. If the language processing system is following functional principles, the additional content should be
used to predict that the anaphor will have a forward-looking function as well as a backward-looking one. And it seems reasonable to assume that making this prediction would slow processing down. The repeated name penalty would be an example of such slowing. However, as we indicated previously, there will be cases where repeating a name is the best way of avoiding a referential indeterminacy, and we would predict that any penalty for repeating the name would disappear (though, for other reasons, a nominal NP anaphor may take longer to process than a pronominal NP anaphor). So our view, like Almor’s, is that unnecessary content in an anaphor slows processing, but we do not agree with Almor that this effect arises through semantic interference in working memory. In fact, we are in a stronger position than Almor, who is often hard-pressed to make specific predictions. The reason is that the interference he postulates in working memory works against other factors such as facilitation of memory retrieval by overlapping content. Perhaps surprisingly, it is not even clear that Almor predicts standard typicality effects for nonfocused antecedents—“. . . the robin . . . the bird . . .” being easier than “. . . the ostrich . . . the bird . . .,” since the facilitation of memory retrieval in these cases (bird accessing robin better than it accesses ostrich) has to overcome interference in working memory (bird and robin interfere more than bird and ostrich). In our view, it is the processing of unnecessary content itself that causes problems, and it does so because the language processing system is following functional principles which state that content should be justified in terms of communication of information. How does this idea relate to Almor’s inverse typicality effect and, more generally, to the inverse semantic distance effects that the ILH predicts for strongly focused antecedents? 
In our view it is not clear that “the bird” contains more unnecessary content if its antecedent is “the ostrich” than if its antecedent is “the robin.” However, if the antecedent is strongly focused, both are likely to contain unnecessary content relative to “it,” since a strongly focused antecedent is the default referent for a pronoun and the pronoun can be mapped onto it without worrying too much about whether the content matches. So, we would not predict a typicality effect, but neither would we necessarily predict an inverse typicality effect. An alternative explanation, looking ahead to forward-looking functions of anaphors, is that referring to an ostrich as a bird is somewhat more likely to indicate a shift of perspective on what is being talked about than referring to a robin as a bird. It may be that this stronger anticipated shift of perspective explains any inversion of the typicality effect— remember that in our own studies we found mixed evidence for the inversion of the typicality effect. This explanation might also apply to why we did find an inverse semantic distance effect in hierarchies such as cobra-snake-reptile. Referring to a cobra as “the reptile” may well indicate a stronger change of perspective than referring to a snake as “the reptile,” with a greater change of specificity between the former than the latter. Let us summarise our views on the backward-looking processes of coreferential NP-anaphor resolution. The overriding principle is the Gricean
one that the anaphoric expression (or, more strictly, that expression and the other material in its clause) should contain just sufficient content—no more, no less—to fulfill its backward-looking and forward-looking functions. If the anaphor does not have, or appears not to have, the right content, there should be a functional explanation. In Gricean terms, an implicature should be generated. The principal backward-looking function is to identify the correct referent from among those entities that are genuine candidates for anaphoric reference at the point where the anaphoric expression occurs. We will discuss later whether the type of the anaphoric expression plays a role in determining what is a genuine candidate. If the anaphoric expression itself contains insufficient information to choose between two (or more) genuine candidates, the reader or listener must consider the other information in the anaphor-containing clause to determine the actual referent. Initially, we make the strong assumption that information further forward in the discourse should not be relevant to determining the referent of an anaphoric expression. There may be cases where, for example to create a certain type of tension in a text, information is deliberately withheld. But we regard these cases as marked, in the linguistic sense.
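The backward-looking principle summarised above can be made concrete with a small sketch. The Python below is an illustrative toy under strong assumptions, not part of the JANUS model itself: it assumes that each genuine candidate and each available referring expression can be encoded as a set of semantic features, and the feature names, the `LEXICON`, and the functions `is_determinate` and `classify_content` are all hypothetical. Under those assumptions, an expression is labelled insufficient when it is compatible with more than one candidate, overspecified when a less contentful available expression would already be referentially determinate, and just enough otherwise.

```python
from typing import Dict, FrozenSet

Features = FrozenSet[str]

# Hypothetical feature encodings for a few referring expressions.
# Each expression lists the features it asserts of its referent.
LEXICON: Dict[str, Features] = {
    "it": frozenset({"nonhuman"}),
    "she": frozenset({"human"}),
    "the reptile": frozenset({"nonhuman", "reptile"}),
    "the snake": frozenset({"nonhuman", "reptile", "snake"}),
}

# Two toy discourse contexts, each a set of genuine candidates.
SNAKE_HIKER: Dict[str, Features] = {
    "snake": frozenset({"nonhuman", "reptile", "snake"}),
    "hiker": frozenset({"human"}),
}
SNAKE_RAT: Dict[str, Features] = {
    "snake": frozenset({"nonhuman", "reptile", "snake"}),
    "rat": frozenset({"nonhuman", "mammal", "rat"}),
}


def is_determinate(features: Features, referent: str,
                   candidates: Dict[str, Features]) -> bool:
    """True if the features are compatible with the referent and nothing else."""
    matches = [name for name, feats in candidates.items() if features <= feats]
    return matches == [referent]


def classify_content(expression: str, referent: str,
                     candidates: Dict[str, Features],
                     lexicon: Dict[str, Features] = LEXICON) -> str:
    """Label the content of an expression relative to the candidate set."""
    features = lexicon[expression]
    if not features <= candidates[referent]:
        return "no_mapping"        # content clashes with the intended referent
    if not is_determinate(features, referent, candidates):
        return "insufficient"      # the rest of the clause must disambiguate
    for other in lexicon.values():
        # A strictly less contentful expression that already succeeds
        # makes the extra content functionally unexplained.
        if other < features and is_determinate(other, referent, candidates):
            return "overspecified"
    return "just_enough"
```

On this toy encoding, “the reptile” comes out as overspecified in the snake/hiker context, where “it” is already determinate, and as just enough in the snake/rat context, where “it” is not. The sketch deliberately ignores focus, the remaining content of the clause, and any forward-looking function of the anaphor.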
JANUS—Looking Forward
According to the JANUS model, the content of a coreferential NP-anaphor is determined not only by the relation of the anaphor to previous text but also by its relation to upcoming text. This looking forward is looking forward to the consequence (for the discourse) of rementioning a particular entity, rementioning it in a particular way, and rementioning it in relation to what is said about it in its own clause. To put this idea another way, the form of an anaphoric expression is determined, in part, by the discourse function of the anaphoric expression itself—how it contributes to signaling the future direction of the text. This discourse function of coreferential NP-anaphors is something that Almor ignores in his ILH. In discussing the upcoming text, we will focus primarily on the content of the clause containing the anaphor. If the anaphor is in subject position, that text will be almost entirely upcoming. However, it is possible that some of that text comes before the anaphoric expression itself, particularly when the anaphor is not in subject position. We are not committed to the idea that text beyond the anaphor-containing clause can exert no influence on the content of an anaphoric expression, though we believe this kind of influence to be uncommon, and relatively minor when it does occur. We also believe that the extent to which an anaphoric expression needs to signal a discourse function may depend on whether there are other textual markers of that discourse function in its own clause. In considering how an anaphoric expression “looks back” to previous text, we have focused on cases in which the antecedent can be uniquely determined at the point where the anaphoric expression is encountered. However,
it is not always the case that such resolution is possible. Sometimes it is only the anaphor together with the remaining content of the clause in which it occurs, that determines its referent. Indeed, in the case of the “implicit causality” sentences with no gender cue to a pronoun’s referent, such as (1) repeated here as (13), the full content of the “because” clause is needed to determine that “he” is John. (13)
John confessed to Bill because he wanted a reduced sentence.
In such sentences, the content of the anaphoric expression can be determined by the content of what is to follow. It has less content than it apparently needs to have, because other information contributes to determining what it refers to. There are also cases in which the anaphoric expression has more content than it needs in order to determine its reference. These cases have received less attention in psycholinguistics, but we believe they are equally important. Vonk, Hustinx, and Simons (1992) showed both that writers use overspecified referring expressions (i.e., ones with more content than necessary to determine their referent) to signal a thematic shift, and that readers take such overspecified referring expressions to signal such a shift. An example of the type of shift they used was from talking about someone’s research to talking about their family life as in (14a) followed by (14b) or (14c). (14a)
Professor Alan Johnson is a very busy man. In addition to being the father of a large family, he is employed at the medical faculty of the University of Utrecht. His current research subject is massage as therapy. There are, he tells us, a large number of different massage techniques, and new techniques are added each year. He mentions footsole-massage as one of the most important techniques. Johnson was trained as a masseur in the past. He still works regularly as a masseur. In this way he keeps in touch with the field and interesting ideas for new research come up again and again.
(14b)
Johnson, a professor of medicine, is the father of seven children.
(14c)
He is the father of seven children.
(14d)
Johnson, a professor of medicine, considers this research important.
(14e)
He considers this research important.
Vonk et al.’s results showed that, even though there was no thematic shift in (14a) followed by (14d), the use of an overspecific NP anaphor, “Johnson, a professor of medicine,” made information from the preceding text relatively unavailable, just as when there was an actual shift of theme, as in (14a) followed by (14b) or (14c). Given the strong hypothesis that a coreferential NP anaphor should contain just enough information to identify its antecedent uniquely, and
no more, it follows that the language processing system should form expectations depending on whether the information in the anaphor itself can perform the identification function. If it is not sufficient, then immediately upcoming information should complete the disambiguation (as in some implicit causality sentences). If there is too much information, a change of theme or perspective should be expected, and if it does not happen the processing system should be “surprised.” More generally, we expect that a final “decision” about whether the form of an anaphoric expression is appropriate will be taken, at the latest, by the end of the clause containing the anaphor, and that empirical consequences of the unfolding processes should be detectable. This hypothesis predicts that manipulations of subjects’ expectations about upcoming discourse could have an effect on how seemingly extra semantic information is processed. If subjects believe that such information will be relevant in the upcoming discourse (i.e., post-anaphor) they find it easier to process than when the anaphor itself is the first signal of a thematic shift.
JANUS—The Anaphoric Expression
According to JANUS, anaphoric processing can only be understood properly by considering how the anaphoric expression relates back to the preceding text and forward to the upcoming text. We also believe that the type of the anaphoric expression plays a direct role in determining how it is interpreted. Centering Theory has relatively little to say about the interpretation of different types of coreferential NP-anaphora, except that it implicitly says, and has been taken (e.g., by Peter Gordon and colleagues) as saying, that fuller noun phrases may be problematic in certain cases where pronouns could or should have been used. The ILH explicitly assumes that the same factors affect the interpretation of all types of coreferential NP anaphors and that systematic differences in behavior among different types of anaphor, if there are any, fall out of these general principles. As previously mentioned, a different tradition, represented in the work of Sanford and Garrod (1981), suggests that different types of anaphoric expression explicitly trigger different types of search in memory. In Sanford and Garrod’s system, definite pronouns trigger a search for an antecedent in explicit focus, a representation of entities recently and directly mentioned, whereas fuller definite NPs trigger a search in explicit and implicit focus, where implicit focus contains entities dependent on the currently relevant scenario, and which may not have been explicitly mentioned. It does seem reasonable to suggest that the form of an anaphoric expression might direct the search for its antecedent. Indeed, in some recent work with Marion Fossard (Fossard, Garnham, and Cowles, 2003) we examined the processing of demonstrative NP anaphors. People read passages with beginnings such as (15a) which continued as (15b) or (15c).
(15a)
At restaurants Peter/Alice loves taking his/her time to read the menu. The last time, he/she had hesitated so much between two dishes that he/she finally had to ask a waitress to help him/her choose something from the menu.
(15b)
In fact, he/she simply ordered the dish of the day.
(15c)
In fact, that man/woman simply ordered the dish of the day.
We found that the demonstratives were understood more rapidly than the pronouns when they referred to nonfocused antecedents and when there was no gender cue that discriminated between different possible referents for the pronoun. If coreferential NP anaphors were always initially taken to refer to the most highly focused possible antecedent, regardless of their own form, it would be difficult to explain this result.
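The idea that the form of an anaphor directs the search for its antecedent, as in Sanford and Garrod’s proposal described earlier in this section, can be illustrated with a minimal sketch. The Python below is a simplification made for exposition: the table `SEARCH_DOMAINS`, the function `find_antecedent`, and the representation of focus as two ordered lists are hypothetical, and the sketch covers only definite pronouns and fuller definite NPs. It makes no claim about how demonstratives search, which is the open question raised by the results just reported.

```python
from typing import Callable, Optional, Sequence, Tuple

# Which focus partitions each anaphor type searches, and in what order.
# This table is an assumed rendering of the explicit/implicit focus
# distinction, not an implementation taken from the original work.
SEARCH_DOMAINS = {
    "pronoun": ("explicit",),
    "definite_np": ("explicit", "implicit"),
}


def find_antecedent(anaphor_type: str,
                    fits: Callable[[str], bool],
                    explicit_focus: Sequence[str],
                    implicit_focus: Sequence[str]) -> Optional[Tuple[str, str]]:
    """Return (entity, partition) for the first entity that fits, else None.

    `fits` is a caller-supplied test of whether the anaphor's content is
    compatible with an entity. Entities are tried in the order given.
    """
    partitions = {"explicit": explicit_focus, "implicit": implicit_focus}
    for domain in SEARCH_DOMAINS[anaphor_type]:
        for entity in partitions[domain]:
            if fits(entity):
                return entity, domain
    return None


# Toy context loosely based on example (10): the car has been mentioned,
# the tire is only available through the car scenario.
EXPLICIT = ["John", "car"]
IMPLICIT = ["tire", "steering wheel"]

tire_as_np = find_antecedent("definite_np", lambda e: e == "tire",
                             EXPLICIT, IMPLICIT)
tire_as_pronoun = find_antecedent("pronoun", lambda e: e == "tire",
                                  EXPLICIT, IMPLICIT)
```

Here a fuller definite NP reaches the unmentioned tire through implicit focus, whereas a pronoun restricted to explicit focus finds nothing. Reordering or extending `SEARCH_DOMAINS` is one way to express the hypothesis that some expression types preferentially seek nonfocused antecedents.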
Conclusions
With the notable exception of Almor (1999), psychologists have not addressed themselves to producing an integrated theory of (coreferential) NP anaphor processing. Many factors affecting the interpretation of anaphoric expressions have been identified, but no general framework for integrating them has been proposed. Centering Theory, which derives primarily from computational linguistics, contains some interesting ideas, but focuses primarily on pronouns. Almor’s Informational Load Hypothesis tries to identify a small number of very general principles governing the use of coreferential NP anaphors, and it is more psychologically based than Centering Theory. Nevertheless, we believe it has some serious shortcomings. The two principal ones are its failure to consider the role of possible alternative antecedents for an anaphor and its failure to consider the discourse function of the anaphor itself. We have proposed JANUS as what we hope is a promising alternative to the ILH. JANUS suggests that a proper psychological account of coreferential NP anaphora must take account of both how the anaphor relates back to previous text and what function the anaphor performs in its own clause. JANUS also acknowledges that the type of a coreferential NP anaphor—pronoun, definite NP, demonstrative NP—may influence the way that the process of finding the antecedent takes place. JANUS acknowledges that working memory plays an important role in anaphor resolution and that working memory limitations, both in storage and in processing, have repercussions for the process of finding an antecedent. However, we do not believe that Almor is correct in identifying interference in working memory as a crucial process in anaphor resolution. In this chapter we have been able only to sketch the basic assumptions of the JANUS model, and to discuss how they have been derived from a consideration of previous models and of a limited range of empirical findings. 
The model needs to be further developed to accommodate existing data, but also to determine what some of the existing data mean. We have also suggested some specific predictions from the JANUS model, and these predictions need to be put to empirical test.

Our work was supported by Grant R000239362, "Local Focus and NP Interpretation: Testing the Informational Load Hypothesis," from the UK Economic and Social Research Council to Alan Garnham. We would like to thank Jeanette Gundel and Nancy Hedberg for comments on an earlier draft of this chapter.

NOTES

1. We do not intend, either here or elsewhere in this chapter, that “look” should be taken in a strictly literal sense of, say, making eye movements in reading. Indeed such a literal sense is not available for either the production or comprehension of spoken language. Rather, we mean that it is necessary to consider what has gone before and what is to come, perhaps from some memory representation of that material.

2. We will not discuss these syntactically based effects in detail, but throughout we assume that they are operating in both language production and language comprehension.
REFERENCES

Almor, A. (1999). Noun-phrase anaphora and focus: The informational load hypothesis. Psychological Review, 106, 748–765.
Almor, A., Kempler, D., MacDonald, M. C., Andersen, E., and Tyler, L. K. (1999). Why do Alzheimer patients have difficulty with pronouns? Working memory, semantics, and reference in comprehension and production in Alzheimer’s Disease. Brain and Language, 67, 202–227.
Ariel, M. (1990). Accessing noun-phrase antecedents. London: Routledge.
Arnold, J. (1998). Reference form and discourse patterns. Unpublished Ph.D. Dissertation, Stanford University.
Bach, E., and Partee, B. H. (1980). Anaphora and semantic structures. In J. Kreiman and A. Ojeda (Eds.), Papers from the parasession on anaphora. Chicago: Chicago Linguistic Society.
Baddeley, A. D. (1966). Short-term memory for word sequences as a function of acoustic, semantic, and formal similarity. Quarterly Journal of Experimental Psychology, 18, 362–365.
Baddeley, A. D. (1992). Working memory. Science, 255, 556–559.
Baddeley, A. D., and Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47–90). New York: Academic Press.
Brennan, S. E., Friedman, M. W., and Pollard, C. J. (1987). A centering approach to pronouns. Proceedings of the 25th Annual Meeting of the Association for Computational Linguistics, 155–162, Stanford, CA.
Carreiras, M., Garnham, A., and Oakhill, J. V. (1993). The use of superficial and meaning-based representations in interpreting pronouns: Evidence from Spanish. European Journal of Cognitive Psychology, 5, 93–116.
Chafe, W. L. (1976). Givenness, contrastiveness, definiteness, subjects, topics, and points of view. In Ch. N. Li (Ed.), Subject and Topic (pp. 25–55). New York: Academic Press.
Chambers, C. G., and Smyth, R. H. (1998). Structural parallelism and discourse coherence: A test of centering theory. Journal of Memory and Language, 39, 593–608.
Corbett, A. T., and Chang, F. R. (1983). Pronoun disambiguation: Accessing potential antecedents. Memory and Cognition, 11, 283–294.
Cornish, F., Garnham, A., Cowles, H. W., Fossard, M., and André, V. (2005). Indirect anaphora in English and French: A cross-linguistic study of pronoun resolution. Journal of Memory and Language, 52(3), 363–376.
Cowles, H. W. (2003). Processing Information Structure: Evidence from Comprehension and Production. Unpublished Ph.D. Dissertation, University of California at San Diego.
Cowles, H. W., and Garnham, A. (in press). Noun-phrase anaphor resolution: Antecedent focus, semantic overlap and the Informational Load Hypothesis. In E. Gibson and N. Pearlmutter (Eds.), The processing and acquisition of reference. Cambridge, MA: MIT Press.
Cowles, H. W., and Garnham, A. (2005). Antecedent Focus and Conceptual Distance Effects in Category Noun-Phrase Anaphora. Language and Cognitive Processes, 20, 725–750.
Cowles, H. W., Polinsky, M., Kutas, M., and Kluender, R. (2003). Brain responses to differences in the processing of informational and contrastive focus. Paper presented at the Ninth Annual Architectures and Mechanisms for Language Processing Conference, Glasgow, U.K.
Ferreira, V. S., Slevc, L. R., and Rogers, E. S. (2005). How do speakers avoid ambiguous linguistic expressions? Cognition, 96, 263–284.
Fossard, M., Garnham, A., and Cowles, H. W. (2003). Referential accessibility and anaphoric resolution: The case of the demonstrative noun-phrase “that N.” Poster presented at the Ninth Annual Conference on Architectures and Mechanisms for Language Processing (AMLaP-2003), Glasgow, August 2003.
Garnham, A. (1981). Anaphoric reference to instances, instantiated and noninstantiated categories: A reading-time study. British Journal of Psychology, 72, 377–384.
Garnham, A. (1984). Effects of specificity on the interpretation of anaphoric noun phrases. Quarterly Journal of Experimental Psychology, 36A, 1–12.
Garnham, A. (1987). Understanding anaphora. In A. W. Ellis (Ed.), Progress in the psychology of language (vol. 3, pp. 253–300). London: Lawrence Erlbaum Associates.
Garnham, A. (1989). Integrating information in text comprehension: The interpretation of anaphoric noun-phrases. In G. Carlson and M. Tanenhaus (Eds.), Linguistic structure in language processing (pp. 359–399). Dordrecht: Kluwer Academic Publishers.
Garnham, A. (1991). Where does coherence come from: A psycholinguistic perspective. Occasional Papers in Systemic Linguistics, 5, 131–141.
Garnham, A. (1996). The other side of mental models: Theories of language comprehension. In J. V. Oakhill and A. Garnham (Eds.), Mental models in cognitive science: Essays in honour of Phil Johnson-Laird (pp. 35–52). Hove, East Sussex: Psychology Press.
Garnham, A. (2001). Mental models and the interpretation of anaphora. Hove, East Sussex: Psychology Press.
Garnham, A., and Cowles, H. W. (in press). Mental models and noun phrase anaphora. In C. Zelinsky (Ed.), Memory and Language. Amsterdam: John Benjamins.
Garnham, A., Oakhill, J. V., Ehrlich, M-F., and Carreiras, M. (1995). Representations and processes in the interpretation of pronouns: New evidence from Spanish and French. Journal of Memory and Language, 34, 41–62.
Garrod, S. C., and Sanford, A. J. (1977). Interpreting anaphoric relations: The integration of semantic information while reading. Journal of Verbal Learning and Verbal Behavior, 16, 77–90.
Gernsbacher, M. A. (1989). Mechanisms that improve referential access. Cognition, 32, 99–156.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Gernsbacher, M. A., and Hargreaves, D. (1988). Accessing sentence participants: The advantage of first mention. Journal of Memory and Language, 27, 699–717.
Gernsbacher, M. A., and Hargreaves, D. (1992). The privilege of primacy: Experimental data and cognitive explanations. In D. L. Payne (Ed.), Pragmatics of word order flexibility (pp. 83–116). Philadelphia: John Benjamins.
Glenberg, A. M., Meyer, M., and Lindem, K. (1987). Mental models contribute to foregrounding during text comprehension. Journal of Memory and Language, 26, 69–83.
Gordon, P. C., Hendrick, R., and Foster, K. (2000). Language comprehension and probe-list memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 26, 766–775.
Gordon, P. C., and Scearce, K. A. (1995). Pronominalization and discourse coherence, discourse structure and pronoun interpretation. Memory and Cognition, 23, 313–323.
Greene, S. B., McKoon, G., and Ratcliff, R. (1992). Pronoun resolution and discourse models. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 266–283.
Grice, H. P. (1975). Logic and conversation. In P. Cole and J. Morgan (Eds.), Syntax and semantics 3: Speech acts (pp. 41–58). New York: Academic Press.
Grosz, B. (1981). Focusing and description in natural language dialogues. In A. K. Joshi, B. L. Webber, and I. A. Sag (Eds.), Elements of discourse understanding. Cambridge: Cambridge University Press.
Grosz, B., Joshi, A., and Weinstein, S. (1995). Centering: A framework for modelling the local coherence of discourse. Computational Linguistics, 21, 203–226.
Grosz, B., and Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12, 175–204.
Gundel, J., Hedberg, N., and Zacharski, R. (1993). Cognitive status and the form of referring expressions in discourse. Language, 69, 274–307.
Haarmann, H. J., Davelaar, E. J., and Usher, M. (2003). Individual differences in semantic short-term memory capacity and reading comprehension. Journal of Memory and Language, 48, 320–345.
Haarmann, H., and Usher, M. (2001). Maintenance of semantic information in capacity-limited item short-term memory. Psychonomic Bulletin and Review, 8, 568–578.
Hobbs, J. R. (1979). Coherence and coreference. Cognitive Science, 3, 67–90.
Keenan, E. L., and Faltz, L. M. (1985). Boolean semantics for natural language. Dordrecht, The Netherlands: Reidel.
Kiss, K. (1998). Identificational focus versus information focus. Language, 74, 245–273.
Lesgold, A. M., Roth, S. F., and Curtis, M. E. (1979). Foregrounding effects in discourse comprehension. Journal of Verbal Learning and Verbal Behavior, 18, 291–308.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. San Francisco: Freeman.
Reinhart, T. (1981). Definite NP-anaphora and C-command domains. Linguistic Inquiry, 12, 605–635.
Sanford, A. J., and Garrod, S. C. (1980). Memory and attention in text comprehension: The problem of reference. In R. S. Nickerson (Ed.), Attention and Performance 8 (pp. 459–474). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Sanford, A. J., and Garrod, S. C. (1981). Understanding written language: Explorations in comprehension beyond the sentence. Chichester, West Sussex: John Wiley and Sons.
Sheldon, A. (1974). The role of parallel function in the acquisition of relative clauses in English. Journal of Verbal Learning and Verbal Behavior, 13, 272–281.
Sidner, C. L. (1983). Focusing in the comprehension of definite anaphora. In M. Brady and R. C. Berwick (Eds.), Computational models of discourse. Cambridge, MA: MIT Press.
Smyth, R. H. (1992). Multiple feature matching in pronoun resolution: A new look at parallel function. Proceedings of the Second International Conference on Spoken Language Processing (pp. 145–148). Edmonton: Priority Printing.
Smyth, R. H. (1994). Grammatical determinants of ambiguous pronoun resolution. Journal of Psycholinguistic Research, 23, 197–229.
Sperber, D., and Wilson, D. (1995). Relevance: Communication and cognition (2nd ed.). Oxford: Blackwell.
Stevenson, R. J., Crawley, R. A., and Kleinman, D. (1994). Thematic roles, focus and the representation of events. Language and Cognitive Processes, 9, 519–548.
Stevenson, R. J., Nelson, A. W. R., and Stenning, K. (1995). The role of parallelism in strategies of pronoun comprehension. Language and Speech, 38, 393–418.
Vonk, W. (1984). Eye movements during the comprehension of pronouns. In A. G. Gale and F. Johnson (Eds.), Theoretical and applied aspects of eye movement research (pp. 203–212). Amsterdam: North-Holland.
Vonk, W., Hustinx, L. G. M. M., and Simons, W. H. G. (1992). The use of referential expressions in structuring discourse. Language and Cognitive Processes, 7, 301–333.
Walker, M. A., Joshi, A. K., and Prince, E. F. (Eds.) (1998). Centering theory in discourse. Oxford: Clarendon Press.