Linguistic Inquiry Monograph Forty-five
Interface Strategies: Optimal and Costly Computations
Tanya Reinhart
Linguistic Inquiry Monographs
Samuel Jay Keyser, general editor
6. Some Concepts and Consequences of the Theory of Government and Binding, Noam Chomsky
10. On the Nature of Grammatical Relations, Alec Marantz
12. Logical Form: Its Structure and Derivation, Robert May
13. Barriers, Noam Chomsky
15. Japanese Tone Structure, Janet Pierrehumbert and Mary Beckman
16. Relativized Minimality, Luigi Rizzi
18. Argument Structure, Jane Grimshaw
19. Locality: A Theory and Some of Its Empirical Consequences, Maria Rita Manzini
20. Indefinites, Molly Diesing
21. Syntax of Scope, Joseph Aoun and Yen-hui Audrey Li
22. Morphology by Itself: Stems and Inflectional Classes, Mark Aronoff
23. Thematic Structure in Syntax, Edwin Williams
24. Indices and Identity, Robert Fiengo and Robert May
25. The Antisymmetry of Syntax, Richard S. Kayne
26. Unaccusativity: At the Syntax–Lexical Semantics Interface, Beth Levin and Malka Rappaport Hovav
27. Lexico-Logical Form: A Radically Minimalist Theory, Michael Brody
28. The Architecture of the Language Faculty, Ray Jackendoff
29. Local Economy, Chris Collins
30. Surface Structure and Interpretation, Mark Steedman
31. Elementary Operations and Optimal Derivations, Hisatsugu Kitahara
32. The Syntax of Nonfinite Complementation: An Economy Approach, Željko Bošković
33. Prosody, Focus, and Word Order, Maria Luisa Zubizarreta
34. The Dependencies of Objects, Esther Torrego
35. Economy and Semantic Interpretation, Danny Fox
36. What Counts: Focus and Quantification, Elena Herburger
37. Phrasal Movement and Its Kin, David Pesetsky
38. Dynamic Antisymmetry, Andrea Moro
39. Prolegomenon to a Theory of Argument Structure, Ken Hale and Samuel Jay Keyser
40. Essays on the Representational and Derivational Nature of Grammar: The Diversity of Wh-Constructions, Joseph Aoun and Yen-hui Audrey Li
41. Japanese Morphophonemics: Markedness and Word Structure, Junko Ito and Armin Mester
42. Restriction and Saturation, Sandra Chung and William A. Ladusaw
43. The Linearization of Chains and Sideward Movement, Jairo Nunes
44. The Syntax of (In)dependence, Ken Safir
45. Interface Strategies: Optimal and Costly Computations, Tanya Reinhart
The MIT Press Cambridge, Massachusetts London, England
© 2006 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please e-mail special_sales@mitpress.mit.edu or write to Special Sales Department, The MIT Press, 55 Hayward Street, Cambridge, MA 02142.
This book was set in Times Roman and was printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Reinhart, Tanya.
Interface strategies : optimal and costly computations / Tanya Reinhart.
p. cm.—(Linguistic inquiry monographs ; 45)
Includes bibliographical references and index.
ISBN 0-262-18250-5 (alk. paper); 0-262-68156-0 (pbk: alk. paper)
1. Grammar, Comparative and general. I. Title. II. Series.
P151.R45 2006
415—dc22 2005054004
10 9 8 7 6 5 4 3 2 1
Contents
Acknowledgments
Introduction: Optimal Design
Chapter 1 Reference-Set Computation
1.1 The Minimal Link Condition
1.2 Interpretation-Dependent Reference Sets
1.3 The Interface Strategy: Repair of Imperfections
Chapter 2 Scope-Shift
2.1 Quantifier Scope: The State of the Art
2.1.1 The Optimistic QR View of the 1970s
2.1.2 The Syntactic Freedom of Existential Wide Scope
2.1.3 Can the Problem with Existentials Be Explained Away?
2.1.4 The “Realistic” QR View of the 1980s
2.1.5 Some Problems
2.2 The Alternative of Wide Scope In Situ
2.2.1 Wh–In Situ
2.2.2 Sluicing
2.3 The Interpretation Problem of Wide Scope In Situ
2.3.1 Wh–In Situ
2.3.2 Sluicing
2.3.3 Existential Wide Scope
2.4 The Semantic Problem with Island-Free QR
2.5 An Intermediate Summary
2.6 Where No QR Is Needed: Choice Functions for Existential Quantifiers
2.6.1 Choice Functions and Existential Closure
2.6.2 Deriving the Choice-Function Interpretation
2.6.3 The Collective-Distributive Distinction
2.6.4 Which Indefinites Are Interpretable by Choice Functions?
2.6.5 Some Choice-Function Semantics
2.7 Scope-Shift: An Interface Repair Strategy
2.7.1 Minimize Interpretative Options
2.7.2 Applying the Illicit QR as a Repair Strategy
2.7.3 Processing Limitations on the Size of Reference Sets—Indefinite Numerals
Chapter 3 Focus: The PF Interface
3.1 Sentence Main Stress
3.1.1 Cinque’s Main-Stress System
3.1.2 Szendrői’s Main-Stress System
3.2 How Focus Is Coded
3.2.1 Main Stress and Focus: The Basic View
3.2.2 PF-Coding: The Focus Set
3.3 Stress Operations
3.3.1 Focus and Anaphora
3.3.2 The Operations: Destressing and Main-Stress Shift
3.4 Reference-Set Computation
3.4.1 Focus Projection
3.4.2 Markedness
Chapter 4 The Anaphora Reference-Set Strategy
4.1 Two Procedures of Anaphora Resolution
4.1.1 The Current Picture
4.1.2 What Is Binding?
4.1.3 Covaluation
4.2 Anaphora Restrictions
4.2.1 Restrictions on Binding
4.2.2 Restrictions on Covaluation
4.3 The Interface Strategy Governing Covaluation (Rule I)
4.3.1 Minimize Interpretative Options
4.3.2 Reference-Set Computation
4.3.3 Further Details of the Computation
4.4 Covaluation in Ellipsis Contexts
4.5 The Psychological Reality of Rule I
Chapter 5 The Processing Cost of Reference-Set Computation
5.1 Acquisition of the Coreference Rule I
5.1.1 An Overview of Binding and Rule I
5.1.2 Thornton and Wexler’s Arguments against the Processing Account
5.1.3 Questions of Learnability
5.1.4 Explaining Chance Performance
5.2 Acquisition of Main-Stress Shift
5.2.1 An Overview of Stress and Focus
5.2.2 Preliminaries
5.2.3 Switch-Reference Resolution
5.2.4 Guess and Default: Focus Identification in the Scope of Only
5.2.5 Useful and Arbitrary Defaults
5.3 Acquisition of Scalar Implicatures
Notes
References
Author Index
Subject Index
Acknowledgments
Kriszta Szendrői has made important contributions to this work since 1999, when I presented my ideas on focus and stress in the summer linguistics school in Potsdam, Germany. Her dissertation (Szendrői 2001) has shaped much of my analysis of stress in chapter 3, and her research as a post doc in Utrecht since 2002 has turned the acquisition hypothesis entailed by the focus analysis into reality. It is hard to imagine that chapter 5 would have materialized without her ideas, experiments, and constant feedback. In this area of stress and focus, I also wish to thank Ad Neeleman, who taught me how important PF is, and how to think about it. Many of the arguments in chapter 3 were formed in our joint work (Neeleman and Reinhart 1998).
My views on economy, reference-set computation, and the economy approach to anaphora were formed in intensive interaction with Danny Fox since 1993, when we were both exploring Yael Golan’s idea that reference-set comparisons are relative to interpretation. His inspiration goes beyond what is evident in the chapters on these topics.
Choice functions were conceived through endless conversations with Remko Scha in the early 1990s. His patience, knowledge, and wisdom were formative at the stage summarized in Reinhart 1992. At later stages, many of the ideas were developed further through collaboration with Yoad Winter. Eddy Ruys has been a constant source of scrutiny and feedback on all issues of QR, economy, syntax, and interpretation.
An acquisition conference in Trieste in 1998 provided key input for this work. Gennaro Chierchia, Stephen Crain, Maria Teresa Guasti, and Rosalind Thornton presented their early findings on the acquisition of scalar implicatures there. It was the first indication that 50 percent performance (chance) is found in another area of reference-set computation besides coreference. So what was by then just a hypothesis seemed to become an experimental reality. My exchanges since, particularly with Stephen Crain, have been extremely valuable as I pursued this hypothesis further.
The final shape of this book owes a lot to anonymous reviewers, of this monograph as well as of previous articles that were partially incorporated here. Their thorough and insightful comments gave me even more food for thought than I could incorporate.
Introduction: Optimal Design
A hypothesis that got much attention in the 1990s is that the well-formedness of syntactic derivations is not always determined by absolute conditions, but it may be based on a selection of the optimal competitor out of a set of candidates—a reference set. A restricted version of this was assumed at the early stages of the minimalist program (Chomsky 1992, 1994), and simultaneously, it has been the central notion developed in Optimality Theory (OT) (see Prince and Smolensky 1993 and later work on optimality in syntax, including Grimshaw 1997). The computation of optimality selection is of a different sort than previously assumed in syntax. Typically, it requires that in computing a given derivation, an alternative derivation be constructed, in order to determine whether a given step in the current derivation (or the full output) is permitted. I will refer to this type of computation as reference-set computation (following the notation in the early minimalist framework). Naturally, the introduction of this new type of computation ignited a debate on whether the computational system of natural language (syntax) indeed includes such computations. In later developments of the minimalist framework (since Chomsky 1995), it was determined that there is no evidence that reference-set computation applies in core syntax. To establish that such computation is required it would be necessary to show that whatever it derives could not be derived otherwise, with more minimal computations. In fact, all original syntactic arguments in favor of this computation have found a simpler explanation in the minimalist framework. The line of argument I pursue here is that this computation is nevertheless available to the computational system, as witnessed in some areas of the interface. But it is much more restricted than assumed in OT. It applies only as a “last resort,” when the outputs of core syntax operations are insufficient for the interface.
The instances of reference-set computation that I examine in subsequent chapters can be viewed as interface strategies needed to make up for imperfections in the system. Their operation is severely restricted, and applying them comes with a processing cost.
The concept of the interface that underlies this work can be best illustrated with a thought experiment from Chomsky 2000—an “evolutionary fable.” Imagine a primate that by some mystery of genetic development acquired the full set of the human cognitive abilities, except the language faculty. We can assume, then, that among other cognitive abilities, he has a system of concepts similar to that of humans, and a sensorimotor system that enables the perceiving and coding of information in sounds. Let us assume, further, that he has an innate system of logic, an abstract formal system, which contains an inventory of abstract symbols, connectives, functions, and definitions necessary for inference. What would he be able to do with these systems? Not much. Based on the rich concept system of humans, his inference system should in principle allow him to construct sophisticated theories and communicate them to his fellow primates. However, the inference system operates on propositions, not on concepts, so it is unusable for the primate in our thought experiment. Possibly he could code concepts in sounds, but not the propositions needed for inference. Pursuing this thought experiment, the goal of linguistic theory can be described as reconstructing the system the primate lacks, which consists of whatever is needed to facilitate the interface of his various cognitive systems. In other words, the goal is to construct the computational system (CS) (syntax in a broad sense) that defines language (L), a state of the faculty-of-language (FL) organ, which makes this interface possible. Correctly capturing the interface is the crucial adequacy criterion of any syntactic theory. This is not to be confused with functional accounts of language.
There is ample evidence by now that it is strictly impossible to derive the properties of the computational system from any functional considerations of language use. Systems of inference, use, and communication are consistent with many possible languages, and they cannot explain why the particular human language was selected. On the other hand, it is a crucial fact about human language that it can be used to argue, communicate, think, and so on. If our formal analysis of the computational system turns out to be inconsistent with basic facts of language use—for example, if it can be shown that the representations it generates are unusable for inference or cannot adjust to varying contexts of use—this cannot be the correct analysis, since the actual sentences of human language can be used for such purposes.
Figure I.1
Making the interface possible means that the other cognitive systems should be able to access the representations generated by the CS, namely, these representations should be legible to the other systems. Chomsky defines the “interface levels” as sets of representations legible to other systems (external to the faculty of language). He outlines two broad external systems (each consisting of sets of systems): (1) the sensorimotor, articulatory-perceptual systems, and (2) the conceptual-intentional (C/I), or thought systems. But for work on the interface it may be useful to decompose further the components of the C/I interface. I assume three (sets of) such systems, along with the sensorimotor sound system, as schematized in figure I.1. We may assume that basic information of the concepts system is coded on the lexical items, which are the building blocks of the CS derivations. For this purpose, the information must be coded in a form legible to other systems—for example, as thematic features. The problem of legibility with the concepts system is somewhat different from the problem with the other C/I systems. Some of the information of the concepts systems, like the number of Theta roles of a verb—the verb’s arity—must be legible to the CS (rather than conversely). Similarly, the thematic properties of a selected argument may determine its merging order in the derivation (agents must merge in SpecVP, and so on). Much of the other information coded on lexical items is not legible to the CS itself, but it is transferred through the derivations of the CS to the other C/I systems, and it should be legible to them. (This is the same intuition that underlies the concept of interpretable features in the minimalist framework.) The
inference system is essentially logic and its inventory includes, for example, logical relations, functions, abstract predicates, and variables (but no constants). The outputs of the CS are representations that are legible to the inference system, which can read them as propositions fit for its computations. The hardest to define given our present state of knowledge are the context systems that narrow the information transmitted through the derivation (coded in the relevant representation), and select the information that is useful for the context of use. On this view, then, the C/I systems are the concepts/context/inference systems. Figure I.1 is, of course, an abstraction. It may turn out to be necessary to assume that the context, inference, and sound systems may have direct interfaces, rather than each negotiating only with the CS, as in the figure. (This is related to Chomsky’s (2000) question of association.) I will touch occasionally on this question, particularly in chapter 3, on focus and stress, but I do not attempt a comprehensive answer. The specific focus of this book is the interface of the CS with the inference and the context systems. The interface with the concepts system is the topic of research on the relations between the lexicon and the computational system. I discuss the concepts interface in Reinhart 2002. A central question of linguistic theory, then, is how the interface is guaranteed, or what makes the CS representations legible to the other systems. Put more broadly, the question is how structure and use are related. There is no pretheoretical way to answer this. Suppose we observed, empirically, that a certain derivation D is associated with a set U of possible uses. In principle, there are several conceivable ways that this could come about. One is that the properties necessary for U are directly coded in D, through the computational system, as specific features, functional projections, operations, or conditions on derivations. 
In other instances, it is possible that there is no direct relation between the syntactic properties of D and U. Rather, the set U is determined solely by independent properties and computations of the external systems, which apply to legible CS representations and further modify them. Yet another possibility is that there are some interface strategies associating D and U, using independent properties of the CS, and of the external systems. I believe all three solutions to the question of the interface are realized in various areas of the interface, but the one actually favored in syntactic practice is the first—that of syntactic coding. Many of the properties now encoded in the theory of the CS got there in order to guarantee the correct interface with the external systems of use. Quantified (Q), focus (F), and referential (R) features are just a few examples. The features
approach has been found useful in current syntax. The theoretical goal is that syntactic operations—the computational system—should be driven only by purely formal and mechanical considerations, and feature checking is a formal procedure of this nature. Nevertheless, it is far less obvious in advance that the full information necessary for the interface is coded in the CS in the same way. What it would mean, if true, is that features of one cognitive system are fully coded in another. It is easy to see why this is an attractive option. If this is how the human system has developed genetically, it can be viewed as a perfect system, where full matching of the various cognitive systems is guaranteed in advance, and no problem of coordination (or of the interface) can ever arise. Though not impossible conceptually, more evidence that language is perfect in this sense is needed than is currently available. We should also bear in mind that if the properties we encode in the CS (as theoreticians) do not, in fact, belong there, we are unlikely to get very far. As we will see, encoding interface properties in the CS has led to an enormous enrichment of the machinery. In many cases, the result is a highly baroque syntax, which, nevertheless, fares rather poorly in capturing the interface. This book covers four areas of the inference/context interface: quantifier scope, focus, anaphora, and (more briefly) scalar implicatures. The first question in each of the areas is what makes the computational system legible to the other systems at the interface. In other words, how much of the information needed for inference and context is coded in the CS, and how is it coded? Once the CS coding is established, we discover that in each of these areas there are certain aspects of meaning, or the use of derivations at the interface, that cannot be coded in the CS formal language, on both conceptual and empirical grounds. 
This residue, I argue, is governed by interface strategies that can be viewed as strategies of repair, adjusting the derivation to the needs on the interface. (Not all interface strategies are strategies of repair, but those discussed here are.) The broader context of this analysis is Chomsky’s (2000) hypothesis of optimal design. The term optimal as used here should not be confused with its use in Optimality Theory (OT). In OT, optimality is a type of computation—selecting the optimal competitor out of a reference set. Here the question is whether the genetic design of language happens to be optimal.1 As we saw with the primate thought experiment, the problem that language is a solution to is the interface of the different cognitive systems; to enable this, its representations must be legible to the other systems. Chomsky’s working hypothesis, viewed as an ideal to guide inquiry, is that the solution is optimal: “Language is an optimal solution to legibility conditions” (p. 96). A difficult question, of course, is what would count as an optimal design. A useful way to think about this may be to imagine a spectrum between a perfect and a poor solution, and optimal systems should be closer to the first alternative than to the second. As a first approximation, we may take the earlier view of the minimalist program, which I discuss in chapter 1. The assumption is that in a perfect system, the bare minimum needed for constructing derivations will be sufficient for the full needs of the interface. We may view deviations from the perfect system as imperfections—adjustments needed to enable the output representations to meet interface requirements. The fewer of these there are, the more optimal the system design is. Though this is not the full story, we may note that in the three interface areas under consideration here, the CS outputs are not sufficient for the interface needs. Some extension or repair of what the CS allows is therefore needed, so these are areas of imperfections. I will argue that the repair strategies involve the application of an illicit operation, which is only motivated by the fact that the output representations of the CS are not sufficient for the interface needs. Applying this operation requires constructing a reference set to check whether this is indeed necessary—that is, that this is the only way to meet the interface requirements. If optimal design has some measurable implications, imperfections should come at some cost. While capturing the interface is the minimal requirement of the CS, a factor that cannot be ignored in determining which solutions can be viewed as optimal is that the actual use of language is also restricted by questions of “hardware.” There is by now ample evidence that the human processor operates with limited resources of working memory and other limitations. So far we have looked only at the C/I interface.
A CS that accommodates this interface would enable thought, but not yet communication. This also requires accommodating the SM (sensorimotor) interface, which, as formulated in Chomsky 2005, enables the externalization of language (sound, communication, processing). But the sensorimotor systems are severely constrained by the hardware available for sound production and perception, or more broadly, for the computations required in parsing sound inputs into linguistic representations. It may be possible to imagine various good solutions to the problem of the C/I interface, perhaps even better than the actual CSs, which require so much computation space that they would not be usable by humans. Suppose, then, that language is optimally designed to accommodate the C/I interface. The next question is how good the accommodation is to the SM interface, or the linking of the optimal design for the C/I interface with the fixed hardware properties of the SM systems. Of the various hardware questions, the one relevant to the problems discussed in this book is how a parser that operates with limited memory resources can access and apply the definitions of the CS.
With this question in mind, we may turn to the fuller specification Chomsky (2000, 96) gives to the hypothesis of optimal design. He says: “Suppose that FL [Faculty of Language] satisfying legibility conditions in an optimal way satisfies all other empirical conditions too: acquisition, processing, neurology. . . . Then the language organ is a perfect solution to minimal design specifications. That is, a system that satisfies a very narrow subset of empirical conditions in an optimal way—those it must satisfy to be usable at all [i.e., the interface conditions]—turns out to satisfy all empirical conditions. Whatever is learned about other matters will not change the conclusions about FL.” Note that what is outlined here is the (unrealistic) perfect solution (rather than the optimal). In the perfect solution, the CS is some kind of genetic development that, while optimally enabling the C/I interface, also happens to fit perfectly for actual use with limited resources (thus satisfying all empirical conditions, not just the subset needed for the C/I interface). Sticking to the question of the parser, along with Chomsky’s reasoning in the paragraph quoted here, it is easy to see intuitively why this would be the ideal situation. The farther apart the CS and the parser are, the more questions arise regarding their coordination. For example, if a change takes place in one system, how does the other adjust? The question, then, is how close language design could be to this idealized perfect match.
Phillips (1996) suggests that it is pretty close, arguing that “the parser is the grammar.” The question of how “transparent” the parser can be—to what extent it can directly apply computations of the CS, rather than its own independent algorithms—has a long history, with roots in Miller and Chomsky 1963. One interpretation of their proposal became known in research on processing as the Derivational Theory of Complexity. According to this theory, there should be a measurably greater processing load depending on the number and complexity of the operations assumed in syntactic theory (the theory even made the assumption that a passive sentence would impose greater processing load than an active sentence). The general contention since the mid-1970s was that this hypothesis did not find empirical support, and it was abandoned. Phillips reexamines this history in detail, and argues, first, that the empirical findings were not as sweepingly nonsupportive as they were said to be, and, more crucially, that this specific interpretation of Miller and Chomsky’s hypothesis was not necessarily warranted, because it also depends on the question of measurement, or perceptual complexity. It is not obvious that every difference in processing steps, or number of operations applied, should be measurable. Berwick and Weinberg (1984) argued that in a nonserial model of parsing, it is possible that increased complexity does not increase time demands. Phillips’s hypothesis that the parser is the CS adopts the strongest possible version of a transparent parser, which in Berwick and Weinberg’s terms is token-to-token transparency between the CS and the parser. On this view, then, the fit of the parser and the CS is perfect. But executing this hypothesis in Phillips’s system also requires substantial changes in the CS, to make it usable in linear left-to-right parsing (changes that Phillips attempts to show are independently motivated by syntax-internal considerations, but that are nevertheless not consistent with current views of the CS).
It may be realistic to consider the other option as well—that language is optimally designed, though it is not perfect. It has been repeatedly argued that in speech perception (processing) the parser uses specific principles or strategies that find no direct correlate in computations of the CS. The most famous are strategies resolving local ambiguity at a processing stage, which does not even arise in syntax, as in (1).
(1) a. Max knows Lucie well enough.
    b. Max knows Lucie will laugh.
In response time or eye-tracking experiments, it was found that there is more intense processing activity following the occurrence of Lucie in (1b) than in (1a). This indicates that the parser first attaches Lucie as a complement of the verb in both derivations, but then reanalysis is required in (1b). Such findings are often taken to suggest that the parser must be a system independent of the CS (other relevant parsing-specific operations will be mentioned shortly).
But we may still ask how optimal the correlation between the two systems is. Let us assume that the parser is some algorithmic device that generates trees. But it can lend itself to any other system for the specifics of the trees generated—that is, it has no internal information about what counts as a legitimate tree. This means that as long as the CS definitions and computations are accessible to parsing algorithms, the parser can construct trees defined by the CS as legitimate outputs. Though the parser can, in principle, parse anything that is formally compatible, it has developed to operate within the hardware of limited human working memory. Hence, there
are parser-specific strategies of how to minimize the load on working memory, while still applying the computations dictated by the system it works with, which in this case is the CS. A parser of this sort can be viewed as transparent, because apart from adjustments to hardware needs, it does not apply rules of its own, but borrows them from the CS—its computations are a function of the CS computations and the input string. (The term transparent parser has a long history, but I am using it only as described here.2) In practice, it may turn out that the actual human parser requires some parser-specific conditions for the processing of language derivations, but the more transparent the parser is, in this sense, the more optimal language design is. It means that the minimal design necessary to capture the interface also contains all the information needed for the parser; thus, the serious problems of coordinating two independent systems are avoided. Perhaps the reason the idea of a transparent parser was abandoned was that for years it did not seem possible to define such a parser for the CS. The crucial problem seemed to be that parsing, unlike the CS derivations, proceeds from left to right (top to bottom). However, Pritchett’s (1992) parser, which may not have received the attention it deserves, does solve this problem. It is a head-driven parser, which means that inputs are stored until a lexical head (verb) is reached. What distinguishes it from other head-driven parsers is a parsing condition (specific to language processing) stating that any step in the parsing derivation must satisfy a Theta requirement (Pritchett’s Theta attachment). Once a parsing step is licensed by Theta attachment, other attachments in this parse follow the CS instructions. When the first verb is encountered, the subject being stored can be attached, by Theta attachment. 
The derivation proceeds bottom up, and at the same parse, VP, IP, and CP are constructed, the subject moves from SpecVP to SpecIP, and so on. (Let us abstract away from the question of adjuncts here.) In (1b), repeated in (2a), when Lucie is encountered, Theta attachment allows only one option: attaching it as the complement of the verb, just as in (1a). Hence, when this attachment is found inconsistent with the subsequent input, reanalysis is required.
(2) a. Max knew Lucie would laugh.
b. Max warned Lucie would laugh. (Garden path)
While (2a) is captured by virtually any of the available parsers without assuming either a head-driven or Theta-based parser, Pritchett’s analysis focuses on the difference between (2a) and (2b). Both require reanalysis, but only the second is a garden path. Pritchett draws a crucial distinction
between low-cost and high-cost reanalysis. The difference between (2a) and (2b) is that the verb in the first sentence selects only one Theta role, but in the second it selects two. In Pritchett’s initial analysis, the problem with the second reanalysis was assumed to be that Lucie is moved from one Theta domain to another. This would be puzzling, because it means another parser-specific condition not familiar from syntax is required. However, later in the same book he shows that the conditions on permissible (low-cost) reanalysis are purely syntactic. Mulders (2004, 2005) and Siloni (2004) develop this second line of argument and maintain that reanalysis obeys a subset of the conditions that apply to movement in the CS. Garden paths, then, are cases where the parsing of a legitimate output of the CS requires a violation of a syntactic principle of the CS during parsing. If a parser-specific restriction on reanalysis is still required (which is a topic of debate between these two authors), it is, in any case, stated in terms defined in the CS.
A central argument against Pritchett’s parser has been that, because of its head-driven engine, it cannot be suitable for verb-final languages, like Japanese or, partially, Dutch, because the system predicts that in such languages no commitment is made before the verb is encountered. However, Mulders (2002) shows that this is actually in accordance with the facts, because verb-final languages have fewer garden paths, and those that are still found follow from a Pritchett-based analysis. Mulders offers an in-depth analysis of Japanese parsing in this system. She shows further that it is possible to formulate a “universal” parser that accounts for garden-path phenomena across languages, with no appeal to language-specific parsers. Pritchett’s system, then, is a realistic model of a highly transparent parser, which directly applies conditions of the computational system, with a very restricted set of parser-specific conditions.
We may note that even the parser-specific condition of Theta attachment is not precisely an arbitrary parser condition. The current view of the CS is that its derivations proceed bottom up, starting with a verb or a lexical head, and, although this has been debated, certain aspects of the Theta criterion guide the derivation. Theta attachment remains a parser-specific condition, since the CS does not require every computation to involve Theta attachment. But this parser-specific condition can be viewed as an attempt of the parser to imitate the CS computations as closely as possible: to start each step of the parsing in a position enabling it to proceed bottom up according to CS definitions. It seems possible to conclude that, with this model of the parser, language may indeed be optimally designed with respect to the “empirical condition” of use with limited resources. The minimal set necessary for the interface (that is, for language use to be possible at all) is also close to being sufficient for capturing everything else.
Turning now to reference-set computation, my focus in this book is on reference sets of the global sort (which requires holding two or more derivations open). If the parser is transparent, this means it actually carries out the required computation. But this goes against what is known about working-memory limitations. Given these limitations, the parser attempts to compute, close, and discharge chunks of the tree as soon as possible. Global reference-set computation does not allow such closure and discharge before the full derivation is completed and evaluated against the reference set. So carrying out this computation imposes a serious processing load. As we will see, in frameworks assuming that this type of computation is the driving engine of the CS and is extremely common, it is necessary to assume parser-specific bypassing algorithms, so the parser does not, in fact, carry out the CS computations. Given that in such frameworks, this computation is always at work, this would mean that the parser cannot be transparent, or, in our terms, that language is not optimally designed. But the hypothesis I develop in this book is that reference-set computation, although available for the CS, is a severely restricted computation. As mentioned earlier, it applies in areas of imperfections, where the CS fails to meet the interface requirements and where an illicit operation must apply. On this view, imperfections have a processing cost. Let us continue to assume that language is optimally designed and that the parser is transparent, with the minimum parser-specific adjustments mentioned above, which means it also carries out reference-set computations when required. We would then expect to find some evidence for the cost involved.
The problem of measurement this question entails is different from that in the cases examined by the Derivational Theory of Complexity. It is not just a question of degrees of computational complexity, which may not be fully measurable (and it also depends on other considerations not discussed here), but it verges on the question of the upper limit of what can be processed within the limitations of human working memory. In chapter 5, I argue that children, whose working memory is not yet fully developed, simply cannot carry out the required computations, a claim based on extensive acquisition findings. Since their knowledge of the CS is innate, they know what they have to do in the relevant experimental tasks, but failing the execution they resort to bypassing strategies, one of which is guessing. Adults’ working memory is sufficient for the standard
instances of computations with two members in the reference set. (Otherwise we could not have any real way of knowing that such computations are available to the CS, apart from theory-internal considerations.) But in section 2.7.3, I argue that adults also cannot process reference sets with more than three members, which gives us an idea of the upper limit on what can be computed within the resources of human working memory. If accurate, these findings lend support to the hypothesis that the parser is transparent, and indirectly, to the hypothesis of optimal design.
Chapter 1 Reference-Set Computation
As mentioned in the introduction, reference-set computation (the selection of the optimal competitor out of a relevant reference set) moved to the forefront of linguistic theory in the 1990s. A restricted version of this process was assumed at the early stages of the minimalist program, and simultaneously, it has been the central notion developed in Optimality Theory. It turned out that none of the original arguments in the early minimalist program actually justify this move, which is the major reason it was eventually rejected. The present assumption in the minimalist framework is that none of the operations of the computational system require reference-set comparisons (section 1.1). But research in this area led to the discovery that there are certain instances where interpretation-based reference-set comparisons are still needed (section 1.2). Originally, these cases were associated with the Minimal Link Condition (MLC). I argue that these cases are not related to the MLC, but restricted instances of reference-set computation are operative at the interface, in areas where the outputs of the computational system do not meet the (contextual) interface needs and adjustments are required. These, indeed, are areas where there are imperfections in the computational system. We may expect therefore that there should also be some observable processing cost associated with these imperfections (section 1.3).
In this chapter, I examine the formal properties of the reference-set type of strategy at the interface, and in the following chapters I turn to the various instances where it applies. To establish the type of computation involved, I begin with a survey of the development of the concept of reference-set economy in the minimalist program, and the reasons it was abandoned.
1.1 The Minimal Link Condition
The early stages of the minimalist program, in Chomsky 1992 and 1994, introduced the concept of economy of derivations. There are two types of economy considerations in that early framework, which are summarized in (1) and (2). (As the theory developed, some of the terminology changed. I quote here from the earliest formulation of these ideas in Chomsky 1992, with later changes noted in brackets.)
(1) “If a derivation D converges without application of some operation, then that application is disallowed” (Chomsky 1992, 47).
(2) Minimal Link Condition (MLC)
“Given two convergent derivations D1 and D2 [out of the same numeration1] . . . D1 blocks D2 if its links are shorter” (Chomsky 1992, 48).
Condition (1) states that operations are only allowed if they enable a derivation to converge; that is, derivations are driven only by the need to check features, which, if not checked, will disable convergence. Condition (2) governs the strategies that should apply if there is more than one possible way for a derivation to converge (i.e., there are two or more ways to satisfy feature checking). Chomsky argues that the strategies governed by (1) (which were, at the time, greed and procrastinate) could be viewed as reducing the computational complexity of the syntax. Given that the second strategy (2) requires comparing derivations and choosing one of them, the more permissible derivations there are to select from, the bigger the computational effort is. If the syntactic operations permitted are only those that satisfy (1), the number of permissible (convergent) derivations to compare is dramatically reduced. When there is, nevertheless, more than one way a derivation can converge, (2) requires choosing the shortest one. The MLC in (2) will be our center of attention in this chapter, because it is this condition that introduces reference-set computation into syntax.
A given convergent derivation α is evaluated against a set of alternative convergent derivations: its reference set. If a derivation more economical than α is found in this set, α is blocked. Of course, the reference set should be strictly defined. (We do not want to compare derivations related by some arbitrary notion of similarity.) In a framework assuming syntactic levels, the reference set should include all and only derivations with identical input, that is, the same deep structure. In the minimalist program, syntactic levels were abolished. What guarantees that we compare only
derivations with identical input is the concept of numeration: a derivation starts with a numeration list of all the elements that it will use. Only derivations with identical numeration count as candidates for a reference set.
Let us follow the development of the MLC in (2) and the concept of a reference set for a derivation, through the history of one problem of wh-movement, known as superiority. It is revealing to examine this problem in detail, because of all the putative instances of the MLC, superiority seemed at first the clearest instance of a restriction that could not be explained locally by conditions on syntactic movement. The question then is whether handling this problem indeed requires reference-set computation, either for capturing the derivation, or for the interpretation of the relevant sentences. (The interpretation question is discussed in section 1.2.) Though the answer in both cases will turn out to be no, the exploration may facilitate understanding the formal properties of reference-set computation, and identifying other instances where it is at work.
Chomsky (1973) noted the contrasts between the (a) and (b) derivations in cases like (3)–(5).
(3) a. Who e discussed what with you?
b. */?What did who discuss e with you?
(4) a. What did Lucie discuss e with whom?
b. */?Whom did Lucie discuss what with e?
(5) a. Whom did Lucie persuade e [PRO to visit whom]?
b. *Whom did Lucie persuade whom [PRO to visit e]?
In the (a) cases, the wh-NP that moved originates higher in the tree than the one that stays in situ. If the lower one moves, as in the (b) cases, the derivation is worse. In cases like (3b) that involve just the subject and the object, the violation seems weak, and it has been argued not to exist in all languages. However, things deteriorate with VP-internal arguments in (4b); the movement in (5b), across a clause boundary, is even worse.
Chomsky (1973) assumed that these facts illustrate the operation of a syntactic constraint on wh-movement, which he labeled “superiority.” The relation “superior” is the predecessor of c-command, and the superiority condition requires that given two or more wh-candidates for movement, the one that moves is the superior one, which in later terms means that which c-commands the others. At the time, this constraint posed a problem, and seemed inconsistent with what was known about syntax. A striking property of the superiority restriction is that there seems to be no way to state it as an absolute constraint on syntactic movement, like the number of syntactic barriers
crossed. To see this, observe the difference between (5b), repeated here, and (6).
(5b) *Whom did Lucie persuade whom [PRO to visit e]?
(6) Whom did Lucie persuade Max [PRO to visit e]?
The distance between whom and its trace is precisely identical in the bad example (5b) and the good example (6). This means that the movement of whom in (5b) does not violate any island condition, or any absolute prohibition on movement. So well-formedness appears here to be a relative matter: for (6), there is no other candidate for movement, while for (5b) there is. There was no obvious way to state such facts in the syntax of 1973, apart from a descriptive constraint.
Along with the conceptual problem that the superiority constraint seemed to pose, there were empirical problems, which cast doubt on whether this was the correct generalization, and led to the abandonment of the idea. The problems showed up with wh-adjuncts, as in (7). The superiority constraint rules out (7a), where why presumably originates lower than who. But by the same reasoning (7b) should be permitted, which is not the case. In fact, (7b) was felt to have the same status as (7a) or (3b).
(7) a. */?Why did who arrive e?
b. */?Who e arrived why?
(8) a. *Who fainted when you behaved how?
b. Who fainted when you attacked whom?
The judgments are again clearer in cases like (8a). There is no superiority violation here. In terms of syntactic movement, (8a) is identical to the acceptable (8b). Still, when the wh–in situ is an adjunct, as in (8a), the derivation gives the same appearance of being a superiority violation. Huang (1982) observed that the problem in (8a) resembles the problem in (9), where syntactic movement extracts an adjunct out of an embedded clause, violating a constraint known as the ECP.2
(9) *How did Max faint when you behaved e?
Based on such facts, Huang argued that all instances of wh–in situ must undergo further covert movement at LF to join with the question operator.
If this is the case, then the covert movement of how in (8a) violates the ECP just as its overt movement in (9) does. Huang’s analysis was extremely influential, and the idea that wh–in situ must undergo LF-movement gained popularity in the 1980s, when it was believed that such movement is also needed for interpretative reasons.
Huang’s hope was that the LF-movement analysis would explain both the superiority and the adjunct effects as instances of ECP violations at LF. On this view, overt syntax movement can apply to any of the wh-candidates (subject to standard restrictions on syntactic movement), but at LF, all other wh-elements must raise. In the specific implementation of Huang, who of (3b), repeated below, adjoins to what in SpecCP, as in (10). From that position it does not c-command its trace (since the index of this Spec remains that of what). So the trace of who is not antecedent governed, violating the ECP. The same is true for (3a), repeated below, but there the trace is head governed, hence the ECP permits the derivation.
(3) a. Who e discussed what with you?
b. */?What did who discuss e with you?
(10) LF of (3b): *[who₁ [what₂]]₂ [e₁ discussed e₂ with you]
This account captures correctly all adjunct cases, since adjuncts always require antecedent government, and it also happens to capture superiority with subjects, as in (3b). What has gone unnoticed, though, is that it leaves the other superiority cases (4) and (5) unexplained. For example, the LF of (5b), repeated below, should be (11) in Huang’s system. In this LF-derivation, the trace is appropriately head governed. Hence it is not ruled out by the ECP.
(5b) *Whom did Lucie persuade whom [PRO to visit e]?
(11) LF of (5b): [whom₁ [whom₂]] [Lucie persuaded e₁ [PRO to visit e₂]]
Though several other implementations of the LF-movement approach exist, it remained the case that this approach did not solve the full range of the superiority problem.
In the minimalist program (starting with Chomsky 1992), Chomsky returned, in a sense, to the analysis of Chomsky 1973. Regardless of whether covert LF-movement of wh-constituents is still independently needed, Chomsky argued that superiority is a restriction on overt movement, an instance of the economy strategy (2) of preferring shorter links.
Traveling to SpecCP, the c-commanding wh has to cross fewer nodes that dominate it than any wh it c-commands. Hence the movement in the (a) cases of (3)–(5) is more economical than that in the (b) cases. This may appear to leave us precisely where we started, with the problem of wh-adjuncts unsolved. However, Tsai (1994) and Reinhart ([1994] 1998) argued, on different grounds, that this problem is, indeed,
independent of the problem of superiority with wh-arguments. Note first that the problem in (8a), repeated below, is not a general problem with wh-adjuncts, as assumed by Huang, but is restricted to adverbial wh-phrases. Example (12), in which how is replaced with what way, is fine. Syntactically and semantically, the wh-phrase is an adjunct in both. Still, only the adverbial adjunct causes problems.
(8a) *Who fainted when you behaved how?
(12) Who fainted when you behaved what way?
I argued in Reinhart [1994] 1998 that the standard interpretation of instances of wh–in situ involves no LF-movement, and they are interpreted in situ by a mechanism of choice functions, whose details I will examine in chapter 2. But adverbial wh-elements cannot be interpreted this way. One thing that would be agreed on in all frameworks is that wh-adverbials are different from wh-NPs, first, because they do not have a common noun set (N-set), and second, because they denote functions ranging over higher-order entities (Szabolcsi and Zwarts 1990). This means that choice functions selecting an individual from a set cannot apply to them (since there is no set of individuals that the choice function could select from). In (12), the adjunct that stays in situ is still an NP, hence it is interpreted in situ by applying a choice function. But in (8a), the same procedure cannot apply.
Adverbial wh-expressions, then, pose a specific problem, because they are uninterpretable in situ. Two routes are open to proceed from this observation. One is that wh-adverbials in situ, and only they, must indeed undergo LF-movement in order to be interpreted. Hence, Huang’s (1982) account still holds for such adverbials, and (8a) is an ECP violation. Another route is to pursue the alternative account offered for the problem in Reinhart 1981b, namely, that such adverbials are, in fact, base generated in SpecQP, hence (8a) cannot be generated.
The analysis assumed two Specs, which would correspond in current syntax to CP and QP, and among the arguments for base generating adverbials in SpecQP was the fact that we never find more than one such adverbial per clause. While (13a), which could be obtained by some sort of scrambling of the adverbial to final position, is marginal, (13b) is completely out.
(13) a. ?Who spoke how?
b. *Who spoke when how?
Either way, we may conclude that the problem of wh-adverbials is independent of superiority, and the latter indeed reflects a restriction on overt
syntactic movement. The road is open, then, to pursuing Chomsky’s (1992) assumption that superiority is an instance of the MLC in (2); in other words, that it requires reference-set computation.
While at the previous stage, in 1973, the superiority condition seemed arbitrary and structure-specific, in the early 1990s the MLC was believed to govern a broad spectrum of facts. It was intended to entail the relativized minimality effects of Rizzi 1990, as well as minimizing the number of chain-formation operations, in cases discussed by Epstein (1992) and Collins (1994). Let us look more closely at the intuition behind (2), and its implications for the theory of syntax.
At the transitional stage between the principles-and-parameters framework and the minimalist program, it was noted that certain, apparently distinct, constraints on syntactic movement have something in common that could be characterized as “least effort.” Following Rizzi’s relativized minimality, it was felt that what the bad derivations in (14) have in common is that the (italicized) moved element skips an (underlined) potential landing site, which is closer to the original position of the moved element, so, in some sense, the movement is “longer” than necessary.
(14) Relativized minimality
a. Head movement (HMC): *Where find Max will t the book.
b. A-movement (superraising): *Max seems [that it is certain [t to arrive]]
c. A′-movement (wh-islands): *I wonder what you forgot from whom you got t t.
In the superiority cases that we discussed, there is no intervening landing site. Still, the derivations seem longer than necessary, since to check the wh-features of C, the wh-element closer to it could move.
In the first implementation of the minimalist program (MP) (Chomsky 1992, 1994), movement was motivated by the need of the moved element to check its features (“greed”). Under this implementation, it was not possible to state the “shorter-link” intuition locally.
For example, in (14b), once we select and merge it in the second cycle, there is no shorter way for Max to check its case or DP features. Similarly, from the perspective of the wh that moved in (3b) (*/?What did who discuss e with you?), the route it took is the only (hence the shortest) way to check its own features. Capturing this intuition required, therefore, comparing a set of competing convergent derivations, which was later labeled the reference set. The MLC condition (2), repeated here, is based on constructing such a set.
(2) Minimal Link Condition (MLC)
“Given two convergent derivations D1 and D2 [out of the same numeration] . . . D1 blocks D2 if its links are shorter” (Chomsky 1992, 48).
For the superraising case in (14b), the relevant reference set is the pair ⟨15a, 15b⟩, which contains two possible derivations from the same numeration (the same “deep structure,” in the previous model).
(15) a. [(F) It seems that [(F) Maxᵢ is certain [tᵢ to arrive]]]
b. *[(F) Maxᵢ seems that [(F) it is certain [tᵢ to arrive]]]
In (15a) Max moves in the second cycle (of certain), to check the feature F, and in the higher cycle the expletive it is merged. Example (15b) is (14b). Since the link between Max and its trace is shorter in (15a) than in (15b), (15a) blocks (15b). Similarly, the reference set for the superiority violation in (5b), repeated in (16b), is the pair ⟨16a, 16b⟩. Derivation (16a), with the shorter link, blocks (16b).
(16) a. Whom did Lucie persuade e [PRO to visit whom]?
b. *Whom did Lucie persuade whom [PRO to visit e]?
We noted that a characteristic property of the superiority restriction is that it is impossible to state it as an absolute condition in terms of the distance between the original position and the target position of movement. The distance between these positions is identical in the problematic (16b) and in the innocent (6), repeated in (17).
(17) Whom did Lucie persuade Max [PRO to visit e]?
This is precisely the type of property that can be explained by assuming reference-set computation. For (17), there is no alternative derivation that will satisfy the wh-feature (no alternative convergent derivation), so it is the single member in its reference set and hence, the shortest possible derivation in this set.
Let us reflect now on the formal properties of the computation we have been assuming.
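As a concrete reference point for that computation, the blocking relation in (2) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not a formalization proposed in the text: the class `Derivation`, the helpers `make_numeration`, `reference_set`, and `is_blocked`, and the integer `link_length` values are all hypothetical. Link length is simply stipulated per derivation rather than computed from a tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    label: str
    numeration: tuple   # hypothetical encoding: sorted tuple of lexical items
    converges: bool
    link_length: int    # hypothetical, stipulated measure of the movement link


def make_numeration(*items):
    """Order-independent encoding that still keeps repeated items (two 'whom')."""
    return tuple(sorted(items))


def reference_set(target, candidates):
    """All convergent derivations built from the same numeration as target."""
    return [d for d in candidates
            if d.converges and d.numeration == target.numeration]


def is_blocked(target, candidates):
    """Condition (2): target is blocked if some competitor has a shorter link."""
    return any(d.link_length < target.link_length
               for d in reference_set(target, candidates))


n16 = make_numeration("whom", "whom", "Lucie", "persuade", "PRO", "to", "visit", "did")
n17 = make_numeration("whom", "Max", "Lucie", "persuade", "PRO", "to", "visit", "did")

d16a = Derivation("16a", n16, True, 1)
d16b = Derivation("16b", n16, True, 2)
d17 = Derivation("17", n17, True, 2)
pool = [d16a, d16b, d17]

print(is_blocked(d16b, pool))  # True: (16a) has the shorter link
print(is_blocked(d17, pool))   # False: (17) is alone in its reference set
```

Note that (16b) and (17) are given the same stipulated link length, yet only (16b) is blocked. The verdict depends on what else is in the reference set, and the set can only be assembled once complete, convergent derivations are available.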
The characteristic properties of reference-set computation are that it assumes a relative concept of well-formedness (as we saw), and, next, in the specific instances under consideration, that it requires global computation. In (15), for example, it is useless to construct a reference set locally, at the second (certain) cycle, since the effects of either inserting it or moving Max are only noticeable at the next cycle. So the whole derivation must be kept open and available at that top cycle. As pointed out by Collins (1997), the problem is more general. Since (2) requires comparing only convergent derivations, the construction of the reference set is only possible at the very end, where nonconvergent derivations can be filtered out. My focus of attention here is on instances of reference-set computation with this second property of requiring global computation. (Other instances may involve local reference-set computation, which does not raise the questions I will be turning to.)
Optimality Theory (OT), which developed in about the same period, is based on the same notion of global reference-set economy, with these two properties, though the technical details of the implementation are different. But the OT system is much richer, assuming, first, that what needs to be checked against a reference set is not just which derivation is shorter, but a currently open list of constraints, and next, that these constraints are ranked, with possible variations of the ranking across languages.
The global nature of reference-set optimality poses a problem if we assume that the parser is essentially transparent, in the sense outlined in the introduction; that is, that it actually implements (a subset of) the computations required by the CS, with minimum parser-specific computations. If we translate the computation into actual processing terms, it requires, first, holding all nodes in the derivation accessible in working memory, until the full derivation can be completed, and at the same time constructing (or attempting to construct) alternative derivations, with which to compare the stored material. The type of load on working memory assumed here exceeds what is known to be realistic for the human parser. The assumption shared by all processing studies (since, at least, Fodor, Bever, and Garrett 1974) is that given the limitations of working memory, the human processor attempts to close constituents as soon as possible.
Chunks of the derivation that are closed are assigned some abstract representation, and the nodes they dominate are no longer available for subsequent processing. Opening a closed constituent to access its subparts is possible but can be highly costly, leading to a garden-path effect. If the parser requires global reference-set computation, either nothing gets closed and eventually the overload is too great for processing (as in the case of center embedding), or constituents constantly close and reopen (garden-path effect). Neither option is consistent with the fact that in actual language use, sentences ordinarily get processed smoothly. The least we can infer is that the human parser does not operate, in processing, by computations of this kind.
An approach developed to address this problem, particularly in the OT framework (though it is also still found in the earlier parts of Chomsky
1995), is that one should not attempt to deduce the properties of the computational system (competence) from properties of the parser (performance). The actual processing of derivations need not literally compute optimality, but rather some algorithms, or heuristic strategies, are developed by speakers for a quick assessment. (For some algorithms proposed for acquisition, see Pulleyblank and Turkel 1998 as well as Tesar 1998.) This approach cannot yet be evaluated, given that the full range of algorithms guiding the parser still needs to be specified. But rather than dwelling on this point, we may note its implications for the hypothesis of optimal design outlined in the introduction, based on Chomsky 2000. Suppose we have successfully defined a computational system that is an optimal solution to the elementary interface conditions, but it still fails the other conditions. For example, it is not fully adequate for processing with limited working memory, so we have to add many parser-specific algorithms that enable it to bypass the required computation of the CS. This would mean that the optimal-design hypothesis is false and human language is not optimally designed.
If reference-set computation is found only in isolated cases governed by the MLC, as is the case in the early minimalist program, this does not constitute a complete failure of optimal design, as in Optimality Theory, because the problem is confined to specific areas. Nevertheless, it is appropriate to check further whether it really needs to be assumed even in these isolated cases. The question at stake is whether syntax (the computational system) includes (even restricted) computations of this kind. In fact, it turned out that there was no real motivation to assume the complex computation of the MLC in (2), since whatever is correct about the intuition of “least effort” or “shortest move” can also be captured by a local computation.
In chapter 4 of Chomsky 1995, both the views on what triggers movement and on the MLC are revised. Greed is replaced with attract: movement is not triggered by the requirements of the moving element, but by the higher (functional) category, which needs this element in order to be interpreted or deleted. This enabled building the MLC into the definition of attract.
(18) “Attract” (combines last resort and MLC)
“K attracts F if F is the closest feature that can enter into a checking relation with a sub-label of K” (Chomsky 1995, 297).
From the perspective of the attracting target, there is nothing complex about finding its nearest candidate. Suppose we reach a stage in the derivation where a functional category (a feature) has been merged. At this
point we search in the chunk of the derivation we have just built, for the necessary element to check it, and the search stops as soon as the first such element (going from top to bottom) is found. For example, in the superiority cases of (16), repeated here, the relevant state of the derivation is (19), where the wh-feature has just been merged at the matrix.
(16) a. Whom did Lucie persuade e [PRO to visit whom]?
b. *Whom did Lucie persuade whom [PRO to visit e]?
(19) Q+wh [Lucie persuade whom [PRO to visit whom]]
This feature now attracts the nearest wh-element it can find, which is the complement of persuade. Hence (16a) is derived, and there are no further options for continuing the search that could derive (16b). In (17), repeated below, the first wh that can be found is the complement of visit, hence it is this one that is attracted. I will return shortly to the cases of relativized minimality in (14).
(17) Whom did Lucie persuade Max [PRO to visit e]?
The MLC on this view is not a relative condition, but an absolute one. The first relevant element must be selected, regardless of any other considerations that may have tempted us to do otherwise. On this formulation, no reference set is constructed at all: (16b) is not ruled out by comparison to alternative options, but is underivable. The MLC is also local, in the sense that it applies as soon as the attracting node has merged, with no need to know about any potential future steps in the derivation.
This is the place to note that there has always been something puzzling about the view of the original MLC as a “least-effort” or economy principle. An extremely costly computation, which exceeds standard processing limitations, was needed to save the effort posed by a longer link than necessary. On the other hand, under the present formulation it is possible to observe that this absolute condition is indeed a “least-effort” condition in terms of actual processing.
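The search procedure just described for attract in (18) can be illustrated with a short Python sketch. It is a deliberately crude toy under loudly stated assumptions: structures are nested lists in which each embedded list stands for an embedded clause, depth of embedding (and, within one clause, left-to-right order) is used as a stand-in for closeness, and the names `attract` and `WH_WORDS` are hypothetical. Nothing here is meant as a claim about how closeness is actually defined in Chomsky 1995.

```python
from collections import deque

WH_WORDS = {"who", "whom", "what"}  # hypothetical toy lexicon of wh-elements


def attract(structure, is_candidate=lambda word: word in WH_WORDS):
    """Return the path to the closest candidate, searching top-down.

    The search visits shallower clauses before deeper ones and stops at the
    first candidate found; no alternative derivations are ever built.
    """
    queue = deque([(structure, ())])
    while queue:
        node, path = queue.popleft()
        for index, child in enumerate(node):
            if isinstance(child, list):
                # Embedded clause: postpone until the current clause is done.
                queue.append((child, path + (index,)))
            elif is_candidate(child):
                return path + (index,)
    return None  # nothing to attract: the feature cannot be checked


# (19): Q+wh [Lucie persuade whom [PRO to visit whom]]
s19 = ["Lucie", "persuade", "whom", ["PRO", "to", "visit", "whom"]]
# Structure underlying (17): only the embedded object is a wh-element.
s17 = ["Lucie", "persuade", "Max", ["PRO", "to", "visit", "whom"]]

print(attract(s19))  # (2,): the complement of persuade, as in (16a)
print(attract(s17))  # (3, 3): the complement of visit, as in (17)
```

In this toy, the structure corresponding to (16b) is never produced for `s19`, because the search returns as soon as the higher wh-element is reached. That mirrors the point that (16b) is underivable rather than outcompeted.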
This condition minimizes the search for a checking element, thus enforcing the quickest possible conclusion of the given step in the derivation and freeing working memory for the next task. There is still a difference between the revised MLC and the other absolute conditions on syntactic movement, which prevent movement out of an island. The latter define the limits of the search, that is, the domain beyond which a functional category cannot attract elements to check it. For example, when the Q-feature is merged in (20a), it starts the search for a wh that it can attract. However, the search cannot reach into the syntactic
island, which is why (20b) cannot be derived. Hence no wh-feature can be attracted, and a derivation starting with this numeration has no way to converge. The same is true for the CED island in (21).

(20) a. Q+wh [you resign [after Max behaved (in) what way]]
     b. *In what way did you resign after Max behaved t?

(21) *Which shelf did you borrow the books on t?

In terms of processing, islands correspond to units that have been closed and stored at the stage of the derivation where the attractor is introduced. Their unavailability, again, decreases the load on working memory. The properties of the computational system that emerge out of this view of “least effort” provide no evidence for a need to impose “imperfections,” such as an altogether separate parser, or processing algorithms. On the contrary, the revised MLC and the island restrictions appear to be conditions enabling the computational system to match the processing limitation of human users, that is, the limitation of working memory. The computation is local, which means that only chunks of the derivation that are actively at work need to be retained in working memory; syntactic islands define the absolute limit for search operations, and the revised MLC imposes further acknowledgment of this limitation, forcing the quickest conclusion of operations required in a given step in the derivation.

Conceptual issues aside, the reasons the global reference-set approach was discarded in the minimalist framework are also empirical. Even for the small corpus examined here, we can see that the version of the reference-set MLC, as stated in (2), yields the wrong results in the case of wh-islands. (This was pointed out in Reinhart [1994] 1998.) The reference set for (14c), repeated in (22c), is ⟨22b, 22c⟩. (In terms of “deep structure,” (22b, c) are both derived from (22a).) Recall how this was determined: with the numeration used in (22c), we could obtain all three derivations in (22), as well as several others.
However, only the derivations in (22b, c) converge: in (22a), as well as in the other conceivable options, the wh-feature is not checked.

(22) a. I wonder [Q+wh [you forgot [Q+wh [you got what_j from whom_i]]]]
     b. *I wonder [from whom_i [you forgot [what_j [you got t_j t_i]]]]
     c. *I wonder [what_j [you forgot [from whom_i [you got t_j t_i]]]]

Given the reference-set MLC as stated in (2), there are now two possible conclusions: either we decide that the two derivations have equally short
links, or one of them is shorter than the other. (Computing here is not simple, but nothing hinges on deciding this.) In the first case, both derivations should be allowed; in the second, one of them (the shorter one) should be permitted. Both these conclusions are wrong. This in itself does not prove that the idea of reference-set economy in the computational system is wrong, since one may reasonably argue that wh-islands are governed by an independent absolute constraint. Nevertheless, the problem illustrates the danger of using such a strategy freely. The account suggested in chapter 4 of Chomsky 1995 for these cases rests on another option of satisfying “attract” in (22), which we have overlooked so far. Suppose what moved to check the wh-feature of its clause as in the first step of (22b). When the next Q+wh is merged and looks for a feature to attract, the nearest one it can find is this same what. Hence the “attract” version of the MLC in (18) determines that this is the only option, and what must move again. Thus the only derivation permitted from this numeration (from the “deep structure” in (22a)) is (23).

(23) I wonder [what_j Q [you forgot [t_j Q [you got t_j from whom_i]]]]

The assumption is that (23) indeed converges, in the sense that all relevant features are checked, but it is semantically defective.3 Similar reasoning applies in the case of superraising in (14b), though it entails some further complications.4 We should note that this specific account of wh-islands and superraising is a matter of implementation, which is being continually revised in the MP framework. Another possibility, suggested in Reinhart [1994] 1998, is that these are not, in fact, instances of the MLC, even in its present formulation, but they follow from other conditions.5 An issue still open in the MP is the precise account of syntactic islands (which originally also included wh-islands).
The decision regarding the division of labor between the MLC and other conditions must await such an account. Either way, it is clear that none of the cases that originally motivated the introduction of reference-set computation into the computational system justify this move. If anything, they show that such a computation is not, in fact, available in this system.

1.2 Interpretation-Dependent Reference Sets
Though it was found irrelevant for syntax, the concept of reference-set computation, in the early minimalist program, inspired a line of research
on its role at the interface of the computational and the conceptual systems. Interestingly, the first formulations of reference-set strategies at the interface also evolved around the earlier version of the MLC in the area of superiority. So let us first trace this development. There is a residue of facts noted over the years that pose problems for any analysis of superiority effects. One such problem, noted by Lasnik and Saito (1992), is given in (24). Example (24a) is a standard superiority violation (the lower rather than the higher wh-phrase has moved). But (24b), where precisely the same thing happens in the embedded clause, is much better.

(24) a. */?I know [what [who bought e]]?
     b. Who e knows [what [who bought e]]?

(25) a. Who e knows who e bought what?
     b. Lucie does. (= Lucie knows who bought what.)
     c. Lucie knows who bought a car . . .

(26) a. Who e knows what who bought e?
     b. *Lucie does. (= Lucie knows what who bought e)
     c. Lucie knows what Max bought . . .

(27) a. For which ⟨x, y⟩, x knows what y bought.
     b. For which x, x knows for which ⟨z, y⟩, y bought z.

As Lasnik and Saito noted, this is only possible if who has matrix scope. In principle, sentences with this structure have two scope construals, as seen in (25), which does not involve a superiority violation. If the wh–in situ (what) takes scope in the lower clause, a possible answer would be (25b); if it takes scope in the top clause, the answer will have the form of (25c). (The italicized constituents correspond to the wh-constituents that are being answered. For independent reasons, a wh-constituent in SpecCP cannot have scope beyond that CP, so there is no additional scope construal for the question.) By the same token, (24b) should also be ambiguous regarding the scope of the embedded who, but it is not.
As we see in (26), it cannot be answered with (26b), which is obtained by interpreting who with scope over the embedded clause, but only with (26c), which corresponds to the higher scope construal. In other words, of the two informal scope representations in (27), (26a) can only be construed as (27a). (I ignore here the precise details of the interpretation of questions and of wh–in situ, issues that are discussed in Reinhart 1992, [1994] 1998.) Golan (1993), followed by Reinhart [1994] 1998, argued that to capture such facts, we need to assume that the MLC, which is behind the superiority effects, is interpretation-dependent; that is, it determines the most economical derivation relative to interpretative goals. In the standard bad instances of superiority violations, the derivations with the long and the short movement yield precisely the same question. For example, derivation (28a), which violates superiority, results in (28b), which is precisely the same as (29b), obtained by the shorter derivation (29a). In this case the more economical derivation (shorter link) blocks the other.

(28) a. *What did who buy e?
     b. For which ⟨x, y⟩ x bought y.

(29) a. Who e bought what?
     b. For which ⟨x, y⟩ x bought y.

In the problem case (24b), repeated in (30a), the derivation appears to violate the MLC as well, since a shorter derivation exists, as in (25a), repeated in (31a), where the c-commanding who is moved.

(30) a. Who e knows what who bought e?
     b. For which ⟨x, y⟩, x knows what y bought.

(31) a. Who e knows who e bought what?
     b. For which ⟨x, z⟩ x knows who bought z.

But in this case the questions denoted by these two derivations are not identical. With a matrix scope of the wh–in situ, (30b) asks for a value for who, while (31b) asks for a value for what.6 So, if we try to ask the question (30b), there is no other, more economical derivation that could arrive at this question. Hence, this is the most economical way to reach an interface goal. The line of argument in Reinhart [1994] 1998 is that considerations of this type apply at the stage of translating syntactic forms into semantic representations. (It is not necessarily full semantic representations that need to be checked, but some representation in which variables are introduced and bound.) As it was stated in Reinhart [1994] 1998: if, at the stage of translating a given convergent derivation D into some semantic representation, we discover that an equivalent semantic representation could be obtained by a more economical derivation D′ (from the same numeration), D′ blocks D.
(That is, D′ blocks D unless their translations are not equivalent.) I argued further that under this view, the computation found in superiority has properties similar to the strategy that I proposed in Reinhart 1983 for the coreference aspects of conditions B and C. Abstracting away from the technical details (which are worked out in Grodzinsky and Reinhart 1993), the coreference generalization is
that two expressions in a given LF, say D, cannot corefer if, at the translation to semantic representations, we discover that an alternative LF, D′, exists where one of these is a variable bound by the other, and the two LFs have equivalent interpretations. In other words, D′ blocks coreference in D, unless they are semantically distinct. (I will return to this strategy in chapter 4.) But this formulation of the computation involved here is somewhat vague. Fox (1995) proposed a precise formal statement of this intuition. He built it into the definition of the reference set, and at that stage it was applicable only for interface strategies governed by the MLC: the set out of which the MLC selects the most economical derivation includes only derivations that end up with the same interpretation. Technically, this means that the reference set consists of pairs ⟨d, i⟩ of derivation and interpretation, where the interpretation i is identical in all pairs. A given ⟨d_i, i⟩ pair is blocked if the same interface effect could be obtained more economically, that is, if the reference set contains another competitor ⟨d_j, i⟩, where d_j has a shorter link. For illustration, again consider the reference set for (28a), repeated below. It consists of the pair ⟨32a, 32b⟩. Each member of this pair is itself a pair of a derivation and the interpretation assigned to it. The reason both derivations are included in the reference set is that (they start with the same numeration and) their interpretation member is identical. In this reference set, the link in the d-part of (32b) is shorter than that in (32a), hence (32a) is ruled out by (32b).

(28a) *What did who buy e?

(32) a. ⟨What did who buy e, for which ⟨x, y⟩ x bought y⟩
     b. ⟨Who e bought what, for which ⟨x, y⟩ x bought y⟩

The acceptable superiority violation in (24b), repeated here, contains only one member in its reference set: the ⟨d, i⟩ pair based on (27a), repeated here.
This is so because, as we saw, no other derivation out of the same numeration has the same interpretation. Hence, (24b) is the most economical derivation (relative to this interpretation).

(24b) Who e knows what who bought e?

(27a) ⟨Who e knows what who bought e, for which ⟨x, y⟩, x knows what y bought⟩

This approach, then, retains the earlier view of the MLC as a selection out of a reference set, but restricts the set further by interpretative considerations.
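The blocking logic over ⟨d, i⟩ pairs can be made concrete with a small Python sketch. It is a toy rendering under stated assumptions: the class name Pair, the functions reference_set and is_blocked, the representation of interpretations as plain strings, and the numeric link lengths are all hypothetical and serve only to fix the ordering "shorter link"; the candidates passed in are assumed to come from the same numeration.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    derivation: str
    interpretation: str
    link_length: int  # hypothetical cost; only the ordering matters

def reference_set(target, candidates):
    """Keep only the candidates whose interpretation matches the target's."""
    return [c for c in candidates if c.interpretation == target.interpretation]

def is_blocked(target, candidates):
    """A pair is blocked if a same-interpretation competitor has a shorter link."""
    return any(c.link_length < target.link_length
               for c in reference_set(target, candidates))

# (32a) and (32b): two derivations, one and the same question.
D32A = Pair("What did who buy e?", "for which <x, y>, x bought y", 2)
D32B = Pair("Who e bought what?", "for which <x, y>, x bought y", 1)

# (30a) and (31a): two derivations, two different questions.
D30A = Pair("Who e knows what who bought e?",
            "for which <x, y>, x knows what y bought", 2)
D31A = Pair("Who e knows who e bought what?",
            "for which <x, z>, x knows who bought z", 1)

print(is_blocked(D32A, [D32A, D32B]))          # True
print(is_blocked(D30A, [D30A, D31A]))          # False
print(len(reference_set(D30A, [D30A, D31A])))  # 1
```

The filter on interpretation does all the work: (30a) survives not because its link is short, but because its reference set contains no competitor.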
Fox (1995) and Reinhart [1994] 1998 argued that QR as well is sensitive to reference-set computation. In Fox’s implementation, QR obeys the MLC, under this interpretation-sensitive formulation.7 I will return to this question in greater detail in section 2.7, but let us just follow the gist of the idea here. Following the tradition in the LF-theory of the principles-and-parameters framework, and in Heim and Kratzer 1998, Fox assumes that all non-subject quantified NPs necessarily undergo QR at LF. Whether their final scope would correspond to their overt position depends on where they move to at LF. Thus, a sentence like (33) is ambiguous regarding whether every patient has narrow scope, as determined by its overt syntactic position, or it scopes over a doctor.

(33) A doctor will examine every patient. (Ambiguous)

(34) a. A doctor_2 [e_2 will [VP every patient_1 [VP examine e_1]]]
        (There is a doctor x, such that for every patient y, x will examine y.)
     b. Every patient_1 [a doctor_2 [e_2 will [VP examine e_1]]]
        (For every patient y, there is a doctor x, such that x will examine y.)

The narrow-scope interpretation is obtained by raising the quantified object just to the VP, as in (34a). (This is the position proposed for raised VP-internal quantifiers in May 1985.) The wide scope of every patient is obtained by movement to the topmost IP position, as in (34b). Assuming that a quantified VP-internal argument can adjoin either to VP or to IP to be interpreted, it appears that the MLC should determine that only the first is allowed in practice, since the link between the quantifier and its trace is shorter in (34a) than in (34b). However, if the MLC does not compare just derivations, but ⟨d, i⟩ pairs of a derivation and its interpretation, the movement in (34b) is licensed, since it yields a distinct interpretation from the shorter derivation in (34a). Hence, the reference set of (34b) contains only this derivation. Fox provides impressive evidence for this view of the MLC.
His point of departure is a puzzle noted by Sag (1976) and Williams (1977). Although (33), repeated here, is ambiguous, as we just saw, the ambiguity disappears in the ellipsis context of (35).

(33) A doctor will examine every patient. (Ambiguous)

(35) A doctor will examine every patient, and Lucie will [ ] too. (Only narrow scope for every)

When (33) occurs as the first conjunct of the ellipsis, it allows only the narrow scope for every patient, represented in (34a) (i.e., (35) is true only if there is one doctor that will examine all the patients). The account Sag and Williams offered for this fact is based on their assumption that VP-ellipsis is an LF-operation: an LF-predicate is copied into the empty VP (at least in Williams’s analysis). The predicate should be well formed, and, specifically, it cannot contain a variable bound outside the copied VP. Now let us look again at the two LFs generated for (33), repeated in (36a, b).

(36) a. A doctor_2 [e_2 will [VP every patient_1 [VP examine e_1]]]
     b. Every patient_1 [a doctor_2 [e_2 will [VP examine e_1]]]
     c. And Lucie will [ ] too.

The second ellipsis conjunct is generated, as in (36c), with an empty VP, into which an LF-VP should be copied from the first conjunct. If we copy the (top) VP of (36a) ([VP every patient_1 [VP examine e_1]]), the result is well formed. But the VP of (36b) is [VP examine e_1]. This VP contains the trace of every patient, which is bound outside the VP. Hence this is not an independent well-formed predicate, so it cannot be copied. It follows, then, that only the LF (36a) allows interpretation of the ellipsis, hence in (35) there is no ambiguity. Sag and Williams viewed this as strong evidence for their LF-analysis of ellipsis. However, Fox, also citing Hirschbühler 1982, points out that this could not be the correct explanation, based on examples like (37).

(37) A doctor will examine every patient, and a nurse will too.

Unlike (35), (37) is ambiguous; that is, the ambiguity of the first conjunct is not canceled in the context of ellipsis. Example (37) differs only minimally from (35) (a nurse, instead of Lucie). So the question is why that minimal difference should matter. Though there have been many attempts at an answer since Hirschbühler pointed the problem out, it remained, essentially, a mystery.
Fox’s solution rests on the alternative view of ellipsis as a PF-deletion developed in the minimalist program (see Chomsky and Lasnik 1993 and Tancredi 1992 for some of the details). The inputs of VP-ellipsis, then, are two full derivations (clauses). Then one of the VPs is “deleted”; that is, it is not spelled out phonetically. This is subject to parallelism considerations, which also may affect other PF-phenomena, like deaccenting. The least we know about what counts as parallel derivations is that all LF-operations, like QR, that apply to one of the conjuncts should apply also
to the other (though many additional considerations may play a role). Let us see, for example, how (37) is derived, under the construal of every patient with wide scope.

(38) a. Every patient_1 [a doctor_2 [e_2 will [VP examine e_1]]] and
     b. Every patient_1 [a nurse_2 [e_2 will [VP examine e_1]]] too.

Both conjuncts are derived in full, as in (38). QR has applied independently to both. The result, then, is that the two VPs are precisely identical, and the second one need not be realized phonetically, so the PF is the string in (37). If QR does not apply in precisely the same way to both conjuncts, no ellipsis is possible, as witnessed by the fact that (37) cannot have different scope construals in the first and second conjuncts. The question, now, is why the same is not true also for (35). For ellipsis to be possible under the wide-scope construal of every patient, QR should apply in both conjuncts, as in (39). For convenience, in the following examples I ignore the LF-movement of the subject argument. If QR applies freely, as in the standard view, this should be possible, and there is, again, no explanation for why this reading is impossible for the ellipsis in (35).

(39) a. Every patient_1 [a doctor will [VP examine e_1]] and
     b. Every patient_1 [Lucie will [VP examine e_1]]

(40) a. [A doctor will [VP every patient_1 [VP examine e_1]]] and
     b. [Lucie will [VP every patient_1 [VP examine e_1]]]

This is where the interpretation-dependent MLC enters the picture. The intuitive idea is that the MLC determines that the longer-link QR (outside of the VP) applies only if this is required to obtain an interpretation not available otherwise. The problem in (39) lies in the second conjunct. The movement of every patient here is longer than necessary for interpretation. In (38b) long-distance QR results in a different interpretation than that obtained if every patient is assigned scope inside the VP.
But in the case of (39b), the reading obtained by long-distance QR is equivalent to the reading obtained in (40b) with the shorter movement, so there is no interpretative need that could motivate the longer movement. This can be observed by examining the reference set for (39b), given in (41).

(41) { a. ⟨Every patient_1 [Lucie will [VP examine e_1]], For every patient x, Lucie will examine x⟩
       b. ⟨[Lucie will [VP every patient_1 [VP examine e_1]]], For every patient x, Lucie will examine x⟩ }
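The premise that raising every patient over Lucie changes nothing, whereas raising it over an existential subject does, can be checked by brute force over small models. The following Python sketch is only an illustration: the two-patient, two-doctor domain, the treatment of subjects as generalized quantifiers (functions from predicates to truth values), and all names in the code are assumptions introduced here, not part of Fox's or the present analysis.

```python
from itertools import combinations, product

# Toy domain (an assumption made for illustration only).
DOCTORS = ("d1", "d2")
PATIENTS = ("p1", "p2")

# Noun phrases as generalized quantifiers: predicate -> truth value.
def a_doctor(pred):
    return any(pred(x) for x in DOCTORS)

def lucie(pred):
    return pred("lucie")

def every_patient(pred):
    return all(pred(y) for y in PATIENTS)

def surface_scope(subject, obj, rel):
    """Subject takes scope over object, as with VP-adjoined QR."""
    return subject(lambda x: obj(lambda y: (x, y) in rel))

def inverse_scope(subject, obj, rel):
    """Object takes scope over subject, as with IP-adjoined QR."""
    return obj(lambda y: subject(lambda x: (x, y) in rel))

def all_relations(examiners):
    """Every possible 'examine' relation between examiners and patients."""
    pairs = list(product(examiners, PATIENTS))
    return (set(subset)
            for size in range(len(pairs) + 1)
            for subset in combinations(pairs, size))

def scope_shift_matters(subject, examiners):
    """True if some model distinguishes the two scope construals."""
    return any(
        surface_scope(subject, every_patient, rel)
        != inverse_scope(subject, every_patient, rel)
        for rel in all_relations(examiners)
    )

print(scope_shift_matters(a_doctor, DOCTORS))  # True
print(scope_shift_matters(lucie, ("lucie",)))  # False
```

With Lucie as subject, no model separates the two construals, so the pairs built on (39b) and (40b) share one interpretation; with a doctor, a model where each patient is examined by a different doctor already separates them.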
Since the interpretation is identical in (41a) and (41b), the reference set includes both derivations. The shorter-link derivation (41b), then, blocks the derivation (41a). Correspondingly, the only construal permitted in the second conjunct is (41b). Returning to (40), parallelism determines, therefore, that for ellipsis to be allowed, the first conjunct should have the same LF-structure, hence, only (40) is the source of ellipsis in (35). In the case of (38), repeated below, the long-distance movement of every patient in the second conjunct yields an interpretation distinct from the shorter movement (inside the VP). This is so because every patient moves across an existential quantifier (a nurse), and whether it is inside or outside the scope of this quantifier has interpretative consequences.

(38) a. Every patient_1 [a doctor_2 [e_2 will [VP examine e_1]]] and
     b. Every patient_1 [a nurse_2 [e_2 will [VP examine e_1]]] too.

Hence the reference set of (38b) will contain only this one derivation, and nothing rules it out. The same is true for the first conjunct, so parallelism allows this construal of (38). Fox shows the same pattern in several other cases, where long-distance QR cannot change the interpretation (with two universal quantifiers in the second conjunct, and with negation). In all these cases, the ambiguity of the first conjunct is lost in the ellipsis context.

As impressive as the arguments are for the interpretation-based MLC, this view also poses serious problems. First, as observed in section 1.1, the MLC has broad coverage in the minimalist program. It is assumed, for example, to also cover all instances of relativized minimality. However, unlike superiority effects and QR, none of the other movement instances governed by the MLC show any interpretation dependence. Wh-islands provide a clear instance. These may be weaker than the cases of relativized minimality with A-movement, and they may vary in unacceptability.
Thus, (42) is worse than (43) even in English. (In Hebrew, (42) is fine and (43) is out, for reasons discussed in Reinhart 1981b.) But whatever status they have, it is not affected by context or interpretative needs.

(42) *I wonder from whom you forgot what you got.

(43) ?I wonder what you forgot from whom you got.

It is not too difficult to imagine which questions would be denoted by each of these derivations, had they been allowed. It is also clear that in each case, the given derivation is the only way to express the relevant question, based on the given numeration. Still, this does not improve the
derivation. This means, then, that the status of (42) and (43), including the issue of why the first is worse, is determined by the computational system with no access to any interface considerations. The question, then, is why just the two instances of the MLC we discussed should show interface sensitivity, and more generally, what determines when a syntactic condition is sensitive to interpretation. This is a serious problem, since if we cannot define precisely the set of operations subject to interface reference-set computation, we face the danger of a vacuous theory, where all movement depends on our undefined feelings about meaning. Possibly the problem of relativized minimality can be dismissed, if it turns out that relativized minimality is not an instance of the MLC, as might indeed be the case, independently of the problem under consideration. If so, then only superiority and QR are governed by the MLC, but it is still appropriate to wonder why just this condition is sensitive to interpretation.

Next, recall that the analysis rests crucially on the earlier view of the MLC as global reference-set computation. As we saw in section 1.1, the reference-set view of the MLC was rejected with good reason. The global nature of this computation poses a serious problem for optimal design, since it is inconsistent with what is known about human processing, hence it would require bypassing the computational system by heuristic algorithms. Fox (2000) offers a reanalysis of QR that can be viewed as local rather than global computation. But it remains the case (as we saw in section 1.1) that there was never any empirical reason to assume that this kind of computation is involved in the relevant problems, and it was just a mistaken formulation of the procedure of feature checking that led to the view that wh-movement obeys the MLC.
Regarding superiority, we should note that the argument cited above for why interpretation-based reference-set computation is needed is not as strong as it may seem; in fact, it was probably mistaken. A point I overlooked in Reinhart [1994] 1998 is that the same argument does not extend to other instances of superiority violations. We saw in section 1.1 that superiority effects are worse across a clause boundary, as in (5), repeated in (44).

(44) a. Whom did Lucie persuade e [PRO to visit whom]?
     b. *Whom did Lucie persuade whom [PRO to visit e]?

(45) *Who remembers whom Lucie persuaded whom to visit e?

(24b) Who e knows [what [who bought e]]?

In cases like (44b), the violation remains the same if the derivation is embedded in a context like (45). This is precisely the same context as (24b), repeated above, which appeared to license such violations inside the clause. Example (45) has a reading that cannot be obtained with a derivation that does not violate superiority, but still, the derivation is not improved. Furthermore, even within the same clause, the generalization illustrated in (24b) has been challenged. Chomsky (1995, 387, note 69) points out that (46) is unacceptable, and suggests that perhaps (24b) only reflects “preference for association of likes.”

(46) *What determines to whom who will speak?

It appears that superiority violations inside the clause are weak to begin with, and that various factors may affect their acceptability. However, there is no systematic account in terms of truth-condition differences that can explain the full range of variations here.8 A more promising direction in investigating the superiority phenomenon is that superiority inside the clause is affected by focus and stress considerations at the PF-interface. In any case, at least for the time being, it would not be wise to base any theory on a phenomenon so poorly understood. So there is no reason to conclude that reference-set computation is involved in such cases. For all we know, superiority remains a purely syntactic condition, and it can be captured by the revised MLC (based on “attract”), with no appeal to either reference sets or interpretation.

In the case of QR, Fox’s findings are completely solid, and only strengthened by further inquiry into the facts. I will argue that these findings support the general claim that QR is subject to reference-set computation at the interface. The question is whether this computation is indeed governed by the MLC.
Note that Fox’s specific account of his findings rests on the prevailing assumption that QR is always obligatory for the interpretation of all quantified DPs, and the only question is how far an internal quantified argument can travel. This is the question that, according to Fox, is addressed by the MLC. But the underlying theoretical assumption has never, in fact, been motivated by empirical considerations. Rather, it is purely conceptual, or theory internal. In the introduction, I outlined several ways that derivations (D) can be associated with interpretations (their possible uses U). The theoretical preference in linguistics has been to already code everything needed for the interpretation in the syntax. On this view, if the logical representation of VP-internal quantifiers requires some lambda abstraction, the variables needed for the λ-operator should be available in the syntactic (LF) representation, which would be obtained by applying QR (see, e.g., Heim and Kratzer 1998). Though it is easy to see why this is convenient, it is not the only conceivable solution to this problem of association. Another possibility mentioned in the introduction is that in some instances, the set U is determined by independent properties and computations of the external systems, which apply to legible CS representations and further modify them. In this specific instance, the system of logic (inference) that accesses syntactic derivations may apply its own computations to interpret them, whether by inserting λ-predicates as in the Montague tradition, or by other means of type shifting available to logical syntax. On this view, what makes the representation legible to the inference system is the lexical semantic properties of the DPs (including their semantic definitions), but the rest of the semantic computation is carried out at that system, not at the CS. Since the debate between these two possibilities is conceptual rather than empirical, we may as well choose the one that renders the CS itself more efficient. For instance, if the MLC is needed only for such unverified instances of covert movement, the alternative that this does not happen in the syntax should be seriously considered. The position I defend in chapter 2 is that there is no covert movement just for the interpretation of quantifiers. The only situation where further covert movement must be assumed is when the scope of a given quantified DP is not identical to its scope at the overt syntactic structure. If we adopt this view, there is no reason to assume that the MLC is involved here. Nevertheless, scope-shifting of this type applies only when needed for interface purposes, that is, to obtain an interpretation that is otherwise unobtainable.
In the literature surveyed above, these types of considerations were labeled “interface economy”: economy considerations that allow a certain operation to apply only if it is required by the interface. An alternative way to capture interface economy was proposed by Chomsky (1995). Its essence is building the relevant interface considerations of QR into the numeration. Chomsky assumes (not just for this problem) that any item “enters the numeration only if it has an effect on output” (economy principle (76)). He appears to assume, further, that QR needs to apply only to capture scope-shift, as described above (which reflects a change in his earlier position). Suppose it is a movement of some feature like QUANT. Some functional feature must, then, be included in the numeration to host this feature, and it will eventually be merged in a topmost IP position. This functional feature will be allowed into the numeration only if it has an effect on the output, that is, if the interpretation obtained is not identical to what will be obtained without scope-shift.
Fox’s insight is captured in this framework just the same: the relevant QUANT projection can be inserted into the numeration only in cases where Fox’s analysis allowed long-distance QR to apply. On this view, then, interface needs determine the shape of the numeration; the underlying intuition may be that it is at the stage of choosing the building blocks for the derivation that speakers select items according to what they want to say. (Theoretically, this line of thought resembles the earlier position that all aspects of meaning are determined in deep structure.) Under this view, it appears that no reference-set computation is involved in QR, which is an advantage in terms of optimal design.9 This move still raises some conceptual questions, but the question I am more concerned with here is that of psychological reality. A crucial implication of this view is that once the relevant QUANT feature is selected into the numeration, the QR-operation is motivated by convergence, just like any other operation. Thus, scope-shift obtained by QR ends up indistinguishable in status from any other syntactic operation. In practice, however, it was found that scope-shift derivations are harder to process and less common in discourse than overt scope. (I elaborate on this point in chapter 2.) No such complexities are found in standard cases of syntactic movement. There would be no obvious way to explain this difference under the view that QR is indistinguishable from any other movement operation. Furthermore, if there are, as I will argue, other instances of interface economy with the same properties, they will all have to be encoded into the computational system in the same way. Feature coding is, in fact, what guarantees that this solution is fully explicit and restrictive. There are, however, cases where syntactic encoding is more problematic than it is for QR (though of course this may always be possible, at a serious theoretical cost).
So some alternative account is needed anyway. The line I will pursue is that QR, and other instances of interface economy, indeed involve reference-set computation of the type examined above; that is, the reference set consists of pairs ⟨d, i⟩ of derivation and interpretation. A given ⟨d, i⟩ pair is blocked if the same interface effect could be obtained more economically, in other words, if there is a better ⟨d, i⟩ competitor in the reference set. However, this is not governed by the MLC, nor by feature encoding in the computational system. Reference-set computation, though available to the CS, is a “last-resort” procedure enforced at the interface in a restricted set of cases to be defined. It is not enforced by the needs of the syntactic derivation, but by some deficiency of the outputs of the system at the interface.
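The blocking condition just stated can be sketched schematically. The Python fragment below is a minimal illustration, assuming that each member of the reference set is a ⟨d, i⟩ pair and that "more economical" can be stood in for by a count of illicit operations. That cost measure, the labels used for derivations and interpretations, and the names `Pair` and `is_blocked` are assumptions made for the example, not definitions from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pair:
    derivation: str      # label for the derivation d (hypothetical)
    interpretation: str  # label for the interpretation i (hypothetical)
    illicit_ops: int     # assumed cost: number of illicit operations applied

def is_blocked(candidate, reference_set):
    """A pair is blocked if some competitor yields the same interpretation
    at a lower cost."""
    return any(
        other.interpretation == candidate.interpretation
        and other.illicit_ops < candidate.illicit_ops
        for other in reference_set
    )

# Toy reference set: one overt-scope derivation and two derivations with QR.
overt = Pair("overt scope", "reading-1", 0)
qr_vacuous = Pair("QR applied", "reading-1", 1)  # same reading as overt scope
qr_needed = Pair("QR applied", "reading-2", 1)   # reading unavailable otherwise
reference_set = [overt, qr_vacuous, qr_needed]

print(is_blocked(qr_vacuous, reference_set))  # True
print(is_blocked(qr_needed, reference_set))   # False
```

The point of the sketch is only that blocking is relative to the set: the same operation is excluded when a cheaper competitor delivers the same interpretation and admitted when no such competitor exists.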
1.3 The Interface Strategy: Repair of Imperfections
I know of only four instances where there is substantial evidence for assuming that reference-set computation is at work at the interface: QR, already discussed; stress-shift for the purpose of focus construal; the coreference strategy of Reinhart 1983a (binding conditions B, C); and the computation of scalar implicatures. I will discuss the first three instances in detail in chapters 2–4, and scalar implicatures more briefly in section 5.3. But first it may be appropriate to ask what they might have in common, or when reference-set computation must apply.
In Reinhart 1995, I suggested that reference-set computation is involved when an uneconomical procedure is needed in order to adjust a derivation for use at the interface. So this computation is triggered only by the application of such uneconomical procedures. The first question, then, is what sense of economy is involved here, specifically, what counts as a noneconomical way to satisfy an interface need (in other words, what is the metric, in terms of Optimality Theory). In the case of QR, I believed at the time that an answer could be drawn from Chomsky’s economy principle (1), repeated in (47), which we examined briefly in section 1.1.
(47) “If a derivation D converges without application of some operation, then that application is disallowed” (Chomsky 1992, 47).
Principle (47) poses a severe restriction on the computational system: in each given derivation, the system is allowed to apply an operation (from the available inventory of operations) only if applying this operation is needed for convergence, which was implemented as feature checking. If true, then the computational system is a most efficient or economical system, with no superfluous steps in its derivations. It is obvious that QR is not an operation needed for convergence in the strict sense of checking syntactic features.
(As we saw, it is possible to create a feature for the occasion; however, this move is not motivated by syntactic needs of the derivation, but by interface needs.) Recall that we are assuming that QR applies only for scope-shift, and it is not otherwise needed for the interpretation of quantifiers. Applying QR at a given derivation means, then, that we select a move operation from the available inventory, even though it is not needed for convergence—that is, we violate principle (47). It is at this stage of violating a basic economy principle of the computational system that a reference set should be consulted to verify that this indeed is the only way to
meet the interface needs. So it would be approved only if scope-shift has an effect on the interpretation, as Fox (1995) showed.
For this line of reasoning to also apply to the other instances of reference-set computation at the interface, it would be necessary to extend principle (47) so it covers any superfluous operation, not just syntactic movement. Thus, for a derivation to meet the PF-interface, it needs to have main stress. Assignment of this stress, then, is not superfluous. However, stress-shift is, so it has to be checked against a reference set. In the case of coreference and implicatures, what I assumed to be at stake is applying a superfluous interpretative procedure.
We should note, however, that the status of (47) is not, in fact, fully clear when overt syntactic operations are concerned. Originally, it represented a theoretical hope that there is no optional movement in the derivation of sentences, and all applications of movement can be reduced to feature checking (convergence). In practice, however, we do find across languages many instances of operations that seem to apply optionally in terms of syntactic convergence, like scrambling, topicalization, PP-preposing, and a variety of “stylistic-movement” options, which change word order. The assumption that the computational system abides by (47) has led to an industry of analyses attempting to show that each instance of such movement is either motivated by syntactic features and the corresponding functional projections to host them, or involves interface economy, namely, reference-set computation at the interface. The cost to the computational system, if all these analyses are correct, is much higher than if we assume that optional movement exists.
In some instances, work within the framework of interface economy has entailed ranking optional operations in terms of their “cost.” For example, is it more costly to apply word-order shift by an optional syntactic operation, or to apply optional stress-shift? Thus, the theory of the interface is in danger of becoming an unconstrained version of Optimality Theory: if a system includes ranking of operations, which may vary across languages, its expressive power (the set of possible languages it generates) is greater than that of a system with no such ranking. Such a system is bound to allow many more options than are actually found in natural language. Even if we do not enter the realm of ranking, a point I will keep returning to is that reference-set computation is costly in terms of processing, even if it applies at the interface. If all, or many, of the instances of optional movement involve such computation, language cannot be very optimally designed. I have not yet discussed what counts as evidence that computation of this nature is indeed involved in a given
instance. Once this is defined, we will also be able to observe that there is no empirical evidence for computation like this in most instances of optional movement.
In section 2.7, I will argue that something like (47) may still be needed for covert movement (i.e., movement after the phonetic spell-out). A computational system allowing unrestricted covert movement is in danger of not being optimally usable at the interface, since each phonetic string may allow many possible interpretations, depending on which covert operations took place. Allowing this to apply just for purposes of convergence makes covert operations fully recoverable, thus restricting the set of interpretative options. But for overt movement, there is no such danger, since all operations are overtly visible. A reasonable alternative is to assume that optional overt movement simply exists; that is, the computational system allows it.
Once optional movement is available, it would make sense for speakers to use this option to improve and refine the context interface. For instance, the well-established tendency to place topic material in sentence-initial position may make use of fronting operations. If these are optional, there is no need to assume that reference-set computation is involved in the choice to apply fronting. As argued in Reinhart 1995, part IV, topic considerations indeed make no use of this kind of computation. This does not mean, though, that in practice all word-order options available are functional at that interface; there may be a certain amount of arbitrariness. But even if all the options are used at the interface, this does not guarantee that we have, as linguists, the theoretical tools to define all these uses at present. The context interface is the hardest to formulate, so we may have to live with a certain lack of clarity regarding this question for a while.
But we are still left with the question of which interface considerations do enforce reference-set computation.
It cannot simply be the need to apply a superfluous operation, because as we just saw, there may be many innocent superfluous operations. The intuition that (47) enabled us to state is that applying the operations in question violates some principle that prohibits their normal application (even if (47) is not itself the relevant principle in all cases). As I have mentioned, in the case of QR, the principle violated may still be (47), if it is formulated to apply to covert movement only. But the other instances I will examine do not involve covert movement. The question of what principle is violated may vary with the operations applied, and I will get back to this question in subsequent chapters, where
I examine specific instances of reference-set computation. For now, let us call the operations resulting in reference-set computation illicit operations, in the sense that their application violates some prohibiting principle. Since the reference-set type of strategy applies to rule out illicit operations when not required by interface needs, the type of operation itself is not determined by this strategy, and instances can be found in unrelated modules.
My basic assumption, then, is that the reference-set type of strategy at the interface is a kind of repair mechanism, activated when the outputs of the computational system fail to meet an interface need. In other words, it is invoked when there is an imperfection in the system. Recall from the introduction that the basic requirement of the computational system is that it should enable the interface. In an optimally designed system, the bare minimum needed for convergence should also be sufficient to satisfy the interface conditions. Instances where this fails to be the case may be viewed as imperfections in the system. (Note that I am talking here about operations needed for convergence and not about their other applications. An operation that in one context applies obligatorily for convergence may apply optionally in another, where not required for convergence.)
Let me illustrate this notion of imperfection with a preview of the focus problem, to be discussed in chapter 3. A basic requirement of the context interface is that sentences be associated with a focus (or foci). The question is how the computational system guarantees the identification and marking of the focus constituent. An independent requirement of the PF-interface is that each sentence carry some main stress, which is necessary for pronunciation. Let us now imagine a perfect computational system.
In that system, the obligatory assignment of main stress to the derivation would also be sufficient for the association with focus, in that it would provide the marking of the focus constituent. In fact, such a view of the perfect focus assignment was proposed in Chomsky 1971. Let us see how this approach works. Assuming that the main-stress rule applies independently, the simple rule in (48) selects a set of possible foci for each derivation.
(48) The focus of a given derivation is any constituent containing the main stress of IP.
(49) a. My neighbor is building a desk.
b. [DP a desk]
c. [VP building a desk]
d. [IP My neighbor is building a desk]
Suppose that in (49a) main stress falls on a desk. (Main stress is marked by means of boldface throughout.) All the constituents in (49b–d) contain this main stress. Hence, (48) determines that any of them can serve as a focus. We may refer to (49b–d) as the focus set of (49a). At the context interface, one member of the focus set is selected as the actual focus of the sentence. Sentence (49a), repeated below, can be used as an answer in any of the contexts in (50), with the italicized F-bracketed constituent as focus.
(49) a. My neighbor is building a desk.
(50) a. Speaker A: What’s your neighbor building?
Speaker B: My neighbor is building [F a desk].
b. Speaker A: What’s your neighbor doing these days?
Speaker B: My neighbor [F is building a desk].
c. Speaker A: What’s this noise?
Speaker B: [F My neighbor is building a desk].
At this stage, it is up to the discourse conditions, rather than the computational system, to determine the relevant focus to be selected in a given context. If the foci defined by (48) were sufficient for the use of sentence (49a) in all possible contexts, we could conclude that we have a perfect system. So far we have only applied the stress operation needed anyway for phonetic convergence, and a general interface rule (48) links all derivations to appropriate contexts.
The actual human computational system, however, is not that perfect. We can easily find contexts where we would want to use derivation (49a), but none of the foci associated with it fit the given context. For example, (49a), with the same main stress indicated with boldface, cannot be used as an answer in either of the contexts of (51). This is so, because the context requires the F-bracketed constituents in (51) to be the foci, but the focus set defined for this derivation by (48) does not include these constituents. (The # symbol indicates, throughout, inappropriateness to context.)
(51) a. Speaker A: Has your neighbor bought a desk already?
Speaker B: #No, my neighbor is [F building] a desk.
b. Speaker A: Who is building a desk?
Speaker B: #[F My neighbor] is building a desk.
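Rule (48) is simple enough to be stated as a procedure. The Python sketch below is a rough illustration, assuming a deliberately simplified constituent structure for (49a) (a flat IP with no intermediate projections, and DPs with no internal labeled structure) and assuming that main stress can be identified by a word that occurs only once in the sentence. The tree encoding and the names `yield_of` and `focus_set` are hypothetical conveniences, not part of the analysis.

```python
def yield_of(node):
    """Return the list of words dominated by a node.
    A node is either a word (str) or a pair (label, children)."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    words = []
    for child in children:
        words.extend(yield_of(child))
    return words

def focus_set(tree, stressed_word):
    """Rule (48): collect every labeled constituent containing the main stress."""
    found = []

    def visit(node):
        if isinstance(node, str):
            return
        label, children = node
        words = yield_of(node)
        if stressed_word in words:
            found.append((label, " ".join(words)))
        for child in children:
            visit(child)

    visit(tree)
    return found

# Simplified structure assumed for (49a), with main stress on 'desk'.
tree = ("IP", [
    ("DP", ["My", "neighbor"]),
    ("I", ["is"]),
    ("VP", [("V", ["building"]), ("DP", ["a", "desk"])]),
])

for label, words in focus_set(tree, "desk"):
    print(label, "|", words)
# IP | My neighbor is building a desk
# VP | building a desk
# DP | a desk
```

Under these assumptions the procedure returns exactly the constituents in (49b–d). Neither the verb alone nor the subject DP is returned, which corresponds to the inappropriateness of the answers in (51).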
This means, then, that our computational system contains an imperfection. The stress operation needed for PF-convergence is not sufficient to
meet all the needs of the context interface. Thus the question is what to do when facing an imperfection in the system. Note that the problem of QR is essentially of the same type. In a perfect system, the overt structure associated with a derivation would be sufficient to capture all its scope construals in different contexts. In practice, this is not the case, and the context may require a construal not generated by the computational system (without an illicit covert operation).
Indeed, there is also a certain resemblance in the history of how quantifier scope and focus have been conceptualized in theoretical linguistics. As we just observed, at the earlier stages (Chomsky 1971), focus was essentially viewed as a property defined in terms of PF-structures. This approach rested on the notion of “normal” or “neutral” intonation, namely, in present terminology, the assumption that there is an independent stress operation needed for PF-convergence. For the cases of imperfections, where this stress operation is not sufficient for the interface, Chomsky (1971, 199) argued that “special . . . processes of a poorly understood sort may apply in the generation of sentences, marking certain items as bearing specific expressive or contrastive features that will shift the intonation center.” A distinction is implicit here between neutral stress and marked stress, obtained by applying special required operations.
In Keenan and Faltz 1978 and Reinhart 1983a, the same was assumed for the scope of quantifiers: scope is determined by the syntactic configuration of the overt structure. A rule like QR is used only when it is necessary to derive scope construal wider than the overt c-command domain, and it is viewed there as a marked, discourse-driven, operation. On this view, overt scope is always the preferred option, with one systematic exception in the case of internal NP-scope, noted in Reinhart 1976, who argued that these cases require an independent analysis.
(I return to these questions in section 2.7.) However, the concept of markedness was problematic. It appears easy to find examples of covertly determined wide scope that sound perfectly natural. (For instance, as Hirschbühler (1982) noted, in a sentence like An American flag was hanging in front of every building, the most natural construal is with wide scope for every building.) If it can at times be as easy to get the marked derivation as the unmarked one, it is not clear what empirical content the concept of markedness could have. Similarly, the distinction between marked and neutral stress has also been challenged. As an argument against the Nuclear Stress Rule (NSR) or Chomsky’s (1971) focus analysis, it was repeatedly pointed out that in
the appropriate context, main stress can fall anywhere, with effects hardly distinguishable from that of the neutral stress. (For an overview, see Selkirk 1984.) The crucial problem here as well is whether any content can be given to the concept of markedness. If there is no obvious way to distinguish neutral and marked stress, we run into the danger of vacuity: having a theory that excludes nothing regarding stress. The facts that follow from its rules are labeled “neutral,” and everything else, “marked.” (This type of theory is always true, regardless of what its rules are, by virtue of being unfalsifiable.)
A more realistic conclusion appeared to be that there is no sentence-level generalization governing the selection of possible foci, and any expression can be a focus, subject only to discourse appropriateness. Hence, it was concluded that main stress cannot be assigned at PF independently of the semantics of the sentence, and it must be the other way around: sentence intonation reflects its independently determined focus structure.
The prevailing solution since Chomsky 1976, where LF-movement was introduced, has been that both scope and focus are identified at the covert structure: LF. A focus constituent has been marked by a focus feature, and the marked constituent moved at LF. Thus, covert “focus movement” has been assumed to be obligatory for every derivation. QR has been assumed to be obligatory in all derivations with quantified constituents. Thus, the problem of markedness has been avoided.
But this solution is problematic as well. First, while focus movement does eliminate the problem of markedness, the relations between stress and structure become a complex issue, raising questions about the visibility of the covert structure to PF-rules (stress). More generally, this solution placed much of the burden of capturing the interface requirements on the covert structures. I have already noted a problem with this approach and will return to it later.
Generally, the more information that is captured covertly, the more mysterious it is that speakers are able to understand each other. Admitting an imperfection in the system, we may still wonder whether it must be as sweeping as entailed by this analysis—for example, that the derivation’s main stress is uniformly determined at the covert structure. Furthermore, we may note that this massive imperfection still does not take us very far toward capturing the actual interface conditions. Though no satisfactory content could be given to the notion of markedness, in practice it is not the case that covert quantifier scope is always as free and easy to get as overt scope, and certainly not that the so-called marked stress is completely free. Introducing the machinery of covert movement
is thus just the first step in formulating the question of when it can actually be used. Answering this question will require introducing more conditions and rules (more imperfections). One may wonder whether it is not possible to start directly by answering the second question, skipping the massive imperfection we introduced just to formulate it.
In an influential work, Cinque (1993) offered a new perspective on the NSR and argued that the earlier view of the relations between stress and focus can be maintained. This direction is pursued here in chapter 3. We should note, however, that the analysis is based on a revival of the distinction between neutral and marked stress: when the stress assigned by the NSR is not appropriate to the context, a special stress-shift operation applies, yielding marked stress. So the question “How do we know it is marked?” is relevant again.
In Reinhart 1995, 1998, I argued that it is a mistake to search for evidence of markedness in the realm of direct intuitions. A marked derivation is a derivation that involves an illicit operation, as defined above. (Both QR and stress-shift are viewed here as illicit operations.) When this is done with no reason, the result is visibly awkward. But if using the illicit operation is unquestionably the only way to satisfy a certain interface need, the result sounds perfectly fine, and it is only indirectly that we can see that it is marked nevertheless. As we observed in the case of QR, Fox (1995, 2000) provides ellipsis evidence consistent with the claim that QR does not take place when not needed for interpretation. The evidence for the illicit status of the stress-shift operation will be discussed in chapter 3. In more precise terms, what is claimed in the last paragraph is that computing QR and stress-shift involves constructing a reference set and checking whether it contains a better ⟨d, i⟩ pair, that is, a pair derived without applying the illicit operation.
If it does, the derivation is blocked (in other words, if we nevertheless produce it, it is visibly marked).
Thus, to conclude the question we started with in this section, reference-set strategies are “last-resort” strategies used to repair or make up for imperfections in the computational system. They are used when the need arises to apply an illicit operation in order to adjust a derivation to the interface needs. That illicit operations need to apply at all remains an imperfection in the core system. However, this imperfection is much less serious than we previously assumed. First, PF-procedures, like stress, operate, as they should, on the overt structure. Next, the illicit QR and stress-shift cannot apply just anywhere but are restricted by reference-set checking. On the
other hand, as we saw, global reference-set computation has a serious processing cost, which is problematic for the secondary requirement of meeting the empirical conditions of use—processing and acquisition. Here, too, the problem is far less massive than that demonstrated by the Minimal Link Condition, since reference-set computation is triggered only if an illicit operation applies. Nevertheless, in these restricted cases, we do have a deviation from optimal design. The strongest interpretation of the concept of imperfection is that if we have to admit it into our theory, there should also be some way to observe the imperfection in the use of language itself, say in the processing of sentences. I will argue that this is indeed the case when reference-set computation needs to apply to repair an imperfection—it comes with an observable processing cost. Reference-set computation imposes a greater load on working memory than local computation does. Adults can apparently cope with this load (with limitations on the size of the reference set, as I will argue in section 2.7), but there is reason to believe that this load is too big for children, whose working memory is not yet as developed. Grodzinsky and Reinhart (1993) argue that the (relatively rare) chance pattern found in the acquisition of coreference (Condition B, or their Rule I) indicates guess performance. The reason is that the relevant coreference strategy involves reference-set computation, and children are unable to execute the computation, which, as they know innately, is required for this task. In chapter 5, I will provide further evidence for this claim in the area of coreference, and argue that there is growing evidence that the same pattern is found in the other instances of reference-set computation. If true, then acquisition findings also provide the most direct confirmation that reference-set computation is indeed involved in the relevant cases. 
This enables us to form a strong and strictly falsifiable hypothesis that if it is independently established that a certain interface problem requires global reference-set computation, we should also find out that children are unable to process and solve this problem. This puts a severe restriction on our theoretical freedom to postulate reference-set computation anywhere, as in Optimality Theory. In conclusion, we should keep in mind that the reference-set strategy governing the application of illicit operations is just one of the interface strategies, and as I have pointed out, it is only if evidence for the computational complexity is found for some linguistic instance that we can conclude that it might fall under this type. Operations for which no such evidence can be shown must belong elsewhere. One option, noted
in section 1.2, is that they are directly encoded in the computational system, say, as optional features whose selection is governed by the interface requirement on the numeration, as suggested in Chomsky 1995. Or they may be governed by different context-adjustment strategies that apply at the interface and that do not involve reference-set computation. In Reinhart 2004, I discuss strategies of assessment and retrieval from discourse storage that govern the identification of topics and certain types of discourse presuppositions. As in the case of focus, I do not see a need to assume that the topic constituent is marked with a feature at the CS. However, the strategy governing its identification involves no comparison of derivations. Similarly, assessing the accessibility hierarchy that governs discourse anaphora resolution in Ariel’s (1990) analysis involves no such comparisons.
Chapter 2 Scope-Shift
The operation of Quantifier Raising (QR), and the level of LF it was assumed to generate, were introduced to linguistic theory in the mid-1970s in Chomsky 1976, followed by May 1977.1 This development was not immediately accepted by all. For example, in Reinhart 1976, I argued that there is insufficient evidence for the introduction of covert scope-shift operations, and that the relative scope of quantifiers is determined only by the overt c-command relations of the quantifiers. In retrospect, it is easy to see why this was debated. As noted in chapter 1, allowing unrestricted covert movement into the computational system risks making the system substantially harder to use at the interface, because it increases the number of possible interpretations associated with each PF. The more arbitrary syntactic operations that could take place invisibly, the more mysterious it would be how speakers using language could be sure what others are saying.
However, the arguments in Reinhart 1976 were empirical. I argued, first, that in the case of universal quantifiers, scope-shift construals (construals where the scope of the universal quantifier is not identical to its overt c-command scope) are extremely hard to get, and next, that in the case of existential quantifiers, where it appears very easy to obtain wide scope outside the overt c-command domain, the apparent wide scope is reducible to vagueness, and, thus, no scope-shift is involved. Both arguments were refuted over the years, which is among the reasons I switched in Reinhart 1983a to a view accepting QR as a marked and restricted operation. In any case, it is obvious by now that some mechanism of scope-shift is available in the computational system. In the terms presented in chapter 1, this means that the system contains an imperfection in this area: the scope construals generated by the obligatory operations of the computational system are not sufficient to capture the interface needs.
The remaining question is how sweeping this imperfection is.
The picture of QR that emerged in the 1970s was relatively constrained. QR was believed to be just an instance of the overt operation Move α, subject to the standard constraints on this movement, such as island constraints. However, as more empirical findings accumulated in the 1980s, the picture got substantially more complex. On the one hand, it was assumed that QR is obligatory and applies to quantified NPs regardless of scope-shift, in order to obtain an interpretation for quantification. On the other hand, the operation itself became pretty idiosyncratic. It was assumed that indefinite-existential DPs can move freely without obeying island constraints. Universally quantified DPs, by contrast, can move only within their clause. Neither of these movements can be viewed, then, as just an instance of the overt Move operation.
We will see, first, that the complexities introduced in the 1980s are not, in fact, justified. In the case of existential DPs, their apparent freedom of scope is captured without any covert movement, by the mechanism of choice functions discussed in section 2.6. The residue of QR can be viewed as a standard instance of syntactic movement, obeying all known restrictions, and it only applies when scope-shift is required at the interface. But this residue of QR is still an imperfection in the system, which should be treated along the lines we observed in section 1.3. In section 2.7, I examine further evidence that QR requires the costly reference-set computation.
Sections 2.2–2.6 appear here as published in Reinhart 1997. Hence, the presentation does not cover some more recent developments. Most notably, the findings of Merchant 2001 cast doubts on my assumptions regarding sluicing, which are heavily based on Chung, Ladusaw, and McCloskey 1994, specifically that sluicing is restricted to indefinites. Possibly sluicing is not directly relevant to the analysis of existential wide scope and wh–in situ, which is the major theme in these sections.
Nevertheless, these sections provide the background necessary for the analysis of the interface strategy that governs scope-shift.
2.1 Quantifier Scope: The State of the Art
2.1.1 The Optimistic QR View of the 1970s
One of the strongest arguments for the introduction of QR in the syntactic framework was a correlation observed in the 1970s between the options of covert wide quantifier scope and those of overt wh-movement. This correlation was first noted by Rodman (1976), who stated his findings as a descriptive generalization (mainly about scope out of relative clauses). Chomsky (1975, 105) argued that “the quantificational property
that Rodman noted is a special case of a much more general principle . . . namely that all transformational rules are restricted to adjacent cyclic nodes.” (Later, that principle became known as “subjacency.”)2 Although not much seems left, currently, of the optimism that surrounded this generalization, let me illustrate it nevertheless with a case where it happens to be true, as a first step in my attempt to restore that early optimism. In the sentences of (1), the quantified NP every new patient can take scope over the whole matrix clause; that is, the choice of doctor can vary with the choice of patient. This correlates with the fact that an extraction of a wh-constituent is possible from the same positions in (2). In (4), by contrast, island constraints on movement prevent wh-movement. Correspondingly, the sentences in (3) do not allow every new patient to be interpreted with wide scope (over a doctor).
(1) a. A doctor will interview every new patient.
b. A doctor will try to assist every new patient personally.
c. A doctor will make sure that we give every new patient a tranquilizer.
(2) a. Which patients will a doctor interview e?
b. Which patients will a doctor try to assist e personally?
c. Which patients will a doctor make sure that we give e a tranquilizer?
(3) a. A doctor will examine the possibility that we give every new patient a tranquilizer.
b. A doctor should worry if we sedate every new patient.
(4) a. *Which patients will a doctor examine the possibility that we give e a tranquilizer?
b. *Which patients should a doctor worry if we sedate e?
In examples like these, the correlation appears complete up to the finer grains. Thus, while wh-movement is possible out of an embedded tensed clause, as in (2c), it is more difficult (and, thus, more context dependent) than extraction out of an infinitival clause, as in (2b).
Correspondingly, it was widely observed that it is easier for a quantified NP to take wide scope outside its clause when the clause is not tensed. If true, this correlation speaks strongly in favor of capturing scope by covert syntactic movement: the scope of a quantifier is always determined by its syntactic position, but this position need not always be that in which it is realized phonetically.3 As I mentioned in chapter 1, there is
also danger in allowing covert syntactic operations. If arbitrary syntactic operations could take place invisibly, it would be a mystery how speakers using language can understand each other. This was among the reasons the enthusiasm for QR was not shared by all in the 1970s. For example, my view in Reinhart 1976 (which I no longer hold) was that there is insufficient evidence for the introduction of such dangerous covert operations. But if the correlation observed in (1)–(4) is true, this danger is under control. The set of operations allowed invisibly is precisely the same set that also applies overtly, so the possible covert derivations from a given phonetic realization of a sentence can be computed easily.4 Furthermore, if such a correlation exists, it is hard to explain it, in any nonstipulative way, in frameworks capturing all scope construals in situ, such as quantifier storage.5
However, the scope picture has turned out to be much less neat than the facts in (1)–(4) suggest. Already in the 1970s it was observed that quantifiers are not all alike in their options for covert scope. In the case of strong (universal) quantifiers, Ioup (1975) argued that the availability of covert wide scope varies so dramatically with the choice of determiner and with the context that it is not clear that one neat generalization can be maintained. (A similar pessimistic conclusion was reached in Szabolcsi 1995.) Thus, while each appears to behave pretty much as predicted by QR, every is much more restricted. Many (e.g., Farkas 1981) have argued that most strong quantifiers are actually restricted to having scope only in their clause. Still, patterns exemplified in (1) exist, and no systematic account is available as to why in some contexts it is easy to get such a pattern and in others, virtually impossible. This is not necessarily evidence against the original QR-view, since it is possible that further contextual considerations that we do not understand yet affect the ease of applying QR.
The devastating problem is that existentially quantified NPs go in precisely the opposite direction, showing massive and systematic violations of syntactic restrictions on movement.
2.1.2 The Syntactic Freedom of Existential Wide Scope
It has been widely acknowledged that many indefinite NPs appear to show scope-freedom that defies anything we know about disciplined syntactic behavior. These are all weak NPs, in the sense of Milsark 1974 and Barwise and Cooper 1981, or ‘‘existential’’ NPs, as defined in Keenan 1987. They include indefinite singular NPs, bare cardinal plurals (including many), and wh-NPs. I will refer to them for the time being as existential or indefinite NPs, and I will return to the question of what exactly the relevant set consists of in section 2.6.4.
Typically, existential NPs are indifferent to islands. The facts themselves have been known for a while, and even encoded in the syntax of QR, as we will see. But in the 1990s, it was noted that there are other serious problems lurking behind these facts. This was found, independently, in three different areas. In the area of quantifier scope, Ruys (1992) and Abusch (1994) show that the existing analyses do not always capture correctly the scope of existentials. In the case of questions, Reinhart (1992) argues that instances of wh–in situ are plainly uninterpretable in any of the available LF-analyses. Chung, Ladusaw, and McCloskey (1994) show that under all current LF-views, sluicing is an enormous mystery. Let me first illustrate the syntactic freedom of existentials in these three areas, and the problems it presents will unfold gradually. We may use (5) as a comparison basis for the difference in the scope options of strong quantifiers and the relevant existential ones. The (underlined) strong quantified NPs in these examples cannot have the higher existential in their scope. In terms of QR, this means that they cannot be extracted out of the syntactic island.
(5) a. Someone reported that Max and all the ladies had disappeared.
    b. Someone will be offended if we don’t invite most philosophers.
    c. Many students believe anything that every teacher says.
But if an existential occurs in the same position, as in (6), it appears to have no problem taking scope over the whole sentence. (The choice of ladies, philosophers, or teachers in these examples may be independent of that of the strong quantifier in whose domain they appear overtly.)
(6) a. Everyone reported that Max and some lady had disappeared.
    b. Most guests will be offended if we don’t invite some philosopher.
    c. All students believe anything that many teachers say.
In the cases of relative quantifier scope, as in (6), the sentences are ambiguous and the judgment that the existential can have wide scope rests on intuitions regarding possible meanings of the sentence. Hence the facts in this area have been subject to many subtle considerations and debates, only some of which I will be able to mention here. For this reason, it is important to also consider wh–in situ and sluicing where no ambiguity is involved. While in their semantics, instances of wh–in situ are standard existentials (a point I will return to), they enable us to examine the scope problem in a syntactic rather than just a semantic way. In this case, scope hypotheses can be directly tested (by the set of possible answers), and scope judgments usually rest on syntactic intuitions of well-formedness,
which are much clearer than semantic intuitions regarding possible interpretations. Indeed, the substantial push for QR (and for LF-theory) came, historically, from the findings of Huang (1982), who provided for the first time some content to the claim that movement is involved in assigning scope to wh–in situ. I will return to Huang’s arguments in section 2.2, but the relevant point here is that at the same time, Huang pointed out that this assignment is not sensitive to islands. The scope of the italicized wh–in situ in (7) is marked by the position of the top who, and thus it must be the whole sentence, even though the instances of wh–in situ are generated in the same island positions, as before.
(7) a. Who reported that Max and which lady had disappeared?
    b. Who will be offended if we don’t invite which philosopher?
    c. Who believes anything that who says?
Turning to sluicing, illustrated in (8), the second conjunct in these structures has a wh-NP as the ‘‘remnant’’ of ellipsis. The first conjunct contains an NP corresponding to that remnant (‘‘correspondent’’). The correspondent-NP can only be an existential one. So, this is a good test case for existential distribution. The default assumption regarding their analysis is that they must involve some operation in the first (antecedent) clause. Following the spirit of the standard analysis of ellipsis (Sag 1976; Williams 1977; Pesetsky 1982), an LF-predicate has to be formed in this clause; it is obtained by applying QR to the correlate (someone), as in (8b).
(8) a. They invited someone, but I forgot who.
    b. Someone_i [they invited e_i], but I forgot who.
    c. Someone_i [they invited e_i], but who_j [they invited e_j].
As for the second (sluiced) conjunct, two approaches are available: either it is generated with an empty IP, into which this LF-predicate is copied, or, under a deletion analysis, it is generated as a full sentence in which wh-movement applies.
The second conjunct, under this analysis, looks as in (8c). The predicates in (8c) meet the identity or parallelism requirement. So the second conjunct can either be copied at LF (as proposed by Sag), or deleted at PF along the lines proposed by Chomsky (1995). On the second view, the interpretation is based on the full derivation (8c), but since the predicate in the second conjunct is identical to the first, it is simply not pronounced. Under both approaches, in any case, QR must apply to the existential in the first conjunct. Hence, sluicing is another case where the scope of the existential can be directly witnessed: if it cannot be extracted in the first conjunct, the second conjunct could not be interpreted—that is,
the derivation should be ill-formed. However, as Chung, Ladusaw, and McCloskey (1994) point out, Ross (1969) has already observed that whatever operation is involved here (prior to deletion), it violates all island constraints, as illustrated in (9), for the three sentence structures we have been considering. The correspondent of the wh-remnant is italicized in each case.
(9) a. Max and some lady disappeared, but I can’t remember which lady [ ].
    b. If a certain linguist shows up, we are supposed to be particularly polite, but do you remember who [ ]?
    c. Max will believe anything that someone will tell him, and you can easily guess who [ ].
2.1.3 Can the Problem with Existentials Be Explained Away?
Summarizing what we have seen so far, then, the original appeal of the QR-analysis seems completely lost. First, the hope that scope is a unified phenomenon, reducible to syntax, is shattered by the fact that existential and strong quantifiers have completely different scope patterns. Next, if existentials do not obey island constraints, the type of movement required for scope is not reducible to standard syntax. In view of the gravity of the problem, let us look at some of the attempts made to explain it away, just to find out that it is, indeed, a real problem. Interestingly, it is precisely in the case of existentials that it has appeared least obvious that there is any real semantic problem of scope. A question debated in the 1970s, at the dawn of QR, was whether it is true at all that the sentences of (6) are ambiguous, or whether it is indeed so obvious that wide scope should be encoded in any representation of these sentences. One line of thought that was entertained then was that, in fact, to capture correctly the semantics of such sentences, it is sufficient to construe them with narrow scope of the existential. This is so since the (nonrepresented) wide scope entails the narrow-scope representation.
That is, one of the situations that will render the construal of the existential with narrow scope true is the situation in which its construal with wide scope is true. This was the approach followed in Reinhart 1976 and Cooper 1979, for example. (An extensive survey of the debate on this issue can be found in Ruys 1992, chap. 1.) To illustrate the argument from Reinhart 1976, consider (10). Suppose that the scope representation our syntax allows for it is only (11a). To show that this is not sufficient, and that the scope representation in (11b) should also be derived for the sentence, we have to show that a possible
use of the sentence is disallowed without this addition. The obvious way to show that is to find a context (model) in which (10) construed as (11b) is true, but (10) construed as (11a) is false.
(10) Every tourist read some guidebook.
(11) a. (Every tourist x (some guidebook y (x read y))).
     b. (Some guidebook y (every tourist x (x read y))).
But this is impossible, since (11b) entails (11a). This should not be viewed as proof that the sentence cannot have the reading (11b), but as an argument that there is no obvious way to know whether it does, or to distinguish between ambiguity and vagueness, in such cases.6 Compare this to the case in (12). Here overt syntactic compositionality allows only the representation (13a). Suppose our judgment is that the sentence can also be true when uttered in a situation where all guidebooks were read, but by different tourists. This cannot be accounted for without generating the additional representation (13b).
(12) Some tourist read every guidebook.
(13) a. (Some tourist x (every guidebook y (x read y))).
     b. (Every guidebook y (some tourist x (x read y))).
The problem here is the reverse of the previous one: the reading we generate entails the one we do not generate (but not conversely). This order of entailment is irrelevant for our purposes; trivially, when A entails B, B can be true while A is false. Specifically, in the situation under consideration, (13b) is true while (13a) is false. So there is no way to argue that (13b) represents just a specific instance that can make (13a) true. Since we decided that (12) nevertheless can be true in a situation corresponding to (13b), the reading we generate is not sufficient. The conclusion drawn from these entailment relations was that in the case of universal quantifiers, their wide, nonovert scope could not be explained away. It could only be derived by QR, or by an equivalent operation generating the relevant scope construal.
But in the case of existential quantifiers, there is no genuine wide scope involved. Hence it would follow that while universal wide scope is restricted by constraints on movement, the apparent existential wide scope is not restricted syntactically. However, it was eventually observed that this entailment pattern holds only in a subcase of existential wide scope.7 Fodor and Sag (1982) and Ruys (1992) point out that even in the simplest cases, the argument does not hold when the existential occurs in the scope of a nonmonotone quantifier. In this case, neither scope construal entails the other.
(14) a. Exactly half the boys kissed some girl.
     b. [Exactly half the boys x [some girl y [x kissed y]]]
     c. [Some girl y [exactly half the boys x [x kissed y]]]
(15) Mary dates half the men who know a producer I like.
For instance, in Ruys’s example (14), it is perfectly possible for the sentence to be understood as represented in (14c), with at least one (and the same) girl being kissed by exactly half the boys. But if the sentence is true under this construal, it may still be false under the narrow-scope construal in (14b), for example, if more than half of the boys kissed one girl or another, but only half kissed the same girl. Thus, if we generate only the overt-scope interpretation (14b), we do not capture correctly the conditions under which the sentence can be used truthfully. Nevertheless, existentials can also take wide scope outside of an island in such cases, as in Fodor and Sag’s example (15). Farkas (1981) and Abusch (1994) show the same in cases in which the existential occurs inside an implication, as in Farkas’s (16), where the wide-scope construal of a . . . poem clearly does not entail the narrow one.
(16) John gave an A to every student who recited a difficult poem by Pindar.
(17) If some relative of mine dies, I will inherit a house.
Similarly, in (17), it is very easy for the sentence to mean that there is a relative of mine such that if she or he dies I inherit a house, although this does not entail that for any relative who dies this is so (the overt narrow-scope reading). The facts are thus that when logic permits a clear differentiation of the two readings, they do indeed show up. This means that there must be some linguistic mechanism that generates the relevant readings.
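The entailment claims behind (11) and (14) can be checked mechanically on small finite models. The following is only an illustrative sketch, not part of the original argument: it assumes a purely extensional reading of the quantifiers, treats the verb as a binary relation over two toy domains, and enumerates every such relation. All names in it (`models`, `entails`, `XS`, `YS`, and the four construal functions) are hypothetical helpers introduced for the illustration.

```python
from itertools import product

def models(xs, ys):
    """Yield every binary relation (a set of (x, y) pairs) over xs and ys."""
    pairs = [(x, y) for x in xs for y in ys]
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, bit in zip(pairs, bits) if bit}

def every_some(rel, xs, ys):
    """Schema of (11a): every x is related to some y."""
    return all(any((x, y) in rel for y in ys) for x in xs)

def some_every(rel, xs, ys):
    """Schema of (11b): some single y is related to every x."""
    return any(all((x, y) in rel for x in xs) for y in ys)

def exactly_half_some(rel, xs, ys):
    """Schema of (14b): exactly half of the xs are related to some y."""
    count = sum(any((x, y) in rel for y in ys) for x in xs)
    return 2 * count == len(xs)

def some_exactly_half(rel, xs, ys):
    """Schema of (14c): some single y is related to exactly half of the xs."""
    return any(2 * sum((x, y) in rel for x in xs) == len(xs) for y in ys)

def entails(p, q, xs, ys):
    """True if q holds in every enumerated model in which p holds."""
    return all(q(rel, xs, ys) for rel in models(xs, ys) if p(rel, xs, ys))

# Toy domains (an assumption of the sketch): four subjects, two objects.
XS = ["x1", "x2", "x3", "x4"]
YS = ["y1", "y2"]

print(entails(some_every, every_some, XS, YS))                # True
print(entails(every_some, some_every, XS, YS))                # False
print(entails(some_exactly_half, exactly_half_some, XS, YS))  # False
print(entails(exactly_half_some, some_exactly_half, XS, YS))  # False
```

On these toy domains the wide-scope schema of (11b) entails the narrow-scope schema of (11a) but not conversely, while under "exactly half" neither construal entails the other, which is the pattern the text attributes to nonmonotone quantifiers. Four subjects are used because with only two the narrow-scope schema of (14b) would happen to entail (14c), an accident of the domain size rather than a fact about the quantifier.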
Even if the no-ambiguity line could somehow have been maintained for the issue of relative quantifier scope, it would be of no help with the other two problems of wh–in situ and sluicing, since, as we saw, the free scope of existentials is witnessed there independently of any ambiguity. (For example, there is no possible independent local interpretation of the wh–in situ that could entail the question interpretation.) Another approach has been proposed to the apparent free scope of existentials. It rests on the fact that existential NPs (of the relevant type) can be used to refer to discourse entities, or to introduce new entities. On that view, developed in Fodor and Sag 1982, this is explained by assuming that indefinite NPs are ambiguous between a quantified (existential)
and a referential interpretation. (In some views, this ambiguity is encoded as two entries of the indefinite determiner.) In their free-scope occurrence, indefinites are kind of referential. So, what seems to be ambiguity of scope construal of existentials is, in fact, just ambiguity of the indefinite NPs themselves. The idea that indefinite NPs are ambiguous has become popular (independently of the issue of free scope), and it comes under different names, each representing a slightly different view regarding what the relevant property is. On their referential side, indefinites can be D-linked (Pesetsky), Presuppositional (Diesing), or strong (de Hoop). It may appear that this view, unlike the previous one, could also be extended to account for existential wide scope in the case of wh–in situ, as proposed at least for a subclass of them by Pesetsky (1987). Hence we should examine it in some detail. In this approach, then, there is no need to assume QR for the apparent wide scope of indefinites, since in these cases they are used in their referential entry. Just as proper names can be interpreted in situ without moving (so goes this line of thought), referential indefinites can also stay, with the same effect. Some variants of this approach combine the idea of ‘‘specific’’ or ‘‘referential’’ indefinites with the mechanism of unselective binding. They do not function like proper names, but rather are bound in situ by a remote existential operator (Pesetsky 1987; Beghelli 1995). The obvious question these approaches face in the case of wide-scope existentials is how one would ever know whether indefinites are ambiguous or not. In the standard examples discussed in this literature, there can be no possible truth-condition difference depending on whether an existential is construed as taking wide scope or as specific, under any of the descriptions of specificity.
(See Higginbotham 1987 for a more articulate presentation of this point.)8 This, however, is not true for Fodor and Sag’s (1982) analysis. Aware of this problem, they offer a clear way to check the distinction they propose. If the apparent wide-scope interpretation is indeed generated by an (island-free) QR, we would expect all scope construals to be possible, as would be the case in a logical language, not restricted by human-language islands. Specifically, one of the construals that should be possible for (18a) is (18b).9
(18) a. Every professor will be fired if a student in the syntax class cheats on the exam.
     b. [For every professor x [there is some student y in the syntax class such that [if y cheats in the exam x will be fired]]]
This construal is generated if QR raises the embedded existential out of the if-clause but places it in the scope of the universal, which, if an island-free QR is at work here, should be possible. But the claim is that the sentence does not allow this construal. Though they do not spell it out precisely in this way, I think they mean the following. If there happens to exist, say, one student who cheated on the exam, the construal (18b) allows the implication in (18a) to be true also in case some professors are fired and some are not: every professor is associated with a student whose cheating will lead to firing. So many options are logically open if one student cheated; one of them is that one professor will end up fired. The factual claim of Fodor and Sag is that, in fact, the implication in (18a) is understood to be true only if either all or no professors are necessarily fired in this case. (Under the narrow-scope construal, all professors should be fired if a cheating student exists. Under the maximal wide scope it could only be all or none.) More generally, Fodor and Sag argue that we do not, in fact, get the full range of (noncompositional) scope options but only two: the narrow (compositional) scope, and a maximal wide scope, but no intermediate scopes. This is the result one would expect if the ambiguity at issue is between the logical (existential) interpretation of indefinites, and their referential one, which is simply insensitive to scope. It is correct that the two approaches to the relevant ambiguity problem that I have discussed so far differ in this prediction regarding intermediate scope. What is extremely difficult, again, is to check the intuitions needed to decide which is right. However, both Ruys (1992) and Abusch (1994) try to do this, and show with great care and detail that intermediate readings do exist. This is illustrated in (19a), from Ruys (his (18), p. 101).
(19) a. Every professor_i will rejoice if a student of his_i cheats on the exam.
     b. [For every professor x [there is some student y of x such that [if y cheats on the exam, x will rejoice]]]
     c. [For every professor x [if there is some student y of x such that y cheats on the exam, x will rejoice]]
The pronoun here is bound by the universal, so there is no option for either maximal wide scope or referential interpretation. Luckily, then, we only have two interpretations to consider, and Ruys argues that the sentence does indeed have both interpretations in (19b, c). Under the intermediate construal in (19b), the sentence can be true, even if, say, some
student of Professor Jones cheats, and Professor Jones does not rejoice. (Professor Jones has two students, Max and Felix. Max has cheated, but Jones would have rejoiced if Felix had.) It may be easier to observe the intermediate readings if we use specificity markers, for example, if we replace the indefinite in (19a) with a certain student of his. But what this means is that, as Ruys points out, the apparent specificity impression has nothing to do with either referentiality or maximal wide scope. In the examples used by both Ruys and Abusch for intermediate readings, the existential happens to contain a bound pronoun, as in a student of his in (19a). Based on this fact, Kratzer (1998) argues that there are no genuine intermediate readings of existential wide scope in such contexts, but that it is the pronoun that creates an impression of such readings. She offers a mechanism for capturing the anaphora interpretation in such cases (which is probably also needed, independently, for this and a variety of other anaphora problems in the system I will propose). With this assumed, Kratzer proposes a new implementation of the basic intuition of Fodor and Sag that the apparent existential wide scope is a case of specificity, relating to the discourse status of the indefinites. However, intermediate readings have been noted and analyzed before, also without bound pronouns. Farkas (1981) brings the following counterexamples to Fodor and Sag’s claim (her (17), p. 64).
(20) a. Each student has to come up with three arguments that show that some condition proposed by Chomsky is wrong.
     b. Everybody told several stories that involved some member of the Royal family.
In (20a), it is relatively easy to understand the three arguments as addressing one and the same condition by Chomsky, but still the relevant condition may vary with students (some condition has wider scope than three arguments, but narrower than each student).
Admittedly, in these examples one could argue that the impression that it is the same condition for all three arguments is just a matter of vagueness, along the entailment line of argument we examined above, hence this is not an intermediate reading, but just a specific instance that makes the narrowest scope of some condition come true. To control for this, we may look at (21), based, with some variation, on the inventory of Ruys.
(21) a. Most linguists have looked at every analysis that solves some problem.
     b. [Most linguists]_1 [[some problem]_3 [every analysis that solves e_3]_2 [e_1 looked at e_2]]
(22) Each student has to find all arguments in the literature showing that some condition proposed by Chomsky is wrong.
Let us focus on the reading where some problem in (21) has scope wider than every analysis; that is, for a given problem, the relevant linguists looked at all the analyses that solve this problem. It is still possible, in this case, that different linguists looked at different problems; in other words, it is not a necessary entailment that most linguists looked at the analyses of the same problem. This, then, is the intermediate reading represented syntactically in (21b). (But obtaining this reading syntactically would involve extraction of the indefinite out of an island.) Similarly, the modification of Farkas’s example in (22) still allows the intermediate reading, so the sentence is ambiguous in three ways. The same point can be illustrated in the sluicing context (23). The most plausible construal of the elliptical part (following which word) is as given in the brackets in (23b). Given our assumptions so far, this construal can only be obtained if the correlate some word occurs (at the covert structure) in an intermediate position between each player and all the consonants.
(23) a. Each player must write down all the consonants that some word contains, when properly pronounced, and let the others guess which word.
     b. Each player x must . . . let the others guess for which word y, [x wrote down all the consonants y contains when properly pronounced].
We may conclude that intermediate readings are available, independently of whether a bound pronoun occurs or not. The presence of a bound pronoun, as in (19a), only makes it easier to observe the existence of the intermediate reading, since it eliminates one of the competing readings, that of widest scope. (The more available all three readings are, the harder it is to identify just the intermediate one.)
As always with tasks involving quantifier interpretation, judgments of such readings may be subtle, and certainly depend on many contextual factors. But the existence of contexts where they are possible is enough to raise the question of how they are derived. Fodor and Sag’s test has provided us, then, with further confirmation of the conclusion that there is a real problem here. Existentials show properties that (so far) look like logical wide scope, with blatant blindness to islands.10
2.1.4 The ‘‘Realistic’’ QR View of the 1980s
The problem we have observed, then, is that existential and universal quantifiers appear to have completely different scope options. Universal quantifiers clearly obey island restrictions, and in many cases have narrower scope options than predicted by QR; existential quantifiers have broader options than predicted by QR. So it does not seem that the original optimism of the QR view can be maintained. In the view of QR that emerged in the 1980s, this problem was addressed. The decisive factor was Huang’s (1982) argument that although instances of wh–in situ do not obey subjacency islands, there is evidence that nevertheless they must move to get scope (since they obey another syntactic condition, the ECP, an issue I will return to). The theoretical account that emerged, then, is rather complex (Huang 1982; May 1985; Chomsky 1986). First, the idea that QR is just an instance of standard syntactic movement (Move α) was replaced by the assumption (24a) that QR is a special operation, not restricted by subjacency islands, or, put differently, that subjacency only restricts overt syntactic operations, but not the covert ones. This directly entails the distribution of existentials, but raises the question of why strong quantifiers are so restricted. To address this question, it was decided that a further restriction applies to them, which, as far as I know, was not actually defined beyond the statement that it is ‘‘roughly clause-bound,’’ as in (24b). The modifier roughly is sometimes interpreted as also allowing extraction out of small clauses (ECM) and infinitival clauses.11
(24) The QR view in the 1980s
     a. QR does not obey subjacency islands.
     b. Strong quantifiers are ‘‘roughly’’ clause-bound.
Technically, it can be argued, then, that QR is one unified rule, but since strong quantifiers are further restricted, they never get a chance to manifest the full options of QR.
We should note that the problem-solution ratio here is rather poor. The problem is that there appear to be three types of scope-taking options: overt wh-movement, which is island-restricted; covert scope of existentials, which is island-free; and covert scope of universals, which is island-restricted, or perhaps more restricted than that. The question is why this is so; in other words, what generalization(s) this could follow from. The solution is that there are three rules, each capturing exactly one of these options. They do not capture anything beyond these three problems, since, except for the scoping of existentials, there is no other movement operation that does not obey subjacency, and except for the scoping of
universals, no other movement is restricted in the ‘‘clause-bound’’ manner.12 In other words, the solution is nothing more than rephrasing the description of the problem in more technical language. I have focused here on the QR-approach, but the ratio problem remains the same for all approaches to scope that share this view of what the facts are, though the technical language of the description may vary dramatically. If this is how things are, the early optimism must be replaced with a modest and realistic approach. The question, though, is whether this new picture is, indeed, realistic. Have we at least managed to describe the facts correctly? Of course, if we have not, more problem-specific rules can be added. However, if we do not have any upper bound in our theory on the number of stipulations and lists we can add to address every new problem, it may be true that we can reach a correct description, or at least the opposite cannot be easily proven. But it is a mystery how a language learner can learn such lists of descriptions.
2.1.5 Some Problems
Some (relatively minor) problems arise regarding the proposed generalization in (24b) that strong quantifiers are clause-bound. One of the problems that QR seemed particularly promising for (for example, in the analysis of May 1977) was the de re interpretations in belief contexts, as in (25a). (See note 2 in the present chapter.)
(25) a. Lucie believes that every politician is corrupt.
     b. Someone believes that every politician is corrupt.
     c. Someone is always willing to believe that every politician is corrupt.
     d. [someone_i] [everyone]_j [e_i believes that e_j is corrupt]
Under its de re interpretation, (25a) entails that Lucie believes that Clinton is corrupt, and the standard way to guarantee this entailment is by scoping the universal out of its clause. If (24b) is correct, this is no longer permitted.
Possibly some way can be found to capture de re interpretations without movement, but for the time being, it seems that this entails adding problem-specific stipulations. Nevertheless, it is still the case that in (25b), it is very difficult to get a reading where someone is dependent on, namely in the scope of, every politician (it is slightly easier in the generic (25c)). This is the intuition that (24b) is based on. It seems that the scope construal we can get easily is the one represented syntactically in (25d), where every politician scopes out of its clause, hence is interpreted de re, but still has narrower scope than someone.
It is not obvious why this is so, but it is equally possible that it is because someone in such contexts resists referential dependence, not because every politician resists taking wide scope. This hypothesis is pursued in Kennedy 1997, where it is substantiated with examples from antecedent-contained deletion.13 Recall that the problem that led to the proposed generalization (24b) was that strong quantifiers show mysterious behavior when it comes to questions of their scope interaction with other quantifiers. Unlike existential quantifiers that scope out easily, it is sometimes easy and sometimes difficult for strong quantifiers to get scope wider than quantifiers that c-command them overtly, even inside their clause. (I return to this question in section 2.7.) Scoping strong quantifiers out of their clause seems even harder and, in fact, we do not know precisely when and why it is possible. Thus, in (1c), repeated here, every new patient can easily take scope over the higher subject, contrary to the proposed generalization (24b).
(1c) A doctor will make sure that we give every new patient a tranquilizer.
Farkas and Giannakidou (1996) argue that scoping out of a tensed clause, as in (1c), is found only with a restricted set of verbs, like make sure, and offer an explanation for why this is so. Whether their explanation is right or not, it appears that various factors affect the ease of scoping a strong quantifier out, rather than just a syntactic clause-boundedness restriction. When an irregular pattern of facts is discovered, it is always only the theory that could decide what is the rule and what is the exception. The decision to adopt (24b) and leave out (25a) and (1c) as unexplained problems has no merits over the opposite decision, to take these two as representing the standard application of QR, and leave for future explanation the question of why scoping out of a clause is sometimes much harder (as in (25b)).
However, (24b) does not pose serious conceptual problems in and of itself, so we may leave the question of whether it is indeed motivated open here. The major conceptual problem with the description in (24) is the assumption that the covert operation, QR, differs from overt movement in that it does not obey subjacency. Already in the first stages of this theory, opinions were sharply divided regarding the status of such statements. Some considered this a real hindrance to a unified theory of syntactic movement. Others thought the fact that syntactic movement and LF-movement obey different constraints provides strong evidence for LF, as
distinct from surface structure. While this purely conceptual debate could go on forever, in the minimalist program it is impossible even to state this question, since there are no levels of representation. There is only one derivation (deriving LF) that can be spelled out and enter the PF-interface at any stage. So there is no way to state that up to the branching to PF you have to obey a certain constraint, and from there on, you do not have to obey it.
Independently of the conceptual issue, it is also empirically wrong that covert movement in general does not obey subjacency. The discussion here has focused on the issue of relative scope of quantifiers. But there are several other problems that have motivated QR over the years. In the case of comparatives, like (26a), it is common to assume that men must scope out, covertly, at LF (along with some degree operator; Heim 1986 provides an extensive survey of available approaches).
(26) a. We invited more men to our party than women.
     b. More people said that they will vote for Jones, in the last poll, than for Smith.
     c. *More people who love Bach arrived, than Mozart.
But the way QR operates in this case parallels the operation of overt scoping. It is not clause-bound: in (26b) the NP Jones can easily be extracted out of its clause, to form the comparison pair with Smith. Yet it obeys subjacency: for (26c) to be interpretable, Bach must be extracted, but since it occurs in an island, this is impossible, so the derivation is uninterpretable.14
Another instance is except elliptic conjunctions, as in (27). Here too, the italicized (correlate) phrase must move at LF. This is argued in detail in Reinhart 1991, where I also contend that no overt syntactic movement can account for such structures.
(27) a. We invited everyone to our party, except/but Felix.
     b. Lucie admitted that she stole everything, when we pressed her, except/but the little red book.
     c. *The people who love every composer arrived, except/but Mozart.
     d. *Which composer did the people who love e arrive?
This case is particularly interesting, since the NP moved here is a universal quantifier. Still, this movement is not clause-bound, as illustrated in (27b), but it does obey subjacency. The derivation in (27c), where the correlate is in an island (hence its LF-movement violates subjacency), is as bad as the cases of overt movement such as (27d).15
If other instances of QR do obey subjacency, the description of the problem in (24) cannot be maintained. Rather, we should go back to a more elementary statement of the mystery we started with: existential NPs can move arbitrarily to get wide scope. This now needs to be stipulated as a purely problem-specific operation. In section 2.4, we will see that even so, this is not the correct description of the problem. The wide scope generated by island-free movement of this type is not actually found in English, and allowing such a rule to exist yields the wrong semantics. But even if it was the correct description, it would still make sense to look for something from which it could be derived.
The alternative, which has been considered all along, is that the wide scope of existentials can be captured in situ (i.e., without movement), and could follow from some independent properties of these NPs. Let us now consider such alternatives. We will see that although syntactically and conceptually there is strong reason to believe this is the right direction (section 2.2), the actual implementations of this idea dramatically fail to capture the interpretation of existential wide scope (section 2.3).
2.2 The Alternative of Wide Scope In Situ
2.2.1 Wh–In Situ
As noted, instances of wh–in situ illustrate best the free distribution of existential scope (since their scope is directly tested by the set of possible answers). But at the same time, they also illustrate an alternative way of accounting for this scope, which does not involve movement. In fact, the history of this problem, within the syntactic frameworks, goes back and forth between the two approaches. The first, which I assumed in the previous discussion, is that they undergo movement at LF, to some clause-initial position, where their scope is correctly captured, as illustrated, for (28), in (29a). The second, originating in Baker 1970, is that each question-sentence contains an abstract Q-morpheme, and instances of wh–in situ are bound directly by Q. (More generally, the idea that scope assignment does not require movement was advocated in several papers by Williams, though along somewhat different lines (e.g., Williams 1986).) This view has regained popularity and was further developed in the work of Pesetsky (1987) and Nishigauchi (1986), who argued that at least part of the time, cases of wh–in situ are bound in situ by Q. Their formulation of this line of argument makes use of the mechanism of unselective binding developed in Heim 1982. The Q-operator unselectively binds all the
variables in the wh-NPs that have not moved. The LF derived this way for (28) is (29b). If this approach is adopted, the island problem is directly resolved: instances of wh–in situ are insensitive to islands, since they never move (and coindexation is insensitive to islands anyway).
(28) Which lady_2 [e_2 read which book_1]?
(29) a. LF-movement: [Which book_1 [which lady_2 [e_2 read e_1]]]
     b. Baker (77): Q⟨1, 2⟩ [which lady_2 [e_2 read which book_1]]
Although, as we will see, it is far from obvious how the available analyses along the lines of (29b) can yield the correct semantics, the reason they were rejected in the QR-framework is syntactic rather than semantic. As mentioned, the QR-analysis of wh–in situ became widely accepted following Huang’s (1982) findings. I have discussed some of these in section 1.1, in the context of superiority, but let me reiterate the central argument for why LF-movement is necessary. Huang noted contrasts like those in (30)–(31). In (30a), with an argument whom in situ, the derivation is fine, even though a wh cannot move overtly out of this position, as seen in (30b). But if the adverbial how occurs in this position, as in (31a), the derivation is uninterpretable. Syntactically, the difference is that how is an adjunct rather than an argument. Generally, adjuncts are assumed to be more restricted in their movement options than arguments are, since they also need to obey the syntactic condition known as ECP.16 If we assume that cases of wh–in situ have to move covertly to get scope, then the ill-formedness of (31a) follows from the ECP, in the same way that the overt movement in (31b) is ruled out.
(30) a. Who fainted when you attacked whom?
     b. *Whom did Max faint when you attacked e?
(31) a. *Who fainted when you behaved how?
     b. *How did Max faint when you behaved e?
This, along with parallel facts from Chinese (which has no overt wh-movement at all), was taken as decisive evidence that QR must apply to assign scope to wh–in situ, since this scope is sensitive to purely syntactic constraints. But at the same time, it led to the conclusion that QR is not sensitive to subjacency, to account for (30a).
However, there were several problems with the claim that instances of wh–in situ show ECP effects. One, mentioned in chapter 1, is that only the adverbial wh-adjuncts behave as in (31a). If we replace the how of (31a) with what way, the derivation is fine. Syntactically and semantically what way is an adjunct, just as how is. If the ECP is what rules out (31a),
there is no way to explain why the version with what way is not excluded as well. Another problem is that the ECP should also apply to subjects, not just to adjuncts. Thus, the syntactic extraction in (32a) has all the marks of a severe ECP violation. But this is not found with wh–in situ, as we see in the analogous (32b), which is acceptable.
(32) a. *Who did Max read the book that e wrote?
     b. Who read the book that who wrote?
The question posed by (31a) is what makes the scope of adverbial whs appear more restricted, but it cannot be viewed as evidence of movement. Indeed, in the minimalist program of Chomsky 1995, the idea that instances of wh–in situ move covertly was abandoned. I discuss this issue in detail in Reinhart 1994, where I also offer an account of the peculiar behavior of adverbial adjuncts like how.
2.2.2 Sluicing
We may turn now to sluicing, under the new light that Chung, Ladusaw, and McCloskey (1994) have shed on these structures. As noted in the discussion of (8a), repeated in (33a), under all standard analyses, an LF-predicate is formed by QR in the antecedent clause, as in (33b) (following Sag and Williams). On the LF-copy view, this predicate is copied (covertly) into the empty IP of the second (sluiced) conjunct. On the alternative view of ellipsis as deletion under identity at PF (proposed by Chomsky), it seems that wh-movement should also apply in the second conjunct, and then the two IPs are identical and the second can be deleted.
(33) a. They invited someone, but I forgot who.
     b. Someone_i [they invited e_i], but . . . who_j [they invited e_j]
     c. Max and some lady disappeared, but I can’t remember who [ ]
     d. If a certain linguist shows up, we are supposed to be particularly polite, but I don’t remember who [ ]
     e. Max will believe anything that someone will tell him, and you can easily guess who.
The problem posed by the island violations of (7), illustrated here in (33c–e), is even more acute than in all the other cases. Under both views of ellipsis, there does not seem to be a way around assuming some movement, in the first conjunct, since it is only this movement that creates identity between the two conjuncts. (If, at the deletion stage, we have the structure They invited someone but I forgot who they invited t, nothing licenses deletion of the second IP.) If my presentation of the second view is correct, this seems to also require violation of subjacency of the overt wh-movement in the second conjunct. (This, indeed, was what Ross (1969), who discovered these structures and named them sluicing, assumed to be the problem.)
A long-standing puzzle is the fact that only indefinite (existential) NPs can license sluicing: sluicing in the second conjunct is possible only if the “correlate” of the wh-phrase in the first conjunct is existential. For example, sluicing is not licensed in (34a) and (35a), where the (italicized) correlate is a strong NP.
(34) a. *Lucie already knew that they appointed Max. Still, she didn’t tell me who.
     b. Lucie already knew that they appointed Max. Still, she didn’t tell me who they appointed.
(35) a. *If you already know that everyone objected to your proposal, there is no point in asking who.
     b. If you already know that everyone objected to your proposal, there is no point in asking who did.
Chung et al. point out that this restriction on the possible correlate cannot be dismissed as falling under some pragmatic considerations. As they show with similar examples, the discourse in (34) and (35) is perfectly coherent. That this is so is further attested by the fact that without the ellipsis the sentences are fine, as in the (b) cases. In (35b) we also see that there is no independent ellipsis problem here, since VP-ellipsis is possible. Whatever makes the (a) cases impossible must then be syntactic and not pragmatic.
It may appear that this could follow from the standard picture of QR, summarized in (24): since existentials do not obey subjacency they may scope out in (33c–e), but strong quantifiers are restricted to their clause, hence they cannot scope out in (34)–(35). But this is not the correct account. Strong NPs fail to license sluicing even when they are not embedded, as in (36a).
(36) a. *We invited everyone you know to the party, so stop asking me who.
     b. We invited everyone you know to the party, so stop asking me who we invited.
     c. [Everyone you know]_i [we invited e_i to the party] . . .
Under the standard picture, nothing prevents QR of everyone within its own clause, as in (36c), which should then license the sluicing in the second
conjunct. Nevertheless, sluicing is impossible. Again, this is not a pragmatic or contextual matter, since (36b) with the same intended meaning is fine. Why only existentials can license sluicing thus remained a persistent mystery.
Chung et al.’s alternative analysis explains both the subjacency question and the indefiniteness effects. It rests on the basic idea in Discourse Representation Theory (DRT) (Kamp, Heim) that indefinites are not necessarily closed NPs, but they can be viewed as “restricted free variables.” As in Heim 1982, their variable can, then, be unselectively bound by another operator. They propose that given a structure like (37a), an operation that they call “merging” applies: the full antecedent IP is copied (recycled) as is into the second clause, yielding (37b).
(37) a. They invited some linguist but I forgot who.
     b. [They invited some linguist] but I forgot who [they invited some linguist]
     c. . . . but I forgot [which x [they invited linguist (x)]]
The indefinite determiner (some) is, as just noted, semantically invisible, hence the indefinite variable in the recycled IP can now be unselectively bound by the wh-operator. Simplified, the result is illustrated in (37c). (I will return to the question of how the merging operation is interpreted.) Since merging involves unselective binding, the indefiniteness effect is derived: if a clause containing a strong NP gets “recycled,” this NP cannot be unselectively bound, so the wh-operator will not bind any variable, and the derivation is ruled out as vacuous quantification (an illegitimate LF-object). The island problem also disappears. No movement is involved at all, and unselective binding is not island-sensitive (just as assumed before for wh–in situ). While in the case of wh–in situ, the evidence against the movement analysis was subtle, as we saw, here it is fully decisive.
The existential-specific version of QR, which does not obey subjacency, can correctly describe the fact that islands are not observed in the sluicing examples, but there is no way it can explain or describe the definiteness effect, namely, why only IPs containing an existential can serve as the antecedent clause of sluicing. There is also additional strong evidence for Chung, Ladusaw, and McCloskey’s approach in their paper.17
2.3 The Interpretation Problem of Wide Scope In Situ
We saw some of the advantages of giving up the idea of obtaining wide existential scope by movement. However, a crucial question that should
be checked is whether an interpretation that captures truth conditions correctly can be associated with the structures we generate. The island-free QR seems, so far, to face no problems in this area, which is why it was proposed to begin with (though, as we will see in section 2.4, it does in fact face problems). The question, then, is whether the alternative can capture this one and only thing that the previous analysis appears to do right.
As we saw, the mechanism that made it possible to account for wide existential scope with no movement is assumed to be unselective binding. So far we have followed this line of thought in the case of wh–in situ and sluicing, but it has also been proposed for the interpretation of wide scope, most explicitly by Beghelli (1993). We should note, though, that the use of this option here is dramatically different from that in Heim’s (1982) original proposal. Heim did not allow unselective binding across an intervening operator, and more generally, the indefinite could only be unselectively bound in a position obtained by QR. So she assumes both an island-free QR and unselective binding. This is with very good reason, as will become obvious soon. Thus we should now check whether it is indeed possible to extend this mechanism in the proposed way. Or, more generally, can we correctly capture the semantics of wide scope without QR?
As before, I will first examine this question in detail in the case of wh–in situ, where the judgments are clearest. Then it will be easy to see that we are facing precisely the same problem in all three areas of existential wide scope under consideration here. (The argument regarding the wh-cases appeared in Reinhart 1992.)
2.3.1 Wh–In Situ
Let us assume here the semantics of questions proposed by Karttunen 1977 (see also Engdahl 1986).18 On this view, wh-NPs are essentially existential NPs, and the question denotes the set of propositions that are true answers to it. For example, the interpretation of question (38a) is given in (38b).
(38) a. Which European country has a queen?
     b. {P | (∃x) (European country (x) & P = ^(x has a queen) & true (P))}
     c. {England has a queen; Holland has a queen; Denmark has a queen}
Representation (38b) is the set of true propositions P, such that there is a European country x about which P asserts that x has a queen. In our actual world, the values of x yielding “x has a queen” as a true proposition turn
out to be Denmark, England, and Holland, so the question denotes the set in (38c). It should now be obvious why instances of wh–in situ pattern with all other existential NPs: being standard existentials, their distribution is just like that of the other existential NPs.
Recall that the two LFs we have been considering for the syntax of (28) are those in (29), repeated here.
(28) Which lady_2 [e_2 read which book_1]?
(29) a. QR-movement: [Which book_1 [which lady_2 [e_2 read e_1]]]
     b. Unselective binding: Q⟨1, 2⟩ [which lady_2 [e_2 read which book_1]]
(39) a. Which lady read which book?
     b. With movement: {P | (∃x) (∃y) (lady (y) & book (x) & P = ^(y read x) & true (P))}
     c. No movement (“unselective binding”): {P | (∃⟨x, y⟩) (lady (y) & P = ^(y read x & book (x)) & true (P))}
Applying Karttunen’s analysis to the two LFs, we get (39b) for the LF obtained by raising of the wh–in situ. (This is the set of true propositions P such that there is a lady y and a book x, about which P asserts that y read x.) Representation (29b), the representation obtained with no covert movement of the wh–in situ, now corresponds to (39c), which differs from (39b) only in where the book-restriction occurs in the representation. If cases of wh–in situ do not move, a crucial result of the analysis is that although their scope is identical to that of a moved wh-phrase, the N-restriction stays in situ, rather than occurring as a restriction on the question operator.
In the specific case of (39), the result is unproblematic. The two representations appear equivalent. But if we look deeper, we will discover that this is, nevertheless, the wrong interpretation, and the idea of leaving the restriction in situ is rather dangerous. To see this, let us consider (40). (For convenience, I will use both an informal representation and Karttunen-type representations in the examples.)
(40) Who will be offended if we invite which philosopher?
(41) Incorrect
     a. For which ⟨x, y⟩, if we invite y and y is a philosopher, then x will be offended.
     b. {P | (∃⟨x, y⟩) (P = ^((we invite y & philosopher (y)) → (x will be offended)) & true (P))}
     c. Lucie will be offended if we invite Donald Duck.
(42) Correct
     a. For which ⟨x, y⟩, y is a philosopher, and if we invite y, x will be offended.
     b. {P | (∃⟨x, y⟩) (philosopher (y) & P = ^((we invite y) → (x will be offended)) & true (P))}
In this case, the restriction occurs in an if-clause. Thus, the representation obtained if we leave it in situ is (41a). Now, if (41a) is the question expressed by (40), one of the possible answers to it should be (41c). Since Donald Duck is not a philosopher, it must be true of him that if he were a philosopher and we invited him, Lucie would be offended.19 In fact, anything that is not a philosopher could be a value for y in (41a), since its restriction occurs in the antecedent clause of an implication. This result is just wrong. We do not want to allow (41c) in the set of possible true answers to the English question (40).20 The representation yielding the correct set of answers in such cases is that in which the restriction is pulled out of the implication, as in (42a). This correctly allows the values for y to be all and only those individuals who are philosophers and for whom the implication is true.
The same problem is illustrated with (43). Leaving the restriction in situ and applying unselective binding, we obtain (43b), under which it turns out that a necessarily true answer is, for example, (43d), since it is true for every linguist x that if Nancy Reagan is a philosopher, then x read every book by her.21
(43) a. Which linguist read every book by what philosopher?
     b. For which ⟨x, y⟩, x is a linguist and for every z, if z is a book by y and y is a philosopher, then x read z.
     c. {P | (∃⟨x, y⟩) (linguist (x) & P = ^(∀z (book by y (z) & philosopher (y)) → (x read z)) & true (P))}
     d. All linguists read every book by Nancy Reagan.
The same problem with leaving the N-restriction in situ shows up in all areas of existential wide scope.
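The contrast between the in-situ representation (41b) and the restriction-out representation (42b) can be checked mechanically on a small model. The Python sketch below is only an illustration under loudly stated assumptions: the domain, the facts about who is a philosopher, who is invited, and who is offended are all hypothetical, and the conditional is treated as plain material implication evaluated in a single world. Each pair (x, y) stands for the answer "x will be offended if we invite y".

```python
# Toy model for (40): "Who will be offended if we invite which philosopher?"
# Every name and fact below is an illustrative assumption, not data from the text.
DOMAIN = {"Lucie", "Max", "Kripke", "Quine", "Donald Duck"}
PHILOSOPHER = {"Kripke", "Quine"}
INVITED = set(DOMAIN)   # assume everyone in the domain gets invited
OFFENDED = {"Lucie"}    # assume only Lucie ends up offended


def implies(p, q):
    """Material implication (a simplification of the natural-language 'if')."""
    return (not p) or q


def answers_in_situ():
    """(41b): the restriction philosopher(y) stays inside the if-clause."""
    return {
        (x, y)
        for x in DOMAIN
        for y in DOMAIN
        if implies(y in INVITED and y in PHILOSOPHER, x in OFFENDED)
    }


def answers_restriction_out():
    """(42b): the restriction philosopher(y) is pulled out of the implication."""
    return {
        (x, y)
        for x in DOMAIN
        for y in DOMAIN
        if y in PHILOSOPHER and implies(y in INVITED, x in OFFENDED)
    }
```

In this model, `answers_in_situ()` contains the pair ("Lucie", "Donald Duck"), which corresponds to the unwanted answer (41c), because the antecedent is false for every nonphilosopher. `answers_restriction_out()` contains only pairs whose second member is a philosopher.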
2.3.2 Sluicing
As we saw, Chung, Ladusaw, and McCloskey’s (1994) analysis of sluicing is probably the first real solution ever proposed for this problem. However, it rests on precisely the same idea of extending the mechanism of unselective binding. It is easy to observe that it will therefore face the same interpretation problem.
Recall that Chung and colleagues argue that the antecedent IP of, for example, (44a) is recycled into the sluiced conjunct, yielding (44b). This now is interpreted by unselective binding, which is illustrated in (44c). That is, (44b) is to be interpreted as the standard question (44d), which is precisely what we want.22
(44) a. Joan ate dinner with some linguist, but . . . with whom?
     b. [[with whom] [Joan ate dinner with some linguist]]
     c. {P | (∃z) (linguist (z) & P = ^(Joan ate dinner with z) & true (P))}
     d. With which linguist (did) Joan eat dinner e?
But the crucial question is still how we get from (44b) to (44c). In (44c), the N-restriction linguist is pulled out, into the restrictive term of the wh (existential) operator. A standard way to do that is to apply QR to some linguist, as Heim (1982) does in her analysis of unselective binding. However, as we saw, a central point in Chung et al.’s analysis is that sluicing structures defy subjacency, and one of their breakthroughs was in enabling us to avoid an island-free QR.
(45) a. If a certain linguist shows up, we are supposed to be particularly polite, but do you remember who [ ]?
     b. Max will believe anything that some teacher will tell him, and you can easily guess who.
The island-blindness of these structures was illustrated in (9), partially repeated in (45). If QR obeys subjacency, the italicized indefinite cannot move, after merging. Specifically, its N-restriction will stay in situ (or, at most, be attached to the lower clause). To interpret the derivation, we may attempt unselective binding in situ. But then we run into precisely the same problem as before. For example, after merging, the second conjunct of (45a) is (46a). Allowing the linguist-variable to be unselectively bound in situ by the question operator will yield the interpretation informally represented in (46b).
(46) a. . . . who [if a certain linguist shows up, we are supposed to be particularly nice]
     b. For which x, if x shows up and x is a linguist, then we are supposed to be particularly nice.
     c. It is Donald Duck, obviously!
This, again, is an interpretation that the sentence lacks altogether. If it had this interpretation, we could have happily let Donald Duck be our reply, as in (46c), which we obviously cannot do. The same considerations
apply to (45b), which is precisely analogous to (43), as far as the semantic problem goes. So, if this semantics is correct, I would have been perfectly justified in volunteering Donald Duck, again, as my guess for who Max will believe.
2.3.3 Existential Wide Scope
Recall that the original problem of wide scope for which the island-free QR was assumed is that of quantifier scope, discussed in detail in section 2.2. It may appear that these cases lend themselves most easily to the solution of unselective binding, since that mechanism was proposed, to begin with, in order to allow indefinites to be externally bound either by the standard existential operator, or by what DRT introduced as a discourse existential operator (or by another operator, in whose restrictive term they occur). This could give us, then, the maximal (specific) and the intermediate wide scopes discussed in section 2.2. This line of thought was, indeed, developed by Beghelli (1993, 1995), who argues that indefinites are unselectively bound by an existential operator, which is located, syntactically, at the C-projection.
But it should be trivial to observe now, as Heim (1982) did, that we certainly cannot give up QR to handle these cases with unselective binding. Nor can we replace QR, for this problem, with the absorption mechanism, which moves the determiner only while leaving the N in situ. Let us, first, check this with the same conditional context we have already processed several times. (I do this just because among the examples involving islands, these are the easiest to explain. The problem is much broader, as we will see.)
(47) If we invite some philosopher, Max will be offended.
(48) Derivation without QR (unselective binding)
     ∃_i [if we invite [some philosopher]_i Max will be offended]
(49) ∃(x) ((philosopher (x) & we invite x) → (Max will be offended))
(50) Derivation with QR
     a. [Some philosopher]_i [if we invite e_i Max will be offended]
     b. ∃x (philosopher (x) & (we invite x → Max will be offended))
We are trying to capture the wide-scope (specific) interpretation, that there is some philosopher such that if we invite that philosopher Max will be offended, namely, (50b). The LFs we derive with no QR can be (48), where we introduce an existential operator (or some syntactic binder). Recall that in the DRT framework some in (47) has no interpretation,
so the structure is interpreted as in (49). If (47) is construed this way, can it ever be used falsely? Not in our present world, where there are many nonphilosophers, hence, it is necessarily true that if they were both philosophers and invited, Max will be offended. But the actual (47) is not a necessary truth. So, the upshot is that if we give up QR, we generate the sentence with a meaning it cannot possibly have, and what’s worse, we fail to generate a meaning under which it certainly can be used. The QR-derivation (50), by contrast, yields the correct interpretation.
As I mentioned, this is not a problem for Heim’s (1982) analysis of unselective binding. Fully aware of this problem, Heim first applies QR. For example, some philosopher, in (47), first moves to the topmost IP position, as in (50a) (violating subjacency). It is in this position that the indefinite variable is unselectively bound by the (discourse) existential operator. In this case, Heim’s analysis is precisely identical to that obtained by QR without unselective binding. (It should be recalled that existential wide scope was not the problem that motivated unselective binding.)
It may be appropriate to check whether unselective binding can be modified to nevertheless handle this problem. In Reinhart 1987 I argued that unselective binding cannot, in any case, apply to individual variables, and the variable bound in donkey-type contexts must be a set variable. (This was needed, independently of the present problem, to handle the proportion problem.)23 Beghelli (1993) extends this analysis to the problem of wide existential scope. He argues that the existential operator unselectively binds a set variable. (His major motivation is to enable the analysis to capture plural cardinal indefinites.) However, this analysis faces precisely the same problem. Let us repeat (47) in (51a), using a plural cardinal NP for variety.
Beghelli’s analysis will be (51b), where Y is a set variable, denoting the maximal set of philosophers we invite, with the cardinality of 2.
(51) a. If we invite two philosophers, Max will be offended.
     b. ∃(Y) ((|Y| = 2 & Y = {x | philosopher (x) & we invite x}) → (Max will be offended))
Again, all that (51b) says is that there is some set, such that if it has two members who are philosophers that we invite, Max will be offended. There are many sets that meet this requirement (not only nonphilosopher sets, but also the null set). So the sentence ends up being a necessary truth.
We have to conclude, then, that unselective binding does not provide us with the magic formula that can eliminate the idea of an island-free QR. We should keep in mind that the interpretation problem at issue is
substantial. Though I focused attention on conditional structures, this is only for reasons of presentation. The same problem will show up whenever the existential NP occurs in the restrictive term of a universal quantifier, as in (52). (If these sentences are interpreted by unselective binding, leaving the N-set in situ, they end up necessary truths, in every world whose entities are not only books and philosophers.)
(52) a. Every student who solved some problem got a prize.
     b. Every joke about some philosopher got published.
More generally, the problem shows up in any downward entailment context, as in the scope of negation in (53).24
(53) a. Max did not consider the possibility that some politician is corrupt.
     b. QR: ∃x (politician (x) & ¬(Max consider the possibility that x is corrupt))
     c. Unselective binding: ∃x ¬(Max consider the possibility that [x is corrupt & politician (x)])
We are considering here the wide-scope (or “specific”) interpretation of some politician in (53a), namely, the construal (53b), which would be derived if QR can extract the existential out of an island. But under the unselective binding procedure, (53a) receives the construal (53c) where the N-restriction stays in situ, in the scope of negation. While (53a) clearly can have a reading corresponding to (53b), it can never be used to mean anything like (53c). One of the situations that will make (53a) false is if for all politicians, Max considered the possibility that they are corrupt. But, under the construal in (53c), it can still be easily true in that state of affairs. It is highly likely that there is some nonpolitician entity about which Max did not consider the possibility that it is a politician (and corrupt). The problem can easily be extended to modals (e.g., take (53a) with Max could have, or should have considered, instead of did not consider).
Again, this problem surfaces equally in the other contexts of existential wide scope we examined, like wh–in situ and sluicing.
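The truth-value contrasts discussed for (49) versus (50b) and for (53b) versus (53c) can be verified on a toy model. The Python sketch below is a hedged illustration only: the domain and all facts are hypothetical, the conditional is reduced to material implication in one world, and the attitude context "consider the possibility that" is crudely flattened into two sets of entities (an assumption made purely to keep the example extensional).

```python
# Toy extensional model; the domain and every fact are illustrative assumptions.
DOMAIN = {"Kripke", "Quine", "Smith", "Jones", "teapot"}
PHILOSOPHER = {"Kripke", "Quine"}
POLITICIAN = {"Smith", "Jones"}
INVITED = {"Kripke", "Quine"}   # assume we invite every philosopher
MAX_OFFENDED = False            # and assume Max is not offended

# Crude stand-ins for the attitude context in (53): the entities x for which
# Max considered the possibility "x is corrupt" / "x is a corrupt politician".
CONSIDERED_CORRUPT = {"Smith", "Jones"}
CONSIDERED_CORRUPT_POLITICIAN = {"Smith", "Jones"}


def implies(p, q):
    """Material implication (a simplification of the natural-language 'if')."""
    return (not p) or q


def conditional_in_situ():
    """(49): the restriction stays inside the antecedent of the conditional."""
    return any(
        implies(x in PHILOSOPHER and x in INVITED, MAX_OFFENDED) for x in DOMAIN
    )


def conditional_qr():
    """(50b): the restriction is outside the conditional."""
    return any(
        x in PHILOSOPHER and implies(x in INVITED, MAX_OFFENDED) for x in DOMAIN
    )


def negation_qr():
    """(53b): the restriction is outside the scope of negation."""
    return any(x in POLITICIAN and x not in CONSIDERED_CORRUPT for x in DOMAIN)


def negation_in_situ():
    """(53c): the restriction stays inside the scope of negation."""
    return any(x not in CONSIDERED_CORRUPT_POLITICIAN for x in DOMAIN)
```

In this model the two QR construals come out false, as intended for a world in which every philosopher is invited without offending Max and every politician was considered possibly corrupt. The two in-situ construals come out true, because any nonphilosopher or nonpolitician serves as a witness.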
(54) a. Max would not even consider the possibility that some politician is corrupt, and you can easily imagine which.
     b. Which journalist did not consider the possibility that which politician is corrupt?
In these cases, the fact that the existential must have wide scope is syntactically determined (as we saw). If wide scope is interpreted by absorption, the derivations in (54) only have the construal along the lines of (53c).
Donald Duck, then, should be a perfectly possible answer, as the value of which (politician). I have focused here only on cases that involve wide scope out of islands, since these are the areas where the traditional QR faces problems. However, if we allow into the computational system the option of assigning wide scope via unselective binding, there is no way to restrict it to just the cases in which an existential occurs in an island (except, of course, by stipulation). If this is a free legitimate operation, then the interpretation problems we have observed extend to the most basic cases. Specifically, the negation problem will show up everywhere, as below.
(55) a. The students did not understand some argument.
     b. ∃x ¬(the students understand x & x is an argument)
(56) a. Which students did not understand which argument?
     b. Which ⟨x, y⟩ ((x is a student) & ¬(x understand y & y is an argument))
        {P | ∃⟨x, y⟩ (student(x) & P = ^¬(x understand y & argument(y)) & true(P))}
If we leave the N-set in situ and interpret it as in (b) of these examples, it is hard to imagine a context in which (55a) could be false in our actual world, and anything could serve as a value of y in (56b). Any attempt to pull the restriction somehow outside of the scope of negation would eventually amount to just an implicit application of QR,25 though in these cases it is possible to apply QR to also rescue a correct representation for the sentences. The problem is how to block the interpretations derived here, if scoping by unselective binding exists.
2.4 The Semantic Problem with Island-Free QR
We seem to be back where we started. So far, the only analysis that appears to capture the wide scope of existentials is island-free QR. In all the examples discussed in section 2.3, the semantic problem will be eliminated if we just keep QR as the way to derive existential wide scope. It may appear that an option left is forgetting about the syntactic problems of the island-free QR as well as the sluicing problem, which cannot be addressed within the QR framework, and sticking with island-free QR in order to save at least the elementary question of interpretation. This last option, however, rests on the assumption I have made so far, that the island-free QR captures the semantics of existential wide
scope, and that the problems are only syntactic. There is good reason to doubt this assumption, which surfaces when we focus on plural (cardinal) existentials. Plural existentials can be interpreted distributively or collectively. Let us focus on the distributive reading, illustrated in (58).
(57) A guard is standing in front of two buildings.
(58) There are two buildings such that in front of each there stands a guard.
The wide-scope distributive reading of (57) is paraphrased in (58). In principle two buildings could also be interpreted with a narrow scope, or with wide-scope collective construal, but these two interpretations happen to be inconsistent with world knowledge (entailing one and the same guard standing in front of (a set of) two buildings). Hence, it is easy to focus here on the intended reading. To derive this reading, the NP two buildings must first be raised by QR, as in (59).
(59) [two buildings]ᵢ [a guard is standing in front of eᵢ]
(60) ∃ two x (building(x) & ∃y guard(y) & y stands in front of x)
(61) a. [two buildings] λz (a guard is standing in front of z)
     b. ∃X (two(X) & building(X) & X D λz (∃y (guard(y) & y is standing in front of z)))
From there on, there are two basic (families of) approaches. One is that the raised NP is interpreted as a standard generalized quantifier (GQ) over the domain of singular individuals, which is then necessarily distributive. In that case, the interpretation will be already equivalent to that informally represented in (60), which is what we wanted to capture. The alternative is to assume that the basic (or only) interpretation of plural cardinals is as sets, or “plural individuals,” and a distributivity operator makes the predicate apply to each singular member of this set. In this particular example, the predicate is not just the VP, but the complex λ-predicate of (61a). I represent this schematically in (61b), where D stands for a distributivity operator.
The effect of this operator is that for each x that is a member of the set X of two buildings, the λ-predicate holds, namely, there is a guard standing in front of it. There are many implementations of this approach, and the details are not important for the present discussion. In any case, the interpretation along this line of
argument should be equivalent to that of the standard generalized quantifier interpretation (as in Barwise and Cooper 1981). In (59), QR is nonproblematic, since there are no islands on the way. Our question is whether it can also apply in the same way outside of an island, as entailed by the island-free approach. Ruys (1992, 1996) observed that when existentials take scope outside of an island, they do not, in fact, allow the GQ (distributive) interpretation. Let us see this with one of his examples (from Ruys 2000).
(62) If three relatives of mine die, I will inherit a house.
Example (62) has the interpretation we have been calling all along the wide existential scope; that is, it can be construed as talking about three specific relatives of mine (rather than about any three relatives, as would be the case under the narrow-scope reading, which is also available for this sentence). Nevertheless, it does not have the GQ-reading we have been examining. Suppose QR applies, as in (59), to generate (63). Applying the standard GQ-interpretation, we get here (an equivalent of) (63b).
(63) a. [three relatives of mine]ᵢ [if eᵢ die, I will inherit a house]
     b. ∃ three x (relative of mine(x) & (x dies → I inherit a house))
Construed this way, the sentence will be true if there are three relatives for each of whom it holds that if she or he dies, I inherit a house. That is, I could inherit a house if only one of these three relatives dies. But (62) clearly cannot be true in this case. The only wide-scope interpretation it has is that there is a set of three relatives, such that if each one of them dies, I inherit a house (in other words, they all have to die). Under the standard GQ-construal of existentials, however, the distributive construal in (63b) is the only interpretation we can derive for the syntactic representation (63).
Thus, applying free QR to an existential GQ both fails to capture the wide-scope reading the sentence has, and generates a reading it does not have.26 This may appear as no concern for approaches that assume, following the DRT tradition, that the existentials of the relevant type are never interpreted as generalized quantifiers anyway. In these approaches, the distributive operator is always independent of the scope of the existential. (See Szabolcsi 1995 for a survey.) But in fact, the same wrong interpretation also arises in these approaches. Suppose three relatives of mine denotes a set, or a plural individual. The predicate applying to it, if we let QR apply here, is the λ-predicate in (64a). Applying the distributive operator to that predicate, as in (64b), we get precisely the same interpretation as in (63b), which the sentence does not have.
(64) a. [three relatives of mine] λz (if z dies, I inherit a house)
     b. ∃X (three(X) & relatives of mine(X) & X D λz (z dies → I inherit a house))
     c. ∃X (three(X) & relatives of mine(X) & (X D λz (z dies)) → (I inherit a house))
Under this second approach it is also possible to generate the correct interpretation for the sentence: if we apply the distributive operator to the predicate die, rather than to the full λ-predicate of (64a), we get the correct interpretation, as in (64c). (That each member of the set dies is now inside the conditional.) The problem is, however, that there is nothing to prevent the wrong interpretation we are considering from being generated as well. As we saw in (61), the distributive operator must also be able to apply to predicates created by QR, to allow for the wide-scope distributive readings of such sentences, so nothing rules it out in the case of (63). The problem with allowing island-free QR, then, is that this overgenerates, deriving nonexistent readings, also if the existential is not a standard generalized quantifier. Several other instances of this problem are discussed in Winter 1997. In several studies, it was argued that, under their distributive interpretation, numeral indefinites can never, in fact, have wide scope over NPs they do not c-command; that is, the distributive scope of existentials is only their overt scope. I will return to this issue in detail in section 2.7.3, where I offer an alternative account for the facts that motivated this view.27 One thing we can safely conclude is that wide scope outside an island is not possible, if a plural existential is construed as a (standard) generalized quantifier. So whatever this scope option is, it is not what would be obtained by applying QR to such a GQ.
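The difference between the representations (64b) and (64c) can be made concrete in a small finite model. The sketch below is only an illustration under assumptions that are not part of the text: the four relatives and their names are hypothetical, the inheritance rule is stipulated, and a conditional is treated as true when its consequent holds in every scenario that satisfies its antecedent.

```python
from itertools import combinations

# Hypothetical toy model: four relatives of mine.
RELATIVES = ["ann", "bob", "cal", "dee"]

def scenarios():
    """Every possible set of relatives who die."""
    for r in range(len(RELATIVES) + 1):
        for dead in combinations(RELATIVES, r):
            yield frozenset(dead)

def inherits(dead):
    # Assumed rule: I inherit a house only once ann, bob and cal have all died.
    return {"ann", "bob", "cal"} <= dead

def conditional(antecedent):
    """'If <antecedent>, I inherit a house': the consequent must hold
    in every scenario in which the antecedent holds (an assumption)."""
    return all(inherits(dead) for dead in scenarios() if antecedent(dead))

def distributive_64b():
    # (64b): the distributivity operator applies to the whole lambda-predicate,
    # so the conditional must hold separately for each member z of X.
    return any(
        all(conditional(lambda dead, z=z: z in dead) for z in X)
        for X in combinations(RELATIVES, 3)
    )

def collective_64c():
    # (64c): the distributivity operator applies only to 'die',
    # so "each member of X dies" sits inside the antecedent.
    return any(
        conditional(lambda dead, X=X: set(X) <= dead)
        for X in combinations(RELATIVES, 3)
    )
```

In this model `collective_64c()` holds (the set of ann, bob and cal verifies it), while `distributive_64b()` fails, since no single death guarantees the inheritance. The two representations therefore have distinct truth conditions, which is why generating both for (62) is an overgeneration problem.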
The other approaches, viewing existentials as non-GQ, are often less restricted, so it is possible that new machinery can be introduced to handle the problem posed by sentences like (62).28 However, the point remains that the semantics of existential wide scope out of islands is not as entailed by an island-free QR, unaided by lists of stipulations. In section 2.6.5 we will see other instances where it is not clear that natural language has the full range of options predicted by allowing QR to generate free wide scope of standard existential GQs.
2.5 An Intermediate Summary
Let me summarize the picture that emerges. We have seen that existentials, unlike universally quantified NPs, allow arbitrary wide scope. This
cannot be dismissed as a problem of vagueness. Nor can it be reduced to “specificity” or “referentiality” options of existential NPs. Apart from this problem, however, there is no serious reason to abandon the earlier optimism of the QR view. We saw (in section 2.1.5) that there are many instances where a rule like QR is needed, and in all these cases it behaves, essentially, as entailed by known constraints on syntactic movement. In the specific area of relative quantifier scope, there may be further restrictions, or contextual strategies, that dictate scopal preferences and exclude options permitted by QR. Furthermore, as I have mentioned, it has been observed that, except for the case of existentials, nonovert quantifier scope is a marked option: it is often very hard to obtain and requires a strong discourse motivation. I return to the view of scope-shift as a marked operation in section 2.7. The serious problem, however, is the free wide scope of existentials. To capture this apparent wide scope we need to assume a completely ad hoc rule that is specific to existentials of the relevant type and that is free of any syntactic constraints. The no-movement alternatives we have examined so far (in section 2.3) dramatically fail to capture the interpretation of wide scope. Apart from the theoretical cost of this ad hoc rule, it faces empirical syntactic problems, for example, in the area of wh–in situ (touched on briefly in section 2.2). On top of that, as we last saw in section 2.4, it is not obvious that this ad hoc QR rule can always capture the truth conditions of wide scope found in natural language. Obviously, what we would like to have is some way to capture the behavior of the relevant existentials without moving them, and still get their truth conditions right. An alternative implementation of what QR captures has been developed in the DRT tradition. This is based on assuming restricted variables.
In Kamp and Reyle 1993, it is postulated that all the NP-internal restrictions are entered when the discourse variable is introduced (i.e., at the top box). Szabolcsi (1995, 1997) proposes that these variables range over minimal witness sets of the GQ that the relevant NP denotes. This line of thought resembles the mechanism of quantifier storage of Cooper 1983. We know already, from the previous round of quantifier storage, that this mechanism is, indeed, equivalent to QR, since it has the same effect as pulling the whole NP out of its original position, so these implementations face none of the problems discussed in section 2.3. To evaluate the predictions this approach makes in comparison to QR, we need to know how precisely this pulling out of the restriction is derived compositionally, which is not always spelled out. But a fully formal implementation of this
storage procedure is provided in Abusch 1994. Due to the explicit execution, it is easy to observe that her system is precisely equivalent to QR, hence it faces the same problem we observed for QR in section 2.4.29 Nevertheless, all the approaches surveyed here, including those in terms of unselective binding that we surveyed in section 2.3, are aiming at precisely the same intuition, that it should somehow be possible to capture the interpretation of indefinites of the relevant type in situ. I believe this same intuition can be captured in the analysis I propose next, which expands the choice-function approach to existential wide scope suggested in Reinhart 1992. In sum, what I assume is that QR can generate nonovert scope, subject to standard constraints on movements. All GQs can undergo QR, including the existentials under consideration. But these existentials share a property that also enables them to get wide scope without movement. For this reason, it is easy for them to obtain any nonovert scope, as opposed to the other GQs, for which the only way open is the costly one of obtaining it by movement.
2.6 Where No QR Is Needed: Choice Functions for Existential Quantifiers
2.6.1 Choice Functions and Existential Closure
The interpretative problem is how to assign wide scope to existential NPs, which, otherwise, show properties of remaining in situ. Specifically, how can the N-restriction remain in situ, while still being interpreted as a restriction on a remote operator? Taking the idea seriously that the existential NP does not have to move means that it should be interpretable as an argument (rather than either a predicate or a generalized quantifier). A simple way to do that, outlined in Reinhart 1992, is to allow existential quantification over choice functions. As a first approximation, let us assume the following description of choice functions: A function f is a choice function (CH(f)) if it applies to any nonempty set and yields a member of that set. Let me first illustrate the intuition behind this line of thought, before addressing its formal properties in the following sections. This requires abstracting away, for the time being, from the question what happens when the N-set is empty (I turn to that question in section 2.6.5). Suppose we want to represent the wide scope of some book in (65a), without pulling its restriction out. This can be done as in (65b).
(65) a. Every lady read some book.
     b. ∃f (CH(f) ∧ ∀z (lady(z) → z read f(book)))
     c. ∃x (book(x) ∧ ∀z (lady(z) → z read x))
In (65b), a choice function applies to the set of books. The function variable can be bound by an existential operator arbitrarily far away. Representation (65b) says that a function exists, such that for every z, if z is a lady, then z reads the book selected by this function. As desired, f(book) here is an argument (of read), which corresponds to the fact that its NP stayed, syntactically, in an argument position, and denotes the value of the function f, that is, a given book. Note that the choice function used here is simpler than the more familiar Skolem functions, employed to capture narrow scope of existentials, where the choice of value for them varies with the choice of value for some bound variable. Though choice functions have been studied by logicians (since Hilbert and Bernays [1939] 1970), not much attention has been given before, in the linguistics literature, to this (choice-function) option of capturing existential wide scope. This is possibly since capturing wide scope in cases like (65) has never seemed a particularly interesting problem.30 Consequently, this use of choice functions has not been fully researched. Still, in a model where the N-set is not empty, (65b) is equivalent to the standard existential wide scope in (65c), which is the interpretation we want to capture. Let us check how the same procedure applies when the N-restriction occurs in the antecedent clause of an implication, since these are the contexts that posed problems to absorption or unselective binding. Examples (47), (48), and (50) are repeated here.
(47) If we invite some philosopher, Max will be offended.
(48) Derivation without QR (unselective binding)
     a. ∃ᵢ [if we invite [some philosopher]ᵢ Max will be offended]
     b. ∃x ((philosopher(x) ∧ we invite x) → (Max will be offended))
(50) Derivation with QR
     a. [Some philosopher]ᵢ [if we invite eᵢ Max will be offended]
     b. ∃x (philosopher(x) ∧ (we invite x → Max will be offended))
(66) Choice-function interpretation
     ∃f (CH(f) ∧ (we invite f(philosopher) → Max will be offended))
The problem we observed with (47), repeated above, was that if we leave the N-restriction in situ, and bind it unselectively, as in (48), the sentence ends up a necessary truth in any world that contains nonphilosophers.
(Assume, e.g., that the philosophers-set is not empty in that world. Nevertheless, there is always some nonphilosopher entity, of which the implication is true.) This problem is eliminated when we apply the choice-function procedure, as in (66). Although the N-restriction stays in situ just the same, the NP–in situ can now denote only a philosopher. (Representation (66) says that a function exists, such that if we invite the philosopher it selects, Max will be offended.) Assuming, again, that the philosophers-set is not empty, (66) ends up equivalent to the standard representation of wide scope in (50b), repeated above, which is obtained if we apply an island-free QR. A different question, which we have been postponing (until section 2.6.5), is what happens when the philosophers-set is empty. Negation contexts are also no longer a problem. The wide existential scope in (53a), repeated in (67a), is represented in (67b). What occurs in the scope of negation here is the politician selected by the function. (Example (67b) asserts that a function exists, such that it is not the case that Max considered the possibility that the politician it selects is corrupt.) So it is no longer the case that anything could be a value of the variable.
(67) a. Max did not consider the possibility that some politician is corrupt.
     b. ∃f (CH(f) ∧ ¬(Max consider the possibility that f(politician) is corrupt))
Next, let us look at the cases of “intermediate” wide scope discussed in section 2.1.3. As we saw, in (21), repeated in (68a), the choice of a problem may vary with the choice of a linguist, in which case some problem is not specific. Still it can take scope over every analysis.
(68) a. Most linguists have looked at every analysis that solves some problem.
     b. [Most linguists]₁ [[every analysis that solves some problem]₂ [e₁ looked at e₂]]
     c. For most linguists x, ∃f (CH(f) ∧ ∀y ((analysis(y) ∧ y solves f(problem)) → (x looked at y)))
Assuming that existentials can be interpreted without movement, via a choice function, this reading of (68a) is not a problem. Existential closure of the function variable (its binding by an existential operator) is a purely interpretative procedure, applying arbitrarily far away, so there is no reason this existential should not also be introduced in the scope of another operator. If it is introduced as (informally) in (68c), we obtain the interpretation under consideration. (In the QR-framework, this representation
will be derived by first applying QR to the every QNPs, as in (68b). The binding existential can be introduced anywhere in that derivation. But these are independent details, on which nothing hinges for the present discussion.) Let us turn to the problem of wh–existentials. Crucially, in all standard semantic approaches to questions (e.g., in the approach of Karttunen that I have assumed here), wh-NPs are translated as existential quantifiers. Hence, we can apply to them straightforwardly the same mechanism of quantifying over choice functions. In (69a), which lady moved overtly, but we are interested in the interpretation of the wh–in situ which book. Let us abstract away, first, from the moved NP, and maintain the standard (existential) interpretation for it. For the wh–in situ which book we apply a choice function, yielding f(book). The function variable will then be bound by the relevant question operator, as illustrated informally in (69b), yielding the question denotation (69c). (The question denotes here the set of true propositions P, each stating for some lady x and for some function f that x read the book selected by f.)
(69) a. Which lady e read which book?
     b. For which ⟨x, f⟩, (lady(x)) and (x read f(book))
     c. {P | ∃⟨x, f⟩ (CH(f) ∧ lady(x) ∧ P = ^(x read f(book)) ∧ true(P))}
     d. {P | ∃⟨g, f⟩ (CH(g) ∧ CH(f) ∧ P = ^(g(lady) read f(book)) ∧ true(P))}
As for the moved which lady, technically, it is no longer in an argument position, so it cannot be directly interpreted as an argument of the form f(lady). If we want to nevertheless maintain uniformity of interpretation for all wh-expressions, some covert syntactic operation could apply to (69a), to turn it back into an argument (either introducing a λ-operator, or reconstruction), in which case, an interpretation like (69d) could be assigned. Nothing here hinges on whether we decide to do this or not. Turning to the conditionals problem repeated in (70), we apply the same procedure, where the choice function selects a value from the philosophers-set. Although the restriction occurs in an if-clause, the values permitted in the answer can only be from the philosophers-set, as we saw in the discussion of (66), with the existential some philosopher.
(70) a. Who will be offended if we invite which philosopher?
     b. For which ⟨x, f⟩, if we invite f(philosopher), x will be offended.
     c. {P | ∃⟨x, f⟩ (CH(f) ∧ P = ^(we invite f(philosopher) → x will be offended) ∧ true(P))}
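The claim that answers to (70a) can only draw their values from the philosophers-set can be sketched over a finite model. Everything named below is an assumption made for illustration: the individuals, the relation WOULD_OFFEND (which stands in for the truth of "if we invite y, x will be offended"), and the simplification of representing each proposition P by a pair (x, y) rather than by an intension. Choice functions are restricted to the one set the formula mentions, which loses nothing here.

```python
from itertools import product

def choice_functions(sets):
    """Yield each choice function, restricted to `sets`,
    as a dict mapping a set to the member it selects."""
    sets = list(sets)
    for picks in product(*(tuple(s) for s in sets)):
        yield dict(zip(sets, picks))

# Hypothetical toy model.
DOMAIN = {"kant", "hume", "donald", "max", "lucie"}
PHILOSOPHER = frozenset({"kant", "hume"})
# Assumed facts: (x, y) means "if we invite y, x will be offended" is true.
WOULD_OFFEND = {("max", "kant"), ("lucie", "hume")}

def answers_choice_function():
    # (70c): the true propositions of the form
    # "we invite f(philosopher) -> x will be offended".
    return {
        (x, f[PHILOSOPHER])
        for x in DOMAIN
        for f in choice_functions([PHILOSOPHER])
        if (x, f[PHILOSOPHER]) in WOULD_OFFEND
    }

def answers_unselective():
    # Restriction left inside the if-clause and bound unselectively:
    # "(philosopher(y) & we invite y) -> x will be offended".
    # For a nonphilosopher y the antecedent never holds, so the
    # proposition counts as true whatever x is.
    return {
        (x, y)
        for x in DOMAIN
        for y in DOMAIN
        if y not in PHILOSOPHER or (x, y) in WOULD_OFFEND
    }
```

Every pair returned by `answers_choice_function()` has a philosopher as its second member, whereas `answers_unselective()` also admits pairs such as `("max", "donald")`, the Donald Duck problem noted earlier for unselective binding.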
The sluicing cases also follow straightforwardly. As we saw, following Chung et al., the antecedent clause of, for example, (71a) gets copied into the sluice clause as it is, yielding (71b).
(71) a. Max and some lady disappeared, but I can’t remember which [ ]
     b. Max and some lady disappeared, but I can’t remember which [Max and some lady disappeared]
(72) a. But I can’t remember which [Max and f(lady) disappeared]
     b. But I can’t remember which f (Max and f(lady) disappeared)
     c. But I can’t remember {P | ∃f (CH(f) ∧ P = ^(Max and f(lady) disappeared) & true(P))}
Let us now focus on the resulting second conjunct in (71b). The embedded question there looks like gibberish, and the analysis makes sense only if the correct question interpretation may be derived for it. Recall that the determiner some plays no role, semantically, so we may treat the indefinite some lady in the same way we have been doing above, by introducing a choice-function variable to select from the set of ladies, as in (72a). This function variable must now be existentially closed. Since it occurs in a question context, it gets bound by the existential activated by which. This binding is illustrated informally in (72b), and it is translated into the standard question representation, in (72c). Example (72c) is, indeed, the question denoted by the second conjunct of (71a), namely, the interpretation we wanted to derive for the sluiced part. The “correlate” existential can, under such construal, occur in an antecedent of an implication or in the scope of negation, since it will be correctly interpreted, as in the previous cases we examined in (68) and (70). It is easy to observe that the same analysis will apply to all the cases we have considered so far, with the correct results, so it seems that assigning wide scope to existentials without moving their restriction is possible.
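The comparison among the three construals of (47), namely (48b), (50b), and (66), can be checked mechanically in a small model. The sketch below is illustrative only: the three individuals are hypothetical, the conditional is read as material implication within a single situation, and choice functions are enumerated only for the set the formula mentions.

```python
from itertools import combinations, product

def choice_functions(sets):
    """Yield each choice function, restricted to `sets`,
    as a dict mapping a set to the member it selects."""
    sets = list(sets)
    for picks in product(*(tuple(s) for s in sets)):
        yield dict(zip(sets, picks))

def implies(p, q):
    return (not p) or q

# Hypothetical toy model: two philosophers and one nonphilosopher.
DOMAIN = {"kant", "hume", "donald"}
PHILOSOPHER = frozenset({"kant", "hume"})

def unselective_48b(invited, max_offended):
    # (48b): Ex ((philosopher(x) & we invite x) -> Max will be offended)
    return any(
        implies(x in PHILOSOPHER and x in invited, max_offended)
        for x in DOMAIN
    )

def qr_50b(invited, max_offended):
    # (50b): Ex (philosopher(x) & (we invite x -> Max will be offended))
    return any(
        x in PHILOSOPHER and implies(x in invited, max_offended)
        for x in DOMAIN
    )

def choice_66(invited, max_offended):
    # (66): Ef (CH(f) & (we invite f(philosopher) -> Max will be offended))
    return any(
        implies(f[PHILOSOPHER] in invited, max_offended)
        for f in choice_functions([PHILOSOPHER])
    )

def all_situations():
    """Every combination of who is invited and whether Max is offended."""
    members = sorted(DOMAIN)
    for r in range(len(members) + 1):
        for invited in combinations(members, r):
            for max_offended in (False, True):
                yield set(invited), max_offended
```

Across all sixteen situations of this model, `choice_66` agrees with `qr_50b`, while `unselective_48b` is true in every one of them, because the nonphilosopher always falsifies the antecedent. With both philosophers invited and Max not offended, the first two are false and the third is still true.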
2.6.2 Deriving the Choice-Function Interpretation
Let me first be more specific on how the choice-function interpretation is compositionally derived. One of the basic insights of DRT is that indefinite NPs of the relevant type lack a quantificational determiner; that is, what may appear syntactically to be an indefinite determiner (a, some, three) is not a determiner in the semantic sense, which could turn the NP into a standard generalized quantifier. This means that an indefinite NP of this type just denotes a
predicate (of type ⟨e, t⟩), and the question is how we proceed from that starting point. It will be useful to have, at this point, some picture of the internal structures of NPs. In the syntactic framework it is assumed that the relevant projection here is DP (Determiner Phrase), which contains an NP. Without entering the massive syntactic literature on the analysis of DPs, let us assume that what indefinite DPs lack in this case is Spec of DP, so their structure is as represented schematically in (73), a view developed in Danon 1996.31
(73) [tree diagram not reproduced]
Let us assume, further, that what determines the quantificational force of a DP is its Spec, which hosts (semantic) determiners of the GQ-type. The D-head of the projection only hosts features (relevant both for syntactic agreement and for interpretation) like number, +/−wh, or gender (in some languages). The determiner words here are all (X⁰), so they can serve as the D-heads. However, they can also head a projection of their own and be inserted at the Spec position. In the DRT framework (e.g., Kamp and Reyle 1993), it is assumed that the indefinites of our relevant type can never also be construed as GQ, which, in our syntactic terms means they only have the structure in (73). This, however, is not really motivated syntactically. In principle, heads can project to an XP, and as XPs, they can occur in the Spec position. This is visible with modified numerals, such as more than three and exactly three, which are clearly XPs, but there is no reason why, say, three alone cannot also head an XP (unmodified). If it does, it is inserted in SpecDP. If this is the case, Spec is interpreted as a standard semantic (existential) determiner, and the DP denotes a GQ.32 I will assume, then, that the indefinites of the relevant type also allow the standard GQ-interpretation, though nothing in my semantic analysis actually hinges on this assumption, so it can be excluded, by stipulation.33 In any case, we are now concerned with how the structure in (73) is interpreted. As observed, the D′-projection (which includes the NP)
denotes only a predicate (based on the N-set), with at most a cardinality marker. Let us start with the case of singular indefinites. The neutral assumption is that the predicate is of type ⟨e, t⟩. To enable function application (say, with the VP-denotation), some covert function must be introduced, to do the job of the empty Spec, which usually hosts a function. In principle, this could be either of the type ⟨⟨e, t⟩, ⟨⟨e, t⟩, t⟩⟩, which would turn the DP into a generalized quantifier, or of the type ⟨⟨e, t⟩, e⟩, which turns it into an individual. A choice-function analysis along the first line of thought is developed in Winter 1997, but here I will pursue the second. A choice-function variable is, then, introduced, as in (74).34
(74) [tree diagram not reproduced]
At this stage, we have a function variable, which must still be existentially closed. The intuition expressed in DRT, that indefinites of this type correspond in some sense to free variables, can thus be maintained without assuming the individual variables of Heim 1982. For the binding of the function variable, we may assume the procedure of existential closure discussed in Heim 1982. However, I assume, crucially, that such closure can apply only to function variables, and in no case do we allow unselective binding of individual variables. The default assumption is that closure can apply freely anywhere. If it needs to be further restricted, this would require some special restriction posed by the computational system, since it could not follow from logic. (Such restrictions were proposed by Heim for unselective binding.) But this does not seem necessary, since, as we saw in section 2.1.3, the so-called intermediate wide-scope readings exist. These are derived if existential closure applies in the scope of another operator. In cases of wh–in situ, such as which woman, under the semantics we assumed for them all along, they are viewed just as standard existentials, hence at the local NP-level they can be analyzed just as in (73). However, they differ from the other existentials in that their binding existential operator must be inserted in a predetermined position in the scope of the question-formation operator (which forms the set of propositions denoted
by the question). In English, the position where closure applies is marked by the wh-constituent that moved overtly. Summarizing, I assume that the computational system allows two interpretative procedures for indefinites. They can be either construed as standard existential generalized quantifiers (over singular individuals), or with the choice-function interpretation. On the first, they behave like any other GQ, and their scope is restricted by syntax (i.e., it is either the overt scope, or that permitted by an island-sensitive QR). On the second, they can have any scope, depending on where we apply existential closure. (The assumption that the standard GQ-construal is available as well can be dropped, without affecting the analysis of the second procedure.)
2.6.3 The Collective-Distributive Distinction
When we turn now to plural indefinites, alongside the question of scope, there is the question of the distinction between the collective and the distributive readings of existentials. The standard GQ-construal always yields the distributive interpretation. Some procedure must be assumed, in all approaches, for also deriving the collective interpretation. Since we now have at our disposal an additional construal of indefinites, based on (74), we would not like to assume that on top of the choice-function mechanism for scope we also have a separate machinery for collectivity. I will argue that, indeed, the same choice-function procedure is also what generates the collective interpretation of plural indefinites. That the two are related can be witnessed by again examining the problem raised by Ruys (1992), in (62), repeated here.
(62) If three relatives of mine die, I will inherit a house.
(63) a. [three relatives of mine]ᵢ [if eᵢ die, I will inherit a house]
     b. ∃ three x (relative of mine(x) & (x dies → I inherit a house))
We saw that under the wide scope of the existential, it cannot be construed as a standard GQ.
If we apply QR to a GQ, we get a reading equivalent to (63b), which can be true in a situation where just one of the relevant three relatives dies, a reading the sentence does not have. The only interpretation available is where the existential is taken as a collective set of three relatives, all of whom must die for the implication to be true. To proceed, we need, first, some analysis of plurals and collectivity. It is widely assumed that the cardinal in indefinites construed collectively (i.e., the D-head in (73)) is interpreted as some sort of a modifier, as in (75) (e.g., in the modification view of Kamp and Reyle 1993, or in Higginbotham 1985, where a modification structure is described as enabling two variables to be “discharged” by the same operator).
(75) a. Three women chatted.
     b. ∃x (women(x) ∧ three(x) ∧ chatted(x))
But what is x in (75)? It could not be a standard individual variable, since we are not talking here about an individual with the property of being three. So it must denote a set, which appears to distinguish it from the case of singular indefinites. A desire common to many approaches is to keep type uniformity in the analysis of singular and plural indefinites (though this is not a conceptual necessity). Two (families of) ways are available for that: either to reduce plural sets to individuals, as in the tradition of Link 1983, or to lift singulars to sets, as proposed in Scha 1981. I will follow the second approach here, since it enables one of the solutions to the empty-set problem that I discuss in section 2.6.5. But other implementations are certainly conceivable. On this view, the predicate must be lifted to type ⟨⟨e, t⟩, t⟩, so it can apply to the set argument. This can be represented as in (76), where the Scha-star on the verb indicates that it denotes this higher type. (I will ignore this star in the subsequent discussion.)
(76) ∃X (women(X) ∧ |X| = 3 ∧ *chatted(X))
(77) a. Some/a/which woman chatted.
     b. (. . .) ∃X (women(X) ∧ |X| = 1 ∧ *chatted(X))
Under the uniformity approach, singular indefinites are interpreted the same way, with the cardinality being 1, as in (77).35 Thus, singular indefinites (when not construed as a GQ) denote a singleton set. Since the predication now is of the higher type, the next question is how it can distribute over individuals in the argument set, when it is a plural set and this is a relevant interpretation.
For the present discussion, any of the available approaches to distributivity can be assumed.³⁶ Returning now to the choice-function procedure, under this implementation, the value of a choice function applying to the set in D′ must always be a set rather than an individual, as assumed before (in (74)). That is, the function variable applies to a set of sets, and selects a set, as represented in (78a). To allow its use also under different implementations, let us assume the schematic description of the choice-function type in (78b), where T stands for a type, and its value may be either ⟨e⟩ or ⟨e, t⟩.
(78)
So, the representation of the NPs in (76)–(77) under the choice-function construal is given in (79b)–(80b). For convenience, I will continue to use the informal notation in (c), but it should be read as (b).

(79) a. three women
     b. f ({X | women (X) ∧ |X| = 3})
     c. f(three women)
(80) a. Some/a/which woman chatted.
     b. f ({X | women (X) ∧ |X| = 1})
     c. f(woman)

With this assumed, we can turn to Ruys’s problem in (62), repeated again in (81a), which posed a problem for the QR-view. The function variable is introduced to apply to three relatives in situ. Its value now is a set of three relatives. Since we are interested in the wide-scope construal, the function variable is existentially closed outside the conditional. The result is abbreviated in (81b), which should be read as (81c).

(81) a. If three relatives of mine die, I will inherit a house.
     b. ∃f (CH (f) ∧ (f(three relatives of mine) die → I inherit a house))
     c. ∃f (CH (f) ∧ ((f ({Y | relatives of mine (Y) & three (Y)}) die) → (I inherit a house)))

So, (81) now reads that there is a function f, such that if the set of three relatives it selects dies, I inherit a house. This death of a set is interpreted, under any distributivity mechanism, to mean that each member of this set dies. In its treatment of collectivity, this analysis is just one of the possible variants of the standard view, which we observed in section 2.4. However, as we saw in the discussion of (64), partially repeated below, a problem in capturing the wide-scope collective reading with island-free QR was that it is not obvious how to prevent a distributivity operator from applying to the whole λ-predicate obtained by QR, as in (64b), which yields the wrong distributive reading, just as (63) does.

(64) a. [three relatives of mine] λz (if z dies, I inherit a house)
     b. ∃X (three (X) & relatives of mine (X) & X ᴰλz (z dies → I inherit a house))

But this is precisely the problem eliminated by the choice-function approach. The indefinite is interpreted in situ (and it cannot, even optionally, be moved out of an island, since QR is island-sensitive). Thus, in (81) there is no new predicate formed at the covert structure. The only predicate that takes a set argument is die, hence it is only this predicate that can distribute. So we derive only the interpretation the sentence has: that there is a set of relatives, such that if each of them dies, I inherit a house.

Under this analysis, then, the choice-function procedure is what generates the collective interpretation of plural existentials. It applies uniformly to generate the relevant set locally (in situ). The question of scope is a byproduct. Since the function variable can be existentially closed anywhere, the scope of a collective existential NP is determined by where we choose to apply it. Recall that the system still allows existential NPs to be construed as (distributive) GQs. In a previous draft, I assumed, following Scha 1981, that this may be all we need: genuine distributivity is only obtained via the GQ-procedure. As for the distributivity effects of the predicate (like die, in (81)), Scha argued that they may follow from the lexical semantics of the predicate, with no need to assume a special distributivity operator. However, Winter (1997) and Heim (personal communication) point out that this cannot be maintained, in view of more complex examples.³⁷ I therefore assume now that such an operator is needed. If this is the case, it becomes less obvious why we should also allow the GQ-interpretation of the relevant existentials. Currently, this creates a redundancy, allowing two ways to derive what appears to be the same distributive reading.
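The contrast between the collective reading in (81c) and the distributive reading in (64b) can be checked in a small model. The Python sketch below is only an illustration under stated assumptions: the relatives, the `HEIRS` set, and the function names are hypothetical; the conditional is treated as holding across a set of toy "worlds" (each world is the set of relatives who die in it); and, because only one argument set is involved, quantifying over choice functions is replaced by quantifying over the triples such a function could select.

```python
from itertools import combinations

RELATIVES = ("ann", "bob", "cy", "dee")    # hypothetical relatives of the speaker
HEIRS = frozenset({"ann", "bob", "cy"})    # assumed: inheritance requires all three to die

# A world is identified with the set of relatives who die in it.
WORLDS = [frozenset(c)
          for r in range(len(RELATIVES) + 1)
          for c in combinations(RELATIVES, r)]


def inherit(world):
    """The speaker inherits the house only if every member of HEIRS dies."""
    return HEIRS <= world


def triples():
    """Possible values of f applied to {Y | relatives(Y) and three(Y)}."""
    return [frozenset(c) for c in combinations(RELATIVES, 3)]


def collective_wide_scope():
    """(81c): some selected set X such that, whenever all of X die, I inherit."""
    return any(all(inherit(w) for w in WORLDS if X <= w) for X in triples())


def distributive_wide_scope():
    """(64b): some set X such that, for each z in X, whenever z dies, I inherit."""
    return any(all(inherit(w) for z in X for w in WORLDS if z in w)
               for X in triples())


print(collective_wide_scope(), distributive_wide_scope())  # True False
```

In this model the collective reading is verified by the triple in `HEIRS`, while the distributive reading fails because no single death is enough for the inheritance.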
I leave open the question of whether the readings are indeed always identical, that is, whether a GQ-interpretation must still be available. The crucial point I would like to maintain, though, is that there is no need to assume both a mechanism for deriving free wide scope of existentials and a separate mechanism for collective interpretations. Rather, these are instances of one and the same choice-function procedure.

2.6.4 Which Indefinites Are Interpretable by Choice Functions?

So far I have left open the question of which indefinite NPs allow free wide scope, and, correspondingly, a choice-function interpretation. As noted in section 2.1, they must be weak, or existential (under Keenan’s (1987) definition of the term), but it is not the case that all existentials allow free wide scope. Beghelli (1993) and Szabolcsi (1995, 1997) argue that the relevant group includes only indefinites with unmodified (bare) numerals, of the kind I have used in the examples throughout (a, some, three, which, many, and so on). This is the group that, for Kamp and Reyle (1993), has only the set (or (plural) individual) interpretation. The other group, of existentials with modified numerals, includes all plural numerals that occur with any kind of modifier: less than three, more than three, exactly three, at least three, three or more, between three and five, and so on. Kamp and Reyle argue that NPs of this type are interpreted only as generalized quantifiers.

If this grouping of existentials is correct, it should mean, under the present analysis, that the second group does not allow a choice-function interpretation. Consequently, their maximal scope cannot be wider than that allowed by an island-restricted QR.³⁸ Next, since choice functions are what generate the collective readings, NPs of the second group should not allow a (genuine) collective interpretation. Both consequences are argued to be true in the studies cited, but let us examine the second, which may appear more problematic, as presented there. This requires more attention to collective predicates. Some such predicates, like meet, surround, or even lift a piano (under imperfective uses), also appear easily with a GQ (e.g., Most students met). But there is another group of collective predicates that does not, like be a good team/couple, or the collective weigh two pounds. Dowty (1986) suggests that in the predicates of the first group, there are subentailments regarding the role of each member of the set in the collective activity (so, loosely speaking, they remain distributive), but in the second, there are no such subentailments.
Possibly, another characteristic of the difference is that if a predicate of the first type is true for some set, it is not excluded that it is true of some subset of this set. (If 100 people surround the yard, it is not excluded that 70 of them also surround the same yard.) But in predicates of the second type, this is excluded: if three potatoes weigh two pounds together, then it is false that two of them do, and a subset of a good team is not the same good team. Yoad Winter (personal communication) observed that it is this second predicate group that should be checked to see whether an NP has a genuine (i.e., set) collective interpretation.³⁹ Indeed, bare numerals can occur with such predicates, as in (82), but strong GQs cannot, as in (83). The modified numerals in (84) pattern with the GQs, and it is much harder to assign any meaning to these sentences.

(82) a. Three / many potatoes weigh two pounds together.
     b. Ten / which workers in our office are a good team.
(83) a. *?Most potatoes weigh ten pounds together.
     b. *?All workers in our office are a good team.
(84) a. *?Less than five potatoes weigh two pounds together.
        *?At least three potatoes weigh two pounds together.
     b. *?More than ten workers in our office are a good team.
        *?Exactly ten workers in our office are a good team.

The question, in our terms, then, is why it is impossible to interpret modified numerals in the choice-function procedure. The puzzle posed by these numerals is that they do not form any known semantic set. They include monotone decreasing (less than three), nonmonotone (exactly three), and increasing quantifiers (more than three). Most puzzling is what semantic property could possibly distinguish between three and at least three. Kamp and Reyle (1993) argue that bare numerals are precisely those that “introduce a discourse referent.” The modified numerals lack this discourse property. As a diagnostic of this property, they offer an examination of the anaphora behavior of the two types: a question in discourse anaphora is whether a pronoun in sentences like (85) refers back to the N-set, or to the intersection set of N and the predicate.

(85) a. Five students left shortly after the exam started. They could not understand the questions.
     b. More than four students left shortly after the exam started. They could not understand the questions.

Suppose ten students actually left in our model. Could the pronoun nevertheless refer to just five students in (85a)? Kamp and Reyle’s judgment (shared by Szabolcsi (1997)) is that it can. But in (85b), the pronoun cannot refer to just any number greater than four. So, if only five students of those who left did not understand the questions, (85b) is false, but (85a) can still be true. The judgments here are subtle,⁴⁰ but they are clearer on the second Kamp and Reyle test, with intrasentential anaphora.
(86) a. Three portersᵢ broke a table theyᵢ lifted.
     b. At least three portersᵢ broke a table theyᵢ lifted.

Example (86a) has both the collective and the distributive reading of the predicate. Example (86b) has only the distributive one, namely, each of the three porters broke a table he lifted. Now, the pronoun under the collective reading of (86a) must be able to refer to the N-set alone.⁴¹

Suppose, as is very likely, that the NPs interpretable by choice functions (or, in Kamp and Reyle’s terms, as sets) also have some common discourse functions. Nevertheless, introducing the “discourse-referent” property is not, in itself, an answer to our problem, since this is not an inherent (logical) property of determiners or NPs, and the puzzle still remains why just the set of bare indefinites should have this discourse function. Possibly a pragmatic answer could be sought in terms of procedures of assessment. I believe this is the intuition behind Szabolcsi’s (1995, 1997) attempt to define this set of indefinites in terms of their witness sets. She assumes that existentials of this type involve existential quantification over minimal witness sets. On this view, we could say that the basic interpretation of all existential NPs alike is that of a generalized quantifier. However, a typical property of indefinites of the relevant set is that they allow assessment by checking just one minimal witness set of the GQ.⁴² Hence these indefinites are allowed to also be interpreted by existentially quantifying over such a witness set. (Translating this into the present framework, the choice function selects from the set of minimal witness sets.) As appealing as this line of thinking seems,⁴³ the problem currently is that it does not slice the set we want. Though it can be developed to exclude all non-monotone-increasing existentials, it cannot exclude the other modified numerals. Specifically, it is not obvious why more than three ladies smiled cannot be assessed with a minimal witness set of four ladies, or why at least three cannot be assessed by checking a minimal set of three. Nor is it obvious why it should not apply, just the same, in the case of strong increasing quantifiers.
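The point about witness sets can be made concrete with a small check. The Python sketch below is an illustration only: the determiner table, the function names, and the toy noun extension are hypothetical, the bare numeral is given its "at least" GQ meaning as an assumption, and witness sets are taken in the usual sense (subsets of the N-set that belong to the quantifier). The check asks, for each determiner, whether "some minimal witness set is included in the predicate" always agrees with the GQ truth conditions.

```python
from itertools import combinations

# Cardinality conditions on |N ∩ P|; the bare numeral is assumed to have the "at least" GQ meaning.
DETERMINERS = {
    "three": lambda n: n >= 3,
    "at least three": lambda n: n >= 3,
    "more than three": lambda n: n > 3,
    "exactly three": lambda n: n == 3,
    "less than three": lambda n: n < 3,
}


def subsets(items):
    """All subsets of a finite set, as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def gq_true(det, noun, pred):
    """Generalized-quantifier truth conditions for 'det N P'."""
    return DETERMINERS[det](len(noun & pred))


def minimal_witnesses(det, noun):
    """Witness sets (subsets of N in the quantifier) with no smaller witness inside them."""
    witnesses = [w for w in subsets(noun) if DETERMINERS[det](len(w))]
    return [w for w in witnesses if not any(v < w for v in witnesses)]


def witness_true(det, noun, pred):
    """Assessment by finding one minimal witness set included in the predicate."""
    return any(w <= pred for w in minimal_witnesses(det, noun))


def witness_check_agrees(det, noun):
    """Does witness-based assessment match the GQ for every predicate extension?"""
    return all(gq_true(det, noun, p) == witness_true(det, noun, p)
               for p in subsets(noun))


LADIES = frozenset({"l1", "l2", "l3", "l4", "l5"})  # hypothetical N-set
for det in DETERMINERS:
    print(det, witness_check_agrees(det, LADIES))
```

In this toy model the check succeeds for three, at least three, and more than three alike, and fails for exactly three and less than three, which is why the criterion separates the increasing quantifiers from the rest but not bare from modified numerals.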
In the absence of semantic or pragmatic properties that could distinguish the relevant groups, we may pay closer attention to their syntactic properties. Recall that in section 2.6.2, I assumed, following Danon 1996, that the relevant bare numerals have the structure (73), repeated here. (73)
We noted that an element can occur in the D-position in this structure only if it is of the X⁰ syntactic type, that is, if it can serve as a head. A head cannot be modified by anything, hence modified numerals cannot occur in that position. On the other hand, the same head three can project its own XP, as is the case with modified numerals. As an XP, it can occur only in the Spec position of (73). What modified numerals have in common, then, under Danon’s analysis, is that they have the structure in (87). (The D-head in this structure hosts only syntactic features.)⁴⁴
(87)
We assumed, further, that it is the Spec position that always corresponds to a GQ semantic determiner. It follows then that indefinite NPs with this structure must be interpreted as GQs. On the other hand, we assumed that it is only when the Spec position is empty, as in (73), that a choice-function variable is introduced in this position, to enable function application. It follows that precisely the set of bare numerals is interpretable by choice functions.

What this means is that the interface of discourse and syntax, which was insightfully observed in the DRT framework, in this case goes in the other direction than assumed there. It is not that discourse properties are encoded in the syntax (or the formal semantics), but rather, independent properties of the human computational system (syntax) enable certain discourse uses. The choice-function procedure, whose semantics I am about to explore more closely, generates options that discourse strategies can happily use. Since the choice-function variable can be existentially closed at any point, one of the options is to do that at the (widest) discourse level, in which case the indefinite can be used for forming a discourse entity.

2.6.5 Some Choice-Function Semantics

I have not yet been fully explicit on the formal characterization of the quantification I assume over choice-function variables. There is no reason to expect that adopting this approach will require any less semantic work on its precise implications than in the case of unselective binding, dynamic semantics, or any of the other approaches to quantification. But let me point out some basic questions that need to be addressed.

2.6.5.1 The Empty Set

Our point of departure was to attempt to capture the wide scope of existentials. So far I have assumed that (at least in the singular case), its truth conditions are the same as would be obtained if we apply an island-free QR to standard (GQ) existentials. Before we can even check whether this is so or not, we must decide what happens when the D′ set that a choice function applies to is empty, as in (88a) (assuming that there have been no American kings). Under the classical analysis, in (88b), the sentence is false, but if we say nothing further, the choice function in (88c) could just select any arbitrary value, so the sentence would come up true, in case someone visited Utrecht.

(88) a. An American king visited Utrecht.
     b. ∃x (American king (x) & x visited Utrecht)
     c. ∃f (f(American king) visited Utrecht)

One line of thought that may suggest itself is to let the choice functions be partial. In this case, (88c) comes out undefined, pretty much the same as may be the case if a definite NP like the American king occurs in the sentence. If so, then clearly choice-function semantics is not equivalent to classical logic of existentials. It will be recalled that some of the specificity approaches, discussed in section 1.3, assume anyway that indefinites are ambiguous, and that under one construal, they carry something like an existence presupposition. Diesing (1992) argues that explicitly, but the other approaches assuming ambiguity (like the D-linking view) are also consistent with this idea. The source of the presuppositional effects of indefinites was never defined in these approaches, beyond the level of stipulation, and the choice-function procedure outlined here could be used, then, to provide the missing definition.
Indeed, this is the approach taken by Kratzer (1998), who adopts the choice-function approach of Reinhart 1992 only for the problem of “specific” readings, and assumes that these functions are partial and thus that the relevant indefinites are presuppositional. I believe, however, that this is a move that should not be taken too hastily. The procedure of choice-function interpretation, as outlined here, applies in a vast variety of contexts. It is the mechanism responsible for all collective construals of plural indefinites, and, in approaches assuming no standard GQ-construals for them, it is the only interpretation that indefinites of the relevant type can get. As we saw in section 2.1.3, Kratzer assumes a narrower use of choice functions; for example, she argues that there are no intermediate-scope construals, so existential closure of the function variable is always only with widest (discourse) scope. But we also saw there that this is not, in fact, the case, and existential closure must be able to apply anywhere.

Furthermore, I argue in Reinhart 1995 that the idea that indefinites are sometimes presuppositional (in the semantic sense of yielding an undefined value when the N-set is empty) was never sufficiently substantiated. In fact, empty-set indefinites create the impression of a presupposition failure only when used as topics. But under that use, this follows from a pragmatic, rather than semantic, approach to referential presuppositions, along the lines of Strawson 1964. On this view, assessment of a sentence starts with the set denoted by its topic, and if that set is empty, assessment gets stuck, an unpleasant experience that one may describe as a presupposition failure. Allowing indefinites to carry existence presuppositions is a serious move, which turns them into strong rather than weak quantifiers, and disables basic entailments.⁴⁵ It should not be taken without very substantial evidence and motivation.

In fact, it is not at all a necessary consequence of the choice-function procedure that (88a), under (88c), is undefined. There are several conceivable ways to avoid this result and allow (88c) to be false, as entailed by classical logic. I will outline two ways, of which I prefer the second. First, in approaches that allow partial functions, it is possible to assume that although choice functions are, indeed, partial, the value of the sentence depends on how we define the existential quantifier in a three-valued logic. Since we want to keep the classical-logic view on this matter, we may assume its definition in (89).⁴⁶

(89) (∃x) A is true iff for some value of x, A denotes true, and false otherwise.

On this definition, (88c) is false.
There is no value of the f-variable that makes the formula true, and this is sufficient to define it as false. This means that (88a) is not ambiguous, and its two representations are equivalent. The next question is how choice functions work in implications. Under the narrow-scope construal of American king in (90a) (inside the antecedent of the conditional), the sentence should be true. This, indeed, is derived already. The function variable in (90b) is existentially closed inside the antecedent. Thus, the A relevant for (89) is the antecedent clause. Since the function is undefined, there is no value of the variable that can make it true, and the antecedent comes out as false, by (89). Hence, the implication is true.

(90) a. If we invite an American king, Max will be offended.
     b. [∃f (CH (f) ∧ we invite f(American king))] → [Max will be offended]
(91) a. ∃x (American king (x) ∧ (we invite x → Max will be offended))
     b. ∃f (CH (f) ∧ [we invite f(American king) → Max will be offended])

The interesting case is the wide-scope construal of the existential. Under the classical-logic reading (91a) (which will be obtained if we apply QR to a standard GQ), the sentence is false, since there is no American king such that Max can be offended if he is invited or not. Is the same true of (91b)? More generally, the question is whether (92a) and (92b) are equivalent.

(92) a. ∃x Q(x) & (P(x) → B)
     b. ∃f (P (f(Q)) → B)

Here, the A relevant for (89) is the whole implication. Independently of our problem, it has been debated in systems allowing the undefined truth value, whether when the antecedent of an implication is undefined, the implication should come out true or undefined (e.g., for sentences like Max will be offended if we invite the present king of France). Under the second (undefined) decision, the antecedent in (92b) is undefined; hence, the implication is undefined. Given (89), then, the whole formula is false, since there is no value of the variable that yields it true. Under this assumption, then, (91b) is false, and (92a) and (92b) end up equivalent. But under the other view, (91b) comes out as true, just like (90), so (92a, b) are not equivalent. I will soon examine the possibility that this is nevertheless the correct result. An analysis along these lines (partial choice functions, and (89)) will face difficulties that, independently of our specific problem, are posed to any three-valued logic. (These are surveyed in Winter 1997.)
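The behavior of partial choice functions under the definition in (89) can be traced in a few lines. The Python sketch below is a simplified illustration, not a full three-valued logic: `None` stands in for the undefined value, the model (no American kings, nobody invited, Max not offended) and all function names are hypothetical, and, since each formula uses f at a single argument, quantifying over f is replaced by quantifying over the values f could take there.

```python
UNDEF = None  # stands in for the third, "undefined" truth value


def partial_choice_values(noun_set):
    """Values a partial choice function can take for this argument.

    For an empty set the function is undefined, so UNDEF is the only outcome.
    """
    return sorted(noun_set) if noun_set else [UNDEF]


def holds(extension, x):
    """Predication is undefined when its argument is undefined."""
    return UNDEF if x is UNDEF else x in extension


def exists(truth_values):
    """(89): true iff some value makes the formula true, and false otherwise."""
    return any(v is True for v in truth_values)


def implies(antecedent, consequent, if_undefined):
    """Implication; if_undefined fixes the result for an undefined antecedent."""
    if antecedent is UNDEF:
        return if_undefined
    return (not antecedent) or consequent


# Hypothetical model
AMERICAN_KINGS = set()
VISITED_UTRECHT = {"max"}
INVITED = set()
MAX_OFFENDED = False


def s88c():
    return exists(holds(VISITED_UTRECHT, x)
                  for x in partial_choice_values(AMERICAN_KINGS))


def s90b():
    antecedent = exists(holds(INVITED, x)
                        for x in partial_choice_values(AMERICAN_KINGS))
    return implies(antecedent, MAX_OFFENDED, UNDEF)


def s91b(if_undefined):
    return exists(implies(holds(INVITED, x), MAX_OFFENDED, if_undefined)
                  for x in partial_choice_values(AMERICAN_KINGS))


print(s88c())        # False
print(s90b())        # True
print(s91b(UNDEF))   # False: undefined antecedent makes the implication undefined
print(s91b(True))    # True: undefined antecedent makes the implication true
```

The last two lines correspond to the two decisions about implications with an undefined antecedent: only the first makes (91b) false, and hence equivalent to the classical (91a).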
So it is important to observe that, at least for our case (choice functions for indefinites), it is not necessary to allow partial functions into the semantics. An alternative approach, proposed in Winter 1997, is to also define choice functions when they apply to the empty set. It rests on the fact that we already assume that the linguistically relevant choice functions select sets rather than individuals (i.e., their type is ⟨⟨⟨e, t⟩, t⟩, ⟨e, t⟩⟩). All we have to do is define them so that when they apply to the empty set, their value is the empty set. Winter’s definition of choice functions is roughly as in (93).

(93) F is a choice function iff for every set S of type ⟨⟨e, t⟩, t⟩:
     a. if S is not empty, F(S) = X, where X ∈ S.
     b. if S is empty, F(S) is the empty set of type ⟨e, t⟩.

Recall now that whenever an indefinite is interpreted via a choice function, so that it denotes a set, the predicate that takes this value as an argument is lifted to type ⟨⟨e, t⟩, t⟩. In a system like Scha’s (1981), which works with such predicates, it is independently necessary to stipulate that they yield “false” when applied to the empty set. Let us state this in (94).

(94) The extension of any lexical predicate of natural language excludes the empty set.

With this assumed, (88a), repeated below, also comes out false under the choice-function construal in (88c) (since f(American king) denotes the empty set, and P(∅) is false, by (94)).

(88) a. An American king visited Utrecht.
     c. ∃f (f(American king) visited Utrecht)

Next let us look at the implication case, repeated below. The narrow-scope construal in (90b) comes out true, as is standard: there is no function whose value can render the antecedent true. Since the antecedent is false, the implication is true.

(90) a. If we invite an American king, Max will be offended.
     b. [∃f (CH (f) ∧ we invite f(American king))] → [Max will be offended]
(91) a. ∃x (American king (x) ∧ (we invite x → Max will be offended))
     b. ∃f (CH (f) ∧ [we invite f(American king) → Max will be offended])
(92) a. ∃x Q(x) & (P(x) → B)
     b. ∃f (P (f(Q)) → B)

However, under the wide-scope construal (91b), which has been our focus here, the result is not equivalent to the classical-logic representation in (91a). The antecedent remains false, as in (90b), so the implication is true. This means that, in fact, (92a, b) are not equivalent.
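The effect of (93) and (94) on the examples above can be sketched as follows. This Python fragment is an illustration under stated assumptions: the model and function names are hypothetical, singular indefinites are represented as singleton sets, the lifted predicate is assumed to hold of a nonempty set when all its members are in the original extension, and quantification over choice functions is again replaced by quantification over the values a function could take at the one argument involved.

```python
EMPTY = frozenset()


def singleton_sets(noun_extension):
    """{X | N(X) and |X| = 1}: the set of sets a singular indefinite supplies."""
    return {frozenset({x}) for x in noun_extension}


def choice_values(set_of_sets):
    """(93): a member of S if S is nonempty, otherwise the empty set."""
    return list(set_of_sets) if set_of_sets else [EMPTY]


def lifted(extension):
    """Lifted predicate obeying (94): never true of the empty set."""
    def predicate(X):
        return bool(X) and X <= extension
    return predicate


def narrow_scope_cf(noun, invited, offended):
    """(90b): existential closure inside the antecedent."""
    invite = lifted(invited)
    antecedent = any(invite(X) for X in choice_values(singleton_sets(noun)))
    return (not antecedent) or offended


def wide_scope_cf(noun, invited, offended):
    """(91b): existential closure over the whole implication."""
    invite = lifted(invited)
    return any((not invite(X)) or offended
               for X in choice_values(singleton_sets(noun)))


def wide_scope_classical(noun, invited, offended):
    """(91a): classical wide-scope existential."""
    return any((x not in invited) or offended for x in noun)


# Empty N-set: the choice-function readings coincide, and differ from (91a).
print(narrow_scope_cf(set(), set(), False))       # True
print(wide_scope_cf(set(), set(), False))         # True
print(wide_scope_classical(set(), set(), False))  # False

# Nonempty N-set (hypothetical linguists): (91a) and (91b) agree.
linguists = {"kim", "lee"}
print(wide_scope_cf(linguists, {"lee"}, False),
      wide_scope_classical(linguists, {"lee"}, False))  # True True
```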
(92a) and (92b) yield the same truth value only when the N-set is not empty. When it is empty, the choice-function interpretation yields the same value for the wide scope as for the narrow scope. More generally, this means that, under Winter’s implementation, the choice-function interpretation does not generate for indefinites precisely the same set of truth conditions as that generated by an island-free QR. (The same result is also obtained in Winter’s (1997) implementation of choice functions as generalized quantifiers.)

Is this good or bad news? This is not a conceptual question, but an empirical one. The question is whether English sentences like (90a) do, in fact, have the truth conditions allowed by QR. Specifically, do we actually ever judge them as false? In the case of an implication, the judgments required here may be too subtle, since the logical verdict that (90) is true is not easily accessible, anyway, by naive intuitions. So let us look at a negation context instead.

(95) a. The organizers did not invite two American kings to the party.
     b. There are two American kings that the organizers did not invite to the party.
(96) a. The organizers did not invite two American linguists to the party.
     b. There are two American linguists that the organizers did not invite to the party.

Example (95a) is most easily judged as true. Example (95b) is an English sentence that demonstrates the wide-scope reading that (95a) would get under standard (QR) existential construal. Sentence (95b) is obviously false. But it is very difficult to read (95a) as meaning the same as (95b). This does not indicate that it is generally difficult for numeral indefinites to get scope wider than negation. This reading is readily accessible in (96a), which can easily be understood as meaning the same as (96b). Possibly there are other ways to account for this result.
But it is, nevertheless, what we would get under Winter’s analysis of the truth conditions of choice functions: it does not matter what the scope of the empty-set indefinite in (95a) is (i.e., where we apply existential closure). Under both construals it remains true, so in the case of the empty set, a sentence like (95a) cannot be ambiguous.⁴⁷ Winter (1997) discusses several other contexts that support the view that natural language does not, in fact, have the full range of truth conditions predicted by allowing standard GQ-existentials to have free scope.

2.6.5.2 Extensionality

The empty set aside, the analysis should capture all standard properties of the wide scope of existentials. For this, we must make sure that the given functions always select only from the extension of the N-set in the actual world (even when the N-restriction originates in an intensional context). The problem can be illustrated with the question in (97).

(97) Who wants to marry which millionaire?

Which millionaire here occurs in the complement of want. Nevertheless, its scope is marked by the top who, so the question cannot be ambiguous, and which millionaire only has an extensional construal. But since no movement is involved, and the N-restriction stays in situ, nothing so far guarantees that the function will select a set from the set of millionaires in the actual world. Technically, this can be captured by defining the range of quantification for f, as in (98).⁴⁸ The set of choice functions is now defined in G. These functions apply to the intension of a given set (of sets), and select an element from the extension of this set in the actual world. (Under Winter’s analysis, discussed in (93), P must be of type ⟨s, ⟨⟨e, t⟩, t⟩⟩, and if ˇP = ∅, then f(P) = ∅. But I leave (98) open on that, to allow other implementations.)

(98) G = {f | ∀P (ˇP ≠ ∅ → f(P) ∈ ˇP)}
     P of type ⟨s, ⟨e, t⟩⟩, or ⟨s, ⟨⟨e, t⟩, t⟩⟩

This means that the precise representation of, for example, (99a), should be (99b), rather than the simpler version I have used so far. (f is defined to belong to the set in (98). Thus, its argument is an intension and its value is an extension: a philosopher in the actual world.) Similarly, the wh–in situ of (97) is interpreted as in (100).

(99) a. Max will be offended if we invite some philosopher.
     b. ∃f (f ∈ G ∧ (we invite f ˆ(philosopher) → Max will be offended))
(100) a. Who wants to marry which millionaire?
      b. {P | ∃x ∃f ∈ G (P = ˆ(x wants to marry f ˆ(millionaire)) ∧ true (P))}

All instances of quantification over choice-function variables above should be read in the same way.

2.7 Scope-Shift: An Interface Repair Strategy
2.7.1 Minimize Interpretative Options

We saw that QR is, in fact, a much more restricted operation than typically assumed. The clearest cases of what appears as scope outside of the overt c-command domain are captured, independently of QR, by the choice-function mechanism, which interprets them in situ. Nevertheless, there are cases of genuine scope-shift, for which we still need QR. Though this QR residue can just be viewed as a standard instance of a movement operation, it still poses conceptual problems. As I have mentioned throughout this chapter, the problems were always there, but they are more acutely obvious in the framework of the minimalist program. As we saw in chapter 1, the original theoretical goal in that framework was to allow movement (overt or covert) only for formal morphological reasons of checking features. That was captured by the economy condition (101) (discussed as (1) and (47) of chapter 1).

(101) “If a derivation D converges without application of some operation, then that application is disallowed” (Chomsky 1992, 47).

Although it is possible, of course, to introduce some arbitrary feature that justifies QR, this goes against the spirit of the program, since there is no morphological evidence for such features. In the case of quantifier scope, this movement is motivated only by interpretation needs, and it is only witnessed at the inference interface. As I mentioned in section 1.3, it is not obvious that the strong restriction in (101) can be maintained for overt movement, because there is growing evidence that optional overt movement, not required for any morphological reasons, is available across languages.

Nevertheless, the basics of the minimalist program enable us to state the problem with free covert movement. Recall (from the introduction) that the elementary requirement of the computational system is to make the interface possible, a process that has always been stated as relating sound to meaning. The final outputs of the system can be viewed as pairs ⟨p, i⟩ of a phonological representation and an interpretation representation. This relation is mediated by syntactic derivations.
We may either assume that the relevant properties of these derivations are encoded in the phonological representations, as assumed in the theory of phonological phrases, or that in generating the ⟨p, i⟩ pairs, the computational system is operating on ⟨p, d⟩ inputs of a phonological representation and a derivation, yielding ⟨p, i⟩ outputs. I will return to these questions in chapter 3. We may note now that the more interpretations that can be associated with a given phonological representation, the more complex the computation at the context interface is. The computational system must generate more ⟨p, i⟩ pairs for each derivation, which is not necessarily problematic, but at the interface, only one such pair needs to be selected in the given context. The more there is to select from, the harder adaptation to context is.

There are several views regarding what economy considerations are (what “economy” consists of). A prevailing approach, which I examined in chapter 1, is that these considerations minimize computational effort within the computational system itself: the “least-effort” conditions. However, if we look at the problem from the perspective of the context interface, or more generally from that of language use (communication), an economy strategy that would be extremely useful would be minimizing interpretative options associated with a given phonological representation.

It may appear that by this reasoning, a perfect computational system should allow no ambiguous phonological representations at all. But this is certainly not a possible conclusion. The crucial requirement is to meet the interface needs to begin with. There is no way to know that a system with no ambiguity would allow all that is needed for the inference and context systems; it may just be too poor, hence fail the interface requirement completely. In any case, we do know that the given human computational system allows ambiguity, just as it allows different derivations with the same interpretation. But when it comes to covert movement, special attention is required to the context interface. This is a powerful mechanism that can associate with each single phonological representation several interpretations, obtained by movement not recoverable from the phonological representation itself. (Since QR is not clause-bound, the number of possible scope-interpretations increases rapidly when the derivation includes one or more clausal complements.) This is an obvious area where an interface economy requirement to minimize interpretative options would be very useful.
The economy requirement (101) is of the type aiming at reducing the number of possible derivations out of a given numeration. In the case of overt movement, this has nothing to do (if it holds) with minimizing interpretative options, because overt movement also changes the phonological representation, so the number of ⟨p, i⟩ pairs per derivation does not increase, in principle, with applying as many overt operations as we want. (An accidental increase as an outcome of overt movement is possible, of course.) But if it applies to covert operations only, then it is a restriction on interpretative options, since covert operations of the QR-type increase, in principle, the number of interpretations associated with a single phonological representation. Let us, then, restate (101) as (101′).
(101′) If a derivation D converges without application of some covert operation, then that application is disallowed.
Principle (101′) as well may turn out too strong as formulated. My crucial claim here is that some prohibition against covert operations that increase, in principle, the number of interpretative options associated with a given phonological phrase must hold, if the computational system meets optimally the requirement of economy (efficiency) of the context interface. Principle (101′), on this view, is just a specific instantiation of the broader economy principle “minimize interpretative options.”
As I mentioned, the prevailing concept of economy has centered around the “least-effort” principle. Given that most arguments for such a principle came from syntax, and they no longer hold in current syntax, as we saw in chapter 1, it is appropriate to doubt whether such a principle is directly active at the interface. An interface instance where it has been previously assumed is the coreference restriction (Rule I), where variable binding was viewed, since Reinhart 1983b, as a more efficient way to express anaphora than coreference. The “least-effort” view of this restriction is emphasized by Reuland (2001), who argues that computations applying at the interface (coreference) are always more costly than those applying at the CS (variable binding). However, I argue in chapter 4, based on Reinhart 2000, that there is a serious empirical problem with the “least-effort” approach to coreference. Given a basic sentence like Max loves his mother, this approach entails, incorrectly, that there can be no coreference-binding ambiguity here, and that the sentence allows only binding, which is the “least-effort” way to express anaphora.
It is still possible that something like the “least-effort” strategy is operative in determining the first preference in the processing of anaphora, but this would still not be sufficient for explaining coreference restrictions like Rule I (or Condition B). Rather, I suggest in chapter 4 that the underlying economy principle is something like “minimize interpretative options.” I turn to the way this works for coreference in that chapter. Here let me just state a rough approximation of this principle.
(102) Minimize interpretative options
Unless required for convergence, do not apply a procedure that increases the number of interpretations associated with a given single PF.
“Least effort” is, of course, a very broad principle that does not specify exactly what counts as effort. It is possible, therefore, to view (102) as
spelling out an instance of this broad principle. Increasing the number of interpretations associated with a given PF also increases the effort required from the addressee (hearer) for identifying all interpretative candidates and selecting one in context. So having (102) as a principle that guides the application of interpretative procedures also conforms with “least effort.”
2.7.2 Applying the Illicit QR as a Repair Strategy
Based on what I have said so far, QR is not allowed at all; that is, it is an illicit operation, ruled out by (101′). But the whole point of this chapter has been to argue that it is nevertheless needed in a restricted set of cases. On the approach outlined in chapter 1, illicit operations may still be used, in case the outputs of the computational system are insufficient for the interface needs of a given context. Thus, applying an illicit operation is a strategy used to extend the options permitted by the CS, and can be viewed as a repair mechanism. But its application still violates a condition of the CS. (In the case of QR, it increases the set of interpretations associated with the given PF.) Therefore, its application comes at the cost of constructing a reference set to determine whether the illicit extension of the CS’s limits is indeed justified.
We may turn now to the view of QR as a repair strategy. The roots of this approach are in the concept of QR as a marked operation. The markedness approach, stated in semantic terms, was proposed by Keenan and Faltz (1978), who argue that lambda abstraction applies only to capture marked scope. I pursued that idea within the LF-framework in Reinhart 1983a, chap. 9. The approach rests on the well-motivated assumption, in the framework of generalized quantifiers, that to interpret quantified NPs, there is no need to ever raise them. The only motivation for movement is to obtain scope wider than their c-command domain at the overt structure.
But this scope-shift is the marked case, and it is harder to obtain than the overt c-command scope. It is far from obvious, therefore, that the computational system should be dramatically modified just to capture the marked cases. I proposed, instead, that the standard interpretation of quantified NPs is in situ; that is, their scope is their overt c-command domain. But QR may apply to create alternative scope construals. Scope outside the c-command domain, then, requires a special operation, which does not apply in the case of interpretation in situ. Thus interpretations derived by this operation are more costly. This may explain why they are marked and harder to obtain.49 There is one area where inverse scope is not only “unmarked,” but appears to be
obligatory: with complements of N (inverse linking). I argued in Reinhart 1976 that this scope construal must be governed by a mechanism independent of QR, but the precise analysis of this problem is still an open question.50
As mentioned in section 1.3, the concept of markedness was always a bit vague, and the notion of a costly operation was not defined. However, the perspective of reference-set strategies at the interface enables us to give it more specific content. A marked operation is an illicit operation, which violates some principle of the computational system. Applying such an operation requires checking that there is good reason to do this; in other words, that this is indeed the only way for a given derivation to meet the interface needs. Technically, checking this involves constructing and computing a reference set of pairs ⟨d, i⟩ of a derivation and its interpretation, all with the same input (numeration) and the same interpretation. If the set contains a derivation that does not use this operation, its application is ruled out. It is the fact that reference-set computation is required, then, that makes the operation costly.
The idea that QR is a marked and costly operation rested originally on the intuition that it is harder to obtain wide scope of universal quantifiers outside their c-command domain. This intuition found support in empirical studies of Gil (1982), where nonlinguist subjects across languages were asked to identify scope construals of sentences. Gil found that although nonovert scope exists in such cases, the preferred reading (statistically) is overwhelmingly the overt one. Nevertheless, such considerations are not sufficient to decisively establish the claim that QR is not a free operation, but rather a costly one.
In principle, there could be all kinds of performance factors that determine why one interpretation is preferred over the other, and the decisions regarding the structure of the computational system should not normally be based on statistical frequency or other performance considerations. Hence, there seemed to be no independent evidence that QR applies only when needed to obtain scope wider than overt c-command, and the debate concerning the status of QR seemed for years to be purely theory internal.
The first direct evidence that QR does not apply freely was provided by Fox’s (1995, 2000) findings, which I surveyed in section 1.2. A problem with covert movement is that we normally have no direct access to check how and whether it applies (since it has no effect on the phonological representation). However, Fox provided a way to do that, using ellipsis
structures. Recall that the problem (noted by Sag 1976 and Williams 1977) was why the ambiguity of (103a) disappears when it is placed in the ellipsis context of (103b).
(103) a. A doctor will examine every patient. (Ambiguous)
b. A doctor will examine every patient, and Lucie will [ ] too. (Only narrow scope for every patient)
(104) a. Every patient1 [a doctor will [VP examine e1]] and
b. Every patient1 [Lucie will [VP examine e1]]
The scope construal of (103a) that disappears in the ellipsis context is the construal obtained by raising every patient covertly, as in (104a). Since we know that this construal is possible for (103a), in isolation, the explanation for the ellipsis context must rest on what happens in the elided conjuncts. The parallelism requirement on ellipsis determines that the scope construal in both conjuncts should be identical. Hence, to derive the reading (104a) in this context, the elided conjunct should have the structure (104b), where every patient raises covertly, in the same way. Fox argues that this construal is illicit, because the movement has no effect on the interpretation: (105a), where this movement applies covertly, is precisely identical in interpretation to (105b), where it does not.
(105) { a. ⟨Every patient1 [Lucie will [VP examine e1]], For every patient x, Lucie will examine x⟩
b. ⟨Lucie will [VP examine every patient], For every patient x, Lucie will examine x⟩ }
As we saw, the way this is computed, technically, is that applying QR requires the construction of a reference set consisting of pairs ⟨d, i⟩ of a derivation and its interpretation. So the reference set for (104b, 105a) is (105). Since this set contains the pair (105b) with the same interpretation but with a simpler derivation, (105a) is ruled out. If QR applies freely, there can be no difference between (104a) and (105a). In both, the operation applies to the (same) quantified object.
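The reference-set check just described can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a formalization taken from the text: interpretations are represented as plain strings and compared by string identity, and the names `Pair` and `qr_is_licensed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """A <d, i> pair: a derivation and the interpretation it yields (hypothetical encoding)."""
    derivation: str
    interpretation: str
    uses_qr: bool


def qr_is_licensed(candidate, reference_set):
    """QR is licensed only if no QR-free pair in the set yields the same interpretation.

    Assumption: two interpretations count as identical iff their strings are equal.
    """
    return all(
        pair.interpretation != candidate.interpretation
        for pair in reference_set
        if not pair.uses_qr
    )


# (105): the elided conjunct of (103b). Both derivations yield the same interpretation.
lucie_qr = Pair("Every patient1 [Lucie will [VP examine e1]]",
                "for every patient x, Lucie will examine x", True)
lucie_overt = Pair("Lucie will [VP examine every patient]",
                   "for every patient x, Lucie will examine x", False)

# (103a)/(104a): here QR yields an interpretation the overt derivation lacks.
doctor_qr = Pair("Every patient1 [a doctor will [VP examine e1]]",
                 "for every patient x, some doctor will examine x", True)
doctor_overt = Pair("A doctor will [VP examine every patient]",
                    "some doctor y is such that for every patient x, y will examine x", False)

lucie_ok = qr_is_licensed(lucie_qr, [lucie_qr, lucie_overt])
doctor_ok = qr_is_licensed(doctor_qr, [doctor_qr, doctor_overt])
```

On this encoding, `lucie_ok` comes out false, matching the claim that (105a) is ruled out by (105b), while `doctor_ok` comes out true, matching the availability of (104a) for (103a) in isolation.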
Thus Fox provides a proof that QR is, in fact, not free, and it needs to be checked against the interpretative effects it produces. This example is particularly interesting, since there is even some context pressure to allow QR to apply here. If it does, it would allow the conjunction (103b) to have the interpretation (104), which is not obtainable if we do not apply QR in the second conjunct (104b). However, Fox points out that affecting the interpretation of a neighboring derivation does not count as a
sufficient reason to apply an operation illicitly. It is only allowed if this operation produces a new interpretation for the given derivation itself.
As I mentioned in chapter 1, Fox couches his analysis in terms of the Minimal Link Condition (MLC). He still assumes that QR is an obligatory operation for all quantified DPs. Hence it also applies in (105b), but VP-internal arguments are constrained by the MLC to move only to a VP-initial position. If they move further, as in (105a), the MLC would allow this longer link only if it has an interpretative effect. However, this assumption that QR is obligatory has no empirical basis, and it rests only on theory-internal considerations. Recall that what is at stake here is the question of whether the interpretation of quantification requires an operation like QR. On the standard QR-view, QR is an obligatory operation that is assumed to be necessary, for example in order to create the variable bound by the Q-operator, regardless of whether the final scope is isomorphic to the overt c-command domain or not. On the alternative view, QR is not required for the interpretation of quantification, but it is only an optional operation for obtaining scope-shift. Whatever is needed for the interpretation of generalized quantifiers can be captured directly at the stage of assigning a semantic representation to sentences, as done in the Montague, or generalized-quantifier, tradition. (It is not necessary to assume that each λ-operator required in the semantic representation corresponds to a variable in the syntactic representation.) The least we can conclude is that precisely the same results obtained in Fox’s analysis are obtained in a system where QR is an illicit operation that never applies at all, unless forced to by relevant interface needs. We saw in chapter 1 that in recent developments of the minimalist program, the reference-set MLC, as originally stated, has no other evidence or use in the computational system.
Forcing an otherwise superfluous QR-movement, just so it can obey this otherwise unneeded condition, does not seem to be an optimal move.
We should note another implication of the MLC view of QR, which may have empirical consequences. As stated, this view entails that reference-set computation is required for every derivation that contains a quantified argument (and a two-place verb). Since in such cases QR should apply obligatorily, we have to consult the MLC to determine the landing site of the moved argument. The MLC, in the version under consideration, involves constructing a full reference set of ⟨d, i⟩ pairs in each case. Thus, consider again the derivation of the sentence under consideration, repeated in (106) but this time with no specified context.
(106) Lucie will examine every patient.
(107) { a. ⟨Every patient1 [Lucie will [VP examine e1]], For every patient x, Lucie will examine x⟩
b. ⟨Lucie will [VP every patient1 [VP examine e1]], For every patient x, Lucie will examine x⟩ }
The sentence contains the quantified DP every patient. This DP has to undergo QR during the derivation. In principle, it could adjoin to either VP or IP. To decide where it should go in practice, we need to construct the reference set in (107). Since the interpretations are identical, and (107b) is the shorter link derivation, (107b) will be selected as the only possible derivation. Note that this is the logic of the system: reference-set computation must apply just to decide which is the correct landing site. If reference-set computation comes with a visible processing load, as I argued in section 1.3, this means that (106), and all (relevant) sentences with a quantifier, are harder to process than the same sentence with a referential argument. In other words, all sentences with (a two-place verb and) a quantifier are equally marked, in the sense described above, since they all involve the costly reference-set computation, regardless of which scope construal is selected. Though this has not been empirically tested, I do not expect to find this as an actual result.
Under the alternative view I have proposed here, a reference set needs to be constructed only if we are considering applying the illicit QR. Only when this happens would there be a computational cost. And normally we expect this to happen only if there is a contextual reason to want to do that. Otherwise, condition (101′) (no covert movement for purposes other than convergence) will apply and block this option from consideration.
This example illustrates a general difference between the view of reference-set computation as applying freely in the syntax, and reference-set computation as a repair strategy at the interface.
The first is the standard case in Optimality Theory. In that framework, even the simplest derivation is a selection from a set with worse competitors. Hence, as I explained in chapter 1, Optimality Theory entails that the parser cannot be transparent, and in actual language use, the phonetic inputs are computed by algorithms that bypass the computational system. In a mixed system like the early minimalist program assumed by Fox, the parser may still be transparent, with the isolated cases of reference-set computation as the exception.
In any case, under the view that reference-set computation is involved in each derivation containing a quantified expression, it is not possible to explain why derivations with scope-shift are harder to obtain or more marked than derivations with no such shift. In both cases, the selection of an interpretation requires the same costly computation.51
2.7.3 Processing Limitations on the Size of Reference Sets: Indefinite Numerals
Another potential indication of the costly nature of the scope-shift operation is the vast disagreement on the data in the linguistics literature.52 We have already noted some of the history of such disagreements in passing. Thus, in the 1970s, it was debated whether sentences like (108) indeed allow wide-scope construal of the object quantifier.
(108) Some tourists visited every museum.
I argued in Reinhart 1976 that “in spite of earlier reports in the literature” sentences like (108) could not be ambiguous and “the universal cannot have scope over the subject” (p. 193). I later retracted this position in favor of a marked QR-operation. But the same type of categorical verdict surfaced again in the 1990s with other problems of scope-shift. In 1992, Ruys argued that “in spite of earlier reports in the literature, which may have been founded on simple extrapolations from data with strong quantifiers, rather than on actual intuitions, it seems to be impossible for the object in [109a] to be interpreted with scope over the subject [in the distributive construal]” (pp. 106–107).
(109) a. Three men lifted two tables.
b. ∃X (two(X) & tables(X) & X^D λz (∃Y (three(Y) & men(Y) & Y lifted z)))
Under the distributive wide-scope construal of two tables, represented in (109b), the sentence means something like two tables were each lifted by three men. This construal can be true in a situation where six men were involved in the lifting of the tables. It is, indeed, virtually impossible to associate sentence (109a) with such a model.
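To see what kind of model is at stake for (109), the two construals can be evaluated against a toy situation. The sketch below is an illustration only: the model (six men, two tables, two lifting events) is invented, the function names are hypothetical, and the overt-scope reading is simplified to "one group of three men lifted each of two tables".

```python
from itertools import combinations

# Hypothetical model: each table was lifted by a different group of three men.
men = {"m1", "m2", "m3", "m4", "m5", "m6"}
tables = {"t1", "t2"}
lifted = {
    (frozenset({"m1", "m2", "m3"}), "t1"),
    (frozenset({"m4", "m5", "m6"}), "t2"),
}


def wide_distributive(men, tables, lifted):
    """(109b): there are two tables X such that each z in X was lifted by some three men Y."""
    return any(
        all(
            any((frozenset(group), z) in lifted for group in combinations(sorted(men), 3))
            for z in chosen_tables
        )
        for chosen_tables in combinations(sorted(tables), 2)
    )


def overt_scope(men, tables, lifted):
    """Simplified overt scope: a single group of three men lifted each of two tables."""
    return any(
        any(
            all((frozenset(group), z) in lifted for z in chosen_tables)
            for chosen_tables in combinations(sorted(tables), 2)
        )
        for group in combinations(sorted(men), 3)
    )


shifted_true = wide_distributive(men, tables, lifted)
overt_true = overt_scope(men, tables, lifted)
men_involved = set().union(*(group for group, _ in lifted))
```

In this model the wide-scope distributive construal holds and the overt-scope construal does not, with six men involved in all. This is the kind of situation that, as noted above, speakers find virtually impossible to associate with (109a).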
The conclusion Ruys drew from this fact (which was first noted in Verkuyl 1988) is that plural numeral indefinites can never have wide distributive scope. This view was widely accepted in the 1990s, and was built into the theory of Ben-Shalom 1993, Beghelli and Stowell 1995, Kamp and Reyle 1993, and Szabolcsi 1997. In these approaches, it is assumed that scope-shift of universal, or all strong quantifiers, is fully free and productive. But in the case of numeral plural indefinites, it is not allowed, either
because such DPs cannot undergo QR, as proposed by Ruys, or because the distributive operator is too low to allow the subject to be in its scope, which is roughly the spirit of the analysis in Beghelli and Stowell 1995.
Although the judgment of (109) is pretty robust, we should still note that judgments of scope-shift are known to vary with contexts. Thus, what eventually settled the previous debate regarding the status of (108) was that much more convincing examples were found, which show the existence of scope-shift with universal quantifiers. Typically, they are found where the overt scope results in a contextually weird interpretation. In the case of covert universal scope, one such example is (110), discussed by Hirschbühler (1982) in a different context. The overt scope would yield here the reading that one flag was stretching over all buildings, which is highly unlikely.
(110) An American flag was hanging in front of every building.
(111) a. An American flag was hanging in front of two buildings.
b. A guard stood in front of two buildings.
As noted already in section 2.4, in the same context, numeral indefinites can be interpreted with wide distributive scope, as in (111). Sentence (111a) can clearly mean that there were two buildings such that in front of each, an American flag was hanging, and this, in fact, is the interpretation that would first come to mind; that is, this is the wide-scope distributive reading of two buildings. The same construal is also found in (111b). If numeral indefinites cannot scope out, which would explain the unavailability of wide distributive scope for two tables in (109), it should also not be available for two buildings in (111). But the fact is that the sentences in (111) do have the distributive reading.
It appears that the state of the art regarding covert scope-shift remains as already described in Ioup 1975: its availability varies dramatically with contexts and with the individual quantifiers.
As long as we do not reach clearer generalizations, beyond mere lists, there is no reason to take one of the contexts as more representative of the behavior of scope-shift than the others. Theoretically, we may as well take (111) as the representative example, and leave open the question of why it is so difficult to obtain the same reading in (109). A more ambitious task would be to search for generalizations that may explain this dramatic variation. Let us pursue here the line of thinking that is opened by the view of QR as an illicit operation.
There appear to be two completely independent factors that determine the ease of obtaining scope-shift. One, which we have already noted,
is the strength of the contextual need to apply QR. Under the present view, QR does not apply, unless there is an interface requirement for it. A strong requirement of this sort is consistency with world knowledge. As noted above, following Winter 1997, when the overt scope is inconsistent with world knowledge, it is easier to perceive the scope-shift reading.
(112) a. A flag was hanging in front of twenty buildings.
b. A bomb blew up five monuments across the world.
(113) a. A police truck towed away twenty cars.
b. A student read every book.
In (112), world knowledge discards as unlikely the overt-scope reading and scope-shift is preferred. But in (113), there is nothing in the world to determine which of the two scope construals is more likely. Under the present view, applying scope-shift is always costly, because it requires reference-set computation. Given that there is nothing in the context that forces opting for scope-shift, the preferred option would be to assign the sentence the overt-scope interpretation. This explains the experimental findings in Gil 1982, where the overt-scope construal was overwhelmingly the preferred interpretation, regardless of the type of quantifier used. It is important to note, though, that we are examining sentences in isolation here. Uttered in a discourse context, there are many other interface needs that may force scope-shift even in sentences like (113). Among these contextual factors are topic and focus considerations, and the fact that specific indefinites resist scope dependence. It has also been known since Ioup 1975 that some quantifiers (like each) may force a preference for distributive wide scope even when it involves scope-shift.
However, such contextual considerations are not sufficient to explain the full range of the facts examined above.
In (113) it is possible to construct, with the appropriate context, a scope-shift interpretation, but in (109), repeated below, it is much harder to imagine a context that would enable that. It remains virtually impossible to take (109) to mean that up to six men were involved in lifting the relevant two tables.
(109) Three men lifted two tables.
Furthermore, Beghelli and Stowell (1997) point out that even in contexts like (111) and (112), cited in Reinhart 1995 as examples for the availability of scope-shift with numeral indefinites, this shift is not always possible. If we replace the singular subject with a plural numeral indefinite, scope-shift becomes as difficult as in (109).
(114) a. Three flags were hanging in front of two buildings.
b. Five guards stood in front of twenty buildings.
The sentences in (114) are awkward. The only reading that can be obtained is the one inconsistent with world knowledge (the same five guards are simultaneously in front of twenty buildings). There must be some internal properties of the sentences in (109) and (114) that restrict the option of scope-shift. But what can these be?
The standard approach, as we saw, has been to search for the answer in the computational system itself, specifically within the internal properties of the moved constituent. These approaches view QR as part of the computational system, and search for restrictions on its operation, stipulating, for example, that plural numerals cannot undergo QR. This has a theoretical cost: QR can no longer be viewed as just a free application of the Move operation, because special restrictions are needed on which constituents can move and where. This requires stipulating various abstract features corresponding to different quantified DPs, and functional projections that host these features. Still, with this massive enrichment of the machinery of the computational system, this approach cannot explain the difference between (112) and (114).
Beghelli and Stowell (1997) propose an enrichment of the computational system that does capture this difference. First they assume that the distributive operator (their “silent each”) has a fixed position lower than the external subject position (“between AGRS-P and AGRO-P”). It is only up to that position that the internal indefinite can move. Thus, to begin with, the subject is not in the scope of the moved indefinite and its distributive operator. To enter this scope, the subject needs to reconstruct to its original theta position (in SpecVP). In (112) this reconstruction takes place, so a flag in (112a) ends up in the (distributive) scope of two buildings. But the subject cannot always reconstruct.
Beghelli and Stowell argue that only “simple indefinites” that they define to be singular indefinites and bare plurals can do so. All other indefinites, like the plural numeral subjects in (109) and (114), or generalized-quantifier indefinites (like less than three guards) must be interpreted in their surface position. Hence, the relevant scope-shift cannot be obtained in (109) and (114).
The insight underlying Beghelli and Stowell’s analysis is that it is not just the properties of the VP-internal DP that determine its ability to take wide distributive scope; the properties of the subject have an effect as well. However, the question remains whether their implementation of this insight is on the right track. More generally, the question is whether this is a problem of the computational system or of the interface.
Note again the cost to the computational system, if enriched the way Beghelli and Stowell propose. Along with all the previous features and functional projections that govern the movement of quantified DPs, we now need some mechanism restricting reconstruction. It is far from obvious that reconstruction should be governed by feature compatibility at all. But if it does, it is not clear what independent property distinguishes precisely these two instances of “simple indefinites” from all other indefinites; that is, it is not clear which feature is coded. But the crucial question is empirical: Is it indeed possible to approach the problem with a list of the DP-subjects that prevent scope-shift (or cannot reconstruct)?
Turning to this empirical question, we may note that the judgments of examples in this section may appear subtle, and many of them have not been previously discussed or evaluated in the literature. To verify my intuitions, I tested all examples (in Hebrew) on a couple of nonlinguist informants, using a method specified in a footnote.53 Let us look at the minimal pair in (115), where (115a) repeats (114a).
(115) a. Three flags were hanging in front of two buildings.
b. Three identical flags were hanging in front of two buildings.
Unlike (115a), (115b) can be easily interpreted as asserting that in front of each of two buildings, three identical flags were hanging. While in (115a) the interpretation allows for only three flags (which makes it difficult to imagine the situation described by the sentence), in (115b) the preferred interpretation is that there were six flags in all. The DP three identical flags is not a “simple indefinite” by Beghelli and Stowell’s definition, and still, in their terms, it can reconstruct. To make sure this is not some peculiarity of the specific linguistic context in (115), let us examine other contexts.
(116) a. Two simultaneous questions confused fifteen subjects in the experiment.
(The others did fine with two simultaneous questions.)
b. Ten matching answers brought two couples to the final round [in a televised couples contest].
c. Two subsequent meetings took place in three offices.
In (116a) it is not necessarily the case that the same simultaneous questions confused all fifteen subjects. Similarly, in (116b), there is no reason to assume that the two couples got the same matching answers; that is, the wide-scope distributive reading where each of the two couples got ten (possibly different) matching answers is readily available. In (116c), each
office could host a different set of two subsequent meetings (in other words, there could be up to six meetings in these three offices).
What the sentences in (115b) and (116) have in common is that they disfavor a distributive interpretation of their subjects. Roughly, this is because the property of being simultaneous, identical, subsequent, or matching does not distribute among members of the set. (The set of subsequent meetings is not a set each of whose members is a subsequent meeting.) Given this observation, we can also note that this is, in fact, the property shared by Beghelli and Stowell’s set of “simple indefinites.” Singular indefinites obviously cannot have a distributive interpretation, and for bare plurals, this interpretation is extremely difficult to get.
Nevertheless, having found a shared property of the subjects that do allow scope-shift of the object does not mean we can define the relevant set of DPs that can reconstruct in terms of their internal properties, say, the set of “nondistributable” DPs. Any numeral DP can be disambiguated to allow only the collective interpretation by using an adverb like together. It turns out that in this case, scope-shift is allowed, regardless of the internal properties of the subject.
(117) a. Four guests sleep in two rooms.
b. Four guests sleep together in two rooms.
(118) Three Canadian flags were hanging together on two buildings.
Example (117a) is the standard case where object scope-shift is impossible: the sentence can only be understood as involving four guests. However, in (117b) it is possible to construe the situation as involving eight guests; that is, in each of the two rooms four guests are sleeping together. Similarly, in (118) it is possible to construe the depicted situation as involving six flags. These construals can only be obtained if the objects have wide distributive scope.
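The head counts cited for these examples follow from simple arithmetic: when the object takes wide distributive scope, each member of the object set may be paired with its own subject group. A minimal sketch, with a hypothetical function name and the simplifying assumption that the subject groups do not overlap:

```python
def max_subject_individuals(subject_numeral, object_numeral, object_wide_distributive):
    """Upper bound on the individuals the subject can cover (hypothetical helper).

    With wide distributive scope of the object, each of the object's members
    may come with its own subject group; otherwise there is one subject group.
    """
    if object_wide_distributive:
        return subject_numeral * object_numeral
    return subject_numeral


guests_117a = max_subject_individuals(4, 2, object_wide_distributive=False)
guests_117b = max_subject_individuals(4, 2, object_wide_distributive=True)
flags_118 = max_subject_individuals(3, 2, object_wide_distributive=True)
meetings_116c = max_subject_individuals(2, 3, object_wide_distributive=True)
```

These values correspond to the four guests of (117a), the eight guests of (117b), the six flags of (118), and the up to six meetings of (116c).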
The descriptive generalization seems to be that if in a given derivation the subject could (potentially) be interpreted distributively, scope-shift of the object is not allowed. But, crucially, we are concerned here with the potential of the subject to be distributive, not with the question whether it actually is. In the derivations where scope-shift was found impossible, like (117a), (114), or (109) (Three men lifted two tables), the subject is indeed of the type that can be distributive, but it is not actually interpreted distributively in the derivation under consideration. That is, the scope-shift derivation is ruled out even if we interpret the subject collectively. Why should scope-shift of the object depend on whether the subject could, in principle, be interpreted distributively in another LF-derivation?
Generally, when properties of the derivation are not absolute, but depend on its relations to other possible derivations or interpretations, this is an indication that reference-set computation is at work. Let us turn now to what could explain the pattern under consideration within the interface view of QR, which involves such computation. Even a brief check of the problematic derivations above, like (117a), reveals that the reason that scope-shift is ruled out cannot possibly be that the same interpretation is obtainable without QR. QR is the only way to obtain the relevant reading, and furthermore, we saw that there is a very good contextual reason to want to apply QR here. But I will argue that the reason the relevant readings cannot be derived, despite this fact, is that the computation involved in deciding the matter is too costly. The need to construct and compare a reference set is costly to begin with. However, in all cases considered so far, this is a cost that at least adult speakers can bear. (In chapter 5, I argue that children cannot.) An aspect of the computational cost that we have not considered so far is the size of the reference set. There may be a limit to how much even adults can hold in their working memory while attempting to satisfy the interface requirements. To see this, let us first review briefly again the procedure of constructing a reference set. When the option of applying the illicit covert-movement operation is considered, we need to construct a ⟨d, i⟩ pair of the intended derivation. We then need to find out whether the same i(nterpretation) is not available without applying QR—in other words, whether the same interpretation cannot be associated with the overt derivation. Strictly speaking, the only way to find that out is by running through all the interpretations of the overt derivation. This task is sometimes relatively simple, as in the case of (119).
(119) a. A flag was hanging in front of every building.
         ∃f (CH(f) ∧ ∀z (building(z) → f(flag) was hanging in front of z))
      b. [every building] [a flag was hanging in front of e]
         ∀z (building(z) → ∃f (CH(f) ∧ f(flag) was hanging in front of z))
(120) A student read every book.
In (119a), the only scope construal possible at the overt structure is with the universal quantifier in the scope of the existential. What is under consideration is applying QR, to obtain (119b), in which their scope is reversed. The two ⟨d, i⟩ pairs considered, then, are (119a) and (119b).
(Nothing hinges in this discussion on the choice-function mechanism used to represent the interpretation.) Since the interpretations are distinct, nothing rules out (119b). In this case, the evaluation of whether QR is permitted requires considering a minimal number of just two ⟨d, i⟩ pairs. This is a standard cost of reference-set computation. Whether scope-shift is easy to obtain in such derivations depends only on the contextual needs. Since in (119) world knowledge disfavors the overt-scope construal, the reference set is constructed and the scope-shift construal admitted. Obtaining this scope-shift in (120) requires precisely the same steps and reference set. However, as we saw, since there is nothing in the context that would lead us to attempt scope-shift to begin with, the option would not arise in isolation, which accounts for the impression that it is harder to obtain in this case. (As mentioned, there may also be other contextual factors that affect the ease of obtaining scope-shift.) Let us now look at the computation that scope-shift requires in a derivation with two plural numerals, such as (121a), which has been the problem under consideration here.
(121) a. Two flags are hanging in front of three buildings.
      b. [three buildings] [two flags were hanging in front of e]
We are considering whether the QR-derivation (121b) is allowed. For this, it is necessary to check whether the interpretation it would generate is not also available without applying this illicit operation. To determine this, all scope construals possible in (121a) need to be listed and checked. It turns out that there are quite a few of these. For ease of presentation, I will represent them only informally here. First, there is the choice-function (collective) interpretation of both indefinites, which is summarized in (122a). (In fact, (122a) stands for two equivalent representations, a point I will return to.) In this construal the situation involves two flags and three buildings.
(122) Choice functions
      a. There is a set x of two flags and a set y of three buildings, such that x is hanging in front of y. (Two flags, three buildings)
      Distributive subject
      b. There is a set of two flags, such that for each flag x in this set, there is a set y of three buildings, and x is hanging in front of y. (Two flags, six buildings)
      c. There is a set of three buildings y, and a set of two flags such that each flag x in this set is hanging in front of y. (Two flags, three buildings)
Next, the overt derivation (121a) allows also a distributive interpretation of the subject, where each member of the two-flag set is considered. Recall that in the present view, this option requires no further covert movement. Distributivity is just an interpretative procedure that applies to the overt structure. So it is one of the interpretations of (121a) that needs to be considered. The internal three buildings can only be interpreted via a choice function (collectively). But the existential closure of the function variable can be either inside or outside the scope of the distributive subject. There are thus two scope construals under the distributive interpretation of the subject. In the narrow closure in (122b), the situation involves six buildings. The wide closure in (122c) ends up equivalent to (122a), but for us to see that, it would have to have been listed and computed. With all these interpretative options activated and stored, we may turn to evaluate the output of the illicit QR in (121b). In principle, the moved three buildings in (121b) could be interpreted collectively, via a choice function. This ⟨d, i⟩ pair, however, would be filtered out, because the interpretation is equivalent to what could be obtained without QR, in (122a) or (122c). But the interpretation that could motivate QR here is (123b), in which three buildings is distributive.
(123) a. [three buildings] [two flags were hanging in front of e]
      b. There is a set of three buildings such that for each building x in this set, there is a set y of two flags, and y is hanging in front of x. (Six flags, three buildings)
This interpretation is indeed distinct from all the others. It is the only one that allows the situation associated with the sentence to involve six flags. So applying QR here is very well motivated, and the derivation should be allowed.
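The check just described for (121) can be laid out as a small sketch. This is only an illustration, not part of the analysis itself: each construal is reduced to the number of flags and buildings in the situation it describes, which is a crude stand-in for semantic equivalence, and the names `Construal`, `situation`, and `qr_is_licensed` are hypothetical.

```python
from typing import NamedTuple


class Construal(NamedTuple):
    label: str
    flags: int       # flags in the situation the construal describes
    buildings: int   # buildings in the situation the construal describes


# Construals of the overt derivation (121a), following (122a-c).
overt = [
    Construal("(122a) collective subject and object", 2, 3),
    Construal("(122b) distributive subject, narrow closure", 2, 6),
    Construal("(122c) distributive subject, wide closure", 2, 3),
]

# The construal that could motivate QR in (121b), following (123b).
qr = Construal("(123b) distributive wide-scope object", 6, 3)


def situation(c: Construal) -> tuple:
    """Assumed proxy for an interpretation: the cardinalities involved."""
    return (c.flags, c.buildings)


def qr_is_licensed(qr_construal: Construal, overt_construals: list) -> bool:
    """QR is licensed only if its interpretation is unavailable without it."""
    return all(situation(qr_construal) != situation(c) for c in overt_construals)


reference_set = overt + [qr]
print(len(reference_set))         # 4
print(qr_is_licensed(qr, overt))  # True
```

On this simplification, a collective reading of the moved object (two flags, three buildings) would come out as not licensed, since it matches (122a) and (122c). The count of four ignores the fact, noted in the text, that (122a) stands for two equivalent representations.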
But while in the case of (119), the conclusion that QR is allowed required holding and comparing two ⟨d, i⟩ pairs, here the same procedure requires holding four such pairs (in fact, five, as we will see). This may just be too much for the human processor. The difference in the processing load imposed by (119) and (121) is substantial enough to suggest that the problem with obtaining scope-shift in (121) is a problem of processing. Since the reference set required for (121) is too big, the computation cannot be completed, so scope-shift cannot be approved. The crucial complexity factor is the availability of the distributive interpretation of the subject, combined with a plural-numeral internal argument, of the choice-function type. This combination always adds two members to the reference set. All the facts discussed above now follow.
If the subject is a singular indefinite, as in (112), no distributive interpretation is possible, so these two extra members are not generated. The same is true when the distributive construal is otherwise not easily available, as with bare plural subjects, or the examples of (115)–(116) with three identical flags or two subsequent meetings. It is also clear why subjects that can in principle be distributive cause a problem in such configurations regardless of whether they are in fact interpreted distributively. The logic of the system is that to determine whether scope-shift is allowed, all scope construals in the derivation must be checked, to verify that the desired interpretation is not already available without QR. There is no way to verify this formally other than by running through all scope interpretations of the given derivation. The only time a derivation is exempt from this construal is when it is clear that in this specific derivation the subject cannot have the distributive reading. The other instance of such exemption that we observed was (117), repeated here.
(117) a. Four guests sleep in two rooms.
      b. Four guests sleep together in two rooms.
The derivation in (117a) follows the steps examined for (121), and scoping out two rooms is blocked due to the same factor of a processing load that is too heavy. But in (117b), with precisely the same subject, the distributive option is ruled out by the collective adverb together. The fact that the subject could be distributive in another derivation, without this adverb, is not relevant, because what we need to consider is the set of possible scopal interpretations of the given derivation. While the distributive interpretation of the subject is in the set for (117a), it is not in the set for (117b). It is important to note that considering the full set of possible scopal interpretations is only required when one contemplates applying QR.
Recall that, as discussed in section 2.7.2, I assume that QR never applies just to interpret derivations, and given the choice-function mechanism, it also does not apply just for standard (collective) wide scope of indefinites. All scope construals except for scope-shift are obtained without QR. Under this view, the fact that (121a) and (117a) allow four scope construals, or that they are (two-way) ambiguous, is not relevant when the derivation is normally used at the interface. Disambiguating, or selecting the interpretation appropriate to context, is an altogether different procedure. There is no need to assume that the full set of options needs to be considered. Thus, what I said here does not entail that (121a), without QR, is more
complex or difficult to process than (119) (with a singular indefinite subject and a universally quantified object). The only entailment is that the QR-interpretation is relatively easy to process in (119), but cannot be processed in (121) and (117a). The comparison of (119) and (121) represents two edges of a spectrum of possible reference sets for QR. A two-member set is relatively easy; a set with four or five members is unprocessable. In between there are other options, where the situation may be less clear. Thus, let us compare the computation involved in (119), repeated below, with (124), where a plural indefinite replaces every building.
(119) a. A flag was hanging in front of every building.
         ∃f (CH(f) ∧ ∀z (building(z) → f(flag) was hanging in front of z))
      b. [every building] [a flag was hanging in front of e]
         ∀z (building(z) → ∃f (CH(f) ∧ f(flag) was hanging in front of z))
(124) a. A flag was hanging in front of two buildings.
         ∃fᵢ (CH(fᵢ) ∧ ∃fⱼ (CH(fⱼ) ∧ fᵢ(flag) was hanging in front of fⱼ(two buildings)))
         ∃fⱼ (CH(fⱼ) ∧ ∃fᵢ (CH(fᵢ) ∧ fᵢ(flag) was hanging in front of fⱼ(two buildings)))
      b. [two buildings] [a flag was hanging in front of e]
(125) a. There is a flag x, such that there is a set of two buildings y and x was hanging in front of y.
      b. There is a set of two buildings y such that there is a flag x and x was hanging in front of y.
In (124a) both arguments are interpreted with a choice function. Technically, this means that the overt derivation allows two scope construals, depending on where existential closure is applied. This is only a matter of which closure is in the scope of the other, as given in (124a). For convenience, an informal representation of the two options of existential closure is given in (125). The two construals are equivalent, of course, but strictly speaking, if the reference set must include all scopal interpretations of the overt derivation, both options need to be checked.
The QR-derivation in (124b) would be filtered out if two buildings is interpreted collectively, but here it is interpreted distributively, so the derivation is allowed. The upshot is that while the reference set for (119) includes two ⟨d, i⟩ pairs, the one for (124) includes three such pairs. (This is also the reason
why the reference set for the example of two flags and three buildings in (121) includes five members, and not four as assumed above, for brevity. The collective construal of the DPs corresponds to two representations.) Does this difference have a processing effect? It seems impossible to trace any differences regarding the ease of obtaining scope-shift in these sentences. But this is also the context where world knowledge strongly favors scope-shift. Recall that the theoretical verdict in the literature we started with (following Ruys 1992) has been that plural-numeral indefinites cannot scope out at all, namely, independently of what the subject is. This must have been based on observing some actual difficulties with scoping them out also when the subject is a singular indefinite. If we move to a context that does not force an interpretation so strongly, there seems to be a difference in the ease of scoping out between the two types of DPs.
(126) a. A tablecloth covers every table.
      b. A tablecloth covers two tables.
(127) a. A doctor will examine every patient.
      b. A doctor will examine twenty patients.
In an informal check with nonlinguists, my informants interpreted (126a) as involving a separate tablecloth for each table. But the first interpretation of (126b) was with one tablecloth for both tables (though it was possible to convince them that there could also be two separate tablecloths). This is a context that still has a slight preference for the scope-shift construal (since it is more common for tables to be covered with individual tablecloths). In the context of (127b), with no preference imposed whatsoever, it is much harder to imagine the situation as involving more than one doctor.
It is not completely impossible, as it is in the two numerals examples, but it takes lots of effort to construct a context that would enable that interpretation (e.g., that in the emergency room patients were first examined and screened by interns, who decided that twenty patients require a doctor’s attention). Thus, indeed there seems to be a slightly greater difficulty with scoping out numeral indefinites, and I cannot offer more insight into why it is sometimes more difficult than at other times. Another indication that it is the size of the reference set that determines the ease of scope-shift, rather than the internal properties of the moved DP, comes from examining complex numerals like more than five, less than five, or at least five. In some of the literature on numeral scope (e.g., Beghelli 1993), these are considered the most radical instances of
unmovable DPs. It is believed that they are even harder to scope out than the bare plural numerals. In fact, it seems to be the other way around (based again on informal checking with nonlinguists).
(128) a. A tablecloth covers two tables.
      b. A tablecloth covers at least two tables.
(129) a. A doctor will examine twenty patients.
      b. A doctor will examine less than twenty patients.
While in (128a) the preferred interpretation involved one tablecloth, in (128b) my informants preferred the construal with at least two tablecloths, namely, the scope-shift reading. In the context of (129), which does not impose contextual preferences, it is much easier to perceive the scope-shift reading (not necessarily the same doctor) in (129b) than in (129a). Recall (from section 2.6.4) that in the present analysis, complex numerals are not interpretable by choice functions. (Kamp and Reyle (1993) argue that they are interpretable only as generalized quantifiers.) For this reason, they do not have multiple scope construals in situ. For the bare plural numerals, like two tables, different scopes can be obtained by applying closure at different projections, without moving the DP. But for generalized quantifiers, scope wider than their overt position can be obtained only by movement. This means that when considering their covert movement, the reference set would be identical to the cases of (119), with an internal universal quantifier. It contains only two members—the scope construal in situ, and the scope construal after movement. Hence their ease of scoping out should be identical to that of universal quantifiers, which appears to be the case. Returning to reference sets with three members, they are found also when a generalized quantifier is scoped over a bare numeral indefinite subject. This is because the subject has in this case two construals—collective and distributive—so the overt derivation comes with two members already in the reference set.
One such example is (108), repeated below. I am admittedly biased regarding this sentence. In Reinhart 1976 I used it to argue against the idea of QR, claiming that there is no scope-shift reading in this sentence. I still find this reading difficult to get (harder than with a singular indefinite).
(108) Some tourists visited every museum.
Nevertheless, we should not attach too much significance to the difference between a two-member reference set and a three-member set. There may
be slightly greater processing difficulty associated with the second, but it is also obvious that a three-member reference set is not beyond the processing ability of adults, and many other contextual factors may have a bigger effect on the ease of obtaining scope-shift in this case than the size of the reference set. The crucial distinction we have observed in this section is the distinction between the five-member sets (with two or more bare numeral indefinites) and the reference sets with two or three members. The former mark a real limitation of the human processor, so contextual factors can do very little to save scope-shift in such cases.
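The reference-set sizes arrived at in this section can be summarized in a short sketch. It is a simplification under stated assumptions rather than a statement of the analysis: the subject is taken to be an indefinite interpreted by a choice function, only two properties are tracked (whether the subject can be distributive in the given derivation, and whether the object is of the choice-function type), and the function name `reference_set_size` is hypothetical.

```python
def reference_set_size(subject_can_distribute: bool,
                       object_is_choice_function: bool) -> int:
    """Count the <d, i> pairs held when QR of the object is evaluated."""
    # For each reading of the subject, a choice-function object gives two
    # options of existential closure; a generalized quantifier gives one.
    per_subject_reading = 2 if object_is_choice_function else 1
    # The subject is read collectively, and also distributively if it can be.
    subject_readings = 2 if subject_can_distribute else 1
    overt = per_subject_reading * subject_readings
    # Add the scope-shift (QR) construal that is under evaluation.
    return overt + 1


cases = [
    ("(119) a flag / every building", False, False),
    ("(124) a flag / two buildings", False, True),
    ("(108) some tourists / every museum", True, False),
    ("(121) two flags / three buildings", True, True),
    ("(129b) a doctor / less than twenty patients", False, False),
    # Not stated in the text; follows by the same reasoning as (124).
    ("(117b) four guests together / two rooms", False, True),
]

for label, distributes, choice_fn in cases:
    print(label, reference_set_size(distributes, choice_fn))
```

Under these assumptions the sketch returns 2 for (119) and (129b), 3 for (124) and (108), and 5 for (121), matching the counts given above.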
Chapter 3 Focus: The PF Interface
Identifying the focus of a given derivation is of crucial importance for the context and the inference systems. The focus constituent relates the utterance to context, and has an effect on truth conditions (inference). It is therefore essential to the interface that derivations are associated with focus. I will not be concerned here with the semantics of focus and its actual effects on the interpretation, which is a well-studied area. For this question, I assume the basic semantics of focus proposed in Rooth 1985, 1992, by which the focus is always computed against a set of alternatives. But the question at stake here is how the association with focus is obtained, namely, how the inference and context systems identify the focus of the derivation—how focus is coded. As noted in section 1.3, the question bears resemblance to the questions of scope. Quantifier scope plays a role in the inference system and determines truth conditions. It is crucial therefore for the inference system to identify the scope in a given derivation, and the question we asked was how this information is coded—that is, how the inference system knows what scope to associate with the quantifiers in each given derivation. A central debate in that area has been whether scope is marked overtly or covertly. On the first view, c-command relations at the overt structure determine scope uniquely. On the second, scope is determined by c-command relations at LF—in other words, covert operations may change overt scope. As we saw in section 1.3, there is also similarity in the history of the concepts of quantifier scope and focus in theoretical linguistics. At the earlier stages—for example, Chomsky 1971—focus was viewed, essentially, as a property defined on PF-structures.
The basic idea was that sentence stress is assigned independently, by the phonological rules, and the interface systems make use of this available stress in relating a sentence to its context, to signal the focus and presupposition structure. The focus
was defined as any constituent containing the intonation center of the sentence. So, on this view focus is coded overtly. This rests on the notion of “normal” or “neutral” intonation. Specifically, a distinction was needed between this type of normal stress, and more marked stress options required by discourse needs. This is reminiscent of the markedness view of scope-shift, which we examined in chapter 2 (Keenan and Faltz 1978; Reinhart 1983a), where the “normal” scope construals are those obtained overtly, and QR applies only to obtain marked interpretations required by the discourse. However, in both cases, the concept of markedness was problematic. As we saw in chapter 2, it turned out to be easy to find examples of scope-shift that sound perfectly natural (e.g., Hirschbühler’s An American flag was hanging in front of every building). Similarly, the distinction between marked and neutral stress has been challenged with arguments that in the appropriate context, main stress can fall anywhere, with effects hardly distinguishable from that of the neutral stress. (For an overview, see Selkirk 1984.) The conclusion reached was that there is no sentence-level generalization governing the selection of possible foci, and any expression can be a focus, subject only to discourse appropriateness. If so, then main stress cannot be assigned at PF independently of the semantics of the sentence, and it must be the other way round: sentence intonation reflects its independently determined focus structure. In fact, in Chomsky 1976, where QR was introduced for the questions of quantifier scope, the view was that, as in the case of quantifier scope, the focus constituent is identified at the covert structure (LF), requiring “focus-movement.” Consequently, any constituent that can be raised by QR can serve as focus. But this solution is problematic as well.
A problem we have already discussed regarding QR (see section 2.7) is that in the minimalist program, the hope is that syntactic movement is triggered only by needs of convergence (technically implemented as feature checking), and it is blind to any interface considerations. In an optimally designed language, the bare minimum needed for convergence should also make it possible to meet the interface conditions. This is not the case with QR and focus-movement: the derivation would converge without them, so this is superfluous movement. Though technically we can always introduce focus and quantification features to motivate movement, feature checking is only the implementation, and the problem remains the same. But in the case of focus, there is a further complication. Focus-movement indeed eliminates the problem of markedness, but this requires assuming that main stress is sensitive to semantics—that is, that it is
determined by identifying the focus. Thus, the relations between stress and structure get to be a complex issue, raising questions of visibility of the covert structure to PF-rules (stress). Regardless of how we go about it, the fact that neutral stress is not always sufficient to identify the focus is a case of imperfection. The minimum that would suffice for the computational system (namely, a simple rule of main stress) is not sufficient to meet interface requirements. We must, therefore, depart from optimal design, to enable the computational system to meet the interface conditions. Nevertheless, it is not obvious that the imperfection must be as sweeping as entailed by the focus-movement analysis—for example, that the derivation’s main stress is uniformly determined at the covert structure. Cinque (1993) reopened the markedness issue in the area of focus, and proposed, in essence, returning to the PF-view of focus. Zubizarreta (1994, 1998) pursues this line of thought, but argues that it is still necessary to assume that focus is also marked at LF, by f(ocus) features. In Reinhart 1995, 1998, I argued that if focus is marked in the overt structure, there is no need to assume any focus features, or other covert marking. To see how focus computation works, we should start with the basics of the stress system, namely, the question of how main stress is assigned to derivations. In section 3.1, we will examine the system proposed by Cinque and further developed by Szendrői (2001).
3.1 Sentence Main Stress
3.1.1 Cinque’s Main-Stress System
The broader issue Cinque (1993) is concerned with is phrase and compound stress, but the instance of this problem that is relevant for us here is sentence stress. Previous analyses, which followed, in various ways, the Nuclear Stress Rule (NSR) of Chomsky and Halle 1968, assumed that this rule is parameterized, to capture the stress patterns across languages. Halle and Vergnaud (1987) developed a metrical approach to this rule (following the metrical-line analysis of word stress, as first proposed by Liberman). The basic idea is that the NSR applies cyclically, where the cycles are determined by syntactic constituency. The input of the procedure is the sequence of (noncompound) word stresses, marked by asterisks and represented as a line. A new line is introduced for each new cycle. The NSR, then, locates the prominent stress of this line. My summary of how this works will be simplified; a more detailed summary can
be found in Cinque 1993. For illustration, let us check how we derive the fact that main stress in the simple sentence (1a) falls on book. Throughout, I will represent the word carrying the main stress of a sentence in boldface.
(1) a. I read the book.
    b. (Dat) ik het boek las. (Dutch)
(2)                              [I [read [the book]]]
    a. line 1 (= word line 3):   [* [ *   [    *   ]]]
    b. line 2 (VP cycle):        [  [          *    ]]
    c. line 3 (IP cycle):        [             *     ]
The output of word stress for (1) is (2) (which is assumed to be metrical line 3). NSR then selects one of the word stresses of line 1, and places it in line 2. The same holds for line 3. Of course, the question is how the rule knows which asterisk to place on the next line. (A simple idea such as “take the rightmost asterisk” will not do, for example, for the Dutch equivalent (1b).) Halle and Vergnaud first define the cycle as a syntactic constituent containing at least two asterisks (stressed words). In this case, one is defined as the head of the constituent line. Once the head is identified, the NSR proceeds to project the head of each line onto the next line. The gist of this procedure is, then, stated in (3).
(3) Nuclear Stress Rule
    Locate the heads of line N constituents on line N+1.
(4) Parameter setting for English (on line N (N = 3)): [+HT, right]
But the crucial question, now, is how we identify the relevant head for (3). This is why (3) has to be parameterized. For English, given the way the parameter is set in (4), the head must be in a terminal position of its constituent, and this position is to the right; thus book is selected as the relevant VP-asterisk. Given all these assumptions, the derivation in (2) goes through, giving the right result for English. It is not a trivial matter to define the parameters for a “mixed” language like Dutch. If we define it as left-headed, we will get the stress of (1b) correctly, since the VP leftmost stress (boek) will be projected. But with an intransitive sentence, the leftmost stress will be the subject, which may then get the main stress incorrectly. Cinque’s insight is that, in fact, no parametrization of the stress rule is needed. Apart from the empirical problems of such parametrization, it is
doing nothing more than an unneeded duplication of the mechanism that governs, independently, word-order variations in syntax. Assuming that we need to know independently what the direction of recursion in a language is, the same (and better) results will be obtained by applying the one universal stress rule, starting with the most embedded constituent of the sentence. The basic idea is as follows: let us assume that the first cycle of the stress rule is the most deeply embedded stress—that is, a category containing only one (word-level) stress. The stress rule now needs no mention of heads or their order, and it can be stated with a slight simplification in (5). As far as I can see, the rest follows with no further assumptions. I should mention that I am not fully loyal to Cinque’s actual execution. He assumes a greater machinery than I do here, though I think I capture correctly his intuition. Nothing here hinges on this being the case, and if my presentation is mistaken, one can go back to Cinque’s precise formulation.1 Let us see how the derivation of the stress of (1a) follows.
(5) Generalized Stress Rule
    Locate the stress (asterisk) of line N on line N+1.
(6)                              [Max [read [the book]]]
    a. line 1 (= word line 3):   [ *  [ *   [    *   ]]]
    b. line 2 (NP cycle):        [    [     [    *   ]]]
    c. line 3 (VP cycle):        [    [          *    ]]
    d. line 4 (IP cycle):        [               *     ]
Let us assume that the most deeply embedded constituent is the object (a point I will return to). The first cycle line, (6b), is then the NP (or N). Since there is only one stress for this cycle in the previous line, it is this stress that projects to the present line. From then on, there are no more options, and each cycle projects this same stress. Thus, the gist of the analysis is that the main stress of the sentence will always be on its most embedded constituent, namely, on the node we started stress processing with. Of course, everything now depends on the correct identification of the most embedded node. Specifically, the problem arises in the case of sisters (both carrying stress). Cinque argues that the answer lies in the order of recursion. Given two sisters, the most embedded one is that occurring on the recursive side of the tree. At first glance, this may seem like begging the question, but Cinque’s point is that the order of recursion, or whatever determines word order, is a problem independent of stress, the answer to which is the goal of current syntax.
Once the answer is found, the stress pattern should follow. Thus, in a right-branching language like English, in the VO structure in (7), the most embedded node is the object. In a left-branching language like Dutch (in this relevant structure), it is again the object.
(7) Asymmetry of sisters
Zubizarreta (1994) argues that, in fact, it is not correct to just talk about order of recursion here, and that depth of embedding is determined by head-complement relations. With this assumed, the Dutch (1b), repeated in (8), is derived as in (9).
(8) (Dat) ik het boek las. (Dutch)
          I the book read
(9)                  [ik [[het boek] las]]
    a. Word stress   [*  [[    *   ] *  ]]
    b. NP-cycle      [   [[    *   ]    ]]
    c. VP-cycle      [   [     *        ]]
    d. IP-cycle      [         *         ]
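The effect of the Generalized Stress Rule in (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not Cinque's formulation: constituents are written as nested lists of stressed words (a hypothetical encoding, with the unstressed determiner left out), depth of embedding is simply the nesting depth, and the function name `main_stress` is invented here. Sisters of equal depth and complex subjects, which need the further machinery discussed in the text, are not handled.

```python
def main_stress(tree, depth=0):
    """Return (depth, word) for the most deeply embedded stressed word.

    Starting the cycle at the deepest constituent and projecting its single
    stress upward on every later line, as in (6) and (9), means that the
    deepest word ends up carrying the main stress.
    """
    if isinstance(tree, str):
        return depth, tree
    # Compare the deepest word of each daughter; keep the deeper one.
    return max((main_stress(child, depth + 1) for child in tree),
               key=lambda pair: pair[0])


english = ["Max", ["read", ["book"]]]   # (6): [Max [read [the book]]]
dutch = ["ik", [["boek"], "las"]]       # (9): [ik [[het boek] las]]
intransitive = ["ik", ["las"]]          # (dat) ik las

for sentence in (english, dutch, intransitive):
    print(main_stress(sentence)[1])     # book, boek, las
```

No head or direction parameter appears in the function: the same procedure picks the object in both the English VO and the Dutch OV structure, and the verb in the intransitive case.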
The intransitive case appears nonproblematic, at this stage: given a sentence like (dat) ik las ‘I read,’ the first cycle assigns stress to V (or to VP—nothing hinges on this, in this case). Since the VP and the subject are not sisters, the issue of embedding does not arise, and it is clear where the stress processing starts. Hence, the main stress will fall on the verb. More problematic are structures where the subject (or another adjunct or specifier) is a complex constituent, containing more embedding than the VP. In this case the main stress still falls on the deepest constituent of the VP, and the question is how this happens. Cinque assumes that the subject constitutes a cycle of its own. In this, he follows Halle and Vergnaud, who noted, independently of this problem, that the subject always gets secondary stress (higher than nonstressed nodes in the VP). The issue, then, becomes that of how to merge two cycles, each carrying its own main stress. For that purpose, Cinque defines the notions of major and minor paths of embedding. The main stress always falls on the major path, but when a minor path joins it, it gets secondary stress (one asterisk).
Zubizarreta (1994) offers a different formulation of this merging, sensitive to the complement/adjunct distinction, but for our purpose here these details are not crucial. Cinque argues that his stress rule applies directly to syntactic constituents and that no notions like a phonological or prosodic phrase are needed. The question of what the relevant constituents for phrasal stress are has been the subject of much debate. Cinque’s approach contrasts with the view developed by Selkirk (1984), where the stress rule applies to phonological phrases, related but not isomorphic to syntactic constituents. Zubizarreta (1994, 1998) points out that Cinque’s analysis can also be stated to apply to phonological phrases, so this question need not be decided here.2
3.1.2 Szendrői’s Main-Stress System
Szendrői (2001) presents an alternative technique for the execution of the stress rule. She uses Liberman’s (1979) metrical-tree notation. In this method, there are no separate cycles like the NP-cycle or the VP-cycle as in Cinque’s system. Rather, stress is assigned to the nodes of the syntactic tree (or alternatively, the prosodic structure). An advantage of this system is that it is fully transparent how it applies to syntactic (or prosodic) trees, and thus it lends itself to strictly incremental application. Stress is determined at each pair of nodes, as the derivation is built. Let us see how this system works. In the simplest case, when the tree contains just two terminal nodes, the sisters a and b in (10), one of the nodes is assigned a Strong (henceforth S) label, while the other node receives a Weak (henceforth W) label. The topmost, root node always receives an S-label. This is illustrated in (10).
(10) [Tree diagram: the root node, labeled S, dominates the sisters a (W) and b (S).]
What this means is that b is prosodically more prominent, ‘‘stronger,’’ than a. So, by assumption, b bears more stress relative to a. One of the main advantages of this notation is that by assigning Strong-Weak pairs, it captures the inherently relational nature of stress. There is no such thing as a stressed element in the absolute sense. In other words, assuming that there is a phonetic cue (or a set of cues) associated with stress, there is no value x, above which an element is stressed and below which it is unstressed.3 Rather, we take an element to be stressed if it bears more stress than another element.
Given the above, one might think that Strong-Strong and Weak-Weak pairs are excluded. This is true. However, importantly, the two do not have the same status. As I will explain in the discussion of destressing in section 3.3, two Weak nodes may occur adjacent. In contrast, two Strong nodes may never be adjacent, due to the Obligatory Contour Principle (Goldsmith 1976), which disallows adjacent stresses (or identical tones). But let us concentrate on the technicalities of stress assignment in a metrical-tree-based system. Having seen the simplest configuration in (10), we will take a more complex example, such as (11). In (11), Z, being the root node, receives an S-label, while the sisters, X–Y, a–b, and c–d receive W-S labels. Thus, we end up with two terminal nodes that receive a Strong label: b and d. But which is “stronger”? In other words, which node bears primary stress and which bears secondary stress?
(11) [Tree diagram: the root Z (S) dominates X (W) and Y (S); X dominates c (W) and d (S); Y dominates a (W) and b (S).]
If we look at the subtrees dominated by X and Y separately, the strongest nodes in the subtrees are d and b, respectively. Since X itself has a Weak label, while Y is Strong, it is the strongest node of Y (i.e., b) that is the strongest node of the whole tree. Thus, b receives main stress. The terminal node d, being the strongest node of the subtree whose topmost label is Weak, bears secondary stress within the whole tree. On the basis of the above, we can give the following definitions for primary and secondary stress.
(12) Main stress falls on the terminal node that is connected to the root node by a path that does not contain any Weak nodes, including the root node itself and the terminal node (i.e., Root_S X1_S X2_S ... Xi_S a_S).
(13) Secondary stress falls on the terminal node whose path to the root node contains only S-nodes, except for exactly one W-label on the node immediately dominated by the root node (i.e., Root_S X1_W X2_S ... Xi_S a_S).
So far we have seen that a Strong and a Weak label are assigned at each branching node. But I have said nothing about the order of assignment. Szendrői maintains Cinque’s insight that stress is assigned following the order of embedding. Thus at each branching node, a Strong label is assigned to the node that is syntactically more embedded. Its sister receives a Weak label. The reformulation of the rule in these terms is given in (14). Compare (14) with (5) above.
(14) Generalized Stress Rule (metrical-tree version)
Assign a Strong label to the node that is syntactically more embedded at every level of the metrical tree. Assign Weak to its sister node.
In (15) it is shown how the derivation of the stress follows in a metrical-tree diagram, given Cinque’s insight that main stress falls on the most embedded constituent on the recursive side of the tree. (Compare (15) with (6) above.) The object receives main stress as it bears an S-label and is only dominated by S-labels. The subject receives secondary stress as it is the strongest node (in this case the only node) in the Weak-labeled subtree immediately dominated by the root node.
(15)
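The definitions in (12)–(14) can be read procedurally. The following Python fragment is only a minimal sketch under stated assumptions: the `Node` class, its `embedded` field (which simply stipulates, for each branching node, which daughter counts as syntactically more embedded), and the function names are hypothetical conveniences, not part of Szendrői’s formulation. The sketch labels a tree as in (14) and then reads off main and secondary stress as in (12) and (13).

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Node:
    name: str
    children: Tuple["Node", ...] = ()
    embedded: int = 1      # assumption: index of the more embedded daughter
    label: str = "S"       # "S" (Strong) or "W" (Weak)


def assign_labels(node: Node, label: str = "S") -> None:
    """(14): the root is S; at each branching node the more embedded
    daughter is labeled S and its sister W."""
    node.label = label
    for i, child in enumerate(node.children):
        assign_labels(child, "S" if i == node.embedded else "W")


def terminal_paths(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (terminal name, labels on the path from the root down to it)."""
    path = path + (node.label,)
    if not node.children:
        yield node.name, path
    for child in node.children:
        yield from terminal_paths(child, path)


def main_stress(root: Node) -> List[str]:
    """(12): the path from the root contains no W label."""
    return [n for n, p in terminal_paths(root) if all(l == "S" for l in p)]


def secondary_stress(root: Node) -> List[str]:
    """(13): exactly one W, on the node immediately dominated by the root."""
    return [n for n, p in terminal_paths(root)
            if len(p) > 1 and p[0] == "S" and p[1] == "W"
            and all(l == "S" for l in p[2:])]


# The abstract tree in (11): Z dominates X (c, d) and Y (a, b).
tree11 = Node("Z", (Node("X", (Node("c"), Node("d"))),
                    Node("Y", (Node("a"), Node("b")))))
assign_labels(tree11)
print(main_stress(tree11), secondary_stress(tree11))   # ['b'] ['d']

# Dutch (8), with the object stipulated as the more embedded daughter of VP.
dutch = Node("IP", (Node("ik"),
                    Node("VP", (Node("boek"), Node("las")), embedded=0)))
assign_labels(dutch)
print(main_stress(dutch), secondary_stress(dutch))     # ['boek'] ['ik']
```

The sketch deliberately leaves open how the more embedded sister is identified (order of recursion or head-complement relations); that choice is passed in as data rather than computed.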
Among the advantages of this system is that it provides a way to derive Halle and Vergnaud’s insight that the subject always bears secondary stress. Cinque claimed that the subject and the VP constitute two separate cycles, the VP being the major path, while the subject is the minor path, by assumption. He also assumed that when the two cycles meet, the main stress in the minor path becomes secondary and the main stress of the major path becomes the main stress of the whole. In the present formulation, the root node immediately dominates two subtrees: one with a Weak label and one with a Strong label. The former corresponds to Cinque’s
minor path, while the latter is the major path. The fact that the two paths come together and thus secondary and primary stresses are determined is a property of the system, which identifies one Strong and one Weak node at every level of embedding. It does not have to be stated separately that one of the paths is superior to the other.
3.2 How Focus Is Coded
3.2.1 Main Stress and Focus: The Basic View
The analysis of sentence stress outlined so far is independent of any discourse considerations: it is impossible to utter a sentence with no prominent stress, so the PF-rule we examined, (5) or (14), determines where this stress will fall. The main stress of the sentence, which is assigned by this rule, is just a particular instance of stress assignment, which is needed independently (e.g., for units smaller than a sentence). However, sentence accent interfaces with the theory of discourse, via the notion of focus. Focus, which is roughly viewed as the most informative part of an utterance, is usually identified by prominent stress. The gist of Cinque’s proposal is that the set of possible (neutral) foci in a sentence is determined by its main stress, that is, by the same rule of phrasal stress. I will return shortly to how precisely this works. On this issue of the relations between main sentence stress and focus, two conflicting positions exist. The one that Cinque returns to is that possible focus selections are restricted by an independent PF-stress rule. The other is that there is no such thing as a (neutral) PF-stress, and that the main stress of the sentence is determined solely by its relations to discourse, that is, by focus. Cinque surveys common counterarguments to the position he defends and concludes that discourse considerations may at times interfere with the results of the phrase-stress rule, assigning a different stress prominence. But he assumes that the two types of prominence can be distinguished. For him, the relevant distinction is that between sentence grammar and discourse grammar. The latter can change the output of the computational system: if in a given context, it is appropriate to use as a focus a constituent that was not assigned the main stress by “sentence grammar,” “discourse grammar” assigns an additional stress to this constituent, or destresses the original prominent stress.
Zubizarreta (1994) develops this approach, and argues that the relevant distinction is that between a neutral focus and a marked one. Neutral-focus intonation is often characterized as the intonation under which a sentence could be uttered “out of the blue,” that is, the whole sentence
is asserted (as “new”) and none of its constituents need to be preassumed in the context (no “presupposition”). Zubizarreta argues, then, that what Cinque’s stress rule determines is the neutral-focus intonation of a sentence. When a sentence with this intonation is uttered out of the blue, the full sentence can be viewed as the focus phrase.4 But the central point of Cinque’s and Zubizarreta’s analysis is that, under the same neutral-focus intonation, a sentence can also be used with only one of its constituents as the focus (and the rest preassumed). Crucially, the full set of the possible (neutral) focus constituents of the sentence is determined by the same rule of phrasal stress. Cinque’s generalization is given in (16).
(16) The focus of IP is a(ny) constituent containing the main stress of IP, as determined by the stress rule. (This is Cinque’s “sentence-grammar” focus, and Zubizarreta’s “neutral focus.”)
As Cinque notes, his analysis goes back, in its essence, to the view of focus in Chomsky 1971. A way to check the prediction that any of the constituents dominating the main (neutral) stress can serve as focus is by checking the set of possible substitutions. For instance, in the context of a yes/no question in (17), modeled after Chomsky’s example, the different answers correspond to different selections of focus in the question. The focus in each answer, which is the underlined F-bracketed constituent, substitutes for one of the possible foci in the question, namely, one of the constituents dominating the main stress of the question.
(17) Are you [looking for [a passenger with [a red [shirt]]]]?
a. No, I am looking for a passenger with a red [F tie]
b. No, I am looking for a passenger with [F a coat]
c. No, I am looking for [F a member of the crew]
d. No, I am [F just wandering around]
3.2.2 PF-Coding: The Focus Set
Let us examine further the question that underlies this line of analysis.
At the interface, sentences must be fit to context and purpose of use. One of the means for relating sentences to discourse is focus. The computational system should, therefore, provide us with sufficient means to identify the focus constituent, so the question is how it does that, namely, where and how the focus information is coded (signaled). Since 1976, the prevailing answer has been that it is coded at LF, that is, at the covert structure. This has been obtained either by covert movement (QR) of the focus constituent, or by attaching a focus feature to nodes in the syntactic derivation, or, most commonly, by both: attaching a focus feature to a
constituent, to license its movement (which, interestingly, is viewed by some as more minimal than doing just one of these two). It is technically possible to combine Cinque’s distinction between neutral and marked stress with the assumption that the focus is coded covertly at LF. This is the implementation chosen by Zubizarreta, who states the focus rule (16) as a restriction on nodes marked +F(ocus). However, as argued in Reinhart 1995, 1998, this is clearly not a conceptual necessity. A more minimal and realistic assumption is that the focus constituent is coded at PF, as essentially assumed in Chomsky 1971. Since main stress is a requirement of the computational system (a derivation cannot be pronounced without main stress), an optimal language system would make use of this visible property of derivations to code information needed at the interface. Szendrői (2001) points out several theoretical and technical problems with prevailing feature-based approaches to the coding of the focus constituents. One of her arguments is based on the notion of inclusiveness. As Chomsky (1995, 228) formulates this notion, “A ‘perfect language’ should meet the condition of inclusiveness: any structure formed by the computation . . . is constituted of elements already present in the lexical items selected for N [the numeration]; no new objects are added in the course of the computation apart from rearrangements of lexical properties”. . . . Let us assume that this condition holds (virtually) of the computation from N to LF. Szendrői argues that at least in the present formulations of the feature-based approach, [+F] violates the Inclusiveness Condition, because there is no sense in which bearing [+F] is a lexical property of an item. Thus, [+F] is nothing more than a diacritic introduced into the computation to account for something that does not directly relate to the lexical item bearing [+F] (see Zubizarreta 1998 for the same claim).
Suppose that it can even be argued somehow that the [+F] feature is associated with a lexical item in the numeration to satisfy some discourse need. Still, the fact of the matter is that the feature must be inserted on the lexical node that will bear main stress. But the semantic (discourse) import of this feature is not necessarily associated with the lexical node it is attached to, because the actual focus can be a wider projection. Either way we look at it, it is not a property of the lexical item that is coded with this feature, but rather its position in the prosodic structure. The broader conceptual problem with approaching focus identification with assignment of features is that focus, unlike main stress, is not a property of a constituent or a node. By definition, focus is a relation between
an expression and a sentence (see, e.g., its semantics in Rooth 1985, 1992; but all approaches to the semantics of focus since at least Jackendoff 1972 have that same property). The focus can thus be defined only at the level of the sentence. An assignment of a focus feature is a technical trick that can perhaps be made to work, but it codes information that cannot, in principle, be present in the numeration. This is different, for example, from the case of the Q(uantifier) feature, which was proposed to mark DPs that may undergo QR. In that case, the properties that define the DP as [+Q] are internal to that DP (say, its determiner). It is at least arguable that its quantificational nature is traceable to the lexical entry and is thus information available to the syntactic derivation (present at the numeration). Hence there was no conceptual problem in assuming such a feature, and my arguments revolved around the question of whether this is necessary and whether it can do the needed empirical work. Even if we ignore the conceptual problem, and assuming that all technical problems with feature-based approaches to focus that are pointed out by Szendrői (2001) and by Neeleman and Szendrői (2002) can be solved, the fact that none of this is needed is a sufficient argument against it. As mentioned in the discussion of QR in chapter 2, coding information at the overt structure is, in any case, a more efficient mode of communication than coding it covertly in the inaudible and invisible sphere. A language system that encodes focus overtly, then, is more optimal for interface needs. Part of the reason that this simpler line of argument seemed untenable in linguistic theory was conceptual. A leading assumption regarding the architecture of language has been the T-model (LF and PF being the two top branches of the T).
On this view, the interface of the CS with the inference and context systems must be only via the LF-branch, and information of the phonological spell-out is not accessible; that is, the PF-coding is not legible to the semantic (inference) system. So the two branches must be completely separate and independent. This, however, is not a conceptual necessity. From the perspective of enabling communication, the picture is almost the opposite. A system perfect for communication would be one in which precisely the same syntactic tree that is needed to enable the semantic interface would also be the one that is spelled out phonetically. This would mean that the syntactic tree can be read directly and efficiently from the phonological input. (For example, phonological boundaries correspond exactly to phrase boundaries, and intonational contours reflect hierarchy relations of the tree.) PF, in this case, is nothing but the physical coding
of an abstract syntactic tree. Turning to the view of the semantic interface in this perfect communication system, PF is legible to the inference systems, in the sense that all information regarding the syntactic tree that is needed for constructing semantic representations (propositions) is obtained through the PF-coding. In other words, PF is also LF. That human language may be such a perfect system was proposed, in effect, by Cinque, who argued, as we saw, that main stress is determined directly on the syntactic tree. However, there seems to be ample evidence from phonology that human language is not that perfect. There are independent phonological requirements that can only be satisfied assuming a distinct derivational phonological tree. From the syntactic perspective, it has been widely assumed that there are clear instances of covert processes that cannot be reflected at PF. So the phonological and the syntactic derivations cannot be isomorphic. Nevertheless, there are various imaginable degrees of imperfection. We could imagine a system of the type proposed in Jackendoff 1997 where the PF-coding is done on representations completely unrelated to the syntactic representations, and their association requires a whole set of linking rules that associate nodes in one derivation with nodes in the other. This is the least user-friendly system. Each sentence requires processing two independent derivations and computing their links. The efficiency of such a system depends dramatically on the efficiency of the linking rules, and the more such rules are needed, the more room this leaves for errors of computation. But the most common approach to the derivation of the phonological tree is that it is constructed by applying certain well-defined and restricted operations on the syntactic tree.
Thus, the phonological phrase is defined based on the syntactic phrase, but taking into account edges that are not imposed by the syntactic tree (e.g., Selkirk 1984; Nespor and Vogel 1986). On this latter view, much of the information of the syntactic tree is still recoverable from the phonological tree, as in the perfect system we examined above. But the representations are not fully isomorphic. In some cases, the hearer, in communication, or the inference systems at the interface, still have to reconstruct some operations from the PF-representation in order to extract the relevant syntactic information. This, then, is still some imperfect variant of a system in which LF-information is fully recoverable from PF. We might as well hope that human language indeed falls within the boundaries of this type of model, and attempt to define the information needed for the semantic interface in such a way that it is recoverable from PF.
Neeleman and Weerman (1999), followed by Neeleman and Reinhart (1998), argue that possibly the strict adherence to the T-model also hindered understanding of syntactic processes. They argue, for example, that case checking can take place across languages either in the government domain of the syntactic tree, or in the phonological phrase, which is not always identical. If the phonological tree is built incrementally, which is entailed by the minimalist view, particularly Chomsky 2001, there is no reason why the phonological phrase should not be an available domain for checking, and the choice of domain can be parametrized. Nevertheless, my argument here, that focus is coded at PF, is not dependent on the option I have just outlined that LF is recoverable from PF. Maintaining the T-model, we may still assume that the inputs to the inference (semantics) system are ⟨LF, PF⟩ pairs, so it is only in restricted cases, like focus identification, that the relevant information is coded at PF. What is needed on the view that focus is coded at PF is a rule or definition that tells inference and context how to identify the focus unit, based on intonation. The generalization in (16) provides the basics. In implementing this view of overt coding, I pursue a line of thinking suggested in Reinhart 1981a, for the analysis of topics. On that analysis, rather than associating derivations with a single topic, each derivation is associated with a set of possible pragmatic assertions (PPA-set). The set is defined for each derivation within the syntax, but discourse procedures select a member of this set, which is appropriate to the given context. Extending this view to foci, each derivation is associated not with an actual focus, but with a set of possible foci, namely, a set of constituents that can serve as the focus of the derivation in a given context.
This set is determined by the computational system at the stage where both the syntactic tree and stress are visible, that is, the focus selection applies either to a configurational PF-structure, or to a pair ⟨PF, LF⟩, of sound and configurational structure.5 The focus generalization (16) can then be stated as the definition of the focus set associated with each derivation, as in (18). If stress falls on the object, either in English SVO structures or in Dutch SOV structures, the focus set defined by (18) is the one in (19c).
(18) Focus set
The focus set of a derivation D includes all and only the constituents that contain the main stress of D.
(19) a. [IP Subject [VP V Object]]
b. [IP Subject [VP Object V]]
c. Focus set: {IP, VP, Object}
This means that in actual use, any of the members of the set in (19c) can serve as focus. At the interface, one member of the focus set is selected as the actual focus of the sentence. For illustration, let us look at a concrete example (of a familiar type). Sentence (20a), which is generated with stress on the object, can be used as an answer in any of the contexts in (20b–d), with the F-bracketed constituent as focus.
(20) a. My neighbor is building a desk.
b. Speaker A: What’s this noise?
   Speaker B: [F My neighbor is building a desk]
c. Speaker A: What’s your neighbor doing these days?
   Speaker B: My neighbor [F is building a desk]
d. Speaker A: What’s your neighbor building?
   Speaker B: My neighbor is building [F a desk]
At this stage, it is up to the discourse conditions, rather than syntax, to determine whether a derivation with a particular stress is appropriate in a given context. A language system that codes its foci successfully in this way may be viewed as perfect: the stress rule that is needed independently is sufficient to code the foci needed for the interface. Thus, the bare minimum needed for the computational system is fully sufficient for the interface. Natural language, alas, is not that perfect. There are more context needs than can be satisfied by this simple system. A derivation is inappropriate in context if no member of its focus set can be used as an actual focus in that context. Statement (20a), for example, cannot be used as an answer in either of the contexts of (21). (The # sign indicates, throughout, inappropriateness to context.)
(21) a. Speaker A: Has your neighbor bought a desk already?
   Speaker B: #No, my neighbor is [F building] a desk.
b. Speaker A: Who is building a desk?
   Speaker B: #[F My neighbor] is building a desk.
This is so because in the contexts of (21), the F-bracketed constituents should be the foci, but these constituents are not in the focus set generated by (18) for a sentence in which the object bears stress (cf. 19).
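The focus-set definition in (18), and the notion of appropriateness just illustrated, can be sketched in a few lines of Python. This is an illustration only, under simplifying assumptions: trees are encoded as nested tuples of the form (label, daughter, ...), terminals are plain strings, constituents are identified by their labels, and the names `focus_set` and `appropriate` are hypothetical.

```python
from typing import List, Tuple, Union

# Assumed encoding: a terminal is a string; a phrase is (label, daughter, ...).
Tree = Union[str, Tuple]


def focus_set(tree: Tree, stressed: str) -> List[str]:
    """(18): all and only the constituents that contain the main stress,
    listed from the outermost constituent down to the stressed terminal."""
    if isinstance(tree, str):
        return [tree] if tree == stressed else []
    for daughter in tree[1:]:
        inner = focus_set(daughter, stressed)
        if inner:
            return [tree[0]] + inner
    return []


def appropriate(tree: Tree, stressed: str, required_focus: str) -> bool:
    """A derivation fits a context only if the focus that the context
    requires is a member of the derivation's focus set."""
    return required_focus in focus_set(tree, stressed)


svo = ("IP", "Subject", ("VP", "V", "Object"))   # (19a)
sov = ("IP", "Subject", ("VP", "Object", "V"))   # (19b)

print(focus_set(svo, "Object"))            # ['IP', 'VP', 'Object']
print(focus_set(sov, "Object"))            # ['IP', 'VP', 'Object']
print(appropriate(svo, "Object", "VP"))    # True, as in (20c)
print(appropriate(svo, "Object", "V"))     # False, as in (21a)
```

With stress on the object, both word orders yield the set in (19c), and the contexts in (21), which require the verb or the subject as focus, come out as inappropriate.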
We have encountered, then, an imperfection of the computational system: it is not fully sufficient for interface needs. Hence, some repair mechanism is needed to adjust the derivation to a context like (21). Assuming, as we have been, that focus is always coded at PF, by main stress, the repair mechanism must be an operation shifting the main
stress. Let us first state this operation schematically as (22), and we will turn to more precise details in the next section.
(22) Relocate the main stress.
In the context of (21a), repeated in (23a), (22) shifts the main stress to the verb, yielding (23aB). As a result, the verb is in the focus set defined by (18), and the derivation is appropriate in this context. In (23b), main stress shifts to the subject.
(23) a. Speaker A: Has your neighbor bought a desk already?
   Speaker B: No, my neighbor is [F building] a desk.
b. Speaker A: Who is building a desk?
   Speaker B: [F My neighbor] is building a desk.
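The division of labor between neutral stress and the repair operation (22) can be illustrated with a small Python sketch. It is a deliberate simplification, not the reference-set computation itself: the tuple encoding of trees, the function names, and the assumption that a shifted stress lands directly on a terminal that is the required focus are all hypothetical choices made for the example.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]   # assumed encoding: terminal string or (label, daughter, ...)


def focus_set(tree: Tree, stressed: str) -> List[str]:
    """(18): the constituents that contain the main stress."""
    if isinstance(tree, str):
        return [tree] if tree == stressed else []
    for daughter in tree[1:]:
        inner = focus_set(daughter, stressed)
        if inner:
            return [tree[0]] + inner
    return []


def select_stress(tree: Tree, neutral: str, required_focus: str) -> Tuple[str, bool]:
    """Return (stressed terminal, marked?).

    Neutral stress is kept whenever the required focus is already in its
    focus set; only otherwise is the main stress relocated, as in (22).
    Assumption: in the relocation case the required focus is a terminal.
    """
    if required_focus in focus_set(tree, neutral):
        return neutral, False
    return required_focus, True


svo = ("IP", "Subject", ("VP", "V", "Object"))

print(select_stress(svo, "Object", "VP"))        # ('Object', False), cf. (20c)
print(select_stress(svo, "Object", "V"))         # ('V', True), cf. (23a)
print(select_stress(svo, "Object", "Subject"))   # ('Subject', True), cf. (23b)
```

The check that precedes relocation is meant to mirror, very roughly, the idea that stress shift applies only when the neutral derivation offers no suitable focus.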
The output of (22) is what is called marked stress. Although they sound perfectly natural in their context, the foci in (23) are marked, since they are obtained by a superfluous operation that undoes the results of the Nuclear Stress Rule. As we will see, applying (22) is not just superfluous, as far as the needs of the computational system are concerned, but it is a costly operation that requires reopening completed segments of the derivation, or undoing previous steps. This is what defines it as an illicit operation. The logic of the system as outlined in previous chapters is that applying this illicit operation requires consulting a reference set to verify that there is no other choice but to apply it. Before we examine the reference-set computation involved here, however, we need to get a clearer picture of the nature of stress-shift.
3.3 Stress Operations
3.3.1 Focus and Anaphora
Note, first, that there are, in fact, two reasons why speaker B’s response in (21b), repeated here, is inappropriate.
(21b) Speaker A: Who is building a desk?
      Speaker B: #[F My neighbor] is building a desk.
(23b) Speaker B: [F My neighbor] is building a desk.
The reason that we observed is that in this context, the subject my neighbor should be the focus, but this constituent is not in the focus set of the derivation with this stress. The other problem is that the object a desk is inappropriately stressed. The object does not carry new information in this context, having just been mentioned in the previous sentence. But the
stress it carries suggests that it does. Both problems appear to be solved with the shift of the main stress in (23b), repeated above. But as we will see, this is a byproduct of the specific example, and in most cases it appears that two distinct operations are involved in determining the stress pattern. Cinque (1993) argued briefly that stress-shift involves, in fact, two distinct operations. One is the destressing of a stressed element; the other is the strengthening of an element that does not bear the main stress. In Reinhart 1995 I attempted to reduce these operations to just one, arguing that “ideally, rather than assuming two distinct operations, we should be able to show that the availability of both results is entailed by the same focus-computation system” (p. 75, section 4.5 of part III). But the analysis required certain complications in the definition of the focus set, and, nevertheless, some serious empirical problems remained unsolved. The conclusion I have drawn (since Reinhart 1998) from this failed exercise in unification is that there are two distinct processes responsible for stress effects, as argued in Selkirk 1984, 1996, and correspondingly, two distinct stress operations. One is anaphoric destressing, and the other is stress-shift that applies when it is necessary to add a member to the focus set. Williams (1997) and Schwarzschild (1999) drew a different conclusion from the failure to capture the full stress pattern in discourse by just stress-shift for focus. Still attempting a unified account, they argue that the only relevant factor is, in fact, anaphoricity. They develop an optimality-based account, which derives the various stress and focus options by selection of the optimal candidate, based on the basic concept of anaphoricity (Williams) or givenness (Schwarzschild). As far as I can judge, in their empirical coverage these approaches are essentially identical, but they differ in implementation.
Schwarzschild presents the problem by looking at the widely held generalization in (24), where prominence means stress prominence.
(24) a. Lack of prominence indicates givenness.
b. Prominence indicates novelty.
He points out that the assumptions in (24) cannot both be true in view of examples like (25).
(25) a. Question: Who did John’s mother vote for?
b. Answer: She voted for him.
In (25b) him, referring to John, has prosodic prominence or stress. Still, John has just been mentioned in the previous discourse, so the pronoun
represents given information. Schwarzschild argues that while (24a) captures an observationally correct generalization, (24b) cannot be defined in a way consistent with the facts. He concludes, therefore, that while the association of lack of stress prominence with givenness is a real discourse phenomenon, the association of stress prominence with novelty is unfounded, and more broadly, that the concept of novelty does not in fact play any independent role in discourse. I agree with the second conclusion. The term novelty (or new information) has been used in linguistic theory under a broad variety of definitions or descriptions, which in fact do not have much to do with each other. For instance, Schwarzschild (1999, 41) cites Halliday 1967 as giving the following three descriptions of new: (a) ‘‘textually and situationally non-derivable information’’; (b) ‘‘contrary to some predicted or stated alternative’’; (c) ‘‘replacing the WH-element in a presupposed question.’’ However, the concept of novelty is not under consideration here. Our basic assumption is that sentence main stress is associated with focus. And focus, indeed, cannot be defined by novelty. Rather, as mentioned, I assume the general lines of the analysis of focus in Rooth 1985, 1992. On this account, the focus is always computed against a set of alternatives (leaving aside here questions of the interpretation of the IP-focus). Adding information to an ongoing discourse means excluding options, or narrowing the set of compatible propositions. The focus can be viewed as marking what is excluded. Of the three definitions cited from Halliday, then, this is closest to definition (b), and the use of focus in question contexts, as in Halliday’s (c), is just a specific instance of the same definition. Halliday’s (a) plays no role in defining the focus. 
Returning to Schwarzschild’s example (25), there is no conceptual problem in viewing John as both the focus and as anaphoric, or given, if we assume that these are just independent notions (and abandon the assumption (24b), as Schwarzschild argued we should). I return to how stress is derived in this case in the next subsection. However, Schwarzschild proceeds to argue (like Williams) that since givenness is a concept that is clearly needed in discourse, it should be taken as the basic one, and it should suffice to derive all the phenomena associated with identifying the focus. He offers an optimality system and a set of definitions that are able, indeed, to do that, as far as I can judge, at least for languages like English. Unlike the view presented here, there is no distinction between neutral and shifted stress, and the execution requires reference-set computation in both cases, as also in Williams 1997. In the present system,
this is required only with stress shift, as we will see in section 3.4. From the perspective of optimal design, extending this costly computation to all derivations seems too big a price to pay for unification. In chapter 5, we will see that children are unable to process stress-shift in comprehension, and I argue that this is the case because they are unable to execute the required reference-set computation. If the same computation is involved in all instances of stress, it is not easy to see how children manage to comprehend anything at all, given that stress is present at each derivation. Independently of the specific optimality implementation of Williams and Schwarzschild, I believe the distinction between (non-)anaphoricity and focus is real and cannot be replaced. This is obvious if we consider the interaction of these two notions with stress. As we will see, destressing obtained by stress-shift and destressing obtained by anaphoric destressing do not have the same prosodic properties. Cinque does not elaborate on the way the two operations differ. But Neeleman and Reinhart (1998) argue that they not only have different prosodic properties, but also completely independent discourse functions. Stress strengthening is an operation on the focus set, employed to derive foci not in the set, while destressing is an anaphoric process, independent of the focus set. In the specific case of (23b) ([F My neighbor] is building a desk), it is indeed the case that only one stress operation applied, as argued in Reinhart 1995: although the object a desk does not carry new information, it is also not anaphoric. Generally, indefinite NPs are not anaphoric elements referring to entities already in the context set, and they are normally not anaphorically destressed.6 Hence, only strengthening took place, and the destressing of the object just reflects the fact that it no longer carries the main stress.
But there are many contexts where the two operations can be clearly distinguished. Let us survey Neeleman and Reinhart’s analysis. Anaphoric destressing applies when a DP (or another constituent) denotes an entity already in the context set—that is, an entity previously mentioned in the discourse or available in the situation (I will elaborate on this notion below). A denotation of this type is often found with definite DPs, as can be observed in (26), but it is most noticeable with pronouns. Whether a definite DP is anaphoric depends on the previous context. Hence, without such context, judgments are not always clear. But pronouns are mainly used anaphorically and hence they are almost obligatorily destressed. If the object is destressed, the stress of the verb becomes the prominent stress in VP, as illustrated in (27).
(26) Speaker A: That man over there is a famous writer.
Speaker B: I was just thinking that I know that face.
(27) a. #Max saw her/it.
b. Max saw her/it.
The other stress operation assigns an extra stress to the verb (or any other element selected as focus), without destressing the object (or whatever other element bears neutral stress). The result is that the object carries less stress than the verb, but that some secondary stress is still present on it, unless the object is independently destressed for reasons of anaphora. It is only this operation that I will refer to here as main-stress shift. Typically, stress shift applies when it is needed to create a focus not already in the focus set. In (28a), for example, the verb seeing is by itself not a possible focus with neutral stress. Strengthening its stress enables it to serve as the only focus.
(28) a. Max can only afford seeing cars.
b. Max can only afford seeing her.
(29) a. Only Max can afford buying cars.
b. Only Max can afford seeing her.
When main-stress shift applies to the verb, as in (28), there is no actual intonational difference between the effects of destressing and main-stress shift. In both one hears a stronger stress on the verb than the Nuclear Stress Rule would assign to it. Thus, in (28b) a destressing operation has applied to her, independently of any focus requirement. But there is no difference between the resulting prosodic pattern and the pattern in (28a). As we will see in the next subsection, this follows independently from the mechanism of stress assignment, under Szendrői’s implementation. An intonational difference is easily observed, however, when main-stress shift applies further away from the object, as in (29a), where the subject is strengthened. Here the secondary stress that remains on the object is audible. In contrast, the pronoun in (29b), which is independently destressed, does not carry any stress at all.
Rather, it is the verb that carries secondary stress.7 Example (29b), then, is an instance of the common situation where both stress reduction and main-stress shift apply in the same sentence. In principle, however, the two procedures are independent, and it is possible for only one of them to apply to a given derivation. In (29) and hereafter, secondary stress is marked with italics. But the presentation here is only informal. I return to the details of how secondary
stress is determined in the next subsection. An example from Dutch with the same effect of preservation of the original stress when main-stress shift applies is given in (30).
(30) Zelfs die milieu-fanaat heeft nu een auto gekocht. (Dutch)
‘Even that environment-fanatic has now bought a car.’
A systematic explication of the effects of (what I call here) main-stress shift on the focus structure of a sentence is provided by Williams (1997). Williams argues (based on a detailed analysis of more elaborate examples) that main-stress shift creates a new focus, but does not eliminate the previous focus structure. When this operation takes place, the “presupposition” part of the sentence typically contains a focus and a presupposition itself. That is, there is a subordinate focus. Thus, the fact that main-stress shift does not eliminate the original stress, as Neeleman and Reinhart argue, finds a direct correlation in the focus interpretation of the derivation. A special instance of main-stress shift can be observed in (31).
(31) Ik denk dat ik iets moet eten.
I think that I something must eat
‘I think I have to eat something.’
The object here is certainly not anaphoric. But since it is devoid of any specific content, it is an unlikely focus by itself. Though I am not aware of an analysis of such examples in the literature on focus,8 these cases seem to be related to the contrast Bolinger (1972) found between the sentences in (32) (quoted by Zubizarreta and Cinque). In (32a), the candidate for neutral stress does not merit a focused status because it is semantically uninformative. In such cases no subsidiary focus structure is derived, although stress is obtained by the same stress-shift operation.
(32) a. I have a point to make.
b. I have a point to emphasize.
The standard view relates all stress operations to just focus structure. More generally, most attention in studies of stress has centered around the relation between stress and focus.
A notable exception is Selkirk (1984), who argues that there must be some independent procedure for anaphoric destressing, a position she develops further in Selkirk 1996.9 Neeleman and Reinhart argue that the lack of a systematic distinction between the two has led to many problems in the theory of both sentence stress and focus, and I will return to some of these problems below. Underlying the need for a distinction is the fact that stress patterns provide many more interface clues than just focus structure. The task of anaphora resolution, relevant to even the simplest discourse, involves a complex procedure of associating expressions with their potential antecedents, which, at times, may all be of the same number and gender. Without some means of signaling anaphoric relations, this task would seem impossible to carry out. One of these means is signaling by stress. That accent patterns indeed have a crucial role in discourse anaphora, independently of focus, has been confirmed in several experimental studies by Nooteboom and colleagues (see, for example, Nooteboom and Kruyt 1987 as well as Terken and Nooteboom 1988). They found that subjects tended uniformly to associate deaccented DPs with discourse entities. Comprehension time was substantially longer when DPs representing discourse entities were not destressed. The converse also holds: comprehension is slower when a destressed DP refers to an entity mentioned for the first time. In practice, speakers operate by the assumption that a DP is destressed if and only if it is discourse-given. Naturally, work in this area must focus on the question of when an entity counts as anaphoric, or discourse-given, and thus is subject to the destressing rule. In fact, anaphoricity, or previous mention, is not a sufficient condition for this type of destressing. A DP referring to an entity that has not been active for a while, or has been mentioned too far back, is not normally destressed. Rather, destressing is governed by the accessibility of the antecedent, as defined in Ariel’s (1990) analysis of anaphora resolution. This definition also takes “topics” into account (a DP is highly accessible if it is either the topic, or has been mentioned very recently). Furthermore, following Pesetsky’s (1987) view of D-linking, the accessible entity need not be an antecedent in the sense of strict identity.
Thus, a DP may also be D-linked if only its common noun set is already in the context set. Nevertheless I should stress that the clearest instances of destressing are pronouns and anaphoric definite descriptions. In the other cases it is not always obvious whether the D-linked DP has been destressed, or its reduced stress is just an outcome of applying main-stress shift to another constituent in the derivation. With this assumed, Neeleman and Reinhart state a first approximation of the generalization governing the operation of destressing, given in (33). (33) A DP is destressed if and only if it is D-linked to an accessible discourse entity.
Note that this is an if-and-only-if condition. If a DP is appropriately D-linked, it must be destressed, and if it is not D-linked it cannot be fully destressed, regardless of the focus structure of the sentence. Though the anaphoric status of expressions may have an effect on their focus structure, the crucial point is that something along the lines of (33) must be operative independently of focus, as should be clear from the data discussed in this section. Note that (33) is not an output condition on derivations, but rather it determines when the operation of destressing must apply. As we will see in the next subsection, there are instances where stress-shift can undo the results of destressing.
3.3.2 The Operations: Destressing and Main-Stress Shift
Having examined the contextual (discourse) motivation for the two stress operations, let us now turn to the way they actually apply. To get an intuitive grasp of how they work, let us first look at the first approximations in (34)–(35), which use the star notation to mark stress, as in Cinque’s system.
(34) Main-stress shift
Add two stars.
(35) Anaphoric destressing
Remove a star (prior to the NSR).
Main-stress shift takes a given output of main-stress assignment and, while keeping this assignment, adds stress to another word. (The formulation of the rule in (34), as adding exactly two stars, is for ease of illustration only, and is not a precise formulation of the rule in a Cinque-type star notation. I return to a more explicit formulation below.) In example (36), the result is that main stress is on my neighbor, but the original stress on desk remains as a secondary stress.
(36) My neighbor is building a desk* ⇒ My neighbor** is building a desk*
Destressing, as we saw, is an operation independent of focus, and of the main stress of the derivation. It applies locally, to any anaphoric constituent, independently of the general main-stress rule (NSR).
If we restrict attention first to single anaphoric elements, this can be captured by assuming that it applies at the word level, prior to the NSR. Thus, the relevant D-linked or anaphoric expressions do not carry an intonational star when the NSR applies, as in (37a).
(37) a. Destress: Max* [saw* her]
b. NSR: ⇒ Max [saw* her]
The NSR, then, just operates in the standard way, turning the most embedded star into the main stress. Since in (37a) the lowest star is on the verb, it is the verb that will carry main stress, as in (37b). Let us follow the derivation of stress in (29), repeated in (38). (Recall that main stress is indicated with boldface and secondary stress with italics.)
(38) a. Only Max can afford buying cars.
b. Only Max can afford seeing her.
(39) Max can afford buying cars* ⇒ Only Max** can afford buying cars*. (Stress shift)
(40) a. Destressing: Only* Max* can* afford* seeing* her.
b. NSR: Only Max can afford seeing* her.
c. Stress-shift: Only Max** can afford seeing* her.
In (38a), no anaphoric destressing takes place. (The focus-shift to Max is motivated by the scope of only.) Stress-shift applies, as in (39), and the object cars still carries secondary stress. Example (38b) is a more complex case where both destressing and stress-shift apply. The pronoun is destressed prior to the NSR, as in (40a). The NSR, then, assigns main stress to the most embedded element with stress, namely the verb, as in (40b). Finally, because in this specific example we wanted the focus (for only) to be the subject, and the subject is not in the focus set, stress-shift must apply to yield (40c). The secondary stress still remains on the verb, where main stress originally fell. The two sentences in (38) thus end up with very different intonations, and I argued above that in this case the difference is audible. Destressing can apply to larger units than a word. Typically, when it applies to a whole VP, the destressed VP may also not be pronounced at all, giving rise to VP-ellipsis. This is illustrated in (41). Since the VP in the
second conjunct is anaphoric, it is destressed. Main stress then is assigned to the only possible candidate, namely the subject, as in (41b). The VP could either be pronounced as in (41b), or left unpronounced (“deleted at PF”), as in (41c).
(41) a. Destressing: First Max* [touched* Felix_i*] and then Lucie* [touched him_i]
b. NSR: First Max [touched Felix_i*] and then Lucie* [touched him_i]
c. PF-deletion: First Max [touched Felix_i] and then Lucie did [e]
Let us now return to the problem posed by Schwarzschild (1999) in (25), repeated in (42).
(42) a. Question: Who did John’s mother vote for?
b. Answer: She voted for him.
In this case, the pronoun is both anaphoric and a focus. In our terms, (42b) appears to violate the destressing generalization (33), which entails that destressing must apply to this pronoun, prior to or independently of the NSR. But it is easy to see how this apparent contradiction can be reconciled. The derivation of (42b) starts in the same way as (37).
(43) a. Destress: She [voted* for him]
b. NSR: ⇒ She [voted* for him]
(44) Focus set of (43b): {voted, voted for him, she voted for him}
(45) Stress-shift: she voted* for him**
Since the pronouns in (43) are destressed, prior to the NSR, main stress projects from voted, the only word bearing stress in this derivation. The resulting focus set is (44). However, at the given context, him is the required focus, and it is not in this focus set. Hence, to meet the context, stress-shift must apply, as in (45). Thus, although it may appear that the stress on him in (42b) is derived by the NSR, in fact, it is derived by stress-shift. In the same way, stress-shift can apply to the derivation in (41), as in (46).
(46) Stress-shift: First Max_i [touched Felix] and then Lucie* [touched him_i**]
The output in (46) is a switch-reference structure. The pronoun cannot pick up the same antecedent as it would in the destressed (41). I return to such structures in section 5.2.3, where we will follow the way the switch-reference interpretation is derived. We will also see there that such derivations indeed have the traits of stress-shift in acquisition: children cannot process them. Pronoun foci obtained by stress-shift are often described as contrastive, but for our purposes, they are just standard applications of the stress-shift operation. The star notation was used here only for illustration. We cannot expect this informal mechanism to actually determine the intonational pattern of the derivation, particularly when the more complex main-stress shift operation is concerned. For example, as mentioned in section 3.3.1, the question whether the word that carried main stress prior to main-stress shift will carry the derivation’s secondary stress (after main-stress shift) is also dependent on other stress factors. Capturing the precise intonational pattern requires, then, a full-fledged stress analysis. Szendrői (2001) offers a formulation of the operations of the present system in the metrical-tree-based framework. I will only review it briefly here. For the main-stress shift operation (which creates a new focus), it is necessary to ensure that a constituent receives main stress. In the tree-based framework, this means that it has to bear an S-label and can only be dominated by nodes with S-labels. This can be achieved by an application of Szendrői’s rule in (47).
(47) Main-stress shift (metrical-tree version)
Assign S to a node α and every node dominating α.
Recall (from section 3.1.2) that the Obligatory Contour Principle (OCP) disallows the occurrence of two adjacent S-nodes.10 Hence, if (47) applies, the sister of the targeted node has to be changed as well and becomes Weak.
Representations (48) and (49) illustrate the application of this rule to the example we have followed here. The standard main-stress derivation in (a) is inappropriate to the context, since the required focus is not in the focus set. Applying (47) adjusts the derivation to the context, resulting in the (b) answer. The locus of the application of (47) is indicated by circles in the diagrams. The operation requires first switching the value of the circled node from W to S, and then adjusting its sister node from S to W.
(48) Has your neighbor bought a desk already?
a. #My neighbor is [F building] a desk.
b. [metrical tree diagram]
(49) Who is building a desk?
a. #[F My neighbor] is building a desk.
b. [metrical tree diagram]
After (47), the boldface word in each derivation carries the main stress, because it is dominated by S-labeled nodes all the way up. In these specific examples, the dominating nodes are independently S-labeled, by the standard requirement on W-S order, so no iterative application of (47) is traceable. But we will consider instances where this is not the case shortly. We can observe now that desk ends up with a different stress in the two derivations. Given Szendrői’s definition of secondary stress in (13), repeated below, desk bears secondary stress in (49b) but not in (48b), where the secondary stress is on neighbor.
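The effect of rule (47) and of the secondary-stress definition (13) on these two examples can be checked mechanically. The following is only a minimal sketch under simplifying assumptions that are not part of the analysis itself: the clause is reduced to a binary tree in which "is building" counts as a single terminal, and the names `Node`, `neutral_tree`, `shift_main_stress`, `main_stress`, and `secondary_stress` are hypothetical. The sister adjustment required by the OCP is folded into the shift operation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    label: str = "S"  # "S" (strong) or "W" (weak); the root's own label is never inspected
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def terminals(node):
    if not node.children:
        return [node]
    return [t for c in node.children for t in terminals(c)]


def find(root, name):
    return next(t for t in terminals(root) if t.name == name)


def path_labels(terminal):
    """Labels from the node just below the root down to the terminal."""
    labels, node = [], terminal
    while node.parent is not None:
        labels.append(node.label)
        node = node.parent
    return labels[::-1]


def main_stress(root):
    # Main stress: the terminal dominated only by S-labeled nodes.
    return next(t.name for t in terminals(root) if set(path_labels(t)) == {"S"})


def secondary_stress(root):
    # Definition (13): all S on the path, except one W directly below the root.
    for t in terminals(root):
        labels = path_labels(t)
        if labels[0] == "W" and all(l == "S" for l in labels[1:]):
            return t.name
    return None


def shift_main_stress(target):
    # Rule (47): assign S to the target and to every node dominating it.
    # Assumed OCP adjustment: the sister of a relabeled node becomes W.
    node = target
    while node.parent is not None:
        if node.label != "S":
            node.label = "S"
            for sister in node.parent.children:
                if sister is not node:
                    sister.label = "W"
        node = node.parent


def neutral_tree():
    # Default English W-S labeling of "My neighbor is building a desk" (simplified).
    return Node("IP", "S", [
        Node("DP-subject", "W", [Node("my", "W"), Node("neighbor", "S")]),
        Node("VP", "S", [
            Node("building", "W"),
            Node("DP-object", "S", [Node("a", "W"), Node("desk", "S")]),
        ]),
    ])
```

Shifting to building in this sketch leaves neighbor as the secondary stress, while shifting to neighbor leaves desk as the secondary stress, which is the asymmetry between (48b) and (49b).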
(13) Secondary stress falls on the terminal node whose path to the root node contains only S-nodes, except for exactly one W-label on the node immediately dominated by the root node (i.e., Root_S X1_W X2_S ... Xi_S α_S).
The upshot is that when main stress shifts from the object to the verb, the intonational pattern is not different from that obtained with destressing of the object. But when it shifts to the subject, the pattern is different, as illustrated in (29), repeated as (50).
(50) a. Only Max can afford buying cars.
b. Only Max can afford seeing her.
In (50a), cars, where the main stress would originally fall, is defined as the secondary stress. In (50b), the original main stress would fall on seeing (because the object is anaphorically destressed). This original stress remains the secondary stress of the derivation. Regarding destressing, note first that in the tree-based implementation, there is no specific rule like the NSR. Rather, the NSR effects are obtained incrementally by applying the W-S rule to each pair of nodes. Hence, there is no reason to state that destressing applies before the NSR. But the difference between main-stress shift and destressing is that the latter is strictly local and independent of other elements in the derivation. It is stated in (51).
(51) Destressing
Assign W to an anaphoric node.
The standard stress rule for English determines a W-S order for each pair of nodes. However, if at a given stage an anaphoric word or anaphoric constituent is encountered, (51) requires that it be assigned a W. Unlike the main-stress shift operation, this process is strictly local and has no effect on the nodes dominating its locus. Let us look at the derivations in (52)–(53), where (53) repeats (41).
(52) [metrical tree diagram]
(53) [metrical tree diagram]
In (52), the anaphoric pronoun must be W. Since its sister, the verb, is not anaphoric, it is assigned an S. At the next node up, the derivation can proceed by the standard W-S requirement, so S is assigned to the rightmost VP-node. The outcome is that main stress falls on the verb, which is dominated only by S-nodes. In (53), both the verb and the pronoun have been mentioned in the immediate context. So both are anaphoric and assigned a W at the terminal level.11 The next node up, the VP, dominates only anaphoric material (the whole node is anaphoric), hence (51) applies again. Thus the whole VP ends up destressed, and the main stress falls on the subject, with no stress-shift. We may note in conclusion that there is a substantial difference between destressing and the main-stress shift operation. Destressing is a purely local calculation. At any given point in the incremental application of W-S assignment, whether a node is anaphoric or not is given by the context. This determines whether it gets stress according to the default W-S order specified for the language, or it must be assigned the anaphoric W-value, regardless of the default order. At each stage what we consider is only whether the node we are assigning a value to is anaphoric or not, and the local decision has no further effect on the process of assigning a value to nodes dominating it. But main-stress shift is a global operation. Note, to begin with, that main stress is a global stress relation. It is defined not by the stress on an individual node, but by the full path in the relevant subtree. Main-stress shift is an operation designed to alter the global default stress of the derivation. So once a node is selected as the future main stress, all nodes above it have to be adjusted accordingly. In our examples so far, this adjustment was not visible, because the default order happened to assign S to the dominating nodes.
But in other instances, like (54b) below, the value assigned to some dominating nodes must be adjusted as well.
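The local character of destressing (51) can be made concrete with a small sketch. It is an illustration under assumptions of my own, not Szendrői's formulation: constituents are nested pairs of words, a constituent counts as anaphoric when all of its words are in a given set, and the helper names `label_pair`, `is_anaphoric`, and `main_stress` are hypothetical. Each labeling decision inspects only the two sister nodes at hand, which is what makes the procedure local.

```python
def words(constituent):
    """Flatten a nested (left, right) structure into its words."""
    if isinstance(constituent, str):
        return [constituent]
    left, right = constituent
    return words(left) + words(right)


def is_anaphoric(constituent, given):
    # Assumption: a node is anaphoric if it dominates only given material.
    return all(w in given for w in words(constituent))


def label_pair(left_anaphoric, right_anaphoric):
    """Return (left_label, right_label) for two sister nodes."""
    if left_anaphoric and right_anaphoric:
        return ("W", "W")   # both destressed by (51)
    if right_anaphoric:
        return ("S", "W")   # (51) overrides the default on the right node
    return ("W", "S")       # default English W-S order


def main_stress(constituent, given):
    """Follow the S-labeled path from the root down to a terminal."""
    while not isinstance(constituent, str):
        left, right = constituent
        l, r = label_pair(is_anaphoric(left, given), is_anaphoric(right, given))
        if l == r:          # fully destressed constituent: no S-path inside it
            return None
        constituent = left if l == "S" else right
    return constituent


# (52): only the pronoun is anaphoric, so main stress lands on the verb.
stress_52 = main_stress(("Max", ("saw", "her")), given={"her"})

# Second conjunct of (53): the whole VP is anaphoric, so main stress lands on the subject.
stress_53 = main_stress(("Lucie", ("touched", "him")), given={"touched", "him"})
```

No step in this sketch revisits an earlier decision, in contrast to main-stress shift, which has to relabel a path that the default procedure has already fixed.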
(54) a. Whose neighbor is building a desk?
b. [metrical tree diagram]
Here the specifier of the subject DP is focused; thus it receives an S-label by (47). However, the subject itself would be Weak unless (47) applied to it. Thus, the only way to ensure that main stress falls on the specifier of the subject is by applying (47) twice, first to the specifier and then to the subject itself. It does not matter much if main-stress shift applies iteratively in a given derivation. The crucial consideration that makes this operation global is the need to keep track of the history of a given node, and reapply S-checking at each of the dominating nodes. A further difference between destressing and main-stress shift is where they apply. Anaphoric destressing applies incrementally during the phonological derivation. This is possible, because the information regarding whether an element is (discourse) anaphoric is available locally. (For each item it is known locally whether it is already in the context set or not.) But main-stress shift is needed to create a focus not in the focus set, and as noted (in section 3.2.2), being a focus is not an inherent property of a constituent, but rather a relation between this constituent and the whole sentence that contains it. Hence, there is no (nonarbitrary) way to decide in advance during the derivation what should be the locus of the stress-shift operation. This type of information is available only at the IP-level (the root), where the full sentence is constructed. At this level, the focus set is computed, and it can be determined that the focus needed for the context is not in the focus set. So main-stress shift applies at this stage to adjust the derivation to the context. Main-stress shift, then, is a repair mechanism. It involves undoing the output of the default stress procedure and creating a new S-path for the
targeted constituent. This is what makes this operation marked, under the present view. An operation reopening and undoing a completed derivation is always illicit. It is not something the computational system allows naturally, and applying an illicit operation requires checking if there was no way to avoid its application; in other words, it requires reference-set computation.
3.4 Reference-Set Computation
3.4.1 Focus Projection
The widely acknowledged characteristic of the focus obtained by shifted (marked) stress is that it “does not project”: it can only be “narrow focus.” This has been used by Cinque and Zubizarreta as a major diagnostic for the distinction between neutral and marked stress, but it is also largely assumed by others. The opponents of the distinction have come up with ample examples showing that this cannot always be true. I will return to these arguments, but first let us see what this generalization means. As we have noted, stress obtained by the default (nuclear, neutral) stress rule allows any projection containing it to serve as focus, for example, the whole IP in (20b), repeated below. The shifted cases of (23), by contrast, cannot be used in the same context, as we see in (55), which means that they do not project IP as focus. Similarly, stress shifted inside the VP does not project VP as focus, as seen in the comparison of (20c) and (56).
(20b) Speaker A: What’s this noise?
Speaker B: [F My neighbor is building a desk]
(55) Speaker A: What’s this noise?
Speaker B: #[F My neighbor is building a desk]
#[F My neighbor is building a desk]
(20c) Speaker A: What’s your neighbor doing these days?
Speaker B: My neighbor [F is building a desk]
(56) #My neighbor [F is building a desk]
The same difference can be witnessed with the scope of only (which is always the focus). In (57), stress is assigned by the Main Stress Rule to builders. In this case, the scope of only can be either (the narrow-focus) builders, or the whole VP that contains it.12 Suppose our store sells equipment only to builders, but at the same time we also buy used equipment
from builders and others. In this situation, (57a) with the narrow focus is true, but (57b) with the VP-focus is false.
(57) a. We only sell equipment [F to builders] (not to the general public).
b. We only [F sell equipment to builders] (We do not buy anything from anybody).
(58) a. We only sell [F equipment] to builders, not health insurance.
b. #We only [F sell equipment to builders] (We do not buy anything from anybody).
In (58), stress-shift applied. The sentence can only be used to exclude the option that we sell anything but equipment to builders, but not to exclude anything else, as witnessed by the inappropriateness of (58b). This means that the only element in the scope of only is the narrow focus (the argument bearing the new stress, as in (58a)), but not the whole VP. Though widely discussed, such facts have not received a satisfactory account. Standard approaches postulate a special focus-projection rule for “contrastive” focus. But it is far from obvious how we can distinguish (in a noncircular way) the “contrastive” (58a) from the “standard” (57a), given that in both the focus is narrow. Note also that it is impossible to explain the projection problem by forming some direct association between the position of the stress and its focus interpretation. Such an attempt could, for example, identify a given stress as “nonneutral” by noting that it does not fall on the position required by the default stress rule, and then stipulate that with nonneutral stress the focus is only the smallest projection of the stressed node. Technically, this means defining a special focus set for sentences with nondefault stress. Though this may seem an adequate descriptive generalization, it is not, in fact, true.
(59) [The man with the hat] committed a murder.
(60) a. [Did the man with the apron] commit a murder?
b. Who committed a murder?
In (59) main stress falls in a “nonneutral” position on the noun in the subject.
By the generalization under consideration, the focus set for (59) should then contain only the lower DP the hat. The derivation can certainly be used with this constituent as a focus—for instance, as an answer to question (60a). But it can also be used as an answer to (60b), in which
case the whole DP the man with the hat is the focus. So the nondefault stress does project up to the top DP. But it is still impossible to use this derivation in an “out-of-the-blue” context, namely, with the full IP as focus. Another instance where this approach will not work is when anaphoric destressing is involved. In (61) stress is in a nonneutral position. Still the focus set has the three members in (61b), which is witnessed by the three contexts it can be used with in (62). The focus set in (61) is precisely the one determined by the standard definition of the focus set (which I will repeat shortly).
(61) a. Lucie hit him.
b. Focus set: {V, VP, IP}
(62) a. Did Lucie kiss him?
b. What did Lucie do?
c. Why is he crying?
So, to begin with, narrow focus cannot be associated with the position of the stress, but it has to do with how the stress got there. It is only if the given main stress was obtained by the main-stress shift operation that it does not project. In (61) this operation did not apply. As we saw, the stress on the verb is obtained by applying the default stress rule (with anaphoric destressing of the pronoun). But even if stress shift applies, more is needed to explain (59). Given the reference-set approach to illicit operations, assumed in this work, these facts are exactly what we should expect. Such an operation is allowed in case this is the only way to meet interface requirements, so it requires reference-set computation. Most notably, in this approach it is not surprising that interpretative effects can depend on the derivational history. It is only when an illicit operation applies that we have to check a reference set. So the fact that we chose to apply this operation may restrict our set of possible interpretations. Let us see how this works. I assume, first, just the one definition of the focus set in (18), repeated below, which is blind to how stress is assigned.
Hence, for the derivations at hand the focus sets defined are those in (b) of (63) to (65). (18) The focus set of a derivation D includes all and only the constituents that contain the main stress of D. (63) a. My neighbor is building a desk. b. Focus set: {IP, VP, Object}
(64) a. My neighbor is building a desk.
b. Focus set: {IP, VP, V}
(65) a. My neighbor is building a desk.
b. Focus set: {IP, subject}
The focus sets of (63) and (64) intersect in the case of IP and VP. Suppose that in a given context we want VP (or IP) to be the focus. We could obtain this result by using (63a), without applying the superfluous stress-shift. Hence, (64a) is ruled out for that context. The only focus of (64a) not already in the focus set of (63a) is the verb. Hence, it is only the need to use this focus that can motivate stress-shift. Similarly, (65b) intersects with (63b) on IP. Hence (65a) can only be used with the subject as focus. As mentioned, computing this type of reasoning requires construction of a reference set, which consists of ⟨d, i⟩ pairs of a derivation and interpretation. In this case, the relevant interpretation is a selection of a focus out of the focus set. So, suppose our task is to decide whether (64a) can be used in a context requiring the selection of IP as focus. The reference set is (66).
(66) a. d: My neighbor is building a desk → My neighbor is building a desk
i: Focus: IP
b. d: My neighbor is building a desk
i: Focus: IP
Since the pair in (66b) does not involve the extra operation, it blocks (66a). Suppose now that we want to use (64a) with the verb as the focus. Since stress-shift is involved, we have to construct a reference set here as well. However, the reference set is (67), which contains only this one member, since no other derivation (of the same numeration) has the verb as focus. Hence this derivation is allowed.
(67) d: My neighbor is building a desk → My neighbor is building a desk
i: Focus: V
Let us return to (59), repeated as (68a).
(68) a. [The man with the hat_i]_j committed a murder.
b. Focus set: {DP_i: the hat, DP_j: the man with the hat, IP}
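The reasoning behind (66) and (67) can be sketched as a small computation over the derivations in (63)–(65). This is an illustrative simplification, not a formal statement of the theory: the constituent inventory, the cost table counting applications of stress-shift, and the function names `focus_set`, `reference_set`, and `licensed` are all assumptions made for the example. The focus set follows definition (18), and a shifted derivation is blocked whenever the reference set for the intended focus contains a cheaper derivation.

```python
# Constituents of "My neighbor is building a desk", reduced to their stressable words.
CONSTITUENTS = {
    "IP": {"neighbor", "building", "desk"},
    "Subject": {"neighbor"},
    "VP": {"building", "desk"},
    "V": {"building"},
    "Object": {"desk"},
}

# Derivations of the same numeration, identified by the word carrying main stress,
# with the (assumed) number of stress-shift applications each one involves.
DERIVATIONS = {"desk": 0, "building": 1, "neighbor": 1}


def focus_set(main_stress):
    """Definition (18): all and only the constituents containing the main stress."""
    return {label for label, ws in CONSTITUENTS.items() if main_stress in ws}


def reference_set(focus):
    """Derivations whose focus set makes the intended focus available."""
    return [w for w in DERIVATIONS if focus in focus_set(w)]


def licensed(main_stress, focus):
    """Can the derivation with this main stress be used with this focus?"""
    if focus not in focus_set(main_stress):
        return False
    cost = DERIVATIONS[main_stress]
    if cost == 0:
        return True  # default stress: no reference-set computation is needed
    # A shifted derivation survives only if no competitor is cheaper.
    return all(DERIVATIONS[w] >= cost for w in reference_set(focus))


# (66): stress shifted to the verb, IP intended as focus -> blocked by (63a).
blocked = licensed("building", "IP")
# (67): stress shifted to the verb, verb intended as focus -> allowed.
allowed = licensed("building", "V")
```

Under these assumptions the shifted derivations are usable only for the foci they add to the neutral focus set, which is the sense in which shifted stress does not project.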
(69) a. [The man with the hat] committed a MURDER.
b. Focus set: {DP: a murder, VP, IP}
The focus sets of (68) and (69) intersect with the IP. Hence, the reference-set computation excludes using (68) in a context where IP is the intended focus. But neither of the DPs in the focus set (68b) is also included in the focus set of (69). Hence, the derivation can be used with either of these DPs as foci (as we saw with the contexts in (60)). This is why the nonneutral stress in (68a) still projects partially.
Szendrői (2001) observes the contrast in (70). By what we have said so far, the focus sets of (70a) and (70b) equally contain the full DP-subject. Still, while this DP-subject can be used as a focus in (70a), it cannot be in (70b).
(70) Who committed a murder?
a. [F The man with the HAT] committed a murder.
b. #[F The MAN with the hat] committed a murder.
Szendrői argues that in the case of (70b), another reference set needs to be constructed, namely, a derivation with fewer applications of main-stress shift. In (71a), corresponding to (70a), the main-stress shift that places S on a given node applies once: to the subject DP-node. (This is so because hat is already S-labeled. To turn it into the main stress, only one W-node on its path needs to be switched.) In (71b), corresponding to (70b), main-stress shift applies twice: to the DP-subject and also to the N-head of the subject.
(71) a. [Metrical tree for (70a); diagram not reproduced]
b. [Metrical tree for (70b); diagram not reproduced]
In both (71a) and (71b) the subject DP is in the focus set. If what we need in the context is just this DP as focus, there is no reason to apply the illicit operation twice. So deriving (71a) can be viewed as “less illicit” or involving fewer violations than (71b); hence, Szendrői argues, it rules (71b) out.
3.4.2 Markedness
The analysis is based on the assumption that the computational system always assigns main stress in the same way. This stress is referred to as “neutral” or “default” stress. Stress derived by main-stress shift is viewed as marked. The analysis has established some concrete content to the concept of markedness: a marked derivation is associated with computational complexity. Whenever main-stress shift applies, a reference set must be constructed to check its appropriateness. This means that this operation entails computational complexity, whether the final outcome is “in” or “out.” On the other hand, there is no reason to assume any reference-set computation in derivations involving no stress shift (as would be assumed in some optimality approaches). So, under the present view, the difference between neutral and “marked” stress is that the latter requires computational complexity not involved in the former.
As mentioned, the idea that a systematic distinction can be drawn between marked and neutral stress has often been challenged. The central argument against markedness was that in the appropriate context marked sentences may sound as innocent as neutral sentences. Hence, we can never know whether a given stress is marked or not, and, consequently,
a theory assuming this distinction is unfalsifiable. Under the present formulation, these objections are irrelevant, since computational complexity cannot be observed by introspection. When applying the more complex computation is the only way to satisfy the interface needs, its outputs sound perfectly normal.
The more significant objections revolve around the issue of focus-projection. As we saw, the fact that in the case of stress-shift the focus does not project has been used as a major diagnostic for marked stress. The counterargument has been that this is empirically wrong, and we can find many instances of focus-projection with stress not in its default position, as in the examples below, from Schmerling 1976 and Ladd 1980 respectively. The point of both examples is that the whole IP is the focus, even though stress-shift has applied.
(72) I’d give the money to Mary, but I don’t TRUST Mary.
(73) Speaker A: Has John read Slaughterhouse Five?
Speaker B: No, John doesn’t READ books.
As mentioned, a major reason this debate could not be successfully concluded was the lack of an explicit distinction between anaphoric destressing and focus-shift. With this distinction established, we can observe that many (though not all) of the counterexamples are, in fact, instances of anaphoric destressing, rather than main-stress shift. Example (72) is a clear anaphoric instance, since Mary is obviously in the context set just mentioned. Example (73) is a more complex case of D-linking: the context set (established by speaker A) contains a specific book, but this appears to activate the set of books, which is viewed as given in speaker B’s response. Hence books is anaphorically destressed. In both cases, then, main stress falls on the verb, rather than on the default object position, but it did not get there by applying the costly main-stress shift operation.
As we saw, in this specific instance, where main stress falls on the verb, there is no prosodic difference between the outputs of anaphoric destressing and main-stress shift. Nevertheless, the difference in derivational history shows up in the options of focus-projection. More generally, when only anaphoric destressing applies at a given derivation, the focus projects in the standard way (as essentially observed by Selkirk 1984, under a different formulation). In the present system, this result is enabled, to begin with, since the definition of the focus set is independent of the derivational history of the sentence, and holds uniformly regardless of where the stress falls. By way of a summary, let us review a simple case like (74).
(74) a. Max likes CARS
a′. Focus set: {IP, VP, Object}
b. Max LIKES her
b′. Focus set: {IP, VP, V}
Recall that the focus set of IP consists of the constituents containing the main stress of IP. For a structure like (74a), three possible foci can thus be identified (cf. (74a′)). The minimal assumption would be that the definition of possible foci (that is, of the focus set) remains constant no matter how stress is derived. If so, destressing of the object, as in (74b), should lead to the focus set in (74b′). This focus set differs from that in (74a′) in only one construal: (74b′) allows the verb, but not the object, to be the focus (since the object does not contain the main stress). Representation (74a), in contrast, allows the object as focus, but not the verb. But in both derivations IP and VP are equally defined as possible foci; that is, so far, nothing blocks focus-projection in either structure. Examples (72) and (73) are analogous to (74b), with precisely the same focus set.
But the next crucial question is why the fact that IP and VP can be used as foci without anaphoric destressing does not block their use in (74b), (72), and (73). In the present system, there is no reason to expect that it would. Focus-projection is blocked only if reference-set computation must take place, and it reveals that the projected focus could be obtained without stress-shift. But this computation is only activated if an illicit operation applies. The global main-stress shift operation, which requires undoing the syntactic tree, is such an operation. But as we saw, anaphoric destressing is not. It applies incrementally with no computational cost involved. So nothing triggers any reference-set computation here, the derivation is not defined as marked, and the projection diagnostic shows, indeed, that it is not.
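The reasoning summarized in this section can be sketched in a few lines of Python. This is only an illustrative toy, not part of the analysis itself: the class `Derivation`, the function `allowed`, and the string labels for constituents are all hypothetical names, and the sketch ignores the refinement attributed to Szendrői (counting the number of stress-shift applications). It assumes that the focus set of each derivation is given, and that a reference set is consulted only when main-stress shift has applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record of a derivation: its focus set and its cost."""
    label: str
    focus_set: frozenset
    stress_shift: bool = False  # True only if costly main-stress shift applied


def allowed(d, focus, competitors):
    """Can derivation d be used with the given focus?

    competitors: other derivations of the same numeration.
    """
    if focus not in d.focus_set:
        return False
    if not d.stress_shift:
        # No illicit operation, so no reference-set computation is triggered.
        return True
    # Reference set: d is blocked if the same focus is available
    # in a competitor that does not involve stress shift.
    return not any(
        focus in c.focus_set and not c.stress_shift for c in competitors
    )


neutral = Derivation("(63a)", frozenset({"IP", "VP", "Object"}))
verb_shift = Derivation("(64a)", frozenset({"IP", "VP", "V"}), stress_shift=True)
# (74b): same focus set as (64), but derived by anaphoric destressing only.
destressed = Derivation("(74b)", frozenset({"IP", "VP", "V"}))

print(allowed(verb_shift, "IP", [neutral]))   # False: blocked, as in (66)
print(allowed(verb_shift, "V", [neutral]))    # True: as in (67)
print(allowed(destressed, "IP", [neutral]))   # True: focus projects
```

The only difference between `verb_shift` and `destressed` is the `stress_shift` flag, which mirrors the claim that the two derivations share a focus set and differ only in derivational history.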
Chapter 4 The Anaphora Reference-Set Strategy
4.1 Two Procedures of Anaphora Resolution
Pronouns are commonly viewed as variables.1 Thus, (1b) corresponds to (2a), where the predicate contains a free variable. This means that until the pronoun is assigned a value, the predicate is an open property (does not form a set). There are two distinct procedures for pronoun resolution: binding and (what I will label) covaluation. In the first procedure, we close the property: a common technical implementation is that the variable gets bound by the λ-operator, as in (2b). Here, the predicate denotes the set of individuals who think that they have gotten the flu, and the sentence asserts that Lili is in this set.
(1) a. Lucie didn’t show up today.
b. Lili thinks she’s gotten the flu.
(2) a. Lili (λx (x thinks z has gotten the flu))
b. Binding
Lili (λx (x thinks x has gotten the flu))
c. Covaluation
Lili (λx (x thinks z has gotten the flu) & z = Lucie)
In the second procedure, the free variable is assigned a value, say, from the discourse storage.2 Suppose (1b) is uttered in the context of (1a). We have stored an entry for Lucie, and when the pronoun she is encountered, it can be assigned this value. In theory-neutral terms, I will represent this assignment as in (2c), where Lucie is a discourse entry, and the pronoun is covalued with this entry. (The pronoun can also be covalued with the entry Lili, yielding an interpretation equivalent to (2b).) Let us turn now to the way this basic distinction is captured in the theory of anaphora.
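The two procedures can be mimicked with ordinary functions. The following is a minimal sketch under stated assumptions: the relation `THINKS_HAS_FLU` is an invented toy model, and the names `bound` and `covalued` are hypothetical labels for the construals in (2b) and (2c). Binding closes the property by identifying the pronoun with the λ-bound variable; covaluation leaves the pronoun as a free variable z that receives its value from outside.

```python
# Toy model (an assumption for illustration): pairs (thinker, patient)
# such that the thinker thinks the patient has gotten the flu.
THINKS_HAS_FLU = {("Lili", "Lucie")}


def thinks_flu(thinker, patient):
    return (thinker, patient) in THINKS_HAS_FLU


def bound(x):
    """(2b): the pronoun is bound, giving a closed property."""
    return thinks_flu(x, x)


def covalued(z):
    """(2c): the pronoun is a free variable z, valued from discourse."""
    return lambda x: thinks_flu(x, z)


print(bound("Lili"))                               # False
print(covalued("Lucie")("Lili"))                   # True
print(covalued("Lili")("Lili") == bound("Lili"))   # True
```

The last line illustrates the remark that covaluation with the entry Lili yields an interpretation equivalent to the bound one when the predicate is applied to Lili.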
4.1.1 The Current Picture
The logical concept of binding is trivial: variables are bound by operators. However, in the linguistic theory of anaphora, it turned out to be useful to talk about the relation between arguments (pronouns and their antecedents), since this enabled the formulation of syntactic conditions on binding, like Condition B and the chain condition of Reinhart and Reuland (1993). The definition of binding assumed since Chomsky 1981 is (3).
(3) Definition of binding
α binds β iff α and β are coindexed, and α c-commands β.
The logical and the syntactic use of the notion “binding” are thus substantially different. Logical binding is a relation between operators and variables, and not between arguments, but syntactic binding is a relation between variables (indices), that is, between arguments. In the binding construal of (1b), on the syntactic view, Lili is said to bind she. Technically, then, in the syntactic representation (2b), where indices are replaced with bound variables, one occurrence of the variable x binds the other (which is a meaningless description if one uses the logical concept of binding). I believe that the idea of capturing binding with restrictions on syntactic coindexation, or identity of variables, has led to several stubborn problems in the theory of anaphora.3
One set of problems it creates is in the area of defining the syntactic restrictions on binding. I will only illustrate this briefly here. If binding is a relation between indices, or variables, it is not trivial to distinguish (4a) from (4b) (a “strong-crossover” configuration).
(4) a. Who_i e_i said we should invite him_i?
b. *Who_i did he_i say we should invite e_i?
c. who (λx (x said we should invite x))
Both the pronoun and the wh-trace stand for variables. In (4a) both can be (logically) bound by the same operator, yielding a representation like (4c). But in (4b) they cannot, although the relations of the two variable arguments appear identical.
To filter (4b) out, it was necessary to assume a syntactic restriction, which originates in Reinhart 1976 and is known as Condition C of the binding theory. This condition disallows the coindexation in (4b) (hence the interpretation (4c) for it). But how could this condition distinguish (4a) and (4b), given that in both we have coindexed variables? The assumption (since Chomsky 1981) has been that they differ syntactically: wh-traces were defined as R-expressions (the same type as referential DPs), while pronouns as “pronouns.” (Condition C prohibits
R-expressions from being bound by another argument, in the sense of binding in (3).) However, Reinhart and Reuland (1993) argue that by all syntactic criteria, pronouns and wh-traces pattern alike: they occur in argument position, get full case specification, and pattern the same in other anaphora and chain contexts. Though I cannot go into the details here, the reflexivity framework provides evidence for and rests crucially on the shared typology of pronouns and wh-traces as +R (full argument specification, unlike NP-traces and anaphors). If true, Condition C’s solution to the problem in (4) is not feasible and the problem remains unsolved, under the present view of binding. No satisfactory alternative account has been offered for the strong-crossover problem. In Grodzinsky and Reinhart 1993, it is treated on a par with weak crossover, thus leaving unexplained why strong crossover is so much worse than weak crossover. The same holds for categorial-grammar accounts such as Jacobson 1999.
Next (and more relevant for the present discussion), problems arise in interpreting the indexing system. The source of the problem is that, in fact, binding and covaluation are both defined in terms of identity of indices, or variables. The only difference, in the syntactic framework, is in the structural configuration: binding is coindexation under c-command. For this reason, it is actually impossible to state, within this framework, the full range of the distinction between binding and covaluation, or between the two procedures of anaphora resolution that we have observed. Let us look at this in some detail.
Given the two procedures above, the term antecedent is ambiguous (as widely observed). If Lili is identified as the antecedent of the pronoun in (1b), the sentence has two anaphora construals. Since Lili is also in the discourse storage, (1b) can have, along with (2b), the covaluation construal (5a).
(1) b. Lili thinks she’s gotten the flu.
(2) b. Binding
Lili (λx (x thinks x has gotten the flu))
(5) a. Covaluation
Lili (λx (x thinks z has gotten the flu) & z = Lili)
b. Lili thinks she has gotten the flu, and Max does too.
Though (2b) and (5a) are equivalent, it was discovered in the 1970s (since Keenan 1971) that certain contexts show that there is a real ambiguity here. For example, assuming that she is Lili, the elliptic second conjunct of (5b) can mean either that Max thinks that Lili has gotten the flu (the
“strict” reading), or that Max himself has gotten it (the “sloppy” reading). The first is obtained if the elided predicate is construed as in (5a), and the second if it is the predicate of (2b). Another well-known disambiguator is only, as in (6a).
(6) a. Only Lucie respects her husband.
b. Binding
Only Lucie (λx (x respects x’s husband))
c. Covaluation
Only Lucie (λx (x respects her husband) & her = Lucie)
Representation (6b) entails that unlike Lucie, other women do not respect their husbands; (6c) entails that other women do not respect Lucie’s husband. The two construals are, thus, truth-conditionally distinct.
At earlier stages, a standard assumption, which I shared, was that the distinction at issue is between binding and coreference, that is, that covaluation is possible only when the antecedent is a referential NP. However, Heim (1998) points out that precisely the same ambiguity illustrated in (6) can be found when the antecedent is not referential, as in (7).
(7) a. Every wife thinks that only she respects her husband.
b. Every wife (λx (x thinks that [only x respects x’s husband]))
c. Every wife_i thinks that only she_i respects her_i husband.
Sentence (7a) can be construed as entailing either that every wife thinks that other wives do not respect their husbands, or that every wife thinks other wives do not respect her husband. The most immediate question is what gives rise to the ambiguity of (7a). Since coreference is irrelevant, we are left with the syntactic coindexation in (7c), which is defined (by (3)) as binding, since she is coindexed with and c-commands her. Coindexation is just the identity of variables, and (7b) is thus the representation where the indices are translated into such identical variables.
Of course, one could skip the syntactic coindexation (7c) and derive (7b) directly, by letting the top operator bind all free pronouns.4 The problem remains exactly the same: looking at the single binding representation—(7b)—that our system associates with (7a), the fact that (7a) is ambiguous is a mystery. It seems clear that the ambiguity in (6) and (7) should be related somehow. But while coreference could provide another representation for (7), there is nothing in our present theoretical machinery that can provide further distinctions between identical variables.
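The truth-conditional difference between the two construals of (6) can be checked against a small model. The sketch below is only an illustration: the individuals, the relations `HUSBAND` and `RESPECTS`, and the helper `only` are invented for the example, and `only` is given a deliberately simple semantics (the predicate holds of the subject and of no one else in the domain).

```python
# Toy model (assumed for illustration only).
HUSBAND = {"Lucie": "Max", "Mary": "Bob"}
RESPECTS = {("Lucie", "Max"), ("Mary", "Max")}  # Mary respects Lucie's husband
WOMEN = list(HUSBAND)


def respects(x, y):
    return (x, y) in RESPECTS


def only(subject, predicate, domain):
    """True iff predicate holds of subject and of nobody else in domain."""
    return predicate(subject) and not any(
        predicate(y) for y in domain if y != subject
    )


# (6b) Binding: the set of individuals who respect their own husband.
bound_reading = only("Lucie", lambda x: respects(x, HUSBAND[x]), WOMEN)

# (6c) Covaluation: the set of individuals who respect Lucie's husband.
covalued_reading = only("Lucie", lambda x: respects(x, HUSBAND["Lucie"]), WOMEN)

print(bound_reading, covalued_reading)  # True False
```

In this model Mary respects Lucie's husband but not her own, so the bound construal comes out true and the covaluation construal false, which is the kind of situation that separates the two readings.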
Heim concludes that the present coindexation system is insufficient. She introduces a distinction between two index types (“inner” and “outer”). As she points out, the intuition behind her indexation system is similar to that which led Higginbotham (e.g., 1983) to replace indices with a theory of linking (represented by arrows). Fox (1998) offers the most explicit formulation of this intuition, which is not dependent on index types (and is also the closest to what I propose below). However, the basic view of binding remains, in these frameworks, the same as in (3): binding is a relation between arguments (variables). Hence, the syntactic problems, exemplified here with strong crossover, remain untouched. Indeed, these authors assume the standard Condition C, with its arbitrary distinction between pronouns and wh-traces. Other complications stemming from this view will be mentioned in section 4.4.
My first goal is to explore further the intuition that, I believe, underlies these proposals, and to relate it to the actual procedures of anaphora resolution we started with. This should make it possible to also address the syntactic problems mentioned above (e.g., eliminating Condition C), as well as the problems of ellipsis we turn to in section 4.4. As our starting point, we need a more explicit definition of the distinction between binding and covaluation.
4.1.2 What Is Binding?
Rather than examining coindexation, or identity of variables, let us look at the problem posed by (7) from the perspective of the resolution of pronouns that we started with. Let us start with the embedded clause of (7), given in (8a).
(8) a. . . . only she (λy (y respects her husband))
b. Open VP property
. . . only she (λy (y respects x’s husband))
c. Closed VP
. . . only she (λy (y respects y’s husband))
The VP in (8a) contains a free variable (her).
As before, we have precisely two options when encountering a free variable: either do nothing at this local stage, namely leave the VP as an open property, with a free variable awaiting a value, as in (8b) (which is just a different notation for (8a)), or bind the variable and close the set, yielding (8c). Here, the VP denotes the set of female entities each respecting her husband. In (8a, b), then, there are two free variables left (she and her); in (8c), just one (she).
Now we move to the higher clause of (7). Again we have the same two options: leaving the variables free, awaiting a value from the discourse storage, or closing the top VP. Suppose we opt for the second, and bind all variables that are still free at this stage to the top operator (λx). We then obtain the two representations in (9).
(7) Every wife thinks that only she respects her husband.
(9) a. Every wife (λx (x thinks that [only x (λy (y respects x’s husband))]))
b. Every wife (λx (x thinks that [only x (λy (y respects y’s husband))]))
The sets denoted by the lower λ-predicates remain different. If we take out only, this difference would not be noticed, since the two representations would end up equivalent. But with only we get different results if for every value of x, x thinks that only x belongs to the set of people respecting x’s husband (9a), or that only x belongs to the set of people respecting their own husband (9b). Representation (9a) thus entails that every wife thinks that other wives do not respect her husband, while (9b) entails that every wife thinks that other wives do not respect their husbands, which is the distinction we wanted to capture.
As mentioned, from the perspective of syntactic binding (as defined in (3)), the relation of she and her in (9a) is the binding relation. Representation (9a) is just the translation of the coindexation in the syntactic representation (7c), repeated in (10), into identical variables.
(10) Every wife_i thinks that only she_i respects her_i husband.
However, the reading obtained in (9a) corresponds to the covaluation reading of (6) (Only Lucie respects her husband), and not to the bound reading. It is (9b) that corresponds to the bound reading of (6). (This will get clearer as we proceed.) To capture the binding in (9b), we need to return to the logical concept of binding. Binding is just the procedure of closing a property.
Under the widely assumed technical implementation I use here, this is obtained by binding a free variable to a λ-operator. But other implementations are certainly conceivable. A similar view of binding as closing a property has been developed, in a variable-free framework, by Jacobson (1999) (her G-function).
However, for stating binding restrictions, it is still convenient to be able to talk about relations of DPs, traditionally described as the relation of a pronoun and its antecedent. Let us look, then, at the relation of she and
her in (9b). The she variable (only x) is the argument of a λ-predicate whose operator (λy) binds the her variable (y). This is the relation that I argue is relevant for the binding theory. To avoid confusion with just the standard logical binding it is based on, let us call this relation A(rgument)-binding, defined in (11). Note that also the term A-binding is used here differently than in the syntactic binding theory, which is based on the definition of binding in (3).5
(11) A-Binding (logical-syntax based definition)
α A-binds β iff α is the sister of a λ-predicate whose operator binds β.
I use the term sister here, rather than argument, to leave open the interpretation of λ-predication. (In the generalized-quantifier framework, which I assume, in a formula DP (λx (P(x))), the DP denotes a set of sets, and thus, although it is the syntactic argument of the predicate, semantically it is not an argument.) In relating syntactic derivations to logical syntax representations with λ-operators, I follow the standard assumption that subjects are obligatorily (syntactic) arguments of a λ-predicate. (This is a standard interpretation of the EPP: the predicate is formed by the raising of the subject from its VP-Spec position.) A sentence like He likes Lucie, then, corresponds to the representation He (λx (x likes Lucie)). Other DPs may become arguments of such λ-predicates by movement.
If α A-binds β, by (11), this entails that α c-commands β, at the given representation (since it is a sister of a node containing β). C-command is relevant for the syntactic conditions under which λ-predicates can be formed (compositionality), but has no independent role in defining binding.
By way of a summary, let us check how this definition works in the examples (8) and (9), repeated here.
(8) a. . . . only she (λy (y respects her husband))
b. . . . only she (λy (y respects x’s husband))
c. . . . only she (λy (y respects y’s husband))
(9) Every wife thinks that only she respects her husband.
a. Every wife (λx (x thinks that [only x (λy (y respects x’s husband))]))
b. Every wife (λx (x thinks that [only x (λy (y respects y’s husband))]))
In (8a) she is a sister of a λ-predicate. If her is bound by the λ-operator, as in (8c), she A-binds her, under (11). Similarly, in (9b) she A-binds her. In
both (9a) and (9b) every wife A-binds she. Representations (9a) and (9b) differ in that in (9a) her is A-bound by every wife, while in (9b), her is A-bound by she. Given (11), then, the binding relations in (9) are distinct, as desired.
4.1.3 Covaluation
We may now check the relations of she and her in (9a). Neither A-binds the other. Rather, they are both A-bound by every wife (“cobound,” in the notation of Heim 1998). But if we want to talk about their relation to each other, the correct description is that she and her are covalued, that is, assigned the same value, which is here the bound variable x. Given the definition of A-binding (11), then, covaluation can be trivially defined, as in (12).
(12) Covaluation
α and β are covalued iff neither A-binds the other and they are assigned the same value.
The distinction between binding and covaluation thus holds, regardless of the referential status of the relevant expressions. Binding is the logical relation, a relation between an operator and variables. Covaluation is a relation between arguments: variables, or the indices of discourse entities. A-binding is just the notation introduced to describe the relation of antecedents and pronouns when logical binding holds. With this, then, we capture the fact that it is the same ambiguity in (9) and in (6), which was our point of departure. By (12), the anaphora relation in (8b) or (9a) is precisely the same as that in (6c), where the pronoun is not A-bound by Lucie, but is assigned the same (discourse) value as Lucie.
(6) a. Only Lucie respects her husband.
b. Binding
Only Lucie (λx (x respects x’s husband))
c. Covaluation
Only Lucie (λx (x respects her husband) & her = Lucie)
What distinguishes the so-called referential anaphora from quantified anaphora is thus not the option of covaluation, which is available for both, but the fact that referential NPs form a discourse entity that the pronoun can directly be covalued with, while in the case of quantified NPs, covaluation is only possible between a pronoun and a variable. Compare the anaphora options in (9) to those in (13).
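Definitions (11) and (12) can be made concrete with a small data structure for logical-syntax representations. The sketch below is a simplification under several assumptions: `Occ`, `Pred`, `analyse`, and `covalued` are hypothetical names, a representation is reduced to nested "sister (λvar (body))" units, and "being assigned the same value" is approximated by carrying the same variable name, which is adequate only when no variable name is reused by two operators.

```python
from dataclasses import dataclass, field


@dataclass
class Occ:
    """An occurrence of a variable, labelled by the word it translates."""
    word: str
    var: str


@dataclass
class Pred:
    """A unit of the form: sister (λvar (body))."""
    sister: object  # a DP given as a str, or an Occ
    var: str
    body: list = field(default_factory=list)


def analyse(node, env=None, out=None):
    """Map each pronoun to (its variable, the expression that A-binds it)."""
    env = {} if env is None else env
    out = {} if out is None else out
    if isinstance(node.sister, Occ):
        # The sister sits outside the predicate: resolve it in the outer scope.
        out[node.sister.word] = (node.sister.var, env.get(node.sister.var))
        name = node.sister.word
    else:
        name = node.sister
    # Per (11), the sister A-binds whatever this predicate's operator binds.
    inner = {**env, node.var: name}
    for item in node.body:
        if isinstance(item, Occ):
            out[item.word] = (item.var, inner.get(item.var))
        else:
            analyse(item, inner, out)
    return out


def covalued(info, a, b):
    """(12): neither A-binds the other, and both carry the same value."""
    (var_a, binder_a), (var_b, binder_b) = info[a], info[b]
    return binder_a != b and binder_b != a and var_a == var_b


# (9a): her is translated as x; (9b): her is translated as y.
rep_9a = Pred("every wife", "x", [Pred(Occ("she", "x"), "y", [Occ("her", "x")])])
rep_9b = Pred("every wife", "x", [Pred(Occ("she", "x"), "y", [Occ("her", "y")])])

info_a, info_b = analyse(rep_9a), analyse(rep_9b)
print(info_a["her"], covalued(info_a, "she", "her"))  # ('x', 'every wife') True
print(info_b["her"], covalued(info_b, "she", "her"))  # ('y', 'she') False
```

In both representations she comes out as A-bound by every wife; the two differ only in what A-binds her, which is the contrast the definitions are meant to capture.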
(13) Lucie thinks that (only) she respects her husband.
Binding
a. Lucie (λx (x thinks that (only) x (λy (y respects y’s husband))))
Covaluation
b. Lucie (λx (x thinks that (only) x (λy (y respects x’s husband))))
c. Lucie (λx (x thinks that (only) x (λy (y respects her husband)))) & her = Lucie
In (13), if the pronoun her is not bound, it has two possible covaluation construals. In the first, the pronoun is covalued with the variable x, yielding the bound covaluation in (13b), precisely as in the case of (9a). But in the second, the pronoun is covalued with Lucie, as in (13c). This option is obviously not available in the quantified case of (9), since every wife does not have a discourse value that the pronoun can pick up. Of course, the two covaluation representations in (13) are equivalent. By logical syntax, if a pronoun is covalued with the argument (sister) of a λ-predicate, it is also covalued with the variables bound by the λ-operator. But the fact that referential NPs allow also the construal in (13c) explains why the “strict” reading is available in ellipsis sentences like (5) (Lili thinks she has gotten the flu, and Max does too), and not in the parallel cases with a quantified antecedent.
4.2 Anaphora Restrictions
4.2.1 Restrictions on Binding
Variable binding is not a free interpretative procedure, but it obeys two types of conditions. One is imposed by logic, and covers the variable-binding aspects of what is known as Condition C. The other is a set of conditions independent of logic, which are imposed by internal considerations of the computational system (Conditions A, B).
4.2.1.1 The Logical Syntax Condition (Condition C)
Under the present view of binding, we expect it to be sensitive to the standard laws of the relations of operators and variables in logical syntax. Specifically, only free variables can be bound by a given operator. This can be illustrated with the strong crossover case of (14).
(14) a. Who did he say we should invite t?
b. who (λx (he said we should invite x))
b′. who (λx (he (λy (y said we should invite x))))
c. Binding
*Who (λx (he (λx (x said we should invite x))))
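The requirement that only free variables can be bound by a given operator amounts to a simple bookkeeping constraint: each variable has at most one binder. The following sketch states that constraint directly. It is a hypothetical illustration, not a model of the full logical syntax: `Variable`, `bind`, `can_a_bind`, and the operator labels are invented names, and the assumption that the wh-trace is bound by the wh-operator as a result of movement is simply stipulated in the setup.

```python
from dataclasses import dataclass
from typing import Optional


class AlreadyBound(Exception):
    """Raised when an operator tries to bind a variable that is not free."""


@dataclass
class Variable:
    word: str                     # the expression translated as a variable
    binder: Optional[str] = None  # operator binding it; None means free


def bind(operator: str, variable: Variable) -> None:
    """Bind a variable to an operator; only free variables can be bound."""
    if variable.binder is not None:
        raise AlreadyBound(
            f"{variable.word} is already bound by {variable.binder}"
        )
    variable.binder = operator


def can_a_bind(operator: str, variable: Variable) -> bool:
    """Try to bind; report whether the binding was possible."""
    try:
        bind(operator, variable)
        return True
    except AlreadyBound:
        return False


# (14): the trace is bound by the wh-operator (assumed, as a result of movement).
trace = Variable("t")
bind("who λx", trace)
print(can_a_bind("he λy", trace))  # False: the construal in (14c) is unavailable

# (4a): the pronoun is still free, so the wh-operator can bind it.
him = Variable("him")
print(can_a_bind("who λx", him))   # True
```

Nothing in the sketch refers to the type of expression involved; the trace and the pronoun are treated alike, and the only thing that matters is whether the variable is already bound.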
Given the definition of A-binding in (11), no special anaphora condition is required to explain why there can be no A-binding relation between the pronoun and the wh-trace in (14). The trace is bound by the wh-operator, so it cannot be A-bound again by the pronoun. Let us assume that the logical syntax representation for the moved wh in (14a) is (14b) (though it does not matter if another question operator is assumed there, rather than the λ-operator). Representation (14b′) is the full spell-out of (14b), also including the VP λ-predicate. The same will hold throughout for examples numbered with a prime (′). In (14b′), λy cannot bind x, since x is bound. Saying that the pronoun A-binds the trace amounts to assuming some nonsensical logical representation like (14c), where the same variable is bound by two operators. (This does not yet exclude an alternative anaphora construal for (14a), to which I will return shortly.)
As far as binding goes, then, the present definition eliminates the need to assume a special syntactic restriction, like Condition C. Recall (from section 4.1.1) that for a syntactic restriction to apply here, wh-traces had to be defined arbitrarily as R-expressions distinct from pronouns. On the present view, what distinguishes them from pronouns is just the fact that they are bound already. As we will see, the fact that binding is disallowed in (14c) (by logic) will enable the covaluation rule to rule out any alternative anaphora construal for this derivation, with no appeal to any special properties of the expressions involved.
4.2.1.2 Conditions of the Computational System
Apart from logic, binding also obeys conditions specific to the computational system. In Max touched him, logic does not exclude the binding construal Max (λx (x touched x)), but this construal is excluded by internal considerations of the CS. Let us view these briefly. The presentation is based on Reinhart 1999b.
Natural language uses different types of anaphoric expressions in different syntactic contexts. All languages have the two anaphoric types in (15), though not all have both anaphors. English does not have a SE anaphor; the Dravidian languages of India do not have a SELF anaphor; Germanic and many other languages have both.
(15) Types of anaphoric expressions
Pronouns (she, her)
Anaphors
a. Complex SELF anaphors (herself)
b. SE (Simplex Expression) anaphors (zich, in Dutch)
The core restrictions on binding are most commonly believed to be purely syntactic. It is assumed, first, that bound anaphora is possible only when the antecedent c-commands the anaphoric expression. The central problem, however, is the different distribution of the two anaphoric types. It was discovered in the 1970s (Chomsky 1973) that the two anaphora types correspond to the two types of syntactic movement illustrated below.
(16) a. wh-movement: Who_i did you suggest that we invite t_i?
b. NP-movement: Felix_i was invited t_i. (Passive)
Felix_i seems [t_i happy]. (Raising)
NP-movement is much more local than wh-movement. Chomsky’s empirical generalization rests on observing the relations between the moved NP and the trace left in its original position: in the syntactic domain in which a moved NP can bind its trace, an NP can bind an anaphor, but it cannot bind a pronoun, as illustrated in (17) and (18). Where an anaphor cannot be bound, NP-movement is excluded as well, as in (19).
(17) a. Felix_i was invited t_i.
b. Felix_i invited himself_i.
c. *Felix_i invited him_i.
(18) a. Felix_i was heard [t_i singing]
b. Felix_i heard [himself_i sing]
c. Felix_i hoorde [zich_i zingen] (Dutch)
d. *Felix_i heard [him_i sing]
(19) a. *Lucie_i believes that we should elect herself_i.
b. *Lucie_i is believed that we should elect t_i.
In the early implementations of binding theory (Chomsky 1981), this was captured by defining NP-traces as anaphors. Thus, the restrictions on NP-movement were believed to follow from the binding conditions. Skipping the technical definition of a local domain, these are given in (20), where bound means coindexed with a c-commanding NP.
(20) Binding conditions
Condition A: An anaphor must be bound in its local domain.
Condition B: A pronoun must be free in its local domain.
Examples (17c) and (18d) violate condition B. (19a, b) violate condition A. The others violate neither and hence are permitted.
Later developments in syntax enabled a fuller understanding of what this generalization follows from. A crucial difference between wh-traces
and NP-traces is that NP-traces cannot carry Case. The conditions in (20) alone cannot explain why this should be so. This requires an examination of the concept of “argument.” An argument of some predicative head P is any constituent realizing a grammatical function of P (thematic role, Case, or grammatical subject). However, arguments can be more complex objects than just a single NP. In the passive sentence (17a), there is, in fact, just one argument, with two links. Arguments, then, need to be defined as chains: roughly, an A(rgument)-chain is a sequence of (one or more) coindexed links satisfying c-command, in a local domain (again skipping the definition of the local domain, which requires that there are no “barriers” between any of the links). Reinhart and Reuland (1993) argue that the anaphora and movement facts in (17)–(19) are governed by a condition on A(rgument)-chains, which has been assumed in syntax independently of the binding problems that (20) attempted to capture. If A-chains count as just one syntactic argument, they cannot contain two fully independent links. Specifically, coindexation that forms an A-chain must satisfy (21).

(21) The A-chain condition
     An A-chain must contain exactly one link which carries structural Case (at the head of the chain).

Condition (21) is clearly satisfied in (17a) and (18a), where the trace gets no Case. Turning to anaphoric expressions, Reinhart and Reuland argue that while pronouns are fully Case-marked arguments, anaphors, like NP-traces, are Case-defective. Consequently, it turns out that the binding conditions in (20) are just entailments of (21) (Fox 1993; Reinhart and Reuland 1993). If a pronoun is bound in the local domain, as in (17c) and (18d), an A-chain is formed. But the chain contains two Case-marked links, hence (21) rules this out, as did Condition B of (20). In all the other examples in (17) and (18), the A-chains satisfy (21), because they are tailed by a Caseless link (NP-trace or anaphor).
As Fox (1993) points out, if an anaphor is not bound in the local domain, it forms an A-chain of its own. For example, in (19a), Lucie and herself are two distinct A-chains (i.e., two arguments, rather than one). The second violates (21), since it does not contain even one Case. Hence, (21) filters out the derivation, as did Condition A of (20). Condition A, then, is just a reflex of the requirement that arguments carry Case, while Condition B is the requirement that they do not carry more than one Case, both currently stated in (21).
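The way (21) subsumes both binding conditions can be made concrete with a small sketch. This is only an illustration under simplifying assumptions: a chain is modeled as a list of links, each link records nothing but whether it carries structural Case, and the local-domain and c-command requirements are taken for granted. The names `Link` and `satisfies_a_chain_condition` are hypothetical, not part of the theory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link of a (hypothetical, simplified) A-chain."""
    form: str
    has_case: bool  # does this link carry structural Case?


def satisfies_a_chain_condition(chain):
    """Check (21): exactly one Case-marked link, located at the head.

    Assumes `chain` is a non-empty list of coindexed links that already
    satisfy c-command within a local domain (not modeled here).
    """
    case_count = sum(link.has_case for link in chain)
    return case_count == 1 and chain[0].has_case


# Case properties as described in the text: pronouns and full NPs are
# Case-marked; NP-traces and anaphors are Case-defective.
felix = Link("Felix", True)
trace = Link("t", False)
himself = Link("himself", False)
him = Link("him", True)
herself = Link("herself", False)

print(satisfies_a_chain_condition([felix, trace]))    # (17a): True
print(satisfies_a_chain_condition([felix, himself]))  # (17b): True
print(satisfies_a_chain_condition([felix, him]))      # (17c): False, two Cases
print(satisfies_a_chain_condition([herself]))         # (19a): False, no Case
```

The last two calls correspond to the Condition B and Condition A effects respectively: a locally bound pronoun yields a chain with two Cases, and a locally unbound anaphor yields a one-link chain with none.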
Recall that only arguments are required to have Case. So (21) does not prevent an anaphor from occurring unbound in a nonargument position. For example, the only difference between (19) and (22) is that the anaphor in (22) is embedded in an argument, but is not an argument itself.

(19) *Lucie_i believes that we should elect herself_i.
(22) Lucie_i believes that we should elect Max and herself_i.

Anaphors that are not part of a chain are commonly labeled “logophoric,” and the question when they are preferred over pronouns is dependent on discourse, rather than syntax, conditions (Pollard and Sag 1992; Reinhart and Reuland 1991). There is, however, an aspect of bound local-anaphora that is not covered by (21) (or (20)). Regarding Case, SE and SELF anaphors are alike. Nevertheless, while both can occur in (18c), repeated in (23a), SE is excluded in (23b), which does not follow from (21). The difference is that in (23b) a reflexive predicate is formed, because the anaphor and Max are coarguments. But in (23a) the SE anaphor is the subject of the embedded predicate. The same contrast is found in many languages.

(23) a. Max_i hoorde [zich_i/zichzelf_i zingen] (Dutch)
        ‘Max heard himself sing.’
     b. *Max_i hoorde zich_i.
     c. Max_i hoorde zichzelf_i.
        ‘Max heard himself.’
Reinhart and Reuland argue that, universally, the process of reflexivization requires morphological marking. Thus, another principle is active here:

(24) Reflexivity Condition
     A reflexive predicate must be reflexive-marked.

A predicate can be reflexive-marked either on the argument, with a SELF anaphor, or on the predicate. (In the Dravidian language Kannada, the reflexive morpheme kol is used on the verb.) Since zich is not a reflexive marker, (23b) violates (24). Under this analysis, then, the anaphora facts that fall under Condition B in the principles-and-parameters framework are in fact governed by two independent principles: the A-chain condition (21) and the Reflexivity condition (24). For convenience, I may continue to refer to the relevant anaphora contexts as Condition B contexts, or even use Condition B as a shortcut for these two principles.
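The interaction of (24) with the Dutch data in (23) can be sketched as a simple check. The sketch assumes a deliberately crude representation: a predicate is a head plus a tuple of indexed coarguments, a predicate counts as reflexive when two of its coarguments share an index, and it counts as reflexive-marked when one argument is a SELF anaphor or the head bears a reflexive morpheme. The class names, the `SELF_ANAPHORS` set, and the index values are all hypothetical conveniences.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumption for illustration: a small closed list of SELF anaphors.
SELF_ANAPHORS = {"zichzelf", "himself", "herself"}


@dataclass(frozen=True)
class Argument:
    form: str
    index: int


@dataclass(frozen=True)
class Predicate:
    head: str
    arguments: Tuple[Argument, ...]
    verbal_reflexive_marker: bool = False  # e.g., a reflexive morpheme on the verb


def is_reflexive(pred):
    """Reflexive: two coarguments of the predicate are coindexed."""
    indices = [arg.index for arg in pred.arguments]
    return len(indices) != len(set(indices))


def is_reflexive_marked(pred):
    """Marked on the predicate, or on an argument by a SELF anaphor."""
    return pred.verbal_reflexive_marker or any(
        arg.form in SELF_ANAPHORS for arg in pred.arguments
    )


def satisfies_reflexivity_condition(pred):
    """Check (24): a reflexive predicate must be reflexive-marked."""
    return (not is_reflexive(pred)) or is_reflexive_marked(pred)


max_ = Argument("Max", 1)

# (23b) *Max hoorde zich: coarguments coindexed, no reflexive marking.
ex_23b = Predicate("hoorde", (max_, Argument("zich", 1)))
# (23c) Max hoorde zichzelf: coarguments coindexed, SELF-marked.
ex_23c = Predicate("hoorde", (max_, Argument("zichzelf", 1)))
# (23a) Max hoorde [zich zingen]: zich is an argument of 'zingen',
# so the matrix predicate has Max and the clause (index 2) as coarguments.
ex_23a_matrix = Predicate("hoorde", (max_, Argument("[zich zingen]", 2)))
ex_23a_embedded = Predicate("zingen", (Argument("zich", 1),))

print(satisfies_reflexivity_condition(ex_23b))           # False
print(satisfies_reflexivity_condition(ex_23c))           # True
print(satisfies_reflexivity_condition(ex_23a_matrix))    # True
print(satisfies_reflexivity_condition(ex_23a_embedded))  # True
```

In (23a) neither predicate is reflexive in this sense, so (24) is satisfied vacuously, which is why zich is possible there but not in (23b).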
4.2.2 Restrictions on Covaluation

A question that has been debated is whether there are also syntactic conditions on covaluation. So far, I have assumed that covaluation is a free procedure that can be used everywhere (subject only to discourse conditions). But, in fact, there are well-known cases where covaluation is excluded. Note, first, that the logical restriction on binding does not take us very far in filtering out anaphora in the strong-crossover case of (14), repeated here.

(14) a. Who did he say we should invite t?
     c. Binding
        *Who (λx (he (λx (x said we should invite x))))

(25) Covaluation (he = x)
     a. aWho (λx (x said we should invite x))
     a′. aWho (λx (x (λy (y said we should invite x))))

While the binding construal in (14c) is excluded, nothing so far prevents binding he to λx (i.e., A-binding he to who), as in (25a). In (25a), he and the wh-trace end up covalued, both bound by the same operator. The problem is that (14a) does not allow this covaluation construal. Precisely the same problem arises in (26a).

(26) a. She said we should invite Lucie.
     b. She (λx (x said we should invite Lucie))
     c. Covaluation
        aShe (λx (x said we should invite Lucie & she = Lucie))
     With QR
     d. Lucie (λy (she said we should invite y))
     d′. Lucie (λy (she (λx (x said we should invite y))))
     e. Binding
        *Lucie (λx (she (λx (x said we should invite x))))
     Covaluation
     f. aLucie (λx (x said we should invite x))
     g. aLucie (λx (she said we should invite x & she = Lucie))
Here as well, nothing needs to be stipulated to explain why no A-binding relations are possible between Lucie and she in (26b): Lucie is not the type of object that can be bound by the λ-operator whose sister is she (since it is not a free variable). For completeness, let us check whether Lucie could A-bind the pronoun here. This would require, first, that Lucie undergoes covert movement, forming the λ-predicate in (26d). Let us assume that
QR is permitted here, that is, that (26d) is one of the derivations for (26a). In (26d), the variable-trace of Lucie is bound, just as in (14), so it cannot be A-bound by the pronoun. (Binding would yield here some illicit representation like (26e).) But so far nothing blocks a covaluation interpretation for (26a). This could be obtained most naturally with no QR, as in (26c), and the equivalent covaluation construals with QR are also available, as in (26f, g). The problem is again that in practice, the pronoun cannot be construed this way. How the wrong covaluation interpretations of (26a) are blocked has been a subject of debate. In the 1970s (when this was viewed as a coreference problem), it was assumed that there is a special syntactic restriction doing the job (Langacker 1969; Lasnik 1976). Reinhart (1976) formulated it as the requirement that a pronoun cannot corefer with a full NP it c-commands, which became known as Condition C of Chomsky (1981).6 Recall that under the present definition of A-binding, Condition C is not needed to exclude A-binding, which is independently ruled out by considerations of logical syntax. The question is whether it should be assumed as a condition on covaluation. The problem is not restricted to Condition C environments, but shows equally in Condition B contexts. Condition B, or the A-chain condition, excludes, for instance, the binding construal Max (λx (x saw x)) for Max saw him. But the question is what rules out the covaluation construal of such simple sentences (Max saw him & him = Max). Again, this covaluation problem shows up equally in quantification contexts. In (27a), Condition B or the A-chain condition prohibits the A-binding of him by he. So (27b) is excluded.
However, Heim (1998) points out that in semantic terms, it is still possible to obtain anaphora in (27a) without violating Condition B, as in (27c), where Everyone A-binds both pronouns, so he is covalued with him, but does not A-bind it, as can be witnessed in the fuller representation in (27c′). (This recapitulates a problem noted by Higginbotham (1983), under a different notation.)

(27) a. Everyone thinks that he can hear him sing in the bathroom.
     b. *Everyone (λx (x thinks that x (λy (y can hear y sing in the bathroom))))
     c. Everyone (λx (x thinks that x can hear x sing in the bathroom))
     c′. Everyone (λx (x thinks that x (λy (y can hear x sing in the bathroom))))
If covaluation is governed by syntactic constraints, we need to modify Condition B (the A-chain condition and the Reflexivity condition), so that it excludes both binding and covaluation. We end up, then, with two conditions on covaluation: Condition C, and half of Condition B. We should note that, under the present assumptions, it is possible at least to unify these two conditions. I suggested two changes in the current view of binding theory that are needed independently of the questions of the restrictions on covaluation. First, binding is defined in traditional logical terms, and, consequently, the distinction between binding and covaluation is indifferent to the referential status of the antecedent. These changes enabled us to see that Condition C is only needed for covaluation (since binding is anyway impossible in Condition C contexts, on standard logical grounds). Condition C can now be modified, such that it also handles the covaluation residue of Condition B. Note, first, that both Conditions C and B apply only when one of the DPs c-commands the other. (This was built into the definition of syntactic binding in (3).) In our terms, the relevant syntactic configuration is not directly determined by c-command, but rather by the question whether one of the DPs is in a configuration enabling it to A-bind the other, namely whether it is an argument (sister) of a λ-predicate containing the other. In all instances of blocked covaluation discussed above, which are summarized in (28a–c), one of the boldfaced DPs is in a configuration to A-bind the other. When neither DP is in a configuration to A-bind the other, as in (28d), there are no sentence-level restrictions on covaluation (as entailed by the classical binding theory).

(28) Strong-crossover (14b′)
     a. Who (λx (he (λy (y said we should invite x))))
     Condition C (26b)
     b. She (λx (x said we should invite Lucie))
     Condition B
     c. Max (λx (x saw him))
     No restrictions on covaluation
     d. The woman next to him (λx (x touched Max))
     Covaluation permitted in a configuration of A-binding
     e. Lili thinks she has got the flu, and Max does too. (See (5b).)
     f. Only Lucie respects her husband. (See (6).)
     g. Every wife thinks that only she respects her husband. (See (7).)

It appears, then, that the generalization is that whenever A-binding is possible, covaluation is blocked (clause (a) of (29) below). However,
given our discussion so far, this generalization is too strong. We saw that there are cases where both A-binding and covaluation are permitted. These are illustrated again in (28e–g). Although one of the boldfaced DPs can A-bind the other, a covaluation construal is also possible. The difference available so far between (28a–c) and (28e–g) is that although in both one DP is in a configuration to A-bind the other, in the first, binding is excluded. Let us, then, state this generalization in (29).

(29) Modified Condition C(ovaluation)
     α cannot be covalued with β if
     a. α is in a configuration to A-bind β, and
     b. α cannot A-bind β.

If correct, the generalization captured in (29) is that when the CS disallows binding, it also disallows covaluation, and it does not matter if binding is blocked by logical syntax (as in the case of the old Condition C), or by binding restrictions specific to natural language (Condition B). In fact, the empirical coverage of (29) is not just identical to that of the original Conditions C and B on covaluation. It is also sufficient to capture some anaphora puzzles that cannot be captured by the original binding theory and Condition C (the Dahl-cases), to which I return in section 4.4. But we may note that, if true, (29) has some curious properties. For example, why should the covaluation option be dependent at all on the option of A-binding? Let us, first, pay more attention to the status of (29).

4.3 The Interface Strategy Governing Covaluation (Rule I)
4.3.1 Minimize Interpretative Options

The empirical problem with (29) is the same as with its predecessors Conditions C and B: there are systematic contexts in which it can be violated. Reinhart (1983b) argued that this is possible whenever covaluation is not equivalent to binding.

(30) a. [Who is the man with the gray hat?] He is Ralph.
     b. The patient does not remember who he is t.
     c. Only he (himself) still thinks that Max is a genius.

In (30a, b), it is not easy to imagine a construal of the truth conditions that will not include covaluation of the pronoun with the NP or the trace. But this covaluation violates Condition C. In both cases, however, the covaluation reading is clearly distinct from the bound one: what is attributed of the pronoun in (30a, b) is not the property of self identity
(λx (x is x)), which is what would be obtained by binding. Similarly, believing oneself to be a genius may be true of many people, but what (30c) attributes only to Max is believing Max to be a genius. (The facts and their interpretation are discussed in Reinhart 1983b, Grodzinsky and Reinhart 1993, and, at greater depth, in Heim 1998. So I will not elaborate on them here.) The same empirical problems surface with the Condition B aspects of (29). In the Heim-type examples, we may compare (27) to (31a). The latter permits anaphora, in violation of Condition B, or the modified Condition C. But it has only the covaluation interpretation in (31b). (The sentence will be false if, say, Max thinks that someone other than him may have heard him sing in the bathroom.)

(31) a. Everyone thinks that only he can hear him sing in the bathroom.
     b. Everyone (λx (x thinks that [only x (λy (y can hear x sing in the bathroom))]))

Other examples are given in (32). If the modified Condition C rules out covaluation in all contexts of Condition B, this would be right for (32a), but not for (32b, c).7

(32) a. The suspect saw him (& him = the suspect).
     b. The suspect claims that he was at the opera at the time of the murder. But if it is true, only he (himself) saw him there.
     c. You are you and she is she. Don’t lose your ego!

An alternative view of the restrictions on covaluation was proposed in Reinhart 1983a, 1983b. Stated in terms of current syntactic theory, the view was that covaluation is not directly governed by a condition of the computational system, but by an interface strategy that takes into account the options open for the computational system in generating the given derivation. I still wish to defend this general approach here, though the specific view I took there regarding what this strategy is about may have been mistaken.
I assumed there that the crucial factor that blocks coreference in the relevant environments is the fact that one could also opt for binding in these environments: the structural generalization is that covaluation is blocked only under c-command, which is the mirror image of where variable binding is always allowed. If we assume that variable binding is the more economical way to capture anaphora, it would follow that avoiding it, when the structure permits it with a different selection from the numeration, is uneconomical. It could only be justified when the covaluation interpretation is distinct from that of binding, so there is a reason to avoid binding. At that early stage, I assumed that
the reason binding is preferred is that it is the more explicit way to express anaphora, so opting for it satisfies the Gricean maxim of manner.8 In the formulation of the coreference rule in Grodzinsky and Reinhart 1993, the same view was stated in more general terms of economy. That variable binding is more economical is possibly defensible, in terms of semantic processing. Compare the two interpretations of (33).

(33) Max loves his mother.
     a. Max (λx (x loves x’s mother))
     b. Max (λx (x loves z’s mother) & (z = Max))

In (33a), where the pronoun is bound, the VP forms a set, and we just have to check whether Max is in it. In (33b), the pronoun remains a free variable. The VP remains an open property, and it has to be held open until the pronoun is assigned a value. Only when this happens can assessment take place. If it turns out that the intended value is Max anyway, it is not obvious why we had to go through assignment at all. The economy requirement would be, then, “get rid of free variables (that is, close open properties) as soon as possible.” So this appears to be an instance of the “least-effort” principle of economy. This view of the economy requirement is developed, under a different terminology, in Fox 1998.9 If this reasoning is correct, then the relevant covaluation strategy can be stated as in (34).

(34) Covaluation strategy (tentative)
     α and β cannot be covalued if
     a. α is in a configuration to A-bind β, and
     b. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β.

Strategy (34) is identical in spirit to the generalization proposed in Reinhart 1983a and 1983b, but clause (a) reflects the changes introduced here regarding what binding is. Reuland (2001), assuming a generalization like (34), offers a different rationale for why “least effort” requires that (34) should hold. On his analysis, variable binding is a procedure taking place within the computational system (forming a chain), while coreference is a discourse procedure. He argues, roughly, that in general, procedures applying during the derivation are more economical than those applying at the interface. So when the first is available, an interpretation based on the second is excluded. However, as plausible as the “least-effort” approach seems, it is not clear to me that the human processor is indeed sensitive to this type of
economy considerations. The problem with this approach has always been that, in practice, (33) equally allows both construals of anaphora, as witnessed in the ellipsis context (35a). The predicate in the second conjunct can be construed as either that of (33a) or of (33b), which can only be obtained if (33) allows both.

(35) a. Max likes his mother and Felix does too.
     b. He likes Max’s mother, and Felix does too (he ≠ Max).
     c. Max praised him and Lucie did too (him ≠ Max).

At first glance, it may be assumed that the availability of the covaluation construal (33b) for (35a) is licensed by the ellipsis context (i.e., that the more economical variable binding construal can be avoided, because the covaluation reading is distinct in this context). But this is not so. Although ellipsis contexts enable the two construals to be distinct, they crucially do not license covaluation in and of themselves. In (35b), the fact that we want to use the predicate (λx (x likes Max’s mother)) in the elided conjunct does not enable covaluation of Max and he in the first conjunct. The same point is illustrated for Condition B environments in (35c). More generally, evaluating whether the bound reading is distinct from the covaluation reading can be based only on information in the derivation itself (perhaps relative to its previous context), but not on considerations of how it would affect upcoming discourse. (In this sense, this type of economy remains local, as in other instances of economy. See Fox 1995, 2000, for an extensive discussion of this point, in the case of QR in ellipsis structures.) As stated, (34) will thus disable the relevant strict (or “coreference”) interpretation of (35a). We may note that the problem is still restricted to VP-ellipsis contexts. In the other contexts, summarized in (28f–g) (like Only Lucie respects her husband), the readings obtained by binding and covaluation are distinct locally, so clause (b) of (34) allows covaluation.
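The point about (35a) can be illustrated with a toy model. The sketch below treats the two construals in (33) as Python predicates and applies each to both conjuncts; the individuals other than Max and Felix, the `MOTHER` and `LIKES` tables, and the function names are hypothetical and serve only to make the strict and sloppy readings come apart.

```python
# Hypothetical toy model: who is whose mother, and who likes whom.
MOTHER = {"Max": "Ann", "Felix": "Beth"}
LIKES = {("Max", "Ann"), ("Felix", "Ann")}


def bound_predicate():
    """(33a): lambda x . x likes x's mother (pronoun bound)."""
    return lambda x: (x, MOTHER[x]) in LIKES


def covalued_predicate(z):
    """(33b): lambda x . x likes z's mother, with z assigned a value."""
    return lambda x: (x, MOTHER[z]) in LIKES


bound = bound_predicate()
covalued = covalued_predicate("Max")

# First conjunct: applied to Max, the two construals cannot be told apart.
print(bound("Max"), covalued("Max"))      # True True

# Elided conjunct: applied to Felix, they diverge in this model.
print(bound("Felix"))     # sloppy: Felix likes Felix's mother -> False
print(covalued("Felix"))  # strict: Felix likes Max's mother   -> True
```

Within the first conjunct the two predicates give the same result for Max, which is why a strictly local version of clause (b) of (34) would block the covaluation construal, even though the elided conjunct shows that both predicates must be available.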
Fox (1998) argues that the answer to the problem in (35a) should follow from the theory of ellipsis as PF-deletion, and not from the restriction on covaluation. In his approach, the first conjunct in (35a) indeed excludes covaluation. However, the relevant strict reading of the second conjunct is still generated, given the way the identity requirements on deletion are defined. At the present, however, this result can be derived only by adding stipulations. I conclude, for now, that this economy view has not been, after all, precisely on the right track. As mentioned, the intuition behind (34) has been that covaluation is excluded whenever an equivalent binding is possible. The alternative
view that emerges is that it is the other way around: covaluation is excluded whenever an equivalent binding is impossible (in a configuration allowing it in principle). In fact, this was built into the formulation of the covaluation Rule I in Grodzinsky and Reinhart 1993, note 13, but the account was highly stipulative. Given the analysis in section 4.2.2, it is possible now to state the covaluation generalization in (36), which just adds a clause (c) to the covaluation generalization proposed in (29) as the modified Condition C. For the sake of continuity, I will continue to refer to the interface strategy (36) as “Rule I.”10

(36) Rule I (an interface rule)
     α and β cannot be covalued in a derivation D, if
     a. α is in a configuration to A-bind β, and
     b. α cannot A-bind β in D, and
     c. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β.

If there is a broader principle behind (36), it seems to be that if a certain interpretation is blocked by the computational system, you would not sneak in precisely the same interpretation for the given derivation, by using machinery available for the systems of use. It is easy to see why such a principle could be useful at the interface. The problem for users of linguistic derivations is how to minimize the set of possible interpretations of a given PF. The more options there are, the more mysterious is the fact that speakers manage to understand each other. In the specific case of anaphora resolution, the problem is how to restrict the set of potential antecedents for a given pronoun (i.e., the set of potential values). If the computational system provides a restriction of that set, it is not cooperative for users to overrule that, even if they have the machinery to do so. The broader economy principle relevant for Rule I, then, is not the unspecified “least effort,” but rather the specific “minimize interpretative options” that we considered in section 2.7.1.
The principle prohibits applying a procedure that increases the number of interpretations associated with a given PF. (Allowing coreference where binding is excluded adds to the derivation an extra anaphora interpretation that is not available without this extension.) As we saw there, the same principle could explain why covert operations not needed for convergence are illicit— that is, why it is prohibited to increase the scope-construal interpretations of the given PF by applying QR. Let us assume Rule I in (36) as a specific strategy for anaphora resolution, which conforms with the general principle of minimizing
interpretative options: it rules out an interpretation (a ⟨d, i⟩ pair) if it is indistinguishable from an interpretation ruled out by principles of the CS (Condition B, or the logical-syntax prohibition against binding a variable doubly). Had this interpretation been permitted, this would increase the anaphoric interpretation options of a given PF beyond what is enabled already by the CS, thus violating “minimize interpretative options.” Technically, what limits (36) to apply just in these cases is its clause (a). This is needed to exclude sentences like (37) from the scope of (36).

(37) a. Max’s mother loves him (he = Max).
     b. His mother loves Max (he = Max).
     c. The lady next to him kissed Max (he = Max).

Neither Max nor the pronoun is in a configuration to bind the other in (37). Hence, (a) of (36) does not hold, and no further aspects of Rule I need to be checked. This means that covaluation is free in such structures, as far as Rule I is concerned.11

4.3.2 Reference-Set Computation

Under either of the formulations of the covaluation strategy ((36) or (34)), if we get precise about the way it applies, it must involve reference-set computation: computing clause (c) requires constructing a reference set, which includes the current derivation under the covaluation interpretation and another member with the binding interpretation. If the two members are equivalent, the covaluation interpretation is blocked. (For (34), this is so because covaluation is less economical; for (36) because it enables bypassing a prohibition of the CS.) Let us specify the procedure of constructing the reference set for (36) as in (36′).

(36′) To check clause (c) of (36), construct a comparison-representation by replacing β with a variable A-bound by α.

Let us now check in more detail how Rule I works in assessing whether covaluation should be permitted in a given derivation.
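Before turning to the examples, the order in which the clauses of (36) are consulted can be summarized as a small decision procedure. This is a schematic sketch only: the syntactic facts behind clauses (a) and (b) are supplied as booleans rather than computed, and an interpretation is crudely modeled as the set of (hypothetical) situations in which it is true, so that "indistinguishable" reduces to set equality. The names `CovaluationQuery`, `rule_i_blocks`, and the situation labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

Reading = FrozenSet[str]  # situations in which an interpretation is true


@dataclass(frozen=True)
class CovaluationQuery:
    in_binding_configuration: bool  # clause (a): alpha could in principle A-bind beta
    binding_available: bool         # negation of clause (b) for this derivation
    covaluation_reading: Reading    # the construal being assessed
    binding_comparison: Reading     # comparison-representation built as in (36')


def same_truth_conditions(r1: Reading, r2: Reading) -> bool:
    """Toy stand-in for 'indistinguishable': true in the same situations."""
    return r1 == r2


def rule_i_blocks(query: CovaluationQuery,
                  indistinguishable: Callable[[Reading, Reading], bool]
                  = same_truth_conditions) -> bool:
    """Return True if Rule I (36) rules out the covaluation construal."""
    if not query.in_binding_configuration:
        return False  # clause (a) fails: Rule I has nothing to say
    if query.binding_available:
        return False  # clause (b) fails: binding is not excluded in D
    # Clause (c): reference-set comparison of the two members.
    return indistinguishable(query.covaluation_reading,
                             query.binding_comparison)


s = frozenset  # shorthand for building toy readings

# Like (37): neither DP is in a configuration to A-bind the other.
free_case = CovaluationQuery(False, False, s({"s1"}), s({"s1"}))
# Like (33): binding is available, so clause (b) does not hold.
bindable = CovaluationQuery(True, True, s({"s1"}), s({"s1"}))
# Like 'Max saw him': binding excluded, and the readings coincide.
condition_b = CovaluationQuery(True, False, s({"s1"}), s({"s1"}))
# Like (30c): binding excluded, but the readings differ in some situation.
only_case = CovaluationQuery(True, False, s({"s1", "s2"}), s({"s1"}))

print(rule_i_blocks(free_case))    # False
print(rule_i_blocks(bindable))     # False
print(rule_i_blocks(condition_b))  # True
print(rule_i_blocks(only_case))    # False
```

The early returns reflect the fact that clause (c), the only costly step, is computed just in case clauses (a) and (b) both hold.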
In (33), repeated below, the derivation D that we are considering is (33b), where the pronoun remains a free variable. The question is whether his (z) can be covalued with Max. Since Max is in a configuration to bind his (i.e., (a) of (36) holds), the next clauses of Rule I must be considered.

(33) Max loves his mother.
     a. Max (λx (x loves x’s mother))
     b. Max (λx (x loves z’s mother) & (z = Max))
However, (b) of (36) does not hold here. Since the pronoun is a free variable, in the scope of the λ-operator, further application of binding to the given derivation would have enabled it to be bound (the same operation that applied in (33a)). Hence, the assessment is completed, and (c) of (36) need not be checked. Next, consider the “strong-crossover” case of (14), repeated in (38). The derivation on which anaphora assessment is computed is (38b). The question is whether he could be covalued with x, which would lead to the interpretation in (38c). (I assume that who can A-bind he here, but if we choose to do that, we obtain a covaluation of he and x, which needs to be checked.)

(38) a. Who did he say we should invite t?
     b. Who (λx (he said we should invite x))
     Covaluation
     c. Who (λx (x said we should invite x))
     c′. who (λx (x (λz (z said we should invite x))))
     d. Binding-comparison
        who (λx (x (λz (z said we should invite z))))
     (c) ≡ (d), hence he ≠ x; (c) ruled out.

The word he is in a configuration to A-bind x, so we turn to clause (b) of Rule I. Here, as we saw, no further operation on the derivation (38b) could allow he to A-bind x, since x is A-bound already by who. We are now considering the covaluation derivation in (38c), fully specified in (38c′). To decide whether this is a possible construal, we have to check clause (c) of Rule I, namely, whether the result of covaluation is distinguishable from what we would have obtained by binding. This requires, first, constructing the comparison representation that would have been derived if binding was not excluded here. The procedure for constructing this binding comparison is (36′), repeated here.

(36′) To check clause (c) of (36), construct a comparison-representation by replacing β with a variable A-bound by α.

In (38c′) the c-commanding x (he) is the α of Rule I, and the lower x (the trace) is β. (For convenience, the β-element is printed in boldface in (38) and the examples below.)
The binding comparison is obtained by replacing the trace x with a variable A-bound by he, as in (38d). Next, we check whether the two representations are semantically distinguishable. We find that they are equivalent. So, the verdict of Rule I is that he cannot be construed as x; that is, (38c) is not an appropriate interpretation of (38b).
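The equivalence claimed for (38c′) and (38d) can be checked mechanically on small models. The sketch below makes a strong simplifying assumption: "z said we should invite x" is treated as a plain two-place relation over a three-member domain, ignoring intensionality. It then enumerates every such relation and compares the extensions of the two λ-predicates. The domain, the function names, and the choice of "a" as the value of a free he are hypothetical.

```python
from itertools import product

DOMAIN = ("a", "b", "c")  # hypothetical individuals


def all_relations(domain):
    """Yield every binary relation over the domain (2**9 of them here)."""
    pairs = [(x, y) for x in domain for y in domain]
    for bits in product((False, True), repeat=len(pairs)):
        yield {pair for pair, bit in zip(pairs, bits) if bit}


def covaluation_38c(rel):
    """(38c'): lambda x . x (lambda z . z said we should invite x)."""
    return lambda x: (lambda z: (z, x) in rel)(x)


def binding_38d(rel):
    """(38d): lambda x . x (lambda z . z said we should invite z)."""
    return lambda x: (lambda z: (z, z) in rel)(x)


def free_pronoun_38b(rel, he="a"):
    """(38b) with 'he' left free and assigned some fixed individual."""
    return lambda x: (he, x) in rel


def extension(pred, domain):
    return frozenset(x for x in domain if pred(x))


def equivalent(make_p, make_q, domain=DOMAIN):
    """True if the two predicates have the same extension in every model."""
    return all(
        extension(make_p(rel), domain) == extension(make_q(rel), domain)
        for rel in all_relations(domain)
    )


print(equivalent(covaluation_38c, binding_38d))   # True
print(equivalent(free_pronoun_38b, binding_38d))  # False
```

The first comparison is the one clause (c) of Rule I calls for: no model separates the covaluation construal from the binding comparison. The second shows, for contrast, that leaving he free and distinct from x does yield a different predicate.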
As we saw, the same reasoning is involved in (26), repeated in (39).

(39) a. She said we should invite Lucie.
     b. She (λx (x said we should invite Lucie))
     c. Covaluation
        She (λx (x said we should invite Lucie) & (she = Lucie))
     d. Binding-comparison
        She (λx (x said we should invite x) & (she = Lucie))
     (c) ≡ (d), hence she ≠ Lucie. ((d) ruled out).

In (39b), the derivation we are considering, she is in a configuration to A-bind Lucie (i.e., she is α and Lucie is β, of Rule I in (36)). We need to decide whether the covaluation in (39c) is a possible construal. The argument she cannot A-bind Lucie (since Lucie is not a free variable). Hence clause (c) must be checked. To check clause (c), a comparison-representation should be constructed by replacing β (Lucie) with a variable A-bound by α. This is done in (39d). Since the binding in (39d) is equivalent to the covaluation in (39c), the latter is ruled out. In (26d–g), we checked, for completeness, the option that Lucie undergoes QR. I will return to this option shortly. In (40a), Max is in a configuration to A-bind him. Binding is excluded by Condition B, and no further operation could change that. We are considering the covaluation interpretation of the derivation, in (40b), which requires checking clause (c).

(40) a. Max admires him.
     b. Covaluation
        Max (λx (x admires him) & (him = Max))
     c. Binding-comparison
        Max (λx (x admires x))

The binding comparison is (40c), which is the interpretation the sentence would have received, had binding been permitted. Since (40c) is equivalent to (40b), (40b) is excluded. The same holds when two bound pronouns are covalued, as in (27), repeated in (41).

(41) a. Everyone thinks that he can hear him sing in the bathroom.
     Covaluation
     b. Everyone (λx (x thinks that x can hear x sing in the bathroom))
     b′. Everyone (λx (x thinks that x (λy (y can hear x sing in the bathroom))))
     c. Binding comparison
        Everyone (λx (x thinks that x (λy (y can hear y sing in the bathroom))))
In the derivation (41b) both pronouns are A-bound by everyone. Thus they end up covalued, though neither A-binds the other. ((41b) is more fully spelled out in (41b′), where it is obvious that there is no binding.) One occurrence of x is in a configuration to A-bind the other, and Condition B prohibits binding, so this covaluation needs to be checked with clause (c) of Rule I. The binding comparison is obtained by replacing the boldface variable with a variable A-bound by x. Since (41b) is equivalent to the binding comparison in (41c), the derivation is filtered out.12 In (38)–(41) covaluation is ruled out by clause (c) of Rule I, since it is indistinguishable from illegitimate A-binding. But we saw some examples where the two were distinct, like (30c), repeated in (42).

(42) a. Only he (himself) still thinks that Max is a genius.
     b. Covaluation
        Only he (λy (y thinks Max is a genius) & (he = Max))
     c. Binding comparison
        Only he (λy (y thinks y is a genius) & (he = Max))
     (c) is not equivalent to (b), hence (b) is allowed.

As before, the covaluation construal in (42b) is subject to Rule I, since he is in a configuration to A-bind Max (or its trace, if QR applies), and A-binding is excluded. The binding comparison (42c) is, as before, the interpretation the derivation would have had, if binding was permitted. However, (42c) is not equivalent to (42b): the properties attributed only to he are different, hence the representations have different truth conditions. (Representation (b) could be true if everyone considers himself a genius, as long as no one but Max considers Max a genius. Representation (c) will be false in this situation.) Clause (c) of Rule I blocks covaluation only if it is indistinguishable from binding. Hence it does not block it here, and (42b) is allowed. The same reasoning applies in (30a), repeated in (43).

(43) a. He is Ralph.
     b. Covaluation
        He (λx (x is Ralph) & (he = Ralph))
     c. Binding comparison
        He (λx (x is x) & (he = Ralph))
     (c) is not equivalent to (b), hence (b) is allowed.

Representations (43b) and (43c) are not equivalent, the second being a tautology. Hence (43b) is allowed. In these examples, the representations are distinguishable since their truth conditions are distinct. In other cases,
Chapter 4
they may be distinguishable because only one of the properties, but not the other, is relevant to previous context. (For discussion, see Reinhart 1983a and Heim 1998.)

4.3.3 Further Details of the Computation

Sentences like (26), repeated below, which were traditionally governed by Condition C, deserve further attention. We have ruled out already the covaluation in (26c). However, as mentioned, it is necessary to also check their derivation with QR, as in (26d), since covaluation in sentences with this PF should be blocked under any derivation. Although the discussion may seem tedious, if just this problem is concerned, it is, in fact, also necessary for the more substantial problems I return to in section 4.4.

(26) a. She said we should invite Lucie.
     b. She (λx (x said we should invite Lucie))
     c. Covaluation
        #She (λx (x said we should invite Lucie)) & (she = Lucie)
     With QR
     d. Lucie (λy (she said we should invite y))
     d′. Lucie (λy (she (λx (x said we should invite y))))
     e. Binding
        *Lucie (λx (she (λx (x said we should invite x))))
     Covaluation
     f. #Lucie (λx (x said we should invite x))
     g. #Lucie (λx (she said we should invite x & she = Lucie))
     g′. #Lucie (λx (she (λz (z said we should invite x)) & she = Lucie))
(44) Binding-comparison
     Lucie (λx (x (λz (z said we should invite z))))

With this derivation, there are two covaluation construals that need to be ruled out: (26f, g). (Recall, from the discussion of (13), that although these construals are equivalent, it is necessary to assume that a pronoun can also be covalued directly with the argument (sister) of the λ-predicate, for the contexts of ellipsis.) Representation (26f) is straightforward: she is covalued here with the variable x, which it cannot A-bind (since x is bound already). This is, then, the standard strong-crossover configuration. Since the result is equivalent to the binding comparison (44), covaluation is ruled out.

The question, however, is how (26g) is ruled out. At first glance, the covaluation rule faces a problem here: technically, she is not covalued with the variable x, but with Lucie. Hence, α and β of this
rule must be Lucie and she. Since Lucie is in a configuration to A-bind she, clause (b) of Rule I requires checking if Lucie can A-bind she in this derivation. If it can, covaluation will be wrongly permitted. Note that this problem is not related to the interface aspects of Rule I (37), but it arises in the same way for the modified Condition C (29). The question for both is whether Lucie can A-bind she here.

We assumed two circumstances where binding is excluded: when prohibited by logical syntax, or by Condition B. Neither of these prevents Lucie from A-binding she in (26g). However, if binding applies, we obtain the representation (26f), where she ends up covalued with x. This covaluation is illicit, as we saw already. So, in fact, Lucie cannot A-bind she here, since no licit interpretation can be obtained for this derivation, if it does.

Admittedly, the reasoning involved here is somewhat complex: considering whether a given pronoun can be bound by a given λ-operator (or its sister) requires considering the effect this would have on the rest of the derivation within that λ-predicate. However, in terms of semantic processing, this does not amount to reopening closed constituents: the computation is done within a predicate that is still open. (The top λ-predicate contains the free variable she, and the lower contains the variable x, which is free in that predicate, as can be checked in the fuller representation (26g′).)

In the case of (26d), the whole issue could possibly be dismissed, if we assume that QR cannot apply arbitrarily in a given derivation, where it has no effect on the interface, as in the case of (26d) (Chomsky 1995). However, the same problem would surface in other derivations where it is obvious that movement has applied, as in (45).

(45) a. #Lucie, she insisted we should invite e (& she = Lucie).
     b. #She insisted that we should invite Lucie, last week, and not Lili (& she = Lucie).

In (45a), topicalization applies overtly.
In the elliptic conjunction (45b), QR of Lucie must apply for the conjunction to be interpretable. The computation of covaluation would work here precisely as illustrated in (26d). Of course, one could always resort to various formulations of reconstruction in such cases. (On such lines, the covaluation rule checks the relations of she and the copy of Lucie left in situ, and the derivation is ruled out just as (26).) But in the next section we will see an instance of the same form of computation, where no movement has occurred at all.13
4.4 Covaluation in Ellipsis Contexts
Ellipsis contexts provide an anaphora problem which has regained much attention lately, most notably in Fiengo and May 1994 and Fox 1998.

(46) Max said that he likes his paper, and Lucie did too.

Taking all pronouns in the first conjunct to be anaphoric to Max, the second allows three construals. Two of those are the familiar ones: the “strict” reading, where both pronouns are covalued with Max, and then Lucie too said that Max likes Max's paper, and the bound or “sloppy” reading (under which Lucie said that Lucie likes Lucie's paper). Apart from these, there are two logical options, only one of which can in practice be realized. Since judging the construals requires some processing, it may be easier to view them in their nonelided form below. (The presence of too usually requires the same sort of parallelism as required for ellipsis.)

(47) Max said that he likes his paper, and Lucie too said that she likes his paper.
(48) #Max said that he likes his paper, and Lucie too said that he likes her paper.

Sentence (47) poses no problem, but (48) is funny (as long as the destressing and intonation pattern required by too is kept). In any case, (46) can be construed as in (47), but not as in (48). For the construal (47) to be generated, the first conjunct must be analyzed as in (49a). The predicates in the two conjuncts are, then, identical (with Lucie as the argument in the second conjunct, (49b)).

(46) a. Max said that he likes his paper
     b. and Lucie did too.
(49) a. Max (λx (x said that x likes his paper) & (his = Max))
     b. and Lucie (λx (x said that x likes his paper) & (his = Max))
(50) a. Max (λx (x said that he likes x's paper) & (he = Max))
     b. and Lucie (λx (x said that he likes x's paper) & (he = Max))

Similarly, to generate the construal (48), the predicate must be construed as in (50). Since (48) is an impossible construal (of (46)), this means that something went wrong in (50a).
The problem is, then, why (50a) is blocked as an anaphora construal of (46a), while (49a) is permitted.
When anaphora is viewed as a sequence of coindexation of arguments, this is a serious puzzle, since from that perspective, there is only one representation of anaphora for (46a). (This is apparently what initiated the study of this problem originally.) But given the analysis here, (50a) turns out to be an instance of strong crossover. It is ruled out in the same way as the derivations just discussed in section 4.3.3.

In the derivation (50a), the lower pronoun, his, was bound to the top λx operator, but the higher he was left as a free variable. When Max is encountered, the question is whether this free pronoun can now be covalued with Max. The fuller representation of (50a) is given in (51a). Since Max is in a configuration to A-bind he, Rule I (or the modified Condition C) has to check first whether Max can A-bind this pronoun.

(51) a. Max (λx (x said that he (λy (y likes x's paper)) & he = Max))
     b. Max (λx (x said that x (λy (y likes x's paper))))
     c. Max (λx (x said that x (λy (y likes y's paper))))

The reasoning proceeds as in (26d) of section 4.3.3. In principle, Max could A-bind the pronoun. But then covaluation is obtained between he and x, as in (51b). This covaluation is illicit: he cannot A-bind his, since his is already bound. In this case, it is not movement that created the binding of his, but an optional step taken in the derivation, namely, the choice to bind it. Nevertheless, his is a bound variable, and cannot be A-bound again by he. In other words, if the derivation has started as in (51a), no licit operation could apply to derive (51c) from (51a). Since (51c) and (51b) are equivalent, (51b) is ruled out. Hence, in fact, Max cannot A-bind he and covaluation is ruled out.

In the derivation (49a), by contrast, clause (b) of Rule I does not hold. (Both Max and the variable x can A-bind his.) So nothing blocks covaluation. The fuller representation of (49a) is given in (52a).
In this derivation, the lower pronoun his remains free, while the top one is bound. Since his is a free variable it can be A-bound by any c-commanding NP. If Max A-binds his, covaluation is obtained also between x and his. But this covaluation is permissible, since x could A-bind his, as in (52b).

(52) a. Max (λx (x said that x (λy (y likes his paper)) & his = Max))
     b. Max (λx (x said that x (λy (y likes y's paper))))
(53) Max likes his paper & (his = Max).

The situation in (52a) is, thus, analogous to the covaluation construal of (53). As we saw in section 4.3.2, covaluation is always permitted in such derivations, precisely because binding is not excluded.
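The link between the choice of predicate in the first conjunct and the construal of the elided conjunct can be made concrete with a small sketch. It is only an illustration under simplifying assumptions: each λ-predicate is rendered as a Python function from the subject to an English paraphrase, an occurrence of `x` stands for a bound pronoun, and a literal "Max" stands for a covalued (free) pronoun. The function names are hypothetical labels for the representations in the text, and the sketch does not itself implement Rule I; it only shows what each predicate would yield once copied into the second conjunct.

```python
# Toy rendering of the predicates in (49a) and (50a), plus the fully bound
# and fully covalued alternatives. Names are illustrative, not from the text.

def fully_bound(x):
    # λx (x said that x likes x's paper): both pronouns bound ("sloppy")
    return f"{x} said that {x} likes {x}'s paper"

def fully_covalued(x):
    # λx (x said that he likes his paper), he = Max, his = Max ("strict")
    return f"{x} said that Max likes Max's paper"

def pred_49a(x):
    # λx (x said that x likes his paper), his = Max: permitted
    return f"{x} said that {x} likes Max's paper"

def pred_50a(x):
    # λx (x said that he likes x's paper), he = Max: the crossover case,
    # excluded by Rule I in the text
    return f"{x} said that Max likes {x}'s paper"

def elided_conjunct(predicate, subject="Lucie"):
    # Ellipsis resolution modeled as reuse of the same predicate
    # with a new argument.
    return predicate(subject)

for pred in (fully_bound, fully_covalued, pred_49a, pred_50a):
    print(pred.__name__, "->", elided_conjunct(pred))
```

Applying `pred_49a` to Lucie gives the paraphrase of (47), while `pred_50a` gives the paraphrase of the unavailable (48), which is why blocking (50a) in the first conjunct suffices to block that construal of the ellipsis.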
Thus, Rule I (or the modified Condition C in (29)) allows the covaluation derivation (49a) for (46a), but excludes (50a). Probably, the reason why the correlation between (50b) and crossover configurations could not be formulated in previous approaches is that this configuration is created here by a choice of a particular nonobligatory step in the derivation. The difference between (46a) and the other instances of strong crossover is that for (46a) it was possible to take different steps in the earlier derivation that would have allowed covaluation (as in (49a)), while in the standard crossover cases, once the selection from the numeration was made, there is no way to derive a configuration allowing covaluation.

By way of summarizing, we may observe that anaphora in sentences like (46) may be obtained under several construals, two of which are ruled out by Rule I (as well as by the modified Condition C, for such examples).

(54) a. Max said that he likes his paper.
     b. Max (λx (x said that x (λz (z likes z's paper))))
     c. Max (λx (x said that he likes his paper & he = Max & his = Max))
     d. Max (λx (x said that he (λz (z likes z's paper)) & he = Max))
     e. Max (λx (x said that x likes his paper & his = Max))
     f. #Max (λx (x said that he likes x's paper & he = Max))
     g. #Max (λx (x said that x likes x's paper))
Representation (54b) is the bound reading, which will lead to the “sloppy” construal of the ellipsis in (46). In (54c) both pronouns are covalued with Max. Both (54c) and (54d) lead to the “strict” construal of the ellipsis in (46). Representation (54e) is what we discussed in (49a), which leads to the construal (47) of the ellipsis. The next two construals are excluded by Rule I: in both, clause (b) of Rule I holds, since the lower x is bound. Hence clause (c) rules them out. (Nothing empirical hinges on (54g) being excluded, since in ellipsis contexts it would have led to the same interpretation as (54b). Nevertheless, this is an entailment of Rule I.)

Fox (1998) offers a different perspective on why (54f, g) are ruled out. On this view, the problem here is not directly with the covaluation of free variables, but with the binding of variables. This follows the spirit of a partial reformulation of Rule I, which was proposed by Heim (1998) and which Fox reformulates again as “rule H,” in (55). This is intended to replace only the Condition B aspects of Rule I. Heim and Fox assume that along with (55), one needs to assume also the traditional Condition C of the binding theory. However, Fox argues that once (55) is introduced, it also accounts for the problem under consideration here.
(55) Rule H
     A variable x cannot be bound by an antecedent α, if a more local antecedent β could bind x yielding an indistinguishable interpretation.

(54) f. #Max (λx (x said that he likes x's paper & he = Max))
     g. #Max (λx (x said that x likes x's paper))

(Note that (55) assumes the standard syntactic definition of binding as identity of variables.) On this formulation, the reason why (54f, g) are ruled out is that the lowest variable x (his) is long-distance bound by the top variable x, while the semantic representation that is obtained could have been obtained also if it was bound by a closer antecedent (he).14 Rule H is stated within the economy view of Rule I, which, as mentioned, I also shared in the past. As explained in section 4.3.1, the economy principle that could be behind it is “get rid of free variables (i.e., close properties) as soon as possible.”

Note, first, that for (55) to apply more broadly (and not just to the configurations of type (54)), one has to assume a specific view of discourse anaphora, developed in DRT. As stated, it appears that (55) has nothing to say on why anaphora is blocked in (56b), since there are no bound variables in this representation.

(56) a. [Max is happy. . . . ] He really likes him.
     b. #He (λx (x likes him) & (he = Max, him = Max))

However, Heim assumes that discourse anaphora (covaluation) is also a form of variable binding. Covalued pronouns are bound by a discourse λ-operator, whose argument (sister) is some discourse entity. On this view, him in (56) is bound to the discourse entry Max, which is more remote than the local antecedent (he) that could have bound it with indistinguishable semantic results. Hence it is ruled out. (Since the pronoun is also prohibited by Condition B from being bound locally, this derivation has no anaphora construal.)

Once this is assumed, we are back to the problem with the economy view that I mentioned in section 4.3.1.
Representation (54e), repeated below, turns out to violate Rule H as well.

(54) e. Max (λx (x said that x likes his paper & his = Max))

The pronoun his here is discourse-bound, while the representation is indistinguishable from what we would have obtained had we bound it to the more local x. So the derivation is ruled out. But we saw that, in fact,
this construal is allowed. More generally, as stated, Rule H does not allow covaluation in sentences like Max likes his paper. This is why I concluded that although the rationale behind the economy, or locality, view seems reasonable, it is not the type of consideration that the human processor takes into account. In addition, Rule H cannot be extended to capture the full range of Condition C effects, for which Heim and Fox assume additional constraints. On the other hand, (54f, g), where Rule H seems most successful, are ruled out anyway, by the version of Rule I presented here. The locality effects illustrated in (54f, g) are thus an entailment of Rule I, rather than an independent condition.

4.5 The Psychological Reality of Rule I
Let us return now to the question whether covaluation is governed by syntactic principles, or by an interface strategy like that formulated in Rule I. Recall, first, that with the present definitions of binding and covaluation, it is possible to unify Condition C and the covaluation residue of Condition B, while also improving their empirical coverage. The modified Condition C and Rule I are repeated for the comparison.

(29) Modified Condition C(ovaluation)
     α and β cannot be covalued in a derivation D, if
     a. α is in a configuration to A-bind β, and
     b. α cannot A-bind β in D.

(36) Rule I
     α and β cannot be covalued in a derivation D, if
     a. α is in a configuration to A-bind β, and
     b. α cannot A-bind β in D, and
     c. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β.

On the empirical side, we saw that Rule I is closer to capturing the facts: when covaluation under c-command (i.e., in an A-binding configuration) is distinct from binding, it is allowed, correctly, by Rule I. However, this empirical gain appears to come at a heavy cost. This is visible already in the formulation of the two rules. Rule I, at least as presented here, just adds a further clause to Condition C. But the major cost is in terms of processing: the computation involved in assessing covaluation by Rule I is highly complex, as we saw. It requires constructing a reference set that includes a comparison representation, and then assessing the semantic
relations between the two representations. The syntactic condition, by contrast, requires much simpler computation.

Furthermore, the empirical gap between the two approaches is not gigantic. In most cases, they yield, by definition, the same result, differing only in the complex contexts we have observed. Under the present formulation, Condition C captures also the anaphora in ellipsis contexts discussed in section 4.4. The difference will show up again only in the cases where covaluation is distinct from binding, discussed for these contexts in Fox 1998.

Even in contexts where covaluation is distinct from binding, the use of covaluation that goes against Condition C is not the most common choice. Except for identity contexts (like He is Max), speakers often prefer using an ambiguous derivation, rather than using this option. For instance, (42), repeated below, is permitted by Rule I, as we saw, because its covaluation interpretation is distinguishable from an alternative binding representation. Nevertheless, (57a) may be, in practice, the preferred option for expressing the same covaluation interpretation. This is so despite the fact that (57a) is ambiguous between the two readings observed in (57b, c), while (42) has only the reading corresponding to (57b).

(42) Only he (himself) still thinks that Max is a genius.
(57) a. Only Max (himself) still thinks that he is a genius.
     b. Only Max (λx (x thinks that he is a genius & he = Max))
     c. Only Max (λx (x thinks that x is a genius))

On theoretical grounds, it would not be unreasonable, under these circumstances, to dismiss the empirical differences between the two approaches, at least for the time being, and opt for the syntactic approach, which looks less costly. This, indeed, is the line taken by many researchers. Nevertheless, there is much stronger empirical evidence that a complex strategy, rather than a mechanical syntactic rule, is at work here.
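That the covaluation reading (57b) and the bound reading (57c) have different truth conditions can be checked against the scenario given earlier for (42): everyone considers himself a genius, and no one but Max considers Max a genius. The following is a minimal sketch under stated assumptions. The three-person domain, the set-of-pairs encoding of "x thinks y is a genius", and the simple exhaustive semantics for only are all simplifications introduced here for illustration.

```python
def only(subject, prop, domain):
    """'Only subject has prop': subject has it, and nobody else in the domain does."""
    return prop(subject) and not any(prop(y) for y in domain if y != subject)

# Hypothetical model: (thinker, target) pairs for "thinker thinks target is a genius".
domain = {"Max", "Bill", "Tom"}
thinks_genius = {("Max", "Max"), ("Bill", "Bill"), ("Tom", "Tom")}

def covaluation_reading(domain, thinks):
    # (42b)/(57b): Only Max (λx (x thinks Max is a genius))
    return only("Max", lambda x: (x, "Max") in thinks, domain)

def binding_reading(domain, thinks):
    # (42c)/(57c): Only Max (λx (x thinks x is a genius))
    return only("Max", lambda x: (x, x) in thinks, domain)

print(covaluation_reading(domain, thinks_genius))  # True
print(binding_reading(domain, thinks_genius))      # False
```

Since one model separates the two representations, they are distinguishable in the sense relevant to clause (c) of Rule I.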
As just noted, Rule I entails that computational complexity is involved in derivations of the type we have been considering, hence one may expect a visible processing cost.15 Let us, first, recall which steps in Rule I involve a processing complexity. If either clause (a) or clause (b) does not hold, the assessment ends there, with nothing complex about it. For example, we saw that in (58) (= (37)), clause (a) is not met, hence nothing could preclude covaluation. In (59) (= (33)), clause (b) does not hold, since binding could have applied. Hence, anaphora is permitted, whether it is construed as binding or as covaluation.
(58) Max's mother loves him (he = Max).
(59) Max loves his mother.

But if both (a) and (b) hold, assessment must go through clause (c), which is the costly step. These are all and only the derivations that violate the syntactic Condition C, as modified in (29).

(60) She said we should invite Lucie.
(61) Max admires him.
(42) Only he (himself) still thinks that Max is a genius.

In these cases, a comparison representation must be constructed and compared to the intended covaluation representation. In terms of processing, it does not matter whether the final verdict of Rule I is “allow” as in (42), or “disallow” as in (60)–(61). In both cases, the decision requires a complex procedure. This, then, is the crucial difference between Condition C and Rule I. Condition C requires precisely the same steps in computing covaluation in (58)–(59) and in (60)–(61), while for Rule I, computing the second is a much more complex enterprise than computing the first. If there is empirical evidence that the second involves indeed a processing difficulty absent from the first, this is evidence for Rule I.

As we will see in detail in section 5.1, this was unexpectedly confirmed in studies of the acquisition of anaphora. Children consistently fail on tasks involving step (c) of Rule I, and only on those tasks. Children's performance on anaphora in (58) and (59) more or less parallels that of adults. But in the cases of (60)–(61), they perform at chance level. Sentences like (42) were not studied. However, the theoretical expectation, if Rule I is at work, is that children would have the same difficulty with (42). More generally, they will have the same difficulty with interpretations ruled in or ruled out by clause (c) of Rule I. Other studies (starting with Wexler and Chien 1991) show that the problem arises only with covaluation construals. When anaphora could only mean binding (e.g., with quantified antecedents), children do not make these mistakes.
The processing cost of clause (c) of Rule I may also explain why, in practice, adults too do not readily opt for it. Sentence (57) is often preferred over (42), since constructing and comparing representations in a reference set is a high cost to pay for avoiding ambiguity.
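The asymmetry described in this section, where clauses (a) and (b) are cheap checks and clause (c) alone requires building and comparing representations, can be sketched as a short decision procedure. This is an illustrative sketch only: the names `rule_i` and `Verdict` are hypothetical, the truth of clauses (a) and (b) is supplied by hand for each example rather than computed from a syntactic structure, and clause (c) is passed in as a function so that it is evaluated only when the first two clauses hold.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    covaluation_allowed: bool
    comparison_built: bool  # True only if clause (c) had to be evaluated

def rule_i(in_binding_configuration: bool,
           binding_possible: bool,
           indistinguishable_from_binding: Callable[[], bool]) -> Verdict:
    # Clause (a): if α is not in a configuration to A-bind β, nothing blocks covaluation.
    if not in_binding_configuration:
        return Verdict(True, False)
    # Clause (b): if α can A-bind β in the derivation, covaluation is permitted.
    if binding_possible:
        return Verdict(True, False)
    # Clause (c): the costly step. Build the binding comparison and compare.
    same = indistinguishable_from_binding()
    return Verdict(not same, True)

examples = {
    "(58) Max's mother loves him":  rule_i(False, False, lambda: True),
    "(59) Max loves his mother":    rule_i(True, True, lambda: True),
    "(61) Max admires him":         rule_i(True, False, lambda: True),
    "(42) Only he ... Max is a genius": rule_i(True, False, lambda: False),
}
for sentence, verdict in examples.items():
    print(sentence, verdict)
```

On this rendering, (61) and (42) receive opposite verdicts but both set `comparison_built`, matching the point that the processing cost is independent of whether the outcome is "allow" or "disallow". Dropping the third argument and the final step would give the modified Condition C in (29), which never builds a comparison.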
Chapter 5 The Processing Cost of Reference-Set Computation
The theoretical prediction of this framework is that whenever reference-set computation is involved, there should be some evidence of processing complexity. Apparently, the processing load posed here is not beyond the processing ability of adults (given, e.g., that stress-shift is used easily and frequently). But unexpected evidence that the required processing may exceed children's processing abilities came from the studies of the acquisition of Condition B (unexpected, because the studies that first found it were not set up with this purpose in mind, which is often the strongest evidence there is). Grodzinsky and Reinhart (1993) argue that the findings in the acquisition of Condition B contexts (governed in the present framework by Rule I) indicate that children are unable to execute the computation, which, as they know innately, is required for this task. They resort, instead, to bypassing strategies, which in the case of coreference, is simple guessing. As we will see, another bypassing strategy is found in other areas of reference-set computation.

Grodzinsky and Reinhart's point of departure was that the statistics of children's performance should be taken as a significant part of the data. In the experiments on coreference in Condition B contexts, children responded correctly on average about 50 percent of the time (which in individual experiments can range between 35 and 65 percent). This is not a very common range of performance. In many tasks, some of which will be mentioned in this chapter, children perform at close to a 100 percent range (namely, at 80 to 95 percent adultlike responses). In cases where they set the parameter wrongly, the experimental design should be able to elicit close to zero correct results. In the acquisition literature, 50 percent findings are often just summarized as “children performed poorly” and are not distinguished from below-chance performance.
But Grodzinsky and Reinhart argue that results in the 50 percent range require a specific explanation. The reason is that they are consistent with chance, and
if, indeed, they can be shown to be chance performance, this is a significant finding that cannot be reduced to just not knowing the relevant linguistic rule. (Grodzinsky and Reinhart argued that if children just do not know a rule, they should respond uniformly, most likely following the established “yes” bias.)

In subsequent work on Condition B, much effort was invested in proving that the 50 percent results do not indicate chance or guessing (e.g., that some children master the rule and some do not). Nevertheless, there is by now evidence that the performance (of at least most children) on Condition B is by pure chance. To establish whether a 50 percent performance at the group level indeed indicates a guess strategy, statistics of individual child performance should be compared to the binomial model of results expected by pure chance (e.g., in a coin-tossing experiment). As we will see in section 5.1.4, Thornton and Wexler's (1999) statistical analysis of their experimental findings is consistent with this model, at least for most of the children they studied. In section 5.2.3, a detailed comparison of children's performance with the binomial model shows the same result in another area, of switch reference with stress-shift.

Since guess performance is definitely not a common pattern in acquisition, the question is what could lead a child to resort to guessing. I pursue here the processing account of Grodzinsky and Reinhart, arguing that the computation required by reference-set computation exceeds children's processing ability, specifically the limitations of their working memory, which is not as developed yet as that of adults.
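The comparison of an individual child's responses with the binomial model of pure chance can be sketched as follows. This is a generic illustration, not a reconstruction of the analyses cited above: the number of trials per child, the significance threshold `alpha`, and the function names are all assumptions made for the example.

```python
from math import comb

def binom_pmf(n, k, p=0.5):
    """Probability of exactly k adultlike answers in n trials under guessing."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_p_value(n, k, p=0.5):
    """Total probability of outcomes at most as likely as the observed one."""
    observed = binom_pmf(n, k, p)
    return sum(binom_pmf(n, i, p) for i in range(n + 1)
               if binom_pmf(n, i, p) <= observed * (1 + 1e-9))

def consistent_with_guessing(n, k, alpha=0.05):
    """True if k correct out of n cannot be distinguished from coin tossing."""
    return two_sided_p_value(n, k) >= alpha

# Hypothetical children, each tested on 8 items.
print(consistent_with_guessing(8, 4))  # True: 4 of 8 is what guessing predicts
print(consistent_with_guessing(8, 8))  # False: 8 of 8 is unlikely under guessing
```

A group mean near 50 percent is compatible with guessing only if most individual children also fall in the region this test accepts; a group made up of children at 0 and at 100 percent would show the same mean but fail the individual comparison.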
The MIT Encyclopedia of Cognitive Sciences describes the working-memory system as follows: “Cognitive scientists now assume that the major function of the system in question is to temporarily store the outcomes of intermediate computations when problem solving, and to perform further computations on these temporary outcomes (e.g., Baddeley 1986)” (Smith 1999, 888). Reference-set computation relies heavily on this ability to store and perform further computations on temporary outcomes. The same is required of course in other computations during processing, but because of its global nature (holding, constructing, and comparing full representations), the load posed on working memory by reference-set computation is heavier.

Independently of this computation, it has been by now pretty well established in psychology that children's working memory is not yet fully developed. An extensive survey of the literature and findings in this area can be found in Gathercole and Hitch 1993 and Gathercole and Baddeley 1993. (For an example of experiments on the linguistic effects of this limitation with preschool children, see Gathercole and Adams 1993; Adams and Gathercole 1996.)

The working-memory system should not be confused with memory resources in general (long-term memory). For instance, an anonymous reviewer argues against the hypothesis put forth here that “children are capable of memorizing a large number of new words; they are capable of learning rules of new games, etc.” But these tasks concern memory resources in general, regarding which I am not aware of attested limitations in children. (The huge amount of information that children manage to learn suggests, in fact, the opposite.) Smith (1999) explains that the view of working memory as a gateway to long-term memory has been undermined by neuropsychological studies that have found that there are patients who are impaired on working-memory tasks but perform normally on long-term memory tasks. Note also that the precise details of how working memory develops (whether memory capacity itself increases, or only efficiency in allowing more resources to be employed in storage) are a subject of debate. But these details are not important for the present discussion, because either way, children's working memory was found not to operate as efficiently as adults'.

The hypothesis, then, is that the chance performance found in the acquisition of coreference reflects children's inability to process the required computation, because of their limited working memory. Thornton and Wexler (1999) challenge this hypothesis and conclude that there is no reason to assume children have any difficulties in processing reference-set computation. Rather, the coreference delay reflects, in their view, a pragmatic deficiency. I address their arguments in section 5.1, including problems posed by the acquisition of coreference in Condition C environments, and argue that they do not provide evidence against the processing account.
Furthermore, their pragmatic account cannot explain the chance performance found in their own study.

Once a processing failure is established in one of the areas of reference-set computation, we can turn to the other areas where this computation has been established. The empirical criterion I have posed here for whether reference set is indeed involved in a given computation is that if it is, children are unable to carry out the required computation, resorting instead to strategies to bypass it. The simplest option we have observed so far is individual guessing, but in areas involving semantic disambiguation there is another option of establishing an arbitrary default, a strategy I examine in detail in sections 5.2.4 and 5.2.5. When this option is available, it is possible that individual children fix a consistent default across all tasks in a given area. If this happens, we would find a consistent
individual response pattern. In the relevant dual-choice experiments, fixing on one of the defaults would consistently yield an adultlike answer; fixing on the other would produce the nonadultlike answer. However, even if the child fixes a default in advance for all tasks, the choice of the default is still arbitrary. For this reason, the overall group performance would still be in the chance range of 50 percent (in dual-choice experiments). It may appear possible to interpret such findings as indicating that some children have mastered the required computation while others have not, but then it is not trivial to explain why this division is always in the 50 percent range. Relevant questions are why exactly half the children of the same age group in all relevant experiments have reached adult performance, and why the same proportion of maturation differences among children is not found in other areas where maturation is assumed to take place.

In section 5.2, I examine the acquisition of stress-shift and focus, which is another area that requires reference-set computation. I argue that contrary to some prevailing views, there is no evidence that children have general problems with stress, but still, we find the 50 percent range of performance when stress-shift applies, as predicted by the processing-cost hypothesis. Analysis of the explanations children give for their answers reveals that they are attempting to construct the relevant comparison derivation, but they get stuck at that stage. While the group results on all tasks examined in this section are in the 50 percent range, analysis of individual responses reveals two bypassing strategies. One is simple guessing, dominant in tasks involving switch-reference with stress-shift. The other, dominant in tasks involving semantic disambiguation, is the selection of an arbitrary default, which may be fixed for a given child across tasks.
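The claim that both bypassing strategies produce group results in the 50 percent range, while differing at the level of individual children, can be illustrated with an exact calculation. The sketch below rests on assumptions introduced only for the example: each child answers `n` dual-choice items, a guessing child answers each item correctly with probability 0.5 independently, and a default-fixing child picks one of the two defaults with probability 0.5 and then applies it on every item. Function names are hypothetical.

```python
from math import comb

def guess_scores(n):
    """Distribution over the number of adultlike answers for a guessing child."""
    return {k: comb(n, k) / 2**n for k in range(n + 1)}

def default_scores(n):
    """Distribution for a child who fixes an arbitrary default across all items."""
    return {0: 0.5, n: 0.5}

def mean_proportion(dist, n):
    """Expected proportion of adultlike answers (the group-level figure)."""
    return sum(k * p for k, p in dist.items()) / n

def share_consistent(dist, n):
    """Probability that a child answers all items the same way (0 or n correct)."""
    return dist.get(0, 0.0) + dist.get(n, 0.0)

n = 6  # hypothetical number of items per child
for name, dist in (("guess", guess_scores(n)), ("default", default_scores(n))):
    print(name, mean_proportion(dist, n), share_consistent(dist, n))
```

With six items, both strategies give an expected group proportion of 0.5, but fully consistent individual patterns are expected from about 3 percent of guessers and from all default-fixers, which is why individual response analysis is needed to tell the two strategies apart.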
The empirical prediction of this analysis, then, is that in all areas of reference-set computation we should find children’s performance in the 50 percent range, whether this is obtained by the guess or by the default strategy. This is a strong claim, meaning that if no such performance is found, no reference-set computation can be involved. Of the four established areas where this computation is required, evidence is already available, along with Rule I and stress-shift, for the acquisition of scalar implicatures, which I discuss in section 5.3. Several studies have found a 50 percent range of group performance in children’s understanding of implicatures. Chierchia et al. (2001) and Gualmini et al. (2001) argue that processing an implicature requires the construction of a reference set, and that children fail in carrying out this computation. This is an
area of semantic disambiguation, where the default strategy is possible; hence the findings reveal consistent individual responses in most cases. In the area of QR, insufficient relevant experimental work has been done, to my knowledge. Some recent experimental attention has been devoted to the relative scope of quantifiers and negation (e.g., Lidz and Musolino 2002; Musolino and Lidz, forthcoming). However, this problem involves reconstruction rather than QR: the subject quantifier reconstructs back into its SpecVP position, which enables interpreting it in the scope of negation. No reference-set computation is involved in reconstruction, so these issues are irrelevant to our question. QR is a more difficult area to test because of the interference of other factors in children’s interpretation of quantifiers, like the phenomenon of quantifier spreading and other issues.¹ A possible experimental design specific to QR is outlined in Reinhart and Szendrői 2003. In any case, the empirical criterion proposed here still holds. If the 50 percent pattern is not found, once the work is done extensively, this would falsify my analysis of QR in section 2.7, and would confirm Fox’s (2000) version of the analysis, in which QR involves local rather than global reference-set computation, and hence it is not a costly operation.
A further question that needs to be settled is where precisely the breaking point of processing for children is. Reference-set computation at the interface has two components. Stated from the perspective of comprehension, one is that an additional representation, different from the parsed input, must be constructed (creating the reference set). The other is the semantic comparison of the two (or more) representations, against the context. The second of these components is, in fact, at work in many areas of language. All instances of semantic disambiguation require this second type of processing.
In the case of quantifier scope with indefinites, as in Every student read a philosophy book, the derivation is ambiguous regarding the existential closure of the indefinite choice function (whether it is in the scope of every student or not). As we saw, this ambiguity is not generated by an application of QR; hence, no reference-set computation is involved. Still, the two possible closures need to be evaluated and decided against the context. In the case of focus, under the analysis assumed here, all derivations with neutral stress are always focus-ambiguous, and one member of the focus set needs to be selected in context. Semantic disambiguation poses a certain amount of processing load. From a purist perspective on optimal design, the vast existence of semantic ambiguities is a sort of imperfection, because of this processing load. (I discuss this question in section 5.2.5.) Nevertheless, it is different in
nature from the severe imperfections leading to reference-set computation. In the first case, the derivation contains everything needed for the interface, though it requires extra processing to use it. So the CS outputs meet the essential requirement of enabling the interface, but at a cost to the performance systems of processing and acquisition. In the second case, the outputs of the CS are not sufficient for the interface, so an illicit operation or interpretative procedure needs to apply to repair the deficiency. This different nature of the imperfection manifests itself in the amount of processing load, since the second case requires an additional task of constructing an alternative derivation. Another difference between the resolution of semantic ambiguities and reference-set computation is that in the first case it is often possible for adults to develop meaningful defaults that enable bypassing the required computation in many contexts. As we will see, in the case of reference-set computation, defaults are not always possible. But when they are, they can be based only on statistical probability. Given, however, that semantic disambiguation also imposes a load on working memory, an empirical question is whether this already surpasses children’s capacity. (If it does, we have a serious problem of acquisition at hand, given that these ambiguities are very common.) In Grodzinsky and Reinhart 1993 we assumed that children’s failure in lexical disambiguation indicates that it does. But in Reinhart 1999b I argued that it is specifically the step of constructing a representation not available at the input that is the source of the problem. Although children may also have problems with semantic disambiguation, the hypothesis that I will pursue further here is that it is not the same sort of problem, and that it does not lead to the same processing crash that is witnessed in the 50 percent performance range.
5.1 Acquisition of the Coreference Rule I
Chien and Wexler (1990) were pioneers in establishing the basic generalization regarding the acquisition of anaphora.² Based on experiments with a large number of children (177, age 2;6 to 7;0), they showed that acquisition delays are found only with coreference, not with the binding theory in general: children perform well on the variable-binding aspects of the binding theory, including Condition B, but poorly on coreference in Condition B environments. This conformed with Reinhart’s (1983a, 1983b) theoretical conclusion that variable binding and coreference are governed by different types of linguistic conditions. The conditions on
binding are absolute output conditions, while the conditions on coreference are relative and context dependent. The acquisition question has been why this difference should entail a delay in children’s performance on coreference. In Reinhart 1983b the condition on coreference was perceived as belonging to pragmatics. It involves an inference based on knowledge of grammar, meaning, and appropriateness to context, and I believed it could be viewed as an instance of Gricean generalized implicatures, another area where poor performance of children has been discovered, to which I turn in section 5.3. Chien and Wexler formulated a similar intuition and argued that children’s coreference performance reflects a delay in acquiring the context considerations underlying a pragmatic principle.
Grodzinsky and Reinhart (1993) took a different perspective on this question. Their point of departure was another key result of Chien and Wexler’s study. Virtually all studies on the acquisition of coreference have found not just vague poor performance, but results ranging around 50 percent adultlike answers. Such figures are, in principle, consistent with chance performance. Chien and Wexler conducted careful statistical analyses, including individual data, and found that many of the children perform at chance level individually; that is, they sometimes answer yes and sometimes no on the same condition. If so, this is a pattern of guessing that is not known to be common in acquisition. Grodzinsky and Reinhart argued that the account for the coreference delay should also explain this specific pattern. Grodzinsky and Reinhart’s account rests on a later development of the coreference condition, stated as Rule I (Intrasentential Coreference). While in the 1970s and 1980s everything that was not governed by syntax proper was lumped together as “pragmatics,” in the 1990s the concept of the interface was beginning to emerge.
Rule I, discussed in chapter 4, views the coreference restriction as belonging to the context interface, where all the components required for the coreference inference are available (syntax, semantics, and context). In today’s terminology, Rule I is a procedure involving reference-set computation, namely, an optimality-type procedure comparing two competing representations. To determine whether coreference is permitted in a given derivation, another representation with a bound variable should be constructed. Coreference is permitted only if the two are not equivalent in the given context. The computation involved in coreference is thus more complex than that involved in binding, and Grodzinsky and Reinhart argue that it is the
computational complexity, rather than just the appeal to context, that explains children’s difficulties in the relevant tasks. The processing poses too big a load on their working memory, which is known to be less developed than that of adults (see extensive surveys in Gathercole and Hitch 1993; Gathercole and Baddeley 1993). Failing the execution, they may resort to guessing. Thornton and Wexler (1999) adopt Rule I, under its reformulation in Heim 1998, but they raise several arguments against the processing account, and conclude that there is no reason to assume children have any difficulty processing reference-set computation of the type required by Rule I. They maintain that the coreference delay reflects a pragmatic deficiency, and develop an analysis of the pragmatic factors underlying Rule I that children have not acquired yet. The broader question underlying this debate is the role of processing considerations in acquisition. This factor has hardly been considered in studies on the acquisition of syntactic competence. However, given that working-memory limitations are known to exist in children (as we saw in the introduction to this chapter), it would make sense to determine which effects of acquisition delays can be traced to this factor. I will survey this debate here, and argue in favor of the processing approach. Let us start with an overview of the binding conditions and Rule I. The substance of the approach I assume was given in chapter 4, but the presentation in the overview is organized around the questions that are crucial for the discussion of the acquisition of anaphora, and mainly Condition B contexts. I will also use this occasion to provide more background information on the history of the questions addressed. (As explained in chapter 4, I use the name Condition B here as an abbreviation for the A-chain condition and the reflexivity condition. But, in any case, the syntactic details of the condition are irrelevant for the formulation of Rule I.)
5.1.1 An Overview of Binding and Rule I
It is by now well established that intrasentential pronominal anaphora has two interpretations: binding and covaluation (coreference). In the first, the pronoun (originally a free variable) is bound by some operator; in the second, the pronoun picks up the same value (reference) as some other argument in the sentence. The most obvious instance of covaluation is coreference, where the value is a referential discourse entity (but other instances are discussed in Heim 1998, Reinhart 2000, and chapter 4 here). Quantified DPs cannot serve as antecedents for coreference (having no
reference), so they can normally enter only bound anaphora relations. But referential DPs allow both relations; for example, there are two anaphoric construals for (1) that can be represented as in (2).
(1) Lucie thinks she is smart.
(2) a. Lucie (λx (x thinks x is smart))
    b. Lucie (λx (x thinks she is smart)) & she = Lucie
In (2b) the pronoun corefers with Lucie. In (2a), the pronoun is bound by the λ-operator. In the framework of syntactic binding theory, the conditions on binding must be stated in terms of relations between arguments (DPs). Hence, Lucie is said to bind the pronoun in this representation. However, this means that syntactic binding must be defined as in (3) (from Reinhart 2000; see also Heim 1998).
(3) Binding
    α binds β iff α is an argument of a λ-predicate whose operator binds β.
(4) Lucie thinks she is smart and Lily does too.
Representations (2a) and (2b) are of course equivalent, in isolation. But, as was discovered in the 1970s (since Keenan 1971), certain contexts show that there is a real ambiguity here. For instance, assuming that she = Lucie, the elliptic second conjunct of (4) can mean either that Lily thinks that Lucie is smart (the “strict” reading), or that Lily thinks Lily herself is smart (the “sloppy” reading). The first is obtained if the elided predicate is construed as in (2b), and the second, if it is the predicate of (2a). The conditions under which bound-variable anaphora is possible are pretty much agreed on, and they are summarized in (5).
(5) (Variable) binding condition
    β can be construed as a variable bound by α iff
    a. α c-commands β, and
    b. β is a free variable, and
    c. In the local domain of α, β is not a pronoun. (Condition B)
Condition (5a) defines the structural configuration for binding assumed since Reinhart 1976.³ Condition (5b) does not need to be stated as a specific condition, but it is the obvious condition imposed by logic: only free variables can be bound by an operator.
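The strict/sloppy contrast in (4) can be illustrated with a toy model in which the two construals in (2) are written as ordinary functions. This is a minimal sketch: the relation THINKS_SMART and its facts are assumptions made up for the illustration, chosen so that the two readings of the elided conjunct come apart.

```python
# Toy facts (assumed for illustration): each woman considers only herself smart.
THINKS_SMART = {("Lucie", "Lucie"), ("Lily", "Lily")}

def thinks_smart(thinker, target):
    """True if `thinker` thinks `target` is smart in the toy model."""
    return (thinker, target) in THINKS_SMART

def bound_predicate(x):
    """(2a): the pronoun is a variable bound by the lambda operator."""
    return thinks_smart(x, x)

SHE = "Lucie"  # (2b): the free pronoun is covalued with Lucie

def covalued_predicate(x):
    """(2b): the pronoun is free and receives the value Lucie."""
    return thinks_smart(x, SHE)

# Applied to Lucie, the two construals cannot be told apart.
same_for_lucie = bound_predicate("Lucie") == covalued_predicate("Lucie")

# Under ellipsis, as in (4), the same predicate is applied to Lily.
sloppy = bound_predicate("Lily")     # Lily thinks Lily is smart
strict = covalued_predicate("Lily")  # Lily thinks Lucie is smart
print(same_for_lucie, sloppy, strict)
```

With these assumed facts the sloppy reading is true and the strict reading is false, even though the two predicates agree when applied to Lucie herself.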
Pronouns and anaphors are commonly viewed as variables, so these are the candidates for being bound (leaving aside here more complex instances of free variables). Condition (5c), by contrast, is a specific condition of natural language. Condition B
of the binding theory (or any of its alternatives) determines that pronouns cannot be bound in the local domain of the binder. Only anaphors can be bound in that domain. Thus, in (6a) the first two conditions of (5) are met: a free variable is c-commanded by a potential binder. Nevertheless, the interpretation (6b) is ruled out by (5c). There are various views on the formulation of Condition B, as well as the question of why it should exist in natural language, but this topic is irrelevant to the present discussion.
(6) a. Every lady praised her.
    b. *Every lady (λx (x praised x))
(7) a. Lucie praised her.
    b. *Lucie (λx (x praised x))
    c. *Lucie (λx (x praised her)) & her = Lucie
The same condition obviously rules out the binding construal with a referential DP, as in (7b) for (7a). However, this is not sufficient to rule out anaphora in (7a). In principle, a free pronoun can pick up its value anywhere, so nothing so far rules out its picking up Lucie, as in (7c). In fact, however, the sentence cannot have this covaluation reading. So we also need to define the conditions on covaluation. These conditions are more complex. On the one hand, covaluation is much freer than binding, and it does not require c-command (The woman who praised him hates Max). On the other hand, the two anaphora types still obey some shared restrictions. Specifically, covaluation also appears to obey Condition B (7c). Presently this means that Condition B has to be stated so that it restricts both binding and covaluation (coreference). How this could be done is not a trivial question. As we saw, anaphora can mean two very different things (binding and covaluation). For this reason, it is unreasonable to assume that both interpretations can be captured by one and the same coindexing mechanism, as in the classical binding theory. (A survey of the problems can be found in Reinhart 1983b, 2000.) Let us assume such problems can be solved.
(For instance, binding and covaluation are captured by different types of indices, or Condition B is stated twice, in slightly different terms, for binding and for covaluation.) Still we should note that there is another way to approach this problem, which avoids such questions. An observational generalization that emerges is that covaluation is generally free, except in the c-command domain. Recall that this is the domain that enables variable binding. In this domain, it turns out that if binding is excluded, covaluation is excluded as well. Let us state this in (8).
(8) Covaluation Condition (Temporary)
    α and β cannot be covalued if
    a. α is in a configuration to bind β (namely, α c-commands β), and
    b. α cannot bind β.
Suppose that in the processing of (7a) (Lucie praised her), we are considering assigning the pronoun the value of Lucie, namely, the covaluation of her and Lucie (7c). For that, (8) needs to be consulted. Lucie is in a configuration enabling it to bind the pronoun, but actual binding is ruled out by Condition B (5c). Hence, (8b) determines that the covaluation in (7c) is also disallowed. In their empirical coverage the two approaches to Condition B effects are precisely identical. But in the second, rather than checking a direct structural restriction on covaluation, we need to consider an altogether different question, namely whether binding is possible in our given derivation. On this view, covaluation is not directly governed by a condition of the computational system, but by an interface strategy that takes into account the options open to the computational system in generating the given derivation. At this point, this second approach may seem a weirdly indirect way to capture the given facts. But this becomes less so when we consider another set of facts. It was noted in Reinhart 1983a and 1983b that in the case of coreference, we can find systematic violations of Condition B, as in (9). (Coreference is marked by italics.)
(9) a. Despite the big fuss about Felix’s candidacy, when we counted the votes, we found out that, in fact, only Felix himself voted for him. (Reinhart 1983a)
    b. I dreamt that I was Brigitte Bardot and I kissed me. (George Lakoff, discussed in Heim 1998)
    c. You are you and she is she. Don’t lose your ego!
(10) *Oscar is depressed these days. He almost seems t to hate him. (Meaning = ‘to hate himself’)
Contextually similar examples were noted in Evans 1980 for (what became known as) Condition C.⁴ Evans argued that the reason his equivalent examples are permitted is that although the pronoun ends up coreferring with a c-commanding NP, it is not referentially dependent on that NP, but rather it picks up its value from the previous mention of this referent in the discourse. If this is the correct explanation, it is not clear why Condition B violations are not always possible. It is known that actual discourse tends to maintain referential continuity, so in a large
majority of cases, a potential antecedent has been mentioned already before. Still, an arbitrary instance of referential continuity, like (10), does not allow Condition B violation. I argued in Reinhart 1983b that the reason coreference is possible in (9) is that the coreference interpretation is clearly distinguishable from the interpretation that would be obtained by variable binding. No such distinction can be found in (10). (The semantic distinction is discussed further in Grodzinsky and Reinhart 1993, and at greater depth in Heim 1998. I will illustrate it briefly below.) If a comparison with the bound interpretation is relevant for deciding whether covaluation is possible in a Condition B environment, it is very difficult to see how this could even be stated by a purely structural condition. The covaluation condition (8), by contrast, easily enables stating this by adding the clause (11c). The resulting set of conditions (11) is Rule I (Intrasentential Coreference) of Grodzinsky and Reinhart (1993), as reformulated in Reinhart 2000 and in chapter 4.
(11) Covaluation Rule I
    α and β cannot be covalued if
    a. α is in a configuration to bind β (namely, α c-commands β), and
    b. α cannot bind β, and
    c. The covaluation interpretation is indistinguishable from what would be obtained if α binds β.
In (9a) (Only Felix voted for him), Felix is in a configuration to bind him, (11a); hence (11b) needs to be consulted for covaluation. Felix cannot bind him; hence (11c) needs to be consulted. But covaluation and binding are distinguishable here. Hence the third conjunct of (11) does not hold, so (11) does not rule out covaluation.
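The way the three clauses of (11) are consulted in turn can be sketched as a small decision procedure. This is an illustration only, not a claim about how the rule is implemented in processing: the record type Pair and its field names are hypothetical, and the values of the three fields are simply stipulated for each example rather than computed from a syntactic structure.

```python
from dataclasses import dataclass

@dataclass
class Pair:
    """Hypothetical record of what Rule I (11) needs to know about alpha and beta."""
    c_commands: bool       # (11a) holds: alpha is in a configuration to bind beta
    can_bind: bool         # (11b) fails when True: binding is available
    distinguishable: bool  # (11c) fails when True: the two readings differ in context

def covaluation_blocked(pair: Pair) -> bool:
    """Covaluation is excluded only if all three clauses of (11) hold."""
    if not pair.c_commands:          # (11a) fails: covaluation is free
        return False
    if pair.can_bind:                # (11b) fails: nothing to compare against
        return False
    return not pair.distinguishable  # (11c): the costly comparison step

# Stipulated inputs for the examples discussed in the text.
only_felix = Pair(c_commands=True, can_bind=False, distinguishable=True)    # (9a)
oscar = Pair(c_commands=True, can_bind=False, distinguishable=False)        # (10)
praised_him = Pair(c_commands=False, can_bind=False, distinguishable=False)
# praised_him: "The woman who praised him hates Max" (no c-command)

for name, pair in [("(9a)", only_felix), ("(10)", oscar), ("no c-command", praised_him)]:
    print(name, "blocked" if covaluation_blocked(pair) else "allowed")
```

Only the last step, clause (11c), requires building and comparing a second representation; the first two checks are settled by the derivation at hand.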
In the case of (9a), the distinction is clearly truth conditional: the reading obtained by covaluation (Only Felix (λx (x voted for Felix))) is true only when no one else voted for Felix, while the reading that would have been obtained by binding (Only Felix (λx (x voted for x))) may be true if many people voted for Felix, but he is the only person who voted for himself. More broadly, in all the examples of (9), applying (11c) would show that the bound-variable interpretation is distinguishable from the covaluation reading. Heim (1998) points out that defining the notion “distinguishable interpretation” is not a trivial matter, and develops the notion of guises to account for some of the contexts that she believes are not captured by logical equivalence alone. However, in many of her cases, a contrast can still be found with the bound-variable interpretation (I will return to the area where our views differ). For (9b), Grodzinsky and
Reinhart (1993) argue that this is because most likely Lakoff’s dream did not involve an act of self-kissing. Heim’s concept of guises can capture this intuition in a different way. In the identity cases like (9c), Reinhart (1983b) argued that the bound-variable interpretation (You (λx (x is x))) is a tautology, while the intended covaluation reading (You (λx (x is you))) is an empirical statement. Heim (1998) developed the concept of “structured meaning” to handle such cases. In (10), as in (9), (the trace of) Oscar c-commands him (11a) and cannot bind it (11b). But here, neither the internal semantics of the sentence nor the context provides any possible distinction between the covaluation interpretation and the binding interpretation. Since all conjuncts of (11) hold, it rules out covaluation. It is because of clause (c) of (11) that the covaluation rule cannot be a simple structural condition on coindexation outputs. Rather, it is an optimality-type condition. To compute (11c), a reference (comparison) set must be constructed. For example, for (12), the binding construal (13b) is ruled out by Condition B, which is an absolute, nonnegotiable condition. But suppose we consider assigning the free pronoun the value of Oscar, namely, (13a). For this, we need to construct the reference set (13). Although the derivation at hand does not allow the interpretation (13b), it needs to be constructed, and compared with (13a). Only if the two are distinct in the relevant context is (13a) allowed.
(12) Oscar hates him.
(13) Reference set for covaluation
    a. Oscar hates him & him = Oscar.
    b. Oscar (λx (x hates x))
There are several approaches to the question of why the computation of covaluation requires a comparison of representations in a reference set, namely, what is behind Rule I. Initially I assumed (Reinhart 1983a, 1983b) that what governs the covaluation condition is the fact that in configurations of c-command one could also opt for binding.
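The comparison required by clause (11c) can be sketched by pairing the two members of a reference set such as (13) and checking whether any scenario separates them. The sketch rests on strong simplifying assumptions introduced here: the context is modelled as the set of all possible extensions of a two-place relation over a three-person toy domain, and "only" is given a simple exhaustive meaning. None of the names below come from the text.

```python
from itertools import product

PEOPLE = ("Felix", "Oscar", "Max")  # toy domain, assumed for illustration

def all_scenarios(people):
    """Yield every possible extension of a two-place relation over `people`."""
    pairs = list(product(people, repeat=2))
    for bits in product((False, True), repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)

def plain(subject, predicate):
    """Frame of (12): the predicate simply holds of the subject."""
    return predicate(subject)

def only(subject, predicate):
    """Frame of (9a): the predicate holds of the subject and of nobody else."""
    others = (y for y in PEOPLE if y != subject)
    return predicate(subject) and not any(predicate(y) for y in others)

def reference_set(subject, frame):
    """Build the covaluation construal and the bound construal, as in (13)."""
    def covalued(rel):
        return frame(subject, lambda x: (x, subject) in rel)
    def bound(rel):
        return frame(subject, lambda x: (x, x) in rel)
    return covalued, bound

def distinguishable(subject, frame, context):
    """True if some scenario in the context gives the two construals different values."""
    covalued, bound = reference_set(subject, frame)
    return any(covalued(rel) != bound(rel) for rel in context)

context = list(all_scenarios(PEOPLE))
print(distinguishable("Oscar", plain, context))  # pattern of (12)/(13)
print(distinguishable("Felix", only, context))   # pattern of (9a)
```

For the plain frame of (12) no scenario separates the two construals, so covaluation is excluded; under "only" a scenario in which Max also voted for Felix makes the covaluation reading false and the bound reading true, which is the truth-conditional contrast noted for (9a).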
A predecessor of this view was Dowty (1980), who proposed that the underlying principle was “avoid ambiguity.” The scope of his proposal was only instances of (what became) Condition B: for (12), he observed that replacing the pronoun with the reflexive anaphor would yield an unambiguous anaphora interpretation, while the choice of a pronoun allows both an anaphoric and a nonanaphoric interpretation. I argued that this is a more general phenomenon (found also when opting for variable binding still leaves the derivation ambiguous between the anaphoric and nonanaphoric construals). Assuming that binding is in general a more explicit
way to express anaphora than covaluation, avoiding it for no interpretative reason suggests noncoreference. This is also the approach taken by Grodzinsky and Reinhart 1993, stated there in more general terms of economy. Several more sophisticated lines of analysis attempt to explain Rule I in terms of “least-effort” economy, most notably Fox 1998 and Reuland 2001.⁵ However, as we saw in section 4.3, this approach has always had a persistent problem, acknowledged already (with no solution) in Grodzinsky and Reinhart 1993: it could not explain why both binding and coreference are possible when binding is permitted. I argued in that section that the appropriate conclusion is that these potential economy considerations do not, in fact, play a visible role in anaphora resolution, and there is no obligatory preference for variable binding over coreference or covaluation, when both are allowed by the CS. Thus, it may still turn out that such considerations determine a preference for binding over coreference as the first interpretation in neutral discourse, but revising this preference is not costly. I proposed instead that Rule I is an instance of the broader economy requirement “minimize interpretive options.” An interpretative option ruled out by the computational system should not be sneaked back in arbitrarily by procedures available at the discourse level. In our case, if binding is ruled out (that is, if the set of interpretative options of the pronoun is restricted by the CS), Rule I determines that one cannot obtain precisely the same interpretation by using the discourse option of covaluation. One may wonder, then, why Rule I applies just in the case of Condition B environments. The answer is that it does not. In fact, Rule I is a general restriction on covaluation. The other environment where covaluation is not allowed is when the pronoun c-commands the antecedent. This was originally perceived as a structural condition on coreference.
The condition has essentially remained unchanged since its first formulation in Reinhart 1976 (14a), and it is presently known as Condition C (14b).
(14) a. “A given NP cannot be interpreted as coreferential with a nonpronoun in its c-command domain” (Reinhart 1976).
    b. Condition C (Chomsky 1981)
       An R-expression is free.
       Definitions
       i. An NP is bound iff it is coindexed with a c-commanding NP.
       ii. An NP is free iff it is not bound.
       iii. An R-expression is any DP that is not a free pronoun or an anaphor.
The term pronoun in (14a) was defined to include also reflexive pronouns, so a nonpronoun is neither a pronoun nor a reflexive. If coreference is to be captured by coindexation, (14a) determines that a nonpronoun cannot be coindexed with a c-commanding NP. Condition (14b) captures the same generalization, by use of the definitions (i)–(iii). Free is defined as not coindexed (i.e., not coreferential) with a c-commanding NP. An R-expression is a nonpronoun (or anaphor). The term has nothing to do with reference: bound variables, like wh-traces, are also defined as R-expressions. The two formulations in (14) thus express the same condition, based on the view that both binding and coreference (or more broadly covaluation) are guided by a structural rule. Let us assume, for the moment, that Condition C, just like Condition B, is a condition on (variable) binding, namely, that it should be added to the binding conditions in (5). This means that binding is impossible in (15a), that is, that (15b) is ruled out. But just as in the case of Condition B (exemplified in (7)), this is not sufficient to rule out the covaluation construal in (15c).
(15) a. She thinks that Lucie is smart.
    b. *Lucie (λx (x thinks that x is smart))
    c. *She (λx (x thinks that Lucie is smart)) & she = Lucie
However, once Rule I in (11) is assumed, nothing needs be added to it to rule out (15c). In (15a), she c-commands Lucie. Hence, if we are considering covaluation of the two, (11b) needs to be consulted. By Condition C, she cannot bind Lucie in (15b). The fate of (15c) now depends on clause (c) of Rule I. In the given context, the covaluation reading (15c) is equivalent to (i.e., indistinguishable from) the bound reading (15b). Hence (15c) is ruled out. In (16), by contrast, the two readings are truth-conditionally distinct. (In (16b) considering oneself smart is said to hold only of Lucie; in (16c) considering Lucie smart holds only of Lucie.) Hence Rule I allows covaluation here.
(16) a. Only she thinks that Lucie is smart.
    b. Only Lucie (λx (x thinks that x is smart))
    c. Only she (λx (x thinks that Lucie is smart)) & she = Lucie
I will return to other instances where Rule I permits coreference in apparent violation of Condition C. In fact, the evidence for clause (c) of Rule I is much stronger in the case of Condition C than with Condition B. (See note 4 for a potential reason.) All of Evans’s (1980) original examples were with apparent Condition C violations, and historically my major
motivation in Reinhart 1983b was capturing the interpretative and conceptual problems posed by Condition C. So far we have assumed that, similarly to Condition B, Condition C is still needed as a structural condition on binding, independently of covaluation. Let us now turn to the question whether this is indeed so. To check this, let us incorporate Condition C into the binding condition we assumed before in (5), as clause (d) of (17).
(17) (Variable) binding condition (with Condition C)
    β can be construed as a variable bound by α iff
    a. α c-commands β, and
    b. β is a free variable, and
    c. In the local domain of α, β is not a pronoun (Condition B), and
    d. β is not an R-expression. (Condition C)
Recall that the definition of R-expression covers anything that is not a free pronoun or anaphor, that is, anything that cannot be construed as a free variable. (Wh-traces are R-expressions.) Thus, (17d) just repeats (17b). Clause (b) is, of course, crucial for binding, but as I have mentioned, it is just a prerequisite of logic that does not have to be stated as a specific linguistic condition. (One cannot imagine that natural language would be usable at the interface, if it had a concept of variable binding undefined in logic.) Our original binding condition (5) is thus sufficient to capture the restrictions on variable binding under consideration. (As in the standard theory, weak crossover is not captured by what has been stated here; see note 3.) Condition C is thus superfluous as far as binding goes. However, since the linguistics community seems attached to Condition C, we may leave open here the question whether clause (d) of (17) is needed independently of clause (b), in other words, whether Condition C exists. Either way, Condition C, like Condition B, only restricts variable binding, and the crucial point here is that covaluation (coreference) in a Condition C environment is governed by the covaluation Rule I.
Let us examine another example of how clause (b) of (17), or Condition C, interacts with Rule I in classical cases where it is assumed that Condition C is at work.
(18) a. Who does she think t is smart?
    b. Who (λz (she think z is smart))
    c. *Who (λz (z think z is smart))
In the strong-crossover structure (18b), the pronoun she c-commands the trace z. But by (17b) (namely, (5b)), it cannot bind it, because the trace is a
bound, rather than a free variable. Who can still bind the free pronoun she. In this case we would obtain (18c), where the pronoun and the trace are covalued: both get the value z. (That she-z does not bind the trace-z can be verified with definition (3).) However, since she both c-commands the trace z, and cannot bind it, Rule I in (11) determines that this covaluation is ruled out. Condition C effects in quantification contexts (*Sheᵢ thinks that every ladyᵢ is smart) are precisely analogous, assuming that the quantified DP undergoes QR.
Turning now to the acquisition of Rule I, as established since Chien and Wexler 1990, there is a sharp contrast between children’s performance on bound anaphora and on coreference. Specifically, children rule out variable-binding construals in Condition B contexts at a rate of 80 to 90 percent, but they perform at around 50 percent on ruling out coreference in these contexts. Grodzinsky and Reinhart assume that both the binding conditions and Rule I (or the broader strategy behind it) are innate, and fully available to the child: there is no deficiency of information, or any factor that needs to be acquired. The child also masters innately the basic laws of logic, and has the tools to compute logical equivalence, as required by clause (c) of Rule I. But the difference in children’s performance on binding and coreference follows from the different types of computation involved in resolving these two types of anaphora. Computing clause (c) of Rule I requires constructing, holding, and carrying out a semantic comparison of a reference set with two representations, and Grodzinsky and Reinhart argue that the amount of processing required by this step exceeds children’s working-memory capacity. As noted in the introduction to this chapter, it is established in psychology that children’s working memory is not yet fully developed.
(As mentioned, an extensive survey of the literature and findings in this area can be found in Gathercole and Hitch 1993 and Gathercole and Baddeley 1993.) Given that factor, one may assume that children know precisely what they have to calculate in order to answer the questions in the experiments, but they fail to execute the required procedure. I will provide more details on how this works in section 5.1.2.1.
Grodzinsky and Reinhart argue that the crucial indication for working-memory failure is in the statistics of children’s performance. What the repeated experiments on coreference in Condition B environments confirmed is that in the relevant experimental setting, the results are at chance level (approximately 50 percent of adultlike performance). As mentioned, Chien and Wexler (1990) showed that, in these circumstances, chance performance is also found in individual children (conflicting
answers on the same condition), which indicates a guess pattern. The same results were confirmed in Thornton and Wexler 1999, who showed that (for most children in their experiments) the analysis of the individual results corresponds to the binomial-model probability of arbitrary choices between two options. I will turn to their statistical findings and their significance in section 5.1.4, where I also discuss the experimental conditions under which chance performance is to be expected.
As mentioned, 50 percent performance consistent with a guess pattern is not commonly found in acquisition. If children do not know a given rule, one may still expect a uniform performance pattern of individual children. But if the source of the difficulty is a processing failure, these results are explained: to resort to a guess, the children have to know that they are missing something. (Otherwise they would operate uniformly based on their assumptions about what the relevant rule is, or, in case of default strategies, according to the default.) This condition is met here because the children know innately that they have to execute the comparison required by clause (c) of Rule I. Since they get stuck in the execution, and there is pressure to answer either yes or no, one of these is chosen arbitrarily.
5.1.2 Thornton and Wexler’s Arguments against the Processing Account
Thornton and Wexler (1999) argue against the processing account. They assume a version of Rule I that follows its reformulation in Heim 1998. As mentioned, on Heim’s view, there are some contexts where what enables a coreference reading is that the two NPs pick up the shared referent under distinct guises. In all other contexts, Heim’s analysis is the same as outlined above.
But Thornton and Wexler extend her analysis to all contexts and argue that it is not the need to construct and evaluate a comparison set that hinders children’s performance on coreference, but rather a pragmatic deficiency in identifying the use of guises. Thus, children’s difficulties do not reflect a processing limitation, but problems with contextual orientation that develops with age.
Let us first examine the main arguments of Thornton and Wexler against the processing account of Grodzinsky and Reinhart (1993). One argument is conceptual. They say that “on Grodzinsky and Reinhart’s account, the processing bottleneck that children encounter is ‘of the sort known to diminish with age’ (1993, 91). Thus, they do not share the assumption that children have access to a universal parser (see Crain and Wexler 1999; Crain and Thornton 1998). Rather, the child’s processing system has different properties from adults’, and Rule I remains problematic until this system matures” (Thornton and Wexler 1999, 47).
I definitely share the theoretical assumption of a universal parser, in the references cited by Thornton and Wexler. But Grodzinsky and Reinhart’s point of departure is precisely that the children’s parser, being innate, is identical to that of adults. They argue that “there is no known reason to assume that any of the steps [of Rule I] requires knowledge that surpasses children’s innate endowment. . . . But the execution of all these steps, in the specific case of structures like [Oscar touched him] puts a much heavier burden on working memory than do other rules (e.g., the binding conditions). . . . If this is so, then presented with [such sentences], children know exactly what they are required to do by Rule I, but getting stuck in the execution process, they give up and guess” (Grodzinsky and Reinhart 1993, 88). The difference between children and adults, in this case, is only in the size (or efficiency) of their working memory. It is commonplace wisdom that precisely one and the same parser (software), applying in two hardware systems differing only in the size of their memory, may fail at some tasks in one system, but not in the other. A difference in memory space cannot be described as a different parser, nor precisely as a different processing system. Rather, acknowledging commonplace wisdom in psychology, that children’s working memory is smaller than adults’ (or their use of its resources is not yet fully developed), enables us to explain how the same innate computational system and parser can still fail in children’s processing.
Conceptual issues aside, Thornton and Wexler raise two arguments against Grodzinsky and Reinhart’s analysis, which they summarize as follows:
There are two main problems with Grodzinsky and Reinhart’s account. . . . First, there is little or no evidence to support the proposal that some sentences containing pronouns (e.g., Mama Bear is washing her) cause a processing overload whereas others (e.g., Mama Bear is washing her face . . .)
do not. Second, there are reliable experimental findings showing that whereas children misinterpret pronouns in principle B structures, they do not have difficulty with parallel principle C structure (i.e., Mama Bear is washing her vs. She is washing Mama Bear). On the Rule I account, both should be equally difficult to process. (p. 52)
Let us examine each of these arguments.
5.1.2.1 Processing Load
The empirical prediction of Grodzinsky and Reinhart is that all tasks that require the processing of clause (c) of Rule I (as stated here) would lead to chance performance of children, which they take as evidence for a heavy processing load. In the present formulation, clause (c) is the step that requires a semantic reference-set computation. Let us see, first, how this works. Rule I is repeated below.
(11) Covaluation Rule I
α and β cannot be covalued if
a. α is in a configuration to bind β (namely, α c-commands β), and
b. α cannot bind β, and
c. the covaluation interpretation is indistinguishable from what would be obtained if α binds β.
Suppose the child is considering coreference assignment in a given derivation. This means Rule I must be consulted. If either clause (a) or clause (b) of (11) does not hold, the assessment ends here, with nothing complex about it.
(19) a. Max’s mother loves him (& him = Max).
b. The woman next to Max praised him (& him = Max).
(20) a. Mama Bear is washing her face (& her = Mama Bear).
b. Mama Bear is washing herself (& herself = Mama Bear).
In (19), clause (a) of Rule I (11a) does not hold: neither of the candidates for covaluation c-commands the other. Hence, clause (b) (11b) need not even be consulted, and the covaluation goes through. In (20a), clause (a) holds, so clause (b) must be checked. However, under the present formulation of Rule I, clause (b) does not hold since binding Conditions B and C do not rule out the binding of the anaphoric element. So assessment ends here, and coreference is permitted. There is no evidence that the need to check clause (b) of (11) poses any processing difficulties to children. But if both clauses (a) and (b) hold, assessment must go through clause (c), which is the costly step. These are the cases of coreference in apparent violation of Conditions B (21) and C ((22)–(23)).
(21) *Mama Bear is washing her (& her = Mama Bear).
(22) *She is washing Mama Bear (& she = Mama Bear).
(23) Only she is washing Mama Bear (& she = Mama Bear).
In these cases, a comparison representation must be constructed and compared to the intended coreference representation. In terms of processing it does not matter whether the final verdict of Rule I is “allow,” as in (23), or “disallow,” as in (21)–(22). In both cases, the decision requires a complex computation.
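The order of assessment just described can be sketched as a short procedure. This is only an illustrative sketch in Python, not part of Grodzinsky and Reinhart's formulation: the class `Candidate`, its field names, and the function `rule_i` are hypothetical, and the outcome of the clause (c) comparison is simply supplied as a flag rather than computed.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate covaluation in one derivation (hypothetical encoding)."""
    sentence: str
    c_command: bool         # clause (a): one candidate c-commands the other
    binding_blocked: bool   # clause (b): binding is ruled out (Condition B/C)
    distinct: bool = False  # clause (c): covaluation differs from binding in context

def rule_i(c):
    """Return (covaluation_allowed, reached_clause_c)."""
    if not c.c_command:
        return True, False       # clause (a) fails: assessment ends here
    if not c.binding_blocked:
        return True, False       # clause (b) fails: assessment ends here
    # Clauses (a) and (b) both hold: the reference-set comparison is needed.
    return c.distinct, True

examples = {
    "19a": Candidate("Max's mother loves him", False, False),
    "20a": Candidate("Mama Bear is washing her face", True, False),
    "21": Candidate("Mama Bear is washing her", True, True, distinct=False),
    "22": Candidate("She is washing Mama Bear", True, True, distinct=False),
    "23": Candidate("Only she is washing Mama Bear", True, True, distinct=True),
}

for label, cand in examples.items():
    allowed, costly = rule_i(cand)
    print(label, "allowed" if allowed else "disallowed", "| clause (c) reached:", costly)
```

On this sketch, (19) and (20) exit at the first or second check, while (21)–(23) all reach the costly step regardless of the final verdict.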
A question that arises is what it is precisely about step (c) of (11) that exceeds the processing ability of children. In fact, two procedures take place in applying this clause. It is easiest to spell them out from the
perspective of the comprehension side of the parser. First, in order to determine whether coreference is distinct from binding, the binding representation needs to be constructed. This representation is not available at the input derivation (which is associated with the phonological input the parser receives), since the input derivation does not allow binding. So the parser has to construct an alternative representation with variable binding. (The details of the procedure of constructing the alternative derivation are discussed in chapter 4.) The next step is semantic computation: the two representations need to be compared against the context; only if they are distinct is coreference allowed. The second procedure seems similar in nature to that involved in semantic disambiguation, where two representations need to be compared in order to select the one appropriate to the context. Semantic disambiguation itself already poses a processing load, because it requires holding two (or more) representations in working memory, an issue I turn to in section 5.2.5. It is known that children (like adults) tend to develop defaults to bypass the parsing of semantic disambiguation, which provides some evidence for the greater processing load posed by this task (see, e.g., Crain, Ni, and Conway 1994). So one may ask which of the parsing procedures involved in reference-set computation surpasses the capacity of children’s working memory. Grodzinsky and Reinhart assumed that the disambiguation task is already beyond children’s ability, and cited children’s performance on lexical disambiguation as evidence. (Faced with a lexically ambiguous word, children select the reading that is statistically more frequent, rather than comparing the competing readings against the context.) 
But in Reinhart 1999b, I suggested that the conclusion of Grodzinsky and Reinhart was mistaken, and it is only the full complex involved in reference-set computation that leads to a processing crash of the child’s parser. (As pointed out in Thornton and Wexler 1999, in the case of lexical disambiguation, alternative analyses of children’s performance are available.) More empirical work is needed on children’s performance on semantic disambiguation, but the hypothesis put forth here is that we expect a processing failure only when the computation requires also the first step of constructing a derivation not available at the parser’s input. The theoretical expectation, then, is that if other areas of language are found that require reference-set computation, with properties similar to clause (c) of Rule I, under the appropriate experimental setting we should find processing failure of children in these areas as well.
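The hypothesis just stated can be illustrated with a toy simulation. The following Python sketch is not a model proposed in the literature discussed here: the load units, the capacity values, and the names `LOAD`, `respond`, and `guess_pmf` are all assumptions made for illustration. It only shows how a response procedure that guesses whenever the required steps exceed available memory yields roughly 50 percent performance, while the comparison step on its own does not.

```python
import random
from math import comb

# Hypothetical load units per step; the text gives no numeric estimates.
LOAD = {"construct_alternative": 2, "compare_in_context": 2}

def respond(adult_answer, steps, capacity, rng):
    """Give the adultlike answer if the steps fit in memory; otherwise guess."""
    load = sum(LOAD[step] for step in steps)
    if load <= capacity:
        return adult_answer
    return rng.choice([True, False])  # stuck in execution: arbitrary yes/no

def guess_pmf(k, n):
    """Binomial probability of exactly k 'yes' answers in n arbitrary choices."""
    return comb(n, k) / 2 ** n

rng = random.Random(0)  # fixed seed so the run is repeatable
clause_c = ["construct_alternative", "compare_in_context"]

# Assumed capacities: 3 units for the child, 4 for the adult.
child = [respond(False, clause_c, 3, rng) for _ in range(10_000)]
adult = [respond(False, clause_c, 4, rng) for _ in range(10_000)]

print("child acceptance rate:", sum(child) / len(child))
print("adult acceptance rate:", sum(adult) / len(adult))
print("P(2 of 4 'yes' under guessing):", guess_pmf(2, 4))
```

With these assumed values, the simulated adult always rejects, the simulated child accepts about half the time, and a task requiring only the comparison step stays within the child's assumed capacity.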
As we will see, Grodzinsky and Reinhart’s prediction that for all anaphora tasks involving step (c) of Rule I, it should be possible to find instances of chance performance, has been empirically confirmed, including contexts similar to (23). However, Thornton and Wexler offer an alternative account for children’s difficulties in all these cases (except (23)). Hence it is appropriate for them to raise the question what independent evidence exists that the source of difficulty here is indeed the processing load. In principle, it should be possible to test directly the processing load in Rule I tasks by standard measurement of processing time, or more sophisticated eye-tracking experiments. To my knowledge, this has not been done. Another type of possible independent evidence is if the same chance performance is found also in areas other than anaphora, where, on the one hand, reference-set computation has been established, and, on the other, Thornton and Wexler’s pragmatic analysis cannot apply. In section 5.2 I will examine the evidence for this in the area of stress-shift for focus. In section 5.3 we will observe the same in the area of scalar implicatures, which also involve semantic reference-set computation.6
5.1.2.2 Condition C
The second major argument of Thornton and Wexler concerns Condition C. The theoretical stand underlying their argument is that Condition C must be assumed as an independent syntactic condition. (This follows Heim (1998), who modified Rule I to apply only in Condition B environments, leaving the covaluation problems with Condition C for future research.) However, as we saw in section 5.1.1, whether Condition C is needed for binding or not is independent of the question under consideration, of children’s performance on the coreference aspects of Condition C. Be that as it may, it seems to me that Thornton and Wexler should expect exactly the same behavior in Condition C environments as Grodzinsky and Reinhart do.
This is because they state that, as in Chien and Wexler 1990, they continue to assume that the pragmatic generalization governs both the coreference aspects of Condition B and of Condition C (p. 31), and they provide an alternative account of why children’s performance on Condition C coreference appears better than that on Condition B (see below). This is not surprising, since both Grodzinsky and Reinhart and Chien and Wexler 1990, further developed in Thornton and Wexler, share the assumptions of Reinhart 1983b that variable binding and coreference are governed by different types of rules. Chien and Wexler’s study was pioneering in establishing that, correspondingly, children perform well on binding tasks (in both Condition B
and C environments), but perform at around 50 percent in coreference tasks. So, under both analyses one may expect the same with the coreference aspects of Condition C. Nevertheless, Thornton and Wexler also argue that given that Condition C is innate, children should not have problems with its coreference aspects, and use this as an argument against Grodzinsky and Reinhart’s analysis. Let us follow this argument.
With the exception of Grimshaw and Rosen (1990), who found near-chance performance on sentences like their (24a),7 most studies found that children rule out coreference disallowed by Condition C at a much higher rate than they do in their performance on Condition B.
(24) a. *He said that Bert touched the box (he = Bert).
b. Because he heard a lion, Tommy ran fast (he = Tommy).
However, Grodzinsky and Reinhart argued that the apparent improved performance on Condition C might reflect an independent factor: in a right-branching language, the most frequent instances of Condition C violations also involve backward anaphora. With the exception of Crain and McKee (1986), studies that found a high rate of rejection of anaphora in (24a) also found that children reject backward anaphora in structures like (24b), where it is permitted by Condition C. These studies attribute both results to an independent directionality factor, and conclude that children reject backward anaphora regardless of Condition C. (To mention just a few: Tavakolian 1977; Solan 1978; Lust and Clifford 1982.) In their chapter 2, Thornton and Wexler argue in reply that the findings regarding directionality effects are just a product of the experimental setting, specifically, of using act-out or elicited imitation tasks. Thus, they conclude that there is no evidence for children’s independent difficulties with backward anaphora, which means that the reason they reject anaphora in structures like (24a) can only be adherence to Condition C (p. 49).
Interestingly, however, in chapter 3, Thornton and Wexler encounter a directionality problem for their own analysis. Their account for children’s performance on Condition B is that they have not mastered the use of guises yet. Thus, they may obtain “local coreference” under two different guises in (25a), where the adult conditions for obtaining distinct guises are not met. One would expect, then, that children should also allow local coreference in the Condition C environment (25b), by precisely the same procedure of assigning different guises to the relevant entity. Still, in Thornton and Wexler’s experiments, children rejected local coreference in sentences like (25b) at a high rate of 92 percent, while they
rejected coreference in sentences like (25a) only at a rate of 57 percent, with mostly chance performance.
(25) a. Mama Bear washed her.
b. She washed Mama Bear.
(Thornton and Wexler 1999, 106, (25))
Thornton and Wexler explain that “the crucial difference between sentences subject to principle B and those subject to principle C is the obvious one: in the former, the pronoun is in object position, and in the latter, the pronoun is in subject position” (p. 106); that is, the crucial factor is directionality. They proceed to offer two reasons why when the pronoun precedes the potential antecedent (as also in (24a)), anaphora computation will be blocked independently of the guises options. One is in terms of processing: pronouns are assigned a reference as soon as they are encountered. If she in (25b) has been assigned the reference of Mama Bear (from the previous discourse), then deciding whether it could corefer with the next occurrence of Mama Bear, under a different guise, would require backtracking this step and starting a new guise-computation. It is this backtracking which is difficult for children, or as Thornton and Wexler (1999, 107) conclude, “obviously, an online incremental parser would find this amount of computation burdensome.”8 Presumably, then, children do not even consider the coreference option in such contexts.
Whether this is the precise formulation of the directionality factor or not, it confirms Grodzinsky and Reinhart’s conclusion that with backward anaphora there is an independent factor that disables the application of Rule I for children. Possibly the factor is not just any directionality, as they assumed, following previous studies, but only directionality involving the subject, as proposed by Thornton and Wexler. In any case, it is this directionality factor that explains why children reject coreference in the common Condition C contexts, independently of Rule I.
As it turns out in their chapter 3, also in Thornton and Wexler’s analysis, children’s improved performance on Condition C tasks does not provide any evidence for their mastery of the computation of the coreference aspects of Condition C. In fact, they assume, like Grodzinsky and Reinhart, that children bypass Rule I, or the guises computation, in these environments.
Grodzinsky and Reinhart (1993) suggest that to circumvent the directionality factor, children’s performance on the coreference aspects of Condition C should be checked in the few instances where this factor is absent, most notably in reconstruction contexts such as (26).
(26) *Near Ann, she saw a lion (she = Ann).
Under the reconstruction analysis, Condition C (and hence, Rule I) blocks coreference here because, once reconstructed to its original position, the PP is c-commanded by the pronoun. On the other hand, there is no directionality effect here, since during processing the pronoun follows the antecedent. Indeed, in such environments, experiments have reached a clear consensus: children perform poorly, with exact figures varying according to the experimental method (e.g., Ingram and Shaw 1981; Taylor-Browne 1983; Lust, Loveland, and Kornet 1980). Thornton and Wexler dismiss these findings as well (chapter 2), based on the claim that one of them (Taylor-Browne 1983) has used what they consider deficient methodology. (They ignore in this chapter the question how come children all of a sudden master guises computation in these contexts.) They argue further that a more recent study, Chierchia and Guasti 2000, has proved unequivocally children’s mastery of Condition C in reconstruction contexts. In fact, however, Chierchia and Guasti’s study focused on bound variable anaphora, such as (the Italian version of) (27).
(27) *In the barrel of every pirate_i, he_i carefully put a gun.
Indeed, children rejected anaphora in such sentences 90 percent of the time. But this is precisely the expected result for both Chien and Wexler (1990) and Grodzinsky and Reinhart. Rule I is not involved in the processing of (27). Bound variable anaphora is governed directly by the binding conditions, whether the relevant condition here is Condition C, or clause (b) of (17) (the logical requirement that only free variables can be bound). The crucial assumption of Grodzinsky and Reinhart (as of Thornton and Wexler) is that children should face no problem in the processing of variable binding.
In Grodzinsky and Reinhart’s terms this is because a heavy computational load is only involved in coreference tasks, where Rule I needs to be consulted to determine whether coreference is still permitted although binding is ruled out. Chierchia and Guasti, in fact, emphasize this point in that same paper, stating explicitly that they did not study coreference in these structures, but their theoretical expectation is that in coreference tasks, the same difference would be found between children’s performance on bound-variable (quantified) anaphora and on coreference, as found in Condition B environments.
It remains the case that to properly check children’s performance on Rule I in Condition C environments, one should abstract away from possible directionality factors. An unexpected further confirmation that if this is controlled, children perform at chance level, comes from Thornton
and Wexler’s own experiments on coreference in VP-ellipsis, with sentences like (28).
(28) The kiwi bird cleaned Flash Gordon and he did too.
In the story context, the kiwi bird and Flash Gordon fell in the mud. Flash Gordon asked a third participant to help clean him, but that one refused. The kiwi bird helped clean Flash Gordon, but mostly, Flash Gordon had to clean himself on his own. Children accepted (28) as true in this context 54 percent of the time. In other words, they allowed the pronoun he to corefer with Flash Gordon at chance level.
Let us first examine the type of computation required to determine whether coreference is permitted here. At the interpretation stage, the λ-predicate formed in the first conjunct (29a) is present also in the second conjunct (29b) (whether copied from the first, or just deleted at PF, but present at LF).
(29) a. The kiwi bird (λx (x cleaned Flash Gordon)) and
b. He did (λx (x cleaned Flash Gordon)) too (& he = Flash Gordon).
Now, we are considering assigning the pronoun the value of Flash Gordon, which would result in a covaluation configuration in clause (29b). The first clause of Rule I (11) holds: the pronoun c-commands Flash Gordon in (29b), hence Rule I must be checked further. Clause (b) holds as well: the pronoun cannot bind Flash Gordon, by Condition C, or its equivalent logical prohibition (clause (b) of (17)). So clause (c) of Rule I must be applied, namely the representation (30) should be constructed and compared with (29b).
(30) He did (λx (x cleaned x)) too (& he = Flash Gordon).
The coreference construal (29b) is permitted only if (29b) is distinguishable from (30) in the context of (29a). The fact of the matter is that it is. The parallelism requirement is that the predicates in the two conjuncts are identical (under the relevant definition). The predicate in (30) does not occur in (29a). The only candidate for parallelism is the predicate as construed in (29b).
In more intuitive terms, the property shared by the two events is that of cleaning Flash Gordon, not of cleaning oneself. The type of meaning distinctness this example shows is similar to that in (31), observed by Evans (1980), and discussed in Reinhart 1983b, Grodzinsky and Reinhart 1993, and Heim 1998. Heim 1998 labeled such contexts “structured meaning” contexts.
(31) I know what Ann and Bill have in common: she thinks that Bill is terrific, and he thinks that Bill is terrific. (Adapted from Evans 1980, (49))
(32) a. She (λx (x thinks that Bill is terrific)) and
b. He (λx (x thinks that Bill is terrific)) (& he = Bill).
(33) He (λx (x thinks that x is terrific)) (& he = Bill).
The last conjunct in (31) violates Condition C. Nevertheless, the coreference interpretation (32b) is permitted. In this case, parallelism is not imposed by ellipsis, but by the content of the preceding context, which requires identifying a shared property of Ann and Bill. Although the proposition (32b) is equivalent to (33), the properties attributed to their subjects are not identical (they denote different sets). It is only the property in (32b) that is indeed shared by (32a, b), or by Bill and Ann. This suffices for Rule I to allow the coreference construal in (32b). Typical of parallelism configurations like both (28) and (31) is that the shared material must be destressed (in (31)) or fully suppressed (in (28)), which entails that in both, the subject pronoun is stressed.
By this computation, then, (28) comes out as an instance of coreference ruled in by Rule I.9 This contrasts with some claims in the theoretical literature on coreference that (28) is ruled out for adults (e.g., Fiengo and May 1994). But that it is indeed the correct verdict is witnessed by the results in the adults’ control experiment of Thornton and Wexler, where they accepted (28) at a rate of 83 percent. The less than 100 percent acceptance rate here may be typical of Rule I computations, which require more effort also from adults, but it is still in sharp contrast to their full rejection of coreference in Condition C environments ruled out by Rule I, such as He cleaned Flash Gordon, with no ellipsis context. Recall that unlike the adults, children performed here at chance level, allowing coreference at 54 percent.
For Grodzinsky and Reinhart, this is the expected result whenever clause (c) of Rule I needs to be processed. Whether coreference is ruled in or ruled out by Rule I cannot be relevant, since children cannot complete the computation anyway. The question then is why this expectation is confirmed only in the ellipsis context (28), and not in the experiments with the same sentences as matrix, as in (25b) (She washed Mama Bear) or He cleaned Flash Gordon. Thornton and Wexler provide an answer: in the VP-ellipsis context, the online processing factor, which blocks even considering coreference in the matrix cases, does not play a role, because the full predicate attributed to the subject pronoun in the elliptic conjunct is available to the child from
the previous conjunct. So when the reference of the pronoun is decided, the full proposition is available for computation (p. 129). Thus, the offensive backtracking is not required, or, in our terms, the directionality factor is avoided.
This, then, is a novel direct confirmation of Grodzinsky and Reinhart’s prediction: in contexts where the directionality factor is neutralized, children’s performance on Condition C aspects of Rule I should be at chance level. Nevertheless, Thornton and Wexler present this experimental finding as a major and decisive argument against Grodzinsky and Reinhart’s analysis, in two chapters of their book: “On Grodzinsky and Reinhart’s view,” they argue, “the proposed asymmetry between matrix and VP-ellipsis structures is not to be expected. For Grodzinsky and Reinhart, children should respond at chance levels to matrix sentences governed by Principle C because Rule I requires two representations to be compared” (p. 129). “Thus, Grodzinsky and Reinhart’s account cannot be correct in its present form” (p. 201).
This same experimental finding also sheds some light on another argument of Thornton and Wexler against Grodzinsky and Reinhart’s analysis. Under Grodzinsky and Reinhart’s account, the crucial factor leading to chance performance is that children are unable to carry out the computation required by Rule I, because of their underdeveloped working memory. This means that what the actual verdict of Rule I is (for adults) cannot have an effect on children’s performance. Whether Rule I permits coreference or not, they will not be able to complete the computation to decide this. In Thornton and Wexler’s account, by contrast, the explanation for children’s delay rests on their extending adults’ conditions for the creation of guises: “Children create guises in a superset of the contexts in which adults do so” (p. 102, (18)).
It follows from this principle, that children should allow coreference under distinct guises wherever adults do, though they extend this also to areas where adults do not. Thus, when Rule I permits coreference in Conditions B and C environments, Thornton and Wexler predict that children should perform like adults. Overlooking the relevance of their own experiment on (28)–(29), where contrary to their prediction, children performed at chance level in a Condition C environment permitted by Rule I, Thornton and Wexler explain that these environments have not been studied experimentally yet, but they expect that these are the results that would be found, once the experiments are done, and argue that such findings would be a further argument against Grodzinsky and Reinhart’s analysis that predicts the opposite (p. 48).
5.1.3 Questions of Learnability
In the absence of any actual argument against a processing account, the two accounts for coreference delay in acquisition appear, so far, equivalent in their empirical coverage, with the exception of one area: derivations ruled in by Rule I. Assuming that Thornton and Wexler’s analysis can be modified to handle such cases, this is an interesting situation, where two different accounts appear equally possible for the same phenomenon. This is particularly interesting since the two accounts are based on essentially the same view of the linguistic background, namely, on the division of labor between binding theory and the coreference restrictions. Both follow the view in Reinhart 1983a that binding theory (under its various formulations) restricts only variable binding, and its operations are of the familiar type of output conditions of the computational system. Coreference (or covaluation), on the other hand, is governed by a different type of procedure, which is based on context-dependent inference. In principle, it is just as reasonable to assume that what causes coreference delay in acquisition is the type of computation involved, executing which requires larger working memory than children have (Grodzinsky and Reinhart), or that it is some deficiency of the relevant context-dependent factors, which children have not mastered yet (Thornton and Wexler). Our next question, in this and the next subsection, is whether it is possible, nevertheless, to decide between these two possible accounts.
Note, first, that for the processing account, no question of learnability arises. Grodzinsky and Reinhart assume that children know everything innately that is required for coreference computation. So as soon as their working memory matures, they will be able to execute it. Thornton and Wexler’s account is based on some deficiency in knowledge, which has to be acquired. So the question how it is acquired is relevant.
To assess how Thornton and Wexler answer this question, more details of their analysis of guises are needed. What Thornton and Wexler find particularly attractive in Heim’s (1998) reformulation of Rule I is that in some areas it enables coreference computation to depend only on the identification of guises, without applying Rule I, namely, with no comparison of representations. The clearest example is what Heim labels “identity debate contexts,” as in her (34).
(34) Speaker A: Is this speaker Zelda?
     Speaker B: How can you doubt it? She praises her to the sky. No competing candidates would do that.
Chapter 5
In such contexts, it can be argued that her refers to the person that A and B identify as Zelda, while she refers to the person who is the speaker (say, on stage), and whom B does not manage to identify. The same entity (Zelda) is then presented here under two guises. Heim argues that in this case, a comparison of a coreference representation with the bound variable representation must identify them as logically equivalent. So Rule I as stated would wrongly rule coreference out. She proposes that, in fact, when entities are represented under different guises, their relation does not count as coreference (roughly, since pronouns have guises as their denotation, and not individuals; hence they do not have the same denotation here). Thus, they are not subject to Rule I at all.
Though this is not crucial for the present discussion, I should mention that I do not share Heim’s intuition that the reason why speaker B’s utterance in (34) is appropriate is that no coreference is involved here. Though postulating noncoreference may solve a technical problem, my own intuition is that the fact that she and her corefer is crucial for the interpretation: it is the inference that speaker B wants speaker A to draw. Instances of “debated identity” seem to me related to what Heim labeled “structured meaning” cases. That is, what matters in the context is the difference of the properties, rather than the full propositions, which are equivalent in both cases. In the identity-debate contexts, the property one ascribes to an individual under discussion should help establish his identity. Praising oneself (λx (x praises x)) and praising her = Zelda (λx (x praises her)) are distinct properties. If we identify someone as belonging to the set of those who praise themselves to the sky, this cannot help in establishing this person’s identity, in the given context, but locating her in the set of those who praise Zelda to the sky enables the inference that she is Zelda.
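The contrast between the two properties can be made concrete with a toy model. The sketch below is only an illustration under invented assumptions: the individuals, the facts in `PRAISES`, and all helper names are hypothetical and are not taken from Heim's or the present analysis. It shows that the two properties give the same truth value when applied to Zelda, while picking out different sets of individuals in the model.

```python
# Toy model (hypothetical): who praises whom to the sky.
INDIVIDUALS = {"Zelda", "Ann", "Bill"}
PRAISES = {("Zelda", "Zelda"), ("Bill", "Bill")}  # assumed facts, for illustration


def praises(x, y):
    """True if x praises y in the toy model."""
    return (x, y) in PRAISES


def self_praiser(x):
    """The property lambda x (x praises x)."""
    return praises(x, x)


def zelda_praiser(x):
    """The property lambda x (x praises her), with her = Zelda."""
    return praises(x, "Zelda")


def extension(prop):
    """The set of individuals in the model that have the property."""
    return {x for x in INDIVIDUALS if prop(x)}


# Applied to Zelda, both properties yield the same truth value ...
same_proposition = self_praiser("Zelda") == zelda_praiser("Zelda")

# ... but the properties themselves pick out different sets.
self_praisers = extension(self_praiser)
zelda_praisers = extension(zelda_praiser)
```

In this toy model, membership in `zelda_praisers` singles out Zelda, whereas membership in `self_praisers` does not, which is the asymmetry that the identity inference relies on.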
(In Heim’s example, the context also spells out that Zelda is probably the only member in this set. But the same inference would also be licensed without this addition.) Obviously, we do not yet have the formal tools to describe this type of inference precisely (which may rest on notions like relevance). This is the worry that led Heim to exclude such problems from the range of Rule I. Doing so enables us to keep the term “distinguishable interpretation,” which is used in Rule I, purely truth conditional. But it does not clearly get us closer to understanding either the inference in question, or the conditions under which speakers are allowed to opt for coreference rather than variable binding. Let us, however, assume with Heim that identity debate contexts are instances of distinct guises. If so, then the coreference task for the child is just to determine whether it is possible that two referring expressions
represent, in the given context, different guises of a discourse entity. If this happens, then coreference is permitted. Thornton and Wexler argue that given that guises are coded, they must be innate. What children do not master yet are the conditions under which speakers associate different guises with the same discourse entity, since children have a general deficiency in identifying speakers’ contextual intentions. However, for the analysis to work, Thornton and Wexler extend Heim’s analysis much further, for example, to the contexts Heim labeled “structured-meaning” that we examined in (31), changed in (35) so it illustrates Condition B environments.
(35) I know what Ann and Bill have in common: she adores him passionately and he adores him passionately.
For Heim, he and him in the italicized clause cannot possibly pass as two distinct guises of Bill. Allowing this would deprive the intuitive concept of guises of any content, since there is nothing here that suggests speakers’ uncertainty concerning identity, or a dual perception of the same individual. Heim assumes that in this case, Rule I applies as in Reinhart 1983b, or Grodzinsky and Reinhart; that is, a semantic comparison of representations is needed, though she offers further refinement of the conditions under which they are distinguishable. Thornton and Wexler, by contrast, argue that two guises are involved here as well. There is Bill “in the guise of the individual, in the flesh” who adores someone, and there is Bill in the guise of the person that Ann adores (adapted from Thornton and Wexler 1999, 94, examples (6), (9)).10 Bearing different guises, as used here, seems to mean just bearing different θ-roles. Thornton and Wexler label this type of guise distinction “role reversal guise” (p. 101). The same of course can be said of any instance of permitted coreference. For instance, in Bill adores himself, there is Bill the agent, and Bill the patient.
But it also holds for all instances of blocked coreference. In Mama Bear washed her we automatically have two guises. Thornton and Wexler appear aware of this, and they add another condition to the identification of guises. They note that in the relevant clause in (35) the subject pronoun is stressed, and propose the generalization that “stress on the pronoun has the effect of presenting [Bill] in a different guise, in virtue of the unexpected property of self-admiration” (p. 93). More generally, stress is a major clue for identifying guises in Thornton and Wexler’s analysis, and they argue that except for identity debate contexts, it is required in all cases of local coreference under distinct guises.
The heavy stress marks that there is something surprising and noncharacteristic about the situation expressed in the sentence. With this, then, Thornton and Wexler can identify one of the factors that children have to acquire when they eventually reach adult coreference use.
(36) “Children must learn that stress . . . marks the speaker’s intention to convey the local coreference interpretation by bringing [the stressed element] into focus” (Thornton and Wexler 1999, 205).
While it is true that in many instances of coreference approved by Rule I in Condition B contexts there is heavy stress on one constituent or the other (in (35) it is on the subject; in (37), on the object), the same stress pattern is found in many other instances where it does not have the effect of allowing coreference. For example, (37) is a context that Thornton and Wexler believe allows coreference for Mama Bear washed her, with the help of the heavy stress (a judgment not shared by all). The same stress pattern, with the same sentence, in the contexts of (38) has precisely the opposite effect of enforcing a noncoreference interpretation (Mama Bear could only wash Daisy Duck).
(37) Mama Bear did not wash Miss Piggy. Mama Bear washed her.
(38) a. First Daisy Duck washed Mama Bear and then Mama Bear washed her.
     b. First Daisy Duck washed Miss Piggy and then Mama Bear washed her.
By the same logic, then, we should add to the conditions the child must learn the one in (39).
(39) Children must learn that stress marks the speaker’s intention to convey a noncoreference interpretation, by bringing the stressed element into focus.
A theory equipped with both (36) and (39) can never fail, because it covers the whole domain of options (stress either means coreference, or noncoreference). In a sense, it captures the facts accurately: the child eventually knows that heavy stress is sometimes associated with coreference, and sometimes with noncoreference, as is the state of affairs in the adults’ world.
Nevertheless, it is not clear that this is the type of theory we want.11 A more appealing conclusion for such a state of affairs would be that stress is not the factor that determines coreference options in Condition B contexts. If it were indeed possible to reduce all instances of coreference approved by Rule I to distinct guises, then there would be no motivation to assume any reference-set computation for coreference, to begin with.
It is only necessary to determine for a given sentence whether the two referential occurrences are under the same or a different guise, which does not involve constructing and comparing semantic representations. Indeed, Thornton and Wexler mention in passing that possibly “Rule I can be dispensed with entirely” (p. 104). If so, then the processing account is of course unmotivated, and we are indeed left only with pragmatic considerations.
This would not be the first attempt to dismiss the problem posed by coreference computation by enriching the set of referential distinctions, namely, to capture it directly by properties of the participating arguments, rather than by properties of the full representations. A whole family of accounts, starting with Evans 1980, attempted to distinguish coreference from “referential dependence” and argue that the binding conditions restrict only the latter. For example, Fiengo and May (1994) argue that coreference is always possible for two given NPs, as long as “it is not part of the meaning of the sentence that they are co-valued” (their linking rule). Like Evans, they do not consider why, then, coreference is not just simply always possible (see the discussion of (10) above). The apparent success of such attempts rests on using undefined notions. Thus, as I mentioned, Evans’s insight was in exposing and illustrating virtually all contexts that allow coreference in apparent violation of the binding conditions. But his description of the distinction he assumes would equally allow the same everywhere else. (For a more detailed survey of this point, see Reinhart 1983b.) A theory based on undefined distinctions is always true, by virtue of being unfalsifiable.
Thornton and Wexler are probably aware of the danger of unfalsifiability posed by their description of guises as signaled by heavy stress. So they appear to view this just as a necessary condition (in all but identity debate contexts).
Stress alone is not sufficient to determine the guises interpretation. In addition, they assume that there are special contextual cues that speakers use as “markers of the speaker’s intended interpretation” (p. 105). It is only when these special cues are used that the sentence can be associated with what Thornton and Wexler view as surprising, or noncharacteristic, traits of the situation expressed by the sentence, which in turn allow coreference. In other words, these cues determine when, by using heavy stress, the speaker actually intends to use the expressions as different guises. It is these cues that the child has to learn. Regarding what these cues are, Thornton and Wexler do not say much, but rather refer the reader to Heim 1998. As we saw, however, Heim argues that no guises are involved at all in the examples under consideration. On the
view of Heim (1998) and Reinhart (1983b), determining that coreference is possible here is not based on any cues, but rather on applying logic: if the coreference representation is logically distinguishable from the bound one, coreference is permitted.
This set of presently unspecified contextual conditions (cues) is, then, what children were missing at the age they were during the experiments, and will acquire in the next couple of months or years. Based on innately specified pragmatic principles of the Gricean sort, “children count on speakers to make their intended interpretation clear whenever possible, using whatever means are at their disposal. Children learn from experience that specific contextual cues accompany the local coreference interpretation, such as the factor of ‘surprise’” (p. 103). How does this learning from experience happen, in the absence of negative evidence? Thornton and Wexler suggest that “once children have witnessed a sufficient number of examples of the local coreference interpretation in contexts that contain the relevant contextual cues, they will thereafter refrain from assigning this interpretation in the absence of these special markers of the speaker’s intended interpretation” (p. 103).
The underlying assumption of Thornton and Wexler is probably that the acquisition of contextual abilities and the identification of speakers’ intentions are of a different type than what is found with innate Universal Grammar principles. Nevertheless, the same learnability questions still arise. As mentioned, actual examples of coreference in Condition B environments are quite rare in discourse. One may wonder if by the age of about six, all children have had sufficient exposure to such uses. One may also wonder how each child decides at this age that the examples in the corpus he has encountered so far cover the whole set of options of use.
Let us assume that once the set of “special markers of the speaker’s intended interpretation” is defined with some greater precision, these questions can be answered.
5.1.4 Explaining Chance Performance
Assuming that the pragmatic and the processing accounts may fare roughly the same in predicting the areas of delay in the acquisition of coreference, and even if it turned out that they are equally plausible in terms of learnability, we may still ask whether they both, indeed, explain the experimental findings. To address this, we need to get clearer about what the problem is that requires an explanation.
Though it is standard to describe the experimental findings as indicating a delay in the acquisition of coreference, the findings are much more specific than that. Acquisition delay can take several forms. If children do
not know a given rule, or have set the parameter wrongly, the most natural result to expect is in the vicinity of 90 to 100 percent nonadult performance. (A variety of different group statistical results is to be expected if children’s performance differs individually.) But in all experiments on Condition B coreference (of the relevant type; see below), the group statistics of children’s performance is around 50 percent. This is a curious result in and of itself, but it becomes more puzzling once it is established that this is indeed chance performance, taking into consideration the performance of individual subjects. As mentioned, Chien and Wexler (1990) provide statistical analyses of individual performance, showing that many children perform individually at chance level (they sometimes reject and sometimes accept coreference under the same experimental conditions). Grodzinsky and Reinhart’s point of departure was that chance performance of this kind indicates guessing, which requires an explanation.
Let me first clarify the experimental conditions under which 50 percent performance is found, as explained in Grodzinsky and Reinhart. The target sentence is preceded by, or embedded in, another sentence which also provides an antecedent for the pronoun, as, for instance, in (40a).
(40) a. This is A. This is B. Is A washing him?
     Picture/story context
     b. A washes A.
     c. A washes B.
The story or picture accompanying the sentence includes either the situation in (40b) or the one in (40c). In both Chien and Wexler 1990 and Grimshaw and Rosen 1990, children had no problem answering yes in the vicinity of 90 percent in the context (40c), but they had around 50 percent performance in the context of (40b). It is this condition (40b) (the “mismatch” condition) that is relevant for our discussion.
By comparison, Chien and Wexler found that in the same context, if A is a quantified DP, like every bear, rather than a referential DP, children at the age of five gave the adult answer (no) 85 percent of the time.
On Grodzinsky and Reinhart’s account, the reason why chance performance occurs only for (40b) is that only in this context does (clause (c) of) Rule I need to be consulted. Though they do not explain this, Rule I applies when a coreference interpretation is considered. In the context (40c) the option of coreference is not suggested by the context, so there is no reason for the child to even examine the option in deciding his answer. In (40b), by contrast, a coreference interpretation corresponds to the context situation. So Rule I has to determine whether the target sentence
allows coreference (in which case the answer to (40a) is yes) or not (with the answer no).12 Since in this sentence binding is disallowed, clause (c) of Rule I needs to be processed. Adults would complete the task successfully and answer no, but children cannot complete the execution, hence they perform at chance, or guess.
Not all subsequent experiments confirmed chance performance also at the level of individual children, but Thornton and Wexler (1999) point out that usually the experiments’ results have not been sufficiently analyzed, statistically, to determine that. To identify chance performance of individual children (namely, guessing), the results should be compared to the binomial model of chance performance. A detailed example of how the binomial model is constructed will be given in section 5.2.3 (for experiments with stress-shift). But the gist of the calculation is that first the probability p for a child to give a correct answer on any one trial is determined, based on the proportion of the total (group) number of correct answers out of the total number of answers. In an ideal coin-tossing experiment, p is 0.5. This, however, does not mean that every individual tossing coins, say 20 times, would necessarily get 10 heads and 10 tails. Similarly, a guess pattern does not entail that each child will give 50 percent correct answers on all trials. The next step in constructing the binomial model is to calculate the number of children expected to have n correct answers over all trials. (If there were 20 trials, then n is between 0 and 20.) If there is a reasonable fit between these last figures and the actual experimental results, it means that the children were guessing.
In the detailed statistical analysis of Thornton and Wexler of their own experiments, a similar pattern to that of Chien and Wexler 1990 was found.
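The two steps of the binomial model just described can be sketched in a few lines of Python. This is a minimal illustration using the ideal coin-tossing values from the text (p = 0.5, 20 trials); the function names and the group size of 100 children are assumptions made for the example, not figures from any of the experiments.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k correct answers in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_children(num_children, n, p):
    """Expected number of children with k correct answers, for k = 0..n."""
    return [num_children * binomial_pmf(k, n, p) for k in range(n + 1)]


# Ideal coin tossing: p = 0.5 and 20 tosses. Exactly 10 heads is the single
# most likely outcome, yet it has a probability of only about 0.176.
p_exactly_ten = binomial_pmf(10, 20, 0.5)

# Hypothetical group of 100 guessing children with 20 trials each: the model
# spreads them over all scores from 0 to 20 rather than placing all at 10.
expected = expected_children(100, 20, 0.5)
```

Comparing a list like `expected` with the observed number of children at each score is the "reasonable fit" check mentioned above; how the fit is assessed statistically is left open here.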
By their own conclusions, the binomial model was confirmed at least for a group of about 75 percent of the children (fifteen out of the nineteen subjects). The group’s performance on Condition B sentences like Bert brushed him was approval of coreference 58 percent of the time.13 The individual subject data reveal that out of the nineteen subjects, eight children accepted 3/4 or 4/4 trials, seven children accepted 2/4 trials, one child accepted 1/4 trials, and three children accepted 0/4 trials. Note that seven children showed an equal number of yes and no on the four trials of the same condition. But this is not the only indication of individual chance performance (since chance allows different individual numbers). The combined group results are almost consistent with the binomial model for guess selection between two options. Thornton and Wexler point out that the number of correct answers (adultlike coreference rejection) in 3/4
or all 4 of the trials is a bit higher than the probability in a binomial model: the model predicts two such children, while there are four (p. 175). Thornton and Wexler propose to identify these four children as a separate subgroup. For the other fifteen children (or, statistically, for seventeen of the nineteen children), they conclude that the response pattern is fully consistent with the binomial model of pure chance, or guessing.
As for the subgroup of four children that rejected coreference in Condition B environments, Thornton and Wexler assume that they have reached adult knowledge. In their terms, this means they have mastered early the cues to guise-identification; in Grodzinsky and Reinhart’s terms (if the analysis of the results is correct), this would mean that their working memory has developed early, so they are able to execute the computation. Technically, only two children diverge from the binomial statistics, as we just saw. But Thornton and Wexler followed this group in all conditions of the sequence of experiments, and they found that the same children perform equally well in all conditions involving coreference in Condition B environments. (For instance, in the ellipsis condition with sentences like Bert brushed him and the Tin Man did too, this subgroup permitted coreference incorrectly in 1/16 trials, precisely the same result as in the nonellipsis condition Bert touched him, which we have just examined.) This uniform behavior across conditions justifies singling out all four children as a separate group. However, Thornton and Wexler’s conclusion that the reason they are singled out is that, unlike the other children, they have reached adult knowledge does not follow automatically. In fact, it is not consistent with another of their findings.
In the VP-ellipsis tests of Condition C that we discussed in (28), repeated in (41), the children as a group performed at chance level, allowing the construal he cleaned Flash Gordon 54 percent of the time. But on this task, unlike the Condition B tasks, there is no significant difference between the two groups of children, as seen in (42) (Thornton and Wexler 1999, 200).
(41) The kiwi bird cleaned Flash Gordon and he did too.
(42) Acceptance of the interpretation ‘He cleaned Flash Gordon & he = Flash Gordon’
     a. Group I (4 children): 44% (7/16)
     b. Group II (15 children): 57% (32/56)
The adult control group accepted coreference here 83 percent of the time, but the group of four children that are presumably “little adults” in their knowledge of guises (or Rule I) performed here at 44 percent, which is in the range of chance performance. (Thornton and Wexler do not provide
the individual data for this group of four children on this condition.) Recall that this “structured meaning context” is an instance of coreference ruled in by Rule I, although binding is ruled out here by Condition C. In Thornton and Wexler’s analysis this is a case of distinct guises. If the four children in question mastered adults’ guise understanding, which Thornton and Wexler assume in order to explain their performance on Condition B tasks, they should have manifested this also in the present task.
It is in principle possible that when children are unable to execute a given task, some of them would develop some sort of default strategy to deal uniformly with such tasks without applying the difficult procedure. Children operating by a strategy end up performing uniformly across different tasks, and, depending on the default strategy and the experimental condition, it can happen to be the adultlike response. It is not crucial for the present discussion to determine what strategy could explain the full data of the performance of these four children. Nevertheless, we may note that it is possible, in fact, to formulate such a strategy, though it is hard to see where it could come from: it would be to disallow coreference whenever a pronoun can be bound, skipping Rule I altogether. This would rule out coreference in Condition B environments but not in (40), where the pronoun cannot be bound. Hence, Rule I still needs to be processed, leading to the familiar failure and guessing.
There is another interesting finding of Thornton and Wexler that appears consistent with such a strategy. This regards the strict interpretation of reflexives in VP-ellipsis contexts such as (43b), namely, the question whether children allow a coreference interpretation for the reflexive in (43a), as opposed to its interpretation as a bound variable. On this issue, there is no reason to expect 50 percent performance, under either theory, and indeed it was not found.
But the two groups performed dramatically differently, as summarized in (44) (Thornton and Wexler 1999, 195).
(43) a. Hawkman fanned himself.
     b. Hawkman fanned himself and the baby boy did too.
(44) Acceptance of the strict interpretation of (43b) (‘The baby boy fanned Hawkman’)
     a. Group I (4 children): 13% (1/16)
     b. Group II (15 children): 81% (21/26)
The four children that Thornton and Wexler identified as little adults disallowed it; the others allowed it. From the perspective of Rule I alone, coreference should be permitted in (43a), because clause (b) of Rule I does not hold: Hawkman can bind himself. (See the discussion of (20) in
section 5.1.2. The question why this is not an option taken by adults more commonly is an independent issue, to which I do not know the answer.) Children who apply Rule I will be able to get this far, and since this clause does not hold, they will allow coreference here. Children who bypass Rule I and operate by the strategy just outlined will rule out coreference in (43a) because the reflexive can be bound.
In any case, abstracting away from the group of four, the crucial finding confirmed again in Thornton and Wexler’s experiments is that at least for the majority of children, performance in Rule I environments is at chance level, consistent with individual guessing. So, a crucial question about a given analysis of coreference delay is whether it can explain this specific guess pattern, which, as mentioned, is not a common finding in all areas of acquisition. Even if not all children show this pattern, for those who do, this needs explaining. The processing analysis of Grodzinsky and Reinhart has taken this finding as its point of departure, and it provides a straightforward answer: a guess pattern is found when, on the one hand, the child knows what needs to be computed to provide an answer, and on the other hand, he is not able to complete the task. So, given that there are two options to choose from (yes or no), the choice is arbitrary, that is, a guess.
On the pragmatic account of Thornton and Wexler, it is hard to see how chance performance could be derived even for a minority of the children. On their account, the source of children’s coreference delay is their extension of the conditions allowing distinct guises: they permit distinct guise-interpretation in a superset of the conditions under which adults permit them (“extended guise creation,” p. 102). Their performance, then, should be determined by the size and properties of the superset they adopt.
Suppose, for instance, that children accept freely what Thornton and Wexler labeled “role reversal guises”; that is, they allow every thematic role to correspond to a separate guise. In this case they should always allow coreference in Condition B environments, because (as in any other instance of coreference) the two occurrences have different thematic roles. So their performance should be close to 100 percent acceptance. Suppose they take heavy stress as always allowing distinct guises. Then their performance should depend on the experimental conditions. In sentences where heavy stress is used, they should allow, again, coreference in the range of 100 percent. But if heavy stress is avoided (as in most of the experiments), they should perform in an “adultlike” fashion and disallow coreference in the same range. So the guise superset analysis can indeed correctly predict that children’s performance would differ
from that of adults, but it cannot predict the specific way it differs, namely, the actual findings of individual chance performance.
5.2 Acquisition of Main-Stress Shift
Another area we examined that involves reference-set computation is stress-shift.14 As we saw in chapter 3, stress-shift is a repair procedure, designed to supply the derivation with a focus not included in its focus set. The operation has to undo the main stress assignment of the derivation; hence it is an illicit operation. Applying it requires constructing a reference set to determine that the required focus could not be obtained without applying the illicit operation. This explains why the focus obtained by stress-shift is always narrow, namely, it does not project. The hypothesis is, then, that in this area, similar acquisition evidence will be found that the computation exceeds children’s processing ability.
I examine here two areas of stress-shift: switch-reference (section 5.2.3) and focus identification in the scope of only (section 5.2.4). As we will see, the 50 percent range of performance is indeed found in both areas. In section 5.2.2, I argue that this cannot be attributed to any general problems with stress that children are facing. The analysis of individual responses reveals two bypassing strategies. In tasks involving switch-reference, the dominant strategy is simple guessing, so we find individual performance in the range of 50 percent. In focus identification, where the tasks involve semantic disambiguation, the dominant strategy is the selection of an arbitrary default, which may be fixed for a given child across tasks. But, as argued in section 5.2.5, the choice of the default is itself arbitrary; hence the group results remain in the 50 percent range. Analysis of the explanations children give for their answers reveals that they are attempting to construct the relevant comparison derivation, but they get stuck at that stage.
In the area of focus disambiguation, there are additional factors to take into account, because even when stress is not shifted, focus selection involves semantic disambiguation, or selecting a focus out of a set of candidates. I address these issues in section 5.2.5. Let me first summarize in some detail the analysis of stress and focus I am assuming, which will serve as the basis for stating the questions we may seek an answer to in acquisition studies, in section 5.2.2.
5.2.1 An Overview of Stress and Focus
This section surveys the major conclusions of chapter 3, but without repeating the motivation, argumentation, and evidence.
5.2.1.1 Neutral Main Stress (Cinque 1993; Szendrői 2001)
Cinque (1993) revived the view of the 1970s that main sentence stress is determined independently of focus or discourse considerations, and that a distinction between neutral and marked stress is, therefore, feasible. Cinque’s implementation rests on reanalysis of the Nuclear Stress Rule (NSR), which determines for each derivation where its main stress falls. The basic framework of his analysis is the metrical grid theory of Halle and Vergnaud (1987): the NSR starts the assignment of stress with the most deeply embedded constituent, which then moves up to the next metrical line. The outcome will be, then, that the most prominent stress falls on this constituent. The gist of Cinque’s analysis is that the depth of embedding (in the case of sisters) is determined by the direction of selection (or recursion, as he phrases it) of the given language. Both in a VO language like English and in an OV language like Dutch, the most deeply embedded constituent in (45) is the object. Hence, in both, the object receives main stress (throughout I will use bold to indicate the main sentence stress).
(45) a. I read the book.
     b. dat ik het boek las
        that I the book read
Szendrői (2001) presents an alternative technique for the execution of the neutral main stress rule. She uses Liberman’s (1979) metrical tree notation. In this method, there are no separate cycles like the NP-cycle or the VP-cycle assumed in Cinque’s system. Rather, stress is assigned to the nodes of the syntactic tree (or alternatively, the prosodic structure). An advantage of this system is that it is fully transparent how it applies to syntactic (or prosodic) trees, and thus it lends itself to strictly incremental application.
The technical details of the NSR are not crucial for the present discussion. What is crucial is that main stress is assigned to derivations by a rule that is independent of focus considerations.
I will refer to this rule as the neutral main-stress rule or just main-stress rule, and continue to abbreviate it as NSR. In chapter 3, I addressed some of the arguments raised against the idea that a uniform neutral stress rule can be postulated, and argued that many of them rest on conflating focus stress and anaphoric destressing, which, as we see below, is independent of the NSR.
5.2.1.2 The Focus Set
Cinque’s theory of sentence stress enables reformulation of the idea that focus is marked overtly, at PF, rather than
covertly. In Reinhart’s (1995) execution, each derivation is associated not with an actual focus, but with a set of possible foci, that is, a set of constituents that can serve as the focus of the derivation in a given context. This set is determined by the computational system at the stage where both the syntactic tree and stress are visible. In other words, focus selection applies to a pair ⟨PF, LF⟩ of sound and configurational structure. The focus set is defined, then, in (46). If stress falls on the object, either in English SVO structures, or in Dutch SOV structures, the focus set defined by (46) is the one in (47).
(46) The focus set of IP consists of the constituents containing the main stress of IP.
(47) a. [IP Subject [VP V Object]]
     b. [IP Subject [VP Object V]]
     c. Focus set: {IP, VP, Object}
This means that in actual use, any of the members of the set in (47) can serve as focus. At the interface, one member of the focus set is selected as the actual focus of the sentence. For illustration, (48a), which is generated with stress on the object, can be used as an answer in any of the contexts in (48b–d), with the F-bracketed constituent as focus.
(48) a. My neighbor is building a desk
     b. Speaker A: What’s this noise?
        Speaker B: [F My neighbor is building a desk]
     c. Speaker A: What’s your neighbor doing these days?
        Speaker B: My neighbor [F is building a desk]
     d. Speaker A: What’s your neighbor building?
        Speaker B: My neighbor is building [F a desk]
If this were the complete story, we could conclude that language is almost perfect in meeting the interface need of associating a focus with derivations, because an obligatory PF-rule, needed for phonological convergence, is sufficient for signaling the focus. This system still generates semantically ambiguous derivations (each consistent with several focus construals). But the set of interpretative options is clearly defined and restricted.
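The definition in (46) can be made concrete with a small sketch. The Python below is an illustration only, not part of the analysis: the `Node` class, its `stressed` flag, and the `focus_set` function are hypothetical names, and the trees are simplified to the bracketings in (47). The focus set is collected as every constituent that is, or dominates, the constituent bearing main stress.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """A constituent in a simplified syntactic tree (hypothetical encoding)."""
    label: str
    children: List["Node"] = field(default_factory=list)
    stressed: bool = False  # True only on the constituent bearing main stress


def focus_set(root: Node) -> Set[str]:
    """Labels of all constituents containing the main stress, as in (46)."""
    found: Set[str] = set()

    def contains_stress(node: Node) -> bool:
        # Evaluate every child (no short-circuit) so the walk is complete.
        below = [contains_stress(child) for child in node.children]
        if node.stressed or any(below):
            found.add(node.label)
            return True
        return False

    contains_stress(root)
    return found


# (47a) English SVO and (47b) Dutch SOV, both with main stress on the object.
english = Node("IP", [Node("Subject"),
                      Node("VP", [Node("V"), Node("Object", stressed=True)])])
dutch = Node("IP", [Node("Subject"),
                    Node("VP", [Node("Object", stressed=True), Node("V")])])

print(sorted(focus_set(english)))  # ['IP', 'Object', 'VP']
print(sorted(focus_set(dutch)))    # ['IP', 'Object', 'VP']
```

Because the function looks only at dominance and not at linear order, the English and Dutch structures yield the same set, which is the point of (47c). Labels stand in for constituents here, so a tree with two nodes of the same label would need a richer encoding.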
However, in reality, the set of possible foci the system generates is not sufficient for the interface. A derivation is inappropriate for a given context if no member of its focus set can be used as an actual focus in that context. (48), for example, cannot be used as an answer in either of the contexts of (49). (The # sign indicates, throughout, inappropriateness to context.)
(49) a. Speaker A: Has your neighbor bought a desk already?
        Speaker B: #No, my neighbor is [F building] a desk.
     b. Speaker A: Who is building a desk?
        Speaker B: #[F My neighbor] is building a desk.
This is so because in the contexts of (49), the F-bracketed constituents should be the foci, but these constituents are not in the focus set generated by (46) for a sentence in which the object bears stress (cf. (47)).
5.2.1.3 Stress-Shift Operations
For cases like (49), where the focus set defined by the neutral stress does not contain the desired focus, a special stress-shifting operation applies. For the present discussion it suffices to state it informally, as in (50), which follows Neeleman and Reinhart’s (1998) implementation. It applies to a given output of main-stress assignment and, while keeping this assignment, adds stress to another word. The formulation of the rule as adding exactly two stars is for ease of illustration only. For the precise formulation, see Szendrői 2001 and chapter 3. In the example (51), the result is that main stress is on my neighbor, but the original stress on desk remains as a secondary stress.
(50) Main-stress shift
     Add two stars.
                                           *
                                           *                      *
(51) My neighbor is building a desk  ⇒  My neighbor is building a desk.
In the context of (49a), repeated in (52a), extra stress is assigned to the verb. As a result, the verb is in the focus set defined by (46), and the derivation is appropriate in this context. In (52b), the same operation applies to the subject.
(52) a. Speaker A: Has your neighbor bought a desk already?
        Speaker B: No, my neighbor is [F building] a desk.
     b. Speaker A: Who is building a desk?
        Speaker B: [F My neighbor] is building a desk.
The output of (50) is what I have called marked stress. Although they sound perfectly natural in their context, the foci in (52) are marked, since their derivation violates the neutral main stress rule: it is obtained by a superfluous operation that undoes the results of the neutral main-stress rule. I will return directly to the symptoms of markedness. The effects of (50) are often confused, in discussions of marked stress, with the effects of a different process of anaphoric destressing. This
distinction, proposed in Selkirk 1984, is discussed in Reinhart 1997 and Neeleman and Reinhart 1998, who argue that the latter is completely independent of considerations of the focus set. For reasons discussed there and in chapter 3, it does not generate the same markedness effects. (In terms of the next subsections, it allows focus projection, and it does not require reference-set computation.)15 The destressing operation can be stated as in (53). It applies locally, to any anaphoric constituent, independently of the neutral main stress rule (NSR). This can be captured by assuming that it applies at the word level, prior to the NSR. Thus, the relevant D-linked or anaphoric expressions do not carry an intonational star when the NSR applies, as in (54).
(53) Anaphoric destressing
     Remove a star (prior to the NSR).
(54) a. Destress:
        *    *
        Max [saw her]
     b. NSR: ⇒
             *
        Max [saw her]
The NSR, then, just operates in the standard way, turning the most embedded star into the main stress. Since in (54) the lowest star is on the verb, it is the verb that will carry main stress, as in (54b). Destressing can apply to larger units than a word. Typically, when it applies to a whole VP, the destressed VP may also not be pronounced, giving rise to VP-ellipsis. This is illustrated in (55). Since the VP in the second conjunct is anaphoric, it is destressed. Main stress is then assigned to the only possible candidate, namely the subject, as in (55b). The VP could either be pronounced as in (55b), or left unpronounced (“deleted at PF”), as in (55c).
(55) a. Destressing:
              *    *       *                 *
        First Max [touched Felix_i] and then Lucie [touched him_i]
     b. NSR:
                           *                 *
        First Max [touched Felix_i] and then Lucie [touched him_i]
     c. PF deletion:
        First Max [touched Felix_i] and then Lucie did [e]
(56) Stress shift:
                                                            *
                             *               *              *
        First Max_i [touched Felix] and then Lucie [touched him_i]
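The interaction of anaphoric destressing with the NSR can be sketched in code. This is a toy model under loud assumptions: each word is a dictionary with a hypothetical `depth` field standing in for depth of embedding (a larger value means more deeply embedded) and an `anaphoric` flag; neither the encoding nor the function names come from the text. Destressing applies first, and the NSR then promotes the most deeply embedded word that still carries a star.

```python
def destress(words):
    """Anaphoric destressing: anaphoric words enter the NSR without a star."""
    return [dict(w, stars=0 if w["anaphoric"] else 1) for w in words]


def nsr(words):
    """Add a star to the most deeply embedded word that still has one."""
    starred = [w for w in words if w["stars"] > 0]
    target = max(starred, key=lambda w: w["depth"])
    return [dict(w, stars=w["stars"] + 1) if w is target else w for w in words]


def main_stress(words):
    """Text of the word with the highest column of stars."""
    return max(words, key=lambda w: w["stars"])["text"]


def word(text, depth, anaphoric=False):
    """Hypothetical word record; depth values are assigned by hand."""
    return {"text": text, "depth": depth, "anaphoric": anaphoric}


# (54): Max [saw her], with the pronoun anaphoric.
saw_her = [word("Max", 1), word("saw", 2), word("her", 3, anaphoric=True)]
# Second conjunct of (55): Lucie [touched him], with the whole VP anaphoric.
touched_him = [word("Lucie", 1),
               word("touched", 2, anaphoric=True),
               word("him", 3, anaphoric=True)]

print(main_stress(nsr(destress(saw_her))))      # saw
print(main_stress(nsr(destress(touched_him))))  # Lucie
```

The sketch deliberately leaves out stress-shift, and it would fail on an input in which every word is anaphoric, since the NSR would then have no star to promote.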
Note finally that, if this is what the context requires, it is possible to make an anaphoric destressed constituent the focus. This would of course require applying stress-shift. If we apply this to the derivation (55), we obtain (56). Foci obtained this way are often described as contrastive, but for all our purposes, they are just standard applications of the stress-shift operation.
5.2.1.4 Reference-Set Computation
The widely acknowledged characteristic of the focus obtained by shifted (marked) stress is that it “does not project” (it can only be “narrow focus”). As we saw, stress obtained by the neutral NSR allows any projection containing it to serve as focus, for example, the whole IP in (48b), repeated below. The shifted cases of (52), by contrast, cannot be used in the same context, as we see in (57), which means that they do not project IP as focus. Similarly, stress shifted inside the VP does not project VP as focus, as seen in the comparison of (48c) and (58).
(48b) Speaker A: What’s this noise?
      Speaker B: [F My neighbor is building a desk]
(57)  Speaker A: What’s this noise?
      Speaker B: #[F My neighbor is building a desk]
                 #[F My neighbor is building a desk]
(48c) Speaker A: What’s your neighbor doing these days?
      Speaker B: My neighbor [F is building a desk]
(58)  #My neighbor [F is building a desk]
Though widely discussed, such facts have not received a satisfactory account. Standard approaches postulate a special focus-projection rule for “contrastive” focus. But it is far from obvious how we can distinguish (in a noncircular way) the “contrastive” (58) from the “standard” (48d) (Speaker A: What’s your neighbor building? Speaker B: My neighbor is building [F a desk]), given that in both the focus is narrow. Within the present framework, the difference between neutral and “contrastive” stress is that the latter is derived through a violation, or extension, of the main-stress rule (NSR).
Generally, violations of core principles of the CS are never allowed to apply superfluously. They are permitted just in case this is the only way to satisfy an interface requirement. In the specific case of focus, we have already seen that the stress system is not perfect for the interface, as it does not generate the full set
of foci that may be needed in context. To overcome this, stress-shift applies to generate a focus not available otherwise. But the price for resorting to such an extension of the system is always the same: reference-set computation is required, to determine whether indeed this is the only possible way to reach the interface goal. In the case of focus, the outcome is that shifted stress cannot project, if the given projection is also available with neutral stress. Let us see how this works. I assume, first, just the one definition of the focus set in (46), repeated below, which is blind to how stress is assigned. Hence, for the derivations at hand the focus sets defined are those in (59b), (60b), and (61b).
(46) The focus set of IP consists of the constituents containing the main stress of IP.
(59) a. My neighbor is building a desk.
     b. Focus set: {IP, VP, Object}
(60) a. My neighbor is building a desk.
     b. Focus set: {IP, VP, V}
(61) a. My neighbor is building a desk.
     b. Focus set: {IP, subject}
The focus sets of (59) and (60) intersect in the case of IP and VP. Suppose that in a given context we want VP (or IP) to be the focus. We could obtain this result by using (59), without applying the superfluous stress-shift. Hence, (60) is ruled out for that context. The only focus of (60) not already in the focus set of (59) is the verb. Hence, it is only the need to use this focus that can motivate the stress-shift. Similarly, (61) intersects with (59) on IP. Hence (61) can only be used with the subject as focus.
Computing this type of reasoning requires construction of a reference set, which consists of ⟨d, i⟩ pairs of a derivation and interpretation. In this case, the relevant interpretation is a selection of a focus out of the focus set. So, suppose our task is to decide whether (60) can be used with the IP as focus (as in the context of (48b), What’s this noise?). The reference set is (62).
(62) a. d: My neighbor is building a desk → My neighbor is building a desk
        i: Focus: IP
     b. d: My neighbor is building a desk
        i: Focus: IP
Since the pair in (62b) does not involve the extra operation, it blocks (62a). Suppose now we want to use (60) with the verb as the focus, as in the context of (52a). Since stress-shift is involved, we have to construct a reference set there as well. However, the reference set is (63), which contains only this one member, since no other derivation (of the same numeration) has the verb as focus. Hence this derivation is allowed.
(63) d: My neighbor is building a desk → My neighbor is building a desk
     i: Focus: V
On this analysis, then, stress-shift imposes a costly complex computation. At first glance, it may appear stipulative to postulate such a complex solution to the problem of focus projection with stress-shift. In all cases under consideration so far, the actual focus ends up being the narrow one, namely, the smallest projection of the stressed node. If this is the correct descriptive generalization, several simpler accounts may suggest themselves. One would be to change the definition of the focus set. Rather than a unified definition for all forms of stress, one could assume that when the stress shifts, namely, when the nondefault stress is used, the focus set contains only this one member. In frameworks assuming a focus feature, the same could be obtained by associating the feature with the type of stress. However, this descriptive generalization is not, in fact, true. There are configurations that allow shifted stress to project beyond the smallest projection.
(64) [The man with the martini] is the murderer.
(65) a. [Did the man with the apron] commit the murder?
     b. Who is the murderer?
In (64) main stress falls in a “nonneutral” position on the noun in the subject. By the generalization under consideration, the focus set for (64) should contain then only the lower DP the martini. The derivation can certainly be used with this constituent as a focus, e.g., as an answer to the question (65a).
But it can also be used as an answer to (65b), in which case the whole DP the man with the martini is the focus. So the nondefault stress does project up to the top DP. But it is still impossible to use this derivation with the full IP as focus. If focus projection is determined by reference-set computation, this is the expected outcome, because, unlike IP, the top DP is not in the focus set of the derivation with neutral main stress (which would fall on murderer).
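The reasoning of this subsection can be summarized in a short sketch. It is an illustration under simplifying assumptions, not the formal definition: focus sets are written out by hand as sets of labels, the reference set is reduced to the ⟨d, i⟩ pairs that share the intended focus, and all names (`NEUTRAL`, `reference_set`, `is_allowed`) are hypothetical. The focus sets given for (64) are partial and assume that neutral stress falls on murderer. A stress-shifted derivation is blocked for a focus whenever the neutral derivation is also in the reference set for that focus.

```python
NEUTRAL = "neutral"


def reference_set(focus, derivations):
    """All (derivation, focus) pairs whose focus set contains `focus`."""
    return [(name, focus) for name, foci in derivations.items() if focus in foci]


def is_allowed(name, focus, derivations):
    """Can derivation `name` be used with `focus` as its actual focus?"""
    if focus not in derivations[name]:
        return False  # not in the focus set defined by (46)
    if name == NEUTRAL:
        return True   # no extra operation, so nothing to compare
    # Stress-shift is licensed only if neutral stress cannot reach this focus.
    return (NEUTRAL, focus) not in reference_set(focus, derivations)


# Focus sets of (59)-(61), copied from the text.
desk = {
    NEUTRAL: {"IP", "VP", "Object"},        # (59)
    "shift to verb": {"IP", "VP", "V"},     # (60)
    "shift to subject": {"IP", "Subject"},  # (61)
}
# (64), partially and by assumption: neutral stress would fall on "murderer".
martini = {
    NEUTRAL: {"IP", "VP", "the murderer"},
    "shift to martini": {"IP", "the man with the martini", "the martini"},
}

print(is_allowed("shift to verb", "IP", desk))  # False, blocked as in (62)
print(is_allowed("shift to verb", "V", desk))   # True, as in (63)
print(is_allowed("shift to martini", "the man with the martini", martini))  # True
print(is_allowed("shift to martini", "IP", martini))  # False
```

On this encoding the foci available to a shifted derivation are simply its focus set minus the neutral focus set, which is why the top DP in (64) survives while IP does not.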
5.2.2 Preliminaries
The question whether reference-set computation must be assumed for stress-shift foci resembles that posed by Rule I. In both cases the empirical evidence that it is indeed needed is not huge. In most instances, an alternative simpler theory would capture the facts just the same. For this reason, acquisition findings may have a crucial role in evaluating the proposed analysis.
The analysis of stress-shift as requiring reference-set computation rests on the assumption that there is a fixed definition of a focus set. So even when the stress is shifted, any constituent containing the stress is a potential focus of the derivation. This is what enforces constructing a comparison set to determine whether any of the candidates could have been derived also with neutral stress, in which case it is ruled out as a focus of the derivation with stress-shift. As I mentioned, this assumption may seem the most speculative part of the analysis, because it appears easy to capture most of the facts without it. It turns out, however, that this aspect of the analysis is easiest to test experimentally, and as we will see in section 5.2.4, it gets direct confirmation: given children’s failure in processing sentences with stress-shift, the experimental design enables examining which interpretations they are considering. There is clear evidence that when the stress shifts to the direct object in, for example, Lucie sold a car to Max, children still access the VP-focus (sold a car to Max). This, then, is consistent with the hypothesis that a computation of the sort assumed here is at work, and, furthermore, that children are aware of the computation required by stress-shift, and initially attempt to carry it out.
If reference-set computation is indeed at work here, our most immediate expectation is that we should find not just some vague acquisition delay, but performance at the range of 50 percent. This general expectation is confirmed in the available studies.
Several studies noticed a correlation between children’s performance on the coreference aspects of Condition B, or Rule I, and their performance on tasks involving contrastive stress. In the latter as well, group statistics were found to be around 50 percent adultlike answers. In the present framework, there is no special status for so-called contrastive stress, and it is viewed as an instance of the stress-shift operation. The findings, then, bear directly on our question. McDaniel and Maxfield (1992) studied two contexts of what they view as contrastive stress. One involves focus selection, as in the task below.
(66) Experimenter: Bert doesn’t want to eat the big strawberry. What do you think he wants to eat?
     [Props: Bert, a big strawberry, a little strawberry, a big tomato, a little tomato, a pear, a carrot, an orange, and a green pepper]
In our terms, the contextually appropriate answer to the experimenter’s question rests on selecting a focus out of the focus set of the first clause, given partially in (67b).
(67) a. Stress-shift: Bert does not want to eat the big strawberry.
     b. Focus set: {big, the big strawberry, eat the big strawberry . . .}
(68) a. Neutral stress: Bert does not want to eat the big strawberry.
     b. Focus set: {strawberry, the big strawberry, eat the big strawberry . . .}
The question clause restricts attention to the first two members of (67b) (since it presupposes Bert’s wanting to eat something). So the child needs to select one of these two in order to pick an object in the props inventory. If the narrow (adjective) focus big is selected, as in the adults’ response, then the only choice is the little strawberry. If the full DP-focus (the big strawberry) is selected, then Bert does not want to eat any strawberry, so any of the nonstrawberry objects in the inventory can be selected. The choice of focus, however, is not free. Since stress-shift applied, a reference set must be constructed, with the neutral stress derivation (68). Since the wide DP-focus is also in the focus set of (68), it is excluded as a candidate for the focus of (67). On our hypothesis, children know that this computation is required, but they are unable to execute it. Hence, their choice will be arbitrary.
McDaniel and Maxfield do not report directly the results of this experiment, but rather they scored them together with the results of their second experiment with contrastive stress (discussed in section 5.2.3). Children were given 1 point for each correct answer, with the maximum score of 10 points.
The mean score (for the two experiments together) was 5.5, meaning that 55 percent of the answers were correct, a result consistent, in principle, with chance performance, though this pattern cannot be established without the individual data. McDaniel and Maxfield report an earlier experiment of Tavakolian (1974) using similar tasks and methodology as (66), with similar results. As we shall see, the same overall performance in the range of 50 percent was found in all studies on the acquisition of stress-shift. At least in one of the instances of contrastive stress—the switch-reference contexts,
discussed in section 5.2.3—analysis of the individual results reveals a pattern of guessing. Nevertheless, this is not yet sufficient to establish the analysis under consideration. In principle, it cannot be excluded that there are other linguistic tasks at which children perform at the range of 50 percent. (The entailment goes only one way: if reference-set computation is involved, there is performance at 50 percent, but performance at 50 percent does not entail reference-set computation.) So, in principle, other accounts for this performance range are possible. McDaniel and Maxfield, for example, suggest that the source of children’s difficulties here is their inability to perceive stress, and thus to use it as a clue for interpretation. They argue, further, that the clue for permitting coreference in violation of Condition B (Rule I) is contrastive stress. Since children are unable to perceive stress, this also explains their poor performance on ruling out coreference. We saw in section 5.1 that the coreference options in these contexts are not determined by stress (though of course stress may interact with coreference). So stress problems alone cannot explain the coreference findings. On our hypothesis, what the coreference and the stress-shift tasks have in common is that they both involve reference-set computation, though the specific required computations are unrelated. The relevant question for this hypothesis is whether children’s difficulties with stress-shift could be explained by an independent general problem with identifying stress. Indeed, popular hypotheses in studies of the acquisition of sentential prosody are that children are insensitive to stress; that is, they do not perceive it at all, or, if they perceive it, they cannot use it as a key to disambiguation in comprehension tasks (though they have no problem in production tasks).
We should note, however, that these studies reached this conclusion based on experiments with stress-shift (e.g., Maratsos 1973; Cutler and Swinney 1987 and the references cited there). At present, we are considering two different explanations of children’s difficulties with stress-shift: one is that the problem is specific to stress-shift, because only in this case reference-set computation is at work; the other is that it reflects general stress problems. It is impossible to decide that the second is right based only on findings of acquisition of stress-shift. To settle the issue, children’s performance on stress in other areas must be determined. Regarding the first hypothesis, that children simply do not perceive stress at all, it is interesting to note that in one of the experiments of Cutler and Swinney (1987), they checked the perception of stress in a string of arbitrary words that did not form a sentence, but one of them carried stronger stress. They found that children processed the stressed
words significantly faster than the nonstressed words. (The experiment measured response time to a target word when it is accented and when it is not.) This contrasts with their experiments with actual sentences. They conclude that children can perceive stress, so the difficulties must lie at the sentence level. The second hypothesis is that at the sentence level children cannot use stress for disambiguation, or for parsing the information structure. Halbert et al. (1995) examined this question with pairs like (69), where no stress-shift is involved.
(69) a. Big Bird threw [the fish] food.
     b. Big Bird threw the [fish-food].
Without stress, the word string in (69) is ambiguous between the syntactic construals (a) and (b). When pronounced (i.e., once the main-stress rule has applied), there is no ambiguity: in (69a), where food is the theme argument, its stress is projected as the sentence stress. In (69b), the most deeply embedded argument is fish-food, which gets the standard compound stress. This stress then projects as the main sentence stress. In both derivations stress is assigned by the main-stress rule (NSR), and no stress-shift applies. If children have problems either with the NSR, or with disambiguating via stress, they should perform badly on distinguishing these sentences. Under our hypotheses that the main-stress rule is both innate and computationally simple, and that it is only stress-shift that requires reference-set computation, they should do well here. In Halbert et al.’s study, children were given a story for each sentence. In this context, the sentence would be false for adults. Of the child subjects, 89 percent (16 out of 18, aged 3 to 5;3) indeed gave the correct answer (i.e., judged the sentence false). This single experiment is not yet sufficient to establish the point,16 but the results suggest that children have no independent comprehension problems in using stress for disambiguation.
The most serious problem for the view that children have independent difficulties with contrastive stress is that all findings show that the problem arises only in comprehension tasks, whereas in production children are able to use contrastive stress. For example, in Hornby and Hass’s (1970) experiment, surveyed in Thornton and Wexler 1999, children between 3;8 and 4;6 were shown pictures and asked to describe them. The pictures came in pairs differing in only one element. When the pictures showed different agents (a boy or a girl petting a dog), children stressed the subject 80 percent of the time, in their description of the second picture. On our analysis, this is obtained by stress-shift, so the experiment
shows that they are able to apply this procedure in production. As pointed out by Thornton and Wexler, this inconsistency of comprehension and production remains a mystery in subsequent studies. It is not easy to imagine how this difference can be explained within the view of stress deficiency. But within the analysis of stress-shift proposed here, production and comprehension do not involve the same sort of difficulty, though both require more processing than with neutral stress. In production, the language user knows which focus he intends for the utterance. Considering all our assumptions, in the derivation he executes, the NSR (main-stress rule) would still need to apply: first, because this is an obligatory rule of the phonological component, and next, because it is only with stress assigned that he can know whether the focus he intends is in the focus set. If it is not, stress-shift must apply. This itself is a costly operation, because it requires undoing steps in the derivation. However, the producer’s task ends at this stage. For his addressee, on the comprehension side, there is an additional mission. He is faced with an ambiguous derivation (phonological input), which, by the innate definition of the focus set, allows several focus construals, and he needs to determine which of them the producer intended. Semantically ambiguous derivations always pose greater load on the hearer than on the producer (who always knows which meaning he intends). But in this specific case, where the input derivation contains stress-shift, the hearer needs, in addition, to apply reference-set computation, namely to construct an alternative derivation, additional to the input derivation. Based on this, he would decide whether the derivation is still ambiguous (as in the case of stress shifted to the subject DP), or only one of the focus construals is actually permitted, in which case this must be the focus that the producer intended.
In actual language use, then, reference-set computation of stress-shift is involved only in comprehension. If it is this step of constructing and comparing an alternative derivation which children are unable to perform, we should expect problems in comprehension, but not in production. The central hypothesis under consideration is that children are unable to process reference-set computation, and they resort to some bypassing strategy instead. So far we have examined one strategy—guessing. But focus identification also involves semantic disambiguation. Even with neutral stress, a focus is selected in context out of a set of possible candidates. It is known that both children and adults may use semantic defaults to settle ambiguities, when possible. Hence, the study of acquisition in this area enables us to examine the relations between guessing and
default strategies. To differentiate the two, let us start with an area where semantic defaults do not play a role, and where indeed the guessing pattern was found with stress-shift.
5.2.3 Switch-Reference Resolution
A contrastive stress context that has attracted much attention is anaphora resolution with contrastive pronouns. This was the second contrastive stress experiment of McDaniel and Maxfield (1992) and was studied before in Maratsos 1973 and Solan 1983. The problem is examined in greater depth in Zuckerman, Vasić, and Avrutin 2001 as well as in Baauw, Ruigendijk, and Cuetos 2003, so I will examine their findings here. The structures under consideration are of the type illustrated in (55) of section 5.2.1, and repeated in (70).
(70) a. First Max touched Felix and then Lucie touched him.
     b. First Max touched Felix and then he touched him.
(71) a. Destressing:
              *    *       *                 *
        First Max [touched Felix_i] and then Lucie [touched him_i]
     b. NSR:
                           *                 *
        First Max [touched Felix_i] and then Lucie [touched him_i]
     c. Stress-shift:
                                                            *
                             *               *              *
        First Max_i [touched Felix] and then Lucie [touched him_i]
Recall, first, that within the present system, the stress on the pronouns of (70) is derived by stress-shift. The derivation of (70a) is repeated in (71). Since the whole VP of the second conjunct is anaphoric, it is destressed by the anaphoric destressing rule, independently of the neutral main-stress rule (71a). The NSR then applies to assign main stress to the only available word Lucie (71b). In the relevant context, the pronoun needs to be focused, which is obtained by applying the stress-shift rule, in (71c) (where Lucie retains secondary stress). In the derivation of (70b), stress-shift applies to both pronouns. The anaphora peculiarity of such structures is that the stressed pronoun appears to select the reverse value of its destressed counterpart. In (71b), with a destressed pronoun, it picks up Felix as reference. In (71c) the stressed pronoun picks up Max.
For this reason these structures are sometimes referred to as ‘‘switch-reference’’ instances.
In Zuckerman, Vasić, and Avrutin’s (2001) experiment, twenty-eight children at the ages 4;3–6;2 (and twelve adults) were tested on similar sentences (e.g., First Tinky-Winky hugged Po and then Dipsy hugged him). They were presented with sets of four pictures which were accompanied by a prerecorded sentence that described the events in the pictures. The first picture in each set corresponded to the first conjunct of the sentence. The three remaining pictures contained only one that correctly corresponded to the second conjunct of the sentence. The task of the children was to point to the correct picture out of these. There were twenty-four items for each condition. In twelve the contrastive pronoun was the object, and in the other twelve it was the subject. No difference was found between performance on subject and object pronouns. The percentage of correct answers was 51. As Zuckerman, Vasić, and Avrutin point out, the result is not different from chance.
Zuckerman, Vasić, and Avrutin (2005) present the individual data of the same experiment, which, as they point out, enables matching the results to the binomial model. In this experiment twenty-eight children were tested on twenty-four trials, which provides ample grounds for applying the model. At each trial the child had a choice between the correct and the incorrect answer. Given that the choice is binary, if we hypothesize that the children are guessing, then the distribution of the results should follow the binomial model with $p = 0.5$. Let us see whether this is in fact so. The comparison with the binomial model is given in figure 5.1. The figure and its analysis were prepared by Kriszta Szendrői, based on the figures of Zuckerman, Vasić, and Avrutin, who have also found consistency with the binomial model. The results of Zuckerman and colleagues are in gray. We can see that there were no children who got exactly 0, 1, 2, 3, or
Figure 5.1 Stress-shift pronouns (Zuckerman, Vasić, and Avrutin 2005)
4 answers. There was one child with exactly 5 correct answers, six children had exactly 13 correct results, and so on. The columns in black are the ones predicted by the binomial model. (Note that due to rounding off to integers, i.e., no half kids, the estimate of the expected binomial pattern in black contains an extra child, which accentuates the differences between the actual and the estimated distributions.) The black columns show the number of children expected on the binomial model to have exactly $n$ correct answers, where $n$ is between 0 and 24. ($k_0$ is the expected number of children with 0 correct responses, $k_1$ is the expected number of children with exactly 1 correct response, and so on.) To calculate these, we first have to determine the probability $p$: the probability for a child to give a correct answer on any one trial. In an ideal coin-tossing case, where all the coins are perfectly balanced, $p$ is 0.5. Here we have children, not ideal coins, so the probability is likely to be somewhat different from 0.5. The maximum likelihood estimator for $p$ is determined by the average of the overall results. Altogether there were 344 correct answers in this experiment (as can also be calculated from figure 5.1). Since twenty-eight children took twenty-four trials, altogether there were $28 \times 24 = 672$ answers. Out of these, 344 were correct. This means that the probability that a child gave a correct answer on a trial is $p = 344/672 = 0.5119$, which is not very far from the ideal 0.5. Now the calculation for the black columns can be given. Let us only examine $k_9$, the number of children expected on the binomial model to have exactly 9 correct answers. The probability for a child to have exactly 9 correct responses and 15 incorrect ones, in one particular order, is $p^9 (1 - p)^{15}$. Note that it does not matter which the 9 correct answers were, so long as there were exactly 9. So we multiply this by the coefficient $\binom{24}{9} = 1307504$.
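The quantities introduced so far can be checked numerically. The sketch below is a minimal illustration: the function name `expected_children` and its parameters are hypothetical, while the figures (28 children, 24 trials, 344 correct answers) are those reported in the text. It multiplies the binomial probability of exactly n correct answers by the number of children, which gives the expected count for each column of the predicted distribution.

```python
from math import comb

CHILDREN = 28
TRIALS = 24
CORRECT = 344

# Estimate of p: the overall share of correct answers.
p = CORRECT / (CHILDREN * TRIALS)


def expected_children(n, p=p, trials=TRIALS, children=CHILDREN):
    """Expected number of children with exactly n correct answers."""
    return children * comb(trials, n) * p**n * (1 - p) ** (trials - n)


print(comb(24, 9))                     # 1307504
print(round(p, 4))                     # 0.5119
print(round(expected_children(9), 2))  # 1.88
```

Summing `expected_children(n)` over n from 0 to 24 returns 28, the number of children, since the binomial probabilities add up to 1. The sketch keeps the unrounded value of p, so later digits may differ slightly from a hand calculation with 0.5119.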
Finally, since each child had an equal chance to have exactly 9 correct answers, we get $k_9 = 1307504 \times 28 \times p^9 (1 - p)^{15} = 1307504 \times 28 \times 0.5119^9 \times (1 - 0.5119)^{15} = 1.8788$, which rounds up to 2. So, if children were performing randomly, we expect to find two children who gave exactly 9 correct answers. Performing the above calculation for all k’s, we get the expected distribution indicated in red in the diagram. Comparing the expected distribution in red to the actual results in blue, we see that the expected results fit the actual results reasonably well. So we can conclude that the results of Zuckerman, Vasić, and Avrutin 2005 are consistent with a hypothesis that the children are guessing. We should note that the overlap of actual results with the binomial model is never one to one. In figure 5.1 we observe two striking deviations on 10 and on 13. There is also one child (out of 28) with 19 correct
answers, which one could call adultlike performance. The final decision depends on whether an alternative interpretation is available to account for the discrepancies, which is difficult to imagine for these particular deviations. It seems established that children apply the guessing strategy when faced with the experimental task.
To assess these results, we should get a clearer picture of the type of computation involved in the resolution of anaphora with contrastive stress. Zuckerman, Vasić, and Avrutin (2001) attribute their account to Akmajian and Jackendoff 1970, by which contrastive stress is used to switch reference. The basic intuition is that identifying coreference requires first establishing the anaphora construal of the same derivation without contrastive stress, under parallelism. Parallelism, as they present it, determines that in certain contexts, which Akmajian and Jackendoff describe as characterized by continuance in time and action, the DP in a syntactic position parallel to that of the pronoun will be favored as the antecedent of that pronoun. In Zuckerman and colleagues’ example (72), the subject pronoun must corefer with the parallel subject in the preceding conjunct, and the object pronoun must corefer with the object. (Throughout, italics mark coreference; boldface marks main stress.) In all examples below, the stress is obtained by the neutral main-stress rule, as we just saw for (72b) in (71b).
(72) a. First Mary touched Sue and then she touched Peter.
     b. First Mary touched Sue and then Peter touched her.
(73) John hit Bill and then Mrs. Smith punished him.
(74) Max greeted Felix and then Lucie called him.
As the authors point out, under this broad formulation of the parallelism constraint, it is often “overridden” by other discourse considerations. Thus, in their (73), which they also view as a parallelism structure, the object pronoun can easily refer to the subject John, because the context favors John as the one who is punished.
I am not sure there is sufficient evidence to assume that a parallelism constraint of this broad formulation indeed exists. When the predicates in the two conjuncts are not identical, parallelism is not obligatory even when the context does not impose preferences, as in (74), where either Max or Felix can be equally picked up as the reference of the pronoun. (There may be other discourse preferences operative here, but the strong parallelism effect does not hold.) However, if we narrow the formulation, what is at play here is referential parallelism, as stated by Fox (1998) for ellipsis contexts. In (72), unlike (73), not
just one of the pronouns is anaphoric, but the verb as well. For both, an elliptic form is available, as in (75). (Example (75a) is ambiguous, but we are considering here its construal as in (72a).)
(75) a. First Mary touched Sue and then—Peter.
     b. First Mary touched Sue and then Peter did.
In the framework of ellipsis as PF deletion (e.g., Chomsky and Lasnik 1993; see also section 4.4), it is established that what is (optionally) deleted at PF is destressed material. Elliptic derivations, thus, have nonelliptic counterparts (where PF deletion does not apply), but with similar interpretative properties. It is therefore to be expected that the same parallelism constraint that restricts ellipsis is also operative in the phonologically realized, but destressed counterpart. (The correlation between elliptic and destressed configurations is discussed in detail in Williams 1997.) Fox identified two parallelism conditions, structural and referential, and the one relevant here is the second, which states that NPs in the elided (or destressed) part of the derivation must have the same referent as in the corresponding antecedent part. The details of the condition on referential parallelism are not crucial to the present discussion. What is crucial, under Zuckerman, Vasić, and Avrutin’s (2001) analysis, is that resolving anaphora with stressed pronouns requires first construing the same derivation without stress: “In essence, the stress rule is a two-step operation. The first step entails establishing what the reference for the pronoun would be in a similar unstressed structure. The second step then follows, where the reference that was established in the first step is cancelled or switched” (p. 782). Though Zuckerman and colleagues do not explicitly state this, their account means that the resolution of stressed pronouns requires reference-set computation.
To determine the reference of the stressed pronoun in (76a), one needs to construct another derivation with neutral stress, namely (72b), repeated in (76b). One needs then to compute the pronoun resolution in (76b), according to the parallelism condition. Based on the result of this step, her in (76a) cannot be assigned the reference parallelism determines for it in (76b), namely, it cannot be Sue. So the option left for the pronoun in this context is Mary.
(76) a. First Mary touched Sue and then Peter touched her.
     b. First Mary touched Sue and then Peter touched her.
(77) a. First Mary touched Sue and then she touched her.
     b. Mary touched Sue and she—her. (Gapping)
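The two-step procedure just described can be made concrete with a small Python sketch. It is a toy model under simplifying assumptions, not an implementation from Zuckerman and colleagues: the first conjunct is reduced to a subject/object mapping, the candidate referents are supplied by hand, and the function names are hypothetical.

```python
def neutral_referent(role, first_conjunct):
    """Step 1: resolve the pronoun as if it were unstressed, by parallelism.

    The pronoun takes the referent of the DP in the parallel position.
    """
    return first_conjunct[role]


def stressed_referents(role, first_conjunct, candidates):
    """Step 2: cancel the step-1 referent and keep whatever candidates remain."""
    blocked = neutral_referent(role, first_conjunct)
    return [c for c in candidates if c != blocked]


# First conjunct of (76) and (77): "First Mary touched Sue".
first_conjunct = {"subject": "Mary", "object": "Sue"}
feminine_referents = ["Mary", "Sue"]

# (76a): stressed object pronoun "her".
print(stressed_referents("object", first_conjunct, feminine_referents))

# (77a): both pronouns stressed, so the procedure applies to each in turn.
print(stressed_referents("subject", first_conjunct, feminine_referents))
print(stressed_referents("object", first_conjunct, feminine_referents))
```

The point of the sketch is that step 2 cannot run without the output of step 1, which is computed on a different (neutral-stress) derivation; this is what makes the procedure an instance of comparing two derivations.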
If there are two stressed pronouns, as in (77), then the procedure applies to each of them. So the result is that the underlined object pronoun refers to the subject of the first conjunct, and the subject pronoun refers to its object. (These too are configurations that allow ellipsis, of the gapping type, as in (77b).) If so, then children’s guess pattern in tasks involving (76a) follows directly from our hypothesis that children cannot process the computation required in reference-set comparisons. The processing required by the switch-reference rule is at least equal to the task posed by Rule I, and accordingly, children’s performance is identical: the majority just offer a guess. In fact, the task here is harder than required by Rule I, since the computation of the comparison member (76b) rests on referential parallelism, which itself requires access to two representations (the two conjuncts in (76b)). Independently of our problem, children also do not perform very well on the parallelism requirement, though their performance is not at chance level. (In Thornton and Wexler’s (1999) experiments, children accepted violations of referential parallelism 21 percent of the time.17) In any case, on the present account, Rule I already bypasses children’s processing ability. There is no poorer performance than guessing, so the results would end up the same for Rule I and for the stressed switch-reference computation.18 While this is sufficient to explain children’s performance on switch-reference tasks, the account rests on discovering another area of reference-set computation, independent of that required for focus selection with stress-shift. From the perspective of optimal design, discovering a new linguistic rule that requires this computation is not good news. Rather it makes one wonder how optimally designed language could be, if such rules can be so easily discovered.
Let us therefore examine briefly the option that the processing cost here is reducible to the problem of focus computation with stress-shift, where we know already that reference-set computation is required. Suppose stress-shift has a single function of selecting a focus not in the focus set of the derivation with neutral stress. So when stress shifts to a pronoun, this is to mark it as a focus. The interpretation of the focus, regardless of whether it is obtained by neutral stress or stress-shift, always rests on establishing some contrast set, or a set of alternatives (leaving maximal IP foci aside for the present discussion). In case the focus is a single referential NP, this simply means that the NP must be disjoint in reference from its contrast set. This is, in essence, the intuition underlying Williams’s (1997) Disanaphora Law, which states that the strong (stressed)
element is disanaphoric. In standard contexts, the contrast set need not be associated with any specific position in the previous clause, nor even explicitly mentioned. For example, in (78) both she and Max are foci (Max stressed by the neutral main-stress rule, and she by stress-shift). The contrast set for she is identified contextually as everyone, so the pronoun must be interpreted as not in this set, for which Lucie is the obvious candidate. (The interpretative effect is that everyone is construed as everyone but Lucie.) But the contrast set for Max is not explicitly mentioned. Rather, it is construed as the set of better husbands Lucie could find, from which Max must be excluded.
(78) Everyone thinks that Lucie could marry more wisely. But she loves Max.
(79) Everyone thinks that Lucie could find a better partner than Max.
     a. But she loves him.
     b. But she loves him.
In standard contexts, it is often equally possible to use either a destressed pronoun, or a stressed one (focus), with interpretative nuances. In (79a), for example, only neutral stress applied. The focus is construed as the whole IP, and the relevant contrast is between everyone’s beliefs and Lucie’s love for Max. In (79b), both pronouns are marked as foci, through stress-shift, and the contrast is construed along the lines of (78). But in the specific contexts of switch reference, where the two clauses are identical in all but one element, the contrast set is structurally fixed. This appears to be another aspect of the parallelism requirement, though it does not follow from the way referential parallelism has been stated above. It appears that, in the spirit of Williams (1997), parallelism is a structural condition that matches object with object and subject with subject. Thus, it matches anaphoric (destressed) elements with their structural counterpart in the antecedent clause, but it does the same with the foci (stressed) elements.
However, the switch reference or “disanaphora” outcome of this matching need not be postulated as a specific rule or generalization. In both cases of (76), repeated in (80), the object pronoun of the second clause must be matched with the object of the first, and the subject with the subject.
(80) a. First Mary touched Sue and then Peter touched her.
     b. First Mary touched Sue and then Peter touched her.
The difference is that in (80b) the pronoun is destressed, hence it must be anaphoric to its counterpart (by the general destressing convention
discussed in chapter 3). In (80a) it is the focus, hence, its matching with the object Sue means that this object NP is the contrast set for the focus pronoun, which entails that it is disjoint in reference from this set. In (81), with both pronouns stressed, the object is, again, matched with the object and the subject with the subject. Since the pronouns are stressed, hence, necessarily foci, the matching fixes their contrast sets. Her contrasts with (hence is disjoint in reference from) Sue and she, with Mary.
(81) First Mary touched Sue and then she touched her.
As pointed out in Zuckerman, Vasić, and Avrutin 2001, the fact that she ends up anaphoric to Sue and her to Mary is not part of the switch reference process, but it is the outcome of these being the only two available referents, given that the other anaphoric construal is ruled out. They point out that this is further confirmed with examples like their (82).
(82) John introduced Bill to Harry and then Mary introduced him to Ken.
The theme pronoun must be matched by parallelism with the theme Bill. Since the pronoun is the focus, the matching means contrast, so him cannot corefer with Bill. But the derivation still allows the interpretation of the pronoun as either John or Harry. The same is witnessed even more sharply in (83), from Kriszta Szendrői (personal communication).
(83) Max hit the man near Peter and Felix hit the man near him.
The only construal excluded by parallelism matching is that him refers to Peter. But in the situation of men-fight described here, almost all other construals of the pronoun are allowed (Max, the man near Peter, or Felix himself). On this view, then, there is no need to assume that the neutral stress derivation has to be constructed in order to determine the reference of a stressed pronoun. Rather, it is computed directly on the stress-shift derivation, and it follows from parallelism, just like in the case of destressed pronouns.
So the phenomenon of switch reference itself does not require any reference-set computation, or more generally, there is no special rule of anaphora resolution that needs to be assumed for these cases.19 The outcomes follow from the standard interpretation of focus, combined with parallelism. Why should children, then, fail to process switch-reference derivations? The resolution of anaphora in these contexts rests crucially on focus identification, and since the foci are anaphoric pronouns, they can only be derived by stress-shift. My hypothesis has been that the identification
of foci obtained by stress-shift always requires reference-set computation, which surpasses children’s processing ability. To determine the reference of the stressed pronoun, the child first has to determine if it is the focus or not. By our definition of the focus set, if the subject pronoun is stressed, the focus set includes also the IP, as in (84a). (For simplicity, I ignore here the focus status of Max.)
(84) a. [First Mary touched Sue and then] she touched Max.
        Focus set of the second conjunct: {she, she touched Max}
     b. [First Mary touched Sue and then] she touched Max.
        Focus set of the second conjunct: {Max, she touched Max}
To identify the stressed pronoun as a focus, the child needs to exclude the option that the IP is the actual focus, an option illustrated for an unstressed pronoun in (79a). It is this stage, of identifying the actual focus intended, which requires the construction of the comparison derivation with neutral stress in (84b), thus forming the reference set (84). Given that the IP focus is also available in the neutral stress (84b), it is excluded as a possible focus of (84a), so the stressed pronoun must be selected as the focus. On the hypothesis under consideration, then, it is this preliminary stage of focus identification that the child gets stuck in. As I mentioned, although children have difficulties also with the parallelism condition that applies once the focus has been identified, there is no evidence that it uniformly surpasses their processing ability. I suggest that they do not even get to the stage of processing parallelism, and resort to a guess strategy before. Nevertheless, faced with a processing failure, the strategy children resort to does not always have to be guessing. The task in the switch-reference cases does not involve semantic disambiguation, typical of focus selection. All that is required is to identify the focus by forming a reference set, as a prerequisite for determining the value of the pronoun.
Hence, it is not easy to imagine another strategy but guessing that would enable bypassing the required computation. Let us now turn to areas where the selection of the focus has semantic (truth conditional) impacts. Focus identification in the scope of only is a clear instance. As we will see, in this area, semantic defaults may play a role.

5.2.4 Guess and Default: Focus Identification in the Scope of Only

Let us first review the problem of focus projection in the scope of only which was discussed in chapter 3. The standard assumption is that the potential scope of only is just its c-command domain, where it selects the
focus as its scope. In (85), stress is assigned by the main (nuclear) stress rule to builders. In this case, the scope of only can be either (the narrow focus) builders, or the whole VP that contains it. Suppose our store sells equipment only to builders, but at the same time we also buy used equipment from builders and others. In this situation, (85a), with the narrow focus, is true, but (85b), with the VP-focus, is false.
(85) a. We only sell equipment [F to builders] (not to the general public).
     b. We only [F sell equipment to builders] (we do not buy anything from anybody).
(86) a. We only sell [F equipment] to builders (not health insurance).
     b. #We only [F sell equipment to builders] (we do not buy anything from anybody).
In (86), main-stress shift applied. The sentence can only be used to exclude the option that we sell anything but equipment to builders, but not to exclude anything else, as witnessed by the inappropriateness of (86b). This means that the only element in the scope of only is the narrow focus, the argument bearing the new stress, as in (86a), but not the whole VP. Computing this type of reasoning requires construction of a reference set, which consists of ⟨d, i⟩ pairs of a derivation and an interpretation. In this case, the relevant interpretation is a selection of a focus out of the focus set. Alternative (86b) is ruled out, because the focus it selects, the VP, is in the focus set of (85b), which is a derivation with no application of main-stress shift. This means that this operation entails a computational complexity, whether the final outcome is “in” or “out.” On the other hand, there is no reason to assume any reference-set computation in derivations involving no stress-shift (as would be assumed in some optimality approaches).
Halbert et al. (1995) checked the interaction of only with what they call “emphatic stress,” as in their (87).
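As a rough illustration of the reference-set reasoning for (85) and (86), the comparison can be modeled as a set difference. This is a sketch under stated assumptions: each focus set is written out by hand with just the two members discussed in the text (it is not derived from a syntactic structure), and the names `licensed_foci` and `ruled_in` are hypothetical.

```python
# Focus set of the neutral-stress derivation (85): main stress on "builders".
NEUTRAL_FOCUS_SET = frozenset({"to builders", "sell equipment to builders"})

# Focus set of the stress-shift derivation (86): main stress on "equipment".
SHIFTED_FOCUS_SET = frozenset({"equipment", "sell equipment to builders"})


def licensed_foci(shifted, neutral):
    """Foci of the stress-shift derivation that neutral stress could not yield."""
    return shifted - neutral


def ruled_in(focus, shifted=SHIFTED_FOCUS_SET, neutral=NEUTRAL_FOCUS_SET):
    """Is this <derivation, focus> pair allowed for the stress-shift derivation?"""
    return focus in licensed_foci(shifted, neutral)


print(ruled_in("equipment"))                   # the construal in (86a)
print(ruled_in("sell equipment to builders"))  # the construal in (86b)
```

Note that both calls consult the neutral-stress focus set, whether the answer is "in" or "out"; on the account in the text, it is this consultation of a second derivation that carries the processing cost.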
Under the analysis in chapter 3, the main-stress rule assigns stress to the dative object Miss Piggy, and in (87), stress-shift has applied.
(87) Daisy only gave a cherry to Miss Piggy.
Thirty-three children age 3;6 to 6;6 were tested. The experiment was a truth-value judgment task. One experimenter told a story. Another experimenter, playing the puppet, then played a tape with a sentence about the story (prerecorded to guarantee the correct stress, on the pretext that the
puppet has a sore throat), and the child was asked if the puppet was right or wrong. For (87) the story was about Daisy Duck, who had a famous restaurant, with hot dogs and cherries. Miss Piggy, who had spent her day in the gym, came in very hungry. But Daisy explained that it is not good to eat much after working out in the gym, and therefore offered her a cherry, which she accepted. Then a spaceman arrived, who was not familiar with earth food, and Daisy gave him a hot dog. In this context, (87) is true under the narrow construal of focus (the only thing Daisy gave Miss Piggy was a cherry). But it is false on the wide construal, with the VP/IP as focus, namely that the only thing that happened was that Daisy gave a cherry to Miss Piggy. Uttered in this context, no adult will have a problem judging (87) as true (as also attested by the judgments of the adult control group in the experiment). This judgment requires first identifying the focus of the sentence (in order to compute only). Since stress-shift is involved, a reference set needs to be consulted, to check whether this is the only way to obtain cherries as focus, and this is indeed the case. (The example is precisely analogous to (85)–(86), where the computation was discussed.) The results of the experiments were that only 46 percent of the children judged it true, so that the results fall within the 50 percent range. Halbert and colleagues proceed to show that children’s problems with such tasks cannot be attributed to a general stress deficiency. (Their experiment on this was described in section 5.2.2.) So the problem must be related to stress-shift. For the processing account assumed here, this is the expected result. The required computation surpasses the processing abilities of children, so they resort to a strategy. As noted, in this approach it should not matter whether we test sentences ruled in or ruled out by the reference-set computation, since both involve the same computation.
In the coreference case (of Rule I), the tested sentences were those ruled out by the coreference rule, with the exception of one test, discussed in section 5.1. However, as explained there, the prediction of the analysis is that we should find the same delay effects in cases ruled in. In the stress-shift experiment, children indeed faced problems with sentences ruled in. The prediction is, of course, that we should also find the same pattern in contexts where stress-shift is ruled out (which were exemplified in (86b)). The whole point is that children know what they should do to judge whether the sentence is ruled in or out in the given context, but since they cannot perform the computation, they apply some strategy. Halbert et al.’s study was carried out independently of the question of whether reference-set computation may entail processing difficulties for
children. However, three subsequent studies replicated their experiment, with this explicit question in mind (on English, Gualmini, Maciukaite, and Crain 2003 and Gennari et al. 2001; on Dutch, Szendrői 2003). They compared children’s performance on sentences with stress shift, like (87), repeated in (89), to their performance on sentences with neutral stress, like (88). In the story context, (88) is judged by adults to be false, while (89) is judged true. (The experimental design is as described for Halbert et al.’s study above.)
(88) Daisy only gave a cherry to Miss Piggy. (Adult answer in context: no)
(89) Daisy only gave a cherry to Miss Piggy. (Adult answer in context: yes)
In terms of the global statistics of group performance, they found similar results. In Gualmini, Maciukaite, and Crain 2003, children gave an adultlike (no) reply 87 percent of the time for (88), and only a 35 percent adultlike reply (yes) for (89). In Gennari et al. 2001 it was 97.5 percent for (88) and 36.5 percent for (89). In Szendrői’s (2003) experiment on Dutch-speaking children, it was 84.8 percent for (88) and 52.2 percent for (89). It seems pretty established that, as in the case of Condition B, children show an acquisition delay of foci obtained by stress-shift, and that here as well their (group) performance is roughly in the range of 50 percent. Based on this data, it seems safe to conclude that (at least a large group of) children do not carry out the required processing for the reference-set computation of stress-shift outputs, and resort instead to some strategy. A different question is whether the strategy is guessing, as in the case of Condition B and the switch-reference cases. The question arises because, in the case of focus interpretation, there are also default strategies available. Gualmini, Maciukaite, and Crain (2003) argue that the strategy cannot be guessing, based on analysis of the reasons given by the children who replied no (the nonadult response).
The story and text sentences I use to illustrate their findings are those of Szendrői (2003), to which I want to turn next. But the conditions in the two experiments were identical. The participants in the story are Tigger, Piglet, and Winnie the Pooh. They are playing in a garden where there is a lot of old furniture around. Tigger wants to show how strong he is, and he throws furniture over to the others. The relevant propositions that are true in the story context are given in (90). The target sentence with a stress-shift focus is (91). The neutral-stress sentence is (92).
(90) a. Tigger threw a chair over to Winnie.
     b. Tigger threw a table over to Winnie.
     c. Tigger threw a chair over to Piglet.
(91) Tigger only threw a chair to Piglet.
(92) Tigger only threw a chair to Piglet.
Statement (91) with chair as focus is true in this context (the only thing that Tigger threw to Piglet is a chair). But (92) is false, whether it is construed with Piglet or the VP as focus, because Tigger also threw a chair to Winnie. The children who answered no to (91) were asked what they thought had really happened. Gualmini and colleagues report that most of them answered that the puppet was wrong because Tigger also threw a chair to Winnie. This is a very surprising result (that would not be found with adults). It appears that children use the falsifying condition of (92) to falsify (91). Gualmini and associates take this to mean that children interpret Piglet as the focus of (91). This suggests, they argue, that children apply a fixed default that takes the indirect object as focus, or, in the similar conclusion reached by Gennari et al. (2001), that children interpret the focus element to be the last NP (the NP that would bear the neutral stress of the sentence) rather than the stressed NP. If this is the correct interpretation, then children have a bigger deficiency than we have assumed so far: they are unable to identify the focus set associated with the derivation. Recall that under the present analysis, the focus set is computed directly on stress and structure, and it consists of all the constituents that contain the main stress, regardless of whether it was assigned by the neutral main-stress rule, or by stress-shift. The focus sets of (91) and (92) are given below.
(93) a. Tigger only threw a chair to Piglet.
     b. Focus set: {a chair; threw a chair to Piglet}
(94) a. Tigger only threw a chair to Piglet.
     b. Focus set: {Piglet; threw a chair to Piglet}
The VP (threw a chair to Piglet) is in the focus set of both derivations, but they differ regarding the other member in the set. In (94), either member of the focus set can, in principle, be selected in context. But in (93), a further consideration applies. Since the derivation involves stress-shift, members of the focus set that would also be found in the derivation without stress-shift are discarded. The upshot is that for adults, the chair is the only possible actual focus. In any case, Piglet is not a member of the focus set of (93). Nevertheless it appears to be selected as the focus by the
relevant children. If this is the case, then Gualmini and colleagues (2003, 96) are correct in pointing out that “this invites the conclusion that children do not make use of prosodic prominence to determine the associate of the focus operator only.” But, as pointed out in Szendrői 2003, this is not the only possible interpretation of the findings. Before we turn to her alternative interpretation, let us pay more attention to the semantic conditions underlying the experiment. Children’s task in explaining why the puppet was wrong is to provide the falsifying condition, the proposition that makes the sentence false under a given focus construal. To analyze the children’s responses, we need, first, to get clear about the falsifying conditions for each member of the focus set. The list of true propositions in the context was given in (90), repeated below. We assume that the stress shifted (93) has two members in its focus set. The interpretation of the sentence for each of these members is given in (95) and (96).
(90) a. Tigger threw a chair over to Winnie.
     b. Tigger threw a table over to Winnie.
     c. Tigger threw a chair over to Piglet.
(95) Narrow (DP) focus selection for (93)
     The only thing that Tigger threw to Piglet is a chair.
(96) Wide (VP) focus selection for (93)
     The only thing that Tigger did is throw a chair to Piglet.
(97) Narrow (Dative DP) focus selection for (94)
     The only one that Tigger threw a chair to is Piglet.
(98) Wide (VP) focus selection for (94)
     The only thing that Tigger did is throw a chair to Piglet.
Assertion (95) would be falsified if Tigger had also thrown to Piglet something other than a chair. There is no proposition corresponding to this in the context (90), hence (95) is true. For (96), the falsifying conditions are (90a) and (90b). It is of course enough that one of them holds, to falsify this focus construal, but the experimental context happens to have both.
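The falsifying conditions for the construals in (95)–(98) can be checked mechanically against the context in (90). The following Python sketch is only an illustration: the story is reduced to (thing thrown, recipient) pairs, the construal labels and the function name `falsifiers` are hypothetical, and "only" is given the simple exclusive reading paraphrased in (95)–(98).

```python
# Propositions true in the story, keyed by their labels in (90).
CONTEXT = {
    "90a": ("chair", "Winnie"),
    "90b": ("table", "Winnie"),
    "90c": ("chair", "Piglet"),
}


def falsifiers(construal, context=CONTEXT, thing="chair", goal="Piglet"):
    """Return the labels of the context propositions that falsify a construal."""
    tests = {
        # (95): the only thing thrown to Piglet is a chair.
        "narrow_object": lambda t, g: g == goal and t != thing,
        # (97): the only one a chair was thrown to is Piglet.
        "narrow_dative": lambda t, g: t == thing and g != goal,
        # (96)/(98): the only thing that happened is throwing a chair to Piglet.
        "vp": lambda t, g: (t, g) != (thing, goal),
    }
    falsifies = tests[construal]
    return sorted(label for label, (t, g) in context.items() if falsifies(t, g))


for construal in ("narrow_object", "vp", "narrow_dative"):
    found = falsifiers(construal)
    print(construal, "true" if not found else f"false, falsified by {found}")
```

Run on this context, the sketch shows why a reply citing (90a) is uninformative on its own: (90a) falsifies both the VP construal and the narrow dative construal, whereas (90b) falsifies only the VP construal.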
If a child selects the VP-focus construal of (93), he would provide one of these propositions as an answer. Gualmini and colleagues report only answers based on (90a), but Szendrői found also responses containing (90b). The falsifying conditions are, of course, identical if the VP is selected as focus in (94), as in (98). The source of the puzzle that led Gualmini and associates to their interpretation of the findings is that (90a) is also the falsifying condition for the narrow-focus construal of
the neutral-stress derivation, in (97). So they took replies with (90a) to indicate that the latter is the focus construal of the relevant children. Szendrői (2003) argues that, in fact, the nonadultlike responses reflect a choice of the VP-interpretation. She shows that there is no reason to assume that children cannot identify the focus set. Rather, what their responses indicate is that about half of them selected the VP-focus out of this set, which would be ruled out by the required reference-set computation. Szendrői proceeds to point out that there are in fact two distinct hypotheses that the present analysis rests on, which can both be tested by acquisition performance. The first is that reference-set computation is indeed involved in focus resolution with stress-shift, and the second is that children are unable to execute this computation. The last finding, that children access the seemingly irrelevant VP-focus, is, in and of itself, strong evidence for the first assumption. As mentioned, the theoretical problem with stress-shift foci is similar to that raised by Rule I. The actual contexts that allow a pronoun to corefer in apparent violation of the binding requirement of Condition B are rather minimal, so the assumption that a complex reference-set computation is involved in coreference resolution may seem speculative. Similarly, in the case of focus, there are only very limited contexts that motivate the assumption that such computation takes place, namely, the cases where the stress has shifted but the focus still projects (e.g., in subject DPs). Other than these, it appears easy to imagine different approaches that would not require any such computation for the relevant problem of focus-projection. Alternatives can be either changing the definition of focus set, so in the case of stress-shift it contains only the stressed element, or similar manipulations of focus features (discussed in Szendrői 2001).
Statistically, in an overwhelming majority of cases such a formulation would capture the facts. But under such views, it would be impossible to explain why children access the VP-focus at all in such derivations. (A similar point was made by Gennari et al. 2001, who interpret what they view as the selection of the indirect-object focus as an indication that children access a derivation with neutral stress.) The fact that many children get stuck in this first stage of executing the required computation and select the VP-focus is the most direct evidence that this focus construal is active in the computation of focus with stress-shift. However, accessing all members of the focus set is only the first stage in determining the actual focus of sentences with stress-shift. The next step is computing whether any of these members could also be obtained without stress-shift, namely, the reference-set computation. The second hypothesis
under examination is that children are unable to execute this computation, due to limitations of their working memory. We have already seen that children indeed do not complete this computation the same way adults do, and we are concerned here with the question of what strategy they use instead. Based on the group statistics examined so far, the strategy could still be guessing. Children have to decide whether the focus is the wide VP or the narrow DP. About half the time they select the one and thus give what appears to be an adult answer; the other half of the time they select the other. However, in an individual analysis of children’s responses, Szendrői (2003) showed that uniform performance of individuals across trials is substantially larger than could be predicted by a binomial model. Out of the twenty-three children in Szendrői’s experiment, seven consistently gave the yes answer on the stress-shift condition like (93), which means they consistently selected the narrow focus (a chair). Nine children consistently rejected the sentence, which means they selected the wide-VP focus. The other seven children had less consistent responses. It seems that at least two subgroups of the children have internalized a default selection of either the narrow or the wide member of the focus set. In the case of foci in the scope of only, it has been established that both adults and children also apply default strategies with neutral stress, where no reference-set computation is involved. So, as Szendrői points out, it is possible that they are also just applying their independent strategies in the case of stress-shift. The next question, then, is whether, indeed, children’s difficulties with stress-shift indicate a specific inability to compute the reference set.
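The contrast between guessing and fixed defaults can be made concrete with a small calculation. The sketch below is illustrative only: it takes the counts reported above (twenty-three children, seven plus nine of them consistent) and asks how many uniformly responding children a pure-guessing binomial model would lead one to expect. The number of trials per child (`N_TRIALS`) is not given in this passage, so the value used is an assumption, as is the reading of "consistent" as "same answer on every trial"; the function names are likewise hypothetical.

```python
from math import comb

N_CHILDREN = 23        # from the text
N_CONSISTENT = 7 + 9   # consistent "yes" plus consistent "no" children, from the text
N_TRIALS = 4           # ASSUMPTION: the number of trials per child is not stated here


def prob_uniform_under_guessing(n_trials: int, p_yes: float = 0.5) -> float:
    """Probability that a pure guesser gives the same answer on every trial."""
    return p_yes ** n_trials + (1 - p_yes) ** n_trials


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that one guessing child looks "consistent" across all trials.
p_uniform = prob_uniform_under_guessing(N_TRIALS)
# How many such children guessing alone would produce, on average.
expected_consistent = N_CHILDREN * p_uniform
# How surprising the observed number of consistent children is under guessing.
tail = binomial_upper_tail(N_CONSISTENT, N_CHILDREN, p_uniform)

print(f"P(uniform answers | guessing) = {p_uniform:.3f}")
print(f"expected consistent children  = {expected_consistent:.2f} of {N_CHILDREN}")
print(f"P(at least {N_CONSISTENT} consistent | guessing) = {tail:.2e}")
```

Under these assumptions only a handful of children would be expected to answer uniformly by chance, which is the sense in which the observed consistency exceeds what a binomial model predicts. A different trial count changes the numbers but not the direction of the comparison.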
5.2.5 Useful and Arbitrary Defaults
Focus resolution provides an opportunity to examine the relations of reference-set computation to the resolution of semantic ambiguity, a question raised before, in the discussion of Rule I. Independently of stress-shift, focus resolution (with neutral stress) is a procedure that involves selecting the relevant focus out of the focus set associated with the derivation. This, then, is an instance of semantic ambiguity. Unlike the case of reference-set computation, the competing alternatives are directly available locally in the derivation, and need not be constructed or retrieved. Nevertheless, the resolution of semantic ambiguity requires comparing two (or more) semantic representations relative to context. This alone already imposes some load on working memory (e.g., Crain and Steedman 1985; Altman and Steedman 1988). Returning to the hypothesis of optimal design, semantic ambiguities can be viewed as a sort
of imperfection of the computational system. A system that would fit the limited hardware of the human processor perfectly would not require holding two representations in working memory. Indeed, though it is pretty well established that adults have the full ability to carry out these contextual selections, this is also the area where they develop default strategies enabling them to bypass the comparison and selection procedure, when possible. Crain, Ni, and Conway (1994) point out that these defaults can be experimentally witnessed when sentences are given with no context, as in some processing experiments, or when the context is consistent with more than one interpretation. It is not established that the required computation exceeds children’s processing ability, but, as we will see, they do develop defaults as well. Semantic defaults are not computed individually for each derivation, but rather provide a ready-made way to make the selection without an actual contextual computation. Nevertheless, what distinguishes a default from mere guesswork is that there are underlying principles that determine which defaults may be set. Crain and Hamburger (1992) argue that in resolving ambiguity, the semantic parser is motivated by the need to minimize cognitive effort, or overload on the limited working-memory capacity. The generalization behind the principle they propose is that the semantic parser tries to reduce the risk of making commitments that will need to be changed later. As Crain, Ni, and Conway put it, the semantic parser is guided by a “minimal commitment” preference. Crain, Ni, and Conway (1994) illustrate this principle with instances of semantic entailment; that is, one of the members of the set of possible interpretations entails the other. Focus construal with only, in neutral-stress sentences, is one such instance. Let us examine this with the sentence that we have been following, repeated in (99)–(101).
(99) a. Tigger only threw a chair to Piglet.
b. Focus set: {Piglet; threw a chair to Piglet}
(100) Wide (VP) focus selection for (99)
The only thing that Tigger did is throw a chair to Piglet.
(101) Narrow (Dative DP) focus selection for (99)
The only one that Tigger threw a chair to is Piglet.
The wide-focus construal with only always entails the narrow-focus construal. Specifically (100) entails (101). This means that the situations in which the wide construal (100) is true are a subset of the situations in which the narrow construal (101) is true. (Alternative (100) rules out
many more situations than (101) does. For example, if (100) is true, it cannot also be true that Tigger threw a table to Winnie, as in our story context. But (101) remains true in this situation.) Another way to describe the distinction is that (100) is more informative than (101); that is, accepting it means undertaking a bigger commitment. The principle of “minimal commitment,” or lowering the risks of the semantic parser, entails that in such contexts adults should prefer the less committal narrow-scope construal (101). An intuitive way to motivate this preference is that when a hearer selects this option of interpreting a speaker’s utterance, he or she maximizes the chances of being in accord with the speaker’s intentions. Crain, Ni, and Conway conducted several experiments that show that such a preference is indeed strongly operative in adults. Although they do not discuss this in their 1994 paper, the same entailment relations hold in the cases of quantifier scope that we examined in section 2.1.3. As we saw there, one of the scope construals of (102) entails the other.
(102) Everybody in this room speaks two languages.
(103) Wide existential scope
There are two languages such that everybody in this room speaks these same two languages.
(104) Narrow existential scope
For every person x in this room, there are two languages that x speaks (possibly different languages for different persons).
As I argued in chapter 2, in this case, the scope of the existential quantifier is determined by where the choice function that corresponds to the indefinite DP is existentially closed. So no QR-movement is involved in any of the readings. This means that as far as the computational system goes, they are equally permitted. (Reference-set computation is required only when an illicit covert operation needs to apply.) But examined from the perspective of the semantic parser, or efficiency of communication, the situation is precisely the same as in (99).
Construal (103) entails (104); that is, (104) is consistent with more situations in the world than (103) is, or (103) is true in a subset of the situations where (104) is. Though both scope construals are equally permitted, the principle of minimal commitment entails that in the absence of context, as in this example, the default preferred interpretation would be (104), which closes fewer options than (103), and thus maximizes the language user’s chance of being correct.
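The entailment reasoning for (99)–(101) can be checked mechanically on a toy model. The sketch below is an illustration under stated assumptions, not part of the analysis itself: situations are modeled as sets of throwing events by Tigger over a hypothetical two-object, two-recipient universe, both readings are taken to include the prejacent (that Tigger threw a chair to Piglet), and a "commitment" default is rendered simply as picking the reading with the largest or smallest truth set. All names are invented for the illustration.

```python
from itertools import chain, combinations

# Hypothetical miniature universe of things Tigger might have thrown, and to whom.
OBJECTS = ("chair", "table")
RECIPIENTS = ("Piglet", "Winnie")
EVENTS = [(o, r) for o in OBJECTS for r in RECIPIENTS]


def all_situations(events):
    """Every subset of the possible events counts as one situation."""
    subsets = chain.from_iterable(
        combinations(events, k) for k in range(len(events) + 1)
    )
    return [frozenset(s) for s in subsets]


def wide_vp(s):
    """(100): the only thing Tigger did is throw a chair to Piglet."""
    return s == frozenset({("chair", "Piglet")})


def narrow_dp(s):
    """(101): the only one Tigger threw a chair to is Piglet."""
    return ("chair", "Piglet") in s and all(
        r == "Piglet" for (o, r) in s if o == "chair"
    )


SITUATIONS = all_situations(EVENTS)
READINGS = {"wide (100)": wide_vp, "narrow (101)": narrow_dp}
TRUTH_SETS = {
    name: {s for s in SITUATIONS if holds(s)} for name, holds in READINGS.items()
}


def entails(a, b):
    """Reading a entails reading b if a's truth set is a subset of b's."""
    return TRUTH_SETS[a] <= TRUTH_SETS[b]


def default_reading(strategy):
    """Minimal commitment picks the largest truth set; maximum commitment the smallest."""
    pick = max if strategy == "minimal commitment" else min
    return pick(TRUTH_SETS, key=lambda name: len(TRUTH_SETS[name]))


# The story context: a chair to Piglet and a table to Winnie.
story = frozenset({("chair", "Piglet"), ("table", "Winnie")})
print(entails("wide (100)", "narrow (101)"), wide_vp(story), narrow_dp(story))
print(default_reading("minimal commitment"), "/", default_reading("maximum commitment"))
```

In this toy model the wide reading holds in a proper subset of the situations where the narrow reading holds, and the story context falsifies the wide reading while verifying the narrow one. The same subset check would apply to the scope construals (103) and (104), given a suitable model of speakers and languages.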
Returning to the focus-selection default, Crain, Ni, and Conway discovered that children too use default strategies in the disambiguation of only sentences, but for many children it is the opposite default, of “maximum commitment.” Rather than the adult choice of the narrow-focus interpretation (101), these children select the wide-focus VP (100), which excludes more situations. The account Crain and colleagues give for this default is that it is the most efficient strategy to acquire a potential semantic interpretation. The more children exclude, the more they will get the chance to add, based on positive evidence, while subtracting from the initial hypothesis may require negative evidence. Crain, Ni, and Conway assume that this strategy reflects an obligatory requirement for language acquisition, whenever one interpretation is a subset of the other. This is their semantic subset principle, which determines that to avoid subset learnability problems, learners initially hypothesize an interpretation that makes a sentence true in the smallest set of situations. But it is not clear from their experimental studies whether indeed all children follow this default. A VP-default was identified for three out of six children in one of their experiments, and for eight out of twelve in the other. It may turn out that some of them apply the narrow default, just as adults do. It is also not established whether a majority of children also apply the same principle of maximum commitment in other instances, like the quantifier-scope example above. Furthermore, I am not sure that the semantic-subset cases pose a learnability problem in the classic sense. This would be the case if the range of interpretative options was to be acquired just based on experience. However, the hypothesis here is that semantic knowledge, like syntax, is innate. So the child knows innately that a derivation like Tigger only threw a chair to Piglet is ambiguous.
Specifically, the innate definition of the focus set (combined with knowledge of the c-command scope of only) identifies two foci in this derivation. The child also knows the truth conditions associated with each of the focus selections. Possibly what motivates the VP-default children is the drive to discover actual uses of the two interpretations they know are associated with the sentence. If this is the case, then consistently selecting the reading compatible with the fewest situations indeed enables them to identify the contexts associated with the broader readings. Given that it is not yet established that all children always abide by the semantic-subset principle, or maximum commitment, it may even be the case that the default the children settle on reflects other aspects of their personality, like curiosity (maximally exclusive reading) versus the ambition to succeed
(minimally exclusive reading). A decision on these matters is not crucial for the present discussion. In any case, the characteristic property of defaults used for semantic disambiguation is that they are useful, relative to a goal. Unlike the option of bypassing contextual computation by guessing, the selected default has greater chances of meeting the goals, whether these are the standard ones of minimum mistakes and revisions, or some special motivation favored for the acquisition stage. However, when it comes to reference-set computation, the semantic-disambiguation defaults turn out to be useless. The reason is that what needs to be computed is not just the appropriateness of a given interpretation to context, but whether this interpretation is not independently ruled out. Let us view the task of a child encountering a stress-shift sentence like Tigger only threw a chair to Piglet. We saw that the child correctly identifies the focus set and thus the two semantic representations of the sentence, that is, the different sets of truth conditions associated with each. In the given experimental setting, the sentence is false under the wide-focus VP reading and true under the narrow-DP reading. The child has to determine which of these is intended, in order to answer whether the sentence is true or false. We assume that children can identify stress, that is, that they know the main-stress rule. Hence, the child realizes that the derivation involves stress-shift and thus that it violates the basic stress rule. We assume next that, as with Rule I, the child knows that when such a violation occurs, reference-set checking should be executed to determine that the contextual needs could not be satisfied without this violation. This entails mentally constructing an alternative derivation with neutral stress, and filtering out any focus construal that is also available at that derivation. But the child cannot carry out this computation.
In the case of coreference or switch-reference resolution, the only option left is guessing. In the case of focus, there appears to be another option. Superficially, processing both neutral-stress and stress-shift derivations involves disambiguation, namely selecting a member from the focus set, so the child may decide to use the general disambiguation default for only foci, and thus avoid the required reference-set computation. Suppose, then, that our child has a wide-focus VP-default. He or she would then apply the same default in the reference-set task. However, this default is irrelevant for the task, because the VP-focus is not available with stress-shift. It is impossible to learn anything about the contexts favoring wide VP-construals, by systematically using this construal when a different module of the language system rules it out, independently of
the subset problem. Opting for an irrelevant default can thus be viewed as another form of guessing, or at least a systematic way to bypass the required computation. But the arbitrary selection that children with the VP-default make is set in advance for all tasks, so their performance is consistent across trials: they will always be wrong, in comparison to the adult response. If a child is set on the narrow-focus default for semantic disambiguation, applying this same default to bypass the reference-set computation turns out more useful, though again for irrelevant reasons. The narrow default guides the user to always avoid the wide-focus option. Since in the relevant instances of stress-shift it is the wide focus that is excluded by the required reference-set computation, children who use this default will always end up with the correct (adultlike) response. Note, however, that this apparent success is nevertheless based on an irrelevant default. The only reason this default is useful is statistical probability. It is indeed the case that in most instances the reading excluded by the semantic default is also excluded by the relevant reference-set computation. So based on probability, one’s chances of being correct when applying this default are high. In fact, since this irrelevant default is so successful statistically in contexts of stress-shift in the scope of only, it is possible that adults as well adopt it to bypass the required computation. Recall that in Szendrői’s (2003) experiment, three groups were identified: one that systematically fails the task across trials, one that systematically succeeds, and an undecided group. Though further experiments are needed to confirm this, it is possible that the first is the group that borrows the wide VP-default for the task, the second is the group that borrows the narrow default, and the third are children who just guess.
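The difference between executing the reference-set computation and bypassing it with a borrowed default can be sketched in a few lines. This is a deliberately schematic illustration: the focus sets are written as plain strings, the stress-shifted focus set is assumed to contain the stressed DP (a chair) and the VP, as in the experimental sentence discussed above, and "wide" versus "narrow" is approximated by constituent length in words. The function names are hypothetical.

```python
# Focus set of the neutral-stress derivation, as in (99b).
NEUTRAL_FOCUS_SET = {"Piglet", "threw a chair to Piglet"}
# ASSUMPTION: focus set of the stress-shifted derivation (main stress on "a chair").
SHIFTED_FOCUS_SET = {"a chair", "threw a chair to Piglet"}


def reference_set_filter(shifted, neutral):
    """Keep only the foci that could not be obtained without stress-shift."""
    return shifted - neutral


def apply_default(focus_set, default):
    """Bypass strategy: choose by a borrowed default instead of computing.

    'wide' picks the longest constituent, 'narrow' the shortest.
    """
    by_size = sorted(focus_set, key=lambda focus: len(focus.split()))
    return by_size[-1] if default == "wide" else by_size[0]


def is_adultlike(choice):
    """A choice is adultlike if it survives the reference-set computation."""
    return choice in reference_set_filter(SHIFTED_FOCUS_SET, NEUTRAL_FOCUS_SET)


for default in ("wide", "narrow"):
    choice = apply_default(SHIFTED_FOCUS_SET, default)
    print(f"{default:6} default -> {choice!r}, adultlike: {is_adultlike(choice)}")
```

The point of the sketch is that `apply_default` never consults the neutral-stress derivation at all. The narrow default coincides with the output of the filter in this configuration, and the wide default never does, but neither outcome reflects the computation having been carried out.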
For focus-identification tasks in the scope of only, it turns out to be impossible to check further whether the third of the children in Szendrői’s experiment who systematically respond correctly across trials are successful in carrying out the required reference-set computation, or are just borrowing the narrow-focus default from the area of semantic disambiguation. For these reasons, it may be more useful to study children’s identification of focus with stress-shift in different semantic contexts, and not in the scope of only. To conclude the discussion of the acquisition of stress-shift, we saw that (at least the majority of) children do not carry out the required reference-set computation, but they utilize some bypassing strategy. The strategy can be guessing, as in the case of switch-reference, or it can be an irrelevant semantic default, available for other tasks of semantic
disambiguation. Either way, since they do not execute the required computation, whether their answers in the experimental tasks correlate with those of adults is purely accidental.
5.3 Acquisition of Scalar Implicatures
Historically, scalar implicatures were among the first areas where what is known today as reference-set computation was argued to exist. Grodzinsky, Reinhart, and Wexler (1990) argued that in this area we should expect to find the same acquisition pattern that had by then already been established for Condition B, or Rule I. That is, children’s performance on scalar implicatures should also be in the range of 50 percent. This was based on the assumption of Reinhart (1983a) that the coreference rule itself is a special instance of generalized conversational implicatures;20 hence the computational complexity in other instances of scalar implicatures must be of the same magnitude. If children’s 50 percent performance on the relevant coreference tasks is due to processing difficulties, they should have the same difficulties with scalar implicatures. Reducing the coreference rule to implicatures was mistaken, for various reasons, some of which are discussed in chapter 4. Nevertheless, the computation assumed for deriving scalar implicatures in the Gricean-based framework has the properties of reference-set computation, as outlined in this book, although the trigger and the details of the computation are different from those involved in Rule I. For this reason, scalar implicatures are listed in Reinhart (1999b) as one of the four areas where reference-set computation is involved and hence, the 50 percent range in acquisition is still to be expected. The essence of the view of implicatures that stemmed from Grice 1975 is that certain aspects of the meaning of sentences in context are derived from considering a set of alternative options speakers had for expressing their intentions. For instance (skipping many finer details that have by now been widely discussed), utterance (105a) invites the hearer to consider the possible alternative (105b). Alternative (105b) entails (105a) and is thus more informative than (105a).
Assuming that speakers attempt to be maximally informative (or that the context is such that being maximally informative is required), the hearer would infer that the reason the speaker has avoided the more informative option is that implicature (105c) holds, which is obtained by negating alternative (105b). Formulating the computation of implicatures in terms of speaker and hearer’s intentions, as in the Gricean tradition, is not crucial. Rather the question
can be stated as determining which assertion is added to the context set at the context interface.
(105) a. Lucie studies linguistics or chemistry.
b. Alternative: Lucie studies linguistics and chemistry.
c. Implicature: It is not the case that Lucie studies both linguistics and chemistry.
(106) Lucie studies linguistics or chemistry but not both.
Grice’s original formulation left open a serious question regarding how the set of relevant alternatives is determined, or what could count as a potential alternative. For instance, why should we consider (105b), rather than the possible sentence (106), to be the alternative of (105a)? Had (106) been a relevant alternative, we could infer that the speaker avoids it and uses (105a) instead, to also allow for the option that Lucie studies both subjects. The initial assumption was that the alternatives must be equally “brief” (that is, they must conform equally to Grice’s maxim of manner), which is not true of (106), compared to (105a). But that still raised questions of how “equal brevity” can be measured. In the case of (106), opting for the most informative phrasing requires using an elaborate conjunction. But in the case of numerals, the speaker’s intention could easily be clarified by using the modifier exactly. How is one to determine, then, that when the speaker utters Max has three children the excluded alternative is Max has four (or more) children, rather than Max has exactly three children? More generally, the search for relevant alternatives must be restricted in some systematic way. This problem was solved with the development of the notion of a scale in Horn 1972 and Gazdar 1979, which was brought back to the forefront in Chierchia 2004. On this view, the set of alternatives is lexically determined. The relevant lexical items that can give rise to scalar implicatures are listed in the lexicon together with their scales, which are internally ordered by strength (translatable to entailment relations).
Along with the {or, and} scale, the positive-quantifier scale includes {some < many < most < all}. Some other Horn scales are given in (107).
(107) a. Modals: {possibly < necessarily}, {may < should < must}
b. Adverbs of quantification: sometimes < often < always
c. Propositional attitude verbs: believe < know
d. Completion verbs: start < finish
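The idea that alternatives are drawn only from a lexically listed scale lends itself to a small sketch. The code below is a simplified illustration of the first, Gricean-style step only: it stores the scales mentioned in the text as ordered tuples, looks up the stronger scale-mates of an item, and forms candidate implicatures by substituting each stronger item and negating the result. The string-substitution treatment of sentences and all function names are assumptions made for the illustration; it does not model the fuller comparison of logical representations.

```python
# Scales from the text, each ordered from weakest to strongest.
SCALES = [
    ("or", "and"),
    ("some", "many", "most", "all"),
    ("possibly", "necessarily"),
    ("may", "should", "must"),
    ("sometimes", "often", "always"),
    ("believe", "know"),
    ("start", "finish"),
]


def stronger_alternatives(item):
    """Return the scale-mates that are stronger than the given item."""
    for scale in SCALES:
        if item in scale:
            return list(scale[scale.index(item) + 1:])
    return []  # not a scalar item: no alternatives are generated


def scalar_implicatures(sentence, item):
    """Negate each alternative obtained by swapping in a stronger scale-mate."""
    words = sentence.split()
    implicatures = []
    for alt in stronger_alternatives(item):
        alt_sentence = " ".join(alt if w == item else w for w in words)
        implicatures.append(f"It is not the case that {alt_sentence}")
    return implicatures


for line in scalar_implicatures("Lucie studies linguistics or chemistry", "or"):
    print(line)
```

Because the lookup is confined to the listed scale, a sentence like (106) is never generated as a competitor to (105a), which is the restriction on alternatives that the scale-based view is meant to provide.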
The question of under what conditions lexical items are associated with a scale has not been substantially studied, but there is reason to believe that
association with a scale is to a large extent universal. We assume then that for each scale item, its scale is listed in its lexical entry, and it is only from this scale that alternative propositions can be drawn. However, identifying the set of scalar alternatives is just the initial stage of the computation of implicatures. As we will see, the relevant comparison set is not (105a) and (105b); rather the comparison is based on the logical representation of the given derivation (105a) with the representation that includes the implicature (something similar to (105c)). As in other instances of reference-set computation, the comparison considers whether the interface needs could not be satisfied without constructing an alternative representation. Establishing this requires some elaboration, which I will return to. Since 2000, when research on the acquisition of scalar implicatures intensified, the expectation of finding the 50 percent range of performance in this area has been confirmed by experimental findings. Sporadic earlier studies suggested that children do not compute implicatures at all, but instead respond to sentences with scalar items as would be determined by their logical interpretation (Smith 1980; Braine and Rumain 1981; Noveck 2001). However, later studies indicate that children are familiar with the required computation, but their overall (group) performance on implicature tasks is in the 50 percent range. As in the research on coreference in Condition B environments, not all researchers are aware of the significance of the 50 percent findings, but the numbers in their experimental results speak for themselves. These findings do not extend to numeral scalar items, where it was determined that children perform much more like adults. I turn to the question of why this is so at the end of this section. Before turning to these acquisition findings, I should mention that scalar implicatures have become a central topic of research since 2000. 
There are many competing analyses of their semantics and computation. Given that the set of scalar alternatives can be computed compositionally, based on the lexical entry, much recent debate revolves around the question of whether the computation of implicatures belongs to pragmatics, as in the Gricean view, or to the computational system (syntax or semantics). Another persistent debate is about whether the implicature is associated with the derivation by default, as assumed in the neo-Gricean tradition and further developed by Chierchia (2004), or whether it is added at the context interface, as assumed in Relevance Theory. In many cases, it appears that such debates may be more conceptual than empirical. Furthermore, within the approaches locating implicatures in the computational system
(CS), some of the more intricate competing analyses are not easily distinguishable from one another, on empirical grounds. I will not survey all the competing approaches, but will focus on one empirical criterion for theory selection. Within the hypothesis of optimal design, central to this book, the computations postulated by CS theory may have direct correlates in the processing of derivations. Correspondingly, facts about processing may be relevant for deciding between competing theories. The hypothesis central to this chapter has been that a 50 percent range of performance in acquisition indicates a processing failure: it is found when the resources required for completing a computation exceed the working memory available for children. If indeed children perform in this range on scalar implicatures, any theoretical analysis should be such that it would also explain this fact. As we will see, Chierchia’s (2004) strictly local semantic analysis is “too easy” in this respect: it cannot explain why children should have any problem processing scalar implicatures. More broadly, as argued by Breheny, Katsos, and Williams (forthcoming), the default approach to implicatures is also inconsistent with adults’ processing findings. A crucial empirical criterion in establishing the precise computation applicable in scalar implicatures, then, is the question of whether the proposed analysis also explains what is known about their processing. With the necessary background laid out, we can turn to the actual acquisition findings. Chierchia et al. (2001) and Gualmini et al. (2001) studied the acquisition of disjunction (or) implicatures. They show first that children master the semantics of or, specifically that they are familiar with its logical inclusive reading. This is checked with downward-entailing contexts, such as (108), where the exclusive-or implicature (a or b, but not both) does not arise.
(108) Every dwarf who chose a banana or a strawberry received a jewel.
The experiments used the truth-value judgment task. For (108), children were told a story about Snow White and four dwarves at a picnic. Snow White suggested they eat healthy food, reminding them that bananas and strawberries are healthy. She promised a jewel to those who chose healthy food. Three of the dwarves chose both a banana and a strawberry, and received a jewel. One of the dwarves chose potato chips instead, and did not receive a jewel. At the end of the story a puppet produced sentence (108), and the children’s task was to reward the puppet with a coin if it said “the right thing.” Adults’ judgment is that the sentence is true in the story context, as witnessed by 95.5 percent approval of (108) in the adult
control group. Children’s performance was not significantly different. Fifteen children (age 3;7 to 6;3, mean age 4;11) participated in the experiment and were presented with four target trials similar to (108). They accepted the target sentence fifty-five times out of the sixty trials (91.6 percent). But in contexts licensing the exclusive-or implicature, children’s performance is quite different from that of adults. On a typical trial in the experiments on this context, children heard a story about four boys at a summer camp who were debating which of several toys to play with. At the end all four boys chose both a skateboard and a bike. The puppet then uttered sentence (109).
(109) Every boy chose a skateboard or a bike.
Though logically (109) is true in this context, adults feel this is an inappropriate answer, because for this situation the stronger representation is required (Every boy chose a skateboard and a bike). Use of the weaker (entailed) version (109) gives rise to the implicature that not every boy chose both a skateboard and a bike, which is false in the story context. Accordingly, the adult control group rejected (109) in 100 percent of the trials. But children performed here at the 50 percent range. Fifteen children participated (age 3;5 to 6;2, mean age 5;2). Each child was presented with four target sentences. They accepted the target sentence thirty times out of the sixty trials (50 percent). In group terms this is a chance performance, though, as we will see, the findings do not correspond to individual guessing. Essentially the same range of performance was found for the some . . . all scale in Papafragou and Musolino 2002, 2003. This case has an interesting history. As mentioned, initial studies on the acquisition of implicatures concluded that children do not attempt to compute them at all. Papafragou and Musolino’s first experiment appeared to confirm this contention.
In a typical trial of their first experiment, children were told a story about three dinosaurs who went to get something to eat. Having contemplated some other options, all three dinosaurs ended up eating trees. The puppet was then asked to describe what happened, and uttered sentence (110) (in Greek). Children were asked to say whether the puppet “answered well.”
(110) Some of the dinosaurs ate trees.
Thirty Greek-speaking children participated (age 4;11 to 5;11, mean age 5;3). They rejected target sentences like (110) only 12.5 percent of the time. By comparison, the adult control group rejected it 92.5 percent of
the time. Again, (110) is logically true in this context, but adults reject it as a plausible description of the situation, because it would be more appropriate to use a sentence like All dinosaurs ate trees, as the adult participants explained in justifying their rejection of (110). Based on these results, it seems that children completely skip the computation of implicatures, and since (110) is a logically correct answer, they accept it. However, Papafragou and Musolino (2003) argue that a potential reason for this and similar acquisition findings is that children interpret their task as just saying yes if the target sentence is true, and do not even consider the question of whether there are more appropriate ways to describe the events in the stories, which adults do automatically. Hence Papafragou and Musolino designed their second experiment so that it directs children’s attention to the information status, relevance, or appropriateness of a sentence in a context. Children were told that the puppet Minni sometimes said “silly things” and the aim of the game was to help her “say things better.” The target tasks were preceded by a training session, where Minni was helped to express herself better. (For instance, she described a dog as “a little animal with four legs,” and was corrected to use “a dog” instead.) The target stories themselves were modified so that the issue of whether the predicates held for some or all of the objects became more central. The stories were based on scenarios involving a contest or challenge. For instance, one of the characters claimed he was very good at throwing hoops around a pole, and he challenged Mickey to try to do the same thing with three hoops. Mickey concentrated hard and managed to put all three hoops around the pole. At the end of the story, the puppet was asked “How did Mickey do?” and answered by uttering (111) (in Greek).
(111) Mickey put some of the hoops around the pole.
As Papafragou and Musolino (2003) describe the results, children’s performance improved significantly in this experiment. Thirty Greek-speaking children (different from the group in the previous experiment) participated in the experiment (age 5;1 to 6;5, mean age 5;7). In the present experiment, they rejected targets like (111) 52.5 percent of the time, as compared to their mere 12.5 percent rejection in the previous experiment. But it has gone unnoted in this description that what appears as an improved performance rate is, in fact, in the 50 percent range, which, as we saw in the introduction to chapter 5, is a suspicious result in the realm of chance, and not typical of acquisition findings. In the same experiment, Papafragou and Musolino also studied the {start, finish} scale (observed
in (107d)). Children heard a sentence like Donald started coloring the star (in Greek), when in the story Donald finished coloring the star. They rejected this sentence 47.5 percent of the time, which is again in the chance range. It can be concluded that the method used by Papafragou and Musolino is indeed successful in directing the children to consider the appropriateness of the expression to context, and hence, to attempt to compute the implicatures that adults compute in the same experimental setting. But once children are trying to compute the implicature, we discover a processing failure. In our terms, the task at hand involves reference-set computation. In the appropriate experimental setting, children know precisely what they have to do to carry it out, but fail the execution and resort, instead, to some strategy bypassing it, as in the other instances of this computation that we have examined.

The findings on the acquisition of implicatures bear a striking similarity to the acquisition of stress-shifted focus, discussed in section 5.2. First, in both, difficulties witnessed by performance in the 50 percent range are found only in comprehension, not in production. In production, children never use all when they mean some but not all (Papafragou and Musolino 2003). The felicity-task experiments reported in Chierchia et al. 2001 and Gualmini et al. 2001 can be interpreted to the same effect: when given a choice of two sentences that could describe a particular situation, children correctly chose the representation facilitating an implicature (with or) rather than the entailing representation (with and). Next, the analysis of individual performance provided by Chierchia et al.
2001 reveals that the pattern of responses in implicature experiments is not individual guessing, but a pattern of fixed responses (with about half the children consistently rejecting the target sentences, which can make it look like they are computing the implicature, and the other half accepting them). Essentially the same results are reported in Papafragou and Musolino 2003 (272, note 9), although for some reason they report the individual results of only ten of the thirty participants in the experiment. As we saw in sections 5.2.4 and 5.2.5, the same was found with stress-shifted foci. In the terms presented there, this suggests that in the area of implicatures as well, children are operating by means of a fixed, though irrelevant, default (a point I will return to).

To analyze the acquisition findings further, we first need to get a clearer picture of the mechanism generating implicatures. Landman 2000 and Chierchia 2004 raise some serious problems for the prevailing view of the derivation of implicatures. On that view, implicatures are computed
globally, as the representation is entered into the context set. Landman examines sentences like (112a). Retrieving the scale alternatives of three girls, the implicature calculation should be obtained by negating all members of this set, namely, events of kissing four or more girls. So, roughly, the final interpretation should be that each boy kissed three girls, and not more than three.

(112) a. Every boy kissed three girls.
b. It is not the case that every boy kissed more than three girls.

But Landman points out that if we apply this computation globally, we do not get these truth conditions. Instead, the derived implicature would be (112b), where external negation applies to the whole proposition. Representation (112b) leaves open the option that some boys kissed more than three girls, an option that the sentence, under the implicature construal, does not in fact allow.

Chierchia 2004 argues that computing implicatures at the output stage, on completed semantic representations, is bound to give the wrong semantics in configurations of embedded implicatures. This can be illustrated in (113) (which is based on Chierchia’s (21), p. 46, but with a different reported situation, just for variety).

(113) Our employees are either paid by the hour or given some of the profits.
(114) a. Our employees are either paid by the hour or given some, but not all, of the profits.
b. Our employees are not both paid by the hour and given some of the profits.

Utterance of (113) is strongly associated with the two implicatures illustrated in (114). Deriving (114b) is not problematic for the global approach, but the problem is how the embedded implicature in (114a) could be derived. The set of alternatives for the second disjunct is derived from the lexical scale of some, whose topmost member is all. So the alternative based on all is (115a). As before, we assume that the implicature is derived by the negation of the alternative.
Since the negation applies globally at the final stage of the representation, the result would be (115b). (115) a. Our employees are either paid by the hour or given all of the profits. b. It is not the case that our employees are either paid by the hour or given all of the profits.
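The effect of applying the negation globally can be checked by brute force. The following is only a minimal sketch, not part of Landman’s or Chierchia’s formulation: it assumes an “at least” plain meaning for three, treats each situation as a triple of truth values, and uses function names (global_115, local_114a, and so on) that are invented here for illustration.

```python
from itertools import product

# --- Chierchia-style example (113)-(115) ---------------------------------
# A situation is a triple (paid, some, all_): paid by the hour, given some
# of the profits, given all of the profits. "All" implies "some".
SITUATIONS = [
    (paid, some, all_)
    for paid, some, all_ in product([False, True], repeat=3)
    if some or not all_
]

def plain_113(paid, some, all_):
    """Plain value of (113): paid by the hour OR given some of the profits."""
    return paid or some

def global_115(paid, some, all_):
    """(113) strengthened by the external negation (115b) of alternative (115a)."""
    return plain_113(paid, some, all_) and not (paid or all_)

def local_114a(paid, some, all_):
    """(114a): the implicature is computed inside the second disjunct."""
    return paid or (some and not all_)

# Does each construal leave room for employees who are paid by the hour?
global_allows_hourly = any(global_115(*s) and s[0] for s in SITUATIONS)
local_allows_hourly = any(local_114a(*s) and s[0] for s in SITUATIONS)

# --- Landman-style example (112) -----------------------------------------
# kisses[i] is the number of girls kissed by boy i (assumption: "three"
# has an "at least three" plain meaning).
def global_112(kisses):
    """(112a) conjoined with the globally derived implicature (112b)."""
    return all(k >= 3 for k in kisses) and not all(k > 3 for k in kisses)

def local_112(kisses):
    """Intended construal: each boy kissed three girls and not more."""
    return all(k == 3 for k in kisses)

# One boy kissed three girls, the other kissed five.
mixed = (3, 5)

print(global_allows_hourly, local_allows_hourly)  # False True
print(global_112(mixed), local_112(mixed))        # True False
```

Under these assumptions the globally strengthened reading of (113) excludes every situation in which the employees are paid by the hour, while the local construal does not; and the global construal of (112) accepts a situation in which one boy kissed five girls.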
The problem is that logically this external negation entails that neither of the disjuncts can be true (¬(P or Q) entails ¬P and ¬Q). So (115b) also negates the possibility that our employees are paid by the hour, which sentence (113) clearly allows. Based on this and several other arguments, Landman and Chierchia conclude that implicatures are computed locally, as soon as their trigger is processed. I will pursue Chierchia’s implementation here, though I will not enter into the precise details of the formulation (see Chierchia 2004, 59–65). This solves the problem of embedded implicatures, and derives the correct representation for (113) and its two implicatures in (114). But it means that implicatures are inserted into the semantic derivation by default, regardless of whether they are eventually realized.

Specifically, Chierchia shows that in downward-entailing (DE) contexts, scalar implicatures of the type we have examined are always absent. This was exemplified in sentences (108) and (109), used in Chierchia et al.’s (2001) experiment, which are repeated with some variation in (116) and (117). Sentence (116) implicates that the boys did not choose both a banana and a strawberry, but (117), where the disjunction is in the (restrictive) scope of a conditional DE operator, does not implicate that to receive a jewel a dwarf could not choose both a banana and a strawberry.

(116) Every boy chose a banana or a strawberry.
(117) Every dwarf who chose a banana or a strawberry received a jewel.
(118) Every boy (λx (x chose a banana or a strawberry and ¬(x chose a banana and a strawberry)))
(119) Every dwarf (λx ((x chose a banana or a strawberry) → (x received a jewel)))

Chierchia argues that this is obtained by removing the implicature clause in the presence of a DE operator. (To be precise, he argues that in that case the implicature clause is replaced by another type of implicature, which he labels “indirect,” but I will not discuss this type here.)
For (116), the semantic derivation is given in (118). (An alternative with and is constructed, negated, and conjoined with the predicate composed from the ‘‘plain’’ entry of or.) The semantic derivation of (117) starts precisely the same way, so a predicate identical to that in (118) is initially constructed. However, when the conditional operator of (117) is encountered during the derivation, the added conjoined predicate (the implicature) is removed, so the final outcome is (119), which is the original construal of the sentence with no implicatures. This removal is governed by a general condition that Chierchia names ‘‘the strength condition.’’ To state it,
some of Chierchia’s notation needs to be clarified. For any English expression α, ‖α‖ is its value, computed in standard semantic notation. Chierchia calls this value the “plain” value. But the gist of Chierchia’s proposal is that along with this value, α is also compositionally assigned a scalar value (‖α‖^S), which is computed based on the scale of one of the elements in α, as informally illustrated above. The plain value of (116) can be informally represented as Every boy (λx (x chose a banana or a strawberry)), and its scalar value is (118). The strength condition can be stated as in (120a).

(120) Strength condition
a. The scalar value of α cannot be weaker than its plain value.
b. A representation α is stronger than a representation β iff α entails β.

Condition (120a) is checked locally, at each step of the derivation, and its effect is to filter out implicatures when they lead to weakening of the original information content, rather than strengthening it. In the case of (116), the scalar value (118) entails the plain value. Hence, by (120b), the scalar value is stronger, and the implicature goes through the filter (120a). But in the case of (117), it is the other way round: the plain value, which is the one that survived in (119), entails the scalar value, which would be Every dwarf (λx ((x chose a banana or a strawberry and ¬(x chose a banana and a strawberry)) → (x received a jewel))). Hence, the scalar value is weaker than the plain value, and condition (120a) filters the scalar value out, leaving the derivation with the representation (119).

Returning to the acquisition findings, Chierchia’s analysis raises two questions. First, it is not clear why children should have any problems with processing active (noncanceled) implicatures. The type of computation he proposes involves just local evaluation of alternatives.
As Chierchia notes, this is essentially the same computation involved in the interpretation of focus (specifically using the implementation of Krifka (1995) for this computation). There is no reason to suspect that children have problems with this computation. The problems I surveyed in section 5.2 all concern identification of the focus constituent, but not the interpretation of the focus.

(121) a. Daisy only gave a cherry to MISS PIGGY.
b. Daisy only gave a CHERRY to Miss Piggy.

The area where 50 percent performance was found was in identifying that (121b), with stress-shift, only allows the narrow focus cherry. In (121a), the derivation allows both Miss Piggy and the whole VP to serve as focus,
and the selection of the actual focus is context dependent. As we saw in section 5.2.5, children also do not behave the same way as adults in selecting the relevant focus out of the focus set in neutral-stress sentences like (121a). It is possible that children apply a fixed default here, rather than computing the context. However, the computation parallel to that proposed by Chierchia for implicatures is not similar to this process of disambiguation. Instead it is similar to the task of determining the interpretation of the focus, once identified—for instance, perceiving that if Miss Piggy is selected as a focus in (121a), then Daisy did not give a cherry to anyone else. There is no indication that this basic construction and exclusion of alternatives poses any problem to children. If anything, the computation required in the Chierchia system for implicatures is slightly easier, because the set of alternatives is constructed directly from the scale associated with the item in the lexicon, and does not require consulting the context. The next question is why children have no problems with computing the DE contexts, as in (117), where the implicature is canceled. The computation involved here is essentially the same as in (108) or (116), where the implicature is preserved, and where children performed in the 50 percent range. At the lower phase of (117), the scalar term is computed and a scalar value is constructed; then at the higher phase, the implicature clause is removed. Whatever computation is causing children problems in (116), then, is also present in (117). Possibly an answer to the first question could lie in the strength condition (120). Checking this condition requires comparing two representations at each step of the derivation. 
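The comparison that condition (120) calls for can be made concrete with a small model check of (116) and (117). This is a sketch under simplifying assumptions that are mine rather than Chierchia’s: entailment is tested only over models with two individuals, every individual counts as a boy (or dwarf), and all function names are hypothetical.

```python
from itertools import product

# An individual is a triple (banana, strawberry, jewel) of truth values;
# a model is a pair of individuals. Entailment is checked over these finite
# models only, which is enough to separate the cases discussed in the text.
INDIVIDUALS = list(product([False, True], repeat=3))
MODELS = list(product(INDIVIDUALS, repeat=2))

def plain_or(banana, strawberry):
    """Plain (inclusive) value of the disjunction."""
    return banana or strawberry

def scalar_or(banana, strawberry):
    """Scalar value: the disjunction conjoined with the negated 'and' alternative."""
    return (banana or strawberry) and not (banana and strawberry)

def every_chose(disjunction):
    """(116): every individual chose a banana or a strawberry."""
    def value(model):
        return all(disjunction(b, s) for b, s, _ in model)
    return value

def every_chooser_got_jewel(disjunction):
    """(117): every individual who chose a banana or a strawberry received a jewel."""
    def value(model):
        return all((not disjunction(b, s)) or j for b, s, j in model)
    return value

def entails(p, q):
    """p entails q iff q holds in every model where p holds."""
    return all(q(m) for m in MODELS if p(m))

def passes_strength_condition(plain, scalar):
    """(120a): the scalar value may not be weaker than the plain value."""
    scalar_is_weaker = entails(plain, scalar) and not entails(scalar, plain)
    return not scalar_is_weaker

result_116 = passes_strength_condition(every_chose(plain_or), every_chose(scalar_or))
result_117 = passes_strength_condition(
    every_chooser_got_jewel(plain_or), every_chooser_got_jewel(scalar_or)
)
print(result_116, result_117)  # True False
```

On this check the scalar value survives in (116) and is filtered out in (117). Note that in both cases the same two values have to be built and compared, which is the point at issue in the discussion that follows.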
Though the comparison is local, executing it requires keeping two parallel semantic representations throughout the derivation: the plain reading is not discharged, but is kept along with the scalar reading for comparison in the next step of the derivation. This is one clear difference from the computation of focus, where no comparison of competing values is required, once negation of the relevant members of the alternative set has applied. With condition (120) constantly checked, the computation of implicatures shares some properties with the instances of global reference-set computation that I have discussed in previous chapters, although it is local. Given that children are unable to execute global reference-set computation, as we saw, this could, perhaps, also explain their 50 percent range of performance on implicatures. But with this assumed, the problem with the DE contexts like (117) only sharpens. Filtering out the implicature in (117) requires precisely
the same procedure of comparing the plain and the scalar value. While in (116), which children fail to process, this computation applies only once, in (117), it applies twice: in the first application the scalar value is let in, as in (116), because it conforms with the strength condition (120), and in the second application, in the scope of the DE operator, the comparison filters the scalar value out. It appears that the computation involved in (117) should be more difficult for children than that of (116). Possibly, because the comparisons are local, the number of applications does not increase processing difficulty, but in this case, (117) should be just as difficult as (116) for the same reason as before, that is, because they involve the same computation in the first phase.

This acquisition puzzle ties in with an ongoing debate regarding how scalar implicatures are derived. Chierchia 2004, following, in this respect, a long neo-Gricean tradition starting with Gazdar 1979 and Horn 1984, assumes that implicatures are associated with scalar items by default. They are, in a way, always present in the derivation. But since they are defeasible, they can be canceled, either overtly (some, and perhaps all, students . . .) or contextually, when the context does not license the implicature inference. The alternative context-driven view is that the scalar implicatures, like the Gricean particularized conversational implicatures, are triggered by the context, and they are generated only if there is contextual reason to do so. (This was argued in Carston’s 1998 application of Sperber and Wilson’s 1986/1995 Relevance Theory; for some of the history, see Breheny, Katsos, and Williams, forthcoming.) On this view, then, the default interpretation of a scalar item is its logical one (what Chierchia calls the plain value) and the implicatures are added in context. For the context-driven view, then, (116) and (117) involve dramatically different processing tasks.
In the case of (116), the context enforces the computation of the implicature. Assuming that this is a complex computation (a point I return to), there is room to argue that it exceeds children’s processing storage. But in the case of the DE (117), nothing triggers this computation, so it does not apply and the sentence enters the context set with its plain logical value. (As we will see, DE contexts are just one of the factors that may exclude scalar implicatures.) Given that children are completely efficient with the logical readings of scalar items (as Chierchia et al. 2001 and others have shown), they have no difficulties with such derivations.

Noveck and Posada (2003) and Breheny, Katsos, and Williams (forthcoming) show that the same difference is also found in adult processing. Sentences licensing scalar implicatures are processed more slowly than sentences that do not, like (117). If the default
theory of scalar implicatures is right, we would not expect such a difference. So there seems to be processing evidence that implicatures are computed only when there is contextual reason to do so.

But the context-driven line appears to require a global computation: it is only when the full representation is assessed against a context set that one can know whether the implicature should be added. As we saw, Chierchia showed convincingly that in semantic terms, this is not feasible. Computing the implicatures on the final semantic representation of the sentence would give the wrong result for embedded implicatures, as illustrated in (115). So the question is how these two conflicting findings (that the computation must be global and that it cannot be) can be reconciled.

Let us assume with the context-driven approach that we turn to computing the implicatures only when the context indicates that the semantic representation constructed on the basis of the logical reading of the scalar item may be insufficient. The procedure then requires constructing an alternative semantic representation. Let us assume with Chierchia that the only way to construct this alternative representation is to start over compositionally and derive it precisely the way he does. This means that while holding the semantic representation we have constructed already (the plain value), we start working on constructing another representation for the derivation (its scalar value). This, obviously, is not a standard procedure. It means reopening a derivation (from the position where the relevant scalar item is) and constructing a new interpretation for it. In terms of what is involved, it is similar to the operation of stress-shift, which, as we saw in chapter 3, undoes the PF (stress) representation of the derivation, and starts over from the point where we want the new stress. Given the assumptions of this book, this is a clear case of an illicit operation, motivated only by needs of the interface.
Thus it requires strong justification, checked through reference-set comparison. It is permitted only if the two representations derived for the sentence (the plain and the scalar) are clearly distinguishable at the interface. For (116), repeated below, the reference set consists of the two semantic representations in (122): (122a) is the plain value derived compositionally for this sentence, and (122b) is the scalar value that, on the present analysis, is obtained by retracing and deriving a new interpretation compositionally. The two are clearly distinguished: (b) entails (a), but not conversely.

(116) Every boy chose a banana or a strawberry.
(122) a. Every boy (λx (x chose a banana or a strawberry))
b. Every boy (λx (x chose a banana or a strawberry and ¬(x chose a banana and a strawberry)))
So far it seems that the comparison of the two representations is based just on the equivalent of Chierchia’s strength condition: reinterpreting a derivation is permitted only if the new interpretation is stronger than the original one, and thus it renders the sentence more informative. This, however, is not a sufficient justification. There are contexts where this sort of increased informativeness is superfluous, relative to the needs of the context set, and hence the implicature is not licensed. Breheny, Katsos, and Williams (forthcoming) call such contexts lower-bound, borrowing the term from Horn’s (1984, 13) statement that implicatures may also be canceled “implicitly (by establishing the appropriate contexts, in which all that is relevant, or can be known, is the lower bound).” They cite (123), from Levinson (2000, 51), as an example of a lower-bound context.

(123) a. Is there any evidence against them?
b. Some of their identity documents are forgeries.

In the context of the question in (123a), the logical (plain) interpretation of the statement in (123b) is a sufficient answer. To answer the question positively, it is sufficient that the people under question have some forged documents, and it is irrelevant whether all of their documents are forgeries or not. So in this case, the implicature that not all of their documents are forgeries does not arise. (In Levinson’s analysis, as in Chierchia’s, it is assumed that the implicature is canceled in this context; in the context-driven approach of Breheny and colleagues, it is not induced, and it is not computed in such contexts.)

A more subtle instance of the context dependence of scalar implicatures is example (124) from Breheny, Katsos, and Williams (their (6)). The same disjunctive NP, the class notes or the summary, licenses the exclusive reading in (124a), but not in the “lower-bound” context (124b).

(124) a. John was taking a university course and working at the same time.
For the exams he had to study from short and comprehensive sources. Depending on the course, he decided to read the class notes or the summary.
b. John heard that the textbook for Geophysics was very advanced. Nobody understood it properly. He heard that if he wanted to pass the course he should read the class notes or the summary.

This is determined only by the previous context, and not by entailment relations between the two competing representations for the disjunction. Let us look at the last sentence of (124b), given in (125). Its plain value
is (125a), and the scalar value is (125b). The disjunction does not occur in the restrictive term of the DE conditional operator. Hence, the entailment relations between the two representations are precisely the same as in (122): (125b) entails (125a) and is thus stronger, or more informative.

(125) If John wants to pass the course he should read the class notes or the summary.
a. John (λx ((x wants to pass the course) → (x should read the class notes or the summary)))
b. John (λx ((x wants to pass the course) → (x should read the class notes or the summary and ¬(x should read the class notes and the summary))))

Nevertheless, this strength of information does not give (125b) any preferred status in the context of (124b), because its extra information is not relevant. The context is set so that the key to succeeding in the course is bypassing the required reading, and the less informative construal (125a) is sufficient for that context. (The reasoning here is reminiscent of the original Gricean maxim of quantity that also prohibits being overinformative.) In our terms, then, the contextual effect of adding the implicature to the derivation is not sufficiently distinguished from that of the representation without it, to justify applying an illicit operation.

Regrettably, the precise conditions governing the selection of the required strength of information relative to the context set cannot be presently stated as formally and explicitly as conditions of the CS or semantics are. This is a central topic of research in Relevance Theory (cited above). It is understandable that syntacticians and semanticists would try to shy away from what often seem like vague formulations. But the crucial question is whether it is indeed possible to avoid such issues when one attempts a comprehensive analysis of scalar implicatures. Chierchia (2004) is well aware that contextual factors, not just semantic factors, can lead to the cancellation of implicatures.
He says that in such cases ‘‘we immediately see that the implicature is incompatible with the context, so we throw it out’’ (p. 50). But to know when the context forces one to remove an implicature, we have to know precisely those contextual factors that tend to be dismissed as ‘‘pragmatics.’’ There is no conceptual advantage in introducing a formal and precise system for generating implicatures just to say its outputs may then disappear under mysterious discourse conditions, compared to assuming that the formal machinery that generates implicatures is activated only when the mysterious discourse conditions demand it.
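Purely as an illustration of the two-part licensing requirement (the scalar value must be stronger, and the difference must matter in the context), the following toy sketch models (123). It is not a proposal about how relevance is actually computed: the representation of a context as a question that partitions situations, the three-document scenario, and the names answer_profile and implicature_licensed are all assumptions introduced here.

```python
# Situations are identified by how many of three (hypothetical) identity
# documents are forged: 0, 1, 2, or 3. Propositions are sets of situations.
PLAIN_SOME = frozenset({1, 2, 3})   # "some are forgeries" (at least one)
SCALAR_SOME = frozenset({1, 2})     # "some but not all are forgeries"

# A question is modeled as a partition of the situations into answers.
ANY_EVIDENCE = [frozenset({0}), frozenset({1, 2, 3})]   # (123a): no / yes
HOW_MANY = [frozenset({n}) for n in range(4)]           # a more demanding question

def answer_profile(proposition, question):
    """The answers to the question that the proposition leaves open."""
    return frozenset(cell for cell in question if cell & proposition)

def implicature_licensed(plain, scalar, question):
    """Toy licensing check: the scalar value must be strictly stronger than
    the plain value AND must bear differently on the question at hand."""
    stronger = scalar < plain  # proper subset: scalar entails plain, not conversely
    distinct = answer_profile(plain, question) != answer_profile(scalar, question)
    return stronger and distinct

lower_bound_case = implicature_licensed(PLAIN_SOME, SCALAR_SOME, ANY_EVIDENCE)
demanding_case = implicature_licensed(PLAIN_SOME, SCALAR_SOME, HOW_MANY)
print(lower_bound_case, demanding_case)  # False True
```

In the lower-bound context of (123a) the plain and scalar values give the same answer, so on this toy criterion nothing justifies constructing the scalar value; with a question that asks for the exact number, the two values come apart.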
Although conceptually the two approaches just outlined are equivalent, some empirical evidence in favor of the second comes from adult processing findings. There is growing evidence that sentences involving scalar implicatures impose a greater processing load than sentences without them. This is found even in simple experiments measuring reading time (Noveck and Posada 2003; Breheny, Katsos, and Williams, forthcoming, and the references cited there). Breheny and colleagues point out that under the default view of implicatures, the two contexts in (124) should create identical processing problems, since in both the processing of an implicature is required at some stage of the derivation. (This is the same issue that we observed in the case of the acquisition of (116) and (117), except that no DE operator is involved here.) But their experiments show clearly that this is not so: processing the trigger took subjects significantly longer when the implicature is activated, as in (124a), than when it is not, as in (124b).

One experiment consisted of twelve (mismatched) pairs similar to that in (124), but in Greek. Forty-seven native speakers of Greek (mean age 23.5 years) participated. The texts were presented electronically segment by segment. In this specific experiment the trigger-containing segment (e.g., he decided to read the class notes or the summary) occurred at the end of the text in all pairs. Participants were instructed to read each segment only once and click the left mouse button to see the next segment. Reading time and answers to (independent) comprehension questions were recorded by the software. The mean reading time of the trigger segment was 1,291 (SD = 352) milliseconds when it occurred in the implicature-inducing context, like (124a), and 1,204 (SD = 292) milliseconds in the nonimplicature context like (124b).
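To get a feel for the size of this difference, the reported means and standard deviations can be turned into a raw and a roughly standardized difference. This is descriptive arithmetic only, and the choices are mine: it pools the two standard deviations as if the conditions were independent samples, ignores the repeated-measures structure of the experiment, and is no substitute for the analysis of variance that the authors carried out.

```python
from math import sqrt

# Values reported for the trigger segment (milliseconds).
mean_implicature, sd_implicature = 1291.0, 352.0   # context like (124a)
mean_lower_bound, sd_lower_bound = 1204.0, 292.0   # context like (124b)

# Raw difference between the two condition means.
difference_ms = mean_implicature - mean_lower_bound

# Rough standardized difference, pooling the two SDs with equal weight
# (an assumption; the design is within-subject, so this is only indicative).
pooled_sd = sqrt((sd_implicature ** 2 + sd_lower_bound ** 2) / 2)
standardized_difference = difference_ms / pooled_sd

print(difference_ms)                       # 87.0
print(round(standardized_difference, 2))   # 0.27
```

So the implicature-inducing context adds 87 milliseconds on average, a bit more than a quarter of the pooled standard deviation on this rough measure.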
With further analysis of variance, Breheny and colleagues concluded that the reading time of the trigger segment was significantly longer when an implicature was activated. The processing findings are consistent with the model of deriving scalar implicatures outlined here: the computation requires an illicit operation that retraces the semantic composition, and constructs a new semantic representation for the derivation. This process itself is compositional, along the lines of Chierchia, but it is costly since it is based on reopening closed stages of the derivation. Like other instances of illicit operations motivated by the needs of the interface, it requires reference-set computation that compares, globally, the original and the new representation. It is only licensed if the two are distinct in a way that is significant to the ongoing context. Reference-set computation of this global type is always costly. Children are unable to carry it out. Adults can, of course, but at a
processing cost. Unlike in the default view, this complex procedure is motivated by the context, and speakers do not attempt to compute it when there is no discourse reason. Hence, scalar items do not give rise to processing difficulties when no implicatures are involved.

With this we can return to the findings on the acquisition of implicatures. As mentioned, although the group performance is in the 50 percent range, the findings do not reveal individual guess patterns. Rather, children have a fixed pattern of response, with about half consistently giving adultlike judgments and the other half consistently giving the judgment wrong for adults. It is tempting to summarize this finding as indicating that half of the children at age five had mastered the adult use of implicatures, and the other half had not, as is done in some of the literature cited. But this is a puzzling and unusual situation. (Why should the distribution of maturation in the same age group be exactly at chance level?) I noted in sections 5.2.4 and 5.2.5 that this is precisely the pattern of performance found in the acquisition of stress-shifted focus. In fact, the similarity between the two computations involved is even greater.

In the case of stress-shift, we saw that there is evidence that children manage to construct the reference set, at least partially (see section 5.2.4). But they are not able to complete the computation and select the right focus. Their task there is to decide whether the narrow DP-constituent that contains the shifted stress is the intended focus, or the larger VP that also contains it and is also in the focus set. (The choice here should be based on whether the VP-focus could be obtained without stress-shift.) What they do instead is resort to a fixed default, based on entailment relations. Crain, Ni, and Conway (1994) showed that there are two default strategies, when the context enables a choice between two readings one of which entails the other.
The common adult strategy is “minimal commitment,” that is, select the entailed reading, which is weaker and less committal, thus minimizing the chance that you will have to revise your choice as you proceed. Children also use the opposite default of “maximal commitment,” which leads to a selection of the stronger, entailing option. Crain and colleagues assume that the latter is a necessary stage in acquisition and that most children abide by it during language development, because it facilitates learning with no negative evidence. But as we saw in section 5.2.5, it is more likely that children operate by different defaults, depending on factors other than learnability. In any case, the preference for one of these two defaults is fixed for a child.

Returning to stress-shifted foci, what we saw that children do, once the two alternatives of the reference set are constructed, is select by their preset default. If they operate by minimal commitment, they will choose the narrow-focus representation, which is entailed by the wide (VP) focus representation. If they operate by maximal commitment, they will select the wide, entailing, representation. The problem is that neither of these defaults is relevant to the task. It is a procedure that enables bypassing the required computation by resorting to an independently established default. Since the choice of the default is itself arbitrary, the group performance remains around 50 percent. But the outcome is that the children that follow minimal commitment end up selecting the correct adult response in the setting of the experiments (narrow focus), while the others consistently select the “wrong” response. To unsuspecting observers, it may look as though half of them had matured enough to understand stress-shift, while the other half had not.

The situation in the resolution of implicatures is quite similar. The reference set for (116) was (122), both repeated below. Suppose children know they have to compute the option of implicatures, perhaps helped into this awareness by a training session, as in Papafragou and Musolino’s (2003) experiment. Since the computation itself, and the semantic mechanism deriving implicatures, are innate, children know exactly what they have to do to carry out this computation. Suppose they manage to construct the two representations in (122) and get stuck there, as in the case of stress-shifted focus.

(116) Every boy chose a banana or a strawberry.
(122) a. Every boy (λx (x chose a banana or a strawberry))
b. Every boy (λx (x chose a banana or a strawberry and ¬(x chose a banana and a strawberry)))

The semantic properties of the reference set in (122) are the same as in the case of focus; one member (122b) entails the other. Therefore, the children who are stuck can apply precisely the same arbitrary default.
If they go by minimal commitment, they will select the weaker (entailed) reading (122a) and say that the puppet has said it right (in the situation where every boy chose both a banana and a strawberry). If they go by maximal commitment, they select the stronger (122b), and judge the puppet to be wrong. Unlike in the case of focus, it is the maximal-commitment group that will resemble mature little adults, because in this context, adults select (122b). Again, the default is irrelevant to the task, because, as we just saw in the discussion of (125), the point is to decide whether the context justifies the selection of the costly stronger reading. Since children operate by an arbitrary default, the group result remains at chance.
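The difference between a group of guessers and a group of children each following a fixed but arbitrary default can be seen in a small simulation. This is only a sketch: the group size of thirty follows the experiment described above, but the number of trials per child, the coin-flip assignment of defaults, and all names are assumptions made for illustration.

```python
import random

def simulate_fixed_defaults(n_children=30, n_trials=4, seed=0):
    """Each child has one arbitrary but fixed default. Maximal commitment
    selects the stronger reading (122b), so the child rejects the puppet's
    statement on every trial; minimal commitment always accepts it."""
    rng = random.Random(seed)
    defaults = [rng.choice(["minimal", "maximal"]) for _ in range(n_children)]
    # True = the child rejects the target sentence on that trial.
    return [[default == "maximal"] * n_trials for default in defaults]

def simulate_guessing(n_children=30, n_trials=4, seed=0):
    """Each child guesses independently on every trial."""
    rng = random.Random(seed)
    return [[rng.random() < 0.5 for _ in range(n_trials)] for _ in range(n_children)]

def rejection_rates(responses):
    """Per-child rejection rates and the overall group rejection rate."""
    per_child = [sum(trials) / len(trials) for trials in responses]
    group = sum(per_child) / len(per_child)
    return per_child, group

fixed_per_child, fixed_group = rejection_rates(simulate_fixed_defaults())
guess_per_child, guess_group = rejection_rates(simulate_guessing())

# Under fixed defaults every child is at 0 or 1; only the group mean looks
# like chance. Under guessing, intermediate individual rates also occur.
print(sorted(set(fixed_per_child)), fixed_group)
print(sorted(set(guess_per_child)), guess_group)
```

With fixed defaults the group rate hovers around one half simply because the defaults are split arbitrarily, yet no individual child ever shows an intermediate rate, which is the pattern the individual analyses report.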
Let us finally turn to the findings regarding numeral scalar implicatures. It has increasingly been found that children have no problems in this area, or far fewer problems than with the other types of scalar implicatures. For instance, in the experiment of Papafragou and Musolino 2003, which was reported above, children performed in the 50 percent group range on the scalar item some—they rejected the statement (111), repeated below, 52.5 percent of the time, when in the story Mickey put all three hoops around the pole.
(111) Mickey put some of the hoops around the pole.
(126) Mickey put two of the hoops around the pole.
But in the same experiment, Papafragou and Musolino also studied numeral expressions as in (126), under precisely the same conditions (including a training designed to increase children’s awareness to appropriateness to context). Children’s rejection rate on (126) was 90 percent, which is not significantly different from adults’. Katsos, Breheny, and Williams (2005) survey some studies on adult processing that found that numeral scalar implicatures do not present the same processing pattern as the other scalar implicatures.
It appears possible to address this question by postulating that numerals are either ‘‘underspecified’’ or ambiguous regarding their ‘‘at least,’’ ‘‘exactly,’’ or ‘‘at most’’ interpretation, as suggested in Papafragou and Musolino 2003 and many other publications. If so, then the costly computation of a scalar implicature is not involved in understanding numerals in context, and consequently, children face no problems in the processing of numerals. But although this explains the acquisition findings, it does not seem to take us very far in explaining the numeral problem itself. The question remains as to what determines which of the potential numeral meanings is selected in a given context. In many respects, numerals still behave like the other scalar items.
For instance, the ‘‘exactly’’ reading is canceled in DE contexts, as in (127). But there is no room to account for that within a system assuming just ambiguity.
(127) a. Anyone who has six children is entitled to child support.
b. If you have three dollars, you can enter the zoo today.
Within the framework presented here, a different solution suggests itself. The analysis has rested crucially on Chierchia’s compositional approach to scalar implicatures. This means that the compositional construction of scalar implicatures, based on the set of alternatives derived from the lexical scale, is part of compositional semantics. I have argued that in the
standard cases of scalar implicatures this compositional semantics applies only at the output stage, and requires reopening of the semantic composition, which is what makes it costly. But given that the computation itself is available in UG, it is in principle possible that there are some areas where it indeed applies during the derivation. Suppose, then, that with numerals, Chierchia’s procedure is lexicalized—that is, it does apply as a default. A numeral, then, is always computed locally against its scalar alternatives, carrying the ‘‘at most’’ implicature through the derivation, unless it is canceled by a DE operator (or by other contextual factors). I argued above that this type of procedure should not cause a processing crash in children. So, if this procedure indeed applies with numerals, this explains why children have no problems with numeral scalar implicatures.
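A minimal sketch of this proposal, under simplifying assumptions: the lexical meaning of a numeral is taken to be ‘‘at least n,’’ the ‘‘at most n’’ implicature is added by default, and a DE context simply switches the implicature off. The function name `numeral_truth` and the boolean flag `de_context` are hypothetical; other contextual cancellation is not modelled.

```python
def numeral_truth(n, count, de_context=False):
    """Truth of a statement with numeral n, given the actual count.

    Assumed lexical meaning: 'at least n'.
    Assumed default: the 'at most n' implicature is computed locally,
    unless a DE operator cancels it.
    """
    at_least = count >= n
    if de_context:
        # Implicature canceled: only the 'at least' meaning survives.
        return at_least
    # Default: 'at least n' and 'at most n', i.e. the 'exactly' reading.
    return at_least and count <= n

# (126) 'Mickey put two of the hoops around the pole', when he put all three.
print(numeral_truth(2, 3))
# (127a) restrictor of 'anyone who has six children', for a parent of seven.
print(numeral_truth(6, 7, de_context=True))
```

Because the implicature is part of the default computation here, no comparison of two stored representations is needed, which is the property the text relies on to explain children's adult-like rejection of (126).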
Notes
Introduction 1. As Chomsky (2000, 92) illustrates the question: ‘‘Suppose that a super-engineer were given design specifications for language: Here are the conditions that FL must satisfy; your task is to design a device that satisfies these conditions in some optimal manner (the solution might not be unique). The question is: How close does language come to such optimal design?’’ 2. The term has been associated with specific implementations, attempting, for example, to have the parser run precisely the same operations as the CS, but ‘‘in the reverse order.’’ As will become clear shortly, I do not assume such implementations here. Chapter 1 1. The concept of numeration was introduced later, in Chomsky 1995. In the original formulation, in Chomsky 1992, the specification was that the two derivations are ‘‘with the same LF output,’’ which is obviously not what was intended. See Reinhart [1994] 1998 for further discussion. 2. The Empty Category Principle (ECP) stated that empty nodes—that is, traces left by a movement operation—must be governed. Government can be obtained either via a head, or via a relation to an antecedent. Complements are always governed by their heads, so they can be freely extracted as long as this does not violate an island condition. But adjuncts need antecedent government. The latter requires a structural proximity of the empty node and its antecedent, a condition not met in (9). 3. This account rests on a distinction between interpretable and uninterpretable features. Interpretable features are not deleted after checking, since they play a role in the interpretation. They include category-features, phi-features (gender, number), and wh-features. Given that they are not deleted, they remain, technically, available for further feature checking. 4. Once it was merged in (ia), the only possible continuation, given (18), is to move it further, as in (ib), which is why (ic) cannot be derived. 
The problem is why the convergent (ib) is in fact bad, and it cannot be any semantic defectiveness in this case. (Earlier considerations regarding the case of Max, which were
available in the principles-and-parameters model, are not usable in the same way in the MP.)
(i) a. e seems that [it is certain [Max to arrive]]
b. It seems that [t is certain [Max to arrive]]
c. *Max seems [that it is certain [t to arrive]]
For extensive discussion of this and related problems in chapter 4, see Ruys 1996.
5. Note that superraising cases violate the chain-condition, which Reinhart and Reuland (1993) argue is still indispensable for various cases of anaphora: the trace of Max forms a singleton chain, which is −R, which violates the requirement that argument chains are (headed by) +R.
6. There are independent reasons for why this is so, say, for why (31a) cannot be interpreted as (30b). As noted, a wh-phrase in SpecCP cannot take scope higher than this CP. Epstein (1992) argues that this too follows from economy, but in any case, this is a fact about wh-scope.
7. In Fox 2000, he modified his analysis of QR, though the basic intuition remains the same. For the present discussion it is Fox’s (1995) earlier version that is relevant.
8. No one account seems to capture all the contexts that have been observed over the years to allow superiority violations. There are two famous contexts that do not follow from the account above. First, there are the so-called D-linking cases of Pesetsky 1987, illustrated in (ib) (which is better than (ia)). Second, as observed by Kayne (1984), adding another wh-focus to the sentence usually improves superiority violations. For example, (ii) is better than (ia).
(i) a. */?What did who buy?
b. ?Which book did which man buy?
(ii) ?What did who buy where?
9. But the precise formal mechanisms for selecting an element into the numeration according to its future effect on the interface need further examination. Possibly the only explicit way to formulate these mechanisms will eventually entail reference-set computation.
Chapter 2
1.
A shorter version of this chapter, excluding section 2.7, appeared as ‘‘Quantifier Scope: How Labor Is Divided between QR and Choice Functions’’ in Linguistics and Philosophy 20 (1997): 335–397. Sections 2.2 to 2.6 are reprinted here with kind permission of Springer Science and Business Media. I would like to thank Danny Fox, Gennaro Chierchia, Irene Heim, Anna Szabolcsi, and especially Remko Scha for many helpful comments at various stages of this chapter. Yoad Winter’s contribution to the analysis of choice functions in section 2.6 goes far beyond his article on the matter (Winter 1997). Many of the ideas here were intensively discussed with him before they took their present form.
2. May (1977), who further developed and substantiated the idea of QR, argues that this operation is clause-bound, and extraction out of an embedded tensed
clause is ‘‘marked.’’ But, in fact, his judgments of scope correlate precisely with those on wh-movement. Movement out of an embedded tensed clause is known to be sensitive to the type of matrix verb, and only ‘‘bridge’’ verbs allow it. May took the bridge verbs to be the marked option. Discussing the availability of scoping the quantifier out of the embedded complements in (i) and (ii) (for the de re interpretation), May argues that it is impossible in (ia) but possible in (iia), and proceeds to label (iia) as ‘‘marked.’’ The crucial point is that his examples still show the correlation between wh-movement and QR: he argues that wh-movement is much worse in (ib) than in (iib)—a judgment largely accepted.
(i) a. John hissed that Smith liked every painting.
b. */?What did John hiss that Smith liked e?
(ii) a. John said that everyone left.
b. Who did John say e had left?
3. Historically, it was believed in the syntactic framework that QR is also needed independently of the question of nonovert scope, and applies to all quantifiers regardless of their scope, in order to allow a classical-logic interpretation for them. This belief, which is purely theory internal, was never, in fact, founded. (For example, no answer was given as to how quantifiers like most are interpreted within classical logic.) I assume throughout that quantifiers are interpretable in situ, as in the Montague tradition, and the only question is nonovert scope, for which a structure distinct from the overt structure was also assumed by Montague.
4. In the current—minimalist—stage of syntactic theory (Chomsky 1994), there is nothing peculiar to quantification in assuming covert movement (chain formation). The view is that a derivation of a sentence can be spelled out phonetically at any stage (subject to the relevant spell-out conditions). Languages may vary regarding where spell-out takes place, which is the source of word-order variation.
There is, therefore, nothing particularly puzzling about operations continuing at the covert structure, which may then affect the interpretation (though there is a serious question regarding what drives and restricts such operations).
5. It is possible to develop semantics that capture all scope construals compositionally in situ, as shown in Hendriks 1993. But why it should be restricted by syntactic islands remains a mystery.
6. Specifically, to check whether (11b) is needed, we have to find a situation in which (10) can be used, and that is not covered already by its construal under (11a). Since (11b) entails (11a), the only case where they can differ is if (11b) is false, while (11a) is true. So, to decide the matter we need to check a situation in which we want to use (10) falsely, construed under (11b), while in the same situation (10) is true under the construal (11a). Kempson and Cormack (1981) argued that it is possible to distinguish vagueness from ambiguity based on precisely such contexts. They proposed the negation test—for example, the denial of (10) in (i) takes (10) to be construed under (11b) and asserts that it is false.
(i) A: Every tourist read some guidebook.
B: It’s not true, there is no guidebook that every tourist read. / It’s not true, they could not possibly all have read the same book.
The delicacy of such tests is best attested by the fact that Kempson and Cormack argue that such denial is perfectly fine (for their equivalent examples), concluding that the sentence is indeed ambiguous rather than vague, while my gut reaction to (i) is that it is a pure instance of incoherent discourse. This, of course, does not prove that Kempson and Cormack are wrong. It just shows that it is extremely difficult to decide.
7. The proponents of QR have addressed this argument before, but in a peculiar way. Thus Ruys (1992) reports that both May (1977) and Huang (1982) construed the argument of Reinhart 1976 in precisely the opposite direction of what was claimed, and proceeded to argue (correctly) that this claim is obviously false. For convenience, I include the paragraph below from Reinhart (1976, 193–194) that May and Huang quote from (with original numbering).
(40) Everybody speaks two languages.
(41) (Ax) (E two languages y) (x speaks y)
(42) (E two languages y) (Ax) (x speaks y)
. . . Many linguists (particularly those working within a Montague-oriented approach) have argued for example that sentences like (40) are, in fact, ambiguous, contrary to the claim I have made here. However, this is, I feel, a relatively minor problem. In the first place, most putative examples of such ambiguities which are discussed in the literature are ones where one interpretation entails the other. E.g. the interpretation (42), in which there are two specific languages that everybody speaks, entails (41). So our intuitions distinguishing ambiguity and vagueness in these cases are less clear than in cases where the two interpretations are logically independent.
Both May and Huang quote only the first sentence of the second paragraph above, and proceed to argue that this argument does not apply to ‘‘the very case (40) at hand (Huang), since contrary to what Reinhart claims (41) does not entail (42). So Reinhart’s analysis does not allow her to derive all readings for sentence (40) (May 1977)’’ (quoted in Ruys 1992, 8–9). In view of the null intersection between the argument and its putative refutation, it would be difficult to answer the refutation in any precise way. I would like to stress, though, that both authors also offer more compelling arguments for their LF-analysis than those just cited. Interestingly, Ruys reports that precisely the same misconstrual of the argument is applied by Fodor and Sag (1982), against Cooper’s (1979) claim that the same reasoning could explain apparent island violations of existentials.
8. Rather, the arguments for specificity readings usually rely on the authors’ feelings regarding which previous discourse is more appropriate for each of the readings they propose for the sentence, or regarding the putative mental state of the speaker when uttering the sentence (e.g., the degree of his familiarity with the entity being discussed). All are, indeed, interesting and important pragmatic questions, but they are also highly undecidable.
9. Fodor and Sag’s actual example uses the reverse order of the conditional, as in (i) (their (73)).
(i) If a student in the syntax class cheats on the exam, every professor will be fired.
This order poses independent problems. If the argument is correct, it should also hold for the order in (18).
10. A more formal implementation of Fodor and Sag’s intuition is offered in Beghelli, Ben-Shalom, and Szabolcsi 1993. They apply the concept of a principal filter that, intuitively, enables the quantifier to ‘‘always talk about the same individuals.’’ While universal and definite NPs always denote principal filters, indefinites may do so as an additional reading. With this assumed, the apparent wide scope in the relevant cases follows, in an interesting way, as an entailment of the logical-scope construals of the sentence. Beghelli, Ben-Shalom, and Szabolcsi do not discuss the problem of intermediate readings. So it remains to be studied whether the analysis could be modified to handle those readings.
11. In the 1990s, several syntactic accounts for clause boundedness were proposed. In Hornstein’s (1995) elegant solution, wide scope of strong NPs does not require movement at all. Rather the scope is determined by which element of an A-chain gets deleted. However, since this allows wide scope only in A-chain environments, it allows ECM and raising subjects to take wide scope outside their clause, but it excludes wide scope out of infinitival clauses, which is widely believed to be allowed (see Ruys 1992 and Beghelli 1995 for the details). Beghelli’s (1995) account captures scoping out of infinitivals, and many other facts, but at the price of employing much heavier machinery than Hornstein’s.
The original account of clause boundedness in May 1977 was that covert movement, as opposed to overt movement, is by adjunction. Cyclic (overt) wh-movement is possible because the wh moves to the Comp of its S, thus crossing only one S-node, and then has to cross only one S-node on its way to the next Comp. But if a quantified NP adjoins to the S of its clause, it cannot move on to the next S, since it has to cross both this S-node and the next one, and two S-nodes are not allowed to be crossed by subjacency.
On this view it could be said that QR of strong QNPs obeys subjacency, but that of existentials does not. However, this solution would exclude wide scope not only out of infinitival clauses, but out of ECM and raising structures as well. It would also be difficult to maintain this view given the highly influential view of dominance developed in May 1985, since on that view, the S that an NP adjoins to is defined as not dominating it.
12. Extraposition is supposed to be clause-bound, but a growing consensus is that it does not exist as a syntactic operation.
13. Hornstein (1994) proposes a principle, inspired by Diesing (1992), that he labels ‘‘reference principle’’ and that has the effect that when NPs of the ‘‘specific’’ type (i.e., the set of existentials under consideration) occur in subject position, they always prefer wider scope than their competitors. (The principle is actually stated using somewhat different terminology.)
14. Currently, a leading hypothesis regarding VP-ellipsis is that it does not involve LF-copy, but rather some mechanism of actual deletion at PF, under a parallelism condition. However, it is hard to see how an equivalent can be developed for comparatives, without scoping out at least the degree operator. Even if a PF-deletion approach could be developed for the comparative cases, they still differ from VP-ellipsis, since the latter does not show island effects like those illustrated in (26c). Thus, such an analysis would have to involve some scoping of the correlate in the first comparison clause. For the except cases below, an ellipsis analysis is infeasible, anyway (see the next note).
15. I argue in Reinhart 1991 that an ellipsis approach (whether copy or deletion) is impossible for these structures, since it yields the wrong semantics. Example (ia) cannot be derived from anything like (ib). Even if we sneak in the negation mysteriously, as in (ic), (ic) is a contradiction, which (ia) is not.
(i) a. Everyone arrived, but Felix.
b. aEveryone arrived, but Felix arrived.
c. Everyone arrived, but Felix did not arrive.
(ii) Everyone but Felix arrived.
In its interpretation, (ia) is identical to (ii), where every-but is one constituent. It could appear that (ia) could be derived by some overt movement applied to (ii). However, I argue in Reinhart 1991 that if we look at the full distribution of except conjunctions, this turns out to be an impossible analysis. Current syntactic theory entails certain differences between overt and covert movement, since the overt one is via SpecCP, while the covert one is by adjunction. For example, while syntactic movement shows wh-islands, as in (iii), it has been noted (e.g., in Moltmann and Szabolcsi 1994) that there are no such islands with covert movement. Except conjunctions are, indeed, insensitive to wh-islands, as in (iv). The same is found with the movement of the correlate in comparatives, as in (v).
(iii) *About whom should I tell you what I think e?
(iv) I’ll tell you what I think about everyone, if you insist, except my boss.
(v) More people remember what was said by Jones than by Smith.
For this and other reasons, I argue that the semantic every-but constituent in such structures must be formed at the covert structure.
16. Roughly, the Empty Category Principle (ECP) states that if the moved constituent is not a complement of some predicate, there can be no syntactic barrier between it and its trace.
This entails that when an argument is extracted across a syntactic barrier (i.e., out of an island) it violates only subjacency, but when an adjunct is extracted from the same position it violates both subjacency and the ECP. Correspondingly, it is also assumed that (31b) is worse than (30b), since it violates two syntactic conditions.
17. As Chung and colleagues show, there are, in fact, two different instances of sluicing. The more familiar case is the one examined above. In the second case, illustrated in (i), the wh-remnant is an adjunct or an argument with no overt correspondent.
(i) a. If Sam was going, Sally would know where.
b. She is reading. I can’t imagine what.
For the interpretation of (i), Chung and colleagues introduce the operation ‘‘sprouting,’’ which completes the structure of the ‘‘recycled’’ clause, by an operation similar to ‘‘form-chain’’ (which, in effect, adds the missing trace for the wh-element to bind). The operation is subject to restrictions that are well defined and explained (in terms of argument structure). An impressive entailment of their analysis is that it follows that the two types of sluicing have a different syntactic distribution. The first type does not show island restrictions, since unselective binding is not sensitive to subjacency. The second
type, by contrast, is a syntactic process. Hence, it obeys all known restrictions on ‘‘form-chain,’’ including subjacency. (There can be no relevant barriers between the wh in the sluice and its reconstructed trace.) This is exemplified below (based on examples from Chung and colleagues (105)–(107)). (ii) a.
Agnes said that John could eat, but I don’t remember what (= but I don’t remember what Agnes said that John could eat t). b. *Agnes wondered how John could eat, but I don’t remember what.
(iii) a. It is likely that Tom will win, but it is not clear which race.
b. *That Tom will win is likely, but it is not clear which race.
(iv) *Clinton is anxious to find out which budget dilemmas Panetta would be willing to tackle, but he won’t say how.
18. I chose this framework since it lends itself easily to the type of solution I propose for the problems below. I leave it open whether the same can also be stated in the framework of Groenendijk and Stokhoff 1982.
19. Technically speaking, it is not, in fact, fully clear that the representation (41b) should allow the relevant answer to be (41c). It allows Donald Duck as a value for y, but the proposition in the denotation set may have to be (i). If this is so, however, (41b) would also disallow (ii) as a possible answer, while equally allowing both (i) and (iii) as answers, which is wrong enough.
(i) Lucie will be offended if Donald Duck is a philosopher and we invite him.
(ii) Lucie will be offended if we invite Kripke.
(iii) Lucie will be offended if Kripke is a philosopher and we invite him.
20. A reviewer pointed out that the analysis in (41) cannot be rescued by employing an actuality operator on the restriction, as say in (i).
(i) {P | ∃⟨x, y⟩ (P = λw ((we invite y in w and y is a philosopher in the actual world) → (x will be offended in w)) & true (P))}
The same problem remains in (i), namely, that any nonphilosopher in the actual world satisfies the implication.
21. It is fashionable nowadays to enrich both the semantic and the syntactic machinery by associating presuppositions with almost any type of NP. This line of thinking would attempt to face this problem by claims that wh-phrases carry presuppositions, and it is the presupposition of what philosopher that should somehow explain why the wrong answers obtained by the derivations above are excluded. Associating presuppositions with existentially quantified NPs is highly problematic within any of the familiar semantic systems, disabling basic entailments.
Semantically, wh-expressions are strictly existential (weak) NPs. This claim cannot even be evaluated under Karttunen’s analysis, which I assumed here only because it is most familiar, and because the fine details are irrelevant to my main argument. For example, if wh-expressions are weak, the question How many chairs in this room are broken? should be equivalent to the question How many broken chairs are there in this room? But under Karttunen’s analysis, the definition of P in the set of propositions will be entirely different for these questions, so equivalence cannot even be computed. However, this problem does not arise in the original analysis of Hamblin (1976), where the restriction is put inside the
definition of P. This captures correctly what I believe to be the semantic properties of wh-expressions. Hamblin himself is quite explicit about the issue of presuppositions belonging to pragmatics (see Hamblin 1976, 257). In any case, even if wh-expressions are always presuppositional, we would still need to know how, precisely, this ‘‘somehow’’ association between presuppositions and the wrong entailments under consideration is executed beyond just the hope that this problem could somehow be solved.
22. The authors do not specify exactly how unselective binding applies, but exemplify the intended output of its application.
23. For example, under the standard unselective binding proposed by Heim, (i) is assigned the representation (ii), which in fact the sentence cannot have. (As has been widely discussed, under this construal the sentence is true, if there are, say, 9 men who each buys 1 car without worshiping it, and one man who buys 100 cars and worships them all, since for most man-car pairs, the implication is true. But (i) cannot be true in this situation.)
(i) Almost every man who buys a car worships it.
(ii) Almost every ⟨x, y⟩ ((man (x) & car (y) & x buys y) → (x worships y))
(iii) Almost every ⟨x, Y⟩ (man (x) & Y = {z | car(z) & x buys z} & |Y| = |Y| = 1) → (x worships Y)
To avoid this, the representation must be that in (iii) (from Reinhart 1987, 144). What gets unselectively bound here is a set variable. Under this construal the implication must be true for most pairs of a man and the (maximal) set of all the cars he buys. So (i), construed as (iii), is correctly false in the situation described above.
24. This point was brought to my attention by Danny Fox.
25. Note that there was some equivocation in the way I discussed the syntactic implementations of unselective binding before. The idea is that the existential NP itself still remains in argument position.
But it was impossible to even represent this intention, assuming standard unselective binding, since the N′ that remains in situ is a predicate, not an argument. Hence, in the actual representation of what I thought was the intended interpretation, I moved the N-set quietly to an A-bar position in its clause, where it could be interpreted as a predicate. This is just an implicit instance of QR inside the IP containing the existential.
26. Ruys concludes that the apparent wide scope is not, in fact, real wide scope, but rather what he calls ‘‘nonscope.’’ Based on this conclusion, he has developed a mechanism for capturing scope and nonscope, using a system of superscripts. The mechanism needs to prevent indefinites from taking scope over other NPs, while still having this apparent wide scope. So a certain complexity of the machinery is entailed.
27. Note in the meantime that although it eliminates the problem in (63), the decision that numeral existentials cannot undergo scope-shift amounts to adding one more problem-specific stipulation to the many we have already accumulated. Couched in the QR-approach, we now would have the following list. (1) Existential GQs (or distributively interpreted existentials) do not move at all. (2) Strong
(universal) GQs move only in their clause. (3) Non-GQ (or Collective) existentials can move in an island-free way. (4) Overt whs can move outside the clause, but obey islands. Despite the rich machinery, this leaves out cases like (58), as well as many others I mentioned along the way, which I will keep for the time being under lists of exceptions. In section 7.2 I argue that no such stipulation is needed.
28. I have discussed here only the semantic approaches to the distributivity operator, as, for example, in Kamp and Reyle 1993. In the syntactic approach developed by Beghelli and Stowell (1995), this operator is generated as a syntactic node (a head of a special Distributive projection, which is generated below the Subject Agreement projection). It could perhaps be suggested that this head can further move, to allow for the relevant distributivity in (58), and it is this movement that obeys subjacency, though the movement of the existential does not. If the D is generated inside the if-clause in (62), it cannot move out to distribute over the whole λ-predicate of (63).
29. Abusch follows DRT in assuming that indefinites are restricted variables. These variables are stored at each cycle, together with their whole NP, until they reach the restrictive term of the operator that can bind them. Thus, the mechanism is equivalent to the storage mechanism, and it can capture everything that QR can. One of the problems that made storage unattractive was the fact that information about islands had to be built into it in an artificial way, defeating, in effect, the enterprise of showing that the process is semantic rather than syntactic. Abusch does not have this problem, since her analysis is restricted to existentials, which do not obey island restrictions. On the other hand, the question that this line of thought raises is why only weak-existential NPs get stored. Abusch attempts to derive this from semantics (essentially, from the open, or variable, nature of these NPs).
But Cooper has shown that, as far as semantics goes, strong NPs can be stored just as well. (Recall that Abusch is not just storing a variable—what is called in DRT a restricted variable, is, in fact, a full syntactic NP.) So, this appears to remain a stipulation. The crucial question, if this mechanism is equivalent to QR, is whether it can handle the semantic problems that QR cannot, which we observed in section 2.4. Abusch’s judgments of the facts of (62) and (63) are different from Ruys’s and mine, so this is not technically a problem for her analysis. But if the judgments as presented here and in Winter 1997 are correct, it is not obvious how her system could be extended to account for them.
30. In Engdahl’s (1980) analysis of wh-questions, she introduced an idea that appears similar to that of choice functions: she defined a function W that applies to a set and yields a subset of this set, and allowed the function variable to be existentially closed from a distance. However, what Engdahl intended to capture with this function was not the standard wide-scope problem (which she too considered nonproblematic), but rather anaphora problems as in (i) (Engdahl’s Swedish (47)–(48), p. 140) where, under one reading, the antecedent for the pronoun appears not to have scope over it (as she states the problem, on p. 141).
(i) a. Which of his books does every author usually recommend?
b. Which of his poems did Maja want an author to read?
In Engdahl 1986, she observed, correctly, a flaw in the analysis. She concluded that the procedures she proposed in 1980 cannot, in fact, handle the problem they were designed to capture, and abandoned this approach. I believe this was the correct move. The cases of (i), and, more generally, ‘‘functional readings’’ of questions are typically the inverse problem of what we are considering here: the existential is dependent on some other quantifier, although it appears to take wider scope than that quantifier. The simple choice functions I examine here are applicable strictly to the cases of independent (genuine) wide scope. For the ‘‘dependent’’ wide scope, some equivalent of the more complex Skolem function must be used (i.e., the choice of member must be relative to a choice of value for some other variable). Various implementations (apart from those of Engdahl herself, in 1986) are Chierchia 1993 and Kratzer 1998. (In Reinhart 1992, I made a similar mistake to that of Engdahl 1980, in assuming that simple choice functions could be extended to capture some subset of the anaphora problems of the Engdahl type.)
31. The aspects of Danon’s analysis that are crucial to the present problem are consistent with many other analyses (cited there). Danon offers evidence from Hebrew, where there are many syntactic tests to distinguish the D and Spec position of determiners. The D-head in (73) is not necessarily generated in that position—it may originate inside the NP and move to D. (Danon argues that, at least in Hebrew, this must be the case.)
32. Danon (1996) shows that in Hebrew, where the two positions are easily distinguishable, unmodified three can occur in both positions.
33. With some D’s, like which, a, and some, there is stronger motivation to assume they never occur in SpecDP, since they can never be modified. Syntactically, this may suggest that they are unable to project an XP.
The question of whether we want them to nevertheless allow a GQ-interpretation then becomes a purely semantic question. If maintaining this option is motivated, a covert existential (GQ) determiner may be assumed to be optional in SpecDP.
34. For the time being, the fact that the function in (74) must be of the choice type (rather than any arbitrary function) must be stipulated.
35. Of course, under this construal the sentences are consistent with there being more than one, or more than three women who chatted.
36. Scha assumes that predicates of natural language are always of the ⟨⟨e,t⟩,t⟩ type, and whether some such distribution is entailed is determined lexically by the type of the verb. However, a distributive operator can be defined for predicates of this type, as in (i), along the lines discussed in van der Does 1992.
(i) $D_{\langle\langle e,t\rangle,t\rangle\langle\langle e,t\rangle,t\rangle} \overset{\text{def}}{=} \lambda P_{\langle\langle e,t\rangle,t\rangle} .\, \lambda X_{\langle e,t\rangle} .\, X \neq \emptyset \wedge \forall y \in X\, (P(\{y\}))$
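As an informal illustration of the distributive operator in (i) of note 36, here is a minimal Python sketch over a finite toy model. Everything in it is an assumption made for the example: sets of individuals are modeled as frozensets, a predicate of sets is a Python function from a frozenset to a truth value, and the names `distribute`, `parents`, and `has_a_baby` are hypothetical. It is a sketch of the definition as stated there, not an implementation taken from Scha or van der Does.

```python
def distribute(P):
    """Return the distributive version of a set predicate P.

    P maps a frozenset of individuals to a bool. The result holds of a
    set X iff X is nonempty and P holds of every singleton {y}, y in X.
    """
    def distributed(X):
        X = frozenset(X)
        return len(X) > 0 and all(P(frozenset({y})) for y in X)
    return distributed


# Hypothetical toy model: the individuals who have a baby.
parents = frozenset({"ann", "bea", "cy"})


def has_a_baby(X):
    # Illustrative lexical predicate: true only of singleton sets whose
    # single member is one of the parents.
    return len(X) == 1 and X <= parents


# The undistributed predicate fails on a two-member set, while its
# distributed counterpart holds because it holds of each member.
print(has_a_baby(frozenset({"ann", "bea"})))
print(distribute(has_a_baby)({"ann", "bea"}))
```

On this toy model the first call is false and the second true, which is the contrast the operator is meant to capture: the predicate is made to hold of each member of the set rather than of the set as a whole.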
37. In examples like (i), from Winter, the preferred reading (under the wide-scope construal) is as given in (ii).
(i) If three workers have a baby soon, we will have to face some hard organizational problems.
(ii) There is a set of three workers such that if each of them has a (different) child, we face organizational problems.
The distributivity of a predicate like have a child cannot be reduced to some lexical property. So we must assume some distributive operator applying to this predicate, which makes it hold for each member of the set of three workers (selected in situ by the choice function).
38. As I have noted, in the studies cited it is assumed that their scope is, in fact, narrower than allowed by QR, a point we need not pursue here.
39. Deriving the collective interpretation of predicates of the first type, with a GQ subject, is a long-standing problem. Szabolcsi (1995) suggests (without actually distinguishing these two types) that in such cases, it is the predicate that denotes a collective set, though this still needs some spelling out.
40. Note that this goes against the judgments of Evans 1980, which I also defended in Reinhart 1986, where I argued that the antecedents for discourse anaphora must be defined on the intersection set. There, (ii) was judged as not equivalent to (i), since (ii) is consistent with there being cats that Lucie has and that Max does not take care of, while (i) is not. (But no special attention was given to the difference between five and at least five.)
(i) Lucie has five cats and Max takes care of them.
(ii) There are five cats that Lucie has and Max takes care of.
It is, in principle, possible to maintain the intersection view as a special property of discourse anaphora, without giving up the distinction under consideration.
41. Though the judgments are very clear, it is not fully clear what this is a test for. Possibly this is just a clearer way to show that NPs of the first type take a collective reading much more easily than the second. This result, however, is sufficient for the distinction I make below.
42. A witness set, as defined by Barwise and Cooper 1981, is any set in the denotation of the GQ, which is also a subset of its live-on set.
43. Its appeal is that it allows a unified treatment of all existentials as GQs and enables the choice-function procedure as a special operation on their witness sets.
44. There is some room for variation here. Danon argues that some modifiers may occur as modifiers of the whole DP, or NP, or as adjectives, in which case, the numeral may still be the head of the DP. The Hebrew equivalent of certain (in a certain student) occurs as an NP-modifier (student mesuyam).
45. This is shown in Lappin and Reinhart 1989, among other works.
46. This approach was brought to my attention by Remko Scha (personal communication). He reports that the way Bochvar (1939) defines external disjunction corresponds to this analysis of the existential.
47. When it is easy to construe the indefinite as a topic, we may get the undefined reading for it, which would follow, independently of the semantics of choice functions, along the Strawsonian line I mentioned. What is hard to get is the false reading. Note that in the system I assume, this reading can nevertheless still be generated, by applying QR to a GQ-construal of the indefinite (since no island interferes here). But, as mentioned, obtaining covert wide scope (by QR) for such GQs is, independently, extremely difficult.
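Looking back at the definition of a witness set in note 42, the following Python sketch enumerates witness sets over a small finite domain. The model is entirely hypothetical: the domain, the set `women`, the "at least two" reading given to `two_women`, and the helper names `powerset` and `witness_sets` are assumptions made for illustration, and a generalized quantifier is modeled simply as a function from a set to a truth value. It only spells out the wording of note 42 (a set in the denotation of the GQ that is also a subset of its live-on set).

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def witness_sets(gq, live_on, domain):
    """Sets in the denotation of gq that are subsets of its live-on set."""
    return [X for X in powerset(domain) if gq(X) and X <= live_on]


# Hypothetical toy model.
domain = frozenset({"a", "b", "c", "d"})
women = frozenset({"a", "b", "c"})


def two_women(X):
    # Assumed "at least two women" denotation: X contains two or more women.
    return len(X & women) >= 2


for w in witness_sets(two_women, women, domain):
    print(sorted(w))
```

In this toy model the set {a, b, d} is in the denotation of the quantifier but is not a witness set, since it is not a subset of the live-on set `women`. A choice function of the kind discussed in the text would then pick one member out of the collection of witness sets.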
48. I thank Mats Rooth and Remko Scha for help in formulating (98).
49. This statement of the underlying idea is from Reinhart 1983a, 197–198.
50. The problem is illustrated in (ia). It was noted in Reinhart 1976 that such NP-structures are the only context that systematically goes against the generalization of overt c-command scope. The most available (perhaps even the only possible) scope construal is with the lowest QNP (inside the NP) taking widest scope. This is seen even more clearly when there is further embedding, as in (ic), where the scope order is the inverse of the embedding order.
(i) a. Some gifts to every girl were wrapped in red paper.
b. Every gift to some girl was wrapped in red paper.
c. Some gift to every girl in two countries arrived on Christmas Eve.
I argued that such cases are governed by an independent mechanism, but, as pointed out in May 1977, the mechanism I suggested was pretty ad hoc. May argued that these structures, which he labeled “inverse linking,” are the strongest argument for a QR-view of scope. But in fact the QR analysis also cannot explain why the scope order should be the inverse of the embedding order (as in (ic)). Within the view of QR as a marked operation, these cases should be interpreted in situ to yield this result. But I still have to leave open the question of how this happens here.
51. The same question arises for approaches building QR into the numeration, as in Chomsky 1995, which I discussed in chapter 1. On that view, some feature like QUANT must be included in the numeration, to license QR. This functional feature will be allowed into the numeration only if it has an effect on the output, namely, if the interpretation obtained is not identical to what will be obtained without this movement. If this is so, there is, in fact, no concept of markedness. When an operation like QR is needed for interpretation, it ends up indistinguishable in status from any other economical operation.
Thus, we have no obvious explanation for the fact that quantifier scope outside the c-command domain was found, empirically, to be harder to obtain, and less common.
52. I thank Eddie Ruys for intensive discussions and comments on this section.
53. Linguists may have biased judgments on sentences that have an established judgment in the theory, but the sentences under consideration lend themselves easily to testing with nonlinguists, because they can be followed with questions about the number of objects or people participating. In examples like (i)–(iii), the topmost number allowed for the set denoted by the subject, if QR applies, is given in parentheses. Presenting a sentence like (ia), I asked: Assuming that there are five tables, how many tablecloths are there? For (iib) the question would be: How many doctors will examine patients? If the informant gives the narrow-scope answer—one—the next question would be: Could there be more—for example, ten doctors?
(i) a. A tablecloth covers every table. (Up to as many tablecloths as tables)
b. A doctor will examine every patient. (Up to as many doctors as patients)
(ii) a. A tablecloth covers two tables. (Up to two tablecloths)
b. A doctor will examine ten patients. (Up to ten doctors)
(iii) a. Two doctors will examine ten patients. (Up to twenty doctors)
b. Three men lifted two tables. (Up to six men)
For (ii), I was able to solicit a yes answer to the second question in all my informal testing. For (iii), it was impossible to convince the informants to consider the option that there were twenty doctors or six men involved. The same method was used in the examples below. Although my testing was in Hebrew, the area of semantic judgments of quantifier scope is not, to my knowledge, subject to variation between Hebrew and English.

Chapter 3
1. Cinque’s stress rule (10) (p. 244) still includes the formulation in (3) (p. 241), which assumes heads. It also includes an additional requirement that an asterisk on line N must correspond to an asterisk on line N-1. In his actual analysis, he starts with the next XP-cycle (e.g., VP), just like Halle and Vergnaud. But curiously, he omits the requirement that the cycle contain at least two asterisks, and he adds that “this simplification is crucial to obtain the correct results” (p. 244, note 7). Indeed, this omission enables the analysis to also work without the previous assumptions, which is why I think this is what he actually intended. In any case, I do not think that there is anything at stake here apart from whether the machinery can be reduced. And I assume that the way I present Cinque’s analysis is precisely equivalent, empirically, to his.
2. Zubizarreta points out (note 14) that the question of whether the prosodic phrase is determined semantically or syntactically does not have much empirical content. Selkirk argues that the intonational phrase must form a sense unit, where two constituents constitute a sense unit if they stand in a modifier-of-head or an argument-of-head relation. But the notions assumed in this definition—modifier, argument, and head—are, anyway, syntactic notions.
3. For this reason, it is highly unlikely that a syntactic feature such as [+stress] would be a helpful tool to account for stress phenomena. As we will see, it is also unlikely that [+focus] is the right way to encode focus. Both stress and focus are relational notions: stress never exists without unstressed material or focus without background. I will return to this point.
4. This is assumed under different wordings since Chomsky 1971 and Jackendoff 1972, but has recently gained more attention in work by Vallduvi, Engdahl, and Herman Hendriks.
5. To get more precise about this description, we need to know more about the product of spell-out (i.e., about the nature of PF, in the pair ⟨LF, PF⟩). There are two possible views of what PF is. The first is that it is just a sound string, the product of all spell-out procedures. In this case both a PF- and an LF-representation are necessary to identify the focus. The other possibility is that like LF, PF is the full syntactic tree, derived up to the stage of spell-out, also representing further steps in the derivation required by spell-out operations like stress, erasure of features, and other phonological processes. It is a different question whether this PF-tree is just the syntactic tree, as argued by Cinque, or a tree consisting of phonological phrases. If PF is a derivational tree (under either of these
views), it is in principle possible that the focus rule applies solely at PF—that is, that it associates a set of possible foci with each PF.
6. In the specific example this is even clearer because the preceding context (Who is building a desk) uses a creation verb. The choice of example, from Reinhart 1995, was thus not accidental (or subconscious). Since there is no anaphoric destressing here, the derivation can be accounted for by using just one rule of main-stress shift.
7. The way this is obtained is addressed in the next subsection.
8. Schwarzschild (1999) argues that givenness, rather than anaphoricity, is the notion relevant for focus identification. Under his definition of givenness, the uninformative bare indefinites of the type we are considering can be viewed, in fact, as given. Roughly, a constituent is given by Schwarzschild’s definition if it has a discourse antecedent that, after existential closure, entails it or, in the case of referring expressions, is identical to it. It can be argued that the existential closure of anything or something (e.g., that something exists) is given in any context. While technically this is a possible definition, I do not believe it correlates with our actual perception of what is given or anaphoric in a given discourse. However, the crucial problem with this view is that in fact, anaphoric elements and bare indefinites have very different stress patterns, which could not be captured if they are both handled by anaphoric destressing.
9. Williams (1997) acknowledges the central role of anaphora in stress. However, he follows the tradition of viewing anaphora and focus as one unified problem. For him, the whole issue of focus is an instance of anaphora.
10. One case where we can see the OCP (or some other principle with similar effect) at work is when two words are placed one after the other in such a way that the first word has word-final stress, such as thirtéen, and the second word has word-initial stress, such as mén. This would result in adjacent stresses, as in *thirtéen mén, but that is not what we get. Rather, word stress is shifted within the first word and the outcome is thírteen mén.
11. Note that the Obligatory Contour Principle (OCP) requires only that there are no adjacent S-nodes. So it does not rule out this option of a W-sequence. In case of very long W-sequences, further accentuation devices may apply to create a contour, but this is not enforced by the OCP.
12. The standard assumption is that the potential scope of only is just its c-command domain, where it selects the focus as its scope.

Chapter 4
1. The bulk of this chapter appeared as “Strategies of Anaphora Resolution” in Hans Bennis, Martin Everaert, and Eric Reuland, eds., Interface Strategies (Amsterdam: Royal Netherlands Academy of Arts and Sciences, 2000). I would like to thank Danny Fox and Yoad Winter for comments on and discussion of the earlier draft.
2. A standard assumption since the 1980s is that while processing sentences in context, we build an inventory of discourse entities, which can serve further as antecedents of anaphoric expressions (McCawley 1979; Prince 1981; Heim 1982).
3. For further—conceptual—problems see Ristad 1992.
4. Syntactic coindexation is just a technical device, with no psychological reality. It was never actually necessary for anaphoric binding (as opposed to movement). It was assumed only in order to capture uniform patterns of movement and anaphora. That identical results can be captured by direct translation of unindexed pronouns as variables bound by λ-operators was argued, for example, in Reinhart 1983a, 159–160.
5. In that framework (following Chomsky 1981), α A-binds β iff α binds (is coindexed with and c-commands) β and α is in an argument position. For example, in the LF (ib) the trace A-binds the pronoun by the syntactic definition, while by (11), every boy does, if the pronoun is construed as in (ic).
(i) a. Every boy loves his mother.
b. Every boy_i [t_i loves his_i mother]
c. Every boy (λx (x loves x’s mother))
6. Condition C states that an R-expression (i.e., any NP that is not a free variable) cannot be bound. This may seem superfluous if binding is defined as in (11), since, as we saw, binding is excluded here anyway, by logical syntax. However, as mentioned above, binding is used differently in that framework: being bound is defined as being coindexed with a c-commanding NP. When one NP c-commands the other, covaluation is an instance of syntactic binding. Condition C thus correctly blocks the wrong construals of (8) and (10).
7. As noted in Reinhart 1983a, covaluation where Condition B blocks binding is much harder to find than covaluation in Condition C environments. For example, in both (ia) and (ib), the intended construal of the VP-predicate is as in (ii). Still, (ib) is harder than (ia).
(i) a. At the end, only Max voted for Max.
b. At the end, only Max voted for him (& him = m).
(ii) Only Max (λx (x voted for Max))
The reason suggested there is that (ia) is a more explicit way to express the predicate in (ii). Example (ib) requires the extra task of identifying the value of the pronoun.
8. A predecessor of this view was Dowty 1980. Dowty proposed that the underlying principle was “avoid ambiguity.” The scope of his proposal was instances of what is known today as Condition B: the choice of the reflexive anaphor in (ia) yields an unambiguous anaphora interpretation, while the pronoun in (ib) would allow, in principle, both an anaphoric and nonanaphoric interpretation. Hence, if anaphora is our intended interpretation, only (ia) will be selected.
(i) a. Max adores himself.
b. Max adores him.
I argued that the preference is not restricted to these environments, and generally bound anaphora is preferred over coreference—that is, the relevant principle governs not only (i) but also the environments that fall under what is known today as Condition C. But in these environments “avoid ambiguity” is irrelevant, since a pronoun is always ambiguous. I proposed instead that binding is the most explicit
way to express anaphora even if ambiguity cannot be avoided. The economy condition that I proposed to capture both was: be as explicit as the conditions permit.
9. A reviewer pointed out that the argument that variable binding is more economical than coreference may be theory dependent: it depends on how pronouns are interpreted in the theory. If we follow Heim and Kratzer 1998 and assume that free (and bound) pronouns are just assigned a value directly when they are encountered, based on the contextually given assignment function g, then it is not actually the case that the interpretation of the VP in (33) (on the covaluation reading) involves an open property. Rather, the meaning of the VP is just λx (x loves g(his) mother). On this view, then, the relevant economy principle would have to be something like “avoid invoking the contextual assignment function.” This may seem an arbitrary principle. But it is less so within the approach of Reuland 2001 to this problem, to which I turn shortly.
10. Rule I of Grodzinsky and Reinhart 1993 is given below, with covalued replacing corefer in the original formulation.
(i) NP A cannot be covalued with NP B if A could not be bound by B, and replacing A at LF with a variable bound by the trace of B, yields an indistinguishable interpretation. (Grodzinsky and Reinhart 1993, note 13)
Independently of the question of whether it is economy that explains Rule I, this formulation had another problem, which is now addressed. Rule (i) still assumes the syntactic definition of binding: α binds β iff α and β are coindexed, and α c-commands β. (Hence, binding obtains between the trace of the λ-argument and another variable.) Under this formulation, Rule I could not, in fact, rule out cases of strong crossover, such as (ii) (which were treated there on a par with weak crossover, as in (iii)).
(ii) a. Who did he say that we should invite t?
b. who (λx (x said that we should invite x))
(iii) Who did his mother spoil t?
Recall that in that system there is no Condition C. So nothing can rule out independently coindexation of he and the trace in (iia). The first condition of (i) thus is not met in (ii), hence Rule I does not apply here. The interpretation (iib) was ruled out by the translation definitions that entailed that a pronoun is a bound variable iff its binder is in an argument position (A-binder, under the previous definition of binding) (Grodzinsky and Reinhart, (15c)). This equally disallows binding of the pronoun in (iia) and (iii). The striking difference in terms of acceptability of weak and strong crossover violations was lost in that system. Under the present definition, for he to bind the trace means that the trace should be bound by the VP λ-operator whose sister is he, which is impossible, since the trace is already bound. So (ii) is subject to Rule I, and the special status of strong crossover is restored. Note, again, that this is independent of the issue of economy. Replacing the definition of binding would give the right results here also under the economy view of Rule I.
11. In the contexts of (37) binding is impossible (e.g., *His mother loves everyone). This is attributed to a “weak-crossover” generalization, which (as before) does not
follow from anything discussed in this paper. Without clause (a), it may seem that clause (b) of Rule I would rule out the covaluation construal in (37). In the long run, clause (a) may turn out just a reflex of a more semantic property: when a variable is A-bound by a c-commanding argument, this always reduces the number of open properties. A-binding by a non-c-commanding antecedent (which is created by QR) is vacuous, in the sense that it does not reduce open properties. To see this, let us compare (i) and (ii).
(i) a. He loves his mother.
b. [IP x [VP λy (y loves z’s mother)]]
c. [IP x [VP λy (y loves y’s mother)]]
In the IP of (ib), there are two open properties: the lower VP and the IP itself, which both contain a free variable. If we bind the free variable z, as in (ic), the VP can be closed (i.e., form a set), and we are left with just one property open—the IP.
(ii) a. His mother loves him.
b. [IP x’s mother [VP λy (y loves z)]]
c. [IP x (λz (z’s mother [VP λy (y loves z)]))]
In (iia), there are two open properties, as before. But if we A-bind z to x, or x to z (technically obtainable by QR), we get (iic), which has precisely the same number of open properties: the VP is still open, and so is the IP. So, possibly, Rule I can be stated to disallow covaluation of α and β if α cannot A-bind β nonvacuously, and covaluation is nevertheless equivalent to nonvacuous binding. Since no equivalent nonvacuous binding exists here, clause (c) never has to be checked in such (weak-crossover) structures—that is, covaluation is always allowed. On this view, it may turn out that the generalization behind the weak-crossover restriction is something like “avoid vacuous binding.”
12. Technically, this specific derivation is also ruled out because the lower x is bound, and cannot be bound again. This result always obtains when two bound variables are covalued under c-command. Thus, Rule I happens to rule out (ib) as a possible anaphora construal of (ia).
(i) a. Everyone/Max said that he loves his mother.
b. Everyone/Max (λx (x said that x (λz (z loves x’s mother))))
c. Everyone/Max (λx (x said that x (λz (z loves z’s mother))))
The difference between (ia) and (41a) is that (ia) still allows the bound-anaphora construal in (ic), which in (41a) will be ruled out by Condition B.
13. As I mentioned, the discussion in this section (4.3.3) is independent of whether we assume the interface Rule I, or the modified Condition C (29). The two would differ only in cases where covaluation and binding yield distinct interpretations. One such instance is the difference between (i) and (ii).
(i) Max, only he (himself) can stand e & (he = Max)
(ii) *Only Max (himself), he can stand e & (he = Max)
The modified Condition C rules (ii) out successfully: as in (26) or (45a), Max cannot bind he here (since this leads to illicit covaluation of he and the trace). But
it would rule out (i) in precisely the same way. For Rule I, (i) is permitted for the same reason as in previous examples with only. In (iii), if Max binds he, we get the covaluation (iiib). The sets denoted by the lower λ-predicate in (iiib) and its binding comparison (iiic) are different. Hence, we get different interpretations if only x belongs to these sets.
(iii) a. Max (λx (only he (λy (y can stand x)) & he = Max))
b. Max (λx (only x (λy (y can stand x))))
c. Max (λx (only x (λy (y can stand y))))
(iv) a. Only Max (λx (he (λy (y can stand x) & he = Max)))
b. Only Max (λx (x (λy (y can stand x))))
c. Only Max (λx (x (λy (y can stand y))))
In (ii), only occurs with the top argument—(ii) asserts that only Max is in the λx set. In this case, the sets denoted by the λx-predicate are identical in (ivb) and the binding comparison (ivc)—that is, (ivb) and (ivc) are equivalent. Hence covaluation is disallowed.
14. The locality solution to the problem of this section has also been proposed under different formulations by Ben-Shalom (1996) and, according to Fox, by Kehler (1993).
15. Note that the shift I made here from viewing Rule I as an economy principle to a different sort of cooperation strategy (“Minimize interpretative options”) does not affect the processing complexity of the procedure. Under either view, two representations must be compared.

Chapter 5
1. Children reject utterances such as Every boy is riding an elephant in contexts where adults would judge them to be true—for example, three boys are riding three elephants and there is an additional elephant (Inhelder and Piaget 1964; Philip 1995; also cf. Crain et al. 1996). Another factor argued to exist is children’s aversion to wide-scope indefinites (Krämer 2001).
2. Large parts of this section appear in my article “Processing or Pragmatics? Explaining the Coreference Delay,” in Edward Gibson and Neal Pearlmutter, eds., The Processing and Acquisition of Reference, MIT Press, forthcoming.
3. Note, however, that (5a), as stated here, does not rule out cases of weak crossover, because it does not specify at which stage of the derivation c-command should hold. In His mother loves every boy, every boy can c-command the pronoun after QR, and (5a) will not rule out binding in this derivation. The covaluation conditions I turn to would also not rule this out. As is standard, I assume now that weak crossover is handled by a different generalization. (In my earlier work I assumed that c-command must hold at the overt structure, hence the same condition also rules out weak crossover.)
4. As noted in Reinhart 1983a, coreference where Condition B blocks binding is much harder to find than coreference in Condition C environments. For instance, in the context of (9a), it would be more natural to express the idea with When we counted the ballots . . . only Felix had voted for Felix. The reason suggested there is
that using the full proper name is the more explicit way to capture the intended meaning. (The pronoun requires the extra task of identifying its value.) Nevertheless, examples like (9) are possible, with effort.
5. As we saw in chapter 4, within this view of economy, Rule I could be stated without clause (b) of (11), as in (i), which is essentially how it was viewed in Grodzinsky and Reinhart 1993 (modulo technical changes introduced in Reinhart 2000).
(i) α and β cannot be covalued if
a. α is in a configuration to A-bind β, and
b. The covaluation interpretation is indistinguishable from what would be obtained if α A-binds β.
6. Thornton and Wexler (1999) also try to provide empirical counterevidence to Grodzinsky and Reinhart’s claim that it is the complexity of the computation that is responsible for children’s difficulties. This is based on the assumption that there are other areas of anaphora that involve equally complex computations, and still, they pose no difficulties to children. As Thornton and Wexler put it, “Indeed, there are several empirical findings in the literature showing that for many complex structures, children can hold two representations in memory and compare them for the purposes of computing the reference of a pronoun” (p. 46). However, of the two examples of such complex computations they discuss one does not, in fact, involve any reference-set computation, nor can it be argued that it poses a comparable complexity of computation. The example concerns discourse anaphora as in (i) (Thornton and Wexler’s (39), p. 46).
(i) a. No mouse/every mouse came to Simba’s party. He wore a hat.
b. A mouse came to Simba’s party. He wore a hat.
The pronoun can refer to the indefinite in (ib), but not to the quantified DP in (ia).
Thornton and Wexler cite experiments of Crain and of Conway that found that children performed almost adultlike on anaphora tasks in such sentences, and they conclude that this is despite the fact that “clearly children must be able to hold both sentences in memory in order to apply the relevant constraint” (p. 47). It is not obvious why this is so clear. I am not aware of an analysis that requires a reference-set computation in such tasks, and if one exists, it is unmotivated. In this case, there is even no need to hold two representations in memory at all. It has been established in DRT (and other frameworks) that indefinites introduce a discourse referent that can be picked up in subsequent discourse, (ib), while quantified NPs normally do not (special circumstances, absent in (ia), aside). This generalization can be stated under many theoretical formulations, but the task involved is establishing which item in the discourse referents’ storage is available for the pronoun to get its value from. The mouse entity is available in this storage for (ib) but not for (ia). In any case, the task requires looking at the discourse storage, rather than retaining two representations, let alone comparing them.
Thornton and Wexler’s other example regards Condition B. They cite experiments by Crain (1991) and Thornton (1990), which checked sentences like (iia).
(ii) a. I know who scratched him—Bert.
b. Every turtle scratched him and Bert did too.
Children correctly rejected (iia) if Bert was shown scratching himself, which suggests that they had no difficulty processing the sentence. Note, first, that this is not a new instance of anaphora involving a comparison of representations, but the same one we have been dealing with. Thornton and Wexler view (iia) as an instance of ellipsis: a VP needs to be copied or reconstructed for Bert. If so, this is precisely equivalent to the type of task in (iib), which was the focus of their own experiments (although they did not experiment precisely with sentences like (iib), but rather with the reverse ordering of the conjuncts, which, they note, may have been an oversight). In principle, if the construal of the first conjunct of (iib) as λx (x scratched Bert) is contextually activated (e.g., with a picture of people scratching Bert), children should have difficulty ruling out the second conjunct. Given that the same was not found in (iia), it might be necessary to examine more closely the contexts used in the experiment. In any case, in this example, Thornton and Wexler’s analysis and the reference-set analysis have precisely the same predictions. So if this experiment poses a problem, it is a problem for both.
7. In the reported experiments, children allowed coreference in (23) in 37.5 percent of the cases, which was not significantly different from their performance on Condition B violations.
8. In fact, Thornton and Wexler make a stronger claim that “backtracking in order to reconsider the interpretation of the pronoun, is not likely to be within the parser’s capacity, either for children or for adults” (p. 107). This cannot be true since the adults’ parser can clearly deal with backward anaphora, as well as with the apparent Condition C violations permitted by Rule I, such as (16) above, or Evans’s (i).
(i) I know what Ann and Bill have in common. She thinks that Bill is a genius, and he thinks that Bill is a genius.
Needless to say, if Thornton and Wexler’s generalization also holds for adults, there is very little evidence for Condition C in right-branching languages, since most of what it rules out would also be ruled out by the special parser limitation.
9. As explained in Fox 1995, an ellipsis context is not sufficient in itself to license coreference by Rule I. Thus, in the reverse order in (i), coreference in the first conjunct is not allowed, even though this would enable the interpretation that both the kiwi bird and Flash Gordon himself cleaned Flash Gordon.
(i) He cleaned Flash Gordon, and the kiwi bird did too.
However, the prohibition stated by Fox is (roughly) against letting future discourse affect the processing of a given derivation. In (28), by contrast, the computation of Rule I applies after the relevant predicate has been formed already in the previous context. That apparent Condition C violations are possible in the given ellipsis context has been noted before. Thornton and Wexler mention that Fiengo and May (1994) suggested for sentences like Mary likes John and he thinks that Sally does too that an operation of “vehicle change” (roughly) changes the status of John in the second conjunct to that of a pronoun. As Thornton and Wexler point out, however, this would not work for the local context of (28), where a pronoun is ruled out as well. (Fiengo and May argue that sentences like (28) are indeed ruled
Notes to Pages 225–256
313
out, but given the adults’ answers in the experiment—they accepted it 83 percent of the time—this cannot be true.) The account Thornton and Wexler o¤er for why coreference is permitted in (28) is that since the pronoun is stressed, it is taken as a di¤erent guise of Flash Gordon. Hence this is an instance of coreference under di¤erent guises. 10. Thornton and Wexler’s example is (i), along with a similar example with the predicate vote for him. (i) You know what Mary, Sue and John have in common? Mary admires John, Sue admires John, and he admires him too. 11. This either/or condition is reminiscent of the original Principle P that Chien and Wexler (1990) o¤ered to account for coreference. Assuming that binding conditions B and C always enforce contraindexing, the principle says that ‘‘contraindexed NPs are noncoreferential unless the context explicitly forces coreference.’’ In other words, contraindexed NPs are either coreferential or not, depending on unspecified context considerations. 12. This is in general the case with interface reference-set computation. As we saw in previous chapters the same computation is found with QR and stress-shift for focus. In these cases as well, the computation needs to be carried out only if the relevant interpretation is considered. This means, for example, that not all interpretations of quantifier scope are equally complex. To compute whether in (i) a woman can take scope over every bear no special computation need apply. (i) A woman washed every bear. But if the option of a QR-interpretation is considered (wide scope for every bear), a semantic reference set needs to be constructed, so the computation is more costly. 13. In the VP-ellipsis sentences like Bert brushed him and the tin man did too, the coreference acceptance rate was 43 percent. Their combined rate is 50.5 percent. Usually, the more results that are combined in a chance pattern performance, the closer it gets to precisely 50 percent. 14. 
Large parts of this section appeared in my article ‘‘The Processing Cost of Reference Set Computation: Acquisition of Stress Shift and Focus,’’ Language Acquisition 12/2 (2004). It is partially based on joint work in progress with Kriszta Szendro˝i, whose constant input and insight have shaped its development. 15. In the present discussion I do my best to abstract away from anaphoric e¤ects, which should be helped by the use of a verb of creation, like build. 16. Halbert and colleagues also found di¤erent results when they checked a verb with two obligatory internal arguments like give, and the findings were not further tested and clarified. 17. Zuckerman and colleagues found a much higher failure rate (45 percent). They suggest that this may have to do with factors of the experimental design, such as the fact that stressed and unstressed sentences were tested in the same session. 18. Zuckerman, Vasic´, and Avrutin (2001, 791) o¤er a di¤erent explanation for children’s di‰culties in both the switch-reference cases and Condition B. They
314
Notes to Pages 256–272
argue that children at first misclassify pronouns as anaphors, namely, expressions that can be bound locally and cannot carry stress. Therefore, an encounter with a stressed pronoun ‘‘creates confusion which leads either to chance performance or to ignoring the stress altogether and performing as if these sentences include an unstressed pronoun.’’ It is not obvious to me, why, on this account, children do not have a more general problem with anaphora resolution with pronouns, given that pronouns occur all over the place in positions that disallow anaphors. 19. There is one area where this analysis di¤ers empirically from that proposed in Zuckerman, Vasic´, and Avrutin 2001. They argue that the contrast in (i) shows that the neutral-stress derivation must be the basis for the computation of contrastive pronouns. (i) a. John hit Bill and then Mrs. Smith punished him. b. John hit Bill and then Mrs. Smith punished him. On their intuition, in (ia) parallelism is canceled due to the special context, so him must refer to John. Consequently the switch reference rule must reverse this construal, so in (iib) him cannot refer to John. If so, the switch reference rule must consider what the anaphora construal of the neutral derivation is. Here we di¤er in judgment. For me, either of the derivations allows both anaphora construals. As I argued, such sentences do not fall at all under the parallelism constraint. 20. For Reinhart (1983a), the coreference rule was based on the Gricean maxim of manner: be as explicit as the conditions permit. Levinson (1987) argued that it is a standard instance of scalar implicatures based on the maxim of quantity.
References
Abusch, Dorit. 1994. The scope of indefinites. Natural Language Semantics 2(2):83–135. Adams, A.-M., and S. E. Gathercole. 1996. Phonological working memory and spoken language development in young children. Quarterly Journal of Experimental Psychology 49A:216–233. Akmajian, A., and R. Jackendoff. 1970. Coreferentiality and stress. Linguistic Inquiry 1:124–126. Altman, G. T. M., and M. Steedman. 1988. Interaction with context during human sentence processing. Cognition 30:191–238. Ariel, Mira. 1990. Accessing Noun Phrase Antecedents. London: Routledge. Avrutin, Sergey. 1994. Psycholinguistic Investigations in the Theory of Reference. Doctoral dissertation, MIT. Baauw, S., E. Ruigendijk, and F. Cuetos. 2003. The interpretation of contrastive stress in Spanish-speaking children. In J. van Kampen and S. Baauw, eds., Proceedings of GALA 2003. LOT, Occasional Series 2. Utrecht, The Netherlands. Baddeley, A. D. 1986. Working Memory. Oxford: Oxford University Press. Baert, J. L. 1987. Focus, Syntax, and Accent Placements. Dordrecht, The Netherlands: ICG Printing. Baker, Carl L. 1970. Notes on the description of English questions: The role of an abstract question morpheme. Foundations of Language 6:197–219. Barwise, J., and R. Cooper. 1981. Generalized quantifiers and natural language. Linguistics and Philosophy 4:159–219. Beghelli, Filippo. 1993. A minimalist approach to quantifier scope. In Amy J. Schafer, ed., Proceedings of NELS 23, University of Ottawa. Amherst: NELS/University of Massachusetts, Amherst. Beghelli, Filippo. 1995. The Phrase Structure of Quantifier Scope. Doctoral dissertation, UCLA. Beghelli, Filippo, Dorit Ben-Shalom, and Anna Szabolcsi. 1993. When do subjects and objects exhibit a branching reading? In Erin Duncan, Donka Farkas, and Philip Spaelti, eds., Proceedings of the Twelfth West Coast Conference on Formal Linguistics. Stanford, Calif.: CSLI Publications.
Beghelli, Filippo, and Timothy Stowell. 1995. Distributivity and negation: The syntax of each and every. Unpublished manuscript, UCLA. Beghelli, Filippo, and T. Stowell. 1997. The syntax of distributivity and negation. In A. Szabolcsi, ed., Ways of Scope Taking, 71–108. Dordrecht, The Netherlands: Kluwer. Ben-Shalom, Dorit. 1993. Object wide scope and semantic trees. In A. Lahiri, ed., Proceedings of Semantics and Linguistic Theory (SALT) 3, 19–37. Ithaca, NY: CLC Publications/Cornell University. Ben-Shalom, Dorit. 1996. Dependent and independent pronouns. In Semantic Trees, chap. 2. Doctoral dissertation, UCLA. Berwick, Robert, and Amy Weinberg. 1984. The Grammatical Basis of Linguistic Performance. Cambridge, Mass.: MIT Press. Bochvar, D. A. 1939. On a three-valued logical calculus and its application to the analysis of contradictories. Matematičeskij Sbornik 4:287–308. Quoted in Susan Haack, Deviant Logic: Some Philosophical Issues. Cambridge: Cambridge University Press, 1974. Bolinger, Dwight. 1972. Accent is predictable (if you’re a mind-reader). Language 48:633–644. Braine, M., and B. Rumain. 1981. Children’s comprehension of “or”: Evidence for a sequence of competencies. Journal of Experimental Child Psychology 31:46–70. Breheny, Richard, Napoleon Katsos, and John Williams. Forthcoming. Are generalised scalar implicatures generated by default? An on-line investigation into the role of context in generating pragmatic inferences. Cognition. Carston, R. 1998. Informativeness, relevance, and scalar implicatures. In R. Carston and S. Uchida, eds., Relevance Theory: Applications and Implications. Amsterdam: Benjamins. Chien, Y.-Ch., and K. Wexler. 1990. Children’s knowledge of locality conditions in binding as evidence of the modularity of syntax and pragmatics. Language Acquisition 1:225–295. Chierchia, Gennaro. 1993. Questions with quantifiers. Natural Language Semantics 1:181–234. Chierchia, Gennaro. 2004.
Scalar implicatures, polarity phenomena, and the syntax/pragmatics interface. In A. Belletti, ed., Structures and Beyond. Oxford: Oxford University Press. Chierchia, G., S. Crain, M. T. Guasti, A. Gualmini, and L. Meroni. 2001. The acquisition of disjunction: Evidence for a grammatical view of scalar implicatures. In A. H.-J. Do, L. Domínguez, and A. Johansen, eds., Proceedings of the 25th Annual Boston University Conference on Language Development, 157–168. Somerville, Mass.: Cascadilla Press. Chierchia, G., and M. T. Guasti. 2000. Backwards vs. forward anaphora: Reconstruction in child language. Language Acquisition 8(2):129–170. Chomsky, Noam. 1971. Deep structure, surface structure, and semantic interpretation. In D. D. Steinberg and L. A. Jakobovits, eds., Semantics: An Interdisciplinary Reader in Philosophy, Linguistics, and Psychology, 183–216. Cambridge: Cambridge University Press. Chomsky, Noam. 1973. Conditions on transformations. In Stephen Anderson and Paul Kiparsky, eds., A Festschrift for Morris Halle, 232–286. New York: Holt, Rinehart and Winston. Chomsky, Noam. 1975. Questions of form and interpretation. Linguistic Analysis 1:75–109. Chomsky, Noam. 1976. Conditions on rules of grammar. Linguistic Analysis 2:303–351. Reprinted in Noam Chomsky, Essays on Form and Interpretation. New York: North Holland, 1977. Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht, The Netherlands: Foris. Chomsky, Noam. 1986. Barriers. Cambridge, Mass.: MIT Press. Chomsky, Noam. 1992. A Minimalist Program for Linguistic Theory. Cambridge, Mass.: MIT Working Papers in Linguistics. Chomsky, Noam. 1994. A minimalist program for linguistic theory. In Kenneth Hale and Samuel Jay Keyser, eds., The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, 1–52. Cambridge, Mass.: MIT Press. (Reprinted as chap. 3 in Noam Chomsky, The Minimalist Program. Cambridge, Mass.: MIT Press, 1995.) Chomsky, Noam. 1995. The Minimalist Program. 
Cambridge, Mass.: MIT Press. Chomsky, Noam. 2000. Minimalist inquiries: The framework. In R. Martin, D. Michaels, and J. Uriagereka, eds., Step by Step, 89–156. Cambridge, Mass.: MIT Press. Chomsky, Noam. 2001. Derivation by phase. In M. Kenstowicz, ed., A Life in Language, 1–52. Cambridge, Mass.: MIT Press. Chomsky, Noam. 2005. Three factors in language design. Linguistic Inquiry 36(1):1–22. Chomsky, Noam, and Morris Halle. 1968. The Sound Pattern of English. New York: Harper and Row. Chomsky, Noam, and Howard Lasnik. 1993. The theory of principles and parameters. In Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld, and Theo Vennemann, eds., Syntax: An International Handbook of Contemporary Research, vol. 1, 506–569. Berlin: Walter de Gruyter.
(Reprinted in Noam Chomsky, The Minimalist Program. Cambridge, Mass.: MIT Press, 1995.) Chung, S., W. Ladusaw, and J. McCloskey. 1994. Sluicing and logical form. Natural Language Semantics 3(3):239–282. Cinque, Guglielmo. 1993. A null theory of phrase and compound stress. Linguistic Inquiry 24(2):239–298. Collins, Chris. 1994. Economy of derivation and the generalized proper binding condition. Linguistic Inquiry 25(1):45–61. Collins, Chris. 1997. Local Economy. Cambridge, Mass.: MIT Press.
Cooper, R. 1979. Variable binding and relative clauses. In Franz Guenthner and S. J. Schmidt, eds., Formal Semantics and Pragmatics for Natural Languages, 131–170. Dordrecht, The Netherlands: Reidel. Cooper, R. 1983. Quantification and Syntactic Theory. Dordrecht, The Netherlands: Reidel. Crain, S. 1991. Language acquisition in the absence of experience. Behavioral and Brain Sciences 14:597–611. Crain, S., and H. Hamburger. 1992. Semantic knowledge and NP modification. In R. Levine, ed., Formal Grammar: Theory and Interpretation, vol. 2, 372–401. Vancouver: University of British Columbia Press. Crain, S., and D. Lillo-Martin. 1999. An Introduction to Linguistic Theory and Language Acquisition. Oxford: Blackwell. Crain, S., and Cecile McKee. 1986. Acquisition of structural restrictions on anaphora. In S. Berman, J.-W. Choe, and J. McDonough, eds., Proceedings of the North Eastern Linguistic Society 16, GLSA, 94–110. Amherst, Mass.: GLSA. Crain, S., W. Ni, and L. Conway. 1994. Learning, parsing, and modularity. In C. Clifton, L. Frazier, and K. Rayner, eds., Perspectives on Sentence Processing, 443–467. Hillsdale, N.J.: Erlbaum. Crain, S., and M. Steedman. 1985. On not being led up the garden path: The use of context by the psychological parser. In D. Dowty, L. Karttunen, and A. Zwicky, eds., Natural Language Parsing: Psychological, Computational and Theoretical Perspectives, 320–358. Cambridge: Cambridge University Press. Crain, S., and R. Thornton. 1998. Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics. Cambridge, Mass.: MIT Press. Crain, S., R. Thornton, C. Boster, L. Conway, D. Lillo-Martin, and E. Woodams. 1996. Quantification without qualification. Language Acquisition 5:83–153. Crain, S., and K. Wexler. 1999. A modular approach to methodology. In William Ritchie and Tej Bhatia, eds., Handbook of Child Language Acquisition, 327–426. New York: Academic Press. Cutler, A., and D. Swinney. 1987. 
Prosody and the development of comprehension. Journal of Child Language 14(1):145–167. Danon, G. 1996. The Syntax of Determiners in Hebrew. MA thesis, Tel Aviv University. Available at http://faculty.biu.ac.il/~danong1/papers/thesis.pdf. Diesing, M. 1992. Indefinites. Cambridge, Mass.: MIT Press. van der Does, J. 1992. Applied Quantifier Logics. Doctoral dissertation, University of Amsterdam. Dowty, D. 1980. Comments on the paper by Bach and Partee. In K. J. Kreiman, and A. E. Ojeda, eds., Papers from the Parasession on Pronouns and Anaphora, 29–40. Chicago: Chicago Linguistic Society. Dowty, D. 1986. A note on collective predicates, distributive predicates and ‘‘all.’’ In F. Marshall, A. Miller, and Z. Zhang, eds., Proceedings of the Third Eastern State Conference on Linguistics. Columbus: Ohio State University.
Engdahl, Elisabet. 1980. The Syntax and Semantics of Questions in Swedish. Doctoral dissertation, University of Massachusetts, Amherst. Engdahl, Elisabet. 1986. Constituent Questions: The Syntax and Semantics of Questions with Special Reference to Swedish. Dordrecht, The Netherlands: Reidel. Epstein, Samuel David. 1992. Derivational constraints on A-Chain formation. Linguistic Inquiry 23(2):235–259. Evans, Gareth. 1980. Pronouns. Linguistic Inquiry 11(2):337–362. Everaert, M., and Eric Reuland, eds. 2000. Interface Strategies. Amsterdam, The Netherlands: Royal Netherlands Academy of Arts and Sciences. Farkas, D. 1981. Quantifier scope and syntactic islands. In G. N. Carlson, ed., Papers from the Seventeenth Regional Meeting of the Chicago Linguistic Society, 59–66. Chicago: University of Chicago. Farkas, Donka, and Anastasia Giannakidou. 1996. How clause bounded is the scope of universals? In Teresa Galloway and Justin Spence, eds., Proceedings of SALT VI, 35–52. Fiengo, Robert, and Robert May. 1994. Indices and Identity. Cambridge, Mass.: MIT Press. Fodor, Janet Dean, and Ivan A. Sag. 1982. Referential and quantificational indefinites. Linguistics and Philosophy 5(3):355–398. Fodor, Jerry. 1979. In defense of the truth gap. In C. K. Oh and D. A. Dinneen, eds., Presuppositions, Syntax, and Semantics 11. New York: Academic Press. Fodor, Jerry A., Thomas G. Bever, and Merrill F. Garrett. 1974. The Psychology of Language: An Introduction to Psycholinguistics and Generative Grammar. New York: McGraw-Hill. Fox, Danny. 1993. Chain and Binding: A modification of Reinhart and Reuland’s reflexivity. Unpublished manuscript, MIT. Fox, Danny. 1995. Economy and scope. Natural Language Semantics 3:283–341. Fox, Danny. 1998. Locality in variable binding. In Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky, eds., Is the Best Good Enough? Optimality and Competition in Syntax, 129–155. Cambridge, Mass.: MIT Press. Fox, Danny. 2000.
Economy and Semantic Interpretation. Cambridge, Mass.: MIT Press. Gathercole, S. E., and A. Adams. 1993. Phonological working memory in very young children. Developmental Psychology 29:770–778. Gathercole, S. E., and Alan D. Baddeley. 1993. Working Memory and Language: Essays in Cognitive Psychology. Hove, UK: Erlbaum. Gathercole, S. E., and G. J. Hitch. 1993. Developmental changes in short-term memory: A revised working memory perspective. In A. F. Collins, S. E. Gathercole, M. A. Conway, and P. E. Morris, eds., Theories of Memory, 189–209. Hove, UK: Erlbaum.
Gazdar, G. A. 1979. Pragmatics: Implicatures, Presupposition and Logical Form. New York: Academic Press. Gennari, S., A. Gualmini, S. Crain, L. Meroni, and S. Maciukaite. 2001. How adults and children manage stress in ambiguous contexts. In Proceedings of the 1st Workshop on Cognitive Models of Semantic Processing. Edinburgh: University of Edinburgh. Gil, David. 1982. Quantifier scope, linguistic variation and natural language semantics. Linguistics and Philosophy 5:421–472. Golan, Yael. 1993. Node crossing economy, superiority and D-linking. Unpublished manuscript, Tel Aviv University. Goldsmith, John A. 1976. Autosegmental Phonology. Doctoral dissertation, MIT. Bloomington: Indiana University Linguistics Club. Grice, P. 1975. Logic and conversation. In P. Cole and J. Morgan, eds., Syntax and Semantics 3: Speech Acts. New York: Academic Press. Also in Paul Grice, Studies in the Way of Words. Cambridge, MA: Harvard University Press, 1989. Grimshaw, Jane. 1997. Projections, heads and optimality. Linguistic Inquiry 28:373–422. Grimshaw, J., and S. Rosen. 1990. Knowledge and obedience: The developmental status of the binding theory. Linguistic Inquiry 21:187–222. Grodzinsky, Y., T. Reinhart, and K. Wexler. 1990. The Development of Principles Governing Pronouns in Grammar and Discourse: Binding Theory and Pragmatic Implicatures of Quantity. Grant proposal, U.S.–Israel Binational Science Foundation. (Ms. Tel Aviv University and MIT, available at http://www.let.uu.nl/~tanya.reinhart/personal/.) Grodzinsky, Yoseph, and Tanya Reinhart. 1993. The innateness of binding and coreference. Linguistic Inquiry 24(1):69–101. Groenendijk, Jeroen, and Martin Stokhof. 1982. Semantic analysis of wh-complements. Linguistics and Philosophy 5(2):175–234. Groenendijk, Jeroen, and Martin Stokhof. 1984. Studies on the Semantics of Questions and the Pragmatics of Answers. Doctoral dissertation, University of Amsterdam. Gualmini, A., S. Crain, L. Meroni, G. Chierchia, and M. T. Guasti. 
2001.
At the semantics/pragmatics interface in child language. In Rachel Hastings, Brendan Jackson, and Zsofia Zvolenszky, eds., Proceedings from Semantics and Linguistic Theory 11, 231–247. Ithaca, N.Y.: Cornell University. Gualmini, Andrea, Simona Maciukaite, and Stephen Crain. 2003. Children’s insensitivity to contrastive stress in sentences with only. In Sudha Arunachalam, Elsi Kaiser, and Alexander Williams, eds., Proceedings of the 25th Penn Linguistics Colloquium, 87–100. Philadelphia: Department of Linguistics, University of Pennsylvania. Guenthner, F., and S. Schmidt, eds. 1979. Formal Semantics and Pragmatics for Natural Languages. Dordrecht, The Netherlands: Reidel.
Gussenhoven, Carlos. 1984. On the Grammar and Semantics of Sentence Accents in Dutch. Dordrecht, The Netherlands: Foris. Halbert, A., S. Crain, D. Shankweiler, and E. Woodams. 1995. Children’s interpretive use of emphatic stress. Paper presented at the Eighth Annual CUNY Conference on Human Sentence Processing, Tucson, Ariz. Halle, M., and J.-R. Vergnaud. 1987. An Essay on Stress. Cambridge, Mass.: MIT Press. Halliday, M. A. K. 1967. Notes on transitivity and theme in English, Part 2. Journal of Linguistics 3:199–244. Hamblin, Charles Leonard. 1976. Questions in Montague English. In B. Partee, ed., Montague Grammar. New York: Academic Press. (Originally appeared in Foundations of Language 10/1(1973):41–53.) Heim, Irene. 1982. The Semantics of Definite and Indefinite Noun Phrases. Doctoral dissertation, University of Massachusetts, Amherst. (Published in 1989 by Garland, New York.) Heim, Irene. 1986. Notes on comparatives and related matters. Unpublished manuscript, University of Texas, Austin. Heim, Irene. 1998. Anaphora and semantic interpretation: A reinterpretation of Reinhart’s approach. In U. Sauerland and O. Percus, eds., The Interpretative Tract. MIT Working Papers in Linguistics 25. Cambridge, MA: MITWPL, Department of Linguistics and Philosophy, MIT. Heim, Irene, and Angelika Kratzer. 1998. Semantics in Generative Grammar. Oxford: Blackwell. Hendriks, H. 1993. Studied Flexibility. Doctoral dissertation, University of Amsterdam. Higginbotham, James. 1983. Logical form, binding, and nominals. Linguistic Inquiry 14(3):395–420. Higginbotham, James. 1985. On semantics. Linguistic Inquiry 16(4):547–593. Higginbotham, James. 1987. Indefinites and predication. In E. Reuland and A. G. B. ter Meulen, eds., The Representation of (In)definiteness, 41–80. Cambridge: Cambridge University Press. Higginbotham, James. 1992. Interrogatives. Unpublished manuscript, MIT. Higginbotham, James, and Robert May. 1981. Questions, quantifiers, and crossing. 
Linguistic Review 1:41–79. Hilbert, D., and P. Bernays. [1939] 1970. Die Grundlagen der Mathematik II. 2nd ed. Berlin: Springer. Hirschbühler, P. 1982. VP Deletion and across-the-board quantifier scope. In James Pustejovsky and Peter Sells, eds., NELS 12, 132–139. GLSA, University of Massachusetts, Amherst. de Hoop, H. 1992. Case Configuration and NP Interpretation. Doctoral dissertation, University of Groningen, The Netherlands.
Horn, Larry. 1972. On the Semantic Properties of Logical Operators in English. PhD dissertation, UCLA, distributed by Indiana University Linguistics Club. Horn, Larry. 1984. Toward a new taxonomy for pragmatic inference. In D. Schiffrin, ed., Meaning, Form, and Use in Context: Linguistic Applications (GURT ’84), 11–42. Washington, DC: Georgetown University Press. Horn, Larry. 1989. A Natural History of Negation. Chicago: University of Chicago Press. Hornby, P. A., and W. A. Hass. 1970. Use of contrastive stress by preschool children. Journal of Speech and Hearing Research 13:395–399. Hornstein, Norbert. 1995. LF: The Grammar of Logical Form: From GB to Minimalism. Oxford: Blackwell. Hornstein, N., and A. Weinberg. 1990. The necessity of LF. Linguistic Review 7:129–167. Huang, C. T. James. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral dissertation, MIT. Ingram, D., and C. Shaw. 1981. The comprehension of pronominal reference in children. Unpublished manuscript, University of British Columbia, Vancouver. Inhelder, B., and J. Piaget. 1964. The Early Growth of Logic in the Child. London: Routledge and Kegan Paul. Ioup, G. L. 1975. The Treatment of Quantifier Scope in a Transformational Grammar. Doctoral dissertation, City University of New York. Jackendoff, Ray S. 1972. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press. Jackendoff, Ray S. 1997. The architecture of the language faculty. Linguistic Inquiry Monograph 28. Cambridge, Mass.: MIT Press. Jacobson, Pauline. 1999. Towards a variable-free semantics. Linguistics and Philosophy 22:117–184. Kamp, H., and U. Reyle. 1993. From Discourse to Logic: Introduction to Model Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Dordrecht, The Netherlands: Kluwer. Karttunen, L. 1977. The syntax and semantics of questions. Linguistics and Philosophy 1:1–44. Katsos, N., R. Breheny, and N. Williams. 2005.
Interaction of structural and contextual constraints during the on-line generation of scalar inferences. Paper presented at CogSci2005—27th Annual Conference of the Cognitive Science Society. Kayne, Richard. 1984. Connectedness and Binary Branching. Dordrecht, The Netherlands: Foris. Keenan, Edward. 1971. Names, quantifiers and a solution to the sloppy identity problem. Papers in Linguistics 4(2). Keenan, Edward. 1987. A semantic definition of indefinite NP. In E. Reuland and A. ter Meulen, eds., The Representation of (In)definiteness, 109–150. Cambridge, Mass.: MIT Press.
Keenan, Edward, and Leonard M. Faltz. 1978. Logical types for natural language. In UCLA Occasional Papers in Linguistics 3. Los Angeles: Department of Linguistics, UCLA. Kehler, Andy. 1993. A discourse copying algorithm for ellipsis and anaphora. Proceedings of EACL. Association for Computational Linguistics, Morristown, N.J. Kempson, R., and A. Cormack. 1981. Ambiguity and quantification. Linguistics and Philosophy 4:259–309. Kennedy, Christopher. 1997. Antecedent-contained deletion and the syntax of quantification. Linguistic Inquiry 28:662–688. Krämer, I. 2001. Interpreting Indefinites. Doctoral dissertation, Max Planck Institute, Nijmegen. Kratzer, Angelika. 1995. Stage-level and individual-level predicates. In G. N. Carlson and F. J. Pelletier, eds., The Generic Book, 125–175. Chicago: University of Chicago Press. Kratzer, Angelika. 1998. Scope or pseudo scope? Are there wide scope indefinites? In S. Rothstein, ed., Events in Grammar, 163–196. Dordrecht, The Netherlands: Kluwer. Krifka, Manfred. 1995. The semantics and pragmatics of polarity items. Linguistic Analysis 25:209–257. Ladd, D. Robert. 1980. The Structure of Intonational Meaning: Evidence from English. Bloomington: Indiana University Press. Landman, Fred. 2000. Events and Plurality. Dordrecht: Kluwer. Langacker, R. 1969. On pronominalization and the chain of command. In W. Reibel and S. Schane, eds., Modern Studies in English. Englewood Cliffs, N.J.: Prentice Hall. Lappin, S., and T. Reinhart. 1989. Presuppositional effects of strong determiners. Linguistics 26:1021–1037. Lasnik, Howard. 1976. Remarks on coreference. Linguistic Analysis 2(1):1–22. Lasnik, Howard, and Mamoru Saito. 1992. Move alpha: Conditions on its applications and output. Cambridge, Mass.: MIT Press. Levinson, Stephen C. 1987. Pragmatics. Cambridge: Cambridge University Press. Levinson, S. 1987. Pragmatics and the grammar of anaphora. Journal of Linguistics 23:379–434. Levinson, S. 2000. Presumptive Meanings.
Cambridge, Mass.: MIT Press. Lewis, David. 1975. Adverbs of quantification. In Edward Keenan, ed., Formal Semantics of Natural Language, 3–15. Cambridge: Cambridge University Press. Liberman, Mark. 1979. The Intonational System of English. New York: Garland Press. (Originally a doctoral dissertation, MIT, 1975.) Lidz, J., and J. Musolino. 2002. Children’s command of quantification. Cognition 84(2):113–154.
Link, G. 1983. The logical analysis of plural and mass terms: A lattice theoretical approach. In R. Bäuerle, C. Schwarze, and A. von Stechow, eds., Meaning, Use and Interpretation of Language. Berlin: De Gruyter. Lust, B., and T. Clifford. 1982. The 3D study: Effects of depth, distance, and directionality on children’s acquisition of Mandarin Chinese. In J. Pustejovsky and P. Sells, eds., NELS 12. GLSA, University of Massachusetts, Amherst. Lust, B., K. Loveland, and R. Kornet. 1980. The development of anaphora in first language: Syntactic and pragmatic constraints. Linguistic Analysis 6(4):359–391. Maratsos, M. P. 1973. The effects of stress on the understanding of pronominal reference in children. Journal of Psycholinguistic Research 2(1):1–8. May, Robert. 1977. The Grammar of Quantification. Doctoral dissertation, MIT. (Distributed by Indiana University Linguistics Club.) May, Robert. 1985. Logical Form: Its Structure and Derivation. Cambridge, Mass.: MIT Press. Mazuka, R. 1996. Can a grammatical parameter be set before the first words? In J. L. Morgan and K. Demuth, eds., Signal to Syntax. Hillsdale, N.J.: Erlbaum. McCawley, J. 1979. Presuppositions and discourse structure. In C. K. Oh and D. A. Dinneen, eds., Presuppositions, Syntax and Semantics, 11. New York: Academic Press. McDaniel, Dana, and Thomas L. Maxfield. 1992. Principle B and contrastive stress. Language Acquisition 2(4):337–358. Merchant, Jason. 2001. The Syntax of Silence: Sluicing, Islands, and the Theory of Ellipsis. Oxford: Oxford University Press. Miller, George, and Noam Chomsky. 1963. Finitary models of language users. In R. D. Luce, R. R. Bush, and E. Galanter, eds., Handbook of Mathematical Psychology, vol. 2. New York: Wiley. Milsark, Gary. 1974. Existential Sentences in English. Doctoral dissertation, MIT. Cambridge, Mass.: MIT Libraries. Moltmann, F., and A. Szabolcsi. 1994. Scope interactions with pair-list readings. In M. Gonzáles, ed., NELS 24, 381–395.
Amherst: GLSA, University of Massachusetts. Mulders, Iris. 2002. Transparent Parsing: Head-Driven Processing of Verb-Final Structures. Doctoral dissertation, Utrecht University. LOT Dissertation Series 56. Available at http://www.library.uu.nl/digiarchief/dip/diss/2002-1029-094527/inhoud.htm. Mulders, Iris. 2004. Phase theory in sentence processing. Paper presented at the Tools in Linguistic Theory conference, May 16–18, Budapest, Hungary. Unpublished manuscript, Utrecht University. Mulders, Iris. 2005. Transparent parsing: Phases in sentence processing. In M. McGinnis and N. Richards, eds., Perspectives on Phases, 237–264. MIT Working Papers in Linguistics 49. Cambridge, Mass.: MITWPL, Department of Linguistics and Philosophy, MIT.
Musolino, J., and J. Lidz. Forthcoming. The scope of isomorphism: Turning adults into children. Language Acquisition. Neeleman, Ad. 1994. Complex Predicates. Doctoral dissertation, Utrecht University. OTS Dissertation Series. Neeleman, Ad, and T. Reinhart. 1998. Scrambling and the PF interface. In W. Geuder and M. Butt, eds., Projecting from the Lexicon. Stanford, Calif.: CSLI Publications. Neeleman, Ad, and Kriszta Szendrői. 2004. Superman sentences. Linguistic Inquiry 35(1):149–159. Neeleman, Ad, and Fred Weerman. 1996. Case and arguments in a flexible syntax. Unpublished manuscript, OTS, Utrecht University. Neeleman, Ad, and Fred Weerman. 1999. Flexible Syntax: A Theory of Case and Arguments. Dordrecht, The Netherlands: Kluwer. Nespor, M., T. Guasti, and A. Christophe. 1996. Selecting word order: The rhythmic activation principle. In U. Kleinhenz, ed., Interfaces in Phonology, 1–26. Berlin: Akademie Verlag. Nespor, Marina, and Irene Vogel. 1986. Prosodic Phonology. Dordrecht, The Netherlands: Foris. Nishigauchi, Taisuke. 1986. The syntax of wh-questions in Japanese and English. Shoin Literary Review 20:67–85. Nooteboom, S. G., and J. G. Kruyt. 1987. Accents, focus distribution, and the perceived distribution of given and new information: An experiment. Journal of the Acoustical Society of America 82(5):1512–1524. Noveck, I. 2001. When children are more logical than adults. Cognition 86:253–282. Noveck, I., and A. Posada. 2003. Characterising the time course of an implicature. Brain and Language 85:203–210. Papafragou, A., and J. Musolino. 2002. The pragmatics of number. Proceedings of the 24th Annual Conference of the Cognitive Science Society. Hillsdale, N.J.: Erlbaum. Papafragou, A., and J. Musolino. 2003. Scalar implicatures: Experiments at the semantic-pragmatics interface. Cognition 80:253–282. Pesetsky, David. 1982. Paths and Categories. Doctoral dissertation, MIT. Pesetsky, David. 1987. Wh–in situ: Movement and unselective binding. In E. Reuland and A.
ter Meulen, eds., The Representation of (In)definiteness, 98–129. Cambridge, Mass.: MIT Press. Philip, William. 1995. Event Quantification in the Acquisition of Universal Quantification. Doctoral dissertation, University of Massachusetts, Amherst. Phillips, Colin. 1996. Order and Structure. Doctoral dissertation, MIT. (Distributed by MIT Working Papers in Linguistics.) Pollard, Carl, and Ivan Sag. 1992. Anaphors in English and the scope of the binding theory. Linguistic Inquiry 23:261–305.
Prince, Alan, and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report 2. New Brunswick, N.J.: Center for Cognitive Science, Rutgers University. Prince, Ellen F. 1981. Towards a taxonomy of given-new information. In P. Cole, ed., Radical Pragmatics, 233–255. New York: Academic Press. Pritchett, Bradley L. 1992. Grammatical Competence and Parsing Performance. Chicago: University of Chicago Press. Pulleyblank, Douglas, and William J. Turkel. 1998. The logical problem of language acquisition in optimality theory. In Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky, eds., Is the Best Good Enough? Optimality and Competition in Syntax, 399–420. Cambridge, Mass.: MIT Press and MIT Working Papers in Linguistics. Reinhart, Tanya. 1976. The Syntactic Domain of Anaphora. Doctoral dissertation, MIT. (Distributed by MIT Working Papers in Linguistics.) Reinhart, Tanya. 1981a. Pragmatics and linguistics: An analysis of sentence topics. Philosophica 27(1):53–93. Reinhart, Tanya. 1981b. A second COMP position. In A. Belletti, L. Brandi, and L. Rizzi, eds., Theory of Markedness in Generative Grammar, 517–557. Pisa: Scuola Normale Superiore. Reinhart, Tanya. 1983a. Anaphora and Semantic Interpretation. Chicago: University of Chicago Press. Reinhart, Tanya. 1983b. Coreference and bound anaphora: A restatement of the anaphora questions. Linguistics and Philosophy 6(1):47–88. Reinhart, Tanya. 1986. On the interpretation of donkey sentences. In E. C. Traugott, A. ter Meulen, J. S. Reilly, and C. A. Ferguson, eds., On Conditionals, 103– 122. Cambridge: Cambridge University Press. Reinhart, Tanya. 1987. Specifiers and operator binding. In E. Reuland and A. ter Meulen, eds., The Representation of (In)definiteness, 130–167. Cambridge, Mass.: MIT Press. Reinhart, Tanya. 1991. Elliptic conjunctions: Nonquantificational LF. In A. Kasher, ed., The Chomskyan Turn. Oxford: Blackwell. Reinhart, T. 1992. 
Wh–in situ: An apparent paradox. In P. Dekker and M. Stokhof, eds., Proceedings of the Eighth Amsterdam Colloquium, 483–492. Amsterdam: ILLC/Department of Philosophy, University of Amsterdam. Reinhart, T. [1994] 1998. Wh–in situ in the framework of the minimalist program. OTS Working Papers in Linguistics, TL-94-003. (Appeared with slight revisions in Natural Language Semantics 6(1998):29–56.) Reinhart, T. 1995. Interface Strategies. OTS Working Papers in Linguistics, TL95-002. Utrecht: Utrecht University. Reinhart, T. 1997. Quantifier scope: How labor is divided between QR and Choice functions. Linguistics and Philosophy 20:335–397.
Reinhart, T. 1998. Interface economy: Focus and markedness. In Chris Wilder, Hans-Martin Gaertner, and Manfred Bierwisch, eds., The Role of Economy Principles in Linguistic Theory, 146–170. Berlin: Akademie Verlag. Reinhart, Tanya. 1999a. Binding theory. In Robert A. Wilson and Frank C. Keil, eds., The MIT Encyclopedia of the Cognitive Sciences, 86–88. Cambridge, Mass.: MIT Press. Reinhart, Tanya. 1999b. The processing cost of reference-set computation: Guess patterns in acquisition. OTS Working Papers in Linguistics, 99001-CL/TL, Utrecht University. Reinhart, Tanya. 2000. Strategies of anaphora resolution. In Hans Bennis, M. Everaert, and E. Reuland, eds., Interface Strategies, 295–324. Amsterdam: Royal Netherlands Academy of Arts and Sciences. Reinhart, Tanya. 2002. The Theta system—an overview. Target Article. Theoretical Linguistics 28(3):229–290. Reinhart, Tanya. 2004. Topics and the conceptual interface. In H. Kamp and B. Partee, eds., Context Dependence in the Analysis of Linguistic Meaning, 275–305. Amsterdam: Elsevier Press. Reinhart, Tanya, and Eric Reuland. 1991. Anaphors and logophors: An argument structure perspective. In Jan Koster and E. Reuland, eds., Long Distance Anaphora, 283–321. Cambridge: Cambridge University Press. Reinhart, Tanya, and Eric Reuland. 1993. Reflexivity. Linguistic Inquiry 24:657–720. Reinhart, Tanya, and Kriszta Szendrői. 2003. Optimal Design in Language. NWO grant proposal. Available at http://www.let.uu.nl/~tanya.reinhart/personal/. Reuland, Eric. 2001. Primitives of Binding. Linguistic Inquiry 32(3):439–492. Reuland, Eric, and A. ter Meulen, eds. 1987. The Representation of (In)definiteness. Cambridge, Mass.: MIT Press. Ristad, Eric. 1992. Computational Structure of Natural Language. Doctoral dissertation, MIT. (Distributed by MIT Working Papers in Linguistics.) Rizzi, Luigi. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press. Roberts, Craige. 1987. Modal Subordination, Anaphora, and Distributivity.
Doctoral dissertation, University of Massachusetts, Amherst. Rodman, R. 1976. Scope phenomena, movement transformations, and relative clauses. In B. Partee, ed., Montague Grammar. New York: Academic Press. Rooth, Mats E. 1985. Association with Focus. Doctoral dissertation, University of Massachusetts, Amherst. Rooth, Mats E. 1992. A theory of focus interpretation. Natural Language Semantics 1:75–116. Ross, J. R. 1969. Guess who. In R. I. Binnick, A. Davidson, G. M. Green, and J. L. Morgan, eds., CLS 5, Proceedings of the Fifth Regional Meeting of the Chicago Linguistic Society, 252–286. Chicago: University of Chicago Press.
Ruys, E. G. 1992. The Scope of Indefinites. Doctoral dissertation, Utrecht University, The Netherlands. OTS Dissertation Series. Ruys, E. G. 1996. Some notes on economy conditions in Chapter 4. Unpublished manuscript, Utrecht University. Ruys, E. G. 2000. Weak crossover as a scope phenomenon. Linguistic Inquiry 31(3):513–539. Sag, I. 1976. Deletion and Logical Form. Doctoral dissertation, MIT. Scha, R. 1981. Distributive, collective, and cumulative predication. In J. Groenendijk, T. Janssen, and M. Stokhof, eds., Formal Methods in the Study of Language, 485–512. Amsterdam: Mathematical Center. Schmerling, Susan. 1976. Aspects of English Sentence Stress. Austin: University of Texas Press. Schwarzschild, Roger. 1999. Givenness, AvoidF and other constraints on the placement of accents. Natural Language Semantics 7(1):41–177. Selkirk, Elisabeth O. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: MIT Press. Selkirk, Elisabeth O. 1996. Sentence prosody: Intonation, stress and phrasing. In John A. Goldsmith, ed., The Handbook of Phonological Theory. Oxford: Blackwell. Siloni, Tal. 2004. Garden path: Illicit movement. Paper presented at Tools in Linguistic Theory conference, May 16–18, Budapest, Hungary. Unpublished manuscript, Tel Aviv University. Smith, Carol L. 1980. Quantifiers and question answering in young children. Journal of Experimental Child Psychology 30:191–205. Smith, Edward E. 1999. Working memory. In Robert A. Wilson and Frank C. Keil, eds., The MIT Encyclopedia of the Cognitive Sciences. Cambridge, Mass.: MIT Press. Solan, Lawrence. 1978. Anaphora in Child Language. Unpublished doctoral dissertation, University of Massachusetts, Amherst. Solan, Lawrence. 1983. Pronominal Reference: Child Language and the Theory of Grammar. Dordrecht, The Netherlands: Reidel. Sperber, D., and D. Wilson. [1986] 1995. Relevance: Communication and Cognition. Oxford: Blackwell. Strawson, P. 1964. Identifying reference and truth values.
Theoria 30:96–118. (Reprinted in P. Strawson, Logico Linguistic Papers. London: Methuen, 1974. Also reprinted in D. Steinberg and L. Jakobovits, eds., Semantics. Cambridge: Cambridge University Press.) Swinney, D. 1979. Lexical access during sentence comprehension: Reconsideration of context effects. Journal of Verbal Learning and Verbal Behavior 18:645–659. Swinney, D., J. Nicol, and E. B. Zurif. 1989. The effects of focal brain damage on sentence processing: An examination of the neurological organization of a mental module. Journal of Cognitive Neuroscience 1:25–37.
Swinney, D., and P. Prather. 1989. On the comprehension of lexical ambiguity by young children: Investigations into the development of mental modularity. In D. S. Gorfein, ed., Resolving Semantic Ambiguity, 225–238. New York: Springer Verlag. Szabolcsi, Anna. 1995. On modes of operation. In Paul Dekker and M. Stokhof, eds., Proceedings of the Tenth Amsterdam Colloquium. Amsterdam: ILLC/Department of Philosophy, University of Amsterdam. Szabolcsi, Anna. 1997. Strategies for scope taking. In A. Szabolcsi, ed., Ways of Scope Taking, 109–154. Dordrecht, The Netherlands: Kluwer. Szabolcsi, Anna, and Franz Zwarts. 1990. Semantic properties of composed functions and the distribution of wh-phrases. In Martin Stokhof and Leen Torenvliet, eds., Proceedings of the Seventh Amsterdam Colloquium, 529–555. Amsterdam: Institute for Language, Logic, and Information. Szendrői, Kriszta. 2001. Focus and the Syntax-Phonology Interface. Doctoral dissertation, University College London. Szendrői, Kriszta. 2003. Acquisition evidence for an interface theory of focus. In J. van Kampen and S. Baauw, eds., Proceedings of GALA 2003. LOT, Occasional Series 2. Utrecht, The Netherlands. Tancredi, Christopher Damian. 1992. Deletion, Deaccenting, and Presupposition. Doctoral dissertation, MIT. Tavakolian, Susan L. 1974. Contrastive stress pattern in four year olds. Unpublished manuscript, University of Massachusetts, Amherst. Tavakolian, Susan L. 1977. Structural Principles in the Acquisition of Complex Sentences. Doctoral dissertation, University of Massachusetts, Amherst. Taylor-Browne, K. 1983. Acquiring restrictions on forwards anaphora: A pilot study. In Calgary Working Papers in Linguistics, 75–99. Calgary, Alberta: Department of Linguistics, University of Calgary. Terken, J., and S. G. Nooteboom. 1988. Opposite effects of accentuation and deaccentuation on verification latencies for given and new information. Language and Cognitive Processes 2(3/4):145–163. Tesar, Bruce. 1998.
Error-driven learning in optimality theory via the efficient computation of optimal forms. In Pilar Barbosa, Danny Fox, Paul Hagstrom, Martha McGinnis, and David Pesetsky, eds., Is the Best Good Enough? Optimality and Competition in Syntax, 421–435. Cambridge, Mass.: MIT Press and MIT Working Papers in Linguistics. Thornton, Rosalind. 1990. Adventures in Long-Distance Moving: The Acquisition of Complex Wh-Questions. Unpublished doctoral dissertation, University of Connecticut, Storrs. Thornton, Rosalind. 1998. Children’s interpretation of pronouns. Paper presented at the Trieste Conference on Acquisition, September 1998. Thornton, Rosalind, and Kenneth Wexler. 1999. Principle B, VP Ellipsis, and Interpretation in Child Grammars. Cambridge, Mass.: MIT Press.
Tsai, Wei-Tien. 1994. On Economizing the Theory of A-Bar Dependencies. Doctoral dissertation, MIT. Vallduvi, E. 1990. The Informational Component. Doctoral dissertation, University of Pennsylvania. Verkuyl, Henk J. 1988. Aspectual asymmetry and quantification. In V. Ehrich and H. Vater, eds., Temporalsemantik: Beiträge zur Linguistik der Zeitreferenz, 220–259. Tübingen: Niemeyer. von Fintel, Kai. 1999. NPI licensing, Strawson entailment, and context dependency. Journal of Semantics 16:97–148. Wexler, K., and Y.-C. Chien. 1985. The development of lexical anaphors and pronouns. In Papers and Reports on Child Language Development, 24:138–149. Stanford, Calif.: Stanford University. Williams, Edwin S. 1977. Discourse and logical form. Linguistic Inquiry 8(1):101–139. Williams, Edwin S. 1986. A reassignment of the functions of LF. Linguistic Inquiry 17:265–299. Williams, Edwin S. 1997. Blocking and anaphora. Linguistic Inquiry 28(4):577–628. Winter, Yoad. 1997. Choice functions and the scopal semantics of indefinites. Linguistics and Philosophy 20(4):399–467. Zubizarreta, María Luisa. 1994. Word order, prosody, and focus. Unpublished manuscript, University of Southern California. Zubizarreta, María Luisa. 1998. Prosody, Focus, and Word Order. Cambridge, Mass.: MIT Press. Zuckerman, S., N. Vasić, and S. Avrutin. 2001. The syntax-discourse interface and the interpretation of pronominals by Dutch-speaking children. In B. Skarabela, S. Fish, and A. H.-J. Do, eds., Proceedings of the 26th Annual Boston University Conference on Language Development, BUCLD 26, 781–792. Somerville, Mass.: Cascadilla Press. Zuckerman, S., N. Vasić, and S. Avrutin. 2005. Pronominal reference in child language. Unpublished manuscript, UiL OTS, Utrecht University.
Author Index
Abusch, Dorit, 51, 55, 57, 58, 81, 301n29 Adams, A., 201 Akmajian, A., 254 Altman, G. T. M., 266 Ariel, Mira, 46, 147 Avrutin, Sergey, 251–255, 313–314n17–19 Baauw, S., 251 Baddeley, Alan D., 200, 206, 215, 216 Baker, Carl L., 64 Barwise, J., 50, 78, 303n42 Beghelli, Filippo, 56, 73, 74, 92, 110–115, 121, 297n10, 297n11, 301n28 Ben-Shalom, Dorit, 110, 297n10, 310n14 Bennis, Hans, 306n1 Bernays, P., 82 Berwick, Robert, 8 Bever, Thomas G., 21 Bochvar, D. A., 303n46 Bolinger, Dwight, 146 Boster, C., 310n1 Braine, M., 274 Breheny, Richard, 283, 285, 287, 290 Carston, R., 283 Chien, Y.-Ch., 198, 204–205, 215, 220, 223, 233, 313n11 Chierchia, Gennaro, 202, 205, 220, 223, 273–275, 278–287, 280, 283, 290, 291, 294n1, 302n30 Chomsky, Noam, 1–5, 7, 14, 15, 17, 19, 22, 23, 25, 30, 34, 35, 37, 40, 42, 43, 46, 47, 48–49, 52, 60, 62, 66, 126, 127, 135, 136, 139, 166, 175, 179, 255, 285n4, 304n51, 305n4, 307n5, 392n1 Chung, S., 48, 51, 53, 66–68, 71–72, 298–299n17 Cinque, Guglielmo, 44, 127–131, 133, 134–136, 138, 141, 142, 144, 146, 156, 238, 305n1, 305n5 Clifford, T., 221
Collins, Chris, 19, 20 Conway, L., 219, 267–269, 288, 310n1, 311n6 Cooper, R., 50, 53, 78, 80, 296n7, 301n29, 303n42 Cormack, A., 295–296n6 Crain, Stephen, 202, 205, 216, 219–221, 249, 260–269, 275, 278, 280, 283, 310n1, 311n6, 313n16 Cuetos, F., 251 Cutler, A., 248 Danon, G., 94–95, 302n31–32, 303n44 Diesing, M., 56, 96, 297n13 van der Does, J., 302n36 Dowty, D., 92, 211, 307n8 Engdahl, Elisabet, 69, 301–302n30, 305n4 Epstein, Samuel David, 19 Evans, Gareth, 209, 213, 231, 303n40 Everaert, M., 306n1 Faltz, Leonard M., 42, 105, 126 Farkas, Donka, 50, 55, 58, 62 Fiengo, Robert, 192, 225, 231, 312n9 Fodor, Janet Dean, 54–59, 296n7, 296n9, 297n10 Fodor, Jerry A., 21 Fox, Danny, 28–34, 36, 38, 44, 106–109, 169, 176, 184, 192, 194, 196, 197, 212, 254, 294n1, 300n24, 306n1, 310n14 Garrett, Merrill F., 21 Gathercole, S. E., 200, 201, 206, 215, 216 Gazdar, G. A., 273, 283 Gennari, S., 262–263, 265 Giannakidou, Anastasia, 62 Gibson, Ted, 310n2 Gil, David, 106, 112 Golan, Yael, 26
Goldsmith, John A., 132 Grice, P., 183, 205, 232, 272–274, 283, 286, 314n20 Grimshaw, Jane, 1, 221, 233 Grodzinsky, Yoseph, 27, 45, 167, 182–185, 199–200, 204–238, 272, 307n10, 311n5–6 Groenendijk, Jeroen, 299n19 Gualmini, Andrea, 202, 205, 220, 262–265, 275, 278, 280, 283 Guasti, M. T., 202, 205, 220, 223, 275, 278, 280, 283 Halbert, A., 249, 260–262, 313n16 Halle, M., 127–128, 130, 133, 238, 305n1 Halliday, M. A. K., 143 Hamblin, Charles Leonard, 299–300n21 Hamburger, H., 267 Hass, W. A., 249 Heim, Irene, 29, 35, 63, 64, 68, 69, 73, 74, 87, 91, 168–169, 172, 179, 182, 194–196, 206, 207, 210, 216, 220, 224, 227–229, 231–232, 294n1, 300n23, 306n2, 308n9 Hendricks, H., 295n5, 305n4 Higginbotham, James, 56, 88–89, 169 Hilbert, D., 82 Hirschbühler, P., 30, 42, 111, 126 Hitch, G. J., 200, 206, 215, 216 de Hoop, H., 56 Horn, Larry, 273, 283, 285 Hornby, P. A., 249 Hornstein, Norbert, 297n11, 297n13 Huang, C. T. James, 16–17, 18, 52, 60, 65, 296n7 Ingram, D., 223 Inhelder, B., 310n1 Ioup, G. I., 50, 111 Jackendoff, Ray S., 138, 254, 305n4 Jacobson, Pauline, 167, 170 Kamp, H., 68, 80, 86, 88, 92–94, 110, 122, 301n28 Karttunen, L., 69–70, 84, 299n21 Katsos, Napoleon, 283, 285, 287, 290 Kayne, Richard, 294n8 Keenan, Edward, 42, 50, 92, 105, 126, 167, 207 Kehler, Andy, 310n14 Kempson, R., 295–296n6 Kennedy, Christopher, 62 Kornet, R., 223 Krämer, I., 310n1 Kratzer, Angelika, 29, 35, 58, 96–97, 302n30, 308n9 Krifka, Manfred, 281 Kruyt, J. G., 147
Ladd, D. Robert, 162 Ladusaw, W., 48, 51, 53, 66–68, 71–72, 298–299n17 Landman, Fred, 279, 280 Langacker, R., 179 Lappin, S., 303n45 Lasnik, Howard, 26, 30, 179, 255 Levinson, S., 285, 314n20 Liberman, Mark, 127, 131 Lidz, J., 203 Lillo-Martin, D., 310n1 Link, G., 89 Loveland, K., 223 Lust, B., 221, 223 Maciukaite, Simona, 262–263, 262–264, 265 Maratsos, M. P., 248, 251 Maxfield, Thomas L., 246, 247, 248, 251 May, Robert, 29, 47, 60, 61, 192, 225, 231, 294–295n, 296n7, 297n11, 304n50, 312n9 McCauley, J., 306n2 McCloskey, J., 48, 51, 53, 66–68, 71–72, 298–299n17 McDaniel, Dana, 246, 247, 248, 251 McKee, Cecile, 221 Merchant, Jason, 48 Meroni, L., 202, 205, 220, 262–263, 265, 275, 278, 280, 283 Miller, George, 7 Milsark, Gary, 50 Moltmann, F., 298n15 Mulders, Iris, 10 Musolino, J., 203, 276–278, 289–290 Neeleman, Ad, 137, 139, 144, 146, 147, 241, 242 Nespor, Marina, 138 Ni, W., 219, 267–269, 288 Nishigauchi, Taisuke, 64 Nooteboom, S. G., 147 Noveck, I., 274, 283, 287 Papafragou, A., 276–278, 289–290 Pearlmutter, Neal, 310n2 Pesetsky, David, 52, 56, 64, 147, 294n8 Philip, William, 310n1 Phillips, Colin, 7, 8 Piaget, J., 310n1 Pollard, Carl, 177 Posada, A., 283, 287 Prince, Alan, 1, 306n2 Pritchett, Bradley L., 9–10 Pulleyblank, Douglas, 22 Reinhart, Tanya, 4, 17, 18, 24, 25, 26, 27, 29, 32, 37, 39, 42, 44, 45, 46, 47, 50, 51,
53, 69, 74, 81, 97, 105, 106, 110, 112, 122, 126, 127, 139, 144, 146, 147, 166, 167, 174, 176, 177, 179, 181, 182–185, 199–200, 204–205, 206, 207, 209, 210, 211, 214, 219, 220, 224, 227, 231, 232, 240, 241, 242, 272, 294n1, 294n5, 296n7, 298n15, 300n23, 302n30, 303n40, 303n45, 304n49, 304n50, 306n1, 306n6, 307n4, 307n7, 307n10, 310n2, 310n4, 311n5–6, 313n14, 314n20 Reuland, Eric, 104, 166, 167, 176, 177, 183, 212, 294n5, 306n1 Reyle, U., 80, 86, 88, 92–94, 110, 122, 301n28 Ristad, Eric, 307n3 Rizzi, Luigi, 19 Rodman, R., 48–49 Rooth, Mats E., 125, 137, 143, 304n48 Ross, J. R., 53 Rozen, S., 221, 233 Ruigendijk, E., 251 Rumain, B., 274 Ruys, E. G., 51, 53–55, 57–58, 78, 88, 90, 110–111, 294n4, 296n7, 297n11, 300n26, 301n29, 304n52 Sag, Ivan A., 29–30, 52, 54–59, 66, 107, 177, 296n7, 296n9, 297n10 Saito, Mamoru, 26 Scha, Remko, 89, 91, 99, 294n1, 302n36, 303n46, 304n48 Schmerling, Susan F., 162 Schwarzschild, Roger, 142–144, 150, 306n8 Selkirk, Elisabeth O., 43, 126, 131, 138, 146, 162, 242 Shankweiler, D., 249, 260–262, 313n16 Shaw, C., 223 Siloni, Tal, 10 Smith, Carol L., 201, 274 Smolensky, Paul, 1 Solan, Lawrence, 221, 251 Sperber, D., 283 Steedman, M., 266 Stokhof, Martin, 299n19 Stowell, T., 110–115, 121, 301n28 Strawson, P., 97 Swinney, D., 248 Szabolcsi, Anna, 18, 50, 80, 92–94, 110, 294n1, 297n10, 298n15, 303n39 Szendrői, Kriszta, 127, 131–134, 136, 137, 145, 151, 152, 160, 239, 252, 258, 262, 264–266, 271, 313n14 Tancredi, Christopher Damian, 30 Tavakolian, Susan L., 221, 247 Taylor-Browne, K., 223 Terken, J., 147
Tesar, Bruce, 22 Thornton, Rosalind, 201, 206, 216–232, 234–236, 249–250, 256, 310n1, 311–313n6–10 Tsai, Wei-Tien, 17 Turkel, William J., 22 Vallduvi, E., 305n4 Vasić, N., 251–255, 313–314n17–19 Vergnaud, J.-R., 128, 130, 133, 238, 305n1 Verkuyl, Henk J., 110 Vogel, Irene, 138 Weerman, Fred, 139 Weinberg, Amy, 8 Wexler, Kenneth, 198, 201, 204–206, 215–233, 234–236, 272, 311–313n6–11 Williams, Edwin S., 29–30, 52, 64, 66, 107, 142–144, 146, 255–257, 306n9 Williams, John, 283, 285, 287, 290 Wilson, D., 283 Winter, Yoad, 91, 92, 98–100, 112, 294n1, 301n29, 302n37, 306n1 Woodams, E., 249, 260–262, 310n1, 313n16 Zubizarreta, Maria Luisa, 127, 130, 131, 134–136, 146, 156, 305n2 Zuckerman, S., 251–255, 313–314n17–19 Zwarts, F., 18
Subject Index
A-binding, 169–172 A-chain condition, 174–178 Acquisition of main-stress shift, 202, 238–272 contrastive focus, 243 contrastive stress, 243, 246–249, 251, 254 default strategies, 238, 266–272 focus identification, 238, 250, 258, 259–266, 271 focus set, 239–241 general stress deficiency, arguments regarding, 246–251, 261 NSR (as neutral main stress rule), 239, 242, 243, 249–251 overview of stress and focus for purposes of, 238–245 projection of focus, 243, 259–261 reference-set computation evidence of children’s problems with, 246–251 overview of, 243–245 similarity to acquisition of scalar implicatures, 278, 281–282, 288 stress-shift operations, 241–243 switch-reference resolution, 202, 247, 251–259 Acquisition of Rule I, 199–202, 204–238 chance performance, explaining, 232–238 complexity of computation, difficulties caused by, 311–312n6 Condition C issues, 220–226 learnability questions, 227–232 overview of binding, covaluation, and Rule I, 206–216 processing load, evidence of, 217–220 Thornton and Wexler’s arguments against processing account, 216–226 Acquisition of scalar implicatures, 37, 272–291 DE contexts, children’s lack of problem with computing, 282–283
default strategies, 288–290 directing children to consider implicature, 277–278 findings regarding, 274, 275–277 similarity to acquisition of stress-shifted focus, 278, 281–282, 288 Anaphora, 5, 165–198. See also Binding; Covaluation; Rule I acquisition of, 204–206 (see also Acquisition of Rule I) backtracking, 222, 226, 312n8 children vs. adults, working memory limitations of, 198, 199–202 current theory of, 166–169 disanaphora, 256–257 Engdahl’s application of choice-function concept to problem of, 301–302n30 focus/stress operations and, 141–148 procedures for anaphora resolution, 165–173 restrictions on, 173–181 SE (simple expression) and SELF anaphors, 174, 177 switch-reference resolution, 251–259 Backtracking and anaphora, 222, 226, 312n8 Binding, 165, 169–172. See also Anaphora acquisition of Rule I and overview of, 206–216 computational system conditions, 174–177 Condition C, 173–174 current theory on, 166–169 defined, 166 restrictions on, 173–177 C/I systems. See Conceptual-intentional (C/I) systems Case and anaphoric binding restrictions, 176–177
Children vs. adults, working memory limitations of, 11–12, 45, 198, 199–204. See also entries at Acquisition Choice functions for existential qualifiers, 81–101 anaphora, Engdahl’s application of concept to problem of, 301–302n30 collective vs. distributive readings of existentials, 88–91 deriving choice-function interpretation, 85–88 empty set, 96–100 existential closure and choice function, 81–85 extensionality, 100–101 indefinites interpretable by, 91–95 numeral plural indefinites, 117, 120 semantics of, 95–101 wh-in situ, 80, 84, 87, 101 Clause boundedness, 60–64, 297n11 Coindexation, 307n4. See also Anaphora; Binding Collective vs. distributive readings of existentials, 88–91 Complex SELF anaphors, 174 Computational system (CS) binding restricted by, 173–174 parser, transparency of, 7–11 reference-set computation, restricted role of, 11–12 scalar implicatures and, 274–275 syntactic coding of, 4–5 as syntax in broad sense, 2 Concepts systems, 3, 4 Conceptual-intentional (C/I) systems, 3–7 as concept, context, and inference systems, 4 interface strategies for, 3–4 transparency of parser and, 7 Condition B acquisition of (see Acquisition of Rule I) covaluation restricted by, 179–181 Condition C (Logical Syntax Condition) acquisition of Rule I and, 220–226 binding restricted by, 173–174 Coreference/covaluation Rule I (see Rule I) covaluation restricted by, 179–181 Context C/I system, 3, 4 Context-driven vs. default approach to acquisition of scalar implicatures, 283–288 Contrastive focus, 151, 152, 243 Contrastive stress, 243, 246–249, 251, 254 Coreference, 27–28 Coreference Rule I. See Rule I
Costliness of reference-set computation. See Processing costs of reference-set computation Covaluation, 165, 172–173. See also Anaphora acquisition of Rule I and overview of, 206–216 current theory on, 166–169 in ellipsis contexts, 192–196, 312–313n9 interface strategy governing (see Rule I) minimize interpretive options principle, 181–186 with QR, 179, 190–191 reference-set computation, 186–190 restrictions on, 178–181 Rule H, 194–196 statement of, 308n10 CS theory. See Computational system Default strategies acquisition of main-stress shift, 238, 266–272 acquisition of scalar implicatures, 288–290 Default vs. context-driven approach to acquisition of scalar implicatures, 283–288 Derivational Theory of Complexity, 7, 11 Destressing. See also Focus/stress anaphoric, 141–148 application mechanics of, 148–156 Disambiguation, semantic, 201–204, 219, 238, 250, 259, 266–267, 270, 271 Disanaphora Law, 256–257 Discourse grammar, focus, and stress, 134 Discourse Representation Theory (DRT), 68, 73, 78, 80, 85–87, 95, 195 Distributive vs. collective readings of existentials, 88–91 ECM (extraction out of small clauses), 60 Economy of derivations, 11 Economy principle, 23, 37 ECP. See Empty Category Principle Ellipsis contexts acquisition of main-stress shift, 254–256 acquisition of Rule I, 225–226, 235–236 covaluation in, 192–196, 312–313n9 interpretation-dependent reference sets, 29–32 scope-shift, 52, 63, 67, 107, 297n14 Empty Category Principle (ECP) defined, 298n16 reference-set computation, 16–17, 18, 293n2 wide scope in situ, 65–66 Empty set and choice functions for existential qualifiers, 96–100
Except elliptic conjunctions, 63 Existence presupposition of indefinites, 96–97 Existential wide scope. See also Choice functions for existential qualifiers interpretive problem of wide scope in situ and, 73–76 problems posed by, 53–59 QR and, 79–81 syntactic freedom of, 50–53 Extraction out of small clauses (ECM), 60 Faculty of language (FL), 2, 7 Features, 4–5, 135–137, 293n3 Focus/stress, 5, 125–163. See also Acquisition of main-stress shift; Main stress; Nuclear Stress Rule; Secondary stress anaphora and, 141–148 coding of, 134–141 contrastive focus, 151, 152, 243 contrastive stress, 243, 246–249, 251, 254 destressing anaphoric, 141–148 application mechanics of, 148–156 discourse grammar, 134 features approach to, 135–137 focus set, 134–141, 239–241 general stress deficiency in children, arguments regarding, 246–251, 261 givenness, 142–143 identification of focus, 136–137, 139, 238, 250, 258, 259–266, 271 markedness, 126–127, 134–135, 161–163 neutral main stress, 126–127, 134–135, 161–163, 239 projection of focus, 156–161, 243, 259–261 reference-set computation acquisition of main-stress shift and, 243–251 markedness, 161–163 projection of focus, 156–161, 243, 259–261 repair of imperfections, interface strategies as, 40–44, 127, 140–141, 155–156 scope-shift problems related to issues of, 125–126 semantics of, 125 stress-shift operations acquisition of main-stress shift and, 241–243 application mechanics of, 148–156 contextual motivation for, 141–148 number and type, 142 switch-reference resolution, 151, 202, 247, 251–259 Functional accounts of language, 2
Garden paths and transparency of parser, 9–10 Givenness and focus identification, 142–143 Generalized quantifiers (GQ)-construal choice functions for existential qualifiers, 86, 88, 89, 91, 92, 94–96, 98, 100 for existentials, 80–81 island-free QRs, 78–79 Gricean maxim of manner, 183, 273, 314n20 Gricean maxim of quantity, 286 Gricean tradition, 205, 232, 272, 274, 283, 286 Guises, concept of, 210–211, 216, 221–223, 226–231, 235–237 Head-driven parsers, 10 Identification of focus, 136–137, 139, 238, 250, 258, 259–266, 271 Illicit operations QR as, 105 reference-set computation, leading to, 40, 44 Imperfections, repair of. See Repair of imperfections, interface strategies as Implicatures, scalar. See Scalar implicatures Indefinite/existential NPs. See Existential wide scope Indefinite numerals, processing limitations on size of reference sets for, 110–123 choice function, 117, 120 disagreement regarding, 110–111 distributive interpretability, 115–119 full scope of options, need to consider, 119–120 reconstruction of subject, 113–114 spectrum of possible reference sets, 120–123 strength of contextual need to apply QR, 112–113 Indefinites interpretable by choice functions, 91–95 Inference C/I system, 3, 4 Interface economy of interpretation-dependent reference sets, 35–36 Interface strategies alternatives to reference-set computation, 45–46 covaluation governed by (Rule I), 181–191 (see also Rule I) diagram of, 3 optimal design and, 1–5 reference-set computation as, 1–2, 11–12 (see also Reference-set computation) of repair, 5–6, 37–46 (see also Repair of imperfections, interface strategies as) transparency of parser and, 7–11
Interpretable features, 293n3 Interpretation-dependent reference sets, 25–36 alternative means of identifying derivations with interpretations, 34–35 ellipsis context, 29–32 interface economy of, 35–36 problems with, 32–34 superiority restrictions, 26–28, 32–34, 294n8 Island-free QR, 56–57, 69, 72, 73, 76–79, 83, 90, 96, 100 Learnability questions and acquisition of Rule I, 227–232 “least effort” principle, 19–24, 104–105, 183–184 LF coreference and, 28 existential NPs, 51 focus/stress, 43, 126–127, 135–139 MLCs and, 16–17, 18 QR, 28–31, 34–35 wh-in situ and wide scope in situ, 64–65 Linguistic theory, goal of, 2 Logical Syntax Condition. See Condition C Main stress. See also Acquisition of main-stress shift Cinque’s “most embedded” system, 127–131, 134 neutral main stress, 126–127, 134–135, 161–163, 239 relationship between focus and, 134–135 Szendrői’s metrical-tree notation system, 131–134 Markedness focus/stress, 126–127, 134–135, 161–163 QR, 105–110 Metrical grid theory, 239 Metrical-tree notation, Szendrői’s main stress system using, 131–134 Minimal Link Conditions (MLCs), 14–25 interpretation-based, 25–36 (see also Interpretation-dependent reference sets) “least effort,” 19–24 minimalist program’s association of reference-set computation with, 13 QR and, 29, 108 superiority restriction and, 15–19, 20, 26–28, 33–34, 294n8 Minimalist framework and reference-set computation, 1, 13 Minimize interpretive options principle, 101–105, 181–186 MLCs. See Minimal Link Conditions Montague tradition, 35, 108, 295n3, 296n7
Neo-Gricean tradition, 274, 283 Neutral main stress, 126–127, 134–135, 161–163, 239 Novelty, concept of, 143 NPs, indefinite/existential. See Existential wide scope Nuclear Stress Rule (NSR), 127–128 as neutral main stress rule, 239, 242, 243, 249–251 PF-coding and focus set, 141 projection of focus, 156 repair of imperfections, interface strategy as, 42, 44 stress-shift operations, 145, 148–150 Numeral plural indefinites. See Indefinite numerals, processing limitations on size of reference sets for Numeral scalar implicatures, acquisition of, 290–291 Numeration, concept of, 15, 392n1 Obligatory Contour Principle (OCP), 132, 151, 306n10–11 Optimal design, 1–12 Chomsky’s hypothesis of, 5–6, 7 interface strategies for, 1–5 reference-set computation, restricted operation of, 2, 11–12 scalar implicatures and, 275 SM restrictions and, 6–7 transparency of parser and, 7–11 Optimality Theory (OT) metric in, 37 MLC and, 13, 21, 22 optimal design distinguished, 1, 5 parser transparency and, 109 restrictions imposed on theoretical freedom to postulate reference-set computation by, 45 Order of recursion, 129 Parallelism, 30, 32, 52, 107, 192, 224–225, 254–259, 297n14, 314n19 Parsers head-driven, 9 Theta attachment and, 9–10 transparency of, 7–11, 21, 109 Perfect system, language viewed as, 5, 7, 137–138 PF interface. See Focus/stress Phonological tree and focus coding, 138–139 Presuppositions association with NPs, 299–300n21 existence presupposition of indefinites, 96–97
Principles-and-parameters framework, 19, 29 Processing costs of reference-set computation, 199–204 main-stress shift acquisition, 202, 238–272 (see also Acquisition of main-stress shift) numeral plural indefinites, 110–123 (see also Indefinite numerals, processing limitations on size of reference sets for) QR, costliness of, 106–108 restricted operation due to, 2, 38–39, 45 Rule I acquisition as evidence of, 199–202, 204–238 (see also Rule I) scalar implicatures, acquisition of, 37, 272–291 (see also Acquisition of scalar implicatures) Projection of focus, 156–161, 243, 259–261 Pronoun resolution. See Anaphora QR acquisition of, 203 clause boundedness, 60–64, 297n11 costly nature of, 106–108 covaluation construals with, 179, 190–191 as evidence for reference-set computation, 32–34, 37 existentials and, 79–81 (see also Existential wide scope) existentials not requiring, 79–81 as illicit operation, 105 initial theory regarding, 48–50 introduction of concept, 47–48 island-free, 56–57, 69, 72, 73, 76–79, 83, 90, 96, 100 LF-movement analysis, 28–31, 34–35 markedness approach to, 105–110 minimize interpretive options principle, 101–105 MLC view of, 29, 108 numeral plural indefinites and, 110–123 (see also Indefinite numerals, processing limitations on size of reference sets for) problems raised by, 32–36 “realistic” view of, 60–61 repair of imperfections, interface strategies as, 37–38, 39, 42–44, 105–110 sluicing and wide scope in situ, 66–68 QUANT, 35–36, 303n51 Quantifier scope, 5, 48–64, 101–123. See also Existential wide scope; Scope-shift clause boundedness, 60–64, 297n11 initial theory regarding, 48–50 “realistic” QR view, 60–61 “Realistic” QR view, 60–61 Reconstruction, 113–114, 203
Recursion, order of, 129
Reference principle, 297n13
Reference-set computation, 13–46. See also Processing costs of reference-set computation
  alternative interface strategies to, 45–46
  covaluation as means of anaphora resolution, 186–190
  defined, 13
  empty set and choice functions for existential qualifiers, 96–100
  evidence for necessity of, 1–2
  focus/stress and (see Focus/stress)
  global nature of, 11, 20–21, 24, 154
  illicit operations leading to, 40, 44
  as interface repair strategy, 37–46 (see also Repair of imperfections, interface strategies as)
  interpretation-dependent, 25–36 (see also Interpretation-dependent reference sets)
  MLCs and, 13, 14–25 (see also Minimal Link Conditions)
  noneconomic procedures triggering, 37–38
  restricted operation of, 2, 11–12
  semantic disambiguation vs., 202–204
Referential parallelism, 254–257
Referentiality, 80
Relativized minimality, 19
Relevance theory, 274, 286
Repair of imperfections, interface strategies as, 5–6, 37–46
  economy violations, 37–40
  focus/stress, 40–44, 127, 140–141, 155–156
  illicit operations requiring, 40, 44
  minimize interpretive options principle, 101–105
  QR, 37–38, 39, 42–44, 105–110
Rule H (covaluation in ellipsis contexts), 194–196
Rule I, 181–191. See also Acquisition of Rule I; Covaluation
  in ellipsis contexts, 192–196, 312–313n9
  minimize interpretive options principle, 181–186
  psychological reality of, 196–198
  reference-set computation, 186–190
  scalar implicatures and, 272
  Thornton and Wexler’s arguments against processing account of, 216–226
Scalar implicatures. See also Acquisition of scalar implicatures
  active (non canceled) implicatures, problems related to, 281–282
  background of concept, 272–275
  context-driven vs. default approach to, 283–288
  development of concept, 272–275
  mechanism governing, 278–281
  numeral scalar implicatures, 290–291
  optimal design hypothesis, 275
  Relevance theory, 274, 286
  Rule I and, 272
Scope-shift, 47–48, 101–123. See also Existential wide scope; Indefinite numerals, processing limitations on size of reference sets for; QR; Quantifier scope; Wide scope in situ
  focus/stress problems related to issues of, 125–126
  interpretation-dependent reference sets and, 35
  minimize interpretive options principle, 101–105
  QR as repair strategy, 105–110
SE (simple expression) anaphors, 174
Secondary stress
  main stress and, 130, 132–133
  processing costs of reference-set computation, 241, 251
  stress-shift operations, 145, 148–149, 151–153
SELF anaphors, 174, 177
Semantic default strategies
  acquisition of main-stress shift, 238, 266–272
  acquisition of scalar implicatures, 288–290
Semantic disambiguation, 201–204, 219, 238, 250, 259, 266–267, 270, 271
Sensorimotor/sound (SM) interface
  diagram of relationship to other systems, 3
  “hardware” restrictions of, 6–7
Simple expression (SE) anaphors, 174
Skolem functions, 82, 302n30
Sluicing
  choice functions for existential qualifiers, 85
  existential NPs, 51, 52, 53, 59
  instances of, 298–299n17
  interpretive problem of wide scope in situ and, 71–73
  problems raised regarding, 48
  wide scope in situ, 66–68, 71–73
Specificity, 56, 58, 80, 96, 296n8
Stress and stress-shift. See Focus/stress
“structured meaning,” concept of, 211, 224, 228, 229, 236
Subjacency
  quantifier scope, 49, 60, 62–64
  wide scope in situ, 65, 67–68, 72, 74
Superiority restrictions, 15–19, 20, 26–28, 33–34, 294n8
Superraising, 19–20, 25
Switch-reference resolution, 151, 202, 247, 251–259
Syntactic coding, 4–5
Syntactic tree and focus coding, 138
Syntax in broad sense, CS as, 2
Theta attachment, 9–10
Token-to-token transparency, 7
Transparency of parser, 7–11, 21, 109
Uninterpretable features, 293n3
Unselective binding, 74, 300n22–25
Verb-final languages and head-driven parsers, 10
Wh-in situ
  choice functions for existential qualifiers, 80, 84, 87, 101
  interpretive problem of wide scope in situ and, 69–71
  quantifier scope, 48, 51–52, 55–56, 60
  wide scope in situ, 64–66, 68–69, 69–71, 75
Wide scope in situ
  as alternative to quantifier scope, 64–68
  existential wide scope and interpretive problem of, 73–76
  interpretive problem of, 68–76
  sluicing, 66–68, 71–73
  wh-in situ, 64–66, 68–69, 69–71, 75
Word order and stress, 129
Working memory, limitations of, 6, 11–12, 45, 198, 199–204. See also Processing costs of reference-set computation