Events, Phrases, and Questions
Oxford Studies in Theoretical Linguistics
General editors: David Adger, Queen Mary University of London; Hagit Borer, University of Southern California
Advisory editors: Stephen Anderson, Yale University; Daniel Büring, University of California, Los Angeles; Nomi Erteschik-Shir, Ben-Gurion University; Donka Farkas, University of California, Santa Cruz; Angelika Kratzer, University of Massachusetts, Amherst; Andrew Nevins, University College London; Christopher Potts, University of Massachusetts, Amherst; Barry Schein, University of Southern California; Peter Svenonius, University of Tromsø; Moira Yip, University College London
Recent titles
15 A Natural History of Infixation by Alan C. L. Yu
16 Phi-Theory: Phi-Features Across Interfaces and Modules edited by Daniel Harbour, David Adger, and Susana Béjar
17 French Dislocation: Interpretation, Syntax, Acquisition by Cécile De Cat
18 Inflectional Identity edited by Asaf Bachrach and Andrew Nevins
19 Lexical Plurals by Paolo Acquaviva
20 Adjectives and Adverbs: Syntax, Semantics, and Discourse edited by Louise McNally and Christopher Kennedy
21 InterPhases: Phase-Theoretic Investigations of Linguistic Interfaces edited by Kleanthes Grohmann
22 Negation in Gapping by Sophie Repp
23 A Derivational Syntax for Information Structure by Luis López
24 Quantification, Definiteness, and Nominalization edited by Anastasia Giannakidou and Monika Rathert
25 The Syntax of Sentential Stress by Arsalan Kahnemuyipour
26 Tense, Aspect, and Indexicality by James Higginbotham
27 Lexical Semantics, Syntax, and Event Structure edited by Malka Rappaport Hovav, Edit Doron, and Ivy Sichel
28 About the Speaker: Towards a Syntax of Indexicality by Alessandra Giorgi
29 The Sound Patterns of Syntax edited by Nomi Erteschik-Shir and Lisa Rochman
30 The Complementizer Phase edited by E. Phoevos Panagiotidis
31 Interfaces in Linguistics: New Research Perspectives edited by Raffaella Folli and Christiane Ulbrich
32 Negative Indefinites by Doris Penka
33 Events, Phrases, and Questions by Robert Truswell
34 Dissolving Binding Theory by Johan Rooryck and Guido Vanden Wyngaerd
For a complete list of titles published and in preparation for the series, see pp. 264–5.
Events, Phrases, and Questions ROBERT TRUSWELL
Great Clarendon Street, Oxford OX2 6DP
Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide in Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto
With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam
Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries
Published in the United States by Oxford University Press Inc., New York
© Robert Truswell 2011
The moral rights of the author have been asserted
Database right Oxford University Press (maker)
First published 2011
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above
You must not circulate this book in any other binding or cover and you must impose the same condition on any acquirer
British Library Cataloguing in Publication Data: Data available
Library of Congress Cataloging in Publication Data: Library of Congress Control Number: 2010935045
Typeset by SPI Publisher Services, Pondicherry, India
Printed in Great Britain on acid-free paper by MPG Books Group, Bodmin and King’s Lynn
ISBN 978–0–19–957777–4 (Hbk.) 978–0–19–957778–1 (Pbk.)
1 3 5 7 9 10 8 6 4 2
Contents
General Preface
Acknowledgements
List of Figures and Tables
Abbreviations
1. Introduction
  1.1 Where We’re Going
  1.2 Locality Theory and Extraction from Adjuncts: A Potted History
  1.3 Further Puzzles
  1.4 The Plan
Part I. The Structure of Events
2. The Variable Size of Events
  2.1 Variable Pragmatic Coarse-Graining
  2.2 The Directness of Direct Causation
  2.3 Aspectual Classes
  2.4 Summary
3. Single Events from Multiple Verb Phrases and the Role of Agentivity
  3.1 Events and VP-Coordination
  3.2 Relations other than Direct Causation
  3.3 Agentivity and Event Size
  3.4 Planning and Enablement
  3.5 Agentivity, Aspectual Classes, and the Progressive
4. Structures Built from Events
  4.1 Introduction
  4.2 The Problems
  4.3 Dealing with the Problems
  4.4 Events and Intervals in Syntax
Part II. Events and Locality
5. Where We Stand
6. Extraction from Adjuncts
  6.1 Rationale Clauses
  6.2 Prepositional Participial Adjuncts
  6.3 Bare Present Participial Adjuncts
  6.4 Conclusion
  6.5 Appendix: Preposition Stranding in Adjuncts
7. Extraction from Complement Clauses and the Effect of Tense
  7.1 On the Impermeability of Tensed Adjuncts
  7.2 Why Tensed Complements are Different
  7.3 Factive Islands and Event Structure
  7.4 Cyclic Determination of Event Structure
  7.5 What Has Happened to the CED?
  7.6 Summary
8. Architectural Issues
  8.1 Introduction
  8.2 Could We Syntacticize the Single Event Grouping Condition?
  8.3 Integrating Syntactic and Semantic Constraints on Movement
  8.4 Conclusion
9. Conclusion
References
Author Index
Subject Index
General Preface
The theoretical focus of this series is on the interfaces between subcomponents of the human grammatical system and the closely related area of the interfaces between the different subdisciplines of linguistics. The notion of ‘interface’ has become central in grammatical theory (for instance, in Chomsky’s recent Minimalist Program) and in linguistic practice: work on the interfaces between syntax and semantics, syntax and morphology, phonology and phonetics etc. has led to a deeper understanding of particular linguistic phenomena and of the architecture of the linguistic component of the mind/brain. The series covers interfaces between core components of grammar, including syntax/morphology, syntax/semantics, syntax/phonology, syntax/pragmatics, morphology/phonology, phonology/phonetics, phonetics/speech processing, semantics/pragmatics, intonation/discourse structure as well as issues in the way that the systems of grammar involving these interface areas are acquired and deployed in use (including language acquisition, language dysfunction, and language processing). It demonstrates, we hope, that proper understandings of particular linguistic phenomena, languages, language groups, or inter-language variations all require reference to interfaces. The series is open to work by linguists of all theoretical persuasions and schools of thought. A main requirement is that authors should write so as to be understood by colleagues in related subfields of linguistics and by scholars in cognate disciplines.
David Adger
Hagit Borer
Acknowledgements
This is a substantially revised version of my dissertation, Truswell (2007b). Everyone who was thanked in 2007 is hereby thanked for a second time. In particular, the input of Ad Neeleman as supervisor of that dissertation, and of Jack Hoeksema and David Adger as examiners, remains just as valuable. Since 2007, portions of this material have been presented at Tufts, MIT, UMass Amherst, Harvard, and Edinburgh. Thanks to all those audiences for their input. Many of the refinements of this work over Truswell (2007b) have their roots in conversations in the Center for Cognitive Studies at Tufts University, in particular with Ray Jackendoff. I doubt anyone there agrees with much of this, but I am grateful to them for making me think about topics that had never crossed my mind before.
Thanks also to all the informants who have discussed these data with me: Peter Ackema, Daniel Altshuler, Kristine Bentzen, Anna Cardinaletti, Francesca Filiaci, Raffaella Folli, Vera Gribanova, Ger de Haan, Zakaris Hansen, Ana Carrera Hernandez, Jack Hoeksema, Eric Hoekstra, Anders Holmberg, Lars Jensen, Vikki Janke, Akis Kechagias, Hans van de Koot, Björn Lundquist, Ad Neeleman, Øystein Nilsen, Katya Pertsova, Matthew Reeve, Aglaya Snetkov, Goutta Snetkov, Vladimir Snetkov, Alyona Titova, Nikos Velegrakis, Reiko Vermeulen, and Olga Yokoyama.
It will be clear to anyone familiar with the literature that two works really should have been cited more than they are in this text, namely Jackendoff’s Semantic Structures (1990) and Bridget Copley’s (2002) MIT dissertation. I can only apologize for the omission.
This research was originally carried out with the support of a studentship from the Arts and Humanities Research Council, and a Wingate Scholarship. The revisions as this research makes the transition from thesis to monograph have been supported by a British Academy Post-Doctoral Fellowship.
List of Figures and Tables
Figures
2.1. The maximal core event
3.1. A TOTE unit
3.2. Hammering a nail into a wall, in TOTE units
3.3. Still from Wolff’s Experiment 1: all force-dynamically related objects are inanimate. (Figures 3.3–3.6 are reprinted from Cognition 88, Phillip Wolff, ‘Direct causation in the linguistic coding and individuation of causal events’, pp. 16, 22, Copyright (2003), with permission from Elsevier.)
3.4. Still from Wolff’s Experiment 2: the initial element in the causal chain is animate
3.5. Still from Wolff’s Experiment 3: the initial element in the causal chain is animate, and intends to bring about the goal event
3.6. Still from Wolff’s Experiment 3: the initial element in the causal chain is animate, but does not intend to bring about the outcome of the causal chain
3.7. The maximal core event
3.8. Extended events
Tables
8.1. Processing costs associated with a single-event reading of the sentence What did John arrive whistling?
8.2. Processing costs associated with a multiple-event reading of the sentence What did John arrive whistling?
Abbreviations
1        First person
2        Second person
3        Third person
a3       Set A affix, third person (Tzotzil)
acc      Accusative case
adv      Adverbial form
AP       Adjective phrase
BPPA     Bare Present Participial Adjunct
C        Complementizer
CCG      Combinatory Categorial Grammar
CED      Condition on Extraction Domain
Conj     Conjunction
cp       Completive aspect
CSC      Coordinate Structure Constraint
dat      Dative case
DP       Determiner phrase
DRS      Discourse Representation Structure
DRT      Discourse Representation Theory
ECP      Empty Category Principle
GB       Government and Binding theory
GPSG     Generalized Phrase Structure Grammar
HPSG     Head-driven Phrase Structure Grammar
H        Head (syntactic category unspecified)
HP       Maximal projection (syntactic category unspecified)
icp      Incompletive aspect
inf      Infinitive
INFL     Inflection
IP       Inflectional phrase
LCA      Linear Correspondence Axiom
LF       Logical Form
masc     Masculine gender
neu      Neuter gender
nom      Nominative case
NP       Noun phrase
OSV      Object–Subject–Verb order
OVS      Object–Verb–Subject order
P        Preposition
P&P      Principles and Parameters theory
pass     Passive
PF       Phonological Form
PP       Prepositional phrase
sbj      Subjunctive
sg       Singular
SOV      Subject–Object–Verb order
Spec     Specifier
SVO      Subject–Verb–Object order
t        Trace
T        Tense
TP       Tense Phrase
UTAH     Uniformity of Theta-Assignment Hypothesis
V        Verb
VOS      Verb–Object–Subject order
VP       Verb phrase
VSO      Verb–Subject–Object order
X, Y     Head (syntactic category unspecified)
XP, YP   Maximal projection (syntactic category unspecified)
1 Introduction
1.1 Where We’re Going
This book has two aims. The first is to propose a theory of what certain conceptual units in semantic structures look like. There is a fair consensus about what sorts of primitive elements we need to make our semantics work as a model of the way humans talk and understand each other. We need some individuals, the basic concrete or abstract things in the world that we talk about when we talk. Things happen to individuals, individuals do things, and a large part of what natural language does is establish or manipulate a set of relations among individuals. We probably also need some stuff (often called masses), the badly-behaved material which is not individuated like individuals are but which still participates in the same sort of relations that individuals participate in. Individuals are generally composed of stuff, but stuff need not correspond to any well-defined individual. We also need some events, individuated occurrences in which individuals and portions of stuff do things to each other and to other elements of our semantic structures. These events may or may not be different from states, which are more atemporal relations holding among one or more other units. Events are definitely different from what I will call happenings (which Bach 1986 called processes). As Bach argued, events relate to happenings in the same way as individuals relate to stuff. When we move beyond that, into the realm of what Asher (1993) called abstract objects, there is rather less agreement. We almost certainly need some propositions, which are things that some individuals believe, express, and hold other attitudes towards. These may be different from facts, which look a lot like propositions but, on Asher’s account, can make individuals do things. And so on. And then there are times and degrees and all sorts of other, more exotic building blocks.
The same sorts of lists occur in work after work, allowing for variation among authors in how ontologically austere they are. One area where researchers have generally had less to say, however, concerns how to recognize these units. This is true even if we stick to regular, concrete individuals we can touch and see, and ignore trickier abstract or fictional
individuals. We all agree on the basic cases, where the batter is an individual and the baseball is an individual, and the stuff that makes up the batter is stuff, as is the stuff that makes up the baseball, and the batter hitting the ball is an event, and the happenings that the batter’s hitting the ball are composed of are happenings, and so on. But, for each of these clear cases of ‘packaging’ of continuous stuff and happenings into discrete individuals and events, there are more marginal examples where it is unclear whether we are dealing with an individual (or event) or not. This problem is far from new, of course. It preoccupied many Gestalt psychologists, among others. Wertheimer framed the problem eloquently:
    I stand at a window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have “327”? No. I have sky, house, and trees. It is impossible to achieve “327” as such. (Wertheimer 1923: 71)
And Koffka made a similar point:
    Think for a moment what happens to a retinal element while your eyes roam around as they continually do: in quick succession, and without any order whatsoever, this element will be stimulated now by white light, now by greenish light; one moment the stimulation will be strong, the next very weak; green will be followed by red or blue, a kaleidoscopic change. And what corresponds to this whirl of stimulation of the retinal points? A perfectly steady and orderly world; the cigarette box on my desk remains a cigarette box, the calendar a calendar, while my eyes move along. (Koffka 1935: 175)
The observation can be posed equally acutely for events: why is it that we see the hitting of a baseball as the hitting of a baseball rather than a series of undifferentiated movements? The outline of the Gestalt psychologists’ answer is that ‘perception is organization’ (Koffka 1935: 110), that is, perception is a process in which, at least, ‘[u]nits are . . . formed and maintained in segregation and relative insulation from other units’ (Koffka 1935: 175). This is precisely the ‘packaging’ referred to above. However, this fundamental insight, that the units we manipulate are reified as part of the process of perception rather than given by some inherent organization of the external world, leads to a second question: What are the principles according to which we organize our perceptual environment into these segregated, insulated units? The aforementioned works tackled this problem, as it applies to the perception of individuals, in terms of a series of laws of organization, and a general condition, the Law of Prägnanz, which states that ‘psychological organization will always be as “good” as the prevailing conditions allow’.
An initial focus of this book is on the related questions that arise in the perception of events. Here, too, the organization of the continuous happenings into discrete units is clearly part of the process of perception rather than something given in the external world. Consider a ball flying through the air and a person running on the ground. If these two occurrences are taken to be independent of each other, they will probably be considered as two separate events, but if we attribute to the person an intention to catch the ball, then we may suddenly be inclined to consider the two occurrences jointly as a single event of the person chasing after the ball. This shows that eventhood, too, is in the eye of the beholder, and that our perception of certain happenings as a single event is contingent on all sorts of relations between that event and other happenings, and among subparts of the happenings themselves. The first task in this book, to be undertaken in Part I, is to present a theory of those relations among events. Under what circumstances are we willing, or likely, to consider a set of happenings as a single event? However, as this is primarily a work about language, we will shift the emphasis of the previous question somewhat, to consider the description of stuff that happens. If perception is organization, then an individual’s discourses about the real world will inevitably reflect the organization that results from that individual’s perceptual processes. Instead, then, we will ask this: Under what circumstances are we willing, or likely, to describe a set of happenings (or accept as accurate a description of a set of happenings) as a single event? In slightly more formal terms, the sorts of algebraic structures proposed initially for the domain of individuals by Link (1983), and then for events by Bach (1986), allow us to consider, for any two events, a larger single event composed solely of those two subevents. 
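The algebraic point can be made concrete with a small sketch. The Python below is only an illustration under simplifying assumptions: events are modelled as sets of atomic happenings, the sum of two events is the union of those sets, and the names `Event`, `atoms`, and `part_of` are hypothetical and belong to neither Link (1983) nor Bach (1986).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Toy event, identified by the atomic happenings it is built from (an assumption)."""
    atoms: frozenset

    def __or__(self, other: "Event") -> "Event":
        # Sum of two events: the event composed solely of those two subevents.
        return Event(self.atoms | other.atoms)

    def part_of(self, other: "Event") -> bool:
        # e1 is part of e2 when summing e1 with e2 adds nothing to e2.
        return (self | other) == other


ball_flies = Event(frozenset({"ball flies through the air"}))
person_runs = Event(frozenset({"person runs on the ground"}))

# The algebra supplies the sum whether or not it "feels" like one event.
chase = ball_flies | person_runs
print(ball_flies.part_of(chase))   # True
print(chase.part_of(ball_flies))   # False
```

In this toy model the sum of any two events is always available; nothing in the structure itself says when that sum counts as a single event.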
We want to know under what circumstances we are likely to actually exercise that option, or conversely, when two events just do not feel like a single larger event. The answer proposed here goes some way beyond the usual linguistic characterization of events, based primarily on the event variable of Davidson (1967). I will adopt a version of what Pietroski (2000: 2) describes as a ‘non-Cartesian form of dualism’. That is, without subscribing to the Cartesian distinction between the physical and the mental, I will claim that mental processes, and in particular intention, have a special status in our organization of happenings into events. This does not mean that the mental actually is different from the physical; I need not care what ‘the mental’ and ‘the physical’ actually are. Rather, all I claim is that, when we perceive some happenings as corresponding to an agent’s intention, that affects the way in which we organize those happenings into events.
More specifically, behaviour perceived as intentional, planned, or agentive, allows the formation of larger perceptual units than would otherwise be possible. I suggest with Fodor (1970) and Wolff (2003) that, in the absence of perceived intention, a major factor in the organization of happenings into events is direct (sometimes called ballistic) causation. If some property of an object x causes some change of state in another object y, with no intermediary, then we perceive a single event, and we use a single event description to describe what happens. That is why we can say the falling tree destroyed the car to describe a series of blind, mechanical changes in which a strong wind blows a tree over, and the tree happens to land on a car, crushing it. On the other hand, if some intermediate cause is apparent (for example, the strong wind blows a heavy tree over, the tree lands on one end of a seesaw, on the other end of which is a 1-tonne weight, which is launched onto a car, crushing it), then we are less willing to describe the series of happenings with the sentence the falling tree destroyed the car. Once intention is involved, though, the requirement of direct causation is much less clearly felt. I can truthfully claim that I destroyed the car if I (with my heavy friends) jump on the end of a see-saw, sending a 1-tonne weight flying onto the car. However, accepting such a description requires attribution of intention (or at least some form of responsibility) to me: if I was dropped, unconscious, onto the see-saw, and the same consequences ensued, the claim that I destroyed the car sounds, if not completely false, at least tenuous. Part I, then, is concerned with the elaboration of a theory of the perception and description of events which takes account of the special status of intention. After this, Part II applies this theory to a recalcitrant and apparently unrelated problem in syntax, namely patterns of extraction out of adjuncts, primarily in English. 
As we will see presently, there are two major positions among theoretical syntacticians on the status of adjuncts as locality domains. Neither is very satisfying in isolation, because one of them (the claim that adjuncts are strong islands) rules out too many cases of extraction from adjuncts and one of them (the claim that adjuncts are weak islands) doesn’t rule out enough. And although it is impossible to construct a negative proof in cases like this, I hope to show that the prospects are bleak for attempts to find a more empirically satisfying pure-syntax story of this area. In such cases, the best approach is often to adopt the overly permissive position, and look for a supplementary constraint to rule out the cases that such an analysis in isolation predicts to be acceptable. In the case of extraction from adjuncts, the supplementary theory I propose is based on the analysis of events. The major hypothesis is, roughly, that movement is only possible
when the minimal constituent containing the head and the foot of the chain describes no more than one event. Even at this early stage, two consequences hopefully stand out. Firstly, this means that the theory of events developed in Part I is not just a theory of what some of our basic perceptual units look like. Rather, the shape of such a theory has consequences for empirical areas usually considered part of ‘core’ linguistic theory, such as syntactic locality. Secondly, we have some major architectural issues to confront. The theory we will arrive at belongs to a class of theories that we may call ‘pluralistic’, in that they give semantic structures a substantial degree of independence from phrase structure and let them carry a lot of weight in accounting for would-be syntactic facts on partially nonsyntactic terms. The present work therefore joins a large, diverse, and growing body of pluralistic research into A′-locality (see Erteschik-Shir 1973, Morgan 1975, Goldsmith 1985, Lakoff 1986, Kuno 1987, Kluender 1992, Szabolcsi and Zwarts 1993, Culicover 1993, Kehler 2002, and Culicover and Jackendoff 2005, to name just a few). Such work contrasts with what Jackendoff and others have called ‘syntactocentric’ theories, in which interpretive components, in particular the semantic component, only minimally modify the structures fed to them by syntax. Pluralistic accounts of empirical areas like this arguably push our analyses closer to the minimalist ideal of a stripped-down syntactic module which satisfies the substantial constraints imposed by the modules it interacts with, and does nothing more than that. However, pluralistic theories require a nontrivial theory of how those modules interact, a problem which is much less pressing for syntactocentric theories where the LFs generated by narrow syntax are assumed to map more or less directly into the sort of structures interpretable by the semantic component. Such architectural issues are the concern of Chapter 8.
In the remainder of this chapter, Section 1.2 sketches what existing theories have to say about extraction from adjuncts, and shows that, of the two major classes of theories of extraction from adjuncts, there is some truth in each, but that each is also empirically unsatisfying as things stand. Section 1.3 then aims to sow some seeds of doubt about the standard practice of treating extraction from adjuncts as a primarily syntactic problem, by discussing a series of puzzling facts about the distribution of extraction from adjuncts. Although I will not attempt to show that the patterns in question cannot be accounted for in purely syntactic terms, I will discuss a few ways in which patterns of acceptable extraction from adjuncts really do not look like the sort of thing our syntactic locality theories are currently set up to describe. These patterns represent a challenge to anyone pursuing the project of a purely syntactic account of patterns of extraction from adjuncts: if extraction from
adjuncts is regulated solely by the sort of purely syntactic principles which regulate extraction from many other types of constituent, why do the patterns of acceptable extraction from adjuncts look as unusual as they do from a syntactic perspective? Finally, Section 1.4 sketches the course we intend to follow in subsequent chapters, where we elaborate an alternative to the purely syntactic account, which currently stands as the only game in town.
1.2 Locality Theory and Extraction from Adjuncts: A Potted History
Although the notion that adjuncts are islands for extraction is often attributed to Ross (1967), there is no direct discussion of adjuncts as a unified class in Ross’ thesis. Rather, the claim is first explicitly made in Cattell (1976). Cattell, like Chomsky (1973), attempts to provide a unified account of several of Ross’ constraints, in this case the Complex Noun Phrase Constraint, the Sentential Subject Constraint, and the Coordinate Structure Constraint. It appears to be seen as an added bonus, rather than the point of the paper, that his explanation also rules out extraction from many adjuncts. The core of Cattell’s theory is the notion of syntactic configuration, which defines the maximal domains in which operations such as wh-movement can apply.
(1) A syntactic configuration is a maximal sequence of sentoids [which for our purposes can be taken to be equivalent to clauses], $S_1, \ldots, S_n$, such that each $S_i$ ($i \neq 1$) is embedded in the predicate [as opposed to the subject] of $S_{i-1}$, and is a function of its verb – and such that no syntactically required item is lacking in any predicate of the configuration so formed.¹ (Cattell 1976: 33, emphasis removed)
Cattell then proposes the following constraint.
(2) The NP Ecology Constraint. The number and identity of argument-NPs within a syntactic configuration must remain constant under the operation of movement rules. (Cattell 1976: 27)
These definitions jointly ensure that movement only occurs within syntactic configurations. As syntactic configurations are defined in terms of predicate and function of the verb, extraction from subjects and adjuncts is excluded: from subjects, because subjects are outside the predicate, and from adjuncts, because adjuncts are typically optional, and so not a function of the verb.
1 There is no comparably concise definition of function of a verb in Cattell’s paper, but the basic idea is that if the status of a constituent X as obligatorily present, optionally present, or obligatorily absent, varies from verb to verb, for syntactic reasons, as opposed to, say, patterns of semantic incompatibility, then X is a function of the verb.
Cattell provides examples like the following to illustrate his claim with respect to adjuncts:²
(3) (a) John met a lot of girls [without going to the club].
    (b) *Which club did John meet a lot of girls [without going to ___]?
(4) (a) John won’t meet any girls [unless he goes to University].
    (b) *Which University won’t John meet any girls [unless he goes to ___]?
(5) (a) John dated a lot of girls [instead of doing the exams].
    (b) *Which exams did John date a lot of girls [instead of doing ___]?
    (Cattell 1976: 38)
However, Cattell also acknowledges the existence of examples like (6), which involve extraction out of an apparently non-subcategorized constituent (that is, an adjunct), but which are still perfectly acceptable.
(6) (a) He bought a book [for the girl].
    (b) The girl that he bought a book [for ___]
    (Cattell 1976: 38)
Why these examples should diverge from the presumed norm in (3–5) is left as an open question, but the impossibility of extracting from an adjunct is taken to represent the general case, while examples like (6) remain ‘recalcitrant’ (p. 38) problems. This assumption remained largely intact, or even became more widely accepted, for the next decade or so. In particular, Belletti and Rizzi (1981) noted that the possibility of ne-cliticization out of objects in Italian (and impossibility of ne-cliticization out of adjuncts and external subjects) is largely parallel to the observed data concerning patterns of wh-movement.
(7) (a) Gianni ne trascorrerà [tre ___] a Milano
        Gianni ne will.spend three in Milan
        ‘Gianni will spend three of them in Milan.’
    (b) *[Tre ___] ne passano rapidamente
        Three ne elapse rapidly
        ‘Three of them elapse rapidly.’ (Belletti and Rizzi 1981: 119)
    (c) *Gianni ne è rimasto [tre ___] a Milano
        Gianni ne is remained three in Milan
        ‘Gianni stayed for three of them in Milan.’ (Belletti and Rizzi 1981: 126)
2 As a convention, I will enclose adjuncts or other relevant locality domains in brackets, and generally mark only the position at the foot of the chain with an underscore.
Assuming a movement theory of cliticization, this pattern was accounted for by a revised version of Subjacency, according to which movement across a bounding node is only possible if that bounding node is superscripted, in the sense of Rouveret and Vergnaud (1980). Given that superscripting is used by Rouveret and Vergnaud partly as a syntactic indicator of relations between a predicate and its internal arguments, Belletti and Rizzi’s theory largely replicates the empirical predictions of Cattell (1976),³ but this time on the basis of an independently motivated locality condition, namely Subjacency. Almost simultaneously, the Condition on Extraction Domain in Huang (1982) offered an explanation of the presumed impossibility of extracting out of subjects and adjuncts, this time more closely related to the Empty Category Principle than to Subjacency. The Condition is stated as follows, where ‘proper government’ is government by a lexical category:
(8) A phrase A may be extracted out of a domain B only if B is properly governed. (Huang 1982: 505)
The property of proper government distinguishes complements from finite subjects and adjuncts. Complements are governed by lexical V, while finite subjects in English are only governed by nonlexical INFL, and adjuncts aren’t governed by any head. Consequently, the CED not only unifies two apparently independent observations about limitations on the domain of application of movement, as Cattell also did, but also gives a principled account of why movement is impossible out of specifically these two classes of constituents, based on the then-ubiquitous concept of government. As with Belletti and Rizzi (1981), then, the CED not only reinforces Cattell’s presentation of the facts, but also offers an explanation of why the facts should be that way. The definition of Subjacency in Belletti and Rizzi (1981) and the CED in Huang (1982) yield very similar empirical predictions, to the point where it is hard to tease them apart, given the flexibility elsewhere in the general theory of grammar. One of the ways in which they might differ in principle, though, concerns the treatment of counterexamples. Belletti and Rizzi (1981) was the first paper I am aware of which acknowledged a systematic exception to the islandhood of adjuncts, at least in English. The exception in question concerns extraction from the rationale clauses of Faraci (1974), which is frequently possible in English, but not in Italian. 3 As with Huang’s Condition on Extraction Domain, to be discussed immediately below, Belletti and Rizzi (1981)’s theory does not exactly replicate Cattell’s empirical predictions, because the two later theories rely on syntactic properties (superscripting and proper government, respectively) which are fairly closely related to the notion of ‘internal argument’, whereas Cattell’s theory is based on the somewhat looser notion of ‘function of the verb’.
(9)
(a) What did John go to New York [to buy __ ]?
(b) * Che cosa è andato a New York [per comprare __ ] Gianni?
What is gone to New York for buy.inf Gianni
‘What did John go to New York to buy?’ (Belletti and Rizzi 1981: 132)
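The Condition on Extraction Domain in (8) can be read as a simple predicate over domains, which makes it easy to see why (9a) is a counterexample. The following is only an illustrative sketch: the class `Domain`, the function names, and in particular the choice of V, N, A, and P as the set of lexical categories are my assumptions, not part of Huang's formulation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these count as lexical categories for proper government.
LEXICAL_CATEGORIES = {"V", "N", "A", "P"}


@dataclass(frozen=True)
class Domain:
    """A candidate extraction domain (hypothetical representation)."""
    description: str
    governor: Optional[str]  # category of the governing head, None if ungoverned


def properly_governed(domain: Domain) -> bool:
    """Proper government is government by a lexical category."""
    return domain.governor in LEXICAL_CATEGORIES


def ced_allows_extraction(domain: Domain) -> bool:
    """Extraction out of a domain is possible only if it is properly governed."""
    return properly_governed(domain)


complement = Domain("complement of V", governor="V")
finite_subject = Domain("finite subject", governor="INFL")
rationale_clause = Domain("rationale clause adjunct, as in (9a)", governor=None)

for d in (complement, finite_subject, rationale_clause):
    print(d.description, "->", ced_allows_extraction(d))
```

On these assumptions the rationale clause in (9a) is ungoverned, so the sketch rules extraction out, although the English example is acceptable.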
Plausibly, the theories considered so far differ in how they would treat such counterexamples. Cattell would have to find ways to extend the NP Ecology Constraint, Belletti and Rizzi to allow exceptional superscripting of such adjuncts, and Huang to allow proper government to extend to adjuncts in such cases. The problem faced by Belletti and Rizzi has some similarities to the discussion of restructuring in Rouveret and Vergnaud (1980), which may suggest that some method of exceptionally superscripting constituents would be independently necessary. However, despite many efforts, the theory of restructuring and reanalysis in the early 1980s never developed to the point where we could see a testable difference between the theories here. Belletti and Rizzi’s discussion of extraction from rationale clauses reflected a growing recognition that the data concerning extraction from adjuncts are more equivocal than earlier works might suggest. Firstly, a significant amount of research in the late 1970s and early 1980s (among others, Taraldsen 1981, Chomsky 1982, Kayne 1983, and Engdahl 1983, although Taraldsen concentrated in particular on parasitic gaps in subjects rather than adjuncts) documented the existence of a well-defined subclass of at least apparent cases of extraction from adjuncts in parasitic gap constructions such as (10). (10)
This is a paper that I would reject __ [without reviewing __ ].
Whether this represents a genuine case of extraction from adjuncts was a topic of some debate throughout the 1980s, and remains so to some extent. Within the Principles and Parameters literature, where the existence of two distinct base positions corresponding to a single overt displaced constituent raises acute theoretical issues, evidence was sought that no movement relation holds between the overt filler and the adjunct-internal gap: either there is a dependency, but it isn’t created by movement across the adjunct boundary; or the gap within the adjunct is a trace of movement, but that movement does not leave the adjunct island. Such approaches postulate an asymmetry between the ‘real’, island-external, gap, and the ‘parasitic’, island-internal gap: the filler originated in the real gap position, and was never in the parasitic gap position. A certain amount
of evidence supporting this asymmetry has been presented. For example, Chomsky (1986) reproduces the following paradigm from Kearney (1983). (11)
(a) Which books about himself did John file __ [before Mary read __ ]?
(b) *Which books about herself did John file __ [before Mary read __ ]? (Chomsky 1986: 60)
Himself is only compatible with the antecedent John, accessible only from the island-external gap site, while herself is only compatible with Mary, accessible only from the island-internal gap site. The contrast between (11a) and (11b) therefore suggests that reconstruction is possible only into the island-external gap site. If reconstruction is dependent on a prior movement step, this in turn suggests that the wh-phrase could only have moved from the island-external site. A handful of similar pieces of evidence have been presented, suggesting that the asymmetry predicted by GB approaches is real. However, such evidence is challenged with some success by Levine and Sag (2003). Levine and Sag point out, firstly, that there are cases in which apparent reconstruction into either island-internal or island-external gap sites is possible, as in (12); and secondly, that asymmetries similar to the Kearney paradigm in (11) can be found in cases of across-the-board movement out of coordinate structures (13), which is generally differentiated from parasitic gap constructions. Such pieces of evidence call into question the empirical support for the predictions of the GB theory, as opposed to conceivable alternatives in GPSG, HPSG, or CCG, where there is less necessity of postulating an asymmetry between the two gaps. (12)
(a) There were pictures of herself which, [once Mary finally decided she liked __ ], John would have to put __ into circulation.
(b) There were pictures of himself which, [once Mary finally decided she liked __ ], John would have to put __ into circulation. (Levine and Sag 2003: 241)
(13) Which pictures of himself/*herself did [[John approve of __ ] and [Mary like __ enormously]] (Levine and Sag 2003: 242)
Two comments are in order here. Firstly, if adjuncts are not completely impermeable to movement, then the status of asymmetries such as (11) must be rethought: if movement could in principle have originated from either gap site, we would lose the major motivation for singling out the island-external gap as the base position of the wh-phrase. Secondly, even if the evidence given here and elsewhere that adjuncts are not strong islands is ignored, the predictions of path-based approaches to locality such as Kayne (1983) are much more malleable than those of standard GB. Although data such as those in (12–13) may call into question the notion of an asymmetry between the two gaps, then, they do not seriously damage the entire set of GB-era assumptions which predicted such an asymmetry, so much as selectively attack some of those assumptions. During the first wave of research into parasitic gaps, Chomsky (1982) discussed examples which should perhaps have had more influence on subsequent research than they did (see also Levine and Sag 2003). These were examples of extraction from a variety of classes of adjuncts. Chomsky (1982: 72) suggests that they ‘range in acceptability from fairly high . . . to virtual gibberish’.4 Here are some of the examples he gave. (14)
(a) Here is the influential professor that John went to college [in order to impress __ ].
(b) The article that I went to England [without reading __ ]
(c) The book that I went to college [because I liked __ ]
(d) The man that I went to England [without speaking to __ ] (Chomsky 1982: 72)
This was little more than a further acknowledgement of the problem that Cattell had hinted at (Chomsky does not even indicate which of the above examples are fairly acceptable, and which are virtual gibberish). However, it is the first explicit suggestion that it may be fruitful to treat extraction from adjuncts as a more gradient phenomenon than the essentially binary analyses of Cattell and Huang suggest. Shortly afterwards (1983), Kayne’s Connectedness theory appeared, presenting an implicit dissolution of the relationship between subjects and adjuncts which characterizes all of the above work. This is because Connectedness is built on the notion of canonical government configuration, defined partly in terms of direction. (15)
W and Z (Z a maximal projection, and W and Z immediately dominated by some Y) are in a canonical government configuration iff
(a) V governs NP to its right in the grammar of the language in question and W precedes Z or
(b) V governs NP to its left in the grammar of the language and Z precedes W.
4 I suspect that the widespread belief that adjunct parasitic gaps are more acceptable than extraction out of adjuncts is partially due to lack of control for factors such as tense. The most prominent example of degraded extraction from an adjunct in Huang (1982) involves a tensed adjunct: *Who did Mary cry after John hit. On the other hand, parasitic gaps have traditionally been illustrated with examples involving untensed adjuncts, such as Which articles did John file without reading, from Engdahl (1983: 5). As Engdahl shows (pp.9–11), parasitic gaps inside tensed adjuncts are less readily acceptable, at least in English. This cannot be the whole story, as Cattell (1976) discusses both tensed and untensed adjuncts, and Engdahl (1983) shows that the real gap site is more accessible than the parasitic gap site. However, it would be interesting to know how much of the perceived difference would remain if these factors were controlled for.
This therefore builds a left–right asymmetry into the core of locality theory. Without going into the details of Kayne’s theory, the overall effect is that in a language like English, where V governs NP to the right, governed constituents on right branches can move arbitrarily far up a right-branching tree, regardless of whether nodes on the path between filler and gap are themselves governed or not. On the other hand, subjects, being on left branches, do not occur in a canonical government configuration with their sister. Subconstituents of that subject therefore cannot move off that left branch without violating the Connectedness condition. In other words, extraction of governed constituents out of adjuncts, but not subjects, is predicted to be possible by Kayne’s theory. Kayne does not explicitly recognize this prediction, but it is implicitly endorsed in trees such as his (28), p. 233, which show a single path extending from an adjunct-internal gap to an adjunct-external filler. This feature of Connectedness was ‘fixed’ by Longobardi (1985), who claims that: It is well-known, and much discussed in the syntactic literature (cf. most recently, Chomsky (1982)) that complement sentences which are not strictly subcategorized [footnote omitted] by a predicate (e.g. sentential subjects on the one hand; and on the other, gerund clauses; clauses introduced by certain prepositions such as before, in order to, and without; by adverbial phrases perhaps preposed by WH-movement, such as while, when) are all more or less resistant to extraction of a WH-constituent. (Longobardi 1985: 169)
Longobardi accommodates this observation within a revised version of the Connectedness condition, by incorporating the additional condition ‘W governs Z’ (p. 171) into the definition of canonical government configuration in (15) above. As adjuncts are ungoverned constituents, they never occur in the position of Z in a canonical government configuration, and movement out of adverbial clauses is consequently blocked.5
5 As it happens, Longobardi’s condition may go too far, barring movement of adjuncts, as well as movement out of adjuncts, as government is so crucial to any nonlocal dependency on his theory.
It is striking that Longobardi’s perception of the adjunct subextraction facts differs from Chomsky’s (1982) ‘range in acceptability’. We may therefore question whether Kayne’s Connectedness condition really needed fixing. The evidence is equivocal. Certainly, regular wh-movement out of adjuncts is sometimes possible. However, there is one good argument for a connectedness effect, in more deeply embedded adverbial clauses. Kayne (1983) showed that although a parasitic gap inside a clausal subject could be licensed, a parasitic gap inside a subject inside that clausal subject could not be licensed. (16)
(a) ?a person that [[people who read [a description of __ ]] usually end up fascinated with __ ]
(b) *a person that [[people to whom [descriptions of __ ] are read] usually end up fascinated with __ ] (Kayne 1983: 228)
Longobardi shows that the same effect emerges if an adverbial clause is embedded within another adverbial clause. (17)
(a) ?The head of cattle that [we have eliminated __ [without trying to persuade the vet [to cure __ ]]] were the ones in the worst shape.
(b) *The head of cattle that [we have eliminated __ [without trying to call a vet [instead of killing __ ]]] were the ones in worst shape. (Longobardi 1985: 175)
Balanced against this, however, is the phenomenon of symbiotic gaps discussed by Levine and Sag (2003), in which one gap is contained within an adverbial clause and one within a subject.
(18) What kinds of books do [authors of __ ] argue about royalties [after writing __ ]? (Levine and Sag 2003: 243)
If both adverbial clauses and subjects are islands, then such examples raise serious questions. For example, we do not know which is the ‘real’ gap and which is ‘parasitic’ (hence the term symbiotic gap). Consequently, we do not know what is licensing what. On Longobardi’s revision of Connectedness, such examples are predicted to be as ungrammatical as (16b) or (17b), contrary to fact. However, if we reject Longobardi’s revision in favour of Kayne’s original proposal, then such examples are unproblematic: the real gap is inside the adjunct, which, being on a right branch, is able to move up and license the parasitic gap contained within the left-branch subject. On balance, it seems that Longobardi’s revision must be rejected. This leaves us in need of an explanation of the pattern demonstrated in (17). I believe that this is related to the fact that wh-movement out of such more deeply embedded adverbial clauses is impossible, regardless of considerations related to parasitic gaps. The theory to be developed in Part II will have some bearing on this issue, as will be discussed in Chapter 8.
As was implicit in Kayne (1983), then, we should not prohibit extraction from adjuncts in principle. The first explicit such proposal is found in Chomsky (1986). One of the principal achievements of that work is to present a unified treatment of three types of locality effect, previously dealt with separately by Subjacency, the Empty Category Principle, and Huang’s CED. A major obstacle to such unification, though, was that violations of the different conditions produce different results. Classical Subjacency violations like (19) frequently feel only very mildly degraded, if at all, once factors like the syntactic category of the extractee are controlled for, while in contrast, ECP violations are generally horrible, as with the that-trace effect in (20).
(19) Which shoes are you wondering [whether to buy __ ]?
(20) *Who did you say [that __ bought the shoes]?
Without going into the details of his quite elaborate system, the solution that Chomsky offered works broadly as follows. Maximal projections typically constitute barriers to movement, but the barrierhood of certain maximal projections can be circumvented by adjoining the moved phrase to that projection as an intermediate landing site. This means that the large distances covered by movement are composed of a series of very local steps. However, in some circumstances, a moved phrase cannot adjoin to a barrier. In those cases, the moved phrase must cross the barrier in one fell swoop. These crossings of barriers by movement lead to degradation of a sentence, and the more barriers that are crossed in this way, the greater the degradation. Specifically, if a movement in a sentence crosses one barrier like this, the sentence is mildly degraded, as with the Subjacency violation in (19). If more than one barrier is crossed, though, the sentence will be categorically unacceptable. As well as offering a unification of the various disparate locality effects, then, another achievement of Barriers is that it implies a way in which the grammar might handle undeniably gradient data in this area. To be sure, the gradience is quite coarse-grained (only three degrees of grammaticality are distinguished: crossing no barriers is grammatical, crossing one barrier is mildly degraded, and crossing more than that is ungrammatical), but even this is a start.6
6 Browning (1987), basing her discussion on an idea of Belletti and Rizzi (1988), incorporates further degrees of gradience by alluding to the segment–category distinction developed in May (1985) and Chomsky (1986). Crossing a barrier category works as above. Crossing a barrier segment causes milder degradation. On this analysis, extraction from adjuncts has a status intermediate between 1-Subjacency and 2-Subjacency violations, or in other words, intermediate between classical Subjacency violations and ECP violations. Other ramifications allow her system to capture further fine-grained contrasts in acceptability. However, many people find certain cases of extraction from adjuncts fully acceptable, and certainly no worse than a classical Subjacency violation.
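The coarse three-way scale attributed to Barriers above can be stated as a small function from the number of barriers crossed without adjunction to a predicted status. This is a minimal sketch for illustration only: the function name and the status labels are my own, and nothing here attempts to compute what counts as a barrier.

```python
def barriers_status(barriers_crossed: int) -> str:
    """Map a count of barriers crossed in one step to a predicted status.

    Follows the three-way scale described in the text: no barriers is
    grammatical, one is mildly degraded, more than one is ungrammatical.
    """
    if barriers_crossed < 0:
        raise ValueError("the number of barriers crossed cannot be negative")
    if barriers_crossed == 0:
        return "grammatical"
    if barriers_crossed == 1:
        return "mildly degraded"
    return "ungrammatical"


# The Subjacency violation in (19) is described as crossing one barrier.
print(barriers_status(1))
```

The point of writing it out is that the scale has exactly three values, so any finer contrast in acceptability has to come from elsewhere in the theory.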
How many barriers is extraction from an adjunct predicted to cross? The Barriers framework is flexible enough that the answer to this question is not straightforward, but extraction from an adjunct adjoined to VP probably crosses a single barrier, while extraction from an adjunct adjoined to IP should cross two barriers.7 One piece of evidence in favour of this claim concerns the fact that (21a) is ambiguous, but (21b) is not: in the former case, they could be (i) so angry that they cannot hold the meeting, or (ii) so angry that it is not safe for us to hold the meeting. In the latter case, only reading (i) survives. (21)
(a) They were too angry [to hold the meeting].
(b) Which meeting were they too angry [to hold __ ]? (Chomsky 1986: 33)
Chomsky analyses this in terms of a height difference: reading (i) emerges when too angry to hold the meeting forms a VP-internal constituent, while reading (ii) emerges where to hold the meeting is adjoined VP-externally. Extracting from the adjunct then blocks the VP-external structure, as extracting from a VP-external adjunct crosses two barriers and leads to severe degradation. Wh-movement thereby resolves the structural ambiguity by making one of the two structures illegitimate. Chomsky (1986), then, predicts that extraction from an adjunct will never be fully acceptable, but will be only mildly degraded in some cases, and fully unacceptable in others. It also predicts a correlation between the attachment height of an adjunct and the possibility of extracting out of it: the lower VP-adjuncts allow extraction more easily than higher IP-adjuncts. Another prediction made by Chomsky (1986) is that extraction from VP-adjuncts will behave like extraction from wh-islands like (19) or other weak islands, as both reduce to Subjacency. In each case, the movement crosses a single barrier, which prevents the movement from being fully acceptable. Chomsky shows that, as with wh-islands, extraction of NP is slightly more acceptable than extraction of PP, and extraction of an adjunct is ungrammatical, yielding an ECP violation.
7 Here’s why: we assume, as is typical in Barriers, that wh-movement moves to [Spec,C] via adjunction to VP if appropriate, but that other intermediate landing sites, such as adjunction to an adjunct or to IP, are unavailable. The adjunct maximal projection is therefore a blocking category, and hence a barrier, and IP is a barrier by inheritance if the adjunct is adjoined to IP. Movement from within an IP-adjoined constituent to [Spec,C] therefore crosses two barriers, leading to severe degradation. However, if the constituent is VP-adjoined, the moved constituent can also adjoin to VP as an intermediate landing site, giving rise to the configuration [CP wh . . . [VP t_wh [VP [VP . . . ] [XP . . . t_wh . . . ]]]]. In this case, one barrier (XP) is crossed by the first movement step, and none by the second movement step, giving rise to only mild degradation. If, alternatively, adjunction of NP to the adjunct maximal projection is possible, as considered at one point by Chomsky, extraction of NP should be possible without degradation.
(22)
(a) He is the person who they left [before speaking to __ ].
(b) [?]He is the person to whom they left [before speaking __ ].
(c) *How did you leave [before fixing the car __ ]? (Chomsky 1986: 32)
This prediction was fleshed out in research into the properties of weak islands, notably in Cinque (1990). Cinque’s main relevant observation, building on earlier work by Pesetsky (1987), Kroch (1989), and Rizzi (1990), is that the acceptability of extracting out of a weak island depends on what you try to extract. In particular, Cinque claims that it is only possible to extract referential arguments out of weak islands (typically NPs, and maybe PPs as in (23a)). We therefore find contrasts like those in (23), which show that extraction of a referential phrase out of a wh-island (the major type of weak island) contrasts with extraction of a nonreferential phrase. (23)
(a) A quale dei tuoi figli ti chiedi [quanti soldi hai dato __ ]?
To which of.the your children you wonder how much money have.2sg given
‘To which one of your children do you wonder how much money you gave?’ (Cinque 1990: 18)
(b) *Quanti chili ti ha chiesto [se pesavi __ ]?
How many kilos you has.3sg asked whether weighed.2sg
‘How many kilos did he ask you whether you weighed?’ (Cinque 1990: 5)
Such a distinction is also found in cases of extraction from adjuncts, where (24) shows that extraction of nonreferential NPs is impossible. (24)
*Quanti chili ha smesso di mangiare [pur senza pesare __ ]
How many kilos has.3sg stopped to eat even without weigh.inf
‘How many kilos did he stop eating without even weighing?’ (Cinque 1990: 104)
Extraction from adjuncts and from wh-islands is not identical, however. The major difference is that (23a) showed that extraction of PP from a wh-island is possible, while (25) shows that extraction of NP from an adjunct is preferred to extraction of PP, a judgement which conflicts with that of Chomsky reported in (22b) above, but is shared by many English speakers.
(25)
(a) Anna, che me ne sono andato via [senza neanche salutare __ ]
Anna who I went away without even say goodbye.inf
‘Anna, who I went away without even saying goodbye to, . . . ’
(b) *Anna, con la quale me ne sono andato via [senza neanche parlare __ ]
Anna with the which I went away without even speak.inf
‘Anna, to whom I went away without even speaking, . . . ’ (Cinque 1990: 103)
This leads Cinque to develop a theory in which apparent cases of extraction from an adjunct are actually derived by A′-binding of a null pronominal, an analysis which does not straightforwardly carry over to cases, such as (23), of extraction of non-nominal phrases from weak islands. However, this should not obscure the clear common core of locality effects holding in adjuncts and in weak islands. Both classes contrast sharply with subjects, for instance, from which any extraction is quite marginal in English and Italian. (26)
??Piero, che credo che [invitare __ ] sia necessario, . . .
Piero who believe.1sg that invite.inf be.sbj necessary
‘Piero, who I believe that it is necessary to invite, . . . ’
This, then, was the late GB state of the art. Adjuncts looked more like weak islands than like strong islands, and we knew this not only because extraction out of an adjunct is sometimes only mildly degraded, and extracting the right sort of thing out of a weak island felt less severe than other types of locality violation; but also because there is a characteristic division between constituents which can, and cannot, be extracted from weak islands, and fairly similar patterns show up in cases of extraction from adjuncts and in cases of extraction from wh-complement clauses. Some degree of consensus along these lines appears to have been reached in the early 1990s: as well as works by GB theorists reiterating this position and offering alternative accounts of it, such as Hegarty (1992) or Chomsky and Lasnik (1993), there are also works outside GB making similar empirical claims, such as Pollard and Sag (1994), who list a series of acceptable cases of extraction from adjuncts (although they do not explicitly consider them as weak islands); or Postal (1998), which builds
on Cinque’s work and notes that adjuncts generally behave like the nearest equivalent to weak islands in his own system of classification of islands. With the advent of minimalism, though, strange things happened to this consensus. The seminal minimalist works, Chomsky (1993) and Chomsky (1995), say next to nothing about CED effects, but one early work, Takahashi (1994), did offer a novel account, with largely similar empirical predictions to those of the late GB consensus. Takahashi’s idea comes in several stages. Firstly, there is a condition that movement should be as short as possible: no potential landing site should be skipped, or more specifically, every maximal projection on the path from the foot to the head of the chain should be adjoined to. Secondly, it is not possible to adjoin to some maximal projections. You cannot adjoin to a constituent that has moved, because that would mean that the copies of that moved constituent were different from each other; and you cannot adjoin to a conjunct in a coordinate structure unless you adjoin to all of the conjuncts, because adjoining to some conjuncts but not others would make the conjuncts too dissimilar to be coordinated. Thirdly, adjuncts are treated as a variety of coordinate structure. If we buy all of this, then we derive a prediction that, wherever adjunction to a maximal projection is impossible, movement out of that projection should be degraded, because the impossibility of adjunction conflicts with the fact that Shortest Move necessitates that adjunction. More specifically, we predict that extraction out of subjects and adjuncts should be degraded in languages like English, but for slightly different reasons. Extraction out of subjects should be degraded because the subject is taken to originate in a VP-internal position and raise to [Spec,T]. This means that English subjects are always moved categories, and adjunction to moved categories is banned. 
This means in turn that extraction from English subjects is predicted to be degraded. The explanation for the degradation of extraction out of adjuncts is largely parallel, except that the reason for the impossibility of adjunction to an adjunct lies instead in the notion that adjunction structures are a kind of coordinate structure. On Takahashi’s account, violations of Shortest Move necessitated by the impossibility of adjunction to some projection do not lead to absolute ungrammaticality. Rather, he follows Lasnik and Saito (1984) in distinguishing between traces of complements and of adjuncts. On Takahashi’s account, intermediate traces of movement of a complement can delete at LF, while this is not possible for intermediate traces of movement of an adjunct. And if the offending chain links remain at LF, then the violation is more severe than if they do not. In this way, Takahashi builds a complement–noncomplement
asymmetry into his theory, allowing him to account for distinctions like the following. (27)
(a) ?*Who did you go home [before talking to __ ]?
(b) *How did you go home [before fixing the car __ ]? (Takahashi 1994: 69)
There are clearly many problems with Takahashi’s theory. There are worries about theory-internal consistency (for example, the notion that links in a chain must be identical conflicts with the widespread minimalist use of late adjunction, as in Chomsky 1993, or Fox and Nissenbaum 1999), but the major worry concerns the rationale for banning adjunction to adjuncts. The treatment of adjuncts as coordinate structures is problematic for two reasons: firstly, it is indefensible in the case of a variety of modal, conditional, and other adjuncts (does A unless B really count as a coordinate structure?); and secondly, it is simply not true that extraction from a single conjunct of a coordinate structure is impossible. We have known this since Ross (1967), who gave examples like the following, and more systematic subsequent research into the phenomenon has been done by, for example, Goldsmith (1985), Lakoff (1986), and Kehler (2002). (28)
(a) Here’s the whiskey which I [[went to the store] and [bought __ ]]. (Ross 1967: 168)
(b) Which dress has she [[gone] and [ruined __ ]] now? (Ross 1967: 170)
Clearly, these examples are semantically not typical coordinations: in (28a), the meaning of and is more than simply conjunction. Rather, we understand that the speaker went to the store in order to buy whiskey and then bought whiskey. In examples like (28b) (known as pseudocoordination), on the other hand, the meaning is somehow less than simply conjunction, because there aren’t two separate predicates to conjoin: there is no going independent of the ruining. This reflects a persistent intuition that exceptions to Ross’ Coordinate Structure Constraint form a semantic natural class. However, this intuition cannot help Takahashi’s project of treating adjunction structures as coordinate structures. Whatever the meaning of prepositions introducing adjuncts, such as without, before, despite, or unless, may be, the resulting structures are clearly not typical coordinations. In that case, even if a treatment of adjunction structures as coordinate structures turns out to be plausible, it is not at all clear that we would expect these coordinate structures to fall into the class of coordinate structures to which the CSC applies without exception.
However, despite these problems, Takahashi’s thesis deserves recognition as the first attempt to base a theory of extraction from adjuncts on minimalist principles such as Shortest Move, rather than the Subjacency condition which formed the backbone of the Barriers framework. Moreover, the judgements predicted by that theory largely follow those of Barriers and later work, in that many cases of extraction from adjuncts are predicted to have an intermediate level of acceptability, rather than being fully grammatical or ungrammatical. Such empirical predictions contrast with those of a cluster of theories, originally proposed by Toyoshima (1997) but more commonly associated with Uriagereka (1999), which attempt to derive the principal effects of the CED, without reference to the notion of government, which has been ostracized in minimalist theory. In Uriagereka’s theory and subsequent research along the same lines (Nunes and Uriagereka 2000, Johnson 2002, Sabel 2002, Zwart 2007, and Müller 2010), subjects and adjuncts are taken to be similar in that they are nonprojecting phrasal sisters of a phrasal constituent. This distinguishes them from categories which project, and from complements, which are nonprojecting sisters of heads. This provides a close minimalist counterpart of the GB notion of proper government, which essentially amounts for the purposes of the CED to being within the maximal projection of a lexical head.8 To see how these theories might work (although different theories offer different ultimate reasons for ungrammaticality), consider the following tree— nothing changes if subjects are generated VP-internally. (29)
[TP [DP Subject] [T′ T [VP [VP [V Head] [DP Complement]] [XP Adjunct]]]]
8 There are obvious differences between the set of constituents picked out by Uriagereka’s characterization and by Huang’s: [Spec,V] is properly governed, but is a nonprojecting sister of a phrasal constituent; while a VP adjunct attached to an unergative intransitive verb is not properly governed, but possibly only has a head for a sister in bare phrase structure. There is enough leeway in the relevant definitions that no clear predictions emerge from these discrepancies.
Locality Theory and Extraction from Adjuncts
Now, imagine that merging one phrasal constituent with another phrasal constituent is something that our language faculty cannot do: something goes wrong in the process of building up phrase 1 in one derivational workspace, building up phrase 2 in another derivational workspace, and then getting the contents of one derivational workspace into the other, to attach it there. Instead, we must do something extra, specifically to the phrasal sister which does not project further. Exactly what we do is one locus of variation among the works listed above: Uriagereka applies Spellout, for example, while Johnson appeals to an operation of ‘Renumeration’. But the common core of these different conceptions is that the syntactic structure is somehow flattened or otherwise rendered inaccessible to further syntactic operations. On Johnson’s model, for example, the ‘Renumerated’ constituent behaves like a giant lexical item. You cannot then move part of that constituent, stranding the rest, for the same reason that you cannot move part of a lexical item. This predicts extraction out of subjects and adjuncts to be impossible. These theories are extremely elegant and exemplify the parsimonious minimalist ideal of getting something for nothing. Uriagereka, for example, derives CED effects using nothing more than worked-out definitions of core minimalist operations like Merge and Spellout. Conceptually, then, everything here is very attractive. The problem, of course, is that there is no obvious way to deal with the exceptions originally documented by Chomsky for adjuncts and Ross for subjects. This problem is, if anything, exacerbated by the fact that these theories derive from fundamental components of the minimalist architecture: we probably should not start tinkering with definitions of Merge, just because we find a grammatical case of extraction out of an adjunct. And so we reach a stand-off, one which is currently still unresolved. 
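The common core of these proposals can be made concrete with a small sketch. The Python fragment below is purely illustrative and is not taken from any of the works cited: the names Node, merge, and can_extract, the flag opaque, and the placeholder lexical items are all assumptions introduced here, and the single flattening step stands in for Spellout and Renumeration alike. It treats any node without children as a head, and it flattens the nonprojecting member whenever two phrasal constituents are merged.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """A constituent; a node with no children is treated as a head."""
    label: str
    children: List["Node"] = field(default_factory=list)
    opaque: bool = False  # set once the constituent has been "flattened"

    @property
    def is_phrasal(self) -> bool:
        return bool(self.children)


def merge(projecting: Node, other: Node, label: str,
          other_first: bool = False) -> Node:
    """Merge two constituents; `projecting` is the one that projects.

    Assumption: when both are phrasal, the nonprojecting one is flattened
    first (a stand-in for Spellout or Renumeration).
    """
    if projecting.is_phrasal and other.is_phrasal:
        other.opaque = True
    children = [other, projecting] if other_first else [projecting, other]
    return Node(label, children)


def can_extract(root: Node, target: Node) -> bool:
    """True unless `target` is properly contained in an opaque constituent."""

    def visit(node: Node, blocked: bool) -> Optional[bool]:
        if node is target:
            return not blocked
        for child in node.children:
            found = visit(child, blocked or node.opaque)
            if found is not None:
                return found
        return None

    result = visit(root, False)
    if result is None:
        raise ValueError("target is not part of the tree")
    return result


# The configuration in (29), with hypothetical placeholder lexical items.
comp_obj, adj_obj, subj_obj = Node("N1"), Node("N2"), Node("N3")
complement = Node("DP", [Node("D"), comp_obj])
adjunct = Node("XP", [Node("P"), adj_obj])
subject = Node("DP", [Node("D"), subj_obj])

vp = merge(Node("V"), complement, "VP")   # head + phrase: nothing flattened
vp = merge(vp, adjunct, "VP")             # phrase + phrase: adjunct flattened
t_bar = merge(Node("T"), vp, "T'")        # head + phrase: nothing flattened
tp = merge(t_bar, subject, "TP", other_first=True)  # subject flattened

print(can_extract(tp, comp_obj))  # True: out of the complement
print(can_extract(tp, adj_obj))   # False: out of the adjunct
print(can_extract(tp, subj_obj))  # False: out of the subject
```

On these assumptions, opacity follows from phrase structure alone, so the sketch has no place to record a grammatical case of extraction from an adjunct short of changing merge itself. The flattened constituent as a whole remains a possible target, in keeping with the idea that it behaves like a giant lexical item.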
On the one side, we have people following Huang and Uriagereka, producing very elegant theories deriving the result that adjuncts are strong islands, and so movement of anything out of an adjunct is impossible. On the other side, there is Chomsky (or at least the Chomsky of the 1980s), Cinque, and Takahashi, producing theories with more nuanced empirical predictions, in which some cases of extraction from adjuncts are predicted to be marginally acceptable. At present, it seems that researchers typically assume that the facts look more or less as Huang and Uriagereka predict, which is surprising, given the prominence of the works in which counterexamples have appeared. Empirically, the facts above appear to decide the matter: unless some method of allowing exceptions is forthcoming, theories in the Huang/Uriagereka mould are too restrictive. However, the reason why ideas like Uriagereka’s are so widely adopted is probably not just a matter of their elegance. Quite simply, in many cases, extraction out of adjuncts looks much more restricted than extraction out of a typical weak island. We will see further examples of this
in Section 1.3, but to mention just one, the crosslinguistic distribution of extraction out of adjuncts is restricted. Languages like French and Dutch disallow any such extraction. (30)
(a) Elle est allée en Angleterre [sans lire le livre]
She is gone in England without read.inf the book
‘She went to England without reading the book.’
(b) *le livre qu’ elle est allée en Angleterre [sans lire __ ]
the book that she is gone in England without read.inf
‘the book that she went to England without reading’ (Postal 1998: 76)
(31)
(a) Jan is [tango’s fluitend] gearriveerd
John is tangos whistling arrived
‘John arrived whistling tangos.’
(b) *Wat is Jan [ __ fluitend] gearriveerd?
What is John whistling arrived
‘What did John arrive whistling?’
The Dutch case is only slightly surprising, as extraction of NP from weak islands is also impossible in Dutch for at least some speakers, as shown by Browning (1987), citing an unpublished talk by Hilda Koopman and Dominique Sportiche (the following data give subordinate clause word orders). (32)
(a) *Wie hij zich afvroeg [of jij __ aardig vond]
who he self asked if you nice found
‘Who he wondered whether you liked.’
(b) *Wie hij zich afvroeg [of __ jouw aardig vond]
who he self asked if you nice found
‘Who he wondered whether liked you.’
Extraction of PPs from weak islands in Dutch is possible, as in (33).
(33)
Met wie hij zich afvroeg [of hij zou kunnen praten]
with who he self asked if he would can talk
‘With whom he wondered whether he would be able to talk.’
However, extraction of PPs from adjuncts in Dutch is just as ungrammatical as extraction of NPs. (34)
*Met wie ben je hier gekomen [om __ te praten]?
with who are you here come for to talk
‘With whom did you come here to speak?’
The impossibility of extraction from adjuncts in Dutch is expected if, as mentioned above, adjuncts behave more or less like weak islands, except that extraction of PPs from weak islands is much better than extraction of PPs from adjuncts. Why such a pattern should hold remains a mystery, though. And in French, given that extraction of NP from weak islands is well documented, the impossibility of extracting NP from adjuncts remains unexplained. It seems, then, that extraction from adjuncts has a more restricted distribution than extraction from other weak islands. This may suggest that it is not such a huge idealization to assume that extraction from adjuncts is always impossible, as theories following Huang and Uriagereka do. However, in this book, I will adopt the Chomskyan position that adjuncts are, roughly, only weak islands (with the facts concerning extraction of PP remaining unexplained). This is not just because of the existence of counterexamples to the generalization that adjuncts are strong islands but because of the specific distribution of those counterexamples. This is Cinque’s claim, as discussed above, and it is also suggested by processing studies such as Kluender (1998). However, adopting this position does beg the question of why extraction out of adjuncts is so restricted. This work is, in part, a contribution to a satisfactory answer to that problem. Before we move on to the meat of the book, though, I will briefly address a related issue in locality. This is the question of whether a unified account of extraction out of subjects and adjuncts is even desirable. In contrast to the theories of Cattell and Huang, we saw above that Kayne’s Connectedness disallows extraction from subjects, but not adjuncts, in SVO languages like English with rightward adjuncts. Extraction must follow a path branching in the canonical direction of government. 
So in English, extraction is predicted to be impossible from constituents on left branches, like subjects, while it should be possible, all else being equal, from constituents on right branches, like adjuncts. This provides a clear contrast to Huang’s unified account of extraction from subjects and adjuncts, and the choice between Kayne’s and Huang’s positions has been debated ever since. Although it is in principle an empirically based choice, the data in these areas are notoriously murky, and the idealizations that one makes have a significant effect on what we take to
be the explananda. However, the data concerning extraction from subjects are a reasonable fit for the predictions of the Connectedness theory: Kayne (fn.2, p.226) already noted, following Ross (1967), that relativization out of subjects is possible in SOV Japanese. (35)
(a) [Mary-ga sono boosi-o kabutte ita] koto-ga akiraka da
Mary that hat wearing was thing obvious is
‘That Mary was wearing that hat is obvious.’
(b) Kore-wa [[Mary-ga kabutte ita] koto-ga akiraka na] boosi da
this Mary wearing was thing obvious is hat is
‘This is the hat which it is obvious that Mary was wearing.’ (Ross 1967: 243–4)
To this, we can add examples of extraction from subjects in two further SOV languages, from Stepanov (2007).
(36) Navajo:
(a) ?Łééchąą’í iisxí-(n)ígíí shi-ł bééhózin-ígíí nahał’i.
dog perf.3.kill (something)-nom me-with is-known-rel ipfv.3.bark
‘The dog that I know to have killed (something) is barking.’
Lit. ‘The dog that [(the fact) that __ killed something is known to me] is barking.’
(b) *Ashkii yah’ííyáa-go hadeeshghaazh-ęę sitsilí át’é.
boy entered-comp I-shouted-out-rel my-younger-brother is
‘The boy that I shouted when came in is my younger brother.’
(37)
Turkish:
(a) [Op_i [Ahmet-in git-me-si]-nin ben-i üz-dü-ğ-ü] ev.
Ahmet-gen go-inf-agr-gen I-acc sadden-past-comp-agr house
‘The house which [that Ahmet went to __ ] saddened me.’
(b) *[[Ahmet __i yediği] için] sana kızdim pastayı_i.
Ahmet.nom eat-past-3sg for you-dat anger-past-1sg cake-acc
‘I got angry with you because Ahmet ate the cake.’ (Stepanov 2007: 89–90)
Furthermore, Chung (1991) has argued that VSO Chamorro, ‘and . . . VSO languages more generally’ (p.76), allow extraction from sentential subjects, on the basis of examples like the following. (38)
Hayi siguru [na pära u-ginänna i karera __ ]?
who infl-certain that will infl-pass.win the race
‘Who is it certain that the race will be won by?’ (Chung 1991: 76)
Chung also shows, on the basis of facts concerning case agreement between embedding verbs and sentential subjects, that the embedded clause in a range of examples such as the above is a true subject, and not extraposed. It therefore appears that extraction from subjects in VSO and SOV languages is, if not ubiquitous, at least fairly common. Although the VSO case is somewhat less straightforward than the SOV case, because VSO is typically assumed to be derived through movement, such data are still significant if we take Connectedness to be a theory about surface order. There are two other cases where the subject and object appear on the same side of the verb, namely VOS and OSV orders. Here, though, the rarity of the orders means that evidence is hard to come by. Dryer (2008) lists only four OSV languages in a 1,228-language sample, and I am unaware of any studies of locality in such languages. A′-locality in three VOS languages has been studied in detail, namely Malagasy, Palauan (Georgopoulos, 1985), and Tzotzil (Aissen, 1996). However, the two former languages are relatively uninformative, as Georgopoulos has suggested that Palauan A′-constructions are derived by base-generation and null resumption, and a similarly movement-free analysis has been suggested for wh-constructions in Malagasy. The Tzotzil data, on the other hand, is equivocal for other reasons. A movement analysis is taken to be more appropriate here, but Aissen (1996) shows that extraction is possible from internal, but not external, subjects. (39)
(a) ??Buch’u ta=x-chonolay y-ajnil?
who icp-sell a3-wife
‘Whose wife is selling?’
(b) Buch’u i-cham x-ch’amal?
who cp-die a3-child
‘Whose child died?’ (Aissen 1996: 461)
Only the least surprising subcase of extraction from a subject by movement is possible here, then. On the whole, there is very little evidence for the sort of extraction patterns in VOS languages that the Connectedness theory predicts. However, this is probably due in part to the rarity of these languages: to my knowledge, Tzotzil is the only well-studied language exhibiting both VOS order and general A′-movement. So far, then, the predictions of Connectedness hold up well. SOV and VSO languages frequently show movement out of subjects; and more data is required on OSV and VOS languages. This leaves the negative prediction, that extraction from subjects is impossible in SVO and OVS languages. Given the extreme rarity of OVS order, we will concentrate here on SVO languages where extraction from subjects is claimed to be acceptable. Sabel (2002) mentions two such languages, namely the Kwa language Akan of Ghana, and the Bantoid language Tuki of Cameroon, but I have been unable to find any more data about these cases. However, one case of extraction from subjects in an SVO language, namely Russian (Stepanov, 2007), has been fairly widely publicized. Stepanov’s data are as follows. (40)
(a) S kem by ty xotel čtoby govorit’ bylo by odno udovol’stvie?
with whom sbj you wanted that-sbj to-speak were.3sg.neu sbj one pleasure.3sg.neu
‘With whom would you want that [to speak __ ] were sheer pleasure?’
(b) Čto by ty xotel čtoby kupit’ ne sostavlyalo by nikakogo truda?
what sbj you wanted that-sbj to-buy not constitute sbj no labour
‘What would you want that [to buy __ ] would not be any trouble?’ (Stepanov 2007: 91)
However, there is reason to think that the embedded clause here is not a subject, but a preverbal clausal object.9 The reason is that choices of nominal predicate other than neuter odno udovol’stvie result in different agreement on the copula.10
9 I am very grateful to Alyona Titova for explaining this to me, and then having the patience to explain it to me a second time when I lost her first explanation.
(41)
S kem by ty xotel čtoby govorit’ byl by splošnoj užas?
with whom sbj you want that.sbj speak.inf were.3sg.masc sbj entire horror.3sg.masc
‘To whom do you want that it would be a complete horror to speak?’
Russian shows subject–verb agreement, but not object–verb agreement, and so it appears that the subject of the above examples is not the embedded clause. In Stepanov’s original examples, this is not clear, as the verb always shows third person neuter singular agreement, which could have been a default form. However, the nominal predicates he chose were both third person neuter singular. Once we vary this, a fuller pattern emerges. Disregarding Russian, then, the data concerning extraction from subjects are a reasonably good fit for the predictions of Kayne’s theory. Even if two putative SVO counterexamples remain, it seems that extraction from sentential subjects is not particularly surprising in SOV and VSO languages, while it is very rare in SVO languages. One question we will address below concerns the extent to which this is also true of extraction from adjuncts. To anticipate: I am unaware of any counterexamples to the predictions of the Connectedness theory, but that theory in isolation would substantially overgenerate, and it is unclear how much independent work the theory would do when paired with other, independently necessary constraints. While Connectedness does a good job of predicting the distribution of extraction from sentential subjects (even if it requires supplementary constraints), the further restrictions on extraction from adjuncts will be shown to be different, and more severe. This suggests that a nonunified approach to CED effects should be pursued. One major proponent of such an approach is Stepanov (2001; 2007). Stepanov’s argument, in particular in the 2007 paper, is simple: many languages allow extraction from subjects, as we have seen above; but none, according to him, allow extraction from adjuncts. Therefore, there is something fundamentally different about the two types of category, and Huang’s attempt to find characteristics common to the two was misguided. 
As we have seen, this argument is based on a false premise, because certain languages allow extraction from adjuncts, just as certain languages allow extraction from subjects. However, a more subtle argument will be pursued here. It is not the case that those languages with movement out of adjuncts also allow movement out of subjects. For example, we have seen that several languages allow extraction out of subjects, but not adjuncts. English, on the other hand, allows at least some extraction out of adjuncts, as we have seen, while extraction out of subjects, if possible at all, is very marginal (see already Ross 1967: 242, 265–6 and Kuno 1973, as well as Levine and Sag 2003, Sauerland and Elbourne 2002, and Chomsky 2008, among others, on the restrictive conditions on extraction out of subjects in English. Although some such examples are clearly worse than others, opinions vary as to whether any of them are good enough for us to class them as grammatical).
10 Things are slightly more complex with Stepanov’s other example, (40b), because the fixedness of the phrase Ne sostavljat’ truda ‘not be any trouble’ makes it tough to find alternatives to truda with other genders. As things stand, (40b) is compatible with the analysis sketched in the text, but the only evidence specifically for that analysis comes from the contrast between (40a) and (41).
(42)
(a) %Of which cars were [the hoods __ ] damaged by the explosion? (Ross 1967: 242)
(b) The evidence which I blew up the building [to destroy __ ]
Moreover, we will see later that the factors which can ameliorate extraction out of adjuncts have no effect on extraction out of subjects. These considerations suggest that a nonunified approach to CED phenomena is on the right track: the legitimate cases of extraction from the relevant domains do not pattern alike crosslinguistically or within a given language. The most relevant immediate consequence of this is that it legitimizes the programme of looking for an account of patterns of extraction out of adjuncts without addressing patterns of extraction out of subjects. If subjects and adjuncts undeniably formed a natural class with respect to these phenomena, then a theory which only said anything about adjuncts would be gravely lacking. However, if their apparent unity is illusory, then we are justified in pursuing a theory which does not address the subject data. This nonunified approach to adjuncts and subjects also militates against Uriagereka’s approach to CED effects. On that approach, it is inevitable that most subjects and adjuncts pattern together, because this is dictated by phrase structure alone: our constituency tests tell us that subjects and adjuncts are nonprojecting phrasal sisters of phrases, and so Uriagereka’s analysis should apply equally to both. If so, the evidence that a nonunified approach to these two domains is more appropriate also argues against the unified, postUriagereka approach to CED effects. Much in this area is still up in the air, then. There is a real tension between the overly restrictive Huang/Uriagereka approach to extraction from adjuncts, which classes adjuncts as strong islands; and the overly permissive Chomsky/Cinque approach, which classes them as more similar to weak islands. Moreover, there is no consensus as to whether or not subjects and adjuncts
form a natural class as far as locality is concerned: Cattell, Huang, Longobardi, and Nunes and Uriagereka hold that they do, while Kayne and Stepanov hold that they do not. However, my position in this book will be that adjuncts are weak islands (disregarding the discrepancy concerning extraction of PPs), and do not form a natural class with subjects. This leaves us with work to do, as additional constraints on extraction from adjuncts, over and above their weak islandhood, must be found, in order to rein in the excessive permissiveness of analyses based on weak islandhood in isolation. This is the task of Part II.
1.3 Further Puzzles
Although we will not properly develop our theory of extraction from adjuncts until Part II, this section gives some reasons to doubt that the extra restrictions on extraction from adjuncts are purely syntactic in nature. One of these reasons is conceptual: we have a small, powerful, and reasonably principled array of minimalist locality principles at the moment, and if those principles don’t have anything to say, then there isn’t much room for manœuvre in such a stripped down syntactic architecture. In that case, the only place to look is at the interfaces. The other reason is empirical: there are patterns in the data which do not look like the sort of thing which syntactic locality theories normally account for, and so it is natural to assume that something nonsyntactic is at work here.
1.3.1 The Empirical Puzzles
Once we accept that adjuncts are weak islands, the remaining restrictions on extraction from adjuncts look weird. We need to account for patterns like the following.
The Restricted Extraction Puzzle
Some classes of verbal adjunct freely permit extraction of their complement. This is true, for example, of the rationale clauses illustrated in (43), as noted by Belletti and Rizzi (1981) and Jones (1991), among others. (43)
(a) Whose attention is John waving his arms around [to attract __ ]?
(b) What did you come round [to work on __ ]?
(c) Which paper did John travel halfway round the world [to submit __ ]?
(d) What did Christ die [to save us from __ ]?
It appears that every felicitous declarative sentence containing a rationale clause permits extraction of a referential NP complement of the verb in that
clause. However, other classes of adjunct impose restrictions on subextraction. A case in point is the class of bare present participial adjuncts, or BPPAs, illustrated in (44). (44)
(a) What did John drive Mary crazy [whistling __ ]?
(b) What did John die [whistling __ ]?
(c) *What does John work [whistling __ ]?
Extraction from BPPAs was first documented in Borgonovo and Neeleman (2000).11 To a first approximation (to be sharpened below), extraction from this class of adjunct is possible only if the matrix VP describes a telic event: (44a–b) show that extraction is possible when the matrix VP describes a telic accomplishment or an achievement, respectively, while (44c) shows that extraction is impossible when the matrix VP describes an atelic activity.12 From a syntactic perspective, this is puzzling. Even given the existence of proposals to treat aspectual class membership in decompositional syntactic terms, there is no clear reason why such proposals should interact with locality theory to produce the patterns that we see here. Of course, it is quite possible, as always with theories that are still under construction, that a purely syntactic treatment of this pattern will be forthcoming at some point. However, I see no way to proceed with this pattern syntactically, and so I will be looking for an account which is not purely syntactic.
The Restricted Answers Puzzle
The class of prepositional participial adjuncts illustrated in (45) also imposes restrictions on when subextraction is possible. (45)
(a) Who did John go home [without talking to __ ]?
(b) *Who did John get upset [despite talking to __ ]?
However, these restrictions are rather distant from the aspectual class restrictions illustrated in the previous puzzle. Although matters such as height of attachment and certain semantic factors are partially responsible, other restrictions on felicitous extraction from prepositional participial adjuncts concern the assumed answer to the question. Consider the dialogues in (46–47).
11 Borgonovo and Neeleman proposed a theory covering the achievement case, together with a subset of the accomplishment data, in which the matrix object is a reflexive anaphor. The reflexivity of the accomplishment cases they discussed played a prominent role in the theory they developed to account for those cases. The existence of cases of extraction with nonreflexive matrix objects such as (44a) therefore constitutes a serious challenge for that theory. However, the importance of that paper to the present work, as the first sustained generative discussion of extraction from BPPAs, should not be overlooked.
12 We cannot properly test this with the fourth of Vendler’s (1957) aspectual classes, namely states, because BPPAs modifying states are basically unacceptable in declarative as well as interrogative contexts.
(i) ??John knows Georgian [wearing this magic hat].
(ii) *Which magic hat does John know Georgian [wearing __ ]?
(46)
A: Which book did John design his garden [after reading __ ]?
B: An introduction to landscape gardening.
(47)
A: Which book did John design his garden [after reading __ ]?
B: #Finnegans Wake.
From a purely temporal point of view, it is equally plausible that John puts down his copy of either Finnegans Wake, or the introduction to landscape gardening, and gets to work. However, for some speakers at least, the question asked by A assumes more than that. Clearly, the introduction to landscape gardening is connected to garden design in a way that Finnegans Wake, hopefully, is not. More specifically, reading the introduction to landscape gardening enables John to design his garden. For many speakers, asking a question such as that in (46–47) assumes some such connection, over and above the purely temporal relation specified by after. This is linked not only to the fact that A’s utterance is a question, but more specifically to the fact that the extraction site of the wh-phrase is contained within the adjunct. Compare the dialogues in (48–49). Once again, we see here a wh-question including a verbal after adjunct. In this case, however, extraction is of the complement of the matrix verb, so wh-movement bypasses the adjunct. In this case, there is no requirement for any specific connection between the events described in the matrix VP and the adjunct, and so both answers are equally acceptable.
(48)
A: What did John do [after reading the introduction to landscape gardening]?
B: He wrote an essay on Finnegans Wake.
(49)
A: What did John do [after reading the introduction to landscape gardening]?
B: He designed his garden.
This is again quite unexpected from a syntactic perspective. There is no clear reason why the after which is extracted across in (46–47) should impose different semantic requirements from the after which is bypassed by extraction in (48–49). More fundamentally, there is also no syntactic reason to expect the wellformedness of a wh-question to depend on the answer to that question, and so the possibility arises once more that the account of this pattern is not purely syntactic.
The Interpretive Puzzle
Consider again two examples discussed in the context of the restricted extraction puzzle:
(50)
(a) What did John drive Mary crazy [whistling __ ]?
(b) What did John die [whistling __ ]?
There is a clear interpretive difference between these two examples. (50a), in which the matrix VP describes an accomplishment, is most readily interpreted as in (51).
(51) What is the x such that John whistling x caused Mary to go crazy?
However, (50b) cannot be interpreted as in (52a). Instead, the relation between matrix and adjunct events is purely temporal, more along the lines of (52b). (52)
(a) What is the x such that John whistling x caused him to die?
(b) What is the x such that John was whistling x immediately before he died?
This distinction does not immediately constrain our theory of extraction from adjuncts, as these readings are also preferred in corresponding declaratives, as in (53).
(53)
(a) John drove Mary crazy whistling hornpipes.
(b) John died whistling the Marseillaise.
However, other declaratives constructed from VPs of the same aspectual classes do not have these interpretations, and do not allow extraction either. For example, (54a) contains a BPPA describing an activity and modifying a VP describing an accomplishment, in parallel with (53a). Unlike that case, though, it is impossible, or at least requires a great deal of creativity, to interpret the two events in (54a) as standing in a causal relation. Instead, the relation here is roughly one of simultaneity: John painted the picture while, but not by, eating apples. And when this causal relation is absent in this way, the possibility of extracting out of the adjunct disappears, as shown by (54b).
(54)
(a) John painted this picture [eating apples].
(b) *What did John paint this picture [eating __ ]?
A similar observation can be made in the achievement case. (55a) shows a case, parallel to (53b), in which a BPPA describing an activity modifies a matrix VP describing an achievement. However, unlike (53b), the relation between the two events here cannot be one where the adjunct event immediately precedes the matrix event, as our world knowledge tells us that living room carpets are situated inside homes, and so John’s arrival at his home must precede
his dripping mud over the living room carpet. Once again, this difference in interpretation brings with it a difference in extraction possibilities. (55b) is pretty marginal in any case, but it is absolutely impossible if the answer is the living room carpet, a further illustration of the restricted answers puzzle. (55)
(a) John came home [dripping mud all over the living room carpet].
(b) ??/*What did John come home [dripping mud on __ ]?
The interpretive puzzle posed by such examples contains two sub-puzzles. Firstly, we must explain why the possibility of extraction from BPPAs modifying VPs describing accomplishments and achievements is contingent on the interpretive relation between the two events. Secondly, why do these two different aspectual classes require different interpretive relations to license extraction? Once again, locality theory currently has little to say on the matter, and although I cannot rule out a syntactic account of such effects, I will be pursuing a more interface-based approach in what follows.
The Unlikely Antilocality Puzzle
Antilocality, as the name suggests, is the opposite of locality. Whereas locality theory normally operates on the assumption that certain syntactic operations are impossible because they relate elements that are structurally too distant from each other, antilocality claims that certain syntactic operations are impossible because they relate elements that are structurally too close to each other. Early expositions are found in Bošković (1994), who cites earlier work by Saito, and Pesetsky and Torrego (2001). Two more recent theories discuss empirical areas closer to our particular interests. In the first, due to Grohmann (2003), antilocality prohibits movement between certain subdomains of the clause. Meanwhile, in the second, due to Abels (2003), antilocality prohibits movement between positions within a single projection. Under certain circumstances, as in (56), extraction out of BPPAs also appears to exhibit antilocality effects. (56)
(a) ??What did John drive Mary crazy [fixing __ ]?
(b) What did John drive Mary crazy [trying [to fix __ ]]?
Clearly, the grammatical (56b) contains more syntactic structure than the ungrammatical (56a), and so an antilocality approach to these data is the obvious place to look from a purely syntactic perspective. (56a) is ungrammatical, so the story would go, because what and its trace are simply too close. However, several factors make this point of view a nonstarter. Firstly, note that replacing the verb contained within the adjunct in (56a) gives a fully
grammatical sentence, as in (44a), repeated below, despite the absence of any obvious additional syntactic structure.13 (57)
What did John drive Mary crazy [whistling __ ]?
Secondly, replacing the embedding verb in the adjunct in (56b) can lead to ungrammaticality, as in (58). Once more, this is unexpected on any putative antilocality approach: there is no clear sense in which what and its trace are any closer in (58) than in (56b), and so no antilocality theory will capture this straightforwardly.
(58) *What did John drive Mary crazy [beginning [to fix __ ]]?
Finally, the antilocality effect, such as it is, disappears altogether in other syntactic environments. In (59), for example, the same participial constituents occur in the PP complement of talk. In this environment, extraction from either example is well-formed.
(59) (a) What did John talk [about [fixing __]]?
(b) What did John talk [about [trying [to fix __]]]?
It seems, then, that the putative antilocality effect must be sensitive not only to the distance traversed by movement but also to the local relations among nodes passed along the way. The presence of an adjunct boundary apparently induces an antilocality effect which is absent when the path from filler to gap passes no such boundary. Equally, certain lexical items apparently form barriers to movement, while other syntactically similar lexical items do not. It is clear that such an antilocality theory will lack the elegance and theoretical motivation of Grohmann’s and Abels’ proposals. One way in which the present work hopes to make a contribution is by allowing elegant syntactic theories to do what they do best, without forcing them to contort themselves to account for messy data such as the above as well.
So the antilocality account of (56) has proven to be a straw man. But what else does syntactic theory have to offer in the face of such data? This gradience and susceptibility to subtle semantic factors has been noted in the syntactic literature for decades, but it has never been clear how any semantic factors produce these effects. Consequently, we have never known exactly what our syntactic theory is supposed to account for, which is why we can currently entertain such a varied range of theories which differ from each other in their basic empirical predictions.
13 Even on syntactic decompositional approaches to aspectual classes, we would expect an activity verb such as whistle to contain no more eventive projections than an accomplishment verb like fix, as activities have, if anything, a less complex subevent structure. Syntactic decompositional approaches will not help here, then.
The next subsection briefly attempts to delimit the extent of the empirical responsibility of syntactic locality theory in this area, in the context of a sketch of a typical minimalist locality theory. We will see that, antilocality aside, the unifying characteristic of current locality conditions is that intervening material can only cause further problems, making movement harder, not easier. This is the exact opposite of what the Unlikely Antilocality Puzzle shows us, as an adjunct in this case allows extraction only in the presence of some such extra structure.
1.3.2 Minimalist Locality Theory: Where Would We Look?
Theories of locality in the minimalist program can have a claim to a principled basis, in the following sense. The defining characteristic of minimalism is that it attempts to build a theory of syntax on just a handful of simple operations, such as Merge, Agree, and Spellout. A theory of locality in that system works by restricting the application of those operations. Uriagereka’s Multiple Spellout theory in the previous section worked in just this way. That theory relies on a stipulation about the types of units that Merge can operate on (it cannot form sets containing two syntactically complex elements); and a stipulation about the timing of Spellout (Spellout can apply if it is the only way for a derivation to proceed, when Merge needs to apply to two syntactically complex elements). But that is all it relies on: there is no literal condition on movement in Uriagereka’s approach, but these considerations conspire to limit the circumstances under which movement can take place. And this is a good thing too, if Chomsky’s claim that movement reduces to Merge is justified. If movement is just Merge, then Move as a separate operation does not exist, and so it is impossible to state constraints on movement itself.
A suitably minimalist goal, then, is to find natural conditions on Merge, Agree, Spellout, and so on, which restrict the distribution of movement, without stating conditions on movement itself. We have seen many components of a typical minimalist toolbox already in Section 1.2. One possibility, which Chomsky and Lasnik (1993) and Takahashi (1994) made use of, is that links in a chain must be as close together as possible (a condition commonly called Shortest Move). Another, updating ideas from Rizzi (1990), constrains the distances over which Agree relations can be established, and specifically hypothesizes that Agree relations can be blocked by similar features intervening between probe and goal (Relativized Minimality). In recent versions of minimalism incorporating ideas from Chomsky (2000), A′-movement is contingent upon the establishment of such Agree relations, and so such intervention effects also restrict the distribution of movement.
In addition to these two conditions forcing relations to be very local, there are two common types of condition on Spellout. Whereas the above conditions define relativized locality domains, conditions on Spellout define absolute locality domains, or barriers which are impermeable to the relevant syntactic relations. One such condition is Uriagereka’s Multiple Spellout theory, which defines a class of constituents across the boundaries of which syntactic relations cannot straightforwardly be established.14 As we have seen, this restriction ultimately derives from a stipulation concerning the nature of Spellout. Another type of restriction on Spellout is the phase theory initiated by Chomsky (2000). The essential idea of phase theory is that elements contained in the complement of certain heads H are spelled out, and thereby become inaccessible to operations involving elements outside of HP. However, H itself, and HP-internal elements which c-command H, constitute a privileged ‘edge’, which is able to form relations with HP-external constituents. Once again, then, a condition on the operation of Spellout ultimately constrains the establishment of Agree relations, and thereby of movement. Finally, we have encountered a class of hypotheses which force movement operations contingent upon Agree relations not to take place in very local configurations. These are the antilocality theories. For example, Abels (2003) argues that there is no motivation for movement from complement to specifier position within the same projection, as any syntactic relation that could be established between X and YP in [Spec,X] could already be established between X and YP when YP is its complement. If unmotivated operations do not occur in syntax (another general stipulation which indirectly constrains movement), these assumptions about Agree predict that movement from complement to specifier position is unnecessary, and so impossible. 
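The edge/complement asymmetry that phase theory relies on can be pictured with a small toy model. What follows is only a rough sketch under simplifying assumptions, not an implementation of any particular version of phase theory: the class `Phase`, the helper `move_to_edge`, and the labels `H`, `what`, and `fix` are hypothetical names introduced purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Phase:
    """Toy model of a phase HP: a head H, its edge, and its complement."""

    head: str
    # HP-internal elements that c-command H (hypothetical flat representation).
    edge: list[str] = field(default_factory=list)
    # Material contained in the complement of H.
    complement: list[str] = field(default_factory=list)
    spelled_out: bool = False

    def spell_out(self) -> None:
        # Spelling out the complement makes its contents inaccessible
        # to operations involving elements outside HP.
        self.spelled_out = True

    def accessible_from_outside(self) -> set[str]:
        # The head and the edge always remain visible to HP-external elements;
        # the complement is visible only until it has been spelled out.
        visible = {self.head, *self.edge}
        if not self.spelled_out:
            visible.update(self.complement)
        return visible


def move_to_edge(phase: Phase, item: str) -> None:
    """Move an item from the complement to the edge before Spellout."""
    phase.complement.remove(item)
    phase.edge.append(item)


# A wh-phrase left inside the complement is trapped once Spellout applies.
trapped = Phase(head="H", complement=["what", "fix"])
trapped.spell_out()

# The same configuration, but the wh-phrase first moves to the edge.
escaped = Phase(head="H", complement=["what", "fix"])
move_to_edge(escaped, "what")
escaped.spell_out()
```

On this toy model, `trapped.accessible_from_outside()` contains only the head, whereas `escaped.accessible_from_outside()` also contains the wh-phrase. The only point is that a condition on Spellout, rather than on movement itself, determines what later operations can see.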
Many, if not all, of these tools have something to say about extraction from adjuncts. In fact, we saw in Section 1.2 that many of them have been pressed into service to explain presumed facts about the distribution of extraction from adjuncts. However, the challenge is not just to say something about extraction from adjuncts but to explain why we find the sorts of patterns we saw in Section 1.3.1. This is much harder for the above tools. I cannot prove that it is impossible, but I will suggest that there is no obvious way to proceed, except by directly building a description of the patterns we are trying to describe into ad hoc constraints on the application of one of these devices.
14 The exception implied by ‘straightforwardly’ comes with parasitic gaps in the theory of Nunes and Uriagereka (2000), which we can safely ignore here.
For example, we may approach the restricted extraction and the interpretive puzzles outlined above by associating different semantic relations holding between the events described by a BPPA and the VP to which it attaches,
with different null functional heads. We may then stipulate that certain of these heads trigger Spellout, while others do not, and that all such heads are unusual in that they have no A′-landing site in their edge. This gives us the technology to trap wh-phrases within syntactically defined domains corresponding to BPPAs which stand in certain semantic relations to the VPs to which they attach, but not others. Equally, we may capture the unlikely antilocality puzzle by claiming that BPPAs headed by certain verbs, like fix, form an antilocality domain with matrix [Spec,C], such that movement from within those BPPAs to matrix [Spec,C] is degraded. However, a subordinate clause within the BPPA trying to fix of an example like (56b) makes the A′-dependency sufficiently nonlocal to void the antilocality effect. On the other hand, verbs like whistle never form such antilocality domains in the first place. No doubt, some strategy along these lines could cover the data pertinent to the unlikely antilocality puzzle. Such strategies can presumably be made to work, so I cannot claim that any of the data I have mentioned is beyond the reach of a syntactic theory of locality. However, I clearly do not want to do things like this, and I do not see any more elegant pure-syntax alternative. In fact, I believe that the more we allow conditions like the above straw men to be built into the way we state our theories of locality, the less claim we have that our theories of locality are elegant and explanatory. This would be a real shame because, as things currently stand, a minimalist syntactic theory of locality explains certain clear empirical phenomena in a natural, principled, and efficient way, and such virtues are worth fighting to maintain. This, then, is the justification for looking for an interface-based account of the puzzles we have encountered.
It is not that a syntactic theory couldn’t account for the puzzles but rather that present indications are that it could not do so in an elegant, explanatory way. Over the decades, we have developed an impressive theory which accounts for very many nonobvious, but robust, phenomena, but these patterns of acceptable extractions from adjuncts do not seem like they are going to be one of them. If we can find an account of these patterns outside of the pure-syntax theory of locality, then we can maintain that theory in all its elegant, explanatory glory, and we may also sharpen our picture of the division of labour between syntax and neighbouring linguistic components.
1.4 The Plan
I will attempt to describe the patterns noted in Section 1.3.1 in terms of a constraint on the relationship between a sentence containing an instance of
A′-movement, and the semantic representation of that sentence as a description of one or more events. The account I will propose is based on one condition. Here it is, in preliminary form:
(60) The Single Event Condition (first approximation): An instance of wh-movement is legitimate only if the minimal constituent containing the head and the foot of the chain can be construed as describing a single event.
The link between the largely syntactic discussion above and this condition is at first completely obscure. Here, then, is a rough first pass through the story. Firstly, recall that events have a linguistic and cognitive reality parallel to that of regular individuals. In particular, the mereological relations among events parallel those among regular individuals (see Davidson 1967, Davidson 1969, Link 1983, and Bach 1986 for foundational work on this parallelism). Crucially, an event can stand in a part–whole relation to another event. As initially proposed by Davidson (1967), a verb phrase functions as an event description or a property of events. In the adjunction structures we are primarily interested in, there are two VPs, one in the matrix and one in the adjunct. Therefore, we have two event descriptions. The question is whether those two event descriptions can be construed as each describing part of a single larger event. The Single Event Condition predicts that wh-movement is possible only if we are able to stretch the boundaries of what we consider as a single event sufficiently to cover the two subevents described by the matrix VP and the adjunct VP. This immediately puts much of the responsibility for deriving patterns of wh-movement onto our theory of admissible part–whole relations among events. Part I is concerned with the elaboration of that theory. We require, for example, that the whistling event and the driving-Mary-crazy event in (44a), repeated as (61a) below, can be jointly construed as a single event of John driving Mary crazy whistling; and that the whistling event and the dying event in (61b) can be jointly construed as a single event of John dying whistling; but that we cannot straightforwardly construe the whistling event and the working event in (61c) as jointly forming a single event. 
The Single Event Condition then converts these event-structural distinctions into grammaticality judgements: in (61c), where a single event construal is less readily available, extraction out of the adjunct leads to more substantial degradation.
(61) (a) What did John drive Mary crazy [whistling __]?
(b) What did John die [whistling __]?
(c) *What does John work [whistling __]?
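The step from event construal to judgement in (61) can be pictured with a deliberately simple sketch. It is a toy illustration only: the names `AdjunctExtraction`, `predicted_status`, and `single_event_construal` are hypothetical, and the construal values are stipulated by hand to match the discussion of (61) rather than derived from any theory of events, which is the task taken up in Part I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdjunctExtraction:
    """One wh-extraction out of an adjunct, as in the examples in (61)."""

    label: str
    matrix_event: str
    adjunct_event: str
    # Assumption: stipulated by hand from the text, not computed.
    single_event_construal: bool


def predicted_status(example: AdjunctExtraction) -> str:
    # First approximation of the Single Event Condition: wh-movement is
    # legitimate only if the two event descriptions can be construed as
    # describing a single event.
    return "acceptable" if example.single_event_construal else "degraded"


examples = [
    AdjunctExtraction("(61a)", "drive Mary crazy", "whistle", True),
    AdjunctExtraction("(61b)", "die", "whistle", True),
    AdjunctExtraction("(61c)", "work", "whistle", False),
]

predictions = {ex.label: predicted_status(ex) for ex in examples}
```

All of the explanatory work is hidden in the `single_event_construal` field. The condition itself is a simple filter, so the burden falls on the theory of which groups of subevents can count as a single event.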
The major questions, then, concern what happens when we have multiple verbs, and therefore more than one event variable, in a structure, and particularly in a structure where neither verb c-commands the other. Does our compositional semantics identify those event variables, producing a complex single event description; or does it keep them separate, producing multiple more simple event descriptions? The answer I will propose is that our compositional semantics is maximally flexible in this respect. In many cases, we are free to interpret these structures as describing a single event or multiple events. However, from a broader cognitive perspective, we are more likely to perceive certain semantic configurations of multiple subevents as a single larger event than we are for certain other configurations. The Single Event Condition states that, to the extent that we readily perceive a group of multiple subevents as a single larger event, extraction out of constituents describing those subevents should be possible.
Part I is therefore devoted to elaborating a theory of what can, and what cannot, readily count as a single event, on grounds independent of wh-movement. We proceed in two steps. First, we define the notion of a core event in Chapter 2, which covers the standard decompositional treatment of Vendlerian aspectual classes as larger events containing subevents as proper parts (see Vendler 1957, Dowty 1979; and many subsequent authors). We then, in Chapter 3, define an extended event as a chain of appropriately related core events, corresponding to the plan of the agent of the first event in the chain. The ability, or willingness, to construe a series of events as occurring according to the will of a rational agent therefore affects the upper bound of possible event size, as extended event construal is only available if we perceive such a plan.
In Chapter 4, we describe a basic syntax–semantics interface, which has the flexibility to generate single-event and multiple-event readings compositionally for most sentences. There is very rarely anything syntactically ill-formed with either reading, on this theory. The semantic constraints on event structure are much more active. This chapter also gives a sketch of the place of event structure within a larger theory of temporality and related matters, topics which are largely beyond the scope of this monograph, but which cannot be ignored entirely.
With this theory of event structure in hand, we return to A′-locality in Part II. The first major task there is to relate the event structures described in Part I, through the lens of the Single Event Condition, to grammaticality judgements concerning A′-movement constructions. A major focus there will be a thorough exploration of factors relating to the puzzles discussed in Section 1.3.1. However, the Single Event Condition is not stated, or
intended, as a parochial condition on extraction out of adjuncts but rather as a broader condition on A′-movement, or possibly all movement. After we have used the Single Event Condition to build a more adequate account of adjunct subextraction phenomena, we will also check that it doesn’t do any violence to existing locality theories in other empirical domains, and even possibly has some use in explaining patterns of preposition-stranding in those languages which permit it, and the degradation of extraction out of factive complements. After all that, though, we still have to tackle questions implicit in the formalization proposed in Chapter 4, namely what exactly goes wrong in a bad case of extraction out of an adjunct; what does all this tell us about the architecture of the grammar; and how we can formulate the Single Event Condition more precisely to reconcile it with what we already know about the organization of the grammar.
We have, then, a simple plan, built on a simple hypothesis embodied in the Single Event Condition. What distinguishes that condition from most previous approaches to locality, of course, is that it is not purely syntactic (although we shall ultimately end up with a syntactic component in the mechanisms underlying the implementation of that condition). Instead, it ties together a series of claims about the semantics, pragmatics, and processing of wh-questions. This surprising fact illustrates a methodological principle familiar from the earliest days of generative grammar. Chomsky has insisted from 1955 onwards that we have no way of determining a priori the cause of a deviant sentence’s ill-formedness (for example, ‘descriptions of particular subsystems of the grammar must be evaluated in terms of the entire system of rules’, Chomsky 1965: 44).
What we see here is a case where what appeared to be a syntactic fact, the islandhood of adjuncts, in fact requires the techniques, assumptions, and vocabulary of multiple interacting areas of linguistic description to provide a full account. The four puzzles given above are intended to illustrate the problematic nature of the adjunct subextraction data for purely syntactic locality theories. The rest of this book is dedicated to showing that a more interface-based approach can do a better job.
Part I The Structure of Events
2 The Variable Size of Events
Chapter 1 presented a series of puzzles which suggested that in many cases, patterns of extraction out of adjuncts were substantially different from the typical patterns determined by our syntactic theories of locality. I also made the suggestion, embodied in the Single Event Condition ((60) in that chapter), that considerations related to event structure played a role in filling the gaps left by the syntactic theory. Of course, such a suggestion requires a theory of what counts as a single event. The aim of Part I is to elaborate such a theory.
The classical semantic notion of event is associated first and foremost with Davidson (1967), but that text and many others following it are less concerned with the delimitations of individual events than with the formal hypothesis that sentences describe events. However, the question of the individuation of events was raised not long afterwards, firstly, to my knowledge, in Davidson (1969). This is the question of which collections of happenings we are willing to consider as a single event, which is exactly the question we started with in Section 1.1. The answer I will suggest is that the individuation of events is somewhat fluid. The question of whether we consider two smaller events (call them subevents, following Ramchand 2008) as a single larger event (a macroevent) is context-dependent and subject to individual variation in many cases.
One way of conceiving of this operation of macroevent formation is as a sort of coarse-graining. We do not generally consider an individual or an event simultaneously as a whole and as a collection of parts: if we consider a batter as an individual, we don’t consider him at the same time as two arms, two legs, torso, and head; and if we consider hitting a ball as a single event, we don’t take into account at the same time the series of muscle movements which jointly constitute hitting a ball.
In each case, what is at issue is the coarseness of the grain at which we perceive the stuff in question, and perceiving larger collections of happenings as single events often means blinding ourselves to the fact that proper subparts of those collections of happenings also look like events from a more fine-grained perspective. This is the phenomenon referred to as variable pragmatic coarse-graining by Bittner (1999), and it exemplifies the
flexibility in the individuation of events which is key to the project of relating locality of movement to event structure. We have also seen, however, that even if the size of the collections of happenings which are packaged as events is variable, it is not the case that anything goes. Instrumental to our investigation of the size of events is a particular test for single eventhood, namely Fodor’s (1970) distinction between lexical and periphrastic causatives, and related issues discussed instructively by Pietroski (2000). Armed with this test for single events, we will investigate the circumstances under which a set of subevents can be considered as a single macroevent. In particular, we will focus in this chapter on the relation of direct causation, which substantially increases the likelihood of two subevents being construed as a single macroevent. Section 2.1 introduces Fodor’s test, and shows how it diagnoses some basic constraints on macroevent formation. After that, Section 2.2 will formalize some of the constraints that we have found, along lines suggested by Kamp (1979), among others. This leads to a discussion in Section 2.3 of the lexical aspectual classes discussed by Vendler (1957), among others. The proper treatment of the aspectual classes is a huge subject in its own right, with different approaches being proposed by researchers as diverse as Mourelatos (1978), Dowty (1979), Bach (1986), Moens and Steedman (1988), Verkuyl (1989), Parsons (1990), and Pustejovsky (1991). My aim in that section is not to say anything particularly new in this exhaustively researched area but rather to give the bare bones of a treatment of aspectual classes, focusing on two properties which will prove useful in Part II of this book. 
The first of these two properties is the amenability of the aspectual classes to a decompositional treatment, whereby four aspectual classes are derived from the presence or absence of two subevents, a temporally extended preparatory process causing, and immediately preceding, a pointlike culmination. The second relevant property is the relative fixedness of aspectual class membership. I do not deny the existence of well-established relationships between the structure of the internal argument of certain events, and the structure of the event itself (the measuring out effect, which we will return to below), or other relevant phenomena such as resultative secondary predication or aspectual coercion. Rather, the important point for us is the following: assume for now that an accomplishment can be decomposed into a preparatory process and a culmination. Even if that is the case, we cannot, in the general case, derive an accomplishment-denoting predicate by taking a lexical item describing a preparatory process and attaching it in the semantics to a lexical item describing a culmination. Shifts in the aspectual class of a verb are associated with
relatively small amounts of material in a close syntactic relationship (perhaps mutual m-command) to that verb. This theory of lexical aspect is the point at which most theories of event structure stop.1 However, this work aims to go further and show in Chapter 3 that, under certain circumstances, sentences containing multiple verb phrases can also describe a single event. This is another cornerstone of the project of relating event structure and locality, as the existence of successive-cyclic movement would be inexplicable under the Single Event Condition if VPs or clauses marked an upper bound on the size of a constituent describing a single event. Much of Chapter 3, then, is concerned with a demonstration that multiple VPs, for example in an adjunction relation, may come to jointly describe a single event. This casts Fodor’s Generalization, introduced in Section 2.1, as a one-way implication. A lexical verb must describe a single event, but a single event is not necessarily described by a single verb phrase. The circumstances under which multiple VPs can jointly describe a single event depend on the relations among the events described by the individual VPs. I distinguish a class of contingent relations, consisting of at least direct causation and a relation concerned with plan formation which I will call enablement. If two subevents are construed as related by a contingent relation, then, all else being equal, they can form a single macroevent. The construal of two events, described by two separate VPs, as contingently related determines the availability of a single-event construal, then. Two primary factors affect the availability of a construal as contingently related. The first is the syntactic configuration in which the two VPs are found, and its associated semantics. 
The second factor is the participants in the dialogue, who may enrich the information encoded in the utterance itself by positing a contingent relation among the subevents, even when the utterance contains no direct encoding of such a relation. This provides a place for world knowledge in the individuation of events. This, in a nutshell, is the theory of event structure to be developed here.
1 The same is not true of theories approaching event structure from other perspectives, such as cognitive science. In fact, this is the point at which many discussions in such disciplines start. I return to this below.
By this point, we have delimited a set of well-motivated and often internally complex events. It is, of course, possible to use these events as atoms in further semantic structures, in the same way that constituents can be viewed as the building blocks of larger constituents, and ultimately utterances and discourses. This is a possibility that we will make some use of in Part II, but firstly, Chapter 4 will describe two problems, both fairly well known, concerning the modification of coordinated VPs by temporal PPs, and the semantics of
alternately, which suggest that such larger structures are also necessary on quite independent grounds. Briefly, the former problem is that if two separate time adverbials modify two coordinated VPs, these four constituents all denote predicates of the same event variable. There is then no way semantically to associate the right time adverbial with the right VP. The result, in most cases, is that such sentences come to denote a contradiction. Meanwhile, Lasersohn (1992) has shown that alternately cannot denote a simple predicate of events, as to do so would lead to serious problems when alternately takes two conjoined antonymic predicates of events within its scope, as in the room was alternately hot and cold. We see in Chapter 4 that both of these problems can be solved by making reference to a larger structure defined on the basis of the events delimited in the rest of Part I.
2.1 Variable Pragmatic Coarse-Graining
Consider the two utterances in (1).
(1) (a) John let go of the glass. It plummeted towards the ground. A fraction of a second later, it hit the hard floor and shattered into a thousand tiny pieces.
(b) John broke the glass.
How many events are described in each of these utterances? Counting conservatively, (1a) describes four separate events, namely the letting go, the plummeting, the impact, and the shattering. On the other hand, our intuitions are clear, and unequivocal, that (1b) describes only a single event. However, it is quite likely that (1a) and (1b) describe exactly the same sequence of happenings, with the only distinction between the two being that (1a) does so in a more fine-grained way than (1b).2 For example, (1b) is compatible with the information that John let go of the glass, but it does not entail it. He could equally well have taken a hammer to it, or shattered it by practicing his falsetto. All that needs to have happened for (1b) to be true is that John did something which resulted in the glass’s being broken.
The striking thing about (1), then, is that simply being more or less explicit about the details of what actually happens allows a speaker to report that the same series of happenings consists of a single event, or multiple events. Moreover, there is in principle no limit to this subdivision. The action of letting go of a glass can be further decomposed into a series of muscle movements, each of which can be explicitly reported. Moreover, each muscle movement will have its own internal components, which people who know about such things would no doubt be able to report on. And so on.
2 A similar point was made by Link (1983) with respect to the denotation of nominals. Link observed that the cards on the table and the decks of cards on the table generally refer to the same stuff, but packaged in different ways, as there are 52 cards in each deck. We can tell that it is packaged differently by considering collective predicates such as be numbered consecutively. Link observes that (i) and (ii) do not mean the same thing: in the case of (i), if there are three decks of cards, then there are 156 cards, which should accordingly be numbered 1–156. In the case of (ii), however, the three decks should simply be numbered 1–3.
(i) The cards on the table are numbered consecutively.
(ii) The decks of cards on the table are numbered consecutively.
Bach (1986) was the first to point out the formal parallels between the structures of the domains of individuals and events, suggesting that extending Link’s lattice-theoretic analysis of individual denotations to the domain of events would be profitable.
On the other hand, not all sets of happenings correspond to a single event. To see this, let us amend (1a) slightly.
(2) John let go of the glass. It plummeted towards the ground, but miraculously survived intact. Three days later, Bill let go of the same unfortunate glass. Once again, it plummeted towards the ground. This time, it hit the hard floor and shattered into a thousand tiny pieces.
In this case, we are unwilling to accept the sequence of happenings as a single event. There is also no equivalent of (1b) in the case of (2) (we will come back to the significance of this in a minute). John broke the glass and John and Bill broke the glass are false, while Bill broke the glass is true, but does not cover the same series of happenings as (2) does. The question to be addressed in this chapter is why this should be the case. We may assume that sentences denote properties of existentially quantified event variables, but it also seems that not every logically possible property of such a variable actually makes a plausible property of a single event. Furthermore, we know that subparts of single events can also be considered as independent events in their own right, as shown by comparing (1a) and (1b), but we know from (2) that not every collection of independent events can be considered as a single event. So what determines which collections of subevents make good macroevents? The basic reason why (2) does not describe a single event would appear to be that the two ‘halves’ of (2) (that is, John’s dropping of the glass and Bill’s dropping of the glass) are too unrelated. I will take that to be accurate in what follows, but this begs an equally substantial question, namely the meaning of ‘too unrelated’. The issue cannot be that there is no relation between the two droppings of the glass in (2). On the contrary, there are many such relations. Most obviously, they are both the same kind of thing, happening to the same glass, but there are countless other relations to consider.
The Variable Size of Events
And so we embark on a search for the relations which allow multiple events to be construed as a single larger event. For example, we might hypothesize that the problem with (2) is that the two glass-droppings are temporally too far apart to be seriously considered as a single event: we do not typically think of events as containing two happenings, three days apart and nothing in between. If so, we might provisionally assume that only temporally continuous sets of happenings can be considered as single events. We can find other such common-sense restrictions too, some of which also may play a part in explaining the intuition that (2) does not describe a single event. For example, we are very unlikely to admit (3) as a description of a single event. (3) A casserole was cooked in an oven in England and immediately afterwards a family ate a casserole in New Zealand. Once again, these two happenings, the cooking and the eating, are apparently too unrelated to be construed as part of the same event. In this case, though, the problem is clearly not temporal, because we are explicitly told that the eating of the casserole in New Zealand abuts the cooking of the casserole in England. As a first guess, we may suggest that the unrelatedness in (3) is somehow connected to the spatial distance between the cooking and the eating parts. We might provisionally assume, then, that a single event must happen in a single place, as well as at a single time. However, even an example like (4), which meets these two criteria, is unlikely to be considered as a single event. (4)
John read a book for a while and then Bill, who was sitting just next to him, laughed.
So even a continuous portion of space–time as described by (4) does not necessarily make a good single event. One might guess that the problem here is that there are no participants in common between the reading and the laughing. However, that possibility is counterexemplified by (5). In (5a), the two subevents (the flicking of a switch and the light going on) have no participants in common, but they are still readily construed as parts of a single larger macroevent, as described in (5b). (5)
(a) John flicked a switch. The light went on. (b) John turned the light on.
Such examples also counterexemplify the provisional claim made in relation to (3) that an event must happen in a single spatial location. It is quite easy to imagine variants of (5) which allow for action over a substantial distance: the President pushes a button in the White House and thereby launches a missile
on the other side of the world. It is harder, and perhaps impossible, to find convincing counterexamples to the claim given in relation to (2) that events are temporally continuous, but I will defer discussion of that for now. So (5) shows that two subevents can form a plausible macroevent, even if they have no participants in common. The converse is also true: even if both subevents involve the same individual, there is no guarantee that they will be acceptable as a single event. (6) could be a case in point, where it may appear that the two subevents do not jointly feel like a single event. We might say that the two events in (6) are typically construed as independent, in the sense that there is no reason for either event to be related in any way to the occurrence or otherwise of the other. (6)
John read a book for a while, then went for a walk.
Already, though, our judgements of what can and cannot constitute a single event are probably starting to falter. (6) certainly seems like a more plausible single event than, say, (3), and we might be willing to consider the happenings described by (6) as a single event of, say, John relaxing. Clearly, to make this notion of single event clear, we need at least a heuristic test for how we individuate events. In fact, I have already been using such a test implicitly. It is based on an observation due to Fodor (1970), and has been used experimentally with some success by Wolff (2003). This test relies on the following principle. (7)
Fodor’s Generalization: A single verb phrase describes a single event.
If (7) is true, then a set of happenings must be construable as a single event if we can describe it with a single verb phrase.3 We can make sense of (7) by considering Higginbotham’s (1985) syntactic implementation of Davidson’s proposals regarding event variables. Higginbotham suggests that an event variable introduced as part of the lexical semantics of V is bound by INFL. In modern syntactic terms, this suggests that the complement of T0 denotes a property of events, and that T0 functions to existentially quantify the event variable, and situate the event temporally with respect to the reference time. If this story is correct, an almost automatic consequence is that a verb phrase 3
I restrict this to verb phrases, rather than sentences, for two reasons. The first is because of the effect, noted by Henk Verkuyl (e.g. 1993), that the type of subject can have on the aspectual class of the utterance. For example, bare plural or mass noun subjects coerce a telic event into an atelic event. Although it is not implausible that a sentence such as People turned up for hours describes a single event, I will avoid this question here. The second, and more substantial, reason for stating Fodor’s Generalization in terms of verb phrases is that a sentence clearly does not have to describe a single event. We have seen three cases where this is not the case, in (3), (4), and (6), and we will discuss this problem more generally in Chapter 4. Stating Fodor’s Generalization at the level of verb phrases allows us to disregard this fact for now.
describes a single event. We will modify this simple theory somewhat in Chapter 4, but without losing this basic point. A quick recap is in order. Happenings occur in the world. We are willing to construe some sets of happenings as constituting a single event, but we do so much less readily for other sets of happenings. And if we can describe something with a single verb phrase, we know that we are construing it as a single event, although we have no guarantee that only things describable by a single verb phrase constitute single events. In many cases, we can construe a set of happenings either as multiple smaller events, or as a single larger event, depending on the coarseness of the grain with which we look at the happenings. Whether this is possible depends on the nature of the relations between the smaller events, although we have not attempted a precise statement of which relations allow such macroevent formation. However, Fodor’s Generalization suggests that in at least some cases, the shift from multiple subevents to single macroevent may be accompanied by a shift from multiple VPs describing individual subevents to a single VP describing the macroevent. To see Fodor’s Generalization in action, consider the examples in (8). (8)
(a) Floyd caused the glass to melt on Sunday by heating it on Saturday. (b) #Floyd melted the glass on Sunday by heating it on Saturday. (Fodor 1970: 432–3)
The oddity of (8b) in comparison to (8a) can be explained as a result of two conflicting constraints. On the one hand, the VP melt the glass requires the glass-melting to be construed as a single event. On the other hand, we know that this event did not occupy a continuous stretch of time, as Floyd was heating the glass on Saturday (and presumably, by Gricean considerations, not on Sunday, or else it would have been more cooperative to say so), and the glass melted on Sunday. We saw, with reference to (2) above, that a single event must occupy a continuous stretch of time. There is no way to resolve these two demands, and so the sentence is infelicitous. On the other hand, there is no such problem with (8a): here, we have the same two temporal adverbials, but we also have two VPs, melt and cause the glass to melt on Sunday. Therefore, there is no requirement that the whole sentence describe a single event, and so no requirement for temporal continuity. The conflict we observed in (8b) therefore doesn’t arise: the melting occurred on Sunday, even though the causing occurred on Saturday. We therefore derive the distinction between the two cases in (8). Fodor also claims that the factor distinguishing (8a) from (8b) is direct causation. A lexical causative is only appropriate if the result state is brought
about directly by the actions of the subject. This notion unifies the examples given in (2–6), as there is no relation of direct causation holding between the subevents in any of those examples. It also explains why we can find single events with action at a distance, as I can cause things to happen which are quite remote from me. We will eventually expand upon Fodor’s characterization in Chapter 3, introducing a family of contingent relations among events. However, we shall first consider what direct causation might mean here, as it illustrates the fundamental role of coarse-graining in the linguistic encoding of event structure.
2.2 The Directness of Direct Causation A typical definition of causation, for example Lewis’ 1973 notion of causal dependence, holds that causation is a transitive notion, such that if an event e1 causes an event e2 , which in turn causes e3 , then we may reasonably claim that e1 causes e3 , despite the presence of an intervening event. Given such a characterization, the distinguishing characteristic of direct causation is that it does not admit intervening causes. e1 is not a direct cause of e3 , as e2 intervenes between the two. We may formalize direct causation, then, as immediate succession in a strict partial order over a set of events, as follows. (9)
Given a set E of events and a strict partial order < on E representing causation, e1 directly causes e2, for e1, e2 ∈ E, iff e1 < e2 ∧ ¬∃e3.(e1 < e3 ∧ e3 < e2).
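The definition in (9) can be tried out on a small finite set of events. The following is a minimal sketch, not part of the original proposal: the event names (`flick`, `current_flows`, `light_on`) and the helper functions are hypothetical, chosen only to illustrate that whether one event directly causes another depends on which events are included in E.

```python
from itertools import product

def transitive_closure(pairs):
    """Return the transitive closure of a set of (cause, effect) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def directly_causes(e1, e2, events, order):
    """Definition (9): e1 < e2 and no e3 in `events` with e1 < e3 < e2."""
    if (e1, e2) not in order:
        return False
    return not any((e1, e3) in order and (e3, e2) in order for e3 in events)

# Coarse grain: E contains only the switch-flicking and the light going on.
coarse_events = {"flick", "light_on"}
coarse_order = transitive_closure({("flick", "light_on")})

# Finer grain: a hypothetical intervening event is added to E.
fine_events = {"flick", "current_flows", "light_on"}
fine_order = transitive_closure({("flick", "current_flows"),
                                 ("current_flows", "light_on")})

print(directly_causes("flick", "light_on", coarse_events, coarse_order))  # True
print(directly_causes("flick", "light_on", fine_events, fine_order))      # False
```

The same pair of events counts as directly related at the coarse grain and as only indirectly related once an intervening event is admitted.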
This means that direct causation is dependent on the set of events in question. For example, John letting go of the glass in (1a) is not a direct cause of the glass breaking, as there is an explicitly mentioned intervening event of the glass plummeting towards the ground. Furthermore, we saw above that the event of John letting go of the glass can be further subdivided essentially without limit. Indeed, Kamp (1981a) showed that it is possible to subdivide arbitrarily the transition between any two states, leading to a situation in which the definition of a direct relation in (9) is inapplicable, as there is always an event intervening between any two others in a causal chain. The causal ordering over events becomes dense, in the following sense. (10) An order < is dense iff ∀x∀y.(x < y → ∃z.(x < z ∧ z < y)) Comparing the definitions of directness in (9) and density in (10), it is clear that the following holds. (11) If x < y, where < is some dense order, x and y are not directly related by <.
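The effect of density in (10) and (11) can be illustrated with a familiar dense order. The sketch below assumes, purely for illustration, that the rational numbers under < stand in for an arbitrarily subdividable causal chain; the function names are hypothetical. Between any two related elements a further element can always be produced, so the search for an immediate successor never terminates in a direct relation.

```python
from fractions import Fraction

def intervening(x, y):
    """Return a witness z with x < z < y for rationals x < y (the midpoint)."""
    assert x < y
    return (x + y) / 2

def refine(x, y, steps):
    """Insert a new element immediately after x, `steps` times over."""
    chain = [x, y]
    for _ in range(steps):
        chain.insert(1, intervening(chain[0], chain[1]))
    return chain

chain = refine(Fraction(0), Fraction(1), 3)
print(chain)
# Each refinement pushes the would-be immediate successor of 0 further away.
```

However many refinements are made, the first element still has no immediate successor, which is the situation described in (11).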
A dense order is one in which a third element intervenes between any two elements, and so no two elements are directly related. This threatens to lead to a paradox: the direct causation requirement is empirically well-motivated, and Kamp’s arbitrarily fine-grained subdivision of events is conceptually solid. Something has to give. Bittner (1999), building to some extent on van Benthem (1983), introduced the notion of variable pragmatic coarse-graining to resolve this tension. She notes, following Kamp (1979), that a notion of immediate precedence is needed to capture the meaning of adverbials such as immediately, which are not amenable to principled definitions on a view of time as densely ordered. Bittner invites us to consider dialogues such as the following.
(12) Q: When did Mary leave the party? A: She left immediately before John.
• Better informed observer B: No, she didn’t. I saw her leave before Bill, and Bill left before John.
• Pragmatically challenged observer C: No, she didn’t. I watched the door the entire second before John left, and she didn’t leave during that time.
(13) Q: What happened here? A: A woman tried to rob a bank, and John shot her dead.
• Better informed observer B: No, he didn’t. His bullet just grazed her ear. But this frightened her so much that it caused a heart attack, and that was the immediate cause of her death.
• Pragmatically challenged observer (watching replay on film) C: No, he didn’t. His bullet started a long chain reaction. For when this molecule in his bullet got close to that molecule in her heart then their electrons repelled, and that made the heart molecule go that-a-way, which in turn caused . . . , and that [last mentioned molecular event] was the immediate cause of her death.
(Bittner 1999: 19)
Bittner observes that B’s objections are relevant to our determining the validity of A’s claims, but that C’s objections are not because they involve too fine-grained an account of what happened.
This holds in exactly the same way for immediately in (12) and the resultative phrase in (13), although the former imposes a purely temporal order on events, and the latter involves a causal order (part of the semantics of the resultative is that the event described by the matrix verb directly causes the state described by the resultative predicate). In each case, we have an order which is dense in principle (and C’s comments
remind us, unhelpfully, of that density), but which, in practice, we conceive of as discrete, over which relations of immediate precedence can be straightforwardly defined. Bittner concludes that: linguistic events are not part of the continuous reality that surrounds us. They are part of a discrete conceptual structure that we may impose on this reality in order to talk about it. The issue is not what is out there, but rather how we choose to conceptualize it given our interests in a particular context . . . Depending on the number and size of these topical events, the grain of the generated time structure will vary—from coarse-grained, if the topical events are few or long-lasting, to fine-grained, if they are short-lived and many. But even though the grain in principle can be refined to an arbitrary level of precision, the time structure generated by any finite event structure will be discrete. (Bittner 1999: 22)
This is the way out of the apparent conflict between the dense ordering of causal chains of events, and our intuitive notions of an event e1 happening immediately before, or being the immediate cause of, e2 . We can refine our causal chains, in principle, to an arbitrary extent, but in practice, we never do. Ideas along these lines have been formalized in very elegant and insightful work such as Kamp (1979), van Benthem (1983), and Landman (1991). Work along those lines raises many questions concerning the relation between temporal units (instants, intervals) and events, the treatment of change, and other issues. I have nothing to add on those issues, and so it would be inappropriate for me to commit myself to a particular axiomatization of the event structures that I will need in this book. However, the basic picture will be as follows. We require a set of events, which we take as primitives, and define several (partially interdefinable) binary relations over them. Some of these are purely temporal relations, some causal (I will assume that causal relations among events, like the classification of sets of happenings as events, are part of our way of carving up the world into perceptual units, rather than something inherent in the sets of happenings themselves), and some concerned purely with the structure within the set of events itself. The following, based heavily on Landman (1991: 196), is a partial list of the relations we are interested in— one more will be added in Chapter 3, and I leave it as an open question whether yet further relations should be added. (14)
Given a set of events E, we define:
• A strict partial order precede;
• A partial order temporally-include;
• A reflexive, symmetric relation overlap;
• A strict partial order cause;
• A partial order part-of.
The following conditions on these relations hold:
1. If temporally-include(e1, e2), then ∀e.(precede(e, e1) → precede(e, e2)) ∧ ∀e.(precede(e1, e) → precede(e2, e)).
2. If overlap(e1, e2), then ∃e3∃e4.(temporally-include(e1, e3) ∧ temporally-include(e2, e4) ∧ temporally-include(e3, e4) ∧ temporally-include(e4, e3)).
3. If part-of(e1, e2), then temporally-include(e2, e1).
4. If cause(e1, e2), then precede(e1, e2).
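The four conditions in (14) can be checked mechanically on a finite model. The following is a minimal sketch under stated assumptions: relations are represented as sets of ordered pairs, the events `let_go`, `shatter`, and the macroevent `drop` are hypothetical labels, and the function checks only the four numbered conditions, not the order-theoretic properties (strictness, transitivity, and so on) of the individual relations.

```python
def reflexive(events):
    """Identity pairs, used to make the (non-strict) partial orders reflexive."""
    return {(e, e) for e in events}

def violated_conditions(events, precede, temporally_include, overlap, cause, part_of):
    """Return the numbers of the conditions in (14) that fail on a finite structure."""
    failed = []
    # 1. Whatever precedes (or follows) the including event also precedes
    #    (or follows) the included event.
    if not all(
        all((e, e2) in precede for e in events if (e, e1) in precede)
        and all((e2, e) in precede for e in events if (e1, e) in precede)
        for (e1, e2) in temporally_include
    ):
        failed.append(1)
    # 2. Overlapping events temporally include mutually including events.
    if not all(
        any(
            (e1, e3) in temporally_include and (e2, e4) in temporally_include
            and (e3, e4) in temporally_include and (e4, e3) in temporally_include
            for e3 in events for e4 in events
        )
        for (e1, e2) in overlap
    ):
        failed.append(2)
    # 3. A whole temporally includes each of its parts.
    if not all((e2, e1) in temporally_include for (e1, e2) in part_of):
        failed.append(3)
    # 4. Causes precede their effects.
    if not all(pair in precede for pair in cause):
        failed.append(4)
    return failed

E = {"let_go", "shatter", "drop"}  # "drop" is a hypothetical macroevent
precede = {("let_go", "shatter")}
cause = {("let_go", "shatter")}
part_of = reflexive(E) | {("let_go", "drop"), ("shatter", "drop")}
temporally_include = reflexive(E) | {("drop", "let_go"), ("drop", "shatter")}
overlap = reflexive(E) | {("drop", "let_go"), ("let_go", "drop"),
                          ("drop", "shatter"), ("shatter", "drop")}

print(violated_conditions(E, precede, temporally_include, overlap, cause, part_of))

# A causal claim that runs against temporal precedence breaks condition 4.
bad_cause = cause | {("shatter", "let_go")}
print(violated_conditions(E, precede, temporally_include, overlap, bad_cause, part_of))
```

The first structure satisfies all four conditions; the second is flagged because one of its causal pairs has no matching precedence pair.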
There would appear to be some redundancy in a system that includes both precede and cause, for example, or temporally-include and part-of. We require such paired relations, though, because in each of these pairs, one relation is strictly temporal, whereas the other has more to do with the concerns that, I claim, influence the structure of the domain of events that we perceive. Neither of these is reducible to the other, because different events can go on at the same time, while two simultaneous time periods are the same period. This means, for example, that one event may temporally include another event but be completely unrelated to it. We do not say in such a case that the smaller event is part-of the larger event. We saw above that similar considerations hold with respect to precede and cause. Note that the inclusion of the part-of relation means that this system can capture the central notion of macroevent formation: if e1 and e2 jointly form a macroevent, then ∃e.(part-of(e1, e) ∧ part-of(e2, e)). However, it tells us nothing about the conditions under which that will happen. If we think that macroevent formation is sometimes automatic, we must add extra stipulations. If, for example, we think that a relation of direct causation holding between e1 and e2 is sufficient to guarantee that e1 and e2 jointly form a macroevent, we can add the following condition.
(15) If cause(e1, e2) ∧ ¬∃e3.(cause(e1, e3) ∧ cause(e3, e2)), then ∃e.(part-of(e1, e) ∧ part-of(e2, e)).
In fact, such a condition is somewhat too strong, as will become apparent. However, central to this project is a statement of the circumstances under
which such macroevent formation can take place, optionally or automatically. As a further step in that direction, the following section briefly examines the major area in which syntacticians and semanticists have found an approach based on event decomposition to be fruitful, namely the lexical aspectual classes. To summarize this chapter so far, the flow of stuff happening does not come readily packaged into identifiable events any more than the stuff we are confronted with is inherently delimited into individuals. Instead, we package these happenings ourselves, in a way that is not completely free but is flexible enough to allow some sensitivity to the requirements of the current discourse environment. The coarseness of the grain used in this packaging determines what counts as an individual event, and in the light of Fodor’s Generalization, determines what can be expressed by a single verb phrase. There are restrictions on what can be considered as a single event, which lead to restrictions on the coarseness of the grain with which this packaging may apply. We discussed Fodor’s generalization that lexical causatives can only encode direct causation. Although we will modify this proposal below to allow single verb phrases to describe other relations among events, Fodor’s Generalization allows us to exclude many (but not all) temporally or spatially discontinuous sets of happenings, as well as many (but not all) cases where disjoint sets of participants are involved in different subevents. All this shows that, despite the flexible nature of our ability to package up happenings into events, and despite the ready possibility of one event forming a subpart of a larger macroevent, there are clear limits on what can be considered as a single event.
2.3 Aspectual Classes Section 2.2 showed that the linguistic individuation of events is partly a pragmatic matter and one that requires sensitivity to the current discourse context. Moreover, Fodor’s Generalization, that a single verb phrase describes a single event, imposes a way for the speaker to vary the coarseness of the grain with which he describes events. For example, a speaker who utters (1a), with its detailed, ‘slow-motion’ description of someone dropping a glass, is describing events from a more fine-grained perspective than one who utters the more concise (1b). However, not all verb phrases are alike in the relationship between the single event they describe and the subevents which that event may be composed of. The differences which certain classes of verbs exhibit in this way relate to the aspectual classes delimited by Vendler (1957).
Vendler observed that different English verbs (or, more accurately, verb phrases) behave differently with respect to a number of distributional tests. Two core tests each divide the class of verb phrases in two, thereby jointly giving four classes of verb phrase, the accomplishments, achievements, activities, and states.4 The tests which divide the class of verb phrases in this way are, firstly, compatibility with the progressive tense, and secondly, the forms of wh-questions which are most natural with a given example. Although these tests are assumed to be reflexes of deeper semantic properties, then, they are essentially syntactic, distributional tests. We will turn shortly to the semantic properties that they reflect. The first distinction Vendler makes is between those verb phrases which can take the progressive and those which cannot. Accomplishments (16a) and activities (16b) belong to the former category, while achievements (16c) and states (16d) belong to the latter.
(16) (a) I am running a mile (drawing a circle, building a house, . . .).
(b) I am running (writing, working, . . .).
(c) *I am spotting the plane (appearing, blinking, . . .).
(d) *I am knowing the answer (loving you, understanding antisymmetry, . . .).
The two classes which can take the progressive, and the two that cannot, are further distinguished by the types of wh-questions which can felicitously be formed from them, when the question concerns the temporal location and duration of the event in question. Accomplishments (17) and achievements (19) reject questions based on the phrase for how long in favour of phrases which imply an endpoint to the event in question, while the opposite is true of activities (18) and states (20).5
(17) (a) #For how long did he run a mile (draw a circle, build a house, . . .)?
(b) How long did it take to run a mile (draw a circle, build a house, . . .)?
(18) (a) For how long did he run (write, work, . . .)?
(b) #How long did it take to run (write, work, . . .)?
(19) (a) #For how long did you spot the plane (appear, blink, . . .)?
(b) At what time did you spot the plane (appear, blink, . . .)?
(20) (a) For how long did you know the answer (love me, understand, . . .)?
(b) #At what time did you know the answer (love me, understand, . . .)?
4 Vendler in fact presents many more than two tests, but the others are all intended to reinforce the two distinctions made by the progressive and wh tests, so I ignore them here.
5 The actual forms of the wh-questions used to test accomplishments and activities on the one hand, and achievements and states on the other, differ, presumably as a consequence of the results of the test given in (16) above. Explanations for the divergent acceptability of different question forms for accomplishments and achievements can be found in view of the semantic differences between the classes to be explored below.
Vendler comments on the semantic significance of his distributional tests in several places, including the following passage: The concept of activities calls for periods of time that are not unique or definite. Accomplishments, on the other hand, imply the notion of unique and definite time periods. In an analogous way, while achievements involve unique and definite time instants, states involve time instants in an indefinite and nonunique sense. (Vendler 1957: 149)
Vendler, then, conceived of the availability of the progressive as corresponding to the description of a period, as opposed to an instant; and he saw the different forms of acceptable wh-questions as corresponding to a distinction analogous to the definiteness distinction in nominals. The latter distinction has, in fact, not been widely adopted. Instead, in the spirit of Bach’s (1986) extension of Link’s (1983) analysis of the structure of the domain of individuals, which found an analogous structure in the domain of events, researchers (in particular Krifka 1989) have found that a more apt analogy with the nominal domain relates this pattern to the mass–count distinction. As Vendler notes elsewhere, a verb phrase which rejects for how long questions is temporally bounded, in much the same way that a count noun denotes a spatially or spatiotemporally bounded individual. That is: [W]hile running or pushing a cart has no set terminal point, running a mile and drawing a circle do have a “climax,” which has to be reached if the action is to be what it is claimed to be. (Vendler 1957: 145)
To borrow some terms from Smith (1997), then, accomplishments and achievements are distinguished from activities and states by the presence of a natural, rather than an arbitrary, endpoint, or a telos. Although such terms remain intuitive, they give us a reasonably clear first impression of the semantic correlates of Vendler’s distributional tests. To make this notion more precise, I adopt a common idea in the literature (see in particular Pustejovsky 1991, although clear precursors to this position can be found in Dowty 1979, Bach 1986, Parsons 1990, and elsewhere). The idea is that, although it is fair to say that, at a microscopic level, all
[Figure 2.1 The maximal core event. Axes: Change (insignificant to significant) against Time. Components: PROCESS, CULMINATION.]
accomplishments, achievements and activities consist of subevents,6 they differ in which components they realize of a fixed event structure, which I will refer to as the maximal core event. Any (possibly nonproper) subset of the maximal core event, I will refer to as simply a core event. The maximal core event consists of two subevents: the process, which corresponds to Vendler’s ‘periods’ in the above; and the culmination, the typically attained climax which characterizes a telic event. We may think, then, of a maximal core event as schematized in Figure 2.1, where the x axis represents temporal or causal progression; the horizontal line represents a relatively continuous or homogeneous process, where such changes as may occur are linguistically irrelevant; and the vertical line represents an instantaneous (again, at least from a linguistic perspective), linguistically significant change.7 We arrive, then, at the following decompositions of the aspectual classes: (21)
Vendlerian aspectual classes:
(a) Accomplishment = process + culmination
(b) Achievement = culmination
(c) Activity = process
(d) State = ∅
6 Interestingly, it appears that states do not. Specifically, it is hard to find any particular subparts of a state to describe, as states are, by definition, static (in the framework to be adopted immediately below, they do not contain a process or a culmination). This position echoes that of Verkuyl (1993: ch.14), who claims that the interval at which a state is true is not subdivided into discrete ‘milestones’ as it would be for an event; or van Voorst (1992: 78), who writes that ‘[s]tates do not take place or do not happen’.
7 This diagram recalls the diagram of a nucleus presented, in a similar context, in Moens and Steedman (1988), and also the structure assigned to situations in Smith (1997). Each of these authors makes different proposals about the component parts of such a structure, and those proposals have empirical consequences, concerning viewpoint as well as situation aspect, in Smith’s terms. However, explicit comparison of these alternatives would distract somewhat from my main concerns here. In a nutshell, the reason why my proposal contains fewer components than those of Moens and Steedman or Smith is that I am only interested here in a model of inner aspectual distinctions, and so I do not need to concern myself with issues relating to the representation of the perfect, for example, or ingressive readings, which do fall under the scope of the other, more expansive models. Even those models tend to have a privileged subpart (the nonstative part of Moens and Steedman’s model, and Smith’s features [±durative] and [±telic]) which corresponds broadly to my maximal core event, though.
The semantic reflex of the distributional tests given above is then straightforward. If you have a process component, you can form the progressive. If you do not have a culmination component, you can form a for how long question. Taking this seriously, however, suggests a slightly different set of decompositions, as in (22). (22)
Aspectual classes (revised):
(a) Accomplishment = process + culmination
(b) Achievement = process + culmination
(c) Point = culmination
(d) Activity = process
(e) State = ∅
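The link between the decomposition in (22) and the two distributional tests can be stated as a small lookup. The following is an illustrative sketch only: the `CoreEvent` type and the two predicate functions are hypothetical names, and the encoding assumes nothing beyond the two rules just given (a process component licenses the progressive; the absence of a culmination licenses a for how long question).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CoreEvent:
    process: bool
    culmination: bool

# The revised decomposition in (22).
ASPECTUAL_CLASSES = {
    "accomplishment": CoreEvent(process=True, culmination=True),
    "achievement": CoreEvent(process=True, culmination=True),
    "point": CoreEvent(process=False, culmination=True),
    "activity": CoreEvent(process=True, culmination=False),
    "state": CoreEvent(process=False, culmination=False),
}

def allows_progressive(event):
    """A process component licenses the progressive."""
    return event.process

def allows_for_how_long(event):
    """The absence of a culmination licenses a 'for how long' question."""
    return not event.culmination

for name, event in ASPECTUAL_CLASSES.items():
    print(f"{name:15} progressive={allows_progressive(event)!s:5} "
          f"for-how-long={allows_for_how_long(event)}")
```

On this encoding, accomplishments and achievements receive identical predictions, since the revised table assigns achievements a process component as well as a culmination.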
The reason for this is that some, but not all, predicates commonly classified as achievements actually allow the progressive in some circumstances, as Vendler himself noted (see also Verkuyl 1989, Smith 1997). (23) (a) I’m reaching the summit as we speak. (b) John is winning the race. (c) He’s arriving any minute now. This does not mean that these predicates have been misclassified as achievements, however. We will see in Chapter 3 that the behaviour of these achievements in the progressive is different from that of accomplishments in the progressive, in that the former are restricted to a type of ‘prospective’ meaning, implying that the culmination is about to be reached. Equally, though, it is not the case that this obliterates the Vendlerian class of predicates which disallow both progressives and for how long questions. Many other predicates still behave precisely like this.8 (24) (a) *John is noticing the carnage. (b) *John is recognizing his long-lost brother. (c) *John is hiccupping. 8 Many of the examples in (24–25) are perfectly acceptable under other readings. For example, the (a) and (b) examples with notice and recognize allow stative interpretations, and consequently (24a), for example, can be coerced into an inchoative reading. And the (c) and (d) examples can be rescued by treating hiccup or blink as an iterated action, which creates an atelic activity out of this single, pointlike event, or as a description of a photograph which catches its subject mid-blink. However, in their basic senses, where these verb phrases refer to single, near-instantaneous events, all of these examples are unacceptable.
(d) *John is blinking.
(25) (a) *For how long did John notice the carnage?
(b) *For how long did John recognize his long-lost brother?
(c) *For how long did John hiccup?
(d) *For how long did John blink?
This suggests that we have an extra aspectual class here. I will call predicates with the behaviour in (24–25) points, while reserving the term achievement for those predicates with limited use of the progressive, as in (23). According to the above interpretation of Vendler’s distributional tests, this means that points have only a culmination, whereas achievements have a process and a culmination. Points are clearly related to semelfactives, the fifth aspectual class described in Smith (1997). Semelfactives, for Smith, are basically an atelic version of achievements. Smith focuses on the inchoation of a linguistically significant result state as the defining difference between achievements and semelfactives: achievements like arrive imply that a result state begins to hold, while the same is not true of semelfactives like cough. This leads Smith to impose slightly different divisions on the class of predicates from the above: while arrive is an achievement for both Smith and for me, and cough is a semelfactive for Smith and a point for me, instantaneous predicates which lead to a distinct result state, such as notice, are points for me (because they are incompatible with the progressive), but achievements for Smith. In other words, Smith’s class of semelfactives constitutes a proper subset of my class of points. I do not see any immediate problems with this: we are interested in slightly different phenomena, and slicing the cake in slightly different ways, accordingly. The decomposition in (22) does not distinguish between accomplishments and achievements, then. We may then ask what does distinguish these two classes, given that we have exhausted the possibilities of processes and culminations. We will return to that question in Chapter 3, where we will also discuss the interpretive distinction between progressives formed from accomplishments and achievements. 
The above model implies, though, that this distinction lies outside of event structure as narrowly defined above. The progressive manipulates the perspective from which we view an event, locating the reference time within the temporally extended process component (this is why states and points, predicates with no process component, do not allow the progressive), and so it does not distinguish between accomplishments and achievements, as both classes have a process component. The distinction
between accomplishments and achievements rests rather on a subdivision within the broad class of culminated processes. Another way in which the class of culminated processes can be subdivided concerns the distribution of descriptive content across the two subevents within an accomplishment or an achievement. Typically, the verb itself describes only one subevent of a culminated process, that is, the process or the culmination, but not both. Other material may contribute to the description of the other subevent. This division of labour (discussed under the name of manner–result complementarity by Levin and Rappaport Hovav 2008) can be seen, for example, in the ‘measuring-out’ effect discussed by Verkuyl (1972), Tenny (1987), Krifka (1989), Dowty (1991), Jackendoff (1996), and Smith (1997), whereby the mass/count properties of the nominal complement of, for example, a creation or consumption verb, determine the telicity of the resulting event description. (26) (a) Susan ate an apple in/#for 30 seconds. (b) Susan ate apples for/#in 5 minutes. (c) Susan ate for/#in 10 minutes. Here, as with most accomplishment verbs, eat arguably only describes the nature of the process which Susan is engaged in (see Ramchand 2008). Verbs like eat are compatible with a telic reading, but do not require such a reading. The nature of the denotation of the complement of eat therefore determines whether that process has, in Smith’s (1997) terms, a natural endpoint (as in (26a), an accomplishment description) or an arbitrary one (as in (26b,c), activity descriptions). We have, then, a process, and only a process, described by the verb; and we infer a natural endpoint where appropriate, according to the nature of the internal argument. Such inferences concerning the endpoint of the eating process can also be cancelled, as in aspectual coercion phenomena, which is why the variant of (26a) with a for-PP is unusual, but not impossible. 
The inference that (26a) describes a process of eating cannot, on the other hand, be cancelled. The same story can be told in the case of resultative predicates, as in (27). It does not matter here exactly how the syntax and semantics of resultatives work. Once again, though, the verb describes a process, and the state resulting from that process is described by the postverbal material. (27) (a) Sean wiped the table clean. (b) Michael ran himself ragged. Examples such as these contrast with cases in which the verb describes only a culmination, despite the fact that we can infer the existence of an associated
process. The canonical examples of such a configuration are achievement verbs. In examples like (28), we infer that the package and the climber have traversed some unspecified path to reach their respective culminations. However, no descriptive content is attached to that process of path-traversal by the predicate itself. (28) (a) The package has arrived. (b) The climber just reached the summit. There is, then, a rough correlation between the accomplishment–achievement distinction and the choice of subevent to which the verb’s descriptive content is attached. However, certain other cases obscure this correlation to some extent. The first, and least substantial, class concerns cases of verbal polysemy. One prominent example of this is the verb cut. Cut is discussed by Levin and Rappaport Hovav as a potential counterexample to their generalization concerning manner–result complementarity. The pertinent property of cut is that, as it is typically used (e.g. (29)), cut describes both a process (slicing actions with a sharp blade) and a culmination (severing of some part of the patient of the cutting). (29) The Queen cut the red ribbon. However, at least some uses of cut fail to encode such a process.9 One example is a reading peculiar to reflexive uses of cut, illustrated in (30a) (compare the nonreflexive (30b), where the most salient reading of (30a) is unavailable). On this reading, there is no specified manner component. Instead, all that is described is a result, namely that Laura was bleeding. As the specified process component disappears, so does Laura’s agentive involvement. This is why Laura cuts herself on, rather than with, a barbed wire fence, as (30c) shows: the complement of with describes an instrument, and the barbed wire fence is not an instrument on this reflexive reading of cut. (30) (a) Laura cut herself on a barbed wire fence. (b) *Laura cut James on a barbed wire fence. (c) #Laura cut herself with a barbed wire fence. 
9 Levin and Rappaport Hovav (2008) go further than this and claim that cut has one pure-manner reading and one pure-result reading, but no single manner–result reading. Whether this argument proves to be tenable is not important for the use we shall ultimately make of this typology of process- and culmination-describing verbs.
A second class of more involved cases pertaining to descriptions of process and culmination is those verbs which, in fact, describe neither, acting as a kind of dynamic, light verb skeleton, to which stative descriptive content contained in
sister constituents can attach. One example which we will return to repeatedly below is drive, as used with various predicates indicating different types of distress, as in (31).10 (31) (a) The noise drove Simon crazy. (b) Her great uncle drove Melissa to drink. Finally, contradicting Levin and Rappaport Hovav’s generalization concerning manner–result complementarity, there appear to be verbs which indeed describe both components, as Koontz-Garboden and Beavers (2009) show. Koontz-Garboden and Beavers’ star witness is the class of verbs of killing, illustrated in (32a), which entail both a specific action leading to death and the resulting death itself. However, they also mention verbs of cooking (32b) as another source of counterexamples. (32) (a) Bill drowned/electrocuted/crucified Neil. (b) Mike fried/roasted/blanched the tomatoes. All four logical possibilities are realized, then. Given a verb which describes an event consisting of a process and a culmination, the verb’s descriptive content may attach either to the process alone (as in classic verbs of creation and consumption); to the culmination alone (as with achievement predicates, and also the reflexive reading of cut); to neither (as with light verbs such as drive); or to both (as with verbs of killing or cooking). We will see in Part II that these different possibilities lead to different possibilities of extraction from modifying adjuncts. I do not have a particular theory of how these different subevent structures are derived compositionally. In particular, I do not want to enter the debate about the extent to which these patterns reflect regular syntactic and semantic composition (see, for example, Borer 2005 for a strongly syntactic view; many of the above references, such as Jackendoff 1996 and Smith 1997, for more lexicalist positions; and Ramchand 2008 for an attempt to find a third way). 
10 In principle, the same point could be made with better-known light verbs such as make or get. In fact, though, those verbs will not give such clear-cut results when we come to examine extraction patterns in Part II. Why this should be so is a mystery to me. Clearly, each verb has its idiosyncrasies: as well as the restriction to predicates of distress, drive is also syntactically restricted, cooccurring with adjectives, to-PPs, and, marginally, to-infinitives (What would drive a man to do a thing like that?). Make, on the other hand, is restricted to AP, predicative NP, and bare infinitival complements; while get is most natural with participial complements, as well as certain, mainly negative, adjectives (get John drunk or angry but not *get John happy). Why any of this should be the case is beyond me, but it makes me inclined to see the different extraction behaviour of the three predicates as just another idiosyncratic difference between these verbs.
I must, however, stipulate that the type of event composition seen here is in
some way distinct from a more general process of macroevent formation. The reason is that, in the general case, we will want macroevent formation to apply quite freely, taking two smaller events and forming a larger one out of them whenever our perception admits (or requires) it. We will see some examples of such relatively unconstrained macroevent formation in Chapter 3. However, in the particular case of decompositional models of the aspectual classes, such as that seen in the present section, the possibilities for macroevent formation are much more limited, and largely determined by the matrix verb in the construction. The crucial case in this respect is that of accomplishment predicates. On the model presented here, accomplishments are subspecies of culminated processes. That is, they contain a process, which directly causes a culmination. In many cases, as in (26–27), the presence of a culmination is inferred from, or entailed by, material appearing very low within VP, such as a direct object or a resultative predicate, rather than by the verb itself. However, neither a count NP direct object nor a resultative actually describes a culmination in itself. The direct object describes an affected participant in the event, while the resultative describes a state which holds after, or as a result of, the culmination being reached. However, the presence of such material licenses the inference that the process described by the verb has a natural endpoint. What does not happen is that we take two event descriptions, one describing a process and one describing a culmination, and treat them as jointly describing a culminated process. Consider (33). The accomplishment described in (33a) has two subevents: a process of running which directly causes a culmination of John reaching the end of a mile-long path. In some circumstances, then (for example, when John runs a measured mile), the same series of happenings could be described as in (33b). (33) (a) John ran a mile. 
(b) John ran and (then) reached the 1-mile marker. However, (33b) does not behave like an accomplishment description, in the way that (33a) does. For example, progressive forms of (33a) show the characteristic accomplishment pattern whereby (34a) entails that John is running (and so has run), but not that he will reach the end of the mile. None of the progressive examples (34b–d), based on (33b), show such behaviour. When both verbs appear in the progressive, as in (34b), the reading is not identical to (34a). Rather, progressive reaching continues to show the behaviour that it generally shows, of requiring that the culmination be imminent (see Chapter 3 for more on this behaviour). (34a) is not equivalent to (34b), then: the former
is felicitous as soon as John embarks on his mile, while the latter only becomes felicitous when the mile is almost over. (34) (a) John is running a mile. (b) John is running and reaching the 1-mile marker. (c) John has run and is reaching the 1-mile marker. (d) John is running and will/may/intends to reach the 1-mile marker.
Similarly, neither (34c) nor (34d) has the same truth and felicity conditions as (34a). This is obvious in the case of (34c), but less so for (34d). However, it appears that there is no modal modification of reach which captures the relation between process and culmination entailed by the use of the progressive in (34a). Will is obviously incorrect: John is running a mile does not entail that John will run a mile. May is better in that respect, but does not capture the closeness of the relation between the running and the reaching of the 1-mile marker: John is running and may reach the 1-mile marker can be true even if John actually believes he is running half a mile, or a marathon. In such circumstances, however, (34a) is infelicitous. Given such considerations, John is running and intends to reach the 1-mile marker is perhaps the most accurate two-verb paraphrase of (34a). However, even here there are problems. The first is empirical: the events that you participate in are not always determined by your own intentions. (35b) is certainly not a good paraphrase of (35a), for example: the rats probably have no idea what a mile is, and the intention that they run a mile is my intention, not theirs. (35) (a) My lab rats are running a mile today. (b) My lab rats are running and intend to reach the 1-mile marker today. The second worry concerning a paraphrase based on intend is more basic. Such a formulation no longer represents an attempt to form a culminated process description out of a process description (a form of run) and a culmination description (a form of reach). Rather, it is, if anything, an attempt to form a culminated process description out of a process description and a description of a mental state (a form of intend to reach). As an attempt to compose a culminated process description out of a process description and a culmination description, it therefore falls at the first hurdle. 
The possibly unsurprising conclusion of all this is that we cannot take just any event descriptions and push them together any way we feel like, to compose descriptions of culminated processes (accomplishments or achievements). In examples like (33b), the activity description continues to behave like
an activity description, and the achievement description continues to behave like an achievement description. Accomplishment-like behaviour simply does not emerge. In the light of such a conclusion, the behaviour of a further construction stands out as surprising. This is the Bare Present Participial Adjunct, or BPPA, construction mentioned briefly in Chapter 1, to be discussed in more depth in Part II. A BPPA attaches to a matrix verb phrase and provides a description of a process which overlaps temporally with the event described by the matrix VP. In the simplest case, as in (36), the matrix VP already contains a process description of its own. In such cases, the interpretation of the BPPA is as a description of a simultaneous, backgrounded event (the interpretation doesn’t change vastly if while is inserted before the BPPA). (36) (a) Michael works whistling tangos. (b) Jane wrote a letter thinking about her future. Much the same is true in those cases where the matrix VP does not contain a process description, whether the matrix VP entails the existence of such a preparatory process, as in (37a, b), or not, as in the case of point descriptions like (37c). (37) (a) Sarah reached the summit walking along the footpath. (b) Nigel drove Jean crazy whistling mazurkas. (c) Ben noticed the commotion looking through his binoculars. However, in those cases where the matrix VP entails such a preparatory process, a further reading exists in which the BPPA describes that very process. In this reading, Nigel’s whistling mazurkas is the process which leads to the culmination of Jean going crazy, and walking along the footpath is the process by which Sarah reached the summit. No similar reading exists for the matrix point description (37c). Although looking through binoculars is the sort of thing which could well function as a direct cause of noticing a commotion, this is not quite how we interpret (37c). In particular, there is no feeling of a natural endpoint here. 
Whereas Sarah’s progress towards the summit can be measured out in terms of her walking along the footpath, or Jean’s craziness can be tracked through the quantity of mazurkas that Nigel whistles, no such relation between process and culmination is present for (37c). This means, for example, that we would probably not say that Ben noticed the commotion by, but rather while, looking through his binoculars. This suggests that the availability of the extra, causal reading of BPPAs requires that the matrix VP entails the existence of a preparatory process, but does not attach any descriptive content to that process.
On the above analysis, BPPAs are exceptional: two verb phrases jointly describe the process and culmination of a single complex event. I do not know why BPPAs should be exceptional in this way, though I will suggest in Part II that their syntactic smallness may be related to it. However, one contributing factor surely has to be related to the lexical nature of the matrix VP. Where the causal reading of BPPAs is available, the matrix VP specifies that a process subevent exists, but does not specify anything else about it. A BPPA, on the other hand, describes a process event, but does no more than that. Given the special role of the matrix verb in determining the overall aspectual shape of a sentence, it may be that BPPAs are uniquely well suited to contribute to an event description, without altering its overall shape.
2.4 Summary
This concludes the discussion of aspectual classes. Although there is substantial agreement about most points covered in this chapter, it already forms a nontrivial theory of event structure. We have seen that events are discrete reifications of sets of happenings, reflecting perceptual factors such as the coarseness of the grain with which those happenings are reported. Furthermore, events may contain smaller events as proper parts, and conversely, two subevents can jointly form a single larger macroevent. The circumstances under which such macroevent formation takes place are once again largely determined by perceptual factors. In particular, the importance of direct causation was highlighted here: if one event is perceived as directly causing a second event, then typically, the two events may jointly form a macroevent. Finally, we examined a range of lexically constrained event structures, the lexical aspectual classes. A first pass divides verbal and other predicates into four classes depending on which components they realize of a bipartite structure which I call the maximal core event. Of most immediate interest to us is the class of culminated processes (accomplishments and achievements), which contain both components of the maximal core event. Within the class of culminated processes, further divisions can be made, depending on whether the matrix verb attaches descriptive content to the process, the culmination, both, or neither. Finally, we saw that the unique nature of Bare Present Participial Adjuncts lies in the fact that, when the matrix verb describes only the culmination within a culminated process, a BPPA can describe the process. This leads, uniquely, to a configuration in which two VPs jointly describe a single core event, a configuration that we will return to in Part II.
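The four-way division summarized here can be laid out as a small lookup. The Python sketch below is illustrative only: the class name `CoreEvent`, the method names, and the labels "state" and "activity" for the two cells lacking a culmination are assumptions introduced for the example. It records which components of the maximal core event a predicate realizes and derives the prediction, stated earlier in the chapter, that the progressive requires a process component.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreEvent:
    """Which components of the maximal core event a predicate realizes."""
    process: bool
    culmination: bool

    def label(self) -> str:
        # "state" and "activity" are assumed labels for the two cells
        # without a culmination; "point" and "culminated process" follow
        # the terminology of the chapter.
        names = {
            (False, False): "state",
            (True, False): "activity",
            (False, True): "point",
            (True, True): "culminated process",
        }
        return names[(self.process, self.culmination)]

    def allows_progressive(self) -> bool:
        # The progressive locates the reference time inside the process
        # component, so a predicate without one should reject it.
        return self.process


# Example predicates taken from the discussion above.
PREDICATES = {
    "eat apples": CoreEvent(process=True, culmination=False),
    "notice the carnage": CoreEvent(process=False, culmination=True),
    "eat an apple": CoreEvent(process=True, culmination=True),
    "reach the summit": CoreEvent(process=True, culmination=True),
}

for name, core in PREDICATES.items():
    print(f"{name}: {core.label()}, progressive: {core.allows_progressive()}")
```

The sketch deliberately gives accomplishments and achievements the same representation, since on the model above the difference between them lies outside event structure as narrowly defined.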
3 Single Events from Multiple Verb Phrases and the Role of Agentivity
In the previous chapter, we pointed out some consequences of the Davidsonian hypothesis that verb phrases can describe events, and noted some of the ways in which an event can be internally structured. In particular, we saw that certain verb phrases describe events with two major subparts, namely a process which directly causes a culmination. All of this is common to dozens of theories of event structure. This chapter explores two questions which are less often considered. Firstly, in Section 3.1, we ask whether syntactic units containing multiple verb phrases can describe a single event. After that, in Section 3.2, we ask whether relations other than direct causation are sufficient for macroevent formation. The rest of the chapter considers a second such relation, which I will call enablement. The theme running through this chapter, then, is finding out how far we can expand the rather frugal conception of the syntax and semantics of event structure that we started with in Chapter 2. By the end of the chapter, we will have a clearer picture of the cognitive units at issue, and we will be almost ready to return to the main subject matter of this book, namely the role of those cognitive units in certain locality effects.
3.1 Events and VP-Coordination
Whether or not multiple VPs can describe a single event plausibly depends on the syntactic configuration of those VPs. We must therefore be slightly more explicit about the syntactic encoding of events than we have been so far. I will start with perhaps the simplest case, namely two coordinated VPs, before moving on to consider other structures. We are interested in examples like the following. (1) John [VP [VP sang] and [VP danced]]. Given that John sang describes one event, and John danced describes one event, and given also that certain events can properly contain other, smaller events,
does (1) describe one event of singing and dancing, or two events, one of singing and one of dancing? When we consider this question fully, in Chapter 4, we will come to the conclusion that, in simple cases like (1), anything goes: such examples can be interpreted as describing one event or many events. However, for the immediate purposes of this chapter, we only need show that multiple coordinated VPs like those in (1) can sometimes describe a single event. To see this, we need to look at the seminal work of Higginbotham (1985) on the syntactic representation of event descriptions. One of Higginbotham’s achievements in that paper was to provide a way of deriving Davidson’s event semantics compositionally. One of Davidson’s pieces of evidence for the presence of an event variable involved the relations between sentences such as the following. (2)
(a) Jones buttered the toast. (b) Jones buttered the toast in the bathroom. (c) Jones buttered the toast in the bathroom with a knife at midnight. (Davidson 1967: 83)
Each sentence lower on this list entails all the sentences above it. So (2b) entails (2a), and (2c) entails both of the above. The question is, how do we derive all of the examples in (2) while keeping a compositional interpretation procedure which ensures that all these entailments go through? One solution which will not work is to assume that butter is a verb of variable adicity (along with, presumably, all other verbs in English). This approach would claim that, just as causative–inchoative melt can take either one or two nominal arguments, butter can take two, three, or five arguments, along with any number of other options. This is represented schematically in (3). (3)
(a) [[(2a)]] = butter2(jones, toast) (b) [[(2b)]] = butter3(jones, toast, bathroom) (c) [[(2c)]] = butter5(jones, toast, bathroom, knife, midnight)
The problem with this approach is that butter2, butter3, and butter5, however similar they may look on the page, are logically independent predicates. The fact that I have chosen very similar names for them should not blind us to the fact that these names, if unsupported by a theory of how butter2 relates to butter3 and butter5, remain arbitrary choices. Of course, it is possible to stipulate such truth-conditional dependencies as meaning postulates, as in (4). (4)
(a) MP1: ∀x∀y∀z.(butter3(x, y, z) → butter2(x, y)) (b) MP2: ∀u∀w∀x∀y∀z.(butter5(u, w, x, y, z) → butter3(u, w, x))
However, as we face an arbitrary number of adverbials modifying a given verb phrase, we would need a correspondingly unlimited number of meaning postulates. Even worse, similar postulates would need to be stated independently for every verbal predicate in the language. Our grammar would have to contain an infinite set of meaning postulates, which is clearly unworkable. A second approach to the problem posed by (2) fails for similar reasons. This would be to assume that the verb butter unambiguously denotes, not a two-place predicate but rather an n-place predicate, for some fixed n, with a pre-assigned slot waiting for every possible type of adverbial. If a given utterance contains an occurrence of the verb without its full complement of arguments (in a sense now expanded to include adverbials), then the unrealized argument positions would be existentially quantified. If we fix the number of argument slots at five, for example, the examples in (2) would be represented as follows. (5)
(a) [[(2a)]]= ∃x∃y∃z.(butter(jones, toast, x, y, z)) (b) [[(2b)]]= ∃y∃z.(butter(jones, toast, bathroom, y, z)) (c) [[(2c)]]= butter(jones, toast, bathroom, knife, midnight)
We now have the entailments we want. It is quite legitimate to infer (5a) from (5b), and to infer either of these from (5c). However, there is no reason to limit the combined number of arguments and adverbials occurring with the verb butter to five, as we have, arbitrarily, here. As Davidson points out, we can expand (2c) further, by adding ‘. . . by holding it between the toes of his left foot’, and it seems that any limit on the number of possible such modifications is processing-based, rather than grammatical, in nature. In that case, this approach, too, bites the dust. Finally, we can treat modifiers like in the bathroom or at midnight as functors, taking VPs as arguments. If a VP is of type α, such modifiers are of type ⟨α, α⟩. This allows such modifiers to be added recursively to VPs, with no intrinsic upper bound on application of such an operation. Although this is a clear improvement over the previous proposals, and has real appeal as a treatment of modal adverbials, problems remain, because there is no guarantee that application of such a function will preserve the entailments that it should. For instance, there is no guarantee in the general case that in_the_bathroom(butter(x, y)), or at_midnight(butter(x, y)), entails butter(x, y). To put it another way, we know that a buttering at midnight or a buttering in the bathroom is still a buttering, and this holds fairly generally, but on the modifiers-as-functors approach, the only way to capture that fact is in a piecemeal way, through the semantic representations of individual lexical items.
The novelty of Davidson’s approach is that it allows a finite expansion of the argument structure of a given verbal predicate, which nonetheless creates the flexibility to allow an arbitrary number of modifiers of that predicate. Davidson’s theory works by assuming that each relevant predicate has one more argument than traditionally assumed, corresponding to an event, often taken to be a subclass of individual. This additional argument position is generally existentially quantified, but included within the scope of the existential quantifier may be other predicates of the event variable, as in (6). The entailment that a buttering in the bathroom is still a buttering then becomes an instance of the general schema (∃x.P(x) ∧ Q(x)) → (∃x.P(x)). (6)
(a) [[(2a)]]= ∃e.(butter(jones, toast, e)) (b) [[(2b)]]= ∃e.(butter(jones, toast, e) ∧ in(e, bathroom)) (c) [[(2c)]]= ∃e.(butter(jones, toast, e) ∧ in(e, bathroom)∧ with(e, knife) ∧ at(e, midnight))
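The entailment pattern in (6) can be checked against a toy model. The Python sketch below is only an illustration under stated assumptions: the event records, the attribute names (`kind`, `agent`, `theme`, `in`, `with`, `at`), and the helpers `modifier` and `holds` are hypothetical and form no part of Davidson's proposal. Each sentence in (2) is modelled as a list of predicates of one event variable, and a sentence counts as true if some event in the model satisfies every conjunct.

```python
# Toy model: each event is a record of the properties that hold of it.
# The attribute names are illustrative assumptions, not part of the theory.
EVENTS = [
    {"kind": "butter", "agent": "jones", "theme": "toast",
     "in": "bathroom", "with": "knife", "at": "midnight"},
    {"kind": "sing", "agent": "jones"},
]


def butter(agent, theme):
    """butter(agent, theme, e), with the event argument e left open."""
    return lambda e: (e.get("kind") == "butter"
                      and e.get("agent") == agent
                      and e.get("theme") == theme)


def modifier(relation, value):
    """A predicate of events such as in(e, bathroom) or at(e, midnight)."""
    return lambda e: e.get(relation) == value


def holds(conjuncts, events=EVENTS):
    """Existential closure: true iff some event satisfies every conjunct."""
    return any(all(p(e) for p in conjuncts) for e in events)


# The three representations in (6), built by adding conjuncts.
s2a = [butter("jones", "toast")]
s2b = s2a + [modifier("in", "bathroom")]
s2c = s2b + [modifier("with", "knife"), modifier("at", "midnight")]
```

Whenever `holds(s2c)` is true in a model, `holds(s2b)` and `holds(s2a)` are true as well, because dropping conjuncts can only weaken the condition on the witnessing event. No verb-specific meaning postulate is needed to secure this.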
Higginbotham’s (1985) syntactic implementation of Davidson’s idea, whereby a verb introduces an event variable, which is bound by INFL, provides a way of approaching the two questions raised at the start of this subsection. The analysis of the sentences with adverbials in (2) adds three further assumptions to this basic theory. Firstly, we assume, as is standard, that the adverbials in question are attached below INFL, and so below the point at which the event variable is existentially quantified. Secondly, we assume that the adverbial phrases also denote predicates of events. Finally, we assume that the adverbial can combine with the VP by conjunction, identifying the λ-abstracted event variables within the two conjuncts (this is the rule of Event Identification in Heim and Kratzer 1998, based on Higginbotham’s rule of Theta Identification). All this together gives a representation of the VP in (2b) as follows, with the semantics of every node represented beneath its label. (7)
[VP λeλy.(butter(y, t, e) ∧ in(e, b))
    [VP λeλy.(butter(y, t, e))
        [butter λxλeλy.(butter(y, x, e))]
        [NP t the toast]]
    [PP λe1.in(e1, b)
        [in λxλe1.in(e1, x)]
        [NP b the bathroom]]]
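The composition in (7) can be mimicked with curried functions. The following Python sketch is a rough illustration, not an implementation of Heim and Kratzer's system: the function names, the dictionary encoding of events, and the constants `"t"`, `"b"`, and `"k"` are assumptions made for the example. It shows the one non-standard step, Event Identification, which conjoins the VP and PP denotations while identifying their event variables.

```python
def butter(x):
    # λxλeλy.butter(y, x, e): theme first, then event, then agent.
    return lambda e: lambda y: (e["kind"] == "butter"
                                and e["theme"] == x
                                and e["agent"] == y)


def in_(x):
    # λxλe.in(e, x)
    return lambda e: e.get("in") == x


def event_identification(pp, vp):
    # λeλy.(vp(e)(y) ∧ pp(e)): both daughters are predicated of the
    # same event e, which remains open for later existential closure.
    return lambda e: lambda y: vp(e)(y) and pp(e)


lower_vp = butter("t")                          # λeλy.butter(y, t, e)
pp = in_("b")                                   # λe.in(e, b)
upper_vp = event_identification(pp, lower_vp)   # top node of (7)

# Two hypothetical events: a buttering in location b and one in location k.
e1 = {"kind": "butter", "theme": "t", "agent": "jones", "in": "b"}
e2 = {"kind": "butter", "theme": "t", "agent": "jones", "in": "k"}
```

Here `upper_vp(e1)("jones")` is true, while `upper_vp(e2)("jones")` is false even though `lower_vp(e2)("jones")` is true: the modifier restricts the same event that the verb describes.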
So Higginbotham allows a compositional implementation of Davidson’s event semantics, by treating PP modifiers semantically as conjoined predicates of events. This suggests that it should at least be possible to implement a similar interpretation procedure for multiple VPs in VP-coordination structures.1 A simplistic rule for coordination (see Partee and Rooth 1983, and Keenan and Faltz 1985 for a much more careful and thorough presentation) treats and as belonging to a family of related types, namely ⟨α, ⟨α, α⟩⟩, for any α ‘ending in t’. And operates by conjoining two constituents, and identifying the λ-abstracted variables from the two conjuncts. In the specific case of conjunction of regular VPs, with abstracted event and external argument variables, then, and would look like this. (8) [[andVP]] = λX⟨Ev,⟨e,t⟩⟩ λY⟨Ev,⟨e,t⟩⟩ λeλx.(X(e)(x) ∧ Y(e)(x)) We can now tackle our simple example of VP-coordination. Ignoring the issue of how and combines with its two arguments, we derive the following syntax and semantics for (1). (9)
[TP ∃e.(sing(j, e) ∧ dance(j, e))
    [NP j John]
    [T λP⟨Ev,⟨e,t⟩⟩ λx∃e.P(x, e)]
    [VP λeλx.(sing(x, e) ∧ dance(x, e))
        [VP λe1λy.sing(y, e1) sang]
        [and (=(8))]
        [VP λe2λz.dance(z, e2) danced]]]
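The derivation in (9) can be traced in a few lines of code. The Python sketch below is an illustration under assumptions of its own: events are encoded as dictionaries, a single event may carry several descriptions (the `kinds` set), and the existential quantifier in T ranges over a finite list of events. None of these choices comes from the analysis itself. The point is only to show what (8) commits us to, namely that both conjuncts are predicated of one and the same event.

```python
def and_vp(X):
    # (8): λXλYλeλx.(X(e)(x) ∧ Y(e)(x))
    return lambda Y: lambda e: lambda x: X(e)(x) and Y(e)(x)


def sang(e):
    # λeλy.sing(y, e)
    return lambda y: "sing" in e["kinds"] and e["agent"] == y


def danced(e):
    # λeλz.dance(z, e)
    return lambda z: "dance" in e["kinds"] and e["agent"] == z


def T(P, events):
    # λPλx∃e.P(x, e), with ∃ restricted to the events of a toy model.
    return lambda x: any(P(e)(x) for e in events)


# Two hypothetical models: one event of singing-and-dancing versus
# two separate events, one of singing and one of dancing.
one_event = [{"kinds": {"sing", "dance"}, "agent": "j"}]
two_events = [{"kinds": {"sing"}, "agent": "j"},
              {"kinds": {"dance"}, "agent": "j"}]

vp = and_vp(sang)(danced)   # λeλx.(sing(x, e) ∧ dance(x, e))
```

On this encoding `T(vp, one_event)("j")` is true but `T(vp, two_events)("j")` is false, since no single event in the second model is both a singing and a dancing. The representation in (9) is therefore specifically a single-event reading.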
1 Similar, but not identical, given that coordinated VPs are generally of the same semantic type, whereas the VP and the modifier in (7) are of different types. This means that a special rule like Event Identification is needed to derive (7) compositionally, while VP-coordination should be derived in the same way as any other type of coordination.
As mentioned above, a representation of conjoined VPs as describing a single event is not appropriate in every case, but the above analysis suggests that
multiple coordinated VPs can jointly describe a single event in at least some cases. A slight extension brings us to the constructions that will be the focus of Part II, namely adjuncts containing VPs. In the same way as we must eventually develop compositional mechanisms which will allow examples such as (1) to describe either one or two events, we may expect a similar ambiguity in examples like the following. (10) John fell to the ground after being punched. In the by now familiar fashion, falling to the ground is clearly an event, and punching is clearly an event, but the question is whether falling to the ground after being punched constitutes a single event or not. Moreover, if we need to ensure that our syntax–semantics mapping is flexible enough to allow cases of VP-coordination to represent single or multiple events, it would be unsurprising if it were also possible for examples like (10) to describe single or multiple events (we will come back to the precise mechanisms underpinning this ambiguity in Chapter 4). That, in fact, is just what I will claim: examples like (10) are ambiguous between single-event and multiple-event readings. And whether we interpret such examples as describing a single event or multiple events is partially determined by our assumptions about the way the world works, rather than being a narrowly linguistic matter. This, then, is our answer to the first question raised above, that of whether multiple verb phrases can jointly describe a single event. In some cases of VP-coordination, a single-event reading is automatically available; there is no reason not to permit the same to occur with certain adjunction structures; and whether two events are construed as related in such a way as to allow them to jointly form a larger event is once again in the eye of the beholder. We turn next to the question of which such relations allow formation of a single event.
3.2 Relations other than Direct Causation
The previous section concluded that there is no reason in principle why two verb phrases, one forming part of a constituent adjoined to the other, should not form a single event description, but whether they ultimately do or not depends in part on the way in which the events they describe are construed as relating to each other semantically. This section aims to explore the semantic relations required for macroevent formation, while in Chapter 4 we turn briefly to syntactic constraints on macroevent formation. We have already made a start in this direction with our discussion in Chapter 2 of Fodor’s (1970) generalization concerning direct causation. I
continue to assume that a relation of direct causation holding between two events allows those two events to form a macroevent, all else being equal. The question here is whether any other relation is sufficient to allow macroevent formation. I will begin by assuming that there exists a set of relations which allow macroevent formation. A necessary condition on macroevent formation is that the subevents are related by one of these relations. Let us call such relations contingent relations, to borrow a term from Moens and Steedman (1988), and then find out which relations are contingent relations. The conclusion from Chapter 2 was that direct causation is a contingent relation, in this sense.
One question to be addressed in this section is whether direct causation continues to permit macroevent formation even in multiple-VP constructions such as (10), from Section 3.1. Certainly, in such an example, the relation between the two described events is plausibly one of direct causation. This relation is, of course, over and above the temporal relation literally encoded by the preposition after: our knowledge of the fact that being punched is the sort of thing that can directly cause people to fall to the ground is responsible for the enriched interpretation of (10). Later in this chapter we will see some evidence suggesting that, if the relation between the events described in (10) is indeed construed as involving direct causation, the multiple VPs within that example can jointly form a single event description.
However, before we get to that, we will consider what happens when we run into other (enriched or unenriched) construals of relations among events. Is anything except direct causation a contingent relation? For example, consider (11).
(11)
(a) John read an introduction to landscape gardening before designing his garden. (b) John designed his garden after reading an introduction to landscape gardening.
Here, we would not say that reading the book actually caused the designing of the garden (in either an intuitive sense, or any of the more formalized senses, such as that of Lewis 1973, in which an effect would not have occurred without its cause; or that of Reinhart 2002, in which a cause is perceived as a sufficient condition for the effect to occur). However, there is nonetheless a relation between reading the book and designing the garden, beyond the temporal one: the book was presumably some help in designing the garden. Call this relation an enablement relation, following Talmy (1988), Wolff (2003), and others. However, when we come to formulate an explicit characterization of this relation, our treatment will be somewhat distinct from the treatment in those works.
On the analysis to be pursued here, unlike that in the references given above, enablement is a relation not just between events but between an agent (John in (11)) and a set of events, in a sense to be made more clear in Section 3.3. The question of whether sets of events related by enablement count as a single event is open at this point, but I will claim in that section that there is good evidence that such groupings of events constitute cognitive units, at least. For the missing link, suggesting that sets of events related by enablement are, in a linguistically relevant sense, the same kind of cognitive units as sets of events related by direct causation, we turn to some experimental evidence from Wolff (2003) in Section 3.4.2.
This same enablement relation may be seen in a further class of adjuncts, illustrated in (12).
(12) John came to England in order to meet the Queen.
Again, here we find an enablement relation holding between John and the two events. Coming to England didn’t cause John to meet the Queen but it may well have facilitated it, and the sentence implies that John, as the person responsible for his coming to England, believed that it would help him to do so. The difference in the case of (12) is that whereas the enablement relation in (11) is supplementary to the core meanings of the prepositions, in (12) the phrase in order tells us explicitly that, firstly, John’s goal is to meet the Queen, and secondly, he believes that coming to England will enable this.² The enablement relation is therefore directly encoded by in order, but only present in the interpretation of VPs linked by a preposition such as before or after as an enrichment of their core meaning.
Two further properties of direct causation and enablement stand out. Firstly, there is no necessary correspondence between the phrase structural configuration of the two event-denoting constituents, and the relations holding between them.
In (12) the agent (John) believes that the event described in the matrix VP (his coming to England) enables the event described in the adjunct (meeting the Queen), and the same is true in (11a). But in (11b), the relation goes the other way round: if the introduction to landscape gardening enables John’s design of the garden, then the event described in the adjunct (reading a landscape gardening book) enables the matrix event (designing a garden).
2 There are some more complex uses of in order where the goal is attributed not to the agent of the matrix event but rather to the speaker or some other agent. A typical case of this is literary criticism, where the critic may attribute a goal to a writer, who ultimately has control over all events reported in a text, whether ‘agentive’ or not, involving the protagonists in the text in question. This gives examples such as The ship sinks in order to further the plot (Culicover and Jackendoff 2001: 504). We will return briefly to these more complex cases below, although they will remain somewhat mysterious.
Secondly, we notice that enablement is an opacity-inducing relation, while causation is not. In (12) we have no guarantee that John actually did meet the Queen as a result of coming to England, and if John had come to England to meet a unicorn, that would not commit us to the belief that unicorns exist. Rather, we are committed to the proposition that John believes that unicorns exist, and that coming to England will help him to meet one. In contrast, if we state that a unicorn killed John (or, less clearly, caused John to die), we are committed to the existence of unicorns. This follows from any standard definition of causation (see, for example, Lewis 1973 and Section 2.2). Such definitions entail that two causally related subevents are inseparable: if the causing event happened, then so did the caused event. This distinction between causation and enablement has a linguistic reflex in the felicity of utterances such as (13). Whereas (13a) is acceptable, (13b) is contradictory. This is as expected because the opacity induced by the enablement relation allows us to separate the occurrence of the matrix event from the adjunct event in (13a) in a way which is simply not possible in (13b). (13)
(a) John came to England in order to meet the Queen, but he never got to meet her. (b) #John fell to the ground after being punched, but he {never fell to the ground/was never punched}.
So a cause always produces its effect in the normal course of things, whereas an enabling event does not necessarily produce the event which it enables. This suggests an approach to a further asymmetry between the two relations. That is, there are no lexical items that directly encode enablement relations conflated with some other predicate, whereas there are a great many lexical items that encode causation. For example, while transitive and intransitive melt are related in that melt_tr means roughly ‘directly cause to melt_itr’, there are no transitive–intransitive pairs such that V_tr means ‘enable to V_itr’. This makes sense from the perspective adopted here because, if a single lexical item were to directly encode enablement, this would mean that it described two subevents, the first of which happened, and the second of which may or may not have happened. English, at least, does not seem to allow this.³
3 However, there are languages, discussed by Smith (1997) among others, in which the reaching of the culmination in a non-progressive accomplishment is a cancellable implicature rather than an entailment. Further investigation of such languages would be very interesting from the above perspective. However, I am unaware of how similar any of them are to English in other related respects, and so this remains very much a matter for future research.
Viewed in this way, causation and enablement are fairly similar. Causation is almost a special case of enablement, in which the opacity of the more general
notion is removed. However, a formal difference, which will have an important role below, is that causation is a binary relation between events, whereas enablement is a ternary relation between two events and an individual. We need a way of talking about the two events related by enablement, without worrying about the individual. Accordingly, I define a binary relation, enable′, derived from enablement. In what follows, I will use the term enablement for both enable and enable′, as it should be clear which relation is intended.
(14) e1 enable′ e2 ↔ ∃x.enable(x, e1, e2)
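The step from the ternary relation to the binary one in (14) can be illustrated with a minimal sketch. The event names, the `Enable` record, and the `enable_prime` helper are hypothetical labels introduced only for illustration, loosely modelled on (11) and (12); they are not part of the analysis itself.

```python
from typing import NamedTuple


class Enable(NamedTuple):
    """One instance of the ternary relation enable(x, e1, e2)."""
    agent: str  # the individual x to whom the enablement is attributed
    e1: str     # the enabling event
    e2: str     # the enabled (goal) event


# Hypothetical facts, loosely based on examples (11) and (12).
ENABLE_FACTS = {
    Enable("John", "read_gardening_book", "design_garden"),
    Enable("John", "come_to_England", "meet_Queen"),
}


def enable_prime(e1: str, e2: str, facts=ENABLE_FACTS) -> bool:
    """Binary relation from (14): true iff some individual x
    stands in enable(x, e1, e2)."""
    return any(f.e1 == e1 and f.e2 == e2 for f in facts)


print(enable_prime("come_to_England", "meet_Queen"))
print(enable_prime("meet_Queen", "come_to_England"))
```

The binary relation simply quantifies the individual away, so it can be compared directly with direct causation, which relates events only.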
This gives us a second binary relation among events. I will argue that enable′, like direct causation, is a contingent relation. Given our approach to macroevent formation, this means that events related either by direct causation or by enable′ may jointly form a macroevent. Of course, there may be further contingent relations. For instance, Wolff (2003), following Talmy (1988), discusses causation and enablement as forming a natural class with a relation of prevention. Meanwhile, causal relations form only a small part of Kehler’s (2002) theory of coherence relations, which also covers notions like temporal contiguity. In fact, I want to exclude both of these possibilities, as will become clear, and the only contingent relations I will make use of are direct causation and enable′. However, there is no reason why other relations could not be included within this family, if the facts pointed that way.⁴ We saw in Chapter 2 that pure spatiotemporal contiguity was not sufficient for macroevent formation, and so is not a contingent relation.
Contingent relations, then, are partially independent of a second family of relations among events, namely the purely temporal relations discussed with reference to (10–11) above. The independence is not total, however. Our naïve conception of these two contingent relations is one in which the causing or enabling event always precedes the caused or enabled event. We may, then, enrich a linguistically encoded purely temporal relation into a contingent relation, but only subject to the condition that the temporal order of events is congruent with the contingent order of events, as specified in (15).
4 In general, the proliferation across researchers of different groupings of events may give us slight cause for concern. The groupings that I make use of here are a proper subset of both the groupings that Wolff assumes for his force dynamic purposes, and the separate, discourse-based groupings that Kehler assumes. I see no incompatibility between my use of cause and enable′, and the results reported in Wolff (2003) and Wolff (2007), although the different ways in which we use the terms could be distinguished by further testing. The differences between my cause and enable′, and the relations used by Kehler, are more substantial, but that is somewhat unsurprising: the relations that can hold between event descriptions in a full discourse are bound to be more varied than those which hold between subevent descriptions within macroevents.
(15) (e1 cause e2 ∨ e1 enable′ e2) → e1 precede e2
Such pragmatic enrichment is obviously constrained by world knowledge. We are only willing to countenance a contingent construal of a linguistically encoded temporal relation if such a contingent construal matches our knowledge of how the world works. We therefore predict a three-way split in our reactions to sentences, such as those in (10–11), which encode temporal relations among events. In (16a), we automatically suppose that John collapsed as a result of his collision with a lamp post.⁵ But such a possibility doesn’t seem so likely in the case of (16b). The Master and Margarita is a powerful book, but unlikely to actually bring the reader to his knees. The most likely interpretation for (16b), then, is that John collapsed for some other reason: he was ill, or he fainted, and it just happened to be after he had been reading. In (16c), on the other hand, there is simply no chance of John’s collapse occurring as a result of the collision. Whatever caused John to collapse must have come before the collapse itself, and the collision came afterwards.
(16)
(a) John collapsed after colliding with a lamp post. (b) John collapsed after reading The Master and Margarita. (c) John collapsed before colliding with a lamp post.
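The three-way split predicted for (16) can be sketched as a small decision procedure. This is only an illustrative simplification under stated assumptions: the function name `contingent_reading`, the status labels, and the Boolean `plausible` flag (standing in for world knowledge) are all hypothetical and introduced here for exposition.

```python
def contingent_reading(first: str, second: str,
                       candidate_cause: str, plausible: bool) -> str:
    """Classify the availability of a contingent construal.

    first, second   -- the two events in the temporal order that the
                       sentence linguistically encodes
    candidate_cause -- the event we try to construe as cause or enabler
                       of the other event
    plausible       -- whether world knowledge readily supports the link
    """
    if candidate_cause != first:
        # Violates (15): a causing or enabling event must precede
        # the caused or enabled event. No context can repair this.
        return "excluded"
    # The temporal order is congruent with the contingent order; what
    # remains is a world-knowledge assumption that context may revise.
    return "default" if plausible else "coercible"


# (16a) John collapsed after colliding with a lamp post.
print(contingent_reading("collide", "collapse", "collide", plausible=True))
# (16b) John collapsed after reading The Master and Margarita.
print(contingent_reading("read", "collapse", "read", plausible=False))
# (16c) John collapsed before colliding with a lamp post.
print(contingent_reading("collapse", "collide", "collide", plausible=True))
```

The asymmetry between the two negative outcomes is the point of the sketch: one depends on a revisable plausibility judgement, the other on the ordering condition in (15).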
(16b) and (16c) differ, then, in coercibility. The absence of a contingent reading in (16c) is due to a fundamental component of our conception of the way the world works: causes precede effects, and enabling conditions precede whatever they enable.⁶ On the other hand, the typical absence of a contingent reading in (16b) is due to a much more trivial part of that conception, namely that reading a book just isn’t that bad. Accordingly, we accept an alteration to the latter belief more readily than the former. Maybe, sometimes, reading a book really is very hard work indeed. And so we can sometimes construe the two events as causally related, as in (17).
(17) Some lengthy novels demand a lot of intellectual and even physical effort from the reader. Some would say too much. I mean, John collapsed after reading Finnegans Wake.
On the other hand, no amount of context seems to be able to help (16c). Even in (18), where it is explicitly stated that causes follow effects, the two events still resist a contingent construal.
5 Causation is the only relevant contingent relation here, as we know that both events happened, and it is intuitively inaccurate to talk of hitting a lamp post enabling John to collapse. I will return to the reason for this intuition in the following section.
6 Once again, this may not be true in the ‘real world’, but that is irrelevant to the question of how we naïvely conceptualize the world.
(18) We had a weird night last night. We fell through a wormhole into a parallel universe where the flow of causation was reversed so that a cause happens after its effects. It was terrible. John collapsed before colliding with a lamp post. So don’t expect too much from him today.
This suggests that the direct causation data discussed in Section 2.1 are actually part of something bigger. Recall again Fodor’s Generalization, repeated below.
(19) Fodor’s Generalization: A single verb phrase describes a single event.
Given the role of contingent relations in macroevent formation, this predicts that, all else being equal, contingently related subevents will be describable by a single verb phrase, if an appropriate verb exists.⁷ This prediction seems, in fact, to be accurate. Of course, details are lost in the translation from a multi-verb description of an event to a single-verb description of the same event, not least because there are strict limitations on the possible descriptive content of a single verb over and above Fodor’s Generalization. Nevertheless, we are willing to accept the single-verb description as a valid portrayal of the same set of happenings in all the relevant cases. We have already seen numerous examples of this phenomenon with direct causation in Section 2.1. The next section shows that the same holds for the enablement relations discussed in this section. This is the first piece of evidence for the claim that multiple events related by enable are single events, in the same way that multiple events related by direct causation are.
3.3 Agentivity and Event Size
The direct causation restriction on verb phrase denotations has been a staple of research into the syntax–semantics interface over the last 40 years. This section aims to use that finding as a model for further exploration of the hypothesis that enablement constitutes a second contingent relation. We have discussed two types of enablement relation. In the first, enablement is linguistically encoded, for example by in order. In the second, an enablement construal arises through enrichment of a linguistically encoded temporal relation. In each case, however, we can take the enabled event (call it the goal event) as standing for the entire complex of enabling event + goal event. Consider the following.
7 Given that we have cast Fodor’s Generalization as a one-way implication, we do not expect this prediction to hold universally. However, it is not unreasonable to expect at least a tendency in this direction.
(20)
John is deep in concentration, reading an introduction to landscape gardening. Bill asks him what he’s doing. ‘I’m designing a garden’, replies John.
This exchange is completely natural, even though John is not actually designing a garden as he speaks.⁸ What he is doing is something that will enable him to design a garden at a later stage: what he is doing could be described in more detail as reading an introduction to landscape gardening in order to design a garden. Note, moreover, that the truth conditions of (20) are independent of whether John ever actually designs a garden: it is only necessary that he believes that reading the book will help him to do so. So a single verb, by hypothesis describing a single event, comes to stand for the same series of happenings, some actual, some intended, that could otherwise be described by two separate verbs describing two distinct events standing in an enablement relation. Exactly the same points hold in the following example.
(21) I’m seeing some friends from Monday to Thursday.
(21) can felicitously be uttered if, in fact, the speaker will only be actually visiting friends from Tuesday to Thursday, but has to travel all day on Monday because they live in the Outer Hebrides and he’s marooned in London. The travelling to enable oneself to visit friends is legitimately counted as part of visiting friends, even if no friends were actually visited during the travelling period, and (21) could be more accurately replaced by I’m seeing some friends from Tuesday to Thursday, and travelling up to meet them on Monday. The following sentence is therefore not a contradiction.
(22) I was away visiting friends last week, but I didn’t actually manage to visit any of them.
(22) is interpretable if the first conjunct actually has the meaning I was engaged in activities that would have enabled me to see my friends under normal circumstances. Again, because of the opaque nature of enablement, there is no contradiction if the goal event is not actually realized. Again, a description of a single event may also subsume multiple subevents related by enablement.
8 If John puts the book aside and does not do anything else, he cannot claim ‘I designed a garden’. This is entirely parallel to the fact that a runner cannot stop after 25 yards and claim ‘I ran a mile’.
This is the phenomenon referred to by Talmy (2000) as windowing of attention (see also Talmy 1988, van Valin and Wilkins 1996), whereby linguistic material describes only certain parts of a complex structure, while omitting
(‘gapping’) others. In the examples discussed above, we have a choice between a multi-sentential description, in which the window of attention extends over the entire sequence of events, and a ‘gapped’ description, in which the window of attention covers the agent in the initial (enabling) event, and the goal event, but omits any description of intervening events. As Talmy shows at length, gapping is very frequent, or even standard, in certain kinds of event structure, including chains of intended causally related events under the control of an agent.
Now consider what happens when there is no such contingent relation holding between two events. Section 2.1 already discussed the absence of direct causation. Here, I will expand the discussion to cover cases where an enablement relation is absent. Consider the following variant of (20).
(23) John is slumped in an armchair watching the rugby league on TV. Bill asks him what he’s doing. ‘I’m designing a garden’, replies John.
Unlike (20), this exchange is somewhat bizarre. John’s response triggers some extra cognitive effect. Is he being sarcastic, or lying? Or is the rugby league somehow influencing his garden design process? Maybe, echoing Dowty (1979), he really is in the process of designing a garden, and Bill happened to catch him on a protracted break. Regardless of what the correct answer is, it is clear that either the Gricean maxims are being flouted, or we must use some extra cognitive effort to construct a coherent event structure relating watching the rugby and designing a garden. Leaving the two events unrelated, when both are described by a single verb phrase, is not an option. Again, the same point can be made with respect to (21). This sentence simply is not true if, rather than travelling to see friends on Monday, the speaker sits around all day twiddling his thumbs before visiting his friends on Tuesday.
Single verb phrases can expand their descriptive scope, then, to include multiple contingently related events, which could alternatively be described by multiple verb phrases. This does not prove that multiple verb phrases can themselves describe a single event. However, suggestive evidence in favour of that possibility comes from the fact that the same interpretive restrictions apply in the multi-verb case as in the single-verb case. Consider, for example, Fodor’s generalization that the causal relations expressed by lexical verbs must be direct. The same appears to be true in the multi-verb case. To see this, consider again (16a), repeated below. (24)
John collapsed after colliding with a lamp post.
This sentence is true on the interpretation where colliding with the lamp post directly causes John’s collapse. It is also true if the collision did not cause the collapse but merely preceded it. The interesting case, though, is the one where the collision does cause John’s collapse, but only indirectly. Such a scenario would be one where, for example, John staggered off after walking into the lamp post without collapsing, but badly shaken. So badly shaken, in fact, that he needed several drinks to calm his nerves. And after eight triple brandies, he collapsed in a sorry heap. It seems that the temporal relation among events expressed by after in (16a) cannot be enriched into a contingent relation in such a case. Some evidence supporting this claim can be found by considering the following exchange, in the above scenario, modelled on those from Bittner (1999) discussed in Chapter 2. (25)
(A) I know why John collapsed: he collapsed after colliding with a lamp post. (B) That’s not true. It was the triple brandies that did it.
In (25), speaker A identifies a cause for John’s collapse, choosing to express it with an after-phrase. However, the cause that he identifies is the remote cause, the collision with the lamp post. Speaker B can take issue with this, and claim that the ‘real’ cause of John’s collapse (by implication, the only ‘real’ cause at this level of granularity), was the brandy. This rejects speaker A’s assertion as a false claim concerning the causal chain in question, which implies in turn that such an assertion in such a scenario will only be understood as true if the relation specified by after is understood in purely temporal terms rather than in terms of indirect causation. If this is accurate, we see a striking parallel between the possibilities for pragmatic enrichment of relations among multiple verb phrases on the one hand and the possible relations among events expressed within a single verb phrase on the other. In turn, this is strongly suggestive that the same event structures can be composed from either single or multiple VPs as required.
3.4 Planning and Enablement
We now return to enablement, the relationship among events which licenses the single-event construal of appropriately interpreted examples such as (12). Section 3.3 noted an asymmetry between causation and enablement: a cause necessarily produces an effect in the most accessible possible worlds, but an enabling event does not necessarily lead to the goal event. This section concentrates on a further difference between the two, and relates both differences to
agentivity. The overall effect we observe is that an agentive subject can permit the formation of larger macroevents, making more use of enablement than is otherwise possible. This is corroborated by experimental evidence reported in Wolff (2003).
The further difference in question between causation and enablement is that the linguistically relevant causal relation is direct causation, whereas we do not distinguish between direct and indirect enablement relations. We saw some evidence in Sections 2.2 and 3.2 for the privileged status of direct causation over indirect causation in natural language, but if we apply similar tests to enablement relations, we find no distinction: both direct and indirect enablement relations are permitted among subevents of the same macroevent. In other words, for linguistic purposes, enablement, but not causation, is a transitive relation: if A enables B, and B enables C, then A enables C, but this does not hold for direct causation, the linguistically relevant case of causation.
I will continue to assume Fodor’s Generalization, that any set of happenings that can be described by a single verb phrase constitutes a single event. Now, what if that single event consists of three (or more) subevents at the relevant level of granularity, all related by enablement? A construal of the subevents as related directly by enablement is unavailable. If such a set of happenings can be covered by a single verb phrase, and therefore can constitute a single event, then subevents related by indirect, as well as direct, enablement relations can be contained within a single macroevent. It is hard to find clear examples of this configuration, as the notion of ‘relevant level of granularity’ remains a primarily intuitive one. However, the following is a plausible example.
(26) John is walking to the outdoor pursuits shop. A friend asks him where he is going. John replies ‘I’m going climbing’.
At one level, this should be false.
He is walking to the outdoor pursuits shop. However, there is a chain of events which makes this a legitimate claim to make. Let us assume that he was walking to the shop in order to buy carabiners. He then intended to use the carabiners to help him climb a rock face. We have, then, three events: (a) the walk to the shop; (b) the purchase of carabiners; (c) the climb. Moreover, it seems that none of these events can be omitted. (a) and (c) cannot be omitted, because they represent the current state of affairs and John’s stated goal, respectively. But (b) cannot be omitted because walking to the shop is only contingently related to climbing if something happens in the shop which enables climbing—the fact of going to the shop on its own does not make climbing any more or less likely, and we assume
that John believes this too. So we have three subevents. And each of these enables the next: going to the shop does not cause John to buy carabiners but it does enable it, and likewise for owning carabiners and going climbing. So (a) enables (b) and (b) enables (c). But that means that the enablement relation between (a) and (c) is indirect, according to the definition of directness in (9) of Chapter 2. Yet this clearly does not remove the possibility of construing these three events as a single macroevent, as they are described by a single verb phrase, going climbing, in (26).⁹ We must, then, claim that enablement differs from causation in that indirect enablement, but only direct causation, is relevant to Fodor’s Generalization. Why should this be? I believe that it is linked to a further difference between causation and enablement, hinted at in the previous section, which is this.
(27) Enablement relations form plans
Enablement relations can only be seen by, or attributed to, an agent (that is, a rational actor acting with the intention of reaching a specific goal), who chains together events standing in such relations to form plans.
In other words, causation is a relation concerning how we perceive events as relating to each other in the world at large. But enablement is a relation concerning how we perceive an agent’s beliefs about the way events relate to each other in the world at large, and how he hopes to chain these events together to form a plan. We return, then, to the triadic nature of enablement: whereas causation is a relation between events, enablement is a relation between a set of events and an individual, the agent who forms events into a plan. This is the major point of divergence in what I am saying from force dynamic models of causation and enablement, such as Talmy (1988), Wolff (2003), and Wolff (2007). For those authors, enable is a dyadic relation between an object with a tendency towards some endstate and an agonist which amplifies that tendency. There is not obviously any room for a distinction between agentive and nonagentive objects in that characterization (although we shall see presently that Wolff incorporates a notion of agentivity elsewhere in his definitions), and so it is unclear exactly how to capture the effect of agentivity on enablement, and through it, on plan-formation. The same problem does not arise in the present approach because enablement is characterized as a triadic relation between an agent and a set of events, whereas causation remains dyadic. Key to the notion of agentivity, on this definition, is the notion of goal-driven behaviour. These goals can be short- or long-term.
9 It is not clear to me, though, why we must say I’m going climbing, or I’m climbing today, for example, rather than simply I’m climbing.
For example, John could be walking just for the sake of walking, in which case his goal is immediately fulfilled (and fulfilled in an ongoing way) by the action he is currently performing. Or John could be walking in order to get to the shops in order to buy some carabiners in order to go climbing. In either case, John is an agent and so enablement relations can be attributed to him. If, on the other hand, John is not acting in a goal-driven way, no intermediate event enabled him to end up wherever he ended up. To see this requires a thought experiment. Assume that John’s actions are exactly the same as above (walking, outdoor pursuits shop, carabiner-buying, climbing) but that he didn’t mean to perform any of those actions. He was on his way to the shop next door, but stumbled into the outdoor pursuits shop by mistake. Once he was there, he decided to buy some Kendal mint cake, but somehow he found himself buying carabiners instead. By this time, he was feeling quite puzzled and somewhat foolish, but he decided he should at least make sure the carabiners have a good home, so he would give them to his friend, who lives at the bottom of Kilnsey Crag and likes climbing. But when he got there, some unknown force came over him, and rather than ringing his friend’s doorbell, before he knew it he was using the carabiners himself, attached to a rope halfway up Kilnsey Crag. Now, when, in these circumstances, could John truthfully say I am going climbing? When he was intentionally going climbing, he could truthfully say it as soon as he started on the chain of enabling events. When he only unintentionally found himself climbing, though, although his actions were the same at every stage, it could only truthfully be said when he actually is climbing. It seems, then, that an agent acting in a goal-driven way allows us to form larger events than we can form in the absence of such an agent. 
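The contrast between the intentional and unintentional versions of the climbing scenario in (26) can be sketched as follows. This is a deliberately crude illustration under stated assumptions: the event labels, the `PLAN` list, and the helpers `transitive_closure` and `can_describe_as_goal` are hypothetical, and a single Boolean stands in for the presence of a goal-driven agent as required by (27).

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (earlier, later) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Direct enablement links in John's plan, as in the discussion of (26):
# (a) walk to the shop, (b) buy carabiners, (c) climb.
PLAN = [("walk_to_shop", "buy_carabiners"), ("buy_carabiners", "climb")]


def can_describe_as_goal(current, goal, agent_has_plan, plan=PLAN):
    """Can the current activity be truthfully described by the goal verb?

    Assumption: enablement is transitive, so indirect links count, but
    enablement relations exist only relative to a goal-driven agent.
    """
    if current == goal:
        return True   # he actually is climbing
    if not agent_has_plan:
        return False  # no agent, no enablement relations to chain
    return (current, goal) in transitive_closure(plan)


# Intentional John, walking to the shop: 'I'm going climbing' is fine.
print(can_describe_as_goal("walk_to_shop", "climb", agent_has_plan=True))
# Unintentional John, same actions: not until he is on the rock face.
print(can_describe_as_goal("walk_to_shop", "climb", agent_has_plan=False))
print(can_describe_as_goal("climb", "climb", agent_has_plan=False))
```

Note that the pair (walk_to_shop, climb) is absent from `PLAN` itself and only appears in the closure, which corresponds to the indirect enablement relation between (a) and (c).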
In fact, a series of papers have drawn attention to the central role of plans and goals as cognitive or behavioural units. Although this is a staple of the cognitive psychology literature, it is perhaps not universally known among linguists. I will therefore briefly mention a few key results supporting this notion.
3.4.1 The Foundational Work
The insight that behaviour can usefully be segmented into units corresponding to goals is a venerable one. In the first half of the twentieth century, variations on this theme can be found in the writings of both Gestalt psychologists and behaviourists. Among the Gestalt psychologists, we find Köhler’s (1925) The Mentality of Apes. The main point of this work is to make an argument for attributing ‘insight’ to chimpanzees, by distinguishing behaviour which
Single Events, Multiple VPs, and Agentivity
occurs in the presence of a plan for problem-solving from random, insight-free behaviour. Suggestively, Köhler’s main criterion for spotting insightful behaviour is that insightful behaviour appears as a single, continuous behavioural unit. He writes:
We can . . . distinguish sharply between the kind of behaviour which from the very beginning arises out of a consideration of the structure of a situation, and one that does not. Only in the former case do we speak of insight, and only that behaviour of animals definitely appears to us intelligent which takes account from the beginning of the lay of the land, and proceeds to deal with it in a single, continuous, and definite course. Hence follows this criterion of insight: the appearance of a complete solution with reference to the whole lay-out of the field. (Köhler 1925: 198)
Note that this definition relies on our ability to quantify ‘courses’ of action. As with the discussion in this chapter and the preceding one, insight on Köhler’s definition is therefore in the eye of the beholder. This contrasts with the behaviourist insistence on measurability which is found in Tolman (1932). However, Tolman took pains to distinguish his brand of behaviourism from what he calls ‘molecular’ behaviourism, in which one ‘describe[s] behavior in terms of simple stimulus–response connections’ (p. 4). In contrast, he proposes a ‘molar’ behaviourism, operating in terms of larger units defined in terms of goals. He writes:
The complete identification of any behavior-act requires . . . a reference first to some particular goal-object or objects which that act is getting to, or, it may be, getting from, or both. . . . such a getting to or from is characterized not only by the character of the goal-object and this persistence to or from it, but also by the fact that it always involves a specific pattern of commerce-, intercourse-, engagement-, communion-with such and such intervening means-objects, as the way to get thus to or from. (Tolman 1932: 10–11)
Tolman showed at length that even the behaviourist’s staple diet of rats running through mazes could only make sense if their behaviour was viewed as purposive or goal-driven. However, the insistence on measurability meant that goals, purposes, and so on could not be construed as defining cognitive units. Rather, they were seen as behavioural units. The primary evidence of purpose in a rat’s behaviour, for Tolman, was that the rat is docile with respect to its behaviour in a maze. If a rat’s prior experience of mazes tells it that a particular type of behaviour is more likely to lead to food, then it will be predisposed to produce more of such behaviour in novel situations. However, this only makes sense if the rat’s goal, when in a maze, is to reach the reward as painlessly as possible.
Tolman’s understanding of purposive behaviour is therefore inevitably retrospective, cashed out in terms of patterns of learning. In contrast, from a Chomskyan perspective, the more striking feature of purposive behaviour is its creativity. In response to new situations, an animal creates new plans, within the limits imposed by species-specific cognitive limitations. Tied up with this distinction is another one, namely that human plans in response to novel situations, and indeed human behaviour in general, tend to have a strongly hierarchical organization. Tolman (pp. 98, 178) explicitly conflates hierarchical structure with linear order: a rat running through a maze towards an ultimate goal of food, sex, or whatever, may, after an amount of learning, come to proceed towards that ultimate goal in such a way as to suggest the presence of a string of intermediate subgoals. In part because of the nature of the mazes used to measure the behaviour of rats, and no doubt in part because rats are not the most intelligent of creatures, those subgoals are always adequately characterized in the work Tolman cites in terms of linear order: there is a chain of n subgoals, leading to an ultimate goal. When considering human planning and problem-solving, though, such a characterization appears quite inadequate. The hierarchical structure of our plans is emphasized by Miller, Galanter and Pribram (1960) (see also Newell, Shaw and Simon 1958). On Miller et al.’s theory, the basic building block of our plans is the ‘Test–Operate–Test–Exit unit’, or ‘TOTE unit’ for short, as represented in Figure 3.1. The ‘Test’ establishes whether or not, to put it crudely, things are as they should be. If they are, then we have ‘congruity’ and we can instantly exit. If not, we enter the ‘Operate’ phase, where the operation is designed to eliminate incongruity. After the ‘Operate’ phase, we test again for congruity. If the incongruity remains, we re-enter the ‘Operate’ phase, and so on.
Figure 3.1 A TOTE unit [diagram: Test leads to Exit on (Congruity) and to Operate on (Incongruity); Operate returns to Test]
Figure 3.2 Hammering a nail into a wall, in TOTE units [diagram labels: Test nail, (flush), (sticks up), Hammer, Test hammer, (up), (down), Lift, Strike]
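The control flow of Figure 3.2 can be made concrete in a few lines of Python. This is only an illustrative sketch, not Miller et al.'s own formalism: the names tote, Scene, and hammer_nail, the integer nail height, and the max_cycles safety cap are all assumptions introduced for the example. The point it is meant to show is that the Operate phase of one TOTE unit may itself be built from further TOTE units.

```python
from typing import Callable, List


def tote(test: Callable[[], bool], operate: Callable[[], None],
         max_cycles: int = 1000) -> int:
    """Run one Test-Operate-Test-Exit unit.

    Returns the number of times the Operate phase was entered.
    max_cycles is a safety cap assumed here; it is not part of the model.
    """
    cycles = 0
    while not test():              # Test: incongruity, so enter Operate
        if cycles >= max_cycles:
            raise RuntimeError("incongruity was never eliminated")
        operate()                  # Operate, then loop back to Test
        cycles += 1
    return cycles                  # Exit: congruity


class Scene:
    """Hypothetical state of the world for the hammering example."""

    def __init__(self, nail_height: int) -> None:
        self.nail_height = nail_height   # 0 means flush with the wall
        self.hammer_up = False
        self.log: List[str] = []

    def lift(self) -> None:
        self.hammer_up = True
        self.log.append("lift")

    def strike(self) -> None:
        self.hammer_up = False
        self.nail_height = max(0, self.nail_height - 1)
        self.log.append("strike")


def hammer_nail(scene: Scene) -> int:
    """Superordinate TOTE unit whose Operate phase is two nested TOTE units."""

    def hammer() -> None:
        # First subordinate unit: is the hammer up? If not, lift it.
        tote(lambda: scene.hammer_up, scene.lift)
        # Second subordinate unit: is the hammer down? If not, strike.
        tote(lambda: not scene.hammer_up, scene.strike)

    # Superordinate test: is the nail flush with the wall?
    return tote(lambda: scene.nail_height == 0, hammer)
```

For a nail that needs three strikes, hammer_nail(Scene(3)) passes through the superordinate Operate phase three times, and the log alternates lift and strike. If the nail is already flush, the superordinate test succeeds at once and nothing is logged.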
The hierarchical structure comes from the fact that the Operate phase of one TOTE unit may itself contain further TOTE units. For example, Figure 3.2, from Miller et al. (1960: 36), shows the structure of hammering a nail into a wall. The superordinate unit tests for whether or not the nail is still sticking out from the wall. If it is, we enter an Operate phase consisting of two further TOTE units. The first of these tests for whether the hammer is in the right position, and lifts it if not. The second brings the hammer down onto the nail. Once we have passed through these two TOTE units, we return to our original test, and check whether or not the nail is flush with the wall. If not, we re-enter the Operate phase, and so on. It is instructive to consider just how much of the fine structure of accomplishments is captured by this simple model. In particular, for simple cases like this, we find a broad analogue of the measuring-out effect discussed briefly in Chapter 2. The subordinate TOTE units form an alternating pair: if the hammer is down, then we raise it; and if the hammer is up, then we bring it down on the nail. The characterization of an activity as an unbounded series of small events (see, e.g., Dowty 1979 or Smith 1997) therefore comes for free. The superordinate TOTE unit provides the natural endpoint for the process of hammering: if the nail is flush with the wall, we achieve congruity and no longer enter the hammering phase of the TOTE unit. This is as much as we get for free. In particular, the mapping between progress through an event and through an incremental theme is not automatically captured. Moreover, something further has to be said to capture events
which are not inherently bounded. However, it is gratifying to see that a basic hierarchical model of plans like this one instantly captures something of the properties of the agentive events discussed in the previous chapter. This parallel is interesting because there is no principled upper bound to the recursive embedding of TOTE units within TOTE units. The same kinds of considerations which give us a basic account of the structure of accomplishments can also account for more complex plans. So long as there is a goal (in the form of elimination of some incongruity), that goal defines a unit. This is more explicit in Jackendoff (2007: ch. 4), where actions with more complex internal structure are diagrammed in a way reminiscent of phrase structure trees.10 The work of Miller et al., and particularly Newell et al., led to a rich literature on related issues within artificial intelligence, which I cannot summarize here (see Newell and Simon 1972 and Schank and Abelson 1977 for classic foundational work in this area). There is also a rich experimental tradition aiming to elucidate the perception of goal-driven behaviour by children, adults, and nonhumans (see Meltzoff 1995, Woodward 1998, Baldwin et al. 2001, and large amounts of research by Michael Tomasello and collaborators, and Gergely and Csibra, summarized in Tomasello et al. 2005 and Csibra and Gergely 2007, respectively). This research has very clearly shown that humans understand behaviour by animate (typically human) agents in terms of their intentions, and the goals towards which those intentions are directed, while they typically understand behaviour by inanimate objects, or animates acting unintentionally, in terms of purely physical characteristics. 
This means, for example, that children are more surprised when someone reaches for a new object in the same position where they previously reached for some other object than when they reach for the same object in a new location (Woodward 1998); or that, when children are shown an actor apparently trying and failing to perform some action on a novel object, they will subsequently reproduce the presumed target action rather than the actor’s failed attempts at producing that action (Meltzoff 1995).
10 This gross similarity between the syntactic structure of human language and the structure of human plans has been remarked upon in many places, including Miller et al. (1960), Steedman (2002), and Jackendoff (2007). Moreover, Miller and Chomsky (1963), the last of the three chapters by Chomsky and/or Miller in the Handbook of Mathematical Psychology, ends with a section entitled Toward a Theory of Complicated Behavior, expressing a degree of optimism for the possibility that the formal tools developed for the study of the syntax of natural language may be useful in the description of other behaviours as well. Fascinating though these speculations are, we do not need to commit ourselves to such a strong position. All that matters for the present argument is that goals define cognitive units. They do not need to be units in exactly the same sense that syntactic constituents are.
Something which, to my knowledge, has not received so much experimental attention is the connection between intentionality of action and the size of the units into which we parse that action. Several researchers have commented on our apparent reliance on two distinct types of causation in making sense of what happens around us, namely physical causation, as involved in the movements of inanimate objects; and psychological causation, which drives agentive behaviour. This distinction, pleasingly, is broadly similar to that drawn, on entirely different grounds, between ‘actions’ and ‘events’ by Pietroski (2000), based on logical considerations of nonidentity among events in complex actions. However, the psychological literature has not in general been concerned with the question of whether perception of intentional action is in any useful sense the same kind of perception as perception of unintentional happenings. For my present concerns of carving up happenings into meaningful perceptual units, that is obviously a crucial question. In this respect, the work of Phillip Wolff stands out. Wolff’s experiments have shown not only that the perceptual units of intentional and unintentional actions are the same kinds of units (which we shall continue to call ‘events’) but also that intentional events can be larger than unintentional events, and finally that this difference in event size is reflected in linguistic behaviour. I devote the following subsection to a summary of his 2003 paper.
3.4.2 Wolff’s Results
Wolff (2003) reports a series of experiments designed to elucidate the relationship between agentivity and event structure.
In particular, he aimed to test the validity of the no-intervening-cause criterion and the no-intervening-cause hypothesis, given below.11
No-intervening-cause criterion
Direct causation is present between the causer and the final causee in a causal chain (1) if there are no intermediate entities at the same level of granularity as either the initial causer or final causee, or (2) if any intermediate entities that are present can be construed as an enabling condition rather than an intermediate causer. (Wolff 2003: 4–5)
No-intervening-cause hypothesis
The linguistic coding of causal chains in English (and possibly in other languages) is determined by the concept of direct causation as defined by the no-intervening-cause criterion. Further, the way in which English speakers (and possibly speakers from other languages) individuate events is also determined by the concept of direct causation as defined by the no-intervening-cause criterion. In terms of linguistic coding, the no-intervening-cause hypothesis holds that in the absence of an intervening cause, a causal chain can be described by a single-clause sentence. In terms of events, the hypothesis holds that when there is no intervening cause, a causal chain can be construed as a single event. (Wolff 2003: 7)
11 The terminology in the definitions given below is reproduced verbatim from Wolff’s paper, and in some cases is at odds with how these terms are used elsewhere in the present work. In particular, Wolff’s conception of direct causation is much broader than mine, and subsumes relations of both enablement and direct causation, in my terms.
The first part of the no-intervening-cause criterion, and all of the no-intervening-cause hypothesis, are familiar from several works on causation and the individuation of events discussed above, such as Lewis (1973), Bittner (1999), and Fodor (1970). The interest, for our purposes, comes from the notion of enabling condition in part (2) of the definition of the no-intervening-cause criterion, making use of a very similar notion of enablement to that discussed in Section 3.2. This is why Wolff’s results are of particular interest to us: while the foregoing discussion strongly suggests that goals define cognitive units, nothing so far has suggested that these are the same cognitive units picked out by Fodor’s Generalization. By relating subjects’ perception of relations among subevents to their linguistic description of those events, Wolff allows us to relate these findings directly to the more linguistic concerns of the previous chapters. Wolff, following Talmy’s (1988) foundational work in this area, notes that we make a pretheoretical distinction between causation and enablement. This explains the fact that the English verbs cause and enable, taken to at least approximate these two relations, are appropriate in different circumstances.
(28) (a) The explosion caused the windows to shatter.
     (b) #The explosion enabled the windows to shatter.
(29) (a) Gasoline enables cars to run.
     (b) #Gasoline causes cars to run.
(Based on Wolff 2003: 7–8)
Although Wolff describes the relations of causation and enablement in terms of a theory of force dynamics based on Talmy (1988) and Jackendoff (1990), all the results that he reports are equally compatible, as far as I can see, with the approach taken above. I continue to assume, then, that enablement relations require the presence of an agent. Wolff ’s first experiment tests the case where there are no enablement relations to consider, and so direct causation is present only if there are no intermediate causes. Subjects were shown animations involving three marbles, from which the still in Figure 3.3, originally found in Wolff (2003: 16), is taken. In these animations, each marble rolled into the next, thereby causing the next marble to start to move.
Figure 3.3 Still from Wolff’s Experiment 1: all force-dynamically related objects are inanimate
There were therefore two unmediated causal chains, between the first and second, and second and third marbles; and a mediated causal chain, between the first and third marbles, with the second as an intermediary. As all participants in the causal chain are nonsentient, there is no question of this mediated causal chain representing an instance of direct causation (in Wolff’s sense), and so the presence of an intermediary forces the causal relation to be indirect. As they are throughout this series of experiments, the claims of the no-intervening-cause criterion regarding Wolff’s definition of direct causation are approached through the predictions of the no-intervening-cause hypothesis, namely that direct causation (or a single macroevent, in my terminology) allows lexical causative use and perception of a single event (while not necessarily blocking periphrastic, biclausal causative use or perception of multiple events). In that case, if subjects are asked to describe the interaction between either the first and second, or the second and third marble, with one of the two forms in (30), either option should be available. Equally, subjects should be able to perceive the interaction of either of these pairs of marbles as a single event. On the other hand, if asked about the relation between the first and the last marbles, only the periphrastic causative (30b) and a perception of two or more events should be possible. The observed data matched these predictions to a statistically significant extent.
(30) (a) The red marble moved the blue marble.
     (b) The red marble made the blue marble move.
This suggests that, when no sentient causers are involved, direct causation as defined in Chapter 2 determines the distribution of linguistic and perceptual single events. Experiment 2 contrasts this finding with the case where the initial causer is sentient. To represent this, a human hand was used in place of the first marble, as in Figure 3.4, reproduced from Wolff (2003: 16). With a sentient initial cause as represented in Figure 3.4, we predict that, in the case where the hand moves a marble, which subsequently causes another marble to move, a sentence like (31a), the analogue of (30a), should now be acceptable to describe the effect of the hand on the second marble, as well as (31b), as the first marble could be seen as enabling the human to bring about the second marble’s movement. Again, the results supported the prediction to a statistically significant extent.
(31) (a) The man moved the blue marble.
     (b) The man made the blue marble move.
Finally, Experiment 3 showed that more than sentience was at issue in the definition of direct causation. Pairs of animations were prepared which differed in that the sentient initial causer apparently intends to bring about the result state in one but not in the other. For instance, one pair consists of a
Figure 3.4 Still from Wolff’s Experiment 2: the initial element in the causal chain is animate
woman waving her hand towards smoke rising from an ashtray, and a woman walking past smoke rising from an ashtray, as in Figures 3.5 and 3.6, from Wolff (2003: 22). In both cases, the smoke disperses, but only in the former case is this an intended consequence of the woman’s actions. Only in the former case, then, is the woman behaving agentively, making enablement relations available for macroevent formation. Furthermore, if we are to perceive a causal relation between the woman’s actions and the smoke dispersing in Figure 3.6, we arguably need to consider an intermediate causal event (the woman creates a draught), thereby blocking a relation of direct causation between the woman’s movement and the smoke’s dispersal. In other words, neither causation nor enablement relations allow us to construe the whole sequence as a single event. As predicted, then, subjects were more willing to use lexical causatives, for example as in (32a) as opposed to (32b), and perceived a single event more readily, for the intended scenarios than for the unintended cases.
(32) (a) The woman dispersed the smoke.
     (b) The woman made the smoke disperse.
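The predictions tested across the three experiments can be summarized in a small sketch. It is a deliberate simplification and not Wolff's own model: the names Chain, direct_causation, and available_descriptions are hypothetical, the granularity condition in clause (1) of the criterion is reduced to the mere presence or absence of an intermediary, and sentience (Experiment 2) is collapsed with intention (Experiment 3) into a single flag. On the reinterpretation adopted here, an intermediary counts as an enabling condition only when the initial causer is an agent aiming at the result.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chain:
    """A causal chain from an initial causer to a final causee (hypothetical)."""
    causer_intends_result: bool          # assumption: one flag for agentivity
    intermediaries: List[str] = field(default_factory=list)


def direct_causation(chain: Chain) -> bool:
    """Simplified no-intervening-cause criterion."""
    if not chain.intermediaries:
        return True                      # clause (1): nothing intervenes
    # Clause (2): intermediaries can be construed as enabling conditions
    # only if an agent is pursuing the result.
    return chain.causer_intends_result


def available_descriptions(chain: Chain) -> List[str]:
    """The periphrastic form is never blocked; the lexical form needs directness."""
    forms = ["periphrastic causative"]
    if direct_causation(chain):
        forms.append("lexical causative")
    return forms


# Experiment 1: marbles only.
adjacent_marbles = Chain(causer_intends_result=False)
first_to_third = Chain(False, ["second marble"])
# Experiment 2: a hand moves the second marble by way of the first.
hand_to_second = Chain(True, ["first marble"])
# Experiment 3: the same draught, intended or not.
waving = Chain(True, ["draught"])
walking_past = Chain(False, ["draught"])
```

On these assumptions, the lexical causative is predicted to be available for adjacent_marbles, hand_to_second, and waving, but not for first_to_third or walking_past, which matches the pattern reported above.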
These experiments clearly support the conclusion that perception of an action or series of actions as goal-oriented increases our willingness to admit larger
Figure 3.5 Still from Wolff’s Experiment 3: the initial element in the causal chain is animate, and intends to bring about the goal event
Figure 3.6 Still from Wolff’s Experiment 3: the initial element in the causal chain is animate, but does not intend to bring about the outcome of the causal chain
portions of those actions as a single event.12 Equivalently, people accept a more coarse-grained representation of a series of happenings if they recognize those happenings as enabling an agent’s attempt to reach a remote goal. Or, in other words, if the subject is an agent, macroevent formation can apply more expansively, exactly the same conclusion we reached above on independent grounds. This is exactly the distinction which Talmy’s (2000) theory of the windowing of attention leads us to expect.
12 Ad Neeleman points out a further prediction of this approach, namely that the same animation could be construed as part of a single event, if it formed part of a larger plan. Such a state of affairs could arise in the following scenario. A group of zookeepers like to disperse smoke rising from an ashtray by letting a herd of angry rhinoceri charge down a corridor past the offending ashtray. The rhinoceri themselves do not intend to disperse the smoke, but the zookeepers intend the rhinoceri to disperse the smoke. The present theory predicts that we should be able to perceive such a series of happenings as a single event, and describe it using the phrase the zookeepers dispersed the smoke, despite the fact that it contains a proper part which is identical in relevant respects to the animation from which Figure 3.6 was taken. My intuitions are that this is correct, but unfortunately it is a prediction which was not tested by Wolff.
We now need to know how to fit this fact into our model of the internal structure of events, which currently consists only of the maximal core event represented in Figure 2.1 of Chapter 2, repeated here as Figure 3.7.
Figure 3.7 The maximal core event [diagram: vertical axis Change, from insignificant to significant; horizontal axis Time; components PROCESS and CULMINATION]
Clearly, the relation between the two components of this core event is insufficient to cover the enablement relations discussed above. For example, in (33), we may wish to consider the accomplishment described in the matrix VP as forming a single macroevent with the remote goal expressed in the rationale clause. However, as the matrix event already consists of a fully specified process and culmination, the goal of communicating his inner rage cannot be identified with either of these components.
(33) John drew a picture in order to communicate his inner rage.
For this reason, I introduce the notion of an extended event, as defined in (34).
(34) An extended event consists of a series $e_1, \ldots, e_n$ of core events, such that:
     (a) $e_1$ occurred and is agentive;
     (b) The agent of $e_1$ intends $e_n$ to occur;
     (c) For every $e_k$, $1 \le k < n$, the agent of $e_1$ believes either that $e_k$ causes $e_{k+1}$ or that $e_k$ enables $e_{k+1}$.
Although the decision I have made to distinguish between core events and extended events is really just terminology, it is also intended to reflect a difference between the types of evidence adduced in favour of core events and extended events. In the case of core events, the decompositional structure which may be contained within is not immediately obvious to the observer, and, indeed, is not universally accepted by theoreticians even today. Arguments for a decompositional model of core event structure rest on discoveries, largely dating back to work in Generative Semantics such as Lakoff (1970), concerning morphological alternations, entailment patterns, and so on, between apparently distinct lexical predicates. Whatever theoretical position one subscribes to today, there is substantial agreement across the spectrum from Reinhart (2002) via Ramchand (2008) to Borer (2005) concerning the sort of data for which a theory of core event structure and related matters is to be held accountable, and it consists first and foremost of the bread and butter of theoretical linguistics, that is, the sort of low-level, strictly linguistic data which passes under the radar of most naïve language users.
On the other hand, the evidence in favour of extended events as units is much more general. The fact that people talk about plans and goals is something of which most people are aware, and not necessarily of any great ‘core’ morphosyntactic interest. No-one has tried to build a theory of phrase structure around the structure of extended events, for instance, while some researchers have been attempting to assimilate the structure of core events to basic phrase structure for at least 45 years now. Instead, we have seen that the evidence in favour of extended events as units comes primarily from descriptions of behaviour and of perception of action. I take this distinction between the types of evidence for core events and extended events as cognitive units to reflect a robust intuition that the two are different in some way: coarse graining can make us unaware of the relation between core events and their component parts to a much greater extent than it can with extended events and their component parts. In this light, the great interest of Wolff ’s results, as reinterpreted here, is that they suggest that, despite this difference, there is some common ground in that the direct causation observed in core events and the enablement observed in extended events have similar effects on the units we perceive and the way in which we report those units verbally. An extended event, then, consists of a series of core events, the first of which is actually performed by an agent who intends the last event in the sequence (the goal event) to occur. The evidence suggests that an extended event is still a single event, however. We also make a further prediction that extended events can never be formed on the basis of states, although they can feature states as goals.13 This is because states are nonagentive—although an act of will may be involved in exerting oneself to maintain a state, this still implies a dynamic configuration of forces, as emphasized by Wolff (2007). 
It therefore seems (as first noted, I believe, by Barbara Partee) that some states start to behave like activities if their subject is agentive. States do not usually form a progressive, as shown in (35). Adjectival states, at least, do allow a progressive form, but only with an agentive reading, which is obligatorily absent if the progressive is absent (36).14 As the ability to take the progressive is the criterion distinguishing states from activities for Vendler, it appears that adding agentivity converts a state into an activity.
(35) *John is knowing the answer.
13 Could states also feature as intermediate events? I suspect not, because states are, by definition, static, and the whole point of a plan is to set in motion a chain of dynamic events. At best then, the inchoation or termination of states could function as intermediate links in a planning chain.
14 Why this should only apply to states described by copula + adjective is beyond me.
Figure 3.8 Extended events [diagram labels: (PROCESS), (CULMINATION), GOAL AND ≥ 0 INTERMEDIATE EVENTS]
(36) (a) John is annoying. [nonagentive]
     (b) John is being annoying. [agentive]
If states are nonagentive, then we predict that they cannot appear as the initial subevent in an agent’s plan. Accordingly, it should not be possible to modify them with rationale clauses, which describe the initial and goal subevents of just such a plan. This prediction is borne out most strikingly by minimal pairs such as the following. Only when be is added in (37b) to form an activity from a state does the sentence become felicitous.
(37) (a) *John is annoying to make his brother laugh.
     (b) John is being annoying to make his brother laugh.
Pictorially, although it is hard to find a coherent, economical, and general representation of extended events, we may consider a representation as in Figure 3.8. The parentheses around the initial process and culmination are intended to represent their optionality: although the initial event must be some form of agentive dynamic event, its internal structure is not significant. And the dashed lines for events beyond the initial event are intended to represent the fact that those events may or may not actually occur.
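The definition in (34) can also be read as a simple well-formedness check, sketched here in Python. The sketch is illustrative only: the class names CoreEvent, Plan, and Link, the helper climbing_plan, and the encoding of the agent's beliefs as a list with None for a missing belief are all assumptions made for the example. States are not given a separate flag, since on the present account they are nonagentive and so already fail clause (a).

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Link(Enum):
    """Relation the agent believes to hold between adjacent core events."""
    CAUSE = "cause"
    ENABLE = "enable"


@dataclass
class CoreEvent:
    label: str
    occurred: bool = False
    agentive: bool = False   # states count as nonagentive on this account


@dataclass
class Plan:
    events: List[CoreEvent]               # e_1 ... e_n
    goal_intended: bool                   # the agent of e_1 intends e_n
    believed_links: List[Optional[Link]]  # entry k relates e_k to e_(k+1)


def is_extended_event(plan: Plan) -> bool:
    """Check clauses (a) to (c) of the definition in (34)."""
    if not plan.events or len(plan.believed_links) != len(plan.events) - 1:
        return False
    first = plan.events[0]
    return (
        first.occurred and first.agentive                              # (34a)
        and plan.goal_intended                                         # (34b)
        and all(link is not None for link in plan.believed_links)      # (34c)
    )


def climbing_plan(intentional: bool) -> Plan:
    """John's walk to the crag, with or without a plan behind it."""
    labels = ["walk", "reach the shop", "buy carabiners", "climb"]
    events = [CoreEvent(label) for label in labels]
    events[0] = CoreEvent("walk", occurred=True, agentive=True)
    links: List[Optional[Link]] = [Link.ENABLE if intentional else None] * 3
    return Plan(events, goal_intended=intentional, believed_links=links)
```

Under these assumptions, climbing_plan(True) qualifies as an extended event as soon as the walking has occurred, whereas climbing_plan(False) does not, mirroring the contrast in when John can truthfully say I am going climbing. A plan whose initial subevent is the state in (37a) fails clause (a), while its agentive counterpart in (37b) passes.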
3.5 Agentivity, Aspectual Classes, and the Progressive
We turn now to a further way in which agentivity interacts with event structure, namely that it offers a new perspective on the question of whether or not to consider accomplishments and achievements as separate aspectual classes. Both positions have well-established pedigrees within the literature. The two classes have been considered as distinct since Vendler (1957), but there also exists an older tradition, apparently stemming ultimately from Aristotle, in which they are not distinguished. For more recent discussion, see Dowty (1979) and Parsons (1990), following Vendler, and Mourelatos (1978) and Verkuyl (1989), following Aristotle, among many others in both cases.
Usually when two distinct positions coexist for a long period of time, this is because there is a grain of truth in both, and this, I think, is the case here. As will have been clear from Section 2.3, I place myself in the latter tradition, but we cannot deny, following the discussion in that section, that for all the similarities between accomplishments and achievements, there remains one clear difference. We showed in Section 2.3 that progressives can be formed from many Vendlerian achievements. However, we also hinted there at an interpretive difference. Progressives formed from achievements have only a ‘prospective’ interpretation, whereby the culmination specified by the achievement is imminent, in some sense. Meanwhile, progressives formed from accomplishments are acceptable even when the culmination is quite remote. For instance, as discussed earlier in this chapter, one can claim to be climbing a mountain (usually considered an accomplishment) before even taking a step towards the top, but one cannot claim to be reaching the summit (an achievement) until one is almost there. Now, what distinguishes climbing a mountain from reaching a summit, in terms of the tools we have at our disposal? The most obvious distinction is that the preparatory process of climb a mountain consists of a series of agentive actions.15 Before the actual clambering upwards starts, there is the business of preparing equipment, planning routes, travelling to the foot of the mountain, and so on, carried out with the intention of climbing that mountain. Meanwhile, the preparatory process of reach the summit consists only of approaching a particular location, which is not an agentive action: a piece of wood could reach a summit as easily as a human could, if it was blown there by a strong wind.
Accomplishments involve agentivity, then, and so can take the form of extended events, while achievements are characterized as nonagentive, and therefore do not have extended events available to them.16 This explains why progressives of accomplishments, but not achievements, are acceptable when the culmination is remote. In the achievement case, the process and the culmination must be part of the same core event, whereas in the accomplishment case, they must only be part of the same extended event, a requirement which is much less strict about their temporal proximity. This, in turn, means that the implication that the culmination is reached is cancelled more readily with the progressive of an accomplishment than of an achievement.
(38) (a) I was climbing the mountain, but I stopped.
     (b) #I was reaching the summit, but I stopped.
15 See Levin and Rappaport Hovav (2008) for evidence that the basic meaning of climb describes only an agentive behaviour, despite apparent counterexamples such as The plane climbed to 35,000 feet.
16 The relevance of agentivity for Vendler’s distinction between accomplishments and achievements had been noted already in Verkuyl (1989), and to some extent in Pustejovsky (1991). However, Verkuyl collapsed the two classes completely, while Pustejovsky later (1995) adopted an alternative approach to the distinction in terms of event headedness, a notion I will not discuss here.
In the accomplishment case, we are willing to accept reaching the summit as a goal of climbing the mountain, related to it as part of an extended event. However, we are also willing to accept that plans change, and that this goal may never be reached. In the achievement case, on the other hand, the progressive is licensed not by a remote goal but by a more or less physical ‘inertia’, and we are consequently much less willing to accept that things didn’t turn out as they should have, unless we are given good reason, as in (39).
(39) I was just reaching the summit, but a particularly unfriendly bunch of hikers blocked my way, and I never made it.
Distinguishing between accomplishments and achievements in terms of agentivity allows a further prediction which is beyond the reach of a purely event-structural approach to this distinction. This relies on the fact that, although achievements are apparently always nonagentive, as shown by the semantic clash with the agent-oriented adverb in (40), at least some ‘accomplishments’ (according to the traditional Vendlerian classification) are typically agentive but can, in some cases, be nonagentive. And still other accomplishments are necessarily agentive. This is illustrated for destroy in (41), which contrasts with the necessarily agentive play in (42).
(40)
(a) *John deliberately died. (b) *John deliberately arrived.17 (c) *John deliberately reached the summit.
(41)
(a) John deliberately destroyed the house. (b) John unintentionally destroyed the house.
(42)
(a) John deliberately played a Schubert scherzo. (b) #John unintentionally played a Schubert scherzo.
17 Do not be put off by examples such as John deliberately arrived late. In such cases, deliberately modifies only late. Therefore, we contrast John’s late arrival with alternatives in which he arrived on time or early, but we do not contrast it with alternatives in which he did not arrive at all.
Although (42b) is grammatical, and easily interpretable, the interpretation we arrive at is one where John intended to play something, but not the Schubert scherzo. Maybe he sat down to play a Schumann sonata, but the wrong music had been put on the music stand, and he didn’t realize. John was acting
agentively in (42b), then, even if the result wasn’t the one he intended. In contrast, in (41b), John didn’t necessarily intend to do anything. He could have fallen asleep with a cigarette in his mouth and burnt the place down. There appears, then, to be a contrast between canonical accomplishment verbs like play, which require an agentive subject, and verbs like destroy, which are less fussy. A Vendlerian approach would automatically class both of these as accomplishments, and a sentence such as (41b) would therefore be a nonagentive accomplishment. On the approach taken here, though, a nonagentive accomplishment is a contradiction. An accomplishment is agentive, by definition, and an accomplishment with the agentivity removed becomes an achievement. So whereas Vendler treats aspectual class as a lexical property of the verb (or, on a more charitable reading in view of the objections in Verkuyl 1993 and elsewhere, as a function of the lexical items making up the VP), lexical entries underdetermine the accomplishment–achievement divide on the theory advanced here. Exactly the same sentence can be an accomplishment in one case and an achievement in another, depending on whether the subject is construed as acting agentively or not. This prediction is borne out. If we accept that the best diagnostic of the accomplishment–achievement distinction in this area of English is that only achievements require the culmination to be imminent when used in the progressive, then we can show that the same sentence behaves sometimes like an accomplishment, and sometimes like an achievement, with respect to the progressive, depending on the agentivity of the subject. I will show this through a series of stories, starting in (43). (43) It had been a disastrous picnic, one which was really best forgotten. Tom clearly agreed, as he had picked up a nearby can of petrol and a box of matches, and was now approaching the leftovers with a look of steely intent on his face. Dick frowned. 
‘What’s wrong?’, asked Harry. ‘Tom’s destroying what’s left of the food’, said Dick. Here, Tom has a plan to destroy the leftovers by bringing some petrol toward them, dousing them, and setting them on fire. This is a remote goal—the food is not currently being destroyed, or even affected, by Tom’s moving the petrol towards the leftovers. However, as the subject is acting agentively in this preparatory process, we predict that destroy what’s left of the food is an accomplishment in this case, and so the progressive can be used to describe such a remote action. This prediction is correct. This contrasts with (44). In (44a), the subject is not behaving agentively— the alcohol has taken care of that. This means that Tom here patterns with the
nonsentient, and so by definition nonagentive, subject in (44b). As a nonagentive subject defines an achievement, as opposed to an accomplishment, we predict that a use of the progressive in these cases is only felicitous if the arrival of the culmination point is imminent. As this is not the case in (44), the sentences are infelicitous. (44)
(a) It had been a gorgeous picnic, but with one drawback. Far too much alcohol had been involved. Most of the picnickers were now sleeping it off in the shade, with three exceptions, Tom, Dick and Harry. Tom was amusing himself with a wayward, uncoordinated dance that was bringing him inexorably closer to the leftovers. Harry, who had stayed sober, surveyed the scene and frowned. ‘What’s wrong?’ asked Dick. #‘Tom is destroying what’s left of the food’, said Harry. (b) It had been a gorgeous picnic on the beach, but now it was time to leave. The picnickers had arrived at low tide, and placed their blanket near the shore, but the tide had turned, and now each wave came a little closer to the leftovers. Tom surveyed the scene and frowned. ‘What’s wrong?’ asked Bill. #‘The sea is destroying what’s left of the food’, said Tom.
However, there is nothing inherently wrong with the progressive forms in (44). As these are progressives formed from achievements, we predict that they will be acceptable if the culmination is imminent rather than remote as in (44). By shifting the context to one where the destruction is well under way, as in the cases in (45), we rescue the sentences. (45)
(a) It had been a gorgeous picnic, but with one drawback. Far too much alcohol had been involved. Most of the picnickers were now sleeping it off in the shade, with three exceptions, Tom, Dick and Harry. Tom was amusing himself with a wayward, uncoordinated dance that had landed him in the middle of the leftovers, which he was obliviously kicking about and trampling into the earth. Harry, who had stayed sober, surveyed the scene and frowned. ‘What’s wrong?’ asked Dick. ‘Tom is destroying what’s left of the food’, said Harry. (b) It had been a gorgeous picnic on the beach, but now it was time to beat a hasty retreat. The picnickers had arrived at low tide, and placed their blanket near the shore, but the tide was coming in with astonishing speed, and was now lapping around the leftovers, which the picnickers hadn’t had a chance to salvage. Tom surveyed
the scene and frowned. ‘What’s wrong?’ asked Bill. ‘The sea is destroying what’s left of the food’, said Tom. Clearly, then, the same sentence can behave like an accomplishment in some cases and an achievement in others, purely on the basis of the agentivity of the subject. This offers clear support to the proposal that the accomplishment– achievement distinction, as defined here, belongs outside of the theory of aspectual classes, and instead in the domain of the interaction between agentivity and event structure. In this way, we see that both the Vendlerians, aiming to show that the two classes are distinct, and the Aristotelians, hoping to prove that they are identical, were correct in a way. In sum, an agentive subject allows macroevents to correspond to larger sets of happenings than would otherwise be the case. We attributed this to the transitivity of enablement relations and the intransitivity of direct causation, the only linguistically relevant causal relation. On the assumption that enablement is a relation between an agent and a set of events, such that the agent sees these events as forming a chain enabling him ultimately to reach a goal, we derive the link between agentivity and event size. This link was formalized through the definition of extended events, conceived of as a series of core events, each standing in a contingent relation to the next. We showed that such a definition also allowed us to recast the distinction between accomplishments and achievements in terms of agentivity, offering a third way in the decades-old debate concerning the unity or disunity of those two classes. Also in this chapter, we saw that similar restrictions on the structure of events apply in different syntactic configurations. 
Specifically, the categorization of sets of happenings as single events or as multiple events applies in the same way, regardless of whether those events are represented by a single verb phrase, or multiple verb phrases in an appropriate structural configuration. This is the source of the potential discrepancy between syntactic and semantic structures which motivates an event-structural constraint on wh-movement in Part II. This marks the end of the basic definition of event that I will propose in this work. This means that we now have a new set of units from which we can build larger structures. The following chapter investigates the formal possibilities for, and empirical necessity of, building larger semantic structures on the basis of the set of events delimited in Chapters 2 and 3.
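The prediction tested in the picnic stories (43)–(45) can be restated as a small decision rule. The following is only a minimal sketch, assuming that each story can be reduced to two yes/no judgements (whether the subject is construed as agentive, and whether the culmination is imminent). The class name Scenario, the two function names, and the way the stories are coded are illustrative assumptions, not part of the theory itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    label: str
    agentive: bool  # is the subject construed as acting agentively?
    imminent: bool  # is the culmination imminent (destruction under way)?


def aspectual_class(s: Scenario) -> str:
    # On the approach in the text, agentivity alone decides the class.
    return "accomplishment" if s.agentive else "achievement"


def progressive_ok(s: Scenario) -> bool:
    # Accomplishments tolerate a remote culmination in the progressive;
    # achievements require the culmination to be imminent.
    return aspectual_class(s) == "accomplishment" or s.imminent


# Hypothetical coding of the stories for "destroy what's left of the food".
SCENARIOS = [
    Scenario("(43) Tom approaching with petrol and matches", True, False),
    Scenario("(44a) drunk Tom dancing towards the leftovers", False, False),
    Scenario("(44b) tide still approaching the leftovers", False, False),
    Scenario("(45a) drunk Tom trampling the leftovers", False, True),
    Scenario("(45b) tide lapping around the leftovers", False, True),
]

for s in SCENARIOS:
    print(s.label, aspectual_class(s), progressive_ok(s))
```

Under this coding, the rule marks the progressive as acceptable in (43) and (45) and as infelicitous in (44), matching the judgements reported for those examples.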
4 Structures Built from Events
4.1 Introduction
Over the past two chapters, I have made a series of increasingly expansive proposals concerning the internal structure of events. I have now reached the end of that process. However, before we return to issues of locality, this chapter pursues two interrelated goals. The first is to demonstrate that, now that we have a well-motivated set of event structures at our disposal, we can use the events that fit those structures as the atoms of higher-level grouping operations. In fact, such higher-level structure, in the form of descriptions of temporal units, rather than of events, is not only available in principle but empirically necessary. This claim is nothing new, of course. As with some of the material in Chapter 2, the semantic details in this area have already been much more fully, and more carefully, worked out by Kamp (1979), van Benthem (1983), and Landman (1991), along with a range of less formal accounts. I give two related pieces of evidence below concerning VP-modifiers to support this first claim. One piece, based on Lasersohn (1992), suggests that the adverbial alternately cannot be treated as a predicate over single events in anything like the sense of event defended in preceding chapters. The other piece of evidence returns to a puzzle that we left hanging in Section 3.1, concerning the behaviour of temporal PPs such as on Friday under VP-coordination. Similarly to alternately, we will see that the variable of which such modifiers are predicated cannot be the same variable which is existentially quantified by T. This leads us to revise the simple theory of the interaction of tense and the event variable proposed in Higginbotham (1985), according to which tense existentially quantifies the event variable and situates it with respect to the speech time in the temporal order.
The second goal of this chapter is to show, as implicit in Higginbotham (1985) and explicit in much subsequent work, that there is an upper limit on the syntactic level at which an event variable can remain λ-abstracted. The guiding intuition is that, at some point in the course of the syntactic derivation and the compositional semantic interpretation thereof, we switch from
manipulation of possibly internally complex event descriptions to situation of those events in time. This means fixing the event descriptions once and for all, and moving on to matters of the relationships between the times at which those events occur and the reference and speech times. As a consequence, manipulations of event variables such as those described in preceding chapters are impossible above the point at which we make this shift from event structure to temporal structure. Looking ahead to our return to locality in Part II, this gives us a principled reason to expect syntactic height effects in the locality data: once we have moved on from event structure to temporal structure, there is no going back to perform the sort of manipulations which would be necessary to meet the Single Event Condition and thereby allow extraction. The structure of this chapter is as follows. Firstly, in Section 4.2.1, I return to the Davidsonian discussion of modification which we started in Section 3.1, to consider a wider range of data than were previously relevant. From there, we move in Section 4.2.2 to the related problem noted by Lasersohn (1992), and, in Section 4.3, to a sketch of Lasersohn’s solution to it. We will see that Lasersohn’s solution to this latter problem straightforwardly offers an account of the previous problem, too. Along the way, though, much comes to depend on the role of an operator, discussed in Section 4.4, which shifts its argument from a predicate over event variables to a predicate over temporal units, which, following Kamp, can be cast in type-theoretic terms as sets of sets of events. In other words, there comes a point in the mapping from event structure to temporal structure at which we are dealing with different elements from the event descriptions characterized in the rest of Part I. Luckily, this is precisely the conclusion we need to make our locality theory fly in Part II.
4.2 The Problems
4.2.1 VP-Coordination and Temporal Modification
In Section 3.1, we saw that Davidson’s analysis of action sentences as event descriptions, particularly as syntacticized by Higginbotham (1985), leads us to treat VPs as denoting properties of events. And given standard theories of coordination, this means that multiple coordinated VPs also denote properties of events. This was taken in Chapter 3 as straightforward evidence that multiple VPs could in principle jointly describe a single event. However, things get more involved if we add a temporal adverbial to each VP prior to coordination. As (1) shows, coordinating two such VPs leads to a representation in which a single event is taken to be located at two different times.
(1) John eats fish on Monday and drinks wine on Saturday:
(i) [[eat fish on Monday]] = λeλx.(eat(x, fish, e) ∧ on(e, Mon))
(ii) [[drink wine on Saturday]] = λeλx.(drink(x, wine, e) ∧ on(e, Sat))
(iii) [[eat fish on Monday and drink wine on Saturday]] = λeλx.(eat(x, fish, e) ∧ drink(x, wine, e) ∧ on(e, Mon) ∧ on(e, Sat))
Given our flexible theory of event size and of possible event descriptions, we should not immediately rule out the existence of a single event covering both Monday and Saturday and consisting of a fish-eating and a wine-drinking. However, even if we admit that (1) does not in principle denote a contradiction, it is clear that it does not represent the meaning of the sentence accurately. This is because there is no way to get from the last line of (1) to an interpretation which associates the fish-eating specifically with Monday and the wine-drinking specifically with Saturday. We would end up with a truth-conditionally equivalent last line if we swapped the time adverbials around, as in (2).
(2) John eats fish on Saturday and drinks wine on Monday.
There seems at first to be a simple solution to this problem. This involves assuming with Higginbotham that the event variable will eventually be bound by a tense head. We can then coordinate the conjuncts after the introduction of T⁰, by which point the event variables will have been bound separately within each conjunct. In that case, there is no possibility of associating an adverbial with the wrong VP, and (1) and (2) come out with different truth conditions, as (3) shows. I omit the details of the derivation, but reconstructing them should be straightforward.
(3)
(a) [[(1)]] = λx(∃e₁.(eat(x, fish, e₁) ∧ on(e₁, Mon)) ∧ ∃e₂.(drink(x, wine, e₂) ∧ on(e₂, Sat)))(j)
(b) [[(2)]] = λx(∃e₁.(eat(x, fish, e₁) ∧ on(e₁, Sat)) ∧ ∃e₂.(drink(x, wine, e₂) ∧ on(e₂, Mon)))(j)
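The contrast between a single shared event variable, as in the last line of (1), and separately bound variables, as in (3), can be checked mechanically in a toy model. What follows is a minimal sketch under stated assumptions: an event is modelled as a bundle of facts plus a set of days on which it is located, and the names Event, shared_variable, and separate_variables are hypothetical labels introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    facts: frozenset  # (predicate, agent, theme) triples true of the event
    days: frozenset   # days on which the event is located


def on(e, day):
    return day in e.days


def eat(x, theme, e):
    return ("eat", x, theme) in e.facts


def drink(x, theme, e):
    return ("drink", x, theme) in e.facts


def shared_variable(x, eat_day, drink_day, events):
    """Pattern of the last line of (1), existentially closed: one e throughout."""
    return any(
        eat(x, "fish", e) and drink(x, "wine", e)
        and on(e, eat_day) and on(e, drink_day)
        for e in events
    )


def separate_variables(x, eat_day, drink_day, events):
    """Pattern of (3): each conjunct binds its own event variable."""
    return (
        any(eat(x, "fish", e1) and on(e1, eat_day) for e1 in events)
        and any(drink(x, "wine", e2) and on(e2, drink_day) for e2 in events)
    )


# Hypothetical model: a fish-eating on Monday and a wine-drinking on Saturday.
MODEL = [
    Event(frozenset({("eat", "j", "fish")}), frozenset({"Mon"})),
    Event(frozenset({("drink", "j", "wine")}), frozenset({"Sat"})),
]

# Swapping the adverbials only reorders conjuncts under a shared variable,
# so (1) and (2) cannot be told apart.
print(shared_variable("j", "Mon", "Sat", MODEL) == shared_variable("j", "Sat", "Mon", MODEL))
# With separately bound variables, (1) is true in this model and (2) is false.
print(separate_variables("j", "Mon", "Sat", MODEL), separate_variables("j", "Sat", "Mon", MODEL))
```

The equivalence in the shared-variable case does not depend on this particular model: since the two on-conjuncts constrain the same variable, exchanging the days can never change the result.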
Something like this option must be possible in any case, because it is possible for two conjuncts to bear distinct tenses, as in (4).
(4) Richard moved here from Glasgow and lives in a shared house.
A single T⁰ node should be valued as either past or nonpast, and so is unable to assign both present and past tense. We must therefore assume that each conjunct contains its own T⁰ node, which in turn forces coordination above the T⁰ level (but below the subject) in an example such as (4). This, in turn, means that (3) represents a potential solution to the basic adverbial attachment problem. However, problems remain. The most obvious one is that examples
can be constructed involving coordinated VPs modified by time adverbials, which nonetheless are both c-commanded by a single element, which is itself no higher than T⁰. Two such cases involve auxiliaries, including modals, and negation. On any regular account of the phrase structure of English, negation occurs between the VP and T⁰; this is the reason underlying the classic do-support paradigm with negation (Chomsky 1955, Chomsky 1957). And auxiliaries and modals may be taken as heading their own projection below T⁰, or as being generated in T⁰ itself, but are certainly assumed never to be generated above T⁰. On the account given above, according to which coordinated VPs modified by time adverbials must be coordinated at or above the T level, the prediction is clear: coordinated VPs modified by time adverbials, but jointly c-commanded by a single occurrence of negation or an auxiliary, should be impossible, or should at least give rise to the incorrect truth conditions seen above. This prediction is clearly false, as shown by examples such as the following: (5)
(a) Negation: John [didn’t [[eat fish on Monday] and/or [drink wine on Saturday]]]. (b) Auxiliary: John [has [[eaten fish on Monday] and [drunk wine on Saturday]]]. (c) Modal: John [might [[eat fish on Monday] and [drink wine on Saturday]]].
Clearly, then, coordinated VPs modified by time adverbials can somehow be represented as coordinated VPs, even if larger constituents can also be coordinated. This implies, then, that there is something wrong with the specific implementation of the interpretation procedure for VP coordination sketched in Section 3.1.
4.2.2 Alternately
A related problem concerning a different class of modifiers was noted by Lasersohn (1992). This time, the culprit is the following.
(6) The room was alternately hot and cold.
If we assume that nothing is hot and cold at the same time, then the adjective phrase hot and cold will necessarily have an empty extension at any single time. This is equally true for any other conjoined antonyms (wet and dry, rough and smooth, etc.), so, at the very best, we would predict alternately hot and cold, alternately wet and dry, and so on, to be truth-conditionally equivalent (at the worst, we may expect them to be trivially false of any subject). Lasersohn
also argues convincingly that a semantics with fine-grained intensions built on properties as basic entities will not help us here. In Lasersohn’s words, the problem is that ‘the adverb “needs access” to the times at which an object is hot and the times at which it is cold, in order to assure that these times are arranged in an appropriate pattern’ (Lasersohn 1992: 384), but there is no way for alternately to get that access on the basis of the denotation of the conjoined adjectives taken together. One possible approach would then be to deny that the syntax of (6) is as in (7a), with the adverb modifying the conjoined adjectives, and to claim instead that alternately . . . and is a discontinuous conjunction, with direct access to the required conjuncts, as in (7b) (cf. Lasersohn 1992: 385). (7)
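(a) [AP alternately [AP hot and cold]]
(b) [tree diagram: an AP in which alternately . . . and forms a single discontinuous Conj node, combining directly with the conjuncts hot and cold]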
However, Lasersohn gives three pieces of evidence against such an approach. Firstly, there are cases like (8) where the conjunction is apparently more deeply embedded than the adverb. (8) John’s mood is alternately like that of a man who just lost his job and one who just won the lottery. (Lasersohn 1992: 385) Secondly, there are semantically similar cases which do not include and, as in the following (judgements are Lasersohn’s). (9)
(a) John raised each of his fingers in alternation. (b) ?John alternately raised each of his fingers. (c) %John alternately raised his two hands. (Lasersohn 1992: 386)
Both of these problems militate against the notion that the observed semantic pattern might be the work of a single discontinuous lexical item alternately . . . and. Finally, a complementary problem is shown by (10), which has an interpretation on which very scopes over both adjectives. This is just as we would expect on a syntactic structure like (7a), where very may attach within the first conjunct or outside the conjoined APs. However, the possible wide scope of very remains a mystery on a structure like (7b). (10) Alternately very hot and cold It seems, then, that the ‘access’ to the individual conjuncts must be provided within the semantics and cannot be directly reflected in the syntactic constituency.
4.3 Dealing with the Problems
Before presenting a sketch of Lasersohn’s solution to the problem of alternately, it may be instructive to clarify the nature of the similarity between the problem raised by alternately and that of the temporal modifiers discussed in Section 4.2.1. Both problems concern the way that a predicate holds through time. The type of predicates which occur within the scope of alternately are like those which are modified by PPs like on Thursday in that they hold of the subject at certain moments but not at others. The function of alternately is to state that the moments at which the two conjoined predicates within its scope hold of the subject are, roughly, in complementary distribution, while the function of on Thursday is to state that one time at which the predicate within its scope holds is on a Thursday. The problem which alternately poses for a maximally simple syntactic implementation of Davidson’s analysis, as in Higginbotham (1985), is how to keep track of the set of moments when hot is true of the subject, and the set of moments when cold is true of the subject, when this information is obliterated at the level of hot and cold. Similarly, the problem posed by examples like (5) is how to keep track of which time is associated with which event, even when all the event variables from all the conjoined predicates have been identified. From here, the basic elements of the solution should be clear. In the case of alternately, two predicates are conjoined, and then modified. In the case of examples like (5), two predicates are modified, and then conjoined. In each case, we need to ensure that the things that are conjoined are somehow different from the things that are modified, and that the way in which the larger units are composed of smaller units can be checked.
I will keep the name of event for the smaller units, associated with variables introduced by verbal (and possibly other) predicates, and use the term interval to refer to the larger groupings. Following the method of constructing a temporal structure from events in Kamp (1979), I assume that any set of events generates a totally ordered set of instants, corresponding to maximal sets of pairwise overlapping events. An interval is then a convex subset of instants, or a set of sets of events. This terminology is distinct from that adopted by Lasersohn, who refers to all of these units as events, but defines distinguished ‘simple’ and ‘uniform’ subclasses of events, corresponding roughly to the smaller structures discussed in the preceding chapters. This discrepancy between this account and Lasersohn’s is more or less terminological. However, it reflects the fact that Lasersohn’s account rests on a domain of events which is closed under joins. In contrast, the theory of events developed in Chapters 2 and 3 requires precisely that events are not closed under joins: the eventhood of e1 and e2 is no guarantee that e1 and e2 will jointly form a legitimate event. Whereas Lasersohn’s terminological choice implies that atomic events and events are the same kind of thing (the former being a subspecies of the latter), I emphasize the difference between the two, reflecting the privileged status of the smaller units in the theory developed in preceding chapters.1 Given this distinction between events and intervals, the basic interpretations of the relevant examples are now as follows: (11)
(a) The room is alternately hot and cold: There is an interval containing temporally non-overlapping events, some of which are hot-room events and some of which are cold-room events. (b) John eats fish on Saturday and drinks wine on Monday: There is an interval which contains at least two events, one of which is situated on Saturday and is a fish-eating event, and one of which is situated on Monday and is a wine-drinking event.
1 An intuitively similar, but formally quite distinct, theory of alternately was proposed by Winter (1995). Winter proposes, in essence, that the word and makes no semantic contribution but rather that an interpretation of two elements as conjoined comes from two freely available operations, product introduction and generalized conjunction. Product introduction forms a tuple ⟨φ, ψ⟩, of type a • b, from two expressions φ, of type a, and ψ, of type b. Generalized conjunction then converts ⟨φ, ψ⟩ into φ ⊓ ψ. However, given Winter’s claim that and does not make any semantic contribution, Winter can divorce the application of generalized conjunction from the occurrence of and, and claim that in the case of alternately P and Q, generalized conjunction does not apply and alternately operates directly on the members of the tuple. Although it does so in a very different way, Winter’s theory maintains the basic distinction that I need between small units that alternately can see (members of tuples) and the larger units output by conjunction, and may therefore be equally usable here.
Formally, the point of this approach is to ensure that the variable which is associated with the larger grouping of conjoined predicates is distinct from the variables associated with any of the conjuncts. This gives us a basis on which to build a mechanism for ‘reaching inside’ an interval and recovering the details of the events of which it is composed. Moreover, this is done purely semantically, without recourse to the sort of syntactic structure in (7b). All that remains is to replace Higginbotham’s simple theory of the syntactic instantiation of the event variable with an alternative, specifying how, and where, these variables come to be existentially quantified.
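The construction of instants and intervals from events, and the gloss in (11a), can be made concrete in a small model. The following is a minimal sketch, not Kamp’s or Lasersohn’s own formalization: it assumes that each event comes with a numeric start and end, used only to compute overlap, and the names EVENTS, instants, is_interval, and alternately_11a are hypothetical. The check implements only what (11a) states (hot-events and cold-events both occur and never overlap); any finer requirement on the pattern of alternation is left out.

```python
from itertools import combinations

# Hypothetical toy data: event name -> (description, start, end).
# The numeric spans are an assumption used only to compute overlap.
EVENTS = {
    "h1": ("hot", 0, 2),
    "c1": ("cold", 3, 5),
    "h2": ("hot", 6, 8),
    "c2": ("cold", 9, 11),
}


def overlap(a, b, events):
    _, s1, e1 = events[a]
    _, s2, e2 = events[b]
    return s1 <= e2 and s2 <= e1


def instants(events):
    """Maximal sets of pairwise overlapping events, in temporal order."""
    names = list(events)
    cliques = [
        frozenset(combo)
        for r in range(1, len(names) + 1)
        for combo in combinations(names, r)
        if all(overlap(a, b, events) for a, b in combinations(combo, 2))
    ]
    maximal = [c for c in cliques if not any(c < d for d in cliques)]
    # Members of an instant share a stretch of time; order by where it begins.
    return sorted(maximal, key=lambda c: max(events[n][1] for n in c))


def is_interval(candidate, ordered):
    """An interval is a convex (gap-free) run of the ordered instants."""
    idx = sorted(ordered.index(i) for i in candidate)
    return bool(idx) and idx == list(range(idx[0], idx[-1] + 1))


def alternately_11a(interval, p, q, events):
    """Gloss (11a): p-events and q-events both occur and never share an instant."""
    kinds = [{events[n][0] for n in inst} for inst in interval]
    no_overlap = not any({p, q} <= k for k in kinds)
    return no_overlap and any(p in k for k in kinds) and any(q in k for k in kinds)


INSTS = instants(EVENTS)
print(len(INSTS), alternately_11a(INSTS, "hot", "cold", EVENTS))
```

Because the check inspects the members of each instant rather than a single conjoined predicate, it has the ‘access’ to the individual hot-events and cold-events that the denotation of hot and cold alone cannot supply.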
4.4 Events and Intervals in Syntax
We have seen that tense cannot be adequately defined with respect to single events, on the conception of events defended in the rest of this chapter. Equally, tenses cannot be defined simply with reference to instants (defined by Kamp (1979) as sets of pairwise overlapping events), as sentences such as (5) make reference to two non-overlapping events in the scope of a single tense operator. Two non-overlapping events cannot help but give two instants, and the past tense operator must treat these two instants as distinct, just like the events that generated them. However, these two instants must form part of an interval which covers them both: there is no necessary limit to the temporal extent of an interval. The simplest entities with respect to which tense relations can reasonably be described in Kamp’s theory, then, are intervals. At some point in the derivation, then, we may assume that the event variable is bound, and the derivation proceeds through manipulation of instant and interval variables, with the tense operator ranging over intervals. I will assume specifically that an operator Op takes an ⟨Ev, ⟨e, t⟩⟩ argument (a VP or possibly vP, with or without adverbials),2 binding the event variable, and asserting that that event takes place at an instant which is, in turn, part of an interval. The tense head then locates that interval relative to speech time. Formally, Op looks like this.3
(12) [[Op]] = λR_⟨Ev,⟨e,t⟩⟩.λInt_⟨⟨Ev,t⟩,t⟩.λx.∃e.∃i_⟨Ev,t⟩.(R(e)(x) ∧ e ∈ i ∧ i ∈ Int)
2 I am ignoring the interactions of these considerations with the VP-Internal Subject Hypothesis. For concreteness, I am treating subjects as merged above Op, in [Spec,T], although I don’t see any problems that arise if this proves to be wrong, as it probably is.
3 See von Stechow (2002) for a related idea. In this and subsequent representations, i is a variable over instants as defined by Kamp (1979) (of type ⟨Ev, t⟩ for events of type Ev), and Int is a variable over intervals (of type ⟨⟨Ev, t⟩, t⟩).
The past tense could then be represented as in the following, which takes a predicate over intervals and an individual as arguments, identifies the individual as the subject of the predicate, and states that there exists an interval in which the predicate holds such that every instant in that interval precedes an instant containing the event (e_s) of producing the utterance in question.
(13) [[T_Past]] = λX_⟨⟨⟨Ev,t⟩,t⟩,⟨e,t⟩⟩.λx.∃Int_⟨⟨Ev,t⟩,t⟩.(X(Int)(x) ∧ ∀i′_⟨Ev,t⟩.(i′ ∈ Int → ∃i″_⟨Ev,t⟩.(e_s ∈ i″ ∧ i′ < i″)))
The other tenses will be represented in analogous ways. A straightforward derivation of a sentence such as John ate fish will now proceed as follows:
(14)
(i) [[eat fish]] = λeλx.(eat(x, fish, e))
(ii) [[Op eat fish]] = λInt_⟨⟨Ev,t⟩,t⟩.λx.∃e.∃i_⟨Ev,t⟩.[eat(x, fish, e) ∧ e ∈ i ∧ i ∈ Int]
(iii) [[ate fish]] (= [[T_Past Op eat fish]]) = λx.∃Int_⟨⟨Ev,t⟩,t⟩.∃i_⟨Ev,t⟩.∃e.(eat(x, fish, e) ∧ e ∈ i ∧ i ∈ Int ∧ ∀i′_⟨Ev,t⟩.(i′ ∈ Int → ∃i″_⟨Ev,t⟩.(e_s ∈ i″ ∧ i′ < i″)))
(iv) [[John ate fish]] = ∃Int_⟨⟨Ev,t⟩,t⟩.∃i_⟨Ev,t⟩.∃e.(eat(j, fish, e) ∧ e ∈ i ∧ i ∈ Int ∧ ∀i′_⟨Ev,t⟩.(i′ ∈ Int → ∃i″_⟨Ev,t⟩.(e_s ∈ i″ ∧ i′ < i″)))
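The derivation in (14) can be evaluated over a small finite model by treating Op and the past tense as higher-order functions. This is a minimal sketch, assuming a hand-built model in which the instants are already given in temporal order; the names INSTANTS, FACTS, SPEECH_EVENT, op, and t_past, and the extra cake-eating event, are illustrative assumptions rather than part of the proposal.

```python
# Hypothetical model: three instants in temporal order, each a set of events.
# "es" is the speech event; "e3" is located after it.
INSTANTS = [frozenset({"e1"}), frozenset({"es"}), frozenset({"e3"})]
EVENTS = {e for inst in INSTANTS for e in inst}
SPEECH_EVENT = "es"
FACTS = {("eat", "j", "fish", "e1"), ("eat", "j", "cake", "e3")}


def intervals():
    """All non-empty convex sets of instants: contiguous runs of INSTANTS."""
    n = len(INSTANTS)
    return [frozenset(INSTANTS[a:b]) for a in range(n) for b in range(a + 1, n + 1)]


def precedes(i1, i2):
    return INSTANTS.index(i1) < INSTANTS.index(i2)


def eat(theme):
    # A VP denotation: a function from events to functions from individuals.
    return lambda e: lambda x: ("eat", x, theme, e) in FACTS


def op(R):
    """(12): bind the event variable and place the event in an instant of Int."""
    return lambda Int: lambda x: any(
        R(e)(x) and e in i and i in Int for e in EVENTS for i in INSTANTS
    )


def t_past(X):
    """(13): some interval satisfies X and lies wholly before a speech instant."""
    return lambda x: any(
        X(Int)(x)
        and all(
            any(SPEECH_EVENT in i2 and precedes(i1, i2) for i2 in INSTANTS)
            for i1 in Int
        )
        for Int in intervals()
    )


# (14iv): John ate fish.
print(t_past(op(eat("fish")))("j"))
# The only cake-eating follows the speech event, so the past tense fails.
print(t_past(op(eat("cake")))("j"))
```

The nesting t_past(op(...)) mirrors the order of composition in (14): Op first converts a predicate of events into a predicate of intervals, and only then does tense locate the interval relative to the speech event.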
Modification of a VP by a temporal adverbial takes place as before: the adverbial locates the event in a given stretch of clock- or calendar-time (Tuesday, five minutes ago, etc.). When it comes to the interaction of Op and coordination, however, things get more interesting. Assuming that and is unfussy about the types of the constituents it coordinates (so long as the types match), it is in principle possible to apply Op to each conjunct separately, and then conjoin them, or alternatively to conjoin them first and then apply Op to the conjoined VPs. In practice, though, the second option will always lead to problems if both VPs are modified by incompatible adverbials. To see why, here are the two possible derivations for (1). The derivation in which coordination precedes application of Op will proceed as follows: (15)
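(i) $[\![\text{eat fish on Monday}]\!] = \lambda e \lambda x . (eat(x, fish, e) \wedge on(e, Mon))$
(ii) $[\![\text{drink wine on Saturday}]\!] = \lambda e \lambda x . (drink(x, wine, e) \wedge on(e, Sat))$
(iii) $[\![\text{eat fish on Monday and drink wine on Saturday}]\!] = \lambda e \lambda x . (eat(x, fish, e) \wedge drink(x, wine, e) \wedge on(e, Mon) \wedge on(e, Sat))$
(iv) $[\![\text{Op eat fish on Monday and drink wine on Saturday}]\!] = \lambda Int_{\langle\langle Ev,t\rangle,t\rangle} . \lambda x . \exists e . \exists i_{\langle Ev,t\rangle} . [eat(x, fish, e) \wedge drink(x, wine, e) \wedge on(e, Mon) \wedge on(e, Sat) \wedge e \in i \wedge i \in Int]$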
At this stage, the representation asserts the existence of an event which is on Monday and on Saturday. Furthermore, this event is a member of some instant $i$. But we have claimed, following Kamp, that instants are sets of pairwise overlapping events. What does it mean to overlap with an event which is half on Monday and half on Saturday? Do you have to overlap with both halves? With either? The above representation claims that there is a single event which stretches from Monday to Saturday, and any other event happening on Thursday, say, overlaps with it. But we have to ask how we can verify that an event happening on Thursday is overlapping with an event consisting of something on Monday, something on Saturday, and nothing in between. This is another of the cases in which we do not want our set of events to be closed under joins, particularly in view of the considerations discussed in Section 2.2, which supported the claim that events must be spatiotemporally continuous, and subevents thereof must be appropriately related. This is sufficient to rule out the representation in (15), if not as ill-formed at least as false, because no events are of the right ‘shape’ to make it true. Nothing will be able to affect the falsity of the assertion of the existence of a single event which takes place half on Monday and half on Saturday.

We are left, then, with the alternative derivation of (1), according to which Op is applied to each conjunct separately before the two are conjoined. This derivation succeeds. I omit the details here, but the end result will be the following.

(16) $[\![\text{John ate fish on Monday and drank wine on Saturday}]\!] = \exists Int_{\langle\langle Ev,t\rangle,t\rangle} \exists e_1 \exists e_2 \exists i_{1\,\langle Ev,t\rangle} \exists i_{2\,\langle Ev,t\rangle} . (eat(j, fish, e_1) \wedge on(e_1, Mon) \wedge e_1 \in i_1 \wedge i_1 \in Int \wedge drink(j, wine, e_2) \wedge on(e_2, Sat) \wedge e_2 \in i_2 \wedge i_2 \in Int \wedge \forall i_{\langle Ev,t\rangle} . (i \in Int \rightarrow \exists i'_{\langle Ev,t\rangle} (e_s \in i' \wedge i < i')))$

So we now have a way to allow VP-coordination without the concomitant problems shown by (1–2) at the start of this chapter. The trick, as above, is to distinguish smaller units which can be modified by the temporal PPs, while recognizing a larger unit formed from the smaller units, which can form an input to the tense operator, auxiliaries, and so on. Here, the smaller units are events, and the larger units are intervals, or sets of sets of events.

This architecture can also cope with the challenge posed by alternately, sticking close to Lasersohn’s account. The key point is that Lasersohn’s account of alternately requires a denotational structure closed under joins, and we do not want this to be true of our event structure. Therefore, we require that alternately be defined semantically with respect to some other structure which has the necessary property, and the structure of intervals seems like a reasonable bet.
The reason for this is that we may stipulate that every two intervals $I_1$ and $I_2$ define a least upper bound. Assuming that intervals are convex, this is the smallest interval which contains $I_1$ and $I_2$, and also contains any intervening intervals.

(17) The least upper bound of any two intervals $I_1$ and $I_2$, $I_1 \sqcup I_2$, is the smallest interval $I$ such that $I_1 \subseteq I \wedge I_2 \subseteq I$.

Assuming that the set of intervals is closed under this join-like operation $\sqcup$, this least upper bound is unique for any pair $I_1$ and $I_2$. Assuming that $I_1$ precedes $I_2$, $I_1 \sqcup I_2$ will be the interval that starts when $I_1$ starts and ends when $I_2$ ends. Joins therefore form a partial order based on the inclusion relation $\subseteq$. The further assumption that we need for an account of alternately is that propositions persist from smaller intervals to more inclusive intervals.4 That is, if some proposition P is true at an interval $I_1$, and $I_1 \subseteq I_2$, then P is also true at $I_2$. The effect of this is that, although there will never be a single event, or instant, which satisfies the description The room was hot and cold, there may be an event which satisfies the description The room was hot, and another event which satisfies the description The room was cold. If those propositions are true at intervals $I_1$ and $I_2$, respectively, then there will necessarily be an interval $I_1 \sqcup I_2$ at which both propositions hold. We can then define alternately relative to that interval. This analysis is more or less a notational variant of Lasersohn’s, so I only give a summary, and refer the reader to Lasersohn (1992) for the details. The first step is to formulate the notion of a P alternation-eligible pair, that is, two intervals $I_1$ and $I_2$, at neither of which the proposition P holds (which in turn entails that P also does not hold at any subintervals of $I_1$ or $I_2$), but which jointly define a minimal interval $I_1 \sqcup I_2$ at which P holds.
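The join, the persistence assumption, and the notion of an alternation-eligible pair can be illustrated with a small sketch. It is only a toy model under assumptions that are not part of the text: intervals are simplified to closed numeric ranges rather than sets of instants, a proposition is a Boolean function on intervals, and all names (`Interval`, `join`, `persistent`, `conj`, `alternation_eligible`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Interval:
    """Toy convex interval, modelled (as an assumption) by two clock times."""
    start: float
    end: float

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not exceed end")

    def contains(self, other: "Interval") -> bool:
        # Inclusion: other is a subinterval of self.
        return self.start <= other.start and other.end <= self.end


def join(i1: Interval, i2: Interval) -> Interval:
    """Smallest convex interval containing both i1 and i2, as in (17)."""
    return Interval(min(i1.start, i2.start), max(i1.end, i2.end))


Proposition = Callable[[Interval], bool]


def persistent(witnesses: Iterable[Interval]) -> Proposition:
    """A proposition true at every interval that includes a witness interval.

    This encodes the persistence assumption: truth at an interval carries
    over to any more inclusive interval.
    """
    witness_list = list(witnesses)
    return lambda i: any(i.contains(w) for w in witness_list)


def conj(p: Proposition, q: Proposition) -> Proposition:
    """Conjunction of two propositions, evaluated at the same interval."""
    return lambda i: p(i) and q(i)


def alternation_eligible(p: Proposition, i1: Interval, i2: Interval) -> bool:
    """p holds at neither i1 nor i2, but holds at their join."""
    return (not p(i1)) and (not p(i2)) and p(join(i1, i2))


# The room is hot during [0, 1] and cold during [2, 3] (invented data).
hot_time, cold_time = Interval(0, 1), Interval(2, 3)
hot = persistent([hot_time])
cold = persistent([cold_time])
hot_and_cold = conj(hot, cold)
```

In this toy model, `hot_and_cold` fails at each of the two short intervals but holds at their join, so the pair counts as alternation-eligible for the conjoined proposition, while it does not count as alternation-eligible for `hot` alone, which already holds at the first interval.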
The major truth condition on alternately P and Q is then that alternately P and Q is true at intervals which contain a [P and Q]-alternation-eligible pair. Other additions can be envisaged (should it contain multiple such pairs? Should it be composed entirely of such pairs? Should any intervening time be allowed between the times at which P and Q hold?), but such modifications can be straightforwardly made if they are felt to be necessary.

(18) $[\![\text{alternately}]\!] = \lambda P_{\langle\langle\langle Ev,t\rangle,t\rangle,\langle e,t\rangle\rangle} \lambda I_{\langle\langle Ev,t\rangle,t\rangle} \lambda x_e .$
(i) $P(I)(x) = 1$
(ii) $\exists I_1, I_2 \subseteq I$
(iii) $\langle I_1, I_2 \rangle$ is a P alternation-eligible pair.

4 This obviously requires reining in, possibly pragmatically, if it is to be compatible with our fairly precise linguistic indications of the temporal locations of events. See Lasersohn for some discussion.

When the argument positions are filled in, ignoring the contribution of present tense, we arrive at a representation like the following:

(19) $[\![\text{The room is alternately hot and cold}]\!] = \exists I . (hot\_and\_cold(I)(the\_room) = 1 \wedge \exists I_1, I_2 \subseteq I . (\langle I_1, I_2 \rangle \text{ is a } hot\_and\_cold \text{ alternation-eligible pair}))$

Unpacking this last conjunct, the predicate hot_and_cold does not hold at $I_1$ or at $I_2$, but it holds at $I_1 \sqcup I_2$, which is, by definition, still included within $I$. This, in turn, forces the predicate hot_and_cold to be decomposed into two predicates, each of which holds at one of $I_1$ and $I_2$, which is the basic alternation that we previously failed to capture.5

The general approach outlined above therefore allows a solution to both of the problems raised above concerning Higginbotham’s simpler theory of the place of event variables in the syntax, while preserving the result that multiple verb phrases can describe a single event. In an example such as (15), nothing went wrong from a compositional view. Instead, the time adverbials forced a contradictory interpretation of the resulting structure. Without those time adverbials, it would have been quite legitimate to conjoin the VPs before or after embedding them under Op, resulting in single-event or multiple-event readings, respectively. If two VPs are conjoined before the introduction of Op, then, they will jointly describe a single event, as in Chapter 3. We may, for example, arrive at an interpretation like the following, in a case where two conjoined VPs may describe causally related subevents of a larger macroevent.

(20) $[\![\text{John Op (got drunk and fell over)}]\!] = \exists Int, i, e . (get\_drunk(j, e) \wedge fall\_over(j, e) \wedge e \in i \wedge i \in Int \wedge \forall i' (i' \in Int \rightarrow \exists i'' . (e_s \in i'' \wedge i' < i'')))$

Moreover, this theory predicts a syntactic, as well as a semantic and/or cognitive, limit to the possibility of macroevent formation. That limit is supplied, again, by Op.
One consequence of such an operator is that whatever is within the scope of Op must correspond to a single event. Given that Op is assumed to merge only after a verb has merged with its internal arguments, but before T (which is now assumed to operate over intervals rather than events), that means that we expect syntactic height effects in macroevent formation. Although I haven’t argued that Op really is present in the syntax (as opposed to, say, constituting part of the syntax–semantics interface), the fact of its interaction with elements such as T and VP which clearly exist in the phrase structure is sufficient to guarantee that Op can induce syntactic height effects, even if it transpires that Op itself is absent from the syntax.

5 This approach inherits several minor problems from Lasersohn’s. To mention one, the restriction of alternation-eligibility to pairs means that n-way alternations are not captured for n > 2 (e.g. The room was alternately hot, cold, and freezing). This is one way in which Winter’s approach seems ultimately more promising.

We can distinguish three relevant cases concerning height effects induced by Op. In the structure (21), XP is adjoined to VP before Op is merged. As the complement of Op must describe a single event, this means that any λ-abstracted event variables contained within the denotations of VP and XP must be construable as subevents of a single macroevent.

(21) [VP Op [VP VP XP]]
On the other hand, in (22), XP is adjoined after Op has merged. This is predicted to be semantically ill-formed if XP contains any λ-abstracted event variables, as XP is not in the scope of any occurrence of Op which could bind those variables.

(22) [VP [VP Op VP] XP]
There is a way to rescue a structure such as (22), however. If XP contains a nonfinite VP which introduces a λ-abstracted event variable, an Op merged within the adjunct will bind that variable, as in (23). We expect, then, that an adjunct can be merged outside the scope of an instance of Op c-commanding the matrix VP, but that the events described by the matrix VP and the adjunct cannot be construed as jointly forming a single macroevent in that case.6

6 A further possible structure is found when multiple copies of Op appear on the same projection line. Each one binds the abstracted event variable in its scope, but a higher copy is still legitimate if a further event variable is introduced outside the scope of the lower Op. This quite legitimate configuration is found in regular tensed complement clauses, to be discussed in more detail in Chapter 7.

(23) [VP1 [VP1 Op VP1] [XP X [VP2 Op VP2]]]
Given that Op, and often X, are phonologically null, this essentially claims that a string consisting of a nonfinite verbal adjunct attached to a VP is structurally ambiguous, which will lead to a semantic ambiguity between single-event and multiple-event readings. The discussion in Chapters 2 and 3 gives us some grounds for thinking that this is accurate in the case of nonfinite adjunct VPs. We will return to the matter in Part II. Such ambiguity is predicted to disappear, however, higher in the tree. We have seen that auxiliaries and tense semantically require an occurrence of Op within their scope. I will illustrate this with T below, but the same logic applies to auxiliaries. Op binds V’s event variable and introduces a Î-abstracted interval variable, which T would then bind. Accordingly, if a T node intervenes between the two verbs, we expect either semantic ill-formedness or a multiple event reading. There are two subcases of this. In the first, a nonfinite verbal adjunct is adjoined above T, as in (24). (24)
(a) [TP [TP T [VP Op VP]] XP]
(b) [TP [TP T [VP1 Op VP1]] [XP X [VP2 Op VP2]]]
The fact that some adjuncts are restricted to high positions in the clause is well established: see Jackendoff (1972), Cinque (1999), and Ernst (2002). In such cases, the low attachment of the adjunct is independently ruled out, and so we are left with a choice between uninterpretability, as in (24a), and a multiple-event reading, as in (24b).

The second subcase occurs when the verbal adjunct is finite, and so there is a T node within the adjunct itself. As above, this necessitates the presence of an Op node above V but below T within the adjunct. This will, once again, bind the event variable in the adjunct’s denotation, making it unavailable to form a single macroevent with the matrix VP’s denotation. As before, the only interpretable structure will result in a multiple-event reading, as in (25).

(25) [VP1 [VP1 Op VP1] [XP X [TP T [VP2 Op VP2]]]]
In that case, syntactic limits on the size of event descriptions can be derived largely from limits on the distribution of Op. Op is restricted to positions above the positions where internal arguments are merged, and below T, so any adjuncts merged above T can only give multiple-event readings, whereas lower adjuncts will be ambiguous between single-event and multiple-event readings. Finally, very low adjuncts, adjoined below an internal argument position, should only allow single-event readings. This accords well with research on resultative secondary predicates, which are merged very low and which are often taken as specifying a result of an event described by the verb, giving a single syntactically complex core event.

Equally, in those cases where syntactic height does not determine the choice of a single-event or multiple-event reading, such ambiguities will nevertheless only arise if the verbal adjunct is nonfinite. If the adjunct necessarily includes a T node, or even auxiliaries or negation, then it must also include Op, and so the event that it describes will not be able to form a macroevent with the event described by the matrix VP.

The theory sketched in this chapter, then, makes a number of predictions which will come in useful in Part II. Most salient of these, for now, is the prediction of syntactic height effects in the individuation of events, most notably that macroevent formation is impossible after T has been merged.
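The height predictions just summarized can be restated as a small decision procedure. The sketch below is only an illustrative restatement, not part of the theory: the three-way division of attachment heights, the Boolean flag, and all names are hypothetical simplifications. In particular, letting an adjunct-internal Op override attachment height in every case, including the very low one, is an assumption; the text does not discuss that combination.

```python
from enum import Enum


class AdjunctHeight(Enum):
    """Coarse attachment sites for a verbal adjunct (hypothetical labels)."""
    BELOW_INTERNAL_ARGUMENT = "below an internal argument position"
    BETWEEN_ARGUMENTS_AND_T = "above internal arguments, below T"
    ABOVE_T = "above T"


SINGLE = "single-event"
MULTIPLE = "multiple-event"


def predicted_readings(height: AdjunctHeight, adjunct_needs_own_op: bool) -> set:
    """Return the readings predicted for a matrix VP plus verbal adjunct.

    adjunct_needs_own_op is True when the adjunct contains T, auxiliaries,
    or negation, and so must contain its own occurrence of Op.
    """
    if adjunct_needs_own_op:
        # The adjunct's event variable is bound inside the adjunct, so it
        # cannot form a macroevent with the matrix event.
        # Assumption: this holds whatever the attachment height.
        return {MULTIPLE}
    if height is AdjunctHeight.ABOVE_T:
        # Matrix Op has already been merged below T.
        return {MULTIPLE}
    if height is AdjunctHeight.BELOW_INTERNAL_ARGUMENT:
        # Too low for a separate Op: only a single complex event.
        return {SINGLE}
    # Op may scope over both VPs, or over each VP separately.
    return {SINGLE, MULTIPLE}
```

For instance, a nonfinite adjunct in the middle zone comes out as ambiguous, while the same adjunct attached above T, or a finite adjunct at any height, comes out as multiple-event only.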
Part II Events and Locality
5 Where We Stand

We have been concerned in Part I with building a particular theory of event structure based on a range of formal and cognitive semantic considerations. However, the point of this book is not just to elaborate this structure, but to show its relevance to certain phenomena which are normally considered narrowly syntactic. In particular, Chapter 1 suggested the Single Event Condition, repeated below, as an approach to certain recalcitrant problems in A′-locality. (1)
The Single Event Condition (first approximation): An instance of wh-movement is legitimate only if the minimal constituent containing the head and the foot of the chain can be construed as describing a single event.
Now we have some idea what an event looks like, we can turn to the puzzles listed in Section 1.3.1 and see how things measure up. This chapter will first summarize the theory put forward in Part I and detail the predictions of the Single Event Condition. The rest of Part II will then be taken up with three case studies exploring the Single Event Condition as it applies to extraction from adjuncts (Chapter 6), followed by a demonstration in Chapter 7 that the condition does no harm with respect to core issues concerning successive-cyclic A′-movement, and may even do some good there too. In Chapter 8, we discuss the major architectural issue that arises from the above, namely the question of why A′-locality should care about events. Finally, Chapter 9 concludes.

It is hopefully clear by now that the approach being pursued here is an irreducibly interface-based one. This means two things. Firstly, there is a nontrivial interface between syntax and semantics. We saw in Chapter 3 that the semantic notion ‘single event’ does not straightforwardly map onto any single phrase-structural configuration, and I will continue to assume, pending further discussion in Chapter 8, that events, as semantic units, do not map onto any particular syntactic units. However, syntactic factors undoubtedly have a very large role to play in constraining wh-movement: this work is certainly not a misguided attempt to do away with Subjacency, for example,
or many of the other syntactic constraints discussed in Section 1.2. More narrowly, the claim, as in Chapter 1, is not that adjuncts are not islands. Rather, I claim that adjuncts are more like weak than strong islands. The basis for that weak islandhood, and the broad patterns of extraction that it predicts (e.g. extraction only of referential DP arguments, as in Rizzi 1990, Cinque 1990) is probably partially syntactic, as discussed in Chapter 1. Instead, the present work is an attempt to complement those syntactic constraints by finding a different resource which can do things that those constraints only do badly. This in no way removes the need for syntactic machinery regulating A′-dependencies (although it may well affect the specific formulations adopted). Syntax has a part to play in three areas here. Firstly, it will continue to rule out the core locality violations that have been systematically investigated since Ross (1967). Secondly, considerations relating to the syntax of the individual adjunction constructions investigated will turn out to be relevant to some extent. Finally, the way in which phrase structures are related to semantic structures, as sketched in Chapter 4, will mean that extraction from certain phrase-structural positions will be ruled out, albeit on ultimately semantic grounds.

However, the inclusion of substantial semantic and pragmatic factors in our model of locality leads to a prediction of a different ‘shape’ to patterns of acceptability of wh-movement than that which would be expected under a purely syntactic account. In particular, we expect that many instances of wh-movement will be semantically or pragmatically unacceptable, for reasons relating to the Single Event Condition, rather than syntactically deviant.
And given that the Single Event Condition forces single-event readings, regardless of whether these readings are natural or even plausible, we expect coercion to play a significant role here, with all the fuzzy, gradient acceptability judgements that that entails. In contrast, according to many current syntactic theories, the grammaticality or otherwise of a sentence is a discrete fact. For example, the minimalist architecture described most fully in Chomsky (1995) allows for only two outcomes of a syntactic derivation: either it crashes or it converges. Although there is nothing in principle preventing a version of minimalism in which different types of syntactic deviance are associated with different levels of grammaticality (for example, Chomsky 1986 allowed three levels of grammaticality, corresponding to crossing of 0, 1, or more than 1 barriers), in practice, such theories have rarely been developed within the Principles and Parameters framework, to my knowledge (although gradient Optimality-Theoretic models of syntax, for example, have been developed). If the acceptability of wh-question formation were a purely syntactic matter, then, a narrowly P&P-based approach would expect grammaticality
judgements to be quite categorical in these matters. On the other hand, an account such as the one proposed here, in which factors such as coercion play a significant role, suggests a more gradient pattern of acceptability judgements. This is because such operations are not necessarily as automatic as syntactic structure-building operations, and people’s judgements of the acceptability of a given sentence may reflect factors such as the amount of cognitive effort required to coerce the described events into a contingent relation. On the account proposed here, then, we expect gradient patterns of acceptability judgements, subject to both inter- and intra-speaker variation, and reflecting the perceived functional and associative relations between the specific subevents described in particular examples. To give an example, let’s return to the Unlikely Antilocality Puzzle of Chapter 1. The basis of this puzzle is that, under certain circumstances, adding extra material within the adjunct improves extraction from that adjunct. (2)
(a) ??What did John drive Mary crazy [fixing __ ]?
(b) What did John drive Mary crazy [trying [to fix __ ]]?

However, the shorter example (2a) is clearly not fully ungrammatical, and is substantially less degraded than violations of the conjunct constraint on extraction from coordinate structures, for example, or many cases of extraction from subjects, as shown in (3).

(3)
(a) *What did Mary fix [the radiator and __ ]?
(b) *Who did [that Mary kissed __ ] enrage Michael?

Moreover, for some speakers, (2b), the better of the two examples in (2), is still slightly degraded. Chapter 6 shows that we can predict such a subtle pattern of judgements. The degradation of (2b) is related to its status as extraction of a referential DP complement out of a weak island. Such constituents are the most acceptable thing to extract from weak islands, but they are still somewhat degraded for some speakers. Syntactically, nothing else goes wrong in the case of (2a). However, the process of macroevent formation to satisfy the Single Event Condition requires some coercion in that case, which typically leads to a perception of further slight degradation. In contrast, the cases in (3) are much more categorically bad, in the former case because of whatever ultimately derives the exceptionally robust status of the conjunct subcase of the Coordinate Structure Constraint, in the latter case because of whatever ultimately derives the Sentential Subject Constraint. At our present level of understanding, it is reasonable to assume that both constraints are ultimately syntactic in nature.
On the theory sketched in Chapter 4, satisfaction or violation of the Single Event Condition is ultimately determined by the scope of the operator Op, which binds the event variable in its scope. More specifically, in a construction containing an untensed verbal adjunct, Op can attach quite freely to each verbal constituent individually, or it can attach once in a position where it has scope over both event variables. This generates multiple-event and single-event readings, respectively, and these two options will always be produced by the syntax in examples containing nonfinite adverbials, whether the sentence in question is declarative or interrogative. The role of the Single Event Condition is to filter out multiple-event readings in interrogatives, leaving only the single-event readings. Whether a given single-event reading is felicitous or not depends on an interlocutor’s willingness to admit that any subevents of that single event are contingently related, which depends in turn on world knowledge and the interlocutor’s creative ability to perceive links between subevents.

Borrowing some familiar examples from Chomsky’s earliest syntactic writings, a common assumption has been that the ill-formedness of a case such as What does John work whistling? ((44c) in Chapter 1) is parallel to that of the flatly ungrammatical Furiously sleep ideas green colourless, while the proposal here is that its deviance is closer (although clearly not identical) to that of the nonsensical Colourless green ideas sleep furiously. Moreover, as we will see, there are limiting cases where a single-event construal is either automatic (for example, because the subevents are explicitly stated to stand in a contingent relation, or because there only is one event description in the first place), or impossible (because other structural factors such as tense prohibit the formation of a single-event reading).
In such cases, the gradience of the phenomenon is masked, and replaced by a more or less discrete pattern of acceptability, similar to the discrete grammaticality judgements we expect in response to narrow syntactic structures. This theory can therefore explain the fact that some extractions from adjuncts ‘feel’ either categorically ungrammatical or fully acceptable, while others elicit more gradient responses, despite the absence of any obvious syntactic distinctions which might motivate such a pattern. Based on the findings of Part I, we can make several specific predictions about how these patterns will manifest themselves. In general, every time we have no reason not to form a single macroevent, we predict wh-movement out of the constituents describing subevents of that macroevent to be more or less acceptable. However, we have also seen many reasons why macroevent formation may be blocked. The three key reasons are the following:
(4)
(a) Inappropriate construal of relations among subevents;
(b) Factors pertaining to world knowledge;
(c) Syntactic height effects relating to Op.
I claim that these effects can offer a unified explanation of the puzzles documented in Chapter 1. In the rest of this chapter, I will spell out what the factors in (4) predict. Turning first to the construal of relations among subevents, this is guided by two main factors: (5)
(a) The space of possibilities permitted by event structure in general; (b) The specific restrictions imposed by the element introducing the adjunct (if any).
(5a) refers to the fact that the admissible types of macroevent formation are, in and of themselves, quite restricted. We argued in Part I for a series of cognitive and linguistic units built from subevents in three stages. Firstly, we build core events according to a very restrictive template (Chapter 2). Core events consist of a maximum of two subevents, a process and a culmination, such that the process directly causes the culmination. Any more than two subevents, and any relation other than direct causation, are inadmissible within a core event. However, core events may be chained together to form extended events (Chapter 3). There is no upper bound to the number of subevents in an extended event, but extended event formation is constrained by the requirement that subevents are perceived by the relevant agent as connected by relations of causation or enablement, and correspond to the plan that that agent has when performing the initial subevent. Event formation privileges a class of contingent relations, then, consisting minimally (and, I suspect, maximally) of direct causation and enablement. These two relations are linked to temporal structure, in that if e1 causes or enables e2 , then it also precedes e2 temporally, although two events can of course be temporally ordered while remaining independent in terms of causation or enablement. This is, primarily, where (5b) comes in. Although the element introducing certain classes of adjunct, for example rationale clauses and possibly by-phrases, directly specifies a contingent relation between the events described by the matrix VP and the adjunct, more often (particularly with prepositions such as before and after), a temporal relation, or some other non-contingent relation, is specified. In either of these cases, however, the possibilities for macroevent formation are restricted. 
If the adjunct in question is a rationale clause, for example, then the adjunct event must be construed as the goal of an extended event, with the matrix event as the initial subevent
thereof. If, on the other hand, the adjunct is an after phrase, then we know that the matrix event follows the adjunct event in terms of temporal order. This is only compatible with macroevent formation if the adjunct event causes or enables the matrix event, rather than vice versa, as the opposite order would violate the mapping between temporal and contingent relations discussed in Section 3.2. Beyond these formal constraints, there are pragmatic and interpretive factors to consider (4b). There have long been claims for stored schemata representing our knowledge of how common situations unfold, for example the frames of Minsky (1974) or the scripts of Schank and Abelson (1977). Such frames or scripts surely influence our ability to perceive smaller events as part of a comprehensible whole. It is not automatically the case, therefore, that we are willing or able to take two subevents which are not explicitly specified to stand in a contingent relation, and arrive at an enriched interpretation where the two events are contingently related. Matters of world knowledge of which events usually lead to which other events, and, at a less automatic level, of which events meet the necessary conditions to be able to cause which other events (even if such a relation would be quite unusual) may make such enrichment either close to automatic or near impossible, depending on the descriptive content attached to particular event variables. This is a further way in which an attempted contingent construal can be infelicitous. As mentioned in (4c), we predict that, in addition to semantic and pragmatic effects relating to event structure, the possibility of extraction will also be constrained by height effects. 
Building on the theory developed in Chapter 4, extraction is not possible from an adjunct containing an occurrence of Op, and extraction is not possible from a verbal adjunct adjoined above an occurrence of Op in the matrix clause.1 This means, in more concrete terms, that extraction from only low adjuncts (roughly, the VP-adjuncts) is predicted (any higher and Op would already have been merged in the matrix clause), and extraction from only untensed adjuncts is permitted (or perhaps, as will be suggested in Chapter 6, even adjuncts without modals, auxiliaries, etc., since any more structure within the adjunct would require Op, on the theory of Chapter 4).

1 See Chapter 7 for a way of resolving the obvious clash between this claim and examples of successive-cyclic movement out of a finite embedded clause, which clearly involves movement past an occurrence of Op, if we follow the proposals of Chapter 4.

The following two chapters will test this intricate network of predictions. The primary testing ground will be extraction from three classes of verbal adjunct, to be introduced in Chapter 6. We will see that the Single Event Condition makes largely accurate predictions with respect to these three classes, although a substantial addition to the event structures described in Part I will be presented in Section 6.3.3 to account for some surprising interpretive asymmetries in the case of extraction from bare present participial adjuncts. It goes without saying, though, that these three classes of adjunct are not privileged with respect to the Single Event Condition. Ultimately, that condition will stand or fall according to its ability to predict acceptability of extractions more generally, in conjunction with a theory of syntactic constraints on movement. A comprehensive coverage of the interaction of the Single Event Condition with all the conceivable cases of extraction in natural language is far beyond the scope of the present work. However, Chapter 7 shows how the condition handles the sine qua non of any theory of A′-dependencies, namely their apparently unbounded character; and also how it interacts with a successive-cyclic theory of movement. This is placed in the context of a discussion of the interaction of presupposition, factivity, and event structure, working within the approach to presupposition pioneered by van der Sandt (1992). One potential benefit of this approach is a novel, nonsyntactic, explanation of the factive island phenomena investigated by Erteschik-Shir (1973), although this extension is not entirely unproblematic, as we shall see in Chapter 7.

By that point, we have a robust theory of the interaction of event structure with considerations of locality. The final substantive chapter, Chapter 8, explores the extent to which this event structure is genuinely independent of phrase structure, an important issue in the light of the theories of Lakoff (1970), Hale and Keyser (1993), and Ramchand (2008), among others, which seek to characterize event structure as directly represented in the narrow syntax.
Although theories such as these have huge explanatory potential, and have been employed in descriptions of scores of facts where the theory presented here has nothing to offer, my conviction is that a substantial part of the representation of event structure, as laid out in Part I, must be kept strictly separate from phrase structure. Clearly, I cannot state that there is no way to make the necessary distinctions within phrase structure, when our theory of phrase structure is also still in a process of elaboration. However, I will show in Chapter 8 that discrepancies between current models of phrase structure and the theory of event structure given above give grounds for scepticism about extending the more syntactocentric approach to cover such a model of event structure in its full generality. The challenge which then arises is to keep the real results of the syntactocentric programme, particularly those which relate event-structural patterns directly to patterns of morphology and/or argument structure, while allowing event structure what I believe to be the necessary degree of independence.
Chapter 8 will also consider the division of labour between syntax, semantics, pragmatics, and processing with respect to A′-locality more widely, in an attempt to address the question of why A′-movement should care about events. The final challenge in Part II is to give an answer to that obvious, but cussed, question.
6 Extraction from Adjuncts

In this chapter, we arrive at the main syntactic point of the present work. We will examine patterns of extraction from three classes of adjunct, namely rationale clauses, as in (1a); prepositional participial adjuncts, as in (1b); and bare present participial adjuncts, as in (1c).

(1) (a) What did you come round [to work on __]?
(b) Who did John get upset [after talking to __]?
(c) What did John come back [thinking about __]?
Each of these classes of adjuncts has a different profile of acceptable and unacceptable extractions. As we will see, to a large extent, this is as predicted by the Single Event Condition, in conjunction with the theory of event structure in Part I. However, the three classes are also not syntactically identical. The syntax of BPPAs, in particular, will prove to be interesting in its own right from the perspective of the extraction facts. The chapter devotes one section each (Sections 6.1 to 6.3) to the three classes of adjunct that we concentrate on here. This is followed by a conclusion in Section 6.4, and an appendix in Section 6.5 containing a very brief discussion of extraction from adjunct prepositional phrases. One thing that this chapter will not attempt to do is provide a general theory of adjuncts or adjunction. We do not need to care whether adjuncts have a meaningful syntactic identity with respect to locality or anything else. This is one way in which the present theory is substantially different from that embodied in the Condition on Extraction Domain or its successors. In much (though certainly not all) previous work on adjuncts and locality, the theory has been based on some unifying property of adjuncts, either taken in isolation or (more often) together with subjects. For example, the CED in Huang (1982) is built upon the common identity of subjects and adjuncts as constituents which are not properly governed, while the locality theory of Uriagereka (1999) unifies the two as Spellout domains. In contrast, the position taken here is that the diversity of the adjunct classes is crucial in accounting for their different behaviour with respect to extraction.
This gives the theory here a very different shape from most theories of adjuncts and locality, because this theory really doesn’t care very much at all about the fact that the particular domains considered here all happen to be adjuncts. There is nothing about the Single Event Condition, or the characterization of event structure, which would give adjuncts as a monolithic group any privileged status. Of course, this removes any possibility of making generalizations over subjects and adjuncts in the way which makes work stemming from Huang (1982) so conceptually appealing. Chapter 1 suggested that this is not such a bad thing, as the behaviour of subjects and adjuncts with respect to locality is not always so similar after all. I return to the issue in Chapter 7.
6.1 Rationale Clauses

Purpose clauses occupy a special place among adjunct classes because their permissiveness with respect to extraction, at least in English, has long been recognized, and is not under dispute. Faraci (1974: 20) was, to my knowledge, the first to explicitly recognize the possibility of extracting from purpose clauses, at a time, before Cattell (1976), when the ability to productively extract out of a class of adjuncts was not yet cause for surprise. Since then, the grammar of extraction from purpose clauses has been refined to some extent, notably by Browning (1987) and Jones (1991). I will concentrate here on one type of purpose clause, namely the rationale clauses. These are distinguished by their control properties. Rationale clauses are controlled by the subject of the clause to which they are attached, as in (2).

(2) I_i came here today [pro_i to talk about politics].

These examples must be distinguished from the superficially similar, but thematically distinct, classes of subject-gap and object-gap purpose clauses, illustrated in (3a,b), respectively.

(3) (a) John brought Bill_i in [pro_i to work on this car].
(b) John brought these tyres_j in [pro to put e_j on this car].

In a subject-gap purpose clause, such as (3a), the sole null argument is the subject, and this argument is coindexed with the object of the matrix, as opposed to the rationale clause in (2), where it is coindexed with the matrix subject. In an object-gap purpose clause like (3b), on the other hand, there are two null arguments. The subject of the purpose clause is preferentially, though not obligatorily, controlled by the matrix subject, while the object of
the purpose clause is coindexed with the matrix object. The fact that this gap is in object position means that it cannot be pro. Instead, it is typically taken, following Chomsky (1977), to be a null operator which moves to the front of the purpose clause by A′-movement. The reason for concentrating on a single type of purpose clause is purely expository. Extraction is possible from all three types of purpose clause, with a similar degree of freedom in each case, as shown in (4).

(4) (a) Which car_i did John bring Bill_j in [e_j to work on t_i]?
(b) ?Which car_i did John_j bring the tyres_k in [e_j to put e_k on t_i]? (Jones 1991: 91)
However, illustrating the points made below for all three classes would lead to a lot more bulk with no great increase in understanding, so I concentrate on rationale clauses, more or less arbitrarily, and leave it to the interested reader to verify that the same description holds of subject- and object-gap purpose clauses. The pattern of extraction from rationale clauses is very simple: extraction of the relevant class of DP complements is always possible, as shown in (5) below, repeated from Chapter 1. (5)
(a) Whose attention is John waving his arms around [to attract __]?
(b) What did you come round [to work on __]?
(c) Which paper did John travel halfway round the world [to submit __]?
(d) What did Christ die [to save us from __]?
Taking a broader perspective on this matter, however, a less trivial question emerges. This is the question of why it should be specifically this class of adjuncts which allows extraction so readily, as opposed to, say, after-phrases or BPPAs. Of course, it is possible to stipulate a syntactic story which facilitates extraction from this particular class of adjuncts in some way. The question is whether such a story has any ability to explain why extraction from specifically purpose clauses should be so free. From the perspective of the Single Event Condition, the answer is straightforward. As noted above, rationale clauses are distinguished from other types of adjunct under consideration by the fact that a contingent relation necessarily holds between the two events described in the matrix and adjunct VPs. Specifically, the adjunct describes the goal which the agent envisages when performing the action described in the matrix VP. So, in an example like (5), it is automatically possible to construe these two events as a single event, and so extract from the adjunct.
As we work through the different types of event-structural relations among adjunct and matrix VPs in this chapter, it will perhaps be useful to have a pictorial representation for the event structures under consideration. The relations holding in a structure with a rationale clause can be represented as in (6). This should be interpreted as follows. Each circle corresponds to a core event, and an arrow between two circles represents a contingent relation holding between two events. Finally, a sequence of circles enclosed within a dashed box corresponds to an agent’s plan. (6) John is working hard in order to pass his exam.
[Figure for (6): a sequence of core events, Work, ..., ..., Pass. Diagram not reproduced.]
This is the reason why extraction from specifically rationale clauses is so free, on the account proposed here. The semantics of the rationale clause entails that a contingent relation holds between the events described in the matrix VP and the adjunct, and the prediction made by the Single Event Condition is consequently clear: all else being equal, extraction from an untensed rationale clause should be automatically possible. This is almost what we find. As soon as we control for the distribution of rationale clauses in declaratives, the relevant cases of extraction from rationale clauses are almost always well-formed. We will come back to the exception shortly, after considering the general case. The distribution of rationale clauses in declaratives is constrained by one major factor, namely the requirement that the matrix event be under someone’s control, under normal circumstances by the matrix subject. This requirement, also a prerequisite for extended event formation, makes a rough initial cut between cases with matrix accomplishments and activities, on the one hand, and the other aspectual classes on the other: as accomplishments and activities readily allow agentive subjects, they also readily allow adjunction of a rationale clause, unlike the other classes. As points, states, and achievements are often nonagentive, however, rationale clauses are frequently infelicitous with these classes. This is illustrated in (7). (7) (a) Matrix VP describes accomplishment: John travelled to England [to make a sculpture of the Queen]. (b) Matrix VP describes activity: John is jumping up and down [to attract Mary’s attention]. (c) Matrix VP describes achievement: ??John arrived at base camp [to reach the summit in a few days].
(d) Matrix VP describes point: #John noticed the typo [to help the copyeditor]. (e) Matrix VP describes state: #John knew the answer [to frustrate the other pub quiz teams]. However, as expected, there are cases where canonical achievements and points do allow a rationale clause, with an agentive interpretation of the subject, as in (8a,b) and (8c), respectively.1 (8) (a) Christ died [to redeem our sins]. (b) I came in today [to talk to you about fly fishing]. (c) I tapped my nose [to signal the presence of an intruder to Mary]. This shows that the baseline for extraction from rationale clauses is one in which many possibilities are already ruled out for reasons independent of locality of movement. Once such factors are controlled for, however, extraction of complements from rationale clauses is generally free. Certainly, movement out of any of the foregoing grammatical examples is quite possible, as shown by (5) and (9). (9)
(a) What are you working so hard [to achieve __]?
(b) Who did John travel to England [to make a sculpture of __]?
(c) What did you tap your nose [to signal __ to Mary]?
1 See the discussion of (37) in Chapter 3 on the impossibility of rationale clauses modifying stative verbs.

Extraction of complements from rationale clauses is essentially unrestricted, then, once we control for two things. The first is the factors restricting the distribution of rationale clauses in declaratives in the first place. The second thing which needs to be controlled for, however, is who is in control of the matrix event. As mentioned in footnote 2 in Chapter 3, there are certain cases in which the matrix subject is nonagentive, and so cannot be responsible for the plan described by the rationale clause. Following Culicover and Jackendoff (2001: 504), there are two subcases of this. In the first, as in (10), the matrix clause is passive, in which case there is an implicit agent, who controls the rationale clause.

(10) The ship was sunk (by the owners) [to collect the insurance].

In the second class of cases, in the words of Culicover and Jackendoff (2001), ‘in the absence of any explicit or implicit Agent in the matrix clause, the in order to clause can be controlled by an implicit “stage manager” or “playwright” who has control over the course of the action’, as in (11)
(11) (a) The ship sinks [to further the plot].
(b) This story will appear on the back page [in order not to embarrass the president].

In either of these cases, and particularly in the latter, extraction from the rationale clause is substantially worse than the general case described above.

(12) (a) ??What was the ship sunk [(in order) to collect __]?
(b) *What does the ship sink [to show __]?
(c) *Who will this story appear in the paper [to please __]?

I must confess that I have little idea what’s going on here. The correct description of the facts is straightforward: the controller of the rationale clause must be the matrix subject (or, if the passive case is deemed acceptable, the controller must be a linguistically present individual, given the presence of implicit agents in passives). Clearly, the controller of the rationale clause in (11) is not the matrix subject: (11a) does not mean that the ship is expected to further the plot but rather that the ship’s sinking permits someone (author, narrator, actors, or whoever) to further the plot. The question, though, is why A′-movement would care about any of this. There is undoubtedly some extra semantic complexity in examples like (11), but, according to the criteria in Part I, such sentences still describe single events. I must leave this unresolved here, taking some comfort in the fact that things are at least as mysterious on a syntactocentric approach: there is a semantic difference here which has some chance of being pertinent, but no obvious syntactic difference of any relevance between good and bad cases of extraction from rationale clauses. There is, then, an even greater lack of good reasons why A′-movement would care about controller choice on any standard locality theory.
This puzzling state of affairs aside, the interface-based theory incorporating the Single Event Condition is already arguably showing slight advantages in comparison to more syntactocentric theories, when it comes to explanatory adequacy in this fairly small empirical corner. Most importantly, we can explain the fact that specifically this type of adjunct allows subextraction so freely, as opposed to simply describing it. This is a minor victory for the Single Event Condition.
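
The reasoning of this section can be made concrete in a small sketch. The Python fragment below is an illustration only, not part of the theory. It assumes, as a simplification, that contingent relations can be listed as ordered pairs of events, and that two events can be construed as a single event whenever a chain of such relations leads from one to the other. The function name and the event labels (including the intermediate steps standing in for the unspecified events of (6)) are hypothetical.

```python
from collections import deque


def single_event_construal(contingent_relations, matrix_event, adjunct_event):
    """Return True if a chain of contingent relations leads from the
    matrix event to the adjunct event (a simplifying assumption)."""
    # Build an adjacency map: event -> events it causes or enables.
    successors = {}
    for earlier, later in contingent_relations:
        successors.setdefault(earlier, []).append(later)

    # Breadth-first search along contingent relations.
    seen = {matrix_event}
    queue = deque([matrix_event])
    while queue:
        event = queue.popleft()
        if event == adjunct_event:
            return True
        for nxt in successors.get(event, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Example (6): working hard leads, via unspecified steps, to passing the exam.
# The intermediate labels are hypothetical placeholders.
plan = [("work", "step_1"), ("step_1", "step_2"), ("step_2", "pass")]
print(single_event_construal(plan, "work", "pass"))

# With no contingent relation listed, no single-event construal is found.
print(single_event_construal([], "work", "pass"))
```

The first call returns True and the second False. Since the semantics of a rationale clause entails a contingent relation, the first situation is the one that always obtains with this class of adjunct, which is why extraction is predicted to be free. The sketch says nothing about the controller effects in (12), which remain unexplained.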
6.2 Prepositional Participial Adjuncts

The rationale clauses discussed in the previous subsection represent the simplest and clearest case where the Single Event Condition predicts extraction out of an adjunct to be acceptable, providing syntactic constraints on extraction are met. The element introducing the adjunct specifies that the events described in the matrix and the adjunct VPs are contingently related, meaning that macroevent formation, and consequently extraction, is always possible. And apart from one significant wrinkle, that prediction is correct. However, things are not always so clear-cut. In many cases, the patterns of extraction from adjuncts are either much more complex, or much more variable in terms of acceptability. In itself, this latter fact presents a problem for a purely syntactic approach to extraction patterns, as there is currently no clear way of accounting for such variability in many major current models of grammar. Hopefully, it will become clear that the interface-based approach is better equipped to deal with these phenomena.

In this section, I present a more complex pattern of data, where the Single Event Condition correctly predicts that we will find less categorical grammaticality judgements. This concerns the class of prepositional participial adjuncts. Although these form a natural class based on their internal syntactic structure, the specified relation among events is here dependent on the semantics of the preposition introducing the adjunct, and so we may expect to find extraction patterns from these adjuncts varying according to the choice of that preposition. Moreover, there are effects of syntactic height to consider, as it has been claimed (e.g. by Cinque 2003) that regular PPs are not freely ordered within the clause, suggesting differences among PPs in height of attachment.
I will not attempt a systematic discussion of extraction from every class of prepositional participial phrase in English here, but rather choose cases which are illustrative of the range of possibilities we predict on the approach developed above. So I will present data concerning one preposition (by) which appears, at least at first sight, to specify a contingent relation; two prepositions (before and after) which specify noncontingent relations, but which may receive enriched, contingent interpretations congruent with the requirement, discussed with reference to (15) in Chapter 3, that a causing (or enabling) event must temporally precede the caused (or enabled) event; and two prepositions (since and upon) which resist any extraction out of their complement. Before we proceed, it must be noted that the grammaticality judgements presented here are perhaps the most marginal of any discussed in this monograph. Many native speakers of English are very reluctant to accept cases of
extraction from many of the above classes (including Browning 1987, who frequently assumes similar examples to be very marginal). Accordingly, the judgements given below will often be marked to reflect the contrast between acceptability for some speakers (%) and general rejection (*). However, it is still legitimate to report these findings, given the existence of speakers (including, but certainly not limited to, myself) whose idiolects allow extractions in the pattern reported here. Moreover, it will become apparent as we proceed that this is in fact the area where the greatest amount of speaker variation in acceptance is predicted by the present approach.

6.2.1 Patterns of Extraction from By-participial Adjuncts

On the assumption that a by-phrase asserts that the matrix and adjunct events in question are contingently related, such that the adjunct event causes or enables the matrix event, by-phrases are predicted to allow subextraction quite freely. In fact, that proves to be the case. Certainly, plenty of grammatical cases of extraction out of such adjuncts can be found. (13) gives a selection.

(13) (a) %Which speech did John make his point [by reciting __]?
(b) %Which item of furniture did John upset his hosts [by eating __]?
(c) %Which path did John reach the summit [by walking along __]?
Once again, the distribution of by-phrases in declaratives is restricted. Two major sources of ungrammaticality can be isolated. The first stems from an apparent restriction on the combination of aspectual classes in declarative examples containing by-phrases. The pattern is an unusual one, and I do not fully understand it, but I have been unable to find acceptable cases of by-phrases in the following aspectual configurations (although here, too, individual judgement patterns vary somewhat).

(14) (a) Matrix VP describes an accomplishment, adjunct describes an achievement: *John drove Mary crazy [by reaching the summit].
(b) Matrix VP describes an accomplishment, adjunct describes a point: *John drove Mary crazy [by noticing the problem].
(c) Matrix VP describes a point, adjunct describes an accomplishment: *John noticed the problem [by building a prototype].
(d) Matrix VP describes a point, adjunct describes an achievement: *John noticed his brother [by turning up].
(e) Matrix VP describes a point, adjunct describes a point: *John noticed his brother [by turning the telescope on].
(f) Matrix VP describes an activity, adjunct describes an accomplishment: ??John works on the project [by building a prototype].
(g) Matrix VP describes an activity, adjunct describes an achievement: ??John works on the project [by turning up].
(h) Matrix VP describes an activity, adjunct describes a point: ??John works on the project [by noticing a problem].

The second major source of ungrammaticality for by-phrases is apparently related to competition with the class of BPPAs, to be discussed below. We will see in Section 6.3 that BPPAs are interpreted in some cases as describing causes of the event described in the matrix VP. Such cases clearly overlap significantly with the interpretation of by-phrases, and in many cases, only one of the two options feels natural. This is illustrated below.

(15) (a) John opened the door [*(by) pressing this button].
(b) John turned the house upside down [(*by) looking for his glasses].
Although the details again escape me, it seems reasonable to claim that this pattern is related to some further specification of the semantics of by beyond a purely contingent relation. Specifically, it appears that a by-phrase, above and beyond the causal relation that it specifies between the two subevents, also involves a ‘means’ component. For example, in (15a), an acceptable interpretation is that the means by which John opened the door was pressing the button. However, in (15b), it is absurd to talk of the means by which John turned the house upside down. Whether this suggestion generalizes to the full range of environments in which by occurs, though, remains to be determined. In sum, although the factors governing the distribution of by-phrases in declaratives may be somewhat puzzling, extraction from by-phrases, as in (13), does appear to be generally possible once such factors are controlled for. This provides a good baseline for further exploration of extraction out of prepositional participial adjuncts, where we will find more restrictive patterns.

6.2.2 Patterns of Extraction from Before- and After-participial Adjuncts

The primary semantic function of before and after is to relate the events described in the matrix and adjunct clauses temporally. As we have seen in Section 2.1, purely temporal relations fall outside the class of contingent relations. Two events standing in a relation of temporal contiguity therefore do not automatically qualify as a single macroevent, and, as a result, extraction from a before- or after-phrase is not predicted to be automatically possible according to the Single Event Condition. However, as we saw in Section 3.2, this primary specification of a temporal relation between two events can be enriched in such a way that it comes to be interpreted as a contingent relation,
subject to world knowledge. In this way, the interpretation of the relation between events described in the two phrases linked by before or after depends on our interpretation of those two phrases. As we also saw in Section 3.2, world knowledge makes it much less plausible to infer a causal relation in (16b) of Chapter 3, repeated as (16b) below, than it is in Chapter 3’s (16a) ((16a) below), where the inference that the two events are causally related is quite natural. However, given the right context, even (16b) could be coerced into a causal interpretation.

(16) (a) John collapsed after colliding with a lamp post.
(b) John collapsed after reading The Master and Margarita.

Following Zwicky and Sadock (1975) and earlier work summarized therein, we can apply tests along these lines to show that the single-event and multiple event readings of such sentences really are distinct readings rather than a single vague reading.2 Firstly, coordinating a PP which favours a causal interpretation with one which allows such an interpretation, but does not favour it, brings out the causal reading of the second PP more clearly.

(17) I collapsed after colliding with a lamp post and after reading The Master and Margarita.
2 Actually, one of the tests discussed in Zwicky and Sadock (1975) does not give this result. This is the test by contradiction. If sentences such as those in (16) are ambiguous, and if one of the senses (the causal one) entails the other, but not vice versa, we should be able to conjoin a sentence with its negation without contradiction. So That’s a dog, but it isn’t a dog can be interpreted as stating that ‘that’ is canine but isn’t a male canine (Zwicky and Sadock 1975: 7). However, John collapsed after colliding with a lamp post, but he didn’t collapse after colliding with a lamp post is a straightforward contradiction. I have no idea why this specific ambiguity test should differ from the others.

Secondly, if the subject is a coordinated NP, the same reading is forced on both conjuncts. If we try to interpret the following in such a way that The Master and Margarita caused the collapse of John, but was not causally related to the collapse of Mary, we derive what sounds like a bad joke. This interpretive oddity can be brought out further by adding appositive modifiers to the coordinated NPs, favouring the two readings, as in (18b).

(18) (a) John and Mary collapsed after reading The Master and Margarita.
(b) John, who is very sensitive, and Mary, who is a narcoleptic, collapsed after reading The Master and Margarita.

Finally, parallelism requirements on do so-ellipsis force the same reading across conjoined sentences in a case like (19). Attempts to derive mixed readings result in the same bad-joke interpretation that we saw above.
(19) John, who is very sensitive, collapsed after reading The Master and Margarita, and Bill, the narcoleptic, did so too.

This shows that, although availability of a contingent interpretation of before or after is very much a gradient phenomenon, based on an individual’s willingness to admit an interpretation of two events as causally related, the two interpretations are nevertheless distinct (see also the discussion of before in Zwicky and Sadock 1975: 20–1). We have also seen semantic reasons why one or the other reading may be disallowed. In a case such as (16a), construal of the two events as contingently related is almost automatic, while in a case such as (18) from Chapter 3, repeated below, a causal interpretation of the relation between events is impossible, as the temporal order specified by before is incompatible with the would-be enriched causal order.

(20) We had a weird night last night. We fell through a wormhole into a parallel universe where the flow of causation was reversed so that a cause happens after its effects. It was terrible. John collapsed before colliding with a lamp post. So don’t expect too much from him today.
As extraction from a verbal adjunct requires a contingent relation, we predict that, where we find gradience in the availability of the contingent relation, this gradience will carry over directly to the grammaticality judgements for extraction from before- and after-phrases. We will see below that this is just what we find. There is, however, a further complicating factor. We predict the acceptability of extraction from this class of adjuncts to be dependent on the possibility of construing the matrix and adjunct events as contingently related. However, extraction from an adjunct crucially removes part of the description of the adjunct event. Consider the question in (21). (21)
%Which book did John design his garden [after reading __]?
Is it plausible to consider the book-reading and garden-designing events as contingently related? That depends on the choice of book. Certain books (Finnegans Wake, for example) have nothing to do with garden design, while other books (such as The Essential Garden Design Workbook by Rosemary Alexander) have as their raison d’être the enablement of garden designing. Whether or not the requirement of a single event in (21) is satisfied depends, in that case, on the choice of answer to the question. In this way, the requirement which the question imposes of a single-event description, as a result of the wh-movement, comes to give the appearance of being a requirement imposed on the answer to the question. This can be seen by the following judgements: although some speakers never fully accept questions such as (21), for many
speakers, a choice of answer as in (22a) ameliorates the dialogue in comparison to (22b), and may even make it fully acceptable. (23) gives a parallel set of examples with before instead of after.

(22) (a) (A) %Which book did John design his garden [after reading __]?
(B) The Essential Garden Design Workbook by Rosemary Alexander.
(b) (A) %Which book did John design his garden [after reading __]?
(B) #Finnegans Wake.

(23) (a) (A) %Which professor was John working so hard [before meeting __]?
(B) The one who decides whether he gets the job or not.
(b) (A) %Which professor was John working so hard [before meeting __]?
(B) #The one who lives next door who he plays golf with.

In contexts where readings such as (22a) or (23a) are preferred, extraction from before- and after-phrases is quite possible for many speakers.3 Some further examples are given in (24).4

(24) (a) %Which professor did John rewrite his paper [after meeting __]? [Acceptable if the meeting was related to the paper, unacceptable if purely temporal].
(b) %Which picture is John doing lots of research [before looking at __]? [Acceptable if the research is connected to the painting, but not otherwise].

3 Although we expect a degree of speaker variation in this area, we may wonder why the level of variation is so high, and indeed why some speakers consistently reject examples such as (22b). We can imagine many hypotheses in this respect. For example, it may be that the extra pragmatic step of enrichment required to bring an example like (22b) into line with the Single Event Condition is sufficient to push such examples over a threshold for many speakers. In other words, speakers may vary in the amount of effort they are willing to expend to arrive at a grammatical reading for an example. Such hypotheses readily suggest themselves, and are testable. Unfortunately, though, testing them is beyond the scope of the present work.

4 The characterization given above does not strictly fit every case of extraction from before-adjuncts. The exceptions are examples like %What did John die before finishing?, where the interpretation is clearly not that the death causes the finishing. Instead, the death interrupts a chain of contingently related events which were intended to lead to completion of the project. I am confident that minimally expanding the definition of macroevent to cover such cases is possible, but I do not have a concrete suggestion for how to go about it at present.

Further support can be given to the claim that the answer to such questions must be compatible with a contingent interpretation, by considering ways in which speaker B may expand upon his answer. (25) shows that, unsurprisingly, a case such as (22a) is unaffected by a further statement asserting that reading Rosemary Alexander’s book enabled the designing of the garden. Strikingly, however, (26) shows that even the answer Finnegans Wake can be rescued if speaker B asserts that, implausibly, that book proved to be useful in matters of garden design.

(25) (A) %Which book did John design his garden [after reading __]?
(B) The Essential Garden Design Workbook by Rosemary Alexander. It really helped.

(26) (a) (A) %Which book did John design his garden [after reading __]?
(B) Actually, it was Finnegans Wake. It really inspired him, believe it or not.
(b) (A) %Which book did John design his garden [after reading __]?
(B) #Finnegans Wake. It had nothing to do with designing the garden, though. Things just turned out that way.
It is not straightforward to test the parallel negative prediction, namely that extraction from the adjunct should be ungrammatical in cases where the descriptions of the relevant events make enrichment to a contingent relation implausible. This is because the possibility of construal of two events as contingently related is dependent on the creativity of an individual, and finding examples which absolutely preclude this possibility is near impossible. However, examples such as the following, chosen so as to make a contingent construal particularly unlikely, are generally rejected (although even here, if we imagine a context in which a glass-breaking ritual bestows good luck on the recipient of the letter, these examples don’t sound so bad, to my ears at least). This, together with the contrasts noted in (22–23) and (25–26), supports the general approach developed here. (27)
(a) (*)Which letter did John break a glass [after writing ]?
(b) (*)Which letter did John break a glass [before writing ]?
In sum, extraction from participial adjuncts introduced by before and after shows all the characteristics predicted on the current approach. Extraction is dependent on an enriched interpretation of these temporal prepositions, where the matrix and adjunct events are contingently related. Acceptability of extraction from such adjuncts is therefore gradient rather than categorical, and is dependent on an individual’s willingness and ability to see a contingent relation between the two events in question. Moreover, the plausibility of such
a relation is dependent not just on the form of the question itself but also on the expected answer to the question, which plays a vital part in determining whether a contingent relation among events is available.
6.2.3 Patterns of Extraction from Since- and Upon-participial Adjuncts
The pattern of extraction from this third class of prepositional participial adjuncts is a simple one: extraction is impossible and no amount of manipulation of events can help. An illustrative selection of examples is in (28–29).
(28)
(a) John has been grinning manically [since meeting the evangelist].
(b) *Who has John been grinning manically [since meeting ]?
(29)
(a) John rushed over [upon hearing the good news].
(b) *What did John rush over [upon hearing ]?
As ever, the question is why extraction from this particular class is so categorically bad. One approach to these data is to argue for a syntactic height effect, banning extraction from since- and upon-phrases because they are attached too high. In the terms of the present theory, this means that adjuncts headed by since or upon necessarily adjoin above the highest point at which the operator Op described in Chapter 4 can merge. I will tentatively adopt this suggestion here, although the evidence in favour of it is admittedly slight. Certainly, relative orders of the various classes of participial adjuncts discussed here are essentially free. For example, by-phrases can both precede and follow a since- or upon-phrase, as shown below.
(30)
(a) John has been irritating his colleagues [since meeting the clairvoyant] [by talking constantly about the future].
(b) John has been irritating his colleagues [by talking constantly about the future] [since meeting the clairvoyant].
(31)
(a) John hid himself away [upon receiving the warning] [by travelling to Siberia].
(b) John hid himself away [by travelling to Siberia] [upon receiving the warning].
However, this is, in itself, inconclusive.
All our theory tells us at present is that it is possible for by-phrases to attach low, below Op. We have no evidence that it is necessary. It is quite possible, then, that since- and upon-phrases necessarily attach above Op, with the free orders attested in (30–31) being the product of multiple available attachment sites for by-phrases. This would be the case if there were a structural distinction as in (32), where the position of the since or upon phrase remains constant, above Op, while the by-phrase is free to merge either side of it.
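(32) [Tree diagrams not reproduced; node labels: TP, John, T, PP, Op, VP. In both trees the since/upon-PP is adjoined above Op. (a) The by-PP is adjoined higher than the since/upon-PP, and so also above Op. (b) The by-PP is adjoined below Op, within VP.]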
This makes one prediction which appears to be borne out. If the word order difference in (30–31) is indeed due to the availability of two attachment sites for by, we predict extraction to be impossible from the higher attachment site, above Op, illustrated in (30a) and (31a). This correctly derives the contrast between (33a), in which the by-phrase is in the lower position, and (33b), in which it is in the higher position, and likewise for the upon case in (34).
(33)
(a) %What has John been irritating his colleagues [by talking about ] [since meeting the clairvoyant]?
(b) *What has John been irritating his colleagues [since meeting the clairvoyant] [by talking about ]?
(34)
(a) %Which country did John hide himself away [by travelling to ] [upon receiving the warning]?
(b) ??Which country did John hide himself away [upon receiving the warning] [by travelling to ]?
However, there are too many poorly understood and apparently conflicting factors influencing the possibility of extraction from multiple PPs to have much faith in these data alone. In fact, though, there is further slight evidence supporting the claim that since- and upon-phrases attach outside Op, from their interaction with perfectivity. According to the account developed in Chapter 4, single-event and multiple-event readings of a given constituent arise from a structural ambiguity in the attachment site of Op, which merges freely, constrained only by a requirement that it be merged once on the path from a verbal head to the first c-commanding Aux or T head. Now, both since and upon interact to a significant extent with perfectivity, associated with outer aspect and so with the auxiliary system. As shown in (35–36), since-phrases require that the phrase to which they attach be perfective, while upon phrases require the opposite.5 (35) (a) John has been grinning manically since meeting the evangelist. (b) *John grinned manically since meeting the evangelist. (36) (a) *John has been grinning manically upon meeting the evangelist. (b) John grinned manically upon meeting the evangelist. This sensitivity to perfectivity is not found with any of the other classes of prepositional participial adjunct discussed above. (37) (a) John has come to stay before flying home tomorrow. (b) John came to stay before flying home the following day. (38) (a) John has recovered fully after visiting the doctor. (b) John recovered fully after visiting the doctor. (39) (a) John alienated all his friends by stealing their belongings. (b) John has alienated all his friends by stealing their belongings. Although it is far from conclusive, the contrast between (35–36) and (37–39) may suggest that since- and upon-adjuncts are restricted to a higher portion of the clause, in the auxiliary range. 
If so, the automatic ungrammaticality of extraction from such adjuncts would follow naturally from the fact that they would always be merged above Op, and so single-event readings would be inaccessible to them. Certainly, this is just the sort of effect that the theory developed in Chapter 4 leads us to expect.
5 Hornstein and Weinberg (1981) report that a similar claim has been made for since by Dresher (1976).
To summarize the findings concerning extraction from prepositional participial adjuncts, we have made use of syntactic, semantic, and pragmatic
factors to account for a complex and only partially understood pattern. A first, syntactic, cut can be made between those prepositions (since and upon) which necessarily attach too high for extraction, and those (by, before, and after) which attach sufficiently low for extraction. Within the latter class, extraction is dependent on the type of relation specified by the preposition, which may be contingent (by), and so allow extraction in the general case, or basically noncontingent (before and after), and so only allow extraction subject to the pragmatic feasibility of an enriched, contingent interpretation. All the patterns described in this section accord with the predictions of the Single Event Condition together with the theory of event structure developed in Part I. On the other hand, they remain a mystery on any account which is purely syntactically driven, as the height effect which distinguishes since and upon from before, by, and after is too coarse to capture the further subtle patterns within this latter class. In the following section, the event-structural approach will be confronted with data from a third and final class of verbal adjuncts.
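The two-way cut summarized above can be laid out as a small decision procedure. The following Python sketch is purely illustrative: the function name `extraction_status`, the outcome labels, and the Boolean parameter `contingent_reading_plausible` are assumptions introduced here rather than part of the theory's own apparatus, and the gradient, speaker-dependent judgements discussed in this section are flattened into three coarse outcomes.

```python
# Illustrative sketch only: all names and outcome labels are hypothetical.
# It does not model by-phrases attached in the higher position, as in (33b).

HIGH_ATTACHING = {"since", "upon"}   # taken to adjoin above Op
CONTINGENT = {"by"}                  # relation is contingent in the general case
TEMPORAL = {"before", "after"}       # basically noncontingent


def extraction_status(preposition, contingent_reading_plausible=False):
    """Return a coarse label for extraction from a prepositional participial adjunct."""
    p = preposition.lower()
    if p in HIGH_ATTACHING:
        # Syntactic cut: no single-event reading is accessible above Op.
        return "excluded"
    if p in CONTINGENT:
        return "possible"
    if p in TEMPORAL:
        # Pragmatic cut: extraction depends on enrichment to a contingent relation.
        return "possible" if contingent_reading_plausible else "degraded"
    raise ValueError("preposition not covered in this section: " + preposition)
```

For instance, `extraction_status("after", contingent_reading_plausible=True)` returns `"possible"`, corresponding to the dialogue in (22a), while `extraction_status("since")` returns `"excluded"`, corresponding to (28b).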
6.3 Bare Present Participial Adjuncts
6.3.1 Bare Present Participial Adjuncts in Declaratives
Bare present participial adjuncts, the final class of adjuncts under investigation here, differ from the other classes discussed above in that they do not contain an overt introducing element parallel to in order (Section 6.1) or a preposition (Section 6.2).6 In fact, there is evidence suggesting that BPPAs are very small indeed. The facts to be discussed here are based on work by Seth Cable (2004) on -ing forms and restructuring in English. Cable’s hypothesis is that -ing, which shows up in a range of unrelated syntactic environments, is a morphological default in English, like more familiar cases of infinitive forms as morphological defaults. It therefore occurs in restructuring contexts, among others. On an approach to restructuring such as that of Wurmbrand (2003), restructuring is not an operation, but a property: restructuring complements are smaller than other verbal complements. Although the term restructuring is not typically applied to configurations other than head–complement, some of the restructuring diagnostics used by Cable have more general applicability, suggesting the same syntactic smallness holds of -ing-VPs in BPPAs.
6 Many of the data covered in this section have been discussed from a slightly different perspective in Truswell (2007a).
Firstly, the temporal location of the event described by BPPAs is very tightly pegged to that of the matrix event. This means that independent specification
of the temporal location of the adjunct event by an adverbial, as in (40a), is typically ungrammatical, as is the case for the -ing complement of a classic restructuring predicate, as in (40b).7 (40) (a) *At 2pm, John arrived [whistling {at 3pm/at 1:30pm/tomorrow}]. (b) *I tried [leaving tomorrow] (but the airline didn’t have any tickets). Secondly, Wurmbrand (2003) notes that partial control of restructuring predicates is impossible. Once again, the same is true of BPPAs. (41) (a) *John drove Mary crazy [working together]. (b) *I tried [dancing together]. Finally, BPPAs, like restructuring complements, are incompatible with clausal negation. However, some care has to be taken in arriving at this judgement to avoid two potential confounds. The first potential confound is the danger of confusing constituent negation (which appears to be marginally acceptable in BPPAs) with clausal negation. (42a) is therefore more or less acceptable on the reading where John’s not fixing the radiator is taken as a wilful act on John’s part. What is missing, though, is a reading which could be paraphrased as John drove Mary crazy because he didn’t fix the radiator. Similar concerns arise in the restructuring case (42b). The second potential confound is that negation, unlike the previous tests, appears to be acceptable in string-identical cases where the adjunct is separated from the matrix VP by a prosodic break. Such a break is presumably indicative of a larger structure within the adjunct. Provided we avoid these pitfalls, BPPAs pattern with -ing complements of restructuring verbs. (42) (a) *John drove Mary crazy not fixing the radiator. (b) *John tried not fixing the radiator. Cable adds five further tests for the smallness of -ing VPs. However, these yield somewhat more equivocal results, both when applied to complements of restructuring verbs and to BPPAs. The three tests above, however, already suggest that there is very little syntactic structure within BPPAs. 
7 The (b) examples in the following discussion are all taken from Cable (2004). Some of the diagnostics Cable uses are only applicable in a somewhat problematic way. For example, it is not entirely clear to me that the impossibility of (40a) could not be explained with reference to the entailment properties of such examples (John arrived whistling implies John whistled, which isn’t true for John tried whistling). However, I do not see a complete explanation along such lines, and it would appear to be very unlikely in any case that all of the diagnostics of syntactic smallness listed below will prove to be amenable to such independent explanations.
To be clear, I am not claiming that BPPAs exhibit restructuring effects. Cable claims that long passives, the classic restructuring effect which is most likely
on theoretical grounds to appear in English, can in fact be found in examples like (43).
(43) This floor needs [washing ] (Cable 2004: 9)
However, an important part of such constructions is that the matrix verb does not assign a θ-role to this floor. Similar matrix predicates are not available with BPPAs. Moreover, there does not appear to be any way of demoting the agent within a BPPA, so examples like the following are, at best, incomprehensible. (44)
#The package arrived carrying [intended: the package arrived with someone carrying it]
If constructions like the long passive are taken as indicative of restructuring, then BPPAs are not restructuring constructions. However, we do not need them to be. All that is important for us is that BPPAs are syntactically very small (say, VP with no additional functional structure), in the same way that restructuring complements are. This syntactic smallness is probably related to a semantic fact, discussed in Section 2.3, namely that the relation between matrix and adjunct VPs is tighter in the case of BPPAs than in the cases of rationale clauses or prepositional participial adjuncts. Specifically, in the case of the previous two classes of adjunct, both the adjunct VP and the matrix VP describe a whole core event. The question was then whether those two core events could be interpreted as contingently related as required by the Single Event Condition. Here, on the other hand, the matrix and adjunct VPs jointly describe a single core event. Although things will get slightly more complicated below, the major restriction that this imposes is that, in the case of a BPPA, the adjunct and matrix VPs may not jointly describe an extended event (except insofar as a core event counts as a degenerate extended event). The formation of a core event by a matrix VP and a BPPA doesn’t leave any extra material, and so there is nothing there which may constitute an extended event. The possibilities for macroevent formation are therefore much more restricted in the case of BPPAs than for the classes of adjuncts discussed in previous sections. The subevents described by a BPPA and the matrix VP to which it is attached must be identified as the two subevents of a single core event if the Single Event Condition is to be satisfied and extraction from the adjunct permitted. 
In this respect, the BPPA example in (45c) patterns together with the resultative construction (a classic case of a syntactically very small adjunct participating in relations of direct causation) in (45d), in contrast to the enablement
examples (45a,b).8 Resultatives and BPPAs show very similar syntactic and semantic profiles, then: they are small constituents, attaching very low within VP, and semantically ‘inseparable’ from the content of the matrix VP. (45) (a) John came to England in order to visit the Queen, but he never got to see her. (b) John emptied the hearth before making a fire, but he never got round to making a fire. (c) #John drove Mary crazy whistling but he didn’t {drive Mary crazy/whistle anything}. (d) #John hammered the metal flat, but {he didn’t do any hammering/the metal wasn’t flattened}. The events described by a matrix VP and a BPPA attached to it cannot be construed as related by enablement, then: extended event formation requires that the adjunct have more semantic independence from the matrix event than it does here. This leaves only a narrow range of options for interpreting BPPAs. In fact, the following exhausts the possibilities. (46)
(a) The matrix and adjunct events are interpreted as conjoined, and temporally overlapping (two separate core events); (b) The matrix and adjunct events are interpreted as jointly forming a single core event.
Both of these possibilities are in fact attested. The former possibility, (46a), is clearly visible in an example like (47), interpreted as meaning that John listens to music while he does his work. (47)
John works [listening to music].
A parallel reading is marginally available in an example like (48), as well, but this time, it is not the most salient interpretation.
(48) John drove Mary crazy [whistling the Marseillaise].
A while-reading, corresponding to (46a), of the relation in (48) can be brought out by enriching the context, as in (49).
8 Some speakers apparently interpret (45b) as presupposing that a fire is necessarily made. This presumably reflects a stronger preference for a purely temporal reading of before for those individuals: the temporal reading implies the occurrence of the fire-making event, whereas the enablement reading only implies the existence of a plan to make a fire. We might expect that there would be an inverse correlation between a speaker’s intuition that (45b) presupposes the making of a fire, and that speaker’s willingness to accept extraction from such adjuncts, because the former property relies on the relative salience of the purely temporal reading, while the latter relies on the relative salience of the enablement reading. I have not yet tested this prediction.
(49) John drives Mary crazy every day, usually by spending hours cleaning their carpet with a tiny brush, in absolute silence. Yesterday, John drove Mary crazy once again, but instead of doing so in silence, the remarkable thing about yesterday was that John drove Mary crazy whistling the Marseillaise. (49) surely does not represent the most salient reading of (48), however. Instead, (48) is most naturally interpreted as stating that John’s whistling is the cause of Mary’s craziness. I suggest that this comes about through option (46b): the matrix and adjunct events jointly describe a single core event. The whistling process described in the adjunct is construed as the direct cause of Mary’s craziness. This can be schematized as in (50b), in comparison with the regular accomplishment John built a house in (50a).9
(50) [Diagrams not reproduced: each plots Change (from insignificant to significant) against Time. (a) is labelled build(j) and ∃x.house(x); (b) is labelled whistle(j) and crazy(m).]
The obvious question now becomes one of what regulates the different interpretations that we find in (47) and (48). In the former case, an interpretation along the lines of (46a) is salient, and an interpretation along the lines of (46b) is unavailable. In the latter case, both interpretations are available, but the (46b) interpretation is much more natural in a neutral context than (46a).
9 Neither of these diagrams is particularly felicitous, because they obscure the fact that the state of the house measures out the building, and the degree of craziness measures out the whistling. There is no theoretical significance to these diagrams, though: they are only intended to illustrate the interpretive similarity.
In fact, there is a clear link between these patterns of interpretation and lexical aspect. Telic and atelic predicates are distinguished in that the latter do not have a culmination, Vendler’s typical endpoint ‘which has to be reached if the action is to be what it is claimed to be’ (Vendler 1957: 145; see also Section 2.3). That endpoint represents a linguistically significant change of state, which is a prime candidate for the caused event in a structure such as those in (50): in the prototypical case, causal relations bring about just such a change of state, and the maximal core event structure of Section 2.3 requires that the second subevent of any core event correspond to a change of state. We may expect, then, that atelic matrix predicates modified by BPPAs do not have interpretations such as (46b), illustrated diagrammatically in (50), available to them: there is no core event structure corresponding to a non-culminating event which nonetheless contains two differentiated, non-culminating subparts. Interpretations such as (46a) are then expected to be the only interpretations available when both the matrix and adjunct predicates are atelic. Cases of BPPAs where both the matrix VP and the adjunct describe atelic events must be interpreted conjunctively, along the lines of (46a), then. Moreover, a BPPA must describe an atelic event, presumably as a consequence of the semantics of the -ing morpheme, which requires that the event described by the verb to which it is attached is ongoing.10 If the matrix VP describes a telic event, however, with the adjunct describing an atelic event, option (46b), of a single-event interpretation, becomes available.11 The matrix VP denotes a predicate containing a culmination, and the adjunct can be interpreted as describing the process which leads to that culmination. In that way, the two constituents which describe events each describe one subevent of a single core event.
This leads to the following division:
(51)
(a) Atelic matrix VP: conjoined interpretation of matrix and adjunct events (47) only.
(b) Telic matrix VP: conjoined (49) or single-event (48) interpretations of matrix and adjunct events available.
10 It may be objected that this cannot always be the case for -ing, as shown by an example like John felt sick after eating the oyster, where the eating event is clearly completed. Certainly, in the morphosyntactic environment of a BPPA, however, the claim holds up. If we examine instead John felt sick eating the oyster, we note that eating the oyster has been coerced into an atelic activity interpretation. Again, we see that in the case where the -ing verb is embedded in a very small constituent like a BPPA, the range of semantic options is more restricted than in larger constituents like prepositional participial adjuncts.
11 See Chapter 8 for a possible reason why the single-event reading of this configuration is preferred over the conjunctive reading.
When the matrix VP is telic, the default interpretation is often the single-event reading. This is not always the case, however. Contrast (48) with examples such as those in (52), which consist of the same basic configuration of accomplishment-denoting matrix VP with activity-denoting adjunct.
(52)
(a) John painted this picture trying to express his inner rage.
(b) John built his house thinking it would be a nice challenge.
In these latter cases, although it is quite plausible that trying to express his inner rage might cause John to paint the picture, or thinking it would be a nice challenge might cause John to build the house, the causal relation in these cases is not direct. What brought about the existence of the picture, or the house, is not John’s mental state as expressed in the respective adjuncts but rather the painting and building processes described by the matrix verbs themselves. This suggests that the relation between the events described in the matrix and adjunct VPs in this case is not of the right type to form a single core event, because only direct causation is admissible within core events. We must instead claim that sentences such as (52) are interpreted along the lines of (46a), in that the two events are interpreted as conjoined, not as the two subparts of a single core event. A fair paraphrase for (52a) may be John was trying to express his inner rage when he painted this picture, for example. As discussed in Section 2.3, this is not an arbitrary fact but rather part of a larger pattern. The reason why we cannot interpret the BPPAs in (52) as direct causes is because a direct causal relationship is already specified within the matrix VP: the painting is the direct cause of the existence of the picture, and so there is no room left within the Maximal Core Event template for any other direct cause. The only way left to form a coherent event structure, given the descriptive content originating from the matrix VP and the constraints on interpretation of BPPAs, is a conjoined, multiple-event interpretation. Still another pattern can be found by considering BPPAs modifying achievement verbs. On the theory presented in Section 2.3 and Chapter 3, achievements are identical to accomplishments in terms of subevent structure, but are distinguished in that they resist interpretations with agentive preparatory processes. 
In that case, we might expect single-event readings, similar to (46b), to be available in this case too. In fact, such a reading is available, but it is somewhat different from cases such as (48) above. Consider the following: (53)
(a) John arrived [whistling the Marseillaise]. (b) John came back from his travels [thinking he was invincible].
In such cases, we do not consider that whistling the Marseillaise caused John to arrive, or that thinking he was invincible caused John to come back from his travels. Instead, the relationship is, at first sight, much closer to the conjoined
readings as in (46a) than the single-event reading. However, there is a twist in this case. Consider again a sentence such as (47), in which a BPPA modifies an activity VP. In such cases, the working event and the listening-to-music event have to overlap temporally. This is not so in examples following the same aspectual schema as (53), however. To see this, consider the following. (54) John died [whistling Ode to Joy]. Temporal overlap is a clearly inappropriate characterization of the relation between the adjunct event and the matrix culmination in this case, as dead men do not whistle. Instead, the truth conditions on (54) include a requirement that John was whistling Ode to Joy immediately prior to his death. Unlike the genuine atelic cases of temporal overlap such as (47), then, the asserted relation in cases such as (53) and (54) is one of immediate temporal precedence. With this in mind, consider again (53a). In this case, too, the necessary and sufficient relation is immediate temporal precedence. Consider the following scenario. (55)
John walks home from school, whistling a different tune each day as he walks. Today, it was the Marseillaise. John’s father knows about John’s whistling, but never hears which tune he whistles because John stops the instant he opens the door of his family home. So every day, when he gets home, John tells his father which tune he whistled on the way. Today, John said ‘Dad, I came home whistling the Marseillaise today’.
Even though John stops whistling the instant the result state of the predicate come home is reached, (55) is a perfectly acceptable statement for John to make in the context. This strongly suggests that, even in cases of questions with matrix verbs such as come home, immediate temporal precedence is the necessary and sufficient relation between the two events. In fact, in cases such as these, given our analysis of achievements as a variety of culminated process, it is not correct to say that the adjunct event precedes the matrix event. Rather, the adjunct event precedes the culmination of the matrix event, but still overlaps with the process component of the matrix event. The readings described above arise when the moment of change described by the matrix VP coincides with another moment of change, namely the cessation of the process described by the adjunct VP. The above shows simply that the adjunct process must be taking place while the matrix process is taking place, and it can stop the instant the matrix culmination is reached. To borrow a term from Krifka (1989), the above suggests a requirement that the temporal trace of the adjunct event is no shorter than that of the matrix event. Whether it is any longer than the matrix event is a matter of pragmatics.
This does not mean that John must stop whistling the minute he arrives for (53a) to be felicitous. Depending on our world knowledge of the characteristics of certain actions, the normal interpretation of such relations is often, indeed, one where the adjunct event continues through the time of the matrix event. For example, there is no reason for us to assume that John’s coming back from his travels in (53b) would stop him from thinking he was invincible. In the absence of any evidence to the contrary, then, we will usually interpret such a sentence as implying that John continues to think that he is invincible after he gets back from his travels. Examples such as (55) show that such an inference is only a cancellable default, however, and not part of the asserted content of an example such as (53b). A similar observation can be made in the case of the reading of (56) on which the boat floats towards a specified endpoint under the bridge (the directed motion reading, as opposed to the locative reading, on which the boat is afloat under the bridge, but motionless). (56)
The boat floated under the bridge.
Although the float event and the under event form a single core event structure as in (50), we do not assume that the boat must cease to float (i.e. sink) the instant it is under the bridge. Instead, the normal interpretation in this case is one where the floating continues after the boat has reached a position under the bridge. Again, though, this normal assumption can be cancelled, exactly parallel to the case of come home whistling in (55) above. (57)
The boat floated under the bridge, and then sank the second it got there.
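The temporal requirement on achievement cases such as (53)–(55) can be stated as a simple check over intervals. The sketch below adopts one possible reading of that requirement, on which the adjunct process must be under way for the whole of the matrix process, up to and including the culmination. The `Interval` class, the function name `adjunct_covers_matrix`, and the clock times in the usage note are hypothetical, chosen only to illustrate the scenario in (55).

```python
# Illustrative sketch only: the containment reading of the requirement
# and all names below are assumptions, not the author's formalization.
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float  # onset of the event's temporal trace
    end: float    # end of the trace (for the matrix event, its culmination)


def adjunct_covers_matrix(adjunct, matrix):
    """True if the adjunct process is ongoing throughout the matrix process.

    The adjunct may stop at the very instant of the matrix culmination,
    or continue past it; whether it continues is left to pragmatics.
    """
    return adjunct.start <= matrix.start and adjunct.end >= matrix.end
```

With invented times, a walk home lasting from 15.0 to 15.5 and whistling that stops exactly at 15.5, as in (55), satisfies the check: `adjunct_covers_matrix(Interval(15.0, 15.5), Interval(15.0, 15.5))` is true. Whistling that continues after arrival also satisfies it, whereas whistling that stops before arrival does not.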
We saw above that the combination of a BPPA and an atelic matrix VP is unambiguously interpreted in such a way that the events described by the two constituents are conjoined and overlap temporally. And if the matrix VP describes an accomplishment rather than an activity, we find an ambiguity between such a reading and a preferred causal reading, on which the matrix and adjunct events form a single macroevent. In this latter case, the two readings are clearly distinguishable: on the conjoined reading, there is no causal component to the interpretation, and on the single-event reading, there is no requirement that the two subevents overlap temporally, as shown by examples such as (58). (58)
John hated the Marseillaise, and he also hated Mary. But he knew that she also hated the Marseillaise. So he hatched a plan. He would whistle the Marseillaise for just as long as was necessary to send her into a foaming rage, and then stop for the sake of his own sanity. And sure enough, he drove Mary crazy whistling the Marseillaise, and stopped the very instant she got really mad.
In the case where the matrix VP describes an achievement, however, it is not so clear whether or not there is an event-structural ambiguity. To be sure, there are interpretations on which the matrix and adjunct events are most naturally interpreted as overlapping (53), and others in which they are not (54). However, the two readings are not doubly dissociated here as they are in the case of accomplishments, as there is never any causal relationship between these two subevents when the matrix VP describes an achievement. The single-event reading properly includes the conjoined reading, as the matrix and adjunct events on a single event reading of (53) can be interpreted as temporally overlapping, but needn’t be. Does the interpretation of a BPPA modifying an achievement-denoting VP consist of a single macroevent, then, or two conjoined events? There are reasons to believe either story. On the one hand, a relation of temporal succession among events is insufficient to allow macroevent formation according to the theory of event structure developed in Part I. This gives us an initial reason to think that the achievement case illustrated in (54) represents two events, like the activity case in (47). However, I will present a modification to that theory below, according to which examples such as (54) or (53a) may be construed as, roughly, a single event. This is partly motivated by the fact, discussed above, that the temporal profile of an example like (54) is identical to the single-event reading of an accomplishment-based example like (48), and quite distinct from a conjoined case such as (47), in that the two subevents in both (54) and (48) must stand in a relation of immediate temporal precedence, such that the process precedes and abuts the culmination. Furthermore, in both the accomplishment and achievement cases, we may naturally interpret the process as continuing past the point of the culmination, although this default inference is cancellable. 
In contrast, the two events in conjoined cases like (47) must overlap, and not just abut, each other. The parallels between the temporal profiles of the accomplishment and achievement cases constitute an argument in favour of treating them as a natural class, with single-event readings available in both the accomplishment and achievement cases. On grounds independent of the extraction data, then, there are equally good reasons to favour either position. However, the extraction data concerning BPPAs, analysed in terms of the Single Event Condition, argue in favour of a conception of cases like (54) as describing single events.12 We turn to those data in the next subsection.
12 Although there is no clear way, on this theory, of telling whether BPPAs modifying achievement-denoting VPs also have a conjoined reading parallel to (49), it is natural to suppose, by parallelism with the accomplishment case, that they do. I am unaware of any way to test this supposition, however.
Of course, if such examples really do
describe single events, the question is why, or perhaps, given the theory of event structure developed above, how? Providing an answer to this will be the motivation for the promised modification to Part I’s theory of event structure.
6.3.2 Extraction from Bare Present Participial Adjuncts: The Basic Patterns
By now, we have found several distinctions in the interpretation of bare present participial adjuncts, based primarily on the aspectual class of the VP which the adjunct modifies. BPPAs modifying matrix accomplishments and achievements form a natural class, to the exclusion of activities, in allowing a temporally non-overlapping reading. And the accomplishment case is distinguished from the achievement case in that it requires that the two subevents in such a reading be causally, rather than temporally, related. We are now ready to see how these patterns match up to the extraction data. The predictions are as follows: once we control for syntactic factors such as the category of the extracted element, the Single Event Condition predicts extraction from a BPPA modifying an accomplishment to be possible, but it will force the event described by the BPPA to be interpreted as the cause of the event described by the matrix VP. If that is impossible, the extraction should be unacceptable. In the case of BPPAs modifying achievements, we saw above that it is unclear whether a single-event reading exists. We will see shortly, however, that extraction from BPPAs modifying achievements is possible. The prediction for BPPAs modifying activities is that extraction should be impossible, as there is no single-event reading in such cases. This is a reasonably good first approximation to the data. (59) lists several cases of extraction from BPPAs modifying achievement VPs, while (60) does the same for the accomplishment case. Finally, (61) shows that extraction from BPPAs modifying activities is frequently unacceptable.
(59)
(a) What did John arrive [whistling ]?
(b) What did John die [thinking about ]?
(c) Which clothes did John come back [wearing ]?
(60) (a) What did John drive Mary crazy [whistling ]?
(b) What did John cut himself [carving ]?
(c) What did John turn the house upside down [looking for ]?
(61) (a) *What does John work [thinking about ]?
(b) *What does John dance [screaming ]?
(c) *What did John laugh a lot [listening to ]?
Moreover, we find that cases such as those in (52), where any causal relation which does hold between the adjunct event and the matrix event is indirect, do not readily allow extraction, as shown in (62). Again, this is as predicted by the Single Event Condition, if only direct causal relations are admissible within core events.
(62) (a) ??What did John paint this picture [trying to express ]?
(b) *What did John build this house [thinking (about) ]?
We have a pretty good initial match between the predictions of the Single Event Condition and the observed data, then. Closer scrutiny reveals a couple of interesting puzzles, however. For one thing, the above conclusions regarding the single-event interpretation of a BPPA modifying an achievement-denoting VP are both tentative (although arguably given some indirect support by patterning with the other, clearer cases of single-event readings with respect to extraction) and anomalous with respect to the theory of event structure given in Part I. Also, there is a class of counterexamples to the generalization illustrated in (61) that extraction from a BPPA modifying an activity-denoting VP is ungrammatical. This involves cases such as the following, where extraction is permitted in just this aspectual configuration.
(63) (a) What did John lie around [reading ] all day?
(b) Which chair did John eat his breakfast [sitting on ]?
(c) What was John walking about [whistling ]?
I will present a single modification to the above theory which can solve both problems, with further positive effects in other cases of extraction from adjuncts. This modification is based on the notion of agentivity. We have already seen the central role of agentivity in fixing the upper bound of extended events. These three puzzles suggest a further interaction between agentivity and event size, however. This is the topic of the next subsection.
6.3.3 Modifying the Single Event Condition
Section 3.5 showed that accomplishments and achievements are distinguished by agentivity, but are built around the same event structure. In the previous subsection, we saw that the two classes share the same temporal profile in cases of extraction, but differ as to whether the relation between the two subevents is interpreted as causal or not. Moreover, the distinction between the class of accomplishments represented in (52) and those represented in (48) can be thought of in related terms. Cases such as (48) are ones in which the matrix accomplishment leaves open the precise nature of the agentive preparatory process, instead only describing the culmination reached as a result of that process, while the process is fully specified in the examples in (52), as a result
of the use of a verb such as paint or build. This means that the direct causal relationship found between the adjunct process and the matrix culmination in (48), but not (52), is related to the fact that there are two distinct fully specified agentive processes in the (52) examples, but not in (48). Finally, the distinction between the two sets of activity examples in (61) and (63) is also related to agentivity. In each of the examples in (61), both the matrix and adjunct processes are agentive, while the cases in (63) are distinguished by the fact that one of the two processes (lying around, sitting on a chair, and walking about,13 respectively) is nonagentive. At an abstract level, then, the common theme uniting the problems highlighted at the end of the previous subsection is that each distinction, whether interpretive (as with the accomplishment–achievement distinction) or a difference in grammaticality (as with the distinctions among subclasses of activities with respect to extraction from the adjunct) is related to the number of agentive processes present in the representation and the extent to which they are specified. Achievements have a nonagentive preparatory process while accomplishments have an agentive preparatory process; those accomplishments which allow extraction from attached BPPAs are distinguished from those that do not in that the nature of the agentive process component of the matrix accomplishment is unspecified when extraction is allowed; and those pairs of activities which allow extraction are distinguished from the others in that one of the activities is nonagentive. To capture this within the framework adopted here, I need to make use of larger event-structural units than I have so far. Call the larger structures I will require event groupings. We reformulate the Single Event Condition in terms of such event groupings, as follows. (64)
The Single Event Grouping Condition
An instance of wh-movement is legitimate only if the minimal constituent containing the head and the foot of the chain can be construed as describing a single event grouping.
I define ‘event grouping’ as follows: (65)
An event grouping E is a set of core events and/or extended events {e1, . . . , en} such that:
(a) Every two events e1, e2 ∈ E overlap spatiotemporally;
(b) A maximum of one (maximal) event e ∈ E is agentive.
13 I will return to the justification for the claim that this is nonagentive below.
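The definition in (65), together with the notion of agentive event spelled out in (66), can be sketched informally as a small Python model. This is only an illustration under stated assumptions, not part of the theory itself: the class and function names, the use of numeric intervals for temporal traces, and the sample interval values are all hypothetical, and the spatial side of "spatiotemporal overlap" is left out.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple


@dataclass
class Event:
    """A core or extended event. All field names are illustrative assumptions."""
    label: str
    interval: Tuple[float, float]      # (start, end) of the event's temporal trace
    has_agent: bool = False            # consulted only for atomic events
    subevents: List["Event"] = field(default_factory=list)


def is_agentive(e: Event) -> bool:
    # (66i): an atomic event is agentive if one of its participants is an agent.
    if not e.subevents:
        return e.has_agent
    # (66ii): a complex event is agentive if its initial subevent has an agent.
    # Assumption: subevents are listed in temporal order and are atomic.
    return e.subevents[0].has_agent


def overlaps(a: Event, b: Event) -> bool:
    # Strict overlap: intervals that merely abut do not count.
    # Simplification: only the temporal dimension is modelled.
    return a.interval[0] < b.interval[1] and b.interval[0] < a.interval[1]


def is_event_grouping(events: List[Event]) -> bool:
    # (65a): every two events overlap; (65b): at most one event is agentive.
    pairwise = all(overlaps(a, b) for a, b in combinations(events, 2))
    return pairwise and sum(is_agentive(e) for e in events) <= 1


# Hypothetical encodings of three examples; the interval values are invented.
dance = Event("dancing", (0, 10), has_agent=True)
scream = Event("screaming", (2, 8), has_agent=True)
lie_around = Event("lying around", (0, 10), has_agent=False)
read = Event("reading", (1, 9), has_agent=True)
arrive = Event("arriving", (0, 5), subevents=[
    Event("preparatory process", (0, 5), has_agent=False),
    Event("arrival", (5, 5)),
])
whistle = Event("whistling", (0, 4), has_agent=True)

print(is_event_grouping([dance, scream]))     # False: two agentive events
print(is_event_grouping([lie_around, read]))  # True: only reading is agentive
print(is_event_grouping([arrive, whistle]))   # True: the matrix process is nonagentive
```

On this encoding, clause (65b) alone separates dance screaming from lie around reading, since both pairs satisfy the overlap requirement.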
For clarity, I add the following: (66) An event e is agentive iff: (i) e is an atomic event, and one of the participants in e is an agent; (ii) e consists of subevents e1 , . . . , en , and one of the participants in the initial subevent e1 is an agent. This set of definitions extends the project undertaken in Part I. The algebraic model of the domain of events developed by Bach (1986), building on the work of Link (1983) on the individual domain, gives us an extremely general picture of how that domain is structured. This model contains many possible individual (or event) denotations which do not correspond to the way in which our cognitive systems regularly package things into individuals (or events), however. Under normal circumstances, I do not jointly form an individual with Hercules and Mongolia, but the me-Hercules-Mongolia triad is a conceivable individual denotation in Link’s system. Similarly, Bach’s extension of Link’s theory to the domain of events may well class as single macroevents those configurations of events which Chapter 2 endeavoured to show that we could not ordinarily treat as such. Technically, this is because Bach and Link’s structures are closed under joins. In the model developed here, on the other hand, temporal intervals are closed under joins, but events are not. Throughout Part I, the claims about the upper bounds of single events were supported by intuition, by simple thought experiments, and by experimental results reported in the literature. As far as I am aware, though, there is no such evidence available for treating the groupings discussed in this section in terms of the upper bound on the size of single events. 
Although there is no formal reason not to continue to consider the data discussed in this section as evidence about the upper bounds of single events, then, I have chosen to introduce a new term, event grouping, to reflect the fact that the structures relevant to the extraction patterns have gone beyond those which we have independent reason to class as single events. This complication to the theory of locality presented here is not, however, particularly damaging to the central claim advanced here, that certain locality effects can be explained in event-structural terms which do not have straightforward phrase-structural analogues. We can now return to the puzzles noted at the end of Section 6.3.2, armed with our revised locality theory, consisting of the modified Single Event Grouping Condition (64), supplemented with the definitions of event grouping (65), and agentive event (66). It can be seen that the data are now as predicted. The easiest case is, without a doubt, that of activities. Here, we saw that extraction was possible from a BPPA modifying an activity-denoting VP, only if one
or other of the events was nonagentive. The relevant examples are repeated from (61) and (63) below. (67)
(a) *What does John work [thinking about ]?
(b) *What does John dance [screaming ]?
(c) *What did John laugh a lot [listening to ]?
(68) (a) What did John lie around [reading ] all day?
(b) Which chair did John eat his breakfast [sitting on ]?
(c) What was John walking about [whistling ]?
This is exactly what the modified Single Event Grouping Condition predicts, given the definition of event groupings in (65). Although the examples in (67) and (68) each describe two overlapping events, the nonagentive nature of one of the two events in each example in (68) makes it possible to group those events into a single event grouping. However, in the cases in (67), all the processes are agentive, and so it is impossible, on the above definitions, to contain both the matrix and the adjunct event within a single event grouping. The one surprising example, in this respect, is (68c). Walking is clearly, under normal circumstances, an agentive activity. However, part of the meaning of the particle about (and also around, which seems to function similarly) is to imply a certain aimlessness on the part of the subject. In the words of McIntyre (2004), this indicates ‘that the course of an event metaphorically lacks a goal (“gets nowhere”, so to speak), whence the intuition that around is a verb diminutive which portrays an event as aimless, unplanned, ineffectual, etc.’ (McIntyre 2004: 531). If, as suggested by the discussion in Chapter 3, we can take it to be a key component of agentivity that the agent is acting deliberately, with a goal (no matter how small or immediate) in mind, the aimlessness which around and about add to a verb meaning plausibly induces a nonagentive, goal-free interpretation of the subject of that verb, thereby allowing an example such as (68c) to conform to the conditions on event groupings listed in (65). This is confirmed by applying agentivity tests to walk around. For example, the predicate is incompatible with agent-oriented adverbs. Although (69) is an acceptable sentence, it is only acceptable on the reading where around specifies a path around some perimeter, which is quite distinct from the aimless interpretation obtained in (68c).14 (69) *John (deliberately/intentionally) walked around (on purpose). 
14 One qualification is in order here. Although (69) is clearly degraded, an example such as John deliberately walked around whistling is quite acceptable. I believe that in such cases, deliberately modifies the agentive adjunct event, whistling, rather than the nonagentive walk around. This is the same pattern we encountered concerning deliberately arrive late in footnote 17, Chapter 3.
The Single Event Grouping Condition therefore correctly predicts the split between acceptable and unacceptable cases of extraction from a BPPA modifying an activity VP. Moving on to the split in the accomplishment cases, a very similar story holds here. In those cases where a BPPA modifying a VP describing an accomplishment allows extraction, as in (60), repeated below, the process component of the matrix accomplishment is underspecified. We know, by virtue of the definition of accomplishment in Section 3.5, that the process is agentive, but that is all we know. On the other hand, in those cases which prohibit extraction, the matrix accomplishment already has a fully specified process component, as in (62), also repeated below.
(70) (a) What did John drive Mary crazy [whistling ]?
(b) What did John cut himself [carving ]?
(c) What did John turn the house upside down [looking for ]?
(71) (a) ??What did John paint this picture [trying to express ]?
(b) *What did John build this house [thinking (about) ]?
Here, the definition of ‘event grouping’ does not automatically make the right distinction. At first sight, there are two agentive preparatory processes in both (70) and (71). However, the crucial difference here comes from the underspecification of one of those processes in (70). This underspecification makes it possible to unify the two processes in such a way that the process described in the adjunct comes to be interpreted as the direct cause of the culmination described in the matrix VP. I will set aside the question of exactly how this is achieved compositionally, pending a better understanding of the restructuring-like properties of BPPAs discussed above, but something like this must be possible, if we are to account for the ambiguity between single-event (48) and conjoined (49) readings described earlier in this section. The substantive claim here, then, is that there is a limit to the amount of descriptive content that can be associated with a slot in the Maximal Core Event template. One agentive predicate, but no more than that, can be associated with each slot. This means that a housebuilding event is not a thinking event. More tendentiously, it means that an event cannot simultaneously be described as a painting event and a trying-to-express event, even if there are cases in which an action of painting is an action of trying to express oneself, mirroring Pietroski’s (2000) distinction between actions and events. As a result, the unification of two event descriptions with agentive descriptive content attached to the same subevent position will not succeed because there simply is not enough room in the template for all those descriptions. On the other hand, a whistling event can be the (otherwise unspecified) process component of a driving-Mary-crazy event. The unification of the two
event descriptions is possible in that case because each description attaches descriptive content to a different subevent. Moreover, as both of the distinct processes in (71) are agentive, they cannot be contained within the same event grouping, following (65). With these auxiliary assumptions, then, the Single Event Grouping Condition correctly predicts the split in acceptability between (70) and (71).15 We can now turn to the final puzzle, concerning the admissibility of extraction from BPPAs modifying achievements, and the interpretation of the resulting structure. We saw above that it is implausible to consider the matrix and adjunct events as jointly forming a single core event in this case, as the preparatory process associated with the matrix event is nonagentive, yet the adjunct processes in question are generally agentive. However, the temporal profile of a BPPA modifying an achievement VP is identical to the single-event reading found with accomplishments, and extraction proved to be just as possible for the achievement case as for the accomplishment case. Moreover, there is a clear interpretive difference: the causal relation that we found between subevents in the accomplishment case is replaced here by a relation of immediate temporal precedence. The relevant examples are repeated below. (72)
(a) What did John arrive [whistling ]?
(b) What did John die [thinking about ]?
(c) Which clothes did John come back [wearing ]?
We can now make sense of this pattern in the following way. Following the discussion in Section 3.5, I assume that achievements are distinguished by the presence of an obligatorily nonagentive preparatory process. Meanwhile, the process described by the adjunct is generally agentive. The two cannot jointly form a single event, in that case. However, the nonagentivity of the preparatory process associated with the achievement ensures that the two can form a single event grouping, regardless of the agentivity of the adjunct event. We therefore predict extraction to be possible from such a configuration according to the Single Event Grouping Condition.
15 One prediction made by the current approach is that extraction from a BPPA modifying an accomplishment should be possible, regardless of the descriptive content of the matrix VP, if the adjunct event is nonagentive. The support for this prediction is equivocal. On the one hand, cases like What did John paint this picture sitting on? or What did John build this house wearing? are much more acceptable, at least to my ears, than those in (71). However, a parallel modification to (70) degrades, if anything, the sentence; certainly examples such as ??What did John drive Mary crazy sitting on? or ??What did John turn the house upside down wearing? are much less natural than those in (70). This may indicate that the agentivity requirement included in the definition (65) of ‘event grouping’ needs tweaking, or it may indicate that extraction from a BPPA somehow forces the single event reading whenever the aspectual class of the matrix verb allows. For now, I have no clear account of such cases.
Regarding the interpretation of such an event grouping, the definition (65) of event groupings tells us that events contained within the event
grouping must overlap spatiotemporally. Moreover, we know that the immediate cause of the culmination described by the achievement is the nonagentive preparatory process with which it forms a core event. The adjunct event, being agentive, cannot be a direct cause of that culmination, in that case, but it must spatiotemporally overlap with that cause. This is the reason for the interpretation of immediate temporal precedence obligatorily associated with cases of extraction from a BPPA modifying an achievement-denoting VP.16
Diagrammatically, the space of possibilities for extraction out of BPPAs can be represented in the following way. In what follows, each diagram consists of a series of circles representing events. In each diagram, there are two rows of circles. The top row represents the denotation of the matrix VP, while the bottom row represents the adjunct VP. Events connected by horizontal lines represent the subevents of a core event. If two events are aligned so that one is immediately beneath the other, enclosed within a box, those two events overlap temporally. Vertical bars linking two circles represent identification of the two corresponding events. Finally, a circle drawn with a double line represents an agentive event, while a single line represents a nonagentive event.
(73)
(a) Nonagentive culminated process with agentive adjunct event, e.g. What did John arrive whistling?
[Diagram: matrix row: process, arrival; adjunct row: whistling]
16 This raises the question of why a nonagentive adjunct event, such as the one in (72c), also cannot be interpreted as the cause of the matrix culmination. In this case, I have to appeal to world knowledge. Unlike the accomplishment case, nothing forces the causal link here. And given the implausibility that wearing a particular outfit caused John to come back in (72c), it is natural that the causal interpretation is still resisted.
(b) Nonagentive culminated process with nonagentive adjunct event, e.g. What did John arrive wearing?
[Diagram: matrix row: process, arrival; adjunct row: wearing]
(c) Agentive culminated process with agentive adjunct event identified as the preparatory process of the matrix event, e.g. What did John drive Mary crazy whistling? (causal reading)
[Diagram: matrix row: process(j), crazy(m); adjunct row: whistling(j)]
(d) * Agentive culminated process with agentive adjunct event, no identification, e.g. * What did John drive Mary crazy whistling? (no causal relation between John’s whistling and Mary’s craziness). Violates the condition that at most one event in an event grouping is agentive.
[Diagram: matrix row: process(j), crazy(m); adjunct row: whistling(j)]
(e) Agentive culminated process with nonagentive adjunct event, e.g. What did John drive Mary crazy wearing?
[Diagram: matrix row: process(j), crazy(m); adjunct row: wearing(j)]
(f) Agentive matrix process with nonagentive adjunct process, e.g. Which chair did John eat his breakfast sitting on?
[Diagram: matrix row: eating; adjunct row: sitting]
(g) Nonagentive matrix process with agentive adjunct process, e.g. Which book did John lie around reading?
[Diagram: matrix row: lying around; adjunct row: reading]
(h) Nonagentive matrix process with nonagentive adjunct process, e.g. What did John wait around sitting on?
[Diagram: matrix row: waiting around; adjunct row: sitting]
(i) * Agentive matrix process with agentive adjunct process, e.g. * What does John dance screaming? Violates the condition that at most one event in an event grouping is agentive.
[Diagram: matrix row: dancing; adjunct row: screaming]
By basing our groupings on event-structural, rather than phrase-structural, units, an initially arbitrary-looking pattern of acceptable and degraded extractions from BPPAs proves to be amenable to a compact and largely accurate description. The key components of this account are surprisingly few. Firstly, BPPAs are syntactically very small, and semantically very tightly related to the matrix VPs that they modify, in particular in terms of the temporal traces of the events that the two constituents describe. Secondly, we stipulate a definition of event grouping based on the event-structural notions motivated in Part I. Finally, we stipulate the Single Event Grouping Condition. Patterns of extraction from BPPAs fall out as a special case of this condition.
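As a compact restatement, the nine configurations in (73) can be tabulated and checked against the agentivity clause of (65). The sketch below is a hypothetical encoding rather than anything proposed in the text: it assumes that the overlap requirement in (65a) is satisfied in every case, and that identifying the adjunct event with the matrix preparatory process, as in (73c), leaves a single agentive event. The names `Config`, `agentive_count`, and `extraction_predicted` are invented for illustration.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One diagram from (73); field names are illustrative assumptions."""
    label: str
    matrix_agentive: bool    # is the matrix (preparatory) process agentive?
    adjunct_agentive: bool   # is the adjunct event agentive?
    identified: bool = False # adjunct identified with the matrix process?


def agentive_count(c: Config) -> int:
    if c.identified:
        # Identification merges the two processes into one event.
        return 1 if (c.matrix_agentive or c.adjunct_agentive) else 0
    return int(c.matrix_agentive) + int(c.adjunct_agentive)


def extraction_predicted(c: Config) -> bool:
    # Clause (65b): at most one agentive event per event grouping.
    return agentive_count(c) <= 1


CONFIGS = [
    Config("a: arrive whistling", False, True),
    Config("b: arrive wearing", False, False),
    Config("c: drive Mary crazy whistling (causal)", True, True, identified=True),
    Config("d: drive Mary crazy whistling (non-causal)", True, True),
    Config("e: drive Mary crazy wearing", True, False),
    Config("f: eat breakfast sitting on", True, False),
    Config("g: lie around reading", False, True),
    Config("h: wait around sitting on", False, False),
    Config("i: dance screaming", True, True),
]

for c in CONFIGS:
    mark = "" if extraction_predicted(c) else "*"
    print(f"{mark}{c.label}")
```

Only (d) and (i) come out starred, matching the diagrams. The table records what the condition predicts, not the judgments themselves: footnote 15 notes that the support for cases like (e) is equivocal.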
6.3.4 Prepositional Participial Adjuncts Revisited: The Case of Without
An interesting further application of the new locality theory incorporating event groupings can be found by considering another class of prepositional participial adjuncts, headed by without. I omitted discussion of this class from Section 6.2 as they behave quite differently from the cases of extraction from a prepositional participial adjunct discussed there. Firstly, extraction from a without participial adjunct does not require the same contingent relation among subevents which was claimed to be necessary in the case of other prepositional participial adjuncts. Secondly, extraction from a without participial adjunct is quite simply easier than extraction from other prepositional participial adjuncts. Many speakers who reject most examples in Section 6.2 nonetheless allow extraction from a without adjunct, and even those speakers who accept the earlier examples frequently express a preference for the without case. Examples of extraction from without-phrases have also been relatively widely noted in the literature, for example in Cinque (1990). Examples are given below.
(74) (a) Who did you go home [without speaking to ]?
(b) What has John designed this house [without considering ]?
(c) Which problem could you go a whole day [without thinking about ]?
We can now make sense of this pattern in the following way. Unlike the prepositions considered in Section 6.2, the temporal relation among events specified by without is typically simultaneity. However, the meaning of without also specifies that the adjunct event does not occur. A sketch of the meaning of two verbal phrases related by without therefore consists of the following elements:
(75) (a) An event grouping E consisting of . . .
(b) An event e1 described by the matrix VP, which occurs simultaneously with . . .
(c) An event e2 described by the adjunct, which consists of . . .
(d) The non-occurrence of an event e2 .
Such an event grouping is legitimate with respect to the Single Event Grouping Condition, provided one of the events in question is nonagentive. The obvious candidate for nonagentivity is the adjunct event. Negating an event can render it nonagentive. For example, although building a house is always a deliberate activity, not building a house can be a deliberate activity or not, depending
on nonlinguistic factors.17 For a builder at work, not building a house is tantamount to going on strike, whereas I have lived my whole life without building a house, as much by accident as by design. We therefore predict that examples such as (74) should be legitimate if the adjunct is nonagentive. Unfortunately, although consideration of examples such as (74) suggests that this is broadly correct, it is hard to find conclusive evidence in favour of this correlation in this case: for example, there is no attachment site for agent-oriented adverbs which targets specifically the level of without, at which the adjunct event is negated, rather than targeting the adjunct event itself or the whole complex of matrix and adjunct events. (76)
(a) John [deliberately [went through the day without looking at Bill]].
(b) John went through the day without [deliberately [looking at Bill]].
(c) *John went through the day [deliberately [without looking at Bill]].
I must leave full confirmation of the analysis of without participial adjuncts for future research, then. However, it is striking that the interface-based approach pursued here again has a ready-made explanation for why it should be precisely this preposition which allows extraction so readily. To the best of my knowledge, there is no comparable independent syntactic explanation, but the semantic facts make this pattern appear quite natural.
6.4 Conclusion
This chapter has shown that a natural expansion of the event structure proposed in Part I, when taken together with the Single Event Grouping Condition and a standard syntactic theory of locality, can give an empirically accurate description of the distribution and interpretation of cases of extraction from three classes of adjunct. With respect to extraction, verbal adjuncts do not behave as a monolithic whole.
17 This claim has a clear similarity to Verkuyl’s (1993) discussion of the stativity of negated events. Verkuyl’s result, which has substantial empirical support, is compatible with my position here, as we saw in Chapter 3 that states are nonagentive. Such a unification would also give us some insight into the behaviour of a further class of adjuncts, namely those formed around past participial verb forms. As (i–ii) show, extraction is quite possible from such examples, with only a slight preference for a subject-oriented interpretation of the adjunct.
(i) What did John come home [covered in ] yesterday?
(ii) What did the chef_i serve the meat_j [wrapped in ]_i/j?
This makes sense if the situations described by such forms are inherently stative, and so nonagentive. The permeability of bare past participial adjuncts and without-participial adjuncts would then be derived from a common source.
Instead, different classes of adjunct
display different extraction profiles. At one extreme, prepositional participial adjuncts headed by since or upon disallow any extraction. At the other extreme, rationale clauses and without-PPs, among others, freely allow extraction of any referential DP argument. And in the middle, we find several cases where extraction is possible to different extents and with different degrees of naturalness. Moreover, in these intermediate cases, the behaviour of the adjunct is determined by several semantic and pragmatic factors. In the case of bare present participial adjuncts, the primary determinants of the acceptability of extraction are the aspectual classes of the verbs in question, and the agentivity of the subject with respect to the two subevents. For prepositional participial adjuncts, it is the naturalness of a contingent reading of the relation between two events. This allows generalizations expressible in semantic terms across syntactically quite unnatural classes. The main example of this was the analogy between extraction from BPPAs modifying achievements, and from without-participial adjuncts. The common property of these two cases is that they describe two simultaneous events, of which one (the preparatory process of an achievement, or the non-occurring event described by the without-phrase) is necessarily nonagentive. Accordingly, extraction in each case is readily compatible with the Single Event Grouping Condition. It is hard to imagine a purely syntactic explanation of such patterns.
This chapter has seen a complication in our theory of the interaction of event structure and syntax, as we moved from the Single Event Condition to the Single Event Grouping Condition. The justification for this move comes purely from the fact that it allows such generalizations over syntactically disparate, but semantically unifiable, classes to be stated.
At present, the Single Event Grouping Condition remains a stipulation, and an even odder one than the original Single Event Condition (though we will attempt to lessen its oddness in Chapter 8). However, it must be noted that even descriptive adequacy in this area has systematically eluded us since the first serious studies in the mid-1970s. Moreover, no earlier results are lost by the move from the Single Event Condition to the Single Event Grouping Condition. A proper extension of the class of event structures allowing extraction will not affect our claim that rationale clauses allow extraction quite generally (Section 6.1). Furthermore, since and upon aside, the prepositions discussed in Section 6.2 specify relations of temporal precedence, not overlap, among events, which means that the matrix and adjunct events in such cases will never be able to form event groupings, for which spatiotemporal overlap is required. Extraction from such adjuncts is still restricted to the enriched, contingent interpretation, in that case.
This concludes the major part of this chapter, a demonstration that event-structural factors play a significant role in constraining the availability of extraction out of untensed verbal adjuncts in English. However, several extensions suggest themselves. In the following appendix, I will dip a toe in the murky waters of patterns of A′ preposition-stranding in English adjuncts. Then, in Chapter 7, I will move on to derive the distinctions between extraction from adjuncts, which is available only in quite limited circumstances; and extraction from complement clauses, which is quite widely possible. We will see that the Single Event Grouping Condition does no harm to our theories of extraction out of complement clauses, and may be capable of covering some of the empirical ground here too. Moreover, as consideration of this issue necessitates a discussion of the interaction of tense with the Single Event Grouping Condition, I will also propose a way of deriving the general ungrammaticality of extraction out of a tensed adjunct in that chapter.
6.5 Appendix: Preposition Stranding in Adjuncts
As the theory of locality embodied in the Single Event Grouping Condition is a partially semantic one, there is no clear reason to expect it to apply just to adjuncts, let alone to verbal adjuncts. The next chapter will investigate the interaction of this condition with extraction out of nonadjuncts. Before that, though, this appendix investigates one extension of the condition, to nonverbal adjuncts. By far the best-known case of extraction from adjuncts in English involves stranding of a preposition by movement of its DP complement. (77a) illustrates the relatively unsurprising case,18 in which an argument preposition is stranded. However, adjunct prepositions can, in some cases, be just as readily stranded, as in (77b).
(77) (a) Who did you talk [to __ ]?
(b) Who did you go for a drink [with __ ]?
18 The ‘relatively unsurprising’ case is, of course, still quite surprising from a crosslinguistic perspective, given the typological rarity of preposition-stranding. See van Riemsdijk (1978), Hornstein and Weinberg (1981), and Abels (2003), among others, for attempts to capture this rarity. It is fair to say that the very limited crosslinguistic distribution of P-stranding currently remains a mystery, however.
Although both of these cases show the configuration [PP P __ ], itself impossible in the majority of languages, the examples differ further in that while (77a) represents extraction out of an argument of talk, (77b) shows extraction from an adjunct and so is doubly perplexing. I have nothing to add here about the rarity of P-stranding in general. However, the Single Event Grouping Condition does give us a handle on why
the argument–adjunct distinction matters so little in the cases where A′ P-stranding is possible.19 Addition of a regular PP specifies a further optional individual argument of the eventive predicate described by the VP, but does not introduce a new event variable. Ignoring tense, and so ignoring any contribution of the factors discussed in Chapter 4, we may represent the difference between (78a) and (78b) as in (79): although there is an extra property predicated of the event variable, there is still only one such variable.
(78) (a) John danced.
(b) John danced with Mary.
(79) (a) ∃e.(e = dance(j))
(b) ∃e.(e = dance(j) ∧ with(e, m))
In the majority of cases, then, adding a PP does not change the event structure of the example, and so extraction is predicted to be possible. However, there are exceptions. Three prepositions which typically take event-denoting nominal complements are notwithstanding, despite, and during, as illustrated in (80). Moreover, these events generally do not stand in contingent relations with the matrix events. Accordingly, they do not automatically allow extraction of their complement, as shown in (81).
(80) (a) I will be here on time, notwithstanding disruptions en route.20
(b) I will be here despite the disruptions.
(c) I read a book during lunch.
(81) (a) *What do you expect to get there [notwithstanding __ ]?
(b) *What do you expect to get there [despite __ ]?
(c) *Which meal did you read a book [during __ ]?
19 It would be inaccurate to say that the argument–adjunct distinction is predicted not to matter at all. Specifically, because the theory being developed here assumes that adjuncts are roughly weak islands, extraction from adjunct PPs should display the characteristics of weak islandhood. This means, for example, that it should be only possible to extract referential DPs from PPs. This prediction is actually quite hard to test, as the relevant examples inevitably involve a significant amount of complexity, and complex NPs in particular. The general pattern appears to me to head in the right direction, but the data are too fuzzy for much confidence.
(i) ?Of which company have you been talking [to a representative __ ]?
(ii) ??Of which company did you have lunch [with a representative __ ]?
(iii) *Of which company did you have lunch [next to a representative __ ]?
One possibly relevant construction involves cases of prepositions taking PP complements, as described in Jackendoff (1973). In such cases, the superordinate PP is always an adjunct, as far as I’m aware, and extraction of the subordinate PP has been assumed to be impossible.
(iv) Which rock did you crawl out [from under __ ]?
(v) *Under which rock did you crawl out [from __ ]?
Alternative explanations are possible, however, so I do not expect that this settles the matter.
20 There are complicating factors in this case, stemming from the necessity of a parenthetical prosody on the adjunct, and also from the fact that notwithstanding is often used as a postposition, both of which (in addition to its greater morphophonological weight than most prepositions) make the analysis of this particular case quite fraught. These problems do not apply to the other cases discussed here, though.
Moreover, the same sort of factors which were shown to ameliorate extraction from prepositional participial adjuncts in Section 6.2 also affect at least the during case illustrated in (81c). In that example, the two events of reading and eating are too independent to naturally be interpreted as contingently related. However, other examples do not suffer from this problem. Consider the following:
(82) John fell asleep during Tamburlaine.
It is quite plausible that, in this case, there is a contingent relation between the two events in question. Depending on John’s theatrical persuasions, it is quite possible that he would find Tamburlaine dull enough to send him to sleep. If so, the two events are causally related, and suddenly, extraction becomes possible, at least for some speakers.
(83) %Which play did John fall asleep [during __ ]?
Regarding the unavailability of parallel strategies for rescuing (81a,b), such strategies are predicted to be inapplicable in those cases because the adverse events which are felicitous as complements of notwithstanding and despite are necessarily too independent of the matrix event to be rescued in the same way: the complements of notwithstanding and despite describe a hindrance to the occurrence of the matrix event, and so cannot be contingently related to that event. Although this is far from a comprehensive account of the P-stranding data, of course, it does show that it is possible to extend the Single Event Grouping Condition to cover nonverbal adjuncts as well as the verbal cases discussed above.
Note also that reanalysis-based theories of P-stranding (see, e.g., Stowell 1982) have generally included a semantic component in the conditions on applicability of the reanalysis rule, to the effect that the output of reanalysis must be a ‘possible (or semantic) verb’, in some sense.21 However, the assumption remained that reanalysis itself is a syntactic operation, manipulating the phrase-structural and X′-theoretic status of particular nodes in the tree. Chomsky (1981: 292) made some brief remarks about the desirability of an approach to reanalysis which did not alter phrase-structural relations in this way, in parallel to the approach to restructuring as an operation on indices rather than on phrase-structural relations in Rouveret and Vergnaud (1980). However, such an approach to reanalysis was never developed in the published literature, to my knowledge.
This is one place where the present work has something to offer. The problems with syntactic reanalysis rules are very well documented by Levine, and Baltin and Postal, as well as elsewhere. The present approach sidesteps such problems by offering a non-phrase-structural take on many of the data that theories of reanalysis were designed to account for. As well as clarifying the notion of possible verb, by giving it some explicit falsifiable content in terms of independently motivated theories of event structure, the strong claim here is that there is no need for a syntactic reflex of the manipulation of event variables but instead that the semantics and pragmatics can influence the acceptability of a case of wh-movement. Reanalysis functions by treating certain strings as derived constituents, although our theory of phrase structure would not normally treat them as such. The gist of the objections to this approach from Levine, and Baltin and Postal, is that there is very little evidence that these derived constituents actually are constituents in any independent sense, and very strong evidence that the derived constituents do not, in fact, function as syntactic units (see Truswell 2009 for a summary). In the present theory, however, the functional equivalent of ‘derived constituent’ is not a syntactic unit but a semantic one, defined in terms of event structure. These objections are thereby sidestepped. It is worth pointing out, following Hornstein and Weinberg (1981), just how unconstrained any syntactic reanalysis rule would have to be to cover all cases of A′ extraction from prepositions, and how wildly the rule would overgenerate in the absence of what appear to be purely semantic constraints on its application. As the constraints are purely semantic, and as the only syntactic reflex appears to be a weakening of island boundaries, I argue that a more parsimonious, and less theoretically or empirically problematic, picture emerges from bypassing the syntactic instantiation of the reanalysis operation, and instead relating wh-movement directly to event structure.
21 The theory of Hornstein and Weinberg (1981) is a notable exception to this claim. Hornstein and Weinberg claim that there is a free, optional operation which reanalyzes V and any amount of VP-internal material to its right as a single verb. The ‘semantic verb’ condition is then relocated to a separate operation of ‘predication’, which is only relevant to P-stranding by A-movement in pseudopassive cases. This is in contrast to the theory of van Riemsdijk (1978), which derives the distinction between the relatively restricted distribution of the pseudopassive and the relatively free distribution of A′ P-stranding by adopting a semantically constrained reanalysis rule, but analyzing the A′ case in terms independent of reanalysis, making reference to the distribution of COMP positions within PP. Both theories are exempt from the discussion in the main text, then. However, both have their empirical drawbacks. See Hornstein and Weinberg for criticism of van Riemsdijk’s programme of reducing the distribution of A′ P-stranding to PP-internal structural factors. On the other hand, Hornstein and Weinberg’s theory suffers from complementary problems, discussed at length in Levine (1984) and Baltin and Postal (1996). Firstly, the actual domain of applicability of reanalysis remains a mystery as it certainly does not apply completely freely to base-generated VP-internal strings, as Hornstein and Weinberg (fn.9, p.60) acknowledge. Secondly, Hornstein and Weinberg have to resort to some quite unpalatable assumptions to explain the acceptability of certain apparent cases of VP-external A′ P-stranding, as in (i–iii).
(i) What day did John leave on?
(ii) Which act did John leave the theater before?
(iii) Which act did John leave the theater after?
(Hornstein and Weinberg 1981: 79, attributed to David Pesetsky)
Hornstein and Weinberg claim that such examples are actually ungrammatical but interpretable. However, if we admit that speakers readily produce, comprehend, and accept sentences which are actually ungrammatical, then we lose the empirical basis of the theory of grammar: which data are we trying to capture, and which ungrammatical but otherwise fully acceptable data are not actually our problem?
7 Extraction from Complement Clauses and the Effect of Tense
Compared to most theories of locality currently on the market, the one being developed here is an unusual one. That unusualness can be traced to the fact that it has focused initially on a fairly unusual empirical domain. Most theories of locality concentrate on the fundamental, though maddeningly elusive, facts concerning apparently unbounded movement along chains of head–complement relations, the elements that Ross (1967: 469–70) defined as ‘strips’. This has led researchers to concentrate on phenomena that I have had nothing to say about: subject–object asymmetries, argument–adjunct asymmetries, superiority effects, successive cyclicity, and so on. That, of course, is the point of the theory developed here: I have not been trying to replace the machinery which has been brought to bear on such effects but to complement it, finding a way to do well what that machinery presently does quite badly, without requiring that the event-based theory also do well what current syntactic locality theories already do well, so long as the event-based theory doesn’t get in the way.
The main point of the present chapter is that the theory of locality based on the Single Event Grouping Condition does no harm in more familiar locality-theoretic territory: our more purely syntactic theories of Relativized Minimality effects, ECP effects, and so on, can continue to run happily, doing the work that they do do well, without the Single Event Grouping Condition getting in the way. The way to get to this point is by considering the effect of tense on event structure and, through the Single Event Grouping Condition, on extraction. Section 7.1 shows that extraction from tensed adjuncts is degraded in comparison to extraction from untensed adjuncts. The effect of tense on extraction has been well documented, and this interaction is explained naturally by the Single Event Grouping Condition. This raises an important question.
The standard examples of long-distance wh-movement involve apparently effortless extraction out of any number of tensed complement clauses. We therefore require a theory which can
distinguish between these two cases. This is the job of Section 7.2, which shows that the required distinction arises straightforwardly from standard semantic considerations. At this point, everything is OK. We have shown that the Single Event Grouping Condition is not doing any harm to our regular theory of A′ movement. Ideally, though, we would like a stronger result: we would like to be able to say that this condition is useful outside the domain of extraction from adjuncts, as well as in the cases we have considered so far. Sections 7.3 and 7.4 attempt to elaborate such an argument on the basis of patterns of extraction from semantically distinct classes of complement clauses. It is not clear to me how successful this attempt is: a lot rests on the precise grammaticality judgements assigned to various sentences, and this is an area in which there is no real agreement in the literature. However, I believe that the analysis is worth considering, especially given that no harm is done to the wider architectural claims pursued here if it should prove to be inadequate. Finally, we conclude the chapter by returning to the concerns raised in Section 1.2 concerning the status of the Condition on Extraction Domain of Huang (1982). With our event-based theory in hand, we can return to the cases of marginally acceptable extraction from subjects in English and see whether the Single Event Grouping Condition can cast any light on these patterns. The conclusion will be that it cannot, which gives further support to the nonunified post-CED position, whereby opacity effects in subjects and adjuncts are accounted for differently.
7.1 On the Impermeability of Tensed Adjuncts
We have so far dodged the issue of the interaction of tense with the Single Event Grouping Condition. This must be addressed. One of the few generalizations concerning extraction from adjuncts in English which stands up to close scrutiny (made in Cinque 1990 and Szabolcsi 2006, as well as elsewhere) is that extraction from tensed adjuncts is uniformly impossible.1 Many prepositional participial adjuncts and rationale clauses have semantically very close finite counterparts, as in (1) below.
1 Even here, the generalization may not be totally accurate. As Ivan Sag was the first to point out to me, there are documented cases of extraction, normally by relativization, from tensed adjuncts. Examples like (i) are sometimes found to be acceptable.
(i) This is the watch that I got upset [when I lost __ ].
It is interesting that relativization often appears more lenient than other forms of A′-movement with respect to locality constraints, such as the Single Event Grouping Condition, which produce gradient acceptability judgements. I have no explanation for this fact.
(1) (a) John went home [before he talked to Mary].
(b) John went home [after he talked to Mary].
(c) John fell asleep [while he was talking to Mary].
(d) John is talking to Mary [so that she will understand how he feels].
However, extraction from these adjuncts is quite ungrammatical, regardless of their interpretation.
(2) (a) *Who did John go home [before he talked to __ ]?
(b) *Who did John go home [after he talked to __ ]?
(c) *Who did John fall asleep [while he was talking to __ ]?
(d) *What is John talking to Mary [so that she will understand __ ]?2
There is a straightforward explanation for the absence of extraction from tensed adjuncts. We may assume that whatever formal mechanism lies behind the formation of the event groupings described in Section 6.3 is only available before the event variables in question are bound. Furthermore, part of the function of Op, from Chapter 4, is to bind the event variables in question. Op also plays a crucial role in the derivation of temporal intervals from events. This means that the introduction of a tense node depends on the prior application of Op. In that case, a tensed adjunct is one within which relations among events are already fixed: we cannot further manipulate the event variables within a tensed adjunct and identify them as part of a larger event. As, even on the revised locality conditions involving the Single Event Grouping Condition, such structure formation is a prerequisite for extraction, we derive the fact that extraction is impossible from tensed adjuncts. Moreover, this pattern is not restricted to tensed adjuncts but also includes other classes of adjunct which demonstrably contain functional structure above VP. Of course, this is exactly as expected on the approach to the relation between temporal structure and event structure sketched in Chapter 4. We saw there that auxiliaries merge above Op, while generation of a single-event reading is contingent on the requirement that Op has not yet merged.
2 A plausible alternative hypothesis is that these questions are degraded because of the presence of a subject within the adjunct. Deciding between these two hypotheses depends on the status of examples such as %What did we come all this way without Bill noticing?, which include extraction across a subject within a nonfinite adjunct. Judgement is, as ever, divided on the grammaticality of such examples. This is one of the most widely accepted such configurations, yet even so, most speakers reject it. However, the fact that a few speakers consider such examples at least marginally acceptable leads me to suspect that other factors are also at work here, and that extraction across subjects in nonfinite adjuncts is at least sometimes possible. In contrast, I haven’t found anyone who accepts the sentences in (2). In that case, the extractions from tensed adjuncts would be degraded as a result of the presence of a T head, not a subject.
Accordingly, we predict paradigms such as the following. By comparing (3a) and (3b), we see that addition of a temporal adverb within the verbal adjunct leads to sharp degradation, although addition of a temporal adverb outside the verbal adjunct leads to no problem. This cannot be due to factors relating to the basic declarative structure, however, as (3c) is acceptable. It also cannot be due to an outright ban on extraction across further material within the adjunct, as the addition of a manner adverb in (3d) is also relatively acceptable. The natural conclusion, then, is that the degradation of (3b) is specifically due to the effect of this particular adverbial. This is exactly as predicted by the interaction of the Single Event Grouping Condition with the hypothesis concerning binding of the event variable sketched in Chapter 4. (3)
(a) Who did John come here yesterday [to prepare for a meeting with __ ]?
(b) *Who did John come here yesterday [to prepare for a meeting tomorrow with __ ]?
(c) John came here yesterday [to prepare for a meeting tomorrow with the minister].
(d) ?Who did John come here yesterday [to prepare (diligently) for a (very important) meeting (about the economy) with __ ]?
I take it to be a significant result for the theory presented here that it can explain the negative effect of tense and other higher structure on the acceptability of extraction in these cases. The interaction between tense and locality has long been recognized, for example by Ross (1967: §6.1.3) and Kluender (1992), but it has always been less clear why it should have an effect in just a few areas such as this: as (4) shows, a tensed complement clause isn’t even a weak island in English, as it is still possible to extract constituents other than DP complements. (4)
Where did John say that Mary had been?
Ross leaves the effect of finiteness as an unexplained puzzle, while Kluender builds an intriguing story on a theory of the processing of predicate–argument structures, which is potentially compatible with the approach developed here but has less to say about why tensed complements of bridge verbs as in (4), for example, readily allow extraction. The Single Event Grouping Condition will start to show an empirical advantage in this area if it can explain why tense has the effect it does in cases like (2) but not (4). In other words, now we have a theory which does such a good job of banning extraction across tense, there is a pressing need to demonstrate that
this position is not vastly over-restrictive. The next section shows a way to keep this result while simultaneously allowing extraction from tensed complement clauses.
7.2 Why Tensed Complements are Different
We have seen that merging a T head or auxiliary requires prior application of the operator Op described in Chapter 4, which binds an event variable, closing off the possibility of manipulation of that variable to form a larger event grouping and thereby turning an adjunct into a strong island for movement. The standard examples of extraction from a clausal complement of a bridge verb, such as (5), appear at first sight to flatly contradict that claim.
(5) (a) Who did John say [ __ that Mary kissed __ ]?
(b) Who does John think [ __ that Mary kissed __ ]?
In each case, the complement clause is tensed, but extraction is still possible. The aim of this section is to show how this can be so without losing the result of the previous section. I will show that the distinction is due to the semantic and pragmatic status of the complement clause relative to the rest of the sentence. As tense requires prior application of Op, which binds the event variable in its scope, an event described by a tensed adjunct is claimed to exist independently of the matrix event,3 this claim being conjoined with the independent assertion of the existence of any events described by the matrix VP. On the other hand, in the case of complement clauses, the event described by the complement forms an argument of the predicate described by the bridge verb. Schematically, then, we find semantic differences such as the following.
(6) (a) Tensed adjunct clause: ∃e1.(. . . e1 . . .) ∧ ∃e2.(. . . e2 . . .)
(b) Tensed complement clause: ∃e1.(e1 = P(. . . , ∃e2.(. . . e2 . . .)))
3 For now, I am using nontechnical terms like ‘claimed’ or ‘stated’ to describe the status of the existential quantification over events in order to duck the issue of whether the existence of these events is asserted or presupposed. We will return to it shortly.
While the existence of the adjunct event is clearly stated independently of the existence of any event described by the matrix VP, then, the existence of an event described by the complement clause is not stated independently but is rather dependent on the predicate expressed by the matrix verb. The existential quantifier which binds the complement event variable is within the scope of the matrix predicate. This is the crucial difference between the two cases in the light of the Single Event Grouping Condition: on the one hand,
a representation such as (6a) corresponds to two events which are claimed to exist independently of each other. On the other hand, a representation such as (6b) asserts the existence of only a single independent event, e1 . The second event variable, e2 , is bound within the scope of the matrix predicate. This means that the existence of e2 in this world is not necessarily asserted, as is familiar from the literature on referential opacity. The most we can say is that the existence of e2 in some world accessible to the subject from the real world is asserted. Depending on the content of the predicate P, then, we may have no guarantee from a representation such as (6b) that e2 actually happened. To take a concrete example, (5a) cannot be paraphrased as asking the identity of the person such that there exists an event in the actual world of Mary kissing that person and there exists a separate event of John saying that the first event exists. We are not able to conclude from (5a) that that kissing event actually took place. Only the saying event is asserted to have taken place, so it would be more accurate to say that (5a) asks the identity of the person such that there exists an event of John saying that there exists an event of Mary kissing that person. The theory developed here therefore allows us to distinguish between tensed adjuncts and many tensed complement clauses on the grounds that it is only in the adjunct case that the described event has the independence from the matrix event which forces a violation of the Single Event Grouping Condition. The referential opacity of bridge verbs is sufficient to render unnecessary the formation of two separate event groupings, and so the Single Event Grouping Condition can still be satisfied. The following section shows that this approach may be able to do more work in this area by concentrating on a class of matrix predicates where this referential opacity does not hold.
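As a rough illustration of the contrast drawn in this section, the schemata in (6) can be instantiated for the complement case (5a) and for the tensed adjunct case (2a). This is only a sketch under stated assumptions: the predicate names say, kiss, go-home, and talk-to, the constants j and m, and the free variable x standing for the questioned individual are expository choices in the style of (79), not the author's own formulas, and both tense and the temporal relation contributed by before are ignored.
(i) Tensed complement, as in (5a): ∃e1.(e1 = say(j, ∃e2.(e2 = kiss(m, x))))
(ii) Tensed adjunct, as in (2a): ∃e1.(e1 = go-home(j)) ∧ ∃e2.(e2 = talk-to(j, x))
In (i), the quantifier binding e2 sits inside the argument of say, so only e1 is claimed to exist independently. In (ii), the two existential claims are simply conjoined, which is the configuration in which the adjunct event has the independence that conflicts with the Single Event Grouping Condition.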
7.3 Factive Islands and Event Structure
The previous subsection set up a distinction between tensed adjuncts, which disallow subextraction, and tensed complements, which may allow subextraction, based on the different relations of the events described by those two structures to the matrix event. However, since Ross (1967), Kiparsky and Kiparsky (1970), and Erteschik-Shir (1973), it has been noted that extraction from a tensed complement clause is not always possible. One of the classes of exception which these authors examine is the factive islands, illustrated by examples such as (7).4
4 I will not discuss Erteschik-Shir’s other major class of non-bridge verbs, which consists of cases where the verb incorporates a manner of saying as well as the basic meaning of say, as in ??Who did John yell/holler/whisper/exclaim that Mary kissed?
(7) (*)Who did John regret [that Mary kissed __ ]?
There is some debate about exactly how bad such sentences are. Ross (1967: 449) assigns them two question marks; Kiparsky and Kiparsky (1970: 162) give examples marked with an asterisk, but restrict the scope of this judgement to ‘the oddity of questioning and relativization in some factive clauses’ (emphasis added); and Erteschik-Shir (1973: 90) gives a four-way distinction, based on work with one informant, ranging from fully acceptable to fully ungrammatical. In more recent years (for example, Hegarty 1992: 90), researchers have tended to assimilate factive complements to weak islands: extraction of the right kind of complement is taken to be barely degraded, if at all, while extraction of adjuncts is impossible and extraction of subjects is highly variable. Although the more recent assumption certainly makes for a cleaner set of explananda, I have my doubts about its validity. The reason for this is that the judgements reported in earlier work were all based on extraction of who from object position, which is almost the most extractable type of constituent we have. And yet, several researchers and their informants independently found extraction of even referential complements from factive clauses to be quite bad. I will therefore proceed on the assumption that extraction from factive islands is always substantially degraded. Of course, the judgements of people like Hegarty cannot be ignored, and in the fullness of time, we will want to be able to say more about the variable acceptability of extraction from factive complements. That will have to wait for the future, though. As before, then, factive verbs take a tensed clause as complement, but the approach taken to the cases of say and think above must fail to some extent here. The clearest difference between examples such as (5) and (7) above concerns the pragmatic status of the event described by the complement clause. 
The occurrence of such an event is presupposed by a verb such as regret (it is for this reason that such verbs are known as factive verbs), but not by a verb such as say. This means, for example, that the case with regret in (8a) continues to imply that the addressee lied, despite this proposition being embedded under an interrogative operator, but this does not happen in (8b), where the matrix verb is non-factive say.
(8) (a) Did you regret that you lied?
(b) Did you say that you lied?
The occurrence of the presupposed event in (8a), but not in (8b), is then largely independent of the material contained in the matrix clause in that it is unaffected by the presence of operators such as the interrogative operator within the matrix clause (see already Kiparsky and Kiparsky 1970 and Langendoen and Savin 1971). For Erteschik-Shir, too, presupposition is a paradigm case of the notion of semantic subordination, the opposite of semantic
dominance. Erteschik-Shir’s theory of extraction is based on the following condition. (9)
Extraction can only occur out of clauses or phrases which can be considered dominant in some context.5 (Erteschik-Shir 1973: 27)
What this condition does not do, however, is explain why presupposition should block extraction in this way. Armed with the Single Event Grouping Condition, though, we are in a position to provide such an explanation. The second vital ingredient in this explanation is a theory of presupposition projection. A family of theories of presupposition, based on Kiparsky and Kiparsky (1970) and Langendoen and Savin (1971), derive the result that a presupposed element comes to have wider scope than the same element would if asserted. For concreteness, I will illustrate with one similar theory, from van der Sandt (1992) (although van der Sandt does not talk in terms of literal scope). Van der Sandt shows convincingly that the semantic conditions under which presuppositions are projected are related to those in donkey-sentences under which the binding by an existential quantifier of a variable corresponding to a pronoun is possible despite the pronoun not being syntactically subordinate to the existential. The sentences in (10) all fail to presuppose that Jack has children, despite the presence of the presupposition trigger all of Jack’s children. In the parallel syntactic environments in (11), we see the classic donkey-sentences in which a pronoun is bound by a non-c-commanding antecedent.
(10) (a) Jack has children and all of Jack’s children are bald.
(b) If Jack has children, then all of Jack’s children are bald.
(c) Either Jack has no children or all of Jack’s children are bald.
(11) (a) John owns a donkey. He beats it.
(b) If John owns a donkey, he beats it.
(c) Either John does not own a donkey or he beats it.
(van der Sandt 1992: 343)
5 The inclusion of the qualification ‘in some context’ is due to some rather idiosyncratic and variable data concerning verbs such as regret. Regret is now a prototypical case of a factive island-inducing verb, but extraction from its complement is considered acceptable by Erteschik-Shir’s informant. She proposes to account for this by relating it to the fact that its complement is not presupposed in an example such as Harvard regrets that children cannot be accommodated (Erteschik-Shir 1973: 91, attributed to Karttunen). In that case, there are contexts in which regret’s complement is not presupposed (or, in her terminology, is semantically dominant), which leads her to predict that extraction from the complement of regret is acceptable. As this is contrary to the judgements of many others, I will ignore the ‘in some context’ in what follows, as well as many subtle further details in the characterization of semantic dominance, and concentrate on the fact that presupposition of a constituent’s denotation is predicted to prevent extraction from that constituent.
Building on this, van der Sandt claims that, semantically, pronouns and presuppositions are both anaphors, and that, moreover, they are subject to the same resolution mechanisms (where ‘resolution’ means identification of an antecedent in the case of a pronoun, or contextual satisfaction or accommodation in the case of a presupposition). The primary difference between pronouns and presuppositions is that only presuppositions have internal structure, and so a more clearly defined semantic content. This means that they are easier to accommodate if an appropriate antecedent is not already present. Van der Sandt goes on to develop a DRT-based account of the resolution of such anaphoric dependencies. This is based on the notion of accessibility given in (12–13) below.
(12) Subordination
A DRS K_i immediately subordinates a DRS K_j if one of the following holds:
(i) There is a K_k such that K_j → K_k ∈ Con(K_i)
(ii) There is a K_k such that K_i → K_j ∈ Con(K_k)
(iii) There is a K_k such that K_j ∨ K_k ∈ Con(K_i)
(iv) There is a K_k such that K_k ∨ K_j ∈ Con(K_i)
(v) ¬K_j ∈ Con(K_i)
(vi) K_j ∈ A(K_i)6
A DRS K_i subordinates a DRS K_j just in case
(i) K_i immediately subordinates K_j
(ii) There is a K_k such that K_i subordinates K_k and K_k subordinates K_j
(13) Accessibility
Let u ∈ U(K_j), where K_j is an element of some A-structure and v an established marker in some U(K_i). Now v is accessible to u just in case K_i subordinates K_j. (van der Sandt 1992: 356)
6 The structure A(K) is an addition made by van der Sandt to the definition of a DRS K, familiar from Kamp (1981b), as a pair ⟨U(K), Con(K)⟩, corresponding to the universe of discourse markers and the constraints upon that universe, respectively. Van der Sandt adds the A-structure A(K) to that pair, corresponding to a collection of the anaphoric elements (sub-DRSs) of K.
Van der Sandt is careful to distinguish his theory from one, such as that of Kiparsky and Kiparsky (1970) or Langendoen and Savin (1971), in which presuppositions are always accommodated with maximally wide scope. Instead, the definition of subordination allows one to recursively trace a path through superordinate DRSs to find an antecedent, with the presupposition generally being resolved with the most local antecedent found. If no accessible antecedent is found, the presupposition will instead be accommodated by adding the relevant information to a superordinate DRS. In this case, the
preference is for global accommodation, in the highest DRS, corresponding to maximally wide scope, but local accommodation in a lower DRS can be forced if accommodation in the highest DRS would lead to inconsistency. However, the details of exactly where a presupposition is accommodated are less important for our purposes than the fact that a presupposed element ends up in a superordinate DRS compared to a corresponding asserted element. After resolution or accommodation, then, a presupposed element almost always comes to be included in a DRS superordinate to the DRS within which the presupposition was generated.7 Pictorially, this can be represented as follows. In (14a), the embedded DRS contains a presupposition (represented in italics) which is compatible with a constraint in a superordinate DRS. The presupposition is resolved with this constraint, producing (14b). In (14c), on the other hand, there is no such accessible potential antecedent, and so the presupposition is accommodated globally, as in (14d). In both cases, the result is that the presupposition comes to be associated with a DRS superordinate to the DRS which gives rise to the presupposition, and so the verification of the presupposed constraint comes to be independent of the structure which initially generated that presupposition.
(14) (DRSs in linear notation, [universe | conditions]; the presupposed condition, italicized in the original diagrams, is P(v) in the embedded DRS of (a) and (c))
(a) [u | P(u), [v | P(v), Q(v)]]
(b) [u v | P(u), u=v, [ | Q(v)]]
(c) [ | [v | P(v), Q(v)]]
(d) [v | P(v), [ | Q(v)]]
7 The exception implied by ‘almost always’ comes from a possible scenario in which the only site for accommodation or resolution is within the same DRS that gave rise to the presupposition. This situation is perhaps common with anaphora, but much less so with true presuppositions, and I ignore it in the following.
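The two outcomes in (14) can be made concrete with a small Python sketch. It is an illustration only, not van der Sandt’s formalization: it assumes that subordination reduces to plain nesting of DRSs (the conditional and disjunction clauses of (12) are not modelled), that a presupposition is a single unary condition, and that accommodation is always global. The names DRS, superordinates, and project are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Condition = Tuple[str, ...]  # e.g. ("P", "u") for P(u), ("=", "u", "v") for u=v


@dataclass
class DRS:
    """Toy DRS: a universe of markers, a list of conditions, and a parent link."""
    universe: List[str] = field(default_factory=list)
    conditions: List[Condition] = field(default_factory=list)
    parent: Optional["DRS"] = None

    def embed(self, child: "DRS") -> "DRS":
        """Nest `child` inside this DRS and return the child."""
        child.parent = self
        return child


def superordinates(k: DRS) -> List[DRS]:
    """All DRSs that subordinate k under the nesting-only assumption, most local first."""
    chain = []
    while k.parent is not None:
        k = k.parent
        chain.append(k)
    return chain


def project(source: DRS, marker: str, pred: str) -> Tuple[str, DRS]:
    """Resolve or globally accommodate the presupposition pred(marker) triggered in source."""
    chain = superordinates(source)
    if not chain:
        # The same-DRS scenario of footnote 7 is ignored in this sketch.
        raise ValueError("no superordinate DRS available")
    # The presupposed material leaves the DRS that generated it.
    source.universe.remove(marker)
    source.conditions.remove((pred, marker))
    # Resolution: bind to the most local accessible antecedent, as in (14a) -> (14b).
    for k in chain:
        antecedent = next((c[1] for c in k.conditions if c[0] == pred), None)
        if antecedent is not None:
            k.universe.append(marker)
            k.conditions.append(("=", antecedent, marker))
            return "resolved", k
    # Accommodation: no antecedent found, so add the content to the highest DRS,
    # as in (14c) -> (14d).
    top = chain[-1]
    top.universe.append(marker)
    top.conditions.append((pred, marker))
    return "accommodated", top
```

In both branches the presupposed condition ends up in a DRS superordinate to the one that triggered it, which is the property the argument in the main text relies on.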
The interest for our purpose comes in the interaction of such a theory of presupposition with propositional attitude verbs, and factive verbs in particular. It has been known since Frege that attitude verbs create opacity, in the sense that properties assumed to hold of the real world cannot be assumed to hold in the belief world of the subject of an attitude verb, and vice versa. For example, in the real world, it is known that Elton John’s real name is Reg Dwight, and so Elton John and Reg Dwight are one and the same individual. (15a) is therefore true iff (15b) is true. (15) (a) Elton John wears a wig. (b) Reg Dwight wears a wig. However, the actual identity of the people named by two proper names is no impediment to someone believing that the proper names pick out two different individuals. In that case, it is quite possible for someone to believe one of (15a) and (15b) to be true without believing the other. The truth conditions of (16a) and (16b) can therefore vary independently of each other. (16)
(a) John believes that Elton John wears a wig. (b) John believes that Reg Dwight wears a wig.
However, one exception to this rule concerns presuppositions. Under normal circumstances, a sentence such as (17) presupposes both that Mary has a daughter and that Mary believes that she has a daughter. This, then, is one case in which information embedded under an attitude verb can apparently affect our assumptions about the real world, as the presupposition Mary
has a daughter must be resolved or accommodated outside the scope of the embedded proposition. (17)
Mary believes that her daughter wears a wig.
There is general agreement that there is an asymmetry between the two presuppositions, the one concerning the actual world and the one concerning the belief world, such that one of the presuppositions is in some sense derivative of the other, and so need not be generated by the theory as an independent presupposition. However, there has been some disagreement as to which presupposition should be taken as basic and which should be taken as derived. I will follow Geurts (1998) (contra Heim 1992) in assuming that the presupposition concerning the actual world is basic.8 In that case, representations of (17), with unresolved and accommodated presuppositions, respectively, can be found in (18).
(18) (DRSs in linear notation, [universe | conditions]; the unresolved presupposition in (a) is daughter(u,m))
(a) [m | Mary(m), believes(m, [u v | daughter(u,m), wig(v), wears(u,v)])]
(b) [m u | Mary(m), daughter(u,m), believes(m, [v | wig(v), wears(u,v)])]
8 I will not repeat Geurts’ very detailed discussion of the different predictions of the two theories here—see his paper for details. I have also not followed Geurts’ analysis of presupposition in attitude contexts. His analysis is more complete and more accurate than mine, but also more complex, and the basic pattern is all we need.
In this light, the distinguishing property of factive verbs is that they presuppose their complements in their entirety, whereas any presuppositions that complements of other attitude verbs may contain are triggered by some structure or lexical item internal to that complement. After resolution or accommodation of those presuppositions, the whole propositional DRS representing the complement of a factive verb therefore comes to reside in a superordinate DRS, outside the scope of the factive verb, while the DRS representing the complement of a nonfactive attitude verb will still be within the scope of that verb. Of course, something must remain as the internal argument of a factive verb such as regret, and this would not be the case if we followed this approach to presupposition projection literally. I will sidestep this issue here in a way which is inelegant but sufficient for present purposes. Let us assume a class of facts as discourse referents, following Asher (1993), and assume moreover that regret specifies a relation between an individual and a fact. Finally, we shall make use of the counterpart relation described by Geurts.9 With these assumptions, we can implement a nonderivational equivalent for factive predicates of van der Sandt’s core insight about presupposition projection, while maintaining that regret has the correct number of arguments. We can then represent the distinction between factive and nonfactive complementation as in (19) and (20) below.
(19) John regrets that Bill hurt his leg.
(DRS in linear notation, [universe | conditions])
[j b l e1 e2 f | John(j), Bill(b), leg-of(b,l), e1=hurt(b,l), f=[b l e1 | Bill(b), leg-of(b,l), e1=hurt(b,l)], e2=regret(j,f)]
9 For Geurts, the counterpart relation is a version of identity which holds across possible worlds. It is represented by use of alphabetically identical names for discourse referents in two different DRSs, one of which is accessible from the other. Strictly speaking, (18b) should also have contained a counterpart of u within Mary’s belief world, but that is less crucial for our concerns.
(20) John thinks that Bill hurt his leg.
[j b l e1 | John(j), Bill(b), leg-of(b,l), e1=think(j, [e2 | e2=hurt(b,l)])]
A representation like (19) includes two independent events. The condition corresponding to Bill hurt his leg and the condition corresponding to John regrets e2 are separate DRS conditions introduced at the same level of embedding, which makes the former independent of the latter in the relevant respects. Moreover, the events of Bill hurting his leg and John regretting it cannot form an event grouping, in the sense defined in Section 6.3: a more fully specified DRS would show that the event of Bill hurting his leg precedes John’s regretting that event temporally, which violates the conditions on formation of event groupings.10 On the other hand, (20) only represents a description of a single actually occurring event. The event of Bill hurting his leg has no existence independent of John’s belief world in (20). In that case, there are not two actually occurring events described in (20); instead, we have one event which consists of John thinking that a second event exists. In that case, the Single Event Grouping Condition, coupled with a theory of presuppositions in belief contexts, predicts that factive complements are islands. Moreover, this explanation is readily extensible to other presuppositional eventive environments. One such example would be a subcase of the observation that extraction from definite noun phrases is degraded with respect to indefinite noun phrases.11 However, given that definiteness is primarily associated with individuals, not events, I will refrain from speculating too far in this direction here, as the topic merits more substantial independent treatment before any parallels are drawn.
10 It is unclear to me that event groupings in factive islands can always be ruled out on purely temporal grounds. However, I have not yet come up with a plausible case where none of the conditions on event groupings are violated, and so I have not been able to test the prediction that extraction from a factive complement should be possible in such a case.
11 See Davies and Dubinsky (2003) for other observations on the interaction of event and argument structure with the possibility of extraction out of nominals, many of them not immediately captured within the present analysis.
In this section, we have moved beyond extraction from adjuncts, to discuss the more central case of extraction from complements, and so allow the theory developed here to interact more generally with broader locality-theoretic concerns. We concentrated on one promising application of the Single Event Grouping Condition, to factive islands, and on showing how a well-motivated theory of presupposition could capture the distinction between bridge verbs and this particular class of non-bridge verbs. Whether this approach is to be preferred in the long run is really an empirical matter: extraction from factive complements is one area where judgements vary so dramatically between individual idiolects that it is hard to have any great faith in strings of question marks and asterisks as indicators of any general reality. The simple application of the Single Event Grouping Condition in this section predicts extraction from factive complements to be quite generally bad. It seems likely that that prediction is too strong. If that turns out to be the case, we have two options: either we attempt to modify the interface-based theory presented here to accommodate a greater variation in judgements or we abandon the interface-based theory in this area. However, even if it should eventually turn out that the Single Event Grouping Condition cannot account for the distinction between bridge verbs and factive verbs, this does not invalidate the Single Event Grouping Condition in the general case. Rather, it would mean that the condition is essentially inert in the case of extraction from complement clauses, leaving it to the syntax to filter out all unwelcome cases of movement.12 That is, I think, a less desirable position than the one explored here, where the condition pulls its weight in both the case of extraction from adjuncts and the case of extraction from complements. However, if that is where the facts eventually force us, so be it.
7.4 Cyclic Determination of Event Structure
This chapter has given us a principled reason why extraction from a tensed adjunct modifying a matrix VP is severely degraded, and why this is different from extraction from at least some tensed complement clauses. However, at present, the theory of extraction from complement clauses overgenerates in one very important sense.13 Rectifying this problem will lead us to a conception of the syntax–semantics interface in which the Single Event Grouping Condition is checked cyclically, rather than a potential alternative in which it is checked globally, for an entire syntactic representation or unbounded chain at once. In this way, the data discussed here constitute a further indirect argument in favour of a successive-cyclic approach to long-distance A′-dependencies, with a wh-phrase moving through intermediate positions on the way to its final landing site.
12 The next section will show that it is possible to render the condition inert in this way in the relevant cases of successive-cyclic movement, if the facts should demand it.
13 Thanks to Klaus Abels for the crucial observation discussed in this section.
To see the problem with the theory as it stands, it is instructive to consider the shape of the current proposal. The crucial difference between those complement clauses from which extraction is possible and those where it is unacceptable concerns the presuppositional status of the event described by the embedded clause, relative to the matrix predicate. If the semantic material contained within the embedded clause is presupposed relative to the matrix predicate, then the embedded event is treated as independent of that predicate. This is what leads to a violation of the Single Event Grouping Condition, and therefore to the unacceptability of extraction out of factive islands. If, on the other hand, the semantic material contained within the embedded clause is not presupposed, no such independence arises, and the Single Event Grouping Condition is satisfied as a result. This theory does not differentiate any further regarding what that embedded material consists of, however. In the examples considered above, a simple clause was embedded beneath a predicate. However, exactly the same pattern would be predicted to occur regardless of the syntactic and semantic shape of the embedded material. If, for example, we modify the embedded clause with a tensed adjunct, a bridge verb construction will continue to describe a single event, given that a typical bridge verb is a plug for presuppositions in the sense of Karttunen (1973). Accordingly, extraction out of a tensed adjunct embedded beneath a bridge verb should be allowed, contrary to fact. This is exemplified in (21–22) below.14
14 Events described by tensed adjuncts are presupposed with respect to the matrix VPs to which they are adjoined. For example, they are unaffected by matrix negation, as shown in (i–ii).
(i) John cried after Mary kissed Bill → Mary kissed Bill.
(ii) John didn’t cry after Mary kissed Bill → Mary kissed Bill.
However, crucially, this presupposition fails to project past a matrix bridge verb, which acts as a plug in such cases. This is shown in (iii–v).
(iii) Susan said that [John cried after Mary kissed Bill] ↛ Mary kissed Bill.
(iv) Susan said that [John didn’t cry after Mary kissed Bill] ↛ Mary kissed Bill.
(v) Susan didn’t say that [John cried after Mary kissed Bill] ↛ Mary kissed Bill.
In that case, the ungrammaticality of the cases discussed in the main text remains a surprising fact.
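The projection pattern in footnote 14 can be restated as a small Python sketch. This is a toy rendering of the plug/hole distinction, not Karttunen’s own system: the classification of say as a plug follows the text, the treatment of negation as a hole follows (i–ii), and the names PLUGS, HOLES, and projects are hypothetical.

```python
from typing import Sequence

PLUGS = {"say"}  # assumption from the text: a typical bridge verb plugs presuppositions
HOLES = {"not"}  # assumption from (i-ii): negation lets the presupposition through


def projects(embedding: Sequence[str]) -> bool:
    """Return True if a presupposition triggered below the given operators survives.

    `embedding` lists the operators above the tensed adjunct, outermost first.
    """
    for operator in embedding:
        if operator in PLUGS:
            return False  # blocked at the first plug encountered
        if operator not in HOLES:
            raise ValueError(f"unclassified operator: {operator}")
    return True


# The five configurations of footnote 14, outermost operator first.
footnote_14 = {
    "i": [],
    "ii": ["not"],
    "iii": ["say"],
    "iv": ["say", "not"],
    "v": ["not", "say"],
}
```

On these assumptions, projects returns True for (i) and (ii) and False for (iii) to (v), matching the entailment pattern reported in the footnote.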
(21) (a) *Who does Susan regret [that John cried [after Mary kissed __ ]]?
(b) (DRS in linear notation, [universe | conditions])
[e1 e2 e3 f j s m x | john(j), mary(m), susan(s), x=?, e1=cry(j), e2=kiss(m,x), after(e1,e2), f=[e1 e2 | e1=cry(j), e2=kiss(m,x), after(e1,e2)], e3=regret(s,f)]
Violates Single Event Grouping Condition: ungrammaticality expected.
(22) (a) *Who did Susan say [that John cried [after Mary kissed __ ]]?
(b) [e1 s m j x | john(j), mary(m), susan(s), x=?, e1=say(s, [e2 e3 | e2=cry(j), e3=kiss(m,x), after(e2,e3)])]
Satisfies Single Event Grouping Condition: ungrammaticality unexpected.
The necessary modification requires us to pay slightly closer attention to the syntax of such long-distance A′-dependencies than we have until now. We have so far said nothing about the role of intermediate traces with respect to the Single Event Grouping Condition.15
(23) [CP Who did Susan [VP say [CP t_who that John [VP [VP cried] [PP after Mary [VP kissed t_who ]]]]]]
In principle, there are two options with respect to the intermediate landing site and the Single Event Grouping Condition. Either we ignore it, and check the Single Event Grouping Condition with respect to the whole wh-chain, or we include it, and verify that the structure satisfies the condition at each stage of wh-movement. By ignoring the issue until now, we have tacitly placed ourselves in the former camp, which led to problems in accounting for the ungrammaticality of (22). However, this problem vanishes if we adopt the second possibility, and check event structures cyclically for conformity with the Single Event Grouping Condition. Here’s how this helps. Consider the embedded CP in (22), with the wh-phrase having moved to its specifier. As (24) shows, this structure fails to meet the Single Event Grouping Condition.
(24)
(a) [CP Who that John [VP [VP cried] [PP after Mary [VP kissed t_who ]]]]
(b) [e2 e3 | e2=cry(j), e3=kiss(m,x), after(e2,e3)]
Violates Single Event Grouping Condition: ungrammaticality expected. Assuming that the Single Event Grouping Condition is checked cyclically therefore explains the ungrammaticality of (22), while keeping all our earlier results. I therefore adopt the following. (25)
Cyclic determination of event structure: A structure must satisfy the Single Event Grouping Condition at every step of A′-movement.
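The contrast between global and cyclic checking can be sketched in Python. The sketch makes strong simplifying assumptions and is not the condition as defined in Section 6.3: each step of movement is paired with the events described at the top level of the DRS built at that step (events inside the argument of a plug such as say are not counted), and the grouping test is a placeholder under which only a single event counts as one grouping. The names Cycle, at_most_one_event, satisfies_globally, and satisfies_cyclically are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

GroupingTest = Callable[[List[str]], bool]


@dataclass
class Cycle:
    """One step of A'-movement and the independent events visible at that step."""
    landing_site: str
    independent_events: List[str]


def at_most_one_event(events: List[str]) -> bool:
    """Placeholder for the Single Event Grouping Condition (real conditions not modelled)."""
    return len(events) <= 1


def satisfies_globally(cycles: List[Cycle],
                       grouping: GroupingTest = at_most_one_event) -> bool:
    """Check the condition once, on the structure containing the whole chain."""
    return grouping(cycles[-1].independent_events)


def satisfies_cyclically(cycles: List[Cycle],
                         grouping: GroupingTest = at_most_one_event) -> bool:
    """Check the condition at every step of movement, as (25) requires."""
    return all(grouping(c.independent_events) for c in cycles)


# Example (22): the embedded CP describes crying (e2) and kissing (e3);
# at the matrix level only the saying event (e1) is independent.
example_22 = [
    Cycle("embedded [Spec,CP]", ["e2", "e3"]),
    Cycle("matrix [Spec,CP]", ["e1"]),
]
```

With these inputs the global check passes while the cyclic check fails at the embedded CP, which is the pattern that (24) and (25) are meant to capture. A different grouping test can be passed in to explore cases where two events do form a single grouping.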
15 I put aside the issue of whether there should also be an intermediate trace of who in [Spec,P].
Such a claim raises several questions concerning the interaction of event structure with successive-cyclic movement out of complement clauses. Depending on the fine details of the way in which (25) is applied, and of the semantic representations that we adopt, we can imagine that such a condition may rule
out extraction from factive complements, or have nothing to say about such extractions. I will not attempt to sharpen this any further here, although doing so would be a very worthwhile project. It seems to me that the next step towards a better understanding of the issues involved should be empirical: until we have reached some consensus concerning basic facts of the nature and extent of the degradation associated with extraction from factive complements, further theory construction seems somewhat premature. From the perspective of the theory developed here, it is possible to sharpen definitions in such a way that a cyclically applied Single Event Grouping Condition blocks extraction from factive islands, or alternatively permits it (possibly following the same pattern as extraction from weak islands). If the latter turns out to be the case, we would need to distinguish between the ungrammaticality of extracting from a presupposed tensed adjunct and the relative acceptability of extraction from a factive complement by assuming that there is no landing site, and so no checking of the Single Event Grouping Condition, at the edge of a tensed adjunct. A more problematic result would be that there is gradience in the acceptability of extraction from factive complements, and that gradience lacks an independent explanation. We may actually find such a pattern, based on the judgements reported in Erteschik-Shir (1973: 90), but some replication and expansion of those reported judgements from a single informant is in order before we attempt to make sense of this.
7.5 What Has Happened to the CED?
I have been at pains to stress that I am not attempting to invade an empirical area usually associated with syntactic forms of locality theory. Rather, an empirically adequate analysis needs both a pure syntax component and something more interface-based, as developed here. So, in general, the syntactic theory of locality can continue in much the same shape as it ever had, and it can even potentially become more streamlined and efficient, as it can hand over some messier areas to the Single Event Grouping Condition. As anticipated in Section 1.2, though, the one likely syntactic casualty of the theory developed here is the Condition on Extraction Domain, due to Huang (1982). That condition has the effect of ruling out extraction from constituents which are not properly governed. The two major relevant types of non-properly-governed constituent are subjects and adjuncts. The CED therefore directly claims that subjects and adjuncts form a natural class in being opaque to subextraction.
The theory developed here directly challenges the CED. However, as we saw in Section 1.2, such challenges date back, implicitly at least, to Kayne (1983), with earlier isolated counterexamples already present in Cattell (1976). Within what I take to be standard minimalist theorizing, however, the CED is perceived to be a fact, or perhaps more accurately, an explanandum in search of an explanation. In contrast, the data from Stepanov (2007) concerning extraction from subjects, coupled with the data in Chapter 6 concerning extraction from adjuncts, suggest that, contrary to this standard assumption, the CED is at best a generalization with holes in it. The question then becomes one of whether we should maintain the CED as a leaky generalization, and look for ways to accommodate the apparent counterexamples, or whether we should take the more radical path advocated by Stepanov, and dispense with the CED altogether. The theory developed here encroaches on the CED’s empirical territory, and does so without mentioning adjuncts at all. A major claim of this work is that phrase-structural notions like ‘adjunct’ describe the wrong type of units for describing certain extraction patterns which the CED is concerned with. Accordingly, it would seem that this theory should serve as grist to the anti-CED mill. In fact, though, working out the implications for the CED is harder than one might think. In principle, a single counterexample is all it takes to falsify a universal claim. However, in practice, in a complex theory under development on many fronts at once, a single counterexample is hard to identify. As things stand, removing the CED, or any successor with similar empirical effects, from our theory of syntax risks damaging our theory’s predictions. 
There is still a strong crosslinguistic tendency for subjects and adjuncts to be completely opaque to extraction, in a way that Stepanov’s theory of extraction from subjects, or my theory of extraction from adjuncts, does not capture. This is familiar territory for locality theory. Perhaps the most notorious case of a locality phenomenon which is robustly attested in a very few languages, but crosslinguistically very rare, is preposition stranding, as discussed in Section 6.5. Only a handful of languages have been found to attest any form of preposition stranding by movement, and preposition stranding by A-movement is rarer still. Moreover, the languages allowing P-stranding are heavily concentrated in the Germanic languages: English and the North Germanic languages are the stereotypical P-stranding languages, and the status of P-stranding in West Germanic (including Old English) is a matter of ongoing debate, but it is plausible that some limited P-stranding under A′-movement is possible. Outside of Germanic, the only languages for which any preposition stranding has been claimed are Macedonian (van Riemsdijk 1978), two Kru languages, Vata and Gbadi (Koopman 1984), and one dialect of French spoken
on Prince Edward Island, which is in heavy contact with English (King and Roberge 1990). In that case, as with our more immediate concern here, an analysis of the phenomenon within one language is all very well, but the crosslinguistic considerations demand attention at some point. As van Riemsdijk showed for P-stranding, an explanation of such constructions must not only account for the fact that a phrase gets out of a domain (PP in his case, adjuncts in mine) in a given language, but it must also show why this doesn’t happen more often crosslinguistically. Truswell (2009) reviewed the attempts that have been made to account for the extreme rarity of P-stranding. I won’t go into such detail here, but I believe that a fair summary is as follows: we are constructing increasingly accurate theories of the distribution of extraction from PP in a language, and we can tie those theories to the availability of marked phrase-structural configurations, grammatical relations, or operations. However, we struggle to explain why P-stranding is so much rarer crosslinguistically than another marked phenomenon like, say, VSO order. On the one hand, according to Merchant (2001), Matthew Dryer failed to find any productively P-stranding languages outside of Germanic, in a 625-language sample; while on the other hand, VSO is the dominant order in roughly 7% of languages surveyed in Dryer (2008): clearly a minority, but in nowhere near the same way in which P-stranding languages constitute a minority. The question we now face is this: extraction out of an adjunct or out of a subject is undoubtedly rare (or marked, if you prefer), but is it rare in the way that VSO order is rare (that is, something which we don’t automatically expect but which we should expect our theory to accommodate without batting an eyelid), or is it rare in the way that P-stranding is rare (that is, something which will require exceptional measures if it is to be permitted by a given grammar)?
If the former, we should bid a fond farewell to the CED: the exceptions are simply too unremarkable to allow the CED to stand with any great generality. If the latter, the CED should probably stay in place as part of our theory of grammar, and we should turn our attention to a theory of the really exceptional measures required to allow languages like English to violate the CED in some circumstances. Clearly, a question like this will not receive a straightforward answer, and I do not have anything like the typological database at my disposal to make the call in anything other than a very provisional way. It will also become clear that I believe that the answer is much more involved than a simple ‘yes’ or ‘no’. However, I will lay out some empirical considerations which bear on the status
of the CED in this way, at the very least as a way of describing a programme for sharpening this area of our theory of locality. I will proceed as follows. The first subsection below will lay out some crosslinguistic considerations in the area of extraction from adjuncts to show that the English pattern is unusual, but not entirely isolated, from a crosslinguistic perspective. I then turn my attention to Stepanov’s subject extraction data in Section 7.5.2, which allows us to attempt to draw a line of best fit with respect to the status of the CED.

7.5.1 Extraction from Adjuncts: A Crosslinguistic Perspective

We have seen that extraction from adjuncts in English is not particularly unusual. It is not completely free, even in syntactic terms: adjuncts are roughly weak islands, and there are good, partially semantic, reasons to expect tensed adjuncts to behave like strong islands. Add in the semantic restrictions following from the Single Event Grouping Condition, and many cases of extraction from adjuncts are ruled out. However, there are plenty of other cases where extraction from adjuncts is possible. This leads to the obvious question of how other languages behave in this respect. The CED is held responsible for countless facts across languages, so we have good reason to hope that it remains a viable principle. In fact, languages appear to differ substantially with respect to how tolerant they are of extraction out of adjuncts. English is relatively tolerant, and it is not alone in this behaviour. Norwegian, and to a large extent Swedish, are similarly permissive of extraction from adjuncts. I illustrate this here with examples of extraction from BPPAs, because such extraction has the most restricted crosslinguistic distribution in languages that I am aware of. Less remarkable types of extraction from adjuncts discussed above are also grammatical in Norwegian and Swedish.

(26) Norwegian:
Hvilken sang kom han [plystrende på __]?
Which song came he whistling on
‘Which song did he arrive whistling?’

(27) Swedish:
Vilken sång kom han in i rummet [visslande på __]?
Which song came he in in room.the whistling on
‘Which song did he come into the room whistling?’
Dutch, by contrast, is thoroughly intolerant, as is Greek. It appears that extraction from any class of adjunct is impossible, as seen in (28).

(28) (a) *Wat is Jan [__ fluitend] gearriveerd?
What is John whistling arrived
‘What did John arrive whistling?’
(b) *Wat ben je hier gekomen [om __ te lezen]?
What are you here come for to read
‘What did you come here to read?’
(c) *Wat ben je naar huis gegaan [zonder __ te zien]?
What are you to house gone without to see
‘What did you go home without seeing?’
Russian, likewise, disallows almost any extraction from adjuncts. Although I would not claim to have anything like a good understanding of the Russian A′-system, and, in particular, I have not been able to construct examples of extraction out of BPPAs, examples like the following are ungrammatical, at least.

(29) (a) ??Komu Ivan ušol [ne posvoniv __]?
Who.dat John left not call.adv
‘Who did John leave without calling?’
(b) *Komu Ivan prišol [dlya togo čtoby pozvonit’ __]?
Who.dat John came for that.sbj call.inf
‘Who did John come here to call?’

So far, it may appear that there is a binary parameter lying behind the crosslinguistic distribution of extraction from adjuncts. In the same way as P-stranding under A′-movement is hypothesized to be contingent upon the inclusion of some marked structure, relation, or operation in a grammar, we might suspect that extraction from adjuncts is impossible in the unmarked case, and depends on the availability of some similar marked element. However, if we move beyond adjuncts as a monolithic category, the patterns become somewhat more elaborate. Cinque (1990) shows that Italian allows extraction out of many classes of prepositional participial adjunct, in the terminology of Chapter 6.
(30) Anna, che me ne sono andato via [senza neanche salutare __]
Anna who I went away without even say goodbye.inf
‘Anna, who I went away without even saying goodbye to’

However, even Italian rejects extraction out of BPPAs, as shown in (31).16

(31) *Che cosa Gianni è arrivato [fischiettando __]?
What John is arrived whistling
‘What did John arrive whistling?’

16 Placing the subject in other positions appears to make little difference to the acceptability of (31). However, for one informant, using a null subject improves matters without making the sentence fully acceptable.

Meanwhile, Postal (1998) claims that even extractions similar to those accepted in Italian are rejected in French:

(32) (a) Elle est allée en Angleterre [sans lire le livre].
She is gone in England without read.inf the book
‘She went to England without reading the book.’
(b) *le livre qu’ elle est allée en Angleterre [sans lire __]
the book that she is gone in England without read.inf
‘the book that she went to England without reading’
(c) Elle y est allée en avion [pour confronter le directeur].
She there is gone in aeroplane for confront.inf the manager
‘She went there by plane to confront the manager.’
(d) *le directeur qu’ elle y est allée en avion [pour confronter __]
the manager that she there is gone in aeroplane for confront.inf
‘the manager that she went there by plane to confront’
(Postal 1998: 76)

To this, we could add that extraction from BPPAs is untestable in French because no clear equivalent of the BPPA construction exists in the first place.
(33) Jean est arrivé *(en) sifflant la Marseillaise.
John is arrived in whistling the Marseillaise
‘John arrived whistling the Marseillaise.’

This contrasts with the behaviour of Spanish. As was originally reported in Demonte (1988), extraction from bare past participial adjuncts is possible in Spanish, as in (34).

(34) De qué novia no sabes si Pepe volvió [harto __]
From which girlfriend not know.2sg if Pepe came back fed up
‘Which girlfriend do you not know if Pepe came back fed up with?’
(Demonte 1988: 22)

This possibility is not restricted to past participles. BPPAs also allow extraction in Spanish.

(35) Qué melodia volvió Pepe [siblando __]?
Which tune came back Pepe whistling
‘Which tune did Pepe come back whistling?’

Indeed, as with (34) above, even extraction of PPs from BPPAs is quite acceptable in Spanish, unlike English.

(36) Sobre qué idea volvió Pepe [pensando __]?
About which idea came back Pepe thinking
‘About which idea did Pepe come back thinking?’
In summary, then, it is hard to defend the claim that no adjuncts allow extraction. Extraction from rationale clauses and from certain types of prepositional participial adjuncts appears to be relatively unremarkable within this small, eurocentric sample. As an attempt to shore up the CED, one might be inclined to classify such constituents not as adjuncts but rather as the ‘quasi-adjuncts’ mentioned, but never defined, by Boeckx (2003).17 However, the constituents in question certainly look like adjuncts: they are optional, they occur outside complements, and so on. The alternative is to claim, as here, that adjuncts are not generally strong islands.

17 I use ‘quasi-adjunct’ as a placeholder term for an intuition that many people share that adjuncts like the above are less adjunct-like than others. Other terms covering roughly the same intuition are ‘pseudoargument’ and ‘quasiargument’. As far as I am aware, there is no well-developed content attached to such terms. Boeckx (2003) makes a brief stab at such a theory, based on the following premises: (a) Agreement into adjuncts is generally impossible; (b) Obligatory control is based on Agree (Landau, 2001); (c) Obligatory control, and hence Agree, is possible into some classes of adjuncts, as shown in the subjectless adjuncts above. These adjuncts allowing obligatory control are therefore quasi-adjuncts. Because Agree into such constituents is possible, on Boeckx’s theory, movement out of these constituents should be possible, all else being equal. The above examples show that this falls short of a satisfactory theory of extraction from adjuncts: even among adjuncts with obligatory control, opacity is the norm in many languages. This is not intended as an attack on Boeckx (2003): Boeckx’s discussion of quasi-adjuncts takes less than a page, and is far from the main point of that work. Rather, I mention Boeckx because he is one of the very few minimalist authors to have even recognized the problem, and attempted to outline a contentful solution, however sketchily. Given that his approach appears not to be tenable, notions such as ‘quasi-adjunct’ remain devoid of noncircular content.

This leaves us with two questions. The first is what to do about languages like French, which disallow extraction from adjuncts in general. I have no good story about this at present. However, a counterexample like this should not obscure the general pattern that extraction from adjuncts is very frequently possible. A parallel may be drawn with the treatment of successive-cyclic movement. In many languages, most famously Russian, extraction from indicative complement clauses headed by the complementizer čto is impossible.

(37) *vot ženščina kotoruju ja znal čto on ljubil.
‘here is the woman who I knew that he loved’
(Ross 1967: 464)18

18 I have slightly modified Ross’ orthography for consistency with Russian examples elsewhere in this monograph.

Yet, even if we do not understand why that should be the case, we should not let that obscure the fact that successive-cyclic movement is straightforwardly possible in many languages. Exactly the same is true here with languages like French that do not allow any extraction from adjuncts.

The second question raised by the data above concerns classes of adjunct which are more resistant than usual to subextraction. Among the classes under consideration here, this applies above all to BPPAs. The crosslinguistic distribution of extraction from BPPAs is restricted to much the same extent that the crosslinguistic distribution of P-stranding is restricted.19 It certainly appears to be much more restricted than the distribution of extraction out of the other classes of adjunct under consideration: at present, I am aware of examples of legitimate extraction from such adjuncts in only English, Norwegian, Swedish, and Spanish. On the other hand, such adjuncts exist, but disallow subextraction, in at least Icelandic, Faroese, Danish, Dutch, German, Greek, and Italian. Equally, extraction is very clearly subject to the sort of gradient, partially semantic factors that the Single Event Grouping Condition is designed to capture in at least Norwegian and Swedish as well as English.

19 In Truswell (2009), I went further than this and claimed that A′-extraction from BPPAs was possible in a language only if that language (a) had BPPAs, and (b) allowed P-stranding under A-movement. This claim can no longer be true, given the Spanish data from Demonte (1988) reproduced above. The Spanish pattern is slightly different from that found in Germanic languages in that extraction of a PP from an adjunct is just as acceptable as extraction of an NP. At present, I do not understand the Spanish data well enough to know why this should be.

Basic comparative considerations such as these therefore suggest that extraction from BPPAs, unlike extraction from adjuncts in general, is the marked case: its crosslinguistic distribution is restricted, as is its distribution within a language. This is also the conclusion drawn by Demonte (1988), who argues that extraction from secondary predicates in Spanish is legitimized by application of a syntactic reanalysis rule, on the basis of comparative English and Spanish data. I will not adopt the syntactic reanalysis hypothesis, for the reasons discussed in Chapter 6, but the empirical conclusion underlying it, concerning the markedness of extraction from BPPAs, is consonant with the data discussed in this subsection.

The moral here, then, appears to be that not all adjuncts should be treated alike. We already have independent reasons, based on the Single Event Grouping Condition, to exclude extraction from many classes of adjunct, such as adjuncts which attach high in the tree or tensed adjuncts. Such prohibitions on extraction should be universal to the extent that the classes of adjuncts in question are syntactically and semantically similar across languages. In cases where the constraints imposed by the Single Event Grouping Condition are automatically met, in virtue of the semantic relation between adjunct and matrix events, there is also a crosslinguistic tendency to allow extraction, although languages such as French pose counterexamples to this tendency. Finally, cases like BPPAs are more restricted crosslinguistically than might be expected on the basis of the above considerations alone.
This points to some further, crosslinguistically variable, factor at work, reminiscent of the sort of marked pattern that reanalysis was intended to capture in much early work on P-stranding: satisfaction of the Single Event Grouping Condition, and the conditions imposed by weak islandhood and so on, is not sufficient to allow extraction from BPPAs in isolation.

7.5.2 Patterns of Extraction from Subjects

As we have seen, the CED predicts that extraction from subjects and from adjuncts often behave similarly. As neither is typically properly governed, extraction from either should usually be impossible. However, the prediction is slightly more subtle than that. Adjuncts are universally ungoverned. However, there is the possibility that the proper government of subjects is subject to crosslinguistic variation. Indeed, the first suggestion along these lines was found already in Huang (1982), who proposed that covert movement out of subjects was possible in Chinese because Chinese INFL, unlike
English INFL, is a lexical category and so capable of governing the subject position. Indeed, there is substantial evidence that extraction from subjects and adjuncts pattern quite differently, even at quite a coarse empirical grain. I will deal here only with the facts in English. It is quite likely that a careful crosslinguistic comparison will reveal further typological dissociations between patterns of extraction from subjects and from adjuncts. For example, those SOV languages which apparently allow extraction from subjects quite freely have been reported not to allow extraction from even preverbal adjuncts, a fact which is irreducible to Kayne’s (1983) Connectedness condition. The same is true of Russian, which, as we have seen, disallows extraction from postverbal adjuncts. Although claims of extraction from subjects in Russian have doubtless been exaggerated, as we saw in Chapter 1, short-distance extraction from postverbal subjects in Russian (again, in conformity with the Connectedness condition) does seem to be possible. (38)
Kakaja tebya ukusila sobaka?
Which.nom you.acc bit dog.nom
‘Which dog bit you?’
Ideally, we would like to see a double dissociation: some other language in which extraction from subjects, but not from adjuncts, is absolutely impossible. In fact, I am unaware of any such language. Claims of grammatical extraction from subjects in at least some cases have been made for every one of the languages shown to allow extraction from adjuncts in Section 7.5.1. Another form of argument for the disunity of extraction from subjects and from adjuncts can be found, though, by examining the patterns of such extractions within a given language. That is what I will do, with respect to English, in the remainder of this subsection. We have seen that extraction from adjuncts is only sporadically impossible in English, to the point where I have claimed that the true generalization does not actually make reference to adjunction. However, extraction from subjects is generally impossible in English. The few exceptions which have been noted to this generalization over the years (Ross 1967, Kuno 1973, Sauerland and Elbourne 2002, Levine and Sag 2003, Chomsky 2008) appear to obey quite different conditions from the adjunct cases. For example, Chomsky (p.147) observes that the acceptability of extraction from a subject is related to whether that subject is an internal or external argument. According to Chomsky’s judgement, the examples of extraction from a derived subject in (39) pattern with the cases of extraction from an object in (40) rather than those of extraction from an external subject in (41).
(39) (a) It was the car (not the truck) of which [the (driver, picture) was found].
(b) Of which car was [the (driver, picture) awarded a prize]?

(40) (a) It was the car (not the truck) of which [they found the (driver, picture)].
(b) Of which car did [they find the (driver, picture)]?

(41) (a) *It was the car (not the truck) of which [the (driver, picture) caused a scandal].
(b) *Of which car did [the (driver, picture) cause a scandal]?
(Chomsky 2008: 147)
Although there has been no published response, as far as I am aware, to Chomsky’s empirical assumptions, there has been some amount of surprise registered informally among syntacticians as to the judgements, representing as they do a reversal of the judgements assumed over the 25 years since Huang’s dissertation. In particular, the subject island violations discussed by Chomsky display a strong preference for extraction of PPs over extraction of nominals. This is in contrast to the general preference in English for stranding prepositions. So while it is more natural for most English speakers to say (42a) than (42b), the opposite pattern is found when comparing the sharply degraded (43b) to the relatively acceptable (43a). (42)
(a) Who did you talk [to __]?
(b) To whom did you talk __?

(43) (a) Of which car was [the driver __] awarded a prize?
(b) ??Which car was [the driver of __] awarded a prize?
In a footnote (p.160), Chomsky attributes the degradation of the stranding examples to the Clause Nonfinal Incomplete Constituent Constraint of Kuno (1973), repeated below. (44)
The Clause Nonfinal Incomplete Constituent Constraint
It is not possible to move any element of phrase/clause A in the clause nonfinal position out of A if what is left over in A constitutes an incomplete phrase/clause.
(Kuno 1973: 381)
However, this is incompatible with a claim due to Sauerland and Elbourne (2002), which is more restrictive in some respects than Chomsky’s, but more permissive in others. Sauerland and Elbourne’s generalization, like Chomsky’s, restricts extraction to internal subjects. It also, however, imposes the additional requirement that the derived subject must be a scope-taking element, which interacts with some other scope-taking element, and finally, that
the subject takes narrow scope with respect to that scope-taking element (in their terms, the derived subject reaches subject position by PF-movement rather than regular A-movement). Only if all these conditions are met is extraction from a subject possible, according to Sauerland and Elbourne. They therefore give examples like the following. (45)
(a) *That’s the book Op_j that [a chapter of t_j]_i seems t′_i to have been assigned to John t_i.
(b) ?That’s the book Op_j that [a chapter of t_j]_i seems t′_i to have been assigned to every student t_i.

(46) (a) *?Which constraint are [good examples of __] always provided?
(b) Which constraint are [good examples of __] always sought?
(Sauerland and Elbourne 2002: 304)
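As a reading aid only, the three conditions that Sauerland and Elbourne are described as imposing can be written as a simple conjunction. The sketch below is a minimal illustration under stated assumptions: the class `SubjectContext`, its field names, and the function `extraction_permitted` are hypothetical labels introduced here, and the hand-coded values for (45a) and (45b) simply restate the description in the text rather than any formal analysis.

```python
from dataclasses import dataclass


@dataclass
class SubjectContext:
    """Hypothetical record of the properties the generalization refers to."""
    derived_subject: bool             # subject is an internal (derived) argument
    interacts_with_scope_taker: bool  # subject interacts scopally with another element
    narrow_scope: bool                # subject scopes below that other element


def extraction_permitted(ctx: SubjectContext) -> bool:
    """Extraction from the subject is allowed only if all three conditions hold."""
    return (ctx.derived_subject
            and ctx.interacts_with_scope_taker
            and ctx.narrow_scope)


# (45a): derived subject, but 'John' is a proper name, so no scope interaction.
ex_45a = SubjectContext(derived_subject=True,
                        interacts_with_scope_taker=False,
                        narrow_scope=False)

# (45b), on the reading where 'a chapter of t' scopes below 'every student'.
ex_45b = SubjectContext(derived_subject=True,
                        interacts_with_scope_taker=True,
                        narrow_scope=True)

print(extraction_permitted(ex_45a), extraction_permitted(ex_45b))
```

The boolean encoding deliberately ignores gradience in the judgements (the ‘?’ on (45b), for instance); it only makes explicit that the generalization is stated as a conjunction of conditions.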
The distinction between (45a) and (45b) is due to the fact that a chapter of t interacts scopally with every student in the latter case, but cannot interact scopally with a proper name like John in the former case. The precise claim, then, is that (45b) is grammatical only if a chapter of t takes narrow scope with respect to every student. Similarly, (46b) is predicted to be grammatical, unlike (46a), because seek, unlike provide, is intensional, and so can interact scopally with good examples of t. More pertinently, though, Sauerland and Elbourne clearly believe that it is grammatical to extract DP, stranding P, from a subject, provided that other relevant conditions are met. This is in contrast to Chomsky’s reliance on the Clause Nonfinal Incomplete Constituent Constraint to rule out stranding of P in (43) above. This is an empirical area which is only starting to be seriously investigated. Holes are increasingly being found in the generalization that extraction from subjects is always impossible in English. However, there is no consensus yet as to what the empirical facts actually are. That is not a debate I intend to enter into here. Two points are relevant here, though. Firstly, even in English, subjects are not universally islands. Secondly, the conditions under which extraction from subjects is possible in English do not appear to reduce straightforwardly to the Single Event Grouping Condition, regardless of whether the facts eventually look more like Sauerland and Elbourne’s, or Chomsky’s, description. This second point becomes apparent if we consider the event structures corresponding to the different cases considered by Chomsky, and Sauerland and Elbourne. None of the event-structural considerations in Part I had a direct bearing on the distinction between derived and external subjects, and to the extent that there may be a correlation between the internal/external
distinction and event size, we would expect external subjects, which are more typically agentive, to participate in events at least as large as those which internal subjects, which are not typically agentive, participate in. If the Single Event Grouping Condition makes any prediction at all here, then, it would most likely predict that extraction from external subjects should be more frequently possible than extraction from internal subjects, contrary to fact. Whatever the ultimate story concerning extraction from subjects may be then, it is highly likely to be independent of the Single Event Grouping Condition. A further possible reason for differentiating subject islands from adjunct islands can be found in the results reported by Sprouse, Brinjak, Littman, and Meyers (2007). Sprouse et al. compare grammaticality judgements, elicited through magnitude estimation, for a range of overt and covert island conditions. For the most part, their results confirm Huang’s (1982) claim that covert wh-movement is not sensitive to islands. However, there is one exception, namely that covert wh-movement from within a subject island is judged to be degraded to a statistically significant extent, relative to a nonisland-violating control. The full paradigm given is as follows: (47a) tests the acceptability of a multiple wh-question with an embedded wh object in situ. This can be compared with (47b), where the in situ wh-phrase is embedded within an embedded object, and also with (47c), where the in situ wh-phrase is an embedded subject rather than an embedded object. All of these conditions received almost identical grammaticality ratings, showing that covert extraction of a subject or covert extraction from within a complement is just as acceptable as the baseline of covert extraction of a complement. However, when these are compared with (47d), where the in situ wh-phrase is embedded within an embedded subject, this latter is significantly degraded. (47)
(a) Who thinks that a bottle tripped who?
(b) Who thinks that a bottle of wine tripped the manager of what?
(c) Who thinks that what tripped the manager?
(d) *Who thinks that a bottle of what tripped the manager?
In contrast, a similar paradigm involving adjunct islands does not show a parallel degradation: although (48a), with the in situ wh-phrase within a complement, is perceived as slightly more acceptable than (48b), where the in situ wh-phrase is within an adjunct, the effect does not reach significance.

(48) (a) Who thinks that you forgot what at the office?
(b) Who laughs if you forget what at the office?
Sprouse et al. conclude that there is evidence for a covert subject island effect but not a covert adjunct island effect.20 Of course, there are many problems in interpreting this result in relation to the general considerations at issue here. Most prominent of these are empirical concerns relating to the status of extraction from if-clauses in the general case (see, e.g. Pullum 1987, Culicover and Jackendoff 2005: ch. 13, and Taylor 2006 for partially divergent assessments of the empirical facts in this area). However, it is possible that this constitutes another piece of evidence for the disunity of the cases originally brought together under the CED. We therefore have several reasons to suspect that the exceptions to the subject and adjunct island conditions do not have the same source, either in English or crosslinguistically. Of course, this raises the question of why this should be. Unfortunately, though, we simply do not yet have enough consensus about the empirical facts of extraction from either domain to make much progress here. A full, nonunified, post-CED theory is therefore still some way in the future. However, the above considerations constitute an argument that we need such a theory.

20 Clearly, this conclusion is independent of whether the analysis of wh in situ or multiple wh questions involves literal covert movement or not.

7.6 Summary

This chapter has had several aims. First and foremost, we have shown that there is no incompatibility between the Single Event Grouping Condition and regular syntactic theories of successive-cyclic movement out of complement clauses, and the limits of such movement. At worst, then, the condition is harmless to the rest of our locality theory. However, we also investigated the possibility of developing a theory of factive islands based on the Single Event Grouping Condition. The evidence in favour of this analysis was somewhat equivocal, as the empirical facts are not yet clear. However, if such a theory should prove tenable, it would be a very strong argument for the semantic basis of the Single Event Grouping Condition, as this condition would then account for the impossibility of extraction out of two syntactically quite distinct domains, namely tensed adjuncts and the complements of factive verbs. In developing this latter theory, we found a novel, indirect argument for successive-cyclic movement and a grammatical architecture in which structure is interpreted cyclically rather than globally, based on the undesirable consequences of a globally applicable Single Event Grouping Condition. This consideration of the interaction of the Single Event Grouping Condition with regular syntactic locality concerns led to a discussion of the one
prominent syntactic locality condition which overlaps significantly with the discussion of extraction from adjuncts in Chapter 6, namely the Condition on Extraction Domain. The tentative conclusion is that the CED should not be considered as a unified principle: both major cases covered by the CED (extraction from subjects and from adjuncts) have a variety of exceptions, and those exceptions pattern differently in the two cases, both crosslinguistically and within individual languages. From the present perspective, the Single Event Grouping Condition is active in several languages in restricting extraction from adjuncts. However, it does not have much to offer in terms of restricting extraction from subjects. Some supplementary theory of restrictions on extraction from subjects is therefore in order. If there is indeed a correlation between the possibility of extracting from pre- or postverbal subjects and the VO–OV parameter, as suggested in Chapter 1, then this might suggest on minimalist assumptions that the theory in question should make reference to linear order, and so to processing or PF. The next chapter will continue to explore issues along these broad lines. We now have a condition, the Single Event Grouping Condition, which does a fair amount of empirical work for us, across a variety of different areas. However, at another level, this condition remains just a description, and a fairly odd one at that. Chapter 8 addresses one main question: Why does wh-movement care about events? Is it because events, like wh-movement, are represented in the syntax? If not, what can we tell about the nature of the interface between the relevant syntactic and nonsyntactic concerns?
8 Architectural Issues

8.1 Introduction

If the argumentation above is on the right track, then event structure, a cognitive and semantic factor, has a significant impact on the applicability of A′-movement. At this stage, that conclusion could be implemented in one of two ways. It is possible, as I have tacitly assumed until now, that the Single Event Grouping Condition interacts directly (that is, without mediation from syntax) with A′-movement. However, there is another possibility, namely that the event-internal structures detailed in Part I are represented in the syntax, and that the Single Event Grouping Condition is actually derivative of a purely syntactically defined structural condition on A′-movement. In developing this possibility, we would aim to build on the syntactic approach to lexical decomposition pioneered by Lakoff (1970) and other generative semanticists, and later resurrected by Hale and Keyser (1993), among others. I will argue that this syntactocentric view cannot be upheld. The argument presupposes the correctness of the model of event structure developed in Part I, and shows how the units and relations defined there do not all map onto the units and relations of phrase structure, as currently understood, in any direct way. This does not, of course, mean that we cannot in principle modify our understanding of phrase structure to close the gap, so there is no argument here that the Single Event Grouping Condition could not in principle be construed as a syntactic condition. Rather, the argument is that, given our current understanding, it is not naturally construed as a syntactic condition. This conclusion leads to two further questions. The first concerns the division of labour between syntax and semantics in determining linguistic event structures.
Syntactocentric theories of event structure, most recently that defended at length in Ramchand (2008), can account for dozens of facts which the Single Event Grouping Condition, and the model of event structure in Part I, have nothing to say about. These are often facts concerning the relation of morphology to event structure, but there are also arguments based on crosscategorial similarities between syntactic and semantic decomposition
of VPs and events, PPs and paths, and APs and scales, for example. Clearly, these similarities are beyond the jurisdiction of the Single Event Grouping Condition, as it stands. So we must ask how the notion of ‘event’ is divided up between those theories which would essentially syntacticize that notion, and those theories, like mine, which insist on some irreducibly semantic component to the definition of events. Finally, the obvious question, deferred until now, is this: Why should A′-movement care about a semantic notion like ‘event’? If we cannot completely syntacticize event structure, then we must do something to shore up the modularity of our theory of grammar. Even if a syntactic operation can apparently make reference to a semantic notion like ‘event’, we surely do not want to allow such rules to also make reference to, say, a phonological notion like ‘nasal’. We will not countenance the following condition.

(1) The Single Nasal Condition
An instance of wh-movement is legitimate if the minimal constituent containing the head and the foot of the chain contains only one nasal consonant.
I have put this question aside until now because demonstrating even the descriptive adequacy of the Single Event Grouping Condition, even for a single language, must surely take priority in an empirically very rocky area such as this. Now, though, the matter must be addressed. The answer I will develop suggests that the Single Event Grouping Condition is based on neither syntax, nor semantics, but on processing. We will develop a version of the processing theory of locality in Gibson (1998), where the cognitive cost of processing A′-dependencies is measured partly in terms of the number of discourse referents encountered between filler and gap. In that original proposal, the discourse referents considered were only individuals. However, subsequent elaborations (in particular Gibson 2000) have incorporated events as discourse referents into the picture. And Part I has given us a way of counting events. A condition like the Single Event Grouping Condition should exist, then, as a theorem of Gibson’s Dependency Locality Theory. However, it does not automatically follow that the condition should take precisely the form it does here. This gives us an opportunity to simultaneously sharpen a lot of our definitions, formulate a lot of hypotheses, and put the Single Event Grouping Condition on a principled footing. Moreover, the worries about modularity dissolve: event structure can interact with wh-movement because incremental syntactic structure building and incremental interpretation intermingle (Crain and Steedman 1985).
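As a rough illustration of the counting idea behind this processing account, the following Python sketch tallies the discourse referents introduced between a filler and its gap. It is a simplification under stated assumptions: the `Token` class, the hand-annotated `referent` labels, and the `intervening_referents` function are hypothetical names introduced here for illustration, and the sketch makes no attempt to reproduce the full cost function of Dependency Locality Theory.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Token:
    form: str
    # Hand annotation (an assumption of this sketch):
    # "individual", "event", or None for words introducing no referent.
    referent: Optional[str] = None


def intervening_referents(tokens: Sequence[Token], filler: int, gap: int,
                          kinds=("individual", "event")) -> int:
    """Count referents of the given kinds introduced strictly between filler and gap."""
    return sum(1 for t in tokens[filler + 1:gap] if t.referent in kinds)


# "What did John say that Mary bought __?" with a toy annotation.
sentence = [
    Token("what"), Token("did"), Token("John", "individual"),
    Token("say", "event"), Token("that"), Token("Mary", "individual"),
    Token("bought", "event"), Token("__"),
]

# Counting individuals only, then individuals together with events.
individuals_only = intervening_referents(sentence, 0, 7, kinds=("individual",))
individuals_and_events = intervening_referents(sentence, 0, 7)
print(individuals_only, individuals_and_events)
```

Counting events alongside individuals raises the tally for the same dependency. A measure of this general kind is what a principled way of counting events could feed into.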
Syntacticizing the Single Event Grouping Condition
Section 8.2 is dedicated to the defence of the more interface-based, less syntactic, implementation of the Single Event Grouping Condition which has been implicitly adopted in the foregoing text. In that section, we will also consider the division of labour between the independently useful syntactic and semantic notions of ‘event.’ We move on in Section 8.3 to the elaboration of the processing account. Finally, Section 8.4 concludes.
8.2 Could We Syntacticize the Single Event Grouping Condition?
8.2.1 Two Syntactocentric Theories of Event Structure
If we are to attempt to syntacticize the Single Event Grouping Condition, we must be clear about what constitutes a syntactocentric theory of event structure. I will assume that the fundamentally semantic characterization of the internal structure of events outlined in Part I is largely accurate. Inevitably, this will not be completely true. However, we can continue to investigate the division of labour between syntax and semantics in the Single Event Grouping Condition without worrying overly about just how accurate the theory of Part I is, so long as the broad shape of that theory is accepted. The important characteristics of the theory of Part I are that it takes a set of semantic units (events) and defines relations among them based on semantic notions such as direct causation and goal-driven enablement or planning. The characteristic hypothesis distinguishing a syntactocentric theory of event structure is that certain of these semantic elements are taken to be determined by, or at least stand in a direct relation to, independently motivated features of phrase structure. If such a hypothesis can be maintained, then it is possible to recast the Single Event Grouping Condition in syntactic terms by replacing the semantic terms currently used in the definition of ‘single event’ with their syntactic counterparts. Here, we already reach a choice point. Syntactic structures, at a basic level, have two main components: a set of units, and a set of specified relations among those units. This much is as true of phrase structure as of event structure as characterized in Part I.
In phrase structure, the units in the Chomskyan tradition are constituents, and the relations among constituents include the family of largely interdefinable relations such as c-command and dominance; the family of privileged relations which participate in selection or agreement, possibly including head–complement, head–head or spec–head; and the relations which link elements in a chain or other apparently nonlocal grammatical dependency. Similarly, event structure, at least on any decompositional theory, has two main components: a set of events, and certain specified relations (here,
primarily direct causation and enablement, as well as the relations involved in macroevent formation and event groupings) among those events. So there are two obvious ways in which a syntactocentric theory might be developed by matching either of these two subparts of the whole system. (2)
(a) Syntactocentric Hypothesis 1: Events always correspond to constituents at a given point in the derivation.
(b) Syntactocentric Hypothesis 2: Certain relations among events (e.g. contingent relations) always correspond to particular relations among constituents at a given level of representation.
In practice, however, an event is rarely considered syntactically in isolation. This causes immediate problems for the first hypothesis above. The widely accepted current theories of the interaction of phrase structure and event structure, following Higginbotham (1985), hold that an event variable is introduced by one head, and existentially quantified by an operator associated syntactically with a second, c-commanding head. In such a case, it is clear that the event in question does not straightforwardly correspond to any syntactic unit, as the assertion of the existence of an event is due not to any one constituent but to the interaction of two heads which can be quite remote from each other. In its above form, then, Syntactocentric Hypothesis 1 is a nonstarter. A less clearly incorrect version of the first hypothesis can be derived by considering the event variable’s role as an argument of a predicate. In all but the simplest cases (such as weather verbs), the event variable is just one of several arguments of that predicate. Call the syntactic representation of a predicate and all its arguments, eventive or otherwise, a predicate-argument group. We may then consider an alternative version of Syntactocentric Hypothesis 1, as follows: (3)
Syntactocentric Hypothesis 1 (revised): Predicate-argument groups are constituents at a given level of representation.
Although (3) makes no explicit reference to events, it is still a substantive hypothesis about the representation of event structure. The other arguments of a predicate are part of the descriptive content which allows us to identify a particular event and individuate it. For example, we naturally talk about the event of John kissing Mary, specifying the participants as part of the description of the event. Syntactocentric Hypothesis 1, as revised in (3), states that all the syntactic elements which contribute to that descriptive content form a constituent at the relevant level of representation. Syntactocentric Hypotheses 1 and 2 are in principle independent of each other. If both were true, there would be a homomorphism from event
structure into phrase structure with respect to the relations mentioned in Hypothesis 2. This is a very strong position, which has never, as far as I am aware, been proposed in print. Each hypothesis has, however, been proposed in isolation. The first is explored in Geis (1973), but has been largely ignored since, and is worthy of more serious scrutiny. The second is familiar from the research programme instigated by Hale and Keyser (1993), wherein a V head taking a VP complement corresponds to a causal relation between two events, for example. The purpose of this section is to scrutinize both these hypotheses. We will see that the revised Hypothesis 1 from (3), although initially attractive, ultimately fails, because of an inadequacy with respect to the Unlikely Antilocality Puzzle presented in Chapter 1. Secondly, although Hypothesis 2 can be made to work well, given suitable auxiliary assumptions, for core events, the smallest level of our three-level event structure, there does not appear to be a way to expand it to cover extended events or event groupings. We are then left with the conclusion that the Single Event Grouping Condition is purely semantic, and not dressed in syntactic clothes. However, this must not obscure the fact that the syntactocentric approach to core events is very promising, and explains types of facts which simply are not addressed by the Single Event Grouping Condition. We will therefore leave this section with a fairly specific division of labour, where there are arguments for treating some parts of event structure in a syntactocentric way, but equally strong arguments for not doing so with all of event structure.
8.2.2 Contingent Relations as Complementation: How Far Can We Go?
The most popular current syntactocentric theory of event structure embodies Syntactocentric Hypothesis 2 described above.
Hale and Keyser (1993) initiated a very productive line of research building on this hypothesis, leading to work such as Harley (1995), Kratzer (1996), Marantz (1997), Travis (2000), Pylkkänen (2002), and Ramchand (2008). The guiding hypothesis behind much of this work is that a more articulated view of event structure goes hand in hand with a more articulated model of the lower portion of the functional sequence that forms the backbone of the clause, or the first phase, to borrow Ramchand’s term. Although the details vary from proposal to proposal, there is a common core consisting of the following characteristics. Taking a restrictive version of X′-theory as a starting point, disallowing n-ary branching and multiple specifiers, a lexical head comes with a maximum of two phrase-structural positions in which arguments could be merged, namely its specifier and its complement. For the lowest verbal head, both of these arguments can be individual arguments. However, every other verbal head
has a verbal complement, by definition, so each light verb is restricted to a single non-verbal argument, located in its specifier. The spec–head relation may then be seen as corresponding to a generalized ‘subject-of’ relation in the semantics. Moreover, it can be assumed that each verbal head introduces a predicate taking an event and possibly some individuals as arguments. The semantic correlate of a head–complement relation among verbal constituents is causation: the event described by the higher head directly causes the event described by the complement. Finally, a further relation, something like causation, except holding between an individual and an event, may hold between the highest specifier and its sister. To take a concrete example, consider the phrase structure proposed by Ramchand (2008). Ramchand proposes that there are a maximum of three subevents in each macroevent, interpreted as initiation, process, and result, respectively. Each subevent is instantiated syntactically as the head of an aspectual projection, these projections standing in complementation relations with each other. If all three projections are activated in a given derivation, this leaves four empty syntactic positions in the tree, namely the specifiers of the three aspectual projections, and the complement of the most deeply embedded of the three. The specifier positions are all filled by (not necessarily distinct) individual arguments, while the low complement position may be filled by a result-denoting XP. Putting all these components together gives the syntactic structure in (4), from Ramchand (2008: 39). (4)
[initP [DP3 subj of ‘cause’]
   [ init
      [procP [DP2 subj of ‘process’]
         [ proc
            [resP [DP1 subj of ‘result’]
               [ res [XP ...]]]]]]]
Interpreting this structure involves, in addition to a generalized event composition rule for specifying the relations between subevents, and a generalized
‘subject-of’ relation as described above, the lexical semantics of the individual heads, given in (5).¹ (5)
(a) [[res]] = λPλxλe[P(e) & res′(e) & State(e) & Subject(x, e)]
(b) [[proc]] = λPλxλe∃e1, e2[P(e2) & proc′(e1) & Process(e1) & e = (e1 → e2) & Subject(x, e1)]
(c) [[init]] = λPλxλe∃e1, e2[P(e2) & init′(e1) & State(e1) & e = e1 → e2 & Subject(x, e1)]
(Ramchand 2008: 45)
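To see how the entries in (5) combine over the structure in (4), the following worked composition may help. It is only a sketch under stated assumptions: XP is taken to denote an event predicate, written here as P_XP, and x_1, x_2, x_3 stand for the denotations of DP1, DP2, and DP3. These symbols are introduced for illustration, the bound event variables in the last line are renamed to avoid clashes, and the alternative lexical entries mentioned in the footnote are ignored.

```latex
% Requires amsmath (align*). A sketch, not Ramchand's own derivation.
% Assumptions: P_{XP} is the event predicate denoted by XP;
% x_1, x_2, x_3 are the denotations of DP1, DP2, DP3.
\begin{align*}
[\![\mathrm{resP}]\!]
  &= [\![\mathrm{res}]\!](P_{XP})(x_1) \\
  &= \lambda e\,[P_{XP}(e) \wedge \mathit{res}'(e) \wedge \mathrm{State}(e)
     \wedge \mathrm{Subject}(x_1, e)] \\
[\![\mathrm{procP}]\!]
  &= [\![\mathrm{proc}]\!]([\![\mathrm{resP}]\!])(x_2) \\
  &= \lambda e\,\exists e_1, e_2\,[[\![\mathrm{resP}]\!](e_2) \wedge \mathit{proc}'(e_1)
     \wedge \mathrm{Process}(e_1) \wedge e = (e_1 \to e_2)
     \wedge \mathrm{Subject}(x_2, e_1)] \\
[\![\mathrm{initP}]\!]
  &= [\![\mathrm{init}]\!]([\![\mathrm{procP}]\!])(x_3) \\
  &= \lambda e\,\exists e_3, e_4\,[[\![\mathrm{procP}]\!](e_4) \wedge \mathit{init}'(e_3)
     \wedge \mathrm{State}(e_3) \wedge e = (e_3 \to e_4)
     \wedge \mathrm{Subject}(x_3, e_3)]
\end{align*}
```

Each head first takes the denotation of its complement (the P argument) and then that of its specifier (the x argument), so every step of complementation in the syntax contributes one causal link between subevents.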
This is a theory which embodies Syntactocentric Hypothesis 2 in that it sees complementation as the syntactic embodiment of direct causation. It does not, however, embody Syntactocentric Hypothesis 1, as a verbal head introducing a predicate of an event and an individual does not generally form a constituent with the XP denoting the individual but rather usually stands in a Spec–head relationship with it. Although the specifics of this proposal are Ramchand’s, the overall architecture is broadly similar to those proposed by the aforementioned researchers. There is an increasing amount of evidence for the availability of structures along these lines. The strongest evidence does not, in fact, come from semantic concerns. Ramchand states that:
I will tie . . . argument relations to a syntactically represented event decomposition. The reason for this move is the claim that the generalizations at this level involve a kind of systematicity and recursion that is found in syntactic representations. The strongest hypothesis must be that the recursive system that underlies natural language computation resides in one particular module that need not be duplicated in other modules of grammar (i.e. in the lexicon, or in the general cognitive system). (Ramchand 2008: 38)
1 In fact, certain heads in Ramchand’s system are associated with multiple lexical entries, the choice between them depending on the nature of their complement in a given example, but I ignore this here.
However, we do not need a recursive phrase structure to handle phenomena relating to semantic forms of embedding, especially not the limited, nonrecursive embedding of the type we see in Ramchand’s event structures. Indeed, one major thrust of the replies to Hauser, Chomsky, and Fitch (2002) in Pinker and Jackendoff (2005), and Jackendoff and Pinker (2005), concerns the existence of recursive cognitive systems which are not closely related to phrase structure, for instance in gestalt effects related to perceptual grouping. If, as appears to be the case, recursion is pervasive in human cognition, there is no argument a priori that event structure (even if it were recursive) should be represented in phrase structure. Rather, if we allow a fine-grained representation of counterparts of event-structural notions within our phrase structure, we do so
because we want that information there, because it helps us explain interface-based empirical phenomena. There are, to my mind, three types of arguments that could be put forward for a syntacticization of event structure. Firstly, we could argue that event structure has morphological effects, but that there is no direct connection, unmediated by syntax, between morphology and semantics. Secondly, we could suggest that the linguistic encoding of event structure shows nonnecessary parallels with the linguistic encoding of other, unrelated, cognitive semantic structures, and that the best way of capturing such a crosscategorial generalization is to relate it to a generally available phrase-structural template. Finally, it may be the case that certain restrictions on the distribution of event-structural relations match restrictions on the distribution of phrase-structural relations (such as syntactic locality constraints on movement, for example), and that event-structural relations bear a hallmark of syntactic relations in this way. To my knowledge, the last type of argument has never been made. However, there are several variants of the first two types of argument. They do not necessarily agree on a single phrase structure for the representation of event structure, and I will not try to force them all into a single coherent mould here, but the same considerations have recurred often enough by now to suggest that there is at least a grain of truth to the syntactic representation of some elements of event structure. I will concentrate mainly throughout this section on examples discussed and analyses proposed in Ramchand (2008). Ramchand is far from the only researcher to show the analytical benefits of adopting a syntactic decomposition. Summarizing the whole literature would take a vast amount of space, though, and would largely repeat examples of the same modes of explanation.
One type of argument that could be built on considerations in Ramchand (2008) (though this strictly goes beyond what she says in that work) concerns crosscategorial semantic similarities. As is well known (Verkuyl 1972, Tenny 1987, Krifka 1989, Dowty 1991, Jackendoff 1996), there are parallels between the algebraic structures of certain nominal denotations and the algebraic structures of certain verbal denotations, when the NP in question functions as the complement of the V in question. This is the ‘measuring out’ effect mentioned repeatedly in Part I, whereby eat an apple in (6a) is telic because an apple is bounded, while eat apples in (6b) is atelic because apples is unbounded. (6)
(a) Michael ate an apple (in five minutes/?for five minutes).
(b) Michael ate apples (#in five minutes/for five minutes).
Ramchand shows, firstly, that such an effect is not restricted to complement NPs. We see it as well, for example, with PPs. If a PP denotes a bounded path, then a VP in which it is contained can be interpreted as telic, while if the PP denotes an unbounded path, the VP is interpreted as atelic. (7)
(a) Sally pushed the cart into the river (in 30 seconds/*for 30 seconds).
(b) Sally pushed the cart towards the river (*in 30 seconds/for 30 seconds).
Finally, the case of APs is somewhat more involved, but we can at least formulate a further parallel set of distinctions among adjectival denotations in terms of whether the scale which an adjective denotes is open or closed. Ramchand reports that AP resultatives with selected objects are only possible when the AP denotes a gradable, closed-scale property. We see, then, a broad conceptual parallel emerging across the four lexical categories. Process and culmination in the verbal domain, path and place in the prepositional domain, ‘paths through objects’ and boundedness in the nominal domain, and adjectival scales and limits on those scales all follow the same broad template of an extended property, and a bound on that property. Secondly, Ramchand shows that there are interactions between the type of event described by a verb and the availability of readings such as the above. A major dividing line in her system comes between verbs which lexically entail the attainment of a result state (syntactically encoded by the Res head in (4) above), and verbs which do not, and therefore lack Res. Verbs such as die are examples of the former, but typical accomplishments do not actually entail the attainment of a result state (this is why measuring-out effects are possible), and therefore pattern with activity predicates as verbs lacking Res. The interesting part of this story concerns the type of objects that Proc and Res can take as complements. Proc represents the ongoing, dynamic component of verbal meaning for Ramchand, while Res represents a lexically entailed result. As in (4), Proc can take Res as a complement. However, either head can also take another constituent as complement, denoting a ‘rhematic object’. And the semantic shape of those objects must match that of the head in question. If the rhematic object is denoted by a complement of Proc, it must denote something which is ‘extended’, in the same way in which Proc is extended. 
If it is denoted by a complement of Res, it must not be extended because Res is not extended. This, then, lies behind the difference between pound and break illustrated in (8). A [Proc] verb lacking Res, like pound in (8a), is incompatible with a PP headed by a preposition which describes a result without reference to a path, like in. On the other hand, a [Proc,Res]
verb, like break, can describe a result state with just such a pathless PP, as in (8b).² (8)
(a) Kayleigh pounded the metal *in/into pieces.
(b) Katherine broke the stick in/into pieces.
(Ramchand 2008: 75–6)
Although the mode of analysis employed by Ramchand here seems primarily semantic, concerned with the common scalar structure underlying different syntactic categories, there is a phrase-structural component too. Specifically, the analysis relies on the assumption that there is at most one complement of each head, so a rhematic complement of Proc and a ResP projection compete for the same phrase-structural slot, and both cannot be simultaneously present. This mirrors the binary branching and restriction to a single complement assumed in minimalist phrase structure. The successful general application of this impressive crosscategorial semantic structure relies on assumptions about a decompositional syntactic structure embodying that semantics. Indirectly, then, we arrive at an argument for a decompositional phrase structure accompanying a decompositional event semantics. The morphological evidence adduced for a syntactic decomposition of event structure generally takes the form of a set of correspondences between a ‘bigger’ event structure, a larger array of morphemes, and a larger amount of participants in the events. Implicit in the argumentation, but probably uncontroversial, is the assumption that, if the proposed semantic decompositions were never accompanied by a segmentation into separate lexical items, an antilexicalist argument could only proceed on conceptual grounds, as there could be no evidence against the position that the decomposition in question was purely lexical. However, once we see cases where semantic relations among subevents correspond to morphosyntactic relations among morphemes, words, and phrases, a more persuasive case can be made that the construction of event structure is simply another instance of the regular correspondence between morphosyntax and compositional semantics. 
2 It is less clear to me how Ramchand accounts for the acceptability of a path-denoting PP like into pieces with a [Proc,Res] verb like break.
Several descriptions have appeared of such correspondences between event structure, argument structure, and morphology, along lines initially suggested by Hale and Keyser (1993). To take one example from Svenonius (2004), the Russian lexical prefix pro- in (9b) licenses an otherwise unlicensed argument, and (to judge by the English translation) induces telicity in an otherwise atelic event description.
(9)
(a) Sobaka ležala (*odejalo).
    Dog lay blanket
    ‘The dog lay (*the blanket).’
(b) Sobaka proležala odejalo.
    Dog about-lay blanket
    ‘The dog wore out the blanket by lying on it.’
(Svenonius 2004: 216)
To be clear, the argument is not simply that morphologically more complex items are semantically more complex too. That would come as no great surprise to anyone. Rather, the claim is that, if we avail ourselves of basic constraints on phrase structure and a small, fixed inventory of event-related functional heads, we can lay the groundwork for an integrated account of why particular types of morphosyntactic alternation correspond to certain types of argument-structural and event-structural alternations, but not others. There are interesting, and suggestive, arguments for a syntactic representation of event structure, then, and the kind of correspondences between syntactic and semantic categories uncovered in arguments like the above are probably beyond the scope of the almost purely semantic theory of event structure elaborated in Part I. However, the empirical domain for which such syntax-heavy theories are designed is only a small part of the much more expansive notion of event structure defended in Part I. The event-denoting constituents derived by a first phase syntax correspond maximally to the core events described in Part I, and the next subsections will demonstrate that the theory cannot be expanded to the larger structural units proposed above, namely extended events and event groupings.
8.2.3 Can Syntactocentric Hypothesis 2 Cope with Extended Events?
This section will consider how we might expand a theory such as Ramchand’s to cover extended events. The most natural way to do this is to further expand the right-branching cascade in the syntax, and to weaken the direct causation requirement on V-VP complementation structures, so that any contingent relation between two events is admissible as the interpretation of such a structure.
This is a natural way to extend the post-Hale and Keyser theory, as complementation is the recursive structure-building operation par excellence in the syntax, and it is being used here to represent a similarly recursive structure-building operation in the semantics, namely extended event formation. Abstracting away from specific labels, then, we may expect to see an extended event as in (10a) correspond to a right-branching cascade as in (10b).
(10) (a) Action (e1), e2, ..., en−1, Goal (en)
     (b) [V1P Spec [ V1 [V2P Spec [ V2 ... [Vn−1P Spec [ Vn−1 VnP ]]]]]]
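The correspondence in (10) can be made concrete with a short Python sketch. It is illustrative only: the function names `cascade`, `contingent_pairs`, and `complement_pairs` are hypothetical, and the sketch assumes, as in (10b), that the lowest shell is given without internal structure.

```python
def cascade(n: int) -> str:
    """Labelled bracketing for a right-branching cascade of n VP shells."""
    if n < 1:
        raise ValueError("need at least one shell")

    def shell(i: int) -> str:
        if i == n:
            # Lowest shell: internal structure is not shown, as in (10b).
            return f"[V{i}P]"
        return f"[V{i}P Spec [V{i} {shell(i + 1)}]]"

    return shell(1)


def contingent_pairs(n: int):
    """Adjacent pairs in an extended event e1 ... en, as in (10a)."""
    return [(f"e{i}", f"e{i + 1}") for i in range(1, n)]


def complement_pairs(n: int):
    """Head-complement pairs in the corresponding cascade, as in (10b)."""
    return [(f"V{i}", f"V{i + 1}P") for i in range(1, n)]


print(cascade(3))
print(list(zip(contingent_pairs(3), complement_pairs(3))))
```

Because `cascade` is defined for any n of at least one, the sketch also shows why the number of shells is not bounded once extended event formation is allowed to recur.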
Once more, head–complement relations in phrase structure correspond to contingent relations in the event structure. However, given the possibility of recursive extended event formation, the number of VP shells is no longer limited to two (Kratzer 1996, Hale and Keyser 1993) or three (Ramchand), but is in principle infinite. However neat this generalization may look, it runs into empirical problems almost immediately. The major problem faced by this generalization of the post-Hale and Keyser approach to extended events comes from the fact that it predicts that elements related to initial subevents always c-command elements related to the goal with which that initial subevent was performed. We cannot immediately say that such a structure is incorrect. In fact, it has frequently been proposed for verbal adjuncts in general, for example by Larson (1988), and it is widely accepted that resultative secondary predicates, for example, appear in just such a low position. However, it runs into empirical problems in this particular case. We can see this by comparing such a structure to a more orthodox proposal for the phrase structure of one of the structures which motivated our initial discussion of extended events in Chapter 3, the rationale clause. The phrase structures for rationale clauses in Faraci 1974 or Jones 1991, for example, would place the adjunct somewhere above, or possibly adjoined to, VP, as in (11a). However, as the rationale clause specifies a goal of the matrix event, the theory
under discussion would have to place it at the bottom of a right-branching cascade, as in (11b). (11)
(a) [TP Subj [ T [VP [VP V Obj ] IOC ]]]
(b) [TP Subj [ T [V1P V1 ... [VmP Obj [ Vm ... IOC ]]]]]
To decide which of the two structures in (11), if either, fits the empirical facts, we turn to the usual diagnostics for c-command. In the structure in (11a), the subject c-commands into the rationale clause, but the object doesn’t. On the other hand, in (11b), the adjunct is in a very low position at the bottom of the cascade, and so everything c-commands into it. This means that a traditional c-command diagnostic such as condition A, which states that a reflexive anaphor must be bound by a locally c-commanding DP, should tease the two theories apart. (12a) shows that the subject is indeed capable of binding a reflexive anaphor, as predicted by both structures. However, (12b) shows that the same is not true for the object. This suggests that the object does not c-command the anaphor, and so does not c-command the rationale clause, as in the orthodox structure, but unlike the Larsonian structure developed above on the basis of Syntactocentric Hypothesis 2. (12)
(a) John hugged Mary [in order to make himself happier].
(b) *John hugged Mary [in order to make herself happier].
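The differing predictions of the two structures in (11) can be checked mechanically. The Python sketch below encodes both trees and tests c-command into the rationale clause (IOC). It rests on several assumptions made only for illustration: the intermediate labels `T'`, `Vm'`, `VP-upper`, and `VP-lower` are added to keep node names unique, the elided material in (11b) is collapsed, the rationale clause is treated as a single node, and c-command is implemented in a simplified form suited to binary-branching trees (a node c-commands its sister and everything the sister dominates).

```python
def leaf(label):
    return (label, [])


def paths(tree, prefix=()):
    """Yield (path, label) for every node; a path is a tuple of child indices."""
    label, children = tree
    yield prefix, label
    for i, child in enumerate(children):
        yield from paths(child, prefix + (i,))


def find(tree, label):
    matches = [p for p, l in paths(tree) if l == label]
    if len(matches) != 1:
        raise ValueError(f"label {label!r} must occur exactly once")
    return matches[0]


def c_commands(tree, a, b):
    """True if b is, or is dominated by, a sister of a."""
    pa, pb = find(tree, a), find(tree, b)
    if not pa:                       # the root has no sister
        return False
    parent = pa[:-1]
    inside_parent = len(pb) > len(parent) and pb[:len(parent)] == parent
    inside_a = pb[:len(pa)] == pa    # b is a itself or is dominated by a
    return inside_parent and not inside_a


# (11a): rationale clause adjoined to VP.
orthodox = ("TP", [leaf("Subj"), ("T'", [leaf("T"),
            ("VP-upper", [("VP-lower", [leaf("V"), leaf("Obj")]),
                          leaf("IOC")])])])

# (11b): rationale clause at the bottom of the cascade.
larsonian = ("TP", [leaf("Subj"), ("T'", [leaf("T"),
             ("V1P", [leaf("V1"), ("VmP", [leaf("Obj"),
              ("Vm'", [leaf("Vm"), leaf("IOC")])])])])])

for name, tree in [("(11a)", orthodox), ("(11b)", larsonian)]:
    print(name,
          "Subj c-commands IOC:", c_commands(tree, "Subj", "IOC"),
          "| Obj c-commands IOC:", c_commands(tree, "Obj", "IOC"))
```

Only the cascade structure lets the object c-command the rationale clause, which is the prediction that the contrast in (12) speaks against.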
Consideration of the pro subject within the rationale clause changes the analysis somewhat, but does not render it much less problematic. We would need to derive the generalization that pro inside a rationale clause is always controlled by the subject. In a structure like (11a), this is straightforward, as the subject is the only c-commanding argument, and so the only possible antecedent choice. In a Larsonian structure like (11b), however, all arguments c-command pro, and the lower arguments in the tree c-command it more minimally than the higher arguments. Any condition like the Minimal Distance Principle of Rosenbaum (1967) should then predict that the matrix object, not the matrix subject, is the default controller of pro in a rationale clause. Granted, other binding-theoretic tests for c-command produce conflicting results, as pointed out by a reviewer. In particular, patterns of bound variable anaphora appear to indicate that the object is in an appropriate position to antecede a pronoun within the rationale clause.
(13) John hugs every student_i [in order to annoy her_i parents]
However, it is well known that the locality constraints on variable binding, whatever they may be, do not require strict c-command, as possessors can bind out of noun phrases (14a) and NPs can bind out of PPs (14b). (14)
(a) [Every boy_i’s mother] loves him_i.
(b) I talked [to [every boy]_i] about his_i future.
In contrast, the obligatory control and reflexive binding relations in (12) really do diagnose strict c-command. Although the results are somewhat more equivocal than we might like, then, the diagnostics which most reliably diagnose c-command suggest that the object fails to c-command into the rationale clause here. In other words, if we are to keep a structure like (11b), we have to explain a systematic violation of the Minimal Distance Principle. One possibility would be to say that, for some reason independent of the Minimal Distance Principle, only subjects can control into adjuncts. This cannot be the whole story, though, given that in double object constructions, other varieties of purpose clauses do not show obligatory control by the subject.³ (15)
(a) John_i gave a dummy_j to his baby_k [Op_{*i/j/*k} pro_{*i/*j/k} to suck on t_Op]
(b) John_i gave his baby_j it_k [Op_{*i/*j/k} pro_{*i/j/*k} to suck on t_Op]
3 Thanks to Ally Beaven for bringing this to my attention.
In ditransitives with rationale clauses, on the other hand, the subject orientation remains.
(16)
(a) John_i gave a dummy_j to his baby_k [pro_{i/*j/*k} to get some peace and quiet]
(b) John_i gave his baby_j a dummy_k [pro_{i/*j/*k} to get some peace and quiet]
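The tension between the Minimal Distance Principle and the control facts in (16) can be stated as a small Python sketch. It is a deliberately crude rendering under stated assumptions: the principle is reduced to "the closest c-commanding argument controls", the function name `mdp_controller` is hypothetical, and the lists of c-commanding arguments for each structure are stipulated by hand rather than computed from a tree.

```python
from typing import Optional, Sequence


def mdp_controller(c_commanders_high_to_low: Sequence[str]) -> Optional[str]:
    """Return the closest (lowest) c-commanding argument, or None if there is none."""
    if not c_commanders_high_to_low:
        return None
    return c_commanders_high_to_low[-1]


# (16b) John gave his baby a dummy [pro to get some peace and quiet]
observed = "John"

# Rationale clause attached above the internal arguments, as in (11a):
# only the subject c-commands pro.
high_attachment = ["John"]

# Rationale clause at the bottom of a cascade, as in (11b):
# every argument c-commands pro (stipulated here for illustration).
low_cascade = ["John", "his baby", "a dummy"]

for name, args in [("high attachment", high_attachment),
                   ("low cascade", low_cascade)]:
    predicted = mdp_controller(args)
    print(f"{name}: predicted {predicted!r}, matches observed: {predicted == observed}")
```

On these assumptions, only the high attachment site yields the observed subject orientation without further stipulation.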
Something has to be said to explain why the rationale clauses are systematically subject-oriented in double object constructions, while this is not true even of other classes of purpose clause. The natural story to tell about this, based on ideas going back to Faraci (1974), relates this control difference to syntactic height of attachment: rationale clauses attach above either internal argument, whereas purpose clauses do not. This natural syntactic story is, however, incompatible with Syntactocentric Hypothesis 2: at least some adjuncts specifying an agent’s goal appear in syntactic structures like (11a) rather than (11b), which means that not all enablement relations among events can correspond to head–complement relations in the phrase structure. Given that it gets the c-command facts, and so the constituency facts, wrong, then, the post-Hale and Keyser implementation of Syntactocentric Hypothesis 2 cannot be used as a model of the full event structure described in Part I. This does not preclude using it as a model of the syntactic representation of core events, though, as the above evidence only weighs against using it for extended events, and so, a fortiori, for event groupings. In fact, although I do not have space for anything more than some very compressed remarks on this topic, it is interesting to note that the evidence for Ramchand’s syntactic decomposition is weakest where the treatment of agents and initiators, which license extended event formation in the terms of Part I, is concerned. On the one hand, initiation and agentivity do not fit into the crosscategorial similarities between paths, scales, and processes that Ramchand uncovers: there is no PP or AP equivalent of agentivity. Moreover, relatively little of the morphosyntactic evidence Ramchand amasses in favour of her proposal concerns initiators or agents (the major exception is a discussion of two Hindi causatives in her Chapter 6, discussion of which will have to wait for another occasion). 
Further evidence that something is awry in Ramchand’s treatment of initiators comes from consideration of event-denoting subjects. We will see that (17a) and (17b) have a semantic similarity which is not reflected in the phrase structure. (17)
(a) John deafened me. (b) John whistling deafened me.
There is a basic syntax–semantics mismatch in (17). Although the subject in (17a) appears to denote an individual, definitions of the decompositional
structure of accomplishments following Dowty (1979) treat verbs like deafen as denoting a relation among events, such that an event involving John caused me to become deaf. The apparently individual-denoting subject must stand in for an event. In (17b), the nature of that causing event is made explicit. We could attempt to remove the syntax–semantics mismatch in (17) by postulating a null event description as complement of John in (17a), as in (18a), thereby restoring a syntactic similarity to match their semantic similarity. However, (19) shows that there is no null event description in the syntax: John c-commands himself in (19a), but not in (19b). (18)
(a) [[John pro-V] [deafened me]]
(b) [[John whistling] [deafened me]]
(19)
(a) [TP John [VP deafened himself]]
(b) *[TP [John whistling hornpipes] [VP deafened himself]]
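The c-command contrast in (19) can be checked mechanically. The following is a minimal Python sketch rather than part of the original analysis: the nested-tuple encoding of the trees, the label "Subj", the internal structure assumed for John whistling hornpipes, and the function names are all illustrative assumptions. C-command is taken in its standard form: a node c-commands its sisters and everything they dominate.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are strings.
# The label "Subj" and the subject-internal VP are hypothetical choices.

def dominates(node, target):
    """True if the leaf `target` occurs at or below `node`."""
    if isinstance(node, str):
        return node == target
    return any(dominates(child, target) for child in node[1:])

def c_commands(tree, a, b):
    """True if leaf `a` has a sister that dominates leaf `b`."""
    if isinstance(tree, str):
        return False
    children = tree[1:]
    for i, child in enumerate(children):
        if child == a:
            sisters = children[:i] + children[i + 1:]
            if any(dominates(s, b) for s in sisters):
                return True
    # Otherwise keep looking for `a` deeper in the tree.
    return any(c_commands(child, a, b) for child in children)

# (19a): [TP John [VP deafened himself]]
tree_19a = ("TP", "John", ("VP", "deafened", "himself"))

# (19b): [TP [John whistling hornpipes] [VP deafened himself]]
tree_19b = ("TP",
            ("Subj", "John", ("VP", "whistling", "hornpipes")),
            ("VP", "deafened", "himself"))

print(c_commands(tree_19a, "John", "himself"))  # True
print(c_commands(tree_19b, "John", "himself"))  # False
```

Under these assumptions, John is a sister of the matrix VP only in (19a); in (19b) it is embedded inside the event-denoting subject, so its only sister is the subject-internal VP.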
The natural account of these facts is that, when necessary, a constituent such as John can denote an event, as a form of coercion. If a predicate such as deafen requires an event-denoting subject for semantic coherence, then whatever constituent occupies the subject position must be interpreted as an event. However, if this move is admitted, it is hard to see what the semantic content is of the ‘generalized subject-of’ relation implicit in Hale and Keyser (1993) and made explicit by Ramchand. An event makes a natural subject of an accomplishment predicate like deafen, but not of an activity like walk, say, despite the fact that both are represented as InitP + ProcP in Ramchand’s system. In fact, individual-denoting specifiers are the norm across every projection discussed by Ramchand except for InitP in accomplishments. The subject of an accomplishment predicate is very much the odd one out in this respect. Of course, there is still no formal content attached to Ramchand’s notion of ‘subject-of’, so it is hard to say that these considerations falsify her claim that there is a generalized ‘subject-of’ notion evident across all three of her eventive projections, but they do suggest some grounds for concern. In general, Ramchand’s syntax and semantics is not set up to handle the fact that an agent’s cognitive involvement in a series of happenings which he plans can lead to the delimitation of that series of happenings as an extended event, as documented in Chapter 3. Ramchand describes the basic empirical fact correctly: agents do not behave semantically (or syntactically) like other
initiators. However, her representation of this, where an agent is not just an initiator (in [Spec,Init]) but also an undergoer (in [Spec,Proc]), leads to a recurring problem, whereby a structure has too many undergoers, and the criteria for choosing which is the syntactic undergoer are not given. We also arrive at something other than the simplest compositional system, where the complex θ-role initiator–undergoer bears nothing more than the combined entailments of the simplex θ-roles initiator and undergoer, because cognitive involvement in an event is not sufficient to license just anyone as an undergoer: only the agent is privileged in this respect. Although I cannot explore this avenue in any detail here, then, this discussion suggests a possible division of labour with respect to syntactic and semantic representation of event structure. Ramchand’s decomposition of events into a syntactically represented process and result is well supported by restrictions on the cooccurrence of such semantic units across syntactic categories, and by morphosyntactic evidence that process and result can be described by separate lexical items. However, there is little evidence of either type for syntactic representation of the initiation component. Moreover, syntactic representation of initiators in general, and agents in particular, leads to a range of problems. Perhaps, then, we might investigate the possibility of a common syntactic representation of processes, paths, and scales, and their bounds, while keeping the representation of agentivity strictly separate. At present, this is the most promising route I can see for approaching an empirically satisfying syntactic decomposition. However, such a project is very much for the future. For now, the pertinent conclusion is that the effect of agentivity cannot be adequately represented along the lines of Syntactocentric Hypothesis 2, and this is sufficient grounds for not syntacticizing the Single Event Grouping Condition in this way.
8.2.4 Predicate-Argument Groups as Constituents: The Unlikely Antilocality Puzzle Revisited
The previous subsection argued that a model of event structure which is syntactocentric in that it assumes a direct mapping between complementation relations in the syntax and contingent relations in the event semantics can only work for a subset of the full event structure presented in Part I, although it can provide a useful representation of core event structures. This subsection will explore the possibilities of a further form of syntactocentrism, as represented in the revised Syntactocentric Hypothesis 1, repeated below. (20)
Syntactocentric Hypothesis 1 (revised): Predicate-argument groups are constituents at a given level of representation.
In this section as a whole, I hope to show that the Single Event Grouping Condition must be cashed out in genuinely semantic terms, and cannot be a semantic reflection of a syntactic generalization. Syntactocentric Hypothesis 1 is interesting in this respect, because relating event-structural predicate-argument groups to syntactic constituents gives rise to a natural way of attempting to implement the Single Event Grouping Condition syntactically, and it is important to be clear about where this fails. What I have in mind is that treating predicate-argument groups as directly derived from particular constituents raises the possibility that certain such constituents may be islands, whereas others may fail to be. For concreteness, let us assume with Uriagereka (1999) and subsequent work that in a configuration such as (21), where two sisters are both phrasal, the nonprojecting sister (Y in (21)) is generally an island for extraction. (21)
[X X Y]
Moreover, assume that the islandhood of Y in (21) can be voided in certain circumstances, when some further syntactic relation (call it the pseudoargument relation) holds between (the head of) X and Y. Such a principle would join a long history of similar conditions. For example, Huang (1982) allows extraction only from properly governed constituents, while Chomsky (1986) stipulates that L-marking (θ-government by a lexical category) removes barrierhood and allows movement over greater distances than would otherwise be possible. There is a recurring theme, then, of some additional syntactic relation removing a barrier to movement. This appears to be a natural syntactocentric approach to the adjunct subextraction data described in Chapter 6: adjuncts are naturally barriers, but this can be voided in appropriate circumstances by forming a pseudoargument relation with (the head of) their sister. However, there is little to be gained by syntacticizing the present theory in this way. I continue to take it for granted that the patterns of extraction described in Chapter 6 have their roots in event semantics. All else being equal, then, the simplest theory of this aspect of locality makes reference exclusively to semantic factors, and the burden of proof therefore lies with any proponents of a syntactocentric version of the theory. Mimicking these factors by means of syntactic features and relations is a theoretical possibility, but we would expect
the resulting syntactic theory to bear the hallmarks of syntactic phenomena. This, however, is not what we find. Moreover, the Unlikely Antilocality Puzzle described in the introduction rears its ugly head again here. This puzzle was concerned with the contrast in acceptability between sentences such as the following. (22)
(a) ??What did John drive Mary crazy [fixing __ ]?
(b) What did John drive Mary crazy [trying [to fix __ ]]?
We concluded in the introduction that this contrast could not plausibly be explained by an antilocality condition, in the sense of Abels (2003) and Grohmann (2003), except by doing violence to the elegance of the notion of antilocality as it stands. Moreover, the Single Event Grouping Condition can straightforwardly account for the contrast: fix describes an accomplishment, whereas try to fix describes an atelic activity, as shown by the following tests, introduced in Chapter 2. Both fix and try to fix are acceptable in the progressive, but fix takes in-PPs, while try to fix takes for-PPs. (23)
(a) John is fixing the car. (b) John is trying to fix the car.
(24)
(a) John fixed the car in two hours.
(b) #John fixed the car for two hours.
(c) #John [tried to fix the car] in two hours.⁴
(d) John tried to fix the car for two hours.
We saw in Section 6.3 that BPPAs must describe activities for extraction to be possible. This is instantly congruent with (22b), but (22a) requires coercion for extraction to be possible, which leads to a degradation in acceptability. The question we must ask here is, how could the locality theory just sketched, based on the syntactic, barrierhood-voiding pseudoargument relation, account for the contrast? The VP fix must be a barrier, given the unacceptability of (22a).⁵ Something must void the barrierhood of [VP fix __ ] in (22b), then. The natural candidate is the VP’s sister, to. Let’s assume that to can somehow form a relation with its sister which voids its barrierhood. Then the contrast between (22b) and an example like (25), repeated below, would suggest that the barrierhood sometimes re-emerges higher within such an adjunct. According to the locality theory sketched above, both adjuncts are in the position of Y in (21), and so will be barriers unless their sister VP, or its V head, enters into a pseudoargument relation with the adjunct. We would have to claim that this happens in the case of (22b), but not (25). (25)
*What did John drive Mary crazy [beginning [to fix __ ]]?
⁴ The brackets are intended to keep the reader away from the other, grammatical, reading, where in two hours modifies the embedded, accomplishment-denoting, clause.
⁵ It may be objected that (22a) only shows the barrierhood of fixing, not fix. As -ing clearly does not form a barrier in cases such as (22b), though, it is natural to suspect that the reason for the degradation of (22a) is due to fix itself and not fixing.
It must be possible to make some story along these lines work, if not by relying on properties of overt lexical items such as to, then by stipulating the distribution of null lexical items. So there is no descriptive problem here. However, we must be aware that this analysis does something quite distinct from the approaches to locality considered above. Specifically, this approach would make to a type of element which I will call a facilitator, the opposite of an intervener. Whereas the presence of an intervener on a movement path makes that movement harder, the presence of to makes movement easier. In contrast, since Ross (1967), we have assumed that, in the vast majority of cases, extra syntactic material intervening between the head and the foot of the chain can only make extraction harder, never easier. A theory based on such a facilitator would therefore be quite out on a limb. Within current versions of locality theory, there is only one class of elements which behave like facilitators, pointed out to me by David Adger and an anonymous reviewer. These are the Edge Features of Chomsky (2008), and the heads that bear them. We could, then, pursue a syntactocentric account of the Unlikely Antilocality Puzzle by stipulating the distribution of Edge Features in such a way that it reflects the pattern of extractions described above. However, it must be clear that this is a syntactocentric account of the Unlikely Antilocality Puzzle in only a very trivial way. Edge Features are a technology for making things move. A theory including Edge Features is a theory of how things move and where things move to, but it is not yet a theory of locality. To make Edge Features into a theory of locality, we need a theory of the distribution of Edge Features, and no such theory has yet been proposed. 
In other words, a syntactocentrist wanting to use Edge Features to explain the Unlikely Antilocality Puzzle would still have to base the requisite description of the distribution of Edge Features on something, and no obvious syntactic basis for that something springs to mind. Edge Features, then, could be marshalled to help explain the fact that movement happens, but the specific pattern of movements (and so the specific distribution of Edge Features) that we see here is not explained by use of Edge Features. The pattern of when movement is and isn’t possible would still have to be explained in other terms, then. Certainly, there is nothing in the syntax of try and begin which suggests that one, but not the other, should enter into pseudoargument relations across adjunct boundaries. However, there is
a semantic, event-structural distinction between the two. Specifically, try produces unbounded activities regardless of the aspectual class of its complement TP. On the other hand, begin is bounded. It clearly makes reference to an inception, but it is also telic, as shown by Smith (1997). With respect to the for/in test, it patterns most like the class of points in disallowing either class of PPs, as shown below.⁶ (26)
(a) *John hiccupped for five minutes. (b) *John hiccupped in five minutes.
(27)
(a) *John [began to fix the TV] for five minutes. (b) *John [began to fix the TV] in five minutes.
⁶ Begin does readily allow progressives, unlike points, however. This might be less of a contrast than is apparent, though. Classical points, such as hiccup, are most readily conceived of as atomic actions. However, if watched in slow motion on a video, it is quite possible to say John is hiccupping during a drawn-out hiccupping event. Beginnings, in contrast, are always drawn-out events.
Regardless of whether begin forms a natural aspectual class with points, however, it clearly does not form a natural class with try in this respect, as we saw above that try allows modification by for-PPs. And in the light of the Single Event Grouping Condition, it is natural to tie the difference in acceptability between (22b) and (25) to this aspectual distinction. Once again, then, if we were to try to account for these data in terms of the distribution of pseudoargument relations, this syntactic mode of explanation would be mimicking a primarily semantic fact: syntactic pseudoargument relations would be postulated wherever semantic macroevent formation is found. This is a formal possibility of course, but at present such syntacticization lacks independent motivation. It is natural to conclude that the only reason for adopting a syntactocentric approach to these data would be to preserve a particular notion of modularity, whereby locality is syntax’s domain by definition. However, two objections can be raised against this conception of modularity. The first is that, if such a large chunk of the semantics could be foreshadowed in the syntax without any independent motivation, then strong hypotheses about the modular architecture of the grammar would lose much of their empirical bite. Secondly, it is a mistake to think that there is any conflict between the theory based on the Single Event Grouping Condition and an architectural model of the language faculty based on the Y-model. As was made clear in Chapter 5, we have arrived at a model of the extraction data presented here where what is at issue is the interpretation of a wh-question rather than locality in the traditional sense. The syntax, considered in isolation, overgenerates, producing many degraded questions, as well as many legitimate ones. However, as well as the well-studied constraints on their syntactic shape, the claim here is that there is a significant constraint on their semantic shape, as embodied in the Single Event Grouping Condition. Degraded questions such as (22a) or (25) are not so much ungrammatical, as semantically ill-formed, and so only interpretable with significant difficulty. As mentioned in Chapter 5, a question such as Which book did John collapse after reading? has a status which is similar in certain respects to that of Colourless green ideas sleep furiously: both are grammatical, but in neither case is the interpretation straightforward. Of course, there are differences between the two cases in how we confront this semantic anomaly, but the differences are smaller than one might think. Colourless green ideas sleep furiously leaves us looking for ways to use the lexical items in question in extended, metaphoric, or metonymic ways to arrive at a coherent proposition as governed by the compositional semantics. Similarly, in the case of Which book did John collapse after reading?, the compositional semantics, plus the Single Event Grouping Condition, force us to use an enriched, not purely temporal, interpretation of after. The factors determining our choices of lexical interpretations are more narrowly structural in the latter case, but the process is essentially the same. If this is accurate, then a syntacticization of the Single Event Grouping Condition may be misguided in the first place. Summing up, the model of locality based on a syntactic ‘pseudoargument’ relation is theoretically possible, but the unmotivated transplantation of obviously semantic notions into the syntax must raise eyebrows, and lead to a charge that the model is being complicated rather than simplified overall.
Moreover, there are no broad architectural reasons to follow such a path, and we would run into empirical difficulties in accounting for data such as the Unlikely Antilocality Puzzle. We have investigated two ways of incorporating an apparently semantic constraint into our syntax, based on the idea that, to syntacticize a constraint like the Single Event Grouping Condition, either phrase-structural units must correspond to event-structural units or phrase-structural relations must correspond to event-structural relations. With some such correspondence in place, we would be able to syntacticize the Single Event Grouping Condition. However, enablement relations in extended events do not correspond to the obvious candidate for a phrase-structural counterpart, namely asymmetric c-command, and so at least one contingent relation lacks a direct phrase-structural analogue, although there is some evidence in favour of a syntactic instantiation of direct causation in core events in terms of asymmetric c-command.
It may be possible, on the other hand, to maintain some form of correspondence between constituents, as phrase-structural units, and predicate–argument groups, as event-structural units, through judicious use of coercion. However, it is not clear that this is sufficient to allow a successful syntacticization of the Single Event Grouping Condition. After all, the point of coercion is that it introduces some significant structural distinction between the semantic representation straightforwardly derived from the syntactic structure and the ultimate semantic representation. If the Single Event Grouping Condition is best stated over post-coercion semantic representations, it certainly doesn’t go without saying that a pre-coercion analogue can be found. I will not claim to have proved that the Single Event Grouping Condition must be semantic: there are too many choice points, and too much is contingent on the specifics of our syntactic and semantic theories to allow such a sweeping statement. I consider the foregoing to be good grounds for scepticism, however. I will therefore continue to assume that the Single Event Grouping Condition is semantic and leave it to anyone who disagrees to prove me wrong.
8.3 Integrating Syntactic and Semantic Constraints on Movement
Now that I have made my argument for the basically semantic nature of the Single Event Grouping Condition, the question is what I intend to do about the existence of this semantic constraint restricting the distribution of a syntactic operation. There is a wide consensus that such constraints, in their simplest form, must not exist: there should be no Single Event Grouping Condition for the same reason that there is no Single Nasal Condition: movement, as a syntactic relation, is the wrong type of thing to care about phonological features like nasality or semantic properties like eventhood. This consensus is not universal. It is not shared by many researchers working within cognitive grammar, for example, where semantics and syntax collapse together to a substantial extent, to the exclusion of phonology. The Single Event Grouping Condition in isolation could be taken as an argument for a cognitive grammar approach. However, I have stressed that we must not take the Single Event Grouping Condition in isolation. It only has any hope of empirical adequacy if it is paired with a set of narrowly syntactic constraints on movement. That is why the title of this book is Events, Phrases, and Questions: it is crucial to the success of this enterprise that we have a purely syntactic representation, and a purely semantic representation which can differ in nontrivial, but constrained, ways, and that we can state constraints over either representation. This means that we should expect to keep our barriers between modules intact to avoid
the empirical catastrophe of a theory which permits Single Nasal Conditions and the like. However, challenges to this sort of medium-sized modularity (as opposed to the existence of coarse-grained modules like a language faculty or fine-grained modules like the bounding, binding, Case, and other modules of Chomsky 1981) come in a variety of shapes and sizes. Concentrating simply on restrictions on wh-movement, we know that the acceptability of movement is constrained by at least pragmatic notions (Morgan 1975), discourse-based questions of perspective and focus (Kuno 1987), and processing concerns related to memory limitations and online parsing preferences (Miller and Chomsky 1963, Gibson 1998). There have also been arguments for restrictions on movement relating to linear order (Culicover 1993), and formal semantic considerations relating to scope and the algebraic structure of different types of noun phrase denotation (Szabolcsi and Zwarts 1993), and the syntactic distribution of predication relations (Kluender 1992). I do not intend to show what every item on this (no doubt partial) list does. Rather, I want to make two points. The first is that we have been committed to a more or less pluralistic theory of restrictions on the operation of wh-movement for a long time. In some frameworks (in particular that of Culicover and Jackendoff 2005), it is not even accurate to say that wh-movement, and the constraints on it, is primarily syntactic. If even half of the above claims are accurate, we should not be surprised that event structure affects the distribution of wh-movement as well. The second point is more substantial. In many of the above cases, the discovery of a nonsyntactic constraint on a syntactic operation has been accompanied by an explicit or implicit account of why we find precisely the apparent violation of modularity in question. So what do we do about the challenge to medium-sized modularity posed by the Single Event Grouping Condition?
One option (which I adopted, more or less for want of anything better to say, in my dissertation, Truswell 2007b) would be to claim that effects related to this condition had a source, not in the movement itself, but in the semantics of the C⁰ head: the syntax doesn’t actually care about how many events are described, but movement to [Spec,C] requires the presence of a certain variety of C⁰, and that C⁰ presupposes that its complement describes only a single event. That is a theoretical possibility, but it lacks any independent evidence, as far as I can tell, and so doesn’t have particularly good prospects for elevation from a description to an explanation. Instead, I want to explore an alternative here, based on an inaccuracy in the present formulation of the Single Event Grouping Condition that I have ignored until now.
The problem is this: the present formulation makes reference to ‘the minimal constituent containing the head and the foot of the chain’ formed by movement. In practice, the head of the chain is always in [Spec,C], so the minimal constituent in question is always CP. And so, according to the present formulation of the Single Event Grouping Condition, no constituent within the CP targeted by movement should be able to describe an event independent of that single event in question. That is clearly too restrictive. For example, adding an additional rightward adverbial, even a finite one, in a position c-commanding the foot of the movement chain, generally makes no difference to the acceptability of movement. This is as true in the case of extraction out of an adjunct as it is in the regular case of extraction from complement position. (28)
(a) [CP What did John [eat __ ] [when he got home]]?
(b) [CP What did John [drive Mary crazy whistling __ ] [when he got home]]?
In each of these cases, the minimal constituent containing the head and the foot of the chain is CP, which also contains the adjunct when he got home, describing a second event. The Single Event Grouping Condition, as it stands, should rule these out as ungrammatical, but the sentences are absolutely fine. There is a purely syntactic way of addressing this problem, adopting and modifying ideas from Lebeaux (1988), Chomsky (1993), Bošković and Lasnik (1999), Fox and Nissenbaum (1999), Stepanov (2001), Takahashi (2006), and Takahashi and Hulsey (2009), concerning late merger of syntactic constituents. Although each of these researchers has made different use of the same leading idea, the basic principle is that a limited amount of countercyclicity is allowed in the derivation. In particular, in the present case, we could assume that movement of what takes place prior to merge of when he got home. At the point in the derivation at which wh-movement takes place, then, the offending adjunct containing a second event description is not present. If we allow the Single Event Grouping Condition to be checked prior to merge of that adjunct, we predict the grammaticality of (28), on the basis of some very specific claims about the timing of the interaction of syntactic and semantic structures. Although that works descriptively, I do not want to follow this route here. Partly, this is because it appears increasingly likely that late merger, if it is useful for theories of anything, is not closely tied to an adjunct/complement distinction. In particular, the original use of countercyclic merger by Lebeaux and Chomsky aimed to capture an apparent adjunct–complement distinction in A′-reconstruction for Condition C, but it has been shown by Kuno
(1997) and Lasnik (1998) that the class of constituents which reconstruct for Condition C, and the class which doesn’t, cannot be mapped onto the adjunct/complement distinction. Similarly, Stepanov’s attempt to construct a post-CED theory on the basis of an adjunct-only form of late merger is falsified by the existence of cases of extraction out of adjuncts discussed in this book. On the other hand, the theory of Takahashi, and Takahashi and Hulsey, expands late merger far beyond the adjunct/complement distinction, and so is not directly transferable to effects like the one at issue here. As in Bošković and Lasnik, and Fox and Nissenbaum, the function of late merger in these more recent papers is quite distant from the use of countercyclic merger sketched above. Although a purely syntactic theory of the grammaticality of (28) is possible in principle, then, it would look somewhat discordant in the context of the above developments. Moreover, I believe that there is an alternative that offers greater promise for simultaneously capturing the exception demonstrated in (28), and putting the Single Event Grouping Condition on a more principled footing. The alternative I have in mind is based on the Dependency Locality Theory, described in Gibson (1998) and particularly Gibson (2000). According to that theory, processing of filler–gap dependencies is costly because of the effort required to keep a filler in memory while processing other linguistic material, and the effort required to integrate fillers structurally when a gap site is found. The further the filler site is from the gap site, the more severe the processing cost. Most importantly for us, the distance covered by the filler–gap dependency is measured in discourse referents. In Gibson’s original (1998) work in this area, the only type of discourse referent considered is the individual.
The original formulation of the Dependency Locality Theory therefore predicted a correlation between the number of individual-denoting constituents crossed by the filler–gap dependency, and the acceptability of that dependency. Even in that original form, Gibson’s theory was strikingly accurate in predicting a number of effects across a wide variety of configurations. However, it was never clear why individuals should be singled out from other discourse referents when it comes to calculating the processing costs associated with movement. This was addressed in Gibson (2000), where events were explicitly considered alongside individuals as discourse referents. This gives us a link to the concerns of the present work. The Single Event Grouping Condition, like the version of the Dependency Locality Theory in Gibson (2000), punishes dependencies across more than one event denotation. Indeed, more generally, a processing theory of the effects of extraction from adjuncts is attractive for many reasons.
Firstly, it is now widely accepted that syntactic structure and semantic interpretation are both processed incrementally in tandem. The kind of admixture of syntactic and semantic concerns represented by the Single Event Grouping Condition fits nicely with such a holistic view of incremental processing, as opposed to the modular encapsulation espoused in competence theories of syntax and semantics. Secondly, we have seen a degree of gradience in the acceptability of extraction from adjuncts. Processing theories are one type of explanation available to us for such gradient patterns. In the presentation in the rest of Part II, I have talked almost exclusively about one source of gradience in extraction from adjuncts, namely the gradient availability of a single-event reading. However, even extraction from adjuncts on a multiple-event reading is not as tooth-gnashingly awful as other types of locality violation, such as extraction of a conjunct from a coordinate structure. (29)
(a) *Who did you get upset [after you talked to __ ]? >
(b) **Who did you talk to [Mary and __ ]?
As mentioned in Chapter 7, cases of relativization, in particular, out of finite adjuncts are attested, and appear to be marginally acceptable for some speakers. The type of adjunct also appears to make a difference: extraction out of finite if-clauses is occasionally found to be only moderately degraded. Why we should find exactly these patterns is beyond me. However, this does suggest that our eventual theory should not instantly rule out any violations of the Single Event Grouping Condition. A processing theory may be capable of making sufficiently fine-grained distinctions to account for the fact that extraction in violation of the Single Event Grouping Condition is more degraded than extraction which respects the condition, without forcing us to consider the extractions which violate the condition as fully ungrammatical. In the theory of Gibson (2000), processing costs are associated with the introduction of a new discourse referent and with the integration of a filler at a gap site. The latter type of cost is where considerations of locality come in: it is assumed that integration is more costly when the filler is more remote from the gap, and that this remoteness is measured in terms of the number of discourse referents introduced by the intervening material. An extension of Gibson’s theory to the concerns of the Single Event Grouping Condition must therefore begin by deciding what counts as a discourse referent. Part I helps here by giving us a basis for deciding what counts as a single event. The basic effects of that theory of event structure should be familiar by now. Driving Mary crazy whistling something is a single event; arriving whistling something is two events, but a single event grouping; and working
whistling something is two events and two event groupings. Reading a book before building something is a single event if the reading is contingently related to the building, but two events (and two event groupings) if not. And so on. Instantly, we run into many questions of implementation. For example, we must ensure that two event-describing constituents jointly coming to describe one single event leaves us with one discourse referent rather than three (the macroevent and two subevents). If our theory predicts that driving Mary crazy whistling introduces more events than working whistling does, then our processing theory is scuppered from the start. Moreover, nothing tells us a priori how to deal with event groupings, as opposed to simple events. Such considerations force me into a series of stipulations to get this processing theory off the ground. Of course, any future development of this theory into more than just a sketch will require close scrutiny of the viability of these stipulations. Firstly, I will assume that event groupings, rather than events, are the discourse referents which count for the purposes of the Dependency Locality Theory. This allows a parallel treatment of arrive whistling and drive Mary crazy whistling. Secondly, I will assume that the matrix verb in a single-event description contributes the event grouping description, while any following event descriptions act as modifiers of that main event grouping description. In a single-event reading of arrive whistling, then, arrive contributes an event grouping description, while whistling modifies, or further specifies, that description. Just as a big car still describes a single individual despite the presence of a modifier, arrive whistling describes a single event grouping. Work whistling, however, describes two event groupings: work describes one, and whistling describes the other. Such a reading is generally available, in line with the concerns raised in Chapter 4.
It is equally possible in the case of arrive whistling for each verb to contribute an independent event grouping description. However, the distinguishing characteristic of cases such as work whistling is that there is no alternative to such a multiple-event reading. Along with a proliferation of event groupings comes a proliferation of arguments. Whistling just isn’t whistling unless there is a whistler and a tune being whistled. This is, of course, equally true regardless of whether whistling occurs as part of a single-event or multiple-event reading. However, the definition of ‘event grouping’ in Chapter 6 insists that all events in an event grouping share the same external argument. I suggest, then, that only one external argument is introduced per event grouping. As a very crude syntactic analogue, building
on the discussion of Cable (2004) in Chapter 6, we may consider this as equivalent to a distinction between (30a) and (30b) below, although clearly more has to be said about how to extend this distinction to other, larger, adjunct classes.
(30) (a) [TP John [VP [VP arrived] [VP whistling]]]
(b) [TP John [VP [VP arrived] [CP pro [VP whistling]]]]
However, the antecedent of pro in such cases will always be the highly accessible matrix subject, so this extra argument position should be responsible for less of an additional processing load than an extra independent, referential DP subject would be. An approach such as this seems quite plausible for cases such as Bare Present Participial Adjuncts. For the other classes of adjunct discussed in Chapter 6, where prepositions or phrases like in order intervene between the two verb forms, the approach seems less natural: surely we need to acknowledge that the two verbs in those cases independently describe something, in order to be able to say what it is that the intervening preposition or in order relates. By stipulating that the Dependency Locality Theory cares about event groupings rather than events, though, I sidestep this issue: the verbs contribute separate event descriptions, but these are related as part of a single larger event, and so as part of a single-event grouping. As a broad outline of a processing theory, this approach gives us the results we want: there are more discourse referents intervening in a multiple-event reading than there are in a single-event reading. We consequently expect greater processing load in the multiple-event case. To give some concrete examples, consider the costs associated with the single-event reading of What did John arrive whistling? in Table 8.1.
The head introducing every new description of a discourse referent incurs a cost of one unit on the first line; the cost of resolving a filler–gap dependency, measured in terms of intervening discourse referent descriptions, is given on the second line. The total hypothesized processing cost is the sum of these two figures.
Table 8.1 Processing costs associated with a single-event reading of the sentence What did John arrive whistling?
                  What  did  John  arrive  whistling
New DR            1          1     1
Integration cost                           2
Total             1     0    1     1       2
Now, compare this to a multiple-event reading of the same sentence in Table 8.2.
Table 8.2 Processing costs associated with a multiple-event reading of the sentence What did John arrive whistling?
                  What  did  John  arrive  pro  whistling
New DR            1          1     1       1    1
Integration cost                                3
Total             1     0    1     1       1    4
Because the adjunct verb whistling introduces an independent event grouping discourse referent, and because the wh-dependency crosses one further discourse referent (the external argument of that new event grouping), we predict that the processing load at the point of integration of what as object of whistling will be two units higher in a multiple-event reading than in a single-event reading. This pattern is quite general across adjunct classes, given the assumptions adopted above. Whether such a rise in processing costs is sufficient to account for the degradation documented in Chapter 6 is not something to be decided by fiat here but rather something which must await a fuller theory and more detailed, quantitative empirical investigation. One major outstanding issue, for example, concerns any processing cost associated with extracting from a weak island. As stressed throughout, adjuncts look a lot like weak islands, even if they are clearly not strong islands. Such structural factors appear quite orthogonal to the concerns of Gibson (1998) and Gibson (2000), but there does appear to be some interaction, in that it is well known that extraction from weak islands is more ‘fragile’, and degrades more easily. If the above story is tenable, certain supplementary benefits come for free. In particular, we gain an instant account of the fact that, where both single-event and multiple-event readings are possible, even in declaratives, the single-event reading is preferred. This now reduces to the hypothesis in Gibson (2000: §5.3.6) that a preference for low integration cost is a factor in resolving structural ambiguities. Moreover, the puzzle concerning linear order discussed above with respect to (28) receives a principled solution: because the incrementality of processing is defined in linear rather than hierarchical terms, we have no reason to expect material after the gap site to affect acceptability of a sentence.
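The cost calculation behind Tables 8.1 and 8.2 can be sketched in a few lines of Python. This is only an illustration under the stipulations adopted above, not an implementation of the full metric of Gibson (2000). The names Word and costs are hypothetical. The decision about which words introduce a discourse referent is entered by hand, following the two tables (including the assumption that What itself counts as introducing one), and integration cost is simplified to a count of the referent-introducing words strictly between the filler and the gap site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One word of the sentence, annotated by hand (hypothetical type)."""
    form: str
    new_dr: bool = False    # introduces a new discourse referent description
    gap_site: bool = False  # the filler is integrated at this word


def costs(words, filler_index=0):
    """Return one (form, new_dr, integration, total) tuple per word."""
    rows = []
    for i, word in enumerate(words):
        new_dr = 1 if word.new_dr else 0
        integration = 0
        if word.gap_site:
            # Count referent-introducing words strictly between filler and gap.
            between = words[filler_index + 1:i]
            integration = sum(1 for w in between if w.new_dr)
        rows.append((word.form, new_dr, integration, new_dr + integration))
    return rows


# Single-event reading: whistling only modifies the grouping described by arrive.
SINGLE = [
    Word("What", new_dr=True),
    Word("did"),
    Word("John", new_dr=True),
    Word("arrive", new_dr=True),
    Word("whistling", gap_site=True),
]

# Multiple-event reading: whistling describes its own event grouping, whose
# external argument is represented here as pro.
MULTIPLE = [
    Word("What", new_dr=True),
    Word("did"),
    Word("John", new_dr=True),
    Word("arrive", new_dr=True),
    Word("pro", new_dr=True),
    Word("whistling", new_dr=True, gap_site=True),
]

if __name__ == "__main__":
    for label, reading in (("single", SINGLE), ("multiple", MULTIPLE)):
        print(label, costs(reading))
```

On these assumptions the total at whistling is 2 on the single-event reading and 4 on the multiple-event reading, which is the two-unit difference described in the text. Changing which words are marked as introducing a referent is the only way to alter the outcome, which makes plain how much of the prediction rests on the stipulations about event groupings.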
Potentially, certain further recalcitrant patterns in the locality data could be explained by the approach adopted here, although details in implementation remain to be worked out. The first, mentioned briefly in Section 1.2, is that extraction out of multiple adjuncts, one embedded inside the other, is always sharply degraded.
(31) (a) *What did John arrive [driving Mary crazy [whistling __]]?
(b) ??Who did John go home [without getting upset [after talking to __]]?
There is now the possibility of accounting for this in terms of the greater integration cost associated with the longer movement, although making this precise is contingent on involved decisions, for example about successive cyclicity, which I am not currently in a position to make. The second empirical area worth exploring is the data from Culicover and Jackendoff (2001), discussed in Chapters 3 and 6, concerning rationale clauses not controlled by the matrix subject, and extraction therefrom. There is a rough hierarchy of complexity among different uses of rationale clauses. Most straightforward are the standard uses of rationale clauses controlled by the matrix subject. Rationale clauses attached to passives are more complex in that the rationale clause is controlled by a linguistically present but covert agent. Finally, other ‘playwright’ uses of rationale clauses are more complex still in that the rationale clause is controlled by an agent not otherwise linguistically present. This corresponds to differences in acceptability of extraction out of rationale clauses. In the standard case, extraction out of rationale clauses is fine (32a). Extraction out of a rationale clause modifying a passive is typically quite bad, but some moderately acceptable examples like (32b) can be found. Finally, extraction out of ‘playwright’ uses of rationale clauses like (32c) appears to always be unacceptable. (32)
(a) Which children did you kill the cow [to feed __]?
(b) ?Which children was the cow killed [to feed __]?
(c) *Which dramatic effect does the ship sink [to produce __]?
We would hope that the processing theory outlined here could explain this on the grounds that extraction is more degraded when the manipulations required to find a controller for the rationale clause are harder. However, without a theory of the processing of implicit arguments, I must leave this as a promising avenue for future research. Finally, we raised the question in Chapter 6 of whether the correct generalization prohibited extraction from tensed adjuncts, or extraction from adjuncts with subjects. We now predict that the correct answer is both, and neither. The answer is both, because, on the one hand, tense cements the independence of the two event groupings, inevitably leading to introduction of new discourse referents. If times also count as discourse referents, then the introduction of a second tense node may further magnify this effect. On the other hand, an overt subject in an adjunct will typically be less accessible than
even a null subject. Establishing the wh-dependency across an overt adjunct-internal subject is also predicted to lead to a higher processing cost. From another perspective, though, we could answer that the correct generalization should prohibit neither type of extraction. Firstly, the above set of considerations does not refer explicitly to either tense or embedded subjects. Secondly, neither of the above structural factors leads to anything so severe as a prohibition, but rather to a degradation, which can be substantial in certain circumstances. Many questions about this approach remain unanswered. Among the most glaring are, firstly, the status of the stipulations above; and secondly, a range of issues relating to complementation. Can the theory be extended to any of the data discussed in Chapter 7? I see that question as too intimately related to general questions pertaining to the processing of successive-cyclic movement to be immediately answerable. If, however, this approach is viable, we approach a satisfyingly minimal theory of extraction from adjuncts, one which does not refer directly to adjuncts at all. The component parts are a largely independently motivated theory of event structure, and a theory of the processing of filler–gap dependencies with substantial independent motivation. At least one general theory of the distribution of weak islands, namely that of Hegarty (1992), automatically includes adjuncts in that class too, although this is less straightforward in the case of other theories of weak islands. Although many loose ends need to be tidied up, then, this appears to be an explanatory approach to patterns of extraction from adjuncts with substantial promise.
8.4 Conclusion
The primary concern of this chapter has been to explain why a condition like the Single Event Grouping Condition might exist. This is a particularly pressing question because of the implicit challenge to medium-sized modularity which comes from making a syntactic operation like A′-movement sensitive to semantic factors like event groupings. We discussed, and ultimately rejected, a possible attempt to resolve this tension by syntacticizing the Single Event Grouping Condition. Although it seems likely that some decompositional event structure is represented in the syntax, extended events, at least, cannot be felicitously represented that way. To the extent that core events and extended events form a natural class, then, that natural class must be semantically, rather than syntactically, defined. Instead, the more promising possibility is that the Single Event Grouping Condition represents a processing effect, derived from Gibson’s (2000)
Dependency Locality Theory. This hypothesis brings several advantages: it explains apparent linear order sensitivity in the workings of the Single Event Grouping Condition without recourse to countercyclic merger; it provides another source of gradience in this area; and above all it largely removes the sui generis nature of the original proposal by assimilating the effects discussed here to a more general, cognitively plausible theory. Testing the relationship between the Single Event Grouping Condition and the Dependency Locality Theory with any rigour is beyond what I can do here, and must be left for future research. However, this approach, if tenable, will remove the challenge to medium-sized modularity by allowing us to state that the Single Event Grouping Condition has the effects it does because of the effects of semantics on incremental processing or, more specifically, because processing more complex semantic structures while resolving filler–gap dependencies leads to increased processing cost.
9 Conclusion
There was a basic empirical problem at the heart of this work, but unravelling it has taken us quite far from the point of departure. The empirical problem is that current theories of adjunct islands apply quite blindly to make adjuncts behave like either strong or weak islands, and so there is little room for manoeuvre when it comes to observed cases where adjuncts don’t behave like such islands. To the extent that exceptions are countenanced in recent work, such as the notion of ‘quasi-adjunct’ in Boeckx (2003), we typically receive a promissory note, rather than a full theory of the exceptions. Legitimate extractions from adjuncts frequently pattern in ways which are the opposite of what syntactic locality theory has brought us to expect. Four puzzles were laid out in the introduction, which pose clear challenges to any syntactic theory of extraction from adjuncts. All four puzzles can now be explained, in terms of the theory developed in the preceding chapters. The Restricted Extraction Puzzle detailed a disparity between the extraction possibilities of different classes of adjunct. Rationale clauses allow quite free extraction of complement DPs, as shown by (1), repeated from Chapter 1 like all the examples discussed here. Meanwhile, the acceptability of extraction of complement DPs from Bare Present Participial Adjuncts is contingent on the aspectual class of the matrix VP, as well as a host of other factors explored in Section 6.3. The basic data is repeated below.
(1) (a) What did you come round [to work on __]?
(b) Which paper did John travel halfway round the world [to submit __]?
(c) What did Christ die [to save us from __]?
(2) (a) Matrix VP describes an accomplishment: What did John drive Mary crazy [whistling __]?
(b) Matrix VP describes an achievement: What did John arrive [whistling __]?
(c) Matrix VP describes an activity: *What does John work [whistling __]?
(d) Matrix VP describes a point: *What did John notice the commotion [looking through __]?
(e) Matrix VP describes a state: *What does John know French [whistling __]?
To be sure, a syntactocentric explanation for these facts is not inconceivable. Imagine, for example, that BPPAs are introduced by one of two null operators, depending on the aspectual classes of the matrix VP and the adjunct. Now, one of these two null operators is an intervener for extraction, whereas the other is not. In principle, there is nothing to stop such an account, but in the absence of any supporting evidence for the presence of two null operators, their distribution, and their different status with respect to intervention, the details remain entirely obscure. In contrast, the theory in the preceding chapters predicts just such a contrast. The crucial assumption, ignoring larger event groupings for now, is that events come in two sizes, core events and goal-related extended events. In the absence of any overt marking of goal-orientation, and perhaps because of their syntactic smallness, BPPAs are restricted to the class of core events, which underpin the lexical aspectual classes of Vendler (1957), as decomposed by Dowty (1979) and others following him. This is why aspectual class has such an influence on extraction out of BPPAs. On the other hand, rationale clauses do explicitly mark goal-orientation, and so extended event structure is automatically available, with the adjunct specifying the goal of the action described in the matrix VP. This means that rationale clauses automatically form a macroevent with their VP hosts, while macroevent formation is contingent on matters related to aspectual class in the case of BPPAs. The Single Event Grouping Condition therefore predicts unrestricted extraction from the former, but only extraction in certain cases from the latter.
The Restricted Answers Puzzle concerned cases where the acceptability of a question depended partly on the expected answer to that question. This is shown particularly clearly by prepositional participial adjuncts, as in the examples repeated below. (3)
A: Which book did John design his garden [after reading __]?
B: An introduction to landscape gardening.
(4) A: Which book did John design his garden [after reading __]?
B: #Finnegans Wake.
A syntactocentric account of extraction from adjuncts would have little chance here. The upper size limit of syntactic structures is, in normal circumstances, the sentence. Here, though, the acceptability of one utterance is contingent on
a subsequent utterance in the discourse. Syntax is not built to handle this sort of pattern on its own. In the light of the Single Event Grouping Condition, though, we expect just such a pattern to emerge. The prepositions which give rise to this pattern are those which do not specify a contingent relation between the two events in question but only a weaker, temporal relation. We saw in Chapter 3 that the relevant class of relations among events for the Single Event Grouping Condition consists only of contingent relations among the subevents of a given macroevent, and specifically not temporal relations. In order to meet this condition, the noncontingent relation expressed by a preposition such as after must be enriched to give a contingent reading. However, the feasibility of this enrichment depends on the answer to the question, as shown in Section 6.2. In this way, the condition on the relation among events becomes a condition on the answer to the question. If the actual answer does not meet that condition, the exchange is anomalous. The Interpretive Puzzle focused on two surprising interpretive asymmetries concerning BPPAs. Firstly, the available interpretative relations between matrix and adjunct events in an interrogative sentence form a proper subset of those available in a declarative sentence, as shown by contrasting the degraded accomplishment case (5) and the degraded achievement case (6) with the legitimate cases of extraction in (7). (5)
(a) John painted this picture [eating apples].
(b) *What did John paint this picture [eating __]?
(6) (a) John came home [dripping mud all over the living room carpet].
(b) ??/*What did John come home [dripping mud on __]?
(7) (a) What did John drive Mary crazy [whistling __]?
(b) What did John arrive [whistling __]?
Secondly, the interpretations available in the legitimate cases of extraction from adjuncts differ in the accomplishment case, where a causal reading is strongly preferred, and the achievement case, where only an interpretation of immediate temporal precedence is readily available. This intricate pattern of facts was analyzed using the operator Op from Chapter 4. Structural ambiguities in the height of attachment of this operator automatically generate readings consisting of a single event grouping and multiple event groupings for every example. However, only the single-grouping readings are legitimate with respect to the Single Event Grouping Condition, which regulates only the extraction cases. The question then becomes one of which interpretations are legitimate within a single grouping. The event described by a BPPA is agentive in every instance considered here. Moreover,
we assume that the preparatory process in an accomplishment is necessarily agentive, and the preparatory process in an achievement is necessarily nonagentive. In that case, the only way to meet the requirement of the Single Event Grouping Condition that an event grouping contains at most one agentive event is to identify the adjunct event as the preparatory process which directly causes the culmination in the accomplishment case. On the other hand, in the achievement case, such identification is impossible, as the preparatory process is nonagentive by definition and the adjunct event is agentive. In that case, the only possible interpretation is one where the adjunct event cooccurs with the preparatory process leading to the matrix culmination, but is distinct from that process. This gives an interpretation of immediate temporal precedence without causation. Finally, the Unlikely Antilocality Puzzle concentrates on a tension between the typical syntactic locality pattern, which privileges short movement steps, and some further data concerning BPPAs, including the following.
(8) (a) ??What did John drive Mary crazy [fixing __]?
(b) What did John drive Mary crazy [trying [to fix __]]?
The preference for shorter steps in syntactic theory comes from a fundamental asymmetry in locality theory. We assume certain elements to be interveners, hindering movement, but (with the exception of the Edge Features of Chomsky 2008) we do not have a class of facilitators, helping movement on its way. When comparing a longer movement A with a shorter movement B, such that the nodes traversed by A are a proper superset of the nodes traversed by B, the best case, then, is that none of the extra nodes traversed by movement A act as interveners, in which case A and B should be equally acceptable. Putting antilocality theories aside, as we have seen that they are irrelevant to the present case, there is no way for the longer movement A to be preferred over B. Data such as (8) are doomed to remain anomalous in a purely syntactic theory, then. However, the contrast is exactly as expected on the present approach. As mentioned in Section 8.2, although (8b) is syntactically more complex than (8a), it is aspectually simpler, in that try can take an accomplishment-denoting complement and yield an activity. And, as we saw in Section 6.3, extraction from BPPAs is contingent upon core event formation, which, in turn, is only possible if the adjunct denotes an activity. The degradation of (8a) in comparison to (8b) is therefore due to the necessity of coercing the accomplishment fix into an activity description in (8a). Such factors strongly suggest that we must move away from the notion that there is a syntactic adjunct condition. However, this work is only partly about extraction from adjuncts. It also has a wider architectural and methodological
point to make. Most widely accepted minimalist grammatical architectures have one common feature, namely a radically impoverished narrow syntactic component complemented by an increased reliance on interface conditions to take up the slack left by simplification of the syntax. And yet there is a strong tendency in syntactic theory to assume that the mapping between syntax and the interfaces is more or less direct. At the PF interface, we see this in the LCA (Kayne 1994, Chomsky 1995), which removes the need for a substantial linearization algorithm in the syntax→PF mapping by stipulating a homomorphism from asymmetric c-command in the syntax to linear precedence in the phonology. At the LF interface, classic examples of this trend include UTAH (Baker, 1988), which reduces hierarchical effects among arguments to similarly hierarchical properties of phrase structure trees, or the decompositional approaches to event structure explored by Lakoff (1970) and Hale and Keyser (1993). Such theories are pulling in the opposite direction from one conception of the basic minimalist hypothesis, in that they take factors with clear interface, or post-interface, effects and reintegrate them into the syntax. In contrast, the theory proposed here rests on a conception of event structure which is motivated on grounds entirely independent from phrase structure, and which has structural properties quite distinct from phrase structure, as demonstrated in Section 8.2. In fact, the mapping from syntax to semantics looks quite like the mapping from syntax to phonology on the conception given here. We have a notion of semantic constituency, for our purposes based on events as units, which tends to map onto the units defined by the notion of syntactic constituency. 
That tendency is what lends plausibility to the ‘direct mapping’ approach of researchers from Lakoff through to Ramchand, together with the morphological effects of event structure which have been investigated over the last 20 years or so. However, in some limited cases, syntactic and semantic constituency diverge. We have seen two major cases of that here: the formation of single or multiple events from BPPAs modifying VPs in identical syntactic configurations, and divergence of the fine-grained constituency of extended events and their syntactic representations. Similarly, there is little doubt that, at the PF interface, there is a notion of prosodic constituency that is very broadly similar to syntactic constituency, but diverges in a number of ways. People rarely do phonology directly on syntactic representations, although there is clearly some degree of correspondence between syntactic and phonological representations. It should come as no surprise that the same is true of the syntax–semantics interface, but the enterprise of computing semantic representations directly from syntactic structure is still very popular across a wide variety of frameworks. One claim of the approach pursued here is that
the syntax–semantics interface should be less direct, and able to accommodate mismatches similar to those that must be explained by a successful phonological theory. Although the details of the proposal presented here are bound to be proved wrong in the fullness of time, and although many areas remain unexplored, one conclusion that I hope to have placed beyond reasonable doubt is that the attested patterns of acceptable extraction from adjuncts are systematic, but that the system is quite distinct from the system assumed to underlie phrase structure. This is concordant with a genuinely minimalist model of syntax, in which independently necessary structures at the interfaces and beyond can constrain acceptability of sentences in much the same way as factors regulating phrase structure, and each component of the overall system can be allowed to fully pull its weight.
References Abels, Klaus (2003). Successive Cyclicity, Anti-locality and Adposition Stranding. Ph. D. thesis, University of Connecticut, Storrs, CT. Aissen, Judith (1996). Pied-piping, abstract agreement, and functional projections in Tzotzil. Natural Language and Linguistic Theory 14: 447–91. Asher, Nicholas (1993). Reference to Abstract Objects in Discourse. Dordrecht: Kluwer. Bach, Emmon (1986). The algebra of events. Linguistics and Philosophy 9: 5–16. Baker, Mark (1988). Incorporation: A Theory of Grammatical Function Changing. Chicago: University of Chicago Press. Baldwin, Dare, Baird, Jodie, Saylor, Megan, and Clark, Angela (2001). Infants parse dynamic action. Child Development 72: 708–17. Baltin, Mark and Postal, Paul (1996). More on reanalysis hypotheses. Linguistic Inquiry 27: 127–45. Belletti, Adriana and Rizzi, Luigi (1981). The syntax of ‘ne’: Some theoretical implications. The Linguistic Review 1: 117–54. (1988). Psych verbs and Ë-theory. Natural Language and Linguistic Theory 6: 291–352. Bittner, Maria (1999). Concealed causatives. Natural Language Semantics 7: 1–78. Boeckx, Cedric (2003). Islands and Chains: Resumption as Stranding. Amsterdam: John Benjamins. Borer, Hagit (2005). Structuring Sense. Volume 2: The Normal Course of Events. Oxford: Oxford University Press. Borgonovo, Claudia and Neeleman, Ad (2000). Transparent adjuncts. Canadian Journal of Linguistics/Revue canadienne de linguistique 45: 199–224. Boškovi´c, Željko (1994). D-structure, theta-criterion, and movement into theta positions. Linguistic Analysis 24: 247–86. and Lasnik, Howard (1999). How strict is the cycle? Linguistic Inquiry 30: 691–703. Browning, Marguerite (1987). Null Operator Constructions. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Cable, Seth (2004). Restructuring in English. Ms., Massachusetts Institute of Technology. Cattell, Ray (1976). Constraints on movement rules. Language 52: 18–50. Chomsky, Noam (1955). 
The logical structure of linguistic theory. Ms., Massachusetts Institute of Technology. Published (1975) by Plenum, New York. (1957). Syntactic Structures. The Hague: Mouton. (1965). Aspects of the Theory of Syntax. Cambridge, MA: MIT Press. (1973). Conditions on transformations. In A Festschrift for Morris Halle (eds. S. Anderson and P. Kiparsky), pp. 232–86. New York: Holt, Rinehart and Winston.
(1977). On wh-movement. In Formal Syntax (eds. P. Culicover, T. Wasow, and A. Akmajian), pp. 71–132. New York: Academic Press. (1981). Lectures on Government and Binding. Dordrecht: Foris. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: MIT Press. (1986). Barriers. Cambridge, MA: MIT Press. (1993). A minimalist program for linguistic theory. In The View from Building 20: Essays in Honor of Sylvain Bromberger (eds. K. Hale and S. J. Keyser), pp. 1–52. Cambridge, MA: MIT Press. (1995). The Minimalist Program. Cambridge, MA: MIT Press. (2000). Minimalist inquiries: The framework. In Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik (eds. R. Martin, D. Michaels, and J. Uriagereka), pp. 89–115. Cambridge, MA: MIT Press. (2008). On phases. In Foundational Issues in Linguistic Theory: Essays in Honor of Jean-Roger Vergnaud (eds. R. Freidin, C. Otero, and M. L. Zubizarreta), pp. 133–166. Cambridge, MA: MIT Press. and Lasnik, Howard (1993). The theory of principles and parameters. In Syntax: An International Handbook of Contemporary Research (eds. J. Jacobs, A. von Stechow, W. Sternefeld, and T. Vennemann), pp. 506–69. Berlin: De Gruyter. Chung, Sandra (1991). Sentential subjects and proper government in Chamorro. In Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda (eds. C. Georgopoulos and R. Ishihara), pp. 75–99. Dordrecht: Kluwer. Cinque, Guglielmo (1990). Types of Ā-dependencies. Cambridge, MA: MIT Press. (1999). Adverbs and Functional Heads: A Cross-linguistic Perspective. Oxford: Oxford University Press. (2003). Complement and adverbial PPs: Implications for clause structure. Paper presented at the University of Venice. Copley, Bridget (2002). The Semantics of the Future. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Crain, Stephen and Steedman, Mark (1985). On not being led up the garden path: The use of context by the psychological syntax processor. 
In Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives (eds. D. Dowty, L. Karttunen, and A. Zwicky), pp. 320–58. Cambridge: Cambridge University Press. Csibra, Gergely and Gergely, György (2007). ‘Obsessed with goals’: Functions and mechanisms of teleological interpretation of actions in humans. Acta Psychologica 124: 60–78. Culicover, Peter (1993). Evidence against ECP accounts of the that-t effect. Linguistic Inquiry 24: 557–61. and Jackendoff, Ray (2001). Control is not movement. Linguistic Inquiry 32: 493–512. (2005). Simpler Syntax. Oxford: Oxford University Press.
Davidson, Donald (1967). The logical form of action sentences. In The Logic of Decision and Action (ed. N. Rescher), pp. 81–95. Pittsburgh, PA: University of Pittsburgh Press. (1969). The individuation of events. In Essays in Honor of Carl G. Hempel (ed. N. Rescher), pp. 216–34. Dordrecht: Reidel. Davies, William and Dubinsky, Stanley (2003). On extraction from NPs. Natural Language and Linguistic Theory 21: 1–37. Demonte, Violeta (1988). Remarks on secondary predicates: C-command, extraction and reanalysis. The Linguistic Review 6: 1–39. Dowty, David (1979). Word Meaning and Montague Grammar: The Semantics of Verbs and Times in Generative Semantics and Montague’s PTQ. Dordrecht: Reidel. (1991). Thematic proto-roles and argument selection. Language 67: 547–619. Dresher, Elan (1976). The position and movement of prepositional phrases. Ms., University of Massachusetts, Amherst, MA. Dryer, Matthew (2008). Order of subject, object and verb. In The World Atlas of Language Structures Online (eds. M. Haspelmath, M. Dryer, D. Gil, and B. Comrie), Chapter 81. Max Planck Digital Library, Munich. Available online at http://wals.info/feature/81. Accessed on 12 October 2009. Ellis, Willis (1938). A Source Book of Gestalt Psychology. London: Kegan Paul, Trench, Trubner & Co. Engdahl, Elisabet (1983). Parasitic gaps. Linguistics and Philosophy 6: 5–34. Ernst, Thomas (2002). The Syntax of Adjuncts. Cambridge: Cambridge University Press. Erteschik-Shir, Nomi (1973). On the Nature of Island Constraints. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Faraci, Robert (1974). Aspects of the Grammar of Infinitives and For-Phrases. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Fodor, Jerry (1970). Three reasons for not deriving ‘kill’ from ‘cause to die’. Linguistic Inquiry 1: 429–38. Fox, Danny and Nissenbaum, Jon (1999). Extraposition and scope: A case for overt QR. In Proceedings of the 18th West Coast Conference on Formal Linguistics (eds. 
S. Bird, A. Carnie, J. Haugen, and P. Norquest), pp. 132–44. Somerville, MA: Cascadilla Press. Geis, Jonnie (1973). Subject complementation with causative verbs. In Issues in Linguistics: Papers in Honor of Henry and Renée Kahane (eds. B. Kachru, R. Lees, Y. Malkiel, A. Pietrangeli, and S. Saporta), pp. 210–29. Urbana, IL: University of Illinois Press. Georgopoulos, Carol (1985). Variables in Palauan syntax. Natural Language and Linguistic Theory 3: 59–94. Geurts, Bart (1998). Presuppositions and anaphors in attitude contexts. Linguistics and Philosophy 21: 545–601. Gibson, Edward (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition 68: 1–76.
(2000). The Dependency Locality Theory: A distance-based theory of linguistic complexity. In Image, Language, Brain (eds. A. Marantz, Y. Miyashita, and W. O’Neil), pp. 95–126. Cambridge, MA: MIT Press. Goldsmith, John (1985). A principled exception to the Coordinate Structure Constraint. In CLS 21, Part 1: The General Session, pp. 133–43. Chicago, IL: Chicago Linguistic Society. Grohmann, Kleanthes (2003). Prolific Domains: On the Anti-locality of Movement Dependencies. Amsterdam: John Benjamins. Hale, Kenneth and Keyser, Samuel Jay (1993). On argument structure and the lexical expression of syntactic relations. In The View from Building 20: Essays in Honor of Sylvain Bromberger (eds. K. Hale and S. J. Keyser), pp. 53–109. Cambridge, MA: MIT Press. Harley, Heidi (1995). Subjects, Events, and Licensing. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Hauser, Marc, Chomsky, Noam, and Fitch, William Tecumseh (2002). The faculty of language: What is it, who has it, and how did it evolve? Science 298: 1569–79. Hegarty, Michael (1992). Adjunct Extraction and Chain Configurations. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Heim, Irene (1992). Presupposition projection and the semantics of attitude verbs. Journal of Semantics 9: 183–221. and Kratzer, Angelika (1998). Semantics in Generative Grammar. Oxford: Blackwell. Higginbotham, James (1985). On semantics. Linguistic Inquiry 16, 547–93. Hornstein, Norbert and Weinberg, Amy (1981). Case theory and preposition stranding. Linguistic Inquiry 12: 55–91. Huang, C.-T. James (1982). Logical Relations in Chinese and the Theory of Grammar. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Jackendoff, Ray (1972). Semantic Interpretation in Generative Grammar. Cambridge, MA: MIT Press. (1973). The base rules for prepositional phrases. In A Festschrift for Morris Halle (eds. S. R. Anderson and P. Kiparsky), pp. 345–56. New York: Holt, Rinehart and Winston. (1990). 
Semantic Structures. Cambridge, MA: MIT Press. (1996). The proper treatment of measuring out, telicity, and perhaps even quantification in English. Natural Language and Linguistic Theory 14: 305–54. (2007). Language, Consciousness, Culture: Essays on Mental Structure. Cambridge, MA: MIT Press. and Pinker, Steven (2005). The nature of the language faculty and its implications for evolution of language (reply to Fitch, Hauser, and Chomsky). Cognition 97: 211–25. Johnson, Kyle (2002). Towards an etiology of adjunct islands. Ms., University of Massachusetts, Amherst, MA.
Jones, Charles (1991). Purpose Clauses: Syntax, Thematics, and Semantics of English Purpose Constructions. Dordrecht: Kluwer. Kamp, Hans (1979). Events, instants and temporal reference. In Semantics from Different Points of View (eds. R. Bäuerle, U. Egli, and A. von Stechow), pp. 376–417. Berlin: Springer. (1981a). Some remarks on the logic of change, part I. In Time, Tense, and Quantifiers: Proceedings of the Stuttgart Conference on the Logic of Tense and Quantification (ed. C. Rohrer), pp. 135–79. Tübingen: Max Niemeyer. (1981b). A theory of truth and semantic representation. In Formal Methods in the Study of Language (eds. J. Groenendijk, T. Janssen, and M. Stokhof), pp. 277–322. Amsterdam: Mathematisch Centrum. Karttunen, Lauri (1973). Presuppositions of compound sentences. Linguistic Inquiry 4: 169–93. Kayne, Richard (1983). Connectedness. Linguistic Inquiry 14: 223–49. (1994). The Antisymmetry of Syntax. Cambridge, MA: MIT Press. Kearney, Kevin (1983). Governing categories. Ms., University of Connecticut, Storrs, CT. Keenan, Edward and Faltz, Leonard (1985). Boolean Semantics for Natural Language. Dordrecht: Kluwer. Kehler, Andrew (2002). Coherence, Reference, and the Theory of Grammar. Stanford, CA: CSLI. King, Ruth and Roberge, Yves (1990). Preposition stranding in Prince Edward Island French. Probus 3: 351–69. Kiparsky, Paul and Kiparsky, Carol (1970). Fact. In Progress in Linguistics (eds. M. Bierwisch and K. Heidolph), pp. 143–73. The Hague: Mouton. Kluender, Robert (1992). Deriving island constraints from principles of predication. In Island Constraints: Theory, Acquisition and Processing (eds. H. Goodluck and M. Rochemont), pp. 223–58. Dordrecht: Kluwer. (1998). On the distinction between strong and weak islands: A processing perspective. In Syntax and Semantics 29: The Limits of Syntax (eds. P. Culicover and L. McNally), pp. 241–79. New York: Academic Press. Koffka, Kurt (1935). Principles of Gestalt Psychology. 
London: Kegan Paul, Trench, Trubner & Co. Köhler, Wolfgang (1925). The Mentality of Apes. London: Kegan Paul, Trench, Trubner, & Co. Translated by Ella Winter. Reprinted (1999) by Routledge. Koontz-Garboden, Andrew and Beavers, John (2009). Manner and result verbs. Paper presented at the annual meeting of the Linguistics Association of Great Britain, University of Edinburgh, 8 September 2009. Koopman, Hilda (1984). The Syntax of Verbs: From Verb Movement Rules in the Kru Languages to Universal Grammar. Dordrecht: Foris. Kratzer, Angelika (1996). Severing the external argument from its verb. In Phrase Structure and the Lexicon (eds. J. Rooryck and L. Zaring), pp. 109–37. Dordrecht: Kluwer.
Krifka, Manfred (1989). Nominal reference, temporal constitution, and quantification in event semantics. In Semantics and Contextual Expression (eds. R. Bartsch, J. van Benthem, and P. van Emde Boas), pp. 75–115. Dordrecht: Foris. Kroch, Anthony (1989). Amount quantification, referentiality, and long wh-movement. Ms., University of Pennsylvania. Kuno, Susumu (1973). Constraints on internal clauses and sentential subjects. Linguistic Inquiry 4: 363–85. (1987). Functional Syntax: Anaphora, Discourse, and Empathy. Chicago, IL: University of Chicago Press. (1997). Binding theory in the minimalist program. Ms., Harvard University. Lakoff, George (1970). Irregularity in Syntax. New York: Holt, Rinehart and Winston. (1986). Frame semantic control of the Coordinate Structure Constraint. In Papers from the Parasession on Pragmatics and Grammatical Theory (eds. A. Farley, P. Farley, and K.-E. McCullough), pp. 152–67. Chicago, IL: Chicago Linguistic Society. Landau, Idan (2001). Elements of Control: Structure and Meaning in Infinitival Constructions. Dordrecht: Kluwer. Landman, Fred (1991). Structures for Semantics. Dordrecht: Kluwer. Langendoen, D. Terence and Savin, Harris (1971). The projection problem for presuppositions. In Studies in Linguistic Semantics (eds. C. Fillmore and D. T. Langendoen), pp. 54–60. New York: Holt, Rinehart and Winston. Larson, Richard (1988). On the double object construction. Linguistic Inquiry 19: 335–91. Lasersohn, Peter (1992). Generalized conjunction and temporal modification. Linguistics and Philosophy 15: 381–410. Lasnik, Howard (1998). Some reconstruction riddles. In Proceedings of the 22nd Annual Penn Linguistics Colloquium (eds. A. Dimitriadis, H. Lee, C. Moisset, and A. Williams), pp. 83–98. Philadelphia: University of Pennsylvania. and Saito, Mamoru (1984). On the nature of proper government. Linguistic Inquiry 15: 235–89. Lebeaux, David (1988). Language Acquisition and the Form of the Grammar. Ph. D. 
thesis, University of Massachusetts, Amherst, MA. Levin, Beth and Rappaport Hovav, Malka (2008). A constraint on verb meanings: Manner/result complementarity. Paper presented at Brown University, Providence, RI, 17 March 2008. Levine, Robert (1984). Against reanalysis rules. Linguistic Analysis 14: 3–29. and Sag, Ivan (2003). Some empirical issues in the grammar of extraction. In Proceedings of the 10th International Conference on Head-Driven Phrase Structure Grammar (ed. S. Müller), pp. 236–56. Stanford, CA: CSLI. Lewis, David (1973). Causation. Journal of Philosophy 70: 556–67. Link, Godehard (1983). The logical analysis of plural and mass terms: A lattice-theoretical approach. In Meaning, Use and the Interpretation of Language (eds. R. Bäuerle, C. Schwarze, and A. von Stechow), pp. 303–23. Berlin: Walter de Gruyter.
Longobardi, Giuseppe (1985). Connectedness and island constraints. In Grammatical Representation (eds. J. Guéron, H.-G. Obenauer, and J.-Y. Pollock), pp. 169–85. Dordrecht: Foris. Marantz, Alec (1997). No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. In University of Pennsylvania Working Papers in Linguistics, pp. 201–25. May, Robert (1985). Logical Form: Its Structure and Derivation. Cambridge, MA: MIT Press. McIntyre, Andrew (2004). Event paths, conflation, argument structure, and VP shells. Linguistics 42: 523–71. Meltzoff, Andrew (1995). Understanding the intentions of others: Re-enactment of intended acts by 18-month-old children. Developmental Psychology 31: 838–50. Merchant, Jason (2001). The Syntax of Silence: Sluicing, Islands, and the Theory of Ellipsis. Oxford: Oxford University Press. Miller, George and Chomsky, Noam (1963). Finitary models of language users. In Handbook of Mathematical Psychology, vol. 2 (eds. R. D. Luce, R. Bush, and E. Galanter), pp. 419–91. New York: Wiley. ———, Galanter, Eugene, and Pribram, Karl (1960). Plans and the Structure of Behavior. New York: Holt, Rinehart and Winston. Minsky, Marvin (1974). A framework for representing knowledge. Technical report, Massachusetts Institute of Technology, Cambridge, MA. Artificial Intelligence Memo 306. Moens, Marc and Steedman, Mark (1988). Temporal ontology and temporal reference. Computational Linguistics 14: 15–28. Morgan, Jerry (1975). Some interactions of syntax and pragmatics. In Syntax and Semantics 3: Speech Acts (eds. P. Cole and J. Morgan), pp. 289–304. New York: Academic Press. Mourelatos, Alexander (1978). Events, processes and states. Linguistics and Philosophy 2: 415–34. Müller, Gereon (2010). On deriving CED effects from the PIC. Linguistic Inquiry 41: 35–82. Newell, Allen, Shaw, J. C., and Simon, Herbert (1958). Elements of a theory of human problem solving. Psychological Review 65: 151–66. and Simon, Herbert (1972). 
Human Problem Solving. Englewood Cliffs, NJ: Prentice Hall. Nunes, Jairo and Uriagereka, Juan (2000). Cyclicity and extraction domains. Syntax 3: 20–43. Parsons, Terence (1990). Events in the Semantics of English: A Study in Subatomic Semantics. Cambridge, MA: MIT Press. Partee, Barbara and Rooth, Mats (1983). Generalized conjunction and type ambiguity. In Meaning, Use and Interpretation of Language (eds. R. Bäuerle, C. Schwarze, and A. von Stechow), pp. 361–83. Berlin: De Gruyter.
Pesetsky, David (1987). Wh-in-situ: Movement and unselective binding. In The Representation of (In)definiteness (eds. E. Reuland and A. ter Meulen), pp. 98–129. Cambridge, MA: MIT Press. and Torrego, Esther (2001). T-to-C movement: Causes and consequences. In Ken Hale: A Life in Language (ed. M. Kenstowicz), pp. 355–426. Cambridge, MA: MIT Press. Pietroski, Paul (2000). Causing Actions. Oxford: Oxford University Press. Pinker, Steven and Jackendoff, Ray (2005). The faculty of language: What’s special about it? Cognition 95: 201–36. Pollard, Carl and Sag, Ivan (1994). Head-Driven Phrase Structure Grammar. Stanford, CA: CSLI. Postal, Paul (1998). Three Investigations of Extraction. Cambridge, MA: MIT Press. Pullum, Geoff (1987). Implications of English extraposed irrealis clauses. In Proceedings of ESCOL, pp. 260–70. Pustejovsky, James (1991). The syntax of event structure. Cognition 41: 47–81. (1995). The Generative Lexicon. Cambridge, MA: MIT Press. Pylkkänen, Liina (2002). Introducing Arguments. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Ramchand, Gillian (2008). Verb Meaning and the Lexicon: A First Phase Syntax. Cambridge: Cambridge University Press. Reinhart, Tanya (2002). The theta system: An overview. Theoretical Linguistics 28: 229–90. Rizzi, Luigi (1990). Relativized Minimality. Cambridge, MA: MIT Press. Rosenbaum, Peter (1967). The Grammar of English Predicate Complement Constructions. Cambridge, MA: MIT Press. Ross, John Robert (1967). Constraints on Variables in Syntax. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Rouveret, Alain and Vergnaud, Jean-Roger (1980). Specifying reference to the subject: French causatives and conditions on representations. Linguistic Inquiry 11: 97–202. Sabel, Joachim (2002). A minimalist analysis of syntactic islands. The Linguistic Review 19: 271–315. Sauerland, Uli and Elbourne, Paul (2002). Total reconstruction, PF movement, and derivational order. 
Linguistic Inquiry 33: 283–319. Schank, Roger and Abelson, Robert (1977). Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. New York: Lawrence Erlbaum. Smith, Carlota (1997). The Parameter of Aspect (second edn). Dordrecht: Kluwer. Sprouse, Jon, Brinjak, John, Littman, Robin, and Meyers, Conrad (2007). The effect of temporary representations on acceptability. Paper presented at GLOW XXX, Tromsø, 13 April 2007. Steedman, Mark (2002). Plans, affordances, and combinatory grammar. Linguistics and Philosophy 25: 725–53. Stepanov, Arthur (2001). Late adjunction and minimalist phrase structure. Syntax 4: 94–125.
(2007). The end of CED? Minimalism and extraction domains. Syntax 10: 80–126. Stowell, Tim (1982). Conditions on reanalysis. In Papers in Syntax: MIT working papers in linguistics volume 4 (eds. A. Marantz and T. Stowell), pp. 245–69. Svenonius, Peter (2004). Slavic prefixes inside and outside VP. Nordlyd 32: 205–53. Szabolcsi, Anna (2006). Strong vs. weak islands. In The Blackwell Companion to Syntax, Volume IV (eds. M. Everaert and H. van Riemsdijk), pp. 479–531. Oxford: Blackwell. and Zwarts, Frans (1993). Weak islands and an algebraic semantics for scope taking. Natural Language Semantics 1: 235–84. Takahashi, Daiko (1994). Minimality of Movement. Ph. D. thesis, University of Connecticut, Storrs, CT. Takahashi, Shoichi (2006). Decompositionality and Identity. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. and Hulsey, Sarah (2009). Wholesale late merger: Beyond the A/Ā distinction. Linguistic Inquiry 40: 387–426. Talmy, Leonard (1988). Force dynamics in language and cognition. Cognitive Science 12: 49–100. (2000). Toward a Cognitive Semantics, vol. I: Concept Structuring Systems. Cambridge, MA: MIT Press. Taraldsen, Knut Tarald (1981). The theoretical interpretation of a class of marked extractions. In Theory of Markedness in Generative Grammar: Proceedings of the 1979 GLOW Conference (eds. A. Belletti, L. Brandi, and L. Rizzi), pp. 475–515. Pisa: Scuola Normale Superiore di Pisa. Taylor, Heather Lee (2006). Moving out of IF-clauses: If an IF clause is sentence initial . . . Paper presented at the workshop on adjuncts and modifiers, GLOW XXIX, Universitat Autònoma de Barcelona. Tenny, Carol (1987). Grammaticalizing Aspect and Affectedness. Ph. D. thesis, Massachusetts Institute of Technology, Cambridge, MA. Tolman, Edward (1932). Purposive Behavior in Animals and Men. New York: Century. Tomasello, Michael, Carpenter, Malinda, Call, Josep, Behne, Tanya, and Moll, Henrike (2005). 
Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences 28: 675–91. Toyoshima, Toyoshi (1997). Derivational CED: A consequence of the bottom-up parallel processes of merge and attract. In Proceedings of WCCFL 15, pp. 505–19. Travis, Lisa (2000). Event structure in syntax. In Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax (eds. C. Tenny and J. Pustejovsky), pp. 145–85. Stanford, CA: CSLI. Truswell, Robert (2007a). Extraction from adjuncts and the structure of events. Lingua 117: 1355–77. (2007b). Locality of Wh-movement and the Individuation of Events. Ph. D. thesis, University College London, London. (2009). Preposition stranding, passivisation, and extraction from adjuncts in Germanic. In Linguistic Variation Yearbook, vol. 8 (eds. J. van Craenenbroeck and J. Rooryck), pp. 131–77. Amsterdam: John Benjamins.
Uriagereka, Juan (1999). Multiple spell-out. In Working Minimalism (eds. S. Epstein and N. Hornstein), pp. 251–82. Cambridge, MA: MIT Press. van Benthem, Johan (1983). The Logic of Time. Dordrecht: Reidel. van der Sandt, Rob (1992). Presupposition projection as anaphora resolution. Journal of Semantics 9: 333–77. van Riemsdijk, Henk (1978). A Case Study in Syntactic Markedness: The Binding Nature of Prepositional Phrases. Dordrecht: Foris. van Valin, Robert and Wilkins, David (1996). The case for ‘effector’: Case roles, agents and agency revisited. In Grammatical Constructions: Their Form and Meaning (eds. M. Shibatani and S. Thompson), pp. 289–322. Oxford: Oxford University Press. van Voorst, Jan (1992). The aspectual semantics of psychological verbs. Linguistics and Philosophy 15: 65–92. Vendler, Zeno (1957). Verbs and times. Philosophical Review 66: 143–60. Verkuyl, Henk (1972). On the Compositional Nature of the Aspects. Dordrecht: Reidel. (1989). Aspectual classes and aspectual composition. Linguistics and Philosophy 12: 39–94. (1993). A Theory of Aspectuality: The Interaction between Temporal and Atemporal Structure. Cambridge: Cambridge University Press. von Stechow, Arnim (2002). Temporal prepositional phrases with quantifiers: Some additions to Pratt and Francez (2001). Linguistics and Philosophy 25: 755–800. Wertheimer, Max (1923). Untersuchungen zur Lehre von der Gestalt. Psychologische Forschung 4: 301–50. English translation of selected sections in Ellis (1938: 71–88). Page numbers refer to the English version. Winter, Yoad (1995). Syncategorematic conjunction and structured meanings. In Proceedings of SALT 5 (eds. M. Simons and T. Gallaway), pp. 387–404. Ithaca, NY: Cornell University. Wolff, Phillip (2003). Direct causation in the linguistic coding and individuation of causal events. Cognition 88: 1–48. (2007). Representing causation. Journal of Experimental Psychology: General 136: 82–111. Woodward, Amanda (1998). 
Infants selectively encode the goal object of an actor’s reach. Cognition 69: 1–34. Wurmbrand, Susi (2003). Infinitives: Restructuring and Clause Structure. Berlin: Mouton de Gruyter. Zwart, Jan-Wouter (2007). Layered derivations. Paper presented at University College London. Zwicky, Arnold and Sadock, Jerrold (1975). Ambiguity tests and how to fail them. In Syntax and Semantics 4 (ed. J. Kimball), pp. 1–36. New York: Academic Press.
Author Index Abels, K. 33, 34, 36, 169, 225 Abelson, R. 89, 126 Aissen, J. 25, 26 Asher, N. 1, 186 Bach, E. 1, 3, 38, 44, 46, 57, 158 Baird, J. 89 Baker, M. 244 Baldwin, D. 89 Baltin, M. 172 Beavers, J. 63 Behne, T. 89 Belletti, A. 7–9, 14, 29 Bittner, M. 43, 52, 53, 82, 91 Bošković, Ž. 33, 231, 232 Boeckx, C. 198, 199, 240 Borer, H. 63, 96 Borgonovo, C. 30 Brinjak, J. 204 Browning, M. 14, 22, 130, 136 Cable, S. 145–147, 235 Call, J. 89 Carpenter, M. 89 Cattell, R. 6–9, 11, 23, 29, 130, 193 Chomsky, N. 6, 9–12, 14–19, 21, 23, 28, 35, 36, 40, 87, 89, 107, 122, 124, 131, 172, 201–203, 209, 213, 224, 226, 230, 231, 243, 244 Chung, S. 25 Cinque, G. 16–18, 21, 23, 28, 118, 122, 135, 166, 175, 196 Clark, A. 89 Crain, S. 208 Csibra, G. 89 Culicover, P. 5, 75, 133, 205, 230, 237 Davidson, D. 3, 38, 43, 49, 68–72, 105, 109 Davies, W. 187
Demonte, V. 198–200 Dowty, D. 39, 44, 57, 61, 81, 88, 98, 214, 222, 241 Dresher, E. 144 Dryer, M. 25, 194 Dubinsky, S. 187 Elbourne, P. 28, 201–203 Engdahl, E. 9, 11 Ernst, T. 118 Erteschik-Shir, N. 5, 127, 179–181, 192 Faltz, L. 72 Faraci, R. 8, 130, 218, 221 Fitch, W. T. 213 Fodor, J. 4, 44, 45, 49–51, 55, 73, 79, 81, 83, 84, 91 Fox, D. 19, 231, 232 Galanter, E. 87–89 Geis, J. 211 Georgopoulos, C. 25 Gergely, G. 89 Geurts, B. 185, 186 Gibson, E. 208, 230, 232, 233, 236, 238 Goldsmith, J. 5, 19 Grohmann, K. 33, 34, 225 Hale, K. 127, 207, 211, 216–218, 221, 222, 244 Harley, H. 211 Hauser, M. 213 Hegarty, M. 17, 180, 238 Heim, I. 71, 185 Higginbotham, J. 49, 69, 71, 72, 104–106, 109, 111, 115, 210 Hornstein, N. 144, 169, 171–173
Huang, C.-T. J. 8, 9, 11, 14, 20, 21, 23, 27–29, 129, 130, 175, 192, 200, 202, 204, 224 Hulsey, S. 231, 232 Jackendoff, R. 5, 61, 63, 75, 89, 91, 118, 133, 170, 205, 213, 214, 230, 237 Johnson, K. 20, 21 Jones, C. 29, 130, 131, 218 Köhler, W. 85, 86 Kamp, H. 44, 51–53, 104, 105, 110, 111, 113, 182 Karttunen, L. 181, 189 Kayne, R. 9, 11–14, 23, 24, 27, 29, 193, 201, 244 Kearney, K. 10 Keenan, E. 72 Kehler, A. 5, 19, 77 Keyser, S. J. 127, 207, 211, 216–218, 221, 222, 244 King, R. 194 Kiparsky, C. 179–182 Kiparsky, P. 179–182 Kluender, R. 5, 23, 177, 230 Koffka, K. 2 Koontz-Garboden, A. 63 Koopman, H. 22, 193 Kratzer, A. 71, 211, 218 Krifka, M. 57, 61, 152, 214 Kroch, A. 16 Kuno, S. 5, 28, 201, 202, 230, 231 Lakoff, G. 5, 19, 96, 127, 207, 244 Landau, I. 198 Landman, F. 53, 104 Langendoen, D. T. 180–182 Larson, R. 218–220 Lasersohn, P. 46, 104, 105, 107–110, 113–115 Lasnik, H. 17, 18, 35, 231, 232 Lebeaux, D. 231 Levin, B. 61–63, 99
Levine, R. 10, 11, 13, 28, 172, 201 Lewis, D. 51, 74, 76, 91 Link, G. 3, 38, 46, 57, 158 Littman, R. 204 Longobardi, G. 12, 13, 29 Müller, G. 20 Marantz, A. 211 May, R. 14 McIntyre, A. 159 Meltzoff, A. 89 Merchant, J. 194 Meyers, C. 204 Miller, G. 87–89, 230 Minsky, M. 126 Moens, M. 44, 58, 74 Moll, H. 89 Morgan, J. 5, 230 Mourelatos, A. 44, 98 Neeleman, A. 30 Newell, A. 87, 89 Nissenbaum, J. 19, 231, 232 Nunes, J. 20, 29, 36 Parsons, T. 44, 57, 98 Partee, B. 72, 97 Pesetsky, D. 16, 33 Pietroski, P. 3, 44, 90, 160 Pinker, S. 213 Pollard, C. 17 Postal, P. 17, 22, 172, 197 Pribram, K. 87–89 Pullum, G. 205 Pustejovsky, J. 44, 57, 99 Pylkkänen, L. 211 Ramchand, G. 43, 61, 63, 96, 127, 207, 211–218, 221–223 Rappaport Hovav, M. 61–63, 99 Reinhart, T. 74, 96 Rizzi, L. 7–9, 14, 16, 29, 35, 122 Roberge, Y. 194
Rooth, M. 72 Rosenbaum, P. 220 Ross, J. 6, 19, 21, 24, 28, 122, 174, 177, 179, 180, 199, 201, 226 Rouveret, A. 8, 9, 172 Sabel, J. 20, 26 Sadock, J. 138, 139 Sag, I. 10, 11, 13, 17, 28, 201 Saito, M. 18, 33 Sauerland, U. 28, 201–203 Savin, H. 180–182 Saylor, M. 89 Schank, R. 89, 126 Shaw, J. 87, 89 Simon, H. 87, 89 Smith, C. 57–61, 63, 76, 88, 227 Sportiche, D. 22 Sprouse, J. 204 Steedman, M. 44, 58, 74, 89, 208 Stepanov, A. 24–27, 29, 193, 195, 231, 232 Stowell, T. 171 Svenonius, P. 216, 217 Szabolcsi, A. 5, 175, 230 Takahashi, D. 18–21, 35 Takahashi, S. 231, 232 Talmy, L. 74, 77, 80, 81, 84, 91, 95 Taraldsen, K. T. 9 Taylor, H. L. 205 Tenny, C. 61, 214 Tolman, E. 86, 87
Tomasello, M. 89 Torrego, E. 33 Toyoshima, T. 20 Travis, L. 211 Truswell, R. 145, 172, 194, 199, 230 Uriagereka, J. 20, 21, 23, 28, 29, 35, 36, 129, 224 Van Bentham, J. 52, 53, 104 Van der Sandt, R. 127, 181, 182, 186 Van Riemsdijk, H. 169, 171, 193, 194 Van Valin, R. 80 Van Voorst, J. 58 Vendler, Z. 30, 39, 44, 55–60, 97–101, 103, 150, 241 Vergnaud, J.-R. 8, 9, 172 Verkuyl, H. 44, 49, 58, 59, 61, 98, 99, 101, 167, 214 Von Stechow, A. 111 Weinberg, A. 144, 169, 171–173 Wertheimer, M. 2 Wilkins, D. 80 Winter, Y. 110, 115 Wolff, P. 4, 49, 74, 75, 77, 83, 84, 90–97 Woodward, A. 89 Wurmbrand, S. 145, 146 Zwart, J.-W. 20 Zwarts, F. 5, 230 Zwicky, A. 138, 139
Subject Index About 159 Accomplishments 30, 32–34, 44, 56–62, 64–67, 76, 88, 89, 95, 98–103, 132, 136, 149, 151, 153–157, 160–162, 215, 225, 240, 242, 243 Achievements 30, 32, 33, 56–63, 65–67, 98–103, 132, 133, 136, 137, 151, 152, 154–157, 161, 162, 168, 240, 242, 243 Activities 30, 32, 34, 56–59, 61, 65, 66, 88, 97, 98, 132, 136, 137, 150–160, 166, 215, 225, 227, 240, 243 Adverbials 71, 104, 111, 112, 231 Agent-oriented 100, 159, 167 Temporal 45, 46, 50, 61, 104–107, 109, 112, 113, 115, 177 Agentivity 3, 4, 39, 75, 79–85, 89–91, 94–103, 125, 131–133, 151, 156–166, 168, 221–223, 242, 243 Implicit agent 133, 134 Agree 35, 36 Akan 26 Alternately 46, 104, 107–110, 113, 114 Answers 31, 33, 139, 241, 242 Antilocality 33–37, 225, 243 Around 159 Aspectual classes 30, 32–34, 39, 44, 45, 49, 55–67, 98–103, 132, 136, 155, 161, 168, 227, 240, 241 Atelicity 30, 49, 59, 60, 150, 152, 153, 214–216, 225 Auxiliaries 107, 113, 117, 118, 126, 144, 176, 178 Bare past participial adjuncts 167, 198 Bare present participial adjuncts 30, 32, 33, 36, 37, 66, 67, 127, 129, 131, 137,
145–165, 168, 196–200, 225, 235, 240–244 Syntactic smallness of 67, 145–147, 150, 160, 165, 241 With prosodic break 146 Barriers 14, 15, 224–226 Behaviorism 85, 86 Binding conditions Condition A 219 Condition C 231, 232 Conditions on bound variable anaphora 220 Boundedness 215, 223 In nominals 57, 61, 64, 214, 215 Of events 88, 89, 227 Of paths 215 Open- and closed-scale properties 215 Temporal 57 Causation 32, 44, 51–54, 67, 69, 75–79, 81–84, 90–94, 96, 125, 126, 135–139, 150, 151, 153–156, 161–163, 171, 211, 212, 242, 243 Direct 44, 45, 50–55, 64, 66–68, 73–77, 79, 81–84, 90–94, 97, 103, 125, 147, 149, 151, 156, 157, 160, 162, 209, 210, 212, 213, 217, 228, 243 Physical 90 Psychological 90 Transitivity of 51 Causatives 221 Lexical 44, 50, 55, 76, 81, 92, 94 Periphrastic 44, 92 Chamorro 25 Chinese 200 Clause Nonfinal Incomplete Constituent Constraint 202, 203
Coarse-graining 43, 51–53, 67, 82, 83, 90, 95, 97 Coercion 44, 59, 61, 123, 225, 229, 243 Cognitive grammar 229 Coherence relations 77 Colourless green ideas 124, 228 Complement clauses 116, 169, 174, 175, 177–179, 188, 189, 191, 199, 205 Of bridge verbs 177–179, 188, 189 Of factive predicates 40, 127, 179–189, 192, 205 Condition on Extraction Domain 8, 14, 18, 20, 21, 27, 28, 129, 175, 192–195, 198, 200, 205, 206, 232 Connectedness 11–13, 23–27, 201 Contingent relations 45, 51, 74, 77–79, 81–83, 103, 123–126, 131, 132, 135–137, 139–142, 145, 147, 166, 168, 170, 171, 210, 217, 218, 223, 228, 242 Control 130, 134, 146, 198, 220 Coordinate Structure Constraint 6, 18, 19 Conjunct constraint 123, 233 Exceptions 19 Across-the-board movement 10 Pseudocoordination 19 Extension to adjuncts 18, 19 Core events 39, 58, 67, 95–97, 99, 103, 118, 125, 132, 147–151, 153, 156, 157, 161, 162, 211, 217, 223, 228, 238, 241, 243 Maximal 58, 67, 95, 150, 151, 160 Counterpart relation 186 Culmination 44, 58–68, 96, 98, 99, 101, 102, 125, 150, 152, 154, 156, 157, 160, 162–164, 215, 243
Danish 199 Dense order 51–53 Dependency Locality Theory 208, 232–236, 239 Discourse referents 208, 232–237
Do so-ellipsis 138 Dualism 3 Dutch 22, 23, 196, 199
Edge Features 226, 243 Empty Category Principle 8, 14, 15, 174 Enablement 31, 45, 68, 74–85, 90, 91, 93–95, 97, 103, 125, 126, 135, 136, 139, 147, 148, 209, 210, 221, 228 enable 77, 79 Transitivity of 83, 84, 103 Enriched interpretation 45, 74, 75, 77–79, 82, 126, 135, 137, 140, 141, 145, 168, 228, 242 Event groupings 157–161, 163, 165, 166, 168, 176, 178, 187, 210, 211, 217, 221, 233–238, 241, 243 Event Identification 71, 72, 162 Event variable 3, 39, 46, 47, 49, 69, 71, 72, 104, 105, 109, 111, 115, 116, 124, 170, 176, 210 Binding of 49, 71, 104, 106, 111, 116–118, 124, 176–178, 210 Extended events 39, 96–100, 103, 125, 132, 147, 148, 156, 157, 211, 217–218, 221, 222, 228, 238, 241, 244 Facilitators 226, 243 Facts 1, 186 Faroese 199 Fodor’s Generalization 45, 49, 50, 55, 73, 79, 81, 83, 84, 91 Force dynamics 77, 84, 91, 97 Frames 126 French 22, 23, 197, 199, 200 Prince Edward Island dialect 193
Gbadi 193 Generative Semantics 96, 207 German 199 Germanic 193, 194, 199 Gestalt psychology 2, 85, 213
Subject Index Goals 75, 79–87, 89, 91, 94–98, 100, 101, 103, 125, 131, 159, 209, 218, 221, 241 Subgoals 87 Gradience 11, 14, 34, 122–124, 135, 139, 141, 192, 200, 233, 239 Greek 196, 199 Height effects 15, 30, 105, 115–118, 122, 125, 126, 135, 142, 145, 200, 221 Hindi 221 Icelandic 199 Immediately 52, 53 Inchoation 59, 60, 69, 97 Initiation 212, 221, 223 Inner aspect 58, 150 Insight 85, 86 Instants 53, 111–114 Intention 3–4, 85, 89, 90, 93, 94, 96, 97, 99 Interface-based locality theory 5, 37, 40, 121, 135, 165, 167, 188, 192, 209 Intervals 53, 58, 110–115, 117, 176 Intervention 226, 241, 243 Islands 18, 224 Strong 4, 17, 21, 23, 28, 122, 178, 195, 199, 236, 240 Weak 4, 15–18, 21–23, 28, 29, 122, 123, 177, 180, 192, 195, 200, 236, 238, 240 Wh-islands 15–17 Italian 8, 16, 17, 196, 197, 199 Iteration 59 Japanese 24 Join 110, 113, 114, 158 Late merger 19, 231–239 Left branches 12, 13, 23 Lexical decomposition 39, 44, 47, 55, 58–60, 63–65, 96, 209, 241 In syntax 30, 34, 97, 127, 207, 209–225, 238, 244
Macedonian 193 Macroevent formation 43–45, 50, 54, 64, 67, 68, 74, 77, 79, 83, 85, 92, 94–96, 115, 118, 123–126, 135, 147, 149, 153, 154, 210, 227, 241 Malagasy 25 Manner–result complementarity 61–63 Markedness 194, 196, 200 Measuring out 44, 61, 88, 149, 214, 215 Minimal Distance Principle 220 Minimalism 5, 18, 20, 21, 29, 35, 37, 122, 199, 206, 216, 244, 245 Modularity 208, 227, 229, 230, 233, 238, 239 Navajo 24 Negation 107, 118, 146, 166, 167 No-intervening-cause criterion 90–92 No-intervening-cause hypothesis 90–92 Norwegian 195, 199, 200 NP Ecology Constraint 6, 9 Op 105, 111–113, 115–118, 124–126, 142–144, 176, 178, 242 Outer aspect 58, 144 Packaging 2, 44, 55 Palauan 25 Parasitic gaps 9–11, 13, 36 Symbiotic gaps 13 Passive 133, 134 Paths 208, 215, 216, 221, 223 Perfect 58, 144 Phases 36 Plans 39, 45, 82–89, 97, 98, 100, 101, 125, 132, 148, 209 Creativity of planning 87 Hierarchical organization of 87–89 Points 59, 60, 66, 132, 133, 136, 137, 227, 241 Preposition-stranding 129, 169–173, 193–194, 196, 199, 200, 202, 203
Prepositional participial adjuncts 30, 31, 76, 125, 129, 135–145, 147, 150, 166, 168, 171, 175, 196, 198, 235, 241 After 126, 131, 135, 137–142, 145 Before 135, 137–142, 145 By 125, 135–137, 142, 143, 145 Means component 137 Since 135, 142–145, 168 Upon 135, 142–145, 168 Without 166–168 Presupposition 127, 148, 178, 180, 181, 184–189, 192, 230 Projection 181–184, 186 Prevention 77 Process 44, 58–68, 88, 96, 98, 99, 101, 125, 149–152, 154, 156, 157, 160–165, 168, 212, 215, 221, 223, 243 Processing 40, 128, 206, 208, 209, 230, 232–236, 238, 239 Progressive 56, 57, 59, 60, 64, 65, 76, 97–103, 225, 227 Propositional attitudes 184–186 Purpose clauses 130, 131, 220, 221 Puzzles 5, 29–35, 37, 40, 43, 121, 125, 240 Interpretive 32–33, 36, 242–243 Restricted answers 30–31, 33, 241–242 Restricted extraction 29–30, 32, 36, 240–241 Unlikely antilocality 33–35, 37, 123, 211, 223–228, 243 Rationale clauses 8, 9, 29, 75–76, 96, 98, 125, 129–135, 147, 168, 175, 198, 218–221, 235, 240, 241 Atypical use 75, 133–134 Reanalysis 9, 171–173, 200 Reconstruction 10, 231, 232 Referential opacity 76, 80, 179, 184 Relativized Minimality 35, 174 Renumeration 21 Restructuring 9, 145–147, 160, 172 Long passive 146, 147
Result 50, 52, 60, 61, 64, 93, 118, 152, 212, 215, 216, 223 Resultatives 44, 52, 61, 64, 118, 147, 215, 218 Syntactic smallness of 147 Russian 26, 27, 196, 199, 201, 216 Scales 208, 216, 221, 223 Scripts 126 Semelfactives 60 Sentential Subject Constraint 6, 12, 123 Exceptions 25, 27 Shortest Move 18, 20, 35 Single Event Condition 38–40, 43, 45, 105, 121–124, 126, 127, 129, 130, 132, 134, 135, 137, 140, 145, 147, 154–157, 168 Single Event Grouping Condition 157–161, 165–169, 171, 174–179, 181, 187, 188, 192, 195, 200, 203–209, 223–225, 227–233, 238, 239, 241–243 Cyclic checking of 188–192, 205 Spanish 198–200 Spellout 21, 35–37, 129 Cyclic 36 Multiple 35, 36 States 1, 30, 56–60, 97, 98, 132, 133, 167, 241 Structural ambiguity 15, 117, 118, 138, 154, 160, 236, 242 Subjacency 8, 14, 15, 20, 121 Subject-of 212, 213, 222 Subjects 17, 18, 200–206, 237, 238 Successive cyclic movement 14, 15, 18, 45, 121, 127, 174, 188, 189, 191, 199, 205, 238 Swedish 195, 199, 200 Syntactocentric locality theory 5, 40, 122, 134, 135, 167, 207, 209, 228, 229, 238, 241, 243 Telicity 30, 49, 57, 58, 61, 64, 66, 88, 150, 214–216, 227
Temporal relations 48, 49, 52–54, 74, 77–79, 82, 105, 125, 137, 148, 155, 187, 228, 242 Congruence with causal relations 77, 126, 135, 139 Contiguity 77, 137, 154 Continuity 48, 50, 55, 113 Immediate precedence 52, 53 Inclusion 53, 54 Overlap 54, 66, 110, 111, 113, 148, 152–155, 157, 159, 162, 166, 168, 243 Precedence 53, 54, 78, 82, 125, 126, 135, 152, 168, 187 Immediate 152, 154, 161, 162, 242, 243 Temporal trace 152, 165 Tense 11, 104, 106, 111–113, 116, 117, 124, 169, 170, 174–179, 188, 237, 238
In adjuncts 118, 126, 132, 169, 174–179, 188, 189, 192, 195, 200, 205, 231, 233, 237 Theta Identification 71 TOTE unit 87–89 Tuki 26 Turkish 24 Tzotzil 25, 26 Undergoers 223 Vata 193 VP-coordination 45, 46, 68–73, 104–107, 112, 113, 115 Wh-test 56, 57, 59 Windowing of attention 80, 81, 95 World knowledge 32, 45, 78, 124–126, 138, 153, 162
OXFORD STUDIES IN THEORETICAL LINGUISTICS
published
1 The Syntax of Silence Sluicing, Islands, and the Theory of Ellipsis by Jason Merchant
2 Questions and Answers in Embedded Contexts by Utpal Lahiri
3 Phonetics, Phonology, and Cognition edited by Jacques Durand and Bernard Laks
4 At the Syntax-Pragmatics Interface Concept Formation and Verbal Underspecification in Dynamic Syntax by Lutz Marten
5 The Unaccusativity Puzzle Explorations of the Syntax-Lexicon Interface edited by Artemis Alexiadou, Elena Anagnostopoulou, and Martin Everaert
6 Beyond Morphology Interface Conditions on Word Formation by Peter Ackema and Ad Neeleman
7 The Logic of Conventional Implicatures by Christopher Potts
8 Paradigms of Phonological Theory edited by Laura Downing, T. Alan Hall, and Renate Raffelsiefen
9 The Verbal Complex in Romance by Paola Monachesi
10 The Syntax of Aspect Deriving Thematic and Aspectual Interpretation edited by Nomi Erteschik-Shir and Tova Rapoport
11 Aspects of the Theory of Clitics by Stephen Anderson
12 Canonical Forms in Prosodic Morphology by Laura J. Downing
13 Aspect and Reference Time by Olga Borik
14 Direct Compositionality edited by Chris Barker and Pauline Jacobson
15 A Natural History of Infixation by Alan C. L. Yu
16 Phi-Theory Phi-Features Across Interfaces and Modules edited by Daniel Harbour, David Adger, and Susana Béjar
17 French Dislocation: Interpretation, Syntax, Acquisition by Cécile De Cat
18 Inflectional Identity edited by Asaf Bachrach and Andrew Nevins
19 Lexical Plurals by Paolo Acquaviva
20 Adjectives and Adverbs Syntax, Semantics, and Discourse edited by Louise McNally and Christopher Kennedy
21 InterPhases Phase-Theoretic Investigations of Linguistic Interfaces edited by Kleanthes Grohmann
22 Negation in Gapping by Sophie Repp
23 A Derivational Syntax for Information Structure by Luis López
24 Quantification, Definiteness, and Nominalization edited by Anastasia Giannakidou and Monika Rathert
25 The Syntax of Sentential Stress by Arsalan Kahnemuyipour
26 Tense, Aspect, and Indexicality by James Higginbotham
27 Lexical Semantics, Syntax and Event Structure edited by Malka Rappaport Hovav, Edit Doron and Ivy Sichel
28 About the Speaker Towards a Syntax of Indexicality by Alessandra Giorgi
29 The Sound Patterns of Syntax edited by Nomi Erteschik-Shir and Lisa Rochman
30 The Complementizer Phase edited by Phoevos Panagiotidis
31 Interfaces in Linguistics New Research Perspectives edited by Raffaella Folli and Christiane Ulbrich
32 Negative Indefinites by Doris Penka
33 Events, Phrases, and Questions by Robert Truswell
34 Dissolving Binding Theory by Johan Rooryck and Guido Vanden Wyngaerd
Published in association with the series
The Oxford Handbook of Linguistic Interfaces edited by Gillian Ramchand and Charles Reiss
In preparation
External Arguments in Transitivity Alternations by Artemis Alexiadou, Elena Anagnostopoulou, and Florian Schäfer
The Logic of Pronominal Resumption by Ash Asudeh
Phi Syntax: A Theory of Agreement by Susana Béjar
Stratal Optimality Theory by Ricardo Bermúdez Otero
Diagnosing Syntax edited by Lisa Lai-Shen Cheng and Norbert Corver
Phonology in Phonetics by Abigail Cohn
The Theta System Argument Structure and the Lexicon-Syntax Interface edited by Martin Everaert, Marijana Marelj, and Tal Siloni
Generality and Exception by Ivan Garcia-Alvarez
The Indefiniteness and Focusing of Wh-words by Andreas Haida
Semantic Continuations Scope, Binding, and Other Semantic Side Effects by Chris Barker and Chung-Chieh Shan
Conditionals by Angelika Kratzer
Computing Optimality by Jason Riggle
Nonverbal Predications by Isabelle Roy
Null Subject Languages by Evi Sifaki and Ioanna Sitaridou
The Semantics of Evaluativity by Jessica Rett
Gradience in Split Intransitivity by Antonella Sorace
The Morphology and Phonology of Exponence edited by Jochen Trommer