DERIVATIONS
Derivations draws together some of the most influential work of one of the world’s leading syntacticians, Juan Uriagereka. These essays provide several empirical analyses and technical solutions within the Minimalist Program. The book pursues a naturalistic take on Minimalism, explicitly connecting a variety of linguistic principles and conditions to arguably analogous laws and circumstances in nature. The book can be seen as an argument for a computational approach to the language faculty. It presents an analysis of various central linguistic notions, such as Case, agreement, obviation and rigidity, from a derivational perspective. Concrete studies are provided, covering phenomena ranging from the nature of barriers for extraction to the make-up of categories, and using data from many languages including Spanish, English and Basque. This book will be of interest not only to the working syntactician, but also to those more generally concerned with word architecture. Juan Uriagereka is Professor at the University of Maryland. He is the author of Rhyme and Reason and co-editor of Step by Step. He has recently been awarded the sixth Euskadi Prize for scientific research by the Basque Government.
ROUTLEDGE LEADING LINGUISTS
Series editor: Carlos P. Otero

1 ESSAYS ON SYNTAX AND SEMANTICS
James Higginbotham

2 PARTITIONS AND ATOMS OF CLAUSE STRUCTURE
Subjects, agreement, case and clitics
Dominique Sportiche

3 THE SYNTAX OF SPECIFIERS AND HEADS
Collected essays of Hilda J. Koopman
Hilda J. Koopman

4 CONFIGURATIONS OF SENTENTIAL COMPLEMENTATION
Perspectives from Romance languages
Johan Rooryck

5 ESSAYS IN SYNTACTIC THEORY
Samuel David Epstein

6 ON SYNTAX AND SEMANTICS
Richard K. Larson

7 COMPARATIVE SYNTAX AND LANGUAGE ACQUISITION
Luigi Rizzi

8 MINIMALIST INVESTIGATIONS IN LINGUISTIC THEORY
Howard Lasnik

9 DERIVATIONS
Exploring the dynamics of syntax
Juan Uriagereka
DERIVATIONS Exploring the dynamics of syntax
Juan Uriagereka
London and New York
First published 2002 by Routledge, 11 New Fetter Lane, London EC4P 4EE

Simultaneously published in the USA and Canada by Routledge, 29 West 35th Street, New York, NY 10001

Routledge is an imprint of the Taylor & Francis Group. This edition published in the Taylor & Francis e-Library, 2005.

© 2002 Juan Uriagereka

All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

British Library Cataloguing in Publication Data: a catalogue record for this book is available from the British Library.

Library of Congress Cataloging in Publication Data:
Uriagereka, Juan.
Derivations: exploring the dynamics of syntax / Juan Uriagereka.
Includes bibliographical references.
1. Grammar, Comparative and general – Syntax. 2. Minimalist theory (Linguistics) I. Title.
P291 .U748 2002 415–dc21 2001058962

ISBN 0-203-99450-7 (Master e-book ISBN)
ISBN 0-415-24776-4 (Print Edition)
PARA ELENA, ISABEL Y EL BICHITO
CONTENTS

Acknowledgments

1 Introduction
2 Conceptual matters

PART I
Syntagmatic issues

3 Multiple spell-out
4 Cyclicity and extraction domains (with Jairo Nunes)
5 Minimal restrictions on Basque movements
6 Labels and projections: a note on the syntax of quantifiers (with Norbert Hornstein)
7 A note on successive cyclicity (with Juan Carlos Castillo)
8 Formal and substantive elegance in the Minimalist Program: on the emergence of some linguistic forms

PART II
Paradigmatic concerns

9 Integrals (with Norbert Hornstein and Sara Rosen)
10 From being to having: questions about ontology from a Kayne/Szabolcsi syntax
11 Two types of small clauses: toward a syntax of theme/rheme relations (with Eduardo Raposo)
12 A note on rigidity
13 Parataxis (with Esther Torrego)
14 Dimensions of natural language (with Paul Pietroski)
15 Warps: some thoughts on categorization

Notes
Bibliography
Index
ACKNOWLEDGMENTS
Since this constitutes the bulk of what I have written about in the last few years, to do justice to everyone who has contributed to it would take me another book. My debt to my co-authors (Norbert Hornstein, Paul Pietroski, Eduardo Raposo, Sara Rosen, Esther Torrego and my former students Juan Carlos Castillo and Jairo Nunes) can hardly be expressed; not only did they make my life easier, they also made the collaborative pieces, by far, the best ones. Similarly, I cannot repay a fraction of what I got from those who have influenced me the most, in particular those mentioned in the introduction. I am especially thankful to those linguists who sat in my classes, from Maryland and several other institutions, and those who allowed me to be on their thesis committees, at their own risk. Only for them and with them have these pages begun to make any sense.

I thank Carlos Otero for his guidance and for giving me the opportunity to put these ideas together, and the editorial staff of Routledge for making it possible, especially Rosemary Morlin. Thanks also to my editorial assistants Acrisio Pires and Ilhan Cagri, as well as to Haroon and Idris Mokhtarzada, who designed the graphs in the last chapters. Finally, I more than thank my wife for having given me a daughter who is truly in a different dimension.

Chapter 2, Section 1, “Book review: Noam Chomsky, The Minimalist Program,” is reprinted from Lingua 107: 267–73 (1999), with permission from Elsevier Science. Chapter 2, Section 2, “On the emptiness of ‘design’ polemics,” is reprinted from Natural Language and Linguistic Theory 18.4: 863–71 (2000), with permission from Kluwer Academic Publishers. Chapter 2, Section 3, “Cutting derivational options,” is reprinted from Natural Language and Linguistic Theory 19.4 (2001), with permission from Kluwer Academic Publishers. Chapter 3, “Multiple spell-out,” is reprinted from S. D. Epstein and N. Hornstein (eds) Working Minimalism, 251–82 (1999), with permission from MIT Press. Chapter 4, “Cyclicity and extraction domains,” with Jairo Nunes, is reprinted from Syntax 3:1: 20–43 (2000), with permission from Blackwell Publishers.
Chapter 5, “Minimal restrictions on Basque movements,” is reprinted from Natural Language and Linguistic Theory 17.2: 403–44 (1999), with permission from Kluwer Academic Publishers. Chapter 8, “Formal and substantive elegance in the Minimalist Program: on the emergence of some linguistic forms,” is reprinted from C. Wilder, H.-M. Gaertner and M. Bierwisch (eds) Studia Grammatica 40: The Role of Economy Principles in Linguistic Theory, 170–204 (1996), with permission from Akademie Verlag. Chapter 10, “From being to having: questions about ontology from a Kayne/Szabolcsi syntax,” is reprinted from A. Schwegler, B. Tranel and M. Uribe-Etxebarria (eds) Romance Linguistics: Theoretical Perspectives (Current Issues in Linguistic Theory 160), 283–306 (1998), with permission from John Benjamins. Chapter 11, “Two types of small clauses: Toward a syntax of theme/rheme relations,” with Eduardo Raposo, is reprinted from A. Cardinaletti and M. T. Guasti (eds) Syntax and Semantics, Volume 28: Small Clauses, 179–206 (1995), with permission from Academic Press. Chapter 12, “A note on rigidity,” is reprinted from A. Alexiadou and C. Wilder (eds) Possessors, Predicates and Movement in the Determiner Phrase (Linguistics Today 22), 361–82 (1998), with permission from John Benjamins. Chapter 15, “Warps: Some thoughts on categorization,” is reprinted from Theoretical Linguistics 25:1, 31–73 (1999), with permission from Walter de Gruyter.
1 INTRODUCTION
This book is called Derivations for two reasons. First, the work is largely derivative, especially of the research of Noam Chomsky. To some extent that describes the work of most generative linguists, but mine derives also from work by others – notably Howard Lasnik, my teacher, Richard Kayne and James Higginbotham – in pretty much the way a composer’s résumé presents variations on great pieces by the classical masters. I have no problem playing second (or nth) fiddle to these people, among other things because this allows me to perform at a level that I would otherwise never achieve, and that is both fun and (for me at least) useful. This is all to be honest with the reader: whoever dislikes the research of these linguists may well dislike mine too. Hopefully the correlation also works in the opposite direction. The second reason I call the book Derivations is more technical, and inasmuch as the term is often seen as opposing “representation,” it needs some explanation.
1 The notion of “representation” in linguistics and elsewhere

The word “representation” is used in at least two different senses. A technical use, which I will write in bold in this introduction, is common in linguistics; a more general one, in philosophy. Although these uses are somewhat connected, they must be kept apart in principle. The linguistic notion of representation is inherited from the tradition of concatenation algebras that gave rise to generative grammar. A level of representation must include:

(1) Level of representation
i. a vocabulary of symbols
ii. a procedure to form associations among symbols
iii. a class of acceptable formal objects that the particular level admits
iv. a unification procedure
v. a procedure to map different levels among themselves.

Examples of (i) are nouns or verbs at one level, consonants or vowels at another, and so on. Concatenation, or some looser association, instantiates (ii).
Instances of (iii) are valid phrases, valid syllables and so on. The idea that a sentence must have a subject and a predicate, or whatever else unifies words, is a good example of unification as in (iv). Finally, the traditional mapping of S-structure from D-structure in the Standard System illustrates the role of (v). The notions so defined are called levels of representation, and not just “levels,” because the way in which one level relates to the next is through a particular relation called “representation,” or “rho” in shorthand. This notion is actually more familiar to linguists in its “is-a” guise, the converse of “rho.” The way a level of representation is structured is through its “is-a” relations with regard to objects of the next level. For example, we can take words as output objects of a morpho-phonemic level of representation and ask how these objects relate to the next level up, that of phrases. Clearly we want to say that, in the sentence John loves Mary, for instance, loves Mary is a VP, whereas that relation is not defined for John loves. Identically, VP “rhos” or represents loves Mary. In comparable ways we would have said that Mary represents [m] [ae] [r] [i], and similarly for other relevant pieces of information in the syntactic object (e.g. a “chain,” or set of phrase-markers linked via the movement transformation, could in these terms be said to “rho” its links or component parts). A full characterization of representations is what the linguistics program is ultimately about, and the hypothesis that the language faculty is organized in terms of these particular layers of distinctive structure is obviously substantive. Indeed, it is not necessary to characterize linguistic knowledge in these particular terms. In particular, it is possible that linguistic stuff does not cohere into particular layers of structure with the characteristics just outlined. Instead of being organized in terms of relevant notions of the form in (iii) above, unified as in (iv), and mapped to the next layer as in (v), it could well be that linguistic information is scattered around a dynamic computational procedure, with no such thing as specific objects and operations of the form in (iii) ever arising in the unified terms in (iv) – thus with no overall mapping of the sort implied in (v). Or to go to the other extreme: perhaps there is a single level of representation where all relevant formal objects coexist, without there being any sense in declaring that given objects are logically prior to others, thus making at least (v) irrelevant. Linguists speak of “purely representational” systems whenever they are dealing with a formal model which does not have any derivational properties. A derivation, in the linguistic sense, is a finite sequence of computational steps, with a definite beginning and a definite end, which are taken one step at a time according to some pre-established rules. Derivational systems manipulate non-terminal and terminal symbols, according to concrete permissible productions, set in motion through a starting axiom (e.g. S → NP VP). Representations are established between non-terminals and strings of terminals. In this sense, a standard derivational system is partly representational, in fact heavily so if representations cohere into levels as described above, so that Level n-1 precedes Level n, and so on.
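For concreteness, here is a minimal sketch of such a derivational system; the toy productions and the choice sequence are illustrative assumptions, not anything proposed in this book. A starting axiom and permissible productions drive a finite sequence of rewriting steps, and the “is-a” facts the procedure records are exactly the representations holding between non-terminals and strings of terminals.

```python
RULES = {  # permissible productions; "S" is the starting axiom
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["John"], ["Mary"]],
    "V":  [["loves"]],
}

def derive(symbols, choices, is_a):
    """Rewrite the leftmost non-terminal, one step at a time, until only terminals remain."""
    for i, sym in enumerate(symbols):
        if sym in RULES:
            expansion = RULES[sym][choices.pop(0)]      # one derivational step
            sub, _ = derive(expansion, choices, is_a)   # rewrite the expansion itself
            is_a.append((" ".join(sub), sym))           # record: this string "is-a" sym
            rest, _ = derive(symbols[i + 1:], choices, is_a)
            return symbols[:i] + sub + rest, is_a
    return symbols, is_a                                # terminals only: derivation ends

string, facts = derive(["S"], [0, 0, 0, 0, 1], [])
print(" ".join(string))                                 # -> John loves Mary
for s, cat in facts:
    print(f'"{s}" is-a {cat}')                          # e.g. "loves Mary" is-a VP
```

Nothing here is temporal: the sequence of steps is a timeless logical record, in the sense discussed immediately below.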
Right away we must say that there should be no temptation to understand these dynamic metaphors (“productions set in motion,” “Level n-1 precedes Level n”) in a temporal sense. While such an interpretation has been advanced in the literature, it is by no means necessary: it is possible, I would even say desirable, to understand each rule application as a timeless logical process, as a matter of separating the linguistic competence of a speaker from how it is put to use in a given environment. A purely representational system lacks all of that computational machinery, and in this sense may stand in a super-case relation with regard to a derivational system (although not all such systems do). The way to characterize a purely representational system is through formal axioms of some sort or another, which determine, in the form of admissibility conditions, which particular set-theoretic objects are recognized as valid formal objects. Of course, a derivational system is that too, but what counts as admissible in that system is what conforms to the computational pattern sketched above.

In contrast to all that technical talk of representations, the word “represent” is given a dozen or so entries in good dictionaries, the primary ones revolving around the notions of “portraying” or “embodying” something. These uses have given rise to classical philosophical concerns over the ultimate reference of symbols. For instance, the name “Jack Kennedy” is said to represent the President of the US assassinated in 1963 (ideally one should now be looking at the man himself). Moreover, a slightly more exotic use of “represent” (listed by Webster’s dictionary as “rare”) is more pertinent to a recent philosophical tradition: “to bring before the mind, come to understand.” This is the leitmotiv behind the “representational theory of mind,” which in the works of Jerry Fodor and others has used the metaphor of the Turing machine and its computational representation as a model for mind in general. Both of those philosophical uses have some bearing on linguistics. The classical understanding of “representation” is relevant to many semantic theories, for instance. This notion is entirely at right angles with everything I have to say in this book, even those chapters where reference is discussed. Only rabidly anti-mentalist theories of reference would be incompatible with the rather abstract semantic proposals in this book, and I have nothing to contribute to this topic one way or the other. The modern philosophical understanding of “representation” is only a bit more pertinent to the ideas in the foregoing pages. Philologically, the term representation in concatenation algebras was probably borrowed from a philosophical use, although the moment that the notion “rho” is given its technical meaning, philology becomes a curiosity. Similarly, the notion “work” in physics may have its origins in British labor relations, but this matters very little to calculation in joules. Then again, I certainly provide in these pages an account of such technical linguistic notions as “command,” “Case,” “agreement,” or even “noun” and “verb.”
The philosopher concerned with the representational theory of mind may be interested in whether notions of that ilk are represented somewhere, in the sense that they are “brought (in some form) before the mind” as one “comes to understand” them (in some form, also). Of course, both derivational and representational systems in principle face this sort of question; the fact that the latter are called “representational” does not make them more or less so in philosophical terms, vis-à-vis derivational alternatives. I ought to clarify right away that – respectable though the philosophical question may be – I do not think that linguistic symbols and procedures have too much to add to its elucidation. I say this because I doubt that VP, or the first segment [m] in Mary, or any such element, represents (in the philosophical understanding of the term) a phrase or a consonant or whatever. There is no serious sense – at least no linguistic sense I can think of – in which VP or [m] (or for that matter whatever those shorthands boil down to, things like [consonant, nasal, etc.]) “bring before the mind” a given phrase or consonant, or in which one “comes to understand” those elements as a result of VP or [m]. In fact, so far as I can see those elements themselves are (they do not represent) a given phrase or a given consonant. One can, even ought to, ask what it means for VP or [m], etc. to be linguistic objects. There is a non-trivial question about what it means for those objects to be mental, which I return to in passing in the next section. But it is not clear to me how involving the extra representational assumption does anything toward clarifying this very difficult issue. I suppose that, ultimately, objects like these reduce to energy patterns of some sort, but only because in this sense everything else does – including (literally) the shapes of our various organs or the way in which our immune system operates. Needless to say, claiming that these things are (complex) energy patterns says nothing very deep, given the gap that exists between our understanding of energy patterns and the way organs, systems, mind and so on come out. But by the same standards, claiming that VP, [m] and so on represent something or other, given our present level of understanding, amounts to adding a claim about linguistic reality without much empirical basis, so far as I can tell.
2 Five differences between derivational and representational systems

In this book I try to navigate mainly the derivational ocean, to find out what would count as a (minimalist) explanation from that perspective, and what general properties we expect the language faculty to have under those circumstances. Others study representational alternatives, and hopefully we will eventually find out, with the attained knowledge, whether one alternative is superior, whether they are irreducible yet both necessary, or whether something deeper is common to both. I want to state in this introduction, though, what I take to be the main predictions of each sort of system, and what general theoretical moves favor one or the other take, as best as I understand them.
It ought to be insisted on at this point that, since there is no temporal claim being made in any of the discussion above, it would be fallacious to distinguish a dynamic derivational system from a static representational one in terms of whether or not the mind/brain acts dynamically or statically in performance. Under current understanding of all these notions (especially whether the mind/brain acts dynamically, statically, or in some mixed, or deeper way) speculations in this respect seem senseless. As I said, derivational systems may be sub-cases of many representational systems, though this is not necessary. A purely representational system is one whose filtering conditions simply cannot be expressed as internally motivated derivational steps. Optimality Theory (OT) is one such system. Traditional (Paninian) rule-based systems work on input structures to generate, in a stepwise fashion, appropriate output structures. OT theorists question this mode of operation, arguing that many grammatical forms cannot be generated in this way with both an appropriate level of explanatory adequacy and lack of internal paradoxes for the computational system. Typically, paradoxical situations arise if computational rules R1 and R2 are logically ordered, but the only way one could generate (grammatical) object O is by applying R1, then R2 and then R1 again, to the very same object. In contrast, this sort of situation is not particularly problematic for OT systems, which work on abstract, input set-theoretic objects which get filtered by the system in terms of violable (soft) output constraints ranked in various fashions, in much the same way as standard connectionist networks proceed. This constitutes the first difference of a serious sort between derivational and representational systems.
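For concreteness, here is a minimal sketch of that filtering mode of operation. The constraints ONSET and NOCODA are stock examples from the OT literature, and the syllabified candidates are invented for illustration; nothing is generated stepwise, and the candidate with the best profile of ranked, violable constraint violations wins.

```python
# Ranked highest to lowest; each constraint returns a violation count.
CONSTRAINTS = [
    ("ONSET",  lambda form: sum(1 for syl in form.split(".") if syl[0] in "aeiou")),
    ("NOCODA", lambda form: sum(1 for syl in form.split(".") if syl[-1] not in "aeiou")),
]

def evaluate(candidates):
    # Lexicographic comparison of violation profiles implements strict ranking:
    # one violation of a higher constraint outweighs any number of lower ones.
    return min(candidates, key=lambda form: [c(form) for _, c in CONSTRAINTS])

print(evaluate(["pa.ta", "pat.a", "a.pat"]))   # -> pa.ta (no violations at all)
```

Note that rule-ordering paradoxes of the R1–R2–R1 kind simply cannot arise here: there are no rules to order, only output conditions.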
Note that this difference can push matters in the derivational direction also. For instance, imagine that object O must a fortiori be described as involving rule ordering, in the sense that rule R1 has the effect of destroying the context for another instance of this rule to apply, so that only R2 can apply next. If such a situation arises, the emerging ordering may not be describable in purely representational terms, as it may well be that some crucial information that was present in the course of the derivation disappears by the end of it, and is not recoverable. In any case, both of these formal circumstances, one showing a representational bias and the other a derivational one, have to be qualified in terms of at least two things. One is the possibility that there should always be a representational residue (e.g. a copy trace) of any rule application, in which case strictly speaking no rule would entirely destroy the context for its own later application, or that of another rule, if the residue is fully informative. Second, the object O where rules R1 and R2 apply may be structured in two “cycles” C1 and C2 (i.e. C2[… C1[…]…]), so that rules R1 and R2 apply within C1, and next R1 applies again at C2. As will be clear throughout, motivating cyclic domains is a central topic of this book.

Purely derivational systems expect ultimately no stability points of the representational sort, whereby a set of representations can be adequately unified into a particular level. For example, in the transition from the Government and Binding (GB) model to minimalism, D-structure as a level of representation vanished, for empirical reasons. As a result, the system could intersperse D-structure-like objects (phrases) with S-structure-like objects (chains), in a strictly cyclic fashion. Whereas in any Extended Standard version of the system one would first complete D-structure and then go back down the phrase-marker to start generating transformations, minimalism does both things simultaneously, relying on economy principles to decide on which sort of step comes first. In that sense, minimalism is obviously less representational than GB, although standard minimalism (if there is such a thing) still retains representational residues in LF or PF, levels of representation in the sense above. Some researchers, myself included, have been exploring the possibility that this sort of residue is unnecessary too, and that the entire model can be purely cyclic, with no proper levels.

We say that a derivation converges at a level of representation like PF or LF if it meets whatever legibility conditions these levels impose; otherwise it crashes. This is the principle of Full Interpretation, demanding the legibility of syntactic representations. In its most radical version, a purely derivationalist system with no levels of representation should not have a coherent notion of convergence as such, and thus not obey the legibility conditions of Full Interpretation. This is neither good nor bad a priori: it depends on whether the system itself actually generates all and only the sorts of objects that convergence/legibility is supposed to sanction. The focus of the system, and the resources, are different, however. Thus a partly representational system can generate object O and then ask: “Does O converge according to Full Interpretation (is O legible)?” The filtering effect of a negative answer is lost in a purely derivational system: the syntactic procedure itself has to come out with the grammatical output, for there is no intermediate representational point for the system to stop and evaluate the representations achieved, proceeding onwards with only the grammatical ones in terms of what is legible to the next component of the system.

We can illustrate that point in terms of various familiar notions; for instance, a chain C understood as a set of (characteristically connected) phrase-markers K and L, that is: {K, L}.

(2) [K … [ … L … ] … ]   (Chain C = {K, L}, formed by movement)

For a pure representationalist this is a formal object obeying certain general adequacy conditions (of command, uniformity and locality among its links, for example). In contrast, the derivationalist must take a chain as nothing but the adequate – obeying command, uniformity, locality – derivational history of a computational procedure: say, movement (again, “history” having no temporal connotations). The emphasis for the representationalist is on a (complex) symbol, whereas for a derivationalist it is on a (complex) mechanism, which constitutes the second difference of a serious sort between the systems.
At this point it is appropriate to raise a potential issue with regard to the philosophical notion of representation. A philosopher may be interested in whether representing symbols is more or less troubling than representing mechanisms. If the issue of representations is scientifically sound, a symbol would seem like an a priori candidate for a representation (in the Webster definition a symbol is “something which represents or typifies another thing, quality, etc.”). In contrast, a mechanism (in the Webster sense, “the working parts of a machine collectively”) need not be a priori representational, and it could be as dull – though “clever” as well – as the immune or the motor systems appear to be. For what it is worth, one can certainly give a derivational account of obviously non-symbolic systems: computational biology has scores of examples of just that sort, where talk of representations, in the philosophical sense at least, would be utterly beside the point.

It is hard to turn that into an argument for derivational systems, for two reasons. First of all, it is not obvious that the philosophical issue, as presently formulated, has a real scientific status. A representationalist could rightly defend this view by saying that no one has a clear picture of what it means to be a symbol, and hence whether a symbol needs to be mentally represented any more or less than a (mental) mechanism does. And as for the fact that several computational accounts exist of processes in the natural world which invoke no representations, the representationalist could dismiss their significance to present concerns by claiming that they are just using the computational mechanism as a modeling tool, whereas a derivationalist assigns this mechanism an ontological status of some sort which, when push comes to shove, ought to be represented as much (or as little) as symbols are. Second, even for radically derivational systems which reduce everything to mechanisms, use of symbols at some level is not eliminable, at least in principle. It is for instance possible that the grammar generates some object according to its purely internal procedures, but the object in question simply cannot be interpreted by the interface components of the system. Strictly, this sort of object would not violate legibility conditions as such, as there are none in this extreme view; but it would not be intelligible. Intelligibility is an extra-syntactic notion, perhaps even a performative one – though not, because of that, a notion that one can shy away from. Conditions against vacuous quantification, for example, ruling out an illicit sentence like Who does John love Mary?, are very possibly of this sort; they may well have a say on whether otherwise grammatical structures, generated by standard derivational procedures, are judged acceptable by native speakers. If so, sooner or later all systems out there, be they derivational or representational, have to hit on symbols, which is the representational issue raised in these paragraphs.

The role that symbolic elements play in one’s system may not decide its representational nature, but it nonetheless has a bearing on the system’s internal shape. At stake ought to be how the system bottoms out. Symbols are arbitrary, and one must a fortiori admit one of two things regarding prime symbols: (i) the primes that the systemic apparatus arranges in this or that fashion are completely random and could have been other elements; or (ii) there is a sub-system that predicts the properties of those primes.
Of course, if (ii) obtains, one would want to understand the nature of that sub-system, and then a slippery-slope kind of argument suggests itself: the sub-system will also have to bottom out, and the same puzzle arises all over again. Nonetheless, in principle a purely derivational system can at least pursue a type (ii) approach in meaningful ways, in much the same way that one does in a science like physics, where objects at some level are interactive processes at another – and the slippery-slope concern is basically set aside. In contrast, a representational system, by its very nature, has fewer scruples with a type (i) approach. The important thing in this general view is to establish the axioms that govern the interactions among given primes, and the exact nature of these symbols is admittedly immaterial. To be sure, this difference does not, in itself, distinguish between the two perspectives: it will depend on how reality turns out to be, whether prime symbols out there are more or less random or can be successfully modeled as having internal computational properties. In any case, in principle this constitutes a third difference between the two sorts of systems.

We have seen above how, for both theoretical and empirical reasons, a derivational system may need to have cyclic characteristics. To repeat, a system without levels of representation ought to capture, in terms of the cyclic application of processes, whatever properties these levels were designed to code; in turn, to address otherwise serious internal paradoxes with rule ordering, a derivational system may need to fix a given order of rule application (for instance in terms of complexity) and then, whenever that order seems to be violated, invoke a new cycle. Given this cyclic nature of the system, it is worth asking, also, whether cycles cut computational complexity, and whether this is a desirable property for grammars to have. If it is, this may turn into an argument for a derivational system. Naturally, concerns regarding computational complexity (as opposed to other sorts of elegance) should be immaterial in a representational system. This is a fourth difference between the systems.

The possibility that computational complexity is relevant to the appropriate modeling of linguistic phenomena turns out to be a very difficult premise to establish in purely formal terms. Once again, one should not confuse the computational complexity that arises, normally in some mathematical limit, for a system of use (a performance matter) with whether complexity determines the design of the system of knowledge (a consideration about competence). Suppose one were to show that, as the class of alternative derivational objects that enters a computation grows, one cannot reach a computational decision about the optimality of the chosen derivation in polynomial time, or even in realistic time. Even if this were the case, it would tell us little about the system of knowledge, just as problems with, say, center embedding do, which make memory resources blow up. Speakers would simply use whichever chunks of structure work, not going into limiting situations, just as they avoid center embedding. Nonetheless, there are potential empirical advantages to proceeding with the desire to avoid computational complexity. Trivially, a system without levels of representation faces no need to treat unlimitedly large representations as units where dependencies can in principle hold.
In turn, if a procedure is found for breaking down representations into minimal blocks, and these turn out to be natural domains for syntactic processes to take place, then an argument for the derivational system is at hand. Conversely, if there are syntactic dependencies which are established across computationally defined cyclic domains, that would be an argument for a representational system.

Aside from the four differences I have sketched above between the two systems, all of which can be found in the specialized literature, there is a fifth difference, which to the best of my knowledge has not been seriously discussed. Consider the matter of glitches or imperfections in a system. A representational model has the formal structure of a logical edifice obeying eternal axioms; an error in such a system should constitute the end of the game. Computational systems, however, are less pristine, as familiar bugs in our personal computers readily attest. A computational glitch need not halt a program, which may find ways around it and even benefit (very rarely) from particular bugs. Once again, this is a priori neither good nor bad for modeling the linguistic system. It will depend on how it pans out, empirically. For the most part, I will have relatively little to say about this last difference, but it is something to keep in mind in principle, particularly as linguists discuss to what extent the language faculty obeys properties of design optimality, itself a purely empirical matter.

In sum, these are the serious differences existing between each sort of system:

(3) The first difference: The role of ordering
The second difference: The presence of symbols vs. procedures
The third difference: The role of prime architecture
The fourth difference: The role of computational complexity
The fifth difference: The presence of glitches
3 Symbols vs. procedures

Chomsky’s derivational version of the Minimalist Program (MP) addresses theoretical and empirical worries with Deep- and Surface-structures by denying the existence of these particular levels. If the grammar is an optimal interface with outside systems, only the latter will determine the actual levels of representation; this rules out the model-internal S-structure. In turn, the interface levels are supposed to be (virtually) conceptually necessary. Although it is unclear how this rules out D-structure (the lexicon could in principle introduce an interface with a conceptual system), empirical arguments exist against D-structure as a level of representation in the technical meaning above (see e.g. Chomsky 1995b: 188, reporting on an argument by Lasnik based on an example by Kevin Kearney). Epstein and his associates have explored the possibility that no particular level of representation is part of the system – not even LF or PF. This book, among other things, considers the conditions that the model should have in order to fulfill that sort of goal. Note that failing to meet any of the five defining conditions in Section 1 would suffice for an object not to count as a level of representation.
Possibly, MP does not need (v), given assumptions about interface optimality (i.e. if there is only PF and LF, and they do not feed one another). (iv) is also unnecessary if a formal unification is achieved in performance, a perfectly reasonable possibility. For example, a subject and a predicate would unite to form a proposition only when (Fregean) interpretation demands arise, after LF. Of course, (i) through (iii) above constitute the substantive content of each representational bit in the system (what distinguishes, say, phonology from semantics). While discussing the representational nature of these elements is central to this book, it is nonetheless clear that meeting those conditions need not be done in terms of a level of representation: those can be just properties of whatever formal objects derivations work with.

Consider in that respect Chomsky’s “bare Phrase-structure” (BPS) project, adapted from earlier work by Muysken. This thesis takes minimal and maximal projections as initial or end states of (non-vacuous) merger processes. Although they are real representational objects in the system (for instance in determining uniform chains), levels of projection are thus not primitive, and in fact they change as the derivation unfolds (e.g. X prior to and after merge, from a maximal to intermediate projection status). Epstein’s general approach to command can be interpreted in a similar vein: while this notion is real in that it determines a host of representational properties, command from X to Y is nothing but a reflex of whether X and Y are introduced in the same derivational workspace. Whether X commands Y depends only on whether there happens to emerge a K containing Y that immediately contains X; that situation will arise only if there is no separate, as it were stable, L that X is part of.

The Multiple Spell-Out (MSO) idea, a central topic of this book, was proposed in that spirit. Assume a version of Kayne’s Linear Correspondence Axiom (LCA) compatible with Chomsky’s BPS:

(4) a. Base: If X commands Y then X precedes Y.
b. Induction: If X precedes Y and X dominates Z, then Z precedes Y.

In minimalist terms (4) should either follow (from “virtual” conceptual necessity, including economy considerations in this rubric) or be a consequence of interface conditions. Assume furthermore that linearization is a PF demand. (4a) can be deduced as one optimal solution to that interface need, given pre-existing merged objects (see Chapter 3). However, (4b) does not follow from anything, so the question is whether it can be eliminated. MSO suggests that it can, if Spell-out is taken to be a mere rule applying as many times as necessary up to convergence (within economy limits). In the MSO system, natural “cascades” of structure emerge every time the derivation encounters a non-complement. Thus these units are predicted to be substantively employed, for instance in:

(5) a. A definition of “command” (relevant wherever it is employed, e.g. “distance”).
b. Focus projections in PF.
c. Scope, etc. at LF.
d. CED (and other complement/non-complement) effects in the derivation.
The latter follow if Spell-out “flattens” structure, converting an array of embedded sets (a phrase-marker) into a string of items which can be read as ultimately phonetic symbols. That string is no longer a syntactic object, hence it disallows operations within it (the “impenetrability” of cascades is thus deduced; see Chapters 3, 4 and 5 on this); a rough sketch of this flattening is given after (6) below. Conversely, phenomena not amenable to the “cascade” pattern cannot be accounted for within the limits of the narrow computation (in a standard bottom-up fashion); for instance in:

(6) a. Anti-command effects.
b. Liaison and other across-cascade effects in PF.
c. Antecedence effects at LF.
d. Combinations of these (e.g. weak cross-over, including anti-command and novelty).
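Here is a rough sketch of that flattening, with a toy [head, complement] encoding of phrase-markers assumed purely for illustration; the linearization step realizes the base clause (4a) for head-complement pairs, and the output string retains no internal structure for later operations to target.

```python
# Spell-out as "flattening": a nested [head, complement] phrase-marker
# (an array of embedded sets) becomes a string of items. The output has
# no internal syntactic structure left, so no operation can reach into
# it -- the "impenetrability" of a spelled-out cascade.

def spell_out(phrase_marker):
    if isinstance(phrase_marker, str):      # a lexical item: already linear
        return [phrase_marker]
    head, complement = phrase_marker        # the head commands its complement,
    return spell_out(head) + spell_out(complement)  # so it precedes it, as in (4a)

vp = ["saw", ["the", "neighbors"]]          # toy embedded-set phrase-marker
print(" ".join(spell_out(vp)))              # -> "saw the neighbors", a mere string
```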
Within this system, what are notions such as agreement (concord) or Case? Agreement may just be an “address system” to unify separate cascades (this would explain why it does not arise for head-complement relations, which are part of the same cascade; see Chapters 3 and 5). Case may be a mark of syntactic distinctness for identical lexical types, which would otherwise collapse into one another (see Chapter 8). Observe, for instance:

(7) DP V DP (e.g. He hit he.)
How does the grammar tell the “first” DP from the “second” in order to operate with them? By hypothesis (structure dependency) the grammar cannot use “first” and “second” notions. It may use syntactic context as token identification, but what happens prior to merge (in the numeration, a set of tokens) or after merged units collapse into PF strings (at the point of unification of separately spelled-out cascades)? There is no syntactic context in those instances, and Case may be seen as a configurational coding: e.g. “nominative” as sister to a T projection, etc. Apparently this phenomenon (distinct Case valuation) only happens locally; not across separate cascades, for instance. That is expected if it is locally (within cascades) that Case values are important.
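To see the collapse problem in (7) concretely, here is a toy sketch; the feature bundles and the NOM/ACC marks are invented for illustration. Two tokens of the same lexical type, modeled as bare feature sets, collapse into one; a distinct Case value restores syntactic distinctness without any appeal to “first” or “second,” which structure dependency withholds from the grammar.

```python
# Two tokens of the same lexical type, modeled as bare feature sets,
# collapse into a single object in a set-theoretic numeration.
he = frozenset({"D", "3sg", "masc"})
numeration = {he, he}
print(len(numeration))              # -> 1: the two "he" tokens have collapsed

# A Case value makes the tokens syntactically distinct again.
he_nom = he | {"NOM"}
he_acc = he | {"ACC"}
print(len({he_nom, he_acc}))        # -> 2: distinct under Case valuation
```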
Aside from descriptive consequences, that approach has indirect consequences if coupled with assumptions about a narrow mapping into the semantics, a broad minimalist assumption:

(8) Transparency Thesis
Semantic relations arise from computationally established syntactic relations.

Then, all other things being equal, (9) and (10) should hold (the latter a central theme of Chapter 8):

(9) Semantic interpretation of DP dependency (binding, control, licensing) is under command.

(10) Elements marked for syntactic distinctness (e.g. through Case value) are interpreted as semantically distinct (not co-referent or co-variant), a local obviation effect.

Clearly, then, the general derivational dynamics of MSO have important representational consequences. One of the main goals of this book is to substantiate this general point.
4 Computational complexity

A different kind of argument for a derivational approach is based on the possibility that computational complexity is relevant to the appropriate modeling of linguistic phenomena. Chomsky has made that argument central to his last couple of papers. MSO shares those papers’ characteristic of cutting down computational complexity, although it arises for different reasons there. For Chomsky the issue emerges technically – as a way of cyclically eliminating unwanted features – and is taken to be primitive: the desire to cut computational complexity makes the system assume a proviso about “phases” in the computation. As we saw in the previous section, for MSO the issue arose as an attempt to deduce Kayne’s LCA from computational dynamics. Thus, although cyclic Spell-out does cut down computational complexity, that does not motivate its presence within the system – convergence does.

Another difference between Chomsky’s system of phases and MSO is in the status of “impenetrability.” Since Chomsky’s phases are stipulated (in terms of complexity avoidance), he can stipulate, further, whether these are transparent domains, and if so under what circumstances (e.g. at the “edge”). But MSO cascades (in Chapters 3, 4, and 8) are theorematic, hence cannot be tampered with. Adding to or subtracting from cascades would have to be done by way of manipulating the reasoning that yields them as side effects. Three other chapters dealing with “impenetrability” are 4, 6 and 7, the latter two collaborations with Hornstein and Castillo. All these pieces have as a side effect the emergence of cyclic domains with island properties (across agreeing subjects, at LF and successive-cyclically, respectively). But it is important to note that in all these instances a barrier for an operation (and hence a cycle of sorts) arises because of derivational properties, not just in order to cut derivational complexity, although of course that is a consequence as well.
5 Ordering
We have examined syntactic notions that either make no obvious representational sense (Why should specifiers, and not complements, agree? Why should arguments, and not predicates, host Case differences?) or are as natural as representational alternatives that do not arise (Why are dependencies under command, and not anti-command?). But there are also notions that either seem as natural as derivational alternatives that do not arise, or make no obvious derivational sense. Among the former, consider the second argument of a binary quantifier, as in (11e):

(11) a. John likes most children.
b. [[most children] [John likes ____]]
c. The size of the intersection of the set of children and the set of entities that John likes is larger than half the size of the set of children.
d. first argument of most: children → the set of children
e. second argument of most: John likes ____ → the set of entities that John likes

While it is clear what sort of derivational syntax relates most and its first argument, children, what kind of syntactic relation holds between most and its second argument in (11e)? As Chapter 6 shows, even if we do QR, that question has no obvious answer:

(12) [XP [DP most children] … [YP John likes t]]

Whatever we say XP and YP are, John likes t is not in construction with most or its projections. Familiar modes of argument-taking are as seen in (13), which can be deduced derivationally:

(13) a. Head-complement relations (merge of head to maximal projection).
b. Head-specifier relations (merge of two maximal projections).

(13a) is used for direct, internal arguments whereas (13b) is for indirect, external arguments, in both instances as seems natural: internal arguments within the head projection, external arguments without, but still merging. Semantically, we want John likes t to be the external argument (thus the specifier) of most, but it is not even a sister to the determiner, let alone its specifier. We can stipulate some new, non-derivational specification to indicate that John likes t is the external argument of most. For instance, assuming standard QR:

(14) If (at LF) quantifier X is immediately contained within Y, then Y is X’s external argument.

But we should ask ourselves why (14) holds, and not (15) (or more radical alternatives):

(15) a. If (at LF) quantifier X is (immediately) dominated by Y, then [same as (14)].
b. If [same as (14)], then Y is X’s internal argument.

Representationally, (15) seems no more or less elegant than (14). Which makes one wonder whether QR plus (14) is the right way to go, since nothing requires it.
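As a point of reference, (11c) can be written out in textbook generalized-quantifier notation (a standard formulation from the semantics literature, not one the chapter itself commits to), with the first argument of most as the restriction A and the second as the scope B:

```latex
\mathrm{most}(A)(B)\ \text{is true} \iff |A \cap B| > \tfrac{1}{2}\,|A|,
\qquad A = \{x : x \text{ is a child}\}, \quad B = \{x : \text{John likes } x\}
```

The derivational question in the text is precisely how B, the clause John likes ____, comes to be paired with most in the syntax.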
A derivationalist ought to worry about, and try to address, these serious difficulties. Ideally, this should be done in such a way that leaves no room for doubt: with the certainty that the solution is not “patch-work,” and with the possibility that it cannot be replicated in natural representational terms. For the latter, the best situation is based on systemic characteristics that are best expressed in purely computational terms; this should be so in such radical conditions that the output of the computational procedure cannot reproduce the input, which is like saying that some information gets lost along the way. It is hard to find relevant examples of “loss of information,” among other things because, on average, linguistic processes are highly conservative. Familiar constraints on the recoverability of deletion operations, or what Chomsky calls the “inclusiveness” of derivations (no features can be added to a computation if they were not part of the input), can obviously be expressed in terms of some sort of Law of the Conservation of Patterns (see Chapters 8 and 15). That same law, however, normally prevents us from teasing apart a computational and a representational approach. At the same time, there may be places where the law does not hold, just as there are points in the universe where some physical conservation laws come into question (e.g. black holes or the early universe). One such point might involve the creation of a linguistic object which is possible only after some other, definably simpler computational object has been achieved. Although in other circumstances the “simpler” object can surface, in the concrete construction in point such an object would be a mere computational state with no representational consequence.

One relevant example is provided by the analysis of binary quantificational expressions based on the notion of reprojection, as discussed in Chapter 6. This is a sort of formal entity which changes its label in mid-derivation, in such a way that the original label could only be recovered by positing dual representations. For a sentence like most people love children, Hornstein and I argue that the unit most people – the specifier of IP early in the derivation – literally takes I as its own specifier at a later stage, resulting in a sort of giant DP, of which people is the complement and the rest of the structure in the sentence is the specifier. This provides a natural syntax for binary quantification, and as I mentioned it turns out also to predict peculiar islands that arise late in the derivation (a relation across a reprojected binary quantifier would be akin to a relation across a non-complement in the LF component, thus generally barred). A representational version of this sort of analysis, based on the dual nature of quantifiers (they are both arguments at some level and predicates at a further, derived level), seems more unnatural, in that it must postulate some artifact to account for the lost information.
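The label switch, and the loss of information it brings with it, can be sketched computationally; the tuple encoding and the labels below are invented for illustration, not the chapter’s formalism.

```python
# Reprojection as label change in mid-derivation: early on, [most people]
# is the specifier of IP; after reprojection the quantifier relabels the
# object as a (giant) DP whose specifier is the rest of the clause. The
# output alone no longer records the earlier IP label, so a purely
# representational account must posit dual representations to recover it.

early = ("IP", ("DP", "most people"), ("I'", "love children"))  # (label, spec, rest)

def reproject(node):
    _old_label, spec, rest = node      # the old label is discarded here
    return ("DP", rest, spec)          # the quantifier takes I as its specifier

late = reproject(early)
print(early[0], "->", late[0])         # IP -> DP: input label unrecoverable from output
```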
A further issue that arises with regard to rule ordering is why successive-cyclic processes exist. Suppose that, for some reason of the sort discussed above, linguistic objects are structured in cycles C2 [… C1 […]…], which happen to be impenetrable. How then do we ever manage long-distance relations? The minimalist answers I am familiar with are stipulative: they either deny successive cyclicity or they code it through some technical, “peripheral” feature which happens to drive the computation to intermediate stages. It is, of course, perfectly conceivable that the system may have come out with a different alternative: the impossibility of long-distance movement altogether. While that appears to be the case in some languages for some constructions (e.g. wh-movement), most familiar languages allow unbounded extractions.

In Chapter 7, Castillo and I suggest a very different take on this matter, based on an idea from Tree-Adjoining Grammars: in effect, there is no long-distance wh-movement. What happens in unbounded movements is that the “intervening” structure gets successively merged, in “tuck-in” fashion, between a moved wh-phrase and the rest of the structure. This eliminates artificial features, and raises the possibility that merger need not extend a phrase-marker at the root. Successive-cyclic effects arise in terms of the kinds of complementizers that can be successively tucked in. If a language allows non-identical complementizers for declaratives and questions, then a successive-cyclic effect will arise. In contrast, a language with identical complementizers for both sorts of clause types, or a situation whereby a wh-element is trying to move across another wh-element, will result in an impossible merge, as the merging items collapse into one another under identity (see Chapter 8). This amounts to an island effect.

From a representational perspective, it is not entirely clear why the phenomenon of successive cyclicity should exist, or why there should be variations among languages in this respect. Needless to say, one can code relevant properties of the shape of chains, but it is very strange that in language A wh-chains should be possible across wh-islands, while in language B only across bridge verbs, while still in language C relevant chains should be sanctioned only when strictly local.
6 Prime architecture: relational terms

But to carry a derivational program through, one cannot stop at showing how some representational notions are best seen as derivational consequences. One has to pull the trick for all such notions, or the system is still (partially) representational. The hardest problem turns out to be with such notions as V or N, which cannot be blamed on the properties of syntagmatic or “horizontal” syntax, as these are paradigmatic or “vertical” notions, which standard derivational systems are not designed to care about: in traditional derivational terms, it makes no difference whether we are manipulating V or N, or for that matter any of the ones and zeroes of a binary code. Of course, one can try to apply some horizontal syntax to such atomic-looking elements as V or N, which is what the generative semanticists did in the late 1960s. That project, in the limit, leads to the denial of autonomous syntax, making it a superfluous, basically emergent object between thought and surface linguistic representation. For both empirical and philosophical reasons I agree with those atomists who take that project to be misguided.
Empirically, it still seems to be the case, as it was some thirty years ago, that sub-lexical stuff does not have the characteristic productivity, systematicity and transparency of supra-lexical combination; plus it obeys other sorts of restrictions involving the canonicity of given expressions or predictable gaps in paradigms (the notion “paradigm” makes sense only in this domain). In a deeper conceptual sense one does not really want to split the lexical atom, if only because what to do next is not altogether clear, particularly if one’s view of semantics is as constrained as sketched in the Transparency Thesis above. But then derivationalists seem to be cornered into an irreducible position: they cannot explore a standard syntactic approach to sub-lexical stuff, short of falling into a version of generative semantics; yet if they do not, they hit a bedrock representational tier of the language faculty.

I am clearer about the puzzle than about its possible solutions. The one I attempt here springs from two sources, one more directly relevant than the other. I became seriously concerned with lexical relations when coming to the (I think) surprising realization that so-called relational terms cannot be confined to a handful of peculiar elements in the lexicon which behave at the same time as objects and as concepts – things like brother and the like. The usual treatment for these creatures is to assume they involve two different variables, one relational and the other referential. If the referential one is used, we can then come up with expressions like the Marx brothers, which is much like the Marx guys; but if the relational variable is used, then we are allowed predicative expressions like Groucho is Chico’s brother. In possessive guise both of these aspects are present at the same time, as in Groucho has a brother. The latter is important, if only because much serious syntactic work has gone, during the last decade, into this kind of syntax, after Kayne brought to the fore some interesting ideas of Anna Szabolcsi’s from the early 1980s (see Chapter 10 on these issues).

The Kayne/Szabolcsi program sought, in essence, to explore the parallelisms existing between nominal and clausal structure, both syntactically and semantically. As such, parallelisms are neither here nor there; nonetheless, if carried to the limit this general idea may point toward the possibility that what look like stable syntactic objects, nouns, in some instances at least, may encompass some basic conceptual relation. That could be anecdotal (the view that this, in the end, only happens for “relational” terms) or systematic; in the latter instance, surely it would have some relevance for the matter of a putative decomposition of core lexical spaces. After all, the “ur” lexical space is the noun (one can always think of verbs as relational, necessarily involving arguments which, in the end, bottom out as nominal). If that “ur” space can be relational, then it is not atomic, at least not necessarily so.

Working with Hornstein and Rosen, it became clear that relational terms are very common, particularly if we extend our observations in terms of a Kayne/Szabolcsi syntax to part-whole relations (the content of Chapter 9). In short, all concrete nouns (most of those in the lexicon) are either part of something or whole to something else; indeed, often these relations are possible in both directions (e.g. a city contains neighborhoods but is part of a nation).
common and predictable that it would be pointless to list their relational properties as lexical idiosyncrasies. What is more, after a serious syntactic analysis, it becomes obvious that not all logically possible “possessive” relations find an expression in syntactic terms. For instance, together with (16a) we find (16b), as the Kayne/Szabolcsi syntax would lead us to predict if appropriately extended to these sorts of expressions. But together with (17a) we do not find (17b):

(16) a. the poor neighborhoods of the city
     b. the city’s poor neighborhoods

(17) a. a city of poor neighborhoods
     b. *the poor neighborhoods’ city

Chapter 10 shows how this can be predicted, assuming a rich syntactic system, in terms having to do with the locality of movement (while movement of the city in (16b), from a base position to its specifier site, can be local, movement of the poor neighborhoods in (17b) is not). From the point of view of these relations being purely lexical, this gap in the paradigm makes no sense.

If syntax aptly predicts some of these phenomena, it becomes legitimate to explore them syntactically, perhaps without committing to too many hostages regarding the general details of this syntactic apparatus. In the end, what is needed comes down to a small-clause, or basic predication. From a Neo-Davidsonian perspective of the sort advocated by Higginbotham – which builds all semantic relations around basic predications – this is arguably a welcome step. Although I must admit that there is nothing traditional about these sorts of predications, which involve such nightmares as “material constitution” (e.g. this ring is/has gold) and appropriately cumbersome versions of these notions in what one may think of as “formal” terms (e.g. this organization is/has several levels). And to complicate things further, only some of the various combinations of these notions are possible; thus compare:

(18) a. A [robust [ninety-pound [calf]]]
        FORM     SUBSTANCE
     b. (#) A [ninety-pound [robust [calf]]]
        SUBSTANCE      FORM

(18b) either adds emphasis on ninety-pound or is very odd; the neutral way to say the expression is as in (18a). Similarly in other languages, where these notions manifest themselves in standard possessive guise, for instance the Spanish (19):

(19) a. una ternera de noventa libras de buena presencia
        a calf of ninety pounds of good looks
     b. (#) una ternera de buena presencia de noventa libras
        a calf of good looks of ninety pounds

This must mean that there is a certain ordering in the way in which these various relations manifest themselves in syntax, whatever that follows from.
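One way to picture that ordering (the notation here is mine, not part of the original discussion) is as a fixed order of composition: “substance” predications must combine with the noun’s space before “formal” ones do, so that for (18) we get

```latex
% Schematic composition order for (18): SUBSTANCE applies before FORM.
\[
  \text{(18a)}\quad \mathrm{FORM}\bigl(\mathrm{SUBSTANCE}(\mathit{calf})\bigr)
  \;=\; \mathit{robust}\bigl(\mathit{ninety\text{-}pound}(\mathit{calf})\bigr)
\]
\[
  \text{(18b)}\quad \#\;\mathrm{SUBSTANCE}\bigl(\mathrm{FORM}(\mathit{calf})\bigr)
  \;=\; \#\;\mathit{ninety\text{-}pound}\bigl(\mathit{robust}(\mathit{calf})\bigr)
\]
```

On this picture the deviance of (18b) registers as an illegitimate composition order, rather than as a lexical idiosyncrasy of calf.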
The sort of thesis just sketched – that some apparently complex lexical items are best seen as the reflex of some standard syntax – is still a fairly traditional syntagmatic proposal. In effect, the idea comes down to taking a characteristic of a noun which is lumped together with other features as one of its paradigmatic properties, and discharging it as some piece of syntax. Chapters 11, 12 and 13, involving small-clauses/partitives, names and propositions, respectively, discuss other such examples of the relevant sort, all analyzed in terms of roughly “possessive” syntax. Of course, these moves, as such, are not entirely unlike those undertaken by generative semanticists. If the moves are constrained, however, to instances where the various dependencies involved manifest themselves in richly syntactic (meaning systematic, productive, transparent) terms, then the atomist need not be worried. He or she can defuse this potentially troubling instance by admitting that some apparently lexical relations are not that, after all; the same concession is made regarding inflectional morphology, without the atomistic thesis being affected. Where the atomist cannot yield is in the idea that syntax, somewhere, eventually stops messing with some lexical items, more or less at the point that speakers stop having strong intuitions and more or less where systematicity, productivity and transparency break down or go away.

At the same time, once a “possessive” syntax is in place, in particular, for nominal relations of the syntagmatic sort, we can ask whether that syntax does us any good for nominal relations of the paradigmatic sort, whatever residue of those we are prepared to accept. We have seen some of that already, in (18)/(19). True, these are facts of syntagmatic syntax, inasmuch as we are determining possible combinations among successive words; at the same time, where do we code the appropriate restriction? Should it not be in some paradigmatic cut of, in that instance, a noun like calf, which must code “substance” dependents before it codes “formal” ones? That sort of ordering is familiar from other parts of the grammar where dependencies are organized. For example, we know that verbs code “themes” before “agents,” or that auxiliary selection is not random and correlates with aspectual properties of verbs. In a sense these are all vertical cuts of language (corresponding to familiar organizations among lexical spaces, such as the one that has all events implying a state, but not conversely; or a count term implying a mass, but not vice versa). Those are the ones that, if seen in sheer syntagmatic terms, worry the atomist: if we say that an event is analyzed syntagmatically as a state plus some sort of (say, causal) function, or a count term is analyzed syntagmatically as a mass term plus some sort of (say, classifier) function, we have again started the linguistic wars. Be that as it may, though, is there any advantage to thinking that the paradigmatic characteristic of nominal stuff being, for instance, concrete, stands in a kind of predicational relation to that very nominal stuff?
7 Prime architecture: categories

We know horizontal syntax to be full of hierarchies, for verbal and nominal ontologies, thematic roles, auxiliaries and so on. Curiously, our normal syntax
works without being sensitive to them – they have to be stipulated on the side. What we generally do with these objects is blame them on some outside notion, the structure of either the world or of the extra-linguistic mind. In both my collaboration with Paul Pietroski (Chapter 14) and Chapter 15, it is suggested that ascribing responsibility for the relevant orderings to those domains is unilluminating. We also submit that, at the very least, we should ask ourselves whether that other implicational, thus hierarchical, edifice that we know we are capable of as a species – namely counting – is partly responsible for the observed hierarchies in language. The idea would be that our mind is structured in those “dimensional” terms, and we use that apparatus to construct the structure of linguistic concepts. In this view, if it turns out that, say, events come out as formally more complex than states, or count nouns than mass terms, this would have nothing to do with the structure of the semantic correspondences to these terms. The culprit for this ordering would have to be found in the fact that, for some reason, the syntactic objects (event V vs. state V or count N vs. mass N) align themselves in the dimensional way that the whole and the natural numbers do, for instance.

Whatever the merit of this general approach, it would have one immediate consequence: most of the representational apparatus needed to code symbols would reduce to something with the formal status of “0,” the basic prime we need in order to build mathematical concepts. One essentially starts with a nominal formal space corresponding to the “ur” abstract concept (with no particular specification about its lexical meaning, at least in syntax), and more complex formal spaces correspond to what amounts to topological folds on this space, which boost its dimensionality to the next level. This yields hierarchies of the sort animate > count > mass > abstract, within the nominal system. In turn verbal systems can be conceived as adding a time dimension in horizontal syntax (verbs, unlike nouns, must have a syntagmatic nature, the combination with their arguments). In particular, a verb can be seen as a dynamic function, the derivative over time of the nominal space of its theme, or internal argument. Drinking a beer, for instance, can be modeled as a kind of function that monitors the change in the beer in the glass until it no longer exists, which is commonly thought of as the grammatical aspect of this sort of action. In turn verbal spaces can also be “warped” into higher dimensions through the addition of arguments, yielding familiar accomplishment > achievement > activity > state hierarchies.

What is derivational and representational in that sort of “warping” system is not easy to tell. Surely something with the notional status of “0” is still representational, but it is no longer clear whether whatever has the notional status of a derivative of a function of arguments like “0” over time need be any more representational than the computation of “command” as we saw above, or any such simple-minded computational entity. I cannot venture any speculations to decide on this interesting question simply because I lack the mathematical, computational or philosophical expertise to do so. My intention is simply to bring the matter to an appropriate level of abstraction, so that the question can be decided, or at least posed, in terms that merit the discussion.
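To make the “derivative” idea concrete (a sketch in my own notation, under the simplifying assumption that the theme’s nominal space can be tracked by a single quantity): for drinking a beer, let b(t) stand for the amount of beer in the glass at time t; the verb then behaves as a dynamic function of the sort

```latex
% The verb as the derivative over time of its theme's nominal space:
\[
  V_{\mathrm{drink}}(t) \;=\; \frac{d\,b(t)}{dt} \;<\; 0,
  \qquad b(t_{\mathrm{end}}) \;=\; 0 .
\]
```

The monotone decrease models the monitoring of the theme, and the endpoint condition b(t_end) = 0 is one way of registering the terminative (telic) aspect of the action.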
This view of things addresses the atomism vs. decomposition conflict only in part, as it is a mixed solution. In an ideal world the atomist should rest assured that horizontal syntax does not have to have the properties of vertical syntax. True, in essence I am assuming a sort of very abstract predication “all the way down,” so that whenever we find a hierarchy in language of the sort . . . A > B > C, I am in the end analyzing it as PRED(C) = B, PRED(B) = A. Although in some languages these very basic predications clearly show up in morphemic, or even lexical guise (classifiers, causativizers, etc.), their nature is significantly different from that of “normal” predicates like red (x). In this view of things, CLASSIFIER(x), for instance, when applied to whatever codes the extension of a mass term, yields a new kind of formal object: an individual, or classified extension of mass. And the point that matters to me right now is that this is not a syntagmatic entity, but a paradigmatic one. As to why that difference should translate to lack of productivity, systematicity or transparency of the relevant system, be sensitive to canonicity considerations (ultimately a matter of frequency in standardized use), or for that matter why people should have less solid intuitions about this sub-lexical stuff than supra-lexical stuff, that is anybody’s guess.

In turn the decompositionalist, given this general view, should also be satisfied with the fact that it allows for all the standard reasons we have to look inside lexical atoms: to capture lexical entailments (or similar relations) and model the structure of languages where given lexical properties show up as morphemes, or separate words. That sort of evidence is totally consistent with the model just sketched, and furthermore constrained by it.

As for the representational residue, there is little to say. One can blame it on some pattern of brain activity, or even the interlocking of patterns coming from the various organs that are possibly implicated in “ur” thoughts (in a dimensional system, it is not hard to imagine “talk” across dimensions of different organs without giving up a modularity thesis). Of course, this will not satisfy the philosopher who wants to nail linguists into having to admit some sort of representations, in the philosophical sense. The good news here, I suppose, is that this representation is so far removed from even familiar notions like V or N, that perhaps it should not be all that troubling. It poses a central question about what “thought” is, but not, perhaps, any deeper than that.
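In this format, the nominal hierarchy mentioned earlier comes out as iterated basic predication (a sketch only; apart from CLASSIFIER, the predicate names below are my placeholders):

```latex
% animate > count > mass > abstract, unpacked as PRED(C) = B, PRED(B) = A:
\[
  \mathrm{MASS}(\mathit{abstract}) = \mathit{mass}, \qquad
  \mathrm{CLASSIFIER}(\mathit{mass}) = \mathit{count}, \qquad
  \mathrm{ANIMATE}(\mathit{count}) = \mathit{animate} .
\]
```

Each application is a paradigmatic predication that boosts the dimensionality of the space it applies to, rather than a syntagmatic combination of words.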
8 Organization of the chapters

The present book is divided into two main halves: a series of standard (syntagmatic) derivational chapters, and a series of more radical (eventually paradigmatic) derivational chapters. In addition, a conceptual, introductory section sets up those concrete parts in Chapter 2. The introductory section includes a review of the Minimalist Program that appeared in Lingua (107, 1999) and my contribution to a polemic that came out in NLLT (2000 and 2001). I think that these pieces give a substantial presentation of the general derivational approach from a conceptually broad minimalist perspective.

The syntagmatic derivations section starts (Chapter 3) with an article written
in the mid-1990s, which was published in Epstein and Hornstein (1999), where the MSO system is sketched. Although less conceptual, the paper with Jairo Nunes, which appeared in Syntax (3, 2000), does a better job as Chapter 4 at presenting the technical details of this system. An empirical application to various problems of extraction is discussed in Chapter 5, which appeared in NLLT (17, 1999). The most far-reaching, and also radical, ideas within this general framework are discussed in the last paper in this section, Chapter 8, which was published in Wilder, Gaertner and Bierwisch (1996). Two other papers in the same spirit, though with slightly different implications, are collaborations with Hornstein (Chapter 6) and Castillo (Chapter 7), both of which appeared as contributions to UMDWPL (8, in 1999 and 9, in 2000, respectively); they present “reprojections” and a highly derivational account of successive cyclicity, respectively.

The paradigmatic derivations part is introduced in Chapter 9 by the collaboration with Hornstein and Rosen, which came out as a WCCFL presentation and, in an expanded fashion, in UMDWPL (2, 1994) – the version included here. Next is a conceptual, thus broader, piece that came out in Schwegler, Tranel and Uribe-Etxebarria (1998); the main ideas concerning the syntax of possession, as well as what it may entail for the larger concerns mentioned above, are sketched in this Chapter 10. Papers on names and rigidity matters (Chapter 12, which appeared in Alexiadou and Wilder 1998) and two other coauthored works on small-clauses (with Raposo, from Cardinaletti and Guasti 1995, Chapter 11) and parataxis (with Torrego, given as a paper at Georgetown and unpublished, Chapter 13 in this volume) expand on general possessive concerns. The chapters demonstrate how those issues show up in unexpected domains: e.g. the relation between a nominal and a syntactic presentation of its reference, a clause and a syntactic presentation of its truth value, or a variety of intricacies arising in the analysis of small-clauses of different kinds. Those are already notions which are normally analyzed in paradigmatic terms.

We then come to the last two chapters in this section, one that appeared in Theoretical Linguistics (29, 1999), now Chapter 15, and the collaboration with Pietroski in UMDWPL (2001), Chapter 14, where it is explicitly proposed that many familiar paradigmatic cuts on various standard categories may be analyzed in roughly possessive guise, creating various dimensions of syntactic complexity.
2 CONCEPTUAL MATTERS
In this chapter a general architectural discussion of minimalism is presented from a derivational viewpoint. Section 1 is a review of the program. Sections 2 and 3 are responses which appeared in the context of a debate over the derivational take on the minimalist perspective.
1 Book review: Noam Chomsky, The Minimalist Program1

It may already be a cliché to note that Chomsky’s latest work (Chomsky 1995b) is both revolutionary and frustrating. To address the latter, the book was published at break-neck speed and would have certainly benefited from more time prior to publication to correct typos (e.g. “with pied-piping” instead of “without pied-piping” on p. 234, second paragraph) and “thinkos” (e.g. the claim on p. 235 that “Merge is costless for principled reasons,” which confuses the type of operation with tokens of its application, with non-trivial consequences for the analysis of (169) on p. 346), ensure consistency of style (particularly since the first three chapters show the always healthy influence of Lasnik), sort out the non sequiturs and contradictions in Chapter 4 and guarantee the reliability of the index. Despite these imperfections, the book is a work of genius and I urge readers to overlook these minor defects and persevere with it, in much the same way as an audience at a performance by a legendary artist would ignore small technical inaccuracies.

With all that out of the way, I will refrain from repeating the technical wonders of the Minimalist Program (MP), among other things because the very able work of Freidin (1997) and Zwart (1998) already does it justice, in widely accessible and very clear reviews. My interest is slightly more conceptual, which is where I think MP is truly unique. Concretely, the main focus of this review will be to compare the theory of grammar presented in MP (and its assumptions) to its predecessor GB (Government and Binding theory), with which it shares so much and from which it is so very distant.

1.1 Modularity

GB was taken to be a modular system where different sub-components of grammar (Case, Theta, Binding, etc.) conspire to limit the class of admissible
objects. This was a somewhat peculiar architecture from the point of view of Fodor’s famous modularity thesis (Fodor 1983). For him, there are a variety of isolated mental modules (language faculty, visual system, etc.) connected among themselves by a central system. It is not within the spirit of strict modularity to open the gate to modules within modules (and so on), which in the limit reduces to a connectionist (anti-modular) network. But in GB, for instance, the (sub-)module of Binding Theory did have further (sub-sub-)modules: Condition A and Condition B could be independently (in NP-t or pro) or jointly satisfied (in PRO), or not at all (in wh-t). Indeed, because of the character just discussed, GB could be, and in fact is being, modeled as a connectionist network, as in recent OT proposals. (The only non-connectionist aspect of OT is that, in the classical version of Prince and Smolensky (1993), summations of lower-ranked constraint violations may not outweigh higher-ranked ones; I understand that this has changed in recent proposals.)

Constraints on X′ structures, theta configurations, Case and Binding relations, and all the rest, do lend themselves to the OT take on language and mind. Thus, X′ constraints, say, are ranked highest (depending on whether nonconfigurationality really exists), and others are ranked in this or the other fashion, yielding variation. The point is: the GB architecture, although sympathetic to Fodorian terminology, was in fact allowing an anti-Fodorian loophole. Of course, a connectionist network can model just about anything; some things, though, it models better than others. If we take the input to the network to be a completely trivial (non-articulated) set-theoretic object, then it may or may not be the case that differently ranked constraints do the limiting job, in the meantime yielding variation. This is right or wrong, but not vacuous. More or less explicitly (see footnote 124 to Chapter 2 of Chomsky 1981), the GB input to the phrase-structure component was Lasnik and Kupin’s (1977) sets of monostrings, themselves described from an unconstrained class of set-theoretic objects that was filtered out through admissibility conditions of various sorts. All other GB modules could be stated as admissibility conditions on those initial set-theoretic objects, and representational versions of the system assumed just that. Which is indeed, more or less reasonably, modeled in OT terms.

MP does away with any natural connectionist modeling, since the theory has no modules – only features (p. 225ff.). It could be thought that a feature is just “a tiny module,” but this is really not the case. Whereas a Case module, for example, allows you to say anything you want about its internal workings (whether it invokes command, adjacency, directionality, etc.), a Case feature is a formative of the system, and can only be manipulated according to whatever laws the system as a whole is subject to. It makes no sense to claim, for instance, that (only) Case features are assigned under adjacency; on the other hand, it made perfect sense to say that a configuration of Case assignment which does not satisfy adjacency between the assigner and the assignee is ill-formed. The minimalist thinking is dramatically more limited.

MP presents a strongly inductive system, where operations work in a strictly “inside-out” fashion to produce larger objects, which in turn become even larger
through movement, and so on (p. 189). Whereas GB’s sets of monostrings could be a reasonable OT input, or GEN (indeed, even the more basic sets of mere strings could), it is senseless to say that MP’s Merge yields something remotely like a representation that OT constraints (of any sort) can manipulate. Nothing gets manipulated by, say, theta requirements until it is merged; no phrase can have anything to do with Case until it is moved (p. 258ff.), etc. The entire system is based on a complex inter-leaving of operations that invoke interactions and construct objects literally at the same time. You could, of course, model that with a connectionist network: having GEN be the output of Merge or Move – but that is trivially uninteresting.

If there is a criticism to be made of MP in all these respects, it may be that it still is (not surprisingly) too similar to its GB parent. Thus, the program still analyzes derivations and representations in terms of phi (Case and agreement) features (p. 146ff., etc.), wh-features (p. 266ff.), or categorial (N, V, D . . .) features (e.g. p. 349ff.), obviously because the corresponding phenomena were understood in terms of modules that, in some cases, went by those very names. In most instances this leads to no particular surprises, but it does involve the occasional nightmare when it comes down to such classics as the Extended Projection Principle (pp. 232, 344) or Successive Cyclic wh-movement (e.g. p. 301ff.) – for which no real analysis is provided. Unfortunately, it is unclear (judging from Chomsky 2000, or his Fall 1997 lectures) that any solutions to these problems are forthcoming.

1.2 Government

If something was the hallmark of the traditional GB model, it was the ever-present theme of government. Conceptually, government was quite significant: it served as the unifying criterion underlying all modules. It was a welcome notion too, since that sort of thing is obviously unpleasant to a connectionist: you do not really expect unifying notions to emerge in the interactions of randomly ranked constraints; why should Case constraints care about movement constraints, say? But of course, the grand unifying role was more rhetorical than anything. For X′ and theta theories, only government under sisterhood ever mattered; for Bounding Theory, sisterhood plus (perhaps) exceptional government was relevant; for Case and Binding theories, clearly exceptional government, with some added complications posed by specifiers, which are technically not governed by the head; for Control theory, it was not even obvious that government made a difference at all. Calling all these notions “government” was the only thing that unified them.

MP bites the bullet and, riding on a wave that was already in motion in GB studies of the late 1980s, it flatly denies the theoretical possibility of invoking government (p. 173). This is quite an interesting move, both theoretically and conceptually.

First of all, absence of government leads to all sorts of twists and turns of MP, the most intriguing ones having to do with the treatment of exceptional Case
marking and related topics (p. 345ff.) – now by necessity a matter of LF under local conditions (see p. 147ff.). The door for such analyses was already opened in Chomsky (1986b), which invoked LF movement of associates to their corresponding expletives, to capture their “chain” behavior (see also p. 156). Only a small leap of faith moves us from that long-distance relation to one where no expletive awaits the moved element, and instead a mere relation of grammar (Case checking) obtains (p. 174). The process has to be checking, and not assigning, Case, because the covertly moving element must carry its Case from the lexicon, or else it would never fork to PF in the appropriate guise (see p. 195).

Second, and more importantly, exactly what is it that disallows the theorist from using government? That is: “what is ‘minimalist’ about Chomsky’s getting rid of government?” Or still in other words: could government ever come back? If, as minimalism unfolds, it sticks to its guns, government should never come back. Government is to language what weight or motion are to heavenly bodies: a side effect. Of course, nothing has really been wasted by studying government; all that knowledge is something that any system will have to account for. The bad news, though, is equally interesting: we do not get rid of government because it did not work, but because it was ugly!

Government was, first of all, ugly in itself – awfully hard to define coherently. Second, government was theoretically ugly in that, to make it work as it was supposed to (across modules), one had to keep adding “except” provisos or disjunctive clauses. If MP is right and, as Chomsky puts it (1986b: 168), “the hypotheses are more than just an artifact reflecting a mode of inquiry,” there may be something deep about the sort of beauty found in the language faculty, so much so that it restricts our ways of theorizing. Which is to say that, even if it turns out that a theory of language is better off without government, MP would assume no government regardless – so the rhetoric goes. That is, the decision to eliminate government from the theory does not seem to be just the methodological result of Ockham’s Razor, but rather a vague ontological intuition having to do with language being intrinsically beautiful, or “perfect.” Since that is an admittedly crazy stand to take, we should make much of this speculation.

1.3 Economy

The best illustration of the bluntest move of MP is the concept of economy. One may have thought that Chomsky goes into the streamlined program he explores because this is the best available theory compatible with the data. This, however, is not obviously the case. Consider, for concreteness, one of the classic traits of GB, the Empty Category Principle (ECP) alluded to on p. 78ff., which was proposed in Lasnik and Saito (1984).

Lasnik and Saito’s piece was, I think, misunderstood. They were the precursors of a single transformational process (“do whatever anywhere to anything”) that massively overgenerates, to then constrain the excess baggage in terms of principled (intra-modular) requirements. Their GB logic was as impeccable as
their explanation was clear. Yet just about everyone was troubled by derivations that allowed movement “back and forth”; but why? Was that not the best theory with the fewest assumptions, with reasonable interacting theoretical elements that were there (to either make use of, or else explain away)? Yet MP explicitly denies us the possibility of moving “back and forth” to satisfy the ECP (the central notion last resort, via the various versions of greed explored in the book, impedes any movement that is not self-serving). Why is this so? Because that is how we are going to design things from now on. Of course, it is not the best theory, if only because it has to add a stipulation to the set of axioms (greed), no matter how many fancy terms we give it. Plainly, the theory is not as economical as the previous one, but is nonetheless about a more economical object.

I am loading the dice here to make a point. There are senses in which the emerging theory in MP is better than GB, in the Ockham’s Razor sense. It has, for example, fewer levels of representation (two instead of four, as from Chapter 3 onward), fewer modules (only one), less reference to non-trivial relations (perhaps just command, as on p. 254), and less of just about everything else. But, first of all, one has to be careful with this estimate. The system may only have two levels of representation, but it packs into definitions of various sorts all kinds of references to what used to be, say, D-structure (a multi-set of lexical tokens or numeration (p. 225), theta-relations in hierarchically configurational terms and one-to-one correspondence to arguments (p. 312), a mysterious requirement to check subject D-features (p. 232), and so on). Similarly, while the system may have no sub-modules, it does fundamentally distinguish among different types of features (p. 277) – so much so that relativized minimality is cued to these distinctions (p. 297ff.). Which is all to say the obvious: it is hard to make good Ockham’s Razor arguments; what you take here, you often have to give back over there.

Still, regardless of how many true stipulations are left in MP when someone figures out exactly what it is saying and exactly how it does what it claims to do, the program will probably stick to a rhetorical point. It does not matter. What you care about is not how many stipulations you need (at least, you do not any more than you do in any other science); rather, the fuss seems to be all about how “perfect” language is, how much “like a crystal” it is, or like one of those body plans that enthuse complexity theorists (e.g. the Fibonacci patterns that arise in plant morphology). These are all metaphors from Chomsky’s classes, although the present book only indirectly alludes to these issues (e.g. on p. 1 or p. 161ff.; see also the first few pages of Chomsky 2000).

1.4 A “mind plan”

But of course, those considerations are not new to Chomskyan thought. Certainly, there are aspects of MP, and even more so of its follow-up (Chomsky 2000), which closely resemble Chomsky’s (1955) masterpiece. These include a heavily derivational model, a highly dynamic system where chunks of structure
are merged and moved and manipulated as the derivational cycle unfolds and, in a word, the flexibility of a derivational flow that makes one seriously compare the linguistic “mind plan” with other “body plans” out there in nature, which are unmistakably flexible, dynamic and derivational (in the mathematical sense of the term; cf. the derivational models of plant or shell growth that stem from Aristid Lindenmayer’s work, which is explicitly based on Chomsky’s rewrite rules).

This “wormhole” between Chomsky’s last work and his first is, I think, very important for two reasons. One is intrinsic to MP, and clearly noted on p. 168. What are we to make of a system that is “perfect,” in the sense of MP – that is, presenting “discrete unboundedness,” “plastic underdeterminacy” and “structural economy”? Perhaps Chomsky’s greatest intellectual contribution so far is having the world acknowledge that there is something biological to grammar. But what is biological about those “perfect” properties? If this question is not addressed, a serious crisis lurks behind MP: how a seemingly artificial (“perfect”) program is to be squared with the traditional Chomskyan naturalistic goal. The fact that some modern biology is going in the direction that Chomsky foresaw, indeed decades ago, is not only a way out of a crisis that is at that point merely methodological (why are organisms that way, in general, if standard neo-Darwinism has nothing to say about their “perfection”), but it is also a tribute to one of the deepest glories of MP: that it presents in detail what might be a reasonable “mind plan” in a concrete organism, us.

More generally, this biological connection (albeit with the new biology) may be of use to those of us who think of ourselves as standard syntacticians. Let us not forget that, when all sound and fury is over, MP as it stands has relatively little to say about the ECP and all it was designed to achieve – not to mention scores of other issues. This is neither here nor there, but the question remains whether solutions are to be found within the technical confines of features right here or checking over there. Of course, this is just one among several possible instantiations of what looks like a very good idea, which makes syntactic representations and their interactions awfully similar, mathematically at least, to what goes on in nature. But if the latter is the case, one need not stop where MP has. Obvious questions remain. What does it mean to have movement, variation, morphology altogether? Why are syntactic objects arranged in terms of merge and its history? What are syntactic features and categories? Why does the system involve uninterpretable mechanisms? How tight is the connection to the interface systems, in what guise does it come, how many of those really exist? This is only the tip of the iceberg.

Seen from the perspective of nature’s laws, those are truly fascinating questions, and the fact that we are even asking them is part of Chomsky’s legacy. Indeed, the possibility that “out there” and “in here” may be so closely related – even if obviously different – is, apart from humbling, some serious cause for optimism in our search, particularly for those who resist embarking on methodologically dualist quests that treat mind as different from matter. The irony is that (organized) matter might be much closer to mind than anybody had imagined, except Chomsky.
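The Lindenmayer systems alluded to above are literally parallel rewrite grammars. As an illustration of the connection (the example is Lindenmayer’s textbook algae model, not anything drawn from MP itself):

```python
# Lindenmayer's original "algae" L-system: a parallel rewrite grammar
# of exactly the kind descended from Chomsky's rewrite rules.
rules = {"A": "AB", "B": "A"}

def derive(axiom: str, steps: int) -> str:
    """Rewrite every symbol of the string in parallel, `steps` times."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(symbol, symbol) for symbol in s)
    return s

# The successive string lengths grow as the Fibonacci pattern mentioned
# above in connection with plant morphology:
print([len(derive("A", n)) for n in range(8)])  # [1, 2, 3, 5, 8, 13, 21, 34]
```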
2 On the emptiness of “design” polemics

Lappin, Levine and Johnson (2000, henceforth LLJ) are concerned about the field gravitating toward the Minimalist Program (MP) without having exhausted the possibilities of the “Government and Binding” (GB) theory.2 In this reply, I concentrate on a concrete paradigm that has resisted a GB analysis. Since LLJ attack Chomsky’s system using in large part my book Rhyme and Reason (Uriagereka 1998), I will respond in my own terms. I believe that the analysis I discuss is in the spirit of precisely those general aspects of MP that LLJ find offensive, and it constitutes a good example of what they call a “staple cliché of trendy ‘parascientific’ chit-chat.” After I present the analysis, I discuss its assumptions, and attempt to show how they are not particularly unreasonable, or profoundly different from those that any natural scientist would make. That being the case, the force behind LLJ’s criticism dissipates.

Languages differ as to whether the associate (A) in expletive constructions appears before or after a participial (P) expression. The (P, A) order is found, for instance, in Spanish (1), while (A, P) is attested in English (2):

(1) Quedaron escritos tres libros.
    remained.AGR written.AGR three books

(2) There remained three books written.

Two variables seem implicated in the distribution of the orderings. One is verbal agreement. In constructions that present default V agreement in Spanish, the (P, A) order reverts to (A, P):

(3) Hay tres libros escritos.
    have.DEF three books written.AGR
    “There are three books written.”

The other relevant variable appears to be participial agreement. Swedish examples illustrate:

(4) a. Det blev tre böcker skrivna.
       it became three books written-AGR
    b. Det blev skrivet tre böcker.
       it became written.NO-AGR three books

Note first that V does not agree in these instances; second, observe how in constructions that lack P agreement in Swedish, the (A, P) order reverts to (P, A) (in contrast to (2)–(3)). Generalizing:

(5) a. (P, A) in (i) V-agr, P-agr instances and (ii) non-V-agr, non-P-agr instances.
    b. (A, P) in (i) V-agr, non-P-agr instances and (ii) non-V-agr, P-agr instances.
(5ai) is exemplified by V-agreeing Romance constructions and (5aii) by standard Danish, Norwegian and non-P-agreeing Swedish, whereas English
exemplifies (5bi) and West Norwegian, default-V-agreeing Spanish, and P-agreeing Swedish exemplify (5bii).3 While that is a description of a complex state of affairs, it is hardly an explanation. The problem is in the system presupposed by (5), in which a higher order predicate is needed to make the system intelligible. We need to range over values of parameters to state relevant generalizations:

(6) For α ranging over “+” and “−”:
    a. αV-agr, αP-agr ↔ (P, A)
    b. αV-agr, −αP-agr ↔ (A, P)

An example of this sort of generalization from the GB literature is Jaeggli and Safir’s (1989) observation that languages have pro-drop if they have full-fledged agreement (Spanish) or no agreement at all (Chinese). But MP does not allow us to formulate such a meta-statement, as it would not be expressible with the bare representational notation that comes from the lexicon. In large part, minimalism was designed to deal with precisely the sorts of facts above, where one particular derivation is not generally ungrammatical; rather, it is bad because an alternative derivation, in some definable sense, is better. To carry this logic through, we must make certain assumptions whose soundness I leave for the end of this note. The basic idea is discussed in Martin and Uriagereka (forthcoming):

(7) Within local derivational horizons, derivations take those steps which maximize further convergent derivational options.

(8) All else being equal, derivations involving fewest steps outrank their alternatives.

(8) is Chomsky’s (1995b) idea about comparing derivations within a given reference set. (7) is a new proposal, also in the spirit of optimality criteria. Assume that “local derivational horizon” means a choice of formal alternatives for a derivationally committed structure (e.g. move vs. merge of some formal item). If we select a new substantive lexical item, relevant derivations are no longer comparable from that point on. Furthermore, observe how (7) explicitly introduces the idea of the entropy of a derivation, if we think of derivational options (i.e. whether after a given point further moves/merges are allowed) as “micro-states” of the system compatible with the whole derivational “macro-state,” given the relevant horizon. A derivational decision d will allow further convergent steps in some number n of possibilities, whereas some other derivational decision d′ will only allow a number m of possible continuations, where m < n. In those circumstances d induces more derivational entropy than d′ does, which (7) aims at optimizing.
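To fix ideas before turning to the cases, here is a toy rendering of how (7) and (8) interact, a sketch under my own simplifying assumptions (the hand-coded continuation trees merely anticipate the derivational paths (9)–(11) discussed below; nothing in the code is part of the proposal itself):

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    cost: int                     # derivational steps taken (movement is costly)
    continuations: list = field(default_factory=list)
    convergent: bool = False      # marks a terminated, convergent path

def entropy(step):
    """(7): number of convergent continuations a step leaves open
    within the local horizon."""
    if not step.continuations:
        return 1 if step.convergent else 0
    return sum(entropy(c) for c in step.continuations)

def choose(options):
    """Rank by entropy first, per (7); break ties by fewest steps, per (8)."""
    return max(options, key=lambda s: (entropy(s), -s.cost))

# The bifurcation to be discussed as (9): merging the expletive, (9a),
# leaves one convergent continuation, (10a); moving the associate N, (9b),
# leaves two, (11a) and (11b).
merge_expl = Step("merge EXPL (9a)", cost=0,
                  continuations=[Step("T attracts EXPL (10a)", 1, convergent=True)])
move_n = Step("move N (9b)", cost=1,
              continuations=[Step("merge EXPL in Spec,TP (11a)", 1, convergent=True),
                             Step("T attracts N (11b)", 2, convergent=True)])

print(choose([merge_expl, move_n]).name)  # -> "move N (9b)": the (A, P) order wins
```

Where the entropies come out equal (the agreeing and the agreement-less languages, as discussed below), the cost tie-breaker in choose takes over, favoring the move-free option.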
Given this machinery, (9) sketches the first relevant bifurcation of derivational paths. If we take the derivational step in (9a), putting aside further movements (see Note 2), the resulting order will be (P, A), whereas (9b) immediately leads to the (A, P) order.4 But the question from the point of view of entropy is what happens next, within a relevant derivational horizon. We must consider in turn each of the derivational paths in (9).

(9) a. [PP EXPL [P′ P [agr] N]]
    b. [PP N [P′ P [agr] t]]
Starting with (9a), observe how the expletive should block further movement of N across it (a locality effect). Let us state this in terms of Chomsky’s (1995b) notion “Attract.” Some further category higher up in the tree (T) can attract EXPL, as in (10a), but not N across EXPL (10b):

(10) a. [T′ T [PP EXPL [P′ P [agr] N]]], with T attracting EXPL
     b. *[T′ T [PP EXPL [P′ P [agr] N]]], with T attracting N across EXPL
Compare this situation to the one in (9b). For that structure, two derivational paths are possible:

(11) a. [TP EXPL [T′ T [PP N [P′ P [agr] t]]]], with EXPL merged in the Spec of TP
     b. [T′ T [PP N [P′ P [agr] t]]], with T attracting the associate N
In (11a) we insert the expletive in the Spec of TP, whereas in (11b) T attracts the associate. These are two convergent steps, unlike (10b).5 Therefore the entropy of (9b) (the number of continuations it allows) is higher. Again, the system goes with the derivational decision that leads to this result, in compliance with (7), and thus predicting languages of the sort in (5b).

The logic of the reasoning forces us to say that, in languages of the sort in (5a), alternative derivational steps are equally entropic. Observe that, if this is indeed the case, familiar economy considerations as in (8) will predict (9a) (with no costly moves) to outrank (9b). But why should it be that in the relevant circumstances entropy gets equalized? As it turns out, languages with agreement in the V and P systems obtain this result in a way which has nothing to do with why languages with no agreement do. Let us examine this in detail.

In languages with agreement in V and P, overt expletives are generally impossible. Suppose that an element with no phonetic or semantic realization (a null expletive) simply does not exist. If so, in this language step (9a) is not an option. When the Part domain is reached, the only options are to take or not to take step (9b). Taking it will lead to taking or not taking step (11b); not taking (9b) will lead (in principle) to again taking or not taking (9b). The derivations
are equally entropic, hence the costless one wins. That is, the winner is the one with the non-moved A, according to generalization (5ai).6 In contrast, in languages with no agreement, the expletive is clearly present. So how do we distinguish this case from that of languages with an expletive but some form of agreement in the participial? Observe each instance side by side:

(12) a. [T′ T [PP EXPL [P′ P [agr] N]]], with EXPL agreeing with the participial head
     b. [T′ T [PP EXPL [P′ P N]]], with EXPL not agreeing with the participial head
The very same relation between T and N which is impossible in (12a) (case (10b) repeated) becomes possible across an expletive which does not agree with a participial head (12b). The easiest way to state this contrast is to say that the agreeing expletive is visible to the system in a way that the non-agreeing expletive is not, hence only the former prevents Attraction across it.7 Of course, if (12b) is indeed an option, then the entropy of each of the alternatives in (10)/(11) gets equalized, and again the grammaticality decision must be made in terms of (8), thus correctly predicting no movement in these instances (5aii).

I would welcome a complete explanation of facts like these in non-minimalist terms; I am not familiar with any. What is even more important is that while (6) is true, it is a spurious generalization. There is no deep sense in which languages with equal values for V- or P-agreement should also have a (P, A) order, or languages with different values for these variables should have an (A, P) order. If we were told that the very opposite generalization held, nothing would shock us. But after having seen the reasoning above, the opposite state of affairs would be unexpected. I believe that it is analyses of this sort that should be used to decide whether Chomsky’s idea is productive.

Of course, some non-trivial assumptions that we have made need to be discussed. For instance, why is economy “ranked” with regard to entropy? In this respect, R&R’s suggestion about extending naturalistic metaphors may be more than “chit-chat,” and no different from similar extensions in complexity studies more generally. In nature, (7) is comparable to the Second Law of Thermodynamics, whereas (8) has the flavor of the Slaving Principle in synergetics. All of nature obeys the second law, but within the parameters it allows, certain systems find points of stability as a consequence of various processes of local feedback. Turbulence emerges in this fashion – not at random points, but in those where a certain fluctuation exists at lower levels of energy. This is consistent with the second law, but not predicted by it; an ordering principle is required to determine why turbulence emerges here and not
there, and that principle can be metaphorically linked with the idea that costly derivations are dispreferred. What interests me, however, is the ranking: the Second Law first, the Slaving Principle within its confines. Could the ranking of (7) over (8) be related to this state of affairs? My speculation that it can will be fully tested only when we learn more about the physical basis of language in evolution, development and performance.

Then again, how different would this metaphorical extension be from the ones made in computational biology when modeling organisms in terms of optima? Fukui’s (1996) original paper – from which R&R took the provocative thought that comparing derivational alternatives resembles Least Action in physics – explicitly made this same point for optimization in computer science. A concrete example that Fukui could not have anticipated, but which neatly illustrates his claim, is the analysis that West, Brown and Enquist (1997) provide of the cardiovascular system of vertebrates as fractal, space-filling networks of branching tubes, under the assumption that the energy dissipated by this transportation system is minimized. That condition is a defining property of the system – it does not follow from anything.

Suppose we applied LLJ’s logic for why my naturalistic take on MP is “groundless” to this celebrated example. Optimality principles in physics “are derived from deeper physical properties of the (entities) which satisfy them . . . By contrast, the MP takes economy . . . to be one of (the grammar’s; here, the biological system’s) defining properties.” The premises are both true, but nothing follows. Would LLJ ask the scientific community to dump West, Brown and Enquist’s explanation of scaling laws – the only one around in biology – because it is irreducible in the way they suggest, hence “groundless”? That reductionism would be short-sighted. Intuitions about optima in physics appeared well before anyone had any idea as to how to deduce them from first principles, as in the current systems that LLJ allude to. This did not stop physicists from proceeding with their assumptions. Surely LLJ cannot be implying that only present-day physics, one stage in an enterprise, matters, unless they are prepared to call “groundless” what was done prior to Einstein. By the same reasoning, computational biology is a perfectly grounded science, even if its optima are not based on the properties of the units that biology studies or do not reduce to the structure of quarks. And by the very same token, I fail to see why linguistics should be subject to a different standard.

Many linguists who worked seriously within GB noticed that it allowed too much power: “Do anything to anything anywhere at any time.” That of course provided very interesting accounts of Empty Category Principle effects, but it became useful to limit the movement mechanism. The result was “last resort”: do something which has a purpose, in some sense. Obviously that is a new axiom, with an economy flavor to it. Coupling it with other axioms explored at the time, which invoked certain symmetries within linguistic representations, field-like conditions on locality, uniformities across various notions and levels, a new spirit emerged. Could it be that language is structured in precisely those elegant terms? Taking the idea to the status of an appropriately extreme
research question, one is led to ask: “How much sense is there in asking whether language is ‘perfect’?” The answer to this is purely empirical, and whether MP succeeds as a natural extension of GB should not be decided a priori. That means we have to, as Brecht’s Galileo puts it, “bother to look.”

I believe that LLJ are too hasty in their conclusion that MP is “ungrounded in empirical considerations.” Are the facts above not empirical? I cannot go into a lengthy presentation of the relevant literature, but without even considering works by the many established linguists who have already made significant contributions to MP, one can find a host of younger colleagues who have provided what looks to me like deep accounts of Case/agreement distributions, islands, superiority effects, parasitic gaps, coordination, different binding requirements, quantifier relations, reconstruction, and control, just to name the obvious. None of these were discussed in detail in R&R (it was not the place for that), but the ones accessible in the mid-1990s were certainly included in the twenty plus pages of widely available references. One just has to bother to look.

Although I let my other colleagues comment on the passionate, unnecessary overtones of LLJ’s piece, I do want to mention that conceding that “the conceptual defects of [MP] are probably no worse in kind than earlier examples might be” is unfortunately familiar rhetoric, especially when coupled with wild claims about the field of linguistics being irrationally dominated by Chomsky’s demon. Ironically, this perpetuates the myth that MP is Chomsky’s toy story, and not the collegial effort of many scholars who knew the limitations GB faced, and forged ahead.
3 Cutting derivational options8 Lappin, Levine and Johnson (LLJ) categorize my first reply to their piece as something “which addresses our main objection to the MP by attempting to present a substantive defense of economy conditions in the theory of grammar.” This is why I chose the paradigm I did, among other possible ones amenable to broadly minimalistic considerations: it illustrated competing derivations at play, and procedures to choose among them. I purposely tried to study, also, a case where different possible orders come out as grammatical in different languages. Any analysis of these facts should be welcome. The “most important” critique that LLJ present of my reply aims directly at that choice of facts. This is surprising, as the Romance data I reported have been replicated elsewhere, having first being pointed out, in some form, by Luigi Burzio – some twenty years ago. The Scandinavian facts I mentioned are due to Anders Holmberg in his careful study of various Germanic languages, and have been the focus of attention of several papers recently. My rather modest contribution was putting the various observations in the form of a generalization: languages with the same value ( or ) for verb and participial agreement exhibit participle-associate NP (P,A) order, while languages with distinct values for verb and participial agreement display A,P order in expletive structures. 34
CONCEPTUAL MATTERS
LLJs’ counterexample to my generalization is this: in some languages both orders arise, regardless of agreement. LLJs’ data is as reported in (13), with their judgments, and its Hebrew translation (English has mixed agreement while Hebrew has full agreement, both relevant test-cases): (13) a. ?There remained three players sitting. b. ?There remained sitting three players. Although it is certainly true that the grammaticality contrasts I was alluding to are not always dramatic (speakers often “can get” the alternative order, but it is signaled as less preferred, marked, or just bizarre), I have not found a native speaker of English who does not detect something odd about (13b). In my experience, to the extent that examples like (13b) are acceptable in this language, they need an intonation break between sitting and the rest of the sentence, which is best if heavy in either syllabic terms (i.e. long) or phonological weight; thus my data analysis would be as in (14), instead of (13): (14) a. ??? There remained sitting three players. b. There remained sitting three players who had no idea what the play was. c. There remained sitting THREE PLAYERS. None of these considerations about heaviness obtain for the alternative orderings of the associate and P. Several recent papers discuss the possibility that the “inverted” orders in (14) result from stylistic adjustment, not unlike Heavy NP shift. If so, the facts in (14) are no counterexample to my syntactic generalization. Appropriately idealized, (13a) is pretty good and (13b) is pretty bad. Also, observe (15), an existential construction with be which, to start with, is the most natural in English (vis-à-vis examples with unaccusative verbs), and thus yields more robust judgments: (15) a. There was not a single player sitting. b. ?*There was not sitting a single player. (15), and the appropriately analyzed instances in (14), seem like potentially relevant examples for derivational accounts of the minimalist sort: precisely in that we have to justify why (15a) (presumably with A movement) “wins over” alternative (15b) (certainly with no such movement). That (15b) is a reasonable alternative, in principle, is shown by the fact that the relevant order in that instance is fine, in fact preferred, in other languages. Given the sort of generalization I attempted, in a language like Hebrew – with full agreement in both the verbal and participial paradigms – my expectation would be that precisely the option where A does not move is what eliminates the alternative with movement as a grammatical route. So in a paradigm like (16), which LLJ provide, I clearly expect (16b) (where shlosha saxkanim “three players” is presumably not displaced) to derivationally outrank (16a) (where shlosha saxkanim is “wrapped around” yoshvim “sitting,” the P element): 35
DERIVATIONS
(16) a. Nisharu shlosha saxkanim yoshvim. remained-3pl-past three-m players-m-pl sitting-m-pl “There remained sitting three players.” b. Nisharu yoshvim shlosha saxkanim. remained-3pl-past sitting-m-pl three-m players-m-pl “There remained sitting three players.” LLJ detect no preferences in (16), unlike what happens, say, in Romance. Note that Hebrew allows for post-verbal subjects, hence (16b) could be analyzed as the equivalent of the English three players remained sitting, albeit with stylistic inversion of the subject – which again is irrelevant for my purposes (no displacement of three players). Of course, the potentially troubling example for me is (16a), and the question I face, whether I am forced to analyze this string of words in structural terms which are pertinent for the generalization I posited: has A been displaced here? The relevant structural terms are as in (17), where the main verb selects the P structure sitting three players, and subsequently three players moves: (17) [remained [three players [sitting t]]] However, the verb remain, or its Hebrew version nisharu, may also select for a regular nominal expression, in which case in principle two further structures could be at issue (actually four, if we also consider the possibility of inverted subjects noted above – which I now set aside): (18) a. [remained [three [[players] sitting]]]] b. [[remained [three players]] sitting] In (18a) sitting is a modificational element; in (18b), a secondary predicate. All of this has some relevance for the worrisome (16a), which could have the analyses in (19): (19) a. [Nisharu [shlosha [[saxkanim] yoshvim]]]] remained three players sitting b. [[Nisharu [shlosha saxkanim]] yoshvim] remained three players sitting Perhaps obviously now, neither of these structures is pertinent to my generalization. Not all Hebrew speakers like yoshvim “sitting” as an adjectival element, although all those that I have consulted accept it as a participial. For those people, then, only (17) should be a possible analysis. Most of my informants find (16a), presumably under non-adjectival circumstances, worse than (16b). Again, the structure is recovered with postposition to the right periphery, for contrastive focus: (20) nisharu ba-gan shlosha saxkanim yoshvim remain in-the-garden three players sitting 36
CONCEPTUAL MATTERS
This too is irrelevant, and makes (20) similar, in significant respects, to the examples in (14). Alongside active participles like yoshvim, Hebrew also allows for passive participles like yeshuvim, which is analogous to the English “seated” and thus easier to use as an adjective. Predictably, those speakers that find (16a) degraded accept (21) without troubles: (21) Nisharu shlosha saxkanim yeshuvim remained-3pl-past three-m players-m-pl seated-m-pl “There remained three seated players/three players seated.” Once more this particular structure has nothing to do with my generalization, and confirms, in its contrast to the degraded nature of (16a) in relevant guise, the overall claims I made. The empirical challenge, thus, dissipates both for English and for Hebrew, which align with other languages as predicted. This is hardly surprising. True optionality is rather scarce; it often tells us more about our carelessness as researchers than about the nature of syntax. It is worth mentioning, also, what LLJ propose as an analysis of what they take to be the facts (both possible {P, A} orders in all sorts of languages, contra my generalization): in the end “lexical properties of the participle (optional Case assignment?) are the main factor in determining the order of the NP associate relative to P.” The disconcerting aspect of LLJs’ critique is not so much the denial of the facts as lexical idiosyncrasies, but rather the thought that optional Case assignment – which poses too many problems for me to address in a few pages – may have anything to do with the overall analysis. So much for the key critique. Let me turn to the interesting questions, assuming the facts. This is the main body of their response to my proposal, which they find to have four “serious problems.” The first is presented within the veiled complaint that my formulation of the entropy condition is not precise; LLJ were able to understand it as follows: “at any given point d in a derivation D, it selects from the set O of possible operations the one which, when applied to d, produces the largest number of convergent derivational paths from d compared to other operations in O.” I could not have said it any better. What is imprecise about that is unclear. But this is not the criticism (just the general tone); the actual concern is the fact that this is not an economy condition. Of course, I never said it was. Their question is, “Why should maximizing the number of convergent continuations of a derivation produce a more economical derivation?” This reflects, I believe, a profound misconception as to how the strong minimalist thesis is supposed to work, so let me pause to address it. There are two misunderstandings hidden in the presuppositions of that question. One is technical. MP was conceived, in part, as an attempt to limit the class of possible derivations that the P&P system allowed. Several tools were explored for that; the notion of last resort was one of them, economy conditions (in the technical sense) a different one. In either instance, by the end of the day 37
only some of many possible derivations are convergent or optimal, and in those terms grammatical. My entropy condition – aside from being relevant to the broad conceptual concerns that LLJ were raising in their first piece – is just another way of limiting the class of possible derivations. How it works is simple: you arrive at a derivational juncture d and face, say, two options x and y. Where the P&P model would have said “take either,” MP says “choose one.” The entropy condition provides us with a method to decide: if x allows more convergent derivational continuations than y (within a defined space), then x outranks y. This is certainly not an economy condition as such (for instance in the sense that a strategy to take the fewest derivational steps is). But why should it be an economy condition? All that matters is the selection of a grammatical derivational path over its alternatives. Similarly, when Chomsky in recent work maximizes the checking of features at a given site this is not, in itself, an economy condition; but it certainly has indirect economy consequences, in that it reduces the class of grammatical derivations.

The second misunderstanding is conceptual. The strong minimalist thesis seeks to determine whether a very interesting proposal concerning the nature of the language faculty is true, and if so in what form. The idea is that language arises at the optimal interface between an autonomous syntactic organ and other mental capacities. The question then becomes empirical, concerning, in large part, the specific sense in which the interface, and how it is accessed, turn out to be optimal. In that context, an economy condition is as expected as a maximization principle of the sort I (or Chomsky, in the case of feature checking) proposed, as would be other optima which encounter familiar expressions in other sciences. Optimal design cannot be determined a priori, and the only way to go about understanding its properties is to propose various reasonable concepts and see what empirical payoff they have. LLJ claim that “[f]rom a computational perspective [my suggestion] is precisely the opposite of an optimal system . . . Uriagereka provides no independent justification for his principle, and it is not clear to us how it encodes any recognizable concept of perfection in computational design.” I believe this touches on the crux of the matter, and is one of the reasons why LLJ seem so passionate about all of this. The field has largely ignored their concerns about globality in previous work, which they cite abundantly. This was so for two reasons, both raised by Chomsky in various places. First of all, a formal problem in some limit for a system of use has no obvious bearing on a system of knowledge. Familiar examples of this abound: e.g. center embedding poses a computational problem, so speakers avoid it. If the kinds of derivations that LLJ are concerned with are equally problematic, speakers will refrain from going into them – there is plenty of expressive space elsewhere. Second, and more importantly, it turned out to be trivial to reduce those phenomenal formal problems to nought. For example, any sort of cyclic computation entails that the class of alternative derivations at any given point is very small, thus not a mathematical explosion. In the very example I provided, I insisted on the fact that entropy is computed locally, within a derivational horizon established by lexical choice.
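Since LLJ’s own rendition is, in effect, a specification, it can be transcribed mechanically. The following toy sketch (in which the derivation type, the convergence test and the operation inventory are invented placeholders, not anything proposed here) picks, at a juncture d, the operation that leaves the largest number of convergent continuations open within a fixed local horizon:

```python
# A minimal sketch of the entropy condition as a choice procedure.
# Everything here (the Derivation type, `converges`, the operation
# inventory) is an illustrative placeholder, not part of the proposal.
from typing import Callable, Sequence

Derivation = tuple                         # an immutable toy derivation state
Operation = Callable[[Derivation], Derivation]

def convergent_paths(d: Derivation,
                     ops: Sequence[Operation],
                     converges: Callable[[Derivation], bool],
                     horizon: int) -> int:
    """Count convergent continuations of d, looking only `horizon`
    steps ahead: entropy is computed locally, within a derivational
    horizon, so there is no combinatorial blow-up."""
    if converges(d):
        return 1
    if horizon <= 0:
        return 0
    return sum(convergent_paths(op(d), ops, converges, horizon - 1)
               for op in ops)

def entropy_choice(d: Derivation,
                   ops: Sequence[Operation],
                   converges: Callable[[Derivation], bool],
                   horizon: int) -> Operation:
    """Where P&P said 'take either', pick the option that leaves the
    most convergent continuations open: x outranks y if applying x to
    d allows more convergent continuations than applying y."""
    return max(ops, key=lambda op: convergent_paths(
        op(d), ops, converges, max(horizon - 1, 0)))
```

The point of the horizon parameter is precisely the locality just noted: the count never ranges over a whole derivational space.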
If you get a new substantive item from the lexicon, there goes the computation of entropy; there is, thus, no computational blow-up. The worry about complexity, in and of itself, therefore seems superfluous. In contrast, how to limit derivational choices is still an issue; not because of computational concerns, but because alternatives to grammatical options, even if convergent and intelligible, are plainly unacceptable to speakers. In that context, all that a design property of grammars (whether they obey last resort, economy, maximal feature matching, entropy, or whatever) really needs in order to be justified is to show whether it cuts derivational choices. A serious conceptual problem with my entropy suggestion (or any other such design property) would arise only if it had no formal consequence, thus no weight in deciding among derivations, to choose the grammatical output. To judge design properties in any other terms (e.g. to demand that they be, specifically, economy conditions, in the technical sense of the notion) is to have misunderstood why we are positing them to start with.

Let me continue with LLJs’ list of conceptual challenges. Their second one pertains to the fact that my “explanation requires that entropy be ranked above the smallest-number-of-steps-condition. He seeks to justify this ranking by comparing entropy to the Second Law of Thermodynamics . . . Presumably the smallest steps condition is less binding in its effects.” All of this is true, which LLJ take to be “deeply confused and based largely on a misplaced analogy.” While I have not provided, and cannot provide, an extra-linguistic justification for my analogy, I fail to see how it can be decided, a priori, whether it is misplaced. True, I believe – with Chomsky – that the specific properties of mind that we are exploring will ultimately be unified with characteristics of the “boring” universe, and how they manifest themselves in minds as a result of brain organization in evolution and development, within the constraints imposed by the physico-chemical channel; and true, I cannot justify this nor know of any theory that even attempts to do so. Nonetheless, my take on the strong minimalist thesis is in pretty much those terms, however distant they may be from a satisfying unification with other disciplines. That was the thesis in Rhyme and Reason, for conceptual reasons: once you give up a functionalist approach to optimality in grammars, what else could it be other than the familiarly optimal universe (in physics and chemistry) manifesting itself in biology? In that and other works I have tried to show that this proposition is not particularly crazy for other domains of (computational) biology, body plans and all that. To decide, from the start, that this is a misplaced analogy and misunderstanding seems at best unimaginative.

It is arguably a bit more dishonest than that. LLJ ought to know that there is a connection discussed in the computational literature between information and negative entropy, and that entropy is routinely used in the life sciences, as per the advice of no less than Schrödinger. In fact, the model study that I followed for the presentation of phyllotaxis in Rhyme and Reason, namely Roger Jean’s work, explicitly discusses an account of the ordering facts one of whose factors is a characterization of entropy. I do not want to venture whether LLJ consider
those extensions misplaced as well, but I do reckon that in those instances, too, nobody has been able to reduce the explanation to first principles in quantum mechanics. The measure of success has been how well the model accounts for observed properties of plant symmetry. Why should our science be subject to a higher standard?

LLJ take me, “[i]n seeking to rank economy conditions relative to each other . . . to be invoking as his model an OT hierarchy of defeasible constraints.” That claim is entirely false. My ranking has the same status within the system as that existing between, say, the choice of a minimal part of a numeration, in Chomsky’s recent terms, and, within those confines, the particular derivation d that such-and-such. This is a kind of ranking that results from the design of the grammar, and as such has nothing to do with the variable rankings possible in OT, the source of linguistic variation in that model; I do not expect any variation in entropy trumping economy. Suppose I had ranked economy above entropy; what would that mean? If entropy is taken, in essence, as the opposite of information, augmenting entropy is akin to reducing committed information (leaving options open); it is sensible to find the most economical among those open options. But once you commit to the optimal option, I am not sure what sense there is in, then, seeking the option that leaves most options open. A different story, of course, is whether that particular modeling of the system is ultimately sound (for my proposal as well as in that of parts of numerations chosen prior to derivations, or any others); whether, for instance, the rationale for the ranking I have outlined follows from its own logic or, rather, is a dull consequence of how information arises in the physical universe. That question is unresolved in linguistics, as it is in plant morphology or wherever else in the universe that form matters.

The third and fourth challenges are less grandiose, and reflect a misunderstanding, whether resulting from their reading or my writing, I do not know. I do claim that, as they put it, “null expletives are not present in [languages with full agreement], and so expletive insertion is not possible.” That is a familiar position from the literature, which would for instance account for why these sorts of languages do not generally have overt T expletives (there is nothing to be had). I also claim that a grammar with those characteristics will never face a derivational decision in the entropic terms I suggest, as the continuations of the possible paths to be taken are equally rich: to move or not to move. (Alternatively, as Roger Martin suggests, in this instance there is no derivational choice and thus entropy is irrelevant.) They find this assertion to rely “on the vague nature of the entropy principle which permits Uriagereka to claim equivalence of derivational options without any obvious basis for doing so.” I think this perception stems from a misconstrual of how the entropy condition is supposed to work, as this passage suggests: “If one selects movement of A to Spec of P, then further movement of A to Spec of T . . . is possible, while on the initial in situ choice this latter derivation is ruled out.” I cannot tell why LLJ think the latter derivational step (A from the P Spec to the T Spec) is in principle ruled out (aside from its ultimate fate in this particular derivation). My condition seeks to
choose the derivational path which “increase[s] the number of (locally) available convergent paths in the derivation,” [my emphasis in their own, correct, rendition of my proposal]; the path in question is used for standard movement to subject position, hence is a fortiori convergent, hence the entropy of each path is the same.

The fourth criticism grows from a fact that I myself pointed out in the footnotes: my “explanation of the P,A order for languages with no verbal or participle agreement relies crucially on the assumption that a non-agreeing expletive is invisible to movement, and so it does not block ‘long’ movement . . .” I said that a more elaborate explanation is possible if, instead of taking the expletive as the blocker, we take agreement to be the relevant element. Although this will add absolutely nothing to the conceptual discussion, I will sketch the alternative. (The impatient reader can skip this part.) The relevant patterns are as in (22):

(22) a. … AgrV … AgrP … NP
     b. … AgrV … AgrP … NP
     c. … AgrV … AgrP … NP
     d. … AgrV … AgrP … NP
     [the probe arrows distinguishing (22a–d) in the original diagram are described in the text below]
Suppose that the associate NP is being probed (in Chomsky’s recent sense, represented as an arrow in (22)), and only Agr elements are active for the task. In (22c) and (22d) the active Agr heads probe the associate NP. In (22a), suppose one Agr head probes the other, as they are both active; thus something else must probe NP, say T. A similar situation arises in (22b), where both Agr heads are inactive. Next consider the computation of entropy, this time with respect to Probe/goal relations across no intervening extra Probe (i.e. locally). For (22c) and (22d) – where in the end NP moves – we want entropy to favor a movement step, which must mean that the continuations of that step are more numerous than continuations of a competing merging step. In the merging step a pleonastic is entered as an AgrP specifier. Assuming, with Chomsky, that pleonastic elements can themselves be probing heads if they have agreeing (person) features, there will be active pleonastics in (22c) and (22d), licensed in terms of association with Agr heads, either directly (22d) or after movement (22c). It is these active pleonastics that count as intervening probes, thus eliminating relations across them. The reasoning just given must mean that in neither (22a) nor (22b) should there be active pleonastics, as there are no active licensing Agr heads there. This is rather obvious for (22a), as nothing is pronounced or interpreted in subject position; for (22b) it must mean that the pleonastic involved in this case is only a PF reflex. As a consequence, in both these instances relations across either nothing or a mere PF formative are possible, and entropy gets equalized; then merge trumps move, as desired. In the end that more or less elaborate reasoning follows, also, if one simply assumes that the sort of expletive in (22b) is a late arrival in the derivation, in the PF component, hardly a new proposal in the literature.
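For readers who want the case logic spelled out, here is one toy transcription of it; the boolean encoding of “active” Agr heads, the licensing clause, and the returned step names are all my glosses on (22a–d), offered only to fix ideas:

```python
# A toy rendering of the reasoning over the (22) patterns. The encoding
# of "active" Agr heads and the licensing clause are assumptions made
# purely for illustration.

def pleonastic_is_active(agr_v_active: bool, agr_p_active: bool) -> bool:
    """A pleonastic is a syntactically active probe only if a free
    active Agr head licenses it: with both Agr heads active they probe
    one another (22a), and with neither active the pleonastic is at
    best a PF reflex (22b)."""
    return agr_v_active != agr_p_active     # exactly one free active head

def preferred_step(agr_v_active: bool, agr_p_active: bool) -> str:
    if pleonastic_is_active(agr_v_active, agr_p_active):
        # an active pleonastic would intervene, eliminating probe/goal
        # relations across it: the movement step leaves more convergent
        # continuations open, so entropy favors it ((22c), (22d))
        return "move NP"
    # nothing (22a) or a mere PF formative (22b) intervenes on either
    # path, so entropy is equalized and merge trumps move
    return "merge (Merge-over-Move)"

assert preferred_step(True, False) == "move NP"
assert preferred_step(False, False) == "merge (Merge-over-Move)"
```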
It seems unfair to consider “a serious problem with [my] argument” a claim that suggests that if a language has some agreement (in V or in P), then it can syntactically license a bona fide pleonastic, but if it does not have any agreement whatsoever, then a pleonastic is only licensed stylistically or phonologically. That may be wrong, but I find it neither “vague” nor “ad hoc.”

LLJ compare – for I suppose postmodern purposes – the language faculty to a urinary tract. Our main point of disagreement may ultimately be this. What seems to me remarkable about the linguistic system is that it has amazingly elegant properties regardless of the utterly bizarre uses humans can put this system to (e.g. this whole discussion). In that it seems different from LLJs’ urinary tract, which might also correlate with why they share the latter with all other vertebrates, whereas the former is a bit more restricted. I read Chomsky in MP as taking a stab at that mystery.
Part I SYNTAGMATIC ISSUES
3 MULTIPLE SPELL-OUT †
1 Deducing the base step of the LCA

A main desideratum of the Minimalist Program is reducing substantive principles to interface (or bare output) conditions, and formal principles to economy conditions. Much energy has been devoted to rethinking constraints and phenomena that appear to challenge this idea, in the process sharpening observations and descriptions. In this chapter, I attempt to reduce a version of Kayne’s (1994) Linear Correspondence Axiom (LCA).

Chomsky (1995c) already limits the LCA’s place in the grammar. Kayne’s version of the axiom is a formal condition on the shape of phrase markers. Chomsky’s (for reasons that go back to Higginbotham 1983b) is a condition that operates at Spell-out, because of PF demands. Kayne’s intuition is that a nonlinearized phrase marker is ill formed, in itself, whereas for Chomsky such an object is ill formed only at PF, hence the need to linearize it upon branching to this component. Chomsky’s version is arguably “more minimalist” in that linearization is taken to follow from bare output conditions.

The axiom has a formal and a substantive character. The formal part demands the linearization of a complex object (assembled by the Merge operation, which produces mere associations among terms). A visual image to keep in mind is a mobile by Calder. The hanging pieces relate in a fixed way, but are not linearly ordered with respect to one another; one way to linearize the mobile (e.g. so as to measure it) is to lay it on the ground. The substantive part of Kayne’s axiom does for the complex linguistic object what the ground does for the mobile: it tells us how to map the unordered set of terms into a sequence of PF slots. But even if Chomsky’s reasoning helps us deduce the formal part of the axiom (assuming that PF demands linearization), the question remains of exactly how the mapping works. Kayne is explicit about that. Unfairly, I will adapt his ideas to Chomsky’s minimalist “bare phrase structure” (Chomsky avoids committing himself to a definition in either Chomsky 1995a or 1995c).

(1) Linear Correspondence Axiom
    a. Base step: If α commands β, then α precedes β.
    b. Induction step: If γ precedes β and γ dominates α, then α precedes β.
I will discuss each of the steps in (1) in turn, with an eye toward deducing their substantive character from either bare output or economy considerations. Consider why command should be a sufficient condition for precedence. It is best to ask this question with a formal object in mind. I will call this object a command unit (CU), for the simple reason that it emerges in a derivation through the continuous application of Merge. That is, if we merge elements to an already merged phrase marker, then we obtain a CU, as in (2a). In contrast, (2b) is not a CU, since it implies the application of Merge to different objects.

(2) a. Command unit: formed by continuous application of Merge to the same object
       {α, {α, {β, {β, {…}}}}}
       (α is merged with {β, {β, {…}}}, which itself results from merging β with {…})
    b. Not a command unit: formed by discontinuous application of Merge to two separately assembled objects
       {α, {{γ, {γ, {…}}}, {α, {α, {…}}}}}
       ({γ, {γ, {…}}} and {α, {α, {…}}}, each assembled by Merge on its own, are merged with one another)
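Since (2) is just bookkeeping over sets, the distinction can be checked mechanically. In the sketch below, the tuple encoding of merged objects and the helper names are conveniences for the illustration, not part of the definition:

```python
# Toy encoding of Merge and command units. A phrase marker is either a
# word (a string) or (label, daughter, daughter); the tuple is only a
# convenient stand-in for the unordered sets in (2).
from typing import Union

PhraseMarker = Union[str, tuple]

def merge(label: str, alpha: PhraseMarker, beta: PhraseMarker) -> PhraseMarker:
    """Merge produces a mere association of two objects under a label."""
    return (label, alpha, beta)

def is_command_unit(pm: PhraseMarker) -> bool:
    """(2a): a CU arises from continuously merging new items into the
    same object, so no node associates two separately assembled
    (complex) objects; (2b) does exactly that, and fails the test."""
    if isinstance(pm, str):
        return True
    _, d1, d2 = pm
    complex_daughters = [d for d in (d1, d2) if not isinstance(d, str)]
    if len(complex_daughters) == 2:
        return False
    return all(is_command_unit(d) for d in complex_daughters)

cu = merge("a", "a", merge("b", "b", "..."))                          # like (2a)
non_cu = merge("a", merge("g", "g", "..."), merge("a", "a", "..."))   # like (2b)
assert is_command_unit(cu) and not is_command_unit(non_cu)
```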
Regarding CUs, the ultimate question is why, among the possible linearizations in (3), (3d) is chosen.

(3) a.–f. [six tree diagrams, not reproduced: the 3! possible linear arrangements of the terminals α, β, and {…}; in (3d) the command order collapses directly into the precedence order ⟨α, β, {…}⟩, and in (3a) into its mirror image]
To continue with the mobile image, there are n! ways in which we can lay it on the ground, for n the number of hanging elements. Why is it that, among all the
apparently reasonable permutations, the linguistic mobile collapses into a linearized sequence specifically in the order ⟨α, β, {…}⟩? We may ask the question from the point of view of what syntactic relations are relevant to the terminals of the structures in (3). Concentrating on the terminals, we see that the only relation that exists between them in a CU is “I have merged with your ancestors.” We can produce an order within CUs in terms of this relation, which essentially keeps track of what has merged with what when. This is, essentially, the insight behind Epstein’s (1999) interpretation of command, which has the effect of ordering the terminal objects in (3) as follows: α, β, {…}. If PF requirements demand that the Merge mobile collapse into a flat object, it is not unreasonable to expect that the collapse piggybacks on a previously existing relation. Indeed, minimalist assumptions lead us to expect precisely this sort of parsimony.

However, we have not yet achieved the desired results. To see this, imagine a group of people trapped inside a building, with access to a window that allows them to exit just one at a time. These people may order themselves according to some previously existing relation (e.g. age). But having found an order does not mean having decided how to leave the building. Does the youngest exit first or last – or in the middle? Likewise, a decision has to be made with regard to the α, β, {…} order. Exactly how do we map it to the PF order? In minimalist terms, the question is not just how to map the collapsed α, β, {…} sequence to a PF order, but actually how to do it optimally. The hope is that mapping the collapsed α, β, {…} command order to the ⟨α, β, {…}⟩ PF order in (3d) is (one among) the best solution(s).

Another analogy might help clarify the intuition. Visualize a house of cards, and imagine how pulling out one crucial card makes it collapse. To a reasonable extent, the order in which the cards fall maps homomorphically to the order in which they were assembled, with higher cards landing on top, and cards placed on the left or right falling more or less in those directions (assuming no forces other than gravity). If Merge operations could be conceived as producing what amounts to a merge-wave of terminals, it is not unreasonable to expect such a wave to collapse into a linearized terminal sequence in a way that harmonizes (in the same local direction) the various wave states, thus essentially mapping the merge order into the PF linear order in a homomorphic way. This, of course, is handwaving until one establishes what such a merge-wave is, but I will not go into that here (see Martin and Uriagereka (forthcoming), on the concept of collapsed waves in syntax).

Even if we managed to collapse the merge order into the PF sequence that most directly reflects it, why have we chosen (3d) over the equally plausible (3a)? In short, why does the command relation collapse into precedence, and not the opposite? The harmonized collapse problem seems to have not one optimal solution, but two. Three different answers are possible. First, one can attribute the choice of (3d) over (3a) to something deep; it would have to be as deep as whatever explains the forward movement of time. (I am not entirely joking here; physical
properties are taken by many biologists to affect core aspects of the morphology of organisms, and Kayne (1994) speculates in this direction.) Second, one can say (assuming that (3a) and (3d) are equally optimal solutions) that (3d) gave humans an adaptive edge of some sort, in terms of parsing or perhaps learnability. One could also imagine that a species that had chosen (3a) over (3d) as a collapsing technique might equally well have evolved a parser and an acquisition device for the relevant structures (but see Weinberg 1999). Third, one can shrug one’s shoulders. So what if (3a) and (3d) are equally harmonic? Two equally valid solutions exist, so pick the one that does the work. (This view of the world would be very consistent with Stephen Jay Gould’s punctuated equilibrium perspective in biology; see Uriagereka 1998 and Chapter 8.) This situation is acceptable within the Minimalist Program, or for that matter within any program that seeks to understand how optimality works in nature, which cannot reasonably seek the best solution to optimality problems, but instead expects an optimal solution; often, even mathematically optimal solutions are various. If I am ultimately on the right track, (3d) can correctly be chosen as the actual PF ordering that the system employs; that is, we should not need to state (1a) as an axiom. In a nutshell, command maps to a PF linearization convention in simple CUs (those (1a) is designed to target) because this state of affairs is optimal. I will not claim I have proven this, for I have only indicated the direction in which a demonstration could proceed, raising some obvious questions. I have little more to say about this here and will proceed on the assumption that the base step of the LCA can be turned into a theorem.
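For concreteness, here is what that assumption amounts to operationally. The sketch repeats the toy encoding introduced above (words as strings, merged objects as labeled tuples) and simply reads the simplex daughter off before its complex sister at every node, mapping the command order ⟨α, β, {…}⟩ onto precedence:

```python
# Collapsing a command unit by the base step alone: at every level the
# terminal that commands the rest is emitted first. The encoding is the
# illustrative one from the earlier sketch, not anything in the text.

def linearize_cu(pm):
    """Map the command order of a CU onto precedence."""
    if isinstance(pm, str):
        return [pm]
    _, d1, d2 = pm
    words = [d for d in (d1, d2) if isinstance(d, str)]
    phrases = [d for d in (d1, d2) if not isinstance(d, str)]
    if len(phrases) == 2:
        # two separately assembled objects: (1a) alone cannot collapse this
        raise ValueError("not a command unit")
    if not phrases:
        return words   # two words in mutual command; nothing decides here
    return words + linearize_cu(phrases[0])

assert linearize_cu(("a", "a", ("b", "b", "rest"))) == ["a", "b", "rest"]
```

Choosing `words + linearize_cu(...)` rather than its mirror image is exactly the residual decision between (3d) and (3a) discussed above.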
2 Deducing the induction step of the LCA

Having met the demands of the Minimalist Program by showing how part of the LCA can reduce to more justifiable conditions, we should naturally ask whether the whole LCA can be deduced this way. I know of no deduction of the sort sketched above, given standard assumptions about the model. Nonetheless, an older model provides an intriguing way of deducing the LCA.1 For reasons that become apparent shortly, I refer to it as a dynamically split model. The origins of this outlook are discussions about successive cyclicity and whether this condition affects interpretation. Are the interpretive components accessed in successive derivational cascades? Much of this debate was abandoned the moment a single level, S-Structure, was postulated as the interface to the interpretive components. Now that S-Structure has itself been abandoned, the question is alive again. What would it mean for the system to access the interpretation split in a dynamic way? I want to demonstrate that the simplest assumption (i.e. nothing prevents a dynamically split access to interpretation) allows the LCA’s induction step to be satisfied trivially. In effect, this would permit the deduction of (1b), albeit in a drastically changed model that neither Chomsky (1995c) nor Kayne (1994) was assuming.
One way of framing the issue is to ask how many times the rule of Spell-out should apply. If we stipulate that it applies only once, then PF and LF are accessed only once, at that point. On the other hand, liberally accessing PF and LF in successive derivational cascades entails multiple applications of Spell-out. Surely, assuming that computational steps are costly, economy considerations favor a single application of Spell-out. But are there circumstances in which a derivation is forced to spell out different chunks of structure in different steps? One such instance might arise when a derivation involves more than one CU. As noted, CUs emerge as the derivational process unfolds, and they are trivially collapsible by means of the base step of the LCA. Now, what if only those trivially linearizable chunks of structure (e.g. (2a)) are in fact linearized? That is, what if, instead of complicating the LCA by including (1b), when we encounter a complex structure of the sort in (2b) we simply do not collapse it (and thus do not linearize it), causing a derivational crash? Only two results are then logically possible: either structures like (2b) do not exist, or they are linearized in various steps, each of which involves only CUs. The first possibility is factually wrong, so we conclude that Multiple Spell-Out (MSO) is an alternative.

Before we explore whether MSO is empirically desirable, consider its possible mechanics. Bear in mind that CUs are singly spelled out – the most economical alternative. The issue, then, is what happens beyond CUs. By assumption, we have no way of collapsing them into given linearizations, so we must do the job prior to their merger, when they are still individual CUs. What we need, then, is a procedure to relate a structure that has already been spelled out to the still “active” phrase marker. Otherwise, we cannot assemble a final unified and linearized object. The procedure for relating CUs can be conceived in conservative or radical terms, either solution being consistent with the program in this chapter. The conservative proposal is based on the fact that the collapsed Merge structure is no longer phrasal, after Spell-out; in essence, the phrase marker that has undergone Spell-out is like a giant lexical compound, whose syntactic terms are obviously interpretable but are not accessible to movement, ellipsis and so forth.2 The radical proposal assumes that each spelled-out CU does not even merge with the rest of the structure, the final process of interphrasal association being accomplished in the performative components.3 I will detail briefly each of these versions.

In the conservative version, the spelled-out phrase marker behaves like a word, so that it can associate with the rest of the structure; this means it must keep its label after Spell-out. Technically, if a phrase marker {α, {L, K}} collapses through Spell-out, the result is {α, ⟨L, K⟩}, which is mathematically equivalent to {α, {{L}, {L, K}}}.4 Since this object is not a syntactic object, it clearly can behave like a “frozen” compound. As a consequence, we need not add any further stipulations: the collapsing procedure of Spell-out itself results in something akin to a word. To see how we reach this conclusion, we need to take seriously Chomsky’s (1995c) notion of syntactic object. Syntactic objects can take two forms.
(4) a. Base: A word is a syntactic object.
    b. Induction: {α, {L, K}} is a syntactic object, for L and K syntactic objects and α a label.

(4a) speaks for itself, although it is not innocent. The general instance is not too complicated: a word is an item from the lexicon. However, the Minimalist Program permits the formation of complex words, whose internal structure and structural properties are not determined by the syntax. (Indeed, the object resulting from Spell-out also qualifies as a word, in the technical sense of having a label and a structure that is inaccessible to the syntax.) (4b) is obtained through Merge and involves a labeling function that Chomsky argues is necessarily projection. What is relevant here is how a label is structurally expressed.

(5) Within a syntactic object, a label is not a term.

(6) K is a term if and only if (a) or (b):
    a. Base: K is a phrase marker.
    b. Induction: K is a member of a member of a term.

(6a) hides no secrets. (6b) is based on the sort of object that is obtained by merging K and L: one set containing K and L, and another containing {L, K} and the label α – namely, {α, {L, K}}. This whole object (a phrase marker) is a term, by (6a). Members of members of this term (L and K) are also terms, by (6b). The label α is a member of the first term, hence not a term. All of these results are as desired.

Consider next the collapse of {α, {L, K}} as {α, ⟨L, K⟩}, equivalent to {α, {{L}, {L, K}}}. By (6b), {L} and {L, K} are terms. However, {L, K} is not a syntactic object, by either (4a) or (4b). Therefore, {α, {{L}, {L, K}}} cannot be a syntactic object by (4b); if it is to be merged higher up, it can be a syntactic object only by (4a) – as a word. This is good; we want the collapsed object to be like a compound, that is, essentially a word: it has a label, and it has terms, but they are not objects accessible to the syntax. Note that set-theoretic notions have been taken very seriously here; for example, such notions as linearity have been expressed without any coding tricks (angled brackets, as opposed to particular sets). In essence, the discussion has revealed that generally merged structures (those that go beyond the head-complement relation) are fundamentally nonlinear, to the point that linearizing them literally destroys their phrasal base. This conclusion lends some credibility to Chomsky’s conjectures that (a) Merge produces a completely basic and merely associative set-theoretic object, with no internal ordering, and (b) only if collapsed into a flat structure can this unordered object be interpreted at PF.
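The set theory in the last few paragraphs can be verified mechanically. Below, frozensets play the role of the sets in (4)–(6), the Kuratowski rendering of the ordered pair is the standard one, and the term-collecting function is my transcription of (6):

```python
# Checking the collapse: with <L, K> = {{L}, {L, K}} (Kuratowski),
# {alpha, <L, K>} is literally {alpha, {{L}, {L, K}}}.
L, K, alpha = "L", "K", "alpha"

def pair(x, y):
    """Ordered pair as a particular set: <x, y> = {{x}, {x, y}}."""
    return frozenset([frozenset([x]), frozenset([x, y])])

collapsed = frozenset([alpha, pair(L, K)])
assert collapsed == frozenset(
    [alpha, frozenset([frozenset([L]), frozenset([L, K])])])

def terms(pm) -> set:
    """(6): pm itself is a term (base), and members of members of a
    term are terms (induction); bare members, i.e. labels, are not."""
    collected = {pm}
    if isinstance(pm, frozenset):
        for member in pm:
            if isinstance(member, frozenset):
                for sub in member:
                    collected |= terms(sub)
    return collected

# {L} and {L, K} are terms of the collapsed object, as in the text,
# even though {L, K} is no longer a syntactic object by (4).
assert frozenset([L]) in terms(collapsed)
assert frozenset([L, K]) in terms(collapsed)
```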
Though the current notation does the job, the appropriate results can be achieved regardless of the notation. Various positions can be taken, the most radical having been mentioned already. In the version that ships spelled-out phrase markers to performance, one must assume a procedure by which already processed (henceforth, “cashed out”) phrase markers find their way “back” to their interpretation site. Plausibly, this is the role agreement plays in the grammar. It is interesting to note that, according to present assumptions, MSO applies to noncomplements (which are not part of CUs). Similarly, agreement does not manifest itself in complements, which makes it reasonable to suppose that what agreement does is “glue together” separate derivational cascades that are split at Spell-out, the way an address links two separate computers.

In either version of MSO, we have now deduced (1b), which stipulates that the elements dominated by γ in a CU precede whatever γ precedes. That γ should precede or be preceded by the other elements in its CU was shown in Section 1. The fact that the elements dominated by γ act as γ does within its CU is a direct consequence of the fact that γ has been spelled out separately from the CU it is attached to, in a different derivational cascade. The elements dominated by γ cannot interact with those that γ interacts with, in the “mother” CU. Thus, their place in the structure is as frozen under γ’s dominance as would be the place of the members of a compound, the syllables of a word, or worse still, elements that have already “gone to performance.”5

I should point out one final, important assumption I am making. The situation we have been considering can be schematized as in (7). But what prevents a projection like the one in (8)?
(7) [XP YP [X' X …]]
    (YP, assembled separately, is spelled out and then merged with the projection of X, which projects)

(8) [YP [Y' Y …] [X' X …]]
    (the independently spelled-out projection of Y is merged with the projection of X, and Y projects)
In (8), it is the spelled-out category Y that projects a YP. This results in forcing the linearization of X’s projection prior to that of Y’s, contrary to fact. The problem is somewhat familiar. In systems with a single Spell-out, both Kayne and Chomsky must resort to ad hoc solutions to avoid this sort of undesired result involving specifiers. Kayne eliminates the distinction between adjuncts and specifiers,6 and Chomsky defines command in a peculiar way: only for heads and maximal projections, although intermediate projections must be “taken into account” as well.7 Within the conservative implementation of MSO, (8) can be prevented if
only lexical items project. Again, MSO is designed to collapse a phrase marker into a compound of sorts. Yet this “word” cannot be seen as an item that projects any further; it can merge with something else, but it can never be the item that supports further lexical dependencies. This might relate to some of Chomsky’s (2000) conjectures regarding a fundamental asymmetry indirectly involved in the labeling of the Merge function; in particular, it may be that Merge (like Move) implies a form of Attract, where certain properties of one of the merging items are met by the other. It is conceivable that properties relevant to Attract are “active” only in lexical items within the lexical array, or numeration, that leads to a derivation, and not in words formed in the course of the derivation. This would include collapsed units of the sort discussed here, but it may extend as well to complex predicate formation, which is typically capped off after it takes place (there is no complex complex-predicate formation, and so on).8 At any rate, the price to pay for unequivocal attachment of spelled-out noncomplements is to have two (perhaps not unreasonable) notions of terminals: lexicon-born ones and derived ones. Under the radically performative interpretation of MSO, there is a trivial reason why a spelled-out chunk of structure should not project: it is gone from the syntax. The price to pay for equivocal attachment of spelled-out noncomplements is, as noted earlier, the agreement of these elements with corresponding heads.
3 Some predictions for derivations

I have essentially shown how the base step of the LCA may follow from economy, and how the induction step may follow from a minimalist architecture that makes central use of MSO, thus yielding dynamically bifurcated access to interpretive components. Given the central position it accords CUs, this architecture makes certain predictions. In a nutshell, command is important because it is only within CUs that syntactic terms “communicate” with each other, in a derivational cascade. To get a taste of this sort of prediction, consider Chomsky’s (1995c) notion of distance, which is sensitive to command. The reason for involving command in the characterization of distance is empirical and concerns superiority effects of the following sort:

(9) a. who t saw what
    b. *what did who see t
    c. which professor t saw which student
    d. which student did which professor see t
Chomsky’s account capitalizes on the simple fact that the competing wh-elements (who, what, which) stand in a command relation in (9a,b), but clearly not in (9c,d), as (10a) and (10b) show, respectively.9

(10) a. [C [who … [saw what]]]
     b. [C [[which professor] … [saw [which student]]]]
Thus, he defines distance in terms of the following proviso:

(11) Only if α commands β can α be closer to a higher γ than β is.

This is the case in (10a): the target C is closer to who than to what. Crucially, though, in (10b) the target C is as close to the which in which professor as it is to the which in which student; these positions being equidistant from C, both movements in (9c) and (9d) are allowed, as desired. Needless to say, this solution works. But why should this be? Why is command relevant? In MSO terms, the explanation is direct. The two wh-elements in (10a) belong to the same derivational cascade, since they are assembled through Merge into the same CU. This is not true of the two wh-phrases in (10b); in particular, which professor and which student are assembled in different CUs and hence do not compete within the same derivational space (I return below to how the wh-features in each instance are even accessible). The fact that the phrases are equally close to the target C is thus expected, being architecturally true, and need not be stated in a definition of distance. It might seem that this idea does not carry through to the radically performative interpretation of MSO; but in fact it does. Even if which professor in (10) is in some sense gone from the syntactic computation, the relevant (here, wh-) feature that is attracted to the periphery of the clause stays accessible, again for reasons that I return to shortly.

The general architectural reasoning that determines what information is and is not gone from the computation extends to classical restrictions on extraction domains, which must be complements.10 The contrast in (12) is extremely problematic for the Minimalist Program.

(12) a. […X […t…]]   e.g. who did you see [a critic of t]
     b. [[…t…] X…]    e.g. *who did [a critic of t] see you

The problem is that whatever licenses (12a) in terms of the Minimal Link Condition or last resort should also license (12b); so what is wrong with the latter? A minimalist should not simply translate the observation in (12) into a new principle; such a principle must again fall within the general desiderata of the program – and thus reduce to economy or bare output conditions. I know of no minimalist way of explaining the contrast in (12).11 But now consider the problem from the MSO perspective. A complement is very different from any other dependent of a head in that the elements a complement dominates are within the same CU as the “governing” head, whereas this is not true for the elements a noncomplement dominates. As a result, extraction from a complement can occur within the same derivational cascade, whereas extraction from a noncomplement cannot, given my assumptions. Basically, the following paradox arises. If a noncomplement is spelled out independently from its head, any extraction from the noncomplement will involve material from something that is not even a syntactic object (or, more radically, not even there); thus, it should be as hard as extracting part of a compound (or worse). On the other hand, if the noncomplement is not spelled out in order to allow extraction from it, then it will not be possible to collapse its elements, always assuming that the only procedure for linearization is the command-precedence correspondence that economy considerations sanction.
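The paradox can be put in procedural terms. In the following sketch, freezing is an explicit marker for a noncomplement shipped to Spell-out in its own cascade; the encoding and the example trees are invented for the illustration, and which daughter counts as the complement is simply stipulated:

```python
# Why extraction is cascade-bound: the innards of a separately
# spelled-out noncomplement are invisible to the active derivation.
# Strings are words, (label, d1, d2) tuples are merged objects, and
# FROZEN is an explicit marker, all assumed purely for illustration.
FROZEN = "<frozen>"

def freeze(pm):
    """Mark a noncomplement as spelled out in its own cascade."""
    return (FROZEN, pm)

def visible_terminals(pm):
    """Terminals the active cascade can still operate on."""
    if isinstance(pm, str):
        return [pm]
    if pm[0] == FROZEN:
        return []        # word-like: interpretable, but inaccessible
    _, d1, d2 = pm
    return visible_terminals(d1) + visible_terminals(d2)

critic_of_who = ("a", "a", ("critic", "critic", ("of", "of", "who")))

# (12a): [a critic of who] as complement, in the same CU as the head
as_complement = ("see", "see", critic_of_who)
# (12b): [a critic of who] as subject, spelled out before merging
as_subject = ("T", freeze(critic_of_who), ("see", "see", "you"))

assert "who" in visible_terminals(as_complement)
assert "who" not in visible_terminals(as_subject)
```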
Of course, one might now wonder how such simple structures as (13) can ever be generated, with movement of a complex wh-phrase.

(13) [[which professor] [did you say [t left]]]

If, for the collapse of a complex noncomplement’s elements to be sanctioned, they must be spelled out before they merge with the rest of the phrase marker, how can movement of noncomplements exist? Should such elements not be pronounced where they are spelled out?

(14) *[did you say [[which professor] left]]

The answer to this puzzle relates to the pending question of wh-feature accessibility in spelled-out phrases. I address both matters in the following section.
4 General predictions for the interpretive components

The dynamically split model that MSO involves produces derivational cascades, each of which reaches the interpretive components in its own derivational life. If this model is correct, we should see some evidence of the relevant dynamics. Let us start with PF matters. The first sort of prediction that comes to mind relates to work by Cinque (1993), which goes back to Chomsky’s (1972) observations on focus “projections.” Generally speaking, the focus that manifests itself on a (complement) “right branch” may project higher up in the phrase marker, whereas this is not the case for the focus that manifests itself on a (noncomplement) “left branch.” For instance, consider (15).

(15) a. Michaelangelo painted THOSE FRESCOES
     b. MICHAELANGELO painted those frescoes

(15a) can answer several questions: “What did Michaelangelo paint?”, “What did Michaelangelo do?”, and even “What happened?” In contrast, (15b) can only answer the question “Who painted those frescoes?” Why? The architecture discussed here is very consistent with the asymmetry, regardless of the ultimate nature of focal “projection” or spreading (about which I will say nothing). The main contribution that MSO can make to the matter is evident: for this model, focus can only spread within a CU – that is, through a “right branch.” Spreading up a “left branch” would involve moving across two different CUs and hence would be an instance of a “cross-dimensional” communication between different elements.12 There are other phonological domains that conform to this picture, predicting that a pause or a parenthetical phrase will sound natural between subject and predicate, for instance, or between any phrase and its adjuncts.
(16) a. Natural: Michaelangelo . . . painted those frescoes
        Unnatural or emphatic: Michaelangelo painted . . . those frescoes
     b. Natural: Michaelangelo painted those frescoes . . . in Florence
        Unnatural or emphatic, or different in interpretation: Michaelangelo painted . . . those frescoes in Florence

The same results are obtained by replacing the dots in (16) with standard fillers like you know, I’m told, or I’ve heard (see Selkirk 1984). There are interesting complications, too. For example, Kaisse (1985) and Nespor and Vogel (1986) suggest that functional items phonologically associate to the lexical head they govern. Consider the examples in (17), from Lebeaux (1996), where underlined items are phonologically phrased together.

(17) a. John may have seen Mary
     b. the picture of Mary

These sorts of paradigms are compatible with the MSO proposal, although something else must be responsible for the cliticization (note, within a given CU). One can think of harder cases. A particularly difficult one from Galician is mentioned in Uriagereka (1988a).

(18) vimo-los pallasos chegar
     saw.we-the clowns arrive
     “We saw the clowns arrive.”

In this language, determiners surprisingly cliticize to previous, often thematically nonrelated heads; in (18), for instance, the determiner introducing the embedded subject attaches to the verb that takes as internal argument the reduced clause that this subject is part of. The sort of analysis I have given elsewhere (Uriagereka 1988a, 1996), whereby the determiner syntactically moves to the position shown in (18), is contrary to expectations, given my present account of the paradigm in (12). Otero (1996) gives reasons to believe that the determiner cliticization cannot be syntactic, but is instead a late morphophonological process; if my present analysis is correct, then Otero’s general conclusion must also be correct. Otero’s suggestion is in fact compatible with the general MSO architecture, so long as prosodic phrasing is allowed to take place after Spell-out, naturally enough in terms of the edges of “adjacent” (or successively spelled out) cascades of structure (see Otero (1996: 316); Lasnik (1995); and more generally Bobaljik (1995) and references cited there). This is straightforward for the conservative version of MSO, but is possible as well in the radical version, so long as prosody is a unifying mechanism in performance (in the same league as agreement, in the sense above). In fact, the radical version of MSO can rather naturally account for the variant of (18) given in (19).

(19) *vimo-los pallasos chegaren (but OK: vimos os pallasos chegaren)
     saw.we-the clowns arrive.they
     “We saw the clowns arrive.”
A minor change in the form of (18) – introducing agreement in the infinitival, the -en morpheme after chegar “arrive” – makes the determiner cliticization impossible. This can be explained if the cliticization is a form of agreement, in which case the subject of the embedded clause in (19) is forced to agree with two elements at once: the inflected infinitival and the matrix verb. If agreement is indeed an address, as is expected in the radical version of MSO, the kind of duplicity in (19) is unwanted; see the Agreement Criterion below.13 Needless to say, this bird’s-eye view of the problem does little justice to the complex issues involved in prosodic phrasing, not to mention liaison, phrasal stress, pausing, and other related topics. My only intention has been to point out what is perhaps already obvious: within the MSO system, “left branches” should be natural bifurcation points for PF processes, if the present architecture is correct. At the same time, if we find “communication” across the outputs of derivational cascades, the natural thing to do is attribute it to performative (at any rate, post-Spell-out) representations, plausibly under cascade adjacency. Similar issues arise for the LF component, where immediate predictions can be made and rather interesting problems again arise. The general prediction should by now be rather obvious: CUs are natural domains for LF phenomena. This is true for a variety of processes (binding of different sorts, obviation, scopal interactions, negative polarity licensing); it is indeed much harder to find instances of LF processes that do not involve command than otherwise. (Though there are such instances, to which I return.) More importantly, we must observe that CUs are just a subcase of the situations where command emerges. A problematic instance is patent in a now familiar structure, (20). (20)
M J
… N L
K
… H …
…
Although J and H are not part of the same CU (the latter is part of a CU dominated by L), J commands H. Empirically, we want the relation in (20) to hold in cases of antecedence, where J tries to be H’s antecedent (every boy thinks that [his father] hates him). The question is, if J and H are in different “syntactic dimensions” – after L is spelled out – how can J ever relate to H? The logic of the system forces an answer that is worth pursuing: there are aspects of the notion of antecedence that are irreducibly nonderivational. This might mean that antecedence is a semantic or pragmatic notion; either way, we are pushing it out of the architecture seen thus far – at least in part. The hedge is 56
needed because we still want antecedence to be sensitive to command, even if it does not hold within the CUs that determine derivational cascades. As it turns out, the dynamically split system has a bearing on this as well. Essentially, we want to be able to say that J in (20) can be the antecedent of H, but H cannot antecede anything within K. The radical and the conservative versions of the proposal deal with this matter differently, as follows.

For the conservative view, recall that although L in (20) is not a syntactic object after Spell-out, it does have internal structure, and its information is not lost. To see this in detail, suppose that before Spell-out, L had the internal structure of his father, that is, {his, {his, father}}. After Spell-out, the structure becomes {his, ⟨his, father⟩}, equivalent to {his, {{his}, {his, father}}}. By (6b), we can identify {his} and {his, father} as terms (and see Note 4). This is an important fact because, although the linearized object is not a syntactic object, it contains terms, which the system can identify if not operate with: they do not constitute a licit structure.14 The point is, if the relation of antecedence is based on the identification of a term like {his}, even the linearized structure does the job, inaccessible as it is to any further syntactic operation. (This highlights, I believe, the fact that accessibility is not the same as interpretability.) But consider the converse situation, where in spite of H’s being a term in (21), it cannot be the antecedent of L or K.15
M J
…
… H …
N
L
K
This suggests that ’s antecedent be characterized as in (22). (22) Where is a term in a derivational cascade D, a term, is ’s antecedent only if has accessed interpretation in D. This determination of ’s antecedent is derivational, unlike the notion term in (6), which is neutral with respect to whether it is characterized derivationally or not. It is worth noting that (22) is suggested as part of a definition, not of antecedence, but of antecedent of .16 The formal aspects of the notion in (22) are compatible with the system presented thus far, but its substantive character – that only a term that accesses LF in D can be an antecedent of the terms in D – does not follow from the architecture, at least in its conservative shape. Under the radical version of the MSO architecture, the internal structure of the spelled-out phrase is not relevant, since in this instance the phrasal architecture of the syntactic object need not be destroyed (what guarantees inaccessibility is the fact that the phrase has been sent to performance). In turn, this version has an intriguing way of justifying (22). 57
DERIVATIONS
As noted earlier, a problem for the performative approach is how to associate cashed-out structures to the positions where they make the intended sense. Antecedence as a process may be intricately related to this association problem. Simply put, the system ships structure X to the interpretive components; later, it comes up with structure Y, within which X must meaningfully find its place. This presupposes an addressing technique, so that X “knows” where in Y it belongs; by hypothesis, agreement is the relevant technique. It is natural, then, that X as a whole should seek a place within a part of Y. Now consider (22), and let be a term within an active structure Y, and an already cashed-out term (either X itself or part of it). Why should be the antecedent of only if accesses interpretation in Y’s derivational cascade? To answer this question, observe, first of all, that the performative version of MSO makes slightly more sense if the system works “top-down” than if it works “bottom-up.” Chomsky (2000) discusses this type of system and correctly points out that it is perfectly reasonable within present assumptions; Drury (1998) develops one such alternative. The only point that is relevant here is whether a series of noncomplements are sent to performance starting from the root of the phrase marker or from its foot. The logic of the system always forces a given noncomplement to access performance prior to the CU it associates with. Suppose this “top-down” behavior is generalized, so that the first noncomplement after the root of the phrase marker is shipped to performance first, the second noncomplement next, and so on, until finally the remaining structure is cashed out. With regard to antecedence, then, (22) amounts to this: for to be ’s antecedent, must have been sent to performance before – in fact, in a derivational cascade that is “live” through the address mechanism of agreement. In other words, antecedence presupposes agreement, which is very consistent with the well-known diachronic fact that agreement systems are grammaticalizations of antecedence/pronoun relations (see Barlow and Fergusson 1988). The intuition is that agreement is merely a pointer between two phrase markers, one that is gone from the system, and one that is still active in syntactic terms. Material within the cashed-out phrase marker is “out of sight”; the system only sees the unit as a whole for conceptual – intentional reasons (as the label that hooks up the agreement mechanism), and perhaps the phonological edges (under adjacency among cascades) for articulatory – perceptual reasons. Consequently, just as prosodic adjustments can take place only in the visible edges of the cashed-out material, so antecedence can be established only via the visible top of the phrase that establishes agreement with the syntactically active phrase (cf. (21)). The fact that the variable bound by the antecedent can be inside a cashed-out noncomplement (as in (20)) is perfectly reasonable if this variable serves no syntactic purpose vis-à-vis the antecedent. Differently put, whereas the syntactically active part of the structure needs to know where the antecedent is, it does not need to know precisely where the variable is, so long as it is interpretable within the cashed-out structure. This makes sense. The antecedent is a unique element 58
MULTIPLE SPELL-OUT
that determines the referential or quantificational properties of an expression; in contrast, the variable (a) is not unique (many semantic variables can be associated with a given antecedent) and (b) serves no purpose beyond its own direct association to some particular predicate – it determines nothing for other parts of the structure. In sum, if the radical MSO view is correct, antecedence is a semantic process that is paratactically instantiated through the transderivational phenomenon of agreement: the antecedent must syntactically agree (abstractly, of course; the agreement may or may not be realized morphologically with the structure that contains its associated variable). There is no structural restriction on the variable,17 although semantically it will be a successful variable only if it happens to match up with the agreement features of its antecedent.
5 How noncomplements can move Consider next a question we left pending: why (13), repeated here, is perfect. (23) [[which professor] [did you say [t left]]] Let us first evaluate this example from the perspective of the radical version of MSO. Strictly speaking, the phrase which professor is never directly connected to the structure above it – not even to the predicate left. Rather, it agrees with the relevant connecting points, which are presumably occupied by some categorial placeholder [D] (much in the spirit of ideas that go back to Lebeaux 1988). It is [D] that receives a -role, moves to a Case-checking position, and eventually ends up in the wh-site – which must mean that [D] hosts thematic, Case, and wh-information, at least. It is thus not that surprising, from this perspective, that the wh-feature should be accessible to the system even after the spelling out of which professor (wherever it takes place), since what stays accessible is not an element within which professor, but an element the [D] category carries all along, which eventually matches up with the appropriate features of which professor, as in (24). (24) [which professor]i … [[D]i [you say [[D]i left]]] An immediate question is why the “minitext” in (24) is not pronounced as follows: (25) [[D]i [you say [[D]i left]]] … [which professor]i Reasonably, though, this relates again to the phenomenon of antecedence, and in particular the familiarity/novelty condition; in speech, information that sets up a discourse comes before old or anaphoric information (see Hoffman 1996 for essentially this idea). A second question relates to the Condition on Extraction Domain (CED) effect account. Why can (12b) not now be salvaged as in (26a)? Strictly, (26a) cannot be linearized, since the subject of see you is too complex. But suppose we proceed in two steps, as in (26b). 59
DERIVATIONS
(26) a. [who]i … [[a critic of [D]i] see you] b. [who]j … [a critic of [D]j]i … [[D]i see you] There is a minimal, yet important, difference between (26b) and (20), where we do want J to relate to H as its antecedent. Whereas there is a relation of grammar that tries to connect who and [D] (the equivalent of its trace) in (26b), no relation of grammar connects an antecedent J to a bound variable H in (20). In other words, we want the long-distance relation in (26b) to be akin to movement, but clearly not in (20). But how is long-distance movement captured in the radical version of MSO, if we allow associations like the one in (24), where which professor has never been inside the skeletal phrase? The key is the [D] element, which moves to the relevant sites and associates via agreement to whatever phrase has been cashed out. In (26b), the [D] inside a critic of must associate to who (alternatively, if of who is a complement of critic, this element moves out directly – but see Note 14). Now, how does who associate to the matrix C position? If who associates inside a critic of, then it does not associate in the matrix C; conversely, if who associates in the matrix C, as a question operator, then it cannot associate inside a critic of. The only way an element like who can grammatically relate to two or more positions at once – that is, to -, Case, or wh-positions – is if all these positions are syntactically connected, in which case it is the [D] element that moves through them and eventually associates to who. This, of course, is what happens in the perfect (12a), repeated here. (27) [who]i … [[D]i [you see [a critic of [D]i]]] Here again, agreement uniqueness is at play, as I speculatively suggested regarding the ungrammaticality of (19). This important point can be stated explicitly as follows: (28) Agreement Criterion A phrase that determines agreement in a phrase cannot at the same time determine agreement in a phrase . This criterion is tantamount to saying that agreement is a rigidly unique address. It may well be that (28) follows from deeper information – theoretic matters, but I will not pursue that possibility here.18 The conservative version of MSO can also account successfully for (23), although (naturally) with assumptions that do not bear on agreement considerations and instead introduce very different operational mechanics. As before, the issue is to somehow have access to which professor, even though this phrase must also be sent to Spell-out if it attaches as a noncomplement. This statement looks contradictory; but whether it is or not depends on exactly how the details of movement are assumed to work. Consider whether the two steps involved in movement – copying some material and then merging it – must immediately feed one another, within a given derivation. Suppose we assume that move is a collection of operations, as 60
several researchers have recently argued (see, e.g. Kitahara 1994; Nunes 1995). Thus, movement of a complex phrase marker may proceed in several steps – for example, as in (29).
(29) a. Copy one of two independently merged phrases:
[L … ] [K … ], plus a parallel copy [L … ]
b. Spell out the lower copy as trace:
[L [Ø]] [K … ], the parallel copy [L … ] remaining intact
c. Merge the trace:
[N [L [Ø]] [K … ]], the parallel copy [L … ] remaining intact
d. Merge the higher copy (possibly in a separate derivation):
[M [L … ] … [N [L [Ø]] [K … ]]]
The key is the “in parallel” strategy implicit in (29a,b); the rest of the steps are straightforward. So let us see whether those initial steps can be justified. Technically, what takes place in (29a,b) is, at least at first sight, the same as what takes place in the formation of a phrase marker as in (30).
(30) a. Numeration: {the, a, man, saw, woman, …}
b. [the the man]   [saw saw [a a woman]]
Prior to merging [the man] and [saw [a woman]], the system must assemble them in separate, completely parallel derivational spaces; there is no way of
avoiding this, assuming Merge and standard phrasal properties of DPs and VPs. (29a) capitalizes on this possibility; instead of copying lexical items from the numeration, as in (30), in (29a) the system copies the items from the assembled phrase marker. In turn, (29b) employs the option of deleting phonetic material, thus making it unavailable for PF interpretation. It is reasonable to ask why this step is involved, but the question is no different from that posed by Nunes (1999), concerning why the copy of K in (31) is not pronounced when K is moved.
(31) a. [K…[…K…]…]
b. [K…[…[Ø]…]…]
Why is who did you see not pronounced who did you see who? After all, if movement is copying plus deletion, why is deletion necessary, particularly at PF? Nunes’s answer capitalizes on the LCA, by assuming that identical copies are indeed identical. Hence, Kayne’s linearization question has no solution; for instance, in the above example does who command, or is it commanded by, you? It depends on which who we are talking about. One is tempted to treat each of these as a token of a lexical type, but they are not; each who (other than the lexically inserted occurrence) emerges as a result of mere derivational dynamics. Then there is no solution unless, Nunes reasons, the system deletes one of the copies on its way to PF (the place where linearization is required in Chomsky’s system). If only one copy of who is left, the linearization answer is trivial: in standard terms, the remaining copy precedes whatever it commands.19 (29b) has the same justification as Nunes’s copy deletion. Note that if the system does not spell out the lower copy of L as a trace, when it reaches the stage represented in (29d), it will not be able to determine whether L commands or is commanded by all other elements in the phrase marker, and thus this object will not collapse into a valid PF realization.
In effect, then, there is a way to keep something like which professor accessible even if it starts its derivational life as a subject, by making a copy of it in advance and having that copy be the one that merges in the ultimate Spell-out site, the other(s) being spelled out as trace(s). A question remains, however: why can this procedure not provide a gambit for escaping the ungrammaticality of CED effects? For example, what prevents the following grammatical derivation of (26)?
1 Assemble see you and a critic of who.
2 Copy who in parallel.
3 Realize the first copy of who as a trace.
4 Merge a critic of t to see you and all the way up to the C projection.
5 Attach the stored copy of who as the specifier of C.
A way to prevent this unwanted derivation capitalizes on the desire to limit the globality of computational operations, as argued for in Chomsky (2000) and references cited there (see also Chapter 4). Step 2 in the derivation is clearly very
global: at the point of merging who, the system must know that this element will be attracted further up in the phrase marker – in a completely different derivational cascade. Crucially, the system cannot wait until the matrix C appears in order to make a copy (in a parallel derivational space) of who, thereby making the already attached copy of who silent (i.e. a trace); in particular, it cannot simply go back to the site of who and add the instruction to delete after it has abandoned the “cycle” a critic of who, since that operation would be countercyclic. It is literally when the “lower” who attaches that the system must know to take it as a trace (cf. (29)), which entails that the system must have access to the C that attracts who, even when it has not yet left the numeration. Let us again consider all the relevant examples side by side (CUs, boxed in the original, are marked here with labeled brackets [CU … ]; trace copies are parenthesized).
(32) a. which professor [CU C you see a critic of (which professor)]
b. which professor [CU C you say (which professor) left]
c. *which professor C [CU a critic of (which professor)] see you
(32a) is straightforward. Before the movement of which professor, the sentence involves a single CU, within which C trivially attracts the necessary wh-feature; after which professor pied-pipes along with this wh-feature, a new CU emerges, which is of no particular interest. (32b) and (32c) are more complicated, since they involve two CUs prior to wh-movement. The issue is how they differ. We saw earlier that the derivation in (32c) cannot proceed cyclically if it is allowed to go all the way up to the CP level, then to return to the lower which professor and delete it. Rather, at the point when which professor attaches, the system must know that it is being (overtly) attracted by C and hence must copy it in parallel and attach the initial copy as a trace. The same is true of which professor in (32b), but there C and (the entire phrase) which professor are at least part of the same CU, whereas in (32c) (the entire phrase) which professor is part of the CU of a critic of which professor, and C is not. This must be the key: as expected, only elements within the same CU can relate. But then, is (33) still not a problem?
(33) a critic of which professor saw you
(34) a critic of which professor C (a critic of which professor) see you
The difference between (32c) and (33)/(34) is this. In the former, the system must decide to copy which professor as a trace while in the CU of a critic of
which professor. In the latter, it is not which professor but a critic of which professor that is copied as a trace; hence, the system can reach the copying decision while in the CU where C is merged – that is, locally. At the same time, deletion of a critic of which professor is a cyclic process if we define the “cycle” within the confines of a CU that has not been abandoned. To make matters explicit, I state the following principle: (35) Principle of Strict Cyclicity All syntactic operations take place within the derivational cycles of CUs. In other words, the cascades of derivational activity that we have seen all along are responsible for limiting the class of activities the system engages in, in purely operational terms. Cross-cascade relations of any sort – Attract, Move, backtracking for deletion purposes, or presumably any others – are strictly forbidden by (35); a derivation that violates (35) is immediately canceled.
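Nunes’s linearization rationale for copy deletion, invoked around (31) above, lends itself to a small computational illustration. The sketch below is merely a toy under invented assumptions (nested pairs for structures, a head idealized as preceding whatever its sister dominates); it captures only the irreflexivity half of the argument and is not the formalism of this chapter.

```python
# Purely illustrative toy, not this chapter's formalism: phrase markers are
# nested pairs, lexical tokens are strings, and a head is idealized as
# preceding everything its sister dominates (the simplified LCA logic).

def tokens(t):
    return [t] if isinstance(t, str) else [w for sub in t for w in tokens(sub)]

def precedence(t, acc=None):
    acc = set() if acc is None else acc
    if isinstance(t, tuple):
        x, y = t
        if isinstance(x, str):                 # a head precedes its sister's tokens
            acc |= {(x, b) for b in tokens(y)}
        precedence(x, acc)
        precedence(y, acc)
    return acc

def linearizes(t):
    """Irreflexivity check: no item may be required to precede itself."""
    return all(a != b for a, b in precedence(t))

# 'who did you see who': two nondistinct tokens of a single 'who'
with_copies = ("who", ("did", ("you", ("see", "who"))))
print(linearizes(with_copies))   # False: 'who' would have to precede 'who'

# spell out the lower copy as a silent trace, and an order exists
with_trace = ("who", ("did", ("you", ("see", "t"))))
print(linearizes(with_trace))    # True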
6 Beyond derivations

In both versions of MSO, CUs are crucial. This is explicitly encoded in (35), for the conservative view, and is trivially true in the radical view, where only CUs exist in competence grammar – unification of CUs being left for performance. If the present model is anywhere near right, this fact can be used as a wedge to separate various sorts of phenomena; essentially, cyclic ones are syntactic, whereas noncyclic ones are paratactic, or perhaps not syntactic at all. We looked at two of the former: cliticization across CUs in the PF component, and the establishment of antecedent relations in the LF component. Even though these phenomena were suggested not to be strictly derivational, the derivational results were taken to importantly limit the class of possible relations involved in each instance – adjacency of cascades for PF, “top” of CUs for LF – as if syntax carved the path interpretation must blindly follow. But are there situations in which syntax leaves no significant imprint on representational shapes? Presumably that would happen, within present assumptions, whenever a systematic phenomenon simply does not care about command, or even exhibits anticommand behavior. Weak crossover may well be one such instance. I want to suggest the possibility of analyzing a typical weak crossover effect, as in (36b), as a violation of the condition on novelty/familiarity, which I take to be pragmatic.
(36) a. His friend knocked on the door. A man came in.
b. his friend killed a man
The familiar his cannot antecede the novel a man in (36b) any more than it can in (36a). This is so, of course, only if we adopt the null hypothesis that the novelty or familiarity of a given file is assumed not just across separate sentences (36a), but also intrasententially (36b). (This must be the case, virtually by definition, for the radical version of MSO, for which each separate CU is a text.)
In order to extend this sort of analysis to the examples in (37), we must postulate that the operators which trigger weak crossover involve an existence predicate of the sort postulated by Klima and Kuroda in the 1960s (see Chomsky 1964), which gives them a characteristic indefinite or existential character.20 (37) a. his friend killed everyone b. who did his friend kill That is, the logical form of everyone must be as coded in its morphology: every x, one (x). Something similar must be said about who; this would be consistent with the morphological shape of such elements in East Asian languages (see, e.g. Kim 1991; Watanabe 1992; and references cited there). Then the existence element will induce a novelty effect with regard to the familiar pronoun, as desired. The point I am trying to establish is simple. Postsyntactic machinery may be needed to account for some familiar phenomena. The fact that they involve LF representations in the Government-Binding model does not necessarily force us to treat them as LF phenomena in the present system – so long as we treat them somehow (see Chapter 8). I strongly suspect that Condition C of the Binding Theory is another such phenomenon – as are, more generally, matters pertaining to long-distance processes that are extremely difficult to capture in purely syntactic terms (e.g. some kinds of anaphora, unbounded ellipsis under parallelism).
7 Conclusions The system I have programmatically sketched in this chapter is much more dynamically derivational than its alternative in Chomsky (1995c) (although it approximates the one in Chomsky (2000), to the point of being conceptually indistinguishable). That the system is derivational, and that it is dynamically (or cyclically) so, are both interesting ideas in their own right, with a variety of consequences for locality and the class of representations the architecture allows. Curiously, one consequence (best illustrated in Weinberg 1999) is that the gap between competence and performance is partly bridged, radically so in one version of the program. This has a repercussion for competence: it provides a rationale for the existence of agreement.
4 CYCLICITY AND EXTRACTION DOMAINS †
with Jairo Nunes

1 Introduction

If something distinguishes the Minimalist Program of Chomsky (1995b, 2000) from other models within the principles-and-parameters framework, it is the assumption that the language faculty is an optimal solution to legibility conditions imposed by external systems. Under this perspective, a main desideratum of the program is to derive substantive principles from interface (“bare output”) conditions, and formal principles from economy conditions. It is thus natural that part of the minimalist agenda is devoted to reevaluating the theoretical apparatus developed within the principles-and-parameters framework, with the goal of explaining on more solid conceptual grounds the wealth of empirical material uncovered in past decades. This chapter takes some steps toward this goal by deriving Condition-on-Extraction-Domains (CED) effects (in the sense of Huang 1982) in consonance with these general minimalist guidelines.
Within the principles-and-parameters framework, the CED is generally assumed to be a government-based locality condition that restricts movement operations (see Huang 1982 and Chomsky 1986a, for instance). But once the notion of government is abandoned in the Minimalist Program, as it involves nonlocal relations (see Chomsky 1995b: Chapter 3), the data that were accounted for in terms of the CED call for a more principled analysis. Some of the relevant data regarding the CED are illustrated in examples (1)–(3). Example (1) shows that regular extraction out of a subject or an adjunct yields unacceptable results; (2) shows that parasitic gap constructions structurally analogous to (1) are much more acceptable; finally, (3) shows that if the licit parasitic gaps of (2) are further embedded within a CED island such as an adjunct clause, unacceptable results arise again (see Kayne 1984; Contreras 1984; Chomsky 1986a).
(1) a. *[CP [which politician]i [C didQ [IP [pictures of ti] upset the voters]]]
b. *[CP [which paper]i [C didQ [IP you read Don Quixote [PP before filing ti]]]]
(2) a. [CP [which politician]i [C didQ [IP [pictures of pgi] upset ti]]]
b. [CP [which paper]i [C didQ [IP you read ti [PP before filing pgi]]]]
(3) a. *[CP [which politician]i [C didQ [IP you criticize ti [PP before [pictures of pgi] upset the voters]]]]
b. *[CP [which book]i [C didQ [IP you finally read ti [PP after leaving the bookstore [PP without finding pgi]]]]]
Thus far, the major locality condition explored in the Minimalist Program is the Minimal Link Condition stated in (4) (see Chomsky 1995b: 311).
(4) Minimal Link Condition
K attracts α only if there is no β, closer to K than α, such that K attracts β.
The unacceptability of (5a), for instance, is taken to follow from a Minimal Link Condition violation: at the derivational step represented in (5b), the interrogative complementizer Q should have attracted the closest wh-element who, instead of attracting the more distant what.
(5) a. *[I wonder [CP whati [C Q [IP who [VP bought ti]]]]]
b. [CP Q [IP who [VP bought what]]]
The Minimal Link Condition is in consonance with the general economy considerations underlying minimalism, in that it reduces the search space for computations, thereby reducing (“operative”) computational complexity. However, it has nothing to say about CED effects such as the ones illustrated in (1)–(3). In (1a), for instance, there is no wh-element other than which politician that Q could have attracted.
In this chapter we argue, first, that CED effects arise when a syntactic object that is required at a given derivational step has become inaccessible to the computational system at a previous derivational stage; and second, that the contrasts between (1) and (2), on the one hand, and between (2) and (3), on the other, are due to their different derivational histories. These results arise as by-products of two independent lines of research on the role of Kayne’s (1994) Linear Correspondence Axiom (LCA) in the minimalist framework: the Multiple Spell-Out system of Chapter 3, which derives the induction step of the LCA by eliminating the unmotivated stipulation that Spell-out must apply only once, and Nunes’s (1995, 1998) version of the copy theory of movement, which permits instances of sideward movement (i.e. movement between two unconnected syntactic objects) if the LCA is satisfied.
The chapter is organized as follows. In Section 2, we show how the standard CED effects illustrated in (1) can be accounted for within the Multiple Spell-Out theory proposed in the previous chapter. In Section 3, we show that sideward movement allows constrained instances of movement from CED islands, resulting in parasitic gap constructions such as (2). In Section 4, we provide an account of the unacceptability of constructions such as (3) by reducing the computational complexity associated with sideward movement in terms of Chomsky’s (2000) cyclic access to subarrays. Finally, a brief conclusion is presented in Section 5.
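Since the Minimal Link Condition in (4) is, at bottom, a closest-match search, it can be rendered as a toy computation. The following sketch is purely illustrative: the tuple encoding of candidates, the depth measure, and every name in it are inventions of this illustration rather than part of the chapter’s proposal.

```python
# A toy rendering of the closest-match logic behind (4); the candidate
# encoding and all names here are invented for illustration only.

def attract(probe_feature, candidates):
    """Return the closest candidate bearing the probed feature.

    candidates: (word, features, depth) triples, with depth standing in
    for 'distance from the attracting head K'.
    """
    matching = [c for c in candidates if probe_feature in c[1]]
    return min(matching, key=lambda c: c[2]) if matching else None

# (5b): Q probes [CP Q [IP who [VP bought what]]] for a wh-feature.
candidates = [("who", {"wh"}, 1), ("what", {"wh"}, 2)]
print(attract("wh", candidates)[0])   # 'who' -- attracting the more distant
                                      # 'what' is the violation seen in (5a)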
2 Basic CED effects

Any account of the CED has to make a principled distinction between complements and noncomplements (see Cattell 1976 for early, very useful discussion). Kayne’s (1994) LCA has the desired effect: a given head can be directly linearized with respect to the lexical items within its complement, but not with respect to the lexical items within its subject or adjunct. The reason is trivial. Consider the phrase-marker in (6), for instance (irrelevant details omitted).
(6)
[VP [DP the man] [V′ [V′ remained [AP proud of her]] [PP after that fact]]]
(boldface in the original marks the terminals remained, proud, of, her)
It is a simple fact about the Merge operation that only the terminal elements in boldface in (6) can be assembled without ever abandoning a single derivational workspace; by contrast, the terminal elements under DP and PP must first be assembled in a separate derivational space before being connected to the rest. One can capitalize on this derivational fact in various ways. Let us recast Kayne’s (1994) LCA in terms of Chomsky’s (1995b) bare phrase-structure and simplify its definition by eliminating the recursive step, as formulated in (7).1
(7) Linear Correspondence Axiom
A lexical item α precedes a lexical item β iff α asymmetrically c-commands β.
Clearly, all the terminals in boldface in (6) stand in valid precedence relations, according to (7). The question is how they can establish precedence relations with the terminals within DP and PP, if the LCA is as simple as (7). Chapter 3 suggests an answer, by taking the number of applications of the rule of Spell-out to be determined by standard economy considerations, and not by the unmotivated stipulation that Spell-out must apply only once. Here we will focus our attention on cases where multiple applications of Spell-out are triggered by linearization considerations (see Chapter 5 for other cases and further discussion). The reasoning goes as follows. Let us refer to the operation that maps a phrase structure into a linear order of terminals in accordance with the LCA in (7) as Linearize.2 Under the standard assumption that phrasal syntactic objects are not legitimate objects at the PF level, Linearize can be viewed as an operation imposed on the phonological component by legibility requirements of the Articulatory–Perceptual interface, as essentially argued by Higginbotham (1983b). If this is so and if the LCA is as simple as (7), the
computational system should not ship complex structures such as (6) to the phonological component by means of the Spell-out operation, because Linearize would not be able to determine precedence relations among all the lexical items. Assuming that failure to yield a total order among lexical items leads to an ill-formed derivation, the system is forced to employ multiple applications of Spell-out, targeting chunks of structure that Linearize can operate with. Under this view, the elements in subject and adjunct position in (6) can be linearized with regard to the rest of the structure in accordance with (7) in the following way: (i) the DP and the PP are spelled out separately and, in the phonological component, their lexical items are linearized internal to them; and (ii) the DP and the PP are later “plugged in” where they belong in the whole structure. We assume that the label of a given structure provides the “address” for the appropriate plugging in, in both the phonological and the interpretive components.3 That is, applied to the syntactic object K = {γ, {α, β}}, with label γ and constituents α and β (see Chomsky 1995b: Chapter 4), Spell-out ships {α, β} to the phonological and interpretive components, leaving K only with its label. Since the label encodes the relevant pieces of information that allow a category to undergo syntactic operations, K itself is still accessible to the computational system, despite the fact that its constituent parts are, in a sense, gone; thus, for instance, K can move and is visible to linearization when the whole structure is spelled out. Another way to put it is to say that once the constituent parts of K are gone, the computational system treats it as a lexical item. In order to facilitate keeping track of the computations in the following discussion, we use the notation K = [γ α, β] to represent K after it has been spelled out.
An interesting consequence of this proposal is that Multiple Spell-Out of separate derivational cascades derives Cattell’s (1976) original observation that only complements are transparent to movement. When Spell-out applies to the subject DP in (6), for instance, the computational system no longer has access to its constituents and, therefore, no element can be extracted out of it. Let us consider a concrete case, by examining the relevant details of the derivation of (8), after the stage where the structures K and L in (9) have been assembled by successive applications of Merge.
(8) *Which politician did pictures of upset the voters?
(9) a. K = [vP upset the voters]
b. L = [pictures of which politician]
If the LCA is as simple as in (7), the complex syntactic object resulting from the merger of K and L in (9) would not be linearizable, because the constituents of K would not enter into a c-command relation with the constituents of L. The computational system then applies Spell-out to L, allowing its constituents to be linearized in the phonological component, and merges the spelled-out structure L′ with K, as illustrated in (10).4
(10) [vP [pictures pictures, of, which, politician] [v′ v [VP upset the voters]]]
Further computations involve the merger of did and movement of L′ to [Spec, TP]. Assuming Chomsky’s (1995b: Chapter 3) copy theory of movement, this amounts to saying that the computational system copies L′ and merges it with the assembled structure, yielding the structure in (11) (the deletion of the lower copy in the phonological component is discussed in Section 3).
(11) [TP [pictures pictures, of, which, politician] [T did [vP [pictures pictures, of, which, politician] [v upset the voters]]]]
In the next steps, the interrogative complementizer Q merges with TP and did adjoins to it, yielding (12).
(12) [CP didQ [TP [pictures pictures, of, which, politician] [T did [vP [pictures pictures, of, which, politician] [v upset the voters]]]]]
In (12), there is no element that can check the strong wh-feature of Q. Crucially, the wh-element of either copy of L′ = [pictures pictures, of, which, politician] became unavailable to the computational system after L was spelled out. The derivation therefore crashes. Under this view, there is no way for the computational system to yield the sentence in (8) if derivations unfold in a strictly cyclic fashion, as we are assuming. To put it in more general terms, extraction out of a subject is prohibited because, at the relevant derivational point, there is literally no syntactic object within the subject that could be copied.
Similar considerations apply to the sentence in (13), which illustrates the impossibility of “extraction” out of an adjunct clause.
(13) *Which paper did you read Don Quixote before filing?
Assume for concreteness that the temporal adjunct clause of (13) is adjoined to vP. Once K and L in (14) have been assembled, Spell-out must apply to L, before K and L merge; otherwise, the lexical items of K could not be linearized with respect to the lexical items of L. After L is spelled out as L′, it merges with K, yielding (15). In the phonological component, Linearize applies to the lexical items of L and the resulting sequence will be later plugged in the appropriate place, after the whole structure is spelled out. The linear order between the lexical items of L and the lexical items of K will then be (indirectly) determined by whatever fixes the order of adjuncts in the grammar.5
(14) a. K = [vP you read Don Quixote]
b. L = [PP before PRO filing which paper]
(15) [vP [vP you read Don Quixote] [before before, PRO, filing, which, paper]]
What is relevant for our current discussion is that after the (simplified) structure in (16) is formed, there is no wh-element available to check the strong wh-feature of Q and the derivation crashes; in particular, which paper is no longer accessible to the computational system at the step where it should be copied to check the strong feature of Q. As before, the sentence in (13) is underivable through the cyclic derivation outlined in (14)–(16).
(16) [CP didQ [TP you [vP [vP read Don Quixote] [before before, PRO, filing, which, paper]]]]
Finally, let us consider (17a). Structures like (17a) have recently been taken to show that cyclicity cannot be violated. If movement of who to [Spec, CP] were allowed to proceed prior to the movement of α to the subject position, (17a) should pattern like (17b), where who is extracted from within the object, contrary to fact. If cyclicity is inviolable, so the argument goes, who in (17a) must have moved from within the subject, yielding a CED effect (see Chomsky 1995b: 328; Kitahara 1997: 33).
(17) a. *whoi was [α a picture of ti]k taken tk by Bill
b. whoi did Bill take [α a picture of ti]
A closer examination of this reasoning, however, reveals that it only goes through in a system that takes traces to be grammatical primitives. If the trace of α in (17a) is simply a copy of α, as shown in (18), the copy of who inside the object should in principle be able to move to [Spec, CP], incorrectly yielding an acceptable result. Crucially, the copy of who within the subject does not c-command the copy within the object and no intervention effect should arise.
(18) [CP Q [TP [α a picture of who] was taken [α a picture of who] by Bill]]
Before we discuss how the system we have been exploring, which assumes the copy theory of movement, is able to account for the unacceptability of (17a), let us first consider the derivation of (19), where no wh-movement is involved.
(19) Some pictures of John were taken by Bill.
In (20), the computational system makes a copy of some pictures of John, spells it out and merges the spelled-out copy with K, forming the object in (21).
(20) a. K = [TP were [VP taken [some pictures of John] by Bill]]
b. L = [some some, pictures, of, John]
(21) [TP [some some, pictures, of, John] [T were [VP taken [some pictures of John] by Bill]]]
Under reasonable assumptions regarding chain uniformity, the elements in subject and object positions in (21) cannot constitute a chain because they are simply different kinds of syntactic objects (a label and a phrasal syntactic object). Assume for the moment that lack of chain formation in (21) leads to a derivational crash (see next section for further discussion). Given the perfect acceptability of (19), an alternative route must be available. Recall that under the Multiple Spell-Out approach, the number of applications of Spell-out is determined by economy. Thus, complements in general do not need to be spelled out in separate derivational cascades because they can be linearized within the derivational cascade involving the subcategorizing verb – that is, a single application of Spell-out can linearize both the verb and its complement. In the case of (21), however, a licit chain can only arise if the NP in the object position has been independently spelled out, so that the two copies can constitute a chain. This leads us to conclude that convergence demands may force Spell-out to apply to complements, as well. That being so, the question then is whether the object is spelled out in (20a) before copying takes place or only after the structure in (21) has been assembled. Again, we may find the answer in economy: if Spell-out applies to some pictures of John before it is copied, the copies will be already spelled out and no further applications of Spell-out will be required for the copies.6 The derivation of (19) therefore proceeds along the lines of (22): the NP is spelled out before being copied in (22a) and its copy merges with the whole structure, as shown in (22b); the two copies of the NP can then form a licit chain and the derivation converges.
(22) a. [TP were [VP taken [some some, pictures, of, John] by Bill]]
b. [TP [some some, pictures, of, John] [T were [VP taken [some some, pictures, of, John] by Bill]]]
Returning to (17a), its derivation proceeds in a cyclic fashion along the same lines, yielding the (simplified) structure in (23). Once the stage in (23) is reached, no possible continuation results in a convergent derivation: the strong wh-feature of Q must be checked and neither copy of who is accessible to the computational system. The approach we have been exploring here is therefore able to account for the unacceptability of (17a), while still adhering to the view that traces are simply copies and not grammatical formatives.
(23) [CP wasQ [TP [a a, picture, of, who] [VP taken [a a, picture, of, who] by Bill]]]
To summarize, CED effects arise when a given syntactic object K that would be needed for computations at a derivational stage Dn has been spelled out at a derivational stage Di prior to Dn, thereby becoming inaccessible to the computational system after Di. Under this view, the CED is not a primitive condition on movement operations; it rather presents itself as a natural consequence in a derivational system that obeys strict cyclicity and takes general economy considerations to determine the number of applications of Spell-out.7
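The derivational logic just summarized, whereby Linearize demands a total order and a complex noncomplement becomes orderable only once Spell-out has collapsed it into an atom-like item, can be illustrated with a minimal sketch. Everything below is invented for the purpose (nested tuples for phrase markers, an underscore-glued token for the spelled-out chunk); it implements only the simplified LCA idea in (7), with a head taken to precede whatever its sister dominates.

```python
# Invented illustration, not the authors' implementation.

def leaves(t):
    return [t] if isinstance(t, str) else [w for sub in t for w in leaves(sub)]

def order_pairs(t, acc=None):
    acc = set() if acc is None else acc
    if isinstance(t, tuple):
        x, y = t
        if isinstance(x, str):                # a head precedes its sister's leaves
            acc |= {(x, b) for b in leaves(y)}
        order_pairs(x, acc)
        order_pairs(y, acc)
    return acc

def linearizable(t):
    """Linearize succeeds only if the pairs totally order the leaves."""
    ws, ps = leaves(t), order_pairs(t)
    return all((a, b) in ps or (b, a) in ps
               for i, a in enumerate(ws) for b in ws[i + 1:])

subject = ("pictures", ("of", ("which", "politician")))
spine = ("upset", ("the", "voters"))

# No head relates the subject-internal items to the spine: no total order.
print(linearizable((subject, spine)))                         # False

# Once the subject is spelled out, it re-enters as one atom-like item whose
# internal order was fixed in its own derivational cascade.
print(linearizable(("pictures_of_which_politician", spine)))  # True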
The question that we now face is how to explain the complex behavior of parasitic gap constructions with respect to the CED, as seen in the introduction, if the deduction of the CED developed above is correct. This is the topic of the next sections. Notice, for instance, that we cannot simply assume that parasitic gap constructions bypass some condition X that regular extractions obey; in fact, we are suggesting that there is no particular condition X to prevent extraction and, therefore, no way to bypass it either. Before going into the analysis proper, we briefly review Nunes’s (1995, 1998) analysis of parasitic gaps in terms of sideward movement, which provides us with the relevant ingredients to address the issue of CED effects in parasitic gap constructions.
3 Sideward movement and CED effects

With the incorporation of the copy theory into the Minimalist Program, Move has been conceived of as a complex operation encompassing: (i) a suboperation of copying; (ii) a suboperation of merger; (iii) a procedure identifying copies as chains; and (iv) a suboperation deleting traces (lower copies) for PF purposes (see Chomsky 1995b: 250). Nunes (1995, 1998) develops an alternative version of the copy theory of movement with two main distinctive features. First, his theory takes deletion of traces in the phonological component to be prompted by linearization considerations. Take the structure in (24b), for instance, which is based on the (simplified) initial numeration N in (24a) and arises after John moves to the subject position.
(24) a. N = {arrested1, John1, was1}
b. [Johni [was [arrested Johni]]]
The two occurrences of John in (24b) are nondistinct copies (henceforth represented by superscripted indices) in the sense that both of them arise from the same item within N in (24a). If nondistinct copies are truly “the same” for purposes of linearization, (24b) cannot be mapped into a linear order.8 Given that the verb was, for instance, asymmetrically c-commands the lower copy of John and is asymmetrically c-commanded by the higher copy, the LCA should require that was precede and be preceded by John, violating the asymmetry condition on linear orders (if α precedes β, it must be the case that β does not precede α). The attempted linearization of (24b) also violates the irreflexivity condition on linear orders (if α precedes β, it must be the case that α ≠ β); since the upper copy of John asymmetrically c-commands the lower one, John would be required to precede itself. Simply put, deletion of traces in the phonological component is forced upon a given chain CH in order for the structure containing CH to be linearized.9
The second distinctive feature of Nunes’s (1995, 1998) version of the copy theory, which is crucial for the following discussion, is that Move is not taken to be a primitive operation of the computational system; it is rather analyzed as the mere reflex of the interaction among the independent operations described in (i)–(iv) above. In particular, this system allows constrained instances of
sideward movement, where the computational system copies a given constituent α of a syntactic object K and merges α with a syntactic object L, which has been independently assembled and is unconnected to K, as illustrated in (25).10
(25) a. [K … αi …] → Copy αi → Merge αi with [L …]
b. [K … αi …]   [M αi [L …]]
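The schema in (25) decomposes Move into independently applying suboperations, and that decomposition is easy to mimic computationally. The following sketch is purely illustrative; the list encoding and all names are invented here, and the workspaces are deliberately simplified stand-ins for real derivational objects.

```python
# An invented sketch of (25): Copy and Merge as independent operations over
# unconnected workspaces. Nested lists stand in for phrase markers.

def copy_term(workspace, target):
    """Return a copy of a subterm found in a workspace, leaving the
    original in place, as the copy theory of movement would have it."""
    if workspace == target:
        return list(target)
    if isinstance(workspace, list):
        for sub in workspace:
            found = copy_term(sub, target)
            if found is not None:
                return found
    return None

def merge(a, b):
    return [a, b]

K = ["after", ["reading", ["which", "paper"]]]   # one workspace
L = "file"                                       # another, unconnected one

alpha = copy_term(K, ["which", "paper"])   # Copy alpha from K ...
M = merge(L, alpha)                        # ... and Merge it with L
print(K)   # ['after', ['reading', ['which', 'paper']]] -- K is unaffected
print(M)   # ['file', ['which', 'paper']] -- a second, independent copy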
Let us consider how a parasitic gap construction such as (26a) can be derived under a sideward movement analysis, assuming that its initial numeration is the one given in (26b) (irrelevant items were omitted).
(26) a. Which paper did John file after reading?
b. N = {which1, paper1, did1, John1, PRO1, Q1, file1, after1, reading1, v2, C1}
(27) shows the step after the numeration N in (26b) has been reduced to N′ and K has been assembled. Following Munn (1994) and Hornstein (2001), we assume that what Chomsky (1986a) took to be null operator movement in parasitic gap constructions is actually movement of a syntactic object built from the lexical items of the numeration. From the perspective we are exploring, that amounts to saying that the computational system spells out which paper in (27b), makes a copy of the spelled-out object (see Note 6), and merges it with K to check whatever feature is involved in successive cyclic A′-movement, yielding L in (28a). The computational system then selects the preposition after and merges it with L, forming the PP in (28b).
(27) a. N′ = {which0, paper0, did1, John1, PRO0, Q1, file1, after1, reading0, v1, C0}
b. K = [CP C PRO reading [which paper]]
(28) a. L = [CP [which which, paper]i C PRO reading [which which, paper]i]
b. M = [PP after [CP [which which, paper]i C PRO reading [which which, paper]i]]
Consider now the stage after file is selected from the numeration, as shown in (29). Following Chomsky (2000), we assume that the selectional/thematic properties of file must be checked under Merge. However, possible continuations of the derivational step in (29) that merge file with the remaining elements of the reduced numeration N′ in (27a) do not lead to a convergent derivation; under standard assumptions, John should not be able to enter into a θ-relation with both file and the remaining light verb, or check both the accusative Case associated with the light verb and the nominative Case associated with did. Once lexical insertion leads to crashing, the system must resort to (sideward)
movement, copying which paper from L and merging it with file, as shown in (30).11 The wh-copy in (30b) may then “mind its own business” within derivational workspace P, independently of the other copies inside M. This is the essence of the account of parasitic gaps in terms of sideward movement.
(29) a. M = [PP after [CP [which which, paper]i C PRO reading [which which, paper]i]]
b. O = file
(30) a. M = [PP after [CP [which which, paper]i C PRO reading [which which, paper]i]]
b. P = [VP file [which which, paper]i]
It is important to note that sideward movement of [which which, paper] in (29)–(30) was possible because M had not been spelled out; hence, the computational system had access not only to M itself, but also to the constituents of M. The situation changes in subsequent derivational steps. As discussed in Section 2, a complex adjunct must be spelled out before it merges with a given syntactic object; hence, the computational system spells out M as M′ in (31a) and merges M′ with the matrix vP, as represented in (31b).
(31) a. M′ = [after after, [which which, paper]i, C, PRO, reading, [which which, paper]i]
b. [vP [VP John file [which which, paper]i] [M′ after after, [which which, paper]i, C, PRO, reading, [which which, paper]i]]
Further computations involve lexical insertion of the remaining items of the numeration and movement of John and did, resulting in the (simplified) structure represented in (32).
(32) [CP didQ [IP John [vP [vP file [which which, paper]i] [after after, [which which, paper]i, C, PRO, reading, [which which, paper]i]]]]
The copies of [which which, paper] inside the adjunct clause in (32) are not available for copying, because the whole adjunct clause has already been spelled out; however, the copy in the object of file is still available to the computational system and, therefore, it can move to check the strong wh-feature of Q, yielding the (simplified) structure in (33), where the copies are numbered for ease of reference.
(33) [CP [which which, paper]1 [C′ didQ [TP John [T′ T [vP [vP file [which which, paper]2] [M′ after after, [which which, paper]3, C, PRO, reading, [which which, paper]4]]]]]]
Let us now focus on the computations related to the deletion of wh-traces of (33) in the phonological component. As discussed before, the presence of multiple nondistinct copies prevents linearization. In the phonological component, the trace of the wh-chain within M′ is then deleted before Linearize applies to M′ to yield M″, as shown in (34).
(34) M″ = [after after, [which which, paper]3, C, PRO, reading, [which which, paper]4]
After Spell-out applies to the whole structure in (33) and the previously spelled-out material is appropriately plugged in, two wh-chains should be further identified for trace deletion to take place: the “regular” chain CH1 = (copy1, copy2) and the “parasitic” chain CH2 = (copy1, copy3).12 Identification of CH1 is trivial because copy1 clearly c-commands copy2; hence, deletion of copy2 is without problems. Identification of CH2 is less obvious, because M″ is no longer a phrase-structure after being linearized. However, if c-command is obtained by the composition of the elementary relations of sisterhood and containment, as proposed by Chomsky (2000: 31) (see also Epstein 1999), copy1 does c-command copy3 in (33), because the sister of copy1, namely C′, ends up containing copy3 after the linearized material of M″ is properly plugged in.13 The phonological component then deletes copy3, yielding (35). Finally, Linearize applies to (35) and the PF output associated with (26a) is derived.14
(35) [CP [which which, paper]1 didQ [IP John [vP [vP file [which which, paper]2] [after after, [which which, paper]3, C, PRO, reading, [which which, paper]4]]]]
Assuming that derivations proceed in such a strictly cyclic fashion, the contrast between unacceptable constructions involving “extraction” from within an adjunct island such as (13) and parasitic gap constructions such as (26a), therefore, follows from their different derivational histories. In the unacceptable case, the clausal adjunct has already been spelled out and its constituents are no longer available for copying at the derivational step where Last Resort would license the required copying (see Section 2). In the acceptable parasitic gap constructions, on the other hand, a legitimate instance of copying takes place before the clausal adjunct is spelled out (see (29)–(30)); that is, sideward movement, if appropriately constrained by Last Resort, provides a kind of escape hatch for movement from within adjuncts.15
Similar considerations apply to parasitic gaps inside subjects. Let us consider the derivation of (36a), for instance, which starts with the numeration N in (36b).
(36) a. Which politician did pictures of upset?
b. N = {which1, politician1, did1, pictures1, of1, upset1, Q1, v1}
Suppose that after the derivational step in (37) is reached, K and L merge. No convergent result would then arise, because there would be no element in the numeration N′ in (37a) to receive the external θ-role assigned by the light verb to be later introduced; in addition, if either K or the wh-phrase within K moved to [Spec, vP], they would be involved in more than one θ-relation within the same derivational workspace, leading to a violation of the θ-Criterion.16
(37) a. N′ = {which0, politician0, did1, pictures0, of0, upset0, Q1, v1}
b. K = [pictures of [which politician]]
c. L = upset
The computational system may instead spell out the wh-phrase, make a copy of the spelled-out object, and merge it with upset (an instance of sideward movement), as shown in (38). Each copy of which politician in (38) will now participate in a θ-relation, but in a different derivational workspace, as in (30).
(38) a. K = [pictures of [which which, politician]i]
b. M = [upset [which which, politician]i]
In the next steps, the light verb is selected from the numeration N′ in (37a) and merges with M in (38b), and the resulting structure merges with K after K is spelled out, yielding the (simplified) structure in (39). Further computations then involve merger and movement of did, and movement of the spelled-out subject to [Spec, TP], forming the (simplified) structure in (40).
(39) [vP [pictures pictures, of, [which which, politician]i] [v upset [which which, politician]i]]
(40) [CP didQ [TP [pictures pictures, of, [which which, politician]i]k T [vP [pictures pictures, of, [which which, politician]i]k [v upset [which which, politician]i]]]]
Among the three copies of which politician represented in (40), only the one in the object position of upset is available for copying; the other two became inaccessible after K in (37) was spelled out. The computational system then makes a
copy of the accessible wh-element and merges it with the structure in (40), allowing Q to have its strong feature checked and finally yielding the structure in (41).
(41) [CP [which which, politician]1 [C′ didQ [TP [pictures pictures, of, [which which, politician]2]k [T′ T [vP [pictures pictures, of, [which which, politician]3]k [v′ upset [which which, politician]4]]]]]]
In the phonological component, deletion of the trace of the chain involving [Spec, TP] and [Spec, vP] in (41) ends up deleting copy3, because copy3 sits within [Spec, vP]. As for the other wh-copies, since copy1 c-commands both copy2 and copy4 after the linearized material is plugged in (see discussion above), the chains CH1 = (copy1, copy2) and CH2 = (copy1, copy4) can be identified and their traces are deleted, yielding (42) below.17 (42) is then linearized and surfaces as (36a). Again, an apparent extraction from within a subject was only possible because Last Resort licensed sideward movement before the computational system spelled out the would-be subject.
(42) [CP [which which, politician]1 didQ [TP [pictures pictures, of, [which which, politician]2]k T [vP [pictures pictures, of, [which which, politician]3]k [v upset [which which, politician]4]]]]
Although sideward movement may permit circumvention of CED islands in the cases discussed above, its output is constrained by linearization, like any standard instance of upward movement. That is, the same linearization considerations that trigger deletion of traces are responsible for ruling out unwanted instances of sideward movement (see Nunes 1995, 1998 for discussion). Take the derivation sketched in (43)–(45), for instance, where every paper is spelled out and undergoes sideward movement from K to L. As is, the final structure in (44) cannot be linearized: given that the two instances of every paper are nondistinct, the preposition after, for instance, is subject to the contradictory requirement that it should precede and be preceded by every paper. In the cases discussed thus far, this kind of problem is remedied by trace deletion (deletion of lower chain links). However, trace deletion is inapplicable in (44); given that the two copies do not enter into a c-command relation, they cannot be identified
as a chain.18 Thus, there is no convergent result arising from (44) and the parasitic gap construction in (45) is correctly ruled out.
(43) a. K = [PP after reading [every every, paper]i]
b. L = [VP filed [every every, paper]i]
(44) [TP John [vP [vP filed [every every, paper]i] [after after, reading, [every every, paper]i]]]
(45) *John filed every paper after reading.
To sum up, the analysis explored above is very much in consonance with minimalist guidelines in that it attempts to deduce construction-specific properties from general bare output conditions (more precisely, PF linearization); it limits the search space for deletion of copies (it can only happen within a c-command path), and it does not resort to the non-interface level of S-Structure to rule out (45), like standard GB analyses do (see Chomsky 1982, for instance).19 With respect to the main topic of this chapter, the lack of CED effects in acceptable parasitic gaps is argued to follow from the fact that Last Resort may license sideward movement from within a complex category XP, before XP is spelled out and its constituents become inaccessible to the Copy operation. In the next section, we will see that when parasitic gap constructions do exhibit CED effects, this is due to general properties of the system’s design, which strives to reduce computational complexity.
4 Sideward movement and cyclic access to the numeration

Let us finally examine the unacceptable parasitic gap constructions in (46), which illustrate the fact that parasitic gaps are not completely immune to CED effects.
(46) a. *Which book did you finally read after leaving the bookstore without finding?
b. *Which politician did you criticize before pictures of upset the voters?
Under one derivational route, the explanation for the unacceptability of the sentences in (46) is straightforward. The PP adjunct headed by without in (46a), for instance, must be spelled out before merging with the vP related to leaving, as represented in the simplified structure in (47a) below; hence, the constituents of this PP adjunct are not accessible to the computational system and sideward movement of which book from K to L is impossible. Likewise, sideward movement of which politician from X in (48a) to Y in (48b) cannot take place because the subject in (48a) has been spelled out and its constituent terms are inaccessible for copying; hence, the unacceptability of (46b).
(47) a. K = [leaving the bookstore [without without, PRO, finding, which, book]]
b. L = read
(48) a. X = [IP [pictures pictures, of, which, politician] upset the voters]
b. Y = criticize
This account of the unacceptability of the parasitic gap constructions in (46) has crucially assumed that the computation proceeds from a “subordinated” to a “subordinating” derivational workspace; in all the cases discussed so far, sideward movement has proceeded from within an adjunct or subject to the object position of a subordinating verb. This assumption is by no means innocent. In principle, the computational system could also allow sideward movement to proceed from a “subordinating” to a “subordinated” derivational workspace, while still adhering to cyclicity. Suppose, for instance, that we assemble the matrix VP of (46a), before building the VP headed by finding, as represented in (49).
(49) a. K = [read [which book]]
b. L = finding
Given the stage in (49), which book could undergo sideward movement from K to L, and M in (50b) would be formed (irrelevant details omitted). Further computations after M was spelled out and merged with K would then yield the (simplified) structure in (51).
(50) a. K = [read [which which, book]i]
b. M = [after PRO leaving the bookstore [without without, PRO, finding, [which which, book]i]]
(51)
[CP didQ [TP you [T′ T [vP [vP read [which which, book]i] [PP after after, PRO, leaving, the, bookstore, [without without, PRO, finding, [which which, book]i]]]]]]
The relevant aspect of (51) is that, although the wh-copy inside PP is not accessible to the computational system, the wh-copy in the object position of read is. It could then move to check the strong feature of Q and deletion of the lower wh-copies would yield the (simplified) structure in (52), which should surface as (46a).
(52) [CP [which which, book]i didQ [TP you [vP [vP read [which which, book]i] [after after, PRO, leaving, the, bookstore, [without without, PRO, finding, [which which, book]i]]]]]
Thus, if sideward movement were allowed to proceed along the lines of (49)–(50), where a given constituent moves from a derivational workspace W1 to a derivational workspace W2 that will end up being embedded under W1, there should never be any CED effect in parasitic gap constructions and we would incorrectly predict that (46a) should be acceptable.
Similar considerations apply to the alternative derivation of (46b) sketched in (53)–(56) below. In (53)–(54), which politician moves from the object position of criticize to the complement position of the preposition; further (cyclic) computations then yield the (simplified) structure in (55), in which the wh-copy in the matrix object position is still accessible to the computational system, thus being able to move and check the strong feature of Q. After this movement takes place, the whole structure is spelled out and the lower copies of which politician are deleted in the phonological component, as shown in (56). The derivation outlined in (53)–(56) therefore incorrectly rules in the unacceptable parasitic gap in (46b).
(53) a. X = [criticize [which politician]]
b. Y = of
(54) a. X = [criticize [which which, politician]i]
b. Z = [of [which which, politician]i]
(55)
[CP didQ [TP you [T′ T [vP [vP criticize [which which, politician]i] [PP before before, [pictures pictures, of, [which which, politician]i], upset, the, voters]]]]]
(56) [CP [which which, politician]i didQ [TP you [vP [vP criticize [which which, politician]i] [before before, [pictures pictures, of, [which which, politician]i], upset, the, voters]]]]
The generalization that arises from the discussion above is that sideward movement from a derivational workspace W1 to a derivational workspace W2 yields licit results just in case W1 will be embedded in W2 at some derivational step. In
the undesirable derivations sketched in (49)–(52) and (53)–(56), sideward movement has proceeded from the “matrix derivational workspace” to a subordinated one. Obviously, the question is how this generalization can be derived from independent considerations.
Abstractly, the problem we face here is no different from the one posed by economy computations involving expletive insertion in pairs such as (57), originally noted by Alec Marantz and Juan Romero. The two sentences in (57) share the same initial numeration; thus, if the computational system had access to the whole numeration, economy should favor insertion of there at the point where the structure in (58) has been assembled, incorrectly ruling out the derivation of the acceptable sentence in (57b).
(57) a. The fact is that there is someone in the room.
b. There is the fact that someone is in the room.
(58) [is someone in the room]
Addressing this and other similar issues, Chomsky (2000) proposes that rather than working with the numeration as a whole, the computational system actually works with subarrays of the numeration, each containing one instance of either a complementizer or a light verb. Furthermore, according to Chomsky’s 2000 proposal, when a new subarray SAi is selected, the vP or CP previously assembled based on subarray SAk becomes frozen in the sense that no more checking or thematic relations may take place within it. Returning to the possibilities in (57), at the point where (58) is assembled, competition between insertion of there and movement of someone arises only if the active subarray feeding the derivation has an occurrence of the expletive; if it does not, as is the case of (57b), movement is the only option and the expletive is inserted later on, when another subarray is selected.
This strongly derivational approach has the relevant components for a principled account of why sideward movement must proceed from embedded to embedding contexts. If the computational system had access to the whole numeration, the derivation of the parasitic gap constructions in (46), for instance, could proceed either along the lines of (47) and (48) or along the lines of (49)–(52) and (53)–(56), yielding an undesirable result because the latter incorrectly predict that the sentences in (46) are acceptable. However, if the computational system works with one subarray at a time and if syntactic objects already assembled become frozen when a new subarray is selected, the unwanted derivations outlined in (49)–(52) and (53)–(56) are correctly excluded. Let us consider the details.
Assuming that numerations should be structured in terms of subarrays, the derivation in (49)–(52) should start with the numeration in (59) below, which contains the subarrays A–F, each determined by a light verb or a complementizer.
(59) N = {{A Q1, did1}, {B you1, finally1, v1, read1, which1, book1, after1}, {C C1, T1}, {D PRO1, v1, leaving1, the1, bookstore1, without1},
{E C1, T1}, {F PRO1, v1, finding1}}
The derivational step in (49), repeated here in (60), which would permit the undesirable instances of sideward movement, is actually illicit because it accesses a new subarray before it has used up the lexical items of the active subarray. More specifically, the derivational stage in (60) improperly accesses subarrays B and F of (59).20
(60) a. K = [read [which book]]
b. L = finding
Similarly, the step in (53), repeated here in (62), illicitly activates subarrays B and D of (61), which is the structured numeration that underlies the derivation in (53)–(56).
(61) N = {{A Q1, did1}, {B you1, v1, criticize1, which1, politician1, before1}, {C C1, T1}, {D pictures1, of1, v1, upset1, the1, voters1}}
(62) a. X = [criticize [which politician]]
b. Y = of
The problem with the derivations outlined in (49)–(52) and (53)–(56), therefore, is not the instances of sideward movement themselves, but rather the derivational steps that should allow them. By contrast, lexical access in the derivational routes sketched in (47) and (48), repeated below in (64) and (66), may proceed in a cyclic fashion from the structured numerations in (63) and (65), respectively, without improperly activating more than one subarray at a time. However, as discussed above, sideward movement of which book in (64) or which politician in (66) is impossible because these elements have already been spelled out and are not accessible to the computational system.
(63) N = {{A Q1, did1}, {B you1, finally1, v1, read1, after1}, {C C1, T1}, {D PRO1, v1, leaving1, the1, bookstore1, without1}, {E C1, T1}, {F PRO1, v1, finding1, which1, book1}}
(64) a. K = [CP C [TP PRO T [vP [vP leavingv the bookstore] [without without, C, PRO, T, findingv, which, book]]]]
b. L = read
(65) N = {{A Q1, did1}, {B you1, v1, criticize1, before1}, {C C1, T1}, {D pictures1, of1, which1, politician1, v1, upset1, the1, voters1}}
(66) a. X = [CP C [TP [pictures pictures, of, which, politician] T [vP [pictures pictures, of, which, politician] [v upsetv the voters]]]]
b. Y = criticize
The analysis of CED effects in parasitic gap constructions developed here can therefore be understood as providing evidence for a strongly derivational system, where even lexical access proceeds in a cyclic fashion.21
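The work done by cyclic access in this section lends itself to a small illustration. The sketch below is not Chomsky’s (2000) definition, merely an invented toy: a numeration is modeled as a list of subarrays, only the active subarray is visible to selection, and drawing an item from a later subarray, the move that the illicit step in (60) would require, simply fails.

```python
# An invented toy of cyclic access to subarrays, in the spirit of the
# proposal used above; none of these names or interfaces come from it.

class Numeration:
    def __init__(self, subarrays):
        self.subarrays = [list(sa) for sa in subarrays]
        self.active = 0

    def select(self, item):
        """Draw an item; only the active subarray is visible."""
        sa = self.subarrays[self.active]
        if item not in sa:
            raise ValueError(f"{item!r} is not in the active subarray: "
                             "this is the access the illicit step (60) would need")
        sa.remove(item)
        return item

    def advance(self):
        """Move to the next subarray, freezing what was built so far."""
        if self.subarrays[self.active]:
            raise ValueError("active subarray not yet exhausted")
        self.active += 1

# (59), much simplified: subarray B holds read/which/book; F holds finding.
N = Numeration([["you", "read", "which", "book", "after"],
                ["PRO", "finding"]])
N.select("read")            # licit: drawn from the active subarray
try:
    N.select("finding")     # the draw that derivation (49)-(52) requires
except ValueError as e:
    print(e)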
5 Conclusion

This chapter has attempted to provide a minimalist analysis of classical extraction domains, in terms of derivational dynamics in a cyclic system. The main lines of research which provide a solution to the relevant kind of islands are (i) a computational system with multiple applications of Spell-out; and (ii) a decomposition of the Move operation into its constituent parts, taking seriously the idea that separate copies are real objects and can be manipulated in separate derivational workspaces (sideward movement). Extraction domains are opaque because, after Spell-out, the constituent terms of a given chunk of structure, while interpretable, are no longer accessible to the rest of the derivation. At the same time, this opacity can be bypassed if an extra copy of the moving term manages to arise before the structure containing it is spelled out, something that the system in principle allows. However, this possibility is severely limited by other computational considerations. For example, Last Resort imposes that the extra copy be legitimated, which separates instances where this copy is made with no purpose other than escaping an island (a CED effect) from instances where the copy is made in order to satisfy a θ-relation (a parasitic gap construction). In the second case, the crucial copy can be legitimated prior to the Spell-out of the would-be island, thus resulting in a grammatical structure. Moreover, we have shown how sideward movement can only proceed, as it were, forward within the derivational history. That result is straightforwardly achieved in a radically derivational system, where the very access to the initial lexical array is done in a strictly cyclic fashion.
Although we find these results rather interesting, we do not want to finish without pointing out some of our concerns, as topics for further research. Our whole analysis relies on the assumption that copies are real, and as such can be manipulated as bona fide terms within the derivation. If so, it is perplexing that, for the purposes of linearization, different copies count as one, which drives a good part of the logic of the chapter. Of course, we can make this be the case by stipulating a definition of identity, as we have (token in the numeration as opposed to occurrence in the derivation); but we do not know why that definition holds. Second, it is fundamental for the account of island effects that spelled-out chunks be inaccessible to computation. However, chain identification can proceed across spelled-out portions, also in a rather surprising way. Once again, we can make things work by making c-command insensitive to anything other than the notion of containment; but we do not know why that
should be, or why c-command should hold, to start with, of chains. Finally, it should be noted that cyclic access to the numeration is key to keeping the proper order of operations; we have no idea why the relevant derivational cycles should be the ones we have assumed, following Chomsky (2000). All we can say with regard to all these questions is that we have suspended our disbelief, just to see how far the system can proceed within assumptions that are familiar.
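The stipulated definition of identity just mentioned – token in the numeration as opposed to occurrence in the derivation – can at least be stated precisely. Here is a minimal sketch (purely for illustration; the names are mine, not the chapter's) in which copies share a numeration token index while occupying distinct derivational positions, so that linearization treats them as a single item:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Occurrence:
        token_id: int      # index of the item in the numeration (shared by copies)
        position: tuple    # path of the occurrence in the derivation (distinct)

    def linearize(occurrences):
        # For linearization, occurrences of the same numeration token
        # count as one item: keep only one per token_id.
        seen, out = set(), []
        for occ in occurrences:
            if occ.token_id not in seen:
                seen.add(occ.token_id)
                out.append(occ)
        return out

    # 'which book' moved sideward: two occurrences of numeration token 7.
    chain = [Occurrence(7, ("F", "finding")), Occurrence(7, ("B", "read"))]
    print(len(linearize(chain)))   # -> 1: copies count as one for linearization

Why the grammar should compute identity this way, rather than by occurrence, is precisely the open question noted above.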
5 MINIMAL RESTRICTIONS ON BASQUE MOVEMENTS †
1 Introduction

The Minimalist Program has no general account of islands. In part, this is because the system is designed in such a streamlined fashion – and with the assumption that computational mechanisms exist to meet the requirements of external interfaces – that little room is left for the apparently ad hoc considerations involved in formulating island conditions. In other words, nobody finds it elegant to speak of some category or another creating a barrier for movement, let alone removing such a barrier when necessary. In recent years, several attempts have been made to address at least some island restrictions within the Minimalist Program, with various degrees of success. In this chapter, I use some of the results discussed in Chapter 3, whereby it is argued that island effects arise as a consequence of a dynamic derivational system in which Spell-out – just as any other rule in the system – applies as many times as is necessary for a derivation to converge. In short, a structure to which Spell-out has applied becomes opaque for syntactic computation, thus turning into an island. Within these premises, this chapter studies a problematic paradigm from Basque syntactic studies: restrictions on question formation. The phenomenon has gained much attention throughout the last century because it deals with an ordering limitation in a language that looks otherwise rather free with respect to the position of verbal dependents; curiously, a question word must be left-adjacent to the verb. In present terms, an analysis is possible which has some rather important consequences for minimalism.
2 Wh-movement in Basque

2.1 Basic Basque syntax1

Basque is an underlyingly SOV, generally head-last language. It exhibits overt case morphology, the main cases being ergative ((e)k), absolutive (∅), dative ((r)i), and genitive (ko/ren):

(1) Jonek Mireni Getxoko ogia bidali dio.
    J.-E M.-D G.-G bread-the/a-A sent 3-have-3-3
    “Jon has sent Miren bread from Getxo.”
Nominal dependents get genitive (G) case; indirect objects, dative (D) case; base subjects, ergative (E) case; and base objects (and derived subjects of unaccusatives), absolutive (A) case, in a typical ergative Case system. Regardless of aspectual demands, the majority of Basque verbs are followed by an auxiliary which encodes subject, object and indirect object agreement (shown as numbers for person in glosses). Their sequential order is generally A(bsolutive).AUXILIARY.D(ative).E(rgative), and agreement association to arguments is standard: absolutive for monadic predicates, absolutive plus ergative for dyadic ones, and an extra dative for triadic ones (as in (1)) as well as causative constructions (as in (3) below). Auxiliary selection is standard, too:

(2) a. Jon etorri da.
       J.-A arrived is-3
       “Jon has arrived.”
    b. Jonek Miren maite du.
       J.-E M.-A love 3-have-3
       “John has loved Mary.”
    c. Aizkolariak lan egin du.
       lumberjack-the/a-E work make 3-have-3
       “The lumberjack has worked.”

Unaccusatives select a form of izan “be”; transitives, a form of ukan “have”; intransitives (unergatives) select a form of ukan as well, exhibiting agreement with two arguments (the absolutive one being a default, third person singular).2 Reasonably, as a result of this rich agreement system, Basque allows pro-drop in all three major argument positions:3

(3) a. Jaun kuntiak zezen bati Bereterretxe harrapaerazi zion.
       Mr. count-the-E bull one-D B.-A hit-make 3-have-3-3
       “The count has made a bull hit Bereterretxe.”
    b. pro pro pro harrapaerazi zion.
       “He has made it hit him.”

To a large extent, pro-drop may also be a major source of the apparent “free word order” of Basque. Such a view would be very consistent with the fact that, unlike verbs, nouns (which lack the pro-drop system) are in fact quite rigid in the linear ordering of their dependents:

(4) a. Hargaineko sorgin zaharrak
       H.-G witch old-pl.
       “The old witches from Hargain”
    b. * zaharrak sorgin Hargaineko
    c. * Hargaineko zaharrak sorgin
    d. * sorgin zaharrak Hargaineko
    e. * sorgin Hargaineko zaharrak
    f. * zaharrak Hargaineko sorgin
In contrast, major sentential constituents can appear in just about any order (with information-theoretic consequences). Witness the alternatives to (2b):

(5) a. Miren maite du Jonek.
    b. Maite du Jonek Miren.
    c. Miren Jonek maite du.
    d. Maite du Miren Jonek.
    e. Jonek maite du Miren.
It is then plausible that all the examples above involve right- and left-dislocations of phrases which “double” a pro element (all of this is meant pretheoretically at this point). I will assume the representations in (6) in order to capture the variations in (5):

(6) a. [pro Miren maite du] Jonek.
    b. [[pro pro maite du] Jonek] Miren.
    c. Miren [Jonek pro maite du].
    d. [[pro pro maite du] Miren] Jonek.
    e. [Jonek pro maite du] Miren.
Despite the ordering possibilities in (5), when wh-movement has occurred, the wh-phrase must be left-adjacent to the main verb. Thus, (7a/b) are possible, but (7c/d) are not:4

(7) a. Zer bidali dio (Jonek) (Mireni)?
       what-A sent 3-have-3-3 J.-E M.-D
       “What has Jon sent Miren?”
    b. (Jonek) (Mireni) zer bidali dio?
    c. * Zer Mireni bidali dio (Jonek)?
    d. * (Mireni) Zer Jonek bidali dio?

2.2 A plausible analysis and some problems

The fact in (7) has received a considerable amount of attention since it was first systematically discussed in Altube (1929). Within current assumptions, the standard account is that (7) exhibits a Verb second (V2) effect (Ortiz de Urbina 1989). An element like zer “what” in (7a) occupies the Spec of CP, and a verb like bidali occupies the head of C, arguably guaranteeing the observed adjacency. From this perspective, (7c/d) would be the equivalent of the impossible English sentence *what to Mary has John sent? It is not my intention to question the technical details of the V2 analysis, but rather to offer an alternative within minimalism. For the sake of completeness, I should mention two difficulties the V2 approach must deal with, and which make the relevant analysis somewhat unnatural. The first problem is that Basque complementizers appear clause-finally:5

(8) Mirenek [[Jon etorri de]la] esan du
    M.-E Jon-A arrived is-3-that said 3-have-3
    “Miren has said that Jon has arrived.”
No adjacency between the (rightmost) head of C and its (leftmost) specifier can be guaranteed under these circumstances. Ortiz de Urbina is thus forced to assume that, contrary to appearances, Basque complementizers are clause-initial and the observed post-verbal element is like a clitic. As evidence for his proposal, he claims that some Basque complementizers, e.g. nola, are clause-initial:6

(9) Mirenek [nola [Jon etorri de]n] esan du
    M.-E how J.-E arrived 3-is-if say 3-have-3
    “Miren has said how Jon has arrived.”

Perhaps a more plausible analysis for (9) is to treat nola as a (pseudofactive) element that occupies the Spec of CP, while the C head -n (underlined in (9)) occupies the appropriate rightmost site. Apart from preserving the generality of head positions in the language, this analysis would allow us to account for the contrasts in (10) involving long-distance wh-movement:7

(10) a. Nor esan du etorri dela Mirenek?
        who-A said 3-have-3 arrived 3-is-that M.-E
        “Who has Miren said has arrived?”
     b. * Nor esan du nola etorri den Mirenek?
        who-A said 3-have-3 how arrived 3-is-if M.-E

In spite of observing the required adjacency between the moved wh-phrase and the verb, (10b) is out, unlike (10a). The facts could be nicely accounted for if the “escape hatch” of the lower CP, i.e. Spec of CP, is open in (10a) but is filled by nola in (10b).8 While it is true that Ortiz de Urbina could make the same claim within his V2 analysis (nola, in fact, occupying the Spec of CP), my point is that nola constitutes no independent evidence to conclude that Basque complementizer heads are clause-initial.

A second difficulty for the V2 analysis comes from the fact that, unlike standard V2, the Basque phenomenon under discussion is not restricted to the root, in two different senses:

(11) a. Ez dakit zer (*Jonek) bidali dion.
        not know-1 what-A J.-E sent 3-have-3-3-if
        “I don’t know what Jon has sent.”
     b. Nork esan du ardoa bidali diola?
        who-E said 3-have-3 wine-(*the)-A sent 3-have-3-3-that
        “Who has he/she said has sent (*the) wine?”

(11a) shows the effect in the complement of a question verb, whose associated wh-element (zer “what”) must be adjacent to the embedded verb (bidali “sent”). (11b) demonstrates that even bridge verbs – whose associated C structure is only used as an “escape hatch” – induce the relevant adjacency effect, in the following way. Even though an indefinite reading of ardoa “wine” is possible in (11b), a definite reading is not. This definiteness effect can be accounted for if we assume that the embedded object is left-dislocated to the intermediate periphery (between the verb of saying and the embedded clause)
when this object is definite, but such dislocation is not possible when the object receives an indefinite interpretation. If so, the indefinite object in (11b) occupies its normal pre-V position, and the ungrammaticality of the example (with a definite object) falls into the pattern seen before: ardoa breaks the adjacency between the embedded verb and the moving question word. These are not standard V2 effects. In fact, the V2 literature uses contexts like (11a) to test for the absence of V2 behavior (in languages where the phenomenon shows up in embedded clauses, though not with question verbs). As for (11b), the cyclic effect seen there is nothing like normal V2, which typically only affects the ultimate landing site of the wh-phrase. Ortiz de Urbina is well aware of all these issues, and thus compares the phenomena in (11) to the Spanish facts discussed by Torrego (1984):

(12) No sé qué (*Juan) envió.
     not know.1 what J. sent.3/past
     “I don’t know what Juan sent.”

The comparison is well taken, but it is less clear that either set of facts (the Basque or the Spanish ones) can be analyzed as traditional V2. More importantly, it falls short of an explanation, insightful though the correlation clearly is. Laka (1990) argues convincingly that Basque does have standard V2 effects involving auxiliary movement in negative or emphatic contexts:9

(13) a. Miren ez du Jonek maite!
        M.-A not 3-have-3 J.-E love
        “Miren (is who) Jon hasn’t loved!”
     b. Arantza (ba) du Jonek maite!
        A.-A indeed 3-have-3 J.-E love
        “Arantza (is who) Jon indeed has loved!”
     c. Nor ez du Jonek maite?
        who-A not 3-have-3 J.-E love
        “Who (is who) Jon hasn’t loved?”

Laka shows that this negative, emphatic construction exhibits the archetypical V2 pattern. The verb is displaced only in the matrix (contra what we saw in (11)), and only the auxiliary element appears in second position (contra what we have seen in all examples above, where the main verb and the auxiliary, in that order, appear adjacent to the wh-word). I include this comment on Laka’s observation to clarify that I am not claiming absence of V2 effects for Basque. Rather, standard V2 is not clearly at issue for the cases in which a wh-element must be adjacent to the verb (nor, for that matter, is V2 obviously at issue in the Spanish counterparts). In what follows I present an analysis of the Basque facts which extends to the sort of Spanish facts reported in (12), in effect denying that the relevant phenomenon is a V2 effect.
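Before turning to the alternative, it is worth noting that the core datum of section 2 – the left-adjacency requirement on question words – can be stated as a simple string filter. The following sketch illustrates the descriptive generalization only, not any analysis of it; the function name is mine:

    def obeys_wh_adjacency(words, wh, verb):
        # Descriptive filter from section 2: at PF, the wh-phrase must
        # immediately precede the main verb.
        return words.index(wh) + 1 == words.index(verb)

    # (7a): Zer bidali dio Jonek -- wh left-adjacent to the verb.
    print(obeys_wh_adjacency(["zer", "bidali", "dio", "Jonek"], "zer", "bidali"))   # True
    # (7c): *Zer Mireni bidali dio -- an overt argument breaks the adjacency.
    print(obeys_wh_adjacency(["zer", "Mireni", "bidali", "dio"], "zer", "bidali"))  # False

The question pursued below is why such a surface filter should hold at all, given that nothing in the syntax obviously cares about string adjacency.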
3 An alternative analysis

3.1 Toward a different proposal

The general view in this chapter was first outlined in Laka and Uriagereka (1987; hereafter, L&U), within Chomsky’s (1986a) Barriers framework. I will recast the analysis in minimalist terms, but for presentational purposes I sketch it in the pre-minimalist model assumed in L&U. The major intuition of the analysis is that the observed adjacency between the moved wh-phrase (in the Spec of CP) and the main verb is a PF phenomenon. In other words, even though the elements in question are string-wise adjacent, they are not necessarily structurally adjacent. The main factor that conspires to yield this surface phenomenon is the fact that Basque is a pro-drop language. Hence, for sentences like ez dakit zer bidali dion “I don’t know what he sent to him/her,” it may well be that one or more null categories intervene between the wh-phrase zer and the verb bidali, roughly as follows:

(14) [Ez dakit [zer [… pro … bidali dion]]]
     not know-1 what-A sent 3-have-3-3-if

(14) is pronounced with PF adjacency between zer and bidali, but in fact several elements occur between these two items. If this view of the (superficial) adjacency is correct, then the adjacency is not really what needs to be explained. That is to say, in a language in which every argument of the verb can be realized by pro, PF adjacency between the wh-element and the verb is not surprising. Rather, the problem is the alternative to (14) in (15), where the relevant pros are replaced with overt constituents. Assuming, for now at least, that the only difference between a normal argument and pro is their pronunciation (the structures being otherwise identical), why is (15) not as good as (14)?

(15) *[Ez dakit [zer [Jonek/Mireni … bidali dion]]]
     not know-1 what-A J.-E M.-D sent 3-have-3-3-if

L&U propose a characterization of the notion barrier – based on Fukui and Speas (1987) – the effect of which is to make an XP with an overt specifier a barrier, while leaving XPs whose specifier is unexpressed transparent to movement.10 I take this intuition as a guiding analysis, but rather than restating the facts as in L&U, I will attempt to provide an explanation for why the morphological “heaviness” of specifiers should trigger barrierhood. Before moving to the analysis, however, there is an extension of the data which argues for the adequacy of stating the basic generalization in terms of barriers.

3.2 Extending the data

How can we distinguish the V2 analysis from a barriers approach? One possibility is in terms of hypothetical examples in which the PF adjacency is broken. The V2 analysis predicts that such examples should not exist in a language in
which the C Spec and head are both expanded to the left (as is Basque in Ortiz de Urbina’s hypothesis). In contrast, the barriers approach leaves room for that possibility, all other things being equal. In particular, this sort of example should exist if no barrier (technically, a category with a lexical specifier) is crossed. There are some examples of the relevant sort, for instance:

(16) Zergatik zaldunak herensugea hil zuen?
     why knight-the-E dragon-the-A killed 3-had-3
     “Why has the knight killed the dragon?”

Instances like this were noted in Mitxelena (1981); Ortiz de Urbina mentions several, comparing them to similar Spanish instances noted by Torrego (1984):

(17) Por qué el caballero mató al dragón?
     for what the knight killed.3/past to-the dragon
     “Why did the knight kill the dragon?”

The correlation with Spanish is reasonable, but we must understand why wh-phrases like zergatik/por qué “why” do not induce the alleged V2 effect. (18) is telling:

(18) a. Zergatik (Jonek) esan du garagardoa edango duela?
        why J.-E say 3-have-3 beer drink-fut. 3-have-3-that
        “Why has Jon said that he will drink beer?”
     b. Por qué (Juan) dice que beberá cerveza?
        for what J. say.3 that drink.will.3 beer
        “Why does John say that he will drink beer?”

Examples of this sort were introduced in Uriagereka (1988b) to illustrate the following property. (18a, b) are both grammatical with or without the intervening matrix subject; but when the subject is overt (e.g. “John”) only a matrix reading is possible for the wh-phrase, which thus asks for John’s reason for saying what he said rather than his reason for drinking beer (a reading which is possible when the intervening subject is not pronounced, in both Basque and Spanish). A barriers explanation for these contrasts is rather natural, if we assume, as is plausible, that wh-phrases like why are IP adjuncts. As an adjunct to IP in an unembedded context, movement of why to the C projection never crosses any barrier, even if IP takes the overt subject as its specifier; IP excludes the adjunct. However, if the IP which why modifies is embedded, then why must a fortiori cross the matrix IP were it to raise to the matrix C projection. If the matrix IP has a lexical specifier, it will be a barrier for movement of why from the lower clause; on the other hand, if the matrix IP has a null specifier, it is by hypothesis transparent to movement of why from the embedded clause. In this way, we predict the impossibility of why modifying the embedded clause when the matrix IP has an overt specifier.
It is unclear what the V2 analysis could say with respect to these facts. First, examples like (18) show that elements like why, when carefully examined, do in fact exhibit the alleged V2 effect (in this instance, a reading that is otherwise possible is prevented in the domain that concerns us). But then, how can one account for (16) or (17)? Given the possible readings in (18), the V2 approach is forced into a rather ad hoc analysis to capture the difference between (16)/(17) on the one hand and (18) on the other.11 L&U also note that (19) is marginally possible:12 (19) ? Nor horregatik etorriko litzake? who-A because.of.this come-Asp 3-have-3 “Who would come because of this?” The sharp contrast in grammaticality between (19) and the paradigmatic ungrammatical instances seen so far is hard to explain in V2 terms, where nothing should separate the moved wh-phrase in the CP Spec from the (hypothetically) moved verb in C. In contrast, for a barriers analysis, it does matter whether the intervening material is sufficient to create a barrier. This is arguably not the case for pure adjuncts like horregatik, a “because” type element which, as an adjunct to IP, does not serve as the specifier to any category and thus will not be a barrier. The theory predicts that in these instances no barriers would be crossed upon moving over horregatik. Other types of adjuncts behave differently from causal (pure) adjuncts. For example, temporal adjuncts block movement: (20) Nor (*orduan) etorriko litzake? who-A then come-Asp 3-have-3 “Who would come then?” This again follows, under the assumption that temporal adjuncts are specifiers rather than adjuncts. Of course, agreement in this instance is not obviously overt, and hence the claim about overt specifiers is necessarily more abstract in this case. Nonetheless, there is significant evidence in the literature that intermediate temporal and aspectual phrases may involve temporal specifiers in the appropriate sense, even in English (see, e.g. Thompson 1996). This treatment of some adjuncts is also very much in line with the fact in (21), which the theory predicts as well: (21) Noiz (*zaldunak) hil zuen herensugea? when knight-the-E killed 3-have-3 dragon-the-A “When has the knight killed the dragon?” Unlike a true adjunct (16)/(17), noiz “when” is sensitive to the presence of an intervening subject. This suggests that temporal adverbs are within IP, as specifiers of some internal projection(s). Again, it is unclear how the traditional V2 analysis can capture the subtleties of these behaviors.
3.3 A difficulty

The approach we have seen accounts for the absence of barriers between the moved wh-element and its trace. It does not matter what the relevant categories that take specifiers are – so long as the specifiers are overt (instead of pro), the theory predicts an immediate barrier. Thus, observe:

(22) [Zer [pro (*Mireni) t bidali dio]]
     what-A M.-D sent 3-have-3-3
     “What has he/she sent to Miren?”

As we saw for subjects, intervening indirect objects also cause problems for wh-movement. By hypothesis, this must mean that indirect objects also serve as proper specifiers to some category – now a standard assumption to make, which (22) offers evidence for if the present analysis is on track. Nonetheless, we must extend the claim even further to give a full account of the facts, since indeed no lexical specifier can intervene between the moved wh-phrase and the trace, nor between this trace and the verb. Notice that, at this point, it is not at all obvious why the latter statement should be true. Of course, claiming that direct objects also serve as specifiers to some category is no more controversial than making the claim for indirect objects.13 The puzzling difficulty, though, is that a direct object creates a barrier for subject movement:

(23) [Nork [t [(*ogia) bidali dio]]]
     who-E bread-the-A sent 3-has-3-3
     “Who has sent (him/her) the bread?”

Why should the presumably lower specifier occupied by the object ever matter for the movement of the subject? A solution to this puzzle comes from denying that (23) is how subject movement proceeds. If the subject had to extract from a position lower than that of the object in its specifier, then we would expect the intervention effect:

(24) [Nork [… [(*ogia) [t … bidali dio]]]]

(24) is reminiscent of the analysis that Rizzi (1990: 62 ff.) provides for Italian subject extraction.14 According to this view, the standard subject position, where it checks Case, is not a possible extraction site, essentially because of the “heaviness” of Agr. Under this perspective, subject extraction is forced into a roundabout movement, i.e. from inside the VP (details to follow). The L&U analysis predicts why the subject should not be able to extract from its Case position: the subject itself serves as the full lexical specifier of IP, paradoxically inducing the very barrier that it must move over.15 Of course, if the subject extracts from its VP internal position, it will be sensitive to any other intervening lexical specifiers. This view of subject extraction captures the adjacency between the trace of the moved element and the verb, just as the initial proposal guarantees the adjacency between the trace and its antecedent wh-phrase. At this point, we have a full, though still mere, description of the facts:
wh-movement from VP internal positions is sensitive to the presence of overt specifiers, which block it, unlike their null counterparts.

3.4 Beyond a description

As a first step in understanding the description above, there are two important questions to ask. First, can we provide principled motivation for the crosslinguistic differences in extraction? Specifically, the Germanic languages typically allow movement over a fully specified IP:

(25) I wonder what [John sent t]

In contrast, the pro-drop Romance languages are not so easily analyzed, and lead us to proposals along the lines of Rizzi’s, in which the “heaviness” of Agr is responsible for different extraction patterns: Romance languages (and Basque, extending the claims to all agreeing heads) have extra barriers where Germanic ones do not. But this observation concerning morphological differences is not an explanation; we must recast the intuition as a precise analysis. In part we have. We could define barriers as categories of a certain morphological weight, as in Uriagereka (1988a). But such an approach is unsatisfactory if we adopt minimalist assumptions, in particular the idea that mechanisms of the computational system must be seen to follow from interface conditions or from conceptual necessity. In this case, if we define barriers as categories of a certain morphological weight, we are forced to ask: why does “heaviness” of agreement make such a difference in the emergence of barriers? The second question to ask, to transcend our description, concerns new problems posed by the present approach: how does the trace of the wh-element in a non-Case position become visible (or how does it check Case)? Similarly, if wh-movement across a category with a lexical specifier is impossible in the relevant languages, why is A-movement possible in the simplest of sentences?

(26) [Jonek [Mireni [ogia [t t t bidali dio]]]]
     J.-E M.-D bread-the-A sent 3-have-3-3
     “Jon has sent Miren the bread.”

To make their barriers analysis work, L&U were forced to adopt the now minimalist assumption that arguments move to Case specifiers. But if such a view is correct, then some of the movements in (26) will have to cross overt specifiers, unless some alternative route is found. In the remainder of this chapter, I shall address these issues within the Minimalist Program.16
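The description just reached is simple enough to state mechanically. The following sketch is purely illustrative (the XP representation and the assumption that pro and empty specifiers pattern together are simplifications of the text, and the names are mine): a movement path is checked for crossed categories with overt specifiers, deriving the contrast between (14) and (15), as well as the object effect in (23)/(24):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class XP:
        label: str
        specifier: Optional[str]   # "pro", an overt form, or None

    def can_extract(path):
        """Movement is licit only if no crossed XP has an overt specifier
        (the L&U generalization, stated descriptively)."""
        for xp in path:
            if xp.specifier not in (None, "pro"):
                return False, xp.label
        return True, None

    # (14): 'zer' moves over an IP whose subject is pro -- transparent.
    print(can_extract([XP("IP", "pro")]))                    # (True, None)
    # (15): an overt subject 'Jonek' in Spec,IP blocks the same movement.
    print(can_extract([XP("IP", "Jonek")]))                  # (False, 'IP')
    # (24): subject extraction from inside VP crosses the object's projection.
    print(can_extract([XP("AgrOP", "ogia"), XP("IP", None)]))  # (False, 'AgrOP')

Nothing in the sketch explains why overt specifiers should matter, of course; that is exactly the burden taken up in section 4.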
4 Barriers in the minimalist system

4.1 Minimalist assumptions

The Minimalist Program attempts to derive all restrictions on derivations from either bare output conditions (natural interactions with extra-syntactic
components) or virtually conceptually necessary properties of the linguistic system. Within the latter rubric, Chomsky (1995b) includes economy, understood as a fundamental property of the natural world. Economy manifests itself in a variety of ways, both in the design of the system and its structural units, and in the computation of derivations and their alternatives. For example, the very definition of a proper chain presupposes that it involves shortest links (the Minimal Link Condition, MLC). Such a convergence condition is an instance of economy in the computation of derivations: a chain link which is not the shortest possible is not valid and thus the derivation does not converge. One way Chomsky (1995b: 297) seeks to formulate the MLC is by viewing movement, the mechanism which generates chains, as a consequence of Attract. That is, a given phrase marker K with feature F attracts the nearest available feature F′ which meets the demands of F. Movement of the phrase marker α containing F′ is a side-effect; although F only really attracts F′, α must for some reason take a “free ride” with F′. As such, movement is taken to be rather like (so-called) pied-piping of a preposition when its object moves. Apart from economy in its structural design, the system is taken to manifest a kind of internal computational economy. Thus, given alternative derivations starting in the same array of lexical items, the derivation involving fewest steps eliminates the alternatives. In this instance, we crucially choose among competing, convergent candidates. These two sorts of explanations (conditions on convergence and optimality races) are the only kinds of accounts that minimalism favors.17 This means we only allow ourselves to propose a restriction of the sort “Movement across a lexical specifier is impossible” if an account cannot be otherwise derived from conditions on convergence or optimality races. In other words, such an ad hoc restriction is clearly neither a criterion for ranking derivations nor an obviously natural condition on convergence, and therefore does not meet the demands of minimalism in its purest form.

4.2 The essential elements for an analysis

It would seem, then, that we are left without any room for expressing barrier-type restrictions. However, if we take seriously the idea that movement is triggered by Attract, it is possible to blame the system’s dynamics for the observed effects, vis-à-vis movement across lexical specifiers. Given the assumption that a primary mechanism of the computational system is the attraction of some feature F′ to a feature F, we are in a position to recognize that Move is an ancillary operation triggered by Attract. In other words, while the bare requirement of the computational system is simply the displacement of F′ to F – and such movement of a feature would be characteristic of computations in the “covert” syntax, as in (27a) – cases of “overt” movement are arguably related to the morphological structure of the category which contains the attracted feature. Under this view, α is not a well-formed PF object once F′ moves to F. Hence, while Attract strips F′ away from α, Move brings α
back to the domain where F′ has landed (27b), thus allowing for the repair of the morphological integrity of the lexical association (α, F′):

(27) a. Attract: [K [H(K)+F′] … F′ …]
        (covert: F′ alone raises to the head H(K))
     b. Move: [KP L [K [H(K)+F′] … t …]]
        (overt: the category L (= α) containing F′ is pied-piped to the specifier of KP)
I suggest that this analysis of movement, as generalized “pied-piping,” provides a way to account for why overt specifiers induce barriers while a similar null element (pro) does not. In order to understand the connection between barriers and overt specifiers, it is necessary to make explicit certain assumptions involved in viewing movement as an ancillary operation. To begin with, as outlined above, the ancillary movement of α is a consequence of the morphological integrity of the (α, F′) relation; notice, however, that proposing ancillary movement to establish configurational closeness between α and F′ does not, in itself, provide a precise statement about how the morphological integrity of α is re-established. To that end, suppose there is a morphological operation which repairs the (α, F′) relation:

(28) [KP L [K [H(K)+F′] … t …]]
     (Repair applies between L in the specifier of KP and F′ adjoined to H(K))
Such an operation makes sense if the reason α moves to the same domain as the attracted F′ is to repair some damage to the lexical integrity of α. Proposing the operation in (28) implicates two additional assumptions. First, assuming that morphological operations proceed as the derivation unfolds, the pair (α, F′), as well as the configuration whose checking domain these elements occupy (KP in (28)), will have to be shipped to the morphological component at the point (28) applies. Suppose that what it means to be “shipped to the morphological component” is in fact a function of Spell-out, much as in Chomsky’s (1995b) original proposal. Thus, the morphological repair of (α, F′) in (28) requires that the minimal structure containing α and F′, i.e. KP, is spelled out at that point in the derivation. But given the assumption that the morphological repair implicates Spell-out, we must ask what happens in the derivation when KP is indeed spelled out. If
we take the original intuition behind Spell-out at face value, then once the PF and LF features contained in KP are stripped away and shipped to the respective interfaces, KP is no longer available to overt syntactic computation given that it has, derivationally speaking, gone to PF and LF:

(29) [KP L [K [H(K)+F′] … t …]] → to PF, to LF
     (KP is spelled out, while the derivation continues merging with the resulting opaque KP)
Then we can reach two conclusions about specifiers. They are created due to a morphological association between an attracted feature and the category it was stripped away from. The morphological repair of (α, F′), an operation invoked after the Spell-out of the category containing α and F′, creates a sort of “giant compound” which is no longer transparent to any further syntactic operation.18

Before moving directly to the connection between overt specifiers and barriers, let us first be completely explicit about the details of the above assumptions. First of all, the suggestion in (29) is an example of a “dynamically split” model of the sort argued for in Chapter 3, which goes back to early work by Bresnan, Jackendoff, Lasnik, and others. For empirical and theoretical reasons that are beyond the scope of this discussion, it is both possible and desirable to access Spell-out in successive derivational cascades as the derivation unfolds (see Chomsky 2000).19 Allowing Spell-out to apply more than once is, in fact, the null hypothesis, if we take Spell-out to be a mere derivational rule rather than a level of representation (such as S-structure). Nevertheless, assuming that Spell-out has cost, economy alone favors a tendency for the rule to apply as infrequently as possible. This must mean that the morphological repair strategy in (28) is a condition for convergence. If without it the derivation crashes, it is irrelevant to ask how costly an alternative derivation with a single Spell-out would be – there is no such alternative. A second important assumption concerns the cyclicity of the system, feeding morphology cyclically as the derivation proceeds. The assumption is nothing but a version of the Distributed Morphology proposal in Halle and Marantz (1993), needed in this instance so as not to allow us to proceed derivationally beyond a projection of X only to come back to X and readjust it as the derivation maps to PF (though see Section 5, where Lexical-relatedness is brought to bear on these issues). In other words, the morphological repair strategy in (28) must be as much a “syntactic” (cyclic) process as movement or deletion.
A third (set of) assumption(s) relates to the fact that we separate those languages in which a lexical specifier blocks movement from those languages in which it does not. Following Rizzi’s (1990) suggestion, we should recognize this distinction as another consequence of the independently needed pro-drop parameter, extended to any functional category capable of licensing pro. In brief, only “heavy” categories which support pro go into Spell-out upon hitting a lexical specifier. This of course raises the question of why “light” categories pose no such problems. Given the logic of the system, the answer must be that the morphological repair in (28) only affects “heavy” categories, or alternatively (and perhaps relatedly) that “light” categories involve their specifier in terms of direct selectional requirements, instead of invoking the morphological repair above.20 If so, a light X will not force an application of Spell-out, which entails that if a category moves to the specifier of X in such a language, it will not induce a “barrier” effect.

Given the system outlined above, we are now in a position to address the central question at hand. The reason why a category specified by pro does not constitute a barrier is direct under one last assumption: pro is a feature. If pro is simply F′, an F′ with no related morpho-phonological content, then the attraction of F′ to some head will not induce the ancillary operation (27b), no application of Spell-out is necessary, and consequently no barrier emerges. Notice that this analysis of pro has an attractive consequence vis-à-vis minimalism: given minimalist assumptions, there is no obvious room in the theory for a pro specifier.21 However, under the analysis proposed here, we can see that pro is not actually a specifier in the configurational sense. Rather, in a language with a heavy feature, specifiers are created as a consequence of the morphological repair operation, and since there is no need for any morphological repair for a would-be pro specifier, we need not complicate the system by trying to motivate the existence of pro specifiers. But even if pro cannot exist as a specifier, it can exist as the attracted F′. Needless to say, whether this account is plausible depends on the soundness and naturalness of the four assumptions just mentioned: Multiple Spell-Out, Distributed (cyclic) Morphology, heavy categories (which license pro) requiring a given morphological repair when involving overt specifiers, and pro understood as a feature. The first two are natural extensions of a model understood to be a dynamically derivational system. The other two are more specific assumptions concerning the pro-drop parameter; one is an extension of the tradition of pro as a pronominal feature on a functional category (where the subject is somehow encoded through agreement);22 the other is an extension of the proposal that movement is an epiphenomenon, i.e. movement emerges as the consequence of various sub-operations, as discussed in Chomsky (1995b: Chapter 4). In sum, the analysis explains why overt specifiers, in languages with agreement, induce a barrier to movement: the morphological integrity of the lexical items forces the Spell-out of the structure they are specifiers of, subsequently rendering that structure unavailable to further syntactic computation (15). In contrast, corresponding structures without an overt specifier (14) do not induce
Spell-out (because of economy), and thus such structures are transparent to movement. Languages without agreement are immune to these matters, hence allowing movement across lexical specifiers (25).

4.3 Typological distributions

Typologically, the following picture emerges. (I set aside whatever triggers Move independently of Attract – Extended Projection Principle effects – and concentrate on pro-drop.) Possible parameters involved in the Attract process arise with respect to the sub-labels of the attracting head and the attracted feature itself. The attracting head may or may not be heavy; the former case forces overt movement (assuming that movement which is unnecessary by PF procrastinates). The attracted feature may or may not be pronounced, the former case forcing the ancillary operation that leads to morphological repair, and hence early Spell-out. Given this view, we can think of a heavy attracting head – in relevant languages – as nothing but an Agr element added as a feature of the attracting head to the lexical array. An unpronounced attracted feature is just pro. Why does Agr license pro? Because a heavy attracting head must attract a feature F′, and pro is nothing but a possible (in fact, null) manifestation of F′. Note that, from this perspective, we should not find instances in which pro is generally obligatory, without ever presenting a corresponding overt version with a full (pronominal) specifier. This accounts for the otherwise peculiar fact that although there are languages with overt but not null pronouns (English, French), and both overt and null pronouns (Basque, Spanish), there do not seem to be languages with null but not overt pronouns. Interestingly, there is no logical necessity for pro to be licensed through Agr: a language may exist with no heavy target for pro, and this element may survive in its base-generated position as an unattracted feature with no PF reflex. This is presumably at the core of the null categories witnessed in East Asian languages like Japanese, which do not have agreement correlates:

(30) a. [Osamu-ga [Keiko-o [aisiteru]]]
        Osamu-S Keiko-O love
        “Osamu loves Keiko.”
     b. [pro [pro [aisiteru]]]
        “He/she loves him/her.”

Jaeggli and Safir (1989) speak of two major distributions for pro: languages with rich agreement have it, and so do languages with no agreement. This generalization fits nicely in the present picture, the relevant parameter being whether pro is attracted out of VP or whether it instead remains within VP, as a reviewer aptly suggests. Apart from their typological significance, Japanese facts are also important with respect to the issue of barrierhood: we predict that a lexical specifier in the sort of language where pro is not “licensed” morphologically does not induce a
barrier, in spite of the fact that this specifier appears in free variation with pro (30a/b). Japanese topicalization argues for this result:

(31) [Keiko-o [Osamu-ga [t [aisiteru]]]]
     Keiko-O Osamu-S love
     “Keiko, Osamu loves.”

Importantly, Keiko-o can topicalize over Osamu-ga, thus presumably over an IP which may (and does in this instance) take a lexical specifier.
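The parameter space just described can be tabulated directly. The following sketch is merely my own summary of the text's typology, crossing a heavy (Agr-bearing) attracting head with an overt or null attracted feature; the language labels follow the discussion above:

    # Typology of section 4.3: heavy attracting head x pronounced attracted feature.
    def classify(heavy_head, overt_argument):
        if heavy_head and overt_argument:
            return "overt Move + repair: overt specifier, hence a barrier (Basque, Spanish)"
        if heavy_head and not overt_argument:
            return "pro attracted as a bare feature: no barrier (Basque, Spanish pro-drop)"
        if not heavy_head and overt_argument:
            return "no early Spell-out: lexical specifiers crossable (English, cf. (25))"
        return "pro survives in situ, unattracted (Japanese-type, cf. (30))"

    for heavy in (True, False):
        for overt in (True, False):
            print("heavy:", heavy, "overt:", overt, "->", classify(heavy, overt))

On these assumptions, the Japanese cell is the one in which a lexical specifier is predicted not to induce a barrier, which is what the topicalization in (31) shows.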
5 Issues on A-movement

5.1 What is involved in a cycle?

As we have seen, overt specifiers in languages with heavy agreement will induce barriers to movement. But if the analysis is correct, then we encounter the following problem: these specifiers will themselves create barriers for one another. In other words, in simple sentences like (26), which involve multiple movements to the various specifier positions, movement should be blocked. This section provides a solution to this puzzle. There is an easy and a difficult part to the solution. The easy part is the logic of this proposition: we have to somehow prevent the application of morphological operation (28) when dealing with A-movement across “the next” specifier. If (28) does not apply, the system will not create the “giant compound” in (29), and consequently no barrier will be induced. As a result, A-movement can proceed over the relevant specifier as it would over the corresponding pro, according to fact. The difficult part, of course, is explaining why the logic of that crucial proposition should in fact hold. The answer lies at the core of the A/A′ distinction. Providing a principled motivation for such a distinction has always been difficult, but it is more so within minimalism, where notations are forced to follow from bare output conditions. Chomsky’s (1995b) suggestion for the relevant split is the notion L(exical)-relatedness: A-positions are L-related, A′-positions are not. I return shortly to a minimalist characterization of L-relatedness, but assuming it for now, let us see how it bears on the present puzzle. To begin, I assume (32):

(32) The L-relatedness Lemma
     Morphology treats a domain of L-relatedness as part of a cycle.

The motivation for viewing (32) as a lemma will be made explicit in the next section. For now, assume that (32) is true. If so, morphological operation (28) will not apply, all other things being equal, when local A-movement is taking place, simply because it does not have to. Let us consider this in some detail. The intuition behind the dynamically split model of Multiple Spell-Out is that the system spells out in a cyclic fashion – and only in a cyclic fashion. From this perspective, the PF/LF split is architecturally cyclic. Interpretation (at both the PF and LF interfaces) occurs cyclically through the course of the derivation. In
this sense, the grammar is heavily derivational. Each cycle is spelled out in turn, until the last cycle goes by. Why are only cycles spelled out? Because nothing else needs to be, hence will not be in an optimal system. This approach places a heavy burden on motivating the notion “cycle.” The only new idea (vis-à-vis cyclic systems of the 1970s) proposed in Chapter 3 is that, given reasonable structural restrictions, the cycle is an emergent property of derivations. For example, if Kayne’s (1994) LCA holds only of units of structure where Merge has applied exhaustively, and thus where command holds completely, then each such maximal unit will constitute a cycle. Let us see why:23

(33) a. Command unit: {s, {s, {t …}}}, as in said that … – the result of exhaustive application of Merge to the same object (s merged with {t …}).
     b. Not a command unit: {s, {{t, {t, {c …}}}, {s, {s, {t …}}}}}, as in the critic … said that … – the result of non-exhaustive application of Merge, to two separately assembled objects ({t, {t, {c …}}} = the critic …; {s, {s, {t …}}} = said that …).
If we assemble [the critic …] said that … in (33b) prior to the spelling out of the critic, there will not be a way of linearizing the resulting structure, and the derivation will crash. The alternative derivation spells out the critic, and then merges the resulting “giant compound.” The point is only this. As a result of a simple convergence condition, a cycle has suddenly emerged under a maximal command unit. But as illustrated in this chapter, not just maximal command units force early Spell-out, again under reasonable assumptions. For the data under investigation here, the issue has been morphological operation (28), invoked in languages with heavy agreement after a process of feature attraction. In these cases, the emergent cycle appears to be the result of a combination of heavy morphology and the transformation which strips a feature from a category, thus forcing a subsequent repair. In all of these instances, failure to Spell-out results in a derivation that crashes at PF, and thus is not a valid competitor in the optimality race. All other things being equal, however, Spell-out would not apply if it need not apply (the derivation with less Spell-out wins). Now (32) becomes significant. If morphology can access (for repair) successively L-related specifiers in a single cycle, there will not be any need for the system to incur the cost of (a cycle of) Spell-out; as a consequence, everything within an L-related domain will be syntactically transparent for the purposes of further movement. As noted earlier, this is the easy part of the reasoning. The hard part is justifying (32) as a lemma.

5.2 L-relatedness

In order to explore the role of L-relatedness in this context, consider the problem in (34), noted independently by Alec Marantz and Juan Romero:
(34) a. [A ballroom was _ [where there was a monk arrested]]
     b. [There was a ballroom [where a monk was _ arrested]]

These two examples start in the same lexical arrays, which Chomsky (1995b: Chapter 4) calls “numerations.” Interestingly, (34b) involves a local movement of a monk even though there is an option of inserting there, as in (34a). This is an important example because Chomsky argues for other similar cases, as in (35), that the derivation with insertion outranks the one with movement:

(35) a. [there was believed [_ to be a monk arrested]]
     b. *[there was believed [a monk to be _ arrested]]

Note that the evaluation which ranks (35a) over (35b) is reached cyclically. In other words, even though movement is invoked in both derivations (there in (35a), a monk in (35b)), the key difference is that at the point in the derivation when a monk is moved in (35b), there could have been inserted, and thus the derivation which does not invoke movement wins the optimality race then and there. (Technically, optimality races are computed with regard to the “derivational horizon” which remains after committing to the same partial numerations.) But if the option of insertion should rule out the derivation with movement, then why does (34b) not lose to (34a)? It is easy to see that the problem would not arise if we manage to divide the derivations of the problematic examples in (34) into separate sub-derivations, i.e. separate cycles, so that each evaluation of economy is based on different partial numerations. That is, suppose the partial numeration involved in computing the embedded clause in (34a) were (36a), while the one for the embedded clause in (34b) were (36b):

(36) a. {there, was, a, monk, arrested, …}
     b. {was, a, monk, arrested, …}

Given this approach, there would be nowhere to compare the derivations of (34a–b): the two could not possibly have the same derivational horizon because they do not start in the same (partial) array. The result in (36) can be ensured as in Chomsky (2000), by simply adding a proviso telling the derivation how to access the numeration. Importantly, we do not want to lose the analysis of the facts in (35), which means that however we refine accessing a numeration, there must be a unique access for (35) (unlike what we saw for (34)/(36)). So we assert (37):

(37) The minimal set of lexical items that results in a convergent structure constitutes a partial access to the numeration.

(34) is dealt with by assuming a separate access to the numeration for the embedded clause. In contrast, a sub-derivation including only the elements which form *a monk to be arrested produces an illicit partial object (in this instance, in terms of Case). Thus, the array of items {to, be, a, monk, arrested} cannot be accessed on their own, which forces the two derivations in (35) into
the same optimality race. Of course, this problem does not arise for (36), where each derivation contains the independently convergent sub-derivations of … there was a monk arrested and … a monk was arrested.24 One important consequence of this cyclic access to the numeration is that, as a result of it, a kind of “natural domain” has emerged in the system. Suppose the notion L-relatedness, and by extension the A/A′ distinction, is understood in terms of a cyclically determined derivational space. In other words, we may as well use (37) to define L-relatedness:

(38) L-relatedness
     A cyclically accessed sub-numeration defines a domain of L-relatedness.

Though of course not necessary, (38) is very plausible. The basic idea is that the minimal set of objects which stand in a relation of mutual syntactic dependency has a special status in the grammar. If thematic relations associated to a given verb are syntactic dependents, then such relations will immediately be relevant. At the same time, we know of interesting mismatches between thematic and Case structures, precisely in domains like the one involved in (35). In those instances, L-relatedness will go beyond theta-relatedness, in an expected direction. Quite simply, then, A-positions can be defined as Chomsky (1995b: Chapter 3) suggests:

(39) A-position
     XP is in an A-position if L-related to an argument-taking unit.

Once (39) is assumed, and shown to be both necessary and plausible, the difficult part of motivating the L-relatedness Lemma is in place: (32) follows from the trivial interaction of (37) and the conditions under which the system invokes Spell-out. L-relatedness is a direct consequence of a mechanism to access numerations in a cyclic fashion (37), which as we saw is independently needed (on empirical grounds). In turn, the system only goes into (early) Spell-out, hence closing off a cycle, to meet convergence requirements (e.g. morphological operation (28), or for maximal command units as in (33)). Therefore, given that the cycle emerges as a result of derivational dynamics, there simply is no cycle while L-related objects are involved. Not surprisingly, cyclic access to the numeration determines the domain of cyclic application of syntactic operations (e.g. early Spell-out and access to morphology).25 I should add a note of caution. Elements in C, at least in English and Basque, cannot be L-related to V, or everything proposed here will reduce to vacuity. However, even though C cannot be L-related to V in English and Basque, there is nothing in principle which requires such a condition. In fact, in various languages, elements related to the C system do behave as A-positions, as has been independently noted by many. The issue, then, is to parameterize the system, a very interesting matter that I turn to next.
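As a sketch of how (37) might regulate access, consider the following fragment, in which convergence is crudely approximated by the presence of a finite element able to check Case – a drastic simplification of the reasoning around (35) and (36), intended only to show the shape of the mechanism (the names are mine):

    def converges(items):
        # Toy convergence test: the sub-derivation needs a finite element
        # (here 'was') to check the argument's Case. This drastically
        # simplifies the Case reasoning around (35)-(37).
        return "was" in items

    def partial_access(items):
        """(37): only a minimal convergent set of lexical items constitutes
        a licit partial access to the numeration."""
        if not converges(items):
            raise ValueError("illicit partial access: %s" % sorted(items))
        return set(items)

    print(partial_access({"was", "a", "monk", "arrested"}))      # (36b): licit
    try:
        partial_access({"to", "be", "a", "monk", "arrested"})    # cf. (35): illicit
    except ValueError as err:
        print(err)

Because {to, be, a, monk, arrested} fails the convergence test, it cannot be accessed as a separate cycle, and the derivations in (35) are correctly forced into a single optimality race.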
6 Case matters

Assuming the analysis as presented, immediate questions arise about various details concerning Case assignment.

6.1 The Case of pro

It may seem, given what we have assumed about pro, that languages with and without this element involve very different sorts of Case relations. After all, the assumption is that pro is nothing but a neutralized category/head/feature, which starts its derivational life as a sister to a verbal projection, and ends up being attracted to a head H where its Case/agreement is checked as a sub-label of H. In contrast, regular arguments are potentially cumbersome phrases which raise through movement, ending up as specifiers, i.e. obviously not simple heads. This puzzle, however, is only apparent. The driving force behind the system is, in all instances, feature attraction. From this perspective, pro, as nothing more than a feature, is the ideal solution for a system which is driven by the attraction of one feature to another. It is only when an argument is not pro (and needs to engage in a pre-Spell-out process) that Attract induces the ancillary mechanisms, including Move, which ultimately induce a barrier. But despite the possible occurrence of the ancillary mechanisms, feature checking takes place at the head level, and therefore is identical across languages; the parametric choices arise with respect to the earliness of this checking and the necessity of the ancillary processes. This state of affairs has a curious consequence: if a derivation could choose between pro and a lexical alternative, use of pro entails much less derivational cost. Generally, this choice is not there; for example, when a lexical NP is chosen from the lexical array, there is no issue of “replacing” it with pro – these would be two totally different derivations. However, it is conceivable that a derivation with pro and one with an overt pronoun compete with one another.26 All other things being equal, the system should prefer the derivation with pro. Only if processes which are inexpressible through pro – such as emphasis with a PF correlate – are relevant to the derivation would pro not be chosen, inasmuch as it would yield a non-convergent derivation. Of course, this fits nicely with the tendency we see in pro-drop languages to avoid overt pronouns, except for purposes of emphasis.

6.2 The Case of t

Consider next how the wh-trace gets its Case checked. The question goes well beyond the present analysis. How does any wh-element check Case? Notice that we cannot simply say that the wh-phrase moves through the specifier of some Case position, and then moves again to its final destination, for the following reason. In the present version of minimalism, Case is not checked in a specifier of anything; movement to a specifier is ancillary to attraction to the corresponding head. Suppose then that the wh-feature moves through the checking domain
of some Case position, and then moves again. But this is not trivial either: featural movement involves head-adjunction, a morphological operation that freezes the resulting structure, much in the same way that we saw for entire phrase-markers that undergo a repair strategy at Spell-out. Once the head and its adjuncts have been shipped to morphology, we cannot just excorporate the wh-feature leaving the rest of the word behind. That operation simply cannot be stated in standard minimalist terms. There are a couple of conceivable solutions that languages can try in order to “solve” this structural difficulty. In certain languages, the site that checks Case (for example, a v element encoding Accusative) may end up moving all the way up to C, where it can check the wh-feature it previously incorporated as a sub-label upon Case-checking the wh-feature. This would amount to saying that, in such a language, C is L-related to the V/T projections, and may well be a valid strategy for wh-movement. Returning to Basque, where as I said we have to assume C is not L-related to the V/T projection, a different strategy must be operating. The logic of the analysis entails that the wh-feature and the Case-feature of some lexical item are not contained in the same bag of features. There are reasons to suppose that this is the general instance. Semantically, a wh-question has three main parts: a quantifier, a variable and a predicate. Wh-movement is concerned with the quantificational part, whereas Case is concerned either with the predicate or the variable, depending on exactly what the role of Case is in the grammar. We may then follow the suggestion, which Chomsky (1964) attributes to Klima, that these semantic distinctions correspond directly to the actual syntactic form of wh-elements. Basically, who is shorthand for which one person, and what for which one thing, and so on. If so, attracting the wh-feature and attracting the Case-feature can be seen as attracting two different feature bags. This does not quite completely address the issue, though, since one may ask why only the wh-feature demands a repair strategy analogous to that in (28) in its C specifier, the position where phonetic realization takes place. In other words, the question is still why the Case feature, in this instance, is not also in demand of such a repair, hence forcing phonetic realization in the site of Case assignment. The latter question is one all minimalist analyses face, if they assume Feature attraction. A plausible answer is this: the ancillary operation and corresponding repair strategy take place only when necessary, i.e. when the attracted feature has morphological content. Intuitively, only an affixal attracted feature constitutes a morphological missing part of wherever it originated (the wh-phrase). A phonetically null feature may be attracted without affecting the morphological integrity of its source, and hence does not require the source to pied-pipe in search of the missing feature:

(40) a. [[wh [C]] [you [[D [v]] [saw who]]]]
     b. [who [wh [C]] [you [[D [v]] [saw t]]]]
     (where wh- is morphological and D- is null)
C attracts the wh-feature and v attracts the D feature of who, which is possible if they are in two different bags (40a). Assuming wh- is morphological, it will trigger the ancillary wh-movement in (40b); this is not the case (given the present hypothesis) if the D feature is not morphological. A question to then ask is how the moved who in (40b) has its Case feature checked if it never moves through a Case-checking site. Note that when who moves up, its D feature has already been attracted to the v domain; hence, it is not who that moves to C, but who minus the appropriate Case-checking feature. This would not be a possibility were it not true that the Case-checking feature is null in a language like English, trivially materializing in the v specifier without a morphological realization. If so, we expect different facts in other languages. There should be some where wh- is not morphological, but D is. In such languages, we should find an OV order, but not overt wh-movement, as in Japanese. We may also find languages where neither feature is morphological, in which case we expect VO order and no overt wh-movement. Chinese may illustrate this. Finally, we ought to find languages where both features are morphological, but derivations in such languages would immediately encounter the difficulty posed in this section. How can related features be checked in two different categories, with morphological realization in both? An immediate possibility that comes to mind is languages that resort to resumptive pronouns (e.g. Swedish), which may solve the difficulty by, in effect, splitting the operator-variable relation almost entirely. Alternatively, other languages might exploit the possibility of an L-related C-projection, and hence Case checking from this unexpected site, external to IP. It is worth exploring whether Chamorro may constitute one such language.

6.3 The Case of Chamorro

Chamorro exhibits wh-agreement with traces (Chung 1994: 9):

(41) Hafa pära u-fa’tinas si Juan t?
     what fut wh-OB.Agr-make Juan
     “What is Juan going to make?”

Why should agreement be sensitive to wh-traces? Chung (1994: 11) provides a functionalist analysis: assuming the grammar needs to indicate that there is an unbound wh-trace in some specified domain, Case serves to narrow down the possible locations of the trace within that domain. Within minimalism, however, even if we agreed with this conclusion, we could not use it as the cause of the phenomenon, since lexical arrays are insensitive to interpretive demands. But we can turn the idea around, supposing that something like Chung’s conclusion is obtained as a consequence of a purely formal process. In a language in which both Case and wh-features are morphological, they must combine the attracting forces observed in (40a), directly resulting in local wh-agreement. There is more. We saw that a language with those hypothetical characteristics must involve overt checking of both sets of relevant features (with the consequent repair strategies involving specifiers). The only way this will be possible in
Chamorro is if V is L-related outside IP. This correlates nicely with Chung's observation that Chamorro is VSO even in embedded clauses, and thus irrespective of V2 considerations. We may take this systematic order as an indication of verb movement past the subject position, for which the hypothesized L-relatedness "out of IP" is desirable as a motivation.27 Finally, consider what is perhaps the most surprising fact about Chamorro: wh-agreement is exhibited long distance: (42) Hafa ma'a'añao-ña i palao'an [t pära u-fa'nu'i si nana-ña t]? what wh-OBL.afraid-Agr the girl fut wh-OBJ.Agr-show mother-Agr "What is the girl afraid to show her mother?" Note, in particular, that the matrix verb exhibits wh-agreement with the moved element. Remarkably, also, the wh-agreement does not exhibit the Case morphology of the moved item (here, OBJ), but instead, as Chung notes (p. 14), the morphology appearing in the main verb corresponds to the Case of the embedded clause (here, OBL). This suggests that the scenario we predict does hold: the moved wh-phrase agrees, as it proceeds through the intermediate C Spec, with the C head, this agreement signaling both Case and wh-feature checking. Subsequently, the C head incorporates into the V that takes it as an argument. Because of this, the morphology on the matrix verb is not that of the wh-item, but is instead that of the CP headed by C.28 6.4 A residual Case There is a final wrinkle which the Basque data pose, as does, for that matter, Rizzi's (1990) analysis of subject extraction in Italian. It is one thing to say that object or indirect object traces are derived as in (40). It is quite another thing to say that subject traces are obtained that way. If subject wh-traces are directly attracted from their VP-internal position, what occupies the subject position in satisfaction of the Extended Projection Principle? We cannot really answer pro, as Rizzi in fact did, because pro is by hypothesis just another feature, and it seems unlikely that the Extended Projection Principle is ultimately a featural requirement. This suggests either that there is no such thing as an Extended Projection Principle (rather, the "EPP" should be seen as parametric), or else some null topic is satisfying the requirement in pro-drop languages. I will not decide among these possibilities here.29
7 Extensions and speculations It is fair to ask whether the analysis proposed here systematically extends beyond Basque. Likewise, we should consider whether structures other than verbal arguments are sensitive to the sort of locality considerations discussed in this chapter.
7.1 wh-movement in Hungarian An instance in the literature with similar descriptive properties to those of Basque wh-movement can be seen in Hungarian. Thus: (43) Kit (*Janos) t szeret? who-ACC J.-NOM like-3 "Who does he/Janos like?" Kit "who" is fine when left-adjacent to the main verb szeret "like-3"; when the adjacency is broken, the result is bad. Many works have discussed this effect, some of them explicitly comparing it to the Basque data.30 Many of the descriptive generalizations that we have reached concerning Basque would seem to hold in Hungarian as well. For instance, Hungarian exhibits agreement (and associated pro-drop), not just with subjects, but also with objects, just like Basque. Consider this example adapted from Brody (1990): (44) Nem utalom not hate-1-3 "I don't hate him." The reader may have noted that I have not glossed the examples in (43) with object agreement markers. I have taken this practice from the literature, since it is generally assumed that true agreement in Hungarian is with definite, third-person objects (as in (44)). Quantificational and indefinite elements do not exhibit this overt agreement morphology. Thus, we compare, e.g. szereti "he likes [definite]" to szeret "he likes [indefinite]." However, we may decide, based on the theoretical motivation of extending the analysis developed here to Hungarian, that Hungarian involves a form of (abstract) agreement even for indefinite objects. Given agreement projections in Hungarian, we predict specifiers to induce potential barriers. Note, though, that Hungarian uses the indefinite verbal form for wh-traces, as (43) shows. If this indefinite (object) form did not involve agreement, the present analysis would predict that a subject wh-phrase need not be adjacent to a verb when a direct object intervenes: (45) [Wh [t Object Verb]] The only motivation we had to rule out the analogous Basque example in (23) was to deny an extraction of precisely this sort. In the Rizzi-style analysis in (24) – schematically repeated in (46) – subject extraction is across an object, and thus sensitive to morphological detail along the lines explored in this chapter: (46) [wh [… [Object [t Verb] …]]] But then we had to ask which languages proceed in this roundabout fashion, and, again following Rizzi, we suggested that these are the pro-drop languages, refining (46) to (47): (47) [wh [pro [Object [t Verb]]]]
It then follows that, since examples of the form in (45) are bad, Hungarian must involve extraction as in (47), entailing a pro subject even when the pronounced object is "indefinite" (thus selecting the indefinite agreement pattern). More generally, if agreement were not involved in indefinites in Hungarian, we would expect questions of the form in (48): (48) *[wh [indefinite subject [t Verb]]] But these are generally impossible. In a nutshell, there does not seem to be any wh-/verb adjacency which depends on whether intermediate material is (in)definite, a situation which, in present terms, forces a similar agreement across the board, regardless of morphological manifestation. (We reached a similar conclusion about apparent adjuncts in Basque (see (20)), where adjuncts are treated as specifiers regardless of overt agreement.) Of course, it could also be that the Hungarian phenomenon is just different from the Basque one. There are reasons for and against this conclusion. On the similarity side we find that, over and above the core adjacency facts with wh-movement, Hungarian exhibits exceptions of the sort seen in Section 3.2: (49) Miert Janos ment haza? why J.-NOM go-past-3 home "Why did Janos go home?" Actually, Kiss (1987: 60) notes that miert "why" is ambiguous between a VP (reason) reading and IP (cause) reading. It is only in the IP (cause) reading that sentences like (49) are acceptable, which is what we predicted for similar instances in Basque and Spanish given that IP adjuncts are excluded by IP. Notably, also, the whole paradigm studied in this chapter extends to focalization in both languages. Thus, compare these focalized instances ((50a) in Hungarian and (50b) in Basque): (50) a. PETER (*Marit) szereti. P.-NOM M.-ACC like-3-3 "Peter likes Mari." b. PERUK (*Miren) atsegin du. P.-E M.-A like 3-have-3 "Peru likes Miren." There are several reasons to suppose, with Brody (1990), that focalization involves a category below C. Thus, consider (51): (51) Ez dakit zergatik HONI eman behar diozun. not know-1 why THIS-D give must 3-have-3-2-if "I don't know why TO THIS ONE you must give it." In this embedded question, zergatik "why" is presumably in the Spec of C, and crucially the focused phrase is after the wh-word (and thus still satisfies the adjacency requirement with the verb). This fact extends to Hungarian, where (49) is naturally interpreted with focus on Janos. Now, for our purposes, it is not necessary
that the target of wh-movement and the target of focalization be the same – so long as movement is involved in both instances and sensitive to barrier considerations of the sort discussed before. Examples like (49) or (51), involving both a wh-phrase and a focalized phrase, can only be constructed with an adjunct in C. Once again, the reason for this is that only with an adjunct does the wh-phrase not need to be adjacent to the verb, as we saw in (16). The fact that Basque and Hungarian pattern alike suggests the convenience and desirability of a unified treatment. On the other hand, there are also important differences between the two languages. For example, long distance wh-movement in Hungarian looks more like the process witnessed in Chamorro than anything seen in Basque. Compare: (52) a. Janos irta hogy jon. J.-NOM write-past-3-3 that come-3 "Janos wrote that he would come." b. Janos, akit mondtak hogy jon. J.-NOM who-ACC say-past-3 that come-3 "Janos, whom they said would come." (52a) shows a regular verb with a complement clause, exhibiting object agreement with this clause. When wh-movement out of the embedded clause takes place, the matrix verb does not exhibit definite agreement (Kiss 1987 and ff.).31 This is reminiscent of (42), and may find a similar explanation. The moved wh-phrase agrees in wh-features with the intermediate C. However, unlike in (42), C in this instance (hogy) does not incorporate to the main verb mondtak "say-past-3." Nonetheless, the agreement between the moved wh-phrase and C suffices to turn the embedded CP into a "not-definite" phrase, thus forcing indefinite agreement in the matrix verb. The question of course is why the moved wh-phrase invokes agreement with the intermediate C. By parity of reasoning (with the analysis of Chamorro), agreement involving the intermediate C in (52) may plausibly be because the V projection is L-related "outside IP." The possibility that C is L-related to the V projection in Hungarian, but not in Basque, may also account for the following contrasts: (53) a. Mireni, nork eman dio zer? Miren-D who-E give 3-have-3 what-A "To Miren, who gave what?" b. Marinak, ki mit adott/mit ki adott? M.-D who-NOM what-ACC give/what-ACC who-NOM give "To Mari, who gave what?" Basque does not generally allow multiple wh-questions in pre-IP position, unlike Hungarian (where, as (53b) shows, no fixed ordering obtains among the moved wh-elements). It may be that Hungarian (like Slavic and Eastern Romance languages that allow multiple moved wh-questions) tolerates (53b) only through a C L-related to V. If the L-relatedness of C to V is relevant in instances like (52b) and (53b), it
is very likely relevant for examples of the format in (48) as well, and may be argued to be responsible even for (50a), contrary to what we expect in the Basque example in (50b). But I must leave any detailed discussion of this claim for future research. 7.2 wh-islands Just as we plausibly encounter transparent movement across pro in languages other than Basque, a version of the process may be witnessed in other constructions. Of particular interest are wh-islands, which Chomsky (1995b: 295) suggests should reduce, in minimalist terms, to the MLC (a suggestion he partially retracts in Chomsky 2000). That is unlikely for three reasons. First, wh-island violations like (54a) are nowhere near as bad as MLC violations like (54b): (54) a. ?*[What do you wonder [why [John bought t]]] b. *[John is likely [that it seems [t to be smart]]] Second, although it is arguably true that the matrix C in (54a) violates the MLC (it attracts the wh-feature of what over that of why), it is equally true that the wh-island effect arises even when the attracting feature moving over a wh-phrase is not obviously a wh-feature (Lasnik and Saito 1992): (55) ?*[this car [I wonder [why [John bought t]]]] And third, the wh-island effect is known to be, somehow, parameterized (Rizzi 1982). Thus, compare (54a) to the perfect Spanish sentence in (56), of the sort discussed by Torrego (1984): (56) ¿A quién no sabes por qué dio Juan t un beso? to whom not know-2 for what gave-3/past J. a kiss "Who do you wonder why John gave a kiss?" It is implausible that the MLC is not violated in (56), if it is violated in (54a), and it is hard to see how to parameterize this. One needs to argue that, despite appearances, (54a) is not an MLC violation, on a par with (55), which clearly does not violate the MLC. One possible way around the difficulty is in terms of the notion "closeness," as defined in Chomsky (1995b: 299). For Chomsky, β is closer to X than γ only if β commands γ. It is conceivable that attracted wh-features never enter into command relations with one another if they are within categorial D elements: (57) … [CP [DP [D… Wh…]…] [DP [D…Wh…]…]]… Depending on the complexity of DP, the upper D commands the lower – but wh, the feature within D, does not obviously command anything. Second, we should try to argue that wh-island violations arise when the CP that a wh-phrase lexically specifies is forced to undergo partial Spell-out. This would be the case, according to our general reasoning, if C is morphologically
heavy and attracts a wh-feature, with a consequent ancillary operation for overt wh-phrases which involves the Spec of C. It is the ensuing repair strategy that would freeze the CP. As a consequence, nothing should be able to move over the fully specified CP, be it a wh-phrase (as in (38a)) or any other category (as in (39)). As for those languages which do not invoke a wh-island violation in instances like (54a) or (55), one possibility is that, by analogy with (25), a wh-phrase in the Spec of C is there for a selectional requirement of the sort involved in the Extended Projection Principle (whatever that is), and not for reasons of feature attraction. This allows (in principle) for more than one type of movement to C, which may be descriptively adequate, particularly given multiple wh-movement of the sort in (53b). A more interesting possibility arises when considering the Galician (58b), which is much better than its Spanish counterpart in (12), and even than its own indicative version in (58a): (58) a. Non sei qué (*ti ou eu) lle enviamos. not know.1 what you or I him send.past-ind.1-PL "I don't know what you or I sent him." b. Non sei qué (ti ou eu) lle enviemos. not know.1 what you or I him send.pres-subj.1-PL "I don't know what (you or I) to send him." (58b) is relevant because the embedded clause involves both clear overt agreement and wh-movement, possibly over an overt specifier; hence, it should be out according to everything I have said, yet it is fine. The key, though, appears to be in whether the "intervening" agreement is indicative or subjunctive, a fact that is known to extend to many other Romance languages. It is reasonable to suppose that (58b) is good because the subjunctive verb is L-related to the embedded C, thus somehow extending the cycle to the next domain up.32 If this is true, one could apply the L-relatedness Lemma, and thus not go into early Spell-out upon hitting the specifier of IP. More generally, languages may use this sort of strategy, perhaps also involving L-relatedness between the embedded C and the matrix V, to avoid certain wh-island violations. A case in point may be the C incorporation hypothesized for Chamorro in (42).
8 Conclusions wh-movement in Basque serves to test both the Barriers framework and the Minimalist Program, in both cases with good results. The analysis has shown that Basque behaves as expected, and provides interesting clues concerning restrictions on wh-movement across a lexical specifier in languages with rich agreement. It is here that the Minimalist Program seems superior. Under the assumption that intervening lexical specifiers are moved for ancillary reasons (the real computational mechanism driving syntax being the process of feature attraction), their blocking effect is not surprising. Technically, the result follows
from a morphological repair strategy which must proceed cyclically, forcing the system into early Spell-out. The proposed mechanisms do not block A-movement. Assuming that domains of Lexical-relatedness belong to the same cycle, the grammar does not need to invoke early Spell-out. This distinguishes A-movement from the less bounded wh-movement. The analysis confirms the view of the derivational system as dynamically split (in principle feeding PF and LF in successive derivational cascades, up to convergence and within optimality considerations); it also confirms the central role of morphological processes within the system at large.
6 LABELS AND PROJECTIONS A note on the syntax of quantifiers† with Norbert Hornstein 1 Introduction Determiners can be conceived as predicates of sets (see Larson and Segal 1995: Chapter 8). Thus a quantifier like all or some in (1) can be thought of as a relation between the set of whales and the set of mammals, as shown below, where B = {x | x is a whale} and A = {x | x is a mammal}. (1) a. All whales are mammals. a′. ALL(B)(A) iff B is a subset of A b. Some whales are mammals. b′. SOME(B)(A) iff the intersection of A and B is non-empty Natural language determiners are conservative (see Keenan and Stavi 1986). A determiner δ is conservative if for any sets B and A that are its arguments the semantic value of "δ(B)(A)" is the same as the semantic value of "δ(B)(A ∩ B)." The conservativity of natural language determiners can be seen by considering the examples in (1). (1a) is true iff the set of whales is a subset of the set of mammals, iff the set of whales is a subset of the set of mammals that are whales. Similarly (1b) is true iff the intersection of the set of whales and mammals is not empty, iff the intersection of the set of whales with the set of whales that are mammals is non-empty. Natural language determiners come in two varieties: strong determiners like every and weak ones like some (we are setting aside adverbs of quantification). It is not surprising that weak determiners should be conservative, as they are intersective. This means that the truth of a proposition introduced by a weak determiner only relies on properties of elements in the intersection of the two sets. This is illustrated by considering (1b) once more. If some whales are mammals is true then so is some mammals are whales. Thus, for a weak determiner δ, "δ(B)(A)" is true iff the set B ∩ A has some particular property. Note that intersecting B and A yields the same members as first intersecting B and A and then intersecting this whole thing with B again, i.e. B ∩ A = (B ∩ A) ∩ B. So here δ(B ∩ A) ≡ δ((B ∩ A) ∩ B). As this is the conservativity property, it should be now clear why intersective determiners are conservative (which is not to say that weak determiners cannot be treated as binary in the sense to be discussed below, see Section 9).
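The set-theoretic definitions above are easy to make concrete. The following sketch (a toy Python illustration; the function names and test procedure are ours, not part of the original text) models ALL and SOME as relations between finite sets and checks conservativity, i.e. that δ(B)(A) and δ(B)(A ∩ B) always agree:

    # Toy model: determiners as relations between finite sets.
    # Illustrative only; names and encoding are assumptions.
    import random

    def ALL(B, A):   # strong: every B is an A
        return B <= A

    def SOME(B, A):  # weak, intersective: some B is an A
        return bool(B & A)

    def is_conservative(delta, trials=1000):
        # Conservativity: delta(B)(A) iff delta(B)(A & B).
        universe = range(10)
        for _ in range(trials):
            B = {x for x in universe if random.random() < 0.5}
            A = {x for x in universe if random.random() < 0.5}
            if delta(B, A) != delta(B, A & B):
                return False
        return True

    print(is_conservative(ALL), is_conservative(SOME))  # True True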
In short, given that all that counts semantically for weak determiners is keyed to common members of the evaluated sets, and given that intersection simply yields common members, then intersecting the same sets again and again cannot have an effect on the outcome. Also, intersection is obviously symmetric. So not only do repeated intersections of the same sets yield no advantage over single intersections, but the order of the intersections makes no difference either. A ∩ B is the same as B ∩ A, and ((A ∩ B) ∩ C) is the same as ((B ∩ C) ∩ A) and ((C ∩ A) ∩ B), etc. None of this extends to strong determiners. They too are conservative. However, the arguments of a strong determiner are not generally "interchangeable." Thus, all mammals are whales is clearly not equivalent to (1a). In the case of strong determiners, it is critical that one distinguish its arguments and order them in some way, "δ(B, A)." Indeed, the observation that strong determiners are conservative only follows if this ordering is respected. Thus, in (1a), whales is B and mammals is A. All whales are mammals is true iff all whales are mammals that are whales is true. However, all whales are mammals is not equivalent to all mammals are mammals that are whales nor to all whales that are mammals are whales. The order of the arguments makes a difference and this reflects the fact that strong quantifiers are not intersective. One of our central concerns in this chapter is understanding where that ordering comes from. The position we will be taking is simple: the ordering arises as a structural fact about how predicates and arguments are composed in natural languages. It is well known that verbs distinguish internal from external arguments. External arguments are thematically dependent on the compositional structure of the verb plus the object. This explains, for instance, why there are so many V-Object idioms but no Subject-V idioms. If determiners are relations that take predicates as arguments, then it makes sense to think that the mechanisms for combining these predicates with their arguments should be the same as those the grammar exploits elsewhere, perhaps at some "higher order" of structuring. In effect, we should expect to find the internal/external argument distinction in such cases. Assume that this is indeed correct. What is the internal argument of a strong determiner and what its external argument? The first question has a ready answer. The internal argument is the NP that the determiner takes as complement. In (1a), whales is the internal argument of all, in effect, its object. What is harder to clarify is how the external argument is determined. In VPs, the external argument is a verbal specifier, which permits a compositional assignment of a θ-role to the expression in this position. Then by parity of reasoning, are mammals in (1a) should be (at whatever level this is relevant) the specifier of all. The question at issue is whether there is some direct way of getting this result, given the standard resources that grammatical theory makes available. In what follows we explore the following option. The structural distinction between internal and external arguments is related to labeling procedures, as it is these that allow one to structurally distinguish complements from specifiers. From this perspective, and given usual assumptions, it is not true that a VP like are mammals is the specifier of a determiner like all. Rather, VP is the complement
of I, which takes the projection of all as its specifier. However, we could ask the question whether this state of affairs holds throughout the entire derivation. Could it be that at some point in a derivation structures, in effect, "reproject"? Could a determiner like all take either VP or some projection containing it (say, I′) as its specifier? These questions presuppose that labels are not indelible. This makes sense if labels are the consequence of relations established throughout the derivation. Seen this way, labels are analogous to "bar" levels or "relations of grammar" such as subject and object: real and substantive, but not primitive, and indeed capable of changing as the derivation unfolds. From this derivational perspective, it is entirely natural for "reprojections" to arise.
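The asymmetry just noted (argument order is immaterial for intersective determiners but crucial for strong ones) can be checked in the same toy setting; again, this is merely an illustrative sketch, not part of the original text:

    # Intersective determiners are symmetric in their set arguments;
    # strong determiners are not. Continues the toy model above.
    import random

    def ALL(B, A):   return B <= A       # strong
    def SOME(B, A):  return bool(B & A)  # weak, intersective

    def is_symmetric(delta, trials=1000):
        universe = range(10)
        for _ in range(trials):
            B = {x for x in universe if random.random() < 0.5}
            A = {x for x in universe if random.random() < 0.5}
            if delta(B, A) != delta(A, B):
                return False
        return True

    print(is_symmetric(SOME))  # True: "some whales are mammals" entails "some mammals are whales"
    print(is_symmetric(ALL))   # False: the order of B and A matters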
2 Move, project and reproject Assume a bare phrase structure (BPS) approach to phrasal constituency, and consider a phrase in which an expression α, maximal, moves and targets a category K (i.e. a traditional "substitution" operation). Chomsky (1995b) argues that K must project to yield structure (2a). In contrast, the structure in (2b) (where α projects) is taken to be ill-formed; consider why. (2) a. [K α [K K0…[…α…]…]] b. [α α [K K0…[…α…]…]] There are at least two ways to rule out (2b). First, one can render it illicit in terms of chain uniformity. If the lower copy of α is maximal, then the upper copy is not in (2b), given a functional approach to bar levels. By projecting, α becomes non-maximal and the chain fails to be uniform with respect to bar level. If chains must be uniform then (2b) is illicit. A second way of preventing (2b) is in terms of Greed, Attract and checking domains. For Chomsky (1995b) movement occurs when some feature of α is attracted by some feature of K that needs checking for the derivation to converge. If α projects, however, then α is no longer in the checking domain of K0. This would prevent the feature that did the attracting from being checked. Thus moving α would be illicit as it violates Greed. Regardless of whether either of these approaches is correct, we would like to consider the possibility that (2b) is legitimate; we will systematically call it a "reprojection" of (2a). There is some benefit, we believe, in allowing this. In what follows we (i) outline what that benefit is, (ii) spell out the details of "reprojection," (iii) consider some of its various implications, returning in particular to matters of chain uniformity, and (iv) emphasize that reprojection only applies to a class of determiners.
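Since everything in (2) turns on which element contributes the label, it may help to picture the bare phrase structure objects involved as nested sets of the form {label, {α, K}}. The encoding below is a hypothetical illustration of that idea (the names are ours): (2a) and (2b) contain exactly the same constituents and differ only in their label.

    # Toy bare-phrase-structure objects: a label plus two constituents.
    # Hypothetical encoding for illustration; not from the original text.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        label: str    # which daughter projects
        parts: tuple  # the constituents themselves

    alpha = Node("alpha", ())
    K = Node("K", ("K0", "... alpha ..."))

    merged_2a = Node("K", (alpha, K))      # (2a): the target K projects
    merged_2b = Node("alpha", (alpha, K))  # (2b): the moved item projects

    assert merged_2a.parts == merged_2b.parts   # same constituents
    assert merged_2a.label != merged_2b.label   # different label/type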
3 The argument structure of DPs We have observed in the introduction that natural language determiners are conservative, and that the conservativity of strong determiners relies on being able to impose an order on a determiner's arguments. Specifically, it relies on distinguishing a determiner's "internal argument" (the B argument in (1a)) from an "external argument" (the A argument in (1a)). None of this applies to weak determiners as they are intersective and, for semantic purposes, can be treated as unary, i.e. taking but a single argument. We further observed that the internal/external argument distinction has proven to be empirically useful in distinguishing the arguments of verbal predicates. It would be optimal if what we observe in the case of verbs could be somehow extended to determiners. (The analogy between determiners and verbs is illuminatingly discussed in Larson and Segal (1995); our argument here is extensively based on their observations.) The internal/external argument distinction finds its natural home in θ-theory. The internal argument of a verb is a sister of the verb head, the internal θ-role being "assigned" to "DP1" in a configuration like (3): (3) [V′ V DP1] In contrast, the external θ-role is "compositionally" assigned by the verb plus the internal argument. It is assigned to "DP2" in configurations roughly as in (4). (4) [V(P) DP2 [V(′) V DP1]] Observe that, given these configurations, standard θ-role assignment is locally confined to the projection of V. The internal argument is in the immediate projection of the head while the external argument is in a more remote projection of the head. Of course, determiners do not involve standard arguments in the θ-theoretic sense (they do not exhibit lexical selection requirements, are not sensitive to thematic hierarchies, etc.). Nonetheless, suppose that, structurally at least, the semantic arguments of a determiner are also established in the local fashion mentioned above; i.e. in the projection of the head that takes those arguments. What does this imply for strong determiners? Recall that in a sentence like (1a), repeated here as (5a), whales is the internal argument while mammals is the external one. The standard structure of this sentence at LF is (5b), all whales having been moved from its underlying θ-position to Spec IP, in order to check various features. (5) a. All whales are mammals. (= (1a)) b. [IP [DP All whales] [I′ are [SC t mammals]]]
From (5b) it is clear that whales is the internal argument of all – it is the immediate sister of the determiner. However, what is less clear (structurally) is what the external argument is. To make determiners parallel to verbs, we would like all to take as its external argument (are) mammals. However, all is in no structural position to do so if we assume that external arguments are established within the domain of heads that take them as such. Plainly, in (5b), all is in the domain of I0, whereas what we want is for (are) mammals to be in the domain of all. Interestingly, this impasse can be resolved (while holding faithfully to the
parallel with verbs) by assuming that the DP projects its label in (5b). The structure would be as in (6). (6) [DP [D′ All whales] [IP are [SC t mammals]]] (6) is isomorphic to (4). The external argument of all in (6) is the I′ (now an IP) are mammals. The internal argument is whales. The assumption that arguments are established within the domain of the predicate that takes them, in this case all, is also retained. The only important difference between the DP and the VP case is that the internal/external structural difference is determined under Merge for Vs, but under Move for strong Ds. (This might also relate to why the VP projection plausibly employs a v-shell, whereas it is not obvious that the DP reprojection does; the addition of v does not affect our discussion.) The same feature checking seen above forces strong DPs out of their base VP positions at LF. A sentence like (7a) would have the LF (7b). (7) a. John likes every book. b. [IP Johni I0 [DP [D′ every book]j [VP ti likes tj]]] If every book failed to raise out of the VP and reproject, it would not find an external argument to which it could locally relate. Recall that inside the VP structure the DP must get a θ-role. This requires that likes project, but this reasoning does not affect the move to the outer, Case-related, verbal specifier. By the time every book moves there, likes has established its θ-relations with every book and John within its projection. This allows every to reproject and meet its particular requirements vis-à-vis its external argument after moving.
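In the toy encoding introduced above, the step from (5b) to (6) is nothing more than retyping: the constituent structure is untouched, and only the label (and with it, what counts as specifier and complement of what) changes. Again, a hypothetical illustration, not the chapter's own formalism:

    # Reprojection as relabeling: (5b) becomes (6) with the same parts.
    # Hypothetical encoding for illustration; not from the original text.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Node:
        label: str
        parts: tuple

    d_bar = Node("D", ("all", "whales"))          # [D' all whales]
    i_bar = Node("I", ("are", "[SC t mammals]"))  # [I' are [SC t mammals]]

    five_b = Node("I", (d_bar, i_bar))  # (5b): I projects, yielding an IP
    six = Node("D", (d_bar, i_bar))     # (6): all reprojects, yielding a DP

    # Now the old I' sits in the domain of D, so "are mammals" can be
    # read as the external argument of "all".
    assert five_b.parts == six.parts and five_b.label != six.label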
4 The general form of reprojection Determiners do not leave their base configuration in order to meet their semantic demands. Nonetheless, an element which has moved for some bona fide syntactic reason (Case, agreement, etc.) can assume the structural properties necessary to obtain an appropriate interpretation. The relevant representations are interpretable in our examples precisely in terms of reprojection. Reprojection is not a transformation, or for that matter a computational procedure. It is a representational coding of the particular requirements that different dependents of a syntactic object happen to assume. For example, a quantificational element may have moved to, thus creating, the Spec of an inflectional element for Case reasons. Once there, what is the label of the category dominating the moved quantifier? The label qua Case is a projection of the inflectional element; however, qua the quantifier itself, the label should rather be taken as a projection of the quantifier. Assuming in terms of Greed that the movement-inducing Case relation is relevant before the other relation, there is good sense in saying that the second label is a reprojection. Viewed that way, reprojection is a matter of derivational perspective. If the derivation demands that the syntactic properties of some category X be accessible to the system, then X is taken to label (that is, type) the construction.
If later on, the system needs to interpret some aspects of a category Y, sister to X, then Y is taken to label/type the construction. In our core instances, when the system needs to access I, then the relevant construct is typed as IP; but when the system needs to access the DP that has been attracted to I instead, and crucially the appropriate configuration is met, then the whole thing is typed as a DP. This will have important consequences, as we attempt to show below. There is a derivational ordering implicit in our previous comments which is intentional. The labeling of a phrase-marker may change from narrow syntactic representations (8a) to the final LF ones (8b), even if its constituent structure remains constant. This has non-trivial side-effects. (8) a. [XP [DP D …] [X′ … YP]] b. [DP [D′ D …] [XP … YP]]
For example, we do not want (8b) to hold at the point of Spell-out, or we would incorrectly predict, in terms of Kayne's (1994) LCA, that a strong determiner does not linearize in its normal subject position. (That is, the specifier of a structure precedes the rest of this structure, for both Kayne (1994) and Chomsky (1995b). After reprojection, the I′ turns out to be the specifier of D, but we clearly do not want to say that I′ precedes D.) Intuitively, at the point of Spell-out there is no reason why the system should interpret labels as in (8b), since it is only semantic demands on the determiner that entail the reprojection – but we must show how the system guarantees this. It is also worth pointing out that the structural imbalance of (8a) (where DP is a "capped off" projection) is shifted in (8b) (where XP is "capped off"). Characteristically, "capped off projections" induce Huang's Condition on Extraction Domain (CED) effects, preventing transformations across (see Chapters 3 and 4). If representational changes as in (9) really occur, we expect there to be contexts which do not induce islands in narrow syntax, but do at LF: (9) a. Narrow syntax: [ZP … [WP … [XP [DP D …] [X′ … t …]]]] (relation across no barriers) b. LF component: [ZP … [WP … [DP [D′ D …] [XP … t …]]]] (relation across a barrier)
We may also observe that reprojection might be impossible in some contexts. Adjunction of D to X forces the creation of a category segment, represented in between inverted commas in (10) below. Plausibly, category segments must be labeled in terms of the category they are segments of, and thus cannot assume the label of the item that induced the projection (i.e. must have label X, not D). If so, reprojection would be banned in adjunction sites, since as it were it would not be a "clean" reprojection. (10) a. Adjunction: ["X′" [DP D …] [X′ … YP]] b. Failed reprojection: [? [D′ D …] [X′ … YP]]
5 Quantifier induced islands (Case A) Honcoop (1998) presents a convincing summary of a part of the literature devoted to contexts which are transparent with regards to some, though not all quantificational dependencies that involve operator-variable expressions. For instance: (11) a. *Nobody gave every child a red cent. b. Nobody gave two children a red cent. c. What did nobody give every child? There are two separate issues involved in these quantifier induced (QI) islands: (i) the constructions that test them, and (ii) the quantifiers that provoke them. Consider each in turn. QI islands manifest themselves in “split” constructions, which characteristically involve a quantificational expression that attempts to establish a relation with another element, typically an indefinite. A canonical instance of a split construction is the relation between a downward entailing quantifier and a negative polarity item (NPI), as in (11) above. Other such constructions analyzed in the 121
DERIVATIONS
rapidly growing literature include the “what . . . for” split of Germanic and Slavic languages, partial wh-movement and possibly also some instances of multiple questions. Split constructions cannot take place across certain weak islands, induced by some quantifiers. Clearly, all weak quantifiers in their standard use do not induce these islands. However, strong quantifiers exhibit a somewhat strange behavior. Most induce the relevant islands, but some do not. In particular, names, definite descriptions, demonstratives and kind-denoting plurals, allow split constructions across them: (12) a. b. c. d.
Nobody gave Billy a red cent. Nobody gave the child a red cent. Nobody gave that child a red cent. Nobody gives children a red cent.
These all contrast with strong quantifiers as in (11a), yet pattern with them, and against weak quantifiers, in triggering a “definiteness effect”: (13) a. *There is every child here. b. There are two children here. c. *There is/are Billy, the/that child, children here. Honcoop acknowledges this much, and discusses some possible semantic ways in which the right cut can be made. In our terms, the issue is in terms of reprojection, and our challenge is to demonstrate, first, that only a subset of strong quantifiers reproject; and second that, in spite of this, all strong quantifiers exhibit a definiteness effect as in (13). Before we return to that second goal and to a detailed presentation of why islands should arise in our system, we would like to point out that it is natural that names, definite descriptions, demonstratives, and kind denoting plurals, should not reproject. Remember that reprojection is a need only when we have a binary quantifier (involving internal and external arguments). The relevant question, then, is whether the elements we are now considering should be taken as binary. The issue hardly arises for names, at least in the classical view. As for definite and demonstrative descriptions, in most instances there is not much reason to make an internal/external argument distinction; witness: (14) a. The/these men are mammals. b. The/these mammals are men. Whenever (14a) is appropriate and true so is (14b). But this is the distinguishing mark of an intersective determiner. As noted earlier, intersective determiners can be treated as unary, which suggests that definite descriptions and demonstratives are (or at least can be treated as) unary. If so, they should not have to invoke an external argument, and therefore the syntax of reprojection. Then whatever accounts for why split constructions are impossible across reprojection should be irrelevant when names and articles are involved (see Pietroski 1999 122
LABELS AND PROJECTIONS
for much related discussion). Presumably the same can be said about kinddenoting expressions, although we will not argue for this now.
6 The emergence of LF islands (an account of Case A) Whatever account one proposes for QI islands, it must ensure that the basic facts in (11) are preserved: (15) […X…[Q…Y…]] R The relation R between X and Y cannot be absolutely prevented. R must be possible if either Q is unary or R takes place overtly (even if Q is binary). This relativity has made these facts reasonable candidates for semantic analyses of the sort Honcoop reviews and presents. Our syntactic analysis also addresses the relativity head on. Consider the chains involved in reprojected structures: (16) QUANTIFIER PRIOR TO AND AFTER REPROJECTION a. b. XP X'
DP D
D'
YP
D … DP …
YP … DP …
In (16a) DP has moved, subsequently reprojecting and thus turning into D (16b). As a consequence, a chain involving D and DP links is not uniform, in the sense of Chomsky (1995b). The uniform chain would involve different occurrences of identical DP links, which do exist in (16b). However, the upper link includes the lower link. Formally, the top object is {D, {D, XP}} (for D the label and D and XP the constituents), and XP includes the lower DP (formally: {X, {X, {…DP…}}}). That is thought to be an impossible chain, where links (that is, category occurrences of the moved item) do not stand in a command relation. However, prior to reprojection the chain in question would certainly be appropriately identified. In (16a) a uniform chain can be easily observed, involving the upper and lower DP copies in a command relation. Chain identification, thus, must plausibly take place prior to reprojection. In turn, that very process of chain identification, assuming it involves some component of grammar, will create an island for processes that take place afterwards. To see this, compare once again the two types of syntax we are working with, enriched so as to include some neighboring context (we use the notation “XP|DP” to indicate the “reprojection of XP as DP”): 123
(17) a. INTRANSITIVE Q: [ZP Z … [XP [DP D …] [X′ … [YP … W DP …]]]] b. TRANSITIVE Q: [ZP Z … [XP|DP [D′|DP D …] [XP … [YP … W DP …]]]]
D' D
Z
… XP|DP
D'|DP
YP
D …W DP…
XP YP …W DP…
For a chain to be identified, the task the grammar faces is that of singling out an object involving the upper and lower occurrences of DP, technically {{DP,XP},{W,DP}}, where the phrasal context identifies each occurrence. The minimal amount of structure the grammar needs to operate on, in order to allow said identification, is XP, which contains both occurrences (both phrasal contexts) entering into the chain relation. We have circled that area in (17b), intending to denote the creation of a sort of “cascade” of derivational processes in this way. Following a similar idea for the overt syntax discussed in Chapter 3, we assume that “cascades” are interpretive cycles, which gives meaning to the idea that the system has “identified a given chain.” This is crucial, if the parts of a syntactic unit so interpreted become inaccessible to the computational system. The intuition is this: a structure X that has been “cashed out” for interpretation is, in a real sense, gone from computation. The terms of X surely must still be interpretable, but they are literally inaccessible to the syntactic engine. Covert operations across cascades are thus predicted to be impossible. For instance, suppose Z in (17b) were trying to syntactically relate to any element within the cashed out XP|DP (as in split constructions). There would be no way of stating such a relation, and as a result either the derivation would crash (if the relation is needed for convergence higher up in the phrase-marker) or else an interpretation based on the impossible relation would simply not obtain, and the relevant meaning would be unavailable. QI islands all reduce to the situation just described. For instance, let Z in (17b) be an NPI licenser, and suppose it is trying to relate down to an NPI, across XP|DP – as would be the case if the specifier of XP were taken by a strong quantifier. Trivially, now, the split relation in question would not obtain. Why do intransitive determiners not create the same interpretive cascades? They do not have to. Contrast (17a) with the already discussed (17b). The system does not have to create an interpretive cascade here, since the chain {{DP,XP},{W,DP}} can wait until further up in the phrase-marker to be identified. This is because no reprojection obtains with the intransitive element, and thus the chain is not forced into an “immediate identification” before the possibility of a uniform chain is destroyed after reprojection. Differently put: the 124
LABELS AND PROJECTIONS
system in this instance does not need to go into an interpretive cascade that separates a chunk of structure from the rest; if so, then any syntactic connection is still entirely possible in the undivided object, as desired. Slightly more precisely, it should be economy that prevents an interpretive cascade when it is not needed (it is cheaper to have fewer cascades than more). If so, access to LF interpretation should be somewhat akin to morphology application in the other side of the grammar (see Chapter 5), both operations with a cost that the system minimizes. Note also that, for these ideas to go through, the system must be heavily derivational and entirely cyclic. The computation cannot be allowed to wait until after all LF (in particular, “split”) processes have taken place to, as it were, “come back” and identify chains. If that were possible, there would be no interpretive cascades, and thus no associated LF islands. Such a system, also, would need further conceptual assumptions which seem hard to motivate in the Minimalist Program. The simplest proposal, thus, is also empirically adequate. It might seem as if, in our derivational model, chain identification must be seen as prior to certain LF processes. That would be contradictory, for presumably chain identification is the last bona fide LF function prior to handing structures over to the interpretive components of the intentional system. However, if we follow the derivation through as it proceeds upwards in the tree, we will see that chain identification never precedes any other LF process. Indeed, our proposal about LF islands relies on the fact that chain identification caps off a certain cascade of structure, making its parts inaccessible for further operations. Of course, further up in the phrase-marker it is possible for operations to continue, so long as they need not access the parts of the cashed out structures – that is the whole logic behind a cyclic system.
7 Incorporated quantifiers (Case B) We noted in Section 4 how adjunction, as involved for instance in incorporation, should create problems for our hypothesized reprojections. Kim (1998) observes that, in Korean, certain underlying objects can appear case-less, which he attributes to Noun-incorporation. Importantly, weak and strong determiners fare differently in this respect: (18) a. Nwukwu wassni? someone came b. *Motun salam wassni? All men came And most crucially for our typology, names and definite descriptions align with weak, not strong determiners: (19) a. Con wassni? John came b. Ku salam wassni? the men came 125
DERIVATIONS
At the very least, we can say that Kim’s observation follows closely the ones reported by Honcoop, in terms of the typology of items involved in each. However, Kim’s facts have nothing to do with islands. We find this rather important, since it suggests that the proper treatment of these effects must go well beyond their island-inducing qualities. Once Kim has unearthed the Korean facts, it does not seem particularly difficult to replicate them in English, in other instances that would seem to involve noun-incorporation, or at any rate adjunction to N (which is what matters for our purposes): (20) He is a
冦
Stalin children-as-a-group two-people-gabbing *most-children-gabbing *every-child
冧
hater.
Surely other restrictions arise in compound formation (for instance involving articles, thus preventing *the/some-child-hater regardless of the weak/strong distinction), but otherwise it would seem as if Kim’s observations generalize.
8 Clean reprojections (an account of Case B and a consequence) As we saw, it makes sense to say that adjunction structures cannot reproject, if they are not “clean,” involving complex segments as opposed to entire categories. Suppose a transitive determiner were to incorporate in Korean. Subsequently, if the incorporated phrase may not project, the derivation will result in a convergent representation with no semantic intelligibility, and no alternative derivation. In contrast, intransitive determiners face no such problem, and since they do not need the relevant reprojection, their interpretation is directly possible. That part of the analysis is straightforward enough, but we would like to extend it to less obvious instances involving existential constructions of the sort in (13) above. Observe that if every must reproject in (13), and if there prevents every man from so doing, then we have a simple account for why strong DPs like those are barred from being “associates” to an expletive. Chomsky (1995b) argues that associates (or at any rate, their crucial categorial features) adjoin to the pleonastic at LF. By parity of reasoning with the analysis of the Korean facts, the adjunction in question should prevent an associate from reprojecting, since the reprojection would not be clean in this instance either. This is fine if the associate is a unary quantifier, but again results in semantic deviance if it is a binary quantifier, thus the definiteness effect. There is a wrinkle (or a prediction anyway). The key to our account of the definiteness effect is not whether a determiner is strong, but rather whether it is transitive, that is binary. This fares well with strong quantifiers, barred in the 126
LABELS AND PROJECTIONS
contexts in question. But it raises an issue about names, definite descriptions, demonstratives, generics, and other such elements which, according to our tests, are (or at least can be) intransitive, that is unary. Are they possible or impossible in existential constructions? There are well-known so-called exceptions to the definiteness effects that typically involve precisely the sorts of elements that, according to our explanation, should be syntactically possible in those contexts, if they are indeed unary: (21) a. b. c. d.
Who can play Hamlet? Well, there’s Olivier . . . What can we use for a prop? There’s the table . . . There’s this crazy guy in your office screaming. These days, there’s crooks/the prototypical crook all over the place.
These contexts are obviously exemplary, presentational, or quasi definitional, which clearly affects the definiteness effect. Nonetheless, it is rather clear that strong quantifiers proper are not welcome here, even in these prototypical circumstances: (22) a. Who can play Hamlet? #Oh, there’s everyone . . . b. What can we use for a prop? #There’s really most things in the gym . . . c. #There’s/are most crazy guys in your office screaming. d. #These days, there’s all crooks all over the place. In turn, true existentials (controlling for the prototypical contexts just mentioned) are bad if they involve names, definite descriptions, demonstratives, or various generic expressions: (23) a. b. c. d.
#There’s Olivier on stage. #There’s the table you got me for a prop on stage. #There’s this door [speaker points] in my house. #There’s crooks/the prototypical crook in jail.
Still, nothing in the logic of our account prevents a definiteness effect here. All we have asserted is that, when the determiner associate is binary, it cannot reproject, resulting in an uninterpretable representation. When the determiner associate is unary, it should be able to reproject, but something else may be preventing it from appearing in these existential contexts. That must be the presuppositional character of names, definite descriptions, demonstratives, and generics, which makes them incompatible with the intrinsically non-presuppositional character of existential statements. What is wrong with the examples in (23), in our view, has nothing to do with the presence of the pleonastic there; the issue is an expression that forces a nonpresuppositional subject. This is consistent with the fact that the examples in (21), where the context is not existential, but exemplary, presentational or more generally prototypical and permits the unary elements in point. At the same time, those very examples still forbid transitive determiners, as (22) shows. (This 127
DERIVATIONS
presupposes a theory of the presuppositionality of names which is different from whatever is involved in their LF position.)
9 Reconstruction (Case C) To conclude our analysis of the factual consequences of reprojection, we want to discuss reconstruction. The general reasoning deployed above deduces one aspect of Diesing’s (1992) mapping hypothesis. Her proposal is (in part) that strong DPs must be outside the VP shell by LF to be properly interpreted. Diesing relates this to Heim’s (1982) tripartite structure for the proposition. The mapping principle provides an algorithm for taking an LF phrase marker and converting it into a Heimian 3-part proposition. We believe we obtain a similar structural conclusion without additional stipulations. In order to reason this out, we have to make certain commitments about reconstruction. We will not have anything to say about A-reconstruction here, but consider A-reconstruction. Suppose that, given an A-chain CH {{DP,XP}, {W,DP}}, reconstruction is interpretation of DP at any of its occurrences other than the pronounced one (typically, the highest). For example, if DP is pronounced in a Case position, reconstruction will be interpretation of its -occurrence, inside VP. Let us point out the following interpretive corollary: (24) A D chain must be interpreted in the occurrence that reprojects. (24) follows from the semantics of the determiners that reproject. If we were to reconstruct a determiner that reprojects, we simply would not be able to meet its semantic demands, the whole reason it reprojected to begin with. If in contrast D does not reproject, there is no reason why it should not reconstruct. This is in spite of the fact that what carries binary determiners out of the VP shell, such as Case, is also carrying unary determiners. It does not matter, if the unary determiner does not engage in any reprojection. As a consequence, in principle any of the occurrences of a DP chain headed by a unary determiner can be the source of interpretation. In large part, that predicts the Diesing effect, at least as interpreted in Hornstein (1995a). Diesing was speaking of weak vs. strong determiners, and as we saw the distinction for us is rather unary (generally weak) vs. binary (generally strong, except for names, definite and demonstrative descriptions, kind plurals), but we believe this can be accommodated. Thus, as we saw for our treatment of the definiteness effect, names, definite and demonstrative descriptions, and arguably kind plurals, can all be seen as intrinsically presuppositional, regardless of the shape of their syntactic support. If so, there is no obvious reason why the syntax should map these elements in VP external position. An interesting question, also, is whether binary DP subjects are interpreted in their -occurrence or rather in their Case occurrence. It is not easy to decide 128
LABELS AND PROJECTIONS
this semantically, since in both instances the DP is “outside VP,” in the very specific sense that it could take as its first argument the NP complement, and as its second argument either the “rest” of the VP (when in its (-occurrence) or else the I (when in its Case occurrence). The logic of our proposal forces an answer, given the facts. If binary DP subjects were allowed to be interpreted in their -occurrence, why should they reproject early, thus inducing an LF island? The reason we had for an early bleeding of the derivation into LF was because, otherwise, if we waited after reprojection, we would not be able to identify the chain that induced the reprojection under command and uniformity conditions. However, it is not clear that a DP subject in its -occurrence would have to be involved in any chain identification other than the one its trivial occurrence already provides. After all, who cares if the Case occurrence is not interpreted? It is uninterpretable to begin with. That leaves us with a mystery. Why can it not be the case that the binary DP subject is interpreted low? (Notice, in contrast, that binary DP objects do not have that option, since that would not yield an interpretation, missing one of the arguments.) We do not want that possibility, but nothing that we have said prevents it as an option. Perhaps this relates to the Extended Projection Principle for subjects, but we will not venture any speculations.
10 Quantifier raising (more details and a consequence of Case C) The difficulty just posed has a surprising consequence for quantifier interactions. Consider (25): (25) Many of the senators read every law proposal. If this sentence is ambiguous between narrow and wide scope readings for every law proposal, then there must be a rule of Quantifier Raising, independent of the configurations that standard grammatical relations (Case, agreement) otherwise achieve. The problem is that many of the senators is, as per the reasoning we reached immediately above, frozen in the subject site. The moment this is true, then the only other known way that every law proposal will be interpreted higher up in the structure is if some process promotes it there. That process, however, cannot be a standard rule of grammar motivated by some other syntactic considerations, under the assumption that the surface structure in (25) has already undergone the action of all such rules. That conclusion can be confirmed by trapping a quantifier inside another which, in addition, reprojects: (26) a. I don’t think that many of the senators (*ever) read every law proposal. b. What don’t you think that many of the senators (*ever) read? The impossibility of the polarity item tells us that many of induces an island, as 129
DERIVATIONS
expected if it is a strong binary quantifier. This must mean that many of reprojects, crucially after having sent the structure that properly contains its chain to early interpretation. Importantly, we have now placed another quantifier in object position which may take scope outside of many of. In spite of the fact that the quantifier would not seem to involve overt movement, it is behaving as if it were (like what in (26b)). This is intriguing, because we seem to have contradictory requirements: sending the reprojected structure to interpretation is what creates an island higher up in the structure, thus blocking the split construction. Yet the quantifier seems capable of escaping this island. We can avoid the paradox if things proceed as in (27), where parenthesized elements represent traces of movement. (27a) is the source structure. (27a) and (b) may apply in that order or the inverse, depending on one’s assumptions about overt A-movement (an issue that is immaterial now). Matters start being interesting after (27d), which takes place in the LF component. Note that QR has to carry every outside the scope of many of. It is not enough to piggyback on other grammatical sites of every because the logic of the analysis forces the upper occurrence of many of to undergo interpretation, thus yielding the necessary island. Very significantly, also, QR must take place before chains are identified, or else it would be as impossible as the split construction. Reprojections are presumably automatic after chains are identified (we do not code the reprojection of every not to clog the picture). (27) a. -relations established: [many of [V every]] b. Accusative Case checked: [every [many of [V (every)]]] c. Nominative Case checked: [many of [every [(many of) [V (every)]]]] d. QR of every to wide-scope site: [every [many of [(every) [(many of) [V (every)]]]]] e. Chain of many of identified: [every [many of [(every) [(many of) [V (every)]]]]] f. Reprojection of many of : [every [many ofi [(every) [(many ofi) [V (every)]]]]] X|DP (island) This order of things is surprising, and suggests that either QR is an overt process (Kayne 1997) or else that what is involved in split constructions is a post-LF interpretive phenomenon. Either way, what is important to note is that syntactic processes like QR and wh-movement do not seem to be sensitive to QI islands, which clearly puts them in a different component. Our approach appears to be compatible with either early QR or late split constructions, however one further consideration may help us decide. Given the way we reasoned out the induction of LF islands through interpretive cascades, after reprojection there should be no way that a reprojected determiner can syntactically connect to its own trace in the -position (the chain would either not be uniform or not satisfy its regular command conditions): 130
LABELS AND PROJECTIONS
(28) a. Move prior to reprojection:
[XP [DP D …] [X' X [YP … t … ]]]
b. After reprojection:
[D' D [XP X [YP … t … ]]] — NO CONNECTION (between D and t)

This must mean that the relation between the determiner and the θ-position is not a syntactic relation, but rather involves direct binding from the determiner to the variable in the θ-position. There is another canonical instance where disconnected constituents find each other, presumably in terms of the independently needed notion of “antecedence”:

(29) a. * Which person did you hire him after meeting t?
b. Which person did you hire t after meeting him?

(29a) is a typical CED effect. However, as (29b) directly shows, the question operator can, from its commanding position, bind into the adjunct clause. Whatever prevents the chain relation in the first example does not prevent the corresponding antecedence relation in the second (see Chapters 3 and 4). Similarly, a binary quantifier may bind the variable in the θ-position, even if it does not form a chain with it. Of course, it must be the head D that binds the variable, since only it commands the variable, given the syntax in (28b). One cannot say, however, that split relations are also of this sort, precisely because they cannot take place in the conditions just described. So if post-LF processes involve “antecedence,” then split relations may not, at least not merely. This pretty much entails that, in fact, there are not two options for the state of affairs implied in (26), and instead QR takes place prior to the LF component. If so, it would be immune to interpretive islands that arise after early chain interpretation prior to reprojection.
11 A note on unary determiners

To this point our main focus has been on binary determiners, which, we have argued, require their arguments to be ordered. This ordering has been treated as relatively similar to θ-role assignment and, we have assumed, must be done within the domain of the determiner. This assumption is what drives reprojection:

(30) D orders P if P is in the domain of D.

Until now, we have said nothing about how to interpret unary determiners except to observe that, being intersective, the arguments of such a determiner require no specific ordering. Given this, we can maintain that in contrast to binary determiners, unary ones do not specify their arguments at all. What then is the syntactic relation between a unary D and the NP it binds? We can answer that by strengthening (30) as (31):

(31) D orders P if and only if P is in the domain of D.

(31) and the assumption that unary Ds do not order their arguments result in the conclusion that in the weak reading of (32) many does not take men as an argument. Plausibly then many is a D adjoined to the nominal whose variable it binds:

(32) a. many men left
b. [NP [D many] [NP men]]

Adjunction of the sort proposed in (32b) has a nice correlate, under the assumption that adjunction of X to Y does not alter fundamental structural relations:
(33) a. [Z Y … ]
b. [Z [“Y” X Y] … ]
That is, if “. . .” is the command domain of Y in (33a), “. . .” is still the command domain of X adjoined to Y (33b). This can be independently shown, given the fact that a verb raised to Tense, for instance, behaves as if it were in situ with regards to its arguments (the intuition behind Baker’s (1988) Government Transparency Corollary). Then in (32a) many can command left and bind it, as well as men, in virtue of being adjoined to the latter. That permits us to provide a standard semantics for examples like (32a), which end up looking roughly as (34).

(34) many x: [(men x) ∧ (left x)]

Observe that this comes very close to analyzing the binding powers of unary quantifiers as akin to those of adverbs of quantification such as occasionally or sometimes. Whether that means treating them adjectivally (as in the classic Milsark 1977 analysis), or in some other way, still consistent with the geometry in (32b) (as in Herburger 1997), is immaterial to us. There is one important point that this proposal turns on. We are distinguishing the binding powers from the argument-taking powers of a determiner. All determiners, unary and binary, bind arguments and “close off” variables in predicates. However, only binary determiners must order these arguments. Binding per se does not require reprojection (or even projection). What does require reprojection is ordering the arguments (akin to thematic structuring). Binary determiners order their predicate arguments and so must reproject. Unary ones do not so order their arguments and so require nothing of the sort.
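The ordering contrast at stake can be rendered concretely in set terms. The following sketch (in Python, purely for illustration: the sets stand in for predicate extensions, and the function names and threshold are our own assumptions, not part of the proposal) shows why an intersective determiner needs no ordering of its arguments, whereas a binary determiner like every does:

```python
# Illustrative only: Python sets stand in for predicate extensions.

def every(restrictor, scope):
    """Binary determiner: the two arguments are ordered (not symmetric)."""
    return restrictor <= scope

def many_weak(p, q, threshold=1):
    """Intersective (weak) 'many': only the intersection matters,
    so the arguments require no ordering."""
    return len(p & q) > threshold

men, left = {2, 3}, {1, 2, 3, 4}
print(every(men, left), every(left, men))          # True False: order-sensitive
print(many_weak(men, left), many_weak(left, men))  # True True: symmetric
```

Swapping the arguments of every can change the truth value; swapping those of the weak many never can, which is the formal core of (31) and of the adjunction analysis in (32b).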
12 More data and a comparison with a semantic account

An account like Honcoop’s, or any related proposal based solely on the island effects observed throughout this chapter, will of course fall short of extending to the incorporation facts discussed in Sections 7 and 8. It is worth exploring whether other facts can distinguish between our syntactic account and a semantic approach. The logic of a proposal like Honcoop’s, or others alluded to in Honcoop’s thesis (including Szabolcsi and Zwart 1993, which started the series) goes like this. A complex expression (A, B) cannot be split across an “island” domain D because of some crucial semantic property of D. For example, for Honcoop A cannot relate to B across D because D disallows, more generally, all instances of “dynamic anaphora.” Thus compare:

(35) a. Every child brought a bike. *It got parked in the garden.
b. A child brought a bike. It got parked in the garden.

A bike cannot hook up to the pronoun it in (35a), in the way it can in (35b); the only difference between the sentences is in terms of their being introduced by every as opposed to a, and we can give an explanation for these binding effects in standard dynamic binding terms. The question is, can that be the general explanation for the contrast in (36)?

(36) a. *Nobody thought every child brought a damn bike.
b. Nobody thought a child brought a damn bike.

Certainly, Honcoop’s account does work for this particular example, where the relevant dynamic binding is between nobody and the NPI, across a domain introduced by either every or a. And of course that is compatible with everything we have said, which could be seen as nothing but the syntax for those semantics. However, is there a way to tease the two accounts apart? One possibility may be to construct an example which satisfies dynamic binding properties while still disallowing split constructions. Consider in this respect (37):

(37) a. Every child can bring a bike. It should, however, not be left in the garden.
b. *Nobody thinks that every child can bring a damn bike.

It is known that the dynamic binding effect in (35) gets attenuated in modal contexts, as seen in (37a). Nevertheless, a context of that sort has apparently no attenuating effect on QI islands, as (37b) shows. This is unexpected in Honcoop’s terms. At the same time, consider (38):

(38) A child didn’t bring a bike. *It didn’t get parked in the garden.

Dynamic binding predicts the impossible sequence in (38). Negation acts as a blocker, just as binary quantifiers do; hence it is expected to produce a QI island effect. This is not testable with negative polarity items (since they would get licensed by the intervening negation itself). But negation is known to block other split constructions. This is unexpected for us, unless we can show that – somehow – negation forces reprojection. We will not attempt this now.
13 Conclusions and further questions

This paper has explored a syntax for binary quantification, in terms of a straightforward possibility that bare phrase-structures allow within the Minimalist Program. In essence, independently moved binary quantifiers are allowed to reproject after meeting whatever syntactic requirements carried them to wherever they are. As a result, the semantic arguments of a quantifier are also its natural syntactic dependents, in terms of domains of the sort already needed to account for the specifications of θ-theory. The proposal has immediate consequences for various LF processes, including the possibility of undergoing chain identification or reconstruction. In a nutshell, reprojected structures drastically affect the shape of the LF phrase-marker, which in turn disallows certain crucial properties that normally obtain, for instance under command. Assuming chains must be uniform as they are identified by the system, the reprojection of structures will necessarily follow chain identification. This chain identification will result, inasmuch as it involves the minimal phrase-marker X that includes all links of the chain-to-be-identified, in the emergence of a barrier for LF processes that involve material internal to X and some other element outside X.

Right there we see direct differences between QI islands and more standard structural islands. A sentence headed by a wh-phrase, for instance, is an island per se. QI islands, however, emerge because the system is forced to go into early interpretation, or is not able to identify a chain and give an appropriate interpretation to a binary quantifier at the same time. This entails that not just the “specifier” branch is a QI island, but indeed the whole structure of dependents of the quantifier is. Thus, observe that (39b) is no better than (39a):

(39) a. Nobody gave every [critic] [a present/*a red cent]
b. Nobody gave every [[critic of the movie/*a damn movie]] [a book to read]

(39a) is what we saw throughout; but in (39b) we placed the NPI inside the first argument of the binary determiner every. After reprojecting it, that first argument is in effect the complement of the determiner, and thus would seem to be in a transparent path with regards to the NPI licenser – but licensing fails. In our terms it should, and it does because it is not just the specifier branch that becomes an island, but the whole reprojected DP, since it is the entire object that must go to early interpretation in order to identify its chain prior to reprojection. It is that very logic that forced us to consider a rule of QR in the previous section, and indeed one that surprisingly takes place prior to split processes at LF, since quantifiers appear to be able to move to their scope-taking positions regardless of whether they do it across other binary quantifiers. We wanted to salvage a treatment of QR restricted to LF, by placing split processes in a post-LF component, but we could not, since we need that post-LF component for antecedence, which is presumably what allows a determiner to relate to its own θ-position inside the very island that it induces.

We compared our approach to a semantic one, and although it is quite possible that the two could go hand in hand, we have found some data that suggest otherwise. In particular, we have seen how in certain constructions quantifiers can induce LF islands, yet still allow dynamic binding across. This is reminiscent of what we have just said about antecedence: it would appear that bona fide binding is just not sensitive to LF islands, and instead it is other processes that are: split constructions (though not QR). On the other hand we have seen that negation both induces an LF island and disallows dynamic binding across – yet is not an obvious binary quantifier.

One last concern we have that should be more systematically explored is what to do with strong quantifiers that do not have the trivial form of an article, such as at least one but no more than three children (Keenan 1987). This poses a serious question for us: what does it mean to reproject at least one but no more than? Many answers come to mind (Boolean phrases, incorporation, parallel structures), but they should be seriously studied. Related to this is the matter of adverbs of quantification, which we have set aside, and hopefully also the difficulty posed by negation.
7 A NOTE ON SUCCESSIVE CYCLICITY†
with Juan Carlos Castillo

1 Introduction

Most analyses of successive cyclic movement under the Minimalist Program have centered around the notion of a wh-feature in the embedded Comp. We suggest that this feature is spurious, given that it has no semantic import and its only purpose is to trigger the very movement it tries to explain. We make use of Richards’s tucking-in, and extend it from Move to Merge. This analysis, inspired by the Tree-Adjoining Grammar approach, allows us to trigger the effect of successive cyclic movement by postulating one single application of Move, and then merging elements from higher clauses below the position of the moved wh. Issues of clausal typing, typology of cyclic movement and wh-islands are also explored, in connection with ideas about phrase structure that apply independently to other constructions.

We will assume that successive cyclicity holds of movement transformations. The questions we try to answer are (i) why this is the case, and (ii) how it can be captured. Successive cyclicity was initially required in order to implement movement out of bounding nodes (Chomsky 1977b). The intuition is still alive in Chomsky (2000): phases (substitutes for bounding nodes) are impenetrable except to their edge. If a category C manages to relate to the edge of a phase, then C may be able to relate further up in the phrase-marker. Why do edges have this privileged status and exactly how is the mechanism supposed to work? About the first question, Chomsky has little to say.1 About the second, he proposes a technical solution. Partial derivations have access to a special set of peripheral (P) features, which happen to be active when successive cyclicity is necessary, and happen to be checked at the edge of phases. Naturally, why any of this holds is just what we would like to understand.

A different alternative exists in the literature which, we believe, has a more plausible take on successive cyclicity. It is due to Kroch and Joshi (1985), and is based on the formalism of a Tree Adjoining Grammar (TAG). Their idea is very simple. A sentence like (1a) starts in a structure like (1b):

(1) a. I wonder who John thinks that Mary loves.
b. [[Who]j [IP Mary loves tj]]
In their grammatical formalism, one can literally split who from the rest of the tree in (1b), and insert “under” it an entirely separate sub-tree John thinks that, as follows (avoiding technical details):

(2) a. tree splitting: [Who]j & [IP Mary loves tj]
b. additional sub-tree: [John thinks that IP]
c. tree pasting, 1st step: [John thinks that [IP Mary loves tj]]
d. tree pasting, 2nd step: [[Who]j [John thinks that [IP Mary loves tj]]]

(2d) is the relevant chunk of structure that interests us here, one where “successive cyclic movement” has taken place. The basic intuition of this approach is that bona fide movement is always local, as in (1b). What carries a moved phrase further and further out in the tree is not movement, but tree splitting. One can dismiss this sort of account in Bare Phrase Structure (Chomsky 1995a, 1995b). If one does not have trees at all in one’s representational repertoire, but familiar sets of terms, then it is not clear how to execute the tree splitting and pasting operations.2 However, we believe that this is an overly narrow perspective, and that a generous reading of this proposal can be made by way of extending a device proposed in Richards (1997), tacitly assumed in Chomsky (2000): so-called tucking-in. We discuss the details of this possibility next.

2 Tucking-in

As Chomsky (2000) points out, two natural conditions come to mind when thinking of how the constituents of a tree should be merged. Let us start with (3), where Y is merged to X, which projects:

(3) a. tree notation: [X X Y]
b. official BPS notation: K = {X, {X, Y}}
Suppose we now want to merge Z to X. Where do we add it? At the root of K or at the leaves? The first of these two possibilities would satisfy the Extension Condition in (4), whereas the second would satisfy an equally natural condition demanding Local Association, as in (5):

(4) Extension Condition
Always extend the phrase-marker as operations take place in it.

(5) Local Association
Always associate locally to given lexical terms.

The effect of (4) is to make K grow at the top, whereas the effect of (5) is to make K grow at the leaves. Neither of these is preferable a priori, and it is quite possible that the empirical differences are equally insignificant, as Drury (1998) shows in some detail. Chomsky stipulates no Local Association to the head of an already assembled phrase-marker. We shall return to this. What matters to us now is that Chomsky does allow associations not only as in (6), but also as in (7), as is natural:

(6) Root Association
a. tree notation: W merges at the root of [X Z [X X Y]], yielding [X W [X Z [X X Y]]]
b. official BPS notation: prior to 3rd merge, {X, {Z, {X, {X, Y}}}}; after merge of W, {X, {W, {X, {Z, {X, {X, Y}}}}}}

(7) Leaf Association
a. tree notation: W merges below Z, yielding [X Z [X W [X X Y]]]
b. official BPS notation: prior to 3rd merge, {X, {Z, {X, {X, Y}}}}; after merge of W, {X, {Z, {X, {W, {X, {X, Y}}}}}}
(7) is Richards’s tucking-in process.
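The difference between (6) and (7) can also be rendered procedurally. Here is a minimal sketch (in Python; the encoding of BPS objects as (label, (left, right)) tuples and the function names are our own conveniences, not part of the proposal):

```python
# Illustrative only: BPS objects encoded as (label, (left, right)) tuples.

def merge_at_root(K, W):
    """Root Association (6): W merges with all of K; K's label projects."""
    return (K[0], (W, K))

def tuck_in(K, W):
    """Leaf Association (7): W merges below K's topmost specifier,
    as in Richards's tucking-in."""
    label, (spec, rest) = K
    return (label, (spec, (label, (W, rest))))

# {X, {Z, {X, {X, Y}}}}, prior to the third merge:
K = ('X', ('Z', ('X', ('X', 'Y'))))
print(merge_at_root(K, 'W'))  # ('X', ('W', ('X', ('Z', ('X', ('X', 'Y'))))))
print(tuck_in(K, 'W'))        # ('X', ('Z', ('X', ('W', ('X', ('X', 'Y'))))))
```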
3 Applying tucking-in to successive cyclicity

Generally, tucking-in has been used for processes of movement, as in Richards (1997). It is not obvious, however, that anything would prevent tucking in of a merged sub-phrase-marker. In other words, W in (7a) may be there because of Move or, equally plausibly, because of Merge. Suppose this is the case. Then we can recast the TAG analysis in present terms. To get the gist of the idea, Z in (7a) will be who in (2a), W in (7a) will be John thinks that, and the rest of the structure in (7a) will be Mary loves tj in (2a). However, things cannot be that simple or we would get (8):

(8) [CP whoj [C' [W John thinks that] [C' C [IP Mary loves tj]]]]
One could of course argue for (8) as the structure of long-distance questions, but we are not ready to do that. Consider one argument against this view. Of course, the logic of what we are saying in this chapter extends to instances of successive cyclic A-movement, assuming it exists (but see Castillo, Drury and Grohmann 1999; Epstein and Seely 2002). Relevant structures would be as in (9), tucking in “seems to be likely”:

(9) [IP Johnj [I' [W seems to be likely [?]] [I' to [VP tj love Mary]]]]
An immediate problem is what counts as the complement of (be) likely, which we have marked as a question mark in (9) (a related problem arises in (8), where we put that as a non-obvious complement of think). More seriously, though, observe that a question can proceed from the domain of seems to be likely:

(10) a. Why does John seem to be likely to love Mary?
b. Because John always SEEMS something or other, which ends up never being the case.

However, according to the representation in (9), the structure that seem is part of is a “left branch.” Extraction of why out of this structure should be a violation of Huang’s Condition on Extraction Domains (CED) (see Chapters 3 and 4 for a minimalist account). This suggests that the structure in (9) cannot be right. The successful derivation that we have in mind proceeds rather differently. We start with the structure in (11). The embedded clause is assembled, including an interrogative Comp, whose wh-feature must be checked overtly. Who moves to [Spec, CP], and checks the feature. Up to this point, the TAG approach and our analysis are the same. Notice that both analyses capture the intuition that the question “who does John like?” underlies the more complex “who does Bill think John likes?”
(11) [CP who [C' C[wh] [IP John [I' I [VP John [V' likes who]]]]]]
From this point on, our analysis is different from the TAG one. First, there is no tree-splitting operation. The effect of this operation is achieved by merging lexical items one-by-one below the C-projection. The derivation proceeds as follows. First, we merge that, which targets IP, a maximal projection, as expected (12a). Because that is minimal, and it selects for IP, it projects a CP-projection (more on this in Section 4). Next, we merge the matrix verb thinks, which again selects for a non-interrogative Comp. The verb targets CP and merges with it. Selection is satisfied and the verb projects a VP (12b). Next, we merge the subject of thinks, Mary. The DP needs a θ-role from the verb, and thus the verb projects a higher VP-node, which creates the appropriate Spec-head configuration for θ-role assignment (12c).

(12) a. [CP whoj [C' C[wh] [CP that [IP John likes tj]]]]
b. [CP whoj [C' C[wh] [VP thinks [CP that [IP John likes tj]]]]]
c. [CP whoj [C' C[wh] [VP Mary [V' thinks [CP that [IP John likes tj]]]]]]
For ease of exposition we have not bothered to specify a v projection, or the T projection in the intermediate clause; but those too would have to be tucked in, in the obvious fashion – with corresponding movements of John to [Spec, TP], and so on. Essentially, an entire derivation would have to be run in terms of tucked-in separate mergers and moves. Notice that each instance of Merge is motivated by the usual requirements: selectional restrictions of the item that is targeted (C merges with IP, V with CP, and so on), theta-relations (Mary is theta-marked by thinks, etc.) or checking configurations (movement of Mary to [Spec, IP] for Case/EPP reasons, etc.). Nothing special must be said, except that the merger does not occur at the root of the tree, but rather at an arbitrary point in its spine.
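To make the mechanics concrete, the derivation in (11)–(12) can be run as a toy procedure. This is a sketch under our own assumptions: the same tuple encoding as in the earlier sketch, and a lookup table standing in for genuine selectional restrictions:

```python
# Illustrative only: selection reduced to a lookup table.
SELECTS = {'that': 'IP', 'thinks': 'CP'}

def label(node):
    return node[0] if isinstance(node, tuple) else node

def tuck_in_below_c(K, item, new_label):
    """Merge item with the complement of C, below [Spec, CP], projecting
    new_label (cf. (12a-c)); selection must be satisfied at that point."""
    cp, (spec, (head, comp)) = K
    sel = SELECTS.get(item)
    assert sel is None or label(comp) == sel, 'selection not satisfied'
    return (cp, (spec, (head, (new_label, (item, comp)))))

# (11): who has moved, once, to the embedded [Spec, CP]
K = ('CP', ('who', ('C[wh]', ('IP', ('John', ('likes', 'tj'))))))
K = tuck_in_below_c(K, 'that', 'CP')     # (12a)
K = tuck_in_below_c(K, 'thinks', 'VP')   # (12b)
K = tuck_in_below_c(K, 'Mary', 'VP')     # (12c)
```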
4 Basic technical implementation

We are assuming, first, that there is a feature responsible for local wh-movement. This wh-feature is what causes the wh-element to move to the front of the local sub-tree it is a part of. Crucially, for us the relevant feature is not waiting at the end of a derivational cycle, but is part of the first cycle. Whether it is already in the wh-element (as suggested in Chomsky 2000) or is rather in the first Comp in the derivation, is immaterial for our purposes. What matters is that the local movement must proceed, or else the rest of the account falls apart. Second, the question arises as to how the grammar “knows” when not to continue moving the wh-element. For instance, why does the tucking-in not apply in (13a) at the point illustrated in (13b)?

(13) a. Mary wonders who John likes.
b. [whoj C[wh] John likes tj]

Remember that we said that every instance of Merge must satisfy selectional requirements of the item merged. When faced with a structure like (13b), a verb that selects for a wh-complement, like wonder, has no choice but to merge at the root, because this is the only place where its selectional requirement is met. What prevents a mistaken tucking-in? Evidently, tucking-in is just a possibility that Move and Merge have. Hence, whatever can be moved/merged, according to whatever governs those processes, can also be tucked in. In the example above, we were just dealing with separate words. But surely entire phrases could be tucked in as well, for instance, the man instead of Mary in (12):

(14) [CP whoj [C' C[wh] [VP [DP the man] [V' thinks [CP that [IP John likes tj]]]]]]
What could not be tucked in is Peter thinks, for instance, since that is not a valid phrase (it lacks a complement, hence is impossible in BPS terms). This general line of reasoning has consequences for our account of islands, as we now proceed to show.
5 wh-islands

Observe the structure in (15):

(15) [CP whoj [C' [W John wonders why Bill said that] [C' C[wh] [IP Mary loves tj]]]]
If we could tuck in John wonders why Bill said that, then it is not clear why (15) should be an island violation, as it is. (This is another argument against doing things as in (8).) We can prevent that if the complementizer that requires an IP complement, which it does not have in (15). That still raises the question of why one could not tuck in Mary loves tj as the complement of that.

The unwanted derivation would be extremely bizarre. It would go all the way up to (15), but then it would literally move the IP Mary loves tj to the complement of that, as illustrated in (16).

(16) [CP whoj [C' [VP John [V' wonders [CP why [C' C[wh] [VP Bill [V' said [CP that IPk]]]]]]] [C' C[wh] tk]]], where IPk = [IP Mary loves tj] has been moved from the position marked tk
One way to rule this out is by having basic theta-relations satisfied in terms of Merge, not Move, as in Hale and Keyser (1993) and assumed throughout in Chomsky’s program. The derivation just outlined would violate the implicit condition. One might be tempted to try (17) instead:

(17) a. Separate whoj from Mary loves tj
b. Assemble John wonders why Bill said that
c. Merge Mary loves tj as a complement to that
d. Merge whoj to John wonders why Bill said that Mary loves tj

The problem is in (17a). Our proposal is not separating whoj from Mary loves tj. Once those objects have been assembled, that structure is indelible. Only a mirage may make one suppose that one is literally separating anything while tucking in. In fact, tucking in simply adds more structure to an already existing chunk, and hence the impossible island violation never arises this way. Another stab at the bad sentence would be as shown in (18).

(18) a. Start with whoj Mary loves tj
b. Tuck in, successively, that, said, Bill, why, wonders, John (and corresponding functional categories)
This case is trickier, given that (19) is an intermediate derivational stage:

(19) [CP whoj [C' C[wh] [VP Bill [V' said [CP that [IP Mary loves tj]]]]]]
(that = 1st tucking-in; said = 2nd; Bill = 3rd)
If we are allowed now to insert why, we are doomed. Successive insertion of wonder will license why, and then nothing else would prevent whoj from rising higher, thereby licensing the unwanted island. To prevent that, let us look at step 4, prior to the insertion of why:

(20) [CP whoj [C' C[wh] [C' C[wh] [IP Bill said that Mary loves tj]]]]
The intuition to pursue is that there is a problem in the intermediate constituent {C, C'} in (20) – the two adjacent C-labeled objects – where two elements have exactly the same label: C. Of course, in BPS terms there is no such thing as C'; its label is really C. The element does have some internal constituent structure: {C, IP}. But suppose the grammar has no access, at least for category identification purposes, to the contents of a category; rather, it just sees the label. Then there is no way the grammar has of telling apart the two objects in the constituent set {C, C'}; the relevant set can thus not be formed (it is not a set of two – by assumption, identical – elements), and merger of further elements to C is then impossible if they are to be thought of as integrated in the phrase-marker in (20).3 Another way of putting what we are saying is that clausal typing is, to some extent at least, a consequence of long-distance relations. Why grammars should overtly type clauses is far from obvious, but if our reasoning is correct, if they did not, there would be no way of allowing tucking in, at least in the crucial spots that are relevant for long-distance movements, which involve several C categories.
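The set-formation failure can be simulated directly. In the following sketch (continuing the illustrative encoding used earlier; nothing here is a claim about the actual computational system), category identification sees only labels, so the two C-labeled objects in (20) cannot yield a two-membered constituent set:

```python
# Illustrative only: the grammar sees labels, not internal contents.
C_head = ('C[wh]', None)                                   # bare C
C_bar = ('C[wh]', ('IP', 'Bill said that Mary loves tj'))  # C' = {C, IP}

def form_constituent(a, b):
    """A constituent set built from two label-identical objects
    collapses, so no two-membered set can be formed."""
    member_labels = {a[0], b[0]}
    if len(member_labels) < 2:
        raise ValueError("cannot form {C, C'}: members indistinguishable")
    return member_labels

try:
    form_constituent(C_head, C_bar)
except ValueError as err:
    print(err)   # the derivation cannot proceed: a wh-island effect
```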
Apart from giving us some insight into clausal typing, the suggestion above predicts the following possibilities:

(21) a. typing wh
b. no typing wh
c. retyping in conflict cases

(21a) is the situation just seen for English, yielding the obvious results: possible long-distance wh-movement when no two C’s clash, but impossible long-distance wh-movement when two C’s clash (that is, a wh-island). (21b) may be possible in a language, unless wh-typing is necessary for some independent reasons (e.g. wh-selection). In this language we expect no kind of long-distance wh-movement, as mere tucking-in would result in a C clash. Madurese is an arguable instance, if Davies’s (2000) claim that it does not have long-distance wh-movement holds true, and indeed no clausal typing is ascertained in the language. Finally, (21c) would arise if a given language has an extra mechanism for re-typing identically typed merged C’s, of the sort in (20). If this possibility exists, the language in question would allow movement across a wh-island. Perhaps familiar instances from Romance, Slavic and others fall in the right category (again, pending some serious research on the putative retyping mechanisms). In sum, we suspect that all these possibilities exist, but our point now is just to observe the logic. Note, to conclude, that we are forced to say that all instances of movement across wh-islands involve the same sort of C, or else the account just proposed would fail. This may be significant for instances of long-distance topicalization, for example, which is known to be sensitive to wh-islands, at least in some languages.
6 Other islands

We are forced to say that islands fall into two broad categories: those that arise because of the sort of situation outlined in the previous section (C clash), or else islands that arise because of impossible configurations or derivational routes. A couple of instances of the latter sort are worth mentioning. For example, if Chomsky (2000), and more generally Chapters 3 and 5, are right in devising a highly dynamic system with Multiple Spell-Out, whatever island properties arise from this system will be relevant for us, rather trivially. It is true that with our proposal much of the need for Chomsky’s specific phases, and some of their properties (in particular, edge-targeting and transparency), is obviated. Nonetheless, it is possible that other emerging islands arise in the system, for instance “cascades” for subjects or adjuncts. If movement is impossible from those domains of Spell-out, that result will immediately carry through to our system. More generally, it is conceivable that some islands will arise because we are tucking in wrongly, for instance as suggested at the beginning of the previous section. Tucking in has no properties, other than those of Merge and Move, but those operations do have rather specific properties which limit the array of possible outputs.
7 Conclusions

Our main goal in this chapter was to re-present the TAG analysis of successive cyclic wh-movement by making it fit into the Minimalist machinery. Wh-phrases move to the first Comp in the derivation. From this point on, we concluded that the technical details of the tree-adjoining mechanism need to be changed to comply with BPS requirements. Instead of building a whole subtree for the matrix clause, the lexical items are merged one by one, by tucking them in underneath the original Comp. In the process of the derivation, there will be certain points at which two Comps are next to each other. If the computational system does not have a way to distinguish them, an island violation arises. Since our analysis assumes that wh-phrases move only once in the course of the derivation, the wh-feature that triggers this movement (be it on the wh itself or in Comp) is all that is needed to derive successive cyclic movement. This account does not need to stipulate additional features whose only purpose is to allow the derivation to proceed later on. Further research is needed to account for the usual phenomena involved in successive cyclic movement, such as that-trace effects, Complementizer agreement (both of the Celto-Germanic and the Chamorro type, at least), inversion in embedded and matrix clauses, the argument-adjunct asymmetry regarding long-distance extraction, and of course reconstruction effects involving intermediate C domains.
8 FORMAL AND SUBSTANTIVE ELEGANCE IN THE MINIMALIST PROGRAM
On the emergence of some linguistic forms†

1 General considerations

The surprising fact about minimalism, in my view, is not that we seek economy, but that we actually find it. Biological evolution, to begin with, does not explain it, if seen in the realistic light that Gould (1991: 59–60) provides:

Those characteristics [such as vision] that we share with other closely related species are most likely to be conventional adaptations. But attributes unique to our species are likely to be exaptations.1 . . . As an obvious prime candidate, consider . . . human language. The adaptationist and Darwinian tradition has long advocated a gradualistic continuationism . . . Noam Chomsky, on the other hand, has long advocated a position corresponding to the claim that language is an exaptation of brain structure. . . . The traits that Chomsky (1986b) attributes to language – universality of the generative grammar, lack of ontogeny, . . . highly peculiar and decidedly non-optimal structure, formal analogy to other attributes, including our unique numerical faculty with its concept of discrete infinity – fit far more easily with an exaptive, rather than an adaptive, explanation. [My emphasis.]

Of course, one must be careful about what is meant by “non-optimal structure.” The structure of language is not functionally optimal, as garden paths show for parsing structure, and effability considerations (the grammar allows us to say less than we otherwise could) for producing structure. Lightfoot (1995) stresses this aspect of Gould’s view, in the familiar spirit of Chomsky’s work. Then again, the issue arises as to whether the structure of language is non-optimal as well, as the prevailing rhetoric of the 1980s presumed. The view at that time was that finding non-optimal structures is an excellent way of defending the specificity of the linguistic system as a biological exaptation (hence as a natural, independent phenomenon of mind in the strongest sense, with little or no connection to communication processes and all that). However, Chomsky (1986b) already showed that the linguistic practice was far removed from this rhetoric. Thus, the working details of this piece of research showed an example of optimality in syntax, as exemplified by the notion of Movement “as a last resort.”
The piece also assumed categorial “symmetry,” a research program which had by then been developed into X′-theory. And in fact the book made significant use of the research strategy of eliminating redundancy within the model, a trait of Chomskyan linguistics which makes its practice closer to that of the physical sciences, rather than to the exercise of evolutionary biology. In the 1990s, Chomsky explicitly asks “. . . to what extent [standard] principles themselves can be reduced to deeper and natural properties of computation. To what extent, that is, is language ‘perfect’, relying on natural optimality conditions and very simple relations?” If the empirical answer is: “to some extent,” (realistic) biological evolution invoking exaptations (see Note 1) will have nothing to say about it. In class lectures (Spring 1995), Chomsky partly addresses this issue when comparing linguistics to another special science, chemistry, rather than to standard biology. Chemistry faces the question of how unstructured matter assumes organized form. Linguistics deals with how unstructured features are organized into syntagms and paradigms. The set of principles that is responsible for determining linguistic structure is, to some extent, comparable to the set of principles with parallel chemical effects. And although this is admittedly pushing it to a point which Chomsky might only very speculatively ever raise, it is imaginable that there is a deep level at which these sets of principles, which determine certain pockets of regularity, can be related in ways that go beyond the metaphorical. Recent work on molecular biology (where, for instance, García Bellido (1994) literally talks about “phonology and inflection in genetics,” “molecular sentences” or “syntactic analyses”) suggests that the connection might not be totally impossible, or implausible.

Be that as it may, it is good to keep different sorts of economy in perspective. On the one hand, analyses in terms of what we may call static elegance have been very much part of the tradition of linguistics, and helped determine “best theories” or “learnable languages.” But on the other hand, minimalism makes fundamental use of the notion of what we may call dynamic elegance, when deciding among alternative derivations that converge on the basis of their least action. Fukui (1996) argues that this is an instance of the classic problem of calculus of variations, as is found in various areas of mechanics, electromagnetism, quantum physics, and so forth (see Stevens 1995). Given various paths that, say, a ray of light may take from an initial point 0 to a final point f, the path that light actually takes is determined in terms of it being the one involving least action. Likewise, given various derivational paths that may be invoked when going from a point 0 in a derivation to its final point f, the path the derivation takes is determined in terms of its fewest computational steps. This behavior of the linguistic system does not follow from its static elegance (bare output conditions). All competing derivations converge, and in that sense are equally elegant. In fact, there is a global character to dynamic elegance that we do not witness in static elegance. This is a consideration arising in calculus problems, where the underlying issue is the examination of a set of variations and a way of deciding on the adequacy of the solutions they provide.2 While there is a univocal solution to the problem “Is this structure grammatical?” (yes or no), there is no univocal solution to the problem “Is the derivation of this structure grammatical?” Whether this solution is valid depends on whether that solution is better; what is one bad solution for a structural context may be the best we have for an alternative context.3

The questions we face are not new. The substantive properties of water molecules, for instance, tell us little about the macroscopic behavior of a creek, and the whirlpools of turbulence it forms. Examples of this sort abound, involving global properties of systems of some sort. To use a fashionable term, these are emergent patterns that, for some reason, obtain of complex systems, in unclear circumstances. There is much talk these days about systems being comparable at some level, the behavior of some helping in the understanding of the behavior of others (see Meinzer 1994). I am not really sure what this means, though, particularly if we are supposed to be talking (at least in some central instances) about systems whose behavior is predictably unpredictable. More to the point, I have not seen any proposal explaining any linguistic principle in terms of any so-called principles of self-organization. Even Fukui’s interesting proposal that linguistic derivations involve the calculus of variations falls short of establishing what the relevant Lagrangian would be.4 Nonetheless, whether questions pertaining to the dynamic elegance of the system will be expressible in familiar terms is something we cannot determine a priori.

Consider, in that respect, Chomsky’s treatment of the Last Resort Condition (LRC) as a derivational requirement. A derivation that does not meet the LRC when generating a chain is impossible, and it is cancelled at the point of violation, without ever reaching a convergent or even a crashed LF. One may ask why the LRC should hold of derivations, and Chomsky (1995b) speculates that conditions of this sort reduce the range of alternatives to calculate a solution from. The system does not even include in the reference set of derivations that compete for optimality those which are cancelled somewhere along the way, because they failed to meet the LRC. This reduction of possible variations of the system recalls Haken’s (1983) Slaving Principle. Unstable modes in a system influence and determine stable modes, which are thereby eliminated at certain thresholds.5 A new structure emerges which results from unstable modes serving as ordering devices, which partially determine the behavior of a system as a whole. We may thus think of the LRC as ordering a region within the macroscopic behavior of a derivation. That is, at any given stage t in a derivation there is a potentially very large derivational horizon ht to contemplate, if the system proceeds according to standard derivational laws. This derivational horizon is dynamic, in that the system could in principle proceed forward in various ways. But suppose that certain regions in the derivational horizon (those corresponding to domains of morphological checking) are less stable than others. In particular, let it be true that a strong feature is a “virus” that the computational system must eliminate, immunizing the derivation from it as fast as possible.6 There is a sense in which this area within the derivational horizon provides a point of instability (as derivational complexity increases), which according to the Slaving Principle should enslave other competing modes that lead to other horizons. That is,
while derivations that move to “immunize” an intruder feature are attracted to a point of instability, derivations whose movement is “idle” (or existing just to reach an interpretation) are not attracted to a slaving vortex. Thus, the derivational system does not even try derivations that conform to less unstable modes, any more than, given a particular vortex within a water creek, turbulence will have a high probability of emerging (as fluid velocity increases) anywhere other than within the confines determined by that very vortex.7

Frankly, I am offering this speculation not so much as an explanation for the LRC as an example of the sort of approach that we may attempt. It has advantages and disadvantages. The advantages are that invoking this sort of line, we cannot be accused of gradualistic continuism (the LRC holds of derivational systems because, in this way, they are easier to use in speech production and processing, thereby giving us an argument that elegance within language is a result of an adaptation) or of Platonism (the LRC holds as a defining property of derivations because the nature of these processes is purely formal, and that is just how they work, thereby giving us an argument that language is a mathematical object). But the disadvantages should be obvious. First, we run the risk of applying the latest complexity tool to whatever condition we may want to derive; we might even be tempted to bend the linguistic theory beyond recognition, for it to fit with the outside principle. However, it would be surprising if traditional linguistic principles all of a sudden started falling into place within those principles which are beginning to emerge in the emerging science of complexity. More likely, we will find nothing but similarities to use more or less metaphorically – and probably because many things resemble many other things out there. Second, we may be escaping some form of Platonism to fall into yet another form, albeit at a larger scale (“reality is a mathematical object”). Why the LRC should follow from the Slaving Principle is something that should worry us. Fortunately, here we are in the same boat as everyone else within the special sciences, and perhaps we should not worry about this any more or less than biologists should worry about whether the Slaving Principle played any role in the emergence of life, say.

So I suppose the moral is the usual one. We have to keep doing our dirty work, assuming that the end of syntax is nowhere nearer than the end of chemistry or biology is. However, doing work now involves two different, legitimate jobs. One is the usual analytic one, perhaps revamped with new tools. The other is a bit more abstract. Suppose we find phenomenon P, and characterize it in terms of conditions x, y, z. Have we found x, y, z properties of the linguistic system? The sorts of issues raised in this section allow for the possibility that phenomenon P may be an emergent property – like the spiral pattern in a snail shell – which arises dynamically from various systemic interactions. We may even have ways of coherently characterizing the properties of phenomenon P directly, by way of some function using x, y, z as variables – just as we can calculate the snail shell as a given logarithmic function. And yet, minimalism invites the suspicion that phenomenon P may not, in itself, instantiate any axiomatic property of the linguistic system – any more than a logarithmic function per se instantiates any axiomatic property of growth in the biological system.8 And to make life interesting, telling gold from glitter will not be nice or easy.

In what follows, I raise this sort of question with respect to the phenomenon of obviation – which I believe is partly emergent. It involves two structural relations, and one (perhaps more) interpretive relation. Let us call the latter Relation R. As for the former, full referring expressions (names and definite descriptions) involve only command: α must be obviative with respect to β only if β commands α. In contrast, pronominal elements involve command and locality. So this is perhaps a candidate for phenomenon P above, obeying conditions x (R), y (command), and z (locality). These sorts of conditions are articulated into a theoretical whole in various places that I return to, but most notably in Chomsky (1981) and (1986b). I want to propose, instead, that each of the relations involved in obviation, and the corresponding structural correlates that they involve, follow from different properties of the language faculty. If so, seeking a unified theory of obviation is akin to seeking a theory of shell patterning.
2 Command paths and Multiple Spell-out

Let me start by summarizing the proposal in Chapter 3, where it is shown that command paths emerging in the PF and LF components are a result of Multiple Spell-out. This is important in itself: Spell-out being just a rule (not a level), it should be able to apply anywhere, anytime. But the proposal is of relevance to us now because it makes command relations natural emergent properties of the system, thus not something that has to be re-stated at the PF or the LF components, even if playing a role there as well. To begin with, Chapter 3 deduces Kayne’s LCA:9

(1) α precedes β iff: (a) α commands β; or (b) γ commands β and γ dominates α.

Throughout, I assume that command is a direct reflex of merger (M), adapted from Epstein (1999):

(2) α commands β iff α merges with γ, γ reflexively dominating β.10

Then command is the only relation which is derivationally defined (via M) for a set of heads within a derivational block. Compare (3a) and (3b); the parenthesized annotations in (3) mark “derivational blocks” or monotonic applications of M.11 Given any two categories X and Y within a derivational block, either X and Y are merged, or otherwise X is merged with (the projection of) a category which Y has merged with, etc.:

(3) a. {α, {{α, {α, β}}, γ}}
(a single derivational block: the whole object)
b. {α, {{δ, {δ, ε}}, {α, {α, β}}}}
(two derivational blocks: {δ, {δ, ε}} and {α, {α, β}})
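Definition (2) lends itself to a direct procedural rendering: command can be read off the record of M alone. The following sketch (our own toy encoding, with merged objects as bare nested pairs and word-level items as strings; labels are omitted for simplicity) is illustrative only:

```python
def terms(t):
    """All sub-objects of t, including t itself (reflexive domination)."""
    yield t
    if isinstance(t, tuple):
        for side in t:
            yield from terms(side)

def commands(a, b, tree):
    """(2): a commands b iff somewhere in tree, a merges with a sister
    that reflexively dominates b."""
    if isinstance(tree, tuple):
        left, right = tree
        if (a == left and b in terms(right)) or (a == right and b in terms(left)):
            return True
        return commands(a, b, left) or commands(a, b, right)
    return False

cu = (('a', 'b'), 'c')         # merge a with b, then merge the result with c
print(commands('a', 'b', cu))  # True: a merged with b directly
print(commands('a', 'c', cu))  # False: only the unit {a, b} merged with c
```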
Call the object that results from monotonic applications of M a “command unit.”12 Command units emerge within a system with the usual assumptions. To deduce the base step of the LCA in (1a), note that it encodes two formal and one substantive requirement. The latter is that unordered structural relations map to precedence ones. Chomsky (1995b) assumes plausibly that this follows from bare output conditions on PF (contra Kayne 1994). The first formal requirement expressed through (1a) is that the sequence of PF heads should be structured in terms of already existing structural relations; the second formal requirement, that the PF sequence should map specifically to familiar precedence relations.13 Neither requirement follows from the substantive assumption, but they are optimal routes to take. First, command is the only deduced structural relation that holds of heads which are structured within “command units.” If the LCA attempts a mapping from already existing relations among heads, only command could be relevant without any additional cost.14 Second, mapping the command sequence α, β, γ, … to a sequence of PF timing slots 1, 2, 3, … in a bijective fashion is optimal. For a two-dimensional space, mapping the first sequence to the x axis and the second sequence to the y axis, the relevant function is the trivial y = x; alternatives to y = x involve more operational symbols.15 Therefore the base-step of (1) follows from piggy-backing on M’s derivational history, given dynamic economy.

Chapter 3 then proceeds to deduce the induction step in (1b) in a less direct fashion, since there is no trivial way of deducing the fact that domination is involved in this step. The proposal is that (1b) is a result of applying Spell-out multiply, each time according to the base step in (1a). Assuming an LC Theorem (the base in (1a) being deduced), only command units can be linearized by the application of a transformational procedure L at Spell-out. L is an operation L(c) = p, mapping command units c to intermediate PF sequences p (see Note 15), and removing phrasal boundaries from c representations:

(4) {α, {β, {γ, {δ, {ε, {ε, ζ}}}}}} → {α, β, γ, δ, ε, ζ}

The output of L is a labeled word sequence, with no constituent structure, hence, not a phrase-marker. It is still a labeled object, though, and in that sense akin to a frozen expression whose internal structure is inaccessible to the computational system, but can nonetheless be part of a phrase-marker. We can then proceed to merge it further, which will yield a new command unit for L to linearize. This is Multiple Spell-Out, which makes (1b) true, but derived. A complex phrase-marker involving several derivational blocks must be spelled-out prior to M, or the derivation will crash at PF (once (1b) has no axiomatic status). But note, crucially, that partially spelled-out objects must be mapped to LF prior to the phrasal flattening induced by L. Thus generalized merger (of the sort in (3b)) has to produce two objects, a linearized one for PF, and a hierarchically unordered one for LF. This entails a dynamically bifurcated model, of the sort sketched by Bresnan (1971), Jackendoff (1972) and Lasnik (1972, 1976: appendix), and recently revamped by Lebeaux (1991). The PF and LF components are not mapped in a single static step, but through a cascade of derivational events that end up converging (or not) on final representations. This property of the system is central, given that a dynamic bifurcated model must have incremental characteristics (so as to assemble partial structures until final representations are achieved). The ultimate assembling operation has to be substitution, as in Chomsky (1995b: Chapter 3). The spelled-out object serves the purpose of Chomsky’s designated substitution target Ø, for it has a label (allowing substitution of the right sort), but no phrasal structure. Structure is gained only by substituting into the partial p and l representations, already “stored” in the LF/PF components upon previous instances of Spell-out:16

(5) a. Standard merger and Spell-out of first derivational block:
{δ, {δ, ε}} —L→ p1 = {δ, δ, ε} (with l1 = {δ, {δ, ε}} retained for LF)
b. Standard merger and Spell-out of second derivational block:
{α, {{δ, …}, {α, {α, β}}}} —L→ p2 = {α, {δ, …}, α, β} (with l2 = {α, {{δ, …}, {α, {α, β}}}} retained for LF)
c. Generalized merger of first command unit and second command unit, via Substitution S(p1, p2) and S(l1, l2):
Final PF result: {α, {δ, δ, ε}, α, β}
Final LF result: {α, {{δ, {δ, ε}}, {α, {α, β}}}}
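The effect of L in (4) can be pictured as a one-step flattening. In this sketch (same illustrative conventions as before: a command unit written as nested (head, remainder) pairs; the encoding is ours), Spell-out returns the words in their command, hence precedence, order:

```python
def spell_out(c):
    """L(c) = p: strip phrasal boundaries from command unit c, yielding
    its flat, precedence-ordered word sequence (base step of the LCA)."""
    if isinstance(c, str):   # a word-level item
        return [c]
    head, rest = c
    return [head] + spell_out(rest)

p1 = spell_out(('a', ('b', ('c', 'd'))))
print(p1)   # ['a', 'b', 'c', 'd'] -- no constituent structure remains
```

In the fuller system the output keeps its label, so it can be re-merged as a single frozen item; that further merger creates a new command unit for L to linearize, which is all that Multiple Spell-Out amounts to.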
Note that PF processes must happen within intermediate p representations, or they will not be constructive processes. For instance, determination of a focus projection (as in Cinque 1993), or comparable phonological processes of the sort discussed in Bresnan (1971), must take place prior to the integration of p1, p2, p3, … pn sequences into a final phonetic representation. Afterwards the representation will be in the interpretive A/P components. We can in principle talk about representational units within this phonetic object, but these have to be justified on the basis of substantive bare (A/P) properties. It is unclear what such property a focus projection, say, could follow from. In contrast, a focus projection path is a natural object to expect in partial representations p. In fact, Cinque’s characterization is essentially in terms of command (right) branches, as the dynamic bifurcated model correctly predicts.17

Comparable issues arise for the LF branch. Just as PF processes must happen within intermediate p representations, LF processes must happen within intermediate l representations – or they will not be constructive processes. Then if only partial representations l can be the locus of LF constructive (i.e. formal) processes, it follows that only command units could be the locus of LF universals.18 Again, we could in principle talk about representational units within the final semantic representation, but these would need substantive (I/C) justification. In sum, command is indirectly relevant for LF (or PF) units because these are dynamically mapped from partial representations where only command holds among word-level units.19 If this proposal is correct, it does not just happen to be the case that LF dependencies are sensitive to command (as opposed to other imaginable relations). Matters could not have been otherwise. Relation R, the substantive relation involved in obviation, instantiates a sub-case of this general property of LF. Of course, we could define other relations, but whatever step we take in this direction has serious consequences, given these (rather crucial) assumptions:

(6) Assumption about the Inclusiveness of the LF Component
All LF representations are the expression of lexical relations.

(7) Assumption about the Uniformity of the LF Component
All LF processes are observable in some form prior to Spell-out.

Given (6) and (7), simply stipulating a specifically covert relation like “is-in-the-same-phrase-marker-as” will be non-inclusive, since this hypothetical relation would be determining representations which are not the expression of lexical features. Suppose, then, that we want to deduce the relation in question from some constructive procedure. Saying that it is a procedure specific to the LF component would be non-uniform, since the procedure in question would not be observable in the overt component. This means the only possibility left is that “is-in-the-same-phrase-marker-as” follows from the pre-Spell-out procedures. If the architecture I have discussed is anywhere near right, such procedures never yield a global relation of the sort we are considering; rather, they are limited to command relations (“is-in-the-same-derivational-block-as” dependencies). I hasten to add that the semantic component, after LF, may in fact be able to invoke such global relations as “is-in-the-same-phrase-marker-as.” But, ultimately, that is a matter of performance: whatever goes on after the levels of LF and PF which interface with the sensory-motor and intentional-conceptual components. Given the present architecture, if we find that phenomenon P is sensitive to a structural relation that cannot be deduced from syntactic procedures (in the narrow sense invoked here), this is an immediate indication that P is not narrowly syntactic. That possibly is the case with Weak Cross-over, of the sort illustrated in (8):

(8) a. His son likes everyone.
b. Who does his son like?
c. His son likes a man.
There is a characteristic interpretive limitation in sentences of this sort: the pronoun cannot co-vary with everyone, who or a man (in an indefinite reading). Crucially, though, there is no command relation between either of these elements and the pronoun, and hence by hypothesis whatever interpretive relation is relevant in these instances cannot be narrowly syntactic (i.e. takes place after LF).20 Conversely, that P is best characterized by a relation which can be deduced on syntactic grounds makes it possibly, but not necessarily, syntactic. For instance, in light of (9), we may assert (10) (see May 1977):

(9) a. John is believed by his mother to be a genius.
b. His mother believes that John is a genius.
c. Everyone is believed by someone to be a genius.
d. Someone believes that everyone is a genius.

(10) If quantifier α commands quantifier β at LF, then α has scope over β.

Now, even if (10) indeed obtains, it may be possible for it to result from post-LF conditions.21 This is all to say that the fact that command defines obviation does not in itself make obviation an immediate part of syntax. I will argue that at least local obviation is syntactic, but the point cannot be conceded a priori.
3 Locality matters Consider next locality. The first obvious question is why given elements should care about how far away other elements are. For relations involving movement, we have an answer, the Minimal Link Condition. This poses the same sorts of theoretical questions that the Last Resort Condition did, in terms of whether its violation leads to a derivational cancellation or rather a crash, and either way, why locality should be part of the phenomenon of grammar. It is worth stressing that, when we say locality in minimalism, we mean something quite specific: (11) …
(11)       MinD(X)             MinD(Y)
     [γ … [α … [β … [δ … [ε …
Given a command unit including γ, α, β, …, δ, ε, …, and where MinD(X) = {α, β, …} and MinD(Y) = {δ, ε, …}, γ is closer to the elements in MinD(X) than the elements in MinD(Y) are, but (i) γ is not closer to any of the elements in MinD(X) than to any other such element in MinD(X) and (ii) none of the elements in MinD(Y) is closer to γ than any other element in MinD(Y). Distance is relevant only within command units, which we expect given the present architecture. But locality voids distance: elements within the same minimal domain are equally far as targets or equally close as sources of movement to or from an element that is trying to move or be moved. Assuming the state of affairs in (11),22 I am concerned with the fact that elements standing in a "super-local" relation of lexical (L) relatedness do not
interfere with one another for distance purposes. Curiously, various phenomena seem to be sensitive to the notion of L-relatedness, as in (12):

(12) α and β are L-related iff a projection of α is in the minimal domain of β.

(13) a. Absence of distance (as in (11)).
     b. Head movement.
     c. Word formation (as in Hale and Keyser 1993).
     d. θ-role assignment (as in Chomsky 1995b).
     e. A-movement (as in Chomsky 1995b).
     f. Distribution of Case features (see below).
     g. Local anaphoric binding (see below).
     h. Local obviation (see below).
(13b) obeys the head movement constraint.23 (13c) (words cannot be derivationally formed without L-relatedness of all their constituents to a pivotal head) adds to (13b) a limit on successive head movement: it only happens twice, still creating a word. (13d) is true for θ-role assignment (verbs are L-related to their internal argument(s), and to the specifier of the v-projection, after the verb raises to the v shell). As for (13e), consider (14) below.
(14) a. [TP S [T' [T [v V v] T] [vP O [v' tS [v' tv [VP tV tO]]]]]]

     b. [TP O [T' [T [v V v] T] [vP S [v' tS [v' tv [VP tV tO]]]]]]
While (14a) involves A-movements which are never across domains of L-relatedness (O moves within the domain of V's L-relatedness, S within the domain of v's L-relatedness), the movement of O in (14b) is not within the domain of L-relatedness of any category, and is impossible. This should bear on the locality implied in the distribution of Case features (13f), although there is the issue altogether of what it means to have uninterpretable Case features that moved determiner phrases must check (see Section 5). In turn, (13g) can be illustrated as in (15) (to ensure local binding, I use the Spanish clitic se):

(15) a. Él se destruyó.
        he self destroyed
     b. *Él se destruyó fotos.
        he self destroyed photos
        ("He destroyed photos of himself.")

Finally, local obviation (13h) arises in the same contexts:

(16) a. He destroyed him.
     b. He destroyed photos of him.
Finally, local obviation (13h) arises in the same contexts: (16) a. He destroyed him. b. He destroyed photos of him. Whereas (16a) is obviative (the reference of he and him differs), (16b) is not. To the extent that all of the phenomena in (13) involve L-relatedness, we may think of this notion as determining a dimension of some sort, within which certain structural properties are conserved.24 There are plausible candidates for conservation in most of these examples. (13a) conserves locality itself; (13b), the symmetry of head relations; (13c), lexical integrity; (13d), lexical dependencies; (13e), (A-)chain integrity. The last three, though, are less obvious. (13g) might be a sub-case of (13b) (as in the literature partly summarized in Cole and Sung 1994) or a sub-case, essentially, of (13d) (as in Reinhart and Reuland 1993, though see Section 7). Why (13f) should be conserving Case relations is hard to understand, if we do not know what Case is. And as for (13h), we are essentially clueless. In what follows, I shall try to relate the mystery in (13h) to that in (13f), by way of a detour. 157
4 Clitic combinations

The Spanish paradigm in (17) is idealized from Perlmutter (1971) (see more generally Wanner 1987). (The basic sentence is always he showed y to z (in the mirror), and judgments are my own.)
(17) Possible argument clitic combinations:25

        Dat               Acc                Dat               Acc
a.   *me I.sg        me/nos I.sg/pl     *te II.sg        me/nos I.sg/pl
     *nos I.pl       me/nos I.sg/pl     *os II.pl        me/nos I.sg/pl
b.   *me I.sg        te/os II.sg/pl     *te II.sg        te/os II.sg/pl
     *nos I.pl       te/os II.sg/pl     *os II.pl        te/os II.sg/pl
c.   *le(s) III(pl)  lo(s) III(pl)
d.   me I.sg         lo(s) III(pl)      te II.sg         lo(s) III(pl)
     nos I.pl        lo(s) III(pl)      os II.pl         lo(s) III(pl)
e.   *le(s) III(pl)  me I.sg            *le(s) III(pl)   te II.sg
     *le(s) III(pl)  nos I.pl           *le(s) III(pl)   os II.pl

        Acc               Dat                Acc               Dat
a.   me/nos I.sg/pl  *me I.sg           me/nos I.sg/pl   *te II.sg
     me/nos I.sg/pl  *nos I.pl          me/nos I.sg/pl   *os II.pl
b.   te/os II.sg/pl  *me I.sg           te/os II.sg/pl   *te II.sg
     te/os II.sg/pl  *nos I.pl          te/os II.sg/pl   *os II.pl
c.   lo(s) III(pl)   *le(s) III(pl)
d.   *lo(s) III(pl)  me I.sg            *lo(s) III(pl)   te II.sg
     *lo(s) III(pl)  nos I.pl           *lo(s) III(pl)   os II.pl
e.   *me I.sg        le(s) III(pl)      *te II.sg        le(s) III(pl)
     *nos I.pl       le(s) III(pl)      *os II.pl        le(s) III(pl)
The first surprising fact about this paradigm is that so few clitic combinations are possible. Some ungrammatical combinations may follow from the distinction argued for in Uriagereka (1995b) between strong [+s] and weak [−s] special clitics – I/II clitics being [+s], and III clitics, definite determiners in nature, being [−s]. Assuming that analysis, combinations of the form [−s],[+s] are impossible in a language like Spanish as a result of syntactic restrictions on movement and clitic placement, irrelevant now. That eliminates (17e). In turn, Uriagereka (1988a: 369 and ff.) provided an account of why in Spanish the clitic ordering Accusative, Dative is not possible (along the lines of why *I gave the book him is out in English). Supposing this too, the paradigm would reduce to (18) below. And surely there are ways in which to reduce some of the examples in (18) to violations of Chomsky's Binding Condition B. However, the reduced paradigm in (18) shows that this cannot be the whole story:
(18)
        Dat               Acc                Dat               Acc
a.   *me I.sg        me/nos I.sg/pl     *te II.sg        me/nos I.sg/pl
     *nos I.pl       me/nos I.sg/pl     *os II.pl        me/nos I.sg/pl
b.   *me I.sg        te/os II.sg/pl     *te II.sg        te/os II.sg/pl
     *nos I.pl       te/os II.sg/pl     *os II.pl        te/os II.sg/pl
c.   *le(s) III(pl)  lo(s) III(pl)
d.   me I.sg         lo(s) III(pl)      te II.sg         lo(s) III(pl)
     nos I.pl        lo(s) III(pl)      os II.pl         lo(s) III(pl)
In the shaded examples – those combining a I clitic with a II clitic – the reference of the two clitics is necessarily different, given their lexical interpretation. So while a binding theoretic account exists for the rest of the paradigm, it clearly does not generalize. To try a different take on these problems, consider restrictions as in (19):

(19) a. the lions    / *the lionses
     b. the yankees  / *the yankeeses
     c. the scissors / *the scissorses
     d. the bus      / the buses
Why can lionses not denote a plurality of pluralities of lions? Or yankeeses denote the plurality of teams which carry the name and/or description (the) yankees? Or why is scissorses not the way to denote two or more simple objects? Apparently morphology is to blame. When a plurality marker appears in lions, yankees and scissors (unlike in bus), a second one is not tolerated. This restriction is known to hold only for inflections, not for derivational affixes:

(20) a. re-attach, re-reattach, re-rereattach . . .
     b. sub-optimal, sub-suboptimal, sub-subsuboptimal . . .
     c. transformation, transformationalize, transformationalization . . .

It is at first not clear what minimalism can say about (19). Certainly, an inflectional feature must be checked (and erased if uninterpretable) in the course of the derivation, unlike a derivational feature. Assuming this, though, consider the Spanish (21c) below, which is analogous to (19a), but more conspicuous. First, note that in (21a) one of the two plural features must be uninterpretable. The data in (21c), which do not involve plurality markers on the nominals, suggest that the interpretable feature is the one in the determiner.26 Hence we assume that la "the" has the interpretable feature against which the plurality feature of leonas "lions" is checked:
(21) a. [[[la] s] [[leona] s]]
        the [pl] lion [pl]
     b. *[[[la] s] [[[leona] s] as]]
        the [pl] lion [pl] [pl]
     c. la Ridruejo, Sartorius, Habsburg . . .
        "The female who is a member of the Ridruejo, Sartorius and Habsburg families."
     d. las Ridruejo, Sartorius, Habsburg . . .
        "The females who are all members of the Ridruejo, Sartorius and Habsburg families" or "the females who are each a Ridruejo, a Sartorius and a Habsburg."

The question is why the checking in (21b) cannot proceed. Note that the problem is not the fact that we have two uninterpretable features to check against the, since the's feature is interpretable, and hence does not erase upon checking an uninterpretable feature that moves to its checking domain. Nor is the problem that the most deeply embedded [pl] feature would not be in the checking domain of the interpretable feature, particularly if at LF each uninterpretable feature can move (together in a feature bundle or separately) to the relevant target. Likewise, it does not seem that one of the features is a closer source for attraction than the other one is: both are within the same local domain. To avoid this impasse, consider the definition of Minimal Domain:

(22) Definition of Minimal Domain
     For α a feature-matrix or a head #X#, CH a chain (α, t) or (the trivial chain) α:
     (i) MAX(α) is the smallest maximal projection dominating α.
     (ii) The domain D(CH) of CH is the set of features dominated by MAX(α) that are distinct from and do not contain α or t.
     (iii) The minimal domain MIN(D(CH)) of CH is the smallest subset K of D(CH) such that for any x belonging to D(CH), some y belonging to K dominates x.

There is a crucial modification in (22) to the standard notion of Minimal Domain, which I have italicized. We must grant this extension in the current version of minimalism because, otherwise, matrices of formal features which move by themselves at LF will never be part of a minimal domain. The extension is trivial. Checking is about features, not about categories. Checking domains should be about features and not about categories.27 However, now consider the following fundamental question. How does the grammar know one feature from another? Two identical features should be taken to be only one feature once they are lumped into a set – and a minimal domain is nothing but a set. This is to say that if the two [pl] features in (21b) reach the same checking domain (which is a sub-set of a minimal domain), they will be identified as only
one feature in that set, and hence only one checking will be possible. This leads to a direct crash, perhaps even to a derivational cancellation.28 A similar analysis can be given to the bad examples in (18), if we are able to show that the crucial feature that motivates clitic movement is identical in two different clitics which, following Chomsky (1995b), seek checking within the checking domain set of the same hosting head. For all of the (a) and (b) examples in (18) we can say that the relevant feature is [+s], whose substantive content is "pertaining-to-the-pragmatic-axis" (that is, I and II). For the (c) examples the relevant feature is [−s], whose substantive content is that of a definite article. Of course, the clitics do differ in phonological shape, but this is irrelevant. PF features never enter into syntactic computations. It may also be suggested that even if all III clitics have the same substantive character, they differ in the uninterpretable feature of Case. Below, I propose that it is because arguments in the same checking domain are in fact non-distinct that they must be marked with Case.29 Finally, why are I and II not distinguished? The grammar distinguishes the fact that these are speech-oriented clitics, which I code through feature [+s]. What I do not think to be the case is that the grammar codes differences between I and II, even if pragmatics does.30 If these ideas are on the right track, then not surprisingly, when a combination of [+s] and [−s] features is at issue, the result is perfect, as in (18d). Each feature is indeed identified as distinct in the checking domain. One may be tempted to favor a semantic analysis, but the examples in (23) – corresponding to the bad sequences in (18a)/(18b) – do not make it advisable.31
Te mostró a mí/nosotros. you showed to me/us “He showed you to me/us.” Os mostró a mí/nosotros. you.pl showed to me/us “He showed you guys to me/us.” Te mostró a ti/vosotros. you showed to you/you.pl “He showed you to you/you guys.” Os mostró a ti/vosotros. you.pl showed to you/you.pl “He showed you guys to you/ you guys.”
The fact that some (as opposed to none) of these combinations arise with full pronouns indicates that there is no semantic problem with (18). But the fact that all of the examples are grammatical is even more interesting. It suggests that a Condition B treatment of some of the ungrammatical examples in (18) is on the wrong track. This is welcome, because otherwise we would be ruling out some of those examples twice, through Condition B, and through a failure in checking. 161
One other example is significant. Compare (18c), now repeated, to (24b):

(24) a. *Le lo mostró.
        him him showed
     b. Se lo mostró.
        se him showed
        "He showed him to him."
This example usually gets what I believe to be a misleading analysis. It is glossed as (24a), claiming that se in (24b) is just a phonological variant of le. But why le should get a phonological variant in just this instance is never discussed. Likewise, the surprising fact in (25d) usually goes unnoticed: (25) a. Aquí se encarcela hasta a uno mismo. here se(impersonal) jail even to one same “Here, one even jails oneself.” b. *Aquí se se encarcela. *here se(impersonal) se(reflexive) jail c. Aquí se envía dinero a los familiares. here se send money to the relatives “Here, one sends money to one’s relatives.” d. *Aquí se se lo envía. *here se(impersonal) se(“DATIVE”) it send Bouchard (1984) discussed paradigms in which incompatibilities of the sort in (25b) arise. Impersonal se cannot co-occur with reflexive se. This has led various authors to treat all instances of se as being the same se (see Raposo and Uriagereka 1996 for discussion and references). Let us suppose this is correct. Whatever indirect object se is, it cannot co-occur with impersonal se (25d). Thus it looks as if this “dative” se is se after all.32 If so, we immediately see why (24b) is grammatical, assuming (see Section 6) that se, unlike le, does not have a feature [s], thus being compatible with the [s] lo.33 We shall discuss in Section 8 how (24b) comes to mean something similar to what (24a) would mean. (24b) is as good a structure as the grammar of Spanish generates when trying to express the meaning implied in (24a). It is plausible that other Romance languages avoid the sequence le lo in other ways, thus for instance creating merged clitics like the Galician-Portuguese llo/lho a single clitic coding the meaning of lle and (l)o, or the Italian glielo, which collapses le and lo. Why the language would need to do this is what matters for our purposes. Two clitics with the same [–s] feature hosted in the same head lead to an LF crash.
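The set-theoretic point at the heart of this section – that a checking domain, being a set, identifies two tokens of the same feature – can be made concrete with a toy sketch (an illustration under my own informal encoding of features as strings, not a claim about actual grammatical computation):

    # Toy illustration: a checking domain treated as a bare set of features.
    # Two tokens of [pl], as in (21b), collapse into one set member, so only
    # one checking relation is available and the second [pl] goes unchecked.
    moved_features = ["pl", "pl"]          # two uninterpretable plural tokens
    checking_domain = set(moved_features)  # set formation identifies them
    print(checking_domain)                 # {'pl'}: one member, one checking
    print(len(checking_domain) < len(moved_features))  # True -> LF crash

    # By contrast, the distinct clitic features of (18d) survive as two members:
    print(set(["+s", "-s"]))               # {'+s', '-s'}: two checkings possible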
5 Uninterpretable features

Recall now the English paradigm in (26)–(30) (where ! marks obviation):

(26) a. *I like me       b. … you     c. … him     d. … us     e. … them
(27) a. you like me      b. *… you    c. … him     d. … us     e. … them
(28) a. he likes me      b. … you     c. !… him    d. … us     e. … them
(29) a. we like me       b. … you     c. … him     d. *… us    e. … them
(30) a. they like me     b. … you     c. … him     d. … us     e. !… them
If the approach in the previous section is correct for (18), can we get it to handle these familiar facts?34 In Chomsky's (1995b) system, the Case features of a direct object move to the domain of T at LF, where they are checked against the sublabel V of T (which is created upon the independent movement of V's features to T).35 The point is: at some stage in the derivation the Case features of the subject are in the checking domain of T (its specifier), and later on the Case features of the object are in the checking domain of T (adjoined to it).36 How do we keep track of those features, if they are identical and included in a set? We may argue that the grammar does not get "fooled" because the object Case features are [+accusative], while the subject Case features are [−accusative]. Yet, there is something peculiar in this reasoning. Why are there different Case features for elements within the same domain? Because we need to distinguish subject Case checking from object Case checking. And why do we need to distinguish those? Because both subjects and objects have Case features! In a nutshell, we are positing a distinct Case feature with the sole purpose that an object or a subject moves to a configuration that gets rid of it.

The reason that Chomsky (1995b) considers Case an uninterpretable feature is empirical. Argumental movement by LF itself would also be forced even if the Case features were interpretable, so long as the elements that host the Case feature at LF have themselves an uninterpretable feature. However, consider (31):

(31) *The man seems that t is smart.

The man moves to check the matrix D feature in T, after having checked and erased Case in the embedded clause. The only thing that goes wrong in (31) is that the matrix "Case hosting" feature is not checked, because the regular Case feature in the man has been erased downstairs; but that holds only if Case is uninterpretable, for otherwise Case could be checked several times. However, while these mechanics work, they are unsatisfactory. The real question was and still is why there should be uninterpretable Case features in the grammar. Note that the existence of uninterpretable features per se is not particularly troubling. Take, for instance, uninterpretable D features of various sorts. They code a dependency between an interpretable D feature in a determiner or tense and some associate. Why there should be such long-distance
dependencies is an issue, but once this is assumed, it is perhaps even natural that we should code them through this particular device. What is harder to see is what is at issue in Case checking, since both the source and the target of the movement involve uninterpretable features. Is this an irreducible quirk?

Consider Case features in some detail. Unlike intrinsic features (like D or wh-), Case features are relational. A D feature, for instance, has a value which needs no further validation and allows the D feature to appear in certain checking domains. Case is different. If a Case feature has value "accusative," that value needs to be checked, specifically, against a "Case hosting" feature "I-check-accusative." In fact, Chomsky argues that checking an "accusative" feature with a feature "I-check-nominative" leads to a feature mismatch, and an immediate derivational cancellation. But the very notion of "feature mismatch" presupposes not just feature checking, but furthermore feature matching. Why can we not say just that, in the same way that some feature in T attracts, specifically, a D feature (and not a wh- one, say), some other feature in T attracts, specifically, an accusative feature (and not a nominative one)? We cannot, because that denies the phenomenon of Case altogether. Accusative or nominative are not features, but values of a particular sort of feature, even if these values are relationally determined, through matching. It must be emphasized that matching is a grammatical process, unlike checking, which is just a non-unified phenomenon. Under certain checking configurations (given domains as in (22)), features may stand in checking relations. When they do, certain things may happen; for instance, erasure ensues if a given feature is uninterpretable. Matching, in contrast, is a derivational process that sanctions certain values of features; if this process fails, the derivation is canceled. No derivation is canceled for a failure in checking; there are no such failures – what fails is the resulting representation. It is thus important to understand what happens when matching succeeds:

(32) Checking Convention
     When a relational feature [R-F] is attracted to match a feature [F-R], the FF-bag containing the attracted feature is R-marked.

A simple look at an actual morphological paradigm shows that (32) is accurate:

(33) Spanish pronominal clitics:

            Acc        Dat
  sg   I    me         me
       II   te         te
       III  lo/la      le
  pl   I    nos        nos
       II   os         os
       III  los/las    les
     Their features and their morphology:

            Acc                  Dat
  sg   I    m-e                  m-e
       II   t-e                  t-e
       III  l-o (ms) / l-a (fm)  l-e
  pl   I    n-o-s                n-o-s
       II   o-s                  o-s
       III  l-o-s / l-a-s        l-e-s
The key here is this: there is no unified morphological realization for an accusative or dative feature. For I/II-sg, it is e. For I/II-pl, it is o. For III-sg/pl, it is o/a for the accusative and e for the dative. Paradigms of this sort are quite
common, and tell us something simple and well known. We need complex features. We must allow for features like [I, sg, acc] or [III, sg, ms, acc].37 Or differently put: the value accusative or dative is a value for a whole bag of formal features, as the Checking Convention predicts. But even if (33) indicates that things are this way, why should they be so? I suggest that the Checking Convention allows featural distinction, for FF-bags which typically end up in the same set. That is, assume that all person, number, gender features are interpretable, in accordance with Chomsky's (1995b) analysis. Now imagine a situation (within a given minimal domain) where the features in each argument have the same values. This should be very common, once we set aside Case. Most statements about the world involve III subjects and III objects. Then it is natural for grammars to have a purely formal device to mark FF-bags for distinctness. At that point, no issue ever arises, regardless of specific values.
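To see how the Checking Convention achieves this, consider a schematic rendering – a toy sketch under informal assumptions (FF-bags as small dictionaries), not the formalism of the text: two III singular arguments with identical interpretable features would collapse in a checking-domain-set unless Case-marking tags each bag.

    # Toy sketch of (32): checking a relational Case feature R-marks the
    # whole FF-bag, so two phi-identical bags stay distinct in one set.
    def check_case(ff_bag, case_value):
        marked = dict(ff_bag)
        marked["case-mark"] = case_value  # R-marking of the containing bag
        return frozenset(marked.items())  # hashable, so it can live in a set

    subject = {"person": "III", "number": "sg"}
    direct_object = {"person": "III", "number": "sg"}   # identical features

    unmarked = {frozenset(subject.items()), frozenset(direct_object.items())}
    print(len(unmarked))   # 1 -- without Case, the bags collapse

    marked = {check_case(subject, "nominative"),
              check_case(direct_object, "accusative")}
    print(len(marked))     # 2 -- Case-marking keeps them distinct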
6 Obviation revisited

The question is then what effect this grammatical fact has on speaker interpretation, a part of performance within minimalism. We have a feature "I-check-accusative" associated to v, and it has the effect of erasing the Case feature in him, and correspondingly "accusative"-marking the FF-bag that contained the erased Case feature. I suggest that we bite the bullet and relate the mystery of Case to the mystery of local obviation, in the following way: "accusative"-marked FF-bags are disjoint from "nominative"-marked FF-bags.38 Chomsky (1995b) assumes, essentially, that FF-bags carry referential features (see Chapter 10), and in that respect it is not surprising that bags marked for distinctness should be interpreted differently as well. I assume with Postal (1966) that pronouns are hidden definite descriptions; him has the import of the one. With Higginbotham (1988), I take definite descriptions to invoke a context variable whose range is confined by the speaker. Then the one is roughly, though more accurately, the one I have in mind. In sum, (34a) has the informal logical form in (34b):

(34) a. He likes him.
     b. [the x:one(x) & X(x)] [the y:one(y) & Y(y)] x likes y

And the key factor is this: can context variables X and Y have the same value? If not, the two unique ones invoked will come out distinct: one is a unique one salient in context X, the other one a unique one salient at context Y. On the other hand, if X = Y, then the two ones will be the same, all other things being equal. My specific claim, then, can be technically expressed as follows:

(35) Transparency Condition
     In the absence of a more specific indication to proceed otherwise, where FF-bags α and β are grammatically distinct, the speaker confines the range of α's context variable differently from the range of β's context variable.
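A rough procedural gloss on (35) – nothing more than a toy default-and-override sketch, with invented names and encodings – may clarify the intended division of labor: distinct grammatical marks receive fresh context ranges unless something more specific forces identity.

    # Toy gloss on (35): each grammatically distinct FF-bag gets a fresh
    # context range by default; an explicit indication (a selv-type
    # particle, say) can override the default and force sameness.
    def confine_ranges(bags, same_as=None):
        same_as = same_as or {}
        ranges = {}
        for i, bag in enumerate(bags):
            ranges[bag] = ranges.get(same_as.get(bag), f"X{i}")
        return ranges

    print(confine_ranges(["he[nom]", "him[acc]"]))
    # {'he[nom]': 'X0', 'him[acc]': 'X1'} -> distinct contexts: obviation
    print(confine_ranges(["he[nom]", "him+selv[acc]"],
                         same_as={"him+selv[acc]": "he[nom]"}))
    # {'he[nom]': 'X0', 'him+selv[acc]': 'X0'} -> shared context: anaphora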
I do not purport to be claiming that (35) follows from anything, only that it is more natural than traditional obviation conditions, starting with Lasnik's (1976) Disjoint Reference Rule. Why should pronouns have to be locally obviative? Within minimalism, that is a fair question to ask. (35) answers it thus: because they are grammatically distinct, the most transparent mapping to their semantics is also in terms of semantic distinctness. I emphasize also that, from this point of view, the phenomenon of obviation is taken to be speaker-dependent, thus invoking a mode of presentation from a perspective. This, though, is not crucial to my proposal, which is more modest than understanding the detailed implications of the obvious mapping in (34). I simply want to know why local obviation is there, to start with. If I am right, it is the semantic price of grammatical distinctness through Case.

Consider next some potential problems that the rest of the examples in (26)–(30) may raise. First, Case marking also distinguishes you and he in he likes you, or you and I in I like you, raising the question of why this should be so if these are already distinct pronouns. (36) poses a related question:

(36) a. He has arrived.
     b. *He to arrive was a mistake.

While he does not have to be different from any other argument here, to think that it should not involve Case – when it does otherwise – is to force the grammar into two sorts of paradigms for arguments, Case-marked and Case-less ones. This is more costly than having a single paradigm where all noun-phrases are Case marked. We do predict, however, that when a single argument is at stake (unaccusative constructions), grammars should not ascribe much significance to what form of Case is employed. This is what we find. Where English employs a subject Case in (36a), Basque employs an object Case in this same situation. Mixed paradigms also abound.39 Similarly, we may say that the you/he and you/I combinations, and others, are still Case marked because that is the simplest decision for the grammar to take, and it has no semantic consequences.40

The sequences I,I, II,II, or III,III in (26)–(30) now follow. The Case mark will force either ungrammatical or uninterpretable results for the first two sequences (depending on whether checking proceeds), given the fact that II cannot be disjoint from II or I from I. As for III, the feature will yield grammatical and interpretable results, albeit obviative ones. Why local domains where Case is checked should correlate with the domains where local obviation obtains is also entirely expected. They are aspects of the same phenomenon. This means, first of all, that (37), and similar examples, should face the same sorts of problems that the paradigm we have just seen does:
(37) a. !A friend of mine likes a friend of mine.
     b. *The best friend of mine likes the best friend of mine.
     c. !John likes John.
     d. !John likes him.
     e. !He likes John.
Certainly, all of these expressions are obviative, and (37b) is either ungrammatical or uninterpretable, assuming we are forcing two expressions singling out the same unique individual not to co-refer. As for why the last three examples should be obviative, the matter is trivial if only FF-features are relevant in the syntactic computation, as assumed by Chomsky.41 Now we must also concern ourselves with why the Case marker could not save all the examples in (18) – with the exceptions of combinations of the same personal feature, uninterpretable under disjointness. I deal with this below.
7 Two types of anaphora

But first, we must deal with coreference. Given my analysis, coreference should be a marked instance for arguments within the same local domain. Typical local anaphoric instances support this intuition, given Pica's (1987) insight that they are bimorphemic. A statement of self V-ing is generally possible only if the second argument carries an added morpheme like self or same (see Safir 1992). Then we may say that the syntax distinguishes each argument in the way we saw above, by Case-marking. In turn, the semantics is now dealing with two formal facts with an apparently contradictory interpretive import. Case differentiation normally entails difference, while the anaphoric element is instantiating some sort of statement about sameness. This need not be a problem, though we must separate two instances. Consider the expression of anaphoricity by way of a regular pronoun followed by a morpheme expressing contextual sameness, as in the Danish (38):42

(38) a. Peter fortalte Anne om hende selv / *ham selv.
        Peter told Anne about her self / him self
     b. [the x:Peter(x) & X(x)] [the y:Anne(y) & Y(y)] [the z:one(z) & same-as-before(z)] x told y about z

I take names not to pose any particular problems as definite descriptions, once they involve a context variable,43 so Peter is akin to the Peter I have in mind (see Chapter 12). In turn, I take selv to determine sameness of contextual confinement. When attached to hende (the one), selv makes the context variable predicated of the variable of the definite description be the same context variable as a previously introduced one. This is not just any variable, though, but the closest one.44 As a result, (38a) is ungrammatical when the pronoun ham (which normally would hook up to Peter) is introduced. This is because the context variable which is closest to ham is that of Anne, and confining the range of ham in terms of Anne leads to an absurdity (assuming she is not a transvestite). Intuitively, the Transparency Condition would, by default, make all x, y and z in (38) be different variables. However, there is in this sentence a more specific, explicit indication to the effect that y and z have the same value, since they pick out a salient, unique individual in two identical contexts.

While the puzzle above is solved by declaring the Transparency Condition a default interpretive strategy (and treating pronominal anaphors as logophors, in
the spirit of Reinhart and Reuland 1993), instances of local anaphors involve a different process. Compare the Danish (39a) to the Galician-Portuguese (39b): (39) a. Peter fortalte sig selv om Anne. Peter told SIG same about Anne b. Pedro dixose algunha cousa a si propio sobre de Ana. Pedro told-SE something to SI same about of Ana “P. told himself (something) about A.” I take Lebeaux’s (1983) insight, particularly as pursued by Chomsky (1986b), to simply be that (39b) is pretty much the way (39a)’s LF should look like. If so, some feature from within the Danish sig selv must raise at LF to wherever it is that se is in Galician-Portuguese.45 But what is the relation between se and the double si (and therefore between whatever moves covertly in Danish and sig)? The Galician-Portuguese propio, literally “own,” gives us a clue, if we take it to be invoking “possession” as in Szabolcsi (1983), Kayne (1994), or more specifically as extended in Chapter 9 – see (40a). I propose in this light (and much in the spirit of Pica 1987) that the (se, si) relation invokes something like “x’s own persona” (40b), where pro is the null instantiation of a personal classifier (see Chapters 10 and 12): (40) a. [ John bein(have) [XP t [ t [AgrP a head Agr [SC t t ]]]]] b. [ … CLITIC … [XP DOUBLE [ t [AgrP pro Agr [SC t t ]]]]] (40b) is intended to cover all instances of clitic doubling (modifying an analysis presented in Uriagereka 1995b), and not just anaphoric clitic doubling. This is desirable on two other grounds. First, it unifies the paradigm in (41): (41) a. Le levanté a él la mano. him raised.1sg to him the hand “I raised his hand.” [ … le … [XP a él [ t [AgrP [ la mano Agr [SC t t ]]]]]] b. Lo levanté a él mismo him raised.1sg to him same “I raised him (himself).” [ … lo … [XP a él [ t [AgrP [ mismo [pro]] Agr [SC t t ]]]]] c. Se levantó a sí mismo se raised.1sg to si same “He raised himself.” [ … se … [XP a si [ t [AgrP [ mismo [pro]] Agr [SC t t ]]]]]
168
FORMAL AND SUBSTANTIVE ELEGANCE
(41a) is an inalienable possessive relation. The possessor clitic raises, leaving behind the small-clause predicate la mano “the hand.” Identically, in (41b), a normal transitive construction involving clitic doubling, the “possessor/whole” clitic raises, leaving behind the small-clause predicate pro. I take pro to be an individual classifier, as in Muromatsu (1995), for East Asian languages. We can discern the presence of pro by optionally adding the adverbial mismo “same,” as argued in Torrego (1996). Finally, (41c) is almost identical to (41b), except that the head of XP is the anaphoric clitic se.46 The second reason the proposal above is desirable is that it explicitly makes us treat (e.g. Romance) special clitics rather differently from (e.g. Germanic) regular pronouns (cf. (18) vs. (26)–(30)). Following Chomsky (1995b), I take regular pronouns to be both heads and maximal projections; if so, they will not involve the elaborate structures in (41). I think that precisely those structures account for why so few clitic combinations are possible in (18), and perhaps also why special clitics must be overtly displaced. Because clitics are non-trivial, functional heads as in (41), they are morphologically weak – in fact, rather more so than the idealized picture in (33) implies. Thus me, te, nos, os can be accusative and dative in all dialects; le(s) can be third person or formal second person in all dialects, and in Latin-American dialects any second person (particularly, when plural). In most Peninsular dialects there are no distinctions between accusatives or datives (they can all be of the le or of the la type); in most informal registers, le can be singular or plural. In Andean dialects lo can be just any third person (number and gender irrelevant), and in various sub-standard dialects los is used for first person plural. Regular pronouns, in contrast, are paradigmatically distinct, which we may associate to their being whole phrases, and thus being robust enough on morphological grounds to support stress, emphasis, heavy syllabic distinctions, and so forth. In the spirit of Corver and Delfitto (1993), I propose that, given their morphological defectiveness, arguments headed by special clitics are not safely distinguished by the Case-checking mechanism. More particularly, I propose that the Checking Convention is inoperative for special clitics, simply because the FFbag of the clitic is incapable of hosting a Case-mark. And I should emphasize that I am not saying that the clitic does not check Case; rather, that it is not appropriately Case-marked (as per (32)) as a consequence of this checking. In a nutshell, if the clitic does not engage in a further process, the relevant FF-bag will not be distinguished from other FF-bags in the same checking-domain-set. We may see overt cliticization as a process aimed at explicitly marking, through syntactic positioning, certain distinct relations. For example, a strong clitic in Spanish comes before a weak clitic, making the two distinct.47 At the same time, any combination (regardless of number, gender and morphological case) of weak, or strong, clitics leads to undistinguished FF-bags, and a subsequent LF crash. We can then return to what covert relation is relevant in the Danish (37a): by hypothesis one in which sig is subject to [selv[ pro]], and the features of a null se raise to the domain of Peter. I should say that standard se has peculiar 169
DERIVATIONS
placement properties (see Note 48), which may relate to the usual assumption in the literature, forcefully expressed in Burzio (1996), that se is in essence an expletive element, with no features other than categorial ones (it is neither [s] nor [s], and thus behaves like neither of the clitics thus defined; besides, se is unspecified for gender, person, number, and Case). Ideally, matters of PF and LF interpretability decide on successful placements for se. For instance, as the clitic that it is, it should group with other clitics at PF, albeit as a wild card. In turn, the fact that it is expletive in character may relate to its interpretive possibilities, as follows.
8 Some thoughts on “sameness” I set aside the ultimate semantic import of the relation between sig and pro, assuming (to judge from (41b)) some form of identity, with the rough content of “his person.” The issue is the relation between the null se heading the complex anaphoric construction and its antecedent. By hypothesis, se cannot be marked for distinctness in terms of Case. Thus, although it is distinguished from [s] and [s] clitics in that it is a clitic with no [s] value, it is harder to see how it could be syntactically distinguished from a non-clitic expression whose features end up in the same checking-domain-set. So suppose the grammar in fact does not tell se apart from such a non-clitic expression. This leads to no LF crash if, in fact, se only has categorial features, and hence no uninterpretable feature to erase.48 In turn, the expression that se “collapses” with can be seen as its antecedent, with the rough semantics in (42) for the examples in (39) (though see Chapter 12): (42) [the x:Peter(x) & X(x)] [the y:Anne(y) & Y(y)] x told [x’s person] about y The fact that se gets taken by the grammar to be the same item as the element it shares a checking domain with entails, ideally, the formation of an extended chain with two roles having been assigned configurationally before movement. It must be stressed that although (42) comes out as anaphoric as (38) does, they achieve this interpretive result in radically different ways. In (42), anaphoricity is expressed through a single operator binding two variables, and a single chain receiving two roles. In (38), there are two separate chains, each involving their own operator, which nonetheless come out “anaphoric” (or logophoric) because of a lexical particle demanding contextual sameness. The sort of analysis delineated above trivially extends to languages, like Basque, where anaphoric relations are expressed as in (43) (and see Chapter 10): (43) a. Jonek bere burua ikusi du. Jon-S his head-the-O seen has “John has seen himself.” b. [the x:Jon(x) & X(x)] x saw [x’s head] 170
FORMAL AND SUBSTANTIVE ELEGANCE
Obviously, the relation here between bere “his” and burua “head” is not the same as exists between sig and [[pro]selv]. Yet this is arguably a low level lexical fact, assuming that “head” is the relevant classifier in Basque for persons. What is again crucial is that the null clitic se relates to its antecedent by being syntactically indistinguishable from it, leading to a fused chain. I stress that nothing is wrong, a priori, with fused chains receiving two roles. Of course, the question is whether or not we should have a Thematic Criterion, a matter that Chomsky (1995b) debates and does not resolve. Certainly, nothing that I have said allows (44a) (meaning John used himself ): (44) a. [ John T [ t v [ used t ]]] b. [ John [ clitic-T [ t v [ used […t… ] ] ]]] Note that the first movement violates the Last Resort condition (Jairo Nunes, personal communication). What allows the similar structure in (44b) is that the clitic placement does not violate Last Resort, by hypothesis. In general, fused chains should be possible only when Last Resort is appropriately satisfied.49 Explicitly: (45) Chain Fusion Situation Let and be different chains. If ’s head is non-distinct from ’s head within a given checking domain, and ’s tail commands ’s tail, then and fuse into an integrated chain , subsuming properties of and . I take (45) to be the source of true anaphoric coreference, which should be limited to se (and similar elements) “collapsing” within a checking domain.50 It is now easy to see that nothing in the properties of se makes it anaphorical, specifically. Consider (46): (46) pro se mataron se killed.III “They killed themselves/each other.” Aside from group readings, this has three transitive, distributive readings. It can have the rough import of “they unintentionally caused their death.” It also has two causative readings: a reciprocal one, and an anaphoric intepretation discussed above. Of course, readings like these are disambiguated in terms of “doubles,” as they are in English with the overt elements themselves and each other. What I am saying, however, is that the “doubles” are not crucial to the dependency, in a way that se is, inasmuch as it induces chain fusion. (47a) provides an argument that only chain fusion induces anaphoricity: (47) a. *Juan se ha sido encomendado t a si mismo. *Juan se has been entrusted to si same b. Juan le ha sido encomendado t a el mismo. Juan him has been entrusted to him same “Juan has been entrusted to himself.” 171
Rizzi (1986) discusses an Italian example similar to the Spanish (47a) to argue for his Local Binding Condition. He suggests that what goes wrong with (47a) is that Juan is trying to form a chain over se, which has the same index as Juan. Surprisingly, however, the Spanish (47b) is quite good. Here, we have replaced se with the indirect object le. Generally, that should give the indirect object an obviative reference with respect to Juan, but we have loaded the dice by adding a logophoric double a él mismo, "to him same," which forces the reference of le to be that of Juan. If we only had indices to establish referential links, (47b) would fall into Rizzi's trap, being predicted as ungrammatical, contrary to fact. However, what I have said above provides a different account. Compare:

(48) a. [TP Juan se [VP t [ levantó [XP …t…] ]] ]
        "Juan raised himself"
     b. *[TP Juan se [ha sido [ encomendado[v] [VP [XP …t…] [ t t ]]]]]

A fused chain arises in (48a), since all the relevant elements stand in a command relation. In contrast, a fused chain does not arise in (48b), since the third and fourth elements do not command each other.51 Nonetheless, the featureless se cannot be told apart from the element it shares a checking-domain-set with (technically, the D features of Juan). This is a paradox: se cannot head its chain, and cannot be in a fused chain. The result is ungrammatical.

Compare finally (48b) to the grammatical (24b), repeated now as (49a). In this instance, we should note, a reflexive reading is actually possible (49b). However, a reading is also possible whereby se is taken to be some indirect object, a matter noted in Section 4 which we left pending until now.

(49) a. Juan se lo mostró.
        Juan se it showed
        "Juan showed it to him."/"Juan showed it to himself."
     b. [TP Juan se lo [ [ mostró [ v]] [VP t [[XP …t…] [ t t ] ] ] ]]
c. [TP Juan se lo [ [ mostró [ v]] [VP t [ [XP …t…] [ t t ] ] ] ]]
When se is considered within the same checking domain as Juan, (49b) is as straightforward as (49a) (the fused chain succeeds in this instance, because command obtains throughout the members of the chain). In turn, (49c) involves the consideration of se within the same checking domain as lo.52 In this instance,
a fused chain cannot be formed because, just as we saw for (48b), neither the third nor the fourth elements command the other. However, se can be told apart from lo, if what I have said in Section 4 is true: se is a different clitic from lo, in that the latter is [−s], while se is not specified for this feature. Whereas this property of se does not make it distinguishable from the D features of a regular subject like Juan, it does make it distinguishable from lo, la, etc. Therefore, se does not collapse with lo in (49c), and then it can indeed form its own separate chain. It will not be an anaphoric chain, but this is in fact desirable. We want the reading of the se chain to be one invoking a third person.53
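The logic of the last few paragraphs can be summarized in one more toy sketch (my own schematic encoding of chains; the names and representations are invented for illustration): fusion requires both head-collapse within a checking-domain-set and command between tails, which is exactly what separates (48a) from (48b) and (49c).

    # Toy sketch of the Chain Fusion Situation (45). Chains are encoded as
    # head/tail pairs; heads "collapse" when non-distinct in one domain-set.
    def fuse(alpha, beta, heads_nondistinct, tail_commands):
        """Return a fused chain, or None if the conditions of (45) fail."""
        if heads_nondistinct and tail_commands(alpha["tail"], beta["tail"]):
            return {"head": alpha["head"],
                    "tails": [alpha["tail"], beta["tail"]]}  # two theta-roles
        return None

    juan = {"head": "D-features(Juan)", "tail": "t(vP)"}
    se = {"head": "D-features(Juan)", "tail": "t(XP)"}  # featureless se collapses

    # (48a): Juan's tail commands se's tail -> fusion succeeds.
    print(fuse(juan, se, True, lambda a, b: True) is not None)   # True
    # (48b): no command between the relevant tails -> no fusion; and since
    # se cannot head its own chain either, we get the paradox in the text.
    print(fuse(juan, se, True, lambda a, b: False) is not None)  # False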
9 A word on long-distance obviation

One can think of other potential problems for my approach, but by handling anaphoricity I have tried to show that there are reasonable ways of keeping the system to the sort of bare picture that minimalism expects. While I will not be able to extend myself at this point to many other instances that immediately come to mind, I will, however, say a word about the fact that my analysis does not predict the obviation in (50), and many similar instances:

(50) ! He thinks that John is a smart guy.

Now, consider an analysis of (50) along the lines of Reinhart (1995: 51):

The coreference generalization . . . is that two expressions in a given LF, D, cannot corefer if, at the translation to semantic representations, we discover that an alternative LF, D′, exists where one of these is a variable bound by the other, and the two LFs have equivalent interpretations. i.e. D′ blocks coreference in D, unless they are semantically distinct.

Reinhart lets structures like (50) converge; her economy analysis of an example of this sort is thus strictly not related to any of the derivational concerns raised here. Her intuition is that variable binding is a more economical means of identifying referential identity, provided that assignment of reference requires relating an expression to the set of entities in the discourse. This is post-LF economy in the performance systems, about which I have nothing to say.54

Suppose we were to accept this description of (50) – should it extend to all instances discussed here? While this is a fair question, the point cannot be conceded a priori, given the sort of issue this chapter is raising. A pattern of formal behavior does not immediately demand a unified, minimalist explanation. More concretely, what do (50) and the examples in (26)–(30) have in common? The fact that command matters for all is not significant (given the model of the grammar argued for in Section 2): this commonality is necessary to just any LF process. Locality is in fact not common to these examples, obtaining only in the (26)–(30) paradigm, but not in (50). Then the only significant commonality is disjointness of reference, Relation R. However, as I have noted, it is plausible that the grammar only codes sameness and, by default, difference. If so, how surprising is it really that we find two unrelated phenomena which have in
common one of these two effects? To demand commonality between long-distance and short-distance obviation would be like demanding commonality between long-distance and short-distance anaphora, which Reinhart and Reuland (1993: 658–60) – in my view, correctly – are careful enough to distinguish.

Once it is taken as an empirical question, are there in fact any advantages to keeping the local and the long-distance phenomena separate? I think there may be a couple. First, non-local relations are immediately suspect when attempting a standard syntactic analysis. They should not exist in derivational syntax. Then again, one may try to argue that what I have called local obviation is not a phenomenon of grammar either. While I do not have any special reasons to want obviation in the grammar, the analysis I have provided here gives a simple way of dealing with it, when it is local. My treatment follows trivially from an independent property of the system: the fact that it makes use of sets that we call checking domains, which are there in some form, whether we agree on the rest or not.55 As I have tried to argue, obviation is just one among several possible results of the marking-for-distinctness involved in tagging, specifically, different FF-bags within a checking-domain-set. Making these minimalist assumptions, we were able to account for the facts in (18), which involve FF-sameness, but not obviation. In turn, we were forced to look into the matter of anaphoricity, which in itself has interesting consequences: a simple distinction between logophors and anaphors, and an account of Rizzi's Local Binding Condition effects in terms of a mechanism of chain fusion. Finally, the present account gave us a motivation for what the mysterious uninterpretable Case features are. If we insist that whatever underlies long-distance obviation should predict the short-distance facts, we will lose all of these rather natural results.
10 Concluding remarks

As I see it, true LF properties are the result of constructive processes. I have not said this before, but now I can be loud and clear. The dynamic model that I have summarized in Section 2 is in effect saying that LF and PF do not exist as levels, although they certainly exist as components (the locus of Full Interpretation in a radically derivational system). If LF and PF are not levels, there should not be any formal relations obtaining there, other than the ones that the derivation provides by way of its mechanisms of constructing structure. From this perspective, anything long-distance should be post-LF. In turn, perhaps everything short-distance is LF, in the sense that a constructive process of grammar is being invoked. In this respect, local obviation looks hopeless at first. Why would the grammar "bother" to code such a weird requirement? As a matter of fact, the very existence of obviation conditions is not a bad argument for Gould's position on language. Prima facie, obviation hinders communication, in that it limits the class of possible thoughts that are expressed by way of grammatical combinations. Nonetheless, this is a point about language function, not language structure. Structurally, it is still possible that obviation has a minimalist place in the grammar. I have suggested that this place is a mere reflex of something deeper:
the matter of deciding on identical or different structures, given set-theoretic dependency constructs. The grammar tags symbols for distinctness and by default assumes they are also semantically different, in some way relating to mode of presentation or intended reference. In sum, the Minimalist Program forces us to ponder the nature of principles themselves, over and above the nature of phenomena. The latter are obviously important. Yet without a program to guide us into reflecting on what principles we are proposing (it may be stressed: what properties we are ascribing to the human mind), we may find ourselves redescribing the structure of something like a snail shell, without learning something deeper about what is behind that very pretty pattern.
Part II PARADIGMATIC CONCERNS
9 INTEGRALS
with Norbert Hornstein and Sara Rosen
1 Introduction

Consider the following sentence.

(1) There is a Ford T engine in my Saab.

It embodies the ambiguity resolved in (2).

(2) a. My Saab has a Ford T engine.
    b. (Located) in my Saab is a Ford T engine.

(2a) depicts the Ford T engine as an integral part of the Saab. This is the kind of engine that drives it. Call this the integral interpretation (II). The meaning explicated by (2b) fixes the location of at least one Ford T engine. This engine need not be a part of the Saab. Were one in the business of freighting Ford T engines in the back seat of one's Saab, (2b) could be true without (2a) being so. Call this the standard interpretation (SI).

The central point of this chapter is that the ambiguity displayed by (1) between an I and S interpretation has a grammatical basis. In particular, (1) is derivable from two different kinds of small clauses, involving two different kinds of predication structures. Underlying a II is a small clause such as (3a). SIs derive from small clauses like (3b).

(3) a. …[SC My Saab [a Ford T engine]] …
    b. …[SC a Ford T engine [in my Saab]] … (cf. "I saw a Ford T engine in my Saab.")

We intend to interpret (3b) in the standard way. The locational PP is predicated of the small clause subject. This SC is true just in case at least one Ford T engine is in my Saab. Clearly the SC in (3a) cannot have the same standard predicational structure. Whereas one can read (3b) as saying that a Ford T engine is in my Saab, one cannot read (3a) analogously. The SC does not assert that my Saab is a Ford T engine. Rather, (3a) must be interpreted as true just in case my Saab "is (partly) constituted of" a Ford T engine. We discuss below what we intend by this notion, henceforth simply referred to as "Relation R."

For present purposes, it suffices to distinguish the underlying small clauses in (3) and tie them to different kinds of predication within the grammar. We
hypothesize that the predication structure displayed in (3a) underlies constructions involving inalienable possession, part/whole relations and mass term predications, among various others. Our aim here is to show how making this assumption permits an explanation of the distinct properties such constructions typically have.

The structure of this chapter is as follows. In the next sections we explore the grammar of IIs and SIs. In particular, we show how to get from their respective underlying small clauses to the various surface forms that they display. We rely on and expand earlier work by Kayne (1993) and Szabolcsi (1981 and 1983). Section 3 deals with the definiteness effect each structure displays and traces it to different sources. In Section 4, we discuss the fine points of the syntax/semantics proposal we are making. Section 5 extends the range of data and shows how it fits with the grammatical analysis proposed.
2 Some syntax

Kayne (1993) argues that English possession constructions such as (4a) are structurally identical to those in Hungarian. Following Szabolcsi (1981, 1983) he assigns them a small clause phrase structure as in (4b) (Spec = Specifier).

(4) a. John has a sister.
    b. [Spec be [DP Spec D0 [[DPposs John] Agr0 a sister]]]

In essence, to get the manifested surface order, John raises to the Spec position before be, a sister raises to Spec D0 and the D0, which Kayne suggests is in some sense prepositional, incorporates into be. The resulting incorporated expression be+P surfaces as have. These elaborate movements are visible in Hungarian, as Szabolcsi showed. In Hungarian, if the larger DP is definite the possessor DP can remain in place. If it does, it surfaces with the nominative case. It can also raise to Spec D0, however. It appears with the dative case if it does. Once moved to Spec D0, the possessor DP can raise further to the matrix Spec position. In this instance, it takes the dative case along with it. Importantly, the dative possessor co-occurs with a post-nominal Agr. If the larger DP is indefinite, as occurs in cases such as (4a), the raising of the possessor is obligatory and the agreement morpheme occurs.1

Kayne extends this analysis to English. He suggests that English has a non-overt oblique D0 and that the possessor moves through its Spec to get to the surface subject position. The derivation is as in (5), with the D0 treated as (in some sense) a preposition.

(5) DPposs/i be [DP ti [D/P]0 [ti Agr0 QP/NP]]

Kayne suggests that the incorporation of [D/P]0 is required to change its Spec from an A′ to an A position. Illicit movement obtains without this incorporation. Though we accept the basics of the Kayne/Szabolcsi analysis, we differ on some details. Before getting into these, however, consider how to extend this
analysis to cover II constructions like (2a). Given the II, the relevant underlying structure must be as in (6a). Given that these constructions display a definiteness effect,2 the derivation must be (6b), with the DPposs moving all the way up to the matrix Spec.

(6) a. [Spec be [DP Spec [D/P]0 [[DPposs my Saab] Agr0 a Ford T engine]]]
    b. [My Saabi be+[D/P]0j [ti ej [ti Agr0 a Ford T engine]]]

(1) should have a similar underlying source on its II reading. However, here we run into technical difficulties if we adhere to the details of Kayne's analysis. The problem is where to put in and how to derive the correct surface word order. If we respect the leading idea behind the Kayne/Szabolcsi analysis, the correct source structure should be something like (7). This redeems the intuition that the D0 is somehow prepositional.

(7) [Spec be [DP Spec in [[DPposs my Saab] Agr0 a Ford T engine]]]

Given (7), however, we must alter the details of Kayne's proposed derivation. In particular, to get the right surface order we must raise the predicate a Ford T engine to the Spec of in and insert there in the matrix Spec position.

(8) [there be [[a Ford T engine]i in [my Saab ti]]]

If this is correct, it suggests that (mutatis mutandis) the derivation of (2a) also involves predicate raising. The derivation should thus be (9b) rather than (6b).

(9) a. [Spec be [DP Spec [D/P]0 [[DPposs my Saab] Agr0 a Ford T engine]]]
    b. [my Saabi be+[D/P]0j [a Ford T enginek ej [ti Agr0 tk]]]

This alternative derivation has some pleasant features. First, it allows us to extend the Kayne/Szabolcsi analysis to constructions such as (1). Second, as noted above, Kayne required the incorporation of [D/P] into be to alter the A′/A status of the Spec position. However, it is not entirely clear how this change is effected by the incorporation. The derivation in (9b) gives another rationale for the required incorporation of [D/P] into be. Without it, movement of my Saab across the Spec containing a Ford T engine violates minimality. More precisely, we offer the following minimalist rationale for the incorporation. Chomsky (1993b) provides a way of evading minimality restrictions in certain cases. It proposes that minimality can be evaded just in case the target of movement and the landing site that is skipped are in the same minimal domain. The effect of incorporating [D/P] into be is to place the two Specs in the same domain. This then permits my Saab to raise to the matrix Spec position without violating minimality.3

Note, furthermore, that there are analogues to sentences like (2a) that provide evidence for this derivation. II constructions have paraphrases involving overt pronominal forms, and the particular preposition we find in these paraphrases is the same one that appears in the there-integral construction. If we exchange the preposition, even for a close synonym, the II reading fades. (10b,c) only have SI readings, in contrast with (10a), which is ambiguous.
(10) a. My Saab has a Ford T engine in it.
b. My Saab has a Ford T engine inside it.
c. There is a Ford T engine inside my Saab.

In a theory that treats movement as copying and deletion (like the minimalist theory), forms like (10a) are to be expected.4

The derivation of the SI reading for (1) proceeds from a different underlying small clause.

(11) There is [sc a Ford T engine in my Saab]

Here, a Ford T engine is the subject, not predicate, of the underlying SC. If the expletive is not inserted, raising is possible and we can derive (12).

(12) A Ford T enginei is [sc ti in my Saab]

Observe that (12) is unambiguous. It does not have an II reading. This follows if we assume (with Kayne) that movement from the Spec [D/P] position is a case of illicit movement. (13) is the required derivation of an II analogue of (12). The movement from Spec DP to the matrix Spec is disallowed on the assumption that Spec DP is an A′ position.5

(13) a. [Spec be [DP Spec in [[DPposs my Saab] Agr0 a Ford T engine]]]
b. *[[A Ford T engine]i is [DP ti in [[DPposs my Saab] Agr0 ti]]]

Note, furthermore, that incorporating in to be does not yield a valid sentence.

(14) *A Ford T engine has my Saab.

This follows if incorporating the preposition into the copula does not alter the A′ status of the Spec D0 position (pace Kayne).6

There is further data that this analysis accounts for. First, we explain why certain existential constructions do not have be-paraphrases. This is what we expect for an existential that has only an II source. Consider for example (15)–(17). Each (a)-example has an acceptable have-paraphrase. Note that the there-sentences only carry II interpretations. For example, (15a) means that Canada is comprised of ten provinces, not that ten provinces are located there.7 Similarly for (16a) and (17a). Consequently, these sentences must all be derived from SCs like (3a) above and so will possess have-paraphrases.

(15) a. There are ten provinces in Canada.
b. *Ten provinces are in Canada.
c. Canada has ten provinces.
(16) a. There are eight legs on a spider.
b. *Eight legs are on a spider.
c. A spider has eight legs.
(17) a. There are too many stories in my building.
b. *Too many stories are in my building.
c. My building has too many stories.
Second, we account for why it is that pied-piping disambiguates (18). (18) has two readings. On the II reading it tells us that this elephant has a big nasal appendage. On the SI reading it locates a piece of luggage. Interestingly, (19) is not similarly ambiguous. It only has the SI reading.

(18) You believe that there is a big trunk on this elephant.
(19) On which elephant do you believe that there is a big trunk?

This is what we should expect given the present analysis. It is only with the SI reading that on this elephant forms a constituent. With the II, on is actually a D0 that takes as complement an AgrP. The inability to pied-pipe with the which phrase in (19) is parallel to the unacceptability of pied-piping in (20).

(20) *For which person would John prefer to take out the garbage?

There is another parallel between on in the II reading of (18) and the for complementizer in (20). Neither licenses preposition stranding.

(21) a. *Which person would John prefer for to take out the garbage?
b. *Which elephant do you believe that there is a big trunk on?

(21b) is relatively acceptable, but only with an SI reading – that is, only in the construction where on the elephant forms a constituent. When we control for this, we get unacceptability: under the preferred reading, (22) has an integral interpretation, and with this reading the sentence is unacceptable.

(22) *Which bug do you believe that there are eight legs on?

There is one further piece of evidence that in my Saab does not form a PP constituent in (1) when given the II reading. The addition of PP specifiers like right disambiguates these constructions. (23) has only the SI reading.

(23) There is a Ford T engine right in my Saab.

This is what we expect if, on the II reading, in is not the head of a PP constituent. If it is not, there is no PP for right to relate to. Hence the presence of this element is only consistent with the SI reading of (1), because in this case in my Saab is a PP constituent.

A third property of existential constructions with II readings is that they show subtle but distinctive agreement properties. Consider the data in (24).8

(24) a. There appear to be no toilets in this place.
b. There appears to be no toilets in this place.

(24a) is ambiguous. It can report that this room is not a men's room, or it can be taken as saying that the toilet storage room seems to have been cleared out and left empty. Contrast this with (24b). It can have the first interpretation. However, it cannot be used to report on the inventory of toilets. In short, it can be used to say that this room has no toilets but not that it has been emptied of toilets.

There is a straightforward way to present the generalization. To express the interpretation which has a be-paraphrase, agreement is required. If in cases
such as these, number agreement is taken to be an LF phenomenon, then an expletive replacement account of the kind offered in Chomsky (1986b) could account for the obligatory agreement evidenced in the SI readings. Observe that standard expletive replacement is rather unlikely for the II readings, given the lack of a be-paraphrase in such cases. This plausibly underlies the optional agreement pattern. A further agreement effect is visible in the following examples.

(25) a. There are two gorillas in this skit and don't you change it.
b. There's two gorillas in this skit and don't you change it.
c. There are two gorillas in this skit; Horace and Jasper.
d. *There's two gorillas in this skit; Horace and Jasper.
This too makes sense given our analysis. In (25a,c) the indefinite is actually an underlying SC subject. In (25b,d) it is an underlying SC predicate. Now observe that indefinites in argument and predicate positions function differently when it comes to explicit enumeration of their members. Thus, the contrast in (26).

(26) a. A doctor arrived, namely Paul.
b. *He is a doctor, namely Paul.

The contrasts in (25) can be viewed similarly. (25a,b) involve no enumeration at all, so they are both acceptable. In (25c,d) we do enumerate the gorillas. As expected, the non-agreeing (25d), in which two gorillas is actually a predicate rather than an argument, is less acceptable than the constructions with an SI reading.

Consider, finally, another important difference between these two kinds of there-constructions. They display rather different definiteness effects. As is well known, the associate in an existential construction must be a weak NP. What is curious is that not all weak NPs are felicitous in IIs and some strong NPs are. Consider the following contrasts.

(27) a. There are some people in John's kitchen.
b. *There are some provinces in Canada.
(28) a. *There is every (possible) problem being addressed.
b. There is every (possible) problem with this solution.

The (a) examples each have be-paraphrases. The (b) examples are paraphrased with have. Observe that the have-paraphrase of (27b) is also unacceptable, while the one for (28b) is fine.

(29) a. *Canada has some provinces.
b. This solution has every possible problem.

Given our view that IIs derive from SCs with a predicational structure like the one underlying these have-paraphrases, it is natural that their acceptability should swing in tandem. The key seems to be that some-phrases cannot act predicatively, while the sort of every phrase we have in (29b) can. Consider (30) in this light. As expected, some pig, in contrast to a pig, cannot be used as a
predicate NP. If the associate in an II existential must so function, the unacceptability of (27b) follows.

(30) a. Wilbur is a pig.
b. *Wilbur is some pig.9

As for the acceptability of (28b), it seems tied to the fact that this NP, especially when possible modifies the head noun, can be used with an amount reading. Thus, (31) seems like a good paraphrase of (28b).

(31) This solution is as problematic as it could be.

These amount uses of every, then, seem to have the semantics of superlative constructions, and such superlatives are possible as predicates. We can thus trace the contrast between (28b) and (28a) to the fact that the associate in the former is an underlying predicate, while this is not so in the latter.

If accurate, the above suggests that the definiteness effect is not a unitary phenomenon. In effect, the restriction on the associate in II existentials with have-paraphrases is due to the fact that the associate is actually an underlying predicate. We expect NPs that appear here to be capable of predicative uses. In SIs, in contrast, whatever underlies the definiteness restrictions must be tied to something else. As the be-paraphrases indicate, in these cases the subjects are indeed arguments. We return to these issues in the next section.

To recap, we have proposed that the ambiguity in (1) is due to two different underlying SC sources. The existence of these small clauses points to two fundamentally different kinds of predication structures that grammars make available.
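As a minimal sketch of this dual-source picture (in Python, with deliberately simplified representations; the derivational steps follow the discussion above, but the data structures and names are merely expository assumptions), the two sources and the strings they feed can be rendered as follows.

from dataclasses import dataclass

@dataclass
class SmallClause:
    subject: str
    predicate: str

def derive_ii(sc, possessor_raises):
    """Integral source, cf. (6)/(7): [be [DP Spec D/P [[DPposs subj] Agr pred]]].
    The predicate raises to Spec of D/P; if the possessor then raises to the
    matrix Spec, D/P incorporates into 'be', surfacing as 'have' (cf. (9b));
    otherwise expletive 'there' is inserted and D/P surfaces as 'in' (cf. (8))."""
    if possessor_raises:
        return f"{sc.subject} has {sc.predicate}"
    return f"there is {sc.predicate} in {sc.subject}"

def derive_si(sc, insert_expletive):
    """Standard source, cf. (11): [be [SC subj pred]], pred a locative PP."""
    if insert_expletive:
        return f"there is {sc.subject} {sc.predicate}"
    return f"{sc.subject} is {sc.predicate}"    # subject raising, cf. (12)

ii = SmallClause(subject="my Saab", predicate="a Ford T engine")
si = SmallClause(subject="a Ford T engine", predicate="in my Saab")

print(derive_ii(ii, possessor_raises=True))    # my Saab has a Ford T engine
print(derive_ii(ii, possessor_raises=False))   # there is a Ford T engine in my Saab
print(derive_si(si, insert_expletive=True))    # there is a Ford T engine in my Saab
print(derive_si(si, insert_expletive=False))   # a Ford T engine is in my Saab

Note that the same there-string comes out of both routes; that convergence is just the ambiguity of (1), and only the integral route feeds the have-variant.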
3 Definiteness effects

We suggested above that there are two sources for the definiteness effect (DE) observed in existential constructions. The DE observed for II readings in there-constructions should be traced to the predicative status of the NP in associate position. In SI there-constructions, in contrast, the associate is never a predicate, and so the DE in these constructions must have a different source. Several have been suggested.

English existential constructions are typically analyzed as having the structure in (32).

(32) Therei … be [NPi…]

There is an expletive sitting in subject position. The copula takes a small clause complement (Stowell 1978; Burzio 1986) and the subject of the SC (the associate) is related to the expletive by co-indexation. A salient property of (32) is the DE. Two different syntactic approaches have been pursued in trying to account for the DE.

One set of analyses traces the DE to some feature of the expletive. In Milsark (1974), for example, there is treated as a sort of quantificational
expression in need of a variable to bind.10 Indefinites are analyzed as having both a quantificational and a non-quantificational "adjectival" reading. In the latter guise, indefinites make available free variables which the expletive there (interpreted as an existential operator) can bind.

Other accounts are similar in conception, though not in detail. Higginbotham (1987) allows adjectival NPs to be interpreted absolutely. The expletive there is treated as without semantic content of any sort, though it occurs with postcopular NPs that can have the required absolute interpretations. The NP in existentials is interpreted propositionally, with the indefinite acting as subject and the N as predicate. Higginbotham (1987) ties the DE to a stipulated capacity of adjectival quantifiers to get absolute interpretations. The syntax of these constructions is exemplified in (33a) and the interpretive structure in (33b). Note that (33b) is of subject-predicate form and that the expletive is taken to be without semantic consequence (in contrast to the Milsark proposal).

(33) a. There are [NP few men in the room]
b. [Few (x)] [men x & in the room x]

Keenan (1987) makes a relatively similar proposal, though once again the details are different. He does not derive a DE. Rather, he shows that existential sentences are logically equivalent to existence assertions if and only if the determiner is existential. He too treats the expletive (and copula) as semantically inert. However, he treats the postcopular expression as sentential, in contrast to Higginbotham. Semantically and syntactically the postcopular expression is a proposition.11

What unites all these analyses (for our purposes) is that they each treat the postcopular expression here as forming a standard predication structure, with the indefinite (or the quantificational part thereof) being the subject. The problem with this, from our perspective, is that it will not serve to adequately accommodate II-interpreted existentials. If we are correct above, then these are not cases of standard predication, and the associate is not a subject. Thus, any account that treats it as such is inadequate.

The empirical evidence points to our conclusion. We saw in Section 2 that the distribution of these indefinites differs somewhat from the distribution of indefinites in SIs. As a further illustration, consider a fact discussed in Higginbotham (1987). We pointed out above that one can find acceptable quantified associates in certain existential constructions.

(34) a. There's every possible error in this solution.
b. This solution has every possible error (in it).

This have-paraphrase underlay our suggestion that the universally quantified associate in (34a) is predicative. Higginbotham points out that we find every-predicates in sentences such as (35).

(35) John is everything I respect.

However, these expressions are not generally licensed in existential constructions. He notes the unacceptability of (36).
(36) *There was everything I respect about John. (cf. *John has everything that I respect (about him).)

What underlies the contrast between (34a) and (36)? Given our proposal, the contrast is tied to the fact that only the former has an acceptable have-paraphrase, and so only here is the every phrase a predicate. In contrast, in (36) the every phrase is a subject, thus falling under whatever correctly limits associates to indefinites. (34a) is acceptable so long as the every phrase can be interpreted predicatively. While the have-paraphrase of (36) is unacceptable, the pair in (37) is considerably more acceptable.12

(37) a. There is every quality that I respect in John.
b. John has every quality that I respect (in him).

There is a further point worth making. Keenan (1987) observes that have-constructions also display a definiteness effect.

(38) a. John has a brother (of his) in college. *John has the brother (of his) in college.
b. This solution has a problem (with it). *This solution has the problem (with it).

An adequate account of DEs should deal with these cases (see Note 12). We claim that the distribution of indefinites in these constructions follows from the predicative status of the post-have NP in underlying structure. We therefore expect the definiteness effect in these constructions to parallel those found in II existentials. What we find significant is that the DE manifested in the have-constructions is not identical to that found in SIs. We are claiming that a proper analysis of the DE in have existential constructions will extend to there existential constructions with an II reading, provided that one adopts a Kayne/Szabolcsi syntax for these constructions.

A second influential approach to the DE ties the effect specifically to the indefiniteness of postcopular predicate NPs (Safir 1987). As should be clear, we agree that there are some DEs that should be traced to the fact that the NPs function predicatively. However, the flip side of our analysis is that not all instances of the DE should be derived in this way. In particular, the indefinite associate in SI-interpreted there-clauses is not a predicate at any relevant level of representation.

The crux of our proposal is that there are two types of DEs and that the two theoretical approaches that have been mooted each correctly characterize one of these. The data presented above empirically underwrite this dual approach to the DE.
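As a compact schema – in our own notation, compressing Milsark (1974) and the predicative analysis above, not the cited authors' own formulations – the two DE sources can be stated as follows:

\[
\text{SI: } \textit{there be NP XP} \;\leadsto\; \exists x\,[\,NP'(x) \wedge XP'(x)\,]
\qquad \text{(\textit{there} binds a variable; the associate must be weak)}
\]
\[
\text{II: } \textit{there be NP P DP} \;\leadsto\; R(DP', NP')
\qquad \text{(the associate is an underlying SC-predicate; it must be predicative)}
\]

On the first schema the restriction is quantificational; on the second it is a matter of which NPs can serve as predicates at all.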
4 Constitution and part/whole

We have suggested that, in addition to standard predication, grammars employ a kind of integral predication. This second type of predication (involving Relation R) underlies IIs. This section aims to clarify somewhat just what this second
kind of predication amounts to. Our proposal is that in (1) above this kind of predication involves two different kinds of information: a claim that the SC subject is partially characterized in terms of the SC predicate, and a claim that there is more to the subject than that. The first kind of information is what gives such sentences their inalienable feel; the second, the part/whole aspect of the interpretation. Consider each in turn.

Burge (1975), in discussing mass term constructions, proposes a novel variety of primitive predication.

(39) a. The ring is gold.
b. That puddle is water.

The sentences in (39) are analyzed as in (40a,b), with C read as in (40c).

(40) a. C [the ring, gold]
b. C [that puddle, water]
c. "C" = "_____ is constituted of _____ at . . ."

Thus, we read (39a) as follows: "the ring is (now) constituted of gold." The interpretation of the copula as C in cases such as this allows for a smooth account of the logical form of sentences such as (41).

(41) That engine was once steel but now it is aluminum.

The reader is referred to Burge for further illuminating discussion.

Let us assume that Burge is correct and that mass term constructions like those in (39) require postulation of the kind of predication that C embodies. We can adopt this for our purposes by proposing that an extension of C-predication is what we get in the II construction. Note that if Burge is right, then this is required even for constructions without a have-paraphrase, such as (39). Put another way, the sort of predication we envisage for IIs appears to have some independent motivation.

Interestingly, constructions like those in (39) have "near" have-paraphrases.

(42) a. The ring has gold (in it).
b. That puddle has water (in it).

The main difference between the sentences in (39) and (42) is that the former appear to be stronger statements. (39a) says that the relevant ring is entirely constituted of gold. (42a) says that it is partially so made up. In other words, the have analogues invite the inference that there is more than gold in that there ring.

The distinction observed here also occurs in non-mass term constructions.

(43) a. This dining room set is four chairs and a table.
b. This dining room set has four chairs and a table (in it).

(43a) tells us what comprises the dining room set in toto. (43b) says that the set includes these five items but comes with more. We would like to suggest that, in addition to the C-predication that is the inherent interpretation of the small clause, the DP headed by the D/P yields the part/whole semantics that (42) and (43b) display.13
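The temporal parameter in (40c) and the part/whole weakening can be made explicit. In a rough notation of our own (not Burge's), where ⊑ stands for parthood:

\[
(39a) \;\leadsto\; C[\textit{the ring}, \textit{gold}, \textit{now}]
\]
\[
(41) \;\leadsto\; \exists t\,[\,t \prec \textit{now} \wedge C[\textit{that engine}, \textit{steel}, t]\,] \;\wedge\; C[\textit{that engine}, \textit{aluminum}, \textit{now}]
\]
\[
(42a) \;\leadsto\; \exists y\,[\,y \sqsubseteq \textit{the ring} \wedge C[y, \textit{gold}, \textit{now}]\,],
\quad \text{inviting the inference that } y \sqsubset \textit{the ring}
\]

The first two lines show the time argument doing the work in (41); the third shows the have-variant asserting only partial constitution, whence the "more than gold in that ring" implicature.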
Now for a few refinements. The above adopts Burge's proposal, but generalizes it. The relation of gold to the ring in (39a) is one of material composition. Gold is the stuff that the ring is made of. In (43), in contrast, the dining room set does not have four chairs and a table as stuff – rather, these comprise the dining room set's functional organization. If C-predication is restricted to the elaboration of stuff alone, we may think of R-predication (via Relation R) as characterizing the functional structure of an item in some abstract space. So just as humans are made up of flesh and bones, they are also (more abstractly) made up of noses, heads, arms, and so on. The latter kind of characterization we call the functional make-up, in contrast to the physical make-up that is typical of mass term constructions.14

In sum, we take C- and R-predications to be primitive, putting aside now what unifies these two (though see Note 15). We furthermore suggest that these predications occur in SCs of the kind that Kayne and Szabolcsi first investigated.15
5 More facts

In light of the above, consider the data discussed by Vergnaud and Zubizarreta (1992) (V&Z). They discuss two kinds of possession constructions in French. The first, which they dub the external poss construction (EPC), is exemplified in (44). The internal poss construction (IPC) is given in (45).

(44) Le médecin a radiographié l'estomac aux enfants.
the doctor has X-rayed the stomach to-the children
"The doctor X-rayed the children's stomachs."
(45) Le médecin a radiographié leurs estomacs.
the doctor has X-rayed their stomachs
"The doctor X-rayed their stomachs."

(44) also has a clitic variant:

(46) Le médecin leur a radiographié l'estomac.
the doctor to-them has X-rayed the stomach
"The doctor X-rayed their stomachs."

V&Z point out some interesting properties that these constructions have. First, they observe that EPCs involve distributivity, in contrast to IPCs. For example, if leur "to them" refers to Bob, Sue and Sally in (47a), then each washed two hands. (47b), in contrast, is vaguer and can be true so long as each individual washed at least one.

(47) a. On leur a lavé les mains.
they to-them have washed the hands
"We washed their hands."
b. On a lavé leurs mains.
they have washed their hands
"We washed their hands."
Second, observe that in EPCs the inalienably possessed part shows semantic number. In (44) l'estomac "the stomach" is in the singular, given that children come with one stomach apiece. In (47a), in contrast, les mains "the hands" is plural, given that people typically come equipped with two hands. In contrast, the number agreement in IPCs is constant. In both (45) and (47b) we find the plural s-marking on the head noun.

Third, following Authier (1988), V&Z note adjectival restrictions on EPCs that do not beset IPCs. EPCs do not tolerate appositive adjectives, though restrictive adjectives are permitted.

(48) a. *Pierre lui a lavé les mains sales.
*Pierre to-him has washed the hands dirty
"Pierre washed his dirty hands."
b. Pierre a lavé ses mains sales.
Pierre has washed his hands dirty
"Pierre washed his dirty hands."
c. Il lui a bandé les doigts gelés.
he to-him has wrapped the fingers frozen
"He bandaged his frozen fingers."

(48a,b) display the contrast between the external and internal poss constructions, while (48c) shows that adjectival modification of the external poss construction is possible so long as the adjective is restrictively interpreted.

The account that we have elaborated above for inalienable constructions gives us an analysis of these three facts. Given our extension of the Kayne/Szabolcsi approach, (44) and (46) involve a SC structure such as (49).

(49) le médecin a radiographié [DP Spec D/P [[DPposs [les enfants] Agr [l'estomac]]]]
the doctor has X-rayed the children the stomach

To derive the correct output for (44), l'estomac raises to the Spec position and à is inserted; à+les is spelled out as aux. To derive (46), the clitic leur (corresponding to les enfants) moves up to clitic position. The derivations are as in (50).

(50) a. Le médecin a radiographié [DP [l'estomac]i [D/P à [[DPposs les enfants Agr ti]]]]
the doctor has X-rayed the stomach the children
b. Le médecin leuri a radiographié [DP [l'estomac]j D/P [[DPposs ti Agr tj]]]16
the doctor to-them has X-rayed the stomach

Now consider the interpretation of these constructions. The SC source carries with it the R-predication interpretation, and it is this R-predication that lends distributivity to these constructions. V&Z note that a predication structure is
required to get the distributive readings. However, it is not just any predication which licenses it. For example, (51) can be true even if some strawberries are not spoiled. That is, despite the predication relation holding between the subject and the predicate, distributivity is not enforced in standard cases.

(51) The strawberries are spoiled.

What is required for distributivity is not standard predication but R-predication. This makes sense. In asserting that children have stomachs as constituents we mean that each child has one – or whatever the canonical number happens to be.17

Note, furthermore, that (following V&Z) we assume that the number marking in these French constructions is significant, in that it provides the cardinality of the predicate. Thus we account for the number facts noted above. After all, it is true that typically children are constituted of one stomach and two hands (and see Note 17).

Last of all, given that the inalienably possessed element is actually a predicate on this view, it is not surprising that it cannot be appositively modified. Appositive modification holds for individuals, not predicates. For example, in English, appositive relatives are restricted to referring heads, whereas restrictive relatives apply to N predicates. This correctly bars appositive adjectives from EPCs, given our analysis.

In sum, our proposed extension of the Kayne/Szabolcsi account provides a rationale for the properties that V&Z described.
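One way to state the distributivity requirement – our formulation; V&Z's own implementation differs in detail – is that R-predication to a plurality distributes down to its atoms with canonical cardinality, something standard predication, as in (51), does not enforce:

\[
R(X, \textit{stomach}) \;\Rightarrow\; \forall x\,[\,x \in X \;\rightarrow\;
|\{\,y : \textit{stomach}(y) \wedge R(x, y)\,\}| = n_{\textit{can}}(\textit{stomach})\,]
\]

Here n_can returns the canonical number of such parts (one for stomachs, two for hands), which is also what fixes the semantic number marking in the EPCs above.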
10 FROM BEING TO HAVING
Questions about ontology from a Kayne/Szabolcsi syntax†
1 Possession in cognition

The relation between being and having has puzzled humans for millennia. Among grammarians, Benveniste offers an excellent instance of both caution and open-mindedness when dealing with the details of this intriguing relationship. He tells us:

That to have is an auxiliary with the same status as "to be" is a very strange thing. [To have] has the construction of a transitive verb, but it is not. . . . In fact, to have as a lexeme is a rarity in the world; most languages do not have it.
(1971: 168)

This is more than a curiosity about an auxiliary verb. Think of the relation between the sentences John has a sister (, Mary) and Mary is a sister (of John's). The traditional analysis for this phenomenon (for instance, as insightfully presented in Keenan 1987) is in terms of postulating a relational term sister, which has two variable positions, as a matter of lexical fact. Then the intuition is: one of two elements can saturate each variable position. If what we may think of as the referent of sister is promoted to subject of the sentence, we have Mary is a sister (of John's). If instead the other, possessor element is promoted to subject position, what we get is John has a sister (, Mary). All that be and have do is mark each relation.

But if Benveniste is right, be and have in fact cannot systematically mark each relation, particularly in languages that lack have. The immediate question is: what is then the difference between being a sister and having a sister? How do we know that one of these can only be a property of Mary while the other is a property of John, but may be Mary's as well? Is all of this out there, in reality, or is it somehow a function of the way humans conceptualize the world – and if so, how?

Interestingly, Du Marsais worried about this very issue. The following quote is taken from Chomsky.

Just as we have I have a book, [etc.] . . . we say . . . I have fever, . . . envy, . . . fear, a doubt, . . . pity, . . . an idea, etc. But . . . health, fever, fear, doubt, envy, are nothing but metaphysical terms that do not designate
anything other than the ways of being considered by the points of view peculiar to the spirit.
(Chomsky 1965: 199, n. 13)

It is equally telling to see the context where Chomsky invokes his reference to Du Marsais, just after reminding the reader how "certain philosophical positions arise from false grammatical analogies" (p. 199). To support his view, he introduces the following quote from Reid (1785), alluding to having pain.

Such phrases are meant . . . in a sense that is neither obscure nor false. But the philosopher puts them into his alembic, reduces them to their first principles, draws out of them a sense that was never meant, and so imagines that he has discovered an error of the vulgar.
(p. 199)

Chomsky then goes on to suggest that "a theory of ideas" cannot deviate from the "popular meaning," to use Reid's phrases. With this perspective in mind, consider the fact that all of the expressions in (1) have a possessive syntax.

(1) a. John has a house
b. John has only one arm
c. John has a sister: Mary
d. John has a bad temper
When we say possessive syntax, we must not just mean that these expressions can go with have; they can also appear as in (2).

(2) a. John with a house
b. John's only arm
c. A sister of John's
d. John is bad tempered
Certainly a relation, in fact a legal one, exists between John and the house he owns. Likewise, any part of John's may be expressed, with respect to him, in possessive terms. It is tempting to blame this on a more general notion of "inalienability." It is, however, not clear that one's parts are uncontroversially inalienable – or there would be no successful transplants. The notion "inalienable" is even harder to characterize once part/whole relations are abandoned. Family relations seem inalienable, but not obviously – as the child who recently divorced her mother can readily attest. And as we saw, matters get even more confusing with abstract possessions. Children are said to have the tempers of their nasty relatives and the looks of their nice ones. What does this really mean, if these notions are supposed to be inalienable?

It is also tempting to think that just about any relation between two entities can be expressed as a possession. This, however, is false. I relate to you right now, but it makes no sense to say "I have you." Numbers relate to each other, in a sense inalienably, yet what does it mean that "3 has 1.2"?

Against Reid's advice, one could perhaps say there are a handful of core
primitive possessive relations, and the rest are accidents of history, games people play or metaphors. It is, however, surprising to find the same types of accidents, games or metaphors, culture after culture. Take the examples in (3).

(3) a. Juan con una casa
Juan with a house
b. Su único brazo
his only arm
c. Una hermana suya
a sister his
d. Está de mal humor.
is-3sg of bad temper

Basically the same things one is said to have in English, one is said to have in Spanish. Or in other languages, for that matter, as illustrated in (4).
(4) a. Vai: Nkun ? be.
my head exists
"I have a head."
b. Turkish: Bir ev-im var
a house-mine is
"I have a house."
c. Mongol: Nadur morin buy
to me a horse is
"I have a horse."
d. Ewe: Ga le asi-nye
money is in-my hand
"I have money."

I have chosen the instances in (4) from unrelated languages which exhibit superficial differences with both English and Spanish (for example, they do not involve have). Even so, the possessed elements here are hardly surprising. And as Benveniste (1971: 171) puts it, at "the other end of the world" (Tunica) there is a class of verbs that must carry prefixes of inalienable possession, and express emotional states (shame, happiness), physical states (hunger, cold), or mental states (knowledge, impressions). No such morphological manifestation exists in Spanish, but observe the examples in (5), which simply reiterate Du Marsais's point.

(5) Juan tiene …
EMOTIONAL STATE: vergüenza "shame", alegría "happiness"
PHYSICAL STATE: hambre "hunger", frío "cold"
MENTAL STATE: conocimiento "knowledge", impresión "impression"
If the conceptual agreement between pre-Columbian inhabitants of Louisiana and their brutal conquerors is an accident, this can be no other than the human accident.
In sum – and this is what should worry us as grammarians – there is no obvious way we have of defining possession without falling into vicious circularity. What expressions are capable of appearing in the context of have and the like? Possessive expressions. What are possessive expressions? Those that appear in contexts involving have and the like. So at the very least inalienable possession appears to be a cognitive notion, seen across cultures with minimal variations. Still, what does this mean?

Consider an example taken from a famous commercial, the punch line of which reads as in (6).

(6) I want the soldier that belongs to this beer to step forward!

The individual in question is no other than John Wayne himself, which raises this question: what might the nature be of that improbable beer that happens to own the duke? Is that serious possession or military talk? Perhaps the latter, but the Spanish examples in (7) suggest otherwise.

(7) a. El oro tenía forma de anillo
the gold had form of ring
b. El anillo tenía (9g de) oro
the ring had (9g of) gold
(8) a. La ciudad tenía (estructura de) barrios
the city had (structure of) neighborhoods
b. Los barrios tenían lo peor de la ciudad
the neighborhoods had the worst of the city

The point of (7) and (8) is that, to some extent, they manifest an inalienable possessive relation and its inverse, with roughly the same syntax. Granted, these examples do not have the perfect symmetry of the John Wayne case in (6), but this may be a relatively low-level fact. Once we abandon the specific expression of possession through have or similar elements, we find (9)–(10).

(9) a. El peso de un kilo
the weight of one kilo
a′. Un kilo de peso
one kilo of weight
b. Una concentración de 70°
a concentration of 70°
b′. 70° de concentración
70° of concentration
(10) a. Una organización de subgrupos
an organization of subgroups
a′. Subgrupos de organización
subgroups of organization
b. Un ensamblaje de partes
an assembly of parts
b′. Partes de ensamblaje
parts of assembly
We could, of course, claim that (9) and (10) are not really possessive. It is unclear what that means, though, in the absence of an ontological notion of possession. Syntactically, we can say such things as the organization had subgroups or the subgroups had organization, as much in Spanish as we can in English; and certainly, there is a characteristic inalienability to all of the notions in (9) and (10). One can retreat, then, to saying that the organization the subgroups have is not the same organization that has the subgroups – but apart from hair-splitting, this is far from obvious. For the bottom line is: "are we more justified in saying this substance has form than we are in saying that this form has substance?" And if these are both grammatical, are we always going to insist on the opaque claim that the form this substance has is not the same as the form that has this substance?

There is a different way to proceed. Suppose we agree that all of the above – form and substance, organization and subgroups, concentration and degrees, and even John Wayne and his temper, his horse or even his beer – stand in a yet-to-be-determined Relation R, which in fact number 3 and number 1.2, or writers and readers, for some reason do not stand in. Crucially for our purposes, however, that Relation R has nothing to do, in itself, with the subject or object positions of a verb have or a preposition like of. Quite the opposite: an intricate syntax carries the terms of Relation R to either subject or object of the relevant syntactic expressions. Are there advantages in making such a claim?
2 Every term can be relational

First, I find it interesting that we cannot confine relational problems to so-called relational terms like sister. The minute we extend our coverage to part-whole possessions, just what is not a part of something else? Other than the lexical entries for God, all terms in our lexicon have to be thought of as relational. This immediately suggests we are missing a generalization that should be placed outside the idiosyncratic lexicon.

Second, consider the intriguing facts in (11) and (12).

(11) a. The poor neighborhoods of the city
b. The city's poor neighborhoods
c. The city has poor neighborhoods
(12) a. A city of poor neighborhoods
b. *The/a poor neighborhoods' city
c. The poor neighborhoods are the city's

Note that the part-whole relation (city, neighborhood) is kept constant in all these examples. Traditionally, this was expressed in terms of neighborhood having two variable positions, one referential and one possessive. Now, in (11) and (12) we see the city and the neighborhoods each promoted to subject position (or, concomitantly, associated to the preposition of). This is as expected. What is not expected is that (12b) should be out.

One could try to blame that on the fact that, in this instance, the relational
term neighborhoods is relinquishing reference to the other term, city. But this surely cannot be a problem in itself given the perfect (11b), where it is city that relinquishes reference. One might then try to take reference as an indication that, to begin with, we should not have compared the two paradigms; this, of course, would be because (11) invokes reference to neighborhoods, whereas (12) does only to city. If reference is an intrinsic property of a word, is this not mixing apples and oranges?

Keep in mind, however, the central fact that in both (11) and (12) the R relation between city and neighborhoods is constant, crucially regardless of the ultimate reference of poor neighborhoods of the city or city of poor neighborhoods. If we demand that these two have nothing in common, the implied lexicon is going to be even uglier, for now we need two relational terms, neighborhoods and city, since each of these can project its own structure and be related to something else. This is worse than before. We needed to say that all terms in the lexicon are relational, but now we have to sortalize Relation R: the way a city relates, as a whole, to a neighborhood is different from how it relates, as a part, to a state. And never mind that: the greatest problem still is why on earth (12b) is impossible.

I doubt this is a quirk, given that the very same facts hold in Spanish (as well as other languages), as shown in (13)–(14).

(13) a. Los barrios pobres de la ciudad
the neighborhoods poor of the city
b. Sus barrios pobres (los de la ciudad)
its neighborhoods poor (those of the city)
c. La ciudad tiene barrios pobres
the city has neighborhoods poor
(14) a. Una ciudad de barrios pobres
a city of neighborhoods poor
b. *Su ciudad (la de los barrios pobres)
*their city (that of the neighborhoods poor)
c. Los barrios pobres son de la ciudad.
the neighborhoods poor are of the city

Indeed, the facts are extremely general, as (15)–(16) show.

(15) a. Los brazos largos de un calamar
the arms long of a squid
b. Sus brazos largos (los del calamar)
its arms long (those of-the squid)
c. El calamar tiene brazos largos.
the squid has arms long
(16) a. Un calamar de brazos largos
a squid of arms long
b. *Su calamar (el de los brazos largos)
*their squid (that of the arms long)
c. Los brazos largos son del calamar.
the arms long are of-the squid
Again, there are differences of detail between languages. For example, Spanish does not realize non-pronominal noun phrases in pre-nominal position, as does English; and English uses the expression a long-armed squid, with noun incorporation, for the corresponding Spanish "a squid of long arms" (16a). But neither language allows a form such as *the long arms's squid or *their squid (16b), meaning the one with long arms. Needless to say, the syntactician must also predict this surprising gap, but it is precisely here that syntax works.

Consider in this respect the syntax in (17), the structure discussed by Kayne (e.g. 1994) and Szabolcsi (e.g. 1983) in recent years, presented in the more accurate guise in Chapter 9. The main difference with Kayne's structure is that, instead of bottoming out as an AgrP, (17) is built from a small clause, which is designed to capture Relation R in the syntax. Although I keep Kayne's Agr intact, I think of this position as a referential site because, as it turns out, whatever moves to its checking domain determines reference. If we are to be technical, the moved item is assigned a referential feature that is attracted to the checking domain of Agr. This means the derivations in (18) start from different lexical arrays, which is as expected: despite the obvious parallelism, the terms of the relations differ, at least, in definitude. Nevertheless, what is important is that the two expressions in (18) have the same, as it were, pseudo-thematic structure, and hence code the same Relation R.
(17) [DP [D' D [AgrP [Agr' Agr [SC city neighborhoods]]]]]

(18) a. [DP [D' D [AgrP city[r] [Agr' [Agr of] [SC t neighborhoods]]]]]
b. [DP [D' D [AgrP neighborhoods[r] [Agr' [Agr of] [SC city t]]]]]
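Procedurally, and purely as an illustration (the representation below is a simplification of ours, not the structure in (17) itself), the [r]-driven promotion can be rendered in Python:

def promote_to_agr(sc_subject, sc_predicate, referent):
    """Raise `referent` (one of the two SC terms) to Spec,AgrP, where it
    checks the referential feature [r]; Agr spells out as 'of'."""
    assert referent in (sc_subject, sc_predicate)
    remnant = sc_predicate if referent == sc_subject else sc_subject
    string = f"{referent} of {remnant}"    # linearization: Spec > Agr > SC
    return {"string": string, "refers_to": referent}

print(promote_to_agr("city", "neighborhoods", referent="city"))
# {'string': 'city of neighborhoods', 'refers_to': 'city'}  cf. (18a)/(12a)
print(promote_to_agr("city", "neighborhoods", referent="neighborhoods"))
# {'string': 'neighborhoods of city', 'refers_to': 'neighborhoods'}  cf. (18b)/(11a)

Whichever term checks [r] fixes the reference of the whole DP, while Relation R itself is held constant by the small clause.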
Now observe the interesting movements, as in (19).

(19) a. [DP neighborhoods [D' 's [AgrP city[r] [Agr' Agr [SC t t]]]]] (nested paths)
b. [DP city [D' 's [AgrP neighborhoods[r] [Agr' Agr [SC t t]]]]] (crossing paths)
Note that this time we involve the specifier of D. I think this is to check spatial contextual features, although this is really not crucial now. As is customary, we take the genitive ’s to materialize as the head of D when its specifier is occupied, and following Kayne, we do not lexicalize Agr when it does not support lexical material, presumably because it incorporates to the D head. But these are all details. What matters is the shape of the movement paths. The ones in (19b) cross, while those in (19a) are nested. One possible account for the situation in (19) is in terms of the Minimal Link Condition. Intuitively, the head D cannot attract neighborhoods in (19a) because city is closer. But the main point is that, whereas the movements as shown in (19) are meaningfully different, we cannot say much about the relevant lexical correspondences – which would all seem to be licit. This simply means that it pays off to place Relation R in the syntax, contra the traditional assumption that views it as merely lexical.
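For concreteness, the Minimal Link Condition logic just sketched can be mechanized. The following is a toy rendering of our own, with "closeness" crudely reduced to depth below the attracting D:

MLC_POSITIONS = {"city": 1, "neighborhoods": 2}   # smaller = closer to D

def attract_to_spec_d(target):
    """Return True if D may attract `target`; attraction is blocked when
    some competitor is strictly closer to D (the Minimal Link Condition)."""
    competitors = [x for x in MLC_POSITIONS if x != target]
    blocked = any(MLC_POSITIONS[c] < MLC_POSITIONS[target] for c in competitors)
    return not blocked

print(attract_to_spec_d("city"))           # True  -> (19b), "the city's poor neighborhoods"
print(attract_to_spec_d("neighborhoods"))  # False -> (19a), *"the poor neighborhoods' city"

Crude as it is, the sketch records the asymmetry: the SC subject city always outranks the SC predicate neighborhoods for attraction by D.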
3 The syntax of possession

We have, then, both conceptual and empirical reasons to suppose not only that the Kayne/Szabolcsi syntax for possession is right, but furthermore that this is where possession itself, as a semantic Relation R, is encoded – instead of lexically, through relational terms. Ontologically, this is very interesting, since we now have to ask under what circumstances two terms enter into the pseudo-thematic relations involved in the small clause under discussion. We want the part (for instance, neighborhood) to be the small clause predicate, and the whole (for instance, city) to be the small clause subject, but why? Why not the other way around? And is this general? Likewise, if whatever moves to the Agr domain determines reference, what is the nature of reference? Is reference always coded in this relativistic manner?

If these questions seem too troubling, note that we can propose a very transparent semantics for the objects in (13)–(14).
(20) a. [∃e: city(e)] T1(city, e) & T2(neighborhood, e)
b. [∃e: neighborhood(e)] T1(city, e) & T2(neighborhood, e)

We can think of each term of the small clause as satisfying some sort of primitive pseudo-role (T1 or T2), of Agr as the lexicalization of an event variable, and of D as a quantificational element. The small clause determines the pseudo-thematic properties of the expression, much as a verb phrase determines the thematic properties of a sentence; the primitive pseudo-roles T1 and T2 do not seem, a priori, any more or less worrisome than corresponding verbal roles like AGENT or THEME. In addition to pseudo-thematic or lexical structure, the functional structure of the expression determines referential and quantificational properties, through a variable site and a quantificational site. This fleshes out Szabolcsi's intuition that nominal and verbal expressions are structurally alike.

Architectural matters aside, though, we have to worry about the internal make-up of the small clauses. It is not enough to think of T1 and T2 as, say, WHOLE and PART roles, since we have seen the need for other inalienable relations. Of course, we could simply augment our vocabulary for each new sort of relation we find – but that is hardly illuminating. The bottom line is whether there is anything common to the three sentences in (21).

(21) a. El vino tiene 12°.
the wine has 12°
b. La organización tiene varios subórganos.
the organization has several sub-organs
c. La gente mediterránea tiene muchos parientes.
the people Mediterranean has many relatives

(21a) expresses a relation between a mass term and the measure of its attribute of concentration; (21b), the relation between a count term and the expression of its classifier of structure; (21c), the relation between an animate term and a specification of its kinship. Given their syntactic expression, these would all seem to be manifestations of Relation R – but what does that mean?

Note, also, the facts in (22).

(22) a. El vino tiene 12° de concentración.
the wine has 12° of concentration
b. La organización tiene varios subórganos de estructura.
the organization has several sub-organs of structure
c. ? La gente mediterránea tiene muchos parientes de familia.
? the people Mediterranean has many relatives of family

Each of the possessed elements in (21) can show up with an associate term which demarcates the type of possession at stake. Curiously, this term can be promoted, as shown in (23) (see (9)–(10) above).

(23) a. El vino tiene una concentración de 12°.
the wine has a concentration of 12°
b. La organización tiene una estructura de varios subórganos.
the organization has a structure of several sub-organs
c. La gente mediterránea tiene familias de muchos parientes.
the people Mediterranean has families of many relatives

Again, the expressions in (23) do not mean the same as those in (22). However, Relation R is kept constant in either instance. The examples in (23) are also significant in that any naive analysis of their syntax will make it really difficult to establish a thematic relation between the matrix subject and what looks like the complement of the object. Plainly, thematic relations are not that distant. Fortunately, the Kayne/Szabolcsi syntax gives us (24).

(24) a. [D' D [Agr' Agr [SC concentration degrees]]]
b. [D' D [Agr' Agr [SC structure organs]]]
c. [D' D [Agr' Agr [SC family relatives]]]
Depending on what gets promoted, T1 or T2, we will find the same sort of distributional differences we already saw for (18) and the like. In turn, what the possessors in (22) and (23) – wine, the organization, and Mediterranean people – possess is the entire structure in (24), whatever it is that it refers to in each instance. That way, the thematic relation is as local in (23) as in (22) or (21), directly as desired.

A related question that we must also address is when auxiliary have appears, and when it does not. Compare (23) to (25), which involves, instead, auxiliary be.

(25) a. El vino es de 12°.
the wine is of 12°
b. La organización es de varios subórganos.
the organization is of several sub-organs
c. La gente mediterránea es de muchos parientes.
the people Mediterranean is of many relatives

The structure of these examples is very transparent, as in (26). This is possessor raising, of the sort seen in many languages. Of course, possessor raising is also at issue in similar examples involving have. According to Kayne's analysis, a derivation of the form in (26) is involved in the paradigm of (21) to (23), with an associated D-incorporation to be. This, in the spirit of Freeze (1992), is in fact what causes be to spell out as have. If so, what is really the difference between (26), yielding (25), and a similar derivation yielding (21)? Why does only one involve D-incorporation, resulting in auxiliary have?
(26) POSSESSOR be … of POSSESSED
[V' be [DP [D' D [AgrP POSSESSOR[r] [Agr' [Agr of] [SC t POSSESSED]]]]]]
There had better be some difference, because there is actually a rather important consequence for meaning in each structure. Thus, compare the examples in (27).

(27) a. La ciudad es de barrios pobres.
the city is of neighborhoods poor
b. La ciudad tiene barrios pobres.
the city has neighborhoods poor

(27a) tells us what kind of city we are talking about: a poor city. However, (27b) tells us what kinds of neighborhoods the city has. Some are poor. It could be – and in fact there is an invited implicature to this effect – that the city also has rich neighborhoods. No such implicature is invited in (27a), and the existence of rich neighborhoods in the scenario that (27a) makes reference to is contradictory with what the proposition asserts.

These sorts of contrasts indicate a different derivation for be cases, as in (26), and for have cases, as in (28). Note, first, that this derivation is consistent with the fact that we can say in English the city has poor neighborhoods in it. We may follow Kayne in taking in – a two-place relation – to be one of the possible realizations of the D trace, redundantly spelled out as a copy when the spatial context of [Spec, DP] forces agreement with it. Likewise, the clitic it spells out the trace of the raised possessor in the specifier of Agr.1 In both of these suggestions, I am assuming that movement is an underlying copying process, which may or may not involve trace deletion depending on linearization factors (as in Nunes 1995).
(28) POSSESSOR [be+D] (have) … POSSESSED …
[V' be [DP POSSESSED[c] [D' D [AgrP t[r] [Agr' Agr [SC t t]]]]]]
([r] = reference; [c] = context)
But of course, the most pressing question is why (28) does not violate the Minimal Link Condition, just as (19a) did. Arguably, this is because the structure in (28) involves the Freeze-type incorporation that results in have. The general import of this sort of incorporation is to collapse distance within its confines. If so, the major difference between (19a) and (28) is that only in the latter are the terms related to the Agr/D skeleton equidistant from higher sites, in the sense of Chomsky (1995b: Chapter 3) (though with differences of detail that I will not explore now). Of course, the Freeze incorporation is not done in order to salvage a derivation. It is just a possibility that Universal Grammar grants, by distributing appropriate matching features to the relevant lexical items – for instance, affixal features to the Agr and D that incorporate. Without the necessary combinations of features, the alternative derivations terminate along the way, and are thus not valid alternatives.

At any rate, why does this syntax entail the appropriate semantics? I have suggested in passing that the element that moves to the specifier of D codes contextual confinement for the quantifier in D; then we expect that movement through this site would have important semantic correlates. Concretely, in (28) the possessed serves as the context where Relation R matters, whereas the possessor determines R's reference. Concerning poor neighborhoods, the city has that, but there is no reason to suppose the city does not have something else. This is different from the semantics that emerge for (26), where the possessor, a city, may serve to confine the context (although in the diagram I have not represented the possessor moving to [Spec, DP]). In this instance, the referent of R and the context confiner of the quantification over events which are structured in terms of R are one and the same. Differently put, this is a decontextualized Relation R. Regardless of context, the city is endowed with poor neighborhoods. In other words, poor neighborhoods is a standing characteristic of the city, in the sense of Chapter 11.
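The equidistance rationale can be added to the earlier toy model. Again this is only an illustrative sketch of ours, on the crude assumption that incorporation simply collapses the two Spec positions onto one depth:

def may_raise_possessor(d_incorporates_into_be):
    """Freeze-type incorporation of D into 'be' collapses distance, making
    Spec,DP and Spec,AgrP equidistant from the matrix subject position."""
    spec_dp, spec_agrp = 1, 2          # depths below the matrix subject
    if d_incorporates_into_be:
        spec_agrp = spec_dp            # equidistance: the two Specs collapse
    # the possessor, launching from the Spec,AgrP region, may skip Spec,DP
    # only if the skipped landing site is not strictly closer
    return not (spec_dp < spec_agrp)

print(may_raise_possessor(True))   # True  -> the have-derivation (28) converges
print(may_raise_possessor(False))  # False -> crossing Spec,DP violates the MLC

Without incorporation the derivation aborts, just as (19a) did; with it, the possessor's crossing movement goes through, and be spells out as have.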
4 Paradigmatic gaps

I trust that these derivational gymnastics have the effect, at least, of confirming that some serious syntax is plausibly involved in possession. But now we have obviously opened Pandora's box. If indeed matters are so transparently syntactic as I am implying, why are there any gaps in the paradigms?

(29) a. El kilo de carne que corte Shylock deberá de ser exacto.
the kilo of flesh that cut Shylock must be exact.AGR
a′. El kilo de carne que compres deberá de ser tierna.
the kilo of meat that buy-you must be tender.AGR
b. El grupo de truchas que estudio es interesantísimo.
the group of trouts that study-I is interesting.SUP.AGR
b′. El grupo de truchas que ví eran alevines.
the group of trouts that saw-I were young.AGR
(30) a. El carro de leña que traigas deberá de estar engrasado.
the cart of wood that bring-you must be oiled.AGR
a′. *El carro de leña que traigas debe de estar seca.
the cart of wood that bring-you must be dry.AGR
b. Una bandada de pájaros está muy organizada.
a flock of birds is very organized.AGR
b′. *Una bandada de pájaros están piando como locos.
a flock of birds are chirping like crazy.AGR
FROM BEING TO HAVING
b.
Los pájaros tenían una bandada de organización. the birds had a flock of organization
Although all possessive relations can be expressed in the pedantic guise of indicating not just a certain measure or classifier of the possessor, but also the type of measure or classifier this is, a reversal of this expression is possible only with canonical measures or classifiers, as shown in (31), but not otherwise (cf. 32). Observe also the curious facts in (33) and (34). (33) a. (gramos de) oro con/tiene(n) *(forma de) anillo. (grams of gold with/have(pl) form of ring b. (*forma de) anillo con/tiene (gramos de) oro. (*form of ring with/has grams of gold (34) a. (conjunto de) truchas con/tiene(n) *(estructura de) grupo. (set of trouts with/have(pl) structure of group b. (*estructura de) grupo con/tiene (conjunto de) truchas. (*structure of group with/has set of trouts Why, together with gold with the form of a ring or trouts with the structure of a group, can we not say *gold with ring or *trouts with group? Why do we need to specify the notions “form” or “structure”? Conversely, we may say grams of gold with the form of a ring or set of trouts with the structure of a group, but not *form of ring with gold or *structure of group with trouts. Here what we cannot do is specify notions like “form” or “structure,” though they seem to be semantically appropriate. Note also that have is involved in the examples, which signals a derivation like (28). Curiously, though, the examples in (33) and (34) involve raising of the possessed (ring or group), instead of the possessor, as we saw in (28). I believe this is possible because of the Freeze incorporation, which leads to the spelledout have, and which should allow either term of Relation R to be promoted to subject position. To clarify the possibilities that this allows, I present the diagram in (35), with accompanying explanations.
205
DERIVATIONS
(35) a.
b.
POSSESSOR [D] V' be
POSSESSOR [D] V'
DP
be
DP
POSSESSED D' [c] D
D'
t [c] D
AgrP t [r]
POSSESSED Agr' [r]
Agr' SC
Agr t
t
t
c.
d. POSSESSED [D] V'
DP
be
D'
t [c] D
AgrP t [r]
AgrP
POSSESSOR Agr' [r]
Agr' SC
Agr
t
DP
POSSESSOR D' [c] D
SC
Agr
POSSESSED [D] V' be
AgrP
t
SC
Agr t
t
t
Explanation of diagram

First, we distribute features: [r], a referential feature; [c], a contextual feature; and [D], the Extended Projection Feature that makes something a subject. Observe how all items marked [D] are promoted to subject position (the top element in the structure); how the items marked [c] move to or through the contextual site, by assumption [Spec, DP]; and how the items marked [r] move to or through the referential site, [Spec, AgrP]. Needless to say, I am assuming
that different elements may involve different features, sometimes two of them. Now, this should allow for more possibilities than are, in fact, attested: the possible combinations are as in (36), but only some are fully grammatical.

(36) a. *(grs of) gold with (the form of) a ring in it/them.
a′. ? (set of) trouts with (*the structure of) a group in it/them.
b. (grs of) gold with *(the form of) a ring (*in it/them).
b′. (set of) trouts with *(the structure of) a group (*in it/them).
c. (*form of a) ring with (grs of) gold in it.
c′. ? (*structure of a) group with (a set of) trouts in it.
d. ! (form of a) ring with (grs of) gold (*in it).
d′. ! (structure of a) group with (a set of) trouts (*in it).
Let me abstract away from the merely questionable status of some of these examples, concentrating only on relative judgments vis-à-vis completely ungrammatical instances. In turn, observe how the examples marked with an exclamation mark are possible strings of words – but this is arguably a mirage. I use the in it lexicalization of traces as an indication of the structure that concerns us now; (35b) and (35d) do not allow for such a lexicalization, since the surface grammatical object lands in [Spec, AgrP]. In contrast, examples with the structure in (35c) have the desired output format, in addition to the curious raising of the possessed element. For reasons of space, I will not examine all of the structures in (35) in any detail. The main point I am trying to raise is a simple one, though. Syntax alone does not predict the contrasts in (36) – at least I have not been able to determine how it could.
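The overgeneration point can be made mechanical. The sketch below (ours, and deliberately naive, assuming the feature assignments recovered from (35)) enumerates the four distributions; nothing in it derives the judgments in (36), which is exactly the point:

# The four feature distributions behind (35a-d): which term checks [D]
# (becoming the subject) and which checks [r] in Spec,AgrP; the remaining
# term checks [c] in Spec,DP. All four converge syntactically, so the
# contrasts in (36) must come from elsewhere (see section 5).

CASES = [("POSSESSOR", "POSSESSOR"),   # (35a)
         ("POSSESSOR", "POSSESSED"),   # (35b)
         ("POSSESSED", "POSSESSED"),   # (35c)
         ("POSSESSED", "POSSESSOR")]   # (35d)

for label, (subject, r_checker) in zip("abcd", CASES):
    c_checker = "POSSESSED" if r_checker == "POSSESSOR" else "POSSESSOR"
    print(f"(35{label}) subject[D]={subject}, [r]={r_checker}, [c]={c_checker}")

The filter on these converging derivations is what the generalizations of the next section are meant to supply.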
5 Toward a semantics for possession

Nevertheless, some intriguing generalizations can be abstracted from (36), in two separate groups. Those concerning referentiality are in (37).

(37) I. In possessive structures mass terms are not referential.
II. A possessed T2 can be a subject only if referential.

If (37I) is correct, (35a) and (35d) are not viable derivations for mass terms in possessor guise, since these derivations would leave the referential Agr unchecked – a mass term being improperly forced into a referential site. This describes the ungrammaticality of (36a) and (36d). In turn, if (37II) is true, the movement to subject position in (35d) must be impossible, a non-referential possessed element ending up as subject; correspondingly, (36d) must be ungrammatical. The generalizations in (38) concern the possible formats of possessed terms.

(38) I. When the possessed T2 is manifested in the referential site, it must be typed with an overt marker.
II. Elsewhere, the possessed T2 may not be overtly typed.
As we already saw, the terms of Relation R may surface in purely lexical guise (as gold or trouts), or through the more detailed expression of their extension (as some measure of gold or some set of trouts). In fact, even in its bare guise, a noun like gold in our examples really means some measure of gold, just as group means a structure of a group, and so on. In any case, these manifestations are generally possible, occasionally obligatory, and occasionally impossible. Curiously, the possessor term T1 of Relation R has no obvious restrictions. In contrast, (38I) describes obligatory manifestations of the possessed term T2, as in (36b) and (36b′); and (38II) describes impossible manifestations of the possessed T2, as elsewhere in the paradigm. In other words, it is mostly T2 that is responsible for the idiosyncrasies in (36). This might help us understand Relation R.

I have not really said much about what Relation R is. The question is very difficult. However, given the generalizations in (37) and (38), it seems as if T2, the second term of R, is less innocent than the semantics in (20) would lead us to believe. There, the possessed T2 is taken as a pseudo-role, just as the possessor T1 is. However, we now have reason to believe that T2 is special. For example, when T2 is promoted to a grammatical site where reference appears to be necessary, we must accompany this element with a grammatical mark that overtly marks its type, like set, for instance. Otherwise, we in fact cannot mark the type of T2. This would make sense if T2 is itself the kind of element whose general purpose is to tell us something about the more or less abstract type of an expression, a kind of presentational device for an otherwise unspecified conceptual space. The idea is expressed as in (39).
(39) [Diagram, not reproduced: a raw MENTAL SPACE being folded – measured, shaped – by a PRESENTATION operation.]
Forgive my origami metaphor. The intention here is to talk about a raw mental space which gets measured, shaped, or otherwise topologically transformed, by way of a presentational operation. If this view of Relation R is on track, then T1 and T2 have a very different status. T2 is really an operation on T1, and the semantics in (20) would have to be complicated to capture this picture, a formal exercise that I will not attempt now (though see Chapter 14). The intuition is that generalization (38II) is the default way of realizing T2. What we see then in (38) is the Paninian Elsewhere Condition at work. When in referential sites, presentational device T2 is forced out of its canonical realization. In these contexts, T2 surfaces in the specific format that makes its nature explicit, as a set, or whatever.

This way of looking at T2 has nice, independent consequences. (40) is constructed so as to allow for a plausibly ambiguous quantifier interaction, while at the same time not invoking an irrelevant specific reading on the possessor.

(40) By 2010, most women in Utah will have had two husbands.

The example invokes reference to two husbands per woman, in a country that allows divorce. The alternative reading is a priori equally plausible in a state that allows polygamy, and where divorce is infrequent. However, the possessed term does not like to take scope over the possessor. We may account for this if the inalienably possessed element, a presentation device in the terms of (39), is frozen in scope because it is a predicate of sorts. This is an old intuition that squares naturally with the syntax of small clauses that we are assigning to the terms of the R relation, where T2 is a predicate.
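To see the asymmetry schematically – this is merely an illustrative rendering of the two candidate scopes in the restricted-quantifier notation used later in this book, suppressing the event variable – the two readings of (40) would be:

[most x: woman(x)] [two y: husband(y)] have(x, y) (available: two husbands per woman)
[two y: husband(y)] [most x: woman(x)] have(x, y) (unavailable: the same two husbands)

The claim, then, is that the possessed T2 never gets to occupy the position of the first, wide-scope quantifier in the second schema.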
The other aspect of the generalizations concerning (36) that I find interesting is the fact that mass terms are not referential in possessive constructions. I do not know why that is, but I think it correlates with another fact illustrated in (41). The Spanish example in (41) shows the relevant grammatical ordering when more than one R relation is involved. Crucially, alternatives to it, such as (42), are ungrammatical.

(41) animal de 100 kgs (de peso) con varios órganos (de estructura)
animal of 100 kgs (of weight) with several organs (of structure)

(42) *animal de varios órganos (de estructura) con 100 kgs de peso
*animal of several organs (of structure) with 100 kgs of weight

This suggests a structural arrangement along the lines of (43).

(43) [Tree diagram, not reproduced: R′ relating the constituent [R [animal] [100 kgs (of weight)]] to [various organs (of structure)].]
Syntactically, (43) corresponds to the structure in (44).

(44) [Tree diagram, not reproduced: a small clause SC′ whose SUBJECT′ is itself the small clause [SC [SUBJECT animal] [PREDICATE 100 kgs (of weight)]], and whose PREDICATE′ is [various organs (of structure)].]
If this much is granted, we have the possibility for a recursive structure, with potentially various levels of embedding, each corresponding to some type-lifting, whatever that means.
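To make the recursion concrete – a schematic illustration of ours, not an attested example – a further R relation would simply take the whole of (44) as its subject:

[SC″ [SC′ [SC [animal] [100 kgs (of weight)]] [various organs (of structure)]] [SOME FURTHER PREDICATE]]

with each added layer again relating the lower constituent, as subject, to a new presentational predicate.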
I must emphasize that (41) is a simple piece of data, and (44) a simple structure to describe it. Had (42) been grammatical, we would have needed (45) instead of (44).

(45) [Tree diagram, not reproduced: a small clause SC′ whose SUBJECT′ is [SC [SUBJECT animal] [PREDICATE various organs (of structure)]], and whose PREDICATE′ is [100 kgs (of weight)].]
But (42) is ungrammatical, and we should make much of this. We should because it makes no sense to blame the hierarchy in (44) on any sort of outside reality. Surely (44) looks Aristotelian, with substance coded logically prior to form, but there is no physical basis for this, or any similar distinction. Likewise, it makes no sense to blame (44) on any effective reality, of the sort presumably involved in how our species evolved. We have no idea what it would mean for us to code the world in terms of the alternative (45), simply because we do not. All we know is we have (44), with or without a reason. That is enough for someone who is concerned with how the mind is, and pessimistic about finding how it got to be so. In these terms, a real question is how (44) is used to refer; apparently, standard reference in possessive structures is a phenomenon that starts in the second layer of structure in (44). This is like saying that the second presentational device in (44) is responsible for individuation, an intriguing restatement of generalization (37I).
6 A word on standard possession

I cannot close without saying something about simple possession, having shown what I would like to think of as “ontological” possession. What are we to make of John Wayne simply having, say, a gun? Immediately, we can say this. Inasmuch as standard possession exhibits many of the syntactic properties of ontological possession (e.g. presence of have/with and similar elements), we should probably take this general sort of possession to be nothing but ontological possession as well. Needless to say, if we do that, we have to discover in which way standard possession is hiding some sort of ontological claim. Having freed ourselves from the optimistic view that a possessor is just the subject of have, and the possessed is just its object, what prevents us from thinking that, in John has a gun, the gun (yes, the gun) is actually ontologically in possession of something like a stage of John? I do not find (46) accidental in this respect.

(46) El arma está en manos de Juan.
the gun is in hands of Juan
“The gun is in Juan’s hands.”

The question, of course, is why we take the gun to be in John’s hands as a rough paraphrase for John having the gun. Examples like (46) suggest that the gun is ontologically related to something
which – for lack of a better word – we express in terms of a metaphor for a stage of John’s, his hands. This is important, because we surely do not want to say that the gun, in this instance at least, is ontologically related to the whole of John (or else we would be invoking, again, the sort of inalienable possession that we have seen so far). That is obviously just a speculation. I find it tantalizing, though, in one crucial respect. Once again the facts of language open a window into the facts of mind. Perhaps the small synecdoche we invoke in these instances, lexicalizing a part of an individual in place of one of its spatio/temporal slices, is no small indication of a surprising fact about standard possession, that it expresses an ontological, inalienable relation between what is alienably possessed and a spatio/temporal slice of what possesses it. At the very least, that is a humbling thought.
7 Conclusion

Let me gladly admit that much work lies ahead. The sketch in (44) is a syntactic structure corresponding to promissory semantics. Relation R may turn out to be a way of operating on a mental space of some sort, perhaps (somehow) lifting its dimensionality – but this is just a fancy way of talking about the topological little story in (39). Interestingly, although this may be thought of as a lexical semantics, it has to somehow make it into Logical Form, or else we will not predict the absence of scope interaction in (40). Basically, the possessed element associated to T2 does not take scope because it does not have the right type to be a scope-bearing element. Needless to say, (44) can be directly plugged into the Kayne/Szabolcsi syntax for possession, and may be seen as a generalization of their insight.

Philosophically, the main conclusion I would like to reach is perhaps not surprising for the linguist. The view that possession is out there in reality, and we code it trivially through little things like have, with, and all the rest, is mistaken. I also think it is wrong to think of possession as the manifestation of a lexical relation that certain terms allow. Possession is a kind of syntax, with well-behaved properties. Correspondingly, the semantics of possession seems to be a kind of presentational operation. If so, possession is just a cover term for something which may happen in various mental dimensions that embed within one another. Much as I would like to turn all of this into an argument against the capitalist dictum that derives being from having, I am satisfied with getting closer to an understanding of the distributional properties of possession, without blaming them on whim, metaphor or mistakes people make.
11 TWO TYPES OF SMALL CLAUSES
Toward a syntax of theme/rheme relations†
with Eduardo Raposo
1 Introduction

In this chapter we propose that the fundamental difference between “stage” and “individual” level predicates is not lexico-semantic and is not expressed in thematic/aspectual terms. We study the apparent differences between small clauses with an individual and a stage level interpretation (which are selected by different types of matrix verbs) and argue that these differences are best expressed by way of purely syntactic devices. In particular, we argue that what is at stake are differences in information (theme/rheme) structure, which we encode in the syntax through different mechanisms of morphological marking. There are no individual-level predicates, but simply predicates which in some pragmatic sense “are about” their morphologically designated subject (an idea already present in Milsark 1977). There are no stage-level predicates, but simply predicates which rather than being about their thematic subject are about the event they introduce. The distinction corresponds roughly to what Kuroda (1972) calls a categorical and a thetic judgment (a terminology we adopt). The former is about a prominent argument (for us, a CATEGORY), while the latter is simply reporting on an event.

A minimalist grammar encodes differences of this sort in terms of morphological features. These features are checked in a designated site F which interfaces with the performative levels, where aspects of intentional structure are expressed. Having argued for this syntactic account, the chapter proceeds to pose two related semantic questions. We deal with why it should be that categorical (individual level) predication introduces a standing characteristic of a CATEGORY, while thetic (stage level) predication introduces a non-standing characteristic of a standard subject argument. We also propose a line of research for why CATEGORIES should be “strong” in quantificational terms, while standard arguments may be “weak,” roughly in Milsark’s original sense. We suggest that these two semantic properties follow from, and do not drive, the syntactic mapping. Our approach, thus, is blind to semantic motivation, although it is not immune to semantic consequence. Our main motivation in writing this chapter is that this is the correct order of things, and not the other way around.
2 Types of predication in small clauses

Higginbotham (1983a) shows that Carlson’s (1977) distinction between individual-level and stage-level predication holds systematically even inside the simplest syntactic predication, the small clause. This raises an intriguing question, if small clauses (SC) are as proposed by Stowell (1981) (1), which leaves little room for expressing structural differences:

(1) [XP NP [XP Pred]]

Raposo and Uriagereka (1990) in specifically Carlson’s terms, and Chung and McCloskey (1987) in comparable terms, show systematic differences in the distribution of individual-level and stage-level SCs. Thus, only stage-level SCs can be pseudo-clefted (2), right-node raised (3), focus fronted (4), and be dependents of what . . . but . . . constructions (5). We illustrate this with Spanish, although the same point can be raised more generally in Romance and Celtic languages:1

(2) a. Lo que noto es [a María cansada].
what that note.I is to Maria tired
“What I perceive is Mary tired.”
b. *Lo que considero es [a María inteligente].
what that consider.I is to Maria intelligent
(“What I consider is Mary intelligent.”)

(3) a. Yo vi y María sintió a Juan cansado.
I saw and Maria felt to Juan tired
“I saw and Maria felt Juan tired.”
b. *Yo creo y María considera a Juan inteligente.
I believe and Maria considers to Juan intelligent
(“I believe and Maria considers Juan intelligent.”)

(4) a. Hasta a Juan borracho vi!
even to Juan drunk saw.I
“Even Juan drunk have I seen!”
b. *Hasta a Juan inteligente considero!
even to Juan intelligent consider.I
(“Even Juan intelligent do I consider!”)

(5) a. Qué iba a ver, sino a su padre borracho?
what went.he to see but to his father drunk
“What could he see but his father drunk?”
b. *Qué iba a considerar, sino a su padre inteligente?
what went.he to consider but to his father intelligent
(“What could he consider but his father intelligent?”)

Certain heads, such as perception verbs, take only stage-level SCs. Others, such as opinion verbs, take only individual-level SCs. Furthermore, the individual-level SC must be associated to the head selecting it throughout a derivation,
while this is not necessary for the stage-level SC, which can be displaced from the government domain of its head. So minimally a treatment of these matters must explain (i) how selection is done in these instances (i.e. what does one select if the structure is just (1)), and (ii) why the two types of SCs behave differently with respect to their dependency on the head that selects them (see the Appendix).
3 Some recent proposals

An approach taken for SCs by Iatridou (1990) and Doherty (1992), and more generally for other predicates by at least Diesing (1990), De Hoop (1992) and Bowers (1992b), builds on Kratzer’s (1988) claim that only stage-level predicates introduce an event argument position e (a line discussed as well in Higginbotham 1983a). But it is not obvious what this means for SCs.

The first difficulty arises because, although the different kinds of predication are easy enough to ascertain, it is not clear that there are pure individual-level or stage-level predicates. Thus, one can see or feel John finished as much as one can consider or declare John finished. In many languages John is finished may take a stage-level or an individual-level mark, such as an auxiliary or a given Case form in John. In fact one wonders whether the most rigidly individual-level or stage-level predicates (I saw him angry vs. ??I consider him angry; ??I saw him intelligent vs. I consider him intelligent) are so narrow because of purely pragmatic considerations.2 But pragmatics aside, the grammar must provide a way in which a regular predicate may be taken as either a standing or a transient characteristic of its subject. This of course is the traditional intuition, however we may end up treating it. So a Kratzer-type approach forces us to systematically duplicate the syntactic representation of predicates like finished, angry or intelligent. In Kratzer’s terms, this entails that all predicates are ambiguously selected into phrase markers as in (1): either with or without an extra argument. The syntactic expression of this systematic ambiguity is not without problems.

The intuition that all variants of Kratzer’s approach pursue is this. At D-structure the subject of an individual-level predicate is outside the lexical projection of this predicate. There are different ways of executing this, but mechanics aside, the question for SCs is clear. What does it mean for a subject to be outside of an SC in D-structure? SCs are not VPs. They are simple predication structures. To be outside of a SC is not to be part of the SC. So either our conception of these constructions as in (1) is incorrect, or else subjects for these elements are simply not outside of their domain. More generally, within current syntactic views and particularly in the Minimalist Program of Chomsky’s (1993b), all arguments are projected within the lexical domain of a word, since there is no level of representation to project them otherwise. That is, there is no D-structure to say that the argument Y of X is outside of the projection of X. If Y is an argument of X, Y starts within the X-shell associated to X.

Second, and more generally, it is unclear what it means for a predicate not to have a Davidsonian argument. The neo-Davidsonian project of Higginbotham (1985) and (1987) is rather straightforward about this. Clearly, Davidson’s original motivation for the event positions holds inside the simplest of SCs. Thus, you can consider Hitchcock brilliant, and raise the consideration vis-à-vis other Hollywood directors, only for his American movies, and putting aside his sexism. All of this can be predicated of the eventuality of Hitchcock’s brilliance, and it is unclear how to add these circumstances otherwise, short of falling into the poliadicity that worried Davidson and motivated event arguments to begin with.3

Third, empirical problems arise. Diesing (1990) argues that Kratzer’s approach is incorrect. Citing evidence from Bonet (1991), Diesing notes that in Catalan all subjects are VP internal, including subjects of individual-level predicates. Bonet’s argument is based on floating quantifiers, which following Sportiche (1988) she assumes originate VP internally. Floated quantifiers can be VP internal regardless of the nature of the predicate, as (6) shows:

(6) The pigs are all stout.

The floating quantifier in (6) tells us the underlying position of the subject, which must thus be internal to VP. To address this issue, Diesing (1990) proposes two types of Infl. Stage-level predicates have an Infl whose subject is base generated in VP, with raising being a possibility. Individual-level predicates have an Infl that assigns a θ-role to its Spec, with the import “has the property x,” x being expressed by the predicate. The NP in this Spec controls a PRO subject internal to VP, which gets the θ-role assigned by the V. The floated quantifier in (6) modifies the PRO in VP. Note that Diesing’s proposal alters the thematic relations by adding a θ-role to the structure. Each individual-level predicate that exhibits an adicity of n arguments is in effect of adicity n+1, with the “subject” involving systematically two arguments in a control relation, an overt NP and an extra PRO. Following our 1990 proposal that SCs involve an Agr projection, Diesing’s approach could be adapted to SCs as in (7):

(7) a. [AgrP NP [agr [XP PRO [XP IL Pred]]]]
b. [AgrP [AGR [XP NP [XP SL Pred]]]]

(We use the notation agr vs. AGR to distinguish each type of Infl.) Here the structure of the SC is invariant (as in (1)), and what changes is the structure that selects this SC (the agr/AGR head).

Difficulties arise for Diesing’s approach when extending it to SCs. The idea is incompatible with standard analyses of (8a), taken from a similar example in Rizzi (1986). The clitic me “to me” climbs from inside the predicate fiel “faithful” up to the matrix clause. Climbing is local, which follows from the ECP (Kayne 1991; Roberts 1994; Uriagereka 1995a). But if the clitic governs its trace in (8c), nothing prevents the PRO that Diesing hypothesizes from being governed from outside its SC:
(8) a. Juan me es (considerado) fiel.
Juan me is considered faithful
“Juan is considered faithful to me.”
b. __ es (considerado) [Juan [fiel me]]
c. … me … [AgrP NP [XP Agr [PRO [fiel t]]]]

That PRO is indeed (undesirably) governed when it is the subject of a complement SC is shown in the ungrammatical examples in (9). Whatever the ungrammaticality of governed PRO ultimately follows from, it is unclear why PRO in a Diesing-style (8c) would be allowed to be governed.

(9) a. John tried [[PRO to be intelligent]]
b. *John tried [[PRO intelligent]]
c. *it seems [that [John is intelligent]]
d. John seems [t (to be) intelligent]
e. *it seems [PRO (to be) intelligent]
Consider also (10), a Dutch example discussed by De Hoop (1992):

(10) Els zegt dat er twee eenhoorns intelligent zijn.
Els says that there two unicorns intelligent are
“Els says that two (of the) unicorns are intelligent.”

De Hoop notes that in (10) the individual-level subject is VP internal. These data, unlike Bonet’s, cannot be explained away by positing a PRO inside VP. The specifier of IP is taken by an expletive.4 (10) forces us to assume, contra Kratzer, that all subjects start internal to the predicate projection, and, contra Diesing, that there are no special thematic relations associated to individual-level predicates. Then, if something similar to the initial intuition is to be pursued, subjects of individual-level predicates must be forced out of the predicate projection in the course of the derivation. In the minimalist project, this conclusion is forced onto us. There are no levels of D-structure or S-structure. So if the distinctions noted in the literature are real, they must be expressed at (or by) LF. We discuss this next.
4 A more traditional approach

There are three proposals in the recent literature which we want to (freely) build on. In the spirit of Kuroda (1972) and Milsark (1977), Schmitt (1993, 1996) notes that individual-level predicates introduce a depiction of their subject, while stage-level predicates present their subject as a mere event participant (see also Suh 1992 for Korean). For Schmitt, these two are different in aspectual terms, the former lacking aspect entirely. In her analysis, θ-roles are not assigned in the absence of aspectual dependencies, and hence individual-level dependencies are pure predications while stage-level dependencies are n-adic relations of a thematic sort. Although we believe there is something predicative to individual-level dependencies which is not so clear in stage-level dependencies, we do not believe that this is to be expressed in terms of θ-roles missing in the first. Otherwise, we have to again posit a systematic ambiguity of predicates appearing in the individual-level or stage-level mode. For us all predicates are unique in having however many θ-roles they have, and if an extra predication of some sort is involved in individual-level instances, this is to be achieved in some other fashion.

For Herburger (1993a), which deals with the definiteness effect, it matters what the LF position of a subject is vis-à-vis the event operator’s. Although this is not axiomatic for her, in individual-level instances the subject has scope over the event operator, while in stage-level instances the event operator has scope over a weak subject, a matter that we ultimately build on. But Herburger’s individual-level and stage-level predications have the same initial phrase marker; thus, it is not possible in her system to select one or the other type of SC. Second, for her the LF site of scope-taking elements is a matter of QR. This raises a problem for subjects which are quantificational and take scope outside of the event operator. Something like everyone is available does not have to invoke an individual-level reading (see Section 7).

De Hoop (1992) concentrates on morphological ways of signaling the individual-level/stage-level distinction, and thus is prima facie interesting from a minimalist perspective. Her system is actually different from ours, and goes into a semantic typology which we need not discuss.5 Pursuing the idea that Case affects interpretation, we want to claim that subjects of individual-level and stage-level predicates are marked with a different form of Case. This recalls the well-known distinctions found in East Asian languages that present topic markers, and is very welcome in a system like ours where the LF mapping is driven by the presence or absence of given features.

The gist of our proposal builds on an intuition that both Kuroda (1972) and Milsark (1977) share. Individual-level subjects are what the sentence is about. More generally, (a sub-class of) topics are what the sentence is about. These “aboutness” subjects are highlighted by the grammar in a special way: a morphological case-marker, a phrasal arrangement, an intonational break, etc. We want to propose that this and nothing else is what IL-hood is, mere aboutness of a phrase which is appropriately (Case) marked. From this point of view the right split is not between individual-level and stage-level subjects. Objects too can enter into this sort of relation, as is known from examples like the non-contrastive Caesar, Brutus didn’t particularly like.6 This is the way in which the grammar allows us to talk about Caesar when this element is a grammatical object. Interestingly, strong restrictions apply in these topicalizations. For instance, Fodor and Sag (1982) discuss the impossibility of indefinite topics (??someone or other, Brutus didn’t particularly like). Also, this sort of topicalization is much more felicitous with states than with active events, particularly if these are specified for space/time (??Caesar, Brutus killed in the Senate yesterday morning). This strongly suggests that, in the spirit of Chomsky (1977a, 1977b), we should take topics to be subjects of a particular kind of predication, and that this predication has the semantic import of holding of its subject in a standing manner, that is irrespective of the events at which this subject participates. Of course this looks like a description of individual-level predication, but it is a description of topicalization instead. In sum, our intention is to construe individual-level predication as a sub-class of topicalization, itself a predication.

To distance ourselves from other uses of this term, we reintroduce old terminology. We assume that the grammar encodes relations between PREDICABLES and CATEGORIES of various sorts, and that these need not be expressed in neo-Davidsonian (event) terms. That is, we take Caesar, Brutus didn’t like to have the same eventive structure as Brutus didn’t like Caesar, although the former invokes an extra predication between the displaced Caesar and the open expression left behind. More generally, we take something like Brutus killed Celts to be ambiguous between the obvious statement about what happened (say, at Brigantium in the first century BC) and an aboutness statement concerning Brutus. That was what Brutus characteristically engaged in. In the latter instance, we take Brutus to be displaced to a position outside the scope of the event operator. In order not to confuse matters with terminology from a different tradition, we adopt Kuroda’s distinction between thetic (formerly, “stage-level”) and categorical (formerly, “individual-level”) predications. A stage-level predication is henceforth referred to as a thetic predication and an individual-level predication as a categorical predication.

It is important to note also that we take topicalization to involve a particular site. Chapter 5 and Uriagereka (1995a, b) argue for an F category encoding the point of view of either the speaker or some embedded subject, which serves as the syntactic interface at LF with the pragmatic systems.7 We assume topicalization is to F because there are many predications that take place inside a regular sentence, but we take only one of those to be the main assertion. For example, in our proposal, the main assertion in John likes everyone is NOT about everyone (that John likes them), but rather about John, the topic of the sentence (that he likes everyone). Basically, F is the designated position for the pragmatic subject which the main assertion is about, regardless of other presupposed predications.

We have just illustrated our account with normal predicates, but a similar approach is possible for SCs, assuming the structures we argued for in Raposo and Uriagereka (1990). As Doherty (1992) shows, different functional projections can introduce SCs. This is illustrated in (11) for Irish (we assume that although the facts may not be that obvious elsewhere, they still hold more or less abstractly, with the same syntax needed for (11)). Note that the subject of a thetic SC (11b) receives a different Case than the subject of a categorical SC (11a). The latter is accusative, a default realization in Irish, while the former is nominative. The Agr projection introducing each SC is different as well. In the thetic SC we have a strong agreement element, the particle ina containing a subject clitic, while in the categorical SC agreement is abstract (pronounceable only in identity predications). Auxiliary selection is different too: the categorical auxiliary is vs. the thetic auxiliary ta.
(11) a. Is fhear e.
is-cat man he-acc
“He is a man.”
b. Ta se ina fhear.
is-thet he-nom in-his man
“He is a man (now).”

Given these facts, several things follow. First, although SCs are always identical in structure, they are associated to two different sorts of Infl, in the spirit of Diesing’s (1992) distinction. It is these inflectional elements (whatever their nature) that are selected by different heads, thus solving the selection puzzle. Unlike Diesing’s Infls, though, ours do not introduce extra arguments, but simply entail two different forms of Case realization. The default Case associated to what we may call C(ategorical)-agr marks an LF topic, while the regular Case associated to an A(rgumental)-AGR does not. We assume that pragmatic considerations demand that sentences be always about something, and thus even when an argument is not marked with the appropriate features to be in topic position, something else must. We may think of thetic auxiliaries as the equivalent of topic markers for thetic predicates. Recasting traditional ideas, we assume that in this instance the predicate gains scope over the rest of the expression, which is thus, in a sense, about the predicate.8 From this perspective SCs are just the simplest instances where the system presented here operates.9

In the minimalist project, movements like the displacement of material for aboutness purposes need a trigger in terms of appropriate morphological features, and a target where the features are checked. For this we assume the F position, whose Spec is the landing site of aboutness phrases, among others. The appropriate features are assigned as in (12) below. Weak C-agr assigns C(ategorical)-CASE (12a), which is realized in the Spec of FP as a default Case (accusative in Irish).10 Strong A-AGR assigns a more standard A(rgument)-case (12b), which is realized in various forms in different languages. The latter is the usual element that signals a θ-dependency.

(12) a. C-CASE: [agrP __ [C-agr [XP NP [XP Pred]]]] (Categorical predication)
b. A-case: [AGRP __ [A-AGR [XP NP [XP Pred]]]] (Thetic predication)
Though ultimately this is a technical matter, we may need to relax the Visibility Condition as in (13b), since many languages mark displaced arguments just with a C-case, which must suffice for the trace of this argument to be visible at LF. In turn (13a) is added to restrict the kind of elements that can appear in a topic position. Intuitively, only those elements which are appropriately marked can raise there.
(13) a. For X a lexical argument of PREDICABLE Y, X is the subject of Y only if X is marked as a CATEGORY through C-CASE.
b. For X a lexical argument of Predicate Y, X is interpreted as an LF argument of Y only if X receives Case [either C-CASE or A-case].

To illustrate the mechanics, reconsider (11). In both examples, there is an SC [he [man]]. In (11b), where the SC is associated to AGR ina, “he” realizes nominative A-case (not C-CASE). This prevents the SC from being about a CAT se “he.nom,” given (13a). In contrast, in (11a), where the SC is associated with Agr, “he” receives C-CASE, a default accusative in Irish. The SC in this instance can be about a CAT e “he.acc.” But although Irish marks relations in this particular way, other variants are conceivable. The default mark of C-CASE may be nominative or a topic marker.11
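Schematically – this is our own summary rendering of the derivations just described, not a structure given above – the two Irish sentences in (11) check their features roughly as follows:

(11a) categorical: [FP e [F … [agrP t [C-agr [XP t [XP fhear]]]]]], with e checking C-CASE (default accusative) in [Spec, FP], the SC thereby being about it;
(11b) thetic: [AGRP se [A-AGR ina [XP t [XP fhear]]]], with se checking nominative A-case, the predicate rather than the subject gaining the relevant prominence.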
5 Some semantic questions

Given this syntactic system, we can show that our approach may have consequences for two well-known semantic issues. One is why subjects of categorical predicables do not tolerate weak quantifiers. We return to this in the Appendix, but note that from our viewpoint this must be the same reason why aboutness topics cannot be weak quantifiers. The second question is why categorical predicables are taken as standing characteristics of their subjects, while thetic predicates are transient.

As a point of departure for that second question, consider the proposal in Herburger (1993a). Following Higginbotham (1987), she assumes that all predicates, including Ns, come with an event variable.12 If at LF the subject of a predicate is inside the scope of the event operator, this operator binds both the variable of the main predicate and that of the N. Thus in a man is available the event operator binds the event variable of available and the event variable of man. This translates as there being an event of availability, at which a man is the subject:

(14) ∃e [available(e) & ∃x [man(x, e) & Subject(x, e)]]
TWO TYPES OF SMALL CLAUSES
deny there being an event of geniality of which Fischer is the subject. This is the logical form that would result from having an LF in which the event operator has scope over the subject, resulting in a thetic predication, and denying that. The question is, when we say (last night) Fischer wasn’t genial, is that contradictory with the statement Fischer is genial? The answer is surely “no,” but it is not clear what in the logical form yields this result. Thus, consider (16b), corresponding to the Spanish (16a) (we have replaced the champion for Fischer in order not to go yet into the semantics of names): (16) a. El campeón es genial pero no está genial the champion is-C genial but not is-T genial “The champion is genial but is not genial right now.” b. [[The x champ(x, e)] ∃e [genial(e) & Subject(x, e)]] & ⬃[∃e [genial(e) [The x champ(x, e)] & Subject(x, e)]] (16b) conjoins two statements in such a way that a contradiction ensues. The logic is clear. If geniality holds of the champion irrespective of his being in a given event (and that is what the first conjunct asserts), then geniality will hold of him at all events in which he participates (and that is what the second conjunct denies). However, Spanish speakers find (16a) sensible. Herburger (1993b) suggests that in this situation the first conjunct asserts something along the lines pursued by Chierchia (1986): the champion is generally genial. The contradiction then disappears. But this is not the way a predicate like genial is interpreted. To be genial you do not have to be generally that – most geniuses are rarely genial. It is unclear (and irrelevant) what counts in making someone genial. Whatever that may be, it is not obvious that (16) can be explained away without complicating the semantics suggested in (14)/(15) for the C/T distinction. To avoid the contradiction in (16) we must modify the more general, categorical statement (which is falsified by the more concrete thetic statement). Two ways come to mind. We change the predicate in these instances (which is what Herburger (1993b) suggests), or we instead change the subject (which is what we plan to do). That is, a (trivial) way to avoid the contradiction in (16b) is to have the subject in each instance be a different subject. Suppose that we have a fine-grained semantics that allows us to distinguish Fischer at a given event from Fischer at some other event or irrespective of the event he participates in. Then we could avoid the contradiction. Geniality holds of Fischer decontextualized (“one” Fischer), and lack of geniality holds of Fischer in the context of some event (“another” Fischer). Although syntactically straightforward, this approach may seem tricky for the semantics of names: by “splitting” Fischer this way we get into questions about what it means for “Fischer” to be a rigid designator. Chapter 12 addresses this matter, rejecting a treatment of Fischer as a mere constant or a label for an object, by introducing Spanish examples of the sort in (17c): 221
DERIVATIONS
(17) a. En España hay mucho vino. “In Spain there’s much wine.” b. En España hay mucho torero. “In Spain there’s much bullfighter.” c. Hay Miguel Indurain para rato porque aun queda mucho Indurain por descubrir. De todos modos, el Indurain que más me sigue impresionando es el contra-relojista. “There’s Miguel Indurain for a long time because there’s still much Indurain to discover. In any case, the Indurain that continues to impress me is the time-trialist.” Notice how the name Indurain in (17c) appears in the same sorts of contexts as the noun torero “bullfighter,” which in Spanish can be the contexts where mass terms like vino “wine” are possible (see Chapter 15 on this).14 Some of the difficulties raised by (17c), in particular, or our approach to (16) for that matter, can be addressed adapting ideas discussed by Higginbotham (1988), who builds on insights of Burge (1974). Contra an important philosophical tradition, rigidity is arguably not part of the nature of an expression, such as a name, but rather is a result of the linguistic system. In Higginbotham’s proposal, all predicates introduce an event variable and also a second order context variable. How this free variable is set is a matter that Higginbotham leaves purposefully open for speakers to determine. It is perhaps a process involving cognitive mechanisms not unlike those involved in the contextualization of “measured” mass terms, as the parallelism in the examples in (17) suggests. The point is, we need contextualized notions such as “bull-fighter” or “Indurain” as much as we need notions like “wine,” although presumably each of these is presented differently (e.g. individual terms vs. mass terms, etc.). If we are ready to distinguish Fischer at a given context and Fischer at some other context (or no special context) a question to raise is what makes that Fischer. For us, this rigidity concern makes sense only as a linguistic property of some expression. In fact, it is relatively similar to that of what it is to be known as (the kind) wine, (the kind) bullfighter, and so on. All that we need here is the assumption that speakers figure this out in concrete cognitive terms, however it is they do it. Furthermore speakers are capable of distinguishing that whatever it is that makes something what it is does not entail that it has to be so “unified” in all events, or (17c) would not make any sense. It is beyond the scope of this chapter to discuss a possible mechanism involved in these matters (though see Chapter 12). For our purposes here, it is of no concern what makes the two sentences in (16) be sentences about specifically Fischer or the champ. Since we take Fischer to be a predicate, we may assume that what makes Fischer Fischer is (perceived or imagined) “Fischerhood,”15 just as “winehood” makes wine wine, for instance. The theoretical significance of all of this for present purposes is that we crucially need context variables, for it is in context that speakers present notions in various ways. Context allows us to speak of a mode of Indurain in (17c), or a 222
TWO TYPES OF SMALL CLAUSES
decontextualized Fischer or Fischer at an event in (16). Our plan now is to achieve the results sketched in (14) and (15) in terms a syntactic realization of these context variables, and not event variables.16
6 Contextual dependencies Assuming every quantificational expression comes together with a context, in sentences of the form “S is P” we need at least two. We require a context of “S-hood” corresponding to the subject, and a context of “P-hood” corresponding to the main predicate. Suppose further that contexts are set within other contexts, much like quantifiers have scope inside one another. If so, assuming that X is the context of the subject and Y is the context of the predicate, a sequence of contexts X, Y is interpreted differently from a sequence of contexts Y, X. The first of these sequences would introduce a context Y for predicate P within the context X for subject S. Conversely, the second sequence would introduce a context X of the subject within the context Y for the predicate. Let us say that under these circumstances a given context grounds another context. As suggested before, let both arguments and predicates have the option of being displaced to a topic site. Starting out with a structure S is P, suppose that the subject is in the LF topic site, as is the case in a categorical predication. Here the subject with context X has scope over the predicate with context Y in situ. This has the effect of confining the range of the context of the predicate to that of the subject, thus grounding context Y on context X. So a categorical assertion introduces a context X of an individual for a context Y of a predicate. In contrast the context of the subject is not grounded on the context of the main predicate. This is what results in the main predicate holding of the subject as a standing predicate, for it is a characteristic of this subject in a decontextualized fashion, not within the context of any given event. Consider next the LF for S is P where the predicate is in the LF topic site, as we assume to be the case in a thetic predication. The fact that thetic predicates are thus displaced derives their transient character. The subject is inside the scope of the event operator, and now it is a subject whose context X is confined to the context Y of the predicate. Whatever predicate may hold of the subject, then, will hold only of a subject grounded at the event the predicate introduces, not a decontextualized subject.17 The fact that in categorical predications the context of the predicate is grounded on the context of the subject should have an effect on the interpretation of the predicate, just as it does on the interpretation of the subject. In particular, a categorical predicable should be dependent on the context of the subject, in a way in which a thetic predicate is not. Observe (18) and (19) in this respect: (18) a. I consider the sea/the plant green. b. I saw the sea/the plant green. 223
DERIVATIONS
(19) a. Considero el mar/la planta verde sencillamente porque consider.I the sea/the plant green simply because la planta/el mar es verde. the plant/the sea ES green b. Vi el mar/la planta verde sencillamente porque la planta/ saw.I the sea/the plant green simply because the plant/ el mar está verde. the sea ESTÁ green. It would seem that in (18a)/(19a) the green we consider to hold of the sea is typically different from that we consider to hold of garden-variety plants (sea green seems bluer). However (18b)/(19b) exhibit no such “canonicity” of green-ness. It may be the case that we saw the sea and the plant with the very same light green, say, as a result of a chemical spill – the typicality of the color is not invoked in this instance. Importantly, the causal continuations in (19) present auxiliary selection (ser and estar) in exactly the way we would want it. It would be unacceptable to use estar in the continuation in (19a) and somewhat odd to use ser in (19b). These facts can be explained in terms of context grounding. Since the context of green in the categorical (19a) is grounded on the context of the plant or the sea, we get a canonical green in each instance. But our account also predicts that only categorical predicables are canonical, since only these are grounded on their subject. A thetic assertion introduces the context of a predicate for the context of an individual. Whatever characteristics green in (18b)/(19b) may have has nothing to do with the subject of the sentence, according to fact. Consider next a precise implementation. Context variables are free variables, whose values are set pragmatically by the speaker. On the basis of what does the speaker set the value of a context variable? Minimally, background information is necessary. More importantly for us, online information is also relevant. In concrete terms, we adapt the semantic interpretation proposed in Higginbotham (1988), following the schema in Burge (1974). For instance, (20b) is the categorical interpretation of (20a):18 (20) a. El campeón es genial. “The champion is genial.” b. In any utterance of (20a) in which the speaker assumes a context X, such that X confines the range of campeón to things x that Xx, for a context Y, such that Y confines the range of genial to events e that Ye, that utterance is true just in case: El x [campeón(x, e) & Xx] Ee [genial(e) & Ye] & Subject(x, e) To quote Higginbotham (1988), (20b) is taken to be the normal form for linguistic data about the truth conditions of whole sentences. If so, then truth values are to be thought of as predicated 224
TWO TYPES OF SMALL CLAUSES
absolutely of utterances, and the contextual features on which interpretation may depend are to be enumerated in the antecedent of a conditional, one side of whose biconditional consequent registers their effects on the sentence as uttered. (p. 34) The only thing we are adding to the Burge/Higginbotham semantics is the assumption that contexts confine contexts within their grounding potential. This is explicit in other frameworks, such as File Semantics (Heim 1982; Kamp 1984) or in the “dynamic binding” system in Chierchia (1992), from which we quote: The meaning of a sentence is not just its content, but its context change potential, namely the way in which a sentence can change the context in which it is uttered. The different behavior of indefinite NP’s and quantificational elements is captured in terms of the different contribution they make to context changes. (p. 113) Although the mechanics implicit in (20b) are different from those implicit in either a file semantics or a dynamic binding treatment of contexts, the conceptual point raised by Chierchia still holds for us, even sentence-internally.19 With Burge and Higginbotham, we also assume that contextual matters affect not just indefinites or quantificational elements, but also names and events more generally, as discussed in Schein (1993). Then something like the semantics in (20b) is necessary, and the main point that is left open is how from a given LF we reach the details of the antecedent of the conditional in (20b). There are two possibilities for this, both of which work for our examples. Hypothesis A encodes contextual scope at LF, for instance in terms of May’s (1985) Scope Principle: (21) Let us call a class of occurrences of operators C a S-sequence if and only if for any Oi, Oj belonging to C, Oi governs Oj. Members of S-sequences are free to take on any type of relative scope relation. Otherwise, the relative scope of n quantifiers is fixed as a function of constituency, determined from structurally superior to structurally inferior phrases. (21) is intended for standard quantification, but may extend to context secondorder free variables under the assumption in (22): (22) Given a structure …[…Xx…[…Yy…]…]…, The value of Y is set relative to the value of X [X grounds Y] only if the operator Ox takes scope over the operator Oy. Hypothesis B expresses contextual grounding after LF. Assuming that syntactic LF representations are mapped onto some intentional (post LF) Logical Form encoding relations of the sort in (14)/(15), relative contextual specifications may be read off of those representations. If this is the case, nothing as syntactically 225
DERIVATIONS
articulated as (21) and (22) would be at issue, and rather something more along the lines of traditional logical scope would determine that context at a given point serves as context for what comes next.20
7 Quantificational subjects An important puzzle for our system is posed by the fact that the quantificational properties of the subject do not affect the Categorical/Thetic distinction. For instance, cada hombre “each man” in (23) does not force a categorical interpretation. (23) Cada hombre está preparado. “Each man is ready.” Context variables are introduced so as to provide an appropriate confinement for quantificational expressions like each man and the like. But note that if we were to care only about the context of the entire quantificational expression (henceforth the “large” context of the NP), this would not help us in getting thetic readings. That is, in (23) the subject quantifier has scope over the event operator, thus yielding multiple events of readiness. This presumably means that the large context of the quantifier grounds the entire expression, as it should. The context of the restriction of each man is confined to relevant men, so that the sentence means that for each relevant man, it is the case that he is ready. That is fine, but orthogonal to the point that each of the men in question is in a state of readiness. In our system, for each relevant man there has to be a context (henceforth the “small” context) such that this small context is grounded on the scope of the event operator, yielding a non-standing character for the predicate holding of each man. From this viewpoint the issue is to get small contexts out of expressions apparently introducing only large contexts. Basically what we want is that the “large” context of the restriction of each be displaced together with the determiner, without displacing with it the “small” context somehow associated to the variable. There is a class of expressions where this is directly doable: (24) Cada uno de los hombres está preparado. Every one of the men is ready. Given the right syntax for partitive expressions, we may be able to scope out each . . . of the men and actually leave behind one. If this one element brings its own context with it, we are right on track. The intuition is that partitive expressions are contextually transparent, representing their large context through the partitive phrase of/among the N and their small context in terms of the element one. The head of a partitive expression is one, of/among the N being attached at a different level in a phrase marker, both in Spanish and in English (and in many other languages). Uriagereka (1993) sketches an approach to these sorts of facts, building on the analysis of possession by Szabolcsi (1983) for Hungarian, assumed by Kayne (1993) for English. In the Kayne/Szabolcsi structure, a pos226
TWO TYPES OF SMALL CLAUSES
sessive DP is more complex than a regular DP, involving a relation (“possessor,” “possessed”). John’s car implies that John has a car (assigned), which translates, for reasons we will not go into, to the syntax in (25) (see Note 21): b. After Move
(25) a. Initial PM DP
c. PF: John's car
DP
DP
DP John
D' D POSS
AgrP D POSS -s t Agr'
AgrP
John
Agr' Agr
D'
NP car
Agr
NP car
Uriagereka’s analysis suggests that, in partitive structures, the definite DP plays the role of the “possessor” while the indefinite NP plays the role of the “possessed.” One in each one of the men is to the men what car in John’s car is to John (both in English and in Spanish). Each one of the men implies the men include ones.21 The advantage of this syntax is that it provides us with a phrase marker where the men and one occupy entirely different nodes as “possessor” and “possessed.” In turn, the entire “possessive” DP can be the restriction of a determiner like each in (26) below.22 What is crucial there is that one, which is in the Spec of the possessive DP in the overt syntax, may move autonomously. Hence it is able to topicalize to FP, or to “reconstruct” to a VP internal subject position, all of it at LF. Assuming that one moves around with its own context, we will get a structure which gives us the right meaning. In particular, each one of the men means something like each instance of a partition of the men has an associated one:
(26) a. Initial PM; b. After Move; c. PF: each one of the men
[Tree diagrams, not reproduced: [QP each [DP one [D′ [D POSS] [AgrP the men [Agr′ Agr [NP t]]]]]]; one originates inside the possessive DP that restricts each and surfaces in [Spec, DP], POSS being spelled out as of.]
We get “variable” men from this structure because we are partitioning the set of relevant men in ones, each corresponding to a part of the set. We placed the word “variable” in scare quotes since the element in question is not a variable in the standard logical sense, an artifact of the semantic system. Nonetheless, this syntactically represented element has a denotation which is similar to that of a variable (though see below). Even if the QP in (26) undergoes QR in the sort of structure in (24), we can still use one with its associated “small” context to “reconstruct” inside the scope of the event operator, so that the “small” context is set in terms of the context of the event, as in other instances of thetic predications.

Once the syntax in (26) is motivated for partitives, we may use it even for (23), in order to separate the Q element, with its “large” context, from the “variable” obtained by predicating one of something. This is, as we saw, a “variable” resulting from a syntactic formative (one) which brings its own context, as it is conceived differently from the object it is a part of. In effect this mechanism divorces the quantifier from its otherwise automatically assigned semantic variable, and allows the “variable” to have its own context, as the predicate it actually is.23 The account then follows as in (26) – except, of course, for the relatively minor point that (23) does not exhibit an overt partitive format. The relevant structure is (27):
(27) a. Initial PM: [QP [Q cada] [DP [D' [D POSS] [AgrP hombre [Agr' Agr [DP pro]]]]]]
b. After Move: [QP [Q cada] [DP [DP pro] [D' [D POSS] [AgrP hombre [Agr' Agr [NP t]]]]]]
c. PF: cada hombre
Here an empty (pro) classifier is generated in place of one in (26). Presumably cada pro hombre, literally "each one man," means something slightly different from the standard cada hombre "each man." In the latter there is nothing special to say about the values for the relevant variable, while in cada pro hombre there is. A "variable" effect obtains only because hombre "man" stands in an "integral" relation with regard to pro (and see Note 22).24 Consider in that respect the Spanish contrasts in (28):

(28) a. Todo campeón es/*está genial.
All champion ES/ESTÁ genial
b. Todos los campeones son/están geniales.
All the champs ES.PL/ESTÁ.PL genial

In Spanish many quantifiers may, but need not, exhibit number specifications. Curiously, when they do not, they cannot be subjects of thetic predications either (though see (29)). We may suppose that number licenses the formative pro in (27) and similar environments.25 Then in (28a) there would be no pro classifier, and hence todo campeón "every champion" may not be the subject of a thetic predication. In particular, the quantifier todo forces QR outside the scope of the event operator, and since there is no separate, syntactically independent "variable" with its own small context to be left inside the scope of the event operator, expressions with subjects of this sort must be categorical. An important qualification is in order. (28a) becomes perfectly good if a specific context is provided. For example:

(29) Todo campeón está genial alguna vez en su vida.
All champion ESTÁ genial some time in his life

However, we submit that in this instance it is the adverbial alguna vez en su vida "some time in his life" that allows for the thetic reading, in that it provides an explicit context for the variable of todo campeón "all champion," crucially inside the scope of the event operator. Note that, to the extent that these considerations are on track, it is clear that contextual considerations determine auxiliary selection for the thetic/categorical distinction, as we expect in the terms just discussed.26
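To fix ideas, the interpretation at stake can be given a schematic rendering. The formula below is merely an illustration of the intended reading – the notation is not itself part of the proposal – with C-LARGE and C-SMALL standing for the two context variables discussed above:

[each x : man(x) ∧ C-LARGE(x)] [∃e] [ready(one(x), e) ∧ ground(e, C-SMALL(one(x)))]

The crucial point is relative scope: C-LARGE is displaced together with the determiner under QR, while C-SMALL, contributed by one (or by pro), remains inside the scope of the event operator, where it can be grounded on the event – the hallmark of a thetic predication.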
8 Concluding remarks

It cannot be decided by just looking at a given predicate whether it will have an "individual-level" or "stage-level" interpretation. Most predicates can be used either way, depending on the context. For us the word "context" has a rather concrete meaning, in terms of the grounding of an event variable, whose syntactic realization we assume. The essential conclusion of this work is that the main distinction between thetic and categorical judgments arises as a result of the syntactic structure in each instance. In a categorical judgment the proposition is about a marked element, normally a subject but possibly an object, which has what amounts to a topic character. This topic's context grounds the contextual specifications of whatever main event is introduced in the proposition. In contrast, in a thetic judgment the proposition is not about a salient topic. It is about the event itself, which gains structural prominence through dull syntactic mechanisms (e.g. auxiliary selection or Case/agreement properties). Intuitively, then, subjects of categorical judgments are structurally higher than subjects of thetic judgments. This has nothing to do with their thematic structure, and is rather a consequence of their intentional properties (in terms of context confinement).
Aside from that main conclusion, the chapter discusses various interesting assumptions one needs to make for the analysis to work, in particular regarding the syntactic structure of quantifiers, as well as a couple of interpretive consequences, one of which is explored further in the Appendix.
Appendix

Suppose the condition in (1) is true:

(1) In a PREDICABLE(CATEGORY) relation, CATEGORY is anchored in time.

The intuition is that a judgment is evidential, and a point of view expresses through time its actualization of a given CATEGORY, a prerequisite for any judgment. Assume also a syntactic condition on anchoring, as in (2):

(2) A anchors B only if B is local with respect to A.

"Local-with-respect-to" can be thought of in terms of government, or if one has minimalist scruples with this notion, something along the lines of (i) in Note 7, repeated now as (3):

(3) B is referentially presented from [or anchored to] the point of view of the referent of A iff A is a sub-label of H, whose minimal domain M includes B.

To make (3) pertinent to (2), we could adapt it as follows:

(4) B is anchored to A iff A is a sub-label of H, whose minimal domain M includes B.

"Minimal domain" is to be understood in the sense of Chomsky (1993b), that is a collection of dependents of a given head H (its complement, specifier, and all adjuncts to either H, H's projections, or H's dependents). A sub-label of H is either its own label or any feature adjoined to H, whose attraction triggers the transformational mapping. Note that (3) is a sub-case of (4), whereby anchoring to a formative with evidential characteristics – typically an element assumed to denote a sentient being – results in a point-of-view effect. From the perspective of (4), (1) amounts to this corollary:

(5) Anchoring Corollary (AC)
In a PREDICABLE(CATEGORY) relation, there is a temporal element T such that T is a sub-label of H, whose minimal domain M includes the CATEGORY.

For the AC to be met, there must be a temporal element T which is local with respect to the relevant CATEGORY. For instance:

(6) [HP [CATEGORY] [H T [H H]] [… [t PREDICABLE] …]]

This is a situation arising in topicalization, where H is the F element discussed in Section 4, and Tense is adjoined to F.
Another possibility for meeting the AC involves a situation where there has not been displacement of the syntactic element which we want to treat as the CATEGORY, but nonetheless it happens to be local with respect to a T; for example:

(7) [TP [T] [XP [NP …] [XP …]]]

If a SC is as simple as noted in (1), where NP is base adjoined to XP, a T introducing SC will be local with respect to the CATEGORY NP. This amounts to saying that, when SCs are involved, predications anchored to time are possible without the need to topicalize their subject. The AC is a condition on whatever happens to be a CATEGORY. Given an appropriate source of C-Case, as in (12a), Section 4, for a particular nominal expression that will immediately be the relevant CATEGORY. But matters are reversed in conditions of thetic predication, where in effect the CATEGORY is the predicate of the main event. In those circumstances, thus, the syntactic formative in question ought to be in the minimal domain of some item that also has a T element in its checking domain. That is the case for SCs, as in (7), since XP is also local with respect to T. Which amounts to saying that thetic predications in SCs are possible without displacing their predicate to any topic-like position.

Those were, ultimately pragmatic, considerations about anchoring CATEGORIES to time. But as we discussed in Section 6, the way we obtain the characteristic semantic differences between categorical and thetic predications is through having concrete elements ground the context variables in the rest of the expression. We saw how that takes place when topicalization is involved, but we have just seen that topicalization may not be needed in situations involving simple SCs (although it is possibly involved in languages, like Irish, where specific Case markings are deployed in these instances too). The question is whether SCs may present, at least in some languages, a structure similar to that of adjectives, which most likely does not determine the relevant placements of elements entering predications in terms of Case. Note that categorical predicables can be substituted by clitic lo "it," unlike comparable elements involving thetic readings:

(8) a. Alcohólico, es lo que considero a Pedro.
Alcoholic is it that consider-I to Pedro
"An alcoholic, is what I consider Pedro."
b. Borracho, es lo que he visto a Pedro.
drunk is it that have.I seen to Pedro

This would be explained if, at the level of anaphoric linking with lo, in effect borracho "drunk" in (8b) is not acting as a predicate. That result would be obtained if borracho has to be displaced, for some reason, to a non-predicative site. Assume that thetic SCs are in the end more complex than categorical ones, the former including the latter plus an extra "integral" layer of the sort discussed in (25) above:
(9) a. [DP borracho [D' D [AgrP Juan [Agr' Agr [SC t t]]]]]
b. [AgrP Juan [Agr' Agr [SC t alcohólico]]]
If this is the correct analysis, the displaced borracho "drunk" goes through the Spec of the sort of possessive DP we discussed in Section 7, and is no longer a pure predicate (thus explaining the contrasts in (8), under the assumption that clitic lo "it" can only hook up with predicates). Intuitively, what (9) proposes is that alcohólico "alcoholic" is something that Juan is taken to be, whereas borracho "drunk" is a state that Juan is supposed to be in, or that he possesses in some abstract sense. Importantly, the contextual groundings in (9) are of the right sort. In (9a) the context variable of borracho "drunk" grounds that of Juan, whereas in (9b) the opposite is the case, with the context variable of Juan being in a position to ground that of alcohólico "alcoholic." This is what we expect, without the need to invoke the particular Case system discussed before. If those are the correct structures for SCs, the default orders of SCs in Spanish should be different. This is not obvious at first, given the facts in (10):

(10) a. He visto borracho a Juan / a Juan borracho.
have.I seen drunk to Juan / to Juan drunk
b. Considero a Juan alcohólico / alcohólico a Juan.
consider.I to Juan alcoholic / alcoholic to Juan

Both orders in (10) are possible. Nonetheless, there is a clear preference for the first orders given in (10), which gets accentuated in some circumstances. Thus:

(11) Q: Cómo has visto a Pedro?
how have.you seen to Pedro
"How have you seen Pedro?"
A: La verdad que…
a. He visto BORRACHO a Pedro.
the truth that… have.I seen DRUNK to Pedro
b. He visto a Pedro BORRACHO.
have.I seen to Pedro DRUNK
"The truth is that I have seen Pedro DRUNK."
(12) Q: Qué (cosa) consideras a Pedro?
what thing consider.you to Pedro
"What do you consider Pedro?"
A: La verdad que…
a. Considero a Pedro ALCOHÓLICO.
the truth that… consider.I to Pedro ALCOHOLIC
b. *Considero ALCOHÓLICO a Pedro.
consider.I ALCOHOLIC to Pedro
"The truth is that I consider Pedro AN ALCOHOLIC."

In question-answer pairs, while both orders are fine in the thetic instance, the categorical one is more restricted, disallowing the SC subject in final position. This is explained if, when possible as in (10b), a post-adjectival order for the SC subject involves focalization, which is incompatible with a second focalization as in the (12b) answer. Suppose also that the D element hypothesized for thetic small clauses, as in (9a), has temporal properties. If so it may be enough to sanction a predication internal to a nominal, in satisfaction of the AC. This would account for why these SCs, as we saw in (2)–(5), Section 1, have a characteristic independence from selecting V's, and thus corresponding T's introducing them: the AC is satisfied (practically) internal to the SC, which can then act as its own unit of predication – if carried by the DP layer.

Let us turn finally to the general specificity of CATEGORIES. Fodor and Sag (1982) present a traditional intuition about this which seems to us appropriate. One can only talk about specific topics. (13) states this concretely, assuming that at issue are judgments and the elements of which they hold:

(13) Judgment Principle
Judgments hold of actuals.

Plausibly, few things count as actual (although lacking a full theory of reference, this is a merely intuitive claim). We would put in that realm classified elements (the one/pro car), prototypical expressions (the automobile), abstract nouns (beauty), and perhaps kind or generic expressions (Americans, an American). These are the sort of elements which can be in the extension of a predicable, yielding a valid judgment. Given (13), we also predict that those events which enter into predications as CATEGORIES need to be actualized. This actualization is plausibly done through a pleonastic like there, which is otherwise mysterious as a subject. Furthermore, suppose that actualization is done in human language through time (14), perhaps a cognitive imposition relating to the fact that time/place coordinates are perceptually unique:

(14) Actualization Principle
Actuals are mapped to time.

The modular interaction of (13) (a semantic condition) and (14) (a pragmatic condition) deduces (1), repeated now as (15):
(15) In a PREDICABLE(CATEGORY) relation, CATEGORY is anchored in time.

The Judgment Principle in (13) has another important consequence. The CATEGORY of which a PREDICABLE holds is a classified element, or a prototype, or a kind, or an abstract notion. We take this to yield familiar Milsark-effects, which need not affect our syntactic account as they are taken to do in other analyses. Milsark (1977) noted that the subject of a categorical predication is specific. Indefinites (16) or weak quantifiers (17) are not subjects of categorical predications (nominal predicates are used for the categorical reading and participials for the thetic one, as these prevent alternative readings, see Note 2).

(16) a. Un lobo es una bestia.
a wolf ES a beast
"A wolf is a beast."
b. Un lobo está acorralado.
a wolf ESTÁ cornered
"A wolf is cornered."

(17) a. Algún lobo es una bestia.
some wolf ES a beast
"Some wolf is a beast."
b. Algún lobo está acorralado.
some wolf ESTÁ cornered
"Some wolf is cornered."

The usual approach to these facts is in scopal terms, contra Milsark's initial intuition (which we believe is correct). Given our analysis, there is no reason why scope should have the intended effect. That is, for us there are designated LF sites for subjects of categorical and thetic predications, which have nothing to do with their logical scope but, rather, encode pragmatic prominence. So we can have the paradigm follow in Milsark's terms, from the fact that categorical predicables force the actualization of their subject, given the Pragmatic Principle. Hence, unspecific subjects of all sorts will not be interpretable as pragmatic subjects, which we take to be the reason why topics in general must be specific – or actual in our sense. These effects now follow from the LFs achieved in syntactic terms, and not conversely.
12 A NOTE ON RIGIDITY †

but were I Brutus,
And Brutus Antony, there were an Antony
Would ruffle up your spirits, . . .
(Shakespeare: Julius Caesar, III, 2)
1 Counterfactual identity statements

Sentences like the one above pose a well-known difficulty for the now standard theory of names. What makes Antony Antony, and not Brutus, is (in the Kripkean view) the fact that Antony originates "from a certain hunk of matter" and has some reasonable properties concerning a given "substance" essential to Antony. In other words, there is an object in the world with the relevant character which, by its very nature, is Antony and not Brutus.1 This is all fine, but then there is something nonsensical about Antony's statement above. He seems to be referring to a chimerical, counterfactual creature without any referent whatever. Antony is not, of course, referring to Brutus; Brutus, being who he was (and having murdered Caesar), would not attempt to ruffle up the spirits of the Romans upon reflecting on Caesar's death. Then again, Antony is not referring to himself either, or he would not be using counterfactual speech; the whole point of his elegy to Caesar is to pose a dilemma: Antony, as who he is, does not have the clout to arouse the Romans into action, much as he would want to. Brutus, as who he is, (presumably) expects Caesar's death to be forgotten – although he could, if he wanted to, move the Romans into action. So what Antony appears to be invoking is a creature with enough "Antonihood" to act in love of Caesar, but enough "Brutushood" to matter. The question is whether that creature is sensible.

What is crucial for my purposes here is that the creature, call it "the chimera," does not have obvious grounds for rigidity in the standard view. The reason is direct. Traditionally, what makes us refer to this chapter rigidly is the fact that the object is what it is, right there in front of your eyes. You can call it "A Note on Rigidity" or whatever, but it sustains its character across counterfactuals simply because it is just whatever it is. Had there been no object with the relevant properties, there would have been no standard rigid designation. And we have seen that the chimerical Antony/Brutus does not exist.2 But before one poses questions about such deep and muddy waters, I propose a modest exercise, that we see whether there are any linguistic phenomena related to the problematic expression. If there are, they might shed some light on our difficulties with its semantics. The expression is easy to paraphrase:
(1) Were some of the modes of Antony some of the modes of Brutus, . . .

Of course, there has to be some cooperation for such a paraphrase to help. The relevant Antony modes have to be seen as those that lack "wit, . . . words, action, . . . utterance, . . . the power of speech," what makes him but "a plain blunt man." Those are the gifts Brutus has. Meanwhile, there have to be other Antony modes that justify that the chimera should ruffle up the Roman spirits. It is in those modes where Antony's love of Caesar "lives." Put those loving modes together with the gifted Brutus modes and you will have a vengeful orator.
2 An excursus into wholes and parts

(1) does not seem relevantly different from the plausible assertion below:

(2) Had the Trojans been the Greeks, the Trojans would not have taken the horse.

As far as I can tell, what (2) really means can be paraphrased in (3):

(3) Had some of the Trojans been some of the Greeks, the Trojans that the Trojans would then have been would not have taken the horse.

Of course, (3) is true in all circumstances in which a stronger version of (2), involving all of the Trojans and all of the Greeks, is true. But (3) has two advantages over that stronger version. First, it allows us to quantify over some relevant Greeks (who we take to be very smart, say Odysseus, Achilles and the like) and some relevant Trojans (who we take to be rather stupid, that is, crucially not sensible people like Laocoon, who did not want to take the infamous horse). Presumably, if stupid Trojans had been counterfactually swapped for smart Greeks, events would have proceeded differently from the way we are told – but arguably not otherwise. Second, imagine (2) were true of all Greeks and Trojans. One could argue that, then, there would be no Trojans left; they would all have been Greeks. If so, one could argue that the situation of their being able to take the horse would have never arisen, this event being the culmination of a war between Greeks and Trojans. To put it differently, we need some Trojans for presuppositions of the assertion in (2) (which is, after all, about Trojans) to make sense.3 In the case of the counterfactual Trojans, it is all those Trojans (in fact most of them) that got involved in a war with the Greeks, did this, that and the other, and eventually found themselves in front of a weird horse. In the case of the counterfactual modes of Antony, it is all those modes (presumably also most of them) that made him a friend of Caesar, and all that. These parts of the expressions may be thought of as "rooting" them in some intuitive sense, thus making them successful ways to denote Antony or the Trojans.
At the same time, the chimeras are chimerical because not all their "chunks" are real: our recipe for their construction calls for invoking some smart Greeks, or eloquent Brutus modes, so that with the new and improved would-be creature we are justified in asserting something about not being fooled by weird horses or ruffling up the Roman spirits. Had we been using (1), (2) or (3) (perfectly sensible, if stilted, English expressions), it would have been reasonable to conclude that nothing else is going on. In other words, we can, without much difficulty, invoke what I think is the chimera that concerns us here by means of roundabouts like the "modes of Antony." It goes without saying that this raises a host of questions, starting with what are those modes, and going all the way down to what makes counterfactual modes be of who we assume is the real Antony. However, regardless of how we answer those questions, we have to say something for (1) – the tricks being within the lexical meaning of the entry mode and the grammatical meaning of some . . . of (Antony) in the appropriate context. Granted, the bare Shakespearean expression has no overt manifestation of anything other than Antony, but if generative grammar has taught us something, it is to expect underlying structures that are not visible to the naked eye. In the remainder of this chapter, I try to show that it makes sense to assume that sort of hidden paraphernalia for the relevant uses of names in counterfactual identity statements, and ponder what this means for the structure of names in general.

Before I proceed, though, I want to signal the way. The key is comparing (1) and (2), with the intended meaning in (3). I suppose the Trojans is a name.4 Then the issue is counterfactually changing the Trojans a bit, by swapping some relevant players. The chimerical Trojans are similar to the real old Trojans in that they have enough Trojans to count as "the Trojans". One should still ask how many that is, but I will leave it at that. More relevantly for our purposes, we could counterfactually swap some crucial modes of Antony for crucial modes of Brutus, again just enough to make the assertion true without messing around with presuppositions. The same questions obtain about how much is enough, and the same answers apply, however much one needs in the group instance. This is enough for us because we will have to say something about that instance, in which no particularly troubling considerations arise. The reason why this is, I believe, is simple. The ontological bite of the Trojans is in the group as such, not its members, which is like saying that the group is not equal to the sum of its parts (or the substance of the parts would essentially be the substance of the whole). If we (can) think of Antony as an array of modes, conceiving him as a whole that is not the sum of its modes, then we are in business. We can keep Antony intact across counterfactuals while still swapping some of his modes.5
3 Two intriguing linguistic facts

Let us now move to two apparently unrelated and indeed very basic facts. The first one is why on earth Antony claims that, if the counterfactual were met, there would be an Antony that would ruffle up the Roman spirits. Certainly, Antony is not referring to just anyone bearing the name "Antony" that would show up for the occasion.
The relevant Antony is the one and only, albeit with Brutus's authority, power or what have you, to influence the Romans. So it seems that, in the end, Shakespeare is rigidly designating Antony, yet he is speaking, apparently paradoxically, of an Antony, as if he were describing someone. That sort of parlance is very common to sports commentators, "A fiercely insane Tyson bit an ear off of his opponent!" The description introducing this sentence invokes reference to the infamous Tyson, albeit in a subtle way, describing one of his modes. So there we go again: the modes (this time of Tyson's) showing up when we want to describe non-rigidly a (certain) peak in Tyson's eventful career, while still rigidly (and perhaps indirectly) referring to Tyson himself. If Antony were referring to an Antony mode when saying that "an Antony would ruffle up your spirits," then we would be out of trouble in two ways. First, we would not need to invoke the peculiar idea of names ever being indefinite descriptions. What would be indefinite is the mode; the name it would be a mode of would not only be definite, but furthermore adequately rigid. Second, if we could proceed the way I am suggesting, we might be able to show – within the very sentence we are trying to analyze – that we need to invoke these little mode notions in counterfactual identity statements. Shakespeare might have not only used the troubling counterfactual, but also given us a key to its logical form. From my perspective, his indefinite description of a mode of Antony is very useful. To start with, it has some overt grammatical structure (the "a(n)" bit) that we can piggyback on when trying to construct the support for the elaborate structure in (1). Equally importantly, a description introduces a frame of reference. One hopes that such a frame is involved in the counterfactual swap of modes that is at the core of this chapter.

Let me next move to the other linguistic fact, this time from Chinese:

(4) Na gen Ten Zin Gyatso
that CLASSIFIER Ten Zin Gyatso
"That T. Z. G."

This language is one of those where nominal expressions are generally introduced by classifiers. In many languages, this is the way to quantify over count nouns, and in the case of Chinese the practice extends to instances of demonstration. For example, (5) is how we ostensively speak of a certain man:

(5) Na ge ren
that CLASSIFIER man
"That man"

Importantly, (4) denotes a given person called Ten Zin Gyatso – but as opposed to some other Ten Zin Gyatso. So, for instance, we may use (4) to contrastively refer to the Ten Zin Gyatso who is currently the Dalai Lama (and not somebody else). Two curious aspects of (4) can be highlighted. First is the fact that names, contrary to nouns, do not take classifiers.
Importantly, (5) does not mean "that man as opposed to some other man," or any such thing; it simply means "that man." In turn, if you want to name the Dalai Lama in a non-contrastive way you simply say Ten Zin Gyatso, as you would in English. Of course, this is neither more nor less remarkable than the fact that (4), in English, is also contrastive.6 Since English does not take (overt) classifiers, we say that names are not introduced by demonstratives – and this could be the explanation for the only reading available in (4). But then there are clearly some instances where a name can be introduced by a demonstrative. For example:

(6) That happy Ten Zin Gyatso

Again, this is (several ways) ambiguous. It can refer to some particular Ten Zin Gyatso who is a happy fellow (the contrastive reading).7 And it can refer to a mode of the Dalai Lama – say one that could be witnessed last month. That relevant mode reading is highlighted in (7):

(7) Last month's happy Ten Zin Gyatso contrasts sharply with that Ten Zin Gyatso we once saw leaving Tibet.

The most natural Chinese counterpart of such a complex nominal is (8):

(8) Shang ge yue xinfu de Ten Zin Gyatso
last CLASSIFIER month happy of Ten Zin Gyatso
"Last month's happy T. Z. G."

Importantly, though, (9) also yields the desired reading (although it apparently allows a contrastive interpretation as well):

(9) Na ge shang ge yue xinfu de Ten Zin Gyatso
that CLASSIFIER last CLASSIFIER month happy of Ten Zin Gyatso

I find (9) a nice example because of its intricate syntactic properties. My hope here is that Chinese offers some clues to the structure of what I am calling "nominal modes". In a nutshell, I expect that what the (first) classifier ge in (9) classifies in the relevant reading is a mode of Ten Zin Gyatso. If this is the case, we would have found direct syntactic evidence for the alleged modes. In turn, the fact that these modes do not lexicalize in (6) and other English examples seen here (including, in my view, the Shakespeare quote) is no deeper than the fact that the relevant classifier does not lexicalize in the Chinese example in (8). I do not know why that is, but whatever is going on with the classifier lexicalization of (9) vs. (8) may be invoked for the much more general Chinese vs. English distinction (granted, a promissory remark – but one of a familiar sort within generative grammar).
4 On the issue of names and determiners

Classical name theory makes much of the fact that names do not generally take determiners (or as we have seen, classifiers – which as far as I know is an undiscussed fact). The received (Russellian) wisdom is that a determiner introduces a quantification, something considerably more complex than is needed for names, traditionally taken as logical constants. It was Burge (1974) who first argued that classic rigidity effects can be captured even if names are not represented as constants, so long as they are introduced by a covert demonstrative. There are two different aspects to this view. First is the idea that a name can be used predicatively:

(10) a. She is a Napoleon, although of course he wouldn't have approved of her.
b. Every Napoleon has hated the name, after he gave it such a bad press.
c. Napoleon admirers were devastated when he was deported to Elba.

Whatever is concocted for these examples should not be too fancy, since Napoleon here serves to anchor the anaphor he. Likewise, the theory of (generalized) quantification is based on treating Napoleon in (10b) as the first argument of every, generally in the same vein it would treat a noun like man. The second important aspect of Burge's proposal is that demonstratives induce rigidity (for the same reason that names are classically taken to invoke rigidity; they pick out an object in the world). Put that fact together with the predicative analysis of names and you can have it both ways, name predicates, but aptly rigidified through a grammatical tool that picks out an object (of which the name holds as any other predicate would).

The problem is that, as Higginbotham (1988) notes, Burge's proposal is as interesting as it is wrong. Higginbotham's demonstration is an old linguist's trick. Make Burge's demonstrative overt and see what happens. We already saw what happens in (4), (6) or (9): an entirely new reading emerges.8 Higginbotham's arguments also unearth new and interesting facts which any theory will have to account for. Thus, compare (6) to (11):

(11) Happy Ten Zin Gyatso

He points out that (11) can only be read non-restrictively – contrary to what we saw for (6), which admits a restrictive reading. He accounts for this fact by claiming that happy in (11) modifies a whole noun phrase, which as such does not take restrictive modification. In contrast, he takes an example like (10c) as an argument that names can, in certain contexts, be smaller than noun phrases – under the assumption that this example involves noun (as opposed to noun phrase) incorporation. Whatever the merit of that argument, it creates its own wrinkle, given a modification like the one in (12) – which must be read non-restrictively:

(12) Every boxing aficionado is a good-old-George-fan.
Be that as it may, (12) is a very serious counter-example to Burge’s view, since incorporation does not extend to nominal instances introduced by demonstratives (or other determiners). At the same time, (12) warns us against any trivial association of the rigidity of names to their (alleged) noun-phrase status; after all, by hypothesis (12) involves the nominal incorporation of something smaller than a noun phrase, and yet it still invokes reference to the one and only George (Foreman). Note, in particular, that (12) cannot mean that every boxing aficionado is a fan of anyone who is good and old and called George. (This may be an implausible reading, but a good-old-Kennedy-fan could reasonably, but may not factually, denote admirers of good old Kennedy fellows.) Taking stock, after exploring the factual base, we appear to need a treatment of names that, apart from yielding their rigidity, allows us to have them “chunked down” into mode components, associate to (generalized) quantifiers and appear in various predicative contexts, and resist restrictive modification in their bare guise while allowing it when associated to demonstratives.
5 Assumptions for a new proposal

Let us now take seriously the idea that, just as there is a sound relation between a group and its members, so too there is some meaningful relation between an individual and its modes, whatever those are. In the case of the group, the relation can be of intricate complexity (cf. a team, family, corporation, country). Somewhat analogously, an individual can also be conceived in intricately complex ways (cf. a lump, object, organism, collective . . .). One may choose to express this through different formal mechanisms, but it is a mere and indeed obvious fact about human cognition. Syntactically, it is quite clear what is going on. We must phrasally support expressions like some (individual(s)) among the Greeks or some (modes) of Antony, which I hope are semantically alike in reasonable ways. The latter expression may sound a bit odd and formal when fully expanded, but is quite colloquial in its reduced guise, as the famous song All of me directly attests. Similarly, we can naturally say all of Jordan was invested in that game, most of Madonna is just hype, some of Che Guevara has remained in all of us, and so forth. (At least) the of in these instances signals a partitive syntax, which syntactically places the name in the realm of some "space" which is partitioned. The "mode" I keep invoking should be seen as the relevant (abstract) partition. These days we have a reasonable picture of what might be going on in (pseudo-) partitive expressions. We invoke for them the syntax of possession (roughly of the sort in Szabolcsi 1983, Kayne 1994, and Chapter 9), which allows both for nominal manifestations in partitive guise and in fact for garden variety possession, as in the Greeks had some outstanding warriors (chiefs, philosophers, individuals . . .) or Antony had some remarkable modes (moments, attributes, aspects . . .). For reasons of space, I will merely refer the reader to those sources, assuming without argument the sorts of structures in (13) (see Chapter 15):
(13) [DP [D some] [AgrP Agr [SC Greeks individual]]]
D is a quantifier like some, whose argument is AgrP. This is a referential phrase, whose details are shown immediately. The lexical components of the expression stand in a relation which is syntactically coded as a small clause (SC). Within this lexical domain, we encounter some conceptual space (in this instance, Greeks, which can be thought of as a space of "Greekhood," whatever that is), and an operation on that space that presents it in a particular way, giving it a characteristic shape and a corresponding semantic type (here, individuals). All theories have to code that sort of "integral" relation, whether they make it explicit or hide it into some lexical notion (as in "relational" terms). The only thing (13) is adding to traditional analyses is an explicit syntax in terms of well-attested properties of possessive constructions. Observe that (13) invokes no reference unless we code what that is; as it stands, DP is a quantification over Agr, which for mnemonic purposes I will henceforth write as R for "reference." "R" in turn syntactically associates to some integral presentation of an abstract space as this, that or the other. We may code a given lexical item with a crucial referential feature of the checking system – in Chomsky's (1995b) sense. Say this is individuals in (13). Then this item must move to the checking domain of R, leaving a trace. In other words:
(14) [DP [D some] [RP individual[r] [R' R [SC Greeks t]]]]
The syntax in (14) corresponds to some space of "Greekhood" presented in individual guise – although it could have been in other guises: pairs, groups, etc., up to cognitive limitations. What those are depends on the properties of the relevant spaces, as well as those of the modeling presentation. For example, if we are dealing with generalizations of Euclidean spaces called manifolds, the relevant spaces will exhibit special connectivity properties (see Kahn 1995: Chapter 7).
Thus we can express an observation of Russell's (1940: 33) which Chomsky (1965: Note 15 to Chapter 1) reinterprets on psychological grounds; a name is assigned to any continuous portion of space. Thus, Chomsky notes, humans do not name the four limbs of a dog as a single notion LIMB, of which the proposition "LIMB reaches the ground" could be true. It turns out that LIMB cannot be described as a (low dimensional) manifold.9 At the same time, a manifold of any given dimensionality acquires special properties depending on the sorts of twistings, foldings, gluings, etc., that one performs on it. This is what distinguishes a "cigar band" from a "Moebius strip" from a "torus" and so forth (all of them creatures of the same dimensionality). I call this sort of operation on a space a presentation because I would like to relate it to the Fregean notion of a "mode of presentation." As is well-known, for Frege the meaning (or sense) of a term is its "mode of presenting" its bearer (or referent). I want Frege's notion to correspond to the mathematical tool I am alluding to. This view would not please Frege, for whom it is the referent that is presented in a given guise, not a mental space corresponding (somehow) to that referent. I have little to say here about the referent, beyond the fact that it is modeled (by the speaker/hearer) in terms of a type of space presented in some fashion. I syntactically code this modeling relation as "complementation to R." In (14) the referent encoded through R is modeled in terms of a presentation of a space of Greekhood as individual. Then what counts as a successful model for reference is still open. About the "successful" bit, I have nothing to say (like everyone else – pointing to an object only coding this problem). In any case, we must leave room for speakers' intentions, under the assumption that terms do not refer, but rather it is speakers that use terms to refer. I code this by way of a context variable associated to the Q position in (14). This coding does not really address anything profound, but the matter will not affect what I have to say in this chapter. On the basis of (14), something like some modes of Antony must be analyzed as in (15) below, whereby a space of Antonihood presented in a "mode" guise is quantified over, yielding an array of Antony modes:10
(15) [DP [D some] [RP modes[r] [R' R [SC Antony t]]]]
This syntax, in itself, adds or subtracts nothing to the analysis suggested for (1), or any variant.
What matters for the counterfactual to be sensible is that the spaces of Antonihood or Brutushood (which model certain "wholes") be constant across situations or worlds; the same would be said about spaces of Greekhood or Trojanhood. The way of having the cake and eating it too is that some spaces can be defined in a certain way regardless (at least up to a point) of what they are composed of. For example, the United States did not cease to be itself after acquiring Arizona, or Cervantes did not cease to be himself after losing his arm. Mathematically, one can concoct homomorphisms between the old and the new US or Cervantes, of course without anybody yet having a clear picture as to when the homomorphisms break down when things get multidimensional. In any case, I am adding to these familiar issues nothing beyond treating an individual as an array of smaller elements, which allows us to speak of counterparts in a way that we need anyway for standard groups. That is not saying much, but it is avoiding an absurdity and focusing the solution in terms of a problem that is at least well understood. Two further problems remain: first, even if I am allowed to claim that (15) is the syntax for some modes of Antony, have I said anything about the syntax of Antony? Second, what does it mean for Antony to rigidly designate?
6 Toward a definition of name

I propose that a name has no internal conceptual structure, that is, is purely atomic. In contrast, a noun, as we saw, is arguably an articulated manifold, with a corresponding elaborate syntax. Thus, compare the following expressions:

(16) a. [Napoleon]
b. [SMALL CLAUSE [SPACE man] [presentation CLASSIFIER]]

The intuition is to liken all languages to East Asian variants, where classifiers introduce nouns, but as we saw not names. The classifier, of course, is intended as the relevant presentation that articulates the n-D manifold in specific ways (see Muromatsu 1998). Evidently, no such classification is morphologically necessary in most Indo-European languages, but it does show up here and there in terms of gender agreement and, most importantly, the genitive/possessive syntax of integral relations (wholes and parts, kinship relations, elaborate shape/function expressions, masses and their measures, and so on). Traditional grammar treats these elements as derivative, in some form or another, on basic count terms denoting individuals. I am not: individuals are expressed through complex topologies, just as the rest is. Since I speak of these matters in Chapter 15, I will not repeat myself here. The important addition I am making is that names do not participate in any of this. The cut I am suggesting in (16) gives us the phenomenology of rigidity without invoking reality. The name as characterized in (16a) is a primitive space. The essential difference between the name in (16a) and the noun in (16b) is that the latter has necessary properties expressed through the classifier. I should insist that I am not talking about properties of the bearer in reality of the name Napoleon or the description man; rather, I am talking about these terms themselves.
For being man (ren in Chinese), a term must be a concrete kind of manifold (which in Chinese gets classified as ge, and by hypothesis in a similar, say, 4D fashion in all other languages). In contrast, for being Napoleon a term must simply be a mere, arbitrary, abstract manifold of just about any dimensionality. Thus, Napoleon can, in principle and in practice, name a (4D?) dog, a (3D?) ship, a (2D?) cognac, or even a (1D?) style. What makes these syntactically opaque names denote rigidly across counterfactuals? The key is in what allows modeling, which is something with parts (and/or flexibility) to it. If I am going to model such-and-such, I need various (and/or flexible) so-and-sos to tinker with, and twist, fold, glue, etc., until I am done. If you give me a single rigid so-and-so, I can only model that very so-and-so, and no such-and-such, with it. I am supposing that a name is a rigid so-and-so for the very simple reason that, by syntactic definition, I literally do not give you any access to its internal structure. A name is the ultimate, unsplit atom. In contrast, a noun in my view is a flexible space which is, as it were, warped in a way that its classifier specifies. Descriptive richness comes, at least in part, from the fact that a noun has not just one, but n levels of structure, as many as dimensions it is expressed in as a space. So for instance a noun like man is intended as having a dimension corresponding to what Chinese speakers classify as ge, but what this morpheme actually classifies is itself a space of lower dimensionality, which in turn is presented in some other way specific to that dimensionality, and so on. All the way down to that dimension where the system bottoms out. The point is, being an n-D manifold allows for n ways in which to enrich the basic syntactic parameters that define you as whatever it is you are. In the meantime, there are various subtle ways in which an articulated model can be built, roughly corresponding to modes of presentation in terms of substance, change, movement, etc. (see Muromatsu 1998). All of this is missing from a name.

These articulated spaces that allow for complex models sanction reference in terms of descriptions. By its very nature, an articulated complex model can be applied to different referents, those that match its specifications. In contrast, a model with no flexibility can only either model something isomorphic to itself, or else be arbitrarily assigned to something else. Thus, whereas the question "Why do you call this a dog?" is sensible and can be answered in terms of fixing the relevant n-D parameters in one way or another ("Does it move like a dog?" "Does it have dog parts?" etc.), the question "Why do you call him Fido?" is a mere topic for discussion, and can only be addressed in terms of an arbitrary matching between the term and the individual, with no syntactic systematicity ("I liked the name," "He's called that by his owner," etc.). Whereas the rigidity of a term has a consequence for what things can be referred to by it, it is in this view not determined by the referent, but the other way around. Note that a kind of rigidity arises, also, at the level at which any description bottoms out as an abstract concept. For example, man (as used in man comes from Africa) designates as rigidly as Mandela does in Mandela comes from Africa, a point first raised by Putnam (1975) (for certain kinds anyway, and for entirely different reasons).
It is not that the terms man (used as a kind) and Mandela have the same dimensionality. That cannot be, since we know there are predicates that are picky with regard to the dimensionality of their arguments, and these clearly yield different results with man and Mandela (cf. Man waned from the forest vs. *Mandela waned from the forest). The point is, however, that Mandela has whatever dimensionality it has, but behaves like elements of the lowest dimensionality do in that neither has components. In the case of names this is so by definition, while in that of lowest dimensionality elements this follows from their very form.
7 Modes of names are nouns

This view of things allows us to treat modes of names in a straightforward way. A mode classifies a name, that is, turns it into a noun, which is good for two reasons. First, it gives us a way of reconciling the fact that, although in Chinese names do not take classifiers, they do when a mode reading is invoked, as we saw in (9) above. The mode reading is by hypothesis a noun reading, which thus falls under the generalization that nouns (and not names) are classified. Second, we have seen that modes of names behave as descriptions. This too is expected. It is the mode that allows the name space (otherwise rigid) to act as a flexible tool for describing something or other. Granted, that something or other has to be confined to the range of modes of which the name holds (for instance, Antony modes), but that is neither more nor less surprising than having the referent of a noun confined to whatever falls under the range of that very noun – ultimately the name for a kind. That is why we only call dogs "dogs."

Assuming these ideas, we can now argue that Shakespeare's quote need not be more troubling than (17), which of course troubles nobody:

(17) If a friend (of Caesar) were an enemy (of Caesar), . . .

The term friend can be applied to Antony just because the relevant syntactic parameters of this term can be fixed with what we know about Antony. Similarly, enemy can be applied to Brutus. But putting reference aside, both friend and enemy are, say, 4D manifolds that bottom out as different 1D spaces,11 in the process going through various presentations (classifiers and all that). There is nothing particularly interesting about a counterfactual in which Brutus could be modeled in terms of the apparatus associated to friend, and Antony, of that in enemy – a mere shift in parameter settings, often a matter of speaker's opinions. But the fact that Shakespeare's quote is not troubling anymore does not entail that we know what it is supposed to mean. We still have to make sure that there is a rationale for the counterfactual speech, where the chimera is invoked. This is why I suggested a paraphrase of the sort repeated in (18):

(18) If (some) modes of Antony had been (some) modes of Brutus, . . .
Given the lexical and logical semantics that I am assuming and/or proposing in this chapter, we find all the aspects sketched in (19) in the relevant expression, regardless of whether the identity statement has the completely obvious form in (18) or the much more sketchy one in Shakespeare's quote:

(19) a. If there existed an Antony space presented in a mode guise,
b. and relevantly confined by the speaker to such-and-such,
c. such that this space so modeled were actually
d. a Brutus space presented in a mode guise,
e. and relevantly confined by the speaker to so-and-so, . . .
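Schematically – again merely as an illustration, in notation that is no part of the official proposal – (19) amounts to a counterfactual built on two contextually confined existential descriptions and an identity between their variables:

[∃x : mode(ANTONY)(x) ∧ C(x)] [∃y : mode(BRUTUS)(y) ∧ C′(y)] (were x = y) . . .

Here C and C′ correspond to the speaker confinements in (19b) and (19e), and the identity to (19c), with its counterfactual mood left aside.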
Let us consider each of these lines in turn. (19a) and (19d) are what introduces each nominal expression, and correspond to the quantificational structure that is coded as Q and R (the quantifier and variable positions) in the syntactic structure (15). The "presented" bit, in each instance, codes the integral relation between the Antony/Brutus spaces and their modes of presentation. (19c) is the identity statement. I ignore matters pertaining to the mood of this statement and all issues that arise with regard to the quantificational structure of the Brutus expression. This is because all of those considerations (though certainly important to the logical form of the chimerical sentence we are dealing with) arise more generally, and are not exclusive to names. Finally, (19b) and (19e) code speakers' intentions, corresponding to the contextual specifications associated to Q and R in (15).

Now let us talk about the "swap" of Antony modes for Brutus modes that is essential to Shakespeare's sentence, and generates the chimera. Two important factors make the swap on the one hand possible, and on the other relevant and sound. First is the fact that there is something to swap. If we had not split Antony and Brutus down to constituent elements, all we could do is swap Antony for Brutus, which as we saw is of no great help (in fact, leads to absurdity). Second, the mode expressions are descriptive, which means they are quantifications of some sort with appropriate contextual confinements (as (19b) and (19e) show). This, in itself, means two different things, both crucial. What is basically going on in the "swap" is that the speaker pretends some of the modes that he knows are Brutus's are really not his, but Antony's. How could this be done if there were no coding for relevant Brutus's and Antony's modes? Surely, not just any old mode of these two people could be invoked for the consequent of the conditional to be meaningful. The question, then, is where relevance is coded. The being itself (Antony's being Brutus) is not very useful, since it is not clear what a relevant being would be here. All that is left then are the terms Antony and Brutus, by hypothesis reduced to modes. Then speaking of relevant modes (and given that the quantificational syntax encodes standard context confinement) is as trivial as everyone left making reference to a given set of individuals, and not applying to everyone in the universe. But apart from giving us relevant modes, the assumption that we are dealing with descriptions, which involve the space associated to a name (as shown in (19a) and (19d)), directly "roots" the expressions in the individuals that matter, Antony and Brutus.
This is in fact what holds the referential import of Antony and Brutus in place, even when relevant modes of each are swapped around and the consequent of the conditional is carried out, as it were, on the shoulders of an entirely chimerical creature.
8 A dynamic theory of naming

I think ultimately everything boils down to names, be they like Antony or Brutus, or like man or dog. The answer to the Shakespearean riddle "What's in a name?" is simple: nothing. At the same time, it is useful to think of a name as a space which can be conceived flexibly, and be warped into the sorts of nuances that make up a noun, e.g. a mode. Nothing may be in a name, but the name itself is something, a space of some sort, if I am right. The direction I have suggested has some of the virtues of Burge's, without falling into any of its pitfalls. As in Burge's proposal, names for me are predicates, which can thus appear in predicative contexts like be a Napoleon, be arguments to quantifiers as in every Napoleon, incorporate as modifiers as in Napoleon admirer, and whatever else you think predicates should do. Like Burge, I also think that some (nominal) predicates can designate rigidly in some circumstances, but contra Burge, my name predicates are not rigid because of any demonstrative (which is what Higginbotham showed is wrong). Rigidity for me is a property of opaque structures themselves. I have extracted this conclusion fairly directly from the fact that Chinese and similar languages do not classify names, while of course they classify nouns in relevant contexts. That is surely a central fact for me, and my thesis can be directly destroyed by showing a language which systematically classifies names. For whoever is interested in finding such a language, you must remember what will not do: a contrastive reading of a classified name, of the sort seen in (4).

It is actually very interesting why an expression like that Ten Zin Gyatso (as opposed to some other Ten Zin Gyatso) is not simply ungrammatical, and instead this contrastive possibility arises. Note what I am forced to say: inasmuch as this expression is indeed classified in Chinese, it must be that it actually involves a noun, somehow. My hunch is that this sort of noun corresponds to the colloquial English expression that Ten Zin Gyatso dude/bloke. If so, the issue is what specific operation dude/bloke, or corresponding Chinese classifiers, perform in the Ten Zin Gyatso space. Everything discussed so far had non-trivial effects within the relevant spaces. In particular, classifiers, measures, modes, etc., yield component elements of a given manifold. But one can imagine a trivial, identity operation whose value is the manifold itself. Say that is what dude/bloke does on the Ten Zin Gyatso space; then the question is what is the difference between Ten Zin Gyatso, the name, and a corresponding form to which the identity presentation has applied. Reference-wise, they are the same – we are picking out the same man. But we have syntactically produced a (trivial) description of that man. It is the x such that the Ten Zin Gyatso space obtaining of that x is presented in a dude/bloke guise.
When I normally refer to Ten Zin Gyatso, and there is no other Ten Zin Gyatso around to bear that name, all I need to do is utter the name in question. However, if another Ten Zin Gyatso becomes relevant in context, an obvious difficulty arises. We have two concepts each associated to different individuals, yet the linguistic term used for each is indistinguishable from that used for the other. Then we must go into something more complex, a description, which as we saw comes together with its handy context variable. Now we are in business. The context variable allows us to speak contrastively of this relevant Ten Zin Gyatso as opposed to that other, irrelevant, Ten Zin Gyatso. Similar issues arise when comparing Happy Ten Zin Gyatso with that happy Ten Zin Gyatso, the second one of which allows a restrictive reading for happy. The reason for this is that, given the form of the demonstrative expression, it is a description, which means it involves not just the Ten Zin Gyatso space, but also an associated presentation. It is this associated presentation that makes the expression a description of an individual (dude, bloke or whatever), to whom the restrictive modification applies, precisely via the presentation. In contrast, when just the name is introduced, the only modification we can have is that of the conceptual space itself, which yields an associative reading of the sort found in paratactic appositives (Ten Zin Gyatso, a happy man, . . .).

It is important not to equate non-restrictive readings with "true" names and restrictive readings with "predicative" uses of names. Incorporated nominals, as in Kennedy-admirer, show this conclusion to be factually wrong. They are reasonably modificational, therefore predicative; yet modifications of incorporated names can be non-restrictive, as in good-old-Kennedy-admirer. As far as I can see it does no harm to consider all names predicative, basically following Burge. Burge's problem was not there, but in looking for a rigidifier external to the name itself, such as a demonstrative. I have shown a way in which the name can be rigid in and of itself, which is useful in these incorporated instances where there is no way a demonstrative could be incorporating. Of course, if all names are predicates, we crucially need the sort of syntax in (13), or else we will not have a way of telling them apart from nouns. With that syntax, though, if all the components are invoked (in particular, the small clause bit), we will have a noun; otherwise, a name. This, as we saw, will mean that many apparent names will really be nouns, simply because they have the appropriate syntax. For instance, a fiercely insane Tyson, that Ten Zin Gyatso (dude/bloke), or even Antony and Brutus in were I Brutus and Brutus Antony. An interesting topic for future research is whether every Napoleon takes a bona fide name or rather a noun as its restriction. Two facts suggest that the latter should at least be an option. First, it is fine to speak of every Napoleon dude/bloke, which in my terms signals a noun. Second, instances of conjunction of the sort in every Napoleon and artillery expert suggest that the semantic type of these names can be like that of nouns. But I do not know whether these plausibility arguments are enough to reach a stronger conclusion. The matter is interesting among other things because *the Napoleon is not good in English (although of course it is in many languages). In any case, I do not see how answering that question one way or the other will alter anything I have said.
Also for future research are the exact syntactic conditions under which names like Antony get the more specific (some) mode of Antony structure. In some instances this is just a matter of lexical choice. However, I take it that the interesting ones are those where the mode or quantificational parts are not pronounced, and yet they are interpreted, as in Shakespeare’s quote.
9 Some conclusions I know I have said several non-standard things, syntactically, semantically, and philosophically. What follows is a brief summary. Syntactic structure (13) is meant to replace more traditional noun and determiner phrases. It is still a determiner phrase, in the sense that it is headed by a quantificational element, but the complement of this element is not a simple noun phrase. Instead, it is a referential phrase, which itself introduces an “integral” relation as its complement. This relation is coded as a small clause, typically associating something like a noun to a classifier. So between the noun and the determiner there is much more than is usually assumed. I have not really given arguments for this view in this chapter but have done so in Chapters 10, 11 and will again in 15. It turns out to be crucial for the present analysis of names. I have also argued elsewhere for the semantics associated to (13). It has a lexical part and a logical part, the former responsible for its conceptual specifications and the latter for its intentional properties. The lexical part corresponds to the small clause, and can be conceived as an operation on an abstract space, possibly an n-dimensional manifold. This concept so arranged creates a predicate which holds of the variable that glues the expression together. The variable corresponds to the referential position in the syntax, itself the complement of the quantificational site. That bit, quantifier-variable-predicate, constitutes the logical arrangement of the expression. In the normal instance, to say that a nominal predicate holds of a variable entails that the speaker who invokes this predicate can use it to model something or other out there. The difference with standard analyses here is subtle but important. A predicate like dog does not denote the dog kind, the set of individual dogs, dogs in all possible worlds, or any such thing. For me, a predicate of that type is a complex array of spaces of different dimensions, all built on the basis of previous ones. That creates a mathematical apparatus – a topology of some sort – which may be used to model dogs. Once this door is open, one certainly wants a picture of what these spaces are, how they can be tinkered with, and so forth. I have hand-waved in the direction of manifolds because of the restrictiveness of Euclidean spaces. If pushed, I can even hand-wave in the direction of inter-modular connections between the linguistic and the visual systems (to distinguish spaces that bottom out as, say, red vs. blue) and the motor system (for quick and slow) and so on. Which means this is all a program, albeit a fairly clear one (see Chapter 15). The only reason that is interesting now is this: it is all irrelevant when it comes to names. That is the main idea of the chapter. A name does not have
any of these intricacies, and hence serves no modeling purposes. That creates a sort of elsewhere scenario for nominal predicate association to the relevant variable. When there is nothing to model because there are no toys to tinker with, then the nominal predicate arbitrarily picks out something out there, of which the predicate holds as a designation, not a description. That takes rigidity as the defining characteristic of a name. It is rigid because it does not have parts. In this way, we come up with an operational, dynamic definition of names and nouns. For instance, names to which a classifying device is added turn out to be descriptive, hence nouns at some level. This provides a solution to the problem of counterfactual identity statements. I will not review that solution again, but I do want to point out that I have kept in mind, in particular, Lewis’s (1973) notion of a counterpart. Of course, nothing in my analysis makes ontological commitments about alternative worlds – or even the present one. Indeed, nothing I have said bears much on what reference is, altogether. The chimerical Antony is a counterpart of the real Antony only in that the speaker somehow models an Antony that has some, but not all of Antony’s modes. It is a counterpart to Antony inasmuch as the syntax says so, and it says so by making the relevant expression a description based on the Antony space. I do not have a clue about this: what are those modes that we have to assume are Antony’s for him to still be Antony? Or, how many modes is enough? Or even, is this the right way to pose that question? So long as we understand that there is some relation between Antony and his modes roughly of the sort holding between the Greeks and people like Achilles or Odysseus, we are safe. The bottom line is: something having to do with how the Antony space is presented is enough for the speaker to “root” the name Antony in some appropriate individual. I am calling that process a modeling, and basically whatever it implies will do for my analysis of the Shakespearean quote, although of course I am very interested in the full details for the larger picture. For what it is worth, I think we are going to need something of this bizarre type anyway for fictional characters, but I doubt that anybody that has not been moved by what I have said so far will be moved by the rigidity of Pegasus or Santa Claus. I think it is there, and it certainly has nothing to do with objects in the real world. Of course, the modeling system presented here treats Santa, or for that matter Rudolf the Reindeer, with equal seriousness, and makes quite a bit of Rudolf not being Pegasus, and so on and so forth. Why bother with such arcane problems? Not because of their rarity, but because treating them pushes the theory in a direction that, for better or for worse, has much to say about intentionality and conceptualization. As a matter of fact, my approach purposely bites the bullet of separating intentional and conceptual representations. To my knowledge only an unwarranted assumption forces us to treat intentional and conceptual matters as part of, or the output of, the same level of representation. In “the other side of the grammar,” where we do treat matters of articulation and perception in a unified guise, we have a very good empirical reason to do so, the Motor Theory of Speech Perception. But in
“this side of the grammar” we lump together concepts and intentions, as far as I know, because a) they relate to “thought,” and b) we have always done so.12 I cannot finish without a philosophical reflection. The philosopher is not interested in rigidity as a linguistic phenomenon. What he or she wants are more serious things, like grounding objects, ultimately scientific ones, at least enough to avoid relativism. There is much to say about whether this is the way to address that question, or whether the question (serious though it is) is even meaningful. But be that as it may, this chapter has tried to show that rigidity is a defining property of names, and thus a linguistic phenomenon. To the extent that rigidly designated objects are themselves rigid in a sense that interests the philosopher, this is an issue of what counts as a valid model, or how the linguistic objects that we come up with are appropriately used to designate whatever they designate. There may well be an issue there, but if so I do not see that it is of much interest to the linguist, or that it pertains to the Shakespearean quote analyzed here, or (more importantly) that it helps us understand much about linguistic categories and how they get to categorize.
13 PARATAXIS † with Esther Torrego
1 Introduction We would like to explore briefly two sorts of sentential dependencies. The paratactic view holds the following. To assert that Galileo believes that the earth is round is to assert something akin to “Galileo believed that,” with the object of believe being cataphorically related to the separate sentence, “the Earth is round”. This approach goes back to Andrés Bello’s original insights, and is defended, classically, by Davidson (1967b). In turn, the hypotactic view is familiar to syntactic analyses stemming from Chomsky’s. This view contends that there is a single clause, a complement, which rather than being nominal is an entire clause. We will argue that both types of dependencies are realized in UG. We will concentrate here on two non-interrogative finite connectives from the Romance languages, in particular Spanish, que (that) and como (how).1 We believe that these two exemplify canonical hypotaxis and parataxis, respectively.
2 The distribution of como Descriptively, como has a far more restricted distribution than que. Clauses introduced by como can appear after the verb (1) but not before. That is, they cannot be subjects (2), topics (3) or left-dislocated constituents (4): (1) Verás/te darás cuenta como tu madre llevaba razón. “You will see/realize how your mother was right.” (2) que/*como la tierra es redonda es verdad. “That/*how the earth is round is true.” (3) que/*como la tierra es redonda, veréis algún día. “That/*how the earth is round you’ll see some day.” (4) que/*como la tierra es redonda (lo) veréis algún día. “That/*how the earth is round you’ll see some day.” Selection of como is also restricted in lexical terms. Nouns/adjectives (5) and prepositions (6) do not take como-clauses:
(5) a. No me gusta la idea/el hecho ?(de) que/*como . . .
I don’t like the idea/fact that/*how . . .
b. Estoy harto ?(de) que/*como . . .
I’m fed up that/*how . . .
(6) a. Para que/*como . . .
So that/*how . . .
b. Con que/*como . . .
Inasmuch as that/*how . . .
c. Desde que/*como . . .
Since that/*how . . .
d. Entre que/*como . . .
While that/*how . . .
As for verbs, several disallow como-clauses, for instance volitionals, factives, causatives: (7) Quiero/lamento/hice que/*como . . . I want/regret/caused that/*how. Note, also, that whereas there are various idioms of the form nominal-sentence with que, none comes to mind that invokes como: (8) a. Juan se tragó la bola de que/*como . . . Juan swallowed the ball of that/*how . . . “Juan believed the lie that . . .” b. Juan nos vendió la moto de que/*como . . . Juan to.us sold the scooter of that/*how . . . “Juan lied to us that . . .” c. Juan nos contó la película de que/*como . . . Juan to.us told the movie of that/*how . . . “Juan was bullshitting that . . .” So let us proceed tentatively under the assumption that these robust facts show, at least, two types of structures. Furthermore, recall that we want to argue that it is como structures that are paratactic. A strong prediction of this approach is that syntactic dependencies across como are barred, if parataxis involves two separate texts. This prediction is borne out, again with non-subtle data. Overt wh-movement is disallowed in the relevant contexts: (9) qué os enseñó que/*cómo estaba escribiendo? What did s/he show to you that/*how s/he was writing? Similarly, predicate raising across como yields ungrammaticality: (10) A punto de llorar vieron que/*cómo estaba! Ready to cry they saw that/*how s/he was! Likewise, “Neg”-raising and polarity items also show the opacity of the como-clause: (11) a. No verás que/*como diga la verdad jamás. Not will-see.you that/*how say.s/he the truth ever “You’ll see that she never tells the truth.”
b. No verás que/*como venga bicho viviente. Not will-see.you that/*how arrive bug living “You won’t see a soul coming.” More generally, a paratactic analysis predicts the absence of even weaker syntactic dependencies, such as bound variable binding. Again, the facts confirm this prediction: (12) a. Nadie ve que pro es tonto. Nobody sees that he is stupid. b. Nadie ve como pro es tonto. Nobody sees how he is stupid. While (12a) allows a variable reading, (12b) does not.
3 A possessive structure In essence, we would like to suggest that the sort of structure involved in the Spanish (1a) is akin to the one in (13): (13) You will realize/see the truth of your mother being right. This raises the question of what sort of specific structure (13) is. Perhaps obviously, it does not involve a relative clause (cf. *The truth which the earth is flat). However, we can show that it is not a standard Complex NP either, of the sort in (14): (14) John heard the rumor that the Earth is flat. Stowell (1981) argued that all “nominal complements” invoke a predication relation between the nominal and the clause. While this is essentially correct, some interesting differences arise between the two sorts of structures, as the contrasts in (15) suggest:2
(15) a. The truth is that the Earth is round.
a′. That the Earth is round is (only) the truth.
a″. *That the Earth is round is a truth.
b. The rumor is that the Earth is flat.
b′. (*) That the Earth is flat is (*only) the rumor.3
b″. That the Earth is flat is a rumor.
Note that these structures may or may not be transformationally related to (13) or (14). But it is a fact that the paradigm with truth and the paradigm with rumor differ, which indicates that we must distinguish two sorts of clausal dependencies on nominals. Largely for concreteness, we aim to capture the differences in association as follows. For rumor we will assume a standard merger analysis (16a). In contrast, we will argue that the structure relating truth to the CP is not a new category. Rather it is a segmental structure, as depicted in (16b):4
(16) a. CATEGORY
{X, {X, CP}}: [X CP [X rumor]]
b. SEGMENT OF CATEGORY
{⟨X, X⟩, {X, CP}}: [⟨X,X⟩ CP [X truth]]
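The set-theoretic contrast in (16) can be made concrete with a small computational sketch. This is only an illustration under our assumptions – the function names merge and adjoin are our own, and nothing in the code is part of the formal theory:

def merge(head, comp):
    # plain merger, as in (16a): the result is a new category,
    # and its label is simply the head X
    return {"label": head, "constituents": (head, comp)}

def adjoin(host, adjunct):
    # base-generated adjunction, as in (16b): the result is a segment
    # of the host category, labeled by the pair <X, X>, not a new object
    return {"label": (host, host), "constituents": (adjunct, host)}

rumor_category = merge("rumor", "CP")    # (16a)
truth_segment = adjoin("truth", "CP")    # (16b)
assert rumor_category["label"] == "rumor"
assert truth_segment["label"] == ("truth", "truth")

The only difference between the two objects lies in their labels, which is precisely what the category/segment distinction is meant to encode.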
In other words, we are proposing that (16b) involves a “base-generated” adjunction, or a small clause. Thus far we have only provided a structure for (13). We must next address the question of the other structures with be in (15) (assuming they are transformationally related to the structure in (13)), and also – for completeness – those in (17): (17) That the Earth is flat has (some) truth to it. (Cf. *That the Earth is flat has (some) rumor (to it).) We have introduced three types of structures. One involving a relational predication with truth (13), another one involving a be Auxiliary and raising of the truth (15a) and finally, a structure involving a have Auxiliary and clausal raising (17). In recent literature, there is a structure reminiscent of the one above, which also has three variants. Consider (18): (18) a. This child of mine. b. This child is mine. c. I have a child. It is intuitively clear that these three expressions should have a common source. Kayne (1994), building on ideas of Anna Szabolcsi, proposes that relational terms such as child come in different guises, depending on a variety of factors having to do with definiteness. For instance, the examples in (19) are out: (19) a. (*)A child is mine. b. *I have this child. Whatever this follows from, observe the similar facts in (20):
(20) a. The truth is that the Earth is flat.
a′. (*) A truth is that the Earth is flat.
b. That the Earth is flat has (some) truth to it.
b′. *That the Earth is flat has the truth (to it).
The particular structure proposed in the Kayne-Szabolcsi analysis involves two layers: an AgrP and a DP:
(21) … [DP D [AgrP Possessor [Agr′ Agr Possessed]]]
The important idea to keep in mind is that both possessor and possessed can raise to the specifier of D, and eventually to the matrix. When the possessed raises that far, Auxiliary be shows up. If the possessor does, the D element incorporates to the Auxiliary, and it is spelled out as Auxiliary have for irrelevant reasons (see Chapters 9 and 10). The point is that we can immediately accommodate the relevant facts in (13)–(17) to this sort of analysis. We have noted above that the basic relation between the CP and a DP like the truth is predicational, and have suggested a concrete structure for it. We must thus enrich the structure in (21) to something along the lines of (22), a proposal independently argued for in Chapter 10 for related structures:
(22) … [DP D [AgrP … [Agr′ Agr [XP CP [XP truth]]]]]
The intuition is that the following three expressions have the same structural underlying source (modulo definiteness): (23) a. The truth that the Earth is round. b. The truth is that the Earth is round. c. That the Earth is round has (some) truth to it. Suppose that (23b) is a raising counterpart of (23a), and (23c), in turn, is the analogue of derivations involving the morphological suppletion of “be+D” as have, as in (18c).5 We furthermore adduce the following data from Spanish:
(24) a. Juan explicó la verdad de que la tierra es redonda. Juan explained the truth that the earth is round. b. Juan explicó como la tierra es redonda. Juan explained how the earth is round. c. *Juan explicó como (de) que la tierra es redonda. Juan explained how that the earth is round. The gist of the proposal is that (24b) has the semantic interpretation of (24a), although it involves a rather different syntax. The hunch is that the sentential connective como induces the same sort of effects that the more complex la verdad does. However, important differences arise as well, as (24c) shows. Contrary to what we see in (24a), como is incompatible with que. In what follows, we argue that como is not a complementizer at all.6
4 Implementation of the analysis Etymologically, como derives from the Latin quod modo. It is then tempting to argue that como is bimorphemic in the synchronic system as well, involving a D and a predicative part. In the most radical version of our proposal, it is literally co- that surfaces as a D, while -mo would be the predicate.7 The architecture of the derivation then allows us to analyze (25b) as in (25a):
(25) a. … [DP [D co-] [AgrP … [Agr′ Agr [XP [CP tu madre llevaba razón] [XP -mo]]]]]
b. … como tu madre llevaba razón
how your mother was right
This predicts the absence of como idioms as in (8), since the -mo part occupies the lexical space of the nominal chunk of the idiom. Recall also that no overt complementizer can appear in the CP of como-clauses (24c). Of course, no overt complementizer appears in main clauses either. It is then possible that this is happening in this instance as well. The dependent clause is a root clause. Chomsky’s recent suggestion for why complementizers do not have a PF realization in matrix clauses has to do with the fact that, in general, lexical insertion after Spell-out is not an option, since the extension condition in (26) forbids it:8
(26) Extension Condition: A Generalized Transformation (GT) extends the entire phrase structure containing the target of GT. This restricts to the root node the possibility of inserting lexical material. In turn, if lexical material is inserted after Spell-out, the grammar cannot deal with its phonological features. Thereby, any such post-Spell-out insertions must involve no PF features. Finally, since involving no PF features is less costly than involving them, the radically null option prevails.9 One significant difference between clausal dependents of como and those of nominals such as la verdad “the truth” is that the dependent clause is in one instance associated to the genitive marker de, whereas in the other this is not the case: (27) a. . . . la verdad *(de) que la tierra es redonda the truth (of) that the Earth is round b. . . . como (*de) la tierra es redonda how (of) the Earth is round We think that this reflects a structural difference between the two, much along the lines of the contrast in (28): (28) a. . . . the sister *(of) John’s b. . . . John’s (*of) sister Within the Kayne-Szabolcsi analysis, (28) indicates different structural relations in the overt syntax. John’s is lower in (28a) than it is in (28b). If we apply this sort of criterion to our structures, we are led to the conclusion that the dependent clause in (27b) is structurally higher than the one in (27a). Within the minimalist system this can have only one cause. The higher element has had a reason to move by Spell-out, whereas the lower element has procrastinated. This implies that the moved element is attracted to a strong feature, not present in the instance without movement. Consequently, we must postulate a strong feature in structures with como, unlike in structures with la verdad (the truth). The natural step to take is to say that whereas the D element which we hypothesize for como structures selects for a functional category with a strong feature, the same is not the case for the D heading the other structures. The logic of the proposal makes one wonder whether the strong feature of the functional category hypothesized for como could not be licensing a null pro. Suppose it does.10 This makes a prediction. Dependent clauses introduced by como may have a null pro-like expression, relevantly licensed in discourse, just as null pronominals are in general. The same should not be true of clauses introduced by la verdad (the truth). Surprisingly, this obtains: (29) a. A: La tierra es redonda. The Earth is round. B: Ya verás como *(sí)! Indeed (you) will see how yes (You will see that it is indeed true.)
b. A: La tierra es redonda. The Earth is round. B: *Ya verás la verdad (de) (sí). Indeed (you) will see the truth of yes (You will see that it is indeed true.) The emphatic marker sí is necessary for this sort of sentence to be grammatical, as shown in (29aB). This suggests that yet a further category exists between Agr and the lower level of structure, and that it is this category which must be selected by D, across a semantically inert Agr. The natural candidate is Laka’s (1994) Sigma. The postulation of such a category has another advantage within the minimalist framework in Chomsky (1995b: Chapter 3). Strictly, Agr is not the locus of Case checking; rather Agr plus some other category is. Note that in the sort of structures we are hypothesizing the CP dependent is an argument of a predicate such as the truth or the -mo part of como. Assuming with Chomsky and Lasnik (1993) that all arguments need to check Case, it follows directly that such a CP must be in a position to check its Case by LF. This directly entails the existence of an extra category between Agr and the structure including the CP. Furthermore, Martins (1994) has argued that Sigma is the sort of category which is responsible for the traditional nominativus pendens. It is then reasonable to propose such an extension, which in turn motivates the presence of a lexically realized sí in (29b). Yet, (30) is not an option: (30) *Ya verás como la tierra es redonda sí. Already (you) will see how the earth is round indeed This suggests that the pro-clause element is licensed only if Sigma is specified for speaker-oriented features, such as those involved in the emphasis encoded by sí. We assume that this is an interpretative condition taking place after LF. If so, the following sentence is grammatical, indeed technically interpretable, but unintelligible, a straightforward possibility within the minimalist system:11 (31) a. La tierra es redonda. The Earth is round. b. #Ya verás como. You’ll see how. Here, the pro-clause after como cannot be interpreted in the absence of the emphatic, point-of-view-dependent sí. In turn, this suggests that the problem with (30) is the unnecessary spell-out of Sigma as sí. The matter directly relates to the familiar contrast in (32), analyzed in terms of economy: (32) a. John (*did) leave.12 b. *John not left. c. John didn’t leave.
It is reasonable to expect the emphatic sí to correlate with the emphatic do (as in Laka’s proposal). If this connection is granted, the matter of sí’s economy is very much the same as that of do’s, all other things being equal. In particular, there is no need for sí to be the spelled-out Sigma in (30), since, in our terms, there will be in fact no pro-clause to be licensed at LF.13 Finally, we propose that even in those instances where the CP is apparently associated to como in a rather direct way, this is only true in the LF component. In fact, it is always the case that como introduces a pro-clause item in the overt syntax. The gist of the analysis is that this pro-clause remains at the LF component if and only if it is appropriately licensed by a point of view element such as emphatic sí (as in (29b)). That is to say, when the pro-clause is so interpreted. However, in all other instances, a pro-clause is also generated in the initial phrase marker as the subject of -mo, ultimately moving to the Spec of Agr: (33)
[D′ [D co-mo] [AgrP pro [Agr′ Agr [XP t [XP t]]]]]
Two questions then arise: (a) “Why does it seem as if a whole clause is the dependent of como?” and (b) “Why can it not be the case that a real clause is generated in place of the pro-clause?” The answer to question (a) relates to instances of tough-constructions, as in Chomsky’s (1995b) analysis. The main feature of structures as in (34) below is that they involve a Generalized Transformation merging two different phrase markers: (34) A man [who t is easy Op PRO to please t] is easy Op PRO to like t In the minimalist system, there is no D-Structure level of representation. Therefore, it is possible (in fact, necessary in a case like (34)) to build separate phrase markers and merge them in the derivation. In the spirit of Lebeaux (1988), we propose that a Generalized Transformation is responsible for paratactic dependencies. In the initial phrase marker, a pro-clause occupies the place which is otherwise taken by an entire clause. It is this item that enters into the syntactic derivation, engaging in checking just as any other syntactic formative would. At LF, however, two options exist. Either pro remains as such (in which case a point of view salience
is necessary for interpretation), or else a separate sentence, literally a separate text, substitutes into the pro-clause (35):14 (35)
DP D
… D co-mo
AgrP Agr'
pro Agr
XP t
CP
XP t
As for question (b) (Why can a clause NOT be base-generated in place of pro?), Note the following property of paratactic dependencies. They need not invoke an overt Comp, and hence they cannot, within the logic of minimalism. To put it differently, if a main clause-like dependent is possible, then it must be chosen over a subordinate-like dependent. Economy alone grants this conclusion. In our terms, this means that, whenever possible, the grammar will prefer the presence of a pro-clause, instead of a full clause, perhaps much as overt pro-forms are avoided. Finally, the matter arises of why a pro-clause is impossible in instances with la verdad “the truth” or more generally hypotactic dependents. This is now a matter of pro licensing, as in (35). In our terms, pro-clauses are licensed only in the Spec of an AgrP associated to a strong, point-of-view dependent Sigma head. There are no pro-clauses elsewhere, anymore than pro items in general do not appear other than associated to AgrP Specs whose head has the appropriate characteristics (in terms of strength or whatever else is relevant). The reason why parataxis is so restricted is straightforward. It requires the presence of a pro-form, which is itself extremely restricted, in familiar ways. That is intended also as a general way of predicting (2) through (7). In all these contexts by hypothesis the relevant syntax is impossible. Thus observe: (36) a. (*La verdad the truth b. (*La verdad the truth c. (*La verdad día. the truth day
de) que of that de) que of that de) que
la tierra the earth la tierra the earth la tierra
es is es is es
redonda es un hecho. flat is a fact redonda aceptaréis algún día. round you’ll accept some day redonda lo aceptaréis algún
of that the earth is round
262
it you’ll accept some
PARATAXIS
d. No me gusta el hecho de (*la verdad) que la tierra es redonda. “I don’t like” the fact of the truth that the earth is round e. Desde (*la verdad de) que la tierra es redonda, . . . Since the truth of that the earth is round, f. Lamento (*la verdad de) que la tierra sea redonda. I regret the truth of that the earth be round Why point-of-view dependent Sigma heads are impossible in all these contexts is of no concern to us right now, although of course that issue is in itself ultimately very important. All that we are saying now is that absence of constructions with la verdad in these instances correlates with absence of como, which is natural in terms of the sort of syntax we have argued for.
5 Some further considerations on complementizers To conclude, note that given the logic of what we have said, complementizers which are not pronounced should outrank pronounceable counterparts (just as Sigma realizations have a PF matrix only if independently needed). However, consider (37): (37) a. Galileo believed the Earth was round. b. Galileo believed that the Earth was round. If the absence of a pronounced complementizer in (37a) were to be preferred, the option with the overt complementizer should be impossible, contrary to fact. This suggests that the null complementizer in (37a) has a PF representation, residual as it may be, and it is in fact a totally different lexical item from its overt counterpart. The point is that, inasmuch as the two complementizers are different lexical items, a comparison of derivations involving either of them would be illicit.15 Presumably, the null complementizer in (37a) arises as the result of cliticization to the matrix verb. Of course, in the matrix a null complementizer could not cliticize to any host, and a derivation involving it would crash. This is all to say that the facts in (37) do not involve parataxis, even in (37a), where the complementizer is missing. Rather, (37a) involves a clitic complementizer which is a different lexical item from a full version, and is possible only in instances where cliticization is independently allowed.16 In contrast, what we are suggesting for radically null complementizers is Chomsky’s proposal that elements with no PF features can be inserted in the LF component. These newly inserted null complementizers outrank their overt counterparts because they are taken to be the same lexical item. Then we must determine why clauses introduced by radically null complementizers (i.e. main clauses) can only substitute into the sites which we are hypothesizing as paratactic, and not into the sites which everyone assumes are hypotactic. But now the answer is clear. The relevant substitution is a textual 263
DERIVATIONS
reconstruction into a pro-clause, and only that (35). Therefore, once again, the distribution of pro-clauses holds the key to parataxis. This approach predicts some recalcitrant data suggesting that the phenomenon of null complementizers is not unified. We will venture this specific claim: while the complementizer in (38) is a clitic, the one in (39) is the result of LF insertion: (38) a. Deseo lleguen bien. I wish you arrive well. DESIDERATIVES b. Quiere no les falte de nada. He wants they miss nothing. VOLITIONALS (39) a. Dijeron habían llegado ayer. They said they had arrived yesterday. DECLARATIVES b. Lamento no estés contento con tu trabajo. I regret you are not happy with your work. FACTIVES The analysis predicts that the clitic complementizer should be restricted by the possibilities of cliticization, thus explaining the required adjacency between matrix and lower verb in (40):17 (40) a. Deseo (*los niños) lleguen bien. I wish the children arrive well. b. Quiere (*a sus hijos) no les falte de nada. He wants for his children nothing is missing. In contrasts, declaratives, epistemics and factives tolerate a preverbal subject after the null complementizer, as noted in Moll’s (1993) dissertation: (41) a. Decía los estudiantes apenas se habían quejado. He said the students had hardly complained. b. Lamentamos a tu hermana no le hayan dado el trabajo. We regret to your brother they haven’t given the job. c. Pensaba a ellos les iban a hacer este honor. He thought to them they were going to do them that honor. d. Dijo a su confesor le había de contar tales cosas. He said to his confessor s/he would tell him such things. Particularly interesting in this respect are instances of wh-extraction. The prediction is that movement across (radically) null complementizers should be barred, since such are, in effect, main clauses. In contrast, movement across clitic complementizers should be possible. We believe the prediction is borne out:18 (42) a. *Qué libro dijeron/pensaron/creyeron no habían leído? What book did they say/think/believed they hadn’t read? b. *Con quién lamentas/das por sentado hayan hablado? With whom do you regret/conclude they may have spoken? 264
PARATAXIS
c. Qué libro quieres/deseas/esperas hayan leído? What book do you want/wish/expect they may have read? Recall, at last, that epistemic and declarative verbs allow dependent clauses which may or may not be introduced by que. Why are overt complementizers even possible in these instances? Notice, crucially, that the cliticization option is not at stake in the relevant Spanish instances. Therefore, we are led to conclude that the contrast involves two entirely different lexical structures, one paratactic and one hypotactic.19 The tests introduced in Section 1 confirm this extreme hypothesis. The facts are as in (43) through (45): (43) Predicate raising: A punto de llorar pensaba/decía *(que) estaba. Ready to cry w/he thought/said *(that) s/he was. (44) “Neg-raising” and the licensing of negative polarity items:20 a. No pienso *(que) diga la verdad jamás. I don’t think *(that) he’ll ever tell the truth. (I think he won’t ever tell the truth.) b. No pienso *(que) venga bicho viviente. I don’t think *(that) a soul will come. (45) Bound variable binding demanding an overt complementizer: Nadie piensa (que) es tonto. Nobody believes (that) he is stupid. This is possible without the complementizer only if the embedded subject is read referentially. We must thus allow declaratives and epistemics in two distinct subcategorization frames. By hypothesis, one of these must involve the complex sort of structures we have tried to motivate here. But at the same time, we must allow these sorts of verbs to appear together with simpler clausal structures. The latter should be the source of hypotaxis.
265
14 DIMENSIONS OF NATURAL LANGUAGE with Paul Pietroski 1 Introduction Human language is manifested primarily through the one-dimensional channel of speech, in which temporal order reflects certain linear relations among the parts of complex expressions. To a large extent, the linguist’s task is to uncover aspects of grammar not manifested in the patent linguistic signals. For instance, a little reflection shows that language is at least two-dimensional, if only because hierarchical relations matter. A string of words like she saw the man with binoculars corresponds to more than one expression; and John thinks he likes spinach differs from he thinks John likes spinach in ways that go beyond mere differences in the linear order of constituents. But how far should such reflections be pursued? Do adjuncts, for example, “inhabit a different dimension” from arguments? Do causative verbs like kill exhibit a “higher” dimensionality than adjectives like dead? Or are these just metaphors? In this chapter we explore two related theses. The adjunct system is brutely concatenative and thus essentially flat (apart from asymmetries induced by the history of concatenation). But the thematic system brings in dimensionality – and, as a consequence, nontrivial asymmetries. If correct, these claims bear on many current topics, including the nature of the substantive and grammatical sub-systems of the lexicon, as well as the place and overall nature of lexicoconceptual notions within the system at large. It may also shed some new light into the ongoing debate between atomists (like Fodor) and those (like Pustejovsky) who advocate lexical decomposition. For while a proposal that systematically codes structurally rich notions into elements bearing thematic relations cannot be purely atomistic, inasmuch as those notions cut across the entire fabric of the linguistic system (and perhaps beyond), neither will it be decompositional in the usual sense.
2 The asymmetric nature of language Let us start by thinking about when talk of dimensions is appropriate. While physicists speak of space-time as having four (ten, twenty-one, . . .) dimensions, Euclidean geometry provides the most obvious examples. In terms of succes266
DIMENSIONS OF NATURAL LANGUAGE
sively lower dimensions, we contrast cubes with squares, line segments and points, or triangular pyramids with triangles, line segments and points, etc. As discussions of “Flatlanders” make vivid, one can capture all the facts about, say, two-dimensional (planar) objects without doing the same about threedimensional objects. Unfortunately, linguists find themselves in the role of Flatlanders. We experience a one-dimensional object and, through various sorts of tests, must somehow figure out what higher dimensions it may correspond to. While nothing guarantees success in that task, in other domains too the dimensionality exhibited by a given range of objects is far from obvious. For example, one can represent the numbers as points on a line, running east to west, with each point standing for a number greater than the number represented by any point to the west. Are all numbers formal objects of the same dimension? Arguably not, since there are more real numbers between 1 and 2 (in a sense made precise by diagonalization proofs) than positive integers. This difference in cardinality is masked with the “decimal point” notation system in which digits to the left of the point are associated with increasing positive powers of ten, while digits to the right of the point are associated with increasing negative powers of ten. This lets us say both that 101 is greater than 11 and that 1.01 is smaller than 1.1; and it lets us accommodate infinitely many real numbers between 1 and 2, like /2, that can only be approximated with (nonrepeating) decimal expansions. Correspondingly, the (one-dimensional) number line fails to encode many facts about numbers. If point P maps to /2 and point Q to , the distance between P and Q does not itself reflect the fact that the number corresponding to Q is twice the number corresponding to P.1 One can also speak of different dimensionalities, without considering sets of different cardinality, if one is thinking about differences between certain operations. While there is an intuitive sense in which subtraction, division and roots are “inverses” of addition, multiplication and powers (and vice versa), there is also an intuitive asymmetry here. If we start out thinking about positive integers, adding, multiplying or exponentiating will not force us to consider anything new; whereas subtracting, dividing and taking roots will lead (via expressions like “1 1,” “1 2,” “2/3,” or “兹1”) to zero, negative numbers, fractions or imaginary numbers. Familiar considerations, reviewed in the appendix, suggest natural ways of thinking about each expanded class of numbers in terms of different dimensionalities corresponding to an inverse of an intuitively more basic operation. We mention these points, first of all, as stage-setting for a discussion of some linguistic facts which suggest that natural language presents different kinds of unboundedness. In the most boring sense, sentences can be very very long because words like very can be repeated ad nauseam. We may call this iteration, a process which is easily describable in terms of simple-minded finite-state automata.2 A slightly less boring fact is that connectives like and, or, but, etc., in conjunction with finitely many “core” sentences, allow for endlessly many (more complex) sentences. Verbs that take sentential complements, as in Pat said that Chris believes that Tom sang, also introduce that kind of openendedness. In neither of those instances is it enough to invoke iteration, and we 267
DERIVATIONS
rather need some recursive mechanism of the sort provided by rewrite systems.3 In turn adjuncts (as opposed to arguments) may present an especially interesting case of unboundedness, even more so in the case of “disjunct” adjuncts. It is important to determine whether these kinds of unboundedness somehow relate to the matter of dimensionality. Moreover, we know that language manifests all sorts of asymmetries, hierarchies and sub-case conditions. Some are obvious like words are arranged into phrases. Others are more subtle, but have been explored by linguists over the years. For example Kayne (1994) has argued that asymmetries in mechanisms of phrasal ensemble result in particular differences in word order. Work by many in the psycho-linguistic arena demonstrates that children can somehow select a given structural sub-case as the initial hypothesis, and then learn the elsewhere case, if different from the initial hypothesis, in the presence of positive data (see Crain and Thornton 1998). If the relevant organizing force is language itself, these sub-case relations must reflect a cut in the very structure of the system. In the last half century or so, researchers have also found, on solid empirical grounds, thematic, aspectual, nominal or referential hierarchies. Of course for some purposes it may be enough to just describe these “levels” and either take them as primitive or blame them on some outside reality (e.g. by correlating the thematic hierarchy with causality). But if one wants to understand how come these hierarchies arise in natural language, and how they relate to one another and to other such asymmetries, including those found in language acquisition, one ought to go deeper. In our view, one should then ask whether what one might call “the asymmetric nature of language” somehow reflects the different dimensionalities in its very fabric. That question turns out to be specific. As an example of the general point, we consider below a much-discussed fact concerning causative constructions, namely, the “one-way” character of the typical entailments. For example: if x boiled y, then it follows that x did something that caused y to boil; but if x did something that caused y to boil, it doesn’t follow that x boiled y. Why should this be so? One can encode the facts by saying that x boiledT y means “x directly-caused y to boilI,” where subscripts indicate transitive/ intransitive forms of the verb and “directly-caused” is a term of art intended to capture the difference between “x boiledT y” and “x caused y to boilI.” But not only does this seem circular; it fails to explain a crucial fact, “Why is the entailment one-way?” A related fact, also requiring explanation, is that “x boiled y on Monday” fails to be ambiguous in a way that “x caused the soup to boil on Monday” is. We take this to be a special case of a more general question. Why does natural language exhibit the asymmetries it does?
3 “Accordion” events As many theorists have discussed (see, e.g. Parsons 1990),4 sentences like (1) Pat boiled the soup. 268
DIMENSIONS OF NATURAL LANGUAGE
have meanings that seem to be structured along lines indicated by (2) ∃e∃x{Agent(e, Pat) & R(e, x) & Boiled(x) & Theme(x, the soup)}; where “R” stands for some relation that an event e (done by the Agent) bears to the boiling of the soup, and “Boiled” captures the meaning of the intransitive verb in (3) The soup boiled. We assume that the meaning of (3) is correctly represented with (4) ∃e{Boiled(e) & Theme(e, the soup)} If (2) is true, so is (4), and arguably, this explains why (3) is true if (1) is. Following Chomsky’s (1995b) development, via Baker (1988), of Hale and Keyser (1993), suppose the syntax of (1) involves a hidden verbal element, like the overt causative element in many languages, with which the intransitive verb boiled combines. If the syntactic structure of (1) is basically (1S) {( Pat) [( v–boiledj) [ tj ( the soup)]]}, where the intransitive predicate boiled (which originally combines with an internal argument) raises to combine with the covert v (thereby forming a complex predicate that combines with an external argument) then the question of why (1) implies (3) reduces to the question of why (1S) has a meaning of the sort indicated by (2). And while there is room for debate about the details (including the nature of v in general), it is not hard to see how such a story would go, at least in outline. This does not, however, explain why (1) differs semantically from (5) Pat did something that caused the soup to boil. or (6) Pat made the soup boil, both of which can be true in cases where (1) is false. Suppose Pat is an arsonist who torches a house that, unbeknown to Pat, contains a pot of soup. As a result of Pat’s action, the soup boils. So (5) is true, and our judgment is that (6) is also true. Yet (1) is false. This shows that “R” cannot stand for a simple (extensional and transitive) notion of causation, thus raising the question of what “R” does stand for. But even if one has a hypothesis, say in terms of “direct” causation, that accommodates the facts, the question remains, “Why isn’t (1) synonymous with (5)? Why does (1) have a natural meaning that restricts it to a sub-class of the cases that would verify (5)?” Moreover, as Fodor (1970) notes, (7) Pat boiled the soup on Monday. is not ambiguous in a way one might expect given (1M). (7) cannot mean that Pat (directly) caused an “on-Monday boiling” of the soup, leaving open the possibility that Pat’s action occurred on Sunday; and 269
DERIVATIONS
(8) ∃e∃x{Agent(e, Pat) & R(e, x) & Boiled(x) & Theme(x, the soup) & On-Monday(x)} is not a possible meaning of (7). But neither can (7) mean that Pat acted on Monday and thereby caused the soup to boil, leaving open the possibility that the boiling of the soup did not occur until Tuesday. We take this as evidence that, for reasons that we return to, “R” stands for a whole-to-part relation. Call that Assumption One. The idea, developed in some form by many authors,5 is that if (7) is true, Pat is the agent of a complex “accordion-style” event whose final part is a boiling of the soup and whose first part is an action by Pat, and this event, which includes both Pat’s action and the boiling of the soup, occurred on Monday. Thus, we would specify the meanings of (1), (3) and (7) with (1M) ∃e{Agent(e, Pat) & ∃x[Terminator(e, x) & Boiled(x)] & Theme(e, the soup)} (3M) ∃e{Boiled(e) & Theme(e, the soup)} (7M) ∃e{Agent(e, Pat) & ∃x[Terminator(e, x) & Boiled(x)] & Theme(e, the soup) & OM(e)} where “Terminator” expresses a kind of thematic role. If an event x is the Terminator of an event e, then x “participates in” e by virtue of being e’s final part. This instantiates our Assumption One. As it stands, (3M) does not strictly follow from (1M), without a further assumption. The Theme of an accordion-style event e is the Theme of any Terminator of e. Call this Assumption Two, formally: Terminator(e, f) → [Theme(e, x) ↔ Theme(f, x)] This is a plausible assumption about natural language, if, as Tenny (1994) and others have argued, Themes “measure out” events with duration by somehow establishing their “end points.” We also return to this central assumption, although in less explanatory terms than we have for the first one. That handles some of the facts Fodor (1970) stresses; see also Fodor and Lepore (1998). If Pat sets the house on fire, thereby causing the soup to boil, it does not follow that there is any event e such as that Pat is the Agent of e and the Theme of e is some boiling of the soup, that is, there may be no single (accordion-style) event that includes both Pat’s action and the subsequent boiling.6 Still, Fodor’s main question remains. Why can (8) not mean that Pat is the Agent of an accordion-event that ends with a boiling of the soup on Monday? Why is (8M*) ∃e{Agent(e, Pat) & ∃x[Terminator(e, x) & Boiled(x) & OM(x)] & Theme(x, the soup)} not a possible meaning of (8)? 270
DIMENSIONS OF NATURAL LANGUAGE
If the syntactic structure of (8) is (8S) {( Pat) [[( v–boiledj) [ tj ( the soup)]]] (on Monday)} where the predicate “( ( v–boiledj) [ tj ( the soup j)])” combines with the adjunct “on Monday” to form a still more complex predicate (that combines with the external argument “Pat”), one can hypothesize that independent syntactic principles block the adjunct from combining with “v–boiled.” And if (8S*) {( Pat) [[( v–boiledj) (on Monday)] [ tj ( the soup)]]} is not a possible structure of natural language, that might well explain why (8) fails to be ambiguous in the way Fodor stresses. Of course, the question is to determine precisely why (8S*) is bad.
4 Two approaches to sub-event modification At this point we must extend our database in ways generally inspired by examples in Pustejovsky (1995), adapted to our purposes. Consider (9): (9) Jack grew shiitake mushrooms for weeks at a time in the 1990s. The art of mushroom growing involves spore inoculation into a log, which then just sits there for half a dozen years or more. All that most shiitake farmers do is wait for a good rain, and then mushrooms grow like, well, mushrooms. This normally happens twice a year. It seems that if (9) is true, then some event of Jack’s growing mushrooms lasted close to a decade. However, there are various subevents of growing mushrooms involved as well, each lasting less than a month. At first, this might suggest that one can (after all) use adjuncts to describe “internal events” that are integral parts of the larger matrix event. In support of this idea, one might also note a classic example from the 1960s: (10) The king jailed the prince in the tower. has a reading that seems to mean (roughly) “the king brought it about that the prince was jailed in the tower.” These facts are hardly positive for Fodor, since they suggest meanings that ought to be unavailable on his view. But they are also puzzling if (in reply to Fodor) one holds, like we have, that adjuncts cannot modify an incorporated element. On the other hand, (11) Jack grew shiitake mushrooms in 1995. is not ambiguous in the relevant way. It cannot be true if Jack inoculates his logs in 1993, dies in 1994 and the mushrooms finally come out in 1995. Which means there are two sorts of facts at issue. Modifiers denoting open temporal events (like for weeks) can, if appropriately chosen, be used to say something about sub-events. By contrast, modifiers denoting concrete times (like in 1995) modify the whole event. There are at least two approaches one can take to the facts just mentioned. 271
DERIVATIONS
One is to assume that some adjuncts can adjoin to verbs like “grew” (or adjectives like “jailed”) prior to incorporation, and then go along for the ride when incorporation occurs. Call that Hypothesis A. Another possibility, however, is that all these adjuncts are formally predicates of the matrix event (corresponding to the post-incorporation transitive verb), but some predicates apply to an accordion-event by virtue of how certain parts of that event are related to the whole (as per Assumption One). That would be Hypothesis B. To clarify Hypothesis B, consider an old puzzle concerning sentences like (12) Jack took the beans. (13) Jack took the beans intentionally. If Jack tried to take the beans and did so, (12) and (13) are both true. But if Jack successfully tried to take a box, which unbeknown to Jack contained the beans, (12) is true but not (13). This makes it hard to see how (13) could be true iff ∃e[Agent(e, Jack) & Took(e) & Theme(e, the beans) & Intentional(e)]. But the first part of an accordion-event will typically be some action such as an attempt to do something by the relevant agent. That is, for accordion events: Agent(e, x) → ∃a[Initator(a, e) & action-of(a, x)]. And suppose that actions are associated with propositional satisfaction conditions; see Pietroski (1998, 2000). Then, as a first-pass approximation, an accordion-event e is intentional if the condition associated with the action that initiates e is satisfied by the occurrence of e itself. In which case, an event of Jack taking the beans is intentional if it starts with Jack trying to take the beans, while an event of Jack taking the beans is not intentional if it starts with Jack trying to take a box. Similarly, one might say that a complex event e of shiitake-growing satisfies the predicate “for weeks at a time” if e has sub-event parts of the same sort each of which lasted for weeks. Again, this shows Assumption One at work, a single complex event, composed of multiple episodes of shiitake-growing-by-Jack, could be both “in the 1990s” and “for weeks at a time.” In either hypothesis A or B one has to understand how come certain adjuncts modify into sub-events, either through direct adjunction prior to incorporationto-v (in Hypothesis A) or by way of modifying the whole in a way that depends on certain part-whole relations (in Hypothesis B). In other words, in both instances adjuncts have to be selective, as a result of their lexico-semantic properties (say, whatever relevant point distinguishes for weeks from in 1995), much in the spirit of Ernst (2001). Hypothesis A, however, makes those selection properties relevant as the derivation unfolds, with modifiers becoming active pretty much at the point that the sub-event they modify becomes active. This view is obviously decompositionalist. In contrast, in Hypothesis B modifiers are active only “at the tip of the iceberg,” and manage to modify into the internal structure of complex predicates as a result of the part-whole make-up of these predicates. Of course, Hypothesis B is more congenial to an atomist treatment 272
DIMENSIONS OF NATURAL LANGUAGE
of lexical items than Hypothesis A is. (Thus our reply to Fodor is compatible with at least some of Fodor’s theoretical commitments.)
5 Context anchoring of sub-events We have seen sub-events active through the use of targeted modifiers, some of which may reach them. Other grammatical mechanisms, this time of an argumental sort, yield similar results and allow us to illustrate the role of Assumption Two. To see this, consider the effect of clitic climbing on sub-events. This is an ideal test-ground because, although pronouns can be standard arguments of given predicates, they climb when in clitic guise, in which case they end up introducing peculiarly narrower readings. Consider this situation. Somewhere, at high noon, a terrorist launches a missile that strikes a hospital at 12:20. In that time interval, doctors remove a patient’s heart (which was in its original place at 12:00) and store it in a laboratory at the other end of the hospital, while a new heart is placed (at 12:15) in the patient’s chest. The missile strikes and the following Spanish sentences are uttered in the news to describe the horrifying event: (14) a. El terrorista destrozó el corazón del paciente. the terrorist destroyed the heart of-the patient b. El terrorista destrozó su corazón del paciente. the terrorist destroyed his/her heart of-the patient c. El terrorista le destrozó el corazón al paciente. the terrorist DAT destroyed the heart to-the patient The question is, according to these reports, where did the missile hit? Was it the operating room where the patient was, or the laboratory where the original heart was at 12:20? It is not easy to translate those sentences into English, as they all mean “the terrorist destroyed the patient’s heart.” As it turns out, though, Spanish has a way of determining whether “the patient’s heart” in question is the one in the chest. This subtlety is not exhibited in (14a), which has pretty much the import of the English counterpart. It is not shown in the very stilted (14b) either, which would be the equivalent of the English “the terrorist destroyed the patient’s own heart.” Unfortunately this is of little help in this instance, as the heart stored in the laboratory is definitely, inalienably, indeed genetically the patient’s own, albeit non-functioning heart and the minute the new heart is connected in the patient, barring a rejection, that heart is also definitely, inalienably, if not genetically, the patient’s own. The important sentence is (14c). There is no doubt in this instance. The destroyed heart is the new heart, not the old one. The precise mechanics for why that is are immaterial (see Uriagereka 2001a). Descriptively, we must ensure that clitic le, referring to the patient, serves as contextual anchor for the heart’s destruction.7 If the destroyed heart is not just any old heart, but the heart at the patient, then the intended semantics would follow.8 But the important thing is this. The verb destroy has the rough logical 273
form of boil in the examples above, lexical differences aside. One must then ensure that the clitic le contextually anchors just the destruction part. At the time of the missile's launching by the terrorist (12:00), the heart which would be hit was still attached to the patient's body. Qua that causing event, then, le (the patient) as contextual anchor should be possible. The issue is how contextual anchoring of the embedded sub-event works. On an A-style explanation, each sub-event is syntactically active, and thus context-confinement via le can target the result sub-event without affecting its associated causing sub-event. Again, if this is the only possible analysis of the facts, it constitutes a counter-example to the atomist view. However, a B-style explanation is also possible. The key from that perspective is that we cannot just contextualize the causative part of the event, ignoring the rest of the accordion-event. Which heart was at the patient's chest when the accordion-event started is quite irrelevant to the event at large. The heart at the patient's chest when the event ended is what is crucial, even if, in the case that concerns us now, that heart was not around at the event's inception. The lexico-conceptual contribution of the heart is tied to the intransitive (pre-incorporation) verb "destroy." So the relevant heart is the one at the patient's chest when its destruction takes place, and, as per Assumption Two, this is the only theme for the accordion-event at large. Once again, this sort of approach saves the atomist perspective, as no direct contextualization of the internal event ever takes place in this view.
6 The internal make-up of events

To sum up what we have seen in these last couple of sections, note that either A- or B-style approaches to the possible modification or contextual anchoring of internal events require that these sub-events be somehow represented, either at some initial syntactic level or in logical form. The main point of this chapter is the nature of that representation. Our suggestion is that it has dimensional characteristics. Intuitively, if you detect a causative layer in an accordion-event, you a fortiori know that there are lower layers, in much the same way that an n-dimensional apparatus underlies an n + m dimensional one. That sort of reasoning is direct for A-style approaches, as the different syntactic layers correspond to the various dimensions. But even in B-style approaches, mindful of atomistic considerations, the reasoning follows. After all, what allows us to recover the presence of information in the "viscera" of an event is its specific information make-up. This make-up may not be syntactic, in the sense of being available for various syntactic operations; still, it has enough properties to support modification (by appropriate adjuncts) and various sorts of contextual anchors. We think this kind of make-up suggests that the relevant expressions exhibit dimensionality that is not initially obvious.9

The main argument for our claim stems from the fact that it provides an explanation of what we have called Assumption One. To repeat both it and Assumption Two:
Assumption One
If an event x is the Terminator of an event e, then x "participates in" e by virtue of being e's final part.

Assumption Two
The Theme of an accordion-style event e is the Theme of any Terminator of e.

Consider the intuition that events (as opposed to states) introduce the idea of change over time, and processes somehow "extend" that idea to introduce Agents responsible for the change; see Mori (forthcoming) for this general sort of idea, and references. Modulo concerns of the sort raised by Fodor (discussed above), the inferential relations among

(15) Jack opened the door,
(16) The door opened,
(17) The door was open,

suggest a hierarchical relation among the adjectival, intransitive and transitive forms of open. To a first approximation, one can think of the adjective open as a predicate of individuals, thus representing the meaning of (17) with "the(x):door(x)[Open(x)]." Or one might think of the adjective as a predicate of states, conceived as eventualities that "hold" through time (whereas events "culminate" with their themes being in certain states), rendering the meaning of (17) with "∃s[Open(s) & the(x):door(x)[Theme(s, x)]]"; see Parsons (1990, 2000) for discussion. Either way, one can go on to think of events as changes (of state) in individuals, thus treating intransitive verbs as predicates of changes, and accordion-events as processes that terminate in an event of some individual coming to be in the relevant state.

Correspondingly, one might think of all predications as involving (at the very least) ascription of a property to an object y. A more sophisticated and informative predication would have the implication that y was not always open; it underwent some change of state (over time). We could represent this kind of predication with "∃e∃s[Open(s) & Theme(s, y) & Change(e, s)]," where "Change(e, s)" means that e is an event in which the theme of s, and thus the Theme of e, comes to be in state s. A still more sophisticated and informative predication would have an implication concerning the source of the relevant change, and thus implicate another event participant – namely, the Causer. We could represent this kind of predication with "∃e{Agent(e, x) & ∃f[Terminator(e, f) & ∃s[Change(f, s) & Open(s)]] & Theme(e, y)}." Clearly, (18c) implies (18b), which implies (18a):

(18) a. ∃s Open(s)                                                        STATE
     b. ∃f∃s[Change(f, s) & Open(s)]                                      EVENT
     c. ∃e{Agent(e, x) & ∃f[Terminator(e, f) & ∃s[Change(f, s) & Open(s)]]}   PROCESS
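The one-way implications in (18) can be made mechanically vivid. Here is a minimal sketch of our own (the predicate strings are informal shorthand for the formulas above, not part of any formalism): if each description is modeled as a set of conjuncts in the scope of existential closure, the implication from (18c) to (18b) to (18a) reduces to set containment.

# Descriptions as sets of atomic conjuncts; entailment as containment.
# The predicate labels are shorthand for the formulas in (18).

STATE   = {"Open(s)"}
EVENT   = STATE | {"Change(f, s)"}
PROCESS = EVENT | {"Agent(e, x)", "Terminator(e, f)"}

def entails(description, consequence) -> bool:
    # Dropping conjuncts (in the scope of an existential closure)
    # preserves truth, so a superset entails its subsets.
    return consequence <= description

assert entails(PROCESS, EVENT) and entails(EVENT, STATE)   # (18c) -> (18b) -> (18a)
assert not entails(STATE, EVENT)                           # the reverse fails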
It does not immediately follow that the relevant representations should be dimensionally expressed. One could, for example, just state the facts in terms of "meaning postulates" and leave it at that. But this leaves the question of why, in addition to the brute hierarchical array in (18), the more subtle central facts embodied by Assumptions One and Two should also hold.10

The central intuition behind Assumption Two is the idea that an eventuality is somehow built around a "lower" Theme. In particular, the thesis

Terminator(e, f) → [Theme(e, x) ↔ Theme(f, x)]

states that if we extend a simple event f into a process e, then the Theme of e just is the Theme of f. That does not follow on any simple-minded interpretation of the hierarchy in (18). Why could it not be, for instance, that if we extend f into e then the Theme of e is an entirely separate entity? Hierarchies, as such, are a dime-a-dozen. For example, one can hierarchically arrange straws according to their length, without any given straw standing in any particularly interesting relation to any other. Accounting for Assumption Two evidently requires a hierarchy that arises because its levels are, in some sense, defined in terms of dimensions involving the Theme space. Similar considerations apply with regard to Assumption One. Nothing in the sort of hierarchy loosely represented in (18) entails anything with respect to a given sub-event being part of a larger event. Again, in the straw hierarchy we just mentioned there is no sense in which a small straw is part of a larger one. Yet without our narrower assumption, coupled with the other assumption just discussed, we would not be able to address Fodor's important concerns, and our theory would be wrong.

To repeat, we are not saying that the tight nature of the hierarchy hinted at in (18) must be dimensionally represented. There are other, more or less cumbersome, ways of adding the extra assumptions. Indeed, one could just list them. For now, our point is more modest. A dimensional view of the hierarchy would fit nicely with our (empirically motivated) Assumptions One and Two.

Even that last sentence ought to be clarified. Nothing that we have said forces Themes, specifically, as a foundation for the dimensional system. But it is easy enough to see the kind of extra step one would need in order to make Themes privileged: for instance, making the very definition of a verb work around the designated Theme role, in much the same way as a dynamic function (e.g. the derivative over time of space, that is, velocity) is built around some static space. But that extra step is not formally central to the concerns of this chapter. One could have had a dimensional approach to the issues of concern now without events being built around themes. The fact that they are still needs an explanation, though we will not provide one in this chapter.
7 The locus of dimensional shifts

We have suggested that the apparatus implicit in (18) is, in fundamental respects, like the one underlying familiar notions from geometry or arithmetic.
An analogy is that processes are to events are to states as cubes-over-time are to cubes are to squares:

Process(e) → ∃f[Terminator(e, f)] → ∃s[Change(f, s)]
4-Dimensional(x) → ∃y[Temporal-Projection-of(x, y)] → ∃z[Depth-Projection-of(y, z)]
But even if something along these lines is correct, one wants to know whether specific lexical items (or something else in the grammar) are responsible for these dimensional shifts. On the neo-Davidsonian account defended here, one specifies the meaning of (7), repeated below,

(7) Pat boiled the soup on Monday.

as:

(7M) ∃e{Agent(e, Pat) & ∃x[Terminator(e, x) & Boiled(x)] & Theme(e, the soup) & OM(e)}

The transitive verb (derived from the intransitive) is a monadic predicate of events, as is the partly saturated phrase on Monday. Likewise, Agent(e, Pat) is a monadic predicate of events. Pat, like Monday (the object of a preposition), makes its semantic contribution as the argument to a binary predicate that expresses a relation between events and other things. Similar remarks apply to the soup. As is often noted, thematically elaborated event analyses treat arguments and adjuncts on a par, since both are treated as conjuncts of a complex event description. Indeed, verbs are treated the same way. Whatever special role verbs may play in sentence formation, for purposes of interpreting the sentence formed, verbs are (like arguments and adjuncts) treated semantically as conjuncts of an event description.

This is a simple and fairly radical idea. For the suggestion is that, modulo an occasional existential closure, phrase markers are interpreted as conjunctive predicates. This requires that arguments like Pat and the soup be interpreted via thematic roles, as by themselves they are not predicates of events. Thus, neo-Davidsonians are committed to a limited kind of type-shifting. When Pat appears as the subject of a verb like the transitive boiled, it is interpreted as the monadic event predicate "Agent(e, Pat)" – or, making the argument position more explicit, "∃x[Agent(e, x) & Pat(x)]"; and when the soup appears as the object of such a verb, it is interpreted as the monadic event predicate "Theme(e, the soup)" – or, making the argument position more explicit, "∃x[Theme(e, x) & the-soup(x)]." In this sense, arguments and thematic roles introduce a twist to the basic compositional apparatus. This is unsurprising, in so far as event analyses are designed to account for the compellingness of inferences involving adjuncts, like "Pat boiled the water on Monday, so Pat boiled the water," as instances of conjunction-reduction (in the scope of an existential closure; see Pietroski (forthcoming a, b) for details).
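The conjunctivist treatment of (7M) can likewise be illustrated with a toy model (our own sketch, with hypothetical attribute names): verb, arguments and adjunct each contribute a monadic event predicate, and dropping a conjunct (conjunction reduction) preserves truth.

# Treating verb, arguments and adjunct uniformly, as in (7M):
# the sentence is true of e iff every conjunct holds of e.

event = {            # a hypothetical boiling event, for illustration
    "agent": "Pat",
    "theme": "the soup",
    "terminator_kind": "Boiled",
    "day": "Monday",
}

conjuncts = [
    lambda e: e["agent"] == "Pat",                 # Agent(e, Pat)
    lambda e: e["terminator_kind"] == "Boiled",    # Terminator(e, x) & Boiled(x)
    lambda e: e["theme"] == "the soup",            # Theme(e, the soup)
    lambda e: e["day"] == "Monday",                # OM(e)
]

def true_of(e, description) -> bool:
    return all(p(e) for p in description)

assert true_of(event, conjuncts)
# Conjunction reduction: drop the adjunct conjunct OM(e), and truth is preserved.
assert true_of(event, conjuncts[:-1])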
We find it significant that no language we know of has lexical items synonymous with the (metalanguage) expressions "Theme," "Agent," "Benefactive," and so on. One can say that there was a boiling of the water by John; but "of" and "by" do not mean what "Theme" and "Agent" mean. This is of interest. Languages have words for tense, force indicators, all sorts of arcane quantifications and many others. Yet they do not lexically represent what seems to be a central part of their vocabulary. Similarly, case-markers are not correlated with θ-roles (except, perhaps, for a handful of restricted, so-called lexical cases). Thus the accusative him can bear all sorts of θ-roles:

(19) a. I like him.
     b. I lied to him.
     c. I believe him to be a genius.
     d. I literally used him as a counterweight to lift the piano.
     e. I lifted the piano with him as a counterweight.
This is typical, and so familiar that it is hardly ever noticed. Why is there no language that distinguishes him in, say, (19a) and (19e) as in (20)?

(20) I like theme-him, but I used instrumental-him to lift the piano.

It is not clear to us why most familiar analyses do not predict some variant of (20), with either morphemes attached to him or entirely separate words in each instance (with a paradigm for pronouns of the sort witnessed for person, number, gender, definiteness, and so on). We think this sort of fact reveals a simple truth. θ-roles are not part of the object-language. This makes perfectly good sense in the neo-Davidsonian view, where the normal mechanisms for composition that language has are utterly trivial, dull predication. But as we just saw, language has a mechanism for "stepping out" of this simple-minded predicative routine. And perhaps this is relevantly like the way that "inverting" arithmetic operations can lead to "stepping out" of a given domain (see Appendix). θ-roles let speakers use otherwise simple linguistic expressions, initially "designed" for simple predication, to describe domains (and causal relations) with elaborate structures. More technically, we take a θ-role to be a type-lifter, which raises the type of an argument to that of a predicate, which can then relate appropriately to other predicates. Whenever the system uses one of these type-lifters, bona fide functions with an argument and a value, it must step out of its boundaries. This is empirically reflected in the fact that these devices are not syntactic formatives. And it strengthens the argument for the dimensional view. It is natural to think of these external-to-the-lexicon items as co-extensive with dimensional cuts (a view first presented in Mori 1997).11
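The type-lifting idea admits a direct functional rendering. In the sketch below (our own; the role names and the dictionary encoding of events are hypothetical), a θ-role is a function from an individual to a monadic predicate of events, which can then conjoin with the verb's own event predicate.

# A θ-role as a type-lifter: from an individual to a predicate of events.
# Nothing here belongs to the object language -- exactly the point in the text.

def theta(role):
    """Lift an individual x to the monadic event predicate 'role(e, x)'."""
    def lifter(x):
        return lambda e: e.get(role) == x
    return lifter

Agent, Theme = theta("agent"), theta("theme")

# 'Pat' and 'the soup' enter as individuals; the θ-roles turn them into
# event predicates that conjoin with the verb's.
boiling = {"kind": "boiling", "agent": "Pat", "theme": "the soup"}
description = [lambda e: e["kind"] == "boiling", Agent("Pat"), Theme("the soup")]

assert all(p(boiling) for p in description)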
8 The place for adjuncts

We have only given plausibility arguments for dimensions. It is natural to have them as points where θ-roles come in, and the resulting representations are
tightly articulated in interesting ways (with part-whole implications built around the notion of a theme space). But in truth those same results could be achieved in other ways, perhaps less elegantly. In this section, we would like to explore the possibility of a stronger argument. It will be even more speculative, but perhaps also more tantalizing in its form.

The issue is where adjuncts fit in the general minimalist picture of grammar. In a "bare" phrase-structure system of the sort explored by Chomsky (1995b), one can easily define heads, their complements and their (multiple) specifiers, but adjuncts are a nightmare to define. Similarly for other important works, which fit adjuncts in at the price of assimilating them to specifiers (Lasnik and Saito 1992; Cinque 1999) or conversely (Kayne 1994). Adjuncts are special in other respects as well, having no complex syntactic properties to speak of. Pure adjuncts are not selected, do not move to check features, do not get bound or bind, do not surface as empty categories (hence do not control and are not controlled). In addition, they disallow movement across them, and their wh-movement (if present at all) is destroyed by the weakest of islands.

Uriagereka (2001b) develops an argument, suggested by Chomsky in recent class lectures (Spring 2001) and reminiscent of ideas presented in Lebeaux (1988), that adjuncts inhabit their own (especially simple) dimension. Suppose that phrase-markers with adjuncts cannot be appropriately labeled, since labeling mechanisms reflect finitely many different types of core linguistic relations.12 In essence, a verb-phrase does not change its type because of relating to some adjunct, and this sort of relation is in principle unbounded. If adjuncts do not have labels, how does the system tell apart one syntactic object X with adjunct Y from the same syntactic object X with adjunct Z? In one sense, it ought to be simple. The adjunct, after all, is there. And from a neo-Davidsonian semantic perspective, the adjunct is indeed "just there," as a mere conjunct (among potentially many) in an event description. But it is not clear what formal properties (if any) the grammar tracks when dealing with the relevant object, X associated to Y or to Z, if (by hypothesis) it is not tracking sub-parts of the object by their labels. Moreover, one can keep adding modifiers, as in (21):

(21) Beans grew for weeks, for years, for decades.

Suppose we code each modified sub-expression with a number, so that beans grew for weeks is labeled "1," beans grew for weeks, for years is labeled "2," beans grew for weeks, for years, for decades is labeled "3," and so on. Then the algebraic structure of these syntactic objects, resulting from unbounded modification, will be like that of the numerals that stand for positive integers. Next consider (22):

(22) Jack [grew beans for weeks, for years, for decades …] twice, three times, four times …

Imagine Jack living long enough to have grown beans for weeks at a time for years, and to have done this for decades, etc., and to have done all that twice,
and to have done all that three times, etc. Continuing with the notational task, let us code each new modification (at the next level) with a second numeral in an ordered pair, so that Jack grew beans for weeks, for years, for decades twice is labeled "(3, 1)," Jack grew beans for weeks, for years, for decades twice, three times is labeled "(3, 2)," etc. Now the algebraic structure of the relevant syntactic objects will be like that of the numerals that stand for rational numbers (3/1, 3/2, …), and as discussed in the Appendix, this is plausibly viewed as a dimensional difference.

That suggests, though it does not prove, that adjunctions stem from a very simple (perhaps brutely concatenative) method of expanding symbols (unlike the richer and more constrained hierarchical thematic system). In essence, within the confines of a given dimensionality of lexical expression, adjuncts are just derivationally added, without ever being attached to the phrase-marker. No issue, then, arises about their bare phrasal representation. Similarly, their discontinuity and lack of transformational syntax follow. In turn, their semantic scope for the purposes of compositionality ensues from the sheer order of activation in the derivational workspace.
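The numeral coding just described is easy to spell out. The following sketch is ours, and the coding function is purely illustrative: one level of stacked adjuncts behaves like a positive integer, and a second level of modification yields ordered pairs with the algebra of the rationals.

# Coding unbounded modification, as in (21)-(22): stacking adjuncts at one
# level counts like the positive integers; a second level of modification
# yields ordered pairs, read as rationals.

from fractions import Fraction

def code(level1_adjuncts, level2_adjuncts=None):
    """'beans grew for weeks, for years' -> 2; three level-1 adjuncts plus
    'twice' on top -> the pair (3, 1), read as the rational 3/1."""
    n = len(level1_adjuncts)
    if level2_adjuncts is None:
        return n
    return Fraction(n, len(level2_adjuncts))

assert code(["for weeks", "for years", "for decades"]) == 3
assert code(["for weeks", "for years", "for decades"], ["twice"]) == Fraction(3, 1)
assert code(["for weeks", "for years", "for decades"],
            ["twice", "three times"]) == Fraction(3, 2)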
9 Infinite regress

The last comment we made in the previous section, about the scope of modification, ought to provide another test scenario for the idea that adjuncts are very different from arguments. Suppose that two (or more) adjuncts are simultaneously activated in a derivational workspace. Then they ought to show no relative scope differences. Examples of that sort exist in natural language, and occasionally go by the name of "disjuncts." Thus, for instance, aside from the obvious differences in meaning between (23a) and (23b),

(23) a. Lawyers behave nicely rudely,
     b. Lawyers behave rudely nicely,

there is a certain, open-ended reading for which those two sentences mean the same thing, something paraphrasable as: lawyers behave nicely, rudely …, rudely, nicely … who knows? both ways, as life is messy in court. Disjuncts, aside from having a peculiar intonation, must come to the right of the head they modify (cf. *nicely, rudely, … lawyers behave).

This allows us to construct a more radical sort of test for the view that different sorts of adjuncts associate with different dimensionalities. It has to do with the possibility that sentences containing adjuncts could be, in some non-trivial sense, infinitely long. If that were the case, as Langendoen and Postal (1984) have shown, then the class of sentences involving these disjuncts would be transfinite, thus of a different dimensionality from that of sentences not involving them. We emphasize the latter point because it makes no sense, in our terms, to have sentences with infinitely many arguments, assuming that these require transformational syntax to converge (e.g. in terms of Case assignment). A transformation maps a definite input to a definite output, thus cannot involve infinitely long inputs or
outputs. However, disjuncts (and adjuncts more generally) do not, by hypothesis, require transformational syntax to converge. If so, all bets are off with regard to them (at least in these particular terms) and the size of the sentences that bear them.

It is virtually impossible to establish whether there are sentences of infinite length, given contingencies about both human existence and the linear nature of our phonetic system. However, it might be possible to test the idea with regard to pure LF representations, with no associated PF and hence no obvious need to be linear. One particularly interesting ground on which to construct an experiment comes from domains in the literature where we know that infinite regress puzzles arise in ellipsis. We can see what sort of intuition speakers have about these when they involve disjuncts and when they involve arguments. Consider these Antecedent Contained Deletion (ACD) examples:

(24) a. Inevitably Monday follows every Monday that Sunday does.
     b. Inevitably Monday follows Monday, which Sunday does as well.

meaning "a Monday follows every Monday a Sunday follows" and "a Monday follows a Monday, which Sunday follows too." For these readings to be possible, the direct object (every) Monday must be capable of scoping out of the verb-phrase, so that the elided material (follows Monday) does not involve the elliptical phrase contained in its very antecedent (that/which Sunday does), or else in this phrase too we would have to recover the ellipsis, and so on, ad infinitum. (24a) is an example of the sort first discussed by May (1977), and (24b) one discussed by Hornstein (1995a) and Lasnik (1999). In both instances some transformation carries the material we do not want within the ellipsis out of the antecedent domain: Quantifier Raising in (24a) and standard A-movement for Case/agreement checking in (24b). In this context, though, we are interested in seeing what happens when we do not provide a mechanism for that, and so in fact force an infinite regress.

The verb follow is normally interpreted transitively. However, it also has an intransitive reading (as in a news bulletin follows), which allows us to build a test case when Monday is interpreted as a bare nominal adverb, with the meaning on Monday. Observe then (25):

(25) Inevitably Monday follows (on) every Monday that Sunday does as well.

This is possible with the reading "Monday follows (intransitively) on every Monday that Sunday (transitively) follows." This much is not surprising, as the quantifier adjunct every Monday can in principle move out of the verb phrase through Quantifier Raising. (26) is more interesting:

(26) Inevitably Monday follows (on) Monday, which Sunday does as well.

In spite of involving no Quantifier Raising (Monday is not a quantifier) or any movement for the purposes of Case/agreement checking (Monday is not an argument in this instance), the sentence has a meaning along the lines of
"Monday follows (intransitively) on Monday, which Sunday (transitively) follows." Note that we elide just "follows," and not "follows (on) Monday," thus again avoiding the infinite regress. Apparently that is possible with adjuncts. One can target the ellipsis of an X to which some adjunct Y has adjoined without having to include Y as well.13

Now we are ready for the test case, involving disjuncts. The relevant example is (27), where we have added the word etc. in order to suggest an open-ended, rising intonation on as well, thus at least allowing the disjunct interpretation:

(27) Inevitably Monday follows (on) Monday, which Sunday does as well, etc.

For speakers we have consulted, this can roughly mean "Monday (intransitively) follows on Monday, which Sunday (transitively) follows as well on Monday, which Sunday (transitively) follows as well on Monday, etc." It is a sort of interpretation that somehow invokes the infinity, indeed monotony, of time. The sentence, which is a generalization, is false if taken strictly (assuming time will come to an end), though perhaps it is still felicitous as a generic claim whose exact truth-conditions would be hard to pin down precisely.

Having seen an acceptable, interpretable (in the sense that some well-known Escher lithographs are) sentence involving an infinite regress with adjuncts, we must now consider what happens when arguments are involved. For that we have to find a situation where the argument does not have an "escape hatch" by way of either Quantifier Raising or any other movement. Observe (28), where which crucially modifies Monday and not the day before Monday:

(28) Inevitably Monday follows the day before Monday, which Tuesday does too (etc.).

In this instance there is no Quantifier Raising, and if A-movement carries something out of the VP, that ought to be the day before Monday. This, though, does not provide a relevant ellipsis. In particular, there is no way to get the meaning "Monday follows the day before Monday, the day before which Tuesday follows too." That is expected, but why can the sentence not just mean "Monday follows the day before Monday, the day before which Tuesday follows too, the day before which Tuesday follows too, the day before which Tuesday follows too, etc."? Arguments disallow infinite regress.

It may be hard for speakers to force the which in (28) to go with Monday instead of the day before Monday. Since otherwise the test is irrelevant, consider also (29):

(29) Inevitably male descendants follow the family of their ancestors, …
     a. … who of course female descendants do too.
     b. … which of course female descendants do too.

The claim is that (29b) is better than (29a), even though there are, in principle, infinite regress readings for both of these sentences (thus for (29a), "male
descendants follow the family of their ancestors, who female descendants follow the family of, who female descendants follow the family of, who …"). By way of movement of the family of their ancestors, (29b) does not need to go into an infinite regress to be grammatical with the meaning "male descendants follow the family of their ancestors, which female descendants follow as well." But that option does not exist for (29a), hence an ungrammaticality ensues. This shows again that infinite regress is impossible with arguments.

The latter conclusion is of course not novel, although the possibility, highlighted in (27), that such regresses are possible with adjuncts (and more specifically disjuncts) is, so far as we know, an original claim. For our purposes, the point is that a system that allows infinitely long expressions ought to be of a higher dimension. Since the argument we have just provided can (in principle) be repeated at any level at which disjuncts are possible (the domains that θ-roles/type-lifters define), we expect each of these levels to correspond to a different dimension.

We should clarify that, of the three types of unboundedness mentioned at the outset, the first and third kinds might actually be related. Iterativity is easy to model in a finite-state automaton, which cannot model recursion, and recursion is easy to model in a phrase-structure grammar, which cannot strictly model anything involving strings of infinite length, of the sort we have just seen. Still, it is interesting to note that iterative structures, like very very difficult, do not ascribe meaning to each very (created by a separate loop in the system). There is no real meaning to the idea that a very very very difficult proposal is only (say) three fourths as difficult as a very very very very difficult proposal. Both of these are just emphatically difficult proposals, period (the amount of emphasis being a function of the speaker's passion, dullness, stuttering or whatever). But this arguably means that what we need in this instance is a very loose system, perhaps not even something as fancy as a loop; possibly modifications are "just there" and do not add much to meaning because, strictly, they are not being added to the semantic representation. Be that as it may, this opens the question of what, precisely, modification is (see Uriagereka 2001b on this), and suggests that the open-endedness we are now examining is related to iterativity, the latter being just a trivial sub-case of the former. What that open-endedness really amounts to, especially in the case of disjuncts, is something we will not go into now.
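The point about iterativity can be made concrete with a one-loop finite-state recognizer (a minimal sketch of ours): the loop accepts any number of occurrences of very, but nothing in the machine assigns each pass a distinct compositional meaning.

# A one-loop finite-state recognizer for 'very* difficult': iteration is
# trivially modeled, but each pass through the loop adds no compositional
# meaning -- the result is just an emphatically difficult proposal, period.

def accepts(tokens) -> bool:
    state = "start"
    for t in tokens:
        if state == "start" and t == "very":
            state = "start"          # the loop: any number of 'very's
        elif state == "start" and t == "difficult":
            state = "done"
        else:
            return False
    return state == "done"

assert accepts("very very very difficult".split())
assert accepts(["difficult"])
assert not accepts("difficult very".split())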
10 Some conclusions

We have shown that a dimensional interpretation of familiar hierarchies is at least possible and perhaps even plausible. We have done this within two closely related views, stemming from a minimalist syntax and a neo-Davidsonian semantics. In essence, we take adjunction to be a more basic syntactic relation than Merge, and correspondingly, predication to be a more elementary semantic notion than θ-role assignment.

We have not explored the syntax of either of these notions in any detail, but
the assumption has been that adjunction is flat, whereas hierarchies emerge as a result of argument taking, for which the system steps out of its boundaries, quite literally, creating new representational spaces or dimensions. It is through dimensional shifts, associated with argument taking, that asymmetry enters the picture, thus predicting characteristic entailments of the sort analyzed in this chapter. We also mentioned, though have not examined, how this kind of imbalance should be responsible, in the end, for just about any area in language where some sort of hierarchical, sub-set or otherwise asymmetric relation obtains (see Chapter 15).

In the last couple of sections we have explored the intriguing possibility that, within dimensions induced by arguments, adjuncts just "sit there" with no real syntax to speak of and, as a consequence, the possibility of not just unbounded, but also infinite expressions ensues, at least for disjuncts. The latter result is obviously tentative, but potentially very significant, since if true it would suggest, in the spirit of conclusions reached by Langendoen and Postal (1984) for the system at large (which we do not embrace), that a full representation of at least disjunct properties is impossible (in computational terms). For that chunk of language responsible for disjuncts it is possible that expressions exhibit systematicities, but not strictly compositional properties. We would not be surprised if, for instance, the Escher-style ACD examples involving disjuncts are interpreted, to the extent that they are, in roughly those ways. Of course, semantics is compositional, so strictly speaking disjuncts fall outside of the realm of standard semantics. But disjuncts are just a sub-case of adjuncts, which do seem to have standard compositional properties (if not necessarily strict compositionality, as they arguably do not compose as part of a phrase-marker). So in a hierarchy of strictures, disjuncts precede adjuncts, which precede arguments, in terms of mere systematicity, compositionality, and strict compositionality. That suggests a kind of open-endedness for a chunk of language that can only be understood biologically if the system is so underspecified that it has virtually no cognitive limits, and thus is, in itself, relatively limited as a system of thought, communication, and so on. Yet that very system, with a minor improvement (argument taking), all of a sudden becomes constrained, plastic, creative in useful terms, and otherwise familiar as our human language. For anyone interested in understanding the evolution and emergence of language, this curious transition should be of some interest.

Our conclusions also have a bearing on an important debate between atomists and decompositionalists, carried out mainly in the pages of Linguistic Inquiry over the last couple of years, which has circled around the following four theses:

Thesis One: The lexicon is productive.
Thesis Two: A simple, typically first-order, formal language is enough to capture the subtleties of natural language semantics.
Thesis Three: In fundamental respects all languages are literally identical.
Thesis Four: Analyticity is an achievable goal.
The two sides of the debate have had opposing views on these theses. For instance, Pustejovsky (1995) assumes all four, while Fodor and Lepore (1998) deny all of them.

Nothing that we have said here entails that the lexicon should be productive, in the generative sense that Pustejovsky and others (going all the way back to generative semantics) advocate. In fact, we do not even know how to show that the lexicon is potentially unlimited,14 so we assume it is not. Nonetheless, the fact that we do not assume a generative lexicon does not entail that we allow no systematic structure for words. This is where our dimensions come in. Nothing that we have said, either, entails that natural language semantics should involve any fancy mechanisms. Quite the opposite: we have made a big deal of the fact that language has θ-roles and that these are not pronounced, hence by hypothesis are not part of the substantive lexicon. Our view, thus, is that "higher order" devices, if that talk is even appropriate for the sorts of entities we are analyzing,15 lie outside of the object language. Again, that is where dimensional cuts are signaled. So on those first two theses we align with atomists, and contra (standard) decompositionalists.

However, our view is the exact opposite with regard to Theses Three and Four. With Chomsky throughout his career, we assume, and find ample reason to believe in, the deep uniformity of languages. Similarly, also in the spirit of much of Chomsky's heritage, we do believe in analyticity. It is analytic that y boiled if x boiled y, and that y was open (at some point) if x opened y; see Pietroski (forthcoming a, b). But this is not because words reduce to other words. It is because language exhibits hierarchies and (one-way) relations between "dimensions" like event and state. By trying to argue for certain dimensions of language, we have attempted to explore the prolegomena of a theory that seeks to determine what the analytical foundations of language are. We interpret the project of Hale and Keyser (1993 and elsewhere) in roughly these terms, so we do not claim any originality for the broad picture we present. In other words, we seek some constrained analyticity, in particular through the use of dimensions that cut across the lexicon. There are two major ways of implementing this overall program, which we have termed Hypothesis A and Hypothesis B. The former is less sympathetic to atomism than the latter, which is why we have pursued Hypothesis B (after all, we are atomists at least in terms of Theses One and Two). But in either instance there has to be some component of the system that analyzes lexical concepts in dimensional ways.
Appendix

Imagine a familiar kind of invented language in which "1" is a symbol, the result of concatenating any symbol with "*" is a symbol, and if "X" and "Y" are symbols, "+(X, Y)" is a symbol, as is "=(X, Y)."16 The symbols of this language include "1*," "1**," …, "+(1, 1*)," …, "+(+(1*, 1**), 1*)," "=(1*, +(1, 1))," and so on. An obvious possible interpretation is that "1" denotes the smallest
positive integer. "X*" denotes the successor of whatever "X" denotes. "+(X, Y)" denotes the sum of whatever "X" and "Y" denote, and "=(X, Y)" denotes exactly one of two arbitrarily chosen objects, call them "T" and "F," depending on whether or not "X" and "Y" denote the same thing. There are endlessly many expressions of this language, and one can speak of several different types of expressions. There is also a sense in which complex expressions of the language exhibit hierarchical structure. For example, in the sentence "=(1****, +(1**, 1*))" there is an asymmetric relation between "+" and "*." Likewise, there is an asymmetric relation between "1****" and "1**." Strictly speaking, there is also an ordering of the asterisks in "1**," which is the result of concatenating the symbol "1*" with "*"; though intuitively, this is a less interesting kind of hierarchy.

It is easy to imagine simple mechanical procedures for determining whether sentences of this language denote T or F. An expression of the form "+(1^m, 1^n)," where "m" and "n" stand for numbers of asterisks, can be replaced with an expression of the form "1^(m+n+1)" (that is, "1" followed by m + n + 1 asterisks) by alternately erasing components of "1^n" and adding asterisks to "1^m," and then erasing "+" (with its brackets and comma) when nothing remains of "1^n." A slightly different procedure, of erasing a component from both "1^n" and "1^m" until at least one of them is completely erased, could be used to evaluate expressions of the form "=(1^m, 1^n)." If both "1"s are erased in the same round, replace "=(,)" with "T." Otherwise, replace whatever is left with "F."

The language can also be expanded via the following rules: if "X" and "Y" are symbols, so are "#(X, Y)" and "∧(X, Y)," where by stipulation "#(X, Y)" denotes the product of whatever "X" and "Y" denote, and "∧(X, Y)" denotes whatever "X" denotes raised to the power of whatever "Y" denotes.17 One can use erase/write procedures for evaluating the resulting expressions, at least in principle, in terms of addition; see, for example, Boolos and Jeffrey (1980). If "#" and "∧" are defined in these terms, one can speak of procedural meanings for expressions of the language, where the meaning of each expression determines its denotation.

One could use mechanisms of the sort just described to determine the (absolute value of the) difference between two unequal integers, and one can stipulate that if "X" and "Y" are symbols, so is "–(X, Y)." But this is not yet to introduce a device for representing subtraction. One needs to know how symbols like "–(1, 1)" and "–(1, 1*)" should be interpreted, since the procedures described thus far do not settle the question. A now obvious, but initially dramatic, thought is that the successor function is "reversible," and one can encode the idea of a predecessor function by allowing for symbols like "*1," "**1," etc. Given the possibility of both right-concatenation and left-concatenation, there can be symbols like "**1***" and "***1**." These are easily transformed, via alternating erasures, into symbols like "1*" and "*1." But this still raises new interpretive questions, in a way that introducing "#" and "∧" did not. Put another way, the meanings thus far assigned to expressions do not yet determine the denotata of sentences like "=(–(1, 1), *1)" and "=(–(*1, 1), **1)." Do they denote T or F or neither? This extension of the basic language requires
a conception of the relevant domain (of possible denotata for symbols of the language) as including more than just the number 1 and its successors.

Similar considerations arise if the language is extended to allow for symbols of the form "%(X, Y)," interpreted as whatever "X" denotes divided by whatever "Y" denotes. For while one can think of division as an inversion of multiplication, the denotatum of "%(1***, 1**)" is hardly determined by the procedures described above, and such expressions cannot be reduced to expressions of the form "+(X, Y)." In this sense, the meaning of "%(1***, 1**)" is something new under the sun, expressible in a new dimension, whereas "#(1***, 1**)" is equivalent to "+(+(1***, 1***), 1***)." Correspondingly, one now needs to think about the relevant domain as including all the rational numbers.18
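The erase/write procedures described in this Appendix are simple enough to implement directly. The sketch below is ours, and it assumes the reconstruction of the two basic operators as "+" and "="; the function names are hypothetical.

# Erase/write evaluation for the Appendix's invented language, under the
# reconstruction of the sum operator as '+' and the equality test as '='.

def denotes(sym: str) -> int:
    """'1' denotes 1; each right-concatenated '*' is one application of
    successor, so '1***' denotes 4."""
    assert sym.startswith("1")
    return 1 + sym.count("*")

def add(x: str, y: str) -> str:
    """Evaluate '+(x, y)' by alternately erasing a component of y and
    writing an asterisk onto x, then dropping the '+' apparatus."""
    while y:                 # erase a '1' or '*' from y ...
        y = y[:-1]
        x = x + "*"          # ... and add an asterisk to x
    return x

def equal(x: str, y: str) -> str:
    """Evaluate '=(x, y)' by erasing from both until one is exhausted."""
    while x and y:
        x, y = x[:-1], y[:-1]
    return "T" if not x and not y else "F"

assert denotes(add("1*", "1**")) == denotes("1*") + denotes("1**")   # 2 + 3 = 5
assert equal("1**", "1**") == "T" and equal("1", "1*") == "F"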
15 WARPS
Some thoughts on categorization†
1 Introduction

Language presents paradigmatic regularities, together with the usual syntagmatic ones that syntax is designed to capture. This chapter proposes a way of deriving systematic hierarchies by analyzing linguistic categories through the algebraic structure of numbering systems (hence by way of dimensions, each recursively defined on the previous). The goal is not to equate "vertical" structuring and "horizontal" syntax, but rather to explore the properties of the former in order to predict certain well-known implicational facts. New recalcitrant data are also brought to bear on the issue, as well as a proposal for acquiring lexical categories in present terms which, it is argued, successfully mimics the acquisition sequence by infants.

Combinatorial or "horizontal" approaches to linguistic structuring, by their very nature, are not designed to capture implicational or "vertical" properties of human language. An old debate, recently refueled in the pages of Linguistic Inquiry,1 concentrates on whether these "vertical" properties are real. I will suggest below that they are, at least to some extent. More importantly, though, all present analyses of the implicational properties of language merely restate them formally. That is, category X is taken to implicate category Y only because some formal relation stipulates the entailment at some level of linguistic or non-linguistic representation. I do not see the explanatory power behind that.

The basic idea in this chapter is that there is a more profound way of doing things, with interesting empirical consequences. When all is said and done, current approaches stipulate each class of implications possible from a given categorial unit. In the alternative I suggest, a single stipulation is intended to work for all classes of implications. Furthermore, the stipulation in question is needed independently to conceptualize the structure of numbering systems. I make much of the fact that the species which is able to speak is also capable of counting, subtracting, doing fractions, and so on. I think the structure of both systems is not just related, but in fact identical, when seen at the appropriate level of abstraction. This suggests that the stipulation required in order to obtain implicational structure is derivable from deeper properties of cognition. Interestingly, if this is the case, language will have a curious "dimensional"
property to it, which I suspect relates to other cognitive abilities that make use of this property. Differently put, human lexical concepts will be recursively built in layers that can be expressed derivationally, though they need not be, which means the system in question need not be of the same sort as standard syntax.

The present proposal has certain structural properties following from cognitive principles. However, in other proposals I am familiar with, standard conceptual frames (usually expressed in a variant of a first-order predicate calculus) are taken to underlie syntactic structures. What we will develop here is intended to map something with conceptual and associated intentional properties, but in itself the underlying structure is purposely abstract, hence not even remotely similar to what is assumed in current versions of generative semantics or related subdisciplines. This is all to say that I take the autonomy of syntax rather seriously, but for syntactic reasons I propose a kind of unfamiliar structure to account for a phenomenon that has, so far as I know, never been analyzed in these terms.
2 The problem with categories

Recent work within the Minimalist Program has explored the possibility that the faculty of language is a pure derivational system. Nowhere has this idea been as explicitly advocated as in Epstein and Seely (2002), who, citing the work of complexity theorists, claim that theoretical appeal to macro-structure properties (of the sort apparent in levels of representation) fails to explain macro-structure itself. In essence, for Epstein and Seely, "if you have not derived it, you have not explained it."

In this chapter, I concentrate on an issue that is still left open even in the radically derivational system that Epstein and Seely advocate. It operates with representations such as V, N and so on. If one has not derived those, has one explained them? Arguably we do not need to explain that. Any system has primitives, and perhaps those are the primitives of ours. It is slightly troublesome, though, that some of those (V, N and the other lexical categories) were postulated over two millennia ago, roughly contemporary with Democritus's atom. Perhaps linguists had the right cut at the time, although physicists did not. Then again, perhaps linguistic understanding has not advanced as much as physical understanding has. On a related note, the set of functional categories is growing by the month. Cinque (1999) has shown some forty categories of the relevant sort. Does that mean we have forty functional primitives?

None of these questions have a priori answers. One can deny the facts, but the linguists who have proposed them are clearly competent and honest. One can try to divert the facts to a different domain, for instance insisting on not decomposing whatever one's favorite lexical atom happens to be, and blaming the putative structure that "clouds the picture" on the structure of thought, the world or something else. Or one could also try to bite the bullet and reflect on the nature of the structure of these creatures that derivations start on, how it differs from syntactic structure, and yet how it seriously affects it.

We can pose the question thus. Should linguists stop their research when
faced with V, N and so on? Or should we split the lexical atom to see what kinds of interactions are possible at that level of linguistic reality? And if the latter, what is the nature of those interactions?

Perhaps it is worth pondering for a moment a relatively similar situation that arose within physics in this century. Atoms were not expected to have parts, which is why they were called atoms, but they did. Then the issue was how many, with what properties and so on. By the 1940s and 1950s, sub-atomic particles were being found by the dozens. Did that mean that the primitives of physics were counted in dozens? For some (unclear yet powerful) reason physicists do not like that. They observed that those various particles exhibited regularities coded in terms of "conservation laws," which led to the postulation of "particle families." Soon new regularities emerged, and the quark model was postulated in the early 1960s. We may also keep in mind that the realm of the sub-atomic (where electromagnetism and the strong and weak nuclear forces obtain) obeys principles which are mathematically incompatible with those obeyed by the realm of the super-atomic (where gravity reigns). Contemporary physics has lived almost a century with this contradiction, without the project having stalled. Certainly many physicists try to unify the situation, but a successful grand unified theory is still missing.

Demanding more from present-day linguistics than from physics seems unreasonable. That is to say that I am not going to be troubled if the result of our research into sub-lexical units forces us into a theory that is different from, or even incompatible with, the one we have for super-lexical stuff. If that is the case, so be it. We shall seek unification in due time.
3 A vertical dimension in language

Returning to basic facts, it is obvious that language is organized not just syntagmatically (in a "horizontal" way), but also paradigmatically (in a "vertical" way). Syntagmatic organization is what our familiar derivations are good at capturing: morphemes, words, phrases, transformations. About paradigmatic organization, much less is understood. Take for instance a Vendler-style classification of verbs into states, activities, achievements and that sort of thing (on these matters, see Pustejovsky (1995) and his references). Standard derivations have nothing to say about this. For if a derivation is to run smoothly, it makes no difference whether states are higher or lower in the classification than achievements. The same can be said about thematic hierarchies (Theme low, Agent high, etc.), complexity hierarchies within nominal structures (mass low, count high, etc.), auxiliary hierarchies (modal, perfective, etc.), and just about any other hierarchy that has been proposed in the literature. It is easy to see, for instance, that a Phrase-structure Grammar would have different rules depending on whether we want modal to appear higher than perfective or vice versa, but the grammar itself does not care about the class of productions it admits, so long as they are permissible.

The reality of these hierarchies should hardly be in doubt. There are many
quibbles about the details of how they ought to be expressed in syntax ("Is theme higher or lower than goal?" and so forth; see for instance Baker 1996), but it is clear that language presents vertical cuts, and little of substance is ever said about them. There is actually not much to say about them, in principle, in standard derivational terms. A Chomsky-style derivation is a generalization of a concatenation algebra, which is designed for exactly that: concatenation, or horizontal organization (see Chomsky 1955). But one wonders, in the Epstein/Seely spirit, whether it should not be the case that the very fabric of grammar yields both the familiar horizontal ordering and the vertical one as well.

It is of course tempting to blame the nature of the vertical cuts on the nature of reality, which language is supposed to represent. That would work like this. Reality is (or appears to humans to be) vertically structured into familiar classes. Language is used to speak of those classes. Ergo, it is normal for language to reflect the structure of what it speaks of. For instance, say that an expression like a chicken presupposes some chicken stuff, whereas an expression like chicken does not presuppose a (whole) chicken. But is that a linguistic fact? Is it not enough to say that all chickens are made of chicken stuff, whereas not all chicken stuff is necessarily attached to a chicken? In the absence of a theory of reality, this is hard to ascertain, but one can certainly play the game. So suppose the world does work like that for the chicken instance. Unfortunately, things get messier right away.

To see this, suppose we formalize the chicken example in a bit of detail. Say we have an "ontology" of individuals (like chickens, lambs and so forth), and since these are made of whatever they are made of, we have a corresponding statement in the theory to say that individuals have some mass.2 The mass of an individual is denoted with such words as chicken, whereas the individuals themselves are denoted with that predicate plus some determiner. With that machinery in mind, let us move on to more interesting examples.

Let us start with something as everyday as fish. Surely, that fits the picture we have just sketched. But now consider a dish of very small fishes, baby eels. When we eat that, we say that we are eating fish, although at no point are we eating parts of fish; rather, we gobble entire fishes, dozens of them. But that is fish too, so fish has to be not just the stuff that makes up an individual fish, but also whatever makes up sets of those, if they are tiny. That qualification is important. Imagine I have eaten one thousand sardines in my life; I could refer to the entire event of sardine eating in my life as a fish-eating event. I ate both fishes (one thousand of them) and fish (in a given amount). People consider both of those propositions natural. In contrast, suppose I have just eaten baby eels in an amount which, if I count, involves precisely one hundred individual baby eels. The proposition that I have eaten fish is natural, whereas the proposition that I have eaten one hundred fishes seems somewhat odd to speakers, although it is still reasonable in terms of the individuals I have eaten. What is going on is not too arcane. Somehow a perspective about the size of what I am eating is relevant in natural language. Fish, as used to denote what
humans eat in their meals, is canonically measured in terms of the size of a dish. It does not matter whether what I eat is part of a large salmon, an entire sardine or a whole set of baby eels. What counts is to fill up a plate. And what fills up a plate is fish, regardless of its individual nature. That tells us that a connection to reality is not as clean as suggested above.3 I understand, in biological, chemical, physical terms, what it is to say that an eel, a sardine, a salmon, have eel, sardine, or salmon flesh inside their skins. But, in those terms or any terms other than cognitive-linguistic ones, what does it mean to say that for fish to be naturally used to denote some food, you must have many baby eels, one sardine, or a chunk of salmon: the size fitting a dish?

Similar considerations apply to Vendler-type ontologies. If I compose a tune, it comes into existence. My tune-composing event in time entails a permanent state of tune existence. The tune I have composed will remain composed forever. But is that a fact about the world? Suppose we say that all that is required to understand the relation between my composing the tune and its state of existence is something basic like an (obvious) causal relation.4 Is X being caused by Y a necessary or sufficient condition for Y to stand in some kind of implicational relation with X, expressed through lexical information? The existence of my tune may have caused my neighbor to cry in desperation. If just about any causal relation can find its way into the linguistic system because it exists, I should be able to say "*My tune cried my neighbor" or "My tune Xed my neighbor" (for X any verb) with the entailment that my neighbor cried because of my tune. Certainly I can say that my tune bothered my neighbor (say), but that does not entail anything about his crying, even if that is what actually happened. In fact, I do not know how to express such a thought other than in a sentential manner (as I have). So causal relations between given events in the real world are clearly not sufficient for these events to relate to each other in purely lexical terms.

In turn, causal relations between given events in the real world do not seem to be necessary for them to relate to each other in lexical terms. If Marlow follows Shorty to the crime scene, it makes sense to say that Marlow's following event entails Shorty's followed state because Marlow's action caused Shorty's state; but, if Marlow follows his sister in his family, although it is true that Marlow's following condition entails his sister's state of being followed, it is less obvious that the condition caused the state. Presumably, what caused the state is something about Marlow's parents and their moods, chance, and whatever is involved in family planning. It is there where standard causality would seem to stop, but one can still speak of events, conditions and actions that go beyond basic causal relations. One could stretch one's theory of causality to accommodate these sorts of situations, but it seems beside the point. Instead of making cumbersome theories of reality or causality bend themselves to the demands of human concepts as expressed through language, it is worth exploring the alternative view: that reality, causality and all that are fine as they are, but human perspective through lexical concepts has much to add, in particular the generative edifice that imposes the vertical cut in the implicational reasoning entertained above.
In what follows I suggest one way of approaching this paradigmatic, vertical aspect of language, which manifests itself through lexical regularities. I believe that this aspect and the sort of mechanism I have in mind for expressing it can be useful in understanding the nature of syntactic categories more generally (not just stative and eventive verbs, or mass and count nouns, but even the very distinctions between verbs and nouns themselves). Given what I have said so far, it should be obvious that I am not making any direct claim about “outside” reality. Only “inside” cognition is at issue.
4 A word on existing theories, plus an alternative

There are two theoretical takes on the vertical cut of language. A Fodor-style, atomistic approach denies that any of this is linguistic (see Fodor and Lepore 1998). What it is from this perspective (whether something in the structure of thought, or in that of reality) has not been worked out in any detail, so far as I know. A Jackendoff-style, decompositional approach manifests itself in various linguistic guises, with more or less emphasis on syntactic (à la Hale and Keyser 1993) or semantic (à la Pustejovsky 1995) aspects of the system. All of these present serious conceptual differences among themselves, but share, I believe, the following formal property. Say I want to state that denotation X (or syntactic object X corresponding to denotation "X," or, mental or real, world structure X corresponding to denotation "X") is in some sense a proper part of structure Y. Then I propose a formal mechanism that basically states that fact in some first-order language:

(1) X is a proper part of Y.

(There are fancier ways of stating (1), but plain English will do.) If we allow ourselves statements of this kind, we may construct all sorts of elaborate arrangements, trivially. Kinship trees, for instance, work just like that, where the notion "descendant" substitutes for "proper part" in (1), and allows familiar ensembles. In linguistic terms, a statement like (1) allows one to express, for instance, a Vendler-style classification, directly through relevant denotations ("state X is a proper part of event Y"), or corresponding syntactic objects (VP is a proper part of vP), or even related (mental) world objects.

Consider, in contrast, how we would classify something else in a different realm. For instance, numbers: abstractions with no meaning, which may allow us to reflect on the structure of the problem without attaching any extrinsic significance to the formal question. Say we want to express the fact that the set of objects like 2 is part of the set of objects like –2, and that set in turn is part of the set of objects like 2/3.5 Try (2):

(2) a. The set of natural numbers is included in the set of whole numbers.
    b. The set of whole numbers is included in the set of rational numbers.

These statements are obviously true, but unsatisfactory.

First of all, there is a reason behind these claims. The naturals are part of the
whole numbers because those can be obtained by inverting the mechanism that gives us the naturals (succession or addition). When we start subtracting naturals we stumble onto the "rest" of the whole numbers, the negatives. And indeed, all naturals have an expression in terms of the mechanism that yields the rest of the whole numbers (thus −1 is the negative expression of 1, and so on). Similarly for the whole numbers and the rationals, which can be obtained by inverting a natural mechanism within the whole numbers (multiplication, an addition of additions). When we start dividing the whole numbers we stumble onto the "rest" of the rationals, the fractionary. And indeed all whole numbers have an expression in terms of the mechanism that yields the rest of the rationals (2/2, 3/3, etc., are fractionary expressions of 1, and so on).

Second, the reason (2a) is true is related to the reason why (2b) is true, although at a different dimension. The previous paragraph expresses this intuition. Division (the inverse of multiplication) is to the rationals as subtraction (the inverse of addition) is to the whole numbers. We can think of each of these as generating functions, or the particular kind of relation whose effect on the generating set (the naturals for subtraction, the whole numbers for division) is the "larger" set, which the original set was a part of.6 Indeed, we could proceed generating "larger" sets in roughly the manner indicated, by choosing another natural operation within, say, the set of rational numbers (powers, a multiplication of multiplications) and inverting it. That way we obtain irrational numbers, so that the following statement is also true:

(2) c. The set of rational numbers is a subset of the set of real (rational and irrational) numbers.

All of the statements in (2) are true, but not primitive. They follow from the algebraic structure of numbers and how they are constructed. One can give a computational characterization of the "generating" operations outlined above (subtraction on the naturals for the whole numbers, division in the whole numbers for the rationals, and so on; see Chapter 14: Appendix). What is more, a higher order characterization would be possible as well. In English:

(3) The inverse ~O of a closed operation O in number set X yields a set X′ which X is a part of.

(3) is a defining statement about what I have been calling a "generating function," and is obviously not a number statement, but a statement about numbers. If a statement about 1 or 2 is of order n, a statement about "inverting" operations within the system where 1 or 2 obtain is clearly of a superior order n^m.

The moral is an important one, I believe, if we judge from the fact that, just as natural language has obvious horizontal and vertical cuts, so do numbers. Thus, "2/3 + 2" is a linear expression whose syntax could be trivially generated by a concatenation algebra, just as that of John loves chicken can, but the fact that "2/3" is qualitatively different from "2," and the set of objects like "2/3" includes the set of objects like "2," is not expressed in concatenation fashion, any more than the differences between John and loves or John and chicken are. In the
case of numbers, however, we need not blame those vertical correspondences on any outside system (the world, some semantic translation, or whatever). The very formal structure of numbers necessarily yields the vertical dimension that we are alluding to. That is, 2/3 would not have been 2/3, or 2 been 2, without the implications we have been discussing.

Could this way of dealing with the vertical property of numbers tell us something about the vertical property of lexical expressions? One is tempted to answer in haste: of course not. There is nothing necessary in a chicken having chicken stuff, or ontologies dividing into individuals and actions. Those are contingent properties. It could have turned out (I suppose) that the universe came out with no individuals whatever, and had just stuff, or that stuff was composed of individuals and not the other way around, or that the universe were static, with no actions. Ah, but those are all considerations about reality, which I said I was setting aside. In a sense the question could be posed this way: "Is there meaning to the claim that the vertical structure of lexical concepts (their classifications, entailments and other 'algebraic' properties) happens to be necessary, and to some extent at least, essentially the one found in the algebraic structure of the numbering system?"

That question is more sensible than it might seem at first, although it presupposes a view of cognition that is worth exposing. Once we grant that there is "independent" structure to lexical concepts, somehow determined by the properties of mind/brains, then it becomes an empirical issue what that structure is. That it might share aspects of the structure of numbers, or any other structure present in the mind, should not be particularly troubling, especially in a biological creature, subject to the twists of evolution. It is a fact that humans have the capacity to understand numerical relations, and furthermore that they are the only species with that capacity. Then that the structure in question may be used for this or that purpose is entirely within the realm of possible uses of evolutionary results. Just as one does not find it philosophically hard to swallow that the same mental capacity that underlies mathematics should underlie our musical abilities, a priori it should be equally possible that this very structure (or at any rate some structure with the relevant algebraic properties) might underlie lexico-conceptual knowledge.

Consider how that would work. The good news is obvious. We can import the algebraic structure of the mathematical system (whatever is responsible for (3)) to give us the observable hierarchies and other relations. No need, then, to restate the relevant relations, blame them on reality or anything of the sort. At the same time, the devil is in the details. How would that tell us the difference between V and N, or the various classifications of each V and N, and so on? The remainder of the chapter proposes a way of addressing that sort of conceptual question. But I want to be very clear about the difficulty that the related intentional question poses. Suppose I convince anyone that N is "this-algebraic-object" and that V is "that-algebraic-construct." Still, what does it mean to succeed in describing John by way of the expression "man," or to denote a peanut-eating event by saying that "a man ate peanuts"? Throughout this
chapter, I only have speculations about that very tough question. I should insist, however, that the way the truth calculation of “a man ate peanuts” is usually done is different in spirit from what I will attempt here. As far as I know, all standard theories assume a given ontology (of men, peanuts, eating, and so forth) which already suffers from the difficulties I am trying to avoid, presupposing the vertical relations that I want to account for. My problem is the reverse. By postulating an abstract mathematical structure for the array of lexical concepts, the (relatively) easy part is capturing the vertical relations. They are there (“inside”). The hard part is to make that correspond to anything familiar out there (“outside”), since by going abstract in the vertical dimension we will distance ourselves from an apparatus of easily observable items, discerned through pointing and other such devices which (I think) researchers implicitly assume when deciding on their ontologies. That is all to say, my peanuts or my eating is not going to be anything as trivial as a little picture of either. What it will be remains to be seen, but I honestly do not think that presently existing alternatives are as straightforward as customarily assumed. That I can point to a bag of peanuts (also a unique human activity) already presupposes an incredibly difficult intentional operation. How I got my mind there is what I would like to understand, and theories so far merely presuppose that it got there. But if things were so straightforward, it should be easy to program a robot (or get a dog) to understand that innocent pointing. I know of no results in that respect.
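Though nothing in the argument depends on implementation, the generating-function idea in (3) and (4) is concrete enough to simulate. Here is a minimal sketch (in Python, purely for illustration; the helper invert_and_close and the toy seed sets are my own devices, not anything proposed above): blindly applying the inverse of an operation that is closed in a number set produces objects outside that set, and the old set sits inside the new system.

```python
from fractions import Fraction

def invert_and_close(objects, inverse_op):
    # Apply the inverse of a closed operation to every pair of objects,
    # keeping whatever results are defined. The non-closure of the
    # inverse is what creates the "larger" system of (3)/(4).
    new = set(objects)
    for x in objects:
        for y in objects:
            try:
                new.add(inverse_op(x, y))
            except ZeroDivisionError:
                pass  # undefined cases are simply skipped
    return new

naturals = {0, 1, 2, 3}

# Subtraction, the inverse of addition, escapes the naturals:
integers = invert_and_close(naturals, lambda x, y: x - y)
assert naturals < integers and min(integers) < 0    # the negatives appear

# Division, the inverse of multiplication, escapes the whole numbers:
rationals = invert_and_close(integers, lambda x, y: Fraction(x, y))
assert any(q.denominator > 1 for q in rationals)    # the fractions appear
assert integers <= rationals                        # old set inside new one
```

The point of the sketch is only the shape of the result: each inversion yields a strictly richer system in which the previous one is embedded, which is the vertical, paradigmatic cut at issue.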
5 The basic idea

Humans are good at what, as a homage to Star Trek, I will call the "warping" task. My wife twists and folds a thread (a unidimensional entity) until with enough knots it creates a variety of forms in what can be seen as a two-dimensional lattice: a scarf, a sock, a sweater. Even I can twist a piece of paper (a bidimensional entity) until after a few calculated folds and pulls I can create a fairly decent airplane, an entity which can be described as a three-dimensional object. Some of my Japanese friends do better and produce frogs, birds, balloons. And almost everyone in my department plays a game which consists of taking a three-dimensional object from one part of a three-dimensional field to another. We do that for a variable amount of time, and whoever puts the object more times into a designated part of the opponent's field is said to win.7

What goes on in those situations is mathematically fascinating, a transformation of n-dimensional space into an n+1 dimensional object. How is that "magic" performed? We could think of it this way. The trick is to subvert the rules of some n-dimensional space, by literally warping it. When one thinks of the characteristic Cartesian coordinates one draws in a notebook, one plays within the rules of Euclidean space. Inside those one, two or three lines one draws mundane things such as segments, triangles, fake cubes, and so on. But one can do better by tearing the page off the notebook and warping the Cartesian axes until the page forms, say, a tube. That very tube (topologists call
it a "cigar band") is already a pretty nifty trick. It looks like a three-dimensional object from our three-dimensional space, although of course it is still a two-dimensional thing in its own world, since all one did to get it was warp the page. In turn, if one warps the "cigar band" carefully (particularly with soft paper), one can insert one end of the tube into the other. At this point one gets a sort of doughnut (topologists call it a "torus").

Those activities are uniquely human, and I suspect entirely universal. I would describe them thus. Humans have the capacity to exploit a given formal system so much that they can even use its formal limitations to come out of that system, and take the system itself as an object at a higher dimension. This of course recalls Goedel's results. Simply put, Goedel found that no formal system of sufficient complexity is complete as well. There is always a basic statement that must be expressed outside the system.8 That fact has serious consequences for what is provable, as opposed to "merely" true. Not all true statements can be proven. Furthermore, that fact creates a hierarchy of the sort we need, because of the formal limitation we are describing.

Recall the number instances we were playing with before. For example, in addition within the set of naturals, adding any two of those yields another and closes the operation in that set. But humans cannot help but ask: "If I can add x to y to yield z, I should be able to express the relation backwards, so that taking z and y I get x." It is only natural to explore this "inverse" situation. But watch out. Everything is fine if z is larger than y, but if z is smaller than y (a possibility within the naturals), then the result of that operation is not defined in the original set. Here the boring mathematician will blame us for cheating. But the child in us asks why; can we not have a new type of number? Therein is born a little monster whose existence arises because imagination cannot be constrained by the answer "you cannot do that." Of course, the imaginative creature is not an "anything goes" kind of object. It has properties. It resulted precisely where expected. It has the form it should have there – and it opens a whole new world of possibilities.

The same occurs with the folded Cartesian axes. Normally we are not allowed to do that; we must draw within them. But any normal ten-year-old is prone to ask why she or he should draw the coordinate lines straight. Why can they not be circles (or wavy lines, etc.)? At that point we are on the verge of stepping out of the more basic object. The price will be a very hard time with proofs, but the prize will be new objects, and for free. Given what I have just said, I obviously have (and believe there is) no standard way of proving this, yet it may be true:

(4) If operation ~O is not closed in system X, applying ~O to the objects x of X creates new sorts of objects x′, so that a new system X′ is created with the x′ objects, such that X is a part of X′.

We have seen this with numbers in the previous section ((3) is a sub-case of (4)). But (4) is true even in the topological examples discussed in this section. What we did when we identified the two edges of the page to form the "cigar
band" is prohibited in the standard Euclidean plane. Prior to the trick, we had the two parallel lines in (5):

(5) ____________
    ____________

Warping the paper makes us identify these two parallel lines. Euclid's fifth postulate prevents two parallel lines from ever meeting. Ours, however, meet at all points after the warp. That is cheating within the rules of classical space, but the reward is something which, when looked at from the perspective of standard (Euclidean) three-dimensional space, is a new object.9

All that was mathematics. What does it have to do with concepts? If that obvious human ability is at work in concept formation in the linguistic system, then we will get a couple of things in the same package: vertical hierarchies, and indeed new objects with no dependencies on the real world, of course, for better and for worse. I say the latter because this is not what is going on: we mentally warp a mental piece of paper into a mental torus, and we use that to pick out doughnuts. It cannot be that simple (think of abstract or intangible concepts, to point out the obvious). The picture I am about to suggest is considerably more abstract than that, and any temptation should be abandoned at this point of relating everything out there to concepts "in here" which can be iconically "likened" to the objects. We must just bask in a more modest glory. We can obtain necessary hierarchies from the warp situation, and some objects arise as well, even if useless – at least until I say something else – for intentional purposes.
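The edge-identification itself can be made concrete. In the toy sketch below (my own construction, with a discrete grid of cells standing in for the page), neighborhoods are computed either within the flat page or after identifying opposite edges by modular arithmetic; once identified, the two parallel lines of (5) meet at every point.

```python
def neighbors(cell, width, height, wrap_x=False, wrap_y=False):
    # Neighborhood of a cell in a discrete "page". Identifying the two
    # vertical edges (wrap_x) warps the page into a cigar band; also
    # identifying the horizontal edges (wrap_y) yields a torus.
    x, y = cell
    found = set()
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if wrap_x:
            nx %= width
        if wrap_y:
            ny %= height
        if 0 <= nx < width and 0 <= ny < height:
            found.add((nx, ny))
    return found

# On the flat page, a cell on the right edge has no right-hand neighbor ...
assert (0, 2) not in neighbors((9, 2), 10, 5)
# ... but after the warp the identified edges meet at every point:
assert all((0, y) in neighbors((9, y), 10, 5, wrap_x=True) for y in range(5))
```

Nothing hangs on the grid itself; the point is that the new object comes from bending the bookkeeping of the old space, not from adding material to it.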
6 The duck problem and porous modularity

The intentional problem I have just noted relates to another traditional problem, which has been emphasized in recent years in the work of Jackendoff (e.g. 1990: 32). There are some featural distinctions that seem reasonable, like mass and count, which correlate with obvious selectional restrictions and manifest themselves in various morphemic ways across languages. However, think of the featural distinction that separates a duck from a goose. What is it? Is it plus or minus "long neck"?10 In present systems, the way to associate phonetics to meaning is precisely through semantic features which happen to fall in the word that bears the relevant phonetic distinctions. There is no "direct" connection between phonetics and meaning, if standard models are correct. Obviously, we use phonetic differences to tell lexical types apart, and we assume that those types correspond to different semantic types. But how do we do the latter? That is where we need something like a lexico-semantic difference, so that then the truth or presentation conditions of I saw a duck come out different from those of I saw a goose. Atomists like examples of this sort, since they take them to suggest that the only real feature distinguishing those two is plus or minus "duck." Of course that is like saying that "duck" is a primitive concept. Jackendoff uses this sort of case to suggest a different kind of solution. A duck
looks different from a goose. Why could we not distinguish ducks and geese the way other species (presumably) do, by the visual, or other systems? If the lexical entry for duck has a visual instruction to it, say, then the problem of telling a duck from a goose in linguistic terms becomes much more manageable in principle.11

But what kind of system is that? One of the most cherished aspects of Fodor's modularity thesis (1983) is that information is encapsulated within given systems. The linguistic system is not supposed to know what is going on in the visual system, or vice versa. If we start allowing the linguistic system to use visual information (or auditory, olfactory, tactile information, more generally) then it is hard to see how in the end we will not have an anti-modular, connectionist network in front of us. Now think of the issue from the perspective of multiple dimensions of mental systems. One might suppose that, while systems are encapsulated interdimensionally (the modularity thesis), they are intradimensionally porous (a restricted connectionist thesis).12 That is, suppose all relevant cognitive systems that enter into the computation of lexical meaning are dimensional, or layered, in the sense above. One might suppose that the layers in question communicate, though the systems are otherwise encapsulated. The picture would be as in (6):

(6) [Diagram: the modules Language, Vision, Audition, Motor, etc., drawn as parallel columns, each sliced into 4-D, 3-D, 2-D and 1-D layers; communication runs horizontally, within a given layer, across modules.]
Suppose the "long-neckedness" of geese is a property that the visual system distinguishes in its 3-dimensional (3D) representations; then the linguistic system should be able to use that very representation in one of its 3D representations. More generally, since a 3D visual representation implies a 2D representation, if it is the case that some 2D visual representation is accessible through the 3D one, the linguistic system should be able to access that indirectly as well, from the perspective of a 3D representation. What should not be possible is for a 3D linguistic representation to have access to a 4D visual representation, or even directly to a 2D visual representation which is not accessible to the 3D visual representation. We can think of this as "porous modularity."

I should emphasize that, when I speak of dimensions in the visual system, I do not mean the obvious Euclidean ones, or even necessarily what has been proposed in the literature, following work by Marr (1982). It could turn out that those particular dimensions are the ones we need for porous modularity to work, but it is also logically possible that we require something more (or less) abstract. That is, at any rate, an empirical issue.

If porous modularity holds of the human mind, then the lexicon can be a more interesting thing than is usually assumed. The familiar set of idiosyncratic
associations between sound and meaning would really be a set of intra-dimensional associations among several modules. The dimensional system that I am trying to sketch can be the natural locus of the association. Seen in this light, the temptation to trivialize the dimensional picture in terms of the origami metaphors reduces quite considerably. True, part of the lexical model of a goose might be something very much like an origami model, qua its visual aspects. But the linguistic representation would be richer than that, including other sensory information, as well as any kind of information that happens to be there in the human mind and is used for these tasks. That of course still leaves open the question of what the dimensions in each kind of system are, and in short I have no idea. Luckily, there are indirect ways of going about this which, if coupled with a serious study of the different modules involved, might give us a better understanding of what I suspect is a phenomenally complex task.
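To fix ideas, here is one highly schematic way of rendering the porous-modularity picture in (6) (a sketch of my own; it simplifies by assuming that a layer at dimension d gives access to all of a peer module's layers at or below d, and never above):

```python
class Module:
    # A mental module as a stack of dimensional layers: modules are
    # encapsulated except intradimensionally, as in (6).
    def __init__(self, name, layers):
        self.name = name
        self.layers = layers  # dimension (int) -> set of representations

    def visible_from(self, dim, peer):
        # What a representation at dimension `dim` in this module can see
        # in a peer module: the peer's same-dimension layer, plus the
        # lower layers it builds on -- but nothing above.
        seen = set()
        for d in range(dim, 0, -1):
            seen |= peer.layers.get(d, set())
        return seen

vision = Module("Vision", {2: {"contour-map"}, 3: {"long-neck-model"},
                           4: {"motion-profile"}})
language = Module("Language", {3: {"entry: goose"}})

visible = language.visible_from(3, vision)
assert "long-neck-model" in visible     # same-dimension porosity
assert "contour-map" in visible         # reached indirectly, through 3D
assert "motion-profile" not in visible  # no access upward, to 4D
```

On this rendering, the "long-neckedness" of geese is simply a 3D visual representation that the 3D layer of the lexical entry goose can link to.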
7 Structural complexity, semantic complexity?

One of those indirect ways of finding the dimensions of each kind of system involved in the linguistic faculty might be to look at the structural complexity of expressions. Stating this is easy. If expression X is syntactically more complex than expression Y, we expect expression X to correspond to a semantically more complex object than expression Y. But do we? It depends on our assumptions about the syntax-semantics interface.

To see the difficulty in full generality, consider a simple mathematical result which is, again, based on Goedel's work. Part of Goedel's strategy for his Incompleteness Theorem was to come up with what are usually called Goedel numbers, natural numbers associated with any arithmetic expression. You can assign a number x to 2 + 2 = 4, a number y to 2 + 3 = 5, and so on.13 Arithmetic has the same expressive power as any first-order language, including predicate calculus or the sort of grammars that yield familiar phrase-markers, so there are also corresponding Goedel numbers for those. The problem is that once you assign a number to a phrase-marker chunk (any number and any phrase-marker) you can assign anything you want to that phrase-marker (the objects in the periodic table, the humans that ever existed, the stars in the universe). There is no "interpretation" of the phrase in question which can be seen as more or less reasonable, a priori, than its alternatives. Anyone who has examined an unfamiliar writing system realizes that much. So then, what does it really mean to say that a syntactically complex expression should correspond to a semantically complex expression? Why should it not correspond otherwise, or be a random combination?

Few ways around this difficulty exist. One possibility is to go iconic, but it is never very clear what that means. Another possibility is to attempt a minimalist reasoning. Nothing demands the correspondence we are alluding to, but if language is an optimal solution to an interface problem, it is not unreasonable that the complexity we see on one side should correspond to the complexity we see on the other.14 Tacitly, something very much along those lines is already assumed, for
instance by way of the Compositionality Thesis. At times it is forgotten that this thesis is an empirical one. Human language need not have been compositional;15 it is just that this is the way we think it is. Thus if I see nine symbols in this sentence, I expect some factor of nine to tell me the number of semantic relations I must pay attention to (usually many more than that are expected).

With that sort of empirical thesis in mind (whatever its ultimate justification), we can come back to the problem of finding dimensions in human expressions. Take, for instance, the count/mass distinction of chickens vs. chicken. In the former instance there are more symbols involved than in the latter; chickens is chicken plus -s. Does that reflect something about the semantics of the expressions? Semanticists do not usually think that way. If their ontology works with individuals (as most do), then that is where it all starts. They of course have treatments of mass terms, which are generally thought of as lattices of individual atoms. Here, the mass term is more complex than the individual term, ignoring the grammatical fact that languages never express the mass term as grammatically more complex than the count term.16 If we go with syntactic complexity, as presented in the languages of the world through agreement, selection, displacement possibilities, ellipsis, and similar syntactic devices, a hierarchy of the sort discussed by Muromatsu (1998) or Castillo (2001) arises (see in particular (7c)):

(7) a./b. [Diagrams: the nominal dimensions as nested layers, from 4-D: Mutatio, to 3-D: Forma, 2-D: Quanta, and 1-D: Qualia.]

    c. [Diagram: NOUNS start out as abstract nouns; adding MEASURABILITY yields mass nouns; adding COUNTNESS yields inanimate (count) nouns; adding ANIMACY yields personal nouns.]
The observation that bare nouns are mere predicates goes back to generative semantics, and is explicitly defended in current cognitive grammar.17 From that point on, grammatical complexity takes over, through measure phrases (for 2D), noun classifiers (for 3D), and still unclear elements for the animates, suggested to be coding "change potential" in Bleam's (1999) thesis. Traditional analyses have little to say about these added morphemes, or about the fact that they are needed. Why, for instance, do languages resort to noun classifiers to individuate nouns which are counted (as in Japanese) or demonstrated (as in Chinese)? If matters were as trivial as implied by standard semantics, should it not be enough to say "three cats" to mean three cats? Why do languages bother to express that as three instances of cat, with that very partitive syntax?18

If one is moved by that grammatical fact, it is not unreasonable to take one's semantic ontology not to have such things as individual cats. Rather, one deals with a raw conceptual "cat" space, instantiated into a token cat if an element like a noun classifier is added to the expression. The reader might be wondering what a raw conceptual "cat" space is. I do too; but I also wonder about what a primitive cat is, as in "the one in my ontology." The question is equally puzzling in both instances (answering that the cat is there will not do, since how we know it is a cat is part of the question; besides, many things are not there and we want to refer to them as well). At any rate, for present purposes it suffices to say that the conceptual space in question is some network of associations between the different modules involved in this lexical entry. It is, I suppose, the kind of lexical space that tells us of a certain look, smell, this and that, but crucially nothing about the set of cats, or that set in all possible worlds, or any such thing that involves individual cats. From this perspective the "concept" cat is a mental network. Token cats are denoted only if a grammatical formative enters the picture to say, essentially, "having that raw cat space, now you present it as an individual entity." This might be a complex way of doing things, but that should hardly be an issue. What we have to determine is if this (coherent) view is how natural language actually works.

This is really what is being said. At 1D, a nominal conceptual space is a given network of associations with other mental modules; given the porous modularity thesis, the associations must be confined to unidimensional ones (the system cannot see highly elaborate dimensions unless they are built in, thus certainly not from the 1D layer). One might think that this already kills the idea, since after all the way a cat looks is presumably a visual 3D representation. But this is not true. I said we cannot be naive about visual representations. So far as I know, what counts as a 1D visual representation has nothing to do with width, length or height. Whether it has to do with colors, contours, shades, or simultaneous maps of all that, is an empirical question. Whatever the answer, the linguistic system could use that for its 1D representation of the noun cat. Say, for concreteness, that it has to do with simultaneous visual maps (color, contour, shade, etc.); then that kind of information would go into the lexical entry cat, although nothing about higher order visual dimensions (whatever those are) would. Those, however, might enter the picture once the term cat is presented in token fashion by way of an individual classifier.

The next question is how to move from the 1D space we have been exploring to the more complex dimensions. How do we get mass into the picture? Regardless of whether our 1D space is that of a cat or a square root or whatever (differences in terms of the various modules involved in the relevant network of connections), one thing we can do with abstract 1D space is warp it back and forth, until we get a lattice. The very fact that my wife knits the scarf proves, Dr. Johnson style, that you can get a lattice from a one-dimensional object. Once you have it, familiar mass properties, as customarily analyzed, follow directly. For instance, more "knitting" is more scarf, less is less. Two scarf halves are still a scarf, and so on.

Similarly, we can ask how to move from the 2D space of the mass expression (the scarf representing whatever concept we are dealing with) to a token element. Again, we can take the topological metaphors seriously.19 With the scarf plane we can get a cigar band or some other origami-style three-dimensional element. It does not have to look like a cat or chicken or a square root; that is not the point. It has to be objectual in some sense, which we can represent in terms of whatever folding carries us from the previous dimension, correlated with mass, to the new dimension, where boundaries arise. Whether the thing looks like an individual cat or a chicken, smells like that, and so forth, are properties that will come from the other interconnected modules, now accessible at 3D.

It is reasonable to ask why we chose to express mass at 2D and countable elements at 3D, and not the other way around, for instance. There is, first, a phenomenological reason for that. When we look at language from the point of view of the syntax/semantics complexity correlation, that is what we actually find. Count terms are more complex than mass ones. It might have been otherwise, but it just is not. Then again, could it really have been otherwise, given the general system we are presenting? That is a more difficult question, but it is not unreasonable to speculate in the direction of a necessary mapping. A topology is a more complex kind of mathematical object than a lattice. You get things like toruses and so on
from folding lattices into themselves (twice, for that particular topological form). In other words, form in the sense we are exploring it is mathematically more complex than absence of form. A lattice has neither form nor boundaries. The elements with which we are modeling tokens of conceptual spaces do have form through boundaries (and perhaps also parts and other intricacies, depending on the various warps and folds, which mass terms lack). If this is correct, things could not have been otherwise.20

Once this picture is in place, it trivially predicts the kind of implicational correlations that we have become familiar with for lexical terms. I have not attempted to build a case of the sort sketched in Muromatsu's (1998) thesis for lexical spaces other than nominal ones, but it is easy to see that similar considerations apply to verbs, for instance. That is precisely what Mori (forthcoming) tries to show, making a correlation of the sort in (7) for the Vendler ontology. In that view states are akin to abstract terms, activities to mass terms, and more complex events to count terms. Already familiar questions arise for the basic source of the relevant conceptual spaces, but once again the kinds of intermodular, porous connections mentioned above would be very relevant, in principle. Issues about the boundaries of more complex events expressed through topological notions, as opposed to the lattice character of simpler verbs, also have an obvious expression in the terms discussed above for nouns.
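The lattice behavior just appealed to can be checked mechanically. In the toy model below (my own; the "stitches" are arbitrary atoms), a mass denotation is the set of all sub-portions of some stuff, which is closed under join (more knitting is more scarf), whereas a denotation of discrete tokens is not:

```python
from itertools import combinations

def portions(atoms):
    # All non-empty sub-portions of some stuff: a join semilattice.
    xs = list(atoms)
    return {frozenset(c) for r in range(1, len(xs) + 1)
            for c in combinations(xs, r)}

def cumulative(denotation):
    # Lattice-style (mass) reference: the join of any two instances of
    # the term is itself an instance of the term.
    return all(a | b in denotation for a in denotation for b in denotation)

scarf_stuff = portions({"stitch1", "stitch2", "stitch3"})   # 2D, mass-like
two_cats = {frozenset({"cat1"}), frozenset({"cat2"})}       # 3D tokens

assert cumulative(scarf_stuff)    # two scarf halves are still scarf
assert not cumulative(two_cats)   # cat1 joined with cat2 is not "a cat"
```

Divisiveness holds as well, down to the atoms: any sub-portion of scarf stuff is in the denotation, which is as much as the lattice picture of mass terms requires.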
8 Some predictions

That a given event implies some state, or a count term some mass, is now expected. To be exact, the entailment obtains at the level of the mathematical structure we have been discussing and using to model the various concepts. Aside from these vertical relations, we can now also capture some curious super-horizontal relations which do not obtain at the usual syntagmatic cut. For example, stative verbs behave like abstract nouns, processes like mass terms, or events like count nouns. Thus observe the kinds of correspondences in (8):

(8) a. much-little coffee        He grew much-little
    b. one-two-etc. cats         He sneezed once-twice-etc.
Mass quantifiers like much-little go with mass terms like coffee in (8a), unlike count quantifiers like many-few or the numerals. Similarly, they go with processes like grow, in adverbial fashion. Conversely, count quantifiers go both with count nouns like cats in (8b), and with delimited events like sneeze, in adverbial fashion. This correspondence in behavior between different lexical classes is easy to state, but hard to code within the theory. However, in the terms presented here, the correspondence in behavior is expected, since the mathematical structure modeling each sort of element, whether nominal or verbal, is the same. This will force us to think about what the difference between nouns and verbs is, but I will return to that.

We can also predict certain surprising behaviors of expressions which seem to be of different sorts at the same time. One can say, for instance, (9):
(9) My dog is soft, small, and very intelligent.

The predicates here hold of the dog presumably as a mass, an object and a sentient creature; that is, of an expression that has various dimensionalities. In present terms, each predicate applies to each of the dimensions of my dog.

Something related can be seen with selectional restrictions. There are certain verbs that select for direct objects of a given dimensionality. For instance, a predicate like weigh selects direct objects with substance, thus the contrasts in (10):

(10) a. *Weigh mass.        b. Weigh coffee.

Note, there is nothing incoherent in the idea of weighing the mass property of particles (as opposed to, say, their spin), but it is odd to express that as in (10a), with the abstract term mass (which implies no actual mass). At any rate, given the assumptions above, all higher dimensional nominals imply the low-dimensional substance, hence weigh is able to combine with them:

(10) c. Weigh tables.        d. Weigh people.

In contrast, there are predicates, like tame, which select direct objects of a very high dimensionality:

(11) a. Tame beasts.        b. *Tame trees.
     c. *Tame rice.         d. *Tame life.
Again, I am trying to give a chance to the meaning of these expressions. The objects in (11) all either have life or denote it. One could imagine the verb tame as selecting animate objects, thus expressing the idea of turning their properties from the wild to the domestic. However, physical animacy is not what the higher cognitive dimension would seem to care about, but rather change potential, in Bleam's (1999) sense. Physically trees, rice, and life surely change, but cognitively they seem to be generally analyzed as more static, thus at a lower dimension; hence the unacceptable combinations in (11). The important thing, though, is that selection of a low dimension, as in (10), allows for the presence of higher dimensions, irrelevantly for the selection process, but selection of a high dimension forces a unique sort of combination. Nothing short of that high dimension can be present in the selection process.

Nouns or verbs are canonically expressed in a given dimension, as we saw above. However, these are easily altered in some languages, with an appropriate choice of determiner. Thus, in English we can get a coffee or a wine, going from the 2D mass terms to the higher dimensional expressions. We can also lower the dimensionality of an expression, particularly when we are referring to its stuff (chicken, lamb, etc.).21 In some languages this is done very systematically, even when we are not referring to an expression's physical mass;22 for example, consider the Spanish (12):

(12) Aquí hay mucho torero.
     here has much bullfighter
It is hard to translate (12). It does not mean there are proportionally many bullfighters here, but that there are cardinally many. At the same time, however, we cannot invoke reference to token bullfighters with that expression. Thus:

(13) a. Diez de mis amigos conocen a muchos sinvergüenzas.
        ten of my friends know to many rascals
     b. Diez de mis amigos conocen a mucho sinvergüenza.
        ten of my friends know to much rascal

(13a) is ambiguous, having both wide and narrow scope readings for muchos sinvergüenzas, "many rascals." But (13b) only has the narrow scope reading of mucho sinvergüenza. My ten relevant friends either know the same or different numerous groups of rascals, but it cannot be the case that they each know a couple of rascals, such that jointly all those rascals (known a few at a time, now a numerous group) are known by my ten relevant friends. I take that to mean that mucho sinvergüenza cannot act as a quantifier that generates readings where another quantifier can anchor its referential import, as normal individual quantifiers can.23 The expression is acting like a mass term not just in its morphological guise, but also in its semantic properties.

If nominal expressions present different dimensionalities to them, we should not be surprised if otherwise inappropriate quantifiers (e.g. a mass quantifier with a count noun) can target those hidden dimensions. Why there are restrictions to this is something to study. For example, why is the phenomenon not observed in English? And why, even within Romance, is it restricted to those contexts where bare plurals are possible?

(14) a. Aquí hay sinvergüenzas/mucho sinvergüenza.
        here has rascals/much rascal
     b. Yo he visto sinvergüenzas/mucho sinvergüenza.
        I have seen rascals/much rascal
     c. *Sinvergüenzas/mucho sinvergüenza gobierna el país.
        rascals/much rascal rules the country

Both types of expressions are grammatical only in direct object guise, (14a), (14b), for reasons that are not obvious. English does have a phenomenon that resembles what we have just seen. Compare:

(15) a. Some of my team mates scored many times.
     b. Some of my team mates scored much.

(15a) is ambiguous. The interesting reading is, "Many times, some of my team mates scored." That wide-scope reading for the adverbial is impossible in (15b). Normally, an event like score would be quantified in terms of a count quantifier like many. However, we can also use a quantifier like much to quantify over the scoring, but then we do not obtain individual token scorings; rather, we quantify over a raw scoring space which cannot anchor the separate scoring instances necessary to allow for the narrow scope reading of the subject some of my team mates.
The use of non-canonical quantifiers, restricted to event quantification in English, does not seem to be constrained by subject-object asymmetries as in Spanish. However, the appearance of bare plurals in English is not constrained that way either:

(16) a. I want peanuts/my team mates to score much.
     b. Peanuts/For my team mates to score much would be fun.

It is likely that bare plurals themselves, which we see patterning with non-canonical quantification in both Spanish and English, introduce a lower dimensionality reading for an otherwise count noun.24 Thus compare:

(17) a. Some of my team mates scored many goals.
     b. Some of my team mates scored goals.

It is rather hard to get a wide-scope reading for the bare plural in (17b), though it is possible in (17a). Perhaps the expression goals without a quantifier is in essence acting as a mass term (semantically that is known to make sense, since a set of goals is unbounded, and can be modeled as a lattice). Then the question would be why in English non-canonical mass terms cannot go with mass quantifiers, even when they exist. In Spanish, in contrast, not only do those expressions appear systematically where we expect them (co-occurring with bare plurals), but they can furthermore be introduced by standard mass quantifiers.

In any case, regardless of all those important grammatical details, the expressions of interest now exist, and in effect lower the dimensionality of an otherwise canonical count expression. Interestingly, that is done without invoking the actual mass of the denoted entities; that is, much bullfighter in Spanish or corresponding bare plurals in English obviously do not refer to the bullfighter's flesh and bones. This should dissuade us from taking the dimensional notions as telling us something about the real world, or from obtaining the unexpected concepts by way of "grinding" operators and the like. Surely humans can conceive bullfighters as made of blood and guts, but more abstract and relevant mass perspectives are possible as well. One can literally find bullfighter stuff in a gory bullfight (the reading is possible). However, when one talks of having met much bullfighter (in Spanish), one is speaking of other, lattice-like attributes of the relevant concept. Each dimension is built on a network of associations among mental modules, so there is no need to limit ourselves to the tangible, smelly, physical stuff. Only if we are bound to build our ontology on physical, smelly, tangible individuals do we have to commit to that.
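Before turning to verbs, the selectional pattern in (10) and (11) and the dimension-lowering just discussed can be summarized in a single toy model (mine; the numeric "dimensions" merely encode the hierarchy in (7c), and the particular lexical assignments are illustrative assumptions): a verb selects a minimum dimension, which anything higher entails, while a top-dimension selector like tame admits nothing less, and non-canonical quantification presents a noun at a lower layer without grinding it.

```python
# Toy dimensions, after (7c): 1 abstract, 2 mass, 3 count, 4 animate.
DIM = {"mass": 1, "life": 1, "coffee": 2, "rice": 2,
       "table": 3, "tree": 3, "person": 4, "beast": 4, "bullfighter": 4}

# Minimum dimension each verb selects for its theme.
SELECTS = {"weigh": 2, "tame": 4}

def combines(verb, noun):
    # Higher dimensions entail the lower ones, so meeting the minimum
    # suffices (10c,d); a top-dimension selector excludes the rest (11).
    return DIM[noun] >= SELECTS[verb]

assert combines("weigh", "coffee") and combines("weigh", "person")
assert not combines("weigh", "mass")                     # *weigh mass
assert combines("tame", "beast")
assert not any(combines("tame", n) for n in ("tree", "rice", "life"))

def present_at(noun, dim):
    # Dimension-lowering ("mucho torero", bare plurals): a noun can be
    # presented at any layer it entails, not just its canonical one.
    return min(DIM[noun], dim)

assert present_at("bullfighter", 2) == 2   # a mass-like take on a 4D noun
```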
9 Nouns and verbs

It should be clear by now what sorts of things the present system expects to find and can predict, and that each vertical dimension postulated, whatever its details, corresponds to a category. Now we are beginning to approach the Epstein-Seely goal. Although we have not been invoking standard derivations, the dimensional apparatus we have described could be modeled derivationally.
I do not want to go now into the matter of whether that modeling, pushed to its formal limits, would be desirable. A dimension could be seen as a level of representation, a logical entry to the next level; thus we could vertically articulate the non-terminal elements in customary horizontal derivations. However, the ordering imposed on concatenative derivations (for instance from D-structure to LF) is extrinsic, whereas the order implicit in the dimensional system is intrinsic. That is a non-trivial formal difference between the two sorts of systems, and it may or may not indicate something deep about vertical and horizontal syntax, which I am not trying to conflate into a single type of process (see the last section). Be that as it may, the horizontal, concatenative derivation does not have to care about what the vertical system provides, so long as its demands are respected, for instance, in terms of selectional restrictions. Similarly, the vertical, paradigmatic system does not care about what combinations the horizontal derivation permits with its legitimate items. Logically, the vertical system is prior, since it generates the vocabulary for the horizontal one. And since the two are not the same, they may have different properties (I return to some at the end).

But a serious question remains. We have said something about how verbs and nouns fit the present picture and how they even behave in parallel fashion, as expected, but how do they relate to each other? If we have not derived the verb/noun distinction, we have not explained it.25 I will pursue an idea alluded to by Mori (forthcoming), who in turn summarizes much previous work. Theme arguments are characteristic functions of verbs. In general, a standard verb must have a theme. If so, and provided that themes are standardly nouns, we could probably define verbal elements as functions over nouns, in some sense. If we succeed at that, we would have articulated the vertical dimension around nominal mental spaces.

Defining verbs around nouns, in a precise mathematical sense, would have an added advantage. Theme arguments are known to delimit events, to use Tenny's terminology. When I drink a beer, my drinking event lasts while there is beer left. That sort of situation may help us model the verb, in essence a function that monitors the change of the beer mass from some quantity to less of that, or no quantity. Functions of that sort are often conceived of as derivatives over time. Since in previous sections we spoke of mental spaces for nouns with proper mathematical correspondences, it should not be difficult to interpret (18):

(18) A verb expresses the derivative of its theme's space over time.

Needless to say, to give non-metaphorical meaning to (18) we have to provide values for the theme's space, or for that matter for time in a grammatical sense. I will not attempt this here, because it is formally trivial but conceptually very difficult.26 As expected, different kinds of verbs care about different dimensions of space, in the obvious way. You can see, kill and eat a lobster, say, and you are clearly targeting different dimensions in each of those actions, which are thus
bounded in different ways: the killing is conceived as terminating with the animal's change potential (animacy), while the eating is not conceived as concluding until much later, when the substance is taken to be consumed. So-called "internal aspect," thus, is a by-product of the dimensions we are articulating, specifically in terms of the dynamic function in (18). All verbs seem to be dynamic in that respect, although as Atutxa shows in work in progress, the rate of change for the theme argument in some is zero (states), and in others limiting instances exist for that argument (achievements and accomplishments). That those possibilities should arise is expected, but it is still interesting to ask what gives verbs their different dimensionality. That question was tough enough for nouns, but there we blamed it on the mathematical complexity of lattices vs. topologies. One expects a similar result in the realm of verbs, although events are harder to analyze directly in these terms, not as a matter of principle, but because they have an elusive dynamicity added, which makes even intuitive calculations hard to pin down.27

An intuition that Mori pursues in her thesis is that the more arguments a verb has, the more complex it is in its dimensionality. That is in the spirit of the minimalist syntax/semantics interface conjecture presented before. The more symbolic representations enter into the computation of an expression, the more semantic complexity we expect for it. Thus, if faced with the following two expressions:

(19) a. John Xed a house.
     b. John Yed himself a house.

we intuitively expect Xed to be a conceptually simpler verb than Yed, simply because the extra himself should be contributing something to the meaning, and if it is an argument of the main event, it will arguably affect its dimensionality. So one way to understand the dimensionality of a verb is in terms of its argument structure.

But immediate complications arise. Some are uninteresting. Arguments may incorporate, and thus be hard to spot. That just means one has to be careful in the analysis. Other difficulties are much more subtle. It appears, for instance, that some verbs have a rather high dimensionality (they are, say, achievements as opposed to states) in spite of the fact that they have as many arguments as verbs of a lower dimensionality. For example:

(20) a. Oswald killed Kennedy.
     b. Oswald hated Kennedy.

In spite of both verbs being transitive, kill would seem to require a high dimensional theme, unlike hate. One can hate, but not kill, say, wine (setting metaphorical readings aside).28 Then a question that Mori poses is whether the high dimensionality of the theme could not in principle "boost" the dimensionality of the verb defined over it. Mathematically, this is not unreasonable, but it remains to be seen whether it is the right conclusion.

Although many other difficult questions remain, Mori is probably correct in noting that the thematic hierarchy follows trivially if her general approach is
right. Since for her a verb is defined over a theme, theme arguments should be special in that sense, and "lowest" in the hierarchy, since they would be presupposed by any verbal space. Then other arguments would add more verbal complexity, thus correlating classes in the Vendler hierarchy with arguments in the thematic one. These would not be two separate hierarchies, but only one, and the consequence of the general dimensional system. Again, this is reasonable, but it has to be tested seriously.

This much seems true. It is easy to build a model where verbs are derivatives of their theme's mental space over time, in which case the main task of relating nominal and verbal categories could be achieved. Note, incidentally, that this dynamic view of verbs need not make them in principle higher in dimension than nouns (pace Mori's "boosting" situation). That you can map a function to time need not make that a higher dimensional function, just a dynamic one. In fact, recall that certain verbs and nouns are known to behave alike with regard to their process/mass or event/count properties, which indicates in present terms that the models we use to represent them have the same dimensionality, hence the same mathematical properties. A related question, of course, is why we have this static/dynamic duality between nouns and verbs, or why a verb is a dynamic perspective of a noun. These are deep questions, it seems, about human cognition and our capacity to understand the universe as both permanent and changing, for in a nutshell that is what we are saying. Both nouns and verbs correspond to mathematical spaces of various dimensions, the difference between them being whether those spaces are seen as permanent or mutable.29

Needless to say, these conjectures constitute a program, but not an unreasonable one, even in practical terms. For instance, any of the current relational maps in the literature would do for our purposes as a starting analytical point. Surely there will be many details that will have to be imported and which may test the theory, after we translate those maps to the dimensional picture argued for here. Nonetheless, I am more worried about getting non-obvious generalizations that are not usually mentioned than about obtaining total descriptive adequacy with regard to the immense wealth of data. For example, there is an observation of Bertrand Russell's to the effect that normal human names have a characteristic continuity.30 We name a dog dog, but we do not name the set of its four legs *limb, say.31 How do we represent that and what does it mean? It is easy enough to state the fact in terms of a claim like (21):

(21) Human concepts are generalizations of Euclidean space.

Note, we have warped Euclidean space shamelessly, but we have not torn it. We could have. But perhaps human cognition does not use that meta-operation in ordinary language. If so, ordinary lexical concepts would be what topologists call "manifolds" of various dimensionalities. That is either right or wrong, but the question is rarely even posed.

Many other simple, grammatical questions arise. For instance, the syntax of possession, as studied by Szabolcsi (1983) and Kayne (1994) (see Chapter 10),
would seem to enter the conceptual picture we are studying. Thus observe the expressions below:

(22) An animal with two pounds of weight, with structure of symmetrical organs, with families of polygamous composition.

The use of the preposition with immediately indicates possessive syntax, as does the fact that those predications can be expressed with inalienable have ("the animal has a family of polygamous composition"). This is universal, indicating that possessive syntax enters the realm of ontological classification, perhaps serving as the interface between the vertical dimension studied here and its horizontal manifestation. Note, incidentally, that the order in (22) is the natural one, corresponding to the neo-Aristotelian substantive claims made by Muromatsu (1998):

(23) [[[[entity] substance] structure] relations]

Likewise, many of these conceptual notions manifest themselves adjectivally, as in "a long-haired animal," and as Muromatsu suggests, familiar adjectival hierarchies should also fall in line with the dimensional structuring, with orderings alternative to those in (24) sounding either emphatic or ungrammatical:

(24) [promiscuous [symmetrical [light [animal]]]]

On a related note, grammatical functors surely affect all these relations we are talking about. A nominalizer will have the effect of coding a verbal space, as complex as it may be, as a nominal one. I take it, though, that processes like that are not the default ones on which the system is based, which is possibly why those elements are morphologically specified. The category C may well also have that nominalizing effect; after all, a complex event which is normally presented in propositional guise to make an assertion becomes something which a higher verb can use as theme, by way of C:

(25) I hate that Oswald killed Kennedy.

Oswald's killing of Kennedy can be a very complex, high dimensional verbal expression. Nonetheless, one can hate that as much as one can hate coffee. That plausibly is because the complementizer automatically translates the event into a noun of sorts.

10 Learnability considerations

We should not conclude this chapter without reflecting on how the present system would square with the acquisition task and, if it does, whether it can reproduce the actual acquisition process. A system which does not pass these tests is basically worthless. Let us first clarify what would be a reasonable demand for the system. Given situations of ambiguity, whereby a word is uttered in front of a child in a situation which could be analyzed in terms of more than one of the dimensions above, can the child decide on one analysis, and if so, is that the correct one? That is a sound question. To expect that the system presented here would tell us something about
10 Learnability considerations We should not conclude this chapter without reflecting on how the present system would square with the acquisition task, and if it does, whether it can reproduce the actual acquisition process. A system which does not pass these tests is basically worthless. Let us first clarify what would be a reasonable demand for the system. Given situations of ambiguity, whereby a word is uttered in front of a child in a situation which could be analyzed in terms of more than one of the dimensions above, can the child decide on one analysis, and if so, is that the correct one? That is a sound question. To expect that the system presented here would tell us something about 311
DERIVATIONS
the general, overall acquisition sequence would be utterly disproportionate, among other things because it is not even clear what that sequence is. The logic of the interesting test may be presented in a narrative fashion A child observes a rabbit go by when a native speaker points at the creature while uttering, “gavagai!”32 How does the child know what the native meant? Was it “rabbit” or “fur,” or more exotic combinations? Let us stick to those two to simplify. Suppose the analysis in terms of “rabbit” is a 4D approach in Muromatsu’s sense, whereas the one in terms of “fur” is an instance of a 2D, thus in some definable informational sense, simpler take.33 What should the child, or a model Language Acquisition Device (LAD), conclude in the absence of explicit training? Does gavagai mean rabbit or fur here? We know what children do. They interpret gavagai to mean “rabbit”; in the dimensional terms, that would mean they go with the highest dimension. If one assumes the dimensional theory and is a behaviorist of any form (including a connectionist), this means trouble. What sense would it make for the blank, or scarcely organized, mind to go with the more complex structure without having acquired the simpler one first? What one should expect from this perspective is that, in the situation just outlined, the child should start with the simpler hypothesis, taking gavagai to mean the lower dimensional “fur.” So either the dimensional, or the behaviorist approach is wrong. Now take the question from the innatist perspective. The dimensions are already in place, in the human mind prior to experience. Of course, the child does not yet know that gavagai is arbitrarily associated to this or that, but the child does have the mathematical equipment in place in his or her mind to analyze the incoming situation. However, since the situation can be ambiguously analyzed, how does the child decide? Once again, the only relevant metric is informational complexity. However, far from hypothesizing the informationally simpler structure, a LAD with all the structures in place does best in hypothesizing the informationally most complex structure, among those present. The logic is familiar, Panini’s Elsewhere Condition. This conservative learning strategy is often expressed in terms of the Subset Principle, but that particular formulation will not do for non-Extensional Languages, like the one assumed in the Minimalist Program. Nonetheless, the logic is clear enough, and it can be stated in terms of a Sub-case Principle: (26) Subcase Principle Assuming: (a) A cognitive situation C, integrating sub-situations c1, c2, … , cn; (b) a concrete set W of lexical structures l1, l2, … , ln, each corresponding to a sub-situation c, (c) that there is a structure lt corresponding to a situation ct which is a sub-case of all other sub-situations of C, and (d) that the LAD does not know which lexical structure lt is invoked when processing a given term T uttered in C; then: the LAD selects lt as a hypothesized target structure corresponding to T. 312
WARPS
Note that it is situations that enter into subcase relations. Tacitly, I am assuming that: (27) Given cognitive sub-situations c and c, obtaining at a situation C, for I and I linguistic structures corresponding to c and c, respectively, and where d and d are the dimensions where l and l are expressed, we can say that c is a sub-case of c if and only if d d. Once a complete theory is in place telling us exactly how more or less complex situations are analyzed by the dimensional system, (27) (though true) should be a mere corollary of the system, but to make the reasoning work now, it must be stated. To see how these notions provide an analysis of the case discussed, consider two different scenarios: (28) Scenario 1: In fact gavagai means “fur.” Scenario 2: In fact gavagai means “rabbit.” And let us now see how the analysis works in terms of (26). Assuming (a) a cognitive situation C (the perceived event), integrating sub-situations c1 [“a 4D rabbit”], c2 [“2D fur”], (b) a concrete set W of lexical structures l1,l2 (the different possible interpretations of a word associated to the perceived event that universal grammar allows), each corresponding to a sub-situation c. (c) that there is a structure lt (which involves four dimensions of the basic syntactic structure) corresponding to a situation ct (concretely c2, the “4D rabbit”) which (as per (27)) is a sub-case of all other sub-situations of C (concretely c1, the “2D fur”), and (d) that the LAD does not know which lexical structure lt is invoked when processing a given term T (concretely, gavagai) uttered in C; then: the LAD selects lt as a hypothesized target structure corresponding to T. In other words, the LAD selects the “4D rabbit,” a sub-case of the “2D fur” in the sense of (27), as the meaning for gavagai, regardless of the factual details of the two scenarios in (28). Now, in Scenario 1, the child is wrong. However, she or he will not produce an “erroneous expression” if deciding to utter the now (for better or for worse) acquired gavagai, assuming all rabbits are furry. (If we do not assume that, then the whole experiment is entirely irrelevant, for we are trying to study situations of complete analytic ambiguity, that is, sub-case scenarios.) In Scenario 2, the child is, of course, right. So how does the child come out of her error in Scenario 1? Suppose she assumes an Exclusivity Hypothesis concerning lexical meaning in the acquisition stages:34 (29) Exclusivity Hypothesis Entities only have one name. If so, the child can retreat from the initial mistake by either hearing the word gavagai used for any other furry object for which she or he already has a name, or by hearing another word used for “rabbit.” Thus correcting the mistake 313
necessitates no instruction. In contrast, imagine a LAD which did not select the conservative, most specific hypothesis in terms of the Sub-case Principle, one hypothesizing “fur” for the original meaning in the sub-case situation. Such a LAD would have difficulties with Scenario 2, and produce an erroneous expression when uttering gavagai before just any furry thing. It would not be easy for a child corresponding to that LAD to retreat from such a mistake. All other uses of gavagai from the community would reinforce the wrong guess, assuming all rabbits are furry. Someone might utter a different word in front of the rabbit, one in fact meaning “fur.” That will not be helpful, since the LAD would presumably take that to mean “rabbit,” assuming the Exclusivity Hypothesis. Of course, that new term would also be used by the community applied to other furry things, so the LAD would then be confused. What it took to mean “fur” is reinforced as meaning “fur,” but what it took to mean “rabbit” is actually challenged, without much room for maneuver, unless the entire edifice of choices is destroyed and decisions are again made from scratch. Or some correction by the community is provided. Linguists of my orientation take such corrections to be pointless and thus are pretty much forced to look the other way. Interestingly, psycholinguistic evidence points in precisely that direction: children, given ambiguous situations, go with the more object-like analysis of nouns, or event-like analysis of verbs; in our terms, with the higher dimension.35 A couple of wrinkles remain. First, why do children not analyze gavagai to mean “Peter Rabbit”? That opens up the issue of names, and how they fit into this system. In Chapter 12, I argued that their characteristic rigidity is best analyzed as absence of internal structure of the sort relevant to us here. If so, a name should not compete with the lexical notions discussed so far (it would be part of an entirely different paradigm), or if it does it should come in last, as an element of lowest (or no) dimensionality.36 Next, why do children not go with an analysis of cumbersome dimensionality, a 6D flag manifold, say, of the sort used to study quarks? That poses the question of what the upper limits of grammatical dimensions are in ordinary lexical cognition. If we go with the phenomenology, things seem to stop pretty much at three or four dimensions, like those involved in countable vs. animate nouns or achievements vs. accomplishments.37 Needless to say, I have no clue as to why that limit exists for cognition, but once it is there, it is reasonable not to expect the child to come up with a more complex analysis. Finally, is it really true that children assume things only have a name, as crucially implied in the reasoning above? And what about synonymous expressions, like car and automobile? Obviously those exist, and suggest the picture has to be complicated to include the notion “lexical paradigm.” It is quite unclear what that is, but it seems to exist even beyond these examples. For example, cases studied by Horn (1989) show that given a system that covers a certain “lexical space,” such as that involved in possible, impossible and necessary, redundantly expressed notions like *innecessary (something which is either possible or impossible) yield their existence to the more specific concepts. That itself is
arguably a consequence of the Sub-case Principle, but it only works within paradigms; thus, unnecessary is perfectly fine, but it does not lexically compete with the paradigm above (since it uses the prefix un- instead of in-).
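The retreat logic just described can also be made concrete. The sketch below is again purely illustrative, not a claim about actual lexical representations; the particular lexicon and the choice of “cat” as an already-named furry thing are assumptions of the example. It shows how, given the Exclusivity Hypothesis (29), the Subcase-driven “rabbit” guess self-corrects on community usage alone, with no instruction.

```python
# Illustrative sketch of retreat under the Exclusivity Hypothesis (29):
# entities only have one name. Assumes, with the text, that all rabbits
# are furry; "cat" stands for any other furry thing the child can name.

lexicon = {"gavagai": "rabbit"}  # the initial, Subcase-driven guess

def hear(word, thing):
    # Community usage that contradicts Exclusivity triggers retreat
    # from "rabbit" to the lower-dimensional "fur".
    if word == "gavagai" and thing != "rabbit":
        lexicon["gavagai"] = "fur"   # gavagai used of another furry object
    elif word != "gavagai" and thing == "rabbit":
        lexicon["gavagai"] = "fur"   # a different word used for "rabbit"

hear("gavagai", "cat")     # Scenario 1: gavagai in fact means "fur"
print(lexicon["gavagai"])  # -> fur: the mistake is corrected, untaught
```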
11 Some conclusions
The notion “paradigm” is interesting not just as a solution to the puzzle we posed in the previous section for the Exclusivity Hypothesis. It is telling us that lexical relations of the sort studied here have peculiar properties. There are no paradigms in horizontal derivations, no need to speak of exclusivity within them or to allude to a Sub-case Principle when acquiring them. Another specific property of lexical relations, canonicity, has been noted in Sections 3 and 8. Whatever that is, it should tell us that normally bullfighters are animate entities, although we can interpret the concept as a mass term in some instances. We can also interpret coffee as count, but it is normally a mass. Canonicity appears all over the place in the formation of complex predicates. To saddle a horse is not just to put the saddle on the horse. You would not have saddled the horse if you put the saddle on its head. It implies a kind of arbitrariness that is observed, also, in the actual words that languages use for classification or measure expressions, which, as is well known, vary culturally. I have nothing to say about the variation here, nor do I think there is much to say in standard terms. My concern was mostly with the fact that these kinds of symbols have a certain dimensionality to them; thus, for instance, measures are in a meaningful sense lower in dimensionality than classifiers. But the fact that they exhibit canonicity restrictions should be accountable in standard scientific terms, since it is a universal. When we speak of the differences between the horizontal and vertical dimensions of languages, narrow syntax and the paradigms of the lexicon, most of us insist on three properties that fueled the linguistics wars: transparency, productivity, and systematicity. Lexical structures are opaque to internal syntactic manipulation, random in the relations they allow, and idiosyncratic, unlike narrow syntactic structures. But that perspective is loaded. Vertical structures are different from horizontal structures. That there should be opacity may just be signaling this cut, as conservation laws do in physics. A particle’s conservation of the handedness of its spin may have an influence on the formation of a field, but it may be essentially unaffected by a “higher” order gravitational component. The universe happens to be layered. On the other hand, the alleged randomness or idiosyncrasy may just be due to the wrong theoretical demands. If we impose the logic of relativity on quantum mechanics we get randomness or even nonsense. But why should we? When looked at from its own perspective, things happen productively and systematically within lexical paradigms. It is when we start looking across them that randomness and idiosyncrasy take over. If this is the right way of looking at things, what we really must understand is why vertical syntax manifests itself in lexical paradigms, unlike horizontal syntax. This might have to do with the fact that only vertical syntax is significantly acquired, horizontal syntax either being entirely universal or having
parametric choices whose values ultimately reduce to properties that are best captured within vertical syntax. Why that should be so, implying as it does that vertical syntax feeds horizontal syntax, is to my mind a deep question. I think it was a mistake of generative semantics to confuse these two general cuts on the fabric of language, but I think it would also be a mistake not to take the vertical cut seriously, just because we understand the horizontal one better. A related question is whether the dimensions I have been talking about here are a matter of syntax or semantics, and, if the latter, whether it is lexical or logical semantics. In part, these are terminological issues. It is clear that some structure resides behind the notions discussed in this chapter, with tight hierarchical properties. I find it useful to express the relations in question in “small clause” fashion, in part because I would like to relate the relevant structures to possessive ones, and those, I think, bottom out as predications of an “integral” sort. But whether we need standard or “integral” small clauses, or some other more abstract object such that each dimension is in effect an object in the next (so that the order of the formal language increases with each warp), is a formal matter which should not be decided without empirical argumentation. My own feeling is that syntax should code various orders of formal complexity, which would make its mapping to a corresponding semantics, with various, progressively more complex types, all the more transparent. But other than a minimalist (that is, naturalistic or aesthetic) reason for this, I hide nothing behind this hunch, although a serious study of functional categories, which I have set aside now, may have something to say about the matter. Deciding on that would also bear on whether the mathematical difference presented here between the vertical and the horizontal systems is to be taken seriously. Again, order in the vertical system is intrinsic, if what I said is correct, whereas in the horizontal syntax order is customarily taken to be an extrinsic mapping (from D-structure to LF, or corresponding objects). Could it be that this order, too, should be expressed in dimensional terms? Is that the right move, though, if the vertical and horizontal syntax do have significantly different properties? One may also ask what the “ultimate” reality of these vertical hierarchies is, how they differ from whatever underlies numbers, and so on. I do not know. Evidently, a mental module for mathematical reasoning is not the same as the language faculty, or those of us who are terrible practitioners of mathematics would not be able to finish a sentence. Luckily it is not like that, nor is there any reason, even from the perspective presented here, why it should be so. It is true that I am making an explicit connection between the number system, generalizations of Euclidean space, and lexical concepts. This is in part because we seem to be the only species that moves around comfortably within those parameters, and more importantly for us here because that may have something to say about the vertical cut of language, which otherwise has to be blamed on some unclear properties of reality. God only knows how the human mind has access to that general ability. More concretely, but equally mysteriously, some evolutionary event must have reorganized the human brain to give
us access to that. Perhaps the fact that we seem to be limited to lexical conceptualization in (roughly) four dimensions might even relate to the non-trivial fact that the world we occupy comes to us in four dimensions, at least in this stage of its physical evolution. Before (in the first few milliseconds after the Big Bang) it apparently had more. Although less than a speculation, that would be a curious state of affairs, with the structure of what is “out there” heavily influencing aspects of what’s “in here.” We have no way of knowing that, with present understanding in any of the sciences. Yet, it is reasonable to suppose that once the evolutionary event in point took place, whatever its details and causes, other parts of the brain got coopted to develop the faculty of language, mathematics, music, and perhaps other faculties. It is hard to imagine what else other than this sort of “exaptation” (in the sense of Gould (1991) and elsewhere) could have taken place, at least in the case of music, whose adaptive properties are non-existent; in my view similar cases can be built for mathematics (most of which is useless, certainly in a survival-of-the-fittest sense) and even language (although the latter is controversial, see Uriagereka 1998). In any case, presumably different interfaces exist for each of the modules just mentioned, hence the behaviors they allow are all significantly different, even massively so. This is expected in complex systems more generally, which may obey similar principles of general organization, but end up with structures mathematically as similar, yet functionally as diverse, as a sunflower corolla and a peacock’s tail.
NOTES
2 CONCEPTUAL MATTERS
1 My appreciation, first and obviously, to Noam Chomsky, for more than I can express. Section 1 has benefited from comments from Elena Herburger, Norbert Hornstein, Howard Lasnik, Roger Martin, and Carlos Otero. I assume the errors, etc.
2 I thank Cedric Boeckx, Elena Herburger, David Lightfoot, Roger Martin, Massimo Piattelli-Palmarini, Johan Rooryck, and especially Peter Svenonius for useful commentaries on Section 2.
3 Icelandic is trickier because it presents both V and P agreement. Hence it should pattern with (Sa) languages, but it can exhibit the (A, P) order. However, it is known that in this language the associate can move past the left periphery of VP (e.g. across auxiliaries). It is thus likely that at the stage of the derivation that concerns me the relevant order is (P, A) even in Icelandic.
4 P in the graphs below stands for “Participial head,” which may or may not exhibit agreement. “. . .” denotes any arbitrary number of irrelevant categories. Move is represented through arrows (e.g. as in (9b)) whereas attract (prior to move) is signaled via dotted lines (e.g. as in (10)); impossible dependencies are crossed.
5 (11b) is locally convergent (i.e. within the derivational horizon that is relevant for determining entropy), even if not using up the pleonastic will eventually lead to the crash of that derivational line. It does not matter, as all that these would-be lines do is determine the most entropic, actual path. At the point of the derivational evaluation, the specific step considered is convergent, as desired.
6 Given what we said about Icelandic in Note 3, this should be the derivational fate of that language. Importantly, Icelandic is the typical example within Scandinavian languages where expletives can be null (non-existent in our terms). Expletives, though, may be pronounced in Icelandic, which recalls the situation in Western Iberian (null-subject and an overt expletive). In both instances, the overt element is associated with the complementizer system, hence irrelevant.
7 The idea can also be stated in terms of intervention by the participial head itself. It would take me too far afield, however, to present the system in these terms, since the mere presence of a rich head does not create an intervention effect if the expletive is missing (as in Romance).
8 Section 3 has benefited from the useful help of Danny Fox, Joel Hoffman, Esther Yeshanov, Lilian Zohar, and very especially Hagit Borer, regarding the Hebrew data, and general comments from Cedric Boeckx, Elena Herburger, and Roger Martin.
3 MULTIPLE SPELL-OUT †
The contents of this chapter have been presented in several lectures, at the Universities of Connecticut, Delaware, Maryland, Pennsylvania, Stuttgart, Porto Alegre, Potsdam,
and Rio de Janeiro, the City University of New York, the National University of Comahue, Oxford and Yale Universities, the Max Planck (Berlin), San Raffaelle (Milan), and Ortega y Gasset (Madrid) Institutes, and the School of African Studies in London. Warm thanks to the generous hosts of these institutions, as well as the various audiences, for comments, questions and criticisms. I am indebted to Juan Carlos Castillo, Stephen Crain, John Drury, Jim Higginbotham, Howard Lasnik, Roger Martin, Javier Ormazabal, and especially Norbert Hornstein and Jairo Nunes for very useful commentary, and both Maggie Browning (initially) and Norbert Hornstein (eventually) for their interest, as editors, in these ideas. This research was partially funded by NSF grant SBR 9601559.
1 For instance, in Bresnan (1971), Jackendoff (1972) or Lasnik (1972). Tree-adjoining grammars explored in, for example, Kroch (1989) also have the desired feature.
2 The reasons why compounds and spelled-out phrase markers are “frozen” are completely different (a real compound does not collapse), but the formal effect is the same.
3 This would be very much in the spirit of Hoffman’s (1996) idea that syntactic unification is not given by the derivation itself.
4 I assume the standard definition of a sequence ⟨a, b⟩ as a set {{a}, {a, b}} (see, for instance, Quine 1970: 65). Jim Higginbotham (personal communication) observes that the notation {a, {a, b}} would also have the desired effects, although touching on a deep issue concerning whether one assumes the Foundation Axiom (or whether the individual “a” is allowed to be identified with the set {a}). For the most part, I would like to put these issues aside, although I cannot fail to mention two things. One, if one assumes Quine’s notation, as we will see shortly, syntactic terminals will ultimately turn out to be defined as objects of the form {terminal} rather than objects of the form terminal. Two, this might not be a bad result, given that in general we want to distinguish labels from terms, which could be done by way of the definition of term in (6), stating that labels are members of (set) phrase markers that are not terms. Then the problem is terminal items, which clearly are terms but need to be labeled as well. One possibility is to consider a given terminal term as labeled only after it has been linearized, hence having been turned by the system into a {terminal} (the whole object is a term; thus, terminal is its label).
5 Note that the most natural interpretation of the radical version of MSO ships noncomplements to performance prior to the rest of the structure, thus proceeds top-down. This matter becomes even more significant below, when we discuss antecedence.
6 If specifiers are adjuncts, one can then attribute their being linearized prior to corresponding heads to the (poorly understood) concept of adjunction.
7 If command is not defined for an intermediate projection, this category will never command (hence precede) its specifier. The converse is true by fiat, given that a specifier is by definition a maximal projection. At the same time, intermediate projections must be relevant in computing command; if they were not, a head and its specifier would command, hence precede, each other.
8 The most difficult case does not arise when a specifier projects (the system prevents this on grounds of chain uniformity and similar considerations pertaining to checking domains – although see Chapter 6).
Rather, it arises when the system sees an intermediate projection as a branch to Spell-out and later, after spelling it out, continues projecting it by merging it with a specifier. That should be perfectly fine, and it leads to an object that is linearized “backward,” with the specifier coming last. 9 See Kitahara (1993) and (1994) for the source of these ideas. 10 The presentation that follows owes much to useful discussions with Jairo Nunes and to Nunes 1995. 11 Ormazabal, Uriagereka and Uribe-Etxebarria (1994) and (independently) Takahashi
(1994) do make proposals about the matter, which prevent extractions from inside subjects in terms of the Uniformity Condition. However, unlike the present proposal, neither of these naturally extends to extractions from inside adjuncts (assuming adjuncts are noncomplements). This view is generally contrary to the spirit of Larson (1988) – in particular, the idea that direct objects are structurally high in the phrase marker. 12 Note that the facts are no different if the subject is a pronoun. That is, (i) allows no more of a focus projection than (15b). (i) HE painted those frescoes This must mean that, despite appearances, pronouns (at least focused ones) are complex enough, in phrasal terms, to trigger a separate Spell-out. 13 Certainly, instances of a phrase simultaneously agreeing with two heads are not attested (setting aside chains, which are not phrases). The intuition is that multiple agreement as in (19) creates a “Necker cube” effect, which the Agreement Criterion explicitly prevents (see Section 6). 14 Interestingly, father does not turn out to be a term. In fact, no “last” element in a right branch ever turns out to be a term after a phrase marker is linearized. (Of course, prior to that, these elements are normal terms.) Technically, this entails that such elements cannot have an antecedent, which if pushed to a logical extreme might well mean that they cannot have reference. This would lend itself nicely to the idea, expressed in Chapter 15, that the “last” element in a right branch is always the predicate of a small clause; and it bears on the analysis of examples like (i). (i) every politician thinks that [some picture of him] should be destroyed Castillo (1998) argues that the correct structure for picture of him involves, in the lexical base, the small clause [him [picture]]; if that analysis is correct, him is actually not a complement, but a subject of sorts (of which picture is “integrally” predicated, in the sense developed in Chapter 9). If so, him turns out to be a term and can be bound by every. 15 At least, it is not obvious that an antecedent buried inside a “left branch” can hook up with a variable in a different command path. There are well-known (apparent?) exceptions, such as (i), or similar instances involving “inverse linking” or what look like bound anaphors in East Asian languages. (i) ?everyone’s mother likes him To the extent that these examples are acceptable, they may well involve a process akin to, but formally unlike, variable binding. If pronouns like him in (i) can be analyzed as incomplete definite descriptions, then (i) may have the meaning of something like (ii): (ii) everyone’s mother likes “the one that is relevant” By cooperatively confining the range of the context variable of him, we may end up with a semantics that is truth-conditionally equivalent to that implicit in (i). (See Chapter 8 for a similar treatment of certain anaphors, which may extend to the East Asian instances.) Then the question is what conditions govern context confinement, something that need not be sensitive to the strict command restrictions that are presently being explored for the syntax (see Chapter 11). 16 I mention this to address a reasonable objection that Jim Higginbotham raises in personal communication: semantically, it makes sense to say that “an anaphor seeks an antecedent”; but what does it mean to say that “an antecedent seeks an anaphor”? 
The issue is turned on its head immediately below, where I show how the radical version of the MSO proposal can deal with general issues of antecedence. 17 Evidently, I am speaking of bound-variable pronouns, not of anaphors subject to
local principles – which presumably involve some sort of movement to the antecedent. Those, of course, are impossible unless antecedent and anaphor share the same CU, as expected given the MSO architecture. 18 The Agreement Criterion does not preclude an antecedent from binding two different pronouns, since the definition of antecedence requires only that the would-be antecedent agree with the phrase containing the bindee(s). 19 Why is it the lower and not the upper copy that deletes (although to solve the linearization riddle either one would do)? Here, Nunes relies on the mechanics of feature checking. Basically, feature checking takes place in the checking domain of the attracting phrase marker, and thus it is the copy in that domain that remains within the system. For present purposes, this is immaterial. 20 In the past few years, this sort of idea has been revamped by various researchers (see, e.g. Pica and Snyder 1995). 4 CYCLICITY AND EXTRACTION DOMAINS †
We are grateful to Norbert Hornstein, Marcelo Ferreira, Max Guimarães, Sam Epstein, and an anonymous reviewer for comments and suggestions on an earlier version of this chapter. Jairo Nunes is grateful for the support CNPq (grant 300897/96-0) and FAPESP (grants 97/9180-7 and 98/05558-8) have provided for this research, and the same applies to Juan Uriagereka, who acknowledges NSF grant SBR 9601559.
1 For purposes of presentation, we ignore cases where two heads are in mutual c-command. For discussion, see Chomsky (1995b: 337).
2 In Chomsky (1995b: Chapter 4), the term LCA is used to refer both to the Linear Correspondence Axiom and the mapping operation that makes representations satisfy this axiom, as becomes clear when it is suggested that the LCA may delete traces (see Chomsky 1995b: 337). We will avoid this ambiguity and use the term Linearize for the operation.
3 See Chapter 3 for a discussion of how agreement relations could also be used as addresses for spelled-out structures.
4 Following Chapter 3, we assume that spelled-out structures do not project. Hence, if the computational system applies Spell-out to K instead of L in (9), the subsequent merger of L and the spelled-out K does not yield a configuration for the appropriate thematic relation to be established, violating the θ-Criterion. Similar considerations apply, mutatis mutandis, to spelling out the target of adjunction instead of the adjunct in example (14).
5 That is, regardless of whether adjuncts are linearized by the procedure that linearizes specifiers and complements or by a different procedure (see Kayne 1994 and Chomsky 1995b for different views), the important point to have in mind is that, if the formulation of the LCA is to be as simple as (7), the lexical items within L in (15) cannot be directly linearized with respect to the lexical items contained in the lower vP segment.
6 In principle, the rule of Spell-out can be interpreted as immediately sending spelled-out material for pronunciation, or rather as freezing relevant material as PF-bound, but with actual pronunciation taking place later on in the phonological component. For present purposes, the second interpretation is the assumed one.
7 The approach outlined above is incompatible with a Larsonian analysis of double object constructions (see Larson 1988), if extraction from within a direct object in a ditransitive construction is to be allowed.
8 The computation of nondistinct copies as the same for purposes of linearization may be taken to follow from Uriagereka’s (1998) First Conservation Law, according to which items in the numeration input must be preserved in the interpretive outputs.
9 Notice that the structure in (24b) could also be linearized if the head of the chain were
deleted. Nunes (1995, 1999) argues that the choice of the links to be deleted is actually determined by optimality considerations. Roughly speaking, the head of a chain in general becomes the optimal link with respect to phonetic realization as it participates in more checking relations. For the sake of presentation, we will assume that deletion always targets traces.
10 The sequence of derivational steps in (25) has also been called inter-arboreal operation by Bobaljik and Brown (1997) and paracyclic movement by Uriagereka (1998).
11 Recall that the label of a spelled-out object encodes the information that is relevant to the computational system; that includes the information that is required for a thematic relation to be established between file and {which, {which, paper}} in (30b).
12 See Brody (1995) for a discussion of this kind of “forking” chains from a representational point of view.
13 See the technical discussion about the structure of linearized objects in Chapter 3, where it is shown that constituents of linearized objects such as copy3 in (33) come out as terms in the sense of Chomsky (1995b: Chapter 4).
14 As for the computation of the wh-copies inside the adjunct in (33) with respect to the whole structure in the interpretive component, there are two plausible scenarios to consider. In the first one, the interpretive component holds the spelled-out structures in a buffer and only computes chain relations after the whole structure is spelled out and the previously spelled-out structures are plugged in where they belong; in this case, identification of chains in terms of c-command is straightforward, because the structural relations have not changed. In the second scenario, the interpretive component operates with each object it receives, one at a time, and chain relations must then be determined in a paratactic-like fashion through the notion of antecedence. The reader is referred to Chapter 3 for general discussion of these possibilities (see also Chapter 6).
15 See Hornstein (2001) for a similar analysis.
16 This is arguably what excludes the parasitic gap construction in (i), since sideward movement of who places it in two thematic configurations within the same derivational workspace.
(i) *whoi did you give pictures of ei to ei
17 This raises the very serious question of whether deletion ought to be cyclic, and if it is not, what that means. In Martin and Uriagereka (forthcoming) a different approach to these matters is attempted in terms of chains “collapsing” at different occurrences, without invoking deletion at all. 18 Needless to say, as stated this implies a representational view of chains, which must satisfy c-command conditions regardless of how they were generated. 19 It is not our intention here to present an analysis for all the different aspects involved in parasitic gap constructions. The aim of the discussion of the so-called S-Structure licensing condition on parasitic gaps was simply to illustrate how sideward movement is constrained. See Nunes (1995, 1998), Hornstein (2001) and Hornstein and Nunes (1999) for deductions of other properties of parasitic gap constructions under a sideward movement approach. 20 Following Chomsky (2000), we are assuming, largely for concreteness, that the maximal projection determined by a subarray is either vP or CP (a phase in Chomsky’s (2000) terms). In convergent derivations, prepositions that select clausal complements must then belong to the “subordinating” array, and not to an array associated with the complement clause (otherwise, we would have a PP phase). Hence, the prepositions after and without in (59) and before in (61) belong to subarrays determined by a light verb, and not by a complementizer. 21 For further evidence that sideward movement must proceed in this strongly derivational fashion, see Hornstein (2001) and Hornstein and Nunes (1999).
5 MINIMAL RESTRICTIONS ON BASQUE MOVEMENTS †
For various reasons (including my own procrastination) this chapter has lived several lives. Its rudiments go back to early work with Itziar Laka, presented at ESCOL (Pittsburgh) and NELS (Cambridge) a decade ago. Later elaborations were presented at the Universities of Comahue, Connecticut, the Basque Country, Maryland, Pennsylvania, Porto Alegre, Santa Catarina, and the GLOW Summer Courses at Girona. Lakarra and Ortiz de Urbina (1992) includes a paper where most of this early work is summarized, and the barriers solution discussed in the third section of the present version is suggested. This version has been entirely rethought in minimalist terms – where I believe the most natural solution emerges – and this has taken me quite some time because the Minimalist Program is still rather open ended. I thank all the audiences who attended the relevant lectures and the editors of corresponding publications for various comments and questions. Likewise, I appreciate careful and useful NLLT reviews, as well as extensive editorial assistance from Alec Marantz. In addition, I am grateful for concrete comments (concerning the various versions) from Xabier Artiagoitia, Andolin Eguskitza, Rikardo Etxepare, Elena Herburger, Norbert Hornstein, Istvan Kenesei, Joseba Lakarra, Howard Lasnik, David Lightfoot, Jon Ortiz de Urbina, Beñat Oyarçabal, Georges Rebuschi, Pello Salaburu, Ibon Sarasola, Esther Torrego, Amy Weinberg, and especially Jairo Nunes, Javier Ormazabal, Myriam Uribe-Etxebarria, and also Mark Arnold and Ellen Thompson (who edited the piece). Itziar Laka deserves to be mentioned essentially as a co-author, although I do not hold her or anyone else responsible for my claims, particularly the minimalist extensions. 1 The classical introduction to Basque from the principles and parameters point of view, which contains also many relevant references within and outside generative grammar, is Ortiz de Urbina (1989). 2 The agreement marker in (2c) is not introducing a theme, in spite of appearances. The word lan “work” there is not associated to a determiner, hence is arguably not a real argument, as it is in (i), a true transitive sentence: (i) Aizkolariak lana egin du lumber jack-the-E work-the/a-A make 3-have-3 “The lumber jack has done the/a work.” 3 A reviewer points out that scrambling could also be at issue in the examples in (3) or (5) below. This is true, but I refrain from that possibility because I know of no systematic study of scrambling in Basque. In any case, all of the facts I discuss here arise regardless of whether the phrases involved may not have plausibly scrambled (e.g. because of their indefinite character). 4 In (7) I am not being exhaustive, but the point is Jonek and Mireni can appear anywhere except between verb and wh-phrase. 5 This is something that Ortiz de Urbina acknowledges. Note that the embedded auxiliary in (8) associates to the element (e)la “that.” In (9) below we see the embedded auxiliary associated to (e)n, a wh-complementizer that I gloss as “if.” 6 The complementizer nola should not be confused with the equally pronounced question word nola “how.” The same is true in English: (i) The professor explained how the earth moves around the sun. (i) is of course ambiguous, making reference to either the professor’s explanation about the mechanics of the earth moving, or to the mere fact of the explanation, whose truth is assumed. 
7 (10) directly contradicts any attempt at analyzing the phenomenon under scrutiny as in situ wh-questioning: the question word is displaced from its base position. Long-distance wh-movement is shown below to behave as expected.
8 Matters get more complex if we assume Kayne’s (1994) analysis, which involves overt
movement of IP to the Spec of (a universally first) C in languages where C appears last. This predicts no long-distance extraction in Basque (contrary to fact), unless multiple specifiers are assumed, as in Chomsky (1995b: Chapter 4). If so though, it is unclear why multiple specifiers do not salvage (10b).
9 Laka’s “sigma” is, by hypothesis, a left-periphery head. This raises the same questions that Ortiz de Urbina’s leftward C does, and suggests that Kayne’s universal head-first analysis is on track. Nothing that I say bears on Basque complementizers being last, and if I raised the issue before it was only as a possible question for the V2 analysis. The reason I will not pursue the Kayne line here is that it is extremely cumbersome to present in full generality; but we have in fact sketched the details for that sort of analysis in Ormazabal, Uriagereka and Uribe-Etxebarria (1994).
10 I thank a reviewer for observing that a proposal along these lines is Jelinek (1984).
11 Chomsky (1986a: 39) suggests this view, too.
12 It will not help to claim, à la Rizzi (1990), that why and similar adjuncts do not leave traces and hence only modify what they directly associate to. Under other circumstances (with subjects out of the way), IP adjuncts modify long distance. Consider also (i):
(i) Nork mahaia bedeinkatuko du? who-E table-the/a-A bless-fut 3-have-3 “Who will bless the table?”
Sentences of the sort in (i) were raised in L&U as a further problem for Ortiz de Urbina’s analysis. Sarasola (personal communication) has provided several examples, from written classical texts, of the format wh S O V; apparently, it is harder to find exceptions of the sort wh O S V. Modern speakers have varying judgments with respect to (i), Northeastern speakers allowing it more readily than others for whom the construction is clearly stigmatized in normative grammars. It is reasonable to suppose that (contrary to both L&U and Ortiz de Urbina 1989) (i) involves (vacuous) LF wh-movement, perhaps only if the direct object is unspecific, as a reviewer suggests. Vacuous movement may be happening, also, in multiple questions, which would otherwise create a serious paradox for anyone’s analysis (only one wh-phrase can be left adjacent to V):
(ii) Ez dakit nork zer ikusi duen. not know-1 who-E what-A see 3-have-3-if “I do not know who has seen what.”
13 See Chomsky (1995b) on whether this should be an Agr head. 14 See also Raposo (1988), Ambar (1992), among others. As for why IP is not a barrier in English, see Sections 4.2 and 4.3. 15 Although this analysis was possible in the L&U framework, it was not pursued, mistakenly assuming that the trace of a wh-phrase should not count as a valid specifier of a given category. 16 The interested reader can find an introduction to details and conceptual foundations of this system in Uriagereka (1998). 17 The system also allows for convergent, optimal derivations which have no semantic interpretation. This is irrelevant now. 18 The idea of Multiple Spell-Out creating a “giant compound” is essential not just to the present analysis, but to everything said in Chapter 3, where the system is introduced. The result of (Multiple) Spell-Out is independent of what in the grammar forces the application of this rule – in this instance the repair strategy in (29). A reviewer suggests that the reason (29) should force early Spell-out is that the specifier and the head must undergo Halle-Marantz type fusion under adjacency conditions; this seems to me very plausible.
19 Multiple Spell-Out of exhaustively merged sub-units of structure derives a number of notions and conditions, as discussed in Chapter 3. 20 This line has been pursued by Chomsky in his 1997 Fall lectures and in Chomsky (2000), and is consistent with everything I say, so long as we keep the morphological repair for specifiers of “heavy” heads. 21 Although adapting it to present minimalist concerns, I am now pursuing a modified version of an interesting idea suggested by a reviewer: “Why not say that pro does not need Case until LF, and so stays within VP until after all overt movement?” A pro moved at LF is nothing but a feature. I would like to reserve that specific sort of pro for Asian languages which do not exhibit strong agreement, which are discussed in the next section. 22 This is sometimes referred to as “Taraldsen’s generalization.” Taraldsen (1992) essentially pursued the sort of analysis argued for here. Note also that a feature pro must be a neutralized head/maximal projection, in Chomsky’s (1995b) “bare” phrase structure sense. That is, pro is a sort of clitic which is enough of a projection to constitute a real argument, and enough of a head to move in order to check morphology. 23 Chapter 3 shows that for this particular case of Kayne’s LCA we do not need a separate axiom; under reasonable assumptions, the result follows from economy considerations. 24 The creation of these partial objects of well-formedness suggests that the notion “level of representation of LF/PF” is playing no role, since all convergence issues are decided in terms of local sub-phrase-marker, very much in the spirit of the sub-trees of Tree Adjoining grammars explored by Aravind Joshi and Tony Kroch, and their associates (see Chapter 7). We still crucially need the notion “component of LF/PF representation,” which is what virtual conceptual necessity ensures anyway in a system that relates “sound” and “meaning.” 25 The notion proposed here is essentially identical to Chomsky’s (2000) “phase,” although it remains to be seen precisely what constitutes a phase. 26 That is under the assumption that features (here, an instruction to pronounce or not to pronounce, depending on what is taken to be basic) can be added in the course of the derivation in given structural contexts. 27 Chung suggests that VSO should not be achieved through head movement. The alternative she presents (her (11)) is incompatible with the minimalist system as presently being explored. I will abstract away from that possibility. 28 The fact that a complementizer incorporates to the matrix verb, obviously leaving the moved verb behind, suggests that the verb itself moves no higher than Laka’s (1990) “sigma” position. Chung considers and rejects this general sort of analysis for three main reasons. First, she adduces technical complications which are now solved. Second, she has interpretive reasons to proceed the way she does; the reasons are well taken, but can be kept as a consequence, not the cause of the syntactic phenomena. Third (her most important reason), she reasonably wonders why the trace of whphrases, and not the head of the wh-chain, triggers agreement; that, in present terms, is tantamount to asking why C incorporates across a wh-trace, but not a wh-phrase, which must be because if C does not incorporate, long distance wh-movement is impossible in this language. I will not address this intriguing matter here, though see the end of Section 7.2. 
29 As a reviewer points out, subject extraction from the post-verbal position is correlated, in some Romance variants, with absence of overt agreement. This can be interpreted in various ways, among them assuming that, in those instances, the subject is pleonastic. Whether this case can be generalized to instances where overt agreement shows up is hard to know. 30 See Kiss (1995) for a review of proposals and various references. 31 Kiss notes that the facts are slightly more complex. The agreement is generally optional, although it becomes obligatory when the moved phrase is in the accusative
Case (as in (54b)). This sort of Case can be acquired in Hungarian in the course of the derivation (see Kiss 1987: 140 and ff.). 32 This is somewhat consistent with well-known obviation effects that subjunctive clauses induce, particularly if they are analyzed as in Kempchinsky (1986). 6 LABELS AND PROJECTIONS †
We thank participants in our seminars for questions and comments, as well as an audience at USC, especially Joseph Aoun and Barry Schein. Thanks also to Elena Herburger, Paul Pietroski and Anna Szabolcsi. This work was funded by NSF Grant BCS-9817569. 7 A NOTE ON SUCCESSIVE CYCLICITY
†
We are grateful to audiences at the University of Maryland and the University of Iowa. We are especially indebted to Alice Davison, Norbert Hornstein and Paula Kempchinsky, as well as an anonymous reviewer.
1 A reviewer points out that, according to Chomsky, phases have the effect of limiting the search space and thus reducing the complexity of the computation. True as this may be, it still does not explain why the edges of these syntactic objects should be accessible from the outside, or why phases should be impenetrable, for that matter.
2 A reviewer points out that the TAG analysis also violates the constituency of the sentence, because the subtree is not a constituent. This is not strictly true in the TAG formulation, given that the foot of the tree contains an empty label that indicates the kind of constituent that has to be added at that point, represented in (2b) as IP. This is problematic nonetheless in a BPS framework, given that this category is not derived through the projection of a lexical item.
3 This is in the spirit of proposals in Chapter 8 about different Cases within a phase as diacritics on otherwise identical D elements, and Castillo’s (1999) account of weak pronouns in Old Spanish, where multiple specifiers are only distinguished through the A/A′ difference.
8 FORMAL AND SUBSTANTIVE ELEGANCE IN THE MINIMALIST PROGRAM
†
This is a version of a talk delivered at The Role of Economy Principles in Linguistic Theory, Max Planck Institute, Berlin. I wish to thank the organizers of the conference for their invitation and their useful editorial comments, and the audience for their very helpful comments. I also thank my students and colleagues at College Park for their cooperation when sorting out some of these ideas in my Spring seminar on minimalism. I am indebted to Elena Herburger, Norbert Hornstein, David Lightfoot, and Jairo Nunes for their comments on a draft. Usual disclaimers apply. This research was partly financed by a Summer Research grant from UMD at College Park. 1 Exaptation Evolutionary theory lacks a term for a crucial concept – a feature, now useful to an organism, that did not arise as an adaptation for its present role, but was subsequently coopted for its current function. I call such features “exaptations” and show that they are neither rare nor arcane, but dominant features of evolution [serving] as a centerpiece for grasping the origin and meaning of brain size in human evolution. Gould (1991: abstract) 2 This is not to say, of course, that procedures to integrate, for instance, go through each possible variation. That is a matter of implementation.
3 For instance (i), which is ungrammatical as part of (ii) (because (iii) is a better solution), but grammatical as part of (iv) (because there is no better, convergent alternative). See Chomsky (1995b: Chapter 4).
(i) [a man to be t here]
(ii) *[there was believed [a man to be t here]]
(iii) [there was believed [t to be a man here]]
(iv) [I believe [a man to be t here]]
4 In standard problems in dynamics, we can define a quantity, called a Lagrangian L, which ranges over velocity and position, and equals the kinetic energy minus the potential energy. This quantity can be used to rewrite Newton’s law, by way of the Euler-Lagrange equation to describe a particle’s motion in one dimension (an idealized version of the problem, which involves more than one particle; see Stevens (1995: 27–39 and 59–68) on these matters):

$$\frac{\partial L(x,\dot{x})}{\partial x} - \frac{d}{dt}\,\frac{\partial L(x,\dot{x})}{\partial \dot{x}} = 0$$

The particle path that satisfies the Euler-Lagrange equation makes the function A (the action) a minimum:

$$A[x(t)] = \int_{t_i}^{t_f} L(x,\dot{x})\,dt$$
5 This corresponds to the procedure of “adiabatic elimination” of fast relaxing variables. The procedure is used, for instance, in reducing degrees of freedom within a probabilistic equation (see Meinzer 1994: 66 and ff.).
6 Uriagereka (1998: Chapter 6) offers a speculation as to what it means for a feature to be “viral” (as assumed in Chomsky 1995b).
7 This is not meant metaphorically. The Slaving Principle is surely a clearer determinant factor in the behavior of turbulence than in linguistic examples.
8 Why the growth of the snail shell proceeds the way it does is a complex matter involving genetic information and epigenetic processes of various sorts. On an early, extremely insightful view on this matter, see Thompson (1945: Chapter VI).
9 This version of the Linear Correspondence Axiom is discussed in Uriagereka (1998: Chapter 3), and is adapted to fit a bare-phrase structure theory.
10 Domination can be defined in terms of set-inclusion of constituent elements within terms, as in Nunes and Thompson (1998).
11 (3b) cannot be monotonically assembled into a unitary phrase-marker, given a “bare” X′-theory; instead, the system allows the merger of structures which have been previously assembled, by way of generalized transformations.
12 A command unit roughly corresponds to one of Kayne’s (1984) “unambiguous paths,” appropriately adapted to the present system. As Dave Peugh and Mike Dillinger independently point out, command units can be generated in terms of Markovian systems, and are thus iterative (not recursive) structures. In the present system, recursion is obtained through a generalized transformation.
13 In Kayne’s terms, the correspondence is of the following sort (for A an abstract root node; b, c, d, … a sequence of ordered terminals; and t1, t2, t3, t4, t5, … a sequence of time slots in the A/P components):
(i) A → t1, A b → t2, A b c → t3, A b c d → t4, A b c d … → t5, …
What we must determine is why this correspondence obtains, as opposed to other possible mappings (an issue first raised by Samuel D. Epstein, as far as I know).
14 Otherwise, one would have to define a new structural relation, deduce it and show this to be a better alternative to command; I do not see what that could be.
15 Generally: y = f(x) (for f a variety of procedures). For instance: y1 is mapped to the x value three-times removed from 0; y2 is mapped to the x value prior to x1; y3 is mapped to the x value three-times removed from x2; and so on. (i)
[Diagram (i): the y values 1–7 plotted against a sequence of x slots, illustrating the mapping just described]
(i) converges (the hierarchical ordering is appropriately mapped to some sequence of PF slots); but is not a simpler realization of y = f(x) than y = x.
16 Whether or not these mechanics in terms of substitution are necessary depends on the particular version of the Multiple Spell-Out system that one assumes (see Chapter 3).
17 More generally, the proposal has empirical consequence whenever we find structures that do not depend on merger (such as discourse representations or paratactic dependencies involving adjuncts). See Hoffman (1996) on this.
18 The LF component being internal, here we can be bold and talk not just of structural properties, but in fact of universal structural properties.
19 The flattened word-like object that results from L has to be understood as a word-level unit. Following a suggestion by Jairo Nunes, in Chapter 3 I deduce from this the impossibility of relating subject/adjunct internal structure to the rest of the phrase-marker (Huang’s (1982) CED effects).
20 If a proposal first noted in Chomsky (1964), and attributed to Klima, is on the right track, perhaps a discourse representation line can be pursued for these examples. Klima’s suggestion was that wh-expressions hide an indefinite predicate of existence; who in (8b) is akin to which x and exists x. This essentially indefinite predicate of existence should be sensitive to Wasow’s (1972) Novelty Condition, with the pronoun his introducing a more familiar expression inducing an odd interpretation. The same can be said about everyone in (8a), if this element too contains a hidden indefinite one, as its morphology indicates (see Chapter 3). I do not make any commitments, however, as to whether this is the case.
21 Just as the Last Resort Condition reduces computational complexity in derivations, so too does the Minimal Link Condition, if derivations that violate it are canceled. I do not see how this condition might follow from something like the Slaving Principle, but it might conceivably relate to other conditions on systems imposing locality and predicting field and “domino” effects.
22 In this guise: a head α moves to a head β only if α and β are L-related.
23 We are talking about structural properties which are conserved across derivational processes.
24 It should be remembered, though, that quantity conservation laws in physics have helped in the understanding of particle families, and predicted new particles that were later on discovered.
25 These are argument clitics. In contrast, sentences such as (i) are possible: (i) te me vas a resfriar you me go to get.a.cold “You’re going to get a cold on me.”
However, me is not an argument of resfriar “get a cold,” but is something more akin to an argument of the assertive predicate introducing the speaker’s perspective. An analysis of this and related cases would take me too far afield.
26 This is welcome. The nominal carrying an uninterpretable feature must then raise to check the number feature in the determiner (see Longobardi 1994). Had the feature in the nominal been interpretable, this raising would be unmotivated. We would then have to say that in Spanish the determiner has a strong feature for the nominal to check, unlike in English – which would lead to two rather different LFs for each language. Interestingly, Jairo Nunes points out that several variants of Brazilian Portuguese overtly encode number only in determiners.
27 These ideas also relate to Chomsky’s (2000) notions of “probe” and “goal.”
28 Nunes & Thompson (1998) modify the notion “dominates” so as to have it hold of features.
29 In class lectures (Fall, 1995), Chomsky abandoned the concept of “checking domain” altogether in favor of a theory of sub-labels. This is in the spirit of everything I have to say here, where it is crucially features, and not categories, that matter for various syntactic purposes. See also Nunes (1995) for much related discussion.
30 See Halle and Marantz (1993: 129 and ff.) for a recent treatment of why forms like *oxens are not attested. This case is slightly different from the one I am discussing, *lionses, in that the latter involves two obviously identical plurals. It may be significant that infants do produce *oxens but never *lionses.
31 This idea was suggested in Uriagereka (1988a: 54). That is, while the grammar codes the presence of a context variable (in essence, [s] is a contextual feature), it does not assign a value to it (I or II), any more than it assigns values to other context variables.
32 These were noted in Uriagereka (1995b): Section 4. As Viola Miglio points out, (25d) contrasts with the perfect Italian (i):
(i) Qui glielo si invia. “Here one sends it to them.”
Thus, (25d) cannot be out for semantic reasons. In contrast, (ii) (provided by Jairo Nunes to illustrate a phenomenon which Eduardo Raposo also notes) indicates that the impossibility is not merely phonological either. Thus, while sentences involving the se, se sequence are impossible in Portuguese, (ii) is perfect:
(ii) Se se morrer de amor… “If one were to die of love…” if se(impersonal) would.die of love
Crucially, though, the first se here is not a pronoun, but a complementizer. 33 I am not implying with this that se has a [s] feature; see below. 34 I am abstracting away from the exact status of (29a). What follows is a version of an analysis I attempted in (1988a), but was not able to put together. It owes much to the seminar on binding and Case taught by Luigi Burzio at College Park. Although I do not follow, specifically, his approach to these matters, they have greatly influenced my way of looking at the problem. See Burzio (1996, 2000). 35 That is, the direct object features do not directly move to v, particularly because in many languages involving V movement, this would have to imply incorporation onto a trace, plausibly barred by Chomsky under the view that chains are integral objects whose parts cannot be transformationally targeted. 36 In the version of the theory being explored by Chomsky in class lectures (Fall 1995), only heads are targeted for featural movement, specifiers being involved as a
37 38
39 40
morphological side-effect. Then the checking domain reduces to the sub-labels of a head, the set of dependents of this head which associate via adjunction. Everything else said here remains unchanged, provided that the relevant domain of checking (technically, not a “checking domain”) is a set. Mathematically, this does not go into orders of complexity different from those involved in standard features. Matrices are needed either way. The intuition is to relate local obviation to switch reference phenomena, of the sort studied in Finer (1985). Matters have to be slightly more complicated than implied in the text, given the fact that the subject of an ECM verb’s complement (normally marked with the accusative value) has to be disjoint from the object of this verb (also marked accusative). There are different ways to address this puzzle, but I will put it to the side now. This parameter is otherwise extremely hard to motivate, if Case is an uninterpretable feature pertaining, ultimately, to the covert component. The Mojave example in (i) (attributed by Lasnik (1990) to Langdon and Muro (1979)) suggest that this view is correct, in light of what is said in Note 37: (i) ?inyec pap ?-∧kxi:e-m Judy-c salyi:-k I.sg potato I-peel-DR Judy-subj fry-Tense “After I peeled the potatoes, Judy fried them.”
41
42 43 44 45
46 47
48 49
As Lasnik observes: the whole switch reference system is still exhibited even with I and II pronouns. While this is surprising from a semantic point of view, it is natural from the purely formal perspective that I have just discussed. Regardless of this, the issue is moot if only D features make it to the same checking domain of t that the formal features of the pronoun do. This is particularly so if we assume that names, just as any other arguments, are headed by D, which is what gets to be in the checking domain of T. See Longobardi (1994), who builds on the essentials of Higginbotham (1988). The Danish data are courtesy of Sten Vikner, to whom I am indebted for an insightful discussion of these issues. See Vikner (1985) for a full presentation. These ideas go back to Burge (1973), who took name rigidity to follow from an implicit demonstrative. Higginbotham (1988) reviews and reworks Burge’s insight in terms that have inspired the proposal in the text. Why this should be so is, in and of itself, interesting, but I have nothing of any profundity to say about it, other than it fits well with the rest of the system. See Cole and Sung (1994) for this sort of analysis. Uriagereka (1988a: Chapter 4) presented an analysis along these lines as well, with empty operator movement to Infl Tense. This specific analysis is more in the spirit of what I have to say immediately below, since I do not think it is either sig or selv that moves. For some reason that I do not understand the adverbial mismo is obligatory. Templatic conditions of this sort are known to be relevant in various areas of morphology, across languages. Schematically, for Romance [s] tends to come before [s] (notorious reversals exist in Aragonese and Old Leonese). In turn, the unspecified [s] clitic (se) is a bit of a wild card. In Spanish, for instance, it comes first, before strong and weak clitics. In Italian, in contrast, it comes after weak clitics, but before locative clitics (Wanner 1987). In Friulian, it comes as a verbal prefix (see Kayne (1991: 664) for an analysis, and in archaic Italian, as a verbal suffix (Kayne 1991: 663). For discussion on how this affects the Case system see Raposo and Uriagereka (1996), where it is argued that structures involving se may involve Case reversal situations, as expected. None of the arguments that Chomsky gives for the Thematic Criterion carry through. For instance, it is said that without a Thematic criterion, (ia) should outrank (ib), by involving one transformation less:
49 None of the arguments that Chomsky gives for the Thematic Criterion carry through. For instance, it is said that without a Thematic Criterion, (ia) should outrank (ib), by involving one transformation less:
(i) a. [John t [v [used Bill]]]
    b. [John T [t v [used Bill]]]
However, at the point of moving John, the sentences involve two different partial numerations, and are thus not even comparable for optimality purposes. Chomsky also wants to prevent (ii) in thematic terms:
(ii) I believe [t to be a great man]
But as John Frampton (personal communication) points out, it is not obvious how a great man receives Case if the sort of believe that allows raising (selecting for the relevant sort of infinitival) is essentially unaccusative.
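The comparability point in note 49 can be made concrete. A minimal sketch, with invented names (this is not Chomsky's formalism): derivations enter the same reference set for economy comparison only if they are built from the same numeration, modeled here as a multiset of lexical items.

    # Illustrative sketch: numerations as multisets; economy comparisons
    # are restricted to derivations sharing one numeration.
    from collections import Counter

    def numeration(*items):
        return Counter(items)          # a multiset of lexical items

    def comparable(deriv_a, deriv_b):
        """Two derivations compete for optimality only if they stem
        from the same (partial) numeration."""
        return deriv_a["numeration"] == deriv_b["numeration"]

    d1 = {"numeration": numeration("John", "T", "v", "used", "Bill"), "steps": 3}
    d2 = {"numeration": numeration("John", "v", "used", "Bill"), "steps": 2}  # lacks T

    print(comparable(d1, d2))   # False: d2's shorter derivation cannot outrank d1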
50 These same mechanics can be extended to (expletive, argument) pairs, without needing to stipulate that the former are morphemically related to the latter. All that matters is that the associate's features end up in the same checking-domain-set as the expletive features, as argued in Chomsky (1995b).
51 In (47b), I am assuming a simplified version of Larson's (1988) analysis.
52 The idea here is that checking domains are set-theoretic notions superimposed on phrasal dependencies (see Uriagereka 1998: Chapter 5 for a detailed definition).
53 About this reading, I have nothing to add to what is said in Raposo and Uriagereka (1996). I should note, however, that our interpretation of indefinite se creates an apparent problem, since the interpretation of se in dative sites (47c) is not necessarily indefinite. I suspect this relates to another fact about dative clitics which is discussed in Uriagereka (1995b): their double can be definite or indefinite, something which is peculiar (the double of accusative clitics cannot be indefinite). Arguably, then, the interpretation of dative clitics is simply unspecified for definiteness.
54 Reinhart's proposal is in many respects rather different in spirit from Chomsky's. She believes that "interface economy . . . determines the shape of the numeration: . . . it is at this stage of choosing the 'stone blocks' that speakers pay attention to what it is they want to say" (Reinhart 1995: 49). In contrast, Chomsky asserts that "there is . . . no meaningful question as to why one numeration is formed rather than another . . . That would be like asking that a theory of some formal operation on integers – say, addition – explains why some integers are added together rather than others . . . Or that a theory of the mechanisms of vision or motor coordination explains why someone chooses to look at a sunset or reach for a banana. The problem of choice of action is real, and largely mysterious, but does not arise within the narrow study of mechanisms" (Chomsky 1995b: 237).
55 Actually, any sets would do the trick, although checking domains as set-theoretic objects are natural domains for everything I have said here to happen.

9 INTEGRALS
1 Observe that it would be consistent with this account if we added a layer to the underlying phrase structure.
(i) [Spec be [DP Spec D0 [DPposs [Spec Agr0 [SC John a sister]]]]]
This would yield an underlying small clause structure without any functional material. The derivation would then proceed with John moving to Spec Agr0 and then to DPposs; from there on, the derivation would be as in the text.
2 This is discussed in Section 3. The matter was already discussed in Keenan (1987) and De Jong (1987), and observed as early as in Benveniste (1966).
3 This does not mean to say that the movement of the [D/P] occurs so that a minimality violation can be avoided. Such altruistic movement is barred given the assumptions in Chomsky (1995a, 1995b). More likely, the movement occurs to license the [D/P]. This would make sense if in these sorts of cases the [D/P] is null and incorporation is required to license the expression. See den Dikken (1992) for suggestions along these lines for other null prepositions.
4 The details of this process are not fully understood. We do not know when deletion is required and when not. Furthermore, it appears that copying is understood as preserving reference rather than copying as such, given the presence of the pronoun. For relevant discussion see Fiengo and May (1994) and their discussion of "vehicle change." Alternatively, the pronoun would be the Spell-out of the relevant lexical material (see Uriagereka (1994) for discussion of this traditional idea).
5 This assumption is defended in Sportiche (1990). In other work, we suggest a different alternative, by semantically analyzing each structure as follows.
(i) a. ∃e [Infl (a-Ford-T-engine, e) & in-my-Saab (e)]
    b. ∃e [Infl (my Saab, a-Ford-T-engine, e) & in (e)]
(ia) is invoked in SI instances and is simpler than (ib), the structure associated to II instances. While the former is an unaccusative structure, the latter is transitive, representing a relation of in-ness (or of-ness, to-ness, and similar instantiations of relation R). Following Bresnan (1994), we assume that only locative unaccusative structures allow the processes of long predicate raising. (For discussion of predicate raising, see den Dikken 1992.) This is exemplified in (2b), where the predicate in my Saab raises to satisfy the EPP. In (13b) it is a Ford T engine that raises, which is also a predicate in our terms. Note, however, that the predicate in this instance comes from a transitive structure (ib), hence cannot be long-moved.
6 Alternatively, it follows trivially from the suggestion made in Note 5.
7 Note, for example, that Canada has nine provinces is false, and would be true if the expression could convey the thought that nine provinces are located in Canada (which is true). This problem does not appear with SI constructions. There are two doctors in NYC is obviously true (even if there surely are more than two doctors in NYC in toto).
8 This is particularly clear in Spanish, where plural associates do not trigger plural agreement in II-interpreted existentials.
(i) a. Había muchos wateres en el tercer piso.
       "There was (sg.) many toilets on the third floor."
    b. Había(n) muchos wateres en el tercer piso.
       "There was/were many toilets on the third floor."
(ia) means that the third floor had many toilets. (ib) means that many toilets were on the third floor. Note that it is the SI existential that can show agreement in Spanish, for reasons that we will not go into here.
9 This is the unstressed some. When stressed it can be used exclamatively.
(i) Wilbur is SOME pig!
10 A similar assumption is made in Reuland (1983), Loebner (1987), and Hornstein (1993), among others.
11 For sentences such as (i) Keenan proposes that there is a null predicate.
(i) There is [a God XP]
12 Keenan (1987) ties the DE in have-constructions together with those in there-existentials. He does this by licensing the subject in (i) by binding an open position that the relational noun inherently has.
(i) John has a brother (of his) in college.
In (i), John binds his and gets its θ-role accordingly. If so, (i) does not have a thematic subject and, if one assumes that have is semantically vacuous, the structure of the post-have material is (ii), similar to the post-copular material in there-clauses.
(ii) [SC [NP a brother of John's] [in college]]
This early analysis is reminiscent of much of what we have to say, with Keenan arguing that some of the sentences we take to be ungrammatical are simply not interpreted existentially. For instance, consider (iii).
(iii) Michael Corleone has Sonny's brother in NYC.
(iii) can mean that Michael has kidnapped Sonny's brother Dino in NYC, but not that Michael has a brother in NYC who also happens to be Sonny's brother. It is hard to see how a more traditional analysis would deal with these sorts of facts, but we do not think that Keenan's interesting proposal should be extended to all regular existential constructions with there.
13 The following also have a partitive feel to them.
(i) I have a brother (of mine) in college.
(ii) John has a picture (of his) in the exhibition.
These sentences invite the inference that I have other brothers and John other pictures. In other words, they have a partitive undertone. This contrasts with sentences such as (iii), in which the post-verbal NP fails to offer a similar invitation.
(iii) John saw a picture at the exhibition.
14 We do not wish to convey the idea, however, that the functional/physical distinction is an ontological one, one being more "material" than the other. Throughout, we are talking about cognitive spaces.
15 Similar relations, at an even more abstract level, need to be postulated for "inalienable" possessions. Of course, there is no obvious sense in which we are constituted of our relatives. This suggests that what is central in unifying the C and R relations has little to do with constitution proper. One possible approach is as follows (for examples of the sort in the text):
(i) [Extension (x,e) & Division (y,e) & in (e) & Saab (x) & Ford-T-engine (y)]
The semantics in (i) translates an II expression as a quantification over an event of in-ness (an integration) spatially extended in terms of a Saab, and expressing a spatial division of this extension in terms of a Ford-T-engine. Suppose the roles EXTENSION and DIVISION are primitive cognitive notions, which map onto traditional axiomatic operators on Boolean algebraic spaces (analogous to "+" and "×" for arithmetic). These operators express part-whole relations when operating on eventualities of in-ness, but can express other sorts of relations as well: (inalienable) possession when operating on eventualities of of-ness (the father of the bride); abstract constitutions for eventualities of to-ness (there's various sides to this issue); and perhaps others. The point is, whereas an extension at a material level is divided in terms of constituent parts, an extension at a more abstract level may be divided in more abstract terms. Crucially, what is extended in (i) is a relation of in-ness, through a Saab (it is not the Saab which is being extended). Along these lines, consider (ii):
(ii) [Extension (x,e) & Division (y,e) & of (e) & bride (x) & father (y)]
There is no reason why (ii) should say anything about a bride being constituted of a father. All that (ii) (partially) describes is an eventuality of of-ness (an abstract relation), extended through a bride and measured through a father. In that abstract space, the relation is one of bride-dom, and fathers can apparently serve as appropriate measures of such spaces.
16 It is possible that the clitic first adjoins to D/P and then the whole complex moves out of the small clause. This prior adjunction would evade minimality restrictions.
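As a purely illustrative gloss on note 15 (the algebra below and the identification of the arithmetic analogues are my own reconstruction, not the text's), the part-whole behavior of such operators can be seen in any Boolean algebra of sets, where join and meet play the roles addition and multiplication play in arithmetic, and parthood is definable from meet alone.

    # Toy Boolean algebra of sets: join/meet as the operators, with
    # part-whole defined from meet (x is part of y iff meet(x, y) == x).
    def join(x, y):          # analogue of "+"
        return x | y

    def meet(x, y):          # analogue of "×"
        return x & y

    def part_of(x, y):       # part-whole falls out of the algebra
        return meet(x, y) == x

    engine = frozenset({"piston", "valve"})
    saab = frozenset({"piston", "valve", "chassis", "wheel"})

    print(part_of(engine, saab))        # True: the engine divides the Saab-space
    print(join(engine, saab) == saab)   # True: the extension is not enlarged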
17 Thus, for instance, in (i) four stomachs are typically at issue:
(i) On leur a lavé les estomacs aux agneaux.
    they to-them have washed the stomachs to the lambs
    "We washed the lambs' stomachs."

10 FROM BEING TO HAVING
†This chapter would not have been possible without the many comments, debates, criticisms, and advice from the participants in my recent seminars at College Park and the Instituto Universitario Ortega y Gasset. I cannot credit everyone adequately for their valuable contribution. I cannot do justice, either, to the vast amount of literature that has emerged in recent years around the Kayne/Szabolcsi structure. Let me just say that the present chapter should be seen as a mere companion to those pieces. Finally, my gratitude goes to the generous hosts of LSRL 27 at Irvine (especially Armin Schwegler, Bernard Tranel, and Myriam Uribe-Etxebarria) and the audience at my lecture, whose suggestions have helped me focus my own thoughts. The present research was partially supported by NSF grant # SBR9601559.
1 This of course is not obvious, and would force us to treat this element essentially as a resumptive pronoun.

11 TWO TYPES OF SMALL CLAUSES
†Parts of this material were presented at the GLOW conference in Lund, and at colloquia at the University of Rochester and the CUNY Graduate Center, as well as a seminar at the University of Maryland. We appreciate comments from all these audiences, as well as from our students and colleagues. We also appreciate critical commentary from an anonymous reviewer and Anna Cardinaletti and Maria Teresa Guasti, the editors of Small Clauses.
1 Many of these examples are somewhat marginal in English, perhaps because Case realization in this language in the SC subject is not through a dative marker, as in Spanish.
2 Nominals seem like purely individual-level predicates, thus:
(i) ?*I saw him a man.
However, Schmitt (1993) notes that (ii) is fine in Portuguese, with the import of "he has turned into a man" or "he looks like a man."
(ii) Ele está um homem.
     he ESTÁ a man
This suggests that the impossibility of (i) in English is not deep, but perhaps again a result of Case theoretic matters (see Note 1). In turn, participial elements seem like purely stage-level predicates. So far as we know, (iii) is out in all Romance languages where the estar auxiliary is used:
(iii) *Juan es despedido.
      Juan ES fired
      (cf. "Juan está despedido.")
Of course, (iii) is fine with a passive interpretation, which might relate to why this sort of predicate cannot be coerced into an individual-level reading.
3 Kratzer (1988) claims that certain individual-level structures are more constrained for modification purposes than comparable stage-level structures are. Thus:
(i) a. Most people are scared in Sarajevo.
    b. Most people are black in Louisville.
(ia) can be true of most of the inhabitants in Sarajevo or of most of the people that happen to be there. In contrast, Kratzer takes an example like (ib) to be true only of the inhabitants of Louisville, not the people that happen to be there. However, Schmitt (1993) points out that there may be a pragmatic factor involved here. Thus, consider (ii):
(ii) Most children are intelligent in Central High School.
(ii) is ambiguous, apparently in the same way that (ia) is. It can mean that most children in that school are intelligent, or that when in that (mediocre) school any child actually stands out as intelligent.
4 A reviewer points out that, in instances of this sort, the subject may precede negation and some adverbs. If this is optional, the point still holds for the option where the subject does not precede negation or the adverbs. The reviewer also notes that PRO could be inside VP with the lexical subject occupying some intermediate projection in a more articulated clausal structure.
5 De Hoop works within a Montagovian system partly enriched in DRT terms (Heim 1982; Kamp 1984). We assume neither, and instead work our proposal out in a neo-Davidsonian system. One other semantic proposal that we will not go into here is Chierchia (1986), which analyzes the individual-level/stage-level distinction in terms of an implicit genericity operator for individual-level predications. This sort of approach may run into difficulties with (i), from Spanish:
(i) Bobby Fischer es genial, pero no estuvo genial en Yugoslavia.
    "Bobby Fischer is genial, but he wasn't genial in Yugoslavia."
This is a very typical instance where auxiliaries ser and estar can be used to distinguish the standing nature of a characteristic vis-à-vis its transient state. It is one of Fischer's standing characteristics that he has genius, but that does not mean that he cannot have a bad day. Conversely, consider (ii):
(ii) Soy triste de tanto estarlo.
     "I'm sad from being that so much."
This asserts that being in a general state of sadness makes the poet sad in a standing manner. If the latter (expressed through ser) presupposed the former (expressed through estar), then (ii) would be an uninformative tautology – which it is not.
6 There are a variety of topics that are entirely irrelevant for our purposes. We are just concerned with those which do not introduce emphasis, contrast, focus, etc., but are neutral starting points for a sentence.
7 A proposal of this sort was explicitly made in Chomsky (1977b), with a different machinery and assumptions. See also Lasnik and Saito (1992) for an alternative in terms of adjunction, and references. Uriagereka (forthcoming) argues for a principle along the lines of (i), responsible for obviation facts, referential clitic placement, and others:
(i) B is referentially presented from [or anchored to] the point of view of the referent of A iff A is a sub-label of H whose minimal domain M includes B.
From this perspective (discussed in the Appendix), what drives processes of fronting to the vicinity of a subject – which is to be responsible for a given judgment – is the need to place the raised material in the minimal domain (essentially, the binding domain) of the responsible subject. If (i) is correct as an LF principle, it may be immaterial whether the landing site of the fronting is a Spec (e.g. the Spec of F) or an adjunction site à la Lasnik and Saito.
However, it may be the case that (i) is an interface principle of the post-LF mappings, in which case it would indeed matter whether at LF a feature checking mechanism drives the relevant movements (which would argue for a separate category like F). We will proceed assuming F for concreteness and because of the issues to be discussed immediately below.
8 See below on this. To insist, this operation is not a result of QR or any semantically driven mechanism, pace Herburger (1993a), from which the intuition is taken. See also Guéron (1980).
9 A reviewer raises the question of whether the F position is inside the SC. The answer must be no, assuming the simplicity of these objects. The F position is needed solely for the pragmatic subject to land on at LF. In all instances, it is outside of the periphery of the clause, be it small or regular. However, see the Appendix for more on this.
10 We adapt this idea on nominative in Romance from Zwart (1989). Note that realizing a default Case does not mean that the Case is assigned by default. Assignment is in the usual way, but default realization emerges in peripheral sites.
11 Similarly, the realization of A-case may be morphological or in terms of government by a Case assigner. Redundantly (or alternatively), the entire structural process may be signaled through a given auxiliary.
12 A reviewer asks whether N's come with an event variable even when N is not a predicate. In this system, though, every N is a predicate at some level, even if not necessarily the main predicate. This is true even for names (see below).
13 There is a complication with this approach. Consider (i):
(i) A former world champion raped a beauty contestant.
The predicate here is thetic, which means the variable in former world champion must be bound by the event operator. Presumably this means that the rapist was a former world champion at the event of raping. However, this is not necessary. Thus, suppose that the rapist was world champion at the time of the event, although he is not now. The speaker may choose to refer to him as a former world champion, and rightly so, for he is not a champion any more. It is not entirely clear how former is interpreted outside of the event of raping if the event variable of the noun is bound by the event operator. If this is indeed a problem, it will cease to be an issue once we develop the contextual system we propose below.
14 A more standard form of (17b) exists with many instead of much, but the properties of this expression are significantly different. For instance, the latter binds a pronominal variable, but the former does not:
(i) En España hay mucho torero # que está desempleado.
    "In Spain there's much bullfighter # who is unemployed."
See also (28) in the text.
15 A reviewer is concerned with the meaning of Fischer in (i):
(i) Fischer is our best friend.
If Fischer is a predicate, what is our best friend? The latter is the main predicate of the assertion. But surely there are other predicates here. The difference between all of them is how they are bound. We take it that Fischer is bound by something like a rigidity operator internal to the projection of the subject, and hence its predicative status does not carry over to the main assertion.
16 A reviewer is concerned about the difference between the notions of context and event. Our system is essentially building on Schein's (1993) on this. We take context variables to be predicated of event variables – hence the two are of a different order. Note, incidentally, that we are not suggesting that we should get rid of event variables (this would not make any sense from our perspective). Rather, event variables are not the mechanism to deal with the issue of the transience of thetic predications, and for that we need context variables.
17 In fact, this is the essence of Herburger's insight, now reinterpreted in minimalist terms enriched with a realistic semantics.
18 "Xx" just means that X holds as a predicate of x.
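One hedged way to spell out the predication notation just defined, together with the free context variable invoked in note 19 below, is the following formula; it is my own illustration, not the chapter's official analysis.

    % Illustration only: a neo-Davidsonian rendering of "Every golfer hit
    % the ball," with C a free context variable that keeps the definite
    % description incomplete until a context fixes it per hitting event.
    \[
    \forall x\, [\,\mathrm{golfer}(x) \rightarrow
      \exists e\, [\,\mathrm{hit}(e) \wedge \mathrm{Agent}(e,x) \wedge
      \mathrm{Theme}(e,\, \iota y\,[\,\mathrm{ball}(y) \wedge C(y,e)\,])\,]\,]
    \]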
19 In fact, even within simplex sentences like the matrix one in (i):
(i) Every golfer hit the ball as if he/she was going to break it.
Incomplete definite descriptions such as the ball in (i) need a previous context for their uniqueness to hold. That is, (i) in its most salient reading means that every golfer hit the ball that he/she hit as if he/she was going to break it. The content of "that he/she hit" is expressed for us through a free context variable. The value of this variable must be set in terms of a context associated to each of the hitting events (see Uriagereka 1993).
20 The hypotheses make different predictions. A predicts that context is determined hierarchically, whereas B predicts that context is determined linearly. However, both approaches make their prediction with respect to highly elaborate LF or post-LF representations, and not overt structures. Hence, for our purposes now it is immaterial which of the hypotheses holds, since at LF we literally scope out the element which anchors subsequent contexts. This element is both hierarchically superior and linearly precedent vis-à-vis the element whose context it is intended to set.
21 For Szabolcsi or Kayne, sentences like John has a brother studying Physics and ?There is a brother of John's studying Physics have a similar source, roughly (i):
(i) [BE [John [a brother]]]
Each sentence is derived by way of either John or a brother raising for various reasons. Chapter 9 interprets the relation [John [a brother]] as a "possession" SC, and extends these possessive relations to a number of related instances. See also Keenan (1987) for similar ideas involving "integral" relations. The main point of Szabolcsi's analysis is to show the independence of "possessors" vis-à-vis "possessed."
22 Uriagereka (1993) also discusses how to ensure that in an expression like every one of the men is available, every can take as a restriction the sort of structure in (25), and still have the same truth values as every man is available, where the restriction is much simpler. The issue is of no relevance to us now. It remains a fact that this paraphrase holds, and it cannot be explained away in the usual semantic claim that reduces every one of the to a determiner (see Keenan 1987). Syntactically, this is unacceptable.
23 Evidently, the move also forces us to consider "variables" as objects of a predicate type, which poses questions about the relation between this predicate and the one denoting the set where the partition occurs. Though real, those very questions arise, independently, for partitive expressions in general. The only radical move being made at this point is to assimilate those partitive expressions to at least some quantificational ones, although not to all, as becomes apparent immediately.
24 Chapter 9 suggests extending an idea along the lines in (27) to expressions which do not come in a partitive format.
25 Intrinsically distributive quantifiers, such as cada "each" in (23), apparently license pro as in (27) even in the apparent absence of overt number marking. Importantly in this instance, though, relevant quantifiers do not tolerate plural expressions; thus *cadas los hombres "each-PL the men" contrasts sharply with todos los hombres "all-PL the men." This suggests that the distributive quantifier does have number specifications associated to it, but they must be singular.
26 It is worth pointing out that some concrete expressions with auxiliary estar have lexicalized into a frozen idiom which does not tolerate auxiliary alternations with ser. For instance, in the Castilian dialect: estar (*ser) loco/como una cabra "to be crazy/like a goat (nuts)," estar (*ser) buena/para parar un tren "to be good looking/ready to stop a train (gorgeous)." Interestingly, these readings invoke standing characteristics of individuals, regardless of the auxiliary. Predictably, in these circumstances examples of the sort in (28a) are fine, without any further contextual qualifications:
(i) a. Todo portero está loco/como una cabra.
       All goal-keeper is crazy/nuts
    b. Toda quinceañera está buena/para parar un tren.
       All teenager is good-looking/gorgeous

12 A NOTE ON RIGIDITY
†I wish to express my gratitude to Chris Wilder and Artemis Alexiadou for their interest in this piece, and for editorial assistance. I also thank Qi Ming Chen and Yi Ching Su for the Chinese data and valuable discussion, and Norbert Hornstein and Elena Herburger for commentary and (dis)agreement. Any errors are mine.
1 It is not at issue that we might have called Antony "Brutus" or any other name.
2 I'm not trying to imply that there is no known solution to this puzzle. "Counterpart" theory and complex enough versions of modal logic have the grounds to solve it, for instance.
3 One could complicate the picture in other directions – e.g. invoking characteristics of Greeks or Trojans as nations, or each of the individuals in the relevant sets. However, I am trying to idealize here precisely to understand what might be involved in the more elaborate modes in (1).
4 If this sounds too much like a description, substitute it for the name of your favorite group, for instance, Nirvana. Literally everything said applies to them vis-à-vis, say, the Bee Gees.
5 It may seem that by changing the parts of a whole one always changes the whole, but this depends on the nature of both whole and part. For example, a complex dynamic system like that involved in the crystallization or vaporization of water involves a whole whose parts are H2O molecules. You can change this or the other H2O molecule, and ice will still be ice and steam steam, under similar conditions of temperature/pressure/volume. Note, incidentally, that the issue has nothing to do with parts being essential or accidental. It is essential that ice is composed of H2O molecules, yet changing some of these molecules for other H2O molecules has no consequence.
6 Putting aside the "affective" reading that arises when saying "that (drunkard/bastard/glorious) Buster Keaton was a true genius!"
7 It can also affectively refer to (oh!) that happy Dalai Lama, a reading I am setting aside.
8 Burge could always try to stick to his guns by claiming that overt demonstratives do not behave like covert ones, but that seems unilluminating and unfalsifiable.
9 Bear in mind that manifolds can have different dimensionalities, and what is apparent connectivity at some level is not at a lower level. This may be going on in "archipelago" or "silverware," which are not contiguous in obvious dimensions, but are indeed temporally contiguous (which can be thought of in a fourth dimension). I do not know whether this extends to "country" or "furniture," or whether there we need contiguity at an even higher, perhaps modal, dimension.
10 I shall now ignore the element of, which need not be present in other realizations of the relevant expression, such as some Antony modes or Antony's modes. The fact that all these are possible constitutes evidence for the syntax implied here, but I shall not repeat the arguments I give in Chapter 15 for this.
11 That is, they are articulated manifolds built from progressively more complex spaces, all the way down to a more elementary space.
12 I read these last paragraphs again and they all sound like excuses. Therefore, I shall leave them. If I did not feel insecure about this, I would say that much of what I have done here is in the spirit of Larson and Segal (1995) at the intentional level and Jackendoff (1990) at the conceptual level. However, since I have pushed matters quite a bit in directions that not even they have pursued, I am ready to take the blame gallantly, all by myself.
13 PARATAXIS
†This chapter was read at the Georgetown University Round Table on Linguistics (1995). We are grateful to the organizers of the event, especially Hector Campos, and the participants in the workshop for useful comments and suggestions.
1 The how we have in mind is not adverbial in the intended reading. Rather, it appears in "he explained to us how we should never be late," which is of course different from "he explained how we should behave," where the mode of behaving is part of the explanation. There are no obvious modes of "being late."
2 Ross (1967) discusses data like (i) as a surprising violation of his Complex NP Constraint:
(i) Which company did you hear rumors/a rumor that they have squandered?
Uriagereka (1988a) presented this as evidence for the complement status of the clausal dependent of the noun. But while the facts are clear, it is far from obvious what they constitute evidence for in the present system.
3 We mark this sentence with an asterisk even though it is somewhat acceptable with an incomplete definite description interpretation for the predicate. The sentence improves as (i):
(i) That the earth is flat is the rumor that I heard.
The contrasts shown by Spanish in (ii) are significant in this respect:
(ii) a. El que la tierra es plana es el rumor que escuché.
        The that the earth is flat is the rumor that I heard.
     b. *Que la tierra es plana es el rumor que escuché.
        That the earth is flat is the rumor that I heard.
(iia) is good, but notice that the CP is a DP: el CP. Hence, it seems as if (iia) involves a relation between two nominals, and not a sentence and a nominal. In fact, it seems as if the nominals in (ii) are both DPs, and the relation is equative. It may be the case that the same is true about the English (i), even when the sentence in subject position does not have the obvious form of a nominal. At any rate, when the relation between the sentence and a nominal is clear (as in (iib)), the result is ungrammatical. It should perhaps be emphasized, also, that it is not the case that all Spanish sentences in subject position need to be nominal. Thus:
(iii) Que la tierra es redonda es verdad.
      That the earth is round is true
4 In (17), linear order is irrelevant, and is assumed to follow after the rule of Spell-out from Kayne's (1994) LCA, in the version presented in Chomsky (1995b). For our purposes, it is also irrelevant whether rumor or truth project as N's or as D's, and we will simply use an "X" to label whatever projection is relevant – admitting, even, that it may be different in each instance. The notation employed in (17) is taken from Chomsky (1995b). Note, in particular, that categorial labels are underlined, and that the labels are different in each instance. In (17a), the label is the object X. In contrast, in (17b) the label is the ordered pair ⟨X, X⟩. This indicates that the merger in (17b) projects a segment, while the merger in (17a) projects a category. The rest of the information in the categorial notation in (17) is not ordered. That is, the information coding the constituent sets is simply stating that, for instance, the label in (17a) is obtained by merging X and CP, in no particular order.
5 Curiously, we need "to it" to make the sentence fully grammatical:
(i) ??That the earth is round has some truth.
In Chapter 9, similar elements are observed in the context of mass terms, although in those instances they are fully optional:
(ii) This ring has some gold (in it).
At this point, we simply note the fact, and have no deep explanation for it.
6 These sorts of issues have been explored at length in Etxepare (1997).
7 Although this of course is not necessary, it could be that in the present system como is an unanalyzable item, which nonetheless targets a null D for checking reasons.
8 Kitahara (1997) allows for LF-mergers which do not extend phrase structure. However, his assumptions are not the same as those in Chomsky (1995b: Chapter 4), which we are using as our basic framework.
9 However, optimality is a matter of comparison, which entails that structures involving radically null complementizers in matrix clauses must compete with structures involving standard complementizers. Technically, this presupposes a common lexical array for both. Such a possibility exists only if we weaken the strong lexicalist approach in Chapter 3 of Chomsky (1995b) to a version allowed in his Chapter 4 approach, having incorporated the theoretical proposals in Lasnik (1995). Simply put, we must distinguish items in the lexical array (technically a multi-set referred to as a numeration) from items in the lexicon proper. As it is standardly assumed, the lexicon is a repository of idiosyncrasies. Yet, lexical items make it into a numeration with a considerable degree of systematic features. This indicates that such features are added upon considering the lexical item as a member of the numeration for syntactic computation. Then, the matter arises as to whether such features should or should not constitute a mark of difference with respect to the reference set of derivations for optimality considerations. The matter is currently under investigation, and partial results suggest that, at least in some languages, such features may not force entirely different reference sets. It is crucial for Chomsky's reasoning about complementizers that, if radically null ones outrank overt ones, such comparisons be part of the system.
10 In the spirit of the proposals in the 1980s, Chomsky (1995b), Chapter 3, still assumes that strong Agr is associated to the presence of pro. Although this is not a necessary assumption, we will pursue its logic to see where it leads.
11 Convergence involves legibility conditions at the interface. Nonetheless a derivation may converge as gibberish in post-grammatical components, the assumption for this instance.
12 We still do not explain, though, why (32) is out with an emphatic reading, but acceptable otherwise, while (30) is simply out.
13 Note that whether an economy analysis is tenable within Chomsky's (1995b) system depends on the details of the numeration. If do is its own separate lexical item, then there is no hope of getting the relevant examples above into a competition, simply because the sets of relevant derivations are not identical.
14 This is somewhat in the spirit of Fiengo and May (1994).
15 This is under the substantive assumption that only derivations stemming from the same lexical array are comparable. Once again, this is a topic of much debate (and recall Notes 10 and 14).
16 This, incidentally, aligns directly with the description of the facts in Stowell (1981), whereby null complementizers in English are shown to appear in governed positions only.
In current terms, we would say that these are precisely the sites from which cliticization is possible (see Uriagereka (1988a) for an early version of this line).
17 The exclusion of the preverbal subject in (39a) is symptomatic that Spanish preverbal subjects are outside IP altogether. In fact, under present assumptions, they will have to be outside CP (so that they prevent the cliticization of the complementizer; see Barbosa 1995). Assuming this, a preverbal subject is the signature of extra functional material, which should directly prevent cliticization across it. As expected, also, in languages where the overt subject does not appear in the periphery of the clause, complementizer cliticization over overt subjects is allowed. Hence, the grammaticality of the Italian example in (i) (see Torrego 1983):
(i) Voglio Sandro arrivi domani.
    I want Sandro arrive tomorrow.
18 Quite a different picture should emerge, however, with verbs that establish a hypotactic relation with their clausal dependents. In general, there is no reason why they should bar wh-movement across them, and this is in fact the case. Interestingly, though, we find asymmetries in instances of long wh-movement (which are dealt with in Torrego 1983, within a different framework). Basically, long extraction of a wh-phrase forces all intervening complementizers to either be present, or else to all be absent:
(i) a. ¿qué libro esperas que quieran que pueda conseguirte yo t?
       what book do you expect that they want that I may get?
    b. ¿*qué libro esperas – quieran que pueda conseguirte yo t?
       *what book do you expect they want that I may get?
    c. ¿*qué libro esperas que quieran – pueda conseguirte yo t?
       *what book do you expect that they want I may get?
    d. ¿qué libro esperas – quieran – pueda conseguirte yo t?
       what book do you expect they want I may get?
Within the minimalist framework, there is an immediate approach to this sort of asymmetry. Movement is driven by feature-checking. The overt movement of the wh-phrase in the sentences in (i) will have to be successive cyclic, if every time a wh-phrase enters into a CP it does so attracted by a strong feature. Suppose this is the case. It is natural to assume (in fact, necessary, to motivate cliticization) that the features of the overt complementizer are not the same as the features of the cliticized version. In (i), the wh-phrase that checks successive cyclic wh-features is one and the same. Since each link of the chain has to be attracted by a strong feature, it follows that one and the same wh-phrase cannot be attracted by two different features. Technically, it could be the case that each Comp happens to attract the wh-phrase for entirely different reasons. However, this would have to be argued for, and we know of no empirical evidence that would support such a claim.
19 It is common in various languages (e.g. in the Balkan peninsula) to have different complementizers for each sort of structure, and to allow extraction only out of hypotactic contexts (see Spanish data below and Note 20). An interesting question is what happens in instances where wh-extraction is more or less marginally possible, at least in some languages, across paratactic domains. We suspect that other strategies are involved in this instance, including the licensing of parasitic gap structures of the rough form in (i), where the italicized element is perhaps either a clitic or not pronounced at all:
(i) Of whom do you think that Mary likes 'im
    (cf. Whom do you think that Mary likes)
Pursuing this idea here would take us too far afield, especially given structures of the form in (ii), where a parasitic gap analysis should not be possible:
(ii) Why do you think that Mary left?
Then again, this sort of structure is known to be somewhat exotic, and it certainly degrades easily with further embedding or extraction across domains which do not agree in tense:
(iii) a. Why do you think that Mary said that Peter left?
     b. Why will you say that Mary has left?
These examples are odd with a deeply embedded reading for why, suggesting that something peculiar is at stake in the acceptable (ii), perhaps related to the event structure of the expression.
20 The verb decir, when followed by a subjunctive (otherwise necessary for "Neg-raising" and polarity licensing), takes on an imperative flavor. Under this interpretation, the raising is perfect with the complementizer, but not without it.

14 DIMENSIONS OF NATURAL LANGUAGE

1 Or put another way, our modern system for representing numbers (as governed by the relevant rules for transforming those representations) encodes far more than linear order; "1/2" describes that real number as 1 divided by 2. In this sense, "1/2" represents the rational number in question as a "mirror" of 2, since 2(1/2) = 1. Likewise, "−1" represents the negative number in question as a "mirror" of "1," since 1 + (−1) = 0, where "0" is a label for the number whose successor is "1."
2 In particular, through the creation of a loop (responsible for pronouncing very) when hitting some particular state (e.g. one pronouncing the word long).
3 Involving recursive rules such as S → S and/or/but S, or recursive systems of rules such as S → NP VP, VP → V S.
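A minimal sketch of the contrast drawn in notes 2 and 3; the code and its names are my own, purely for illustration. A finite-state loop suffices for unbounded "very . . . very long," whereas clausal self-embedding is naturally produced by recursive rewrite rules.

    # Illustration of notes 2 and 3 (toy code, not from the text): a loop on
    # one state handles iterated "very"; recursion handles self-embedding.
    def very_long(n):
        """Finite-state style: loop on one state, emitting 'very' n times."""
        out = []
        for _ in range(n):        # the loop that note 2 describes
            out.append("very")
        out.append("long")
        return " ".join(out)

    def sentence(depth):
        """Recursive phrase structure: S -> NP VP, VP -> V S (note 3)."""
        if depth == 0:
            return "NP V"
        return "NP V " + sentence(depth - 1)   # VP reintroduces S

    print(very_long(3))    # very very very long
    print(sentence(2))     # NP V NP V NP V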
4 For defense of the general neo-Davidsonian "eventish" framework in which such accounts of causatives are embedded, see Davidson (1967a, 1985); Higginbotham (1983a, 1985, 2000); Taylor (1985); Parsons (1990); Schein (1993); Pietroski (1998, forthcoming a, b); Higginbotham, Pianesi and Varzi (2000); Herburger (2000), etc.
5 Feinberg (1965); Davidson (1967a); Goldman (1970); Thomson (1971, 1977); Thalberg (1972); etc. See Costa (1987) for a review. See Pietroski (1998, 2000, forthcoming b) for discussion.
6 There are, no doubt, questions about the metaphysics of events lurking here. And we are committed to denying an unrestricted mereology principle according to which the fusion of Pat's action and the subsequent boiling would itself be an event with Pat as Agent and the soup as Theme. But we see no reason for thinking that theories of meaning for natural language must adopt such profligate principles concerning which events speakers tacitly quantify over when using eventish constructions like (8).
7 This can be made rather precise if event quantification is restricted (Herburger 2000), and typically quantifier restrictions are contextually confined by speakers. Apparently, context confinement has some syntax to it, and is not completely free (see Castillo 2001 on this).
8 Perhaps it ought to be emphasized that the readings have nothing to do with (in)alienability. Again, both hearts in question are inalienably the patient's, but it is the one in the chest that makes the difference.
9 In essence these two approaches correspond to whether lexico-conceptual structures occupy a separate syntactic component of the system or are rather a mere aspect of some other level.
10 These concerns are no different from the ones presented in Muromatsu (1998), Kamiya (2001) or Castillo (2001) for nominal expressions, which make all of them argue for a dimensional view.
11 Within Montague semantics θ-roles are not necessary at all, to start with. So their lexical absence could perhaps be explained as total absence. At the same time, a language system without θ-roles would be empirically adequate in other respects, but we would have no way of capturing familiar, observable hierarchies. (Moreover, a traditional Montagovian semantics requires a more widespread appeal to type-lifting in order to accommodate adjuncts.) So our full argument is this. We need θ-roles, yet they are never pronounced, and this is because they are elements outside the system.
12 Collins (2001) deduces labels in syntactic objects from more basic notions. This is fine with us, although it is important to clarify an idea that is often confused. The fact that
something is deduced does not mean that it does not exist at some level. The moon's existence follows from the laws of gravity when applied to celestial objects resulting from asteroid collisions with earth. But the moon exists. Indeed, its presence is crucial in determining tides, and with them, for instance, the emergence of life. Similarly for labels, deduced or not, they exist, and they are crucial, for instance, in the "reprojection" process studied in Chapter 6. When we say below that adjuncts do not have labels, we do not mean that labels are deduced for them. In that instance they truly do not exist. That itself is telling with regards to the syntax of adjuncts. It is as if they were not there, at least with regard to aspects of the grammar that care about formal properties.
13 Lasnik and Uriagereka (forthcoming, Chapter 8) show how difficult it is to construct an argument for an unlimited class of words, similar to the one existing in the literature for an infinite class of sentences.
14 In a certain sense it surely is not, if human I-languages are not formal E-languages.
15 Here and throughout, we use quote marks (instead of corner quotes) when talking about schemata involving variables ranging over expressions. For our purposes, this simplification will do no harm.
16 Note that order matters for "∧" but not for "#" or "."
17 Alternatively, one can enter a stipulation (ad hoc from a syntactic perspective) that such inscriptions are not legitimate symbols of the language.
18 One must also be prepared to leave certain well-formed symbols, like "%(*1, *1)," undefined on pain of paradox.

15 WARPS
†I would first of all like to thank those students who have been "crazy" enough to even listen to these arcane ideas, especially those who put their theses at risk because of them: Aitziber Atutxa, José Luis Ariel-Méndez, Tonia Bleam, Cedric Boeckx, Juan Carlos Castillo, Rikardo Etxepare, Analía García, Enrique López-Díaz, Nobue Mori, Keiko Muromatsu, Lucía Quintana. Without their work, these pages would have been worthless. I am grateful also to my friends of the La Fragua group, who participated in most of the discussions behind this system, and Pablo Bustos even pursued the visual connection with the help of Felix Monasterio. I would like to thank also my colleagues Norbert Hornstein and Sara Rosen; it was quite a challenge to put an intentional and a conceptual semanticist literally on the same page, but I at least have benefited tremendously from that clash of insight, which I have developed in the present direction (knowingly in ways which neither Norbert nor Sara probably care to endorse, and which should not be blamed on them either). My psycholinguistic babbling would not have been possible without the supervision of Stephen Crain and Rozz Thornton, who presented it to me in "parentese" fashion; again, no blame should go to them for my errors. I had the privilege of discussing many of these issues with Ian Roberts during my stay in England a couple of years ago, particularly the syntactic connection that I have not really developed here. The pleasure of (too) many beers with him might account for part of their radicalness. Finally, I thank everyone who attended lectures on these topics in various places, too numerous to mention, the editors of working papers where versions of this work came out, and lastly Frank Beckmann for his unrelenting enthusiasm and support, which are directly responsible for the present publication. The research behind it was partly supported by NSF grant BCS-9817569.
1 See for instance volume 29, number 2, or volume 30, number 3.
2 See for instance the type of model in Dowty, Wall and Peters (1981). A "meaning postulate" would relate elements in the individual ontology to the corresponding mass.
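For concreteness, a meaning postulate of the kind note 2 alludes to might look as follows. This particular formulation is my own schematic illustration, not Dowty, Wall and Peters's.

    % Illustrative meaning postulate (my formulation): every individual
    % falling under a count predicate is constituted by some quantity of
    % the corresponding mass.
    \[
    \Box\, \forall x\, [\,P_{\mathrm{count}}(x) \rightarrow
      \exists y\, [\,P_{\mathrm{mass}}(y) \wedge
      \mathrm{constitutes}(y, x)\,]\,]
    \]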
3 This is a point that Chomsky often emphasizes, for instance in (1993a) and (1994). Philosophers do not usually even take the question seriously, although see Atlas (1989).
4 The view about causal relations with regards to the relevant events is not uncommon. See Pietroski (2000) for a general discussion of these matters.
5 This kind of approach is common within the framework of Category Theory (see Barr and Wells (1995) for a useful introduction). Here I am using sets (whenever I can) in order to avoid a more arcane discussion.
6 I have put the word "larger" in scare quotes because, of course, the size of the sets is the same. They are all infinite; yet it is still true that one set meaningfully generates the other, and not vice versa, and in that sense we can conceive of the generating set as "smaller."
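Note 6's point can be put in textbook terms; this gloss is mine, not the chapter's. Each successive number system is generated from the previous one (by closing under subtraction, then under division), yet all three sets have the same cardinality, so "smaller" can only mean "generating."

    % My illustration of note 6: generation is asymmetric even where
    % cardinality is not.
    \[
    \mathbb{N} \subsetneq \mathbb{Z} \subsetneq \mathbb{Q},
    \qquad |\mathbb{N}| = |\mathbb{Z}| = |\mathbb{Q}| = \aleph_0
    \]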
7 Playing might count as a four-dimensional object, particularly if it is not given a set amount of time. It is arguable at least that what gives play its fourth dimension is messing around in three dimensions, usually carrying an object from here to there by way of some elaborate patterns. I owe most reflections about games at this quasi-theoretical level to a conversation with Ignacio Bosque. One other relevant example might be dancing, since the entity obtained through the dance in three-dimensional space clearly incorporates a crucial temporal dimension. Similar considerations apply to hunting. Baking (cakes, ceramic, etc.) might also be relevant, or for that matter any thermodynamic process that incorporates some feedback of energy in order to produce some kind of result (most kitchen tasks, but also agricultural activities).
8 There are many introductions to Goedel's proof (the classic being Nagel and Newman 1958), but nothing nearly as delightful as Hofstadter (1979).
9 For Star Trek fans, the trick in (5) is a warp in the "technical" sense of the series. That is, somehow ordinary space-time is warped in such a way that a spacecraft travels from point a to point b, distant light years, through the "approximation" that the warp creates, thus much faster. That is warp 1. Once you get into that kind of super space, if you warp it again, you take the next short cut, warp 2, etc. (see Figure (7a)). Your speed grows logarithmically. Unfortunately, this is all physical nonsense, not to speak of the philosophical paradoxes it would create if possible. For a very instructive discussion of these issues see Kraus (1995).
10 It might be thought that the feature in question is not linguistic. This is far from obvious, though, given present assumptions. Surely duck starts with a [d], whereas goose starts with a [g], and that difference (plus the other phonetic distinctions) contributes to the fact that we take the two terms to be different. Roughly, this is true:
(i) Different phonetic forms in N correspond to different referents or at least different modes of presentation for entities intentionally associated to N.
11 Philosophers will not be satisfied with this, since they are concerned with "ultimate reference," as opposed to what I like to think of as "ordinary reference." Thus, it could be argued that Jones might see (what I think of as) a duck and either confuse the length of its neck, or have a problem in his visual system so that he sees things longer, or any such concoction. Then is he referring to the same thing I refer to when I say "duck"? Perhaps not, but this is, so far as I can see, completely orthogonal to ordinary reference.
What I expect my intentional theory to tell me is how humans normally succeed in picking out ducks when saying "duck." There might be confusions, mistakes, lies and so on, but just as there are aberrations in any other realm of natural science, still, we study the norm. (Which may or may not be accounted for in Jackendoff's terms, by invoking the visual, or other systems, assuming they work similarly in different humans, as is to be expected.) Needless to say, rather different questions arise if one is not interested in what humans do when they talk to each other, but is rather concerned with when they reflect about science, philosophy and the like. There what one means by "atom" or "V" and so on is a theoretical issue, and I do not think that Jackendoff's suggestion is of any help. At the same time, I do not see that there is any point in considering that question from the semantics of natural language perspective. See Uriagereka (1998: Chapter 2) on these issues.
12 This idea was inspired by fruitful discussion with other members of the La Fragua group, in its third meeting in the mountains of Toledo. I am particularly grateful to Pablo Bustos, Felix Monasterio and Juan Romero.
13 This part of the point I am making is, again, well explained in Hofstadter's book. The rest of the point can be appreciated in the useful commentary by Putnam (1983).
14 I have suggested this kind of meta-theoretical approach in Chapter 8 for the LF side of the grammar (by way of the Transparency Thesis), and in Chapter 3 for the PF side (in attempting to deduce the base step of Kayne's LCA from economy considerations).
15 See Hornstein (1995a) for a useful discussion of the differences between "compositionality" and "systematicity," the latter being a necessary property of a communication system, but certainly not the former. Larson and Segal (1995) is one of the few places I know where compositionality is explicitly taken to be an empirical thesis.
16 I believe this general criticism obtains of all well-known treatments in the literature.
17 See Langacker (1987), and for the generative semantics source, see Newmeyer (1980). Gil (1987) does address the question from a traditional semantic perspective, and comes up with an admittedly Whorfian proposal.
18 Humans have (at least) two modes of cognition. The perspective I am advocating, explicitly defended by Muromatsu with regards to classifiers, is a priori more reasonable. All languages have a classifier system (overt or covert). But that has semantic consequences of the sort I am now outlining.
19 For a useful introduction to topological matters of the sort implied here, see for instance Gemignani (1967).
20 Incidentally, we are talking about concepts here, but if one considers how form actually emerges in the natural world, exactly the same conclusions apply. One only has to think of the growth of a baby from a cell to a tube to something with various axes and aggregating tissues, which fatefully crumbles upon death. It is within the realm of reasonable possibility that natural form, whether in a physical sense or internal to our cognitive capacities, must be of this sort. From that perspective, it would not be unthinkable that the cognitive capacity reflects the nature of reality itself; after all, it is a part of it. See the last section on these questions.
21 Generally, these processes are presented in terms of operations on the conceptual units. For instance, Jackendoff (1991) proposes a "grinding" mechanism.
22 This is important. A "grinding" mechanism of the sort proposed by Jackendoff would give the wrong meaning here.
23 The notion "generate readings" is borrowed from Szabolcsi and Zwarts's (1993) concept of a "generator"; see also Hornstein (1995a).
24 Muromatsu (1998) prevents any form of quantification from appearing in lower dimensionalities. I actually think that is too strong, since the lowest-dimensional concepts can be quantified by very, and certainly mass terms can too, with much and few. However, I suspect Muromatsu is essentially right about her guess, although it should be restricted to bona fide generalized quantification, which for some reason necessitates the highest dimensions.
This is perhaps because those are the dimensions where individuals arise, and thus the corresponding sets, necessary for generalized quantification. Evidently, the idea of set-demanding quantifiers must be connected with "generators" of the Szabolcsi/Zwarts sort. How that is done is a very difficult question, for obviously a plurality is comprised of individuals yet plural expressions behave in some sense as masses. In other words, the question is, "What do we do in order to conceive a plurality of individuals as a notion which we generally think of as being presupposed in the understanding of individuals?" It would seem as if the plural morpheme allows for a kind of "loop back" from higher to lower dimensionalities.
25 Similar questions obtain for the other lexical categories, but I will concentrate on the major cut, setting the others aside.
26 For instance, I do not know what sorts of values nominal spaces should be conceived of as having: more or less, depending on degree of a quality, amount of mass, number of elements, all of the above (hence in the general case something more abstract that ranges over all of them). Presumably that is the way to go, since after all it matters for bounding the event whether there is, say, this much or that much of beer. But does language care about exactly how much, or does it simply express that there is a certain amount, and leave the rest unspecified? I suspect the latter is the case, but it does not matter for my general purposes now, since all those possibilities would work.
27 "What are event boundaries for a verbal topology?" or "What are the atoms of a verbal lattice?," for example, are immediate hard questions.
28 Regardless of the fact that wine is, strictly, a living entity, again, the issue is not reality, but abstract spaces, where wine is conceived of as a low-dimensional lattice, while people and other animate entities are conceived as very complex topologies with change potential.
29 One other place where the permanent vs. mutable contrast comes out is the rigidity of names vs. the flexibility of descriptions. Interestingly, nouns denote kinds, which have been argued to be rigid (a noun could be seen as the name of a kind). In contrast, verbs with their arguments do not obviously denote kinds, and can be thought of as the description of an event.
30 As noted by Chomsky (1965: footnote 15), who extends Russell's observations beyond logical proper names, and raises related issues about the internal structure of objects with parts (e.g. when a cow breaks a leg we cannot conclude that a herd breaks a leg).
31 There are potential counterexamples to this, but they may have a simple explanation. For instance, silverware, china or a forest are non-continuous notions, but these would seem to be usable as mass terms, and then the discontinuous elements are nothing but the atoms of the corresponding lattice. At that atomic level everything is discontinuous, trivially.
32 This is intended in the spirit of the puzzle posed in Quine (1960, Chapter 2), although of course he was not speaking of language acquisition by children.
33 I do not care right now whether "rabbit" is 3D or whatever, so long as its dimensionality is higher than the one required for "fur."
34 On these issues, see Markman (1989), especially Chapter 2.
35 See Markman and Wachtel (1988).
36 I am indebted to Stephen Crain for discussing these matters, and providing me with the observation about event vs. state interpretations. For much relevant information and various references, see Crain and Thornton (1998).
37 The fact that children acquire names at about the same time they do everything else does not allow us to decide among these two alternatives. As I said before, we need a much more controlled environment. In this instance (whether a child would take the description of something as its name), I am predicting that this is not the case. It is in fact somewhat unclear whether those two sorts of notions can be grammatically distinguished.
For instance, although I know of languages that limit certain processes to animate expressions, I know of no language that has a grammatical morpheme to code them (equivalent to a noun classifier or a plurality marker).
BIBLIOGRAPHY
Alexiadou, A. and C. Wilder (eds) (1998) Possessors, Predicates and Movement in the Determiner Phrase, Amsterdam: Benjamins.
Altube, S. (1929) Erderismos, San Sebastián: Gaubeka.
Ambar, M. M. (1992) Para uma sintaxe da inversão sujeito-verbo em português, Lisbon: Colibri.
Aoshima, S., J. Drury and T. Neuvonen (eds) (1999) University of Maryland Working Papers in Linguistics 8, College Park, MD.
Atlas, J. D. (1989) Philosophy without Ambiguity, Oxford: Oxford University Press.
Atutxa, A. (forthcoming) “Aktionsart: from the simplest to the most complex,” unpublished manuscript, University of Maryland.
Authier, J.-M. (1988) The Syntax of Unselective Binding, unpublished PhD dissertation, University of Southern California, Los Angeles.
Baker, M. (1988) Incorporation, Chicago: University of Chicago Press.
—— (1996) “On the structural positions of themes and goals,” in J. Rooryck and L. Zaring (eds), 7–34.
—— (1997) “Thematic roles and syntactic structure,” in L. Haegeman (ed.) Elements of Grammar, Dordrecht: Kluwer, 73–137.
Barbosa, P. (1995) Null Arguments, unpublished PhD dissertation, MIT.
Barlow, M. and C. Ferguson (1988) Agreement in Natural Language, Chicago: University of Chicago Press.
Barr, M. and C. Wells (1995) Category Theory, New York: Prentice Hall.
Benveniste, E. (1966) Problèmes de Linguistique Générale, Paris: Gallimard (trans. (1971) Problems in General Linguistics, Coral Gables, FL: University of Miami Press).
Bleam, T. (1999) “The syntax of clitic doubling in leísta Spanish,” unpublished PhD dissertation, University of Delaware.
Bobaljik, J. D. (1995) “Morphosyntax: the syntax of verbal inflection,” unpublished PhD dissertation, MIT.
Bobaljik, J. D. and S. Brown (1997) “Inter-arboreal operations: head-movement and the extension requirement,” Linguistic Inquiry 28: 345–56.
Bonet, E. (1991) “Morphology after syntax: pronominal clitics in Romance,” unpublished PhD dissertation, MIT.
Boolos, G. and R. Jeffrey (1980) Computability and Logic, Cambridge: Cambridge University Press.
Bouchard, D. (1984) On the Content of Empty Categories, Dordrecht: Foris.
Bowers, J. (1992a) “Extended X′ theory, the ECP, and the left branch condition,” Proceedings of the 7th West Coast Conference on Formal Linguistics.
—— (1992b) “The structure of stage and individual level predicates,” unpublished manuscript, Cornell University, Ithaca, NY.
Bresnan, J. (1971) “Sentence stress and syntactic transformations,” Language 47.2: 257–81.
—— (1994) “Locative inversion and the architecture of Universal Grammar,” Language 70.1: 72–131.
Brody, M. (1990) “Some remarks on the focus field in Hungarian,” UCL Working Papers in Linguistics 2, London, 201–26.
—— (1995) Lexico-Logical Form: A Radical Minimalist Theory, Cambridge, MA: MIT Press.
Burge, T. (1973) “Reference and proper names,” Journal of Philosophy 70: 425–39.
—— (1974) “Demonstrative constructions, reference, and truth,” Journal of Philosophy 71: 205–23.
—— (1975) “Mass terms, count nouns and change,” Synthese 31, reprinted in F. Pelletier (ed.) (1979) Mass Terms: Some Philosophical Problems, Dordrecht: Reidel.
Burzio, L. (1986) Italian Syntax: A Government-Binding Approach, Dordrecht: Kluwer.
—— (1996) “The role of the antecedent in anaphoric relations,” in R. Freidin (ed.).
—— (2000) “Anatomy of a generalization,” in E. Reuland (ed.) Arguments and Case: Explaining Burzio’s Generalization, Amsterdam: Benjamins.
Campos, H. and P. Kempchinsky (eds) (1995) Evolution and Revolution in Linguistic Theory: Essays in Honor of Carlos Otero, Washington, DC: Georgetown University Press.
Cardinaletti, A. and M. T. Guasti (eds) (1995) Small Clauses, New York: Academic Press.
Carlson, G. (1977) “Reference to kinds in English,” unpublished PhD dissertation, University of Massachusetts, Amherst.
—— (1984) “Thematic roles and their role in semantic interpretation,” Linguistics 22: 259–79.
Castañeda, H. (1967) “Comments,” in N. Rescher (ed.) The Logic of Decision and Action, Pittsburgh: University of Pittsburgh Press.
Castillo, J. C. (1998) “The syntax of container/content relations,” in E. Murgia, A. Pires and L. Quintana (eds) University of Maryland Working Papers in Linguistics 6, College Park, MD.
—— (1999) “From Latin to Romance: the tripartition of pronouns,” in S. Aoshima, J. Drury and T. Neuvonen (eds) 43–65.
—— (2001) “Possessive relations and the syntax of noun phrases,” unpublished PhD dissertation, University of Maryland, College Park.
Castillo, J. C., J. Drury and K. Grohmann (1999) “The status of the merge over move preference,” in S. Aoshima, J. Drury and T. Neuvonen (eds) 66–103.
Cattell, R. (1976) “Constraints on movement rules,” Language 52: 18–50.
Chierchia, G. (1986) “Individual level predicates as inherent generics,” unpublished manuscript, Cornell University.
—— (1992) “Anaphora and dynamic binding,” Linguistics and Philosophy 15: 111–83.
Chomsky, N. (1955) “The logical structure of linguistic theory,” unpublished PhD dissertation, University of Pennsylvania (published (1975), Chicago: University of Chicago Press).
—— (1964) Current Issues in Linguistic Theory, The Hague: Mouton.
—— (1965) Aspects of the Theory of Syntax, Cambridge, MA: MIT Press.
—— (1972) Studies in Semantics in Generative Grammar, The Hague: Mouton.
—— (1977a) Essays on Form and Interpretation, North Holland.
—— (1977b) “On wh-movement,” in P. Culicover, T. Wasow and A. Akmajian (eds) Formal Syntax, New York: Academic Press.
—— (1981) Lectures on Government and Binding, Dordrecht: Foris.
—— (1982) Some Concepts and Consequences of the Theory of Government and Binding, Cambridge, MA: MIT Press.
—— (1986a) Barriers, Cambridge, MA: MIT Press.
—— (1986b) Knowledge of Language: Its Nature, Origin and Use, New York: Praeger.
—— (1993a) Language and Thought, Wakefield, RI: Moyer Bell.
—— (1993b) “A minimalist program for linguistic theory,” in K. Hale and S. J. Keyser (eds), 1–52 (reprinted in Chomsky 1995b: Chapter 3).
—— (1994) “Language and nature,” Mind 104: 1–61.
—— (1995a) “Bare phrase structure,” in G. Webelhuth (ed.) Government and Binding Theory and the Minimalist Program, Oxford: Blackwell, also in H. Campos and P. Kempchinsky (eds).
—— (1995b) The Minimalist Program, Cambridge, MA: MIT Press.
—— (1995c) “Categories and transformations,” in Chomsky 1995b: Chapter 4.
—— (2000) “Minimalist inquiries: the framework,” in R. Martin, D. Michaels and J. Uriagereka (eds) Step by Step: Essays on Minimalist Syntax in Honor of Howard Lasnik, Cambridge, MA: MIT Press, 89–155.
Chomsky, N. and H. Lasnik (1993) “The theory of principles and parameters,” in J. Jacobs, A. von Stechow, W. Sternefeld and T. Vennemann (eds) Syntax: An International Handbook of Contemporary Research, Berlin: Walter de Gruyter (reprinted in Chomsky 1995b: Chapter 1).
Chung, S. (1994) “wh-agreement and referentiality in Chamorro,” Linguistic Inquiry 25: 1–44.
Chung, S. and J. McCloskey (1987) “Government, barriers, and small clauses in Modern Irish,” Linguistic Inquiry 18.
Cinque, G. (1993) “A null theory of phrase and compound stress,” Linguistic Inquiry 24.2: 239–97.
—— (1999) Adverbs and Functional Heads: A Cross-linguistic Perspective, New York: Oxford University Press.
Cole, P. and L. Sung (1994) “Head-movement and long-distance reflexives,” Linguistic Inquiry 25.3: 355–406.
Collins, C. (2001) “Eliminating labels and projections,” unpublished manuscript, Cornell University.
Contreras, H. (1984) “A note on parasitic gaps,” Linguistic Inquiry 15: 704–13.
Corver, N. and D. Delfitto (1993) “Feature asymmetry and the nature of pronoun movement,” paper presented at the GLOW Colloquium, Lund.
Costa, M. (1987) “Causal theories of action,” Canadian Journal of Philosophy 17: 831–54.
Crain, S. and R. Thornton (1998) Investigations in Universal Grammar: A Guide to Experiments on the Acquisition of Syntax and Semantics, Cambridge, MA: MIT Press.
Davidson, D. (1967a) “The logical form of action sentences,” reprinted in D. Davidson 1980.
—— (1967b) “On saying that,” reprinted in Inquiries into Truth and Interpretation, Oxford: Clarendon Press (1984).
—— (1980) Essays on Actions and Events, Oxford: Oxford University Press.
—— (1985) “Adverbs of action,” in B. Vermazen and M. Hintikka (eds).
Davies, W. D. (2000) “Against long movement in Madurese,” paper presented at the 7th Meeting of the Austronesian Formal Linguistics Association, Amsterdam.
De Hoop, H. (1992) “Case configuration and noun phrase interpretation,” unpublished PhD dissertation, University of Groningen.
De Jong, F. (1987) “The compositional nature of (in)definiteness,” in E. Reuland and A. ter Meulen (eds), 270–85.
den Dikken, M. (1992) “Particles,” unpublished PhD dissertation, Holland Institute of Generative Linguistics.
Diesing, M. (1992) Indefinites, Cambridge, MA: MIT Press.
Doherty, C. (1992) “Clausal structure and the Modern Irish copula,” unpublished manuscript, UCSC.
Dowty, D., R. Wall and S. Peters (1981) Introduction to Montague Semantics, Dordrecht: Reidel.
Drury, J. (1998) “Root first derivations: Multiple Spell-Out, atomic merge, and the coresidence theory of movement,” unpublished manuscript, University of Maryland, College Park.
Epstein, S. D. (1999) “Un-principled syntax and the derivation of syntactic relations,” in S. D. Epstein and N. Hornstein (eds), 317–45.
Epstein, S. D. and N. Hornstein (1999) Working Minimalism, Cambridge, MA: MIT Press.
Epstein, S. D. and D. Seely (2002) Transformations and Derivations, Cambridge: Cambridge University Press.
Ernst, T. (2001) The Syntax of Adjuncts, Cambridge: Cambridge University Press.
Etxepare, R. (1997) “The syntax of illocutionary force,” unpublished PhD dissertation, University of Maryland, College Park.
Feinberg, J. (1965) “Action and responsibility,” in M. Black (ed.) Philosophy in America, Ithaca: Cornell University Press.
Fiengo, R. and R. May (1994) Indices and Identity, Cambridge, MA: MIT Press.
Finer, D. (1985) “The syntax of switch reference,” Linguistic Inquiry 16.1: 35–55.
Fodor, J. (1970) “Three reasons for not deriving kill from cause to die,” Linguistic Inquiry 1: 429–38.
—— (1983) The Modularity of Mind, Cambridge, MA: MIT Press.
Fodor, J. and E. Lepore (1998) “The emptiness of the lexicon: reflections on James Pustejovsky’s The Generative Lexicon,” Linguistic Inquiry 29.2: 269–88.
—— (forthcoming) “Morphemes matter,” unpublished manuscript, Rutgers University.
Fodor, J. D. and I. Sag (1982) “Referential and quantificational indefinites,” Linguistics and Philosophy 5.
Freeze, R. (1992) “Existentials and other locatives,” Language 68: 553–95.
Freidin, R. (ed.) (1996) Current Issues in Comparative Grammar, Dordrecht: Kluwer.
—— (1997) “Chomsky: the minimalist program,” Language 73.3: 571–82.
Fukui, N. (1996) “On the nature of economy in language,” Cognitive Studies 3.1: 51–71.
Fukui, N. and M. Speas (1987) “Specifiers and projection,” in N. Fukui, T. Rapoport and E. Sagey (eds) MIT Working Papers in Linguistics 8: Papers in Theoretical Linguistics, 128–72.
García Bellido, A. (1994) “Towards a genetic grammar,” paper presented at the Real Academia de Ciencias Exactas, Físicas, y Naturales, Madrid.
Gemignani, M. (1967) Elementary Topology, New York: Dover.
Gil, D. (1987) “Definiteness, noun phrase configurationality, and the count-mass distinction,” in E. Reuland and A. ter Meulen (eds).
Goldman, A. (1970) A Theory of Human Action, Princeton, NJ: Princeton University Press.
Gould, S. J. (1991) “Exaptation: a crucial tool for evolutionary psychology,” Journal of Social Issues 47.3: 43–65.
Guerón, J. (1980) “On the syntax and semantics of PP extraposition,” Linguistic Inquiry 11.
Haken, H. (1983) Synergetics: An Introduction, Berlin: Springer.
Hale, K. and S. J. Keyser (1993) “On argument structure and the lexical expression of syntactic relations,” in K. Hale and S. J. Keyser (eds) (1993) The View from Building 20: Essays in Honor of Sylvain Bromberger, Cambridge, MA: MIT Press, 53–110.
Halle, M. and A. Marantz (1993) “Distributed morphology and the pieces of inflection,” in K. Hale and S. J. Keyser (eds), 111–76.
Heim, I. (1982) “The semantics of definite and indefinite noun phrases,” unpublished PhD dissertation, University of Massachusetts, Amherst.
Herburger, E. (1993a) “Focus and the LF of NP quantification,” paper presented at SALT III.
—— (1993b) “Davidsonian decomposition and focus,” unpublished manuscript, UCSC.
—— (1997) “Focus and weak noun phrases,” Natural Language Semantics 5.1: 53–78.
—— (2000) What Counts, Cambridge, MA: MIT Press.
Higginbotham, J. (1983a) “The logical form of perceptual reports,” Journal of Philosophy 80: 100–27.
—— (1983b) “A note on phrase-markers,” Revue Québecoise de Linguistique 13.1: 147–66.
—— (1985) “On semantics,” Linguistic Inquiry 16: 547–93.
—— (1987) “Indefiniteness and predication,” in E. Reuland and A. ter Meulen (eds), 43–70.
—— (1988) “Contexts, models, and meaning,” in R. Kempson (ed.) Mental Representations: The Interface between Language and Reality, Cambridge: Cambridge University Press.
—— (2000) “On events in linguistic semantics,” in J. Higginbotham, F. Pianesi and A. Varzi (eds).
Higginbotham, J., F. Pianesi and A. Varzi (eds) (2000) Speaking of Events, Oxford: Oxford University Press.
Hoffman, J. (1996) “Syntactic and paratactic word-order effects,” unpublished PhD dissertation, University of Maryland, College Park.
Hofstadter, D. R. (1979) Gödel, Escher, Bach, New York: Vintage.
Honcoop, M. (1998) “Excursions in dynamic binding,” unpublished PhD dissertation, Leiden University.
Horn, L. (1989) A Natural History of Negation, Chicago: University of Chicago Press.
Hornstein, N. (1993) “Expletives: a comparative study of English and Icelandic,” unpublished manuscript, University of Maryland, College Park.
—— (1995a) Logical Form: From GB to Minimalism, Oxford: Blackwell.
—— (1995b) “Putting truth into Universal Grammar,” Linguistics and Philosophy 18.4: 381–400.
—— (2001) Move: A Minimalist Theory of Construal, Oxford: Blackwell.
Hornstein, N. and J. Nunes (1999) “Asymmetries between parasitic gap and across-the-board extraction constructions,” unpublished manuscript, University of Maryland, College Park and University of Campinas.
Huang, C.-T. J. (1982) “Logical relations in Chinese and the theory of grammar,” unpublished PhD dissertation, MIT.
Iatridou, S. (1990) “About Agr(P),” Linguistic Inquiry 21.4: 551–77.
Jackendoff, R. (1972) Semantic Interpretation in Generative Grammar, Cambridge, MA: MIT Press.
—— (1982) “The universal grinder,” in B. Levin and S. Pinker (eds) 1991.
—— (1990) Semantic Structures, Cambridge, MA: MIT Press.
—— (1991) “Parts and boundaries,” in B. Levin and S. Pinker (eds) Lexical and Conceptual Semantics, Oxford: Blackwell.
Jaeggli, O. and K. Safir (eds) (1989) The Null Subject Parameter, Dordrecht: Kluwer.
Jelinek, E. (1984) “Empty categories, Case, and configurationality,” Natural Language and Linguistic Theory 2: 39–76.
Kahn, D. (1995) Topology: An Introduction to the Point-set and Algebraic Areas, New York: Dover.
Kaisse, E. (1985) Connected Speech: The Interaction of Syntax and Phonology, New York: Academic Press.
Kamiya, M. (2001) “Dimensional approach to derived nominals,” generals paper, University of Maryland, College Park.
Kamp, H. (1984) “A theory of truth and semantic interpretation,” in J. Groenendijk, T. Janssen and M. Stokhof (eds) Truth, Interpretation, and Information: Selected Papers from the Third Amsterdam Colloquium, Dordrecht: Foris.
Kayne, R. (1984) Connectedness and Binary Branching, Dordrecht: Foris.
—— (1991) “Romance clitics, verb movement, and PRO,” Linguistic Inquiry 22.4: 647–86.
—— (1993) “Toward a modular theory of auxiliary selection,” Studia Linguistica 47.1.
—— (1994) The Antisymmetry of Syntax, Cambridge, MA: MIT Press.
—— (1997) “Constituent structure and quantification,” unpublished manuscript, CUNY.
Keenan, E. (1987) “A semantic definition of ‘indefinite NP’,” in E. Reuland and A. ter Meulen (eds) The Representation of (In)definiteness, Cambridge, MA: MIT Press, 286–317.
Keenan, E. and Y. Stavi (1986) “A semantic characterization of natural language determiners,” Linguistics and Philosophy 9: 253–326.
Kempchinsky, P. (1986) “Romance subjunctive clauses and logical form,” unpublished PhD dissertation, UCLA.
Kim, K.-S. (1998) “(Anti-)connectivity,” unpublished PhD dissertation, University of Maryland, College Park.
Kim, S. W. (1991) “Scope and multiple quantification,” unpublished PhD dissertation, Brandeis University, Waltham, MA.
Kiss, K. E. (1987) Configurationality in Hungarian, Dordrecht: Kluwer.
—— (ed.) (1995) Discourse Configurational Languages, Oxford: Oxford University Press.
Kitahara, H. (1993) “Deducing superiority effects from the shortest chain requirement,” in H. Thráinsson, S. D. Epstein and S. Kuno (eds) Harvard Working Papers in Linguistics 3, Harvard University, Cambridge, MA.
—— (1994) “Target Alpha: a unified theory of movement and structure-building,” unpublished PhD dissertation, Harvard University, Cambridge, MA.
—— (1997) Elementary Operations and Optimal Derivations, Cambridge, MA: MIT Press.
Kratzer, A. (1988) “Stage-level and individual-level predicates,” unpublished manuscript, University of Massachusetts, Amherst.
Kraus, L. (1995) The Physics of Star Trek, New York: Basic Books.
Kroch, A. (1989) “Asymmetries in long-distance extraction in tree-adjoining grammar,” in M. Baltin and A. Kroch (eds) Alternative Conceptions of Phrase Structure, Chicago: University of Chicago Press.
Kroch, A. and A. Joshi (1985) The Linguistic Evidence of Tree Adjoining Grammar, Philadelphia: University of Pennsylvania Department of Computer and Information Science Technical Report MS-CIS-85-16.
Kuroda, Y. (1972) “The categorical and the thetic judgement: evidence from Japanese syntax,” Foundations of Language 9.
Laka, I. (1990) “Negation in syntax: on the nature of functional categories and projections,” unpublished PhD dissertation, MIT.
—— (1994) On the Syntax of Negation, New York: Garland.
Laka, I. and J. Uriagereka (1987) “Barriers for Basque and vice-versa,” Proceedings of NELS 17, University of Massachusetts, Amherst, 394–408.
Lakarra, J. and J. Ortiz de Urbina (eds) (1992) “Syntactic theory and Basque syntax,” Diputación Foral de Gipuzkoa, San Sebastián.
Langacker, R. (1987) Foundations of Cognitive Grammar, Stanford, CA: Stanford University Press.
Langdon, M. and P. Muro (1979) “Subjects and switch reference in Yuman,” Folia Linguistica 13.
Langendoen, T. and P. Postal (1984) The Vastness of Natural Languages, Oxford: Blackwell.
Lappin, S., R. Levine and D. Johnson (2000) “The structure of unscientific revolutions,” Natural Language and Linguistic Theory 18.3: 665–71.
Larson, R. (1988) “On the double object construction,” Linguistic Inquiry 19.3: 335–91.
Larson, R. and G. Segal (1995) Knowledge of Meaning, Cambridge, MA: MIT Press.
Lasnik, H. (1972) “Analyses of negation in English,” unpublished PhD dissertation, MIT.
—— (1976) “Remarks on coreference,” Linguistic Analysis 2: 1–22.
—— (1990) “Pronouns and non-coreference,” paper presented at the Princeton Conference on Linguistic and Philosophical Approaches to Anaphora.
—— (1995) “Verbal morphology: Syntactic Structures meets the Minimalist Program,” in H. Campos and P. Kempchinsky (eds).
—— (1999) Minimalist Analyses, Oxford: Blackwell.
Lasnik, H. and J. Kupin (1977) “A restrictive theory of transformational grammar,” Theoretical Linguistics 4: 173–96.
Lasnik, H. and M. Saito (1984) “On the proper treatment of proper government,” Linguistic Inquiry 15: 235–89.
—— (1992) Move α, Cambridge, MA: MIT Press.
Lasnik, H. and J. Uriagereka (forthcoming) Essential Topics in the Minimalist Program, Oxford: Blackwell.
Lebeaux, D. (1983) “A distributional difference between reciprocals and reflexives,” Linguistic Inquiry 14.4: 723–30.
—— (1988) “Language acquisition and the form of the grammar,” unpublished PhD dissertation, University of Massachusetts, Amherst.
—— (1991) “Relative clauses, licensing, and the nature of the derivation,” in S. Rothstein (ed.) Perspectives on Phrase Structure, New York: Academic Press, 209–39.
—— (1996) “Determining the kernel,” in J. Rooryck and L. Zaring (eds).
Lewis, D. (1973) Counterfactuals, Oxford: Blackwell.
Lightfoot, D. (1995) “The evolution of language: adaptationism or the spandrels of San Marco?”, paper presented at Developments in Evolutionary Biology, Istituto di Arte e Scienzia, Venice.
Loebner, S. (1987) “Natural language and generalized quantifier theory,” in P. Gardenfors (ed.) Generalized Quantifiers, Dordrecht: Reidel.
Longobardi, G. (1994) “Reference and proper names,” Linguistic Inquiry 25.4: 609–66.
Markman, E. (1989) Categorization and Naming in Children, Cambridge, MA: MIT Press.
Markman, E. and G. Wachtel (1988) “Children’s use of mutual exclusivity to constrain the meaning of words,” Cognitive Psychology 20: 120–57.
Marr, D. (1982) Vision, San Francisco: W.H. Freeman.
Martin, R. and J. Uriagereka (forthcoming) “Collapsed waves in syntax,” unpublished manuscript, Tsukuba University and University of Maryland, College Park.
Martins, A. (1994) “Clíticos na história do Português,” unpublished PhD dissertation, University of Lisbon.
May, R. (1977) “The grammar of quantification,” unpublished PhD dissertation, MIT.
—— (1985) Logical Form, Cambridge, MA: MIT Press.
Meinzer, K. (1994) Thinking in Complexity, Berlin: Springer.
Milsark, G. (1974) “Existential sentences in English,” unpublished PhD dissertation, MIT.
—— (1977) “Toward an explanation of certain peculiarities of the existential construction in English,” Linguistic Analysis 3: 1–29.
Mitxelena, L. (1981) “Galdegaia eta mintzagaia euskaraz,” in Euskal Linguistika eta Literatura: Bide berriak, University of Deusto, Bilbao, Spain.
Moll, A. (1993) “Estructuras de rección en un texto colonial del siglo XVII,” PhD dissertation, University of Maryland, College Park.
Mori, N. (1997) “A syntactic representation for internal aspect,” generals paper, University of Maryland, College Park.
—— (forthcoming) Untitled PhD dissertation, University of Maryland, College Park.
Munn, A. (1994) “A minimalist account of reconstruction asymmetries,” in Proceedings of NELS 24, University of Massachusetts, Amherst, 397–410.
Muromatsu, K. (1995) “The classifier as a primitive: individuation, referability, and argumenthood,” paper presented at GLOW, Tromsö.
—— (1998) “On the syntax of classifiers,” unpublished PhD dissertation, University of Maryland, College Park.
Nagel, E. and J. Newman (1958) Gödel’s Proof, New York: New York University Press.
Nespor, M. and I. Vogel (1986) Prosodic Phonology, Dordrecht: Foris.
Newmeyer, F. (1980) Linguistic Theory in America: The First Quarter Century of Transformational Generative Grammar, New York: Academic Press.
Nunes, J. (1995) “The copy theory of movement and linearization of chains in the Minimalist Program,” unpublished PhD dissertation, University of Maryland, College Park.
—— (1998) “Sideward movement and linearization of chains in the Minimalist Program,” unpublished manuscript, University of Campinas.
—— (1999) “Linearization of chains and phonetic realization of chain links,” in S. D. Epstein and N. Hornstein (eds), 217–50.
Nunes, J. and E. Thompson (1998) Appendix to Uriagereka 1998.
Ormazabal, J., J. Uriagereka and M. Uribe-Etxebarria (1994) “Word order and wh-movement: towards a parametric account,” paper presented at the 17th GLOW Colloquium, Vienna.
Ortiz de Urbina, J. (1989) Parameters in the Grammar of Basque: a GB Approach to Basque Syntax, Dordrecht: Foris.
Otero, C. (1996) “Head movement, cliticization, precompilation, and word insertion,” in R. Freidin (ed.).
Parsons, T. (1990) Events in the Semantics of English, Cambridge, MA: MIT Press.
—— (2000) “Underlying states and time travel,” in J. Higginbotham, F. Pianesi and A. Varzi (eds).
Perlmutter, D. (1971) Deep and Surface Constraints in Syntax, New York: Holt, Rinehart, and Winston.
Pica, P. (1987) “On the nature of the reflexivization cycle,” in Proceedings of NELS 17, University of Massachusetts, Amherst, Vol. 2: 483–99.
Pica, P. and W. Snyder (1995) “Weak crossover, scope, and agreement in a minimalist framework,” in R. Aranovich, W. Byrne, S. Preuss and M. Senturia (eds) Proceedings of the 13th West Coast Conference on Formal Linguistics, Stanford, CA: CSLI Publications.
Pietroski, P. (1998) “Actions, adverbs, and agency,” Mind 107: 73–112.
—— (1999) “Plural descriptions as existential quantifiers,” in S. Aoshima, J. Drury and T. Neuvonen (eds).
—— (2000) Causing Actions, Oxford: Oxford University Press.
—— (forthcoming a) “Small verbs, complex events: analyticity without synonymy,” in L. Antony and N. Hornstein (eds) Chomsky and His Critics, Oxford: Blackwell.
—— (forthcoming b) Events and Semantic Architecture, Oxford: Oxford University Press.
Postal, P. (1966) “On so-called ‘pronouns’ in English,” in F. P. Dineen (ed.) Report of the 17th Annual Round Table Meeting on Linguistics and Language Studies, Washington, DC: Georgetown University Press.
Prince, A. and P. Smolensky (1993) “Optimality theory,” unpublished manuscript, Rutgers University and University of Colorado.
Pustejovsky, J. (1995) The Generative Lexicon, Cambridge, MA: MIT Press.
Putnam, H. (1975) Mind, Language and Reality: Philosophical Papers, Cambridge: Cambridge University Press.
—— (1983) “Models and reality,” in P. Benacerraf and H. Putnam (eds) Philosophy of Mathematics, Cambridge: Cambridge University Press.
Quine, W. V. O. (1960) Word and Object, Cambridge, MA: MIT Press.
—— (1970) Philosophy of Logic, Englewood Cliffs, NJ: Prentice Hall.
Raposo, E. (1988) “Romance inversion, the Minimality Condition and the ECP,” in J. Blevins and J. Carter (eds) Proceedings of NELS 18, University of Massachusetts, Amherst, 357–74.
Raposo, E. and J. Uriagereka (1990) “Long-distance Case assignment,” Linguistic Inquiry 21.4: 505–37.
—— (1996) “Indefinite se,” Natural Language and Linguistic Theory 14: 749–810.
Reid, T. (1785) Essays on the Intellectual Powers of Man, abridged edition by A. D. Woozley (1941) London: Macmillan.
Reinhart, T. (1995) “Interface strategies,” OTS Working Papers, Utrecht.
Reinhart, T. and E. Reuland (1993) “Reflexivity,” Linguistic Inquiry 24: 657–720.
Reuland, E. (1983) “The extended projection principle and the definiteness effect,” in Proceedings of the 2nd West Coast Conference on Formal Linguistics, Stanford, CA.
Reuland, E. and A. ter Meulen (eds) (1987) The Representation of (In)definiteness, Cambridge, MA: MIT Press.
Richards, N. (1997) “What moves where when in which language,” unpublished PhD dissertation, MIT.
Rizzi, L. (1982) Issues in Italian Syntax, Dordrecht: Foris.
—— (1986) “On chain formation,” in H. Borer (ed.) The Grammar of Pronominal Clitics, New York: Academic Press, 65–95.
—— (1990) Relativized Minimality, Cambridge, MA: MIT Press.
Roberts, I. (1994) “Long head movement, Case, and agreement in Romance,” in N. Hornstein and D. Lightfoot (eds) Verb Movement, Cambridge: Cambridge University Press.
Rooryck, J. and L. Zaring (eds) (1996) Phrase Structure and the Lexicon, Dordrecht: Kluwer.
Ross, J. (1967) “Constraints on variables in syntax,” unpublished PhD dissertation, MIT.
Russell, B. (1940) An Inquiry into Meaning and Truth, London: Allen and Unwin.
Safir, K. (1987) “What explains the definiteness effect,” in E. Reuland and A. ter Meulen (eds), 71–97.
—— (1992) “Implied non-coreference and the pattern of anaphora,” Linguistics and Philosophy 15: 1–52.
Schein, B. (1993) Plurals and Events, Cambridge, MA: MIT Press.
Schmitt, C. (1993) “Ser and estar: a matter of aspect,” in Proceedings of NELS 22, University of Massachusetts, Amherst.
—— (1996) “Aspect and the syntax of noun phrases,” unpublished PhD dissertation, University of Maryland, College Park.
Schwegler, A., B. Tranel and M. Uribe-Etxebarria (eds) (1998) Romance Linguistics: Theoretical Perspectives, Amsterdam: Benjamins.
Selkirk, E. (1984) Phonology and Syntax, Cambridge, MA: MIT Press.
Sportiche, D. (1988) “A theory of floating quantifiers and its corollaries for constituent structure,” Linguistic Inquiry 19.
—— (1990) “Movement, agreement and Case,” unpublished manuscript, UCLA.
Stevens, C. (1995) The Six Core Theories of Modern Physics, Cambridge, MA: MIT Press.
Stowell, T. (1978) “What was there before there was there,” in Proceedings of CLS 14.
—— (1981) “Origins of phrase structure,” unpublished PhD dissertation, MIT.
Suh, S. (1992) “The distribution of topic and nominative-marked phrases in Korean: the universality of IP structure,” MITWPL 16.
Szabolcsi, A. (1981) “The possessive construction in Hungarian: a configurational category in a nonconfigurational language,” Acta Linguistica Academiae Hungaricae 31.
—— (1983) “The possessor that ran away from home,” The Linguistic Review 3: 89–102.
Szabolcsi, A. and F. Zwart (1993) “Weak islands and an algebraic semantics for scope-taking,” Natural Language Semantics 1–3: 235–84, reprinted in A. Szabolcsi (ed.) (1997) Ways of Scope-Taking, Dordrecht: Kluwer.
Takahashi, D. (1994) “Minimality of movement,” unpublished PhD dissertation, University of Connecticut, Storrs.
Taraldsen, T. (1992) “Agreement as pronoun incorporation,” paper presented at the GLOW Colloquium, Lisbon.
Taylor, B. (1985) Modes of Occurrence, Oxford: Blackwell.
Tenny, C. (1994) Aspectual Roles and the Syntax-Semantics Interface, Dordrecht: Kluwer.
Thalberg, I. (1972) Enigmas of Agency, London: Allen and Unwin.
Thompson, D. (1945) On Growth and Form, Cambridge: Cambridge University Press, reprinted 1992.
Thompson, E. (1996) “The syntax of tense,” unpublished PhD dissertation, University of Maryland, College Park.
Thomson, J. (1971) “Individuating actions,” Journal of Philosophy 68: 771–81.
—— (1977) Acts and Other Events, Ithaca: Cornell University Press.
Torrego, E. (1983) “More effects of successive cyclic movement,” Linguistic Inquiry 14.3.
—— (1984) “On inversion in Spanish and some of its effects,” Linguistic Inquiry 15.1: 103–29.
—— (1996) “On quantifier float in control clauses,” Linguistic Inquiry 27.1: 111–26.
Uriagereka, J. (1988a) “On government,” unpublished PhD dissertation, University of Connecticut, Storrs.
—— (1988b) “Different strategies for eliminating barriers,” in J. Blevins and J. Carter (eds) Proceedings of NELS 18, University of Massachusetts, Amherst, 509–22.
—— (1993) “Specificity and the name constraint,” in University of Maryland Working Papers in Linguistics 1, College Park.
—— (1994) “A note on obviation,” unpublished manuscript, University of Maryland, College Park.
—— (1995a) “An F position in Romance,” in K. E. Kiss (ed.).
—— (1995b) “Aspects of clitic placement in Western Romance,” Linguistic Inquiry 25.1: 79–123.
—— (1996) “Determiner clitic placement,” in R. Freidin (ed.).
—— (1998) Rhyme and Reason: An Introduction to Minimalist Syntax, Cambridge, MA: MIT Press.
—— (2001a) “Doubling and possession,” in B. Gerlach and J. Grijzenhout (eds) Clitics in Phonology, Morphology and Syntax, Amsterdam: Benjamins.
—— (2001b) “Pure adjuncts,” invited talk delivered at the Coloquio de Gramática Generativa, to appear in the proceedings.
—— (forthcoming) “Remarks on the syntax of nominal reference,” University of Maryland, College Park.
Vergnaud, J.-R. and M. Zubizarreta (1992) “The definite determiner and the inalienable constructions in French and in English,” Linguistic Inquiry 23.4: 595–652.
Vermazen, B. and M. Hintikka (eds) (1985) Essays on Davidson: Actions and Events, Oxford: Clarendon Press.
Vikner, S. (1985) “Parameters of binder and of binding category in Danish,” Working Papers in Scandinavian Syntax 23, University of Trondheim.
Wanner, D. (1987) The Development of Romance Clitics from Latin to Old Romance, Berlin: Mouton de Gruyter.
Wasow, T. (1972) “Anaphoric relations in English,” unpublished PhD dissertation, MIT.
Watanabe, A. (1992) “Subjacency and S-Structure movement of wh-in-situ,” Journal of East Asian Linguistics 1: 255–91.
Weinberg, A. (1999) “A minimalist theory of human sentence processing,” in S. D. Epstein and N. Hornstein (eds), 283–315.
West, G., J. Brown and B. Enquist (1997) “A general model for the origin of allometric scaling laws in biology,” Science 280: 122–5.
Wilder, C., H.-M. Gaertner and M. Bierwisch (eds) (1996) Studia Grammatica 40: The Role of Economy Principles in Linguistic Theory, Berlin: Akademie Verlag.
Zwart, J.-W. (1989) “The first Case,” unpublished MA thesis, University of Groningen.
—— (1998) “Review article: The Minimalist Program, Noam Chomsky,” Journal of Linguistics 3.4: 213–26.
INDEX
A-movement: barriers 95, 101–4, 114; infinite regress 281–2; L-relatedness 156–7; successive cyclicity 139 A-position 101, 104 A' position 101, 104 A-reconstruction 128–9 AC see Anchoring Corollary accessibility, antecedence 57 accordion events 268–71, 272, 274 ACD see Antecedent Contained Deletion acquisition 311–15 Actualization Principle 233–4 adjuncts: Basque 92–3; Condition on Extraction Domains 66, 70; copy theory 75, 76–7, 79–80; dimensions 266, 278–80, 283–4; infinite regress 280–3; Multiple Spell-Out 51; reprojection 126; sub-events 271–3; thematic roles 277; unboundedness 268; wh-movement in Hungarian 111 admissibility conditions 3, 23 Agents, sub-events 275–6, 277–8 agreement: antecedence and Multiple Spell-Out 58–9; barriers 94–5, 99–100, 102; Basque 87; cliticization 56; expletive constructions 28–32, 34–7, 40–1; integrals 183–4; Multiple Spell-Out 11, 51, 52, 56, 58–9, 60; pro 105; thetic (stage-level) and categorical (individual level) predication 218–19; wh-movement 107–8, 109–10, 111 Agreement Criterion 60 Altube, S. 88 anaphora 167–73 Anchoring Corollary (AC) 230–4 antecedence: Multiple Spell-Out 11, 56–60; reprojection 131 Antecedent Contained Deletion (ACD) 281, 284
arguments: Basque 87–8, 91; determiners 116–19; dimensions 284, 308–10; infinite regress 282–3; ordering 13; reprojection 131–2, 134–5; thematic roles 277; thetic (stage-level) and categorical (individual level) predication 212, 214–15 associate (A) 28–32, 34–7, 40–1, 126–7 Assumption One 270, 272, 274–6 Assumption Two 270, 273–4, 274–6 asymmetry condition 73 atomism 15–18, 20, 284–5, 293; internal make-up of events 274; sub-events 272–3; thematic relations 266 Attract: barriers 96–7, 100; expletive constructions 30–2; Merge 52; reprojection 117 Atutxa, A. 309 Authier, J.-M. 190 bare output conditions 95–6, 148 bare phrase structure (BPS) 10; determiners 117; Linear Correspondence Axiom 45–6, 68; reprojection 134; successive cyclicity 137 barriers 86, 91–105, 109–14 Basque 86–114, 166, 170–1 be 192–211 Bello, Andrés 253 Benveniste, E. 192, 194 binary quantifiers: ordering 13, 14; reprojection 122, 123, 126–9, 131, 134–5 binding: clitics 158–9; dynamic 133–4, 135; L-relatedness 156, 157; parataxis and hypotaxis 255, 265 Binding Condition B 158–9, 161 Bleam, T. 302, 305 body plans 26–7 Bonet, E. 215 Bouchard, D. 162
bound variable binding 255, 265 bounding nodes 136 Bounding Theory 24 Bowers, J. 214 BPS see bare phrase structure Bresnan, J. 98, 152, 153 Brody, M. 109, 110 Brown, J. 33 Burge, T. 188–9, 224–5, 240–1, 248, 249 Burzio, Luigi 34, 170 c-command 73, 76, 78–9, 84–5 calculus of variations 148–9 Carlson, G. 213 cascades: command paths 153; computational complexity 12; Linear Correspondence Axiom 48–9; Multiple Spell-Out 10–12, 51–8, 64; reprojection 124–5 Case: anaphora 167; barriers 94–5, 105–8; Basque 86–7; clitics 161, 169–73; government 24–5; Government and Binding 23; L-relatedness 156–7; Multiple Spell-Out 11–12, 59–60; obviation 165–7; parasitic gap constructions 74; parataxis and hypotaxis 260; reprojection 119, 128–9; thetic (stage-level) and categorical (individual level) predication 217, 218–20; uninterpretable features 163–5 Case features 23–4, 157 Castillo, Juan Carlos 15, 136–46, 301 Catalan 215 categorical-predication (individual-level predication ) 212–34 categories 18–20; warps 288–317 CATEGORIES 212, 218–20, 230–4 Cattell, R. 69 causal relations 292–3 CED see Condition on Extraction Domains center embedding 38 chains 6; Condition on Extraction Domains 72, 84–5; copy theory 73; fusion 170–3; Last Resort Condition 149–50; Minimal Link Condition 96; parasitic gap constructions 76; reprojection 117, 121, 123–5, 129, 130–1, 134 Chamorro 107–8, 111 change potential 302, 309 Checking Convention 164–5, 169 Chierchia, G. 221, 225 Chinese 107, 238–9, 245, 246, 248 Chomsky, Noam 1; A-positions 104;
associates 126; Attract 117; bare phrase structure 10, 68; Barriers framework 91; being and having 192–3; bounding nodes 136; Case 106, 163, 165; chains 123; closeness 112; complementizers 258–9; computational complexity 12; distance 52–3; economy 38, 96; L-relatedness 101; Last Resort Condition 149–50; Linear Correspondence Axiom 45, 152; Merge 50, 52, 137–8; “mind plan” 26–7; minimal domain 230; Minimal Link Condition 96, 112; Minimalist Program 9, 22–8, 34; Multiple Spell-Out 51, 58, 65, 145; names 243; numerations 103; obviation 151; optimality 29, 147–8; parasitic gap constructions 74, 82, 85; parataxis and hypotaxis 253, 263; topics 217–18; wh-islands 112 Chung, S. 107, 213 Cinque, G. 54, 153, 289 classifiers 238–9, 240, 244, 248 clausal typing 144–5 clitic climbing 273–4 clitic doubling 168–9 clitics 158–62; anaphora 168–73; Multiple Spell-Out 55–6, 64; parataxis and hypotaxis 263–5; thetic (stage-level) and categorical (individual level) predication 215–16 closeness 112 command: distance 52–3; Linear Correspondence Axiom 46–8; Multiple Spell-Out 10–12, 54, 64, 151–5; obviation 151 command units (CUs): Linear Correspondence Axiom 46–9; locality 155; merger 152–4; Multiple Spell-Out 49–58, 63–5 como (Spanish) 253–65 complementizers: Basque 88–9; parataxis and hypotaxis 255–65; tuck-in 15 complements: Basque 88–90; Condition on Extraction Domains 68, 72; determiners 116–17; hypotaxis 253; Multiple SpellOut 51, 53, 54, 69; parataxis and hypotaxis 255–65 Compositionality Thesis 301 computational complexity 8–9, 12 concatenation algebras 1–2, 3, 291, 294–6 Condition C 65 Condition on Extraction Domains (CED): cyclicity 66–85; Multiple Spell-Out 59–64; reprojection 120, 131; successive cyclicity 139
connectionist networks 23–4 constraint violations 23 constraints 24 context 223–9, 231, 273–4 Control theory, government 24 convergence 6, 12, 72, 96, 104 copy deletion 62–4 copy theory of movement 67, 70–2, 73–84 copy traces 5 copying, possession 202 coreference 167–70, 173–4 count nouns 301, 303–6, 315 counterfactual identity statements 235–52 counterparts 251 CUs see command units cyclicity 5–6, 8–9; barriers 98–9, 101–4; computational complexity 12; Condition on Extraction Domains 70–2; copy theory 76–7, 79–85; extraction domains 66–85; ordering 14–15; reprojection 124–5; wh-movement in Basque 114; see also successive cyclicity Danish 28, 167–8, 169–70 Davidson, D. 215, 253 DE see definiteness effect De Hoop, H. 214, 216, 217 decompositional approach 20, 266, 272–3, 284–5, 293 Deep-structure (D-structure): Government and Binding model 5–6; levels of representation 2; Minimalist Program 9, 26; representational systems 5–6; thetic (stage-level) and categorical (individual level) predication 214 definite descriptions: anaphora 167; obviation 151, 165–6; reprojection 125–6, 127–8; split constructions 122–3 definiteness effect (DE) 122–3; Basque 89–90; integrals 181, 184–7; reprojection 126–7, 128; thetic (stage-level) and categorical (individual level) predication 217 demonstratives: names 239, 240–1, 248; reprojection 127–8; split constructions 122–3 “derivation”, definition 2 derivational blocks 151–4 derivational entropy 29–33 derivational horizons 103, 149–50 derivational systems 2–20 determiners 115–35; cliticization 55–6; names 240–1, 248, 250 Diesing, M. 128, 214, 215–16, 219
dimensions 18–20, 266–87, 288–317 Disjoint Reference Rule 166 disjuncts 280–3, 284 distance 52–3 Distributed Morphology 98, 99 Doherty, C. 214, 218 Drury, J. 58, 138 Dutch 216 dynamic binding 133–4, 135 dynamically split model 98, 152–3; A-movement 101–2; Linear Correspondence Axiom 48, 54, 57
economy 25–6; barriers 98; Condition on Extraction Domains 72; elegance 147–8; entropy 32, 33; Linear Correspondence Axiom 45–6, 52; MP 34, 37–40, 96; numerations 103 ECP see Empty Category Principle elegance 147–75 ellipsis, infinite regress 281–2 Elsewhere Condition 208, 312–13 Empty Category Principle (ECP) 25–6, 27, 33 English: Case 166; expletive constructions 28–9, 35–7; integrals 180, 185–6; L-relatedness 104; names 239; nouns 305, 306–7; possession 193–4, 196–8; thetic (stage-level) and categorical (individual level) predication 226–7 Enquist, B. 33 entailments 268; warps 288–317 entropy condition 29–33, 37–41 EPC see external poss construction Epstein, S. D. 9–10, 47, 289 EST see Extended Standard Theory events, accordion 268–71, 272, 274 evolution 147–8 Exclusivity Hypothesis 313–14, 315 existentials 126–7, 182–7 expletives 34–42; definiteness effects 185–6; MP 28–32; reprojection 126; sideward movement 82 Extended Projection Feature 206 Extended Projection Principle 24, 108, 113, 129 Extended Standard Theory (EST) 6 Extension Condition 137–8, 258–9 external arguments 13, 116–17, 118–19, 122–3 external poss construction (EPC) 189–91 extraction: barriers 94–5; cyclicity 66–85; see also Condition on Extraction Domains
familiarity/novelty condition 59, 64–5 feature attraction: barriers 96–101, 102, 105–7, 113, 114; Case 164–5 feature checking 23, 160–5 FF-bags 164–5, 174 focalization 54, 110–11, 233 focus projection 153 Fodor, Jerry 217, 233; accordion events 269, 270; atomism 293; modularity 23, 299; representational theory of mind 3; sub-events 271, 273, 276 Freeze, R. 201, 203, 205 Freidin, R. 22 French 189–91 Fukui, N. 33, 91, 148, 149 Full Interpretation 6 Galician 55–6, 113, 168–9 GB see Government and Binding Generalized Transformation 259, 261–2 generics, reprojection 127–8 glitches 9 Goedel 297, 300 Gould, Stephen Jay 48, 147 Government and Binding (GB) model: comparison to MP 22–8; expletive constructions 28–32; last resort 33; modularity 22–4; representation 5–6 Greed 26, 117, 119 Haken, H. 149 Hale, K. 143, 285 Halle, M. 98 have 192–211 heavy categories, barriers 99, 100 Heavy NP shift 35 Hebrew 35–7 Heim, I. 128 Herburger, E. 217, 220–1 hierarchies: dimensions 266–8, 283–4; warps 288–317 Higginbotham, James 1, 68, 165, 186–7; names 240; thetic (stage-level) and categorical (individual level) predication 213, 215, 220, 222, 224–5 Holmberg, Anders 34 Honcoop, M. 121, 122, 123, 126, 133 horizontal syntax 288–317 Horn, L. 314–15 Hornstein, Norbert 14, 16, 74, 115–35, 179–91 Huang, C.-T. J. 120 Hungarian 109–12, 180, 226–7
hypotaxis 253–65 Hypothesis A 272–3, 274, 285 Hypothesis B 272–3, 274, 285
Iatridou, S. 214 identity 84 identity statements, counterfactual 235–52 II see integral interpretation impenetrability 12 Incompleteness Theorem 300 incorporated quantifiers 125–6 incorporation, reprojection 125–6 indefinite descriptions 238 individual-level predicates 212–34 infinite regress 280–3, 284 inflections, checking 159–60 integral interpretation (II) 179–91 intelligibility 7 internal arguments 13, 116–17, 118–19, 122–3 internal events 271–6 internal poss construction (IPC) 189–91 interpretability, antecedence 57 IPC see internal poss construction Irish, small clauses 218–20 “is-a” relation 2 islands: impenetrability 12; MP 86; reprojection 14, 120–5, 129–31, 133–4, 134; successive cyclicity 15, 142–6; see also Condition on Extraction Domains Italian 94–5, 108, 172 iteration 267–8, 283 Jackendoff, R. 98, 152, 293, 298 Jaeggli, O. 100 Japanese 100–1, 107 Jean, Roger 39 Johnson, D. 28, 33–4, 35–42 Joshi, A. 136 Judgement Principle 233–4 Kaisse, E. 55 Kayne, Richard 1; being and having 192–211; Linear Correspondence Axiom 45–6, 62, 67, 120; Multiple Spell-Out 51; relational terms 16–17, 256–7, 259; small clauses 180–1; thetic (stage-level) and categorical (individual level) predication 226–7 Keenan, E. 186, 187 Keyser, S. J. Kim, S. W. 125–6 kind plurals 128 kind-denoting plurals 122–3
Kiss, K. E. 110, 111 Korean 125–6 Kratzer, A. 214–15, 216 Kroch, A. 136 Kupin, J. 23 Kuroda, Y. 65, 212, 216, 217, 218 L-relatedness: A-movement 101–4; Case 106, 107–8; locality 155–7; wh-movement 111–12, 113, 114 L-relatedness Lemma 101–2, 104, 113 labels 115–35, 152 Laka, I. 90, 91, 93, 94, 260 Langendoen, T. 280, 284 language acquisition 311–15 Language Acquisition Device (LAD) 312–15 Lappin, S. 28, 33–4, 35–42 Larson, R. 118 Lasnik, Howard 1, 23, 25–6, 98, 152, 166 last resort 33, 37–8; Condition on Extraction Domains 84; copy theory and parasitic gap constructions 77, 78, 79; Multiple Spell-Out 53; optimality 147–8 Last Resort Condition (LRC) 149–50, 171 Law of the Conservation of Patterns 14 LCA see Linear Correspondence Axiom Least Action 33 Lebeaux, D. 55, 152, 168 legibility conditions 6, 7, 66 levels of representation 1–2, 6, 8–9, 10 Levine, R. 28, 33–4, 35–42 Lewis, D. 251 LF: A-movement 101–2; Case 25; command 56–7, 64, 151–5; as a level 174; levels of representation 6; Linear Correspondence Axiom 49; Minimalist Program 9–10; Multiple Spell-Out 11, 56–7, 64, 151–5; reprojection 119–25, 134–5; thetic (stage-level) and categorical (individual level) predication 216–21, 223, 225–6, 234 Lightfoot, D. 147 Lindemayer, Aristide 27 Linear Correspondence Axiom (LCA): A-movement 102; computational complexity 12; Condition on Extraction Domains 67, 68–9; copy theory 73; Multiple Spell-Out 10, 45–52, 62, 151–2; reprojection 120 linearization 45–54; command paths 152–3; Condition on Extraction Domains 68–72; copy theory of movement 73, 84; parasitic gap constructions 76, 78–9
Linearize 68–71, 76 Local Association 137–8 Local Binding Condition 172, 174 local derivational horizons 29 locality: CED 66–7; expletive constructions 30; obviation 155–7, 165–7, 173–4; relational terms 17 LRC see Last Resort Condition McCloskey, J. 213 Madurese 145 manifolds 242–3, 244–5, 248, 250, 310, 314 mapping hypothesis 128 Marantz, Alec 82, 98, 102–3 Martin, R. 29 Martins, A. 260 mass nouns 301, 303–7, 315 mass term constructions 188–9 matching, Case 164 May, R. 225 Merge 24; arguments 119; Condition on Extraction Domains 68, 69–70; cyclicity 102; Linear Correspondence Axiom 45–8; Multiple Spell-Out 49–53, 60–2; parasitic gap constructions 74; successive cyclicity 136, 137–8, 140–3, 145–6 merger 151–3, 255 Milsark, G. 212, 216, 217, 234 mind plans 26–7 Minimal Domain 160–1, 230 Minimal Link Condition (MLC) 53, 96; Condition on Extraction Domains 67; locality 155; possession 199, 203; whislands 112–13 Minimalist Program (MP) 9–10, 22–42; Basque movements 86–114; Condition on Extraction Domains 66–7; cyclic ordering 14–15; as derivational system 290; economy 25–6, 33, 34, 96; elegance 147–75; government 24–5; islands 86; Linear Correspondence Axiom 45–52; mind plan 26–7; modularity 22–4; Multiple Spell-Out 48–52; representation 5–6; successive cyclic movement 136; thetic (stage-level) and categorical (individual level) predication 214 Mitxelena, L. 92 MLC see Minimal Link Condition modeling, names 245, 250–1 modes 235–52, 276 modification: infinite regress 280–3; sub-events 271–3 modularity thesis 22–4, 299–300 Moll, A. 264
monadic event predicates 277 Mongol 194 monostrings 23–4 Mori, N. 275, 304, 308–10 Move: Attract 96–7; barriers 105; Condition on Extraction Domains 84; copy theory 73–4; determiners 117, 119; successive cyclicity 138, 141–3, 145–6 MP see Minimalist Program MSO see Multiple Spell-Out Multiple Spell-Out (MSO) 10–12, 45–65; A-movement 101–4; barriers 97–100; Basque movements 86–114; Case 105–7; command 151–5; Condition on Extraction Domains 67, 68–73; islands 86; Linear Correspondence Axiom 68–9; parasitic gap constructions 74–8; tucking in and successive cyclicity 145 Munn, A. 74 Muromatsu, K. 169, 301, 304, 311 names: counterfactual identity statements 235–52; obviation 151; reprojection 125–6, 127–8; rigidity 314–15; split constructions 122–3; thetic (stage-level) and categorical (individual level) predication 221–3 negation 133–4, 135 negative polarity item (NPI) 121, 124, 133–4 Neo-Davidsonians 17, 283–4 Nespor, M. 55 noncomplements: Condition on Extraction Domains 68; Multiple Spell-Out 51–2, 53–4, 58–64 Norwegian 28–9 Noun-incorporation 125–6 nouns: count/mass distinction 301, 303–7, 315; dimensions 301–11; names 244, 245–51; relational terms 15–18 NPI see negative polarity item numerations 103–4 Nunes, Jairo 62, 66–85 obviation 151–7, 163, 165–7, 172, 173–5 ontology 192–211, 291–2, 295–6, 301–4 optimality 29–33, 37–42, 147–8; Linear Correspondence Axiom 47–8; representational vs derivational systems 8, 9–10 optimality races 96, 102–4 Optimality Theory (OT) 5, 23–4, 32–3, 37–42 ordering 12–15, 116 Ortiz de Urbina, J. 88–90, 92
OT see Optimality Theory Otero, C. 55 parasitic gap constructions 66–7, 73–84 parataxis 253–65 participial agreement 28–32, 34–7 participial (P) expressions 28–32, 34–7, 40–1 performance 8, 51, 55–6, 57–8, 64–5 Perlmutter, D. 158 PF: A-movement 101–2; cliticization 64; command paths 151–4; copy theory 73, 76; expletive expressions 41; levels of representation 6, 174; Linear Correspondence Axiom 45–9, 50, 68–9; Minimalist Program 9–10; Multiple Spell-Out 10–11, 54–6, 62, 64, 151–4; wh-movement in Basque 91–2 phonological components: copy theory 73, 76, 78, 81; Linear Correspondence Axiom 68–9; Multiple Spell-Out 54–6 phrase-markers: adjuncts 279–80; command paths 152–4; levels of representation 2; linearization 45–54; sub-events 277 Phrase-structure Grammar 289 Pica, P. 167 pied-piping 63–4, 96, 97, 183 Pietroski, Paul 19, 266–87 pleonastics 41–2, 126–7 plurality markers 159–60 porous modularity 298–300, 302–4 Portuguese 169 possession 193–211; clitic doubling 168–9; dimensionality 310–11; integrals 180–91; parataxis and hypotaxis 255–8; relational terms 16–18 possessor raising 201–3 Postal, P. 165, 280, 284 PREDICABLES 218–20, 230–4 predicates: accordion events 269–71; determiners 116, 119, 132; dimensions 283–4; names 240, 248–50; reprojection 132; sub-events 272–4, 275–6, 277–8; thetic (stage-level) and categorical (individual level) predication 212–34 primitives 289–90, 294–5 Prince, A. 23 Principle of Strict Cyclicity 64 pro: barriers 97, 99–101; Case 105, 108; clitic doubling 169–70; parataxis and hypotaxis 259–63; thetic (stage-level) and categorical (individual level) predication 228–9; wh-movement in Basque 91
Richards, N. 136, 137, 138 rigidity 235–52, 314–15 Rizzi, L. 94–5, 99, 108, 109, 172 Romero, Juan 82, 102–3 Rosen, Sara 16, 179–91 rule application 8, 15 rule ordering 5, 8, 12–15
pro-drop: barriers 99; Basque 87–8, 91; Case 105; Extended Projection Principle 108; GB 29; subject extraction 95; wh-movement in Hungarian 109–10 PRO 215–16 projections 115–35; bare Phrase-structure 10; Multiple Spell-Out 50–2; see also reprojection pronouns: coreference 167–70; obviation 165–6; pro-drop languages 105; resumptive 107; sub-events 273 prosodic phrasing 54–6 Pustejovsky, J. 271, 285 QI islands see quantifier induced islands QR see Quantifier Raising quantifier induced (QI) islands 121–5, 129–31, 133–4 Quantifier Raising (QR): infinite regress 281–2; ordering 13–14; reprojection 129–31, 134–5; thetic (stage-level) and categorical (individual level) predication 217, 228–9 quantifiers 115–35; binary 13–14; dimensions 304–7; names 240–4, 247; thetic (stage-level) and categorical (individual level) predication 220, 226–9, 234 que (Spanish) 253–65 questions 86–114, 233 R-predication 189–91 Raposo, E. 171, 213, 218 reconstruction, reprojection 128–9 recursion 268, 283 reference: counterfactual identity statements 235–6; names 242–5, 247–9; possession 204–10; relational terms 16, 196–207; representation 3; see also coreference regress, infinite 280–3 Reid, T. 193–4 Reinhart, T. 168, 173, 174 Relation R 151, 154, 173–4; integrals 179–91; possession 196–205, 208–9, 211 relational terms 15–18, 196–207, 256 representation 1–20 representational theory of mind 3–4 reprojection 14, 117–35 resumptive pronouns 107 Reuland, E. 168, 174 rheme 212–34 rho 2, 3
Safir, K. 29, 100 Sag, I. 217, 233 Saito, M. 25–6 SC see small clauses Schein, B. 225 Schmitt, C. 216 scope: disjuncts 280; possession 211; thetic (stage-level) and categorical (individual level) predication 217–20, 223, 225–9, 234 Scope Principle 225 Second Law of Thermodynamics 32–3, 39 Seely, D. 289 Segal, G. 118 semantics: categorization 288–317; dimensions 298–304, 316; dynamic binding 133–4, 135; Multiple Spell-Out 11–12; thetic (stage-level) and categorical (individual level) predication 220–3, 231 sentential dependencies 253–65 set-theoretic objects 3, 5, 23, 50 SI see standard interpretation sideward movement 67, 73–84 Sigma 260–3 Slaving Principle 32–3, 149–50 small clauses: integral interpretation 179–91; names 242, 250; parataxis and hypotaxis 256; possession 198, 199–200, 209; relational terms 17; thetic (stage-level) and categorical (individual level) predication 212–34 smallest-number-of-steps-condition 39 Smolensky, P. 23 Spanish: anaphora 171–3; clitics 157, 158–62, 164–5; expletive constructions 28–9; nouns 305–7; parataxis and hypotaxis 253–65; plurality markers 159–60; possession 17, 194–6, 197–8, 200–5, 209–10; thetic (stage-level) and categorical (individual level) predication 213, 221–2, 224, 226–9, 231–4; V2 effect 90, 92; wh-islands 112, 113 Speas, M. 91 specifiers: barriers 96–104, 109–10; Case 105–7; determiners 116–17; Multiple Spell-Out 51; wh-movement in Basque 91–5, 113–14
split constructions: dynamic binding 133–4; QI islands 121–4; reprojection 129–31, 135 stage-level predicates 212–34 standard interpretation (SI) 179–91 Stowell, T. 213, 255 strong determiners 115–16, 117–19, 125–6, 128 strong quantifiers 126–7, 130 structural complexity 300–4 Sub-case Principle 312–13, 315 sub-events 271–6 subject extraction 94–5, 109–10 subjects: Basque wh-movement 94–5; Condition on Extraction Domains 66–7, 68–9, 70; copy theory 77–8; thetic (stage-level) and categorical (individual level) predication 217–18, 220–3, 226–30 successive cyclicity 14–15, 48–9, 136–46 Surface-structure (S-structure) 2, 5–6, 9, 48 Swedish 28–9 symbols 4, 6–8, 9–12 syntactic objects 2, 49–50, 53–4, 57 Szabolcsi, Anna 16–17, 180–1, 192–211, 226–7, 256–7, 259 TAG 136–7 TAG see Tree-Adjoining Grammars Tenny, C. 270 Thematic Criterion 171 thematic relations 104, 212–34, 266–87, 308–10 there constructions, integrals 181–2, 184–7 Theta-Criterion 77 theta-relations: copy theory 77, 84; MP 26; reprojection 130–1; successive cyclicity 143 θ-roles: determiners 116, 118–19; dimensions 278–9, 283–4; L-relatedness 156–7; Multiple Spell-Out and noncomplements 59–60; successive cyclicity 140 theta-theory: determiners 118–19; reprojection 134 thetic-predication (stage-level predication) 212–34 topic 217–20, 223, 227–8, 229, 230–1, 233–4 Torrego, Esther 90, 92, 112, 253–65 tough-constructions 261 trace copies 62–4 trace deletion 73, 76, 78–9 traces: Basque wh-movement 94–5; Case
105–8; Condition on Extraction Domains 71 Transparency Condition 165–6, 167–8 Transparency Thesis 11–12 tree splitting 137, 140 Tree-Adjoining Grammars (TAG) 15, 136–7, 138–40, 146 tucking in 15, 136–46 Tunica 194 Turkish 194 type-shifting 277–8 unary determiners 118, 122–3, 131–2 unary quantifiers 126–8 uninterpretable features 159–60, 163–5, 170 Vai 194 Verb second (V2) effect 88–93 verbs: classification of 290–1; dimensions 304–11; relational terms 15–18 Vergnaud, J.-R. 189–91 vertical syntax 15–18, 288–317 Visibility Condition 219 Vogel, I. 55 warps 19, 288–317 Weak Cross-over 11, 64–5, 154–5 weak determiners 115–16, 118, 125–6, 128 weak quantifiers 122, 220, 234 West, G. 33 wh-chains 76 wh-copies 75–7, 78, 79–81 wh-elements: Basque 89–91, 94; Case 105–8; Condition on Extraction Domains 66–7, 70–1; copy theory and parasitic gap constructions 77–8; distance 52–3; successive cyclicity 141–2 wh-extraction 264–5 wh-features: Case 105–8; Condition on Extraction Domains 70–1, 72; Multiple Spell-Out 53–4, 59–64; parasitic gap constructions 75–6; successive cyclicity 136, 139, 141–2, 146; wh-islands 112; wh-movement in Chamorro 111 wh-islands 15, 112–13, 142–5 wh-movement: Basque 86–95, 109–11, 113–14; Case 105–8; Condition on Extraction Domains 70–1, 72; cyclic ordering 15; Hungarian 109–12; Multiple Spell-Out 59–64; parasitic gap constructions 75–81; successive cyclicity 141–5, 146; wh-islands 112–13
wh-phrases: Basque 88–95; Case 105–8; copy theory and parasitic gap constructions 77–8; wh-islands 112–13; wh-movement in Hungarian 109–11
wh-traces 76, 105–8, 109 Zubizarreta, M. 189–91 Zwart, J.-W. 22