LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
This book was originally selected and revised to be included in the World Theses Series (Holland Academic Graphics, The Hague), edited by Lisa L.-S. Cheng.
DAVID LEBEAUX NEC Research Institute
JOHN BENJAMINS PUBLISHING COMPANY PHILADELPHIA/AMSTERDAM
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data

Lebeaux, David.
  Language acquisition and the form of the grammar / David Lebeaux.
    p. cm.
  Includes bibliographical references and index.
  1. Language acquisition. 2. Generative grammar. I. Title.
  P118.L38995 2000
  401′.93--dc21          00-039775

ISBN 90 272 2565 6 (Eur.) / 1 55619 858 2 (US)
© 2000 – John Benjamins B.V. No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher. John Benjamins Publishing Co. • P.O.Box 75577 • 1070 AN Amsterdam • The Netherlands John Benjamins North America • P.O.Box 27519 • Philadelphia PA 19118-0519 • USA
Table of Contents

Acknowledgments . . . xi
Preface . . . xiii
Introduction . . . 1

Chapter 1. A Re-Definition of the Problem . . . 7
  1.1 The Pivot/Open Distinction and the Government Relation . . . 7
    1.1.1 Braine's Distinction . . . 7
    1.1.2 The Government Relation . . . 9
  1.2 The Open/Closed Class Distinction . . . 11
    1.2.1 Finiteness . . . 13
    1.2.2 The Question of Levels . . . 14
  1.3 Triggers . . . 16
    1.3.1 A Constraint . . . 16
    1.3.2 Determining the base order of German . . . 17
      1.3.2.1 The Movement of NEG (syntax) . . . 24
      1.3.2.2 The Placement of NEG (Acquisition) . . . 26

Chapter 2. Project-α, Argument-Linking, and Telegraphic Speech . . . 31
  2.1 Parametric variation in Phrase Structure . . . 31
    2.1.1 Phrase Structure Articulation . . . 31
    2.1.2 Building Phrase Structure (Pinker 1984) . . . 32
  2.2 Argument-linking . . . 38
    2.2.1 An ergative subsystem: English nominals . . . 41
    2.2.2 Argument-linking and Phrase Structure: Summary . . . 45
  2.3 The Projection of Lexical Structure . . . 47
    2.3.1 The Nature of Projection . . . 51
    2.3.2 Pre-Project-α representations (acquisition) . . . 56
    2.3.3 Pre-Project-α representations and the Segmentation Problem . . . 60
    2.3.4 The Initial Induction: Summary . . . 65
    2.3.5 The Early Phrase Marker (continued) . . . 66
    2.3.6 From the Lexical to the Phrasal Syntax . . . 75
    2.3.7 Licensing of Determiners . . . 84
    2.3.8 Submaximal Projections . . . 86

Chapter 3. Adjoin-α and Relative Clauses . . . 91
  3.1 Introduction . . . 91
  3.2 Some general considerations . . . 93
  3.3 The Argument/Adjunct Distinction, Derivationally Considered . . . 94
    3.3.1 RCs and the Argument/Adjunct Distinction . . . 94
    3.3.2 Adjunctual Structure and the Structure of the Base . . . 98
    3.3.3 Anti-Reconstruction Effects . . . 102
    3.3.4 In the Derivational Mode: Adjoin-α . . . 104
    3.3.5 A Conceptual Argument . . . 110
  3.4 An Account of Parametric Variation . . . 112
  3.5 Relative Clause Acquisition . . . 120
  3.6 The Fine Structure of the Grammar, with Correspondences: The General Congruence Principle . . . 126
  3.7 What the Relation of the Grammar to the Parser Might Be . . . 136

Chapter 4. Agreement and Merger . . . 145
  4.1 The Complement of Operations . . . 146
  4.2 Agreement . . . 149
  4.3 Merger or Project-α . . . 154
    4.3.1 Relation to Psycholinguistic Evidence . . . 154
    4.3.2 Reduced Structures . . . 157
    4.3.3 Merger, or Project-α . . . 165
    4.3.4 Idioms . . . 169
  4.4 Conclusion . . . 181

Chapter 5. The Abrogation of DS Functions: Dislocated Constituents and Indexing Relations . . . 183
  5.1 "Shallow" Analyses vs. the Derivational Theory of Complexity . . . 184
  5.2 Computational Complexity and The Notion of Anchoring . . . 188
  5.3 Levels of Representation and Learnability . . . 192
  5.4 Equipollence . . . 199
  5.5 Case Study I: Tavakolian's results and the Early Nature of Control . . . 203
    5.5.1 Tavakolian's Results . . . 204
    5.5.2 Two Solutions . . . 207
    5.5.3 PRO as Pro, or as a Neutralized Element . . . 208
    5.5.4 The Control Rule, Syntactic Considerations: The Question of C-command . . . 213
    5.5.5 The Abrogation of DS functions . . . 220
  5.6 Case Study II: Condition C and Dislocated Constituents . . . 224
    5.6.1 The Abrogation of DS Functions: Condition C . . . 226
    5.6.2 The Application of Indexing . . . 229
    5.6.3 Distinguishing Accounts . . . 234
  5.7 Case Study III: Wh-Questions and Strong Crossover . . . 239
    5.7.1 Wh-questions: Barriers framework . . . 240
    5.7.2 Strong Crossover . . . 242
    5.7.3 Acquisition Evidence . . . 245
    5.7.4 Two possibilities of explanation . . . 248
    5.7.5 A Representational Account . . . 249
    5.7.6 A Derivational Account, and a Possible Compromise . . . 251

References . . . 259
Index . . . 273
There are two ways of painting two trees together. Draw a large tree and add a small one; this is called fu lao (carrying the old on the back). Draw a small tree and add a large one; this is called hsieh yu (leading the young by the hand). Old trees should show a grave dignity and an air of compassion. Young trees should appear modest and retiring. They should stand together gazing at each other. Mai-mai Sze The Way of Chinese Painting
Acknowledgments
This book had its origins as a linguistics thesis at the University of Massachusetts. First of all, I would like to thank my committee: Tom Roeper, for scores of hours of talk, for encouragement, and for his unflagging conviction of the importance of work in language acquisition; Edwin Williams, for the example of his work; Lyn Frazier, for an acute and creative reading; and Chuck Clifton, for a psychologist's view. More generally, I would like to thank the faculty and students of the University of Massachusetts, for making it a place where creative thinking is valued. The concerns and orientation of this book are very much molded by the training that I received there. Further back, I would like to thank the people who got me interested in all of this in the first place: Steve Pinker, Jorge Hankamer, Jane Grimshaw, Annie Zaenen, Merrill Garrett and Susan Carey. I would also like to thank Noam Chomsky for encouragement throughout the years. Since the writing of the thesis, I have had the encouragement and advice of many fine colleagues. I would especially like to thank Susan Powers, Alan Munn, Cristina Schmitt, Juan Uriagereka, Anne Vainikka, Ann Farmer, and Ana-Teresa Perez-Leroux. I am also indebted to Sandiway Fong, as well as Bob Krovetz, Christiane Fellbaum, Kiyoshi Yamabana, Piroska Csuri, and the NEC Research Institute for a remarkable environment in which to pursue the research further. I would also like to thank Mamoru Saito, Hajime Hoji, Peggy Speas, Juergen Weissenborn, Clare Voss, Keiko Muromatsu, Eloise Jelinek, Emmon Bach, Jan Koster, and Ray Jackendoff. Finally, I would like to thank my parents, Charles and Lillian Lebeaux, my sister, Debbie Lebeaux, and my sons, Mark and Theo. Most of all, I would like to thank my wife Pam, without whom this book would have been done badly, if at all. This book is dedicated to her, with love.
Preface
What is the best way to structure a grammar? This is the question that I started out with in the writing of my thesis in 1988. I believe that the thesis had a marked effect in its answering of this question, particularly in the creation of the Minimalist Program by Chomsky (1993) a few years later. I attempted real answers to the question of how to structure a grammar, and the answers were these:

(i) In acquisition, the grammar is arranged along the lines of subgrammars. These grammars are arranged so that the child passes from one to the next, and each succeeding grammar contains the last. I shall make this clearer below.

(ii) In addition, in acquisition, the child proceeds to construct his/her grammar from derivational endpoints (Chapter 5). From the derivational endpoints, the child proceeds to construct the entire grammar. This may be forward or backward, depending on what the derivational endpoint is. If the derivational endpoint, or anchorpoint, is DS, then the construction is forward; if the derivational endpoint or anchorpoint is S-structure or the surface, then the construction proceeds backwards.

The above two proposals were the main proposals made about the acquisition sequence. There were many proposals made about the syntax. Of these, the main architectural proposals were the following.

(iii) The acquisition sequence and the syntax — in particular, the syntactic derivation — are not to be considered in isolation from each other, but rather are tightly yoked. The acquisition sequence can be seen as the result of derivational steps or subsequences (as can be seen in Chapters 2, 3, and 4). This means that the acquisition sequence gives unique purchase onto the derivation itself, including the adult derivation.

(iv) Phrase structure is not given as is, nor is it derived top-down, but rather is composed (Speas 1990). This phrase structure composition (Lebeaux 1988) is not strictly bottom-up, as in Chomsky's (1995) Merge, but rather
(a) involves the intermingling of units, (b) is grammatically licensed, and not simply geometrical (bottom-up) in character (in a way which will become clearer below), and (c) involves, among other transformations, the transformation Project-α (Chapter 4).

(v) Two specific composition operations (and the beginnings of a third) are proposed. Adjoin-α (Chapter 3) is proposed, adding adjuncts to the basic nuclear clause structure (Conjoin-α is also suggested in that chapter). In further work, this is quite similar to the Adjunction operation of Joshi and Kroch, and the Tree Adjoining Grammars (Joshi 1985; Joshi and Kroch 1985; Frank 1992), though the proposals were arrived at independently and are not exactly the same. The second new composition operation is Project-α (Chapter 4), which is an absolutely new operation in the field. It projects open class structure into a closed class frame, and constitutes the single most radical syntactic proposal of this book.

(vi) Finally, composition operations, and the variance in the grammar as a whole, are linked to the closed class set — elements like the, a, to, of, etc. In particular, each composition operation requires the satisfaction of a closed class element, and a closed class element is implicated in each parameter.

These constitute some of the major proposals that are made in the course of this thesis. In this preface I would like both to lay out these proposals in more detail, and to compare them with some of the other proposals that have been made since the publication of this thesis in 1988. While this thesis played a major role in the coming of the Minimalist Program (Chomsky 1993, 1995), the ideas of the thesis warrant a renewed look by researchers in the field, for they have provocative implications for the treatment of language acquisition and the composition of phrase structure.

Let us start to outline the differences of this thesis with respect to later proposals, not with respect to language acquisition, but with respect to syntax. In particular, let us start with parts (iv) and (v) above: that the phrase marker is composed from smaller units. A similar proposal is made with Chomsky's (1995) Merge. However, here, unlike Merge:

(1) The composition is not simply bottom-up, but involves the possible intermingling of units.

(2) The composition is syntactically triggered, in that all phrase structure composition involves the satisfaction of closed class elements (Chapters 3 and 4), and is not simply the geometric putting together of two units, as in Merge.

(3) The composition consists of two operations among others (these are the only two that are developed in this thesis), Adjoin-α and Project-α.
With respect to the idea that all composition operations are syntactically triggered by features, let us take the operation Adjoin-α. This takes two structures and adjoins the second into the first.

(1) s1: the man met the woman
    s2: who loved him
    → Adjoin-α → the man met the woman who loved him
This shows the intermingling of units, as the second is intermeshed with the first. However, I argue here (Chapter 4) that it also shows the satisfaction of closed class elements, in an interesting way. Let us call the wh-element of the relative clause, who here, the relative clause linker. It is a proposal of this thesis that the adjunction operation itself involves the satisfaction of the relative clause linker (who) by the relative clause head (the woman), and it is this relation, which is the relation of Agreement, which composes the phrase marker. The relative clause linker is part of the closed class set. This relative clause linker is satisfied in the course of Agreement; thus the composition operation is put into a 1-to-1 relation with the satisfaction of a closed class head. (This proposal, so far as I know, is brand new in the literature.)

(2) Agree: Relative head/relativizer ↔ Adjoin-α
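A minimal computational sketch may make the correspondence concrete (the sketch is hypothetical: the tuple encoding and all names are invented for illustration, not taken from the thesis): adjunction goes through only when the adjunct supplies a closed class linker that the host's head can satisfy under Agreement.

```python
# Hypothetical sketch: Adjoin-alpha licensed by the satisfaction of a
# closed class element (the relative clause linker), not by bare geometry.

RELATIVIZERS = {"who", "which", "that"}      # closed class linkers

def adjoin_alpha(host, adjunct):
    """Adjoin `adjunct` to the NP inside `host`, provided the adjunct
    carries a closed class linker for the head to Agree with."""
    linker = adjunct[0]
    if linker not in RELATIVIZERS:
        raise ValueError("no closed class linker to satisfy; no adjunction")
    stem, head_np = host
    # Agreement(relative head, relativizer) licenses the composition.
    return (stem, (head_np, adjunct))

s1 = ("the man met", "the woman")
s2 = ("who", "loved him")
print(adjoin_alpha(s1, s2))
# ('the man met', ('the woman', ('who', 'loved him')))
```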
This goes along with the proposal (Chapter 4), which was taken up in the Minimalist literature (Chomsky 1992, 1995), that movement involves the satisfaction of closed class features. The proposal here, however, is that composition, as well as movement, involves the satisfaction of a closed class feature (in particular, Agreement). In the position here, taken up in the Minimalist literature, the movement of an element to the subject position is put into a 1-to-1 correspondence with agreement (Chapter 4 again).

(3) Agree: Subject/Predicate ↔ Move NP (Chapter 4)
The proposal here is thus more thoroughgoing than that in the minimalist literature, in that both the composition operation and the movement operation are triggered by Agreement, and the satisfaction of closed class features. In the minimalist literature, it is simply movement which is triggered by the satisfaction of closed class elements (features); phrase structure composition is done simply geometrically (bottom-up). Here, both are done through the satisfaction of Agreement. This is shown below.

(4)
                                  Minimalism                Lebeaux (1988)
    Movement                      syntactic (satisfaction   syntactic (satisfaction
                                  of features)              of features)
    Phrase Structure Composition  asyntactic (geometric)    syntactic (satisfaction
                                                            of features)
This proposal (Lebeaux 1988) links the entire grammar to the closed class set — both the movement operations and the composition operations are linked to this set. The set of composition operations discussed in this thesis is not intended to be exhaustive, merely representative. Along with Adjoin-α, which Chomsky-adjoins elements into the representation (Chapter 3), let us take the second, yet more radical phrase structure composition operation, Project-α. This is not equivalent to Speas' (1990) Project-α, but rather projects an open class structure into a closed class frame. The open class structure also represents pure thematic structure, and the closed class structure, pure Case structure. This operation, for a simple partial sentence, looks like (5) (see Lebeaux 1988, 1991, 1997, 1998 for further extensive discussion). The operation projects the open class elements into the closed class (Case) frame. It also projects up the Case information from Determiner to DP, and unifies the theta information, from the theta subtree, into the Case Frame, so that it appears on the DP node. The Project-α operation was motivated in part by the postulation of a subgrammar in acquisition (Chapters 2, 3, and 4), in part by the remarkable speech error data of Garrett (Chapter 4, Garrett 1975), and in part by idioms (Chapter 4). This operation is discussed at much greater length in further developments by myself (Lebeaux 1991, 1997, 1998). I will discuss the subgrammar underpinnings of the Project-α approach in more detail later in this preface. For now, I would simply like to point to the remarkable speech error data collected by Merrill Garrett (1975, 1980), the MIT corpus, which anchors this approach.
(5) Theta subtree (open class):
    [V [N man (agent)] [V [V see] [N woman (patient)]]]

    Case Frame (closed class):
    [VP [DP +nom [Det +nom the] [NP e]] [V′ [V see] [DP +acc [Det +acc a] [NP e]]]]

    → Project-α →

    [VP [DP +nom, +agent [Det +nom the] [NP +agent man]] [V′ [V see] [DP +acc, +patient [Det +acc a] [NP +patient woman]]]]
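The operation can also be rendered as a small unification sketch (hypothetical: the dictionary encoding and the parallel alignment of the two trees are simplifications invented for illustration): open class items from the theta subtree are slotted into the empty NP positions of the Case frame, and the theta role is unified onto the corresponding DP node.

```python
# Hypothetical sketch of Project-alpha: open class (theta) material is
# projected into a closed class (Case) frame, unifying theta roles onto DPs.

theta_subtree = [("man", "agent"), ("see", None), ("woman", "patient")]
case_frame = [{"det": "the", "case": "nom", "np": None},
              {"v": "see"},
              {"det": "a", "case": "acc", "np": None}]

def project_alpha(theta, frame):
    out = []
    for (word, role), slot in zip(theta, frame):   # aligned for simplicity
        slot = dict(slot)
        if "np" in slot:            # an open NP slot in the Case frame
            slot["np"] = word       # insert the open class item
            slot["theta"] = role    # unify the theta role onto the DP
        out.append(slot)
    return out

for node in project_alpha(theta_subtree, case_frame):
    print(node)
# {'det': 'the', 'case': 'nom', 'np': 'man', 'theta': 'agent'}
# {'v': 'see'}
# {'det': 'a', 'case': 'acc', 'np': 'woman', 'theta': 'patient'}
```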
Garrett and Shattuck-Hufnagel collected a sample of 3400 speech errors. Of these, by far the most interesting class is the so-called “morpheme-stranding” errors. These are absolutely remarkable in that they show the insertion of open class elements into a closed class frame. Thus, empirically, the apparent “importance” of open class and closed class items is reversed — rather than open class items being paramount, closed class items are paramount, and guide the derivation. Open class elements are put into slots provided by closed class elements, in Garrett’s remarkable work. A small sample of Garrett’s set is shown below.
(6) Speech errors (stranded morpheme errors), Garrett (personal communication); permuted elements underlined in the original.

    Error                             Target
    my frozers are shoulden         → my shoulders are frozen
    that just a back trucking out   → a truck backing out
    McGovern favors pushing busters → favors busting pushers
    but the clean's twoer           → … two's cleaner …
    his sink is shipping            → ship is sinking
    the cancel has been practiced   → the practice has been cancelled
    she's got her sets sight        → … sights set …
    a puncture tiring device        → … tire puncturing device …
As can be seen, these errors can only arise at a level where open class elements are inserted into a closed class frame. The insertion does not take place correctly — a speech error — so that the open class elements end up in permuted slots (e.g. a puncture tiring device). Garrett summarizes this as follows: … why should the presence of a syntactically active bound morpheme be associated with an error at the level described in [(6)]? Precisely because the attachment of a syntactic morpheme to a particular lexical stem reflects a mapping from a “functional” level [i.e. “grammatical functional”, i.e. my theta subtree, D. L.] to a “positional” level of sentence planning …
This summarizes the two phrase structure composition operations that I propose in this thesis: Adjoin-α and Project-α. As can be seen, these involve (1) the intermingling of structures (and are not simply bottom up), and (2) satisfaction of closed class elements. Let us now turn to the general acquisition side of the problem. It was said above that this thesis was unique in that the acquisition sequence and the syntax — in particular, the syntactic derivation — were not considered in isolation, but rather in tandem. The acquisition sequence can be viewed as the output of derivational processes. Therefore, to the extent to which the derivation is partial, the corresponding stage of the acquisition sequence can be seen as a subgrammar of the full grammar. The yoking of the acquisition sequence and the syntax is therefore the following:

(7) Acquisition: subgrammar approach
    Syntax: phrase structure composition from smaller units
The subgrammar approach means that children literally have a smaller grammar than the adult. The grammar increases over time by adding new structures (e.g. relative clauses, conjunctions), and by adding new primitives of the representational vocabulary, as in the change from pure theta composed speech, to theta and Case composed speech. The addition of new structures — e.g. relative clauses and conjunctions — may be thought of as follows. A complex sentence like that in (8) may be thought of as a triple: the two units, and the operation composing them (8b).

(8) a. The man saw the woman who loved him.
    b. (the man saw the woman (rooted), who loved him, Adjoin-α)
Therefore a subgrammar, if it is lacking the operation joining the units, may be thought of as simply taking one of the units — let us say the rooted one — and letting go of the other unit (plus letting go of the operation itself). This is possible and necessary because it is the operation itself which joins the units: if the operation is not present, one or the other of the units must be chosen. The subgrammar behind (8a), but lacking the Adjoin-α operation, will therefore generate the structure in (9) (assuming that it is the rooted structure which is chosen).

(9) The man saw the woman.
This is what is wanted. Note that the subgrammar approach (in acquisition) and the phrase structure composition approach (in syntax itself) are in perfect parity. The phrase structure composition approach gives the actual operation dividing the subgrammar from the supergrammar. That is, with respect to this operation (Adjoin-α), the grammars are arranged in two circles: Grammar 1 containing the grammar itself, but without Adjoin-α, and Grammar 2 containing the grammar including Adjoin-α.

(10) Grammar 1 ⊂ Grammar 2 (w/ Adjoin-α)
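The containment can be pictured with a toy sketch (hypothetical; the string concatenation merely stands in for real adjunction): a grammar is treated as a set of composition operations, and a triple like (8b) is generated in full only if its operation is in that set; otherwise the rooted unit alone survives, as in (9).

```python
# Hypothetical sketch: Grammar 1 lacks Adjoin-alpha; Grammar 2 contains it.

def adjoin_alpha(rooted, adjunct):
    return rooted.rstrip(".") + " " + adjunct + "."   # naive adjunction

def generate(triple, grammar_ops):
    rooted, adjunct, op = triple
    if op in grammar_ops:
        return op(rooted, adjunct)
    return rooted       # operation absent: the rooted unit is chosen

triple = ("The man saw the woman.", "who loved him", adjoin_alpha)
print(generate(triple, {adjoin_alpha}))  # The man saw the woman who loved him.
print(generate(triple, set()))           # The man saw the woman.
```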
The above is a case of adding a new operation. The case of adding another representational primitive is yet more interesting.
Let us assume that the initial grammar is a pure representation of theta relations. At a later stage, Case comes in. This is the hypothesis of the "layering of vocabulary": one type of representational vocabulary comes in, and does not displace, but rather is added to, another.

(11) Stage I: theta → Stage II: theta + Case
The natural lines along which this representational addition takes place are precisely given by the operation Project-α. The derivation may again be thought of as a triple: the two composing structures, one a pure representation of theta relations, and one a pure representation of Case, and the operation composing them.

(12) ((man (see woman)), (the __ (see (a __))), Project-α)

    The "see"s in the theta tree and Case frame each contain partial information, which is unified in the Project-α operation.
The subgrammar is one of the two representational units: in this case, the unit (man (see woman)). That is a sort of theta representation, or telegraphic speech. The sequence from Grammar 0 to Grammar 1 is therefore given by the addition of Project-α.

(13) Grammar 0 ⊂ Grammar 1 (w/ Project-α)
The full pattern of stage-like growth is shown in the chart below:

(14) Acquisition: Subgrammar Approach
     Add construction operations to simplified tree:  Relative clauses, Conjunction (not discussed here)
     Add primitives to representational vocabulary:   Theta → Theta + Case
As can be seen, the acquisition sequence and the syntax — syntactic derivation — are tightly yoked.

Another way of putting the arguments above is in terms of distinguishing accounts. I wish to distinguish the phrase structure operations here from Merge, and the acquisition subgrammar approach here from the alternative, which is the Full Tree, or Full Competence, Approach. (The full tree approach holds that the child does not start out with a substructure, but rather has the full tree, at all stages of development.) Let us see how the accounts are distinguished, in turn. Let us start with Chomsky's Merge. According to Merge, the (adult) phrase structure tree, as in Montague (1974), is built up bottom-up, taking individual units and joining them together, and so on. The chief property of Merge is that it is strictly bottom-up. Thus, for example, in a right-branching structure like "see the big man", Merge would first take big and man and Merge them together, then add the to big man, and then add see to the resultant.
(15) Application of Merge:
     [N′ [Adj big] [N man]]
     → [DP [Det the] [NP [Adj big] [N man]]]
     → [VP [V see] [DP [Det the] [NP [Adj big] [N man]]]]

The proposal assayed in this thesis (Lebeaux 1988) would, however, have a radically different derivation. It would take the basic structure as being the basic government relation: (see man). This is the primitive unit (unlike with Merge). To this, the the and the big may be added, by separate transformations, Project-α and Adjoin-α, respectively.
(16) a. Project-α:
        Theta subtree: [V [V see] [N man]]
        Case Frame: [V′ [V (see)] [DP [Det the] [NP e]]]
        → Project-α → [V′ [V see] [DP [Det the] [NP man]]]

     b. Adjoin-α:
        (V′ see (DP the man)) + (ADJ big)
        → Adjoin-α → (V′ see (DP the (NP big man)))
How can these radically distinct accounts (Lebeaux 1988 and Merge) be empirically distinguished? I would suggest two ways. First, conceptually, the proposal here (as in Chomsky 1975 [1955], 1957, and Tree Adjoining Grammars, Kroch and Joshi 1985) takes information nuclei as its input structures, not arbitrary pieces of string. For example, for the structure "The man saw the photograph that was taken by Stieglitz", the representation here would take the two clausal nuclear structures, shown in (17) below, and adjoin them. This is not true for Merge, which does not deal in nuclear units.

(17) s1: the man saw the photograph
     s2: that was taken by Stieglitz
     → Adjoin-α → the man saw the photograph that was taken by Stieglitz

Even more interesting nuclear units are implicated in the transformation Project-α, where the full sentence is decomposed into a nuclear unit which is the theta subtree, and the Case Frame.
(18) The man saw the woman
     Theta subtree: (man (see woman))
     Case frame: (the __ (see (a __)))
The structure in (18), the man saw the woman, is composed of a basic nuclear unit, (man (see woman)), which is telegraphic speech (as argued for in Chapter 2). No such nuclear unit exists in the Merge derivation of "the man saw the woman": that is, in the Merge derivation, (man (see woman)) does not exist as a substructure of ((the man) (saw (the woman))). This is the conceptual argument for preferring the composition operation here over Merge. In addition, there are two simplicity arguments, of which I will give just one here. The simplicity argument has to do with a set of structures that children produce which are called replacement sequences (Braine 1976). In these sequences, the child is trying to reach (output) some structure which is somewhat too difficult for him/her. To make it, therefore, he or she first outputs a substructure, and then the whole structure. Examples are given below: the first line is the first outputted structure, and the second line is the second outputted structure, as the child attempts to reach the target (which is the second line).

(19) see ball (first output)
     see big ball (second output and target)

(20) see ball (first output)
     see the ball (second output and target)
What is striking about these replacement sequences is that the child does not simply first output random substrings of the final target, but rather that the first output is an organized part of the second. Thus in both (19) and (20), what the child has done is first isolate out the basic government relation, (see ball), and then add to it: with "big" and "the", respectively. The particular simplifications chosen are precisely what we would expect with the substructure approach outlined here, and crucially not with Merge. With the substructure approach outlined here (Chapters 2 and 4), what the child (or adult) first has in the derivation is precisely the structure (see ball), shown in example (21).

(21) [V [V see] [N +patient ball]]
To this structure other elements are then added, by Project-α or Adjoin-α. Thus, crucially, the first structure in (19) and (20) actually exists as a literal substructure of the final form — line 2 — and thus could help the child in deriving the final form. It literally goes into the derivation. By contrast, with Merge, the first line in (19) and (20) never underlies the second line. It is easy to see why. Merge is simply bottom-up — it extends the phrase marker. Therefore, the phrase structure composition derivation underlying (20) line 2 is simply the following (Merge derivation).

(22) Merge derivation underlying (20), line 2:
     (N ball)
     (DP (D the) (N ball))
     (see (DP (D the) (N ball)))
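The argument that follows can be checked mechanically with a toy derivation tracker (hypothetical; tuples stand in for phrase markers, and the encoding is invented for illustration): list every constituent each derivation builds, and ask whether the child's first output, (see ball), is among them.

```python
# Hypothetical sketch: (see ball) is a literal stage of the substructure
# derivation, but never a constituent of the Merge derivation.

SEE_BALL = ("V", ("V", "see"), ("N", "ball"))

def merge_derivation():
    ball = ("N", "ball")
    dp = ("DP", ("D", "the"), ball)
    vp = ("V", ("V", "see"), dp)
    return [ball, dp, vp]              # strictly bottom-up extension

def substructure_derivation():
    theta = SEE_BALL                   # the bare government relation first
    framed = ("V'", ("V", "see"),      # Project-alpha then adds the frame
              ("DP", ("D", "the"), ("N", "ball")))
    return [theta, framed]

print(SEE_BALL in merge_derivation())         # False
print(SEE_BALL in substructure_derivation())  # True
```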
However, this derivation crucially does not have the first line of (20) — (see (ball)) — as a subcomponent. That is, (see (ball)) does not go into the making of (see (the ball)) in the Merge derivation, but it does in the substructure derivation. But this is a strong argument against Merge. For the first line of the outputted sequence of (20), (see ball), is presumably helping the child in reaching the ultimate target (see (the ball)). But this is impossible with Merge, for the first line in (20) does not go into the making of the second line, according to the Merge derivation. That is, Merge cannot explain why (see ball) would help the child get to the target (see (the ball)), since (see ball) is not part of the derivation of (see (the ball)) in the Merge derivation. It is part of the sub-derivation in the substructure approach outlined here, because of the operation Project-α.

The above (see Chapters 2, 3, and 4) differentiates the sort of phrase structure composition operations found here from Merge. This is in the domain of syntax — though I have used language acquisition argumentation. In the domain of language acquisition proper, the proposal of this thesis — the hypothesis of substructures — must be contrasted with the alternative, which holds that the child is outputting the full tree, even when the child is potentially just in the one word stage: this may be called the Full Tree Hypothesis. These differential possibilities are shown below. (For much additional discussion, see Lebeaux 1991, 1997, 1998, in preparation.)

(23)
                          Lebeaux (1988)                Distinguished From
    Syntax                phrase structure composition  Both: (1) no composition; (2) Merge
    Language Acquisition  subgrammar approach           Full Tree Approach

Let us now briefly distinguish the proposals here from the Full Tree Approach. In the Full Tree Approach, the structure underlying a child sentence like "ball" or "see ball" might be the following in (24). In contrast, the substructure approach (Lebeaux 1988) would assign the radically different representation, given in (25).
(24) Full Tree Approach:
     [IP [DP [D e] [NP e]] [TP [T e] [AgrSP [AgrS e] [AgrOP [AgrO e] [VP [DP e] [V′ [V e] [DP [D e] [NP ball]]]]]]]]
(25) Substructure Approach:
     [V [V e] [N +patient ball]]
How can these approaches be distinguished? That is, how can a choice be made between (25), the substructure approach, and (24), the Full Tree approach? I would suggest briefly at least four ways (for full argumentation, consult Lebeaux 1997, to appear; Powers and Lebeaux 1998). First, the subgrammar approach, but not the full tree approach, has some notion of simplicity in representation and derivation. Simplicity is a much used notion in science, for example in deciding between two equally empirically adequate theories. The Full Tree Approach has no notion of simplicity: in particular, it has no idea of how the child would proceed from simpler structures to more complex ones. On the other hand, the substructure theory has a strong proposal to make: the child proceeds over time from simpler structures to those which are more complex. Thus the subgrammar point of view makes a strong proposal linked to simplicity, while the Full Tree hypothesis makes none.

A second argument has to do with the closed class elements, and may be broken up into two subarguments. The first of these arguments is that, in the Full Tree Approach, there is no principled reason for the exclusion of closed class elements in early speech (telegraphic speech). That is, both the open class and closed class nodes exist, according to the Full Tree Hypothesis, and there is no principled reason why initial speech would simply be open class, as it is. That is, given the Full Tree Hypothesis, since the full tree is present, lexical insertion could take place just as easily in the closed class nodes as in the open class nodes. The fact that it doesn't leaves the Full Tree approach with no principled reason why closed class items are lacking in early speech. The second reason having to do with closed class items concerns the special role that they have in structuring an utterance, as shown by the work of Garrett (1975, 1980) and Gleitman (1990). Since the Full Tree Approach gives open and closed class items the same status, it has no explanation for why closed class items play a special role in processing and acquisition. The substructure approach, with Project-α, on the other hand, faithfully models the difference, by having open class and closed class elements initially on different representations, which are then fused (for additional discussion, see Chapter 4, and Lebeaux 1991, 1997, to appear).

A third argument against the Full Tree Approach has to do with structures like "see ball" (natural) vs. "see big" (unnatural), given below.

(26) see ball (natural and common)
     see big (unnatural and uncommon)
Why would an utterance like "see ball" be natural and common for the child — maintaining the government relation — while "see big" is unnatural and uncommon? There is a common sense explanation for this: "see ball" maintains the government relation (between a verb and a complement), while "see" and "big" have no natural relation. While this fact is obvious, it cannot be accounted for with the Full Tree Approach. The reason is that the Full Tree Approach has all nodes potentially available for use, including the adjectival ones. Thus there would be no constraint on lexically inserting "see" and "big" (rather than "see" and "ball"). On the substructure approach, on the other hand, there is a marked difference: "see" and "ball" are on a single primitive substructure — the theta tree — while "see" and "big" are not.

A fourth argument against the Full Tree Approach and for the substructure approach comes from a paper by Laporte-Grimes and Lebeaux (1993). In this paper, the authors show that the acquisition sequence proceeds almost sequentially in terms of the geometric complexity of the phrase marker. That is, children first output binary branching structures, then double binary branching, then triply binary branching, and so on. This complexity result would be unexpected with the Full Tree Approach, where the full tree is always available. This concludes the four arguments against the Full Tree Approach, and for the substructure approach in acquisition.

The substructure approach (in acquisition) and the composition of the phrase marker (in syntax) form the two main proposals of this thesis. Aside from the main lines of argumentation, which I have just given, there are a number of other proposals in this thesis. I just list them here.

(1) One main proposal which I take up in all of Chapter 5 is that the acquisition sequence is built up from derivational endpoints. In particular, for some purposes, the child's derivation is anchored in the surface, and only goes part of the way back to DS. The main example of this can be seen with dislocated constituents. In examples like (27a) and (b), exemplifying Strong Crossover and a Condition C violation respectively, the adult would not allow these constructions, while the child does.
(27) a. *Which man_i did he_i see t? (OK for child)
     b. *In John's_i house, he_i put a book t. (OK for child)
It cannot be simply said, as in (27b), that Condition C does not apply in the child's grammar, because it does, in nondislocated structures (Carden 1986b). The solution to this puzzle — and there exist a large number of similar puzzles in the acquisition literature, see Chapter 5 — is that Condition C in general applies over direct c-command relations, including at D-Structure (Lebeaux 1988, 1991, 1998), and that the child analyzes structures like (27b) as if they were dislocated at all levels of representation, thus never triggering Condition C (a similar analysis holds of Strong Crossover, construed as a Condition C type constraint at DS, van Riemsdijk and Williams 1981). That is, the child derivation, unlike the adult's, does not have movement, but starts out with the element in a dislocated position, and indexes it to the trace. This explains the lack of Condition C and Crossover constraints (shown in Chapter 5). It does so by saying that the child's derivation is shallow: anchored at SS or the surface, and the dislocated item is never treated as if it were fully back in the DS position. This is the shallowness of the derivation, anchored in SS (discussed in Chapter 5).

(2) A number of proposals are made in Chapter 2. One main proposal concerns the theta tree. In order to construct the tree, one takes a lexical entry, and does lexical insertion of open class items directly into that. This is shown in (28).
(28) man →, ← woman (lexical insertion directly into the theta subtree):
     [V [N ← man] [V [V see] [N patient ← woman]]]
This means that the sequence between the lexicon and the syntax is in fact a continuum: the theta subtree constitutes an intermediate structure between those usually thought to be in the lexicon, and those in the syntax. This is a radical proposal. A second proposal made in Chapter 2 is that X′ projections project up as far as they need to. Thus if one assumed the X′-theory of Jackendoff (1977) (as I did in this thesis) — recall that Jackendoff had 3 X′ levels — then an element might project up to the single bar level, double bar level, or all the way up to the triple bar level, as needed.
(29) N — N′ — N′′ — N′′′ (an element projects up only as far as needed)
a. b.
John’s sleeping derived from: the sleeping of John (subject internal) John’s swimming derived from: the swimming of John (subject internal)
This means that the English nominal system is actually ergative in character — a startling result. Some final editorial comments. For space reasons in this series, Chapter 5 in the original thesis has been deleted, and Chapter 6 has been re-numbered Chapter 5. Second, I have maintained the phrase structure nodes of the original trees, rather than trying to “update” them with the more recent nodes. The current IP is therefore generally labelled S (sentence), the current DP is generally labelled NP (noun phrase), and the current CP is sometimes labelled S′ (S-bar, the old name for CP). Finally, the term dislocation in Chapter 5 is intended to be neutral by itself between moved and base-generated. The argument of that section is that wh-elements which are moved by the adult, are base generated in dislocated positions by the child. Finally, I would like to thank Lisa Cheng and Anke de Looper for helpful editorial assistance.
Introduction

This work arose out of an attempt to answer three questions:

I. Is there a way in which the Government-Binding theory of Chomsky (1981) can be formulated so that the leveling in it is more essential than in the current version of the theory?

II. What is the relation between the sequence of grammars that the child adopts and the basic formation of the grammar, and is there such a relation?

III. Is there a way to anchor Chomsky's (1981) finiteness claim — that the set of possible human grammars is finite — so that it becomes a central explanatory factor in the grammar itself?

The work attempts to accomplish the following:

I. To provide for an essentially leveled theory, in two ways: by showing that DS and SS are clearly demarcated by positing operations additional to Move-α which relate them, and by suggesting that there is an ordering in addition by vocabulary, the vocabulary of description (in particular, Case and theta theory) accumulating over the derivation.

II. To relate this syntactically argued-for leveling to the acquisition theory, again in two ways: by arguing that the external levels (DS, the Surface, PF) may precede S-structure with respect to the induction of structure, and by positing a general principle, the General Congruence Principle, which relates acquisition stages and syntactic levels.

III. To give the closed class elements a crucial role to play: with respect to parametric variation, they are the locus of the specification of parametric difference; and with respect to the composition of the phrase marker, it is the need for closed class (CC) elements to be satisfied which gives rise to phrase marker composition from more primitive units, and which initiates Move-α as well.

In terms of syntactic content, Chapters 2–4 deal with phrase structure — both the acquisition and the syntactic analysis thereof — and Chapter 5 deals with the interaction of indexing functions, Control and Binding Theory, with levels of representation, particularly as it is displayed in the acquisition sequence.
Thematically, a number of concerns emerge throughout. A major concern is with closed class elements and finiteness. With respect to parametric variation, I suggest that closed class elements are the locus of parametric variation. This guarantees finiteness of possible grammars in UG, since the set of possible closed class elements is finite.[1] With respect to phrase structure composition, it is the closed class elements, and the necessity for their satisfaction, which require the phrase marker to be composed, and initiate movement as well (e.g. Move-wh is in a 1-to-1 correspondence with the lexical necessity: Satisfy +wh feature). The phrase marker composition has some relation to the traditional generalized transformations of Chomsky (1957), and it may apply (in the case of Adjoin-α) after movement. But the composition that occurs is of a strictly limited sort, where the units are demarcated according to the principles of GB. Finally, closed class elements form a fixed frame into which the open class (OC) elements are projected (Chapters 1, 2, and 4). More exactly, they form a Case frame into which a theta sub-tree is projected (Chapter 4). This rule I call Merger (or Project-α).

A second theme is the relation of stages in acquisition to levels of grammatical representation. Given the apparent difficulty of any theory which involves the learning of transformations,[2] the precise nature of the relation of the acquisition sequence to the structure of the grammar has remained murky, without a theory of how the grammatical acquisition sequence interacts with, or displays, the structure of the grammar, and with, perhaps, many theoreticians believing that any correspondence is otiose. Yet there is considerable reason to believe that there should be such a correspondence. On theoretical grounds, this would be expected for the following reason: the child in his/her induction of the grammar is not handed information from all levels in the grammar at once, but rather from particular picked out levels, the external levels of Chomsky (class lectures, 1985) — DS, LF, and PF or the surface. These are contrasted with the internal level, S-structure. Briefly, information from the external levels is available to the child: about LF because of the paired meaning interpretation, from the surface in the obvious fashion, and from DS, construed here simply as the format of lexical forms, which are presumably given by UG. As such, the child's task (still!) involves the interpolation of operations and levels between these relatively fixed points.

[1] Modulo the comments in Chapter 1, footnote 1.
[2] Because individual transformations are no longer sanctioned in the grammar. I do not believe, however, that the jury is yet in on the type of theory that Wexler and Culicover (1980) envisage.
But this then means that the acquisition sequence must build on these external levels, and display the structure of the levels, perhaps in a complex fashion.

A numerical argument leads in the same direction: namely, that the acquisition theory, in addition to being a parametric theory, should contain some essential reference to, and reflect, the structure of the grammar. Suppose that, as above, the closed class elements and their values are identified with the possible parameters. Let us (somewhat fancifully) set the number at 25, and assume that they are binary. This would then give 2²⁵ target grammars in UG (≈34 million), a really quite small finite system. But consider the range of acquisition sequences involved. If parameters are independent — a common assumption — then any of these 25 parameters could be set first, then any of the remaining 24, and so on. This gives 25! possible acquisition sequences for the learning of a single language (≈1.5 × 10²⁵), a truly gigantic number. That is, the range of acquisition sequences would be much larger than the range of possible grammars, and children might be expected to display widely divergent intermediate grammars in their path to the final common target, given independence. Yet they do nothing of the sort; acquisition sequences in a given language look remarkably similar. All children pass through a stage of telegraphic speech, and similar sorts of errors are made in structures of complementation, in the acquisition of Control, and so on. There is no wide fecundity in the display of intermediate grammars. (A short computation after (1) below makes these magnitudes concrete.)

The way that has been broached in the acquisition literature to handle this has been the so-called linking of parameters, where the setting of a single parameter leads to another being set. This could restrict the range of acquisition sequences. But the theories embodying this idea have tended to have a rather idiosyncratic and fragmentary character, and have not been numerous. The suggestion in this work is that there is substructuring, but that this is not in the lexical-parametric domain itself (conceived of as the set of values for the closed class (CC) elements), but in the operational domain with which this lexical domain is associated. An example of this association was given above with the relation of wh-movement to the satisfaction of the +wh feature; another example would be the satisfaction of the relative clause linker (the wh-element itself), which either needs or does not need to be satisfied in the syntax. This gives rise either to languages in which the relative forms a constituent with the head (English-type languages), or to languages in which it is splayed out after the main proposition, correlative languages.

(1) Lexical Domain                                       Operational Domain
    +wh must be satisfied by SS                          Move-wh applies in syntax
    +wh may not be satisfied by SS                       Move-wh applies at LF
    Relative clause linker must be satisfied by SS       English-type language
    Relative clause linker may not be satisfied by SS    Correlative language
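The magnitudes in the numerical argument above are easy to verify (a quick computation, under the same fanciful assumption of 25 independent binary parameters):

```python
import math

parameters = 25
grammars = 2 ** parameters               # possible target grammars
sequences = math.factorial(parameters)   # possible orders of setting them

print(grammars)               # 33554432 (~3.4e7): a small finite system
print(sequences)              # 15511210043330985984000000 (~1.55e25)
print(sequences // grammars)  # sequences outnumber grammars by ~4.6e17
```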
The theory of this work suggests that all operations are dually specified in the lexical domain (requiring satisfaction of a CC lexical element) and in the operational domain. The acquisition sequence reflects the structure of the grammar in two ways: via the General Congruence Principle, which states that the stages in acquisition are in a congruence relation with the structure of parameters (see Chapter 3 for discussion), and via the use of the external levels (DS, PF, LF) as anchoring levels for the analysis — essentially, as the inductive basis. The General Congruence Principle is discussed in Chapters 2–4; the possibility of broader anchoring levels, in Chapter 5. The latter point of view is somewhat distinct from the former, and (to be frank) the exact relation between them is not yet clear to the author. It may be that the General Congruence Principle is a special case, when the anchoring level is DS, or it may be that these are autonomous principles. I leave this question open.

The third theme of this work has to do with levels or precedence relations in the grammar. In particular, with respect to two issues: (a) Is it possible to make an argument that the grammar is essentially derivational in character, rather than in the representational mode (cf. Chomsky's 1981 discussion of Move-α)? (b) Is there any evidence of intermediate levels, of the sort postulated in van Riemsdijk and Williams (1981)? I believe that considering a wider range of operations than Move-α may move this debate forward. In particular, I propose two additional operations of phrase structure composition: Adjoin-α, which adjoins adjuncts in the course of the derivation, and Project-α, which relates the lexical syntax to the phrasal. With respect to these operations, two types of precedence relations do seem to hold. First, operation/default organization holds within an operation type; this is the case for Adjoin-α and its corresponding default, Conjoin-α (i.e., two of the types of generalized transformations in Chomsky 1957 are organized as a single operation type, with an operation/default relation between them). The other precedence relation is vocabulary layering, and this holds between different operations, for example, Case and theta theory (see Chapters 2, 3, and 4 for discussion). Further, operations like Adjoin-α may follow Move-α, and this explains the anti-Reconstruction facts of van Riemsdijk and
Williams (1981); such facts cannot be easily explained in the representational mode (see Chapter 3). In general, throughout this work I will interleave acquisition data and theory with 'pure' syntactic theory, since I do not really differentiate between them. Thus, the proposal having to do with Adjoin-α was motivated by pure syntactic concerns (the anti-Reconstruction facts, and the attempt to get a simple description of licensing), but was then carried over into the acquisition sphere. The proposal having to do with the operation of Project-α (or Merger) was formulated first in order to give a succinct account of telegraphic speech (and, to a lesser degree, to account for speech error data), and was then carried over into the syntactic domain. To the extent to which this type of work is successful, the two areas, pure syntactic theory and acquisition theory, may be brought much closer, perhaps identified.
Chapter 1
A Re-Definition of the Problem
1.1 The Pivot/Open Distinction and the Government Relation

For many years language acquisition research has been a sort of weak sister in grammatical research. The reason for this, I believe, lies not so much in its own intrinsic weakness (for a theoretical tour de force, see Wexler and Culicover 1980; see also Pinker 1984), but rather, as in other unequal sibships, in relation. This relation has not been a close one; moreover, the lionizing of the theoretical importance of language acquisition as the conceptual ground of linguistic theorizing has existed in uneasy conscience alongside a real practical lack of interest. Nor is the fault purely on the side of theoretical linguistics: the acquisition literature, especially on the psychological side, is notorious for having drifted further and further from the original goal of explaining acquisition, i.e. the sequence of mappings which take the child from G₀ to the terminal grammar Gₙ, to the study of a different sort of creature altogether, Child Language (see Pinker 1984 for discussion and a diagnostic).

1.1.1 Braine's Distinction
Nonetheless, even in the psychological literature, especially early on, there were a number of proposals of quite far-reaching importance which would, or could, have (had) a direct bearing on linguistic theory, and which pointed the way to theories far more advanced than those available at the time. For example, Braine’s (1963a) postulation of pivot-open structures in early grammars. Braine essentially noticed and isolated three properties of early speech: for a large number of children, the vocabulary divided into two classes, which he called pivot and open. The pivot class was “closed class”, partly in the sense that it applies in the adult grammar (e.g., containing prepositions, pronouns, etc.) but partly also in the broader sense: it was a class that contained a small set of words which couldn’t be added on to, even though these words corresponded to
those which would ordinarily be thought of as open class (e.g. "come"); these words operated on a comparatively large number of open class elements. An example of the Braine data is given below.
(1) Steven's word combinations:
    want: want baby, want car, want do, want get, want glasses, want head, want high, want horsie, want jeep, want more, want page, want pon, want purse, want ride, want up, want byebye car
    see: see ball, see doll, see record, see Stevie
    whoa: whoa cards, whoa jeep
    more: more ball, more book
    there: there ball, there book, there doggie, there doll, there high, there momma, there record, there trunk, there byebye car, there daddy truck, there momma truck
    it: it ball, it bang, it checker, it daddy, it Dennis, it X, etc.
    that: that box, that Dennis, that X, etc.
    get: get ball, get Betty, get doll
    here: here bed, here checker, here doll, here truck
    do: bunny do, daddy do, momma do
The second property of the pivot/open distinction noticed by Braine was that pivot and open are positional classes, occurring in a specified position with respect to each other, though the positional ordering was specific to the pivot element itself (P1 Open, Open P2, etc.) and hence not to be captured by a general phrase structure rewrite rule: S → Pivot Open. This latter fact was used by critical studies of the time (Fodor, Bever, and Garrett 1974, for example) to argue that Braine's distinction was somehow incoherent, since the one means of capturing such a distinction, phrase structure rules, required a general collapse across elements in the pivot class which was simply not available in the data. The third property of the pivot/open distinction was that the open class elements were generally optional, while the pivot elements were not.
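The positional character of the pivot class lends itself to a simple distributional sketch (hypothetical; the mini-corpus merely imitates the data in (1)): each pivot recurs in its own fixed slot with several open class partners, so the generalization is per lexical item rather than a single rule S → Pivot Open.

```python
from collections import defaultdict

# Hypothetical mini-corpus in the style of Braine's two-word data.
corpus = [("want", "baby"), ("want", "car"), ("see", "ball"),
          ("see", "doll"), ("boot", "off"), ("shirt", "off")]

slots = defaultdict(set)
for first, second in corpus:
    slots[(first, "P1")].add(second)    # candidate utterance-initial pivot
    slots[(second, "P2")].add(first)    # candidate utterance-final pivot

# A pivot recurs in a fixed slot with several open class partners.
for (word, slot), partners in sorted(slots.items()):
    if len(partners) > 1:
        print(word, slot, sorted(partners))
# off P2 ['boot', 'shirt']
# see P1 ['ball', 'doll']
# want P1 ['baby', 'car']
```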
1.1.2 The Government Relation
What is interesting from the perspective of current theory is just how closely Braine managed to isolate analogs not to the phrase structure rule descriptions popular at that time, but to the central relational primitives of the current theory. Thus the relation of pivot to open classes may be thought of as that between governor and governed element, or perhaps more generally that of head to complement; something like a primitive predication or small clause structure (in the extended sense of Kayne 1984) appears to be in evidence in these early structures as well:
(2) Steven's word utterances:

    it ball          that box
    it bang          that Dennis
    it checker       that doll
    it X, etc.       that Tommy
                     that truck
    there ball       here bed
    there book       here checker
    there doggie     here doll
    there X, etc.    here X, etc.

    Andrew's word combinations:

    boot off               airplane all gone
    light off              Calico all gone
    pants off              Calico all done
    shirt off              salt all shut
    shoe off               all done milk
    water off              all done now
    clock on there         all gone juice
    up on there            all gone outside
    hot in there           all gone pacifier
    X in/on there, etc.

    Gregory's word combinations:

    byebye plane     allgone shoe
    byebye man       allgone vitamins
    byebye hot       allgone egg
                     allgone lettuce
                     allgone watch
    etc.
The third property that Braine notes, the optionality of the open constituent with respect to the pivot, may also be regularized to current theory: it is simply the idea that heads are generally obligatory while complements are not. The idea that the child, very early on, is trying to determine the general properties of the government relation in the language (remaining neutral for now about whether this is case or theta government) is supported by two other facts as well: the presence of what Braine calls "groping patterns" in the early data, and the presence of what he calls formulas of limited scope. The former can be seen in the presence of the "allgone" constructions in Andrew's speech. The latter refers simply to the fact that in the very early two-word grammars, the set of relations possible between the two words appears limited in terms of the semantic relations which hold between them. This may be thought of as showing that the initial government relation is learned with respect to specific lexical items, or cognitively specified subclasses, and is then collapsed across them. See also later discussion. The presence of "groping patterns", i.e. patterns, in two-word utterances, in which the order of elements is not fixed for lexically specific elements, corresponds to the child's original experimentation in determining the directionality of government (Chomsky 1981, Stowell 1981). The presence of groping patterns is problematic for any theory of grammar which gives a prominent role to phrase structure rules in early speech, since there the order of elements must be fixed for all elements in a class. See, e.g., the discussion in Pinker (1984), which attempts, unsuccessfully I believe, to naturalize this set of data. To the extent to which phrase structure order is considered to be a derivative notion, and the government-of relation the primitive one, the presence of lexically specific order differences is not particularly problematic, as long as the
directionality of government is assumed to be determined at first on a word-by-word basis.
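The word-by-word picture can be given a similarly schematic rendering. In the Python sketch below, the input triples and the "L"/"R" coding are hypothetical simplifications: each governing item first accumulates its own directionality record, and only afterwards is a class-wide (ultimately parametric) setting collapsed out of the item-specific ones.

    from collections import Counter, defaultdict

    # Hypothetical two-word input: (governor, governed, order), where "R"
    # means the governed element follows its governor and "L" that it precedes.
    observations = [
        ("want", "car", "R"), ("want", "ball", "R"), ("see", "doggie", "R"),
        ("allgone", "airplane", "L"),   # "airplane allgone"
        ("allgone", "juice", "R"),      # "allgone juice": a groping pattern
        ("off", "boot", "L"),           # "boot off"
    ]

    # Stage one: directionality is recorded per governing lexical item.
    direction = defaultdict(Counter)
    for governor, _, order in observations:
        direction[governor][order] += 1

    def collapsed_direction(table):
        """Stage two: collapse the item-specific records into a single
        class-wide setting once the evidence is pooled."""
        totals = Counter()
        for counts in table.values():
            totals.update(counts)
        return totals.most_common(1)[0][0] if totals else None

    # Per-item settings; "allgone" is tied between L and R (a groping pattern).
    print({g: dict(c) for g, c in direction.items()})
    print("class-wide setting:", collapsed_direction(direction))   # R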
1.2 The Open/Closed Class Distinction

Braine's prescient analysis was attacked in the psychological literature on both empirical and especially theoretical grounds; it was ignored in the linguistic literature. The basis of the theoretical attack was that the pivot/open distinction, being lexically specific with respect to distribution, would not be accommodated in a general theory of phrase structure rules (as already mentioned above); moreover, the particular form of the theory adopted by Braine posited a radical discontinuity in the form of the grammar as it changed from a pivot/open grammar to a standard Aspects-style PS grammar. This latter charge we may partly defuse by noting that there is no need to suppose a radical discontinuity in the form of the grammar as it changed over time: the pivot/open grammar is simply contained as a subgrammar in all the later stages. However, we wish to remain neutral, for now, on the general issue of whether such radical discontinuities are possible. The proponents of such a view, especially the holders of the view that the original grammar was essentially "semantic" (i.e. thematically organized), held the view in either a more or less radical form. The more extreme advocates (Schlesinger 1971) held not simply that there was a radical discontinuity, but that the primitives of later stages — syntactic primitives like case and syntactic categories like noun or noun phrase — were constructed out of the primitives of the earlier stages: a position one may emphatically reject. Other theoreticians, however, in particular Melissa Bowerman (Bowerman 1973, 1974), held that there was such a discontinuity, but without supposing any construction of the primitives of the later stages from those of the earlier. We return, in detail, to this possibility below.

More generally, however, the charge that the pivot/open class stage presents a problem for grammatical description appears to dissolve once the government-of relation is taken to be the primitive, rather than the learning of a collection of (internally coherent) phrase structure rules. However, more still needs to be said about Braine's data. For it is not simply the case that a rudimentary government relation is being established, but that this is overlaid, in a mysterious way, with the open/closed class distinction. Thus it is not simply that the child is determining the government-of and predicate-of relations in his or her language, but also that the class of governing
elements is, in some peculiar way, associated with a distributional class: namely, that of closed class elements. While the central place of the government-of relation in current theory gives us insight into one half of Braine's data, the role of the closed class/open class distinction, though absolutely pervasive in both Braine's work and in all the psycholinguistic literature (see Garrett 1975, Shattuck-Hufnagel 1974, Bradley 1979, for a small sample), has remained totally untouched. Indeed, even the semantic literature, which has in general paid much more attention to the specifier relation than transformational-generative linguistics, does not appear to have anything to say that would account for the acquisition facts. What could we say about the initial overlay of the closed class elements and the set of governors? The minimal assumption would be something like this:
(3) The set of canonical governors is closed class.
While this is an interesting possibility, it would involve, for example, including prepositions and auxiliary verbs in the class of canonical governors, but not main verbs. Suppose that we strengthen (3), nonetheless.
(4) Only closed class elements may govern.
What about verbs? Interestingly, a solution already exists in the literature: in fact, two of them. Stowell (1981) suggests that it is not the verb per se which governs its complements, but rather the theta grid associated with it. Thus the complements are theta-governed under coindexing with positions in the theta grid. And while the class of verbs in a language is clearly open class and potentially infinite, the class of theta grids is equally clearly finite: each grid is a member of a closed, finite set of elements. Along the same lines, Koopman (1984) makes the interesting, though at first glance odd, suggestion that it is not the verb which Case-governs its complements, but the Case-assigning features associated with the verb. She does this in the context of a discussion of Stowell's Case adjacency requirement for case assignment, a proposal which appears to be immediately falsified by the existence of Dutch, a language in which the verb is VP-final, but the accusative-marked object is at the left periphery of the VP. Koopman saves Stowell's proposal by supposing that the Case-assigning features of the verb are at the left periphery, though the verb itself is at the right. This idea that the two aspects of the verb are separable in this fashion will be returned to, and supported, below. What is crucial for present purposes is simply to note that the Case-governing properties of the verb are themselves closed class, though the set of verbs is not. Thus both the Case-assigning and theta-assigning properties of the
verb are closed class, and we may assume that these, rather than some property of the open class itself, enter into the government relation. There is a second possibility, less theory-dependent. This is simply that, as has often been noted, there is within the "open" part of the vocabulary of a language a subset which is potentially closed: this is the so-called basic vocabulary of the language, used in the teaching of basic English, and of other languages. The verb say would presumably be part of this closed subset, but not the verb mutter, as would their translations. The child's task may be viewed as centering on the closed class elements in the less abstract sense of lexical items, if these are included in the set.
1.2.1 Finiteness
While the syntactic conjecture that the Case features on the verb govern its object has often enough been made, the theoretical potential of such a proposal has not been realized. In essence, this proposal reduces a property of an open class of elements, namely verbs, to a property of a closed class of elements (the Case features on verbs). Insofar as direction of government is treated as a parameter of variation across languages, reducing government directionality to a property of a closed class set joins the two sorts of finiteness, lexical and syntactic, together. The finiteness of syntactic variation (Chomsky 1981) is tied, in the closest possible way, to the necessary finiteness of a lexical class (and the specifications associated with it).

Let us take another example. English allows wh-movement in the syntax; Chinese, apparently, apportions it to LF (Huang 1982). This is a parametric difference in the level of derivation at which a particular operation applies. However, this may well be reducible to a parametric difference in a closed class element. Let us suppose, following Chomsky (1986), that wh-movement is movement into the specifier position of C′. Ordinarily it is assumed that lexical selection (by the complement-taking verb) is of the head. Let us assume likewise that the matrix verb must select for a +/−wh feature in Comp. This, in turn, must regulate the possible appearance of the wh-word in the specifier position of C′. We may assume that some agreement relation holds between these two positions, in direct analogy to the agreement relation which exists generally between specifier and head positions, e.g. with respect to case. Thus the presence of the overt wh-element in Spec C′ is necessary to agree with, or saturate, the +wh feature which is base-generated in Comp. What then is the difference between English and Chinese? Just this: the agreeing element in Comp must be satisfied at S-structure in English,
while it need only be satisfied at LF in Chinese. This difference, in turn, may be traced, we might hope, to some intrinsic property of agreement in the two languages.
(5) I wonder [C″ who [C′ Comp [I″ [NP John] [I′ I [VP [V saw] [NP e]]]]]]
If this sketch of an analysis is correct — or something like it is — then the parametric difference between English and Chinese with respect to wh-movement is reduced to a difference in the lexical specification of a closed class element.1 Since the possible set of universal specifications associated with a closed class set of elements is of necessity finite, the finiteness conjecture of Chomsky (1981) would be vindicated in the strongest possible way. Namely, the finiteness in parametric variation would be tied, and perhaps only tied, to the finiteness of a necessarily finite group of lexical elements, and the information associated with them.

1. I should note that the term closed class element here is being used in a somewhat broader sense than usual, to encompass elements like the +wh feature. The finiteness of the closed class set cannot reside in the actual lexical items themselves, since these may vary from language to language, but in the schema which defines them (e.g. definite determiner, indefinite determiner, Infl, etc.).
1.2.2 The Question of Levels
There is a different aspect of this which requires note. The difference between Chinese and English with respect to wh-movement is perhaps associated with features on the closed class morpheme, but this shows up as a difference in the appearance of the structure at a representational level. I believe that this is in
general the case: namely, that while information associated with a closed class element is at the root of some aspect of parametric variation, this difference often evidences itself in the grammar by a difference in the representational level at which a particular operation applies. We may put this in the form of a proposal:
(6) The theory of UG is the theory of the parametric variation in the specifications of closed class elements, filtered through a theory of levels.
I will return throughout this work to more specific ways in which the conjecture in (6) may be fleshed out, but I would like to return at this point to two aspects which seem relevant. First is the observation made repeatedly by Chomsky (1981, 1986a) that, while the set of possible human languages is (at least conjecturally) finite, those languages appear to have a wide "scatter" in terms of surface features. Why, we might ask, should this be the case? If the conjecture in (6) is correct, it is precisely because of the interaction of the finite set of specifications associated with the closed class elements with the rather huge surface differences which would follow from having different operations apply at different levels. The information associated with the former would determine the latter; the latter would give rise to the apparently huge differences in the description of the world's languages, but would itself be tied to parametric variation in a small, necessarily finite set. How does language acquisition proceed under these circumstances? Briefly, it must proceed in two ways: by determining the properties of the lexical specifications associated with the closed class set, the child determines the structure of the levels; by determining the structure of the levels, he or she determines the properties of the closed class morphemes. The proposal that the discovery of properties associated with closed class lexical items is central obviously owes a lot to Borer's (1985) lexical learning hypothesis: that what the child learns, and all that he/she learns, is associated with properties of lexical elements. It constitutes, in fact, a (fairly radical) strengthening of that proposal, in the direction of finiteness. Thus while the original lexical learning hypothesis would not guarantee finiteness in parametric variation, the version adopted in (6) would, and thus may be viewed as providing a particular sort of grounding for Chomsky's finiteness claim. However, the proposal in (6) contains an additional claim as well: that a difference in the specifications of closed class elements cashes in as a difference in the level at which various operations apply. Thus it provides an outline of the way that the gross scatter of languages may be associated with a finite range.
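For concreteness, proposal (6) can be caricatured as a data structure. In the Python sketch below (a toy under my own assumptions: the feature names and level labels are expository, not a worked-out fragment of either grammar), the only cross-linguistic difference is a closed class specification, and it is cashed in as the level at which the wh-operation applies, echoing the English/Chinese contrast above.

    # Toy rendering of (6): parametric variation lives in the specifications
    # of closed class elements; each specification determines the level at
    # which an operation applies. Feature and level names are assumptions.
    CLOSED_CLASS_SPECS = {
        "English": {"Comp[+wh]": {"satisfied_at": "S-structure"}},
        "Chinese": {"Comp[+wh]": {"satisfied_at": "LF"}},
    }

    def wh_moves_overtly(language):
        """Overt wh-movement arises iff the +wh feature in Comp must be
        saturated by S-structure rather than only at LF."""
        spec = CLOSED_CLASS_SPECS[language]["Comp[+wh]"]
        return spec["satisfied_at"] == "S-structure"

    for lang in ("English", "Chinese"):
        print(lang, "overt wh-movement:", wh_moves_overtly(lang))
    # English overt wh-movement: True
    # Chinese overt wh-movement: False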
1.3 Triggers

1.3.1 A Constraint
The theory of parametric variation, or grammatical determination, has often been linked with a different theory: that of triggers (Roeper 1978b, 1982, Roeper and Williams 1986). A trigger may be thought of, in the most general case, as a piece of information in the surface string which allows the child to determine some aspect of grammatical realization. The idea is an attractive one, in that it suggests a direct connection between a piece of surface data and the underlying projected grammar; it is also in danger, if left further undefined, of becoming nearly vacuous as a means of grammatical description. A trigger, as the term is commonly used, may apply to virtually any property of the surface string which allows the child to make some determination about his or her grammar. There is, as is usual in linguistic theory, a way to make an idea more theoretically valuable: that is, by constraining it. This constraint may be either right or wrong, but it should, in either case, sharpen the theoretical issues involved. In line with the discussion earlier in the chapter, let us limit the content of "trigger" in the following way:
(7) A trigger is a determination of a property of a closed class element.
Given the previous discussion, the differences in the "look" of the output grammar may be large once a trigger has been set. The trigger-setting itself, however, is aligned with the setting of the specification of a closed class element. There are a number of instances of "triggers" in the input which must be re-examined given (7) above. There are, however, at least two very good instances of triggers in the above sense which have been proposed in the literature. The first is Hyams's (1985, 1986, 1987) analysis of the early dropping of subjects in English. Hyams suggests that children start off with a grammar which is essentially pro-drop, and that English-speaking children then move to an English-type grammar, which is not. These correspond to developmental stages in which children initially allow subjects to drop, filter out auxiliaries, and so on (as a first step), and then no longer do so (as a second step). The means by which children pass from the first grammar to the second, Hyams suggests, is the detection of expletives in the input. Such elements are generally assumed not to exist in pro-drop languages; the presence of such elements would thus allow the child to determine the type of the language that he or she was facing.
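Construed as in (7), Hyams's trigger amounts to a scan of the input for a closed class diagnostic. The Python sketch below is a minimal caricature under stated assumptions: the expletive and predicate word lists, the crude tokenization, and the single Infl feature are all mine, not Hyams's.

    EXPLETIVES = {"it", "there"}                     # assumed diagnostic forms
    EXPLETIVE_PREDICATES = {"rains", "snows", "seems"}

    def shows_expletive_subject(sentence):
        """True if the sentence plausibly has a non-referential subject,
        e.g. 'it rains'; a stand-in for real expletive detection."""
        words = sentence.lower().split()
        return (len(words) >= 2
                and words[0] in EXPLETIVES
                and words[1] in EXPLETIVE_PREDICATES)

    grammar = {"Infl": {"null_subject": True}}       # initial pro-drop setting

    for utterance in ["want cookie", "it rains", "there seems to be a problem"]:
        if shows_expletive_subject(utterance):
            grammar["Infl"]["null_subject"] = False  # the trigger fires
    print(grammar)   # {'Infl': {'null_subject': False}}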
1.3.2 Determining the base order of German
The other example of a trigger, in the sense of (7) above, is found in Roeper's (1978b) analysis of German. While German sentences are underlyingly verb-final (see Bierwisch 1963, Bach 1962, Koster 1975, and many others), the verb may show up in either the second or the final position.

(8) a. Ich sah ihn.
       'I saw him.'
    b. Ich glaube dass ich ihn gesehen habe.
       'I believe that I him seen have.'
Roeper’s empirical data suggests that the child analyses German as verb-final at a very early stage. However, this leaves the acquisition question open: how does the child know that German is verb final? Roeper proposes two possible answers: (9)
(9) i. Children pay attention to the word order in embedded, not matrix, clauses.
    ii. Children isolate the deep structure position of the verb by reference to the placement of the word "not", which is always at the end of the sentence.
At first, it appears that solution (i) is far preferable. It is much more general, for one thing, and it also allows a natural tie-in with theory — namely, with Emonds's (1975) conception that various transformations apply in root clauses which are barred from applying in embedded contexts. However, recent work by Safir (1982) suggests that Emonds's generalization follows from other principles, in particular that of government; and even if Safir's particular proposal were not correct, it would certainly be expected, in the context of current theory, that the difference between root and embedded clauses would not be stated as part of the primitive basis, but would follow from more primitive specifications. A different line of deduction, not available to Roeper in 1974, appears to be more promising: namely, for the child to deduce the DS position of the verb from a property of the government relation. Given Case Adjacency (Stowell 1981), and given a theory which states that Case assignment applies prior to verb movement, and given the assumption that the accusative-marked element has not moved (all these assumptions are necessary), the presence of the accusative-marked object, in the presence of other material preceding it in the VP, would act as a legitimate "marker" for the presence of the DS verb following it:
(10) a. Ich habe dem Mann das Buch gegeben.
        'I have (to) the man the book given.'
     b. Ich gebe dem Mann das Buch t.
        'I give the man the book.'
This is one way out of the problem. However, certain of these assumptions appear questionable, or at least not easily determinable by the child on the basis of surface evidence. For example, the accusative will appear adjacent to the phrase-final verb if both objects in the double object construction are definite full NPs; but if the direct object is a pronoun, the order of the two complements reverses, obligatorily (Thiersch 1978).

(11) a. *Ich hatte dem Mann es gegeben.
        'I had the man it given.'
     b. Ich hatte es dem Mann gegeben.
        'I had it the man given.'
Assuming that accusative is a case assigned by the verb but that dative is not, the interposition of the dative object between the verb and the direct object would create learnability problems for the child; in particular, the presence of the accusative object would not be an invariable marker of the presence (at DS) of the verb beside it. Of course, additional properties or rules (e.g. with respect to the possibility of cliticization) may be added to account for the adult data, but this would complicate the learnability problem, in the sense that the child would already have to have access to this information prior to his/her positing of the verb-final base. A second and equally serious difficulty with using the presence of the accusative object as a sure marker of the presence (at DS) of the verb beside it (under Case Adjacency) is simply that in quite closely related languages, e.g. Dutch, such a strict adjacency requirement does not seem to hold. Thus in Dutch, the accusative object appears at the beginning of the verb phrase, while the verb is phrase-final.
(12) a. Jan plant bomen in de tuin.
        'John plants trees in the garden.'
     b. Jan be-plant de tuin met bomen.
        'John plants the garden with trees.'
Of course, it is possible to take account of this theoretically, along the lines that Koopman (1984) suggests, where the Case-assigning features are split off from the verb itself. But the degree of freedom necessitated in this proposal, while quite possible from the point of view of a synchronic description of the grammar,
makes it unattractive as a learnability trigger (in the sense of (7) above). In particular, the abstract Case-assigning features, now separated from the verb, could no longer allow the presence of an accusative-marked object to be the invariable marker of the verb itself, and thus allow the child to determine the deep structure position of the verb within the VP. While not unambiguously rejecting the possibility that the presence of accusative case may act as the marker for the verb in conjunction with other factors for the child (since, in current theory, it is the interaction of theories like Case and theta theory which allows the properties of a construction to be determined: why should it be any different for the child?), let us turn to the third option for learnability, the second option outlined by Roeper (1974). This is that the position of Neg or a negative-like element marks the placement of the verb for the child. Not following (yet) from any general principle, this may appear to be the most unpromising proposal of the lot. Let us first, however, suitably generalize it:
(13) The child locates the position of the head by locating the position of the closed class specifier of the head; the latter acts as a marker for the presence of the former.
If we assume that Neg or not is in the specifier of V′ or V″, the generalization in (13) is appropriate. Does (13) hold? Before turning to specifically linguistic questions, it should be noted that there exists a body of evidence in the psycholinguistic literature, dating from the mid-sixties, which bears on this question (Braine 1965, Morgan, Meier, and Newport 1987). This is the learning of artificial languages by adults. Such languages may be constructed to have differing properties, and, in particular, to either have or not have a closed class subset in them. Such languages are "learned", not by direct tuition, but rather by direct exposure to a large number of instances of well-formed sentences in the language, large enough so that the subject cannot have recourse to non-linguistic, general problem-solving techniques. What is interesting about this line of research is that the closed class morphemes seem to play an absolutely crucial role in the learnability of the language (Braine 1965, Morgan, Meier, and Newport 1987). In particular, with such morphemes, the grammar of the language is fairly easily deducible, but without them the deduction is very much more difficult. Certain questions relevant to the issue at hand have not been dealt with in this literature — for example, in a language in which the surface string may have the head in one of two places, how is this placement determined? — but the general upshot of this line of research seems clear: such elements are crucial in the language's acquisition. Of course, it must still be noted that this is language-learning by
adults, not children; but the fact that the learning occurs under conditions of input-flooding, rather than direct tuition, makes it correspond much more closely to the original conditions under which acquisition takes place.

Let us return to (13). The claim in (13) is that the closed class specifiers associated with a head are better markers of that head's DS position than the head itself. Given that the child must in general determine whether movement has taken place, it is necessary that there be some principle, gatherable from the surface itself, which so determines it. With respect to direct complements of a head, we may assume that the detection of movement (i.e. the construction of the DS and SS representations from the surface) is done with respect to characteristics of the head: for example, that the head is missing one of its obligatorily specified complements. What if the head itself is moved? In this case, any of three sorts of information would suffice: the stranding of a (surface-ungoverned) complement; the stranding of a specifier; or, if the head is subcategorized by another head which is thereby left without a complement, detection via that subcategorizing head. The last would be the case, in English, in instances where a full clause was fronted. With respect to the movement of the verbal head in German, the first possibility, that the DS position of the verb is determined with respect to an obligatorily present complement, corresponds to the proposal that the DS position of the verb is detected by reference to the accusative-marked object. The second possibility, that the stranding of the specifier marks the DS position of the head, is essentially Roeper's proposal with respect to the placement of Neg. What if the specifier itself is moved? This could be detected by the placement of the head, if the head itself were assumed to be rigid in such a configuration. A grave difficulty would arise, however, if both the specifier and the head-complement complex were moved from a single DS position, since the DS position would not be determinable.
(14) [XP Spec [X′ X YP]]
We are left with the following logical space of possibilities. [Recall that at the time that this work was written, the specifier was viewed differently than it is now. It constituted the set of closed class elements associated
with an open class head, among other things. For example, the was considered the specifier in the picture of John, and an auxiliary verb like was was considered the specifier of the verb phrase in a verb phrase like was talking to Mary. Thus the and was would be considered specifiers, rather than independently projecting heads. The following discussion only makes sense with this notion of specifier in mind. D. L.]
(15) Moved element    Detected by
     complement       head
     head             complement/specifier/subcategorizing head
     specifier        head/subcategorizing head (?)
By "subcategorizing head" I mean the head which, if it exists, selects for the embedded head and its arguments. The idea that the subcategorizing head may determine the presence of the specifier, as well as the head of the selected phrase, may be related to the proposal, found in Fukui and Speas (1986) as well as in categorial grammar, that, for some purposes at least, the specifier may act as the head of a given element (e.g. of an NP). I return in detail to this possibility below.

The chart in (15) gives the logical space in which the problem may be solved, but leaves almost all substantive issues unresolved; more disturbingly, the process of "detection", as it is faced by the child in (15), does not bear any obvious and direct relation to current linguistic theory. The linking of dislocated categories and their DS positions takes place, in current theory, under two relations: antecedent government and lexical government (Lasnik and Saito 1985, Chomsky 1986, Aoun, Hornstein, Lightfoot and Weinberg 1987). Let us go further and, in the spirit of Aoun, Hornstein, Lightfoot, and Weinberg (1987), associate lexical government with the detection of the existence of the null element (and perhaps its category), while antecedent government determines the properties of that element: both constitute, in the broadest sense, a sort of recoverability condition with respect to the placement of dislocated elements. We might take the detection of existence to take place at a particular level (e.g. PF or the surface), while the detection of properties takes place at another (e.g. LF). It was suggested earlier that, in spite of its theoretical attractiveness, the possibility that the child detects the DS position of the verb via the position of the accusative-marked object and Case Adjacency seemed unlikely, as too difficult an empirical problem (given the possible splitting off of the Case-assigning features from the verb, etc.). Let us suppose that this difficulty is principled, in the sense that in the child grammar, as in the adult grammar, the movement of
heads is never “detected” by the governed element of the head, but rather by the governor. Thus the child, even though he is constructing the grammar, is using the same principles as the adult. This radically reduces the logical space of (15) to that in (16):2 (16)
(16) Type of element moved    Detected by
     complement               governing head
     head                     governing head
     specifier                ?/governing head

2. Recall again that the notion of specifier used is that current in 1988 [D.L. in 1999]. It includes elements like the determiner the in the picture of Mary, and was in was talking to John: closed class items specifying the head.
The question mark next to the specifier in (16) is partly because of an embarrassment of riches — it could be either the higher head or the category head itself which governs the specifier — and partly because of a lack: it is not clear that either governs the specifier in the same way that a complement (e.g. a subcategorized NP) is governed by a head. One thing seems quite clear: closed class specifiers are fixed, with respect to certain movement operations, in a way that other elements are not. There is, for example, no way to directly question the cardinality of the specifier in English without moving the entire NP along with it:

(17) *a did you see (e man)?

Nor, as Chierchia (1984) points out, is there a way to directly question prepositions, suggesting a similar constraint:

(18) *To did you talk (e the man)? ("Was it to that you talked to the man?")

While prepositions are not normally thought of as specifiers of NP, but rather as projecting their own X′ system (see Jackendoff 1977), there is a strong and persistent undercurrent that certain PPs, even in English, are simply a spell-out of Case-marking and perhaps theta-marking features, something which is, arguably, part of the specifier, which later gets spelled out onto the head. If this is the case, then the data in (17) and (18), which seem to fall together in terms of pre-theoretical intuitions, may ultimately be collapsed. But why should (17) and (18) be so bad? The chart in (16), which suggests essentially that all elements are detected (i.e. lexically governed from the point of view of recovery) by their governor, gives no clue. While one might argue that
there is some sort of constraint that, in attempting to extract a head, necessarily drags other material with it, and that this accounted for the ungrammaticality of (18), there is no way to extend this constraint to (17), under normal assumptions about the headedness of the phrase. However, even in its own terms such a constraint is dubious, since in, e.g., German and Dutch, there is verb movement without the complement of the verb being moved as well. Chierchia himself suggests that the ungrammaticality of sentences like (17) and (18) is due to a (deep) property of the type system: namely, that the system is strictly second-order (with a nominalization operator), and that no variable categories exist of a high enough type to correspond to the traces of the determiner and preposition. While Chierchia's solution is coherent, and indeed exciting, from the point of view of the theory that he advocates, there are obvious problems in transposing the solution to any current version of GB. Indeed, even if the constraint did follow from some deep semantic property of the system, we would still be licensed in asking if there was some constraint in the purely syntactic system which corresponds to it. To the extent to which constraints are placed over purely syntactic forms, as well as (perhaps) the semantic system corresponding to it, we arrive at a position of an autonomous syntax, which, while perhaps constructed over a semantic base, retains its own set of properties distinct from the conceptual core on which it was formed. For discussion, from quite different points of view, see Chomsky (1981), where it is argued that the core relation of government is "grammaticalized" in a way which might not be determinable from its conceptual content alone, and that this sort of formal extension is a deep property of human language; see also Pinker (1984), where the notion of semantic bootstrapping plays a similar role.

Returning to the problem posed by the ungrammaticality of (17) and (18), we would wish to propose a syntactic constraint which would bar the unwanted sentences, and at the same time help the acquisition system to operate:
(19) Fixed specifier constraint: The closed class specifier of a category is fixed (may not move independently of the category itself).
It is clear why something like (19) would be beneficial from the point of view of the acquisition system. The problem for that system, under conditions of extensive movement, is that there is no set of fixed points from which to determine the D-structure. Of course, a trace-enriched S-structure would be sufficient, but the child himself is given no such structure, only the surface. The fixed specifier constraint suggests that there is a set of fixed points, loci from
which the structure of the string can be determined. Further, this seems to be supported by the grammatical evidence of lack of extractability. The alternative is that the D-structure, and the fact that movement has taken place, are determinable from a complex set of principles that the child has reference to (Case theory, theta theory, etc.), but without any particular set of elements picked out as the fixed points around which others may move. This possibility cannot be rejected out of hand, but a system which contained (19) instead would quite clearly be simpler. (19) itself is a generalization of Roeper's initial proposal that it is the element Neg which plays a crucial role; we will see later on that there is overt evidence for the role of this element in the acquisition of English. The learnability point notwithstanding, it may be asked whether a strong constraint such as (19) can empirically be upheld. The closed class specifiers of NP clearly do not move in English; and it is curiously supportive as well that, if we do think of prepositions as in some sense reanalyzed as the specifier of the NP that they precede, particle movement applies just in case the preposition does not have an NP associated with it (i.e. cannot be reanalyzed as a specifier). Nonetheless, there do seem to be instances in which specifiers move at the sentential level. Not may apparently move in Neg-raising contexts (20a), and do fronts in questions, sometimes in conjunction with a cliticized not (20b).
(20) a. I don't believe that he is coming. (= I believe that he isn't coming.)
     b. Didn't he leave already?
Thus, while it is in general the case that complements are more mobile than heads, and heads are more mobile than specifiers, it is by no means clear that specifiers form the "grid" necessary to determine the basic underlying structure of the language for the child.

1.3.2.1 The Movement of NEG (syntax)

The syntactic problem posed by (20) for the general idea that specifiers constitute a fixed grid from which the child posits syntactic order is a difficult one, but perhaps not insuperable. The status of the Neg-raising case is, in any case, unclear — e.g. as to whether movement has taken place. The problem posed by (20b) is more difficult. Given that movement has taken place, examples such as (20b) would seem to provide a straightforward violation of the fixed specifier constraint (and thus leave the learnability problem untouched). However, the example given, and the movement operation, is one which affects both Neg and the auxiliary verb: examples such as (21) are ungrammatical.

(21) *Not he saw Mary.
That is, the movement operation does not move Neg per se, but rather the category under which it is adjoined. If we consider this category itself to be Infl, which is not a specifier category, but rather the head of I′, then it is not the case that the closed class specifier itself has been moved to the front of the sentence, but rather I, a head constituent. The fixed specifier constraint is therefore not violated by Subject/Aux inversion.

(22) [S [NP he] [I′ [Infl [Infl is] [Neg not]] [VP going]]]

     Derivation: (He ((is not) (going))) → (Isn't he ((e) (going)))

(23) Movement types:
     a. Move-α: potentially unbounded; applies to major categories, maximal projections.
     b. Hop-α: string adjacent; applies to minor categories, closed class elements, minimal projections.
The set of properties listed under the movement types is intended as a pre-theoretical characterization only, with the formal status of this division to be determined in detail. We might include other differing properties as well: e.g., perhaps Hop-α, but not Move-α, is restricted to particular syntactic levels. Further, the exact empirical extension of Hop-α is left undetermined. In the original account (Chomsky 1955/1975, 1957), Hop-α was restricted (though not in principle) to the movement of affixes, i.e. closed class selected morphemes, onto the governed element, in particular the governed verb. In Fiengo's interesting extension, Hop-α may be applied to other string-adjacent operations involving closed class elements.3

3. For a somewhat different view of Hop-α, see Chomsky (1986b).

Assuming a division of movement types such as that given in (23), the Neg movement operation adjoining not to Infl may be considered a movement operation of a particular type: namely, an instance of Hop-α, not Move-α. As such, the fixed specifier constraint may still be retained, but modified in the following way:
(24) Fixed specifier constraint (modified form): A closed class specifier may not be moved by Move-α.
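The use the child could make of (13)/(24) can be sketched procedurally. In the Python fragment below, the word lists are simplified stand-ins for German clauses, and the "final zone" heuristic is my own assumption: because the closed class specifier nicht is fixed, its consistently late surface position marks the DS position of the verbal head even in verb-second matrix clauses.

    NEG = "nicht"

    def infer_verb_final(utterances):
        """Infer a verb-final base if Neg consistently surfaces in the final
        zone of the clause, on the assumption (24) that Neg, the closed class
        specifier of the V-projection, is not moved by Move-alpha."""
        votes = []
        for words in utterances:
            if NEG in words:
                votes.append(words.index(NEG) >= 2 * len(words) // 3)
        return bool(votes) and all(votes)

    matrix_input = [
        ["ich", "sehe", "ihn", "nicht"],      # verb in second position
        ["er", "kommt", "heute", "nicht"],
        ["ich", "glaube", "das", "nicht"],
    ]
    print("verb-final base inferred:", infer_verb_final(matrix_input))   # True

The point of the sketch is only that the specifier, unlike the verb itself, supplies a stable landmark: the same procedure run on the verb's surface position would return contradictory votes across matrix and embedded clauses.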
1.3.2.2 The Placement of NEG (Acquisition)

While the revision of the Fixed Specifier Constraint in (24) allows the syntactic system to retain a set of nearly fixed points, and thus simplify the acquisition problem at a theoretical level, a very interesting body of literature remains, concerning the acquisition sequence, which appears to directly undermine the claim that this information is in fact used by the child. This is the set of papers due to Ursula Bellugi and Edward Klima from the mid-sixties (Klima and Bellugi 1966), recently re-investigated by Klein (1982), in which it appears that Neg is initially analyzed by the child as a sentential, not a VP, operator, and hence appears prior to the modified clause. Of course, if the child himself allows negation to move in his grammar, it can hardly be the case that he is using it as a fixed point from which to determine the placement of other elements. Bellugi and Klima distinguish three major stages in the acquisition of negation.
(25) a. Stage 1: Negation appears in pre-sentential position.
     b. Stage 2: Negation appears contracted with an auxiliary, at a stage prior to that at which the auxiliary appears alone.
     c. Stage 3: Negation is used correctly.
Bellugi and Klima suggest that in the intermediate stage the auxiliary does not have the same status that it has in the adult grammar, since it appears only in the case that negation also occurs contracted on it. They suggest, rather, that the negation and the auxiliary together form a constituent headed by Neg: (NEG can (NEG not)). Thus the fact that the negated auxiliary appears prior to any occurrences of the non-negated auxiliary is accounted for by supposing that no independent auxiliary node exists as such; the initial negative auxiliary is a projection of Neg. The data corresponding to Stages 1–3 above are given in (26).
(26) a. Stage 1:
        no see Mommy
        no I go
        no Bobby bam-bam
        etc.
     b. Stage 2:
        I no leave
        I no put dress
        me can't go table
        etc.
     c. Stage 3: comparable to adult utterances
The problematic utterances from the current point of view are given in (26a). Given the assumption that the structure of these utterances is that in (27), the child appears to be lowering the negation into the VP in the transition between Stage 1 and Stage 2. This, in turn, is problematic for any view on which such elements are fixed for the child.
(27) The Klima and Bellugi (1966) analysis of Stage 1 negation:

     [S [Neg no] [S [NP me] [VP like spinach]]]
The analysis given in (27), however, is the sentential analysis of Bellugi and Klima. Recently, a new analysis has been given for the basic structure of S (Kitagawa 1986, Sportiche 1988, Fukui and Speas 1986). Kitagawa, Sportiche, and Fukui and Speas argue, on the basis of data from Italian and Japanese, that the basic D-structure of S has the subject internal to the VP, though outside the V′. The D-structure of (28a) is therefore given in (28b).

(28) a. John saw Mary.
     b. [S [NP e] [I′ I [VP [NP John] [V′ saw Mary]]]]
The internal-to-VP analysis allows theta assignment to take place internal to the maximal projection VP; it also provides for a variable position in the VP, as a predication-type structure results: NPi (VP ei (saw Mary)) (a predication-type analysis). Kitagawa also argues that certain differences in the analysis of English and Japanese follow from this analysis. En route to S-structure, the DS subject is moved out of the VP, to the subject position. Note that such a movement is necessary if the subject is to be assigned Case by Infl. A Kitagawa/Sportiche/Fukui-Speas type analysis of the basic structure of S receives striking confirmation from the acquisition data, if we assume that negation is fixed throughout the acquisition sequence, and that, throughout Stage 1 speech, it is direct theta role assignment, rather than assignment of (abstract) Case, which regulates the appearance of arguments. That is, in Stage 1 speech the negation is, as expected, adjoined off of VP as the Spec of VP. However, the subject is internal to the VP: that is, in its D-structure position. The relevant structure is then that in (29).

(29) [VP [Spec no] [VP [NP me] [V′ [V go] [NP mommy]]]]
In this structure, if we assume that theta role assignment to the subject is via the V′, and further, that abstract Case has not yet entered the system (see later discussion), then the resultant structure is precisely what would be expected, given the fixity of the specifier and the lack of subject raising. The apparent sentential scope is actually VP scope, with a VP-internal subject. At the point at which abstract Case does enter the system, the subject must be external, and appears, moved, prior to negation: Stage 2. I will return to this analysis, and to a fuller analysis of the role of Case and theta assignment in early grammars in later chapters. For the moment, we may simply note two properties of the above analysis: first, that the (very) early system is regulated purely by theta assignment, rather than the assignment of abstract Case. This is close to the traditional analysis in much of the psycholinguistic literature (e.g. Bowerman 1973) that the early grammar is “semantic”,
i.e. thematic. The second property of this analysis lies in the relation of syntactic levels, in the adult grammar, to stages in the acquisition sequence. Namely, there is a very simple relation: the two stages in the acquisition sequence correspond to two adjacent levels of representation in the synchronic analysis of the adult grammar. That is, in the "geological" pattern of surface forms L1 → L2, adjacent grammars in the child's acquisition sequence correspond to adjacent levels of representation in the adult grammar. This sort of congruence, while natural, is nonetheless striking, and suggests that rather deep properties of the adult grammar may be projected from the acquisition sequence, i.e. from the fact of development.
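The stage/level congruence can be made vivid with a toy derivation. In the Python sketch below, phrase markers are nested tuples, and the labels and the single raising step are expository assumptions: Stage 1 simply spells out the D-structure-like representation of (29), with the subject VP-internal under Neg, while Stage 2 adds the Case-driven raising of the subject.

    # D-structure as in (29): [VP no [VP [NP me] [V' go]]]; Spec is Neg.
    d_structure = ("VP", ["no", ("VP", [("NP", ["me"]), ("V'", ["go"])])])

    def terminals(node):
        """Flatten a phrase marker into its terminal string."""
        if isinstance(node, str):
            return [node]
        _, children = node
        return [w for child in children for w in terminals(child)]

    def raise_subject(tree):
        """The step forced once abstract Case enters the system: move the
        VP-internal subject out of VP, yielding [S [NP me] [VP no [V' go]]]."""
        neg, (vp_label, (subject, v_bar)) = tree[1]
        return ("S", [subject, (vp_label, [neg, v_bar])])

    print("Stage 1:", " ".join(terminals(d_structure)))                  # no me go
    print("Stage 2:", " ".join(terminals(raise_subject(d_structure))))   # me no go

The two printed strings are exactly the Stage 1 and Stage 2 surface patterns of (26), derived from a single representation plus one level-adjacent operation.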
Chapter 2

Project-α, Argument-Linking, and Telegraphic Speech
2.1 Parametric variation in Phrase Structure

In the last chapter, I suggested that the range of parametric variation across languages was tied to the difference in the specifications associated with closed class elements. This strengthened the finiteness claim of Chomsky (1981), by linking the finiteness of variation in possible grammars with another sort of finiteness: that of the closed class elements and their specifications. However, this very small range of possible parametric variation still had to be reconciled with a very different fact: the apparent "scatter" of the world's languages with respect to their external properties, so that radically different surface types appear to occur. It was suggested that this scatter was due to the interaction of the (finite and localized) theory of parametric variation in lexical items with a different aspect of the theory: that of representational levels. Slightly different parametric settings of the closed class set would give rise to different, perhaps radically different, organizations of grammars. This would include the postulate that different operations might apply at different grammatical levels cross-linguistically, as in the earlier discussion which suggested that the different role that wh-movement plays in English and Chinese (Huang 1982) — a levels difference — should be tied to some property of the agreement relation which holds between the fronted wh-element and the +/−wh feature in Comp, and that this, in turn, could be related to the differing status of agreement, a closed class morpheme, in the two languages. If this general sort of approach is correct, it may be supposed that large numbers of differences may be traced back in this way.
2.1.1 Phrase Structure Articulation
One difference cross-linguistically, then, would be traceable to a difference in the level at which a particular operation applied. If the foregoing is correct, then
this should not simply be stated directly, but instead in terms of the varying property of some element of the closed class set. What about differences in phrase structure? In the general sort of framework proposed in Stowell (1981), and further developed by many others, phrase structure rules should not have the status of primitives in the grammar, but should be replaced by, on the one hand, lexical specifications (e.g. in government direction), and, on the other, general licensing conditions. Within a theory of parametric variation, one would therefore expect that languages would differ in these two ways. On the other hand, really radical differences in phrase structure articulation cross-linguistically may be possible, at least if the theory of Hale (1979) is correct. Even if one did not adopt the radical bifurcationist view implicit in the notion of W* languages (see Hale's appendix), one might still adopt the view that degrees of articulation are possible with respect to the phrase structure of a language. The flatter and less articulated a language in phrase structure, the more closely it would approximate a W* language. Of course, the question still arises of how a child would learn these cross-linguistic differences in degree of articulation, particularly if true W* languages existed alongside languages which were not W*, but exhibited a large degree of scrambling.
2.1.2 Building Phrase Structure (Pinker 1984)
Pinker and Lebeaux (1982) and Pinker (1984) made one sort of proposal to deal precisely with this problem: how might the child learn the full range of phrase structure articulation in the presence of solely positive evidence? The answer given relied on a few key ideas. First, following Grimshaw (1981), relations between particular sets of primitives were assumed to contain a subset of canonical realizations. The possibility of such realizations was assumed to be directly available to the child, and in fact used by him/her in the labelling of the string. Thus, in the first place, the child has access to a set of cognitively based notions: thing, action, property, and so on. These correspond, in a way that is obviously not one-to-one, to the set of grammatical categories: NP, Verb, Adjective Phrase, and so on. What is the relation, if not one-to-one? According to Grimshaw, the grammatical categories, while ultimately fully formal in character, are nonetheless "centered" in the cognitive categories, so that membership in the latter acts as a marker for membership in the former: a noun phrase is the canonical grammatical category corresponding to the cognitive category "thing"; a verb (or verb phrase) is the canonical grammatical category corresponding to the cognitive category "action"; a clause is the canonical grammatical category corresponding to the cognitive category "event" or "proposition"; and so on. This
assumes an ontology rich enough to make the correct differentiations; see Jackendoff (1983) for a preliminary attempt to construct such an ontology. Crucially, the canonical realizations are implicational, not bidirectional; further, once the formal system is constructed over the cognitive base, it is freed from its reliance on the earlier set of categories. Consider how this would work for the basic labelling of a string. The simple three-word sentence in (1) would have cognitive categories associated with each of its elements.
(1) thing   act   thing    (cognitive category)
    John    saw   Mary
These would be associated with their canonical structural realizations in phrasal categories.

(2) NP      V     NP       (canonical grammatical realization)
    thing   act   thing    (cognitive category)
    John    saw   Mary
On the other hand, sentences to which the child was exposed which did not satisfy the canonical correspondences would not be assigned a structure:

(3) ?       ?         ?            (cognitive category)
    This situation resembles a morass.
A number of questions arise here, as throughout. For example, could the child be fooled by sequences which not only did not satisfy the canonical correspondences, but positively defied them? Deverbal nominals would be a good example:
(4) VP                                (canonical grammatical realization)
    event                             (cognitive category)
    the examination of the patient

In (4), the deverbal nominal recognizably names an event. Given the canonical correspondences, this should be labelled a VP, or some projection of V. But this, in turn, would severely hamper the child in the correct determination of the basic structure of the language. One way around this problem would be simply to note that deverbal nominals are not likely to be common in the input. A more principled solution, I believe, would be to further restrict the base on which the canonical correspondences are drawn. For example, within each language there is not simply a class of
nouns, roughly labelling things, but a distinguished subset of proper names. Of course, if the general theory in Lebeaux (1986) is correct, then derived nominals of the sort in (4) — i.e. nominalized processes or actions — actually are projections of V: namely, nominalized V′s or V″s, with -tion acting as a nominalizing affix, though they achieve this category only at LF, after affix raising (see Lebeaux 1986 for such an analysis).

The second property of the analysis, along with the idea that one set of primitives is related to another by being a canonical realization of the first in the different system, is that the second set is constructed over the first, is itself autonomous, and is freed from its reliance on the first in the fully adult system. It is this which allows strings like that given above (e.g. "This situation resembles a morass"), which do not obey the canonical correspondences, to be generated by the grammar. We may imagine a number of possibilities as to how the systems may be overlaid: it may be that the original set of primitives, while initially used by the grammatical system in the acquisition phase, is entirely eliminated in the adult grammatical system. This presumably would be the case in the above labelling of elements as "thing", "action", etc., which would not be retained in the adult (or older child's) grammar. On the other hand, certain sets of primitives might be expected to be retained in the adult system. In the framework of Pinker (1984), this would include the set of grammatical relations, which were used to build up the phrase structure.

In Pinker and Lebeaux (1982) and Pinker (1984), the labelled string allowed the basic structure of S to be built over it in the following way: (i) particular elements of the string were thematically labelled in a way retrievable from context (Wexler and Culicover 1980); (ii) particular grammatical functions corresponded to the canonical structural realizations of thematically labelled elements (agent → subject, patient → object, goal → oblique object, etc.); (iii) grammatical relations were realized, according to their Aspects definitions, as elements in a phrase marker: subject (NP, S), object (NP, VP), oblique object (NP, PP), and so on; (iv) the definitions in (iii) were relaxed as required, to avoid crossing branches in the PS tree. The proviso in (iv) was intended to provide for languages exhibiting a range of hierarchical structuring. Rather than specifically including each degree of hierarchical structuring as a possible setting for a substantive universal in UG, the highly modular approach of (i)–(iv) allows the interaction of the substantive universal in (iii) and the formal universal in (iv) to introduce the degree of hierarchical relaxation necessary, without specific provision having to be made for a series of differing grammars.
The provisions (i)–(iv) may be put in a "procedural" format:

(5) Building phrase structure:
    a. Thematic labelling:
       (i) Label agent of action: agent
       (ii) Label patient of action: patient
       (iii) Label goal of action: goal
       etc.
    b. Grammatical-functional labelling:
       (i) Label agent: subject
       (ii) Label patient: object
       (iii) Label goal: oblique object
       etc.
    c. Tree-building:
       (i) Let subject be (NP, S)
       (ii) Let object be (NP, VP)
       (iii) Let oblique object be (NP, XP), XP not VP, S
       etc.
    d. Tree-relaxation: If (a)–(c) require crossing branches, eliminate offending nodes as necessary, from the bottom up. Allow default attachment to the next highest node.
The combination of (c) and (d) assumes maximum structure, and then relaxes that assumption as necessary. The principle (5a) meshes with the general cognitive system, as does the node-labelling mentioned earlier. The other principles are purely linguistic, but even here the question arises of whether they are permanent properties of the linguistic system (i.e. UG as it describes the adult grammar), or localized in the acquisition system per se, as Grimshaw (1981) suggests. We leave this question open for now.

I have considered above how the basic labelling would work; consider now the general analysis. The string in (6) is segmented by the child.

(6) John hit Bill.
From the general cognitive system, the segmented entities may be labelled for their semantic content.

(7)  thing (name)   action   thing (name)
     John           hit      Bill
These cognitive terms take their canonical grammatical realization in node labels.

(8)  NP            V        NP
     thing(name)   action   thing(name)
     John          hit      Bill
These nodes, in turn, may be thematically labelled.

(9)  NP            V        NP
     thing(name)   action   thing(name)
     agent                  patient
     John          hit      Bill
The canonical realization of these theta roles is as particular grammatical relations.

(10) NP            V        NP
     thing(name)   action   thing(name)
     agent                  patient
     subject                object
     John          hit      Bill
These grammatical relations have, as in the Aspects definition, particular structural encodings.

(11) [S [NP(thing(name), agent, subject) John] [VP [V hit] [NP(thing(name), patient, object) Bill]]]
And the phrase structure tree is complete. As Pinker (1984) notes, once the structure in (11) is built, the relevant general grammatical information (e.g. S → NP VP) may be entered in the grammar. The PS rule is available apart from its particular instantiating instance,
and the system itself is freed from its reliance on conceptual or notional categories. Rather, it may analyze new instances which do not satisfy the canonical correspondences: any pre-verbal NP, regardless of its conceptual content (e.g. naming an event, rather than a thing), will be analyzed as an NP. It is precisely this property, representing the autonomous character of the syntactic system after its initial meshing with the cognitive/semantic system, which gives the proposal its power.

I have included in the phrase structure tree not only purely grammatical information (the node labels), but also other sorts of information: thematic and grammatical-relational, as well as cognitive. Should such information be included? There is some reason to believe not, at least for the cognitive categories above. Thus, while grammatical processes require reference to node labels like NP, they do not seem to require reference to cognitive categories like “thing” or “proper name”. Giving such labels equal status in the grammatical representation implies, counterfactually, that grammatical processes may refer to them. This suggests, in turn, that they should not be part of the representation per se, but part of the rules constructing the representation.

The situation is more complex with respect to the other information in the phrase structure tree. Thus in the tree in (11), it is assumed that thematic information (the thematic labels “agent”, “patient”, etc.) is copied onto the node labels directly, as is grammatical-functional information (Subj, Obj, etc.). Presumably, in a theory such as GB, the latter intermediate stage of grammatical functions would be discarded in favor of a theory of abstract Case. The question of whether the thematic labels are copied directly onto the phrasal nodes can also not be answered a priori, and is associated with the question of how exactly thematic assignment occurs in the adult grammar. In traditional analyses, theta roles were thought of as being copied directly onto the associated NP: i.e. the relevant argument NP was considered to have the thematic role as part of its feature set directly. In the theory of Stowell (1981), the NP does not have the theta role copied onto it, but rather receives its theta role by virtue of a (mini-)chain which coindexes the phrasal node with the position in the theta grid.

(12) [VP [V see (Ag, Pat_j)] [NP_j Mary]]
While Stowell’s system is in certain respects more natural — in particular, it captures the fact that case, but not theta roles, seems to show up morphologically on the relevant arguments, and hence may be viewed as a direct “spell-out” of the feature set — the acquisition sequence given here suggests that theta roles actually are part of the feature set of the relevant NP. Since I will assume here, as throughout, that the representations adopted in the course of acquisition are very closely aligned with those in the adult grammar, this suggests that theta roles actually are assigned to the phrasal NP, at least in the case of agent and patient (which are the central roles for this part of the account).
2.2 Argument-linking

The plausibility and efficacy of the above approach in learning cross-linguistic variation in phrase structure depends in part on the outcome of unresolved linguistic questions. In particular: (i) to what degree do languages actually differ in degree of articulation; (ii) to what degree may elements directly associated with the verbal head or auxiliary (the pronominal arguments of the head, Jelinek 1984) be construed as the direct arguments, with the accompanying lexical NPs considered to be simply adjuncts or ad-arguments; and (iii) what is the precise characterization of the difference between nominative/accusative and ergative/absolutive languages, or the range of languages that partially have ergative/absolutive properties (e.g. split ergative languages)?

The existence of so-called “true ergative” languages has, in particular, been used to critique the above notion that there are canonical (absolute) syntactic/thematic correspondences, and that these may be used universally by the child to determine the Grammatical Relational or abstract Case assignment in the language. Thus it is often noted that while nominative/accusative languages use the mapping principles in (13a), ergative/absolutive languages use those in (13b) (Marantz 1984; Levin 1983).

(13) a. subject of transitive    →  nominative
        subject of intransitive  →  nominative
        object of transitive     →  accusative
     b. subject of transitive    →  ergative
        subject of intransitive  →  absolutive
        object of transitive     →  absolutive
And in fact the “true ergative” language Dyirbal is assumed to have the following alignment of theta roles and grammatical relations (Marantz 1984; Levin 1983):

(14) agent    →  object
     patient  →  subject
The mapping principles in (13) are stated in terms of grammatical relations, but it is clear that even if the principles were re-stated in other terms (e.g. those relating to abstract Case), a serious problem would arise for the sort of acquisition system envisioned above, if no more were said. The reason is that the set of canonical correspondences must be assumed to be universal, with one set of primitives (case-marking) centered in another set (thematic), perhaps through the mediation of a third set (abstract Case). The linking rules must be assumed to be universal, since if they were not so assumed, they would not give a determinate value in the learning of any language, for the child faced with a language of unknown type. It is easy to see why. Let us call the (canonical) theta role associated with the subject position of intransitives t1, the canonical theta role associated with the subject position of transitives t2, and the canonical theta role associated with the object position of transitives t3. Then nominative/accusative languages use the following grouping of the theta system into the case system:

(15) t1, t2  →  nominative
     t3      →  accusative
The ergative/absolutive languages use the following grouping:

(16) t1, t3  →  absolutive
     t2      →  ergative
This is perfectly adequate as a description, but if the child is going to use theta roles to determine the (grammatical relation and) case system of the language in which he has been placed, then the position is hopeless, since the child would have to know what language-type community he was in antecedent to his postulation of where subjects were in the language. Otherwise he will not be able to tell where, e.g., subjects are in the language. But it is precisely this that the radical break in the linking principles evidenced in (15) and (16) would not allow. The division of language types in this fashion would thus create insuperable difficulties for the child, since there would be no coherent set of theta–Grammatical Relational or theta–Case mappings that he or she could use to determine the position of the subject in the language. Of course, once the language-type was determined, the mapping rules themselves would be as well, but it is precisely this information that the child has to determine.

There is, however, an unexamined assumption in this critique. It is that the situation is truly symmetrical: that is, that the child is faced with the choice of the following two linking patterns:

(17)            G0
              /    \
     nom/acc        erg/abs
     pattern        pattern
Given such an assumption, there is no way for this acquisition system to proceed. However, if the situation is not truly symmetrical — i.e. if there are other differences in the languages exhibiting ergative vs. those exhibiting nominative/accusative linking patterns — and if these differences are determined by the child prior to his/her determination of the linking pattern in (17), then the critique itself is without force. We would wish to discover, rather, how this prior determination occurs, and how it and the adoption of a particular linking pattern mesh.

In fact, there appears to be evidence for just this asymmetry: evidence that the majority (and perhaps the vast bulk) of ergative/absolutive languages are associated with a different sort of argument structure. I rely here on the work of Jelinek (1984, 1985), and associated work. Jelinek proposes a typological difference between broadly configurational languages, which take their associated NPs as direct arguments (e.g. English, French), and languages which she designates “pronominal argument languages”. In the latter type (she argues), the pronominal “clitics” associated with the verbal head or auxiliary are actually acting as direct arguments of the main predicate, and the lexical noun phrases are adjuncts or ad-arguments, further specifying the content of the argument slot. The sentential pattern of phrasally realized arguments in such languages, then, would roughly resemble the nominal pattern in English in picture-noun phrases, where all arguments may be considered to be adjuncts in relation to the head. While it is not the case that all ergative languages reveal this sort of optionality of arguments (and in particular Dyirbal does not), it does seem that
the bulk do (Jelinek 1984). If we take this as the primary characteristic of these languages, then the choice for the child is no longer the irresolvable choice of (17), but rather the following:

(18)                     G0
                   /            \
     arguments obligatory    arguments optional
                             (i.e. not arguments but ad-arguments)
            |                        |
     nominative/               ergative/
     accusative                absolutive
The “choice matrix” in (18) is undoubtedly very much simplified. Nonetheless, it appears to be the basic cut made in the data. If this is so, however, then the original decision made by the child is not that of the linking pattern used, but rather the determination of the argument status of the phrasal arguments; this, in turn, may cue the child in to the sort of language that he/she is facing.
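The two-step determination can be made concrete in a short sketch. Everything below is an illustration of the logic of (18) under the assumptions just stated, not a model of any actual acquisition procedure.

# The basic cut of (18): the optionality of phrasal arguments is
# determined first, and that determination cues the linking pattern.
# All names here are illustrative.

def determine_linking_pattern(phrasal_arguments_obligatory):
    """Obligatory phrasal arguments signal a nominative/accusative
    grammar; optional ones (ad-arguments, as in pronominal-argument
    languages) signal an ergative/absolutive grammar."""
    if phrasal_arguments_obligatory:
        return "nominative/accusative"
    return "ergative/absolutive"

# With this prior cut, the symmetrical dilemma of (17) does not arise:
# each observable property determines a unique linking pattern.
print(determine_linking_pattern(True))    # nominative/accusative
print(determine_linking_pattern(False))   # ergative/absolutive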
2.2.1 An ergative subsystem: English nominals
The general pattern suggested in (18) gets support from a rather remarkable source: English nominals. It was noted above that simple nominals in English (e.g. “picture”) have pure optionality of arguments. What has been less commented on is that deverbal English nominals have an ergative linking pattern as well, in the sense of (13b) above. Deverbal nominals from a transitive stem base have the same linking patterns as their verbal counterparts (Chomsky 1970):

(19) John’s destruction of the evidence
     John destroyed the evidence
I assume, with Chomsky (1970), that in cases in which the DS object is in subject position (the city’s destruction) Move-α has applied. What about deverbal nominals from an intransitive base? Here the data is more complex. Appearance, from the unaccusative verb appear, allows the argument to appear in either N′-internal or N′-external position, with genitive marking on the latter.

(20) the appearance of John shocked us all
     John’s appearance shocked us all
The second example in (20) is actually ambiguous between a true argument and a relation R type reading (the fact that John appeared vs. the way he looked, respectively); the latter reading does not concern us here. What about other intransitives? Surprisingly, the “internal” appearance of the single argument of intransitive verbs is not limited to unaccusatives, but occurs with other verbs as well. For example, sleep and swim, totally unexceptionable unergative (i.e. pure intransitive) verbs, allow their argument to appear both internal and external to the N′.

(21) the sleeping of John
     John’s sleeping

(22) the swimming of John
     John’s swimming
This possibility of taking the “subject” argument as internal is not limited to deverbal nominals simply taking one argument, but extends to those taking other internal arguments as well, as long as those would be realized as prepositional objects (rather than direct objects) in the corresponding verbal form.

(23) a. the talking of John to Mary (John talked to Mary)
     b. the reliance of John on Bill (John relied on Bill)
     c. the commenting of John on her unkindness (John commented on her unkindness)
Of course, deverbal nominals formed from simple transitive verbs do not allow the subject of the verbal construction to appear in the internal-to-N′ position in an of-phrase:

(24) a. *the destruction of John of the city (John destroyed the city)
     b. *the examination of Bill of the students (Bill’s examination of the students)
It appears, then, that the linking pattern for subjects in nominals differs, according to whether the underlying verbal form is transitive or not. In all deverbal nominals formed from intransitive verbs — not simply those formed from unaccusatives — the subject of the corresponding verb may be found in the nominal in internal-to-N′ position, in the of-phrase which marks direct arguments (see (23)), and without genitive marking in that position. This is totally impossible in deverbal nominals formed on a transitive verbal base.
This clearly is an ergative-like pattern, but before drawing that conclusion directly, a slightly broader pattern of data must be examined. It is not simply the case that nominals formed from intransitive verbal bases allow their (single) direct argument to appear internal to the N′; it may appear in the external position as well, as noted above.

(25) a. the sleeping of John
        John’s sleeping (in derived nominal, not gerundive, reading)
     b. the talking of John to Mary
        John’s talking to Mary
     c. the reliance of John on Mary’s help
        John’s reliance on Mary’s help
     d. the love of John for his homeland
        John’s love for his homeland
Second, genitives may appear post-posed, in certain cases:

(26) a. John’s pictures of Mary
        the pictures of Mary of John’s
     b. John’s stories
        the stories of John’s
However, we may note that when the subject is postposed, it must appear with genitive marking (or in a by-phrase).

(27) a. the pictures of Mary of John’s
        *the pictures of Mary of John
     b. John’s examination
        the examination of John’s (possessive reading)
        the examination of John (OK, but only under patient/theme reading)
A reasonable possibility for analysis is that the NP with genitive marking which appears post-head is postposed from the subject/genitive slot: this accounts for the genitive marking (see Aoun, Hornstein, Lightfoot, and Weinberg 1987, for a different analysis, in which the genitive is actually associated with a null N′ category). What appears to be the case, however, is that elements which have moved into the subject position may no longer postpose.

(28) a. John’s picture (ambiguous between possessor and theme reading, the latter derived from movement)
     b. the picture of John’s (only possessor reading)
(29) a. John’s examination (ambiguous between possessor and theme reading)
     b. the examination of John’s (only possessor reading)
Thus the nominal in (28a) is ambiguous between the moved and non-moved interpretations, while the postposed genitive in (28b) is not; similarly for (29a) and (b). Thus the following constraint appears to be true, whatever its genesis (see Aoun, Hornstein, Lightfoot, and Weinberg 1987, which makes a similar empirical observation, though it gives a different analysis).

(30) Posthead genitives may only be associated with the deep structure subject interpretation; derived genitive subjects may not appear posthead with genitive marking.
The constraint in (30) may now be used to help determine the argument structure of the intransitive nominals under investigation. It was earlier noted that the subject in such nominals appeared on either side of the head.

(31) a. the appearance of John
        John’s appearance
     b. the sleeping of John
        John’s sleeping
Which position is the DS position? Given (30), the genitive subject should count as a deep structure subject if it may postpose with genitive marking, but not otherwise. In fact, it cannot postpose:

(32) a. the appearance of John (startled us all)
        John’s appearance
        *the appearance of John’s
     b. the sleeping of John
        John’s sleeping
        *the sleeping of John’s
     c. the talking of John to Mary
        John’s talking to Mary
        *the talking to Mary of John’s
The inability of the genitive subject of the intransitive to postpose then constitutes evidence that the subject position is not the DS position of the single direct argument, but rather the internal-to-N′ position, and that the constructions in which that element does appear in subject position are themselves derived by preposing. The argument-linking pattern for English nominals would then not be some sort of mixed system, but a true ergative-style system, given in (33).

(33) English nominals:
     t1 (subject of intransitives)  →  internal position
     t2 (object of transitives)     →  internal position
     t3 (subject of transitives)    →  external position
This, in turn, strongly suggests that the argument-linking pattern split found between the nominative/accusative languages and ergative/absolutive languages should not itself be the primary “cut” faced by the child, but rather that between languages (and sub-languages) which obligatorily take arguments, and those which do not. Even in English, a strongly nominative/accusative language, a sub-system exists which is basically ergative in character: this, not coincidentally, corresponds to the subsystem in which the arguments are optional, the nominal system.

2.2.2 Argument-linking and Phrase Structure: Summary
To summarize: I have suggested (following Pinker 1984) the need for maximally general linking rules, associating thematic roles with particular grammatical functions, or abstract case. Pinker has suggested the term “semantic-bootstrapping” to apply to such cases; it would be preferable, perhaps, to consider this a specific instance of a more general concept of analytic priority.

(34) Analytic priority: A set of primitives a1, a2, … an is analytically dependent on another set b1, b2, … bn iff bi must be applied to the input in order for ai to apply.

(35) The set of theta-theoretic primitives is analytically prior to the set of Case-theoretic primitives.
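Stated more compactly (the notation below is mine, introduced only as a gloss on (34)–(35), not as part of the original formulation):

% A gloss of (34)-(35); Dep(A, B) reads "A is analytically dependent on B".
\[
\mathrm{Dep}(A,B) \;\iff\; \forall a_i \in A\;\forall x\;
  \bigl[\, a_i \text{ applies to } x \;\Rightarrow\;
  \text{some } b_j \in B \text{ has already applied to } x \,\bigr]
\]
\[
\text{(35)}:\quad \mathrm{Dep}(\text{Case-theoretic primitives},\;
  \Theta\text{-theoretic primitives})
\]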
Semantic-bootstrapping would thus be a particular instance of analytic priority; another such example would be Stowell’s (1981) derivation of phrase structure ordering from principles involving Case. In order for a set of primitives to be analytically prior in the way suggested above, and for this to aid the child in acquisition, it must be the case that the analytic dependence is truly universal, since if this were not so, the child would not be able to determine the crucial features of his or her language (in particular, the alignment of the analytically dependent primitives a1, a2, … an) on the basis of the analytically prior set. It is for this reason that the existence of ergative languages constitutes an important potential counterexample to the idea of analytic priority, and its particular instantiation here, since it would mean that there were no linking regularities to which the child could antecedently have access.
However, there is fairly strong evidence that the languages differ not only in their linking patterns, but in their argument structure in general: in particular, that ergative/absolutive languages are of a “pronominal argument” structure type, with elements in the verb itself satisfying the projection principle (Jelinek 1984). The choice faced by the child would then be that given in (18), repeated here below.

(36)                     G0
                   /            \
     obligatory arguments    optional arguments
            |                        |
     nominative/               ergative/
     accusative                absolutive
The association of optionality with ergativity was supported by the fact that such languages do in general show optionality in the realization of arguments, and, further, by the extensive existence of ergative splits. Most remarkably of all, the English nominal appears to show an ergative-like pattern in argument-linking: that is, the ergative pattern shows up precisely in that sub-system in which arguments need not be obligatorily realized. This suggests, then, that the idea of analytic priority, and in general the priority of a given set of primitives over another set, is viable.

This general idea, however, has a range of application far more general than the original application in terms of acquisition; moreover, there is some reason to believe that while the original proposals involving semantic bootstrapping or analytic priority were made in an LFG framework, they would be able to accommodate themselves, in a more interesting form, in a multi-leveled framework like Government-Binding theory. While the proposal that the grammar — and in particular, the syntax — is levelled has been common at least since the inception of generative grammar (Chomsky 1955/1975), the precise characterization of these levels has remained somewhat vague and underdetermined by evidence (van Riemsdijk and Williams 1981). Within the Government-Binding theory of Chomsky (1981), this problem has become more interesting and yet more acute, since while it is crucial that particular subtheories (e.g. Binding Theory) apply at particular levels (e.g. S-structure), the general principles which would force a particular module to be tied to a particular level are by no means clear, nor is there any general characterization which would lead one to expect that a particular module or subtheory should apply at a single level (Control theory, Binding theory), while another subtheory must be satisfied throughout the derivation (theta theory, according to the Projection Principle). The problem is intensified by the fact that, given that
a single operation, Move-α, is the only relation between the levels, and the content of this operation is for the most part — perhaps entirely — retrievable from the output, the levels themselves may be collapsed, at least conjecturally, so that the entire representation is no longer in the derivational mode, but rather contains all its information in a single-level representation, S-structure (the representational mode). Chomsky (1981, 1982), while opting ultimately for a derivational description, notes that in a framework containing only Move-α as the relation between the levels, the choice between these modes of the grammar is quite difficult, and appears to rest on evidence which is hardly central.

This indeterminacy of representational style may appear to be quite unrelated to the other problem noted in Chapter 1, the lack of any general understanding of how acquisition, and the acquisition sequence in its specificity, fits into the theory of the adult grammar. I believe, however, that the relation between the two “problem areas” is close, and that an understanding of the acquisition sequence provides a unique clue to the theory of levels. In particular, at a first approximation, the levels of representation simply correspond, one-to-one, to the stages of acquisition. That is:

(37) General Congruence Principle: Levels of grammatical representation correspond to (the output of) acquisitional stages.
We will return to more exact formulations, and general consequences, throughout. For the present, we simply note that if the General Congruence Principle is correct, then the idea of analytic priority, and the possibility that the Case system in some way “bootstraps” off of the thematic system would be expected to be true not only of the acquisition sequence, but reflected in the adult grammar as well. Before turning to the ramifications of (37), and its fuller specification, a different aspect of phrase structure variance must be considered.
2.3 The Projection of Lexical Structure

In the section above, one aspect of grammatical variance was considered, from the point of view of a learnability theory: namely, the possibility that languages used radically different argument linking patterns. Such a possibility would run strongly against Grimshaw’s (1981) hypothesis, that particular (sets of) primitives were “centered” in other sets, being their canonical structural realizations in a different syntactic vocabulary (e.g. cognitive type: phrasal category, thematic role: grammatical relation, and so on). It was argued, however, that Grimshaw’s
hypothesis could be upheld upon closer examination, and that the divergence in linking pattern was not the primary cut in the data: that being rather tied to the difference in optionality and obligatoriness of arguments. Since this divergence could be learned by the child antecedently to the linking pattern itself, the learnability problem would not arise in its most extreme form, for this portion of the data. Moreover, bootstrapping-type accounts were found to be a subcase of a broader set: those involving analytic priority, one set of primitives being applied to the data antecedently to another set. This notion would appear to be quite natural from the point of view of a multi-leveled theory like Government-Binding theory.

We may turn now to another aspect of the structure building rules in (5), where structure is assumed to be maximal, and then relaxed as necessary to avoid crossing branches. The structure-building and structure-labelling rules are repeated below:

(38) Building phrase structure:
     a. Thematic labelling:
        i)   Label agent of action: agent
        ii)  Label patient of action: patient
        iii) Label goal of action: goal
        (Note: the categories on the left are cognitive, those on the right are linguistic.)
     b. Grammatical Functional labelling:
        i)   Label agent: subject
        ii)  Label patient: object
        iii) Label goal: oblique object
        etc.
     c. Tree-building:
        i)   Let subject be (NP, S)
        ii)  Let object be (NP, VP)
        iii) Let oblique object be (NP, XP), XP not VP, S
     d. Tree-relaxation: If (a)–(c) requires crossing branches, eliminate offending nodes as necessary, from the bottom up. Allow default attachment to the next highest node.
This degree of relaxation may be assumed to occur either on a language-wide or on a construction-by-construction basis (Pinker 1984). To the extent to which the interaction of the rules in (38) is accurate, they would allow the learner to determine a
range of possible structural configurations for languages, on the basis of evidence readily accessible to the child: surface order. Finally, while it is not the case that the grammars at first adopted would be a direct subset of those adopted later on (assuming some “relaxation” has occurred), it would be the case that the languages generated by these grammars would be in a subset relation.

These attractive features notwithstanding, there are potential empirical and theoretical complications, if a theory of the above sort is to be fleshed out and made to work. The problem is perhaps more acute if the theory is to be transposed from an LFG to a GB framework. This is because LFG maintains grammatical functions as nonstructurally defined primitives; the lack of articulation in phrase structure in “flat” languages is still perfectly compatible with a built-up and articulated functional structure. Relevant constraints may then be stated over this structure. This possibility is not open in GB (though see Williams 1984, for a discussion which apparently allows “scrambling” in the widest sense — perhaps including flattening — as a relation between S-structure and the surface).

This difficulty reaches its apex in the syntactic description of nonconfigurational languages. In the general GB framework followed by Hale (1979), and also adopted here, no formal level of f(unctional)-structure exists. Nonetheless, the descriptive problem remains: if there is argument structure in such languages, and rules sensitive to asymmetries in it, then there must be some level or substructure which has the necessary degree of articulation. If we assume with Hale that there are languages in which this degree of syntactic articulation does not exist at any level of phrasal representation (though this set of languages need not necessarily include languages such as Japanese, where it may be the case that Move-α has applied instead; see Farmer 1984; Saito and Hoji 1983 for relevant divergent opinions), then this articulation must still be present somewhere. Hale’s original proposal (1979) was that the necessary structure was available at a separate level, lexical structure, at which elements were coindexed with positions in the theta structure of the head. It was left unclear in Hale’s proposal what precisely the nature of lexical structure would be in configurational languages. Because of this, in part, the proposal was sharply questioned by Stowell (1981, 1981/1982), essentially on learnability grounds. Stowell noted the difficulty in supposing that languages represented their argument structure in two radically different ways, phrase-structurally or in lexical structure: how could the child tell the difference? Would lexical structure, as Hale defined it, just remove itself in languages which didn’t use it?

Let us sketch one way through this, then step back to consider the consequences. We may recast Hale’s original proposal in such a way as to escape
some of these questions, using a mechanism introduced by Stowell himself: the theta grid. Suppose we identify Hale’s lexical structure simply with the theta grid itself. Whatever degree of articulation is necessary to describe languages of this type would then have to be present on the theta grid itself; the deployment of the phrasal nodes would themselves be flat. Rather than representing the lexical-thematic information on a grid, as Stowell suggests, let us represent it on a small (lexical) subtree.

(39) hit (lexical structure): [V [N(agent)] [V [V hit] [N(patient)]]]

It is these positions, not positions in a grid, which are linked to different arguments in a full clausal structure:

(40) [S [NP_i The man] [VP [V [N_i(agent)] [V [V hit] [N_j(patient)]]] [NP_j the boy]]]
With respect to theta assignment, this representation and Stowell’s would behave identically. However, there are key differences between the two representations. First, the representation of thematic positions as actual positions on a subtree gives them full syntactic status: this is not clearly the case with the grid representation. Second, the tree is articulated in a way that the grid is not: in particular, the internal argument is represented as inside a smaller V than the external argument. This means that the internal/external difference is directly and configurationally represented, at the level of the “grid” itself (no longer a grid, but rather a subtree). There is some reason to think that theta positions may have the “real” status given to them here. It would allow a very clear representation of
clitics, for example: they would simply be lexicalized theta tree positions (perhaps binding phrasal node empty categories). Similarly, the “pronominal arguments” of pronominal argument languages would be lexically realized elements of these positions on the small theta subtree. And, finally, the possibility of such a lexical subtree provides a solution to a puzzle involving noun incorporation. In the theory of Baker (1985), noun incorporation involves the movement of the nominal head (N) of an NP into the verb itself, incorporating that head but obligatorily not retaining specifiers, determiners, and other material.

(41) a. I push the bark.
     b. I bark-push. (incorporated analysis)
     c. *I the bark-push.
Assuming the basics of Baker’s theory, we might ask why the incorporated noun is restricted to the head, and specifically the N0 category: why, for example, it cannot include specifiers. Baker suggests that this is because movement is restricted to either maximal or minimal projections. This representation, however, suggests a different reason: the noun incorporation movement is a substitution operation, rather than an adjunction operation. As such, the moved category must be of the right type to “land” inside the word: in particular, it must not be of a bar-level greater than that available as a landing site. Since the landing site is a mere N, not an N′ or NP, the only possible incorporating elements are of the X0 level. I will henceforth use the theta subtree rather than the theta grid, except for reasons of expository convenience.

2.3.1 The Nature of Projection
The Grimshaw (1981) and Pinker (1984) proposal contained essentially two parts. The first was that certain aspects of argument-linking are universal, and it is this which allows the child to pick out the set of subjects in his or her language. I have attempted above to adopt this proposal intact, and in fact to expand on it, so that the principles figure not simply in the mapping principles allowing the child to determine his or her initial grammar, but in the synchronic description of the grammar as well. The second part of the proposal is concerned not so much with argument-linking as with the building of phrase structure. Here I would like to take a rather different position from that considered in the Grimshaw/Pinker work. In particular, I would like to argue that the Projection Principle, construed as a continually applying rule Project-α, plays a crucial role. Recall the substance of the proposal in (38), where arguments are first
thematically labelled, then given grammatical functions, and then given PS representations, relaxed as necessary to avoid crossing branches. From the point of view of the current theory (GB), there are aspects of this proposal which are conceptually odd or unexpected, and perhaps empirical problems as well. First, the tree-building rules given above require crucial reference to grammatical relations. In most versions of Government-Binding Theory, grammatical relations play a rather peripheral role, e.g. as in the definition of function chains (Chomsky 1981). Second, the tree representation itself is given an unusual prominence, in that it is precisely the crossing of branches which forces a flattening of structure. This extensive reliance on tree geometry runs against the general concept of theories like Stowell (1981). Further, there is an empirical problem. Given the possibility that two different sorts of languages exist, those which are truly flat, and those which are not but have extensive recourse to Move-α — such at least would be the conclusion to be drawn by putting together the proposals of Hale (1979) and Jelinek (1984), on the one hand, and Saito and Hoji (1983) on the other — the simple recourse to flattening in the case in which the canonical correspondences are not satisfied cannot possibly be sufficient. Finally, it is odd again that the Projection Principle, which plays such a large role in the adult grammar, should play no role at all in the determination of early child grammars.

We may pose the same questions in a more positive way. The child starts out with a one-word, lexical-looking grammar. From that, he or she must enter into phrasal syntax. How is that done? Let us give the Projection Principle, construed as an operation as well as a condition on representations, central place. In particular, let us assume the following rule:

(42) Project-α

Project-α holds at all levels of representation; it is a condition on representations. However, it is also a rule (or principle) which generates the phrasal syntax from the lexical syntax, and links the two together. Thus, looking at the representation from the head outward, we may assume that the phrasal structure enveloping it actually is projected from the lexical argument structure. This relies crucially, of course, on the theta subtree representation introduced above.
(43) lexical form: [V [N(agent)] [V [V hit] [N(patient)]]]

     —Project-α→

(44) phrasal projection: [Vx NP_1 [Vx [V [N_1(agent)] [V [V hit] [N_2(patient)]]] NP_2]]
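The operation can be rendered concretely as a small tree-to-tree function. The Node class and all names below are assumptions of this sketch, not the text’s formalism; the point is only that (44) wraps phrasal argument slots around the untouched lexical subtree of (43).

# A minimal, illustrative sketch of Project-alpha over the subtree in (43).

class Node:
    def __init__(self, label, children=None, word=None):
        self.label, self.children, self.word = label, children or [], word
    def __repr__(self):
        if self.word is not None:
            return f"[{self.label} {self.word}]"
        if not self.children:
            return f"[{self.label}]"
        return "[" + self.label + " " + " ".join(map(repr, self.children)) + "]"

# Lexical entry for "hit", as in (43).
hit = Node("V", [Node("N", word="agent"),
                 Node("V", [Node("V", word="hit"), Node("N", word="patient")])])

def project_alpha(lexical_subtree):
    """Project the lexical subtree into phrasal syntax, as in (44):
    an NP slot is paired with each argument position, and the lexical
    subtree itself is carried over intact."""
    np1 = Node("NP")    # phrasal slot linked to the agent position
    np2 = Node("NP")    # phrasal slot linked to the patient position
    return Node("Vx", [np1, Node("Vx", [lexical_subtree, np2])])

print(project_alpha(hit))
# [Vx [NP] [Vx [V [N agent] [V [V hit] [N patient]]] [NP]]]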
In the representation in (43)–(44), the lexical representation of the verbal head has projected itself into the phrasal syntax, retaining the structure of the lexical entry. In effect, it has constructed the phrasal representation around it, “projecting out” the structure of the lexical head. Into this phrasal structure, lexical insertion may take place. I have left it intentionally vague, for now, what the level of the phrasal projections which are the output of “Project-α” is, calling them simply Vx. Similarly, the question of whether the thematic role (agent, patient, etc.) projects has been left undetermined (no projection is shown above).

There are four areas in which the above representation may be queried, or its general characteristics further investigated:

i) Given that the phrase structure is already articulated, does not the addition of an articulated lexical representation, in addition to the PS tree, introduce a massive redundancy into the system?

ii) Assuming that Project-α does occur, is all information associated with the lexical structure projected, or just some of it? Further, are there any conditions under which Project-α is optional, or need not be fully realized?

iii) Given that Project-α is an operation, as well as a condition on ultimate representations, is there any evidence for pre-Project-α representations, either in acquisition or elsewhere?
iv) How might the typology of natural languages, in particular, Hale’s (1979) conjecture that nonconfigurational languages are flatter in structure (i.e. less articulated with respect to X′-theory), be associated with differences in Project-α?

In the following sections, I will be concentrating on questions (ii)–(iv). The question — or challenge — posed in (i), I would like to consider here. It requires a response which is essentially two-pronged. The first part concerns the nature of the lexical representation that is employed: that is, as a subtree with articulated argument structure, including potential order information (e.g. the internal argument is on the right), rather than the more standard formats with the predicate first, and unordered sets of arguments (Bresnan 1978, 1982; Stowell 1981). In part, the problem posed here is simply one of unfamiliarity of notation — an unordered set of arguments may be converted into a less familiar tree structure with no loss of information — but in part the most natural theoretical commitments of the notations will be different, and the notation adopted here requires a defense. I have suggested above one line of reasoning for it: namely, by allowing the theta grid real syntactic status, the placement of elements like clitics (and perhaps noun-incorporation structures) can be accounted for. Two other consequences are these: the notion of “Move-α in the lexicon” (Roeper and Siegel 1978; Roeper and Keyser 1984) would be given a natural format if tree structures are assumed, perhaps less so in the more standard Bresnan–Stowell type notation. Moreover, making the usual assumptions about c-command and unbound traces (namely, that traces must be bound by a c-commanding antecedent), the tree notation makes a prediction: externalization rules in the lexicon should be possible, but internalization rules should not. This is because the latter rule would be a lowering rule, leaving an unbound trace; the externalization rule would not.
(45) Externalization:
     [V [N_i(agent)] [V [V melt] [N_i(patient)]]]
     (upward movement, from the internal to the external position: the trace is bound by a c-commanding antecedent)

(46) Internalization:
     *[V [N_i(agent)] [V [V melt] [N_i(patient)]]]
     (downward movement, from the external to the internal position: the trace is left unbound)
Aside from this empirical consequence, which falls out automatically from the notation and requires verification one way or the other, the tree notation has a further consequence, bearing on a theoretical proposal of Travis (1984) and Koopman (1984). Travis and Koopman suggest that there are two distinct processes, Case assignment (or government) and theta assignment (or government), and that each of these processes is directional. Thus, in the theory of Travis (1984), these processes may be opposite in directionality for some categories (e.g. V in Chinese), with Move-α occurring obligatorily as a consequence. The notion that case-assignment is a directional process is a natural one; the notion that theta-assignment is directional is perhaps less so, at least under the version of theta assignment outlined in Stowell (1981), where theta roles are assigned under coindexing with positions in the grid. Note, however, that given the tree-type representation of lexical entries above, where order information is included, the idea that theta government is directional becomes considerably more natural. This is because, to the extent to which Project-α faithfully projects all the information associated with the lexical entry, the order information encoded in the lexical entry would be expected to be projected as well. Thus the Koopman–Travis proposal and the proposal for the format of lexical entries suggested here mutually reinforce each other, if it is assumed that Project-α is a true projection of the information present in the lexical representation.

This partially answers the question posed by (i): the existence of “redundancy” between the chosen lexical representation and that given in the syntax. While the two representations are close to each other in format and information given, this is precisely what would be expected under interpretations of the Projection Principle in which the projected syntactic information is faithfully projected from the lexicon. In fact, we may use the syntactic information — so the Projection Principle would tell us — to infer information about the structure of the lexical entry, given a strict interpretation of that principle.

There is, however, a different aspect of the problem. The above argument
serves to weaken criticisms about redundancy based on the similarity between the lexical representation and the syntactic one: in fact, just such redundancy or congruence might be expected, given the Projection Principle. It might still be the case that both sorts of information, while present in the grammar, are not present in the syntactic tree itself. This would represent a different sort of redundancy. The output of Project-α given above includes both the lexical information and the syntactic (as does Stowell’s theta grid).

(47) [V [N(agent)] [V V [N(patient)]]]

     —Project-α→

     [VP NP [V′ [V [N(agent)] [V V [N(patient)]]] NP]]
The arguments for this sort of reduplication of information are essentially empirical. They depend ultimately on the position that implicit arguments play in the grammar. In many recent formulations (e.g. Williams 1982), implicit arguments are conceived of as positions in the theta grid not bound by (coindexed with) any phrasal node. As such, they may be partially active in a syntactic representation: co-indexed with the PRO subject of a purposive, or an adverbial adjunct (following Roeper 1986), even though the position associated with them is not phrasally projected.

(48) a. The boat_i [VP was [V ag_j [V sunk patient_i]] t_i [PRO_j to collect the insurance]].
     b. The piano_i [VP was [V ag_j [V played theme_i]] t_i (PRO_j nude)].
Of course, if such positions are available in the syntax, even partially, they must be present in the representation.

2.3.2 Pre-Project-α representations (acquisition)
The theory outlined above has two central consequences. First, the Pre-Project-α representation has a structure which is fully syntactic, a syntactically represented subtree. Second, this subtree is available in principle prior to the application of Project-α, since the latter is interpreted both as an operation, and as a constraint
on the outputted representations. In the normal course of language use, no instances of Pre-Project-α representations would be expected to be visible, since one utters elements which are sentences, or at least part of the phrasal syntax, and not bare words. However, this need not be the case with the initial stages of acquisition, if we assume, as in fact seems to be the case, that the child starts out with a “lexical-looking” syntax, and only later moves into the phrasal syntax. The idea that the child may be uttering a single lexical item, even when he or she is in the two (or perhaps three) word stage, becomes plausible if we adopt the sub-tree notation suggested earlier. It was suggested above that a verb like want, in its lexical representation, consists of a small subtree, with the positions intrinsically theta-marked.

(49) want: [V [N(agent)] [V [V want] [N(theme)]]]
There were shown above a number of constructions from Braine (1963, 1976) from very early speech. Among them were the want constructions from Steven:

(50) want baby
     want car
     want do
     want get
     want glasses
     want head
     want high
     etc.
These appear directly after the one-word stage. One possibility of analysis is that these are (small) phrasal collocations. Suppose we assume instead that they are themselves simply words: the lexical subtree in (49), with the terminal nodes (or the object terminal node) filled in.
(51) [V [N(agent)] [V [V want] [N(theme) baby]]]
Then the child is, at this point, still only speaking words, though with the terminals partly filled in. Can other constructions be analyzed similarly? It appears so, for the noun-particle constructions (Andrew, word combinations), if we assume that a particle subcategorizes for its subject.
(52) [P [N(theme) boot] [P off]]
A different, interesting set is the collocations of a closed class referring element and a following noun. These have often been called “structures of nomination”:

(53) that Dennis
     that doll
     that Tommy
     that truck
     etc.

     it bang
     it checker
     etc.
A similar set involves a locative closed class subject, followed by various predicative nouns.

(54) Steven word combinations:
     there ball      here bed
     there book      here checker
     there doggie    here doll
     etc.            etc.
From the point of view of the current proposal, these examples are potentially problematic on at least two grounds. First, we have been assuming, essentially
following Braine, that the initial structures were pivot-open (order irrelevant), where this distinction is close to that of head-complement or subcategorizer-subcategorized. The set of pivots (heads) is small compared to the set of elements that they “operate on”. However, under normal assumptions, it is the predicate of a simple copular sentence (which the sentences in (53) and (54) correspond to) which is the head and subcategorizer, at least in the semantic sense (though INFL may be the head of the whole clause).

(55) [S [NP John] [H is a fool]]
The data in (53) and (54), however, suggest the reverse. In all these cases it is the subject which is closed class and fixed (the pivot), and the predicative element freely varies. Trusting the acquisition data, we must then say that it is the subject which is acting as the head or operator, and that it is “operating on” the open element, for simple subject-copula-predicative phrase constructions, though not generally:

(56) i)   here __
     ii)  there __
     iii) that __
     iv)  it __
This suggests in turn that simple sentences may differ with respect to their semantic and syntactic “headedness”, depending on whether they are copular or eventive, headed by a main verb predicate. In the latter structures (e.g. want X), the main verb seems to act as the pivot or functor category; in the former (e.g. here X, it X), the subject seems to act in a similar manner. This syntactic fact from early speech is in accord with a semantic intuition: namely, that eventive sentences are in some sense “about” the event that they designate, while copular sentences are about the subject. Precisely how these semantic intuitions may be syntactically represented, I leave open.

In effect, the structures in (56) are like small clause structures, without the copula. In the literature, two sorts of small clauses are often distinguished: simple small clauses and resultatives. The former are shown in (57a), the latter in (57b).
(57) a. John ate the meat (PRO nude)
        I consider that a man
     b. I painted the barn black
        I want him out
Interestingly, both of these structures are available in the acquisition data above, but with different classifications of what counts as the pivot. Corresponding to the simple small clauses are the structures of nomination in (58a), where the subject acts as pivot. Corresponding to the resultatives are the noun-particle constructions given in (52), repeated as (58b).

(58) a. that __
        it ___
        here __
     b. __ off
The former have their pivot (head) to the left: it is the subject. The latter have their pivot (head) to the right. This suggests that, semantically, the subject “operates on” (or syntactically licenses) the predicate in the constructions in (58a) in the child grammar (and, ceteris paribus, in the adult grammar as well), while the reverse occurs in (58b). This is additional evidence for the speculations directly above about differences in argument structure in these types of sentences, both at a broad semantic level and syntactically.

2.3.3 Pre-Project-α representations and the Segmentation Problem
The section above provides a plausibility argument that the initial representations are lexical in character, and that the pivot is to be identified with the head of a Pre-Project-α representation, while the open element is a direct realization of the position in the argument structure. I will consider here some additional data, and a constraint relevant to that determination. One aspect of the analytical side of the child’s task of acquisition in early speech must be the problem of segmentation. The child is faced with a string of elements, largely undetermined as to category and identity, and must, from this, segment the string sufficiently to label the elements. Some part of this task may be achieved by hearing individual words in isolation (and thus escaping the segmentation task altogether), but it is clear that the vast majority of the child’s vocabulary must be learned in syntactic context, through some sort of segmentation procedure. It has often been assumed that the segmentation task requires extensive
reliance on phrase structure rules. Thus given the phrase structure rule in (59), and given the string in (60) with the verb analyzed but the NP object not analyzed yet by the child, the application of the phrase structure rule (59) to the partially labelled string in (60) would be sufficient to determine the category of the object.

(59) VP → V NP
(60) [V see] [? the cat]  →  [V see] [NP the cat]
Similarly, given the phrase structure rule in (61), and the partially analyzed string in (62), the identity of the second element may be determined.

(61) NP → Det N
(62) [Det an] [? albatross]  →  [Det an] [N albatross]
While these simple examples seem to allow for a perspicacious identification of categories on a phrase structure basis, both empirical and theoretical complications arise with this solution when the problem is considered at a more detailed level. The empirical problems are of two sorts: the indeterminacy of category labelling, given the optionality of categories in the phrase structure rule, and the mislabelling of categories, with potentially catastrophic results. An example of the former can be seen very easily at both the VP and NP level. A traditional, Jackendoffian expansion of the VP might be something like the following:

(63) VP → V (NP) (NP) (PP) (S′) …
And that of the NP is given in (64).

(64) NP → (Det) (Adj) N (PP) (PP) (S′) …
Consider the child’s task. He or she has isolated the verb in some construction. A complement appears after it. What is the complement’s type? The expansion rule in (63) is nearly hopeless for this task. Suppose that the child has a relatively complete VP expansion. This is then laid over the partially segmented string:

(65) V (NP) (NP) (PP) (S′)
     see beavers
(66) beavers: NP? PP? S′?
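The indeterminacy can be made vivid by simple enumeration. The sketch below is illustrative only; the category inventory is that of (63), and the function name is mine.

# Which categories could a single unanalyzed complement occupy, given
# VP -> V (NP) (NP) (PP) (S')? Any one optional slot is consistent.

from itertools import combinations

OPTIONAL_SLOTS = ["NP", "NP", "PP", "S'"]

def possible_labels(n_unknown_complements):
    labelings = set()
    for indices in combinations(range(len(OPTIONAL_SLOTS)),
                                n_unknown_complements):
        labelings.add(tuple(OPTIONAL_SLOTS[i] for i in indices))
    return labelings

# "see beavers": one unknown complement, three possible labels.
print(possible_labels(1))   # {('NP',), ('PP',), ("S'",)}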
The result, given in (66), is nowhere near determinate. In fact, the unknown element may be placed as any of the optional categories: this extreme indeterminacy makes the outcome of the procedure very unclear. The same holds for the nominal expansion. Suppose that the child hears the following:

(67) American arrogance
This has two analyses under his/her grammar, perhaps with additional specification of the determiner in (68b).

(68) a. ( (American)A (arrogance)N )NP
     b. ( (American)Det (arrogance)N )NP
Again, the category assigned by the phrase structure rule is indeterminate, and perhaps misleading. It may be objected at this point that the above criticism, while correct as far as it goes, does not take account of the fact that additional evidence may be brought to bear to lessen the indeterminacy of the analysis. For example, in (66), the category of beavers may be initially placed as (NP, PP, S′), but the choice between these is drastically narrowed by the fact that beavers name things, and the canonical structural realization of things is as NPs. However, while this assertion is correct, it allows a mode of analysis which is too powerful, if the phrase structure rule itself is still assumed to play a central role. If the child is able to segment the substring beavers from the larger string, and if the child knows that this labels a thing, and is able to know that things are NPs (in general), then the category of beavers can itself be determined by this procedure, without reference to the phrase structure analysis at all.

In fact, things are even more difficult for the phrase structure analysis than has so far appeared. We have been assuming thus far that the child has a full phrase structure analysis including all optional categories, and has applied that to the input. Suppose that, as seems more likely, the initial phrase structure grammar is not complete, but only has a subset of the categories. The VP, let us say, has only the object position; the NP has a determiner, but no adjective. Let us further assume that these categories have so far been analyzed as obligatory — i.e. no intransitive verbs (or an insufficient number of them) have appeared in the input, and similarly for determinerless nouns.

(69) VP → V NP
     NP → Det N
While it is obviously not the case that every child will have the phrase structure rules in (69) at an early stage, the postulation of these rules would not seem out of place for at least a subset of the children. Consider now what happens when the child is faced with the following two instances. (70)
(70) a. go to Mommy
     b. Tall men (are nice)
The application of the PS rules in (69) to the data in (70) gives the following result:
(71) ( (go)V ( (to)D (Mommy)N )NP )
(72) ( (tall)D (men)N )NP
To is mislabeled as a determiner, and to Mommy as an NP; tall is also mislabelled as a determiner. Of course, this misidentification of categories is not recoverable on the basis of positive evidence (the category would just be considered ambiguous); worse, the misidentification of a category would not be local, but would insinuate itself throughout the grammar. Thus the misidentification of to as a determiner might result in the following misidentification of the true determiner category following it, and in the following misidentification of the selection of other verbs:
(73) a. to this store
     b. ( (to)D (this)A (store)N )NP
(74) ( (talk) ( (to)D (Mary)N )NP )
It is precisely this indeterminacy of analysis, and these possibly misleading analyses, which led Grimshaw (1981) and Pinker (1984) to adopt the notion that canonical structural realizations play a crucial role in early grammars, with the analytic role of phrase structure rules reduced. However, the Grimshaw/Pinker analysis does not go far enough. For the later stages of grammars, the optionality of elements in the phrase structure rule would make the output of analytic operations involving them useless (giving multiple analyses), while in the early stages, such analyses could be positively misleading, with erroneous analyses triggering other misanalyses in the system. Let us therefore eliminate their role entirely:
(75) No analytic operations occur at the phrase structure level.
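The force of the indeterminacy behind (75) can be made concrete with a small computational sketch. The following Python fragment is purely illustrative (no claim is made that the child computes in this way): it enumerates the ways in which unknown post-verbal constituents can be matched against the optional slots of the expansion in (63).

from itertools import combinations

# Jackendoff-style expansion (63): the head V is obligatory,
# the post-verbal categories are all optional.
OPTIONAL = ["NP", "NP", "PP", "S'"]

def possible_labels(n_unknowns):
    """Enumerate every labelling of n unknown post-verbal
    constituents by the optional slots, preserving their order."""
    labellings = set()
    for slots in combinations(range(len(OPTIONAL)), n_unknowns):
        labellings.add(tuple(OPTIONAL[i] for i in slots))
    return sorted(labellings)

# 'see beavers': one unknown constituent after the verb.
print(possible_labels(1))   # [('NP',), ('PP',), ("S'",)]

Run on the string in (65), the overlay assigns beavers any of NP, PP, or S′: precisely the indeterminacy recorded in (66).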
A generalization such as (75) would be more or less expected, given Stowell’s (1981) elimination of phrase structure rules from the grammar. It is nonetheless
satisfying that the elimination of a component in the adult grammar is not paired with its ghostly continuance in the acquisition system. The segmentation problem, however, still remains. Here I would like to adopt a proposal, in essence a universal constraint, from Lyn Frazier (personal communication):
(76) Segmentation Constraint: All analytic segmentation operations apply at the word level.
Let us suppose that (76) is correct. Suppose that the child hears a sentence like the following:
(77) I saw the woman.
Suppose that the child has already identified the meaning of see, and thus the lexical thematic structure associated with it. According to the discussion above, this is a small lexical sub-tree.
(78) (V (N agent) (V V (N theme)))
The subtree in (78) is applied to the input. In the resultant, the closed class determiner, which is what causes the N to project up to the full maximal node (see later discussion), is dropped out.
(79) (V (N agent) (V (V saw) (N theme woman)))
I have included the subject as optional in (78) partly for syntactic reasons (it is by no means clear that the subject is part of the lexical representation in the same sense as the object, see Hale and Keyser 1986a, 1986b), and partly because of the acquisition data (the subject appears to be optional in child language, but we may conceive of this as due to some property of the lexical entry, or the case
system, rather than due to the existence of pro-drop: see later discussion). The dropping of the determiner element in the child's speech falls out immediately if we assume that the child is speaking words (which contain simple zero-bar-level nominal constituents); this assumption is carried over from the discussion of the initial utterances. The isolation of the nominal category (woman, here) occurs at the word or word-internal level. Frazier's segmentation constraint therefore fits in well with the fact that the early grammars drop determiner elements. If such elements are part of the phrasal syntax (i.e. project up to the XP, or cause the head to so project), and if such syntax is not available in these very early stages, then the dropping of the determiner elements is precisely what would be expected. Further, the segmentation of early strings becomes a morphological operation, which is surely natural.
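This word-level procedure can be given a minimal sketch. In the fragment below, the closed class inventory is simply stipulated rather than discovered, and the frame for see is encoded ad hoc as its two open argument positions; both are assumptions for illustration only.

# Illustrative closed class inventory; the child must of course
# discover this set rather than be given it.
CLOSED_CLASS = {"the", "a", "is", "to"}

def apply_frame(sentence, head):
    """Overlay a head-centred frame on a word-segmented string.
    Per the Segmentation Constraint (76), all operations apply at
    the word level; closed class items are filtered as residue."""
    opens = [w for w in sentence.split()
             if w.lower() not in CLOSED_CLASS]
    i = opens.index(head)
    return {"agent": opens[:i], "head": head, "theme": opens[i + 1:]}

print(apply_frame("I saw the woman", "saw"))
# {'agent': ['I'], 'head': 'saw', 'theme': ['woman']}

The determiner the never reaches the analyzed representation, just as in (79).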
2.3.4 The Initial Induction: Summary
I will discuss further aspects of the analytic problem, and the role that the open class/closed class distinction may play in the identification of categories, and in the composition of phrase structure itself, further below. For the moment, I would like to summarize the differences between the proposals above and the Grimshaw/Pinker proposals with which the chapter began, since the current comments suggest a difference in orientation not only with respect to the building and relaxation of phrase structure (the second part of the proposal), but with respect to the nature of the generalization involved in argument-linking as well. In the Grimshaw/Pinker system, the following holds:

i) The subject and object are identified by virtue of their associated thematic relations, agent and patient; similarly for other thematic roles. This identification is universal (the assumption of canonical grammatical-functional realizations).
ii) Cross-linguistic variation in the articulation of phrase structure follows from a constraint on the phrase structure system: no crossing branches are allowed. Given this constraint, and given the apparent permutation of subject and object arguments, the phrase structure tree of the target language will be flattened.
In the proposal above, the Projection Principle, in the form Project-α, is given pride of place. In addition, it is claimed i) that lexical representations are articulated (sub-)tree representations, whose information Project-α faithfully preserves, and ii) that very early representations are lexical in character. The proposals corresponding to the Grimshaw/Pinker proposals would then take the
following form. First, the primitives subject and object would be replaced with Williams' (1981) categorization of internal and external argument. This would be present structurally in terms of the articulation of the phrase structure tree, with the external argument more external than the internal argument, though perhaps not external in Williams' sense of external to a maximal projection (at least at D-structure, see Chapter 1, discussion of Kitagawa and Sportiche). The external/internal distinction would also be present in the lexical sub-tree. Proposal i) of Grimshaw/Pinker would therefore correspond to the following:
(80) Agents are more external than patients (universally).
If we assume (80) to hold both over lexical representations, and, by virtue of the Projection Principle, over syntactic representations as well, then the child may, upon isolation of elements carrying the thematic roles agent and patient, determine the following sub-tree.
(81) (V ((N) agent) (V V (N patient)))
This has the same content as the original proposal in terms of the articulation of the initial trees, but eliminates the primitives subject and object from the system.
2.3.5 The Early Phrase Marker (continued)
I have suggested above that the initial grammar is lexical, in the sense of its content being determined by X0 elements. In general, the phrasal system is entered into at the point at which determiner elements, i.e. closed class specifier elements, are met in the surface string. This general picture, together with the fact that the lexicon and the syntax use the same type of formal representation, namely tree-structures (acyclic, directed, labelled graphs, with precedence defined), suggests that the demarcation between the lexicon and the syntax is not as sharp as has sometimes been claimed. Let me now expand a bit on the type of the representation, since the claim that the child is still operating with lexical representations at (say) the two-word stage is unnecessarily dramatic, though still significant. The actuality is that the representation is pre-Project-α, and that more than one level characterizes representations of this type.
Let us divide the pre-Project-α representation into two relevant levels (there may be others, cf. Zubizarreta 1987; they do not concern us now). The first will be called the lexical representation proper; the second is the thematic representation. Both of these are tree representations. The difference is that OC (open class) lexical insertion has taken place in the latter, but not the former.
(82) Lexical representation: (V (N agent) (V (V hit) (N patient)))
     OC insertion ⇒
     Thematic representation: (V (N agent man) (V (V hit) (N patient dog)))
     Project-α ⇒
     (S (NP The man) (VP (V hit) (NP the dog)))
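The sequence in (82) may be rendered as a toy pipeline. The tuple encoding of trees and the two miniature functions below are illustrative assumptions, not a claim about the actual form of the operations; Project-α in particular is cartooned as wrapping the thematic tree in phrasal nodes while adding the closed class determiners.

# Trees as nested tuples: (label, child, ...); leaves are strings.
lexical = ("V", ("N", "agent"), ("V", ("V", "hit"), ("N", "patient")))

def oc_insert(tree, fillers):
    """Open class lexical insertion: replace the thematic slot
    names (agent, patient) with open class words."""
    if isinstance(tree, str):
        return fillers.get(tree, tree)
    return (tree[0],) + tuple(oc_insert(c, fillers) for c in tree[1:])

def project_alpha(tree, dets):
    """A cartoon of Project-alpha: project each N up to NP, adding
    its closed class determiner, and relabel the higher nodes."""
    _, (_, agent), (_, v, (_, patient)) = tree
    return ("S", ("NP", dets[agent], agent),
                 ("VP", v, ("NP", dets[patient], patient)))

thematic = oc_insert(lexical, {"agent": "man", "patient": "dog"})
print(project_alpha(thematic, {"man": "the", "dog": "the"}))
# ('S', ('NP', 'the', 'man'), ('VP', ('V', 'hit'), ('NP', 'the', 'dog')))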
In the sequence of posited mappings, open class lexical insertion (and perhaps other operations) distinguish the thematic representation from the lexical representation. The rule Project-α projects the thematic representation into the phrasal syntax. The claim that the child is speaking “lexical representations” when he/she is speaking telegraphic speech may now be recast as: the child is speaking thematic representations. This is a pre-Project-α representation, but does not correspond to any single accepted level in current theories (e.g. GB or LFG). It has rather an intermediate character between what is generally assumed to be lexical and syntactic levels. Let us turn now back to the segmentation problem. We may simply view this as the determination of the set of primitives that the child employs at PF (where the segmentation occurs). One obvious fact about acquisition is that at the stage where the child commands telegraphic speech, he or she is able to isolate out the open class parts of the vocabulary from parental speech. Assuming that Motherese does not play a crucial role in this process (though it may speed it up, Gleitman, Gleitman, and
Newport 1977), we may view this as involving the child parsing the initial full sentence which is heard with a lexical representation of the head, a tree with precedence and externality/internality defined, giving rise to the thematic representation. For the case of (83), this would be the following:
(83) (V N (V (V see) N))
     Input: The bureaucrat saw the monster
     Retrieved representation: (V bureaucrat (V see (N monster)))

The representation which would be applied to the input would be the lexical representation in the sense suggested above: a headed subtree, with precedence and externality structurally defined. The non-head nodes correspond to lexical, not phrasal, categories. As such, only part of the input string would be analyzed: that corresponding to the full lexical categories. There is thus no need for a separate parsing structure here apart from that already implicated in the grammar itself: the parsing is done with a representation which is part of the grammar, the lexical representation (and it returns the thematic representation). A second example would be the following: the structures of nomination. These are given in (84), where, as already noted above, the subject appears to act as a fixed functor category, taking a series of predicates as arguments. These occur in copular constructions, not elsewhere.
(84) a. that ball
        that ___
     b. it mouse
        it ___
These may be formally represented as involving a tree structure with a fixed subject and an open predicate, applied successively to different inputs. The resultant is a partially analyzed string.
(85) (N that (V ___))   (lexical/thematic representation)
(86) Input: that is a ball
     Retrieved representation: (N that (N ball))
The copular element and the determiner drop out of the analyzed representation; as in the case of the headed eventive structures, it is the lexical/thematic structures themselves which parse the string. In the adult grammar, these initial representations are licensing conditions: the that ___ above licenses the predicate that follows it, at the thematic level of representation (the thematic representation exists in the adult grammar as well). This mode of analysis suggests that the complexity of initial structures should be viewed as a function of the complexity of the primitive units, and of the relations between them. As such they may be used to gain knowledge of what the primitive units are. As noted by Bloom (1970), children's ability to construct sentences does not proceed sequentially, as the following hypothetical paradigm might suggest:

bridge
big bridge
a big bridge
build a big bridge

This suggests that the child's grammar, while built of more primitive units, is not built purely bottom-up: see also the discussion in the next chapter. More natural sequences are those given below:
(87) I see
     see bridge
     see big bridge
(88) like Mary
     I like Mary
These sequences tend to isolate the government relation, and then add elements to it (e.g. in (87)). One prediction of the account above would be that in the two word stage, the government relation, and perhaps other types of licensing
relations (e.g. modification) would be paramount; two word utterances not evidencing these relations would be relatively rare. That this is so is hardly news in the acquisition literature (see Bloom 1970; Brown 1973, and a host of others), but the difficulty has been to model this fact in grammatical theory. Phrase structure representations do not do well; the idea of primitive licensing relations, and compounds of these (see also Chomsky 1955/1975), would do much better. This would give one notion of the complexity of the initial phrase marker. Another point of reference would be the differences in complexity between subjects and objects. Much early work in acquisition (by Bloom and others) suggested that initial subjects differed from objects in category type: less articulated, and much less likely to be introduced with a determiner or modifier. This was encoded in Bloom (1970) by introducing subjects with the dominating node Nom, rather than NP, in the phrase structure rule, since the subject showed different distributional characteristics. This is shown in (89).
(89) S → Nom VP
     VP → V NP
If there is a distinction like this (though see the critical discussion in Pinker 1984), then it would follow quite naturally from the sort of theory discussed in this chapter. Given a rule of Project-α, one might ask: is there an ordering in how arguments are projected? In particular, are arguments projected simultaneously, or one-by-one? Projection here would be close to the inverse of the operation of "argument-taking". The Bloom data suggest an answer, assuming, as always, a real convergence between the ordering of stages in acquisition and the ordering of operations in the completed grammar. Suppose the following holds:
(90) The projection of internal arguments applies prior to the projection of external arguments (in both the completed grammar and in acquisition).
Then the Bloom data and the difference between the subject and object would follow. I leave this open. Let us turn back to the segmentation problem (which may be simply identical to determining the set of primitive structures at PF). I suggested earlier that the parsing of initial strings was done by application of the thematic representation, a pre-Project-α representation, to the phrasal input.
(91) (V N (V (V see) N)) applied to: the bureaucrat saw the monster
     Returns: (V (N bureaucrat) (V (V see) (N monster)))
This returns the open class representation given in (91). Ultimately, the category of the determiner (and its existence) must be determined as well. I suggested above that all segmentation takes place at the lexical level. This means that for the determiner to be segmented out, it must first be analyzed as part of a single word with the noun that it specifies. This means that one of two representations of it must hold.
(92) a. (N (Det the) (N man)) (at PF), or
     b. (Det (Det the) (N man)) (at PF)
For the time being, I will assume the first of these representations, the more traditional one. This goes against the position of Abney (1987a), for example. There is a reason for this, however. With the notion that the noun is the head of the NP, we are able to keep a coherent semantic characterization of the specifier element (in general) — it contains closed class elements, and not other elements. The notion that all elements are projecting heads in the NP no longer allows us to keep this semantic characterization.
The notion of functor category that I am using here, deriving from Braine's work on pivots, is different from that of Abney (1987a) and Fukui and Speas (1986), and different as well from the notion of functor in categorial grammar. I have taken the following categories as functors, in the child grammar and presumably in the adult grammar as well: verbs in eventive structures, prepositions in constructions like "boot off", and the (deictic) subject in sentences like "that ball". This is clearly not a single lexical class, nor is it all closed class elements. It is closer to the notion of governing head, but here as well differences appear: that in "that ball" would not normally be taken as a governing head. Let us define a functor or pivot as follows:
(93) G is a functor or pivot iff there is a lexical representation containing G and an open variable position.
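Stated over stored representations, (93) admits a direct, if schematic, rendering. In the sketch below the encodings are assumptions of convenience: lexical representations are nested tuples, and None marks an open variable position.

# Stored lexical representations; None marks an open variable slot.
# The particular frames encode the text's examples, ad hoc.
STORE = [
    ("V", ("V", "see"), ("N", None)),    # see ___
    ("P", ("N", None), ("P", "off")),    # ___ off
    ("N", "that", ("V", None)),          # that ___ (deictic subject)
]

def leaves(tree):
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

def is_functor(item):
    """(93): item is a functor/pivot iff some lexical representation
    contains both it and an open variable position."""
    return any(item in leaves(t) and None in leaves(t) for t in STORE)

print(is_functor("see"), is_functor("that"))   # True True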
Given (93), the extension of the term functor is an empirical matter, to be determined by the data. In these terms, there is some reason to think that closed class determiners are not functors or pivots, in the sense of (93). While the child does exhibit replacement sequences like those in (94) (with the verbal head taking the nominal complement), he/she does not exhibit structures like those in (95).

(94) see ball
     see man
     see big ball
(95) the ball   (not present in output data)
     the man
     the table
The presence of the former type of sequences may be viewed as a result of the child inserting a nominal into the open position in the lexical frame in (96).
(96) (V (V see) (N ___))

The fact that the latter does not occur may be taken to suggest that there is no comparable stored representation such as (97) in the grammar.
(97) (Det (Det the) (N ___)) or (N (Det the) (N ___))

Note that a pragmatic explanation, that such lists are unnecessary for the child for the cases in (97), would be incorrect. The child needs to identify the set of cardinal nouns in his language, and the formation in (97) would be a way to do it. Curiously, the phenomenon of lexical selection, and its direction, may have some light shed on it by second language acquisition, particularly by the format of grammatical drills. A common approach is to retain a single head, and vary the complement.
(98) a. (ich) sah den Mann. (verb repeated, object alternated)
        I saw the man.
     b. (ich) sah das Mädchen.
        I saw the girl.
     c. (ich) sah die Frau.
        I saw the woman.
     etc.
Much less common would be lists in which the object is held constant and the complement-taking head varied.
(99) a. (ich) sah den Mann. (object repeated, verb alternated)
        I saw the man.
     b. (ich) tötete den Mann.
        I killed the man.
     c. (ich) küsste den Mann.
        I kissed the man.
     etc.
This may be viewed as a means of inducing the head-centered thematic representations suggested above, while replacing the complement with a variable term. The induced representation for (98):
(100) (V (V sah) (N ___))
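The inductive step such a drill invites can be sketched mechanically: whatever is constant across the examples is kept as fixed functor material, and whatever alternates is replaced by a variable, yielding a frame of the form in (100). The alignment below naively assumes position-matched sentences of equal length; it is a sketch, not a model of the learner.

def induce_frame(sentences):
    """Collapse a drill into a single frame: positions constant
    across all examples are kept as fixed material; alternating
    positions become an open slot (None)."""
    tokenized = [s.split() for s in sentences]
    frame = []
    for column in zip(*tokenized):
        frame.append(column[0] if len(set(column)) == 1 else None)
    return frame

drill = ["sah den Mann", "sah das Mädchen", "sah die Frau"]
print(induce_frame(drill))
# ['sah', None, None]: the verb survives as the fixed functor,
# the object is reduced to an open position, as in (100)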
Note that this answers the question asked in Aspects (Chomsky 1965): should the subcategorization frame be a piece of information specified with the verbal head, or should the subcategorization (e.g. __ NP) be freely generated with the complement, with a context-sensitive rule governing the insertion of the verb? Only the former device would be appropriate, given the data here. (The Projection Principle derives the result from first principles.) Another type of grammatical drill is one in which the subject is kept constant, and the copular predicate varied:
(101) a. Der Tisch ist grau. (subject constant, copular predicate varied)
         the table is gray
      b. Der Tisch ist blau.
         the table is blue
      c. Der Tisch ist grün.
         the table is green
Much less common is one in which the subject of an intransitive is kept constant and the verb varied:
(102) a. Der Mann schläft.
         the man sleeps
      b. Der Mann tötete.
         the man killed
      etc.
This suggests again that there is a difference in thematic headedness, with the simple copular predication structure being "about" the subject, and forming a nominal-variable structure like that in (103), while the eventive intransitive does not form a subject-headed structure (104).
(103) (N (N Der Tisch) (A ___))
(104) (V (N Der Mann) (V ___))
Finally, we note that lists in which the determiner is held constant and the nominal varied are not common at all.

(105) der Mann
      der ___
      etc.

This suggests, again, that these do not form the sort of headed lexical structures (with the determiner as head) noted above. The data from second language drills and from the first stages of first language acquisition are therefore largely in parity. While it is true that second language teaching has the disadvantage of attempting to teach language by direct tuition, the particular area of language drills is not one for which a (possibly misleading) prescriptive tradition exists. Rather, linguistic intuitions are tapped. It is therefore striking that the variable positions in such drills correspond to what may be viewed as the open position in a functor-open analysis (or governed-governee in a broad sense). I have suggested, then, the following: there is a set of lexical-centered frames which are applied by children to the input data; the head or functor element corresponds to a fixed lexical item. These frames are in addition part of the adult grammatical competence. Let us return now to the question of how closed class elements are analyzed in the initial parse.
2.3.6 From the Lexical to the Phrasal Syntax
The result of applying the verbal lexical entry to the full phrase structure string is an instance of parsed telegraphic speech.
(106) (V N (V V N)), applied to: The bureaucrat saw the monster

The residue of this analysis is the set of closed class elements: the subject determiner, Infl, and the object determiner. In the earliest speech, these are simply filtered out. At a later stage, these are not part of active speech, but are understood at least partly in the input, before being mastered. The acquisition problem therefore is the following: how, and when, are these elements incorporated into the phrase marker? I have suggested already the first step: the lexical representation of the head
(see, in this case) is projected over the entire structure, parsing the open class part of the structure. The problem with the closed class elements is that they must be analyzed in the phrase marker antecedent to their identification as markers of categories of given types (this would follow if telegraphic speech really is a type of speech, and shows the primary identification of the open class heads). I will assume at this point a uniform X0 for all the closed class elements. That is, they must be attached first on purely structural grounds. Let us attach determiners and closed class elements into the structure as follows.

(107) a. Segment the independent closed class elements.
      b. Identify the Principle Branching Direction of the language (Lust and Mangione 1984; Lust 1986).
      c. Attach each closed class element in accordance with the Principle Branching Direction (and the already established segmentation).
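A toy rendering of (107), in which everything the child must in fact determine (the closed class inventory, the Principle Branching Direction) is simply stipulated. The refinement in (110) below, under which the left-peripheral element is first set aside as extra-syntactic and only later grouped locally, is collapsed here into a single pass.

CLOSED_CLASS = {"the", "a"}

def attach(words):
    """(107): segment the closed class elements, then attach each
    one binarily in accordance with the Principle Branching
    Direction, scanning in the opposite direction (right-to-left
    for right-branching English)."""
    nodes = list(words)
    i = len(nodes) - 1
    while i >= 0:
        if nodes[i] in CLOSED_CLASS and i + 1 < len(nodes):
            # right attachment: group with the following constituent
            nodes[i:i + 2] = [(nodes[i], nodes[i + 1])]
        i -= 1
    return nodes

print(attach("the bureaucrat saw the monster".split()))
# [('the', 'bureaucrat'), 'saw', ('the', 'monster')]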
Proceeding from right to left, the child would, apart from the strictures in (107), be allowed either of two attachments for the determiner the actually associated with the object (assuming binary branching as a first approximation):

(108) Two possible attachments:
      a. Right attachment: (V (N The bureaucrat) (V (V saw) (the (N monster))))
      b. Left attachment: (V (N The bureaucrat) (V (V (V saw) the) (N monster)))

Given (107), only the structure in (108a) would be the appropriate one. Three questions: How is the Principle Branching Direction determined by the child? Why is the parsing done right-to-left? Why are the elements only added in a binary-branching fashion?
Three answers: 1) The Principle Branching Direction is either given to the child on phonological grounds, or is perhaps identified with the branching direction given by the transitive verb (with the internal/external division). 2) Parsing of the closed class elements is done in the direction opposite to the Principle Branching Direction (so, right-to-left in English). 3) While it need not be the case that all structures in the language are binary branching, I will assume that there is necessary binary branching in one of the following two situations: a) where the selecting category is a functional head in the sense of Abney (1987a, b); b) possibly, in cases in which the language is uniformly right-headed (as in the case of Japanese, Hoji 1983, 1985). Continuing with the parse, the left-peripheral element is arrived at. In line with (107c), this would give the following (erroneous) parse (if the were attached low, there would be a left branch).
(109) (V the (V (N bureaucrat) (V (V saw) (the (N monster)))))

Let us exclude this in principle, in the following way:
b.
α is extra-syntactic iff it is an unincorporated element on the periphery of the domain on the border opposite from that with which the parsing has begun. Extra-syntactic elements need not be parsed in the initial parsing.
The result of (107) together with (110) would therefore be the following, with the initial the extrasyntactic. (111)
(111) (The) (V (N bureaucrat) (V (V saw) (the (N monster))))
The parentheses indicate extra-syntacticity. The tree in (111) must now be labelled. I have suggested earlier that the closed class elements, while segmented, have not been labelled for distinguishing form class. Let us therefore label them all X0, or XCC (X closed class). In addition, there is the question of what the constituent is that is composed of an unspecified X0 and an N. Let us adopt the following labelling conventions or principles:

(112) Labelling:
      a. If a major category (N, V, A) is grouped with a minor element, the resultant is part of the phrasal syntax.
      b. The major category projects over an unspecified minor category, or
      c. The adjoined category never projects up.
      d. No lexical category may dominate a phrasal category.

(112a) is the labelling principle corresponding to the suggestion above that it is precisely when the closed class determiner elements are added that the phrasal syntax is entered into: this part of the parsing procedure corresponds, then, to the rule of Project-α. (112d) is an unexceptional assumption of much current work (though problems potentially arise for the analysis of idioms). (112b) is more tentatively assumed than the others; see later discussion. None of these are intended to be special "parsing principles"; all are simply part of the grammar. The addition of these elements to the phrase marker stretches it, rather than building it up. The attachment operations are roughly equivalent to the sort of transformations envisioned in Joshi, Levy, and Takahashi (1975), Kroch and Joshi (1985), and Joshi (1985). The resultant of applying (112) is the following tree:
(113) (VP (N the bureaucrat) (VP (V saw) (NP (X0 the) (N monster))))
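The labelling conventions in (112), as they are exercised in (113), admit a small schematic implementation. Categories are bare strings here; the adjunction case (the subject N adjoined to VP, where by (112c) the adjoined category does not project) is deliberately left out of the sketch.

MAJOR = {"N", "V", "A"}

def label(daughters):
    """(112) for the head-projection cases in (113): a major
    category grouped with a minor X0 projects phrasally (112a, b);
    a lexical V sister to a phrasal NP projects to VP, since no
    lexical node may dominate a phrasal one (112d)."""
    majors = [d for d in daughters if d in MAJOR]
    phrasal = [d for d in daughters if d.endswith("P")]
    if majors and phrasal:
        return majors[0] + "P"     # (112d): V + NP => VP, not V
    if majors and len(majors) < len(daughters):
        return majors[0] + "P"     # (112a, b): X0 + N => NP
    raise ValueError("grouping not covered by this sketch")

print(label(["X0", "N"]), label(["V", "NP"]))   # NP VP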
The second the is labelled X0 and attached to a constituent with the following noun. By (112a) this forms a phrasal category; by (112b) or (c) this phrasal category is an NP. Since a lexical category may not dominate a phrasal category ((112d)), the type of the category dominating the verb and the following elements
is not a V, but a VP. By (112c), though not (112b), the entire category would be labelled VP. Finally, the initial the is still not incorporated. This initial representation in fact makes a prediction: the child may initially know that there is a slot before the head noun, and in other places, but have this slot unspecified as to content and category. Interestingly, in production data it often appears that overt competence with closed class determiner items, and other sorts of items, is preceded by a stage in which a simple schwa appears in the position in which the determiner would be due. This seems to be the case (from Bloom 1970).
(114) a. This ə slide.
      b. That ə baby.
      c. ə take ə nap.
      d. ə put ə baby.
      e. Here comes ə 'chine.
More strikingly, however, the schwa appears in two other places: in a preverbal (or perhaps copular) position, and in a position at the beginning of the sentence (from Bloom 1970).
(115) a. This ə tiger books. (could be determiner or copula)
      b. This ə slide. (could be determiner or copula)
      c. ə pull. (sentence initial or subject)
      d. ə pull hat. (sentence initial or subject)
      e. ə write. (sentence initial or subject)
      f. ə sit. (sentence initial or subject)
      g. ə fit. (sentence initial or subject)
Bloom makes the following comment (pp. 74–75):

The transformation for placing the /ə/ in preverbal position accounted for the 31 occurrences of /ə/ before verbs — for example, "ə try", "ə see ball" — and for the fact that the schwa was not affected by the operation of the reduction transformation. The preverbal occurrence of /ə/ was observed in the texts of all three children and deserves consideration — particularly since such occurrence precluded the specification of /ə/ in the texts as an article or determiner exclusively. The problem of ordering the /ə/ placement transformation in relation to the reduction rules [the set of deletion rules that Bloom assumes deriving the child's phrase marker from a relatively more expansive adult-like one, D. L.] has not been solved satisfactorily. The order in which they appear in the grammar was chosen because /ə/ was not affected by reduction; that is, it was not deleted with increased complexity within the sentence, and its occurrence
did not appear to operate to reduce the string in the way that a preverbal noun (as sentence-subject) did. This order is attractive because /ə/ can be viewed as 'standing in' for the deleted element and appears to justify the /ə/ as a grammatical place holder.
The reduction transformation that Bloom posits is of course not carried over into the present theory. Interestingly, the set of places in which the schwa occurs is precisely where one would expect the X0 categories to appear in the current theory. Three questions again arise: How and when is the sentence-initial the incorporated into the representation? What about the inflection on the verb (children's initial verbs are uninflected)? And, crucially, what constitutes the domain over which extra-syntacticity is defined? The second of these questions also has repercussions for what the category of the entire resultant is (VP or IP). For the first of these questions, let us assume that the answer is the same as that given in phonology for extrametricality: if α is extrasyntactic, group it locally with the most nearly adjacent category. This would give rise to the initial the being grouped, correctly, in the initial NP. The question of Infl is more complex, and partly hangs on the domain of extrasyntacticity. Let us assume, as I believe we must, that the child is initially able to analyze the verb as composed of the verb and some additional element. I will assume that this additional element is analyzed as simply an X0, i.e., an additional closed class element of uncertain category, and is factored out of the produced representation. That is, the child's telegraphic speech consists of bare infinitival stems (or the citation form), not of correct (or incorrect!) inflected verbs. If we assume that Infl is also adjoined to the VP in accordance with the principle branching direction of the language, we arrive at a representation which is, except for categoricity, identical to the representation of the phrase marker given (for example) in Chomsky (1986). This corresponds, presumably, to the DS representation. On the other hand, if we assume that the domain of extrasyntacticity is the VP, as well as the S, and that Infl is therefore regarded as extrasyntactic in its domain, its attachment site would be low, with the verb. This would correspond to the representation after affix-hopping, in the standard theory (though see, again, Chomsky 1986).
(116) a. Placement of Infl, if S is the only extrasyntactic node:
         (S (NP (X0 The) (N bureaucrat)) (X0 Infl) (VP (V see) (NP (X0 the) (N monster))))
      b. Placement of Infl, if S and VP are extrasyntactic nodes:
         (S (NP (X0 The) (N bureaucrat)) (VP (V (X0 Infl) (V see)) (NP (X0 the) (N monster))))
82
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
to Hyams’ analysis of early “pro-drop” (Hyams 1985, 1986, 1987). The present representation suggests other alternatives for the early representation of subjects. One alternative, which I will simply mention without taking a position on: that in the early grammar, the subject position is, initially, purely a thematic position (a pure instantiation of theta relations). It might be assumed that such positions, if external, need not be realized on purely thematic grounds. This would account for the non-obligatoriness of “deep structure subjects” in passive constructions; the actual obligatoriness of subjects in active constructions would then have to be due to some other principle. More strikingly, extra-syntacticity may give an account for the placement of early subjects. I suggested, again in Lebeaux (1987), that early pronominal subjects may not be in subject position, but rather adjoined to the verb, as verbal clitics (see also Vainikka 1985). This would follow if such elements were extrasyntactic, and adjoined low, late.
VP
(117)
V
NP
N
V
N
my
did
it
Other elements, in particular agreement elements in Italian, noted by Hyams (1985, 1986, 1987) in her discussion of the early phrase marker would presumably have the same analysis, and hence not be counterexamples to the thesis of this chapter (see also Chapter 4). In the representation above, I have begun with a thematic representation, i.e., a projection of the pure thematic argument structure of the head, and then incorporated the closed class determiner elements by “stretching” the representation to incorporate them, rather than building strictly bottom-up. This is what the operation corresponds to in the analytic mode. In the derivational mode, this corresponds to the basic rule of Project-α: these are simply aspects of the same operation. This is the means by which the phrasal syntax is entered into, from the lexical syntax. Since the closed class elements are not initially analyzed as to category, and their attachment direction is also not given, these choices must initially be made blindly, in accordance with the Principle Branching Direction of the language (or perhaps the governing direction of eventive verbs), and the
PROJECT-α, ARGUMENT-LINKING, AND TELEGRAPHIC SPEECH
83
projection of categorial information. Crucial as well was the notion of extrasyntacticity at the periphery of a domain: it was this, and only this, which allowed the appropriate attachment of the initial determiner. One might wonder how far these initial assumptions would go, in building the phrase marker by the child, and how they are revised. For example, a sentence like that in (118a) would have by the child an initial (telegraphic) analysis like that in (118b). (118) a.
N N
Nloc
mommy
roof
b. NP
NP NP NP
N
X0
X0
X0
Nloc
Mommy is
on
the
roof
The representation in (118b), while correct so far as constituency is concerned, is obviously incorrect so far as categoricity: the PP is misanalyzed, and worse, given the uniform analysis of the closed class elements all being of the form X0, one would expect, wrongly, that they would freely permute in the child grammar. One might of course immediately reply that while the initial grouping is given by the X0 term, the more specific categoricity is determined later, far prior to the point at which the closed class determiner elements play a role in the grammar, and thus that the representation in (118c) should not play any formal role in the analysis, being merely a stepping off stage: in the child grammar, as well as in the adult. Nonetheless, it has shed light before — and will continue to do so in the rest of this work — to suppose that the child’s representations have a real status both in themselves, and in their relations to the adult grammar. The question would then be what representations of the form in (118 c) would mean.
84
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
2.3.7
Licensing of Determiners
One advantageous aspect to the analysis in (118) is that it supposes the closed class elements to be of a uniform category. While this is false with respect to distributional characteristics in the syntax, it may well be close to true in the phonology: i.e., at PF. To the extent to which the acquisition system in its analytic mode is operating on PF structures (see Chapter 5), this assumption is correct. A deeper question, with respect to the format of the adult grammar is the following. In the traditional phrase structure rule approach (Chomsky 1965) categories were licensed with relation to their mother node, via the phrase structure rules. (119) VP → V NP The rewrite rule in (119) may be taken to mean that the node VP licenses to dominated categories, V, NP, in that order. In Chomsky (1981) this view is revised in a number of ways, most particularly by assuming that the basic complement is a projection of the lexical entry (of heads), and the element is licensed in that way. This leaves open a number of licensing possibilities for those elements which are not projections of a head: for example, adjuncts and determiners (or specifiers). The situation with adjuncts is discussed in the following chapter. Insofar as the current literature is concerned (see especially the suggestive and important work of Fukui and Speas 1986; Abney 1987a) a particular position has been taken with respect to the specifier-of relation. Namely, in the case of nominals at least, these have been taken to be a projection of the determiner, with the determiner taking the (old) Noun Phrase as a complement. (120) DetP → Det NP This answers, though obliquely, a particular question that might be raised — namely, what is the relation between the specifier and the head (or head plus complement) — and it answers it by assuming that that relation is modelled in the grammar in essentially the same type of way that a simple head (e.g. a verbal head) takes its complement. In essence, both are types of complementation, the head projecting up. I would like to suggest here that the specifier-of relation should be modelled in a way which is different than complementation, as has already been suggested by the treatment of the closed class categories in acquisition above. This involves (at least) two specifications: i) what is the projecting head?, and ii) how is the
PROJECT-α, ARGUMENT-LINKING, AND TELEGRAPHIC SPEECH
85
licensing done? The projecting head is the major category. The licensing is done in the following way: (121) a.
b.
Projection: Let M a major category, project up to any of the bar levels associated with it (for concreteness, I will assume three: thus, N, N′, N″, N″′). Attachment: Attach X0 to the appropriate bar level.
It is the attachment rule here which is distinguished from the type of licensing which occurs for direct complements. Categories traditionally analyzed as determiners would therefore have different attachment sites associated with them. If Jackendoff (1977) is correct, and definite and indefinite determiners differ in their attachment site in English, the former being N″′, and the latter being N″, then their lexical specifications would be the following. (122) a. b.
the: X0, (X0, N″′) a: X0, (X0, N″)
The notation on the left is the category; the notation on the right is simply the structural definition of the element, in terms of the mother-daughter relation (cf. the structural definition of grammatical relations in Aspects). It was suggested in earlier work of mine (Lebeaux 1987, see also Lebeaux 1985) that NP elements may be licensed in more than one way: via their relation to sister elements, which is associated with what is called structural case in Chomsky (1981), and via their relation to their mother nodes. An example of the latter would be the licensing of the genitive subject of nominals in English: that is licensed via its relation (NP, NP). In that article, I argued that this differential type of licensing was associated with a different type of case assignment, namely what I called phrase-structural case, and this is associated with a different type of theta assignment as well (where the NP genitive element is not given its theta role by the N′ following it, but rather a variable relation, relation R, is supplied at the full NP node, relating both to each other; see Lebeaux 1985). By adopting the format in (127) for closed class determiner elements, and other closed class elements, I am adopting the position that these elements are licensed not via their relation to the N′ with which they are associated, but rather via their relation to the mother element (the N″ or N″′ or whatever). The category which they close off (the variable of) is the one which they are licensed by. They neither license the following N′ nor are licensed by it: that is, a complement-of type relation with respect to licensing is not adopted in either direction. Rather, a different type of licensing relation, that of the element to the mother node, is adopted.
86
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
By supposing that the NP-genitive, and the determiner are both licensed in a common way, i.e. via the structural condition on insertion with respect to the mother node, an intuition behind the position of Chomsky (1970) is perhaps explained. In that work, Chomsky supposed that the genitive element was generated directly under the specifier, in the same position as the closed class determiner. This is not obviously correct. Nonetheless, if the above is correct, these elements are licensed in a common way. 2.3.8
Submaximal Projections
As the foregoing suggests, categories may be built up not simply to the maximal projection, but to what might be called submaximal projections. By submaximal projections, I mean a projection which is maximal within its phrase structure context (i.e., its dominating category is of a different type), but which is not the maximal projection allowed for that category in the grammar. For nominals, for example, assuming provisionally the three-bar system of Jackendoff (1977) (nothing hangs on the number of bar levels here), a nominal might be built up to simply N, or to N′, or to N″, or to N″′: N″′ would be maximal for the grammar, the others would be submaximal (though perhaps maximal for that particular instantiation). So far as I know, the syntactic proposal was first made in an unpublished paper by myself (Lebeaux 1982), it had earlier been suggested in acquisition work by Pinker and Lebeaux (1982) and Pinker (1984), and has recently been independently formulated, in a rather different way, by Abney, Fukui, and Speas. Though some of the terminology and many of the concerns will be similar to those of Abney, Fukui, and Speas, there are differences both of principal and detail; I will build here on my own work (Lebeaux 1982, 1987). Some consequences of assuming submaximal projections are the following: I. Subcategorization is not in terms of a particular phrasal category (e.g. N″′), but rather in terms of a category of a given type. E.g. a nominal category Nx, where x may vary over bar levels. II. Crucially, the definite/indefinite contrast may be gotten, without reference to features, or to percolation from the nonhead node (which would be necessary otherwise if one assumes that N is the head of the full category NP). This is done as follows: the nominal is built up as far as necessary. Jackendoff (1977) argues quite forcefully that the definite determiner is attached under N″′ in English, while the indefinite is attached under N″. The two representations are therefore the following.
PROJECT-α, ARGUMENT-LINKING, AND TELEGRAPHIC SPEECH
(123)
87
N′′ Det
a (124)
N′ N
PP
picture
of Mary
N′′′ N′′
Det
N′
the
N
PP
picture
of Mary
Thus, the semantics need not refer to a feature such as +/−definite: this is encoded in the representation. Other types of processes sensitive to the definite/ indefinite contrast, e.g., there-insertion in English and dative shift in German, would presumably refer to this same structural difference. III. Other types of binding and bounding processes may refer to the maximal projection, rather than to the presence or absence of the definite determiner. Here, the relevant contrasts are those such as those in (125) and (126): (125) a. Who would you like a picture of t? b. Who do you like pictures of t? c. *Who do you like the picture of t? (126) a. Every mani thinks that a picture of himi would be nice. b. Every mani thinks that pictures of himi would be nice. c. ?*Every mani thought that the picture of himi was nice. Assuming the same general theory as that above, these contrasts would be specified not by the feature +/−definite, but by the maximal projected node. Of course, the means of capturing the generalization here is simply as strong as the original generalization, and there is some reason to think that it is “namehood” or “specificity” which matters in determining these binding relations
88
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
(Fiengo and Higginbotham 1981), not the presence or absence of the determiner itself. If so, there is no purely structural correlate of the opacity. (127) a. *Which director do you like the picture of t? b. What director do you like the first picture of t? IV. Assuming that there is the possibility of submaximal projections, we find a curious phenomenon: for each major category, there are cases which appear to be “degenerate” with respect to projection (Lebeaux 1982). That is, they simply project up to the single bar-level, no further. These are the following. (128) Basic category verb noun preposition adjective
Degenerate instance auxiliary verb pronoun particle (prenominal adjective in English)
The first three of these are clear-cut, the fourth is more speculative. The sense in which a pronoun would be a degenerate case of a noun is that it does not allow complements, nor specifiers. (129) a. *the him b. *every him c. every one d. the friend of Mary, and the one/*him of Sue This may be accounted for if we assume that it is inherently degenerate with respect to the projections that these would be registered under. Similarly, while the issues and data are complex — see the data in Radford (1981) for interesting puzzles — an auxiliary verb does not appear to take complements, specifiers, and adjuncts in the same way that verbs do. Indeed, to account for some of the mixed set of data in Radford, it may make sense to allow for both a Syntactic Structures type analysis and one along the lines of Ross (1968) or Akmajian, Steele, and Wasow (1979), where the auxiliary takes the different structures at different levels of representation. Aside from facts having to do with complement-taking, there are those having to do with cliticization, and the rhythm rule (Selkirk 1984) which in general support the hypothesis of bare, nonprojecting lexical categories. In general, the rhythm rule does not apply across phrasal categories. However, it may apply in prenominal adjectival phrases. (130) a. b.
That solution is piecemealÁ. A pieceÁmeal solution.
PROJECT-α, ARGUMENT-LINKING, AND TELEGRAPHIC SPEECH
(131) a. b.
89
The tooth is impactedÁ. An imÁpacted tooth
In (130b) the main adjectival stress has retracted onto the first syllable; in (131b) it has done likewise, optionally. Assuming that the rhythm rule does not apply across phrasal boundaries, this means that the structure must be one in which the adjective is non-phrasally projecting, i.e., an A, not an A″ or A″′: (132)
NP Det A an
N
impacted tooth
This would also account for the impossibility of phrasal APs in prenominal position in English: it suggests that it is the degeneracy of this category in this position which is the point to be explained, rather than something along the lines of the Head Final Filter of Williams. The other area in which phonological evidence is relevant is with respect to cliticization. In English, a pronoun, but not a full noun, nor the pro-nominal one, may lose its word beat and retract onto the governing verb. (133) a. I saw ’m. b. I saw one. c. *I saw ’n. Similarly, the auxiliary may do so, and cliticize onto the subject, or onto each other. (134) a. b.
I cn go. He may’ve left.
The generalization in these cases seems to be the following. i) α governs β, and ii) β is an X0 category, Then β may cliticize onto α.
(135) If
But this generalization requires the presence of submaximal nodes: i.e., degenerate projections.
90
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
V. Finally, we may note that the device of submaximal projections gives the possibility of differentiating null categories, without assuming a primitive feature set of +/−anaphor, +/−pronominal. Namely, the categories may be differentiated by the node to which they project up. These would be as follows. (136) Null Category PRO, pro wh-trace NP-trace
Category type N N′ N″
The particular assignments are given for the following reasons: PRO and pro because of their similarity to simple pronominals in interpretation (though not, in the case of PRO, in their dependency domains), NP-trace because it may also simply be regarded semantically as a surrogate for the NP which binds it: this is not true for either wh-trace or PRO (or pro). The reason that wh-trace is identified with N′ will be apparent in the next chapter. This identification of null categories with nominal projection levels here is intended as suggestive: a full proposal along these lines would go far beyond the scope of this thesis.
C 3 Adjoin-α α and Relative Clauses
3.1 Introduction In the previous chapter I dealt with some aspects of argument structure, phrase structure, and the realization rule, Project-α, which relates lexical representations to the syntactic system strictly speaking. Aside from particular points that were made, a few central conclusions were reached: i) that Project-α is a relation relating the lexical entry to the phrase marker in a very faithful way — in particular, the internal/external distinction used in the syntax, and even directionality information as to theta marking (Travis 1984; Koopman 1984) is found in the lexical entry, ii) that telegraphic speech was a representation of pure theta relations, and iii) that there was a close relation between acquisition stages and representational levels, a relation of congruence (the General Congruence Principle). In this chapter, I take up the third point more carefully, modifying it as necessary. What is the nature of the relation between acquisitional stages and representational levels? Are there true levels, and, if not, are there at least ordered sequences of operations (where these sequences do not, however, pick out levels)? If there are levels, are they ordered: i.e. are DS, SS, PF, LF, and perhaps others, to be construed as a set, or is there a (partial) ordering relation between them? Finally, what is the way that levels may be ordered: on what grounds? Note that to the extent to which there are real acquisitional stages, and these are in a correspondence relation with representational levels, a strong syntactic result would be possible: that the grammar is essentially leveled, and the leveling follows from, is “projected from”, the course of development. A geological metaphor is apt: the sedimentation pattern over a period of time is essentially leveled. The sedimentation layers are distinct and form strata, moreover they have distinct “vocabularies”. The course of the geological history is projected into the final structure: a cross-section reveals the geological type of the resultant.
92
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
In the next three chapters, I would like to discuss three areas of grammatical research. These are intended to shed light on the question of representational mode. One of these areas is well mapped out in current syntactic work, though the appropriate analysis is still not clear: the argument/adjunct distinction. This I will discuss in this chapter. Here I will be looking at, in particular, the representation of relative clauses — the paramount case of an element in an adjunctual relation to a head. The general conclusion will be that the adjunct relation should be modeled in a derivational way: namely, by supposing that adjuncts are “added into” the representation in the course of a derivation. Both syntactic and acquisition evidence will be presented to support this view. A second question has been asked more extensively in the acquisition literature than in syntactic research per se, though with notable exceptions (Chomsky 1980, on Case assignment; Marantz 1982, on early theta assignment). It divides into two parts. First, is there a “semantic stage” in acquisition, such as suggested by much early work in psycholinguistics (Bowerman 1973, 1974; Brown 1973)? Such a stage would use purely semantic, i.e. thematic, descriptors in its vocabulary, without reference to, e.g., grammatical relations or abstract Case. The same question may be asked with respect to Case assignment: are there different types of Case assignment, are these ordered in the derivation, and may they be put into some correspondence with acquisitional stages? N. Chomsky in “On Binding” answered the first and second questions in the affirmative about case assignment (though this was before work on abstract Case), and in earlier work (Lebeaux 1987), I suggested that Hyams’ data about the dropping of early subjects might fall under the rubric not of pro-drop, but of the lack of analysis of the verb + Infl combination, together with two types of Case assignment (structural and phrase-structural) having a precedence relation in the acquisition sequence, and operating in different manners. My analysis, then, answered the third question in a positive manner, with respect to Case. I will not discuss the possibility of precedence within types of Case assignment in this book, but I will discuss the thematic issue, in Chapter 4. A third question has to do with the nature of the open class/closed class distinction, and how this might be modelled in a grammatical theory. This question, again, has received more attention in the psycholinguistic literature (see Shattuck-Hufnagel 1974; Garrett 1975; Bradley 1979; Clark and Clark 1977, and a panoply of other references) than in grammatical theory proper. I propose in Chapter 4, the beginnings of a way to accommodate the two literatures. The current chapter, however, will be concerned with the argument/adjunct distinction.
93
ADJOIN-α AND RELATIVE CLAUSES
3.2 Some general considerations As has often been noted (Koster 1978; Chomsky 1981), given a particular string, say that in (1), there are two ways of modelling that string, and the dependencies within it, within a GB-type theory. (1)
Whoj didi John ei see ej?
On the one hand, these informational dependencies may be viewed as an aspect of a single level of representation. Thus in Chomsky (1981) it is suggested that the operation Move-α may be viewed as a set of characteristics of the S-structure string, involving boundedness, single assignment of theta role, and so on. On the other hand, the string in (1) may be viewed not as a representation at a single level, but as being the output of a particular derivation, that in (2). (2)
DS: [C′′ [C′ C [IP [NP John] [I′ [I Infl] [VP [V see] [NP who]]]]]]
→ (Move-α)
SS: [C′′ Whoj [C′ C [IP [NP John] [I′ I [VP [V see] [NP ej]]]]]]
Under this view, the sentence in (1) is just a sectioning of a larger figure. The full figure is multi-leveled. The representation in (1) retains a good deal — and perhaps all — of the information necessary to derive the full representation in (2) back again. It is precisely this character, forced in part by the projection principle, that makes the distinction between "representational" and "derivational" modes noted in Chomsky (1981) so difficult. Indeed, in the old-style Aspects derivations, where no traces were left, such a question could not arise, since the surface (corresponding to the present S-structure) patently did not contain all the information present in the DS form: it did not define, for example, the position from which the movement had taken place. However, to the extent to which constancy principles hold — i.e. principles
like the Projection Principle, which force information to be present at all levels of the same derivation — the problem of the competition in analysis between the representational and derivational modes becomes more vexed. It is therefore natural, and necessary, to see what sort of information in principle might help decide between them.

Basically, the information may be of two types. Either i) there will be information present in the representational mode which is not present in the derivational mode, or which can only be present under conditions of some unnaturalness, or ii) there will be information present in the derivational mode which is not present in the representational mode — or, again, which may only be represented under conditions of some unnaturalness. It is possible to conceive of more complex possibilities as well. For example, it may be that the grammar is stored in both modes, and is used for particular purposes in either. I do not wish to examine this third, more complex, possibility here.
3.3 The Argument/Adjunct Distinction, Derivationally Considered

In the next few sections, I would like to argue for a derivational approach, both from the point of view of the adult system, and from the point of view of acquisition. The issue here is the formation of relative clauses and the modelling of the argument/adjunct distinction in a derivational approach.

3.3.1 RCs and the Argument/Adjunct Distinction
Let us consider the following sentences:

(3) a. The man near Fred joined us.
    b. The picture of Fred amazed us.
    c. We enjoyed the stories that Rick told.
    d. We disbelieved the claim that Bill saw a ghost.
    e. John left because he wanted to.
The following examples repeat the same sentences with the modifying phrases (italicized in the original) marked off in brackets.

(4) a. The man [near Fred] joined us.
    b. The picture [of Fred] amazed us.
    c. We enjoyed the stories [that Rick told].
    d. We disbelieved the claim [that Bill saw a ghost].
    e. John left [because he wanted to].
I have differentiated in the sentences above between the modifying phrases near Fred and that Rick told in (4a, c), and the phrases of Fred and that Bill saw a ghost in (4b, d), which have intuitively more the force of direct arguments. See Jackendoff (1977) for structural arguments that the two types of elements should be distinguished. It is sometimes claimed that the latter phrases are adjuncts as well (Stowell 1981; Grimshaw 1986), but it seems clear that, whatever the extension of the term "adjunct", there is some difference between the complements in (4b, d) vs. those in (4a, c). It is likely, therefore, that there is a three-way difference between pure arguments like obligatory arguments in verbal constructions ("I like John"), the optional arguments in instances like picture-noun constructions and in the complement of denominals like claim ("the picture of John", "the claim that Bill saw a ghost"), and true adjuncts like relative clauses and locative modifiers ("near Fred" and "that Rick told" above). For present purposes, what matters is the distinction between the second type of complement and the third, and it is here that I will locate the argument/adjunct distinction.

There is no unequivocal way to determine the adjunctual status of a given phrase, at least pre-theoretically. One commonly mentioned criterion is optionality, but that will not work for the complements above, since all the nominal complements are optional — yet we still wish to make a distinction between the picture-noun case (as nominal arguments), and the locative phrases or relative clauses (as adjuncts). Nonetheless, if the intuition that linguists have is correct, the property of optionality is somehow involved. Note that there is still a difference in the two nominal cases: namely, that if the nominal construction is transposed to its verbal correlate in the cases that we would wish to call "argument", then the complement is indeed obligatory, while the locative complement remains optional.

(5) a. the photograph (of Fred)
    b. the photograph (near Fred)
(6) a. We photographed Fred. / #We photographed. (not the same interpretation)
    b. We photographed near Fred. / We photographed. (same interpretation)
This suggests that the difference between (5a) and (6a) may not reside so much in theta theory as in Case theory (Norbert Hornstein, p.c.). Let us therefore divide the problem in this way. There are two sorts of optionality involved. The first is an optionality across the subcategorization frames of an element. The
nominal head of a construction like photograph, or picture, is optional across subcategorization frames, while the corresponding verbal head is not. (7)
a. photograph (V): __ NP
b. photograph (N): __ (NP)
It is this sort of optionality which may, ultimately, following Hornstein, be attributed to the theory of Case: for example, that the verbal head photograph assigns case and hence requires an internal argument at all levels, while the nominal head photograph does not.

Over against this sort of optionality, let us consider another sort: namely, that of licensing in a derivation. Since the work of Williams (1980), Chomsky (1982), Rothstein (1983), Abney and Cole (1985), and Abney (1987b), as well as traditional grammatical work, it is clear that elements may be licensed in a phrase marker in different ways. In particular, there exist at least two different sorts of licensing: licensing by theta assignment and licensing by predication. Let us divide these into three subcases: the licensing of an object by its head, which is a pure case of theta licensing, the licensing involved in the subject-predicate relation, which perhaps involves both theta licensing and licensing by predication, and the relation of a relative clause to its head, which is again a pure instance of predication licensing (according to Chomsky 1982). These sorts of combinatorial relations may themselves be part of a broader theory of theta satisfaction along the lines sketched by Higginbotham (1985), which treats things like integration of adjectival elements into a construction; the refinements of Higginbotham's approach are irrelevant for the moment.

(8) a. John hit Bill. (Bill licensed by theta theory)
    b. John hit Bill. (John licensed by theta theory and predication theory)
    c. the man that John saw (the relative clause licensed by predication theory)
Let us now, following the basic approach of Chomsky (1982) and Williams (1980), note that predication licensing — i.e. predication indexing in Williams’ sense — need not take place throughout the derivation, but may be associated with a particular level, Predication Structure in Williams’ theory. Let us weaken Williams’ position that there is a particular level at which predication need apply, and adopt instead the following division, which still maintains an organizational difference between the two licensing conditions: (9)
a. If α is licensed by theta theory, it must be so licensed at all levels of representation.
b. If α is not licensed by theta theory, it need not be licensed at all levels of representation (but only at some point).
Predication licensing in Chomsky's (1982) broad sense (and possibly in Williams' 1980 sense as well) would fall under (9b), while the licensing of direct internal arguments would fall under (9a). However, (9a) itself is just a natural consequence of the Projection Principle, while (9b) simply reduces to the instances over which the Projection Principle holds no domain, which needs no special statement. The strictures in (9) may therefore be reduced to (10), which is already known.

(10) a. The Projection Principle holds.
     b. All categories must be licensed.
In terms of the two types of "optionality" noted above, the optionality of (9) is the optionality in licensing conditions for adjuncts at DS.

(11) Arguments must be licensed at DS; adjuncts are optionally licensed at DS.
With respect to the constructions discussed earlier, the picture-noun complements and complements of claim, this means that the complements in such constructions, as arguments, must be assigned a theta role and licensed at DS, when they appear.

(12) the picture of Mary (theme, licensed at DS)
(13) the claim that Rick saw a ghost (theme, licensed at DS)
These complements need not appear; they are optional for the particular head (picture, claim). However, when they appear, they must be licensed and theta marked at DS, by the Projection Principle. This distinguishes them from true adjuncts, which need not be licensed at DS. The optionality in the licensing of adjuncts at DS, but not arguments, is one way of playing out the argument/adjunct distinction which goes beyond a simple representational difference such as is found in Jackendoff (1977), where arguments and adjuncts are attached under different bar-levels. However, there is a more profound way in which the argument/adjunct distinction, and the derivational optionality associated with it, may enter into the construction of the grammar. It is to this that I turn in the next section.
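The timing asymmetry in (9)/(11) can be made concrete with a minimal sketch. The encoding below (an Element record, license labels such as "theta" and "predication") is an illustrative assumption, not part of the proposal; what it captures is only the quantificational difference between the two clauses: licensing at all levels for arguments, at some level for adjuncts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    is_argument: bool  # True: theta-licensed argument; False: adjunct

def licensed_in_derivation(levels, el):
    """levels: one dict per representational level, mapping an Element
    to the set of licensing relations holding of it at that level."""
    if el.is_argument:
        # (9a)/(11): arguments must be theta-licensed at every level.
        return all("theta" in lvl.get(el, set()) for lvl in levels)
    # (9b)/(11): adjuncts need only be licensed at some point.
    return any("predication" in lvl.get(el, set()) for lvl in levels)

of_fred = Element("of Fred", True)
near_fred = Element("near Fred", False)
ds = {of_fred: {"theta"}}
ss = {of_fred: {"theta"}, near_fred: {"predication"}}
print(licensed_in_derivation([ds, ss], of_fred))    # True: theta at DS and SS
print(licensed_in_derivation([ds, ss], near_fred))  # True: licensed at SS only
```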
3.3.2 Adjunctual Structure and the Structure of the Base
In the sentences in (4) above the adjuncts were marked off, picking them out. Suppose that, rather than considering the adjuncts in isolation, we consider the rest of the structure, filtering out the adjuncts themselves. (The (b) sentences are after "adjunct filtering".)

(14) a. Bill enjoyed the picture of Fred.
     b. Bill enjoyed the picture of Fred.
(15) a. He looked at the picture near Fred.
     b. He looked at the picture.
(16) a. We disbelieved the claim that John saw a ghost.
     b. We disbelieved the claim that John saw a ghost.
(17) a. We liked the stories that Rick told.
     b. We liked the stories.
(18) a. John left because he wanted to.
     b. John left.
Comparing the (a) and (b) structures, what is left is the main proposition, divested of adjuncts. Let us suppose that we apply this adjunct filtering operation conceptually to each string. The output will be a set of structures, in which the "argument-of" relation holds in a pure way within each structure (i.e. the subject-of, object-of, or prepositional-object-of relation is purely instantiated), but the relation adjunct-of holds between structures. In addition, one substructure is specially picked out as the root.

(19) (15a) after adjunct filtering:
     Argument structure 1: He looked at the picture.
     Argument structure 2: near Fred
     The rooted structure is 1.
(20) (16a) after adjunct filtering:
     Argument structure 1: We disbelieved the claim that John saw a ghost.
     The rooted structure is 1.
(21) (17a) after adjunct filtering:
     Argument structure 1: We liked the stories.
     Argument structure 2: that Rick told
     The rooted structure is 1.
(22) (18a) after adjunct filtering:
     Argument structure 1: John left.
     Argument structure 2: because he wanted to
     The rooted structure is 1.
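The filtering operation in (19)–(22) can also be rendered procedurally. The following Python sketch is illustrative only: the Node representation, with argument and adjunct dependents recorded separately, is an assumed encoding of the distinction argued for above, and the function simply returns the substructures, rooted structure first.

```python
class Node:
    """Toy phrase-marker node; 'arguments' vs. 'adjuncts' is taken as given."""
    def __init__(self, label, arguments=(), adjuncts=()):
        self.label = label
        self.arguments = list(arguments)
        self.adjuncts = list(adjuncts)

def adjunct_filter(root):
    """Split a phrase marker into its argument substructures, as in
    (19)-(22): the rooted structure first, then one per filtered adjunct."""
    structures = [root]
    def strip(node):
        for adj in node.adjuncts:
            structures.append(adj)  # each adjunct heads its own structure
            strip(adj)
        node.adjuncts = []          # the filtered structure keeps no adjuncts
        for arg in node.arguments:
            strip(arg)
    strip(root)
    return structures

# (17a): "We liked the stories that Rick told"
stories = Node("the stories", adjuncts=[Node("that Rick told")])
clause = Node("We liked __", arguments=[stories])
print([s.label for s in adjunct_filter(clause)])
# ['We liked __', 'that Rick told']  -> argument structures 1 and 2
```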
Each of the separate argument structures is a pure representation of the argument-of relation; no adjuncts are included. They may be called the Argument Skeletons of the phrase marker. In this sense, each phrase marker is composed of a set of argument skeletons, with certain embedding relations between them (which haven't been indicated above), and one element picked out as the root.

(23) Phrase marker = a set of argument skeletons (one picked out as the root), plus the embedding relations between them

Can anything be made of such a conceptual device? Before considering data, let us note one aspect of current formulations of the base. According to Stowell (1981), there is no independent specification of the base. Rather, its properties follow from those of other modules: the theory of the lexicon, Case theory, theta theory, and so on. Let us take this as a point of departure: all properties of the base follow from general principles in the grammar. What about the actual content of the base: of the initial phrase marker? Here we note (as was noted above) that a duality arises in licensing conditions: elements may either be directly licensed by selection by a head (i.e. subcategorized, perhaps in the extended sense of theta selection), or they may not be obligatorily licensed at all, but may be optionally present, and, if so, need not be licensed at DS, but simply at some point in the derivation: the case of adjuncts (Chomsky 1982, and others). Suppose that we adopt the following constraint on D-structures:

(24) (Every) D-structure is a pure representation of a single licensing condition.
Then the duality noted in the licensing conditions would be forced deeper into the grammar. The consequence of (24) would be that arguments, licensed by a head, and adjuncts, licensed in some other way, would no longer be able to both be present in the base.¹ The base instead would be split up into a set of sub-structures, each a pure representation of a single licensing condition ("argument-of" or "assigned-a-theta-role-by"), with certain adjoining relations between them. That is, if (24) is adopted, the argument skeletons above (arg. skeleton 1, arg. skeleton 2, etc.) are not simply conceptual divisions of the phrase marker, but real divisions, recorded as such in the base. Ultimately, they must be put together by an operation: Adjoin-α.

1. This is a stronger condition than that which simply follows from the Projection Principle, because it requires that something actually force the adjuncts in, for them to be present. In other words, elements cannot be "hanging around" in the structure, before they are licensed there (or if they are not licensed there).

By adopting a position such as (24), we arrive, then, at a position in some ways related to that of Chomsky (1957) (see also Bach 1977): there is a (limited) amount of phrase marker composition in the course of a derivation. Yet while phrase markers are composed (in limited ways), they are not composed in the manner that Chomsky (1957) assumes. Rather, the Projection Principle guides the system in such a way that the substructures must respect it. There is, in fact, another way of conceiving of the argument structures picked out in (19)–(23). They are the result of the Projection Principle operating in the grammar, and, with respect to the formulation of the base, only the Projection Principle. If the Projection Principle holds, then there must be the argument structures recorded in (19)–(23), at all levels of representation. However, there need not be other elements in the base; there need not be adjuncts. If we assume that the Projection Principle holds, and (with respect to this issue) only the Projection Principle, then it would require additional stipulation to actually have adjuncts present in the base: the Projection Principle requires that arguments be present, but not adjuncts. It is simpler to assume that only the Projection Principle holds, and that the adjuncts need not be present.

The sort of phrase structure composition suggested above differs from both the sort suggested in varieties of categorial grammar (e.g. Dowty, Wall, and Peters 1979), and from the domains of operation of traditional cyclic transformations (Chomsky 1965). With respect to categorial grammar, since the ultimate phrase marker or analysis tree is fully the result of composition operations, there are no subunits which respect the Projection Principle. The analysis may start off with the transitive verb (say, of category S/NP/NP), compose it with an object creating a transitive verb phrase, and compose that TVP with a subject. The original verb, however, starts out "naked", unlike the representation in Chapter 2, and the argument skeletons above. And the composition operation, being close to the inverse of the usual phrase structure rule derivation (with the possible difference of extensions like "Right- and Left-Wrap", Bach 1979), would not add adjuncts like relative clauses in the course of the derivation, but rather would compose a relative clause directly with its head, and then let the resultant be taken as an argument: exactly the reverse of the order of expansion in the phrase marker. The operation above, however, takes two well-formed argument skeletons and embeds one in the other.

The difference between the domains scanned in the theory proposed above, and that found in standard early versions of cyclic theories is perhaps more subtle. Cyclic theories (pre-Freidin 1978) scan successive sequences of sub-domains, the least inclusive sub-domains first. A partial ordering exists between the sub-domains, where the possibility of multiple branching requires that the ordering be merely partial rather than complete. This is also true with the argument skeleton approach above. However, the domains which are in such an inclusion relation are different. This is shown in (25) below.
(25) [Diagram: cyclic domains vs. argument skeleton domains. In the cyclic case, the successive S domains of a multiply embedded clause stand in strict inclusion, and are scanned least inclusive first, with a partial ordering among them. In the argument skeleton case, the matrix skeleton and each skeleton adjoined into it are separate domains, ordered with respect to one another but not standing in a strict inclusion relation.]
Unlike the case with cyclic domains, there is no strict inclusion relation with the argument skeleton domains. Rather, it is as if the upper bar level complements of Jackendoff (1977) were not present in the original scanning, and are only added later, in the course of the derivation.

3.3.3 Anti-Reconstruction Effects
Van Riemsdijk and Williams (1981) note a peculiar hole in the operation of Condition C, as it applies to structures involving moved constituents. Consider the data in (26). (26)
a. *Hei likes those pictures of Johni.
b. *Hei likes the pictures that Johni took.
c. ?*Which pictures of Johni does hei like?
d. Which pictures that Johni took does hei like?
As expected, both (26a) and (26b) are ungrammatical; he c-commands the coreferent John, and the sentences are ruled out by Condition C. The interesting divergence occurs in (26c) and (26d). Here, where John is contained in a fronted noun phrase, Condition C applies differentially in the two cases. Where John is the object of a picture-noun phrase, the sentence retains the ungrammaticality of the original (26a); but when it is contained inside a relative clause, and this is fronted, the ungrammaticality suddenly disappears: (26d) is perfect.

At first glance, it may appear that this contrast can be handled by locating the application of Condition C at a particular level, say, D-structure or S-structure. Yet it is clear that this device will not work. If Condition C were located at DS, then all the sentences above, (26a, b, c and d), would be expected to be bad. This is not the case: (26d) is fine. On the other hand, if Condition C were located at S-structure, and applied directly on structures (rather than using a derived notion of c-command such as is found in Williams 1987), then the grammar would allow in too much: both (26c, d) would be expected to be good. But neither of these locations for Condition C would allow for the true result: the grammaticality of (26d) and the ungrammaticality of (26c).

Van Riemsdijk and Williams themselves take a different tack. They suggest that the degree of embedding of the name in the dislocated constituent is the crucial factor in creating the contrast. In particular, they suggest that the name in the relative clause in (26d) is embedded under an S, while the name in (26c) is embedded in a PP. This lack of embedding in (26c) is related, they suggest, to the comparative ungrammaticality of that construction. The formulation that they give is the following.
(27) In a structure where NP is part of a dislocated constituent, NP is exempted from reconstruction if it is deeply embedded enough: part of an S′ or genitive phrase.
While the van Riemsdijk and Williams observation is extremely interesting, there is some reason to believe that their statement of the constraint may be improved upon, most particularly by examining the function of the structure containing the name, rather than the degree of embedding per se. Let us consider a somewhat more inclusive range of data. Note first the following contrast: (28)
a. *Hei believes the claim that Johni is nice.
b. *Hei likes the story that Johni wrote.
c. *Whose claim that Johni is nice did hei believe?
d. Which story that Johni wrote did hei like?
The contrast in (28) is rather striking. All constructions evince the same degree of embedding: the name is embedded in an S′. As expected, the non-dislocated structures show a Condition C violation. However, there is a clear distinction in the sentences with dislocated NPs. In (28c), where the name John is contained in an S′ which is a complement of the head noun claim, the ungrammaticality of the initial undislocated structure is retained with full force. In (28d), where the name is likewise contained in an S′, but where the S′ is part of an adjunct relative clause associated with the dislocated head, the output becomes perfect. In (28), then, it is the adjunct status of the containing structure, rather than the degree of embedding of the name, which is associated with the difference in grammaticality. The same can be seen by an appropriate choice of PPs. As noted before, if the name is contained in the PP complement of a picture-noun phrase, and fronted, the resultant is ungrammatical. As suggested earlier, the internal argument of a picture-noun is a sort of direct complement (Jackendoff 1977). Consider what happens when the name appears in an indisputable adjunct. (29)
a. *Hei destroyed those pictures of Johni.
b. *Hei destroyed those pictures near Johni.
c. ?*Which pictures of Johni did hei destroy?
d. Which pictures near Johni did hei destroy?
As expected, (29a) and (b) are ungrammatical: they violate Condition C. The interesting contrast appears in the dislocated versions. When the name is part of a picture-noun phrase, and fronted, the output retains the ungrammaticality of the base (29c). However, when it is part of a (locative) adjunct, the ungrammaticality disappears (29d), though it is still present in the putative D-structure (29b). Degree of embedding is not an issue, being held constant (this same observation, that adjuncthood is what matters, not degree of embedding per se, was made independently by Freidin 1986: a fact brought to the author's notice only after this was written). We note the same contrast between (29c) and (d) in (30), this time using derived nouns in their "verbal" vs. "nominal" uses (Lebeaux 1984, 1986).

(30) a. ?*Whose examination of Johni did hei fear?
     b. Which examinations near Johni did hei peek at?
The deverbal noun examination in (30) is being used either in a simple referential sense (30b), or as a "nominalized process" (30a) (Lebeaux 1984, 1986). It is plausible to consider that the argument structures of these usages differ. The data in (30) supports that: the true argument of Johni in (30a) violates Condition C when dislocated, while the adjunct near Johni does not.

The data in (28)–(30), then, supports the following conclusion: it is the grammatical function or character of the structure within which the name is contained which determines whether a Condition C violation occurs when it is dislocated. Yet this grammatical function or character (the argument/adjunct distinction) is irrelevant if the structure within which the name is contained is in place.

3.3.4 In the Derivational Mode: Adjoin-α
One way of accounting for the data above — the anti-reconstruction facts — would be to simply stipulate it as part of the primitive basis: (31)
If α, a name, is contained within a fronted adjunct then Condition C effects are abrogated; otherwise not.
However, this is hardly an intuitive solution to the problem: stipulation (31), as a primitive specification in UG — it would have to be in UG; there is not sufficient evidence in the data to set this as a parameter or possibility from low-level learning — is hardly satisfactory. Further, this sort of stipulation would leave unexplained, in a rather a priori fashion, the relation of the anti-reconstruction constraint to the more standard oddities associated with adjuncts, the Condition on Extraction Domains (Huang 1982), however reconstructed. While the solution proposed below will not directly relate the anti-reconstruction facts to the Condition on Extraction Domains, it does, I believe, clear the way for such a relation to be made: something which is not the case if (31) were simply adopted perforce.

Let us return to the earlier theoretical construct: the argument skeleton. It may be assumed that the Projection Principle requires that heads and their arguments, and the arguments of these heads, and so on, must be present in the base. That is, the entire argument skeleton must be present, insofar as it is a pure instantiation of the relation "argument-of". However, adjuncts need not be present in the base. They may then be added later by a rule. Let us call this Adjoin-α. Adjoin-α takes two tree structures, and adjoins the second into the first. Let us assume that this always involves Chomsky-adjunction, copying the node in the adjoined-to structure. Like Move-α, Adjoin-α applies perfectly freely, with ungrammatical results ruled out by general principles, interpretive or otherwise.
(32) A: [XP [YP WP UP]]
     B: ZP
     Output (Adjoin-α, adjoining B into A): [XP [YP [YP WP UP] ZP]]
Here, the subtree ZP has been adjoined into the phrase marker A, copying the YP node. Relative clause adjunction would look like the following.

(33) A: [S NP [VP V NP]]
     B: S′ (the relative clause)
     Output (Adjoin-α): [S NP [VP V [NP NP S′]]]
And the adjunction of a locative NP-modifying PP would look like this, if the locative is adjoined to the object:
(34) A: [S NP [VP V NP]]
     B: PP
     Output (Adjoin-α): [S NP [VP V [NP NP PP]]]
Here, the subtree B has been adjoined into A, copying the NP node. We are left, then, with the base generating a set of phrase markers (one specified as the root). The rule Adjoin-α is defined over pairs of phrase markers; the rule Move-α is defined over a single phrase marker. Given absolutely minimal assumptions, Move-α would be expected to apply both prior to, and posterior to, any given adjunction operation, since it is simply defined as movement within a phrase marker, and phrase markers exist both prior to, and posterior to, Adjoin-α. There is thus no "level" at which Adjoin-α takes place; it is simply an operation joining phrase markers, given minimal assumptions. We will see below that there are empirical reasons as well to assume the free ordering of Move-α and Adjoin-α. I will assume that each individual substructure prior to Adjoin-α is well-formed.

Assuming a derivation of this type — where both Move-α and Adjoin-α are available as operations — a solution is at hand for the anti-reconstruction effects discussed above. Let us assume that Condition C is not earmarked for any particular level — it applies throughout the derivation, and marks as ungrammatical any configuration which it sees in which a name is c-commanded by a coindexed pronoun.²

2. Like Lasnik (1986), I will assume that Condition C is actually split into two separate conditions, one which bars the c-commanding of a name by a pronoun, which is much stronger, and one which bars the c-commanding of a name by another name, which is much weaker. As Lasnik notes, some languages, e.g. Thai, allow c-command of the second sort, but not the first. I will discuss the first constraint here, and restrict the term Condition C to that. The statement that Condition C applies throughout the derivation may be too strong, given that sentences like (1b) are grammatical (consider what the DS would be).
(1) a. *It seems to himi that John'si mother is nice.
    b. John'si mother seems to himi t to be nice.
One way to account for this is to restrict Condition C to apply at any point after NP movement. A more radical, but more principled solution, I believe, is to maintain that Condition C applies everywhere, but to argue that the lexical insertion of names applies after NP movement. This assumption is fairly natural given the theory in Chapter 4, but obviously has widespread implications.

Let us further assume that Condition C applies directly over structures, not using any derived or re-defined notion of c-command (Chapter 5). Assume further, as discussed above, that the full argument skeleton must be present at all levels of representation, by the Projection Principle, but that adjuncts need not be.
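Before turning to the relevant cases, the operation itself can be sketched concretely. The toy tree encoding and the function below are assumptions for illustration; the one substantive commitment, taken from the text above, is that Adjoin-α is Chomsky-adjunction, copying the adjoined-to node.

```python
def tree(label, *children):
    """A toy phrase marker: [label, child, child, ...]."""
    return [label, *children]

def adjoin(host, target, adjunct):
    """Adjoin-a: Chomsky-adjoin `adjunct` to the subtree equal to `target`
    inside `host`, copying the adjoined-to node (cf. (32)-(34))."""
    if host == target:
        return tree(host[0], host, adjunct)  # copy the node, attach adjunct
    label, *children = host
    return tree(label, *[adjoin(c, target, adjunct) if isinstance(c, list)
                         else c for c in children])

A = tree("S", tree("NP", "John"),
              tree("VP", tree("V", "saw"), tree("NP", "the man")))
B = tree("S'", "who Bill liked")
print(adjoin(A, tree("NP", "the man"), B))
# ['S', ['NP', 'John'],
#  ['VP', ['V', 'saw'], ['NP', ['NP', 'the man'], ["S'", 'who Bill liked']]]]
```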
Consider, now, the two relevant structures.

(35) a. Which pictures that Johni took did hei like?
     b. ?*Whose claim that Johni took pictures did hei deny?
The full DS for (35b) must be the following.

(36) *Hei denied the claim that Johni took pictures.

(36) is the full argument skeleton. Deny subcategorizes for the internal argument claim, and claim itself takes the clause that John took pictures as a complement (not an adjunct). We must assume, then, that the full structure is present at DS, by the Projection Principle. This full structure, however, violates Condition C, since the name is c-commanded by a coindexed pronoun. Therefore the sentence is marked as ungrammatical at that level. Making the usual assumption that starred sentences may not be "saved" by additional operations, this means that the grammar, correctly, disallows the sentence. Consider now the unexpectedly grammatical (35a). The corresponding non-question for (35a) is (37).
(37) Hei liked which pictures that Johni took.
Under standard assumptions, this sentence would be marked as ungrammatical at DS. However, the corresponding SS (35a) is fully grammatical. Under the theory proposed here, however, the deep structure underlying (35a) is not (37). Rather, it is (38) (i.e. the full phrase structure trees corresponding to the argument skeletons; I suppress PS detail for convenience).
(38) Argument skeleton 1: (S (NP He) (VP liked which pictures))
     Argument skeleton 2: (S′ that John took)
     The rooted structure is 1.
To each of these argument skeletons Move-α may apply; Adjoin-α also applies adjoining argument skeleton 2 into argument skeleton 1. Move-α may also apply to the resulting, full, sentence structure. There are two possible derivations for (35a). In one, Adjoin-α applies prior to Move-α.
(39) Derivation 1:
     a. He liked which pictures.
     b. that John took
        → Adjoin-α:
     *Hei liked which pictures that Johni took.
        → Move-α:
     *Which pictures that Johni took did hei like?
In this derivation, if Adjoin-α applies first, then Condition C will apply to the intermediate structure, ruling it out. There is, however, another derivation, given in (40).

(40) Derivation 2:
     a. He liked which pictures.
     b. that John took
        → Move-α (within a):
     a. Which pictures did he like?
     b. that John took
        → Adjoin-α:
     Which pictures that Johni took did hei like?

In (40), Move-α, applying in argument skeleton 1, applies before Adjoin-α. This derivation gives rise to the appropriate S-structure as well. However, unlike the derivation in (39), as well as the standard derivation, there is no structure in (40) which violates Condition C. This is because the adjunct clause containing John has been adjoined into the representation after movement has taken place, and after the fronted NP has been removed from the position in which it is c-commanded by the pronoun he. This is possible only for adjuncts: direct complements, like the complement of claim, must be present at all levels (i.e. part of the rooted argument structure at all levels).
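The logic of (39) and (40), with Condition C checked at every step and only the Move-before-Adjoin order surviving, can be summarized in a short sketch. Everything here is illustrative: structures are strings, and the c-command facts are supplied by hand rather than computed from the trees.

```python
# Condition C facts for these structures, supplied by hand (True = a name
# is c-commanded by a coindexed pronoun in that structure):
violates_c = {
    "He_i liked [which pictures]":                         False,
    "He_i liked [which pictures [that John_i took]]":      True,
    "[Which pictures] did he_i like?":                     False,
    "[Which pictures [that John_i took]] did he_i like?":  False,
}

def run(steps):
    """A starred step cannot be 'saved' later, so one violation stars all."""
    return "*" if any(violates_c[s] for s in steps) else "ok"

# (39): Adjoin-a before Move-a -- the intermediate structure is starred.
print(run(["He_i liked [which pictures]",
           "He_i liked [which pictures [that John_i took]]",
           "[Which pictures [that John_i took]] did he_i like?"]))   # *

# (40): Move-a before Adjoin-a -- no step violates Condition C.
print(run(["He_i liked [which pictures]",
           "[Which pictures] did he_i like?",
           "[Which pictures [that John_i took]] did he_i like?"]))   # ok
```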
The same analysis as given for relative clauses holds for locative adjuncts in NPs. Recall the contrast in (30).

(41) a. ?*Whose examination of Johni did hei fear?
     b. Which examinations near Johni did hei peek at?
The DS of (41a) is given in (42); the DS of (41b) is given in (43).

(42) Root: (S (NP He) (VP feared (NP whose examination of John)))
(43) Root: (S (NP He) (VP peeked (at (NP which examinations))))
     Argument structure 2: (PP near John)
In (42) only one transformation may apply: Move-α. Move-α fronts the wh-phrase whose examination of John. However, coreference is disallowed between he and John since he c-commands John at D-structure. In (43), two transformations apply: Move-α and Adjoin-α. These may be ordered in either fashion: Move-α applying in the root prior to the adjunction operation, or after it. If Move-α applies after the adjunction operation, then coreference between John and the pronoun is impossible, because Condition C would be violated. However, there is still the derivation in which Move-α applies in the root prior to Adjoin-α. This would look as follows. (44)
Step 1 (Move-α within the root):
     A: [S [NP He] [VP [V peeked] [PP [P at] [NP which examinations]]]]
     B: [PP near John]
     → A: [C′′ [SpecC′ which examinations] [C′ [C did] [S(=IP) [NP he] [VP [V peek] [PP [P at] [NP t]]]]]]
Step 2 (Adjoin-α, adjoining B to the fronted NP):
     [C′′ [SpecC′ [NP [NP which examinations] [PP near John]]] [C′ [C did] [S [NP he] [VP peek at t]]]]
The PP containing the name is added into the representation after Move-α has applied. Hence there is no structure in which Condition C is violated. The sentence is, correctly, marked as grammatical. Finally, let us note that the same contrast with respect to the abrogation of Condition C holds for sentential adjuncts vs. complements when these are not part of a more inclusive NP. By itself this contrast could have been due to a number of other factors — see Reinhart (1983), where it is traced to the attachment site — but given the data already noted, it seems likely that it should be traced to the same cause as above. The contrast is given in (45).
a. In John'si neighborhood, hei jogs.
b. *In John'si neighborhood, hei resides.
c. In John'si home, hei smokes dope.
d. ??In John'si home, hei placed a new Buddha.
e. ?Under Johni, hei noticed that a general hubbub was occurring.
f. *Under Johni, hei placed a blanket. (cf. Under himi, Johni placed a blanket.)
These types of sentences are familiar from Reinhart's work, where the fronted adjunct does not initiate a Condition C violation, while the fronted argument does ((45a, c, e) vs. (45b, d, f)). The point here is that the same set of data would follow from the theory advocated above.

3.3.5 A Conceptual Argument
One consequence of the analysis above has to do with the question with which the chapter began: the question of mode of representation. Grossly, to the extent to which little happens in a derivation, the derivation may be conceptually collapsed: the derivational mode falling into the representational mode. If information is gained, or lost, then such a collapse becomes commensurately more difficult. In the traditional Aspects analysis of NP-movement, information is lost: the position from which an element has moved is eliminated. Real movement was needed. In the wake of the advent of trace theory (Chomsky 1973), the necessity for movement became less clear, the derivational distance between DS and SS having been lessened. The type of analysis above increases the distance between DS and SS again by increasing the disparity between them, this time not because information is lost between DS and SS as in the early versions of movement, but because information is gained: the adjunct is added in the course of the derivation. It might still be argued, correctly, that the statement in (31), repeated below, is perfectly sufficient to maintain the appropriate distinction.

(46) If α, a name, is contained in a fronted adjunct then Condition C effects are abrogated; otherwise not.
Generalizing (46) somewhat to a Reconstruction-type account (or a lambda-conversion-type account, à la Williams 1987), and restricting ourselves to the case where the adjunct is contained in a nominal, we would have the following.

(47) Reconstruct only N′ obligatorily; all other bar-levels of N optionally (or define as-if reconstructed, etc.)
What is the matter with (47)? It seems clear that empirically there is nothing wrong with (47) — at least over the range of data that we are considering. (47), as well as the postulation of Adjoin-α, will account for the following data set, assuming that of John is under N′ and near John is under N″.

(48) a. *Whose examination of Johni did hei fear?
     b. Which examinations near Johni did hei peek at?
However, there is one rather striking difference between the formulation in (47) and the Adjoin-α account: the latter analysis, but not the former, is intimately bound up with the properties of the Projection Principle. In fact, assuming the Projection Principle, and nothing else, the possibility that adjuncts may be added in the course of the derivation, but not arguments, follows as a consequence. The data in (48) could not be the reverse:

(49) a. Whose examination of Johni did hei fear?
     b. *Which examinations near Johni did hei peek at?
Given the account in (47), however, there is nothing which would favor the formulation in (47) over the alternative in (50), for example.
(50) Reconstruct only the elements under the N″ node obligatorily; all other elements optionally.
The alternative in (50), in which only adjunctual elements are added back into the representation at Reconstruction Structure, would not violate any constraint. In particular, it would not violate the Projection Principle, so far as I can see. Yet it would give rise precisely to the (erroneous) data set in (49). In effect, the palette of possibilities allowed by the Reconstruction-type approach (or chain-binding approach, if construed as (50)) is too large; the child must be assumed to be given one sort of reconstruction statement (47) instead of another (50), though there is no general principle to choose between them. This is not the case with the Adjoin-α analysis, where the possibility of adjunction follows directly from the Projection Principle, and no inverse is possible.³

3.4 An Account of Parametric Variation

Let us assume, then, that the grammar consists of at least two rules:
(51) Move-α
     Adjoin-α
Either may apply in the course of the derivation, in either order. Is there any substructure in either?

3. There is another area of data here that might provide evidence that adjuncts must be added after D-structure. This is based on an observation on NP-NP anaphora. The peculiar fact is that NP-NP anaphora, with both names, is in general possible, but not if the second NP has part of its reference specified by an adjunct. Some examples follow: the (a) examples allow anaphora, but the (b) examples do not.
(1) a. The old man tried to catch the marlin, but the big fish was tough to catch.
    b. *The old man tried to catch the marlin, but the fish that was big was tough to catch.
(2) a. I looked at the Corvette, during which time the wheelless car really impressed me.
    b. *I looked at the Corvette, during which time the car without wheels really impressed me.
(3) a. I was reading Ulysses, while my baby was trying to eat the story of Bloom.
    b. ?*I was reading Ulysses, while my baby was trying to eat the story about Bloom.
    c. *I was reading Ulysses, while my baby was trying to eat the story that was about Bloom.
(4) a. I was examining the Stieglitz photograph, while my wife tried to buy the picture of the clouds.
    b. *I was examining the Stieglitz photograph, while my wife tried to buy the picture near the entrance.
I suggest that there is, and that, in particular, Adjoin-α is stated in this manner: (52)
(52) Adjoin-α
     Default: Conjunction
In effect, this states that the grammar attempts to adjoin a second structure into a first, but that if this fails, there is still a default operation. This is simple conjunction: the linearization rule of Williams (1978). If (52) is correct, then "linearization" is not so much a special rule or device, but just a default operation which applies when two argument skeletons exist, as they often do, and one has not been embedded in the other. That is, additional information is needed to adjoin one structure into another; if no such information is available, the grammar simply conjoins the two structures as a matter of default.
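A small sketch makes the division of labor in (52) concrete. The bracketed-string encoding and the parameter name below are illustrative assumptions; the point is the control flow: Adjoin-α when the language licenses it, with conjunction as the universal fallback.

```python
def compose(skeleton, head, adjunct, adjoin_applies):
    """Combine an adjunct with a rooted argument skeleton.

    adjoin_applies encodes the parametric choice: True for an English-type
    language (Adjoin-a), False for a language taking the default."""
    if adjoin_applies:
        # Adjoin-a: Chomsky-adjoin the adjunct to the head's NP node.
        return skeleton.replace(f"[NP {head}]",
                                f"[NP [NP {head}] {adjunct}]", 1)
    # Default: simple conjunction, juxtaposing the adjunct after the clause.
    return f"[S {skeleton} {adjunct}]"

s = "[S John saw [NP the man]]"
print(compose(s, "the man", "[S' who Bill liked]", True))
# [S John saw [NP [NP the man] [S' who Bill liked]]]   (English relative)
print(compose(s, "the man", "[S' who Bill liked]", False))
# [S [S John saw [NP the man]] [S' who Bill liked]]    (co-relative)
```

On this rendering, the co-relative construction discussed below is not a separate rule, but simply what remains when Adjoin-α does not apply.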
Consider now the exhibition of parametric variation as it relates to the statement in Chapter 1 (Chapter 1, (6)):

(53) The theory of UG is the theory of parametric variation in the specifications of closed class elements, filtered through a theory of levels.
Some languages (e.g. English) contain a relative clause that is structurally adjacent to its head. Other languages, e.g. ancient Hittite (Cooper 1978) and Austronesian languages, possess an adjoined relative clause, the so-called co-relative construction, which does not structurally form a constituent with the head, but apparently exists in juxtaposition with the full clause, following it. The analyses I have seen have not been exact about the precise phrase structure in these cases; I will assume that the contrast is that below:
(54) English type: [NP(=N′′) [NP(=N′′) …] S′(RC)]
(55) Co-relative: [S [S NP [VP V NP]] RC]
If one wished to give a phrase structure analysis of the two languages, they would differ in the rewrite rules below:

(56) English, Japanese, etc.: NP → (NP, S′) (either order)
(57) Adjoined RC language: S → (S, S′) (where S′ is of RC type)
Given the strictures of Stowell (1981) such an analysis is not available to us; there is, however, something better. A relative clause must agree with its head (Chomsky 1982). Let us assume that the rule doing this, a predication rule, indexes the head with the element in Comp: the index gets copied from the head to the element in Comp. (58)
(the man)i (who (I knew t)) → (the man)i (whoi (I knew t))
It is, in fact, this rule of construal which allows the relative clause to be interpreted: syntactically, we may conceive that it is the saturation of the head of the relative clause (who above) with an index which licenses it. Let us call this relative clause head the relative linker: it is the link between the RC itself and the NP head which it modifies. What, then, is the difference between a language like English (or French or Italian …) and a language in which the co-relative construction is employed? Just this: in English, the wh-head of the relative clause, the relative linker, must be saturated in the syntax, while in any language employing a co-relative it need not be. That is, the parametric difference is the following:

(59) a. English: The relative clause linker must be saturated at S-structure.
     b. Co-relative language: The relative clause linker need not be saturated at S-structure.
Notice now what has happened. Rather than positing a difference in the phrase structure rule generating relative clauses, the difference has been located elsewhere: in the saturating of the relative clause linker. But this linker itself is simply part of the closed class set. That is, the difference between the two language types is isolated to a single difference in the specification of a closed class element; an element which is part of a closed class set, and which is of necessity finite. Further, this difference in the specification of an element in the closed class set translates into a difference of representation at a particular level. The difference, then, is reduced to something of the form of (53) above: the relative clause differences are reduced to differences in the parametric specifications of closed class elements, filtered through a theory of levels.

Further, the statement in (59) may itself be re-stated in line with the notions that we have been assuming. When does Adjoin-α take place; what causes it? The answer is at hand: Adjoin-α takes place at the point at which the relative clause linker is saturated. More exactly, it is the saturation operation itself which composes the two structures: the adjunct into the argument skeleton. There is a 1-to-1 relation between them.

(60) Adjoin-α ↔ Saturate RC linker
The adjunction operation thereby is in a 1-to-1 relation with the saturation operation itself: it is the necessity for saturation in the syntax, in languages like English, which composes the two (clausal) representations, and builds the relative clause structure. In languages which do not require saturation of the relative clause linker in the syntax (though perhaps they do by LF, in which case adjunction will take place later in the derivation: I will remain neutral on this point), the relative clause is simply conjoined at the end of the clausal construction: the co-relative construction. The English case is the following. (61)
A1: John saw the man.
A2: who Bill liked
→ (Adjoin-α / Saturate linker)
John saw the mani whoi Bill liked.
Thus the adjunction operation is actually put in a 1-to-1 correspondence with the change in the specification of a single closed class element, the relative clause linker. We might, in fact, say the same thing for the Move-α. The primitive operation is generally taken to be the movement operation itself. However, this may be taken to follow from — or, more exactly, to be put in 1-to-1 correspondence with — the saturation of the Comp feature. Recall the parametric analysis in Chapter 1. In that chapter, it was suggested that Move-α, as movement to SpecC′ (Chomsky, 1986b), still needed to agree with a +/− wh feature in Comp. This was necessary since the grammar needs to detect whether wh-movement has taken place in the syntax, and this detection (i.e. selection) must be in terms of the head position: Comp, not Spec C′. (62)
a. *John didn’t know Bill saw who. b. John didn’t know who Bill saw.
If we assume that the coindexing and saturation of the +/−wh feature is attendant upon movement, the following representation would result: (63)
John didn't know [C′′ [SpecC′ whoi] [C′ [C +whi] [S Bill saw ei]]]
Here the +wh taking verb, know, would select for the saturated +wh Comp. The parametric difference between a language which requires (or allows) wh-movement in the syntax, English, and a language which apportions it to LF, Chinese, then comes down to the following specification on the wh-feature. (64)
+wh must be satisfied in the syntax (English)
+wh need not be satisfied in the syntax (Chinese)
The Move-α rule itself, as it applies to wh-words, is put into 1-to-1 correspondence with the satisfaction of that feature, exactly in the same way that the Adjoin-α rule was put into 1-to-1 correspondence with the satisfaction of the RC Linker. We may take the following to be equivalent: (65)
Move-α (as it applies to wh-words) ↔ Satisfy +wh feature
It is the satisfaction of the +wh feature which "initiates" Move-α. This supports the finiteness claim of Chomsky (1981). Movement itself is put into 1-to-1 correspondence with a contrast in a bit associated with a closed class element. Note an additional consequence. While Move-wh and Move-NP (Chomsky 1977b; van Riemsdijk and Williams 1981) are not specified as distinct operations via the movement rule itself — just as is in general the case in most recent work, where a single rule Move-α is assumed — they are differentiated in terms of the lexical feature which must be satisfied. In the case of wh-movement this is the wh-feature itself; we leave it open for now what it is with respect to NP-movement.

With respect to parametric variation, then, we are led to the following picture. Adjoin-α either need or need not apply in a language for relative clauses.
If it does not apply, a default operation of simple conjunction occurs. The parametric situation in terms of the operation is the following. (66)
UG: (Adjoin-α)
Default: Conjunction
We may conceive of these two operations as being in a bleeding relation in a derivation. If Adjoin-α does not apply, then Conjunction will. (67)
English:                      Co-relative language:
DS                            DS
Adjoin-α applies              no Adjoin-α
Conjunction (always bled)     Conjunction (always applies)
SS                            SS
The order of operation is the same in English and in a co-relative language. The difference is that in English, Adjoin-α is always specified as applying (and so conjunction as a universal default is always bled), while in a co-relative language, Adjoin-α will never apply, and the default will take over. This same difference may be looked at with respect to the specification of the satisfaction of the relative clause linker. The linker either will have to be, or will not need to be, satisfied in a given language. (68)
UG:
→ RC linker must be satisfied
→ RC linker need not/must not be satisfied (Default)
The grammar, having made the decision in (68), will adopt one or the other sort of relative clause. The situation is similar, though subtly different, in the case of wh-movement. At first, it appears to be identical. We have a cross-linguistic difference in movement, depending on the satisfaction of the +wh element in Comp at S-structure.
(69) UG:
     → +wh element must be satisfied at SS (English)
     → +wh element need not be satisfied at SS (Chinese)
If we assume that the nonspecification of information is always the default, then the Chinese case would be the default case, with the wh-element in place, in spite of the fact that it appears to be cross-linguistically less common. However, there is an additional possible distinction in the wh-data, which has been brought home most forcefully by recent work in Japanese (Hoji 1985), and which also echoes a position that Joan Bresnan took further back (Bresnan 1977). Namely, given a dislocated topic or wh-word, it appears that there are two derivations, in a language like Japanese, which could have produced it. In one, the dislocated element (perhaps a topic) has been generated in its theta position at D-structure, and is moved to the dislocated position in the course of the derivation, by the rule Move-α. In the other derivation, the dislocated element is generated in place at D-structure, this dislocated element is assigned a theta role, and is somehow linked to the gap. The most plausible means by which this could be done would be via operator movement plus ultimate co-indexing of the operator with the dislocated phrase (as in Chomsky’s 1977a, analysis of toughmovement), though it is possible that there is some other sort of indexing procedure altogether. Note that in this case, one might wish to assume that there is some sort of auxiliary theta role associated with the base-generated D-structure position (perhaps in tough-constructions in English), so that the element gets a theta role at DS, and that it additionally gets a theta role from the operator. See also Chapter 5 (end of chapter), where I suggest that this possibility may hold for wh-questions in early speech. (70)
Dislocated element:
→ coindexed by Move-α
→ generated in place, linked to gap by Move-α of operator
These possibilities may be put together as follows:
(71) Wh-Dependencies:
     Dislocated:
       → Movement of Element
       → Base-generation of Element
     Not Dislocated (+wh-feature need not be satisfied at SS)
In terms of parameter-setting, the parametric situation here would be determined, first, by the necessity for the wh-feature to be satisfied by SS (or not), and second, by the possibility of theta assignment to dislocated positions. There does seem to be some evidence in the acquisition sequence for a "switch" between the two left branches in (71) (see Chapter 5), but no evidence for the child adopting the right branch in initial grammars, i.e. that the +wh feature need not be satisfied at SS in English. This may be due to the fact that the right branch is not a default option cross-linguistically, or due to the fact that there is no positive evidence for that option, or due to some other factor. See Chapter 5 for discussion.

I should note that a Barriers-type analysis suggests the possibility that it is not the wh-element itself which binds the trace, as suggested in traditional analyses of wh-movement, nor an index associated with the phrasal category, as suggested in van Riemsdijk and Williams (1981), but the wh-feature itself in Comp.
(72) Traditional analysis: whoi did John see ti?
(73) van Riemsdijk and Williams analysis: I don't know whoi (i John saw ti)
(74) (after Chomsky 1986)
     [S [NP I] [VP don't know [CP whoi [C′ [Compi +whi] [S John saw ei]]]]]
There is some evidence for this position; see Chapter 5 for discussion.

3.5 Relative Clause Acquisition

Let us now take a look at the acquisition of relative clauses. If the above analysis of parametric variation is illuminating, it should carry over, ceteris paribus, to the acquisition of relative clauses as well. The major work on the acquisition of relative clauses is Tavakolian (1978); previous work had been done by Sheldon (1974) as well as many others; subsequent work has been done by Goodluck (1978), Solan and Roeper (1978), Hamburger and Crain (1982), and again many others. The basic contention of Tavakolian (1978) is the following: children attempt to parse RCs with the rules (and computational resources) present in the grammar; to the extent that these fail, they adopt a conjoined clause analysis of the relative clause. The relevant structures, then, would be the following.
(75) Subject/Object relative (adult grammar):
     [S [NP The sheep] [VP [V kissed] [NP [NP the monkey] [S who tickled the rabbit]]]]
(76) Subject/Object relative (child grammar, when it fails):
     [S [S [NP The sheep] [VP kissed the monkey]] [S [NP who] [VP tickled the rabbit]]]
Tavakolian notes that children interpret this in the same manner as the corresponding conjoined clause structure: "The sheep kissed the monkey and tickled the rabbit." Assuming, as she does, that there is a null element in the conjoined clause (i.e. the relative), and assuming the high attachment, the propensity for young children to interpret the relative clause as if it were modifying the first subject (the sheep above) is explained. It is treated as a sort of co-ordinate construction. This hypothesis differed from an earlier hypothesis due to Amy Sheldon, who suggested that children attempt to maintain parallelism in grammatical function between elements in the matrix and the relative clause: thus, in Sheldon's view, subject-subject relatives would be well-understood (i.e. RCs where the subject had been relativized and associated with the main clause subject), and object-object relatives, but not subject-object relatives or object-subject relatives. Tavakolian adduced a number of pieces of evidence for her position. The most interesting have to do with the difference in the comprehension of RCs with relative subjects, depending on whether they hang off of a subject or an object NP. For subject-subject relatives, the comprehension data is the following (Tavakolian 1978):
(77) SS relatives: The sheep that hit the rabbit kisses the lion.
                       1            2               3

(78) Response category:
                  Correct (12,13)   12,23   21,23   12,32   Other
     Age 3.0–3.6        18            2       1       0       3
     Age 4.0–4.6        16            5       1       0       2
     Age 5.0–5.6        22            0       0       2       0
     Totals             56            7       2       2       5

     Note: A 12,13 response means that the child acts out 2 actions, one in which the first NP acts on the second (12), and one in which the first NP acts on the third (13). Similarly for all the number pairs.
It is clear that children do very well in the comprehension of this relative. For contrast, now, consider the object/subject relative: i.e., the subject relative off of an object. Note that according to the usual notions of parsing complexity, these structures should be easier than the subject relative off of a subject, since they involve right branching rather than left branching. The results are the following. (79)
Object/Subject relatives: “The lion kissed the duck that hit the pig.”

     Response category, by age:

                   Correct (12,23)   12,13   12,31   12,32   21,23   Other
     Age 3.0–3.6          1           17       1       2       1       2
     Age 4.0–4.6          4           15       3       1       0       1
     Age 5.0–5.6          9           13       1       0       1       0
     Total               14           45       5       3       2       3
The clear result, lessening over time, is that children choose a response in which the subject of the relative clause is the matrix subject, not the matrix object: i.e. the child takes the 12, 13 response (as if, “the lion kissed the duck and hit the pig”), not the 12, 23 response (“the lion kissed the duck and the duck hit the pig”). This is a remarkable contrast with the Subject/Subject response, where the child’s response is largely appropriate. Note also that it would be unexpected given the usual parsing theory of greater complexity in left-branching structures. Given the analysis of relative clauses suggested in this chapter, however, the Tavakolian data follows immediately. Let us modify Tavakolian’s “conjoined clause analysis”, so that it becomes not simply a parsing principle, but is integrated into the general structure of the grammar. Let us assume, in particular, as above, that relative clauses are not present in the base, but rather are added-in in the course of the derivation. There are two ways for a language to do so. It may have recourse to the rule Adjoin-α, which adjoins a relative to its nominal head. This is shown in (80). (80)
structure 1: [S NP [VP V NP]]
structure 2: S′
Output (structure 2 adjoined to the nominal head): English-type relative
Or, it may conjoin the structure (Conjoin-α). For convenience, let us assume that this involves daughter adjunction under S. These are the parametric possibilities. Let us now make the simplest assumption about the immature grammar: that it may, under conditions of computational complexity, have recourse not to the actual rule in the target grammar of the language to be learned, but rather to the default rule allowed by UG. (81)
The immature grammar may have recourse to the default rule.
In such a case, the grammar would be “un-English”, but it would not be “un-UG”. It would simply be displaying an option available in UG, but unavailable in the language to be learned. Note that this gives a rather different view of parameter-setting than is conventionally understood. Rather than the child first setting the parameter at a default, and attempting to learn the actual value, the child has the actual value as a target at all times. When the grammar/computational system fails, it takes the default, as a default. It is not, however, vacillating between two choices. A physical analogy would be: the grammar is a 3-space, and in the 3-space are hills to be climbed — these are the parameters to be set. In times of computational difficulty, the grammar may fall into a hollow. These hollows are the default settings. Both the hilltops and the hollows are specified possibilities of UG: the system, however, is trying to hill-climb. It is only under conditions of computational complexity that recourse is had to a setting not true to the target language. Returning from these general considerations, let us consider how recourse to the default setting will explain the Tavakolian data. Let us take as a point of departure the assumption that children are having recourse to the default setting: Conjoin-α. Let us also, more tentatively, assume that this involves daughter adjunction of the RC S-node underneath S.

(82) Default grammar (children): Conjoin-α
(83) Conjoin-α is daughter adjunction under S.
What now happens in the case at hand? Assuming the default grammar in (82) for both the subject and object relatives, the child would have the following structural analyses.
(84) Relative off of subject:
     structure 1: [S NP VP]
     structure 2: S′
     Output after Conjoin-α: [S NP S′ VP]

(85) Relative off of object:
     structure 1: [S NP VP]
     structure 2: S′
     Output after Conjoin-α: [S NP VP S′]
In both cases, the RC is daughter adjoined to the S. However, this gives us precisely the result that we want with respect to interpretation. The relative clause does not form a constituent in relation to any NP: in all cases it hangs off of S. There is, therefore, no natural relative interpretation. However, the RC does lack a subject: it is a sort of predicate with an unsaturated subject position. This means that it may be construed with any sister NP to form a proposition. In the case of the relative clause off of a subject, the sister NP will be the subject of the sentence, and the appropriate interpretation will be given — accidentally, so to speak. In the case of the relative clause off of an object, the RC will again be daughter-adjoined under S, and the relevant sister will be the subject of the sentence: the object of the sentence would not c-command the relative clause. The interpretation that will be given, therefore, will be one in which the relative clause is construed as interpreted with the subject of the sentence: the wrong interpretation.
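The construal argument can be made mechanical. Below is a minimal sketch, assuming a nested-list encoding of phrase markers (an illustrative convention of mine): after Conjoin-α, the only NP that is a sister of the daughter-adjoined RC under S is the subject, so it is the only available construal target.

```python
# Find the sisters of the daughter-adjoined RC; only sister NPs can be
# construed with its unsaturated subject position (the object NP, buried
# inside VP, does not c-command the RC).

def sisters(node, target):
    """Return the sisters of `target` within node, searching recursively."""
    kids = node[1:]
    if any(k is target for k in kids):
        return [k for k in kids if k is not target]
    for k in kids:
        if isinstance(k, list):
            found = sisters(k, target)
            if found is not None:
                return found
    return None

rc = ["S'", "who", "tickled the rabbit"]
# (85): relative off of the object, daughter-adjoined under S
tree = ["S", ["NP", "the sheep"], ["VP", "kissed", ["NP", "the monkey"]], rc]

controllers = [s for s in sisters(tree, rc) if s[0] == "NP"]
print(controllers)   # [['NP', 'the sheep']] -- the subject, not the object
```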
By assuming that the child has recourse to the default operation, then, we are able to account for the pattern of data in the misconstrual of these relative clauses by children. Strikingly, no separate parsing principle is needed, but what is needed is a radical restructuring of our understanding of the genesis of relative clauses. In this way, the acquisition theory may actually lead the syntactic theory to a novel analysis. Is there any additional evidence that this sort of analysis is correct? In fact, there is. In Tavakolian's analysis, the difficulty for children in interpretation is linked to a structural difference in the phrase markers between children and adults: the Object/Subject relative is attached high by the children but not by adults. Solan and Roeper (1978) distinguish Tavakolian's account from the parallel structures account in the following way. Solan and Roeper constructed sentences in which the relative clause, a subject relative, is attached off of an object. This is similar to the Object/Subject sentences above. However, they provided a crucial test to determine whether the high attachment analysis (conjoined clause analysis) is correct. Namely, they chose sentences which contained, in addition to the direct object, an obligatory prepositional object. These were sentences using the verb put.

(86)
The lion put the turtle that saw the fish in the trunk.
The adult analysis of (86) would have the RC adjoined to the head noun the turtle. This is shown with the full line in the diagram in (87). Suppose, however, that because of computational difficulties the child cannot use the rule Adjoin-α. Then the default Conjoin-α should appear (the Tavakolian analysis). However, in this case, Conjoin-α, interpreted as S-conjunction, also must fail, because it would result in crossing branches (Solan and Roeper 1978). Hence there is only one further possibility: that the relative clause remains entirely unattached into the structure. Now if we make the additional obvious assumption that only rooted structures may be interpreted, this would mean that the relative clause would not be erroneously interpreted by the child as conjoined (i.e., as having the subject as its antecedent) for these put constructions, but rather would not be interpreted at all. In fact, this seems to be the case (88).
(87) Adult analysis (full line):
     [S [NP The lion] [VP [V put] [NP [NP the turtle] [S that saw the fish]] [PP in the trunk]]]
     High attachment of the relative under S (marked “?”): excluded, since it would cross the PP in the trunk.
(88)                        Conjoined Clause Response   Failure to Interpret RC
     Sentences with put                0                          42
     Sentences with push              40                           6
The Roeper and Solan data show clearly that Tavakolian’s analysis, and the analysis here, are correct. The child first attempts to have recourse to the adjunction structure: Adjoin-α. If that fails, he or she attempts to conjoin the structure: Conjoin-α. If that fails, the relative clause must remain unrooted, and so uninterpretable. (89)
a. Adjoin-α (Correct Interpretation)
b. If fails, Conjoin-α (Conjoined Clause Interpretation)
c. If fails, remains unrooted (No interpretation)
The acquisition data and the syntactic analysis involving a syntactic rule of adjunction are then in perfect parity.
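The decision sequence in (89) can also be rendered as a small procedural sketch. The encoding of trees, the capacity threshold, and the crossing-branches test below are illustrative assumptions of mine, not part of the formal proposal:

```python
# The (89) cascade: try Adjoin-alpha; on failure, Conjoin-alpha; on failure,
# leave the RC unrooted (and hence uninterpreted).

def adjoin(matrix, rc, head_index, capacity):
    """Adjoin-alpha: attach the RC under its head NP, if resources allow."""
    if capacity < 2:                 # assumed computational-cost threshold
        return None
    tree = list(matrix)
    tree[head_index] = [tree[head_index], rc]    # [NP NP S'] constituent
    return (tree, "relative reading")

def conjoin(matrix, rc, head_is_final):
    """Conjoin-alpha: daughter-adjoin the RC under S; blocked if the head NP
    is not clause-final, since the branches would cross (Solan and Roeper)."""
    if not head_is_final:
        return None
    return (matrix + [rc], "conjoined-clause reading")

def analyze(matrix, rc, head_index, head_is_final, capacity):
    return (adjoin(matrix, rc, head_index, capacity)
            or conjoin(matrix, rc, head_is_final)
            or (matrix, "RC unrooted: no interpretation"))

rc = "that saw the fish"
# (86): the head NP 'the turtle' is followed by the obligatory PP
print(analyze(["the lion", "put", "the turtle", "in the trunk"],
              rc, 2, head_is_final=False, capacity=1)[1])
# -> RC unrooted: no interpretation   (the put column of (88))
print(analyze(["the lion", "pushed", "the turtle"],
              rc, 2, head_is_final=True, capacity=1)[1])
# -> conjoined-clause reading         (the push row of (88))
```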
3.6 The Fine Structure of the Grammar, with Correspondences: The General Congruence Principle

I wish now to present one view as to how the theory of levels, the theory of parametric variation, and the theory of acquisition relate. In the sections above, I have suggested that there is a rule adding relative clauses, and in general adjuncts, in the course of the derivation. However, this rule itself, Adjoin-α, has a certain substructure with respect to the derivation. We may consider it as an optional rule in UG ordered before the default associated with it, Conjoin-α, which it completely bleeds, if present.
(90) DS --(Adjoin-α)--Conjoin-α--> SS
     UG Specification

(91) UG specification:      DS --(Adjoin-α)--Conjoin-α--> SS
     English-type grammar:  DS --Adjoin-α--Conjoin-α--> SS
     co-relative grammar:   DS --Conjoin-α--> SS
     Universal Grammar

This general point of view, a parameter-setting approach, has the structure in (92), with G1 being a relative clause-head language like English, and G2 being a co-relative type language.

(92)
       G1
G0 <
       G2
Universal Grammar
As noted in the previous section, this view, while of a parameter-setting type, differs from a standard parameter-setting view in at least two ways. First, the initial grammar, G0, is not a possible final grammar. The initial grammar has Adjoin-α as an optional rule, and this presumably is not an option — at least generally — for the final grammar. A more usual parameter-setting approach might have the child's grammar originally set at either G1 or G2, and changing over, if necessary, to the other type. Second, this view diverges from a standard parameter-setting view with respect to the notion of “setting a parameter”. In the standard view, a parameter is a sort of cognitive switch: the child starts with the switch set in a particular direction, and the setting of the switch may change. Each position is stable. In the representation in (92), however, with an operation/default type arrangement, the original setting is neither of the final two settings, and the parameter is not so much a switch to be set as a hill to be climbed, a target of the system. Thus when the child fails to apply the rule Adjoin-α in his/her grammar, or in his or her analysis of a sentence, the default rule Conjoin-α is fallen into. It is as if the former (Adjoin-α) were a local maximum surrounded by a local minimum (Conjoin-α). Moreover, the grammar itself must be organized in such a fashion: that when the attempt at a target rule fails, a local minimum must exist to fall into, which itself is a possible specification of UG. This at least is an obvious conclusion to draw from such an approach. There is, finally, an interesting property of this system in (92) which requires note. It supports the general sort of philosophical framework that has been set forth by Chomsky (1986a) and Fodor (1975, 1981) with respect to the nature of learning. The instance of parameter-setting given in (91) is very strange from the point of view of traditional learning-theoretic and behaviorist notions — or even more commonly held man-in-the-street views — according to which learning is an accretion of knowledge or information. What is actually happening in (91) is the reverse of that. Each of the two final grammars in (91), G1 and G2, actually has less information than the initial grammar G0 in terms of the number of bits of information in them. The initial grammar has two pieces of information associated with the operation Adjoin-α: Adjoin-α itself, and the parenthesis ( ) surrounding it. The final grammars have less information than that. In the English type grammar, the parentheses surrounding Adjoin-α have been erased: this means that the single piece of information, Adjoin-α, as an operation, exists. The co-relative language contains even less information, containing neither Adjoin-α nor the parenthesis. This means that both of the final grammars have fewer pieces of information in them than the initial grammar: the process of “learning” involves the erasure of information specified in UG, at least for this central case. This is very much in line with the sort of view of learning that Chomsky/Fodor propose, and much against an accretionist view. Let us return to the main problem. The general structure of the choice situation in (92) can be used both to describe the parametric situation cross-linguistically, and the child's acquisition problem. The child's undecided language may be associated with G0, and she may choose either of the two options, G1 or G2. There is an asymmetry in the choice of options, in that G2 is the default (cf. the section above): if the child is aiming for G1 as the target grammar she may fall into G2 under conditions of computational complexity, but not the reverse. Further, there are three welcome (or at least interestingly different) features of the analysis which recommend it: i) it introduces a developmental aspect, in that the initial grammar in UG, G0, is not a final grammar, ii) it views parameter-setting not so much as setting a switch, as attempting to reach a target grammar (climbing a hill): the default grammar is therefore fallen into, rather than initially specified, and iii) it views learning as the erasure, rather than the accretion of information. All this appears well and good. But there is a hidden difficulty at this point for the thesis of this work. According to the General Congruence Principle (Chapter 2) there is some congruence relation between the acquisition sequence and the organization of operations in the grammar. But it seems fairly clear that this is not the case for the analysis of relative clauses presented so far. The particular format of the parameter-setting approach given in (92), repeated below, can hold both for the structure of parameters cross-linguistically and for the child's setting of a parameter in her language.

(93)
       G1
G0 <
       G2
UG: Structure of choice of grammars (parameter-setting)
The structure of operations in the grammar, in UG, has so far been presented as rather different: it involved the optional specification of Adjoin-α followed by the obligatory specification of Conjoin-α.
(94) DS --(Adjoin-α)--Conjoin-α--> SS
     UG: Structure of operations

A language like English would erase the brackets in (94), while a co-relative language would erase the entire specification (Adjoin-α). However, if a principle like the General Congruence Principle is to be correct — suggesting that there is a deep correspondence between the structure of operations within a grammar (94), and the parametric-acquisitional choice (93) — then the particular organizations in (93) and (94) cannot possibly both be correct: there is no isomorphism between them, as can be seen by simple inspection. Rather, either (93) or (94) must be mistaken, and there must be a common format for the two aspects of Universal Grammar. Let us therefore proceed in this manner. Change the format in (93) from that in which a pure choice exists between grammars G1 and G2 to something of the format in (95).

(95)
G0 (( --Adjoin-α--> G1) --Conjoin-α--> G2)
Parametric Specification in UG

The interpretation of the parentheses in (95) will be peculiar; I return to this below. And let us keep the format of the operations in the grammar virtually the same, changing it slightly in typography.

(96)
DS (( --O1--> s1) --O2--> s2)
O1: Adjoin-α
O2: Conjoin-α

Ignoring parentheses, the figure in (96) is to be read as follows: the operation O1, Adjoin-α, maps the DS into structure s1; the operation O2 maps the representation
into structure s2. s2 may be identified with SS in the case where no other operations have applied. It is apparent that the structure of the grammars in (95) and the structure of the operations in (96) are identical. More on notation. The parentheses are not to be read as optionality. Rather, they are to be read as invisibility. The material inside the parentheses is invisible to the grammar/acquisitional device. The grammar develops by removing parentheses, allowing for the instantiation of operations already specified in UG. Second, the particular numbering on the grammars, operations, and structures (e.g., s1 vs. s2) is of no ultimate significance — s2, for example, may come directly after DS in a particular language's grammar. Let us start now with (96). I suggested earlier that a child's initial grammar had recourse to a default rule of Conjoin-α to analyze relative clauses. Prior to this, however, the child's grammar filters out relative clauses altogether, only understanding the main proposition. The full developmental sequence is the following.

(97)
Stage I:   relative clause not understood at all (filtered out)
Stage II:  relative clause understood as generated by the rule Conjoin-α
Stage III: relative clause understood as generated by the rule Adjoin-α
This full developmental sequence is represented in the diagram in (96) if we assume that: i) operations within the parentheses at time t are unavailable to the grammar at that time, and ii) the child progresses by removing parentheses in the UG representation, starting from the most external set and proceeding inward. Consider how this would work. The initial grammar for the child would simply be the following:

(98) DS (( --O1--> s1) --O2--> s2)
O1: Adjoin-α
O2: Conjoin-α

The operations in parentheses would be unavailable to the child in the initial grammar; that is, both the operations Adjoin-α and Conjoin-α would be unavailable. This means that the representation in the grammar with respect to these operations would be simply that in (99).

(99) DS
Since by earlier assumption the DS representation is a pure representation of the rooted argument-of relation with no adjuncts present, this means that the child's initial analysis of a sentence like (100a) would be simply (100b), without Adjoin-α or Conjoin-α applying.

(100) a. The man saw the woman who knew Bill.
      b. The man saw the woman.
That is, the child's grammar at the stage in (99) would be doing the “adjunct filtering” that was noted earlier. This goes along with the observation that in initial stages, relative clauses are simply dropped by the child. What then is the next stage? According to the above, parentheses are erased, starting from the outermost inward. Erasing the outermost parenthesis in (98) would give rise to the following grammar:

(101)
DS ( --O1--> s1 ) --O2--> s2
O1: Adjoin-α
O2: Conjoin-α

Assuming that the material inside the parentheses is invisible to the child, this is simply equivalent to the grammar in (102), where the numbering on the operation (O2) is not significant.

(102) DS --O2--> s2
      O2: Conjoin-α

Conjoin-α will be the operative operation in the grammar at this point. This means that the child will interpret a sentence with the actual bracketing in (103a) as having the bracketing in (103b), and one with the actual bracketing of (103c) as having the bracketing in (103d). This is because only Conjoin-α, not Adjoin-α, is part of the grammar at this point. This explains the Tavakolian result.

(103)
a. (S The man saw (NP (NP the woman) (S′ who knew Bill)))
b. (S The man (VP saw (NP the woman)) (S′ who knew Bill))
c. (S (NP (NP The man) (S′ who knew Bill)) (saw the woman))
d. (S (NP The man) (S′ who knew Bill) (saw the woman))
In (103b) the relative off of the object has been attached high — daughter-adjoined under S. This means that the subject is the only possible “controller”,
i.e., coreferent item, with the subject variable who. This is indeed the mistake that children make, choosing the subject of the sentence as the subject of the RC. In (103d), the relative clause is again attached high. However, in this case, with the subject as controller, the correct interpretation is gotten — even though the structural analysis is faulty. So the Tavakolian facts follow. In the final stage in the acquisition of the construction the innermost brackets are removed (e.g., in the acquisition of English). This gives rise to the following derivational representation.

(104)
DS --O1--> s1 --O2--> s2
O1: Adjoin-α
O2: Conjoin-α

This was exactly the sequence of operations that we noted earlier as the appropriate one for English ((91) above), with Adjoin-α continually bleeding Conjoin-α for the appropriate choice of structures. Thus the sequence of grammars that the child passes through is accounted for if we assume the following representation in UG — together with a rule which removes outermost brackets on the basis of positive evidence.

(105) DS (( --O1--> s1) --O2--> s2)
      O1: Adjoin-α
      O2: Conjoin-α

Consider now the parametric situation. I suggested earlier that there was a congruence between the structure of operations in levels, and the structure of parameter-setting itself. This requires that the parametric structure of grammars is the following:

(106)
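The parenthesis-removal model lends itself to a direct computational rendering. The following is a minimal sketch under my own encoding assumptions: each operation is stored with the number of parenthesis layers still enclosing it, an operation is invisible while any layer remains, and regression reinserts the last-removed layer:

```python
# The UG specification (105): (( Adjoin-alpha ) Conjoin-alpha ) -- Adjoin is
# enclosed by two layers of parentheses, Conjoin by one.

UG_SPEC = {"Adjoin-alpha": 2, "Conjoin-alpha": 1}

def grammar_at(stage):
    """Operations visible once `stage` outermost layers have been removed."""
    return [op for op, depth in UG_SPEC.items() if depth <= stage]

for stage in range(3):
    print(f"Stage {stage}:", grammar_at(stage) or "adjuncts filtered out")
# Stage 0: adjuncts filtered out              -- (97) Stage I
# Stage 1: ['Conjoin-alpha']                  -- (97) Stage II
# Stage 2: ['Adjoin-alpha', 'Conjoin-alpha']  -- (97) Stage III; Adjoin bleeds Conjoin

def under_complexity(stage):
    """Failure reinserts the last-removed layer (the retreat of (112))."""
    return grammar_at(max(stage - 1, 0))

print(under_complexity(2))   # ['Conjoin-alpha'] -- the default is fallen into
```

Note that, on this rendering, learning is literally the erasure of information (a parenthesis layer), in line with the Chomsky/Fodor point made above.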
G0 (( --Adjoin-α--> G1) --Conjoin-α--> G2)
The parentheses are given the same interpretation as above: as “invisibility” if present. The first grammar would therefore be that in (107a); this would be read simply as (107b).
(107) a. G0 (( --Adjoin-α--> G1) --Conjoin-α--> G2)
      b. G0
Neither the adjunction nor the conjunction operation would be an attribute of this grammar. While no human language apparently has this property — i.e. is a pure representation of argument structure with no adjunctual possibilities allowed: all are too rich for this — it is possible that certain subparts of natural language have precisely this property. I am thinking in particular of the lexicon, or lexical representation, which is often thought to represent argument structure, but not adjunctual structure or conjunction (Bresnan 1982; Zubizarreta 1987). The idea that the most primitive grammar is a pure representation of argument structure, and that this is the “type” of the lexicon also goes along with the idea presented in Chapter 2, that the original grammar is the lexicon, where the lexicon itself has a tree-like structure. The next ordered grammar would involve the removal of the outermost parenthesis in (106). This would be the representation in (108a), which would be read as (108b) (recall that the subscripts bear no absolute significance). (108)
a. G0 ( --Adjoin-α--> G1) --Conjoin-α--> G2
b. G0 --Conjoin-α--> G2
This would demarcate precisely the co-relative languages with no head-adjoined clause structure. These grammars are less rich structurally than those containing Adjoin-α, and are the first to be reached temporally. The next and final grammar would be the one reached after the final, innermost, parenthesis had been removed. (109)
G0 --O: Adjoin-α--> G1 --O: Conjoin-α--> G2
(arrows to be read as the addition of the operation to the grammar)

What is the significance of this representation? At first, it appears quite unreasonable. It states that the child, having started from an original G0, passes to a grammar which is characterized by having in addition the operation Adjoin-α. However, from there, he or she has the additional operation of Conjoin-α added
into the grammar. But since Conjoin-α will not even be relevant now for the structures under consideration, what sense does this make? I would suggest, however, that it is precisely this organization that is needed to allow for the fact that when the child fails with Adjoin-α, he or she falls into a grammar that is characterized by the rule Conjoin-α. If we assume the bifurcationist structure above, repeated again below, then there is no reason in the format of the grammars themselves why the child should fall into G2, failing G1. This is represented by the arrow, but that has no formal significance. (110)
       G1
G0 <     ↓ falls into
       G2
But given the format in (109), there is such an organization. Under normal conditions, the rule Conjoin-α is always “ready to be added” to the grammar in (109). However, for the relevant RC structures it is always bled, since the rule of Adjoin-α is known to apply. Consider now what happens upon the failure of Adjoin-α — let us say on a construction-by-construction basis. To represent this we may simply cross out the operation, and the resulting grammar will look like (111b) (recall that the numbering on grammars has no significance). (111)
a. G0 --Adjoin-α (crossed out)--> G1 --Conjoin-α--> G2
b. G0 --Conjoin-α--> G2
This grammar, however, is simply the grammar in which Conjoin-α holds: exactly the default grammar that was wanted. Thus by this particular arrangement of the grammar, the retreat of a grammar into a default is accounted for formally and notationally, and not simply by fiat. Rather than crossing out the operation (Adjoin-α), we may consider a failure under conditions of computational complexity as tantamount to the reinsertion of parentheses in a format in which they have already been removed. This would be equivalent to a regression to a state which is less specified, and closer to the original Universal Grammar representation. The grammar would thus “regress” to the grammar in (112a), which is read as (112b).
(112) a. G0 ( --Adjoin-α--> G1) --Conjoin-α--> G2
      b. G0 --Conjoin-α--> G2
Namely, the default grammar, in cases of computational complexity, would be the grammar in which conjunction held. This is precisely the needed result.

To summarize: in this section, I have argued that there is a principle, the General Congruence Principle, which relates structures of operations in a grammar, and the structure of parameters themselves. These are equivalent up to isomorphism, at least for the analysis of relative clauses. Second, the child proceeds by removing parentheses in a representation. This supports the Chomsky/Fodor position with respect to learning: in this case, at least, learning is not the accretion of information, but the removal of information, representing the removal of possibilities from a universally specified set. Third, under conditions of computational complexity the child falls into a default grammar involving conjunction rather than adjunction. This however is not due to a separate parsing principle (as Tavakolian suggests), but rather due to a retreat to a grammatical format which is closer to the UG format, with all parentheses included. Finally, the nature of parameter setting is taken to involve not so much the flipping of a switch as the climbing of a hill. This represents a local maximum (the target grammar) surrounded by local minima (the default options). Both the target grammar and the fall-back grammars must be represented in UG.
3.7 What the Relation of the Grammar to the Parser Might Be

A common position in the acquisition literature has been that the grammar remains constant over time, and is hidden by the exercise of parsing strategies. That is, when the grammar/computational system fails to come up with an analysis, an exogenous parsing strategy enters in and returns an analysis not countenanced by the current grammar: the child's analysis, so to speak, “falls out of” the grammar, and returns a value which is not one of the possible permissible targets. In the following, I would like to take the position that that sort of masking of the grammar does not in fact occur: i.e., that it is not the case that the grammar remains constant and is masked by an autonomous system of parsing, production, etc., with its own separate principles. Rather, to the extent to which parsing and performance considerations matter, they do so via the
grammar itself, either directly, in the sense that possible restrictions on left-to-right computations are taken up into and stated in the grammar, or indirectly, where if the child's analysis fails, it falls into another permissible grammar, as in the discussion above. Clearly a complete argument for this position would be impossible at present, given that so little is known about the parser. What I will do instead is outline some possible relations of the parser to the grammar, with special attention to the sorts of claims that have been made in the acquisition literature. Let us imagine what a “parsing account” of ungrammaticality might be, by imagining two instances of it. The first is taken from Frazier (1979), in which she suggests that there may be a parsing ground for the that which is required with sentential subjects.

(113) a. That John loves horseradish is obvious.
      b. *John loves horseradish is obvious.

Frazier notes the following: suppose that there is a parsing principle like that of minimal attachment. Then the sentential subject without a that complementizer would, in the course of the parsing derivation, be immediately attached to the root.

(114) Parsing representation (left segment):
      [Sroot [NP John] [VP loves horseradish]]
This representation would then have to be reanalyzed at a later point, so that it was subordinated, at a later stage of the parse. Suppressing details:

(115) [S [S′ John loves horseradish] [VP is obvious]]
      (the root-attached clause reanalyzed as subordinated)
Such a reanalysis would not have to be done for the sentential subject marked with that, a subordination marker. Suppose that we assume that such a reanalysis is either: i) costly, or ii) impossible under these conditions. If the latter, then we would have the basis for a direct parsing account for the ungrammaticality in (113b). If the former, which is what Frazier assumes, then the necessity for the that-complementizer is a parsing-based necessity, but this is encoded in the grammar in a way which may not be based on a parsing vocabulary at all, for example in terms of proper government (Stowell 1981). By parsing vocabulary in the last sentence, I mean minimally an explanation which depends on left-to-rightness. Let us take a second example, a direct parsing-grammatical account of that-trace effects. This particular account is by myself, and it is intended for demonstration purposes only (though it may turn out that something like it is true, if facts like the que/qui alternation in French could be handled). Assume that the partial computations in the left-to-right parse must be grammatically well-formed, fulfilling X′ theory, proper government of null categories, etc. Assume further that null closed class heads need not be postulated until the end of selected domains of the parse, and that a limited amount of reanalysis, in terms of addition of phrasal categories, is allowed (exactly how this is done, I will leave unspecified). Consider now the that-trace effect in (116).

(116) a. Who do you believe e is here?
      b. *Who do you believe that e is here?

In (116a), the null category could be parsed as part of the matrix in a left-to-right partial computation, if we assume that null categories in argument position must be projected immediately, and that categories like the embedded IP are projected at the time that their heads are encountered.

(117) Partial Parse:
      [CP Who do [S you [VP [V believe] [NP e] ...]]]
      (e properly governed in the partial parse)
On the other hand, no such partial parse exists for the construction with that. The null category, if posited, will not be properly governed during the intermediate parse, prior to the uncovering of Infl.
(118) [CP Who do [S you [VP [V believe] [CP [C that] [IP [NP e] ...]]]]]
      (e not properly governed)
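A toy rendering of this incremental check follows; the simplifications are mine (in particular, proper government is reduced to string adjacency to believe):

```python
# During the left-to-right parse, a null subject 'e' posited in argument
# position must already be properly governed; an intervening overt
# complementizer blocks this, deriving the that-trace contrast in (116).

def null_subject_ok(prefix):
    for i, word in enumerate(prefix):
        if word == "e":
            return i > 0 and prefix[i - 1] == "believe"
    return True

print(null_subject_ok("who do you believe e".split()))       # True  (116a)
print(null_subject_ok("who do you believe that e".split()))  # False (116b)
```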
This would then constitute a parsing-grammatical explanation for that-trace effects. It would make predictions as well: e.g. that that-trace effects should not be collapsed with a general inability of extraction from subjects. As noted above, this explanation is for demonstration purposes only: what I would like to concentrate on is not the particular explanations above, but their general type. These would constitute genuine parsing-theoretic explanations of types of ungrammaticality. Frazier's explanation would restrict the set of grammars by developing constraints on the possible re-analysis of a partially parsed tree; the constraint directly above would restrict the set of grammars by stating well-formedness conditions on partially computed objects. In this latter case, these well-formedness conditions would be exactly the same conditions which characterized the full phrase marker. The claim is often made that the early linguistic system is more dependent upon or is masked by the parser, but it is not quite clear what this means. For this type, would it mean that there are more constraints of this type on the early grammar? That they are stricter? Further, it is unclear, in a terminological way, that one would want to call the above constraints “parsing” constraints. Let us call a constraint a left-linear constraint if it is a constraint on the building up of a tree (from a string), in a left-linear fashion. The two example constraints above would be instances of left-linear constraints. The parsing theorist, insofar as he or she is making claims about the parser directly determining the properties of the grammar, is making well-formedness claims about the formal, partially computed object in a left-to-right
analysis. Yet these constraints on left-linear analysis characterize the speaker as well as the hearer. But then, insofar as such constraints exist, they should simply be considered part of the grammar: i.e. a part of the grammar concerned with the well-formedness of certain subtrees. That is, the grammatical theory should be expanded to consider these as part of the grammar: they would be part of some future grammar, they would be left-linear constraints in such a grammar, and would not have a different ontological character than simple grammatical constraints. The following sorts of relations seem possible (focussing on the “failure” of the grammatical system in acquisition).

(119)
Role of Parser

Parsing considerations determine grammar (at least partly):
  Directly:   Left-Linear Constraints (Frazier)
  Indirectly: Fall into less mature grammar (this book)

Parsing considerations do not determine grammar:
  Parser returns value not in grammar: Grammar Masked
  Correct value returned; unavailable: Ranking Hypotheses (Hamburger and Crain)
The parsing theory implicitly adopted in this chapter, and in the work as a whole, is that parsing considerations indirectly determine the grammar, in the sense that computational difficulties cause the system as a whole to fall into a less mature system, where this system is both grammatically prior and computationally simpler. That is, it is not so much that the parser determines the form of the grammar, but that the parser (i.e. computational considerations) partly determines what grammar one is in, out of a sequence of successive grammars in acquisition. In this particular addendum, I have suggested there may also be ways by which the linking is more direct, to the extent to which left-linear constraints are directly stated in the grammar (see Weinberg 1988 for such a view) — unfortunately there have been very few grammatical-parsing theories of this left-linear type, so it is difficult to gauge their range of application: see Marcus, Hindle, and Fleck (1983) for an exception. In general, I should tend to favor either of these two sorts of approaches on the left, which may be broadly
distinguished from those which, when confronted with a nonadult structure in acquisition, hold that the grammar is fully adult, but that it is not reached by the parser: i.e. that the parser returns a value not in its permissible range. Rather, it seems preferable to assume that the grammar is organized in such a way that when the child is confronted with a structure too (computationally) difficult for him or her to analyze, the grammar/computational system falls back as a unit to a grammar/computational system in which the child can analyze the string and return a permissible value in the less advanced system, even if some elements in the string must be ignored (and part of the meaning may be ignored or erroneously construed). This would be the case if the following were true.

(120) Property of Smooth Degradation
      The child's analysis degrades smoothly when faced with a not fully understood input.

(121) Principle of Representability
      All analyses by the child are generated by the child's grammar.

These two assumptions may appear to be obvious, but their mutual adoption has, it seems to me, far-reaching effects in the grammar. One would expect the property of smooth degradation to hold of any truly robust learning system. When a failure occurs in the analysis of the input, it ensures that the child has some sort of analysis of an incoming string, and thus ensures that when the child hears a sentence that cannot be completely analyzed, a partial analysis will still be able to be given, so that i) the meaning can be partially recovered, and ii) the elements which are not understood can be isolated. The Property of Smooth Degradation distinguishes, I believe, the particular sort of “failure” that one finds in intermediate stages of child language, from those which occur in default due to injury or stroke, i.e. various types of aphasia, and of course from other sorts of simple, non-redundant input systems like radio receivers. The Principle of Representability requires the grammar to be “reached” by the parser at all times: all values in the parser's range are in the grammar. How should the Property of Smooth Degradation be modeled (if it in fact is the case, as it seems to be)? It suggests that there is a sort of redundancy in the system. However, this redundancy is not the formal redundancy of identical elements, nor the overlaying of constraints (see Chomsky's (1980) comments on the Tensed Sentence Condition and the Specified Subject Condition), but the overlaying of a more articulated and richer system over one which is less so. Part of the way in which this could be accomplished would be by having recourse to operation/default organization within the grammar; another would be to have two (or more) systems operating in distinct vocabularies over the same
input string: in particular, the Case and theta system (perhaps several systems within Case); see Chapter 4. With respect to vocabulary, the redundancy is a functional redundancy, not a formal redundancy: the two systems are distinct in their primitives. But then by the Principle of Representability, the partial analysis by the child must itself be represented in the grammar. That is, the child may be viewed as passing through a sequence of distinct and gradually enriching grammars, the simpler grammars acting as a backup for the more complex ones. Many of the traditional findings in the psycholinguistic literature may be viewed in precisely this way: not as the exercise of autonomous parsing principles, but as the falling back into an earlier grammatical analysis. For example, a traditional finding due to Bever (1970) is that children initially misanalyze passives, when they do, as equivalent to the corresponding active form.

(122) John was V-ed by Mary.
      Child's interpretation: John V-ed Mary. (active)

Bever interpreted this as involving the exercise of an autonomous parsing strategy: namely, the child tries to fit the structure NP-V-NP over the input string, where the first NP is an agent, and the second, a patient. Yet this same finding may be viewed not as implicating a parsing strategy, but as the fitting of the direct lexical form of the verb, or its form after Project-α, to the input. As such, part of the string would be (mis-)analyzed, and the rest would be ignored. That is, instead of viewing the misanalysis as due to the intercession of a separate system, one may view it as the retreat to a former system — respecting the Principle of Representability above. Let me go into somewhat more detail about what the indirect approach in (119) above would be for relative clauses. Let us suppose that, with respect to a given construction type, e.g. relative clauses, the child successively adopts over the course of development one of three analyses: i) the first, in which relative clauses are entirely filtered out, ii) the second, in which there is high attachment (Tavakolian 1978), and iii) the third, in which the relative clause is correctly adjoined to the head, and construed with it. The second of these analyses corresponds roughly to what is a co-relative construction in the world's languages; the first of these corresponds to a grammar in which the phrase marker is a pure representation of the argument-of relation. We may list the successive grammars below.

(123) Grammar for Relative Clause
      G1 ———— RC not attached
      G2 ———— High Attachment
      G3 ———— Regular Attachment
Suppose now that we reach a particular situation in acquisition. Namely, a child sometimes chooses a high attachment for the relative clause, and sometimes chooses the correct NP-S′ or Det-N′-S′ analysis. One possibility is that the child's grammar picks out the right analysis, but the child's parser incorrectly returns a different analysis. That is, the parser returns a value — an erroneous phrase marker — which is not one of the permissible values in its range, the set of structures countenanced by the grammar. The parser masks the grammar. The indirect possibility is the following. At a given time, the entire computational device, the grammar/parser, is at a certain stage of development. There is a particular grammar that the device is located at, G3. Together with these grammars are paired analyses sanctioned by them.

(124) Time   Grammar   Analysis
      t1     G1        A1
      t2     G2        A2
      t3     G3        A3
The claim is the following. If the grammar at a particular point fails, then it is not masked, but retreats back to the analysis associated most directly with some previous grammar. Thus at time t3, the child is normally at grammar G3, which is associated with analysis A3. However, given a particularly difficult sentence, the child may fall back to analysis A2, associated with grammar G2, or possibly even A1, though the last would be unlikely. The situation in which the child is sometimes returning values of A2 (high attachment), and sometimes those of A3 (the correct analysis), corresponds to the time in which a child may vary in his or her analysis, according to other factors (computational load, pragmatic considerations, etc.) However, the child never “falls out of” a grammar specified in UG. That is, the parser never returns a value which is not countenanced by any grammar that the child has ever adopted. This means that even the “mistakes” of the child are fully subject to grammatical analysis: they show, in fact, the geological layering of the grammar. This type of analysis has an additional prediction to make. Namely, the simpler grammars that the child falls into must also be computationally simpler. For it would do the child little good to fall into a grammatically simpler system, if the system were computationally more complex. The full position should therefore be the following, where G1 is simpler than G2, is simpler than G3 on grammatical grounds, and P1 is simpler than P2, is simpler than P3 in terms of parsing operations.
(125) Time   Grammars   Analyses   Parsing Operations
      t1     G1         A1         P1
      t2     G2         A2         P2
      t3     G3         A3         P3
      where G1 < G2 < G3, P1 < P2 < P3, and < denotes degree of simplicity, along some metric.

I have argued in this work so far precisely for such a grammatical difference in simplicity. The grammar itself has, for any particular operation, both a place with respect to a lexical sequence of values (the values for the closed class elements, choosing operations), and in the operational sequence. Along the latter of these, a very simple sequencing exists: remove external parentheses. The latter grammars are more advanced than the former in that a larger number of external parentheses have been removed (“the mind employs the art of the sculptor”: at least here). To retreat to an earlier grammar, all that is necessary is to reinsert the most external parenthesis.
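This indirect, fall-back-as-a-unit organization can be sketched computationally. The sequence encoding and the success predicate below are illustrative assumptions of mine:

```python
# The developmental sequence (125): (grammar, analysis) pairs ordered by
# simplicity. On failure the system retreats pair by pair, so any value
# returned is sanctioned by some grammar in the sequence (the Principle
# of Representability), and degradation is smooth.

SEQUENCE = [
    ("G1", "RC not attached"),
    ("G2", "high attachment"),
    ("G3", "regular attachment"),
]

def analyze(current, succeeds):
    """Walk back from the current grammar until some analysis goes through.
    `succeeds` maps a grammar to whether its analysis succeeds on this
    input (standing in for computational load, pragmatics, etc.)."""
    for grammar, analysis in reversed(SEQUENCE[:current + 1]):
        if succeeds.get(grammar, True):
            return grammar, analysis
    return SEQUENCE[0]    # G1 always available: smooth degradation

# A child located at G3, facing a sentence too complex for Adjoin-alpha:
print(analyze(2, {"G3": False}))    # ('G2', 'high attachment')
```

On this sketch the parser never returns a value outside the range of some grammar the child has adopted; the “mistakes” are the geological layering of the grammar made visible.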
Chapter 4
Agreement and Merger

The organization of operations that I will assume is within the general framework of Government-Binding theory, but extended to include certain composition operations. While composition operations will be used, the primitives are those of GB theory, including Case theory, theta theory, Move-α, and so on. Moreover, the composition is not strictly bottom-up. This chapter will introduce the relevant notions. There are three basic questions that I wish to focus on:

I.   What is the primitive vocabulary of the grammar (NP, VP, …; agent, patient, …; nominative, accusative, …; subject, object, …)?
II.  What is the set of primitive operations?
III. How do the distinct vocabularies (Case theory, theta theory, etc.) enter into the description of the phrase marker? What is the organization of rules or operations in the grammar?

These questions are intended to be answered in such a way as to guarantee the following:

IV.  That finiteness is a necessary part of the grammar (see Chomsky 1981, and Chapter 1),
V.   That differences in vocabulary type are modelled in an adequate way in the grammar — in particular, that the grammar makes the same sensible cuts as the vocabulary types themselves do, by means of its organization (e.g. Case theory vs. theta theory; open class vs. closed class elements),
VI.  That the acquisition sequence bears a congruence relation to the structure of the grammar (Chapters 2 and 3).

Each of the questions I–III may be looked at in either of two aspects: with respect to Universal Grammar, or with respect to the mapping in the derivation itself. It is a thesis of this work that there exists a deep isomorphism between the two: i.e. that the structure of choices in UG is isomorphic to the structure of operations in the derivation.
4.1 The Complement of Operations

I take there to be two basic processes in the grammar, the second divided into two subparts. These are the following:

(1) Assignment of features (theta assignment)
(2) Copying of features
    a. Unidirectionally (Case government)
    b. Bidirectionally (Agreement)
The unidirectional copying of features is government. Government takes place under strict sisterhood. The canonical relation of this type would be Case government, where a verb or preposition copies abstract Case features to its right or to its left.

(3)
     hit --accusative--> Bill

or

(4)  Input:  hit (+acc. feature)  Bill
     Output: hit  Bill (+acc. feature)
The examples in (3) and (4) show two different ways of conceptualizing the operation. The logic is clearer in (4), where an actual feature, +accusative Case, is transferred from the head to the Case-governed element. There is also the bidirectional copying of features (not the same features). This is an instance of agreement. Agreement also takes place under strict sisterhood. The canonical case of agreement is agreement under predication for the subject-predicate relation (Williams 1980). (5)
Before predication:
     NP: a features, d features        VP: b features, e features
After predication:
     NP: a features, d features, b features
     VP: b features, e features, a features
The mutual copying of features is shown in (5). The relevant categories, NP and VP, each have features associated with them (the labelling a, b, d, etc. is simply conventional: the labels have no significance). In (5), each category is associated with a set of features: the features which will ultimately be copied from them (a features for NP, b features for VP), and a residue (d features for NP, e features for VP). After the mutual copying operation has taken place, NP and VP share certain features (a and b features), and do not share others (d and e features). In the theory of Williams (1980), predication involves the copying of an index from the NP onto the VP. According to the discussion here, this cannot be the case, since predication is actually an agreement relation, agreement between an NP subject and a VP, and such relations are bidirectional, not unidirectional. Rather, predication involves copying the number of the subject NP onto the VP (where it percolates down to the head), and the copying of the external theta role associated with the VP onto the NP subject. (6)
Predication and Agreement:
     NP --number--> VP
     NP <--theta role-- VP
This is then a typical instance of an agreement relation, with information passing in both directions. Note that if agreement is essentially a bi-directional operation, the general reduction of the Subject-Infl-Predicate relation into one of government of the Subject by Infl is erroneous. Rather, this is an instance of agreement, a symmetrical operation, unlike government. This would then constitute a distinct primitive relation in the grammar. Along with the unidirectional (Case govt.) and bi-directional (agreement) copying of features, there is another operation, which I have called feature assignment. A better name might be feature sanctioning or licensing. This is different than the copying of features, because in the latter case the features actually do originate with the head, and once they are copied the head no longer retains them. With the assignment or sanctioning of features, there is no copying, but simple licensing in a configuration. Thus I take (7) to be an instance of feature licensing, but not (8). (7)
hit  Bill
     (patient)
(8) *Input:  hit (+patient)  Bill
     Output: hit  Bill (+patient)
Of (7) and (8), the first of these, (7), is more accurate. This is because there is no time prior to the licensing of the theta role: it is not ordered in the derivation; at every point the theta role is already assigned. Feature assignment or licensing in this sense is not a temporal operation, as feature copying is, but rather a continuous process. The configuration in (7) is continuously licensed in the course of a derivation. Since there is no copying of information from the head to the complement, feature licensing may continually apply. It is for this reason that the Projection Principle holds. Namely, this relation is not of a copying type, and such relations may take place continuously or constantly over the course of a derivation. To summarize the foregoing, I assume the following operations.

(9)  Operation Type     Example      Structural Cond.   Informat. Flow
     Feature lic.       theta        sisterhood         —— (continuous)
     Feature copy
       a) Unidirect.    Case ass.    sisterhood         head to compl. (once)
       b) Bidirect.     subj-pred.   sisterhood         bi-directional (once)
We have, then, different types of information flow: feature assignment vs. two types of feature copying (unidirectional and bidirectional). Following from this, there is a difference in type or mode of application. If a process involves a transfer of a piece of information (feature-copying), then it must take place at a single time in the derivation. If it involves what I have called feature assignment or licensing, then it may apply throughout. Note that this use of the terms feature assignment or feature licensing is more restrictive than the usual sense, which would include such things as Case assignment (which I am calling an instance of feature copying). For feature assignment or licensing, but not feature copying, there may be constancy principles holding, such as the Projection Principle. Finally, these different types of operations are associated with different vocabularies. Feature assignment or licensing (in this restrictive sense) involves theta roles, unidirectional feature copying is exemplified by case assignment, and bidirectional feature copying is associated with agreement. In the NP-VP case,
this involves the copying of the number onto the VP from the subject, and the copying of the VP-associated theta role onto the subject. All the processes so far discussed have been assumed to apply under strict sisterhood. There are further distinctions which might be made: for example, it may be that Case assignment requires adjacency as well, while theta assignment does not. This would be the case if the Prepositional object to Mary in cases like (10) were assigned a theta role by the verb — I will assume that it is. (10)
John gave a book to Mary.
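The typology in (9) can be made concrete with a short sketch; the dictionary encoding and function names below are illustrative assumptions of mine:

```python
# Feature copying transfers a feature exactly once (the source loses it),
# optionally in both directions (agreement); feature licensing moves
# nothing, so it can be checked continuously throughout the derivation.

def copy_feature(source, target, feat, bidirectional=False, back_feat=None):
    """One-time transfer: the source no longer bears the copied feature."""
    target[feat] = source.pop(feat)
    if bidirectional:                  # e.g. subject/predicate agreement
        source[back_feat] = target.pop(back_feat)

def licensed(head, sister, feat):
    """Continuous check: the configuration sanctions the feature in place."""
    return feat in sister and sister["sister_of"] == head["name"]

verb = {"name": "hit", "acc": True}
obj  = {"name": "Bill", "sister_of": "hit", "patient": True}

copy_feature(verb, obj, "acc")          # unidirectional: Case government (4)
print("acc" in verb, obj["acc"])        # False True -- feature transferred

subj = {"name": "NP", "number": "sg"}
pred = {"name": "VP", "theta": "agent"}
copy_feature(subj, pred, "number",      # bidirectional: predication (6)
             bidirectional=True, back_feat="theta")
print(subj["theta"], pred["number"])    # agent sg

print(licensed(verb, obj, "patient"))   # True, and true at every point (7)
```

This is one way of seeing why constancy principles such as the Projection Principle can hold of licensing but not of copying: a check that moves no information can be reapplied at every step.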
4.2 Agreement

I would now like to introduce a second type of Case assignment. After earlier work (Lebeaux 1987), I will call this phrase structural Case. I will assume that, like theta assignment, phrase structural Case assignment takes place throughout the derivation. It is closest to the operation Assign GF in Chomsky (1981), but also may be spelled out as a particular case in the case system. Phrase structural Case, unlike structural case, is dependent on mother-daughter relations, not head-sister relations (Lebeaux 1987). In Lebeaux (1987), I argue that the first instances of Case assignment to the subject position by the child are actually instances of the assignment of phrase-structural Case, not structural case (see Chomsky 1981: structural case is sister-assigned Case). Thus in examples like (11), phrase structural Case is assigned to the subject position.

(11)
My/me did it.
I will assume that there are three major places in which phrase structural case is assigned in adult English. (12)
a. subject of S: (NP, S)
b. subject of NP: (NP, NP)
c. topic position: (NP, S″)
If we consider the topic position to be the subject of the utterance, or perhaps the subject of (what used to be known as) S′ or S″, then its appearance would be regularized to the other two. Note that phrase structural Case, like structural case, may take different forms. The subject of NP is marked genitive, while the topic position is marked accusative.
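A minimal sketch of phrase structural Case as a continuous, configuration-based check follows. The table of forms is partly an assumption: the text fixes genitive for (12b) and accusative for (12c); nominative for (12a) is my own filling-in for adult English:

```python
# Phrase structural Case: licensed under the mother-daughter relation,
# optional, and checkable at any point in the derivation (no copying).

PS_CASE = {("NP", "S"):   "nominative",   # subject of S (assumed form)
           ("NP", "NP"):  "genitive",     # subject of NP
           ("NP", "S''"): "accusative"}   # topic position

def ps_case(daughter_cat, mother_cat):
    """Returns None when no licensing configuration obtains."""
    return PS_CASE.get((daughter_cat, mother_cat))

print(ps_case("NP", "NP"))   # genitive
print(ps_case("NP", "VP"))   # None -- no phrase structural Case here
```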
Note as well that all of these positions are either islands ((b) and (c)), or partial islands ((a)), from the point of view of extraction. Phrase structural Case assignment differs from simple structural case assignment in a few central ways. First, unlike structural case assignment, it is assigned optionally. Thus the subject position of an NP need not be assigned any case: e.g. if no lexical element is in that position. Second, it would be assigned under the mother-daughter relation. It would not be an instance of feature copying, since this would require that information be copied from a mother node to one of its daughters. Rather, it would be an instance of feature assignment or feature licensing, which has the technical restricted sense given to it above. This fact has a further consequence: phrase structural case may apply several times throughout the derivation. This follows from the fact that the assignment of such case does not involve the transfer of information from one node to another, but rather the scanning of a tree to see if the structural condition has been met. In this sense — as in many others — it is similar to the Assign GF relation in Chomsky (1981), the relation from which function chains are formed (and which must apply throughout the derivation). The full set of operations, then, is the following.

(13) Operation Type      Example      Structural Cond.    Informat. flow     Application
     Feature licensing
       a. theta                       sisterhood          head to sister     continuous
       b. PS case                     mother-daughter     mother to daugh.   continuous
     Feature copying
       a. Unidirect.     struct. c.   sisterhood          head to sister     single time
       b. Bidirect.      agreement    sisterhood          both directions    single time
Thus there are two continuous processes, theta assignment and phrase-structural Case assignment, and two one-time processes, the assignment of structural case, and agreement. How are the two operations known or postulated to exist in the grammar, Move-α and Adjoin-α, incorporated into this scheme? Move-α moves an NP into an A or A′ position: the subject position of S, or the Spec C′ position of C″. The latter movement may be thought of as movement into the subject position of C″. Thus both types of movement are movement into a subject position, broadly construed (Pustejovsky 1984). If we conceive of movement in this way, then movement itself may be conceived of as a sort of by-product of a more primitive necessity. That necessity would be to saturate the +/−wh feature
in the case of wh-movement, and to saturate Infl in the case of NP-movement. These both would seem to fall under the rubric of agreement: i.e. the necessity for agreement would initiate NP movement. The table above suggests that movement of both types may be considered a result of, or more exactly, to stand in a 1-to-1 relation with, feature satisfaction. Let us adopt the following terminology: an operation O is initiated by a feature F iff the satisfaction of F requires that O take place. This would work in the obvious way for structures like (14). Given an input structure like that in (14), the satisfaction of the closed class RC linker would initiate the relative clause adjoining operation.

(14)
S1: [S [NP The man] [VP [V saw] [NP the woman]]]
S2: [S′ [Comp who] [S I knew]]
The two operations, Saturate-RC-Linker and Adjoin-α, are therefore in a 1-to-1 relationship; the necessity for saturation involves or initiates the adjoining operation. Similarly, wh-movement may be considered to be in a 1-to-1 relationship with the satisfaction of the +/−wh feature in the Comp of the clause in which the wh-element finally appears. Note that this differentiates the final target of wh-movement from any of its intermediate landing sites. The situation here is therefore somewhat more complex than that with the Adjoin-α operation, because the wh-element may move several times in the derivation, with only the last movement, into the Spec C′, satisfying the +/−wh feature in Comp. We might assume that the full set of movements is initiated once the ultimate Comp feature requires satisfaction, or alternatively, that the intermediate movements are free, and only the last movement is regulated by the necessity for feature satisfaction. I leave this open.
If we adopt such a solution, then instead of thinking of a derivation as being composed of primitive operations (Move-α, Adjoin-α), we may consider it to consist of the specifications of closed class elements, which must be satisfied. For the operations above this would be the following: (15)
Move-NP    —   satisfy Infl/Agr
Move-wh    —   satisfy +/−wh feature
Adjoin-α   —   satisfy RC linker
This provides for an interestingly different way of conceiving of the operations of the derivation: they are equivalent to the satisfaction of the specifications of certain closed class elements. This would satisfy the finiteness characteristic noted in Chapter 1 (and in Chomsky 1981). While there would be ordering relations among the satisfaction requirements in a particular derivation, there would be nothing in the scheme in (15) to require that actual levels be picked out. Move-NP and Move-wh would therefore be differentiated in the following way: not by the specification of the movement rule itself (operationally), but by the satisfaction of the differing closed class elements, or equivalently, by the differing agreement relations which take place. There are two operations: Agree: Subj./Pred. and Agree: Spec C′/C. The first of these applies to the structure in (16).

(16)
[S [NP e] was [VP [V hit] [NP John]]]   →   [S [NP John] was [VP [V hit] [NP e]]]
Then it is actually the Subject/Predicate operation itself which forces movement. Similarly, wh-movement may apply in a 1-to-1 correspondence with the relation which satisfies the +/−wh-feature in Comp. Call this Spec C′/C agreement. (17)
[S′ [SpecC′ e] [C′ [C e] [S [NP John] [VP saw who]]]]   →   [S′ [SpecC′ who] [C′ [C did] [S John see e?]]]
Since this is another sort of agreement relation, agreement actually underlies both wh-movement and NP-movement. The two operations, then, are the following:

(18)
Agree Subject/Predicate   ↔   Move NP
Agree Spec C′/C           ↔   Move wh
These operations initiate movement. By their application, wh-movement and NP-movement take place. The third operation which has been introduced so far is the adjunction of the relative clause into the NP. So far, I have suggested that this involves saturation of the relative clause linker. This by itself would make the operation of Adjoin-α a unidirectional operation. However, as Chomsky (1982) notes, the relation of the relative clause to the head is also one of Predication. If this is correct, then relative clause formation (adjunction) is also an instance of a bidirectional operation: the head N′ satisfies the relative clause linker, but at the same time the relative clause itself is predicated of the head N′. This would mean that relative clause adjunction is also an instance of agreement.

(19)
Agree Subject/Predicate      ↔   Move NP
Agree Spec C′/C              ↔   Move wh
Agree Rel head/relativizer   ↔   Adjoin-α
This is shown above. In fact, this would mean that all of the operations which radically change the structure in any way (both forms of movement, relative clause adjunction) are initiated by the action of agreement. Other types of information relations, theta assignment and structural Case assignment for example, do not radically change the structure of the tree, or the position of elements in it. The burden of this section, then, has been to introduce a new primitive operation into the grammar: agreement. Unlike Case assignment, agreement is intrinsically bidirectional. It should not be reduced to some other primitive operation (e.g. government). Agreement has two other attributes as well. First, it can always be put in a 1-to-1 correspondence with the satisfaction of a closed class element, as in (19) above. This guarantees finiteness. Second, it is agreement itself (so far) which composes substructures into a whole.
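Purely as an illustration of the 1-to-1 correspondences just discussed, the following Python sketch treats a derivation as driven by unsatisfied closed class elements. The function names and data format are invented for the sketch; the point is only that each unsatisfied element initiates exactly one operation, so a finite closed class inventory yields a finite set of initiated operations.

# Sketch: operations as the satisfaction of closed class specifications,
# following (19). Each unsatisfied element initiates exactly one operation.

def move_np(d):   return d + ["Move-NP"]
def move_wh(d):   return d + ["Move-wh"]
def adjoin_rc(d): return d + ["Adjoin-alpha"]

INITIATES = {   # agreement relation (closed class element) -> operation
    "Subject/Predicate (Infl/Agr)":     move_np,
    "Spec C'/C (+/-wh feature)":        move_wh,
    "Rel head/relativizer (RC linker)": adjoin_rc,
}

def derive(derivation, unsatisfied):
    """Apply one operation per unsatisfied closed class element (1-to-1)."""
    for element in unsatisfied:
        derivation = INITIATES[element](derivation)
    return derivation

print(derive([], ["Spec C'/C (+/-wh feature)"]))   # -> ['Move-wh']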
4.3 Merger, or Project-α

4.3.1 Relation to Psycholinguistic Evidence
Let us look at another sort of phenomenon, apparent in acquisition. It is a commonplace in the acquisition literature that children in the earliest stages of language just use open class elements in production. This is the famous stage of telegraphic speech, where the closed class morphemes have dropped out (Brown 1973). The phenomenon of telegraphic speech is well known to every linguist, as well as every parent, yet very little has been made of it in the literature. Nor has much been made of the closed class/open class distinction as a significant demarcation in adult speech. Indeed, in Aspects (Chomsky 1965), as well as in recent work by Joseph Emonds (Emonds 1985), some attempt has been made to model the open class/closed class distinction in a derivational way. The proposal in Chomsky (1965) was to allow for late S-structure insertion of closed class elements; a similar proposal is made in Emonds (1985). These two have perhaps been the main line proposals in the syntactic literature, but neither has been substantively followed up. Yet the existence of telegraphic (i.e. open class) speech by children suggests that the open class representation would have considerable significance in development; the General Congruence Principle would direct that it have repercussions on representations in adult speech as well. Examples of telegraphic speech are given below. (20)
see ball
here Mommy
want orange juice
make castle
etc.
In spite of the paucity of proposals about the open class/closed class distinction in syntactic theory proper, this lack of attention does not seem to be principled. One reason for the lack of interest has to do with the fact that closed class morphemes do not belong to a single category (like NP), but rather to any of a number of types. There are Determiners (the), auxiliary verbs (may), inflectional elements (to), prepositions (to, of), and nouns (him). Since the majority of generalizations in linguistic theory are stated in terms of category types (e.g. lexical NPs need Case), the lack of a coherent categorization for closed class elements has perhaps drawn investigation away from this class. A second reason bears more directly on acquisition. While in general GB theorists have shown a useful suspicion of functionalist proposals, within the
realm of telegraphic speech such proposals have reigned supreme, without substantive criticism. The functionalist proposal for the absence of closed class elements in early speech would be simply the following: the child has limited memory and computational resources in early stages. Given such a limitation, morphemes are at a premium. And since open class morphemes are information-rich compared to closed class morphemes, it is hardly surprising that the child has recourse to the former rather than the latter. There are, however, a number of difficulties with this functionalist account. First, even given limited resources, one would expect that the closed class morphemes would appear sometime, if the child had command of them. The fact that they do not appear at all, in this early stage, suggests that their exclusion is principled, not simply functional in character. More exactly, while there may be a functional reason why closed class morphemes are not generally used, it is reasonable to believe that this functionalist reason has been “grammaticalized”: i.e. realized in the grammar in a principled and meaningful way. Otherwise, one would expect occasional outcroppings of closed class elements even in the earliest stages, something which does not occur (except for pronouns). Evidence from quite a different area suggests as well that there are real differences in the adult computational system in the handling of open class and closed class elements. I am thinking here of a quite complex paper by Garrett (1975). Garrett analyzed a large corpus of speech errors, gathered by Shattuck-Hufnagel and himself, the so-called MIT Corpus (approximately 3400 errors). Exchange errors fell into two basic types: those which occurred between independent words, and those which occurred in what Garrett calls “combined forms”, essentially involving the stranding of bound affixes, as the free morphemes were interchanged.
(21) Independent form exchanges (examples):
     a. I broke a dinghy in the stay yesterday.
     b. I’ve got to go home and give my bath a hot back.

(22) Combined form exchanges (examples):
     a. McGovern favors pushing busters.
     b. It just sounded to start.
     c. Oh, that’s just a back trucking out.
     (exchanged elements underlined)
The independent forms and the combined forms apparently operate differently in exchanges: in particular, the former, but not the latter, obey form class (i.e. syntactic category), according to Garrett, and this constraint is stronger in
between-clause exchanges. He thus suggests that there are two independent levels of syntactic processing (see also Lapointe 1985, for discussion):

a. Exchanged words that are (relatively) widely separated in the intended output or that are members of distinct surface clauses will serve similar roles in the sentence structures underlying the intended utterance, and, in particular, will be of the same form class. These exchange errors represent interactions of elements at a level of processing for which functional relations are the determinant of “computational simultaneity” …

b. Exchanged elements that are (relatively) near to each other and which violate form class represent interactions at a level of processing for which the serial order of an intended utterance is the determinant of computational simultaneity …

(from Garrett 1975)
Yet more interesting, from the point of view of the theory advocated here, is the following comment on the stranding of closed class morphemes (Garrett 1975):

The errors we have been referring to as “combined form” exchanges are errors of a rather remarkable sort. They might, as a matter of fact, have been more aptly described as “morpheme stranding” errors, for not only are the permuted elements nearly always free forms, but the elements left behind are as often bound morphemes …

(30) … I’m not in the read for mooding
(31) … he made a lot of money intelephoning stalls
(32) She’s already trunked two packs.

… Why should the presence of a syntactically active bound morpheme be associated with an error at the level described in (b)? Precisely because the attachment of a syntactic morpheme to a particular lexical item reflects a mapping from the “functional” level to the “positional” level of sentence planning …
It is examples like those in (30)–(32) that lead Garrett to propose that syntactic production is divided into two levels, a functional level and a positional level, and that the former is mapped into the latter. This whole line of research might be sanitized from the point of view of linguistics by assuming that what it really pertains to is the theory of the language producer and the language acquirer (in the case of telegraphic speech). This would then require that these be part of a separate theory, of unclear extent, which does not have to have any correspondence with syntactic theory per se. Let us nonetheless not take such a position, and instead seek to integrate the Garrett and Shattuck-Hufnagel proposals, and the phenomenon of telegraphic speech, within linguistic theory proper. This is exactly the position which was taken in the last chapter
with regard to relative clauses, where it was argued that the high attachment of RCs noted by Tavakolian (1978) was not a separate parsing principle, but an instantiation of a possibility open in UG: namely, of having a co-relative construction. By taking such a position, a sort of synthesis was achieved between the (now standard) parameter setting approach, and the thesis of this work, that real development takes place. I will concentrate here more on the telegraphic speech, with the Garrett data forming a sort of backdrop. There is a final observation which suggests that the stage of development of telegraphic speech is an organized stage, and that it should be taken account of in adult speech as well. This is the simple, but meaningful observation that adults, as well as children, can speak telegraphic speech. If we viewed such speech as simply the direct result of a computational deficit by the child, we would expect that adults would no longer be able to produce such speech, at least insofar as this would require mimicking a computational deficit that the adult no longer had. Given the fact that adults can speak telegraphically, there is a strong implication, though of course no sure proof, that telegraphic speech is an actual subgrammar of the full grammar, and that adults using such speech are gaining access to that subgrammar. This in turn would be very much in line with the General Congruence Principle, which suggests that the acquisitional stage exists in the adult grammar in something like the same sense that a particular geological layer may underlie a landscape: it therefore may be accessed.

4.3.2 Reduced Structures
But what would this subgrammar look like? It was noted above that the open class/closed class distinction had been mentioned, and partly modelled, in such early works as Aspects (1965), where it was assumed that closed class elements were a late spell-out of certain types of information. The Garrett and Shattuck-Hufnagel data, however, suggest that something like the opposite ordering holds: namely, that there exists a grid or template of closed class elements, and the open class elements are projected into it. This is perhaps counter-intuitive from the point of view of actual speech production, yet the logic of the grammar supports it. It is also in line with certain conclusions that were reached in Chapter 1. A constraint was suggested there, the Fixed Specifier Constraint, which would bar the independent movement of closed class specifiers (i.e. unless they were part of a whole constituent which was moved). This was made necessary by the fact that under conditions of extensive movement, there must remain certain stable elements, the grid around which the others are moved, for the child to be able to induce a grammar at all. The closed class specifier
elements seemed to be just such a set. (Note that this form of the proposal, while based in part on considerations that Garrett raises, differs in content.) Let us adopt the same conceptual device that was used in the analysis of relative clauses earlier. In that chapter a full sentence was put through a “conceptual filter”, which filtered adjuncts out of the representation. This created the argument skeleton on the one hand (the rooted structure which was a pure representation of the argument-of relation), and a set of adjuncts which would later be added into the representation. If we adopt the same device here, we get a reduction of a full sentence, together with a set of closed class elements. Let us ignore the latter set for now, concentrating on the reduction itself. The term reduction here is used with a rather different meaning than that in Bloom (1970).
(23) I saw the ball.
     reduction: see ball

(24) Mommy left the room.
     reduction: Mommy leave room

(25) I put the ball on the table.
     reduction: put ball (on) table
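Descriptively, the reductions in (23)–(25) amount to filtering the closed class items out of the adult string. The Python sketch below does just that; its closed class list is a toy inventory assumed for illustration, and note that the filter does not strip bound morphology, so it returns "saw ball" where the child's reduction has "see ball".

# Sketch: telegraphic "reduction" as a filter on closed class items.
# The closed class list is a toy inventory assumed for this sketch.

CLOSED_CLASS = {"i", "the", "a", "an", "on", "of", "to", "did", "was"}

def reduce_utterance(sentence):
    """Drop closed class morphemes; keep the open class residue."""
    return " ".join(w for w in sentence.split() if w.lower() not in CLOSED_CLASS)

assert reduce_utterance("I saw the ball") == "saw ball"
assert reduce_utterance("I put the ball on the table") == "put ball table"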
In fact, what representations like (23)–(25) show us is that we had not gone far enough in Chapter 3 in attempting to isolate a pure representation of argument structure. The reductions in (23)–(25) are a purer isolate yet. And, if the General Congruence Principle is to hold, it must be the case that these reductions are not simply spoken by the child, but underlie adult speech as well. The term reduction to describe (23)–(25) is intended purely descriptively. Assuming that the reduction in (23)–(25) is what the child would say if he or she wished to express the full meaning directly above it, we will call the child’s utterance a reduction of the full phrase marker. This still leaves undetermined what the nature of this reduction is. There are three central possibilities:

I.   that the reduced phrase marker is directly generated as such by the child’s grammar,
II.  that there is a reduction transformation of some sort from a fuller structure (where this reduction may be done by the parser rather than the grammar), and
III. that the actual phrase marker is relatively more developed even at SS, and null elements fill the determiner and other closed class positions.1

1. This possibility was suggested to me by Joung-Ran Kim.
Of these three possibilities, I wish to adopt the first, and to some degree the third. This has a further effect. Given the General Congruence Principle, such a reduced mode of representation must also underlie the more complete adult representation. That is, telegraphic speech is actually generated by a subgrammar of the adult grammar, a modular and unified subgrammar, and this enters into the full phrase marker. The three logical possibilities underlying the child’s reduction of adult speech are the following:
(26) Simple reduced phrase marker:
     [V [V see] [N(theme) ball]]

(27) Deletion account:
     Deep Structure:    [S [NP I] [I′ [Tns e] [VP [V see] [NP [Det the] [N ball]]]]]
     Surface Structure: [VP [V see] [NP or N: ball]]

(28) Null lexical items:
     [S [NP e] [I′ [Tns e] [VP [V see] [NP [Det e] [N ball]]]]]
     (Deep Structure and Surface Structure)

The arguments for the first proposal over the deletion transformation account are conceptual in nature. Suppose that we assume that there is some measure of complexity of a phrase marker. This would be a function of the complexity of the tree, the licensing relations in it, and so on, and would no doubt differ to some degree from production to comprehension. It would be natural, given such an analysis, to suppose that the phrase marker itself (though not the universal principles underlying it) “complicates itself” over time, in the sense that the complexity with respect to that metric increases. That is, the analyses allowed by the grammar become more complex over time, though the universal principles do not. This was the case with the analysis of relative clauses earlier, where the UG information was, in fact, lessened over time (as parentheses were removed), while the analysis itself was to some degree made more complex. Given such an assumption, there is something extremely odd about the deletion account, Proposal II. Such an account requires that there be an original full representation, together with an operation, a reduction transformation, which operates on it. The child grammar (or system) and the adult grammar generating the syntactic representation underlying “see the ball” would therefore be the following:

(29)
Adult grammar:  rules underlying full phrase marker
Child grammar:  (i)  rules underlying full phrase marker
                (ii) reduction transformation or operation
But this is surely odd, if the grammar has any sort of computational reflex at all. The child’s grammar here contains more material in it than the adult grammar, and these operations must all work: precisely the opposite of what might be
expected. One might expect, given the grammar in (29), that a more complex structure (i.e. one containing more NPs) would be less reduced, because the rules underlying the full phrase marker would already be stretched to the limit, and it would be difficult for the child to apply (ii) in addition: i.e. (i) would be instantiated at the cost of (ii). There is a second way in which the reduction analysis is odd. Namely, such an operation would not exist in UG, but would simply be present at a particular stage in acquisition, the telegraphic stage. This would make it look very unlike the situation with relative clauses discussed earlier, where the possibility of high attachment was reduced to an actual alternative specification in UG: the possibility of a corelative construction. The theory under construction in the last chapter would require that when an appropriate structure is not reached by the child (in this case, the full phrase marker), the child falls into another grammar specified by UG. This would not be the case with a reduction transformation, since the reduction transformation itself is not specified by UG. The child, therefore, would be “falling into” a grammar which is not specified by UG. In keeping with the strictures above, this possibility is unavailable to us. I will therefore assume that there is no reduction operation of this type. With respect to the third possibility, the situation is more interesting. I limit myself here to some preliminary comments. In Chapter 2, I suggested that the analysis of the early string by the child took place in two stages. The child first fitted the lexical subtree over the open class part of the incoming string.

(30)
[V [(N) the man] [V [V saw] [N the woman]]]
This gave rise to telegraphic speech: an analysis in which the closed class elements dropped out.
(31) [V [(N) (man)] [V [V saw] [N woman]]]
At a logically slightly later stage, the closed class elements, marked simply X0, were incorporated into the structure, according to the Principal Branching Direction of the language.

(32)
[V [N [X0 the] [N man]] [V [V saw] [N [X0 the] [N woman]]]]
This would correspond to a derivational stage in which Project-α occurs. Some evidence for this second stage would be the presence of schwa in the output, corresponding to the X0 elements. If this is in fact the progression, then both the null category and the simple structures view would be expected to be correct, though at slightly different stages: the latter slightly less advanced than the former, and logically prior to it. (This assumes that the Pre-Project-α stage is phonologically realizable: if not, true telegraphic “speech” in the simple structures sense above would exist only in the initial analysis, and as a subgrammar of the final grammar (see later discussion), and not in exteriorized speech at the telegraphic stage. I leave this odd possibility aside.) Either of the two views above would have an advantage over a fourth view (in GB-theory): that the initial NP is a full phrasal node, with no determiner.
(33) [S NP [VP [V see] [NP [N ball]]]]
The reason has to do with extendibility of the grammar (in Marcus et al.’s sense). The following generalization must be expressed somewhere in the grammar of English.

(34)
In the phrasal syntax, definiteness is marked with the (or perhaps, building up to the N″′ level: Chapter 2)
The representation in (33) would violate the restriction in (34), while the representation in (31) would not, since the phrasal syntax had not yet been entered. That is, by assuming a different type of representation, the thematic representation, one arrives at the position that the child’s grammar is at this stage not incorrect, but simply incomplete. Let me turn now to another consideration in syntactic description at this stage. At the stage at which children are saying things like “see ball”, their behavior suggests that they are using something quite different from that simple sentence as the information structure which is input to semantic interpretation. In particular, while ball in “see ball” is determinerless in early childhood speech, and while determinerless nouns in adult speech generally have a generic or class interpretation, the child speaks reduced phrases like “see ball” in contexts where ball must be regarded as specific in reference. Thus the child interprets “see ball” as something like “I see the ball”. But this simple fact creates difficulties for the simple structures account. If the DS and SS representations are indeed ((see)V (ball)N)V — i.e. extremely simplified, and without a determiner: pure lexical representations — then the child must still have a way of rendering the fact that such structures are not generic. Assuming that the structure is interpreted at LF, this means that by LF the representation must be something like ((V see) (NP the ball)). But if this is so, then structure-building operations must be available at LF for the child but not the adult. This is surely not to be desired. Further, the postulation of structure-building operations at LF mimics the possibility of a reduction transformation already rejected, although in a reverse
direction. The null lexical item account avoids these problems because there is already a slot for the determiner element. We might even assume that the slot itself is marked for definiteness or indefiniteness: (35)
[VP [V see] [NP [Det e, +Definite] [N ball]]]
In this case, the child would not need to structure-build at all at LF, but would simply use the structure as given: definiteness is already correctly marked, though the lexical item is missing. This difficulty with the thematic structure account also appears to put it at a disadvantage with respect to an acquisition theory like that in LFG (see Pinker 1984). In Pinker’s theory, a relatively complete f-structure may be paired with an incomplete c-structure. The representation of “see ball” might therefore be the following. (36)
c-structure:
     [S [VP [V see] [NP [N ball]]]]

f-structure:
     SUBJ  [Pred “I”]
     PRED  see (SUBJ, OBJ)
     OBJ   [+definite, Pred “ball”]
     TNS   Present
In the LFG account, it is possible to allow for a severely reduced c-structure, while the f-structure is still full. Since there is only a realization mapping between f-structure and c-structure, and each realization operation presumably comes at a computational cost, in LFG the simpler c-structure would be associated with less computational cost. This is presumably not the case for Government-Binding Theory, insofar as a null or identity transformation, i.e. one not deforming
the D-structure at all, would presumably be the computationally cheapest. (This is an old idea, cf. Miller and McKean 1964, but I will retain it.) Because D-structures can be directly realized as S-structures in that way (unlike f-structures and c-structures, which use distinct vocabularies), the reduction transformation, or its inverse in terms of LF structure-building, would presumably be more computationally expensive than doing nothing at all to the input structure. I will give one response to this line of attack here, reserving fuller discussion for later in the chapter and for future work. If it is in fact the case that the child has reference to the notion of definiteness when producing sentences like “see ball”, then this fact must be registered somewhere. However, in a theory like that of Chomsky (1977b), this need not take place at LF. Rather, there are different representations, SI-1 and SI-2, at which linguistic information interfaces with information about the world and the general cognitive system. It is quite unclear what the format of the information would be at that point. However, if definiteness were marked then, and not sooner, then there would be no need for structure-building operations at LF. In a sense, the greater computational load would be placed further back in the grammar, as a default. The actual operation would be viewed as closing off the free variables in the representation via an iota operator: see(x) & ball(x) → ιx (see(x) & ball(x)). Thus by locating the variable-binding operations at SI-2 for children, the difficulty with the simple structures account of “see ball” is avoided.

4.3.3 Merger, or Project-α
Let us work out one way by which this proposal for early thematic speech, i.e. the reduced phrase marker (and the Garrett-Shattuck-Hufnagel findings about word exchanges), would be instantiated in adult speech. Other ways are possible as well. The particular proposal here does not hew especially closely to the original Garrett proposals, but instead tries to integrate the idea of an open class and a closed class representation in ways more closely modelled on those available in linguistic theory. In particular, rather than supposing that the distinction or cut is between open class and closed class representations, let us assume that the two main representational theories, Case theory and theta theory, are implicated. In particular:

(37) The reductions in (23)–(25), repeated below, and telegraphic speech in general, are a pure representation of theta relations.

(38) I saw the ball.
     reduction: see ball
(39) Mommy left the room.
     reduction: Mommy leave room

(40) I put the ball on the table.
     reduction: put ball (on) table
What then creates the fully structured phrase marker out of the pure theta relations in (38)–(40)? This, I will assume, is caused by the projection of the theta representation into a different representation, a pure representation of Case relations. There are therefore two separable representations comprising the VP “saw the woman”. One is a representation of theta relations; the other is a pure representation of Case.
(41) [V [V see] [N woman]]
     (Theta representation)

(42) [VP [V Case assigning features of verb] [NP [Det(+acc) the] [N e]]]
     (Case representation)

The Case theory representation includes closed class elements — in particular, the closed class determiner the. It also includes the Case assigning features of the verb, but not the verb itself. (For an argument that these should be separated, see Koopman 1984.) Case is not assigned to the NP as a whole, nor to the nominal (N) head, but rather to the closed class determiner position in the NP. Thus Case and theta are actually assigned to different elements in the object NP. Theta roles are assigned to the nominal head, N, in the theta representation. Case is assigned to the determiner position in the Case representation. Ultimately, both get spread over the NP, but in different ways. The theta role assigned to the nominal head percolates to the NP node after a Merger, or Project-α, operation
merges the case and theta representations. The accusative feature on the determiner gets copied onto the head N, again in the operation of Merger, where the determiner and the head must agree in case assignment. It is thus the operation of Merger which merges the Case and the theta representations. Note that each of these representations is a pure representation of the particular vocabulary that it invokes (Case theory vs. theta theory), and indeed, the crucial categories that are mentioned in each theory (determiner vs. nominal head) are distinct as well. (43)
Theta theory representation:   [V [V see] [N(theme) woman]]
Case theory representation:    [VP [V Case assigning features] [NP [Det(+acc) the] [N e]]]

          Merger, or Project-α
                  ↓

[VP [V see] [NP(+theme) [Det(+acc) the] [N(+acc) woman]]]
The theta representation above has already been put forward in the discussion of the lexicon. The vocabulary of the representation consists of theta roles (already assigned) and category labels of the zero bar level. This representation is somewhere in between an enriched lexical representation and a truly syntactic
one: enriched because along with the representation of theta argument structure, it includes the filled terminal nodes with lexical items in them. What I have called the Case theory representation is a good deal stranger. It factors out the closed class aspect of the V-NP representation in a principled way. What it contains is the following: (1) a subtree in the phrasal syntax (it projects up to at least V′), (2) where the Case assigning features of the verb are present, but not the verb, and (3) in which Case has been assigned to the determiner. The position that Case is assigned not to the nominal head, but rather to the determiner, is reasonable given the fact that in languages like German, it is actually the determiner system (especially the definite determiner) which shows the demarcations according to Case. (44)
nominative   der Mann
accusative   den Mann
dative       dem Mann
genitive     des Mannes
The operation of Merger has three main effects. First, it inserts two lexical items into the slots provided by the Case frame: the head verb and the theta-governed noun. Second, it percolates the theta relation already assigned to the noun (theme, in this case) to the NP node. Third, it copies the Case that was originally associated with the determiner position onto the head noun. This means that ball, as well as the, is marked for (abstract) accusative Case. However, other parts of the noun phrase in a complex noun phrase (e.g. the pictures of Mary) will not be so marked. This stands in contrast to theta assignment, which percolates up to the NP node, and so characterizes the whole NP. The two representations above underlie adult speech, as well as the child’s. This allows for a very compact description of telegraphic speech: telegraphic speech is simply a pure representation of the theta structure, itself a sub-representation generated by the adult grammar. There are a few other properties of the above representational system which require note; a schematic sketch of Merger is given below, after example (45).

(i) The determiner and full NP categories play something like the same role in Case and theta assignment to the simple NP object that the subject NP and S play in ECM-type constructions (e.g. the complement of believe). Just as a theta role is assigned to the full S in such constructions, while Case is assigned only to the subject NP, in this representation Case is assigned only to the determiner, while a theta role is ultimately inherited by the full NP.

(ii) The conception of the grammar is not simply modular, but separable in character. That is, it is not simply the case that Case theory applies at a different point than theta theory, but that the two primitives actually pick out different representations which purely instantiate them. These representations are then composed by the operation of Merger.

(iii) It allows for a clear demarcation to be made between the lexical and the syntactic passive. They apply to different structures: the syntactic passive to the Case representation, and the lexical passive to the theta representation.

(iv) The fixed character of the closed class elements (Chapter 1) is modelled by having such elements be the frame into which the theta representation is projected.

The operation of Merger here (or Project-α) is unlike that of Agreement, discussed earlier in the chapter, in that the latter requires that information pass in both directions. The paradigm case of an agreement relation, in this view, is Subject-Predicate agreement, where number information passes in one direction, and theta information in the other.

(45)
[S NP VP], with number passing from NP to VP and theta from VP to NP
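The following Python sketch renders Merger's three effects schematically (lexical insertion into the Case frame, theta percolation to the NP node, and Case copying from the determiner onto the head noun). The dictionary encoding of the two representations is an invention of the sketch, not a proposal about their actual format.

# Sketch: Merger (Project-alpha) composing a theta representation with a
# Case frame, per the three effects described above.

theta_rep = {                       # pure theta representation, as in (41)
    "verb": "see",
    "object": {"noun": "woman", "theta": "theme"},
}

case_frame = {                      # pure Case representation, as in (42)
    "case_assigning_features": "+acc",
    "object": {"det": "the", "det_case": "acc", "noun_slot": None},
}

def merger(theta, case):
    """Project the theta representation into the closed class Case frame."""
    np = {
        "det": case["object"]["det"],
        "det_case": case["object"]["det_case"],
        # effect 1: insert the theta-governed noun into the frame's slot
        "noun": theta["object"]["noun"],
        # effect 3: copy the determiner's Case onto the head noun only
        "noun_case": case["object"]["det_case"],
        # effect 2: percolate the noun's theta role up to the NP node
        "theta": theta["object"]["theta"],
    }
    # effect 1 again: the head verb is inserted next to its Case features
    return {"verb": theta["verb"], "NP": np}

print(merger(theta_rep, case_frame))

Nothing hinges on the dictionary format; the point is only that Case and theta information reside in separate objects until Merger composes them.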
Perhaps the most interesting area of evidence outside of acquisition for the above theory is in idioms.

4.3.4 Idioms
Let us consider another area of evidence. I have suggested above that there are two central sorts of representations which have open class elements in them. The first is the simple theta representation — a sort of extended lexical representation with terminal leaves. The other representation which has open class elements in it is the post-merger representation: the representation after the OC (open class) representation has merged with the Case grid. If this conception is correct, then there should be certain set phrases, idioms, which show where the lines of demarcation are. In particular, since idioms require sequences of set items, any particular idiom should be frozen with respect to the level of elements at which it is stipulated. An idiom could be stipulated at the level of the OC representation: the theta representation. Or it could be stipulated at the post-merger level: what ordinarily would be the result
of a projection operation, but, since idioms are set sequences, may be the furthest back that the grammar could go. In essence, one would expect two types of idioms: Level I, theta type idioms, and Level II, post-merger idioms. To the extent that such a division exists, and has syntactic consequences, the case for separate theta and post-merger representations becomes stronger. Consider the sample idiom sets in (46).
(46)  OC Idioms           Idioms with definite determiner
      break bread         kick the bucket
      make tracks         buy the farm
      take advantage      climb the walls
      turn tricks         break the ice
      mean business       smoke the (peace) pipe
      turn tail           hit the ceiling
      give two cents      make the grade
      keep tabs           give the lie to
      make strides        bite the dust
      make much of        bite the bullet
I have broken the idioms into two basic types: those which take an object with a definite determiner, and those which obligatorily take a simple N and allow for a free specifier. I would now like to introduce a generalization.

(47)
Determiner Generalization: Object idioms with a specified determiner do not allow passivization.
By a “specified determiner”, I mean — and this is crucial — that the determiner itself is a necessary part of the idiom (e.g. kick the bucket; *kick a bucket). For example, the is part of the idiom in kick the bucket, but no determiner is part of the idiom in take advantage of (thus I mean specified, not that it is specific): an indefinite determiner is specified, in this sense, as long as it is part of the idiom. The determiner generalization is basically correct. A minimal contrast supported by the generalization would be the following:

(48) a. John took advantage of Bill.
     b. Advantage was taken of Bill.

(49) a. John kicked the bucket.
     b. *The bucket was kicked by John.
The idiom in (48), with the bare N in the object slot, passivizes freely. However, the idiom in (49), with the fixed definite determiner the, does not passivize at all. Consider another minimal pair.

(50) a. We broke bread over the agreement.
     b. Bread was broken over the agreement.

(51) a. John climbed the walls.
     b. *The walls were climbed by John.
And a third example.

(52) a. We made (great) strides.
     b. (Great) strides were made by us.

(53) a. The boss hit the ceiling.
     b. *The ceiling was hit by the boss.
These three examples show sharply the distinction in passivization. They also do something equally important: they exclude the possibility that the reason the OC idioms passivize more freely is simply that the object is more interpretable in isolation. Thus (it might be thought) the individual subparts of the OC idiom are themselves interpretable, and this is the reason that the DS object can appear in the subject position in idioms, but not the post-merger object. However, it would be difficult to make such a case for the examples here. Bread in break bread need not be interpreted literally in the passive: “Bread was broken over the agreement by drinking a glass of wine”. Yet the idiom more or less retains its idiomatic usage in the passive — certainly it does compared to the contrasting “climb the walls”. Even more striking, (great) strides cannot be taken any more literally in “Great strides were made by us in getting the new contract” than the ceiling can be in “*The ceiling was hit by the President”. Yet in the former case, the simple N passivizes freely, while in the latter case the fully specified NP does not passivize at all. These contrasts suggest that a proposal having to do simply with the individual interpretability of the idiom parts will not go through: they suggest, in fact, that the apparent free interpretability of the idiom parts in the OC (open class) idiom is an effect of passivizability, not a cause. The actual reason for the contrast in passivization has to do with the specification of the determiner. I have simply included an idiom list above. Let us now label the list, where Y stands for passivizes freely, and N for not. The regularity is striking, though not universal.
(54) Sample of idioms
     OC Idioms             Idioms with definite determiner
     Y  break bread        N  kick the bucket
     Y  make tracks        N  buy the farm
     Y  take advantage     N  climb the walls
     Y  turn tricks        Y  break the ice
     N  mean business      N  smoke the (peace) pipe
     N  turn tail          N  hit the ceiling
     Y  give two cents     N  make the grade
     Y  keep tabs          Y  give the lie to
     Y  make strides       N  bite the dust
     Y  make much of       N  bite the bullet

     Totals: theta representation: 8 yes, 2 no
             definite determiner:  2 yes, 8 no
The counter-examples fall into two types: theta representation idioms which unexpectedly do not passivize (mean business, turn tail), and fully specified idioms which do (break the ice, give the lie to). Of those which unexpectedly do not passivize, mean business falls under a different generalization: namely, mean does not seem to passivize in any of its forms (*A great deal was meant by John to Mary). Turn tail may have thematic reasons as well (Jackendoff 1972) for not passivizing. As for the fully specified idioms, break the ice, if investigated more fully, actually supports the generalization as an absolute. This is because break the ice has actually become structurally free with respect to its determiner. Consider the following.

(55)
break the ice
     break a lot of ice
     ?break some ice (with that remark)

(56) kick the bucket
     *(some men) kicked some buckets
     *kick a lot of buckets

Compared to an idiom where the determiner is definitely specified, such as kick the bucket, the determiner element in break the ice is quite free. This suggests that it truly does not belong in the specified determiner list at all. Similarly, other marginally passivizable idioms not mentioned here, for
example, toe the line, seem to have a similar property: the determiner has freed up (to some degree), and to that degree the idiom is passivizable.

(57) a. He toed the line.
     b. ??The line was toed by him.
     c. ??A narrow line was toed by him.
It appears, then, that the Determiner Generalization may be taken as a principle, rather than as a simple correlation. However, at this point an empirical problem arises. I have so far restricted myself to a discussion of idioms with definite determiners (in the object slot). Curiously, some idioms with an indefinite determiner do seem to allow passivization.

(58) a. take a fancy to: A fancy was taken to Jeff by Mary.
     b. take a shine to: A shine was taken to Jeff by Mary.

(59) take a bath: A bath was taken in the stock market (by John).

(60) leave a lot to be desired: A lot to be desired was left by John’s behavior.
There is, however, a fascinating contrast in these idioms. If they are given the passive progressive form, the possibility of passivization disappears.

(61) a. take a fancy to: *While we talked in the kitchen, a fancy was being taken to Jeff by Mary in the dining room.
     b. take a shine to: *While we talked in the kitchen, a shine was being taken to Jeff by Mary in the dining room.

(62) take a bath: *A bath was being taken in the stock market by Jeff.

(63) leave a lot to be desired: ?*A lot to be desired was being left by John’s behavior.

This restriction on the progressive appears only in the passive, and hence it cannot be viewed as arising out of some semantic property of the verbs themselves (e.g. that they are stative).

(64)
a. *A shine was being taken to Jeff by Mary.
b. Mary was taking a shine to Jeff.
The question, then, is: why are certain idioms with specified determiners freed from the general ban on passivization, but not in the passive progressive? In fact, a clear and ready answer is available. In general, while the syntactic passive is either progressive or stative in character, the lexical passive is only stative (with respect to the lexical class of the predicate). Thus while the non-progressive passive in (65) allows either of two readings, the progressive passive is restricted to just one.

(65)
a. The toy was broken.
   Reading 1: Someone broke the toy.
   Reading 2: The toy is in pieces.
b. The toy was being broken.
   Reading 1: Someone (was) break(ing) the toy.
   Reading 2: None
It is natural to assume that the “the toy is in pieces” reading, where a simple property is being predicated of the toy, is in fact the lexical passive reading. Similarly for the actional predicate in (66).

(66)
a. The glass was shattered.
   Reading 1: Someone shattered the glass.
   Reading 2: The glass is in pieces.
b. The glass was being shattered.
   Reading 1: Someone (was) shatter(ing) the glass.
   Reading 2: None
If we assume that the progressive filters out the possibility of the lexical passive, then the disappearance of the second reading in (65) and (66) is explained. Note that when a predicate is chosen which is generally taken to allow only lexical passives, the actional reading is filtered out.

(67)
The toy appears broken.
Reading 1: None
Reading 2: The toy is in pieces.
Finally, let us note another generalization: while the syntactic passive, associated with Reading 1, has a true affected patient surface subject (the glass is affected by the action in (65) and (66)), the surface subject in the lexical passive, associated with Reading 2, is not affected, but is a simple theme (the toy is in the state of being broken). In general, a syntactic passive, but not a lexical passive, allows a +affected surface subject. But now the reason for the peculiar behavior of the idioms in (58)–(63) is
apparent. The passives which are allowed are not syntactic passives, but lexical passives. Hence they are possible in the simple past form, but not in the past progressive (A fancy was taken to Jeff/*A fancy was being taken to Jeff). Hence the Specified Determiner Constraint holds in full force (“Specified” here means determined in the idiom, not “specific”; the specified determiner is in the object itself). (68)
Specified Determiner Constraint: An object idiom with a specified determiner (i.e. determiner specified as part of the idiom) does not allow syntactic passivization.
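The resulting picture can be summarized as a small decision procedure: a specified determiner blocks the syntactic passive, and a lexical passive, where independently available, is screened out by the progressive. The entries and flags in the Python sketch below simply restate examples from the text; they are assumptions of the sketch, not an exhaustive lexicon.

# Sketch: the Specified Determiner Constraint (68) plus the progressive
# screen on lexical passives, as a decision procedure over listed idioms.

SPECIFIED_DET = {            # does the idiom specify its determiner?
    "take advantage": False,
    "kick the bucket": True,
    "take a fancy to": True,
}

HAS_LEXICAL_PASSIVE = {"take a fancy to"}   # per the discussion of (58)-(63)

def passive_ok(idiom, progressive=False):
    """Syntactic passive is barred by a specified determiner (68); a
    lexical passive, where listed, survives only outside the progressive."""
    if not SPECIFIED_DET[idiom]:
        return True                           # "Advantage was taken of Bill"
    return idiom in HAS_LEXICAL_PASSIVE and not progressive

assert passive_ok("take advantage")
assert not passive_ok("kick the bucket")          # "*The bucket was kicked"
assert passive_ok("take a fancy to")              # "A fancy was taken to Jeff"
assert not passive_ok("take a fancy to", True)    # "*... was being taken ..."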
Finally, let us note that two other possible generalizations which might be made about the data are insufficient. A possible alternate explanation is that the fixity of the object in these idioms with the specified determiner does not have to do with the passive, but with a general impossibility of movement of the element with the specified determiner. However, this is counter-exemplified by subject-to-subject raising, which occurs in subject idioms, even in the more rigid definite determiner type.

(69)
a. The cat seems to be out of the bag.
b. The shit seems to have hit the fan.
c. The roof seems to have caved in on George.
Second, one might suppose that an idiom element without reference may not be moved into a semantically transparent position. But this explanation has problems on two grounds. First, idiom chunks do appear to show up in semantically transparent positions like the head of relative clauses:

(70)
The headway that we made was remarkable.
Also, the subject position in a passive construction presumably need not be construed as necessarily semantically transparent (or, if it is, it allows movement of elements into it anyway), since movement of elements into it is allowed.

(71)
A unicorn was being sought.
Finally, there is a syntactic subgeneralization — or perhaps an independent generalization — which does seem to hold over the idioms and which is different from the Specified Determiner Constraint. This is that idioms which have agentive predicates as their heads do not allow passivization, while those which do not have agentive predicates do. The example below, which was used to argue for the Specified Determiner Constraint, might equally be used to show that agentivity
in a predicate blocked passivization (i.e. where the predicate is generally an agentive one, whether it seems to be used agentively in the idiom or not).

(72) a. *The bucket was kicked.
     b. Advantage was taken (of John).
Looking back at the predicates listed above, we note that the agentive/non-agentive contrast will account for a large part of the data as well. In fact, I initially took this to be the basic generalization holding in the blocking of passivization.

(73)
Agent/Patient Generalization:
An object idiom may not be passivized if the main predicate is agentive.
There is some reason to take the Agent/Patient Generalization to be the subsidiary one, and the Specified Determiner Constraint as more central. One reason is conceptual. Given the view of passive suggested above, any instance of passive may be either lexical or syntactic. The possibility of a lexical passive is screened out in two ways: by placing the construction in the progressive, or by allowing the derived subject to be affected (a patient). Thus the lexical version of “A toy is broken” has a simple theme subject; in the case of an affected subject, the passive is syntactic. If this is correct, then agentive predicates cannot undergo lexical passivization and allow their derived subject to be affected. But if that is correct, then the passives which are being allowed in the idioms are simply lexical passives, hence not agent/patient, hence the generalization in (73). That is, (73) would fall out of the properties of the lexical passive. The second reason is more empirical. If (73) were correct, then one would expect that, holding the main verb constant, there would be no effect of varying the determiner status of the object. But consider the following set of idioms for take, a common idiom-taking verb.

(74)
Idioms with take
     Idiom              Passive?
     take the cake      no
     take a leak        no
     take a hike        no
     take a bath        no
     take a fancy to    only non-progressive (lexical)
     take a shine to    only non-progressive (lexical)
     take five          no
     take heed          yes
     take issue         yes
Aside from “take five”, the division according to the presence of the determiner is perfect. Those idioms which require a specified determiner do not passivize; those which do not require a specified determiner do. In this case, of course, the verb is constant, so agentivity is not an issue. I will therefore assume that the Specified Determiner Constraint is correct.2 The Specified Determiner Constraint, that idioms with a specified determiner cannot passivize, would appear to be totally baffling from the point of view of current versions of Government-Binding Theory (as well as other grammatical theories). Case is generally assumed to be assigned to the full NP object, and to be inherited by various elements in that NP, in particular by the head noun and the determiner. Assuming that Case is assigned in that way, there would be no reason to expect that anything about the internal structure of the NP object, even in the special case of an idiom, should have anything to do with passivizability. On the other hand, given the division of labor suggested here, where Case is assigned to the determiner, not to the whole NP, and there is a merger operation of the Case and theta frames, the unpassivizability of the full idioms, but not the OC (open class) idioms, is rather expected. Consider how the passive would be stated. It is necessary here — as in the usual statement of passive — to prevent accusative Case from being accidentally assigned to the object, prior to affix hopping.3 In terms of the two representations assumed here, this means that the Case-absorbing morphology must itself be part of the Case frame.
2. The situation with wh-movement is interesting. In brief, it appears that there is a constraint on wh-movement in idioms, but that it is not the same constraint which applies to NP-movement. For example, it appears that some idioms that are not frozen for NP movement are frozen for wh-movement, presumably because one cannot quantify over the elements involved.
   (1) a. The cat seems to be out of the bag.
       b. ?*Which cat is out of the bag?
   On the other hand, some idioms which are frozen for NP movement (in the syntactic passive) are not frozen for wh-movement.
   (2) a. *A fancy was being taken of George by Mary.
       b. How much of a fancy did Mary take of George?
   This suggests that the constraints are different.

3. Thus, I am not assuming a surface filter on Case.
(75) Case frame for passive:
     [V′ [V Case assigning features + -ed (absorbs Case)] [NP [Det no Case] …]]

Now consider how idioms must be stated. An idiom is a specified chunk of linguistic material corresponding to a specified meaning, which must be listed. All idioms must be fixed and listable, and let us assume that they are listed at one specific level. Now the open class theta representation idioms may simply be listed as such, in the theta representation itself, together with their meaning.

(76)
Listed for theta representation:
     [V [V take] [N advantage]]
     meaning: ...
Deepest level for Level II idioms: Post-merger: VP VP
kick
NP Det
N
the
bucket
meaning: ... The representation in (77) is the deepest level that the post merger idiom may take. It does not exist in the usual subunits of a freely composable theta representation and a freely composable Case representation: if this were the case, then the idiom would have to be specified at two places at once, this was ruled out above.
Thus the following picture holds in the formation of idioms: (78)
Theta representation   ←  deepest level for OC (pure theta) idioms
Case representation
        |
  Merger, or Project-α
        ↓
Full representation    ←  deepest level for specified determiner idioms
We may conceive of idioms, then, as being pieces of listed structure. These pieces of listed structure may be specified at different levels. The idioms without a specified determiner above were listed in the thematic representation; those with a specified determiner were listed post-merger. We may call these Level I and Level II idioms (not referring to those actual levels in lexical phonology, but to comparable types of specified levels in syntax). Consider now the problem of passivization. Given the general structure in (78), it would be impossible for the post-merger idioms to passivize. This is because the syntactic passive is a manipulation of the case representation: one which adds the Case-absorbing morpheme, -ed, into that representation. The passivization of Level I idioms would therefore look like the following:
(79) Level I idioms:

     Independent representations:
         Theta representation: take advantage (of)
         Case representation:  [V′ [V +acc assigning features] [NP [Det +acc] [N e]]]

     Passive (applies to the Case frame):
         Theta representation: nothing happens
         Case representation:  [V″ NPi [V′ [V +acc assigning features + -ed] [NP ti]]]

     Merger, or Project-α:
         Passive form: Advantage was taken ti
(80) Level II idioms:

                                  Theta representation   Case representation
     Independent representation:  not applicable         not applicable
     Passive:                     not applicable         not applicable

     Merger: deepest level of syntactic representation:
         [VP [V kick] [NP [Det the] [N bucket]]]
The Case representation exists as a separate representation only for the OC (open class) idioms. No such separate representation exists for the post-merger idioms, which are specified only in the full representation. Therefore passive, which applies to the Case frame, is impossible for post-merger idioms: exactly the desired result.
4.4 Conclusion
I have suggested in this chapter two new primitive operations: agreement and merger (or Project-α). These are not, I would argue, reducible to the operation of Case or theta government, but must simply be viewed as primitive operations that the grammar has recourse to. Agreement differs from government in being bi-directional. More strikingly, perhaps, agreement operations can be put into 1-to-1 correspondence with basic movement and adjunct-adding rules (Adjoin-α, Move-NP, Move-wh). The latter may themselves be put into 1-to-1 correspondence with the satisfaction of closed class elements (the +wh feature in Comp, Infl, and the Relative Clause linker). Thus while agreement would encompass
certain other more primitive operations, it would of necessity be a finite subpart of the grammar. The other operation which was introduced was merger. Merger merged together an open class representation, the theta representation, and a closed class frame. The stage of telegraphic speech, in this view, is not simply the result of an extraneous computational deficit, nor the result of a reduction transformation, but is itself a well-formed subpart of the grammar. However, it is only partial, not having undergone merger with a closed class representation. By the General Congruence Principle, a similar structure of telegraphic speech must underlie adult utterances as well. I suggested that there was evidence for this because of the existence of both pre-merger and post-merger idioms, with differing properties. Pre-merger idioms (take advantage, break bread) were a pure representation of theta structure, and passivized readily. Post-merger idioms (kick the bucket, hit the ceiling) were an instance of post-merger speech.
Chapter 5
The Abrogation of DS Functions: Dislocated Constituents and Indexing Relations
I have argued above for a grammar which contains, along with the primitive rule Move-α, at least one rule of Adjunction, Adjoin-α, and a rule of PS merger, or Projection, Project-α. The first of these corresponds roughly to the classic generalized transformations of Chomsky (1955/1975, 1957), but applies only to adjuncts. The restriction of the operation to adjuncts would be forced theoretically by the Projection Principle; further, the minimal specification of the grammar according to the Projection Principle — namely, that the lexical specifications of elements must be satisfied at all levels, but other properties of the phrase structure tree need not be — would predict that such an operation would be possible. The treatment in Chapter 3 would therefore be what would be expected, under such minimal assumptions. Further, on purely theoretical grounds, it would be well if X′ theory held in some pure form at least at D-Structure. One of the most interesting claims of that theory (Jackendoff 1977) is that X′-syntax involves a smooth gradation of bar levels, with the head of a construction connected to its maximal projection by a sequence of bar levels obligatorily descending by 1.
(1)
Xn → … Xn–1 …
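Though nothing in the argument depends on it, the restriction in (1) can be made concrete with a small computational sketch. The following toy Python fragment is my own illustration, not part of the theory: the tuple encoding and the function name are invented. It checks that every projection Xn immediately dominates a head of bar level n−1, and shows why a Chomsky-adjoined structure, which repeats a node label at the same bar level, fails the check.

# Illustrative sketch (not from the text): checking the restriction in (1),
# that every X^n node has a unique X^(n-1) head among its children.
# Trees are encoded as (category, bar_level, children) triples.

def obeys_bar_levels(node):
    """Return True if the projection line descends by exactly 1 at each step."""
    cat, bar, children = node
    if not children:                      # a lexical head X^0: nothing to check
        return bar == 0
    heads = [c for c in children if c[0] == cat and c[1] == bar - 1]
    return len(heads) == 1 and all(obeys_bar_levels(c) for c in children)

# N'' -> Det N'; N' -> N: a well-formed projection line N'' > N' > N.
np = ("N", 2, [("Det", 0, []), ("N", 1, [("N", 0, [])])])
print(obeys_bar_levels(np))           # True

# A Chomsky-adjoined relative clause repeats the NP label at the same bar
# level, so the adjunction structure fails the strict descending-by-1 regime.
np_adjoined = ("N", 2, [np, ("S", 0, [])])
print(obeys_bar_levels(np_adjoined))  # False: no N' head under the top N''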
Let us suppose that D-Structure is a pure representation of X′-theory in Jackendoff's sense, and that it obeys the restriction in (1). Let us assume further that one of the classical analyses of relative clauses is correct, where the structure is of a Chomsky-adjoined type: that is, with one of the node labels (NP or N″) repeated. It follows as a consequence that RCs must be adjoined in the course of a derivation, for X′-theory not to be violated at DS. These theoretical boons would be vacuous if interesting empirical consequences did not hold as well. It was argued in Chapter 3 that they do hold, both in the analysis of the adult syntactic structure, and with respect to acquisition. The so-called "anti-Reconstruction" facts of van Riemsdijk and Williams (1981), reconstrued as involving the argument/adjunct distinction rather than degree of embedding, can be accounted for by assuming that adjuncts, but not arguments, may be embedded in the course of the derivation. This means that adjunct-embedding may follow wh-movement in the sequence of operations, and thus allows fronted names in adjuncts, but not in direct complements (arguments), to escape Condition C violations. The acquisition data pointed strongly in the same direction. Tavakolian (1978) has shown that, in initial stages, the child may have recourse to a "high" attachment of the relative clause: the conjoined clause analysis. This means that a relative clause construction like that in (2a) may receive an analysis like (2b) by the child, where the relative clause is not embedded under the relevant NP, but simply conjoined. (2)
a. The duck saw (the lion (that hit the sheep)).
b. ((The duck saw the lion) (that hit the sheep)).
The empirical consequence of this is that the child allows (and in fact virtually requires) the surface subject of the matrix clause — the duck, here — to "control" the subject of the relative clause. The resultant analysis is that the matrix subject (the duck) is treated as the subject of both clauses (Tavakolian 1978). While Tavakolian suggests that this high attachment analysis is due to the presence of a separate parsing principle in the grammar, the analysis above suggests that — whether there is a parsing principle or not — the high attachment would follow from an embedding analysis, together with the assumption that, in cases where embedding failed, a default analysis of conjunction is superimposed by the child on the data. Thus the child first attempts to analyze the relative clause as embedded within the noun phrase that it modifies. If this analysis succeeds, the correct analysis, and correct modification possibilities, follow. If the analysis does not succeed (perhaps for computational reasons), the child adopts the default of a conjunction analysis. This accounts for the modification possibilities that the child mistakenly adopts (Tavakolian 1978); it also explains the Solan and Roeper data (Solan and Roeper 1978): when the conjoined clause analysis is impossible, as in the put constructions, the clause remains unrooted, and hence is not analyzed at all. Thus while the conjoined clause analysis can be thought of as a "parsing principle" in some sense, it is not exclusively that, but is rooted in the general conception of the computational organization of the grammar.
5.1 "Shallow" Analyses vs. the Derivational Theory of Complexity
The discussion above allows for one way in which the child grammar may diverge from the adult grammar: a computational weakness in the child's grammar may be viewed as giving rise to an analysis which is itself parametrically possible — an adjoined RC analysis. I believe that this is in general the case: the failures in the child grammar give rise to analyses which are parametrically possible, and, in fact, the grammar is organized so that this is the case. That is, the child, while not speaking grammatically according to the adult grammar, must nonetheless be speaking grammatically according to some grammar — some option in Universal Grammar. This constitutes a sort of wellformedness constraint over the intermediate grammars that the child adopts. The present chapter considers another way by which the intermediate grammars through which the child passes may be well-formed in this broad sense. In this case, I will argue that, for three separate constructions, the child adopts an analysis which is "shallow" with respect to the representations computed in the adult grammar: rooted in S-structure or possibly the surface, but extending back only part of the way toward D-structure. I will remain rather neutral throughout this chapter on whether this effect is essentially computational in character and has no repercussions on the actual grammar, or whether it does affect the way that parameters are initially set. That is, if we believe that the derivation DS → SS involves computational operations in some broad sense, and we believe further that the child's computational resources are more limited than the adult's, then it would be expected that the set of operations s1, s2, … sn which relate the two levels in the child's grammar would not be as full as in the adult's grammar. Whether this would occur on a construction-by-construction basis, perhaps isolated to constructions involving dislocations, or overall, is a question yet to be answered. A similar question arises concerning the direction of the shallowness. In the classical Derivational Theory of Complexity (Miller and McKean 1964; McNeill 1970; Fodor, Bever, and Garrett 1974; see also Wexler and Culicover 1980), the child's grammar was assumed to be lacking in specific transformations, which would be added onto D-structure. The child's grammar would thus have the structure in (3a), while the adult's grammar had the structure in (3b).
(3) a. Child's derivation:  DS → SS′  (shallow derivation)
    b. Adult's derivation:  DS → SS
The shallowness of the derivation, construed as a lack of optional transformations in the mapping from DS to SS, would give rise to different surface structures. The child would speak structures of the form SS′, while adults would speak fully "processed" representations: SS. Differences in the child's and the adult's grammar could be traced to this fact (Miller and McKean 1964; McNeill 1970). It is interesting to re-evaluate the Derivational Theory of Complexity in light of more recent theories of the grammar than that adopted at the time of its first proposal (essentially, early versions of the Standard Theory). While the current theory retains transformations, these are not the lexically governed and specific transformations of the Standard Theory, but rather instances of a single movement rule, Move-α. The possibilities for output are, again, not specified in the operation itself, but by a system of principles which forces the products of any particular operation to take a particular form: that is, the interaction of Case theory, Theta theory, Binding theory, Control theory, the Projection Principle, and so on. Thus, while Move-α applies, the child does not actually learn individual transformations: rather, he or she learns (or has triggered in him/her) the principles which govern the full set of possible derivations. Given these facts, it may appear that the derivational theory of complexity, in any form, is irrelevant to the current view. Before addressing that question directly, however, we may note that a number of the empirical arguments which were given against the Derivational Theory of Complexity in Fodor, Bever, and Garrett (1974) — which essentially adopted a negative view of it — would no longer hold given current analyses. Thus one of the arguments against the Derivational Theory of Complexity was that children did not exhibit the full counterpart of sentences which underwent an Equi Transformation (Equivalent Noun Phrase Deletion, an early rule in which coreferent noun phrases were deleted: now known as Control). That is, sentences of the form (4a) were not present in the grammar prior to sentences of the form (4b). (The D-structures of the sentences in (4b) used to be thought to be those in (4a), with a coreferent noun phrase in the lower subject position. This noun phrase was later deleted to create the surface. This rule was known as "Equi": Equivalent Noun Phrase Deletion.) (4)
a. *John tried (for) John to leave.
    John wanted him/himself to leave.
b. John tried to leave.
    John wanted to leave.
While it might be argued that the sentence which underwent the obligatory transformation in (4a) (“John tried for John to leave”) would not be expected to be present in the output in any case, since the transformation is obligatory, this would not be so for the second sentence: John wanted him/himself to leave. The input is itself fully well-formed in one of these variants, and so, given the optionality of transformations, would be expected to be present in the surface first, if the Derivational Theory of Complexity were correct — that is, if computationally more complex constructions surfaced later. I believe that the logic of this argument against the Derivational Theory of Complexity would be correct and convincing — if it were the case that an Equi transformation existed. In current work, however, no such transformation is assumed. Rather, the subject of the embedded clause is at all levels null, and is coindexed with the matrix subject (or object) by the coindexing rule of Control. (5)
a. Johni tried PRO to leave.  → (Control) →  Johni tried PROi to leave.
b. Johni wanted PRO to leave.  → (Control) →  Johni wanted PROi to leave.
Given the assumption that control applies rather than equi, the Fodor, Bever, and Garrett argument does not go through simply on empirical grounds. There is no D-structure which has a full noun phrase in the embedded subject position, hence the fact that such elements do not show up on the surface antedating the appearance of the truncated form does not constitute a computational argument against the Derivational Theory of Complexity. It might be suggested, nonetheless, that a variant of the Fodor, Bever, and Garrett argument may be resurrected, but simply applied to the rule of Control itself, rather than to Equi. Suppose that the relevant DS, as in (5a) and (b) above, simply involves a null subject in the controlled clause. Assume, as above, that a rule of Control applies, coindexing the embedded subject with an element in the matrix. Then a computationally weak system might be assumed to not undergo an instance of Control: the output would simply be the DS form with the surface embedded subject not coindexed.
(6) Revised argument against the Derivational Theory of Complexity:
    a. DS: Johni wants PRO to leave. → no Control → SS: Johni wants PRO to leave.
    b. DS: Johni tried PRO to leave. → no Control → SS: Johni tried PRO to leave.
Given a surface structure like that in (6a) and (b), it might be suggested that the child would adopt a default analysis for the unindexed PRO: say, that it is interpreted as arbitrary in reference. Further, while it may be argued that the control rule applies obligatorily in the case of predicates like try, it does not apply obligatorily for want, and thus an unindexed PRO, and a consequent erroneous reading, would be expected in that case. But such a reading does not appear. This revised argument against the Derivational Theory of Complexity is of course more powerful, given current assumptions. However, it is itself susceptible to question. The main issue would revolve around the status of Control as a rule vs. Control as a principle. If control were simply an (optional) rule in these cases, then the lack of indexing on PRO (and a consequent erroneous analysis) would indeed be expected. However, if Control is a principle, and if such principles are indeed part of the genetic basis of language, then it might be expected that control would apply immediately. Moreover, the fact that control is not necessary in cases like "John wanted for himself to win" is irrelevant, since these structures do not match the domain over which the control principle is stated: namely, structures of the form … NP … PRO …, with some minimal-distance-type principle ensuring locality. Thus, given the assumption that Control is a principle, the revised argument against the Derivational Theory of Complexity would not go through either.
5.2 Computational Complexity and the Notion of Anchoring
These points notwithstanding, I do not wish to undertake a resurrection of the Derivational Theory of Complexity. Yet I do believe that the basic conception behind it is correct: that there is some fairly straightforward relation between the set of computations undertaken by the child and the complexity of the analysis that the child arrives at: the set of computations used by the child is simply less rich. In earlier chapters, I outlined how this would work with respect to telegraphic speech and relative clauses: in the former case, the child was computing a partial structure (the thematic representation); in the latter case, the child was using an earlier-specified, default rule (conjunction rather than adjunction).
With respect to wh-movement, there is evidence for the shallowness of the child’s grammar but not in the direction that the Derivational Theory of Complexity supposes. Rather, the grammatical analysis is shallow in the other direction: anchored in SS or perhaps the surface, with a “shallowly generated” DS, DS′. If we imagine the child, or the adult, being handed a level of representation, SS or more exactly the surface, for the purposes of comprehension, the other levels of grammatical representation DS and LF must be computed from that. To the extent to which the computational system is immature and not fully strong, it will not be able to go “backwards” enough to compute DS (and possibly will lack in some aspects of LF as well). Thus the grammar will be shallow in comprehension, but not in the direction that the Derivational Theory of Complexity supposes. Rather, it will be anchored in S-structure, and the D-structure which is computed will not be the D-structure of the adult grammar. Rather it will be some other D-structure, a deepest computed structure, D-structure′. Let us consider this in slightly more detail. The adult grammar generates quadruplets of structures: (DS, SS, Surface, LF). Any particular sentence is given, at least, a representation at each of these levels. (7)
Surface:  Who did you see (t)?
DS:       You Infl see who?
SS:       Whoi did you see ti?
LF:       For which x, you saw x?
Further, these representations have a number of other characteristics. DS is ordered before SS, and SS is ordered before LF, in the sense that Move-α (and possibly other rules) apply to DS to derive SS, and apply to SS to derive LF. Move-α thus induces an ordering over the levels. In addition, particular constraints hold over the individual levels: the Projection Principle holds at all levels, and individual constraints or modules, Case Theory, for example, or the Binding Theory, are earmarked for particular levels, though in a way which is as yet not fully clear (see discussion above). Move-α may be defined in either of two ways: as a particular relation between levels, or as a projection of a certain sort of information, chain information, on a particular level. Let us now make a supposition: the child’s computational system computes shallow analyses for particular constructions (perhaps particularly for comprehension). These analyses are rooted or anchored at SS or the surface (what the child hears) but not all aspects of DS are recovered. In particular, the operation of the rule Move-α is not fully undone. Thus while the computed representation may recede part of the way back to the adult DS, it is not fully such a level, but is rather an intermediate level: DS′. The result would be the following:
(8) Child's derivation:  DS′ → SS → PF, LF
    Adult's derivation:  DS → SS → PF, LF
That is, the child would still compute 4 levels of analysis. However, one of these levels, the deepest level DS′, would differ from the adult computation of DS. In particular, it would be closer to the surface than the comparable adult DS. With respect to the sentence given above, for example, the child’s full representation might be the following. (9)
Surface:  Who did you see (t)?
DS:       Whoi did you see ti?
SS:       Whoi did you see ti?
LF:       For which x, you saw x?
The child's DS representation, in this instance, would match the SS representation. The wh-element, rather than being generated, at the deepest level, in the object position, would be generated in Comp, or in Spec C′ (Chomsky 1986b). The consequence of the shallowness of computation would be a representational difference in the computed deepest structure. Before proceeding, an additional comment is necessary. In the earlier chapters, I have suggested that the grammar changes for the child with respect to his or her analysis of relative clauses or telegraphic speech. The situation is considerably less clear with respect to the analysis of dislocated elements. In particular, it is unlikely that we would wish to say that the D-structure is actually constructed as a matter of the grammar, in the course of an attempt to understand sentences with dislocated structures. (I owe this point to Janet Fodor.) Instead, we may assume that the comprehension process, starting from S-structure or the surface, computes back toward D-structure, but perhaps only part of the way. Thus, under different conditions, the computed D-structure (that is, D-structure′1, D-structure′2, etc.) may be distinct depending upon computational load, receding toward the adult DS in cases where the load is light, or pragmatic information has intervened to make the computation more possible. For a given child, at a particular stage and thus with a particular given grammar, a variety of DS′ may be computed, depending on computational load. (10)
Potential levels of analysis:
     DS
     DS′′′
     DS′′
     DS′
     SS → PF, LF
Thus for the example given in (11), the child’s analysis may generally have the wh-element in dislocated position at the deepest computed level, but it may occasionally have the wh-element originating in the (adult) DS object position as well. (11)
Surface:              Who did you see (t)?
DS′ (Computation 1):  Whoi did you see ti?
DS″ (Computation 2):  Whoi did you see ti?
DS‴ (Computation 3):  You saw who?
SS:                   Whoi did you see ti?
LF:                   For which x, you saw x?
This is in accord with the psycholinguistic fact that the possibility of analysis will change under differing conditions. The claim above would be that the child's analysis is shallow, not his or her grammar. As noted above, this fact might be taken to be a purely computational fact. Or it might be that it has parametric effects: in particular, for those structures in which the wh-element is analyzed as base-generated in dislocated position, it is also analyzed as being in a theta position (or quasi-theta position) even at the deepest level of analysis. This would mean that a computational effect would have parametric repercussions: the child would fall into a different type of grammar. Some evidence for this is discussed in 5.7.6. The crucial point throughout the chapter, however, will be the light that the shallow analysis sheds on the structure of the grammar.
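To fix ideas, the notion of a load-dependent deepest computed level can be given a toy rendering. The sketch below is an illustration of my own, with invented names, ignoring Infl and do-support: a derivation is recorded as a list of Move-α applications, and a comprehension routine with a limited budget undoes only some of them, yielding exactly the family of levels DS′, DS″, … in (11).

# Toy sketch of shallow comprehension (invented encoding, illustration only):
# the analysis is anchored at SS, and Move-alpha is "undone" only as far as a
# computational budget allows, leaving an intermediate deepest structure DS'.

SS = ["who_i", "did", "you", "see", "t_i"]
movements = [("who_i", "t_i")]        # recorded wh-movement: filler ... trace

def compute_deepest(ss, movements, budget):
    """Undo at most `budget` recorded movements; return the deepest level reached."""
    structure = list(ss)
    for filler, trace in movements[:budget]:
        i, j = structure.index(filler), structure.index(trace)
        structure[j] = "who?"         # restore the wh-phrase to its base position
        del structure[i]
    return structure

print(compute_deepest(SS, movements, budget=0))  # ['who_i', 'did', 'you', 'see', 't_i']
print(compute_deepest(SS, movements, budget=1))  # ['did', 'you', 'see', 'who?'] (toward adult DS)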
5.3 Levels of Representation and Learnability
While shallowness of analysis in this case need not be considered a property of the grammar per se (but may rather be of the computational-acquisition device as it computes a representation), it does provide a unique clue into the structure of the grammar. Namely, a prediction is made: insofar as the analysis is shallow (i.e. extends backwards from SS to DS′), the set of grammatical functions associated with the not-present levels (DS to DS′) would be expected to not be present as well. Suppose that a particular grammatical module (e.g. part of the Binding Theory) applies at DS. Given that DS is not available, in a particular analysis of a string, to the child, the part of the Binding Theory which was stated over DS would also be expected to be not present. Thus the set of structures which underwent some rule at DS (being marked for coreference or obligatory disjoint reference) would be expected to be treated differently in the child's grammar than in the adult's grammar. This is shown in (12) below:
(12) DS — the set of operations or rules which apply at DS
     DS′ — the deepest level reached by the shallow analysis
     SS — the "anchor" of the child's analysis (PF, LF)
We may say that the child's analysis is anchored at a particular level: SS, or the surface. (For convenience, I will henceforth simply use SS as the anchoring level rather than the surface. No theoretical point is intended thereby, and I use it for convenience since the properties of that level, but not of the surface, are relatively well-explored. The contrast is with D-structure-anchored representations.) Over time the derivation fills out backwards, and the analysis becomes less shallow. At any particular time, however, the analysis is shallow, not encompassing the adult DS. This, however, has a consequence. The set of grammatical functions associated with the adult DS will not be present in the child's grammar. Borrowing, and changing, terminology from Williams (1986), the set of grammatical functions associated with DS would be unavailable: this is the abrogation of DS functions.
(13) DS — functions unavailable
     DS′ — deepest level computed
     SS — anchor of the analysis
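The prediction can be stated almost mechanically. In the following sketch (again my own, purely illustrative; the module-to-level assignments are placeholders, not claims about where these modules actually apply), any module earmarked for a level deeper than the deepest computed level is abrogated.

# Sketch of the abrogation of DS functions in (12)-(13). The assignments below
# are hypothetical stand-ins; the point is only the mechanism.

LEVELS = ["DS", "DS'", "SS"]                 # ordered from deepest level to anchor

module_applies_at = {
    "theta-marking":       "DS",             # hypothetical assignment
    "binding-condition-C": "DS",             # hypothetical assignment
    "case-checking":       "SS",
}

def available_modules(deepest_computed):
    """A module is available only if the level it is stated over is reached."""
    reached = LEVELS[LEVELS.index(deepest_computed):]
    return {m for m, lvl in module_applies_at.items() if lvl in reached}

print(available_modules("DS'"))   # {'case-checking'}: the DS functions are abrogated
print(available_modules("DS"))    # all three modules: the adult case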
To say that SS is the anchor of the analysis is to say that the computation proceeds backwards from that level, at least in part. This fact may then be used in a positive way by the linguist: to determine the structure of levels or the organization of rules in the grammar. Insofar as particular grammatical functions can be shown not to be available to the child (e.g. some aspects of Binding Theory), they are earmarked as belonging to the missing levels: i.e. in the domain DS–DS′. The shallowness of analysis would thus give us insight into the structure of the grammar, and into where therein particular operations apply. There is a second property of this type of analysis which is worthy of note. I have suggested somewhat tentatively that this aspect of shallowness of analysis (with the anchoring at SS) may not be general, but rather associated with a particular process: comprehension. What about the child as speaker in the speaker/hearer duality? Here I will mention a possibility that will remain quite speculative. Let us suppose that the child as speaker again adopts (computes) a shallow analysis. However, this analysis is shallow in the opposite direction: anchored at DS, but shallow with respect to SS. The result would be the following:
(14) Representation 1: Anchored at DS (production)
     DS — production anchored at this level
     SS′ — shallow S-structure computed
     (SS) — corresponding adult S-structure
     (PF, LF)
Representation 2: Anchored at SS (comprehension)
     DS — corresponding adult D-structure
     DS′ — computed structure
     SS — analysis anchored at this level
     (PF, LF)
Such a system would then give the following relation of the computational system to the structure of the grammar, where comprehension and production are viewed with a suitable degree of abstractness. (15)
i. Comprehension: shallow in the upwards direction, anchored at SS; functions uniquely associated with DS not available.
ii. Production: shallow in the downwards direction, anchored at DS; functions uniquely associated with SS not available.
While the grammar is of the Chomsky-Lasnik type, the anchorings for comprehension and production would be distinct. Let us define a grammar/computational system as equipollent if it has the following property. (16)
A (Grammar, Comput. System) is equipollent if it is anchored at all levels for all operations (comprehension/production).
(17)
The adult grammar, but not the developing child’s, is equipollent.
(16) and (17) together give us a characterization of a developing grammar. Moreover, this characterization is not only available to us as linguists, but to the child him/herself. This suggests a solution — a general solution — to the problem of overgeneration and negative evidence discussed in detail in Pinker (1984). Recall that Pinker faced a serious problem with respect to the intermediate grammars that the child adopted. Namely, the child, at early stages, produces sentences like the following: (18)
Me give ball Mommy ("I gave the ball to Mommy")
I walk the table ("I am walking on the table")
In Chapters 2 and 4, I suggested a particular solution to the ungrammaticality of these utterances: namely, that they are not ungrammatical at all, but rather
correspond to subrepresentations in the adult grammar: the theta representations. Let us consider a second possibility here, which will appear incompatible with the other approach. (In the genesis of this thesis, I considered the approach here first, and later abandoned it in favor of the approach in Chapters 2 and 4; I will attempt to synthesize them here). This approach originates more directly out of an attempt to come to grips with certain problems in Pinker’s approach. It depends crucially on the notion of anchoring at a level. A natural assumption given Pinker’s approach is that the sentences in (18) would correspond to the following phrase markers (note that this would not be the representation given in Chapters 2 and 4, where they would be part of the sub-phrasal syntax). (19)
a. [S [NP Me] [VP [V give] [NP ball] [NP Mommy]]]
b. [S [NP I] [VP [V walk] [NP table]]]
But while this assumption is natural, it leads, as Pinker notes, to a serious difficulty. If we assume that lexical heads have subcategorization frames corresponding to these phrase markers, these would have to be the following. (20)
a. give: ___ NP(theme) NP(goal)
b. walk: ___ NP(location)
The problem is that these subcategorization frames are impossible for the adult grammar. That is, the goal for give must either be marked with to or must precede the theme; the locative object of walk is the object of an on-preposition (in this usage). How does the child then get rid of the erroneous subcategorization information? (Note that this problem, a delearnability problem, does not arise in the representation in Chapters 2 and 4, because the representations would be taken to be accurate, though in a subgrammar: let us put aside this solution for now.) This is the problem of negative evidence, in its strongest form. It might at first be thought that a uniqueness principle would be appropriate. For example, if it could be argued that every lexical item has a unique deployment of thematic relations and category types, then the later acquisition of a subcategorization frame such as the following would knock out the subcategorization frame in (20a). (21)
give: ___ NP(theme) PP(goal)
While it can indeed be argued for the subclass of "give" verbs that only one DS deployment exists — NP(theme) PP(goal), with a possible movement operation producing the double object form (see Stowell 1981; Baker 1985, for a theory along these lines) — and thus while a uniqueness principle may in fact be used for this set of examples to exclude the erroneous entry, this cannot in general be the case. The spray/load class of verbs, for example, allows two realizations of its objects. (22)
a. spray the wall with paint
   spray: ___ NP(loc) (with NP(inst))
b. spray paint on the wall
   spray: ___ NP(inst) (on NP(loc))
So the existence of two lexical entries per se, for a given verb, cannot in general help the child in excluding initial erroneous entries. However, this leaves the question of how the child eliminates the offending entries in (18)–(20) from the grammar. In an important contribution, Pinker (1984) adopts one possible solution. Namely, he suggests that until the final grammar is set, lexical entries (and phrase structure rules) are given provisional status, by the device of “orphaned nodes”. This means that the phrase structure rule, and the corresponding subcategorization frame, that the child uses is marked with a question mark to indicate its provisional status; the phrase marker itself contains an “orphan” and a possible mother node (or set of such nodes). The lexical entries in the grammar corresponding to the PS expansions in (19) would then be:
(23) a. ?give: ___ NP(theme) NP(goal)
     b. ?walk: ___ NP(location)
Since these entries are not given full-fledged status in the grammar, the problem of the lack of evidence to eliminate the entries in (23) does not arise. How, then, is the correct grammar reached? According to Pinker, the intermediate entries are assigned some provisional probability of occurring. The actual sanctioning of a lexical entry is not all-or-none, but proceeds with respect to a learning curve. In the long run, not enough evidence arrives from a late enough stage to allow the erroneous entry to be permanently listed. While the above solution is interesting, it has a peculiar property. The entire grammar is "up for grabs" at every intermediate point, and there is, in addition, no definite way of knowing for certain when the end point has been reached — i.e. when the "question mark" has been erased. This means that the entire grammar has, at every intermediate point, a rather provisional status. This character might be argued to be not a failing, but a virtue: that this is precisely what occurs in a learning system. That is, the all-or-none idealizations of linguistics are false precisely on this point, and the notion of a learning curve, and a learning system, must explicitly allow for the notion of a "question-marked" entry, where the question mark ultimately fades into oblivion. However, there is one good reason to suppose that this solution is less than optimal. This is because a linguistic system is not simply a group of isolated facts, but has itself a deductive structure. Certain pieces of information must be used as the basis for determining other pieces of information. This means, in turn, that the pieces of information which are used as such a basis must be known with almost exact certainty; otherwise, the degree of uncertainty in the initial entry infects the rest of the grammar. In fact, to the degree to which more than one piece of information enters into a deduction, the certainty of the result decreases as the product of the certainties of the elements in the basis: two elements, with certainties of .8 and .7, give rise to a deduction of certainty only .56. What would this mean, in terms of the characterization of the abstract, learnable, deductive system? It would mean, I believe, the following. The ideal learnable deductive system should not so much have a normal distribution in terms of the certainty of the elements of the grammar therein at intermediate points, but rather have something closer to a bi-modal distribution. Certain elements would be known with almost exact certainty: say, probability .98 or
above. Certain other elements would be provisional in status: clustered at some markedly lower probability (say, .60), and marked as such. Given such a distribution, the function of the elements in the grammar would differ. Namely, the elements clustered around surety would act as the deductive basis for further pieces of information in the grammar. The elements clustered at the lower probability would not. Further, the elements at the lower probability would be checked for exactness by the application of the deductive structure inherent in the system to the elements which are known with relative certainty: that is, the “certain” elements would act in tandem to weed out the less certain. Further, in the course of development, elements would move from an unsure characterization as to accuracy to a quite sure characterization fairly rapidly. There does seem to be some evidence for such a distribution. Consider a standard description of learning, as it applies to the natural learning of some element of an articulated system such as language. It goes something like the following. The child, at the initial stage, uses a piece of information sporadically and often wrongly. This stage may last years. There appears then a stage in which the element is used over and over again, often incorrectly but with greater and greater frequency of appropriateness. This stage is comparatively very quick: often lasting only a month or two or three. Finally, the construction is mastered, and frequency of use again drops down. The intermediate stage is much shorter in duration than either of the two flanking stages. Labov and Labov (1976) note exactly such a process in the mastering of questions. For about fourteen months after Jessie’s first wh-questions, she showed only sporadic uses of this syntactic form, less than one a day. In the first three months of 1975 (3:4–3:8), there was a small but significant increase to two or three a day. Then there was a sudden increase to an average of 30 questions a day in April, May, and June, and another sudden jump to an average of 79 questions a day in July with a peak of 115 questions on July 16th … After the peak of mid-July, the average frequency fell off slowly over the next two months (to 4:0), then fell more sharply through October and December to a stable plateau of 14–18 a day for the next seven months.
This pattern would correspond exactly to one in which an element began as unlearned (i.e. with low subjective assignment of probability of accuracy: prior to 1975 here), to one in which it had (from October 1975 onward): in between, was the time of discovery, relatively short. Such descriptions are of course characteristic of most instances of learning, and are obvious from simple observation. Note that if this description is correct, the “all-or-none” characterization of learning in linguistic theory is in fact close to the truth. While some variance must at both points be allowed, what is crucial is that the distribution is
THE ABROGATION OF DS FUNCTIONS
199
bimodal, where the modes correspond to the long periods of time over which the element is viewed as “not learned” and “learned”, the two points of temporal stability. Furthermore, we may expect that the elements in the two groups differ in function: the elements which are known are the basis for further deduction, while the elements which are not known are not.
5.4 Equipollence Assuming the above as a general characterization of the system, there must be some way by which particular entries are marked as certain, while other entries are not. Rather than assuming that this is done simply quantitatively, and tagged onto the system, let us take it as a working assumption that this is done in the representational system itself. If this were the case, then we may have a reason for the necessity of the bimodality in certainty: a necessity linked to the representational system. The above-mentioned paradigm suggests a way in which this might be done. Suppose that the child saves two representations of a given lexical item or syntactic substring. One is anchored in the more “surfacy” level — i.e. S-structure or the surface. This representation extends to the other levels (DS, LF, etc.), but is anchored in a particular, surfacy level: say: S-structure. The other representation also extends to all levels, but is anchored at another level: say, DS or underlying representation. The child therefore has two representations, each with a full complement of levels (and thus fully formed), but with a different anchoring level. How does the child then know when his or her final grammar has been reached? Suppose that, rather than relying on a device such as questioning all intermediate entries, the child uses the notion of “equipollent” (equally anchored) defined above. In particular, the following holds: (24)
(25)
When the representation underlying a construction is equipollent (single representation anchored at all levels), the representation is final and correct. When the representation underlying a construction is not equipollent — i.e. consists of either two representations, or a single non-equipollent representation, it is provisional.
Learning, then, would be the process of converting representations in the form of (25) to the form in (24). Note that this faithfully represents the idea that there is a basic two way distinction in the form of the knowledge and that this is encoded
200
LANGUAGE ACQUISITION AND THE FORM OF THE GRAMMAR
in the representational system: either a piece of information is in the form (24), in which case it is “learned”, or it is in the form (25), in which case it is “not yet learned”. Further, the child knows, by the representational system itself, which element falls into which group: the non-learned elements have multiple, nonequipollent entries, while the learned elements have a single, equipollent entry. How might this work in practice? Suppose that we take the “overgenerations” noted earlier: (26)
a. b.
Me give book John (“I gave a book to John”) I walk table (“I am walking on the table”)
Rather than supposing a provisional lexical entry for these constructions, let us imagine a real one: (27)
a.
give: ___
b.
walk: ___
NP NP theme goal NP location
Clearly, however, this cannot be all that is said, or the entry could never be driven out. Let us seek help from the Projection Principle, which states (roughly) that representations from all syntactic levels are projections from the lexicon. This means, however, that we may view S-structure as a projection from the lexicon as well. Let us suppose that this is the case, and that the child’s grammar contains a representation of the lexical representation underlying S-structure, as well as that underlying DS: so much is implicit in the Projection Principle. Let us, however, put this together with the notion of anchoring. Suppose that the child has two representations, not one, of a single lexical item (in a single usage). One is anchored at DS, it is the one that underlies the sentences that the child produced in (26), and is to be found in (27a). (28)
give: ___
NP theme
NP (“Me give doll mommy”) goal
The second is anchored at S-structure, and consists of the actual heard form: (29)
give: ___ NP theme
PP goal
The child’s full representation, of a single lexical entry, therefore consists of the following subentries, both distributed throughout the grammar (recall that there are two entries, not one), but anchored at different points.
201
THE ABROGATION OF DS FUNCTIONS
(30) a. Subentry 1 (anchored at DS):
        DS: give: ___ NP(theme) NP(goal)
        SS: give: ___ NP(theme) NP(goal)
        (PF and LF entries same as DS and SS)
     b. Subentry 2 (anchored at SS):
        DS: give: ___ NP(theme) PP(goal)
        SS: give: ___ NP(theme) PP(goal)
        (PF and LF entries same as DS and SS)
Notice that this solution immediately explains one problem which is puzzling in Pinker's account. Namely, if the child's grammar only contains the first entry in (28) (with the question mark attached), how is the child able to comprehend sentences such as "John gave a ball to Mary"? That is, at the same time that the child is generating (in the nontechnical sense) erroneous forms, the appropriate representation must be available somehow to account for comprehension. The representation in (30) does so, and anchors it appropriately. More centrally, the outline of a way to handle the negative evidence problem can be seen. Recall the problem: the child is exhibiting constructions which appear to be ungrammatical from the point of view of the adult. Furthermore, the lexical subcategorization frames underlying them are overgenerated, but appear not to be eliminable by uniqueness principles alone. Rather than marking broad sections of the grammar provisional per se, two entries are listed in the grammar, anchored in different places. Since the representation is not equipollent, being neither single, nor anchored at all levels, the child knows that
the form, which he may be using, is not the final form that it is appropriate to have for the grammar, i.e. the child knows that “learning” must take place. This method has one great advantage. Aside from marking particular entries as provisional, it allows other entries to be unequivocally marked as final: i.e., complete and accurate. These are the entries which are equipollent. Thus, suppose that at a later stage, a different representation of give was internalized. (31)
give: ___ NP(theme) PP(goal)   (anchored in DS)
give: ___ NP(theme) NP(goal)   (anchored in SS)
At this stage, this part of the grammar would be equipollent, anchored both in DS and SS (we may ignore LF and PF for current purposes). More exactly, the syntactic structure corresponding to the projected lexical entry would be equipollent, and so therefore would the lexical entry. This would mean, however, that this part of the grammar could be considered by the child to be complete and true: namely, a full and complete lexical entry. This is important, because certain sections of the grammar must be known as certain, and not simply provisionally, in order for the child to use them to make other judgments. That is, in situations of partial information, it is important that certain of the pieces of information be known absolutely (or nearly absolutely) as true. It is these points which act as the basis for further inference. This presents a picture different from that in Pinker (1984). Rather than the grammar being provisional en masse, and gradually achieving certitude, particular parts of the grammar — namely, those which are equipollent — are known to be accurate. It is these which "exert their force" over the rest of the grammar from the point of view of inference. This has an advantage over that which Pinker assumes, because if an entry is marked as provisional it would not act as the basis for further inferences in the grammar, and hence would not infect the rest of the grammar. Consider now briefly how this sort of system may be accommodated to the approach sketched in Chapters 2 and 4. A single equipollent entry may be created from two nonequipollent entries in the following ways:
i) By retaining the D-structure anchored entry and removing the S-structure anchored entry from the grammar (allowing the D-structure entry to be anchored at all levels).
ii) By retaining the S-structure anchored entry and removing the D-structure anchored entry from the grammar (allowing the S-structure anchored entry to be anchored at all levels).
iii) By retaining both the D-structure and S-structure entries, and positing an operation which mediates between them.
The difference between the approach in Chapters 2 and 4 and that given directly above lies in which of these mechanisms is adopted. In the chapters above, the third mechanism (iii) is adopted, assuming that the initial representation ("me go Mommy") is actually retained in the final grammar, and the problem for the child is to mediate between this and the adult representation — which he/she does by means of the rule Project-α. In the assumption directly above, the D-structure anchored entries by the child are assumed to be simply false, and the child ultimately projects back the S-structure anchored entry, using mechanism (ii). The problem there was for the false entries to be eliminated, without the entire grammar falling into disrepute (by the means of questioning entries). This was done by means of allowing two entries by the child, but not allowing them to be equipollent. Let us assume henceforth that the approach in Chapters 2 and 4 is correct, rather than the one outlined directly above. In this case, it will still be necessary to allow for two entries, one anchored at DS (or, more exactly, thematic structure), and another anchored at SS. Thus the crucial notion of anchoring still holds. Further, it will be necessary to mediate between these two representations. This may be done by the following rule:
(32) a. Entry merger: If α is the entry for a word anchored at DS, and β is an entry anchored at SS, and there is some operation γ existing in UG mediating between α and β, then Merge (α, β) with the operation γ.
     b. If (a) is impossible, choose α or β as the anchored entry and create an equipollent entry from it.
This creates a single equipollent entry, or allows a transformation to mediate between the two known levels. A number of questions at the theoretical level remain; for the remainder of this chapter, however, I will simply concentrate on empirical evidence supporting the notion of anchoring.
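The logic of (24)–(25) and of Entry merger in (32) can be summarized in a brief sketch. This is an illustration of my own devising: the class, the frame notation, and the UG-operation lookup are invented stand-ins, not proposals about the actual inventory of operations.

# Illustrative rendering of equipollence (24)-(25) and Entry Merger (32).
# All names are invented; this is a sketch, not the author's formalism.

from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    frame: str    # e.g. "give: __ NP(theme) PP(goal)"
    anchor: str   # "DS", "SS", or "all"

def equipollent(entries):
    """(24): a single entry anchored at all levels is final and correct."""
    return len(entries) == 1 and entries[0].anchor == "all"

GIVE_DS = Entry("give: __ NP(theme) NP(goal)", "DS")   # underlies "Me give doll Mommy"
GIVE_SS = Entry("give: __ NP(theme) PP(goal)", "SS")   # the heard form, as in (29)

def ug_mediating_operation(a, b):
    """Stand-in for the search for a UG operation relating the two frames."""
    if {a.frame, b.frame} == {GIVE_DS.frame, GIVE_SS.frame}:
        return "Move-NP"      # assumed here: dative shift relates the two frames
    return None

def entry_merger(a, b):
    """(32): merge via a mediating operation if one exists; else choose one entry."""
    op = ug_mediating_operation(a, b)
    if op is not None:                                       # (32a)
        return [Entry(a.frame + " <" + op + "> " + b.frame, "all")]
    return [Entry(b.frame, "all")]                           # (32b)

entries = [GIVE_DS, GIVE_SS]
print(equipollent(entries))              # False: two entries, provisional, per (25)
entries = entry_merger(GIVE_DS, GIVE_SS)
print(equipollent(entries))              # True: a single equipollent entry, per (24)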
5.5 Case Study I: Tavakolian's Results and the Early Nature of Control
In the preceding section I suggested that a general solution to the problem posed by negative evidence was to be found with the concept of equipollence. In the rest of the chapter, I would like to investigate a particular instantiation of the notion of "shallowness of a derivation" — in particular, with respect to the anchoring in S-structure, with the derivation shallow with respect to DS. That is, the situation in (33) obtains:
(33) Shallow derivation:
     DS
     DS′
     SS
This would mean that the particular grammatical functions associated with the "real" DS would be unavailable to the child at the point at which DS′ is the deepest computed level. Such a general prospect would have three effects. First, to the extent to which grammatical functions associated with DS are screened out in early derivations (which only proceed back to DS′), there is evidence that the grammar really is leveled, rather than DS being simply an aspect of SS. Second, the indeterminacy as to where particular operations apply in the adult grammar may be solved — or at least moved toward a solution — by looking at the child grammar. To the extent to which development is truly organized as this model suggests, and not simply helter-skelter, the actual content of DS, and the principles which apply there, may be determined by noting which operations fail to apply at the stage at which a shallow derivation is computed. Finally, the notion of "development" is given real status in the grammar, both in terms of the structure of levels in the grammar, and in the development of particular constructions from non-equipollence to equipollence. I discuss now three phenomena indicating the presence of a "shallow" derivation: Susan Tavakolian's (1978) data concerning control into sentential subjects, Guy Carden's (1986b) thoughtful analysis of Condition C effects in dislocated constituents, and Roeper, Akiyama, Mallis and Rooth's (1986) paper concerning wh-movement, Strong Crossover, and quantificational binding.
5.5.1 Tavakolian's Results
Susan Tavakolian, in an extremely interesting set of experiments (Tavakolian 1978), has investigated aspects of the acquisition of sentences with clausal subjects. In particular, she tested children's competence on infinitival clausal complements, including those both with and without a pronominal subject — the latter being control structures. It is these control structures, often called instances of nonobligatory control, after Williams (1980), which are of interest here. Examples of the control structures tested are the following. (34)
a. To stand on the rabbit would make the duck happy.
b. To bump into the pig would make the sheep sad.
c. To walk around the pig would make the duck glad.
d. To kiss the lion would make the duck happy.
e. To hit the duck would make the horse sad.
f. To jump over the duck would make the rabbit happy.
Control properties of structures with subjectless infinitival complements (i.e. apparently subjectless) are extremely various. In particular, the following sorts of control seem to be possible. (35)
a. "Arbitrary" control, where the PRO refers to an arbitrary unspecified element, similar (in English) to the meaning of "one".
b. Control from the object or subject of the predicate of which the controlled clause is the complement.
c. Control from a controller up the tree, so-called long-distance control.
d. Control or interpretation by a discourse or extra-sentential referent, definite in reference.
e. Control by the prepositional object of a restricted class of predicates, mostly psychological predicates, into a sentential subject.
Additional refinements are possible: for example, between thematic and pragmatic control (Nishigauchi 1984), and between control as it takes place in the want vs. try class (Rosenbaum 1967). These are irrelevant for what follows. Examples of the type of control in (35) are given below. (36)
a. To know oneself is difficult. (Arbitrary Control)
b. John persuaded Mary to leave. (Control by Matrix Object)
c. Bill said that shaving himself was a drag. (Long Distance Control)
d. Have you seen Bill? Shooting himself in the foot must really have hurt. (Discourse Control)
e. To know himself is difficult for Bill. (Control by Object into Sentential Subject)
The theoretical problem is to try to get a unified account out of this apparent diversity. See Williams (1980), Chomsky (1981), Manzini (1983), and Clark (1986) for further discussion. This variance notwithstanding — a topic which will be discussed shortly — it appears that in the Tavakolian sentences, given in (34), and in a large class of like constructions, if an object is present (including a for-object), it must control the PRO subject. Thus while (37a) is perfect with the lion as controller, it is considerably worse with an arbitrary "one" reading (37b), and (I believe) impossible with control from outside the clause, by a discourse referent (37c). (37)
a. PRO to kiss the duck would make the lion happy.
   (the lion kisses the duck)
b. ??PRO to kiss oneself would make the lion happy.
   (test for the "one" interpretation by the (in-)ability to take the reflexive object oneself)
c. Did you see the pigs? PRO to kiss the duck would make the cow happy.
   (impossible under the interpretation in which the pig is kissing the duck)
The same appears to be the case with all of Tavakolian's examples, which are identical to (37a) in form, except for the choice of NPs, representing different animals. Discourse control, either by an explicitly mentioned referent (37c) or by a pragmatically accessible entity, is impossible for adult speakers, for this class of examples. However, the situation with children is different. Children consistently and systematically allow a discourse referent, one which was not even mentioned in a previous sentence but is pragmatically available in the set of animals with which the child was told to act out each sentence. Tavakolian's results were the following (Tavakolian 1978: 187):

Table 1. Distribution of Responses to Sentential Subjects with Missing Complement Subjects ("To kiss the lion would make the duck happy")

Age          Matrix NP   Extrasentential NP   Other
3.0–3.6          7               12             5
4.0–4.6          8               13             3
5.0–6.6          3               19             2
Total           18               44            10
Percentage     25%              61%           14%
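The percentages in the bottom row follow directly from the raw counts; the following short check (mine, included only to make the arithmetic explicit) reproduces them.

# Recomputing the Table 1 percentages from the raw response counts.
matrix, extra, other = 7 + 8 + 3, 12 + 13 + 19, 5 + 3 + 2     # 18, 44, 10
total = matrix + extra + other                                # 72 responses
for label, n in [("Matrix NP", matrix), ("Extrasentential NP", extra), ("Other", other)]:
    print(f"{label}: {n}/{total} = {100 * n / total:.0f}%")   # 25%, 61%, 14%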
Thus 61% of the children allowed an extrasentential referent for PRO. For the sentence given, this might be, for example, a pig which was included in the set of farm animals. This percentage did not significantly change over the ages under investigation, though it was, in fact, slightly higher for 5-year-olds than for 3-year-olds. This is clearly at variance with the adult response. Moreover, unlike other somewhat similar cases, it is not liable to the methodological strictures of Lasnik and Crain (1985) and Crain and McKee (1985), who note correctly that with respect to backwards anaphora with overt pronouns, the predisposition for children to allow an extrasentential reading (Solan 1983) does not make a grammatical point. The authors above note that for sentences such as the following: (38)
After he left, John went to the store.
Children tend to either take he as an extrasentential referent, or, if they allow coreference, to transpose the name and the pronoun in the following manner: (39)
After John left, he went to the store.
But as Lasnik and Crain note, the tendency for coreference in (39) is not a grammatical phenomenon in any case (but purely pragmatic), and, even more crucially, if the child is able to transpose the pronoun and the name in (39) in a repetition task, this means that the two elements in (38) in the comprehension part of the task must have been co-indexed: i.e. known to be coreferent. So the bias, on this type of backwards anaphora, must not be part of the grammar. The situation with the Tavakolian sentences is quite different. First, the necessity for object coreference in sentences such as (40) seems to be a matter of the grammar, not simply pragmatics. Adults do not allow extrasentential referents for the control clause in examples such as the following: (40)
To kiss the duck would make the lion happy.
Second, when children are asked to repeat these examples, they may supply a pronoun for the PRO (Tavakolian 1978), but that pronoun is not taken as referring to the matrix object, but to an extrasentential referent. Assuming that they are paraphrasing their "semantic representation" in some sense, this means that in that representation, and thus in the output to the comprehension task, the missing subject in the control clause is marked for extrasentential reference. Thus the phenomenon appears to be grammatical in character.

5.5.2 Two Solutions
There are two possible avenues to take in attempting to isolate the aspect of the
child's grammar here which is different from an adult's grammar. The difference might fall in the control rule itself — or, more exactly, in how control interacts with levels of representation — or it might be traced to a difference in the categorization of the null element. If the PRO, for example, were interpreted as a simple pronoun in this stage of development, then one might expect free extra-sentential reference. This is, in fact, the explanation that Tavakolian herself offers: that PRO is interpreted as a simple pronoun. A more sophisticated version of this same idea might be the following. All null categories are, at some stage, neutralized along some dimension. Thus if we assume that null categories, in their feature sets of +/−anaphor, +/−pronominal (Chomsky 1981), are stored in a paradigm and the paradigm itself must be learned, then there might be some intermediate stage in the articulation of the paradigm which would predict extrasentential reference. For a recent version of this sort of theory, see Drozd (1987, 1994). Of course, any theory which posits a change in the categorization of PRO, either in the simple version (namely, that pro → PRO) or in the more complex version (that the null category paradigm is learned, and neutralizations exist in it in early stages), must give some account of how the final setting appears from the initial setting. The second possibility is that there is some change in the control rule over time: more exactly, in the interaction of control with levels of representation. Suppose that, for the constructions under investigation, PRO is its normal adult category (e.g. +anaphor, +pronominal), with standard characteristics. Suppose that, for some reason, the Control rule coindexing the PRO with an antecedent does not apply in the usual manner. An uncontrolled (i.e. unindexed) PRO, we may suppose, is able to pick up a general referent from discourse, or is taken to be arb. In this way, the same set of data could arise. In the next two subsections, I will further discuss these two possibilities, coming to the conclusion that of the two possible "failings" — a failure in category typing, or in the control rule as it interacts with levels of representation — the latter is the more likely, though with some real possibility of a mediated view. Finally, I will attempt to justify the difference in the control rule in terms of the general structure of the grammar.

5.5.3 PRO as Pro, or as a Neutralized Element
One solution to the puzzle that external reference provides for the analysis of acquisition is simply to assume that PRO has different referential properties in the child’s grammar, and thus, in GB theory, that it has a different feature composition in this stage of development than it does in the adult’s grammar.
The propensity of children to take an external referent in cases like (40), repeated below, would follow if PRO were simply being interpreted as little pro. (41)
PRO to kiss the duck would make the lion happy. (External reference: 61%)
There is some interesting empirical evidence from Tavakolian which supports this view. First, as she notes, the propensity for external control in these structures is very similar to that for similar constructions with overt pronouns. Examples like (42) are given an external referent reading about 55% of the time (Tavakolian 1978). (42)
For him to kiss the duck would make the lion happy.
This similarity in response would be explained if PRO were simply being interpreted as a pronominal. Such a misconstrual would be similar to the sort of theory that Hyams (1985, 1986, 1987) proposes with respect to the acquisition of null subjects: that they are misconstrued as little pro. In spite of the simplicity of the proposal, there is empirical data which severely undercuts it. This data is of one basic type: instances in which the relevant null category seems to be acting like PRO, not pro, in the child's grammar. To the extent to which these cases are convincing — and they appear to be — there is no way that we can assume a general identification of PRO with pro in the child's grammar: that is, it is not the case that PRO is simply misconstrued as pro by the child. The clearest empirical counterevidence to this hypothesis is to be found in Goodluck's (1978) thesis. First, there is a class of control structures studied by Goodluck, but not by Tavakolian, which very clearly operated as if they contained a controlled PRO. These were purposives and rationale clauses, and temporal adjuncts. In the case of in order to clauses, children show a clear c-command constraint on the choice of the controller.

Table 2. Control in Purposives (Goodluck): Percentage Subject Control

                                        4 years    5 years
In sentences with Direct Object NPs      56.7%      63.4%
In sentences with locative PPs           90.1%      90.1%
These are sentences like the following: (43)
a. Daisy hits Pluto PRO to put on the watch.
   (56.7% subject control, age 4; 63.4% subject control, age 5)
b. Daisy stands near Pluto PRO to do a somersault.
   (90.1% subject control, age 4; 90.1% subject control, age 5)
The correct choice for adults for these constructions would of course be subject control, for both instances. What is interesting here is that even though subject control is not uniform for the children here, it is obligatory (or nearly so) for the constructions involving a locative: those in (43b). This may be explained simply if, for the child, an NP inside a locative PP cannot control PRO. This c-command constraint would not be expected with little pro, which is allowed free coreference, like any pronoun. If we use Tavakolian's paraphrase test, we see in addition that an overt pronoun allows reference to either the subject or the locative NP.
(44)
Daisy stands near Pluto for him to do a somersault.
If anything, the more pragmatically biased reading in (44) would be that in which Pluto is the antecedent of him. The fact that the child does not allow coreference with Pluto when the subject of the complement clause is nonovert suggests that it is not acting as little pro, which should allow Pluto as antecedent, just as him does. The second body of data weighing against the general interpretation of PRO as pro comes again from Goodluck, and involves temporal adjuncts. These do not allow extrasentential reference, but rather must refer back to the main clause subject.
(45)
Daisy hit Pluto after putting on the watch.
The percentage of subject coreference for these was the following (Goodluck does not provide the degree of extra-sentential reference):

Table 3. Subject Control with Temporal Adjuncts

Age    Percentage Subject Control
4              66.7%
5              63.4%
This data is again very strongly in contrast with the original data of control into sentential subjects, where extrasentential control was the usual case (“PRO to
kiss the duck would make the lion happy”, Controller: extrasentential). The uniform assumption that PRO was simply misconstrued as pro in this stage of the child’s grammar would predict uniform results in the two cases: this is not the case. Again, supplying an overt pronoun, which presumably should operate similarly to pro, easily allows object coreference, to the extent to which the construction is grammatical at all. (46)
Daisy hit Pluto after him putting on the watch. (Only semi-grammatical, but either main clause NP may be coreferent)
All this suggests that the simple misconstrual hypothesis cannot be maintained: while one might suppose on the basis of Tavakolian's initial results that PRO in sentential subjects was being interpreted simply as pro, as a result of a general tendency for the null element to take such an interpretation, this solution would "overgenerate" pro in positions in which something much closer to the adult control rule was operative. These are in purposives and also in temporal adjuncts. So it cannot be the case that in early grammars PRO is simply uniformly interpreted as pro. There is a second possibility that we might consider given the basic miscategorization hypothesis. Namely, that the paradigm of null categories is underdifferentiated in initial stages, so that the relevant null category is neither pro nor PRO, but something antedating either. This might be done, for example, by supposing that only the +/−pronominal feature and not the +/−anaphoric feature was operative in initial stages. The full and reduced paradigms would look like the following.

(47)  Full Paradigm (Adult)

                  +Pronominal    −Pronominal
     +Anaphor          e              e
     −Anaphor          e              e

(48)  ?Reduced Paradigm (Child)?

                  +Pronominal    −Pronominal
     +/−Anaphor        e              e
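To make the miscategorization hypothesis concrete, the two paradigms may be thought of as lookup tables over feature bundles. The following sketch is my own illustration (the encoding and names are not from the original analysis); it records (47) and (48), with the child's table simply lacking the anaphoric dimension:

    # The adult paradigm (47): null categories individuated by two binary features.
    ADULT = {
        ("+pronominal", "+anaphor"): "PRO",
        ("+pronominal", "-anaphor"): "pro",
        ("-pronominal", "+anaphor"): "NP-trace",
        ("-pronominal", "-anaphor"): "wh-trace",
    }

    # The hypothesized child paradigm (48): the +/-anaphor dimension is not yet
    # articulated, so each cell collapses two adult categories.
    CHILD = {
        "+pronominal": "PRO/pro hybrid",
        "-pronominal": "NP-trace/wh-trace hybrid",
    }

    print(ADULT[("+pronominal", "+anaphor")])  # PRO
    print(CHILD["+pronominal"])                # PRO/pro hybrid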
In the adult paradigm, the null category will be interpreted as one of the four major types depending on the slot that the element fills in the paradigm: e in +pronominal, +anaphor = PRO; e in +pronominal, −anaphor = pro; e in −pronominal, +anaphor = NP-trace; e in −pronominal, −anaphor = wh-trace. Suppose that the paradigm is neutralized along the anaphoric dimension: the anaphoric feature had not been "discovered" yet by the child. The result would be a collapsed paradigm in which the anaphoric feature played no role: the +pronominal null category would be a hybrid between PRO and pro; the −pronominal null category would be a hybrid between NP-trace and wh-trace. (Needless to say, this is not the only sort of neutralized paradigm for null categories that might be imagined: I choose this as illustrative.) The general properties of such a paradigm collapse in acquisition (more exactly, underdifferentiation: Pinker and Lebeaux 1982; Pinker 1984) are quite interesting, and bear further investigation: see Drozd (1987, 1994), for further discussion. Nonetheless, there is reason to believe that for the particular set of data under discussion, even this sophisticated version of the misinterpretation of the early null category is insufficient. The reason is the following. The logical difficulty with the hypothesis that PRO is, in early grammars, uniformly interpreted as pro (i.e. a simple, though null, pronoun) resided in the fact that different constructions operated differently. Control structures of the type directly studied by Tavakolian, those involving control of an object into a sentential subject, did indeed allow the null category in the subject of that control clause to freely choose extrasentential referents. One way of accounting for this would be to suppose that it was operating as a null pronominal, and that this followed from some deficiency in the interpretation of PRO. However, this "deficiency" is hardly viable, given that other instances of PRO operate in a standard way, namely as an obligatorily controlled element. That is, the difficulty cannot reside simply in the misinterpretation of PRO, since this difficulty would then be expected to be general: but it is not. The same logic would rule out the undifferentiated paradigm explanation, at least if it is considered alone to be the root cause. The underdifferentiated paradigm would allow us to posit a new null category, not yet differentiated between PRO and pro. But whatever this set of properties were, we would expect them to be consistent, just as the properties of PRO and pro are. However, this is precisely what we do not find: sometimes the null category is acting like a small free pronominal, and sometimes as controlled PRO. That is, while the underdifferentiated paradigm idea would introduce a difference between the adult grammar and the child's in the interpretation of the element, it does not seem to be the right sort of difference: what is needed is a difference which will allow the null element in the sentential subject construction to have different properties from the adult PRO, but the null category in (say) temporal adjuncts not to have such different properties. An underdifferentiated paradigm, by itself,
would not capture this difference (but see Drozd 1987, 1994, for a somewhat different point of view).

5.5.4 The Control Rule, Syntactic Considerations: The Question of C-command
Given the failure of the theory that initial PRO is analyzed as pro (or as a neutralized category), we are driven to look elsewhere for an answer. The questions appear to be: (49)
a. Why are certain control complements (purposives, temporals) behaving differently than others (control into sentential subjects) for children?
b. Why are children treating clausal subject complements differently than adults, with respect to control?
Underlying this, we might wish for a unified theory of control, at least at some level. One possibility for the difference in (49a) is that this difference is associated with the distinction between OC (Obligatory Control) and NOC (Nonobligatory Control), in the sense of Williams (1980). Goodluck (1978) makes such a suggestion. This might appear to be on the right track, yet recent work (Lebeaux 1984, 1984–1985; Sportiche 1983) suggests that the distinction between the two sorts of control, while existent, is not of the primitive sort posited by Williams. If the analysis of arbitrary control of Lebeaux (1984) and Epstein (1984) is correct, then so-called PROarb is not an unbound element (i.e., a free variable), but rather operator-bound. Evidence for this is found in double binding constructions (Lebeaux 1984).
(50)
a. PRO to know him is PRO to love him.
b. PRO to get a nice apartment requires PRO getting a higher paying job.
   (*PRO to get a nice apartment requires PRO getting trustworthy tenants.)
c. PRO to become a movie star involves PRO becoming well-known.
   (*PRO to become a movie star involves PRO recognizing you.)
In such constructions, each PRO is arbitrary in reference, but the two PROs must be linked in reference. Thus (50a) means that for some arbitrary person to know him, that same arbitrary person will love him. (50b) means that for some person, x, to get a nice apartment, that same person x, must get a higher paying job. The logical representation of the structures in (50) is therefore the following.
(51) a. Ox ((PROx to know him) is (PROx to love him))
     b. Ox ((PROx to get a nice apartment) requires (PROx getting a higher paying job))
     c. Ox ((PROx to become a movie star) involves (PROx becoming well-known))
Further, the operator binding must take place quite locally since the double binding effect disappears when one of the open sentences is further embedded.
(52) PRO being from the Old World means that stories about PRO winning the West are unlikely to be thrilling.
(53) PRO being from the Old World means PRO hearing stories about PRO winning the West.
In (52), the two arbitrary PROs are unlinked: since the latter is embedded in an NP, it is not “close enough” to the other arbitrary PRO so that they are bound by the same operator. In (53) the first two PROs are close enough, and they are linked in reference (bound by the same operator); the third arbitrary PRO is unlinked, being embedded in an additional NP. In earlier work (Lebeaux 1984), I suggested that: i) PRO (including arbitrary PRO) must always be bound (to account for the above facts), ii) that the binding is local (to account for the above facts and the crossing effects), and iii) that the binding element was a universal quantifier in an operator position. I wish to retain the first two of these assumptions, and may do so via the following specification: (54)
PRO must be bound within the minimal maximal projection (NP or S) dominating the controlled S′, where the controlled S′ is the S′ most immediately dominating PRO.
With respect to the third assumption, I will change that here in the following way. Namely, the binding element is not a universal quantifier, but rather a simple abstractor. This abstraction, however, does not take place locally, i.e. within the predicate, but at the S′ or S″ level (compare to Chierchia 1984). This element will continue to be represented with O, simply meaning a null category (assuming that null categories in particular positions may have operator status). Second, in part in response to considerations raised in Browning (1987), I will no longer assume the binder to be in Comp, but in a topic position, or some (quasi-)theta position peripheral to S. The reason for this will appear below. The idea that arbitrary PRO and long distance control PRO are in fact operator bound (see the above mentioned work for additional discussion of Long
Distance control PRO) means that another issue, which might initially be thought to be decided by incontrovertible evidence, is thrown into high relief. Namely, is c-command necessary for all instances of control? It would be well if it could be, since this would regularize it to other co-indexing operations in the grammar. Yet both Williams (1980) and Chomsky (1981) are driven to answer the question in the negative, Williams using the lack of c-command as a criterion for his classification of Non-Obligatory Control, while Chomsky correctly notes the existence of sentences such as this: (55)
PRO to learn math is necessary for John’s development.
Indeed, if John is the direct controller of PRO here, this would counterexemplify the need for c-command directly. With the possibility of a nonovert topic operator, however, PRO could get its reference from that, with the topic itself taking the reference of John from discourse, similar to the situation noted in Huang (1982) for Chinese. (56)
Oi ((PROi to learn math) is necessary for John’s development).
One argument against such an analysis is that there would be a Condition C violation with respect to the coindexed operator and John, which have the same referent. However, this violation is weak, in pseudo-topic type constructions in English. (57)
?As for John, to learn math is necessary for John's development.
Thus one would not expect Condition C to rule out sentences such as (56). There does seem to be some quite suggestive evidence for just such an analysis. It is of the following form: that there are widespread similarities in the patterns of grammaticality and ungrammaticality for sentences which allow (or do not allow) a bound reading for control, and the grammaticality and ungrammaticality of sentences with an overt as-for topic. Consider first the following sentences:
(58) a. As for Johni, this shows that hei is a liar.
     b. ?*As for Johni, this shows that Johni is a liar.
(59) a. As for Johni, this sort of thing is important for John'si development.
     b. ?*As for Johni, this sort of thing is important to John'si mother.
The contrast in (58) is expected: the Condition C violation, while weaker for these constructions than that usually found, is still present (it is considerably weaker when a name c-commands a name than when a pronoun c-commands a name, as Lasnik 1986 notes). The contrast in (59) is the really interesting one. When a name is part of a nominal like John's development, it causes a much
weaker Condition C violation than when it is part of a name like John's mother. From (59) it is not clear whether this is because the nominal is deverbal in the case of John's development, or because John's mother is an animate referent; for present purposes, this does not matter. Quite likely, the contrast in (59) is not due to Condition C at all, but to the aboutness relationship which must hold between the as-for topic and the rest of the sentence, which is (for some reason) easier to contrive in (59a) than in (59b). Consider now the corresponding cases with control. Corresponding to (58a) and (b) is the following contrast from Bresnan (1982). (My judgements; Bresnan finds a stronger contrast yet, OK vs. *.)
(60)
a. PROi contradicting himself demonstrates that hei is a liar. (Bresnan (51))
b. ??PROi contradicting himself demonstrates that Mr. Jonesi is a liar. (Bresnan (52))
If one assumed that there were some rule of control by which the PRO were coindexed with the element in the lower clause, or one assumed an arb-rewriting rule, this contrast would be inexplicable: such a rule would not be expected to be sensitive to the pronoun vs. name distinction. However, this contrast would be explained if there were a null topic in the cases in (60): in (60b), but not in (60a), a name would be c-commanded by another name.
(61)
a. As for Johni, this sort of thing is important for John'si development.
b. *As for Johni, this sort of thing is important to John'si mother.
The contrast in (61) parallels that in (59): the name inside a nominal like John's development causes a much weaker violation than one inside a name like John's mother, plausibly because of the aboutness relation rather than Condition C. Consider now the following set of sentences. Here, again, the apparent "control" contrast is strongly in parallel with the "aboutness" relation contrast.
(62)
a. As for John, this is important for John's development.
b. *As for John, this is important to John's mother.
c. PROi to learn math is important for John'si development.
d. *PROi to learn math is important to John'si mother.
The contrast between (62a) and (b) parallels that between (62c) and (d), though the latter is a case of control and the former is not. This is explained if we
assume that it is not the control rule which is sensitive to such predicates as part-of-a-possible-controller, but rather that the control rule, like other co-indexing rules, requires c-command by some minimally local element: in this case a null topic. The contrast between (62c) and (d) would then not be stated in the control rule itself, but such a contrast would be factored into the possible constraints on an aboutness rule, which is independently needed to explain the contrast in (62a) and (b), and a structure-dependent control rule. Theoretically, this allows the control rule, a coindexing rule, not to be sensitive to pragmatic information, and thus regularizes it to other coindexing rules in the grammar, to some degree. The same sort of parallel holds for cases of embedded object vs. embedded subject, as the following quadruplet shows:
(63)
a. ?As for Bill, this shows that Bill is really smart.
b. *As for Bill, this shows that Mary is right about Bill.
c. ?PROi winning the Nobel Prize shows that Billi is really smart.
d. *PROi winning the Nobel Prize shows that Mary was right about Billi.
The same comments apply. To summarize: by choosing a null topic analysis, the pragmatically sensitive features of control are, to some degree, teased out, and taken to be part of the aboutness relation between the null topic and the following clause. This specification is independently needed, as shown above. While this does not identify obligatory and nonobligatory control (differences exist between them — if Lebeaux 1984–1985 is correct, the former and not the latter is bound by predication; other differences are found in Koster 1984 and Franks and Hornstein 1990), this does regularize control, even so-called arbitrary control, to other c-commanding relations in the grammar. Directly relevant here: it would mean that c-command characterizes all cases of control. Pending further analysis, then, I will assume that the operator-type structure given in (56) is the correct analysis for sentences such as (55). We are still left with the problem of explaining the acquisition facts, and a major linguistic problem as well. In a large number of constructions with control into sentential subjects, it is not possible to take a discourse antecedent — as already noted with the Tavakolian sentences — and, further, in others, a long distance antecedent is not viable. Examples are given below.
(64)
a. *Did you see the pigi? PROi to kiss the duck would make the lion happy.
b. *Johni thinks that PROi to sleep more would be pleasing to his father.
c. *Billi said that PROi knowing himself was difficult for his wife.
This is in spite of the fact that in many cases an as-for topic would be sanctioned by the following context. Thus in the sentences in (65) and (66) the as-for topic would seem to be licensed by the possessor identical to it. Yet it is still not possible to have control occur.
(65) a. As for John, this would please his father.
     b. As for Mary, this made her mother angry.
(66) a. Do you know Johni? *PROi to succeed in business would please hisi father.
     b. *I've met Maryi, and PROi to have a mohawk haircut makes heri mother angry.
Therefore, in cases like (65) and (66), it is necessary to prevent the PRO from being controlled by a topic or a discourse element. We thus appear to have two sets of conflicting data. On the one hand, it appears that an operator-type analysis fulfills three functions: i) it explains the linked reading for arbitrary PROs, and the locality involved, ii) it explains the crossing effects for long-distance binding of PRO, and iii) it allows c-command to be maintained for examples such as (55). On the other hand, the possibility of an operator-type reading — which is associated with Long Distance control and discourse control — is strictly limited. In none of the examples in (64) is such a reading available. Let us first note again an observation made earlier: the class of control into subject constructions is associated with a restricted group of predicates. These are of differing types: i) psychological predicates (please, disgusts, excites, etc.), ii) tough-predicates (tough, easy, etc.), iii) predicates involving necessity (is necessary, requires, etc.), and iv) predicates involving causation (make, etc.). While these predicate-types differ from each other in a number of ways, the latter three types allow an expletive subject.
(67) It is tough (for John) to do that.
(68) It is necessary (for John) to do that.
(69) It makes Mary happy to do that.
Psychological predicates normally take an NP subject, but if the argument is clausal, this may also appear in an extraposed position.
(70) It pleases Mary {to do that / that Jeff is so handsome}.
It has recently been argued convincingly that psychological predicates have, in at least some of their uses, two internal arguments at DS (Belletti and Rizzi 1986; Johnson 1986). Accepting the basic position of Belletti/Rizzi and Johnson, this means that the D-structure of (71b) is (71a), and that the NP is moved into subject position.
(71)
a. e please John pictures of himself.
b. Pictures of himself please John.
(The authors above differ in their DS assignments: Belletti and Rizzi assuming that the s-structure subject starts off in the most internal position in the VP, while Johnson assumes that it originates in a 2nd NP position. For concreteness, I have used Johnson’s structure.) A piece of supporting evidence for the Belletti/Rizzi and Johnson analysis for English is the placement of arguments in nominals. In nominals, where case is not a consideration, both arguments appear in internal position. (72)
the pleasure of John in Mary’s company (Mary’s company pleases John)
The two-internal-arguments analysis allows a long-standing difficulty with the binding theory to be resolved. Given the standard analysis in which reflexivization requires c-command, structures such as (73) constitute a puzzle. (73)
a. Pictures of himself please John.
b. Each other's choice of friends baffle the two boys.
Given the DS posited by the theorists above, however, these structures are no longer a puzzle. At DS, c-command does hold, and one internal argument acts as an antecedent for the other. (74)
((e)NP (please (John)NP (pictures of himself)NP)VP)S
Suppose that we extend the Belletti and Rizzi analysis to the (somewhat more problematic) cases of control. The DS of (75a) would then be (75b); the DS of (76a) would be (76b).
(75) a. PRO to kiss the duck would please the lion.
     b. e would please the lion (PRO to kiss the duck).
(76) a. PRO to kiss the duck would make the lion happy.
     b. e would make the lion happy PRO to kiss the duck.
Further, assume that the small clause complement originates as part of a complex verb (Chomsky 1957, 1975 [1955]; Bach 1979).
(77)
e would make-happy the lion (PRO to kiss the duck).
The strict c-command of the PRO by the controller is in all cases preserved. Such an analysis, in conjunction with the operator-type analysis, allows us to retain strict c-command as a necessary condition for control. The near-obligatoriness of control by the object in these constructions is tied to the deep structure position of the control clause, before it is fronted. Suppose that control at DS is generally obligatory, modulo the data discussed above. Suppose that for some predicate it does not take place at DS. Then the clause is fronted to an S-initial position. Assuming that PRO always looks to its nearest c-commanding antecedent, and that this relation is local, null topic insertion will then take place if the PRO has escaped control by the DS object, closing off the sentence. In effect, the necessity of object control for most predicates when the clause is in subject position is a bleeding phenomenon, where the object coindexes with the controlled PRO. Only in such cases where it has escaped control is reference up the tree possible. This account traces both the possibility and near-obligatoriness of object control to the deep structure position of the clause as sister to its object. It is at this point that object control occurs, if it does. The solution is the following:
(78)
a. The controlled subject clause is an internal argument at DS.
b. The control of PRO is defined directly rather than using a derived notion of c-command.
c. External reference, or reference "up the tree", takes place after the clause has moved to its S-structure position, and only then.

5.5.5 The Abrogation of DS Functions
We are now in a position to trace the difference in the child's interpretation of sentential subjects to a very simple difference in the control rule, as it interacts with levels of representation. The difference is simply this: that for the child, but not for the adult, the control clause is always in the fronted position, both at S-structure and at the deepest computed level, D-structure′ (when I say "dislocated", above and throughout the chapter, I am purposely being vague as to whether I mean moved or base-generated in a dislocated position, unless otherwise specified). When the controlled clause originates in a VP-internal position at D-structure in the adult grammar, the following sequence of operations
apply in the adult grammar (see (78)): i) application of control, dependent on direct c-command, ii) movement of the control clause to the fronted position.
(79)
Adult grammar
a. DS (control applies, from the object to PRO):
   [S [NP e] [VP [V make happy] [NP the lion] [S′ PRO to kiss the duck]]]
b. Output of control:
   [S [NP e] [VP [V make happy] [NPi the lion] [S′ PROi to kiss the duck]]]
c. Fronting:
   [S′ [S′j PROi to kiss the duck] [S [NP e] [I′ would [VP [V make happy] [NPi the lion] [S′j e]]]]]
applies in the child grammar exactly as in the adult system, and requires direct c-command. PRO also has the same status in the child’s grammar as in the adult’s. The only difference is the following: the child’s initial representation is shallow: DS′ rather than DS. At DS′, the deepest level, the controlled clause already is in the fronted position. (80)
Child's representation (DS′ and SS):
[S′ [S′j PRO to kiss the duck] [S [I′ would [VP [V make happy] [NP the lion] [S′j e]]]]]
Given that the clause originates in the surface fronted position, it is never in the c-command domain of the object ("the lion"). Since control is stated over direct c-command relations, this means that control cannot apply to coindex PRO with the object. Instead, unindexed, it looks "up the tree", to the inserted topic, for an antecedent. This is precisely Tavakolian's result. (Since the Projection Principle and Theta theory must be satisfied, the fronted element binds a null category in the theta position. This null category, however, is not a trace in the derivational sense: the dislocated control clause never originated in that position.) In effect, the application of the control rule in the adult grammar bleeds the possibility of external reference. Since the clause does not originate in an internal position in the child's grammar (the deepest representation being shallower), the co-indexing, and bleeding, does not occur. Hence extrasentential reference is expected. The structure of the adult and child grammars is therefore the following:
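The derivational asymmetry can be made concrete with a small computational sketch. The following toy model is my own construction (the tree encoding and function names are not the author's): it checks the structural condition for object control, direct c-command, at the deepest level each grammar computes, the adult's (79a) versus the child's (80):

    # Toy model of control under direct c-command (illustrative only).
    # Trees are nested lists: [label, child, ...]; leaves are strings.

    def dominates(node, target):
        """True if node is target or target occurs somewhere below node."""
        return node is target or any(
            isinstance(child, list) and dominates(child, target)
            for child in node[1:])

    def c_commands(a, b, root):
        """Simplified c-command: the node immediately dominating a must
        dominate b, and a must not dominate b (adequate here, since the
        relevant nodes all branch)."""
        def parent_of(node, current):
            for child in current[1:]:
                if isinstance(child, list):
                    if child is node:
                        return current
                    found = parent_of(node, child)
                    if found is not None:
                        return found
            return None
        p = parent_of(a, root)
        return p is not None and dominates(p, b) and not dominates(a, b)

    # Adult DS (79a): the control clause is a VP-internal sister of the object.
    lion, pro = ["NP", "the lion"], ["PRO"]
    adult_ds = ["S", ["NP", "e"],
                     ["VP", ["V", "make happy"], lion, ["S'", pro]]]

    # Child DS' = SS (80): the control clause is already fronted; the fronted
    # clause binds an empty S'j position inside VP.
    lion2, pro2 = ["NP", "the lion"], ["PRO"]
    child_dsp = ["S'", ["S'j", pro2],
                       ["S", ["NP", "e"],
                             ["VP", ["V", "make happy"], lion2, ["S'j", "e"]]]]

    print(c_commands(lion, pro, adult_ds))    # True: object control applies at DS
    print(c_commands(lion2, pro2, child_dsp)) # False: PRO escapes object control
                                              # and falls to the default topic /
                                              # discourse rule, as in the data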
(81)  Adult Grammar:
        DS  (Control-by-Object; Control applies throughout)
        Fronting  (Control applies throughout)
        Control of as-yet-unindexed elements (by topic)
        SS

      Child Grammar:
        DS  (not computed)
        Fronting
        DS′  (deepest computed level; Control applies throughout)
        Control of as-yet-unindexed elements (by topic)
        SS

The control rule and the fronting rule apply identically in the child's grammar and the adult's. The only difference is that the deepest computed level for the child is post-fronting, while that for the adult is pre-fronting. As such, the operation which is the default for adults, namely "control of as-yet-unindexed elements", applies as a matter of course for children. This involves operator-binding, allowing reference extrasententially or up the tree. This allows for an explanation not only of the difference in binding into sentential subjects found by Tavakolian, but also of the instances in which control applies identically in the child's grammar and in the adult's. These are found in (82).
a. Daisy hits Pluto PRO to put on the watch.
b. Daisy stands near Pluto PRO to do a somersault.
c. Daisy hits Pluto after PRO putting on a watch.
As noted earlier, the children’s results in these constructions do not allow extrasentential reference. Given the structure of the grammar in (81), and the retention of the standard control rule by children, this result is expected. In the instances of control with the sentential subject, the actual control rule in the adult grammar applies when the clause is in an internal position. Since the child’s
grammar is shallow, the clause is never in that position, and escapes standard control. There is no fronting, however, in the examples in (82). Hence the control rule applies, as it should, and the child’s interpretation is identical with the adult’s.
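The same check (reusing the c_commands helper from the sketch above; again my illustration, not the author's formalism) shows why the cases in (82) come out adult-like: with no fronting, the configuration the control rule needs is present even at the child's shallowest level.

    # (82a) Daisy hits Pluto [PRO to put on the watch]: no fronting, so the
    # adjunct clause sits inside the matrix clause at DS' = SS.
    daisy, pro3 = ["NP", "Daisy"], ["PRO"]
    ss_82a = ["S", daisy,
                   ["VP", ["V", "hits"], ["NP", "Pluto"], ["S'", pro3]]]

    # The matrix NPs c-command PRO, so the sentence-internal control rule's
    # structural condition is met and no extrasentential reading arises.
    print(c_commands(daisy, pro3, ss_82a))  # True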
5.6 Case Study II: Condition C and Dislocated Constituents In this section, I would like to look at a different range of experimental evidence supporting the conclusion advanced above: namely, that if a particular operation (e.g. control) applies at DS (as well as elsewhere), and the child’s analysis is shallow, then the child’s grammar will not show evidence of the operation applying. This amounts to the abrogation of that particular DS function. In the previous section, the form that the abrogation took was in the lack of an obligatory rule: the indexing function between internal arguments at DS, a positive condition. In this section, the abrogation is of a negative condition: namely, Condition C as it applies at DS. As such, the child’s grammar will appear to overgenerate. This is a result, again, of the shallowness of analysis by the child. The empirical data that I will draw upon is taken largely from a review article by Guy Carden (Carden 1986b), which in turn analyzes, and re-analyzes, a variety of sources. I take Carden’s proposal to be particularly acute, and follow it in a number of respects (though for some counter-discussion, see Lust 1986). Carden (1986b) explores in some detail the differences which follow from what he calls a “Surface” vs. an “Abstract” Model of Noncoreference. The progenitors of the Abstract Model he gives as Carden (1986a), Carden and Dietrich (1981), and McCawley (1984); the Surface Model is that advanced by Reinhart (1983). Recent models, not discussed by Carden, have fallen under the rubric of Reconstruction, whether real Reconstruction or quasi-Reconstruction where no actual reconstruction is found: see, e.g., Higginbotham (1985), Williams (1987), and Barss (1985, 1986) for proposals along these lines. Carden, and the discussion here, draw on both the adult grammar and the acquisition evidence. The relevant data are examples such as these: (83)
a. *Near Johni, hei saw a snake e.
b. *In John'si bag, hei put some tennis shoes e.
c. Near himi, Johni saw a snake e.
d. In hisi bag, Johni put some tennis shoes e.
In (83c) and (d) coreference between the pronoun contained in the preposed PP and the subject is possible; in (83a) and (b), the name in the preposed PP may not be coreferent with the subject pronoun. In Carden's account, these facts may be accounted for in two distinct ways: by an "abstract" account which states the condition on disjoint reference at deep structure, or by Reinhart's "surface" model, where such conditions are stated on S-structure. For Reinhart, this does not involve reference to the trace. To the two possibilities outlined by Carden, we may add a third: the possibility of using a level of Reconstruction Structure, or using derived c-command relations at a level like LF. This is more like the D-structure approach in making use of the original DS position. The D-structure model of Carden would state the conditions on disjoint reference at D-structure; the sentences in (83a), (83b) would then be related to their DS counterparts.
(84)
a. *Hei saw a snake near Johni.
b. *Hei put some tennis shoes in John'si bag.
The sentences in (84) would then be marked ungrammatical at DS, as violating Condition C. They retain their ungrammaticality throughout the derivation. This is identical to the position suggested in Chapter 3. In Reinhart (1983), Condition C is stated over S-structure, not using the position from which the dislocation occurred. This is done by extending the notion of c-command so that an element c-commands all relevant elements in its maximal projection as in (85) (as in Aoun and Sportiche 1981). (Or it may be done by positing a structure in which the preposed PP hangs off S, as in (86).) (85)
*[S′ [PP Near Johni] [S [NP hei] [VP saw a snake]]]
(86) *[S [PP Near Johni] [NP hei] [VP saw a snake]]
Using some modified notion of c-command, the relevant coreference relations in (85), (86) may be expected to be impossible, in the derived position (Reinhart 1983). While a number of arguments may be broached against this S-structure solution to the problem of disjointness, perhaps the strongest argument has never been mentioned in the literature, to my knowledge. This is simply that the necessity for disjointness holds even under additional embedding.
(87) *Near Johni, Bill said hei saw a snake.
Reference to simple S-structure position, without reference to some trace, clearly will not work for this sentence, since there is no manipulation of the phrase marker by which he directly c-commands John here. While it might be the case that disjoint reference (i.e. Condition C) applies to an intermediate structure, e.g. when it has been fronted in the lower clause but not the upper, it seems clear that Reinhart's general solution, in terms of S-structure c-command without reference to traces, is not really viable.

5.6.1 The Abrogation of DS Functions: Condition C
Let us now consider some acquisition evidence. The following pattern of results is from Carden 1986b, summarizing a large body of experiments (for more detailed evidence, see that article.) The names of the experimental conditions have been changed to reflect current GB-style terminology. (88)
Question-Answering Interpretation Task (Age: 3.5–7.0). (italicized elements coreferent)
a. Pronominal Coreference
   i)  Mickey is afraid that he might fall down. (78% coreference: Ingram and Shaw)
   ii) Ken's mother said that he was sick. (96% coreference: Taylor-Browne)
b. Condition C: Dislocated Constituent
   i)  Under Mickey, he found a penny. (78% coreference: Ingram and Shaw)
   ii) Near Barbara, she dropped the earring. (76% coreference: Taylor-Browne)
c. Pronominal Coreference: Dislocated Constituent
   Near him, Wayne found the programme. (69% coreference: Taylor-Browne)
d. Condition C
   i)  He was glad that Donald got the earring. (24% coreference: Ingram and Shaw)
   ii) He was glad that Wayne was coming. (13% coreference: Taylor-Browne)
The data may be summarized as follows. First, simple pronominal coreference is of course possible (88a). Second, contrary to the sometimes posited "linearity" conditions on children's grammar, backwards coreference also appears possible, with the coreferent pronoun preceding the name (as long as it doesn't c-command it). This is shown by examples like (88c): Near him, Wayne found the programme. This is in line with the Solan (1983) conclusion. Third, children at this age do appear to have Condition C, as they rightly reject coreference in examples like those in (88d): "He was glad that Wayne was coming." In all these respects, i.e. in examples (88) (a), (c), and (d), children are behaving identically to adults. Finally, however, they do diverge from the adult data in (88b). Coreference is allowed by children in examples in which the fronted PP contains a name which is coreferent with a pronoun which c-commands it in D-structure: "Under Mickey, he found a penny". That is, instances of Condition C with a dislocated name are not blocked for the child. Carden draws the correct conclusion with respect to the consequence this data has for the D-structure account vs. Reinhart's direct c-command account. (Reconstruction accounts here would pair with the D-structure account.) Reinhart unifies the instances of obligatory disjointness in (89a) and (b) at a single level, and under a single condition: the c-command condition (Condition C) applying at S-structure.
(89)
a. *He was angry that Wayne was there.
b. *Under Wayne, he put a dime.
Insofar as such a unification is appropriate, one would expect it to appear uniformly in the developmental sequence as well: either the c-command condition on coreference holds or it does not, at any given developmental stage. However, this is not the case: examples like (89a) are correctly rejected by the child (under the coreferent interpretation), while coreference is possible in (89b). But there is no “value” of the c-command condition as Reinhart states it which could change over time to allow (89b) in, while (89a) is out. The D-structure account, and the Reconstruction account as well, would distinguish the data in the appropriate way, by allowing for two distinctions: the Condition C condition itself, and the fact of dislocation. Condition C holds for
the child, but not under movement. This suggests that it is not Condition C which is at fault at all, but rather that the D-structure representation which would act as the target for that condition is not computed by the child, under conditions of dislocation. In particular, the data follows from the following shallow derivation. The acquisition sequence is shallow, going back to DS′ rather than DS. This is shown in (90) and (91).
(90) Adult analysis.
     DS: *Hei saw a snake near Johni.
     SS: *Near Johni, hei saw a snake t. (retains star)
(91) Child analysis.
     DS′: Near Johni, hei saw a snake e.
     SS:  Near Johni, hei saw a snake e.
(92)  DS
       |
      DS′   (shallow analysis: the child's deepest computed level)
       |
      SS
Condition C applies throughout, and it applies directly to structurally defined c-command, rather than through some derived notion using chains as an equivalence class. But this means that Condition C will not apply if the child has a shallow structure such as that in (91) at the deepest level of analysis. This is precisely Carden's result. In general, both this analysis and the above analysis of control suggest that the grammatical functions associated with a level will be abrogated if that level itself is not computed, or is only partially computed, by the child. In the case of control, the abrogated function is the positive indexing rule, where the sentential element is indexed with its object controller in its DS position. Since the child does not analyze the dislocated clause as ever being in that position in the course of the derivation, the grammatical function associated with that position, the indexing to the Direct Object controller, is abrogated, since its structural condition is not met. Hence the default rule of null topic control applies instead.
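The Condition C case can be run through the same toy machinery (with c_commands as defined in the sketch in 5.5.5; the encoding is mine, not Carden's). A negative condition stars the derivation if its structural condition is met at any computed level, so everything turns on whether the DS level of (90) is among the levels checked:

    # Condition C as a negative filter over the levels actually computed.
    def condition_c_stars(derivation):
        """derivation: a list of (pronoun, name, tree) triples, one per level.
        A single level at which the pronoun c-commands the name stars the
        whole derivation."""
        return any(c_commands(pron, name, tree)
                   for pron, name, tree in derivation)

    he, john = ["NP", "he"], ["NP", "John"]
    # Adult DS of (90): He saw a snake near John.
    ds = ["S", he, ["VP", ["V", "saw"], ["NP", "a snake"],
                    ["PP", "near", john]]]
    # SS of (90) = child's DS' of (91): Near John, he saw a snake e.
    he2, john2 = ["NP", "he"], ["NP", "John"]
    ss = ["S'", ["PP", "near", john2],
                ["S", he2, ["VP", ["V", "saw"], ["NP", "a snake"],
                            ["PP", "e"]]]]

    print(condition_c_stars([(he, john, ds), (he2, john2, ss)]))  # True: adult *
    # The child's shallow derivation never contains the offending
    # configuration, so coreference is let through, as in (88b):
    print(condition_c_stars([(he2, john2, ss)]))  # False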
But there is no change in the control rule (or principle) itself; it is simply that one of the set of structures feeding it has not been supplied. Similarly, here, with respect to the negative (contra-)indexing rule of Condition C. Its structural condition is not met, so it does not apply, given the shallow analysis.
(93)  DS    (Control: object-control structural condition not met;
             Condition C: structural condition not met)
       |
      DS′   (deepest computed level)
       |
      SS
5.6.2 The Application of Indexing
While Carden correctly notes the advantage of the movement account for the above data, there is no sense in which the abrogation of Condition C under movement would logically follow in his account. Under the sort of analysis suggested in this chapter, however, it is not simply the case that the data is cut naturally, but that the particular sort of failing by the child, the failure of Condition C only under conditions of movement, would be predicted. This is because the failure is not a failure of Condition C at all (which applies correctly at all the relevant levels), but rather of shallowness of analysis. Since Condition C, and all the binding conditions, apply directly on structures, and since the child is not computing a full derivation DS-SS, but rather a shallow derivation DS′-SS, the relevant level in which the pronoun is c-commanding the name is not available to the child. Rather, at the deepest level of analysis computed by the child, the preposed element (generally a PP) is already in fronted position, binding a trace in argument position. This means, however, that the name is not c-commanded by the pronoun at any level of analysis. So Condition C, while operative in the child's grammar, finds no level in which its "structural condition" — i.e. a name c-commanded by another name or a pronoun — is satisfied. So none of the relevant structures are marked *; the grammar overgenerates. The analysis above therefore supports the following set of propositions:
a) that there is a D-structure (i.e. that the grammar is in the derivational, rather than the representational, mode),
b) that binding principles, in particular Condition C and Control, apply throughout the derivation,
c) that the binding principles apply to a structurally defined c-command relation, and
d) that therefore if DS is not computed, or is only partially computed, in a particular analysis in acquisition, then the positive or negative principles associated with it will be abrogated.
Let us assume the following metatheoretical condition on the indexing rules. (94)
Metatheoretical Condition on Indexing
a. If a positive condition applies, it must be satisfied somewhere in the course of a derivation.
b. If a negative condition applies, it must be satisfied nowhere in the course of a derivation.
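Read procedurally, (94) says that positive and negative conditions differ only in how they quantify over the levels of a derivation. A minimal gloss (my own formulation, not a piece of the theory itself) makes the two abrogation patterns fall out of shallowness alone:

    # (94a): a positive condition must be satisfied at SOME computed level.
    def positive_met(levels, condition):
        return any(condition(level) for level in levels)

    # (94b): a negative condition must be violated at NO computed level.
    def negative_clean(levels, violated):
        return not any(violated(level) for level in levels)

    # Suppose a condition's structural requirement is met only at DS.
    needs_ds = lambda level: level == "DS"
    print(positive_met(["DS", "SS"], needs_ds))    # True  (adult: rule applies)
    print(positive_met(["DS'", "SS"], needs_ds))   # False (child: rule abrogated)
    print(negative_clean(["DS", "SS"], needs_ds))  # False (adult: marked *)
    print(negative_clean(["DS'", "SS"], needs_ds)) # True  (child: overgenerates)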
For the present, we may assume that the conditions in (94) apply particularly to the Binding Theory, and in general to binding, i.e. to the marking of co- and disjoint reference, though there may be other areas in which it would be applicable as well. A positive condition would be the marking of coreference; a negative condition would be the marking of disjoint reference. Further, I will assume, following the discussion in Chapter 3, that the Binding Theory applies so that c-command is defined directly, rather than derivatively, or by some further level of reconstruction. Consider how (94a) would work for positive conditions of coreference, for example, for anaphoric binding. The anaphoric element would enter the derivation with no index. Throughout the derivation the element could be indexed with an antecedent, if the locality conditions of the binding theory were met with respect to that antecedent. Finally, at LF (or LF′), all elements are checked for an index: structures having unindexed elements are thrown out. This means that if the element satisfies the binding conditions at any point, the derivation will be sound, if the binding condition is a “positive” one: i.e. one which requires or involves the assignment of an index. This sort of theory is close in spirit to the “Assign gamma feature” of Lasnik and Saito (1984). With respect to alternative notions of the Binding Theory, the view proposed in (94) must be defended in two ways. On the one hand, one might propose that the Binding Theory, or binding, applies at some particular level: for example, NP-structure or LF. On the other hand, one might view the binding theory as applying at “Reconstruction Structure”, where reconstruction structure is not a level in the traditional sense, but something like the union of information from the previous levels (see, e.g. Williams 1987, for such a conception). The
latter view is of course rather difficult to distinguish from the position given above, since in both cases a union of information is being taken, but it is possible to distinguish between them. A complete discussion of the Principle in (94) goes beyond the scope of this book; see Lebeaux (1991) for further discussion (see also Barss 1985, 1986, for a relatively thorough discussion of reconstruction: I was able to see that work only after the following was written). In the following several paragraphs, I will try to briefly indicate some effects of the relevant positions — particularly with respect to reconstruction. Let us first consider the possibility of Binding Theory applying within the derivation. In cases of NP movement, binding must at least be allowed after the movement has occurred, to account for examples like (95).
The boys seemed to each other t to be very nice.
Examples like (95) also show that the positive condition, Condition A, need not be satisfied everywhere, since it is not satisfied at DS. Condition A must also be allowed to apply post wh-movement, to account for the "pit stop" property of Steve Weisler (p.c.): namely, that a wh-element in a moved wh-phrase may contain a reflexive bound to any of the NPs in the intervening clauses (possible antecedents indicated in parentheses).
(96) a. Which pictures of himself did John say Bill liked e? (himself = John)
     b. Which pictures of himself did John say Bill liked e? (himself = Bill)
(97) John wondered which pictures of himself Bill liked.
In (96), the reflexive appears to be bound to John from its position in the intervening Comp. This, together with simpler data like that in (97), suggests that anaphoric binding must at least apply after wh-movement. It might be suggested, then, that S-structure is the place. Note that even here some sort of union of indexing must be involved — either throughout the derivation or by an equivalence class in chains — since both of the indexings in (98) are possible (coreferent items indicated in parentheses).
(98)
a. John wondered which pictures of himself Bill liked e. (himself = John)
b. John wondered which pictures of himself Bill liked e. (himself = Bill)
There is some interesting evidence, however, from T. Daniel Seeley (Seeley 1989) which suggests that the Binding Theory must also apply at LF (the interpretation here is mine, not Seeley's). Consider the behavior of stressed reflexives in a discourse (Seeley 1989).
(99)
      A: Does John like MARY?
      B: No, John likes HIMSELF.
(100) A: Do the boys believe John to like STEVE?
      B: ?No, the boys believe John to like EACH OTHER.
(101) A: Do the boys expect Steve to believe John to have seen BILL?
      B: ?*No, the boys expect Steve to believe John to have seen EACH OTHER.
(102) A: Did Bill leave because Sue saw MARY?
      B: *No, Bill left because Sue saw HIMSELF.
The judgements in (99)–(102) are by no means crystal-clear, but I believe that they capture the intuitions of a number of native speakers. Judgements on (101) vary the most; many speakers, including myself, find it ungrammatical, but some find it acceptable. If the judgements are correct, two facts emerge: stressed reflexives may escape their immediate clause, but not further, and stressed reflexives in an adjunct may not take an element in the main clause as an antecedent (102). These facts may be captured with a simple analysis. The stressed reflexive, like other focussed elements, is fronted to an S-adjoined position at LF (Chomsky 1977b). This means that it escapes from its binding category at LF, and may take a higher clause element as antecedent.
(103) LF of (100b): No, the boysi believe (each otheri (John to like ei))
Consequently, (100) would be grammatical because Condition A would be satisfied at LF. (99) would be grammatical because Condition A would be satisfied at S-structure (or earlier). However, neither (101) nor (102) would be grammatical, because the output after focus fronting would not satisfy the binding conditions.
(104) LF of (101): ?*No, the boysi expect Steve to believe (each otheri (John to have seen ei)).
(105) LF of (102): *Billi left because (himselfi (Sue saw ei)).
throughout, or via a reconstruction-type approach, which defines a derived notion of c-command or equivalence classes of chains. Certain facts, in particular those which have come under the rubric of chain-binding (Barss 1985, 1986), seem problematic for the cumulative-derivational approach.

(106) Chain-binding
      a. Those pictures of himself are the ones that I think that John really likes.
      b. Those pictures of each other are the kinds of things that Bill thought that those men really liked.

Here, the reflexive seems to require reference to the embedded trace, though presumably no movement has occurred. Of course, the entire binding theory cannot be stated over an equivalence class of chains (if one wished to define a chain having the left-most member of the copular sentence in (106) as its head, the lexical relative clause head as a middle member, and the trace as the tail), since Condition C does not apply obligatorily.

(107) Those pictures of John are the ones that he really enjoys.

One empirical oddity about the structures in (106) seems to me to be the following. These constructions, unlike other long-distance binding into nominals in "standard" sentences, require the nominal bound into to have an implicit possessor identical to the anaphor. My judgements are the following:

(108) a. Those pictures of each other are the ones that the boys really like. (must be their pictures of each other)
      b. The boys really like those pictures of each other. (need not be their pictures of each other)
(109) a. Those stories about each other are the ones that the boys believe to be true. (must be their stories about each other)
      b. The boys believe those stories about each other to be true. (need not be their stories about each other)
Oddly, this implicit possessor reading does not seem to be required of the corresponding pseudo-cleft type.

(110) What the boys like are stories about each other.

If these judgements are correct, then the theoretical problem associated with the copular sentences in (109) dissolves, though not the one for (110).
More generally, it would seem to me preferable to retain direct notions of c-command and chains, and to revise the relevant notion of phrase structure, than to do the reverse. See Lebeaux (1991, 1998) for more discussion.

5.6.3 Distinguishing Accounts
In this chapter, I have been following a particular type of proposal in order to account for certain facts in acquisition: namely, that indexing applies throughout the derivation, that it applies directly in terms of structurally defined c-command, and that certain differences between the child's grammar and the adult's may then follow from the fact that the child's analysis is, in certain respects, shallow. The grammatical functions associated with the partially missing or partially computed levels would therefore be abrogated. In the last section, I suggested that this view would follow from a general metatheoretical condition on indexing, namely that positive indexing applies throughout the derivation (i.e. positive conditions must be satisfied somewhere), while negative conditions may never be satisfied. While the issues are complex, I would like to indicate briefly in this section the differences between this account and those which fall under the rubric of Reconstruction.

Two broad types of reconstruction accounts may be distinguished: those which involve actual reconstruction of the moved element, and those which define c-command relations or equivalence classes of elements in chains in terms of the dislocated structure. The latter type of account I will call quasi-Reconstruction. In spite of the similarities between the cumulative notion of indexing above and the reconstruction approaches in general — both allow for the union of certain types of information — differences would be expected between them. In particular: (1) to the extent to which elements are added in the course of the derivation (Chapter 3), various conditions may be taken not to apply to the added element, in the cumulative-derivational view, while this result may only be gotten with difficulty using (quasi-)reconstruction; (2) the cumulative-derivational view would allow for an ordering of operations within the grammar, and hence for bleeding or blocking possibilities, a result which would, again, only be possible with difficulty using quasi-reconstruction. To the extent to which (1) and (2) hold, the cumulative-derivational view is supported (recall that the cumulative-derivational view holds that binding possibilities — e.g. positive indexing as in Condition A — apply cumulatively throughout the derivation, adding indexings). I will examine (1) and (2) here only with respect to the acquisition evidence above, in particular with respect to the abrogation of DS functions; see Lebeaux (in preparation) for a more complete syntactic discussion.
Let us consider an instance of the two accounts. The positive conditions everywhere/negative conditions nowhere account would have the following form:

(111)  DS
        |   binding operations apply throughout (e.g. Condition A)
       SS
        |
       LF   indices checked

(112)  DS
        |   negative conditions may not be met (anywhere)
       SS
        |
       LF

And assume a Reconstruction account of the following form, where Reconstruction structure, either as a set of structures or a set of defined relations, holds at LF:

(113)  DS
        |
       SS
        |
       LF ~ Reconstruction Structure

Now consider the acquisition data that we have been examining, as well as the earlier syntactic data, to see which is preferable. The cumulative-derivational account above can account for the lack of Condition C effects for dislocated constituents (Carden's data) by assuming that DS is not computed, insofar as the dislocation is concerned. The Reconstruction-type account apparently can account for the (lack of) Condition C effects as well. Suppose that a parallel reconstruction-type account has the following form:

(114) Condition C is stated over R-structure, a set of structures derived from LF by …

(115) In the child's grammar, no separate level of R-structure exists at a particular stage, because the derivation is shallow in the LF direction.

Then the lack of Condition C effects for the Carden-type sentences above is explained:

(116) In Mickey's wallet, he put a penny e. (OK for child; coreferent items italicized)
Sentence (116) would not have an R-structure corresponding to the structure in which "In Mickey's wallet" has been put back into place (however exactly this is done). Hence it would invoke no Condition C violation by the child: the actual result. This is shown in (117).

(117)  DS
        |
       SS
        |
       LF′   (actual computed structure ends here)
       LF
        |
       R-structure   (not computed by the child)
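Before weighing the two accounts against the data, it may help to render the schema in (111)–(112) concrete. The following is a minimal sketch in Python, under the simplifying assumption that each level of a derivation can be reduced to the set of coindexed c-command pairs holding there; the function names and the encodings of (100b) and (116) are mine, purely for illustration, and nothing in the analysis depends on them.

```python
# A derivation is modeled as a list of levels (DS, ..., SS, ..., LF);
# each level lists (binder, bindee, bindee_kind) triples in which the
# binder c-commands the coindexed bindee at that level.

def satisfies_condition_a(derivation, anaphor):
    # Positive condition: the anaphor must be bound at SOME level.
    return any(bindee == anaphor and kind == "anaphor"
               for level in derivation
               for (binder, bindee, kind) in level)

def violates_condition_c(derivation, name):
    # Negative condition: the name may be bound at NO level.
    return any(bindee == name and kind == "name"
               for level in derivation
               for (binder, bindee, kind) in level)

# (100b) "No, the boys believe John to like EACH OTHER": the anaphor is
# unbound at SS but bound after focus fronting at LF, as in (103).
derivation_100b = [
    [],                                       # SS: no c-commanding binder
    [("the boys", "each other", "anaphor")],  # LF: fronted anaphor bound
]
print(satisfies_condition_a(derivation_100b, "each other"))   # True

# Carden-type (116) "In Mickey's wallet, he put a penny": if the child
# computes no level at which "he" c-commands "Mickey", Condition C never
# fires -- the child's actual judgement.
derivation_116_child = [[]]
print(violates_condition_c(derivation_116_child, "Mickey"))   # False
```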
However, we earlier noted syntactic considerations which militated against Condition C being stated over R-structure: the anti-Reconstruction effects of van Riemsdijk and Williams (1981) (see Chapter 3). Given the existence of these effects, and the argument/adjunct distinction in the presence of a Condition C violation for dislocated constituents, syntactic considerations alone advance a Positive Conditions Everywhere/Negative Conditions Nowhere type approach. We are left with the following.

(118) Type of binding into dislocated constituent (cumulative approach vs. R-structure)

                            Condition C
      acquisition evidence  indeterminate
      syntactic evidence    cumulative approach (Positive Conditions Everywhere/Negative Conditions Nowhere)

This summarizes the set of data considered in Chapter 3: the Condition C binding. What about the data considered in the last section, that bearing on control? Here, the syntactic evidence is indeterminate about the type of approach which is supported, but the acquisition evidence is not, supporting a cumulative-derivational approach, though weakly, over one involving a level of R-structure. The full chart, then, for the two instances discussed will be the following.
(119) Type of binding into dislocated constituent (cumulative-derivational approach vs. R-structure)

                            Condition C          Control
      acquisition evidence  indeterminate        cumulative approach
      syntactic evidence    cumulative approach  indeterminate
(cumulative approach = Positive Conditions Everywhere/Negative Conditions Nowhere)

Consider why the acquisition evidence does support the cumulative-derivational (direct) approach, given the general analysis of Non-Obligatory Control in the previous section. The crucial data were the Tavakolian-type sentences given in (120).

(120) a. PRO to kiss the duck would make the lion happy.
      b. PRO to leave the room would make the skunk happy.
As noted there, children, but not adults, allow extra-sentential reference in such constructions. I suggested that this was due to the interaction of the following three factors:

(121) a. Non-Obligatory Control clausal subjects originate in internal-to-VP position.
      b. Control requires direct c-command; when this applies at DS, the other internal argument is the controller.
      c. The sentential clause is fronted; if the PRO is unindexed, it is operator-bound, and gets its index from the operator.
In this theory, it is the presence of the control clause in internal position, and the application of control there, which "bleeds" the later possibility of operator-binding and external reference. As noted earlier, if the analysis is shallow, i.e. if the deepest computed level is DS′, not DS, and at DS′ the control clause is already in fronted position, then control by the internal-to-VP element will not apply. So operator binding will take place, and extrasentential reference will occur. This accounts for the Tavakolian results.

(122) Adult analysis:
      a. DS: e would make the lion happy (PRO to kiss the duck).
      b. DS after Control: e would make the lioni happy (PROi to kiss the duck).
      c. SS: (PROi to kiss the duck)j would make the lioni happy ej.

(123) Child's analysis:
      a. DS′: (PRO to kiss the duck)j would make the lion happy ej.
      b. SS: Oi (PROi to kiss the duck)j would make the lion happy ej.

It is the shallowness of the derivation which allows the operator to be inserted. (Note that in the child's analysis, an empty category is present in the second object position at all levels of representation, as required by the Projection Principle, etc.; it is just not the trace of a real movement operation.)

(124)  DS
        |   normal control rule
       DS′  (shallow analysis begins here)
        |
       SS
        |
       LF   default topic insertion and interpretation
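The bleeding configuration in (122)–(124) can be given a similarly hedged toy rendering: a derivation is reduced to the ordered list of levels the child computes, and the single function below (an illustrative name, not part of the theory) returns the reading predicted for PRO.

```python
def interpret_pro(levels):
    """How PRO in 'PRO to kiss the duck would make the lion happy' gets
    its reference, given which levels are computed (deepest first)."""
    if "DS" in levels:
        # Full derivation (122): the control clause is VP-internal at DS,
        # the other internal argument c-commands PRO, and control applies,
        # bleeding later operator insertion.
        return "controlled by 'the lion' (sentence-internal reference only)"
    # Shallow derivation (123): the deepest level is DS', with the clause
    # already fronted; PRO is unindexed, so a default topic operator is
    # inserted and binds it.
    return "operator-bound (extrasentential reference possible)"

print(interpret_pro(["DS", "SS", "LF"]))    # adult analysis (122)
print(interpret_pro(["DS'", "SS", "LF"]))   # child analysis (123)
```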
Suppose that we tried to do it with a Reconstruction-type account.

(125) Adult grammar:
       DS
        |
       SS
        |
       LF ~ R-structure
        |
       operator (topic) interpretation or insertion
(126) Child grammar:
       DS
        |
       SS
        |
       LF′  (deepest computed level)
       LF
        |
       R-structure
        |
       operator (topic) interpretation or insertion

The corresponding control rule, using reconstruction, would be the following. The dislocated constituent is placed back into its pre-dislocation position (or read as if placed back) at R-structure. This operation is optional. If it applies, control by the object NP is possible; otherwise a peripheral topic operator is inserted. The key syntactic fact would be the ordering of the reconstruction operation and the topic interpretation.

Consider now what would happen in acquisition. As noted above for Condition C effects, the parallel way of accomplishing this end would be by supposing that R-structure was not relevant for the child's grammar, the grammar being shallow in that direction. But this situation is not as symmetrical as it seems. The control rule and the operator insertion rule are on opposite sides of the grammar for the cumulative-derivational formulation of control, but not for the reconstruction-type approach. A shallow analysis (lacking some of the operations of D-structure) in the first case would eliminate object control, but would retain operator insertion, which occurs on the other side of the grammar. In the second case, however, shallowness of analysis at LF would eliminate (abrogate) both the reconstruction rule and the default rule of operator insertion or interpretation. The result would be a structure which would be ill-defined: not the actual result.
5.7 Case Study III: Wh-Questions and Strong Crossover

In the previous two case studies, I have dealt with two areas in which it appears that the child's grammar is "shallow": the analysis of control, and the analysis of constructions which should apparently be ruled out by Condition C. In each case, it was found that the child's grammar diverged from the adult's. However, this was taken not as evidence that the condition itself was
different in the two grammars (Non-Obligatory Control, Condition C), nor that there was a difference in category type (i.e. that the initial PRO was pro, or some neutralized null category), but rather that the child's analysis was shallow: anchored in S-structure, and extended only part of the way back to DS, to DS′. As such, the constraints and operations which would have applied to the DS representation did not. As a consequence: (i) in the case of Control, since the c-command condition for control was not met, a default operation of operator insertion applied, and extrasentential reference was gotten; (ii) in the case of Condition C, with dislocated constituents at the deepest level of analysis, Condition C did not apply because its structural condition was not met (the name did not have a c-commanding coreferent name or pronoun). In the next sections, I would like to discuss a third area in which the child adopts a shallow analysis: the analysis of wh-questions. The data here are drawn from an important paper, Roeper, Akiyama, Mallis, and Rooth (1986), "Crossover and Binding in Children's Grammars." The data are quite complex, and the paper itself is not as well known as it should be. Accordingly, I will first discuss the analysis of wh-questions which I will adopt, then summarize the paper, and then present an analysis of how the acquisition data can be accounted for within the general levels-of-representation conception that this work argues for.

5.7.1 Wh-questions: Barriers framework
While I am generally assuming the extension of GB found in Barriers (Chomsky 1986), the analysis of wh-questions in acquisition is more specifically tied to elements of the Barriers analysis than are other aspects of this thesis. (Indeed, it strongly supports particular aspects of that analysis, and would not be workable without it.) A quick review of the relevant points is therefore in order. With respect to X′-theory, the domain of elements falling outside of S (= I″ = IP) is more articulated than in earlier versions (Chomsky 1981). In particular, S is the maximal projection of Infl, and, crucially, wh-movement is not into Comp, but into the Spec C′ position. That is, the full structure of the clause is as follows.
(127) [C″ SpecC′ [C′ Comp [S (=IP) NP [I′ Infl [VP V NP]]]]]
The movement of a fronted NP is no longer into Comp (or an adjunction in Comp), but rather into the specifier position of C′ (it will become clearer later why I am reviewing this). A second innovation of the Barriers-type approach is that the movement operation is a substitution operation (into Spec), rather than an adjunction. A general consequence of that is that Infl, including do, may now move into the head position of Comp. The entire clause then becomes a projection of Infl. The movement of Infl into Comp, and overlay of Comp by Infl, is shown in (128).
(128) [I″ SpecI′ [I′ Infl [S (=IP) NP [I′ Infl [VP V NP]]]]]
      (Infl has moved into the position of Comp, overlaying it, so that the clause is now a projection of Infl)
A third innovation has to do with the locality of the movement. Chomsky (1986), following the analysis of anaphor movement in Lebeaux (1983), assumes that movement is highly local; Chomsky (1986b) assumes likewise for wh-movement, involving adjunction to intermediate nodes, including VP. While I accept this part of the analysis, it will not be crucial in what follows. We note immediately one consequence of the Barriers analysis, which has already been commented on in the foregoing (Chapter 1). Assuming that
categorial selection is selection of the head (Belletti and Rizzi 1988), a wh-clause must be selected in terms of its Comp feature, not the element in Spec C′. This means, in turn, that Spec C′ must agree with the +/−wh feature in Comp, so that the selectional process in the grammar "knows" that a wh-element has been moved into Spec, and can differentiate (129a) from (129b). I assume, simply, that there is still such a +/−wh feature in Comp, and that it agrees with Spec C′.

(129) a. I wonder ((who) ((+wh) (John saw)))
      b. *I wonder ((e) ((+wh) (John saw who)))

The reason for the ungrammaticality of (129b) is that the +wh feature selected by the verb is unsatisfied at S-structure; such satisfaction is required at S-structure in English. Before proceeding, let us note three facts which support Chomsky's analysis.

First, the analysis of Infl movement as overlaying Comp is supported by selectional facts. Assuming that complement-taking verbs (believe, wonder) select for a particular type of Comp (−wh and +wh, respectively), and assuming that this selection must be satisfied at all levels of representation, we have an immediate explanation, given Chomsky's analysis, of why Subject/Aux inversion is impossible in subordinate clauses. If Aux moved into Comp, the clause itself would be a projection of Aux (or Infl): i.e., it would be I″. This would mean, however, that the selected clause was C″ at DS, but I″ at SS: an impossibility, given the assumptions above. Hence no selected clause may have Subject/Aux inversion, the correct result.

A second fact supporting Chomsky's analysis is conceptual, but I believe powerful. In traditional Extended Standard Theory and GB analyses (e.g. Chomsky and Lasnik 1977; Chomsky 1981), Comp is a sort of "garbage" category. It contains mostly closed class elements (that, if, etc.), but also elements of radically different character, open class NPs (whose hat, etc.). This made Comp very difficult to treat as a unified element, and made its properties very difficult to probe. Given the current theory, Comp again makes sense categorially: it is a position in which a particular set of closed class morphemes may appear.

A third fact has to do with wh-island effects. It has sometimes been noted, though usually just in passing, that there is a considerable contrast in grammaticality between examples (130a) and (130b).

(130) a. *Who do you wonder which books John gave e to e?
      b. Who do you wonder if John gave books to e?

In (130a), the dependencies are nested, so a crossing constraint cannot be the cause of the ungrammaticality.
While other candidates for explanation of the difference are available in pre-Barriers frameworks, Barriers does provide a ready explanation. While a +wh Comp exists in the complement clause in both cases, in the former case, but not the latter, the Spec C′ position is filled (with which books). This suggests that the long-distance extraction in (130b) is through that position, and that the ungrammaticality of (130a) should be traced precisely to the fact that that position is unavailable. This proposal might be instantiated in a number of ways, which I will not try to go into here. This sort of explanation is not nearly as available if one assumes that extraction is through Comp, since if is an obligatory element (though variants may be tried, using particular indexing algorithms).

5.7.2 Strong Crossover
Before turning to the acquisition evidence, let us deal with another aspect of wh-questions: namely, the existence of (Strong) Crossover effects. Such effects, first noted by Postal (1974), essentially forbid the "crossing over" of a moved wh-item over a coreferent pronoun or name, in configurations in which the pronoun or name c-commands the trace of the wh-element. Contrasts such as (131) were noted by Postal.

(131) a. *Whoi did hei say that John liked ei?
      b. Whoi did the man that saw himi say that John liked ei?

In (131b), who may be construed as coreferent with he, but not in (131a). Similarly, and yet more clearly, there is a difference in the possibility of a bound reading in (132) depending on whether a crossover or a noncrossover configuration underlies it.

(132) a. Whoi ei ate hisi hat?
      b. Whoi did hei say ei ate hisi hat?

(132a) easily allows a bound reading. However, while (132b) would be structurally identical (with the addition of an element), if one simply looked at the indexing and ignored the lexical/nonlexical distinction, it does not allow such a reading.

(133) a. Whox (x ate x's hat)?
      b. Whox did (x say x ate x's hat)? (not allowed as a reading)

The contrast within (132), represented in (133), is strong evidence for the role of Strong Crossover in the adult grammar.
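The configuration behind this contrast can be stated as a small sketch, under drastic simplifications (the clause is flattened to a c-command-ordered list of indexed items, and pronouns are drawn from a short hand-coded list). It is not an implementation of any of the proposals discussed below, only a statement of the descriptive generalization.

```python
def bound_reading_ok(chain):
    """chain: the wh-word first, then the c-command-ordered items below it.
    A bound reading is blocked when a coindexed pronoun c-commands the
    wh-trace -- the Strong Crossover configuration."""
    wh, *rest = chain
    for i, (item, idx) in enumerate(rest):
        is_pronoun = item in ("he", "she", "him", "her")
        c_commands_trace = any(it == "e" and ix == idx
                               for it, ix in rest[i + 1:])
        if is_pronoun and idx == wh[1] and c_commands_trace:
            return False
    return True

# (132a) Who_i e_i ate his_i hat?  -- bound reading available
print(bound_reading_ok([("who", 1), ("e", 1), ("his", 1)]))             # True
# (132b) Who_i did he_i say e_i ate his_i hat?  -- bound reading blocked
print(bound_reading_ok([("who", 1), ("he", 1), ("e", 1), ("his", 1)]))  # False
```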
Strikingly, and remarkably, such a contrast does not exist in the child grammar, except for some peripheral structures (Roeper, Akiyama, Mallis and Rooth 1986). It is to this acquisition evidence, and the theoretical consequence of that evidence, that I will return below. While the Strong Crossover fact is quite uncontroversial, the theoretical explanation of the fact is a good deal less so. At least three proposals exist in the literature. Postal (1974) suggests that the condition is actually a condition on a transformation — i.e., that the crossing of a wh-element over a c-commanding name or pronoun is disallowed. Chomsky (class lectures, 1985) has at least contemplated a similar idea. A second alternative, perhaps the most widely accepted, is that the trace left by wh-movement is a name (Chomsky 1981), and that Condition C bars the relevant configuration because a name would be c-commanded by a coreferent element in an A-position at S-structure.

(134) Whoi did hei visit ei?
      (hei: c-commanding pronoun; ei: name)

A third possibility has been suggested by van Riemsdijk and Williams (1981). This is that the condition is neither on a movement operation, nor on the trace as a name, but rather on the pre-movement structure. If the wh-element itself is considered a name, and one adopts the position advocated earlier in this work — that positive conditions must be satisfied everywhere, and negative conditions nowhere violated — then Condition C will rule out the pre-movement structure at DS.

(135) a. *Hei didn't know whoi?   (DS)
      b. *Whoi didn't hei know ei?   (SS; retains * from DS)

(135a) is ruled out at DS, and the full derivation in (135) retains the ungrammaticality of (135a). The van Riemsdijk and Williams proposal has certain attractive features, though they are hardly decisive. First, it allows the rather natural proposal of Joseph Aoun — that wh-trace is an anaphor, as a locally necessarily dependent element — to be straightforwardly instantiated. Given that it is the wh-element itself which is the name, and that Condition C is stated in terms of that, the wh-trace is "freed" to be an anaphor. Second, the van Riemsdijk and Williams proposal does not require recourse to layered traces. As van Riemsdijk and Williams note, in an important way, the strong crossover effect holds not only of the whole moved phrasal node, but of all the material that it dominates.
(136) *Whosei hat did hei eat e?

Whose in (136) cannot be coreferent with he. This constraint cannot be stated over the maximal null phrasal category, but must be stated in terms of layered traces. While layered traces would, under certain renditions of movement, even be expected, they have certain characteristics, aside from complexity, which are somewhat unattractive. It must not only be the case that the trace is layered, but also that each individual subnode is individually coindexed with its antecedent in the moved item. Further, the government relation must be defined between a set of elements, all null. More problematic, from the point of view of this thesis, is that syntactic elements which correspond to a phonologically null segment of the string are no longer closed class (i.e. necessarily finite in character), but open class. This is because a layered trace may contain arbitrarily complex syntactic material. In the following, I will argue that the acquisition evidence supports the van Riemsdijk/Williams proposal (or possibly Postal's original proposal) over the alternative, that variables act as names. This supports, or allows support to develop for, Joseph Aoun's proposal that wh-traces are anaphors. It also further supports, and allows articulation for, the proposal with which this chapter is concerned: that the derivation is real, and may be construed by the child, under certain circumstances, as shallow.

5.7.3 Acquisition Evidence
The basic finding of the Roeper et al. experiments is that Strong Crossover does not exist for children, for a majority of constructions. (The exceptions, those constructions in which Strong Crossover does exist for the child, also play a crucial role in the following analysis.) Why should this be? One possibility is that the constraint itself is not available at an early stage, and "pops into" the grammar at some later stage. Recall, however, that a similar solution was found wanting for the lack of Condition C effects in constructions like the following:

(137) In John'si room, hei put a book. (OK for kids, * for adults)

As noted earlier in this chapter (see also Carden 1986a, 1986b), it is not the case that Condition C has disappeared at the point at which constructions like (137) are wrongly accepted. Rather, Condition C is present in its simple form, but just doesn't seem to be operating in dislocated structures. That pattern of judgements, it was argued, was not diagnostic of the lack of Condition C as a condition at all, but was rather due to the fact that its structural condition was not met, due to the
shallowness of analysis by the child. A similar explanation was found for the Tavakolian data. Let us exclude the possibility of a principle suddenly appearing, as follows:

(138) Universal Application of Principles
      Any (universal) principle P in the adult grammar applies at all stages of development, if the vocabulary satisfying that principle is present.

By the parenthetical "universal" in (138), I do not mean to restrict the application unnecessarily, but simply to allow for the fact that a parameterized principle would not have to apply in its language-specific form at all stages of development. The proviso in (138) would require, for each principle P in UG, that (i) it either hold in the appropriate language-specific form at all stages of development for a given language L, or (ii) it be given a different parametric form than the one in the language L, but still be part of the specification in UG, or (iii) the vocabulary over which the principle holds not be present in the child's grammar. This would then require that the binding principles apply as soon as the vocabulary defining them (presumably the +referential features on nouns) were defined. An explanation of the Jakubowicz (1984) and Wexler and Chien (1985, 1987a) data would therefore have to be found which would be in accord with this principle. The earlier case, where only theta theory applied in initial representations, would not be a counterexample, since the vocabulary over which Case assignment was defined would not be present. With respect to the data at hand here, the question is whether a similar, levels-of-representation type analysis can be found for the strong crossover data. As noted above, children allow both (139a) and (b) as well-formed structures, at first approximation, according to Roeper et al.

(139) a. Whoi ei thinks hei likes hisi hat?
      b. Whoi does hei think ei likes hisi hat?

Indeed, the full set of data is quite complex and apparently somewhat confusing, the result of a complex array of experiments performed by Roeper and his colleagues. Considered in full, the theoretical problem becomes quite intricate. I will present here the data from experiment #7, which is the most detailed set of data which Roeper et al. provide, and characteristic of the whole.
                                                Percent coreferent or bound

I.   Noncrossover configuration: single clause
     a. Who is V-ing himself?                 100.0
     b. Who is V-ing him?                      27.0
     c. Who is V-ing his N?                    36.9

II.  Crossover configuration: single clause
     a. What is he V-ing?                      15.9
     b. Who is he V-ing?                       25.9
     c. Whose N is he V-ing?                    3.6

III. Noncrossover configuration: 2 clauses
     a. Who thinks NP is V-ing him?            40.5
     b. Who thinks he is V-ing NP?             38.1

IV.  Crossover configuration: 2 clauses
     a. Who does he think NP is V-ing?         29.8
     b. Who does he think is V-ing NP?         19.0
     c. Who does he think he is V-ing?         35.2

Figure 1. Percentage of Coreferent or Bound Responses
A word about the notation is in order. V stands for any of a number of the verbs chosen, and NP for any of a number of NPs. An exemplar of "Who does he think NP is V-ing?" would be "Who does he think Big Bird is pushing e?" The data in I are of theoretical interest only as a basis of comparison. Note that: (i) children give 100% bound responses when the bound element is a reflexive (Ia); (ii) coreference or binding is allowed, at least marginally, for single clause structures with a coreferent pronoun (Ib, 27%). This result is already familiar from work by Jakubowicz (1984) and Wexler and Chien (1985, 1987a). The first striking result comes in IIb. This is a classic crossover configuration, and unlike IIa, it has who as the questioned word, so coreference would be possible without violating animacy requirements. For this question, it appears that 25.9% of the children allow coreference, a clear violation of the adult rule. More striking, and yet also comforting for the accuracy of the original result, is the fact that the result in IIb contrasts with that in IIc. Children do not allow coreference between the fronted wh-element and the crossed-over pronoun if that wh-element is part of a containing wh-phrase (*Whosei hat did hei like e? for children, as well as adults). The fact that children obey the crossover constraint here is crucial, since it shows that there is not simply a total breakdown in the grammar, or that the allowance of strong crossover in the simple wh-cases is due to the
complexity of the task. Rather, in the slightly more complex case — where whose hat has been fronted — strong crossover is obeyed, even if it is not in the simpler cases (only 3.6% coreference in IIc). This is then the second puzzle to account for, along with the original puzzle of the lack of strong crossover in cases like IIb. Finally, there is a third result to be explained, which does not show up so strongly in this set of data, but does in other data sets gathered by Roeper et al. Roeper et al. note that the lack of a strong crossover condition is not uniform across ages with respect to clauses. Rather, they note a developmental pattern of the following sort:

(140) Stage I:   No Strong Crossover Condition
      Stage II:  Strong Crossover Condition for 1-clause sentences; no Strong Crossover for 2-clause sentences
      Stage III: Strong Crossover Condition generally
This result, noted by Roeper et al., does not rise clearly out of the data in Figure 1, but perhaps its outlines can be seen by comparing the 25.9% strong crossover violation in IIb with the 29.8% and 35.2% strong crossover violations in IVa and IVc. See Roeper et al. for more extensive discussion of this result. To summarize, there are three problems or puzzles which must be answered by a legitimate acquisition account:

(i) How does one account for the fact that Strong Crossover does not seem to be operating — or not nearly as strongly — in the child's grammar as in the adult's, for examples like (i)′?
    (i)′ Whoi is hei V-ing ei?
(ii) How does one account for the fact that, at the same time that Strong Crossover is not respected by the child in constructions like (i)′, it is respected in (ii)′, where a full NP is fronted?
    (ii)′ Whosei hat is hei V-ing e?
(iii) How does one account for a particular lag in acquisition? Namely, that children first "learn" the strong crossover constraint in one-clause constructions, and then repeat their initial mistake of not having strong crossover in two-clause structures. Such a construction-sensitive difference would not be expected in any simple parameter-setting account.

5.7.4 Two possibilities of explanation
Given the basic bifurcation of grammars into those in the representational mode vs. those in the derivational mode, two major possibilities arise in the explanation
of puzzles (i)–(iii) above, if one excludes the possibility that strong crossover has suddenly popped into the grammar at the relevant stage (this latter possibility is in any case made unlikely by the data in (ii) above). On the one hand, one might assume that there has been some change in the representation (say, the S-structure representation). For example, there could be a change in the category type of the element corresponding to the wh-trace in the adult grammar. This is the position that Roeper et al. take: that the initial wh-trace is actually little pro. As such, the Roeper et al. explanation is grounded, as it were, in the representational mode. Though the grammar as a whole contains a derivation for Roeper et al., the particular acquisition explanation is not dependent on that, just as it is not in Hyams (1985, 1986, 1987). On the other hand, we might suppose that the difference over time for children is not representationally based, but derivationally based: for example, that the derivation is shallower for children than adults, or somehow different. This type of acquisition account underlies the analysis of the data above, i.e. the analysis of the Tavakolian and Carden data. However, this account may be deepened and made more subtle in a number of ways. For example, it needn't be the case that, if the explanation is derivationally based, the representational system is exactly the same as in the adult grammar. Rather, another possibility presents itself: that the child's derivation is different, and because of this, as an effect, the representation is different as well — say, the representation at S-structure. Under this view, while it may well be true that some aspect of the representation has changed over time — for example, the null element has changed from pro to wh-trace — this is not the deepest level of analysis. That is to be found instead in the derivation itself, and it is the change in the derivation which has given rise to the definitional properties which mean that the representation is read differently. In a sense, the parametric change, while real, is not the cause of the acquisitional change, but an effect. That is the position taken in the analysis below.

5.7.5 A Representational Account
Roeper et al. take a purely representational view. They suggest that the basic difference between the adult grammar and that of the child is in the category type of the null element corresponding to the wh-trace in the adult grammar. They argue, in essence, that the null category corresponding to the wh-trace in the adult grammar is not a wh-trace for the child at all, but rather an indexed little pro. The representation of (i)′ above is then the following:
(141) Whoi is he V-ing proi? (e.g. Whoi is he following proi?)

Although I will argue against this view as the ultimate basis for the acquisition facts, this sort of explanation has much to recommend it. First, and most crucially, it allows for an explanation of the lack of Strong Crossover in children. Assuming that the strong crossover effect really depends on the fact that wh-traces, as names, cannot be c-commanded by coindexed pronouns, if one changes the category type to an element which can be A-bound — namely, to small pro — the lack of Strong Crossover is explained. Second, the assumption that the initial trace is specifically little pro could help to explain the one-clause/two-clause contrast noted above: namely, that the strong crossover constraint seems to come into the grammar first for one-clause constructions, and only later for two-clause constructions. This might be traced not to the application of Condition C to a name, but to the application of Condition B to little pro. Since little pro would obey Condition B, to the extent to which this condition is operative in the child's grammar at all (see Jakubowicz 1985; Wexler and Chien 1985, 1987a), it would be expected to disallow single-clause coindexed structures before it would disallow double-clause structures. This would give the appearance of the strong crossover constraint operating in single-clause structures.

In spite of the interest of the representational view above, there are considerable difficulties if this is taken as the ultimate level of explanation. These seem to me to support a theory that is derivational in character, or at root derivational, though the change in derivation may have (as always) representational effects. Perhaps the most significant difficulty with the approach outlined above has to do with the difference in strong crossover effects for children depending on whether a full noun phrase has been fronted (whose hat) or a simple wh-element (who). The Roeper et al. data show a strong contrast between the two.

(142) a. Whoi is hei V-ing ei?       (25.9% coreferent)
      b. Whosei N is hei V-ing ei?    (3.6% coreferent)

This is perhaps the most significant statistical result in the whole experiment. Yet this distinction is not really covered by the basic motif that Roeper et al. follow: that wh-trace is read by the child as little pro. Of course, other possibilities of explanation may be advanced — as they are in the paper — but it would be preferable if such a significant result followed from the basic framework.

A second difficulty in the account has to do with the one-clause vs. two-
clause contrast. As noted above, children begin to respect strong crossover in one-clause structures before they respect it in two-clause structures. They have the following developmental path:

(143) a. Coreference allowed everywhere (no Strong Crossover constraint)
      b. Coreference allowed in 2-clause structures, not in 1-clause structures (Strong Crossover constraint for 1 clause only)
      c. Coreference not allowed (adult Strong Crossover constraint)

It might at first be thought that this divergence between one-clause and two-clause structures could be traced simply to the fact that the initial wh-trace is treated as little pro by the child, and that this obeys Condition B. However, things cannot be this simple, because there are two changes in the grammar in (143), but just one parameter to manipulate: pro → wh-trace. If the change in parameter (pro → wh-trace) is supposed to account for the first change in the data, i.e. the transition from (143a) to (143b), then it cannot also account for the second change, from (143b) to (143c). If it is supposed to account for the second change, then it cannot also be at the root of the first. In short, there are two changes in the developmental path, but only one parameter: both cannot be linked to a single change in the grammar. There is indeed a way around this, which rather stretches the conceptual grounding of the notion "parameter". This is to say that there is a simple parametric change (pro → wh-trace), but that it operates more than once. In particular, it is construction-sensitive, first operating in one-clause structures, and later in two-clause structures. This is odd, however, from the point of view of what is normally meant by "parameter": i.e., an independently specified piece of information in the grammar. Such a piece of information would not normally be thought of as construction-specific: if wh-trace were being read as little pro by the child, one would assume that it was listed in the grammar as such. And the particular change that was noted, that it would first change from little pro to wh-trace in one-clause structures, and only later in two-clause structures, seems equally inexplicable, given that the only possible basis for such a divergence, some general change in computational complexity, operates in the opposite direction with respect to the extraction of simple wh-elements vs. full wh-phrases (who vs. whose hat).

5.7.6 A Derivational Account, and a Possible Compromise
As I have just noted, it would be odd, if one assumes that the deepest level of explanation is a representationally based parametric one, to assume that
wh-trace begins as little pro in one- and two-clause constructions, then becomes wh-trace in one-clause constructions only while retaining its status as little pro in two-clause constructions, and finally becomes wh-trace generally. However, this is odd not because the developmental course is odd per se, but because it does not fit well with the notion of parameter-setting as conventionally understood. Suppose that, instead of the categorial change being the root level of explanation, the difference between the adult grammar and the child's were traced to some change in the derivation occurring over time. Suppose further that null categories, and in particular wh-trace, are derivationally, and not representationally, defined: i.e. that wh-trace is the null element left by wh-movement. Then the derivational difference would have an immediate effect on the definition of elements at any particular level. This difference might well be on a construction-by-construction basis, rather than parametric in the conventional sense. It is this sort of line of inquiry that I will follow.

Let us first adopt the van Riemsdijk and Williams account of Strong Crossover — that it is an instance of Condition C applying at DS to structures such as (144).

(144) *Hei liked whosei hat.

The van Riemsdijk and Williams approach allows for an explanation of the acquisition facts on exactly the same grounds that were advanced earlier: the shallowness of the grammar. Assuming that S-structure or the surface is the (computational) anchor for comprehension, and assuming a shallow analysis in which the wh-element is base-generated in place by the child, but not by the adult, the child's representation for sentences like (145) will be as in (146), while the adult's will be as in (147).

(145) Whoi did hei say that Bill liked ei?

(146) Child's analysis:
      DS′: Who did he say that Bill liked e?
      SS: Whoi did hei say that Bill liked ei?

(147) Adult's analysis:
      DS: *Hei said that Bill liked whoi.
      SS: *Whoi did hei say that Bill liked ei?

Assuming that the deepest level of analysis computed by the child is DS′ — where the wh-element is already in fronted position — and assuming that Strong Crossover really is a condition on the c-command of a wh-trace by a coreferent name or pronoun, then the child's grammar would not be expected to exhibit strong crossover effects. That is, if we assume shallowness of analysis, then the
strong crossover facts — i.e. the lack of strong crossover effects for children — are explained without any additional stipulation given the van Riemsdijk/Williams account, since the wh-element would not be c-commanded by a coindexed name or pronoun at any level of representation. The acquisition account therefore supports the van Riemsdijk and Williams proposal.

Consider now the second contrast noted by Roeper et al.: that between a fronted simple wh-element and a fronted full NP.

(148) a. Who is he V-ing t?       (25.9% coreferent)
      b. Whose N is he V-ing t?    (3.6% coreferent)

As noted earlier, this contrast is striking. These facts are also important in showing that the child is not simply behaving randomly in examples like (148a) due to confusion, because in the equally complex (148b) Strong Crossover is maintained. Of course, no such contrast exists in the adult grammar. As noted earlier, Chomsky (1986b) allows two left-peripheral positions outside of S (=IP): a direct Comp position, and the Spec C′ position.

(149) [CP SpecC′ [C′ Comp(+/−wh) … ]]

The former contains a limited set of closed class features and words (+/−wh, if, that, etc.), while the latter contains full NPs. As noted earlier, this allows the "garbage category" character of Comp to be avoided. Let us make use of this contrast here in the following way:

(150) Who, what, and other closed class wh-elements are optionally generated in Comp by the child, as spell-outs of the +wh feature; which man, whose book, and other full wh-NPs are found in Spec C′.

The distinction in (150) is reasonable on two grounds. First, since simple +wh elements (who, what, etc.) contain barely more information than the fact that they are wh-elements themselves (plus some information as to humanness, etc.), they could be simple spell-outs of the +wh feature. This is of course impossible for the full NP phrase. Second, there is some evidence from early acquisition which is quite suggestive as to the placement of the initial wh-words. A traditional finding is that there is a stage of development in which children allow either the
fronting of the wh-word or the fronting of the auxiliary, but not both (though see Pinker 1984 for some critical discussion).

(151) Possible:    Did John see Mary?
      Possible:    Who John can see?
      Impossible:  Who can John see?

This has generally been suggested to be due to a computational deficit of some type: both wh-movement and Subj/Aux inversion take some sort of computational power, so the mutual application of both is impossible. The comments above suggest instead a structural explanation: if the wh-word is generated initially as a spell-out of a wh-feature in Comp, and if Subj/Aux inversion is a substitution operation into Comp (Chomsky 1986b), then the configuration of data in (151) is explained. Let us, therefore, accept the premise of (150): that in wh-questions at this stage, the child is able optionally to spell out the wh-element from the +wh feature in Comp (if it is simple). This allows us immediately to explain the lack of Strong Crossover for such questions: the wh-word is generated in place, so it is never c-commanded by a coreferent pronoun.

But what now of the fact that Strong Crossover always holds for the full fronted wh-phrases? I believe that this calls for the revision of an assumption that I have been making throughout this chapter. I have been assuming that in the shallow analysis, the element starts off in a dislocated position at DS′, and is able to do so because it gets its theta role from a null category in an associated GF-θ position. While this allows the element to get a theta role in its deepest-level position via the trace, it does generate an element in a θ-bar position at the deepest level. This would still not violate the Projection Principle if one assumed that the adult DS, but not the child's computed DS (=DS′), were a pure instantiation of GF-θ. That is, if one assumed the following:

(152) DS (i.e. the adult DS) is the representation where pure theta relations are expressed (i.e. is a direct projection of the thematic structure).

Let us suppose instead that the following, more reasonable assumption holds:

(153) The deepest computed level (at any stage of development) must be where pure theta relations are expressed.

The assumption in (153) would require that the dislocated elements in (154a) and (b) — i.e. the two types of dislocated elements discussed earlier — actually be in some sort of theta position of their own at DS′.
(154) a. Near John's house, he put a case.
      b. To kiss the pig would make the horse happy.
This is certainly not implausible for (154b), where the element may be in a subject position (Subj/Aux inversion can apply) and assigned an auxiliary theta role; let us assume that it is also the case in (154a), where the element may be in some sort of topic position. In effect this states that there is a parametric difference in the child's grammar which allows certain positions to be base-generated theta positions. Given the above assumption, an appropriate derivational distinction may be made. The dislocated full wh-phrase in sentences like (155) is in the Spec C′ position, and this is in no instance a theta position.

(155) Whose hat did he buy?

Hence the wh-element must come from an actual DS object position, even in the child's derivation (given the revised assumption (153)).

(156) Child's analysis:
      DS: He saw whose hat.
      SS: Whose hat did he see?

Since the wh-element comes from its DS position, there is a Condition C violation for the child. On the other hand, the simple wh-element is a spell-out of the +wh feature. It is generated in Comp, perhaps spelled out at some later level, and never appears in a non-dislocated position. The child's derivation is therefore the following:

(157) Child's analysis:
      DS′: Who (did) he see?
      SS: Who (did) he see?

Because the wh-element is never c-commanded by a name, no Condition C violation results.
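The derivational asymmetry between (156) and (157) can again be put in sketch form, assuming, with the analysis above, that Condition C is checked at the deepest computed level; c-command is crudely approximated here by linear precedence, and both the derivations and the function names are hand-coded illustrations.

```python
def child_deepest_level(wh_phrase):
    """Simple wh-words can be spelled out from the +wh feature in Comp,
    so their deepest level already has them fronted (157); full wh-NPs
    must originate in object position (156)."""
    if wh_phrase == "who":
        return "who (did) he see"        # DS': generated in Comp
    return f"he saw {wh_phrase}"         # DS: true object position

def condition_c_violation(level, wh_phrase):
    # Violated iff the coindexed pronoun precedes (here: c-commands)
    # the wh-element construed as a name.
    return level.index("he") < level.index(wh_phrase.split()[0])

for wh in ("who", "whose hat"):
    level = child_deepest_level(wh)
    print(wh, "->", condition_c_violation(level.split(), wh))
# who -> False        (no crossover effect: Roeper et al.'s 25.9%)
# whose hat -> True   (crossover respected: 3.6%)
```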
Let us turn now to the third finding of Roeper et al.: that there is a sequencing effect in the appearance of the Strong Crossover Constraint. Namely, the child first allows both 1- and 2-clause structures violating the constraint, then correctly rules out 1-clause structures while still allowing Strong Crossover to be violated in 2-clause structures, and finally allows no Strong Crossover violations at all.

(158) Stage I:   Who is hei V-ing t?                 OK for children
                 Who does hei think NP is V-ing t?   OK for children
      Stage II:  Who is hei V-ing t?                 * for children
                 Who does hei think NP is V-ing t?   OK for children
      Stage III: Who is hei V-ing t?                 * for children
                 Who does hei think NP is V-ing t?   * for children
      (from Roeper et al.)
If this degree of complexity in the data is to be trusted, then what we have here is not indicative of parameter-setting in the representational mode as commonly understood. If one assumed a single change in the course of development, pro → wh-trace, then it would be unexpected that this would apply in a context-sensitive way, depending on the one-clause vs. two-clause contrast. Nor would the data be explained if one assumed that Strong Crossover suddenly popped into the grammar, a possibility which is in any case excluded in principle above. On the other hand, if the Strong Crossover Constraint were dependent on the possibility of wh-movement, then the outlines of the solution for the sequencing effect are clear, since the emergence of Strong Crossover is dependent on the amount of wh-movement which is occurring. The data would follow from the following:

(159) a. Stage I: Wh-movement does not occur at all for simple wh-elements; the elements are spell-outs of features in Comp.
      b. Stage II: Wh-movement does occur for simple wh-elements, but only within CP (CP (=S′) acts as an absolute barrier).
      c. Stage III: Wh-movement applies as in the adult grammar.
The first stage is one in which there is no Strong Crossover Effect at all for the simple moved wh-elements. The second stage is the crucial one for current purposes. While there is apparent extraction over two clauses, the second stage would require that the wh-element be simply base-generated as a spell-out of +wh features in the matrix, rather than moved from the lower clause. This in turn would require the movement of a null operator in the lower clause, and the indexing of this element with the matrix wh-element. In the third stage the grammar would be identical to the adult one.
(160) a. Stage I: Who did he see t?
         (who is a spell-out of wh-features in Comp; no movement)
      b. Stage II:
         i) 1 clause: Who did he see t?
            (movement)
         ii) 2 clause: Who did he think that Oi NP saw t?
            (who spelled out in the matrix Comp; movement of the null operator Oi within the lower clause; additional indexing of who with Oi)
      c. Stage III: Who did he think t that NP saw t?
         (movement throughout)
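The stages in (159)–(160) can be compressed into a final sketch, under my simplifying assumption that each stage is fully characterized by the domain over which wh-movement operates; the printed predictions line up with the staged judgements in (158).

```python
STAGES = {
    "I":   "no movement: simple wh spelled out from +wh in Comp",
    "II":  "movement within CP only; cross-clausal who spelled out in the "
           "matrix and linked to a moved null operator below",
    "III": "adult grammar: movement throughout",
}

def crossover_expected(stage, clauses):
    """Strong Crossover arises only where genuine movement from an
    argument position feeds the derivation."""
    if stage == "I":
        return False            # no movement anywhere
    if stage == "II":
        return clauses == 1     # movement only clause-internally
    return True                 # Stage III: movement everywhere

for stage in ("I", "II", "III"):
    print(stage, crossover_expected(stage, 1), crossover_expected(stage, 2))
# I False False    -- no crossover effect at all
# II True False    -- crossover in 1-clause, not 2-clause structures
# III True True    -- adult pattern
```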
Stage I and Stage III have already been extensively discussed. The crucial fact about Stage II is that, by assuming that CP is an absolute barrier to wh-movement at this stage, the wh-element is forced to be generated in the matrix Comp node rather than moved from below. This would then mean that Strong Crossover would not occur in these structures, while at the same time it would for 1-clause structures, which is Roeper et al.'s result. The crucial fact is that the stage-like progression in the acquisition data is matched by a stage-like progression in the extraction domain of wh-elements. A similar sort of solution is not available if one assumes that the change is in category type: pro → wh-trace. To summarize, I have suggested in this chapter that the grammar is essentially in the derivational mode, and that reflexes of this may be seen in the intermediate grammars that the child adopts. Let me close, however, by considering a point of contact between the above analysis and that of Roeper et al. Let us suppose that, in the derivational mode, the notion of wh-trace is not defined in terms of contextual features (e.g. a Case-assigned empty category), but rather in terms of its history. Something is a trace, in this sense, if it is a null category left by movement: it is a wh-trace if it is a null category left by movement to an A′-position. This is of course different from the contextual definition sometimes adopted or suggested (e.g. Chomsky 1981, 1982), but is itself viable and rather
straightforward. Consider now what happens under the conditions that we have been suggesting: that the child computes, in some instances, a shallow derivation, anchored in S-structure, but only receding back to DS′.

(161) Child's analysis:
      DS′: Who did he say Bill liked e?
      SS: Whoi did hei say Bill liked ei?

The derivation is clear, but what is the nature of the null category? According to the definition above, it could not be a wh-trace, even though it "looks like" a wh-trace in the adult grammar, simply because a wh-trace is an element left by wh-movement, and no wh-movement has taken place in the derivation. What then is it? In fact, it is not clear what the null category would be. One possibility, and a fairly likely one, is that the category is simply little pro, at least at DS′. For it is Case-marked, as little pro is, and it is not derived by wh-movement, as is also the case for little pro. If we assume that elements retain their character over a derivation (Brody 1984), and we also allow the possibility of an A′-bound little pro (Roeper et al. 1986), then the element would also be little pro at S-structure: the A′-bound pro of Roeper et al. The theoretical ramifications of this sort of account are, I believe, quite interesting. It would mean that both the derivational and representational views were correct: the derivation is shallow, and so the element is defined as little pro. However, unlike the pure representational account, or even a representationally based account like that of Roeper et al., the representational assignment as little pro is based on the analysis in the derivational mode. That is, it is the shallowness of the derivation which requires the child to analyze the null element as little pro (or at least not as wh-trace), since wh-trace is defined by the position from which A′-movement has taken place. In this way it is possible to hold to the basic insight of the Roeper et al. approach, namely that the initial wh-trace is not treated as such, without holding to the view that this is the deepest level of explanatory analysis. That has to do with the shallowness of the computed derivation, and it is because of this shallowness that the null category is, as a result, analyzed as little pro. This analysis would also have the following effect: when the child's grammar fails, perhaps for computational reasons, it falls into another grammar in UG. In this case the grammar would be one in which the element was a bound little pro. While the details in such a case require further research, the general architecture seems clear.
References
Abney, S. (1987a). The English Noun Phrase in Its Sentential Aspect. Ph.D. dissertation, Massachusetts Institute of Technology.
Abney, S. (1987b). Licensing and Parsing. Proceedings of NELS 17.
Abney, S. and J. Cole (1985). A Government-Binding Parser. Proceedings of NELS 15.
Akmajian, A., S. Steele, and T. Wasow (1979). The Category Aux in Universal Grammar. Linguistic Inquiry 10.1, 1–64.
Anderson, M. (1979). Noun Phrase Structure. Ph.D. dissertation, University of Connecticut.
Aoun, Y. (1985). A Grammar of Anaphora. Cambridge, Mass.: MIT Press.
Aoun, Y. and D. Sportiche (1981). On the Formal Theory of Government. The Linguistic Review 2.3, 211–236.
Aoun, Y., N. Hornstein, D. Lightfoot, and A. Weinberg (1987). Two Types of Locality. Linguistic Inquiry 18.4, 537–577.
Bach, E. (1962). The Order of Elements in a Transformational Grammar of German. Language 38, 263–269.
Bach, E. (1977). The Position of the Embedding Transformation in the Grammar Revisited. Linguistic Structures Processing, edited by A. Zampoli. New York: North-Holland.
Bach, E. (1979). Control and Montague Grammar. Linguistic Inquiry 10.4, 515–531.
Bach, E. (1983). On the Relationship between Word Grammar and Phrase Grammar. Natural Language and Linguistic Theory 1, 65–89.
Baker, C. L. (1977). Comments on Culicover and Wexler. Formal Syntax, edited by P. Culicover, T. Wasow, and A. Akmajian. New York: Academic Press.
Baker, C. L. (1979). Syntactic Theory and the Projection Problem. Linguistic Inquiry 10.4, 533–581.
Baker, C. L. and J. J. McCarthy, eds. (1981). The Logical Problem of Language Acquisition. Cambridge, Mass.: MIT Press.
Baker, M. (1985). Incorporation: A Theory of Grammatical Function Changing. Ph.D. dissertation, Massachusetts Institute of Technology.
Bar-Adon, A. and W. Leopold, eds. (1971). Child Language: A Book of Readings. New York: Prentice-Hall.
Barss, A. (1985). Chain-Binding. Presentation given at the West Coast Conference on Formal Linguistics.
Barss, A. (1986). Chains and Anaphoric Dependencies. Ph.D. dissertation, Massachusetts Institute of Technology.
Barss, A. and H. Lasnik (1986). A Note on Anaphora and Double Objects. Linguistic Inquiry 17.2, 347–354.
Belletti, A. and L. Rizzi (1988). Psych-verbs and θ-theory. Natural Language and Linguistic Theory 6.3, 291–352.
Berwick, R. (1985). The Acquisition of Syntactic Knowledge. Cambridge, Mass.: MIT Press.
Berwick, R. and A. Weinberg (1984). The Grammatical Basis of Linguistic Performance. Cambridge, Mass.: MIT Press.
Bever, T. (1970). The Cognitive Basis for Linguistic Structures. Cognition and the Development of Language, edited by J. R. Hayes, 279–352. New York: Wiley.
Bierwisch, M. (1963). Grammatik des deutschen Verbs. Studia Grammatica. Berlin, GDR.
Bloom, L. (1970). Language Development: Form and Function in Emerging Grammars. Cambridge, Mass.: MIT Press.
Bloom, L., P. Lightbown, and L. Hood (1975). Structure and Variation in Child Language. Monographs of the Society for Research in Child Development 40.
Borer, H. (1984). The Projection Principle and Rules of Morphology. Proceedings of NELS 14.
Borer, H. (1985). The Lexical Learning Hypothesis and Universal Grammar. Boston University Conference on Language Development, Boston, Mass.
Borer, H. and K. Wexler (1987). The Maturation of Syntax. Parameter-Setting, edited by T. Roeper and E. Williams. Dordrecht: Reidel.
Bouchard, D. (1984). On the Content of Empty Categories. Dordrecht: Foris.
Bowerman, M. (1973). Early Syntactic Development. Cambridge, England: Cambridge University Press.
Bowerman, M. (1974). Learning the Structure of Causative Verbs. Papers and Reports on Child Language Development 8, edited by E. Clark, 142–178. Stanford, Calif.: Stanford University.
Bowerman, M. (1982). Reorganizational Processes in Lexical and Syntactic Development. Language Acquisition: The State of the Art, edited by E. Wanner and L. Gleitman, 319–347. Cambridge, England: Cambridge University Press.
Bradley, D. (1979). Computational Distinctions of Vocabulary Type. Ph.D. dissertation, Massachusetts Institute of Technology.
Braine, M. D. S. (1963). The Ontogeny of English Phrase Structure: The First Phase. Reprinted in Child Language: A Book of Readings, edited by A. Bar-Adon and W. Leopold. New York: Prentice-Hall.
Braine, M. D. S. (1963). On Learning the Grammatical Order of Words. Reprinted in Child Language: A Book of Readings, edited by A. Bar-Adon and W. Leopold. New York: Prentice-Hall.
Braine, M. D. S. (1965). Learning the Positions of Words Relative to a Marker Element. Journal of Experimental Psychology 72.4, 532–540.
Braine, M. D. S. (1976). Children's First Word Combinations. Monographs of the Society for Research in Child Development 41, 1–96.
Bresnan, J. (1977). Variables in the Theory of Transformations. Formal Syntax, edited by P. Culicover, T. Wasow, and A. Akmajian. New York: Academic Press.
Bresnan, J. (1978). A Realistic Transformational Grammar. Linguistic Theory and Psychological Reality, edited by M. Halle, J. Bresnan, and G. Miller. Cambridge, Mass.: MIT Press.
Bresnan, J., ed. (1982). The Mental Representation of Grammatical Relations. Cambridge, Mass.: MIT Press.
Brody, M. (1984). On Contextual Definitions and the Role of Chains. Linguistic Inquiry 15.3, 355–380.
Brown, R. (1973). A First Language: The Early Stages. Cambridge, Mass.: Harvard University Press.
Brown, R. and U. Bellugi (1964). Three Processes in the Child's Acquisition of Syntax. Reprinted in Child Language: A Book of Readings, edited by A. Bar-Adon and W. Leopold. New York: Prentice-Hall.
Browning, M. (1987). Null Operator Constructions. Ph.D. dissertation, Massachusetts Institute of Technology.
Burzio, L. (1986). Italian Syntax. Dordrecht: Reidel.
Carden, G. (1986a). Blocked Forwards Coreference and Unblocked Forwards Anaphora: Evidence for an Abstract Model of Coreference. Papers from the Regional Meeting of CLS 22, 262–276.
Carden, G. (1986b). Blocked Forwards Coreference: Theoretical Implications of the Acquisition Data. Studies in the Acquisition of Anaphora, Vol. I, edited by B. Lust. Dordrecht: Reidel.
Carden, G. and T. Dietrich (1981). Introspection, Observation, and Experiment. Proceedings of the 1980 Biennial Meeting of the Philosophy of Science Association, East Lansing, MI, edited by R. Giere, 583–597.
Chierchia, G. (1984). Topics in the Syntax and Semantics of Infinitives and Gerunds. Ph.D. dissertation, University of Massachusetts.
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton.
Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, Mass.: MIT Press.
Chomsky, N. (1970). Remarks on Nominalization. Reprinted in Studies in Semantics in Generative Grammar, papers by N. Chomsky. The Hague: Mouton.
Chomsky, N. (1972). Studies in Semantics in Generative Grammar. The Hague: Mouton.
Chomsky, N. (1973). Conditions on Transformations. A Festschrift for Morris Halle, edited by S. Anderson and P. Kiparsky, 223–286. New York: Holt, Rinehart, and Winston.
Chomsky, N. (1975 [1955]). The Logical Structure of Linguistic Theory. New York: Plenum Press.
Chomsky, N. (1975). Reflections on Language. New York: Pantheon.
Chomsky, N. (1977a). On Wh-Movement. Formal Syntax, edited by P. Culicover, T. Wasow, and A. Akmajian, 71–132. New York: Academic Press.
Chomsky, N. (1977b). Essays on Form and Interpretation. New York: North-Holland.
Chomsky, N. (1980). On Binding. Linguistic Inquiry 11.1, 1–46.
Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. (1986a). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Chomsky, N. (1986b). Barriers. Cambridge, Mass.: MIT Press.
Chomsky, N. (1993). A Minimalist Program for Linguistic Theory. The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, edited by K. Hale and S. J. Keyser, 1–52. Cambridge, Mass.: MIT Press.
Chomsky, N. (1995). The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, N. and H. Lasnik (1977). Filters and Control. Linguistic Inquiry 8.3, 425–504.
Clark, E. (1973). What's in a Word? On the Child's Acquisition of Semantics in His First Language. Cognitive Development and the Acquisition of Language, edited by T. E. Moore, 65–110. New York: Academic Press.
Clark, H. and E. Clark (1977). Psychology and Language: An Introduction to Psycholinguistics. New York: Harcourt Brace Jovanovich.
Clark, R. (1986). Boundaries and the Treatment of Control. Ph.D. dissertation, UCLA.
Cooper, R. (1978). Montague's Semantic Theory and Transformational Grammar. Ph.D. dissertation, University of Massachusetts.
Crain, S. and C. McKee (1985). Acquisition of Structural Constraints on Anaphora. Proceedings of NELS 16.
Culicover, P. (1967). The Treatment of Idioms within a Transformational Framework. IBM Technical Report.
Culicover, P., T. Wasow, and A. Akmajian, eds. (1977). Formal Syntax. New York: Academic Press.
Culicover, P. and K. Wexler (1977). A Degree-2 Theory of Learnability. Formal Syntax, edited by P. Culicover, T. Wasow, and A. Akmajian. New York: Academic Press.
Davis, H. (1985). Syntactic Undergeneration in the Acquisition of English: Wh-constructions and the ECP. Proceedings of NELS 16.
Dowty, D., R. Wall, and S. Peters (1979). Introduction to Montague Semantics. Dordrecht: Reidel.
Drozd, K. (1987). Minimal Syntactic Structures in Child Language. Manuscript, Tucson, Ariz.: University of Arizona.
Drozd, K. (1994). A Unification Categorial Grammar of Child English Negation. Ph.D. dissertation, University of Arizona.
Emonds, J. (1975). A Transformational Approach to English Syntax. New York: Academic Press.
Emonds, J. (1985). A Unified Theory of Syntactic Categories. Dordrecht: Foris.
Epstein, S. (1984). Quantifier-pro and the LF Interpretation of PRO_arb. Linguistic Inquiry 15.3, 499–505.
Farmer, A. (1984). Modularity in Syntax: A Study of Japanese and English. Cambridge, Mass.: MIT Press.
Fiengo, R. (1980). Surface Structure: The Interface of Autonomous Components. Cambridge, Mass.: Harvard University Press.
Fiengo, R. and J. Higginbotham (1981). Opacity in NP. Linguistic Analysis 7.4, 395–421.
Finer, D. and E. Broselow (1985). The Acquisition of Binding in Second Language Learning. Proceedings of NELS 15.
Fodor, J. A. (1975). The Language of Thought. New York: Crowell.
Fodor, J. A. (1981). Methodological Solipsism as a Research Strategy in Psycholinguistics. Representations, edited by J. Fodor. Cambridge, Mass.: MIT Press.
Fodor, J. A., T. G. Bever, and M. F. Garrett (1974). The Psychology of Language. New York: McGraw-Hill.
Fodor, J. D. (1977). Semantics: Theories of Meaning in Generative Grammar. New York: Crowell.
Frank, R. (1992). Syntactic Locality and Tree Adjoining Grammar. Ph.D. dissertation, University of Pennsylvania.
Franks, S. and N. Hornstein (1990). Governed PRO. McGill Working Papers in Linguistics 6.3, 167–191.
Frazier, L. (1979). Some Notes on Parsing. University of Massachusetts Occasional Papers in Linguistics. Amherst, Mass.: University of Massachusetts.
Freidin, R. (1978). Cyclicity and the Theory of Grammar. Linguistic Inquiry 9.4, 519–549.
Freidin, R. (1986). Fundamental Issues in the Theory of Binding. Studies in the Acquisition of Anaphora, edited by B. Lust, 151–191. Dordrecht: Kluwer.
Fukui, N. and M. Speas (1986). Specifiers and Projections. MIT Working Papers in Linguistics 8, 128–172.
Garrett, M. F. (1975). The Analysis of Sentence Production. The Psychology of Learning and Motivation 9, edited by G. H. Bower, 133–177. New York: Academic Press.
Garrett, M. F. (1980). Levels of Processing in Sentence Production. Language Production, vol. 1, edited by B. Butterworth, 177–220. New York: Academic Press.
Gleitman, L. and H. Gleitman (1990). Structural Sources of Verb Meaning. Language Acquisition 1, 3–55.
Goodall, G. (1984). Parallel Structures in Syntax. Ph.D. dissertation, University of California at San Diego.
Goodall, G. (1985–1986). Parallel Structures in Syntax. The Linguistic Review 5.2, 173–184.
Goodluck, H. (1978). Linguistic Principles in Children's Grammar of Complement Subject Interpretation. Ph.D. dissertation, University of Massachusetts.
Goodluck, H. and S. Tavakolian (1982). Competence and Processing in Children's Grammar of Relative Clauses. Cognition 16, 1–28.
Grimshaw, J. (1981). Form, Function, and the Language Acquisition Device. The Logical Problem of Language Acquisition, edited by C. L. Baker and J. J. McCarthy. Cambridge, Mass.: MIT Press.
Grimshaw, J. (1986). Nouns, Arguments, and Adjuncts. MIT Working Papers in Linguistics. Cambridge, Mass.: Massachusetts Institute of Technology.
Gruber, J. (1967). Topicalization in Child Language. Reprinted in Child Language: A Book of Readings, edited by A. Bar-Adon and W. Leopold. New York: Prentice-Hall.
Guilfoyle, E. (1985). The Acquisition of Tense and the Emergence of Lexical Subjects in Child Grammars of English. McGill Working Papers in Linguistics.
Hale, K. (1979). On the Position of Walpiri in a Typology of the Base. Bloomington, Ind.: Indiana University Linguistics Club.
Hale, K. (1983). Warlpiri and the Grammar of Nonconfigurational Languages. Natural Language and Linguistic Theory 1, 5–48.
Hale, K. and J. Keyser (1986a). On the Syntax of Argument Structure. Lexicon Project Working Papers 34. MIT Working Papers in Linguistics. Cambridge, Mass.: Massachusetts Institute of Technology.
Hale, K. and J. Keyser (1986b). Some Transitivity Alternations in English. Lexicon Project Working Papers 7. MIT Working Papers in Linguistics. Cambridge, Mass.: Massachusetts Institute of Technology.
Hamburger, H. and S. Crain (1982). Relative Acquisition. Language Development: Volume 1, edited by S. Kuczaj II, 245–274. Mahwah, NJ: Lawrence Erlbaum.
Hayes, B. (1980). A Metrical Theory of Stress Rules. Ph.D. dissertation, Massachusetts Institute of Technology.
Higginbotham, J. (1985). On Semantics. Linguistic Inquiry 16, 547–594.
Hoji, H. (1983). X^n → (YP) X^(n−1) and the Bound Variable Zibun. MIT Workshop on Japanese Linguistics. Cambridge, Mass.: Massachusetts Institute of Technology.
Hoji, H. (1985). Logical Form Constraints and Configurational Structures in Japanese. Ph.D. dissertation, University of Washington.
Hoji, H. (1986). Empty Pronominals in Japanese and the Subject of NP. Proceedings of NELS 17.
Hornstein, N. (1985–1986). Restructuring and Interpretation in a T-model. The Linguistic Review 5.4, 301–334.
Hornstein, N. (1987). Levels of Meaning. Modularity in Knowledge Representation and Natural-Language Understanding, edited by J. Garfield. Cambridge, Mass.: MIT Press.
Huang, J. (1982). Logical Relations in Chinese and the Theory of Grammar. Ph.D. dissertation, Massachusetts Institute of Technology.
Huang, J. (1993). Reconstruction and the Structure of the VP: Some Theoretical Consequences. Linguistic Inquiry 24.1, 103–138.
Hyams, N. (1985). Language Acquisition and the Theory of Parameters. Ph.D. dissertation, City University of New York.
Hyams, N. (1986). Language Acquisition and the Theory of Parameters. Dordrecht: Reidel.
Hyams, N. (1987). The Theory of Parameters and Syntactic Development. Parameter-Setting, edited by T. Roeper and E. Williams, 1–22. Dordrecht: Reidel.
Jackendoff, R. (1972). Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press.
Jackendoff, R. (1977). X′-Syntax: A Study of Phrase Structure. Cambridge, Mass.: MIT Press.
Jackendoff, R. (1983). Semantics and Cognition. Cambridge, Mass.: MIT Press.
Jackendoff, R. (1988). Consciousness and the Computational Mind. A Bradford Book. Cambridge, Mass.: MIT Press.
Jaeggli, O. (1986). Passive. Linguistic Inquiry 17.4, 587–622.
Jakubowicz, C. (1984). On Markedness and Binding Principles. Proceedings of NELS 14, 154–182.
Jelinek, E. (1984). Empty Categories, Case, and Configurationality. Natural Language and Linguistic Theory 2.1, 39–76.
Jelinek, E. (1985). The Projection Principle and the Argument Type Parameter. Paper presented at the Linguistic Society of America winter meeting.
Johnson, K. (1986). Subjects and Theta Theory. Manuscript, Cambridge, Mass.: Massachusetts Institute of Technology.
Joshi, A. (1985). Tree-Adjoining Grammars: How Much Context Sensitivity Is Required to Provide Reasonable Structural Descriptions? Natural Language Parsing, edited by D. Dowty, L. Karttunen, and A. Zwicky. Cambridge, Eng.: Cambridge University Press.
Joshi, A., L. Levy, and M. Takahashi (1975). Tree Adjunct Grammars. Journal of Computer and System Sciences 10.
Joshi, A. and A. Kroch (1985). The Linguistic Relevance of Tree-Adjoining Grammars. Technical Report MS-CS-85-16, Department of Computer and Information Science, University of Pennsylvania.
Kayne, R. (1981). ECP Extensions. Linguistic Inquiry 12, 93–133.
Kayne, R. (1983). Connectedness. Linguistic Inquiry 14, 223–249.
Kayne, R. (1984). Connectedness and Binary Branching. Dordrecht: Foris.
Kayne, R. (1985). Principles of Particle Constructions. Grammatical Representation, edited by J. Guéron, H.-G. Obenauer, and J. Pollock, 101–140. Dordrecht: Foris.
Kegl, J. and J. Gee (undated). ASL Structure: Toward a Theory of Abstract Case. Manuscript, Cambridge, Mass.: Massachusetts Institute of Technology.
Keyser, J. and T. Roeper (1984). On the Middle and Ergative Constructions in English. Linguistic Inquiry 15, 381–416.
Kiparsky, P. (1982a). From Cyclic Phonology to Lexical Phonology. The Structure of Phonological Representations, edited by H. van der Hulst and N. Smith, 131–177. Dordrecht: Foris.
Kiparsky, P. (1982b). Lexical Morphology and Phonology. Linguistics in the Morning Calm, edited by I.-S. Yang, 3–93. Seoul: Hanshin.
Kitagawa, Y. (1986). Subjects in English and Japanese. Ph.D. dissertation, University of Massachusetts.
Klein, S. (1982). Syntactic Theory and the Developing Grammar: Reestablishing the Relationship between Linguistic Theory and Data from Language Acquisition. Ph.D. dissertation, University of California at Los Angeles.
Klima, E. and U. Bellugi (1966). Syntactic Regularities in the Speech of Children. Psycholinguistic Papers, edited by J. Lyons and R. Wales, 183–208. Edinburgh: Edinburgh University Press.
Koopman, H. (1984). The Syntax of Verbs. Dordrecht: Foris.
Koster, J. (1975). Dutch as an SOV Language. Linguistic Analysis 1, 111–136.
Koster, J. (1978). Locality Principles in Syntax. Dordrecht: Foris.
Koster, J. (1982). Class lectures, Salzburg Institute of Summer Linguistics, Salzburg, Austria.
Koster, J. (1984). On Binding and Control. Linguistic Inquiry 15, 417–459.
Koster, J. (1987). Domains and Dynasties. Dordrecht: Foris.
Kroch, A. and A. Joshi (1988). Analyzing Extraposition in a Tree Adjoining Grammar. Discontinuous Constituency [Syntax and Semantics 20], edited by G. Huck and A. Ojeda. New York: Academic Press.
Labov, W. and T. Labov (1976). The Learning of Syntax from Questions. Zeitschrift für Literaturwissenschaft und Linguistik 6, 47–82.
Ladusaw, W. (1985). A Proposed Distinction between Levels and Strata. Paper presented at the Linguistic Society of America winter meeting.
Lapointe, S. (1978). A Theory of Grammatical Agreement. Ph.D. dissertation, University of Massachusetts.
Lapointe, S. (1985a). A Model of Syntactic Phrase Combination in Speech Production. Proceedings of NELS 15.
Lapointe, S. (1985b). A Theory of Verb Form Use in the Speech of Agrammatic Aphasics. Brain and Language 24.1, 100–155.
Laporte-Grimes, L. and D. Lebeaux (1993). Complexity Considerations in Early Speech. Manuscript, University of Connecticut and University of Maryland.
Lasnik, H. (1986). Two Types of Condition C. Presentation at the Princeton Conference on Linguistic Theory, Princeton, NJ.
Lasnik, H. and S. Crain (1985). On the Acquisition of Pronominal Reference. Lingua 65, 135–154.
Lasnik, H. and M. Saito (1984). On the Nature of Proper Government. Linguistic Inquiry 15, 235–289.
Lebeaux, D. (1981). The Acquisition of the Passive. M.A. thesis, Harvard University.
Lebeaux, D. (1982). Submaximal Projections. Manuscript, Amherst, Mass.: University of Massachusetts.
Lebeaux, D. (1983). A Distributional Difference between Reciprocals and Reflexives. Linguistic Inquiry 14.
Lebeaux, D. (1984). Anaphoric Binding and the Definition of PRO. Proceedings of NELS 14.
Lebeaux, D. (1984–1985). Locality and Anaphoric Binding. The Linguistic Review 4, 343–363.
Lebeaux, D. (1986). The Interpretation of Derived Nominals. Papers from the Regional Meeting of the Chicago Linguistic Society 22, 231–247.
Lebeaux, D. (1987). Comments on Hyams. Parameter-Setting, edited by T. Roeper and E. Williams. Dordrecht: Reidel.
Lebeaux, D. (1987). The Composition of Phrase Structure. Presentation, Tucson, Ariz.: University of Arizona.
Lebeaux, D. (1988). The Feature +Affected and the Formation of the Passive. Thematic Relations [Syntax and Semantics 21], edited by W. Wilkins. New York: Academic Press.
Lebeaux, D. (1988). Language Acquisition and the Form of the Grammar. Ph.D. dissertation, University of Massachusetts, Amherst.
Lebeaux, D. (in preparation). Appeared as Lebeaux, D. (1991). Relative Clauses, Licensing, and the Nature of the Derivation. Perspectives on Phrase Structure: Heads and Licensing [Syntax and Semantics 25], edited by S. Rothstein. New York: Academic Press.
Lebeaux, D. (1997). Determining the Kernel II: Prosodic Form, Syntactic Form, and Phonological Bootstrapping. NEC Technical Report 97-094. Princeton, NJ: NEC Research Institute.
Lebeaux, D. (1998). Where Does the Binding Theory Apply? NEC Technical Report 98-015. Princeton, NJ: NEC Research Institute.
Lebeaux, D. (to appear). A Subgrammar Approach to Language Acquisition. NEC Technical Report.
Levin, B. (1983). On the Nature of Ergativity. Ph.D. dissertation, Massachusetts Institute of Technology.
Lust, B. and L. Mangione (1984). The Principal Branching Direction Constraint in First Language Acquisition of Anaphora. Proceedings of NELS 14.
Lust, B., ed. (1986). Studies in the Acquisition of Anaphora. Dordrecht: Reidel.
Manzini, R. (1983). On Control and Control Theory. Linguistic Inquiry 14, 421–446.
Marantz, A. (1980). Whither Move NP. MIT Working Papers in Linguistics. Cambridge, Mass.: Massachusetts Institute of Technology.
Marantz, A. (1982). On the Acquisition of Grammatical Relations. Linguistische Berichte: Linguistik als Kognitive Wissenschaft 80/82, 32–69.
Marantz, A. (1984). On the Nature of Grammatical Relations. Cambridge, Mass.: MIT Press.
Maratsos, M., S. Kuczaj II, D. Fox, and M. Chalkley (1979). Some Empirical Studies in the Acquisition of Transformational Relations: Passives, Negatives, and the Past Tense. Minnesota Symposium on Child Psychology 12, edited by W. Collins. Mahwah, NJ: Lawrence Erlbaum.
Maratsos, M., D. Fox, J. Becker, and M. Chalkley (1985). Semantic Restrictions in Children's Passives. Cognition 19, 167–192.
Marcus, M., D. Hindle, and M. Fleck (1983). D-theory: Talking about Talking about Trees. Association for Computational Linguistics 21, 129–136.
May, R. (1985). Logical Form. Cambridge, Mass.: MIT Press.
McCawley, J. (1984). Anaphora and Notions of Command. Proceedings of the Tenth Annual Meeting of the Berkeley Linguistics Society. Berkeley, Calif.: University of California.
McNeill, D. (1970). The Acquisition of Language: The Study of Developmental Psycholinguistics. New York: Harper and Row.
Miller, G. and K. McKean (1964). A Chronometric Study of Some Relations between Sentences. Quarterly Journal of Experimental Psychology 16, 297–308.
Mohanan, K. P. (1982). Lexical Phonology. Ph.D. dissertation, Massachusetts Institute of Technology.
Montague, R. (1974). Formal Philosophy, edited by R. Thomason. New Haven, Conn.: Yale University Press.
Morgan, J., R. Meier, and E. Newport (1987). Structural Packaging in the Input to Language Learning: Contributions of Prosodic and Morphological Marking of Phrases to the Acquisition of Language. Cognitive Psychology 19.4, 498–550.
Newport, E., L. Gleitman, and H. Gleitman (1977). Mother, I'd Rather Do It Myself: Some Effects and Non-effects of Maternal Speech Style. Talking to Children: Language Input and Acquisition, edited by C. E. Snow and C. Ferguson. Cambridge: Cambridge University Press.
Nishigauchi, T. (1984). Control and the Thematic Domain. Language 60, 215–250.
Partee, B. H. (1979). Montague Grammar and the Well-Formedness Constraint. Selections from the Third Groningen Round Table [Syntax and Semantics 10], edited by F. Heny and H. Schnelle, 275–313. New York: Academic Press.
Partee, B. H. (1984). Compositionality. Varieties of Formal Semantics: Proceedings of the 4th Amsterdam Colloquium, edited by F. Landman and F. Veltman, 281–311. Dordrecht: Foris.
Pesetsky, D. (1982). Paths and Categories. Ph.D. dissertation, Massachusetts Institute of Technology.
Pesetsky, D. (1985). Morphology and Logical Form. Linguistic Inquiry 16, 193–246.
Pinker, S. (1979). A Theory of the Acquisition of Lexical Interpretive Grammars. MIT Lexicon Project.
Pinker, S. (1984). Language Learnability and Language Development. Cambridge, Mass.: Harvard University Press.
Pinker, S. and D. Lebeaux (1982). A Learnability-Theoretic Approach to Language Acquisition. Manuscript, Cambridge, Mass.: Harvard University.
Postal, P. (1974). On Raising. Cambridge, Mass.: MIT Press.
Powers, S. and D. Lebeaux (1998). Data on DP Acquisition. Issues in the Theory of Language Acquisition, edited by N. Dittmar and Z. Penner, 37–76. Bern: Peter Lang.
Pustejovsky, J. (1984). Studies in Generalized Binding. Ph.D. dissertation, University of Massachusetts.
Radford, A. (1981). Transformational Syntax. Cambridge, Eng.: Cambridge University Press.
Randall, J. (1985). Morphological Structure and Language Acquisition. New York: Garland Press.
Reinhart, T. (1983). Anaphora and Semantic Interpretation. Chicago: University of Chicago Press.
Riemsdijk, H. van and E. Williams (1981). NP-structure. The Linguistic Review 1.
Rizzi, L. (1982). Issues in Italian Syntax. Dordrecht: Foris.
Rizzi, L. (1986a). On Chain Formation. The Syntax of Pronominal Clitics [Syntax and Semantics 19], edited by H. Borer, 65–95. New York: Academic Press.
Rizzi, L. (1986b). Null Objects in Italian and the Theory of pro. Linguistic Inquiry 17, 501–558.
Roberts, I. (1986). Implicit and Dethematized Subjects. Ph.D. dissertation, University of Southern California.
Roeper, T. (1974). Ph.D. dissertation, Harvard University.
Roeper, T. and M. Siegel (1978a). A Lexical Transformation for Verbal Compounds. Linguistic Inquiry 9, 199–260.
Roeper, T. (1978b). Linguistic Universals and the Acquisition of Gerunds. University of Massachusetts Occasional Papers 4, edited by H. Goodluck and L. Solan. Amherst, Mass.: University of Massachusetts.
Roeper, T. (1982). The Role of Universals in the Acquisition of Gerunds. Language Acquisition: The State of the Art, edited by E. Wanner and L. Gleitman, 267–288. Cambridge, Eng.: Cambridge University Press.
Roeper, T. (1983). Implicit Theta Roles in the Lexicon and Syntax. Manuscript, Amherst, Mass.: University of Massachusetts.
Roeper, T. (1987). Implicit Arguments and the Head-Complement Relation. Linguistic Inquiry 18.2, 267–310.
Roeper, T., S. Akiyama, L. Mallis, and M. Rooth (1986). The Problem of Empty Categories and Bound Variables in Language Acquisition. Manuscript, Amherst, Mass.: University of Massachusetts.
Roeper, T. and J. Keyser (1984). On the Middle and Ergative Constructions in English. Linguistic Inquiry 15, 381–416.
Roeper, T. and E. Williams, eds. (1987). Parameter-Setting. Dordrecht: Reidel.
Rosenbaum, P. (1967). The Grammar of English Predicate Complement Constructions. Cambridge, Mass.: MIT Press.
Ross, J. R. (1968). Constraints on Variables in Syntax. Ph.D. dissertation, Massachusetts Institute of Technology.
Rothstein, S. (1983). The Syntactic Forms of Predication. Ph.D. dissertation, Massachusetts Institute of Technology.
Rozwadowska, B. (1986). Thematic Relations in Derived Nominals. Thematic Relations [Syntax and Semantics 21], edited by W. Wilkins. New York: Academic Press.
Safir, K. (1982). Inflection-Government and Inversion. The Linguistic Review 1.4, 417–467.
Safir, K. (1987). The Syntactic Projection of Lexical Thematic Structure. Natural Language and Linguistic Theory 5.4, 561–611.
Safir, K. (1987). Comments on Manzini and Wexler. Parameter-Setting, edited by T. Roeper and E. Williams. Dordrecht: Reidel.
Saito, M. and H. Hoji (1983). Weak Crossover and Move-α in Japanese. Natural Language and Linguistic Theory 1, 245–260.
Schlesinger, I. M. (1971). Production of Utterances and Language Acquisition. The Ontogenesis of Grammar, edited by D. Slobin, 63–101. New York: Academic Press.
Seely, T. D. (1989). Anaphoric Relations, Chains, and Paths. Ph.D. dissertation, University of Massachusetts.
Selkirk, E. (1984). Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: MIT Press.
Shattuck, S. R. (1974). Speech Errors: An Analysis. Ph.D. dissertation, Massachusetts Institute of Technology.
Sheldon, A. (1974). The Role of Parallel Function in the Acquisition of Relative Clauses. Journal of Verbal Learning and Verbal Behavior 13, 272–281.
Solan, L. (1983). Pronominal Reference: Child Language and the Theory of Grammar. Dordrecht: Reidel.
Solan, L. and T. Roeper (1978). Children's Use of Syntactic Structure in Interpreting Relative Clauses. Papers in the Structure and Development of Child Language [University of Massachusetts Occasional Papers 4], edited by H. Goodluck and L. Solan, 105–126. Amherst, Mass.: University of Massachusetts.
Speas, M. (1990). Phrase Structure in Natural Language. Dordrecht: Reidel.
Sportiche, D. (1983). Structural Invariance and Symmetry in Syntax. Ph.D. dissertation, Massachusetts Institute of Technology.
Sportiche, D. (1987). Unifying Movement Theory. Manuscript, Los Angeles: University of Southern California.
Sportiche, D. (1988). A Theory of Floating Quantifiers and Its Consequences for Constituent Structure. Linguistic Inquiry 19, 425–450.
Sproat, R. (1985). On Deriving the Lexicon. Ph.D. dissertation, Massachusetts Institute of Technology.
Sproat, R. (1985). The Projection Principle and the Syntax of Synthetic Compounds. Proceedings of NELS 16.
Steele, S. (in preparation). A Grammar of Luiseño.
Stowell, T. (1981). The Origins of Phrase Structure. Ph.D. dissertation, Massachusetts Institute of Technology.
Stowell, T. (1981/1982). A Formal Theory of Configurational Phenomena. Proceedings of NELS 12.
Stowell, T. (1983). Subjects across Categories. The Linguistic Review 2, 285–312.
Stowell, T. (1988). Small Clause Restructuring. Manuscript, Los Angeles: University of California at Los Angeles.
Tavakolian, S. (1978). Structural Principles in the Acquisition of Complex Sentences. Ph.D. dissertation, University of Massachusetts.
Tavakolian, S. (1981). The Conjoined Clause Analysis of Relative Clauses. Language Acquisition and Linguistic Theory, edited by S. Tavakolian. Cambridge, Mass.: MIT Press.
Tavakolian, S., ed. (1981). Language Acquisition and Linguistic Theory. Cambridge, Mass.: MIT Press.
Thiersch, C. (1978). Topics in German Syntax. Ph.D. dissertation, Massachusetts Institute of Technology.
Travis, L. (1984). Parameters and Effects of Word Order Variation. Ph.D. dissertation, Massachusetts Institute of Technology.
Vainikka, A. (1985). The Acquisition of English Case. Presented at the Boston University Conference on Language Development 10. Later appeared as Vainikka, A. (1993/1994). Case in the Development of English Syntax. Language Acquisition 3, 257–324.
Vainikka, A. (1986). Case in Finnish and Acquisition. Manuscript, Amherst, Mass.: University of Massachusetts.
Vainikka, A. (1986). Nominative Signals Movement. Presentation at the 3rd Workshop on Comparative Germanic Syntax, Turku, Finland.
Vainikka, A. (1988). Manuscript, Amherst, Mass.: University of Massachusetts.
Vergnaud, J.-R. (1985). Dépendances et niveaux de représentation en syntaxe. Amsterdam: John Benjamins.
Wanner, E. and L. Gleitman, eds. (1982). Language Acquisition: The State of the Art. Cambridge, Eng.: Cambridge University Press.
Wasow, T. (1977). Adjectival and Verbal Passive. Formal Syntax, edited by P. Culicover, T. Wasow, and A. Akmajian. New York: Academic Press.
Weinberg, A. (1988). Locality Principles in Syntax and in Parsing. Ph.D. dissertation, Massachusetts Institute of Technology.
Wexler, K. and P. Culicover (1980). Formal Principles of Language Acquisition. Cambridge, Mass.: MIT Press.
Wexler, K. (1982). A Principled Theory for Language Acquisition. Language Acquisition: The State of the Art, edited by E. Wanner and L. Gleitman. Cambridge, Eng.: Cambridge University Press.
Wexler, K. and Y. Chien (1985). The Development of Lexical Anaphors and Pronouns. Papers and Reports on Child Language Development 24, 138–149. Stanford, Calif.: Stanford University.
Wexler, K. and Y. Chien (1987a). Children's Acquisition of Locality Conditions for Reflexives and Pronouns. Papers in Linguistics 26, 30–39. Irvine, Calif.: University of California.
Wexler, K. and R. Manzini (1987b). Parameters and Learnability in Binding Theory. Parameter-Setting, edited by T. Roeper and E. Williams. Dordrecht: Reidel.
Williams, E. (1978). Across-the-Board Rule Application. Linguistic Inquiry 9.1, 31–43.
Williams, E. (1980). Predication. Linguistic Inquiry 11.1, 203–238.
Williams, E. (1981). Argument Structure and Morphology. The Linguistic Review 1, 81–114.
Williams, E. (1982). The NP-Cycle. Linguistic Inquiry 13.2, 277–296.
Williams, E. (1986). A Reassignment of the Functions of LF. Linguistic Inquiry 17.2, 265–299.
Zubizarreta, M.-L. (1987). Levels of Representation in the Lexicon and in the Syntax. Dordrecht: Foris.
Index

A
Abney, 71, 72, 77, 84, 96
Abrogation of deep structure functions, 183–258
Adjoin-α, xv, xix, 91–144
Agreement, 145–153
Akiyama, 204, 239–258
Akmajian, 88
Anchoring
  and deepest computed level, 188–194
  derivation anchored at DS, 184–194
  derivation anchored at SS, 184–194
Anti-Reconstruction Effects, 102–112
Aoun, 21, 43, 44, 244
Argument/Adjunct distinction, 94–136
Argument-linking, 38–51
  and learnability, 38–41
  and the Projection Principle, 104–112
  and the Structure of the Base, 104–112
  and ergative languages, 38–41
  and the derivation, 94–136

B
Bach, 17, 100, 101, 220
Baker, 51
Barss, 224–239
Base Order, 17–29
  Determining the base order, 17–29
Belletti, 219, 241
Bellugi, 26, 27
Bever, 9, 142, 185, 186
Bierwisch, 17
Bloom, 69–70, 79–80, 158
Borer, 15
Bowerman, 11, 28, 92
Bradley, 12, 92
Braine, xxiii, 7–12, 19
Bresnan, 54, 118, 216
Brody, 258
Brown, 70, 92, 154
Browning, 214

C
Canonical structural realization, 36
Carden, xxviii, 204, 224–239
Case representation, 178, 179
Chien, 245, 249
Chierchia, 23
Chomsky, xiii, xiv, xv, xxii, 2, 4, 10, 13–15, 23, 25, 31, 41, 46, 47, 52, 70, 74, 80, 84–86, 93, 96, 97, 100, 114, 116, 118, 128, 141, 145, 149, 150, 152–154, 165, 183, 206, 208, 215, 220, 240–242, 253
Clark, E., 92
Clark, H., 92
Clark, R., 206
Closed class elements, 1–5, 7, 11–16, 151–153
  and set of governors, 12
  and finiteness, 13–14
  and open class elements, 11–15
  link with grammatical operations, 151–153
Cole, 96
Composition of phrase structure, 91–144
  and saturation of closed class elements, 114–120
Condition C, 102–112, 224–239
  and Dislocated Constituents, 224–239
  constraint of direct c-command, 224–229
Conjoin-α, 112–115, 120–136
Constancy principles, 91–94
Control, 203–224
  and abrogation of DS functions, 188–194, 220–224
  c-command constraint, 213–220
  double-binding constructions, 213–220
  early stages, 204–213
  Goodluck's result, 210–211
  representation in early grammar, 220–224
  Tavakolian's result, 206–207
Cooper, 113
Crain, 120, 207
Culicover, 7, 185

D
Deductive system, modelling, 195–203
Deep structure, 104–112
Derivational Endpoints, xiii, xxvii
Derivational model, 91–136
Derivational Theory of Complexity, 184–194
Detecting Movement, 19–22
Dietrich, 224–239
Dislocated constituents and indexing functions, 183–258
Dowty, 100
Drozd, 208–213

E
Early phrase structure building, 31–36, 47–53, 56–84
  from lexical to phrasal syntax, 68–80
  lexical representation, 67
  pivot/open sequence, 58–60, 68–69, 72–79
  thematic representation, 67
Emonds, 17, 154
Epstein, 213
Equipollence, 194–203
Ergative languages, 38–45

F
Farmer, 68
Fiengo, 25, 88
Finiteness, 2, 13–14
  and closed class elements, 13–14
Fixed Specifier Constraint, 19–23
Fleck, 140
Fodor, 9, 128, 185, 186
Frank, xiv
Franks, 217
Frazier, 137
Freidin, 101, 104
Fukui, 27, 28, 72, 84
Full vs. Reduced Paradigm, 211
Functor/Pivot, 68–69, 71–75

G
Garrett, xvi, xxvi, 9, 12, 92, 155–157, 185, 186
General Congruence Principle, 47, 126–136
  and setting of parameters, 126–136
Gleitman, H., 67
Gleitman, L., xxvi, 67
Goodluck, 120, 209–211, 213
Governor, Canonical, 12
Grammatical operations, 145–153
Grammatical Sequence, relative clauses, 142–144
Grimshaw, 32, 35, 47, 51, 95
H
Hale, 32, 49, 52, 54, 55
Hamburger, 120
Higginbotham, 88, 96, 224
Hindle, 140
Hoji, 52, 77, 118
Hornstein, 21, 43, 44, 217
Huang, 13, 42, 104
Hyams, 16, 82, 209, 215, 249

I
Idioms, 165–182
  and passive, 165–182
  Level I idioms, 178–181
  Level II idioms, 178–181

J
Jackendoff, xxviii, 22, 33, 61, 85, 86, 95, 97, 102, 103, 172, 183
Jakubowicz, 245, 247, 249
Jelinek, 38, 40, 41, 46, 52
Johnson, 219
Joshi, xiv, xxii, 78

K
Keyser, 54, 64
Kitagawa, 27, 28
Klein, 26
Klima, 26, 27
Koopman, 12, 18, 55, 91, 166
Koster, 17, 93, 217
Kroch, xiv, xxii, 78

L
Labov, T., 198
Labov, W., 198
Lapointe, 156
Laporte-Grimes, xxvii
Lasnik, 21, 106, 207, 215, 230, 242
Lebeaux, xvi, xxi, xxii, xxv, xxvi, xxvii, xxviii, 32, 34, 81, 82, 85, 86, 88, 92, 149, 174, 212, 213–220, 241
Levels of Representation and parametric variation, 14–15
Levin, 38
Levy, 78
Lexical entry/representation, 50, 53–60, 63, 66–78
  structure of, 50, 53–60, 63, 66–78
  with lexical insertion into open slots, 57–58
Lightfoot, 21, 43, 44
Link of closed class item with grammatical operations, 151–153
Lust, 76

M
Mallis, 204, 239–258
Mangione, 76
Manzini, 206
Marantz, 38, 92
Marcus, 140
Marker, 19–22
McCawley, 224
McKean, 165, 185, 186
McKee, 207
McNeill, 185, 186
Meier, 19
Merge, xxii–xxiv
Merger, 154–182
Metatheoretical Constraint on Indexing, 231
  and chain-binding, 230–234
Miller, 165, 185, 186
Minimalist Program, the, xiii–xxix
Montague, xxi
Morgan, 19

N
Newport, 19, 68
Nishigauchi, 205

O
Open class/closed class distinction, 7–16
P
Parametric variation in phrase structure, 3, 14–15, 16–18, 31–41
  amount of, 3
  and levels, 14–15
  and triggers, 16–18
Parametric variation in Relative Clauses
  and saturation of closed class elements, 114–120
  and structure of parameters, 112–120, 126–136
Peters, 100
Phrase Structural Case, 150
Phrase structure composition, xiii–xxix
Phrase Structure, Building, 31–36, 47–53, 56–84
Pinker, 7, 10, 23, 31–36, 45, 48, 51, 63, 65, 70, 86, 164, 194–197, 212
Pivot/Open Constructions, 58–60, 68–69, 72–75
Pivot/Open distinction, 7–11
  and government relation, 9, 10
Postal, 243
Powers, xxvi
Predication, 146–147
Pre-Project-α representation, 51–83
Principle of Representability, 141
Processing considerations, 136–144
  and ECP, 138–139
  and grammar, 136–144
Project-α, xvi–xxvii, 154–182
Projection of Lexical Structure, 47–53
Projection Principle, 104–112
Property of Smooth Degradation, 141
Pustejovsky, 150

R
Radford, 88
Reconstruction, 234–239
  quasi-Reconstruction, 234–239
  vs. direct approach, 224–229
Reduced structures
  deletion account of, 159
  null item account of, 160
  subgrammar account of, 165–182
Reinhart, 110, 224–226
Relative clauses, 91–144
Relative Clauses, Acquisition
  conjoined clause analysis, 120–126
  default grammar, 123
Replacement Sequences, 69
Rizzi, 219, 241
Roeper, 16, 17, 19, 20, 24, 54, 56, 120–126, 184, 204, 239–258
Rooth, 204, 239–258
Rosenbaum, 205
Ross, 88
Rothstein, 96

S
Safir, 17
Saito, 21, 72, 230
Schlesinger, 11
Seely, 231–235
Selkirk, 88
Semantic bootstrapping, 31–38, 45–47
Sequence of Structures, 69
Shallow analysis/derivation, 184–194
  and Derivational Theory of Complexity, 184–194
Shattuck-Hufnagel, xxvi, 12, 92, 155–157
Sheldon, 120, 121
Siegel, 54
Solan, 120, 125, 126, 184, 207, 227
Speas, xvi, 27, 28, 72, 84
Specified Determiner Generalization, 170–175
Speech errors, xviii, 155–156, 165–169
Sportiche, 21, 27, 28, 213
Steele, 88
Stowell, 10, 12, 32, 37, 49, 52, 54, 55, 95, 99, 114, 138
Strong Crossover, 239–258
  acquisition evidence, 245–258
  and van Riemsdijk and Williams proposal, 243–244
  and wh-questions, 239–258
  derivational account, 247–249, 251–258
  representational account, 247–251
Structure of the Base, 104–112
Subgrammar Approach, xiii–xxix, 53, 67, 75–78, 165–182
Submaximal Projections, 86–87

T
Takahashi, 78
Tavakolian, 120–126, 142, 184, 203–224
Telegraphic speech, 154–182
The placement of Neg
  syntax, 24–26
  acquisition, 26–29
Thematic Representation, 67
Theta representation, 178, 179
Thiersch, 18
Travis, 55, 91
Triggers, 16–29
U
Universal Application of Principles, 245

V
Vainikka, 82
van Riemsdijk, xxviii, 4, 46, 102, 103, 116, 119, 183, 236, 243–258

W
Wall, 100
Wasow, 88
Weinberg, 21, 43, 44, 140
Weisler, 231
Wexler, 7, 185, 245, 247, 249
Williams, xxviii, 5, 46, 56, 66, 96, 102, 103, 111, 113, 116, 119, 146, 147, 183, 205, 206, 213, 215, 224, 236, 243–258

Z
Zubizarreta, 67