Benjamins Current Topics

Special issues of established journals tend to circulate within the orbit of the subscribers of those journals. For the Benjamins Current Topics series a number of special issues have been selected containing salient topics of research, with the aim of widening the readership and giving this interesting material a new lease of life in book format.
Volume 7

What Counts as Evidence in Linguistics: The case of innateness
Edited by Martina Penke and Anette Rosenbach

These materials have been previously published in Studies in Language 28:3 (2004)
What Counts as Evidence in Linguistics
The case of innateness
Edited by
Martina Penke
Anette Rosenbach
Heinrich-Heine-University Düsseldorf
John Benjamins Publishing Company Amsterdam / Philadelphia
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences – Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data

What counts as evidence in linguistics : the case of innateness / edited by Martina Penke, Anette Rosenbach.
    p. cm. -- (Benjamins Current Topics, ISSN 1874-0081 ; v. 7)
Includes index.
1. Linguistic analysis (Linguistics) 2. Grammar, Comparative and general. 3. Innateness hypothesis (Linguistics) I. Penke, Martina. II. Rosenbach, Anette.
P126.W493 2007
410--dc22    2007007384
ISBN 978-90-272-2237-4 (hb : alk. paper)
Table of contents

Preface  vii

What counts as evidence in linguistics? An introduction
    Martina Penke and Anette Rosenbach  1

Typological evidence and Universal Grammar
    Frederick J. Newmeyer  51

Remarks on the relation between language typology and Universal Grammar: Commentary on Newmeyer
    Mark Baltin  75

Does linguistic explanation presuppose linguistic description?
    Martin Haspelmath  81

Remarks on description and explanation in grammar: Commentary on Haspelmath
    Judith Aissen and Joan Bresnan  109

Author’s response
    Martin Haspelmath  113

From UG to Universals: Linguistic adaptation through iterated learning
    Simon Kirby, Kenny Smith and Henry Brighton  117

Form, meaning and speakers in the evolution of language: Commentary on Kirby, Smith and Brighton
    William Croft  139

Authors’ response
    Simon Kirby, Kenny Smith and Henry Brighton  143

Why assume UG?
    Dieter Wunderlich  147

What kind of evidence could refute the UG hypothesis? Commentary on Wunderlich
    Michael Tomasello  179

Author’s response: Is there any evidence that refutes the UG hypothesis?
    Dieter Wunderlich  181

A question of relevance: Some remarks on standard languages
    Helmut Weiß

The relevance of variation: Remarks on Weiß’s standard-dialect-problem
    Horst J. Simon  209

Author’s response
    Helmut Weiß  215

Universals, innateness and explanation in second language acquisition
    Fred R. Eckman  217

‘Internal’ versus ‘external’ universals: Commentary on Eckman
    Lydia White  241

Author’s response: ‘External’ universals and explanation in SLA
    Fred R. Eckman  245

What counts as evidence in historical linguistics?
    Olga Fischer  249

Abstraction and performance: Commentary on Fischer
    David W. Lightfoot  283

Author’s response
    Olga Fischer  287

Index  291
Preface
Most of the articles in this volume stem from a workshop on ‘What counts as evidence in linguistics?’ held at the 25th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS) in Munich, 26–28 February 2003. When we posted our Call for Papers, we were surprised by the response we received. Obviously, we had hit a nerve, and the field was ripe for a discussion of linguistic evidence. Methodologies have always been critically discussed in the applied disciplines. Nowadays, however, linguistic evidence has also become a prominent topic in theoretical linguistics, where the importance of a solid empirical foundation for theoretical models is increasingly recognized and acknowledged. In particular, there is a growing awareness among formal linguists that a sole reliance on introspective data (in the past often collected in quite an idiosyncratic way) will no longer do. Not only should speakers’ intuitions be collected in a systematic way, but the database of linguistic theories should also be broadened so as to include types of data that go well beyond introspective data as the primary data of linguistic theorizing (e.g. psycholinguistic and historical evidence). This recent concern of formal linguistic theory is also reflected in other conferences/workshops more or less explicitly touching on the issue of linguistic evidence; see, for example, the ‘Gradedness’ conference in Potsdam (October 2002), the Tübingen conference on ‘Linguistic Evidence’ (January 2004), the workshop on the empirical foundations of syntactic modeling organized by Fanselow, Krifka, and Sternefeld as part of the 26th Annual Meeting of the DGfS in Mainz (February 2004), or the workshop on ‘Approaches to empirical syntax’ at the ZAS in Berlin (August 2004). The very fact that all these venues took place in Germany may give rise to the impression that the current discussion on linguistic evidence is an essentially German enterprise. But it is not, as evidenced by the participants in these (international) conferences/workshops. Clearly, the issue of linguistic evidence has come to the fore in linguistics in general these days. The topic as such is huge. Not only does it touch on the database of linguistic theories but also, inevitably, on methodological issues (evidence, of whatever type, is only as good as the methodology by which it has been ascertained). It was clear from the outset that a volume like this could not possibly do justice to the topic in all its complexity. We therefore wanted to compile a small selection of articles that
would exemplify, within a more circumscribed perspective, some of the fundamental issues involved in the topic of linguistic evidence. Among the contributions to our DGfS workshop, some highlighted the relevance of typological data for the construction of Universal Grammar (UG); see the contributions by Kirby, Smith & Brighton, Haspelmath, Newmeyer, and Wunderlich, and, more indirectly, Weiß, and it was intriguing to see how different linguists would come to very different conclusions here. At the same time, we wanted to illustrate how different research agendas and underlying language ontologies (as most clearly reflected in formal and functional approaches) would necessarily lead to different perspectives on linguistic evidence. The contributions mentioned above therefore seemed particularly apt for the purpose of this volume. Discussing the relevance of (primarily) typological evidence for the construction of UG is, in turn, inevitably linked to the issue of innateness. And so, of course, is the controversy between formal and functional approaches. Therefore, a focus on the innateness debate looked very promising as a way to reveal some of the basic differences in the treatment of linguistic evidence within formal and functional approaches. When inviting contributions we set out the following guiding questions to our contributors:

i. What type of evidence can be used for innateness claims (or Universal Grammar [UG])?
ii. What is the content of such innate features (or UG)?
iii. How can UG be used as a theory guiding empirical research?

To more fully include the third perspective, we additionally invited the contributions by Eckman (on second language acquisition research) and Fischer (on historical linguistics), which also brought the functional perspective into these special areas of investigation. As a result, this volume contains seven articles that illustrate core arguments of both formal and functional linguistics surrounding innateness claims and which critically take issue with each other. To stimulate the discussion between formal and functional linguists even further, for each article a commentary was invited from a scholar (ideally from the opposite linguistic camp) who would critically take issue with some of the central arguments raised by the authors. The authors were then given the chance for a final brief reply. The order of the contributions in this volume reflects the order of the three guiding questions given above and their treatment by the authors. Note again that this volume is deliberately not restricted to the topics currently discussed in formal linguistic theory in the context of linguistic evidence, as evidenced in the other conferences/workshops mentioned above. Such issues are not highlighted in this volume, and the interested reader is referred to the forthcoming publications resulting from these venues. The purpose of this volume is both to broaden and to restrict the discussion on linguistic evidence: Broaden, in that it goes beyond formal theoretical linguistics and explicitly compares the two
linguistic camps, thereby pursuing a more general, epistemological perspective on linguistic evidence. Restrict, in that the focus is on the innateness debate (and within it, mainly on the relevance of typological evidence). This volume is the first special issue ever published by Studies in Language (SiL), and we would like to thank the SiL editorial board and John Benjamins, in particular Werner Abraham and Kees Vaes, for taking this step with us. It is only due to their efforts, as well as those of our authors, reviewers, and commentators, that such a speedy publication was possible at all.
Düsseldorf, May 2004 / February 2006
Martina Penke & Anette Rosenbach
What counts as evidence in linguistics? An introduction*

Martina Penke and Anette Rosenbach
1. Introduction
While thirty years ago linguists were still debating whether linguistics ought to be an ‘empirical science’ (see e.g. the contributions in Wunderlich 1976; or Perry 1980), today we can quite safely say that this issue has been settled by and large and that nowadays most linguists will probably agree that linguistics is indeed an empirical science.1 What is being discussed is therefore not whether empirical evidence may or should be used, but rather what type of empirical evidence, and how it is to be used. Of course, the question of evidence itself is closely connected to the question: what for? This again crucially hinges on the view we have of language, or more precisely, on which aspect of language we focus on in our research program. Functional approaches to linguistics will certainly give a different answer to this question than formal approaches. It is one aim of this volume to bring together both functional and formal views on this topic, and the unifying topic chosen is what counts as evidence for innateness claims, a notorious bone of contention between the two linguistic camps. In this introduction to the volume we will first discuss in general what it means to work ‘empirically’, and then move on to present a short overview of the various types of data and methodologies used in linguistics (Section 2). We will then (Section 3) discuss linguistic evidence more closely with respect to the two central, and in fact opposing, approaches to linguistics, i.e. formal and functional approaches. In particular, we will show how linguistic evidence relates to innateness claims, which is the focus of this volume. We will give an overview of the state of the art of research into linguistic nativism, presenting various types of evidence and arguments brought forward in the literature. In this final section we will then also introduce the contributions to this volume, which focus on two questions, namely to what extent data from language description (in general, and with special focus on typological evidence) can be used for ascertaining claims about innateness
(or more precisely, Universal Grammar [UG]), and to what extent innateness claims (or UG) can be used to guide empirical research.
2. Linguistics as an empirical science

Ever since the 19th century linguists have been striving to be ‘scientific’, trying to align themselves with the sciences, looking up to the prevailing scientific paradigm of the time as a model, and trying to integrate the study of language into it (see e.g. Sampson 1980: 17). And ever since, linguistics as a discipline has been wavering between the arts/humanities (Geisteswissenschaften) and the natural sciences (Naturwissenschaften). In the 19th century, linguists (such as August Schleicher or Max Müller) adopted evolutionary biology as the dominant scientific paradigm of the time, conceiving of languages as natural organisms that grow and decay. The analogy between biology and language, however, proved to be not unproblematic, and, in addition, at the turn of the century evolutionary biology eventually became less prestigious (for discussion, see e.g. Sampson 1980: §1). So, in the 20th century linguists started to look for another scientific paradigm as a model. In the early 20th century, sociology started to supersede evolutionary biology as the dominant scientific paradigm, and Saussure was greatly influenced by the sociologist Durkheim, regarding language as ‘a social fact’ (see e.g. Sampson 1980: §2; Botha 1992: §5.2).2 Later on, the American structuralists, most prominently Bloomfield, got hooked up with psychology, with Wilhelm Wundt being the leading figure at the time. In accordance with the general scientific climate, dominated by positivism at that time, the method employed by the American structuralists was the inductive method, with the primary goal being the description of languages, in their case the native American languages. Their approach was also strictly anti-mentalist: the subject matter of investigation was restricted to what could be observed. With the rise of generative grammar in the late 1950s, linguistics was explicitly defined by Chomsky as a branch of cognitive psychology (and ultimately human biology, [Chomsky 1980]), and language began to be conceived of as a ‘mental organ’, or a ‘language instinct’ (Pinker 1994). That does not mean, however, that linguistics was solely defined in these terms. In Europe, for example, structuralism took a different direction and continued to be devoted to historical linguistics, with the link to the philologies being much tighter than in the US. In general, the field of linguistics has remained very heterogeneous to this day. This brief tour through the history of linguistics is certainly grossly oversimplified (the interested reader is referred to Sampson 1980; or Newmeyer 1980 for details). It should, however, demonstrate that ever since the 19th century linguists have sought the connection, and analogy, with other sciences, which
also affected the way language was conceived of (as a natural organism, as a social fact, as something materialist [i.e. observable], as a mental organ, etc.; for an overview of language ontologies, see e.g. Botha 1992). And, depending on the dominant scientific paradigm of the time, linguistics attached itself to different disciplines: evolutionary biology in the 19th century, sociology and psychology in the early 20th century, and cognitive psychology/sciences in the latter half of the 20th century. It should, however, be kept in mind that not all strands of linguistics wanted to be ‘scientific’ in that way, and that to this day there are still researchers of language who would probably regard themselves first and foremost as philologists, who study language(s) without necessarily committing (and consequently restricting) themselves to a certain scientific paradigm. We will leave that philological strand of linguistic research out of consideration here, and will focus in the following on linguistics as an empirical science. We would like to stress, however, that philology and science need not be each other’s counterparts. In certain areas of linguistics (e.g. historical or text linguistics) one can, and in fact should, do empirical research by employing philological knowledge of the data, and empirical methods might be of value in philological research as well. The crucial point is that the opposite does not hold true: doing philology does not necessarily require the application of the scientific method as known from the natural sciences.

2.1 What does it mean to work ‘empirically’?

So, what does it mean to work ‘empirically’ as a linguist? In this section, we will explore what the term ‘empirical’ implies, or rather can imply. We must, first of all, distinguish between empiricism as a philosophy of science and empiricism as a method. In the former case, empiricism is an epistemological statement about how knowledge is acquired. Empiricism, as founded by Aristotle, holds that we are born as a ‘blank slate’ (tabula rasa, a term coined by John Locke): everything we know has been acquired through our senses (i.e. through experience). In this conception, therefore, all knowledge is learned. Diametrically opposed to this position is rationalism, as founded by Plato, which maintains that we are already born with knowledge and that it is through this knowledge that we interpret the world and our experiences. In this conception, knowledge is innate. From this use of empiricism as an epistemological statement about the acquisition of knowledge we have to distinguish the empirical method as used in the sciences (although the two are connected). However, even in the latter sense the term ‘empirical’ is used in various ways in linguistics, which will be discussed below.
Popper and the principle of falsification

Ever since Popper (1959 [2002]) introduced his concept of empirical science and methodology, it has been the dominant paradigm in the empirical sciences. So, what does it mean to work empirically, according to Popper? Empirical science, according to Popper, is first and foremost concerned with the stating and testing of hypotheses.

    A scientist, whether theorist or experimenter, puts forward statements, or systems of statements, and tests them step by step. In the field of empirical sciences, more particularly, he constructs hypotheses, or systems of theories, and tests them against experience by observation and experiment. (Popper 1959 [2002]: 1)
Popper argued that hypotheses and theories can never be verified but only ever be falsified. In this view, any scientific statement is supposed to be true only in the sense that it has not been falsified (yet). Popper’s famous example was that if we only see white swans, we might inductively generalize that all swans are white. However, we can never be sure that all swans are white and that there are not, for example, any black swans somewhere out there. Therefore, a crucial requirement for any scientific statement or theory is that it must in principle be possible to falsify it by a systematic collection of data. Consequently, empirical research aims at ascertaining data that make it possible to falsify a hypothesis or theory. This hypothesis/theory is then taken to be right as long as it has not been shown to be wrong. This, in a nutshell, is the empirical method in its most rigid version. What does it take for a theory or hypothesis to be falsified?3 In principle, any systematically collected counter-example will do. In practice, however, things are far more complicated (see e.g. the extensive discussion following on ‘Falsifiability vs. Usefulness’ in May 2002 on Linguist List, 13.1279, which touched on this very question). If it is only one single case, should we take it to reject a whole theory? If it is one experiment that provides counter-evidence against a theory, as opposed to various other experiments that do not, how should we evaluate this? A crucial factor for evaluating counter-evidence is certainly, and quite trivially, the quality of the research conducted. A systematically collected set of data that was obtained in a carefully constructed experiment in which all possible confounding factors were controlled for is, of course, more relevant than data obtained in a sloppy experiment. In general, we must also distinguish between weak and strong versions of falsifiability. In its strong version, strictly speaking, one counter-example should do. In its weak version, however, we are dealing with statistical tendencies rather than absolute statements. In this case, counter-examples, if statistically rare, will not really threaten the hypothesis. The crucial question, however, remains how many counter-examples it will take to refute such ‘softer’ types of hypotheses.4
An interesting instance of this dilemma can, for example, be found in recent discussions of the unidirectionality hypothesis in the framework of grammaticalization, which states that grammaticalization processes run from lexical to (more) grammatical, but not vice versa. Unidirectionality has been taken to be the strongest claim made in the grammaticalization framework; it is potentially falsifiable, and as such it has in fact been taken to justify the framework’s status as a ‘theory’. However, counter-examples to unidirectionality, i.e. cases of degrammaticalization, have been reported,5 and the question is to what extent these should affect the unidirectionality hypothesis. In its strong version, any counter-example (i.e. any case of degrammaticalization) should falsify it. The more commonly shared view is, however, that counter-examples exist, but that these are so rare that they do not threaten the principle of unidirectionality (see e.g. Hopper & Traugott 1993; Haspelmath 2004). In defense of this ‘softer’ version of the unidirectionality hypothesis, Hopper & Traugott (1993 [2003]: 133), in the second edition of their by now classic textbook on grammaticalization, point out that in a functionalist approach such as grammaticalization, which focuses on the interplay of language structure and use, we are necessarily faced with non-categorical, gradient phenomena, and we therefore cannot expect generalizations to hold one hundred percent, unlike in formal approaches, where the focus is on categorical, invariant phenomena (though see the recent probabilistic formal approaches incorporating gradience discussed in Section 3.1 below). Another position is that these few cases are irrelevant as they can be explained otherwise (see e.g. Lehmann 1982 [1995]: §2.3). Whatever the position taken, linguists agree that cases of degrammaticalization are indeed far less frequent than cases of grammaticalization. The perception of the rareness of counter-examples to unidirectionality may, however, be partly biased by the fact that in the past researchers have primarily set out to collect prototypical cases of grammaticalization, i.e. have worked inductively, but were not really out to find the odd cases, i.e. the falsifiers. This has changed somewhat in the meantime, but the question remains what the basis for the statistics ought to be (cf. also Lass 2000: 214). Another notorious problem in the whole discussion is that there does not seem to be any consensus as to what should indeed count as a genuine counter-example to unidirectionality. Is it sufficient that an element develops back along a grammaticalization cline? Or does it have to develop all the way back to a lexeme again (see e.g. Eythorsson et al. 2002)? Is decrease in scope a diagnostic for grammaticalization (Lehmann 1982 [1995]), or does the reverse hold true, i.e. increase in scope (Tabor & Traugott 1998)? All this shows that grammaticalization as an empirical theory is not yet well enough defined, since its potential falsifiers are not sufficiently defined (cf. also Lass 2000 for a lucid discussion of the role of counter-examples in grammaticalization research and a similar conclusion; see also Rosenbach 2004 for discussion).
Even if linguists generally agree that a certain type of evidence does indeed constitute a hard-and-fast counter-example, another issue is how to deal with this counter-evidence. According to Chomsky it is legitimate to ignore certain data in order to gain a deeper understanding of the principles governing the system under investigation (e.g. Chomsky 2002). Chomsky here refers to the so-called ‘Galilean style’ of science, a term coined by the physicist Steven Weinberg (Weinberg 1976). For example, contrary to the common Aristotelian assumption that the velocity of a falling body was determined by its weight, Galileo stated that an iron ball of 100 pounds falling 100 meters would hit the ground at the same instant as an identical iron ball weighing only one pound. In fact, this turned out to be wrong, since the heavy ball will hit the ground a short moment before the light one. Instead of rejecting his theory, this piece of counter-evidence led Galileo to further research, which resulted in the discovery of the influence of air resistance and friction. This example shows that the ‘Galilean style’ of science can be valuable in scientific research (for other examples from the natural sciences, see Chomsky 2002; Jenkins 2000). In all these cases, the apparent counter-evidence was not taken to refute a theory, but stimulated further research that resulted in the discovery of principles so far unknown, thus enhancing our understanding of the phenomena under study. Chomsky therefore insists on adopting this ‘Galilean style’ of rational inquiry in linguistic research. Note that in this scientific style empirical evidence may be ‘sacrificed’ in order to gain deeper insights into the phenomena under study. It may be for this reason (among others) that functional linguists often consider Chomsky a theoretical and essentially ‘unempirical’ or ‘axiomatic’ linguist (see e.g. Ringen 1980; Sampson 2001 [2002]). In general, the question is: what does it take to be an ‘empirical linguist’? Does it imply conducting experiments? Or is it sufficient to accept the relevance of experimental evidence? In the former sense, Chomsky certainly is not an empirical linguist, and he obviously is not empirical in the epistemological sense (i.e. in his philosophical views on the acquisition of linguistic knowledge). However, he does subscribe to linguistics as an empirical science in which evidence, in principle, counts:6

    Approaching the topic as in the sciences, we will look for all sorts of evidence. […] evidence can be found from studies of language acquisition and perception, aphasia, sign language, electrical activity of the brain, and who knows what else. (Chomsky 1994: 205)
So far, we have been looking at the use of the term ‘empirical’ in the strict, i.e. Popperian sense. In a looser sense, the notion of ‘empirical’ is commonly used to refer to data-driven research. At the extreme end, working ‘empirically’ is sometimes used to imply any kind of research based on naturally occurring speech data. In this use of the notion ‘empirical’, it is sufficient to use an actually attested example from some corpus to illustrate one’s theoretical point. This notion of
"intro-r39"> "intro-r83">
What counts as evidence in linguistics?
‘empirical’ seems to have developed out of an opposition to intuitive data as the primary data source of formal linguists. The important point, however, is that data in linguistics comes in a variety of types: as an utterance found in a corpus, as intuitive judgments of a native speaker on the grammaticality of a specific construction, as errors made by normal speakers, by speakers learning a language or by speakers suffering from a language disorder, as reaction-time data collected in a psycholinguistic experiment, as imaging data of brain areas activated in a language task, and so forth. To count as good empirical evidence, data must be collected in a systematic way. Which of these data types is then invoked to provide evidence for a linguistic statement depends on a number of factors, such as the field of linguistics, the particular topic under investigation, or the willingness to consider data collected outside one’s own field of research.

2.2 Types of empirical data

In the following, we will try to give a short systematization of the types of empirical data used in linguistics. We will classify empirical data along three dimensions: (i) qualitative vs. quantitative data, (ii) direct vs. indirect evidence, (iii) spontaneous vs. elicited data.
Qualitative vs. quantitative evidence

In the use of ‘empirical’ in the sense of data-oriented research as described above, we can distinguish two types of evidence according to how evidence is used, i.e. qualitative evidence and quantitative evidence (for this distinction, see e.g. Albert & Koster 2002: 2–3; for a similar distinction see also Fischer & Rosenbach 2000: 7–8). The terms are certainly not meant to be evaluative, with ‘qualitative’ evidence constituting a better type of evidence than ‘quantitative’ evidence. Rather, the two terms refer to two different ways of collecting and using data. Using data qualitatively simply means that we use data to show that a certain form/construction is possible in a specific context or that a certain experimental effect occurs in an experimental setting. To illustrate this with a quite trivial example: if we pick an example from the New York Times (NYT) that shows that we can leave out the complementizer in a complement clause, as in (1), this constitutes evidence that this is possible in English. This is positive evidence.

(1) I’m not saying Santa is gay.
(Quotation of the Day, NYT 27/11/2003, on Harvey Fierstein, who set off debate by saying he would appear as Mrs. Claus in the Macy’s Thanksgiving Day Parade.)
Although it is only one example, nobody would probably ever question its relevance. But what about a more unusual example? Manning (2003: 292), for example,
notes the occurrence of the construction as least as in a novel, as in (2), which he would usually regard as simply ungrammatical.

(2) By the time their son was born, though, Honus Whiting was beginning to understand and privately share his wife’s opinion, as least as it pertained to Empire Falls. (R. Russo, Empire Falls, as cited in Manning 2003: 292)
When, however, he searched for the construction in the New York Times newswire, he found some further examples, and plenty more when searching the Web. So, obviously, the use of as least as in the novel was not simply a typo or slip but represents a serious piece of positive evidence, despite its allegedly ungrammatical status at first glance (see also Manning 2003: 292–3 for a good discussion of the status of such positive evidence). Positive evidence certainly constitutes first-order evidence in empirical research, but it should be based on solid ground, i.e. on a systematic collection of data. Isolated or dubious cases require further, independent, and systematically collected evidence to distinguish mere ‘garbage’ from meaningful evidence (as in example [2] discussed by Manning 2003). Sometimes it is also argued that the fact that we cannot find a certain form/construction or experimental effect is telling. However, such negative evidence is of a much weaker type than positive evidence. We simply cannot know whether the form/construction or experimental effect is missing for a principled reason or fails to show up by coincidence, for example because we did not look at a large enough data set. While we should be aware of the limits of negative evidence, in the absence of any other information it might sometimes serve as an interesting piece of data. So, for example, in historical linguistics it is sometimes necessary to rely on negative evidence despite its meager epistemological status, because we want to ascertain change, i.e. when certain forms/constructions (or certain uses of them) cease or begin to exist. For example, if we cannot find evidence for the existence of a form/construction in a text corpus, we might conclude that it was not yet in use. However, the relevance of such negative evidence depends on whether other explanations for the lack of this form/construction can be ruled out. For example, if we do not find any definite article in a Middle English corpus, this type of negative evidence is certainly telling. But what about a more marginal construction? Take, for example, the his-genitive (as in John his book), which is attested for earlier English. There is an ongoing debate in historical linguistics whether the Modern English possessive ‘s (John’s book) derives from this his-genitive (John his book) by reanalysis in late Middle English.7 Now, the problem is that in the crucial period (Middle English) such his-genitives are very rare. Moreover, there is good reason to assume that if they were used at all, they were more common in colloquial, informal language than in written language. What does it therefore mean if we do not find any relevant examples of his-genitives in written Middle English texts (which are usually of high register)? Not too much, actually.8 To work quantitatively means that we do not use data solely to show that a form/construction or effect exists but rather how much of it exists, i.e. we quantify the data. Again, it is crucial that these figures are obtained in a scientifically sound way, and when quantifying data it is indispensable to make explicit what was counted, and how. Statistical methods help to decide whether the differences found are meaningful (= significant) or random. In this case, the probability level (p) indicates how likely it is that the null hypothesis is correct and that our hypothesis is wrong. For a significant result, the probability level should not exceed 5%. It is important to note that such quantitative work is used in both functional and formal approaches to linguistics. In light of the fact that (classic) formal approaches proceed from a categorical view of language, this might look like a contradiction, but it is not. Rather, the results of quantitative studies are taken to show whether a (categorical) rule exists. So, for example, in this line of research, when investigating whether children have successfully mastered English past tense inflection, the presence of, say, 97% target forms in the data set is taken to confirm that the (categorical) past tense rule has been acquired. For a different application and interpretation of quantitative work within formal approaches, see the discussion of probabilistic formal approaches in Section 3.1 below.
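To make the logic of such quantification concrete, here is a minimal sketch in Python (the counts are invented for illustration and do not come from any actual study) of how one might test whether complementizer omission, as in example (1), is distributed differently across two text samples:

```python
# A minimal sketch of a quantitative corpus comparison; all counts are
# invented for illustration and do not come from any actual study.
from scipy.stats import chi2_contingency

# Rows: two hypothetical text samples; columns: complement clauses
# with vs. without the complementizer "that".
counts = [[120, 380],   # sample A (e.g. newspaper text)
          [ 60, 440]]   # sample B (e.g. fiction)

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

# The conventional criterion mentioned above: the difference counts as
# significant only if the probability level p does not exceed 5%.
print("significant" if p < 0.05 else "not significant")
```

Exactly the same reasoning applies whether the counts come from a corpus or from an experiment; what matters is that it is made explicit what was counted, and how.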
Direct vs. indirect evidence

Another way of classifying linguistic evidence draws on how directly the data reflect language knowledge. Corpus data and grammaticality judgments, for example, can be considered direct evidence, while all types of data where we test for the occurrence or non-occurrence of specific and well-established experimental effects provide indirect evidence.9 The presence or absence of such an expected effect is then interpreted as evidence for or against a given theoretical model. To give an example, the status of German noun plurals in -n has been a topic of controversy for some years now. Whereas Clahsen (1999) stated that all German noun plurals other than -s are stored irregular forms, a number of researchers have countered that the plural marker -n is completely predictable for feminine nouns which end in schwa in the singular (Blume–Blumen ‘flower(s)’) (henceforth -nfem-plurals) and is therefore based on a process of regular affixation (e.g. Bittner 1994; Wiese 1996; Wunderlich 1999). Indirect evidence that makes use of effects such as the frequency effect can shed some light on this controversy. Thus, for instance, Penke & Krause (2002) conducted a lexical-decision experiment (LDE) on German -n-plurals. In an LDE, subjects have to decide as quickly and accurately as possible whether a presented item is an existing word or not. The reaction time
required to fulfil this word/nonword discrimination task is measured. Lexical-decision times are affected by frequency (see Balota 1994), i.e. subjects take less time to decide that a frequent item is an existing word than they take for infrequent words. This effect reflects the assumption that memory traces get stronger with each exposure, making frequent forms more easily accessible than infrequent ones. If -nfem-plurals were stored irregular forms, decision times for infrequent -nfem-plurals should be significantly longer than decision times for frequent ones. If, however, -nfem-plurals are not stored but built by regular affixation, there should be no effect of plural-form frequency, since these noun plurals are not stored in the mental lexicon (cf. Clahsen 1999). Whereas Penke & Krause (2002) observed plural-form frequency effects for two types of noun plurals that are generally assumed to be stored irregular forms, there was no frequency effect for the critical -nfem-plurals. Penke & Krause thus interpreted the lack of this frequency effect as evidence that -nfem-plurals are not irregular, but built by affixation. Other indirect effects that are often used in psycholinguistic or neurolinguistic experiments are ungrammaticality effects that measure the processing costs of detecting and repairing ungrammatical structures, priming effects that reflect the activation of entries in the mental lexicon, or ERP components (i.e. specific electrical activation patterns of the brain) such as the N400, which occurs in response to semantic anomalies. These and other well-established effects can be used to provide indirect evidence on the status of grammatical constructions via the presence, absence or strength of the tested effect (cf. Clahsen 1999; or Penke 2002 for an overview).
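The reasoning behind the frequency-effect test described above can be illustrated with a small sketch; the latencies below are invented, not Penke & Krause’s actual data, and a real analysis would of course also control for word length, familiarity, and other confounds:

```python
# A minimal sketch of the frequency-effect logic in a lexical-decision
# experiment; all reaction times (ms) are invented for illustration.
from statistics import mean
from scipy.stats import ttest_ind

frequent = [542, 555, 561, 538, 570, 549, 533, 558]    # high-frequency plural forms
infrequent = [548, 561, 552, 544, 566, 557, 540, 563]  # low-frequency plural forms

t, p = ttest_ind(frequent, infrequent)
print(f"means: {mean(frequent):.0f} ms vs. {mean(infrequent):.0f} ms")
print(f"t = {t:.2f}, p = {p:.3f}")

# Storage of irregular forms predicts a reliable advantage for frequent
# items; the absence of such an effect is interpreted, indirectly, as
# evidence that the forms are built by rule rather than retrieved whole.
```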
Spontaneous vs. elicited data

Further, we can classify empirical research according to the different sources or methodologies used to collect data. Here we can broadly distinguish between spontaneous and elicited speech data. Spontaneous speech data is language data which is naturally produced, either in conversation or in written texts. For linguists to observe this data, it must be recorded in corpora. By a linguistic corpus we usually mean a body of naturally occurring data that serves as the basis for linguistic analysis.10 We have to distinguish self-assembled corpora from publicly available corpora. In the former case, the researcher puts together the data. (Note that even two Shakespeare plays examined to investigate a certain grammatical phenomenon already constitute a corpus; whether they constitute a good one is another question and depends on the research question.) In the latter case, a corpus is put together and made publicly accessible for research. Such corpora usually come in electronic format. One advantage of public corpora is certainly that studies are
easily comparable and replicable. Another advantage of such public corpora is their explicit attempt to be representative (see e.g. Kennedy 1998; McEnery & Wilson 1996 [2001]; and Meyer 2002 for addressing the subtle issue of corpus assembly). A well-known limitation of corpora is, however, the representation of rare phenomena, which may not be found within the limits of a corpus (cf. e.g. Eisenbeiss 2002: §III.1, III.5). In some fields of research, this limitation might be overcome by another type of public corpus which is becoming increasingly popular, i.e. the World Wide Web. It contains a wealth of data, by far outweighing all other public corpora, and it may be searched with internet search engines (like Google) or with search engines specifically designed for linguistic research on the web, such as WebCorp (http://www.webcorp.org.uk/; cf. Renouf 2003). Bresnan & Nikitina (2003), for example, show on the basis of web searches that dative alternation constructions for verbs of manner of speaking that have previously been regarded as ungrammatical can actually be found on the web (cf. [3]).

(3) Shooting the Urasian a surprised look, she muttered him a hurried apology as well before skirting down the hall.
(www.geocities.com/cassiopeia.sc/fanfiction/findthemselves.html)
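Searches of this kind are easy to script. The following sketch scans a plain-text corpus file (the file name is hypothetical) for the dative pattern with verbs of manner of speaking, of the type illustrated in (3):

```python
# A minimal sketch of a qualitative corpus search for the pattern
# "muttered/whispered/shouted + pronoun + indefinite article";
# the corpus file name is hypothetical.
import re

pattern = re.compile(
    r"\b(muttered|whispered|shouted)\s+(him|her|me|us|them)\s+an?\b",
    re.IGNORECASE)

with open("web_corpus.txt", encoding="utf-8") as corpus:
    for number, line in enumerate(corpus, start=1):
        if pattern.search(line):
            print(number, line.strip())
```

Each hit would, of course, still have to be inspected by hand to weed out irrelevant matches before it could count as positive evidence.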
Another advantage of the internet as a data source is its use in tracking very recent developments in usage. Corpora are never truly up-to-date (the collection and annotation of a corpus is a tedious business), so when a corpus is eventually launched the data will already be a few years old (at least). When trying to track the degree of grammaticalization of the let’s-construction in British and American English, De Clerck (2003), for example, could barely find any evidence in current electronic corpora of British and American English for particle-like constructions (such as let’s don’t or let’s us), which are evidence for increased grammaticalization of let’s. However, when searching the web, he could adduce quite a few interesting examples. Moreover, language on the internet has also been shown to represent a new text type, in-between written and spoken language (see e.g. Zitzen & Stein 2004; and Zitzen 2003). So it is also in this respect that the web serves as a good data source for tracking on-going developments in language, because written language is known to be conservative and change-inhibiting. However, caution is certainly needed when using the web. All sorts of data, including data from non-native speakers, can be found, and the data pool is constantly changing. For a useful discussion of the limits and possibilities of the World Wide Web as a linguistic corpus we refer to Meyer et al. (2003). However, no matter how large a corpus is, corpus data necessarily remains limited, as it does not show everything that is possible in a language. To test for language potential, it is therefore often necessary to elicit data. As elicited data we characterize all types of data that are explicitly elicited from
informants by the researcher. Accordingly, elicited data does not only encompass data elicited in an elicitation task, but also data coming from all types of psycho- or neurolinguistic experiments (e.g. production and comprehension experiments, or neuroimaging studies), as well as data collected in linguistic fieldwork, for instance by sociolinguists or typologists. In linguistic fieldwork, interviews or questionnaires are a common technique.11 A common methodological problem of such elicitation techniques is, however, that usually the linguist wants to know what speakers naturally do when they are not observed, but the only way to find out about this is by observing them. This has become known as the observer’s paradox, a notorious problem in sociolinguistic work. People tend to evaluate non-standard forms as bad and distance themselves from them when asked explicitly, while still using such forms actively. So, for example, Labov (1975) reports the case of a man who judged the use of any more in positive contexts, as in John is smoking a lot any more, as bad, but was later overheard to use it himself (as cited in Sampson 2001: 3–4). The art of a good interviewer is to overcome the observer’s paradox, ideally by indirect elicitation techniques that mask what the researcher is interested in, leaving the interviewee ignorant of it. Data can also be elicited by experimental techniques, most commonly found in psycholinguistics. A great advantage of experimental studies is that the researcher can control for all sorts of confounding factors when testing a hypothesis, so that, in the ideal case, only the relevant factor is tested and all other factors are controlled for. In this way, we make sure that we only measure the influence of the tested independent variable (and not of other confounding variables) on the measured dependent variable. On the other hand, this control over the variables is obtained at a price, namely the (sometimes highly) artificial situation and hence the lower naturalness of data acquired in an experimental setting. This in fact has led some linguists to regard corpus data as the better and more natural performance data (cf. e.g. Leech et al. 1994: 58). Note, however, that experimental settings can be designed in a more or less artificial way; this will depend both on the abilities of the experimenter and on the specific research question. So, for example, experimental techniques have been developed which put children in a natural play situation to elicit forms/constructions that otherwise occur only rarely in child corpora of spontaneous speech (cf. e.g. Eisenbeiss 1994; McDaniel 1996).
Computer modeling as a ‘third kind’ of evidence

Apart from the various types of data discussed so far, another, quite innovative type of evidence has recently become available in linguistics, i.e. computer modeling. Computer simulations certainly provide no performance data. Rather,
"intro-r101"> "intro-r70">
What counts as evidence in linguistics?
they are used to test the fit of a theory. A basic application of computer modeling is the simulation of language learning. In the absence of any direct access to the processes of language learning in the brain, such models try to simulate them (with algorithms that are more or less neurophysiologically plausible). Such models test how well they fit the observed data, given a certain input, a certain initial state, and a certain learning algorithm. See, for example, the Iterated Learning Model (ILM) as introduced by Kirby, Smith & Brighton in this volume (cf. also Kirby & Hurford 2002), which tests what the initial state of the learner (= UG) must have been, on an evolutionary time-scale, to arrive at the typological patterns found today. While such approaches are not unproblematic (they often proceed from quite idealized, and hence unrealistic, assumptions), they nonetheless provide an interesting new type of evidence for various linguistic disciplines. It is beyond the scope of this introduction to give an exhaustive overview of all applications of computer simulations (for a useful introduction, see e.g. Kirby, Smith & Brighton this volume). Note that this type of evidence is of special importance for claims about innateness. Since such computer models simulate language learning, they have come to be used as evidence in the debate about what can plausibly be assumed to be innate and what rather needs to be learned; see most prominently the role of connectionist models and also the Iterated Learning Model (cf. Kirby, Smith & Brighton this volume). We will get back to such models in Sections 3.2.1 and 3.2.4 below.
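To give a flavour of how such a simulation works, here is a deliberately toy sketch of iterated learning in Python. It is not Kirby, Smith & Brighton’s actual model: the compositional bias is simply built into the learner, and all representational choices are invented for illustration. What the sketch does show is the transmission bottleneck at work: each learner observes only half of the meaning space, yet a fully expressive, decomposable signal system is passed down the chain.

```python
# Toy iterated-learning sketch (not the actual ILM). Meanings are
# (shape, colour) pairs, signals are two syllables, and each learner
# generalizes compositionally from a bottleneck sample of the
# previous agent's productions.
import random

random.seed(1)
SHAPES = ["square", "circle", "triangle", "star"]
COLOURS = ["red", "blue", "green", "yellow"]
SYLLABLES = ["ka", "po", "mi", "du", "re", "zo", "fa", "lu"]
MEANINGS = [(s, c) for s in SHAPES for c in COLOURS]
BOTTLENECK = 8  # each learner observes only 8 of the 16 meanings

def produce(lexicon, meaning):
    """Concatenate the syllable for each feature value, inventing an
    unused syllable for any value the agent has never expressed."""
    signal = ""
    for value in meaning:
        if value not in lexicon:
            unused = [s for s in SYLLABLES if s not in lexicon.values()]
            lexicon[value] = random.choice(unused)
        signal += lexicon[value]
    return signal

def learn(observations):
    """Induce feature-value-to-syllable mappings from observed pairs."""
    lexicon = {}
    for (shape, colour), signal in observations:
        lexicon[shape], lexicon[colour] = signal[:2], signal[2:]
    return lexicon

agent = {}  # generation 0 starts without any conventions at all
for generation in range(10):
    sample = random.sample(MEANINGS, BOTTLENECK)
    agent = learn([(m, produce(agent, m)) for m in sample])

# The final agent expresses all 16 meanings distinctly, although no
# single learner in the chain ever observed the full meaning space.
for meaning in MEANINGS:
    print(meaning, "->", produce(agent, meaning))
```

In the ILM proper, the interesting result is that compositional structure itself emerges under the bottleneck rather than being stipulated, which is what makes such simulations relevant to the question of how much structure must be attributed to the initial state of the learner.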
2.3 Common misconceptions concerning the nature of linguistic data

Spontaneous speech data as privileged evidence

As said above, spontaneous speech data is sometimes regarded by functional linguists as a superior data source in contrast to intuitive data, as dominantly used in formal approaches (see e.g. Sampson 2001 [2002]). This view seems to have emerged as a reaction against the often very idiosyncratic and haphazard way that intuitive data has been obtained, “as it suits the linguist’s fancy”, as Schütze (1996: 52) put it. In the extreme case, in the past linguists produced their own judgment and regarded this as a solid piece of evidence. This is not problematic in very clear cases (as in example [1] above), but what about more subtle cases? Most linguists will probably deem John is smoking a lot any more ungrammatical, but as shown above, some speakers naturally use such examples in their vernacular. This example illustrates one pitfall of intuitive data, i.e. the problem of coping with data from non-standard varieties. It also exemplifies a crucial methodological problem in such cases: even if an informant judges this construction as bad, this does not mean that it is truly ungrammatical for her or him, as shown above. People simply tend to evaluate the more prestigious standard forms as more
‘grammatical’, obviously mixing up ‘grammaticality’ and ‘correctness’ (see e.g. Cornips 2005, 2006 for a good discussion of the relevance of such sociolinguistic issues for research on grammaticality judgments; see also Weiß (this volume) on the relevance of (non-)standard data). This relates to a general problem: would laypersons, in their responses, really know how to tease apart subtle distinctions such as ‘grammaticality’, ‘interpretability’, ‘correctness’, or ‘acceptability’, as commonly made in linguistics? Note, however, that there is a growing awareness among formal linguists of the shakiness and shiftiness of grammaticality judgments, and that recently there have indeed been considerable efforts to minimize such pitfalls in empirical research, probably initiated by Schütze’s (1996) monograph on the empirical basis of grammaticality judgments.12 The crucial point is that grammaticality judgments form valuable empirical evidence if they stand up to the requirements of good empirical research, i.e. are collected in a systematic way, and if possible confounds are well controlled for in the data collection. Of course, they form a quite different type of linguistic evidence than spontaneous speech data. However, the latter is not per se privileged (as e.g. also recently stressed by Newmeyer 2003b), nor in fact is the former.13
Competence data vs. performance data

A widely held view, found among both functionalists and formalists, is that speakers’ intuitions about language constitute a different type of data, i.e. ‘competence data’, whereas data on speech production and comprehension constitute ‘performance data’. However, as most clearly argued by Schütze (1996, 2003), there is no such thing as competence data. According to Chomsky, language competence is an abstract system of knowledge represented in the mind/brain. This system is accessed when we produce, comprehend or judge the grammaticality of speech utterances. The application of this abstract system of knowledge results in performance data. Grammaticality judgments, thus, are performance data as well. Since this abstract system of knowledge is not directly accessible, insights into our language competence can only be gained by the analysis of performance data reflecting the application of this knowledge. In this sense, intuitive data just forms one type of performance data, among various others (see also Section 3.1 below).
‘Psychologically real’ data

It is also sometimes argued in cognitive linguistics as well as in psycho- and neurolinguistics that data obtained in experiments are superior to data used in formal theoretical linguistics, since, in this view, experimental data provides evidence for the ‘psycholinguistic reality’ of the assumptions made in theoretical
"intro-r85"> "intro-r22"> "intro-r61">
What counts as evidence in linguistics?
linguistics, which are often judged as mere intellectual speculations without any relation to actual data (e.g. McCawley 1980; Sandra 1998). Representative of this view is the opinion expressed by McCawley (1980: 27) that linguistic theories do not tell us about language competence, “but about what transformational grammarians are willing to let each other get away with”. For a contrary view, however, see the discussions in Chomsky (1980) and Jenkins (2000). On their view, any theory of grammar carries a truth claim, i.e. the claim to adequately explain reality. Accordingly, any linguistic analysis of performance data follows (or at least should follow) the goal of adequately representing our language faculty. Data from psycho- or neurolinguistic experiments offer no better insight into grammatical competence than other types of data, such as data obtained from grammaticality judgments of native speakers. They constitute, however, a different type of data, allowing for different ways of accessing linguistic knowledge. Thus, there is no a priori evaluative distinction between these different types of data used in linguistics in the sense that one constitutes better evidence for the study of our language faculty than others.14
3. What counts as evidence in formal and functional approaches to linguistics?

In the preceding section we looked at the general status of evidence in linguistics and the various types of data available to linguists. In this section, we will now zoom in on the question of what type of evidence is used in formal and functional approaches to linguistics, with a special focus on innateness claims. We will try to show that the question of what counts as evidence in linguistics crucially rests on the respective approach taken to the study of language and the underlying conception of language.

3.1 Formal and functional approaches to linguistics

The distinction between formal and functional approaches is a common way of classifying theoretical approaches to linguistics. As is often the case with such broad terms, they are difficult to define.
Formal approaches

The term ‘formal’ can refer either (i) to the fact that an approach is concerned with language form (rather than function), or (ii) to the fact that it is associated with the generative paradigm (see also Dryer 2006: §3.6). Note, however, that the term ‘generative’
itself is ambiguous, too. In a narrow sense, it refers to Chomskyan generative grammar; in a wider and more general sense, however, the term can subsume all theories which aim at generating a grammar with some formal mechanism, the latter not necessarily having to be of the Chomskyan type (see e.g. Head-Driven Phrase Structure Grammar or Optimality Theory). In the wider reading of ‘formal’, formal linguistics is quite a heterogeneous field, and different approaches may consider different types of evidence as relevant or legitimate; see most crucially the type of evidence considered by the recent probabilistic formal approaches discussed in Section 3.1 below. Historically, Chomsky (1957) introduced the generative approach into the field of linguistics. In Chomsky’s rationalist view of language, language is an inborn mental capacity that is independent of other cognitive faculties. That is, his approach subsumes nativism and mentalism as well as domain-specificity. Accordingly, the subject matter of investigation of a generative linguist is abstract linguistic knowledge, i.e. competence or I-language. However, competence is not directly accessible: all available data is performance data, or E-language (see also the discussion in Section 2.3 above). While performance data is derived from competence and language competence can only be deduced through analyzing performance data, there is no one-to-one relation between the two: grammatical competence is only one factor determining performance, with other factors, such as processing limitations and social, pragmatic, or discourse factors, affecting it too, as illustrated in Figure 1 below.

[Figure 1. The relation between performance data and competence: performance data results from competence together with social factors and processing/memory limitations.]

Linguistic competence remains a ‘black box’, which does not allow for direct observation, and the crucial task for generative linguists is to find evidence for the content of this black box within performance data. Traditionally, introspection and speakers’ intuitions, in particular grammaticality judgments, have been considered
the primary data (and in fact superior data). However, these are not the only data sources. To name just a few: typological studies investigating the similarities and differences between typologically diverse languages are taken to provide insight into the universal principles underlying languages, as well as into those areas where variation can occur (see the discussion of typological evidence in Section 3.2.4 below and the contributions by Kirby, Smith & Brighton, Newmeyer, Haspelmath, and Wunderlich to this volume). Historical evidence on language change has been argued by Kiparsky (1968) to provide a ‘window to the mind’, i.e. changes should only occur within the limits given by the principles ruling the language faculty (and both Fischer and Lightfoot in this volume agree on this point, if largely disagreeing otherwise). Investigations of acquired or inherited language disorders have followed the rationale that we can learn about linguistic competence especially when it fails (see e.g. Caramazza 1984, 1992; Fromkin 1997). And studies making use of neuroimaging techniques such as PET and fMRI have been undertaken to provide insights into the neural localization of components of the language faculty (see e.g. Indefrey & Levelt 2000; Hagoort et al. 1999).
Functional approaches
Functional approaches to language are very difficult to classify because they are so heterogeneous (for attempts, see e.g. Nichols 1984; Bates & MacWhinney 1982; or Newmeyer 1998a). As Newmeyer (1998a: 13), quoting Bates, aptly put it: Elizabeth Bates has remarked that 'functionalism is like Protestantism: it is a group of warring sects which agree only on the rejection of the authority of the Pope' (cited in Van Valin 1990: 171).
What can probably be safely said is that functional approaches ascribe a central role to language function rather than solely to language form (as opposed to formal approaches), and that their subject matter of investigation is much broader than that of formal approaches, comprising all sorts of discourse-pragmatic factors affecting the use of language. Or, to put it negatively and echo Bates, what unites functional approaches to language is their rejection of Chomsky's conception of language, most crucially his view that the language capacity is innate and domain-specific. In contrast to formal approaches, functionalists put their emphasis on language usage, not on Chomskyan competence. And the type of linguistic evidence that can be used is as pluralist as the approach itself: in principle, all kinds of usage data may be used. Here functionalists simply do not share the problem of formal linguists, who have to filter out mere performance factors to see 'the light' of competence, since functionalists are not concerned with the investigation of grammatical competence in the first place — their domain is performance.15
Probabilistic formal approaches
One type of evidence so far excluded from formal approaches is data on linguistic variation or optionality in the sense of "two ways of expressing the same thing" (Labov 1972: 271). For Chomsky, such variation was excluded for heuristic reasons from the investigation of competence in his infamous — and most controversial — postulation of the "ideal speaker-listener, in a completely homogeneous speech-community" (Chomsky 1965: 3). This does not deny the existence of variation, but it deems it essentially irrelevant for the study of competence. However, sociolinguistic research, starting in the 1960s, has most clearly demonstrated that variation is not an arbitrary, random phenomenon, but highly structured, an insight reflected in the notion of 'structured heterogeneity' coined by Weinreich et al. (1968). The generative notion of competence, however, conceives of grammar as containing only categorical rules and hence captures only all-or-none phenomena. Linguistic variation in the sense of 'two ways of saying the same thing' has therefore been placed outside the generative enterprise, in the realm of performance. So far, therefore, research on linguistic variation has largely been a topic for sociolinguists but not for formal linguists. Very recently, however, probabilistic formal approaches to linguistics have been introduced which have become interested in the phenomenon of linguistic variation, and which ultimately break with the old axiom of categoricity in the language system (competence) held by classic formal approaches to linguistics; for a good overview of this new field, see e.g. Bod et al. (2003). Such probabilistic formal approaches try to incorporate preferences (i.e. probabilities) into the notion of grammatical competence. One such approach is the functional/stochastic Optimality-theoretic (OT) approach (see e.g. Bresnan et al. 2001; Bresnan & Aissen 2002; Aissen & Bresnan 2002, and this volume; Bresnan & Nikitina 2003; or Bresnan 2003). This approach tries to capture the observation (commonly found in functional approaches, see e.g. Givón 1979) that the same constraints that govern grammaticalized patterns cross-linguistically may show up as statistical preferences in single languages. A case in point is Bresnan et al.'s (2001) study on the role of person in active-passive variation. They show that there are languages in which the choice of a passive is obligatory if the patient is high in person, as e.g. for local (1st, 2nd) persons in Lummi, a Coast Salish language. The same constraints that result in a categorical effect in Lummi are then shown to lead to statistical preferences in English. The results of a corpus analysis clearly show that in English a passive is more likely if the patient is a local person (see also Dingare 2001 for details), i.e. in English (4a) is more likely than (4b).

(4) a. I was attacked by a man.
    b. Mary was attacked by a man.
Under the classic formal view one would have to conceive of the grammaticalized pattern in Lummi as 'competence', while the variable pattern found in English would have to be viewed as 'performance', though both are governed by the same principles, as stressed by Bresnan (2003). In the functional/stochastic OT approach, categorical and variable (gradient) phenomena are points on a single scale rather than two qualitatively different things, with categoricity representing only the extreme endpoint. The observed parallels between categorical and variable (i.e. frequency) effects are then modeled within a stochastic grammar.16 Unlike in classic OT, in stochastic OT (cf. Boersma 1998) constraints are ranked on a continuous scale, where constraints can be more or less close to each other: the closer they are, the more variable the output; the more distant they are, the less variable (or even categorical) the output. In this way, stochastic OT can model variation which lies outside the scope of classic OT, where by definition there can only be one winner (i.e. categoricity). In so doing, functional/stochastic OT incorporates into the study of grammar linguistic variation, and hence variationist data, which had previously been excluded a priori from formal investigation. This shows how the type of evidence admitted broadens once the underlying notion of competence is broadened. The functional/stochastic OT approach can be viewed as a bridge theory between formal and functional linguistics: it incorporates phenomena such as variation and gradience, which traditionally have been captured by functionalist approaches, but does so in a formal way, thereby formalizing classic functional insights. Note, however, that the functional/stochastic OT approach is quite controversial among classic generative approaches; see particularly Newmeyer (2002, 2003a) for criticism.
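The mechanics of such a stochastic evaluation can be conveyed in a few lines of code. The following is a minimal sketch, not a reimplementation of Bresnan et al.'s fitted grammar: the constraint names, ranking values, and noise level are invented for expository purposes. Two constraints with close ranking values yield a variable, English-like preference for the passive with local-person patients; pulling the values far apart yields Lummi-like categoricity.

```python
import random

# Hypothetical constraint rankings on a continuous scale (names and
# values are invented for illustration). "*LOCAL-PATIENT-OBJ" penalizes
# actives whose patient is a local (1st/2nd) person; "FAITH-ACTIVE"
# penalizes passivization.
RANKINGS = {"*LOCAL-PATIENT-OBJ": 100.0, "FAITH-ACTIVE": 98.0}

# The (single) constraint each candidate violates when the patient is
# a local person.
VIOLATES = {"active": "*LOCAL-PATIENT-OBJ", "passive": "FAITH-ACTIVE"}

def evaluate(noise=2.0):
    """One stochastic evaluation: perturb each ranking value with
    Gaussian noise; the candidate violating the lower-ranked constraint
    wins."""
    effective = {c: r + random.gauss(0, noise) for c, r in RANKINGS.items()}
    return min(VIOLATES, key=lambda cand: effective[VIOLATES[cand]])

outcomes = [evaluate() for _ in range(10_000)]
print("passive rate:", outcomes.count("passive") / len(outcomes))
# Close ranking values (100 vs. 98) give variable output; values far
# apart (say 100 vs. 80) make the passive (near-)categorical.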
Another strand of probabilistic linguistics (in morphology) is the work on analogical modeling by Baayen and his colleagues, who have suggested that morphological productivity is a graded phenomenon (for an overview of this work, see e.g. Baayen 2003). A well-known case investigated and discussed by Baayen (2003) is linking elements in Dutch, which represent an "apparently random morphological phenomenon that nevertheless is fully productive" (Baayen 2003: 243). In Dutch, the choice of a linking element in compounds looks quite random at first glance and cannot be accounted for by any rule. To illustrate this, Baayen (2003: 243) gives the following example, where the left constituent schaap 'sheep' can occur with the linking element -s- (5b), with -en- (5c), or without any linking element at all (5a):

(5) a. schaap-herder
       sheep-herder
       'shepherd'
    b. schaap-s-kooi
       sheep-s-fold
       'sheepfold'
    c. schaap-en-vlees
       sheep-en-meat
       'mutton'
However, the choice of a linking element is not completely random, and there is evidence that linking elements are chosen productively in novel compounds. Analogical modeling makes it possible to account for this phenomenon by adopting an analogical rather than a rule-based approach, showing that the likelihood of a certain linking element occurring within a compound depends on that compound's similarity to other compounds. Baayen (2003: 243–244) calls this 'lazy learning' because here learning does not lead to the generation of a hard-and-fast rule but "may involve a continuous process driven by an ever-increasing instance base of exemplars" (244). Naturally, such probabilistic approaches crucially rely on quantitative data and on sophisticated statistical methods and models. To avoid any misunderstanding it should be added that quantitative data was used in formal approaches before the advent of probabilistic formal linguistics, e.g. in formal psycholinguistic work (see also Section 2.2 above). The crucial difference from the new probabilistic approaches is that in classic formal approaches quantitative data was used to empirically test certain hypotheses; these approaches did not aim at directly building frequency distributions into a model of grammar, as the functional OT approach does.
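The flavour of such 'lazy learning' can be conveyed with a toy memory-based classifier. The sketch below is not Baayen's analogical model, which works over large lexical databases with far richer features; the feature set and the stored exemplars here are invented for illustration.

```python
# A toy memory-based ("lazy") learner for linking elements: no rule is
# induced; a novel compound is classified by analogy to stored
# exemplars. Features and exemplars are invented for illustration.
EXEMPLARS = [
    # (features of the left constituent, attested linking element)
    ({"final_segment": "p", "animate": True,  "plural_in_en": True},  "en"),
    ({"final_segment": "p", "animate": True,  "plural_in_en": True},  "s"),
    ({"final_segment": "l", "animate": False, "plural_in_en": False}, "s"),
    ({"final_segment": "n", "animate": True,  "plural_in_en": True},  "en"),
    ({"final_segment": "r", "animate": False, "plural_in_en": False}, ""),
]

def similarity(a, b):
    """Number of shared feature values (a crude overlap metric)."""
    return sum(a[f] == b[f] for f in a)

def predict(novel, k=3):
    """Let the k most similar stored exemplars vote on the linking element."""
    ranked = sorted(EXEMPLARS, key=lambda ex: similarity(novel, ex[0]),
                    reverse=True)
    votes = [link for _, link in ranked[:k]]
    return max(set(votes), key=votes.count)

novel = {"final_segment": "p", "animate": True, "plural_in_en": True}
print(predict(novel))  # driven by similar exemplars, not by a rule -> 'en'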
3.2 Evidence for innateness claims

We now turn to the special focus of this volume, the question of innateness. Innateness is one of the most hotly debated issues between formal and functional approaches and therefore seemed particularly well suited to show how these two linguistic camps differ not only in the type of evidence used, but also in their argumentation. When inviting contributions to this volume, we laid out the following three guiding questions to our contributors:

i. What type of evidence can be used for innateness claims (or Universal Grammar [UG])?
ii. What is the content of such innate features (or UG)?
iii. How can UG be used as a theory guiding empirical research?

In fact, the contributions to this volume focus on different aspects of these guiding questions. The papers by Kirby, Smith & Brighton, Haspelmath, Newmeyer, Weiß,
"intro-r34"> "intro-r94">
What counts as evidence in linguistics?
and Wunderlich focus on the first question; Wunderlich is the only one to address — quite valiantly — our second question; and Eckman and Fischer focus on the third question, which is also touched upon in Haspelmath's, Newmeyer's and Wunderlich's contributions. In the following, we will discuss all three points in turn, with special emphasis on (i), i.e. on what type of evidence is used for innateness claims. We will first give an overview of the arguments and evidence usually put forward in favour of or against nativist claims: domain-specificity (Section 3.2.1), the logical problem of language acquisition (Section 3.2.2), genetic evidence (Section 3.2.3), and universality (Section 3.2.4). Note that in the innateness debate it is usually neuro- and psycholinguistic evidence (including data from first language acquisition (L1)) as well as genetic evidence that are widely — and highly controversially — discussed. For the purpose of this volume, however, we wanted to focus on a type of evidence less commonly discussed in the innateness debate, i.e. typological evidence, and hence on the argument of universality, which is addressed by the contributions of Kirby, Smith & Brighton, Haspelmath, Newmeyer, and Wunderlich. We nonetheless want to give an overview of the current state of the art of the dominant strand of research into linguistic nativism in this introduction. We will try to make explicit which arguments are put forward by formal and functional approaches in defense of their views, and what type of evidence is invoked and in which way.

3.2.1 Species-specificity and domain-specificity
Species-specificity
It is by now widely accepted that humans are the only species capable of learning and using language on the basis of the input available to children during language acquisition (see e.g. Elman et al. 1996; Pinker 2002: §I). The fact that all children — except under extremely hostile conditions or with certain severe disorders — acquire the language they are exposed to, whereas other species such as dogs or chimpanzees will not, even when exposed to the same input under the same input conditions and despite excellent statistical learning and computational capacities, has been interpreted as evidence for some innate specification for language. This innate specification minimally enables the child to distinguish language material from other input, and it includes the mechanisms that subserve language acquisition, since "there can be no learning without innate circuitry to do the learning" (Pinker 2002: 35), whatever this circuitry may precisely look like. Or, to put it even more generally: human beings must be in some way biologically prepared to acquire language (cf. Tomasello 2003: 284). This much at least seems uncontroversial among formal and functional linguists, with
the debate rather focussing on what this innate language capacity precisely consists of, and whether it is specific to language and independent of other cognitive capacities. That is, the core of the controversy between formal and functional approaches in the innateness debate is on domain-specificity rather than on species-specificity (i.e. innateness proper).
Domain-specificity
According to the generative approach, the human language faculty is due to a specialized, domain-specific language organ situated in the mind/brain that is part of our biological endowment and thus genetically specified (e.g. Chomsky 1980, 2002). In classic functional thinking, in contrast, our language ability is part and parcel of our general cognitive make-up; therefore, whatever brought about our cognitive system must also have brought about our language capacity (see e.g. Bates & MacWhinney 1979, 1982; Langacker 1987, 1988). A specification of the innate basis of this cognitive system is, however, an issue that is in general beyond the scope of the functional enterprise. In this respect, innateness as such is not a topic for functional linguists in the first place — it only becomes a topic for functionalists in their rejection of the generative counter-position, i.e. the claim that there are aspects of language (most notably grammar) that are language-specific, i.e. independent of our general cognitive abilities, and that these aspects are innately specified. Over the last 20 years proponents of functionalism have referred to evidence from the neurosciences and from computer modeling (i.e. artificial neural network models) to show that the nativist, domain-specific position is wrong. In the neurosciences, for example, it has been shown that learning in the brain depends on the modulation of connections between simultaneously activated neurons linked together in cell assemblies (cf. Hebb 1949; Kandel & Hawkins 1992; Squire & Kandel 1999). The neurons themselves as well as the mechanisms involved in learning have, in principle, been found to be domain-unspecific and independent of the cognitive task and material involved. Further evidence against innate domain-specific circuitry in the brain is seen in findings on the plasticity of brain tissue: results on how experience shapes the development of brain circuitry have been taken as support against an innate specification and for the equipotentiality of cortical areas in the brain (Elman et al. 1996; Quartz & Sejnowski 1997). In loose analogy to the processes involved in learning in the brain, connectionist neural network models aim at modeling learning in networks of simple input-output units linked by weighted connections. Connectionist networks modify their structure according to the task at hand: learning proceeds via a restructuring of the given network's architecture, as connection weights are modified to yield the desired output for a given input.
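The kind of weight modification at issue can be illustrated with a minimal single-layer network trained with the classic perceptron rule, a simple error-driven weight update. This is a toy sketch whose task and learning rate are invented for illustration; the past-tense models cited below are of course far larger and more sophisticated.

```python
import random

# A minimal connectionist sketch: one layer of weighted connections
# trained by error-driven weight adjustment. Toy task: map 3-bit input
# patterns to a binary output; no task-specific circuitry is prewired.
random.seed(0)
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
bias = 0.0
data = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]

def output(x):
    net = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if net > 0 else 0

for epoch in range(50):                 # repeated exposure to the input
    for x, target in data:
        error = target - output(x)      # discrepancy with desired output
        for i in range(3):              # adjust connection weights
            weights[i] += 0.1 * error * x[i]
        bias += 0.1 * error

print([output(x) for x, _ in data])     # -> [0, 1, 1, 1] after training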
Such networks have succeeded in learning a variety of cognitive tasks, including language tasks such as producing English past tense forms for given real and nonce verbs (for an overview see Hinton 1992; Elman et al. 1996; Westermann 2000; Marcus 2001). They have thus provided evidence that simple associative learning mechanisms based on frequency distributions, similarity of features, and statistical correlations in the input data can indeed be successful. The success of such simple domain-unspecific learning mechanisms, and the impression that knowledge representation and learning as simulated in connectionist networks are somehow "closer to the neural level" (Smolensky 1988: 9), has led to a revival of the debate between formal and functional approaches over whether the innate human language capacity is due to domain-specific or to domain-general principles shared with other cognitive domains (for discussion see e.g. Pinker 1994, 2002; Elman et al. 1996; Westermann 2000; Marcus 2001; Penke 2002).17 The arguments from the neurosciences brought forward by connectionists have been countered by formal linguists with evidence indicating that the brain is in general organized in a domain-specific way (Chomsky 2002; Pinker 2002: §I). Chomsky (2002: 64) cites the neuroscientist C. R. Gallistel (1999) with the statement that the modular view of learning — according to which the brain incorporates different organs, i.e. neural circuits that are computationally specialized to solve particular kinds of problems — is the norm these days in neuroscience. These specialized organs develop through the interaction of innate predispositions, which sculpt the gross structure of the brain, and brain activation triggered by external or internally generated experience. In fact, recent findings point to the importance of genetic specifications for the organization of the brain (see Pinker 2002 for an overview). The visual cortex of ferrets, for instance, has been shown to develop its basic architecture despite removal of both eyes during prenatal development (Crowley & Katz 2000). The available evidence suggests that neural plasticity, i.e. the reallocation of brain tissue to new tasks, can only occur within the limits defined by the genetically shaped structure of the brain (see Pinker 2002: §I for discussion). Thus, far from being "a protean substance that can be shaped almost limitless by the structure and demands of the environment" (Pinker 2002: 74), the basic architecture of the brain, and its division into specialized domain-specific computational organs, seems to be under genetic control. That is, the brain is not as plastic as connectionist work might suggest. Note, however, that it does not automatically follow from the overall modular organization of the brain that language is domain-specific, too (within such a modular brain). Another argument brought forward in support of the domain-specificity of an innate human language faculty is based on the dissociations between language and other cognitive capacities observed in inherited genetic disorders such as Specific
Language Impairment (SLI), Williams syndrome, or language savants (see Levy 1996 for an overview). Whereas children with SLI exhibit deficits in several core areas of the grammar despite a normal development of general cognitive functions as measured by IQ, children with Williams syndrome and language savants show remarkable language skills despite severe mental retardation and IQ values that often do not exceed 50 (cf. Gopnik & Crago 1991; van der Lely 1996; van der Lely & Stollwerck 1997; Bellugi et al. 1988, 1994, 2000; Clahsen & Almazan 1998; Levi & Kavé 1999). The double dissociation between language and general cognitive capacities observed in these disorders has been taken as evidence that the human language capacity is independent of other cognitive domains and cannot be put down to the operation of domain-general principles guiding acquisition (see Bellugi et al. 1988, 1994, 2000). Now, the crucial question is whether this double dissociation is indeed an empirical fact. This has been questioned by functionalists. Are the language abilities of children and adolescents with Williams syndrome really completely unimpaired compared to those of normal age-matched children? For a negative conclusion, see e.g. Stevens & Karmiloff-Smith (1997), Karmiloff-Smith (1998), Volterra et al. (1996), or Grant et al. (2002). And are the language deficits of children with SLI really language-specific, or can they be attributed to domain-unspecific cognitive impairments, such as deficits in symbolic play, in nonverbal attention, or in spatial imagery (cf. Thal et al. 1991; Johnston 1994; Townsend et al. 1995)? There are certainly areas of language, such as spatial prepositions, principles of word meaning acquisition, or morphophonological well-formedness constraints, where children with Williams syndrome behave differently from normal controls (Stevens & Karmiloff-Smith 1997; Bellugi et al. 2000; Penke & Krause 2004). However, performance in core areas of grammar, such as regular inflection, word order, subject-verb agreement, passives, and binding, is often comparable to that of unimpaired age-matched control children (cf. Clahsen & Almazan 1998; Penke & Krause 2004). Thus, at least for these areas a dissociation between language and general cognitive capacities seems to hold. Likewise, while certain non-linguistic deficits can be observed in children with SLI, there is evidence that these do not affect all children diagnosed with this disorder. Van der Lely (1997), for example, presents the case of a boy diagnosed with SLI who displays severely impaired morphosyntactic abilities but no non-linguistic cognitive problems. Moreover, Bishop (2003) points out that the crucial issue is not whether children with SLI have any non-linguistic cognitive deficits, but whether such deficits can explain the observed language impairments. In her view, this has not been sufficiently shown so far. An interesting argument, recently brought forward by Karmiloff-Smith (1998), states that individuals suffering from Williams syndrome have acquired a language system that differs substantially from the system acquired by unimpaired speakers.
"intro-r43"> "intro-r65"> "intro-r22">
What counts as evidence in linguistics?
According to her neuroconstructivist approach, the genetic impairment underlying Williams syndrome causes slight differences in neural mechanisms, such as in the firing thresholds of neurons. These differences in the initial state push children onto a different developmental pathway, resulting in the construction of a qualitatively different grammatical system in the brain. Since the language system of individuals with Williams syndrome is qualitatively different from the 'normal' system, so the argument goes, nothing can be concluded from Williams syndrome data about the 'normal' language faculty — for instance about its autonomy from general cognition. On the other hand, there is neuroanatomical evidence suggesting that an atypical organization of the language areas in the brain need not point to a qualitatively different language system. Geschwind & Levitsky (1968), for example, found that about one third of the normal population has a different anatomical make-up of the language areas in the brain without necessarily suffering from a language disorder. Also, as noted above, individuals with Williams syndrome seem to exhibit perfect mastery of core areas of the grammar, with no apparent indication of qualitative differences affecting these systems. Future empirical enquiry will have to show whether further evidence can be found in support of Karmiloff-Smith's (1998) hypothesis.

3.2.2 The logical problem of language acquisition
Another central — and in fact classic — argument for assuming a domain-specific, innate language capacity is the so-called 'logical problem of language acquisition'. It is based on the 'poverty-of-the-stimulus' argument invoked in Chomsky's earlier writings (Chomsky 1965, 1980); for discussion, see also Fischer (this volume). The 'poverty-of-the-stimulus' argument, sometimes also called 'Plato's problem', states that there is an astonishing discrepancy between the sparseness and poverty of the input data children receive during language acquisition and the complexity and uniformity of the grammatical knowledge they acquire. This rather vague formulation has led to various interpretations of what the 'poverty-of-the-stimulus' argument actually refers to (for a good discussion and overview, see Pullum & Scholz 2002).18 On the nativist conception, this gap between experience and acquired knowledge can only be bridged by an innate disposition for language. As argued by Pullum & Scholz (2002), however, this argument proceeds from the assumption that the input is indeed impoverished, which should be subject to empirical enquiry rather than taken as given. And indeed, since the first statement of the 'poverty-of-the-stimulus' argument, a body of evidence has accumulated in research on language acquisition that has led to a more realistic evaluation of the input data. This research has shown that the input children receive during language acquisition is not as depleted, poor, and ungrammatical as originally claimed by Chomsky (cf.
Snow 1977, 1995; Locke 1995). Moreover, prosodic characteristics of child-directed speech, the frequency of specific elements in the input, and the situational context of communication have been shown to facilitate the task of language acquisition considerably (cf. Gerken 1996, 2001; Jusczyk 2001; Pinker 1989; the overview by Eisenbeiss 2002: §I.4; and the literature cited by Fischer this volume). As a consequence, the 'poverty-of-the-stimulus' argument has been reformulated as the 'logical problem of language acquisition', which states that the input data available to a child are still underdetermined in two crucial ways: (i) they are quantitatively underdetermined, i.e. the child receives a finite set of input data but must be able to produce and understand a non-finite set of data after acquisition; and (ii) they are qualitatively underdetermined, i.e. the child is exposed to spoken utterances but has to induce abstract rules of grammar in order to do so (Fanselow & Felix 1987). A simple inductive learning mechanism based on positive input data will, however, run into problems, since any given set of data is compatible with indefinitely many generalizations (Gold 1967; Uriagereka 1998; Scholz & Pullum 2002). In contrast to the 'poverty-of-the-stimulus' argument — which sometimes proceeds from the absence of certain constructions in the input (see Pullum & Scholz 2002) — the 'logical problem of language acquisition' points to more subtle respects in which language input is underdetermined: even in the data present, the child does not get sufficient information for the generalizations she or he will arrive at. How else, then, do children unanimously come to the right generalizations? A particularly severe problem occurs in those cases where the child induces a rule that generates all the possible structures of her or his native language plus additional ones not allowed by the grammar of this language (the subset-superset problem, cf. Pinker 1989). True, the child will never hear these illicit structures, produced by her or his own grammar, in the input. However, the non-occurrence of specific grammatical constructions cannot be taken as evidence for their ungrammaticality, since it might just be accidental that these constructions have not occurred so far (cf. Pinker 1989; Marcus 1993). In these cases, only negative evidence, i.e. explicit information on the ungrammaticality of these illicit structures, could inform the child about the incorrectness of her or his generalization (cf. Bowerman 1988; Pinker 1989; Marcus 1993).19
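The logical structure of the subset-superset problem can be made concrete with a schematic example. The two toy 'grammars' below are invented predicates, loosely reminiscent of dative overgeneralization; the point is only that finite positive data cannot distinguish a grammar from a superset of it.

```python
# Schematic illustration of the subset-superset problem: two toy
# "grammars" (string predicates), one a proper subset of the other,
# both compatible with the same finite positive data.
def grammar_narrow(s):   # licenses only pronoun recipients
    return s in {"give me it", "give her it"}

def grammar_broad(s):    # overgenerates: any recipient allowed
    return s.startswith("give ")

positive_input = ["give me it", "give her it"]

# Every observed sentence is licensed by both hypotheses ...
assert all(grammar_narrow(s) and grammar_broad(s) for s in positive_input)

# ... so positive data alone cannot tell the learner which to adopt.
# Only the broad grammar licenses the unattested string below; absent
# negative evidence, its non-occurrence in the input proves nothing.
print(grammar_broad("give the dog it"), grammar_narrow("give the dog it"))
# -> True False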
Whether caretakers indeed provide negative evidence has been shown to depend on factors such as their cultural background or socio-economic status. In addition, more often than not it is not at all obvious whether the negative evidence provided refers to articulatory, grammatical, or situational correctness. And finally, children make no use of negative evidence and even reject it (see McNeill 1966; Pinker 1994: §9; Eisenbeiss 2002: §I.4). The crucial issue regarding negative evidence, therefore, is the following: negative evidence that is systematically supplied and transparently related to the grammatical structure of the child's utterance is not available to every child. Since all children nevertheless acquire the grammar of their native language, negative evidence cannot be necessary for language acquisition. In a nutshell, then, the 'logical problem of language acquisition' consists of the following parts: the child has to generalize abstract rules on the basis of a finite set of positive input data; the number of compatible generalizations is unlimited, and some will lead to the subset-superset problem; and negative evidence that could prevent the child from incorrect generalizations is not available to all children and therefore cannot be necessary for language acquisition (Fanselow & Felix 1987). Under these conditions, language acquisition ought to fail. However, it does not, and the crucial question is how to account for this astonishing fact. In principle, the question is whether the apparent underdetermination of the data must necessarily be interpreted as implying domain-specific innate knowledge or not. Note that underdetermination as such does not necessarily imply domain-specificity at all (see e.g. Newmeyer 1998a: 88, citing Dryer, p.c.; Scholz & Pullum 2002: 199). And again, quite naturally, functional and generative linguists come to different, in fact opposing, answers. The generative position is that children master language acquisition despite the 'logical problem of language acquisition' because of their innate predisposition for learning grammar, i.e. Universal Grammar (UG), which inherently constrains the set of possible generalizations the child might entertain. In contrast, functionalists have put forward the view that children can indeed acquire language on the basis of the input alone, without any innate grammatical knowledge. In fact, functionalists criticize generative research for not paying enough attention to the properties of the input. The functional line of research has therefore focused on investigating the input children are exposed to during language acquisition and has tried to show that this input is indeed sufficient for children to acquire language (for an overview see e.g. Eisenbeiss 2002: §I.4). Generative linguists react by pointing out that the crucial task is not to identify what might be of help in language acquisition, but to investigate which characteristics are necessary and sufficient to guarantee successful language acquisition by all children. For whatever evidence functionalists put forward in support of their view, generativists can quite comfortably lean back and challenge them to show that every generalization can be derived from the input by every child. And it is indeed quite freely admitted by Tomasello (2003: 301) that to date functionalists have not been able to fully constrain the generalizations that children make. Note, however, that in this enterprise the burden of providing evidence is apparently put on functionalists, with generativists a priori proceeding from the assumption that these generalizations are already innately constrained. The generativist position is generally to ask 'how else?' (if not innately), and the functionalists are challenged to show how things may
be otherwise. That is, these two positions cannot be reconciled, because they proceed a priori from different epistemological positions regarding the acquisition of knowledge and hence of language. It is beyond the scope of an introduction like this to give a full overview of all the approaches that seek to account for language acquisition. Rather, to show the basic spirit of the arguments put forward, we will now briefly contrast the major formal theoretical framework used to overcome the 'logical problem of language acquisition' (i.e. Chomsky's) with one recent functionalist account of language acquisition, i.e. the construction-based approach of Tomasello (2003). According to Chomsky's Principles and Parameters approach (Chomsky 1981), children are equipped with an innate, genetically specified language faculty — Universal Grammar (UG). UG consists of a set of principles universal to all languages. Some of these principles, the parameters, allow for a choice between limited options (ideally two). During language acquisition these parameters have to be fixed to the values expressed in the language being acquired. A parameter is set on the basis of the positive input available to the child, and thus no negative evidence is required. On Chomsky's solution, then, the mind/brain of the child is pre-programmed with a vast store of genetically determined information.20
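How a parameter might be fixed from positive evidence alone can be illustrated schematically. The sketch below is a drastic simplification invented for this introduction (real trigger-based learning models, and the input representations they assume, are far more intricate): a single binary head-direction parameter is set by the first unambiguous trigger.

```python
# A drastically simplified parameter-setting learner: one binary
# head-direction parameter is fixed from positive input alone.
# The input format and trigger logic are invented for illustration.
def set_head_parameter(utterances):
    """Each utterance is a pair of role labels in their surface order;
    a verb preceding its object triggers 'head-initial'."""
    for first_role, _second_role in utterances:
        if first_role == "V":
            return "head-initial"    # e.g. English: 'drink water'
        if first_role == "O":
            return "head-final"      # e.g. Japanese: 'mizu-o nomu'
    return "unset"

print(set_head_parameter([("V", "O")]))  # -> head-initial
print(set_head_parameter([("O", "V")]))  # -> head-final
# One unambiguous trigger suffices; no negative evidence is consulted.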
In Tomasello's (2003) approach, it is not an innate language-specific disposition but human-specific socio-cognitive skills, such as intention-reading and pattern-finding, that enable children to acquire language. This does not deny the existence of some innate prewiring, but it pushes the content of that prewiring considerably back into the non-linguistic cognitive domain. According to Tomasello's construction-based approach, children first start from item-based constructions, from which they then proceed — qua analogy (a form of pattern-finding) — to more complex abstract constructions. Note that in this approach syntactic creativity (one of the major arguments of formal linguists for assuming a 'generative grammar') is accounted for by analogy (on analogy, see also Fischer this volume, §3.3.2).21 Proceeding from two fundamentally different perspectives on how children might overcome the 'logical problem of language acquisition', generative approaches (such as the Principles and Parameters approach) and functional approaches (such as the construction-based approach) differ considerably in the type of evidence used and admitted. For example, while in Tomasello's (2003) construction approach idiomatic child utterances (e.g. thank you, I-wanna-do-it) are an important piece of evidence (as they, together with early holophrases, provide the first step from which more abstract constructions are generalized), in generative approaches such idiomatic chunks are usually not considered, since they represent unanalyzed and hence, from the generativist point of view, uninteresting and irrelevant data. That is, the underlying theoretical framework constrains the way the researcher will look at the data. A very interesting source of evidence directly related to the question of whether children proceed in language development from the structure of the input or from an innate language faculty comes from data on language-isolated children. Three cases of language isolation have by now been described in the literature: (i) children exposed to Hawaiian pidgin (Bickerton 1984, 1999), (ii) deaf children who grew up without sign language in Nicaragua (Kegl et al. 2001), and (iii) deaf children of hearing parents with no knowledge of a sign language (Goldin-Meadow & Mylander 1990).22 What is common to these cases is that none of these children were exposed to structured language input during childhood. Nevertheless, they have been found to create language systems that exhibit characteristics such as a combinatorial system of basic elements, a system of grammatical markers, and consistent ordering of elements depending on the thematic roles expressed. Note that these characteristics are not present in the input the children are exposed to. The creation of systems displaying these characteristics therefore cannot be ascribed to simple learning mechanisms that exploit statistical distributions of elements in the input. "[…] it doesn't take language to make language", as Kegl et al. (2001: 206) conclude. On the contrary, these findings suggest that innately specified biases are operative in language acquisition and are put to use in the creation of a systematically structured language system. This argument is countered by functionalists by questioning the empirical assumptions underlying it; see, for example, the literature cited in Tomasello (2003: 287–288), where it is argued that creole-developing children are exposed to more input than assumed by Bickerton. Whether all the reported cases of language-isolated children can be explained this way remains an empirical question.

3.2.3 Genetic evidence
If the language capacity is in any way innately specified, this should ultimately show up somewhere at the genetic level. And indeed, evidence for the genetic inheritance of language disorders has been quoted in the literature as probably the most persuasive piece of evidence in support of an innate language capacity (cf. e.g. Pinker 1994: §10). By now, a number of language disorders for which a genetic basis is assumed have been reported in the literature. In the following, we will focus on the best-studied case, the British KE family (for an excellent overview of the heritability of language abilities and impairments, see Stromswold 2001). The facts as such are clear: about half of the members of this three-generation family suffer from a speech and language disorder that runs through the family in a pattern strongly suggesting autosomal dominant inheritance (Vargha-Khadem et al. 1995). However, the very fact that there is some genetically inherited disorder does not automatically mean that it is also language-specific (cf. Pinker 1994: 48–49; see also Newmeyer 1998a: 93). Thus, the controversy between formal and functional linguists, again, centers on the empirical underpinning of the underlying
assumption, i.e. whether the deficits found are indeed language-specific or not. In the following, we will try to give an overview of the debate and of recent findings. Note from the outset that it is one thing to deduce a genetic deficit from the occurrence of certain disorders across several generations of a family, and quite another to investigate the genetic mechanisms at the molecular level. The great advances made in molecular genetics during the last few years, however, make an assessment of the genetic basis of these deficits much more feasible than in the past. In 2001, a point mutation of the human gene FOXP2 (located on the long arm of chromosome 7) was found to be related to the inherited speech and language disorder affecting the members of the KE family (Lai et al. 2001). The findings suggest that the mutation of this gene leads to an atypical development of brain areas typically associated with speech and language. This is the first attested case in which a genetic defect can be directly correlated with neuroanatomical abnormalities which in turn are held responsible for certain speech and language functions (Lai et al. 2001).23 Furthermore, a comparison between the FOXP2 genes of humans, mice and apes suggests that the human variant of FOXP2 is of relatively recent origin and that its fixation in the genome of our species occurred during the last 200,000 years, concomitant with or subsequent to the emergence of anatomically modern humans (Enard et al. 2002). FOXP2 was therefore enthusiastically celebrated as the first 'language gene' to be discovered.24 Proponents of functionalism (e.g. Sampson 1997: 90–96) usually refer to the work by Vargha-Khadem and colleagues (1995) to show that the affected members of the KE family do not suffer from a selective deficit in grammatical capacities, as was proposed by Gopnik and colleagues (cf. Gopnik & Crago 1991; Gopnik et al. 1997). Vargha-Khadem et al. (1998: 12698) rather argue that the deficit primarily affects "the rapid and precise coordination of orofacial movements, including those required for the sequential articulation of speech sounds". The precise characterization of the speech and language disorders evinced by affected members of the KE family is still a matter of debate. However, even the evidence available in the study by Vargha-Khadem and colleagues (1995) points to deficits in verbal inflection and in the inflection of nonwords that cannot be explained by the articulatory disorder of the affected family members. The presence of grammatical deficits alongside an articulatory disorder is also supported by neuroimaging studies of affected members of the KE family (Vargha-Khadem 1998; Watkins et al. 2002; Liégeois et al. 2003). In addition to functional and anatomical abnormalities in cortical and subcortical brain regions associated with the impairment in speech production, abnormalities were also observed in Broca's and Wernicke's areas, two brain regions generally associated with language processing. Thus, in a recent publication even Vargha-Khadem's group concludes "that the FOXP2 gene is critically involved in
"intro-r49"> "intro-r22">
What counts as evidence in linguistics?
the development of the neural systems that mediate speech and language" (Liégeois et al. 2003: 1230). So, again, from the generative point of view, functionalists are challenged to show that all the reported deficits can indeed be attributed to deficits in non-linguistic skills.

3.2.4 Universality and typological evidence
Given that linguistic knowledge is supposed to be innately specified, it follows that it should be the same for every human being. And in fact, we usually tell our undergraduates about the amazing fact that a German child raised in China in a Chinese-speaking environment will acquire Chinese in the same way as a Chinese child raised in a German-speaking environment in Germany will learn flawless German. Given this, linguistic evidence from the languages of the world (i.e. typological evidence) should provide a good window into the mind, too. And indeed, it has been claimed that cross-linguistic data are of central importance for gaining insight into the nature of UG (cf. e.g. Haegeman 1994: 18, as cited in Haspelmath this volume, p. 87). In practice, however, typology has not been a central concern of generative research (see also Newmeyer this volume). As such, the positions of the two linguistic camps are clear: formalists attribute the universality of certain language features to UG (cf. e.g. Chomsky 1998: 33, as cited in Newmeyer this volume, p. 52), while for functionalists typological patterns fall out from language usage, and universality is ascribed to general cognitive principles (rather than grammar-specific ones; see e.g. Croft 1990). However, in the innateness debate universality — and hence typological evidence — has not played the prominent role one might expect. Most of the papers in this volume, i.e. the contributions by Kirby, Smith & Brighton, Haspelmath, Newmeyer, and Wunderlich, center on the status of typological evidence for linguistic theory in general, and for the construction of UG in particular. It is not surprising that different linguists come to different conclusions as to the status of typological evidence for UG. What is surprising, however, is that in this case the different positions cannot straightforwardly be delineated along the formal-functional opposition. In the contributions to this volume, Newmeyer and Haspelmath agree that typological evidence is essentially irrelevant for the construction of UG, while Wunderlich strongly opposes that view.25 Now, how come two formal linguists (Newmeyer and Wunderlich) strongly disagree, while a functional linguist (Haspelmath) and a formal linguist (Newmeyer) agree? To answer this question, we first need to look at how the term 'typological evidence' is used in these contributions. It is important to note that typological evidence comes in three qualitatively different types, as pointed out by Newmeyer (this volume, p. 55): (i) absolute universals, i.e. properties of language without any cross-linguistic exceptions; (ii) implicational universals ('If a language has property X, then it will have
property Y’), and (iii) frequency statements of the form ‘More languages have X than Y’. Newmeyer’s claim that typological evidence is irrelevant for the construction of UG concerns only the latter two types, but not the first one. He argues that it is very implausible that typological evidence in the sense of (ii) and (iii) should form part of linguistic knowledge. Moreover, he demonstrates that most of these typological generalizations fall out from language usage, as they can be explained by usage-based principles such as Hawkins’ (1994) Early Immediate Constituents (EIC) principle, or frequency. Note, that in so doing Newmeyer proceeds from the classic generative division of labour between competence and performance, in which the former only contains domain-specific and categorical statements, while the latter contains everything else.26 Interestingly, Haspelmath (this volume) backs up Newmeyer’s (generative) position in as far as the (ir)relevance of typological evidence for the construction of UG is concerned from a functional point of view. In particular, Haspelmath agrees with Newmeyer’s (1998b, this volume) position that UG is a statement about what is possible in language, but that the attested languages only form part of what is possible, i.e. their properties do not determine UG fully. Therefore, typological evidence will not help to reveal properties of UG sufficiently. In fact, Haspelmath advocates a certain division of labor between (functional)-typological work and UG approaches. In his view, the description of language (i.e. typology) should be purely phenomenological. In this sense, then, we do not need specific theoretical frameworks for language description (see, however, Wunderlich this volume, for arguing against Haspelmath’s position). On the other hand, approaches interested in UG, rather than looking at evidence from language description, should resort to other types of evidence, and Haspelmath (§3.4) proposes here, quite originally, the natural acquisition of artificial languages (such as Esperanto), artificial acquisition experiments with adults, or language games as possibly relevant evidence for UG. Wunderlich (this volume) disagrees with both Newmeyer and Haspelmath on the irrelevance of typological evidence for UG. It seems to us, however, that the different positions between the two formal linguists, Wunderlich and Newmeyer, are due to different perspectives on typological evidence. Among other things, Newmeyer argues that typological evidence (in the sense of (ii) and (iii) mentioned above) cannot be part of linguistic knowledge because it is highly implausible that children should have any knowledge about such typological generalizations. Wunderlich (this volume, note 13) agrees with Newmeyer on this point. Rather, Wunderlich focuses on ‘typological evidence’ as a piece of evidence available to the linguist to get further insights into language, and ultimately UG (in his framework). By looking at typological evidence, he tries to separate possible features of UG from what he perceives of as ‘later innovations’ (see also Section 3.3 below). Under this view then, many typological generalizations — even if not being part of
UG — will still have been determined by it, having passed through the ‘linguistic bottleneck’ of first language acquisition (which in turn is determined by UG in his view). In fact, Wunderlich gives a quite precise definition of UG in terms of “a description of the (genetically transferred) information for the brain of how it has to process chunks of memorized linguistic input” (p. 148). In this way, the concept of UG is inextricably linked to neurolinguistics, and moreover to the evolution of language in that UG must be the properties that a protolanguage (once the language capacity had emerged) contained. Typological diversity is then the result of an interplay of a changing input and UG (in the process of L1).27 Ultimately, then, Wunderlich and Newmeyer seem to agree on the fact that most typological evidence (except absolute universals) are not direct evidence for UG — what they disagree on is the worth of such more indirect evidence for insights into UG. While Newmeyer regards such evidence as essentially irrelevant, Wunderlich still ascribes it an important role. Wunderlich’s assumption of language acquisition as an evolutionary ‘linguistic bottleneck’ is akin to the Iterated Learning Model (ILM) as introduced by Kirby, Smith & Brighton (this volume; see also Kirby & Hurford 2002). This is a computer model that tries to simulate the emergence of language universals on the basis of some given initial state for a language learner (i.e. UG) on an evolutionary time scale.28 Most crucially, they show that UG is only one factor contributing to the emergence of language universals in a process of cultural adaptive learning, and that therefore the two cannot be equated. From this follows that language universals cannot be taken as direct evidence for UG, as indeed cautioned by Kirby, Smith & Brighton (this volume). In this sense, Kirby, Smith & Brighton’s conclusion is consistent with both Newmeyer’s and Wunderlich’s conclusion, the controversial question remaining, as said, how to interpret such indirect evidence. An important question addressed by Weiß (this volume) is the relevance of (non)standard data for linguistic research and UG. Talking about typological evidence, what does it mean to look at evidence from, say, ‘German’? Linguists usually only consider evidence from the standard variety, but what about evidence from non-standard varieties, such as Bavarian German? Even more so, if the evidence from standard and non-standard varieties points to different interpretations (as in the data on German negation discussed by Weiß)? Weiß points out that historically, over a long period of time, the standard variety in languages such as German and Dutch was only a written variety which was not used in spoken language, and hence not acquired as a first language. On the basis of this observation, Weiß draws a general distinction between what he calls N1 languages (i.e. languages acquired as first language [L1]) and N2 languages (= languages not acquired as L1), classifying such standard languages (at least historically) as N2 languages. In so doing, Weiß raises the important question (not only for UG-based approaches) of what is the worth of such standard data as linguistic evidence.
An important question, addressed by Weiß (this volume), is the relevance of (non-)standard data for linguistic research and UG. Talking about typological evidence, what does it mean to look at evidence from, say, 'German'? Linguists usually only consider evidence from the standard variety, but what about evidence from non-standard varieties, such as Bavarian German — especially when the evidence from standard and non-standard varieties points to different interpretations (as in the data on German negation discussed by Weiß)? Weiß points out that historically, over a long period of time, the standard variety in languages such as German and Dutch was only a written variety which was not used in spoken language and hence not acquired as a first language. On the basis of this observation, Weiß draws a general distinction between what he calls N1 languages (i.e. languages acquired as a first language [L1]) and N2 languages (i.e. languages not acquired as L1), classifying such standard languages (at least historically) as N2 languages. In so doing, Weiß raises the important question (not only for UG-based approaches) of the worth of such standard data as linguistic evidence.
3.3 Content of Universal Grammar (UG)

It is hard to find any exhaustive list of what should be part of UG. Given this, it is all the more surprising — and in fact welcome — to find such a list suggested by Wunderlich (this volume). His list of UG features/properties includes very general aspects of language, such as the distinction between nouns and verbs, distinctive features, or double articulation. Most crucially, he excludes syntactic principles (in the sense of displacement/movement) from UG — a step which will certainly be considered highly controversial among generative linguists, for whom syntax is, so to speak, the heart of UG. Ultimately, of course, the precise content of UG is and remains an empirical question. It is important to note that the notion of 'UG' as such is not defined in any unique way in the formal literature. In its strongest interpretation, it refers to the generative (domain-specific) type of UG, which postulates quite specific principles (such as subjacency).29 Minimally, it can refer to whatever must be pregiven to allow the acquisition of language (not necessarily domain-specific); see e.g. the equation of UG with the initial state of a language learner by Kirby, Smith & Brighton (this volume), or Wunderlich's definition given above. When evaluating the postulation of UG principles/features, two lines of reasoning seem to prevail:

i. Which aspects of language can be shown to be learned?
ii. What can plausibly be assumed to be innate?

Under the first line of reasoning, anything that can be shown to be learnable should not be put inside UG. In deciding what is in fact learnable, computer modeling has become an important and innovative source of evidence, as demonstrated in this volume by Kirby, Smith & Brighton: it can test which properties of UG must reasonably be assumed in order to arrive at the structural patterns and the variation found today. As to the second line of reasoning, Wunderlich (this volume) draws attention to a paradox pointed out by Fanselow (1992): some proposed UG principles (such as subjacency) are so specific that they cannot plausibly be assumed to be innately specified, while other UG principles are so general that they can probably be explained by general cognitive principles (e.g. the Elsewhere Condition, or economy). Newmeyer (this volume), for example, argues against the plausibility of the parameters proposed by Baker (2001). Apart from pointing out empirical problems with Baker's approach, Newmeyer also points to the fact that an incredible number of parameters is needed to handle typological variation — in Newmeyer's view far too many to be plausibly assumed to be innate. Note that the high inflation of parameters in the Principles and Parameters approach over the years has led to doubts about how all the principles and parameters suggested in the literature might possibly be encoded in the human genome,
even among formal researchers (see Fanselow 1992; Wunderlich this volume). As the amount of pre-specified information grew, claims about the innateness of this knowledge were regarded as increasingly vacuous. A similar problem can be observed in the postulation of innate OT constraints, where more and more constraints have been assumed in order to save particular analyses. Many of these constraints appear to be highly language- or phenomenon-idiosyncratic and have thus come under considerable attack (see e.g. McMahon 2000). Such implausible innateness claims may have fostered the view that the innateness of language structures or principles is often simply invoked as an ad hoc explanation, a mere stipulation covering what cannot be explained otherwise. It should, however, be kept in mind that innateness remains an empirical question. During the last 15 years there have been considerable efforts to reduce the amount of domain-specific innate knowledge of language (cf. Wunderlich this volume; Bierwisch 1992, 1996; Fanselow 1991, 1992, 1993; Chomsky 1995; Kayne 1994; Eisenbeiss 2002: §I.7, II). For example, the quite detailed X-bar scheme has been replaced by the simple operation merge (Chomsky 1995).30 Word order parameters, such as the head parameter, have been replaced by general principles stating that all structures are right-branching (Kayne 1994) and that heads should be placed consistently (Fanselow 1993; Eisenbeiss 2002: §I.7, II). And domain-specific principles such as the Elsewhere Principle have been replaced by domain-general principles such as the Specificity Principle, which captures the relationship between regular processes and exceptions in other cognitive domains as well (cf. Eisenbeiss 2002: §I.7, II; Wunderlich this volume). These examples illustrate that invoking innate, domain-specific knowledge of language is not a dead end for further scientific investigation. Note, however, that particularly the latter modifications (such as those made by Eisenbeiss or Wunderlich) are not what functional linguists associate with the notion of UG — they typically associate UG with the strong (i.e. pre-minimalist) Chomskyan type of UG and would regard Wunderlich's notion of UG as a much 'softer' — and hence less controversial — type.31 For example, Deutscher (2005) proceeds in his functional (popularizing) account of the emergence of syntactic structure (i.e. grammaticalization) from an initial stage which we might equate with Wunderlich's notion of UG as protolanguage, i.e. it represents the state of language at the point when humans had become adapted for language. In Deutscher's account, this initial stage forms the platform from which his story about the emergence of complex syntactic structure out of simple, primitive speech proceeds. It is interesting to note that many of the features proposed by Wunderlich do not differ dramatically from what Deutscher assumes for his initial stage. Ultimately, then, the remaining disagreement between formal and functional linguists will be about the particular theoretical terminology/framework used (a formal or a functional one), as well as about the willingness to explicitly regard
properties as innate (as Wunderlich does) or to take them as given — and beyond the reach of investigation (as Deutscher does).

3.4 How can UG be used as a theory guiding empirical research?

We now come to our last guiding question, namely how innateness claims, or UG, can — or should — be used to guide empirical research. Note that the relation between UG and empirical evidence is reciprocal: on the one hand, we can ask whether empirical evidence can be used for the construction of UG (the issue addressed in Section 3.2.4 on typological evidence, for example); on the other hand, we may ask whether UG can be used as a theoretical framework guiding and interpreting empirical evidence, which is the focus of this section. Most often, the answer to both questions will be the same. Both Newmeyer and Haspelmath in this volume, for example, hold that typological evidence should not be used for the construction of UG and that UG cannot serve as a useful tool for interpreting typological data, while Wunderlich (this volume, p. 167) thinks that "[w]ithout linguistic typology, all considerations of UG are blind because of the lack of knowledge about languages, and without some conception of UG, all linguistic typology is mindless, because it is purely descriptive." Fischer (this volume), however, shows that the answers to these two questions can indeed differ, and that the questions should therefore be kept distinct (see further below). In general, we can observe two different positions within linguistics regarding the relation between linguistic theory and empirical research. More theoretically oriented researchers use data to find support for their theories, while other researchers' prime interest is in the phenomena themselves rather than in using data to test conflicting theoretical frameworks. From the latter perspective, theories may help the researcher come to grips with the data or state hypotheses, but he or she is not — or rather should not be — tied to a single theory. This latter perspective is the one advocated by Fischer, who addresses the question of what counts as evidence in historical linguistics in this volume. According to Fischer, the task of the historical linguist is to account for language change rather than grammar change (as opposed to, as she says, Lightfoot's position), and in her view the basis for historical research is therefore concrete linguistic utterances rather than the theoretical construct of a grammar. According to Fischer, a focus on grammar change will not only (potentially) misguide us in historical research, it will also blind us to those changes which lie behind grammar change. In Lightfoot's approach, only a change in the triggering data is crucial for grammar change; what changed such triggering data is, however, external to language and therefore outside the scope of a formal historical linguist.32 Note, however, that Fischer does not reject the notion of UG per se. She does indeed regard
"intro-r69"> "intro-r32">
investigation into UG as a reasonable and desirable research program. In so doing, she would also subscribe, in principle, to Kiparsky’s (1968) famous dictum that historical evidence may provide a ‘window to the mind’. However, she considers the concept of UG still too empirically weak to be used as a theory guiding historical research, and she provides a survey of functionalist arguments and evidence against UG to support her view. That is, historical evidence for the construction of UG — yes; UG as a theory guiding historical research — no (or, more precisely, not yet). Like Fischer’s, Eckman’s contribution to this volume focuses on the question of how UG — or innateness claims — may be used to guide empirical research, in his case research on second language acquisition (L2), and, like Fischer’s, his answer is quite negative. Unlike Fischer, however, he does not question the empirical support for UG; rather, Eckman’s argumentation concerns the scientific status of functional and formal approaches to L2 with respect to their explanatory power. Functional approaches that appeal to notions such as markedness have quite commonly been criticized as not really explanatory (for various reasons, as discussed by Eckman). It is, for example, a common claim of UG-based approaches that they are more explanatory than functional approaches because they relate to a ‘higher theory’ connecting language and the mind. Eckman shows that this is a misconception and that the (functional) typological approach to L2 (as in Eckman 1977, and subsequent work) is explanatory. Drawing on Hempel & Oppenheim’s (1948) logic of scientific explanation, Eckman shows that the dichotomy between ‘explanation’ and ‘description’ is a false one — instead, there are only levels of explanation, and theories will accordingly differ in the level of explanation they provide, but not in whether they are explanatory at all. In fact, Eckman then moves on to argue that the functional-typological approach is even to be preferred to UG-based approaches to L2 on scientific grounds. For Eckman, research programs should be evaluated on the grounds of their ability to allow for further explanatory ascent. In his view, however, by postulating innate, domain-specific principles the UG-based approach cuts off any higher explanation — at least any feasible one.33
4. Conclusion

In this introduction we have presented an overview of the nature of linguistic evidence and how it is used in linguistics. Apart from the fact that there are criteria for the evaluation of empirical evidence (in the sense that any systematically collected piece of evidence is to be preferred to a sketchy one), we have tried to show how the question of evidence depends on the principal approach taken to linguistics, i.e. a
formal or functional one. This is due to the two different — in fact opposing — conceptions of language which determine the subject matter of investigation: while formal linguists ultimately want to find out about the innate language faculty (competence), functional linguists want to investigate language usage, i.e. performance. The two approaches ask different questions; accordingly, different types of empirical evidence dominate in each, and evidence is used in different ways. However, it is important to keep in mind that all types of evidence may — in principle — be relevant for both linguistic orientations. No type of evidence is per se better or worse than another. Rather, the worth of evidence must be evaluated on the basis of the way it was collected (‘systematically’) and on the basis of how well it accounts for the topic under investigation. The contributions to this volume, which focus more narrowly on the issue of innateness, illustrate some of the problems involved in approaching the issue of linguistic evidence from the perspectives of both formal and functional linguists.
Notes

*We wish to thank Werner Abraham and Guy Deutscher for valuable comments on a first draft of this introduction, and our contributors for various discussions we had throughout the compilation of this volume. All remaining errors are, of course, our own. We are also grateful to Eva Neuhaus for her help in editing the contributions to this volume. The second author gratefully acknowledges support from the Deutsche Forschungsgemeinschaft (DFG grant Ro 2408/2–1). Note that, for reasons of time, the commentaries and authors’ replies, which came in at the final stages in the compilation of this volume, are not further discussed in this introduction. 1. More precisely, it seems to depend on the precise definition of ‘empirical science’ how linguists would classify themselves. In a narrow definition of empiricism in the Popperian sense, hermeneutics will have to be excluded here, for example. If one conceives of ‘empirical’ very broadly as ‘dealing with data’, we can include any type of data-oriented research here. In this sense, probably every linguist works necessarily empirically — for how could one possibly do linguistics without any type of linguistic data (defined loosely as any piece of language)? Another question is whether the term ‘science’ should be equated with the natural sciences. In the anglophone tradition there is a very close association between ‘science’ and ‘natural science’. In contrast, in German scholarship the term Geisteswissenschaften (the arts/humanities) does entail the notion of science (Wissenschaft = ‘science’). In this conception, there can be scientific research outside the natural sciences (as in the humanities). Practically, however, there is a tendency even outside the anglophone scholarly community to equate the term ‘science’ with natural science. Therefore, some linguists who do not commit themselves to the natural sciences may not feel comfortable being subsumed under the empirical sciences. On the other hand, such linguists should also not be deemed ‘unscientific’ because they do not commit themselves to the natural sciences. All these are subtle issues that would need much closer scrutiny, and which are outside the scope of an introduction like this — we would just like to draw attention to possible misinterpretations here.
2. Saussure’s conception of language has long been controversially discussed; for discussion see e.g. Botha (1992: §5.2), or Koerner (1973). 3. For the purpose of the present discussion we use the terms ‘theory’ and ‘hypothesis’ somewhat interchangeably. Note, however, that these belong to different domains. A theory is the more general concept, from which certain hypotheses then fall out. 4. According to Popper (1959 [2002]: 183), “probability statements are not falsifiable.” If they are not, what is their scientific status? Falsifiable theories are certainly the epistemologically highest type of theory (because of their falsifiability); however, not every theory/generalization may be falsifiable in this sense, while still being interesting. 5. For an overview, see e.g. Lehmann (1982 [1995]: §2.3), Hopper & Traugott (1993 [2003]: §5), Newmeyer (1998a: §5.5), Haspelmath (2004), or the articles in the special issue on degrammaticalization in Language Sciences 23, 2001. 6. Note, however, that there is a discrepancy between the ‘Galilean’ style of research and Popper’s definition of empirical science, where counter-examples, strictly speaking, should falsify the theory and consequently lead to abandoning or modifying the theory (see Chomsky 2002 for discussion). 7. This hypothesis has been most forcefully put forward by Janda (1980, 2001). However, see Allen (1997) for a counter-view; see also Rosenbach (2002: 212–217) and Rosenbach (2004) for discussion. 8. See, however, Allen (1997, 2003) for another view on the origin of the his-genitive, and the stylistic registers it occurs in. See also Kroch (1997) for discussion. 9. The term ‘direct’ is used here to indicate that this sort of evidence is ‘more direct’ than the other. In an absolute sense, of course, there is no direct evidence for competence; see also Section 3.1 below. 10. Note that Bauer (2002: 98) defines a corpus as “a body of language data which can serve as a basis for linguistic analysis and description”. As Bauer (2002: 98) himself notes, this definition is “(intentionally) inclusive, possibly excessively so, since it would even allow a set of sentences invented by so-called ‘arm-chair’ linguists to prove a particular grammatical point as a corpus.” That is, Bauer’s broad definition goes beyond naturally occurring speech here, and as such it is opposed to views which explicitly set corpus data in opposition to intuitive data (as e.g. Sampson’s 2001 [2002]). 11. For a useful overview of field work, see e.g. Feagin (2002). 12. Snyder (2000), for example, showed that there appear to be two types of grammaticality judgments: one in which an initially badly evaluated construction becomes better for subjects with increased training, and one in which a construction remains bad for subjects, no matter how much training they receive on it. This indicates that in the first case processing problems are at stake, which are absent in the second case, as pointed out by Gisbert Fanselow in the discussion of the 2003 Nijmegen Lectures on “Categoricity and Gradience in Syntax” (by Joan Bresnan, December 10–12, 2003). For the formal linguist who is interested in representational aspects of language, therefore, only the latter type is interesting. 13. As, for example, claimed by Chomsky (1961 [1971]: 131), as cited in Sampson (2001 [2002]: 2): “a direct record — an actual corpus — is almost useless as it stands, for linguistic analysis of any but the most superficial kind.” 14. 
The crucial point here is that it is not always necessary to carry out an experiment to make a valid theoretical point. Theoretical linguists can also systematically find out about the constraints under which a construction is used. Note, however, that it is often desirable to supplement such
theoretical considerations by looking at corpus or experimental data. As a case in point see, for example, Bresnan & Nikitina’s (2003) study on the English dative alternation, which has shown certain constructions to be grammatical despite earlier theoretical considerations, and which thus calls for a revision of the theoretical account of this phenomenon (see also Section 2.2 above). 15. Alternatively, sometimes a broadened notion of competence is used. See e.g. the sociolinguistic notion of ‘communicative competence’, where extra-grammatical factors such as social factors, pragmatics, or style are included (cf. e.g. Hymes 1997). 16. Note that this also allows for a uniform treatment of historical changes. As is well-known, variability often (though not necessarily) leads to the grammaticalization of one construction; and grammaticalized, i.e. categorical, contexts may become variable again over time. In the functional/stochastic approach this can be accounted for by one grammar, in contrast to other formal approaches which proceed from the assumption of several ‘competing grammars’ (Kroch 1994). For further discussion of how, in general, formal and functional approaches deal with variation, see also the overview in Rosenbach (2002: §5). 17. Despite the welcome result of such connectionist models for functionalists that children may extract quite a few patterns on the basis of the input alone, Tomasello (2003: 191), for example, critically points out that such models lack psychological plausibility. In particular, they cannot implement communicative functions, which are a crucial prerequisite for language development in functional approaches, such as Tomasello’s. 18. This article is part of a double special issue on the ‘poverty-of-the-stimulus’ argument, edited by Nancy Ritter, of The Linguistic Review, 19, 2002, where this argument is quite controversially discussed. 19. Note that this use of the term ‘negative evidence’ in the literature on language acquisition is different from the more general use of the term as introduced in Section 2.2, where it simply refers to evidence in the form of non-occurrence. 20. For some conceptual problems of the Principles and Parameters approach, see also Section 3.3 below. 21. As noted by Tomasello (2003: 164), analogy is a well-known principle accounting for historical development, but so far it has barely been used to account for children’s syntactic development. 22. Another case would be language isolation due to deprivation and severe neglect, as in the case of Genie (cf. Curtiss 1979). In these cases, however, language isolation is accompanied by the deprivation of any social contact. The result is severe disorders affecting all aspects of personality and cognition, as well as a complete lack of language (see Bishop & Mogford 1993 for overview). 23. FOXP2 encodes a transcription factor that regulates the activation of genes probably involved in the development of neural structures that are important for speech and language. Lai et al. (2001) suggest that the observed mutation of this transcription factor leads to an atypical development of these brain areas. Indeed, studies making use of neuro-imaging techniques revealed anatomical and functional changes in brain areas involved in speech and language functions (Vargha-Khadem et al. 1998; Watkins et al. 2002; Liégeois et al. 2003). 24. ‘Language gene’ is to be understood as a short-hand for a gene which is involved in the development of brain areas associated with speech and language. 25. 
Fischer, this volume, explicitly takes issue with the position advocated by Newmeyer that typological evidence should be irrelevant for the construction of UG, though only in passing.
26. Note, however, that recently alternative models of competence have been suggested (as e.g. the functional/stochastic account discussed in Section 3.1 above) that do incorporate frequency statements into grammar and also call for a functional grounding of (innate) OT constraints (Bresnan & Aissen 2002, and Aissen and Bresnan this volume). That is, in this account the typological generalizations excluded by Newmeyer as being frequentistic and having functional explanations are, in principle, not at all incompatible with their view of grammar. For a criticism of this approach see, however, Newmeyer (2002, 2003a). In defence of the idea of ‘functional grounding’ of OT constraints, see Bermúdez-Otero & Börjars (2006). In the end, all depends on one’s conception of grammar and the theory chosen. 27. Note that such a view ascribes an important role to UG in linguistic variation without reducing variation to UG. It leaves open the possibility of all sorts of external factors coming in (which it should, indeed). Under such a view, external factors (e.g. contact situations) would affect the language input, which in turn then affects the grammar the child acquires. 28. Kirby, Smith & Brighton talk about the emergence of ‘language universals’, which they contrast with UG. It is not clear to us how their use of the term ‘language universals’ relates to the three notions of ‘typological evidence’ given by Newmeyer discussed above. 29. Subjacency is, for example, given by Newmeyer (this special issue) as a clear case of an innate principle, if only in passing. 30. The issue here is that the principles to be assumed as innate have been considerably simplified over the years in these approaches. It is still a matter of debate among generative linguists which theoretical framework (e.g. Principles and Parameters, or Minimalism) is to be preferred. 31. Fischer (this special issue: note 15), for example, refers to Jackendoff (2002) as representing a “watered-down version of UG”. 32. Note that McMahon (2000: 124–125) therefore argues that the generative approach to language change is not truly explanatory, because the explanation is essentially theory-internal, with the ultimate causes of language change remaining in the dark. 33. Eckman is careful to note that the UG-based approach does not in principle cut off any further explanation. Rather, what he claims is that any further explanation of innate, domain-specific principles must make recourse to evolution, and that such explanation seems at least less practicable than explanations in terms of general cognitive principles.
References

Aissen, J.; and Bresnan, J. 2002. “Optimality theory and typology”. Course taught at the DGfS/LSA Summer School “Formal and Functional Linguistics”, Heinrich-Heine-Universität Düsseldorf, 14 July–3 August 2002. (for course material, see: http://www.phil-fak.uni-duesseldorf.de/summerschool2002/CDV/CDAissen.htm) Albert, R.; and Koster, C. J. 2002. Empirie in Linguistik und Sprachlehrforschung. Ein methodologisches Arbeitsbuch. Tübingen: Gunter Narr. Allen, C. 1997. “The origins of the ‘group genitive’ in English”. Transactions of the Philological Society 95: 111–131. Allen, C. 2003. “Deflexion and the development of the genitive in English”. English Language and Linguistics 7(1): 1–28.
Baayen, R. H. 2003. “Probabilistic approaches to morphology”. In: Bod, R.; Hay, J.; and Jannedy, S. (eds), Probabilistic linguistics 229–287. Cambridge, MA: MIT Press. Baker, M. C. 2001. The atoms of language: the mind’s hidden rules of grammar. New York: Basic Books. Balota, D. 1994. “Visual word recognition: the journey from features to meaning”. In: Gernsbacher, M. (ed.), Handbook of psycholinguistics 303–358. San Diego: Academic Press. Bates, E.; and MacWhinney, B. 1979. “The functionalist approach to the acquisition of grammar”. In: Ochs, E.; and Schieffelin, B. (eds), Developmental pragmatics 167–211. New York: Academic Press. Bates, E.; and MacWhinney, B. 1982. “Functionalist approaches to grammar”. In: Wanner, E.; and Gleitman, L. (eds), Language acquisition: the state of the art 173–218. Cambridge: CUP. Bauer, L. 2002. “Inferring variation and change from public corpora”. In: Chambers, J. K.; Trudgill, P.; and Schilling-Estes, N. (eds), The handbook of language variation and change 97–114. Oxford: Blackwell. Bellugi, U.; Marks, S.; Bihrle, A.; and Sabo, H. 1988. “Dissociation between language and cognitive functions in WS”. In: Bishop, D.; and Mogford, K. (eds), Language development in exceptional circumstances 177–189. London: Churchill Livingstone. Bellugi, U.; Wang, P.; and Jernigan, T. 1994. “Williams syndrome: an unusual neuropsychological profile”. In: Broman, S.; and Grafman, J. (eds), Atypical cognitive deficits in developmental disorders 23–56. Hillsdale, NJ: Erlbaum. Bellugi, U.; Lichtenberger, L.; Jones, W.; and Lai, Z. 2000. “The neurocognitive profile of Williams syndrome. A complex pattern of strengths and weaknesses”. Journal of Cognitive Neuroscience 12 (Supplement): 7–29. Bermúdez-Otero, R.; and Börjars, K. 2006. “Markedness in phonology and syntax: the problem of grounding”. In: Honeybone, P.; and Bermúdez-Otero, R. (eds), Linguistic knowledge: perspectives from phonology and from syntax. Special Issue of Lingua 116(5): 710–756. Bickerton, D. 1984. “The language bioprogram hypothesis”. Behavioral and Brain Sciences 7: 173–212. Bickerton, D. 1999. “Creole languages, the language bioprogram hypothesis, and language acquisition”. In: Ritchie, W. C.; and Bhatia, T. K. (eds), Handbook of child language acquisition 195–220. San Diego: Academic Press. Bierwisch, M. 1992. “Probleme der biologischen Erklärung natürlicher Sprache”. In: Suchsland, P. (ed.), Biologische und soziale Grundlagen der Sprache 7–45. Tübingen: Niemeyer. Bierwisch, M. 1996. “Lexical information from a minimalist point of view”. In: Wilder, C.; Gärtner, H.-M.; and Bierwisch, M. (eds), The role of economy principles in linguistic theory 227–266. Berlin: Akademie-Verlag. Bishop, D. 2003. “Putting language genes in perspective”. TRENDS in Genetics 18(2): 57–59. Bishop, D.; and Mogford, K. (eds). 1993. Language development in exceptional circumstances. Hove: Lawrence Erlbaum. Bittner, D. 1994. “Die Bedeutung der Genusklassifikation für die Organisation der deutschen Substantivflexion”. In: Köpcke, K.-M. (ed.), Funktionale Untersuchungen zur deutschen Nominal- und Verbalmorphologie 65–80. Tübingen: Niemeyer. Bod, R.; Hay, J.; and Jannedy, S. (eds). 2003. Probabilistic linguistics. Cambridge, MA: MIT Press. Boersma, P. 1998. Functional phonology: formalizing the interaction between articulatory and perceptual drives. The Hague: Holland Academic Graphics. Botha, R. P. 1992. Twentieth century conceptions of language. Oxford: Blackwell.
Bowerman, M. 1988. “The ‘no negative evidence’ problem: how do children avoid constructing an overly general grammar?”. In: Hawkins, J. A. (ed.), Explaining language universals 56–72. Oxford: Blackwell. Bresnan, J. 2003. “Categoricity and gradience in syntax.” Nijmegen Lectures 2003, Nijmegen, December 10–12, 2003. Bresnan, J.; Dingare, S.; and Manning, C. 2001. “Soft constraints mirror hard constraints: voice and person in English and Lummi”. In: Butt, M.; and King, T. H. (eds), Proceedings of the LFG 01 conference 13–32. University of Hong Kong, on-line proceedings. Stanford: CSLI Publications (http://csli-publications.stanford.edu/) Bresnan, J.; and Aissen, J. 2002. “Optimality and functionality: objections and refutations”. Natural Language and Linguistic Theory 20: 81–95. Bresnan, J.; and Nikitina, T. 2003. “On the gradience of the dative alternation”. Ms., Stanford University. (http://www-lfg.stanford.edu/bresnan/new-dative.pdf) Brown, R.; and Hanlon, C. 1970. “Derivational complexity and order of acquisition in child speech”. In: Hayes, J. R. (ed.), Cognition and the Development of Language 11–53. New York, NY: Wiley. Caramazza, A. 1984. “The logic of neuropsychological research and the problem of patient classification in aphasia”. Brain and Language 21: 9–20. Caramazza, A. 1992. “Is cognitive neuropsychology possible?”. Journal of Cognitive Neuroscience 4(1): 80–95. Chomsky, N. 1957. Syntactic structures. The Hague: Mouton. Chomsky, N. 1961. “Formal discussion: the development of grammar in child language”. Conference contribution reprinted 1971 in: Allen, J. P. B.; and Van Buren, P. (eds), Chomsky: Selected readings 129–134. Oxford: OUP. Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press. Chomsky, N. 1980. “Rules and representations”. Behavioral and Brain Sciences 3: 1–14. Chomsky, N. 1981. Lectures on government and binding. Dordrecht: Foris. Chomsky, N. 1994. “Naturalism and dualism in the study of language and mind”. International Journal of Philosophical Studies 2 (2): 181–209. Chomsky, N. 1995. The minimalist program. Cambridge, MA: MIT Press. Chomsky, N. 1998. “Noam Chomsky’s minimalist program and the philosophy of mind. An interview [with] Camilo J. Cela-Conde and Gisèle Marty”. Syntax 1: 19–36. Chomsky, N. 2002. On nature and language. Cambridge: CUP. Clahsen, H. 1999. “Lexical entries and rules of language: a multidisciplinary study of German inflection”. Behavioral and Brain Sciences 22: 991–1060. Clahsen, H.; and Almazan, M. 1998. “Syntax and morphology in Williams syndrome”. Cognition 68: 167–198. Cornips, L. 2005. “On standardising syntactic elicitation techniques”. Lingua 115(7): 939–957. Cornips, L. 2006. “Intermediate syntactic variants in a dialect–standard speech repertoire and relative acceptability”. In: Fanselow, G.; Féry, C.; Schlesewsky, M.; and Vogel, R. (eds), Gradience in grammar 85–105. Oxford: OUP. Croft, W. 1990. Typology and universals. Cambridge: CUP. Crowley, J. C.; and Katz, L. C. 2000. “Early development of ocular dominance columns”. Science 290: 1321–1324. Curtiss, S. 1979. “Genie: language and cognition”. UCLA Working Papers in Cognitive Linguistics 1: 15–62.
De Clerck, B. 2003. “The syntactic and pragmatic analysis of let’s in present-day British and American English”. Paper given at the symposium Syntactic functions — focus on the periphery, Helsinki, November 13–15, 2003. Deutscher, G. 2005. The unfolding of language. London: W. Heinemann (Random House). Dingare, S. 2001. The effect of feature hierarchies on frequencies of passivization in English. Master’s thesis. Stanford University. (Rutgers Optimality Archive: http://ruccs.rutgers.edu/roa.html. ROA-467–0901) Dryer, M. 2006. “Descriptive theories, explanatory theories, and basic linguistic theory”. In: Ameka, Felix; Dench, Alan; and Evans, Nicholas (eds), Catching Language: The standing challenge of grammar writing 207–234. Berlin: Mouton de Gruyter. Eckman, F. 1977. “Markedness and the contrastive analysis hypothesis”. Language Learning 27: 315–330. Eisenbeiss, S. 1994. “Elizitation von Nominalphrasen und Kasusmarkierungen”. In: Eisenbeiss, S.; Bartke, S.; Weyerts, H.; and Clahsen, H. (eds), Elizitationsverfahren in der Spracherwerbsforschung: Nominalphrasen, Kasus, Plural, Partizipien (Arbeiten des Sonderforschungsbereichs 282, 57) 1–38. Düsseldorf: Heinrich-Heine-Universität. Eisenbeiss, S. 2002. Merkmalsgesteuerter Grammatikerwerb: eine Untersuchung zum Erwerb der Struktur und Flexion von Nominalphrasen. Doctoral dissertation, Heinrich-Heine-Universität Düsseldorf. (http://privatewww.essex.ac.uk/~seisen/my%20dissertation.htm) Elman, J.; Bates, E.; Johnson, M.; Karmiloff-Smith, A.; Parisi, D.; and Plunkett, K. 1996. Rethinking innateness: a connectionist perspective on development. Cambridge, MA: MIT Press. Enard, W.; Przeworski, M.; Fisher, S. E.; Lai, C. S.; Wiebe, V.; Kitano, T.; Monaco, A. P.; and Pääbo, S. 2002. “Molecular evolution of FOXP2, a gene involved in speech and language”. Nature 418: 869–872. Eythórsson, T.; Börjars, K.; and Vincent, N. 2002. “On defining degrammaticalization”. Paper presented at New reflections on grammaticalization 2, April 4–6, 2002, University of Amsterdam. Fanselow, G. 1991. Minimale Syntax. Groningen: Rijksuniversiteit Groningen [GAGL 32, Groninger Arbeiten zur germanistischen Linguistik]. Fanselow, G. 1992. “Zur biologischen Autonomie der Grammatik.” In: Suchsland, P. (ed.), Biologische und soziale Grundlagen der Sprache 335–356. Tübingen: Niemeyer. Fanselow, G. 1993. “Instead of preface: some reflections on parameters”. In: Fanselow, G. (ed.), The parametrization of Universal Grammar VII–XVII. Amsterdam: Benjamins. Fanselow, G.; and Felix, S. W. 1987. Sprachtheorie: eine Einführung in die Generative Grammatik Bd. II. Tübingen: Francke. Feagin, C. 2002. “Entering the community: fieldwork”. In: Chambers, J. K.; Trudgill, P.; and Schilling-Estes, N. (eds), The handbook of language variation and change 20–39. Oxford: Blackwell. Fischer, O.; and Rosenbach, A. 2000. “Introduction”. In: Fischer, O.; Rosenbach, A.; and Stein, D. (eds), Pathways of change. Grammaticalization in English 1–37. Amsterdam: Benjamins. Fromkin, V. A. 1997. “Some thoughts about the brain/mind/language interface”. Lingua 100: 3–27. Gallistel, C. R. 1999. “The replacement of general-purpose learning models with adaptively specialized learning modules”. In: Gazzaniga, M. (ed.), The cognitive neurosciences (2nd edition). Cambridge, MA: MIT Press. Gerken, L. 1996. “Phonological and distributional information in syntax acquisition”. In: Morgan, J. L.; and Demuth, D. (eds), Signal to syntax: bootstrapping from speech to grammar in early acquisition 411–425. Mahwah, NJ: Erlbaum. 
Gerken, L. 2001. “Signal to syntax”. In: Weissenborn, J.; and Höhle, B. (eds), Approaches to bootstrapping 147–166. Amsterdam: Benjamins.
Geschwind, N.; and Levitsky, W. 1968. “Human brain: asymmetries in the temporal speech region”. Science 161: 186–187. Givón, T. 1979. On understanding grammar. New York: Academic Press. Gold, E. 1967. “Language identification in the limit”. Information and Control 10: 447–474. Goldin-Meadow, S.; and Mylander, C. 1990. “Beyond the input given: the child’s role in the acquisition of language”. Language 66: 323–355. Gopnik, M.; and Crago, M. B. 1991. “Familial aggregation of a developmental language disorder”. Cognition 39: 1–50. Gopnik, M.; Dalalakis, J.; Fukuda, S. E.; and Fukuda, S. 1997. “Familial language impairment”. In: Gopnik, M. (ed.), The inheritance and innateness of grammars 111–140. Oxford: OUP. Grant, J.; Valian, V.; and Karmiloff-Smith, A. 2002. “A study of relative clauses in Williams syndrome”. Journal of Child Language 29: 403–416. Haegeman, L. 1994. Introduction to government and binding theory. Oxford: Blackwell. Hagoort, P.; Brown, C. M.; and Osterhout, L. 1999. “The neurocognition of syntactic processing”. In: Brown, C. M.; and Hagoort, P. (eds), The neurocognition of language 273–316. Oxford: OUP. Haspelmath, M. 2004. “On directionality in language change with particular reference to grammaticalization”. In: Fischer, O.; Norde, M.; and Perridon, H. (eds), Up and down the cline — the nature of grammaticalization 17–44. Amsterdam: Benjamins. Hawkins, J. A. 1994. A performance theory of order and constituency. Cambridge: CUP. Hebb, D. O. 1949. The organization of behavior: a neuropsychological theory. New York: Wiley. Hempel, C. G.; and Oppenheim, P. 1948. “Studies in the logic of explanation”. Philosophy of Science 15: 135–175. Hinton, G. E. 1992. “Wie neuronale Netze aus Erfahrung lernen”. Spektrum der Wissenschaft 11: 134–143. Hopper, P.; and Traugott, E. C. 1993. Grammaticalization. (2nd edition 2003). Cambridge: CUP. Hymes, D. 1997. “The scope of sociolinguistics”. In: Coupland, N.; and Jaworski, A. (eds), Sociolinguistics 12–22. Houndmills: Macmillan. Indefrey, P.; and Levelt, W. 2000. “The neural correlates of language production”. In: Gazzaniga, M. (ed.), The new cognitive neurosciences 845–865. Cambridge, MA: MIT Press. Jackendoff, R. 2002. Foundations of language. Brain, meaning, grammar, evolution. Oxford: OUP. Janda, R. D. 1980. “On the decline of declensional systems: the overall loss of OE nominal case inflections and the ME reanalysis of -es as his”. In: Traugott, E. C.; Labrum, R.; and Shepherd, S. (eds), Papers from the 4th international conference on historical linguistics 243–253. Amsterdam: Benjamins. Janda, R. D. 2001. “Beyond ‘pathways’ and ‘unidirectionality’: on the discontinuity of language transmission and the counterability of grammaticalization”. Language Sciences 23: 265–340. Jenkins, L. 2000. Biolinguistics: exploring the biology of language. Cambridge: CUP. Johnston, J. R. 1994. “Cognitive abilities of language-impaired children”. In: Watkins, R.; and Rice, M. (eds), Specific language impairments in children: current directions in research and intervention. Baltimore: Brookes. Jusczyk, P. 2001. “Bootstrapping from the signal”. In: Weissenborn, J.; and Höhle, B. (eds), Approaches to bootstrapping 3–24. Amsterdam: Benjamins. Kandel, E. R.; and Hawkins, R. D. 1992. “Molekulare Grundlagen des Lernens”. Spektrum der Wissenschaft 11: 66–76. Karmiloff-Smith, A. 1998. “Development itself is the key to understanding developmental disorders”. Trends in Cognitive Sciences 2: 389–398. Kayne, R. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.
Kegl, J.; Senghas, A.; and Coppola, M. 2001. “Creation through contact: sign language emergence and sign language change in Nicaragua”. In: DeGraff, M. (ed.), Language creation and language change: creolization, diachrony, and development 179–237. Cambridge, MA: MIT Press. Kennedy, G. 1998. An introduction to corpus linguistics. London: Longman. Kiparsky, P. 1968. “Linguistic universals and linguistic change”. In: Bach, E.; and Harms, R. T. (eds), Universals in linguistic theory 171–202. New York: Holt, Rinehart and Winston. Kirby, S.; and Hurford, J. 2002. “The emergence of linguistic structure: an overview of the iterated learning model”. In: Cangelosi, A.; and Parisi, D. (eds), Simulating the evolution of language 121–148. London: Springer. Koerner, E. F. K. 1973. Ferdinand de Saussure. Origin and development of his linguistic thought. Braunschweig: Vieweg. Kroch, A. 1994. “Morphosyntactic variation”. In: Beals, K. (ed.), Papers from the 30th regional meeting of the Chicago Linguistic Society, vol. 2: the parasessions on variation in linguistic theory (CLS 30) 180–201. Chicago: Chicago Linguistic Society. Kroch, A. 1997. “Comments on ‘syntax shindig’ papers”. Transactions of the Philological Society 95(1): 133–147. Labov, W. 1972. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press. Labov, W. 1975. “Empirical foundations of linguistic theory”. In: Austerlitz, R. (ed.), The scope of American linguistics: papers of the first golden anniversary symposium of the Linguistic Society of America 77–133. Lisse: Peter de Ridder. Lai, C. S.; Fisher, S. E.; Hurst, J. A.; Vargha-Khadem, F.; and Monaco, A. P. 2001. “A forkhead-domain gene is mutated in a severe speech and language disorder”. Nature 413: 519–523. Langacker, R. W. 1987. Foundations of cognitive grammar, vol. 1: theoretical prerequisites. Stanford, CA: Stanford University Press. Langacker, R. W. 1988. “An overview of cognitive grammar”. In: Rudzka-Ostyn, B. (ed.), Topics in cognitive linguistics 3–48. Amsterdam: Benjamins. Lass, R. 2000. “Remarks on (uni)directionality”. In: Fischer, O.; Rosenbach, A.; and Stein, D. (eds), Pathways of change. Grammaticalization in English 207–227. Amsterdam: Benjamins. Leech, G.; Francis, B.; and Xu, X. 1994. “The use of computer corpora in the textual demonstrability of gradience in linguistic categories”. In: Fuchs, C.; and Victorri, B. (eds), Continuity in linguistic semantics 57–76. Amsterdam: Benjamins. Lehmann, C. 1982. Thoughts on grammaticalization. A programmatic sketch. Universität zu Köln. Revised and expanded version 1995 München/Newcastle: Lincom [Studies in Theoretical Linguistics 01]. Levy, Y. 1996. “Modularity of language reconsidered”. Brain and Language 55: 240–263. Levy, Y.; and Kavé, G. 1999. “Language breakdown and linguistic theory: a tutorial overview”. Lingua 107: 95–143. Liégeois, F.; Baldeweg, T.; Connelly, A.; Gadian, D. G.; Mishkin, M.; and Vargha-Khadem, F. 2003. “Language fMRI abnormalities associated with FOXP2 gene mutation”. Nature Neuroscience 6(11): 1230–1237. Locke, J. L. 1995. “Development of the capacity for spoken language”. In: Fletcher, P.; and MacWhinney, B. (eds), The handbook of child language 278–302. Oxford: Blackwell. McDaniel, D. (ed.). 1996. Methods for assessing children’s syntax. Cambridge, MA: MIT Press. Manning, C. D. 2003. “Probabilistic syntax”. In: Bod, R.; Hay, J.; and Jannedy, S. (eds), Probabilistic linguistics 289–341. Cambridge, MA: MIT Press. Marcus, G. F. 1993. “Negative evidence in language acquisition”. 
Cognition 46: 53–85. Marcus, G. F. 2001. The algebraic mind: integrating connectionism and cognitive science. Cambridge, MA: MIT Press.
McCawley, J. D. 1980. “Tabula si, rasa no!”. Behavioral and Brain Sciences 3: 26–27. McEnery, T.; and Wilson, A. 1996. Corpus linguistics. An introduction. (2nd edition 2001). Edinburgh: Edinburgh University Press. McMahon, A. 2000. Change, chance, and optimality. Oxford: OUP. McNeill, D. 1966. “The creation of language by children”. In: Lyons, J. R.; and Wales, R. J. (eds), Psycholinguistic papers: the proceedings of the 1966 Edinburgh conference 99–115. Edinburgh: Edinburgh University Press. Meyer, C. 2002. English corpus linguistics. An introduction. Cambridge: CUP. Meyer, C.; Grabowski, R.; Han, H.-Y.; Mantzouranis, K.; and Moses, S. 2003. “The world wide web as linguistic corpus”. In: Leistyna, P.; and Meyer, C. F. (eds), Language structure and language use 241–254. Amsterdam: Rodopi. Newmeyer, F. J. 1980. Linguistic theory in America. New York: Academic Press. Newmeyer, F. J. 1998a. Language form and language function. Cambridge, MA: MIT Press. Newmeyer, F. J. 1998b. “The irrelevance of typology for linguistic theory”. Syntaxis 1: 161–197. Newmeyer, F. J. 2002. “Optimality and functionality: a critique of functionally-based optimality-theoretic syntax”. Natural Language and Linguistic Theory 20: 43–80. Newmeyer, F. J. 2003a. “Grammar is grammar, and usage is usage”. Language 79: 682–707. Newmeyer, F. J. 2003b. “Discourse-derived evidence is not privileged evidence”. Paper given at the 25th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS), München, February 26–28. Nichols, J. 1984. “Functional theories of grammar”. Annual Review of Anthropology 13: 97–117. Penke, M. 2002. Flexion im mentalen Lexikon: eine neuro- und psycholinguistische Perspektive. Postdoctoral thesis, Heinrich-Heine-Universität Düsseldorf. (http://www.phil-fak.uni-duesseldorf.de/sfb282/C8/Habil_Martina_Penke.pdf). Penke, M.; and Krause, M. 2002. “German noun plurals — a challenge to the dual-mechanism model”. Brain and Language 81: 303–311. Penke, M.; and Krause, M. 2004. “Regular and irregular inflectional morphology in German Williams syndrome”. In: Bartke, S.; and Siegmüller, J. (eds), Williams syndrome across languages 245–270. Amsterdam: Benjamins. Perry, T. A. (ed.). 1980. Evidence and argumentation in linguistics. Berlin: de Gruyter. Pinker, S. 1989. Learnability and cognition: the acquisition of argument structure. Cambridge, MA: MIT Press. Pinker, S. 1994. The language instinct. How the mind creates language. New York: William Morrow & Comp. Pinker, S. 2002. The blank slate: the modern denial of human nature. New York: Viking. Popper, K. 1959. The logic of scientific discovery. (English edition 2002). London: Routledge. Pullum, G. K.; and Scholz, B. C. 2002. “Empirical assessment of stimulus poverty arguments”. The Linguistic Review 19: 9–50. Quartz, S. R.; and Sejnowski, T. J. 1997. “The neural basis of cognitive development: a constructivist manifesto”. Behavioral and Brain Sciences 20: 537–596. Renouf, A. 2003. “Webcorp: providing a renewable data source for corpus linguists”. In: Granger, S.; and Petch-Tyson, S. (eds), Extending the scope of corpus-based research 39–58. Amsterdam: Rodopi. Ringen, J. D. 1980. “Linguistic facts. A study of the empirical scientific status of transformational generative grammar”. In: Wunderlich, D. (ed.), Wissenschaftstheorie der Linguistik 97–132. Kronberg: Athenäum. Rosenbach, A. 2002. Genitive variation in English. Conceptual factors in synchronic and diachronic studies. Berlin: Mouton de Gruyter.
Rosenbach, A. 2004. “The English s-genitive — a case of degrammaticalization?”. In: Fischer, O.; Norde, M.; and Perridon, H. (eds), Up and down the cline — the nature of grammaticalization 73–96. Amsterdam: Benjamins. Sampson, G. 1980. Schools of linguistics. London: Hutchinson. Sampson, G. 1997. Educating Eve. The language instinct debate. London: Cassell. Sampson, G. 2001. Empirical linguistics. (Reprinted paperback edition 2002). London: Continuum. Sandra, D. 1998. “What linguists can and can’t tell you about the human mind: a reply to Croft”. Cognitive Linguistics 9 (4): 361–378. Scholz, B. C.; and Pullum, G. K. 2002. “Searching for arguments to support linguistic nativism”. The Linguistic Review 19: 185–223. Schütze, C. T. 1996. The empirical base of linguistics. Grammaticality judgments and linguistic methodology. Chicago: The University of Chicago Press. Schütze, C. T. 2003. “Linguistic theory and empirical evidence: clarifying some misconceptions”. Paper given at the 25th Annual Meeting of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS), München, February 26–28. Smolensky, P. 1988. “On the proper treatment of connectionism”. Behavioral and Brain Sciences 11: 1–73. Snow, C. E. 1977. “Mother’s speech research: from input to interaction”. In: Snow, C. E.; and Ferguson, C. A. (eds), Talking to children: language input and acquisition 31–49. Cambridge: CUP. Snow, C. E. 1995. “Issues in the study of input: finetuning, universality, individual and developmental differences, and necessary causes”. In: Fletcher, P.; and MacWhinney, B. (eds), The handbook of child language 180–193. Oxford: Blackwell. Snyder, W. 2000. “An experimental investigation of syntactic satiation effects”. Linguistic Inquiry 31(3): 575–582. Squire, L. R.; and Kandel, E. R. 1999. From mind to molecules. New York: Scientific American Library. Stevens, T.; and Karmiloff-Smith, A. 1997. “Word learning in a special population: do individuals with Williams syndrome obey lexical constraints?”. Journal of Child Language 24: 737–765. Stromswold, K. 2001. “The heritability of language: a review and metaanalysis of twin, adoption, and linkage studies”. Language 77(4): 647–723. Tabor, W.; and Traugott, E. C. 1998. “Structural scope expansion and grammaticalization”. In: Ramat, A. G.; and Hopper, P. (eds), The limits of grammaticalization 229–272. Amsterdam: Benjamins. Thal, D.; Tobias, S.; and Morrison, D. 1991. “Language and gesture in late talkers: a one-year follow-up”. Journal of Speech and Hearing Research 34(3): 604–612. Tomasello, M. 2003. Constructing a language. A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press. Townsend, J.; Wulfeck, B.; Nichols, S.; and Koch, L. 1995. “Attentional deficits in children with developmental language disorder”. Technical Report CND-9503. Center for Research in Language, University of California at San Diego. Uriagereka, J. 1998. Rhyme and reason. Cambridge, MA: MIT Press. Van der Lely, H. K. J. 1996. “Specifically language impaired and normally developing children: verbal passive vs. adjectival passive sentence interpretation”. Lingua 98: 243–272. Van der Lely, H. K. J. 1997. “Language and cognitive development in a grammatical SLI boy: modularity and innateness”. Journal of Neurolinguistics 10 (2/3): 75–107. Van der Lely, H. K. J.; and Stollwerck, L. 1997. “Binding theory and grammatical specific language impairment in children”. Cognition 62: 245–290. Van Valin, R. D. 1990. “Functionalism, anaphora, and syntax. Review of Functional syntax, by S. 
Kuno”. Studies in Language 14: 169–219.
Vargha-Khadem, F.; Watkins, K. E.; Alcock, K. J.; Fletcher, P.; and Passingham, R. E. 1995. “Praxic and non-verbal cognitive deficits in a large family with a genetically transmitted speech and language disorder”. Proceedings of the National Academy of Sciences of the USA 92: 930–933. Vargha-Khadem, F.; Watkins, K. E.; Price, C. J. et al. 1998. “Neural basis of an inherited speech and language disorder”. Proceedings of the National Academy of Sciences of the USA 95(21): 12695–12700. Volterra, V.; Capirci, O.; Pezzini, G.; Sabbadini, L.; and Vicari, S. 1996. “Linguistic abilities in Italian children with Williams syndrome”. Cortex 32: 663–677. Watkins, K. E.; Vargha-Khadem, F.; Ashburner, J. et al. 2002. “MRI analysis of an inherited speech and language disorder: structural brain abnormalities”. Brain 125(3): 465–478. Weinberg, S. 1976. “The forces of nature”. Bulletin of the American Society of Arts and Sciences 29: 28–29. Weinreich, U.; Labov, W.; and Herzog, M. 1968. “Empirical foundations for a theory of language change”. In: Lehmann, W.; and Malkiel, Y. (eds), Directions for historical linguistics 95–188. Austin: University of Texas Press. Westermann, G. 2000. Constructivist neural network models of cognitive development. Dissertation, University of Edinburgh. (http://www.cbcd.bbk.ac.uk/people/gert/publications/thesis.pdf) Wiese, R. 1996. The phonology of German. Oxford: Clarendon Press. Wunderlich, D. (ed.). 1976. Wissenschaftstheorie der Linguistik. Kronberg: Athenäum. Wunderlich, D. 1999. “German noun plural reconsidered”. Behavioral and Brain Sciences 22: 1044–1045. Zitzen, M. 2003. Topic shift markers in asynchronous and synchronous computer-mediated communication (CMC). Doctoral dissertation, Heinrich-Heine-Universität Düsseldorf. Zitzen, M.; and Stein, D. 2004. “Chat and conversation. A case of transmedial stability?”. Linguistics 42(5): 983–1021.
"new-r28">
Typological evidence and Universal Grammar*
Frederick J. Newmeyer
University of Washington
The paper discusses the relevance of typological evidence for the construction of a theory of Universal Grammar (UG). After introducing UG-based approaches to typology, it goes on to argue that most typological generalizations are in no sense ‘knowledge of language’. In fact, some of the best-established typological generalizations have explanations based on language use, and so it is either empirically unmotivated or redundant to attempt to encompass them within UG theory. This conclusion is reinforced by a look at the widely-accepted Lexical Parameterization Hypothesis and by the current shift of interest to ‘microparameters’. The paper goes on to take a critical look at Mark Baker’s Parameter Hierarchy.
1. Introduction
The purpose of this paper is to develop certain ideas that I first raised in Newmeyer (1998a, 2000). In a nutshell, I argued that the goal of Universal Grammar (UG) should be to capture the notion ‘possible human language’, but not the notion ‘probable human language’. That is, UG handles typological generalizations only to the extent that it allows or excludes in principle certain language types. However, the implicational and statistical cross-linguistic generalizations that form the bulk of the typology literature are not within the explanatory province of UG theory. I supported my conclusions by pointing to how singularly unsuccessful UG-based approaches have been in handling typology and in a few brief comments attributed this lack of success to the fact that such generalizations are explained better by a theory of language use than by one of language structure.1 The present paper explores this latter point in detail, focusing on the relevance of typological evidence for the construction of a theory of UG. After introducing UG-based approaches to typology (§2), it goes on to argue that most typological generalizations are in no sense ‘knowledge of language’ (§3). Section 4 makes the case
"new-r4"> "new-r22">
that some of the best-established typological generalizations have explanations based on language use, and so it is either empirically unmotivated or redundant to attempt to encompass them within UG theory. This conclusion is reinforced by a look at the widely-accepted Lexical Parameterization Hypothesis (§5) and by the current shift of interest to ‘microparameters’ (§6). The paper goes on to take a critical look at Mark Baker’s Parameter Hierarchy (§7). Section 8 is a brief conclusion.
2. UG-based approaches to typology

The Principles-and-Parameters (P&P) approach (encompassing both the Government-Binding Theory and the Minimalist Program) seems at first blush ideally constructed to handle typological generalizations. As its name implies, a common set of UG principles is at work in all of the world’s languages. Language-particular variation is captured by ‘parameterizing’ these principles differently in different languages. As Chomsky put it in the foundational P&P work:

What we expect to find, then, is a highly structured theory of UG based on a number of fundamental principles […] with parameters that have to be fixed by experience. If these parameters are embedded in a theory of UG that is sufficiently rich in structure, then the languages that are determined by fixing their values one way or another will appear to be quite diverse, since the consequences of one set of choices may be very different from the consequences of another set; yet at the same time, limited evidence, just sufficient to fix the parameters of UG, will determine a grammar that may be very intricate … (Chomsky 1981: 3–4)
In recent years, Chomsky has become quite explicit that it is the job of UG to handle the sorts of implicational relations uncovered by typologists:

There has also been very productive study of generalizations that are more directly observable: generalizations about the word orders we actually see, for example. The work of Joseph Greenberg has been particularly instructive and influential in this regard. These universals are probably descriptive generalizations that should be derived from principles of UG. (Chomsky 1998: 33, emphasis added)
In P&P, a program for typology immediately suggests itself. The possible range of parameters and their settings determines the set of possible grammars. Implicational relations among settings determine why certain grammars are cross-linguistically more common than others and why certain grammatical features tend to cluster together (for early discussion of how that might be accomplished, see Hyams 1986 and Travis 1989). One would assume, then, that the pages of the generative-oriented journals would be filled with articles devoted to working out the relevant implicational relations with the ultimate goal of deriving the more robust generalizations that
have been uncovered in the past few decades of typological investigation. Nothing could be farther from the truth, however. Many articles and books in P&P posit a particular parameter (and associated settings) to distinguish one language or dialect from another and most introductions to transformational syntax devote a few pages (but rarely more) to how cross-linguistic generalizations might be captured. But with one exception, researchers in the P&P tradition have not attempted a comprehensive treatment of parameters and their settings. That is, they have given no more than lip service to the emphasized assertion in the above Chomsky quote. That exception is Mark Baker’s book The Atoms of Language: The Mind’s Hidden Rules of Grammar (Baker 2001).2 Baker takes seriously the P&P program for typology and proposes an intricate ‘Parameter Hierarchy’ (PH) in which implicational relations between parameters and their settings are made explicit. Figure 1 presents his final version of that hierarchy.

[Figure 1. The Parameter Hierarchy (final version) of Baker (2001), a branching diagram rooted in the Polysynthesis Parameter.]

The PH is to be interpreted as follows. If Parameter X has logical priority over
Parameter Y, then X is written higher than Y and is connected to Y by a downward slanting line. If two parameters are logically independent of each other, then they are written on the same line and separated by a dash. Such is the case only for the Head Directionality Parameter (HDP) and the Optional Polysynthesis Parameter (OPP). The logical independence of these two parameters leads to four possible ‘choices’, each represented by a branching line: ‘head first’ for the HDP and ‘no’ optional polysynthesis for the OPP; ‘head first’ for the HDP and ‘yes’ optional polysynthesis for the OPP; ‘head last’ for the HDP and ‘yes’ optional polysynthesis for the OPP; and ‘head last’ for the HDP and ‘no’ optional polysynthesis for the OPP. If there are no further parametric choices to be made, given a particular setting of a particular parameter, then the branch ends in a terminal symbol *. Beneath the asterisk, languages are listed that have this combination of parameter settings. As a consequence, structurally similar languages should end up being close on the diagram, and dissimilar languages far apart. In Baker’s account, the clustering of typological features is a consequence of the formulation of the parameters themselves and the hierarchical relations among them. To take a simple case, VO languages tend to be prepositional because the notion ‘head’ enters into the definition of the Head Directionality Parameter and verbs and prepositions are heads of their respective phrases. More subtly, all polysynthetic languages are predicted to be nonconfigurational (in fact, Baker rejects the idea of a separate ‘Configurationality Parameter’ as in Hale 1983), since a positive value for the Polysynthesis Parameter treats full arguments as mere adjuncts, with corresponding freedom of occurrence. And head-initial languages are claimed never to be topic prominent, since the branch leading to the Topic Prominent Parameter originates from the setting ‘last’ for the Head Directionality Parameter. Baker’s account of why certain typological features are more common than others is more indirect. Essentially, the more ‘choices’ a language learner needs to make, the rarer the language type is claimed to be. As far as VO versus OV is concerned:

Since the difference between English-style and Japanese-style word order is attributable to a single parameter [Head Directionality], there is only one decision to make by coin flip: heads, heads are initial; tails, heads are final. So we expect roughly equal numbers of English-type and Japanese-type languages. (Baker 2001: 134)
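Read in this way, the PH is simply a decision tree: internal nodes are parameters, edges are settings, and terminal nodes (the asterisks) collect the structurally similar languages. The following minimal sketch in Python encodes only the nodes mentioned in the text; the PHNode and classify names are ours, and the one language listed at a terminal follows Baker’s own discussion of Mohawk as the paradigm polysynthetic language.

# Sketch: Baker's Parameter Hierarchy read as a decision tree.
# Illustrative fragment only; Baker's (2001) full hierarchy contains
# many further parameters and terminal nodes.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PHNode:
    parameter: Optional[str] = None               # None marks a terminal node (*)
    branches: dict = field(default_factory=dict)  # setting -> child PHNode
    languages: tuple = ()                         # languages listed beneath a *

ph = PHNode(
    parameter="Polysynthesis",
    branches={
        # Baker discusses Mohawk as the paradigm polysynthetic language.
        "yes": PHNode(languages=("Mohawk",)),
        # The HDP and OPP are logically independent, so their four joint
        # settings are modelled here as four sibling branches.
        "no": PHNode(
            parameter="Head Directionality + Optional Polysynthesis",
            branches={
                ("first", "no"):  PHNode(),  # further choices omitted
                ("first", "yes"): PHNode(),
                ("last", "no"):   PHNode(),
                ("last", "yes"):  PHNode(),
            },
        ),
    },
)

def classify(node, settings):
    """Walk the tree top-down, consuming one setting per parameter,
    until a terminal node is reached."""
    path = []
    while node.parameter is not None:
        choice = settings[node.parameter]
        path.append((node.parameter, choice))
        node = node.branches[choice]
    return path, node.languages

print(classify(ph, {"Polysynthesis": "yes"}))
# -> ([('Polysynthesis', 'yes')], ('Mohawk',))

Walking the tree top-down with a language’s settings is exactly the ‘logical flowchart’ picture of acquisition that Baker endorses, discussed in Section 3 below.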
Why are VSO languages so much rarer than SVO languages, then? Because two more parameters enter into the characterization of the former than of the latter:3

Within the head-initial languages, however, it requires two further decisions [the value for the Subject Placement Parameter and the value for the Verb Attraction Parameter] to get a verb-initial, Welsh-type language: Subjects must be added early and tense auxiliaries must host verbs. If either of these decisions is made in the opposite way, then subject-verb-object order will still emerge. If the decisions were made by coin flips, we would predict that about 25 percent of the head-initial
languages would be of the Welsh type and 75 percent of the English type. This too is approximately correct […] (Baker 2001: 134)
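Baker’s frequency predictions here are plain arithmetic over independent fair coin flips: a language type that requires k specific binary settings should account for roughly 0.5^k of the relevant languages. A minimal check of the two quoted cases (the helper name is ours; the parameter counts are Baker’s):

# Baker's coin-flip arithmetic: if binary parameters are set independently
# and each setting is equally likely, a type that requires k specific
# settings has an expected share of 0.5 ** k.

def expected_share(k: int) -> float:
    """Expected share of a language type fixed by k independent binary choices."""
    return 0.5 ** k

# VO vs. OV: one decision (Head Directionality), so roughly 50% each.
print(expected_share(1))        # 0.5

# Welsh-type VSO within the head-initial languages: two further decisions
# (Subject Placement, Verb Attraction) must both go one particular way.
print(expected_share(2))        # 0.25 -> about 25% Welsh-type
print(1 - expected_share(2))    # 0.75 -> about 75% English-type (SVO)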
In the remainder of this paper, in view of the absence of any worked-out alternatives, I take Baker’s approach to parameters as the ‘default’ P&P approach, except where I explicitly note otherwise.
3. Typological evidence and ‘knowledge of language’

This section argues that the very nature of typological evidence renders it all but irrelevant to the construction of a theory of UG. We should begin by considering the types of typological data that might in principle be of value for the construction of such a theory. The first type are absolute universals, that is, properties that all languages share or that no languages allow. The second type are implicational universals, namely statements of the form: ‘If a language has property X, then it will have property Y.’ The third type are simple statements of (relative) frequency, such as: ‘75% of languages have property X and 25% have property Y’ or merely: ‘More languages have X than Y.’ Now, absolute universals might well be relevant to UG, since the non-existence of some property might (in principle) result from the initial state of I-language being such that a grammar containing that property is literally unobtainable. However, I will now argue that the second and third types of typological evidence are indeed irrelevant to UG theory. I take it as a working hypothesis that the child constructs his or her grammar by means of an interplay between what is innately provided by UG and ‘environmental’ evidence to which he or she is exposed. How might implicational and frequency-based typological generalizations be located with respect to these two dimensions? Certainly, we can rule out without further discussion that evidence bearing on them could be ‘environmental’. No child is exposed to cross-linguistic generalizations. More children in more speech communities are exposed to the correlation of VO order and prepositionality than to the correlation of VO order and postpositionality. But English-acquiring children have no way of knowing that they are following the ‘majority’ in this respect and Finnish-acquiring children have no way of knowing that they are in the minority. Do implicational and frequency-based generalizations follow directly then from an innate UG? As we have seen, Chomsky and Baker make precisely that claim. Indeed, Baker attributes the property of innate knowledge to the PH. He remarks that ‘it would make sense if children, too, instinctively work their way down the hierarchy, taking advantage of its logical structure to avoid agonizing over needless decisions’ (Baker 2001: 192) and goes on to suggest that ‘the parameter hierarchy provides a logical flowchart that children use in the process of language
acquisition’ (p. 195). If Chomsky and Baker are correct, then knowledge of the second and third types of typological generalizations must be hard-wired into the child. The only place that ‘evidence’ enters into the picture is where the linguistic data presented to the child helps him or her to determine the setting of a particular parameter and (although the issue is not discussed by Chomsky or Baker) in what respects the language being learned is in violation of some innately-provided typological generalization. I am extremely skeptical that implicational and frequency-based typological generalizations are part of our innate genetic inheritance and the remainder of this paper is devoted to elucidating the grounds for my skepticism. In the remainder of this section I call attention to two difficulties with this idea. First of all, and most problematically, such typological generalizations tend to be stochastic. That is, they are not of the type that can be represented by the either-or (or yes-no) switch settings implied by Chomsky and Baker. Consider the fact that VO languages tend to have (overt) Wh-Movement and OV languages tend not to. Could we say that the parameter setting for Wh-Movement is linked in some way to the setting for the Head Directionality parameter? No, because the facts are more complicated than that. Consider Table 1, with typological generalizations gleaned from Dryer (1991).

Table 1. Percent of V-final, SVO, and V-initial languages manifesting particular properties (Dryer 1991)

Property                             V-final   SVO   V-initial
Postpositional                          96      14      09
Relative-Noun                           43      01      00
Standard of comparison-Adjective        82      02      00
Predicate-Copula                        85      26      39
Subordinate clause-Subordinator         70      06      06
Noun-Plural word                       100      24      13
Adpositional phrase-Verb                90      01      00
Manner Adverb-Verb                      91      25      17
Verb-Tense/aspect aux verb              94      21      13
Verb-Negative auxiliary                 88      13      00
Genitive-Noun                           89      59      28
Sentence-Question particle              73      30      13
Wh-in situ                              71      42      16
What we observe is that according to 13 criteria, SVO languages are intermediate in their typological properties between V-final and V-initial languages. In informal terms, one can say that the closer the verb is to the front of the clause, the more likely some other property will also be manifested. That sort of statistically-framed generalization cannot be stated by means of parameter settings,
and is incompatible with the ‘algebraic’ nature of UG, as it has generally been conceived.4 In other words, incorporating the generalizations of Table 1 into a theory of UG would necessitate a profound rethinking of UG-theory — and one that would lead in a direction that one would have to assume to be uncongenial to the great bulk of UG-theorists. We cannot rule out a priori, of course, the possibility of implicational, yet absolute, universals, that is, generalizations of the form: ‘If a language has property X, then it must have property Y.’ A language with X, but not Y, then, would be impossible and hence a candidate for being excluded by UG. It remains to be seen whether any (non-trivial and non-accidental) universals of this form actually exist.5

Secondly, as far as I can see, a UG-parametric approach to typological generalizations leads to incorrect predictions about language learning and use. The problem centers on languages that appear to violate some robust typological generalization, that is, languages that are not consistently ‘first’ or ‘last’ with respect to Baker’s PH. German, by most accounts, for example, is head-initial within NP and head-final within VP. Furthermore, German is a ‘V2’ language, a typologically rare trait (and one which Baker never mentions). German verbal behavior, therefore, presumably falls within Chomsky’s ‘periphery of marked exceptions’.6 If that expression means anything, it follows that there are two components to acquisition and knowledge of language, one governed by UG proper (i.e. the parameters, their possible settings and the implicational relations among the settings), the other forming part of some separate mental component. If so, this distinction should be easily observable in acquisition and use. One would predict, perhaps, that elements of the marked periphery would be acquired with more difficulty than those of the parametric core. However, I know of no evidence that such is the case. For example, as far as I know, German-acquiring children never go through a stage in which they set the Head Parameter as consistently ‘first’ or ‘last’, as one might expect given the Chomsky-Baker approach to typology. Likewise, there appears to be no period during which German-speaking children fail to set the V2 parameter (Poeppel & Wexler 1993). Indeed, there is no evidence that typologically inconsistent languages provide any more of a challenge to learners than typologically consistent ones. Such a consequence is expected if UG does not in any way ‘register’ the distinction between consistent and inconsistent languages, but unexpected otherwise.

Likewise, there is no evidence that ‘peripheral’ knowledge is stored and/or used any differently from that provided by the system of principles and parameters per se. When head-directionality or V2-ness are at stake, do German speakers perform more slowly in reaction time experiments than do speakers of head-consistent non-V2 languages? Do they make more mistakes in everyday speech, say by substituting unmarked constructions for marked ones? Do the marked forms pose comprehension difficulties? In fact, is there any evidence whatsoever that such knowledge is dissociable in some way from more ‘core’ knowledge? As far as I am aware, the answers to all of these questions are ‘no’. I conclude, then, that since typological generalizations can hardly be learned inductively by the child and are implausibly innate, they are not part of knowledge of language at all.
4. Alternative explanations of typological generalizations

In this section I argue in a different way that typological evidence is irrelevant to the construction of a theory of UG. I demonstrate that for many of the best-supported typological generalizations, alternative explanations are available, rendering appeal to UG unnecessary. There has never been a shortage of non-UG-based explanations of word order correlations. As noted by Dryer (1992), most of these have been semantically-based and have assumed that the unmarked tendency for any given language is for heads either to consistently precede their dependents or to consistently follow them (see, for example, Vennemann 1973; Keenan 1978). Dryer himself proposes a structurally-based alternative, Branching Direction Theory, which holds that non-phrasal categories preferentially either consistently precede or consistently follow their phrasal sisters, and provides considerable evidence demonstrating the superiority of this approach over the alternative. Dryer’s insight is incorporated into the much more comprehensive parsing theory presented in Hawkins (1994). The central parsing principle that Hawkins proposes is called ‘Early Immediate Constituents’ (EIC) and is stated as follows (1994: 77):

(1) Early Immediate Constituents (EIC)
The human parser prefers linear orders that maximize the IC-to-non-IC ratios of constituent recognition domains (CRD).
A ‘constituent recognition domain’ for a particular phrasal mother node M consists of the set of nodes that have to be parsed in order to recognize M and all of the ICs of M.7 Consider how EIC explains one extremely robust word order universal, the Prepositional Noun-Modifier Hierarchy (PrNMH) of Hawkins (1983):

(2) PrNMH
If a language is prepositional, then if RelN then GenN, if GenN then AdjN, and if AdjN then DemN.
This hierarchy allows prepositional phrases with the structures in (3), in which material of increasing length may intervene between the preposition and the head noun of its complement:

(3) a. [PP P N]
b. [PP P Dem N]
c. [PP P Adj N]
d. [PP P Gen N]
e. [PP P Rel N]
However, no language allows, say, a relative clause to intercede between a preposition and its noun complement, but not an adjective. The EIC-based explanation of the PrNMH is straightforward. The CRD for the PP is the distance from the initial P to the head N of the complement NP. Since relative clauses tend to be longer than possessive phrases, which tend to be longer than adjectives, which tend to be longer than demonstratives, which are always longer than ‘silence’, the hierarchy is predicted on parsing grounds. There is no need to appeal to UG parameters. Hawkins demonstrates that other word order generalizations, from the correlation between verb-object order and adpositionality to cross-linguistic generalizations about the sequencing of elements within NP, follow in a similar manner.

What makes a language use-based explanation for word order facts so appealing is that we independently need to appeal to this genre of explanation in cases where a UG-based explanation would be far-fetched. I now provide a couple of examples. Faltz (1977/1985) and Comrie (1998) point out that if a language has 1st and 2nd person reflexives, it will also have 3rd person reflexives, as (4) illustrates:

(4) Occurrence of distinctive reflexives

Language       Third person   First/Second person
English        yes            yes
Old English    no             no
French         yes            no
*              no             yes
Presumably, one could incorporate this generalization into UG, if one wished to. In fact, it could be done trivially, simply by positing the following UG principle:

(5) If a language has 1st and 2nd person reflexives, it will also have 3rd person reflexives.
The stipulative nature of (5) hardly needs calling attention to. It goes without saying that it would be desirable to appeal to some independent principle that obviates the need to posit something like (5). What might such a principle be? Faltz’s and Comrie’s explanation for (4) is based on the idea that 1st
and 2nd person referents are unique. But 3rd person referents are open-ended. In principle, a 3rd person referent could be any entity other than the speaker or the hearer. So it would seem to be more ‘useful’ to have 3rd person reflexives, since they narrow down the class of possible referents. Hence it appears that grammars are serving our needs by reducing potential ambiguity.

I am highly skeptical of explanations that appeal to ambiguity-reduction or, in fact, to any sort of general ‘usefulness’ to the language user. The amount of formal ambiguity that one finds in language is enormous, and ‘usefulness’ is such a vague concept that it seems inherently undesirable to base an explanation on it. In any event, it is worth asking how much ambiguity is reduced by a 3rd person reflexive anyway. It eliminates one possible referent for the object, leaving an indefinite number of possibilities remaining.

Table 2. Reflexive pronoun occurrence in English (Johansson and Hofland 1989)

Reflexive pronoun     Number of occurrences in corpus
myself                169
yourself              94
himself               511
herself               203
itself                272
Total 3rd pers. sg.   986
I can offer an explanation of these facts that does not involve problematic appeals to ambiguity-reduction. In languages that have reflexive pronouns in all three persons, 3rd person reflexives are used more frequently than 1st and 2nd. Consider English. In a million-word collection of British English texts utilizing a wide variety of genres, 3rd person singular reflexives were 5.8 times more likely to occur than 1st person reflexives and 10.5 times more likely to occur than 2nd person reflexives (Table 2 gives the facts). Language users (for whatever reason) more frequently use identical subjects and objects in the 3rd person than in the 1st or 2nd. Given that more frequently appealed to concepts are more likely to be lexicalized than those that are less frequently appealed to, the implicational relationship among reflexive pronouns follows automatically. There is no need to appeal to ambiguity-reducing ‘usefulness’.

The metatheoretical advantage of a frequency-based explanation of the reflexive-number generalization over both a UG-based and a usefulness-based explanation is that, like those based in parsing, it is a syntagmatic explanation. That is, it is one that appeals to on-line pressure on the speaker to produce his or her utterances rapidly. UG-based explanations are (at least in this case) highly stipulative, while usefulness-based explanations are psycholinguistically implausible, in that they require the speaker to imagine the set of possible contrasting utterances and their meanings.

It is worth giving one more example of where a syntagmatic explanation of a typological generalization is superior to both a UG-based and an ambiguity-reduction-based explanation. The example is based on the phenomenon of differential object marking (DOM). Some languages overtly case-mark direct objects and some do not. In many languages, whether they do or not is a function of the degree of animacy or definiteness of that object. The higher the object is in the hierarchies of animacy and/or definiteness (see 6 and 7), the more likely it is to be case-marked:

(6) Animacy Hierarchy: Human > Animate > Inanimate
(7) Definiteness Hierarchy: Personal Pronoun > Proper Noun > Definite NP > Indefinite Specific NP > Non-specific NP
Aissen (2003) provides a UG-based explanation of DOM, formulated in the notation provided by Optimality Theory (OT). Without going into specifics about the mechanics of the analysis, she posits the constraint rankings in (8) and the constraints (9) and (10):

(8) a. *Oj/Hum » *Oj/Anim » *Oj/Inan
b. *Oj/Pro » *Oj/PN » *Oj/Def » *Oj/Spec » *Oj/Nspec
(9) *ØC ‘Star Zero’: penalizes the absence of a value for the feature CASE.
(10) *STRUCC: penalizes a value for the morphological category CASE.
By means of local conjunction of the hierarchies in (8) with constraint (9), she is in a position to specify the class of permissible case systems, as far as DOM is concerned. The functionalist account of DOM has generally been an ambiguity-reduction-based one (see, for example, Silverstein 1981; Croft 1988; Comrie 1989). As Table 3 illustrates, subjects of transitive sentences are overwhelmingly animate and definite, while direct objects are overwhelmingly inanimate and are definite to a much smaller degree than subjects.
Table 3. Frequencies of subjects and objects in transitive sentences in the SAMTAL corpus of spoken Swedish (Jäger 2003)11

       NP     +def   −def   +pron   −pron   +anim   −anim
Subj   3151   3098   53     2984    167     2948    203
Obj    3151   1830   1321   1512    1639    317     2834
Hence, it is argued, marking the less prototypical animate and/or definite direct objects prevents them from being confused with subjects. As Aissen notes:8

An intuition which recurs in the literature on DOM is that it is those direct objects which are most in need of being distinguished from subjects that get overtly case marked. This intuition is sometimes expressed as the idea that the function of DOM is to distinguish subject from object. … In a weaker form, the intuition can be understood in the following terms: the high prominence which motivates DOM for objects is exactly the prominence which is unmarked for subjects. Thus it is those direct objects which most resemble typical subjects that get overtly case marked. (Aissen 2003: 436)
Again, a purely syntagmatic frequency-based explanation will suffice. All that one needs to adopt is the well-established hypothesis that within a given domain, more frequent combinations of features require less coding than less frequent ones (for discussion, see Haiman 1983). There is no need to appeal to ambiguity reduction — a welcome result, since, as noted by Aissen, it is ‘clear that DOM is required in many instances where the absence of case marking could not possibly lead to ambiguity’ (Aissen 2003: 438).

It is important to stress that a rejection of a UG-based account of typological generalizations does not in and of itself entail a rejection of UG principles per se. Poverty-of-the-stimulus-based arguments for the innateness of one or another constraint still need to be evaluated on their own merits. Take the principle of Subjacency, for example. Hoekstra & Kooij (1988) argue that this principle could not be learned inductively. Since this constraint prohibits the formation of wh-questions if a wh-phrase intervenes between the filler and the gap, it predicts correctly that (11a) is ambiguous as to the scope of where, while (11b) is not. Note that in (11a), where can refer both to the place of John’s saying and the place of getting off the bus, while in (11b), where can refer only to the place of John’s asking:

(11) a. Where_i/j did John say ___i that we had to get off the bus ___j?
b. Where_i/j did John ask ___i whether we had to get off the bus *___j?
They argue that positive evidence alone could hardly suffice to enable the child language learner to come to the conclusion that (11b) does not manifest the same ambiguity as (11a) — the abstractness and complexity of the principle and the paucity of direct evidence bearing on it guarantee that the child could never figure Subjacency out ‘for itself’. Thus knowledge that not just any link is possible between a wh-phrase and its co-indexed gap must be pre-wired into the language learner. Now, then, how do Hoekstra and Kooij’s arguments bear on questions of typology? The answer is ‘Not at all’. There is no reason to assume that the typological variation that one finds with Subjacency requires recourse to an innate UG, even if the principle itself might have an innate basis. Let us review some of the
"new-r4">
Typological evidence and Universal Grammar
typological variation that we find with this principle. The initial proposal of Subjacency was made in Chomsky (1973). Working with exclusively English data, Chomsky suggested that Subjacency prohibits a moved element from crossing two (or more) bounding nodes, where bounding nodes are S and NP. So consider the Subjacency violation in (12), whose structure is represented in (13):

(12) *What did you wonder where Bill put?

(13) [S′ [COMP what_i] [S [NP you] [VP [V wonder] [S′ [COMP where_j] [S [NP Bill] [VP [V put] [NP t_i] [PP t_j]]]]]]]
Movement of what to the highest COMP crosses two bounding nodes, the embedded S and the matrix S. Rizzi (1982), however, observed that Italian allows sentences of the form of (12), as we can see in (14a–b):

(14) a. Il solo incarico che non sapevi a chi avrebbero affidato è poi finito proprio a te.
‘The only charge that you didn’t know to whom they would entrust has been entrusted exactly to you.’
b. Tuo fratello, a cui mi domando che storie abbiano raccontato, era molto preoccupato.
‘Your brother, to whom I wonder which stories they told, was very troubled.’
Rizzi’s solution to this problem was to suggest that the notion of ‘bounding node’ is parameterized: different languages have different bounding nodes. In Italian, S′ is a bounding node, but S is not:
(15) [NP [NP the task / il incarico] [S′ (BN Ital) [COMP [NP which / que]_i] [S (BN Eng) [NP I] [INFL didn’t / non] [VP [V know / sapevi] [S′ (BN Ital) [COMP [PP to whom / a chi]_j] [S (BN Eng) [NP they] [INFL would / avrebbero] [VP [V entrust / affidato] [NP t_i] [PP t_j]]]]]]]]
So, the Italian sentence is grammatical and the English sentence is not. Other languages with Wh-Movement are stricter than English. Russian has Wh-Movement:

(16) a. kavo ljubit Marija
who-acc loves Mary-nom
‘Who does Mary love?’
b. ja znaju kavo Marija ljubit
I know who-acc Mary-nom loves
‘I know who Mary loves.’
(Russian, Freidin & Quicoli 1989)

But the wh-phrase may not be extracted from its clause:

(17) *kavo gavorit Ivan čto Marija ljubit
who-acc says Ivan that Mary-nom loves
‘Who does Ivan say that Mary loves?’
Hence in Russian both S and S′ are bounding nodes. In other words, we have a typological difference among languages, in terms of their bounding nodes. Now how might UG play a role in explaining this difference? I think that mainstream generative opinion would assume that UG provides the set of possible bounding nodes (i.e., S and NP; S′ and NP; S, S′, and NP; and so on) and that the child, in the process of acquisition, picks out the correct set of nodes for his or her language based on positive evidence. But the former assumption
"new-r3"> "new-r27">
Typological evidence and Universal Grammar
seems entirely gratuitous. If, as we have hypothesized, Subjacency is innate and, as surely must be the case, positive evidence is available to pin down its operation for any particular language, what reason have we for the additional assumption that UG pre-selects the set of possible bounding nodes? I have never seen a poverty-of-the-stimulus-based argument for such an assumption and doubt that one can be constructed.
5. The Lexical Parameterization Hypothesis, typological evidence, and UG

This section argues that the Lexical Parameterization Hypothesis further undercuts a UG-based approach to typological evidence. By the mid 1980s a near consensus view had arisen among P&P syntacticians, namely that parametric variation is restricted to the lexicon:

(18) The Lexical Parameterization Hypothesis (LPH) (Borer 1984; Manzini & Wexler 1987): Values of a parameter are associated not with particular grammars, but with particular lexical items.
At least some version of the LPH seemed justified by virtue of the fact that even within a particular language different lexical items behave differently with respect to particular UG principles. For example, many languages have up to a half dozen different anaphors, each of which seems to differ in terms of its binding possibilities. Even in English, a language with only a couple of anaphors, reflexive and reciprocal anaphors have different properties. The reflexive anaphor themselves is categorically impossible as subject of an embedded tensed clause, while the reciprocal anaphor each other does occur in that position, at least in informal speech:

(19) John and Mary think that [*themselves/?each other are the best candidates].
A metatheoretical reason for advocating the LPH has also been put forward. If it is correct, the argument goes, then there is only one human language outside of the lexicon, and language acquisition is simply a matter of learning lexical idiosyncrasies. The LPH is too strong, however. As pointed out by Fukui (1995), one would not want to make head directionality, say, a matter of lexical choice. In no language do the verbs that translate as ‘hit’ and ‘believe’ take their complements to the right while those that translate as ‘kick’ and ‘intend’ take their complements to the left. Likewise, there have been numerous proposals that suggest that languages can differ according to the number of bar levels that they allow for particular projections. Yet the fact that VP, say, might have two levels in one language and three in another is not in any sense a ‘lexical’ fact.9 Oddly, there has been no discussion of the fact that at least the strong version
of the LPH partly undercuts a UG-based approach to typological evidence. Consider the distinction between the possibility that a grammar fixes a value for a particular parameter and the possibility that a word fixes a value for the same parameter. In the former case, we have, at least potentially, an argument for UG based on the poverty of the stimulus (henceforth an ‘APS argument’): on the basis of input that is greatly underdetermined with respect to the knowledge to be acquired, the child succeeds in intuiting some broad fact about the language being acquired. But in the latter case, if the child sets parameters word-by-word, no such APS argument holds. That is, given the LPH, the child is a conservative attentive learner (to use the wording of Culicover 1999), one who places each newly-encountered word into its structural context and in that way elaborates its lexical specifications. In such a situation, it seems an abuse of terminology to appeal to the notion of ‘parameters’ at all.
6. Microparameters and the number of parameters needed to handle typological evidence

After presenting the first version of his PH, Mark Baker in his Atoms of Language makes the statement that “So far we know of only some eight parameters” (Baker 2001: 173). He ends up proposing twelve that can be directly incorporated into the hierarchy (see Figure 1) and several more whose position with respect to the PH is unclear. If fifteen binary-valued parameters were all that were needed to characterize the possible grammars of the world’s languages, it would not strain credulity to posit their incorporation into an innate UG. However, recent work by Richard Kayne (see especially the papers collected in Kayne 2000) has suggested that considerably more than fifteen parameters would be needed. Kayne notes that “it is often estimated that the number of languages presently in existence is 4000–5000” (2000: 7). However, the number turns out to be much higher once one turns one’s attention to ‘microparametric variation’, that is, to differences between closely related languages or dialects. Kayne writes:

[…] in Northern Italy alone one can individuate at least 25 syntactically distinct languages/dialects solely by studying the syntax of subject clitics. More recently, I have had the privilege of participating in a Padua-based syntactic atlas/(micro)comparative syntax project with Paola Benincà, Cecilia Poletto, and Laura Vanelli, on the basis of which it is evident that one can individuate at least 100 syntactically distinct languages/dialects in Northern Italy. A very conservative estimate would be that present-day Italy has at least 500 syntactically distinct languages/dialects. 500,000 would in consequence, I think, then be a very conservative extrapolation to the number of syntactically distinct languages/dialects in the world at present. (Kayne 2000: 7)
"new-r25">
Typological evidence and Universal Grammar
However, once one starts comparing syntactic differences between individuals speaking the ‘same’ language/dialect, the number of grammars increases exponentially; indeed, it might be “some number substantially greater than 5 billion” (p. 8)! Kayne is not troubled by this result, since, as he points out, “the number of independent binary-valued syntactic parameters needed to allow for 5 billion syntactically distinct grammars is only 33 (2 raised to the 33rd power is about 8.5 billion) […] it seems plausible that the child is capable of setting at least that many syntactic parameters” (p. 8).

Kayne’s math may be right, but from that fact it does not follow that only 33 parameters would be needed to capture all of the microvariation that one finds in the world’s languages and dialects. In principle, the goal of a parametric approach is to capture the set of possible human languages, not the set (however large) of actually existing ones. One can only speculate that the number of such languages is in the trillions or quadrillions. In any event, Kayne’s own work suggests that the number of parameters is vastly higher than 33. Depending on precisely what counts as a parameter (Kayne is not always clear on that point), just to characterize the differences among the Romance dialects discussed in the first part of Kayne (2000) with respect to clitic behavior, null subjects, verb movement, and participle agreement would require several dozen distinct parameters. It is hard to avoid the conclusion that characterizing just a few more differences among the dialects would lead to dozens of new parameters.

If the number of parameters needed to characterize the different grammars of the world’s languages, dialects, and (possibly) idiolects is in the thousands (or, worse, millions), then ascribing them to an innate UG to my mind loses all semblance of plausibility. True, we are not yet at the point of being able to ‘prove’ that the child is not innately equipped with 7846 (or 7,846,938) parameters, each of whose settings is fixed by some relevant triggering experience. I would put my money, however, on the possibility that evolution has not endowed human beings in such an exuberant fashion.10
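As a quick check on the arithmetic in Kayne’s passage (my illustration, not part of Kayne’s text): $k$ independent binary parameters distinguish at most $2^k$ grammars, so the smallest $k$ sufficient for 5 billion grammars is indeed 33:

$$2^{32} = 4{,}294{,}967{,}296 < 5 \times 10^9 \leq 2^{33} = 8{,}589{,}934{,}592, \quad \text{i.e.} \quad k_{\min} = \lceil \log_2 (5 \times 10^9) \rceil = 33.$$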
7. Problems with hierarchizing parameter choice

This section outlines some conceptual and empirical difficulties with any attempt to hierarchize parameter choice in the manner of Baker (2001). Discussion will be confined exclusively to Baker’s proposals, for the simple reason that there is no other proposal extant that comes close in the amount of detail that one finds in the PH. The great potential appeal of the PH is its architectural simplicity. All choices branch from a single parametric node, namely the Polysynthesis Parameter, which
is the most basic parametric choice that language learners need to make. Furthermore, with one exception (the PH puts the Head Directionality and Optional Polysynthesis parameters at the same level), all branching (and hence every choice) is binary. One binary choice leads inexorably to another, with parameters on collateral branches playing, in principle, no role in any particular choice. Unfortunately, the typological evidence argues against a model of parametric choice with properties remotely that simple.

Take the Ergative Case Parameter as an example. In the PH, this parameter comes into play only for head-final languages without optional polysynthesis. But what Baker really wants to say, I think, is that only languages with these typological properties can be ergative. Somehow, speakers of head-initial languages have to know (or come to know) that their language is accusative. Nothing in the PH conveys the information that accusativity is, in essence, the ‘default’. Along the same lines, nothing in the PH conveys information about whether ergative languages can have serial verbs, whether languages that neutralize adjectives can be verb-subject, or whether topic-prominent languages can have null subjects. Recording this information in the PH would considerably complicate its architecture, since doing so would require that branches cross each other or demand some equivalent notational device.

There are serious problems as well with the idea that the rarity of a language type is positively correlated with the number of ‘decisions’ (i.e. parametric choices) that a language learner has to make. Baker’s discussion of verb-initial languages (see §2 above) implies that for each parameter there should be a roughly equal number of languages with positive and negative settings. That cannot possibly be right. There are many more non-polysynthetic languages than polysynthetic ones, despite the fact that the choice between them is a simple yes-no one. The same point could be made for subject-initial head-first languages vis-à-vis subject-last ones, and for non-optional polysynthesis languages vis-à-vis optional polysynthetic ones. Most problematically, the Null Subject Parameter is the lowest in the PH, implying that null subject languages should be rare, indeed rarer than verb-initial languages. However, according to Gilligan (1987), a solid majority of the world’s languages are null subject.

The PH is also rife with purely empirical problems. To ‘start at the top’, Baker assigns a positive value for the Polysynthesis Parameter to both Mohawk and Warlpiri, despite extreme typological differences between them. Among other things, Mohawk makes heavy use of incorporation and has no overt case marking, while Warlpiri has rich case marking. The problem with distinguishing the two languages by means of a case marking parameter is that Baker wants case marking to fall out from the Head Directionality Parameter, since most head-final languages have case marking. But as Table 4 shows, a sizeable percentage of head-first languages have case marking, while 36% of head-last languages lack it.
"new-r9"> "new-r29">
Typological evidence and Universal Grammar
Table 4. Percent of languages of each type with explicit dependent (case) marking (Siewierska and Bakker 1996)

V-initial   V-medial   V-final
42          30         64
None of these languages would appear to have any place in the PH. Furthermore, the PH posits that a positive value for the Polysynthesis Parameter automatically suggests adjective neutralization, that is, adjectives belonging to the class of nouns or verbs, rather than forming a category of their own. But it is far from being the case that adjective neutralization is limited to polysynthetic languages. According to Dixon (1977), in Chinese, Thai, and many Austronesian languages, adjectives belong to the class of verbs, while in Arabic, Tagalog, and many Dravidian and Bantu languages, adjectives belong to the class of nouns. Again, there is no place for such languages in the PH.

Moving further down the hierarchy, one finds more curious features. The Ergative Case Parameter applies only to head-final languages, and indeed the great majority of languages with ergative case are head-final. The problem is that agreement morphology can also be ergative (as Baker himself notes, p. 181), and such languages tend to be either verb-initial (as in Chamorro and Sahaptin) or verb-final (as in Abkhaz and Canela-Kraho) (see Nichols 1992). Since the parameters that determine head-finality and verb-initiality could not be farther apart on the PH, it is far from clear how the two subcases of ergative agreement marking could be treated in a unified fashion and how both could be unified parameter-wise with ergative case marking (should that be desirable). The Serial Verb Parameter is placed so as to allow only SVO languages to manifest this phenomenon, even though Schiller (1990) gives examples of SOV and VOS languages with serial verbs. And only a subset of SVO languages are permitted to have a positive value for the Null Subject Parameter, even though null subject languages can be SOV (Turkish) and VSO (Irish).

It is clearly the case that not all problems with the PH are ‘structural’ in the sense that the very architecture of the hierarchy prevents an adequate statement of the relevant generalization. Indeed, Baker is to be commended for pushing the UG-parametric approach to typology as far as he has — and certainly much farther than anyone else has done. One would naturally expect empirical problems in a pioneering work of this sort. But the bulk of the problems are crucially structural. No hierarchy of the general form of the PH is capable of representing the parametric choices that the child is hypothesized to make. Needless to say, from that fact we have no right to conclude that no parametric approach of any sort to typological evidence within UG can succeed. But the burden, I feel, is on those who have an
alternative that equals Baker’s PH in detail and coverage, while at the same time providing an empirically more adequate model.
8. Conclusion

This paper has raised the question of whether typological evidence is relevant to the construction of a theory of Universal Grammar and has answered the question in the negative. Since knowledge of language typology is neither innate nor learned in the process of language acquisition, it is not in any sense part of the grammar that the child acquires. Typological generalizations are therefore phenomena whose explanation is not the task of grammatical theory. If such a conclusion is correct, then the explanatory domain of Universal Grammar is considerably smaller than has been assumed in much work in the Principles-and-Parameters approach.
Notes

* I would like to thank Martin Haspelmath, Martina Penke, Anette Rosenbach, Helmut Weiß, and an anonymous reviewer for their comments on an earlier version of this paper. They are not responsible for any errors.

1. For somewhat different arguments leading to the same conclusion, see the papers in this volume by Haspelmath and Kirby et al. For a contrary view, see the Wunderlich contribution.

2. The Atoms of Language, however, presents an unusual challenge to the critical reader. It is not a ‘research monograph’ in the usual sense of the term, but rather, as the dust cover puts it, a ‘book for a general audience’. Very little knowledge is presupposed about the intricacies of grammatical theory. Baker’s book, then, is possibly unique in the annals of science publishing, in that it is a popularization of research results that were never argued for in refereed journals in their full technically elaborated form. Unfortunately I see no alternative but to regard and evaluate the book as if it presented research results, even though the claims made are typically presented in an extremely informal manner, given the intended audience.

3. The relative rarity of VSO languages with respect to SOV languages has frequently been attributed to the apparent fact that more rules are involved in the derivation of the former than of the latter. For example, Emonds (1980) argued that Irish and other VSO languages are SVO throughout the greater part of the derivation. Such languages have a late rule that fronts the verb before the subject. As Emonds put it (1980: 44), such languages are ‘more complicated, therefore rarer’. Presumably the ‘extra rule’ account could be assimilated to the ‘extra parameter’ account.

4. Stochastic conceptions of I-language have indeed been proposed recently. For a critique, see Newmeyer (2003).

5. An anonymous reviewer suggests two universals of this form: the Relative Clause Accessibility Hierarchy (Keenan and Comrie 1977) and the generalization that any language that has subject-verb inversion in yes-no questions will necessarily have this same inversion in wh-questions, but not vice-versa. But the former universal is well known to be riddled with exceptions (see Newmeyer 1998b), while the latter has not been investigated thoroughly enough to determine its status with any degree of certainty.

6. Chomsky himself has had very little to say about markedness in grammar and the core-periphery distinction in more than a decade. However, since the sorts of generalizations that these notions were intended to capture have not been attributed to other, newer, mechanisms, one must assume that it continues to be necessary to appeal to them.

7. As an anonymous reviewer points out, EIC (and other parsing principles proposed in Hawkins 1994) are surely themselves innate, albeit not part of UG per se. One might take an expanded view of UG, expressed in Fanselow (1992) and Wunderlich (this volume), which rejects the distinction between UG principles and processing principles. I feel that such a move is to a large degree terminological, and where not terminological, it is incorrect (for relevant discussion, see Newmeyer 2003).

8. Aissen regards her OT account as a formalization of the functionalist account. For skeptical discussion, see Newmeyer (2002).

9. A number of weaker versions of the LPH have been proposed. To cite one example, Clahsen, Eisenbeiss & Penke (1996) have proposed the ‘Lexical Learning Hypothesis’ (LLH), in which language acquisition proceeds through the interplay between UG principles and the identification of functional elements such as inflectional affixes, determiners, complementizers, etc. The child’s task is to discover — based on positive evidence — which grammatical concepts are lexicalized. Languages thus differ only with respect to which concepts they lexicalize. It remains to be investigated to what extent the LLH is subject to the same criticism as the LPH, as discussed below.

10. As both Anette Rosenbach and Martina Penke have pointed out (personal communication), there is an evolution-based argument against typological generalizations being encoded in UG. If UG is related to a specific genetic endowment of our species and goes back to some type of proto-language capacity humans had before spreading out from Africa (see Wunderlich, this volume), then it seems implausible that the typological variation which originated afterwards could be part of this universal language capacity. Also, if we assume a gradual evolution of the language capacity triggered by evolutionary principles, how could there be a selection for an option or a selection for a parameter value that one’s language does not realize?

11. Jäger remarks that the SAMTAL corpus is a collection of everyday conversations in Swedish that was annotated by Oesten Dahl. He also notes that the same patterns have been found in the Wall Street Journal Corpus by Hank Zeevat, in the CallHome corpus of spoken Japanese by Fry (2001), and in the SUSANNE and CHRISTINE corpora of spoken English by himself.
References

Aissen, Judith. 2003. “Differential object marking: iconicity vs. economy”. Natural Language and Linguistic Theory 21: 435–483.
Baker, Mark C. 2001. The atoms of language: the mind’s hidden rules of grammar. New York: Basic Books.
Borer, Hagit. 1984. Parametric syntax: case studies in Semitic and Romance languages. Dordrecht: Foris.
Chomsky, Noam. 1973. “Conditions on transformations”. In: Anderson, Steven; and Kiparsky, Paul (eds), A festschrift for Morris Halle 232–286. New York: Holt Rinehart & Winston.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, Noam. 1998. “Noam Chomsky’s minimalist program and the philosophy of mind. An interview [with] Camilo J. Cela-Conde and Gisèle Marty”. Syntax 1: 19–36.
Clahsen, Harald; Eisenbeiß, Sonja; and Penke, Martina. 1996. “Underspecification and lexical learning in early child grammars”. In: Clahsen, Harald; and Hawkins, Roger (eds), Generative approaches to first and second language acquisition 129–160. Amsterdam: Benjamins.
Comrie, Bernard. 1989. Language universals and linguistic typology. (2nd edition). Chicago: University of Chicago Press.
Comrie, Bernard. 1998. “Reference-tracking: description and explanation”. Sprachtypologie und Universalienforschung 51: 335–346.
Croft, William. 1988. “Agreement vs. case marking and direct objects”. In: Barlow, Michael; and Ferguson, Charles A. (eds), Agreement in natural language: approaches, theories, descriptions 159–179. Stanford, CA: Center for the Study of Language and Information.
Culicover, Peter W. 1999. Syntactic nuts: hard cases, syntactic theory, and language acquisition. Oxford: Oxford University Press.
Dixon, Robert M. W. 1977. “Where have all the adjectives gone?” Studies in Language 1: 1–80.
Dryer, Matthew S. 1991. “SVO languages and the OV:VO typology”. Journal of Linguistics 27: 443–482.
Dryer, Matthew S. 1992. “The Greenbergian word order correlations”. Language 68: 81–138.
Emonds, Joseph E. 1980. “Word order in generative grammar”. Journal of Linguistic Research 1: 33–54.
Faltz, Leonard M. 1977/1985. Reflexivization: a study in universal syntax. New York: Garland.
Fanselow, Gisbert. 1992. “Zur biologischen Autonomie der Grammatik”. In: Suchsland, Peter (ed.), Biologische und soziale Grundlagen der Sprache 335–356. Tübingen: Niemeyer.
Freidin, Robert; and Quicoli, A. Carlos. 1989. “Zero-stimulation for parameter setting”. Behavioral and Brain Sciences 12: 338–339.
Fry, John. 2001. Ellipsis and wa-marking in Japanese conversation. Unpublished Ph.D. thesis, Stanford University.
Fukui, Naoki. 1995. “The principles-and-parameters approach: a comparative syntax of English and Japanese”. In: Shibatani, Masayoshi; and Bynon, Theodora (eds), Approaches to language typology 327–372. Oxford: Clarendon Press.
Gilligan, Gary M. 1987. A cross-linguistic approach to the pro-drop parameter. Unpublished Ph.D. thesis, University of Southern California.
Haiman, John. 1983. “Iconic and economic motivation”. Language 59: 781–819.
Hale, Kenneth. 1983. “Warlpiri and the grammar of nonconfigurational languages”. Natural Language and Linguistic Theory 1: 5–47.
Hawkins, John A. 1983. Word order universals. New York: Academic Press.
Hawkins, John A. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hoekstra, Teun; and Kooij, Jan G. 1988. “The innateness hypothesis”. In: Hawkins, John A. (ed.), Explaining language universals 31–55. Oxford: Blackwell.
Hyams, Nina M. 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel.
Jäger, Gerhard. 2003. “Learning constraint sub-hierarchies: the bidirectional gradual learning algorithm”. In: Blutner, Reinhard; and Zeevat, Henk (eds), Pragmatics in optimality theory 251–287. Basingstoke: Palgrave Macmillan.
Johansson, Stig; and Hofland, Knut. 1989. Frequency analysis of English vocabulary and grammar based on the LOB corpus. Volume 1: tag frequencies and word frequencies. Oxford: Clarendon Press.
Kayne, Richard S. 2000. Parameters and universals. Oxford: Oxford University Press.
Keenan, Edward L. 1978. “On surface form and logical form”. In: Kachru, Braj B. (ed.), Linguistics in the seventies: directions and prospects 163–204. Urbana, IL: Department of Linguistics, University of Illinois.
Keenan, Edward L.; and Comrie, Bernard. 1977. “Noun phrase accessibility and universal grammar”. Linguistic Inquiry 8: 63–99.
Manzini, M. Rita; and Wexler, Kenneth. 1987. “Parameters, binding, and learning theory”. Linguistic Inquiry 18: 413–444.
Newmeyer, Frederick J. 1998a. “The irrelevance of typology for linguistic theory”. Syntaxis 1: 161–197.
Newmeyer, Frederick J. 1998b. Language form and language function. Cambridge, MA: MIT Press.
Newmeyer, Frederick J. 2000. “Why typology doesn’t matter to linguistic theory”. In: Goodall, Grant; Schulte-Nafeh, Martha; and Samiian, Vida (eds), Proceedings of the twenty-eighth meeting of the Western Conference on Linguistics 334–352. Fresno: Department of Linguistics, California State University at Fresno.
Newmeyer, Frederick J. 2002. “Optimality and functionality: a critique of functionally-based optimality-theoretic syntax”. Natural Language and Linguistic Theory 20: 43–80.
Newmeyer, Frederick J. 2003. “Grammar is grammar and usage is usage”. Language 79: 682–707.
Nichols, Johanna. 1992. Linguistic diversity in space and time. Chicago: University of Chicago Press.
Poeppel, David; and Wexler, Kenneth. 1993. “The full competence hypothesis of clause structure in early German”. Language 69: 1–33.
Rizzi, Luigi. 1982. Issues in Italian syntax. Dordrecht: Foris.
Schiller, Eric. 1990. “The typology of serial verb constructions”. Chicago Linguistic Society 26.
Siewierska, Anna; and Bakker, Dik. 1996. “The distribution of subject and object agreement and word order type”. Studies in Language 20: 115–161.
Silverstein, Michael. 1981. “Case marking and the nature of language”. Australian Journal of Linguistics 1: 227–246.
Travis, Lisa. 1989. “Parameters of phrase structure”. In: Baltin, Mark R.; and Kroch, Anthony S. (eds), Alternative conceptions of phrase structure 263–279. Chicago: University of Chicago Press.
Vennemann, Theo. 1973. “Explanation in syntax”. In: Kimball, John (ed.), Syntax and semantics 2: 1–50. New York: Seminar Press.
Remarks on the relation between language typology and Universal Grammar
Commentary on Newmeyer

Mark Baltin
New York University
Although there are usually thought to be two distinct areas of linguistics, the fields labeled language typology and formal syntax, it is not clear that the subject matter or goals of these two fields really differ. Both are concerned with discovering the nature of universal grammar and the limits of the observed variation between languages. In both cases, researchers are hypothesizing a set of terms to describe the observed variation, and the terms are a priori and can be said to constitute a theory of grammar. Terms such as ‘noun’, ‘verb’, and ‘adposition’ are not given by the data, a point made by Chomsky (1957) in his discussion of structuralism. Rather, I suspect that the difference is a methodological one, in the range and breadth of the phenomena investigated. For example, Dryer (1992) notes the small sample size of Greenberg (1963), a seminal work in the field of language typology, which dealt with word order correlations in 32 historically unrelated languages; Dryer himself deals with linguistic correlations, typically word order correlations, in 625 languages. While one is impressed with the amount of research that goes into such work, and the work is in many respects extremely valuable and full of insight, there is a sense in which the wider database precludes the kind of in-depth analysis of the phenomena that a smaller sample size permits, where the narrower coverage is compensated for by an in-depth analysis of the individual languages.

From this standpoint, Newmeyer’s observations about the relationship between the findings of language typologists and their implications for generative attempts to formulate an account of Universal Grammar are extremely plausible. From a generative perspective, many of the correlations that language typology finds among languages are effects to be explained, and cannot be directly imported into a theory of Universal Grammar, which should reflect both what is found in all grammars and the range and nature of grammar variation. For example, consider the (implicit) theory of phrase-structure that one finds in Hawkins (1983), in which there is a principle of Cross-Category Harmony, but
"bal-r9"> "bal-r3">
76
Mark Baltin
categories consist of heads and dependents. Contrast this with an X-bar theory of phrase-structure, such as the one in (1):

(1) X″ → (Y″) X′
X′ → {W″ X′} or {X0 (Z″)}
One would call Y″ a specifier of X″, W″ an adjunct, and Z″ a complement. However, for Hawkins, all of these items would be collapsed as dependents, predicting, e.g., that relative clauses, which are commonly viewed as adjuncts (but see Kayne (1994) for a dissenting view), would be treated the same way as objects, which are typically viewed as complements. Interestingly, Dryer’s (1992, Table 2) word order correlations show this correlation not to hold up, with 37 languages being both OV and NRel, and 26 languages being OV and RelN. Or take VSO languages, which are commonly taken to be a primitive type. In generative terms, VSO order is thought to arise in several ways, even if one assumes that VSO word order takes an SVO order as input. For example, if one assumes that the clause is organized into the phrase-structure in (2), the first D″, the subject, is generated as the specifier of V″, and the second D″, the object, is generated as the complement of V:

(2) [C″ [C′ C [T″ [T′ T [V″ D″ [V′ V D″]]]]]]
One way in which a VSO word order can arise would be for the verb to move to T, placing it before the subject, while the subject simply remains in place; a second way would involve the subject moving to the specifier position of T″, with the verb moving first to T and then to C. Interestingly, McCloskey (1996) has proposed that languages differ as to these two options. If he is right, VSO is not monolithic, but is an effect that can arise from various causes. A language typology which blurs these distinctions, then, is masking the nature of distinct grammars.

It seems that Baker’s view of parameters, as described by Newmeyer, is also blurring these distinctions. As noted by Newmeyer, there are really two main views of parameters, the theoretically permissible dimensions of grammar variation in Universal Grammar: macroparameters and microparameters. Macroparameters can be thought of as ‘big’ on-off switches in Universal Grammar. For example, it has been proposed that grammars differ in the X-bar theory along the dimensions of head-first and head-last, with English, with its SVO order, as head-first, and Japanese, with its SOV word order, as head-last. As Newmeyer notes, Baker’s view also holds each parameter as being binary in nature, and equipotential as to the two values of the parameter (the ‘coin-flip’). However, this goes against another notion that has been influential in linguistics since the 1930s, originating within the Prague Circle — the notion of markedness
"bal-r8"> "bal-r9"> "bal-r2"> "bal-r1">
Commentary on Newmeyer
(Jakobson 1936/1972). According to this notion, one might take one value of a parameter to represent the ‘unmarked’ value, to be set as the default case, while positive evidence would be required to change the parameter setting to the marked value. There might be some evidence that SVO word order represents the unmarked value, and this evidence comes from creole languages, which can be viewed as ‘constructed’ languages arising from contact situations between two or more languages. According to my colleague John Singler (personal communication), creole languages are universally SVO, even when the input languages are SOV. A particularly clear case, according to Singler, is Berbice Creole Dutch, which is SVO, even though both Dutch and the dominant substrate language, Ijo, are SOV. One might simply say that SVO is the unmarked value of the parameter, or, more generally, that head-first is the unmarked value, but is there any deeper explanation for this fact?

There may very well be. Kayne (1994) has proposed that there is no headedness parameter, and that the universal underlying word order in a phrase is specifier-head-complement. Assuming that subjects are specifiers and objects are complements, one would then have to account for the existence of SOV languages by a movement transformation of some sort, such as the object moving to the left of the verb. Now, it has been proposed in recent years, by Chomsky (1995), that movement occurs for morphological reasons, to check morphological features. It has also been proposed that there is an X0 head called Agr, which checks the agreement features of some phrase that is close enough to it, typically in the specifier position of Agr. One might then say that in SOV languages, the structure of the clause is really as in (3):

(3) [C″ [C′ C [T″ [T′ T [Agr″ [Agr′ Agr [V″ D″ [V′ V D″]]]]]]]]
And the object moves into the specifier position of Agr in order to check the agreement features of Agr, yielding (4):

(4) [C″ [C′ C [T″ [T′ T [Agr″ D″ [Agr′ Agr [V″ D″ [V′ V]]]]]]]]
It has also been proposed that creoles have relatively impoverished morphology. If so, one would not expect movement for morphological reasons in a creole. Hence, if SVO order is basic, creoles would be expected to exhibit this word order.

This takes me to the second view of parameters, the view that parameters are microparameters, meaning that the dimensions of grammar variation are quite small. In particular, Borer (1984) has proposed that parameters are limited to variants of functional X0s. Agr is a functional X0, and the difference between Japanese and creoles might then reduce to the presence versus absence of an Agr which requires that its specifier position be filled for morphological reasons. For this reason, I do not feel that Fukui’s (1995) objections to microparameters, as presented by Newmeyer, are compelling. At least this parameter can be
traced to the nature of a closed-class, functional category’s lexical specification, qualifying it for microparameter status. Indeed, I have never seen convincing evidence for macroparameters. One classic case for such a parameter was Rizzi’s (1982) claim that English and Italian differed with respect to the set of bounding nodes that were to be counted for Chomsky’s (1973) subjacency condition. In Chomsky’s view, one could only move an element to the periphery of one bounding node up from its point of origin. However, it was claimed that English and Italian differed with respect to whether one could extract from within an embedded question. If one claimed that embedded questions are CPs (at the time, S′s), one might claim that CP was a bounding node in Italian, while TP (S) was a bounding node in English. If correct, this would be a macroparameter, since the locus of variation would be not a lexical item, but a phrasal projection.

One might first ask what it means to say that two languages differ with respect to the value of a parameter. Most linguists do not believe that the distinction between a language and a dialect has any linguistic import (recall Max Weinreich’s famous dictum that “A language is a dialect with an army and a navy” (Weinreich 1945: 13)). Indeed, Grimshaw (1986) showed that many English speakers could also extract out of an embedded question. However, the problem for Rizzi’s account was more serious. This account predicts that, universally, one would not be able to extract from within the clausal complement of an embedded question, since one would have to cross two CPs to do so. Therefore, whichever bounding node, TP or CP, is claimed to be the operative bounding node in a given language, subjacency would be violated. However, Lasnik & Saito (1984) cite the following sentence as grammatical, and I and others to whom I have spoken find it fully acceptable:

(5) John, who_i I wonder whether anyone thinks ___ will win, is a friend of mine.
The subjacency prediction is therefore problematic, and it looks as though we have no clear account of the parameter, much less any assurance that the variation claimed by positing the parameter actually exists. In conclusion, Newmeyer’s account is extremely plausible from a generative perspective. The observations of language typologists are quite valuable, but their conclusions cannot be directly imported into a Chomskyan conception of UG.
References

Borer, Hagit. 1984. Parametric syntax. Dordrecht: Foris.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1973. “Conditions on transformations”. In: Anderson, Stephen R.; and Kiparsky, Paul (eds), A festschrift for Morris Halle 232–286. New York: Holt, Rinehart, & Winston.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Dryer, Matthew. 1992. “The Greenbergian word order correlations”. Language 68(1): 81–138.
Fukui, Naoki. 1995. “The principles-and-parameters approach: a comparative syntax of English and Japanese”. In: Shibatani, Masayoshi; and Bynon, Theodora (eds), Approaches to language typology 327–372. Oxford: Clarendon Press.
Greenberg, Joseph. 1963. “Some universals of grammar with reference to meaningful elements”. In: Greenberg, Joseph (ed.), Universals of language 73–113. Cambridge, MA: MIT Press.
Grimshaw, Jane. 1986. “Subjacency and the S/S′ parameter”. Linguistic Inquiry 17(2): 364–369.
Hawkins, John A. 1983. Word order universals. New York: Academic Press.
Jakobson, Roman. 1936/1972. Child language, aphasia, and phonological universals. (2nd edition). The Hague: Mouton.
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.
Lasnik, Howard; and Saito, Mamoru. 1984. “On the nature of proper government”. Linguistic Inquiry 15(2): 235–289.
McCloskey, James. 1996. “Subjects and subject positions in Irish”. In: Borsley, Robert D.; and Roberts, Ian (eds), The syntax of the Celtic languages: a comparative perspective 241–283. Cambridge: Cambridge University Press.
Rizzi, Luigi. 1982. “Violations of the wh-island condition and the status of subjacency”. Journal of Italian Linguistics 5(1–2): 157–195.
Weinreich, Max. 1945. “YIVO and the problems of our time”. YIVO-bleter 25(1): 13.
Does linguistic explanation presuppose linguistic description?*

Martin Haspelmath
Max-Planck-Institut für evolutionäre Anthropologie, Leipzig
I argue that the following two assumptions are incorrect: (i) The properties of the innate Universal Grammar can be discovered by comparing language systems, and (ii) functional explanation of language structure presupposes a “correct”, i.e. cognitively realistic, description. Thus, there are two ways in which linguistic explanation does not presuppose linguistic description. The generative program of building cross-linguistic generalizations into the hypothesized Universal Grammar cannot succeed because the actually observed generalizations are typically one-way implications or implicational scales, and because they typically have exceptions. The cross-linguistic generalizations are much more plausibly due to functional factors. I distinguish sharply between “phenomenological description” (which makes no claims about mental reality) and “cognitively realistic description”, and I show that for functional explanation, phenomenological description is sufficient.
1. Introduction
Although it may seem obvious that linguistic explanation necessarily presupposes linguistic description, I will argue in this paper that there are two important respects in which this is not the case. Of course, some kind of description is an indispensable prerequisite for any kind of explanation, but there are different kinds of description and different kinds of explanation. My point here is that for two pairs of kinds of description and explanation, it is not the case, contrary to widespread assumptions among linguists, that the latter presupposes the former. Specifically, I will claim that

i. linguistic explanation that appeals to the genetically fixed ("innate") language-specific properties of the human cognitive system (often referred to as "Universal Grammar") does not presuppose any kind of thorough, systematic description of human language; and that
ii. linguistic explanation that appeals to the regularities of language use ("functional explanation") does not presuppose a description that is intended to be cognitively real.

These are two rather different claims which are held together only at a fairly abstract level. However, both are perhaps equally surprising for many linguists, so I treat them together here. Before getting to these two claims in §3 and §4, I will discuss what I see as some of the main goals of theoretical linguistic research, comparing them with analogous research goals in biology and chemistry.
2. Goals of theoretical linguistics

I take "theoretical linguistics" as being opposed to "applied linguistics" (cf. Lyons 1981: 35), so that all kinds of non-applied linguistics fall in its scope, including language-particular description.1 There are many different goals pursued by theoretical linguists, e.g. understanding the process of language acquisition, or understanding the spread of linguistic innovations through a community. Here I want to focus just on the goals of what is sometimes called "core linguistics". I distinguish four different goals in this area:

i. language-particular phenomenological description, resulting in (fragments of) descriptive grammars;
ii. language-particular cognitively realistic description, resulting in "cognitive grammars" (or "generative grammars");
iii. description of the "cognitive code" for language, i.e. the elements of the human cognitive apparatus that are involved in building up (= acquiring) a cognitive grammar (the cognitive code is also called "Universal Grammar");
iv. explanation of restrictions on attested grammatical systems, i.e. the explanation of grammatical universals.
The difference between the first two goals is that while descriptive grammars claim to present a complete account of the grammatical regularities, only cognitive grammars claim to mirror the mental grammars internalized by speakers.2 This more ambitious goal of formulating cognitively realistic descriptions is shared both by Chomskyan generative linguists and by linguists of the Cognitive Linguistics school. In practice the main differences between the two kinds of description are (i) that descriptive grammars tend to use widely understood concepts and terms, while cognitive/generative grammars tend to use highly specific terminology and notation, and (ii) that descriptive grammars are often content with formulating rules that speakers must possess, while cognitive/generative grammars often try to go
beyond these and formulate more general, more abstract rules that are then attributed to speakers' knowledge of their language.3 As a simple example of (ii), consider the Present-Tense inflection of three Latin verb classes (only the singular forms are given here):

(1)        a-conjugation          e-conjugation         Ø-conjugation
    1sg    base + -ō    laudō     base + -eō   habeō    base + -ō    agō
    2sg    base + -ās   laudās    base + -ēs   habēs    base + -is   agis
    3sg    base + -at   laudat    base + -et   habet    base + -it   agit
                        'praise'               'have'                'act'

Any complete descriptive grammar must minimally contain these three patterns, because they represent productive patterns in Latin. However, linguists immediately see the similarities between the three inflection classes, and a typical generative or cognitive grammar will try to relate them to each other, e.g. by saying that the abstract stems are laudā-, habē-, and ag-, that the suffixes are -ō, -s and -t, and that morphophonological rules delete ā and shorten ē before ō, shorten both ā and ē before -t, and insert i between g and s/t.
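Because these rules are stated explicitly, they can be checked mechanically. The following minimal Python sketch (my illustration, not part of the original text) derives the surface paradigm in (1) from the three abstract stems and the uniform suffixes; the treatment of vowel length via macron characters is an orthographic simplification:

```python
import re

# Abstract stems and uniform suffixes, as in the analysis above.
STEMS = {"'praise'": "laudā", "'have'": "habē", "'act'": "ag"}
SUFFIXES = {"1sg": "ō", "2sg": "s", "3sg": "t"}

def inflect(stem, suffix):
    form = stem + suffix
    form = re.sub(r"āō$", "ō", form)           # delete ā before ō
    form = re.sub(r"ēō$", "eō", form)          # shorten ē before ō
    form = re.sub(r"āt$", "at", form)          # shorten ā before -t
    form = re.sub(r"ēt$", "et", form)          # shorten ē before -t
    form = re.sub(r"g([st])$", r"gi\1", form)  # insert i between g and s/t
    return form

for gloss, stem in STEMS.items():
    print(gloss, [inflect(stem, s) for s in SUFFIXES.values()])
# 'praise' ['laudō', 'laudās', 'laudat']
# 'have'   ['habeō', 'habēs', 'habet']
# 'act'    ['agō', 'agis', 'agit']
```

The sketch simply makes vivid what "more abstract" means here: nine surface forms reduce to three stems, three suffixes, and a handful of ordered adjustments.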
It seems to be a widespread assumption that speakers extract as many generalizations from the data as they can detect, and that linguists should follow them in formulating their hypotheses about mental grammars. (This assumption will be questioned below, §4.4.)

The third goal, what I call "description of the cognitive code for language", is at first sight the most controversial one among linguists. While this goal (often called "characterization of the nature of Universal Grammar") is seen as the central goal of theoretical linguistics by Chomskyan generative grammarians, many non-Chomskyans deny that there are any grammar-specific components of the human cognitive apparatus (e.g. Tomasello 1995; cf. also Fischer, this volume). However, it is clear that the nature of human cognition is relevant for our hypotheses about cognitive grammars, so the notion "cognitive code" can be understood more widely as referring to those properties of human cognition that make grammar possible (see Wunderlich, this volume, §1, for a very similar characterization of Universal Grammar). The traditional Chomskyan view is that the cognitive code in this sense is domain-specific, while non-Chomskyans prefer to see it as domain-general, being responsible also for non-linguistic cognitive capabilities.

The fourth goal has also given rise to major controversies, because very different proposals have been advanced for explaining grammatical universals. Generative linguists have often argued that the explanation for universals can be derived directly from hypotheses about the cognitive code (= Universal Grammar), and that conversely empirical observations about universals can help constrain hypotheses about the cognitive code. By contrast, typologically oriented functionalists
have argued that grammatical universals can be explained on the basis of properties of language use. This issue will be the main focus of §3.

In view of these controversies in linguistics, it seems useful to compare the four major goals of theoretical linguistics to analogous goals in other sciences. Some of my arguments below (see especially §3.3) will be based on analogies with other disciplines. I restrict myself to biology and chemistry. Parallels between linguistics and biology have often been drawn at least since August Schleicher (cf. recently Lass 1997; Haspelmath 1999b; Nettle 1999; Croft 2000), and the parallel with chemistry is due to Baker (2001). The parallels are summarized in Table 1. The unit of analysis that is compared to a language (or grammar) is a species in biology and a compound in chemistry. The first column of the table contains an abstract characterization of the goals.

Table 1.

                                linguistics              biology                   chemistry
                                (unit: language)         (unit: species)           (unit: compound)
phenomenological description    descriptive grammar      zoological/botanical      color, smell etc.
                                                         description               of a compound
underlying system               "cognitive grammar"      description of            description of
                                                         species genome            molecular structure
basic building blocks           "cognitive code"         genetic code              atomic structure
                                (= elements of UG)
explanation of phenomenology    diachronic adaptation    evolutionary adaptation   ?
and system
explanation of basic            biology                  biochemistry              nuclear physics
building blocks
Parallel to descriptive grammars in linguistics, we have zoological and botanical descriptions in biology and phenomenological description of a chemical compound. The latter is not a prestigious part of the theoretical chemist’s job (though it is crucial in applied chemistry), but at least in traditional biology the phenotypical description of a newly discovered plant or animal species was considered an important task of the field biologist, making field biology and field linguistics quite parallel (the difference being that linguists cannot easily deposit a specimen in a museum).4 At a higher level of abstraction, chemists are interested in the molecular structure of compounds that is ultimately responsible for its phenomenological properties, and biologists are interested in the genome of a species that gives rise to
"has-r23">
Does linguistic explanation presuppose linguistic description?
the phenotype in a process of ontogenetic development. Similarly, linguists would like to know what the mental reality is behind the grammatical patterns that can be observed in speakers' utterances. Cognitive grammars "underlie" speech in much the same way as the genome "underlies" an organism and the molecule "underlies" a chemical compound.

Next, all three disciplines are interested in the basic building blocks that are used by the underlying system: atoms making up molecules, the genetic code for the genome of a species, and the "cognitive code" for a mental grammar. The basic building blocks put certain restrictions on possible underlying systems: There are only a little over 100 different types of atoms which can combine in limited ways, thus constraining the kinds of possible molecules (and thus compounds); there are only four different "letters" of the genetic "alphabet" (or twenty different amino acids coded by them), thus constraining the kinds of possible genomes (and hence species); and presumably the cognitive code also shows its limitations, thus constraining the possible kinds of mental grammars (and thus languages).

Now if we want to explain the properties of the basic building blocks, we have to move to a different scientific discipline: Chemists have to go to nuclear physics to learn about the nature of atoms, biologists have to go to biochemistry to learn about the nature of DNA, and cognitive scientists have to go to biology (neurology, genetics) to learn about the nature of the cognitive apparatus.

However, there is also a different mode of deeper explanation, both in biology and in linguistics. Biologists explain the properties of organisms by an evolutionary process of adaptation to the environment, and similarly linguists can explain many properties of grammars through a diachronic process of functional adaptation (Haspelmath 1999b; Nettle 1999). Biological organisms live in many different kinds of environments, and their diversity is in part explained in this way. Grammatical systems, by contrast, "live" in very similar kinds of environments; human "needs" for grammar are largely invariant across populations, and cultural differences have only a limited impact on grammars (e.g. in the area of polite pronouns). Thus, functional explanations are mostly confined to universal properties of grammars in linguistics, but otherwise the similarities between evolutionary explanation in biology and functional explanation in linguistics are very strong. (I am not aware of an analogy to evolutionary/functional explanation in chemistry.)

We are now ready to discuss the two major controversial claims of this paper.
3. The search for Universal Grammar does not presuppose linguistic description

For Chomskyan generative linguistics, the characterization of the cognitive code ("Universal Grammar") is the ultimate explanatory goal. The general consensus seems to be that Universal Grammar is explanatory in two different ways: On the one hand, UG explains observed universals of grammatical structure:

The next task is to explain why the facts are the way they are, facts of the sort we have reviewed, for example [e.g. binding phenomena, M. H.]. This task of explanation leads to inquiry into the language faculty. A theory of the language faculty is sometimes called universal grammar… Universal grammar provides a genuine explanation of observed phenomena. From its principles we can deduce that the phenomena must be of a certain character, given the initial data that the language faculty used to achieve its current state. (Chomsky 1988: 61–62)
On the other hand, UG explains the fact that language acquisition is possible, despite the "poverty of the stimulus". The above quote continues as follows:

To the extent that we can construct a theory of universal grammar, we have a solution to Plato's problem [i.e. the question how we can know so much despite the poverty of our evidence, M. H.] in this domain. (Chomsky 1988: 62)
Chomsky's choice of a grand term like "Plato's problem" suggests that he regards this second explanatory role of UG as more important. Hoekstra and Kooij (1988: 45) are quite explicit about this:

[T]he explanation of so-called language universals constitutes only a derivative goal of generative theory. The primary explanandum is the uniformity of acquisition of a rich and structured grammar on the basis of varied, degenerate, random and non-structured experience… This situation contrasts sharply with the one found in [functionalist theories]. The explananda for these theories are the language universals themselves. (1988: 45)
Thus, UG is conceived of as a very important type of explanation in Chomskyan linguistics. In this section I argue that UG cannot be discovered on the basis of linguistic description (either cross-linguistic or language-particular), and that it cannot serve as an explanans for observed universals of language structure.

3.1 From comparative grammar to Universal Grammar?

Now how do we arrive at hypotheses about the nature of UG? Haegeman (1994: 18) summarizes a view that was widespread in the 1980s and 1990s (and is perhaps still widespread):
"has-r15"> "has-r26"> "has-r19">
Does linguistic explanation presuppose linguistic description?
[B]y simply looking at English and only that, the generative linguist cannot hope to achieve his goal [of formulating the principles and parameters of UG]. All he can do is write a grammar of English that is observationally and descriptively adequate but he will not be able to provide a model of the knowledge of the native speaker and how it is attained. The generativist will have to compare English with other languages to discover to what extent the properties he has identified are universal and to what extent they are language-specific choices determined by universal grammar … Work in generative linguistics is therefore by definition comparative.
Generative work in the comparative-grammar tradition arrives at hypotheses about UG by examining a range of phenomena both within and across languages, formulating higher-level language-internal and cross-linguistic generalizations, and then building these generalizations into the model of UG. That is, the nature of UG is claimed to be such that the generalizations fall out automatically from the innate cognitive code. When a situation is encountered where some non-occurring structures could just as easily be described by the current descriptive framework (= the current view of UG) as the occurring structures, this is taken as indication that the descriptive framework is too powerful and needs to be made more restrictive. In this sense, one can say that description and explanation coincide in generative linguistics (whereas they are sharply distinguished in functional linguistics; cf. Dryer 1999, 2006). Let us look at a few simple examples.

3.1.1 Syntax: The X-bar schema

In the generative framework of the 1960s, it was theoretically possible not only to have phrase-structure rules such as (2a–c) which actually occur, but also rules such as (2d–e), which apparently do not occur in any language.

(2) a. NP → Det N PP (the horse on the meadow)
    b. VP → Adv V NP (often eats a flower)
    c. PP → Deg P NP (right under the tree)
    d. […]
    e. […]
To make the framework more restrictive, Chomsky (1970) and Jackendoff (1977) proposed that Universal Grammar includes an X-bar schema (such as "XP → Y [X′ X ZP]") which restricts the possible phrase structures to those which consist of a head X plus a complement ZP and a specifier Y. The fact that only structures like (2a–c) occur now falls out from the theory of UG. Moreover, the X-bar schema captures the behavioral parallels between the projections of different categories (e.g. America invaded Iraq and America's invasion of Iraq), and it may allow us to derive some of the best-known word-order universals of Greenberg (1963):
We assume that ordering relations are determined by a few parameter settings. Thus in English, a right-branching language, all heads precede their complements, while in Japanese, a left-branching language, all heads follow their complements; the order is determined by one setting of the head parameter. (Chomsky and Lasnik 1993: 518)
3.1.2 Morphology: Lexicon and syntax as two separate components

Greenberg (1963, universal 28) had observed that derivational affixes always come between the root and inflectional affixes when both inflection and derivation occur on the same side of the root. Anderson (1992) proposed a model of the architecture of Universal Grammar from which this generalization falls out: If the lexicon and syntax are two separate components of grammar, and derivation is part of the lexicon, while inflection is part of the syntax, and if rules of the syntactic component, applying after lexical rules, can only add material peripherally, then Greenberg's generalization follows from the model of UG.

3.1.3 Phonology: Innate markedness constraints of Optimality Theory

Chomsky and Halle (1968: Ch. 9) had observed that the machinery used throughout their book on English phonology could also be used to describe all kinds of non-occurring or highly unusual phonological patterns. They felt that they were therefore missing significant generalizations and proposed a markedness theory as part of UG to complement their earlier proposals. A more recent and more successful version of this markedness theory is the markedness constraints of Optimality Theory. For example, Kager (1999: 40–43) discusses the phenomenon of final devoicing, as found in Dutch, where the underlying form /bed/ 'bed' is pronounced [bet]. This could be described by a 1960s-style rule "[obstruent] → [−voice] / __$" (= an obstruent is unvoiced in syllable coda position), but that framework would also allow formulating a non-occurring rule like "[obstruent] → [−voice] / $__" (= an obstruent is unvoiced in syllable onset position). In Optimality Theory, a markedness constraint *Voiced-Coda is proposed, which may be ranked below the faithfulness constraint Ident-IO(voice) (which favors the preservation of underlying voice contrasts), as in English, where /bed/ surfaces as [bed]. Alternatively, *Voiced-Coda may be ranked higher than Ident-IO(voice), so that a Dutch-type language results. The impossibility of a language with only initial devoicing follows from the fact that there is no constraint *Voiced-Onset in the model of UG. In this way, OT's descriptive apparatus simultaneously explains cross-linguistic generalizations.
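To make the ranking logic concrete, here is a minimal Python sketch (my own illustration, not from Kager or the original text) in which the two constraints evaluate the candidates [bed] and [bet] for underlying /bed/. The constraint implementations are deliberately crude; the point is that the ranking alone decides between an English-type and a Dutch-type grammar:

```python
def voiced_coda(cand, underlying):
    """*Voiced-Coda (simplified): violated if the candidate ends in a
    voiced obstruent (here crudely: word-final b, d, or g)."""
    return 1 if cand[-1] in "bdg" else 0

def ident_voice(cand, underlying):
    """Ident-IO(voice): one violation per segment whose voicing value
    differs from the corresponding underlying segment."""
    flips = {("b", "p"), ("p", "b"), ("d", "t"),
             ("t", "d"), ("g", "k"), ("k", "g")}
    return sum((u, c) in flips for u, c in zip(underlying, cand))

def optimal(underlying, candidates, ranking):
    """Pick the candidate whose violation profile on the ranked
    constraints is lexicographically smallest."""
    return min(candidates,
               key=lambda c: tuple(con(c, underlying) for con in ranking))

candidates = ["bed", "bet"]
# Dutch-type ranking: *Voiced-Coda >> Ident-IO(voice)  ->  bet
print(optimal("bed", candidates, [voiced_coda, ident_voice]))
# English-type ranking: Ident-IO(voice) >> *Voiced-Coda  ->  bed
print(optimal("bed", candidates, [ident_voice, voiced_coda]))
```

Note that a *Voiced-Onset constraint would be expressible in exactly the same format; its non-existence has to be stipulated in the universal constraint inventory, which is just the point made above.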
I believe that none of these proposals are promising hypotheses about UG, and that they do not help explain cross-linguistic patterns, as I will argue in the next sub-section.

3.2 Cross-linguistic evidence does not tell us about the cognitive code (= UG)

That typological evidence cannot be used in building hypotheses about UG, contrary to the views summarized in §3.1, has already been argued in some detail by Newmeyer (1998b) (see also Newmeyer, this volume). Newmeyer's main arguments are: (i) Some robust typological generalizations, such as the correlation between verb-final order and wh-in-situ order, do not fall out from any proposal about UG; (ii) the D-structure of generative syntax is not a good predictor of word-order correlations; (iii) the predictions of the famous null-subject parameter have not held up to closer scrutiny (see also Newmeyer 1998a: 357–358); (iv) simpler grammars are not necessarily more common than more complex grammars, e.g. grammars with preposition-stranding; (v) typologically rare patterns are not in general acquired later than frequent patterns; (vi) the Greenbergian word-order correlations are best explained by a processing theory such as Hawkins's (1994) theory. Here I would like to add three more arguments that lead me to the same conclusion.

3.2.1 Universals as one-way implications

A principles-and-parameters model is good at explaining two-way implications. If there is a head parameter, as suggested by Chomsky and Lasnik (1993) (see §3.1.1), it predicts that there should be exactly two types of languages: head-final languages (like Japanese) and head-initial languages (like English). Thus, Greenberg's universal 2 (prepositional languages have noun–possessor order, postpositional languages have possessor–noun order) can easily be made to follow from categorial uniformity (i.e. X-bar theory) and a head parameter. However, in practice the observed cross-linguistic generalizations are mostly one-way implications, as illustrated by the examples in (3).

(3) Some typical cross-linguistic generalizations
a. If a language has VO order, the relative clause follows the head noun (but not the converse: if a language has OV order, the relative clause precedes the head noun) (Dryer 1991: 455).
b. If a language has case-marking for inanimate direct-object NPs, it also has case-marking for animate direct-object NPs (but not the converse) (Comrie 1989: Ch. 6).
c. If a language has a plural form for inanimate nouns, it also has a plural form for animate nouns (but not the converse) (Corbett 2000).
d. If a language uses a reflexive pronoun with typically self-directed actions ('wash (oneself)', 'defend oneself'), then it also uses a reflexive pronoun with typically other-directed actions ('attack', 'criticize') (but not the converse) (König and Siemund 1999).
e. If a wh-phrase can be extracted from a subordinate clause, then it can also be extracted from a verb phrase (but not the converse) (Hawkins 1999: 263).
f. If a language has a syllable-final voicing contrast, then it has a syllable-initial voicing contrast (but not the converse) (Kager 1999: 40–43).
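Stated schematically, a one-way implication is a claim about which feature combinations may co-occur in a sample of languages. The following toy Python sketch (the "languages" and their feature values are invented placeholders for illustration, not typological data) shows the asymmetry for a pattern like (3f):

```python
# Hypothetical feature table: does a language contrast voicing
# syllable-finally (P) and syllable-initially (Q)?
languages = {
    "L1": {"final_contrast": True,  "initial_contrast": True},   # both
    "L2": {"final_contrast": False, "initial_contrast": True},   # Q only
    "L3": {"final_contrast": False, "initial_contrast": False},  # neither
    # A language with final_contrast=True but initial_contrast=False
    # would falsify the implication; the claim is that none is attested.
}

def holds_one_way(sample, p, q):
    """True iff every language with property p also has property q."""
    return all(feats[q] for feats in sample.values() if feats[p])

print(holds_one_way(languages, "final_contrast", "initial_contrast"))  # True
print(holds_one_way(languages, "initial_contrast", "final_contrast"))  # False
```

The second call failing while the first succeeds is exactly what distinguishes a one-way implication from the two-way implications that a parameter model predicts.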
The fact that robustly attested universals are mostly of the one-way implicational type means that they can also be conceived of in terms of universal preferences (Vennemann 1983): Postnominal relative clauses are universally preferred, animate plurals are preferred, reflexive pronouns are preferred for typically other-directed actions, syllable-initial voicing is preferred, and so on. In a model that just consists of rigid principles and variable parameters, such patterns cannot be accounted for. And conversely, such patterns do not yield evidence for principles of UG, unless one adopts a very different model of UG, in which the principles are not rigid but are themselves conceived of as preferences, as in much work under the heading of Optimality Theory. As was mentioned in §3.1.3, the constraint *Voiced-Coda explains the one-way implication in (3f) if no corresponding constraint *Voiced-Onset exists. Similarly, one might propose the constraints *RelNoun, *InanimAcc, *InanimPlural, *SelfdirectedReflpron, and *ClausalTrace to account for (3a–e), and in fact the OT literature shows many markedness constraints of this type. According to McCarthy (2002: 15), "the real primary evidence for markedness constraints is the correctness of the typologies they predict". Thus, this mode of explanation of observed universals is even more blatantly circular than the Chomskyan principles-and-parameters model, where there are usually other considerations apart from cross-linguistic distributions that also play a major role in positing principles of UG.

Moreover, the resulting model of the cognitive code contains hundreds or thousands of highly specific innate principles (= constraints), many of which have a fairly obvious explanation in terms of general constraints on language use. To some extent, the OT literature itself mentions these functional explanations and cites them in support of the assumed constraints. For instance, Kager (1999: 5) states that "phonological markedness is ultimately grounded in factors outside of the grammatical system proper", and Aissen (2003) relates her OT account of differential object marking to economy and iconicity (see also Haspelmath 1999b: 183–184). To the extent that good system-external explanations for the constraints are available, the standard OT model is weakened. An OT model with innate markedness constraints may be attractive from a narrow linguistic point of view because it allows language-particular description and cross-linguistic explanation with the same set of tools, but from a broader cognitive perspective it is very implausible.
It is not just functionally oriented linguists who have pointed out that cross-linguistic generalizations of the type in (3) are best explained functionally and do not provide evidence for UG. Hale and Reiss (2000: 162), in a very antifunctionalist paper, write (for phonology):

[M]any of the so-called phonological universals (often discussed under the rubric of markedness) are in fact epiphenomena deriving from the interaction of extragrammatical factors like acoustic salience and the nature of language change… Phonology [i.e. a theory of UG in this domain, M. H.] is not and should not be grounded in phonetics since the facts that phonetic grounding is meant to explain can be derived without reference to phonology.
3.2.2 Universals as preference scales

Many implicational universals of the type in (3) are just special cases of larger implicational scales (cf. Croft 2003: §5.1). Some examples are listed in (4).

(4) a. Constituent order for languages with prepositions: RelN > GenN > AdjN > DemN (Hawkins 1983: 75ff.).
b. Case-marking on direct objects: inanimate > animal > human common NP > proper NP > 3rd person pronoun > 1st/2nd person pronoun (Silverstein 1976; Comrie 1989: Ch. 6).
c. Plural marking on nouns: mass noun > discrete inanimate > animal > human > kin term > pronoun (Smith-Stark 1974; Corbett 2000).
d. Extraction site for wh-movement: S in NP > S > VP (Hawkins 1999: 263).
e. Voicing contrast: word-final > syllable-final > syllable-initial.
Scalar phenomena immediately suggest an explanation in terms of gradient extralinguistic concepts like economy, frequency, perceptual/articulatory difficulty, and so on. Thus, the scale in (4e) is presumably due to the increasing difficulty of maintaining a voice contrast in syllable-initial position (where it is easiest), syllable-final position, and word-final position. Similarly, the further left a direct object is on the scale in (4b), the easier it is to predict its object role, so that case-marking is increasingly redundant. And as Hawkins (1994) shows, the shorter a prenominal constituent is in a prepositional language, the less processing difficulty it causes, which explains the implicational scale in (4a).

These scalar universals have always been felt to be irrelevant to principles-and-parameters models of UG, but more recently they have been discussed in the context of Optimality Theory. Thus, Aissen (2003) proposes a fixed constraint hierarchy ("*Obj/Human » *Obj/Animate » *Obj/Inanimate") that allows the implicational scale in (4b) to fall out from her model of UG. But as in the case of the constraints mentioned in §3.2.1, this constraint hierarchy is very implausible as a component of UG. Attributing it to UG is apparently motivated exclusively by the desire to make as many phenomena as possible fall under the scope of UG.5
3.2.3 Universals typically have exceptions

According to Chomsky (1988: 62), "the principles of universal grammar are exceptionless", but we know that many of the observed cross-linguistic generalizations have exceptions. Greenberg (1963) was aware of exceptions to some of his universals, and he weakened his statements by the qualification "almost always", or "with overwhelmingly greater than chance frequency". In the meantime, further research has uncovered exceptions to most of the universals that for Greenberg were still exceptionless, and none of the generalizations in (3) or (4) is likely to be exceptionless. So should we say that universals with exceptions are ignored, and only those relatively few universals for which no exceptions have been found are taken as significant, providing evidence for Universal Grammar? This would not be wise, because, as noted by Comrie (1989: 20), we will never know whether we simply have not discovered the exceptions yet. Some generalizations have many exceptions (perhaps 20% of the cases), others have few (say, 2–3%), and yet others have very few (say, 0.01%), and so on (see also Dryer 1997). Thus, on purely statistical grounds, there is every reason to believe that there are also generalizations with exceptions that we could only observe if there existed six billion languages in the world. The same conclusion is drawn by the antifunctionalists Hale and Reiss (2000: 162), for phonology:

It is not surprising that even among their proponents, markedness "universals" are usually stated as "tendencies". If our goal as generative linguists is to define the set of computationally possible human grammars [i.e. those allowed by UG, M. H.], "universal tendencies" are irrelevant to that enterprise.
This echoes Newmeyer's (1998b: 191) conclusion, for the domain of syntax:

The task of explaining the most robust typological generalizations, the Greenbergian correlations, falls not to UG, but to the theory of language processing. In short, it is the task of grammatical theory [= UG theory, M. H.] to characterize the notion possible human language, but not the notion probable human language. In this sense, then, typology is indeed irrelevant to grammatical theory.
The generative linguists Hale and Reiss and Newmeyer are thus in agreement that the role of the generative enterprise in accounting for the limits of linguistic diversity is much smaller than is typically assumed. Wunderlich (this volume, §3) concurs: "UG is less restrictive than is often thought". In practice, language structure is primarily constrained by functional factors, not by Universal Grammar.

3.3 Possible languages and possible organisms

Clearly, in the vast space of possible human languages, only a small part is populated by actual languages — that part which contains languages that are usable. There is little doubt that the set of computationally possible languages
includes languages with only monosyllabic roots and only disyllabic affixes; languages with accusative case-marking of only indefinite inanimate objects; languages with eight labial and sixteen dorsal, but no coronal consonants; and so on. Such languages could be acquired and used, but they would not be very user-friendly, and they would undergo change very soon if they were created artificially in some kind of experiment.

This is completely analogous to the vast space of possible organisms. Presumably, the structure of the genetic code readily allows for three-legged mammals, trees that shed their leaves in the spring, or herbivorous spiders. The reason why we don't find such things among the existing species is well known: they would have no chance of surviving.6 We do not even need experiments involving genetic engineering to be sure of this, because nature itself occasionally creates monsters whose sad fate we can observe.

Of course, there are presumably also some restrictions on possible organisms which are due to the genetic code, and likewise, it seems plausible that there are some restrictions on possible grammars which are due to the cognitive code. For instance, it could be that no language can have a rule that inserts an affix after the third segment of a word ("grammars don't count"), or a rule that requires certain constructions to be pronounced faster than others ("grammars make use of pitch and intensity, but not speed of pronunciation"). Such rules may simply be unlearnable in an absolute sense. But the comparative study of attested languages does not help us much to find restrictions of this kind if they also have a plausible functional explanation. More generally, it does not help us much in identifying the cognitive code for language.7

Analogously, the comparative study of plant and animal species does not help us in identifying the genetic code in biology. Comparative botany and zoology were sophisticated, well-developed disciplines before genetics even began to exist. And Darwinian evolutionary theory was originally built on comparative botany and zoology, not on genetics. The discoveries of 20th-century genetics mostly confirmed what evolutionary biology had discovered in the 19th century. Similarly, once we know more about the cognitive code for language, I expect it to confirm what functionalist linguists have discovered on the basis of comparative linguistics.

So I conclude that the empirical study of cross-linguistic similarities does not help us in identifying the cognitive code that underlies our cognitive abilities to acquire and use language. The cognitive code evidently allows vastly more than is actually attested, and cross-linguistic generalizations can be explained by general constraints on language use.8 From this perspective, it is odd to refer to UG as a "bottleneck" through which innovations in language use must pass (Wunderlich, this volume, §1). The real bottleneck is language use itself (cf. Kirby 1999: 36, Kirby et al., this volume, §4).
3.4 What kind of evidence can give us insights into the cognitive code?

There are of course many other ways in which one could try to get insights into the nature of the cognitive code. The most direct way would be to study the neurons and read the cognitive code off of them directly, somewhat like modern genetics can look at chromosomes at the molecular level, sequence DNA strings and identify the genes on them. But neurology is apparently much more difficult than molecular genetics, so this direct method does not give detailed results yet. The study of the genetic code did not begin at the molecular level with DNA sequencing, but at the level of the organism, using a range of simple but ingenious experiments with closely related organisms (Gregor Mendel's experiments with the progeny of different varieties of pea plants, which led him to formulate the first theory of heredity). I would like to suggest that unusual experiments of this kind hold some promise for the study of the cognitive code. However, while ordinary psycholinguistic or neurolinguistic experiments with mature speakers may give us insights about their language-particular mental representations, they do not tell us much about the cognitive code in general. What we really need to test the outer limits of UG is experiments on the acquisition of very unlikely or (apparently) impossible languages. For ethical and practical reasons, it is virtually impossible to create an artificial language, use it in the environment of a young child and see whether the child acquires it. And yet this is the kind of experiment that would give the clearest results. So it is worth looking at situations that approach this "ideal" experimental setup to some extent:

i. The natural acquisition of an artificial language like Esperanto: There has long been a sizable community of Esperanto speakers, and some have acquired Esperanto natively because it is used as the main language at home (see Versteegh 1993). To the extent that Esperanto has structural properties that are not found in any natural human languages, we can study the language of Esperanto native speakers and see whether these speakers have problems in acquiring them. (I do not know whether such studies have been carried out.) Similarly, it may be possible to derive insights from languages which were once only used in written form and acquired through instruction in the classroom but then became spoken vernaculars, as happened most famously with Modern Hebrew. See Weiß (this volume) for related discussion.

ii. Artificial acquisition experiments with adult subjects: Bybee and Newman (1995) created fragments of artificial languages and exposed adults to them, letting them "acquire" these languages as second languages. They found that a language with systematic stem changes is not more difficult to acquire than a language with affixation, and they claimed that the comparative rarity of stem changes has to do with the likelihood of certain diachronic changes, not with their synchronically
"has-r42"> "has-r17"> "has-r24"> "has-r14">
Does linguistic explanation presuppose linguistic description?
dispreferred status. See also Smith et al. (1993) for somewhat more sophisticated experiments with a single highly skilled speaker.

iii. Language games (also known as "ludlings"): These are special speech registers involving rule-governed phonological manipulations of ordinary speech, such as, for example, "insert a k into every syllable", or "say every word backwards". They are often used fluently by speakers, and they show that the cognitive possibilities are apparently much greater than the patterns that are attested in ordinary languages (see also the quotation from Anderson 1999 in note 8). An example is the Indonesian language game Warasa, which in each word replaces the first onset of the final foot and anything preceding it with war. Compare the following sentence, recorded from spontaneous speech (Gil 2002; the second line gives the ordinary Indonesian equivalents):

(5) Warengak warabu warengkau warumbuk waranges ang.
    (Bengak   labu   engkau    tumbuk   n-anges  ang.)
    lie       lie    you       hit      AG-cry   FUT
    [Conversation amongst friends deteriorates into argument]
    'Liar, I'm going to beat you until you cry.'
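Under one simplifying assumption — that the final foot spans the last two syllable nuclei, with vowel sequences counted as a single nucleus — the Warasa rule can be stated as a small string operation. The following Python sketch is my rough approximation of the rule as described above, not Gil's analysis; it reproduces the forms in (5):

```python
import re

def warasa(word):
    """Replace the first onset of the final (last-two-nuclei) foot,
    and everything preceding it, with 'war'."""
    nuclei = [m.start() for m in re.finditer(r"[aeiou]+", word)]
    if len(nuclei) < 2:       # monosyllables like the FUT marker stay intact
        return word
    foot_start = nuclei[-2]   # vowel of the penultimate syllable
    return "war" + word[foot_start:]

# 'n-anges' is written here without the morpheme break.
for w in ["bengak", "labu", "engkau", "tumbuk", "nanges", "ang"]:
    print(w, "->", warasa(w))
# bengak -> warengak, labu -> warabu, engkau -> warengkau,
# tumbuk -> warumbuk, nanges -> waranges, ang -> ang
```

That such a prosodically conditioned substitution can be applied fluently, word by word, in spontaneous speech is the point at issue: the operation is perfectly learnable even though no ordinary language uses it.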
No known ordinary language has processes of this kind (at least not applying to every word in an utterance), but the existence of such language games shows that this is not because the cognitive code does not allow us to learn and use them.

Another conceivable source of insights into the cognitive code would be unlearnable patterns in adult languages. It is often claimed that some patterns cannot be learned on the basis of positive evidence ("poverty of the stimulus", see the discussion in Fischer, this volume), but we still know very little about what can and what cannot be acquired on the basis of positive evidence. As Hawkins (1988: 7–8) pointed out, there are also language-particular facts that seem difficult to acquire without negative evidence (e.g. the English contrast between *Harry is possible to come and Harry is likely to come). Culicover (1999), too, stresses the large amount of language-particular idiosyncrasies that every child acquires effortlessly and points out that a highly general mechanism such as the Chomskyan UG does not seem to be of much help here.

Be that as it may, all these diverse approaches to understanding the cognitive code do not depend on a thorough, systematic description of languages (recall that this was my first claim of §1). Rather, the nature of UG needs to be studied on the basis of other kinds of system-external evidence (which of course does presuppose some kind of superficial description, but not the sort of thorough, systematic description that linguists typically spend much effort on).
4. Functional explanation does not presuppose cognitively realistic description

4.1 Phenomenological descriptions are sufficient for functional explanation

In this section I justify my claim of §1 that there is another respect in which linguistic explanation does not presuppose linguistic description: Functional explanations of language universals (of the kind illustrated by works like Haiman 1983; Comrie 1989; Hawkins 1994, 1999; Haspelmath 1999a; Croft 2003) do not presuppose cognitively realistic descriptions of languages, but can make do with phenomenological descriptions (using basic linguistic theory, cf. Dryer 2006). This is of course what we find in practice: Functional-typological linguists draw their data from reference grammars and generalize over them to formulate universals, which are then explained with reference to grammar-external factors. This is similar to adaptive explanations in biology, which do not presuppose knowledge of the genome of a species, but can be based on phenomenological descriptions of organisms and their habitat.

This approach has been criticized by generative linguists on the grounds that only detailed analyses of particular languages can meaningfully be used in cross-linguistic comparison. For example, Coopmans (1983) (in his review of Comrie 1981) maintains that observations about surface word order cannot be used to argue against a particular X-bar theory, because only specific, thorough grammatical analyses that are incompatible with a proposal about UG can be used to refute such a proposal. Newmeyer (1998a) goes even further in demanding that functional explanations should be based on "formal analysis" even if they are not presented as being incompatible with hypotheses about UG:9

[F]ormal analysis of language is a logical and temporal prerequisite to language typology. That is, if one's goal is to describe and explain the typological distribution of linguistic elements, then one's first task should be to develop a formal theory. (Newmeyer 1998a: 337)
I would agree with Newmeyer if he accepted phenomenological descriptions (of the kind typically found in reference grammars) as constituting “formal analyses” in his sense. They are surely “formal” in that every satisfactory reference grammar will make use of grammatical notions such as affix, case, agreement, valence, indirect object; virtually everybody agrees that grammars cannot be described using exclusively semantic or pragmatic notions (like agent, focus, coreference, recipient), i.e. in practice virtually everybody assumes the “autonomy of syntax” in Newmeyer’s (1998a:25–55) sense.10 The point that I want to emphasize here is that for the purposes of discovering
"has-r9"> "has-r29"> "has-r28"> "has-r37">
Does linguistic explanation presuppose linguistic description?
empirical universals (and explaining them in functional terms), it is sufficient to have phenomenological descriptions that are agnostic about what the speakers' mental patterns are. We do not need "cognitive" or "generative" grammars that are "descriptively adequate". "Observational adequacy" is sufficient. In other words, a descriptive grammar must contain all the information that a second-language learner (or perhaps a robot) would need to learn to speak the language correctly, but it need not be a model of the knowledge of the native speaker. Thus, most of the issues that have divided the different descriptive frameworks of formal linguistics and that have been at the center of attention for many linguists are simply irrelevant for functional explanations. In the next subsection, we will see a few examples illustrating this general point.

4.2 The irrelevance of descriptive frameworks for functional explanation

4.2.1 Final devoicing

This was discussed in §3.1.3 and §3.2.1. An example comes from Dutch, where we have alternations like bedden [bedə] 'beds' vs. bed [bet] 'bed'. Similar alternations are widespread in the world's languages (cf. Keating et al. 1983). The functional explanation for this presumably refers to the phonetic difficulty of maintaining voicing distinctions in final position. This explanation is independent of the type of description:

– whether we assume an abstract underlying form /bed/ that is
  – either transformed to a surface form by applying a sequence of rules (of the type [+obstr] → [−voice] / __$), as in Chomsky and Halle (1968),
  – or used as the input for the generation of candidates, from which the optimal output form is selected, as in Optimality Theory (Kager 1999; McCarthy 2002), or
– whether we assume no abstract underlying form, so that all alternating stems have to be listed separately.

4.2.2 Inflection and derivation

This was discussed in §3.1.2. The basic observation is that derivational affixes always come between the root and inflectional affixes when both inflection and derivation occur on the same side of the root. A functional explanation for this generalization appeals to the meaning differences between inflectional and derivational affixes: There is "a 'diagrammatic' relation between the meanings and their expression" (Bybee 1985: 35), such that the "closer" (more relevant) the meaning of a grammatical morpheme is to the meaning of the lexeme, the closer the expression unit will occur to the stem. This explanation is independent of the type of description:

– whether the inflectional and the derivational components are strictly separate (as in Anderson 1992), or
– whether inflection and derivation are assigned to the same component obeying the same kinds of general principles (as in Lieber 1992).

4.2.3 Differential case-marking

This was discussed in §3.2.1–2 (see 3b and 4b). It basically says that case-marking on direct objects is the more likely, the higher the object referent is on the animacy scale. A functional explanation for this is that the more animate a referent is, the less likely it is to occur as a direct object, and it is particularly the unlikely grammatical constellations that need overt coding (cf. Comrie 1989: Ch. 6). This explanation is independent of the type of description:

– whether object-case marking is achieved by a set of separate rules as in Relational Grammar (cf. Blake 1990), or
– whether object-case marking is achieved by specifier–head agreement with an Agreement node, as in some versions of the Chomskyan framework, or
– whether a set of Optimality-Theoretic constraints is employed (as in Aissen 2003).

4.2.4 Extraction of interrogative pronouns

This was mentioned in §3.2.1–2 (see 3e and 4d). The relevant generalization here is that the more deeply embedded the gap is, the less likely the extraction is (S in NP > S > VP; Hawkins 1999: 263). A functional explanation for this is that constructions with more deeply embedded gaps have larger "Filler–Gap Domains" and are hence more difficult to process (Hawkins 1999). This explanation is independent of the type of description:

– whether extraction constructions are described by an underlying structure with the interrogative pronoun in its expected position, which is transformed by a movement operation (restricted by subjacency and bounding nodes), or
– whether the interrogative pronoun is base-generated in initial position and related to the gap by a more elaborate feature system (as in Gazdar et al. 1985).

4.2.5 Word-order preferences

There is a very strong preference for agents to precede patients in simple transitive clauses (cf. Greenberg 1963: 77, Universal 1). A functional explanation of this is that agents are typically thematic, and more thematic information tends to precede less thematic information (see Tomlin 1986). This explanation is independent of the type of description:

– whether constituency or dependency is assumed to be the major organizing principle of syntax, and
– if constituency is assumed, whether a completely flat structure, lacking a VP, is assumed ([S NPag V NPpat]), or
– whether a clause structure with a VP is assumed ([S NPag [VP V NPpat]]).
4.2.6 Article–possessor complementarity

In Haspelmath (1999a), I discussed the phenomenon that languages sometimes require the definite article to be omitted in the presence of a possessor (cf. English *Robert's the bag / *the Robert's bag). My functional explanation of this was that the definite article is somewhat redundant in this construction, because possessed noun phrases are significantly more likely to be definite than non-possessed noun phrases. This explanation is independent of the type of description:

– whether a determiner position is assumed that can be filled only once, either by the definite article or by the possessor (as in Bloomfield 1933: 203; Givón 1993: 255; McCawley 1998: 400, among many others, for English), or
– whether it is not assumed that there is such a determiner position, so that the grammar has to include a separate statement to the effect that the definite article must be omitted from possessed noun phrases (as in pre-structuralist descriptions, as well as Abney 1987: 271, and much subsequent work in the Chomskyan tradition).
Thus, as Dryer (1999: §2) points out (note that Dryer uses the terms "descriptive framework" and "metalanguage" interchangeably):

[W]e do indeed need to describe languages, and describing them entails having some sort of metalanguage, but it does not particularly matter what the metalanguage is. There may be practical considerations, such as choosing a mode of description that is user-friendly, but on the whole the choice of metalanguage is devoid of theoretical implications.
A reviewer observes that it should not be a criterion for the scientific value of an approach that it avoids making choices in the cases of §4.2.1–6, especially since the competing descriptions do not all make the same predictions. The latter observation is probably correct, though (as the reviewer also recognizes) the full range of predictions of a particular description is rarely explored. Typically linguists argue for a particular description primarily on conceptual grounds (see §4.4 below), not because it accounts better for all the data. This introduces a strong element of subjectivity into linguistic description, and for this reason I have to disagree with the reviewer: It is indeed a sign of the scientific value of an approach if it avoids subjective decisions and stays out of debates that are hardly resolvable by empirical considerations.
In the preceding two subsections I have contrasted my approach mostly with the Chomskyan approach, but of course many functional linguists, too, are claiming that their descriptions are cognitively real (e.g. work in the cognitive grammar tradition of Langacker 1987: 91). What I have said about generative approaches mostly also applies to these functionalist approaches: Their descriptive proposals presuppose a dangerous number of subjective decisions, and it is a virtue of the approach favored here that it depends neither on particular generative nor on particular functionalist descriptive frameworks.

4.3 Cross-linguistic generalizations are not premature

We saw in §4.1 that Newmeyer (1998a) asserts the need for "formal analysis" to precede cross-linguistic generalization and functional explanation. And clearly he is not content with phenomenological descriptions of the sort found in reference grammars:

[T]he only question is how much formal analysis is a prerequisite [to functional analysis]. I will suggest that the answer is a great deal more than many functionally oriented linguists would acknowledge. To read the literature of the functional-typological approach, one gets the impression that the task of identifying the grammatical elements in a particular language is considered to be fairly trivial. (Newmeyer 1998a: 337–338)
In the last sentence of this passage, Newmeyer seems to confuse two things: on the one hand, the definition of language-particular grammatical classes, which many reference grammars devote considerable attention to (and which by contrast is typically considered trivial by generative linguists), and on the other hand, the definition of categories for cross-linguistic comparison. The latter must be based on meaning (cf. Croft 2003: 6–12), so the detailed formal analysis found in reference grammars is not directly relevant to it. For instance, distinguishing adjectives and verbs in a particular language may require detailed discussion of mood forms, relativization strategies and comparative constructions, but a cross-linguistic study of (say) property word syntax only needs a ("fairly trivial") semantic characterization of its subject matter. Most of the analytical effort in generative grammar is in fact not devoted to the identification of language-particular categories, but to the identification of categories attributed to Universal Grammar. And this is, of course, extremely difficult:

Assigning category membership is often no easy task… Is Inflection the head of the category Sentence, thus transforming the latter into a[n] Inflection Phrase? … Is every Noun Phrase dominated by a Determiner Phrase? … There are no settled answers to these questions. Given the fact that we are unsure precisely what the inventory of categories for any language is, it is clearly premature to make sweeping claims about their semantic or discourse roots. Yet much functionalist-based typological work does just that. (Newmeyer 1998a: 338)
"has-r7">
Does linguistic explanation presuppose linguistic description?
The idea that Infl is the head of IP (= S′), or that noun phrases are really DPs, did not come from the study of particular languages, but from certain speculative considerations about what the categories of UG might be.11 As we saw in §3, it is clearly premature to make sweeping claims like these about UG, so it is not surprising that consensus about such matters is generally reached only through authority. But it is not premature to provide phenomenological descriptions of particular languages, and to formulate cross-linguistic generalizations on their basis.

4.4 What kind of evidence can be used for cognitively realistic descriptions?

It is fortunate that we do not need cognitively realistic descriptions for functional explanations, because such descriptions are extremely difficult to come by. How would we choose between two competing descriptions of a phenomenon for an individual language? To take a concrete example: How do we choose between the determiner-position analysis of English article–possessor complementarity and the alternative analysis that operates without a determiner concept? Both descriptions are "observationally adequate", but which one is more "descriptively adequate", i.e. which one reflects better the generalizations that speakers make? How do we know whether English speakers make use of a determiner concept?12
The second principle (favoring conformity with UG) is of little help because, as we saw in §3, cross-linguistic description (or indeed detailed language-particular description) does not help us in discovering UG, and the other sources of evidence have not yielded much information yet, so that we know almost nothing about UG at this point. It seems that as in the case of the search for UG (cf. §4), we have to look beyond the evidence provided by language description, and consider evidence from psycholinguistics, neurolinguistics, and language change (i.e. “external evidence”, or “substantive evidence”). The relevance of evidence from these sources for cognitive grammars and the cognitive code has often been acknowledged by linguists, but what they have typically had in mind is that external evidence can be used in addition to evidence from language description (and in practice, the evidence from language description has played a much more significant role). What I am saying here is that external evidence is the only type of evidence that can give us some hints about how to choose between two different observationally adequate descriptions.
5. Conclusion

To summarize, I have made the following claims in this paper:

– cross-linguistic data cannot be used to argue for (or against) a model of UG;
– conversely, a model of UG cannot be invoked to explain cross-linguistic generalizations;
– a model of the cognitive code requires evidence from domains other than language description;
– cross-linguistic generalizations are best explained by system-external constraints on language use, i.e. functionally;
– cognitively realistic description of individual languages is not a necessary prerequisite for functional explanation of universals;
– a model of a speaker’s knowledge of a language cannot be based on a description of the language but requires evidence from domains other than language description.
Thus, we see that pure language description can only give us phenomenological descriptions and phenomenological universals, and that it does not help us much with cognitively realistic description and the cognitive code. This may seem like a somewhat pessimistic conclusion, because it reduces the role of “pure” linguistics in addressing the theoretical goals of Table 1 above. However, “pure” linguistics will not become unemployed anytime soon. Even if half the world’s languages become extinct by the end of this century, there will still be three thousand
languages left to be described, and plenty of cross-linguistic generalizations (and their functional explanations) remain to be discovered or tested. And those who mostly care about what is in our head before language acquisition (i.e. the cognitive code, or Universal Grammar) or after language acquisition (i.e. the cognitive grammar) will have plenty of other sources of evidence to tap.
Notes

*I am grateful to Martina Penke, Anette Rosenbach, and Helmut Weiß for detailed comments on an earlier version of this paper, as well as to an audience at the Max Planck Institute for Evolutionary Anthropology.

1. One often encounters an opposition “theoretical vs. descriptive linguistics”, but this makes little sense, as any description presupposes some kind of descriptive framework or theory (cf. Dryer 2006 for recent discussion). Of course, some linguistic work primarily aims to increase our knowledge about particular languages, whereas other work focuses on increasing our knowledge about language in general or about the best descriptive frameworks (or descriptive theories). The latter type of work is best called “general linguistics” (Lyons 1981: 34).

2. Thus, my “phenomenological description” seems to correspond to Chomsky’s “observational adequacy”, while my “cognitively realistic description” seems to correspond to Chomsky’s “descriptive adequacy”.

3. These two differences which we find in practice are not definitional, however. Descriptions which are not intended to be cognitively realistic may still include highly abstract concepts and statements (as many descriptions in the American structuralist tradition), and descriptions which are intended to be cognitively realistic may favor low-level generalizations and concreteness (e.g. Bybee 1985).

4. True, just as biologists can come home from a field trip with a specimen of a new species, field linguists can collect specimens of speech and deposit tapes and transcriptions in a linguistic archive. But of course the ultimate goal is the description of the type, not the specimen token, and in linguistics the “type” (i.e. the grammar) cannot easily be reconstructed on the basis of specimens of speech, especially if they consist of only a few hours (or less) of speech. Complete grammatical description also requires experimentation (i.e. elicitation).

5. Aissen does not actually say that she conceives of her constraints and constraint hierarchies as being part of the innate cognitive code; and by prominently invoking the notions of economy and iconicity, she invites the inference that she thinks of her model as a kind of formalization of the functional explanations of Silverstein (1976) and Comrie (1989), not as a contribution to the theory of UG. If that is the right interpretation, then Aissen’s work is irrelevant to the present concerns. A recent paper that makes use of similar concepts but adopts an explicitly functionalist point of view, minimizing the role of innate factors, is Jäger (2003).

6. Not surprisingly, linguists of Chomskyan persuasion often point out that even in biology, there may be other, nonadaptive factors that explain certain properties of organisms, such as Thompson’s (1961) principles of biological forms (e.g. Lightfoot 1999: 237; see Newmeyer 1998c for critical discussion). In this line of thinking, a reviewer suggests that the non-existence of three-legged mammals might be due to a general symmetry preference. One should not dismiss
such a possibility out of hand, but it is hardly an accident that almost all moving organisms show symmetrical bodies, while stationary organisms need not be symmetrical (flowers often have three, five or seven petals). Apparently symmetrical bodies make movement easier (note also that cars and airplanes are usually symmetrical, whereas houses are often asymmetrical).

7. Of course, there is one (rather trivial) sense in which cross-linguistic research gives us information about the cognitive code: If we find a language with a certain surprising property (e.g. a manner adverb agreeing in gender with the object, as in Tsakhur), then we know that the cognitive code must allow such a language. Data from language description can thus give us a lower bound on what the cognitive code can do, but not an upper bound.

8. Here is another quotation from a well-known generative linguist who agrees with this conclusion:

…the scope of the language faculty cannot be derived even from an exhaustive enumeration of the properties of existing languages, because these contingent facts result from the interaction of the language faculty with a variety of other factors, including the mechanism of historical change. To see that what is natural cannot be limited to what occurs in nature, consider the range of systems we find in spontaneously developed language games, as surveyed by Bagemihl (1988)… …the underlying faculty is rather richer than we might have imagined even on the basis of the most comprehensive survey of actual, observable languages… …observations about preferences, tendencies, and which of a range of structural possibilities speakers will tend to use in a given situation are largely irrelevant to an understanding of what those possibilities are. (Anderson 1999: 121)

9. Note that since §3 argued that arguments for UG cannot be derived from typological evidence, it is implied that arguments against UG cannot be derived from typological evidence either.

10. Functionalists often describe their stance as differing from formalists in rejecting the Chomskyan autonomy thesis, but by this they generally mean that they reject the idea that language use should play no role in the explanation of language form, not that they reject autonomy in Newmeyer’s sense (i.e. that purely formal, non-semantic, non-functional concepts are systematically needed in the description of language form). See Haspelmath (2000) for more discussion of Newmeyer’s autonomy notion.

11. A reviewer objects that in the development of these ideas, data analysis and theoretical considerations went hand in hand. However, a close reading of Chomsky (1986) (the source of the “IP” idea) and Abney (1987) (the source of the “DP” idea) clearly shows that conceptual elegance was the main motivation, in particular the desire to fit all phrases into a uniform X-bar schema. In Abney’s (1987) crucial section II.3 (“The DP analysis”, pp. 54–88), the first twenty pages are entirely free of data, i.e. they consist of speculative considerations about what the categories of Universal Grammar might be.

12. Moreover, how do we know that all speakers of English make the same generalizations? It could be that for whatever reason, some speakers make use of a determiner concept in their mental grammars, while other speakers do not.

13. Abney (1987) proposed that the determiner occupies a head position. The possessor cannot be in this position because it can be phrasal (as in the girl’s bike), and heads cannot be phrasal.
References Abney, Stephen. 1987. The English noun phrase in its sentential aspect. Ph.D. Dissertation, MIT. Aissen, Judith. 2003. “Differential object marking: iconicity vs. economy”. Natural Language and Linguistic Theory 21.3: 435–483. Anderson, Stephen R. 1999. “A formalist’s reading of some functionalist work in syntax”. In: Darnell, Michael et al. (eds), Functionalism and formalism in linguistics, vol. 1 111–135. Amsterdam: Benjamins. Anderson, Stephen. R. 1992. A-morphous morphology. Cambridge: Cambridge University Press. Bagemihl, Bruce. 1988. Alternate phonologies and morphologies. Ph.D. Dissertation, University of British Columbia. Baker, Mark C. 2001. The atoms of language: the mind’s hidden rules of grammar. New York: Basic Books. Blake, Barry. 1990. Relational grammar. London: Routledge. Bloomfield, Leonard. 1933. Language. New York: Holt. Bybee, Joan L. 1985. Morphology: a study of the relation between meaning and form. Amsterdam: Benjamins. Bybee, Joan L.; and Newman, Jean E. 1995. “Are stem changes as natural as affixes?” Linguistics 33: 633–654. Chomsky, Noam A. 1970. “Remarks on nominalization”. In: Jacobs, Roderick A.; and Rosenbaum, Peter S. (eds), Readings in English transformational grammar 184–221. Waltham/MA: Ginn. Chomsky, Noam A. 1986. Barriers. Cambridge/MA: MIT Press. Chomsky, Noam A. 1988. Language and problems of knowledge: the Managua lectures. Cambridge, MA: MIT Press. Chomsky, Noam; and Halle, Morris. 1968. The sound pattern of English. New York: Harper and Row. Chomsky, Noam; and Lasnik, Howard. 1993. “The theory of principles and parameters”. In: Jacobs, Joachim et al. (eds), Syntax, vol. 1 506–569. Berlin: de Gruyter. Comrie, Bernard. 1981. Language universals and linguistic typology. Oxford: Blackwell. Comrie, Bernard. 1989. Language universals and linguistic typology. 2nd ed. Oxford: Blackwell. Coopmans, Peter. 1983. “Review of Language universals and linguistic typology by Bernard Comrie”. Journal of Linguistics 19: 455–474. Corbett, Greville. 2000. Number. Cambridge: Cambridge University Press. Croft, William. 2000. Explaining language change: an evolutionary approach. London: Longman. Croft, William. 2003. Typology and universals. 2nd ed. Cambridge: Cambridge University Press. Culicover, Peter W. 1999. Syntactic nuts. Oxford: Oxford University Press. Dryer, Matthew S. 1991. “SVO languages and the OV/VO typology”. Journal of Linguistics 27: 443–482. Dryer, Matthew. 1997. “Why statistical universals are better than absolute universals”. Chicago Linguistic Society 33: 123–145. Dryer, Matthew. 1999. “Functionalism and the metalanguage-theory confusion”. (downloadable from http://wings.buffalo.edu/linguistics/people/faculty/dryer/dryer/papers) Dryer, Matthew. 2006. “Descriptive theories, explanatory theories, and basic linguistic theory”. In: Ameka, Felix; Dench, Alan; and Evans, Nicholas (eds), Catching Language: The standing challenge of grammar writing 207–234. Berlin: Mouton de Gruyter. Gazdar, Gerald et al. 1985. Generalized phrase structure grammar. Oxford: Blackwell.
Gil, David. 2002. “Ludlings in malayic languages: an introduction”. In: Bambang, Kaswanti Purwo (ed.), PELBBA 15 (Pertemuan Linguistik Pusat Kajian Bahasa dan Budaya Atma Jaya: Kelima Belas) 125–180. Jakarta: Unika Atma Jaya. Givón, Talmy. 1993. English grammar, vols. 1–2. Amsterdam: John Benjamins. Greenberg, Joseph H. 1963. “Some universals of grammar with particular reference to the order of meaningful elements”. In: Greenberg, Joseph H. (ed.), Universals of grammar 73–113. Cambridge, Mass.: MIT Press. Haegeman, Liliane. 1994. Introduction to government and binding theory. Oxford: Blackwell. Haiman, John. 1983. “Iconic and economic motivation”. Language 59: 781–819. Hale, Mark; and Reiss, Charles. 2000. “‘Substance abuse’ and ‘dysfunctionalism’: current trends in phonology”. Linguistic Inquiry 31: 157–169. Haspelmath, Martin. 1999a. “Explaining article–possessor complementarity: economic motivation in noun phrase syntax”. Language 75.2: 227–243. Haspelmath, Martin. 1999b. “Optimality and diachronic adaptation”. Zeitschrift für Sprachwissenschaft 18.2: 180–205. Haspelmath, Martin. 2000. “Why can’t we talk to each other? A review article of [Newmeyer, Frederick. 1998. Language form and language function. Cambridge: MIT Press.]”. Lingua 110.4: 235–255. Hawkins, John A. 1983. Word order universals. New York: Academic Press. Hawkins, John A. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press. Hawkins, John A. 1999. “Processing complexity and filler-gap dependencies across grammars”. Language 75.2: 244–285. Hoekstra, Teun; and Kooij, Jan G. 1988. “The innateness hypothesis”. In: Hawkins, John A. (ed.), Explaining language universals 31–55. Oxford: Blackwell. Jackendoff, Ray. 1977. X-bar syntax: a study of phrase structure. Cambridge/MA: MIT Press. Jäger, Gerhard. 2003. “Learning constraint sub-hierarchies: the bidirectional gradual learning algorithm”. In: Blutner, R.; and Zeevat, Henk (eds), Pragmatics in optimality theory 251–287. Palgrave Macmillan. Kager, René. 1999. Optimality theory. Cambridge: Cambridge University Press. Keating, Patricia; Linker, Wendy; and Huffman, Marie. 1983. “Patterns in allophone distribution for voiced and voiceless stops”. Journal of Phonetics 11: 277–290. Kirby, Simon. 1999. Function, selection, and innateness: the emergence of language universals. Oxford: Oxford University Press. König, Ekkehard; and Siemund, Peter. 1999. “Intensifiers and reflexives: a typological perspective”. In: Frajzyngier, Zygmunt; and Curl, Traci S. (eds), Reflexives: forms and functions 41–74. Amsterdam: Benjamins [Typological Studies in Language 40]. Langacker, Ronald. 1987–1991. Foundations of cognitive grammar, vol. 1–2. Stanford: Stanford University Press. Lass, Roger. 1997. Historical linguistics and language change. Cambridge: Cambridge University Press. Lieber, Rochelle. 1992. Deconstructing morphology: word formation in syntactic theory. Chicago: University of Chicago Press. Lightfoot, David. 1999. The development of language: acquisition, change, and evolution. Oxford: Blackwell. Lyons, John. 1981. Language and linguistics: an introduction. Cambridge: Cambridge University Press. McCarthy, John J. 2002. A thematic guide to optimality theory. Cambridge: Cambridge University Press.
McCawley, James D. 1998. The syntactic phenomena of English. 2nd ed. Chicago: University of Chicago Press. Nettle, Daniel. 1999. Linguistic diversity. Oxford: Oxford University Press. Newmeyer, Frederick. 1998a. Language form and language function. Cambridge: MIT Press. Newmeyer, Frederick. 1998b. “The irrelevance of typology for grammatical theory”. Syntaxis 1: 161–197. Newmeyer, Frederick. 1998c. “On the supposed ‘counterfunctionality’ of Universal Grammar: some evolutionary implications”. In: Hurford, James R.; Studdert-Kennedy, Michael; and Knight, Chris (eds), Approaches to the evolution of language 305–319. Cambridge: Cambridge University Press. Silverstein, Michael. 1976. “Hierarchy of features and ergativity”. In: Dixon, R. M. W. (ed.), Grammatical categories in Australian languages 112–171. Canberra: Australian Institute of Aboriginal Studies. Smith, Neil V.; Tsimpli, Ianthi-Maria; and Ouhalla, Jamal. 1993. “Learning the impossible: the acquisition of possible and impossible languages by a polyglot savant”. Lingua 91: 279–347. Smith-Stark, T. Cedric. 1974. “The plurality split”. Chicago Linguistic Society 10: 657–661. Thompson, D’Arcy W. 1961. On growth and form. Cambridge: Cambridge University Press. Tomasello, Michael. 1995. “Language is not an instinct”. Cognitive Development 10: 131–156. Tomlin, Russell S. 1986. Basic word order: functional principles. London: Croom Helm. Vennemann, Theo. 1983. “Causality in language change: theories of linguistic preferences as a basis for linguistic explanations”. Folia Linguistica Historica 4: 5–26. Versteegh, Kees. 1993. “Esperanto as a first language: language acquisition with a restricted input”. Linguistics 31: 539–555.
Remarks on description and explanation in grammar
Commentary on Haspelmath

Judith Aissen and Joan Bresnan
University of California at Santa Cruz / Stanford University
Haspelmath’s paper assumes an opposition between two modes of grammatical description, the “phenomenological” and the “cognitively real”. The difference between the two “is that while descriptive [phenomenological, JA & JB] grammars claim to present a complete account of the grammatical regularities, only cognitive grammars claim to mirror the mental grammars internalized by speakers” (p. 555). For Haspelmath, descriptive grammars exemplify the first mode and generative grammars, particularly those in the Chomskyan school, exemplify the second. To illustrate the differences, Haspelmath offers the inflectional paradigm. A phenomenological description could consist simply of a list of all the forms, while a generative description (aiming to model speaker knowledge) would likely extract generalizations from the paradigm and state them as rules.

Haspelmath makes two points, both of which concern the limited relevance of cognitively real description to other goals of linguistics. The first is that cognitively real descriptions will not lead us closer to an understanding of the ‘cognitive code’, i.e. the mental structures which make language acquisition possible. The reason is that grammars are not a pure product of the cognitive code — they are also shaped by functional pressures. The second point is that cognitively real descriptions are not particularly relevant to the work of typologists: “for functional explanation, phenomenological description is sufficient” (p. 554).

In both respects then, the point seems to be that descriptions which aim to model speaker knowledge have no privileged status. They are on a par with descriptions which aim ‘only’ for “a complete account of the grammatical regularities” (p. 555). One is left with the impression that the distinction between the two modes of description is of little importance outside the sociopolitics of the field, and indeed one of the differences Haspelmath appeals to in distinguishing the two is the use of familiar vs. technical terminology.
"ais-r3">
110
Judith Aissen and Joan Bresnan
We question whether Haspelmath’s distinction is viable, or whether it simply perpetuates, with a kind of inverse valuation, Chomsky’s earlier distinction between “observational adequacy” and “descriptive adequacy”. In part we are led to this question through our own experience with syntactic description and the fact that we cannot locate our practice in either of the two descriptive modes that Haspelmath defines. We believe that there are many other linguists for whom the same is true.

Haspelmath’s discussion seems to presuppose that data and analytical frameworks (theories) exist on independent planes, and that the adequacy and completeness of a phenomenological description can be straightforwardly determined. But it is clear that data and theory are intertwined at every point. For example, Haspelmath writes (p. 568) that “every satisfactory reference grammar will make use of grammatical notions such as affix, case, agreement, valence, indirect object” as accepted phenomenological descriptions. Yet the concept of ‘indirect object’ is itself theory-laden and not at all applicable to every language (Dryer 1986, Siewierska 2003). For another example, Haspelmath writes (p. 570), “the more animate a referent is, the less likely it is that it will occur as a direct object”. Again, the concept of ‘direct object’, like the notion of ‘transitivity’, is far from a straightforward phenomenological category when the full range of semantic classes of verbs is taken into account — epistemic, emotive, motional, semi-transitive, verbs of creation, verbs with cognate objects, and the like, as well as the object-like arguments of unaccusative predicates. To quantify this claim and evaluate it empirically in a study of naturalistic language use would require careful consideration of the theoretical construction of ‘direct object’ status.

There is a real sense in which particular theories allow us to conceive of facts that we could not (or did not) conceive of earlier. If so, the question whether a description provides a “complete account of the grammatical regularities” (p. 555) must be answered differently at different points in time. Research in syntax in the last 30 years has tremendously enlarged the range of data known to be relevant to wh-movement, binding, ellipsis, ergativity, and many other topics. The idea that any phenomenological description suffices for typology never confronts the question of how one could know whether or not such a description were “correct” or complete. It may be possible to list the full set of forms in inflectional morphology but the same is scarcely true in syntax. One of the values of generative grammar has been the idea that the adequacy of analyses could be checked by formulating precise, formal descriptions which generate testable empirical predictions. Haspelmath’s belief that few linguists fully explore the empirical predictions of their theories (p. 571), whether true or not, is not relevant to the fact that such testing is crucial in deciding among genuinely alternative
descriptions. A “complete account of the grammatical regularities” (p. 555) may indeed be all a typologist needs, but achieving such an account and knowing it is no small matter.

We agree with Haspelmath that grammars are determined both by the ‘cognitive code’ and by functional pressures. In advance, one cannot know how to tease these forces apart, but clearly the expectation is that functional principles will play a role in grammar. One reason that formal syntacticians and semanticists have been drawn to Optimality Theory is because it provides a way to achieve a higher level of observational adequacy by the incorporation of functionally grounded principles. And it is perfectly possible to entertain the possibility that some functionally motivated constraints are universal without holding that they are innate.

Indeed, just as Haspelmath perpetuates with an inverse valuation the old Chomskyan distinction between observational and descriptive adequacy, so he perpetuates the later Chomskyan position that “description and explanation coincide in generative linguistics” (p. 560). Here Haspelmath presupposes the position that rich descriptive formalisms are explanatorily weak if they lack the deductive structure to constrain the space of possible descriptions to those that are universally attested. Haspelmath does not recognize the extent to which recent theoretical work has begun to undermine this position. Optimality Theory itself is quite subversive in this respect, which may be why it has attracted such a critical response within some generative circles (witness Hale and Reiss, cited by Haspelmath). The descriptive component in an Optimality-theoretic grammar is the generator GEN, which provides the space of possible descriptive analyses in accordance with the principle of “freedom of analysis”. In other words, GEN must ‘overgenerate’ by providing many possible but unexplanatory descriptive analyses. The explanatory burden in an OT grammar lies in the constraint set and its evaluation, which admit only a subset of the possibilities generated by GEN. In OT, therefore, description and explanation diverge because generation and evaluation diverge. It is precisely this divergence between description (generation) and explanation (constraint interaction under evaluation) that permits OT to be used to model and empirically test functional theories of grammar.

Of universals as preference scales, Haspelmath writes (p. 564), “Scalar phenomena immediately suggest an explanation in terms of gradient extralinguistic concepts like economy, frequency, perceptual/articulatory difficulty, and so on.” [emphasis added–JA & JB] Yet functionally oriented OT theorists like Paul Boersma, Edward Flemming, Donca Steriade, and Bruce Hayes have deeply questioned whether these are truly “extralinguistic” concepts in phonology. We have raised similar questions in syntax (Bresnan and Aissen 2002), and the same questions arise in semantics and pragmatics (Blutner
and Zeevat 2003). From our point of view it seems odd that Haspelmath’s critique continues to presuppose and perpetuate a narrowly Chomskyan position about the nature of grammar.
References

Blutner, Reinhard; and Zeevat, Henk (eds). 2003. Optimality theory and pragmatics. Houndmills, Basingstoke, Hampshire: Palgrave/Macmillan.
Bresnan, Joan; and Aissen, Judith. 2002. “Optimality and functionality: objections and refutations”. Natural Language & Linguistic Theory 20: 81–95.
Dryer, Matthew S. 1986. “Primary objects, secondary objects, and antidative”. Language 62: 808–845.
Siewierska, Anna. 2003. “Reduced pronominals and argument prominence”. In: Butt, Miriam; and King, Tracy Holloway (eds), Nominals: inside and out 119–150. Stanford: CSLI Publications.
Author’s response

Martin Haspelmath
Max-Planck-Institut für evolutionäre Anthropologie, Leipzig
Aissen & Bresnan make two basic points which I would like to take up here: (i) that the distinction between phenomenological and cognitively real description is not viable and not reflected in many linguists’ practice, and (ii) that description and explanation no longer coincide in a major recent trend in the generative tradition, Optimality Theory. 1. Is the distinction “phenomenological vs. cognitively real description” irrelevant? Aissen & Bresnan are perfectly right that “achieving [a complete account of the grammatical regularities of a language] and knowing it is no small matter” (p. 581), but what I question is that this is made easier by simultaneously asking what the speakers have in their heads. They also correctly observe that “research in syntax in the last 30 years has tremendously enlarged the range of data known to be relevant” (p. 581), but again it is unclear what role the mentalistic perspective had in this (we have clearly made much less progress in our understanding of the mental representation of these data). There is no doubt that we should formulate “precise, formal descriptions which generate testable empirical predictions” (p. 581), but it seems to me a historical accident that this goal has come to be associated by some with Chomskyan generative grammar. The main new claim of generative grammar was that linguists’ descriptions should aim to model speakers’ mental grammars, and that linguists should also try to describe the cognitive code. What my paper argues is that this mentalistic approach to description does not bring us much closer to understanding grammars or to the nature of the cognitive code. I agree with Aissen & Bresnan that “data and theory are intertwined at every point” (p. 581), but their example of the concepts ‘indirect object’ and ‘direct object’ does not show that we need to go beyond phenomenological description. In fact, these concepts illustrate very well the kinds of pseudo-problems that arise if one approaches a new language with the assumption that it will instantiate a rich set of innate and therefore universal categories. As Dryer (1997) has convincingly argued, grammatical relations are language-particular notions. A concept such as ‘the French indirect object’ is a straightforward phenomenological notion, but as
"has2-r3"> "has2-r5"> "has2-r7">
114
Martin Haspelmath
soon as we try to find it in English or Huichol, the problems start (and they never end). (See also Croft 2000 for parts of speech, and Croft 2001 for grammatical categories in general.) When we compare languages, we have no choice but to resort to semantic notions, because only these are applicable across languages (see §4.3 of my paper).1

The problem with generative linguistics is that 80 percent of its efforts are not directed at “formulating precise, formal descriptions”, but at trying to show that a particular vision of Universal Grammar allows the linguist to capture generalizations that had previously seemed accidental.2 (In most cases, we have no idea whether speakers actually make these generalizations that linguists detect.) It may be that many linguists cannot yet locate their practice in either of the two descriptive modes that I have defined, but one of my goals in writing this paper was to make linguists aware that insightful description does not necessarily imply mentalistic description and relating a particular language to a hypothesized Universal Grammar.

2. Is Optimality Theory (OT) immune to the critique of my paper?

I agree with Aissen & Bresnan in welcoming the trend in OT to incorporate functionally grounded principles. However, they are wrong in asserting that in OT, “description and explanation diverge because generation and evaluation diverge” (p. 582). OT replaces earlier constrained generation plus constrained rule application by unconstrained generation plus constraint-based evaluation, but both generation and evaluation are needed for the description of a language (until I see the constraint ranking of Pitjantjatjara, I do not know whether it has differential object marking or not). As in earlier generative models, the idea is that universal (and probably innate) language-internal constraints explain the observed grammatical universals, i.e. the descriptive framework also provides the explanation. This is stated clearly on the first page of the most authoritative book on OT:

One of the most compelling features of OT, in my view, is the way that it unites description of individual languages with explanation in language typology… the grammar of one language inevitably incorporates claims about the grammars of all languages. (McCarthy 2002: 1)
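To make concrete the point that the ranking itself is part of the description, here is a minimal sketch of OT-style evaluation, loosely in the spirit of Aissen’s (2003) differential-object-marking constraints. The constraint names and the candidate representations are simplified inventions for this sketch, not an analysis of any actual language:

    def star_zero_animate(cand):
        """Penalise leaving an animate object unmarked (an iconicity-style constraint)."""
        return 1 if cand["animate_object"] and not cand["case_marked"] else 0

    def star_struc(cand):
        """Penalise overt case marking (an economy-style constraint)."""
        return 1 if cand["case_marked"] else 0

    def evaluate(candidates, ranking):
        """Strict ranking amounts to lexicographic comparison of violation profiles."""
        return min(candidates, key=lambda c: [con(c) for con in ranking])

    candidates = [{"animate_object": True, "case_marked": True},
                  {"animate_object": True, "case_marked": False}]

    # The ranking decides the output: one order yields object marking, the other not.
    print(evaluate(candidates, [star_zero_animate, star_struc]))  # marked candidate wins
    print(evaluate(candidates, [star_struc, star_zero_animate]))  # unmarked candidate wins

Until the ranking is fixed, nothing follows about whether objects are marked, which is the sense in which both generation and evaluation belong to the description of a particular language.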
Now of course there is a wide range of views among practitioners of OT, especially in phonology, where both antifunctionalists (like Mark Hale and Charles Reiss) and functionalists (like Paul Boersma and Donca Steriade) make use of the OT formalism and its key concepts. It appears that Aissen & Bresnan are leaning toward a functionalist stance and would not necessarily interpret the OT constraints with which they are working as elements of the cognitive code.3 As I say in the paper (note 4, §3.2.2), if that is the right interpretation of their approach, my critique is indeed not relevant to it. (But see Yu 2004 and Blevins 2004 for a critical assessment of functionalist OT work in phonology from an evolutionary-
functionalist point of view that is similar in many ways to the approach advocated in my paper.) In any event, I was very pleased to see that Aissen & Bresnan agree with me on the most basic point, that “grammars are determined both by the ‘cognitive code’ and by functional pressures. In advance, one cannot know how to tease these forces apart” (p. 581). If all theoretical grammarians recognized that teasing the two forces apart is their fundamental challenge, then a lot of progress could be made, I feel.
Notes

1. So when I refer to “direct objects” in a universal sense on p. 570 (§4.2.3), this is to be understood as a convenient abbreviation for “patient of (minimally) a change-of-state word when the agent is also referred to”, not as a label for a grammatical relation.

2. For example, according to Newmeyer (2003: 265), “one of the major advances in [generative, MH] syntactic theory was the proposal, first articulated in Bresnan (1970), that what we would later call the “specifier of CP” was the unique landing site for Wh-movement. This proposal paved the way for the hypothesis of a single movement rule Move-alpha…” It is these kinds of proposals that have had the greatest prestige in generative linguistics.

3. However, since they do not state clearly whether and in what way their views diverge from the “classical” OT approach summarized by McCarthy, one may get the (probably wrong) impression that their approach is fundamentally compatible with the classical approach.
References

Blevins, Juliette. 2004. Evolutionary phonology. Cambridge: Cambridge University Press.
Bresnan, Joan W. 1970. “On complementizers: toward a syntactic theory of complement types”. Foundations of Language 6: 297–321.
Croft, William. 2000. “Parts of speech as language universals and as language-particular categories”. In: Vogel, Petra; and Comrie, Bernard (eds), Approaches to the typology of word classes 65–102. Berlin: Mouton de Gruyter.
Croft, William. 2001. Radical construction grammar. Oxford: Oxford University Press.
Dryer, Matthew. 1997. “Are grammatical relations universal?” In: Bybee, Joan; Haiman, John; and Thompson, Sandra A. (eds), Essays on language function and language type 115–143. Amsterdam: Benjamins.
McCarthy, John J. 2002. A thematic guide to Optimality Theory. Cambridge: Cambridge University Press.
Newmeyer, Frederick J. 2003. “‘Interpretable’ features and feature-driven A-bar movement”. In: Beyssade, Claire; Bonami, Olivier; Cabredo Hofherr, Patricia; and Corblin, Francis (eds), Empirical issues in formal syntax and semantics 4 255–271. Paris: Presses de l’Université de Paris-Sorbonne.
Yu, Alan C. L. 2004. “Explaining final obstruent voicing in Lezgian: phonetics and history”. Language 80.1: 73–97.
From UG to Universals
Linguistic adaptation through iterated learning

Simon Kirby, Kenny Smith and Henry Brighton
University of Edinburgh
What constitutes linguistic evidence for Universal Grammar (UG)? The principal approach to this question equates UG on the one hand with language universals on the other. Parsimonious and general characterizations of linguistic variation are assumed to uncover features of UG. This paper reviews a recently developed evolutionary approach to language that casts doubt on this assumption: the Iterated Learning Model (ILM). We treat UG as a model of our prior learning bias, and consider how languages may adapt in response to this bias. By dealing directly with populations of linguistic agents, the ILM allows us to study the adaptive landscape that particular learning biases result in. The key result from this work is that the relationship between UG and language structure is non-trivial.
1. Introduction
A fundamental goal for linguistics is to understand why languages are the way they are and not some other way. In other words, we seek to explain the particular universal properties of human language. This requires both a characterisation of what these universals are, and an account of what determines the specific nature of these universals. In this paper we examine a particular strategy for linguistic explanation, one which makes a direct link between language universals and an innate Universal Grammar (UG). It seems reasonable to assume that, if UG determines language universals, then language universals can be used as evidence for the structure of UG. However, we will argue that this assumption is potentially dangerous. Our central message is that we can seek linguistic evidence for UG only if we have a clear understanding of the mechanisms that link properties of language acquisition on the one hand and language universals on the other. In the following section we will discuss what is actually meant by the term UG. There are a number of differing senses of the term, but a neutral definition can be
"kir-r15">
118
Simon Kirby, Kenny Smith and Henry Brighton
given in terms of prior learning bias. We will then sketch an account of the universal properties of language in terms of this bias. In Section 3, we will compare this kind of explanation to an alternative approach, linguistic functionalism, which focuses on the use of language. A well-recognised difficulty with this approach is the problem of linkage: what is the mechanism that links universals to linguistic functions? We claim that not only does the UG-approach suffer exactly the same problem, but the solution is the same in both cases. Section 4 sets out this solution in terms of Iterated Learning, an idealised model of the process of linguistic transmission. We survey some of the results of modelling iterated learning to show how it can help solve the problem of linkage. Finally, in the closing sections of the paper we argue that language universals, and linguistic structure more generally, should be viewed as adaptations that arise from the fundamentally evolutionary nature of linguistic transmission.
2. What is Universal Grammar?

Before we discuss the role of UG in explaining language universals, we need to be clear what we mean. Unfortunately, there is some variation in how the term is used (see Jackendoff 2002 for an excellent review of the literature):

i. UG as the features that all languages have in common. Clearly, this equates UG exactly with universals. This is not the sense of UG that we will be concerning ourselves with in this paper. Initially, it may seem absurd to imply that a characterisation of UG in this sense could possibly be an explanation of the universal characteristics of human language. Rather, it may appear only to be a description of the properties of language. However, we should be careful about dismissing the explanatory significance of a theory of UG that ‘merely’ sets out the constraints on cross-linguistic variation. In fact, it is conceivable that a truly explanatory theory of language could consist of an account of UG in this sense. Chomsky (2002) gives an illuminating analogy that makes clear there is more than one way to explanatory adequacy in science. Consider, he suggests, the case of the discovery of the Periodic Table in late 19th century chemistry. To simplify somewhat, chemists, through careful experimental observations of the elements, were able to uncover a range of regularities that made sense of the behaviour of those elements. Repeating, periodic patterns could be seen if the elements were arranged in a particular way — approximately, as a table made up of rows of a fixed length.
"kir-r5">
From UG to Universals
In one sense we could see the periodic table as being merely a description of the behaviour of matter. We could claim that the discovery of the periodic table does nothing to explain the mass of experimental data that chemists have collected. This seems wrong. Surely such a concise and elegant generalisation is, in some sense, explanatory. See Eckman (this volume) for an extended discussion of the relationship between generalisation and explanation. The periodic table itself can now be explained by physicists with reference to more fundamental constituents of matter, but this does not alter the status of the table in chemistry itself. Are linguists in the process of discovering an equivalent of the periodic table? Is there a model of UG ‘out there’ that has the same combination of formal simplicity and predictive power? It is a worthy research goal, and one that is being pursued by many, but we may be chasing phantoms. As we will argue in this paper, UG should be considered as only part of an explanatory framework for language.

ii. UG as the initial state of the language learning child. This sense of UG is very closely related to the previous sense. Jackendoff (2002) notes that Chomsky (1972) uses the term UG to denote the configuration of a language-ready child’s brain that sets the stage for language acquisition. This ‘state-zero’ can, in fact, be thought of as specifying the complete range of possible grammars from which a maturation process ‘picks’ a target grammar in response to linguistic data. It is natural to equate the space of languages specified in state-zero with the range of possible languages characterised by language universals. The main difference between this sense of UG and the previous one is that it gives UG an explicit psychological reality.

iii. UG as initial state and Language Acquisition Device. Jackendoff (2002) points out that in its most common usage, UG is taken to correspond to the knowledge of language that the child is born with. This consists not only of the initial state, but also the machinery to move from this state to the final target grammar. Chomsky refers to this machinery as the Language Acquisition Device or LAD. For convenience, we will consider this device to encapsulate the initial state as well as the machinery of acquisition. This means that we will treat this sense of UG as simply a description of the LAD.

In summary, there are a number of different ways we can think about what Universal Grammar actually is. This may seem like terminological confusion, but really all these senses have something fundamental in common: they all appear to relate UG directly with universals. The different senses we have surveyed differ primarily with respect to how UG is situated in a wider theory of cognition. The picture is something like the one shown in Figure 1. The broadly Chomskyan
program for linguistics is to uncover the properties of UG. Since UG and language universals are coextensive, the evidence for UG can be derived directly from a careful characterisation of the (universal) properties of linguistic structure.

Figure 1. The language acquisition device (LAD) takes primary linguistic data and generates the adult grammatical competence of a language. Universal grammar defines or constrains the operation of the LAD.
A sensible question is how we can characterise UG/LAD in such a way that there is a clear relationship between the theory and constraints on linguistic variation (i.e., universals). Various approaches are possible. For example, in Principles and Parameters theory (Chomsky 1981) there is a direct relationship between cross-linguistic parametric variation and the elements of the model, parameters, that are set in response to input data.1 Similarly, in Optimality Theory (Grimshaw 1997) variation arises from the constraint ranking that is arrived at through the acquisition process.

The literature on machine learning (e.g., Mitchell 1997) suggests a general way of characterising the relationship between language learning and linguistic variation. We can think of the learning task for language as the identification of the most probable grammar that generates the data observed. More formally, given a set of data D and a space of hypotheses about the target grammar H, we wish to pick the hypothesis h ∈ H that maximises the probability Pr(h | D), in other words, the probability of h given D. From Bayes’ law, we have:

Pr(h | D) = Pr(D | h) Pr(h) / Pr(D)

The task of the learner is to find:

arg max over h ∈ H of Pr(h | D) = arg max over h ∈ H of Pr(D | h) Pr(h)

(We can ignore the term Pr(D) since this is constant for all hypotheses.)
"kir-r19">
From UG to Universals
What is the contribution of UG/LAD in this framework? It is simply the prior bias of the learner. This bias is everything2 that the learner brings to the task independent of the data. In other words, it is the probability Pr(h) assigned to each hypothesis h ∈ H.

One of the interesting things about this Bayesian formulation is that it allows us to see the classic problem of induction in a new light (Li & Vitanyi 1993). Consider what a completely ‘general purpose’ learner would look like. Such a learner would not be biased a priori in favour of any one hypothesis over another. In other words, Pr(h) would be equal for all hypotheses. Such a learner would then simply pick the hypothesis that maximised Pr(D | h). In other words, the best a learner can do is pick the hypothesis that recreates the data exactly. Such a learner cannot, therefore, generalise. Since language learning involves generalisation, any theory of language learning must have a model of prior bias.

Where does this prior bias come from? An obvious answer is that it is innate. Note, however, that we have said nothing about domain specificity. It is crucial that the issues of innateness and domain specificity are kept separate. It is a fascinating but difficult challenge to discover which features of the child’s prior bias (if any) are there for language. We note here only that an approach to this problem must be based on a theory of the relationship between the structure of innate mechanisms and the functions to which they are put (e.g., language learning). In other words, answers to questions about domain-specificity will come from a better understanding of the biological evolution of the human brain.

To summarise, a central goal for linguistics is to discover the properties of UG. We argue that, in general, this amounts to a characterisation of the prior learning bias that children bring to bear on the task of language acquisition. Since it is Universal Grammar that leads to universal properties of human languages, a sensible strategy seems to be to use observable properties of languages to infer the content of UG. In the next section, we will show that this argument suffers from a problem that has been identified with a quite different approach to linguistic explanation: functionalism.
3. The problem of linkage

The functionalist approach to explaining language universals (see e.g., Hawkins 1988) seems at first blush to be incompatible with explanations that appeal to UG. A functionalist explanation for some aspect of language structure will relate it to some feature of language use. This runs completely counter to the generativist program, which focuses on explaining linguistic structure on its own terms, explicitly denying a place for language use ‘inside’ a theory of UG. If chemistry is a good analogy for the generativist enterprise, then perhaps biology is the equivalent for functionalists. The central idea is that we can only make sense of structure in
light of an understanding of what it is used for. (See Newmeyer (1998) and Kirby (1999) for further discussion of functionalism and the generativist tradition.)

A particularly ambitious attempt to explain a wide range of data in terms of language use is Hawkins’ (1994) processing theory. Hawkins’ main target is an explanation of the universal patterns of word-order variation. For example, he notes that there is a constraint on possible ordering in noun-phrases — a universal he calls the prepositional noun-modifier hierarchy: In prepositional languages, within the noun-phrase, if the noun precedes the adjective, then the noun precedes the genitive. Furthermore, if the noun precedes the genitive, then the noun precedes the relative clause. This hierarchy predicts that, if a language has structure n in the following list, then it will have all structures less than n:

1. PP[P NP[N S′]]
2. PP[P NP[N NP]]
3. PP[P NP[N Adj]]
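The implicational prediction can be stated mechanically. Here is a minimal sketch that checks whether the set of structures attested in a language respects the hierarchy; the labels are our own shorthand for the three structures above, not Hawkins’ notation:

    # Structures ordered as in the list: index 0 = noun before relative clause,
    # 1 = noun before genitive, 2 = noun before adjective. Having structure n is
    # predicted to entail having every structure ranked below n.
    HIERARCHY = ["N-Rel", "N-Gen", "N-Adj"]

    def respects_hierarchy(attested):
        """True if every attested structure brings all lower-ranked ones with it."""
        ranks = [HIERARCHY.index(s) for s in attested]
        highest = max(ranks, default=-1)
        return all(HIERARCHY[i] in attested for i in range(highest + 1))

    print(respects_hierarchy({"N-Rel", "N-Gen"}))  # True: a predicted language type
    print(respects_hierarchy({"N-Adj"}))           # False: predicted not to occur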
Hawkins’ explanation rests on the idea that when processing such structures, stress on our working memory increases as the distance between the preposition and the noun increases. He argues that the NP node in the parse-tree is only constructed once the head noun is processed. This means that the immediate daughters of the PP are only available for attachment to the PP node when both the preposition and noun have been heard. Since relative clauses are typically longer than noun-phrases, which are usually longer than adjectives, the difficulty in processing each of these structures increases down the list.

Assuming this account is correct, does the relative processing difficulty of each structure actually explain the language universal? Kirby (1999) points out that the identification of a processing asymmetry that corresponds to an asymmetry in the distribution of languages is not quite enough to count as an explanation. What is missing is something to connect working-memory on the one hand with numbers of languages in the world on the other.

The problem of linkage: Given a set of observed constraints on cross-linguistic variation, and a corresponding pattern of functional preference, an explanation of this fit will solve the problem: how does the latter give rise to the former? (Kirby 1999: 20)
Kirby (1999) sets out an agent-based model of linguistic transmission to tackle this problem. Agent-based modelling is a computational simulation technique used extensively in the field of artificial life (see Kirby 2002b for a review of the way this field has approached language evolution). ‘Agents’ in these simulations are simple, idealised models of individuals, in this case language users. The details of the simulation are not important, but the basic idea is that variant word-orders are
transmitted over time from agent to agent through a cycle of production, parsing, and acquisition. In the simulations, different word-order variants appear to compete for survival, with universal patterns of cross-linguistic variation emerging out of this competition. These models show that for some functional explanations, processing asymmetries do indeed result in equivalent language universals. However, this is not always the case. In general, hierarchical universals cannot be explained using only one set of functional asymmetries. The particular details are not relevant here,3 but the moral should be clear: without an explicit mechanism linking explanans and explanandum we cannot be sure that the explanation really works. At this point we might ask what relevance this discussion has for the generative type of explanation, which treats language universals as being encoded in Universal Grammar. In fact, we would argue that there is very little difference between these two modes of explanation, and as such the same problem of linkage applies. In Hawkins’ approach to functional explanation, a direct link is made between a feature of the language user’s psychology (such as working memory) and the universal properties of language. Similarly, the generative approach makes a direct link between another feature of the language user’s psychology (this time, learning bias) and language universals. The problem of linkage holds for both functionalist and generative explanations for language universals. In the next section, we look at a development of the model put forward in Kirby (1999) that demonstrates the rather subtle connections between language learning and language structure arising out of the process of linguistic transmission.
4. Iterated learning

Over the last few years there has been a growing interest in modelling a type of cultural information transmission we call Iterated Learning (Kirby & Hurford 2002). The central idea underlying the iterated learning framework is that behaviour can be transmitted culturally by agents learning from other agents’ behaviour which was itself the result of the same learning process. Human language is an obvious example of a behaviour that is transmitted through iterated learning.4 The linguistic behaviour that an individual exhibits is both a result of exposure to the behaviour of others and a source of data that other learners may be exposed to. The Iterated Learning Model (ILM) gives us a tool with which we can explore the properties of systems that are transmitted in this way. In this section we will briefly review some of the ways the ILM has been used to look at systems for mapping meanings to signals that are transmitted through repeated learning and
use. The main message we hope to convey is that the relationship between learning and the structure of what is being learned is non-trivial. Hence, when we look at the ‘real’ system of human language, we should expect the relationship between UG and universals to be similarly complex.
A simple ILM

Consider a system where there are a number of meanings that agents want to express. They are able to do this by drawing on a set of possible signals. The way in which they relate signals and meanings is by using some internal grammar. The means by which they arrive at this grammar is through observation of particular instances of other agents’ expression of meanings. We can imagine systems like this with large populations of agents interacting and learning from each other, with the possibility for various kinds of population turnover (i.e., how the population changes over time).

The simplest possible population model is shown in Figure 2. Here there are only two agents at any one time: an adult and a learner. The adult will be prompted with a randomly chosen meaning and, using its grammar, will generate a signal. This signal-meaning pair will then form part of the input data to the learner. From a set of the adult’s signal-meaning pairs (the size of the set being a parameter in the simulation) the learner will try to induce the adult’s grammar.

We are interested in what happens when a language (conceived of as a mapping between meanings and signals) is transmitted in this way. Will the language change? If so, are there any stable states? What do stable languages look like and what determines their stability? Ultimately, we can only begin to find answers to these questions by actually implementing the ILM in simulation (a minimal sketch of such a simulation loop is given after the list below). To do this we need to implement a model agent, decide what the set of meanings and signals will look like, and also the structure and dynamics of the population. In an ILM, the particular learning algorithm used determines the prior bias of the agents. We can think of the learning algorithm as essentially a model of UG.

A wide range of designs of ILM simulations have been employed in the literature. The following is a partial list (there has also been further work looking at models of language that do not treat it as a mapping from meanings to signals, such as Jäger 2003, Teal & Taylor 1999, and Zuidema 2001):

– (Batali 1998). Models a population of simple recurrent networks (Elman 1990). Meanings are bit-vectors with some internal structure. There is no population turnover in this simulation.
– (Kirby 2000). Agents learn using a heuristically-driven grammar inducer. Meanings are simple feature-structures, and the population has gradual turnover.
– (Kirby 2002a). Similar learning algorithms, but with recursively structured meaning representation. Described in more detail below.
Figure 2. A simple population model for iterated learning. Each generation has only one agent, A. This agent observes utterances produced by the previous generation’s agent. The learner forms a hypothesis, H, based on these utterances. In other words, the agent aims to acquire the same language as that of the previous generation. Prompted by a random set of meanings, M, this agent goes on to produce new utterances for the learner in the next generation. Note that, crucially, these utterances will not simply be a reiteration of those the agent has heard because the particular meanings chosen will not be the same.
– (Kirby 2001). Same learning algorithm. Meanings are coordinates in a two-dimensional space, with a non-uniform frequency distribution.
– (Batali 2002). Population of agents using instance-based learning techniques. Meanings are flat lists of predicates with argument variables.
– (Brighton & Kirby 2001). Agents acquire a form of finite-state transducer using Minimum Description Length learning. Many runs of the simulation are carried out with different meaning-spaces.
– (Tonkes 2002). Along with a number of other models, Tonkes implements an ILM with a population of simple recurrent networks with a continuous meaning space (each meaning is a number between 0.0 and 1.0).
– (Smith, Brighton & Kirby, forthcoming). Uses associative networks to map between strings and feature-vectors.
– (Vogt 2003). Implements a simulation of a robotics experiment — the ‘Talking Heads’ model of Steels (1999). The agents communicate about objects of various shapes, colours and locations. This is part of a broader research effort to get round the problem that the ILM requires a pre-existing model of meanings. By grounding the ILM in a real environment, both signals and meanings can be seen to emerge.
These simulations are typically seeded with an initial population that behaves randomly — in other words, agents simply invent random signals (usually strings of characters) for each meaning that they wish to produce. This idiosyncratic, unstructured language is learned by other agents as they are exposed to these utterances, and in turn these learners go on to produce utterances based on their own experience. The remarkable thing is that, despite their very different approaches to modelling learning (i.e., models of UG), the same kind of behaviour is seen in all these models. The initial random language is highly unstable and changes rapidly, but over time stability begins to increase and some structure in the mapping between meanings and signals emerges. Eventually, a stable language evolves in which something like syntactic structure is apparent.
For example, Kirby (2002a) uses the ILM to explore how recursive compositionality could have evolved. In this model, the population structure is as in Figure 2. The agents’ model of language is represented as a form of context-free grammar, and a heuristic-based induction algorithm is used to acquire a grammar from a set of example utterances. The signals are simply strings of characters, and the meanings take the form of simple predicate logic expressions. (This is not the place to go into the technical details of the model — these are given in the original article.) Here are a few of the sentences produced by an agent early on in the simulation run. The meaning of each sentence is glossed in English. (Note that the letters that make up these strings are chosen at random — there is no role for phonetics or phonology in this simulation):

(1) ldg
‘Mary admires John’
(2) xkq
‘Mary loves John’
(3) gj
‘Mary admires Gavin’
(4) axk
‘John admires Gavin’
(5) gb
‘John knows that Mary knows that John admires Gavin’
In this early stage, the language of the population is unstructured. Each meaning is simply given a completely idiosyncratic, unstructured string of symbols. There is no compositionality or recursion here, and it is better to think of the language as a vocabulary where a word for every possible meaning has to be individually listed. This type of syntax-free language, which Wray (1998) refers to as a holistic protolanguage, may have been a very early stage in the evolution of human language. It can be compared with animal communication systems inasmuch as they typically exhibit no compositional structure.5 Wray suggests that living fossils of this protolanguage still exist today in our use of formulaic utterances and holistic processing.
The hallmark of these early languages in the ILM is instability. The pairing of meanings and strings changes rapidly, and as a result the communicative ability of the agents is poor. It is easy to see why this is. The learners are only exposed to a subset of the range of possible meanings (which, strictly speaking, are infinite in this model because the meanings are defined recursively). This means each learner can only accurately reproduce the language of the adult for meanings that it has seen. Given the five sentences listed above, how would you generalise to another meaning, say ‘Mary loves Gavin’? The best you could do would be either to say nothing, or to produce a string of random syllables of approximately the same length as the ones you have seen. This is precisely the challenge agents early in the simulation are faced with (although the number of sentences they are exposed to is much higher).
Thousands of generations later, however, the language looks very different (note that the speakers do not actually generate spaces within the signals — these are included here for clarity only):

(6) gj h f tej m
John Mary admires
‘Mary admires John’
(7) gj h f tej wp
John Mary loves
‘Mary loves John’
(8) gj qp f tej m
Gavin Mary admires
‘Mary admires Gavin’
(9) gj qp fh m
Gavin John admires
‘John admires Gavin’
(10) i h u i tej u gj qp fh m
John knows Mary knows Gavin John admires
‘John knows that Mary knows that John admires Gavin’
This is clearly a compositional language. The meaning of the whole string is a function of the meanings of parts of the string. The compositional structure is also recursive as can be seen in the last example. What is interesting is that this language is completely stable. It is successfully learned by generation after generation of agents. The grammar of this language is also completely expressive. There is perfect communication between agents.
Again, it is easy to see why this is so. If you were asked the same question as before — how to express the meaning ‘Mary loves Gavin’ — you would probably give the answer gjqpftejwp. What is happening here is that you, just like the agents, are able to generalise successfully from this small sample of sentences, by uncovering substrings that refer to individual meanings, and ways to put these substrings together. There is no need for recourse to random invention. Because of this, the language is stable. All agents will (barring some unfortunate training set) converge on the same set of generalisations. They will all be able to communicate successfully about the full range of meanings (which are infinite in this case).
To summarise: in the ILM, not all languages are equally stable. A language’s stability is directly related to its generalisability. If the language is such that generalisation to unseen meanings is difficult, then noise will be introduced into the transmission process. A crucial feature of the process of iterated learning is that if a learner makes a generalisation, even if it is an over-generalisation, the utterances that the learner produces will themselves be evidence for that generalisation. In other words, generalisations propagate. As the language comes to exhibit more and more generalisability, the level of noise in the transmission process declines, leading finally to a completely stable and highly regular linguistic system.6 A similar process is seen in every simulation run, although the particular words used and their word-order are different each time.
It is important to realise that this is not an idiosyncratic feature of this particular model. For example, with a quite different learning model (simple recurrent networks), meaning space (bit-vectors), and population model, Batali (1998) also observed a similar movement from unstructured holism to regular compositionality. There seems to be a universal principle at work here. As Hurford (2000) puts it, social transmission favours linguistic generalisation.
There appear to be two crucial parameters in these models that determine the types of language that are stable through iterated learning: the size of the training set the learners are exposed to, and the structure of the space of possible meanings. Hurford (2002) refers to the size of the training data as the ‘bottleneck’ on linguistic transmission. The bottleneck is the expected proportion of the space of possible meanings that the learners will be exposed to. When the bottleneck is too tight, no language is stable — the learners do not have enough data to reconstruct even a perfectly compositional system. If, on the other hand, the bottleneck is very wide, then unstructured, holistic languages are as stable as compositional ones. This is because there is no pressure to generalise.
It is possible in these models to vary the frequency with which different meanings are expressed. This means that the bottleneck will not be equal for all meanings. In this case, we should expect frequent meanings to tend to exhibit less regularity than infrequent ones — a result that matches what we find in the morphology of many languages. This result is exactly what we find in simulation (Kirby 2001), which confirms the central role of the bottleneck in driving the evolution of linguistic structure. The result also demonstrates that the particular choice of meanings that the agents are to communicate about is important.7
Brighton (2002) examines the relationship between stability in the ILM and the particular structure of each meaning. In this study, meanings are treated as feature vectors. Different results are obtained depending on the number of features and the number of values each feature can take. Using both simulation and mathematical models of the iterated learning process, the relationship between feature structure and the relative stability of compositional languages can be determined. This approach is extended by Smith (2003) in a set of simulations where only some meanings are actually used by the agents. In both cases it can be shown that there is a complex relationship between meanings and the types of language that will emerge. The broad conclusion that can be drawn is that compositional structure evolves when the environment is richly structured and the meanings that the agents communicate about reflect this structure.
This work on iterated learning is at a very early stage. There is a huge gulf between the elements of these models and their real counterparts. Obviously, neither feature vectors nor simple predicate logic formulae are particularly realistic models of how we see the world. The learning algorithms the agents use and their internal representations of linguistic knowledge are not adequate for capturing the rich structure of real human languages. Does this render the results of the modelling work irrelevant? Unsurprisingly, we would argue to the contrary. Just as simulation modelling has proved invaluable in psycholinguistics and cognitive science more generally (Elman et al. 1996), we feel that it can be used as a way of testing hypotheses about the relationship between individuals, the environment, and language universals. We know that language is transmitted over time through a process of iterated learning, but as yet we do not have a complete understanding of what this implies. We gain insights from idealised models which can be brought to bear on fundamental questions in linguistics.
In this section, we have put forward a general solution to the problem of linkage. UG, instantiated in individuals as prior learning bias, impacts on the transmission of language through iterated learning. This results in a dynamical system — some languages are inherently unstable and communicatively dysfunctional. These could never be viable human languages. Nevertheless, this fact may not be recoverable purely through examination of the biases of the learner. In other words, universals (such as compositionality) are derived in part by prior learning biases, but are not built into the learner directly. Through the iterated learning process, these languages evolve towards regions of relative stability in this dynamic
landscape. The implication is clear: UG and universals cannot be directly equated. Rather, the connection is mediated by the dynamics of iterated learning. From this we can conclude that we must be very cautious in setting out a theory of UG on the basis of the observed structure of human languages — we may unwittingly be setting up a situation that results in a hidden prediction of other universals. In general, the languages that are stable through iterated learning will be a subset of those that appear to be predicted by the model of learning used.
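Why generalisability buys stability can be seen with a back-of-the-envelope comparison. The sketch below is our own, with an invented meaning space: a holistic learner can reproduce only the meanings it has actually observed, whereas a compositional learner can express any meaning assembled from components it has observed.

import random

VERBS = ["loves", "admires", "knows"]
NAMES = ["mary", "john", "gavin", "heather"]
MEANINGS = [(v, a, p) for v in VERBS for a in NAMES for p in NAMES]  # 48 meanings

def coverage_holistic(seen):
    """A holistic learner can reproduce exactly the meanings it observed."""
    return len(set(seen)) / len(MEANINGS)

def coverage_compositional(seen):
    """A compositional learner can express any meaning assembled from
    verbs and names it has encountered at least once."""
    verbs = {v for v, _, _ in seen}
    names = {n for _, a, p in seen for n in (a, p)}
    expressible = [(v, a, p) for v, a, p in MEANINGS
                   if v in verbs and a in names and p in names]
    return len(expressible) / len(MEANINGS)

random.seed(1)
seen = [random.choice(MEANINGS) for _ in range(10)]  # a ten-utterance bottleneck
print("holistic     :", round(coverage_holistic(seen), 2))
print("compositional:", round(coverage_compositional(seen), 2))

With ten observed utterances over 48 meanings, the holistic learner covers at most about a fifth of the space, while the compositional learner typically covers nearly all of it; only the generalisable system, then, survives repeated squeezing through the bottleneck.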
5. Universals as emergent adaptations

The models we described in the previous section looked at how recursive compositionality, perhaps the most fundamental property of human language, can evolve through the iterated learning process. Why does this happen, and can this result help us understand language universals more generally? Earlier, we discussed what happens in the ILM from the point of view of the learner, but to answer these questions it helps to take a quite different perspective.
We are used to thinking about language from the individual’s point of view. For example, we are keen to understand what the structure of the language acquisition mechanism needs to be in order for children to acquire language. Similarly, we think about language processing in terms of a challenge posed to the user of language. For many linguists, implicit in this thinking is the view that humans are adapted to the task of acquiring and using language. If language is the problem, individual human psychology is the solution.
What if we turn this round? In the context of iterated learning, it is languages, not language users, that are adapting. Let us imagine some linguistic rule, or set of rules, that mediates the mapping between a set of meanings and their corresponding signals. For that rule to survive through iterated learning it must be repeatedly used and acquired. Consider first the case of an early-stage holistic language. Here, each rule in the language covers only a single meaning. In the example given in the last section, there was a rule that maps the meaning for ‘Mary loves John’ onto the string xkq. That is all the rule does; it is not involved at any other point in the meaning-space. For this rule to survive into the next generation, a learner must hear it being used to express ‘Mary loves John’. Now consider the case of the perfectly compositional language. Here things are more complex because there are a number of rules used to map the meaning ‘Mary loves John’ onto the string gjhftejwp. However, the important point is that all of these rules are used in the expression of many more meanings than this single one. These rules therefore produce more evidence for themselves than the idiosyncratic
rule in the previous example. The challenge for rules or regularities in a language is to survive being repeatedly squeezed through the transmission bottleneck. As Deacon (1997) puts it, “language structures that are poorly adapted to this niche simply will not persist for long” (p. 110). To put it simply, sets of rules that have general, rather than specific, application are better adapted to this challenge. In this case, recursive compositionality is a linguistic adaptation to iterated learning.
In this view, language universals can be seen as adaptations that emerge from the process of linguistic transmission. They are adaptive with respect to the primary pressure on language itself — its successful social transmission from individual to individual. Taking this perspective on the structure of language shows how compatible the generativist and functionalist approaches actually are. Figure 3 shows how adapting to innate learning bias is only one of the many problems language faces. Every step in the chain that links the speaker’s knowledge of language to the hearer’s knowledge of language will impact on the set of viable, stable human languages (see, for example, Kirby & Hurford 1997 for a model that combines processing pressures and a parameter-setting learner).

Figure 3. Many factors impinge on linguistic transmission. Language adapts in response to these pressures. (The figure shows the step from LANGUAGE_i to LANGUAGE_i+1 as mediated by pressures including social factors, ambiguity, the environment, articulation errors, disfluencies, least-effort principles, working memory, noise, discourse structure, and learning bias.)
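As a toy illustration of where such pressures enter the chain (our own sketch; the truncation rule and the noise rate are invented for illustration), consider a single utterance passing from the speaker’s grammar to the learner’s data:

import random

def articulate(signal, laziness=0.2):
    """Production pressure: least-effort truncation of long signals."""
    if len(signal) > 4 and random.random() < laziness:
        return signal[:4]
    return signal

def channel(signal, noise=0.05):
    """Transmission pressure: each character may be corrupted by noise."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return "".join(random.choice(alphabet) if random.random() < noise else ch
                   for ch in signal)

# One link in the chain from speaker to hearer: the learner never sees the
# speaker's grammar, only utterances filtered through pressures like these.
print(channel(articulate("gjhftejwp")))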
Indeed, there may be cases where the boundary between explanations based on acquisition and explanations based on processing is very hard to draw. We mentioned Hawkins’ (1994) approach to word-order universals in Section 2. This has been applied to the general universal tendency for languages to order their heads consistently at either the left edge or the right edge of phrases throughout their grammars. This is argued to reflect a preference of the parser to keep the overall distance between heads as short as possible to reduce working-memory load. Kirby (1999) implements this preference in a very simple ILM of the transmission of word-order variants to show how the head-ordering universal emerges. It seems clear in this case that we are talking about a quintessentially functionalist explanation — an explanation couched in terms of the use of language. However, Christiansen & Devlin (1997) explain the same facts in terms of language learning, using a general model of sequential learning: the Simple Recurrent Network (Elman 1990). The networks exhibit errors in learning in precisely those languages that are rare cross-linguistically. This seems a completely different explanation to Hawkins’. But do we really know what it is that causes the network errors? To test how well these networks have learned a language, the experimenter must give them example sentences to process. As a result, we do not know if the problem with the languages exhibiting unusual word-order arises from processing or acquisition. Perhaps we should collapse this distinction entirely. In some sense, when we acquire language we are acquiring an ability to use that language.8
The purpose of this discussion is to show that the distinction between functionalist approaches to typology and generativist explanations of language structure is not as clear as it might appear. UG and language function both play a role rather like the environment of adaptation does in evolutionary biology. Natural selection predicts that organisms will be fit. They will show the appearance of being designed for successful survival and replication. Similarly, linguistic structure will reflect properties of the bottleneck in linguistic transmission.
Once this analogy is made, it is tempting to try and apply it further. Could we explain the emergence of linguistic structure in terms of a kind of natural selection applied to cultural evolution? There have been many attempts to do just this, both in general (Blackmore 1999) and in the case of language (e.g., Croft 2000 and Kirby 1999). We would like to sound a note of caution, however. There are important differences between iterated learning and biological replication (see Figure 4). In biology, there is direct copying of genetic material during reproduction. The central dogma of molecular biology (Crick 1970) states that the transformation from DNA to organism is one-way only. In iterated learning, however, there is repeated transformation from internal representation to external behaviour and back again. The function of learning is to try and reconstruct the other agent’s internal representation on the basis of their behaviour. This disanalogy with the process of selective replication in biology must be taken into account in any theory of linguistic transmission based on selection. This is not to say that an explanatory model that utilises selection is impossible. Much depends on exactly how the model is formulated. For example, Croft’s (2000) model focuses on the replication of constructions (as opposed to induced grammatical competence). By identifying the construction as the locus of replication, Croft’s model has a more natural selectionist interpretation.
A final comment should be made about the notion of adaptation we are appealing to. The simulations discussed in the previous section exhibited a universal tendency for a movement from inexpressive holistic languages to maximally expressive compositional ones.
Figure 4. Similarities and differences between linguistic and genetic transmission. (Schematically, linguistic transmission runs GC → production → PLD → learning → GC; genetic transmission runs DNA → translation → PROTEINS, with selection and direct replication of DNA.) The central dogma of molecular biology states that there is no reverse translation from phenotype (i.e., proteins) to genotype (i.e., DNA). Genetic information persists by direct copying of the DNA. The only influence of the phenotype is in determining whether or not the organism has a chance of replication (hence, selection). In linguistic transmission, there is a far more complex mechanism — learning — that attempts to reconstruct grammatical competence (GC) by “reverse engineering” the primary linguistic data (PLD).
It is obvious that agents at the end of the simulation are capable of far more successful communication than those early on. In some models they are capable of infinite expressivity that can be reliably acquired from sparse evidence — a defining hallmark of human language. These late-stage agents are using a far more communicatively functional language than those earlier in the simulation run. However strange it sounds, this is merely a happy by-product of the adaptive mechanism at work. Languages are not adapting to be more useful for the agents (at least not directly). Rather, they are simply adapting to aid their own transmission fidelity. In practice, this will usually be the same thing.
If this idea is correct, then it would be interesting to try and find examples where the needs of language (to survive from generation to generation) and the needs of its users (to communicate easily and successfully) diverge. In other words, can we find apparently dysfunctional aspects of language that are nevertheless stable, and furthermore can we give these a natural explanation in terms of iterated learning? This is a challenging research goal, but there may be places we can start to look. For example, there are constructions that are notoriously hard to parse, such as centre-embedded relative clauses, which are nevertheless clearly part of everyone’s linguistic competence. Why are we burdened with these apparently
suboptimal aspects of grammar? Perhaps the answer will lie in understanding the relative stability through iterated learning of a language with centre-embedding and of a minimally different one that rules out the difficult constructions.
6. Conclusion

In this paper, we have explored the relationship between Universal Grammar and universal properties of language structure in the light of recent computational models of linguistic transmission. In summary:

– We treat Universal Grammar as a theory of what the language learner brings to the task of language acquisition that is independent of the linguistic data. In other words, UG is determined by the initial state of the child in addition to the Language Acquisition Device.
– UG in this sense can be equated to prior learning bias in a general Bayesian approach to learning. This prior bias is innately coded (see the toy sketch following this list).
– It is fruitless to search for a bias-free model of language acquisition. In other words, there will always be a role for innateness in understanding language acquisition.
– The degree to which our innate bias is language-specific is an open question — one that will probably require an evolutionary approach to answer.
– Both functionalist explanations for language universals and explanations in terms of UG suffer from the problem of linking an individual-level phenomenon (e.g., learning bias, processing pressures, social factors, etc.) with a global property of linguistic distribution.
– Language is a particular kind of cultural adaptive system that arises from information being transmitted by iterated learning.
– Computational models have been employed to uncover properties of iterated learning. For example, where the language model is a mapping between structured meanings and structured signals, compositionality emerges.
– One way of understanding language universals in the light of iterated learning is as adaptive solutions to the problem language faces of being successfully transmitted.
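The second point, UG as a Bayesian prior, can be made concrete with a toy chain. The sketch below is our own, not a model from the article; the two candidate grammars, their output probabilities, and the prior weights are all invented for illustration.

import random

random.seed(0)

# Two candidate "grammars" and an innate prior over them: the prior plays
# the role of UG as learning bias. All numbers are invented for illustration.
PRIOR = {"compositional": 0.6, "holistic": 0.4}
P_REGULAR = {"compositional": 0.9, "holistic": 0.2}  # chance of a rule-governed form

def produce(grammar, n):
    """Generate n utterances; True marks a regular (rule-governed) form."""
    return [random.random() < P_REGULAR[grammar] for _ in range(n)]

def posterior(data):
    """Bayes: P(grammar | data) is proportional to P(data | grammar) * P(grammar)."""
    post = {}
    for g in PRIOR:
        likelihood = 1.0
        for regular in data:
            likelihood *= P_REGULAR[g] if regular else 1 - P_REGULAR[g]
        post[g] = PRIOR[g] * likelihood
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

# The iterated learning chain: each learner observes a handful of utterances
# (the bottleneck), adopts the most probable grammar, and becomes the next adult.
grammar = "holistic"
for generation in range(20):
    data = produce(grammar, 5)
    post = posterior(data)
    grammar = max(post, key=post.get)
    print(generation, grammar, round(post[grammar], 3))

Run over many generations, such a chain tends to settle on the grammar whose output is most faithfully recoverable through the bottleneck, with the prior tilting both individual learning episodes and the long-run outcome; this is the sense in which UG shapes, but does not directly determine, the distribution of languages.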
Because the connection between UG and universal properties of linguistic structure is not direct, we need to be cautious about how we use linguistic evidence. As Niyogi & Berwick (1997) show in their work on the link between acquisition and language change, a theory of acquisition that is explicitly designed to account for syntactic variation may actually make the wrong predictions once linguistic transmission is taken into account.
On the other hand, iterated learning can lift some of the burden of explanation from our theories of universal grammar. Jäger (2003) examines a model of variation in case-systems based on functional Optimality Theory. To account for the known facts, a rather unsatisfying extra piece of theoretical machinery — the case hierarchy of Aissen (2003) — has been proposed. Using simulations of iterated learning, in combination with a model of the linguistic environment based on corpora, Jäger demonstrates that this hierarchy emerges ‘for free’ from the iterated learning process.
We hope that future research will continue to discover general, universal properties of iterated learning, as well as relating these to questions of genuine interest to linguistics. In some ways these goals are orthogonal. The most idealised models of linguistic transmission tend to have questionable relevance to linguistics. For example, the ‘language dynamical equation’ developed by Nowak, Komarova & Niyogi (2001) treats language acquisition simply as a matrix of transition probabilities, and combines this with a model of the reproductive fitness of speakers in a population. This leads to mathematically tractable solutions for a very limited subset of possible models of acquisition, but it is far from clear that these results correspond to anything in the real world (for example, it seems implausible that language change is driven primarily by the number of offspring a particular speaker has). Nevertheless, we do need idealised models such as those we have presented; but, crucially, models that can help us to understand how the real linguistic system adapts. Getting the balance right between tractable idealisation and relevant realism is likely to be the biggest challenge facing future research.
Notes

1. Note, however, that Newmeyer (this volume) argues that, in practice, parametric theories of UG are poor explanations for implicational and statistical universals.

2. This is actually a slight simplification. For a given hypothesis, h, that is not learnable, we can treat this as being excluded from the set H (giving us a second type of information the learner brings to the learning task), or by including it in the set and assigning it a prior probability of zero.

3. A key component of explanations for universals that license different types in an asymmetrical markedness relationship is the existence of ‘competing motivations’ that create complex dynamics — see Kirby (1999) for details.

4. Music might be another example. Miranda, Kirby & Todd (2003) use simulations of iterated learning to explore new compositional techniques which reflect the cultural evolution of musical form.

5. We should be a little cautious of this comparison, however. The holistic protolanguage in the simulation is learned, whereas most animal communication systems are innately coded — although there appear to be some exceptions to this generalisation.

6. This process bears some similarity to an optimisation technique in computer science called ‘simulated annealing’ (Kirkpatrick, Gelatt & Vecchi 1983). The search-space is explored over a wide area initially, but as the solution is approached, the search focuses in more closely on the relevant region of the space. It is interesting that this kind of optimisation arises naturally out of iterated learning without being explicitly coded anywhere in the model.

7. The frequency of meaning expression is presumably driven largely by the environment (although Tullo & Hurford 2003 look at a model where ongoing dialogue determines meaning-choice in deriving the Zipfian distribution). Grounded models from robotics give us increasingly sophisticated ways of relating meanings and environment (e.g., Vogt 2002).

8. There is another possible way of explaining why languages typically exhibit these word-order patterns. Dryer (1992) and Christiansen & Devlin (1997) refer to consistent branching direction rather than head-ordering, although these are nearly equivalent. Consistently left- or right-branching languages are more common than mixed types. Brighton (2003) shows that a general property of stable languages in the ILM is the simplicity of their grammatical representation, where simplicity is defined in terms of the number of bits the learners use for storage. A topic for ongoing research is whether the commonly occurring word-order patterns are those that result in maximally compressible representations.
References

Aissen, J. 2003. “Differential object marking: iconicity vs. economy”. Natural Language and Linguistic Theory 21: 435–483.
Batali, J. 1998. “Computational simulations of the emergence of grammar”. In: Hurford, J. R.; Studdert-Kennedy, M.; and Knight, C. (eds), Approaches to the evolution of language: social and cognitive bases 405–426. Cambridge: CUP.
Batali, J. 2002. “The negotiation and acquisition of recursive grammars as a result of competition among exemplars”. In: Briscoe, E. J. (ed.), Linguistic evolution through language acquisition 111–172. Cambridge: CUP.
Blackmore, S. 1999. The meme machine. Oxford: OUP.
Brighton, H. 2002. “Compositional syntax from cultural transmission”. Artificial Life 8: 25–54.
Brighton, H. 2003. Simplicity as a driving force in linguistic evolution. Unpublished PhD Thesis, University of Edinburgh.
Brighton, H.; and Kirby, S. 2001. “The survival of the smallest: stability conditions for the cultural evolution of compositional language”. In: Kelemen, J.; and Sosik, P. (eds), Advances in artificial life (vol. 2159) 592–601. Berlin: Springer.
Chomsky, N. 1972. Language and mind (2nd ed.). New York: Harcourt, Brace & World.
Chomsky, N. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, N. 2002. On nature and language. Cambridge: CUP.
Christiansen, M.; and Devlin, J. 1997. “Recursive inconsistencies are hard to learn: a connectionist perspective on universal word-order correlations”. In: Shafto, M.; and Langley, P. (eds), Proceedings of the 19th annual Cognitive Science Society Conference 113–118. Mahwah, NJ: Erlbaum.
Crick, F. 1970. “Central dogma of molecular biology”. Nature 227: 561–563.
Croft, W. 2000. Explaining language change. London: Longman.
Deacon, T. 1997. The symbolic species. New York: Norton.
Dryer, M. 1992. “The Greenbergian word-order correlations”. Language 68: 81–138.
Elman, J. 1990. “Finding structure in time”. Cognitive Science 14(2): 179–211.
Elman, J.; Bates, E. A.; Johnson, M. H.; Karmiloff-Smith, A.; Parisi, D.; and Plunkett, K. 1996. Rethinking innateness. Cambridge, MA: MIT Press.
Grimshaw, J. 1997. “Projection, heads and optimality”. Linguistic Inquiry 28: 373–422.
Hawkins, J. A. (ed.). 1988. Explaining language universals. Oxford: Basil Blackwell.
Hawkins, J. A. 1994. A performance theory of order and constituency. Cambridge: CUP.
Hurford, J. R. 2000. “Social transmission favours linguistic generalisation”. In: Knight, C.; Studdert-Kennedy, M.; and Hurford, J. R. (eds), The evolutionary emergence of language: social function and the origins of linguistic form 324–352. Cambridge: CUP.
Hurford, J. R. 2002. “Expression/induction models of language evolution: dimensions and issues”. In: Briscoe, E. J. (ed.), Linguistic evolution through language acquisition 301–344. Cambridge: CUP.
Jackendoff, R. 2002. Foundations of language: brain, meaning, grammar, evolution. Oxford: OUP.
Jäger, G. 2003. “Simulating language evolution with functional OT”. In: Kirby, S. (ed.), Language evolution and computation: ESSLLI workshop proceedings 52–61.
Kirby, S. 1999. Function, selection and innateness: the emergence of language universals. Oxford: OUP.
Kirby, S. 2000. “Syntax without natural selection: how compositionality emerges from vocabulary in a population of learners”. In: Knight, C.; Studdert-Kennedy, M.; and Hurford, J. R. (eds), The evolutionary emergence of language: social function and the origins of linguistic form 303–323. Cambridge: CUP.
Kirby, S. 2001. “Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity”. IEEE Journal of Evolutionary Computation 5(2): 102–110.
Kirby, S. 2002a. “Learning, bottlenecks and the evolution of recursive syntax”. In: Briscoe, E. J. (ed.), Linguistic evolution through language acquisition 173–203. Cambridge: CUP.
Kirby, S. 2002b. “Natural language from artificial life”. Artificial Life 8: 185–215.
Kirby, S.; and Hurford, J. R. 1997. “Learning, culture and evolution in the origin of linguistic constraints”. In: Husbands, P.; and Harvey, I. (eds), Proceedings of the 4th European Conference on Artificial Life 493–502. Cambridge, MA: MIT Press.
Kirby, S.; and Hurford, J. R. 2002. “The emergence of linguistic structure: an overview of the iterated learning model”. In: Cangelosi, A.; and Parisi, D. (eds), Simulating the evolution of language 121–148. Berlin: Springer.
Kirkpatrick, S.; Gelatt, C. D. Jr.; and Vecchi, M. P. 1983. “Optimization by simulated annealing”. Science 220(4598): 671–680.
Li, M.; and Vitanyi, P. 1993. Introduction to Kolmogorov complexity. London: Springer.
Miranda, E.; Kirby, S.; and Todd, P. In press. “On computational models of the evolution of music: from the origins of musical taste to the emergence of grammars”. Contemporary Music Review.
Mitchell, T. 1997. Machine learning. New York: McGraw Hill.
Newmeyer, F. J. 1998. Language form and language function. Cambridge, MA: MIT Press.
Niyogi, P.; and Berwick, R. 1997. “A dynamical systems model of language change”. Complex Systems 11: 161–204.
Nowak, M.; Komarova, N.; and Niyogi, P. 2001. “Evolution of Universal Grammar”. Science 291: 114–118.
Smith, K. 2003. The transmission of language: models of biological and cultural evolution. Unpublished PhD Thesis, University of Edinburgh.
Smith, K.; Brighton, H.; and Kirby, S. Forthcoming. “Complex systems in language evolution: the cultural emergence of compositional structure”. Advances in Complex Systems.
Steels, L. 1999. The talking heads experiment, vol. 1: words and meanings. Antwerpen: LABORATORIUM.
Teal, T.; and Taylor, C. 1999. “Compression and adaptation”. In: Floreano, D.; Nicoud, J. D.; and Mondada, F. (eds), Advances in artificial life (vol. 1674) 709–719. Berlin: Springer.
Tonkes, B. 2002. On the origins of linguistic structure: computational models of the evolution of language. Unpublished PhD Thesis, University of Queensland, Australia.
Tullo, C.; and Hurford, J. R. 2003. “Modelling Zipfian distributions in language”. In: Kirby, S. (ed.), Language evolution and computation: ESSLLI workshop proceedings 62–75.
Vogt, P. 2002. “The physical symbol grounding problem”. Cognitive Systems Research 3(3): 429–457.
Vogt, P. 2003. “Iterated learning and grounding: from holistic to compositional languages”. In: Kirby, S. (ed.), Language evolution and computation: ESSLLI workshop proceedings 76–86.
Wray, A. 1998. “Protolanguage as a holistic system for social interaction”. Language and Communication 18: 47–67.
Zuidema, W. 2001. “Emergent syntax: the unremitting value of computational modelling for understanding the origins of complex language”. In: Kelemen, J.; and Sosik, P. (eds), Advances in artificial life (vol. 2159) 641–644. Berlin: Springer.
Form, meaning and speakers in the evolution of language
Commentary on Kirby, Smith and Brighton
William Croft
University of Manchester/Center for Advanced Study in the Behavioral Sciences
Kirby et al. discuss the problem of the evolutionary origin of language and offer a simulation of a method, iterated learning, to deal with the problem of how linguistic structure emerged. I will examine here what Kirby et al.’s simulation might tell us about this problem, in both generative and functionalist terms. Kirby et al. frame their theoretical discussion in terms of the generative model of UG and the assumption of the child-based approach to language change (where language change is effected by children intuiting a different grammar from their parents in acquisition). Yet there are many serious empirical problems with both UG as a model of language universals and the child-based model. Four decades of cross-linguistic research has demonstrated that language universals can only be formulated as universals constraining variation, not as absolute universals of the form ‘All languages have X’. Although parameters have been introduced to accommodate such variation, the empirical predictions (where tested) rarely succeed, partly due to sampling problems and partly due to the rarity of biconditional universals (see Croft 2003, especially §3.5). The child-based model is also problematic. The sort of changes that are attested in language history are not the same as those found in child language behavior. Children are remarkably good at intuiting the same grammar as their parents even though they are almost never given direct negative evidence. This is called the ‘no negative evidence’ problem in child language acquisition. Whatever the solution to this problem is, it indicates that language acquisition is remarkably robust, and does not appear to be the cause of language change. Finally, children as a social group do not become agents of linguistic change (that is, driving forward the propagation of an innovation) until adolescence, and by that time, the child language acquisition process has largely ended (see Croft 2000, §3.2 and references cited therein). Does Kirby et al.’s model presuppose a generative UG or the child-based model of language change, with its serious empirical problems? Not really. Kirby et al.
argue that innate capacities need not be domain-specific. If not, then their notion of UG is essentially the functionalist position: innate human capacity for language is part and parcel of more general, presumably innate cognitive and social capacities. Kirby et al.’s simulation model involves agents who revise their internal grammars upon hearing an utterance produced by another agent. Kirby et al. describe the interlocutors as ‘adult’ and ‘learner’, following the child-based model. But this isn’t necessary: the interlocutors could equally be described as adult speakers adjusting their grammatical knowledge by exposure to language use — precisely the usage-based model of language change advocated by functionalists. So Kirby et al.’s model does not presuppose a generative approach to UG or language change.
What do Kirby et al.’s model and simulation tell us about language evolution, then? They present one specific example of a simulation, from Kirby (2002). This example demonstrates that a system of agents starting with random expressions for individual meanings can evolve a recursive compositional syntax, a basic property of language. A functionalist might describe what the model demonstrates as the emergence of iconicity, since the structure of the expressions reflects the structure of the semantic representation, a predicate calculus. However, much is built into the model. The recursive compositionality is already there — in the predicate calculus representation of meaning. The model uses a generalization algorithm based on the structure of the semantic representation, improving its grammar where the two match (Kirby 2002: 179–82). So a bias for an iconic mapping is already there as well. Kirby et al.’s simulation demonstrates that a system given a recursive, compositional ‘language of thought’ (the semantic representation), and the opportunity to construct an iconic mapping between the ‘language of thought’ and another language, is able to do so under suitable circumstances. This is an interesting result, but one must not read too much into it. Kirby et al. suggest that a holistic protolanguage (where a string denotes a whole semantic proposition without being analyzable) of the sort that Wray (1998) proposes would be unstable. But Wray did not intend her holistic protolanguage to be unstable, and its instability in the simulation is due to the presumption of the predicate calculus semantic representation plus the generalization algorithm based on it. Kirby et al. also suggest that their model demonstrates that “social transmission favors linguistic generalization” (p. 128). But the meanings are randomly generated by the speaker, and the listener is given the meaning along with the speaker’s expression to analyze. The ability to generalize is built into the grammar-constructing algorithm and is a function of the relation between the expression and the meaning, not the relation between the listener and the speaker.
I believe the real question is, where did the ‘language of thought’ come from? The world does not come parsed into language-like predicate-argument structures. I suspect that by the time our ancestors had analyzed the world into such
structures, they were well on their way to formulating them as multiword utterances. In other words, the evolution of conceptualization and the evolution of language probably proceeded hand in hand. And a substantive model of social interaction should probably play a significant role in determining how the world is conceptualized and analyzed for the specific purpose of human communication through language.
Kirby et al. suggest that we can flip the locus of selection from the grammars to the linguistic utterances themselves, referring to my proposals along those lines. I obviously agree that this is a good, in fact better, way to look at language change. Kirby et al., however, worry about “transformation from internal representation to external behavior and back again” (p. 132), and about finding “examples where the needs of language (to survive from generation to generation) and the needs of its users (to communicate easily and successfully) diverge” (p. 133). But both of these concerns are based on a faulty theory (not theirs!) of the selection process and how it applies to language and other cultural phenomena.
There is no ‘transformation’ from internal representation to external behavior. There is replication by speakers of linguistic structures in utterances. That replication process is of course mediated by the speakers, more specifically by their knowledge about their language. But there are many examples of mediated replication, including the canonical biological example of DNA; not all replication is self-replication. There is nothing incompatible with the view that linguistic structures in utterances are replicated by speakers. Speakers’ knowledge is an important part of the evolutionary process, of course, but it plays a different role.
This role can be defined in response to Kirby et al.’s second worry. Not all replication is governed by self-selection. Evolution is a two-step process: replication and the variation generated in replication, and environmental interaction leading to selection (propagation or extinction of variants). There are two distinct roles for the two steps of the process: replicator and interactor. In gene-based biological evolution, genes are replicators. But as Hull (1988) points out, interactors occur at many levels of the biological hierarchy. Genes may be interactors, but so are cells and organisms. Genes, cells, and especially organisms all interact with the environment in such a way as to cause differential replication, that is, selection, of the relevant replicators (genes). In language change, speakers are interactors, in fact one of the most important interactors in the process. In other words, speakers play a different role in an evolutionary model of language change than language, that is, linguistic structures in utterances. It is not a matter of different needs of language and language users (speakers). Rather, speakers’ interactions with their environment — what is to be communicated and, above all, who they are speaking to — cause selection of linguistic structures in utterances. Both speakers and utterances play essential roles in an evolutionary model of language.
References

Croft, William. 2000. Explaining language change: an evolutionary approach. Harlow, Essex: Longman.
Croft, William. 2003. Typology and universals (2nd edition). Cambridge: CUP.
Hull, David L. 1988. Science as a process: an evolutionary account of the social and conceptual development of science. Chicago: University of Chicago Press.
Kirby, Simon. 2002. “Learning, bottlenecks and the evolution of recursive syntax”. In: Briscoe, Ted (ed.), Linguistic evolution through language acquisition 173–203. Cambridge: CUP.
Wray, Alison. 1998. “Protolanguage as a holistic system for social interaction”. Language and Communication 18: 47–67.
Authors’ response
Simon Kirby, Kenny Smith and Henry Brighton
University of Edinburgh
In our article, we argue for an approach to linguistic explanation that takes seriously the fact that language arises from the interaction of complex dynamical systems. One of the unusual features of this type of explanation is that there is often an opaque relationship between the components of a theory and the predictions that theory makes. To reiterate our central point: the consequence of this is that a model of the language learner and the resultant structure of language cannot be directly equated. In other words, it is a mistake to use language universals as evidence for any kind of theory of UG (or, equivalently, any kind of theory of language use) without exploring the mechanisms that link the two.
We place a good deal of emphasis on the use of modelling to find a solution to this opaqueness problem. Computational techniques based on multi-agent modelling are particularly effective for relating local behaviour and global dynamics. This is why we believe them to be an important and appropriate tool for theoretical linguistics. Developing computational models is not without its problems, however. One necessarily needs to balance the simplicity of a model against its realism, and the specific choices made in constructing the model against the generality of the results. We cite many computational models throughout the article, all of which represent different approaches to getting this balance right. Ultimately, it is the convergence of results in so many different models that convinces us that our general conclusions are valid.
Understandably, however, Croft raises some issues with the particular model that we chose to describe in depth. For example, his commentary suggests that compositionality is there from the start in the simulation — specifically, in the type of meaning space we used. We should be clear that, for us, compositionality is a property of the relationship between meanings and utterances, and therefore cannot be said to hold for the meanings alone. This is merely a terminological problem. So, in Croft’s terms we might say that we have shown the emergence of iconicity in a model that starts with a non-iconic language.
Nevertheless, it seems that the substance of this criticism is that, by using such a highly structured representation of meanings, and by implementing a learning model that seeks appropriate generalisations, the result is somehow unsurprising. If this is the case, then our key point about the complex nature of the relationship between learning and emergent universals is undermined. In other words, the dynamical system of transmission has little explanatory role to play. It is important to note, however, that it is far from inevitable that a learner will acquire a compositional (iconic) language in this model. For many generations, the languages do not look like that. It is not even the case that perfect compositionality is the inevitable end-point of the transmission dynamic. Brighton (2002) and Smith, Brighton & Kirby (forthcoming) show that the relative stability of compositional languages varies depending on assumptions about meaning-space structure and the size of the learning bottleneck. In addition, we note that languages are not perfectly compositional. Kirby (2001) shows how varying the frequency of meanings in the model results in islands of non-compositionality in high-frequency parts of the space. From this we can see that the irregularity-by-frequency interaction in the morphology of many languages could also be a result of linguistic evolution driven by transmission pressures.
Croft suggests that starting with a meaning-space of a particular structure is the wrong approach. After all, where did this structure come from? We agree that this is a very important question, and one that could impact on the validity of our conclusions. If meaning structure develops hand-in-hand with utterance structure, then can we be sure that the transmission dynamic will behave the way we think it will? This is an area where we expect to see many exciting developments in the years to come. Researchers like Hurford (2003) are beginning to relate semantic constructs like predicate and argument to non-linguistic structure like that found in the brain’s perceptual system. As we point out in the article, Vogt (2003) and others are developing versions of the iterated learning model where meaning structure emerges out of agent interactions grounded in a real perceptual world. From initial results, it would appear that our general conclusions apply here as well.
Despite the central role of simulation models in our argument, they are really only a means to an end. Our aim, of course, is to begin to sketch out a truly explanatory theory of language — a theory that is based on an understanding of the dynamic relationship between the individual and the population. We suggest that such a theory will treat language itself as an adaptive system. It is perhaps inevitable that this brings to mind parallels with evolutionary biology and that arch mechanism for the explanation of adaptive structure: natural selection. Croft’s work has demonstrated that there are very useful analogies to be made between language and biology. We should keep in mind, however, that a theory of language is not right or wrong on the basis of how well it maps onto such an
analogy. It seems to us a self-evident truth that the mechanisms of linguistic and biological information transmission are very different, so we should not expect there to necessarily be a universal theory of selection that can be applied in both cases. In our discussion, we note that language exists in two different states: as mental representations and actual utterances. For language to persist, there must be mechanisms that transform one state into the other and vice versa. Croft appears to deny this, but really the difference between our positions may be one of emphasis. For us, language adapts to solve particular problems it faces: to be used and learnt. Whereas Croft’s work concentrates on instances of socially-driven language change, we seek explanations for the origins of fundamental and universal properties of linguistic structure.
References

Brighton, H. 2002. “Compositional syntax from cultural transmission”. Artificial Life 8: 25–54.
Hurford, J. R. 2003. “The neural basis of predicate-argument structure”. Behavioral and Brain Sciences 26: 261–316.
Kirby, S. 2001. “Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity”. IEEE Journal of Evolutionary Computation 5(2): 102–110.
Smith, K.; Brighton, H.; and Kirby, S. Forthcoming. “Complex systems in language evolution: the cultural emergence of compositional structure”. Advances in Complex Systems.
Vogt, P. 2003. “Iterated learning and grounding: from holistic to compositional languages”. In: Kirby, S. (ed.), Language evolution and computation: ESSLLI workshop proceedings 76–86.
Why assume UG?
Dieter Wunderlich

This paper deliberates, for a number of linguistic features, whether they are part of UG, i.e., specific to human language, or whether they are adapted from other cognitive capacities which were evolutionarily prior to language. Among other things, it is argued that the distinction between predication and reference already belongs to the conceptual system, whereas the distinction between verb and noun (which is not identical with the former one) is one of the innovations of UG. It is furthermore argued that syntax, in the sense that it deals with displacement (‘movement’), is a property of human language that lies outside of UG. The paper then discusses whether linguistic typology can contribute to our knowledge of UG, and whether aiming at this is a reasonable goal for typological research. It stands against the position of Newmeyer (this volume) that typological evidence is essentially irrelevant for the construction of UG, as well as against the position of Haspelmath (this volume), who argues that typological research can do without a concept of UG.
1. Introduction
What is meant by Universal Grammar (UG)? In short, UG is assumed to be the innate language faculty of human beings. If one tries to make the notion of UG a little more precise, many facets come to mind, two of which are the most prominent, and of course compatible with each other (see also Jackendoff 2002).

i. UG characterizes the set of possible human languages. This definition emphasizes the product of language acquisition. Typologists who study the set of existing human languages (which is clearly only a subset of the possible languages) might feel that UG is too weak a notion for delimiting their field of interest. But they may also believe that their own research contributes to our knowledge of UG. For instance, an unexpected structural feature of a hitherto little-known language gives us insight into what is possible for a human language.
ii. UG is a human-specific learning algorithm towards language. This definition emphasizes language acquisition itself. As an innate faculty, UG becomes
manifest in language acquisition, while languages of adult speakers depend on many more factors, such as linguistic experience and cultural contacts. Typologists might be less interested in language acquisition than, for instance, psycholinguists or neurolinguists. All innate faculties are genetically transferred, and a learning algorithm is a set of instructions of how a certain type of input is to be processed. If the input changes, the same learning algorithm yields different results. As is well-known by now, all linguistic activities are processed in certain areas of the brain, and they are based on a certain memorized inventory. UG, then, more precisely, is a description of the (genetically transferred) information for the brain of how it has to process chunks of memorized linguistic input. This is the explication of UG I am going to argue for in the following. With respect to this explication, I would like to add two remarks. First, UG does not simply support the understanding of an input (some stretch of speech together with contextual information), but rather the analysis of memorized input (although in the very beginning only little can be memorized) because all structural notions have to be detected by comparison and minimal contrast. This does not only concern lexical items and bound morphemes, but also the inventory of phonemes. The child will detect a phoneme of the input language only by inspecting some comparison set of items. Second, I do not think that the brain gets organized in implementing UG properties first, which are then modified according to the input, but rather I think that it gets organized in processing (memorized) linguistic input, supported by genetic UG information. That is, the organization of the brain, including the memory, goes hand in hand with implementing language-specific properties under the control of UG. Indeed, we feel that the neurolinguistic postulate is imperative: UG must be a specific predisposition of the human brain. In principle, everything of the language faculty that is innate must be translatable into genetically guided differentiation and organization of the human brain. And, consequently, everything that is characteristic of the language capacity of an individual being must be translatable into neuronal storage and processing. Even if linguists feel that they are dealing with features of quite specific linguistic objects such as sentences being read, they have to confess that the syntactic principles they are generalizing from these objects have ultimately to be regarded as processing principles. For instance, Fanselow, Kliegl & Schlesewsky (1999) clearly point out that a syntactic principle such as the minimal link condition has been grammaticalized from a processing principle. In this view, UG is one of the starting conditions for the human brain; it leads to a specific processing behaviour of the brain if it is confronted with linguistic input. The brain of any non-human being would react differently.1 Earlier considerations of UG have lead to an apparent paradox. Fanselow
"wun-r10"> "wun-r11">
Why assume UG?
(1992) pointed out that some putative syntactic universals claimed in the literature are so specific, and at the same time so complex, that it is unreasonable to assume them to be innate.2 On the other hand, some other putative syntactic universals, though they are general enough to be innate, can be traced back to other cognitive systems, especially to the visual or geometric system.3 However, this ‘paradox’ only indicates how little linguists know about universals. It in no way implies that linguistic universals do not exist. If certain syntactic universals turn out to be too specific, this fact rather characterizes a certain state of the art, and one is entitled to look for more general or abstract principles. And the fact that linguistic principles can make use of cognitive resources that were evolutionarily prior to the language faculty is not surprising at all; on the contrary, it is to be expected.

Nevertheless, there might be an important point in Fanselow’s observation. It could be the case that syntax (in the sense that it sets out conditions of locality and constrains movement within a sentence structure) is not the right domain in which language-specific universals can be found. One could argue that conditions of locality and movement also play an important role in the geometric system. Therefore, syntax (in the above sense) could have been established independently of UG; it might be an innovation in the tradition of language which spells out a much more general cognitive capacity.4 To illustrate this point: the following metaprinciples, given by Eisenbeiss (2002) on the basis of many insightful studies, are probably not specific to language, because quite similar principles can also be found in the visual system, for instance in the figure-ground distinction and in geometric transformations.

– Input (output) specificity: A rule α is not applied in the domain of the rule β if the domain (range) of α properly includes the domain of β. (Here, Fanselow 1992 already argued that this is not a principle specific to UG.)
– Structural dependency: Rules specific to a level of representation only refer to functional units of this level and the relations between them.
– Economy of representation and derivation: Representations only contain necessary symbols. Rules only apply in order to satisfy well-formedness conditions.
– Preservation of relations: Every mapping between levels of representation preserves the asymmetric relations that hold between the involved elements.5
Hauser et al. (2002) regard discrete infinity (recursion) as the core property of the linguistic computational system, but this property also characterizes the natural numbers; hence, it is not UG-specific. (One could argue that the development of the number system has profited from the linguistic capacity; however, it could just as well be the other way round. Note that infinite embedding is also found in the geometric system.) Within language, recursion can be observed at different levels: in compounds, with propositional operators, and at several places within clausal structure (relative clauses, serial verb constructions, verbs with propositional
complements). The latter, more complex type of recursion, which can affect both verbs and their arguments in any order (let us call it clausal recursion), is particularly interesting because it seems to be specific to language.

Linguistic typology is concerned with the diversity of existing human languages, trying to classify these languages according to certain prominent grammatical features. Many of these classifications lead to markedness scales, which are motivated on both internal and external grounds: internally in terms of more or less complex grammatical feature combinations, and externally in terms of factors such as frequency or cognitive biases. Linguistic typology may go on to establish universal conditional statements of the type ‘If a language exhibits the feature α, then it also exhibits the feature β’. Such a statement is falsified if a language turns up that exhibits α but not β. Simultaneously, this statement is also a hypothesis about the way in which the human brain works, especially the brain of a language learner: first, the brain has to identify the feature β, and only if it is successful can it identify the feature α.6

It is quite uncontroversial that linguistic diversity is affected by UG; the reason is that all possible language change is filtered by language acquisition. Whatever linguistic means the members of a community may have acquired, they must pass the filter of language acquisition in order to become significant in the course of time. Language acquisition turns out to be the bottleneck through which all linguistic innovations must be poured in order to become a property of a natural language (see also Kirby 1999, 2002). Different frequencies in the input varieties lead to different degrees of awareness in the language learners when they try to imitate the input. Language learners also try to detect the productive ‘rules’ by decomposing and categorizing the overheard and memorized utterance chunks, and, simultaneously, they try to generalize the categories involved, again depending on frequency.7 All this structure-sensitive linguistic processing in the child is assumed to be governed by UG. Linguistic variation, then, results from the interplay of UG with possible variations in the input of language learners.

In the following, Section 2 attempts to specify the possible contents of UG in view of the fundamental properties of human language, while Section 3 deals with the question of how typological knowledge helps us to restrict the contents of UG more narrowly.
2. UG and the language faculty

Although we have established a reasonable notion of UG, my exposition suggests that certain well-known syntactic principles may have been borrowed from other cognitive resources prior to language. Therefore, it becomes necessary to substantiate
"wun-r15"> "wun-r35"> "wun-r31"> "wun-r12">
Why assume UG?
the possible contents of UG in a way that does not rely on syntax. Following this program, I propose to reconsider some of the fundamental properties of human language, such as those outlined by Hockett (1960) and many other researchers.8

It seems that the driving factor in language acquisition is the child’s astonishing faculty of imitation. Every child tries to imitate gestures of all kinds, in particular those that have specific communicative content, to an extent that clearly exceeds that of other primates (Tomasello et al. 1993; Tomasello in press). There is reason to believe that this specific human imitation faculty evolved from a faculty that other primates already possessed. As Rizzolatti et al. (1996) observed, if an ape sees another ape handling objects in specific ways, a part of the motoric region of its brain becomes active, as if the ape were trying to imitate the hand movements of its partner. This observation has led to the so-called mirror-neuron hypothesis: some neurons of the motoric region serve to mirror the motoric actions of other individuals, given that these actions are intended to handle food. Two conclusions have been drawn from these findings. First, the further development of mirror neurons (occupying also neighboring regions of the brain) could have given rise to the evolution of other kinds of intentional actions, in particular those signalled by facial gestures. Second, manual gestures may have played an important role in the evolution of language because these gestures could easily be interpreted by internal reconstruction.9 According to this interpretation of the mirror-neuron hypothesis, it was only later that mirror neurons also developed for vocalic speech.10

The following scenario may help us to understand how this could have happened. First, vocalic gestures (besides their function as attention and structuring signals) may have accompanied manual gestures in order to support reference to absent participants and to modify gestural predication. The vocalic utterances may then have been detached from the gestures they were associated with, for instance to enable communication beyond face-to-face settings with mutual visibility. Whereas the manual gestures largely functioned iconically, the detached vocalic utterances could only carry out this task symbolically: while they still represented a concept similar to that of the gestures, the relationship between the vocalic utterances and the concept became arbitrary.11

If it is true that the evolution of the imitation faculty laid the basis for the evolution of language, it becomes clear at once why symmetry (Hockett’s interchangeability) is one of the basic pragmatic factors of language. Every human language is a speaker-hearer symmetric system in that it allows for fast turn-taking; speaker and hearer can exchange their roles at nearly every moment. For the same reason, personal and spatial deixis play an important role in all languages; these domains belong to the best-documented fields of cross-linguistic study (Fillmore 1982; Levinson 1998, 2003).
There are two other innate features of language that can be correlated with its motoric origin, given the hypothesis that manual gestures were prior to vocalic utterances.

i. Iconicity: Features of utterances mirror features of meaning. For sign languages it is evident that manual expressions are in many ways iconic;12 however, iconicity also plays an important role in the temporal order of phonetic expressions (such as ‘cause precedes result’, ‘agent comes first’), as well as in other phenomena based on cognitive scales. Iconicity not only allows for a first, default interpretation, but also enables effective parallel processing in which morphosyntactic parsing and the building-up of a semantic interpretation go hand in hand. Iconicity itself is certainly not UG-specific; however, in a more articulated system it lays the ground for compositionality (every piece of additional phonetic material is connected with some additional meaning), as well as for form-meaning isomorphism.

ii. Structure-sensitivity: Generalizations based on features of utterance structure are more feasible than those based on features of associated meanings or contexts. If the motor theory is right, the generation of a copy of the utterance is the primary factor of understanding; hence, it is always structural features (rather than purely semantic features) that determine which interpretation is to be derived. Moreover, it is this structure-sensitivity that allows us to establish ‘rules’ with discrete elements, which in turn serve to relieve our memory and, simultaneously, enable us to improve expressivity. Structure-sensitivity paired with distinctive features (see below) is one of the core properties of human language; this property must have been present at the time when vocalic utterances were detached from gestures (see above), because otherwise the communicative advantage for speech (that it was no longer restricted to visibility) would have been counterbalanced by a loss of expressivity.

The drive for imitation explains why the child is eager to communicate and to receive linguistic input. Imitation allows the brain to become organized for the processing of input and then to use the acquired routines for self-expression. In the process of acquisition the memory gets richer and richer, representing more and more utterance chunks with associated meanings or contexts. Here, then, another driving factor of language acquisition comes into play: economy of representation, a force that leads to a continuous reorganisation of the memory. Structural decomposition in the presence of structural similarities reduces memory load and, simultaneously, promises further success in imitation because it improves both the interpretation and the expression of intentions by using simpler units compositionally. Of course, economy itself is not specific to language, but its particular application to stored linguistic chunks seems to be specific.

Another precondition for language as a communicative means is the presence
"wun-r23">
Why assume UG?
of logical thinking in terms of predication and proposition. Predication means that some contextual instance is subsumed under a conceptual category, thus building up a particular proposition. These propositions constitute a language of mind, which almost certainly evolved prior to the language faculty. It is these propositions that form the possible content of linguistic utterances, and they therefore became much more differentiated alongside the means that allow them to be communicated.

Further candidates for UG properties come into play once structural decomposition has started in the mind of the language learner. These properties are often taken for granted because they are so omnipresent. However, I think that the combination of exactly these properties is responsible for UG. At least, it allows us to give UG some more substantial content. It is hard to see why exactly these properties should result from a brain organization developed to solve general-purpose tasks; it is much more plausible that they result from a specific brain organization and, hence, are true candidates for UG properties. Some of these special-purpose features of language are briefly discussed here.

– Distinctive features: The elementary linguistic units (‘phonemes’ or ‘signemes’) are characterized by robust categorical (distinctive) features rather than by fuzzy features (Eimas et al. 1971); evolutionarily this was an advantage because distinctive features allow us to ignore noise. In general, the working of the brain would rather result in fuzzy categorization. For instance, it depends on fuzzy categorization whether a collection of trees counts as a wood or not, or whether a certain container object counts as a cup, bowl, or vase (Labov 1973).

– Double Articulation: The elementary units themselves do not bear meaning; only some combinations of these units (such as syllables or feet) do, with the exception that functional meanings might be expressed by just one unit (or even just one phonological feature) in the case of affixes (probably resulting from a process of reduction). Evolutionarily this was an advantage because it allows for a large lexical inventory based on quite a small inventory of phonological features. It is, however, open to discussion whether this kind of ‘hierarchical’ lexical organization has been fixed in UG or rather emerges automatically when the lexical inventory increases.

– Predication and reference (the two elementary semantic functions): Semantically, all lexical items are predicates, making it possible to subsume an instance under some conceptual category. An instance has to be anchored in some context, that is, it is represented by an argument variable of the predicate, which allows us to relate propositions to external states of affairs in a rather flexible way. The instance may be given indexically or iconically, or (finally) by means of a symbol. This allows us to express elementary propositions. Under the premise that logical thinking evolved prior to the language faculty, UG must include some mechanism to relate logical propositions to linguistic expressions.
– Lexical Categories: The lexical inventory is partitioned into at least two (widely complementary) categorial types: nouns, prototypically relating to ‘spatial’ objects, and verbs, prototypically relating to ‘temporal’ events. If this distinction is pushed into the context of predication and reference, it is feasible to add some mechanism of conversion, by which nouns can be converted into verbs and, vice versa, verbs into nouns (so that instances of both categories can fulfil the semantic functions of predication and reference). Evolutionarily this was an advantage because it allows for a clause-internal combination noun + verb (as the articulated expression of a minimal proposition), as well as for clausal recursivity if the possibility of conversion exists. (Note that if conversion is possible, verbs can be subcategorized for deverbal nouns, which allows for all kinds of propositional attitude verbs, including those that are classified as raising or control verbs.) Certainly, the general operation of the brain already produces partitions in a memorized inventory if it becomes large enough, but these partitions could be based on any kind of semantic or structural features, and it is hard to see why they should end up just with nouns and verbs and the possibility of conversion (which is not dictated by fuzziness). In a more developed grammar, the categorial distinction between nouns and verbs is strengthened by category-specific functional categories, such as aspect and mood for verbs and definite articles for nouns. In my view it is rather improbable that these functional categories evolved first and only then shaped the lexicon into verbs and nouns (an idea which lies behind Baker’s 2003 proposal that lexical categories are determined by the syntax). It is still an open question whether functional categories already belong to UG or evolved later to specialize the function of lexical categories with regard to their role in predication and reference.

– Argument hierarchy: The arguments of a predicate are strictly ordered. For several cognitive reasons prior to the emergence of the language faculty, relational predicates must be possible in UG. Relational predicates are necessary to express social or part-whole relationships, as well as goal-directed actions, which must have played an important role in the cognitive mastering of elaborate tool-making. Therefore, a predicator should in principle be able to have two argument positions: an object-related predicator (a noun) should be able to have a possessor, and an event-related predicator (a verb) should be able to make a distinction between the actor (the causer, instigator, or controller of an event) and the undergoer (the patient, or theme of an event). Whatever semantic distinctions are made, and regardless of whether the two arguments participate in the same way (as in symmetric predicates such as meet), the two argument roles must be ordered. Argument hierarchy allows for a distinction of the possible instances of a relational predicate under every condition. I would be inclined to say that argument hierarchy is part of UG, so that relational predicators (such as transitive verbs as well as body-part and kinship nouns) are expected; UG thus predicts object/subject asymmetry, which goes well beyond the agent-undergoer distinction. However, it is certainly not the case that UG has any provision for ditransitive verbs, given the amount of linguistic variety in the realization of the simplest transaction predicate give, which has three argument variables. Not everything which is simple in logical or cognitive terms must be likewise simple in UG; if that is true, we have a strong argument why UG must be separated from cognitive resources.

– Adjunction: Predications can be combined under the condition of argument sharing, with one predicator being the head of the construction and the other being the non-head. Evolutionarily this allows for the expression of more complex propositions, with all constituents being anchored in one and the same context. It is conceivable that the human brain has a general tendency to produce figure-ground constellations, so that, for instance, predication functions as the figure and the context in which it is anchored as the ground. Such an asymmetry might have been generalized in the case of complex predication as an asymmetry between head and non-head. It could likewise be the case that the distinction between head and non-head is already forced by a UG constraint itself, which, for the sake of simplicity, is termed here be asymmetric. (Note that such a constraint can also serve in the categorial verb-noun distinction as well as in the context of argument hierarchy.)

– Reference-tracking: A series of predications can attach to the same instance. This property allows for discourse economy and, simultaneously, for a fast and unambiguous interpretation of a piece of discourse. Given that reference-tracking devices vary greatly across languages, one is not inclined to consider any specific one of them to be part of UG. For UG, it might be enough to allow for a combination of predicates, either by means of propositional attitude verbs or by means of adjunction. Consequently, these means must then be handled adequately to ensure both economic and unambiguous reference. A possible UG constraint that does this work could be termed parse reference; all more specific devices such as same subject — different subject, obviative, antecedent — anaphora, reflexive, control, etc. can then be regarded as complying with this general constraint.

– Quantification: A sentence such as every man thinks he is clever connects instances from {x is a man} with instances from {x thinks that x is clever}; generally, quantifiers connect a domain with a value, which inherently involves variable binding. Thus, quantifiers have a scope and allow bound pronouns (as distinct from anaphoric pronouns). This property almost certainly is specific to human language and is found in many variations (such as scalar adverbials, negative polarity items, focus-inducing particles, conditionals, etc.); see the
overview in Bach et al. (1995), a volume that grew out of an NSF-supported project on cross-linguistic quantification and semantic typology. I am uncertain about how much of this property has to be ascribed to UG itself, and how much evolved in the later interaction of linguistic means and cognitive requirements. At least, I think, UG must specify that predicates have argument variables.

None of the afore-mentioned properties of language is trivial. In the process of language acquisition they do not automatically emerge from a global neuronal organization, nor do they depend in obvious ways on other cognitive domains. Therefore, one should consider them to be candidates for an autonomous linguistic capacity that is genetically determined. However, the way in which these properties are implemented in the brain depends on the input to the language learner. Different input may lead to different implementations; therefore, we consider changes in the linguistic input to be the primary source of typological variation.13

It is generally taken for granted that UG characterizes a faculty of all human individuals, regardless of the ethnic group to which they belong, and that UG should therefore be traced back to the time when modern Homo sapiens came into existence, somewhere in East Africa about 150,000 years ago. Since records of human language do not date back further than 6,000 years (which comprises only the last 4% of Homo sapiens’ history), there lived more than 7,000 human generations that were in possession of UG and from which we lack any linguistic data. This is quite a long time for many of the typological features observed today to have developed through the interaction of UG with successively more articulated inputs.

Linguists generally believe that under normal circumstances every newborn human child is able to acquire any of the languages spoken today. This serves as one of the arguments that UG is a fixed device, identical for all human beings. I am not sure whether there is clear positive evidence that this assumption is true in every respect. I know of no investigation showing that, for instance, a European-born child masters the rather complex morphology of an Amerindian or Australian language in all respects as native learners do, including, among others, the parsing of these complex structures. It might well be the case that ultimately some differences show up in parsing inverse morphology, or in parsing a ‘same subject — different subject’ device; that is, descendants of the ethnic group in which these devices developed and descendants from other ethnic groups could behave slightly differently. Given the long history of Homo sapiens, involving thousands of generations, UG could have been affected by certain mutations (see also Jenkins 2001). At least, the idea of UG variation is not totally unwarranted if one considers UG to be a genetically determined device. Yet, as reasonable as such an idea is, it does not explain any of the features of typological variation considered so far. All these features appear to be compatible with the assumption
that UG is identical for all human beings, and that typological variation has been induced in the course of thousands of generations, each of them confronted with slightly more differentiated inputs.
3. UG and typological variation

The typologist who is concerned with the variation among languages can observe that a few features are common to most languages in some way or other, for instance the marking of person and number, or of aspect, tense, and mood. These referential features specify either the individual arguments or the predicative event in which these arguments are involved. In this context, cross-linguistic variation can be traced back to a set of universal features, instantiated in each particular language to a larger or smaller extent. Some languages exhibit a formal marking of dual as distinct from plural, while other languages include the dual category in the plural marking. Similarly, some languages exhibit formal tense marking for past and future, while other languages only have a differentiated aspect system, on the basis of which it is implied whether the event has taken place or will take place. Thus, whether one of the universal feature values is actually found in a language depends on the extent to which the respective feature domain has been generalized. In the course of feature generalization, for instance, plural (which captures any number greater than one) wins over dual, and past (which captures any type of event) wins over completive aspect — both have often been observed in the history of languages. Referential features fulfill the UG requirement parse reference, but it seems to be the task of the conceptual system (rather than of UG itself) to predict what the system of referential features has to look like. In any case, the language learner can easily identify the relevant referential feature values from a given input; typologists will not argue about this.

On the other hand, when considering certain morphosyntactic constructions across languages, the typologist will soon realize that there is too much variation for it to be traced back to a single system of constructional features. Based on detailed observations, the typologist can reasonably exclude certain constructional features from UG, and I would like to suggest that only he or she can. For instance, given that cross-linguistically arguments can be marked by means of pronominal affixes on the verb (head marking), or by means of morphological case on the argument NP (dependent marking), or by none of these devices but rather purely positionally, the typologist can exclude the existence of morphological case from UG. The (widely complementary) notions of (generalized) accusative versus ergative could nevertheless be compatible with UG, since all three constructional possibilities (head, dependent, or positional marking) can treat either agents or
patients (undergoers) similarly to the only argument of an intransitive verb.14 However, several varieties of head marking follow neither the accusative nor the ergative strategy. For instance, the so-called active systems (Lakhota) encode whether an argument instigates/controls an event or not; the voice systems (Philippine languages) encode the most prominent argument; and the inverse systems (Algonquian) encode whether the agent is higher or lower on a particular salience hierarchy than the patient. These typological observations exclude the notions of accusative and ergative altogether from UG.

Another instance of very high constructional variation is the way in which a third argument of a predicate is encoded, such as the recipient of a ‘give’ verb, or the causee of a causativized transitive verb. In a system based on animacy, the recipient is likely to be treated like the patient of transitive verbs because it is usually more animate than the theme, i.e., the object to be given. However, in a system based on accusative or ergative, the recipient is often treated as a medial argument, to be marked by dative or a medial position. Systems that do not so easily accept a third argument may either suppress one of the other two arguments in order to express the third one, or they may use a serial verb construction, in which the third argument is introduced by a second verb. Again, the typologist will find that UG says nothing about how to realize a third argument.

One can easily multiply these kinds of examples. Relative constructions can be formed by a clause added to a noun, in which this noun is gapped or indexed by a relative marker; or they can be formed by a full clause that includes the relativized noun; or they can be formed by a nominalization strategy; and so on. UG seems to imply nothing about how relative constructions have to be formed. The question, then, is how a language learner can identify relative constructions, and a possible answer is: all he has to do is relate predicates to their arguments; in the case of a relative construction he is confronted with an additional predicate and has to find its arguments. For conceptual reasons, individual predicates have one, two, or even three arguments, and all UG requires is parse arguments.15

The most remarkable domain in which UG is silent is whether the constructions of a language are morphologically or syntactically realized. ‘Morphologically’ means that the constructions are realized by head or dependent marking, whereas ‘syntactically’ means that they are realized by the positioning of uninflected elements. For encoding the role of arguments, some languages exhibit both case and agreement morphology, some languages exhibit either case or agreement morphology, and some languages exhibit neither. Reasonably, any distinction between morphology and syntax should not be seen as part of UG, and, consequently, any notions that are exclusively based on either morphological or syntactic properties should be absent from UG.

So far, typologists are able to expel many features from UG, features that have
been proposed on the basis of insufficient cross-linguistic knowledge. In other words, UG is less restrictive than usually assumed. It seems that human languages allow many more options with respect to constructions than with respect to referential properties.16 As a result, constructions must be seen from a more general perspective; one has to investigate whether certain linguistic functions performed by a variety of constructions are constrained by the same type of (semantic) factors.

As a starting point, let us note that the realization of morphological case often depends on (differently weighted) cognitive scales concerning animacy and referential specificity. One often finds instances of differential object marking: an object is marked by accusative only if it is animate or definite; otherwise it is unmarked (nominative). Likewise, one finds instances of differential subject marking: a subject is marked by ergative only if it is inanimate or indefinite; otherwise it is unmarked (nominative). How one can deal with these so-called linking splits in systematic ways will be discussed in the first excursus below. Similarly, word order depends on (differently weighted) factors of language processing such as information structure and locality. In terms of information structure, topic precedes focus, and focus precedes the rest of the predication. By virtue of locality, objects (such as patients or recipients) belong closer to the verb than subjects (such as agents). This will be discussed in the second excursus below.

According to the functional premises of Optimality Theory (OT), language variation is generally conceived of as an interplay of three factors: expressivity (faithfulness), economy (markedness), and alignment, and it is these factors that determine the optimal construction. Successful communication requires that every intended semantic feature is expressed; conversely, that no unnecessary semantic feature is expressed (because it leads to additional marking, and thus is costly); and, furthermore, that every semantic feature correlated with a predicative head α is expressed in the immediate locality of α (either to the right or to the left). Thus, any linguistic feature f is connected with a set of relevant constraints, such as max(+f) ‘Realize +f’, *(+f) ‘Do not realize +f’, and align(+f, α) ‘Align +f with α’. Languages differ in the way in which these universal constraints are ranked with respect to each other. The language learner is assumed to be able to detect these rankings on the basis of whether or not the constraints are violated in some pieces of the input (Tesar & Smolensky 1998a,b). A linguistic typology that is based on such a conception can be termed differential typology.

The types of universal constraints just mentioned implement some typical processing behavior of the brain, which, however, does not mean that these constraints specifically belong to UG. On the contrary, all three types of constraints mentioned above should be relegated to the general cognitive resources which also function outside of language. In the following two excurses I will illustrate a further
aspect of differential typology, namely, that the possible constraint rankings can be restricted by harmonic alignment of (independently given) scales. More specifically, I will argue that a specific linguistic scale (which probably is part of UG) interacts with some general cognitive scales.
Linking splits

First, I would like to set out the principal structure of linking splits. It is characteristic of transitive verbs that their arguments are realized asymmetrically, which also involves various factors concerning the values of the arguments; this allows the hearer to identify the respective arguments more easily. It is likely that the subject (or higher argument) of a transitive verb is also high in salience, being, for instance, an animate and specific entity, while the object (or lower argument) is low in salience, being an inanimate or unspecific entity. That is, under normal circumstances one can infer that the more animate or more specific argument functions as the subject, and the less animate or less specific argument as the object. Given that arguments only need to be marked if they are instantiated by non-prototypical values, languages often mark their arguments (by means of a specific morphological case, or a specific set of pronominal affixes) only if they exhibit unexpected values, that is, if subjects are low in salience, or if objects are high in salience (Comrie 1989; Dixon 1994). This phenomenon of marking an argument only under special circumstances is known as differential subject marking or differential object marking, respectively, both yielding a linking split: some instances of an argument type are marked, while other instances of the same argument type are unmarked.17

It seems that linking splits constitute a domain in which considerations about UG and cross-linguistic observations made by typologists can successfully cooperate. Basically following a proposal made by Aissen (1999, 2003), and revising it for various reasons that need not concern us here, Stiebels (2000, 2002) made substantial progress in unifying all instances of differential argument marking. Regardless of whether a language exhibits the ergative or the accusative type of marking, of whether it includes dative, and of whether it realizes the marking by means of morphological case or by means of specific sets of pronominal affixes, it is reasonable to assume that all the following considerations hold cross-linguistically.

Let us first assume that argument roles are encoded by means of the features [+hr] ‘there is a higher role’ and [+lr] ‘there is a lower role’. Let us furthermore assume that both morphological case and pronominal affixes are specified by the same sort of features: [+hr] for ‘accusative’ and [+lr] for ‘ergative’, with unmarked [ ] for nominative (or absolutive). This ensures that the subject can only be realized by ergative or nominative, and that the object can only be realized by accusative or nominative, given the normal understanding of feature unification.
(1)  λy       λx       eat(x,y)
     +hr      +lr
     acc/nom  erg/nom
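To make this notion of feature unification concrete, here is a minimal sketch in Python (my illustration, not Wunderlich’s own formalization; all names are invented): a case can realize an argument role whenever the case’s feature specification is a subset of the role’s features.

```python
# Illustrative sketch only: the feature names follow (1); the dictionary
# layout and function name are invented for exposition.

ROLE_FEATURES = {
    "subject": {"+lr"},   # 'there is a lower role'
    "object":  {"+hr"},   # 'there is a higher role'
}

CASE_FEATURES = {
    "ergative":   {"+lr"},
    "accusative": {"+hr"},
    "nominative": set(),  # unmarked [ ]
}

def can_realize(case, role):
    """Unification in the simplest sense: a case is compatible with a role
    if every feature the case is specified for also holds of the role."""
    return CASE_FEATURES[case] <= ROLE_FEATURES[role]

for role in ROLE_FEATURES:
    options = [case for case in CASE_FEATURES if can_realize(case, role)]
    print(role, "->", options)
# subject -> ['ergative', 'nominative']
# object -> ['accusative', 'nominative']
```

Running it lists exactly the case options in (1): ergative or nominative for the subject, accusative or nominative for the object.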
The two argument role features are inherently ordered, as shown in (2a).18 The marking of an object is preferred over the marking of a subject because subjects can more easily be identified with contextually given elements, and thus are often dropped from realization. In order to implement the function of salience, one of the scales in (2b) is assumed, with the value A being cognitively more prominent than the value B.19

(2) a. [+hr] > [+lr]
       ‘Marking an object is preferred over marking a subject.’
    b. A > B
       ‘The value A is more prominent than the value B.’
       1/2 person > 3 person
       pronoun > full noun
       animate > inanimate
       specific > unspecific
       dynamic > static
       imperfective > perfective
In the following, one has to understand that one of the argument roles is to be realized in the context of one of the salience scales A > B. We assume that the two scales are harmonically aligned (Prince & Smolensky 1993), which yields the two contextualized scales given in (3).

(3) a. (+hr)/A > (+hr)/B
    b. (+lr)/B > (+lr)/A
This result can be reinterpreted in terms of markedness hierarchies, as in (4), with the reading ‘Do not realize the feature +f’ for *(+f). Obviously, differential object marking behaves just the reverse of differential subject marking.

(4) a. Differential object marking: *(+hr)/B » *(+hr)/A
       ‘Avoiding accusative in a B-context is better than avoiding it in an A-context.’
    b. Differential subject marking: *(+lr)/A » *(+lr)/B
       ‘Avoiding ergative in an A-context is better than avoiding it in a B-context.’
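For readers who want to see the alignment step done mechanically, the following sketch (my own; the function name and the use of A/B as stand-ins for any of the salience scales in (2b) are assumptions) derives the subhierarchies in (4) from the two input scales:

```python
# Sketch of harmonic alignment (Prince & Smolensky 1993) as used in (2)-(4).

def harmonic_align(role_scale, salience_scale):
    """Align the two-point scale X > Y with a salience scale A > B > ...
    For the high element X, the markedness constraints *X/context are ranked
    from the least salient context downward (= 4a); for the low element Y,
    from the most salient context downward (= 4b)."""
    x, y = role_scale
    high = [f"*({x})/{c}" for c in reversed(salience_scale)]
    low = [f"*({y})/{c}" for c in salience_scale]
    return high, low

obj, subj = harmonic_align(("+hr", "+lr"), ["A", "B"])
print(" >> ".join(obj))   # *(+hr)/B >> *(+hr)/A   = (4a)
print(" >> ".join(subj))  # *(+lr)/A >> *(+lr)/B   = (4b)
```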
Finally, linguistic variation arises if one of the constraints max(+hr) ‘Realize accusative’ and max(+lr) ‘Realize ergative’ intervenes at different points of the
scale; in principle, one of the possible options must be chosen for object marking and one for subject marking, with respect to all possible values of A and B. (5) illustrates this for object marking only.

(5) a. max(+hr) » *(+hr)/B » *(+hr)/A: Accusative is marked on all objects.
    b. *(+hr)/B » max(+hr) » *(+hr)/A: Accusative is only marked on objects of high salience.
    c. *(+hr)/B » *(+hr)/A » max(+hr): Accusative is never marked.
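A toy evaluation, again my own sketch rather than anything from the paper, shows how the three rankings in (5) decide whether an object in an A- or B-context receives accusative: the highest-ranked decisive constraint wins.

```python
# max(+hr) demands accusative; *(+hr)/<context> forbids it in that context.

def marks_accusative(ranking, context):
    for constraint in ranking:
        if constraint == "max(+hr)":
            return True                      # marking demanded
        if constraint == f"*(+hr)/{context}":
            return False                     # marking forbidden here
    return False                             # nothing demands marking

rankings = {
    "(5a)": ["max(+hr)", "*(+hr)/B", "*(+hr)/A"],
    "(5b)": ["*(+hr)/B", "max(+hr)", "*(+hr)/A"],
    "(5c)": ["*(+hr)/B", "*(+hr)/A", "max(+hr)"],
}
for name, ranking in rankings.items():
    result = {ctx: marks_accusative(ranking, ctx) for ctx in ("A", "B")}
    print(name, result)
# (5a) {'A': True, 'B': True}    -- accusative on all objects
# (5b) {'A': True, 'B': False}   -- only on high-salience (A) objects
# (5c) {'A': False, 'B': False}  -- never
```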
As an example, option (5a) may be relevant for animacy, but option (5b) for specificity; in this case, object marking is only sensitive to specificity. These two options could, however, just as well be reversed, resulting in a language in which object marking is only sensitive to animacy. In this way, a typology based on these considerations offers a multiplicity of individual variants. Such a typology is compatible with all the facts found in various languages (see the extended demonstrations in Aissen 2003; Stiebels 2000, 2002; Morimoto 2002a; Wunderlich 2003; and many other papers); hence, it has uncovered linguistic universals in a realistic sense.

The question is where UG fits into these considerations. Harmonic alignment itself seems to belong to the cognitive resources that are prior to (or at least independent of) UG; harmonic alignment serves to make judgments by combining features from different sets ordered in scales. Furthermore, all possible scales A > B derive from certain instantiations of pragmatic and semantic features, which are generally relevant for communication. For all I know, however, the particular scale [+hr] > [+lr] is specific to language, and therefore a true candidate for UG. Once the concept of a transitive verb has been detected, such a scale must be part of this concept.

Proponents of Functional Grammar often overestimate the influence of cognitive strategies in explaining typological variation. A case in point is Jäger’s (2003) attempt to derive the existing patterns of differential case marking (ergative/accusative vs. nominative) as possible equilibria in an evolutionary game, in which both speaker and hearer strategies are optimally fulfilled. The crucial point in his account is the initial condition under which each of the games starts: in order to determine the weight of strategies, Jäger uses the quantitative distribution of pronominal vs. nominal subjects/objects found in corpora of English and Swedish, which is representative of the quantitative distribution of other salience factors for subjects and objects.20
(6) Distribution of pronominal vs. nominal subjects and objects in Geoffrey Sampson’s christine corpus of spoken English.

                              pronominal objects (pO)    nominal objects (nO)
    pronominal subjects (pS)            198                      716
    nominal subjects (nS)                16                       75
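One can quickly verify the marginal totals and ratios that the discussion below relies on; the snippet is a simple check, with the cell counts taken from (6) and all variable names invented.

```python
# Marginals of the 2x2 table in (6).
counts = {("pS", "pO"): 198, ("pS", "nO"): 716,
          ("nS", "pO"): 16,  ("nS", "nO"): 75}

pS = counts[("pS", "pO")] + counts[("pS", "nO")]   # 914 pronominal subjects
nS = counts[("nS", "pO")] + counts[("nS", "nO")]   # 91 nominal subjects
pO = counts[("pS", "pO")] + counts[("nS", "pO")]   # 214 pronominal objects
nO = counts[("pS", "nO")] + counts[("nS", "nO")]   # 791 nominal objects

print(round(pS / pO, 1))   # 4.3: pronominal subjects outnumber pronominal objects
print(round(nO / nS, 1))   # 8.7: nominal objects outnumber nominal subjects
```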
It is well known that corpus data confound several factors. One of these factors is that speakers who follow the salience scales given in (2b) of course produce more pronominal subjects than pronominal objects (914/214 = 4.3), and more nominal objects than nominal subjects (791/91 ≈ 8.7). Moreover, the second ratio clearly exceeds the first one, which shows a further object/subject asymmetry.21 The preference for realizing objects (rather than subjects) as full nouns is thus even stronger than the preference for realizing subjects (rather than objects) as pronouns. This is in line with the scale in (2a), saying that object marking is preferred because it is less costly than subject marking.

The game, then, introduces the possibility of ergative and accusative marking, both of which are assumed to be more costly than zero marking (nominative). It is not surprising that the game develops towards an equilibrium in which only the rather few nominal subjects are ergative-marked and only the rather few pronominal objects are accusative-marked. Moreover, with a different cost factor only accusative turns up in a stable equilibrium, i.e. accusative is favoured over ergative. The game thus produces results which are typologically valid. Jäger concludes that no assumption about universal principles is needed in order to derive these results.

However, the corpus data reveal not only the operation of a salience scale but also the object/subject asymmetry, which in my interpretation of Stiebels’ account is a possible UG factor. Since every game starts with the same distribution of weights (according to the distribution in (6)), all the factors which I claim to be universal are already built into the game. If one interprets successive games as ‘iterative learning’ (which is not quite the way in which Jäger interprets his account), one easily sees that every input — not only the first one — reflects this universal factor. In other words, if Jäger were ready to spell out the hidden factors involved in his study, he would arrive at the same results as I did. Nevertheless, studies like Jäger’s are valuable, and also necessary, because they allow us to see the rich typological variation as the product of very few basic assumptions. It is quite clear from both what I have said and what Jäger has shown that UG does not need to contain any stipulation about (generalized or abstract) case, contrary to what many generative grammarians still think.

With regard to the same set of phenomena, Haspelmath (this volume) votes for a type of functional explanation that does not rely on any theoretical framework.
“Differential case-marking […] basically says that case-marking on direct objects is the more likely, the higher the object referent is on the animacy scale. A functional explanation for this is that the more animate a referent is, the less likely it is that it will occur as a direct object, and it is particularly unlikely grammatical constellations that need overt coding…” (p. 98)
There is much intuitive insight in that type of explanation. Why is the more animate referent less likely in the position of a direct object? This is because of the universal object/subject asymmetry just described. Whatever the source of an intuitive explanation is (typological observation, frequency data, or theoretical awareness), it deserves to be mentioned and to be explicated. Moreover, the simple statement in the first of the quoted sentences is likely to become more complex in the presence of more detailed information, which then has to lead to more complex explanations. It is the framework of Optimality Theory that helps us to manage this complexity.

A good illustrative example is the marking of direct objects in Hindi. Roughly, the direct object of transitive verbs is realized in the accusative only if the referent of the object NP is human, animate-specific, or inanimate-definite, whereas the direct object of ditransitive verbs is always in the nominative (Mohanan 1994).

(7) Ditransitive verbs in Hindi
    a. ilaa-ne  mãã-ko      baccaa    /*bacce-ko  diyaa.
       Ila-erg  mother-acc  child.nom /*child-acc give.perf
       ‘Ila gave a/the child to the mother.’
    b. ilaa     mãã-ko      baccaa    /*bacce-ko  detaa       hai.
       Ila.nom  mother-acc  child.nom /*child-acc give.imperf be.pres
       ‘Ila gives a/the child to the mother.’
First, there is obviously more than one scale concerned, and these scales must have different cut-off points in the sense described in (5). Second, there must be an explanation for why all these scales become irrelevant for direct objects of ditransitive verbs. Intuitively, double accusative is forbidden, whereas double nominative is allowed. Is this a functional explanation, and on what basis? Note that double accusative is allowed in many languages. And why is the indirect object of Hindi more likely to be coded by accusative than the direct object? Thus, the coding of the direct object in Hindi begins to become complex as soon as we take ditransitive verbs into consideration, and it is this observation that forces a more elaborate technique for handling the several interacting conditions.22 We are not interested in oversimplified statements.
"wun-r7"> "wun-r22">
Why assume UG?
Information structure and word order

Another domain in which typology and UG considerations can go hand in hand is the study of information structure. In a broad sense, the topic of an utterance is related to the given information, and the focus of an utterance is related to the new information. A topic can be left unspecified, while a focus needs to be expressed. First, harmonic alignment shows that the arguments of a transitive verb attract topic and focus differently. In order to see this, let the two scales in (8) be aligned, similarly to the scales in (2) above:

(8) a. Argument roles: [+hr] > [+lr]
    b. Discourse prominence: +foc > +top
       ‘Focus is more salient than topic.’
Harmonic alignment yields the following markedness hierarchies:

(9) a. *(+hr)/+top » *(+hr)/+foc
       This implies that objects are better candidates for focus than for topic.
    b. *(+lr)/+foc » *(+lr)/+top
       This implies that subjects are better candidates for topic than for focus.
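The same alignment sketch used for linking splits derives (9) when fed the scales in (8); this is again purely illustrative:

```python
# Reusing the alignment step from the linking-split sketch above.

def harmonic_align(role_scale, salience_scale):
    x, y = role_scale
    return ([f"*({x})/{c}" for c in reversed(salience_scale)],
            [f"*({y})/{c}" for c in salience_scale])

obj, subj = harmonic_align(("+hr", "+lr"), ["+foc", "+top"])
print(" >> ".join(obj))   # *(+hr)/+top >> *(+hr)/+foc   = (9a)
print(" >> ".join(subj))  # *(+lr)/+foc >> *(+lr)/+top   = (9b)
```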
Both topic and focus can be indicated by a lexical marker, and focus can also be expressed by a cleft construction or by intonational means. In addition, both topic and focus can be expressed by syntactic word order (but usually not by morphological means, for reasons not to be discussed here). It is therefore particularly interesting how information structure interacts with word order in realizing argument structure.

A clear word order preference arises in the case of V-initial type languages. Iconicity predicts that topic (related to the given information) precedes focus (related to the new information). Since focus usually fills a slot in a presupposed predication, the operator-scope principle predicts that focus precedes the rest of the predication. One therefore expects that V-initial type languages exhibit the ordering topic-focus-V, which is indeed true (see Kiss 2002 for Hungarian; Dahlstrom 1995 for Algonquian; Aissen 1987 for Tzotzil). The positions for topic and focus can, however, be generalized to argument positions, in accordance with (9): the topic position, in which subjects are more often found than objects, can be generalized to a subject position, and the focus position, in which objects are more often found than subjects, can be generalized to an object position. SVO type languages would then develop by generalizing only the position of topic, and SOV type languages would develop by generalizing the positions of both topic and focus.

This is an interesting result, suggesting that syntactic argument positions emerged relatively late in the history of languages. All (or at least most) human languages may have started with head marking, that is, with a state in which arguments are encoded by pronominal affixes on the verb, while full noun phrases
were only rarely used (as adjuncts). If that is true, V-initial word order is the most fundamental one, and preverbal positions could have developed from the need to express topic and focus. As shown above, these positions could then have been generalized in terms of subject and object, so that all three dominant word order types in Greenberg’s sense, namely SOV, SVO, and VSO, could have resulted from generalizations during language history.23

Another interesting result arises for syntactic SVO type languages, that is, languages that lack any morphological case. In these languages it often suffices to use the feature values [+hr] and [−hr]. All [+hr] arguments (the objects) are projected into postverbal positions, and only the subject (being [−hr]) is projected into a preverbal position.

(10)  λy           λx           eat(x,y)
      [+hr]        [−hr]
      postverbal   preverbal
The two markedness hierarchies derived in (9) can then be reinterpreted as follows:

(11) a. *(+hr)/+top » *(+hr)/+foc
        In an SVO type language, postverbal positions can be blocked from hosting the topic.
     b. *(−hr)/+foc » *(−hr)/+top
        In an SVO type language, preverbal positions can be blocked from hosting the focus.
In other words, SVO type languages may allow the word order topic-V-focus, even if the object is topicalized and the subject is in focus, but never the word order *focus-V-topic, including situations where the subject is in focus and the object is topicalized. Examples that correspond to the former situation can be found in the so-called inverted subject-object construction of Bantu languages (Morimoto 2001, 2002b). Finally, SOV type languages do not offer any specific syntactic position for topic or focus, nor do they block one of their syntactic positions from hosting topic or focus. Therefore, these languages (such as Japanese) have to develop lexical or constructional means, such as a topic marker or a focus construction, to express topic and focus.

In these two excurses, I have shown that the assumption of a UG-determined scale [+hr] > [+lr], interacting with other cognitive scales or principles, opens up interesting fields of typological study. The harmonic alignment of scales is a means by which parameters specific to language can interact with external parameters independent of language. The interaction of constraints also offers a perspective of optimal interpretation: given some sequence of NPs, the hearer has to decide about their argument roles and, simultaneously, about their informational
status. This twofold task seems to be one of the central topics of contemporary linguistic research.

Now, let us summarize what linguistic typology can teach us about UG.

– Linguistic typology can identify a universal set of linguistic features, including referential features, determined conceptually, but also categorial features, which are candidates for UG (such as features that specifically distinguish between nouns and verbs, and features that relate to argument hierarchy).
– Linguistic typology can establish a realistic view of variation. In certain subdomains, all languages select representatives from a small bundle of features, while in other subdomains the languages differ considerably. Only typological research can give evidence about the actual range of constructions and, in particular, can show us how unrestrictive UG actually is in certain constructional domains.
– Differential linguistic typology can offer a set of universal types of constraints, to be instantiated for any relevant individual feature. These constraints are simple enough to be implemented in the processing behaviour of the brain. Moreover, the possible rankings of these constraints can be restricted by harmonic alignment. Importantly, some of the scales to be aligned in this view may turn out to be UG innovations.
In general, linguistic typology helps us to elaborate the notion of UG vis-à-vis cognitive resources and vis-à-vis linguistic variation. Without linguistic typology, all considerations of UG are blind because of the lack of knowledge about languages; and without some conception of UG, all linguistic typology is mindless because it is purely descriptive. In other words, typology without some concept of UG is misguided. Linguistic typology can also make predictions about language acquisition. The language learner must have a device to evaluate

– in what respect cognitive scales can be important for the realization of linguistic features such as morphological case, and
– in what respect discourse factors such as information status can be important for variations in word order.
More generally, the language learner must be able to construct for each linguistic feature f a set of relevant constraints (such as max(+f), *(+f), and align(+f, α)) and to determine the ranking of these constraints on the basis of instances in which some of them are violated. These constraints are assumed to be universal, though not necessarily part of UG, whereas the ranking of the constraints is assumed to be language-specific.
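The reference to Tesar & Smolensky invites a sketch of how such ranking detection might proceed. The following toy version of recursive constraint demotion (my simplification in the spirit of Tesar & Smolensky’s 1998 proposal, with invented learning data) recovers ranking (5b) from two observed winner-loser pairs:

```python
# A toy recursive constraint demotion; constraint names echo (5), the data
# are invented for illustration.

def rcd(constraints, data):
    """data: (W, L) pairs, where W holds the constraints preferring the
    observed winner and L those preferring the rejected loser."""
    strata, remaining, pending = [], set(constraints), list(data)
    while remaining:
        # Rank next all constraints that never prefer a loser.
        stratum = {c for c in remaining if all(c not in L for _, L in pending)}
        if not stratum:
            raise ValueError("data are inconsistent with any ranking")
        strata.append(sorted(stratum))
        remaining -= stratum
        # A pair is explained once a ranked constraint prefers its winner.
        pending = [(W, L) for W, L in pending if not (W & stratum)]
    return strata

# Invented input: accusative appears on salient (A) objects only.
data = [({"*(+hr)/B"}, {"max(+hr)"}),   # low-salience object stays unmarked
        ({"max(+hr)"}, {"*(+hr)/A"})]   # high-salience object gets marked
print(rcd(["max(+hr)", "*(+hr)/A", "*(+hr)/B"], data))
# [['*(+hr)/B'], ['max(+hr)'], ['*(+hr)/A']]  -- i.e. ranking (5b)
```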
4. Conclusions

Under the assumption that linguistic diversity is the result of language change, and that all language change must pass the filter of language acquisition, UG becomes an important notion for linguistic typology as well. Linguistic typology, which compares structural properties of languages on a broad basis, can give us evidence for what can reasonably be assumed to be part of UG. It can establish a set of categorial features and a set of constraints that determine how these features are realized under varying conditions. It is, then, a matter of discussion how much of this framework is dealt with by other cognitive domains and what remains specific to the linguistic domain.

Fundamentally, language is a system of sound-meaning connections (which sets it apart from other cognitive systems); that is, all internal linguistic representations serve the mapping between the sensory-motor interface and the conceptual-intentional interface. It is almost certain that progress made in the sensory-motor interface also enabled some progress in the conceptual-intentional interface, and thus required a more articulated computational system. Several potential factors specific to UG have been identified: structure-sensitivity paired with distinctive features, the requirement of asymmetry at several levels (consonant-vowel, verb-noun, head-nonhead, subject-object, binder-variable), and the requirement to parse reference. These factors are general enough to be involved in the neural architecture of the brain. In any case, UG is a description of the (genetically transferred) information for the brain about how it has to process chunks of memorized linguistic input. For this reason, it is advisable for typologists to design language descriptions from the point of view of the language learner, who starts with nothing but UG and other, more general, learning devices.
Notes

* I am grateful to Gisbert Fanselow, Simon Kirby, Martina Penke, and Anette Rosenbach for valuable comments. It is only due to the efforts of the two last-mentioned colleagues that this paper was ever written.

1. The assumption that UG is a predisposition of the human brain follows on logical grounds. However, a question that can reasonably be discussed is whether all (or most) of what one ascribes to UG ultimately turns out to be in fact determined by more general cognitive predispositions, in which case UG as a language-specific instruction would be empty (or nearly empty). Needless to say, any serious concept of UG presupposes monogenesis of human language, because otherwise different versions of UG would have to be distinguished. If, however, everything in UG had to be relegated to other cognitive resources, multigenesis of language would not be excluded a priori. Conversely, if UG turns out to be a meaningful concept vis-à-vis every known language, the monogenesis view of language is strongly supported.
2. Examples: Empty Category Principle, Subjacency Principle, and Binding Principle B (for anaphors).

3. Examples: geometrically defined locality domains, global harmony (verb-final languages have suffixes, verb-initial languages have prefixes) as well as other parallelisms, underspecification, elsewhere condition, minimality, earliness, economy.

4. Interestingly, Chomsky (2000: 12f.) considers the ‘displacement property’ as one of the imperfections of human language (besides uninterpretable features such as those for case); imperfections of human language, however, are not determined by UG, but result from the interaction with other capacities. “Why language should have this property is an interesting question, which has been discussed since the 1960s without resolution. My suspicion is that part of the reason has to do with phenomena that have been described in terms of surface structure interpretation […]: topic-comment, specificity, new and old information, […], and so on. If that is correct, then the displacement property is, indeed, forced by legibility conditions: it is motivated by interpretive requirements that are externally imposed by our system of thought”. This leaves open to discussion whether the displacement property was already present in the proto-language spoken by the first generations that possessed UG or was innovated in later traditions of language. My own intuition on the basis of typological insights is that displacement was innovated in later stages (see Section 3).

5. One instance of such a principle is Müller’s (2000) Parallel Movement Constraint: ‘If α c-commands β at structure L, then α c-commands β at structure L′’. This principle is reminiscent of a projective mapping which preserves the relationship between vector points.

6. Kirby (1999) demonstrates how implicational universals such as the above one can arise when there are competing functional pressures on two features which are in other ways independent of each other; in this case, he argues, there is no necessity for acquisition.

7. Lexical suppletion is only found in very highly frequent items, e.g. forms of to be and to go. On the other hand, high-frequency similarities are more easily generalized than low-frequency similarities in a set of items, so that lexical idiosyncrasies may survive in the low-frequency items. Interestingly, Indefrey (2002) found in an experimental study that the German weak declension rule for the small class of masculine nouns ending in schwa (where all singular cases have to end in -n) is acquired rather late, not before the age of 5, and that there are even adults who have not generalized it as a rule. In this case, animacy highly correlates with masculine gender, but Indefrey gives evidence that gender and not animacy triggers the rule. (See below on structure-sensitivity.)

8. Hockett’s list of features includes vocal-auditory channel (together with some consequences of it), interchangeability (of speaker and hearer), semanticity, arbitrariness (rather than iconicity), discreteness, displacement, productivity, traditional transmission, and duality of patterning (double articulation). See also Hockett (1966) for a slightly modified list of features. Imitation was certainly never considered by Hockett to be a linguistic universal.
9. The assumption that the evolution of language started with manual gestures is supported by the observation that deaf children easily adopt a sign language; even if they do not get sufficient linguistic input from their hearing parents, they are nevertheless able to construct a language-like gesture system (Goldin-Meadow 1999, 2003). This suggests that UG is not specialized for the vocal-auditory channel. However, it is still controversial whether this capacity of deaf children is inherited from an earlier stage of language evolution. The following passage is from Goldin-Meadow’s homepage:
“[…] We have shown that, despite these impoverished language-learning conditions, American deaf children are able to develop gestural communication systems which are structured as are the early communication systems of children acquiring language from conventional language models. Moreover, deaf children of hearing parents growing up in a Chinese culture develop the same gesture systems as their American counterparts, suggesting that the deaf children’s gesture systems are resilient, not only to the absence of a conventional language model, but also to cultural variation. Where do these deaf children’s gesture systems come from? — One candidate is the gestures that hearing adults produce as they talk. Indeed, the gestures of Mandarin-speakers are similar in type to those of English-speakers. However, the gestures adults use when speaking languages typologically distinct from Mandarin and English — verb-framed languages such as Spanish or Turkish — differ strikingly from the gestures used by speakers of satellite-framed languages such as English or Mandarin. These four cultures — Chinese, American, Spanish, and Turkish — thus offer an opportunity to examine the effects of hearing speakers’ gestures on the gesture systems developed by deaf children. If deaf children in all four cultures develop gesture systems with the same structure despite differences in the gestures they see, the children themselves must be bringing strong biases to the communication situation. If, however, the children differ in the gesture systems they construct, we will be able to explore how children’s construction of a language-like gesture system is influenced by the models they see.” (http://psychology.uchicago.edu/socpsych/faculty/meadow.html)

10. Note that the motor theory of Liberman (1957) already claimed that phonetic utterances are analyzed by generating an internal copy; it is therefore legitimate that phonology is mainly based on articulatory features.

11. This can answer the question raised by Hurford (forthc.), namely how the central link between meanings and sounds was established: the sounds replaced a gesture for which this link was not arbitrary.

12. Manual gestures can express deictic relations directly, they can distinguish between several referents by placing them at distinct places in space, they can signal source and goal, and they can model many kinds of modification. Nevertheless, in an established sign language all these iconic gestures have developed into conventional means.

13. The alternative view that typological variation is determined by parameter setting (Baker 2001) is rather problematic. Most serious candidates for UG principles do not have an open parameter that can be switched on or off; they rather lead to a default realization, which might be overridden in various ways. It is also hard to see how genetic information works with open parameters. Although I basically agree with Newmeyer’s (this volume) criticism of Baker’s proposal, his conclusions are much too negative. Of course, the child does not acquire ‘knowledge of language typology’; there are nevertheless many restrictions on possible languages that determine what the child can possibly acquire, as well as what can be observed cross-linguistically.

14. If the linguistic input distinguishes arguments by means of case, or pronominal affix, or position, the child will detect this in accordance with argument hierarchy.
There is no need for any parameter to be set by the language learner, notwithstanding the fact that the linguist may use typological parameters for obvious descriptive reasons. The distinction between accusative
and ergative itself cannot be a parameter because all four possibilities are documented: both accusative and ergative (Hindi, Georgian), only accusative (German), only ergative (Basque), none (see the next paragraph). Only in a positional system are the two generalized notions strictly complementary to each other; if transitive verbs are realized as agent-V-patient, only two options exist for intransitive verbs: either arg-V (‘accusative system’) or V-arg (‘ergative system’). Again, a possible typological parameter does not play any role for the language learner.

15. For instance, the policeman attacked in the street shot the aggressor contains only two arguments, but also two transitive verbs; therefore each of the two arguments must be linked to both verbs.

16. Under this perspective, studies like Kirby’s (2002) are illuminating (see the discussion in Kirby et al. this volume). Kirby shows by means of simulation experiments that simulated agents equipped with a fixed learning strategy are able to construct a rather articulated grammar by iterated learning within thousands of generations (a schematic sketch of such an iterated-learning loop is given after these notes). This indicates that the emergence of a rather rich morphosyntax in fact needs very few preconditions: one is the learning algorithm, which in this case was a heuristically-driven grammar inducer. Another precondition for Kirby’s study was the structure of the meanings (to be) expressed. The meanings took the form of simple predicate logic expressions with the possibility of recursion. Thus, the meanings already contained the property of argument hierarchy as well as propositional attitude predicates; both factors have been identified as possible UG factors above. It is not at all obvious that the elaborated predicate calculus assumed by Kirby was prior to human language; it could just as well be the case that it evolved alongside language. (Recall that in my conception clausal recursion is enabled by the invention of the verb-noun distinction.) In any case, studies like the one by Kirby are to be welcomed because they constitute a new type of evidence, one that can complement the more intuitive reasoning of typologists.

17. There is no inherent reason why differential direct object marking figures much more prominently in the literature than other instances of differential argument marking, concerning subjects, indirect objects (see Wunderlich 2001 on Yimas) or possessors (see Ortmann 2003). Both Haspelmath (this volume) and Newmeyer (this volume) restrict their considerations to direct objects.

18. (2a) also explains why accusative systems outnumber ergative systems, which otherwise are symmetric to each other. Note also that the ranking in (2a) is morphosyntactically oriented, in contrast to the well-known hierarchy subject > object of grammatical functions, a hierarchy that plays an important role in Aissen’s (1999, 2000) account of linking splits.

19. Most of these scales were first discussed by Silverstein (1976); only the last two scales are added here in order to account for ergative splits determined by aktionsart or aspect (see also Dixon 1994).

20. In this respect his account is at variance with the stochastic OT propagated by Bresnan et al. (2001).

21. For definites vs. indefinites, the definite S/O ratio is 2.1 and the indefinite O/S ratio is 16.8; for animates vs. inanimates, the animate S/O ratio is 9.3 and the inanimate O/S ratio is 14.0. Only for local vs. 3rd person does the local person S/O ratio (17.1) exceed the 3rd person O/S ratio (4.7).
Note that Jäger’s game theory would produce slightly different results, favouring ergative.

22. The interested reader can find a full analysis on my homepage (Wunderlich 2000).

23. Since argument positions and topic or focus positions can still conflict with each other, there might have been a further need to encode syntactic arguments by means of case.
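As a complement to note 16, the following is a highly simplified sketch of the transmission loop that iterated-learning simulations rely on; it is not Kirby's actual model, and the meaning space, syllable inventory, and decompositional learner are invented purely for illustration. Each generation learns from a bottlenecked sample of the previous generation's productions; a learner biased towards decomposing signals turns an initially holistic vocabulary into a compositional one, and the bottleneck then keeps only such generalizable mappings stable.

# A toy iterated-learning loop (Python), loosely inspired by Kirby (2002);
# all details (meanings, syllables, learner) are invented for illustration.
import itertools
import random

MEANINGS = list(itertools.product(range(5), range(5)))  # (agent, action) pairs
SYLLABLES = ["ba", "du", "ki", "mo", "ne", "po", "ra", "su", "ti", "wu"]

def learn(observations):
    """Induce a production function from observed (meaning, signal) pairs.
    The learner assumes signal = prefix(agent) + suffix(action); for unseen
    agents or actions it invents a random syllable."""
    prefixes, suffixes = {}, {}
    for (agent, action), signal in observations.items():
        half = len(signal) // 2
        prefixes.setdefault(agent, signal[:half])
        suffixes.setdefault(action, signal[half:])
    def produce(meaning):
        agent, action = meaning
        pre = prefixes.setdefault(agent, random.choice(SYLLABLES))
        suf = suffixes.setdefault(action, random.choice(SYLLABLES))
        return pre + suf
    return produce

# Generation 0: a holistic language, one random signal per meaning.
language = {m: random.choice(SYLLABLES) + random.choice(SYLLABLES) for m in MEANINGS}

for generation in range(10):
    # The bottleneck: each learner observes only 10 of the 25 meanings.
    sample = dict(random.sample(sorted(language.items()), 10))
    produce = learn(sample)
    language = {m: produce(m) for m in MEANINGS}

# Shared agents now share prefixes and shared actions share suffixes:
print(language[(0, 0)], language[(0, 1)], language[(1, 0)])

Even in this toy setting the point of note 16 is visible: the ‘grammar’ that emerges is only as rich as the learner’s bias and the pre-given meaning structure allow.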
References

Aissen, Judith L. 1987. Tzotzil clause structure. Dordrecht: Reidel.
Aissen, Judith L. 1999. “Markedness and subject choice in optimality theory”. Natural Language and Linguistic Theory 17: 673–711.
Aissen, Judith L. 2003. “Differential object marking: iconicity vs. economy”. Natural Language and Linguistic Theory 21: 435–483.
Bach, Emmon; Jelinek, Eloise; Kratzer, Angelika; and Partee, Barbara H. (eds). 1995. Quantification in natural languages. Dordrecht: Kluwer.
Baker, Mark. 2001. The atoms of language: the mind’s hidden rules of grammar. New York: Basic Books.
Baker, Mark. 2003. Lexical categories. Verbs, nouns, and adjectives. Cambridge: CUP.
Bresnan, Joan; Dingare, Shipra; and Manning, Christopher D. 2001. “Soft constraints mirror hard constraints: voice and person in English and Lummi”. In: Butt, Miriam; and King, Tracy Holloway (eds), Proceedings of the LFG 01 conference 13–32. Stanford: CSLI Publications.
Chomsky, Noam. 2000. New horizons in the study of language and mind. Cambridge: CUP.
Comrie, Bernard. 1989. Language universals and linguistic typology (2nd edition). Oxford: Blackwell.
Dahlstrom, Amy. 1995. Topic, focus and other word order problems in Algonquian. Winnipeg: Voices of Rupert’s Land.
Dixon, Robert M. W. 1994. Ergativity. Cambridge: CUP.
Eimas, Peter; Siqueland, Einar R.; Jusczyk, Peter; and Vigorito, James M. 1971. “Speech perception in infants”. Science 171: 303–306.
Eisenbeiss, Sonja. 2002. Merkmalsgesteuerter Spracherwerb. Eine Untersuchung zum Erwerb der Struktur und Flexion von Nominalphrasen. Dissertation, Heinrich-Heine-Universität Düsseldorf.
Fanselow, Gisbert. 1992. “Zur biologischen Autonomie der Grammatik”. In: Suchsland, Peter (ed.), Biologische und soziale Grundlagen der Sprachfähigkeit 335–356. Tübingen: Niemeyer.
Fanselow, Gisbert; Kliegl, Reinhold; and Schlesewsky, Matthias. 1999. “Processing difficulty and principles of grammar”. In: Kemper, Susan (ed.), Constraints on language 171–201. Dordrecht: Kluwer.
Fillmore, Charles. 1982. “Towards a descriptive framework for spatial deixis”. In: Jarvella, Robert J.; and Klein, Wolfgang (eds), Speech, place and action. Studies in deixis and related topics 31–60. Chichester, NY: Wiley.
Goldin-Meadow, Susan. 1999. “The development of gesture with and without speech in hearing and deaf children”. In: Messing, Lynn S.; and Campbell, Richard (eds), Gesture, speech and sign 117–132. Oxford: OUP.
Goldin-Meadow, Susan. 2003. The resilience of language. What gesture creation in deaf children can tell us about how all children learn language. New York: Psychology Press.
Hauser, Marc D.; Chomsky, Noam; and Fitch, William Tecumseh S. 2002. “The faculty of language: what is it, who has it, and how did it evolve?” Science 298: 1569–1579.
Hockett, Charles. 1960. “The origin of speech”. Scientific American 203/3: 88–96.
Hockett, Charles. 1966. “The problem of universals in language”. In: Greenberg, Joseph H. (ed.), Universals of language 1–29. Cambridge, MA: MIT Press.
Hurford, James R. Forthcoming. “Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution”. To appear in: Oller, Kimbrough; Griebel, Ulrike; and Plunkett, Kim (eds), The evolution of communication systems: a comparative approach. Cambridge, MA: MIT Press.
Indefrey, Peter. 2002. Listen und Regeln. Erwerb und Repräsentation der schwachen Substantivdeklination des Deutschen. Dissertation, Heinrich-Heine-Universität Düsseldorf.
Jackendoff, Ray. 2002. Foundations of language: brain, meaning, grammar, evolution. Oxford: OUP.
Jäger, Gerhard. 2003. Evolutionary game theory and typology: a case study. Ms., Universität Potsdam.
Jenkins, Lyle. 2001. Biolinguistics. Exploring the biology of language. Cambridge: CUP.
Kirby, Simon. 1999. Function, selection and innateness: the emergence of language universals. Oxford: OUP.
Kirby, Simon. 2002. “Learning, bottlenecks and the evolution of recursive syntax”. In: Briscoe, Ted (ed.), Linguistic evolution through language acquisition 173–203. Cambridge: CUP.
Kiss, Katalin É. 2002. The syntax of Hungarian. Cambridge: CUP.
Labov, William. 1973. “The boundaries of words and their meanings”. In: Bailey, Charles-James N.; and Shuy, Roger W. (eds), New ways of analyzing variation in English 340–373. Washington, DC: Georgetown UP.
Levinson, Stephen C. 1998. “Studying spatial conceptualization across cultures”. In: Danziger, Eve (ed.), Language, space, and culture. Special issue of Ethos: Journal of the Society for Psychological Anthropology 26(1): 7–24.
Levinson, Stephen C. 2003. Space in language and cognition: explorations in cognitive diversity. Cambridge: CUP.
Liberman, Alvin M. 1957. “Some results of research on speech perception”. Journal of the Acoustical Society of America 29: 117–123.
Mohanan, Tara. 1994. Argument structure in Hindi. Stanford: CSLI Publications.
Morimoto, Yukiko. 2001. Grammatical coding of topic in Bantu. Ms., Heinrich-Heine-Universität Düsseldorf.
Morimoto, Yukiko. 2002a. “Prominence mismatches and differential object marking in Bantu”. In: Butt, Miriam; and King, Tracy Holloway (eds), Proceedings of the LFG 02 conference 292–314. Stanford: CSLI Publications.
Morimoto, Yukiko. 2002b. From synchrony to diachrony: topic salience and cross-linguistic patterns of agreement. Ms., ZAS Berlin.
Müller, Gereon. 2000. “Optimality, markedness, and word order in German”. Linguistics 37: 777–818.
Ortmann, Albert. 2003. “A factorial typology of number marking in noun phrases: the tension of economy and faithfulness”. To appear in: Gunkel, Lutz; Müller, Gereon; and Zifonun, Gisela (eds), Explorations in nominal inflection. Berlin: Mouton de Gruyter.
Prince, Alan; and Smolensky, Paul. 1993. Optimality theory: constraint interaction in generative grammar. Ms., Rutgers University, New Brunswick & University of Colorado, Boulder.
Rizzolatti, Giacomo; Fadiga, Luciano; Gallese, Vittorio; and Fogassi, Leonardo. 1996. “Premotor cortex and the recognition of motor actions”. Cognitive Brain Research 3: 131–141.
Silverstein, Michael. 1976. “Hierarchy of features and ergativity”. In: Dixon, Robert M. W. (ed.), Grammatical categories in Australian languages 112–171. Canberra: Australian Institute of Aboriginal Studies.
Stiebels, Barbara. 2000. “Linker inventories, linking splits and lexical economy”. In: Stiebels, Barbara; and Wunderlich, Dieter (eds), Lexicon in focus 211–245. Berlin: Akademie Verlag.
Stiebels, Barbara. 2002. Typologie des Argumentlinkings: Ökonomie und Expressivität. Berlin: Akademie Verlag.
Tesar, Bruce; and Smolensky, Paul. 1998a. “Learnability in optimality theory”. Linguistic Inquiry 29: 229–268.
Tesar, Bruce; and Smolensky, Paul. 1998b. “Learning optimality-theoretic grammars”. Lingua 106: 161–196.
Tomasello, Michael; Savage-Rumbaugh, Sue; and Kruger, Ann Cale. 1993. “Imitative learning of actions on objects by children, chimpanzees and enculturated chimpanzees”. Child Development 64: 1688–1705.
Tomasello, Michael. In press. “Intention-reading and imitative learning”. In: Hurley, Susan; and Chater, Nick (eds), New perspectives on imitation. Oxford: OUP.
Wunderlich, Dieter. 2000. Optimal case in Hindi. Ms., Heinrich-Heine-Universität Düsseldorf.
Wunderlich, Dieter. 2001. “How gaps and substitutions can become optimal: the pronominal affix paradigms of Yimas”. Transactions of the Philological Society 99: 315–366.
Wunderlich, Dieter. 2003. “Argument hierarchy and other factors of argument realization”. To appear in: Bornkessel, Ina; Schlesewsky, Matthias; Friederici, Angela; and Comrie, Bernard (eds), Semantic role universals: perspectives from linguistic theory, language typology and psycho-/neurolinguistics. Special issue of Linguistics.
What kind of evidence could refute the UG hypothesis?
Commentary on Wunderlich*

Michael Tomasello
Max-Planck-Institut für evolutionäre Anthropologie, Leipzig
A science is a series of ‘conjectures and refutations’. The most powerful conjectures are those that are formulated in such a way that they may be easily refuted by observation. On this account, Universal Grammar is an extremely weak hypothesis. This is because (i) there are very few precise formulations of exactly what is in UG (Wunderlich’s list on pp. 153–156 being an admirable exception), and (ii) there are very few suggestions for how one might go about testing any precise conjectures that are put forward.1 Although the most common practice is to invoke UG without specifying precisely what is intended, there are some specific (though mostly non-exhaustive) proposals. The problem is that these proposals assume UG to be very different things. For example:

– In his textbook, O’Grady (1997) proposes that UG includes both lexical categories (N, V, A, P, Adv) and functional categories (Det, Aux, Deg, Comp, Pro, Conj).
– Jackendoff’s (2002) proposal includes X-bar syntax and the linking rules ‘NP = object’ and ‘VP = action’. Pinker (1994) agrees and adds ‘subject’ and ‘object’, movement rules, and grammatical morphology.
– The textbook of Crain and Lillo-Martin (1999) does not provide an explicit list, but some of the things they claim are in UG are: wh-movement, island constraints, the subset principle, head movement, c-command, the projection principle, and the empty category principle.
– Hauser, Chomsky, and Fitch (2002) claim that there is only one thing in UG and that is the computational procedure of recursion. Chomsky (2004) claims that the only thing in UG is the syntactic operation of merge.
– Baker (2001) lists a very long set of parameters in UG, including everything from polysynthesis to ergative case to serial verbs to null subject. Fodor (2003) gives a very different list, with only a couple of overlaps, for example: V to I movement, subject initial, affix hopping, pied piping, topic marking, I to C movement, Q inversion, and oblique topic.
– Proponents of OT approaches to syntax put into UG such well-formedness constraints as stay, telegraph, drop topic, recoverability, and MaxLex (see Haspelmath 2003 for a review).
– And Wunderlich (this volume) has his own account of UG, which includes: distinctive features, double articulation, predication and reference, lexical categories, argument hierarchy, adjunction, and quantification (he specifically excludes many of the other things on the above lists).
The variety of different things on this list is enough to give one pause, for sure — are they all really talking about the same thing? The problem is that there do not seem to be, as far as I can tell, any direct debates anywhere in the literature among these or other researchers about which of these or other accounts of UG should be preferred and for what reasons. Each researcher is simply free to invoke ‘UG’ in whatever form is convenient for the argument at hand. Nor is there any discussion about what type of innateness we are talking about, for example, Elman et al.’s (1996) architectural innateness or representational innateness. It is perhaps telling that evolutionary psychology à la Pinker (1997), which proposes various innate cognitive modules including language, suffers from the same basic problem: everyone has a different list of innate cognitive modules, and there are no agreed-upon methods for deciding among them.

As far as I can tell as an outsider, the normal procedure in generative linguistics is either to assume the existence of UG or to provide confirmatory evidence for it. Confirmatory evidence is mainly (i) the possibility of describing any and all languages in terms of X-bar syntax, movement rules, and so forth; (ii) certain ‘logical’ arguments such as poverty of the stimulus; and (iii) the existence of empirical phenomena such as deaf children who create their own languages, people who supposedly have defective grammar genes, linguistic ‘savants’, and selective language deficits in aphasic persons. But (i) just about any language can be forced into just about any descriptive system if one is Procrustean enough and has the possibility to hypothesize parameters as needed (witness the erstwhile success of describing all European languages in terms of Latin grammar); (ii) in science logical demonstrations are only as good as their premises, which are demonstrably false in the case of at least some poverty of the stimulus arguments (Pullum & Scholz 2002); and (iii) all of the empirical phenomena typically cited in favor of an innate UG are also consistent with the existence of biological adaptations for more general skills of human cognition and communication (Tomasello 1995, 2003).

No, as philosophers of science since Popper (1959) have emphasized, the quest to confirm a scientific hypothesis is fruitless: we simply propose a hypothesis and hope it stands up to attempts at falsification. If it is constructed in a way that makes
"tom-r5">
Commentary on Wunderlich
it immune to falsification, then it may be a pretty picture of the world (as, for example, Freudian psychology or Marxist sociology), but it is not science. So what could constitute falsifying evidence for a specific UG proposal? Most directly, one would think that the existence of significant cross-linguistic variation in such things as basic grammatical categories would potentially falsify the UG hypothesis (especially when no one has as yet proposed anything like an adequate set of parameters to explain the variation, much less any kind of theoretical account to ‘link’ UG to language-particular grammatical categories, i.e., a theory of ‘triggers’; Fodor 2003). But apparently it does not. Even more strongly, one would assume that if a basic ‘nonparameterized’ linguistic phenomenon were not universal among all languages, it could not be a part of UG. But many languages show no evidence of having any form of movement rule, and yet it is widely assumed by generative linguists (if not by Wunderlich) that these languages nevertheless employ ‘covert’ movement. And many languages show a nonconfigurational pattern of phrase structure organization, but still X-bar syntax is assumed to be universal (e.g. Radford 1997). If these kinds of observations do not falsify the UG hypothesis, then what kinds of observations possibly could?

One final point. I think it is important that the oddness of the UG hypothesis about language acquisition be emphasized; it has basically no parallels in hypotheses about how children acquire competence in other cognitive domains. For example, such skills as music and mathematics are, like language, unique to humans and universal among human groups, with some variations. But no one has to date proposed anything like Universal Music or Universal Mathematics, and no one has as yet proposed any parameters of these abilities to explain cross-cultural diversity (e.g., +/- variables, which some cultures use, as in algebra, and some do not — or certain tonal patterns in music). It is not that psychologists think that these skills have no important biological bases — they assuredly do — it is just that proposing an innate UM does not seem to be a testable hypothesis; it has no interesting empirical consequences beyond those generated by positing biological bases in general, and so overall it does not help us in any way to get closer to the phylogenetic and ontogenetic origins of these interesting cognitive skills.

And so in the context of this volume on “What counts as evidence in Linguistics? — The case of innateness”, my challenge to Wunderlich and other proponents of an innate UG — a challenge that may be directed at anyone in any scientific field who proposes any hypothesis — is simply: What exactly is and is not in UG, and what kind of evidence could possibly refute the UG hypothesis?
Notes

* I would like to thank Adele Goldberg for useful comments on an earlier draft.

1. I am following Wunderlich throughout in assuming UG to be a hypothesis about “a human specific learning algorithm towards language”, a brain module genetically specified for language.
References

Baker, M. 2001. The atoms of language. New York: Basic Books.
Chomsky, N. 2004. “Three factors in language design: background and prospects”. Invited address at the 78th annual meeting of the Linguistic Society of America, Boston.
Crain, S.; and Lillo-Martin, D. 1999. An introduction to linguistic theory and language acquisition. Oxford: Blackwell.
Elman, J. L.; Bates, E.; Johnson, M.; Karmiloff-Smith, A.; Parisi, D.; and Plunkett, K. 1996. Rethinking innateness: a connectionist perspective on development. Cambridge, MA: MIT Press.
Fodor, J. 2003. “Evaluating models of parameter setting”. Handout, LSA Summer Institute.
Haspelmath, M. 1999. “Optimality and diachronic adaptation”. Zeitschrift für Sprachwissenschaft 18(2): 180–205.
Hauser, M. D.; Chomsky, N.; and Fitch, W. T. 2002. “The faculty of language: what is it, who has it, and how did it evolve?”. Science 298: 1569–1579.
Jackendoff, R. 2002. Foundations of language: brain, meaning, grammar, evolution. New York: OUP.
O’Grady, W. 1997. Syntactic development. Chicago: University of Chicago Press.
Pinker, S. 1994. The language instinct: how the mind creates language. New York: Morrow Press.
Pinker, S. 1997. How the mind works. New York: Norton.
Popper, K. 1959. The logic of scientific discovery. New York: Basic Books.
Pullum, G.; and Scholz, B. 2002. “Empirical assessment of stimulus poverty arguments”. Linguistic Review 19: 9–50.
Radford, A. 1997. Syntactic theory and the structure of English. Cambridge: CUP.
Tomasello, M. 1995. “Language is not an instinct”. Cognitive Development 10: 131–156.
Tomasello, M. 2003. Constructing a language: a usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Author’s response
Is there any evidence that refutes the UG hypothesis?

Dieter Wunderlich
Universität Düsseldorf
The UG hypothesis should fairly be seen as a general idea that directs scientific investigation rather than as a specific hypothesis that can be refuted on the basis of particular empirical evidence. The UG hypothesis was and is characteristic of a scientific paradigm called ‘Generative Grammar’, which, as a whole, was extremely fruitful. Within a few decades the traditional arsenal of descriptive techniques was shifted to an ambitious theory of language, and thus linguistics was established as a modern discipline successfully playing its part in the concert of scientific disciplines. The UG hypothesis evokes the idea of language as a specific human ability that is not available to other primates and is not just a particular adaptation of more general cognitive abilities. The constant appeal to UG can also be seen as an emancipating act of linguists who intended to move away from philological, sociological or psychological traditions towards a more naturalistic conception of language. So it is not surprising that the claim of UG being in principle an empirical hypothesis has been upheld for more than forty years, although the possible contents of UG have shifted and been revised in many ways. There is, in fact, an ongoing debate about which properties of language are universal, and which of these belong properly to UG. Of course, each linguistic theory within the generative grammar paradigm claims somewhat different ingredients of UG, but there is also much convergence. In any case, in order to go beyond pure observation it is necessary to spell out what kind of linguistic phenomena are to be expected under a certain theory.

My own argument was that the study of languages becomes interesting when we assume some specific contents of UG, and that there is good reason to reduce these contents to a few, very general principles, once one is guided (i) by cross-linguistic inspection, and (ii) by considering what cannot reasonably be attributed to other cognitive abilities. It is not easy, however, to find empirical tests to prove whether a certain property of language is similar to a geometric, numeric
or musical property. I proposed that move is not in UG, but rather is instantiated as a further innovation on the basis of general geometric abilities. In this respect, I feel supported by Chomsky (2004), who includes only merge in UG (as cited by Tomasello). My own suspicion was that not merge as such but the ability to make up linguistic categories is the main innovation of UG. (Linguistic categories are motivated semantically; however, they are generalized with respect to their combinatorial function, i.e. with respect to merge.)

As I said in my paper, there are further possibilities for restricting the contents of UG. Considering the model of iterated learning, UG is either to be identified with the starting condition of the model or with the assumptions that are held constant for all generations. Everything that turns up as a result of iterated learning in some later generations of language learners, however, is clearly excluded from UG. Specific contents of UG can only be proposed or falsified on the basis of specific linguistic investigation (and not from general considerations), but even this can be done only indirectly, because a human learning algorithm is not subject to direct observation.

Apart from all this, however, it is possible to pose the general claim “There must be some UG” on the basis of external evidence. Everything that Tomasello summarizes under (iii), such as selective language deficits, linguistic ‘savants’, deaf children of hearing parents creating their own language, etc., prima facie supports such a general claim, and only through specific additional investigation would one be allowed to make more conclusive inferences. But in no case can these phenomena be interpreted as refuting the UG hypothesis. As long as a hypothesis is not falsified (according to certain standards), we are entitled to subscribe to it, provided we believe it to be fruitful.
"wei-n*"> "wei-r9">
A question of relevance
Some remarks on standard languages*

Helmut Weiß
Universität Regensburg
Data from natural languages (in contrast to, say, the results of psycholinguistic experiments) are still a major source of evidence used in linguistics, whether they are elicited through grammatical judgments, as in generative linguistics, or by collecting samples, as preferred in typology. The underlying assumption is that data are alike in their value as evidence if they occur in natural languages. The present paper questions this assumption by showing that there is a difference in the naturalness of languages: languages like German or English originally emerged as secondarily learned written languages, that is, they once were languages without native speakers. Although they are nowadays acquired as first languages, their grammars still contain inconsistent properties which partly disqualify standard languages as a source of evidence.
Science is a very strange activity. It only works for simple problems. (Chomsky 2000b: 2)
1. The problem with standard languages
If natural languages are, in the words of John Lyons (1991: 1), “acquired by its users without special instruction as a normal part of maturation and socialization”, what about languages which are only learned by special instruction? And if natural languages are spoken (or signed), what about languages which are only written? From today’s point of view, it is surely hard to imagine that such languages could exist, but there is no doubt that standard languages once were such languages. They started as languages which were only used in writing and learned as secondary languages for this purpose; in a sense, they were very special languages. Furthermore, if (in Chomsky’s words) “each language is the result of the interplay of two factors: the initial state” — or Universal Grammar (UG) — “and the course
"wei-r9"> "wei-r15">
182
Helmut Weiß
of experience” (Chomsky 2000a: 4), are we not allowed to expect that standard languages which were — for some time in their history — only learned by instruction could have developed properties which are not entirely due to UG? As I will show in my paper, this is indeed the case.1 The result of my considerations will be that at least some data (though not all) from standard languages should be used with particular care in linguistics because their value as evidence is very restricted due to their exceptional emergence. I will restrict the discussion to generative linguistics and typology.

The paper is organized as follows: in Section 2, I will discuss and define the linguistically exceptional nature of standard languages. In Section 3, I will show the prominent role data from standard languages play in linguistics (both generative linguistics and typology) by discussing analyses of negation that are heavily based on standardized languages like English or German. In Section 4, I will investigate German n-words, showing that they are, contrary to their superficial behavior in Standard German, neither quantifiers nor negative. In Section 5, I will present further examples (pronouns, articles, etc.) demonstrating that Standard German has syntactic properties which did not result from natural language development alone, but were due to extra-linguistic forces (like prescriptivism). In Section 6, I will finally make a plea for serious cooperation between theoretical linguistics and sociolinguistics, and I will show how one can use sociological background information in synchronic and diachronic linguistics.
2. The linguistic nature of standard languages

Let me first briefly illustrate the exceptional nature of standard languages with the history of Standard German, which is an extreme, albeit representative example (see below). Standard German emerged in the 14th/15th century as a purely written language for special purposes, like administration. Even when it came to be used in literature as well, it was confined to writing and was hardly employed in colloquial speech before the 19th century. More importantly, Standard German was not acquired as a first language before the second half of the 20th century (Durrell 1999: 302) and hence was a language without native speakers for more than 500 years. The lack of native speakers is an astonishing fact that should have attracted more interest; surprisingly and unfortunately, it has not — neither from generative linguists nor from more traditional historical linguists. The 500 years during which Standard German was not acquired as a first language may have been an extremely long time span, but a similar situation can be observed in other European countries. According to van Marle (1997), the linguistic situation up to the 19th century in the Netherlands as well as in other
"wei-r45"> "wei-r10">
A question of relevance
European countries was characterized by the existence of only a written standard and spoken dialects. Van Marle (1997: 14) underscores the fact “that for centuries it was only a relatively small part of the population that was engaged in using” the written standard — exactly as was the case in Germany. On the other hand, standard languages were not totally unrelated to some kind of native competence even in those times. For example, those people writing Standard German in the 18th century had first acquired dialects as native languages before they learned Standard German. This was even the case with Germany’s most famous writer, Johann Wolfgang von Goethe: his native tongue was the Hessian dialect as spoken in Frankfurt, the city where he was born in 1749.2 All of his contemporaries, if they ever wrote at all, did so in a language other than the one they spoke. Since the German dialects and the evolving standard were related ‘languages’, though not identical, the development of Standard German as a whole was indirectly connected to first language (L1) acquisition. This last factor prevents standard languages from being artificial languages. However, the lack of first language acquisition for some time in the history of standard languages is nevertheless the crucial point: since they were not subject to L1 acquisition, it is conceivable that their grammars have acquired properties which could not have emerged from natural language change alone.3 An example, which I will discuss below, is the disappearance of multiple negation in the standard variants of English and German due to normative pressure. No such development happened in the historical dialects of either language.

The lack of L1 acquisition is surely the most relevant criterion, which determines the linguistic nature of standard languages to a great extent. Language acquisition seems to be the driving force behind grammar or language change,4 as, for instance, the development of Ivrit out of Classical Hebrew demonstrates in a very impressive way. Classical Hebrew, a “dead language” (Comrie 1990a: 968) for more than 2000 years, was reintroduced into life in Israel as the “community’s normal means of communication” (Comrie 1990a: 968), and then something unexpected happened:5

Although the original colonists in Israel tried to teach their children a pure form of Classical Hebrew — which for them was a religious and liturgical language they had never used in conversation — a lot of structural changes took place during the acquisition of the language by the sabras, the new speakers of the language who were born in Israel. [There are s]ome […] new features that set Ivrit apart from the classical language […], and in general, the disappearance of many irregularities and idiosyncratic constructions that were common in the written language. (Versteegh 1993: 547)
Besides the regularization of Classical Hebrew and the emergence of new properties, there is a further point showing an interesting property of language acquisition: some (though not all) features of Ivrit were already present in Classical Hebrew,
but occurred there only rarely. As Versteegh (1993: 547) rightly claims, “they cannot have played an important role in the acquisition process, as far as the input of the parents is concerned” (my emphasis). Therefore, for them to become common or obligatory in Ivrit, the process of acquisition itself must be held responsible (or UG; see Kirby et al., this volume, for an overview of different definitions of UG). The example of Ivrit shows that, whatever it may be that underlies language acquisition, it must be a process that is restrictive and creative at the same time: restrictive because it limits the form of grammars or languages, and creative because there are features in the output of learners which are absent in their input. Bearing this in mind, it should be obvious that standard languages must be exceptional because they were not acquired as first languages for some time in their history. The restrictions imposed by L1 acquisition do not play an important role for them.

In Weiß (1998, 2001), I have proposed using the presence or absence of L1 acquisition as a criterion to define the naturalness of languages. According to this proposal, one can distinguish between first order and second order natural languages (N1 and N2 languages, respectively) and define them as follows: N1 languages are subject to L1 acquisition; N2 languages are not subject to L1 acquisition. According to these definitions, standard languages were N2 languages at those times when they were not acquired as first languages, whereas dialects, which are only transmitted from generation to generation by L1 acquisition, are N1 languages at any time.

As for Standard German, the situation has changed dramatically during the last 50 to 100 years because today it is widely used as a spoken language in everyday communication and acquired as a first language (cf. Weiß 2004a, 2005c,d). The consequence is that German now exists in many varieties, of which Standard German is only one. However, the crucial point is that even the more informal colloquial variants are all descendants of Standard German, that is, they are all derived from an N2 language. Therefore, an absolute escape from the dilemma discussed here does not exist, because all variants of German, even the nonstandard ones, have their origin in a language that was only written and never acquired as a first language. Many other standard languages also underwent this process of renaturalization.6 However, they all have a special heritage due to their history, in that their grammars still contain some inconsistent properties. A well-known example is what Emonds (1999) called grammatically deviant prestige constructions, such as (1a). In English, a pronoun in the second conjunct of a conjoined subject — and in many other environments — receives accusative instead of nominative, as in (1b). However, the ‘correct usage’ — as in (1a) — requires nominative, which, therefore, is a grammatically deviant usage.
"wei-r18"> "wei-r43">
A question of relevance
(1) a. Our landlord and we very often disagree.
    b. Our landlord and us very often disagree.
Such constructions (as in 1a) are “not part of a dialect spoken (and hence acquired) as a native language by any natural language speech community. Rather […] the standard or prestige usage is not a grammatical construct, but an extra-grammatical deviation imposed in certain, especially written forms of language exclusively through paralinguistic cultural institutions of the dominant socio-economic class” (Emonds 1999: 235). Such extra-learned rules are called grammatical viruses by Sobin (1997) because they “are not generative, but parasitic on the generative system” (Lasnik & Sobin 2000: 352). They are thus a very special part of the linguistic competence of adult speakers. Concerning the grammatical system of a language, grammatically deviant constructions represent an inconsistency because they do not result from its grammatical properties but, for instance, follow extra-learned rules like the ‘correct usage’ of nominative in (1a).

However, prescriptivism is not the only source of grammatical inconsistency. Another one is literacy, i.e. the mere fact that standard languages once were exclusively written languages (an example that will be discussed below is the pronominal syntax of German). Where grammatically inconsistent properties are especially relevant for typology, I will call them typologically inconsistent. There are certainly many other mechanisms which can trigger the emergence of grammatically, typologically or otherwise inconsistent properties. A major source of such developments seems to be borrowing induced by language contact (F. Newmeyer, p.c.). This means that no language/dialect is grammatically consistent in all respects, but it is reasonable to assume that N1 languages, which are subject to L1 acquisition, tolerate inconsistency to a lesser extent than N2 languages do. The examples discussed below clearly confirm this hypothesis for German, since in all these cases the dialects show a greater consistency than the standard variant does.

To summarize the discussion so far: (i) standard languages once were N2 languages because they started as secondarily learned and exclusively written languages, and (ii) they still contain grammatically deviant prestige constructions or grammatical viruses. The second point makes standard languages problematic for certain kinds of linguistic investigations, as will be shown in the following sections.
3. The problem with data from standard languages

A striking example of grammatical inconsistency is the syntax of negated indefinites or n-words. N-words in Standard English and Standard German can negate a sentence on their own, as is shown in (2a,b):
(2) a. I saw nobody
    b. ich sah niemanden
       ‘I saw nobody’
Without any doubt every native speaker of English and German will judge them grammatical, and this obviously makes them reliable for whatever kind of linguistic research one intends to do (though they are only grammatical due to prescriptive rules, as we shall see below).7 Therefore, there will be no problem with intuitions here because they seem to be very uniform. The negation pattern illustrated by (2a,b) is an example of a clear case (cf. Schütze 1996: 23) at first look, which turns out to be a false friend on closer scrutiny, however. The problem with the data in (2a,b) arises from the fact that such constructions are the product of language change forced by normative pressure rather than language-internal factors.

Multiple negative expressions negating only once really seem to be a strange thing, and for a long time people reflecting on them held them to be an imperfection. For example, Otfrid of Weißenburg — the first German poet known by name — was already aware of this in the ninth century and held it to demonstrate that his native language was uncultivated when compared with Latin (cf. Weiß 2004a). Therefore, there was always a strong bias against multiple negation, and this bias was responsible for the fact that, in the course of standardization, multiple negation was stigmatized as illogical and excluded from what people thought to be good German. However, it took several centuries to banish multiple negation completely, and it kept occurring until the end of the 18th century. As Langer (2001) has shown in detail, “the ungrammaticality of polynegative structures with negative reading in modern standard German was influenced in a major way by the rationalist thinking that provided the framework for [German] grammarians of the eighteenth century” (Langer 2001: 171f.).

A similar process seems to have happened in English (van Gelderen 2004). After the disappearance of the clitic negator ne in Late Middle English, the adverb not, which originally occurred together with ne, came to be used as the general negation in early modern English. Whereas in the English dialects the original negative concord (NC) system was re-established, in that the new negator not harmonically occurs with other n-words, the same development was suppressed in Standard English, “due to tremendous prescriptive pressure” (van Gelderen 2004: 85), especially in the 18th century. It was precisely the above-mentioned lack of L1 acquisition which enabled such strange things as the prescriptive prohibition against multiple negation to become successful. Note that, in contrast to what took place in the standard variants, the historical dialects of both languages were only transmitted by L1 acquisition, and they remained NC languages (see Anderwald 2002 for NC in English dialects; and Weiß 1998, 1999 on Bavarian, a German NC dialect). This difference is crucial for the reliability of data used as evidence. The
"wei-r27"> "wei-r18"> "wei-r34"> "wei-r12">
A question of relevance
language-externally induced emergence of such constructions is, I think, an equally important problem as the problems of variability and intuitions, which have already been discussed in the literature for some time (cf. Schütze 1996; Henry 2002). I think everybody will admit that constructions shaped by prescriptive rules do not form a natural class with constructions resulting from language change driven by internal factors alone. Therefore, the former should not be treated on a par with the latter. The problem is further increased because such constructions are not always as easily discernible as the English examples discussed by Emonds (1999) or Lasnik and Sobin (2000). It is known to be a naive illusion that prescriptive knowledge can in every case be distinguished from linguistic intuitions proper by normal speakers (cf. Schütze 1996: 161). Constructions shaped by prescriptive rules do not come with a label like ‘explicitly learned in school’ or something like that, so there is no easy way of factoring them out. There is always the possibility that explicitly learned conscious knowledge becomes unconscious implicit knowledge, or even that such rules are secondarily and implicitly learned (e.g., via imitation). Firstly, one has to be aware that learning/acquisition — a process — and knowledge — the result — are not the same, and secondly that this distinction holds independently of the implicitness or explicitness of both; cf. DeKeyser (2003: 315): “Even though implicitly acquired knowledge tends to remain implicit, and explicitly acquired knowledge tends to remain explicit, explicitly learned knowledge can become implicit in the sense that learners can lose awareness of its structure over time”. So it is not necessarily the case that prescriptive rules are part of explicit linguistic knowledge, as is commonly assumed (cf. Schütze 1996: 88); they can also belong to the implicit part, which makes them hard to filter out as disturbing noise in the process of grammaticality judgment.

Judging from the relevant literature on this topic (see below), the negation pattern illustrated in (2a,b) is an example which is mostly taken, even by linguists, to have developed on language-internal grounds alone. Such constructions are thus a real problem for linguistics, in that they give the impression of being the result of language-internal change, though their emergence was due to external factors. Constructions like the ones in (2a,b) are a special kind of artifact: although they are judged grammatical, this judgment involves prescriptive knowledge, which is “of no use” (Schütze 1996: 83) for most kinds of linguistics. It is true that sentences like (2a,b) sound very natural to native speakers of English and German (F. Newmeyer, p.c.) because they are grammatical in any sense. Yet the crucial point is that the syntax of negation in Standard English/German also contains the rule ‘Do not use n-words together with the negation’, which is part of the prescriptive knowledge — and this prescriptive knowledge is
involved when judging the grammaticality of sentences like (2a,b). For Standard German, it is quite unproblematic to analyze it as a hidden NC language (cf. Weiß 2002a,b), that is, to assume that the negative particle nicht ‘not’ gets obligatorily deleted at PF — an operation which exists as a possibility in the German NC dialects as well (cf. Weiß 1998: §IV, 1999 for Bavarian). In other words, all we have to assume is that an optional PF operation became obligatory in the course of standardization. This development does not alter anything more deeply rooted in the grammar of German, but it easily hides the underlying grammar.

In the following I will show that artifacts of this kind (as well as others) play an eminent role in linguistics, although they should not do so. For ease of discussion, I will assume that there are two ways to study language: first from an internalist perspective, and second from an externalist one. The first perspective — the cognitive one — is the one taken, for example, in Generative Grammar. For Chomsky (2000a: 5), “the cognitive perspective regards behavior and its products not as the object of inquiry, but as data that may provide evidence about the inner mechanisms of mind and the ways these mechanisms operate in executing actions and interpreting experience”. In addition, a substantial part of any speaker’s linguistic knowledge — i.e. the initial state (or UG) with which language acquisition starts — is thought to be “genetically determined” (Chomsky 2000a: 4). The language of adult speakers is “internally represented in the mind/brain”, and this I-language is the “output” of UG and as such accessible for linguistic research, whose real object of investigation, however, is UG (Chomsky 2000a: 4).

Now the crucial issue is that, since I-language is a mental state just like UG, it is not directly accessible, so the only possible way to get access to I-language is to study its products. That is, what we are actually doing is studying external language. This issue is also addressed by Schütze (2003), who points to the widely held misconception that grammaticality judgments provide direct evidence about competence. It is impossible to gain access to whatever mental state or faculty you want to study through introspection. That is, even with grammaticality judgments we only evaluate verbal behavior or performance. Seen in this light, the distinction between internal and external language, though presumably actually existing, loses some of its relevance for everyday linguistic research. We can only access external language directly (via grammaticality judgments or other kinds of data). Therefore, the issue discussed in this paper, namely the restricted reliability of some grammatical sentences taken from standard languages, is a real problem for cognitive linguistics as well: these sentences do not give us access to UG because their properties did not result from principles or constraints of UG, but were forced by prescriptive pressure.

Now to return to my hypothetical argumentation: in the internalist perspective, sentences like (2a,b) should reveal something about the underlying ‘mental’
"wei-r6"> "wei-r10"> "wei-r25">
A question of relevance
syntax of negation, hence about UG. Beghelli & Stowell’s (1997) principle of ‘Uniformity of Quantifier Scope Assignment’, as given under (3), is an example where n-words are analyzed as negative quantifiers which take scope from the specifier of the NegP.8

(3) The Uniformity of Quantifier Scope Assignment (Beghelli & Stowell 1997: 74):
    NQPs [negative quantifier phrases] take scope in the Spec NegP, where their [+Neg] feature is checked via Spec-Head agreement with the (silent) Neg0 head.
Obviously, such an analysis is appropriate for English and German n-words. Since they can negate a sentence on their own, they seem to have negative meaning and to be quantifiers. The problem is that, even if this were really the case,9 it does not follow from UG. It is not possible to build a UG principle upon sentences like (2a,b). It is just an idiosyncratic property of n-words in English and German that they are not compatible with negation.
The externalist perspective, on the other hand, is found, e.g., in typology.10 There, samples of data from various languages are taken to investigate actually attested language in order to make generalizations about the data, or, in the words of Comrie (1990b: 447): “The overall aim of linguistic typology is to classify languages in terms of their structural properties”. Since structural properties are what typology is interested in, the issue discussed here is relevant as well (though admittedly to a somewhat lesser extent): as we will see, standard languages can possess grammatical properties which deviate from the ones expected for the language type they belong to (see Weiß 2004a, 2004b for a broader discussion of that issue). The externalist perspective takes the sentences (2a,b) to represent one pattern of negation among others. Take as an example Dahl’s (1993: 919f.) typology of patterns “to express existential quantification within the scope of negation”, given under (4). Dahl’s third type, termed “stand-alone inherently negative quantifiers”, covers the type of n-words found in English and German (Haspelmath 1997: 201 classifies n-words in a similar manner). Dahl treats them on a par with the indefinites of the any-type and with n-words in NC languages. He further suggests that, historically, “the normal development would be from (i) to (ii) to (iii)” (Dahl 1993: 920), thus clearly implying that the English/German type of n-words developed within the course of natural language change.11

(4) Dahl’s (1993: 919f.) typology of patterns ‘to express existential quantification within the scope of negation’:
    i.   Standard negation + quantifiers marked for non-affirmity (not … any-)
    ii.  Standard negation + inherent negative quantifiers (not … no-)
    iii. Stand-alone inherently negative quantifiers (no-)
Beghelli & Stowell as well as Dahl take the English/German type of n-words as reliable data. However, as I have shown above, these data have a restricted value because their properties are partly due to the special and exceptional conditions which held for standard languages at their beginning, among which the lack of L1 acquisition is surely the most important one. L1 acquisition imposes certain constraints on the form of languages, and languages which are not acquired as first languages may deviate from these constraints at some points. This is the reason why the negation pattern in (2a,b) is exceptional — a point which is totally neglected by Beghelli & Stowell as well as by Dahl. That this particular language change was forced by language-external factors, and became possible only because the standard variants of English and German were learned by instruction and not acquired as first languages, is also proved by the fact that the dialects of both languages — N1 languages which were only L1-acquired — are still NC languages (see above). It is furthermore an exceptional process because it involved conscious decisions about language: people usually do not think about, e.g., where to put the verb in a sentence. So we should not mix up the two cases, but keep them clearly apart.
4. Negation: A closer look revealing facts and fiction

Even grammatically, the pattern illustrated by (2a,b) is exceptional in many respects. German has been an NC language since the times of Old High German, and the modern dialects still are NC languages. In NC languages, n-words co-occur with the negative particle, thus being presumably neither negative nor quantifiers (Weiß 2002b) — in contrast to what is assumed for the English/German type of n-words. Now, since prescriptive rules are only constraints on the use or occurrence of lexical items (or syntactic constructions), the interesting question is whether the linguistic nature of these items can be changed as well. For our concrete example, that means: did the rather superficial change of occurrence turn n-words in English or German into negative quantifiers? If this were indeed the case, data like those in (2a,b) would become relevant again, because that would be a linguistically substantial change. The data in (2a,b) look as if this had really happened: since n-words there can negate a sentence on their own, they must be negative quantifiers. However, a closer look at a broader range of data yields another result. For example, as is shown in Penka (2002), German n-words occurring in environments like modal verbs or idioms can get the correct interpretation only when split up into a negative and an indefinite, non-quantificational part.12 In such environments they cannot be negative quantifiers which are raised as a whole to their scope position (as assumed
"wei-r6"> "wei-r39">
A question of relevance
by Beghelli & Stowell 1997). Compare sentence (5a) (from Penka 2002: 21) with its paraphrase in (5b): the negative indefinite keinen Bären ‘no bear’ is part of an idiomatic expression which must be interpreted as a whole in order to grasp its idiomatic meaning, and which is embedded under a modal verb. However, the negation contributed by the n-word kein ‘no’ scopes over both the idiomatic expression and the modal verb, whereas the indefinite part einen Bären ‘a bear’ remains in situ; otherwise we would not obtain the idiomatic reading (cf. Penka 2002: 21–23).

(5) a. mir kannst du keinen Bären aufbinden
       me can you no bear tie-on
       ‘You cannot hoax me’
    b. es ist nicht möglich, dass du mir einen Bären aufbindest
       it is not possible that you me a bear tie-on
       ‘It is not possible that you hoax me’
Further evidence comes from coordinated sentences with a VP-ellipsis in the second conjunct, which demonstrates that n-words in German are semantically non-negative (cf. Weiß 2002b). Consider sentence (6a): the first conjunct contains an n-word, the second the VP-ellipsis. In order to be recoverable, deleted VPs have to be interpreted as semantically identical with an antecedent VP, which in our example contains an n-word. Thus the underlying pre-ellipsis form of the second conjunct is something like (6b). The surprising point is that in the second conjunct the negative particle nicht has to be present, and it occurs without canceling the negation of the n-word in the deleted VP. That means that negative concord is at work in Standard German as well.13

(6) a. weil er im Haus [VP niemanden sah]
       because he in-the house nobody saw
       und im Garten auch nicht [VP e]
       and in-the garden also not
       ‘Because he saw nobody in the house and not in the garden either’
    b. und im Garten auch nicht [VP niemanden sah]
       and in-the garden also not nobody saw
Additional evidence for the non-negativity of German n-words comes from (7), where the deleted n-word must be semantically identical with someone in the second conjunct, otherwise we would get the wrong reading (i.e. that he has seen no-one in the garden).14
(7) a. weil er im Haus [VP niemanden sah]
       because he in-the house nobody saw
       im Garten aber schon [VP e]
       in-the garden but MP
       ‘Because he saw nobody in the house, but in the garden (he did)’
    b. im Garten aber schon [VP niemanden sah]
       in-the garden but MP nobody saw
Therefore, there is ample evidence that Standard German is a hidden NC language. This is confirmed by comparison with languages which show negative concord, or the lack of it, in an unambiguous manner. The clearest criterion that a language is not an NC language is that it does not possess n-words (compare for the following Weiß 2002a). Hindi is such a language.15 It has two morphologically distinguishable paradigms of weak indefinites: one is used in positive sentences, as illustrated by (8a), and the other is used in negated sentences and in negative polarity environments, as we can see in (8b,c) (the sentences are Lahiri’s 1998 6b, 10a).

(8) a. koii aayaa [POS]
       someone came
       ‘Someone came’
    b. koii-bhii nahiiN aayaa [NEG]
       anyone not came
       ‘Nobody came’
    c. agar raam kissi-ko bhii dekhegaa [NPI]
       if Ram anyone see-fut
       ‘If Ram sees anyone’
Italian is an example of an NC language which also has two different forms of indefinites, namely qualcuno and the n-version nessuno. Interestingly, the distribution across the three contexts is different from that found in Hindi, because in Italian positive sentences share their indefinite with negative polarity contexts, as one can see in (9a,b).

(9) a. ha telefonato qualcuno [POS]
       has called someone
       ‘Someone has called’
    b. se qualcuno vuole qualcosa [NPI]
       if someone wants something
       ‘If anyone wants anything’
    c. non è venuto nessuno [NEG]
       not is come nobody
       ‘Nobody came’
Typologically, we thus have a sharp contrast between NC languages and non-NC languages. As is shown in (10a–c), Standard German clearly patterns with the
former in that the positive indefinite jemand appears in positive sentences as well as in negative polarity contexts.

(10) a. jemand hat angerufen [POS]
        someone has called
        ‘Someone has called’
     b. wenn jemand anruft [NPI]
        if someone calls
        ‘If anyone calls’
     c. niemand hat angerufen [NEG]
        nobody has called
        ‘Nobody has called’
Despite the initial evidence coming from data like (2b), Standard German is an NC language with n-words which are neither negative nor quantifiers. Though prescriptive rules are not able to change the grammatical type or to create a new one (or to violate principles of UG), they can hide the real properties of constructions, and this is what makes them highly problematic. Note that Standard German, being a hidden NC language, does not structurally differ from its dialects, which exhibit NC even on the surface, and that makes the issue discussed here relevant for typology as well. Since the “overall aim of linguistic typology is to classify languages in terms of their structural properties” (Comrie 1990b: 447), the property of NC should be the relevant one for typology — independent of the strong tendency to avoid multiple occurrences of n-words on the surface.
5. Some further examples and some further sources of inconsistency

The negation pattern in (2a,b) is one of the clearest examples of grammatical inconsistency from the core of grammar, and it is of a special kind, since it was shaped by prescriptive rules. However, there are numerous other cases, many of which are of an entirely different origin, because it need not necessarily be intentionally set norms that give rise to grammatical inconsistency. Let us consider some more examples to illustrate the point at issue. In German, pronominal syntax is another important case where the standard language is highly misleading, as it lacks clitic pronouns. In contrast, German dialects do have pronominal clitics — as do all Continental West Germanic dialects, for which this is a typologically characteristic property (cf. Weiß 2005a,b). Clitic pronouns can give rise to very interesting syntactic effects such as inflected complementizers and pro-drop, as is the case in the 2sg in Bavarian, which can be seen in (11a,b).
(11) a. obsd as glaubsd
        if-2sg it believe-2sg
        ‘Whether you believe it’
     b. moang bisd pro wida gsund
        tomorrow are-2sg pro again healthy
        ‘Tomorrow you will be well again’
All of this is not found in Standard German. The reason why clitics did not enter the standard variant was presumably its beginning as a written language, where the non-clitic forms, being more explicit, were preferred; it had nothing to do with prescriptivism. However, the crucial point is that Standard German thus hardly qualifies for investigating issues depending on pronominal syntax, e.g., the relation between pro-drop, pronominal clitics and verbal inflection, or even regularities of word order concerning pronouns. For example, Haider (1994) has proposed that the difference between pro-drop languages of the Romance type and non-pro-drop languages of the Germanic type is the (non-)existence of subject clitics. According to his proposal, pro-drop developed in Romance languages because there pronominal subjects originally cliticized onto I0, that is to that head position where the AGR-features are spelled out (this situation is still found in some Northern Italian dialects). As the AGR-features were then morphologically doubly represented, pronominal subjects were dropped in a second developmental stage, allowing pro-drop to occur (as in modern Italian). According to Haider (1994), the reason why Germanic languages never developed pro-drop is simply that they lack clitic pronouns. However, this cannot be the reason, since German dialects, albeit possessing pronominal clitics (as was demonstrated above with respect to Bavarian), are still non-pro-drop languages. The relevant difference from the Romance languages seems to be that in German dialects pronominal clitics cliticize onto C0 instead of I0. But C0 is obviously not the position where AGR-features are normally spelled out, which explains why German dialects in general are non-pro-drop languages. They exceptionally allow for pro-drop only in those cases where C0 has developed visible inflectional properties (hence inflected complementizers), as is the case, for instance, in the 2sg and 2pl in Bavarian. In all other cases pro-drop is not possible (cf. Weiß 2005b for further discussion).
A second consequence of the lack of clitic pronouns is that Standard German shows a very strange peculiarity with respect to the order of pronominal objects: the ‘enigmatic inversion’ (Abraham 1997) of pronominal indirect and direct objects. While in Standard German the unmarked order of nominal objects is indirect object before direct object, pronominal objects appear in the reversed order, cf. (12a vs. b).
(12) a. er hat dem Karl den Film zurück gegeben
        he has the-dat Charles the-acc film back given
        ‘He has given back the film to Charles’
     b. er hat ihn mir zurück gegeben
        he has him-3sg-acc me-1sg-dat back given
        ‘He has given it back to me’
This is a rather unusual property, because many languages do not display a difference between nominal and pronominal objects concerning their relative order, and Standard German clearly differs from its dialects in this respect. In German dialects, pronominal objects appear as clitics in the order indirect before direct object (see 13a). There is one well-defined exception to this pattern: in those cases where the paradigm lacks a clitic form for the dative pronoun (as, e.g., in the 3sg masculine in Bavarian), both objects invert (see 13b).
(13) a. ea hod’ma’n zruck geem
        he has-me-1sg-dat-him-3sg-acc back given
        ‘He has given it back to me’
     b. ea hod’n eam zruck geem
        he has-him-3sg-acc him-3sg-dat back given
        ‘He has given it back to him’
Note, however, that in contrast to the enigmatic inversion of the standard, the inversion observed in the dialects is a syntactically regular and transparent process, the reason simply being that non-clitic pronouns cannot cliticize onto C0, whereas the clitic direct object still does.
A look at the Continental West Germanic dialects as a whole shows that clitic pronouns are very characteristic of this type of language, because this feature is present in nearly all of them (as well as inflected complementizers, though to a lesser extent, cf. Weiß 2005b on that matter). That Standard German does not possess clitic pronouns is thus a typological inconsistency.16 As noted above, the emergence of this inconsistency was presumably only due to the fact that Standard German started as a written language. Though the decision to use the full forms of pronouns may have been made purposefully (they were held to be more explicit), the intention was clearly different from the one that had led to the ban on multiple negation. This example demonstrates two important issues: (i) that the mere fact that a language exists only in written form can have far-reaching consequences for the morpho-syntactic system of that language, and (ii) that literacy and prescriptivism are in principle independent of one another (though there is a close connection between the two, because words or constructions were often excluded from standard languages with the argument that they belong to spoken language, cf. Stein 1997).
There are other inconsistencies in Standard German, and the intentions (if there were any) which may have been decisive for their development are not always as easy to recover as they were in the case of multiple negation and pronominal syntax. I will just mention two further examples: first, the use of the article, e.g. in combination with proper nouns (14a,b) and with predicative nouns (15a,b), and second, the survival of the postnominal genitive (see below).

(14) a. er traf (??die) Maria am Bahnhof
        he met (??the) Mary at-the station
        ‘He met Mary at the station’
     b. er traf die bereits wartende Maria am Bahnhof
        he met the already waiting Mary at-the station
        ‘He met Mary at the station where she was already waiting’
(15) a. Peter ist (??ein) Lehrer17
        Peter is (??a) teacher
     b. *Peter ist guter Lehrer
        Peter is good teacher
     c. Peter ist ein guter Lehrer
        Peter is a good teacher
     d. Peter ist *(ein) Dummkopf
        Peter is *(a) fool
In the German dialects, the article has developed a grammatically important function in that it is the bearer of case morphology, whereas nouns are not marked for case (Weiß 1998). This is the reason why even proper nouns are used with the definite article (see Hodler 1969: 33 on that issue with respect to Bernese German). The use of proper nouns without an article in Standard German can have two reasons: first, Standard German has preserved the older stage where proper nouns were generally used without an article (see Hodler 1969: 33 on archaic variants of Bernese German which have also preserved this stage); second, the bare use of proper nouns could be due to a long-standing philosophical tradition according to which proper nouns are directly and unambiguously referring items, making the use of the definite article superfluous. Since prescriptive grammarians were heavily influenced by rationalist thinking (Langer 2001: 171f.), it is no surprise that they explained the bare use of proper nouns along these lines.18 As example (14b) shows, the article becomes obligatory in Standard German when an attribute is present. This is strong evidence that the syntax of proper nouns in Standard German requires the article as well, but that it is for some reason or other suppressed if possible, as in (14a). A similar split can be observed with predicative nouns: at least some nouns can be used predicatively without the indefinite article (cf. 15a vs. d), but again, when
they are accompanied by an attribute, the article becomes obligatory (cf. 15b vs. c). Grammars of Standard German like the Duden (§561) offer as an explanation that predicative nouns which denote a socially established and acknowledged group (‘eine sozial etablierte und anerkannte Gruppe’) can be used without an article. Though there really seems to be an affinity between group-denoting nouns (vs., say, property-denoting nouns like fool) and the bare use as predicates, the explanation is obviously rather ad hoc and misses the crucial point: that even group-denoting nouns require an article if construed with an attribute.19 It seems to me rather difficult to detect the reason which has enabled the bare use of predicative nouns, but it is obvious that it results in a grammatically inconsistent peculiarity and that the grammarians’ reasoning behind it is extralinguistic — just as was the case with proper nouns and the definite article.
A similar peculiarity is the postnominal morphological genitive (16a), which has disappeared from the dialects and was replaced there by the corresponding constructions in (16b–d):
(16) a. das Haus der Mutter
        the house the-gen mother
     b. Mutters Haus20
        mother’s house
     c. der Mutter ihr Haus
        the-dat mother her house
     d. das Haus von der Mutter
        the house of the mother
        ‘mother’s house’
The survival of the postnominal genitive in Standard German is comparable to the ban on multiple negation discussed above, as there is a clear motive behind its use by German writers. In this case, however, it was the model of classical Latin, which was held to be a perfect language, and every vernacular grammarian tried to make his language as similar to the ideal as possible. This was the deeper reason why the genitive as a morphological case survived in Standard German, and its survival was enabled by the fact that it was learned from school grammars. As a result of this artificial survival, Standard German differs from all other Germanic languages (with the exception of Icelandic), which have lost the morphological genitive (see also von Polenz 1994: 253f. on inflectional properties of German in general compared to other Germanic languages).
The phenomena discussed in this section are examples where Standard German is grammatically or typologically inconsistent — compared, e.g., to its dialects — though they do not involve violations of UG principles proper. Nevertheless, they constitute problematic cases and should be avoided, since they can lead linguists to postulate UG principles based on them even though their properties are rather due to
performance preferences, some of them quite obvious (the case of the genitive or of negation), others rather obscure. A similar plea can be made for linguistic typology. It is quite common in typology to use data from standard languages, and there are some phenomena where these languages provide an ‘unrepresentative data base’ (Fleischer 2004). For this reason, typology is beginning to turn to dialects as objects of investigation (cf. Fleischer 2004; Weiß 2004b). This means neither that standard languages are totally useless for linguistics (cf. Weiß 2004a) nor that dialects are consistent in every respect. Dialects, like any other language (cf. Anderson 1999: 121), are sets of contingent and heterogeneous properties because they have resulted from the interaction of a variety of factors, among which UG and L1 acquisition are not the only ones. The examples discussed in this section should only illustrate the widely ignored fact that standardization is one factor which can generate such inconsistencies.
6. A plea for cooperation between theoretical linguistics and sociolinguistics

The aim of my paper was to show that there is a problem with some data from standard languages. Because they were once N2 languages, prescriptive rules shaped parts of their grammatical properties. These data are thus not reliable and should be neglected in certain kinds of linguistic research. In theoretical linguistics, normative pressure was ignored for a long time (cf. Schütze 1996: 83f.). However, one can observe a beginning and slowly increasing awareness of it, for instance in the works of Emonds (1999) and Schütze (2001) on accusative subjects, of Sobin (1997) and Lasnik & Sobin (2000) on grammatical viruses, or of van Gelderen (2004) on grammaticalization. These authors take into account that some data do not qualify as evidence, and they try — in Schütze’s (2001: 214) words — to “escape the heavy prescriptive influence”. In Weiß (2001), a theoretical model is sketched which is based on the presence or absence of L1 acquisition, in order to capture the possible difference in naturalness of languages and the different possibilities of language change that follow from it (see also above).
Somewhat surprisingly, although sociolinguistics is concerned with “how societies shape languages” — to quote Chambers (2002: 705) — this special kind of language-shaping does not play an important role there. Processes of this kind, which have demonstrably shaped standard languages to a considerable extent, do not figure, for example, in the Handbook of Language Variation and Change published in 2002, presumably because the central perspective there is not a historical one. Yet this, I think, is the field where historical sociolinguistics and theoretical linguistics could enter into a very useful cooperation. Theoretical linguists should acknowledge that standard languages once were languages of a special kind
"wei-r46">
A question of relevance
and that they still possess some inconsistent properties, although they are on the way to becoming N1 languages. On the other hand, sociolinguistics can learn from theoretical linguistics which grammatical properties can be shaped by social forces and which ones cannot (see Weiß 2004a for further discussion of the second issue). I will briefly discuss the first issue, i.e. explain how theoretical linguistics could benefit from sociolinguistics. There are two possible ways.
First, Haspelmath (this volume) proposes that UG is best studied by investigating the acquisition of unnatural languages such as Esperanto. Bearing in mind what I have said about standard languages, I think they are appropriate candidates, too. Due to their special sociolinguistic background when they emerged — the lack of L1 acquisition, the restriction to writing, the use by only a minority of the population (cf. Weiß 2004a) — they were N2 languages whose naturalness was limited; in other words, they were unnatural to some extent. As noted in Section 2, the situation changed in the 20th century because standard languages are now widely used in everyday communication and are acquired as first languages. This situation is reminiscent of the example of Ivrit mentioned in Section 2, where we saw that L1 acquisition induced language change from Classical Hebrew to modern Ivrit. Among other things, a major part of this change consisted of the “disappearance of many irregularities and idiosyncratic constructions that were common in the written language” (Versteegh 1993: 547). Now the question is whether one could expect similar changes to have occurred in those standard languages which became subject to L1 acquisition. Set within a generative framework, we can expect children exposed to a standard language as their primary linguistic data to acquire an I-language that at least temporarily differs from their input language in those properties where the latter shows inconsistencies of the sort discussed above. For instance, they should use NC constructions instead of the standard pattern before they somehow learn the prescriptive rule.21 This could be expected because children learning a language first assume the unmarked parameter values according to UG settings. With regard to negation, NC corresponds to UG in that n-words in NC constructions are neither quantifiers nor negative, whereas in Standard English/German style negative constructions such as (2a,b) n-words behave like negative quantifiers (though this is clearly not the case in other environments, cf. examples 5–7 above). That is, children should assume NC as the zero hypothesis, and they must then learn the exceptional behavior of Standard English/German style n-words (see also McNeill 1970: 94). This is confirmed at least for some standard languages. For instance, McNeill (1970: 94f.) reports for English that middle-class children whose parents do not use multiple negation nevertheless produce, for some time, sentences like (17a,b).
(17) a. I don’t want no supper (McNeill 1970: 94)
     b. Nobody don’t like me (McNeill 1970: 106)
Unfortunately, the situation for German seems not to be as straightforward as expected. Although Hamann (1994: 77) explicitly states that NC “cannot be said to exist in child German”, there is new evidence to the contrary. There are NC data in a (yet unpublished) language acquisition study at the Max Planck Institute for Evolutionary Anthropology conducted by Heike Behrens under the guidance of M. Tomasello.22 In this corpus, Leo — the child under investigation — quite regularly uses NC constructions like the ones given in (18a,b), though they seem to be nearly completely absent in the input data.23
(18) a. keine Enten nicht da
        no ducks not there
        ‘There are no ducks’
     b. Autos haben kein Blaulicht nicht an
        cars have no blue light not on
        ‘Cars do not have blue light switched on’
The same seems to hold for Dutch children, who show NC at least up to the age of four (according to the CHILDES database for Dutch, Hedde Zeijlstra, p.c.). If we supposed that standard languages had always been fully natural languages (i.e. N1 languages as defined above), this would be a somewhat surprising phenomenon. However, if we take their sociolinguistically exceptional nature into account, these occurrences of NC are just what we have to expect.
The acquisition of the morphological genitive is another example pointing in the same direction: Eisenbeiss (2002) shows that the prenominal genitive -s (as in 16b above) — which is no genitive at all! — is mastered quite early by children (between the ages of 2 and 3), whereas the postnominal morphological genitive (as in 16a above) is absent from child German and does not appear before the age of 6 (Mills 1985: 185). Children further use the s-genitive in cases where adult German speakers use the morphological genitive (Eisenbeiss 2000). This means that the morphological genitive must be learned in later stages (presumably at school). The German morphological genitive seems to be a prototypical example of a grammatically deviant prestige construction “imposed in certain, especially written forms of language exclusively through paralinguistic cultural institutions” (Emonds 1999: 235). Note that a similar situation can be observed in Dutch, where the morphological genitive is “not part of the language acquired during the critical period” (Weerman & de Wit 1999: 1184) either.
A second benefit could be gained in historical linguistics. Some linguists have already acknowledged the possibility that historical texts do not directly reflect native competences. For instance, Kroch (2001) assumes the existence of “syntactic
"wei-r41">
A question of relevance
diglossia within individual authors” (Kroch 2001: 722), with “one of the diglossic variants being more native than the other” (Kroch 2001: 723).24 He further assumes that this diglossia represents “an opposition between an innovative vernacular and a conservative literary language” (Kroch 2001: 723). This is perfectly in line with my N1-N2 model mentioned above, where the N2 competence is not native but secondarily learned. Kroch (2001) himself develops a competition model in which both competences compete for surfacing in the texts. Given this scenario, it is also possible that neither competence wins, with the result that certain constructions are hybrid forms (e.g., via hypercorrection). For instance, up to the 18th century we find constructions like (19a) in German which look like a combination of the prenominal dative (19b) and the adnominal genitive (19c):
(19) a. des Herrn sein Brief (von Polenz 1994: 271)
        the-gen master his letter
     b. dem Herrn sein Brief
        the-dat master his letter
     c. des Herrn Brief
        the-gen master letter
        ‘The master’s letter’
The dative variant was part of the vernacular (i.e. dialectal) N1 competence, whereas the genitive was part of the standard N2 competence. So it could well be that (19a) was never part of any coherent linguistic competence, but is the result of mixing both. If this assumption is correct, we can expect that the variation observable in historical texts is not always due to different parameter settings (see also Kroch 2001 on this issue), but may only reflect different hybrid forms. I think we can even expect the existence of interlanguage grammars as known from second language acquisition (Ritchie & Bhatia 1996): many writers in former times may have acquired only a partial competence for writing, especially when they were not professional writers, so that their texts correspond neither to the target standard grammar nor to their L1 competences. The texts may thus have been the output of interlanguage grammars. Under this model, it would be essential for historical linguistics to obtain (and take into account) at least rudimentary sociolinguistic information about the producers of the texts under investigation (see Kroch 2001: 709f. for a very illustrative example concerning the Peterborough manuscript of the Anglo-Saxon Chronicle).
7. Conclusion

I would like to end my paper with a quote from Chomsky (2001) where he, too, underlines the need for cooperation between the internalist and the sociolinguistic perspective:

Internalist biolinguistic inquiry does not, of course, question the legitimacy of other approaches to language, any more than internalist inquiry into bee communication invalidates the study of how the relevant internal organization of bees enters into their social structure. The investigations do not conflict; they are mutually supportive. In the case of humans, though not other organisms, the issues are subject to controversy, often impassioned, and needless. (Chomsky 2001: 41f.)
In Sections 2 to 5, I tried to demonstrate the necessity for theoretical linguists — be they generativists or typologists — to take the sociology of languages seriously. In Section 6, I attempted to demonstrate two possible ways in which one can use sociological background information in investigating languages synchronically and historically (and argued that even explanatory theories in the generativist’s sense may benefit from it). The overall aim, however, was to point to a hitherto largely ignored problem: the fact that data from standard languages may not always qualify as evidence in linguistics.
Notes

* This paper developed out of the talk given at the Workshop on linguistic evidence at the DGfS 2003 meeting in Munich. I would like to thank the Workshop audience for providing useful comments and Martina Penke and Anette Rosenbach for organizing this highly interesting and stimulating workshop. Many thanks to Katrin Axel, Heike Behrens, Sonja Eisenbeiss, Jochen Geilfuss-Wolfgang, Joachim Jacobs, Martin Haspelmath, Marga Reis, Susanne Winkler, and Hedde Zeijlstra for various kinds of help, comments and suggestions, and to Elly van Gelderen, Fritz Newmeyer, Martina Penke, and Anette Rosenbach for very helpful reviews of an earlier version of this paper. Special thanks to Janna Lisa Zimmermann for checking and improving my English.

1. As we will see, this does not mean that standard languages possess properties which violate principles of UG.

2. Goethe wrote in his autobiography Dichtung und Wahrheit: “Ich war nämlich in dem oberdeutschen Dialekt [= Hessian] geboren und erzogen worden, und obgleich mein Vater sich stets einer gewissen Reinheit der Sprache befliß und uns Kinder auf das, was man wirklich Mängel jenes Idioms nennen kann, von Jugend an aufmerksam gemacht und zu einem besseren Sprechen vorbereitet hatte, so blieben mir doch gar manche tiefer liegende Eigenheiten, die ich, weil sie mir ihrer Naivität wegen gefielen, mit Behagen hervorhob, und mir dadurch von meinen neuen [Leipziger] Mitbürgern jedesmal einen strengen Verweis zuzog.” (Goethe 1982: 250f.)
[‘I had been born and bred in the Upper-German dialect; and although my father always labored to preserve a certain purity of language, and, from our youth upwards, had made us children attentive to what may be really called the defects of that idiom, and so prepared us for a better manner of speaking, I retained nevertheless many deeper-seated peculiarities, which, because they pleased me by their naïveté, I was fond of making conspicuous, and thus every time I used them incurred a severe reproof from my new fellow-townsmen’, Goethe 1971, I:268]

3. Throughout this paper the term ‘natural language change’ means language change governed by internal principles, unhindered by external/prescriptive forces — which is not (necessarily) the same as natural language change in the sense of natural morphology (cf. Wurzel 1994).

4. It is still controversial whether grammar change and language change are the same; see the discussion between Fischer and Lightfoot (this volume).

5. This case thus comes close to fulfilling Haspelmath’s (this volume) requirement that innate UG is best studied by examining the acquisition of unnatural languages. Though Classical Hebrew once was a fully natural language (i.e. one that was acquired as a first language), it went completely out of use as a means of communication and was transmitted from generation to generation only for religious reasons. The results of this experimentum in naturam (see the main text) speak for themselves. On the acquisition of Hebrew see also Berman (1985): she, too, highlights the innovative power of language acquisition.

6. There are still many standard languages which are not acquired as first languages and are hardly spoken in everyday communication. This is the case with Standard Indonesian and Standard Arabic (M. Haspelmath, p.c.). Very revealing is what Kaye (1987: 675) writes about the diglossia situation in the Arabic-speaking world: “There have even been reports that certain individuals have adapted the standard language as their exclusive means of oral communication, yet I have reservations about this”. As F. Newmeyer (p.c.) informs me, there are apparently a million Indians who claim Sanskrit as a native language: if that were indeed the case, it would be a highly interesting one, comparable to the Ivrit case. Unfortunately, I have not managed to gain further information about this matter.

7. Following Schütze (1996), I use ‘grammaticality judgment’ as a cover term for all kinds of intuitions concerning the grammaticality or acceptability of an utterance.

8. The same holds for the NEG criterion (Haegeman & Zanuttini 1991), where n-words are treated as NEG-operators which take scope from the specifier of NegP.

9. See Section 4 for data showing that n-words in Standard German are neither negative nor quantifiers.

10. Though the externalist approach as such is not of crucial relevance for the perspective of this volume, typological generalizations as well as historical data are commonly taken to reflect the influence of UG principles. These topics are controversially discussed in several contributions to this volume (see Haspelmath, Newmeyer, Wunderlich, and Kirby et al. on the (non-)relation between typology and UG; Fischer discusses different aspects of the historical dimension of language with regard to UG).

11.
Dahl’s own example is the ‘development from Latin to modern French’, which seems to involve a misunderstanding in at least one point, because modern French, though the clitic negative particle is lost, is still an NC language and does not belong to type (iii); cf. Weiß (2002a).
12. This split was originally proposed by Bech (1955/1957), see also Jacobs (1991: 595), Penka (2002: 3f.), and Weiß (2002b: 138). 13. As shown in Weiß (2002b: 137f.), English n-words exhibit quite the same behavior in such VP-ellipsis constructions. 14. Thanks to S. Winkler (p.c.) for drawing my attention to this type of VP-ellipsis. 15. N-words are defined strictly morphologically in Weiß (2002a: 87): they incorporate a morpheme which historically goes back to a negative expression like English no- in nobody, nothing etc. (comprising suppletive forms like Spanish nadie ‘nobody’). Note that the Hindi particle bhii occurring with indefinites under negation and in negative polarity environments means ‘also, even’ so that Hindi does not qualify as an NC language. 16. Dutch, the other Continental West Germanic language which underwent a similar standardization process, shows the same split in that the dialects possess clitic pronouns (cf. Weiß 2005b), whereas the standard variant does not. 17. There seems to be a rather strong tendency among younger speakers of German to use or at least to tolerate the indefinite article with predicative nouns (e.g., Martina Penke, p.c.). The double question marks follow the judgments given in prescriptive grammars (see below). 18. Cf. Adelung (1781: 94): “Ein eigener Nahme ist […] schon vollkommen bestimmt, daher bedarf er […] keines Artikels” [‘A proper noun is […] already perfectly determined, hence it does not need an article’, my translation]. Note that this kind of reasoning is still prevalent in semantics (see Heim 1991: 510f. for a good discussion of the apparent incompatibility of the definite article with proper nouns) and in descriptive German grammars (see, e.g., Eisenberg 1999: 160f.). 19. The additional criterion of being socially established and acknowledged seems to play a minor role, see Weiß (2004a) for counter-examples and further discussion. 20. Though constructions like (16b) are called prenominal genitive, they differ categorically from the postnominal morphological genitive. Compare Demske (2001: 206–255) who provides compelling historical evidence that the prenominal genitive has become part of the article system in New High German, and Eisenbeiss (2000, 2002) for evidence from German L1 acquisition (see Section 6 for further discussion). The contrast in acquisition clearly demonstrates the exceptional nature of the postnominal genitive in German. See also Weerman & de Wit (1999) for similar developments in Dutch: they show that the morphological genitive, though still occurring in Modern Dutch prose, is ‘not part of the core system’, but “acquired relatively ‘late’” (Weerman & de Wit 1999: 1157), whereas the s of the s-genitive is a determiner element in Dutch as well. Many thanks to A. Rosenbach for supplying me with these arguments and drawing my attention to Demske, Eisenbeiss, and Weerman & de Wit. 21. This issue was raised by J. Jacobs in the discussion at the workshop. 22. I am very much indebted to Heike Behrens for providing me with data and for giving me allowance to use them. Thanks to Jochen Geilfuss-Wolfgang for drawing my attention to the existence of this research. 23. In the Leo corpus, there are 40 NC constructions, the first from the age of 2 years, 2 months and 2 days, the last from the age of 4 years, 6 months and 16 days, whereby the overwhelming amount of these constructions (i.e. 37) was produced before Leo’s third birthday. 
Therefore, it seems that Leo underwent an early productive period of NC within the acquisition of negation — an impression which exactly corresponds to what NcNeill (1970: 94f.) reports for English.
Concerning the adult language which was part of the input of Leo, there is only one clear example of NC recorded (cf. i). A second possible example is (ii) where it is doubtful to me whether the initially intended sentence would really have ended up in a NC construction. i. ii.
keine Malkreide, kein nichts no paint-chalk, no nothing weil du keine [///] deinen zweiten Hausschuh nich(t) anhast because you no [///] your second slipper not on-have [[///] = selfcorrection of the speaker]
24. Thanks to Katrin Axel for drawing my attention to Kroch (2001) and discussions on this issue. A. Rosenbach has pointed out to me that standard features may not always be conservative and that prescriptivism in its early stages is rather change-inducing, whereas dialects can be sometimes rather archaic (compare the above mentioned archaic variants of Bernese German, where proper nouns are still used without articles).
References Abraham, Werner. 1997. “The base structure of the German clause under discourse functional weight: contentful functional categories vs. derivative functional categories”. In: Abraham, Werner; and van Gelderen, Elly (eds), German: syntactic problems — problematic syntax 11–42. Tübingen: Niemeyer. Adelung, Johann Christoph. 1781/1977. Deutsche Sprachlehre. Berlin: Voß & Sohn. [Reprint Hildesheim: Olms]. Andersen, Stephen R. 1999. “A formalist’s reading of some functionalist work in syntax”. In: Darnell, Michael et al. (eds), Functionalism and formalism in linguistics (vol. 1) 111–135. Amsterdam: Benjamins. Anderwald, Liselotte. 2002. Negation in non-standard British English. London: Routledge. Bech, Gunnar. 1955/1957. Studien über das deutsche verbum infinitum. Kopenhagen: Munksgaard (2nd edition Tübingen: Niemeyer 1983). Beghelli, Filippo; and Stowell, Tim. 1997. “Distributivity and negation: the syntax of each and every”. In: Szabolcsi, Anna (ed.), Ways of scope taking 71–107. Dordrecht: Kluwer. Berman, Ruth A. 1985. “The acquisition of Hebrew”. In: Slobin, Dan I. (ed.), The cross-linguistic study of language acquisition (vol. 1) 255–371. Hillsdale, NY: Erlbaum. Chambers, Jack K. 2002. “Language and Societies”. In: Chambers, Jack K.; Trudgill, Peter; and Schilling-Estes, Natalie (eds), The handbook of language variation and change 705–706. Oxford: Blackwell. Chomsky, Noam. 2000a. New horizons in the study of language and mind. Cambridge: CUP. Chomsky, Noam. 2000b. The architecture of language. Oxford: OUP. Chomsky, Noam. 2001. “Derivation by phase”. In: Kenstowicz, Michael (ed.), Ken Hale. A life in language 1–52. Cambridge, MA: MIT Press. Comrie, Bernard. 1990a. “Languages of the world: who speaks what”. In: Collinge, Neville E. (ed.), An encyclopaedia of language 956–983. London: Routledge. Comrie, Bernard. 1990b. “Linguistic typology”. In: Newmeyer, Frederick J. (ed.), Linguistics. The Cambridge survey, vol. 1: Linguistic theory: foundations 447–461. Cambridge: CUP. Dahl, Östen. 1993. “Negation”. In: Jacobs, Joachim; von Stechow, Arnim; Sternefeld, Wolfgang; and Vennemann, Theo (eds), Syntax. Ein internationales Handbuch zeitgenössischer
Forschung. An international handbook of contemporary research (1. Halbband) 914–923. Berlin: de Gruyter.
DeKeyser, Robert. 2003. “Implicit and explicit learning”. In: Doughty, Catherine J.; and Long, Michael H. (eds), The handbook of second language acquisition 313–348. Oxford: Blackwell.
Demske, Ulrike. 2001. Merkmale und Relationen. Diachrone Studien zur Nominalphrase des Deutschen. Berlin: de Gruyter.
Duden. Grammatik der deutschen Gegenwartssprache. 6., neu bearbeitete Auflage. Mannheim: Duden Verlag.
Durrell, Martin. 1999. “Standardsprache in England und Deutschland”. Zeitschrift für Germanistische Linguistik 27: 285–308.
Eisenbeiss, Sonja. 2000. “The acquisition of the Determiner Phrase in German child language”. In: Friedemann, Marc-Ariel; and Rizzi, Luigi (eds), The acquisition of syntax: studies in comparative developmental linguistics 27–62. London: Longman.
Eisenbeiss, Sonja. 2002. Merkmalsgesteuerter Grammatikerwerb. Eine Untersuchung zum Erwerb der Struktur und Flexion von Nominalphrasen. Dissertation, Heinrich-Heine-Universität Düsseldorf.
Eisenberg, Peter. 1999. Grundriß der deutschen Grammatik, Band 2: Der Satz. Stuttgart: Metzler.
Emonds, Joseph E. 1999. “Grammatically deviant prestige constructions”. In: Emonds, Joseph (ed.), The syntax of local processes. Collected essays (vol. 1) 235–271. Bloomington: Indiana University Linguistic Club.
Fleischer, Jürg. 2004. “A typology of relative clauses in German dialects”. In: Kortmann, Bernd (ed.), Dialectology meets typology. Dialect grammar from a cross-linguistic perspective 211–243. Berlin: de Gruyter.
Gelderen, Elly van. 2004. “Economy, innovation, and prescriptivism: from spec to head and head to head”. The Journal of Comparative Germanic Linguistics 7: 59–98.
Goethe, Johann Wolfgang von. 1971. The autobiography of Johann Wolfgang von Goethe (Dichtung und Wahrheit) (vol. 1,2). Translated by John Oxenford. Introduction by Gregor Sebba. London: Sidgwick and Jackson.
Goethe, Johann Wolfgang von. 1982. Hamburger Ausgabe, Bd 9. Autobiographische Schriften I. Textkritisch durchgesehen von Liselotte Blumenthal. Kommentiert von Erich Trunz. München: Deutscher Taschenbuchverlag.
Haegeman, Liliane; and Zanuttini, Raffaella. 1991. “Negative heads and the Neg criterion”. The Linguistic Review 8: 233–251.
Haider, Hubert. 1994. “(Un-)heimliche Subjekte — Anmerkungen zur Pro-drop Causa, im Anschluß an die Lektüre von Osvaldo Jaeggli & Kenneth J. Safir, eds., The null subject parameter”. Linguistische Berichte 153: 372–385.
Hamann, Cornelia. 1994. “Negation and truncated structures”. In: Aldridge, Michelle (ed.), Child language 72–83. Clevedon: Multilingual Matters.
Haspelmath, Martin. 1997. Indefinite pronouns. Oxford: Clarendon Press.
Heim, Irene. 1991. “Artikel und Definitheit”. In: von Stechow, Arnim; and Wunderlich, Dieter (eds), Semantik. Semantics. Ein internationales Handbuch der zeitgenössischen Forschung. An international handbook of contemporary research 487–535. Berlin: de Gruyter.
Henry, Alison. 2002. “Variation and syntactic theory”. In: Chambers, Jack K.; Trudgill, Peter; and Schilling-Estes, Natalie (eds), The handbook of language variation and change 267–282. Oxford: Blackwell.
Hodler, Werner. 1969. Berndeutsche Syntax. Bern: Francke Verlag.
Jacobs, Joachim. 1991. “Negation”. In: von Stechow, Arnim; and Wunderlich, Dieter (eds), Semantik. Semantics. Ein internationales Handbuch der zeitgenössischen Forschung. An international handbook of contemporary research 560–596. Berlin: de Gruyter.
Kaye, Alan S. 1987. “Arabic”. In: Comrie, Bernard (ed.), The world’s major languages 664–685. London: Croom Helm.
Kroch, Anthony S. 2001. “Syntactic change”. In: Baltin, Mark; and Collins, Chris (eds), The handbook of contemporary syntactic theory 699–729. Oxford: Blackwell.
Lahiri, Utpal. 1998. “Focus and negative polarity in Hindi”. Natural Language Semantics 6: 57–123.
Langer, Nils. 2001. Linguistic purism in action. How auxiliary tun was stigmatized in Early New High German. Berlin: de Gruyter.
Lasnik, Howard; and Sobin, Nicholas. 2000. “The WHO/WHOM puzzle: the preservation of an archaic feature”. Natural Language and Linguistic Theory 18: 343–371.
Lightfoot, David W. 1999. The development of language: acquisition, change and evolution. Oxford: Blackwell.
Lyons, John. 1991. Natural language and universal grammar. Essays in linguistic theory (vol. 1). Cambridge: CUP.
McNeill, David. 1970. The acquisition of language. The study of developmental psycholinguistics. New York: Harper and Row.
Mills, Anne E. 1985. “The acquisition of German”. In: Slobin, Dan I. (ed.), The cross-linguistic study of language acquisition (vol. 1) 141–254. Hillsdale, NJ: Erlbaum.
Penka, Doris. 2002. Kein muss kein Rätsel sein. Zur Semantik der negativen Indefinita im Deutschen. Master’s thesis, revised version. Universität Tübingen.
Polenz, Peter von. 1994. Deutsche Sprachgeschichte vom Spätmittelalter bis zur Gegenwart, vol. 2: 17. und 18. Jahrhundert. Berlin: de Gruyter.
Ritchie, William C.; and Bhatia, Tej K. 1996. “Second language acquisition: introduction, foundations, and overview”. In: Bhatia, Tej K.; and Ritchie, William C. (eds), Handbook of second language acquisition 1–46. London: Longman.
Schütze, Carson T. 1996. The empirical base of linguistics. Grammaticality judgments and linguistic methodology. Chicago: The University of Chicago Press.
Schütze, Carson T. 2001. “On the nature of default case”. Syntax 4: 205–238.
Schütze, Carson T. 2003. “Linguistic theory and empirical evidence: Clarifying some misconceptions”. Talk given at the 25th annual meeting of the Deutsche Gesellschaft für Sprachwissenschaft in Munich, February 26–28.
Sobin, Nicholas. 1997. “Agreement, default rules, and grammatical viruses”. Linguistic Inquiry 28: 318–343.
Stein, Dieter. 1997. “Syntax and varieties”. In: Cheshire, Jenny; and Stein, Dieter (eds), Taming the vernacular: from dialect to written standard language 35–50. Harlow: Longman.
Van Marle, Jaap. 1997. “Dialect versus standard language: nature versus culture”. In: Cheshire, Jenny; and Stein, Dieter (eds), Taming the vernacular: from dialect to written standard language 13–34. Harlow: Longman.
Versteegh, Kees. 1993. “Esperanto as a first language: language acquisition with a restricted input”. Linguistics 31: 539–555.
Weerman, Fred; and de Wit, Petra. 1999. “The decline of the genitive in Dutch”. Linguistics 37: 1155–1192.
Weiß, Helmut. 1998. Syntax des Bairischen. Studien zur Grammatik einer natürlichen Sprache. Tübingen: Niemeyer.
Weiß, Helmut. 1999. “Duplex negatio non semper affirmat. A theory of double negation in Bavarian”. Linguistics 37: 819–846.
Weiß, Helmut. 2001. “On two types of natural languages. Some consequences for linguistics”. Theoretical Linguistics 27: 87–103.
Weiß, Helmut. 2002a. “Indefinite pronouns. Morphology and syntax in cross-linguistic perspective”. In: Simon, Horst; and Wiese, Heike (eds), Pronouns: grammar and representation 85–107. Amsterdam: Benjamins.
Weiß, Helmut. 2002b. “A quantifier approach to negation in natural languages. Or why negative concord is necessary”. Nordic Journal of Linguistics 25(2): 125–154.
Weiß, Helmut. 2004a. “Zum linguistischen Status von Standardsprachen”. In: Kozianka, Maria; Lühr, Rosemarie; and Zeilfelder, Susanne (eds), Indogermanistik, Germanistik, Linguistik. Akten der Arbeitstagung der Indogermanischen Gesellschaft, Jena 18.-20.9.2002 591–643. Hamburg: Verlag Dr. Kovač.
Weiß, Helmut. 2004b. “Vom Nutzen der Dialektsyntax”. In: Patocka, Franz; and Wiesinger, Peter (eds), Morphologie und Syntax deutscher Dialekte und historische Dialektologie des Deutschen 21–41. Wien: Edition Präsens.
Weiß, Helmut. 2005a. “Syntax der Personalpronomen im Bairischen”. In: Krämer-Neubert, Sabine; and Wolf, Norbert-Richard (eds), Bayerische Dialektologie. Akten der Internationalen Dialektologischen Konferenz 26–28 Feb. 2002 179–188. Heidelberg: Winter.
Weiß, Helmut. 2005b. “Inflected complementizers in Continental West Germanic dialects”. Zeitschrift für Dialektologie und Linguistik 72(2): 148–168.
Weiß, Helmut. 2005c. “Von den vier Lebensaltern einer Standardsprache. Ein Vorschlag zur Neukonzeptionierung”. Deutsche Sprache 33(4): 289–307.
Weiß, Helmut. 2005d. “The double competence hypothesis”. In: Kepser, Stephan; and Reis, Marga (eds), Linguistic Evidence: Empirical, Theoretical, and Computational Perspectives 557–575. Berlin: Mouton de Gruyter.
Wurzel, Wolfgang Ulrich. 1994. Grammatisch initiierter Wandel. Bochum: Universitätsverlag Dr. N. Brockmeyer.
"sim-n*">
The Relevance of Variation
Remarks on Weiß’s Standard-Dialect-Problem*
Horst J. Simon
Humboldt-Universität zu Berlin / Universität Wien
I understand Helmut Weiß’s paper (in this volume) as a plea for an intensified use of dialect data in linguistic research, and for a more cautious treatment of standard languages. I totally agree with Weiß’s general spirit and with his main aim. However, I cannot fully grasp the logic behind his reasoning. And, more importantly, I wonder whether one should not take a somewhat different perspective in future research.
Weiß’s argumentation is deeply rooted in the Chomskyan research paradigm: ‘a natural language’ is seen as emerging in an individual (during the course of first language acquisition) through the interplay of innate Universal Grammar and the experience provided by the linguistic input from the child’s surroundings. According to Weiß, standard languages (more generally, maybe: prestige varieties) do not qualify as data sources for the investigation of natural languages thus defined, due to some of their special properties: for instance, they exist(ed) only in written form and were subject to an immense amount of normative pressure during the course of their development; therefore, they are said to represent inconsistent systems. Taking the example of German, Weiß refers to the enormous influence of prescriptive grammarians in the 18th century as the main factor in the disappearance of multiple negation in the standard variety — in contrast to the dialects, where such negative concord still prevails. While it is true that the relevance of rationalist grammaticography of the time should not be ignored, one should not, on the other hand, overestimate it. The study quoted by Weiß actually shows that “the stigmatization of polynegation was late, occurred in stages, and also took place in a way that does not correlate with its disappearance from general language use” (Langer 2001: 171). In fact, the mid-eighteenth-century rationalist grammarian Johann Christoph Gottsched wrote on the subject: “Ich würde es [namely, arguing against double negation; HS] auch gewiß nicht thun, wenn es nicht schon von sich selbst abgekommen wäre.” (transl.: ‘I would surely not argue against double negation if it had not fallen into disuse already by itself.’) (1762, quoted in Langer 2001: 170). Incidentally, he had even stated in earlier editions of his grammar
"sim-r9"> "sim-r12"> "sim-r2"> "sim-r6">
210
Horst J. Simon
(1748–1757), i.e. well before the grammarians' influence became felt, that double negation had already been done away with (cf. Gottsched/Penzl 1980: 127).1 Concerning Weiß's second example, the lack of cliticization in Standard German due to its status as a written-only language (unlike the spoken-only dialects), one wonders why clitic pronouns are abundant in languages such as Italian or French (the former with a spectrum of varieties and a history of standardization comparable to that of German — the latter boasting the normative, literality-based institution par excellence, the Académie Française!). Apart from these factual details, I take issue with Weiß's main contention, which I consider to be based on a misunderstanding: it is true that until recently, the local dialect used to be the primary, often even the only, means of communication in many German speech communities. Thus, in a sense, Standard German used to be a 'language without native speakers'. However, there are strong regional differences in this respect, especially when one focuses on the 20th century (on the contemporary situation cf. the contributions to Stickel 1997; and also Barbour & Stevenson 1990). In many regions (particularly in the North, where the former dialect, Low German, has virtually disappeared), most speakers use some variant of the standard language from an early age onwards.2 — Now, taking the acquisitional perspective mentioned above, the historical reasons for the shape of present-day Standard German are simply irrelevant as long as one can find and draw on native speakers today.3 Synchronically, Standard German is just one variety among others in the multi-dimensional German space of varieties: a special one, but still a 'natural' one. Therefore, it can be studied through native speaker judgments just like any other variety, provided that the native competence of the persons under investigation is carefully checked. — Note, on the other hand, that there is some justification in Weiß's argumentation against the common practice of analyzing predominantly the standard varieties. But this is the realm of diachronic studies: it would, of course, be rather naïve to construe — as has sometimes been done in the past — elaborate diachronic scenarios which explain the change from Old and Middle High German multiple negation to 'single' negation in Contemporary German without discussing the dialectal facts. Weiß is quite right here: diachronically speaking — and only diachronically speaking — it is Standard German that is the deviant variety in need of special explanation. The true descendants of the older stages of the language are the modern dialects. This can also be seen in a wide range of other phenomena, e.g. the ubiquity of do-periphrasis (Fischer 2001) or the general retention of old vowel contrasts (cf. Wiesinger 1983).4 In sum, it is certainly justified to highlight the special status of standard varieties and to warn against using them too unreflectively in linguistic research. But then the interesting question is what lesson we should learn from Weiß's remarks. Here I think that he does not go far enough. His observations point to a
"sim-r7"> "sim-r3"> "sim-r14"> "sim-n*">
The Relevance of Variation
much deeper problem in mainstream generative linguistics than he seems to realize: the preoccupation with homogeneous and invariant linguistic systems, exemplified by Weiß's reference to the competing-grammars approach of Kroch (2001). In traditional generative grammar, linguistic variation can only be conceptualized as 'diglossia' — the existence of two (or more?) discrete grammars next to each other.5 While this idea of cognitive discreteness may be appropriate for the situation in, say, Arabic, Sinhala, or Swiss German, it is probably not appropriate for a description of the constant (and gradient!) variation along a continuum of varieties that can be observed elsewhere.6 Here, one could discuss a much more radical approach to variation and the problem of the standard than Weiß envisages. Why not try to develop a grammatical model which incorporates the very idea of language variability as one of its central tenets?7 For me, it is not clear from the outset that someone commanding the range of non-discrete varieties between basilectal Bavarian and (Southern) Standard German should be modeled along the same lines as a German-French bilingual (and how should the former be related to someone speaking the range between the Thuringian dialect and Standard German?). In fact, there are some models currently being worked out in very different frameworks which might eventually shed some light on this issue, such as that of Franceschini (1998) in the context of code-switching research, or the work of Bresnan and associates under the name of Stochastic Optimality Theory (cf. e.g. Bresnan & Deo 2001); for an insightful, more conservative discussion in the framework of Principles and Parameters Theory cf. Wilson & Henry (1998). Thus, in accordance with Weiß, I have reservations about the almost exclusive use of data from standard languages in generative linguistics. For me, the next step on the research agenda would now be to carefully collect dialect data at a level of detail that is useful for modern theoretical linguistics. Thereby we should chart the extent and type of variation found in the language which a native speaker has access to; we can reasonably assume that this capacity for variation somehow reflects his/her linguistic competence. Here, an enormous research program comes into view. It will not be easily accomplished; however, the goal of a more realistic model of a native speaker's knowledge should be worth the effort.
Notes

*My present work is supported by a Feodor Lynen research fellowship from the Alexander von Humboldt Foundation, whose sponsorship I hereby gratefully acknowledge.

1. Cf. in this context also the very considered paper by Glaser (2003, esp. pp. 58–60), in which she summarises the last thirty years of research on the evolution of Standard German and shows considerable scepticism concerning the notions employed by Weiß.
2. Unfortunately, next to nothing is known in the Germanicist literature about the extent of syntactic variation in what speakers from different regions of the German-speaking countries consider as 'the standard language' (cf., however, Ammon 1995 for some scattered phonological, morphological, and lexical data).

3. Naturally, one would not want to use grammaticality judgments from an 'interlanguage' speaker born in, say, the 1700s — even if one had any.

4. Rather surprisingly, however, Weiß also seems to make a statement in exactly the opposite direction when he claims that all variants of German are descendants of Standard German (p. 184), whatever that means…

5. This seems to be the heritage of the notion of an 'ideal native speaker', which may have had some methodological justification in the beginnings of the generative enterprise (and as such it was introduced in Chomsky 1965), but which has become reified over time.

6. For a succinct discussion of the German situation cf. Durrell (1998), who rejects the notion of diglossia in this context and prefers a comparison with post-creole continua instead.

7. Of course, this question touches on the problem of whether gradient variability is really a property of I-language and not only a superficial E-language reflection of underlying distinctness; but this problem can only be solved empirically if it is stated as such in the first place.
References

Ammon, Ulrich. 1995. Die deutsche Sprache in Deutschland, Österreich und der Schweiz. Das Problem der nationalen Varietäten. Berlin: de Gruyter.
Barbour, Stephen; and Stevenson, Patrick. 1990. Variation in German. A critical approach to German sociolinguistics. Cambridge: CUP.
Bresnan, Joan; and Deo, Ashwini. 2001. "Grammatical constraints on variation: 'Be' in the Survey of English Dialects and (stochastic) optimality theory". Ms. Stanford University. (http://www-lfg.stanford.edu/bresnan/be-final.pdf).
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Durrell, Martin. 1998. "Zum Problem des sprachlichen Kontinuums im Deutschen". Zeitschrift für Germanistische Linguistik 26: 17–30.
Fischer, Annette. 2001. "Diachronie und Synchronie von auxiliarem tun im Deutschen". In: Watts, Sheila; West, Jonathan; and Solms, Hans-Joachim (eds), Zur Verbmorphologie germanischer Sprachen, 137–154. Tübingen: Niemeyer.
Franceschini, Rita. 1998. "Code-switching and the notion of code in linguistics. Proposals for a dual focus model". In: Auer, Peter (ed.), Code-switching in conversation. Language, interaction and identity, 51–72. London: Routledge.
Glaser, Elvira. 2003. "Zu Entstehung und Charakter der neuhochdeutschen Schriftsprache: Theorie und Empirie". In: Berthele, Raphael; Christen, Helen; Germann, Sibylle; and Hove, Ingrid (eds), Die deutsche Schriftsprache und die Regionen. Entstehungsgeschichtliche Fragen in neuer Sicht, 57–78. Berlin: de Gruyter.
Gottsched, Johann Christoph. 1980. Ausgewählte Werke. Achter Band, dritter Teil. Deutsche Sprachkunst: Varianten und Kommentar. Bearbeitet von Herbert Penzl. Berlin: de Gruyter.
Kroch, Anthony S. 2001. "Syntactic change". In: Baltin, Mark; and Collins, Chris (eds), The handbook of contemporary syntactic theory, 699–729. Oxford: Blackwell.
"sim-r11"> "sim-r12"> "sim-r13">
The Relevance of Variation
Langer, Nils. 2001. Linguistic purism in action. How auxiliary tun was stigmatized in Early New High German. Berlin: de Gruyter.
Stickel, Gerhard (ed.). 1997. Varietäten des Deutschen. Regional- und Umgangssprachen. Berlin: de Gruyter.
Wiesinger, Peter. 1983. "Phonologische Vokalsysteme deutscher Dialekte. Ein synchronischer und diachronischer Überblick". In: Besch, Werner; Knoop, Ulrich; Putschke, Wolfgang; and Wiegand, Herbert Ernst (eds), Dialektologie. Ein Handbuch zur deutschen und allgemeinen Dialektforschung (vol. 2), 1042–1076. Berlin: de Gruyter.
Wilson, John; and Henry, Alison. 1998. "Parameter setting within a socially realistic linguistics". Language in Society 27: 1–21.
Author's response

Helmut Weiß
Universität Regensburg
Though Horst Simon (HS) agrees with the general spirit and the main aim of my paper, he has some reservations regarding factual details and the logic behind my argumentation.1 As for the factual details, he doubts, for example, that it was the mid-eighteenth-century rationalist grammarians' influence alone which swept away multiple negation. It is surely true that the decline of multiple negation began long before the 18th century, but Gottsched's statement that he was only following the general language usage of his contemporaries does not seem to correspond to reality, because multiple negation did still occur in those times (cf. Weiß 2004a) — and, as for the logic of the argumentation, one wonders why Gottsched and Adelung were forced to hoist the banner against multiple negation if it was no longer used by the writers of New High German (NHG). Though multiple negation ceased to be used before Gottsched and Adelung (probably due to Latin influence), prescriptivism appears to have been necessary for its final extinction — and this is in accordance with the story I told. As for the logic of my argumentation, HS denies the relevance of the point made against standard languages for synchronic studies "as long as one can find and draw on native speakers [of Standard German] today" (p. 210). I fully agree with this. However, I doubt whether this is feasible and, more importantly, whether it is possible at all. The crucial point for me is that it is questionable whether adult native speakers of NHG exist in the way they exist (or existed) for dialects, because there are prescriptive rules for NHG and institutions to teach them (not necessarily intentionally, as the case of the public media shows). In Section 6 of my paper, I mention two properties of child German (i.e. the occurrence of multiple negation and the absence of the morphological genitive) which adult German does not possess. So it is not at all clear whether adult NHG native speakers of the kind required for UG studies really exist, and if not, my point made against standard languages is relevant for synchronic research as well. However, even if such native speakers actually exist (as I would expect), my caveat concerning the use of standard language data loses nothing of its relevance, as the numerous examples
discussed in my paper clearly show — it is not a fight against windmills. In this sense, I cannot but fully agree with HS's final demand to use and to carefully collect dialectal data — that is what I have done, at least when investigating linguistic phenomena (cf. the references to my work in the main article, e.g. Weiß 1998, 1999, 2000b; and Weiß 2004b, 2005b for a theoretical justification of this practice). And HS is to be strongly encouraged to realize his research program, with its special interest in the variational aspects of linguistic competence.
Note

1. HS wonders why I claim all variants of German to be descendants of Standard German, and he is right in doing so. What I had in mind (and actually wrote in an earlier version of the paper) was that even all of the more informal, colloquial, but non-dialectal variants of NHG derived from the N2 language Standard German.
Universals, innateness and explanation in second language acquisition*

Fred R. Eckman
University of Wisconsin — Milwaukee
This paper considers the question of explanation in second language acquisition within the context of two approaches to universals: Universal Grammar and language typology. After briefly discussing the logic of explaining facts by including them under general laws (Hempel & Oppenheim 1948), the paper makes a case for the typological approach to explanation being the more fruitful, in that it allows more readily for the possibility of 'explanatory ascent', the ability to propose more general, higher-order explanations by having lower-level generalizations follow from more general principles. The UG approach, on the other hand, is less capable of such explanatory ascent because of the postulation that the innate, domain-specific principles of UG are not reducible in any interesting way to higher-order principles of cognition (Chomsky 1982).
1. Introduction
Over the last twenty years or so, one of the major research currents in second language acquisition (SLA) theory has been the use of universal principles to explain facts about second language (L2) acquisition. The rationale behind this program has been to relate L2 grammars (i.e. interlanguages) to primary language grammars by subsuming the former under the same generalizations or laws that govern the latter.1 Within this mainstream of research, one can identify two strands which correspond precisely to schools of thought concerning universals: the theory of Universal Grammar (UG), which postulates innate, domain-specific principles motivated largely on grounds of learnability (White 1989, 1996, 2000), and the typological approach (Eckman 1984, 1991, 1996; Hyltenstam 1984), which invokes markedness principles based on implicational generalizations about the world's languages. The purpose of this paper is to argue that the research program for SLA that invokes markedness generalizations as explanatory principles is, on important grounds, preferable to that represented by an approach to SLA which invokes
"eck-r26"> "eck-r1"> "eck-r8">
218
Fred R. Eckman
innate, domain-specific principles. The point of departure for most of the ensuing discussion will be a number of criticisms of the typological approach to SLA that have appeared in the literature over the years. It will be argued that the basis for these critiques, which have for the most part taken one of two positions, represents a fundamental misunderstanding about the nature of explanation. Those criticisms that have been couched within the framework of UG have argued, in one form or another, that typological universals represent empirical generalizations about the world’s languages, and it is therefore not clear what the connection is to the mind of the L2 learner (White 1987). The criticisms that have been made by researchers who are not necessarily proponents of UG have generally taken two avenues of approach. On the one hand, it has been argued that typological generalizations, or equivalently, the ensuing markedness principles that are derivable from these generalizations, are not explanatory because they are descriptions, or facts, that are themselves in need of explanation (Archibald 1998). On the other hand, the critiques have claimed that explaining the nature of L2, or interlanguage (IL), grammars using markedness principles “only pushes the problem of explanation back one step” rather than actually providing an explanation (Gass & Selinker 2001: 154). Each of these criticisms misses a very important point about scientific explanations, namely, that there are levels of explanations which correspond to the generality of the laws invoked. To question a markedness-based hypothesis as an explanation for some fact about SLA because there is no established link to the mind of the learner is to dispute a hypothesis because there is at the time no higher level explanation that can be invoked. To debate whether a generalization is a description or an explanation is to debate the level of explanation, not whether an explanation has been given. And to reject a hypothesis because it pushes the problem of explanation back one step misses the point that all hypotheses push the problem of explanation back one step — indeed, this is necessary if we are to proceed to higher level explanations. Within this context, the goal of this paper will be pursued by arguing for two conclusions. First, the paper will maintain that none of the above criticisms of markedness-based explanations in SLA is compelling, that studies invoking typological markedness as an explanatory principle fit the logical structure of a scientific explanation, and also present a reasonable account for the facts they address. And second, the paper will argue that the research program for SLA that invokes markedness generalizations as explanatory principles has the potential to be more fruitful than the approach to SLA which invokes innate, domain-specific principles. The remainder of this paper is structured as follows. The background section gives a brief outline of the nature and logic of a scientific explanation as put forth in the classic work of Hempel & Oppenheim (1948), taking into account also some
of the criticisms and refinements that have been put forth over the years. This discussion will be brought to bear on the issues surrounding the level of an explanation. The following section characterizes the approach to SLA that attempts to subsume IL grammars under UG, and argues that the stipulation that principles of UG are both innate and domain-specific seems to discourage the kind of questioning that can lead to higher levels of explanation. The next section describes the approach to SLA theory that uses typological universals as explanatory principles, and compares this approach to L2 with the alternative that invokes UG. After considering some of the implications of both research programs within the context of the discussion surrounding explanations, it is argued not only that the above criticisms of and counterclaims to the typological approach are not compelling, but also that there are important grounds on which to prefer this approach over the UG school of thought. The final section concludes the paper.
2. Background

How do people, in general, and scientists, in particular, explain phenomena? The simple, straightforward response to this question is that we attempt to explain facts that we do not understand by relating them to phenomena that we believe we do understand. Stated more formally within a scientific context, we can say that scientists explain facts about the world by subsuming them under general laws. The fact to be explained is shown to be a specific instance of a more general phenomenon (Hempel & Oppenheim 1948). To take a concrete example, how do scientists explain the fact that immersing a mercury thermometer in a vat of boiling water results in the mercury column initially falling and then rapidly rising? Or if we consider a linguistics example, how do phonologists explain the fact that the English word 'planting' is syllabified as in (1a), and not as in (1b) or (1c)?

(1) a. [plæn.tɪŋ]
    b. *[plænt.ɪŋ]
    c. *[plæ.ntɪŋ]
In the first example, scientists make reference to the general law of thermic expansion, and to the thermal properties of glass and mercury. The fact that the level of mercury at first drops is due to the relatively small thermal conductivity of glass. The glass tube which contains the mercury initially expands, causing the level of mercury to fall. After a short time, the glass conducts the heat of the boiling water to the mercury, which then expands upon being heated, causing the level of the mercury column to rise. Thus, the behavior of the mercury thermometer is shown to be a particular case of a more general phenomenon, namely, the behavior
of materials under thermic expansion. The explanation for the syllabification of 'planting' follows the same general pattern, but, of course, uses laws that refer to syllables and sound segments. Specifically, phonologists explain the syllabification in question by appealing to a universal principle known as the Sonority Sequencing Generalization (SSG), which states that the sonority profile of a syllable must rise until it peaks, and then it must fall. Vowels are the most sonorous sounds, obstruents are the least sonorous, and the various sonorant consonants fall in between the two. Given the relative sonority of the segments in the two syllables in question in (1a), it can be shown that the structure of these syllables follows the SSG, whereas those in (1b) and (1c) do not. In both (1b) and (1c), the problem is the second syllable, which in (1b) lacks an onset and therefore does not rise in sonority towards the vowel, and which in (1c) begins with a nasal followed by an obstruent. Because the [n] has a higher sonority value than the following [t], the second syllable violates the SSG.2
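The SSG check just described is mechanical, and it may help to see it stated as such. The following sketch is our own illustration, not part of the original discussion; the sonority values are an arbitrary (but conventionally ordered) scale, and the segment labels are ASCII stand-ins for the IPA symbols in (1).

# Minimal sketch of a Sonority Sequencing Generalization (SSG) check.
# Illustrative sonority scale: vowels > liquids > nasals > obstruents.
SONORITY = {
    "ae": 5, "I": 5,    # vowels (stand-ins for [ae] and the lax high vowel)
    "l": 3, "r": 3,     # liquids
    "n": 2, "ng": 2,    # nasals ("ng" stands in for the velar nasal)
    "p": 1, "t": 1,     # obstruents
}

def obeys_ssg(syllable):
    """True if sonority rises to a single peak and then falls."""
    values = [SONORITY[seg] for seg in syllable]
    peak = values.index(max(values))
    rising = all(values[i] < values[i + 1] for i in range(peak))
    falling = all(values[i] > values[i + 1] for i in range(peak, len(values) - 1))
    return rising and falling

# (1a): both syllables rise to the vowel and then fall.
print(obeys_ssg(["p", "l", "ae", "n"]), obeys_ssg(["t", "I", "ng"]))   # True True
# (1c): the second syllable falls (n > t) before rising to its vowel.
print(obeys_ssg(["n", "t", "I", "ng"]))                                # False

The onset requirement that rules out (1b) is a separate condition and is not encoded in this sketch.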
The facts in the above examples are explained, then, by showing that they occurred in accordance with general laws. Now, the question of 'why' can also be raised with respect to the general laws themselves (Hempel & Oppenheim 1948). These principles, in other words, can come to be regarded as facts to be explained. With respect to the examples at hand, it is therefore possible to ask why materials expand when they are heated, and why the universal profile of syllables in natural languages follows the pattern specified in the Sonority Sequencing Generalization. These regularities would be explained if one could subsume them under generalizations which are more comprehensive, that is, if it were possible to deduce them from some more-encompassing laws or principles. Given this background, it is important to recognize the following point: any proposed explanation of some phenomenon always engenders additional questions, because the generalizations serving as explanatory principles can also become a fact to be explained. Since any law or generalization will invariably be stated in terms of some construct, and will necessarily assert the truth of some state of affairs about those constructs, it will always be possible to raise questions about the constructs that the law or generalization postulates, or to ask why the specified state of affairs should exist. Any explanation, therefore, is always adequate only to the extent of the current state of knowledge and understanding of the phenomenon under investigation. It follows from this that there are levels of explanation, where 'level' can be defined as the relative generality of the laws used in the explanation (Sanders 1974). In the context of the examples presented so far, any generalization from which it would be possible to deduce the law of thermic expansion, or from which we could derive the principle of sonority sequencing, would constitute a higher-level explanation for those generalizations. It follows further that all empirical generalizations are, at the same time, a means for explaining lower-level generalizations, and the object of explanation for higher-level generalizations (Sanders 1974). A concrete linguistic example of this state of affairs is shown in (2), and is based on the discussion in Sanders (1974: 5).

(2) Q1 Why does the word old in the English phrase old men precede the word men?
    A1 Because old is an adjective and men is a noun, and in English, adjectives precede the nouns they modify.
    Q2 Why in English do adjectives precede the nouns they modify?
    A2 Because in English, phrases are constructed according to the schema
           X″ → Specifier X′
           X′ → X Complement
       and adjectives are specifiers of noun phrases.
    Q3 Why is it that in English, Specifiers precede heads?
    A3 Because …
    Q4 Why …?
The series of questions and answers in (2) represents an ‘explanatory ascent’ in that the level of explanation becomes increasingly higher because the generalizations invoked as explanations become increasingly more general. In A1, the relative ordering of the words old and men is subsumed under the rule (law) of English that adjectives precede the nouns they modify. This generalization makes the prediction that all words classified as adjectives in English should precede the words they modify that are classified as nouns, whenever those two classes of words occur together in the same phrase. A1, of course, does not explain why adjectives precede the nouns they modify; it offers a relatively low-level explanation, one which many linguists may not consider at all satisfying. As a consequence, those linguists may refer to A1 as a description of the facts rather than as an explanation. And based on the above discussion, these linguists would be partly correct and partly incorrect. They would be right in saying that A1 is a description of the facts in the sense that lower-level generalizations become facts for higher-level generalizations to explain. But these linguists would be incorrect in asserting that A1 is not an explanation; A1 is a generalization under which the ordering of any English attributive adjective and its noun head can be subsumed, and is therefore an answer to the question of why old precedes men in the utterance old men. It would be worthwhile at this point to make two additional observations about this example, as they will become relevant in the discussion below concerning the explanatory value of universal generalizations in second language acquisition. The first is that a linguist who adopts A2 in (2) above as an explanation is clearly justified in referring to A1 as a fact, and not as an explanation. A2 is clearly a higher level of explanation than A1 because A1 has been subsumed under the more
general law A2. Indeed, the very goal of explanatory ascent is to gain a deeper understanding of the phenomenon at hand by continuing to raise questions that will lead to invoking more general laws.3 According to A2, not only should adjectives in English precede the nouns they modify, but in addition all instances of specifiers in English are predicted to precede their heads.4 Thus, the facts explained in A1 can be shown to be a specific instance of a more general phenomenon, namely, head-dependent ordering in English. However, it is sound scientific reasoning to reject an explanation such as A1 only if one can then invoke a higher-level generalization such as A2. In the absence of a more general principle, it is scientifically imprudent to reject A1 as merely a description, because in so doing one would be left with no explanation at all. The second observation that we should make here is this: the most fruitful research program is one in which investigators can continue, in principle, to raise questions that would invoke a higher-level generalization as an explanation. The best research program would, in other words, be one which allows for the possibility of explanatory ascent by making it possible to raise further questions. By the same token, a research program in which questions about the explanatory principles and laws seem to be cut off would not be as fruitful a research program. One way in which a framework could discourage the possibility of explanatory ascent is by postulating that its explanatory laws are innate and domain-specific. Under such a research program, it would be more difficult to invoke higher-order generalizations from which the innate, domain-specific principles follow. This is also an important point which will become relevant in the discussion below about using universal principles to explain SLA. To summarize to this point, explanations involve subsuming phenomena under general laws from which the facts in question follow automatically. The laws or generalizations which serve as explanatory principles may themselves be the object of explanation by being subsumed under higher-order generalizations. One of the virtues of the Hempel & Oppenheim (1948) model of a scientific explanation, also referred to as the 'covering law' or deductive-nomological (D-N) approach, is that it captures an important intuitive feature of explanation: one feels that one understands why something happened if, under the conditions at hand, it had to happen (Newton-Smith 2000). However, there have been over the years a number of objections to the D-N model. The first problem is that explanation is asymmetric: if A explains B, then B cannot explain A. However, there are numerous examples where the law involved is neutral as to what explains what. For example, the pendulum law states that the period of a pendulum (P) is equal to 2π times the square root of the pendulum's length (L) divided by the acceleration due to gravity, g. This law can be represented by the formula in (3).
"eck-r9"> "eck-r25"> "eck-r21">
Universals, innateness and explanation in second language acquisition
(3) P = 2π√(L/g)
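For concreteness, here is a worked instance of (3); the numerical values are our own illustration, not from the original text. The same covering law can be solved in either direction, which is exactly the symmetry discussed in the next paragraph:

P = 2\pi\sqrt{L/g}: \quad L = 1\,\mathrm{m},\; g \approx 9.8\,\mathrm{m/s^2} \;\Rightarrow\; P = 2\pi\sqrt{1/9.8} \approx 2.0\,\mathrm{s}

L = g\left(\frac{P}{2\pi}\right)^{2}: \quad P = 2.0\,\mathrm{s} \;\Rightarrow\; L = 9.8 \times (2.0/2\pi)^{2} \approx 1.0\,\mathrm{m}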
The idea behind the pendulum law is that the length of the pendulum can explain its period: solving the equation in (3) for P yields the period of any given pendulum as a function of its length. But the problem with the D-N model of explanation is that, if we are given the period of the pendulum, we can solve the above equation for L, the length of the pendulum, thereby 'explaining' the pendulum's length in terms of its period. Although no scientist would claim that the period of a pendulum explains its length, the covering law represented in (3) is neutral in this respect. Another objection to the D-N model was the question of relevance (Glymour 1975): the covering laws had to be relevant to the facts to be explained. Thus, for example, one could use the D-N model to explain why a rod partially immersed in water appears to bend by invoking the refractive properties of water on light along with the statement that the rod is partially immersed in water. But if a priest had also happened to bless the water in question, one could explain the facts using the refractive properties of water on light accompanied by the statement that the rod is partially immersed in water blessed by a priest (Newton-Smith 2000). One way to address this problem is to claim that the laws in question have to be causally relevant (Scriven 1975) to the facts to be explained. In the above example, the blessing of the water by a priest is not relevant to the bent appearance of the partially immersed rod. This proposal, in turn, raised the problem of how to specify whether or not something was causal or relevant (Newton-Smith 2000). What this discussion reduces to for our purposes is this: although there are problems with and objections to the D-N model of explanation that have yet to be fully addressed (Newton-Smith 2000), scientists and philosophers seem to agree that explanations are characterized by the process of unification, whereby seemingly disparate phenomena are included under unifying principles of increasing generality. In other words, we increase our understanding of the world as we decrease the number of independently acceptable hypotheses (Friedman 1974). Armed with these conclusions, we now turn to the question of the extent to which the two approaches to universals can offer explanations for facts about SLA. We begin with a general characterization of the UG approach.
3. Universal Grammar and SLA

Universal Grammar (UG) is a set of abstract and general principles, which is hypothesized to be part of a human being's innate language faculty, and which makes available to the child-learner possible grammars that are consistent with the input to which the learner is exposed (Wunderlich, this volume). The motivation for an innate, domain-specific UG comes from the fact that no one has been able
to propose a learning theory that is strong enough to account for the acquisition of some of the principles that are postulated to underlie natural-language grammars. The need for some innate cognitive device such as UG is thus justified on the basis of the poverty of the stimulus: the product of language acquisition, the end-state grammar, is claimed to be underdetermined by the available data. The adult native speaker of a language has acquired a grammar which underlies quite rich, subtle and complex knowledge, and this level of knowledge could not have been attained without some innate cognitive mechanism that enables the learner to bridge the gap between the input and the end-state grammar. Postulating UG as this innate mechanism provides a solution to the poverty-of-the-stimulus problem, accounts for the relative speed and uniformity with which children acquire their native language, and explains the systematic variation found among the world's languages. The motivation for the hypothesis that, in addition to being innate, the constructs of UG are domain-specific stems from the claim that principles of UG are not necessary for the explanation of other kinds of knowledge that human beings acquire, nor are such principles derivable from any higher-order principles of cognition, as Chomsky (1982: 19) observes.

(4) …what we are discovering is very specific principles which do not seem to have any obvious applications in other cognitive domains.
Along these lines, Newmeyer (this volume: 62) argues that the principle of Subjacency must be part of UG because it cannot be synchronically derived from mechanisms of parsing, or from any other known principles of processing or cognition. From the preceding discussion, it should be clear that UG is motivated on the basis of facts about first language acquisition; nothing necessarily follows from this claim about SLA. A prima facie case could be made, in other words, that given the various differences between first and second language acquisition, UG may be necessary to account for first language acquisition, but is not motivated for the explanation of second language acquisition. There is, however, a relatively large body of literature that makes the hypothesis that UG also governs SLA (e.g. Schwartz & Sprouse 1996, 2000; White 1989, 1996, 2000, to cite just two recent examples). One point of clarification may be necessary here. The postulation that UG governs second language acquisition does not imply that second language acquirers should necessarily end up with the same final-state competence that first language learners have. Rather, the implication is that IL grammars will not violate UG constraints. IL grammars and L1 grammars will, according to this hypothesis, both be organized using the same UG principles. Accordingly, the differences between L1 and L2 acquisition that result in the accented and often error-ridden speech of the L2 learner must stem from other factors that are different between L1 and L2
acquisition, and these differences are ex hypothesi not due to the unavailability of UG in SLA. Support for the position that UG governs IL grammars can be derived from syntactic analyses of L2 utterances showing that principles of UG are necessary to explain the properties of these utterances. As an illustration we consider a classic study by Bley-Vroman et al. (1988). The point of this study was to test whether the principle of Subjacency was motivated for the IL grammars of L2 learners of English whose native language (NL) grammar did not include wh-movement, because wh-words in the NL occur in situ. One of the grammatical contrasts that is accounted for by Subjacency, and one which was involved in the Bley-Vroman et al. study, is shown in (5).5 (5) a.
1. 2. b. 1. 2.
Mary believes that Bill said that John met someone last night. Who does Mary believe that Bill said that John met last night? Mary heard the news that John met someone last night. *Who did Mary hear the news that John met last night?
The ungrammaticality of (5b2) stems from a violation of Subjacency, viz., the whword in this sentence has been extracted from a complex noun phrase. The rationale behind the Bley-Vroman et al. (1988) study was to see whether Korean-speaking learners of English, whose NL grammar does not evidence whmovement, could distinguish the grammaticality of (5a2) relative to (5b2). The authors’ protocol used a questionnaire that required the research subjects to evaluate a set of 32 test sentences (15 grammatical, 17 ungrammatical) on a tripartite scale (possible — impossible — not sure). The responses of each subject were analyzed first to see whether their IL grammar contained wh-movement, and if so, then to determine whether the subjects could detect Subjacency violations. Any systematic ability on the part of the Korean speakers to distinguish Subjacency violations would be interpreted as the IL grammar of these L2 learners adhering to the constraints of UG. The learners’ knowledge of Subjacency could not stem from their NL, as the grammar of Korean does not have wh-movement, nor could it have come from exposure to the target language (TL), because, on the one hand, Subjacency is ex hypothesi not learnable from the input, otherwise it would not be posited as part of UG; and on the other hand, the facts surrounding Subjacency are presumably not taught in English as a second language programs. It is difficult to impose a clear interpretation on the results of the Bley-Vroman et al. (1988) study for several reasons.6 However, the clarity of the results notwithstanding, the goal of this approach to SLA is to explain L2 knowledge in terms of IL grammars organized around UG principles. Put somewhat differently, this school of thought attempts to include IL grammars under the same covering laws as L1 grammars, in the particular case at hand, under the principle of Subjacency.
225
The point of this discussion is that the innate, domain-specific nature of the principles of UG is not conducive to seeking a higher level of linguistic explanation. Consider, as a case in point, the following representation of the explanation offered for the SLA facts proposed in Bley-Vroman et al. (1988).

(6) Q1 Why are at least some of the L2 learners in question able to make the appropriate TL distinction between (5a2) and (5b2)?
    A1 Because the IL grammar of at least some of the L2 learners obeys Subjacency.
    Q2 Why do the IL grammars of at least some of the L2 learners obey Subjacency?
    A2 Because Subjacency is part of UG, and IL grammars are governed by UG.
    Q3 Why is Subjacency part of UG?
    A3 Because Subjacency is one of the innate principles of UG.
The next step in this explanatory ascent takes one into the domain of evolution, because principles of UG are, by definition, domain-specific, and therefore it is not possible to derive them from any higher-order principles of cognition. Unless one is prepared to incorporate evolutionary principles into one's linguistic research program, the questioning process that would lead to explanatory ascent, and a deeper understanding of Subjacency, is seemingly cut off.7 Thus, the difficulty with explanatory principles that are innate and domain-specific, at least as I see it, is practical rather than principled: it is not impossible, in principle, to continue the questioning in (6); rather, it is very difficult for most linguists. More will be said about this below. Let us now proceed to compare the situation in (6) with the kind of explanations given within the typological approach to SLA theory.
4. Typological universals and markedness in SLA

4.1 The typological approach to universals

Under the typological approach to stating universals, the linguist attempts to formulate universal generalizations on the basis of observations from a number of genetically unrelated and geographically non-adjacent languages. The goal of this endeavor is to state generalizations about the occurrence, co-occurrence, or absence of linguistic expressions across the world's languages, and to provide explanations for these generalizations. The universals are often stated as unidirectional implications asserting that the presence of a given structure in a language implies the presence of some other structure, but not vice versa. Several types of explanations can be given for these universals, ranging from innate knowledge, to
"eck-r12"> "eck-r11"> "eck-r10">
Universals, innateness and explanation in second language acquisition
semantics, to processing constraints, to pragmatic considerations, to language function (Hawkins 1988). The unidirectional implicational nature of the generalizations leads naturally to the notion of typological markedness, which can be stated formally as in (7).

(7) A structure X is typologically marked relative to another structure, Y, (and Y is typologically unmarked relative to X) if every language that has X also has Y, but every language that has Y does not necessarily have X. (Gundel et al. 1986: 108)
The construct of typological markedness as a language universal was developed in the work of Greenberg (1976), where it is argued that often the most insightful statements about human languages can be made only in terms of implicational statements; that is, the most enlightening universals are formulated in terms of markedness. A concrete example of a markedness generalization is given in (8), which derives from the universal generalization in (9).

(8) A verbal morpheme signifying dual is marked relative to a verbal morpheme signifying plural, and a verbal plural morpheme is unmarked relative to a verbal dual morpheme.

(9) Any language that has a verbal dual morpheme also has a verbal plural morpheme, but not every language that has a verbal plural morpheme also has a verbal dual morpheme.8
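Definition (7) is, in effect, a decision procedure over a language sample, and the dual/plural case in (8) and (9) makes this concrete. The following sketch is our own illustration; the mini-sample of languages is hypothetical.

# Sketch of the implicational test behind (7): X is typologically marked
# relative to Y if every sampled language with X also has Y, while at least
# one language has Y without X (so the implication is unidirectional).
SAMPLE = {                      # hypothetical mini-sample, for illustration only
    "lang_A": {"plural", "dual"},
    "lang_B": {"plural"},
    "lang_C": {"plural"},
    "lang_D": set(),
}

def marked_relative_to(x, y, sample):
    x_implies_y = all(y in structs for structs in sample.values() if x in structs)
    y_without_x = any(y in structs and x not in structs for structs in sample.values())
    return x_implies_y and y_without_x

print(marked_relative_to("dual", "plural", SAMPLE))   # True: dual is marked relative to plural
print(marked_relative_to("plural", "dual", SAMPLE))   # False: the reverse implication fails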
In addition to markedness holding between binary oppositions as in (8) and (9) above, this relation has also been shown to obtain between any number of linguistic representations. Keenan & Comrie (1977) have demonstrated that the variation in relative clause types across the world's languages can be characterized in terms of a hierarchy known as the Accessibility Hierarchy (AH), shown in (10).

(10) Accessibility Hierarchy (AH) (Keenan & Comrie 1977)
     Su > DO > IO > OBL > Gen > Ocomp
The symbol ‘>’ means ‘is more accessible than’, and Su, DO, IO, etc. refer, respectively, to the grammatical functions subject, direct object, indirect object, oblique, genitive and object of a comparative. The AH represents a markedness hierarchy because there exists an implicational relationship among the positions on the hierarchy, such that any language having a relative clause type represented by a given grammatical position X on the AH necessarily has relative clauses based on all positions to the left of X, but not necessarily on positions to the right of X. For example, any language that has relative clauses where the IO noun phrase is relativized also has relative clauses in which the DO and Su noun phrases have been relativized. Keenan & Comrie’s intuition behind the AH was that the positions on the hierarchy represent the degree of difficulty in forming relative clauses, with easier
positions being to the left of the hierarchy, and more difficult positions being to the right. The AH characterizes the fact that not all languages can form all kinds of relative clauses. Some languages can form relative clauses by relativizing only the subject, and no other position (e.g. Malagasy, Toba Batak); other languages can form relative clauses by relativizing all six positions on the AH (e.g. English); and still other languages can relativize more positions than just the subject, but they cannot relativize all of the positions (e.g. Greek, Kinyarwanda, Persian). Additional investigation on relative clauses since the Keenan & Comrie study has taken into account a wider set of languages, and has attempted to address some of the recalcitrant cases by using a broader classification of relative clause types, both in terms of the strategies for forming such clauses, and also in terms of whether a verb or a noun is the basis for the relative clause (Lehmann 1986). The hierarchy that seems to have distilled out of this work is shown in (11).

(11) Accessibility Hierarchy (revised) (Lehmann 1986; Croft 1990)
     Su/absolutive > DO/ergative > IO > Oblique
Within the context of the discussion about explanations in the Background section above, we can say that the AH is proposed as an explanatory generalization, a law, to account for the kinds of relative clauses that languages can have. In answer to the question of why English has the kinds of relative clauses that it does, the answer based on the AH is that English relative clauses represent one of the possible constellations of relative clause types allowed by the AH. More specifically, because English has relative clauses in which the oblique position can be relativized (e.g. the ladder that I climbed on), English necessarily also has the other three kinds of relative clauses shown on the AH in (11). The AH can also be used to explain why apparently no languages allow only the types of relative clauses shown in (12).

(12) a. Su, DO, Oblique
     b. Su, Oblique
By virtue of the implicational relationship among the positions on the AH, the prediction is that it will never be the case that a language allows two non-adjacent positions on the AH to be relativized without allowing the position or positions that intervene between the non-adjacent positions also to be relativized. In other words, the AH does not allow positions on the hierarchy to be ‘skipped’. To summarize this subsection, typological markedness is an asymmetric, irreflexive and transitive relation that is inferred to hold between certain linguistic constructions on the basis of the distribution of those structures across the world’s languages. We now turn to the use of such markedness relations to explain facts about SLA.
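The 'no skipping' prediction just described can itself be checked mechanically: a language's relativizable positions must form a contiguous initial segment of the AH. The following sketch is our own illustration; the sample position sets echo the cases discussed above.

# Sketch of the AH's 'no skipping' prediction: the relativizable positions of a
# language must be a contiguous prefix of the hierarchy in (10).
AH = ["Su", "DO", "IO", "OBL", "Gen", "Ocomp"]

def conforms_to_ah(relativizable):
    flags = [position in relativizable for position in AH]
    # Once relativization stops being possible, it must not resume further down.
    return all(flags[i] or not flags[i + 1] for i in range(len(flags) - 1))

print(conforms_to_ah({"Su"}))                                      # True (Malagasy-type)
print(conforms_to_ah({"Su", "DO", "IO", "OBL", "Gen", "Ocomp"}))   # True (English-type)
print(conforms_to_ah({"Su", "DO", "OBL"}))                         # False: IO skipped, as in (12a)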
"eck-r6"> "eck-r8">
Universals, innateness and explanation in second language acquisition
4.2 Typological universals and SLA

Typological markedness has been invoked to explain a number of different facts about L2 acquisition, including learning difficulty, as proposed in Eckman (1977), transferability, as discussed in Gass (1979), and order of acquisition, as argued by Hyltenstam (1984). Undoubtedly, the construction type that has drawn the most interest in studies employing typological markedness is relative clauses. This sentence type constitutes an interesting domain for studies in L2 acquisition because, as was shown above, not only do languages differ widely with respect to the kinds of relative clauses they have, but in addition the cross-linguistic differences can be characterized by a markedness hierarchy. In the remainder of this section, we outline how the AH was used to explain certain facts about SLA in two important studies on relative clauses, one by Gass (1979) and the other by Hyltenstam (1984).9 The construction type that was the focal point of these studies on L2 relative clauses was resumptive pronouns. Resumptive pronouns can be thought of informally as pronunciations of the various traces that result from the movement of the wh-word to the beginning of its clause. Standard English does not allow resumptive pronouns in relative clauses, but if it did, the examples in (13) would be an illustration. The sentences in (13) represent, respectively, examples of Su, DO, IO and Oblique relative clauses; the relative clause itself is italicized and the resumptive pronoun is underlined.

(13) a. There is the woman who she is my sister.
     b. There is the woman who(m) I registered her.
     c. There is the woman to whom I sent her an application.
     d. There is the woman whom I read about her in the newspaper.
In their 1977 study, Keenan & Comrie pointed out that a number of languages use resumptive pronouns as part of a strategy for forming relative clauses, and that the AH characterizes how languages differ in the use of this strategy. The AH predicts that if a language forms relative clauses using a resumptive pronoun strategy for some position on the AH, then that language also uses this same strategy when relativizing all lower positions on the AH, but not necessarily when relativizing higher positions. Accordingly, the occurrence of resumptive pronouns in relative clauses can also be characterized in terms of markedness. The Gass and Hyltenstam studies investigated the occurrence of resumptive pronouns in relative clauses of L2 learners, and both dealt with TLs in which relative clauses do not normally contain resumptive pronouns. Both studies used the AH to explain why their subjects produced fewer resumptive pronoun errors on relative clauses involving NPs in positions higher on the AH compared to those
lower on the hierarchy. One of the important aspects of the error patterns produced by subjects in both the Gass and the Hyltenstam studies is that the L2 learners produced TL relative clauses containing resumptive pronouns, even though such pronouns were not allowed in relative clauses in either of the TLs in question, and were absent in many of the learners' NLs. In other words, many of the L2 learners in these studies produced relative clauses containing resumptive pronouns where these errors could not be explained in terms of either NL transfer or TL input, because neither the NL nor the TL allowed resumptive pronouns in relative clauses.10 Yet the error patterns in general adhered to the markedness relations defined by the AH. Though both Gass and Hyltenstam drew their own conclusions as to the explanation for the error pattern of their subjects, it is nevertheless possible to subsume their results under the view that second language learning is similar to primary language acquisition in the important respect that both types of acquisition result in grammars that adhere to the same set of markedness constraints. This claim is embodied in the Structural Conformity Hypothesis (Eckman et al. 1989: 195), stated in (14).

(14) Structural Conformity Hypothesis (SCH)
     All universals that are true for primary languages will be true for interlanguages.
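The AH-based prediction at work in these error patterns can be operationalized in the same way as the 'no skipping' check sketched above, now applied to interlanguage data: resumptive pronouns, where they occur, should occupy a contiguous low stretch of the hierarchy. A minimal sketch, with invented data for illustration:

# Sketch: resumptive-pronoun use in an interlanguage should form a contiguous
# final segment of the AH, the mirror image of the prefix property above.
AH = ["Su", "DO", "IO", "OBL"]   # the revised hierarchy of (11), simplified

def resumptives_conform(resumptive_positions):
    flags = [position in resumptive_positions for position in AH]
    # Once resumptive pronouns appear, they must persist down to the bottom.
    return all(flags[i + 1] or not flags[i] for i in range(len(flags) - 1))

print(resumptives_conform({"OBL"}))         # True: errors only at the hardest position
print(resumptives_conform({"IO", "OBL"}))   # True
print(resumptives_conform({"DO"}))          # False: DO resumptives without IO/OBL ones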
It seems clear that the point of the studies on relative clauses by Gass and by Hyltenstam was to invoke typological generalizations as laws under which facts about L2 acquisition could be subsumed. The claim is, in effect, that IL grammars are the way they are because interlanguages are in fact languages. IL grammars obey the same universal generalizations as primary languages because ILs constitute a specific instance of a more general phenomenon, namely, an instance of a natural, human language. Viewed in this way, the goal of the SCH is very much in keeping with that of the UG approach to SLA outlined above, with the important difference being that the typological approach is not appealing to domain-specific principles of UG, but allows the possibility that various typological universals can be derived from principles of human cognition or processing (Hawkins 1988).11 Now, as has been pointed out several times already, it is possible, even necessary, for someone to raise further questions about the principles that have been invoked as explanations, asking, for instance with respect to the above examples, why the Su position is easier to relativize than the DO position, why English allows relativization of the Oblique position, why Arabic does not allow relativization of the IO position without a resumptive pronoun, and why languages do not ‘skip’ positions on the AH in their relative clause patterns. Such questioning is natural, is applicable to any scientific generalization, and is even necessary if we are to increase our understanding of interlanguages in general, and of relative clauses in
"eck-r1"> "eck-r19">
Universals, innateness and explanation in second language acquisition
particular. And the fact that such questions can be raised does not diminish the value of the AH or the SCH as explanations; on the contrary, it is a virtue of a research program and the framework it employs to allow for such questioning to take place. This point seems to have been missed, however, judging by some of the criticisms and counterclaims that have been made in the SLA literature, to which we now turn. We begin with that of Archibald (1998). Archibald's view seems to be that typological markedness generalizations, rather than being explanatory, are themselves in need of explanation, a view which he makes explicit in the following.

(15) My general assessment of this sort of typological universals approach to second language acquisition is that it provides an interesting description of the phenomena to be explained. I'm less sure of their [sic] status as an explanation of the observed facts. All in all, I prefer to assume some sort of structural explanation … (p. 150; emphasis added)
In making the above statement, Archibald seems not to have recognized the point about levels of explanation: that generalizations can be used to explain phenomena by subsuming the facts in question under a covering law, such as a typological universal, and that such generalizations are themselves grist for the explanation mill, in that they are the object of explanation for broader generalizations. Accordingly, it is reasonable for Archibald to view a (typological) generalization as being descriptive rather than explanatory only if he can invoke a more encompassing generalization. But in this case, he does not, and consequently, his claim in the above statement simply does not carry the intended force. Statements similar to Archibald's have appeared also in the linguistics literature, in particular within the context of the formalist-functionalist debate. In his influential book, Newmeyer (1998) takes a position similar to that of Archibald in stating, as shown by the quotes below in, respectively, (16) and (17), that principles such as the AH or the Hierarchy of Morphological Incorporability (Mithun 1984) are generalizations in need of explanation.

(16) Surely one would think that if AH is indeed consistent with the facts, it must reflect a real generalization about language in need of explanation. (Newmeyer 1998: 317–318)

(17) If this hierarchy is indeed valid, rather than being simply an artifact of the examination of a small or non-representative sample of languages, it represents a generalization in need of explanation. (Newmeyer 1998: 305–306)
Of course, the same point about level of explanation could be made here: the generalizations in question are in need of explanation insofar as any generalization can be a fact to be explained.
Interestingly, it seems to be the case that it is only the typological generalizations that are singled out as being "in need of explanation". In principle, one could also cite generalizations or principles such as Subjacency, discussed above, and claim that the fact that movement is prohibited out of complex NPs, wh-islands or coordinate structures is a generalization in need of explanation. It is readily apparent that such questions are not raised, presumably because the principles of UG are innate and domain-specific and therefore are not derivable from other principles of cognition. The second type of argument leveled against typological markedness as an explanation is the claim made in the introductory textbook on SLA by Gass & Selinker (2001). The authors state, as shown in (18), that subsuming interlanguages under the same generalizations as primary languages, as asserted by the SCH above, is not an explanatory account, but simply "pushes the problem of explanation back one step".

(18) For implicational universals to have any importance in the study of second language acquisition, two factors must be taken into consideration. First, one must understand why a universal is a universal. It is not sufficient to state that second languages obey natural language constraints because that is the way languages are. This only pushes the problem of explanation back one step. (p. 154)
The claim of the SCH is that interlanguages are the way they are, at least in part, because they are instances of human languages, and all such languages obey the same laws. Implicit in the statement by Gass & Selinker is that such a claim is not explanatory, because it simply raises further questions. It seems clear that the claim made by Gass & Selinker in (18) is another example where the fact that there are levels of explanation has been missed. The point is that, if one were to reject a generalization as an insufficient explanation because that generalization “pushes the problem of explanation back one step”, as Gass & Selinker suggest, one would never be able to accept any generalization as an explanation, because all generalizations offer an explanation for the facts at hand, and then push the problem of further explanation back one step by becoming the target of explanation for higher-level generalizations. We now turn to the third critique made in the literature concerning typological markedness as an explanation for SLA, an example of which is embodied in the quotation taken from White (1987).

(19) Like the learnability definition [of markedness], the implicational definition is formal. It is also external to individuals in that it relies on the situation in the languages of the world. It is not clear how this way of defining markedness relates to individual learners, although people often make an implicit
"eck-r8">
Universals, innateness and explanation in second language acquisition
assumption that it does (e.g. Eckman’s L2 acquisition predictions made on this basis; clearly he must be assuming some kind of psychological reality to the implicational definition). (p. 265)
White continues on the next page as follows:

(20) If markedness is defined implicationally, the characterization of what is marked is arrived at by considering the languages of the world; L2 learners cannot automatically be assumed to have such knowledge available to them. (p. 266)
It seems clear from the above statements that the questioning of typological markedness as an explanation for certain facts about L2 acquisition is based on two assumptions. The first is that typological markedness is determined on the basis of facts about certain structures in the world’s languages. And the second is that to hypothesize that these markedness relations can explain certain facts about SLA assumes either that the markedness principles are psychologically real, or that they are imputed to be part of the (explicit) knowledge of the L2 learner. And since it is highly unlikely that L2 learners have knowledge of such markedness relations, the implication seems to be that markedness is not a viable explanatory principle for SLA. We consider each of these assumptions in turn. The first assumption, that typological markedness is determined on the basis of implicational relationships across languages, is true. This is what White means by the statement in (19) above that the implicational definition of markedness is “external”: whether a linguistic expression is marked or unmarked relative to some other expression is determined externally to the L2 learner. The status of the second assumption is unclear. If by psychological reality is meant that the L2 learner explicitly or implicitly knows the markedness principle in question, then the assumption is false. Certainly no such knowledge is assumed, nor has such knowledge on the part of the L2 learner ever been claimed or implied, to the best of my knowledge, in any writings by any researchers working within the typological framework. Nor would it ever be necessary to make such an assumption. But given that this seems to be a point of contention, or at least confusion, let us explore further exactly what claim is being made by the typological school in this case. We use as an example the Accessibility Hierarchy discussed above, and we reprise the studies by Gass (1979) and Hyltenstam (1984). Both of these authors claimed that the AH provided an explanation for the IL patterns observed in the L2 learners in question. But to invoke the AH as an explanatory principle does not imply that the L2 learners involved in these studies actually knew the AH, either explicitly or implicitly. Rather, the claim is this: whatever it is that causes the relative clauses of the world’s languages to be as they are is also at work in determining how relative clauses are learned in SLA. Stated somewhat differently, whatever property
about relative clause constructions causes the Su position to be universally relativizable, the Oblique position to be the most difficult, and so on, is also at work in SLA, and causes the Su position to be the least difficult, and so on. In other words, the issue again reduces to a discussion about level of explanation: why is the AH the way it is? Whatever property or principle causes the AH to be as it is, it is also ex hypothesi instrumental in shaping the relative clauses produced and understood by the learners in the studies by Gass and Hyltenstam. And one can claim that such a property is operative in SLA without claiming that L2 learners know what this property is. Another way to view the matter is that the AH is being used as a convenient short-hand representation for whatever principle or generalization can be shown to subsume the AH. And to be sure, it is incumbent on those proposing this explanation to address the ‘whatever’, to discover or propose a more general principle from which the AH can be derived. The issue is one of explanatory ascent: the AH is a generalization that explains the diversity of relative clauses across the world’s languages. But such a generalization, as stated before, also becomes the object of explanation, and linguists interested in this generalization attempt to subsume the AH under a higher-order law; it may well develop from this endeavor that the connection of such higher-order principles to the mind of the L2 learner will be more direct. In fact, proposals have been made in the literature that address exactly this point, including one by O’Grady (1987), which deals with language processing in general, another by Wolfe-Quintero (1992), which explicitly addresses L2 relative clauses and wh-questions, and another by Hawkins (1999) within the more general context of filler-gap dependencies.12 The thrust of all of these proposals is that the patterns characterized by the AH may indeed follow from facts having to do with parsing and long-distance dependency relations. Due to space limitations, however, we consider briefly only the proposals of O’Grady (1987) and Wolfe-Quintero (1992). O’Grady (1987: 87) proposes the Continuity Requirement shown in (21).

(21) All phrases must be continuous.
From this constraint it follows that the unmarked structure is the one that departs least from (21), which, in the case of relative clauses, would be a clause where the subject is relativized. And according to (21), structures would be more marked to the extent that they exhibit greater noncompliance with this principle, which would be those relative clause types that are farther to the right on the AH than the subject position. Building on this principle, Wolfe-Quintero argues that L2 error patterns in certain types of relative clauses and wh-questions can be explained using the distance, measured in terms of phrasal brackets, between the relative pronoun and its ‘gap’, that is, the position from which it was ‘moved’. In general,
"eck-r5"> "eck-r8">
Universals, innateness and explanation in second language acquisition
this distance increases as one proceeds from left to right on the AH. The goal here is not to argue for one or the other of these principles as an explanation for the AH. Rather, the point is simply to show that claims about the lack of a clear connection between markedness generalizations and the L2 learner are in reality discussions about the level of explanation. While it may seem that the AH represents a generalization only about relative clauses across languages, it seems highly plausible that this generalization is subsumable under a more general principle where the connection to L2 learning is more straightforward. This claim derives support from the fact that several L2 studies have been conducted showing that the AH seems to form the basis for the generalization of relative clause learning (Gass 1982; Eckman et al. 1988; Doughty 1991). All three of these studies showed that L2 learners were able to generalize instruction on relative clauses from more marked structures to less marked structures, but not necessarily in the opposite direction, where markedness in this case was determined in terms of the AH. Doughty (1991), for example, showed that L2 learners who were instructed in TL relative clauses on only one position of the AH were able to generalize this instruction, compared to their baseline performance, to the less marked positions on the AH, but not necessarily to the more marked positions. And Doughty’s results are consistent with those obtained in the studies by Gass (1982) and Eckman et al. (1988). The fact that a hierarchy such as the AH can form the basis for explaining L2 learners’ generalization in an instructional study suggests even more strongly that the AH is not unrelated to L2 learners. Before concluding this section it would be worthwhile to illustrate how using universal generalizations which are not claimed to be domain-specific allows for potentially higher levels of explanation. The point of departure here is the discussion of relative clauses in SLA.

(22) Q1 Why do L2 learners of language L have more difficulty with IO relative clauses than they do with DO relative clauses?
A1 Because IO relative clauses are more marked and therefore more difficult than DO relative clauses.
Q2 Why are IO relative clauses more difficult than DO relative clauses?
A2 Because IO relative clauses involve phrases that are more discontinuous than DO relative clauses.
Q3 Why are discontinuous phrases more difficult than continuous phrases?
A3 Because discontinuous phrases place higher demands on short-term memory.13
Q4 Why…?
A4 Because …
The point of (22) is to show that the potential for asking questions that lead to a
deeper understanding of the phenomenon is greater when the possibility is allowed that linguistic generalizations derive from other principles of cognition, whereas the questioning is cut off much earlier when the explanation resorts to domain-specific principles. Before concluding, it would be worthwhile to emphasize exactly what is being argued about the explanatory nature of innate, domain-specific principles of UG. The issue concerns the level of explanation provided: whether such principles are conducive to the questioning that leads to explanatory ascent, and therefore to a deeper understanding of the constructs in question. The issue is not whether the postulation of innate domain-specific principles curtails further scientific investigation or examination, nor does the issue pertain to the empirical adequacy of such principles. As one of the reviewers correctly pointed out, if the principles in question are empirically false, then there is nothing to explain at a higher level; if the principles are empirically defensible, then one can certainly pose the next question and ask why such principles exist, with the answer being provided, presumably, on evolutionary grounds (but see Lewontin 1998 for discussion against this view).
5. Conclusion

In this paper we have compared two approaches to employing universal principles in the explanation of SLA facts. It has been argued that, despite a number of counterclaims in the literature, typological universals and markedness principles can offer explanations for such facts in that the principles serve as covering laws. It has been argued further that typological generalizations allow for the possibility of higher-level explanations because these generalizations, rather than being specific to the domain of language, may well be rooted in other aspects of human cognition.
Notes

* I would like to express my appreciation to Edith Moravcsik, Martina Penke, Anette Rosenbach and an anonymous reviewer for their comments on an earlier draft of this paper. I have also benefited from discussions on this topic with Greg Iverson, Michael Liston, Barbara Schulz, Robert Schwartz, and Bert Vaux. Of course, none of the above is at all responsible for any misrepresentations, inconsistencies or other errors.

1. Interlanguages (IL) would presumably fall under what Weiß (this volume) defines as N2, languages which are not subject to primary language (L1) acquisition. However, since Weiß uses the presence or absence of L1 acquisition as “… a criterion to define the naturalness of languages” (Weiß, this volume: 184), it would follow from their N2 classification that ILs are less natural than N1 languages, a point that would be disputed by many researchers in second language acquisition.

2. One of the reviewers correctly pointed out that the problem with the syllabification of (1b) is that it lacks an onset, and therefore violates the principle of syllabification that requires an onset. Such a principle is actually a special case of the SSG. In the case of (1b), the [t] of the complex coda of the first syllable is available to be the onset of the second syllable, bringing the sonority profile of the second syllable in line with the requirements of the SSG.

3. The point here is that, in what Sanders (1974) has termed explanatory ascent, the questioning must be of such a nature that it forces the investigator to search for more encompassing principles. As we will see below, it is always possible to raise further questions about explanations, but it is not always the case that such questioning will result in explanatory ascent.

4. One of the reviewers cites ‘an easy man to please’ as a counterexample to the generalization that all specifiers in English precede their heads. Assuming that ‘to please’ is indeed part of the specifier, then either the explanandum in A2 of (2) would have to be changed, or there would have to be some other principle involved to account for the placement of ‘to please’. Although the example provided by the reviewer may alter the content of the explanatory principles involved, the point about the explanatory ascent remains.

5. The Bley-Vroman et al. study also tested ECP violations, and considered Subjacency violations other than just Complex NPs.

6. The scores for the subjects were reported as percentage of correct judgments. Whereas the average score for the L2 subjects (75%) was lower than, and reliably different statistically from, the average score of the native speakers (92%), the 75% score of the non-natives was statistically different from chance. On the other hand, although all of the native-speaking controls scored 80% or higher, a score generally argued to be a threshold in L2 acquisition, only 26 of the 92 non-natives scored at this level or higher.

7. Of course, other kinds of questioning not involving what we have termed explanatory ascent are not cut off at this point. Thus, for example, it would be possible to ask why Subjacency and not some other principle of UG would be relevant in this case. But this kind of questioning is not the same as that represented in (2) or (6); that is to say, this kind of questioning does not lead to explanatory ascent, yielding a deeper understanding of Subjacency by relating it to other phenomena.

8. The universals archive at the University of Konstanz (http://ling.uni-konstanz.de/pages/proj/sprachbau.htm) lists only Imonda as a counterexample in that it indicates singular and dual with a non-zero morpheme and uses a zero morpheme to indicate plural.

9. The studies by Gass and Hyltenstam both used the version of the AH formulated by Keenan & Comrie (1977) shown in (10).

10. The TLs in Gass’s and Hyltenstam’s studies were, respectively, English and Swedish. Some of the NLs did not allow resumptive pronouns in any type of relative clause, and some of the NLs had relative clauses with resumptive pronouns in some, but not all, positions on the AH.
11. This position also seems to be taken by Newmeyer (this volume), who argues that a number of typological generalizations can be derived from domain-general principles that are not related to UG.

12. See also Hawkins (1994) and Kirby (1999) for other proposed explanations for the AH.

13. See O’Grady (1987) for discussion.
References

Archibald, J. 1998. Second language phonology. Philadelphia: John Benjamins.
Bley-Vroman, R.; Felix, S.; and Ioup, G. 1988. “The accessibility of Universal Grammar in adult language learning”. Second Language Research 4: 1–32.
Chomsky, N. 1982. On the generative enterprise: a discussion with Riny Huybregts and Henk van Riemsdijk. Dordrecht: Foris.
Croft, W. 1990. Typology and universals. Cambridge: Cambridge University Press.
Doughty, C. 1991. “Second language instruction does make a difference”. Studies in Second Language Acquisition 13: 431–469.
Eckman, F. 1977. “Markedness and the contrastive analysis hypothesis”. Language Learning 27: 315–330.
Eckman, F. 1984. “Universals, typologies and interlanguages”. In: Rutherford, W. E. (ed.), Language universals and second language acquisition 79–105. Philadelphia: John Benjamins.
Eckman, F. 1991. “The Structural Conformity Hypothesis and the acquisition of consonant clusters in the interlanguage of ESL learners”. Studies in Second Language Acquisition 13: 23–41.
Eckman, F. 1996. “A functional-typological approach to second language acquisition theory”. In: Ritchie, W. C.; and Bhatia, T. K. (eds), Handbook of second language acquisition 195–211. San Diego: Academic Press.
Eckman, F.; Bell, L.; and Nelson, D. 1988. “On the generalization of relative clause instruction in the acquisition of English as a second language”. Applied Linguistics 9: 1–20.
Eckman, F.; Moravcsik, E.; and Wirth, J. 1989. “Implicational universals and interrogative structures in the interlanguage of ESL learners”. Language Learning 39: 173–205.
Friedman, M. 1974. “Explanation and scientific understanding”. Journal of Philosophy 71: 5–19.
Gass, S. 1979. “Language transfer and universal grammatical relations”. Language Learning 29: 327–344.
Gass, S. 1982. “From theory to practice”. In: Hines, M.; and Rutherford, W. E. (eds), On TESOL 129–139. Washington DC: TESOL.
Gass, S.; and Selinker, L. 2001. Second language acquisition: an introductory course. Mahwah, NJ: Lawrence Erlbaum Associates.
Glymour, C. 1975. “Relevant evidence”. Journal of Philosophy 72: 403–426.
Greenberg, J. 1976. Language universals. The Hague: Mouton.
Gundel, J.; Houlihan, K.; and Sanders, G. 1986. “Markedness distribution in phonology and syntax”. In: Eckman, F.; Moravcsik, E.; and Wirth, J. (eds), Markedness 107–138. New York: Plenum Press.
Hawkins, J. 1988. “Explaining language universals”. In: Hawkins, J. (ed.), Explaining language universals 3–28. New York: Basil Blackwell.
Hawkins, J. 1994. A performance theory of word order and constituency. Cambridge: Cambridge University Press.
Hawkins, J. 1999. “Processing complexity and filler-gap dependencies across grammars”. Language 75: 244–285.
Hempel, C.; and Oppenheim, P. 1948. “Studies in the logic of explanation”. Philosophy of Science 15: 135–175.
Hyltenstam, K. 1984. “The use of typological markedness conditions as predictors in second language acquisition: the case of pronominal copies in relative clauses”. In: Andersen, R. (ed.), Second languages: a cross-linguistic perspective 39–58. Rowley, MA: Newbury House Publishers.
Keenan, E.; and Comrie, B. 1977. “Noun phrase accessibility hierarchy and universal grammar”. Linguistic Inquiry 8: 63–99.
Kirby, S. 1999. Function, selection and innateness: the emergence of language universals. Oxford: Oxford University Press.
Lehmann, C. 1986. “On the typology of relative clauses”. Linguistics 24: 663–680.
Lewontin, R. C. 1998. “The evolution of cognition: questions we will never answer”. In: Scarborough, D.; and Sternberg, S. (eds), An invitation to cognitive science, Volume 4, Methods, models and conceptual issues 107–132. Cambridge, MA: MIT Press.
Mithun, M. 1984. “The evolution of noun incorporation”. Language 60: 847–893.
Newmeyer, F. 1998. Language form and language function. Cambridge, MA: MIT Press.
Newton-Smith, W. H. 2000. “Explanation”. In: Newton-Smith, W. H. (ed.), A companion to the philosophy of science 127–133. Malden, MA: Blackwell Publishers.
O’Grady, W. 1987. Principles of grammar and learning. Chicago: University of Chicago Press.
Sanders, G. 1974. “Introduction”. In: Cohen, D. (ed.), Explaining linguistic phenomena 3–20. New York: John Wiley & Sons.
Schwartz, B.; and Sprouse, R. 1996. “L2 cognitive states and the full transfer/full access model”. Second Language Research 12: 40–72.
Schwartz, B.; and Sprouse, R. 2000. “When syntactic theories evolve: consequences for L2 acquisition research”. In: Archibald, J. (ed.), Second language acquisition and linguistic theory 156–186. Malden, MA: Blackwell Publishers.
Scriven, M. 1975. “Causation as explanation”. Nous 9: 3–10.
White, L. 1987. “Markedness and second language acquisition: the question of transfer”. Studies in Second Language Acquisition 9: 261–286.
White, L. 1989. Universal grammar and second language acquisition. Philadelphia: John Benjamins.
White, L. 1996. “Universal grammar and second language acquisition: current trends and new directions”. In: Ritchie, W. C.; and Bhatia, T. K. (eds), Handbook of second language acquisition 85–120. San Diego: Academic Press.
White, L. 2000. “Second language acquisition: from initial to final state”. In: Archibald, J. (ed.), Second language acquisition and linguistic theory 130–155. Malden, MA: Blackwell Publishers.
Wolfe-Quintero, K. 1992. “Learnability and the acquisition of extraction in relative clauses and wh-questions”. Studies in Second Language Acquisition 14: 39–70.
‘Internal’ versus ‘external’ universals
Commentary on Eckman

Lydia White
McGill University
Fred Eckman argues that the typological approach to linguistic universals and markedness — in contrast to an approach which assumes domain-specific and innate principles of Universal Grammar (UG) — offers a superior explanation of the facts of second language acquisition, because it allows for levels of explanation, in particular “explanatory ascent” (following Hempel and Oppenheim 1948). Eckman’s article is predicated on the assumption that an explanation is only valid if it raises further questions that lead to higher levels of explanation. According to him, an approach to universals grounded in UG is not fruitful because, in some sense, one reaches a ceiling, beyond which no further questions can be asked (as in his illustration involving Subjacency [p. 226]). Eckman’s claim raises the issue of what the object of study of a theory of second language acquisition (SLA) is, and what the goals of such a theory should be. According to him, the goal of an SLA theory is to explain the facts of SLA (p. 217). In contrast, I maintain that the goal of a theory of SLA is not to explain the facts of SLA, nor to arrive at a level of explanation that can, in due course, be subsumed under some higher level of explanation. Rather, along with many SLA researchers (who are not necessarily proponents of UG), I assume that SLA is in fact a (fledgling) branch of cognitive science (e.g. Gregg 2003). As Long and Doughty (2003: 866) put it, “A discernible trend … has been for increasing numbers of researchers and theorists … to focus their attention on SLA as an internal, individual, in part innately specified, cognitive process”. In other words, the object of inquiry, broadly conceived, is the mind of the second language learner. And the goals of an SLA theory include understanding the nature of interlanguage competence (what is knowledge of language?), as well as how L2 learners come to know what they know (how is that knowledge acquired?) (see Gregg 1996; White 1989b, 2003; cf. Chomsky 1986). In pursuit of these goals, SLA researchers working in the tradition of generative grammar assume that interlanguage grammars involve unconscious mental representations, constrained
by UG or by universal principles which are not domain specific and/or by the L1 grammar (e.g. Flynn 1996; Gregg 1996, 2003; Hawkins 2001; O’Grady 1996, 2003; Schwartz and Sprouse 1994; White 1989b, 2003). The generative linguistic approach to SLA adopts a particular perspective on the nature of the underlying unconscious knowledge of second language learners. Considerations of learnability (the logical problem of L2 acquisition) motivate the claim that certain properties of the L2 could not be acquired without a ‘built-in’ Universal Grammar (see White 1989b, 2003). In other words, when L2 learners acquire properties that are not represented in the L1 grammar and which are underdetermined by the L2 input, this suggests that interlanguage grammars must be constrained by internal universal principles (i.e. UG). It is important to note that researchers may agree about the general nature of the SLA research enterprise without accepting arguments for a domain-specific UG (see, for example, O’Grady 1996, 2003). Hence, researchers from a variety of perspectives share objections to explanations couched in terms of implicational universals, precisely because these universals are defined and identified in external rather than internal terms. Eckman dismisses observations from Archibald (1998), Gass and Selinker (2001) and White (1987) (see also White [1989a]), all of whom have suggested that typological universals are insufficiently explanatory. These researchers agree that what has to be accounted for in SLA is the individual learner’s underlying interlanguage system, as well as the mechanisms that allow this system to be acquired. Since typological markedness is external to the learner (determined on the basis of detailed cross-linguistic comparisons carried out by linguists), implicational universals as such cannot constitute any part of the unconscious knowledge that the learner brings to bear on the acquisition process. Instead, they themselves require explanation, as Eckman fully acknowledges — this is, after all, the main thrust of his paper. As Gregg (1993) points out, in order for markedness to be useful as a potential explanation of SLA, “it is necessary to define markedness in such a way as to connect it with learning mechanisms within the individual learner”. This is something that learnability-based definitions of markedness do (see White 1989a) and that implicational definitions fail to do. And it is precisely on this point that Eckman seems ambivalent (or even contradictory). On the one hand, he maintains that typological universals may ultimately be subsumed under higher-order principles with a more direct connection to the mind of the L2 learner (pp. 234–236). On the other hand, he explicitly denies the psychological reality of markedness, that is, he denies that the L2 learner might have even implicit knowledge of markedness principles (p. 233). This surely implies that there is no underlying learner-internal explanation of implicational universals, i.e. that these are not based in human cognition. If we can agree that SLA is a cognitive science and that what has to be explained
includes the L2 learner’s underlying linguistic competence and how such competence is acquired, then an explanation grounded in external universals rather than internal ones (whether UG-derived or not) will not get us very far. The fact that typological universals invite (or necessitate) a higher level of explanation is perhaps their weakness rather than their strength.
References

Archibald, J. 1998. Second language phonology. Amsterdam: John Benjamins.
Chomsky, N. 1986. Knowledge of language: its nature, origin, and use. New York: Praeger.
Flynn, S. 1996. “A parameter-setting approach to second language acquisition”. In: Ritchie, W.; and Bhatia, T. (eds), Handbook of language acquisition 121–158. San Diego: Academic Press.
Gass, S.; and Selinker, L. 2001. Second language acquisition: an introductory course. Mahwah, NJ: Lawrence Erlbaum.
Gregg, K. 1993. “Second language acquisition: history and theory”. In: Asher, R. E. (ed.-in-chief), Encyclopedia of language and linguistics 3720–3726. Oxford: Pergamon Press.
Gregg, K. 1996. “The logical and developmental problems of second language acquisition”. In: Ritchie, W.; and Bhatia, T. (eds), Handbook of second language acquisition 49–81. San Diego: Academic Press.
Gregg, K. 2003. “SLA theory: construction and assessment”. In: Doughty, C. J.; and Long, M. H. (eds), The handbook of second language acquisition 831–865. Oxford: Blackwell.
Hawkins, R. 2001. Second language syntax: a generative introduction. Oxford: Blackwell.
Hempel, C.; and Oppenheim, P. 1948. “Studies in the logic of explanation”. Philosophy of Science 15: 135–175.
Long, M. H.; and Doughty, C. J. 2003. “SLA and cognitive science”. In: Doughty, C. J.; and Long, M. H. (eds), The handbook of second language acquisition 866–870. Oxford: Blackwell.
O’Grady, W. 1996. “Language acquisition without Universal Grammar: a general nativist proposal for L2 learning”. Second Language Research 12: 374–397.
O’Grady, W. 2003. “The radical middle: nativism without Universal Grammar”. In: Doughty, C. J.; and Long, M. H. (eds), The handbook of second language acquisition 19–42. Oxford: Blackwell.
Schwartz, B. D.; and Sprouse, R. 1994. “Word order and nominative case in nonnative language acquisition: a longitudinal study of (L1 Turkish) German interlanguage”. In: Hoekstra, T.; and Schwartz, B. D. (eds), Language acquisition studies in generative grammar 317–368. Amsterdam: John Benjamins.
White, L. 1987. “Markedness and second language acquisition: the question of transfer”. Studies in Second Language Acquisition 9: 261–286.
White, L. 1989a. “Linguistic universals, markedness and learnability: comparing two different approaches”. Second Language Research 5: 127–140.
White, L. 1989b. Universal grammar and second language acquisition. Amsterdam: John Benjamins.
White, L. 2003. Second language acquisition and Universal Grammar. Cambridge: Cambridge University Press.
Author’s response
‘External’ universals and explanation in SLA*

Fred R. Eckman
University of Wisconsin — Milwaukee
After summarizing the main points of my paper, Lydia White begins her commentary by espousing the view that the object of study for second language acquisition (SLA) theory is not, as I suggested, the facts of L2 acquisition, nor is the goal of the theory to arrive at a higher level of explanation by being subsumed under more general principles. Rather, she continues, SLA theory is a branch of cognitive science, and the object of inquiry is the mind of the L2 learner (p. 241). Several points need to be made here. First, White contradicts her statement that SLA theory is not about explaining facts of second language acquisition when she writes (p. 242) “If we can agree that SLA is a cognitive science and that what has to be explained includes the L2 learner’s underlying linguistic competence and how such competence is acquired,…”. The L2 learner’s underlying linguistic competence and how it is acquired are certainly facts about SLA. Second, I see no reason why including SLA theory under cognitive science should in any way affect the enterprise of theorizing about L2 acquisition. The psychological underpinnings of generative grammar can be traced at least to Chomsky (1986: 3), if not earlier. The cognitive nature of generative linguistics is therefore nothing new, which leads me to the third point. It is not clear, at least to me, that associating SLA research with cognitive science would automatically mean, as White contends (p. 241), that researchers would not be interested in subsuming explanations under more general laws. The only reason that I can see for taking such a position would be in order to claim that the principles involved in the explanation, principles of UG, must be defined as being domain-specific. These points aside, let us now consider some areas where I believe that White and I are in accord. I think we are in agreement that the goal of SLA research is to understand the nature of the L2 learner’s mental grammar. I believe that she would also agree that it is reasonable for L2 researchers to attempt to gain insight into L2 grammars by studying L1 grammars, the hypothesis being that whatever underlies L1 grammars may very well constrain IL grammars. This position certainly seems
"eck2-r7"> "eck2-r4"> "eck2-r3">
246
Fred R. Eckman
to underpin the claim that UG is involved in SLA. The question then becomes: where should the linguist look to gain this perspective on IL grammars? My view is that typological universals are a reasonable source of insight into the constraints on L1 grammars, and by hypothesis, may provide a view into the nature of interlanguage grammars. In fact, I would borrow White’s own words in support: “… SLA researchers working within the tradition of generative grammar assume that interlanguage grammars involve unconscious mental representations, constrained by UG or by principles which are not domain specific…” (p. 242). I have hypothesized (Eckman 1991, 1996) that interlanguage grammars are constrained by typological universals, which, in the above quotation, would be categorized as principles which are not domain-specific. Though these universals pertain only to human languages, there is no stipulation that they cannot be derived from general principles of human cognition or behavior. Now, does it follow from the hypothesis that typological universals constrain IL grammars that the L2 learners actually “know” the universals in question? Are the universals psychologically real, in other words? The answer depends on what it means for the L2 learner to “know the universal” and what it means to be psychologically real. And here I may bear the responsibility for sowing some confusion, so I will attempt to clarify. If by “know the universal” is meant that the learner must actually be aware of the generalization, then the answer to the above question is negative; if we mean simply that L2 learners behave as though they know the universal, then the answer is yes. There is no evidence that L2 learners explicitly know the Accessibility Hierarchy (AH); but there is plenty of evidence that, in learning relative clauses, L2 learners behave as if they know the AH, because their IL grammars constrain their relative clauses in accordance with the principles of the AH. The position that I think is most reasonable in this case is that L2 learners know, or behave as if they know, whatever principle of human cognition, processing or behavior underlies the AH, and here I suggest the work of Wolfe-Quintero (1992) and O’Grady (1987) as examples of serious proposals for characterizing this principle. In this sense, then, the AH is psychologically real. In fact, there is research in SLA, some of which I have contributed to, that attests to the psychological reality of the AH. Studies by Gass (1982), Eckman et al. (1988) and Doughty (1991) all support the conclusion that L2 learners uni-directionally generalize their learning from more marked to less marked relative clauses, where the basis for determining markedness is the AH. This brings me to my last point, one on which White and I disagree. White states at the end of her paper that “[t]he fact that typological universals invite (or necessitate) a higher level of explanation is perhaps their weakness rather than their strength” (p. 243). I would like to address this assertion by considering the case of Subjacency, a constraint on extraction which has been proposed as a principle of
UG (see Newmeyer, this volume). Accounting for the cross-linguistic facts of extraction using an innate, domain-specific principle of UG, such as Subjacency, makes the claim that the extraction facts are not derivable from, i.e., cannot be explained by, a higher-order principle. I suggest that this is the weaker of two positions that a linguist could take in this instance. The other, stronger position is the one held by Hawkins (1999), who attempts to deduce the effects of Subjacency from principles of processing. The strength of Hawkins’ proposal becomes clear when one considers how the two positions can be related to each other. If Hawkins’ claims can be defended, then he has achieved a higher level of explanation by eliminating the need to state a principle of Subjacency. Instead, one would derive the facts of Subjacency from the processing principles, in effect explaining Subjacency. Alternatively, if his processing account turns out not to be defensible, Hawkins always has the option to “retreat” to the weaker position, and account for the facts by postulating Subjacency as a (perhaps innate) principle. I submit that it is not a weakness, but instead a strength, that the potential exists to derive typological universals from higher-order principles.
Note

* I would like to thank my colleague, Edith Moravcsik, for many useful discussions on this topic, and for her suggestions and comments on an earlier draft of this reply. Any errors or shortcomings are my own.
References

Chomsky, N. 1986. Knowledge of language: its nature, origin and use. New York: Praeger.
Doughty, C. 1991. “Second language instruction does make a difference”. Studies in Second Language Acquisition 13: 431–469.
Eckman, F. 1991. “The Structural Conformity Hypothesis and the acquisition of consonant clusters in the interlanguage of ESL learners”. Studies in Second Language Acquisition 13: 23–41.
Eckman, F. 1996. “A functional-typological approach to second language acquisition theory”. In: Ritchie, W. C.; and Bhatia, T. K. (eds), Handbook of second language acquisition 195–211. San Diego: Academic Press.
Eckman, F.; Bell, L.; and Nelson, D. 1988. “On the generalization of relative clause instruction in the acquisition of English as a second language”. Applied Linguistics 9: 1–20.
Gass, S. 1982. “From theory to practice”. In: Hines, M.; and Rutherford, W. E. (eds), On TESOL 129–139. Washington DC: TESOL.
Hawkins, J. 1999. “Processing complexity and filler-gap dependencies across grammars”. Language 75: 244–285.
Hempel, C.; and Oppenheim, P. 1948. “Studies in the logic of explanation”. Philosophy of Science 15: 135–175.
O’Grady, W. 1987. Principles of grammar and learning. Chicago: University of Chicago Press.
Sanders, G. 1974. “Introduction”. In: Cohen, D. (ed.), Explaining linguistic phenomena 3–20. New York: John Wiley & Sons.
Wolfe-Quintero, K. 1992. “Learnability and the acquisition of extraction in relative clauses and wh-questions”. Studies in Second Language Acquisition 14: 39–70.
What counts as evidence in historical linguistics?*

Olga Fischer
University of Amsterdam
The main aim of this paper is to establish the position of historical linguistics in the wider field of linguistics. Section 1 centres on the immediate and long-term goals of historical linguistics. Section 2 discusses the type of data that play a role and looks at tools to be used for the analysis of the data. It also addresses the question whether the explanation of the data should be in terms of grammar change (as advocated by formalist linguists) or language change. This latter point automatically leads to the question as to what type of grammatical model or theory the historical linguist should work with and, more particularly, to what extent the innate, syntacto-centric generative model is adequate for studying grammar change (Section 3). This is followed by a brief conclusion in which a semi-independent position for the historical linguist is advocated.
1. Introduction
The heading under which papers were invited for this volume was ‘What counts as evidence in linguistics’, and contributors were asked to consider specifically ‘the case of innateness’. In my contribution I would like to address these issues with respect to a subfield of linguistics, i.e. historical linguistics. The discussion will concentrate on a number of points. A first question is, what is the position of historical linguistics vis-à-vis synchronic linguistics? Secondly, how does the question of innateness impinge on historical research? Thirdly, and this is a point intertwined with the two previous ones, what kind of evidence should historical linguists use within their subfield, and what methods should they employ in the interpretation of the data? Let us now first turn to the position of historical linguistics in the broader field of linguistics and the task of the historical linguist within this.
"fis-r38"> "fis-r30"> "fis-r49">
250
Olga Fischer
1.1 The position of historical linguistics vis-à-vis linguistics

Since the work of de Saussure, the synchronic study of language has taken priority over the diachronic approach, which was seen as the only possible approach in the days of the Neogrammarians and in the work of ‘traditional’ descriptive grammarians such as Jespersen and Kruisinga, who wrote grammars of English. The structuralist movement has led to a different way of studying grammar, which has had great influence on our understanding of how grammar works, and has led to a deep interest in the theory of language acquisition, and mental models of grammar. It has proven very fruitful and successful, and has laid bare connections between aspects of grammar that had not been seen before. It has uncovered important principles and universals, such as the importance of basic word order and the behaviour of anaphors and clitics. The advantage of working with a theoretical linguistic model in historical linguistics is that it creates the sense of a goal: we move beyond mere description, beyond texts, to a deeper understanding of how the human mind works. However, the Saussurean dichotomy has also led to a loss of interest in the study of performance, in the study of language as it is processed by speakers and hearers against a historical, socio-cultural background. Through the change in focus, historical linguistics gained a new lease of life: it crept out of its ‘dusty’ corner, so to speak, but as a discipline it also became subservient to the (synchronic) theory of grammar. This is particularly clear in the work of Lightfoot (1979, 1991, 1999). Lightfoot emphasizes that it is through knowledge of what is possible in change that we may discover what is possible in grammar, i.e. change may throw light on the contours of the theory of grammar. It is true that change may provide ‘a window’ (to use Kiparsky’s [1968: 174] words) on the form of linguistic competence or the theory of grammar. At the same time, however, this window gives us only a partial view of what is possible in change if we reduce change to internal causes and mechanisms related to the theory of grammar, i.e. if we ignore all socio-historical factors, all context. The latter is crucial for an understanding of how and why change takes place (see also note 4). Pintzuk et al. (2000b: 10) emphasize — to my mind correctly — that it is “the analysis of variation in E-language, in particular variation in time, [that] can reveal I-language differences” (emphasis added). In other words, their view is that it is the variation at the performance level that should be studied in detail because “the path of change” can provide us with “information about the nature and organization of the grammar (of the language in particular and of language in general) that is not available from synchronic comparative research” (ibid.).1 Another factor that needs to be mentioned in this connection is that most change involves a combination of internal and external factors (cf. Gerritsen and Stein 1992). It is often hard if not impossible to distinguish between internal and
"fis-r49"> "fis-r38">
What counts as evidence in historical linguistics?
external factors and to measure what the weight of each factor has been in any particular change. McMahon (2000: 120–121) and Pintzuk et al. (2000b: 9) note that Lightfoot (1991: 166, 1999: 105–106) provides six diagnostic properties which should help us decide whether we are dealing with an internal or external change. This may indeed tell us something about internal factors, once the change is underway, but even in these cases it is quite possible that external causes led up to the change, and may therefore be said to provide a ‘deeper’ explanation. McMahon (2000: 124) indeed observes that “Lightfoot’s ideas on the explanatory scope of his theory seem to have modified over the years, [which] might suggest that explanation lies ultimately in the changes in the triggering experience, which Lightfoot accepts he cannot deal with at all” (emphasis added). This would, McMahon concludes, “reduce the potential for explanation from internal aspects of the formal theory”. In other words, the separation of internal and external factors (in itself difficult enough because Lightfoot’s diagnostic properties are not without their problems, see note 2 and Section 3.3.1 [p. 263]) may lead to a loss in explanatory value. Finally, it is to be noted that, because of the emphasis on model building and on the elegance and simplicity of the model, the theory of grammar (any theory of grammar) has a tendency to become more and more abstract. It tends to become a purely logical construct, which takes little notice of what speakers and hearers do in real-life circumstances. Changes are often described in terms of changes in rules, conditions or functional categories that have no surface manifestation. The more abstract the model, the more difficult it is to apply it with real explanatory power to its object of study, the changing language.2 Circularity looms large when the explanation becomes almost purely theory-internal (cf. also McMahon 2000: 128 and passim).

1.2 The task of the historical linguist

When we consider the field of historical linguistics from the other direction, from the inside as it were, rather than from the superordinate aim and goal of linguistics, we can ask ourselves the question: what is the purpose of historical linguistics as a separate discipline, what is the task of the historical linguist? Should he (she) directly contribute towards the theory of grammar, towards a clearer understanding of the ‘language blueprint’ — as it is sometimes (mistakenly)3 called — or should he be working in the first place towards a correct description of language data as they occur historically, and towards a deeper understanding of how the language, or more precisely the language output, changes? Ideally, I think, the historical linguist should do both. It stands to reason that the way in which the linguistic output changes may tell us something more about the contours of the system that produces that output. In that sense, historical linguistics is a useful
research tool to arrive at the higher goal of understanding how speakers acquire and process language. However, historical linguistics is not only linked to theoretical synchronic linguistics; there is also a link with historical (literary) texts and the proper understanding of these texts. It seems to me that the first goal, an understanding of the system, can best be reached indirectly, via the second one, the investigation of historical data, which I see as the primary task of the historical linguist.4 However, the historical linguist must make use of insights provided by the theory of grammar or, more precisely, by the various theories of grammar that have been developed on the synchronic level, because he needs these insights as tools to tackle his data. For a proper description and explanation of the facts, one needs hypotheses. A historical linguist who works without an explicit formal theory is in danger of interpreting the data in the light of what the language later becomes (cf. Lightfoot 1979: 34), or in the light of his own necessarily restricted intuitions. More often, such a description does not lead to any new insight into the language itself: the ‘grammar’ is presented as a collection of mere facts, not as a system that is learnable.5 Use of a theory should help him to avoid the kind of presuppositions mentioned by Lightfoot (ibid.), and give his search direction. On the other hand, a historical linguist who bases himself too exclusively on a particular theory may come to suggest an interpretation of the historical data that can only be called an oversimplification of its complex nature or, even worse, one that leads to the neglect of relevant and by no means incidental facts.6 The danger of this, a too strong reliance on one theory, is obvious, for it affects the quality of both branches of linguistics: the historical and the synchronic theoretical. Such an approach does not do justice to the historical facts, which are, after all, the only true data that we have (the grammatical system underlying the data being, as yet, a fictional rather than a physical fact, see further below), and it may ultimately also provide a false notion of the type of change that the system allows, and hence a false notion of the shape of the system itself.
2. What functions as the basis for historical linguistic research: linguistic utterances or the grammar?

In the introduction, I referred to the fact that I consider it the primary task of the historical linguist to give a description of the historical linguistic facts — particularly of the variations that occur — and to give an explanation of the changes that take place with respect to these variants when they are compared over a period of time. One thing must be clear from the start. In order to compare linguistic expressions from different periods one must have a sense of what is comparable. Within
"fis-r2">
What counts as evidence in historical linguistics?
historical phonology or morphology, this is not so difficult since the forms to be compared are fairly similar and of frequent occurrence. One can indeed compare cognates, and the changes that take place in phonology and morphology are therefore relatively easy to discover and describe. In other words, one knows more or less that one is comparing items that show some continuation in their form, which serves as evidence that they go back to the same form etymologically. The case is rather different for syntax. When we compare two syntactic constructions from two different periods, how do we know that we are comparing ‘cognates’, so to speak? For instance, if we are interested in the history of infinitival complements in English, what do we compare? It is unlikely that we will find two clauses in our data that are exactly the same, probably not even simple ones such as:

(1) a. OE   Ic seah hie gan
    b. PDE  I saw her go
In a case such as (1), it could be established on the basis of the phonological forms and with the help of phonological and morphological theory that the four words used are indeed all cognates.7 They are even used in exactly the same order, so that the constructions themselves could be called cognates of one another. But how often do we find such exact forms? And even if they are exactly the same on the surface, how do we know that the underlying structure is the same? For instance, it is possible that the construction has been re-analysed in the course of time, or ‘abducted’ in the grammar developed by a new generation of speakers (‘abduction’ is the term Andersen (1973) and other linguists after him have used, but see Deutscher 2002, who argues convincingly that ‘abduction’ is the same as ‘reanalysis’). This was the case, for instance, with the Old English form an nædder, which at some point in Middle English was analysed both as an adder and as a nadder. How do we know how any individual analysed it? We only know when that same individual uses the word with a definite article, when he says either the adder or the nadder. This means that in order to know what has happened we must — besides a knowledge of phonological developments, of course — have a sense of system (in this case the determiner system), and we must be able to look also at other forms that are not strictly cognate, i.e. in the case of nædder, at the same form preceded by the. It is also a change in the system itself that made the development possible: in Middle English we witness the development of an article system, and the grammaticalization of the numeral ān to the article a(n). Thus, in order to explain this particular case we need a database of utterances, covering more than strict cognates, and a sense of the grammatical system of Middle English.8 Another reason why exact cognate forms are not good enough as data to study syntactic change is that, if we can only use such exact forms, our corpus of evidence, or the data-set on which we will have to base our ideas about change, will be
very small. Quite clearly, for syntax, we will need to abstract away from the surface forms, and we will have to compare constructions. This is always hazardous, and great care has to be taken in doing so. It does show, however, that we need some theory of grammar in order to study syntactic change at all. The grammar thus becomes as important as, or even more important than, the surface forms found in the historical documents. In that sense Lightfoot (1999: 74) raises an important point when he writes in a book on language development:

our focus here is grammars, not the properties of a particular language, or even general properties of many or all languages. A language on the view sketched here is an epiphenomenon, a derivative concept, the output of certain people’s grammars. … So when we think about change over the course of time, diachronic change, we shall now think not in terms of sound change or language change, but in terms of changes in these grammars, which are represented in the mind/brains of individuals … (emphasis added)
For Lightfoot, indeed, only grammar is worthwhile as an object of study, considering “language … [to be] an epiphenomenon”. Later in his book he sharpens his ideas somewhat further when he states that “[h]istorical linguists who limit themselves to phenomena of language change are prone to offer pseudo-explanations which lose touch with reality and which create mysteries where there is nothing mysterious” (p. 212). Even though I agree with Lightfoot that we need (a theory of) grammar in order to describe and explain syntactic change, I also believe that we need to study the utterances or the historical documents just as much, in order to deduce what the grammar of a particular period is like. Indeed, we need to study constructions in context in order to determine their meaning, and the possible structural reinterpretations that may have occurred. If a change is described in terms of the grammar, we in fact only describe the endpoint of a change. In order to understand why something started changing we must look at the variations over time as they begin to occur on the performance level. Linguistic structures or patterns may change, for instance, because speakers ‘see’ an analogy with other structures; they may change via pragmatic inferencing when they are frequently used in certain contexts; they may change because the language community is ‘invaded’ by speakers with a different linguistic background, etc. These performance variants, to be found in the output of adults, will serve as primary linguistic data to children in the next generation and they may ultimately cause these children to set up grammars that are slightly different from those of the adults. For a full understanding of these changes we therefore need to investigate the innovation stage as well as the later grammatical change. For a full understanding of the system of grammar with which adults innovate, we should look at how they innovate. Lightfoot tends to ignore that linguists’ knowledge of the grammar system is
"fis-r26"> "fis-r37"> "fis-r48"> "fis-r55">
What counts as evidence in historical linguistics?
indirect, and that, as far as I can see, it depends very much on our interpretation of the data. For Lightfoot, however, grammar (the genotype) is a biological fact, something innate, much of which is ‘given’ or ‘pre-wired’ in our brains.9 With such a view of grammar, grammar is indeed more important, and the grammar should be the object of our investigations. But there are two large questions here. First, is this grammar indeed innate, and second — if we admit the first point — does the innate grammar look like the one devised by formal linguists? These two topics need to be addressed first before we can turn again to the question as to what counts as evidence for the historical linguist.
3. Is grammar innate?10

What pleads for or against innateness of some type of core grammar? The arguments usually put forward by the formalist linguistic school are of a biological and a logical nature; both are, as yet, based on indirect evidence (see also Haspelmath, this volume). We will first look at the biological facts, but before we do this something must be said about the way in which I will use the term ‘grammar’. I will use it to refer to the structural (morpho-syntactic) component, as is usual within generative theory, but I will use it loosely. Such imprecision is inevitable since there is no agreement as to what this innate grammar might contain. This depends on the theoretical model being used and also on the state of the art of this model (for a useful overview, see Jackendoff 2002: 40–82). One thing is clear: not all of what our individual grammars (phenotypes) contain (cf. note 9) is part of an innate grammar (genotype) as postulated by the generative school, but how much of it is remains unclear.

3.1 Innateness: Biological considerations

Evidence from brain-damaged patients is often used to show that there is a special part or module in our brains that deals with grammar. Pinker (1994: 45–46) and others who believe that language has “an identifiable seat in the brain” (ibid. p. 45) describe cases of people with Broca’s aphasia, whose grammatical processing is seriously impaired but whose lexical processing is left more or less undisturbed. It was at first thought that this impairment was strictly related to Broca’s area, but further research has shown (see Lieberman 1991: 85; Pinker 1994: 308–310; Slobin 1997: 281–282; Goldberg 2001: 41) that this is far too simple an idea. Lieberman, for instance, writes,

The traditional view of Broca’s aphasia is that damage localized to Broca’s area will result in these [grammatical] deficits, whereas damage to any other part of the brain will not. This belief is reflected in popularized accounts of how the human
brain works, and in the supposition of many linguists that human beings have a specific, localized “language organ” (…). However, that supposition is erroneous …. The damage pattern that produces Broca’s aphasia interrupts the circuits between Broca’s area and other parts of the brain …. In fact subcortical damage that disrupts the connections from Broca’s area but leaves it intact can result in aphasia. (Lieberman 1991: 85, italics in original)
And Slobin (1997: 282) notes that cross-linguistic studies of aphasia have failed “to find any support for a ‘dual-lexicon hypothesis’, which postulates that open- and closed-class items are mediated by different mechanisms and/or stored separately”. Rather, this research points to “processing factors alone as distinguishing the two classes”, and he adds that it is more likely that both classes of words are handled within a single lexicon.11 In other words, a localized grammar module has not (yet?) been found. The autonomy of grammar, too, has been brought seriously into doubt by these findings.12 A recent book on brain research by Goldberg (2001, esp. Chapter 5) indicates that the idea of modularity (and, by implication, of a grammar module) is finding less and less support. Goldberg’s research has produced a number of experimental results important in this connection. He found that different parts of the lexicon are stored in different places: e.g. the “naming [of] animals activated the left occipital areas, whereas naming tools activated the left premotor regions in charge of right hand movements” (Goldberg 2001: 66). From this experiment and others he concludes that “different aspects of word meaning are distributed in close relationship to those aspects of physical reality which they denote” (ibid.) and that the “cortical mapping of language is decidedly distributed” (ibid. p. 65).13 This suggests that words may well be learned together with the pragmatic or real-world handling of them (this may constitute proof for the importance of the situational context in learning!), and that for that reason these words get stored in the place that is also in charge of the movements needed to execute whatever these words denote.

Another important point that Goldberg’s research has shown is that hemispheric specialization is not unique to humans but also occurs in the great apes. This casts serious doubt on the idea that the central difference between the right and left hemispheres can be based on language alone (Goldberg 2001: 41–42), as has traditionally been assumed. He finds that there is a more fundamental distinction between the functions of the two hemispheres and this has to do with learning: “The brains of higher animals, including humans, are endowed with a powerful capacity for learning. Unlike instinctive behavior, learning, by definition, is change. The organism encounters a situation for which it has no ready-made effective response” (p. 44). Goldberg notes that “[a]t an early stage of every learning process, the organism is faced with ‘novelty,’ and the end stage of the learning process can
"fis-r5"> "fis-r37"> "fis-r50"> "fis-r62"> "fis-r43">
What counts as evidence in historical linguistics?
be thought of as ‘routinization’ or ‘familiarity’” (p. 44). Goldberg believes that it is the role of learning and learned behaviour at the expense of instinctive behaviour that led to the difference between the two hemispheres. He concludes on the basis of experiments and evidence from brain-damaged patients that the right hemisphere deals with cognitive novelty, with the first stage of any kind of learning, while the left hemisphere deals with learning that has been routinized or automatized (p. 52).14

These new findings question the idea of distinct and highly language-specific modules. It is more plausible, in other words, that language is dealt with in both hemispheres: the learning of language, by means of the development of a grammar or system, taking place in the right hemisphere, and, once it is learned, the processing of it being relegated to the left. This would explain the hitherto puzzling “lack of adverse effect of left-hemispheric damage in children” (who are still dealing with ‘novelties’) as far as language is concerned, and the “particularly severe adverse effect of right-hemispheric damage” (p. 43) on these children.

Before we leave the topic of biological innateness, I must add one rider. Although I have suggested that, on the basis of our present knowledge about the workings of the human brain, there is no necessity to accept a specific grammar module, it does not follow that no part of our language faculty is situated in our genes. On the contrary, it is well-known that the human larynx and vocal tract have evolved differently from those of the great apes; they have become adapted to produce sound efficiently, as is necessary for the precise articulation needed in human language (cf. e.g. Carstairs-McCarthy 2000: 252). The same is true for brain mechanisms such as memory, which has also become adapted to the production of language (cf. e.g. Yngve 1996: 90–91). At the same time, however, these organs have also retained their old functions to a greater or lesser extent, and more or less successfully (cf. Lieberman 1991: 14–16, 53–57). Such adaptations for language are now part of our genotype. So parts of the language faculty are certainly innate (cf. also Pullum and Scholz 2002: 10). What is being questioned is the very specific nature (and position) of an innate grammar module.

3.2 Innateness: Logical considerations

The other arguments for innateness come from the so-called poverty-of-the-stimulus concept. It should be noted, before we address the content of this notion, that the various schools of linguistics are very much divided on this issue: where formalist (generative) linguists plead for the poverty of the ‘primary linguistic data’ (PLD), others emphasize the great richness of the data available to the child (e.g. Yngve 1996: 90). The latter linguists point especially to the help given a child by means of conceptual structure (see, for direct commentary on the poverty idea and the nature of the child’s triggering experience, McCawley 1989; Schlesinger 1989;
and Snow and Tomasello 1989). I will briefly review some aspects of the poverty concept here.15 The notion refers to the child’s ability to learn the parent language within a relatively short period of time in spite of the following obstacles:

i. The PLD is poor: it contains many incomplete, ill-formed utterances, and yet the child makes the correct choices and does not overgeneralize.
ii. No evidence is provided in the PLD for constructions that do not occur; i.e. the PLD is not rich enough to determine the limits to the generalizations that the child makes.
iii. Very little linguistic correction is offered where it could be offered.
iv. The child produces novel utterances that it has never heard before.

The poverty-of-the-stimulus as an argument for innateness rests on the idea that the PLD is poor (incomplete/ill-formed) and that therefore there must be some linguistic ‘extra’ that helps the child to acquire language. Note that the basis for the concept itself is weak in that there is no precise criterion for establishing ‘poverty’; poverty is merely assumed.16 No wonder there is such controversy among linguists on this issue.17 The reason why the generative or formalist linguist assumes that the PLD is incomplete is that he focusses strictly on sentences, as products of the brain, and on their formal structure. In other words, all context, both linguistic and situational, is left out of account. As an example, to see how this works, we will have a look at a generative account of how children acquire pronouns. Lightfoot (1999: 50ff.) notes that pronouns refer back to a noun previously mentioned in the sentence (as in [2a–c]), but that this is not the case in (2d), where the pronoun him may not refer to Jay:

(2) a. Jay_i hurt his_i/j nose
    b. Jay_i’s brother hurt him_i/j
    c. Jay_i said he_i/j hurt Ray
    d. Jay_i hurt him_j
The question is, how do children acquire the right generalization, and, particularly, how do they acquire knowledge of the exception, i.e. how do they know that him in (2d) cannot refer to Jay? Lightfoot’s solution is that “children learn from their environment [from the PLD] that he, his etc. are pronouns, and native principles dictate [in this case those of the so-called Binding Theory, see p. 58] where pronouns may not refer to a preceding noun” (p. 52, emphasis added). The weakness of this account is that not all the communicatively relevant details of the utterances are taken into consideration but only their strictly structural properties (since the grammar is autonomous), as if the child learns to handle these pronouns in an otherwise complete void. It is relevant to ask, when children learn that he, his etc. are pronouns “from their environment”, how they learn this. Do they simply mark them off and then
"fis-r62">
What counts as evidence in historical linguistics?
store them up as lexical elements marked ‘pronoun’? I do not think so. Since pronouns have very little referential content, children can only ‘learn’ these items if they also learn at the same time how they are used in the situation, since extralinguistic, real-world meaning will not help them here. Pronouns are not apples you can point at or bite into. What I mean is, a word or concept like ‘apple’ is relatively easy to learn for a child because the object in the real world will conjure up the word.18 This is not the case with pronouns, so when the child learns what pronouns are, it can only learn these in connection with some real-world referent, present in the context of the situation.19 Even though the real-world referent shifts, he soon learns that him refers to some other person or object present in the situation; it cannot refer to himself. By analogy, the child will understand that him cannot refer to Jay in (2d), because if it did it would refer to the same object. (Note that nose in [2a] is indeed another object, so his can but need not refer to Jay.) The only mistake the child could make is to interpret him reflexively, but other evidence from the PLD would soon put the child straight in this respect.20

Lightfoot’s scenario is problematic also in another respect: the Binding Principles do not account for all the ways in which a particular pronoun can refer. Thus, the Principles do not help the child to decide to what noun the pronoun refers in (2a–c). It may refer to the previous noun in the clause, but it may also refer to a noun further away (as indicated by the indices), or even to an entity that has not been linguistically introduced at all (cf. Yngve 1996: 80–81). If the exact reference of the pronouns in (2a–c) has to be learned by the child from the situational context, why then would the child not use the same context and similar learning strategies to understand that him does not refer to Jay in (2d)? There is no need for a quite complicated innate Binding Principle to forbid the interpretation of coreference between Jay and him in (2d). I would add, moreover, that in the real world it would be as important for the child to know the exact reference of the pronouns in (2a–c) as it is to know the non-referential possibility of the pronoun in (2d), but the Binding Principles would only help him with the latter, not with the former. So what use are such Binding Principles to him?

Another very important point that has to be made in relation to point (i) of the poverty-of-the-stimulus notion is the idea that the children’s PLD contains “many incomplete and ill-formed sentences”. Clark (2003: 25–54) devotes a whole chapter to this problem and concludes that children in fact enjoy a large amount of ‘schooling’ in language (albeit different for different cultures) by means of what she calls “child-directed speech”. She has analysed large amounts of this speech, and finds that this consists of simple sentences (p. 44), contains a lot of repetition (pp. 42–43), and, in sum, that “it is singularly well tailored to its addressees, highly grammatical in form, and virtually free of errors” (p. 28). Clark also stresses the language practice that children exert upon themselves in endless trials and repetitions
of new forms or constructions they have heard (pp. 122–124; 184–185; 421).

The second aspect of the poverty-of-the-stimulus argument concerns the idea that the child makes the correct choice despite the unavailability of negative evidence. Let me illustrate this again first with an example. UG predicts that children will not produce sentences such as Who do you *wanna take a walk, where want to cannot be contracted because of an intervening trace (i.e. the trace left by who, which is the subject of the infinitive), whereas they may produce clauses such as What do you wanna eat where no trace intervenes (because what has been moved from a different underlying position), which makes contraction possible. Quite an ingenious experiment was set up by Crain (1991: 602–604) (also discussed in Lightfoot 1999: 69–70), which shows that when children between the ages of 2;10 and 5;5 are ‘channelled’ into producing a sentence like the first one (Who do you want to take a walk?) — about which they are unlikely to have any positive evidence — the unreduced want to form occurs 67 percent of the time, the reduced wanna form only 4 percent. The second sentence, however, shows a result of 59 percent contracted, 18 percent uncontracted. (In both cases the remaining percentages concern children who did not produce any sentence at all.) This result is then neatly explained by the UG rules about movement and traces. But are other explanations not equally possible? Slobin (1985a: 1229) shows in his research on the child’s learning model (which he terms the LMC — the ‘Language Making Capacity’, a more general learning model than the generative model of UG) that children’s operating principles or strategies avoid “synthetic forms in favor of more analytic expressions” “for purposes of clarity”; i.e. a child would go for the longer, more analytic form (want to in this case) when he realises that a certain notion is complex. There is no doubt that the above sentence (Who do you want to take a walk?) is complex for a child, since it hardly ever occurs in everyday speech.21 It also seems probable, in terms of the new discoveries in Goldberg (2001) about the different functions of the two hemispheres, that the production of wanna is linked to automatized behaviour stored in the left hemisphere and that the more complex, novel sentence is interpreted by the right hemisphere. In other words, a different mechanism might be involved in the children’s production of the want to and wanna sentences. In addition, it is important to realise that there may well have been acoustic clues present in the experimental situation which made the children differentiate between the who-clause (which requires want to) and the what-clause (which may have wanna). The experiment does not provide any evidence of this (presumably because situational or non-structural clues are not relevant to the experimenter). It is quite clear, however, from child-language experiments that the earliest and most important operating principles handled by children are based on acoustic salience (cf. Peters 1985: 1033–1040), and that indeed children produce (or imitate) melodies or
"fis-r18"> "fis-r6"> "fis-r16">
What counts as evidence in historical linguistics?
intonational units before they produce words (see also Foster 1990: 46). An issue related to this is that the principles of UG (i.e. ones that we have seen proposed in the literature) are unnecessarily specific and complex because they have to allow for the generation of all logically possible sentences, even those that can be shown not to occur in reality (e.g. by means of corpus evidence). Again we run up against the problem here that generative linguistics does not study physical data in the real world but abstracted linguistic data in our competence.

We should also note in this connection that there is a clear tendency to link ‘competence’ with the rules and procedures of written language. Within generative theory the characteristics of the spoken language are relegated to the performance level, and to my knowledge, it has never been seriously considered within this framework that the structure of spoken and written language may be essentially different in a number of ways. Since the young child learns language only through the oral/aural channels, it is the system of spoken language we should study. Weiß (this volume) likewise points to the fact that standard languages are not (or for a long time were not) acquired as first languages — he calls these standard forms “second order natural languages” (p. 184). It seems to me that linguists’ intuitions about language are often pre-programmed or modeled by their (grammatical) schooling, and thus reflect the logical notions acquired there rather than the ‘real’ competence (whatever that is).

This point has an important historical corollary. Some of the changes taking place in the history of a language may have more to do with the development of spoken language into a written standard than with changes in the language per se. A large amount of research on the oral/written parameter has been conducted by non-English, and in particular German, scholars, as shown by the papers in Cheshire and Stein (1997) and in Feilke et al. (2001) (and see also the references given in Weiß, this volume). In the introduction to the latter volume, the editors argue that languages which have developed a written standard undergo what they call a “Verschriftlichung der Sprache” (Feilke et al. 2001: 18–24), i.e. the spoken standard is influenced by the forms of the written standard: “Die Schrift … wird zum Motiv einer weitergehenden Sprachanalyse, sie wird zum Motiv grammatischer Analyse und der professionellen Grammatikschreibung selbst” (p. 18) [roughly: “Writing … becomes the motive for a further analysis of language; it becomes the motive for grammatical analysis and for professional grammar-writing itself”]. That is, a written standard influences the language both theoretically (in the way we interpret grammar as linguists) and practically, and the influence of Verschriftlichung is especially strong on syntax and the organization of text (ibid. p. 19). According to the authors, the changes that occur have in common that they aim at text which is maximally decontextualized, i.e. text is produced that is maximally explicit as to context (ibid. p. 20, and see also Yngve 1996: 300ff.). Feilke et al. mention especially the development of complex prepositional constructions, constructions expressing purpose, new conjunctions and complex clauses (p. 20).22
Concerning point (iii), the absence of negative feedback from parents or caretakers, we find many linguists, especially psycholinguists, arguing (in commentaries on articles advocating the poverty-of-the-stimulus notion) that children may well have access to negative data: by means of frequency of positive data, for instance, or because corrections made by parents may have an effect (see A. Grimshaw 1989: 340; J. Grimshaw and Pinker 1989: 341–342; Schlesinger 1989: 356–357, 1991; Berman 1991: 613; McCawley 1991: 627–628; Sokolov and Snow 1991: 635). Clark (2003: 44) mentions that adults offer a “plethora of tacit corrections, with almost involuntary repeating as they reformulate in conventional terms what the child seems to have said”.

The fourth point put forward in support of the poverty-of-the-stimulus concept is the fact that children produce novel utterances, phrases or clauses they have never heard before. It seems to me that this aspect has been somewhat exaggerated. The first thing one notices with small children is their canny intuition (or should we call it ‘uncanny’?) for using or imitating phrases they have heard others use, in the right place and at the right time. The novelty, as far as I can see, consists mainly in the use of already encountered and frequent constructions but with different lexical slot-filling, or in the use of the same construction but with a dependent clause stuck onto it or embedded in it. Lightfoot (1999: 60) calls the latter “iterative devices”, devices which “in principle” may cause “any given sentence [to] be of indefinite length”. Clark (2003: 185) notices that children are very conservative in their use of language. They play around with a new word in already familiar constructions until they have fully mastered it, and they “take a long time to build up a repertoire in which the same construction can occur with several different verbs”. Also it has to be remembered that children do not always know what they are doing; they are often just trying out new things without a sense of how the construction is analysed. Clark (2003: 171), in this context, refers to the “formulaic” way of learning that children practise. Other linguists, too, have pointed to the largely formulaic character of language in use; see e.g. Hopper (1987: 144), who refers to “prefabricated parts”, and Wray (2000), who links the use of formulae to the evolution of language from an earlier proto-language (which she argues contained holistic phrases and no syntax). Interesting in this light is also the well-known discovery made by Milman Parry ([1928] 1971) on the formulaic nature of early oral poetry.

3.3 Further questions relevant to the debate on innateness

The questions that arise with respect to the assumption of innateness are threefold. First, if part of the grammar is indeed innate, we need to find out what the innate rules consist of. Secondly, if we reject the idea that part of the grammar is innate, we need to find out
"fis-r38"> "fis-r32"> "fis-r7">
What counts as evidence in historical linguistics?
how else children could learn their native language. A third point is, how do we rate innateness methodologically?

3.3.1 The content of UG?
The idea of UG, of a ‘linguistic blueprint’, is that its design is fixed23 and universal. The majority of generative linguists writing about UG see it as “a biological entity, a finite mental organ” or “module”, “a linguistic genotype that [is] part of our genetic endowment” (cf. e.g. Lightfoot 1999: 52–53, 67; Wunderlich, this volume). But not all linguists working within the generative framework agree. Wim Klooster, for instance, one of the earliest practising generative linguists in the Netherlands, wrote recently: “This [UG] of course is not a model of what takes place in our heads, but a model of what an idealized language user possesses in terms of implicit linguistic knowledge” (Klooster 2000: 8, my translation). And, I am sure, many more generative linguists are sitting on the fence as far as the physical reality of the model is concerned; indeed even Chomsky did not commit himself for a long time.24 There is one great methodological advantage, of course, in positing a UG that is part of our genetic constitution, our genotype. It makes the enterprise of determining what constitutes UG more scientific in that the contents of UG will ultimately have to be brought into line with the physiological workings of the brain and its limitations. It makes the theory of UG falsifiable at another level and links it to other scientific domains, thus avoiding the danger of circularity. A model of UG in Klooster’s interpretation will be internally logical, and may indeed show up very interesting relations between parts of the grammar (or between linguistic utterances) that had been hidden before, but these findings are not scientifically or empirically falsifiable, only logically. And in fact this is true also for other linguists of the Chomskyan school who consider UG as part of our genotype (see also Section 3.3.3).

How are the principles of UG derived? Chomsky argues that the theory of grammar should attain both descriptive and explanatory adequacy. Descriptive adequacy is based on the “intrinsic competence of the idealized native speaker” (Chomsky 1965: 24, emphasis added). In other words, it is as it were two steps removed from the actual language utterances in that performance factors, the individual speaker, pragmatic (and semantic) context, and other communicative factors such as gaze, gesture etc. are all ignored. If the theory of UG is motivated largely on grounds of learnability (cf. also Eckman, this volume), it seems to me that one must then also take the learner’s situation into account, not just his competence. It seems wrong to concentrate only on the ‘product’ (as is done in generative linguistics); one should concentrate on the ‘process’ instead (cf. Clark 2003: 12–14). Another problem with the generative notion of ‘descriptive adequacy’
is that there is a strong tendency to concentrate only on European, written languages, and specifically on English, which means that the content of UG is rather biased in that direction (cf. also Haspelmath, this volume). Chomsky (1981: 6) denied that this was problematic:

A valid observation that has frequently been made (and often, irrationally denied) is that a great deal can be learned about UG from the study of a single language, if such study achieves sufficient depth to put forth rules or principles that have explanatory force but are underdetermined by evidence available to the language learner. Then it is reasonable to attribute to UG those aspects of these rules and principles that are uniformly attained but underdetermined by evidence. (emphasis added)
‘Explanatory adequacy’ represents an even higher goal and as such involves more abstraction still. Chomsky (1965: 27) notes that a grammar is explanatorily adequate when “the grammar is justified on internal grounds, on grounds of its relation to a linguistic theory that constitutes an explanatory hypothesis about the form of language as such” (emphasis in the original). It involves “a reduction in the variety of possible systems” (Chomsky 1981: 13), resulting in a higher level of abstraction in the contents of UG so that it can account for every possible language. The search for the form of UG thus became related to “the mathematic theory of learnability” (Chomsky 1981: 11), which in fact meant that an explanatorily adequate theory was to be determined solely from within the logical domain. A ‘logical’ UG and a ‘biological’ UG are subject to different kinds of conditions, and it is perhaps not surprising that for the school of generative linguists as a whole, the exact status of UG is still unclear (see Jackendoff 2002: 79–82, and also Kirby et al., and Tomasello, both this volume). Elegance and economy thus play a more important role in UG than the actual ways in which a child may learn a language. Since the child does not learn its language in a vacuum (in which case logical and economic constraints and principles might indeed be the most economic way to learn), it stands to reason that one has to take into account what the context contributes to the ease of learning in order to measure the ‘economy’ of the learning system or UG. It is quite possible, therefore, that superficial rules may be more helpful than deep, abstract principles since the use of these rules is conjured up by or linked to the communicative situation. As Joseph (1992: 140) argues, it is quite likely that speakers generalize locally rather than globally. Indeed the crucial role played by re-analysis and analogy in cases of language change seems to indicate that speakers/hearers tend to analyse structures rather less economically than a (generative) linguist might hypothesize, i.e. language users seem to pay more attention to the immediately apparent data, to surface forms, than to the more distant data that a linguist might be aware of.25 It is telling in this respect that generative proposals of surface changes
which are said to be linked to one deep change (such as Lightfoot’s [1979] proposals concerning the emergence of modals, new infinitival constructions and the rule of NP preposing, and cf. also the case described in note 2) were all found to be wanting in some respects (cf. Fischer and van der Leek 1981; Warner 1983).

Other factors that may play a crucial role in the learning of rules are frequency and social transmission. Hurford (2000), Kirby (2000) and Kirby et al. (this volume) have shown via computer simulation programs that general rules automatically arise in proto-language given enough time. Such rules, because they can be applied more generally (than earlier idiosyncratic rules in proto-language, which are linked to holistic formulae), are therefore also more frequent and hence the earliest to emerge in a budding system of grammar (general rules are more successful as ‘replicators’, Kirby 2000: 319). In addition, through social transmission — individuals learn by observing and imitating others in the same social group (Kirby 2000: 305, and Kirby et al., this volume) — the number of rival generalizations will be reduced within a linguistic community (Hurford 2000; see also McMahon 2000: 161–162). It seems likely that these phylogenetic observations may also apply to ontogenetic learning.
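The dynamic behind these simulations can be illustrated with a toy model. The sketch below (in Python) is a minimal illustration of iterated learning through a transmission bottleneck, not a reconstruction of Kirby’s or Hurford’s actual programs; the meaning space, the syllable inventory, the half-splitting learner and the bottleneck size of 12 are all invented for the example.

```python
import random
from collections import Counter

# A 5x5 meaning space: pairs of a 'first' and a 'second' semantic feature.
MEANINGS = [(a, b) for a in range(5) for b in range(5)]
SYLLABLES = ["ba", "di", "gu", "ke", "mo", "na", "pi", "tu"]

def random_word():
    """A holistic, unanalysable form: two random syllables (4 characters)."""
    return "".join(random.choices(SYLLABLES, k=2))

def learn(observations):
    """Induce a full lexicon from a small sample of (meaning, form) pairs.

    The learner tentatively splits each observed form in half, treating the
    halves as a prefix for the first feature and a suffix for the second.
    Unseen meanings are expressed by recombining these parts (a general
    rule); only when no parts are available does the learner invent."""
    prefixes, suffixes = {}, {}
    for (a, b), form in observations.items():
        prefixes.setdefault(a, form[:2])
        suffixes.setdefault(b, form[2:])
    lexicon = {}
    for a, b in MEANINGS:
        if (a, b) in observations:
            lexicon[(a, b)] = observations[(a, b)]        # rote-learned item
        elif a in prefixes and b in suffixes:
            lexicon[(a, b)] = prefixes[a] + suffixes[b]   # general rule
        else:
            lexicon[(a, b)] = random_word()               # pure invention
    return lexicon

def regularity(lexicon):
    """Fraction of forms built from the majority prefix and suffix."""
    pre = [Counter() for _ in range(5)]
    suf = [Counter() for _ in range(5)]
    for (a, b), form in lexicon.items():
        pre[a][form[:2]] += 1
        suf[b][form[2:]] += 1
    return sum(form[:2] == pre[a].most_common(1)[0][0] and
               form[2:] == suf[b].most_common(1)[0][0]
               for (a, b), form in lexicon.items()) / len(lexicon)

language = {m: random_word() for m in MEANINGS}   # generation 0: holistic
for gen in range(20):
    # The bottleneck: each 'child' observes only 12 of the 25 utterances.
    sample = dict(random.sample(sorted(language.items()), 12))
    language = learn(sample)
    print(f"generation {gen:2d}: {regularity(language):.0%} rule-governed")
```

Because compositional forms are the only ones a learner can reproduce for meanings that did not pass through the bottleneck, they are the successful ‘replicators’ in Kirby’s sense: in most runs this toy drifts from a holistic vocabulary towards a largely rule-governed one within a handful of generations.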
Another problem related to Chomsky’s “theory of learnability” applied to UG is the concentration on purely structural factors. Syntax is seen as the central component of UG, on which everything else depends; the syntactic component indeed is said to be autonomous. Language becomes more difficult to acquire if, according to this notion of UG, learners cannot rely on semantic and pragmatic information, and if frequency of occurrence plays no role in the learning system. Clark (2003: 42, 322 and passim), however, emphasizes all through her study of language acquisition how important repetition is in learning, and she stresses the fact that all learning takes place in a communicative context. Bybee and Hopper (2001) and Krug (2003) show how important frequency is in language change (see also note 18). There is also enough evidence from language change that the semantic and the phonological components may direct change in syntactic structure (this belies the notion that semantics and phonology are only ‘interpretative’ components — see also Jackendoff [2002: Ch. 5], who suggests that phonology, syntax and semantics should be seen as independent tiers correlated to each other through an interface; i.e. the grammar shows “parallel architecture” rather than a “syntactocentric” one). Schlüter (2003) presents some interesting case studies of syntactic and morphological change in English, where the phonology determines patterns of grammatical variation and change. She shows to my mind convincingly that Chomsky’s “worst possible case” (Chomsky 1995: 222), i.e. the idea that the interpretative components may determine the inner nature of the system of grammar, may in fact be the true state of affairs. Schlüter indicates, with the help of empirical data drawn from (historical) corpora, that the shape of the syllable and rhythmic alternation have influenced variation and change in English in the form of the verb BE, in the development of the indefinite article, and in the position of the negative element within the NP. What is so interesting about Schlüter’s examples is the fact that she ties these phonological factors to certain neurological constraints on the way in which language is realised. She refers to work done on the “neurological recovery cycle” (2003: 72), which explains why CV is the ideal syllable structure, and why languages show rhythmic alternation of stressed and unstressed syllables. It is interesting to note in this context that ideal syllable structure and rhythmic alternation are also highly relevant in language acquisition and may play a role in language evolution. Clark (2003: 103) mentions that “[c]anonical babbling consists of short and long sequences containing just one consonant vowel (CV) combination that is reduplicated or repeated”, and that in early production children tend to adhere to a “trochaic metrical template” (ibid. p. 118). Carstairs-McCarthy (2000) points to the possible importance of the ‘syllabic frame’ for the evolution of syntax. He further notes (p. 253) that the syllabic frame can be related to an “existing neural mechanism for imposing a regular pattern on speech”, which is “associated with mandibular oscillations”.

To sum up briefly at this point: I think we can agree that grammar is some system situated in our brain. With it we can understand and produce linguistic utterances. We do not know, however, whether there is such a thing as a ‘genotype’ grammar, a generative machine standing at the birth of those utterances, or whether a grammar develops with the help of general cognitive mechanisms as the result of our brains’ analysis of utterances heard. Because we have not yet been able to investigate this ‘machine’ physically, we cannot really know its status. Can we use it as an empirical tool to help us understand the processes of language acquisition and language change, which are both in different ways related to it? That all depends on its status: if part of grammar is innate, then of course its role becomes pivotal for the study of both acquisition and change. If it is not, then the study of linguistic utterances becomes our primary concern in the establishment of the system of grammar.26

3.3.2 If not innate, what then?
If it is difficult to prove that part of the grammar is innate, how else could we account for the fact that children acquire their native language in such a relatively short time and are able to produce novel utterances, which presupposes some sort of generative system? First of all, it should be noted that the ‘speed’ with which children are said to learn their native language has to be taken with a pinch of salt. Clark (2003: 421) puts this speed into perspective when she notes that

[c]hildren spend a lot more time learning a language than adults do. At a conservative estimate, they are attentive to what people are saying at least 10 hours a day, or 70 hours a week. Contrast that with adults who spend a mere 5 hours a week in
"fis-r22"> "fis-r25">
What counts as evidence in historical linguistics?
a language class, with an added hour or so in the language laboratory, for a total of 5 or 6 hours — less than a tenth of the time children spend.
Moreover, when children learn, they also learn more intensively because

[s]mall children have little else to occupy themselves with and are less self-conscious than adults about how they appear to others. Adults are used to presenting themselves through language, so their incomplete mastery may also inhibit them socially and further impede their language learning. (ibid.)
A cognitive approach to language learning is that children use general cognitive abilities, which they need elsewhere too, for language. Slobin and his associates (1985b, 1997) have shown what kind of simple strategies are available to children to make sense of the first linguistic noises that they hear. These are mainly strategies that distinguish salient from non-salient acoustic clues, and strategies that help them recognize and store certain categories of sounds. These strategies are very general and simple, based on recognizing what is same and what is different. It is by building up strategies upon strategies that children, as they take increasingly complex data into account, become more and more adept at understanding more complex utterances and, indeed, at beginning to produce them. The most important learning tool behind these strategies is the principle of analogy. Holyoak and Thagard (1995) have shown that analogy is one of the prime forces in the learning process of a child, something that is also present in a more primitive form in apes, and in lower mammals too. Analogy is a general principle, an ability that children need, not just to learn language but also to survive in an ever-changing environment. Itkonen (1994: 45) describes this as follows:

The properties of co-occurrence and succession, and in particular the causal properties, of things and events are learned on the basis of analogy. Consider the knowledge that all ravens are black, that the day is always followed by the night, and that (every instance of) fire is hot. This knowledge is acquired in two steps. First we infer from the present case to the next one:

raven-1/black-1 = raven-2/X, and X = black-2
…
fire-1/hot-1 = fire-2/X, and X = hot-2

Second we perform an analogical (or ‘inductive’) generalization:

All ravens observed so far are (have been) black → All ravens are black.
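The two steps of Itkonen’s schema are simple enough to be written down mechanically. The following sketch is only an illustrative toy, not a model proposed by Itkonen or Clark; the representation of cases as kind/property pairs and the threshold of three observations are assumptions made for the illustration.

```python
from collections import defaultdict

def analogical_learner(cases, threshold=3):
    """Two-step learning in the spirit of Itkonen's schema.

    Step 1 (case-to-case inference): each new raven is expected to share
    the property of the ravens seen so far (raven-2/X, X = black-2).
    Step 2 (inductive generalization): after enough uniform cases, adopt
    the general rule 'all ravens are black'."""
    seen = defaultdict(list)            # kind -> properties observed so far
    for kind, prop in cases:
        seen[kind].append(prop)
    rules = {}
    for kind, props in seen.items():
        # Generalize only from repeated and uniform experience.
        if len(props) >= threshold and len(set(props)) == 1:
            rules[kind] = props[0]      # e.g. 'raven' -> 'black'
    return rules

cases = [("raven", "black")] * 4 + [("fire", "hot")] * 3 + [("swan", "white")]
print(analogical_learner(cases))
# {'raven': 'black', 'fire': 'hot'} -- one swan is not enough to generalize
```

The point of the toy is merely that nothing language-specific is involved: the same routine generalizes over ravens, fires or pronouns, which is exactly the sense in which analogy is a general cognitive ability rather than part of an innate grammar.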
Analogy also plays a role in distinguishing between what is same and what is different. Clark (2003: 144) elaborates this into the principles of “contrast” and “conventionality”. She notes that children from a very early age are sensitive to differences, and that they infer that a different form also means something different. By setting off this form against the other, ‘conventional’ forms (i.e. the adult forms they have already mastered) in any given speech situation, the child can infer the meaning of the ‘different’ or new form. “[I]t seems reasonable to propose”, Clark
(2003: 430) writes, “that children begin with rote memorization and then extend what they have learnt through analogy or rule use”. Rule-use, according to Clark, is likely to be “schema-based” (pp. 207–208): i.e. in contrast to rules, schemas are “product” and not “source oriented” (p. 431), so that here too analogy can be said to play a crucial role.

3.3.3 The question of innateness from a methodological point of view
If, as we have seen in Section 3.3.1, it is difficult to establish at this point in time what exactly may be innate in our system of grammar, the question may be legitimately asked whether it is wise, methodologically, to assume innateness, especially innateness of a rather specific kind. Derwing (1977: 79–80) believes that we should be

questioning the value of any linguistic theory that attempts to invoke “innateness” as an explanatory vehicle. For to maintain that some cognitive or behavioral skill is “innate” does not provide any positive insight into either its nature or development, but is rather tantamount to an admission of a failure to explain it. “Innateness” is a purely negative notion; it means that something has not been learned, hence that it can not be explained in terms of any known principles of learning. Explanation does not consist in substituting one unknown for another, but rather in accounting for what puzzles in terms of some general principle which is known and which is understood. And how, in any event, does one ever propose to demonstrate that some particular aspect of human language has, in fact, not been learned? (…) The search for “innate” principles, therefore, strikes me as more of a weakness than an attractive alternative to looking for answers in terms of psychological or physiological capacities that human beings have been shown to possess. (emphasis in the original)
Similarly, Eckman (this volume) remarks that a theory which stipulates that its explanatory laws are innate is not conducive to seeking a higher (more general) level of explanation. In order to move beyond description, which can be seen according to Eckman as a first level of explanation (hence, he sees no real distinction between description and explanation), we need to find explanations in other domains that raise the level of explanation (for the latter see also note 4, and Haspelmath, this volume). For historical linguists a very similar plea was made by Bybee (1988: 357), who writes, “complete explanations must specify a causal mechanism: thus we cannot explain change with reference to preferred types [she is discussing the use of ‘explanatory’ universal typological principles such as Vennemann’s ‘Natural Serialization Principle’], but we must explain common types by referring to the factors that create them”. In other words, one should look further afield — or better still, outside one’s field — in order to explain phenomena. Such cross-domain investigations strengthen one’s own field and create links with other scientific fields (cf.
"fis-r14"> "fis-r62">
What counts as evidence in historical linguistics?
Yngve 1996: 117). Further on in his article, Derwing adds, “my perspective is that since the ‘language system’ exists only in the minds of language learners, we must therefore explore these minds in order to ascertain what that system is actually like” (Derwing 1977: 81). This is also Yngve’s (1996) point of view: in order to create a scientifically worthwhile linguistics, we must start only with the physical data, with utterances (“sound-waves”), speakers and the situational context; only in this way can we relate linguistics to other sciences and find out how language is really processed. It is clear that we need a neurolinguistic approach in the search for innateness; one that may investigate and corroborate the principles so far suggested by linguists (cf. also Wunderlich, this volume).
4. Back to the historical linguist: Some concluding remarks

It must be clear that one’s point of view in the innateness debate profoundly influences the way in which language is studied. Cognitive as well as formalist linguists seem to agree that there is a link between language acquisition and language change, and that knowledge about how the one takes place may help us to understand more about the other. The big difference, however, is what exactly the object of research is. Is it the physical language data, the physical context and the physical mind of the speaker, or is it the fictional (?) language system and the idealized, and therefore fictional, competence of the speaker? At various points in this essay, it was indicated that evidence for language change could be corroborated by or linked to evidence found in language acquisition, and vice versa. Both are valuable sources upon which the theoretical linguist can draw when gathering evidence for the form or contents of UG. (As we have seen, the study of language evolution, although not providing us with hard empirical data, may also contribute towards an understanding of how grammar develops.) Chomsky (1965: 27) linked the content of the theory of grammar and UG explicitly to the construction of a theory of language acquisition, which should provide “an account of the specific innate abilities that makes this achievement [i.e. language acquisition] possible”. In a similar way, Kiparsky and Lightfoot (see the introduction) pulled in historical linguistics to help build up UG. It seems to me, as far as language acquisition is concerned, that the type of operating principles suggested by Slobin and his school (1985b) are more appropriate and promising for this task than the generative approach, since they do not presuppose a ready-made grammar module. Rather, they allow for developmental changes taking place in children: they take into account the increase in processing capacity, and cognitive developments which influence the way in which children perceive the world around them. Instead of on innate principles, they rely on developmental
‘bootstrapping’. This means that children make use of existing resources or capabilities to raise themselves to a new situation or state; they ‘pull themselves up’ from what they already know, and by doing so, acquire more knowledge and thus create more resources by which to pull themselves up even further.27 Because of the observed link between historical linguistics and language acquisition, it seems a good idea to investigate whether the operating principles that play an important role in acquisition are also to be found in language change, and whether structures acquired early in acquisition are more stable in periods of change. In that way, we may be able to develop an understanding of a language system, which is not predetermined like UG, but which is a direct result of the analysis of the PLD, and consequently likely to be much more language-specific. At the same time, an understanding of such a language processing system might bring us closer to what the brain actually does, and would create a profitable link between linguistics and the other sciences concerned with the working of the brain, since linguistics too would then be based on physical data. It would be part of the physical domain rather than the logical domain.

The historical linguist has only one firm source of knowledge, and that is the historical documents. Unlike synchronic linguists, he cannot make use of the intuitions of native speakers, nor has he access to spoken material and visual aids, such as gestures, direction of gaze etc. His prime concern is an accurate description of the data in the documents, in their context, which he investigates in order to understand the regularities underlying the data and the changes that take place. In doing this he may (or rather, must) make use of insights provided by other disciplines. He must not only turn to language acquisition studies, but also to sociolinguistics, generative grammar, cognitive grammar, discourse analysis, optimality theory etc., and in addition make use of insights drawn from synchronic variation and typological comparison (unlike Newmeyer, this volume, I believe that typological insights may help us to gain a greater knowledge of how grammar or UG works, even though I agree that there is not a direct relation between the principles and parameters of UG and typological tendencies). These insights, however, are not primary data to be used in the same way as the written documents themselves. If the historical linguist wishes to contribute to our knowledge of how language works, if he wishes to deepen our knowledge about the system that language users have or develop — the ultimate aim, I take it, of most linguistic subdisciplines — then he should do this from within his own subdiscipline, i.e. use the empirical data that his subdiscipline provides. For me, this means that he must concentrate on physical data, on their context, and on the variations that occur on the performance level, and not on how grammar changes. Grammar, at this stage, is a theoretical construct, not something that has been established empirically.28 I therefore cannot go along with linguists who see the formalist type of UG as a biological fact. When Lightfoot (1999: 10), for instance, writes, “For us, a grammar
is part of human biology”, the phrase “for us” is crucial because it indicates that the idea that UG is part of human biology is a generative assumption, not (yet) a fact.29 I have tried to show why this is not a sound assumption. For Lightfoot indeed, as I wrote in the introduction, grammar change is what the historical linguist should investigate because grammar is more ‘basic’.30 He believes that historical linguists should not occupy themselves too much with unimportant details, leading to uninteresting explanations, because if they do this, they move away too far from the ‘core business’, that is, establishing the language ‘system’. I do not think, however, that a short-cut to the core via UG is possible. As long as UG is a construct, not a fact, such a short-cut may indeed lead to pseudo-explanations. This does not mean that the historical linguist must consider all historical details. He must consider the details scientifically, i.e. observe what is ‘same’ and what is ‘different’ in comparable contextual situations. Sameness may tell us something about robust patterns in language, while differences may show what patterns are less basic. By looking at the contexts in which the utterances occur, we may learn how the differences come about, and how peripheral patterns are affected. Thus, in my view, the study of physical, written data should provide us with hints as to what causes variation and change, hints about the mechanisms that play a role in change; hints about what speakers do and about what changes they make (and why). From all these hints, a theory of how language works should be built up, and this theory should not a priori coincide with any theory set up by other subdisciplines; i.e. all subfields of linguistics should observe a certain measure of independence. Other areas of linguistics may also provide hints on another (more abstract) level, which may feed our imagination as researchers, but these hints should be tested on the historical facts.

All in all, it is clear that I do not think it is a good idea to ascribe an autonomous, formal type of grammar to a genotype, and to explain linguistic change first and foremost in terms of such an innate grammar. It may be that such a grammar is biological, but it is too early yet to know this, let alone to use this grammar as a biological fact. Moreover, there is still very little agreement about what exactly the principles, constraints and parameters of this innate grammar are.31 I would advise historical linguists to take the data and their context seriously first, and to work from there using theoretical insights but not taking them as ‘real’. Ascribing a formal grammar to biology remains an assumption which may become a fact once we know more about how the mind actually works.
Notes

* I would like to thank the editors, Martina Penke and Anette Rosenbach, and an anonymous reviewer, for the very useful comments that they have provided me with on an earlier version of
this paper. I have taken good note, made many changes, but also left parts unchanged. The latter mainly with reference to the different opinions that Martina and I have as to the innateness of grammar. We agree that empirically the matter is still undecided, but we do not agree on much else. I am no expert on neuro-linguistics; I have quoted neurologists and neuro-linguists, who are more expert than I am and who present ideas that seem to me intuitively correct. I have given my arguments on innateness, and more importantly on the use of ‘innate grammar’ by historical linguists. I will leave it to others and to the future to decide which path is the ideal path to follow.

1. One way of dealing with variation is via the so-called double-base hypothesis suggested by historical linguists such as Pintzuk and Kroch (e.g. Pintzuk 1991). A problem with this model is that it is descriptive rather than explanatory. A more promising approach is Kroch’s (1989) sampling of changes over time, which led him to posit the ‘constant-rate hypothesis’, showing that clusters of surface changes involving different structures are the result of a single underlying change. I will deal with the question of variation and change in more detail in Fischer (2007).

2. A quite illustrative example of this tendency is the interpretation of the infinitive marker to in Old English by Kageyama (1992). In an attempt to explain the ‘connectedness’ (one of Lightfoot’s ‘diagnostic properties’) of a number of constructions that are used in Old English and which disappear or are changed in Middle English, he gives a highly abstract analysis of the particle to, which in his view is placed in a separate functional node, i.e. AGR. To is thus not seen as part of the infinitive or as a preposition governing the infinitive, but it is positioned in a higher node, above the infinitive. In this node, it functions as an agreement marker and also as an external argument, i.e. a subject, but one without a theta-role (Kageyama claims it absorbs its theta-role). So in order to prove the connection between (or the simultaneous occurrence of) various types of constructions (such connections make the grammar more elegant, more economical because they can be reduced to one more basic rule), Kageyama must assume that to functions like an inflection on a verb and also as the subject of that verb, characteristics that intuitively we would not associate with to at all. There is no surface evidence for the change in to — this occurs later; there is only the postulated connectedness of the above-mentioned constructions (for a discussion of the problems posed by Kageyama’s hypothesis, see Fischer 1996).

3. Even though the term ‘blueprint’ is sometimes used to refer to UG (as a genotype), Dawkins (1986: 295–296) makes a convincing case that this is the wrong term, and that ‘recipe’ fits the bill much better: “the indications are very strong that the genes are much more like a recipe than like a blueprint. Indeed, the recipe analogy is really rather a good one, while the blueprint analogy, although it is often unthinkingly used in elementary textbooks, especially recent ones, is wrong in almost every particular. … [T]he effect, if any, that a gene has is not a simple property of the gene itself, but it is a property of the gene in interaction with the recent history of its local surroundings in the embryo. 
This makes nonsense of the idea that the genes are anything like a blueprint for a body … There is no simple one-to-one mapping, then, between genes and bits of body, any more than there is a mapping between words of recipe and crumbs of cake. The genes, taken together, can be seen as a set of instructions for carrying out a process, just as the words of a recipe, taken together, are a set of instructions for carrying out a process”.

4. It could even be said that an investigation of the historical process is necessary in order to reach an understanding of a change. McMahon (2000: 146), who quotes Dennett (1995: 123, 129), states that “Constraints and general mechanisms, then, can help us delimit what is possible, but to understand actual organisms or genomes, ‘we have to turn to the historical process that created them, in all its grubby particularity’” (and see also McMahon, pp. 148–149: “if we take a detailed enough perspective, we may understand where the differences between groups [or variants] come from”, emphasis added).
5. Cf. my own reaction (Fischer 1995: 165–166) to Mitchell (1992), where he questions the usefulness of modern linguistic theories for the study of Old English syntax (“are the Emperor’s new clothes really there”, Mitchell 1992: 98). It is difficult to specify what makes a language ‘learnable’. In principle, it might be possible to learn a language that has only facts (idiosyncratic rules), but computer simulation studies by Kirby (2000, and see also Kirby et al., this volume) and Hurford (2000) have shown that such systems inevitably (given enough time) develop more general rules when the idiosyncratic facts multiply.

6. I have discussed an instance of this in Fischer (1994), where the development of Modern English have + to infinitive from a possessive verb into a modal verb of obligation is seen by one linguist as a typical example of grammaticalization. Grammaticalization theory involves the notions of gradual change and semantic/pragmatic steering of the change. Seeing the have to case as a typical example of grammaticalization, the linguist in question interprets a number of early constructions as already grammaticalized (because this is to be expected from the point of view of the theory), which in their context quite clearly show that they are still instances of the ‘old’ construction. Thus, a gradual development with specific stages, as recognized by the theory, is forced upon rather ‘reluctant’ data.

7. Strictly speaking, saw has been derived from the OE past tense plural sawon, so it is not a direct cognate of seah, while the accusative hie has been replaced by the originally Old English dative form hire.

8. I am ignoring here other problems historical linguists have with the database having to do with the difficulties involved in the selection of texts. It is clear that we should choose texts that are precisely localized both in time and place, thus controlling the possible parameters of variation as much as we can. The work done on the nature and the copying of manuscripts, and the production of dialect atlases (as done for English by Michael Samuels, Angus McIntosh and their colleagues in Glasgow and Edinburgh; see e.g. McIntosh et al. 1986) has been most fruitful in improving the historical linguistic effort. Unfortunately (and I include myself among them) their results have not always been paid heed to, cf. the warnings issued by Cynthia Allen on many occasions (e.g. Allen 2000).
In between, there are lexical items that play more or less specialized roles, sometimes on their way to becoming grammatical morphemes over time. What, then, is a grammatical morpheme? It depends on the purposes of the analysis. In any event, it would be difficult to preprogram the child with an adequate definition”. Children, in other words, will have to discover, while developing their linguistic system, which of the lexical elements are used grammatically, which turn out to be fully lexical, which fully grammatical, and
12. Martina Penke notes in her commentary on an earlier version of this paper that "neither modularity nor autonomy of grammar are dependent on whether or not 'a grammar' module can be localized in the brain … The criterion for an autonomous grammar module is that its computations are based on elements and principles that cannot be derived from elements and principles operative in other cognitive domains". I take her point: empirically we do not yet know the answer, since the verdict on innateness is still open. However, looking at this from an evolutionary point of view: if the generative school accepts that grammar is autonomous, i.e. has its own independent principles and terms, then it is unlikely that this grammar was an adaptation of some earlier organ used in the cognitive domain. It is more likely, in that case, as indeed Chomsky has suggested (cf. Pinker 1994: 262–263 and McMahon 2000: 159–161), that the grammar emerged de novo, in a single step rather than as a gradual adaptation (as some other generative linguists believe), and if this is so, it is also more likely that this grammar would not be distributed all over the brain. Putting two and two together, it seems to me that it is not altogether 'illogical' to think that modularity (localization) and autonomousness are somehow connected.

13. He adds that this coupling also makes evolutionary sense: "The neural blueprint is both parsimonious and elegant" (Goldberg 2001: 67). It argues against a separate module or place in the brain for language.

14. Code (1997) seems to argue the opposite role for the two hemispheres in a study based on investigations of aphasic and left-hemispherectomy patients. He writes: "The kind of speech that the right hemisphere is capable of appears to be confined to the automatic, familiar, nonpropositional", while new utterances, i.e. in the child learning to speak, are "generated by the left hemisphere's linguistic system" (Code 1997: 55). It is difficult to see how these seemingly opposed views could be reconciled. What may play an important role, however, is the age of the patient (most of Code's patients were middle-aged), individual differences, and perhaps the role played by new learning after a stroke or operation. Goldberg (2001: 46) emphasizes that "the roles of the two hemispheres in cognition are dynamic, relative and individualized": "Mental representations develop interactively in both hemispheres but the rates of their formation differ. They form more rapidly in the right hemisphere at early stages of learning a cognitive skill, but the relative rate reverses in favor of the left hemisphere at the late stages".

15. For a history of this concept, see Thomas (2002); for a full presentation and critical assessment, see Pullum and Scholz (2002); for a watered-down version of the Chomskyan poverty-notion, and, hence, a watered-down version of UG, see Jackendoff (2002: 82–87; 102).

16. Pullum and Scholz (2002) examine four often-quoted cases where generative linguists have assumed that there is no positive evidence from which children could acquire the construction in question. They show for each case, with the help of various types of corpus evidence, that such positive evidence is easily available. The question then becomes: how much of this sort of positive evidence do generative linguists require for their assumption to be falsified?

17. See, for instance, the recent discussion in a double volume of The Linguistic Review 19.1–2 (2002) edited by Nancy Ritter.

18. According to Lightfoot (1999: 63), however, a child cannot "induce the meanings of even the simplest words": "[c]hildren do not have sufficient evidence to induce the meaning of house, book, or city, or of more complex expressions, even if we grant everything to advocates of Motherese or those who argue that it's all data processing of huge corpora". So children need innate knowledge even for this. I cannot quite see what this innate knowledge might consist of, nor do I see why children would not be able to learn what apple or house means when the objects are there for them to see, and when they occur repeatedly. I take it that learning starts with visible objects; see also Foster (1990: 159), who writes that "[t]he primary input to such a process [i.e. lexical learning] must be exposure to language in situations where the meaning can be deduced; and it is clear that children do get exposure to words under these conditions". Clark (2003: 29 and passim), too, emphasizes that "adults anchor their conversational contributions to objects or events physically present on each occasion". It is strange too that frequency seems to play no role in generative concepts of learning. A recent book by Bybee and Hopper (2001) shows how important a role frequency plays both in the development of language in children and in language change. Bybee has indeed emphasized the importance of frequency in much earlier work. Clark (2003: 416) notes that children "tally frequency during acquisition" and that the importance of both type- and token-frequency in acquisition is "usually ignored in rule-based approaches" (p. 421).

19. Itkonen (1994: 46) writes: "It is a well-known fact that, in the beginning, children learn the meanings of only those words whose referents are present when they hear (or see) the corresponding word-forms", adding in a note that "[b]ecause of its hostility towards associationist learning theory, Chomskyan psycholinguistics is incapable of accommodating this simple fact".

20. This was indeed a possible interpretation of (2d) in Old English, as Lightfoot (1999: 75, note 4) notes. Anthony Warner (p.c.) remarked in this connection that the use of himself in (2d) when the construction is reflexive would soon make clear to the child that him cannot refer to Jay in Present-day English.
21. I checked the OED online for the structure in question and found no examples of it; the only examples concern structures where the wh-element is the object of the infinitive depending on want to, i.e. What do you want to/wanna eat? In a large newspaper corpus the examples with initial objective what were again numerous; only three examples were found with initial who. In two of them who was the object (Who do you want to punch today? (Daily Telegraph 2–9–1997), Who do you want to live with, John? (Daily Mail 26–8–1998)); just one example had who as subject: Who do you want to represent you? (Daily Telegraph 23–4–1997). Note, however, that this last clause is much easier to process than Crain's example, since who quite clearly cannot be the object of the infinitive because it already has an object. I am grateful to my colleague Tom van Brederode for providing me with these examples.

22. For an example of a linguistic 'change' that may be due not to a change in the grammar itself but to the development of a written standard, see Fischer (2004); and see also Weiß (this volume) for further examples.

23. Even though many generative linguists now see language acquisition as a maturational process, they still accept that there is a fixed initial state which influences and constrains the maturational development. Thus, Clark and Roberts (1993) and Lightfoot (1999) believe that the triggering of the innate rules and principles of grammar may take place in stages. In Lightfoot's 'cue-based' model, children parse utterances, which results in their setting up "mental representations" or abstract structures which they scan against so-called "designated cues" in UG (the genotype). At first, some of these representations constitute partial parses because children ignore the more complex parts of the input; only at a later stage do children reach their mature grammar or phenotype (Lightfoot 1999: 57–58, 148–151). Still, even in this model the grammar or genotype is innate and predetermined.

24. For instance, in Syntactic Structures (1957: 18), he uses the word "device", or refers to "the theory of grammar", and he talks about the adequacy of this device only in purely logical terms; no mention is made of a biological base. In Aspects (1965), Chomsky links linguistic theory with language learning, but his idea is that "empiricist theories about language acquisition" are not at all helpful: they "are refutable wherever they are clear, and … further empiricist speculations have been quite empty and uninformative", while "the rationalist approach exemplified by recent work in the theory of transformational grammar seems to have proved fairly productive, to be fully in accord with what is known about language [note that what is known concerns only competence, O.F.], and to offer at least some hope of providing a hypothesis about the intrinsic structure of a language acquisition system that will meet the condition of adequacy-in-principle and do so in a sufficiently narrow and interesting way so that the question of feasibility can, for the first time, be seriously raised" (pp. 54–55). In other words, there is a link, but the language acquisition device can only be productively studied from the top down, so to speak, and it is quite clear that only logical principles play a role, i.e. reasoning from competence (which does not constitute empirical data!) is the only productive way forward. The model is therefore not "a psychological model of the way people construct and understand utterances" (Lyons 1970: 85). Only in later work (e.g. Chomsky 1981: 8) do we learn that UG is "an element of shared biological endowment", but there is still a gap between UG and core grammar; the latter is said to be an "idealization" of "the reality of what a particular person may have inside his head". More recently, judging from his reaction to John Searle in the New York Review (July 18, 2002, p. 64), Chomsky's stance has become clearer; he writes: "The long-term goal has been, and remains, to show that contrary to appearances, human languages are basically cast to the same mold, that they are instantiations of the same fixed biological endowment, and that they 'grow in the mind' much like other biological systems, triggered and shaped by experience, but only in restricted ways".

25. Pullum and Scholz (2002: 16) also question "whether children learn what transformational generative syntacticians think they learn". They suggest instead that children learn constructions "piecemeal" rather than via "rapid generalizations".
26. Cf. Croft (2000: 2), who writes: "In the study of linguistics, the real, existing entities are utterances as they are produced in context, and speakers and their knowledge about their language as it is actually found in their minds". As historical linguists, we can study the utterances in context, even though the context will be more limited due to the nature of the evidence (see Section 4), but we cannot probe speakers' knowledge about their language, because we have no access to the linguistic intuitions of people in the past. The structure of the mind itself we should take into account, because this will not have changed; but for that we need neurophysiological evidence.

27. Scholz and Pullum (2002: 195) note that it has not been shown that innateness is a necessary prerequisite for learning a language. They write: "It is important to see that learnability theory does not support the mathematical impossibility of learning language from positive examples … even for infinite languages in which some strings are ungrammatical". In other words, "[f]rom the observation that children do not appear to be supplied with negative evidence … it does not follow that learning is impossible" (p. 196, italics in original).

28. Not even an empirical fact that can be shown to exist (in the forms or models proposed) by experimentation. Although it is true that in the natural sciences — ever since Francis Bacon and Robert Boyle (cf. Leezenberg and de Vries 2001: 39–42) — empirical facts comprise not just natural phenomena but also non-visible phenomena that can be shown to exist via repeatable and verifiable experiments, psychological or any other tests have not yet shown that the transformations, rules, principles, constraints etc. which are said to exist in formal models of grammar are actually used in the processing of sentences.

29. Cf. Pullum and Scholz (2002: 10–12), who remark that these assumptions are rather typical of generative work and that, so far, they have lacked any empirical underpinning: "Instead of clarifying the reasoning, each successive writer on this topic [i.e. the poverty-of-the-stimulus argument] shakes together an idiosyncratic cocktail of claims about children's learning of languages, and concludes that nativism is thereby supported" (p. 12).

30. In this debate about what is the proper 'object of study', the school of 'Emergent Grammar' may be said to represent the other 'extreme'. According to Hopper (1987), grammar is "epiphenomenal" (p. 142), "an effect" rather than a "cause", "always emergent and never present" (p. 148). Lightfoot (1999: 74), on the other hand, sees language as an 'effect' and grammar as a 'cause' (language is "a derivative concept, the output of people's grammar"). I would say that grammar, when it develops, is an effect, but once developed, may also act as a cause.

31. This is a real problem, because it entails that the explanation given in terms of the grammar shifts when the content of the grammar shifts, as has happened so often within generative historical work. A case in point is Lightfoot's explanation for the loss of impersonals in English: in Lightfoot (1979) this was due to the 'Transparency Principle', in later work (1981) to the 'Trace Erasure Principle'. Cf. also the explanation of the auxiliary case discussed by Pullum and Scholz (2002: 27–31). For a thorough and insightful description of such changing 'explanations', see McMahon (1994: 123–137).
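The shift from idiosyncratic facts to general rules referred to in note 5 can be illustrated with a toy simulation. The sketch below is written for this discussion only and is not Kirby's or Hurford's actual model; the meaning space, the syllable inventory, the fixed-midpoint segmentation heuristic, and the bottleneck size are all invented assumptions. Each generation is reduced to a single learner, who memorises the forms it happens to hear and fills the gaps by recombining recurring parts.

```python
# A minimal iterated-learning sketch (illustrative only). Meanings are
# (object, action) pairs; a 'language' maps each meaning to a string.
# Each generation a learner observes only a subset of meaning-string
# pairs (the transmission bottleneck), rote-learns those, and composes
# the rest from recurring prefixes and suffixes where the data allow.
import random

OBJECTS = ["o%d" % i for i in range(5)]
ACTIONS = ["a%d" % i for i in range(5)]
MEANINGS = [(o, a) for o in OBJECTS for a in ACTIONS]
SYLLABLES = ["ba", "di", "ku", "mo", "ne", "pa", "ti", "zu"]

def random_form():
    # An arbitrary 8-character string: a holistic, unanalysable 'word'.
    return "".join(random.choices(SYLLABLES, k=4))

def part_tables(pairs):
    # Count candidate object-prefixes and action-suffixes, using a naive
    # fixed midpoint split (real models induce segmentation; this does not).
    obj, act = {}, {}
    for (o, a), s in pairs:
        cut = len(s) // 2
        obj.setdefault(o, {}).setdefault(s[:cut], 0)
        obj[o][s[:cut]] += 1
        act.setdefault(a, {}).setdefault(s[cut:], 0)
        act[a][s[cut:]] += 1
    return obj, act

def best(table):
    return max(table, key=table.get)

def learn(observed):
    # Memorise what was heard; fill gaps with commonest prefix + suffix,
    # i.e. a general rule, where the data license one; otherwise invent.
    obj, act = part_tables(observed.items())
    grammar = {}
    for (o, a) in MEANINGS:
        if (o, a) in observed:
            grammar[(o, a)] = observed[(o, a)]
        elif o in obj and a in act:
            grammar[(o, a)] = best(obj[o]) + best(act[a])
        else:
            grammar[(o, a)] = random_form()
    return grammar

def compositionality(grammar):
    # Share of forms predictable as commonest-prefix + commonest-suffix.
    obj, act = part_tables(grammar.items())
    hits = sum(grammar[(o, a)] == best(obj[o]) + best(act[a])
               for (o, a) in MEANINGS)
    return hits / len(MEANINGS)

random.seed(1)
language = {m: random_form() for m in MEANINGS}   # 'facts only' start
print("generation  0: %.2f compositional" % compositionality(language))
for generation in range(30):
    heard = dict(random.sample(sorted(language.items()), 15))  # bottleneck
    language = learn(heard)
print("generation 30: %.2f compositional" % compositionality(language))
```

Because holistic forms survive only by being observed, while decomposable forms can be reproduced from their parts, repeated transmission through the bottleneck tends to push the measure towards 1.0. This is the qualitative effect that the simulations cited in note 5 demonstrate on a much larger scale.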
References

Allen, Cynthia. 2000. "Obsolescence and sudden death in syntax: the decline of verb-final order in early Middle English". In: Bermúdez-Otero, Ricardo; Denison, David; Hogg, Richard M.; and McCully, C. B. (eds), Generative theory and corpus studies. A dialogue from 10 ICEHL 3–25. Berlin: Mouton de Gruyter.
Andersen, Henning. 1973. "Abductive and deductive change". Language 49: 765–793.
Berman, Ruth. 1991. "In defense of development". Behavioral and Brain Sciences 14: 612–613.
Bybee, Joan. 1988. "The diachronic dimension in explanation". In: Hawkins, John (ed.), Explaining language universals 350–379. Oxford: Blackwell.
Bybee, Joan; and Hopper, Paul (eds). 2001. Frequency and the emergence of linguistic structure. Amsterdam: Benjamins.
Carstairs-McCarthy, Andrew. 2000. "The distinction between sentences and noun phrases: an impediment to language evolution?". In: Knight et al. (eds), 248–263.
Cheshire, Jenny; and Stein, Dieter (eds). 1997. Taming the vernacular. From dialect to written standard language. London: Longman.
Chomsky, Noam. 1957. Syntactic structures. The Hague: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, Noam. 1995. The minimalist program. Cambridge, Mass.: MIT Press.
Clark, Eve V. 2003. First language acquisition. Cambridge: Cambridge University Press.
Clark, Robin; and Roberts, Ian. 1993. "A computational model of language learning and language change". Linguistic Inquiry 24: 299–345.
Code, Chris. 1997. "Can the right hemisphere speak?". Brain and Language 57: 38–59.
Crain, Stephen. 1991. "Language acquisition in the absence of experience". Behavioral and Brain Sciences 14: 597–612.
Croft, William. 2000. Explaining language change. An evolutionary approach. London: Longman.
Dawkins, Richard. 1986 [reprinted in Penguin 1988]. The blind watchmaker. London: Longman.
Dennett, Daniel C. 1995. Darwin's dangerous idea. London: Allen Lane.
Derwing, Bruce L. 1977. "Is the child really a 'little linguist'?". In: Macnamara, John (ed.), Language learning and thought 79–84. New York: Academic Press.
Deutscher, Guy. 2002. "On the misuse of the notion of 'abduction' in linguistics". Journal of Linguistics 38: 469–485.
Feilke, Helmuth; Kappest, Klaus-Peter; and Knobloch, Clemens. 2001. Grammatikalisierung, Spracherwerb und Schriftlichkeit ['Grammaticalization, language acquisition, and literacy']. Tübingen: Niemeyer.
Fischer, Olga. 1994. "The development of quasi-auxiliaries in English and changes in word order". Neophilologus 78: 137–164.
Fischer, Olga. 1995. "New directions in English historical grammar" (review article of Rissanen et al. 1992). Neophilologus 79: 163–182.
Fischer, Olga. 1996. "The status of to in Old English to-infinitives: a reply to Kageyama". Lingua 99: 107–133.
Fischer, Olga. 2004. "'Langue', 'parole' and the historical linguist". In: Rodríguez Álvarez, Alicia; and Almeida, Francisco Alonso (eds), Voices on the past. Studies in Old and Middle English language and literature 101–138. Coruña: Netbiblo.
Fischer, Olga. 2007. Morphosyntactic change. Functional and formal perspectives. Oxford: Oxford University Press.
Fischer, Olga; and van der Leek, Frederike. 1981. "Optional vs radical re-analysis: mechanisms of syntactic change". Lingua 55: 301–350.
Foster, Susan H. 1990. The communicative competence of young children. London: Longman.
Gerritsen, Marinel; and Stein, Dieter (eds). 1992. Internal and external factors in syntactic change. Berlin: Mouton de Gruyter.
Goldberg, Elkhonon. 2001. The executive brain. Frontal lobes and the civilized mind. Oxford: Oxford University Press.
Grimshaw, Allen D. 1989. "Infinitely nested Chinese 'black boxes': linguists and the search for Universal (innate) Grammar". Behavioral and Brain Sciences 12: 339–340.
Grimshaw, Jane; and Pinker, Steven. 1989. "Positive and negative evidence in language acquisition". Behavioral and Brain Sciences 12: 341–342.
Holyoak, Keith J.; and Thagard, Paul. 1995. Mental leaps. Analogy in creative thought. Cambridge, Mass.: MIT Press.
Hopper, Paul J. 1987. "Emergent grammar". In: Aske, Jon et al. (eds), Proceedings of the thirteenth annual meeting of the Berkeley Linguistics Society 139–157. Berkeley: Berkeley Linguistics Society.
Hurford, James R. 2000. "Social transmission favours linguistic generalisation". In: Knight et al. (eds), 324–352.
Itkonen, Esa. 1994. "Iconicity, analogy and Universal Grammar". Journal of Pragmatics 22: 37–53.
Jackendoff, Ray. 2002. Foundations of language. Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Joseph, Brian D. 1992. "Diachronic explanation: putting speakers back into the picture". In: Davis, Garry W.; and Iverson, Gregory K. (eds), Explanation in historical linguistics 123–144. Amsterdam: Benjamins.
Kageyama, Taro. 1992. "AGR in Old English to-infinitives". Lingua 88: 91–128.
Kay, Christian J.; Horobin, Simon; and Smith, Jeremy (eds). 2004. New perspectives on English historical linguistics. Amsterdam: Benjamins.
Kiparsky, Paul. 1968. "Linguistic universals and linguistic change". In: Bach, E.; and Harms, R. T. (eds), Universals in linguistic theory 171–202. New York: Holt, Rinehart and Winston.
Kirby, Simon. 2000. "Syntax without natural selection: how compositionality emerges from vocabulary in a population of learners". In: Knight et al. (eds), 303–323.
Klooster, Wim G. 2000. Geen. Over verplaatsing en focus ['No(ne). On movement and focus']. Amsterdam: Vossiuspers AUP.
Knight, Chris; Studdert-Kennedy, Michael; and Hurford, James R. (eds). 2000. The evolutionary emergence of language. Social function and the origins of linguistic form. Cambridge: Cambridge University Press.
Kroch, Anthony. 1989. "Reflexes of grammar in patterns of language change". Language Variation and Change 1: 199–244.
Krug, Manfred. 2003. "Frequency as a determinant in grammatical variation and change". In: Rohdenburg and Mondorf (eds), 7–67.
Leezenberg, Michiel; and de Vries, Gerard. 2001. Wetenschapsfilosofie voor geesteswetenschappen ['Philosophy of science for the humanities']. Amsterdam: Amsterdam University Press.
Lieberman, Philip. 1991. Uniquely human. The evolution of speech, thought, and selfless behavior. Cambridge, Mass.: Harvard University Press.
Lightfoot, David W. 1979. Principles of diachronic syntax. Cambridge: Cambridge University Press.
Lightfoot, David W. 1981. "The history of noun phrase movement". In: Baker, C. L.; and McCarthy, J. J. (eds), The logical problem of language acquisition 86–119. Cambridge, Mass.: MIT Press.
Lightfoot, David W. 1991. How to set parameters: arguments from language change. Cambridge, Mass.: MIT Press.
Lightfoot, David W. 1999. The development of language. Acquisition, change and evolution. Oxford: Blackwell.
Lyons, John. 1970. Chomsky. London: Fontana/Collins.
McCawley, James D. 1989. "'INFL', Spec, and other fabulous beasts". Behavioral and Brain Sciences 12: 350–352.
McCawley, James D. 1991. "'Negative evidence' and the gratuitous leap from principles to parameters". Behavioral and Brain Sciences 14: 627–628.
McIntosh, Angus; Samuels, Michael Louis; and Benskin, Michael. 1986. A linguistic atlas of late mediaeval English. Vols I–IV. Aberdeen: Aberdeen University Press.
McMahon, April. 1994. Understanding language change. Cambridge: Cambridge University Press.
McMahon, April. 2000. Change, chance and optimality. Oxford: Oxford University Press.
Mitchell, Bruce. 1992. "How to study Old English syntax". In: Rissanen, Matti; Ihalainen, Ossi; Nevalainen, Terttu; and Taavitsainen, Irma (eds), History of Englishes. New methods and interpretations in historical linguistics 92–100. Berlin: Mouton de Gruyter.
Parry, Milman. 1971 [1928]. The making of Homeric verse: the collected papers of Milman Parry, ed. by Adam Parry. Oxford: Clarendon Press. (Contains a translation of the original French PhD of 1928.)
Peters, Ann M. 1985. "Language segmentation: operating principles for the perception and analysis of language". In: Slobin (ed.) 1985b, 1029–1067.
Pinker, Steven. 1994. The language instinct. London: Penguin Books.
Pintzuk, Susan. 1991. Phrase structures in competition: variation and change in Old English word order. PhD dissertation, University of Pennsylvania, Philadelphia.
Pintzuk, Susan; Tsoulas, George; and Warner, Anthony (eds). 2000a. Diachronic syntax. Models and mechanisms. Oxford: Oxford University Press.
Pintzuk, Susan; Tsoulas, George; and Warner, Anthony. 2000b. "Syntactic change: theory and method". In: Pintzuk, Tsoulas and Warner (eds), 1–22.
Pullum, Geoffrey K.; and Scholz, Barbara C. 2002. "Empirical assessment of stimulus poverty arguments". The Linguistic Review 19: 9–50.
Rohdenburg, Günter; and Mondorf, Britta (eds). 2003. Determinants of grammatical variation in English. Berlin: Mouton de Gruyter.
Schlesinger, I. M. 1989. "Language acquisition: dubious assumptions and a specious explanatory principle". Behavioral and Brain Sciences 12: 356–357.
Schlesinger, I. M. 1991. "Innate universals do not solve the negative feedback problem". Behavioral and Brain Sciences 14: 633.
Schlüter, Julia. 2003. "Phonological determinants of grammatical variation in English: Chomsky's worst possible case". In: Rohdenburg and Mondorf (eds), 69–118.
Scholz, Barbara C.; and Pullum, Geoffrey K. 2002. "Searching for arguments to support linguistic nativism". The Linguistic Review 19: 185–223.
Slobin, Dan I. 1985a. "Crosslinguistic evidence for the language-making capacity". In: Slobin (ed.) 1985b, 1158–1256.
Slobin, Dan I. (ed.). 1985b. The crosslinguistic study of language acquisition. Vol. 2: Theoretical issues. Mahwah, N.J.: Erlbaum Associates.
Slobin, Dan I. 1997 [2001]. "The origins of grammaticizable notions: beyond the individual mind". In: Slobin, Dan I. (ed.), The crosslinguistic study of language acquisition. Vol. 5: Expanding the contexts 265–323. Mahwah, N.J.: Erlbaum Associates. (Reprinted in shortened form in: Bowerman, Melissa; and Levinson, Stephen C. (eds). 2001. Language acquisition and conceptual development 406–449. Cambridge: Cambridge University Press.)
Snow, Catherine E.; and Tomasello, Michael. 1989. "Data on language input: incomprehensible omission indeed!". Behavioral and Brain Sciences 12: 357–358.
Sokolov, Jeffrey L.; and Snow, Catherine E. 1991. "A premature retreat to nativism". Behavioral and Brain Sciences 14: 635–636.
Thomas, Margaret. 2002. "Development of the concept of 'the poverty of the stimulus'". The Linguistic Review 19: 51–71.
"fis-r59"> "fis-r60"> "fis-r61">
What counts as evidence in historical linguistics?
Warner, Anthony. 1983. Review article of Lightfoot 1979. Journal of Linguistics 19: 187–209.
Wong, Kwok-shing. 2004. "The acquisition of polysemous forms: the case of bei2 ('give') in Cantonese". In: Fischer, Olga; Norde, Muriel; and Perridon, Harry (eds), Up and down the cline — the nature of grammaticalization 324–344. Amsterdam: Benjamins.
Wray, Alison. 2000. "Holistic utterances in protolanguage: the link from primates to humans". In: Knight et al. (eds), 285–302.
Yngve, Victor H. 1996. From grammar to science. New foundations for general linguistics. Amsterdam: Benjamins.
Abstraction and performance
Commentary on Fischer

David W. Lightfoot
Georgetown University
Fischer writes that if one describes changes in terms of grammars, then "we in fact only describe the endpoint of a change. In order to understand why something started changing we must look at the variations over time as they begin to occur on the performance level" (254). Exactly. Grammars arise in children as they are exposed to primary data. Therefore the only way a different grammar (I-language) can emerge in different children (grammatical change) is if the children are exposed to different primary data (elements of E-language). In other words, there is more to language change than grammar change. One needs to attend to changing patterns of usage as well as to changes at the systematic level of grammars, and the distinction is never clear a priori.

There are ways of telling the two types of change apart, but that requires analysis and abstraction, as is familiar, and sometimes there is controversy. Synchronically, somebody may use sentences like Jay's taller than Ray is but not Jay's taller than Ray's, and that distinction may be a function of the person's grammar, or not. Somebody else might use contracted forms like Jay's taller than Ray is more frequently than somebody else, and that difference may reflect not differences between their grammars but differences in the way they use their grammars, or vice versa. There is no way to know the correct analysis a priori, to know where grammar ends. One constructs the best hypotheses one can, abstract models, and they are revised through constant debate; one cannot be certain of one's latest analysis, and we are in a vigorous field where there has been much progress, where hypotheses are constantly changing as our models become more sophisticated.

Patterns of usage may change because of foreign influence, for example the influence of the Scandinavians in north-east England during the Danelaw, because of stylistic innovation, by changing population mixes, or just by random variation. This kind of E-language flux, the grist of sociolinguists and discourse analysts, goes on all the time, and no two children are exposed to the same primary data. Fischer is right: one explains a change at the level of people's grammars when one shows
what the grammatical change was and how new primary data might have triggered that change, with an interplay between external and internal factors. Both involve abstractions. When we have sufficient data to make cases persuasively, we can enrich our ideas about the nature of grammars, how they unify phenomena, and their acquisition. One much discussed example is a change whereby verbs ceased to be raised to a higher inflectional position in the grammars of English speakers. Evidence for that grammatical change is that one ceases to find in the texts constructions like (1).

(1) a. John liked not that book
    b. John likes always books about syntax
    c. Likes John books about syntax?
The loss of this grammatical operation must be attributed to some change in primary, external data, changes in children's trigger experiences, and linguists often point to two critical changes in this regard: the earlier recategorization of certain verbs as inflectional elements (a prior grammatical change) and the spread of periphrastic do, a change in usage documented by Ellegård (1953) and much subsequent literature.

This research program, the focus of the Diachronic Generative Syntax (DIGS) meetings over the last fifteen years, along with many books, papers, and anthologies over a longer period, integrates work on language change with work on grammatical theory and variation, discourse analysis, sociolinguistics, and language acquisition. The goal is to study language naturalistically, trying to discover the nature of the human language capacity, an object of nature, and studying it from various perspectives. In this way historical linguists learn from acquisitionists and vice versa. Recent work on syntactic change has contributed substantial results to the field at large, relating primarily to coexisting systems and the structural nature of trigger experiences.

Under this view, the 'evidence' that people work with is drawn from many aspects of linguistics. This distresses Fischer, who wants historical linguistics not to be 'subservient' to non-historical work (250). In a striking statement, she wants "a theory of how language works [which] should not a priori coincide with any theory set up by other subdisciplines" (271). The subdisciplines of historical linguistics, acquisition studies, discourse analysis, etc. should all be studied independently, as if there is no common capacity to be discovered, just independent silos with their own separate constructs and methods.

Fischer is concerned with evidence for work in historical linguistics and wants to avoid the synchronic work that has been drawn on. She dislikes the abstract hypotheses permeating that work, and believes that a grammar "is a theoretical construct, not something that has been established empirically" (270).
She distinguishes between "physical data in the real world" and "abstracted linguistic data in our competence" (261). She wants to limit linguistic work in general to using "the physical language data, the physical context and the physical mind of the speaker" as opposed to the "fictional competence of the speaker" (269). It is unclear what she means by this physicality requirement. For example, she thinks there can be no real evidence for grammatical analyses until we know how those analyses correlate with physical properties of brains, but our current imaging machines are far too theory-laden to meet her notions of direct physicality, registering only what they are programmed to measure; models are built into the machines. Maybe she wants to see cells corresponding to NPs and synapses corresponding to movement operations before drawing on the work of syntacticians. I will not discuss her long third section, a muddled discussion of innateness and brain physiology, which has no bearing on the way in which she wants to limit historical work and is to be supplanted, in a giant step back to the nineteenth century, by 'principles' of analogy (267).

At the level of historical work she proposes to restrict work to "physical, written data" (271). Again her mysterious physicality notion. Marks on a page require analysis even to be construed as facts. Philological analysis determines whether in the last line of the prologue to the Canterbury Tales Chaucer wrote And he bigan with right a myrie chere / His tale anon, and seyde as ye may heere or … and seyde in this manere. When an observer refers to cyning as a noun, a substantive theory is being invoked. All observation is theory-laden, and certainly observations as abstract as these.

All the facts mentioned here, the fact that Likes John books about syntax? ceased to be attested, the fact that a person does not say Jay's taller than Ray's, the fact that somebody uses contracted forms more frequently than somebody else, all these are facts about performance, about how people perform, and they are stated in abstract terms. Performance facts cannot be stated without abstractions. Performance facts, understood as best we can, are the basis for grammarians distinguishing well-formed structures, for acquisitionists identifying stages of childhood language, and for historians identifying how speech patterns have changed. Each of these domains has different problems and different opportunities in understanding their respective performance facts, but they all involve abstractions and hypotheses, shaped by attending to performance factors. Historians have developed philological techniques, acquisitionists have developed experimental methods, sociolinguists have survey procedures.

The notion that after Saussure grammarians "los[t] […] interest in the study of performance" (250), pursuing "a purely logical construct, which takes little notice of what speakers and hearers do in real-life circumstances" (251), is a delusion that ignores the fact that grammars distinguish well-formed structures that people use (with their associated meanings).
Of grammatically based historical work Fischer says that "changes are often described in terms of changes in rules, conditions or functional categories, that have no surface manifestation" (251); she gives no references and I have no idea what she has in mind. Fischer wants to deal just with the pure "physical, written data" of historical texts. People have worked in historical isolation for many generations, counting instances of object-verb as distinct from verb-object. Such notions go way beyond "physical, written data", of course, but they are congenial with Fischer's isolationist goals in minimizing the way that historians draw on synchronic work.

I submit that we learned little about the general nature of language change from such examinations until we let in notions from across the discipline, from syntactic theory, discourse analysis, social variation, and acquisition, among others. At that point historical work was greatly enriched and drew on a vastly greater range of evidence, including ideas from many different subfields. True, some of these ideas have gray edges and there are controversies at some points. That means that historical linguists engage with those controversies, not that they wait until there is certainty in those areas and everything is reduced to "physical data" and brain physiology. There is no alternative. Fischer wants historians to limit themselves to studying variations in historical linguistic facts (251), but has to acknowledge that as soon as one asks what to compare in the domain of syntax, one needs the constructs of (synchronic) syntactic theory (251–252).

If one pursues a research program of the type I have described, one takes a broad perspective, engaging with other subfields. Reading lists expand, but the rewards can be great, and we find theoreticians, acquisitionists, and even biologists and computer scientists taking an interest in the phenomena and analyses of language change, to everybody's enrichment. This convergence is reminiscent of the nineteenth century, when biologists and political scientists read work on language change, which was one of the great enterprises of those times (Lightfoot 1999, Ch. 2). Historical linguists are the beneficiaries of such cross-disciplinarity, but, if they follow Fischer, they isolate themselves into insignificance.
References

Ellegård, Alvar. 1953. The auxiliary DO: the establishment and regulation of its use in English. Stockholm: Almqvist & Wiksell.
Lightfoot, David W. 1999. The development of language: acquisition, change and evolution. Oxford: Blackwell.
Author's response

Olga Fischer
University of Amsterdam
It is encouraging to read in Lightfoot's reaction to my contribution that "there is more to language change than grammar change" (p.283) and that we need "sufficient data to make cases persuasively" so that "we can enrich our ideas about the nature of grammar" (p.284). This position, which I wholeheartedly agree with, reveals a more open stance than the one I quoted in my paper, where Lightfoot states that "language on the view sketched here is an epiphenomenon, a derivative concept" and that "when we think about change over the course of time, diachronic change, we shall now think not in terms of sound change or language change but in terms of changes in these grammars, which are represented in the mind/brains of individuals" (p.254).

Lightfoot's reply paints a rather reductive picture of my description of the use of data, and of the primary aims and methods of historical linguistics as a subdiscipline of linguistics. He suggests that the work done in generative syntax, which makes use of grammatical theory, sociolinguistics, discourse analysis etc., "distresses" me (p.284). On the contrary, I make abundantly clear that we, historical linguists, need models and theories on various levels. First of all, in order to be able to analyse our data we need to work with some grammatical model or models; we need to "abstract away from the surface forms" (see p.254 and p.273, note 5). In addition, we do not need to take account of all historical details, but only of the ones that are significant (p.271), the significance of which we can only decide on by abstracting away from the data. Secondly, where I write that our ideas about how language changes "should not a priori coincide with any theory set up by other subdisciplines" (p.271), I mean just that, i.e. that the historical linguist should have the freedom to make use of models or theories offered by other disciplines, indeed that we should make use of these (p.271). What I object to is that our ideas must fall in with the autonomous, syntacto-centric model suggested by the generative school of linguists. Similarly, I do not write that the linguistic subdisciplines "should all be studied independently" (p.284), but rather that they "should observe a certain measure of independence" (p.271) towards any one theory.
"fis2-r3">
288
Olga Fischer
The reason why I devoted a long section to the 'innateness of grammar' — a section which Lightfoot describes as "muddled" (p.285) without saying why — is precisely because this is one of the pillars of the generative enterprise, which, to my mind at least, is based on the rather shaky concept of the 'poverty-of-the-stimulus'. I have given arguments as to why I find this concept shaky. I may be wrong in not accepting the notion of poverty, but it is up to generative linguists and up to Lightfoot to go into the arguments I offer.

I would like to add at this point that I do not see myself as some linguistic dinosaur, taking "a giant step back to the nineteenth century" (p.285). I do indeed refer to the importance of analogy (and not only as a neogrammarian 'mopping-up' device),1 but I relate analogy to the kind of operating principles that may play a role in language acquisition, as suggested by language acquisition experts and cognitive linguists such as Slobin and his associates. I believe that the further investigation of such principles in both language acquisition and language change may be a more fruitful approach to a deeper understanding of the language system than investigations based on the type of UG suggested by generative linguists.2 It is indeed a different way of looking at change, but it is by no means an a-theoretical way, as Lightfoot seems to suggest.

Concerning Lightfoot's rather personal remarks in his reply on my theoretical stance, i.e. my "distress[.]" (p.284) and "isolationist goals" (p.286), I will only quote his own words in a review of a book containing an article of mine:

The only paper which offers thorough description AND draws on theoretical ideas, for mutual benefit, is Olga Fischer's study of the rise of the for NP to V construction. … She relates the change to the emergence of NP to V constructions (I expect [her to win]), which in turn is attributed to the SOV-to-SVO word-order change. She properly avoids Latin influence theories, which have been invoked too often in this area, and relates diverse data in an interesting way. … One knows how to compare her account with competitors, because she has exploited theoretical devices productively. (Lightfoot 1991: 657)
To summarize, the historical linguist must make use of theoretical models to guide his investigations. He must, however, take the physical data on the page (this includes context of situation) seriously and describe them objectively even when they seem to contradict ideas that he draws from some theoretical model. In Fischer (2002), I have described cases where a particular theory 'molds' the data, thereby distorting the explanation for the change in question and thus not doing good service to the theory itself. It seems to me healthy therefore, from a scientific point of view, to keep the subdisciplines relatively independent.
Notes
"fis2-r3"> "art">
Author’s response
1. It is interesting to note in this connection that in recent probabilistic approaches to linguistics (e.g. Baayen 2003) analogy is firmly back in business. Baayen shows how the use of an analogical model, in which similarity-based reasoning takes the place of abstract symbolic rules, makes more accurate predictions about the productivity or unproductivity of affixes than generative rule models. This analogical model is connected to a different way of learning, called 'lazy learning' (as opposed to 'greedy learning', which is the basis for formal models). Lazy learning does not require a priori knowledge, and is process-driven rather than product-driven (cf. my Section 3.3.1 and Fischer 2004: 55); procedures rather than rules are central here, and type/token frequencies play a significant role (for a minimal illustrative sketch of such similarity-based prediction, see the code example following these notes). I am grateful to Anette Rosenbach for pointing out Baayen's article to me. More information on probabilistic linguistics can be found in the introduction to this volume, Section 3.1.

2. One of my objections against Lightfoot's grammatical model is its abstractness and the way in which some generative historical linguists practising this model ascribe changes to changes in the apparatus of the system without there being any surface evidence for it. Lightfoot objects that I give no references and therefore has "no idea what she has in mind" (p.286). I do discuss such a case, however, in note 2 (p.272), and see also the discussion in Fischer (2003: 449–450) of the lack of surface evidence for the simultaneity of the modal category change as a class, as discussed by Lightfoot.
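To make the contrast between 'lazy' and 'greedy' learning in note 1 concrete, the sketch below shows similarity-based prediction in its most minimal form. It is an invented toy, not Baayen's actual model: the features, the exemplars, and the overlap metric are all hypothetical. Nothing is abstracted when the exemplars are stored; generalisation happens only at prediction time, by comparing a new item with the stored ones.

```python
# A minimal sketch of 'lazy' analogical learning (illustrative only; the
# toy data are invented). Training merely stores exemplars -- no rules
# are extracted. Prediction ranks stored exemplars by feature overlap
# and lets the nearest ones vote: similarity-based reasoning in place
# of abstract symbolic rules.
from collections import Counter

# Hypothetical exemplars: (final sound, syllable count, gender) -> plural suffix.
EXEMPLARS = [
    (("n", 2, "fem"), "-en"),
    (("e", 2, "fem"), "-n"),
    (("e", 3, "fem"), "-n"),
    (("r", 2, "masc"), "-e"),
    (("l", 1, "masc"), "-e"),
    (("o", 2, "neut"), "-s"),
]

def similarity(a, b):
    # Crude overlap count: one point per matching feature.
    return sum(x == y for x, y in zip(a, b))

def predict(item, k=3):
    # All the work happens at prediction time ('lazy'): rank the stored
    # exemplars by similarity to the new item and let the k nearest vote.
    ranked = sorted(EXEMPLARS, key=lambda ex: similarity(item, ex[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# A novel item: two of its three nearest stored neighbours take -n.
print(predict(("e", 1, "fem")))   # -> -n
```

A 'greedy' rule-based learner would instead compile the exemplars into symbolic rules in advance and discard the exemplar base; here the exemplar base itself does the work, so type frequency matters automatically, since every stored item gets a vote.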
References

Baayen, R. Harald. 2003. "Probabilistic approaches to morphology". In: Bod, Rens; Hay, Jennifer; and Jannedy, Stefanie (eds), Probabilistic linguistics 229–287. Cambridge, Mass.: MIT Press.
Fischer, Olga. 2002. "Teaching the history of the English language: its position in the university curriculum and its relation to linguistic theory". In: Stanulewicz, Danuta (ed.), PASE papers in language studies. Proceedings of the ninth annual conference 31–46. Gdańsk: Wydawnictwo Uniwersytetu Gdańskiego.
Fischer, Olga. 2003. "Principles of grammaticalization and linguistic reality". In: Rohdenburg, Günter; and Mondorf, Britta (eds), Determinants of grammatical variation in English 445–478. Berlin: Mouton de Gruyter.
Fischer, Olga. 2004. "Grammar change versus language change. Is there a difference?" In: Kay, Christian J.; Horobin, Simon; and Smith, Jeremy (eds), New perspectives on English historical linguistics 31–63. Amsterdam: Benjamins.
Lightfoot, David W. 1991. "Review of An historic tongue. Studies in English linguistics in memory of Barbara Strang (Routledge 1988)". Language 67: 656–657.
cross-linguistic
  c.-l. data see evidence, typological
  c.-l. variation see variation, typological
cultural evolution see evolution

D
data see also evidence
  competence d. 14
  corpus d. 6–12, 18, 39–40, 42, 60–1, 71, 135, 162–3, 204, 261, 265, 275
  cross-linguistic d. see evidence, typological
  d. from (non-)standard varieties 12–4, 33, 114, 181–216, 261, 276
  elicited d. 7, 10–2
  experimental d. 6–10, 12, 14–5, 32, 39–40, 57, 93–5, 103, 118–9, 171, 257, 260, 277, 285
  intuitive d. 7, 13–4, 16, 39, 186–8, 203, 261, 270, 277 see also grammaticality judgments
  performance d. 12, 14–6
  quantitative d. 7–9, 20
  spontaneous speech d. 7, 10–4
  web d. 8, 11
deductive-nomological (D-N) model 222–3
default 68, 77, 152, 170
definiteness hierarchy see hierarchy
dependents 58, 76
derivation 88, 97–8
description 2, 32–3, 37, 81ff., 86ff., 96, 99, 102, 113–4, 167, 218, 250, 268, 270
  d. vs explanation 81–104, 109–12, 217–37
descriptive adequacy see adequacy
dialect 66–7, 78, 183–6, 193–8, 209–11
differential
  d. case-marking 98, 162, 164
  d. object marking (DOM) 61–2, 90, 114, 159–61
  d. subject marking 159–61
  d. typology 159–60, 167
direct (negative) evidence see evidence
direct object see object
discourse economy see economy
discrete infinity 149
displacement 34, 147, 169
distinctive features see feature(s)
ditransitive verbs 155, 164
DNA 94, 132–3, 141 see also genetic code
domain specificity see specificity
double articulation 34, 153, 169, 176

E
Early Immediate Constituents (EIC) 32, 58–9, 71
economy 34, 90–1, 101, 149, 152, 159, 264
  discourse e. 155
elicited data see data
embedded question 78
empirical method 3–4
empiricism 3–7
ergative see case
  E. Case Parameter see parameter
Esperanto 32, 94, 199 see also language, artificial
evidence see also data
  counter-e. 4, 6
  cross-linguistic e. 89ff.
  direct negative e. 139
  direct vs. indirect e. 7, 9–10, 33, 39, 188, 255
  empirical e. 1, 36–7
  experimental e. 6
  external e. 95, 102, 180
  genetic e. 21, 29
  historical e. 17, 37, 102, 204
  negative e. 8, 26–8, 40, 95, 262, 277
  neurolinguistic e. 10, 12, 14–5, 22–3, 94, 102
  no negative e. problem 43, 139
  positive e. 7–8, 62, 64–5, 77, 95, 156, 260, 275
  typological e. 17, 21, 31–6, 40–1, 51–73, 89–92, 96–101, 147, 150, 157–69, 203, 226–37, 242–3, 246, 270
  qualitative vs. quantitative e. 7
evolution 41, 67, 71, 84–5, 114–5, 117–45, 151, 169, 172, 178, 207, 211, 226, 236, 239, 262, 266, 269, 274 see also adaptation(s); biology; selection
  cultural e. 132, 135
  e. of language 33, 71, 117–74, 269
  e. of syntax 266
  e. of linguistic structure 129
evolutionary biology see biology
exceptions 35, 92
existential quantification see quantification
experiment(al) see data
explanation 70, 96, 117–9, 143–5, 163–4, 217–23, 237, 268, 277
  evolutionary e. 85, 117–8, 123–35, 143–5
  frequency-based e. 60, 62
  functional(ist) e. 81–115, 121–3, 163–4, 226–37, 241–3, 245–7
  e. of language change 251–2, 268, 271, 277, 288
  syntagmatic e. 60–1
  UG-based (/ generative) e. 41, 58–61, 77, 81, 83–4, 86–9, 90–1, 111, 113–4, 118–9, 134–5, 223–6, 236, 241–3, 245–7, 277
  usefulness-based e. 60–1
explanatory adequacy see adequacy
explicit
  explicitly learned 187
  e. linguistic knowledge see linguistic
expressivity 133, 152, 159
external
  e. evidence see evidence
  e. language see language

F
faithfulness 88, 159
falsifiability / falsification 4–7, 39, 176–7
feature(s) 54, 77, 129, 153, 157, 159, 161, 167, 189
  agreement (AGR) f. 77, 194
  argument role f. 161
  case f. 61
  categorial f. 167–8
  distinctive f. 152–3, 168, 176
  [hr] 160, 166
  innate f. of language 152
  referential f. 157
  semantic f. 152, 162
  structural f. 152, 154
  universal f. 157
figure-ground 149, 155
final devoicing 88, 97
first language see language acquisition
focus 159, 165–6
formulaic see also holistic
  utterances 127
  way of learning 262
FOXP2 30, 40
frequency
  f.-based explanation see explanation
  f. effect(s) 9–10
  formal modeling of f. 19–20, 41, 111
  f. in language change 265, 275
  f. in typology 32, 55–6, 60, 62, 150, 164
  learning mechanisms based on f. 23, 265, 274–5
functional categories 154, 175, 251, 285
  AGR see agreement
  C(P) 78, 115, 195
  I(P) 101, 104, 194
  NEG(P) 189, 203
  TP 78
functional grounding (of OT constraints) 41, 111, 114
fuzzy categorization 153 see also gradience

G
Galilean style (of science) 6, 39
game theory 162, 171
Geisteswissenschaften 2, 38
gene 30, 40, 141, 272
generalisability 128
genetic 23, 29–30, 56, 148, 170, 263
  g. code 84–5, 93–4 see also DNA
  g. evidence 29
  g. defects, deficits, disorders 25, 30
  transmission 132–3
genitive 122, 200–1, 227
  his-g. 8, 39
  -s-g. 200, 204
  prenominal g. 200, 204
  postnominal g. 196–7, 200, 204
German noun plurals 9
gestures 151–2, 270
  manual g. 151–2, 169–70
  vocalic g. 151
gradience 5, 19, 39 see also fuzzy categorization
grammar change 36, 203, 270–1, 277, 283–9 see also language change, child-based approach to
grammatical
  g. competence see competence
  g. viruses 185, 198
grammaticality judgments 9, 14–6, 39, 187–8, 203, 207, 212 see also data, intuitive d.
grammaticalization 5, 35, 198, 273–4

H
harmonic alignment see alignment
head 54, 59, 77, 87, 100–1, 122, 155, 159, 222
  H. Directionality Parameter (HDP) see parameter
  h. marking 157–8, 165
  h. parameter see Head Directionality Parameter
  h. first / h. initial 54, 57, 68–9, 76–7, 89
  h. last / h. final 54, 57, 68–9, 76, 89
  h. ordering universal 131
  specifier-h.-agreement 98, 189
hermeneutics 38
hierarchy see also scale
  accessibility h. (AH) 70, 227–31, 233–5, 237, 246
  animacy h. 61
  argument h. 154–5, 158, 167, 170–1
  constraint h. 91
  definiteness h. 61
  markedness h. 227, 229
  parameter h. 51–3, 55
  Prepositional Noun-Modifier H. (PrNMH) 58–9, 122
  Relative Clause Accessibility H. 70
hierarchical universals see universal
higher argument see argument
historical evidence see evidence
holistic 126–8, 262 see also formulaic
  h. formulae 265
  h. (proto)language 126, 128, 130, 132, 135, 140
I
I-language see language
iconicity 140, 143–4, 151–2, 165, 169–70
imitation 151–2, 169
implication(al)
  i. definition 232–3, 242
  i. generalization 51, 55–6, 217
  i. relation 52–3, 57, 60, 227–8, 233
  i. scale 81, 91
  i. universals see universal
  one-way / unidirectional i. 81, 89–90, 226–7
  two-way i. 89
implicit linguistic knowledge see linguistic
indirect object see object
indirect evidence see evidence
induction
  i. algorithm 126
  problem of i. 121
inductive 2, 4–5, 267 see also learning
infinite expressivity 133
inflection 24, 30, 88, 110, 284
  inflection vs. derivation 97–8, 273
information structure 159, 165
initial state 13, 25, 119, 134, 181, 188
interactor 141
internal representation see representation
internet see data, web d.
introspection see data, intuitive d.
intuitive data see data
I(P) see functional categories
iterated learning 117ff., 123ff., 130–5, 163, 171, 180, 265
  Iterated Learning Model (ILM) 13, 33, 117, 123, 124ff., 130–1, 144

K
KE family 29–30

L
L1 see language acquisition
L1 grammars 224–5, 245–6
language acquisition 21, 25–9, 33, 65, 86, 103, 117, 119–21, 134–5, 139, 147–8, 150–2, 167–8, 177, 183–4, 188, 200, 209, 224, 241, 250, 265–6, 269–70, 288
  first l. a. / L1 acquisition 33, 182–6, 190, 198–9, 224, 242
  l. a. device (LAD) 119–20, 134, 276
  l. a. mechanism 130
  logical problem of L2 acquisition 242
  logical problem of l. a. 25–8 see also poverty-of-the-stimulus argument; learnability
  second l. a. (SLA) / L2 acquisition 217ff., 241–2, 245–6
LAD see language acquisition device
language
  l. areas (in brain) 25 see also Broca's aphasia
  artificial l. 32, 94, 183 see also Esperanto
  l. capacity 17, 22–5, 29, 33, 148, 284 see also competence; l. faculty
  l. change 17, 36, 124, 134–5, 139–41, 145, 168, 183, 186–90, 198–9, 203, 254, 264–6, 269–70, 275, 283–4, 286–9 see also transmission
    l. change as evidence for UG see evidence, historical
    child-based (generative) approach to l. change 41, 139–40, 150, 168 see also grammar change
  l. disorders 17, 29–30 see also Broca's aphasia; KE family; Specific Language Impairment; Williams syndrome
  E-l. / external l. 16, 188, 250, 283
  l. evolution see evolution
  l. faculty 15, 17, 22–3, 25, 28–9, 38, 86, 104, 147–50, 153–4, 223, 257 see also competence; l. capacity
  l. games 32, 95, 104
  l. gene see FOXP2
  I-l. 16, 55, 188, 199, 250, 283
  l. isolation 29, 40
  l. of mind / thought 140, 153
  N1 l. 33, 184–5, 190, 199–200
  N2 l. 33, 184–5, 198–9
  l. processing see processing
  proto-l. 33, 35, 71, 126–7, 140, 169, 262, 265 see also holistic
  l. savants see savants
  sign l. 29, 152, 169–70
  spoken l. 184, 195, 261
  standard l. 181ff., 193, 198–200, 202, 209ff., 215ff., 261, 276
  typologically (in)consistent l. 57
  l. universals see universal
  l. use 51–2, 59, 82, 84, 90, 93, 102, 110, 121–2, 131, 140, 143, 209, 262
  l. user 60, 123, 130, 141, 263–4, 270
  written l. 8, 11, 182–3, 185, 194–5, 199, 261, 264
learnability 217, 242, 263–5, 277 see also language acquisition, logical problem of; poverty-of-the-stimulus argument
learning 21–3, 33, 57, 65, 106–7, 118–23, 144–5, 168, 187, 256–7, 259–60, 262, 264–8, 273–7 see also language acquisition; lexical, L. Learning Hypothesis; iterated learning
  associative l. 22–3, 29, 275 see also l. model
  Bayesian approach to l. 121, 134
  inductive l. 26, 58, 62
  l. algorithm 13, 72, 106, 124–5, 147–8, 171, 178, 180
  l. model 23, 46, 128, 144, 260 see also Iterated Learning Model
  lazy l. 20, 289
  modular view of l. 23
lexical
  l. categories 154, 175
  l. items 65, 148, 153, 190
  L. Learning Hypothesis 71
  L. Parameterization Hypothesis (LPH) 65–6, 71
lexicon 10, 65, 88, 154, 256
linguistic
  l. adaptation see adaptation(s)
  l. bottleneck see bottleneck
  l. capacity 149, 156
  l. description see description
  l. evidence see evidence
  l. evolution see evolution
  l. explanation see explanation
  explicit l. knowledge 187
  implicit l. knowledge 187, 263
  l. transmission see transmission
  l. variation see variation
linking
  l. elements in Dutch 19
  l. splits 159ff., 171
literacy 185, 195
locality 149, 159, 169
logical problem of L2 acquisition see language acquisition
logical problem of language acquisition see language acquisition
lower argument see argument
ludlings see language, l. games

M
macroparameter 78
manual gestures see gestures
markedness 37, 42, 44, 71, 76, 217–8, 238–9, 242, 246
  asymmetric m. 135, 228
  m. hierarchies / scales 150, 161, 165–6, 227, 229
  m. in L2 226–37, 241–3
  m. in Optimality Theory 88, 90–2, 159, 161, 165–6
  typological m. 218, 226–37, 241–3
mental
  m. grammar(s) 83–5, 104, 109, 113, 245
  m. representation see representation
mentalism 2–3, 16, 81–5, 113–4, 188, 250, 263
metaprinciples 149
microparameter 78
microparametric variation / microvariation 66–7
mirror neuron 151
modular view of learning see learning
modularity 256, 274
morphological case see case
morphological productivity 19
multiple negation see negation
mutation 30, 40, 156

N
N1 languages see language
N2 languages see language
natural sciences 2–3, 6, 38, 277
natural selection see selection
negation 182, 186–7, 189–91, 193, 199
  multiple n. 183, 186, 195–7, 199, 209–10, 215
negative
  n. concord (NC) 186, 188–93, 199–200, 209
  n. polarity 155, 192–3
  n. quantifier 189–90, 199
negative evidence see evidence
NEG(P) see functional categories
network(s) 22–3, 125, 128, 132
neurolinguistic evidence see evidence
'no negative evidence' problem see evidence
non-standard data see data
Null Subject Parameter see parameter
number system 149

O
object
  cognate 110
  direct 61–2, 89, 91, 98, 110, 113, 164, 194–5, 227
  indirect o. 96, 110, 113, 164, 194, 227
  o./subject asymmetry 155, 163–4
observational adequacy see adequacy
observer's paradox 12
Optimality Theory (OT) 18–9, 35, 41, 61, 71, 88, 90–1, 97, 111–5, 120, 135, 159, 164, 171, 176, 211 see also functional grounding (of OT constraints); markedness in Optimality Theory
  stochastic OT 18–9, 40–1, 70, 171, 211
Optional Polysynthesis Parameter (OPP) see parameter
optionality see variation
P
parameter(s) 28, 34, 52–4, 56–7, 59, 63, 66–7, 69, 76–8, 90, 131, 139, 177, 199, 201, 270–1
  Ergative Case P. 68–9
  Head Directionality P. 35, 54, 56–7, 65, 68, 77, 88–9, 131
  P. Hierarchy 51–3, 55, 57, 66–70, 72, 105
  macrop. 78
  microp. 78
  Null Subject P. 68–9, 89
  Optional Polysynthesis P. (OPP) 54, 67–9
  Serial Verb P. 69
  V2 P. 57
parametric variation see variation
parse
  p. arguments 158
  p. reference 155, 157, 168
performance data see data
phrase structure 76, 87, 177
Plato’s problem see poverty-of-the-stimulus argument
Polysynthesis Parameter see Optional Polysynthesis Parameter
positive evidence see evidence
poverty-of-the-stimulus argument 25–6, 28, 40, 62, 65–6, 86, 95, 176, 224, 257–60, 262, 277, 288 see also language acquisition, logical problem of l. a.; see also learnability
predicate 110, 126, 129, 140, 153–6, 158, 197
  p.-argument structure 140, 144
  relational p. 154
predication 147, 151, 153–5, 159, 165
predictions 57, 89, 99, 110, 113, 134, 139, 143, 167, 233, 289
preference scales see scale
Prepositional Noun-Modifier Hierarchy (PrNMH) see hierarchy
prescriptive 186–8, 193, 196, 198–9, 209, 215
preservation of relations 149
primates 151, 179 see also ape
principle 5, 189, 220, 237, 267
  metap. 149
  P. of Cross-Category Harmony 75
  p. of falsification see falsification
  P. of Subjacency see subjacency
  Specificity P. 35
  UG p. 34, 52, 59, 62, 65, 170, 189, 197, 224–5
Principles-and-Parameters (P&P) approach 28, 34, 41, 52–3, 55, 57, 65, 70, 89–90, 120, 211, 270
pro-drop 193–4
problem of linkage 118, 121ff., 129
processing 10, 16, 30, 39, 71, 89, 91–2, 101, 122–3, 130–2, 148, 150, 159, 224, 230, 234, 246–7, 255–7, 269–70, 277
proposition 140, 153–5
proto-language see language
psychological reality 14–5, 119, 233, 242, 246

Q
qualitative vs. quantitative evidence see evidence
quantification 155–6, 200
  existential 189
quantifier(s) 155, 182, 189–90, 193, 199 see also negative quantifier
quantitative data see data

R
rationalism 3
re-analysis 253, 264
recursion 126–7, 130–1, 140, 149–50, 171
recursive compositionality see compositionality
reference 151, 153–5, 157, 259
  r. grammar 96, 100, 110
  r. tracking 155
referential
  r. features see features
  r. specificity 159
reflexive pronoun 59–60, 89–90
relational predicate see predicate
Relative Clause Accessibility Hierarchy see hierarchy
relative clauses 58–9, 76, 122, 158, 227–30, 233–5, 237, 246
  center-embedded r. c. 133
renaturalization 184
replication 132–3, 141
representation 23, 63, 124, 140, 144, 149, 168, 176
  economy of r. 149, 152
  internal r. 129, 132, 141
  mental r. 94, 113, 145, 241, 246, 274, 276
resumptive pronouns 229–30, 237

S
salience 91, 158, 160–3, 260
savants 24, 176, 180
scale 19, 91, 98, 111, 160–7 see also hierarchy
  cognitive s. 152, 159–60, 166
  implicational s. 81, 91
  markedness s. 150
  preference s. 91, 111
  salience s. 161, 163
second language acquisition (SLA) see language acquisition
selection 132–3, 141, 144–5 see also evolution
semantic features see features
Serial Verb Parameter see parameter
sign language see language
speaker-hearer symmetric see symmetry
Specific Language Impairment (SLI) 24
specificity
  domain-s. 16–7, 21–9, 37, 121, 140, 218–9, 222–4, 226, 230, 232, 235–6, 242, 241–7
  input(output) s. 149
  referential s. 159, 162, 169
  S. principle see principle
  species-s. 21–2
specifier 76–7, 87, 115, 221–2, 237
spoken language see language
spontaneous speech data see data
standard language see language; see also data, d. from (non)standard varieties
statistical
  s. methods 9, 20
  s. universals see universal
structure-sensitivity 152, 168
subjacency 34, 41, 62–5, 78, 169, 224–6, 232, 237, 241, 246–7
symmetry 103–4, 151, 154, 171
syntagmatic explanation see explanation

T
third argument see argument
topic 54, 68, 159, 165–6, 169, 171
TP see functional categories
transmission 123, 128, 131, 133, 140, 144–5, 265 see also language, l. change
  genetic t. 132–3
  linguistic t. 118, 122–3, 128, 131–5
  social t. 128, 131, 140, 265
typological
  t. evidence see evidence
  t. markedness see markedness
  t. variation see variation

U
unidirectionality hypothesis 5
universal(s) 52, 57, 82–4, 90–2, 97, 102, 114, 117ff., 130ff., 139, 144, 149, 162, 217ff., 226ff., 229ff., 241ff., 245ff.
  absolute u. 31, 33, 55, 57, 139
  u. as preference scales 91, 111
  biconditional u. 139
  u. constraints 159
  u. features 157
  u. head ordering 131
  hierarchical u. 123
  implicational u. 31, 55, 89ff., 91, 169, 232, 242
  language u. 33, 86, 96, 117–23, 129–31, 134, 139, 143, 227
  scalar u. 91
  syntactic u. 149
  word order u. 58, 87, 131
Universal Grammar (UG) 13, 20, 27–8, 31–7, 40–1, 51–72, 75–8, 81–4, 86–97, 100–3, 114, 117–23, 134–5, 139–40, 143, 147–71, 175–80, 236, 255–63, 268–9, 272–3, 275–6
  content of UG 34–6, 150–7, 175, 179–80, 263–6
  definition of UG 118–9, 175–6
  neurolinguistic approach to UG 33, 148, 269
  UG-based approach to language change see grammar change
  UG-based approach to second language acquisition 223–6, 241–3, 245–7
  UG-based approach to typology 51–5, 62, 65–6, 75–8
use of language see language
usefulness-based explanation see explanation

V
V2 parameter see parameter
variation 40–2, 78, 141, 209–11, 216, 250, 252, 254, 265, 270–2, 286
  parametric v. 52, 65–7, 76–7, 120, 170
  syntactic v. 134, 212
  typological v. 17–9, 34, 62–3, 71, 75, 118, 120, 122–3, 135, 139, 150, 155–70, 177, 224, 227, 270
  word order v. 122

W
Williams syndrome 24–5
written language see language

X
X-bar schema/theory 35, 76, 87–9, 96, 104, 175–7
In the series Benjamins Current Topics (BCT) the following titles have been published thus far or are scheduled for publication:

10 Liebal, Katja, Cornelia Müller and Simone Pika (eds.): Gestural Communication in Nonhuman and Human Primates. xiv, 275 pp. + index. Expected July 2007
9 Pöchhacker, Franz and Miriam Shlesinger (eds.): Healthcare Interpreting. Discourse and Interaction. 2007. viii, 155 pp.
8 Teubert, Wolfgang (ed.): Text Corpora and Multilingual Lexicography. x, 161 pp. Expected June 2007
7 Penke, Martina and Anette Rosenbach (eds.): What Counts as Evidence in Linguistics. The case of innateness. 2007. ix, 297 pp.
6 Bamberg, Michael (ed.): Narrative – State of the Art. 2007. vi, 271 pp.
5 Anthonissen, Christine and Jan Blommaert (eds.): Discourse and Human Rights Violations. 2007. x, 142 pp.
4 Hauf, Petra and Friedrich Försterling (eds.): Making Minds. The shaping of human minds through social context. 2007. ix, 275 pp.
3 Chouliaraki, Lilie (ed.): The Soft Power of War. 2007. x, 148 pp.
2 Ibekwe-SanJuan, Fidelia, Anne Condamines and M. Teresa Cabré Castellví (eds.): Application-Driven Terminology Engineering. 2007. vii, 203 pp.
1 Nevalainen, Terttu and Sanna-Kaisa Tanskanen (eds.): Letter Writing. 2007. viii, 160 pp.