Event Representation in Language and Cognition
Event Representation in Language and Cognition examines new research into how the mind deals with the experience of events. Empirical research into the cognitive processes involved when people view events and talk about them is still a young field. The chapters by leading experts draw on data from the description of events in spoken and signed languages, first and second language acquisition, co-speech gesture and eye movements during language production, and from non-linguistic categorization and other tasks. The book highlights newly found evidence for how perception, thought, and language constrain each other in the experience of events. It will be of particular interest to linguists, psychologists, and philosophers, as well as to anyone interested in the representation and processing of events.

Jürgen Bohnemeyer is Associate Professor of Linguistics at the University at Buffalo, The State University of New York. He is the author of The Grammar of Time Reference in Yukatek Maya (2002).

Eric Pederson is Associate Professor of Linguistics at the University of Oregon. He is the co-editor (with Jan Nuyts) of Language and Conceptualization (Cambridge, 1997) and Perspectives on Language and Conceptualization (1993).
Language, culture and cognition
Editor: Stephen C. Levinson, Max Planck Institute for Psycholinguistics

This series looks at the role of language in human cognition – language in both its universal, psychological aspects and its variable, cultural aspects. Studies focus on the relation between semantic and conceptual categories and processes, especially as these are illuminated by cross-linguistic and cross-cultural studies, the study of language acquisition and conceptual development, and the study of the relation of speech production and comprehension to other kinds of behaviour in a cultural context. Books come principally, though not exclusively, from research associated with the Max Planck Institute for Psycholinguistics in Nijmegen, and in particular the Language and Cognition Group.

1 Jan Nuyts and Eric Pederson (eds.) Language and Conceptualization
2 David McNeill (ed.) Language and Gesture
3 Melissa Bowerman and Stephen C. Levinson (eds.) Language Acquisition and Conceptual Development
4 Gunter Senft (ed.) Systems of Nominal Classification
5 Stephen C. Levinson Space in Language and Cognition
6 Stephen C. Levinson and David Wilkins (eds.) Grammars of Space
7 N. J. Enfield and Tanya Stivers (eds.) Person Reference in Interaction: Linguistic, cultural and social perspectives
8 N. J. Enfield The Anatomy of Meaning: Speech, gesture, and composite utterances
9 Giovanni Bennardo Language, Space, and Social Relationships: A foundational cultural model in Polynesia
10 Paul Kockelman Language, Culture, and Mind: Natural constructions and social kinds
11 Jürgen Bohnemeyer and Eric Pederson (eds.) Event Representation in Language and Cognition
Event Representation in Language and Cognition

Edited by
Jürgen Bohnemeyer, University at Buffalo, The State University of New York
and
Eric Pederson, University of Oregon
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Dubai, Tokyo, Mexico City

Cambridge University Press, The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521898348

© Cambridge University Press 2011
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2011
Printed in the United Kingdom at the University Press, Cambridge

A catalogue record for this publication is available from the British Library

Library of Congress Cataloguing in Publication data
Event representation in language and cognition / edited by Jürgen Bohnemeyer and Eric Pederson.
p. cm. – (Language, culture, and cognition)
Includes bibliographical references and index.
1. Semantics. 2. Grammar, Comparative and general – Syntax. 3. Events (Philosophy) I. Bohnemeyer, Jürgen, 1965– II. Pederson, Eric. III. Title. IV. Series.
P325.E97 2010
401′.43 – dc22
2010041512

ISBN 978-0-521-89834-8 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents

Figures
Contributors
Acknowledgments

1 On representing events – an introduction
  Eric Pederson and Jürgen Bohnemeyer
2 Event representation in serial verb constructions
  Andrew Pawley
3 The macro-event property: The segmentation of causal chains
  Jürgen Bohnemeyer, N. J. Enfield, James Essegbey, and Sotaro Kita
4 Event representation, time event relations, and clause structure: A crosslinguistic study of English and German
  Mary Carroll and Christiane von Stutterheim
5 Event representations in signed languages
  Aslı Özyürek and Pamela Perniss
6 Linguistic and non-linguistic categorization of complex motion events
  Jeff Loucks and Eric Pederson
7 Putting things in places: Developmental consequences of linguistic typology
  Dan I. Slobin, Melissa Bowerman, Penelope Brown, Sonja Eisenbeiß, and Bhuvana Narasimhan
8 Language-specific encoding of placement events in gestures
  Marianne Gullberg
9 Visual encoding of coherent and non-coherent scenes
  Christian Dobel, Reinhild Glanemann, Helene Kreysa, Pienie Zwitserlood, and Sonja Eisenbeiß
10 Talking about events
  Barbara Tversky, Jeffrey M. Zacks, Julie Bauer Morrison, and Bridgette Martin Hard
11 Absent causes, present effects: How omissions cause events
  Phillip Wolff, Matthew Hausknecht, and Kevin Holmes

References
Index
Figures

3.1 Event segmentation – an introductory example
3.2 ECOM E7
3.3 Early and late frame of ECR 18
3.4 Early and late frame of ECR 5
3.5 Early and late frame of ECR 23
5.1 Different construction types of spatial and activity predicates observed in our data
5.2 The percentages of different event predicate types in the two sign languages
5.3 The percentages of perspective types across the two sign languages
5.4 The distribution of combinations of different event space projections (character, observer) with different types of classifier predicates (aligned, non-aligned) in the two sign languages
5.5 Schemas for different possible uses of predicate types and perspectives deployed in event space representations in signed narratives
6.1 Average proportion manner choices by language group in Experiment 1
6.2 Average proportion manner choices by language group in Experiment 2
6.3 Average proportion of manner and path false alarms by language group
7.1 English placement schema (satellite-framed)
7.2 Spanish placement schema (verb-framed)
7.3 German placement schema (satellite-framed)
7.4 Russian placement schema (satellite-framed)
7.5 Finnish placement schema (satellite-framed)
7.6 Hindi placement schema (verb-framed)
7.7 Turkish placement schema (verb-framed)
7.8 Tzeltal placement schema (verb-framed)
7.9 Scale of languages according to relative frequency of verbs at t1
7.10a Spanish preposition
7.10b Turkish case-marking
7.10c Hindi case-marking
7.10d Finnish case-marking
7.11a English placement category
7.11b German placement categories
7.11c Tzeltal placement categories
7.12 English and German: Verb-of-placement constructions in caregiver speech
7.13 Verb-of-placement constructions in English and German child speech
7.14a English verb-of-placement constructions: Naomi and her parents
7.14b German verb-of-placement constructions: Simone and her parents
8.1 The task set-up with the Describer on the left and the Drawer on the right
8.2 Stimulus: placement of the bowl
8.3 Placement of bowl in Dutch with a posture placement verb, zetten, and a bi-manual gesture encoding object information in the hand shape
8.4 Placement of bowl in French with a general placement verb, mettre, and a gesture encoding simple-path, no object information
8.5 Placement in Dutch with a general placement verb, doen, 'do, make,' and a gesture encoding object information in the hand shape (right hand, grip around bananas)
8.6 Placement in Dutch with another specific placement verb, duwen, 'push,' and a gesture encoding object information in the hand shape (grip around chewing gum)
8.7 Placement in French with a specific placement verb, coller, 'stick', and a gesture encoding simple-path, with a flat hand, no object information
9.1 Examples of the naturalistic stimuli used in Experiments 2a, 2b and 3b, displaying events with one participant, two participants and three participants
9.2 Experiment 2a. Mean proportion of gaze time spent in different ROIs, depending on task (percent of time between picture onset and speech onset)
9.3 Experiments 3a and 4. Examples for coherent and non-coherent scenes (taken from Dobel et al. 2007)
9.4 Examples for stimuli of actions involving two participants, used in Experiments 3c, 3d and 3f
11.1 Scene adapted from Freyd, Pantzer, and Cheng (1988) in which participants were asked to indicate whether the plant was located in the "same" position once a source of support was removed
11.2 Configurations of forces associated with CAUSE, HELP/ENABLE/ALLOW, and PREVENT; A = the affector force, P = the patient force, R = the resultant force; E = endstate vector, which is a position vector, not a force
11.3 On the left side, two CAUSE relations are combined using the resultant force from the first cause relation (BA) as the affector force in the second cause relation (BBA). On the right side, a PREVENT relation is combined with another PREVENT relation using the resultant of the PREVENT relation in the second premises as the patient vector in the PREVENT relation in the first premise
11.4 The affector force in the conclusion, A, is the affector force in the first relation, A. The endstate in the conclusion is the endstate vector from the last premise. The patient force in the conclusion, C, is based on the vector addition of the patient forces, B and C in the premises
11.5 The composition of two PREVENT relations can either lead to a CAUSE or ALLOW conclusion
11.6 The configuration of forces in the top panel, which depicts a PREVENT ◦ PREVENT composition, was entered into a physics simulator to produce the movements of the cars in the animation depicted in the still frames in the bottom panel. First, car C attempts to cross the line but is prevented by car B, which approaches car C. Then, car A pulls car B away from car C with a rope, preventing car B from preventing car C. Finally, with car B out of the way, car C crosses the line
Contributors
Julie Bauer Morrison, Glendale Community College
Jürgen Bohnemeyer, University at Buffalo, The State University of New York
Melissa Bowerman, Max Planck Institute for Psycholinguistics
Penelope Brown, Max Planck Institute for Psycholinguistics
Mary Carroll, Ruprecht-Karls-Universität Heidelberg
Christian Dobel, Westfälische Wilhelmsuniversität Münster
Sonja Eisenbeiß, University of Essex
N. J. Enfield, Max Planck Institute for Psycholinguistics
James Essegbey, University of Florida at Gainesville
Reinhild Glanemann, Westfälische Wilhelmsuniversität Münster
Marianne Gullberg, Centre for Languages and Literature, Lund University
Matthew Hausknecht, University of Texas at Austin
Kevin Holmes, Emory University
Sotaro Kita, University of Birmingham
Helene Kreysa, Bielefeld University
Jeff Loucks, Institute for Learning and Brain Sciences, University of Washington
Bridgette Martin Hard, Stanford University
Bhuvana Narasimhan, University of Colorado at Boulder
Aslı Özyürek, Radboud University Nijmegen and Max Planck Institute for Psycholinguistics
Andrew Pawley, Australian National University
Eric Pederson, University of Oregon
Pamela Perniss, Radboud University, Nijmegen, Max Planck Institute for Psycholinguistics, and DCAL, University College London
Dan I. Slobin, University of California, Berkeley
Barbara Tversky, Stanford University and Columbia Teachers College
Christiane von Stutterheim, Ruprecht-Karls-Universität Heidelberg
Phillip Wolff, Emory University
Jeffrey M. Zacks, Washington University
Pienie Zwitserlood, Westfälische Wilhelmsuniversität Münster
Acknowledgments
The origins of this volume lie in the Event Representation project at the Max Planck Institute for Psycholinguistics. From 2000 to 2004, this project brought together researchers studying lesser documented languages in the field and scholars studying child language development to explore universals and variation in how events are described across languages. Several of the contributing authors were members or external collaborators of this project (Bohnemeyer and Bowerman jointly directed the project and Brown, Eisenbeiß, Enfield, Essegbey, Kita, Narasimhan, Pederson, and Slobin participated) or members of institute research projects on co-speech gesture, language production, multilingualism, and sign language who collaborated with Event Representation (Dobel, Gullberg, Özyürek, Perniss). The Max Planck Institute for Psycholinguistics is unique in the breadth of the different approaches to the interface between language and cognition its researchers are able to provide. The multifaceted perspective that is the result of this breadth is well reflected in the present collection. Moreover, the research presented in five of the ten chapters of the body of the book was wholly or in part funded by the Max Planck Society (Bohnemeyer et al., Dobel et al., Gullberg, Özyürek and Perniss, Slobin et al.).

The Event Representation project was highlighted by two workshops dedicated to the topic of event encoding in language and mind. These workshops brought together participants of the project and some of the premier scholars of event representations in linguistics, psychology, and philosophy from outside the project. The first of these was organized by Bohnemeyer at the Max Planck Institute in Nijmegen in 1999; the second in 2004 was organized by Pederson and Russell S. Tomlin, of the University of Oregon, as well as by Bohnemeyer. This second symposium was sponsored by the University of Oregon Foundation, the University of Oregon College of Arts and Sciences, and the Department of Linguistics.

As for the current volume, the chapters by Bohnemeyer et al., Dobel et al., Loucks and Pederson, and Pawley all evolved out of presentations at the Eugene symposium. Carroll and von Stutterheim and Wolff likewise presented from their ongoing research on event representation in language and
cognition in Eugene. Zacks and Tversky's joint research was presented on both occasions (by Tversky in Nijmegen and by Zacks in Eugene). It was during the Eugene symposium that the idea for this volume was conceived. It was clear from the beginning that the goal would be a record, not so much of the proceedings of the symposium, but rather of the state of the art in research on the relation between linguistic and cognitive event representations. Consistent with this, however much the current volume may trace a history back to this symposium, the chapters reflect a broad body of scholarship far beyond the original conference.

We would like to thank the contributors, the editors in charge of the project at Cambridge University Press, Helen Barton and Joanna Garbutt, and the series editor Steve Levinson. We should particularly thank Levinson, who in his capacity as Director of the Language and Cognition research group at the Max Planck Institute for Psycholinguistics instigated the Event Representation project, made it possible, and served as a source of ideas and advice throughout its development. We would also like to thank the two anonymous reviewers of the book proposal for their valuable suggestions for improvement, Carolyn O'Meara for compiling the bibliography, Randi Tucker for assistance during the proofreading process, Linda Konnerth and Holly Lakey for producing the index, and Jill Lake for meticulous and impeccable copy-editing. In the end, this volume has been the product of the efforts of many individuals contributing in many different ways.
1 On representing events – an introduction
Eric Pederson and Jürgen Bohnemeyer
This volume presents a collection of essays reporting on new research into the relationship between event representations in language and mind. In recent decades, linguists have increasingly invoked the notion of 'events' – under this and other labels – in modeling the meanings of natural language expressions. Indeed, numerous aspects of the structure of human languages are now commonly seen across theories and frameworks as geared towards the task of expressing event descriptions. Like many of the constructs of semantic analysis and theory, the concept of 'event' has been influenced by the work of philosophers and natural scientists, usually with no more than a passing acknowledgment of the puzzles and controversies besetting its philosophical treatment (see Pianesi and Varzi 2000 for an overview). Philosophers have referenced the concept since antiquity, especially in treatments of causality (the subordinate notion of 'actions' has been used even longer in moral philosophy). However, events and their properties do not appear to have become topics of ontological research before the twentieth century, and their status must at present be considered far from settled. Even more glaring is the contrast between the rich and imposing architecture of event representations in language envisioned by many semanticists and the limited and scattered research on the status, nature, and role of event representations in the cognitive processing of perception and action by psychologists.

The research presented in this volume aims to make advances towards bridging the gap between linguistic and psychological research by illuminating from various perspectives the relationship between linguistic and cognitive event representations. The chapters come from different traditions and use different methods, but each presents empirical research on the interaction of linguistic and cognitive event representations. Some draw on data from the linguistic categorization of events in single languages (Pawley; Tversky et al.; Wolff et al.). Others directly compare results from multiple spoken (Bohnemeyer et al.; Slobin et al.; Carroll and von Stutterheim) or signed (Özyürek and Perniss) languages. Further, first language acquisition (Slobin et al.) and gestures accompanying speech (Gullberg) are examined. Attention and the visual
processing of stimuli during language production are examined (Dobel et al.). Two studies look at the non-linguistic categorization of event stimuli in the context of language use (the components of motion events in Loucks and Pederson; and event segmentation in Tversky et al.).

By presenting this set of different perspectives on the relationship between event encoding in language and internal cognition, the volume provides an overview of the research that has been conducted into this question. Our hope is that this will foster cross-stimulation, in that researchers interested in one approach (or method, or source of evidence) will find helpful the lessons from those pursuing other approaches.
1 Previous treatments of event representation in linguistics and psychology
Grammarians through the ages have relied on what one might think of as "expert folk theories" of event description in language. These are sets of unstated assumptions involving undefined notions that are presupposed by linguistic analyses. As an example, the practice of defining the verb as a part of speech or 'lexical category' (wholly or in part) with reference to the semantic property of describing (kinds of) actions or events can be traced back (in the European tradition) at least as far as the Greek grammarian Apollonius Dyscolus of the second century AD (Luhtala 2002: 279). Yet, explicit theories of event semantics would not be developed until the late twentieth century.

It is impossible to characterize the assumptions folk theories consist of without turning them into something they are not – explicit statements. That said, the following core assumptions, even though they are couched in the terminology of contemporary linguistic theory, seem compatible with a great many of the folk theories implicit across the scholarship on language structure.
• Verbs generally describe (kinds of) actions or events.
• The arguments and complements of verbs – for example, subject, object, and perhaps certain kinds of embedded clauses – describe entities (or perhaps other events) involved in the event which is described by the verb (event participants).
• The roles that characterize the ways in which the participants are involved in the event – roles such as agent, theme, and recipient – are typically reflected by the syntactic properties of the expressions describing them. That is to say, the relationship between a verb and its arguments reflects these relationships between the event and its participants.
• The meanings of sentences and clauses involve states of affairs or propositions which may be about the reality or realization of the event described by the main verb of the sentence or clause.
To make this a little more concrete, consider the example in (1):
(1) Sally gave Floyd a book on event semantics on Monday with a conspiratorial wink
On the standard view of event encoding in contemporary linguistic theory, this sentence asserts a proposition concerning the occurrence of an event of the kind described by give, with the participant named by the subject, Sally, as the agent (here: the giver), the one named by the first (or 'primary') object, Floyd, as the recipient, and a third participant described by the second (and 'secondary') object, the noun phrase a book on event semantics, in the role of theme. All of the semantic properties just mentioned have been the focus of theorists' attention since the 1960s; but all of them have been part of implicit assumptions about event description from the beginning of scholarly work on the structure of language.

Indisputably, the most influential step in the development of event semantics was the publication of the paper 'The logical form of action sentences' by the philosopher Donald Davidson in 1967. Davidson's point of departure is a subtle observation: many adverbials, rather than functioning as true predicate modifiers, show an "intersective" behavior vis-à-vis the verb. For example, on Monday and with a conspiratorial wink in (1) do not so much single out particular kinds of giving, but rather impose independent constraints on the action described by the verb: the verb and its arguments require the action to be a giving of a book to Floyd by Sally, and the adverbials require the action to have taken place on a Monday and to have been conducted with a conspiratorial wink. In predicate-logic terms, it seems that the verb and the adverbials are all interpreted as predicates over the same argument, and that argument is not expressed by any of the syntactic arguments of the verb, but rather refers to the event itself. To formalize this insight, Davidson proposes that content words such as verbs and adverbs – and the nouns, adjectives, prepositions, and so forth that combine with them – express predicates, not just over individual arguments of the traditional kind referring to animate beings, inanimate things, and perhaps also abstract things, but over event arguments – existentially bound argument variables whose values are events.
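Davidson's proposal can be illustrated with a schematic logical form for (1). This is a simplified sketch for illustration; the exact notation varies across authors, and the neo-Davidsonian variants of Carlson (1984) and Parsons (1990) additionally separate out the role-bearing arguments into conjuncts such as agent(e, Sally):

\[ \exists e\,[\mathit{give}(e, \mathrm{Sally}, \mathrm{Floyd}, \mathrm{book}) \wedge \mathit{on}(e, \mathrm{Monday}) \wedge \mathit{with}(e, \mathrm{wink})] \]

Because the verb and each adverbial contribute independent predications over the same event variable e, dropping any conjunct preserves truth. This captures the intersective entailments noted above: if Sally gave Floyd the book on Monday with a conspiratorial wink, it follows that she gave it to him on Monday, and that she gave it to him.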
Since 1967, Davidson's framework (and its numerous variants and offshoots) has been applied to many other problems of event semantics. One example is the theory of semantic (or 'thematic') roles such as agent, patient, recipient, and so on, alluded to above, the origins of which can be traced as far back as the Sanskrit grammarian Pāṇini of the fourth century BC. Most syntactic arguments of a sentence have referents that bear semantic roles in the event described by the sentence, and the syntactic properties of each argument reflect the semantic role of its referent. For example, in an English sentence with three syntactic arguments in the active voice such as (1), the agent is expressed by the syntactic subject (Sally), the recipient by the first or 'primary' object (Floyd), and the theme by the 'secondary' object (a book on event semantics). Semantic roles mediate between the structure of an event and the syntactic structure of a sentence describing it, making the latter an abstract iconic representation of the former. This makes semantic roles key elements of what is often called the 'interface' between syntax and semantics, i.e., the principles that govern the mapping between form and meaning in language.1

1 The theory of semantic roles has been formalized in a Davidsonian framework in Carlson (1984) and Parsons (1990).

In the 1970s, semanticists started thinking about how to model meanings that transcend the sentence level. The result was the family of so-called 'dynamic' approaches to semantics, which view meanings as properties, not of sentences, but of utterances in contexts. (Simplifying somewhat, sentences are complex linguistic signs composed of words and phrases, whereas utterances are actions that involve the use of such signs.) The most widely used of these frameworks is discourse representation theory (DRT; e.g., Kamp and Reyle 1993). From its beginnings, the modeling of the semantics of temporal operators such as tenses and viewpoint aspects has been one of the central goals of DRT. Viewpoint-aspectual meanings are illustrated by the contrast between progressive forms such as was pushing and simple past tense forms such as drew: the former present the events described in the sentences as ongoing at some reference point understood in context, whereas the latter describe these events as completed within a reference time frame. The miniature narrative fragment (2) is interpreted to the effect that the time during which Floyd's pushing is described as ongoing is identical to that during which Sally's drawing a circle is completed:
(2) Floyd was pushing a cart. Sally drew a circle
In DRT, the sentences in (2) are modeled as introducing the 'run times' of the events of Floyd pushing a cart and Sally drawing a circle as 'discourse referents' with the tenses and viewpoint aspects encoding relations between these times.
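For illustration, the temporal relations at issue in (2) can be sketched in the event notation introduced above. This is a simplified rendering that ignores DRT's box notation; τ(e) stands for the run time of e and t for the contextual reference time:

\[ \exists e_1 \exists e_2\,[\mathit{push}(e_1, \mathrm{Floyd}, \mathrm{cart}) \wedge \mathit{draw}(e_2, \mathrm{Sally}, \mathrm{circle}) \wedge t \subseteq \tau(e_1) \wedge \tau(e_2) \subseteq t] \]

The progressive locates the reference time inside the run time of the pushing, while the simple past includes the run time of the drawing within the reference time – hence the drawing is understood as completed while the pushing is still ongoing.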
The theory of semantic roles has been formalized in a Davidsonian framework in Carlson (1984) and Parsons (1990).
On representing events – an introduction
5
(1983) conceptual semantics takes a position similar to that of situation semantics, viewing the highest-order conceptual functions expressed in sentences and utterances as characterizing events and states rather than propositions. These few remarks should convey how explicit event semantics has gradually developed over the course of the past decades into something approaching a common metalanguage which can be used by researchers of diverse theoretical backgrounds in dealing with a vast array of aspects of the structure of natural languages.2 In psychology, the question of the relationship between object perception and event perception has played a significant role in perception research. James J. Gibson and Gunnar Johansson independently of one another amassed evidence to the effect that not only are there distinctive gestalt patterns of events in the flow of perceptual information (Gibson’s ‘styles of change,’ von Fieandt and Gibson 1959), but event perception must in fact at least in some ways be prior to object perception. Gibson’s and Johansson’s (1973, 1975) approaches to perception differ in the amount of inherent mediating information they ascribe to the perceptual system. Johansson assumes that the perceptual system operates on certain rules that allow it to prioritize in cases of informational ambiguity in the input, rules that, e.g., favor a rigid-rotation perception over an elastic-deformation perception in case the input permits both construals. Gibson, in contrast, proposes a theory of ‘direct perception’ in which the perceptual system is assumed to be attuned to an environment informationally rich enough so as not normally to give rise to the kinds of ambiguities that occur under lab conditions. This approach has become known as the ‘ecological’ approach to perception psychology. Gibson attempts to minimize the role of event concepts or ‘schemas,’ arguing that there is a principled coordination of the schematic properties of perceived events and those of actions potentially carried out by the observer: the schematization of objects and events is based on ‘affordances’ of the objects and events, i.e., those properties that are functionally relevant for potential action in the environment. Surveys of the early work on event perception and cognition in the Gibsonian framework are offered by Warren and Shaw (1985) and McCabe and Balzano (1986). The idea of direct coordination between event perception and action has more recently been taken up in Prinz’s common coding theory (e.g., Prinz 1997). The threevolume Handbook of Perception and Action edited by Prinz and colleages (Prinz and Bridgeman 1995; Heuer and Keele 1996; Neumann and Sanders 1996) provides a comprehensive overview with a focus on the conditions for an integrated account of perception and action.
2
For more comprehensive overviews of the history of event semantics, see Higginbotham (2000) and Rothstein (1998b). Tenny and Pustejovsky (2000) survey the role of event semantics in the theory of language structure.
6
Pederson and Bohnemeyer
Like Gibson’s and Johansson’s work, the ground-breaking work conducted by A. Michotte and his collaborators on the processing of causal information in perception was influenced by gestalt psychology. In a classical study described in Michotte and Thin`es (1963), participants were shown displays of virtual ballistic collisions between two rectangles. The participants were aware that the rectangles were not actually three-dimensional objects and their movements were not actually ballistic. They nevertheless reported seeing a variety of different types of ballistic collisions, the precise type depending on parameters such as the length of the time interval between the contact of the two rectangles and the beginning of the motion of the second rectangle. For more recent replications of some of Michotte’s studies with contemporary stimuli and response measures, see Schlottman and Anderson (1993). Michotte interpreted his findings to the effect that participants directly perceive the causality in these events, rather than merely inferring it from the succession of sub-events, as Hume’s (1739–1740) classical account of causal reasoning suggests. Verfaillie and Daems (1996) used Michotte-style ballistic collision stimuli in a chronometric study in which participants identified agents and patients. This study provided evidence for a cognitive basis of the theory of semantic roles mentioned above. Newtson (1973), working within the field of social psychology, conducted classic studies on the role of mereological (part–whole) structures in the cognitive encoding of events. Newtson showed participants a video in which an actor filled in a questionnaire, smoked a cigarette, and read in a book. The participants were then asked to segment the events in the clip into units by pressing a button at event boundaries (‘break points’). Half of the participants were given a task of fine-grained segmentation, the other half were asked to perform coarsegrained segmentation (Newtson 1973: 30). Correspondence of fine-grained and coarse-grained break points was found to be relatively low. This led Cohen and Ebbesen (1979) to question the existence of task-independent knowledge of event mereologies. However, Zacks and Tversky (2001) repeated Newtson’s experiments with different stimulus scenes and more precise measures of event boundaries and found a much higher rate of coincidence across coarsegrained and fine-grained segmentations than Newtson (see also Tversky et al., this volume). Other classic studies that have produced evidence of mereological knowledge in event cognition include Newtson and Engquist (1976) and Jenkins, Wald, and Pittenger (1986). Jenkins et al. (1986) prepared a series of picture stills representing three different event sequences. In one sequence, a woman prepares a cup of tea; in a second, a teenage girl is shown having a conversation on the telephone, and the third series shows stills from a party. Unlike the first two, the third series of pictures does not represent one coherent event sequence. Every third picture was removed from each sequence. The participants were then tested for recognition of pictures from a randomly ordered
On representing events – an introduction
7
subset of the pictures shown originally and foils from the subset that had been taken out. The authors demonstrated that the participants were more likely to detect foils from the less coherent event sequence. They argued that a less coherent sequence induces participants to memorize each individual picture, rather than to encode the entire series as a representation of an event. This suggests that event concepts may spell out mereological information and that dynamic stimuli that do not conform to any recognizable mereological schema are not readily conceptualized and memorized as instances of complex events. It is against the backdrop of this scholarship that the chapters in the current volume contribute to the study of event representations in perception, action, mind, and language.
2
The current volume
Half of the chapters in this volume directly draw data from crosslinguistic comparison in their investigation of the relationships between form, meaning, and conceptual representation. By asking what properties of linguistic event representations vary across languages and what are universal, crosslinguistic approaches open one empirical window on the principles that govern the mapping between event representations in language and internal cognition. Andrew Pawley kicks off this series of comparative chapters by sketching a somewhat extreme example of what is possible in terms of the lexical and syntactic resources used in event descriptions. Kalam, a language of the highlands of Papua New Guinea, has only a small set of around a hundred verb roots with very elementary meanings.3 The vast majority of meanings lexicalized in simple verbs in English require various kinds of so-called ‘serial verb constructions’ for their expression in Kalam. Such constructions involve strings of two or more verbs which are not syntactically dependent upon one another, but may together form a single clause or phrase. Serial verb constructions not only render meanings expressed by single verbs in English, but also meanings expressed by combinations of verbs and prepositional phrases or verbs and embedded clauses. In his classic 1987 paper, Pawley argued that at least for the purposes of talking about them, Kalam speakers must conceptualize events quite differently from how they are conceptualized by native speakers of English. This conclusion was challenged by Giv´on (1990, 1991a) on the basis of experimental data. Giv´on compared descriptions of a video clip by speakers of Kalam, 3
This is not a unique case: languages with closed and very small classes of verb roots appear to be common both on New Guinea and in northern Australia (there are also reports from other parts of the world; e.g., Dickinson 2002 and Sakel 2004 describe two unrelated languages of the Andes with similar phenomena).
8
Pederson and Bohnemeyer
three other Papuan languages, and Tok Pisin, an English-based Creole spoken in Papua New Guinea and elsewhere in Melanesia. He found that serial verb constructions are not likely to be interrupted by pauses, suggesting that they are produced as chunks, like single words. Giv´on took this as evidence that the same cognitive event representations expressed by single-verb sentences in English can be expressed by serial verb constructions in languages such as Kalam.4 In his contribution to the present volume, Pawley provides his response. He draws a distinction between two types of serial verb constructions, which he terms ‘compact’ and ‘narrative.’ Compact series do not permit the insertion of any material between the verbs that constitute the series. Their meanings can generally be expressed by single verbs in English. In contrast, narrative series may span multiple clauses and correspond to multi-clausal narratives in English, following similar principles in terms of the order of presentation and the type of information presented at each stage. Yet, like compact series, they typically fall under a single intonation contour, and each sequence of verbs appears to be stored as a template in long-term memory. Narrative series thus provide evidence even by Giv´on’s criteria that the segmentation of events in the preverbal message during narrative production differs between speakers of English and Kalam. The Pawley–Giv´on debate illustrates several larger points which serve as a good introduction to this set of chapters. First, languages differ quite drastically in the lexical and syntactic resources they provide for event descriptions and the constraints they impose on the use of these. Second, research into the semantic impact of this variation – and thus into crosslinguistic variation and uniformity in the event representations expressed in language – has been rather preliminary to date and there is no agreement on methodological standards. Thirdly, attitudes towards the question of semantic variation have been heavily informed by the universalism–relativism and nature–nurture debates: what aspects of language and cognition are language-specific and/or culture-specific and thus presumably learned, and what universal and thus potentially innate? Attempts at shedding empirical light on these questions have all too often been overwhelmed by ideological preconceptions and prejudices. The second of the above points has recently been addressed by Bohnemeyer et al. (2007). Typologists – students of language variation and universals – have long used an intuitive distinction between constructions that describe “single events” vs. “multiple events.” For example, Floyd went from Rochester to Buffalo might be said to describe a single event, whereas Floyd left Rochester and arrived in Buffalo might be said to describe a sequence of two events. Bohnemeyer et al. (2007) propose that this intuition can be formalized using 4
For a recent typological survey of serial verb constructions, see Aikhenvald and Dixon (2006).
On representing events – an introduction
9
the “scopal” properties of temporal operators as a criterion: descriptions that intuitively refer to a sequence of multiple events allow the speaker to “time” the sub-events independently of one another (e.g. Floyd left Rochester at eight and arrived in Buffalo at nine), whereas descriptions that intuitively refer to a single event do not (?Floyd went from Rochester at eight to Buffalo at nine sounds odd). They call this property of event descriptions being compatible only with those time expressions that refer to the time of the entire larger event the ‘macro-event property’ (MEP) and the descriptions that have the MEP ‘macro-event expressions.’ Equipped with the methodological innovation of the MEP, Bohnemeyer et al. (2007) find variation in the representations of motion events far exceeding the assumptions of previous work. But the study also uncovered principles that are shared across the languages of the sample. In their contribution to this volume, J¨urgen Bohnemeyer, N. J. Enfield, James Essegbey, and Sotaro Kita present a case study that applies the methodology of Bohnemeyer et al. (2007) to a new domain: the expression of causality. They examine the segmentation of causal chains into macro-event expressions in four unrelated languages: Ewe, Japanese, Lao, and Yukatek Maya. Like the study on motion event segmentation before it, the present study shows that languages differ in the events for which they provide macro-event descriptions. The source of these differences is variation in both the availability of lexical expressions for concepts and syntactic constructions to combine these. Mary Carroll and Christiane von Stutterheim examine the impact of language-specific patterns in the mapping between information perspective and syntax on event descriptions. Information perspective identifies referents as new to the discourse vs. already established, as foregrounded vs. backgrounded in an utterance, etc. English and German, though closely related, differ in the interface between information perspective and syntax. In English event descriptions, new referents are by preference introduced in existential predications, bumping the categorization of the event into a second clause which is often subordinate (e.g., There is a girl shopping in a supermarket). In contrast, German event descriptions freely permit introduction of new referents in indefinite noun phrases, thereby allowing the categorization of the event to take place in the same clause (e.g., ‘A girl shops in a supermarket’). This is the first study ever to look at the role of information perspective in the structure of event descriptions. The authors present evidence of secondary effects on the segmentation of event descriptions. Turning to event representation in the sign languages of the Deaf, Aslı ¨ urek and Pamela Perniss look at Turkish and German Sign Language and Ozy¨ the ways these are anchored in the established reference space and with respect to the signer’s body. In order to represent events about action, motion, and location (e.g., to depict flipping a pancake), signers need to project the referents and the event space onto their body and the space around them. The authors
10
Pederson and Bohnemeyer
investigate the similarities and differences in perspective choice and its interaction with event descriptors in these sign languages. They suggest that although the visual-spatial modality might constrain and homogenize expressive possibilities in sign languages, there remains diversity in the expression of events across sign languages just as is reported for spoken languages. Better understanding this bodily and spatial linguistic expression of events can broaden our understanding of how events are represented in languages more generally. The next two chapters focus on Talmy’s (1985, 2000b) well-cited typology of verb-framed vs. satellite-framed motion descriptions. Verb-framed descriptions express the path component (information about from/to where the ‘figure’ moves) in the main verb root; satellite-framed descriptions express the path peripherally to the verb (e.g. adverbially or in a particle). Languages tend to systematically favor one type of description or the other, based on lexical and syntactic factors. Several studies in recent years have investigated the question whether speaking one type of language or the other (especially as one’s native language) influences the cognitive processing of motion events. In their chapter, Jeff Loucks and Eric Pederson report on two studies they conducted with speakers of English, Japanese, and Spanish, involving the categorization of human motion events. A separate set of speakers were asked to describe these same motion stimuli. There appears to be no general support for cognitive effects of Talmy’s patterns in that all groups demonstrated no consistent bias in their categorization strategies. Loucks and Pederson conclude with suggestions for revising Talmy’s typology for these purposes as well as critiquing the methods so far employed by this line of research. Dan I. Slobin, Melissa Bowerman, Penelope Brown, Sonja Eisenbeiß, and Bhuvana Narasimhan use child and adult language descriptions of placement (“putting”) events in four satellite-framed (English, Finnish, German, Russian) and four verb-framed (Hindi, Spanish, Turkish, Tzeltal Maya) languages to examine the extent to which child language follows patterns of adult variation or is largely constrained by pre-linguistic and universal notions of event encoding. This is one of the first studies examining the developmental patterns of event encoding. While the languages can be roughly categorized following Talmy’s typology, the authors find fine-grained crosslinguistic variation within each of the two groups. For those event features which are perceptually salient, even quite young children prove sensitive to these finer-grained adultlanguage characteristics. The authors argue that a “multiplicity of interacting factors . . . each with its own language-specific constraints and regularities” must be assumed to account for this variation in child sensitivity to input variation. Also looking at the expression of placement events, Marianne Gullberg investigates co-speech gestures in Dutch and French. Taking gesture as indicative of at least some aspects of underlying cognitive event representations, her
On representing events – an introduction
11
study explores to what extent the semantic properties of habitually used verbs guide speakers’ attention to certain types of information. French has a general placement verb mettre ‘put.’ In contrast, Dutch speakers pervasively use positional verbs such as zetten ‘set’/‘stand’ and leggen ‘lay’ to describe the placement of animate and inanimate entities alike. The choice of which Dutch verb to use depends above all on the shape of the figure (the entity whose placement is at issue). Analysis of the co-speech gestures reveals that Dutch speakers are more likely to represent the shape of the figure in their gestures. Conversely, French speakers’ gestures show a focus only on the path of the placement movement. Importantly, these perspectives in gesture permeate the entire placement domain regardless of the actual verb used, suggesting a broad effect of language typology on the attentional preferences of the speakers, rather than just an immediate effect of current lexical choice or context. Christian Dobel, Reinhild Glanemann, Helene Kreysa, Pienie Zwitserlood, and Sonja Eisenbeiß present eye-movement data on the visual processing of event stimuli during language production. Using pictorial representations of events, while admittedly static, allows a relatively straightforward eye-tracking paradigm to measure the visual attention of speakers. The authors’ findings indicate a rapid initial extraction of the ‘gist’ of a scene, affording, for instance, assessments of coherence. This ‘apprehension phase’ precedes any fixations that are needed to identify actions and participants. This series of experiments moves the field toward a more interactive model of vision and language. The task demands of speech production (e.g., the need to describe an event) have an early and profound interaction with the various perceptual factors of an event in how this event will be initially processed. Barbara Tversky, Jeffrey M. Zacks, Julie Bauer Morrison, and Bridgette Martin Hard summarize the research on knowledge and attribution of event mereology (part–whole structures) mentioned above. Their perspective is a comparison of event mereology with studies of the mereology of the human body. A key finding in both domains is that participants who are given the task of describing the stimuli tend to segment them more finely than participants asked to perform segmentation only. The authors argue that language use drives participants to shift their focus of attention away from more purely perceptual features of the stimuli toward functional or intention-related features. This suggests a role of language as a “cognitive . . . tool that can guide and craft perception, thought and action.” Phillip Wolff, Matthew Hausknecht, and Kevin Holmes close the volume by stepping into a long-standing debate in philosophy and the cognitive sciences on the principles governing a central aspect of event cognition: the attribution of causality. The authors distinguish externalist ‘outcome’ theories, which treat causal attribution in terms of the conditions that make statements of causality true or assign them some probability, and internalist ‘process’ theories, which
12
Pederson and Bohnemeyer
view causal attribution as guided by conceptual models. They present a process model developed by Wolff and colleagues that extends Talmy’s (1988) work on the representation of ‘force-dynamic’ interactions between entities in semantics and cognition. An important advantage of the authors’ ‘force theory’ over other process approaches is that it makes the right predictions for causation-byomission scenarios, a notorious problem for process approaches. This is shown using linguistic descriptions of causation-by-omission stimuli as evidence. All in all, the chapters collected in this volume constitute an early attempt to map out a new field of inquiry. A field which recognizes event representation as more than a simple cognitive process, but the result of interaction between the cognitive systems controlling language and other cognitive and perceptual domains. More interestingly, we are beginning to assemble evidence about the ways in which these cognitive systems not only interact but constrain and influence one another. One might anticipate that the questions this volume raises will seem na¨ıve in a few decades, but for now, they are the foundational questions needing careful examination. We hope that this volume may help elevate the importance of events in models of cognitive and linguistic behavior as the fundamental nature of the event in human cognition clearly must be central to any story of human behavior and development.
2
Event representation in serial verb constructions Andrew Pawley
1
Talking about events in Kalam and English
What can linguistic representations tell us about how people conceive of events?1 This chapter revisits an earlier debate on that question which focused on event representation in serial verb constructions (SVCs) in certain languages of New Guinea. Underlying the debate, between Tom Giv´on and me, was the general question of whether people who speak languages (or linguistic genres) with different semantic categories and structures live in partly different conceptual worlds or whether such linguistic differences are largely superficial and are not a reliable indicator of differences in worldview. The debate was provoked, in part, by a paper comparing the way events are reported in English and in Kalam, a language spoken by about 20,000 people in the Bismarck and Schrader Ranges, on the northern fringes of the central highlands of Papua New Guinea (Pawley 1987).2 Giv´on felt that my conclusion that English and Kalam have markedly different conventions for reporting events, so that isomorphic or quasi-isomorphic translation of the reports was often impossible, could be read as adopting a position of “extreme culturerelativism” (1990: 22). A central issue was the definition of ‘(conceptual) event’ and the degree to which there is isomorphism between event boundaries defined by syntactic, semantic, and pause-placement or intonational criteria, respectively. Kalam belongs to the large Trans New Guinea (TNG) family, containing some 400 languages, which dominates the central highlands of New Guinea. 1
2
I am indebted to J¨urgen Bohnemeyer for incisive editorial comments on a draft of this chapter and to various colleagues and students, in particular Wally Chafe, George Grace, Tom Giv´on, and Jonathan Lane, for helpful discussion of many of the issues treated here. The main data source is an extensive collection of tape-recordings and texts on Kalam traditional knowledge and use of animals and plants by Ian Saem Majnep and his collaborators, chiefly Majnep and Bulmer (1983, 1990, n.d.) and Majnep and Pawley (n.d.). My fieldwork on Kalam was supported by grants from the Wenner-Gren Foundation, the University of Auckland, and the University of Papua New Guinea. The 1987 paper was presented at a 1984 conference at Eugene, Oregon and preceded Giv´on’s fieldwork in Papua New Guinea in 1986. Giv´on notes that he was also responding to a 1976 draft paper on Kalam that was an early version of Pawley (1993).
Pawley
Kalam speakers in the Kaironk, Simbai and Asai Valleys had their first contact with government patrols in the mid 1950s. In the early 1960s scholars from the University of Auckland began a long-term project among the Kalam of the Upper Kaironk Valley, investigating their perception and use of the environment, social organization, material culture, and language.3 I joined the project in late 1963, with the assignments of writing a grammar for my PhD thesis and helping to compile a dictionary. At that time no detailed description of a New Guinea highlands language had been published, but these languages had already gained a reputation for being “exotic” and “difficult” in much the same way as many North American Indian languages: having some grammatical and semantic categories alien to European languages. When I began fieldwork among the Kalam I was not especially looking for linguistic exotica. My immediate aims were to figure out how the language works and to gain a good practical command of it. But of the Pacific Island languages I tried to learn during the 1960s, when I was in my late teens and 20s (chiefly, Maori, Samoan, Kalam, Tok Pisin, Bauan Fijian and Western Fijian), Kalam was by far the hardest to gain an idiomatic command of. To this day, long after I gained a working fluency in the language, I am still unsure how the Kalam will say many seemingly rather ordinary things. It soon became clear that among the hardest things to learn about this language were the conventions for constructing reports of events and event sequences. My 1987 paper, drawing in particular on the ideas of George Grace (1981, 1987), sought to find a framework for comparing event reports in English and Kalam, one that would allow the similarities and differences to be described and would provide reasonable criteria for judging whether the differences were superficial or profound. If two different reports of an observed or imagined event give essentially the same information (i.e., specify the same conceptual elements and relations), but merely package that information in a more condensed or diffuse manner, the reports can be said to be quasi-isomorphic. The question was whether Kalam and English conventions for describing events often differ more substantially than that, requiring different kinds of information to be singled out for mention, making quasi-isomorphic translation impossible. Among the conclusions presented were the following: (i) Descriptions of observed events are always selective, interpretive representations of experience. Event reports typically follow stereotyped patterns (schemas), provided by the favored construction types of the 3
The team project was initiated by the late Ralph Bulmer, one of the pioneers of modern ethnobiology. It involved linguists, social anthropologists, specialists in the natural sciences, and archaeologists, as well as members of the Kalam community who became collaborators and co-authors.
Event representation in serial verb constructions
(ii)
(iii) (iv)
(v)
4
15
language. (The same point is made by Slobin 1987 as well as by Grace 1987.) Speakers of Kalam and English characteristically report some observed events in very similar ways and this common core presumably reflects properties of certain events in the “real world” that are salient for people everywhere. In languages generally, clause structure is peculiarly adapted for the task of depicting simple conceptual events. Clauses contain grammatical mechanisms for saying who did what, with which and to whom, where and when, i.e., for separately specifying an act (process, state), the participants and their roles, and the temporal and spatial setting. Also part of clause grammar are mechanisms for associating a modality with the representation of the event (saying whether it is asserted, questioned, hypothesized, etc.). However, Kalam and English differ markedly in the kinds of events they can describe in a single clause. A major factor in this difference lies in the number of verb roots available. Kalam has only about a hundred verb roots whereas English has many thousands. Put another way, and even allowing for polysemy in verbs, the number of conceptual events that Kalam can express by single verbs is just a small fraction of the number that can be expressed that way in English. A rough classification of the kinds of conceptual events denoted by verbs in English might include (among other types): (a) ‘simple’ events (corresponding to single well-bounded acts that do not involve causal relations, e.g. wink, slap, and shout); (b) ‘complex’ events (clearly analyzable into two or more immediately related sub-events), such as are denoted by simple transitive verbs like break, split, sever, fill, bring, and take or by phrasal verbs like break off, break through, knock down, and throw out, which lexicalize minimal causal relations; and (c) ‘episodic’ events (often not readily analyzable into sub-events, but implying a sequence of routinely associated acts that may be discontinuous in time and place), such as are denoted by verbs like construct, dismantle, farm, hunt, legislate, and debate.4 Whereas English is rich in verbs denoting causal chains and is quite rich in verbs for episodic events, Kalam has few such verb roots. For example, one cannot say in Kalam that ‘something broke X’; one must say ‘something happened to X and it broke.’ One cannot say ‘He landed the plane’; one must say something like ‘He having controlled the plane, it came and landed.’
These are prototypical categories. There are some types that are intermediate between the major categories but for reasons of space these will not be discussed here. There is an extensive literature on event structure and the typology of events, which I will not try to review here. See, for example, Bohnemeyer et al. (2007), Croft (1990), Dowty (1979), Jackendoff (1990), Levin and Rappaport Hovav (1995, 1996), Parsons (1990), and Talmy (2000).
(vi) Where Kalam has an approximate equivalent expression for an English verb denoting a 'complex' event, that equivalent usually consists of a series of verb roots, with only the final verb carrying inflections. Some such verb series consist of two verbs (English taste = Kalam 'consume perceive', feel = 'touch perceive', bring = 'get come'), some of three verbs (remove = 'get go dispose'), and a smaller number of four or more.

(vii) To report certain kinds of routine episodes, English speakers commonly use a metonymic strategy, in which one or two component acts stand for the whole episode, with the remaining acts taken as understood, e.g. What did you do this morning? – I went to the supermarket, or I went to the doctor (where the act of going to a certain type of place is understood as implying that the speaker also did other things at that place that one usually does) or I gathered firewood (where gathering is understood as implying a normal routine, in which the gatherer went out, found, picked up, brought back and stored the firewood). By contrast, Kalam favors a more explicitly analytic strategy, in which several component acts are mentioned. It is possible, and indeed common, to represent such routine event sequences by a series of verbs packed into a single clause-like construction. In example (1) such a construction containing seven verb roots describes a routine sequence associated with making a camp for the night. (In all example texts, verb roots and their glosses both appear in bold face. In multi-clause examples, successive clauses are distinguished as i, ii, etc.)5

(1) Kik am mon pu-wk d ap agi kn-ya-k.
they go wood hit-smash get come ignite sleep-3PL-PAST

A fairly literal English translation of (1) would occupy several clauses: 'They went and gathered firewood and brought it, made a fire and slept.' However, a free translation might say simply, 'They gathered firewood for the night,' where the act of gathering can, in context, be understood as implying the associated acts.

(viii) English does, however, have ways of squeezing into a single clause reports of certain kinds of episodic events which Kalam cannot compress in this way. For example, one can say the man (1) threw the stick (2) over the wall (3) into the garden, where (2) and (3) are prepositional adjunct phrases denoting sub-events in a three-event episode. To render this in Kalam one must use a three-clause description, saying, approximately, 'the man held (and) threw the stick, it went over the fence and dropped in the garden.' That is to say, Kalam uses verbs to denote direction of movement where English uses prepositions.

(ix) If English and Kalam minimal reports of a given event specified the same semantic elements and relations, differing only in the number of clauses needed to represent them, one could say that the reports were quasi-isomorphic. But in many cases this is not so. In particular, the conventions of Kalam often require information to be mentioned that is absent from the English version. (See the comparison in (vii) above of the metonymic strategy favored in English and the more explicitly analytic strategy favored in Kalam. See also section 5.3, especially example (21) and the following discussion.) As a consequence, in many cases Kalam and English event reports are not fully intertranslatable.

Givón objected to my assumption that each verb root in a serial verb construction represents a separate conceptual event and so (by definition) that the verb sequence in a SVC codes a sequence of conceptual events. This equation he regarded as being too firmly anchored in a belief that grammatical categories are isomorphic with cognitive categories. To assume that a single verb (a grammatical category) codes a single event (a cognitive unit) leads to a circularity of method, in which grammar is first used to define cognitive categories and then is said to correspond to them. Givón proposed to use pause placement as an independent indicator of how speakers segment events. In 1986 he traveled to Papua New Guinea to conduct an experiment with speakers of Kalam and four other languages that use SVCs. The results led him to conclude that the grammatical differences between single verbs and SVCs are merely superficial differences of linguistic organization and should not be taken to indicate cognitive differences, that is, differences in how speakers perceive event boundaries (Givón 1990, 1991a). Instead, serial verb sequences should be viewed as lexicalized units, functional equivalents of single verbs in languages that lack serial verbs.

The rest of this chapter is organized as follows. Section 2 provides brief background notes on Kalam grammar. Section 3 elaborates on Givón's critique of my claims and describes his experiment and the conclusions he draws from it. Section 4 presents some logical objections to Givón's critique. Section 5 presents an empirical objection: Givón's study overlooked an important distinction in Kalam between 'compact' and 'narrative' SVCs. Whereas compact SVCs are indeed often semantically equivalent to a single verb in English, narrative SVCs are not. Section 6 draws some general conclusions from the debate.

5 The following abbreviations are used in glossing Kalam examples: D – dual; DS – different subject (from following verb); DUR – durative; FUT – future; IMM – immediate past; IMP – imperative; ITER – iterative; PL – plural; PAST – remote past (yesterday or earlier); PERF – perfect (denotes present perfect, present habitual and today's past); PASTHAB – past habitual; PRIOR – prior to (the event denoted by following verb); SG – singular; SS – same subject (as following verb); 1, 2, 3 – 1st, 2nd, 3rd person; - – morpheme boundary within a phonological word; = – clitic boundary within a phonological word.
2 Notes on Kalam grammar
This section outlines some features of Kalam grammar relevant to the discussion that follows.6

2.1 Word classes

Of the major parts of speech – nouns, verbs, verb adjuncts, adverbs, adjectives and directionals – verbs and verb adjuncts are of particular relevance here. Verbs are the only part of speech to carry inflectional suffixes marking tense, aspect or mood, subject person-and-number, and anticipatory switch reference. Verb roots are a small, closed class with about 130 members.7 There are no morphological processes for deriving new verb stems. However, the stock of verb roots is augmented by several classes of multi-word predicates, including verb adjunct constructions and serial verb constructions (see below). Verb adjuncts are words that occur only as the partner of one verb root, or a few verb roots, with which they form a complex predicate, called a verb adjunct construction (VAC), e.g. suk ag- (laughing say) 'to laugh', kleηd am- (crawling go) 'to crawl', gadal badal g- (higgledy-piggledy do) 'place things higgledy-piggledy or criss-crossed'.8 (In these examples verb adjuncts and their literal glosses are underlined.) In VACs the verb root serves as a classifier, marking the event as being of a certain general type. The verb adjunct specifies the subtype, or an activity associated with the one depicted by the verb root. A VAC can occur as a predicate by itself or it can fill a verb slot in a serial verb construction. VACs form an open class of predicates with several hundred recorded members, often translatable by a single verb in English.

6 In most respects Kalam's morphological and syntactic patterns are typical of the Trans New Guinea family. However, it allows more elaborate serial verb constructions than most TNG languages. Kalam has two main dialects, Etp and Ti, which show considerable differences in morphological forms and lexicon. Examples cited here are from the Ti dialect of Gobnem, in the Upper Kaironk Valley.

7 My 1987 paper states that Kalam has about a hundred verb roots. Since then another thirty or so have been discovered. The dictionary distinguishes about 400 senses among the 130 verbs, a much smaller amount of polysemy than is recorded for the most common 130 English verbs.

8 'Verb adjunct' is the usual term for this word class in Papuan linguistics. In descriptions of Australian languages a category with similar characteristics is often called 'coverb' or 'preverb.'

2.2 Verbal clauses

A verbal clause consists minimally of a verb inflected for tense/aspect/mood and subject reference. However, one or more bare verb roots may precede the inflected verb in a clause (see section 2.4). In transitive clauses the canonical order of major constituents is Subject Object Verb. If there is a secondary object it usually precedes the primary object, as in (2):
(2) An np moni ñ-a-k?
who you money give-3SG-PAST
'Who gave you money?'
Only one inflected verb is allowed in a clause. Verbal clauses are classified according to the kind of inflected verb that is their head or obligatory element. Independent verbs are able to stand alone as the head of a complete sentence. They carry suffixes marking subject person-and-number and tense/aspect/mood with absolute reference (i.e. deictic reference with respect to the speech situation). Coordinate-dependent verbs (often called medial verbs in descriptions of Trans New Guinea languages) are dependent on the final clause in a sentence for a full interpretation of their tense-aspect and subject reference. They carry suffixes marking subject and tense reference relative to the next verb: whether the verb has the same subject (SS) as the next verb or a different subject (DS), and whether the event denoted by the verb is prior to, simultaneous with, or future to that of the following verb. However, in other respects they are coordinate with, rather than subordinate to, the final verb, hence the term 'coordinate-dependent,' used by Foley and Olson (1985). The most common suffixes marking same subject and relative tense are -l 'SS:prior', -lg 'SS:simultaneous' and -ng 'SS:future'. The basic forms of the different subject markers are -e- 'DS:prior' and -knη 'DS:simultaneous.' A coordinate-dependent verb marked for change of subject in the next verb carries a separate suffix marking the person-and-number of its own subject, e.g. k-na-knη (sleep-2SG-DS:simultaneous) 'while you were sleeping (someone else did . . . ).'
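The relative-tense and switch-reference suffixes just described lend themselves to a simple tabular restatement. The following minimal sketch (hypothetical code, not part of the chapter; it simply re-encodes the glossing labels above) may help readers track the medial-verb glosses in the examples below.

```python
# Hypothetical lookup of the Kalam medial-verb suffixes described above,
# mapping each suffix to its switch-reference / relative-tense gloss.
MEDIAL_SUFFIXES = {
    "-l": "SS:prior",           # same subject, event prior to next verb
    "-lg": "SS:simultaneous",   # same subject, simultaneous with next verb
    "-ng": "SS:future",         # same subject, future relative to next verb
    "-e-": "DS:prior",          # different subject, prior
    "-knη": "DS:simultaneous",  # different subject, simultaneous
}

print(MEDIAL_SUFFIXES["-l"])  # 'SS:prior', as in am-l 'having gone'
```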
2.3 Clause-chaining constructions
It is common for a long chain of medial (coordinate-dependent) clauses, marked for same subject and relative tense, to precede an independent clause. A non-final intonation juncture (written here as a comma) must occur after each coordinate-dependent clause except the final one, that which immediately precedes the independent clause. Sometimes such chains of medial clauses number more than fifteen. In (3), clauses ii–ix constitute a chaining construction within the larger construction i–x.

(3) i. . . . aps-basd=yad md-elgp-al won ok,
grandmo.-grandfa.=my live-PASTHAB-3PL time that
'. . . at the time when my grandparents were alive,
ii. kmn=nen gos nη-l,
game=after thought perceive-SS.PRIOR
having planned to go after game mammals,
iii. am-l,
go-SS.PRIOR
having gone out,
iv. kmn tap nb ogok ti ti d-l,
game food like those what what obtain-SS.PRIOR
having gathered various plants for (cooking with) game mammals,
v. ad ñb-l,
cook eat-SS.PRIOR
having cooked and eaten them,
vi. kn-l,
sleep-SS.PRIOR
having camped out overnight,
vii. am-l,
go-SS.PRIOR
having gone out,
viii. ap-l,
come-SS.PRIOR
having come back,
ix. g-elgp-al ak,
do-PASTHAB-3PL topic
those (things) they used to do,
x. mñi ag-ngab-in.
now say-FUT-1SG
I am now going to talk about.'

'I'm now going to describe how, in the time of my grandparents, when people planned to hunt game mammals, they would go out and gather certain plants and cook them in stone ovens and eat them, and sleep out (in the forest), and after going out and coming back (to camp) they would do these things.'
2.4 Serial verb constructions
The predicate of a serial verb construction (SVC) in Kalam has as its nucleus a verb series in which one or more bare verb roots precede an inflected verb root without any intervening conjunctions. Some of the main features common to SVCs are illustrated by (1) above and (4), by the three clauses in (5), and by the second and third clauses of (6).9
(4) Am d aw-an!
go get come-2S.IMP
'Fetch (it)!' (lit. 'Go get (it) and come!')

(5) i. Ami . . . taw tb tk-l,
mother step cut sever-SS.PRIOR
'My mother . . . having stamped on and closed off (the entrance to the bandicoots' burrow),
ii. tug tb tk d-e-k, . . .
holding.in.hand cut sever hold-3SG.PAST
she took hold of them (one by one) and closed off (the entrance)
iii. mey pak l-a-k mamd ak.
thus kill finish-3SG-PAST five that
and in this way killed all five.'

(6) i. . . . kayn ak ney awsek am-ub,
dog the he alone go-3SG.ITER
'. . . the (hunting) dog, he goes out alone,
ii. ñn ak ognap wt-sek d ap tan d ap yap g g suw-p,
day the some pursuing get come ascend get come descend do do bite-ITER.3SG
some days he goes about chasing all over the place and makes kills,
iii. ñn ak ognap wt-sek d ap tan d ap yap g g met nη-l
day the some pursuing get come ascend get come descend do do not find-SS.PRIOR
some days after chasing (animals) back and forth and not having caught any,
iv. adkd katp ow-p.
turning.back (adv.) house come-ITER.3SG
he comes back home.' (KHT ch. 19: 28)

9 There are a few kinds of 'non-canonical' SVCs, of which I will say little here. These include SVCs where the final (inflection-carrying) verb is grammaticalized, serving as an aspectual marker or as an auxiliary verb taking a preceding SVC as its complement (Lane 2007; Pawley and Lane 1998; Pawley 2008). Grammaticalized verbs do not denote a separate event in the same sense as lexical verbs do.
In a canonical SVC each verb root has a lexical meaning (usually its primary sense) as opposed to a grammaticalized meaning, and denotes a distinct conceptual (sub-)event in the event sequence. To the extent that the events represented in a SVC are temporally discrete, their order matches the temporal order of the verbs that represent them. All the events are of roughly equal semantic importance, i.e. none is subordinate to another.

SVCs have a number of characteristics, phonological, grammatical and semantic, that support the view that they are tightly integrated and belong to a single clause. This is the case even though there is, in principle, no grammatical limit to the number of verb roots that can occur in a single SVC. In practice – if we exclude iteration of verb roots to show repetition or continuity – the limit seems to be about nine or ten. As Givón's study confirms, the verb series is almost always uttered without internal pause and within a single intonation contour (see section 3). The shortness of Kalam verb roots is a help in this. Verb roots are mostly monosyllabic and some consist of a single consonant. Thus the sequence of eight verb roots, wik d ap tan d ap yap g- (rub get come go.up get come go.down do) 'rub up and down, as in massaging or scrubbing', consists of just six syllables and takes no longer to say than excommunicated or indefatigably. Even including nominal and adverbial constituents, narrative SVCs seldom exceed fifteen syllables and can comfortably be fitted into a single intonation contour.

All the verbs in the SVC share the same overt subject; this can be represented lexically only once, and only the final verb in the series can carry a subject-marking suffix. Only the final verb in the series is marked for tense/aspect/mood, but this marker has scope over all the verbs in the SVC; the same holds for marking of the grammatical subject. Only one object NP can occur, and this is shared by all transitive verbs in the SVC. Only one negator can occur. In most cases it has scope over the whole verb series, but there are some exceptions, to be discussed in section 5. That section distinguishes between two major types of SVC that differ in important ways in syntactic and semantic structure.
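As an illustration of how tightly these constraints bind the construction, here is a toy encoding of the SVC in example (1). This is a hypothetical sketch, not an analysis from the chapter; the field names are invented, and word order within the clause is not represented.

```python
# Toy encoding of the SVC in example (1), reflecting the constraints above:
# one shared subject stated once, at most one object NP, bare verb roots
# preceding a single inflected final verb, and one tense/subject marker
# whose scope covers the whole series.
svc_example_1 = {
    "subject": "kik",    # 'they' - shared by all verbs in the series
    "object": "mon",     # 'wood' - shared object NP
    "bare_roots": ["am", "pu", "wk", "d", "ap", "agi"],
    "final_verb": {"root": "kn", "suffixes": ["3PL", "PAST"]},
    "negator": None,     # at most one negator, normally whole-series scope
}

print(" ".join(svc_example_1["bare_roots"] + [svc_example_1["final_verb"]["root"]]))
```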
3 Givón's experimental study of event packaging in Kalam and other languages
Givón interpreted me as suggesting that speakers of verb-serializing languages differ fundamentally from speakers of other languages in their "cognitive segmentation" of events. Thus, what English speakers view as unitary events are treated by Kalam speakers as concatenations of fragmented sub-events. This argument, he complains, rests on a questionable belief in the iconicity of grammatical and semantic categories, a long-standing fallacy in the thinking of linguists that goes back to Aristotle's equation of the verb with the core of a proposition. Givón observes (1990: 23) that "the opposite view – that serial verbs are sub-parts of a single 'event' – can be argued on, essentially, the same iconicity grounds, by invoking other grammatical criteria to determine what is an 'event'." He points out that this "opposite" view has been taken by various linguists, including Foley and Olson (1985).

Givón carried out an experiment designed to investigate the cognitive processing of different kinds of verb sequences in five verb-serializing languages spoken in Papua New Guinea. He published findings for three of these languages: Kalam; Tairora, a very distant relative of Kalam spoken in the Eastern Highlands Province; and Tok Pisin, a creole whose grammar and semantics have been heavily influenced by Austronesian and Papuan languages of Melanesia (Givón 1990, 1991a). The hypothesis was that encoding certain types of verb sequences would take more time than others, depending on the degree of morphosyntactic integration (or independence) of the verbs within a larger construction. Three types of constructions were compared: independent clauses (the verb, being marked for absolute tense/aspect/mood and reference to subject, can be interpreted independently of any other verb), coordinate-dependent clauses (the verb is marked for tense and subject relative to the final verb, and in that respect its interpretation depends on the next independent clause), and serial verb sequences (where the verbs are integrated to the point of being part of the same predicate phrase). Kalam makes heavy use of SVCs, Tairora moderate use, and Tok Pisin much less use. Kalam and Tairora both make extensive use of clause-chaining constructions, using coordinate-dependent verbs, but Tok Pisin does not have this type of construction.

A six-minute action film was shown to speakers of each language. Each subject was asked to provide two narratives describing what happened in the film, one spoken 'on-line' (during a second viewing of the film), one 'post-view' (immediately after). Pause placement and intonation were used as a measure of whether a sequence of verbs denotes a sequence of conceptual events or just one event. Any period of silence longer than 100 ms was counted as a pause and associated with a processing event, i.e., an act of encoding. In psycholinguistic research pauses are well known to be associated with encoding acts, i.e., points in the flow of speech where the speaker plans a subsequent speech act (Goldman-Eisler 1968). However, pauses alone are a crude indicator of encoding activity. A more refined measure also needs to take into account intonation junctures and fluctuations in rate of articulation, as well as a number of interactional variables (Chafe 1979, 1980, 1994; Pawley and Syder 2000). If verbs A and B (and any associated material) were separated by a pause, this was taken to be evidence that they were encoded separately in planning and preverbal production, and in this sense correspond to separate cognitive events. If A and B were not separated by a pause, this was taken to be evidence that they were processed as a single chunk or package, which in turn was interpreted as evidence that they represented a single cognitive event. The hypothesis predicted that speakers would pause most often after an independent verb (not highly integrated with the next verb), less often after a coordinate-dependent verb (moderate degree of integration), and least often after a serial verb (highly integrated with the next verb, both being part of the same clause).

Kalam narrators paused between the verbs in a serial verb construction in only about 4–5% of cases, similar to the hesitation rate within single words. They paused much more often at boundaries between coordinate-dependent clauses (about 23–32% in on-line narratives and about 48–60% in post-view narratives) and consistently paused after independent clauses (81% on-line and 71% post-view). Although Kalam speakers used far more SVCs than speakers of the other two languages, all three languages displayed similar overall patterns of pause probabilities, with inter-clause transitions showing a much higher rate of pausing than transitions between verbs in a SVC. Givón commented:
In terms of temporal packaging, serial-verb clauses, on the one hand, and prototypical main/finite clauses, on the other, behave as two extreme points on this scale: the former as co-lexical stems (or grammatical morphemes) within a clause; the latter as full-fledged independent clauses. However, chain-medial verbs exhibit pause probabilities and adjacency probabilities somewhere between the two extreme poles. (Givón 1990: 49)
He concluded that serial verbs in Kalam and Tairora are consistently co-lexicalized (or, in a minority of cases, co-grammaticalized) because they "display pause probabilities that fall within the range of lexical words" (1990: 48). The main function of SVCs in Kalam, he said, is to augment the small stock of verb roots. That is to say, SVCs serve to encode conceptual events that are usually denoted by single verbs in languages with large open verb classes. In Tok Pisin, on the other hand, one of the verbs in a SVC is typically grammaticalized, serving as an aspect or causative marker:

Whether primarily a device for enriching the grammar (as is the case in Tok Pisin), or of enriching a limited verbal lexicon (as in Kalam), serial-verb constructions should be viewed within the context of the typology of lexical-syntactic coding, rather than the typology of cultural-cognitive event perception. (Givón 1990: 48)

It follows that:

Serial verb constructions . . . do not represent a different cognitive way of segmenting reality . . . . Rather, they represent a different grammatical-typological way of coding event segments. These event-segments are 'chunked' roughly the same way crosslinguistically, thus presumably cross-culturally. Cross-language uniformity in event segmentation, while never absolute, is much higher than the small core suggested by Pawley . . . . (Givón 1990: 48)
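To make the logic of the pause measure described above concrete, the following sketch computes pause probabilities for the three juncture types, using the 100 ms threshold. The code and the data points are hypothetical, not from Givón's study; the prediction is that the rates should rise from serial to medial to independent junctures.

```python
from collections import defaultdict

# Each juncture is the gap between two adjacent verbs in a narrative,
# annotated for construction type and for the silence (in ms) at that gap.
# 'serial' = within a SVC; 'medial' = between coordinate-dependent clauses;
# 'independent' = after an independent clause. All values are illustrative.
junctures = [
    {"type": "serial", "silence_ms": 0},
    {"type": "serial", "silence_ms": 40},
    {"type": "medial", "silence_ms": 350},
    {"type": "medial", "silence_ms": 80},
    {"type": "independent", "silence_ms": 620},
]

PAUSE_THRESHOLD_MS = 100  # silence longer than this counts as a pause

def pause_probabilities(junctures):
    """Return, for each juncture type, the proportion followed by a pause."""
    pauses, totals = defaultdict(int), defaultdict(int)
    for j in junctures:
        totals[j["type"]] += 1
        if j["silence_ms"] > PAUSE_THRESHOLD_MS:
            pauses[j["type"]] += 1
    return {t: pauses[t] / totals[t] for t in totals}

print(pause_probabilities(junctures))
# Predicted ordering: serial < medial < independent
```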
4 Logical problems with Givón's position
Givón's experiment clearly demonstrates a strong correlation between patterns of pause placement in multi-verb constructions and degrees of grammatical (morphosyntactic) integration. The very low probability of internal pause in SVCs, compared with sequences of medial clauses and independent clauses, is consistent with the claim that verb series in SVCs are stored as a package in the long-term memory. These findings come as no surprise. There is a good deal of evidence that when a burst of novel speech is being planned, clause-sized constructions are the favored targets and that speakers cannot, in a single planning act, encode novel lexical combinations across independent clause boundaries (see section 6 for further discussion). Although Givón was the first to provide rigorous statistical proof, the claim that the verb series in a SVC is typically spoken as a single fluent chunk has long been one of the arguments supporting the view that SVCs are single clauses rather than multi-clause constructions. However, evidence that SVCs are single clauses has not generally been taken to imply that the verb series in a SVC forms a single lexical item.

Givón goes on to make a number of broader claims about the nature of 'events,' 'cognitive' units, and 'lexical' units and how these relate to Kalam SVCs. There are some logical flaws in his case. In the first place, I suggest that, while Givón's experimental findings are valid, they do not contradict my conclusions about how events are segmented, because we are talking about different things.10 My paper was not concerned with whether or not speakers store certain verb series in the long-term memory and retrieve them as single chunks. I did not contest the "opposing" view that a SVC denotes a single larger event made up of a number of (sub-)events. Indeed, this was also my view. My concern was with the semantic structure of SVCs. As a working procedure I assumed that each verb carrying a lexical meaning represents a semantic category 'event,' without precluding the possibility that this meaning might itself be further analyzable into more atomic 'sub-events' or that a series of verbs might combine to represent a more complex kind of event.

Givón's chain of argument runs like this. We must bear in mind that 'event' is a conceptual/cognitive entity, not a grammatical one. We should avoid the circularity of definition that grammarians are prone to when they define '(conceptual) event' in terms of grammatical categories. Instead of assuming that each verb root (a grammatical category) in a SVC denotes a separate event, one should look for an independent measure of how speakers segment events when they use SVCs. Pauses in the flow of narrative speech provide such a measure, if we accept the assumption that each such pause corresponds to an act of speech planning while absence of pause means absence of such an act.

Although not everyone would accept that absence of pause invariably means absence of planning during speech, this is a sensible alternative to the grammar-based approach. However, it yields a tight circle of inter-dependent constructs, which are isomorphic, precisely the sort of thing that Givón objected to:

SVCs (verb series) are chunks of fluent speech. Therefore they are stored as memorized packages. Therefore they are lexicalized. Therefore the verb series represent single events.
Patterns of pause placement are one measure for distinguishing between word strings that form a lexical unit and those that do not, but they are hardly a sufficient measure. If we define the lexicon as containing all linguistic entities that are memorized, we will catch a strange array of diverse creatures in our net. Adopting this criterion would mean treating as lexical items thousands of familiar sentences like I'm sorry to keep you waiting so long, I wouldn't do that if I were you, or That was the last time I ever saw him. It may be true that each such sentence contains a formulaic core that is stored in the long-term memory, but in many respects these expressions deviate from typical lexical items.

And by defining 'event segmentation' solely in terms of breaks in the flow of speech Givón tacitly excludes from consideration the internal semantic structure of the fluent chunks. Individual verb roots may have meanings, but if a verb series is consistently spoken as a chunk it is regarded as a single cognitive unit, not analyzed by the speaker during the encoding process.

These two views of event segmentation are not mutually exclusive: cognition is a mansion with many rooms. There are holistic and analytic ways of knowing things. When uttering the familiar sentences referred to above, English speakers may retrieve them from memory as single (unanalyzed) packages, but when called on they are nevertheless able to analyze them semantically. A rough analogy can be made with knowing and performing a musical score. To play a concerto acceptably a pianist must automate the basic mechanics of the performance, learning to execute complex chords and sequences without conscious thought. But that is not incompatible with the capacity to reflect on scores and performances and analyze them, picking out small details.

I suggest that the chief merit of pause placement and intonation contour boundaries as diagnostics for event segmentation in multi-verb constructions is not that they provide a grammar-independent means for defining minimal conceptual events. It is that such junctures provide a grammar-independent indicator that, in the speech planning process, separate conceptual events have been integrated, treated as sub-events of a tight-knit larger unit. Insofar as the grammatical packaging of these intonational units is also tight-knit, the two measures are corroborative. Indeed, it was striking that Givón's (1990, 1991a) experiment yielded a very close correlation between degree of syntactic integration (comparing three types of multi-verb constructions) and frequency of phonological juncture.

Let us return briefly to the question: Are different linguistic representations of events associated with different ways of thinking about the events? My view is that this must be true, by definition, to the degree that the linguistic representations give different information. However, one may distinguish between representations that are: (a) fully isomorphic, i.e. have the same linguistic structure and content, differing only in superficial details of form; (b) quasi-isomorphic, specifying the same conceptual elements and relations but packaging them in a more condensed or diffuse manner; and (c) not isomorphic, because they mention different entities and/or relations.

10 In view of Givón's reservations about the idea that differences between Kalam and English ways of talking about events correlate with differences in worldview, it is perhaps ironic that our different views of how things work stem in part from our different ways of talking about the subject matter in question.
Even within the same language, a particular observed event sequence can be described in various ways, some of which are not isomorphic. Compare Floyd flattened the metal with a hammer, Floyd hammered the metal flat, Floyd hammered the metal until it became flat, and Floyd hammered the metal and it became flat. There is a sense in which all of these sentences describe two events – an activity and a state change – and in which these two events form sub-events of a single larger event. However, only the first two descriptions package the two sub-events so tightly that we get the sense that the speaker is thinking of the larger event as a single event. Viewed in these terms, the conclusion Givón drew from his experiment was that most Kalam SVCs can be given quasi-isomorphic translations into single verbs in English because they are co-lexicalized or co-grammaticalized. It is fair to say that this makes a good deal of sense for one large class of SVCs, namely compact SVCs.
5 Empirical problems: compact vs. narrative SVCs
I turn now to empirical problems with two of Givón's claims about Kalam serial verbs, namely: (a) that they are consistently co-lexicalized (or in a few cases, co-grammaticalized); and (b) that their main function is to augment the limited stock of verb roots, and their meanings are largely equivalent to those of single verbs in English. These claims overlook an important distinction between two major classes of SVC: compact and narrative SVCs. Although both classes share the semantic and grammatical constraints characterizing typical SVCs outlined in section 2, they differ in their semantic and syntactic structure.11
5.1 Compact SVCs
Most compact SVCs consist of two verb roots, though some contain three or four. Syntactically, a compact SVC is a nuclear layer predicate in the sense of Foley and Van Valin (1984) and Foley and Olson (1985). No non-verb elements can be inserted between verb roots (other than verb adjuncts, which count as part of a verb). The negative clitic and any adverbial modifiers have scope over the entire verb series. 11
11 The distinction is similar to (though not identical to) that made between 'component serialization' and 'narrative serialization' by van Staden and Reesink (2008). Narrative SVCs very like those of Kalam appear in Kalam's closest relative, Kobon (Davies 1981). Broadly similar constructions appear in some other New Guinea languages (e.g. Bruce 1986, 1988; Heeschen 2001; Farr 1999). This kind of SVC has variously been called 'condensed narrative' (Heeschen 2001), 'narrative' (van Staden and Reesink 2008), 'episodic' (Farr 1999; Pawley 1987) and 'multi-scene' (Lane 2007; Pawley and Lane 1998).
The verb series is semantically close-knit. It has the 'macro-event property' defined by Bohnemeyer et al. (2007) and Bohnemeyer et al. (this volume): temporal operators, such as tense markers and temporal adverbs, have scope over all sub-events in the construction. The sub-events are usually close-spaced in time and often connected in a causal chain. In some cases the connections people make between the constituent sub-events are probably grounded in humans' innate perceptions of physical processes. In other cases the connections depend on culture-specific knowledge of customary behavior.

Compact SVCs fall into many types according to their particular semantic and grammatical makeup. Just a few types will be illustrated here (a fuller account appears in Pawley 2008). It is important to note that each type represents a productive pattern. For each of the types exemplified in (7)–(10) below, the patterns are defined by the accompanying notes.

Verb series denoting testing or discovering events. An activity verb or verbs precedes the generic verb of perception and cognition, nη- 'perceive, be conscious, aware, see, hear, feel, smell, know, etc.'.

(7) ag nη- (say perceive) 'ask, enquire, ask for, request'
ap nη- (come perceive) 'visit s.o., come and see s.o.'
l nη- (put perceive) 'try to fit s.th., try s.th. on (e.g. clothing)'
d nη- (touch perceive) 'feel s.th. by touching (deliberately)'
ñb nη- (consume perceive) 'taste s.th.'
puηl nη- (pierce perceive) 'probe, test by poking'
tag nη- (travel perceive) 'sightsee, travel and see'
taw tag nη- (tread walk.about perceive) 'test (ground, branch, etc.) by treading'
tb nη- (cut perceive) 'make a trial cut'
wk nη- (burst perceive) 'test by cracking open, break open and inspect'
Verb series denoting transfer/connection events. A transitive verb precedes the generic verb of transfer, ñ- 'give, connect, etc.', which denotes transfer of the referent of the affected object of V1 to the recipient of V2.

(8) ag ñ- (say transfer) 'tell s.th. to s.o.'
d jak ñ- (get stand connect) 'stand s.th. against a place'
d ñ- (get transfer) 'give s.th. personally, hand s.th. to s.o.'
g ñ- (do transfer) 'fit s.th. in position, connect to s.th.'
ju ñ- (withdraw transfer) 'return s.th. to its owner, give back'
ñag ñ- (shoot transfer) 'fasten s.th., pass s.th. through and connect it (in sewing, buttoning)'
puηl ñ- (pierce transfer) 'pierce and fit/connect'
ñg pak ñ- (water strike transfer) 'wash s.o.' [ñg 'water' is a noun]
tk ñ- (write transfer) 'write s.o. (a letter)'
Verb series denoting transporting events. A verb of manipulation, usually d- (hold, handle, get, touch, control), combines with one or more verbs of locomotion.

(9) d ap- (get come) 'bring s.th.'
d am- (get go) 'take s.th.'
d am yok- (get go move.away) 'get rid of s.th., take s.th. away'
d ap tan- (get come ascend) 'bring s.th. up, fill s.th.'
d ap tan jak- (get come rise reach) 'bring s.th. to the top, fill s.th. up'
d ap tan d ap yap- (hold come ascend hold come descend) 'move s.th. up and down', or 'move s.th. back and forth'
The last verb series in (9) is a compact SVC that itself consists of two compact SVCs, d ap tan + d ap yap, whose order can be reversed.

Verb series denoting resultative or change of state events. In the simplest case, resultative SVCs contain just two verbs: V1 is transitive and specifies an activity performed by an agent, usually forceful contact. V2 is intransitive and specifies a change of state or a movement undergone by an affected entity. The conventional meaning derived from the sequence is that the state or movement is the result of the first event. The overt subject of a resultative SVC is always the agent of V1. The logical subject of V2 is not represented as a subject; if it is overtly marked in the SVC, it is as the direct object of V1.

(10) pak cg- (strike adhere) 'stick s.th. on, cause s.th. to adhere'
pak wk- (strike shattered) 'knock s.th. to bits, shatter s.th.'
pak sug- (strike extinguished) 'put out (a fire)'
pug sug- (blow extinguished) 'blow out (a flame)'
puηl ask- (pierce open) 'prise s.th. open'
puηl lak- (pierce split) 'split s.th. by wedging or levering'
taw pag yok- (step.on broken displace) 'break s.th. off by stepping on it'
tb kluk yok- (cut gouge displace) 'gouge s.th. out'
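The productive character of these patterns can be suggested with a small sketch. This is hypothetical code, not an analysis from the chapter, and the composed paraphrases are only rough approximations of the conventional meanings listed in (10).

```python
# Illustrative composition of a resultative compact SVC (pattern (10)):
# V1 names the agent's action, V2 the resulting state of the affected
# object. Glosses are taken from the lists above.
ACTIONS = {"pak": "strike", "pug": "blow", "taw": "step.on"}
RESULTS = {"sug": "extinguished", "wk": "shattered", "pag": "broken"}

def resultative(v1, v2):
    """Compose a rough paraphrase for an action verb + result verb series."""
    return f"{ACTIONS[v1]} s.th. so that it becomes {RESULTS[v2]}"

print(resultative("pak", "sug"))  # cf. pak sug- 'put out (a fire)'
```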
Upwards of 500 compact verb series have been recorded. All are included in the dictionary of Kalam (Pawley and Bulmer 2003) on the grounds that they are common usages, standardized expressions. In many cases Kalam compact verb series are translatable in English by a single verb, or a verb plus particle or adjective, or a verb plus adverbial phrase.

5.2 Narrative structure and SVCs
As their name suggests, narrative SVCs tell a short story, or parts of a story, in highly compressed form. They cannot be accurately translated by a single verb in English. To understand their conceptual structure we would do better to compare them not with English lexical verbs but with narratives.

In a well-known paper on spoken narratives in English, Labov (1973: 363) identifies the following major components of narratives:

1. Abstract. Announces the story and indicates what it is about.
2. Orientation. Identifies the initial context, e.g. time, place, and participants.
3. Complicating action. Answers the question: What happened?
4. Resolution. Reveals the outcome of the complicating action.
5. Coda. Summary remark signaling that the narrative is finished.

The components of a narrative differ from one another in their syntactic patterning. Typically, it is only the complicating action and resolution that are made up of narrative clauses, which report events using verbs in the simple past tense. By contrast, the abstract and orientation, which deal with situations and relationships rather than events, require syntactic structures that are a good deal more complex.

Kalam narratives show similar functional parts to English narratives. Narratives may be complex, containing two or more episodes within a larger story. A well-formed account of a single episode must at least describe the complicating action and the resolution, the other components being optional. The account may be spread over many clauses or be compressed into two or three clauses, or even into a single clause, by the use of narrative SVCs. The semantic links between events in a narrative SVC differ from the direct causal chain and force-dynamic links that characterize the event structure of many compact SVCs. The semantic conventions for constructing narrative SVCs are a subset of the well-formedness conditions on episodes, which specify what events should be mentioned and in what order. The events fall into two or more distinct stages, which occur in different places and in some cases are understood to be separated from other stages by considerable intervals of time. The extent of the time gaps between stages usually remains unspecified (occasionally an adverb such as kasek 'quickly, soon' modifies a particular component verb or verb series) but is understood from pragmatic knowledge.

Why, one may ask, would speakers wish to cram several stages of a narrative into a single clause? What is to be gained by such compression? To answer this we need to see narrative SVCs as part of a set of syntactic structures that contrast with each other in terms of information packaging. Their first cousins are clause-chaining constructions, in which the speaker uses a string of medial verbs to report a sequence of acts performed by the same actor (Lane 2007; Pawley and Lane 1998). Chaining constructions are preferred when speakers want to individuate particular stages in the narrative, i.e., to emphasize the temporal discreteness of the stages, or to elaborate on details, as in (3) above and (12) below. Narrative SVCs are preferred when speakers do not want to individuate the stages. In narrative SVCs individual events in the sequence are mentioned, but in the most minimal way, with little or no use of what Labov calls 'evaluative devices' – such as voice modulations, adverbial intensifiers, and descriptive phrases – to add detail and drama to the bare bones of the reported actions. And, as many of the examples show, speakers narrating a particular episode can use a mixture of strategies, using single verb clauses for some stages and narrative SVCs for others.
Collecting episodes
The distinctive semantic and syntactic features of Kalam narrative SVCs may be illustrated by examining one major class of narratives which are richly represented in our corpus: successful collecting expeditions, such as getting firewood, fetching water, picking fruit, gathering leafy greens, hunting for wild mammals on the ground or in trees, and collecting pandanus leaves to make mats or for thatching. Collecting expeditions represent a particular sort of purposeful activity, where there is both an immediate objective and an ultimate objective. Whether carried out by humans, nut-storing squirrels, nectar-gathering bees, or nesting sparrows, successful collecting expeditions have four main stages: one or more actors go forth in search of something and, having got it, they carry the goods to a convenient place and then process them or dispose of them in some way. The main stages in a well-formed minimal report of a successful collecting episode in Kalam can be sketched as in (11). (11)
Major constituents of reports of successful collecting episodes 1 2 3 4 5 MOVEMENT COLLECTING TRANSPORT PROCESSING CODA TO SCENE OF TO SCENE OF COLLECTING PROCESSING
The first three stages describe the complicating action. Stage 4 is the resolution, telling how the goods were processed or disposed of (e.g. cooked and eaten, preserved by smoking, stored, divided up, or traded). Sometimes there is a fifth stage, a kind of coda, that closes off the episode by saying, e.g., that the actor(s) slept or came home. A narrative SVC reporting a collecting episode is defined as any SVC that contains two or more of stages 1–5. Example (1) above contains stages 1–5. The corpus also contains SVCs consisting of stages 1–3, 1–4, 1–5, 2–4, 1–2, 2–3, 3–4 and 4–5. Predictably, there are no recorded cases of 1+3 and 1+4; these would be ill-formed because stages 2 and 3 describe pivotal event(s) in the complicating action and cannot be omitted from a report. For each of stages 1–5, speakers can choose to say what happened in more or less detail. Thus, it is possible to compress all five stages of a collecting episode into a single clause or to give them more extended treatment, spread over two,
32
Pawley
three or many clauses. In (12), after the orientation segment in clauses i–ii, a report of a hunting sequence is spread over the four clauses iii–vi, but only stage 4 receives some degree of elaboration. (12) i.
m˜nab ak l g-l land that establish do-SS.PRIOR ‘After that land had been created
ii.
md-e-k, exist-DS.PRIOR-3SG and came into existence,
iii.
kmn ak pak dad ap-l, (stages 2 + 3) game that kill carrying come-SS.PRIOR (the first hunter) having killed and brought game mammals,
iv.
ti ti g-l, what what do-SS.PRIOR having performed various rituals,
(stage 4)
v.
ad-l cook-SS.PRIOR having cooked
(stage 4)
vi.
˜ nb-e-k, ... eat-3SG-PAST he ate (the game mammals). . .’ (KHT Intro, #35)
(stage 4)
The next example is about gathering n˜ epek herbs. Clause i contains the gathering stage, the transport stage, and the first event in the processing stage, cooking. However, the second event in this stage, eating, occurs in clause ii and the coda is given in iii. (13) i.
ognap ksen nb tk d ap ad-l, (stages 2–4) sometimes new thus pick get come cook-SS.PRIOR ‘. . . sometimes they would gather and bring fresh ones (˜nepek herbs) and having cooked (them),
ii.
˜ nb-l, (stage 4) iii. kn-elg-al. (stage 5) eat.SS.PRIOR sleep-PAST.HAB-3PL and eaten (them), they would sleep.’ (FPKF, #17)
Narrative SVCs have a deeper constituent structure than compact SVCs. A maximal SVC reporting a collecting episode can be analyzed as containing five small verb phrases (VPs), each representing one stage. Each small VP may contain a single verb or a verb series, i.e., it may describe a single event, or an event sequence that hangs together. Most often the verb series representing one stage is a compact SVC but more sequences sometimes occur. For example, the formulaic string d ap tan + d ap yap (get come ascend + get come descend) ‘go back and forth, go up and down,’ which may occur in stage 1 or stage 3,
Event representation in serial verb constructions
33
itself consists of two compact SVCs. Small VPs do not cut across stages in an episode. At the next level up, stages 2, 3, and 4 (collecting, transport and processing) form a single constituent standing in contrast to stage 1 (movement to the collecting site) and to stage 5 (the coda, usually sleeping or return home). The verbs in stages 2–4 share the same object NP (the thing collected). They can fall under the scope of a single adverbial modifier, independently of 1. Finally the entire SVC forms a constituent, a large VP or predicate phrase, coordinate with the subject. Thus, the constituent structure of the highly recurrent lexical string, am kmn pak d ap ad n˜ b- (go game mammal kill get come cook eat), is as follows (using English glosses for Kalam words): (14)
[[go]VP [[game.mammal kill]VP [get come]VP [cook eat]VP ]VP ]VP
All the verbs in a narrative SVC may be contiguous, as is the case in example (4) and clause ii of (6) above and in (15) and (16) below. In (15) a hunting episode is spread over two clauses. Stages 1–3 are represented in clause i while stage 4, cooking and eating, is represented in ii. The object of the stage 2 and 3 verbs occurs clause-initially in i, preceding the stage 1 verb, an indication that it is topicalized. (15) i.
ii.
kmn am pak dad ap-l, game:mammal go kill carrying come-SS.PRIOR ‘. . . having gone and killed and brought game mammals,
(stages 1–3)
˜ ad nb-l katp seη ognl, . . . . (stage 4) cook eat-SS.PRIOR house old:site those they cooked and ate them at those old house sites, . . .’ (KHT Intro, #8)
For animals that live underground and are found by digging, the collecting stage is often represented by the verbs yg pak (dig kill), as in (16). The object NP is omitted here, having been established earlier in the narrative. (16)
Bin pataj ogok am yg pak dad woman young these go dig kill carrying ap-elgp-al . . . (stages 1–3) come-PASTHAB-3PL ‘Young women used to go and dig up and kill and bring back (these bush rats) . . .’ (KHT ch. 13, #29)
Narrative SVCs differ from compact SVCs in that the verbs need not be contiguous. Four kinds of non-verbal elements can intervene, marking boundaries between small VPs. First, an object NP can (and often does) follow the stage 1 verb(s) denoting movement to the scene of collecting. This can be seen in (1) and in (17) below.
34
Pawley
(17)
am kas nb ogok tk dad ap-l, . . . (stages 1–3) go leaves such these pick carrying come-SS:PRIOR ‘(they) go and pick such leaves and having brought them back, . . .’ (KHT ch. 10, #113)
Second, locative adjuncts can intervene. A locative adjunct to a stage 4 verb or verb series can occur after stage 3, as is the case in (18), in which the broad leaves of a spinach-like herb, bep, are gathered and put into an oven pit. . . . mj – bep tk d ap nb okyaη leaf-spinach pick get come place below
(18)
yok-l, . . . (stages 2–4) throw-SS:PRIOR ‘. . . having picked and brought bep leaves and thrown (them) below (into an oven pit), . . .’ (KHT ch. 1, #72)
Alternatively, a locative adjunct to a stage 2 verb or verb series, as well as an object NP, can separate this from stage 1 material, as in (19): Ney am okok kmn-nen gtag tag pak dad ap-l,. . . she go around game-after travel travel kill carrying come-SS:PRIOR ‘She used to go and walk about killing and bringing back game mammals, . . .’ (KHT ch. 10, #35)
(19)
Thirdly, an adverbial modifier can occur between the stage 1 verb(s) and the following verbs. In such cases the scope of the modifier may be over the whole SVC or just over the verb(s) that follow the modifier. In the case of (20) it is probable that the speaker intended kasek ‘quickly’ to modify only the final verb. (20) i.
. . . maj-wog ogok g ym-e-l, sweet.potato-garden these do plant-DS:PRIOR-3PL ‘. . . after they had made these sweet potato gardens,
ii.
˜ (kupyak) ap kasek nb-e-k (rat) come quickly eat-DS:PRIOR:3SG-PAST (the rat) came and soon ate (there).’ (KHT ch. 13, #68)
(stages 3–4)
Fourthly, a negative clitic may precede the final verb in a narrative SVC. In compact SVCs the negative clitic must precede the entire verb series and it always has scope over the entire series. In narrative SVCs there are more options. First, the non-emphatic negator ma- can precede the entire verb series and have scope over it. Second, ma- can precede the final verb in the series but have scope over the whole series. Third, ma-, or the emphatic negator met, can precede the final verb in the series, but have scope only over that verb, as in (6iii) above. The question arises, where do narrative SVCs fit into a typology of syntactic constructions? In terms of intonation, narrative SVCs with continuous
Event representation in serial verb constructions
35
verb series, even those containing eight to ten verb roots, behave like single clauses, being almost invariably spoken under a single intonation contour. However, pauses do sometimes occur when the verb series is discontinuous, specifically, after a stage 1 VP that is followed by a heavy locative and/or heavy object phrase. In such cases, the likely reason for the pause is that there is new information in the non-verbal constituents and the encoder has to pay close attention to these. Compare English single-verb clauses with heavy complements or modifier phrases, which often exhibit internal pauses (Chafe 1987). In terms of the criteria employed by Foley and Olson (1985), narrative SVCs are typologically diverse. Some qualify as nuclear layer constructions because the verbs are contiguous, and share all arguments and peripheral phrases. In other cases the stage 1 VP appears to be joined to the other VPs at the core layer (it does not share the direct object but shares other material, such as tense-aspect and mood, and scope of negation). Both these types would count as single clauses in their typology. In a small minority of narrative SVCs, one VP appears to be joined to the rest at the peripheral layer (cases where the scope of a locative adjunct, adverbial modifier, or negator is restricted to just one of the VPs). These would count as separate clauses. However, a sharp one-clause vs. two-clause taxonomy seems counter-intuitive. What we have here, surely, is evidence for a continuum of clause-like constructions, with some constructions meeting more of the diagnostic criteria than others. We may ask, why would speakers want to squeeze a report specifying a long sequence of sub-events into a single clause-like frame? What is to be gained by such compression? There appear to be two kinds of advantages, both having to do with packaging information for a fast ride. The first has to do with choices in the way of telling a story, in choosing how much detail to provide. Narrative SVCs are preferred when speakers do not want to individuate the stages. In narrative SVCs individual events in the sequence are mentioned but in the most minimal way, with little or no use of evaluative devices – such as voice modulations, adverbial intensifiers and descriptive phrases – to add detail and drama to the bare bones of the reported actions. When speakers want to individuate particular sub-events in a narrative sequence – whether merely to emphasize the temporal discreteness of the stages, or to elaborate on other details – they must choose multi-clause constructions. The second advantage, related to the first, is in economy of processing. The ‘narrative sequence’ in a narrative SVC is reduced to a more or less fixed form of words, a speech formula that can be retrieved as an automatic chain from episodic memory. One measure of the rigidity of narrative SVCs, and more generally of Kalam narrative style, is the fact that speakers often recount a whole episode, with
36
Pawley
all its sub-events, even when the main point being made relates to just one sub-event in the sequence. This apparent transgression of the Gricean principle of economy can be seen both in narrative SVCs and in clause-chaining constructions. Consider (21): (21) i.
ii.
˜ As nb-ak yg pak d ap nb-l, ... (stages 2–4) small.mammal like-this dig kill get come eat-SS.PRIOR ‘After digging up killing bringing (home) and eating this kind of animal, b mnek wog ksen ma-a-b-al. man next.day garden new not-go-PERF-3PL men don’t go into newly planted gardens for the next few days.’
It is only the act of killing this kind of animal that makes a man ritually dangerous to crops. The other four sub-events represented in clause i (the mode of capture, transport, cooking, and eating of the animal) are not strictly relevant to the point the narrator is making. Thus, an idiomatic English translation would simply say 'After killing this kind of animal, men don't enter newly planted gardens . . . '

In such cases, why do speakers bother to mention the superfluous sub-events? There appear to be two mutually reinforcing factors: (i) habit: because the formula for the whole event sequence is stored in the long-term memory it is just as easy, or easier, to retrieve the whole sequence than to pick out the most salient sub-event(s); (ii) convention favors it: it is good style to mention all the sub-events in a routine narrative sequence. A comparison may be made with superfluous grammatical elements. It is a commonplace that, in most languages, the rules require some grammatical elements to be included in contexts where they could easily be done without (for example, in English the plural marker -s is redundant on nouns which are modified by the numerals two, three, etc.). This freezing of habitual usages can occur in the domain of discourse content as well as in grammar and lexicon.

5.4 In what sense do narrative SVCs represent a single event?
Clearly, the syntactic complexity of narrative SVCs is such that it makes little sense to treat them as complex lexical units. Narrative verb serialization is highly productive. Particular instances of narrative SVCs are based on generalized semantic and grammatical schemas (constructions) that can generate an indefinitely large number of strings. In addition, as we have already argued, the semantic structure of narrative SVCs is not to be understood in terms of patterns of lexical semantics but in terms of the conventions of story-telling.12 Narrative SVCs have close relatives in the form of chains of coordinate-dependent clauses that share the same subject and that describe familiar event sequences. Clause chaining is preferred when the speaker wishes to separate (and perhaps elaborate on) the sub-events or stages in the narrative sequence. Narrative SVCs are preferred when the speaker prefers to give a routine, minimalist description, without individuating the sub-events.

12 It might be argued that one function of narrative SVCs is to make up for the lack of episodic verbs such as 'dismantle,' 'farm,' 'hunt,' and 'shop,' which stand for a familiar sequence of events. However, the event structure of some narrative SVCs is more complex than any single verb in English.

I conclude that Givón's claims that Kalam SVCs are either co-lexicalized or co-grammaticalized and that their main semantic function is to supplement Kalam's small class of verb roots do not apply to narrative SVCs. But if narrative SVCs represent a sequence of separate events, in what sense can they also be said to represent a single event? It is generally acknowledged that there are constructions in which two or more sub-events hang together to form a larger event, e.g. English resultatives such as Jane threw the cat out and John wiped the table clean, and causatives such as Sally caused her sister to cry and That made me so angry. But where does one draw the line between constructions that describe a single complex event and constructions that don't? What about The plumber came and cleared the blocked drain this morning, or I've just washed and hung out the clothes, or I spent this morning washing, drying and ironing all of Bill's shirts? Clearly in each case there is a sequence of events that together make up an episode, well-defined in terms of cultural norms. But what criteria can one appeal to in deciding whether, in the mind of the speaker, the events in a sequence are so closely associated as to form parts of a single larger conceptual event?

I suggest that the formulaic character (or otherwise) of event reports is a particularly significant clue to their cognitive standing. If people use much the same form of words over and over to report a certain sequence of events, there can be little doubt that they are drawing on a conceptual schema that is, in some sense, stored as a single unit. However, familiar event sequences can be described using multi-clausal formulas. Do we want to restrict the field to those event sequences that are syntactically very tight-knit? For many, single clausehood remains the sine qua non of eventhood: i.e., the sequence of predicates must behave as a single large predicate. This seems sensible in principle. The problem is that the constructs 'clause' and 'predicate (phrase)' have fuzzy boundaries. Some multi-predicate syntactic constructions have all or almost all the properties of prototypical clauses and others have just some of the properties (Foley and Olson 1985), and the nature of the constructions will differ from language to language. We have seen that Kalam narrative SVCs are themselves a diverse class of constructions, with subtypes that occupy different points on a scale of being more or less like a prototypical clause.
Bohnemeyer et al. (2007) and Bohnemeyer et al. (this volume) propose a measure for event segmentation that applies across languages regardless of construction type, which they term the 'macro-event property.' This has to do with whether the sub-events expressed by a construction can individually take operators marking temporal position (tenses, adverbials, temporal clauses). A construction expresses a macro-event if the sub-events it entails are not individuated temporally – in more formal terms, if and only if the time-positional or durational operators have scope over all sub-events that are represented in the construction.

Several of the stages in a narrative SVC are understood, from people's knowledge of the world, to take place at different times and different places. But importantly for our concerns, narrative SVCs exhibit the macro-event property in that the predicates denoting the sub-events all fall under the scope of a single tense-aspect-mood marker (which occurs on the final verb). The situation with regard to adverbial modifiers is less clearcut. Stages cannot be separately specified as occurring at different times ('earlier,' 'later,' 'yesterday,' 'today,' etc.). However, although only a single modifier referring to duration (e.g. 'quickly,' 'slowly') can occur in a narrative SVC, it can modify a single 'small VP', as in (20). The situation regarding locative modifiers is also not clearcut. It is rare to find more than one locative present in a narrative SVC. However, SVCs reporting gathering episodes can contain a locative NP that locates only the action of the first 'small VP' (movement to the scene of the pivotal action), or only the action of the pivotal action VP or only the action of the VP denoting processing or disposal of the goods.

In these respects certain narrative SVCs are less tightly integrated than prototypical macro-events. In particular, a case can be made for treating the initial 'movement to the scene of pivotal action' as a separate macro-event in the sequence. It seems, then, that all narrative SVCs fulfill some of the criteria usually considered diagnostic of single eventhood, and some fulfill most, but there are some discrepancies. One might reasonably conclude that in this case, as in many others, the analyst's search for water-tight categories cannot be fully successful. Ordinary language users are comfortable operating with categories that leak.

6 Reflections on event representation and cognition
Are there lessons to be learned about the interface between language and cognition from this debate about event representation in Kalam and English? In this final section I reflect on several issues.

(i) Let us begin with the notion 'event' itself. What is it good for? What explanatory value does it have? How can we connect linguistic and cognitive event representations?
For linguists the theoretical construct 'event' is useful insofar as it helps to tie together various features of language behavior. 'Events' are ideas, conceptual constructions that exist in the minds of language users, mental representations of bounded happenings. Something of a consensus has emerged that, in a particular language, a particular event idea will typically be expressed: (a) by a certain kind of syntactic structure, namely a clause13 with a simple predicate (a single verb or adjective in the predicate phrase) as opposed to a complex, multi-headed predicate; (b) within the span of a single intonation contour or fluent burst of speech. To this I would add (c) in cases where an event report consists of several words, these words are likely to take the form of a formula, a familiar collocation. That is to say, a semantic construction, event X, is likely to be framed as a simple clause, and this construction is likely to be uttered as a single fluent unit, using a prefabricated form of words retrieved from episodic memory.

13 The notion that speakers typically introduce one new idea per clause (whether it be in the form of a nominal argument, a predicate or adverbial) is advanced, in various ways, by Chafe (1987, 1994); Givón (1984); and Du Bois (1987).

(ii) The connection between clause and single event is unsurprising and indeed circular, insofar as 'events' are defined as simple propositions, as clause-sized happenings. But there is no such logical connection between events and intonation units or between events and speech formulas. Insofar as these connections hold within and across languages they must be explained in terms of how the human brain processes information.

(iii) Chafe (1979, 1980, 1987, 1994) found experimental evidence for distinguishing two kinds of cognitive processes that play complementary roles in organizing speech, and which he terms, respectively, a 'focus of consciousness' and 'scanning a center of interest.' A 'focus of consciousness' is a concentrated, short-lived mental act in which the speaker encodes a limited amount of information, typically including one 'new idea unit.' During connected discourse such foci typically occur before a short burst of fluent speech that follows an intonation boundary or a pause of less than half a second. In English, such fluent bursts have a mean of about five words in length, typically fall under a single intonation contour and often correspond to a clause. It seems that the simple (single predicate) clause is a unit that encompasses roughly the amount of new information that can easily be organized and encoded in a single concentrated focus of attention (a point also made by Pawley and Syder 1983, 2000). By contrast, 'scanning a center of interest' is an extended process in which a certain range of related information held in 'peripheral' or 'semi-active' consciousness is explored and organized. It is typically associated with a break of more than a second in the speaker's discourse flow. The linguistic outcome is often an extended sentence, made up of a sequence of discrete bursts of speech, each representing a different idea unit, strung together to describe, say, a single episode or complex situation.

(iv) Given that people perceive certain event types as recurrent, it is natural that over time language communities will develop a repertoire of clause-sized schemas for describing broad classes of events and a large store of lexically specified formulaic expressions for denoting particular event types. Being able to draw on such a store of ready-made expressions ensures that event reports will be fluent and easily understood (Grace 1987; Pawley and Syder 1983; Wray 2002).

(v) Some event ideas may be analyzed into sub-events and this process of complex event formation does not stop at single verbs and adjectives. By the 1980s it had become clear that to develop a comprehensive typology of event representations found in and across languages, it was necessary to look beyond simple predicates. There was an expansion of research on complex predicates (e.g. Alsina, Bresnan, and Sells 1997) and serial verb constructions (summarized in Durie 1997), some of which was concerned with event structure. (Among the questions asked in my 1987 paper were: What is a possible event structure in a single clause? How far do languages differ in what sorts of combinations of events they can express in a clause?) A problem is how to decide when a multi-predicate sequence represents a single complex event, one that is semantically tight-knit, as opposed to a looser concatenation of event ideas. Most analysts would insist that the multi-predicate sequence must be a single clause, i.e., that the predicates behave as a single large predicate. However, the categories 'clause' and 'predicate' have fuzzy boundaries. For example, Kalam SVCs are a diverse class of constructions with subtypes that occupy different points on the scale between prototypical clause and clause sequence. The semantic diagnostic proposed by Bohnemeyer et al. (this volume) would encompass most narrative SVCs but probably not all.

(vi) Givón (1990, 1991a) interpreted me as suggesting that speakers of Kalam differ fundamentally from English speakers in their cognitive segmentation of events, specifically in those cases where the Kalam use a SVC to represent certain sequences of sub-events that English speakers can only represent explicitly by a sequence of independent clauses.14 His experimental study of event reports in Kalam and certain other languages yielded results that he viewed as indicating that Kalam and English speakers segment the stream of events in approximately the same ways.

14 English and Kalam schemas for reporting events differ in more ways than those investigated in Givón's experiment (see sections 1, 4 and 5 of the present chapter).
However, Givón and I were talking about different aspects of cognition. There are many different kinds of 'cognitive' processes, such as paying attention and focusing, comprehending, remembering, reasoning, analyzing, planning, discriminating, categorizing, generalizing and schema-constructing, all of which can be regarded as kinds of information-processing functions. Comparison of the way people cognize about events can be viewed from the standpoint of any of these various information-processing functions.

Givón approached the analysis of serial verb packaging and event segmentation from the standpoint of on-line speech planning, as measured by breaks in the stream of speech. This approach reflects his long-standing interest in the roles that attentional activation and searching in memory storage play in speech processing (Givón 1979). Pauses and intonation junctures were equated with discontinuities in the flow of thought, an equation that provides a way of defining boundaries between events, as processing units, that is independent of their grammatical and lexical representation.

In the 1987 paper I was not concerned with temporal measures of how speakers chunk information when encoding speech. I was concerned with semantic categories and schemas, represented by particular linguistic constructions, and with speakers' judgments about what constitute well-formed reports of event sequences. I treated English and Kalam schemas for making event reports as a-temporal analytic models of reality, rather than as models of how spontaneous speech is encoded. (There is a parallel with the competence–performance distinction made by Chomsky. Each model makes sense of certain elements of speech behavior but not others.)

Givón was correct in pointing to an approximate semantic equivalence between English lexical verbs and one class of Kalam SVCs, namely those I call 'compact SVCs,' which typically consist of two or three verb roots. Many compact SVCs have counterparts in English verbs of transport like bring, take and fetch, transitive verbs of testing like ask, taste and feel, phrasal transitive verbs such as throw out, wipe off, knock over and prise open, and so on. However, there is no such equivalence when it comes to 'narrative SVCs,' whose semantic structure is more complex than that of any English lexical verb. It is true that narrative SVC predicates conform to Givón's generalization that SVCs are typically retrieved as chunks from the long-term memory. This correlates with the fact that narrative SVCs are expressed using familiar constructions and lexically specific speech formulas. But it makes no sense, semantically, to analyze narrative SVC predicates as being like lexical verbs that denote a single event. What this means is that the placement of intonation breaks is not, by itself, diagnostic of lexicalization.

(vii) There is more to the analysis of event structure than event segmentation. I noted that English and Kalam conventions for making event reports differ
not only in how events are segmented, lexically, but in respect of what kinds of information get mentioned. When formulating reports the speaker must make several kinds of decisions. One kind concerns which kinds of details to put in the report and which to leave out. In the case of episodic sequences, Kalam conventions require more sub-events to be mentioned than is the case for English. Do such differences in ways of talking about the world influence the way speakers perceive the world, e.g. in what details of observed events they pay attention to, in what categories they differentiate, in what they remember? It would seem likely that there are some such effects, but that is a matter for experimental study.
3 The macro-event property: The segmentation of causal chains

Jürgen Bohnemeyer, N. J. Enfield, James Essegbey, and Sotaro Kita
1 Towards a semantic typology of event segmentation
Semantic typology is the study of semantic categorization. In the simplest case, semantic typology investigates how an identical perceptual stimulus is categorized across languages. The problem examined in this article is that of event segmentation. To the extent that events are perceivable,1 this may be understood as the representation of dynamic stimuli in chunks of linguistic code with categorical properties. For illustration, consider an example from a classic study on event cognition (Jenkins, Wald and Pittenger 1986): a woman prepares a cup of tea. She unwraps a tea bag, puts it into the cup, gets a kettle of water from the kitchen, pours the water into the cup, etc. This action sequence can be diagrammed schematically as in fig. 3.1.

1 Of course, not all events are perceivable. But perceivable events are a reasonable starting point. The scope of the problem is limited further below.

Figure 3.1 Event segmentation – an introductory example (time line: the woman unwraps a tea bag > puts it into the cup > gets a kettle from the kitchen > pours hot water into the cup)

It is conceivable that at some level of "raw" perception – before the onset of any kind of categorization – the action sequence is represented as a continuous flux. But it is hard to imagine how higher cognitive operations of recognition and inference could operate without segmenting the stream of perceived activity into units that are treated as instances of conceptual categories. Let us call the intentional correlates of such categories 'events.' Regardless of whether or not one assumes internal representations of the action sequence to operate on event concepts, linguistic representations of it do require segmentation into units that can be labeled as instances of unwrapping a tea bag, pouring water into a cup, and so on.

A semantic typology of event segmentation is concerned with the conditions under which dynamic stimuli are broken down into instances of semantically distinct categories across languages. Events are not generally encoded by lexical items alone, but by syntactic constructions, such as verb phrases or clauses. As a result, the set of possible linguistic representations of a given event stimulus may not be enumerable, since its members may vary from one another along an indefinite number of choice points. The typology of event segmentation is therefore addressed here in terms of the constraints different languages impose on the segmentation of dynamic stimuli into semantic event categories. We argue that such constraints derive partly from lexicalization patterns (Talmy 1985) and partly from the availability of particular syntactic constructions.

Semantic typologies map the extensions of language-particular semantic categories on an 'etic grid,' a possibility space created by a few independent notional dimensions in which every linguistic categorization in the domain under study can be located as a data point. These dimensions are the potential independent variables of the analysis. They are selected on the basis of evidence from prior research. The cells of the grid are then exhaustively encoded in sets of non-verbal stimuli, and preferred descriptions and/or ranges of possible descriptions of these are collected in a typologically broadly varied sample of unrelated languages with multiple speakers per language according to a standardized protocol. Etic grids are arguably a necessary prerequisite of crosslinguistic studies of semantic categorization – at least as part of the implicit background assumptions of such studies. And in the interest of proper evaluation and critique of a study's protocol, laying out the grid explicitly is to be preferred. Etic grids bias the data collected on their basis, but they do not obscure this bias. For example, Levinson (2000) demonstrates that not all languages lexicalize color foci identifiable in terms of the two dimensions of the Munsell color chart – hue and brightness – employed as an etic grid in Berlin and Kay (1969). Yet, the demonstration is based primarily on data collected with the Munsell color chips.

The need for a manageable etic grid makes a typology of the segmentation of scenarios such as the action sequence depicted in fig. 3.1 rather ambitious. The research reported on in this chapter has focused on much simpler stimuli. The case study we present examines the encoding of causal chains across languages. Attention is specifically on stimuli in which a state change or location change is caused by one or more preceding events. Constraints on the linguistic representation of causal chains turn out to be sensitive to the number and types of 'sub-events' in the chain, as well as to the specific nature of the 'links,' i.e. the causal relations among the sub-events. Event segmentation thus reflects the complexity of the semantic representations to be conveyed. At the elementary level at which the problem is analyzed here, complexity of the
internal structure of event representations – the number and types of sub-events to be encoded and the nature of the relations between them – emerges as the fundamental dimension on which constraints on event segmentation operate. Cut-off points along this dimension vary according to language-specific constraints. Language-specific patterns of event segmentation can be located on the complexity dimension in a way broadly similar to how language-particular color categories can be mapped on the hue–brightness matrix.

But first, a methodological issue requires some attention. If language-specific event categories are not simply delimited by the extensions of lexical event labels (again, chiefly verbs), then how are such categories to be identified? This problem is addressed in the following section.

2 The macro-event property
Pawley (1987) provides a study of event segmentation in Kalam, an East New Guinea Highlands language. Pawley compares Kalam and English in terms of how they segment event descriptions into 'conceptual events.' He defines 'conceptual event' as the meaning of a clause that contains a single 'event classifier,' i.e. verb. The study finds striking differences between the two languages in the sets of possible conceptual events. In particular, Kalam lacks 'episodic' verbs, i.e. verbs that lexicalize script-level action sequences such as denoted by make a cup of tea as a summary description of the scenario in fig. 3.1 above. For instance, there is no simple verb that means 'hunt.' Instead, hunting activities are conventionally construed as sequences of four to six 'conceptual events,' according to the schema in (1) (Pawley 1987: 344; the events in parentheses may or may not be mentioned):

(1) 1 (GO FORTH) – 2 KILL GAME – 3 BRING IT HOME – 4 COOK IT – 5 EAT IT – 6 (RETURN TO CAMP OR HOME)
An example is given in (2):

(2) KAL . . . mneb ak lgl mdek
        land that having.come.about it:existed:DS
        kmn ak pak dad apl, ty ty
        game that kill carry having.come what what
        gl, adl ñbek . . .
        having.done having.cooked he:ate
        '. . . when that land came into existence, people hunted game mammals [and cooked and ate them]' (Pawley 1987: 338)2

2 Abbreviations in morpheme glosses include: 1 – 1st person; 3 – 3rd person; A – 'Set-A' cross-reference; ABL – ablative; ABS – absolutive; ACAUS – anticausative; ACC – accusative; ALL – allative; APP – applied object; AUX – auxiliary; B – 'Set-B' cross-reference; CAL – calendrical; CAUS – causative; CL – classifier; CMP – completive; COM – comitative; CON – converb; D2 – indexical (distal/anaphoric); DEF – definite; DIM – diminutive; DIR – directional; DS – different subject; EXIST – existential/locative; EVID – evidential; F – feminine; FOC – focus; GEN – genitive; HESIT – hesitation; IMPF – imperfective; IN – inanimate; INC – incompletive; INST – instrumental; LOC – locative; M – masculine; NEG – negation; NOM – nominative; PAST – past; PERF – perfect; PL – plural; PRES – present; PRV – perfective; REL – relational; SPONT – spontaneous; SG – singular; TOP – topic.

The difference in lexicalization between the two languages is obvious. The question is whether this amounts to a difference in what is semantically represented as an instance of an event category.3 Events are encoded in language and cognition as having 'mereological' (i.e., part–whole) structures. Parts and combinations of events are themselves conceptualized as instances of events (Casati and Varzi 1999; Krifka 1998; Zacks and Tversky 2001). So even if the hunting activity is broken down into a series of 'conceptual events' in (2), these still "add up" to a representation of hunting as an event. Moreover, different verbs and verb phrases are in different syntactic relations in (2). Some are more "tightly integrated" syntactically – i.e., more similar to simple sentences – than others. How do such syntactic differences affect the semantics and pragmatics of the event representation? Should one not assume, contrary to Pawley, that the relative syntactic complexity or simplicity of the expression has an impact on the complexity or simplicity of the semantic event representation it conveys?

3 A complication is that (2) appears to have habitual or generic reference. So strictly speaking, it refers to an indefinite number of instances of an action sequence of the same kind.

Givón (1991b) compares on-line and off-line descriptions of a video stimulus in four Papuan languages including Kalam, which heavily makes use of serial verb constructions and clause chaining, and in Tok Pisin (or Neo-Melanesian, the English-based creole used as a lingua franca in Papua New Guinea), which has few serial verb constructions and no chaining. He finds that pauses of a certain length are significantly less likely to occur inside serial verb constructions than elsewhere, regardless of language. From this he concludes (p. 120) that "serial verb constructions do not represent a different cognitive way of segmenting reality." Pauses may not be a very reliable measure of event segmentation, either, since they are likely to reflect a host of factors in addition to semantics which may (although they need not) be independent of event segmentation (including phonology, syntax, and pragmatics; see Levelt 1989: 256–60, 385–7). But at any rate, Givón's study suggests that serial verb constructions in Kalam form tighter syntactic units than clause-chaining constructions and sequences of independent clauses. And it stands to reason that event segmentation is affected by such differences in syntactic packaging. Consider the examples in (3), which may be representations of the same stimulus event:
(3) a. Floyd opened the door.
    b. Floyd pushed the door open.
    c. Floyd pushed the door and it opened.
The verb push in (3b) specifies how Floyd caused the door to open. In (3a), the causal sub-event leading to the door's opening is present in the semantic representation, but it is left unspecific – (3a) does not reveal what exactly Floyd did to cause the door to open. Example (3c), like (3b), explicitly refers to the pushing event and the opening event, and invites a strong implicature (a defeasible inference) to the effect that these are causally related. But in (3b), this causal relation is in fact entailed. And this does not appear to be the only semantic difference between (3b) and (3c). In (3b), the pushing event and the opening event are entailed to be in spatio-temporal contiguity. In (3c), these relations are again merely implicated. The syntactic relation between event-encoding phrases reflects or encodes the semantic relation that is expressed to obtain between the events referred to.

What is called for, then, is some measure that assesses how event segmentation is affected by the syntactic properties of the construction. Is there a way of telling whether (3b) or (3c) construe the pushing and the opening as parts of one event or as two separate events? No, not exactly. This is not possible, because, again, the sub-events of (3c) are at least implicated to form parts of a larger event, just as all the events referred to in a narrative of indefinite length may be construed to form parts of a single event. The difference in event segmentation between (3b) and (3c) lies in the "tightness" of "packaging." And that is primarily a difference, not in what is expressed, but in how it is expressed – a difference in the mapping between form and meaning.4 But there is no way of assessing this mapping difference.

4 Again, there is a purely semantic difference between (3b) and (3c), but that difference does not concern event segmentation per se, but merely the distinction of what aspects of the event representation are entailed vs. implicated.

The defining ontological characteristic of events5 is that they are individuated, not just in space (as "objects" or "things" are),6 but also in time. Events "occupy" time intervals and have a beginning or an end in time – most typically, both – and duration. The existence or history of objects of course is time-bound as well; but whereas different "time slices" out of the course of an event individuate distinct parts of the event, it is not the case that parts of the history of an object define parts of the object. Therefore, it makes sense to assume that it is the temporal properties of a construction that provide the decisive clues to its event construal. Indeed, the three event descriptions in (3) differ crucially in this respect: (3c) admits distinct time-positional operators in the two verb phrases:

(3) c′. Floyd pushed the door and it opened immediately / after a moment of breathless suspense.

5 As intentional correlates of event concepts and event expressions – no claims are made here concerning the existence of events in extralinguistic/extracognitive reality.

6 There are, in fact, abstract objects and events whose spatial individuation is problematic – e.g. things like democracy, inflation, or poetry. But all events, even abstract ones, are individuated in time.
This is impossible in (3a–b). With these descriptions, any operator that defines a position in time or duration necessarily has scope over both sub-events:

(3) a′. Floyd opened the door immediately / after a moment of breathless suspense.
    b′. Floyd pushed the door open immediately / after a moment of breathless suspense.
In both cases, the temporal operators express the temporal distance between the combination of the door's opening and Floyd's pushing (or the unspecified causal event in (3a′)) and some reference point – not the distance between the pushing and the opening event, as in (3c′). In precisely the sense that the pushing sub-event is not accessible to operators of temporal position or duration at the exclusion of the opening sub-event, and neither is the latter at the exclusion of the former, these sub-events are not semantically "individuated" in (3a–b), but are presented as parts of an event that in terms of the criteria of duration and location in time is unanalyzed. But this is quite clearly a structural property of (3a–b) – a mapping property of the basic clause structures of (3a–b) at the syntax–semantics interface. We may call this property the macro-event property (MEP), borrowing the term 'macroevent' from Talmy (2000a). For a more formal definition, see Bohnemeyer et al. (2007). For present purposes, the characterization in (4) should do:

(4) Macro-event property (MEP): An event-denoting construction has the MEP iff it combines only with those time-positional or durational operators that have scope over all sub-events it entails.7,8
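To make the scope condition in (4) concrete, the difference between (3b′) and (3c′) can be sketched in a neo-Davidsonian notation. The following is an editorial illustration only – not the formal definition given in Bohnemeyer et al. (2007) – with τ standing for the temporal trace ('runtime') of an event, ⊕ for mereological summation (cf. Krifka 1998), and f and d abbreviating Floyd and the door:

\[
\begin{aligned}
(3b')\quad &\exists e_1 \exists e_2\,[\mathrm{push}(e_1,f,d) \wedge \mathrm{open}(e_2,d) \wedge \mathrm{CAUSE}(e_1,e_2) \wedge \mathrm{immediately}(\tau(e_1 \oplus e_2))]\\
(3c')\quad &\exists e_1\,[\mathrm{push}(e_1,f,d)] \wedge \exists e_2\,[\mathrm{open}(e_2,d) \wedge \mathrm{immediately}(\tau(e_2))]
\end{aligned}
\]

On this sketch, a construction entailing sub-events e1, …, en has the MEP just in case any time-positional or durational operator it licenses is predicated of τ(e1 ⊕ ⋯ ⊕ en), and never of the trace of a proper sub-part alone – which is why (3b′) admits only the reading on which the pushing-and-opening as a whole is immediate.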
7 It should go without saying that (4) is restricted to semantically and syntactically well-formed combinations of event-encoding constructions and temporal operators.

8 The observation that differences in syntactic packaging result, aside from differences in the division of labor between semantics and pragmatics, in the differences in form-to-meaning mapping properties captured by the MEP, goes back to the Generative Semantics debate; see Fillmore (1972), Fodor (1970), and Wierzbicka (1980: 162–3).

In the remainder, the MEP serves as a heuristic: the encoding of complex causal chains across languages is examined with respect to the question as to which constructions involved in it have the MEP and which do not. Put differently, we explore to what extent there is uniformity or variation in the parts of the stimulus events that are segmented as 'macro-events,' i.e. described by expressions that have the MEP. Thus the MEP plays a role in these studies comparable to the role of the 'conceptual event' unit in Pawley's comparison of Kalam and English.

The advantage of employing the MEP as the primary criterion in a typology of event segmentation is its sensitivity to the syntactic "packaging" of event reference. It is demonstrated in Bohnemeyer et al. (2007) that multi-verb constructions may have the MEP and there are mismatches between clausehood and the MEP. Moreover, as discussed in detail in Bohnemeyer (2003) and Bohnemeyer et al. (2007), there are specific constraints on form-to-meaning mapping that emerge as operating, not on a particular unit of phrase structure, such as the clause or verb phrase, but on whatever construction has the MEP. An example is the bi-uniqueness constraint on the encoding of thematic roles (Bresnan 1982; Chomsky 1981; Fillmore 1968, inter alia). This indicates that the MEP is not merely an otherwise arbitrary property that happens to be quite suitable for the purposes of a typology of event segmentation. Indeed, the MEP appears to play a substantive role in constraining form-to-meaning mapping at the syntax–semantics interface.

3 Design of the study
The present study grew out of a larger project, an examination of the semantic typology of event segmentation in the domains of motion, causality, and transfer (or change of possession) undertaken by the members of the Event Representation Project at the Max Planck Institute for Psycholinguistics between 1999 and 2004. The study was conducted with a two-pronged design, combining a questionnaire and a video stimulus. The questionnaire – called Event Integration Questionnaire – consisted of a structured list of complex event scenarios represented in a semantic metalanguage, to be used, not in direct elicitation, but as a checklist – the researchers were to collect renditions of the questionnaire scenarios in the target languages by whatever technique seemed applicable, including with the help of the video stimulus (see Bohnemeyer 1999 for further details). The video stimulus – the Event Complexity (ECOM) clips – comprised seventy-four short animated videos representing complex events that involved a number of simple geometrical objects (circles, rectangles, triangles; see Bohnemeyer and Caelen 1999).9,10 Both the questionnaire and the ECOM clips were used to collect descriptions of complex stimulus events under two conditions: (a) the most natural descriptions of the various scenarios in the languages under investigation; and (b) the most "densely packaged" descriptions of the scenarios acceptable in the target languages, i.e. (roughly) those descriptions that made do with the smallest number of clauses while still entailing all relevant sub-events (as prescribed in manuals accompanying the two tools). Results of this research in the motion domain, drawing on primary data from eighteen languages, are reported in Bohnemeyer (2003) and Bohnemeyer et al. (2007).

9 The researchers negotiated culturally appropriate interpretations of the objects and their motions with the consultants; e.g., Mayan speakers interpreted a triangle as a pyramid.

10 Several contributors to the study worked, instead of or in addition to ECOM, with the real-video stimulus Staged Events, developed by M. van Staden, G. Senft, N. J. Enfield, and J. Bohnemeyer specifically for issues of event encoding in multi-verb constructions. Staged Events includes renditions of the ECOM scenarios featuring location change sequences, realized with a remote-controlled toy car moving around in a model landscape. See van Staden et al. (2001).

The data collected with the ECOM clips and the Event Integration Questionnaire turned out to be insufficient as the basis for a semantic typology of the segmentation of causal chains. This is because a surprising language-independent tendency manifested itself in the ECOM descriptions to leave causality largely to implicature. Consider the ECOM clip E7: a blue square bumps into a red circle, causing it to drop a yellow bar onto a green triangle, which breaks; see fig. 3.2.11

Figure 3.2 ECOM E7

11 Some of the characters of the ECOM clips were given facial expressions to motivate the idea of them controlling inanimate objects (instruments or themes in transfer scenarios).

Regardless of language, descriptions of this clip are very similar, as far as the encoding of causal relations is concerned, to the English description in the previous sentence. Here are Dutch and Yukatek Mayan examples:
(5) DUT (. . .) komt een paarsig haakje, komt het beeldscherm binnen.
        comes a purple hook comes into the screen
        Botst tegen een rood rondje aan met een geel staafje.
        bumps against a red round thing with a yellow staff
        Op het moment dat ie daar tegen aan botst
        at the moment that it against it bumps
        valt het gele staafje van het rondje af
        falls the yellow staff from the round thing off
        en komt terecht op een groen driehoekje
        and lands on a green triangle
        dat in tweeën splitst . . .
        which in two splits
        '(. . .) a purple hook appears on the screen. Bumps into a red round thing with a yellow stick. The moment it bumps into it, the yellow stick falls off from the round thing and lands on the triangle, which splits in half (. . .)'
(6) YUK
(. . .) k-u=chíik-pah-al le=chan kwàadradro=o',
IMPF-A.3=appear-SPONT-INC DEF=DIM square=D2
chich u=tàal=e', k-u=koh-ik
hard(B.3.SG) A.3=come(INC)=TOP IMPF-A.3=collide-INC(B.3.SG)
le=chan (. . .) sìirkulo=o', le=chan sìrkulo túun=o',
DEF=DIM circle=D2 DEF=DIM circle then=D2
k-u, óolbèey, estée, k-u=lúubul
IMPF-A.3 it.seems HESIT IMPF-A.3=fall-INC
hun-p'éel chan che'-il yàan (. . .) ti'=e',
one-CL.IN DIM wood-REL EXIST(B.3.SG) LOC(B.3.SG)=TOP
k-u=hats'-ik le=chan triàangulo=o',
IMPF-A.3=hit-INC(B.3.SG) DEF=DIM triangle=D2
k-u=káach-al.
IMPF-A.3=break/ACAUS-INC
'(. . .) the little square appears, it comes on hard, it bumps into the little (. . .) circle; the circle now, it, apparently, uhm, a little piece of wood that (. . .) [the circle] has falls, it hits the little triangle, [the triangle] breaks.'
Neither of the two descriptions contains a single causative light verb or a single caused-state-change verb, even though both languages have plenty of both. The Yukatek speaker actually employs an anticausative (or 'middle-voice') form of a caused-state-change verb (kach 'to break something') to refer to the final state change in the chain (the collapse of the triangle) – and this is quite typical. Clearly this phenomenon deserves further attention. For now, one reasonable interpretation of this phenomenon seems to be the following: Because the ECOM clips feature event sequences, the descriptions are in a narrative format (or 'genre'). Apparently, there is something of a conflict between narrating
events in the "main story line" and the encoding of causal links between these events. It seems that causality is either omitted or backgrounded in narratives (see, e.g. Lascarides 1992). And since in the case of the ECOM clips the causal information is already perfectly recoverable from the event information alone due to Gricean stereotypicality implicatures, speakers do not bother to background it, as that would feel redundant.

The lack of causative expressions in the ECOM descriptions forced us to develop a more targeted approach to the elicitation of causal chain descriptions. We assembled a new set of stimuli and designed an elicitation procedure that relies on two types of questions: first, questions as to why a certain event featured in a particular clip happened ('Why-questions'), and second, questions as to which participant caused the event ('Who-questions').12 Researchers were instructed to ask these questions about as many of the events in each of the causal chains featured in the videos as seemed appropriate. In addition, they were asked to pay attention specifically to the first and last link in the causal chains, probing for an expression that would attribute the cause for the final state change (in the scenario in fig. 3.2, the breaking of the green triangle) to the event participant that sets off the whole chain (in fig. 3.2, the blue square). The researchers would do this by offering a range of possible constructions to the native speaker consultants and asking for the best choice. For example, in the case of the scenario in fig. 3.2, the range of possible expressions of the causal relation between the square's bumping into the circle and the triangle's breaking in English might look as follows:

(7) a. Did the square break the triangle?
    b. Did the square make the triangle break?
    c. Did the square cause the triangle to break?
    d. Did the triangle break because of the square('s bumping into the circle)?
The same range of causative expressions was to be used in the Who-questions to the extent they were applicable:

(8) a. Who broke the triangle?
    b. Who made the triangle break?
    c. Who caused the triangle to break?
Where the researchers felt a need to avoid existential presupposition, they were to use the form in (9) first:

(9) a. Is there someone in this video who broke the triangle?
    b. Is there someone in this video who made the triangle break?
    c. Is there someone in this video who caused the triangle to break?

12 The model for this approach is the 'Where-question' in the elicitation of locative expressions pioneered with the now famous BowPed stimulus (see Bowerman and Pederson ms.; Levinson, Meira, and the Language and Cognition Group 2003).
The report in the following section focuses specifically on the encoding of the causal relation between the "first" and "last" participants in the chain, elicited either with the Who-question or with the approach illustrated in (7) above. We used a combination of twenty-one ECOM and eleven Staged Events clips.13 The working title for this selection of stimulus items is ECOM Causality Revisited (ECR).14 The videos were selected so as to achieve a broad-based representation of various factors that might be conceived of as contributing to a given scenario's (language-specific) ranking on a scale of directness of causation (see Comrie 1981; Kemmer and Verhagen 1994; Shibatani 1976a; Talmy 1988; Verhagen and Kemmer 1997; inter alia). Directness of causation is the central dimension of the etic grid for this study. We assume that directness of causation breaks down into a number of independent factors. Specifically, directness varies in the ECR scenes along the following parameters:

Mediation: the number and type of 'links' in the causal chain. To simplify matters, the problem is reduced here to the number of event participants involved in the chain and the roles they play in it. Of course these roles are ultimately determined by the kind of event in which a participant is involved. Four role types are distinguished: 'causer' (CR) – the participant who sets off the causal chain; 'causee' (CE) – an animate "intermediate" participant who may or may not have some degree of control over the event (s)he is involved in; 'instrument' (IN) – an inanimate "intermediate" participant over which the CR/CE has complete control;15 and 'affectee' (AF) – the participant undergoing the state change that marks the final link in the chain.16,17 Four mediation types may be distinguished in the ECR clips along these lines: CR > AF (a causer directly effecting a change on an affectee without involvement of a causee or instrument); CR > IN > AF (a causer effecting a change on an affectee with the help of some instrument); CR > CE > AF (a causer effecting a change on an affectee via a causee); and CR > CE > IN > AF (a causer effecting a change on an affectee via a causee and with the help of some instrument).18

13 Again, Staged Events is a real-video stimulus developed specifically for the study of event representation in multi-verb constructions. It includes real-video renditions of some ECOM scenarios, but also additional scenes not instantiated in ECOM. See van Staden et al. (2001).

14 See Bohnemeyer and Majid (2002).

15 It is implied here that causee and instrument are poles of a continuum. A type of role intermediate between the two that is of some relevance in the ECR scenarios is that of an inanimate object over which the causer has no control, or insufficient control.

16 In order to qualify as "intermediate," a participant has to (a) be acted upon (or causally affected) by the next "higher" participant in the chain and (b) act itself upon (or causally affect) the next "lower" participant.

17 These labels designate roles as part of the etic grid (see section 2) of the study and thus should not be confused with semantic roles in language-specific causative constructions that encode any of the stimulus scenarios. For example, a particular linguistic representation may well choose to "emically" frame as a causer what is "etically" a causee. Note also that the terminology used in this study deviates from a convention often found in the literature (e.g., Kemmer and Verhagen 1994; Verhagen and Kemmer 1997) which refers to the final participant in the chain as 'causee' unless an intermediate causee is involved, and only in that case calls the final participant 'affectee.'

18 There is in fact only one instance of CR > CE > IN > AF realized in ECR – namely in the ECOM E7 clip sketched in fig. 3.2 above. In a more complete stimulus set, one would obviously also want to include the options CR > IN > CE > AF and CR > IN > CE > IN > AF.

Contact: this refers to spatio-temporal contiguity of the various events in the causal chain, or conversely, to the presence of spatial or temporal "gaps." For example, someone who hits a plate with a hammer affects it more directly than someone who merely hits the table on which the plate is placed. Similarly, the agent with the hammer may be viewed as less directly responsible for the plate's breaking if the breaking occurs not instantly, but only after some lapse of time.19 Many of the ECR items featuring spatio-temporal gaps were modeled after the ballistic collision displays Albert Michotte used in his classic research on 'phenomenal causality' (see, e.g., Michotte and Thinès 1963). Since lack of spatio-temporal contiguity can affect any and every link in a causal chain, the set of combinatorial possibilities is large, and only a relatively small number of the possible combinations are actually instantiated in ECR.

19 Intuitively, the entire domain of psych causation – animate causees under the impact of external causers carrying out activities which are primarily internally caused (see Smith 1978) – is intimately tied to a lack of spatio-temporal contiguity. Not only do psych causation events not seem to involve contiguity, but many of the ECR stimuli that feature "gaps" are interpreted in language after language as involving psych causation. It should be stressed in this connection that none of the ECR stimuli were designed to show psych causation. The domain of ECR, from an etic perspective, is physical causation, albeit in some instances physical causation across a spatial gap or after a temporal delay.

Force dynamics: Talmy (1988) has argued that causation is conceptualized as a special type of interaction of (mechanical or metaphorical) forces. In the simplest case, a "stronger" 'agonist' "overpowers" a force-dynamically "weaker" 'antagonist,' thus forcing a change of state against the antagonist's inherent tendency. Another pattern, which Talmy calls 'letting' or 'enabling,' results when a stronger antagonist ends its impingement on a weaker agonist's inherent tendency towards change. Force-dynamic patterns, too, may influence the directness of causation. For instance, if someone drops a plate and it shatters upon hitting the floor, the person may be thought to cause the plate's breaking less directly than if (s)he had smashed the plate to pieces with a hammer. In the former case, gravity takes part of the blame, as it were. The number of force-dynamic patterns distinguished on Talmy's approach is large all by itself, and once again, there is no a priori reason why any of these patterns could not apply to any of the links of a causal chain. Only a small fraction of these possible combinations are realized in ECR. The stimuli include seven clips in
which enabling-type dynamics obtain between causee and instrument, causer and affectee, or causee and affectee; all of these involve the pull of gravity. All other stimulus items feature exclusively causation-type dynamics.

Four languages were included in the study reported here: Ewe (studied by Essegbey), Japanese (Kita), Lao (Enfield), and Yukatek (Bohnemeyer). All languages were studied in the field, i.e., in Ghana, Japan, Laos, and Mexico. The number of speakers consulted ranged from three in the case of Lao to seven in the case of Yukatek, with four speakers for Japanese and six for Ewe.
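To summarize the design, the ECR etic grid can be pictured as a (sparsely instantiated) subset of a product space over the three parameters just described. The following schematic is our editorial recap, not notation employed by the authors:

\[
\mathrm{ECR} \subset \underbrace{\{CR{>}AF,\ CR{>}IN{>}AF,\ CR{>}CE{>}AF,\ CR{>}CE{>}IN{>}AF\}}_{\text{mediation}} \times \underbrace{\{\pm\text{spatial gap}\}\times\{\pm\text{temporal gap}\}}_{\text{contact}} \times \underbrace{\{\text{causation},\ \text{enabling}\}}_{\text{force dynamics}}
\]

with the qualification that, as noted above, contact and force-dynamic values can in principle attach to each link of a chain separately, so the thirty-two clips instantiate only a fraction of the full space.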
4 The encoding of complex causal chains across languages
Examples of initial evidence for typological variation in the domain of causality can be found in Pawley's (1987) study of event segmentation in Kalam. Kalam has few verbs that lexicalize caused state change, which is commonly expressed by serial verb constructions and clause chains instead:

(10) KAL kab añañ ap yap pkek, pagak ok
         stone glass come fall it:having:struck:DS it:broke that
         'A stone fell and struck the glass and it broke' (Pawley 1987: 355)
Pawley notes that (10) is the closest Kalam equivalent of The stone broke the glass. The four languages in our sample are typologically extremely diverse in terms of their resources for the encoding of causal chains. At the lexical level, Lao differs from the other three languages in that – not unlike Kalam – it has few simple verbs that lexicalize caused state change.20 Not one of the thirty-two ECR scenes can be described in a single clause headed by (or projected from) a single transitive verb. The most simple construction used in reference to any of the ECR scenarios is a 'resultative' multi-verb construction (MVC) in which the first verb phrase may be transitive or intransitive and describes the causing event, and the second verb phrase is intransitive and encodes the caused event:21

20 There are transitive verbs that lexicalize destructive activities, such as thup1 'crush'; however, these merely implicate, but do not entail, state change.

21 Superscript numbers indicate register tones in the Lao orthography used here.

(11) LAO kuu3 thup1 còòk5 tèèk5
         1.SG smash glass break
         'I smashed the glass (and it) broke'

(12) LAO *kuu3 nùng1 moong2 thup1 còòk5 sòòng3 moong2 tèèk5
         1.SG one hour smash glass two hour break
         'I smashed the glass at one (and it) broke at two'
As (12) illustrates, this construction has the macro-event property (MEP). Ewe has similar resultative MVCs, which likewise have the MEP. However, these play a lesser role in the ECR data, since they compete with simple transitive verb clauses.22

(13) EWE e-tutu-i do a
         3.SG-push-3.SG send away
         '[The circle] pushed [the square] away'
The domain of such resultative MVCs is properly included in that of simple transitive verb clauses in Ewe. Resultative MVCs are by and large restricted to unmediated (CR > AF type) chains in which there is contact between causer and affectee at the time of change. The core domain of simple transitive verb clauses in Ewe, Japanese, and Yukatek within the etic grid of this study is that of unmediated (CR > AF) spatio-temporally contiguous chains that involve only causation dynamics. The same may be said of the Lao 'resultative' multi-verb construction in (11). Mediation by an instrument has only a minor effect on the acceptability of the simplest construction; to be more precise, whether or not the presence of an instrument has an effect on the applicability of the simplest construction seems to depend mostly on whether it is indeed construed as an instrument. Mediation by a causee, however, renders the simplest construction squarely inapplicable in Japanese and Lao and disfavors it in Ewe and Yukatek.23

22 These Ewe constructions are known under a range of different labels in the literature; see, for instance, Ameka (2005a,b) for a recent summary.

23 Ewe speakers readily describe chains mediated by causees with simple transitive verb clauses provided they interpret the causer as clearly intending the outcome. This would correspond to the condition under which Japanese speakers may apply morphological causatives to a chain mediated by a causee, as noted above; however, it seems that the threshold for attributing intention may be different in the two cultures.

There is some variation in the extensional range of the simplest construction along the parameters of contact and force dynamics. Lack of contact strongly disfavors simple transitive verb clauses in Yukatek, but not in any of the other languages. Delays between impact and change – temporal discontinuities – disfavor simple transitive verb clauses in Japanese, while they have no effect by themselves in any of the other languages. Ewe simple transitive verb clauses become dispreferred only under the combined impact of lack of spatial and temporal contiguity. The Lao resultative MVC is strongly disfavored by enabling dynamics, which has no effect on the simplest construction in any of the other languages – as a matter of fact, enabling dynamics does not have a demonstrable effect on any construction in any language, aside from the Lao resultative MVC. Thus, it appears that simple transitive verb clauses in Yukatek are contact-sensitive, while their Japanese counterparts are timing-sensitive and their Ewe equivalents are contact-and-timing-sensitive (but in the sense that lapses of spatial and temporal contiguity only have an effect where both types occur simultaneously). Lao resultative MVCs, in contrast, are force-dynamics-sensitive. Future research will need to clarify the source of these crosslinguistic differences. There are three possibilities here – none of which are mutually exclusive. It is possible that these differences are rooted in the semantic properties of the constructions in question. Or they may be pragmatic consequences of some formal (structural) properties. It is also very much conceivable that the extensional differences of the constructions reside in culture-specific nuances in the conceptualization of causality.

Japanese and Yukatek are grouped together against Ewe and Lao by the presence of causative morphology, i.e., affixal valence-changing or voice operations that license a causer argument semantically linked to the participant of a newly introduced causal sub-event. In Yukatek, causative morphology is restricted to ('unaccusative') intransitive verb roots or stems that encode non-internally caused events (mostly state changes), such as lúub 'fall' in (14); see Bohnemeyer (2003) for details:

(14) YUK t-u=lúub-s-ah
         PRV-A.3=fall-CAUS-CMP(B.3.SG)
         úuch u=koh-ik le=x-ch'úup=o'
         happen(B.3.SG) A.3=bump.into-INC(B.3.SG) DEF=F-female=D2
         '[He] caused [the plate] to fall the way he bumped into the woman'
There is no such restriction in Japanese. Compare, for instance, (15), with the non-internally caused denominal verb idoo-s 'change location', and (16), with a transitive stem:

(15) JPN
akai maru-ga aoi shikaku-o oshi-te idoo-s-ase-ta
red circle-NOM blue square-ACC push-CON change.of.location-do-CAUS-PAST
'The red circle made the square change its location by means of pushing it'
(16) JPN
onna-no hito-ga otoko-no hito-ni osara-wo war-ase-ta
female-GEN person-NOM male-GEN person-LOC plate-ACC break-CAUS-PAST
'The woman made/let the man break the plate'
However, even though (16) encodes a causal chain involving three participants (a causer, causee, and affectee), no description of this type actually occurred in the ECR data. The reason for this conspicuous absence appears to be that the construction type in (16) strongly implicates that the causer, the woman in (16), intended the caused event, in (16), the breaking of the plate. Intended outcome is not clearly applicable to any of the ECR clips that involve mediation by a causee. Overall, morphological causatives play only a marginal role in the ECR data from either language, and in both data sets, the conditions under which morphological causatives are used are more or less identical to the conditions under which underived transitive verbs are used.24 Three of the four languages – all except for Japanese – have periphrastic constructions involving causative 'light' verbs. In all instances, these constructions have the macro-event property (MEP). In Ewe, na 'give,' 'make' and wɔ 'do' are used as causative light verbs. The complement referring to the caused event may be intransitive or transitive:

(17) EWE
ŋutsu-a na (be) agba gbã
man-DEF make that plate break
'The man made the plate break'
(18) EWE
nyɔnu-a ye na/wɔ-e (be) ŋutsu-a gbã agba
woman-DEF FOC make/do-3.SG that man-DEF break plate
'It is the woman who made the man break the plate'
As (17)–(18) show, the complementizer be is optional in this construction. Example (19) illustrates the MEP:

(19) EWE
∗ŋutsu-a na/wɔ-e etsɔ (be) ŋutsu-a gbã agba egbea
man-DEF make/do-3.SG yesterday that man-DEF break plate today
'Yesterday the woman caused the man to break the plate today'
Periphrastic causative constructions in Lao employ haj5 'give,' hêt1 'make,' or a combination of both:

(20) LAO
man2 hêt1 kèèw4 tèèk5/sia3
3 make glass break/be.lost
'He broke/lost the glass'
(21) LAO
man2 hêt1-haj5 kuu3 met2 ngen2 laaj3
3 make-give 1 finish money much
'He caused me to lose a lot of money'
The semantic differences between these choices are not yet fully understood. There are no restrictions in terms of the transitivity of the complement. Thus, as in Ewe, periphrastic causative constructions can be used to encode chains that involve a causee, as in (21). Examples (22)–(23) show that these constructions have the MEP:

(22) LAO
∗man2 nùng1 moong2 hêt1 kèèw4 sòòng3 moong2 tèèk5/sia3
3 one hour make glass two hour break/be.lost
'He at one broke/lost the glass at two'

(23) LAO
∗man2 nùng1 moong2 hêt1-haj5 kuu3 sòòng3 moong2 met2 ngen2 laaj3
3 one hour make-give 1.SG two hour finish money much
'He caused me at one to lose a lot of money at two'

24 Both Japanese and Yukatek also use compound verbs to encode causal chains. But again, these are used for the same types of scenarios – in terms of the distinction built into the etic grid of the study, as laid out above – as simple transitive verbs are.
In Yukatek, causative periphrases are formed with mèet 'do,' 'make.' The complement may be intransitive (externally or internally caused) or transitive, and in the latter case, it may appear in the active, anticausative (or 'middle'), or passive voice.

(24) YUK
leti' le=chan tàabla=o' k-u=mèet-ik uy=op'-ik le=máak le=chan triàangulo y=éetel le=chan che'=o'
it DEF=DIM plank=D2 IMPF-A.3=make-INC(B.3.SG) A.3=burst-INC(B.3.SG) DEF=person DEF=DIM triangle A.3=COM DEF=DIM wood=D2
'That little plank [i.e. the blue square], it made the person [i.e. the red circle] burst the triangle with the stick'
The complement has to be transitive in order to permit the encoding of a causee. The causee is linked to the "subject"25 of the embedded verb in the active voice or to an adjunct in the passive. Consider the contrast between (25), where the complement appears in the anticausative and the intermediate participant in the chain (the hammer) is construed as an instrument, marked by the comitative preposition éetel 'with,' and (26), where the complement is in the passive and the hammer is construed as a causee, marked by the causal preposition tuméen 'by,' 'because of':

(25) YUK
t-u=mèet-ah uy=óop'-ol y=éetel le=máartiyo=o'
PRV-A.3=make-CMP(B.3.SG) A.3=burst/ACAUS-INC A.3=COM DEF=hammer=D2
'(S)he made it burst with the hammer'

25 The 'A'-argument in the sense of Dixon (1994), or the 'actor'-argument in the parlance of Van Valin and LaPolla (1997). If there is a grammatical relation of subject in Yukatek – which is not obvious – then it is not consistently marked; see Bohnemeyer (2004) for details.
(26) YUK
t-u=mèet-ah uy=op'-a'l tuméen le=máartiyo=o'
PRV-A.3=make-CMP(B.3.SG) A.3=burst-PASS.INC CAUSE DEF=hammer=D2
'(S)he caused it to be burst by the hammer'
Example (27) illustrates the MEP:

(27) YUK
∗Juanita=e' byèernes-ak=e' t-u=mèet-ah u=mìis-t-ik u=nah-il Pedro sàabado
Juanita=TOP Friday-CAL=TOP PRV-A.3=make-CMP(B.3.SG) A.3=broom-APP-INC(B.3.SG) A.3=house-REL Pedro Saturday
'Juanita, last Friday, she made Pedro sweep her/his house on Saturday'
In all three languages that have causative periphrases, mediation by a causee strongly favors this construction type. Lack of spatio-temporal contiguity likewise makes periphrastic causatives the preferred choice over simpler constructions. In Lao – and only in Lao – enabling dynamics also favors periphrastic constructions. Japanese lacks periphrastic causative constructions of the kind illustrated above, in first approximation, because it lacks a suitable causative light verb. Together with the restriction of synthetic (morphological) causatives to scenarios in which the caused event is clearly intended by the causer, the absence of causative light verb constructions imposes a set of constraints on the event segmentation of causal chains in Japanese that differs dramatically from that in the other three languages. The next least complex (or most densely packaged) alternative to simple transitive verb clauses in Japanese is a variety of constructions that employ 'converb' forms, i.e., subordinate verb forms morphologically marked for various semantic relations between the event or proposition expressed by the subordinate clause and the event or proposition referred to by the main clause (see Hasegawa 1996). Some of these constructions have the MEP, but most do not. Among the converb constructions featured in the ECR corpus, the only ones that have the MEP employ a -te converb:

(28) JPN
onna-no hito-ga osara-o teeburu-ni tataki+tsuke-te wat-ta
female-GEN person-NOM dish-ACC table-LOC hit+attach-CON break-PAST
'The woman broke the dish by smashing it against the table'
(29) JPN
onna-no hito-ga hanmaa-o otoshi-te sara-o wat-ta
female-GEN person-NOM hammer-ACC drop-CON dish-ACC break-PAST
'The woman broke the dish by dropping a hammer'
Here, the matrix clause encodes a causal chain involving causer and affectee, and the converb clause serves to further specify the causing event. The subjects of the two clauses must be coreferent. This construction does not permit the encoding of a causee. Its domain is largely coextensive with that of simple transitive verb clauses. It is slightly favored over plain transitive clauses by scenes that involve an instrument, especially under lack of contact between causer and affectee at the time of change. Examples (30)–(31) illustrate the MEP:

(30) JPN
∗onna-no hito-ga osara-o teeburu-ni tataki+tsuke-te go-fun-go-ni wat-ta
female-GEN person-NOM dish-ACC table-LOC hit+attach-CON five-minute-later-LOC break-PAST
'The woman broke the dish five minutes later [i.e., after smashing it] by smashing it against the table'
(31) JPN
∗onna-no hito-ga hanmaa-o otoshi-te go+fun+go-ni sara-o wat-ta
female-GEN person-NOM hammer-ACC drop-CON five+minute+later-LOC dish-ACC break-PAST
'The woman broke the dish five minutes later [i.e., after dropping the hammer] by dropping a hammer'
This construction contrasts with the one illustrated in (32) which has the causal converb formative -node:

(32) JPN
te-de hageshiku teeburu-o tatai-ta-node osara-ga ware-ta
hand-COM hard table-ACC hit-PAST-because plate-NOM break-PAST
'Because (someone) hit the table hard, the plate broke'
Here, the -node converb clause semanto-syntactically behaves very much like a causal adverbial clause in English – it has its own tense inflection and subject. (The two subjects are referentially disjoint – notice that waru 'break' in (31) is transitive, but wareru 'break' in (32) is intransitive.) As (33) shows, this construction lacks the MEP:

(33) JPN
te-de hageshiku teeburu-o tatai-ta-node go-fun-go-ni osara-ga ware-ta
hand-COM hard table-ACC hit-PAST-because five-minute-later-LOC plate-NOM break-PAST
'Because [someone] hit the table hard, the plate broke five minutes later'
A variety of other converb constructions occurred during ECR elicitation – these do not even entail a causal relation between causing and caused event, but merely implicate such a relation. The construction in (33) is semantically
and in terms of its form-to-meaning mapping properties equivalent to causal connective constructions in the other three languages. Ewe, Lao, and Yukatek all possess such constructions. (34) is a Lao example:

(34) LAO
kuu3 bòò1-daj°-paj°-lin5 tòòn3 sòòng3 moong2 ñòòn4 kuu3 bò°-mii2 ngen2 tòòn3 nùng1 moong2
1.SG NEG-achieve-go-play period two hour because 1.SG NEG-have money period one hour
'I didn't go out at two because I didn't have any money at one'

As the example shows, this construction does not have the MEP. However, no single ECR clip required a connective construction to express the causal relation between CR and affectee in Ewe, Lao, or Yukatek, or even elicited a connective construction as the preferred response. In contrast, exactly half of the ECR clips elicited a non-MEP converb construction as the preferred response type in Japanese, and in response to five of these sixteen scenes, a non-MEP converb construction was in fact the only option of encoding the causal relation between causer and affectee. Lack of contact and delays between cause and effect both favor non-MEP converb constructions in Japanese, and mediation by a causee leaves it as the only option of encoding a causal relation. This means that whereas all thirty-two ECR scenarios can be represented by single macro-event constructions in Ewe, Lao, and Yukatek, Japanese speakers prefer multiple macro-event constructions in half of the cases, and are left with multiple macro-event constructions as the only resource for representing causal chains mediated by a causee. Just as the study on motion event segmentation summarized in Bohnemeyer et al. (2007) unearthed profound and systematic crosslinguistic differences in the constraints imposed by language on the encoding of location change sequences, so our study reported on here has found dramatic systematic differences in the constraints different languages impose on the segmentation of causal chains. The findings of the ECR study may be illustrated with a selection of scenes suitable to bring out the crosslinguistic variation that has been discovered.

Figure 3.3 Early and late frame of ECR 18

One stimulus item that is categorized as minimally complex across the four languages is ECR 18, as depicted in fig. 3.3. A red circle slides or rolls across the screen until it "hits" a stationary blue square. The two figures then travel on together in the same direction until they leave the screen. This is the type
of scene Michotte and Thinès (1963) dubbed 'entraining.' The causal relation between the motion of the circle and that of the square can be described by a simple transitive verb clause in Ewe, Japanese, and Yukatek. Speakers of all three languages also offered more complex descriptions, using a resultative multi-verb construction in the case of Ewe, a multi-macro-event converb construction in the case of Japanese, and a periphrastic causative construction in the case of Yukatek. Lao, however, requires minimally a resultative MVC to encode the causal relation in this scenario, as it lacks simple transitive verbs that could do the job. Now consider ECR 31. This is identical to ECR 18, except that the circle never actually "touches" the square. It stops at a short distance from the square, and after half a second or so both objects start to travel in the same direction in which the circle moved before, keeping the distance between them constant. This clip thus features a disruption of contiguity both in the spatial and the temporal domain. This still can be described by a simple transitive clause in Ewe and Japanese, although the preferred strategy in Japanese is now a non-MEP converb construction. Yukatek speakers, however, require a periphrastic causative construction to express the causal interaction between the circle and the square, and Lao speakers – although they can use a resultative MVC in reference to this clip – likewise prefer a causative periphrasis.

Figure 3.4 Early and late frame of ECR 5

Next, consider ECR 5: a woman drops a plate onto a table, and the plate shatters; see fig. 3.4. In this scenario, the causer still inflicts a state change on the affectee relatively directly (with her bare hands, literally), except for the involvement of enabling dynamics and the role of gravity. For speakers of Ewe, Japanese, and Yukatek, it is perfectly acceptable in this case to say that the woman 'broke the plate,' using a simple transitive verb clause (in Yukatek, a periphrastic description is considered equally appropriate, due presumably to the lack of contact between CR and affectee at the time of change). But in Lao, due to the special role force dynamics appears to play in determining the applicability of causative constructions in this language, not even a resultative
MVC is acceptable in reference to this scene – one has to use a causative periphrasis. ECR 5 contrasts minimally with ECR 32, in which the woman drops a hammer onto the plate, again with the effect of the plate shattering. The addition of an instrument to the causal chain has no effect on the Lao representation – again, periphrastic causative constructions are the only option. But in the other three languages, the level of formal complexity is revved up one notch to accommodate the increased conceptual complexity. In Ewe, a simple transitive verb clause is still the preferred response, but a resultative MVC emerges as an alternative. In Japanese, a converbial macro-event construction à la (29) above becomes the preferred response. And in Yukatek, a causative light verb construction now is preferred over a simple transitive clause.

Figure 3.5 Early and late frame of ECR 23

Finally, consider ECR 23 – a man bumps into a woman, who is holding a plate. She drops the plate, and the plate hits the floor and shatters. To attribute the cause of the plate's breaking to the man, Ewe speakers may either use a simple transitive clause ('He (focus) broke the plate') or a periphrastic construction ('He (focus) made her break the plate') – both descriptions are considered equally acceptable. In Yukatek, a periphrastic representation is again preferred over a simple transitive verb clause in response to this scenario. In Lao, a periphrastic light verb construction is the only choice. And the Japanese consultants all opted for a non-MEP converb construction ('The plate broke because he tickled her') as the most densely packaged acceptable representation that encodes the causal relation between the man's action and the plate's breaking.

5 Summary and implications
This chapter has presented some building blocks of a semantic typology of event segmentation. The domain of event segmentation differs critically from
the domains of the classic studies of the cognitive anthropologists – terminologies for color categories, kinship relations, and ethnobiological classifications – in that it cannot be adequately captured in terms of lexicalization alone. Events are linguistically represented, not just by lexical items, but by morphosyntactic constructions and entire discourses. One could resort to comparing the semantic extension of particular types of event-denoting constructions – verb phrases, clauses, etc. – across languages. But this is likewise unsatisfactory, for a number of reasons. There is no single construction type that is uniquely dedicated to the encoding of events – so how would one generalize across construction types? And without such generalizations, how is one to capture the impact that differences in the availability of certain constructions have on event segmentation in particular languages? The proposal that has been advanced here is to abstract away from individual construction types to a property of construction types that describes their behavior at the syntax–semantics interface, in such a fashion as to directly determine the semantic event representations a construction type is compatible with. This property is the macro-event property (MEP). Since events are critically individuated by their temporal properties – their boundaries, duration, and “location” (order and distance) relative to other events or times – the MEP registers the compatibility of event-denoting constructions with operators that modify or specify these temporal properties. For a construction to have the MEP means for it to “package” an event representation so “tightly” as to render its proper sub-events inaccessible to those temporal operators that might individuate them. Researchers can analyze event descriptions into the constructions they consist of and evaluate these in terms of the MEP. The MEP then provides a criterion of event segmentation that is sensitive to both lexicalization and morphosyntactic packaging and that is readily applicable crosslinguistically irrespective of the particular type of event-denoting construction. The application of the MEP to the semantic typology of event segmentation in causal chains in the study presented here has indeed confirmed that event segmentation, in terms of the constraints imposed by individual languages on the information about an event that can be packaged into certain constructions, as assessed via the MEP, is a function of the interaction of lexicalization patterns and the availability of morphosyntactic constructions. Among the four languages considered, a split emerges between Japanese and the other three languages in the encoding of chains that involve causees (animate intermediate participants with some amount of control over the sub-event they are immediately involved in): Japanese requires the use of multiple macro-event expressions to encode the causal relation between the initial causer and the final state change (at least unless the outcome is construed as clearly intended by the causer), whereas the other three languages permit the encoding of these types of scenarios in single macro-event expressions. The reason is a combination of
lexical and syntactic factors: Japanese lacks causative light verb constructions, and in fact lacks the requisite causative light verbs; and causative morphology is restricted to intended outcomes. (There are other interesting lexical and morphosyntactic differences in this domain between the four languages, such as the lack of caused state change verbs in Lao; but these happen not to affect the segmentation of chains among macro-event expressions.) The case study reported on here has confirmed Givón's (1991b) claim that lexicalization alone is not an adequate measure of event segmentation. Yet, at the same time it has also fully confirmed the extent of the crosslinguistic variation in event segmentation argued for on the basis of the contrast in lexicalization between English and Kalam in Pawley (1987). This variation has been found to occur not just in lexicalization, but to "project upwards" into constraints on syntactic packaging, given particular kinds of interaction between lexical and morphosyntactic factors. The ultimate question raised by the crosslinguistic variation in event segmentation we have found is, naturally, that of its implications for the language–cognition interface. A classic relativist view would be that internal cognitive event representations vary with linguistic constraints. A cognitive universalist, in contrast, would argue that linguistic event representations are "supported" by the same internal cognitive representations, regardless of how many macro-event expressions they may have to be segmented into depending on the constraints imposed by particular languages. We do not at this point have any evidence that bears on this debate. We would, however, like to point out that the macro-event property (MEP) is neither a purely syntactic nor a purely semantic property. Consider (35)–(36):

(35) a. Sally went from Nijmegen to Arnhem.
     b. Sally left Nijmegen and then went to Arnhem.

(36) a. Floyd pushed the door shut.
     b. Floyd pushed the door and it shut.
Examples (35a) and (36a) have the MEP, and (35b) and (36b) do not. But (35a) and (35b) convey the same information, and so do (36a) and (36b). The difference between (35a) and (35b) is mostly in the division of labor between semantics and pragmatics, and the same holds for (36a) and (36b).26 In particular, the causal relation between Floyd's pushing and the door's closing is entailed in (36a), but merely implicated in (36b). But in order to implicate the same scenario as in (35a), a speaker uttering (35b) must have the same scenario "in mind" in some sense. The same holds again with respect to (36a–b). Barring further evidence, we tentatively conclude that the difference between (35a) and (35b) lies not so much in the underlying mental representations, but in the mapping between these conceptual representations and syntax. The same holds for (36a) and (36b).

26 In addition, there are differences in lexical aspect: (35a) and (36a) are accomplishments, (35b) is a sequence of two achievements, and (36b) is a sequence of an achievement or activity and an achievement.

The MEP is a form-to-meaning mapping property of event-denoting constructions at the syntax–semantics interface. The evidence from interface constraints such as the bi-unique mapping of thematic relations being sensitive to the MEP (as discussed in Bohnemeyer et al. 2007) suggests that the MEP is built into the design of human language itself. But we see no reason to assume that for the MEP to operate in language, it needs to be supported by an ontological category of "macro-events" in internal cognition.
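The MEP diagnostic can be given an explicit, if deliberately simplified, formulation. The sketch below is a toy model of our own (not an implementation used in any of the studies discussed) in which an event description is a list of sub-event labels plus a flag recording whether the construction packages them as a single macro-event expression; the diagnostic then asks whether distinct time adverbials can target distinct sub-events, mirroring the contrast between (35a)/(36a) and (35b)/(36b). The representation and all names in it are illustrative assumptions.

# Toy illustration of the macro-event property (MEP) diagnostic.
# Assumption: an event description is modeled as sub-event labels plus
# a flag saying whether the construction packages them as one
# macro-event expression. Expository sketch only.
from dataclasses import dataclass
from typing import List

@dataclass
class EventDescription:
    sub_events: List[str]     # e.g. ["push", "shut"]
    single_macro_event: bool  # True for "pushed the door shut"

def allows_distinct_times(desc: EventDescription) -> bool:
    """True if different time adverbials can target different sub-events.
    Under the MEP, temporal operators take scope over the event as a
    whole, so a single macro-event expression fails this test."""
    return len(desc.sub_events) > 1 and not desc.single_macro_event

# (36a) "Floyd pushed the door shut" - one macro-event expression
resultative = EventDescription(["push", "shut"], single_macro_event=True)
# (36b) "Floyd pushed the door and it shut" - two macro-event expressions
conjunction = EventDescription(["push", "shut"], single_macro_event=False)

assert not allows_distinct_times(resultative)  # *"... at one ... at two"
assert allows_distinct_times(conjunction)      # "... at one ... at two"

Note that nothing in the toy model mentions causation or motion as such: the MEP is stated purely in terms of the accessibility of sub-events to temporal operators, which is what makes it applicable across construction types and languages.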
4 Event representation, time event relations, and clause structure
A crosslinguistic study of English and German
Mary Carroll and Christiane von Stutterheim
1 Introduction
One of the central questions in cognitive linguistics concerns human cognition and the way dynamic situations are structured for expression. When language is used to convey information on experience, it is far from being a mirror of what was actually perceived. Representations are based on information stored in memory and retrieved when construing a reportable event in the language used. Taking the linguistic output as a point of reference, the process is selective, perspective-driven and interpretative. Crosslinguistic studies of event representation show that the perspectives chosen can differ, depending on the expressive means available to the speaker, and the term 'event representation' is used in the following to relate to event construal at this level. Many languages require speakers to direct attention to temporal contours of events, for example, as in aspect-marking languages such as Modern Standard Arabic, where events are viewed and encoded as to whether they are completed, ongoing, or relate to a specific phase (inceptive, terminative, etc.). When talking about events, speakers may also have to accommodate relational systems that include reference to the time of speech, since formal means of this kind allow us to say whether an event occurred in the near or distant past, for example, or just now. An assertion such as the lights went out when the dog barked is grounded in context, in temporal terms, since the time for which the assertion holds has been specified as preceding the time of utterance. Attention can be directed to the status of the participants in the event by placing references at relevant positions in the clause. In the following utterance, attention is directed to the participant a dog by using a presentative structure such as the existential there is, as in there was a dog sitting on the mat that barked when the lights went out. The participant a dog is singled out for attention (in contrast to the utterance above) by mapping it onto form in this way. In terms of information structure, an existential is a form of presentational that serves to call special attention to one element of the sentence (Hetzron 1975). Its purpose is to present a previously inactive, brand-new referent in the text-internal world (Lambrecht 1994).
This is just one way of directing attention to entities via grammatical means (see Tomlin 1997). Speakers generally take this battery of linguistic knowledge into account when talking about events, so the question is: Do the means used in anchoring an event and its participants in context have implications for the way in which the event is represented? Is grammatical knowledge the servant in this process in that it is brought to bear on a ready-made outcome in event representation and comes up with the best possible fit, or is it incorporated at an earlier stage in order to help ensure the best possible fit? In pursuing this question the crosslinguistic analysis takes into account the range of linguistic requirements that speakers must satisfy when grounding information on an event in context. As shown in the following, it investigates the extent to which event representation varies, depending on the means used to direct attention to participants in events, as well as the temporal relational systems and other means that determine how an event is anchored in context. The status of the concept of an event is evidenced in the fact that the notion of time has been treated in some theories on the basis of temporal relations between events such as precedence or simultaneity (Kamp 1979; Russell 1936). The assertion of a temporal relation such as precedence requires not only a witness or observer, but a temporal entity that is somehow individuated. Definitions of what constitutes an event (or process) have relied on how they contrast with states. While events and processes may have temporal parts (beginning, middle, end), states do not. Events and processes can be subdivided into parts that can be viewed as the same in kind (to knot knots in a string, for example) or as heterogeneous (to mend a tire), allowing significant analogies with the domain of objects with regard to the distinction between mass nouns vs. count nouns with respect to the factors individuation, as well as subdivisibility (see Bach 1981). Event structure thus lends itself to descriptions in terms of part–whole relations, a feature at the focus of the present crosslinguistic analysis (see below). In addition to these factors, the concept of an event is bound up with notions of agentivity and volition as well as associated causal relations (see, e.g., Dowty 1979, 1991). As indicated above, tense and aspect characterize event structure, in addition to the lexical content of the verb and its arguments (Comrie 1976; Bach 1981; Dahl 1985; Parsons 1990; Smith 1991; Rothstein 1998a; Klein 1994; Klein, Li, and Hendricks 2000; Higginbotham, Pianesi, and Varzi 2000; Wunderlich 2006). A wide range of crosslinguistic comparisons have focused on language-specific differences found in the conceptualization and representation of specific event types and how they are linked to the way relevant concepts are mapped linguistically. This applies in particular to motion events (Talmy 1985, 2000a; Slobin 1991, 2000; Gumperz and Levinson 1996a; Bohnemeyer et al. 2007), separation events such as to cut, break something (Majid et al. 2004),
event serialization (Pawley 1987; Talmy 2000a). Going beyond the representation of event types of this kind, language-specific means have been observed when sequencing sets of events (narratives, reports), where underlying temporal frames can be shown to follow language-specific principles that are grammatically driven (von Stutterheim, Carroll, and Klein 2003; von Stutterheim and Nüse 2003; Carroll et al. 2008). The semantic domain under study in the present analysis involves events associated with everyday situations (hammering a nail into the sole of a shoe) that can be represented in terms of the overall event (repairing a shoe) or one of its sub-events (hitting a nail with a hammer into the sole of the shoe). The languages at the focus of analysis, English and German, have access to similar lexical means in order to represent situations of this kind via their verbal lexicon. However, they differ in the grammatical means used when grounding events in context (i.e., in specifying the time interval for which an assertion holds; directing attention to participants in the event). The present analysis investigates the extent to which grounding requirements of this kind, and associated grammatical means, are linked with options chosen in event representation.
2 Grounding events, time of assertion, and event representation
In order to test the possible impact of this form of linguistic knowledge on event representation, standardized video clips (40) were designed for the crosslinguistic comparison. The information presented was new in each clip, thus increasing the likelihood that speakers would ground the events and direct attention to the participants as required, when asked to tell what is happening.
2.1 Clause type and time of assertion
A pilot study had shown that the preferred option in English in telling what is happening is to introduce the main participant in the event in question by means of a presentational. In the majority of cases this is the existential there is, as in there is a boy. The temporal information encoded by this clause asserts the existence of the entity in the discourse world. This means that information on an associated event in which x is a participant (e.g., there is a girl who is hitting a ball with a bat) is mapped into a relative clause introduced by the relative pronoun who. A video clip showing a man sitting at a typewriter and getting ready to type can be represented as

there is a man (main clause)
(who is) typing on a typewriter (dependent clause)
Presentationals allow the speaker to place information that is being mentioned for the first time (e.g. a man) in a position following the finite verb. As mentioned above, this satisfies requirements in information structure for first mentions: place inaccessible information in a position following the verb and reserve the position preceding the verb for information which is familiar, or in some way accessible to the interlocutor. The fact that information on the event is mapped onto the dependent clause has implications for temporal grounding. With the statement there is a man, the speaker makes an assertion about a time interval in the here and now. Significantly, the time for which the assertion holds is coded explicitly for information in the main clause only, i.e., for the existence of the entity (there is a man). "It is now the case that a man exists" is asserted as holding for the time span at issue (topic time). The event encoded in the dependent clause does not have an assertion time in explicit terms, since it is in a subordinate relation to the main clause. This means that the time span for which the event holds is underspecified (see in detail, Klein 1994, 2006, and section 4 below). The event encoded in the dependent clause is not strictly tied to the time interval asserted via the tensed finite verb of the main clause. What this entails for event representation will be analyzed below. Speakers can also map information on the situation into a single main clause, and leave the indication that the referent in question is being mentioned for the first time to an indefinite article (e.g. a man):

A man is inserting a sheet of paper into a typewriter

In this case information on the event is mapped into the finite verb of a main clause and the time span for which the assertion holds relates to the event (here and now).

2.2 Macro-events, sub-events, experimental design
Taking again the situation represented as there is a man typing at his desk, the video clip starts with the participant seated at a desk in an office:

Takes a sheet of paper
Inserts the sheet into a typewriter
Positions the sheet (end of clip)

English speakers prefer to represent this situation in overall terms as what can be called the macro-event there is a man typing, rather than the sub-events shown in the clip. Following Bohnemeyer et al. (2007), a construction has the macro-event property to the extent that it packages an event representation so that temporal, aspectual operators cannot access proper sub-events individually; in
this sense temporal operators for the macro-event necessarily have scope over all sub-events (see also Talmy 2000a). The decision in the experimental design as to what a typing situation encompasses is based on criteria that rely on world knowledge and trial and error in standardizing the video clips. In the real world, a situation represented as typing at a typewriter could form part of a continuum preceded by another possible unit such as "getting an old typewriter out of the store room," preceded by "arriving at the office," etc. What may count as a typing situation, for example, and allow representation as a macro-event or one of its sub-events (or both), is specified in the design with the event boundaries presented via the video clip. Its viability was tested empirically on the basis of speakers' responses in a pilot study that included the languages in the overall crosslinguistic study (Semitic, Romance, and Germanic languages). A situation showing "someone folding out a sofa to convert it into a bed" and allowing the representation someone is converting a sofa into a bed, as a macro-event, was not validated since speakers generally relate to the individual sub-events shown. On the other hand, a situation with "someone drinking a glass of water" will be rarely represented at a high level of resolution with sub-events such as grasping the glass, lifting it off the table etc. Underlying principles at this level are given by event schemas based, in part, on cultural practice and world knowledge. In contrast to questions regarding the macro-event, the relevant sub-events can be defined on the basis of temporal criteria, given the fact that they can be located at a unique point within the given frame, as mentioned above. Sub-events are temporally located events in that they form part of a sequence. Take, for example, a clip with a child playing ball on a lawn in the back garden. The clip shows the following:

Picks up a ball
Positions it over its head
Turns to aim the ball at a goal
Throws it towards the goal

The speaker may choose to relate to any one of these events in answering the question what is happening. The sub-events of positioning the ball and turning around could be subsumed hierarchically and represented as to aim the ball, for example. In the present analysis it is still categorized as a sub-event of the macro-event (playing ball), since aiming the ball can be uniquely located in relation to the other sub-events (preceded by pick up the ball, for example). Representations at the macro level can have different degrees of differentiation (playing, vs. playing a game, vs. playing ball); they can be treated, on the whole, as situations represented as a single state, using verbs such as to play, to sing, to type. Sub-events, by contrast, are generally encoded by verbs
that relate to a change in state, giving two different states, or more specifically two times, as in throw a ball towards a slide. With throw the ball, for example, there is the time before the ball is thrown and the time when the ball is thrown. Thus throw x describes an event with two times in which there is a transition from a time interval with "ball not thrown" to one where the assertion "ball is thrown" holds (see Klein 1994, 2000). Speakers can defocus changes of this kind, using verbs with one time such as playing ball, for example. Dynamic situations represented in this way are also termed 'homogeneous' (Ryle 1949), or 'atelic,' i.e., possible goal orientations are defocused (Comrie 1976; Bach 1981; Sasse 2002). The contrast between situations represented as having one vs. two times is often a question of perspective: if terminal points, boundaries or transitions from one phase to the other are defocused, speakers make way for a representation of a situation with one-time verbs: he is playing with blocks, as opposed to two-times verbs he is building a tower out of blocks. In the latter case the existence of whatever is being built will provide a time at which the event can be viewed as completed. With the representation he is playing with blocks the possible time of completion is not in any way indicated in the means chosen in the linguistic representation.

The video clips consisted of forty separate scenes with six test items and thirty-four fillers. The six items in which the event at issue could be represented at the level of either a macro- or sub-event are as follows, using verbs with one or two times:

• Scene (1) shows a person at a riverbank fly fishing: the film clip shows the fisherman standing on the banks of a river, casting the line and letting it float on the water. The options in event representation include the macro-events fishing, fly fishing or a possible sub-event such as casting the line.
• Scene (2) shows a person at a typewriter preparing to type: the scene shows a man sitting at a typewriter taking a sheet from a tray with paper, and inserting it in the typewriter. The options in this case are the macro-event typing/preparing to type or one or more of its sub-events, taking a sheet of paper, inserting, putting a sheet of paper in the typewriter.
• Scene (3) shows a young boy in a garden throwing a ball towards a goal; the options are a macro-event such as playing ball, playing a game, or sub-events such as throwing, tossing, rolling a ball, etc.
• Scene (4) shows someone in the supermarket pushing a trolley along the aisle and stopping in front of a shelf, taking a packet off the shelf, putting it in the trolley. Options at the macro level include shopping, getting the groceries, and as possible sub-events pushing a trolley around in a supermarket, taking x off the shelf, putting x in the trolley.
• Scene (5) shows a waitress in a café with a tray in her hand, going over to a table, taking a cup of coffee off the tray and placing it on a table in front of a person. A possible macro-event in this case is serving coffee, or any of the events listed as a possible sub-event.
• Scene (6) shows a person at a table rebuilding a tower made out of building blocks; some of the blocks are still in a jumble on the table, while others are already stacked as the base of a tower; the clip shows the person taking a block and placing it on the base. Possible options here are playing, playing with blocks, building something, or any of the possible sub-events (take, place, put, stack a block). (Although building a tower is a macro-event description, build is a verb with two times.)

Speakers of English were asked to tell what is happening; the corresponding form in German is was passiert gerade ('tell what happens just now'). The speakers were told that we were interested in what was going on and not in a description of the scene. This was emphasized, since pilot studies showed that speakers often give lengthy descriptions that reduce the time available to present information on the event. The forty scenes were presented in randomized orders with a blank between each clip of eight seconds, while each clip lasted approximately seven to nine seconds (entire set with forty clips around ten minutes; the timing arithmetic is made explicit in the sketch following the data examples below). Examples from the English and German data sets are as follows.

English

Shopping (macro-event, encoded in a dependent clause; full existential clause)
001 There is a girl
002 shopping in a supermarket

Shopping (macro-event, encoded in a dependent clause; existential elliptical)
001 A woman
002 going grocery shopping

(sub-event, encoded in main clauses)
001 A woman is pushing a shopping trolley
002 and has chosen something off the shelf

Typing (sub-events, encoded in main clauses)
001 A man is taking paper
002 putting it into a typewriter

German

Shopping (sub-event, encoded in main clauses)
001 Jemand nimmt eine Packung Kekse aus dem Regal
    'Someone takes a packet of biscuits out of the shelf'

Shopping (sub-event, encoded in main clauses)
001 Eine Frau geht mit einem Einkaufswagen durch den Laden
    'A woman goes with a trolley through the shop'
002 und nimmt Sachen aus dem Regal
    'and takes things out of the shelf'
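As flagged above, the overall timing of the standard stimulus set follows directly from the design figures. The sketch below simply makes the arithmetic explicit; the eight-second clip length is our midpoint approximation of the reported seven-to-nine-second range, so the result is an estimate consistent with the "around ten minutes" figure, not a separately reported value.

# Rough timing of the standard stimulus set, from the design above:
# forty clips of approx. 7-9 s each, separated by 8-s blanks.
N_CLIPS = 40
CLIP_S = 8.0    # assumption: midpoint of the reported 7-9 s range
BLANK_S = 8.0   # blank between clips (reduced to 6.0 s under time pressure)

total_s = N_CLIPS * CLIP_S + (N_CLIPS - 1) * BLANK_S
print(f"approx. total duration: {total_s / 60:.1f} min")  # ~10.5 min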
A note on existential clauses in the English data – as well as the general omission of the relative pronoun who in introducing the dependent clause – is in order here, since the full clause is typically realized in the first scenes presented, there is a boy (who is) playing in the garden, but is generally reduced to an elliptical pattern in which there is is omitted as the task proceeds. The expletive there forms the syntactic subject of the clause but is empty with regard to content. The full existential recurs in the data, given changes such as a switch in tense, since the clause has to be realized in full with the finite verb to mark the tense in question (there was a man painting a picture). Although speakers generally begin speaking approximately two seconds after the clip started (speech onset), they sometimes wait until the clip has finished, which occasionally leads to a change in tense. With regard to the omission of the relative pronoun in the present data set, dependent clauses differ in informational status and function, depending on whether the relative pronoun is used or not. If the relative pronoun is used, the referent is introduced as a 'topic' for a subsequent presentation, as in there was an old king who lived in a beautiful castle. When the relative pronoun is used, the speaker can be expected to continue with more information on the referent in question (see Lambrecht 1994). Where this is not called for, as in the scenes above, the relative pronoun can be omitted. In other words, a statement with there is a man at a desk who is trying to type would indicate that we can expect to hear more about this referent, which is not the case in the present task.

3 Event representation and clause type
Table 4.1 gives an overview of preferences in the selection of clause type when mapping information on the event onto form. The figures cover 180 events (six scenes, thirty speakers). Information on the event is typically mapped into a main clause in German (88.3%), while presentationals (ich sehe . . . 'I see . . . '), with information on the event in a dependent clause, amount to 11.6%. In English the preferred pattern is the other way around, compared to German, since information on the event is typically mapped into a dependent clause (70.0%) occurring in conjunction with an existential. Table 4.2 shows how event representation in English (macro-event or sub-event) is distributed across the two clause types for the six scenes listed above. As shown in the table, macro-events are more likely to occur with a dependent
Table 4.1 Clause type: Dependent vs. main clause

                                          English         German
Information on event in dependent clause  126/180 70.0%   21/180 11.6%
Information on event in a main clause     54/180 30.0%    159/180 88.3%

Table 4.2 English: Event representation in dependent vs. main clauses

Macro-event in dependent clause  104/126 82.5%
Sub-event in dependent clause    22/126 17.4%
Macro-event in main clause       23/54 42.5%
Sub-event in main clause         31/54 57.4%

Table 4.3 German: Event representation in dependent vs. main clauses

Macro-event in main clause       46/146 31.5%
Sub-event in main clause         100/146 68.4%
Macro-event in dependent clause  8/21 38.1%
Sub-event in dependent clause    13/21 61.9%
clause, while representations in terms of a sub-event are relatively low in this clause type (17.4%) and are more likely to occur in a main clause. This distribution is also found in German, in that sub-events are less likely to occur in a dependent clause, compared to main clauses (table 4.3). Overall frequencies differ, however, since there is a clear preference for main clauses and sub-events in event representation in German. (In 13/180 cases the situation was described at the level of the macro-event as well as one of the sub-events, i.e. giving two main clauses. The thirteen cases were omitted in the analysis so the figures for main clauses add up to 146 and not 159.) German speakers tend to map information on the event into a main clause, and in this case the event is also typically represented as a sub-event, as in the English data. Given the low number of occurrences in German for dependent clauses, the present figures are not reliable. So there is evidence of similar preferences in event representation, given a specific clause type in both languages, but the languages differ with respect to the frequency with which the patterns occur.
Table 4.4 English: Distribution of macro- and sub-events with respect to clause type

English, 30 speakers   Event main clause          Event dependent clause
6 scenes (180)         Sub-event     Macro-event  Sub-event     Macro-event
typing (30)            4/5           1/5          4/25          21/25
fishing (30)           3/7           4/7          0/23          23/23
playing (30)           3/5           2/5          6/25          19/25
shopping (30)          5/11          6/11         4/19          15/19
serving coffee (30)    9/14          5/14         3/16          13/16
building tower (30)    7/12          5/12         5/18          13/18
Total (180)            31/54 57.4%   23/54 42.5%  22/126 17.4%  104/126 82.5%
Table 4.5 German: Distribution of macro- and sub-events with respect to clause type

German, 30 speakers       Event main clause            Event dependent clause
(13 references omitted)   Sub-event      Macro-event   Sub-event    Macro-event
typing (29)               23/28          5/28          1/1          0/1
fishing (26)              10/17          7/17          5/9          4/9
playing (27)              12/19          7/19          4/8          4/8
shopping (27)             14/26          12/26         1/1          0/1
serving coffee (30)       20/29          9/29          1/1          0/1
building tower (28)       21/27          6/27          1/1          0/1
Total (167)               100/146 68.4%  46/146 31.5%  13/21 61.9%  8/21 38.1%
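The Total rows of tables 4.4 and 4.5 are plain aggregations of the per-scene counts, which doubles as a consistency check on the reconstructed layout. The sketch below (our own illustration) sums the first data column of table 4.4; the fraction pairs are copied from the table.

# Aggregating the per-scene counts of table 4.4 (English, main clause,
# sub-event column) into the Total row.
sub_main = [(4, 5), (3, 7), (3, 5), (5, 11), (9, 14), (7, 12)]

hits = sum(n for n, _ in sub_main)    # 31 sub-event descriptions
total = sum(d for _, d in sub_main)   # 54 main-clause responses
print(f"{hits}/{total} = {100 * hits / total:.1f}%")  # 31/54 = 57.4%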
The frequency of occurrence of the two event types in main clauses in German (sub-events 68.4% vs. macro-events 31.5%) is not random.1 A breakdown of the numbers, as found for each of the six scenes, is presented in tables 4.4 and 4.5 for both languages.
1 The preference for the sub-event is significant (sub-event vs. macro-event H(1) = 7.030, p = 0.008). English native speakers clearly prefer the macro-event in dependent clauses (H(1) = 8.366, p = 0.004). In both languages the distribution observed is random when occurrences for the given clause type are low (e.g. dependent clauses in German (21) or main clauses in English (54)).
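The note does not state which test produced these statistics; a statistic reported as H(1) is consistent with a Kruskal–Wallis test over two groups. The sketch below shows how such a comparison could be run on per-speaker counts; the counts are invented placeholders, not the study's data, and the choice of test is our inference rather than a procedure reported by the authors.

# Hypothetical two-group Kruskal-Wallis test yielding an H(1) statistic.
# The per-speaker counts below are invented placeholders, NOT the
# study's data; the test choice is inferred from "H(1)".
from scipy import stats

sub_event_counts = [5, 4, 4, 3, 5, 4, 3, 5, 4, 4]    # invented
macro_event_counts = [1, 2, 2, 3, 1, 2, 3, 1, 2, 2]  # invented

h, p = stats.kruskal(sub_event_counts, macro_event_counts)
print(f"H(1) = {h:.3f}, p = {p:.3f}")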
So far there is evidence of a sustained correlation between the way in which information on an event is embedded at clause level and the level (macro-event vs. sub-event) at which the event is represented, indicating that clause type and associated temporal factors are coupled in event representation.2
4 Testing preferences in event representation
In order to test the stability of the patterns observed, where speakers show matched preferences between event representation, clause type, and associated temporal factors (assertion time), a set of experiments was carried out in which the clause type, as well as the time given to formulate information on the event, were manipulated. If speakers have a preference in event representation that is independent of temporal factors associated with clause status, these preferences should be immune to manipulations of this kind. The following tests were carried out with speakers of German.

(i) In the first experiment, speakers of German were asked to use a dependent clause, to see if this led to any change in the preference to represent the situation on the basis of a sub-event.
(ii) In the second experiment, the blanks between the video clips – the time in which speakers can provide information on the scene – were reduced by two seconds, from eight to six seconds, compared to the standard set described above. However, the length of the video clips was maintained (approx. eight to nine seconds), giving speakers the same length of time as before to process information on the scene.

Again, if speakers have a preference in event conceptualization for sub-events, then this should not be open to disruption by exerting time pressure and reducing the time available in mapping information into form.
4.1 Inducing use of a dependent clause
Since the existential (es gibt . . . 'there is') is rarely used when grounding information in German, speakers were asked to use the construction (ich sehe . . . 'I see'), since presentationals typically take this form in the cases in which they occur in the data base:

ich sehe einen Jungen der ins Wasser springt
'I see a boy who into the water jumps'
2 A preliminary comparison with Italian with a similar task (twenty-five speakers, five scenes) provides further evidence for the preferences observed. A high frequency in the use of existentials and a conjoined dependent clause (72.6%), compared to main clauses (27.4%), correlates with a predominance of macro-events, 86/125 (68.8%), in contrast to sub-events, 39/125 (31.2%), in event representation.
Table 4.6 Event representation in German, dependent clause enforced

Event representation as macro-event  47/90 (52.2%)
Event representation as sub-event    43/90 (47.7%)
The speakers were also asked here to tell was passiert gerade (‘what happens just now’), as with the data set described above, but to formulate their response using the clause type indicated. This construction is similar to existentials in English since focus is directed on the participant in the event (ich sehe einen Jungen, ‘I see a boy’), and information on the event in which the entity participates is encoded in a dependent clause. The task was carried out with fifteen speakers and the stimulus material consisted of the same set of forty scenes with the six test items described above (ninety events).
4.2 Results
The enforced use of the dependent clause leads to an increase in the number of cases in which the event is represented as a macro-event (47/90; 52.2%), reducing the rate of occurrence for sub-events to 47.7%. The response thus differs from the spontaneous data set described above where the overall occurrence for sub-events is higher at 68.4% (main clauses). Although the tendency is not pronounced, it is possible to disrupt a preference for sub-events by asking speakers to use a different type of clause. In the present context one form of representation is as likely as the other.
4.3 Experiment with time pressure
As mentioned above, time pressure was introduced by reducing the time between the video clips to express the relevant information. Crucially, the time left for information processing and event conceptualization, i.e. the length of time of the video clip, was maintained as in the standard set. The experiment was carried out with twenty speakers of German and the same six test items.
4.4 Results
The total number of events analyzed is not 120 but 108, since there are twelve syntactically incomplete responses in the time pressure data set. While main clauses still predominate at 85/108 (78.7%), the preference in event representation has clearly changed.
Table 4.7 Event representation under time pressure

Event main clause           Event dependent clause
Sub-event     Macro-event   Sub-event    Macro-event
44/108 40.7%  42/108 38.9%  6/108 5.5%   16/108 14.8%
Discussion and conclusions
In solving distributional questions in information structure, speakers draw on constructions which either profile an entity, and its properties, or an event and its participants (existentials with a dependent clause vs. a simple main clause). These options at clause level correlate with different preferences in event representation, as shown in the analysis presented above. The question is, what factors drive the preferences observed in event representation? 5.1
Clause type, time of assertion, and finiteness
As outlined in section 1, one of the crucial distinctions between main and dependent clauses concerns the encoding of assertion time. Taking, for example, a situation represented as a soprano was singing, a distinction is drawn between the situation time and the time for which the assertion or claim is made, as in a soprano was singing when he arrived. In the latter case the assertion was singing holds for the interval given with when he arrived. The time interval in the latter
Time event relations and clause structure
81
example is referred to as the ‘topic time’ or ‘the time for which the assertion holds’ (see in detail, Klein 1994; 2006). Significantly, this is where main and dependent clauses differ. Dependent clauses have a reduced tense structure and temporal interpretation is dependent on the verb of the main clause (Hazout 2004; Klein 2006). Taking, for example, there is a teacher (who is) doing equations, the event depicted in the dependent clause (doing equations) does not have a time of assertion, in contrast to the main clause there is a teacher. The temporal properties of the dependent clause are not accessible since they are overruled by the main clause and its finite component. This means that what is actually now the case does not necessarily hold for the event predicated in the dependent clause. The time for which the assertion holds relates to the existence of the entity: it is claimed that it is now the case that – “there exists a teacher.” Looking at the nature of the time interval at issue, the time span for x exists goes beyond the time interval given by the individual event in which the entity becomes a participant. The interval which is active in the conceptual space may open the door for the speaker to select an event description that is not necessarily closely tied to the here and now and to represent the situation as the macro-event: there is a teacher doing maths rather than there is a teacher writing an equation on the board (a sub-event depicted in the video clip). If a main clause is used, the verb that encodes information on the event delivers the assertion time, and thus relates directly to the question what is happening? The results of the empirical analysis reveal a preference in event representation in main clauses which is closer to the here and now, since speakers are more likely to select one of the sub-events shown in the clip. Temporal factors, time of assertion, and related time intervals may constitute one of the factors leading to the different preferences in event representation in the present task, showing how clause type and time event relations are interrelated as possible contributing factors in event representation. 5.2
5.2 Existential predicates and dependent clauses
Examining the existential and the dependent clause, options in event representation may also be sensitive to the attributive function of the dependent clause, since the clause in which the event is encoded is a pseudo-relative, as in there is a boy (who is) playing ball. Relative clauses belong to the set of means that encode properties of entities and modify nouns. In this context there is also the question of the there-clause and the semantics of existence predicates, in that existential predicates can denote a property of a situation (see Strawson 1959; Chierchia 1998a,b; McNally 2009). This is indicated in the contrast between the following descriptions of a situation. In yesterday we were at the amusement park; there was singing, the singing can be interpreted as an integral part of the amusements on offer, while the description yesterday
we were at the amusement park; someone was singing could be interpreted as coincidental and not a set feature in the program. Furthermore, predicates with instantiate-type semantics do not combine with expressions that denote particulars but with non-particular-denoting expressions (McNally 2009; Chung and Ladusaw 2004): denoting non-particulars, There was every type and brand of farm and forestry equipment available; denoting particulars, *There was every piece of equipment available (see McNally 2009 for details). In this sense, existential predicates are property predicates and may constitute one of the relevant factors in event representation with respect to the level of abstraction observed with this type of construction. Given the use of the construction existential + dependent clause, a situation is more likely to be represented as a playing situation, a typing situation, or a shopping situation, rather than at the level of one of the sub-events, with all its particulars (taking a packet off the shelf), as depicted in the video clip. The differences in event representation observed across the main and dependent clause cannot be attributed to the progressive, since it occurs in both contexts and its inherent semantic features are the same in both clause types. The same preferences in event representation across clause type are also found in German, where this temporal perspective is rarely used.
5.3 Information structure and event representation
As mentioned briefly in section 1, questions relating to information structure constitute the basis for the use of the clause types found in the present study. The majority of English speakers, for example, select a construction that satisfies requirements regarding the distribution of new or unfamiliar information in the clause. The construction used (existential) has an expletive or empty subject (there...) which closes the door, so to speak, on the option of mapping information onto the syntactic subject of the clause, and with this onto preverbal position in English. In the present task, for example, this structure ensures placement of the participant (new or not active in memory) in postverbal position. The frequency of the existential in the data can be attributed to the fact that in English, as well as in the Romance languages, the subject of the sentence is a core feature in encoding topic information, i.e. information that is clearly at issue and recoverable in the context in question. In contrast to English, information that is new in German in the domain of discourse can be mapped as the syntactic subject, since this can be placed in different positions in the clause. Word order constraints for main clauses, in a formal sense, are limited to the position of the finite verb, since this must be the second main constituent (verb second or V2 constraint). This creates slots around the finite verb (Vorfeld, 'prefield'; Mittelfeld, 'midfield') that can be used to encode information with topic status (see Frey 2000). Constraints in
placing new information in the Vorfeld (preverbal position) are linked to the assignment of topic status, as occurs, for example, with participants involved in a series of events (e.g., in a narrative). This is not the case in the present study, given the fact that the forty scenes shown in the video clips are not connected in any way. All the information encoded in the clause is treated as having focus status, i.e., requiring attention as new.
5.4 Knowledge bases in language production
The findings provide a window on event representation and show how temporal semantic factors associated with existential predicates and with main and dependent clauses are taken into account when talking about events. The findings indicate that event representation is guided by an integrated knowledge base that incorporates inherently grammatical as well as semantic and conceptual knowledge, and allows fine tuning across the different domains that speakers must deal with in language production. In event representation, consideration has to be given not only to patterns of lexicalization, verb type, and argument structure, but also to a cluster of factors concerning temporal relational systems that ensure specification of a time of assertion and grammatical constructions that support requirements in information distribution. An integrated knowledge base of this kind allows event representation to proceed in terms of the best possible fit with respect to core grammatical means and their functions.
5 Event representations in signed languages1

Aslı Özyürek and Pamela Perniss
1 Introduction
Signed languages are the natural visual languages of the Deaf, and rely mainly on spatial and body-anchored devices (that is, the body, head, facial expression, eye gaze, and the physical space around the body) for linguistic expression. The affordances of the visual-spatial modality allow signers to give detailed information about the relative location and orientation, motion, and activity of the characters in an event, and to encode this information from certain visual perspectives. In spoken languages, devices such as spatial verbs, locatives, and spatial prepositions also help speakers to situate referents in a discourse context and describe relations among them from certain perspectives (e.g. Taylor and Tversky 1992; Berman and Slobin 1994; Gernsbacher 1997). However, due to modality differences, spatial details about an event can be conveyed in a richer way in signed compared to spoken languages.2 Furthermore, much spatial information, including visual perspective, is often encoded obligatorily in predicates of location, motion, and activity in signed languages due to the modality. The purpose of this chapter is to give an account of the way in which a signer's choice of visual perspective interacts with and determines the choice of different types of event predicates in narrative descriptions of complex spatial events. We also ask whether certain types of events (i.e. transitivity) are more or less likely to be expressed by certain perspectives and/or types of predicates. To give a comprehensive account of this phenomenon and to see to what extent the visual-spatial modality predicts/constrains such expressions in sign languages, we compare two historically unrelated and differentially documented sign languages, namely Turkish (TİD) and German Sign Language (DGS).3
1 This research is funded by NWO (Netherlands Science Foundation), VIDI project.
2 Note that speakers convey more spatial information than is present in their speech if one takes into account the gestures that accompany their speech (Goldin-Meadow 2003; Kita and Özyürek 2003).
3 The acronyms TİD and DGS use the letters of the Turkish and German names for the sign languages, respectively. TİD stands for Türk İşaret Dili; DGS stands for Deutsche Gebärdensprache. See section 3.1 for general information about these sign languages.
First, we give an overview of the different types of event predicates and perspectives that have modality-specific features in signed languages.

2 Event representations in signed languages: Types of event predicates and perspective choice
2.1 Types of event predicates
In order to express the location, motion, and action of referents in an event, signers can use different types of event predicates, in particular, so-called 'classifier' (handling, entity) or 'lexical' predicates. These two main types of predicates convey different amounts of semantic information about the figure, location, motion, and action of the depicted event. In particular, classifier predicates are semantically more specific than lexical predicates, as will be described below. In the use of classifier predicates, the handshape typically expresses information about the size and shape of the referent, and the position and movement of the hand in sign space encodes information about the motion and location of the referent in the event space (Schick 1990; Engberg-Pedersen 1993; Emmorey 2002; Schembri 2003). Two major types of classifiers are distinguished in the sign language literature on the basis of how referents are depicted by the handshape: (1) in 'entity' classifiers, the hand represents a referent as a whole, and the handshape encodes certain salient features of the entity's size or shape; (2) in 'handling' classifiers, the hand represents the handling or manipulation of a referent by an animate referent (e.g. Engberg-Pedersen 1993; Emmorey 2003; Zwitserlood 2003, among others).4 For example, a B-hand (flat hand) can be used as an entity classifier to represent a car (in German Sign Language) or a table (an object with a broad, horizontal surface), while an F-hand (contact between index finger and thumb) can be used as a handling classifier to represent holding a single flower or picking up a pencil. These two types are particularly relevant to the present study. The use of classifier predicates to express the location, motion, and action of referents in discourse is generally preceded by a sign that identifies the referent. Once the referent has been identified, a signer can use classifier predicates to convey spatial information about it, as can be seen in example 1 from DGS below (see still 1 in appendix 2 for the cartoon event being depicted). In this example, the signer first uses the lexical noun MOUSE to identify the referent and then uses an entity classifier in the second sign to refer to the mouse's path and direction of motion.
4 In classifications proposed by other researchers, what we call 'entity' and 'handling' classifiers are subsumed under categories including 'static size and shape specifiers (SASS),' 'semantic classifiers,' and 'instrument classifiers' (Supalla 1986; Brennan 1992).
Example 1 (DGS)
(a) GLOSS: MOUSE
(b) GLOSS: Mouse(RH:entityCL) come-from-right
(c) GLOSS: Mouse(RH:handlingCL) bouncing-ball
In the third sign, she uses a handling classifier to refer to the mouse's simultaneous manual activity, namely bouncing the ball. The use of entity and handling classifiers in discourse can be linked to the type of information that can be felicitously represented by the different forms. In particular, while entity classifiers are better suited for the representation of an entity's location and motion, handling classifiers can aptly depict the manner of manual activity (Supalla 1986; Engberg-Pedersen 1993), as can be seen in example 1 (DGS). The use of a handshape with an extended, upright index finger can very appropriately represent the path of motion (e.g. straight), including source and goal information (e.g. from right to left), of an animate figure. The intrinsic features of the index finger handshape do not, however, include parts that correspond to the human figure's arms or head, and are thus not suited for the expression of anything involving manual activity. On the other hand, the handling handshapes are better suited for representing the manner of the activity than for expressing change of location. Thus, expressions of this type of information appropriately involve the use of handling classifiers, which – as the name suggests – represent an animate agent handling an entity. In addition to classifier predicates, signers can also use lexical predicates to describe the actions of protagonists in events. Instead of representing the handling of an entity or the entity itself, the handshape in lexical predicates corresponds to the sign's citation form (i.e. the form that would be listed in a dictionary – see examples 7 and 8 later in the text). For example, signers may use the lexical predicate PLAY to describe a scene where the mouse and the elephant play ball together (throwing it back and forth), instead of actually depicting the action of throwing the ball (as the use of a handling classifier would). When signers use lexical predicates, referents' actions are semantically
identified, but more specific spatial information about the referents themselves, as is encoded in classifier predicates, is absent.
2.2 Perspective types
In order to depict an event in fluent discourse, signers generally have to choose the visual perspective from which to depict the location, motion, and action of figures in the event. Thus, signing perspective refers to the vantage point from which an event is mapped or projected onto sign space. Unlike in spoken languages, the iconic properties of the visual-spatial modality make it possible to map referent location and motion from the real event space directly onto sign space from different perspectives. This is done by visually modulating the predicate (classifier or lexical) in the sign space according to the particular perspective chosen. In this chapter, we emphasize the notion of 'event space projection' in our definition of perspective. We distinguish the different perspectives or event space projections (character and observer) in signed depictions primarily in terms of (i) the vantage point from which the event is projected onto the sign space, (ii) the signer's role in the projected event space, and (iii) the size of the projected event space (e.g., as evidenced by the depiction of size and shape information about the figure). In what we call character perspective, the event space is projected onto sign space from a character's vantage point within the event. The signer assumes the role of a character in the event, such that at least the character's head and torso are mapped onto the signer's body, and the size of the projected space is life-sized. When observer perspective is employed, on the other hand, the event space is projected onto sign space from an external vantage point. The signer is not part of the represented event, and the event space is reduced in size, projected onto the area of space in front of the signer's body. These signing perspectives have been described along similar lines by a number of other researchers. Character and observer perspective correspond, respectively, to Liddell's (2003) distinction between 'surrogate' and 'depictive' space,5 Morgan's (1999) use of the terms 'shifted referential framework' and 'fixed referential framework,' and to what Schick (1990) calls 'real-world space' and 'model space.' Emmorey and Falgier (1999) introduce the terms 'diagrammatic space' and 'viewer space' to describe the two spatial formats that signers use to structure space in describing environments like a convention center or a town. Furthermore, McNeill (1992) uses the terms 'character viewpoint' and 'observer viewpoint' for a similar distinction in the use of space for referent representation in gestures accompanying spoken narratives.
5 Depictive space was called 'token space' in some of Liddell's earlier publications (Liddell 1994, 1995).
viewpoint’ and ‘observer viewpoint’ for a similar distinction in the use of space for referent representation in gestures accompanying spoken narratives. 2.3
Alignment of event predicates and perspectives
The use of the types of classifier predicates described above typically involves the use of character or observer perspective (or the fusion or simultaneous use of both perspectives). However, less is known with regard to how perspective is used with lexical predicates. With regard to perspective and the type of classifier predicate, the most prototypical alignments in their use can be motivated in the following way. Referent motion and location within the event space is most felicitously depicted through the use of entity classifiers, which depict the figure (i.e. salient size and shape properties of the figure) as if viewed from an external viewpoint. This corresponds to observer perspective, where the signer is external to the event and the event space is projected onto the area of space in front of the signer. The use of observer perspective is thus expected to co-occur with the use of entity classifiers. On the other hand, in character perspective, the signer is part of the event in the role of an event protagonist. Handling classifiers depict the way a referent is handled or manipulated by an agent. Thus, character perspective is expected to co-occur with the use of handling classifiers.6 Table 5.1 summarizes what we take to be the most salient features of the two main signing perspectives in terms of event space projection. In addition, it also indicates which classifier types will co-occur with which perspectives when they are expected to 'align.' Note that these expected alignments assume that the signer's visual perspective of the event will determine the type of event predicate chosen, as described above. This view also predicts that when signers choose either perspective, they are more likely to depict the event with a classifier predicate than with a lexical predicate, since the former is more visually specific than the latter. However, the combinations of perspective and classifier predicates found in extended discourse appear to be much more varied than the expected alignments. For the purposes of this chapter, we call these less expected, though frequent, constructions 'non-aligned.' For example, entity classifiers can appear not only in observer perspective event space projections, but also in character perspective representations. In event descriptions where two referents need to be depicted simultaneously, one referent can be mapped onto the signer's body in character perspective and the other mapped onto the hand as an entity classifier (i.e. an upright index finger) moving towards the body to mean "the person approached me" (see a similar example in Liddell 2003: 209).
6 See also Metzger (1995) and Liddell and Metzger (1998) for the notion of 'constructed action,' where the signer's movements and affective displays can be directly attributed to the character mapped onto the body.
Table 5.1 Characteristics of observer and character perspectives in terms of event space projection and classifier types that are aligned or non-aligned with each perspective
                                     Character perspective           Observer perspective
Projection of event space            Event-internal vantage point    Event-external vantage point
                                     Encompasses signer              In front of signer
                                     Life-sized                      Reduced size
Perspective/classifier combination
  Aligned                            Handling                        Entity
  Non-aligned                        Entity                          Handling
Conversely, though it has not been documented in the literature, handling classifiers may appear not only in character perspective representations, but also in representations in which the event space is projected from an observer's perspective (see example 3 from TİD later in the text). These possible uses of perspective with non-aligned classifiers are also represented in table 5.1. To date, not much is known about how frequently and under what conditions these different types of constructions, that is, different types of combinations of perspective and type of classifier or lexical predicates, appear in sign language discourse. For example, do signers prefer certain event predicate types in certain perspectives? Secondly, is there some event type (i.e. transitivity) that motivates the use of certain event predicates and perspective/classifier predicate combinations (i.e. aligned vs. non-aligned)? Finally, almost nothing is known about possible crosslinguistic variation between sign languages with regard to these questions.
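The alignment notion can be restated compactly. The following sketch is ours, purely illustrative, and deliberately ignores fused projections and lexical predicates; it simply encodes the pairings in table 5.1:

    # Illustrative restatement (ours) of table 5.1: only character + handling
    # and observer + entity count as 'aligned'; the reverse pairings are
    # 'non-aligned'. Fused perspective and lexical predicates are left out.
    ALIGNED_PAIRS = {("character", "handling"), ("observer", "entity")}

    def alignment(perspective: str, classifier: str) -> str:
        return "aligned" if (perspective, classifier) in ALIGNED_PAIRS else "non-aligned"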
3 The present study
In the present study, we investigate how different perspective and classifier and lexical predicate combinations occur in narratives that depict the location, motion, and action of referents. We compare these uses both qualitatively and quantitatively across two unrelated sign languages, namely Turkish (TİD) and German Sign Language (DGS). Until recently, the use of classifier predicates for depicting locations and actions of referents has been assumed to be similar across sign languages (Meier 2002; Talmy 2003b; Aronoff, Meier, and Sandler 2005), or has not been investigated for systematic differences across unrelated, or less documented sign
languages (for an exception, see Nyst 2004, who shows that certain types of classifier predicates found in Western sign languages – notably, entity classifiers – do not exist in Adamorobe Sign Language, a village sign language used in Ghana). Furthermore, the assumption of modality effects has created a bias toward expecting similarities rather than differences in the use of these devices across sign languages (see also Supalla and Webb 1995; Newport and Supalla 2000). These claims have been attributed to the homogenizing effect of the iconic (i.e. visually motivated) properties of sign languages in contrast to spoken languages (Aronoff et al. 2005). However, there has not been much research on less well-known and unrelated sign languages or in discourse situations to test these claims. In this chapter, we investigate similarities and differences between two sign languages in the use of classifier predicates and perspectives in sign language narratives. We discuss the implications of these findings in terms of whether and to what extent the iconic properties of the visual-spatial modality homogenize expressions related to spatial representation in different sign languages. If the use of space in these spatial expressions is driven primarily by iconic properties of the visual-spatial modality, we do not expect to see differences between the two unrelated sign languages, since they use the same modality for expression. However, if there are further constraints on the use of such expressions other than iconicity (e.g. linguistic or discourse constraints), then we do expect variation between the two languages.
3.1 History and previous work on TİD and DGS
In comparing two sign languages, it is important to take into account their historical and sociolinguistic properties. If there are differences between sign languages in terms of youth and sociolinguistic context, then the differences/similarities we find in uses of perspective and classifier predicates cannot be directly attributed to linguistic variation (see Aronoff et al. 2003; Aronoff et al. 2005 for the possible influence of the youth of sign languages in accounting for their differences or similarities). Furthermore, it is also important to establish that there has not been any historical contact between the languages. The two sign languages we compare in this study, namely TİD and DGS, are similar in terms of historical development and the use of sign language in education, but there is no contact attested between them (Zeshan 2002). In Turkey, the establishment of the first Deaf school is dated to 1902 (Deringil 2002).7 From 1953 to the present, the teaching of TİD has not been allowed in schools; instead, oral teaching methods have been preferred. The Turkish
7 The use of a sign language within a Deaf community that existed in the Ottoman Palace for official reasons between 1500–1700 has been documented (Miles 2000), but it is difficult to obtain evidence that the TİD used today is a continuation of the sign language used in the Palace.
Federation of the Deaf was founded in 1964 and since then has helped promote communication among the Deaf population throughout the country. In Germany, the first schools for the Deaf were established in the late eighteenth century and used a manual method of teaching until the middle of the nineteenth century. In the second half of the nineteenth century, the teachers of the Deaf began to support the idea of a strict oral method. Since 1911, schooling for the Deaf has been compulsory and a predominantly oral approach has remained the foundation of Deaf education in Germany. DGS has been used continuously by members of the Deaf community since formal education united them, and since the establishment of the Federation of the Deaf in 1848 (Vogel 1999). In both countries, Deaf people learn sign language either from their peers in the Deaf schools or through exposure to the community, e.g. in the Deaf clubs, without formal instruction in the schools. Thus, due to the historical and sociolinguistic similarities, possible differences in structure are less likely to be attributable to differences in the ages of the two sign languages, but may rather reflect structural variation between TİD and DGS.
3.2 Method
Event narratives were collected from four Turkish and ten German Sign Language users. In each group, signers were either native or early signers (who learned sign language no later than 6 years of age). Signers were asked to view two short silent cartoons (from Westdeutscher Rundfunk television broadcast) that contained activities of a personified mouse and elephant (see appendix 2 for selected stills). Due to field research circumstances, for TİD, each of the four signers narrated both cartoons, while for DGS, five signers narrated one of the cartoons and five (different) signers narrated the other one. TİD narratives were collected in Istanbul, Turkey, and DGS narratives in Aachen and Cologne, Germany. The movies were described to other deaf signers who had not seen them.
3.3 Coding
Narratives were transcribed into DGS or TİD glosses with the help of hearing and deaf native/early signers. Since the aim of this study is the investigation of whether two different sign languages depict events differently, only spatial and activity predicates were considered for the analysis. All predicates that indicated location, orientation, motion, or manual activity of referents in space were subsumed under spatial and activity predicates. Each spatial and activity predicate was further classified into classifier (handling or entity) vs. lexical predicates. Each event predicate was then coded with regard to whether and from which perspective it projected the event onto sign space.
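As a concrete illustration of these two coding decisions, one might represent each coded token roughly as in the following sketch (ours, not part of the authors' method; all names are invented for exposition):

    # Hypothetical sketch of the coding scheme described above: each spatial
    # or activity predicate token is classified for predicate type and for
    # the perspective of its event space projection (None = no projection).
    from dataclasses import dataclass
    from typing import Optional

    PREDICATE_TYPES = {"handling", "entity", "lexical"}
    PERSPECTIVES = {"character", "observer", "fused", None}

    @dataclass(frozen=True)
    class PredicateToken:
        language: str               # "TID" or "DGS"
        predicate_type: str         # classifier (handling/entity) or lexical
        perspective: Optional[str]  # event space projection, if any

        def __post_init__(self):
            assert self.predicate_type in PREDICATE_TYPES
            assert self.perspective in PERSPECTIVES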
Table 5.2 Characteristics of observer and character perspective in terms of event space projection and their alignment with the direction or placement of the predicate in our coding
                                     Character perspective           Observer perspective
Projection of event space            Event-internal vantage point    Event-external vantage point
to sign space                        Encompasses signer              In front of signer
                                     Life-sized                      Reduced size
Direction or placement               Sagittal axis                   Lateral axis
of the predicate
In our coding, in deciding whether an event space projection was from character or from observer perspective, the direction or placement of the predicate in space was crucial. This is motivated by the way events are depicted in the stimulus films used (see the stills from the stimuli in appendix 2). In the films, referents are predominantly located on the left and right sides of the screen, and movement or actions between them, as seen by the viewer, appear laterally directed. Thus, a lateral representation in sign space of referent location, motion, and action reflects the image of the event space as viewed on the screen. For this reason, we take the laterality of the predicate's direction as a cue that the event space is projected from the vantage point of an external observer. On the other hand, in the stimulus films, motion and action are directed either toward or away from the protagonists' bodies. Thus, location, motion, and action as represented from a character's perspective are mapped onto sign space along the sagittal axis – moving away from or towards the signer's body or referents associated with locations opposite the signer's body. (See examples 2–5 below.) Thus, we add another element, namely the direction of movement of predicates, to the characteristics that determine the event space representation from either a character's or an observer's perspective in our coding (as shown in table 5.2).8

Types of event predicate and perspective alignments
In our previous work (Perniss 2007a, b; Perniss and Özyürek 2008), using the same data and the same coding scheme, we have identified different construction types based on our definitions of observer and character perspective event space projections and on how they combine with different types of predicates (see fig. 5.1).
8 We do not claim that the axis of representation will determine the choice of perspective in all signed narratives. We use it as a cue for the analysis of these narratives based on these particular stimuli.
Spatial and activity predicates
  Classifier predicates – event space projection:
    Observer: aligned (entity), non-aligned (handling)
    Character: aligned (handling), non-aligned (entity)
    Fused
  Lexical predicates – event space projection:
    None
    Character
Figure 5.1 Different construction types of spatial and activity predicates observed in our data9
First, we divided the spatial and activity predicates into two main categories: classifier predicates and lexical predicates. Within the classifier predicates group, we categorized them as aligned or non-aligned with respect to their use in observer and character perspectives. We also identified a novel construction type which we call fused perspective. This construction combines elements from both character and observer perspective into the event space projection. Further, we split the lexical predicates category into occurrences with or without an event space projection. Figure 5.1 shows all types of spatial and activity predicates that have been attested in our TİD and DGS data. Based on the classification scheme outlined above, we identified different event predicate and perspective construction types in the data in a systematic way. These include the types below with examples. Observer perspective with entity classifier (aligned): In event representations in observer perspective, the event space is reduced in scale and represented in the area of space in front of the signer's body. The signer's head and body are not part of the event, and the hands represent whole referents in the form of entity classifier predicates. Viewed from an external vantage point, the main protagonists in the stimulus events (see the still images from the films in appendix 2) are located on the right and left sides of the screen, and activity and motion between them is depicted along the lateral axis.
9 Note that observer or fused perspectives could have also potentially co-occurred with lexical predicates, but we have not observed any combinations of these types in our sample.
Example 2 (DGS)
GLOSS: mouse(RH:locR,entityCL)-eleph(LH:locL,entityCL)-face-eachother10
In example 2, the mouse and the elephant are represented on the signer's hands by means of entity classifiers. The signer's head and torso are not part of the event. The classifiers are located on the left and right sides of sign space (i.e. laterally) to depict the relative locations of the mouse and the elephant, standing across from each other and facing each other. Observer perspective with handling classifier (non-aligned): In these predicates, the signer's head and torso are not part of the event, that is, the signer is external to the event and the event space is projected from an observer's vantage point onto the space in front of the body. The placement of the hands in space corresponds to referent locations from observer perspective. However, the handshape represents the manipulation of objects (and not the referent as a whole). In example 3, the signer uses handling classifiers (i.e. to depict holding the pans) located on the left and right side of sign space to depict the scene where the mouse and elephant flip the pancake back and forth between each other (appendix 2, still 2). Character perspective with handling classifier (aligned): In aligned character perspective signing, an event protagonist is mapped onto the head, torso, and hands of the signer, and the signer's movements can be attributed to the character whose role is assumed. The event space is life-sized and encompasses the signer as a character within the event. Spatial and activity predicates move or are located along the sagittal axis, as corresponds to an event space projection from a character's vantage point within the event.
10 The following abbreviations are used in the examples: RH: right hand; LH: left hand; CL: classifier predicate; LocL: entity located on the left of observer perspective sign space; LocR: entity located on the right of observer perspective sign space.
Example 3 (TİD)
GLOSS: mouse(RH:locR)-elephant(LH:locL) hold/flip-pan(LH+RH: handlingCL)

Example 4 (TİD)
GLOSS: mouse(signer)-hold/flip-pan(LH: handlingCL)
In example 4, the signer depicts the mouse flipping the pancake into the air (see appendix 2, still 2). The signer is in the role of the main animate protagonist (the mouse) and the signer's hand is in the form of a handling classifier, holding the pan. The signer moves her arm in a way that corresponds to the action in the event as the mouse performs it. The pan is held in front of the signer's body and the direction of the flipping movement (upward and oriented forward) directs the pancake along the sagittal axis. Character perspective with entity classifier (non-aligned): In this non-aligned type, the event space is life-sized and projected from the vantage point of an event protagonist. The location, orientation, or motion of referents is depicted in a character perspective event space. However, the character is not fully, but rather only partially mapped onto the signer. In this case, (at least) one of the signer's hands will not represent the hand of the character, but will instead represent another referent through the use of an entity classifier.
Example 5 (DGS)
(a) GLOSS: mouse(signer)hold-pan(RH: handlingCL)
(b) GLOSS: pancake(LH: entityCL)fall-on-floor-in-front-of-mouse(signer)
(It is also possible for both hands to represent other referents with entity classifiers, while the character remains mapped onto the signer's head and torso.) In example 5, the signer is depicting the mouse flipping the pancake, which then lands on the floor in front of it (see appendix 2, still 4). The image in example 5a shows an aligned character perspective representation with a handling classifier for holding the pan. In 5b, however, a non-aligned entity classifier (on the left hand) is used to represent the pancake at a location across from the signer's body (along the sagittal axis). The pancake's location is determined by an event space projection from the character's vantage point (i.e. as seen from the point of view of the mouse). Observer perspective fused with character perspective: Furthermore, in our data, we found a construction type that was characterized by what we call a fused representation that includes elements of both character and observer perspectives. This category of representations was found only in the Turkish Sign Language narratives. In the fusion, the character's head and torso are mapped onto the signer, yet the event space projection is reduced to the space in front of the signer's body and is from the vantage point of an external observer (corresponding to the signer's view of the stimulus events). The signer exhibits movements of the head and torso that are attributable to the character, but the representation of referent location and motion is within an event space projection as viewed from an observer perspective. Example 6 shows a use of the fused perspective construction by a Turkish signer. In this example, the signer is depicting the scene where the elephant enters the kitchen (appendix 2, still 5).
Example 6 (TİD)
(a) GLOSS: elephant(RH: entityCL)-walk-from-left
(b) GLOSS: mouse(signer)-RH: LOOK-AT elephant(LH: locL,entityCL)
In 6a, the signer uses an aligned observer perspective representation in an event space projected in front of the body to depict the elephant entering the scene (as determined by the viewer's external vantage point). The elephant, depicted by a two-legged entity classifier, enters from the left and traverses the sign space laterally (moving right). In 6b, however, observer and character perspectives are fused. The signer maps the head and torso of the mouse onto her body and uses a LOOK-AT predicate to depict the mouse seeing the elephant entering. However, the predicate and the signer's head and torso are not directed forward, as would correspond to the elephant's location in an event space projected from the vantage point of the mouse. Instead, they are directed to the left, that is, to the elephant's location viewed from an observer perspective. Thus, we see here an overlay of both character and observer perspectives. Lexical predicate only (no event space projection): Some signers described aspects of the stimulus films using lexical predicates executed in citation form in neutral space, without the use of any signing perspective. In these cases, the event representation was non-spatial because predicates were not associated with meaningful locations within an event space. In example 7, the signer uses a lexical predicate (PLAY) to refer to the mouse and the elephant playing ball (see appendix 2, still 3). There is no topographic mapping of locations and actions onto sign space. Character perspective with lexical predicate: In this construction type, signers identify the actions of characters through the use of directional lexical predicates that are executed in a character perspective event space projection.
Example 7 (TİD)
GLOSS: PLAY

Example 8 (DGS)
GLOSS: mouse(signer)-RH: GIVE-TO-elephant(opp. signer)
The handshape encodes the meaning of the predicate, but does not reflect the handling or the size and shape of an entity. In example 8, the signer's handshape is that of the lexical predicate (GIVE), and the hand moves along the sagittal axis to convey the transfer of the ball between the mouse and the elephant (see appendix 2, still 3). In the stimulus event, the mouse and the elephant are located across from each other, and thus the use of the sagittal axis indicates that the event space is projected from the vantage point of one of the characters, that of the mouse in this case.11 (Note that the ball was identified with a lexical noun prior to the use of this predicate in the narrative.)
11 The vantage point could also be the elephant's, but in this particular narrative, the mouse stays mapped to the location of the signer's body throughout.
Figure 5.2 The percentages of different event predicate types in the two sign languages
Finally, the DGS data sample used for this study included only one instance of a "double-perspective construction" which was characterized by the simultaneous occurrence of both types of predicates (classifier and lexical), on separate articulators, and both types of perspectives (observer and character) for event space projection (see Perniss 2007a for a detailed exposition of this example). Since we encountered this type of construction only once in our sample, we excluded it from the quantitative analysis of the constructions presented in the next section.
3.4 Analysis and results
In total, DGS signers used 408 and TİD signers used 204 spatial and activity predicates when uses in both film narrations were considered. The means per signer were 40.8 for DGS and 25.5 for TİD.

Event predicate types across languages
In the first analysis, we investigated whether signers preferred classifier predicates (handling vs. entity) or lexical predicates in representing the location, motion, and action of referents, and whether this varied across the two languages. For this, we calculated the percentages of the different predicate types over all the spatial and activity predicates used. Figure 5.2 shows that, regardless of the perspective choice, signers of both languages preferred to use classifier predicates over lexical ones, that is, predicates that contained more specific semantic information about the referents themselves. Furthermore, handling classifiers were observed more often than entity classifiers in both languages. This may be due to the fact that the events in the cartoons contained a lot of manual activity. However, handling and entity classifiers were not equally distributed across the two languages: Turkish signers used relatively more handling and fewer entity classifiers than the German signers. This shows that even though event types of the cartoon might drive the prominent use of handling classifiers, this preference can be mediated by the specific language used.
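The percentage calculation itself is simple; a minimal sketch (ours, assuming the hypothetical token representation introduced in section 3.3) might look as follows:

    # Shares of each predicate type over all spatial and activity predicates,
    # computed separately per language (cf. fig. 5.2).
    from collections import Counter

    def predicate_type_shares(tokens):
        counts = Counter(t.predicate_type for t in tokens)
        total = sum(counts.values())
        return {ptype: n / total for ptype, n in counts.items()}

    # Applied to the 408 DGS tokens and the 204 TID tokens, this yields the
    # distributions of "handling", "entity", and "lexical" shown in fig. 5.2.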
Figure 5.3 The percentages of perspective types across the two sign languages
Perspective types across languages
In the second analysis, we looked to see whether the two languages exhibited differences in the dominant choice of perspective to depict events. Figure 5.3 shows that signers of both languages used more character than observer perspective (in this analysis, the fused perspective use found in the Turkish Sign Language data contributed to the counts of both observer and character perspective, and was used 11% of the time by Turkish signers). However, we also see that German signers used slightly more character perspective than Turkish signers, while Turkish signers used more observer perspective than German signers.

Event predicate/perspective type alignments
In the final analysis, we directly investigated the preference for event predicate type given the choice of a certain perspective in the two languages, to see whether certain perspectives motivate the choice of certain event predicates (see fig. 5.4). First, we took into account only the classifier predicates. As fig. 5.4 shows, in most cases and in both languages, character perspective was used with handling classifiers, and observer perspective was used with entity classifiers. This pattern fits with the expected alignments we proposed in the introduction. However, the occurrence of non-aligned constructions shows that perspective does not totally predict the type of the classifier predicate. Furthermore, the preference for these alignments differed across the languages. In the aligned constructions, character perspective with handling classifiers was more frequently preferred by Turkish signers, while observer perspective with entity classifiers was more likely to be preferred by German signers. In the non-aligned constructions, Turkish signers preferred to use handling classifiers with observer perspective more than German signers, while the German signers used entity classifiers in character perspective more than Turkish signers.
Figure 5.4 The distribution of combinations of different event space projections (character, observer) with different types of classifier predicates (aligned, non-aligned) in the two sign languages
A separate analysis of the lexical predicates showed that lexical predicates, when used, were mostly used with character perspective in DGS (95%), and were rarely used with an event space projection by TİD signers (i.e. 75% of the lexical predicates were used in neutral space in TİD). Since the use of lexical predicates was quite small overall (<10%) for both sign languages, these percentages should be taken with caution.

Event type analysis
Since these results show that perspective choice does not determine the type of event predicate in a one-to-one way, we now ask what can motivate the choice of event predicate types and of the aligned vs. non-aligned constructions. In order to answer this question, we investigated whether different event types predict certain event predicate types or the event predicate/perspective combinations that we see in fig. 5.4. For the aligned constructions, we saw that signers used character perspective with handling classifiers mostly for depictions of transitive events that involved the manual activity of single animate characters, such as flipping a pancake, bouncing a ball, etc. For DGS, out of all predicates using character perspective
and a handling classifier, 88% fall into this category; for TİD, 94% do so. On the other hand, observer perspective with entity classifiers was used to represent intransitive motion events of single animate or inanimate entities (such as the mouse walking or the ball moving between the animate characters), and the location of one or two animate figures. (For DGS, out of all predicates using observer perspective and an entity classifier, 94% fit this definition; for TİD, 100% do so.) Thus, perspective choice and event type (transitivity) can explain the choice of event predicate in the aligned constructions (see section 2 for similar findings in other sign languages). Unlike aligned constructions, the non-aligned constructions were used when the action, motion, or location of more than one "directly" represented entity (two animates or one animate/one inanimate) needed to be expressed simultaneously and in relation to each other.12 In such constructions, observer perspective with handling classifiers occurred mainly when signers tried to represent the transitive actions of two animates at the same time. (As noted above, and shown in fig. 5.4 and example 3, this type of construction was predominantly preferred by Turkish signers.) Character perspective with entity classifiers was mostly used to represent the intransitive movement of an inanimate object towards or away from an animate figure (as, for example, to represent the pancake falling in front of the mouse in example 5b, or to represent the elephant approaching the mouse by moving an entity classifier toward the signer's body). Out of all predicates with character perspective and entity classifiers, 90% of DGS descriptions and 76% of TİD descriptions (excluding the fused perspective constructions) fit into this category. One could propose, then, that events that require the simultaneous representation of the location, motion, and/or action of two "directly" represented entities in relation to each other (two animates or one animate/one inanimate) motivate the use of non-aligned constructions. Furthermore, in the non-aligned constructions we still see that transitive events are represented by handling predicates while intransitive ones are represented by entity predicates, as found in the aligned constructions. Therefore, event type can predict the type of classifier predicate regardless of whether it is used in an aligned or non-aligned perspective construction type. Whether an aligned or non-aligned construction is going to be used depends on the number of directly represented entities to be depicted. However, the differential distribution of these alignments across the two languages shows
12 Note that aligned constructions also involve one animate and one inanimate entity where the latter is incorporated into the handling classifier, representing the object manipulated or acted upon. However, what is different about the non-aligned constructions is that the second entity is represented "directly" (as the whole entity itself), rather than "indirectly" (through a depiction of its manipulation by an agent). For example, in the non-aligned example in 5b both the mouse and the pancake (the former as an Agent and the latter as a Theme argument) are "directly" represented entities (Zwitserlood 2003).
that there is more than the event type or the number of entities to be depicted that motivates the uses, and that linguistic/discourse constraints specific to each language may also be at play.
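The overall pattern can be condensed into a deliberately over-simplified decision rule. The sketch below is our own paraphrase of the reported tendencies, not a deterministic grammar proposed by the analysis:

    # Tendencies reported above, stated as a rule of thumb: transitivity
    # predicts the classifier type, and the number of "directly" represented
    # entities predicts whether the aligned or non-aligned pairing is used.
    def predict_construction(transitive: bool, directly_represented: int):
        classifier = "handling" if transitive else "entity"
        if directly_represented <= 1:
            # aligned: transitive manual activity -> character perspective;
            # intransitive motion/location -> observer perspective
            perspective = "character" if transitive else "observer"
        else:
            # two entities depicted in relation to each other -> non-aligned
            perspective = "observer" if transitive else "character"
        return perspective, classifier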
4 Conclusion and discussion
In this chapter, we have aimed to show how events can be represented in sign languages, in particular, in Turkish and German Sign Languages. One of the unique aspects of event representations in signed languages (which makes them radically different from spoken languages) is the use of event predicates known as 'classifier' predicates. These predicates can depict information pertaining to the size and shape of referents, their location, orientation, and motion, as well as to the way they are manipulated or handled. Furthermore, in depicting an event, their use necessitates the choice of a perspective. Here, we have tried to give an account of whether the perspective choice and/or the type of event depicted can determine the type of event predicate chosen to represent the event. Overall, we found that perspective choice to a certain extent, and the semantics of the event type to a greater extent, predicts the event predicate choice. However, we also found that characteristics of the specific language used mediate these choices, probably due to the presence of different linguistic/discourse constraints in different sign languages. The exact nature of these linguistic/discourse constraints needs further research. We found that in both sign languages signers use semantically more specific predicates, i.e. classifier predicates, rather than lexical predicates to depict events in narratives (as shown in fig. 5.2). Furthermore, within the classifier predicates, the use of handling classifiers is more common than the use of entity classifiers. Finally, character perspective is used more frequently than observer perspective (as shown in fig. 5.3). These preferences may be due to the nature of the cartoons, which involve a lot of manual activity. This also fits with prior observations from Danish Sign Language narratives that signers choose to "depict" (i.e. enact) rather than "describe" the events, and prefer to do so from an "egocentric" perspective (Engberg-Pedersen 1993). Because depictions allow a more direct visual mapping of actions onto the body of the signer, this may have motivated the use of character perspective and handling classifiers. However, the crosslinguistic data also revealed some tendencies toward differences, first of all with regard to the use of handling vs. entity classifiers. Turkish signers showed a greater tendency to use handling classifiers than did the German signers (though without statistical evidence we cannot make a definitive claim about this). The tendency to use more handling than entity classifiers has also been documented in previous comparative research. For example, Aronoff et al. (2003) have shown that fewer types of entity classifiers are used in Israeli Sign Language than in American Sign Language. (Note that this
previous research is based on types rather than the frequency of use, as we have shown here.) This difference between ISL and ASL has been attributed to the difference in age between the languages, ASL being more grammaticalized and having more frozen classifier predicates due to being an older language. However, as we have outlined, there is no apparent difference in age between Turkish and German Sign Languages, nor in the sociolinguistic context of these sign languages. Furthermore, Nyst (2004, 2007) has shown that in Adamorobe Sign Language (Ghana) no entity classifiers are found, even though this is a rather old sign language (approximately 200 years old, i.e. roughly as old as ASL). At the current point in research, then, it is not clear what can motivate these differences across languages, and we think it is too soon to attribute these differences only or mainly to the age of the sign languages. It is, rather, possible that typological differences may exist between sign languages in this domain as they do in spoken languages. Typological differences may motivate and influence the use of different types of classifier predicates in addition to the influence of modality factors. Another tendency showing a crosslinguistic difference that we have found is in the choice of perspective. German signers used more character perspective than Turkish signers. This difference suggests that choosing character perspective in narratives is not necessarily the default or the most "depictive" way of representing events in sign languages. In our previous research, we have speculated that this difference between languages could be due to the availability of a role-shift-marking device in German Sign Language to indicate switches in subject reference that involves a shift in shoulder/head/torso orientation. In DGS, different animate referents can be associated with different orientations of the torso, such that signers can "shift" into the role of a particular referent simply by shifting their shoulders (similar-looking devices have also been reported for Danish Sign Language: Engberg-Pedersen 1995; and ASL: Lillo-Martin 1995; Lillo-Martin and Klima 1990; McClave 2001). In TİD, such a shoulder-shift device does not seem to be systematically used to indicate switches in subject reference. TİD signers seem to prefer different devices to mark reference switches, such as the repetition of noun phrases or changes in facial expressions (Perniss and Özyürek 2004).13 The existence of the shift in shoulder orientation within role-shift, as a linguistic/discourse device, then, might explain why German signers use character perspective more often than Turkish signers. Looking at the alignments between perspective type and the type of classifier predicate, we found that perspective choice can motivate classifier choice to
13 Note that DGS signers also used these different devices, but used lexical noun phrases to a much lesser extent than did the TİD signers, and that signers of both sign languages used combinations of multiple devices.
some degree but not in a determinate way. The most frequent constructions were the aligned ones. However, the existence of non-aligned constructions shows that perspective can determine predicate type only to a certain extent. Thus, perspective choice is independent of, and orthogonal to, the predicate type, even though some visual conceptual features of both could make them align in most cases (Perniss 2007a,b). Our further analysis showed that the transitivity of the event types could be a better predictor of predicate type. Our finding that intransitive events are more likely to be represented by entity classifiers, while transitive events are more likely to be represented with handling classifiers, has also been previously noted in the literature (e.g. McDonald 1982; Engberg-Pedersen 1993; Zwitserlood 2003). Here, however, we show that event type can furthermore even predict the use of aligned vs. non-aligned perspective-classifier type correspondences. While the aligned constructions are used when the location, motion, or action of one entity is depicted (possibly incorporating a manipulated object in the use of handling classifiers), the non-aligned ones are more likely to be used to depict the relationship between more than one directly represented entity (at least one animate). This preference also shows that classifier predicate use should not be analyzed on its own, without taking into account the perspective with which it occurs. We argue that predicate type-perspective type combinations are unified constructions, the uses of which are dependent on the event types to be depicted. These findings show, then, that visual perspective, together with the semantics of the stimulus events, determines the use of event space constructions in signed languages. However, differences in crosslinguistic distributions suggest that other typological factors could also be involved. For example, the existence of a shoulder-shift mechanism to indicate the referent role taken as a linguistic device in DGS, but not in TİD, could motivate some of these differences. More research is needed to determine whether these crosslinguistic differences are due to linguistic or discourse constraints (or perhaps even to conceptual constraints that might differ across signers of different languages). Finally, the fact that lexical predicates can be used without an event space projection, unlike classifier predicates, shows that the way lexical predicates are used to express events might be considered more akin to "describing" the event rather than to "depicting" it (Clark and Gerrig 1990).
present analysis shows that differences also exist, expanding our knowledge of the different ways the visual-spatial modality can be used for expression in the domains of event representation. Thus, our results suggest that although the visual-spatial modality might constrain and homogenize expressive possibilities in sign languages (e.g. Newport and Supalla 2000; Aronoff et al. 2005), the diversity of human conceptual, linguistic, and discursive structures may influence the impact of these constraints in different ways. The present study is limited by a small number of subjects and narratives, and further research is needed to determine the range of variation across sign languages in the expression of spatial events. However, the results presented here already indicate that event representations in sign languages can be as diverse as those in spoken languages, even though the parameters that drive the diversity are modality-specific, and differ between signed vs. spoken languages.

Appendix 1
[Figure 5.5 Schemas for different possible uses of predicate types and perspectives deployed in event space representations in signed narratives.14 The schemas cross event space projections (observer, character, fused, or none) with aligned and non-aligned uses of entity classifiers, handling classifiers, and lexical predicates.]
14 See Fridman-Mintz and Liddell (1998) for the use of similar symbolic depictions, where a wavy line area surrounding the signer indicates surrogate space and a semi-circle area in front of the signer indicates token space.
Appendix 2

Stills from stimulus clips that correspond to examples of signed narratives in the text:
• Still 1: Mouse enters scene bouncing a ball
• Still 2: Mouse and Elephant each hold pan and flip pancake back and forth between them
• Still 3: Mouse and Elephant throw ball to each other
• Still 4: Pancake falls in front of Mouse
• Still 5: Elephant enters kitchen and Mouse sees Elephant
6 Linguistic and non-linguistic categorization of complex motion events

Jeff Loucks and Eric Pederson
1 Introduction
Motion events play a central role in people's representation of the world. Not only is our perception of motion events crucial for safely navigating through the world and key to our survival; our conceptual understanding of motion events is necessary for interpreting other people's behavior, and for accurately communicating important aspects of an event to others. Despite the importance of event cognition in people's everyday functioning, most research on motion and event processing to date has been restricted to low-level phenomena associated with simple perception of motion, while studies investigating higher-level conceptual phenomena are few and far between. In recent years, however, there has been a growing body of interdisciplinary research on people's understanding of events (as we hope this volume has made apparent). Growing evidence indicates that we possess a very powerful system for processing events, especially for events involving acts of human motion (Baldwin 2005). Our conception of events appears crucial to our understanding of social behavior in two ways: the first, that we utilize our event processing system to understand people's intentions and goals, and the second, that we communicate the details of events to others using language. Given the importance of motion events in social functioning, it is no wonder that language systems around the world provide speakers with a rich set of lexical items with which to describe many of the details present in the complex stream of motion. Expressing these components clearly and effectively is crucial for recounting to another speaker how an event unfolded. For over two decades now, Talmy has surveyed many languages, investigating the strategies they use to express motion events, and has created a typology describing such strategies (Talmy 1985, revised in 2000b). Importantly, for the purposes of this chapter, Talmy's typology has outlined a number of ways that languages differ with respect to how motion events are most commonly expressed, both lexically and syntactically. This variation in lexicalization has, of course, been a source of great interest to researchers concerned with identifying relationships between language
Table 6.1 Typically cited semantic components of motion events

Element      Definition
Motion*      the fact of physical motion
Path*        the course followed by the figure with respect to the ground
Figure*      the entity in motion
Ground*      the location with respect to which the figure moves
Manner*      the way in which the figure moves
Cause*       the event that initiated motion
Origin       the origin of the path
Endpoint     the end of the path
Recipient    the animate entity at the end of the path who receives the moving entity
Agent        the animate entity which caused the motion

* Elements identified and discussed in Talmy (2000b)
and cognition. Could these linguistic differences somehow correlate with underlying non-linguistic cognitive differences? Taking a closer look at the characteristics of the stimulus, there can be no doubt that motion events are quite complex. Not only do events contain a number of aspects which can regularly vary from situation to situation, they are also by their very nature dynamic and ephemeral. It seems only natural to question the role of language in the processing of such a complex stimulus. This chapter summarizes the extant data that have been collected with respect to such relationships, including two experiments of our own. We argue that the methods behind most of these investigations have been largely inappropriate, and that methodological innovation is needed for this domain. We also argue that the typological distinctions which have informed these investigations may be, to a degree, unsuitable for examining relationships between language and cognition. Further work in motion event typology is needed before research into the relationship between the language of motion and thought can be fruitfully pursued. Finally, we offer directions for future research in this domain. To begin with, however, we will review the typology and related typological work that has provided the conceptual framework for both the previous studies and the current investigations.
Talmy’s motion event typology
Talmy (1985) identified the semantic components of motion events that languages make explicit. A partial list of these components and their definitions can be found in table 6.1. From this list, a few components appear to be universal crosslinguistically, in that all languages appear to provide speakers with a relatively simple way to express them. Specifically, these components are
motion (the presence or absence of movement), figure (the entity in motion), ground (the location with respect to which the figure moves), and path (the course followed by the figure with respect to the ground). These components seem intuitively essential in describing any act of motion. Tautologically, the most primary component of a motion event is motion, and as such this component is always expressed.1 Although it may occasionally be permissible to omit figure, ground, or path in motion event descriptions, such components are nonetheless ubiquitous in event descriptions across a wide variety of contexts. Languages vary with respect to how components are expressed in the clause, in particular, whether they are represented grammatically or lexically. In his typology, Talmy analyzed whether components are conflated in the main verb, or instead expressed elsewhere, in prepositions or what he refers to as 'satellites.'2 A satellite is "the grammatical category of any constituent other than a noun phrase or prepositional phrase complement that is in a sister relation to the verb root" (Talmy 2000b: 102). As such, the term includes a relatively diverse set of syntactic types, including verb particles (out, up, etc.), gerunds, and adverbials. Since verbs of motion necessarily express the primary semantic component of motion, conflation is used to describe situations in which the motion verb contains an additional motion component (e.g., manner, path). For example, the English verb roll conflates motion with manner, in that it expresses the fact of motion, and also expresses the particular way in which the figure is moving. When additional components are expressed and cannot be conflated in the main verb, they are thus expressed in prepositions or satellites. For example, English has a very large set of prepositions that express path (e.g., in, out, around, through). Variability in how and where components are expressed or conflated is not exclusively between languages; even within a language there exist multiple expression patterns. However, there are many languages which have a particular pattern that is predominant within that language. Across the world's languages, Talmy identified two main syntactic patterns for the expression of path which are quite common: verb-framing, in which path is conflated in the verb, and satellite-framing, in which path is expressed in a satellite or preposition. The satellite-framing pattern is typical of English, and also many other northern European languages (e.g., German, Swedish, Russian, etc.). The verb-framing pattern is typical of Romance languages (e.g., Spanish, French, Italian, etc.). These patterns are also found in languages outside of Europe.
1 Note however, some research has suggested that change of location verbs need not necessarily involve motion of the figure, though these might conventionally be understood as essentially equivalent to motion verbs which do involve motion of the figure. See Kita (1999), and Levinson and Wilkins (2006b).
2 Understandably then, many lump prepositions together with satellites for the purposes of applying Talmy's motion typology.
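To make the two framing patterns concrete, the following minimal Python sketch renders Talmy's classic "bottle" example in both patterns. The function names and clause templates are our own illustrative inventions, not a linguistic analyzer, and real lexicalization is far richer than these templates suggest.

    # Toy illustration of Talmy's two framing patterns. The templates
    # only make the locus of manner and path visible; they are hypothetical.

    def satellite_framed(figure: str, manner_verb: str, path_satellite: str) -> str:
        """English-style pattern: manner conflated in the main verb,
        path expressed in a satellite/prepositional phrase."""
        return f"{figure} {manner_verb} {path_satellite}"

    def verb_framed(figure: str, path_verb: str, ground: str, manner: str = "") -> str:
        """Spanish-style pattern: path conflated in the main verb,
        manner added as an optional adjunct -- and often simply omitted."""
        clause = f"{figure} {path_verb} {ground}"
        return f"{clause} {manner}" if manner else clause

    print(satellite_framed("the bottle", "floated", "out of the cave"))
    # -> the bottle floated out of the cave
    print(verb_framed("la botella", "salió", "de la cueva", "flotando"))
    # -> la botella salió de la cueva flotando ('the bottle exited the cave floating')
    print(verb_framed("la botella", "salió", "de la cueva"))
    # -> manner dropped, as verb-framed descriptions frequently allow

The optionality of the manner adjunct in the verb-framed template anticipates Slobin's point about manner salience, discussed in the next section.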
These dominant syntactic patterns often correlate with dominant verb-conflation patterns as well. Because path is expressed in satellite form for satellite-framing languages, manner is often conflated in the main verb of the clause. This conflation pattern is characteristic of English. In contrast, many verb-framed languages more commonly express manner in satellite form, since path is already conflated in the main verb. Again, it should be noted that languages are not restricted to using only one of these patterns, and most languages employ a number of different patterns. Typing a language as a verb- or satellite-framing language has more to do with the relative frequency with which a pattern is used in relation to others.
2.1 Manner salience
Slobin (1996a, 1996b, 2003, 2006) has investigated two additional typological differences that are directly related to Talmy's original typology. The first of these is the overall mention of manner within and across languages, and the second is the available set of manner expressions a given language provides to its speakers for describing an event. Both fall under the study of what Slobin calls 'manner salience.' The overall frequency of manner encoding is a function of the default syntactic structure of the clause. One consequence of expressing path in the main verb is that manner must be expressed adverbially, and thus the encoding of manner is less obligatory. Since adverbials are not part of the foundation of the verb phrase, they can be added or dropped at the discretion of the speaker. As stated in our introduction, path appears to be a more central component of any motion event, and thus removing it from the clause will have larger semantic implications than removing manner. Accordingly, Slobin speaks only of manner salience, and not path salience. With the satellite-conflation pattern, one cannot drop the particle expressing the path of a motion event without a large change in meaning. For example, the sentence Brandon ran into the room describes a very particular event, whereas the sentence Brandon ran describes a general activity of running, and could be used to describe the event of Brandon running into the room or a great range of other running events. It is pragmatically obligatory to include a path particle in order to accurately describe a great majority of motion events in English. Expressing manner with the verb-framing pattern is not obligatory in the same fashion: in many communicative settings, one could say either Brandon entered the room or Brandon entered the room running to describe the same motion event. The use of the manner satellite is a decision on the part of the speaker to provide that additional information. Thus, unless a speaker of a verb-framing language wishes to inform the listener about the manner used, they may simply leave it out.
According to Slobin, omission of manner appears to be frequent in verb-framing languages. This may be especially true for cumulative or boundary-crossing events, where the activity results in a salient change of location or a path across a salient boundary (Aske 1989). Slobin has observed this difference in absolute manner mention in surveys of English-to-Spanish and Spanish-to-English translations of novels, and also from verbal descriptions of a scene from Mayer's (1969) Frog, where are you? across satellite-framing (Dutch, German, English, Russian, Mandarin, Thai, and Tsou) and verb-framing (Spanish, French, Italian, Turkish, and Hebrew) languages (Slobin 2006). Naigles et al. (1998) provide a careful investigation of manner salience in English and Spanish motion event descriptions. Regardless of the forms used, English and Spanish differed in how often manner was expressed, though the difference appeared to be smaller than what Slobin's observations might predict. They elicited verbal descriptions of both static motion event pictures and dynamic videos of events. Across both types of stimuli, English participants' absolute mention of manner ranged between 88–90%, while the Spanish participants' range was only 70–80%. Unfortunately, they did not report means for this statistic across the two stimulus types, nor did they explore whether this difference between groups was significant.
2.2 Cognitive relevance of the typology
Talmy’s typology and Slobin’s research on manner salience have been a recent source of interest to researchers investigating interactions between language and thought. It seems only natural to ask whether such differences across languages in motion event lexicalization might relate to how language users process motion events non-linguistically. A number of different but related hypotheses have fallen out from these linguistic surveys, and each of them has been tested in previous research. We describe each of these here before moving on to the studies proper. The first hypothesis about how language and cognition might interact in this domain was put forth by Talmy (1985, 2000b). He suggested that whatever information is conflated with motion in the main verb is “backgrounded” in attention, while information expressed in satellite is brought to be “foregrounded” in attention (Talmy 2000b: 128). According to this position, the English sentence I went to the store by foot brings attention to the manner of walking, whereas I walked to the store reduces the salience of the manner information. According to this suggestion, the hypothesis would be that speakers of languages which habitually express manner information in the main verb should find path information more salient in non-linguistic cognitive tasks, and vice versa for speakers of languages which express path habitually in the verb. We refer to this hypothesis as the ‘Background’ hypothesis.
An opposing view is the idea that information conflated in the main verb should be more salient than information expressed in satellite form. This idea was suggested in two of the studies we review in the next section (Finkbeiner et al. 2002; Papafragou, Massey, and Gleitman 2002). Since English more commonly conflates manner in the verb, English speakers should pay more attention to manner in non-linguistic tasks than Spanish speakers, who should pay more attention to path information, as Spanish expresses path in the main verb. We call this hypothesis the 'Verb' hypothesis. Note that although both of the above suggestions might fit well with one's global intuitions about language, they are in fact in direct contradiction to one another. This is problematic. We worry that hypotheses about linguistic relativity that do not rely on whether a notion is expressed, but rather only on which grammatical category expresses a notion, will not lead to sensible predictions (more on this in a later section). Finally, Slobin has argued that the frequent omission of manner in verb-framing languages reduces the cognitive salience of manner in mental operations, at least in "thinking for speaking" contexts. Expressing a semantic component as a gerund – or any other subordinate form – involves more cognitive processing, and thus this component is less readily expressed in that language (Slobin 1996b). Thus, attention to manner is reduced in both the speaker's mind as well as the listener's, since it is low in absolute mention during discourse. Unlike the previous two hypotheses, this hypothesis relates only to processing of manner information, and says nothing about path. Extending beyond Slobin, one hypothesis along these lines would be that speakers of a language with low frequency of expression of manner will find manner less relevant in conceptualizing events (even outside of the context of language use) than speakers of languages with an overall higher frequency of manner expression. Accordingly, we refer to this hypothesis as the 'Manner' hypothesis. Studies in this domain so far have tested each of these hypotheses by contrasting verb-framing languages with satellite-framing languages and assessing differences on non-linguistic measures. We will now review these studies, including two of our own.
3 Studies of linguistic relativity for motion events
There have been a number of recent studies examining interactions between language and thought for motion events; but before discussing the details of each previous study, it will be helpful to describe some of the general methods that have been utilized by all of the researchers. Each of these studies used one or both of two non-linguistic cognitive tasks: a triads categorization task and a recognition-memory task. In the triads task, participants are asked to sort motion
events into categories. They are shown a target motion event (either a picture or a video), followed by two variants of that target event: a manner variant, in which the manner of motion has been altered from the original target and the path is maintained, and a path variant, in which the path has been altered and the manner maintained. After viewing the target and its variants, participants are asked to decide whether they believe the manner variant or the path variant is more similar to the original video. Although this is a complex reflective process, it is assumed that part of participants' decision-making will be influenced by which component is made more salient in their native language. For the recognition-memory task, participants are shown a series of motion event stimuli which involve clear manners and paths, and are later asked to judge whether or not a particular stimulus is old (familiar) or new (novel). Similar to the triads task, new stimuli in the recognition-memory task are stimuli which retain the original path but depict a new manner, or retain the original manner but depict a new path. This process is somewhat less explicitly reflective, as participants are not made consciously aware of the contrast between manner and path. Again, it is expected that any systematic tendency for language groups to differ is due to habitual language use.
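The logic of these two tasks can be summarized in a short sketch. The data structures and names below are our own hypothetical illustration of how responses might be coded and scored; none of the reviewed studies published analysis code.

    # Hypothetical coding and scoring for the two tasks described above.
    from dataclasses import dataclass

    @dataclass
    class TriadResponse:
        item: str        # stimulus set identifier
        grouped_by: str  # component the chosen variant shared with the
                         # target: "manner" or "path"

    def manner_preference(responses: list) -> float:
        """Proportion of manner-based groupings; 0.5 is chance in this
        two-alternative design."""
        manner = sum(1 for r in responses if r.grouped_by == "manner")
        return manner / len(responses)

    def false_alarm_rates(judgments: list) -> dict:
        """judgments pairs each new stimulus's changed component
        ("manner" or "path") with whether it was wrongly judged old."""
        rates = {}
        for component in ("manner", "path"):
            trials = [said_old for comp, said_old in judgments if comp == component]
            rates[component] = sum(trials) / len(trials)
        return rates

    responses = [TriadResponse("set1", "manner"), TriadResponse("set2", "path"),
                 TriadResponse("set3", "manner"), TriadResponse("set4", "manner")]
    print(manner_preference(responses))  # 0.75

    judgments = [("manner", True), ("manner", False), ("path", True), ("path", True)]
    print(false_alarm_rates(judgments))  # {'manner': 0.5, 'path': 1.0}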
3.1 Previous studies
The first study to examine the effects of typological patterns on motion event processing was Papafragou, Massey, and Gleitman (2002). In developing predictions, the authors suggested both the Background hypothesis and the Verb hypothesis as viable candidates, but argued that in either case finding any systematic differences across the two languages would suggest an interaction between language and thought. English (satellite-framing) and Greek (verb-framing) speakers' non-linguistic cognitive processing of events was tested using both a recognition-memory task and a triads-categorization task. In the memory task, participants verbally described static motion event pictures from Mayer's (1969) Frog, where are you? picture book, and were tested for their memory of the original pictures the next day. In the triads task, "dynamic" motion-event pictures were used as stimuli: each event unfolded across a series of three still images. Participants verbally described each of the dynamic motion event pictures following the triads task. For both tasks, the results revealed no relationship between language typology and performance. Although English speakers and Greek speakers described the motion events according to hypothesized typological patterns, they performed identically on the recognition task. As well, neither group showed a preference for manner or path on the triads task. Both found manner just as salient as path for the purposes of categorization. Thus, neither the Background nor the Verb hypothesis received support. The authors do not report the overall
expression of manner in their descriptions, so there is no way to evaluate the Manner hypothesis in this instance. Although this study was well executed and controlled, the use of pictorial stimuli makes the interpretation of the results unclear. Processing of pictorial stimuli may rely on different cognitive resources than typical motion event processing (i.e., of dynamic stimuli). Perhaps more importantly, in their conclusion the authors try to argue from a null result to claim that there is absolutely no effect of language on cognition. There are, of course, two flaws in this reasoning. First, it is impossible to argue from a single null result: there are any number of reasons why they could have failed to detect a difference between the groups which are orthogonal to the effect of habitual language use – especially given our concerns about the stimuli. Second, they did not test the effects of language on cognition generally: they tested the effects of language on cognition purportedly for motion event processing. There are numerous linguistic relativity studies arguing for relevant cognitive effects of habitual language use in a number of domains, so it is absurd to generalize from an apparent lack of effect in a single domain to the conclusion that language cannot have any nontrivial effect on non-linguistic cognition. See, e.g., Pederson et al. (1998) for static spatial arrays, Boroditsky (2001) for temporal reasoning, and the Hardin and Maffi (1997) collection for color naming and perception. Finkbeiner et al. (2002) also investigated linguistic relativity in motion event processing, with English, Spanish, and Japanese participants. Like Spanish, Japanese has been typed as a verb-framing language (Hayashi 2003). The authors based their predictions on the Verb hypothesis, working from the results of Naigles and Terrazas (1998), who showed that main verb conflation patterns predicted what meaning speakers of English and Spanish extend to novel verbs. The stimuli to categorize in this case were computer animations of a moving ball. The authors wanted to use animated displays to ensure that the manners were all novel and thus not lexicalized in any of the three languages (the ball rotated on two different axes which were combined in various ways to create novel manners). However, animated displays of otherwise inanimate objects have an unclear relationship with the kinds of real-world motion events people typically encounter, and thus the results may again be difficult to reconcile with how people typically process more ecologically relevant motion events (e.g., human motion). Similar to Papafragou et al. (2002), Finkbeiner et al. (2002) also used a triads task, in which one target animation was compared to manner and path variants (sequential and simultaneous presentation of the animations was manipulated across two experiments). In contrast to Papafragou et al. (2002), the results revealed a relationship between linguistic typology and categorization: English participants categorized based on manner significantly more often than both Spanish and Japanese speakers. Further, Japanese speakers chose the
manner variant significantly more than chance, while Spanish speakers were at chance for selecting manner or path. This latter result is difficult to explain with typological patterns alone (since both languages have been typed as path-conflating), and may have been the result of limited statistical power. However, the results of this study are quite limited, for two reasons. First, linguistic descriptions of the animations were not collected from the participants or from an additional group of native speakers. This limits the conclusions the authors can make; even if these languages had been typed previously, if the groups cannot be distinguished typologically in this context, then any differential results may be due to reasons other than language. Second, and perhaps more importantly, the nature of the stimuli used severely limits the interpretability of the results, as manner was confounded with novelty of motion (i.e., the complex rotary motion of the ball). Although the authors wanted to balance the lexicalization of manner across the language groups, the impetus for doing so is opaque given the fact that novel paths were not generated for any of the animations. In any case, the results from English speakers could have been due to a manner bias or a novelty bias. At best, the results demonstrate – for some unspecified reason not predicted by the typology – that English speakers will sort according to manner more often than Spanish and Japanese speakers when the contrast is between two novel manners of motion and two familiar paths of motion. The interpretation of the results provided by Finkbeiner et al. is thus highly questionable, given this inherent problem. Using native English and Spanish speakers, Gennari et al. (2002) were the first to examine categorization and memory for motion events using what we feel is the most ecologically relevant type of stimuli: dynamic events of human motion. Gennari et al. based their predictions on the Manner hypothesis, and thus expected differences in the relative salience of manner information between English and Spanish speakers in their tasks. Accordingly, the overall expression of manner was collected from participants' linguistic descriptions of the events. Participants engaged in a triads task followed by a surprise recognition-memory task. As the researchers were also interested in the degree to which the immediacy of language would influence people's performance on the task, there were three separate language conditions for the triads task. In the Naming First condition, participants provided written descriptions of the target video before viewing the two variants. In the Free Encoding condition, participants simply watched the videos without describing them. Finally, in the Shadowing condition, participants shadowed an audio passage of nonsense speech as they viewed the videos. For the recognition task, there were no significant differences between the two groups under all three conditions. Both language groups were just as likely
to give false positives for manner variants as they were for path variants. On the triads task, the groups were only significantly different from each other in the Naming First condition. When asked first to describe the videos, English speakers were more likely to later group the videos using a manner categorization strategy, and vice versa for Spanish speakers. In the other conditions, both groups appeared to have no preference for one component over the other. The authors conclude that immediate language use is the only driving factor in speakers' categorization of motion events. Unfortunately, in this study the use of language in the Naming First condition was confounded with typologically biased instructions. Specifically, participants in the Naming First condition were instructed to describe the videos with respect to the typological patterns consistent with their language. In four demonstration trials, English participants were given written example descriptions which only included manner verbs and path satellites, and Spanish speakers were given example descriptions which only included path verbs and optional manner satellites, e.g., (la persona) cruza (gateando) por delante de la mesa, '(the person) crosses (crawling) in front of the table'. Not surprisingly, they found that English participants mentioned manner 86.16% of the time in their descriptions of video stimuli, while Spanish speakers mentioned manner 71.33% of the time. Thus, although this difference in manner expression appears to corroborate Slobin (2006) and Naigles et al. (1998), it may have been artificially created in the lab. Any slight bias could have a significant effect on people's category judgments. However, the goal is to discover whether such slight biases come from habitual language use, not to create such biases artificially. The results of this study are thus difficult to interpret, given this confound. Finally, in the most comprehensive study to date, Bohnemeyer, Eisenbeiß, and Narasimhan (2006) collected categorization data using a triads task from speakers of seventeen genetically and typologically diverse languages. Stimuli in this case were two-dimensional animations of a non-human agent (a tomato). Much like Finkbeiner et al. (2002), the use of animated stimuli lowers the relevance of the results to everyday motion event processing. The authors base their predictions on the Manner hypothesis. No correlation was found between typology and categorization. Across the seventeen languages, there was not a significant group difference between those typed as verb-framing languages and those typed as satellite-framing languages overall. Statistically, language typing was not predictive of whether speakers of a language would sort according to manner or path. The authors note that when looked at individually, particular verb-framing and satellite-framing language groups did differ significantly from one another (e.g., Polish and Tamil), highlighting the fact that if they had only selected these two languages to investigate linguistic relativity in this domain, they would have found a
significant effect. This is an important lesson for the replicability of these effects across different studies (more on this in the next section).

Summary of previous studies
To sum up so far, we have a confusing package. Studies investigating potential interactions between language and thought up to this point have been fraught with design problems, interpretations that go beyond the data, and inconsistent results. For instance, just looking at English, we see markedly different categorization preferences across these three studies: in Finkbeiner et al. (2002) English speakers categorized with respect to manner on 88% of trials, while in Papafragou et al. (2002) and in two conditions of Gennari et al. (2002) English speakers categorized with respect to manner at chance levels. A great majority of these studies also employed stimuli which are not representative of the kinds of motion events people devote much of their cognitive resources to each day, that is, human motion events. Because previous investigations had yielded inconclusive findings, and because of our interest in the processing of human motion events, we conducted two of our own experiments to test for interactions between language and thought in this domain. We will summarize each of these in turn.
3.2 The current investigations
Our first experiment involved twenty-one native English speakers and thirteen native Spanish speakers. Because all participants were residents of the United States, almost all of the Spanish participants knew varying degrees of English. However, none were exposed to English while growing up at home, and no participant had achieved native-like fluency in English. Following previous studies, we also had participants engage in a triads categorization task identical in basic structure to the tasks used in prior research. Like Gennari et al. (2002), we chose to use videos of dynamic human action. We specifically chose manners and paths which were equally lexicalized across Spanish and English (whether using a verb or a satellite). We also chose paths that would be considered, for the most part, cumulative or boundary-crossing, in order to elicit linguistic descriptions which would maximize the typological differences between the groups. As can be seen from table 6.2, each of these videos involved simple actions with a clear path and a clear manner. We also introduced foil trials into the design (not represented in table 6.2). We were concerned that in much of previous research utilizing the triads task, the research question has been transparent to the participants (see also Bohnemeyer et al. 2006). Participants are readily aware that there are only two possible solutions, and this may severely undermine the sensitivity of the instrument.
Table 6.2 Semantic component types used in the video stimuli for Experiment 1

Set    Figures         Manners                 Paths            Ground
1      male / female   hopping / spinning      up / down        ramp
2      male / female   walking / marching      across / past    blanket
3      male / female   skipping / jogging      in / out         room
4      male / female   tiptoeing / shuffling   toward / away    vending machine
5      male / female   crawling / rolling      under / over     table
Thus, we designed the stimuli such that the figure – the entity in motion – could also vary in the experiment (i.e., both a male and a female actor performed all actions). This allowed us to introduce two different trial types in addition to our manner vs. path trials of interest: (1) foil trials, in which one variant involved the same figure but a new manner and path and the other variant involved a new figure but the identical manner and path (and thus there was a correct answer: the variant with the identical manner and path); and (2) figure test trials, in which both the manner and path variants were performed by a different actor from the target. Figure test trials were created so that a change in the figure from target to the variants was not a cue to the type of trial, and still allowed us to collect data on the manner vs. path preference. One-third of trials were foil trials, while two-thirds contrasted manner and path. In designing our experiment this way, we hoped that participants would not be able to "guess" the research question, and would instead be more likely to reveal any bias generated by habitual language use. Unlike Gennari et al. (2002), we did not ask the participants to describe the videos as they watched them: a separate group of seven native Spanish and five native English speakers gave independent, written descriptions of each stimulus video. These participants were asked to describe each video to the extent that someone could identify the video from among the others based on their description alone. For both the experiment and the language elicitation task, all participants were instructed in their native language.
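As a concrete illustration of the trial structure just described, the sketch below assembles a hypothetical trial list with the one-third/two-thirds split. The names are our own, and the actual presentation software is not described in this chapter.

    import random

    # Hypothetical assembly of an Experiment 1 trial list: one-third foil
    # trials (which have a correct answer) and two-thirds manner-vs.-path
    # trials; on figure test trials both variants use the other actor.

    def make_trial(kind: str, set_id: int) -> dict:
        if kind == "foil":
            variants = [
                {"figure": "new", "manner": "same", "path": "same"},  # correct choice
                {"figure": "same", "manner": "new", "path": "new"},
            ]
        else:  # manner vs. path; the figure may or may not change (figure test trials)
            variant_figure = random.choice(["same", "new"])
            variants = [
                {"figure": variant_figure, "manner": "same", "path": "new"},  # manner kept
                {"figure": variant_figure, "manner": "new", "path": "same"},  # path kept
            ]
        random.shuffle(variants)  # presentation order carries no information
        return {"set": set_id, "kind": kind, "variants": variants}

    kinds = ["foil"] * 4 + ["manner_vs_path"] * 8   # 1/3 foils, 2/3 contrasts
    random.shuffle(kinds)
    trial_list = [make_trial(kind, set_id) for set_id, kind in enumerate(kinds)]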
This initial triads experiment did not reveal significant group differences. As shown in fig. 6.1, both Spanish and English speakers categorized the motion events by manner significantly more often than chance, and were not statistically different from one another.

[Figure 6.1 Average proportion manner choices by language group in Experiment 1]

This manner preference was consistent across items, as it held up when we looked at participants' performance within individual events. Surprisingly, however, our groups also did not differ in their linguistic descriptions of the stimuli. Overall, English speakers patterned identically to Talmy's characterization as a satellite-framing language.
However, contrary to our expectations, when describing these events Spanish participants overwhelmingly used manner verbs as the main or only verbs in their responses:

(1) Ella se gatea debajo de la mesa
    'She crawls under the table'

(2) Caminando pasos cortos
    'Walking half-steps'

(3) Ella está girando hacia arriba
    'She is turning upwards'
Spanish participants almost never used path verbs in their descriptions, and used just as many manner verbs as their English-speaking counterparts. Interestingly, they expressed path information much less than they expressed manner information. While English speakers mentioned manner 100% of the time and path 99% of the time, Spanish speakers mentioned manner 96% of the time, but path only 57% of the time. This result is especially surprising given that our paths involved boundary-crossing and path culmination. Something about our stimuli made the manner of the videos very salient to our participants. This experiment underscores the importance of collecting linguistic descriptions for the stimuli to be used in any non-linguistic experiment. Although there
may be an existing typological characterization for a language, the typology may not necessarily correlate with elicitations obtained with the stimuli to be used in the experiment. Speakers of that language may not necessarily structure their speech according to previous characterizations, as we have demonstrated here with Spanish. Importantly, we would like to note that there are plenty of previous characterizations of Spanish as a verb-framed language (Aske 1989; Bohnemeyer et al. 2006; Naigles et al. 1998; Slobin 1996b), and that we are not trying to argue that these previous typological characterizations were incorrect. However, the fact that our elicitations differ so greatly from previous characterizations suggests that it is not an absolute characterization. Moreover, since Finkbeiner et al. (2002) did not collect linguistic descriptions, and Gennari et al. (2002) biased participants to describe the events according to prior characterizations, our data suggest that previous non-linguistic differences identified with Spanish speakers may have little to do with how Spanish speakers would actually linguistically categorize the events used in these experiments. At least for human motion events, Spanish speakers are not compelled to conflate path in the verb or even mention it at all. On the one hand, our initial experiment contradicted the Background hypothesis and supported the Verb hypothesis: English and Spanish speakers both demonstrated a preference for using manner as a basis for categorization, which is in line with the fact that both language groups predominantly used manner verbs as either the main or the sole verbs of their descriptions. It would, of course, be irresponsible to draw inferences based on a single null result. On the other hand, our results did not support a "path" hypothesis (similar to the Manner hypothesis). Because English speakers mentioned path more often in their descriptions, we might have expected them to base their categorization preferences significantly more on path than manner compared to the Spanish speakers. However, this was not the case. In either case, the results of this study have extremely low interpretability. Thus, we carried out an additional experiment, with the aim of comparing two language groups which actually contrast in lexicalization patterns for the stimuli used, and also of expanding and improving the non-linguistic tasks used. To this end, we designed and conducted a second study. Experiment 2 compared fourteen native English speakers with eleven native Japanese speakers. As was mentioned previously, Japanese, like Spanish, has been typed as a verb-framing language. In contrast to Experiment 1, we obtained on-line verbal descriptions of our stimuli instead of written ones, from a separate group of five native English and seven native Japanese speakers. We again had participants carry out a triads categorization task, with stimuli similar to those used in Experiment 1 (see table 6.3). Again, all manners and paths were equally lexicalized in English and Japanese, and paths were generally cumulative or involved boundary-crossing.
Table 6.3 Semantic component types used in the video stimuli for Experiment 2

Set    Manners          Paths              Grounds
1      dance / march    across / past      towel / newspaper
2      jump / crawl     on / off           pillow 1 / pillow 2
3      dance / walk     across / around    blanket 1 / blanket 2
4      spin / hop       toward / away      tripod / lamp
5      spin / walk      up / down          ramp 1 / ramp 2
6      slide / run      in / out           door 1 / door 2
7      crawl / roll     over / under       table 1 / table 2
8      hop / tiptoe     along / beside     table 1 / table 2
9      tiptoe / march   around / through   arch 1 / arch 2
10     skip / run       along / across     basketball / soccer ball
In order to better mask the contrast of interest to participants, for this experiment we changed the nature of the foil trials. We now designed the videos such that there were additional variants in which the ground – the location with respect to which the figure moves – also varied (e.g., a different ramp to walk up, a different pole to march past, etc.). Within each video scenario, every possible manner, path, and ground combination was filmed, so that we could create trials in which ground changes were pitted against manner and path changes. Thus, one-third of trials contrasted ground with manner, one-third contrasted ground with path, and one-third contrasted manner and path – our only contrast of interest. In this way, no particular contrast type stood out from the others to participants. As with Experiment 1, all participants were instructed in their native language. As well as changing the structure of the triads task, we also introduced a recognition-memory component into the experimental design. The triads task was now split into four blocks. In between each block of the triads task, participants engaged in the recognition-memory task. Specifically, participants were asked on each trial to identify whether a presented video was a video they had seen as the target video in the previous triads block. Half of the events in the memory blocks were previously viewed target stimuli, and half were novel manner, path, or ground variants which were not seen as target stimuli during the previous triads block (again, using ground variants to disguise our contrast of interest). Analysis of the verbal descriptions was much more promising this time around. English again typed as a satellite-framing language: almost all descriptions used manner verbs and expressed path in a satellite or prepositional phrase (see below). In contrast with Spanish in Experiment 1, Japanese easily typed as a verb-framing language in this context. Since Japanese is a verb-final
language, the last verb of an utterance is taken as the main verb of the clause, and in our descriptions participants almost always used a path-conflating verb as the final verb. Our Japanese participants also typically used a manner verb in the clause in a 'converb' form comparable to an English gerund, using the -te conjunctive marker:

(4) hashi-tte onnanohito-ga heya-kara de-te kuru
    run-CONJ woman-NOM room-from leave-CONJ come-PRES
    'The woman comes out of the room running'

(5) hito-ga arui-te mon-no yoko-o tooru
    person-NOM walk-CONJ gate-GEN side-ACC go.by-PRES
    'The person goes by the gate walking'
We also examined the descriptions for absolute mention of manner. The groups did differ slightly: English speakers mentioned manner 100% of the time, while Japanese speakers mentioned manner 93% of the time. This is a small difference, but a difference nonetheless. Thus, it appeared that we had selected more appropriately contrasting languages for Experiment 2 for evaluating the Background, Verb, and Manner hypotheses.
[Figure 6.2 Average proportion manner choices by language group in Experiment 2]

[Figure 6.3 Average proportion of manner and path false alarms by language group]
Although our linguistic data quite nicely contrasted Japanese and English according to Talmy’s typology, we failed to detect any significant categorization difference or memory difference between the two groups. Both groups showed no preference for sorting according to manner or path: they were not statistically different from chance. This lack of a preference for our English speakers supports the idea that our foil trials may have influenced how people approached the task – recall that in Experiment 1, English speakers categorized predominantly with respect to manner. This variability in manner bias for English participants – seen also among previous studies – suggests a high level of task sensitivity. Participants appear to respond quite differently depending on seemingly minor changes in design, which raises concerns about whether these studies are consistently tapping into an inherent cognitive bias for these populations (more on this in the next section). Experiment 2 also had a secondary measure of recognition memory for presented motion event stimuli. Consonant with the triads categorization task, both Japanese and English groups performed identically on the memory task. As shown in fig. 6.3, there were no significant differences in memory performance in terms of false recognition of manner or path components for either group.
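To spell out the chance criterion used in statements like these, the sketch below applies an exact binomial test to manner-choice counts. This is our illustration of the kind of test involved, using hypothetical counts; it is not the analysis actually reported for these experiments.

    from scipy.stats import binomtest

    # Illustrative test of a group's manner-based choices against chance
    # (p = .5 in a two-alternative triads design). Counts are hypothetical.
    def against_chance(manner_choices: int, n_trials: int, alpha: float = 0.05) -> str:
        result = binomtest(manner_choices, n_trials, p=0.5)
        verdict = "differs from" if result.pvalue < alpha else "indistinguishable from"
        return (f"{manner_choices}/{n_trials} manner choices: "
                f"p = {result.pvalue:.2g} ({verdict} chance)")

    print(against_chance(52, 100))   # a group at chance, as in Experiment 2
    print(against_chance(75, 100))   # a clear manner bias, as in Experiment 1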
Overall, this experiment demonstrated no support for any of the Background, Verb, or Manner hypotheses.

Summary
Although it is impossible to argue from a single null result, given the weight of the evidence across the six studies reviewed here there does not appear to be any strong indication that the typical syntactic structure of motion event descriptions in one's native language has any influence on one's non-linguistic conception of such events. The studies reviewed above differ in the cognitive differences that were obtained between speakers, and some of those that demonstrated cognitive differences were likely subject to experimental confounds. Experiment 2 – which is, in our opinion, the most carefully controlled experiment yet – still failed to find any differences between speakers on two different measures of cognitive performance. It is, of course, impossible to prove a negative. There may yet be influences of motion event lexicalization patterns on non-linguistic thought. However, even with the most controlled experiment, research in this domain may still be on shaky ground. In the next two sections we discuss two distinct reasons why this might be the case: one methodological and one typological.
4 Rethinking the methods
All of the studies investigating the relationship between linguistic typology and the categorization of motion events (including the two experiments reported on in this chapter) have relied on a variant of a triads task. The design is simple and elegant. Subjects are presented with a stimulus item (i.e., a "pivot" or an "X") and are asked to group it with either of two other stimulus items. One of these items shares one relevant property (A) with the pivot and the other item shares a different property (B) with the pivot. When subjects group the pivot preferentially with the items sharing one property (e.g. A), this is taken as evidence that subjects categorize the pivot with the A-items more than with the B-items. This shared property between pivot and the selected items must be the basis for categorization in this task. There are, however, a number of potential problems with using this design. Some of these problems may account for the rather mixed results that the various studies surveyed in this chapter have demonstrated.

Problem 1. The instructions for triads tasks are seemingly simple. Participants are to sort according to "similarity." As these projects are inherently crosslinguistic and cross-cultural, the issue of instructional clarity is particularly acute. We must question whether the basis for the grouping, i.e., "similarity" or "belonging together," is comparable for all the participants. For example, how would we know whether participants interpret the instructions as asking them to sort the items paradigmatically (group items belonging to the same
category, e.g., two types of tools), or syntagmatically (group items naturally occurring together, e.g., a hammer and a nail)? For an example with motion events, a paradigmatic grouping might group two events together because they both involve a prominent manner of motion using the legs. A syntagmatic grouping might put two events together because a particular path in one event is commonly associated with the particular manner of another event. Since there are no instructions as to the expected nature of the similarity, subjects may find the task essentially unclear.

Problem 2. Like many other experimental paradigms, the triads design is a forced choice design. Participants must categorize as to whether events should be grouped according to either path characteristics or manner characteristics. While the selection should reveal which of the two categorization strategies is preferred for this set of stimuli, it remains a distinct possibility that neither strategy is particularly preferred in people's general categorization of more variable real-world experiences. If a population exhibits a preference for manner categorization when faced with such a task, this does not mean that the same preference plays a role in cognition outside of the task. For example, speakers of one linguistic community may demonstrate a preference for manner during the triads task that does not match their general tendency for no preference for manner or path in their everyday real-world cognition. In other words, we do not know whether the selectional strategy of this task is one which speakers would habitually make outside of this task. To what extent might subjects sort by manner only because they are confronted specifically with a forced choice between exactly manner and path? Of course, issues of ecological validity plague many experimental designs. However, the work on motion typology has a particularly acute lack of corroborating evidence of manner/path categorization from other tasks or naturalistic observation. As such, there is little reason to assume that these particular triads experiments test the effects of habitual language on categorization decisions.

Problem 3. More critically, not only do these motion event triads tasks have a forced choice design, but they obviously have no incorrect answer. This is a common design issue for many triads tasks. Forced choice designs where a pivot is compared to two other exemplars are well established and easy to interpret when only one of the answers is correct. The dependent measure is simply one of accuracy. When neither possible choice in a forced choice task is uniquely correct, then this design can be successfully used for measuring perceptual similarity, which is why this design is robustly accepted in, for example, phonetics experiments. The simplicity of triads tasks has led to their adoption in a number of studies investigating linguistic relativity as well. Famously, the triads design for testing perceptual similarity was used in Kay and Kempton's (1984) exploration of the relationship between perceptual similarity and native language color categories. Brown and Lenneberg (1958) and Lucy and Gaskins
(2001) have used a triads design to test sorting strategies for shape vs. material/function/color of ordinary objects. In these later studies, the lack of a correct answer invites participants to consciously reflect on alternative interpretations. When it comes to complex motion events which vary by either manner or path, both variants are perceptually quite distinct from the pivot stimulus. In other words, the participants are effectively forced into conscious reflection. Indeed, it is obvious from comments and debriefing from Experiments 1 and 2 that participants are aware that either response in these triads tasks is potentially correct. Since they are consciously aware of these two strategies (of path- or manner-similarity), deciding between them could be the result of consciously imagining what result the researchers are hoping for, or of any number of reflective strategies. Indeed, this may account for the differences in results for the English participants in the various studies discussed above (sometimes exhibiting a manner bias, sometimes not, depending on the task). In Experiment 2, the English speakers lost the manner bias seen in Experiment 1 when more foils were introduced, making the path vs. manner contrast design less apparent. Alternatively, participants may use variable strategies to choose between these solutions. Some participants may select almost arbitrarily to use one strategy or another throughout the task, but there is little guarantee that in another session the same strategy would be chosen. On the other hand, other subjects may deliberately alternate between strategies in a way which cannot be hypothesized from the stimuli. In the results of our own experiments, we sometimes find high intra-subject variability and at other times high inter-subject variability but fairly consistent responses for individuals. This variability has no obvious pattern with recorded participant parameters. In other words, participants may well rely on strategies unrelated to their habitual language use and beyond the hypotheses ostensibly being tested. For determining non-reflective categorization decisions and how they may be affected by less than fully conscious language patterns, an experimental task can be embedded in another task. In this way, conscious reflection is largely limited to the apparent task and not to the actually measured task-within-a-task. For example, the animals-in-a-row experiment discussed in Pederson et al. (1998) gave subjects the task of rebuilding a sequence of animals from memory on a table with a different orientation from the original display. Many subjects clearly sub-vocalized the set and sequence of the animals and reported doing so in debriefing. However, the dependent measure of the study was actually the question of which direction the animals were facing when the participants rebuilt the sequence. Few subjects reported any conscious strategies for that part of the rebuilding task, and we interpret the dependent measure to be largely the result of non-reflective cognitive processes. From this embedded nature of the response, Pederson et al. (1998) were more confident in concluding that there was a certain "naturalness" to participants'
directional response, which was not immediately the result of reflective encoding at the time of the task (either linguistic encoding or some other conscious mnemonic strategy). Despite these problems with triads designs, the dearth of established experimental methods for determining native categorization encourages the use of triads tasks for measuring categorization behavior. We would do well to develop some alternative paradigms to the triads method for measuring categorization. Further, one might look for behavioral measures of cognitive processes other than categorization. Gennari et al. (2002) and Papafragou et al. (2002) attempt to study whether or not memory of events is affected by habitual language use. Of course, if sub-vocalized or explicit language is used as a strategy for memorization, hypothesizing an effect of this language on memory is relatively trivial. More interesting are studies in which language is not obviously used for long-term memory encoding. Papafragou et al. studied this with the memory component of their experiment (though as mentioned above, the stimuli were already pictorial representations of events) and found no effect of habitual language use for memory of these pictures. In contrast, Oh (2003) tested for memory effects of native language descriptions of short videotaped motion events with English and Korean speakers. After viewing and describing a total of ten video clips with four target clips of human walking motion, subjects were presented with a surprise memory task for the target clips. English speakers performed more accurately on a memory questionnaire about the manner nuances of arm and leg swings in these walking events than did the Korean speakers, who also used fewer manner descriptions for describing these video stimuli. Since the memory task was administered to the participants after they had linguistically described the stimuli, it is impossible to conclude that there was an effect beyond the specific language used in the trials themselves. Further, Oh only looked for a difference in manner memory in four videos of people walking, and only found effects for arm-swinging. It is difficult to extrapolate from these limited results, as there may be an explanation for this difference independent of linguistic type. In our second experiment, in which speakers did not describe the viewed motion events, the Japanese-speaking participants performed identically to the English speakers in both categorization and recognition memory. While it is impossible to directly compare Oh's questionnaire about the manner details of walking events with our recognition-memory task, Japanese is similar to Korean in terms of Talmy's typology, so it is unclear what might explain the difference between our results and Oh's results. Visual attention has not, to date, been studied with respect to these questions. It may prove fruitful to examine whether or not Japanese/Spanish speakers devote relatively more attention to path components in a scene while English
speakers give relatively more attention to the manner component. However, this separation of manner and path into discrete linguistic components is not typically found in the motion stimuli themselves. At least for the stimuli as used in the current study, it is inconceivable that a person could visually attend to the actor’s manner of motion without also having adequate path information in their visual focus. We revisit this issue in the conclusions section.
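Because the dependent measure in these triads tasks is so simple – which alternate a participant picks – the interpretive problem described above is easy to see. The sketch below is ours, with invented trial records (no such data file exists in the studies discussed); note that a score of this kind cannot, by itself, distinguish reflective strategy choice from non-reflective categorization.

```python
# Illustrative only: the triads dependent measure is simply which alternate
# (same-path or same-manner) a participant matches to the pivot clip.
# Trial records below are invented placeholders, not data from the studies.

triad_trials = [
    ("P1", "English", "manner"),
    ("P1", "English", "path"),
    ("P2", "Japanese", "path"),
    ("P2", "Japanese", "path"),
]

def manner_choice_rate(trials, language):
    """Proportion of manner-match choices within one language group."""
    choices = [choice for _, lang, choice in trials if lang == language]
    return sum(choice == "manner" for choice in choices) / len(choices)

for lang in ("English", "Japanese"):
    print(lang, manner_choice_rate(triad_trials, lang))
```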
5 Moving beyond the typology
We have so far discussed the possibility that studies of motion event processing have been unsuccessful at finding any reliable correlation between language and thought because the triads task may be ill-equipped to detect such correlations. However, even if we did not have reservations about the task for this purpose, or if we were to find a replacement task that would satisfy the concerns outlined above, there is an additional, perhaps more fundamental, problem. Namely, it is unclear whether Talmy’s typology is an appropriate typology for studies of linguistic relativity for motion events. To be sure, we believe it is an excellent typology for grammatical characterizations, but it is not transparently appropriate for event categorization hypotheses. Talmy’s intention, when he created the typology, was to generate a starting point for analyzing semantic and syntactic variation in motion event lexicalization. He did suggest that such variations would have implications for non-linguistic cognitive processing (i.e., the Background hypothesis), and his writings have inspired a number of researchers to extend the work in this direction, but in many ways this may have been premature. We will first discuss reasons why this fundamental assumption – that grammatical status should be correlated with conceptual prominence – is a nonstarter. Then we explore another aspect of Slobin’s manner salience which may be a viable candidate for future explorations in this domain. Finally, we end with a discussion of why manner and path may have been the wrong semantic components in which to expect cognitive variation – for both linguistic and cognitive reasons.
5.1 Correlations between grammar and thought
As was stated at the outset of this chapter, research in this domain has rested on the assumption that there should be some correlation between grammatical status and cognitive salience. The main verb has sometimes been regarded as the core of the sentence, and thus also as central to one’s conception of an event. Such an assumption seems intuitive, and we designed our own experiments around it. However, to our knowledge, this assumption has not been supported by any empirical evidence. While the main verb certainly holds special syntactic status in the clause, there
is yet no reason to believe that it necessarily holds conceptual prominence over other elements in the sentence. Particularly problematic is the fact that it is just as easy to concoct a story in which manner information expressed outside of the main verb is more conceptually prominent precisely because it requires syntactic architecture distinct from the main verb; in fact, this is what Talmy originally suggested. In this sense, a typology of grammar is not a good starting point for building hypotheses about the effects of language on cognition. For example, contrast skip out, in which the verb expresses the manner and the particle expresses the path, with exit skipping, in which the path is expressed in the verb and the manner expressed adverbially. Do these two expressions indicate distinct conceptualizations? Although the semantic components are expressed in different syntactic positions across these two phrases, the simple fact that they are both expressed may be enough to equate them conceptually. It seems too little is known about the detailed cognitive mechanisms that support grammatical processes to derive hypotheses about how language and thought might be related at this level.
5.2 Manner salience
Given the uncertainty about whether syntactic constituency on its own could contribute to shaping people’s conception of events, we must look for other sources of linguistic variation in motion event lexicalization with which to examine interactions between language and thought. Slobin’s concept of manner salience (1996a,b, 2003, 2006) provides an excellent opportunity. Since it appears that languages differ in the degree to which particular components are expressed overall, we can more readily hypothesize that there might be cognitive differences between two contrasting language groups. This is reminiscent of the classical notions of linguistic relativity – if one language group does not talk about X, do they think about X in the same fashion? Unfortunately, in our experiments, we find little evidence that verb-framing languages actually leave manner out significantly more often than do satellite-framing languages, at least with the stimuli used in the experiments considered here. Japanese speakers mention manner somewhat less than English speakers (93% vs. 100%, respectively), but the difference hardly seems worthy of inquiry. Although it may still be true that verb-framing languages mention manner of motion less in fiction and in day-to-day discourse, as Slobin’s observations indicate, speakers of such languages choose to talk about manner when describing video stimuli of human motion events. Since such events comprise the bulk of people’s everyday motion event processing, absolute mention of manner does not appear to be a viable typological difference for studies of Whorfian effects. It may be more optional to express manner in Spanish and
Japanese, but we find no evidence that this entails that manner is less salient non-linguistically to speakers of these languages. Note, however, that a true test of the manner hypothesis is still lacking, as no appropriately contrasting languages have been tested.
5.3 Manner differentiation
Another typological variation Slobin has investigated in his research on manner salience is the relative size of the manner lexicon. English has a notably large set of manner expressions (mostly verbs) which one can use to describe very subtle differences among types of motion. In English one can choose to differentiate running from sprinting, jogging, dashing, or bolting, while in Spanish a single verb, correr, describes all of these activities. Thus, even if a verb-framing language does mention manner robustly, perhaps it has a less elaborated lexical set for this purpose. If this is true, it might indicate not that manner is somehow less important for speakers of these languages, but that fine manner differentiation may be somewhat less relevant. Slobin has provided evidence for reduced manner differentiation from his research on English to Spanish translation (1996b). A set of Spanish translations of English novels was examined, and from these a hundred motion event descriptions were randomly selected for analysis. Overall, manner was expressed in the verb faithfully across the translations: 165 mentions of manner in the English originals, and 163 mentions of manner in the Spanish translations. However, while English employed sixty types of manner verbs, Spanish employed only forty-three in translating the same motion events. Although we are unable to explore the relative size of the lexical set of manner expressions used in previous studies in this domain, we were able to examine how many different types of manner expressions were used in our two experiments. For each experiment, we counted each unique lexical item used to describe the manner of motion in each of the events. In Experiment 1, our Spanish speakers (n = 7) did indeed use a less variable lexical set to describe the manners depicted in the events: for the ten unique manners, Spanish speakers used a total of twelve unique lexical expressions, whereas English speakers (n = 5) used a total of twenty-one. In Experiment 2, however, the results were contrary to expectation: for the twelve unique manners, Japanese speakers (n = 7) used twenty-five unique lexical items, and English speakers (n = 5) used nineteen. However, neither of these differences is significant, and moreover, even if they were, they would not correlate with any of the non-linguistic data we collected for any of the language groups. Thus, much like absolute mention of manner, although speakers of verb-framing and satellite-framing languages may differ in the relative size of the
lexical set dedicated to manner in Slobin’s text counts, this may not relate to how they describe motion events or how they process them non-linguistically. Matsumoto (2003) reports that there is actually a poor correlation between a language’s classification as verb-framed and the size of the lexical inventory it has for expressing manner of motion. That is, while individual languages such as Spanish may be appropriately described as having less manner differentiation than, for example, English, we cannot expect this to generalize to the inventory of manner expressions across other verb-framing languages. In all languages, manner expressions are largely expressed through non-grammaticized or open-class elements. That is, a language’s inventory of manner verbs, adverbs, onomatopoeic expressions, and other expressions of manner is readily augmented by borrowing or innovation as further manner differentiation becomes communicatively appropriate. In other words, all languages may augment their manner expressions at any time. Furthermore, it is not clear whether one language’s more extensive manner differentiation implies that manner information is proportionately more salient, relative to path information, than it is in another language with less manner differentiation. In contrast to manner expressions, path expressions are generally found among a relatively small set of grammaticized verbs, prepositions, and other such closed-class expressions. To increase path differentiation within the closed-class inventory requires a grammaticization process which is inherently slower than the largely open-class innovation of manner expressions. In other words, new path expressions are necessarily slow to enter into the vocabulary of all languages. Satellite-framing languages, such as English, are often rich in path-expressing adpositions. Verb-framing languages, such as Japanese, are typically sparser in path expression. Indeed, English’s uncommonly large set of adpositions and verb particles gives the English speaker unusually rich grammatical resources for path differentiation. In other words, while the typical choice of main verb for motion events might suggest considerable concern with manner, the prepositions and particles of English suggest that English speakers might be particularly concerned with path. However, these crosslinguistic grammatical contrasts may not even be relevant for these motion event stimuli, since we do not find substantial differences in the overall expression of manner in the descriptions across languages anyway.
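Because the argument here turns on a type/token distinction – how many distinct manner expressions are used, versus how often manner is mentioned at all – a small worked illustration may be useful. The sketch below is ours, not drawn from any of the studies discussed, and the mini-corpus of coded descriptions is an invented placeholder; real counts of this kind require hand annotation of each speaker’s manner expressions.

```python
# Sketch of the type/token counts discussed above. The coded descriptions
# are invented placeholders; real counts require hand annotation of the
# manner expressions in each speaker's description.

from collections import defaultdict

descriptions = [
    ("English", ["sprint"]),
    ("English", ["jog", "dash"]),
    ("Spanish", ["correr"]),       # one verb covering many English manners
    ("Spanish", ["correr"]),
]

tokens = defaultdict(int)   # how often manner is mentioned per language
types = defaultdict(set)    # how many distinct manner lexemes per language

for language, manners in descriptions:
    for lexeme in manners:
        tokens[language] += 1
        types[language].add(lexeme)

for language in sorted(tokens):
    print(language, "tokens:", tokens[language], "types:", len(types[language]))
```

On Slobin’s translation counts, for instance, English and Spanish are nearly identical in tokens (165 vs. 163) but diverge in types (sixty vs. forty-three), which is exactly the contrast a count of this shape captures.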
6 Conclusions
Summarizing across a range of studies, we find little evidence for differences in categorization according to native language path or manner expression. Rather than taking this as counter-evidence against a language effect on cognition (as
suggested by Papafragou et al. 2002), we argue that there is a preference across languages for robustly expressing both manner and path. In other words, there is insufficient meaningful linguistic variation in manner and path expression to pursue linguistic relativity research. Beyond linguistic descriptions, and perhaps more fundamentally, it is quite difficult to imagine manner and path as perceptually distinct components in a motion event. As a person or object moves with a particular manner, that same motion defines the path. Attention to either manner or path necessarily involves at least some attention to the other. However, other components of motion events can be perceptually distinct from the manner and path. Just examining the typical list in table 6.1, the ground, cause or agent, origin, endpoint, and recipient are all potentially perceptually distinct components, while figure, manner, and path essentially constitute a single perceptual package, though language generally forces the decomposition of this package into distinct components. This suggests that for the purposes of investigating non-linguistic correlates of different linguistic patterns for motion event expressions, examining components other than figure, manner, and path might prove more appropriate. If some languages reliably differ in their extent of expression for, e.g., obligatory mention of ground vs. cause, one might hypothesize different non-linguistic salience of these particular components correlating with the native language patterns of the participants. In contrast, choosing to oppose manner and path (as with the studies surveyed in this chapter) may well be the poorest choice of components to study. These two motion components may be the least likely to evidence categorizational, memory, or attentional differences across linguistically defined populations. All in all, then, we feel that Talmy’s linguistic typology of manner and path expressions is an unlikely source of variation in cognitive tasks, not just for categorization in triads tasks but for other behavioral measures as well. However, this should not be taken to deny any cognitive relevance of linguistic differences in motion event descriptions more generally. It may simply be that Talmy’s basic verb- vs. satellite-framing typology is not the most relevant linguistic dimension for separating languages from each other in order to explore relationships between language and cognition. That being said, we call for a co-operative venture between cognitive psychologists and linguistic typologists to develop a list of linguistically variable components which might prove more fruitful for testing issues of the relationship between language and non-linguistic cognition for motion events.
7 Putting things in places
Developmental consequences of linguistic typology
Dan I. Slobin, Melissa Bowerman, Penelope Brown, Sonja Eisenbeiß, and Bhuvana Narasimhan
1 Introduction
In this chapter, we explore how different languages describe events of putting things in places, and how children begin to talk about such events in their very early multi-word utterances. Our aim in focusing on the domain of “putting” events is to identify some important semantic and psycholinguistic factors that influence the course of acquisition. The overarching question is the extent to which the development of linguistic event representations is influenced by the particular language the child is learning. Events of “putting” are frequently discussed in interactions between caregivers and children, providing us with a rich crosslinguistic database in a high-frequency semantic domain. By examining language-specific characteristics of early event representations, we can make inferences about the cognitive resources and abilities that children bring to the task of learning how to talk about events in their native language. A major motivation for working crosslinguistically is to investigate the role of language typology in children’s mapping of meanings onto forms – in this case, the expression of particular sorts of transitive motion events. In his well-known typology of how languages encode motion events, Talmy (1991, 2000b) distinguishes between ‘satellite-framed’ languages and ‘verb-framed’ languages on the basis of the element in the clause where information about path is characteristically encoded. Our analyses show that this typological distinction does play an important role in the course of language acquisition, but other features that crosscut this typology play a role as well. These include properties of the target language’s inflectional morphology and its semantic categories. We examine eight languages – four satellite-framed (English, German, Russian, Finnish) and four verb-framed (Spanish, Hindi, Turkish, Tzeltal).
2 Typology: verb-framed and satellite-framed languages
Verb-framed languages characteristically encode the path of motion (e.g., the path ‘in,’ ‘out,’ ‘upward,’ ‘downward’) in the verb, whereas satellite-framed 134
languages characteristically encode path outside of the verb (e.g., particles, prefixes, directional adverbs or inflections). (Note that, following Talmy, we are concerned with the characteristic, typical means of event coding in a language. Every language has subsystems or low-frequency constructions which do not fall into the overall typological description.) Information about the manner or cause of motion is also treated differently in the two types of languages; however, in this investigation we focus principally on the encoding of the path of motion. To illustrate the differences among our eight languages, we adapt Talmy’s schema in distinguishing the following four conceptual components of a placement event:

Figure: the object that is caused to move
Action: the placement action (caused motion toward a goal)
Goal: the intended end location of the figure
Relation: the resulting spatial relationship between the figure and the goal (a subtype of Talmy’s “path” category).

To begin with, compare descriptions of an event of putting a pencil into a box in English (a satellite-framed language) and Spanish (a verb-framed language). In English, the placement action is expressed in the verb and the resulting relation between the pencil and the box is encoded in a particle (put the pencil in) or a prepositional phrase (put the pencil in the box). The verb could also express the manner of the caused motion, as in roll the pencil into the box. The dominant English pattern is schematized in fig. 7.1, using a schematic representation that we will apply to all eight languages.

Figure 7.1 English placement schema (satellite-framed)

In Spanish (fig. 7.2), by contrast, both the placement action and the resulting relation are expressed by the verb, which can be roughly translated as ‘insert’: mete el lápiz en la caja ‘insert the pencil at the box.’ (For convenience, the following figures have only English glosses for the seven non-English languages.) Note that the preposition preceding the goal expression, translated here as ‘at,’ gives some general spatial information, but, unlike its English counterpart, it does not indicate the containment relationship obtaining between the figure and
the goal. (For example, the same preposition occurs in the construction poner en la mesa meaning ‘put on the table.’)

Figure 7.2 Spanish placement schema (verb-framed)

The dichotomy between verb-framed and satellite-framed is useful, but, as we will see, it is not the whole story; there is considerable variation within each of these language types, with consequences for the course of acquisition.
3 Intratypological variation
3.1 Satellite-framed languages
First consider the languages which, together with English, represent the satellite-framed type in our corpus: German, Russian, and Finnish. In German, as in English, the placement event is encoded in a verb and the relation in a particle or a preposition. But the verb obligatorily expresses more information than simply placement: the speaker must choose between verbs such as legen ‘lay’ and stellen ‘make stand’ on the basis of the shape and final orientation of the figure. Further, a subcomponent of the action, which we may call the ‘vector’ (motion towards a goal), is encoded not only in the verb, but also in the accusative case ending on the determiner of the goal nominal. (Accusative contrasts with dative case, which is used for encoding a static locative relation.) Although German and English are both satellite-framed, they present rather different structures for the child to learn. Compare fig. 7.3 (German) with fig. 7.1 (English). Russian (fig. 7.4) patterns similarly to German except that the accusative case-marking appears directly on the goal nominal. Although Finnish (fig. 7.5) is also a satellite-framed language, it shows a different patterning: information about the relation (containment) is combined with information about an aspect of the action (vector: motion towards), and both are expressed simultaneously in the illative case ending on the goal nominal (translatable as ‘into’). As in the Germanic and Slavic languages, the verb is neutral with regard to the specific path.
Figure 7.3 German placement schema (satellite-framed)
Figure 7.4 Russian placement schema (satellite-framed)
Figure 7.5 Finnish placement schema (satellite-framed)
3.2 Verb-framed languages
We have already seen how putting a pencil in a box is expressed in Spanish (fig. 7.2), where the relation and the action are conflated and both are expressed in the verb. Our other verb-framed languages are Hindi, Turkish, and Tzeltal. In the most usual Hindi encoding of an event of putting a pencil in a box, however, the relation is expressed in the inessive case ending on the goal nominal, rather than in the verb (fig. 7.6). This case ending, unlike the Finnish illative, does not include information about vector: the same case is used regardless of whether the scene described is a dynamic one (putting ‘into’)
or a static one (being ‘in’). Given that relational information is expressed here in a case ending rather than in the verb, it may seem surprising that we have classified this language as verb-framed. But Hindi is in fact verb-framed: like Spanish, it has a full set of path verbs, in Talmy’s terms, comparable to ‘enter,’ ‘exit,’ ‘ascend,’ ‘descend,’ ‘insert,’ ‘extract,’ and so on;1 and – again like Spanish but unlike satellite-framed languages – it does not allow the verb slot to be filled with a manner verb, as in ‘roll the pencil into the box.’ Unlike Spanish, however, it does not require the use of a verb that expresses the relation.

1 The fact that Hindi has (in-)transitive verbs in its lexicon which lexicalize the path together with motion does not entail that such path verbs are obligatory in descriptions of (caused) motion to a goal. Such motions can also be described using deictic verbs such as aa ‘come’ and jaa ‘go’ or semantically general verbs such as Daal ‘put/drop’ or rakh ‘put/place’ in conjunction with locative case-marked nominals or spatial nominals.

Figure 7.6 Hindi placement schema (verb-framed)

Turkish is also a verb-framed language that has a full set of path verbs and does not allow the main verb to express information about the manner of motion in events that encode change of state, such as placement events. But in describing an action of putting a pencil in a box, the speaker does not even have to express the relation at all (fig. 7.7). The dative case ending on the goal nominal expresses the ‘vector’ subcomponent of the action (motion towards a goal), but the fact that the pencil ends up in the box is typically left to inference:
listeners know what spatial relation is likely to result when a pencil is moved to a box. If the goal is a table, ‘put table-DATIVE’ would be interpreted as ‘put on table.’

Figure 7.7 Turkish placement schema (verb-framed)

Tzeltal (a Mayan language) conforms to the canonical pattern of verb-framing in combining information about both the action and the relation together in the verb. But it differs from the other verb-framed languages in that many of its placement verbs pack in additional information: fig. 7.8 shows a Tzeltal verb, expressing not only that a figure object is put into a goal object, but also that the figure is a long thin thing, and that the goal contains other elongated objects to which the figure ends up parallel.

Figure 7.8 Tzeltal placement schema (verb-framed)
3.3 Summary of patterns
In comparing these eight patterns of expressing placement events, three phenomena stand out for special attention. First, two or more meaning components can be expressed simultaneously by one form (Talmy’s ‘conflation’). A typical example of conflation is provided by the verb meter ‘insert’ in Spanish, which expresses both an action (of caused motion) and a resulting relation (of containment). A second phenomenon warranting special attention – in a sense, the converse of conflation – is the fact that a meaning component can be distributed across more than one morpheme. This potential source of crosslinguistic variation has tended to escape attention in Talmy’s typology, but has been discussed by Sinha and Kuteva (1995) under the rubric “distributed semantics.” An example of distributed semantics is found in the German example (fig. 7.3), where information about the action (caused motion to a goal) is expressed in two places: both in the verb and in the accusative case ending on the determiner of the goal nominal (contrasted with the use of dative case for static situations).
Finally, a particular meaning component may not be explicitly encoded at all, but rather left to be inferred on the basis of discourse context and world knowledge. An example of this can be seen in Turkish (fig. 7.7), where the relation of containment is inferred from knowledge of the canonical relationship between pencils and boxes. As noted, if the first part of the sentence is held constant and the word ‘table’ is substituted for ‘box,’ the listener will infer that the pencil ended up ‘on’ the table. In addition to calling attention to crosslinguistic differences in conflation, distribution, and the interplay between semantic underspecification and pragmatic inferences, we should emphasize that even when languages offer the same set of options for encoding a placement event, they may differ in the choices speakers typically make from among these options. Recall, for instance, that Hindi has more than one option for expressing an event of putting a pencil in a box. Speakers can use a Spanish-style verb like ‘insert’ that conflates the action and the containment relation, but they can also select a general verb like ‘put’ and express the relation either with the inessive case ending, as in fig. 7.6, or with a spatial nominal (‘inside’): (1) pencil put box-INESSIVE; (2) pencil put (box’s) inside; (3) pencil insert box-INESSIVE; (4) pencil insert (box’s) inside. In actual fact, the Hindi child rarely hears the ‘insert’ verb in the input; the pattern most frequent in parental speech involves combining the ‘put’ verb with the inessive case ending on the goal expression (Narasimhan and Brown 2008).
4 Patterns in early acquisition: satellite-framed and verb-framed languages
We hypothesized that if children tune in to the typological characteristics of the target language early on, children learning verb-framed languages would predominantly use verbs in talking about placement events, and would use few if any non-verbal elements such as adpositions and directional adverbs. In contrast, children learning satellite-framed languages would home in on satellites such as verb particles, as well as adpositions and locative inflections, preferring such forms over verbs in talking about putting things in places. All the analyses presented in the following are based either on diaries that parents kept of their children’s linguistic development or on video or audio recordings of spontaneous family interactions. Some of the data come from the CHILDES database. Other data were gathered and analyzed by ourselves as well as by a large number of colleagues, whom we would like to acknowledge.2
2 English: Roger Brown, Michelle Chouinard, Eve Clark, Jacqueline Sachs; German: Heike Behrens, Harald Clahsen, Max Miller; Hindi: Rukmini Bhaya-Nair, Pritha Chandra, Ayesha Kidwai, Rajesh Kumar, Bhumika Sharma, Rachna Sinha; Russian: Sabine Stoll; Spanish: José María Albalá, María Benedet, María Carrasco, Celis Cruz, José Linaza, Victoria Marrero, Rosa Graciela Montes, Rosanna Mucetti, Susana López Ornat, Elisabet Serrat Sellabona, Catherine Snow; Turkish: Ayhan Aksu-Koç, Aylin Küntay.
In our analysis of early development, we have focused on placement utterances that express causing an inanimate object to move to a place (e.g., ‘put,’ ‘place,’ ‘attach’). We did not include utterances describing self-motion plus placement (expressed by verbs such as ‘bring’), putting clothing on, or giving something to an animate recipient. Moreover, we excluded answers to ‘Where-questions’ – which often ellipse everything but the goal – as well as imitations and self-repetitions. For each language, we selected two children whose corpora provided sufficient data.3 Starting from the point (t1) at which these children began to combine morphemes to encode placement events, we coded all placement utterances in their samples regardless of utterance length.4 Three major questions are addressed in the analysis: (1) When children are capable of combining two or three morphemes, which morphemes do they select to talk about placement? (2) Does the selection differ for satellite-framed and verb-framed languages? (3) What are children’s favored patterns in each of the eight languages? In our analysis of children’s placement utterances at this beginning point, we have tried to establish the dominant patterns for the four satellite-framed and four verb-framed languages.5 Describing these patterns revealed the complexities of the crosslinguistic variation children and researchers are faced with, and the complexities involved in producing a comprehensible overview of the basic patterns and the generalizations emerging from them. In order to simplify the presentation of the dominant patterns in each of the eight languages, we have abstracted away from language-particular details which were not relevant for the analysis, presenting one example for each pattern. For each example we provide an English gloss that captures the relevant morphology (e.g., locative case markers), but omits morphological markers that do not serve to encode figure, action, relation, or goal (e.g., person, number, or tense-marking). Moreover, we have focused on semantic elements in the verb phrase, omitting the agent of the placement action, negation elements, etc. For the sake of comparability, we have also normalized the word order according to English patterns.
3 The data are listed by researcher, with the child’s designation (name, pseudonym, code) in parentheses when available. The age is about 2;0. Sample size varies by child, with at least two hours of recordings per child. English: Roger Brown/CHILDES (Adam), Jacqueline Sachs/CHILDES (Naomi); German: Sonja Eisenbeiß (Liam), Harald Clahsen (Leonie); Finnish: Melissa Bowerman (Rina, Seppo); Russian: Sabine Stoll (Ch1, Ch2); Turkish: Ayhan Aksu-Koç (Azra, Deniz); Spanish: Susana López Ornat (María), José Linaza (Juan); Tzeltal: Penelope Brown (Lus, Xan); Hindi: Bhuvana Narasimhan (Ish, Aar).
4 For subsequent developmental analysis, the data were also coded for a second time period, 6–12 months after the children started to string morphemes together. Data for this second time point are mentioned in this chapter only for Tzeltal (see table 7.10).
5 The one or more most-frequent construction types in each child’s spontaneous productions were designated as “dominant.”
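The inclusion and exclusion criteria described above – caused placement of an inanimate object, with self-motion verbs, dressing, animate recipients, answers to Where-questions, imitations, and self-repetitions all excluded – amount to a simple filter over coded utterances. The sketch below is our paraphrase of those criteria, with invented record fields; it is not the coding instrument the authors used.

```python
# A paraphrase of the inclusion criteria above as a filter over coded
# utterance records. The record fields are invented for illustration;
# this is not the authors' actual coding instrument.

def is_codable_placement_utterance(utt: dict) -> bool:
    """True for caused placement of an inanimate object, per the criteria above."""
    return (
        utt["expresses_caused_placement"]           # e.g. 'put,' 'place,' 'attach'
        and utt["object_is_inanimate"]
        and not utt["self_motion_plus_placement"]   # e.g. 'bring'
        and not utt["dressing_or_animate_recipient"]
        and not utt["answer_to_where_question"]     # these often omit all but the goal
        and not utt["imitation_or_self_repetition"]
    )

example = {
    "expresses_caused_placement": True,
    "object_is_inanimate": True,
    "self_motion_plus_placement": False,
    "dressing_or_animate_recipient": False,
    "answer_to_where_question": False,
    "imitation_or_self_repetition": False,
}
print(is_codable_placement_utterance(example))  # True
```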
Table 7.1 Components of analysis of placement scenarios

Words                Semantic elements
in                   relation
inwards              relation&vector
put                  action
enter-CAUS, insert   action&relation
lay                  action&figure
box                  goal
here                 deixis
hither               deixis&vector
Analyzing the dominant patterns for the description of placement scenarios, we found the following types of elements:
• particles in which the relation or path is encoded either by itself (e.g., ‘in’) or in combination with the vector or direction of the motion (e.g., ‘inwards’)
• verbs expressing the action (e.g., ‘put’)
• verbs that conflate the action with the relation (e.g., ‘enter-CAUS’, ‘insert’)
• verbs that conflate the action and the figure (e.g., ‘lay’) and/or the goal (e.g., ‘insert between two surfaces’)
• nouns referring to the goal of the motion (e.g., ‘box’)
• deictic elements such as ‘here’ or ‘hither’, which contain deictic information only or a combination of information about deixis and the vector or direction of the motion.
The components of the analysis are summarized in table 7.1.6
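The notation in the “Semantic elements” column of table 7.1 (and of tables 7.2–7.9 below) is compact enough to treat as a small formal language: an ampersand joins meanings conflated in a single morpheme, and a hyphen marks the second element as a suffix on the first (see footnote 6 below). Purely as a reading aid, here is a hypothetical sketch of how such tags unpack; the tag strings come from the tables, but the parser itself is our illustration, not part of the authors’ analysis.

```python
# Illustrative parser for the "Semantic elements" notation of tables 7.1-7.9:
# "&" joins meanings conflated in a single morpheme (e.g. action&relation),
# "-" marks the second element as a suffix on the first (e.g. goal-vector).

def parse_tag(tag: str):
    """Return (stem meanings, suffix meanings) for one coded morpheme."""
    stem, _, suffix = tag.partition("-")
    return stem.split("&"), suffix.split("&") if suffix else []

examples = ["relation", "relation&vector", "action&relation",
            "action&figure", "goal-vector", "deixis-relation&vector"]

for tag in examples:
    stem, suffix = parse_tag(tag)
    print(f"{tag:24} stem={stem} suffix={suffix}")
```

For instance, deixis-relation&vector (the Finnish tonne ‘that-ILLATIVE’ of table 7.5) unpacks as a deictic stem carrying a suffix that itself conflates relation and vector.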
4.1 Satellite-framed languages
Recall that in satellite-framed languages, the path of motion is lexicalized not in the verb itself, but in a ‘satellite’ (e.g., in a verb particle such as ‘in’ in Germanic languages, or a corresponding path prefix or suffix in other types of languages). In this section, we review children’s preferred patterns for expressing placement events in the four satellite-framed languages: English, German, Russian, and Finnish.
6 Conventions followed in the “Semantic elements” column of tables 7.1–7.9: a hyphen (as in goal-vector, deixis-relation) indicates that the second element is a suffix on the first element; an ampersand (as in relation&vector, action&figure) indicates that the two meanings are conflated, i.e., expressed by a single morpheme. Verbs of placement that include manner information, such as shove, stuff, cram, are not included in the analysis. They are vanishingly rare in verb-framed languages, and although they are characteristic of satellite-framed languages, they are not characteristic of caregiver–toddler discourse in our data.
Table 7.2 Dominant English patterns

Words       Semantic elements
in          relation
bead in     figure relation
in mouth    relation goal
pour in     action&figure relation
put table   action goal
Table 7.3 Dominant German patterns

Words                        Semantic elements
inwards (’rein)              relation&vector
that inwards (das ’rein)     figure relation&vector
there inwards (da ’rein)     deixis relation&vector
girl chair (mädchen stuhl)   figure goal
English: In four of the five dominant patterns found in the English child language data, the relation is encoded by a particle like in. Thus, the English-speaking children produce utterances like “bead in,” “in mouth,” or simply “in.” In contrast, only two of the five most frequent patterns for placement utterances involve a verb. Such verbs typically encode information about the action but provide no information about relation or goal. The dominant English patterns are shown in table 7.2.

German: None of the most frequent patterns of placement utterances in the German child data contains a verb. Rather, children acquiring German show an overwhelming preference for utterances with particles like rein ‘inwards,’ in which the relation and the vector are conflated. These particles occur either by themselves or in combination with deictic elements (da rein ‘there inwards’) or noun phrases encoding the figure (X rein ‘X inwards’). In addition, noun-noun combinations encoding the figure and the ground can be observed. The dominant German patterns are shown in table 7.3.

Russian: Russian children produce more verbs in placement utterances than English and German children, and they do not use path prefixes to express relations or vectors. Rather, they make use of locative cases which encode vector information. However, just as in the English and German data, placement utterances without verbs can be found. In these utterances an element encoding the figure is combined with a case-marked element encoding the goal, e.g., vsë
Table 7.4 Dominant Russian patterns

Words                                      Semantic elements
put (položi)                               action
put on table-ACCUSATIVE (položi na stol)   action relation goal-vector
everything bag-ACCUSATIVE (vsë sumku)      figure goal-vector
set bear (posadim mišu)                    action&figure figure
Table 7.5 Dominant Finnish patterns

Words                             Semantic elements
put this-ILLATIVE (laita tänne)   action deixis-relation&vector
that-ILLATIVE cow (tonne ammu)    deixis-relation&vector figure
sumku ‘everything bag-ACCUSATIVE.’ The dominant Russian patterns are shown in table 7.4.

Finnish: In the Finnish data, as in Russian, locative case markers appear from early on. In contrast to the Russian case markers, however, Finnish case markers conflate relation and vector. Moreover, only one of the two dominant patterns of placement utterances contains a verb. The dominant Finnish patterns are shown in table 7.5.

Summary: satellite-framed languages: In general, children learning satellite-framed languages focus especially on the spatial relation or on the vector, and secondarily on the goal. They typically omit the verb. In three of the four languages the children express vectors (e.g., ‘inwards,’ ILLATIVE case). English-speaking children express spatial relations (in, on) but not vectors (into, onto).
4.2 Verb-framed languages
Recall that in verb-framed languages, action and relation are typically conflated in the verb. In this section, we review children’s preferred patterns in the four verb-framed languages: Spanish, Hindi, Turkish, and Tzeltal.

Spanish: Spanish-speaking children express an action in all their favored patterns, using general verbs of placement such as poner ‘put,’ in sentences such as ‘put it’ or ‘put it here.’ They also frequently use verbs such as meter ‘insert,’ in utterances such as ‘insert it,’ in which, following the canonical pattern of verb-framed languages, action is conflated with the spatial relation of containment. The dominant Spanish patterns are shown in table 7.6.
Table 7.6 Dominant Spanish patterns

Words                         Semantic elements
put it (ponga ése)            action figure
insert it (mételo)            action&relation figure
put-it here (pónga-lo aquí)   action-figure deixis
Table 7.7 Dominant Hindi patterns

Words                                Semantic elements
attach this-INESSIVE (is-mE lagaa)   action goal-relation
do this down (isko niice kar)        action figure relation
put here-LOCATIVE (yahAA par rakh)   action deixis-relation
Table 7.8 Dominant Turkish patterns

Words                                Semantic elements
this.place-DATIVE put (buraya koy)   goal-vector action
this.one put (bunu koy)              figure action
put (koy)                            action
Hindi: As in Spanish and Turkish, placement events in early child Hindi are also encoded in verbs in all of the favored patterns. Hindi, like Spanish, has verbs such as ‘insert,’ which conflate action and relation; however, in Hindi (as in Turkish), placement events tend to be encoded in general verbs of placement or action such as ‘put,’ ‘do,’ or ‘attach,’ as in ‘attach in this,’ ‘put here,’ or ‘do this down.’ The dominant Hindi patterns are shown in table 7.7.

Turkish: Children acquiring Turkish show an overwhelming preference for actions encoded in verbs, leaving relation to be inferred from context. All of their favored patterns involve verbs encoding placement actions, such as koymak ‘put’ or atmak ‘throw.’ The dominant Turkish patterns are shown in table 7.8.

Tzeltal: Tzeltal children also use verbs in all their favored patterns, e.g., ‘put,’ ‘insert.’ What is remarkable is the early use of very specific verbs that conflate the action with properties of the figure and/or goal, e.g., ‘set [bowl-shaped object]’ or ‘cover [with cloth].’ The dominant Tzeltal patterns are shown in table 7.9.
Table 7.9 Dominant Tzeltal patterns

Words                                          Semantic elements
put (ak’)                                      action
insert (otzes)                                 action&relation
set [of bowl-shaped object] (pach)             action&figure
cover [with-cloth]-for.her head (muk-be jol)   action&figure goal
Summary: verb-framed languages: All of the favored patterns involve action, encoded by the verb. Tzeltal and Spanish children use mostly verbs with canonical verb-framed conflation patterns (action&relation, e.g., ‘insert’). Hindi and Turkish children mostly use general verbs of placement (action, e.g., ‘put,’ ‘attach’).
5 Interpreting the patterns in child language
As noted in the Introduction, a central feature of Talmy’s distinction between verb-framed and satellite-framed languages lies in the locus of lexicalization of path information: in the verb in verb-framed languages, in elements associated with the verb in satellite-framed languages. In encoding placement events, children tune in to the typological characteristics of their language at an early stage of development. Children acquiring Spanish, Hindi, Turkish, and Tzeltal typically use verbs, focusing on the action of putting.7 Children acquiring English, German, Russian, and Finnish tend to use various sorts of directional locative markers, paying relatively more attention to the vector and relational elements of the placement scenario – that is, those semantic elements expressed outside of the verb. However, at a finer-grained level of analysis, a more differentiated picture emerges. Within both of the typological groupings, languages differ in the degree to which they exhibit the properties typical of the language type. Within the set of satellite-framed languages, children acquiring Russian and Finnish pay more attention to the action in placement utterances (encoded in the verb) than children acquiring English and German; that is, they are more likely to use
verbs in their placement utterances.8 Children acquiring verb-framed languages do not pattern identically either. Children acquiring Hindi and Turkish explicitly encode the goal (expressed in case markers or spatial nominals) more often than do children acquiring Spanish. Unlike children acquiring Hindi, Tzeltal-acquiring children produce verbs which conflate figure and goal information (a pattern reminiscent of Atsugewi, a language which conflates information about properties of the figure along with motion in the verb root [Talmy 1985]).

7 Since in verb-framed languages the verb often conflates action with relation, one may wonder whether it is the relation on which learners focus rather than the action per se. (We thank Jürgen Bohnemeyer for pointing out this alternative interpretation to us, as well as the one discussed in footnote 8.) While we cannot resolve this issue, we think action is the more important element driving learners’ initial preference for verbs in these languages. In two of the four verb-framed languages in our sample, Hindi and Turkish, the most popular ‘putting’ verbs with which children express placement events do not in fact conflate relation (see tables 7.7 and 7.8); nevertheless the children home in on these verbs just as zealously as learners of Spanish and Tzeltal home in on relation-conflating verbs such as ‘insert.’

The current findings suggest a scale of relative frequency of children’s early verb use in the eight languages, as shown in fig. 7.9.

Figure 7.9 Scale of languages according to relative frequency of verbs at t1

At one end, we have German and English, where verbs are rare (relatively little emphasis on action). At the other end, we have Hindi and Tzeltal, where all of the children’s preferred patterns include verbs (overwhelming attention to action). Russian, Finnish, Spanish, and Turkish occupy intermediate positions between these extremes. A number of factors potentially contribute to the scalar distribution we find. One factor is language typology. There are many verbs – both types and tokens – in the speech of children learning verb-framed languages, and far fewer in the speech of children learning satellite-framed languages. It seems that the typology of the language plays a role in the frequency of verb use in children’s ways of talking about placement scenarios.
8 In Russian, the verbs used to express placement often encode the end posture of the figure (‘standing,’ ‘lying,’ etc., see fig. 7.4), so one might wonder whether it is figure rather than action that attracts learners’ attention in this language. If so, learners of German, also a “posture verb” language (see fig. 7.3), should show a similar early preference for verbs, but they do not. (Nor can this explanation account for the relative popularity of verbs among learners of Finnish, where placement verbs do not encode posture.) Also problematic for the “attraction to posture” hypothesis is that learners of “posture-verb” languages seem to have trouble grasping the distinctions made by posture verbs: (1) the German children in our sample initially tended to rely on ‘light’ verbs like ‘make’ or ‘do’ rather than posture verbs, and (2) in an experimental study, learners of Dutch – which, like German, obligatorily breaks down “putting” events according to the posture of the figure – vastly overextended ‘lay’ to all placement events (Narasimhan and Gullberg 2010; Gullberg and Narasimhan 2010). It is not easy, then, to explain differences in relative attention to verbs vs. satellites among learners of different satellite-framed languages through an appeal to different verb semantics. A more likely explanation, as we argue shortly, is variation in the perceptual salience of satellites in these languages.
A second factor is the perceptual salience of the grammatical morphemes that encode spatial relations (particles, verbal affixes, case markers, and adpositions). This factor can distinguish between languages that belong to the same type with regard to lexicalization patterns. Children learning satellite-framed languages are more likely to encode spatial relations/vectors when talking about placement scenarios if their language marks the spatial relation in a transparent way. For example, the spatial relation of containment, encoded by the English element in (either the particle/satellite put in or the preposition in the box), is syllabic, separable from the verb, and often stressed (put it IN). By contrast, the relation is relatively non-transparent in the Russian prefix v- ‘in-’, as in vložit’ ‘inlay’, as well as the preposition v ‘in’, as in v korobku ‘in box:ACCUSATIVE.’ In both instances, the Russian relational marker – whether prefix or preposition – is non-syllabic, unstressed, and phonologically part of the following content word. In Finnish, the ‘satellites’ in the encoding of placement events are unstressed inflections either on the goal nominal (e.g., pöydä-lle ‘table-ALLATIVE’ [onto the table]), on a demonstrative (e.g., tä-nne ‘this-ILLATIVE’ [into here]), or on a relational nominal (e.g., pöydän pää-lle ‘table’s head-ALLATIVE’ [onto the top of the table]). In general, then, the ‘satellites’ are less transparent in Russian and Finnish than in English and German. Thus although Russian and Finnish are satellite-framed, children learning these languages produce fewer satellites and more verbs than learners of English and German. We can conclude that the more perceptually salient the grammatical marking of the spatial relation, the more likely children are to encode the relation, using salient satellites and prepositions. Hence the early favored patterns in the English and German data sets include relation or vector more often than the patterns found in the Russian and Finnish data sets.

Finally, a third possible factor lies in the semantics of the linguistic elements, especially the verbs. For example, many Tzeltal verbs conflate the properties of figure/goal along with action, whereas placement verbs in Hindi, Spanish, and Turkish typically do not. All of the early patterns in Tzeltal include verbs, often verbs that carry a good deal of semantic content.

The three dimensions we have identified – language typology, perceptual salience of relational marking, and semantic richness of verbs – all contribute to a scalar distribution of children’s favored patterns at the two-word stage. There are undoubtedly additional interacting factors that require further research to identify.
6 Beyond framing typology
The data allow us to pose a number of questions that go beyond Talmy’s typology. We consider three such issues: (1) specificity of semantic categorization (6.1); (2) crosslinguistic differences in child-directed speech (6.2); and (3) an
intratypological comparison of English and German with regard to the explicit mention of the goal in placement utterances (6.3).
6.1 Specificity of semantic categorization
On the level of semantic organization, languages differ in how finely they divide the domain of placement. They also differ in the sorts of semantic distinctions that they require or encourage speakers to make in talking about placement situations. Here we go beyond Talmy’s typology, because the semantic specificity of linguistic forms – verbs, case markers, adpositions – varies independently of verb- and satellite-framing. A major task for an individual acquiring a particular language is to become sensitive to particular ways of organizing conceptual space into semantic categories, and to the mapping of those categories onto lexical items, grammatical morphemes, and construction types. In our data we find two types of semantic variation in what is encoded by relevant words or morphemes. The first is degree of specificity of the spatial relation. For example, where Spanish can use a single preposition, en, English requires a choice between in, into, at, on, and onto. A second major area of variation is found in the required attention to spatial properties of the figure and/or the goal – that is, whether or not spatial properties of the figure and/or goal, or the end configuration of figure and goal, are incorporated in the verb. Languages make a range of distinctions of shape, substance, and orientation. Consider, for example, the distinctions made in English between lay, stand, and set, focusing on the orientation of the figure, and put, pour, and scatter, focusing on physical characteristics of the figure. Crosslinguistic comparison makes it evident that languages display different action categories in placement verbs. Consider two kinds of examples of semantic variation in our data: obligatory marking of goal phrases (6.1.1) and specificity of verb categories (6.1.2).

6.1.1 Obligatory marking of goal phrases

In four of our languages, prepositions or case endings obligatorily mark a goal nominal. The semantic categories of these markers differ across languages, as can be seen in a simple comparison of the encoding of two cross-cutting dimensions, containment/support and static/dynamic (here more specifically: ‘location at’ vs. ‘motion toward’). Figures 7.10a, 7.10b, 7.10c, and 7.10d show that the four languages make strikingly different categorizations. As shown in fig. 7.10a, Spanish uses the preposition en ‘in/on’ for all four scenes. Children must then form a category for en that is indifferent to containment vs. support and to the static (location at) vs. dynamic (motion toward) character of the situation.

Figure 7.10a Spanish preposition

By contrast, Turkish obligatorily marks the goal with case markers that distinguish the two static scenes (locative case) from the two dynamic scenes (dative case). This is shown in fig. 7.10b.

Figure 7.10b Turkish case-marking

Children learning Turkish, therefore,
have to ignore the containment/support distinction for the purpose of this case marking. Note that in both Spanish and Turkish you can distinguish, e.g., ‘apple INTO bowl’ from ‘cup ONTO table’ if you want to (with additional specifiers), but this distinction is normally left to inference. Unlike in Turkish, case markers in Hindi are indifferent to whether the placement event is static or dynamic, but they are sensitive to whether it involves containment or support. The inessive case is used for both containment scenes, while the adessive case applies to both support scenes, as shown in fig. 7.10c.
Figure 7.10c Hindi case-marking
Finally, Finnish requires speakers to make a four-way distinction: static containment (inessive case), static support (adessive), dynamic containment (insertion) (illative), and dynamic support (placement onto a surface) (allative). These four case-marked categories are shown in fig. 7.10d.

Figure 7.10d Finnish case-marking

Because of the case-marking, children learning Finnish have to attend to the distinction both between static and dynamic and between containment and support. What are the consequences of these differences in categorization for children learning to mark goals in these four languages? One potential consequence is in what is learned earlier. We might expect making one categorical distinction
to be easier than making two; thus Finnish children would be predicted to have the hardest task. This is indeed what we find. At the stage when children are beginning to string morphemes together, the Spanish children are using en and the Hindi and Turkish children are making the contrast between the two relevant cases for their languages. (Similarly, English children at this age make the appropriate distinctions between containment and support.) But the Finnish children have trouble with their four-way case distinction: they consistently distinguish between static and dynamic scenes, but not, in the beginning, between containment and support; instead, they use illative-marked deictic forms (e.g., tonne ‘there-ILL’) for both kinds of relations, and do not yet apply case-marking to nouns like ‘table’ or ‘cup.’ (Adults use the illative deictic forms similarly: these forms are semantically relatively unmarked and more broadly applicable than their allative counterparts.) These developmental patterns might suggest that, in general, linguistic systems that require more semantic distinctions are harder to learn. This hypothesis can be explored by looking at another example, this time involving verbs.

6.1.2 Verb categories

Like prepositions and other goal markers, verbs differ in the level of specificity with which they divide up events. ‘Light’ verbs like ‘put’ apply to a wide range of events; ‘heavy’ verbs like ‘attach by inserting tightly between two pinching surfaces’ apply to a relatively small range of situations. Our languages can be compared with regard to level of specificity, as shown in the following figures. The figures show some of the sorts of placement events that appear in our data: putting a stick of firewood on the fire, putting a bottle down on its side, putting a pencil into a cup of pencils, putting down a bottle, a frying pan, or a bowl on a table. First consider English, shown in fig. 7.11a. In English, the verb put is typically used to describe all six of these events. Of course, we could make finer distinctions, but we generally don’t bother to, and in our data the children don’t either.

Figure 7.11a English placement category

In German, by contrast (fig. 7.11b), adult native speakers typically make distinctions based on the shape and orientation of the figure being placed. For example, they use the verb legen ‘lay’ for events in which the figure is placed with the long axis horizontal, while they use stellen ‘make stand’ for events in which the figure is placed vertically, or resting canonically on its base. Our children also produce posture verbs, though they initially preferred verbless utterances and utterances with ‘light’ verbs like ‘make’ or ‘do’ (see footnote 8).

Figure 7.11b German placement categories

Tzeltal (fig. 7.11c) makes even finer distinctions, requiring five different placement verbs for these scenes, depending on precise characteristics of the orientation and spatial configuration of figure and ground objects resulting from the placement action.

Figure 7.11c Tzeltal placement categories

According to the hypothesis that learning more distinctions is harder, we would expect the Tzeltal children to have a hard
time. But the Tzeltal children use a range of verbs when they start to use two-morpheme utterances to encode object placement, including not only a general ‘give/put’ verb (semantically more general than English put) but also several semantically specific verbs, including ch’ik (‘insert something between supports’), lut (‘insert tightly between forked object’), and pach (‘set down
[Figure 7.11c: Tzeltal placement categories]
something bowl-shaped upright’). They seem to use these verbs correctly, and do not overextend them to inappropriate placement scenes. Looking at the Tzeltal input, we find thirty-two distinct placement verbs, including verbs with meanings ‘stand-up-vertically,’ ‘set-down-stacked,’ ‘set-down-on-its-side,’ ‘mix-in-with [particulate things],’ ‘pile [multiple things],’ ‘insert-long-thin-thing-into-tight-fit-at-one-end,’ and ‘put-standingat-an-angle.’ However, the children use only a small proportion of these, as shown in table 7.10, which includes data for the two Tzeltal children from the early two-word stage and from about six months later. Thus although children begin very early to create the relevant fine-grained categories, it takes them a while to finish the job (Brown 2008, forthcoming; Narasimhan and Brown 2008).
To sum up: the kinds of crosslinguistic variation we have been examining create a complex learning problem. Categories with more distinctions are not necessarily harder; it depends on what the distinctions and contrasts are – that is, what situations are being classed together (see also Bowerman 2005). Finnish children do have a hard time with two cross-cutting distinctions which simultaneously determine which case to use, but Tzeltal children do not appear to find semantically specific verbs particularly difficult. The Tzeltal verb semantics relate to concrete image schemas of shape, manipulation, and orientation. They require attention to particular types of objects, and children use them early on for objects that they have regular daily contact with. By contrast, the Finnish distinctions are general and abstract, referring to any type of object. The level of abstraction required seems to be a later developmental achievement.

This leads us to a final point: there may be an interaction between the ease of learning semantic categories and where the language puts its information. As we saw in the first section, Tzeltal children are encouraged by the structure of their language to attend to verbs at an early age. German children are not as oriented to verbs (at least for placement events), and, correspondingly, it takes them longer to make the needed semantic distinctions, e.g., between legen 'lay' and stellen 'make stand'. (Clahsen 1982 reports that German children initially produce many sentences without verbs.) Since in Tzeltal a lot of placement information is compressed into the verb, it is often unnecessary to separately mention the relation or the goal. Hence the favored pattern for the Tzeltal child at the beginning point is a verb alone, or a verb plus a deictic. What you can say with a verb alone in Tzeltal can also be said in Hindi by combining a verb with a noun. But when this information is packaged together with action in a single verb, it can be taken for granted in the context and does not need to be separately mentioned. It seems, then, that learning semantic categories interacts with other aspects of language structure.
6.2 Child-directed speech: Variation sets
Regardless of the typology of the exposure language, the child has to make use of patterns of speech in discovering the structures of the language. When a caregiver rephrases a child-directed utterance, language-specific patterns of expression can be subtly revealed. Such repetitions and rephrasings are typical of speech addressed to very young children, who do not often readily respond to questions and commands. Consider a mother who is instructing a child of age 2;3 (Sachs’ data, CHILDES; Sachs 1983):
(1)
Nomi, don’t put your bread on the floor, honey, put it back on the table, Nomi. Put it up on the table.
Note the substitution of back on by up on. This gives the child the information that a verb particle/preposition, on, can be combined with a temporal adverb, back, and a directional adverb, up. Of course, such information can also be derived from comparing stored utterances, but the immediacy of the mother's rephrasing, with no change in the situation, may draw the child's attention to the linguistic contrast. Example (2), also from the Sachs corpus, presents a more complex substitution pattern:

(2)
I think I’ll put her over in your toy basket until you’re finished with breakfast, OK? I’ll put her right over here and you can go get her after you’re finished.
The directional adverb over is strengthened by the emphatic particle right, and a specified location, in your toy basket, is replaced by a deictic adverb, here. We will refer to such sequences as 'variation sets,' following Küntay and Slobin (1996), who introduced this term to designate a series of utterances produced with a constant communicative intent, but with changing form. Variation sets are characterized by three types of phenomena: (1) lexical substitution and rephrasing, (2) addition and deletion of specific reference, and (3) reordering. Such sequences provide the child with clues about the typological characteristics of the exposure language, presenting patterns of word-order variability, ellipsis, and lexical alternation.

Variation sets provide information about the meanings of lexical items. Successive utterances show which verbs can occur with the same array of arguments, along with demonstrating possible alternative expressions of noun arguments and relations. In addition, a sequence of utterances about the same events indicates how the language segments events into linguistic units. Variation sets are frequent in child-directed speech in the early period that we are investigating. For example, Küntay and Slobin (1996), studying two Turkish mothers speaking to children in the age range 1;8–2;3, found that 21–35% of utterances occurred in variation sets.

Our examination of naturalistic data has drawn our attention to variation sets in several of the languages considered here: English, Russian, Hindi, and Turkish. Each of these languages presents the child with particular learning problems. The variation sets in the input serve to highlight critical features of the morphology and syntax of the language. Consider the following variation set in English, where the mother is encouraging a child of age 2;1 to put J's bottles in the refrigerator:
(3)
let’s put J’s bottles in the refrigerator want to put them in the refrigerator with me let’s put J’s bottles in the refrigerator we’ll put it in the refrigerator let’s put it in the refrigerator we’ll put it in the refrigerator you can put it in I’ll let you put it in yourself you put it right in you put it in there put it right in the refrigerator
This English example is typical of variation sets in a language with fixed word order. Note that all of the utterances adhere to the same word-order schema: pragmatic introducer – verb – object – locative goal. The verb is always present. The object is quickly reduced to a pronoun, but it never disappears – that is, there is no object ellipsis. The goal shows the most variation: in the refrigerator, in, right in, in there, and finally the most elaborated form, right in the refrigerator. This sort of sequence is typical of variation sets addressed to very young children who do not readily show signs of comprehension or compliance: moderate elaboration, followed by reduction, followed by more elaboration. Input patterns of this sort in English show the child that English is a fixed word-order language, with neither verb nor object ellipsis, and with various types of locative expressions and the optional expression of deixis (in there).

Russian presents rather different morphosyntactic information in variation sets, as shown in (4), where a mother is prodding a child of 2;0 to put his toys away. (For ease of presentation, examples from Russian, Hindi, and Turkish are presented only in English glosses with grammatical codes; the original examples can be found in the appendix.)

(4)
gather toys-ACC 'Gather (the) toys.'
put in basket-ACC 'Put (them) in the basket.'
blocks-ACC '(The) blocks.'
put 'Put.'
put in basket-ACC toys-ACC 'Put (them) into the basket (the) toys.'
throw thither 'Throw (them) over there.'
in basket-ACC must put 'Into (the) basket (you) must put.'
put 'Put.'
Variation sets of this sort demonstrate a range of patterns of ellipsis: there is no goal in the first utterance, no figure in the second, no verb or goal in the third, and so forth. Deixis is optional, appearing only in the sixth utterance ('thither'). The verb is a constant element across utterances, except for the third utterance, 'blocks-ACCUSATIVE,' where the case ending implies action on the object. Note that the ACCUSATIVE is a reliable cue to the object (toys), and that it
also indicates that the basket is the goal.9 The preposition preceding 'basket' makes it clear that this is the goal rather than the figure. Given the physical situation, the variation set might serve to highlight these two functions of the ACCUSATIVE. Note, too, that there is considerable word-order variability. Again, repeated exposure to variation sets of this sort throws critical factors of Russian morphosyntax into relief.

The next example shows a Hindi variation set (consisting of non-consecutive utterances excerpted from a conversation involving play with plastic blocks of different shapes which fit into slots on a board). This example is interesting in that the mother and older brother collaborate in constructing a variation set for a child of 1;7. Example (5) provides a morpholexical gloss and an English translation.10

(5)
MOT: [mother puts blocks in front of a board with slots]
this-ACC all-ACC attach-CAUS give-IMP this-INESS
'Attach all this in this.'
come on, attach-CAUS give-IMP
'Come on, attach (it).'
BRO: that-INESS that-NOM not attach-FUT
'It won't attach in that.'
here put-IMP
'Put (it) here.'
MOT: that-INESS attach-FUT?
'Will (it) attach in that?' [referring to the slot into which child is putting block]
BRO: no, this-NOM here attach-FUT, yes
'No, this will attach here, yes.' [mother points to the slot where the block should fit]
there not attach-HAB
'(It) doesn't attach there.' [addressing child who is unable to fit block in slot]
MOT: good, this-NOM attach-CAUS-IMP, this-NOM attach-CAUS-IMP
'Good. Attach this. Attach this.' [mother points to new block]
Note that the verb is always present, and all other elements come and go. Note too that, as in Russian, ACC-marking is used to indicate action on an object (blocks). Interestingly, in Hindi, NOM and ACC case-marking alternate on the object of the transitive verb: compare the first sentence of the Hindi variation set (the object gets ACC-marking with the verb 'attach') and the last sentence (the object gets NOM-marking with the same verb). The use of the variation set by the mother illustrates this morphosyntactic distinction in ways that are helpful to the child. As the switch from ACC- to NOM-marking occurs in the same physical and linguistic context (referring to the same type of object/action, using the same verb and imperative construction), the child's attention is likely to be more quickly drawn to the relevant factor – namely, definiteness – that conditions the case alternation (the mother uses NOM to refer to a new block that hasn't been talked about before).

9 The ACCUSATIVE is not a unique form in every case-gender-number combination, so the situation is more complex than schematized here.
10 The verb glossed as 'give' in (5) represents a light verb which does not literally imply transfer but typically adds an aspectual value to the meaning encoded in the main verb.

Finally, (6) presents a Turkish variation set in which a mother is encouraging a child of 2;0 to put a toy bird in its nest.

(6)
how put? 'How (do we) put it?'
put hither 'Put (it) here.'
its.nest-DAT put 'Put (it) to its nest.'
its.nest's inside-LOC like.this 'Inside of its nest like this.'
Turkish exhibits the greatest amount of ellipsis. Even the verb is not constantly present, and the figure is never lexicalized at all. Here the child is shown the Turkish preference for ellipsis of all but the least redundant elements.

Variation sets may enhance the effects of the sort of syntactic bootstrapping proposed by Lila Gleitman (1990: 23):

Children's sophisticated perceptual and conceptual capacities yield a good many possibilities for interpreting any scene, but the syntax acts as a kind of mental zoom lens for fixing on just the interpretation, among these possible ones, that the speaker is expressing. This examination of structure as a basis for deducing the meaning is the procedure we have called syntactic bootstrapping. (1990: 27)
Naigles and Hoff-Ginsberg (1998) have studied the consequences of syntactic bootstrapping for acquisition:

The prediction of the syntactic bootstrapping hypothesis is that the more frames in which a child hears a verb, the easier that verb will be to learn because each additional syntactic frame has the potential to provide additional semantic information. (Naigles and Hoff-Ginsberg 1998: 101)
They argue that if a mother uses a verb in a diversity of constructions, the child is most likely to learn that verb quickly and use it appropriately. The proposals of Gleitman and of Naigles and Hoff-Ginsberg focus on verbs and the child's task of keeping track of the frames in which particular verbs occur. A variation set "magnifies" the effects of syntactic bootstrapping: since a set of sentence frames is present in a single context of relatively brief duration, the child is spared having to collect, remember, and compare utterances produced far apart and in very different contexts. Instead, the utterances to be compared are conveniently grouped together in the same non-linguistic context and follow
each other in short succession, greatly reducing the memory burden required by syntactic bootstrapping. Note, finally, that in our examples the variation sets highlight the meanings and patterns of expression not only of verbs, but of all linguistic elements.
6.3 Intratypological variation in explicit mention of goal11
In our final excursion beyond framing typology, we present a small case study of the expression of locative goals in child-directed speech in two closely related languages of the same type, English and German.12 In both languages the goal can be stated explicitly, using a prepositional phrase with a nominal, as shown in (7), using examples from child-directed speech:
(7a) Put the flowers in this bucket.
(7b) Die legen wir hier in'n Topf.
this.one lay we here in.the pot
'Let's put this one here in the pot.'
Both languages can also point toward the goal with ellipsis of the goal nominal, as in (8):
(8a) You can put it in.
(8b) Willst du deine Flasche reinlegen?
want you your bottle in.lay
'Do you want to put your bottle in?'
Where the languages differ considerably is in the elaboration of deixis in combination with expressions of relative location. English is essentially limited to here and there, sometimes in combination with locative particles, as in (9):
(9) Put it in here / over there.
German speakers have a larger and more flexible set of choices: hier/da 'here/there' can be combined not only with a large collection of directional particles such as rein 'into,' drinnen 'inside,' and drauf 'onto.there,' but also with the deictic particles her/hin 'hither/thither' and with pronominal adverbs that mark out spatial regions, e.g., oben 'upper region,' unten 'lower region,' vorne 'front region,' hinten 'rear region.' For example, compare descriptions locating an object in the upper part of a cupboard. English requires two nominals – one labeling a region and the other the ground: in the upper part of the cupboard. German uses a pronominal adverb that denotes a space: oben im Schrank 'upper.region in.the cupboard.' These elements occur in numerous combinations in parental speech to toddlers, such as the examples in (10).

11 We thank Jürgen Bohnemeyer and Heike Behrens for critical discussion in developing the ideas presented in this section.
12 The study was carried out by Heike Behrens, Melissa Bowerman, and Dan Slobin, using English data from the Naomi/CHILDES corpus and German data from Max Miller's Simone corpus, sorted and statistically summarized by Behrens.

(10a)
Leg sie hier unten hin.
put them here lower.region thither
'Put them here down there.'
(10b)
Leg sie hier oben drauf.
put them here upper.region onto.there
'Put them here up onto there.'
Note the form drauf (reduced from darauf) in (10b). It combines the deictic da 'there' with rauf, which specifies both the 'on' location to be found 'there' and an allative (motion-toward) component. Expressions including da can compactly direct attention to the goal without an explicit noun or pronoun. Heike Behrens (p.c. 1997) has suggested the term 'residual ground' to designate such elements that point to contextually or deictically established ground referents without explicitly mentioning them. It may well be that detailed attention to deictic perspective and spatial divisions is more important in German discourse than the explicit naming of contextually given reference objects. There is some experimental evidence that is consistent with this proposal. Carroll and von Stutterheim (1993) compared English and German speakers in tasks of describing locations of objects. They observed English orientation to objects and object features, contrasting with German orientation to the spaces in which objects are located.

What might be the consequences of these subtle differences for the explicit mention of goals in child-directed speech in the two languages? The available construction types in the two languages might lead English-speaking parents to make more frequent explicit mention of goals, whereas German-speaking parents might more frequently make implicit reference to goals by the use of deictic expressions. This proposal was tested in a case study by Behrens, Bowerman, and Slobin, using data from two girls and their parents: English data from Naomi (Sachs' data) and German data from Simone (Miller's data). For each of the girls we analyzed a six-month stretch of data, beginning with the child's first expression of a motion event that included specification of goal and coding all motion event expressions made by both the child and her parents. Within these data, we attended specifically to object placement events, defined as reference to moving a physically present object to another location. For both girls, the data provide reference to comparable naturalistic situations – eating and playing with toys – encoded by verbs of placement. In English the verbs are hand, pour, put, stick, and throw. In German the verbs are hängen 'hang,' legen 'lay,'
machen 'make,' schmeißen 'hurl,' stecken 'stick,' stellen 'make stand,' tun 'do,' and werfen 'throw.'

Consider first the implicit reference to goals by the use of deictic expressions only. The English-speaking parents used this option 10% of the time, in comparison with 28% for the German-speaking parents. Perhaps as a consequence, the English-speaking parents were much more likely to make explicit reference to the goal: 53% in comparison with 30% for the German parents. Figure 7.12 shows the relative use of three types of constructions for verbs of placement in caregiver speech in the two languages: no explicit goal (e.g., 'put it in'), deictic goal only (e.g., 'put it in here'), and explicit goal (e.g., 'put it in the box').

[Figure 7.12: English and German: Verb-of-placement constructions in caregiver speech]

These limited data support the suggestion that English speakers, in comparison with German speakers, are relatively more concerned with explicit goal reference. The children matched these patterns even more strongly than their parents. Simone never made explicit lexical mention of a goal in the types of placement scenarios examined, and she used far more deictic expressions than Naomi, as shown in fig. 7.13. Finally, comparing each girl with her parents, it is evident that each child matches her parents more closely than she matches the corresponding child in the other language. Figures 7.14a and 7.14b compare each girl with her parents. Note that Naomi seems to slightly overproduce explicit goals in comparison with her parents, whereas Simone never mentions a goal.

In this intra-typological comparison we have again gone beyond the large-scale typological dimensions discussed earlier in this chapter. We find that an apparently small difference – the relative elaboration of locative systems expressing speaker perspective – seems to influence the referential encoding of goals in placement events. Available lexical choices combine with available construction types to shape discourse patterns.
[Figure 7.13: Verb-of-placement constructions in English and German child speech; Naomi vs. Simone across three categories: no explicit goal, deictic goal only, explicit goal]
[Figure 7.14a: English verb-of-placement constructions: Naomi and her parents]

[Figure 7.14b: German verb-of-placement constructions: Simone and her parents]
7 Overall conclusions
We suggest, on the basis of the data presented in this chapter, that children’s early talk about placement events reflects typological characteristics of the target language. In terms of our initial question, the development of linguistic event representations is indeed influenced by the particular language the child
is learning. Specifically, children learning satellite-framed languages show an early emphasis on goals and vectors/relations, whereas children learning verb-framed languages emphasize actions. At the same time, factors that crosscut this typological dichotomy are also important, including: (1) the varying perceptual salience of relational markers across languages, and (2) varying patterns of ellipsis and discourse framing. In addition, on the level of semantic categories, it is clear that children are language-specific in their semantic distinctions.

We have also found significant crosslinguistic differences in input patterns which influence acquisition. Child-directed speech displays the grammatical and semantic properties of the language. Further, from among these patterns, the input presents those patterns that are preferred for specific discourse purposes. The bottom line is that any reasonable model of language acquisition must consider a multiplicity of interacting factors – morphosyntax, pragmatics, lexicon, etc. – each with its own language-specific constraints and regularities (Brown and Bowerman 2008). As a consequence, what we end up with is not infinite variation, but rather constrained variability.

Appendix

Russian variation set (example 4)
soberem igrušečki 'gather:1PL.OPTATIVE toy:DIM.ACC.PL'
skladyvaj v korobku 'heap/stack:IMP in basket:ACC'
kubiki 'block:ACC.PL'
skladyvaj 'heap/stack:IMP'
skladyvaj v korobku igruški 'heap/stack:IMP in basket:ACC toy:ACC.PL'
kidaj tuda 'throw:IMP thither'
v korobočku nado klast' 'in basket:DIM.ACC must put/place:INF'
skladyvaj 'heap/stack:IMP'

Hindi variation set (example 5a)
isko sabko lagaa do ismE 'this:ACC all:ACC attach:CAUS give:IMP this:INESS'
calo lagaa do 'go:IMP attach:CAUS give:IMP'
usmE wo nahii lagegaa 'that:INESS that:NOM not attach:FUT'
yahAA rakho 'here put:IMP'
usmE lagegaa? 'that:INESS attach:FUT?'
nahii, ye yahAA lagegaa, hAA 'no, this:NOM here attach:FUT, yes'
wahAA nahii lagtaa 'there not attach:HAB'
acchaa, yeh lagaao, yeh lagaao 'good, this:NOM attach:CAUS:IMP, this:NOM attach:CAUS:IMP'

Turkish variation set (example 6)
nasıl koyalım 'how put:1PL.OPT'
koy buraya 'put:IMP this.place:DAT'
yuvasına koy 'nest:3POSS:DAT put:IMP'
yuvasının içinde böyle 'nest:3POSS:GEN interior:LOC like.this'
8 Language-specific encoding of placement events in gestures∗

Marianne Gullberg
1 Introduction
What information do speakers attend to as they prepare to speak about the world? This question lies at the heart of concerns about how language might influence the ways in which humans deal with the world. As we plan to talk about events around us, we must select which information is relevant for expression and how to encode it in speech. This activity is alternatively known in the literature as 'macro-planning,' 'linguistic conceptualization,' 'event construal,' and 'perspective taking' (e.g., Levelt 1989; von Stutterheim and Klein 2002; von Stutterheim, Nüse, and Murcia-Serra 2002).

Various suggestions have been made regarding what constrains such information selection. One approach focuses on the effects of the linguistic categories themselves. It suggests that speakers' choices of information are guided or "filtered" through the linguistic categories afforded by their language, specifically by the categories they habitually use to express events (e.g., Berman and Slobin 1994a; Carroll and von Stutterheim 2003, and in this volume; Slobin 1991, 1996a; von Stutterheim and Nüse 2003; von Stutterheim, Nüse, and Murcia-Serra 2002). This idea is known as the thinking for speaking hypothesis (e.g., Slobin 1991, 1996a). Language-specific rhetorical styles, views or perspectives arise through the habitual use of linguistic categories that select for certain types of information to be expressed (Slobin 2004; Talmy 2008). This view of the effect of linguistic categories on speaking differs in scope from the so-called linguistic relativity or neo-Whorfian hypothesis. Linguistic relativity proper explores the effect of language on general cognition (e.g., Gumperz and Levinson 1996b; Lucy 1992). The focus here, however, is on the effect of linguistic categories on the activity of information selection or linguistic conceptualization for speech.
∗ I gratefully acknowledge financial and logistic support from the Max Planck Institute for Psycholinguistics, and funding from the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO; MPI 56–384, The Dynamics of Multilingual Processing, awarded to M. Gullberg and P. Indefrey). I thank Asifa Majid, Bhuvana Narasimhan, Aslı Özyürek, Eric Pederson, Mandana Seyfeddinipur, and David Wilkins for their valuable input and thoughtful comments. I am also grateful to Wilma Jongejan, Arna van Doorn, and Merlynne Williams for assistance in data collection, coding, and establishing reliability.
It may seem self-evident that insofar as languages provide different linguistic categories encoding different meanings, this will affect what type of information speakers should focus on and how they talk about the external world (see Pederson 1995). Nevertheless, it remains a vexing question as to what extent differences in linguistic repertoires result in mere surface differences in speech and rhetorical styles, and to what extent, if any, such differences reflect a deeper difference in what information speakers attend to and consider in their construals of events. That is, it is far from clear whether event representations are always reflections of semantic categories or whether speakers have neutral event representations that are guided by properties of events themselves. Although semantic distinctions in verbs, for instance, seem to steer attention to certain types of information, speech may not reflect all aspects of the information that speakers actually consider. That is to say, speech may be under-specified with regard to the details of representations. The question then arises, how to ascertain what information is considered.

Different types of behaviors have been shown to reflect linguistic activities. For instance, studies using eye-tracking techniques have shown that speakers' gaze behavior is influenced by language in that they look at precisely those entities they are about to talk about (Meyer 2004; Meyer and Dobel 2003). Importantly, speakers of different languages look at different entities and locations, reflecting subtle distinctions made in the languages spoken (Carroll and von Stutterheim, this volume). This chapter explores yet another source of information, namely speech-associated gestures. Although gestures convey information in a very different format from speech, their forms and timing have been shown to reflect the expression of linguistic information (e.g., Kita and Özyürek 2003; McNeill 1992).

The target domain for this study is that of placement, an event type firmly grounded in sensori-motor experience and a popular candidate for a universal, language-neutral event type. This event type has not been examined before in terms of language-specific event construal in speech and gesture. This chapter asks the following questions. Do the semantic properties of verbs habitually used to describe placement events guide speakers' attention to different types of spatial information, keeping syntactic and information structure constant? What can gestures tell us about language-specific linguistic event representations beyond speech in this concrete domain?
2 Gestures
Speech-associated gestures, defined as the movements speakers perform while they speak as part of the expressive effort, might at first glance seem like an ideal place to look for language-neutral perspectives on events. After all, gestures seem well suited to represent what is seen or done iconically and mimetically. However, the relationship between gestures, what is seen (percepts) and done
(actions) is complicated by the impact of language, at least when gestures accompany spoken descriptions. In fact, gestures, speech, and language are intricately linked to each other (Kendon 1980, 2004; McNeill 1985, 1992). The integration between the modalities is seen both in comprehension and in production. For example, gestures affect the interpretation of, and memory for, speech (e.g., Beattie and Shovelton 1999a; Cassell, McNeill, and McCullough 1999; Graham and Argyle 1975; Kelly et al. 1999; Özyürek et al. 2007). They also appear to be integral to speech production in that speakers deliberately distribute information differently across modalities depending on whether interlocutors can see their gestures or not (e.g., Bavelas et al. 2002; Holler and Beattie 2003; Melinger and Levelt 2004). The link between speech and gesture is also seen in their parallel development in childhood (Goldin-Meadow 2003; Mayberry and Nicoladis 2000; Volterra et al. 2005), and breakdown in disfluency (Seyfeddinipur 2006) and stuttering (Mayberry and Jaques 2000). Several recent theories attempt to formalize the relationship between gesture, language, and speech production (e.g., Alibali, Kita, and Young 2000; De Ruiter 2000; Kendon 2004; Kita and Özyürek 2003; Krauss, Chen, and Gottesman 2000; McNeill 1992, 2005). Although the details of the relationship are not yet fully understood and the theories differ in their views on the mechanics, the relationship itself remains undisputed.

Of particular relevance here is the steadily growing evidence that gestures reflect linguistic choices. It has been observed at various levels of granularity that gestures and speech often are temporally and semantically coordinated such that they express closely related meaning at the same time. More specifically, gestures reflect information selected for expression (what is relevant and newsworthy) and also the way in which it is then (lexically) encoded in speech. At the level of information selection, a number of studies have shown that gestures tend to co-occur with elements in speech that represent new or focused information (e.g., Levy and McNeill 1992; McNeill 2000b; McNeill and Levy 1982; McNeill, Levy, and Cassell 1993). Crosslinguistic differences in discourse organization and its implementation in linear speech are therefore mirrored in language-specific gesture patterns (e.g., Duncan 1996; McNeill and Duncan 2000). For instance, in narratives, Dutch, Swedish, and French speakers treat discourse referents' actions as newsworthy and structure discourse such that actions become focused information. Japanese speakers instead treat discourse referents' locations and the setting as newsworthy and structure discourse accordingly. These different linguistic foci result in different narrative gesture patterns, with Dutch, French, and Swedish speakers generally gesturing more about actions, and Japanese speakers gesturing more about entities that are part of the setting (Gullberg 2003, 2006; Yoshioka and Kellerman 2006). Turning to how information is expressed, gestures also reflect the way in which specific information is encoded in speech (e.g., Duncan 1996; Kita and
Özyürek 2003; McNeill 1992, 2000a; Müller 1994). For instance, in the domain of voluntary motion, Kita and Özyürek have shown that gestures accompanying motion expressions in English look different from the corresponding Turkish and Japanese gestures (Kita and Özyürek 2003; Özyürek et al. 2005). English speakers tend to express manner and path of motion in one single verbal clause (e.g. he rolls down) and in one single gesture that encodes both manner (roll) and direction of the motion (down) in one movement. In contrast, Turkish and Japanese speakers typically express the manner and the direction of motion in two separate verbal clauses (e.g. the ball descends rolling) and typically in two gestures, one expressing the direction (descends) and the other the manner of motion (rolling) independently. Kita and Özyürek have argued that the separate gesture patterns reflect the linguistic encoding patterns in these languages, a claim strengthened by the observation of within-language variation depending on what structures speakers actually use (Özyürek et al. 2005).

A further observation is that gestures are particularly well suited to express spatial and/or imagistic information such as size, shape, directionality, etc., because of their imagistic and synthetic nature (e.g., Beattie and Shovelton 1999a, 1999b, 2002). Such spatial information is in fact more likely to be encoded in gesture than in speech. Gestures therefore often express additional or complementary spatial information to speech (see Beattie and Shovelton 1999a; De Ruiter 2007; Kendon 2004). In this sense, gestures provide a fuller picture of what spatial information is taken into account for speaking than can be gleaned from speech alone.

To summarize, then, gestures reflect linguistic choices both at the level of information structure and at the level of linguistic, structural choices. Moreover, complementary spatial information that is not easily expressible in speech may be revealed in gestures. Insofar as speakers of different languages target different sorts of (spatial) information for expression, i.e., display language-specific event construal or linguistic conceptualizations, this difference may be reflected in gestures either in terms of where gestures fall (what is newsworthy) or in terms of how gestures look (what meaning elements are taken into account). If meaning elements are relevant to the event construal but not readily expressed in speech, they may nevertheless be visible in gesture as additional spatial information. Note again that because of their link to language and linguistic choices, gestures are well suited to explore linguistic event representations, but not linguistic relativity proper, i.e. effects of language on general cognition.
3 The test domain: Placement events
Placement events are mundane, frequent events where somebody moves an object to a location, as in putting a coffee mug on a desk. These events can be described more technically as caused motion events involving an action where
an agent causes an object (the figure object) to move to an end location (a goal ground) to which it will relate in a resulting spatial relationship. Typically, the agent maintains manual control over the figure object until it reaches its end location (Bowerman et al. 2002; Slobin et al. in this volume; Talmy 1985).

Placement events are an interesting test domain for language-specific event construal because they are popular candidates for a cognitively basic and universal event category with language-neutral representation. A number of properties speak for such a position. First, placement events have a firm experiential basis involving sensori-motor patterns of manual grasping of objects. Partly based on this observation, it has been suggested that causing something to move somewhere is a basic category also for language, often encoded in semantically general or 'light' verbs such as put (e.g., Goldberg 1995; Pinker 1989). It has further been proposed that children come preordained with a basic placement category at the outset of the language learning task, and then simply map a linguistic label from the input language onto it (e.g., Gleitman 1990; Piaget and Inhelder 1956; Pinker 1989). The acquisition of placement terms should therefore be early and effortless. The view of placement as basic and universal is further bolstered by recent evidence suggesting a common neurological basis for action, perception, and language. Seeing someone grasping an object or hearing words like pick activates the same areas of motor cortex as when the action is actually performed (e.g., Pulvermüller 2005).

There are, however, also reasons to question the basic, universal, and language-neutral representations of placement and instead expect language-specificity. First, the basicness of placement is called into question by the observation that placement verbs do not seem to be acquired as early, easily, and uniformly as previously thought. In fact, even the acquisition of the meaning of general placement verbs like English put is protracted and involves stages where children as old as three years of age use targetlike forms in ways that differ from adult usage (Bowerman 1978, 1982). Second, the universality and language-neutrality of placement seems questionable in view of the considerable crosslinguistic variation in the domain. Languages vary greatly in how placement events are encoded linguistically and in how the information about the figure object, the ground, and the relationship between the two, is distributed across the clause (see Sinha and Kuteva 1995). Depending on the language, the spatial information can be encoded in the adpositional or adverbial phrases describing the goal ground (on the table), in local case markings, in spatial nominals (e.g., top, side), and in locative verbs (Levinson and Meira 2003; Levinson and Wilkins 2006a). The semantic specificity of placement verbs further varies extensively across languages (Ameka and Levinson 2007; Bowerman et al. 2002; Kopecka and Narasimhan forthcoming; Newman 2002a; Slobin et al. in this volume). Light verbs (Jespersen 1965) or 'general-purpose' verbs like put cover a range of
events and tend to encode cause and change of location. Other verbs have more specific semantics and constrained extensions to specific events. For instance, locative posture verbs (e.g. set, stand, lay) encode cause and change of location, but also take properties of the figure object and its orientation and disposition with respect to the ground into account. They must each be applied to specific events. Such systems are typical in Germanic languages (e.g., Berthele 2004; David 2003; Gullberg and Burenhult forthcoming; Hansson and Bruce 2002; Lemmens 2002a; Newman and Rice 2004; Pauwels 2000; Serra Borneto 1996; Van Oosten 1984), but are also found in many other languages (see various contributions in Ameka and Levinson 2007; Kopecka and Narasimhan forthcoming; Newman 2002b). In other languages verbs encode distinctions like path, such as meter 'insert' in Spanish (see Slobin et al. in this volume), or a combination of path and final end state, such as the difference between final containment vs. support from below, as in kkita in Korean, meaning 'to place by inserting tightly between two pinching surfaces' (Bowerman et al. 2002; Bowerman and Choi 2001; Choi et al. 1999). In some languages verbs are so specific as to be classificatory or dispositional, as with tz'apal in Tzeltal, meaning 'standing of stick-shaped object vertically erect with base buried in support' (Brown 2006).

Some languages display a degree of optionality such that both a general and a semantically specific system exist in parallel. English, for instance, has both a general placement verb, put, and what Levin (1993) calls "verbs of putting in a spatial configuration," exemplified by set, stand and lay. While both groups of verbs are available, the most frequently or habitually used expression is often the general verb, whereas the specific system is reserved for making contrastive or otherwise pragmatically motivated distinctions. For instance, Tamil speakers will use a general verb, veyyii 'put,' the first time they see a given object being placed (Narasimhan and Gullberg 2006). If they encounter the same object again, they will typically use a more specific verb, nikka veyyii 'make stand' or paDka veyyii 'make lie,' to highlight the contrast to the first mention.
3.1 Placement in French and Dutch
French and Dutch differ with respect to the division of labor between prepositions, existential verbs, locating verbs, and posture verbs (e.g., Chenu and Jisa 2006; Hickmann 2007; Hickmann and Hendriks 2006; Lemmens 2006; Van Oosten 1984). French encodes locative information mainly in prepositional or adverbial adjuncts. Most caused motion or placement events can be described in French using the general placement verb mettre, 'put,' followed by a prepositional phrase (1):
(1) elle met le bol sur la table
'she puts the bowl on the table'
The general placement verb mainly encodes cause and change of location. A range of more specific verbs also exist which focus chiefly on the manner of attachment, conflating action and relation or path such as accrocher 'hang, hook,' insérer 'insert,' or even the nature of the figure object as in verser 'pour into' (e.g., Chenu and Jisa 2006; Hickmann 2007). In addition, there is a trade-off in specificity such that the general verb is followed by greater specificity in the expression of ground carried by specific prepositions, whereas the specific verbs are followed by more general prepositions (Chenu and Jisa 2006; Hickmann 2007).

Dutch encodes locative information across posture verbs and prepositions. To describe placement events, one of a set of caused posture verbs is typically used which encode information about figures and the end configuration on the ground as well as cause and change of location towards that ground (see Slobin et al. in this volume). Crucially, for any given event, a posture verb is chosen, and the choice depends on the properties of the object being located: its shape, its orientation, and its disposition with respect to the ground. Specifically, the semantic distinctions concern the presence of a functional base and whether the figure object is resting on it, and whether the spatial extension or projected axis of the object is vertical or horizontal (Lemmens 2002a, 2006; van Staden, Bowerman, and Verhelst 2006). For objects resting on their base, zetten 'set' is typically used. For objects lacking a functional base and/or extending horizontally, leggen 'lay' is preferred. The posture verbs are followed by locative prepositional phrases as exemplified in (2):
(2) ze zet/legt het kommetje op de tafel
'she sets/lays the bowl on the table'
In addition to the posture verbs, Dutch also has a range of placement verbs that typically encode the manner of placing or attaching something, such as plakken 'stick, glue,' but also verbs that conflate manner of placing with final containment like stoppen 'put.into' (Lemmens 2006).

In summary, the placement verbs habitually used in French (the general verb mettre 'put') and Dutch (the caused posture verbs zetten 'set/stand' and leggen 'lay') select different types of spatial information for expression. It is therefore possible that speakers of Dutch and French consider different types of spatial information as they prepare to talk about placement and have language-specific event construals of placement.
4 This study
This study explores whether the semantics of Dutch and French placement verbs affect speakers’ linguistic conceptualization of – or thinking for speaking
about – placement, and whether these lead to language-specific event representations. Specifically, the study asks whether speakers of languages whose verb categories encode different spatial information actually attend to different types of information (language-specific event construal), or whether they attend to the same information despite differences in linguistic distinctions (languageneutral event construal). In contrast to existing studies of event representations, this study targets a concrete domain where a language-neutral, experiencebased representation with a focus on the same basic spatial information is likely. Overt differences in speech alone between Dutch and French will not necessarily address this issue. In Dutch, the lexical encoding is specific, and so the event representation can be assumed to be as well. In French, the lexical encoding is coarse-grained. However, coarse-grained, under-specified speech does not preclude an event representation that takes other information into account even if it is not realized in speech. Therefore, to probe event representations further, all vehicles of meaning will be considered, namely both speech and speech-associated gestures. Gestures may reveal additional aspects of what spatial information Dutch and French speakers take into account when talking about placement and reflect differences in event construal. Also in contrast to existing studies, this study will compare instances where the organization of information structure and syntactic structure is kept constant across languages, in order to explore the effect of verb semantics alone. Two possibilities can be posited. First, if linguistic conceptualization of placement is language-neutral and instead guided by general properties of the event itself, then French and Dutch speakers should attend to the same types of spatial information as they prepare to talk about it, even though the verb categories differ. Dutch and French speakers should gesture similarly as they describe the same placement events, perhaps representing placement as an enactment of the observed, practical action in both languages. Second, if the linguistic conceptualization of placement events is guided by verb semantics, then French and Dutch speakers should construe these simple events differently, and attend to different types of spatial information as they prepare to talk about them. Dutch and French speakers should then gesture differently as they describe the same placement events, keeping information and syntactic structure constant across languages. If Dutch and French speakers gesture differently while using similar syntactic structures, the following predictions can be made regarding the form and content of their gestures. Given that the semantics of the Dutch posture verbs zetten ‘set/stand’ and leggen ‘lay’ conflate caused motion with object properties, Dutch speakers should attend to the figure object in order for verb selection to proceed. Gestures accompanying Dutch placement expressions may reflect this object focus in hand shapes that reflect the physical properties of the object (object incorporation), as well as the movement of the physical action
encoded as gestural path.1 In contrast, the French placement verb mettre 'put' encodes only the caused movement towards the ground. French speakers should therefore typically attend to the movement towards the ground but, crucially, not to the figure object. Gestures accompanying French placement expressions may therefore express only the movement towards the goal encoded as gestural path, but not information about the figure objects.
4.1 Method and data
A referential communication task was used to elicit event descriptions. The task was a director-matcher game (Clark, Carpenter, and Just 1973; Clark and Wilkes-Gibbs 1986) where a speaker watches video clips of placement events on a laptop screen and then describes them to an interlocutor. The stimulus clips show an actor putting away objects found on the floor of a messy room. The stimulus comprises thirty-two placement events where thirty-two objects are placed on fourteen grounds. The thirty-two events are distributed over eight video clips with each clip showing four events. The first and last clips also include an introductory and a wrap-up sequence showing the actor entering and leaving the room, respectively.

The task requires a Describer to watch one clip at a time, i.e., the placement of four objects. When the screen goes blank, she must describe from memory to a Drawer what the agent in the video did to the objects. The Describer has a list of the object nouns as memory support so as not to omit any object (e.g. chair, tablecloth, bowl, bananas). The Drawer, in turn, has a picture of the empty room and has to draw in the objects in their final destinations. The elicitation set-up is illustrated in fig. 8.1 (see footnote 2). The task is interactive and self-paced. Oral and written instructions called for a description of "what happened" in order to focus the description on the placement activity. No mention was made of gesture. A post-test questionnaire ascertained that participants were unaware of the focus on gestures.
1 Hand shapes are a likely gestural reflection of object incorporation. The act of grasping or moving a real object involves a motor pattern whereby the hand encloses the object and so molds itself around the shape of the object. The shape of the hand therefore reflects the properties of the object. A gestural representation – that is, a symbolic movement without the real, physical object – that incorporates the imagined object with the placement movement might draw on this same pattern and thus represent the object in a hand shape. This is in potential contrast to expressions that target the object itself, for example existential expressions like there was a big bowl. Such expressions may yield other object-related gestural expressions, such as gestures that trace the shape of an object. However, such object-focused expressions are not under study here.
2 The low barrier between the interlocutors might have obscured some of the Describer's gestures from the Drawer's view (although not from the Analyst's view). Although this raises interesting questions about whether the Drawer made use of the gestural information (e.g., Beattie and Shovelton 1999a; Kelly et al. 1999), as well as whether the Describer intended some of the gestures to be seen by the Drawer (e.g., Gullberg 2006; Holler and Beattie 2003; Melinger and Levelt 2004), these issues are not relevant for the current study.
[Figure 8.1: The task set-up with the Describer on the left and the Drawer on the right]
Descriptions were elicited from twelve pairs of native speakers of French and twelve pairs of native speakers of Dutch. The role of Describer and Drawer was assigned randomly and kept throughout the task. Participants were paid for their participation and provided written consent to use the data. Only speech and gesture data from the Describer went into the analyses.
4.2 Data treatment and coding
Speech The placement descriptions were transcribed verbatim by native speakers of each language. The descriptions included the first complete and minimal description of the target events with mention of the figure object, the placement verb, and often also the ground location. Only the first spontaneous, simple renditions of the target scene, i.e., the placement event itself, exemplified in italics in (3)–(4), were analyzed. Elaborations on the precise locations, often prompted by the interlocutor's questions, were not included.

(3)
ze pakt de prullenbak die zet ze rechtop naast het bureau
'she takes the wastepaper basket she sets it straight next to the desk'
(4)
elle prend la poubelle elle la met à droite du bureau
'she takes the wastepaper basket she puts it to the right of the desk'
All placement verbs were extracted from these first renditions. Excluded were cases of self-motion plus placement (e.g., 'bring'), cases of giving something to an animate recipient, answers to where-questions, and self-repetitions (see Slobin et al. in this volume). The interrater reliability for the placement verb coding was Cohen's kappa .95.
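The chapter reports kappa coefficients without spelling out the statistic; as a reminder, the standard definition corrects raw agreement between two coders for chance agreement:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of coding decisions on which the two raters agree and $p_e$ is the agreement expected by chance given each rater's marginal category frequencies. A value of $\kappa = .95$ thus indicates near-perfect agreement on the placement-verb coding.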
Gesture The digital video recordings were coded frame-by-frame using video annotation software (Mediatagger 3.1, see Brugman and Kita 1995). All representational gestures occurring within the first spontaneous renditions of the target event in speech (i.e. the italicized parts in (3) and (4) above) were identified (N = 238). The analysis targeted gestural strokes, i.e. the most effortful and meaningful parts of manual movements (Kendon 1980; Kita, Van Gijn, and van der Hulst 1998; McNeill 2005: 32). Excluded from analysis were gestures occurring with lengthy descriptions of the figure objects themselves when introduced; substitutive gestures, i.e. gestures replacing speech in expressions such as "She did like this"; and clearly compensatory gestures in cases of word-finding problems (cf. Gullberg 1998).

The remaining gestures were coded without sound for form as either figure-incorporating or encoding simple path. Gestures displaying hand shapes reflecting the figure object were coded as figure-incorporating. Gestures expressing a "spatial excursion" (see Kendon 2004) with either a pointing hand shape or no particular hand shape, i.e. a relaxed, floppy hand, were coded as simple path. The two coding categories are mutually exclusive for each gesture. The interrater reliability for the gesture identification and form coding was Cohen's kappa .89 and .94, respectively. Finally, the speech precisely co-occurring with the gesture strokes was noted and coded for word class.
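To make concrete how group percentages of the kind reported in section 5.2 can be derived from codings like these, the sketch below computes each Describer's share of figure-incorporating gestures and averages within a language group. It is a minimal illustration, not the authors' actual pipeline: the speaker IDs, the category labels, the mean_share helper, and the data are all invented, and averaging over speakers is only one plausible reading of the chapter's "on average."

# Illustrative sketch with invented data; Python 3.
from collections import defaultdict

# (speaker_id, form_code) pairs; form_code is one of the two
# mutually exclusive coding categories described above.
coded_gestures = [
    ("NL01", "figure_incorporating"), ("NL01", "simple_path"),
    ("NL02", "figure_incorporating"), ("NL02", "figure_incorporating"),
    ("FR01", "simple_path"), ("FR01", "simple_path"),
    ("FR02", "figure_incorporating"), ("FR02", "simple_path"),
]

def mean_share(gestures, group, category="figure_incorporating"):
    # For each speaker in the group, count category hits and total gestures,
    # then average the per-speaker proportions.
    counts = defaultdict(lambda: [0, 0])  # speaker -> [hits, total]
    for speaker, form in gestures:
        if speaker.startswith(group):
            counts[speaker][1] += 1
            if form == category:
                counts[speaker][0] += 1
    shares = [hits / total for hits, total in counts.values()]
    return sum(shares) / len(shares)

print(f"Dutch mean share: {mean_share(coded_gestures, 'NL'):.0%}")
print(f"French mean share: {mean_share(coded_gestures, 'FR'):.0%}")

Because the two categories are mutually exclusive and exhaustive, the simple-path share for each group is just the complement of the figure-incorporating share.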
5 Results

5.1 Crosslinguistic placement in speech
The speech data overall confirm the language-specific patterns proposed for Dutch and French. First, in terms of information structure, both Dutch and French show a clear preference for introducing the figure object in a separate clause (underlined in examples (3) and (4), repeated here for convenience), followed by a target event clause expressing the placement (in italics):

(3)
ze pakt de prullenbak die zet ze rechtop naast het bureau
'she takes the wastepaper basket she sets it straight next to the desk'
(4)
elle prend la poubelle elle la met à droite du bureau
'she takes the wastepaper basket she puts it to the right of the desk'
Further, the placement verbs project similar structure in both languages. The elements in the target event clause in Dutch (3) are ordered figure object – verb – agent – (adverb) – location. In French (4) the order is agent – figure object – verb – location. These orders accounted for 67% of all Dutch and 62% of all French descriptions. The next most frequent orders in Dutch have either fronted location (location – verb – agent – object, 9%), or verb (verb – agent – object – location, 7%). In French, the next most frequent orders are
agent – verb – object – location (17%) and location – agent – verb – object (7%). Second, concerning the verbs used, the Dutch descriptions include twentyone verb types (thirty-five if particle constructions like neerzetten ‘set down’ and opzetten ‘set on’ are considered separately). Three verb types constitute 66% of the tokens. Zetten ‘set/stand’ is used to describe scenes where figure objects are located in a vertical position on a base, and leggen ‘lay’ is used about scenes where objects are placed in a horizontal position. Hangen ‘hang’ is applied to scenes where objects are suspended over or around something or are attached to a vertical surface. Three other verbs are fairly frequent, namely stoppen ‘put.in, insert,’ plakken ‘stick, glue,’ and a light verb, (in)doen ‘do’ or ‘make(into),’ which together account for a further 24% of the tokens. The French descriptions include thirty-three verb types. One verb, mettre ‘put,’ accounts for 51% of all tokens in the set. Moreover, it is used to describe all scenes. For scenes where objects are located in a vertical position, poser ‘put, set down’ is sometimes also used, accounting for 18% of the tokens. Four other verbs are moderately frequent: accrocher ‘hang, hook,’ coller ‘stick, glue,’ ranger ‘put away, in order,’ and replacer ‘re-place, put back’ (17% of the tokens). The particular verb choices in this data set are obviously dependent on the elicitation tool. Nevertheless, two clear patterns appear for each language. Overall, in Dutch, three verbs dominate the descriptions and constitute the most frequent verbs: zetten ‘set/stand,’ leggen ‘lay,’ hangen ‘hang.’ These verbs are all posture verbs that are specific with regard to object properties and spatial relationships between figure and ground. They are each applied to different scenes. In French, in contrast, one single verb dominates: mettre ‘put.’ It is a general placement verb that mainly encodes movement. It can be used to describe all scenes. More specific verbs are much less frequent and are only found for scenes that involve suspension or sticky attachment. 5.2
5.2 Crosslinguistic placement in gesture
The analysis of all gestures occurring with the first spontaneous renditions of the target placement events reveals two language-specific gestural patterns for Dutch and French at the level of gestural form as illustrated in figs. 8.3 and 8.4. The Dutch speaker in fig. 8.3 talks about the stimulus scene shown in fig. 8.2, the placing of a bowl of bananas on the desk. She uses one of the two posture placement verbs, zetten ‘set/stand.’ She accompanies the description with a bi-manual gesture displaying cupped hand shapes, as if holding the bowl, moving to her right. The French speaker in fig. 8.4 describes the same scene using the general placement verb mettre ‘put.’ Her gesture is strikingly different. She accompanies the description with an open-handed gesture moving sagittally outwards. This gesture only indicates the path outwards towards
the ground and reflects no object properties at all.

Figure 8.2 Stimulus: placement of the bowl

This difference in what information is encoded in gesture constitutes a robust preferential pattern across the languages. Dutch speakers are significantly more likely to incorporate figure object information in their gestures than are French speakers (Dutch on average 59% vs. French 35%; p ≤ 0.05). Conversely, French speakers are significantly more likely to encode only simple-path information in gestures than are Dutch speakers (Dutch on average 41% vs. French 65%; p ≤ 0.05). Crucially, the difference in gestural form is not related to a difference in the timing of gestures relative to speech. In both language groups, gestures occurred most frequently with verb phrases (on average 49% in Dutch and 55% in French), followed by prepositional phrases (on average 30% in both Dutch and French). In other words, the difference in form does not reflect a difference in what aspects of the event are targeted as being newsworthy or focused in the two languages. In particular, the French gestural focus on path is not a reflection of gestures aligning with prepositional phrases expressing ground location as opposed to Dutch gestures aligning with verbs. Instead, both languages treat the action as the most newsworthy element and align gestures with verbs. A further important observation is that neither the Dutch nor the French speakers imitate the action seen in the stimulus video. The stimulus scene in fig. 8.2 shows the actor placing a bowl of bananas on the desk using a single
left hand to put the bowl on the desk, gripping it around the rim.

Figure 8.3 Placement of bowl in Dutch with a posture placement verb, zetten, and a bi-manual gesture encoding object information in the hand shape

Both the Dutch speaker in fig. 8.3 and the French speaker in fig. 8.4 gesture something different. The Dutch speaker uses a bi-manual gesture molding the entire bowl as she moves it instead of gripping it (see Müller 1998). The French speaker moves her open right hand sagittally outwards, not gripping or molding anything. Neither gesture resembles the action in the video, which is particularly obvious in the case of the French gesture. There is therefore no simple and direct one-to-one mapping between the percept and the gesture that accompanies the description of the percept, nor between the practical action that is described and the gesture that accompanies the description of the action. Put differently, speakers do not necessarily gesture what they see or what they would do if they performed the physical action, but rather what they say (see De Ruiter 2007). The crosslinguistic differences in gestural form thus reflect a difference in what spatial information Dutch and French speakers select for gestural expression, and, by extension, what spatial information they consider to be relevant for speaking about placement events. The gesture data therefore seem to support the notion that there is a crosslinguistic difference in placement event
construal in French and Dutch with two different foci: one (French) focus on the movement towards the goal ground encoded in gesture as simple path; another (Dutch) focus on the movement of a particular object towards the goal ground, encoded in gesture as object-incorporation with the path.

Figure 8.4 Placement of bowl in French with a general placement verb, mettre, and a gesture encoding simple-path, no object information
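A methodological aside on the group comparisons reported above (e.g., 59% figure-incorporating gestures in Dutch vs. 35% in French, p ≤ 0.05): the chapter does not state which statistical test was used, so the following sketch is purely illustrative. It aggregates to one proportion per speaker, which avoids treating gestures from the same speaker as independent observations, and compares the groups with a Mann–Whitney U test on invented values.

# Hypothetical per-speaker proportions of figure-incorporating gestures;
# the values are invented around the reported group means (59% vs. 35%).
from scipy.stats import mannwhitneyu

dutch = [0.70, 0.55, 0.62, 0.48, 0.59]
french = [0.30, 0.41, 0.35, 0.28, 0.40]

u, p = mannwhitneyu(dutch, french, alternative='two-sided')
print(f'U = {u}, p = {p:.3f}')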
5.3 Alternative explanations? Gesture chains and lingering hand shapes
As seen above, the syntactic structures onto which the semantic elements map are similar in both languages. Moreover, both languages organize information in similar ways, with the figure object typically introduced as a direct object in a separate clause, followed by another clause in which the object is reduced to a pronoun and the action of moving and the ground constitute the new, focused information. Given that gestures often occur with new information in a clause, gestures may accompany the introduction of the figure object in one clause, and be followed by another gesture for the action in the subsequent clause. If the gesture concomitant with the introduction of the object reflects
some aspect of that object, the subsequent gesture accompanying the placement action could display lingering hand shapes, since gestures influence each other as they occur in gesture chains. This might account for a tendency towards object-incorporating gestures. In both languages, half of the figure objects were accompanied by gestures when first introduced. In Dutch, the gesture on the figure was followed by a gesture on the verb that encoded figure information. However, in French, gestures on the figure were followed by gestures on the verb encoding path information. In other words, although it cannot be ruled out that the figure focus on verbs in Dutch is an effect of a lingering hand shape from the preceding gesture with the introduction of the object, no such pattern can be detected in French. Gestures with object information do not carry over to the verb expressing the actual placement in French, since gestures aligned with verbs overwhelmingly encode only path. There is thus a shift in focus between the two utterances in French, again supporting the notion that the construal of placement events in French does not include figure objects.
5.4 The effect of individual verbs vs. habitual usage
If the semantics of the individual placement verb guides linguistic attention to figure objects as they are deployed to describe a particular scene, French and Dutch speakers might be expected to gesture in different ways depending on verb use. For instance, when French speakers apply one of the more specific placement verbs in French, such as coller ‘stick, glue,’ or accrocher ‘hang, hook,’ they might attend to, and gesture about, objects to the same extent as Dutch speakers. Conversely, on those rare occasions where Dutch speakers use the general term doen ‘make, do,’ they might show no interest in objects, but rather focus on, and gesture about, movement or ground information. A closer inspection of the gesture data shows this not to be the case, however. The Dutch speaker in fig. 8.5 uses the most general Dutch verb, doen ‘do, make,’ to describe putting bananas into a bowl. Although this verb encodes only causation, the accompanying gesture reflects a focus on the figure object, the bananas, visible in the tight grip displayed by the right hand as it moves down. Similarly, the Dutch speaker in fig. 8.6 uses the verb duwen ‘push’ to describe a scene where a woman sticks chewing gum under the desk. This placement verb encodes the manner of moving but does not seem to necessarily require any object information. Nevertheless, the gesture accompanying the expression displays a hand shape incorporating the chewing gum. In other words, Dutch speakers display a focus on the object in gesture regardless of the semantic specifics of the verb chosen. Conversely, the French speaker in fig. 8.7 uses one of the specific placement verbs in French, coller ‘stick,’ to describe the chewing gum scene. His gesture shows no trace of the figure object. It displays
only a flat hand moving upwards under the table, reflecting a focus on the path towards the ground. Quantitatively, object-incorporating gestures are as likely to accompany the posture verbs in Dutch (74% aggregated across participants) as they are to accompany other verbs (72%). Conversely, in French the gestures encoding simple path are as likely to accompany mettre ‘put’ and poser ‘place’ (55%) as to accompany more specific verbs (52%). The gestural preferences and the language-specific gestural focus on figure objects vs. paths thus seem to spread to all other placement verbs in both languages, regardless of the semantic specificity of the individual lexical items.

Figure 8.5 Placement in Dutch with a general placement verb, doen ‘do, make,’ and a gesture encoding object information in the hand shape (right hand, grip around bananas)

Figure 8.6 Placement in Dutch with another specific placement verb, duwen ‘push,’ and a gesture encoding object information in the hand shape (grip around chewing gum)

Figure 8.7 Placement in French with a specific placement verb, coller ‘stick,’ and a gesture encoding simple-path, with a flat hand, no object information
6 Discussion
This study investigated whether the semantics of Dutch and French placement verbs affect speakers’ linguistic conceptualization or construal of placement events, and specifically, whether they select and attend to different types of spatial information as revealed by their speech-associated gestures. There are
four key findings. First, Dutch and French speakers talk differently about placement. Dutch speakers typically use one of three posture verbs, zetten ‘set/stand,’ leggen ‘lay,’ and hangen ‘hang.’ The verbs are specific with regard to object properties and spatial relationships between figure and ground, and so are used to describe different scenes. French speakers typically use one single general placement verb, mettre ‘put,’ encoding movement, which is used to describe all scenes. Second, Dutch and French speakers also produce two distinct preferential gesture patterns when talking about placement. Dutch speakers gesture about figure objects along with the movement, seen as object-incorporating hand shapes. French speakers gesture only about the path of the placement movement. Neither group imitates the perceived action. Third, the different gesture patterns are not due to differences in syntactic or information structure. The placement verbs in the two languages project similar syntactic structures and information is organized in the same way with objects being introduced in separate clauses followed by clauses describing the placement events. What differs are the verb semantics of the placement verbs. Information about the figure object, which is necessary to select the right verb in Dutch, is visibly present in Dutch gestures accompanying the placement descriptions. In contrast, this information is conspicuously absent from gestures co-occurring with the corresponding descriptions in French. Fourth, language-specific gesture patterns permeate the entire placement domain. Figure incorporation occurs in Dutch across all placement verbs, not only in cases where the actual choice of a posture verb hinges on figure information. Conversely, the focus on path in French gestures also occurs regardless of the actual verb used. The study set out to answer the question whether speakers of different languages whose verb categories differ also consider different types of information when talking about placement. Based on the gestural findings, the answer seems to be that they do. The gesture data indicate that Dutch and French speakers consider and select different sorts of spatial information when they describe the same placement events, as reflected in the two distinct and language-specific gesture patterns. Moreover, because syntactic structures and information organization are the same across both languages, the semantics of the placement verbs themselves are the likeliest source of differences in selection of spatial information: a Dutch focus on figure objects in conjunction with the path of movement (driven by posture verbs), and a French focus on simple paths towards grounds (driven by a general placement verb). The findings therefore provide support for the hypothesis that linguistic categories like verbs (and the meanings they encode) influence what information speakers consider and select for expression. That is to say, they support the notion of language-specific
representations, event construal or ‘thinking for speaking.’ Speakers of Dutch and French construe everyday placement events differently as they prepare to talk about them and have two different ways of ‘thinking for speaking’ about placement events. Notice that these findings provide little support for the notion that placement events are cognitively basic or universal with a language-neutral representation. However startling it may seem in view of their grounding in sensori-motor patterns and practical actions, the construal of placement events nevertheless seems to be influenced by habitual linguistic encoding. The different gesture patterns in Dutch and French indicate two different construals of placement which differ both from each other and – crucially – also from actions observed, suggesting no simple action-related or language-neutral representation. Moreover, the fact that the evidence for language-specificity comes from gesture – the same modality used for the practical placement actions – considerably strengthens the claim of language-specific representations and conceptualization of placement. What about the finding that the language-specific gesture patterns permeate the entire placement domain regardless of actual placement verb used? At first glance, this result seems to falsify the hypothesis that event representations are based on the semantics of the placement verbs. However, the ‘thinking for speaking’ hypothesis suggests that it is the most frequently or habitually used verbs that guide information selection, not necessarily the verb used for a specific description (Berman and Slobin 1994a; Carroll and von Stutterheim 2003; Pederson 1995; Slobin 1996b). In Dutch, the most frequently and habitually used placement verbs are the posture verbs zetten and leggen, which require object information for the appropriate selection to be made. In French, the most frequent and habitual verb is the general placement verb mettre, which only cares about the translocation to the goal ground, not about the object being moved. Therefore, the language-specific event representations reflect the different parts of the spatial information that is habitually attended to as a direct result of the semantics of the placement verbs that are most frequently used in the respective language. The customary attention to certain types of information for habitual encoding of placement events affects encoding of other types of placement as well, such that the construal of what might be called basic placement constructions3 for linguistic encoding is governed by a “default setting.” The figure focus constitutes the default in Dutch, whereas the path is the default in French. Notice that habitual event construals are not static but may change under the influence of specific pragmatic or expressive needs. For instance, a focus on figures can be induced in French if it becomes relevant for pragmatic reasons.
For example, if an object occurs a second time in the stimuli and in a different position, this promotes a contrastive focus on the figure. French speakers may mention the orientation of the object explicitly in speech (poser à plat ‘put flat,’ coucher ‘lay,’ mettre comme ça ‘put like that’), and sometimes produce concomitant gestures that incorporate the figure object. The crucial observation, however, is that French speakers do not typically do this, unless there is a reason to, whereas Dutch speakers regularly consider the object. Even at first encounter, they pick the posture verb that is appropriate to the orientation of the object. This verb choice is not governed by the contrast but is simply the default way of labeling the two scenes. In other words, the fine-grained focus on objects is habitual or a “default” in Dutch and driven by the semantics of the verbs, whereas it is optional and pragmatically driven in French (see Narasimhan and Gullberg 2006).

3 I owe this term to David Wilkins.

The suggested differences in linguistic event construal between French and Dutch can also be considered in terms of the familiar distinction between satellite- and verb-framed languages (Talmy 1985, 2000a,b, 2003a). Slobin (1996b) has shown how satellite-framed languages like Dutch focus on manner of movement and multiple components of path (source, goal, and medium of path). Verb-framed languages like French instead focus on states and settings, and generally mention only one component of the path per clause. The patterns for placement have been re-formulated as a Dutch focus on the manner of being located (posture-focus), and a French focus on being in a location or state (location-focus) (Berthele 2004, 2006; Hickmann 2007; Lemmens 2002a, 2002b). The important contribution of gesture analysis and of this study is to allow such characterizations to become more precise, especially regarding what aspects of an event really are or are not part of the event construal. The Dutch satellite-framed focus on manner of movement and posture includes a focus on figure objects in the domain of placement. The figure object focus could be surmised on the basis of the verb semantics, but its actual presence is only confirmed by the gesture analysis. Conversely, the French verb-framed focus on settings and the state of being located could easily have included information about the figure object even if not overtly realized in the spoken verb, but again, the gesture analysis shows that it does not. Gesture generally exhibits the same level of specificity as the verb selection. That is to say, gestures allow us to see that French speakers in fact do not consider object information but target exactly the information suggested by the verb.
7 Final remarks
This study shows that representations of placement events are not language-neutral but appear to reflect distinctions in verb semantics – at least as people prepare to talk about them. These findings contribute to a growing body of
evidence for language-specific event representations, even in very concrete domains such as placement. Moreover, the results undermine the notion that placement events are a universal, basic event category not affected by language. Despite the experiential basis of these events, their representations are mediated by language. More generally, the study illustrates that gestures can reveal what spatial information speakers consider for expression beyond what is detectable in speech alone. Bridging the divide between linguistic and spatio-visual representations, gestures can provide more specific details about the information contained in event representations crosslinguistically. They therefore allow more detailed characterizations to be made of crosslinguistic differences in event construal in on-line speech production. The findings have important implications for language acquisition (see Slobin et al. in this volume). With regard to first language acquisition, although children may come equipped with innate concepts such as ‘cause to move,’ data of this type highlight that children must nevertheless attune these concepts to the specifics of the input language. That such attunement is a complex process involving much adjustment is indicated both by the development of placement verb usage (Narasimhan and Gullberg 2010) and of concomitant gesture patterns in later childhood (Gullberg and Narasimhan 2010). In the case of adult second language acquisition, adults with fully developed event representations in one language face the difficulty of adjusting such representations as they move to another language where different distinctions may be drawn, if they are to be native-like or ‘idiomatic.’ There is ample evidence that such adjustments are difficult, slow and gradual (e.g. Carroll et al. 2000; Kellerman 1995; Odlin 2005). Even very advanced second language learners, whose speech is formally accurate, continue to express the types of information they typically consider as relevant in their first language event construals, rather than shifting their interest to the information native speakers of the target language select. This gives them what has been labeled a ‘discourse accent.’ In the context of placement, Dutch learners of French and French learners of Dutch would have to consider different types of spatial information if they wanted to construe placement events in an idiomatic fashion relative to native speakers of each language. Dutch learners of French, for instance, would have to demote their interest in the object as being irrelevant to French placement (Gullberg ms.). More generally, the notion that event representations differ crosslinguistically and that categories like placement are not universal and language-neutral raises a slew of problems for models of language processing and of the (multilingual) mental lexicon. In such frameworks it is a standard assumption that there are language-neutral conceptual representations onto which new or multiple labels can simply be mapped (for overviews, see e.g., Green 1998; Gullberg 2009; La Heij 2005; Pavlenko 1999). The existence of
language-specific event representations and findings like the ones presented here put such models under considerable pressure to account for the intricacies of crosslinguistic differences in semantic-conceptual structures. New types of data and techniques are going to be necessary to explore, develop and model these issues further. Speech-associated gestures constitute one such data type that shows us a little more about event representations and about what information speakers consider as relevant as they set out to talk about the external world.
9 Visual encoding of coherent and non-coherent scenes∗
Christian Dobel, Reinhild Glanemann, Helene Kreysa, Pienie Zwitserlood, and Sonja Eisenbeiß
1 Introduction
Perceiving and talking about events taking place in the world around us is an essential part of our everyday life and crucial for social interaction with other human beings. Visual perception and language production are both involved in this complex cognitive behavior and have been investigated individually in numerous empirical studies. Extensive models have been provided for both domains (see Hoffmann 2000; Levelt 1989, for overviews). But an integrative approach to the interface between vision and speaking, to “seeing for speaking,” is still lacking. Psycholinguists have only recently begun to experimentally investigate how visual encoding and linguistic encoding interact when we describe events1 and their protagonists or participants (see Henderson and Ferreira 2004b). These studies have answered some, but raised many more general and specific questions:
• How does visual encoding of events evolve; how detailed are representations of the visual world generated at various points during visual encoding?
• How is visual encoding linked to stages of linguistic encoding for speaking?
• Is the visual encoding of an event influenced by the linguistic task that subjects have to perform in experiments (e.g., describing scenes with full sentences vs. naming individual scene actors, and so on)?
∗ Research was supported by two grants from the Deutsche Forschungsgemeinschaft to the first author. The first author is most grateful to Antje Meyer and Pim Levelt for their help and support in conducting the first experimental series. We thank Kristin Löcker, Nadine Kloth, Stefanie Enriquez-Geppert and Malte Viebahn for their help in data collection and analyses. We are also grateful to Jens Bölte, Heidi Gumnior, Annett Jorschick and Andrea Krupik for their suggestions and constant support. Also, we would like to thank Bettina Landgraf and Jürgen Bohnemeyer for their comments on earlier drafts.
1 Note that an event usually involves a change of state. We use the term ‘event’ here also to describe static line drawings or photographs that were made to depict an event. This is in line with earlier studies (see Griffin and Bock 2000). By ‘participants’ of events, we mean all (living and non-living) entities that are involved in the event. Thus, for a scene depicting the event of a clown giving a ballerina an apple, both protagonists, i.e. clown and ballerina, as well as the apple, are ‘participants’ in the event. To distinguish between participants of events and participants in experiments we call the latter ‘subjects,’ against common practice.
• Is visual encoding influenced by the type of stimulus – in particular, are there differences between line drawings and naturalistic stimuli?
• Does the encoding of (parts of) coherent scenes differ from the encoding of (parts of) scenes in which objects, animals or people do not interact in ways that could be straightforwardly interpreted as meaningful, coherent action?
In what follows, we present data from some of our own empirical studies that speak to such questions. To facilitate this, we first introduce the methods that we and others used to study visual and linguistic encoding during language production: eye-tracking experiments and brief presentation of visual displays. Next, we briefly present information on stages of visual and linguistic encoding and summarize data on interactions between visual and linguistic processing. All this serves as a background for a series of experiments in which we used different experimental tasks, stimuli and stimulus arrangements to investigate what kinds of visual representations speakers create, depending on how they describe (parts of) events. We used both line drawings and naturalistic pictures, presented either for ample time, or only very briefly. We asked our subjects either to fully describe the scenes, to merely name the action or one protagonist, or even just to indicate the location of one of the protagonists by a button push. Scenes were coherent in some experiments and non-coherent in others: the coherent scenes involved objects, animals or people that interacted in ways that could be straightforwardly interpreted as meaningful events; in contrast, the non-coherent scenes involved simple arrangements of unconnected entities (objects, animals, and people) that did not seem to interact with one another in any meaningful way. The data from these experiments, in combination with previous results, provide some preliminary answers to the questions above: brief presentations (<300 ms) of highly complex pictures (line drawings or photos of action scenes) allow speakers to activate speech-production-relevant information such as the role and identity of protagonists and whether or not protagonists engage in a meaningful interaction. The latter information can be recognized even if the presentation duration is as short as 100 ms, so extraction of event coherence seems to be quite an automatic process. Knowledge about the action depicted in a scene poses a special case, as it might necessitate overt attention shifts towards regions that allow apprehending the relevant action. For instance, in order to distinguish whether a person is scratching or stroking an animal one has to look at the person’s hands. This is especially so if actions do not allow a straightforward interpretation based on the distribution of protagonists in the spatial layout of the scene. In general, event complexity determined speech onset latencies, error rates and gaze patterns of upcoming utterances. Based on such detailed visual representations from the apprehension phase, the very first eye movements in our eye-tracking experiments are then quite task-driven and
allow a clear distinction between utterances in which the speaker is going to name, e.g., the agent or the action of a scene. In some situations the very efficient apprehension phase is accompanied by a preview phase, in which a part of the scene is fixated that is not named first in an utterance. We repeatedly found more and longer gazes on action regions when speakers were going to produce full sentences in comparison to tasks where speakers had to name the protagonists in list format. We assume that the (overt or covert) encoding of action-relevant information is the crucial step in apprehending and describing action events, because it is the action that distinguishes an event from a static state.
2 Experimental paradigms
In the domain of event perception, psychological research (for an overview, see Bruce, Green, and Georgeson 1996) focused mainly on topics like perception of motion (e.g., Aubert 1866; cited in Kaufman 1974), including biological motion (e.g., Johansson 1973). Other topics covered perception of causality (Michotte 1963), the attribution of dispositional states (Heider and Simmel 1944), the segmentation of events (Zacks and Swallow 2007), and the influence of perceptual cues on speech production (Tomlin 1997). In studies on the interplay of vision and language, eye movements are often monitored while subjects describe or apprehend pictures of objects or events. In addition, and often as a complement to this technique, we also employed the technique of brief presentation of complex events. We briefly motivate the use of these two experimental methods in the following section.
2.1 Eye tracking
Vision is an active process in which shifts of attention play a crucial role, and monitoring eye movements allows us to track attention shifts with high sensitivity in time and space (modern eye trackers like the ones used in our labs have a sampling rate of 500 Hz and an average gaze position error <0.5 degrees [Eyelink II, SR Research]). When we perceive the visual world – even when its complexity is mimicked by means of two-dimensional stimuli presented on computer screens – we move our eyes almost constantly. These high-speed movements (saccades cover as much as 500 degrees per second) are interrupted by fixations lasting typically between 200 and 300 ms (for an overview, see Rayner 1998). Saccades occur so frequently because at any fixation, only a small part of the visual world can be seen with high acuity. Acuity is very high in the foveal part of the eye (the central 2 degrees of vision) and diminishes rapidly in the parafovea (5 degrees to either side of fixation) and the periphery (beyond the parafovea). It is well-established that attention is shifted towards a specific location before the eyes move to this location
(Hoffman and Subramaniam 1995; Irwin and Gordon 1998). Thus, we can take eye movements as indices of an earlier shift of attention to the location to which the eyes just moved. Though the coupling of eye movements to attention shifts is obligatory (Deubel and Schneider 1996), attention may shift covertly while the eyes stay put (Posner 1980). Thus, eye movements tell us something about shifts of attention, but the reverse is not true. If the eyes are fixed on a specific location, it cannot necessarily be concluded that attention is also bound at this location.
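To make the fixation/saccade distinction concrete, here is a toy dispersion-threshold (I-DT) fixation parser for gaze samples of the kind a 500 Hz tracker delivers. It is a sketch only: the Eyelink systems mentioned above ship with their own event parsers, and the thresholds below are invented.

# Toy I-DT parser: a fixation is a stretch of samples of at least
# min_duration_ms whose spatial dispersion stays under max_dispersion.
def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100, rate_hz=500):
    """samples: list of (x, y) gaze positions recorded at rate_hz.
    Returns (start_ms, end_ms, (centroid_x, centroid_y)) tuples."""
    def dispersion(window):
        xs, ys = zip(*window)
        return (max(xs) - min(xs)) + (max(ys) - min(ys))

    min_len = int(min_duration_ms * rate_hz / 1000)  # samples per minimal fixation
    fixations, i = [], 0
    while i + min_len <= len(samples):
        j = i + min_len
        if dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window for as long as the gaze stays tightly clustered
            while j < len(samples) and dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            xs, ys = zip(*samples[i:j])
            fixations.append((1000 * i / rate_hz, 1000 * j / rate_hz,
                              (sum(xs) / len(xs), sum(ys) / len(ys))))
            i = j  # resume scanning after the fixation; the gap is a saccade
        else:
            i += 1
    return fixations

Everything the parser skips between two detected fixations is, on this crude scheme, saccadic movement; real parsers additionally use velocity and acceleration criteria.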
2.2 Brief presentation of visual displays
As we summarize below, eye movement studies have provided a wealth of information about visual and linguistic encoding and their interaction. However, moving the eyes takes at least 100 ms (Irwin 1992), and there is ample evidence that briefly presented naturalistic stimuli (as short as 20 ms) can be categorized successfully into living and non-living entities (e.g., Thorpe, Fize, and Marlot 1996; see also below). Moreover, there is no experimental evidence suggesting that fixating objects is a necessary step in order to name them (Griffin 2004); there is even some evidence speaking against it (Dobel et al. 2007; Morgan and Meyer 2005; see below). To investigate which and how much information can be taken up within one – parafoveal or peripheral – glance without eye movements (and, potentially, attention shifts) into the scene, we have adopted a method that has been used for decades in research on visual attention: the presentation of complex visual stimuli for brief durations (see Potter 1975). In sum, we can use eye-tracking experiments to investigate temporal links between steps of visual and steps of linguistic encoding, but this method cannot tell us how much information speakers can extract from a visual stimulus before they shift their attention and move their eyes. This information can, however, be gained from studies using brief presentation durations, which therefore prove a useful supplement to eye-tracking methodology.
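For concreteness, a brief-presentation trial can be sketched as follows. The chapter does not name its presentation software, so PsychoPy, the stimulus file name, and all timings here are assumptions; the essential ingredients are frame-locking the stimulus to the display refresh (six frames at a 60 Hz refresh give roughly 100 ms) and curtailing further visual uptake with a mask.

# Hypothetical brief-presentation trial (PsychoPy assumed for illustration)
import numpy as np
from psychopy import core, event, visual

N_FRAMES = 6  # ~100 ms at a 60 Hz refresh rate

win = visual.Window(size=(1024, 768), units='pix', color='grey')
scene = visual.ImageStim(win, image='scene_giving.png')  # invented file name
noise = np.random.uniform(-1, 1, (256, 256))
mask = visual.GratingStim(win, tex=noise, size=(800, 600))  # pattern mask
fixation = visual.TextStim(win, text='+')

fixation.draw()
win.flip()
core.wait(1.0)  # fixation cross for 1 s

for _ in range(N_FRAMES):  # frame-locked brief presentation of the scene
    scene.draw()
    win.flip()

mask.draw()  # backward mask replaces the scene
win.flip()
event.waitKeys()  # wait for the subject's response
win.close()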
3 Previous studies on visual and linguistic encoding
To present data from our own experiments in the necessary theoretical and empirical context, we now briefly review previous studies on visual and linguistic encoding and their interaction.
3.1 Visual encoding
The literature on visual object recognition is dominated by “integrating parts into wholes” approaches, which consider object recognition to result from a series of processing stages (see Goldstein 2002). A host of studies on visual
encoding has focused on distinguishing such processing stages. Typically, visual search tasks were employed, in which a target item (such as a T) has to be searched for in an array of distractors (e.g., Ls), varying in number (see Wolfe 1998, for an overview). A core finding is that search times are hardly affected by the number of distractors if target and distractors differ only in one basic feature, such as color (finding a red dot among green dots). In contrast, if a target is defined by a conjunction of two or more features that are also present in distractors (finding a red dot among blue dots and red squares), search times increase with the size of the distractor set. Such findings have resulted in a division of visual cognition into a preattentive and an attentive state (Treisman and Gelade 1980). The preattentive state is characterized by rapid and efficient processing of basic features (e.g., orientation, color, curvature), but not of (complex) objects. In contrast, attention is necessary to bind multiple features into objects in the attentive state. Wolfe (1998) described the preattentive “world” as a representation level inhabited by objects that can be searched for but whose identity is not known: “Preattentive processes divide the scene into ‘things’ and the preattentive basic features describe the ‘stuff’ out of which perceptual ‘things’ are made.” (Wolfe 1998: 43.) While the preattentive state has been intensively studied, much less is known about the nature of postattentive representations. The rather meagre literature claims that attended objects fall back into their preattentive state if attention is released again (Wolfe, Klempen, and Dahlen 2000; Oliva, Wolfe, and Arsenio 2004). While the results reported above, obtained with quite artificial stimuli, suggest that vision is a limited-capacity system with an emphasis on binding features to construct individual objects, recent studies have demonstrated that natural scenes seem to be processed differently. Such studies often manipulate stimuli with respect to their naturalness, complexity, and coherence. Thorpe and colleagues have shown that natural scenes can be categorized correctly into e.g., “living” vs. “non-living” at presentation durations as short as 20 ms (Delorme, Richard, and Fabre-Thorpe 2000; Fabre-Thorpe et al. 2001; Thorpe et al. 1996; Van Rullen and Thorpe 2001). This was even the case for large images (39 degrees high and 26 degrees wide) appearing in the far periphery (up to 75 degrees eccentricity; Thorpe et al. 2001). Such categorizations do not even require much attention, given that they can be performed correctly in the presence of a second, attention-demanding task (Li et al. 2002). These studies “upset the visual applecart” (Braun 2003), because they demonstrated the high effectiveness of the visual system in response to natural, naturalistic stimuli, which was not evident for “simpler,” artificial stimuli consisting only of a few lines and colors. However, these baffling recent demonstrations fit very well with other older findings. It has already been known for decades that the ‘gist’ of complex scenes can be apprehended very quickly, i.e. within 30–50 ms (Biederman
1972; Biederman et al. 1974; Hollingworth and Henderson 1998; Potter and Levi 1969; Henderson and Ferreira 2004b, for a review). In more precise terms, ‘gist’ is defined as “knowledge of the scene category (e.g., kitchen) and the semantic information that may be retrieved based on that category” (Henderson and Ferreira 2004a: 15). Interestingly, recent data show that consistency between foreground (e.g., a clergyman) and background (e.g., a church) of a scene matters. Davenport and Potter (2004) used naturalistic stimuli of persons and objects in front of a consistent or inconsistent background, presented for only 80 ms (e.g., a clergyman in church or in a boxing ring). Protagonists as well as backgrounds were processed more accurately when consistent. Thus, the authors concluded (p. 559) that “objects and their settings are processed interactively and not in isolation.” In sum, studies on visual encoding led to a distinction between preattentive and attentive states, with attention necessary for object (and scene) identification. Stimuli in these studies were quite artificial objects, such as a red dot or a blue dollar sign. There is ample evidence that the visual system is well-tuned to encode complex scenes, especially naturalistic scenes, and is very effective in retrieving knowledge about core aspects of such complex stimuli. The efficiency of the system increases with scene consistency.
3.2 Linguistic encoding
From the viewpoint of language production, naming objects or describing events requires several processing steps, from object recognition to the execution of articulatory motor programs. As a result of the efficacy of such processes, speakers produce more or less fluent speech. Models of language production largely agree on what these processes are (see Levelt 1989; Levelt, Roelofs, and Meyer 1999). Most models distinguish between three global domains of processing. The first level, the preverbal message level, is concerned with the “what” of the speaker’s message. Here, the intention of the speaker is molded into a message and appropriate lexical concepts are selected. When speakers describe what they see in the outside world, identification of relevant aspects, objects, protagonists, backgrounds, and so forth is a prerequisite for message generation and concept selection. The next level, also called ‘formulator’, takes pieces of message as input and translates concepts into language. The syntactic properties of the words that encode these concepts (lemmas) have to be retrieved and incorporated into sentence structure. At the final level, the phonological level, lexemes (word forms) of the lemmas are accessed and the phonological and morphological makeup of the utterance is spelled out. It is generally assumed that these levels operate in an incremental way: one component can start its work as soon as the preceding component has produced some output, even if the output is not complete (Bock and Levelt 1994).
Consequently, different levels of processing work on different parts of an utterance at the same time.
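The incrementality assumption can be illustrated with a deliberately crude sketch in which each level is a generator handing fragments downstream as soon as they are ready, so that all three levels are simultaneously at work on different parts of the utterance. Nothing here is a claim about the internals of any published model; the stage names and outputs are invented.

# Toy cascade: message level -> formulator -> articulation, run incrementally
def conceptualizer(scene):
    for concept in scene:  # message level emits lexical concepts one by one
        yield concept

def formulator(concepts):
    for i, concept in enumerate(concepts):  # lemma retrieval plus positioning
        yield f'{concept.lower()}[pos={i}]'

def articulator(word_forms):
    for form in word_forms:  # articulation starts before the message is complete
        print('uttering:', form)

articulator(formulator(conceptualizer(['DOG', 'CHASE', 'MAILMAN'])))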
3.3 The interplay between visual and linguistic encoding
Although most experiments on speaking use visual input in order to control the content of the speakers’ messages and utterances, the interplay between vision and speaking has not often been addressed explicitly. Also, the introduction of fast eye trackers had no immediate impact. Eye tracking became a widely used technique in cognitive science once eye movements were recognized as a tool to investigate shifts in attention. But a second and third generation of eye trackers without chin rests or bite boards, which inhibit normal mouth movement, was needed to study speaking. The recent advent of head-mounted eye trackers has made it possible to track eye movements during speech production. One of the first studies of the interaction between visual and linguistic encoding investigated naming of adjacent objects with labels that were either semantically or phonologically related (Meyer, Sleiderink, and Levelt 1998). It was found that objects were fixated and named from left to right, with attending/fixating preceding naming by a few hundred milliseconds. More amazingly, objects were fixated far longer than expected: objects to be named were fixated until their word form/phonological code was retrieved (Meyer and van der Meulen 2000; Griffin 2001). Only then did the eyes move to the next object. While this succession of fixating and naming was originally interpreted as evidence for strict seriality of processing, it became evident that this is not the case. A lot of processing is going on before fixation by means of peripheral vision. Before objects are fixated in order to be named, they are already processed all the way down to the selection of a word form. Morgan and Meyer (2005: 438) conclude that “a substantial part of the processing of an object that is about to be fixated and named may be done prior to fixation.” When objects have to be named in a fixed order, fixation and naming follow this sequence starting from stimulus onset. A different pattern emerges when speakers have a choice of word order. In a study by van der Meulen (2001), speakers had to describe four objects and their configuration. The four objects were organized in two rows, forming a 2 × 2 grid. The two lower objects were either identical or differed (a frog and a key each above a ball, or a frog above a wardrobe and a key above a ball). Consequently, speakers had to decide which conceptual and syntactic structure to use, for example “the frog and the key are above the ball” or “the frog is above the wardrobe and the key is over the ball.” In such cases, speakers fixated the lower two objects before they fixated the frog for subsequent naming. Such fixations, taking place before the standard fixation-followed-by-naming pattern, were labeled ‘preview phase,’ in
contrast to the formulation phase in which every object is fixated right before it is named. The idea is that preview phases are needed to determine which syntactic structure should be used. Griffin and Bock (2000) coined the notion of ‘apprehension phase’ for a different but related phenomenon. They reported the first eye-tracking study in which subjects had to describe complex scenes with active or passive sentences (e.g. “The dog chased the mailman” vs. “The mailman was chased by the dog”). The formulation phase, with its alignment of fixation and naming, was preceded by an apprehension phase in which “speakers rapidly extracted the event structure of the pictures” (p. 277). In contrast to the preview phase, the apprehension phase was not characterized by fixations to any specific object. The existence of an apprehension phase was confirmed in a more recent study with displays of analog and digital clocks presented for naming (Bock et al. 2003). Again, speakers apprehended the different displays very quickly, without moving the eyes. That a rapid apprehension phase did take place was evident from the fact that speakers were able to immediately direct their first fixation to whichever region corresponded to the first part of their subsequent utterance (looking at the minute region for saying “a quarter past three” vs. looking at the hour display for saying “three-fifteen”). In sum, studies on the interaction between visual and linguistic encoding led to a distinction between a rapid and highly efficient apprehension phase, which is under certain circumstances accompanied by a preview phase, and a formulation phase. The apprehension phase is best characterized by the quick parafoveal uptake of information about a scene that can be used to guide upcoming fixations. One or more subsequent gazes at scene regions before fixating the object that is named first in an utterance are subsumed under the preview phase. Finally, a sequence of fixating-and-naming steps takes the eyes to relevant visual information during the formulation phase. This seems a nice and coherent picture of what happens during “seeing for speaking,” but a caveat is called for. The studies summarized above all use different types of materials and often different tasks. This raises a number of questions: Is the apprehension phase always evident, independent of task and material type? Do particular stimuli and tasks induce a preview phase and others not? We believe that allocation of attention, eye movements, and phases of fixating and naming all depend on the input and the task. Below, we report under which conditions speakers fixate ‘action regions,’ i.e., regions that are informative with regard to the categorization/identification of the action taking place (for instance the region that includes the hands of a protagonist who is handling an object). We also investigated whether visual encoding differs for coherent and non-coherent scenes in which to-be-identified people, animals, and objects do not interact in meaningful ways. Moreover, we examined whether scene coherence is important only for tasks that involve the complete
linguistic encoding of an event – and not for simple tasks such as naming the objects in the scene in list format.
4 Experimental studies
In the following, we present an overview of our research and address some of the questions raised above. These experiments involve a variety of tasks and employ stimuli that differ with respect to their naturalness, complexity and event coherence. We use eye tracking and the brief-presentation method, depending on the questions at hand, and present experiments addressing similar questions in four separate groups. Table 9.1 presents an overview of all the experiments reported in the following sections.
4.1 Describing line drawings of complex scenes
In our first series of studies with line drawings of complex scenes, Dutch speakers had to describe two- and three-participant events (e.g., for three-participant events: a policeman handing a ticket to a driver, a waitress serving a cocktail to a customer, a child showing a drawing to a teacher, etc.; for two-participant events: a boy pulling a suitcase, a girl carrying a flag). Mostly, speakers used double-object or prepositional-dative sentences to describe three-participant events (Dobel ms.; Meyer and Dobel 2003). In line with earlier research, we observed a strong linkage of fixating and naming during the formulation phase. Before formulation, however, a preview phase became evident. With their first or second gaze, speakers fixated regions that were informative with regard to the action taking place. In the following, we will refer to these regions as ‘action regions.’ For example, in pictures displaying three-participant events in which agents give or show patients/themes to recipients, these were the agents’ hands (containing the transferred or shown object). We assume that gazes to these regions enable speakers to classify the event conceptually, assisting them in the linguistic encoding of the verb. We found such a gaze pattern for stimuli depicting two-participant events as well, though with shorter gaze durations on the action region before speech onset. Previewing was independent of the verb’s position in the sentence. Verb position was varied by a lead-in word with which sentences had to be started. This led to a mandatory position of the verb at the beginning, middle or end of the sentences. Compare, for example, the following Dutch utterances: Hier geeft de man bloemen aan de vrouw (literally ‘here gives the man flowers to the woman’) and Omdat de man bloemen aan de vrouw geeft (literally ‘because the man flowers to the woman gives’).
Table 9.1 Overview of methods for experiments reported in the chapter
Exp. no. and reference | Stimulus material | Task | Presentation and data recording method

1. Describing line drawings of complex scenes
1 (Meyer and Dobel 2003; Dobel ms.) | Line drawings of two- and three-participant events | Sentence production with lead-in verb vs. list production | Eye-tracking during speech; free viewing

2. Describing naturalistic representations of complex scenes
2a (Kreysa et al. ms.) | Naturalistic stimuli of one-, two-, and three-participant events (Example fig. 9.1) | Naming of agent vs. action | Eye-tracking before speech onset
2b (Kreysa et al. ms.) | Photo stimuli from Experiment 2a | Sentence production | Eye-tracking before speech onset

3. Describing briefly presented action scenes
3a (Dobel et al. 2007) | Semantically coherent and mirrored line drawings of two-participant scenes (giving and shooting; Example fig. 9.3) | Description of any perceived details | Brief presentation (100, 200, or 300 ms)
3b | Naturalistic stimuli from Experiment 2a | Description of any perceived details | Brief presentation (100, 200, or 300 ms)
3c (Glanemann 2008) | Naturalistic stimuli of two-participant events (Example fig. 9.4) | Button press indicating position of patient in the scene | Eye-tracking during free viewing
3d (Glanemann 2008) | Naturalistic stimuli from Experiment 3c | Button press indicating position of patient in the scene | Brief presentation (150 ms)
3e | Naturalistic stimuli from Experiment 3c | Naming the action | Brief presentation (150 ms) in the periphery
3f | Blurred version of photo stimuli from Experiment 3c | Naming all plausible actions | Free viewing

4. Apprehending and encoding of actions from non-coherent displays
4 | Coherent and non-coherent line drawings from Experiment 3a, with additional visual incoherence condition (lines separating participants) | List production of all participants | Eye-tracking during free viewing
To investigate whether this gaze behavior is task-dependent, we also compared scene description in the form of a sentence with list production, i.e., the bare naming of the involved participants (“man, flower, woman”), without using verbs, sentence structure, or inflectional markings. First, speakers devoted more early gazes to the action region when they were about to produce a sentence. We took these early fixations on the action region as evidence for models that assume that verbs are retrieved before sentences are initiated (Ferreira 2000). Second, and quite unexpectedly, we also observed a preview phase during list production, i.e. when no verb had to be retrieved. However, this preview phase had different properties. In contrast to sentence production, in which patients or recipients were not fixated at all before utterance onset, they were quite thoroughly fixated (with more and longer gazes) before speech onset in list production. This result was unforeseen, given that others found no preview phase in multiple object naming where speakers could name objects in any particular order (e.g., Meyer et al. 1998). We believe that this difference arose because our action events consisted of several participants involved in a meaningful, coherent interaction, instead of a simple arrangement of unconnected entities. At the present stage we do not know, however, if such preview phases appear in any coherent object arrangement or whether the animacy of the participants – and their perceived ability to act intentionally – plays a role. One clear conclusion from these studies is that visual scene encoding depends on the type of stimuli used: unconnected objects vs. several participants engaged in a meaningful action. In addition, there is an influence of stimulus complexity on the extent of previewing. We found the strongest expression of a preview phase for three-participant events. Such events are conceptually quite complex: they typically involve two animate participants and an object that is exchanged between them. Two-participant events are conceptually less complex, leading to a somewhat less prominent preview phase in sentence production. This corroborates findings by Griffin and Bock (2000), who also observed only “short initial gazes to object and action regions” (p. 278) of two-participant events. However, some form of preview seems unavoidable with this material: we do find a preview phase even in the list production task, though not to the action region. So, last but not least, the linguistic task matters: it makes a difference whether a complete sentence or a list of unconnected nouns is required. These results suggest that most initial gazes go to the action region if speakers are in need of a verb for the upcoming utterance.
4.2 Describing naturalistic representations of complex scenes
As was pointed out earlier, there is some evidence that photographic, i.e. naturalistic, images can be processed more efficiently than line drawings (see Delorme et al. 2000; Fabre-Thorpe et al. 2001; Li et al. 2002; Davenport and Potter 2004). So we were interested to see whether our results for line drawings
of complex action scenes would be replicable with more natural stimuli (Kreysa et al. ms.). Moreover, we wanted to compare scenes with different levels of complexity. We used events that involved one protagonist, typically encoded by a sentence with an intransitive verb, and events with two protagonists and/or an additional object, which would typically be described with transitive or ditransitive verbs. And finally, we expanded the set of actions to include some in which the manipulating body part of the actor and the manipulated object were spatially separable. This enabled us to investigate the role of an action region in verb encoding separately from the agent’s role. For these purposes, we filmed a number of actions that are usually described by means of intransitive, transitive, or ditransitive verbs in German, the language in which this study was conducted. Actions were performed by one or two actors in front of realistic and appropriate backgrounds (such as a park, a kitchen etc.). We used four actors who were easily distinguishable by gender, clothing and hairstyle. Still pictures that best represented the portrayed action were digitally copied from the movies (see fig. 9.1). In a pilot study, we pretested the recognizability and typical naming of these actions and collected ratings on ‘action regions.’ Ten uninitiated raters were asked to name the action depicted on a paper copy of each image and encircle the picture area where they felt this action was primarily taking place. These putative action regions, as well as the heads and bodies of the actors, served as regions of interest (ROI) for coding and analysis in subsequent eye-tracking studies, in which we investigated which parts of a scene were fixated before speech onset, depending on the type of utterance being produced. Only the twenty most consistently named actions were used for each level of transitivity and only the action regions with very high consistency across subjects. In Experiment 2a, we varied the speech task between two conditions: subjects had to name either the agent of the scene (agent naming) or the performed action (action naming). Fictive first names for the four actors were learned before the experiment, and – as in all experiments of this kind – subjects were provided with several examples and practice items. There were clear effects of event complexity. Speech-onset latencies were longest and error rate2 was highest for three-participant actions in both agent and action naming, and – not surprisingly – such effects were overall stronger for action naming than for agent naming. The viewing pattern demonstrated a significant interaction of task and ROI: in agent naming, speakers most often fixated on the agents’ heads prior to speech onset, while in action naming, when speakers were formulating a verb, action-related regions attracted most fixations.
[2] For agent naming, an error was coded if subjects used the wrong name or if they named the patient or recipient (5% of trials). For action naming, the use of a verb with a completely different meaning counted as an error, as did a verb which required a different sentence structure (17%).
[Figure 9.1 Examples of the naturalistic stimuli used in Experiments 2a, 2b and 3b, displaying events with one participant (a), two participants (b) and three participants (c). (Illustration (c) is an enlarged detail of the actual stimulus picture.)]
An interesting pattern emerged for those three-participant scenes in which manipulating body parts and manipulated objects could be separated (e.g., the agent’s hands and the ball in throwing a ball to someone, where the ball is in mid-air): in such cases, both regions were fixated prior to action naming. Hence, the action region seems to comprise both the manipulated object and the manipulating body part of the agent. Overall, recipients and patients of actions were hardly ever fixated, not even in action naming.

Figure 9.2 displays the proportion of the time between picture onset and speech onset (in percent) which the eyes spent on each ROI in action and agent naming for the three-participant events. The results are displayed separately for images where body parts and manipulated objects are close together (five ROIs) and where they are separated (six ROIs).
[Figure 9.2 Experiment 2a. Mean proportion of gaze time spent in different ROIs, depending on task (percent of time between picture onset and speech onset). Bar chart; y-axis: gaze time (%); x-axis: region of interest (object, action, agent body, agent head, recipient body, recipient head); conditions: naming agent vs. naming action, each for five-ROI and six-ROI images.]
A separate examination of the first two fixations after picture onset revealed the same pattern: for both tasks, agent and action naming, approximately 50% of first fixations landed on the task-relevant region. When the first fixation did not, about 70% of second fixations landed on that region. These results support conclusions drawn by Bock et al. (2003): patterns of initial eye movements in language production are largely linguistically or task-driven, and no region in itself attracts attention by visual salience.

We take our results as evidence that visual encoding of actions depends on event complexity and task. Several regions need to be focused upon to conceptualize complex scenes with three-participant events. This is not necessary for single-participant events (such as running or crawling), in which only one region (the active body parts) matters for event apprehension. Margetts and Austin (2007) report that three-participant events are linguistically expressed in manifold ways in the world’s languages, while constructions used for transitive two-participant events are much more similar (see also Narasimhan, Eisenbeiß, and Brown 2007a, b). Obviously, an increase in contributing regions allows for more diverse ways of event encoding, and this poses a higher processing load on the cognitive system. It thus seems that three-participant events are special due to their conceptual complexity, which is reflected in the number of attended regions. Consequently, there is less one-to-one mapping of visual and linguistic encoding, and many factors can influence which region receives most attention.
In Experiment 2b we asked a different group of speakers to describe the same naturalistic events with complete sentences while recording their eye movements (e.g., Katrin zeigt Tanja die Fotos ‘Katrin is showing Tanya the photos’; in German, bare-dative indirect objects are generally preferred over prepositional datives). We could not corroborate the results obtained with line drawings in the first series of experiments, which, as discussed above, revealed a preview fixation phase on action regions when events had to be described with complete sentences. One explanation for this initial preview of the verb-related region before any fixations related to order of mention in the description was that subjects might be encoding the verb and its argument structure prior to commencing the formulation of their sentence. In this experiment with naturalistic material, however, subjects did not show any indication of previewing the action region. Instead, their first fixations went straight to whichever region would be mentioned first in the sentence they were producing, i.e., the agent (all sentences were active). We believe that naturalistic stimuli allow for a better use of peripheral vision to encode the action, thereby reducing the need for preview fixations. We will discuss some such issues of action encoding below.

At the same time, the results of our eye-tracking studies with naturalistic stimuli confirm the notion of an initial peripheral apprehension phase, which serves as a guideline to where the eyes should move and helps establish a suitable starting point for the upcoming utterance. This is suggested by the fact that the very first fixation into the scene tends to go directly to the region most relevant to subjects’ upcoming speech. The apprehension phase must be extremely short, but it seems to be all that is needed to extract enough event information to determine the likely agent. Moreover, the data corroborate the idea of action regions which are relevant for verb encoding and vary with event complexity. We also found some evidence that the manipulating body parts and manipulated objects are both attended to for verb encoding, as evidenced by the cases in which the two were separable. Last but not least, we observed a strong impact of the linguistic task (agent, verb, or sentence production) on various measures of visual information uptake.

As mentioned earlier, with eye-tracking experiments we gain insight into what the eyes (and attention) do before and during speaking: which regions of a display are fixated, for how long, and in which order. Only little can be learned about what happens before the eyes move into the scene, except what we can glean from where the eyes move as a consequence of the postulated apprehension phase. To find out which and how much information can be taken up at a glance, we have to revert to a different method: the brief presentation of visual scenes.
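To make the dependent measures in these eye-tracking analyses concrete, the following is a minimal sketch of how per-trial ROI measures such as gaze-time proportions and first fixations might be computed from a fixation log. The data format, field names, and values are hypothetical illustrations, not those of the original studies.

```python
# A minimal sketch of the ROI measures discussed above: proportion of
# gaze time per ROI before speech onset, and the ROI of the first
# fixation. Data format and values are hypothetical.

# one trial: fixations as (roi, onset_ms, offset_ms), plus speech onset
fixations = [
    ("agent head", 180, 420),
    ("action", 430, 690),
    ("object", 700, 910),
]
speech_onset = 950  # ms after picture onset

def gaze_proportions(fixations, speech_onset):
    """Percent of the picture-onset-to-speech-onset interval spent in
    each ROI (fixations are truncated at speech onset)."""
    totals = {}
    for roi, on, off in fixations:
        dwell = max(0, min(off, speech_onset) - on)
        totals[roi] = totals.get(roi, 0) + dwell
    return {roi: 100 * t / speech_onset for roi, t in totals.items()}

def first_fixated_roi(fixations):
    """ROI of the earliest fixation in the trial."""
    return min(fixations, key=lambda f: f[1])[0] if fixations else None

print(gaze_proportions(fixations, speech_onset))
print("first fixation:", first_fixated_roi(fixations))
```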
4.3 Describing briefly presented action scenes
We manipulated the naturalness of the stimuli and the complexity and coherence of the depicted events in our experiments with brief presentations. Scenes were displayed for durations between 100 and 300 ms, randomly appearing in one of the four quadrants of a monitor and subsequently masked. The fixation point was in the center of the screen so that upcoming stimuli fell in our subjects’ visual periphery. Although shifts of fixation were not controlled for in these experiments, even the longest stimulus presentation of 300 ms would generally allow at most one short fixation into the scene, so the uptake of information during this time can be mainly attributed to parafoveal and peripheral vision.

In Experiment 3a (Dobel et al. 2007), stimuli consisted of line drawings similar to those used in Experiment 1, depicting events with three participants (e.g., clown, baker, bear) involved either in a coherent (meaningful) or non-coherent (meaningless) interaction. Coherent scenes displayed different activities of giving or shooting. Non-coherent scenes were created by mirroring both animate participants so that they no longer faced each other and action coherence was lost (see fig. 9.3 for an example). Upon presentation of the mask, native speakers of German had to describe in as much detail as possible everything they had been able to identify.

Our subjects were very good at judging whether scenes were coherent (about 80% correct), even at the shortest presentation duration. This is comparable to the fast extraction of a scene’s ‘gist’ (see Biederman 1972). Moreover, even at presentation durations of 100 ms – which appear at first to consist just of a brief flash on the screen – agents and recipients were already identified and named correctly in 30% of the cases (which is far better than guessing, although many subjects reported that this was what they were doing). Note that the identification of protagonists (Native American, clown, king) often involved detailed information. At 200 ms and beyond, performance for agents increased to about 75%, and for recipients to 60%. Agents[3] were always identified better than patients, and agent identification depended on scene coherence: agents were identified correctly more often when they were part of a coherent scene. Actions (and themes) were identified less well than agents and recipients, even though there were only two kinds of actions. Their identification improved significantly with longer presentation durations.

While eye-tracking studies tend to emphasize the serial, piecemeal uptake of information, the method of brief presentation stresses the role of extrafoveal vision and the uptake of information without overt attention shifts. It seems that the overall ‘event structure’ (Griffin and Bock 2000) of scenes, which we manipulated here in terms of coherence, can be extracted very efficiently and rapidly.
[3] By ‘agent’ we mean the protagonist that was identified as the agent in the coherent scenes.
[Figure 9.3 Experiments 3a and 4. Examples of coherent and non-coherent scenes (taken from Dobel et al. 2007)]
Table 9.2 Experiment 3b. Percentage of correctly identified actions for different presentation durations and event types (intransitive: one-participant events typically described with an intransitive sentence; transitive: two-participant events with one or two animate participants typically described with transitive sentences; ditransitive: three-participant events typically described with ditransitive sentences). Chi-square and p values indicate corresponding comparisons between cells.

Event type               100 ms   200 ms   300 ms   Chi-square (df), p
Intransitive               33       53       80     104.35 (2), p < .001
Transitive: 1 person       11       39       74     296.67 (2), p < .001
Transitive: 2 persons       7       35       71      74.33 (2), p < .001
Ditransitive               12       27       58     120.72 (2), p < .001
Chi-square (df), p    64.22 (3), p < .001   41.13 (3), p < .001   31.82 (3), p < .001
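As a concrete illustration, the row-wise chi-square values in Table 9.2 compare identification accuracy across the three presentation durations within one event type (hence df = 2), and the column-wise values compare the four event types within a duration (df = 3). Below is a minimal sketch of such a test, assuming scipy; the number of trials per cell is a hypothetical placeholder, so the resulting statistic will not reproduce the table exactly.

```python
# A minimal sketch of the chi-square comparisons reported in Table 9.2,
# assuming scipy; the trial count per cell is hypothetical.
from scipy.stats import chi2_contingency

n_trials = 80  # hypothetical number of trials per cell

def compare_durations(percent_correct):
    """Compare identification accuracy across presentation durations
    for one event type (a 2 x 3 contingency table, hence df = 2)."""
    correct = [round(p / 100 * n_trials) for p in percent_correct]
    incorrect = [n_trials - c for c in correct]
    chi2, p, df, _ = chi2_contingency([correct, incorrect])
    return chi2, df, p

# e.g., the intransitive row of Table 9.2: 33%, 53%, 80% correct
chi2, df, p = compare_durations([33, 53, 80])
print(f"chi2({df}) = {chi2:.2f}, p = {p:.4g}")
```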
Even agents and recipients can often be identified without overt fixations, with better performance for agents (which again is indicative of genuine scene apprehension). In contrast, identification of actions seems to require more processing time (and preferably fixations into the scene).

We addressed the visual encoding of actions in more detail in a further experiment, 3b, for which we used the method just described. However, instead of line drawings we presented the naturalistic stimuli from Experiment 2a. Recall that these stimuli were characterized by increasing event complexity and varying numbers of actors. As table 9.2 shows, all types of actions were identified better with increasing presentation durations, and actions with fewer participants were identified better than actions with more participants. This is the same picture that we found for line drawings: even with naturalistic images, action identification apparently requires overt attention shifts, particularly for events of higher complexity involving more participants. Nevertheless, even with very short presentation durations some actions could be recognized, and overall performance was surprisingly good. We explore below what additional factors besides event complexity make actions easily recognizable.

Thus, it seems that within a single glance at a scene, representations are created that are not so coarse after all. These representations can then be used to guide eye movements to regions that need further exploration, for example, action regions.
While in the last two studies we mainly focused on naming participants and actions, we now turn to role apprehension and identification of actions, again using naturalistic stimuli. In Experiment 3c, we used the ‘patient detection task,’ which is not unlike the ‘agent naming task’ employed in Experiment 2a (Kreysa et al. ms.). As in that experiment – and in contrast to all other experiments described in this section – the stimulus remained on the screen for free inspection for as long as it took the subjects to decide on the patient, and eye movements were monitored.

Patient detection was first used by Griffin and Bock (2000), who asked their subjects to locate the patient in line drawings of two-participant events by fixating it and subsequently pressing a button as quickly as possible (patient positions were balanced over items). Their results showed that the patient was already fixated more than the agent just half a second from image onset. As described above, this was taken as crucial evidence for an apprehension phase, in which participants’ roles can be extracted in a form that allows subsequent eye movements to be directed according to the task demands. We wanted to see whether such findings could be replicated with naturalistic stimuli (for an example, see fig. 9.4) and expected first fixations to land preferentially on the patient, given that the position of the patient matters for the patient detection task (Glanemann 2008). In contrast to Griffin and Bock (2000), subjects were instructed to report on the patient’s position only by button press and not by using saccades, which kept them naïve and left eye movements as natural as possible. There was no explicit instruction for eye movements.

Against our expectations, the first systematic fixations went to patients in only 34% of the cases, whereas they went to the agent (and action) region in 56% of the cases. This constitutes a significant task effect on first fixations. Furthermore, in the vast majority of cases (74%) both actors were inspected before the button press indicating patient detection. This resulted in rather long latencies of about 1000 ms. The tendency to fixate both actors, as well as the long latencies, were unexpected, given that coherence can be apprehended at a first glance.

In Experiment 3d, we therefore presented the same stimuli for 150 ms and masked them immediately afterwards. With the pre-trial fixation point lying outside the area where the stimulus appeared, this presentation duration was too short to complete a gaze shift into the stimulus. This procedure thus enabled us to find out whether eye movements are truly necessary to complete the patient detection task. In this experiment, to be on the safe side, we actually monitored eye movements and excluded the rare cases (1%) in which a gaze went into the picture. The subjects’ task was again to indicate the position of the patient by button press. Accuracy was very high (93%) with brief and masked presentation, and barely different from the free viewing condition (98%) reported previously (Experiment 3c).
[Figure 9.4 Examples of stimuli of actions involving two participants, used in Experiments 3c, 3d and 3f]
Moreover, decisions were about 350 ms faster than in free viewing. It thus appears that peripheral visual information is sufficient for role apprehension, and the observed eye movements under free viewing conditions subserve monitoring or checking processes before an overt response is made. Reaction times, derived here from manual role decision, do not reflect the time needed to assign thematic roles to actors.
There are two possibilities for how such high performance in event apprehension can be achieved without overt fixations. First, subjects may genuinely identify the extra-foveally represented action, which then enables them to identify the roles of the actors. Second, the global structural layout of the whole scene, also called its ‘spatial envelope’ (Oliva and Torralba 2001), which can be recognized peripherally, might be sufficient to allow role identification. In the first case, the same brief exposure as above should allow for similar performance rates for action detection as for patient detection. If, however, role assignment hinges on the global layout of the scene, on the body postures of the actors and their relation to each other, action information need not be available. Based on the data by Kreysa et al. (ms.; see Experiment 2b) we hypothesized the latter, assuming that action encoding needs foveal vision in action-relevant regions.

To test this, in Experiment 3e we presented the same action photographs for 150 ms in the visual periphery and now asked (new) subjects to name the action. Unexpectedly, their performance was quite good: the correct verb (as elicited in the pretest of the material) was produced in nearly two-thirds of trials. However, variation in task performance across scenes was impressively large, ranging between 4% and 100%. Some actions, such as kicking a person or throwing a ball at a person, could be named nearly perfectly, whereas others, such as feeding a person or giving a present to somebody, could hardly be identified.

We noticed that some of the differences between photographs with high and low performance were related to the body postures of the two actors. Highly dynamic actions such as kicking a person show very typical relative positions of the actors’ arms and legs. Moreover, it is hard to think of an alternative action compatible with the same body posture. Such scenes were usually named correctly. In contrast, small manual gestures, facial expressions, and specific objects appeared to be more relevant for action identification in scenes with high error rates. Body postures alone were ambiguous in these scenes. Since peripheral vision does not provide the detailed, high-resolution information necessary to identify facial expressions and small objects, we assumed that actions were inferred where possible on the basis of perceived body postures.

Experiment 3f was designed to test this hypothesis. We used the same action photographs as in Experiments 3c–d, but transformed them into blurred (Gaussian filter, 10 px.) and grey-scale stimuli. This procedure reduced local, high-spatial-frequency information so as to mimic the peripheral viewing condition in Experiment 3e. A new group of subjects saw these blurred photographs as a slide presentation with presentation times of 250 ms per image. The subjects’ task was to name the action they thought was the most likely one being depicted. As neither facial expressions nor the object used in an action could be identified in the blurred images, answers could only be given on the basis of the perceived layout of body postures.
The total number of verbs (synonyms counted as one verb) produced by all subjects was submitted to the analysis, and, as expected, the number of alternatives elicited in this experiment correlated negatively with performance in the action naming task of Experiment 3e: the more alternatives were suitable for a given blurred action, the fewer correct identifications there were in the peripheral naming task with the original photographs, and vice versa. Importantly, neither the proximity to the fixation cross nor the size of the action region correlated with action naming performance. That is, the high performance for some actions in Experiment 3e was due to the low ambiguity of the spatial layout of these action scenes.

In summary, peripheral viewing with very brief presentation times allows for scene apprehension (understanding the overall gist and coherence of a scene as well as role assignment) of complex action scenes, both from line drawings and from highly complex and visually rich photographs. In contrast, the actual encoding of actions usually requires fixations on action-relevant areas. Exceptions are actions with unambiguous body postures, which can be inferred from the spatial envelope without gaze shifts. Interestingly, we found task effects even for data obtained at extremely short presentation durations. We also found differences as a function of scene coherence, both in the identification scores for agents (Experiment 3a; Dobel et al. 2007) and in first gazes into the picture, where we obtained a (small) task effect in patient detection.

If, as we demonstrated, event coherence and role apprehension can be extracted very rapidly on the basis of peripheral vision, a final issue we wish to address is whether and how visual encoding might be affected by scene coherence. We again resort to eye movements as a measure of visual encoding.
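For concreteness, the following is a minimal sketch of the stimulus transformation used in Experiment 3f (grey-scale conversion plus a 10 px Gaussian blur) and of a simple correlation between naming alternatives and naming accuracy. It assumes the Pillow and scipy libraries; the file name and the data values are hypothetical.

```python
# A sketch of the Experiment 3f stimulus transformation and the
# ambiguity analysis described above; file names and data values
# are hypothetical placeholders.
from PIL import Image, ImageFilter, ImageOps
from scipy.stats import pearsonr

def blur_stimulus(path):
    """Convert an action photograph to grey-scale and apply a Gaussian
    blur (radius 10 px), removing local high-spatial-frequency detail."""
    img = Image.open(path)
    grey = ImageOps.grayscale(img)
    return grey.filter(ImageFilter.GaussianBlur(radius=10))

blur_stimulus("kicking.jpg").save("kicking_blurred.jpg")

# Correlate the number of verb alternatives elicited for each blurred
# scene with naming accuracy for the same scenes in Experiment 3e.
alternatives = [1, 2, 5, 8, 3]              # hypothetical verb counts
accuracy = [1.00, 0.85, 0.40, 0.04, 0.70]   # hypothetical naming accuracy
r, p = pearsonr(alternatives, accuracy)
print(f"r = {r:.2f}, p = {p:.3f}")          # expected: negative correlation
```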
4.4 Apprehending and encoding of actions from non-coherent scenes
With brief and masked presentation of scenes, the “event’s causal structure” (Griffin and Bock 2000: 277) is quickly extracted without the need for overt attention shifts. Does scene coherence affect what the eyes do when they move into the scene? In Experiment 1, we found evidence for a preview phase when subjects had to produce lists of unconnected nouns to describe the participants of a coherent scene. They inspected more regions before speech onset than were needed for naming the first participant. This preview phase was characterized by enhanced fixations on both the agent and the recipient/patient of a scene. We had not expected a preview phase for list production and assumed that it arose due to the engagement of the participants in a meaningful action. If this holds, rendering a scene meaningless by manipulating visual or causal coherence should reduce such a preview phase (remember that it was absent in the multiple-object naming studies cited in section 1).
To investigate this question we used the coherent and non-coherent stimuli from Experiment 3a (Dobel et al. 2007) described above (see fig. 9.3). In addition to causal coherence, we also manipulated visual coherence by drawing lines between participants, which visually reduced their connectedness. We analyzed only events of shooting, because in these the arrangement of participants in relation to each other remained fixed even after mirroring. The subjects’ task was to assign common-noun labels to the two animate participants and the theme object (e.g. “clown,” “arrow,” “buffalo”), with free order of mention. Subjects had not seen the stimuli before, so they did not know the set of characters and objects. Eye movements were measured, speech was recorded, and the position of the agent and patient in the images was counterbalanced.

None of the manipulations had an effect on which participant was named first: agents[4] were named first in the vast majority of cases (ranging between 80% and 90%). In line with this finding and with Experiments 2a and 2b, we found that first fixations were also mostly directed at the agent (66% across all conditions), showing again that first fixation and first mention tend to address the same entity. The frequency of first fixations to the second most frequently fixated entity, the transferred object (e.g. arrow, bullet), was much lower, at 19%, and it was presumably fixated only due to its proximity to the fixation cross.

Interestingly, the tendency to fixate the agent first was modulated by coherence. In non-coherent scenes, the agent was fixated first in 79% of cases, but much less often (54%) in coherent scenes (χ² = 15.4; df = 1; p < .001). No difference was found for the purely visual manipulation (i.e., the vertical line dividing the scene; χ² = .40; df = 1; n.s.).

Analysis of gaze paths supported these findings: the visual manipulation had no effect, but causal coherence did. In non-coherent versions, subjects looked first at the agent and then into the action region in 49% of cases. This was by far the most frequent gaze path; the next most frequent was looking first at the agent and then at the transferred object (20%); all other combinations were below 10%. In contrast, in coherent versions the gaze pattern was more dispersed: the agent-action pattern was much less pronounced (25%), and there were also comparatively high incidences of inspecting first the agent, then the instrument (16%); first the action region, then the agent (18%); and first the transferred object, then the agent (16%). These results are significant over all possible gaze patterns that arose for the first two fixations (χ² = 33.2; df = 13; p = .002).

It thus seems that the absence of causal coherence leads to a rather consistent and task-induced pattern of fixating the agent first and thereupon naming it, while coherent scenes evoke a more variable visual encoding pattern. We found no clear evidence for a task-related preview of all protagonists in coherent scenes, as we did in Experiment 1.
[4] As before, we use ‘agent’ to refer to the protagonist that was identified as the agent in the coherent scenes.
However, the pattern for non-coherent scenes has to be interpreted with caution. By mirroring participants in non-coherent scenes we also moved the agent closer to the fixation cross, and the action region (the agent’s hands and instrument) away from it. Thus, the more straightforward encoding and naming pattern in non-coherent scenes may have resulted from higher saliency of the agent due to its closeness to the fixation cross. Based on the experiments described above, we prefer this more conservative interpretation, and conclude that coherence or incoherence of a scene is apprehended within fractions of a second and does not require overt attention.
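A minimal sketch of the kind of gaze-path tally reported above, counting for each coherence condition the ordered pair formed by the first two fixated regions, is given below; the ROI labels and trial data are hypothetical.

```python
# A sketch of tallying gaze paths: for each trial, count the ordered
# pair formed by the first two fixated regions, per coherence condition.
# ROI labels and trial data are hypothetical.
from collections import Counter

# each trial: (condition, fixated ROIs in temporal order)
trials = [
    ("non-coherent", ["agent", "action", "object"]),
    ("non-coherent", ["agent", "object"]),
    ("coherent",     ["action", "agent", "object"]),
    ("coherent",     ["object", "agent"]),
    ("coherent",     ["agent", "instrument"]),
]

paths = Counter((cond, fix[0], fix[1]) for cond, fix in trials
                if len(fix) >= 2)

for (cond, first, second), n in paths.most_common():
    print(f"{cond}: {first} -> {second}: {n} trial(s)")
```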
5 Discussion
The eye-tracking and brief-presentation studies discussed in this chapter provide some initial answers to our questions about visual and linguistic event encoding. Several studies investigating the interplay of vision and language in describing events have produced evidence suggesting that a rapid apprehension phase is followed by a slower, more deliberate, and attention-demanding formulation phase (Griffin and Bock 2000; Bock et al. 2003; Dobel ms.). While we know that the latter is characterized by a successive linkage of fixating and naming (as found also in multiple-object naming studies), the nature of the apprehension phase poses more of a puzzle.

Several of our studies using line drawings or naturalistic pictures have demonstrated that initial eye movements are task-dependent, that is, guided by language production. This means that at least some understanding of the event must be achieved before any fixation, in order to direct the eyes to relevant sites in the scene. This initial processing of gist in the absence of fixations is what we have termed the apprehension phase, during which speakers seem to acquire sufficient information peripherally to be able to assign roles to the participants with high accuracy.

There are situations in which the apprehension phase is followed by a preview phase, in which one or more regions of the visual display are fixated before the first of the fixating-and-naming sequences which characterize the formulation phase (see Dobel ms.). It seems plausible that these preview fixations serve to check the rough interpretation of the scene drawn from the apprehension phase and to identify those aspects which have remained unclear. For example, action regions of events were fixated if speakers had to describe three-participant events (depicted by line drawings) with full sentences. Such gazes were evident before the first-named participant was fixated. Gazes on action regions were less prominent for two-participant events and when the speakers’ task was to produce a list of nouns referring to the participants of the event. But again, when preview phases were evident, they were task-dependent. We took the initial gazes on the action region in sentence production to be compatible with linguistic models that assume early verb access in sentence production (Ferreira 2000; Meyer and Dobel 2003).
Similarly, examining several of the participants is a good preview strategy for list production. In sum, based on the current literature and on our own data, it seems safe to conclude that both the type of event and the upcoming utterance (that is, the linguistic task) influence early fixations for visual encoding.

With regard to the visual encoding of actions, our studies demonstrated that visually and conceptually more complex events lead to more complex gaze patterns on action-relevant regions. While events with one or two participants are encoded quite straightforwardly, this is not the case for three-participant events. Here, both the manipulating body part and the manipulated object are often fixated before naming. This finding is supported by our studies employing brief presentation of action events. There we found that more complex actions (involving more participants) required longer presentation durations than simpler actions when viewers were asked to recognize and name the actions. Furthermore, actions can be apprehended quite rapidly if they allow for few alternatives. This was demonstrated by comparing the results obtained in Experiment 3f, when actions were named on the basis of blurred photographs, to those obtained in Experiment 3e, when non-blurred images showing the same action scenes were briefly presented and masked. Success rates across subjects in naming briefly presented actions were highly correlated with the number of naming alternatives for the blurred images.

We used brief presentations to investigate how much information can be taken up in a single glance. With both line drawings and naturalistic stimuli, we were surprised to see that speakers could give quite detailed descriptions of events on the basis of brief presentation durations. They were able to extract very efficiently whether a scene made sense or not, i.e., whether a ‘gist’ could be derived. Interestingly, they were also able to name agents and recipients of events quite well, even when the presentation duration was too short for eye movements. Of course, attention probably did shift within the scene, even if after the subsequent saccade the eyes found the mask instead of the picture. Action naming was less successful; we have discussed a number of factors which may be responsible here. We conclude therefore that the apprehension phase, on the basis of which initial eye movements are launched, can result in very detailed representations.

In addition, we have also demonstrated that scene coherence can be correctly appreciated at presentation durations as short as 100 ms. Further experiments targeted the question of how this is possible. It could be based on the overall stimulus ‘gestalt,’ with protagonists facing each other, or alternatively depend on more complex features such as the gaze direction of interacting animate participants.
Monitoring eye movements while speakers described coherent and non-coherent scenes with lists of nouns demonstrated that initial gaze patterns were quite task-dependent when events were non-coherent: the first gaze was often directed right away towards the first-mentioned protagonist. In contrast, when events were coherent, more initial gazes went towards the action region and the instrument with which an action was performed. This might be an indication that coherent scenes automatically attract more attentional resources, even when the task does not necessitate this. However, we believe that this conclusion has to be taken with caution. As emphasized above, the type of stimulus material and the type of actions make a big difference for overall event apprehension. We are currently trying to corroborate the coherence effects with naturalistic stimuli and with a much broader range of actions.

After this series of experiments, we are more certain than ever that the interplay of vision and language is extremely complicated. In our opinion there will be no easy answers, for example in terms of one process always preceding the other. The order and magnitude of effects will depend as much on linguistic factors (e.g., the speech task) as they do on perceptual factors (the type of event, the type of stimulus). We have not even begun to address issues that arise when speakers are under time pressure or when they are observing real events, i.e. interactive changes taking place in real time. It will be interesting to see how far we can transfer our conclusions from static to dynamic stimulus material and situations. The major question remains how the cognitive system is able to adapt so effortlessly to so many different circumstances.
10 Talking about events∗

Barbara Tversky, Jeffrey M. Zacks, Julie Bauer Morrison, and Bridgette Martin Hard

∗ We are grateful for ONR grants N00014-PP-1-0649, N000140110717, and N000140210534 and NSF grant REC-0440103 to B. T., and for NIH grant RO1-MH70674 and NSF grant BCS-0236651 to J. Z.
1 Introduction
People, in common with other creatures, need to identify recurrences in the world in order to thrive. Recurrences, whether in space or time, provide the stability and predictability that enable both understanding of the past and effective action in the future. Recurrences are often collected into categories and, in humans, named. One crucial category, and set of categories, is events, the stuff that fills our lives: preparing a meal, cleaning the house, going to the movies. Event categories are an especially rich and complex set of categories, as they can extend over both time and space and can involve interactions and interrelations among multiple people, places, and things. Despite their complexity, they can be named by simple terms (a war, an election, a concert) and described in a few words (folding the clothes, rinsing the dishes, or tuning the violin). People have an advantage over their non-verbal relatives in that language can facilitate learning categories and serve as a surrogate for them in reasoning. What are the effects of naming or describing over and above identifying categories? And what do the descriptions reveal about the categories? Here, we examine some of the consequences and characteristics of language for familiar categories: events, and the bodies that perform them.
2 What language can do
Language, like many cognitive tools, plays many roles in our cognitive and social lives. One role is to carve out entities in continuous space and time so that they can be referred to in their absence. Carving out some entities and not others calls attention to those that are named and focuses attention on the features that distinguish and characterize them. Yet another role of language is to allow abstraction: to go beyond features in the world that can be readily perceived and pointed out, to describe those not readily observable in the world,
notably functional, conceptual, or other abstract features. Functional features, if given in perception at all, are less immediate than the perceptual cues that signal things in the world: color, shape, or kinds of action. Consider, for example, the features that characterize furniture or food. Although one can point to examples of furniture or food, it is difficult to point to their functional properties, which might be enhancing people’s comfort or providing nutrients to the body. Such features are often not knowable simply from seeing a static object; that is, they may not be afforded by the object. Knowing them depends on other information: sometimes using the object or seeing it in use, but sometimes learning about it in other, more formal ways. When that information is activated, it can change how the stimuli are perceived and encoded by calling attention to the aspects of the stimuli associated with that more abstract information. For furniture, for example, this might be that the dimensions are appropriate for the human body.

In the case of events, watching someone perform an organized set of actions allows immediate perception of the actions of the body and the objects involved in the actions, but the goals or purposes or causal implications of the actions must be inferred. Goal achievement may be associated with specific actions, such as putting one object part down and picking up another. These informative actions, however, may not be as salient as other actions not associated with goal completion. In the case of bodies, the body parts that are perceptually salient may not be the same as those that are functionally important. The research to be discussed on body schemas and event parsing suggests that language can call attention to abstract, functional features of bodies and events, and that doing so reorganizes perception of them.

The events of concern are representative of the everyday events performed by people, the sorts of events that fill the needs of the day: preparing meals, going shopping, and cleaning the house. Such events can be viewed as a partonomy, a hierarchically organized set of parts, consisting of higher-level goals, such as cleaning the house, which can be decomposed into lower-level goals, such as vacuuming the floor, doing the dishes, and washing the clothes, which in turn can be decomposed into even lower-level goals (e.g., Bower, Black, and Turner 1979; Schank and Abelson 1977; Zacks and Tversky 2001). Even mundane events such as these are rich and complex, not just because they have temporal and spatial contexts, but also because they involve the body, typically interacting with objects. In fact, when observing these kinds of events, people describe them as sequences of actions of the body on objects (Zacks, Tversky, and Iyer 2001). Hence, we first turn to people’s concept of the body, and the effects of naming on that concept.
3 Body schemas
Bodies, like other objects, take up space in the world and have characteristic shapes, parts, sizes, and behaviors. This is an outsider’s view of the body. As humans, we also have an insider’s view of bodies, a view we lack for other objects.
We know what bodies and their parts feel like and what bodies can do from the inside, from sensory and kinesthetic information and motor experience. Does this insider perspective affect our schemas of our bodies?

Both behavioral and neuropsychological research suggests that bodies are treated differently from objects. In a striking set of studies, apparent motion of an object with respect to a body was interpreted differently from apparent motion with respect to another object (Chatterjee, Freyd, and Shiffrar 1996). Apparent motion of a stick was more likely to be perceived as going through an object but as going around a body. Perceiving a stick going through an object violates physical laws, but perceiving a stick moving through the body violates not only physical laws but also laws of body mechanics. Neuropsychological research also suggests that body schemas are special. For example, a patient with right hemisphere damage involving the basal nuclei suffered from personal neglect of the left side of his body, without any impairment to knowledge of the space beyond his body (Guariglia and Antonucci 1992).

Underlying a body schema is knowledge of body parts, their spatial relations, and their behaviors or functions. The project reported here investigated body schemas using a part verification task (Morrison and Tversky 2005). The stimuli were realistic renderings of (male) bodies in various poses and orientations. Across cultures, there is some commonality in which body parts get named, especially those named with primary rather than derived names (Andersen 1978; Brown 1976), suggesting that certain parts are more important or salient than others. The body parts selected were those commonly named across cultures: head, chest, back, arm, hand, leg, and foot.

Two kinds of experiments were designed to assess the impact of appearance and function in retrieving information about bodies. In the name-body experiments, participants saw the name of a body part, and then a realistic rendering of a body in one of many postures and orientations with a part indicated by a small circle. In the body-body experiments, participants saw pairs of bodies in different orientations with the same or a different part indicated. In each case, participants responded “same” or “different” depending on whether the indicated parts were the same. The data of interest were the times to verify the various body parts.
3.1 Theories of body part verification times
Research on objects suggests three different theories that could account for part verification speed, based on size, contour discontinuity, or significance. The size theory receives support from studies of imagery that have shown that larger parts are verified more quickly than smaller ones in images (Kosslyn 1980). The explanation offered is that in scanning an image, as in scanning a scene, larger things are more quickly detected than smaller ones. Thus, a theory derived from imagery would predict that larger parts would be verified faster than smaller ones.
The part discontinuity theory receives support from studies of object recognition and cognition. Objects can be identified from their contours (e.g., Rosch et al. 1976; Palmer, Rosch, and Chase 1981), and it has been proposed that objects are recognized by their parts (Biederman 1987). This may be in part because discontinuities in object contours, their shapes, are excellent cues to object parts (Hoffman and Richards 1984). Object parts are not only salient in perception, they are central to the understanding of objects and their functions (Tversky and Hemenway 1984). Perceptually salient parts tend to be functionally significant ones, and the parts themselves often suggest their behaviors or functions. The legs, seat, and backs of chairs are bounded by contour discontinuities, are regarded as salient parts, and serve separate essential functions. Like size, sharp changes in contour are thought to catch the eye. A theory consonant with this approach would predict that parts with greater contour discontinuity would be verified faster than those with less contour discontinuity.

The part significance theory receives support from research on the role of parts in categorization. Parts proliferate in people’s listings of attributes for objects described at the basic level, the level of choice in reference, the level of table and apple in contrast to the superordinate level of furniture and fruit and the subordinate level of kitchen table or Fuji apple. Tversky and Hemenway (1984) suggested that parts are at once features of appearance and of function, providing a link between them. On the appearance side, the legs of a table are relatively large and yield discontinuities of contour. On the function side, the legs of a table serve to support the table top at a convenient height. In fact, parts rated as highly significant seemed to be both perceptually salient and functionally significant, making it difficult to separate the role of perceptual salience from that of functional significance in objects. For bodies, a crude index of functional significance is the relative proportion of territory in the sensorimotor cortex, well known from the popular diagram of a large-handed, small-backed homunculus representing the relative cortical areas devoted to sensation and motor control of bodies (Penfield and Rasmussen 1950). Another is the two-point discrimination threshold on the skin, though it reflects only the sensory side of the body (Weinstein 1968).

For the common body parts considered here, the three theories make different predictions for verification times. According to the image theory, back and chest should be relatively fast as they are large parts, whereas hand and foot should be relatively slow as they are small. According to the part discontinuity theory, hand, head, arm, leg, and foot should be relatively fast as they are relatively discontinuous, and chest and back should be relatively slow. Finally, according to the part significance approach, head, hand, and chest should be relatively fast, with back and leg relatively slow. In an independent study, head, hand, and chest were rated as more significant body parts than back and leg.
Note that the predictions from the part discontinuity and part significance theories are highly similar. This is not surprising, as part significance has inputs both from perceptual salience, which correlates with part discontinuity, and from functional significance. Remember that for objects, perceptual salience and functional significance are themselves correlated. The sensory and motor cortex homunculi, while not identical, are nevertheless similar.
3.2 Significant body parts: names evoke function
Now we can interpret the data from the two experimental tasks, one comparing a named part to a part highlighted on a body, and the other comparing two bodies with highlighted parts (Morrison and Tversky 2005). Image size, derived from imagery theory, can be quickly dismissed, as larger parts were verified more slowly than smaller ones in both tasks. In contrast, part discontinuity and part significance did correlate with verification times. For the name-body comparison, part significance was the best predictor, but for the body-body comparisons, part discontinuity was the best predictor. There was quantitative as well as qualitative evidence for this. The correlation between verification time and part significance was higher (though not significantly so, given the limited range and the high correlations between the indices) for the name-body comparison, and the correlation between verification time and part discontinuity was higher (though not significantly so) for the body-body comparison. As noted earlier, these two theories predict highly correlated rankings of the parts. One important qualitative difference separates them: chest is high in significance though low in contour discontinuity. For the name-body comparison, chest is second in speed only to head, whereas for the body-body comparison, chest is slow, second-to-last. Salience and significance ratings by independent judges confirmed these differences.

Why is part discontinuity, a perceptible feature, the best predictor for body-body comparisons, but part significance, a conceptual feature, the best predictor for name-body comparisons? The body-body task can be done purely perceptually, comparing the highlighted parts directly with no need to access the part’s name or meaning. Perceptual salience of parts in the form of contour discontinuity should facilitate those comparisons. In contrast, for the name-body task, the name must be understood in order to know which part to search for on the body. Understanding a name appears to arouse functional features as well as perceptual ones. Naming, in the case of bodies, arouses features not immediately available from perception, features that depend on knowing how the parts of the body function. Naming reorganizes the perception of body parts. Let us now turn to the roles bodies serve in events, and to the language used in describing and understanding events.
4 Event schemas
Events can be regarded as temporal analogs of objects, though the analogy, like most analogies, is incomplete. In philosophical treatments, events are distinguished from activities and states in that events have intentions or goals; they refer to achievements or accomplishments. Running is an activity, but running a race is an event. Whereas activities and states are homogeneous, achievements and accomplishments are not; they have natural beginnings and they culminate in natural endings (Vendler 1957; see also the introduction and papers in Casati and Varzi 1996). Like objects, events are perceived to have parts. Just as object parts extend in space, event parts extend in time. Event parts turn out to be distinguished by sharp breaks in activity contours, just as object parts are distinguished by sharp breaks in spatial contour (Hard, Recchia, and Tversky ms.; Tversky, Zacks, and Hard 2008; Zacks 2004; Zacks et al. 2009). In the classic restaurant example, the overall goal, going to a restaurant, consists of subgoals such as entering, registering, being seated, reading the menu, and ordering. That people, even very young ones, have top-down knowledge of goals and subgoals for common events is well documented (e.g., Bower, Black, and Turner 1979; Schank and Abelson 1977; Zacks and Tversky 2001).

Understanding everyday events. Is the hierarchical structure in people’s knowledge of events also present in their on-line perception of events? How does describing events as they unfold affect perception of them? To examine these questions, we turned first to familiar everyday events (Zacks, Tversky, and Iyer 2001). Events can be loosely classified into those that have no human agency, such as hurricanes or volcanic explosions, events that involve a single person, and events that involve more than one person. Because we were interested in human action and the roles of goals and intentions in event segmentation, we restricted ourselves to events involving people. For simplicity, and to enable laboratory studies, we chose events involving a single person, taking a limited amount of time, and occurring in a single place. Undergraduates rated events chosen from previous research and other sources on frequency and familiarity. From these, we selected two familiar events, making a bed and doing the dishes, and two unfamiliar events, fertilizing a plant and assembling a saxophone, and filmed them from a single viewpoint in the scene.

First, we examined how observers segmented events as the events unfolded, how they described them as they unfolded, and how describing the event segments affected on-line perception of them. The descriptions provided insight into how events are segmented and interpreted. Observers viewed the films twice. As they viewed, they pressed a key when, in their judgment, one segment ended and another began, a procedure adopted from Newtson (1973). For one viewing, they were asked to select the largest units that made sense; for the other viewing (in random order), they were asked to select the smallest units that made sense.
Half of the observers described what happened in each segment as they segmented, and half segmented silently. All participants segmented a practice film depicting ironing a shirt before segmenting the experimental films.

One question of interest was whether observers perceived the ongoing action as hierarchical. Hierarchical encoding was assessed by measuring, for each participant, how well fine-unit boundaries coincided with coarse-unit boundaries. The obtained degree of hierarchical encoding was compared to the appropriate null models.
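A minimal sketch of one way such a measure and its null model might be computed from segmentation key presses follows; the alignment measure, boundary times, and film duration below are illustrative assumptions, not the original analysis.

```python
# A sketch of quantifying hierarchical encoding: how closely coarse-unit
# boundaries coincide with fine-unit boundaries, against a permutation
# null model. All times (in seconds) are hypothetical.
import random

def alignment(coarse, fine):
    """Mean distance from each coarse boundary to the nearest fine
    boundary; smaller values mean tighter hierarchical alignment."""
    return sum(min(abs(c - f) for f in fine) for c in coarse) / len(coarse)

coarse = [12.0, 47.5, 80.2]                       # coarse-unit boundaries
fine = [5.1, 12.3, 20.0, 33.8, 47.9, 61.2, 79.8]  # fine-unit boundaries
duration = 90.0                                   # film length

observed = alignment(coarse, fine)

# Null model: place the same number of fine boundaries at random and ask
# how often chance alignment is at least as tight as the observed one.
null = [alignment(coarse, [random.uniform(0, duration) for _ in fine])
        for _ in range(10_000)]
p = sum(sim <= observed for sim in null) / len(null)
print(f"observed alignment = {observed:.2f} s, p = {p:.4f}")
```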
There are a number of bases on which people could segment activity on different scales. One natural hypothesis is that viewers segment fine units in a more bottom-up way, based on salient physical features (Newtson and Engquist 1976), and segment coarse units in a more top-down way, based on abstract features related to the goals of the actor, social relationships, or personality traits. Another proposal is that segmentation in this task is not a direct perceptual process at all, and that the kinds of computations that lead to segmentation at different levels bear little relationship to each other (Cohen and Ebbesen 1979). Alternatively, segmentation in this task may reflect viewers’ attempts to make sense of what they are watching, which they do by integrating the perceived activity into a coherent representation of the intentions and goals of the actor in the observed event. Such integration would require thinking about hierarchical relationships between goals and subgoals.

The data supported the last alternative. There was greater hierarchical segmentation than would be expected by chance; that is, fine units were nicely bookended by larger ones (Zacks, Tversky, and Iyer 2001). More significant for the present discussion, this was affected by language use: describing led to a greater degree of hierarchical structure. This is particularly interesting because those who described as well as segmented performed two attention-demanding tasks, which could have competed with each other and resulted in greater disorganization of segmentation. Such a finding supports the view that in order to describe an activity, speakers generally need to think about how larger events relate to the sub-events that make them up. However, this does not always appear to be the case. In subsequent studies, this effect has not been observed, and in older adults the reverse has been found (Hard, Tversky, and Lang 2006; Kurby et al. 2008). In future research it will be important to establish the exact circumstances under which describing increases or decreases hierarchical segmentation; for now, the important conclusion is that describing an activity affects how it is segmented.

Next, we asked why describing increased the degree of hierarchical encoding of events. Similar to the effects of naming on the perception of body parts, we offer an explanation that depends on the power of words to arouse functional, abstract features of viewed things. Inspecting the language used in the descriptions supports this view. Although participants could have described the events as activities or movements of the body, such as “she raised her left arm” or “he bent at the waist,” they did not. While true, such language seems awkward, inappropriate, uninformative, and improbable. Such simple descriptions of motion, as shall be seen, are invoked for describing events that are not readily understood, such as exotic mating rituals in birds or insects or movements of geometric figures (Hard et al. 2006), but are aberrant for recognizable human events. In describing coherent, comprehensible sequences of actions, as in learning them, what is important is what is accomplished, not the specific movements that accomplish it. For the everyday events observed here, the language described goals and subgoals, completed actions by actors on objects, rather than activities of the actors or changes of locations of objects. In order to describe events, then, participants appear to consider the goals of the actions they are watching, features not given directly in the perceptual input. To provide such descriptions requires viewers to think about the higher-order goals of the actor and the relations of the subgoal structure to those higher-order goals.

So, as with naming body parts, speaking about events appeared to evoke more abstract representations of them. Language elicited functional features, features containing information about the goals and subgoals of the actions underlying the events. In the case of these complex everyday events, this functional information served to produce a tighter organization of the perceptual information, leading to greater alignment of fine and coarse event boundaries. Just as names reorganize perception of bodies, describing events reorganizes perception of events.

Qualitative differences between coarse and fine units. Talking does more than affect perception and cognition; it also reflects perception and cognition. The language of the descriptions provides insight into how events are interpreted as they happen. The vast majority (96%) of the play-by-play descriptions were of actions on objects. The few exceptions were “she enters” or “he exits” (notably, events by feet, or translocations, not by hand). Revealingly, the descriptions of coarse and fine event units differed qualitatively. Consider one person’s protocol for coarse segmentation of “making a bed”: “walking in; taking apart the bed; putting on the sheet; putting on the other sheet; putting on the blanket.” Now the fine units for the coarse unit “putting on the sheet”: “unfolding sheet; laying it down; putting on the top end of the sheet; putting on the bottom; straightening it out.” This example suggests what subsequent analyses supported: that at the coarse level, events are punctuated by objects or object parts; that is, new coarse units are typically indicated by a new object or object part. At the fine level, events are punctuated by articulated actions on the same object, so objects are described less precisely. One measure of language use illuminates that observation: ratings of the generality or specificity of the words by new participants.
According to these ratings, nouns – that is, the objects referred to – were more variable and specific at the coarse level than at the fine level. The language people use when they talk about events as they observe them, then, reveals how they perceive and organize events in their minds: as actions on objects. The actions described are accomplishments, typically goals and subgoals, parts of a larger goal. Observers organize action at a coarse level by objects, so that each coarse unit involves a different object or object part; similarly, they organize fine units by actions on those objects. For the kinds of events that fill our lives, then, describing both affects and reveals perception and cognition. Describing events as they unfold organizes perception of them around people’s actions on objects, around the goals and subgoals of their actions.

These kinds of events, making beds and doing the dishes, involve complex actions coordinating objects with different parts of the body. As such, they contrast with the simpler events of translocation, of entire bodies moving from one place to another in space. We have termed the former events by hands and the latter events by feet (Tversky, Lee, and Zacks 2004). Translocations can often be represented quite simply, as paths and turns at landmarks, though sometimes manner of motion is perceived and communicated as well. However, because of the simplicity of the action, language may not serve to organize perception of translocations. The next project describes research on abstract, unfamiliar events of translocation.

Understanding translocations. So far, we have investigated everyday events that are more or less familiar to viewers, events performed primarily by complex movements of the arms, hands, and upper body, directed by the eyes. Viewers perceive these ongoing events hierarchically. Providing a play-by-play description of each segment while segmenting yielded greater hierarchical organization, suggesting that articulation of top-down knowledge of function organizes perception according to indicators of goal/subgoal structure. Even though these events varied in familiarity, all were somewhat familiar, so that their goal/subgoal structure was easily described by observers. How are abstract, novel events, whose goals and subgoals are difficult to discern, segmented and described? Does describing organize perception for events that are primarily translocations?

To study this, we turned to a film used in a classic experiment by Heider and Simmel (1944). They showed participants a film of three geometric figures moving around in a minimal background. According to Heider and Simmel, participants perceived the actions in the film as a social, goal-directed scenario: one character bullying and chasing two smaller ones, the smaller ones taunting the bully and finally thwarting him. This scenario, which we have called the chase scenario, was translated into a computer animation and shown to viewers along with a second animation, modeled on “hide and seek.” As before, all viewers segmented the videos into coarse and fine units, on alternate viewings. Familiarity was varied by having half the viewers segment the films on first viewing and half after five viewings.
having half the viewers segment the films on first viewing and half after five viewings. Half of each familiarity group described the action in each segment as they segmented, and half segmented silently. Causality was also varied, by presenting the films forwards or backwards, counterbalanced across conditions (Hard et al. 2006). The presumption was that presenting the films backwards would disrupt perception of the causal relations between the actions. Importantly, after only a single viewing of the films, the play-by-play descriptions of the event segments revealed that, contrary to previous claims, viewers did not perceive the events as a sequence of sensible goal-related actions. After a single viewing, more than 90% of the descriptions were of movements, such as "move," "rotate," and "change direction." This dropped to around 75% after five viewings. By contrast, after five viewings, around 70% of the protocols contained intentional verbs, such as "chase," "hide," and "talk," whereas after a single viewing, only 40% of the protocols contained intentional verbs (descriptions could contain both motion and intention). After five viewings, forward events were described more intentionally than backwards ones. Thus, familiarity can have a strong influence on event understanding. Intriguingly, although familiarity increased intentional understanding of the films, in these cases it had no effect on hierarchical organization. Nor did describing or direction of viewing. In fact, the degree of hierarchical organization as well as the specific event unit boundaries were highly consistent across all conditions and across individuals as well. This suggests that for these events – which consisted primarily of simple paths of motion with very little manner of motion, rather than the complex interactions of many body parts that characterized the earlier work – the perceptual information in the stimulus was sufficient for segmentation. Top-down interpretations did not facilitate segmentation for this type of event; rather, the motion itself seemed to drive segmentation. A detailed analysis of the motion features in the films supports that interpretation. The films were coded for types of movement in one-second bins: start, turn, rotate, contact object, and change speed. Those bins coinciding with segment boundaries had significantly more movement changes than those that did not coincide with segment boundaries. Moreover, coarse segment boundaries were associated with more movement changes than fine segment boundaries. There were no qualitative differences between coarse and fine units, only quantitative ones. Note that these events were highly abstract. Unlike the films of everyday events, these films did not show full-bodied characters interacting in space. Rather they used point-like figures to stand for the characters and the motion consisted of paths of movement and jiggling of the characters. As such, these results should be seen as qualifying the previous findings for everyday events rather than contradicting them. For paths of motion, translocations of entire bodies, the dominant motion is stop, start, change of direction, and change of
speed. These simple, salient changes are all that is needed for segmentation and for hierarchical organization. The perceptual information seems to be sufficient for excellent hierarchical segmentation, even in the absence of understanding of intentions and goals. For everyday events that consist of complex interactions of arms, hands, head, body, and legs with objects in the world, thinking about the goals and subgoals does improve hierarchical segmentation by focusing it on the relevant parts of the action. Describing activates that thinking.
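The bin-based motion analysis can be restated as a simple computation. The sketch below is only an illustration – the data and names are invented, not taken from the study – but it shows the comparison being made: counting coded movement changes in one-second bins and contrasting bins that coincide with segment boundaries against bins that do not.

    # Sketch of the one-second-bin analysis (invented toy data).
    # movement_changes[i] counts coded movement changes (start, turn,
    # rotate, contact object, change speed) during second i.
    movement_changes = [0, 3, 1, 0, 4, 1, 0, 2, 5, 0]
    boundary_bins = {1, 4, 8}          # seconds judged as segment boundaries

    def mean(xs):
        return sum(xs) / len(xs)

    at_boundary = [movement_changes[i] for i in sorted(boundary_bins)]
    elsewhere = [m for i, m in enumerate(movement_changes)
                 if i not in boundary_bins]

    print(mean(at_boundary))   # 4.0  -- more movement changes at boundaries
    print(mean(elsewhere))     # ~0.57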
5 Talk and thought
Of all the categories people acquire, events are arguably the most important. They are important because they fill our lives and fulfill – or obstruct – our needs. Especially important are events enacted by people. These are events that we need to understand in order to learn them and in order to react to them. These events, even the mundane, such as preparing meals or assembling furniture, are nevertheless intricate interactions of people’s bodies with objects that take place over time. Here, we asked two of those Big Questions that make cognitive science exciting: How do people perceive and understand the events of their lives? Does language affect perception? We presented research on bodies and events that showed that language affected their organization in perception. For bodies, names bias perception toward organization based on abstract function rather than organization based on perceptual salience. Specifically, when cued by a named body part, functionally important body parts were more rapidly verified in a picture of a body, but when cued by a picture of a body with a part highlighted, perceptually salient body parts were more rapidly verified. For events, segmentation of ongoing events into coarse and fine units was more tightly hierarchically organized when observers described each event segment as they segmented than when observers merely segmented. Specifically, when describing, and as confirmed by the descriptions, observers’ segmentation of events into parts and subparts became more closely aligned with goals and subgoals. These effects of language on perception are quite different from the kinds of effects observed in eyewitness memory (Loftus 1980 [1996]). In those studies, participants watched a slide show or film of an accident or disturbing event. Later they were asked a long list of questions about the incident. For some participants, the questions presupposed information contradictory to what had been viewed. For example, the slide show might have shown a stop sign but the question presupposed a yield sign. In the memory tests, many participants who had received the contradictory information selected the information later presupposed rather than what they had viewed. This is a classic interference effect in memory (e.g., Tversky and Tuchin 1989), where the same cue is associated with different responses, and the responses conflict, sometimes one
winning out, sometimes the other, sometimes neither, as when we draw a blank searching for someone's latest phone number or where we parked the car. Instead, these effects of language on perception seem closer to the effects of language on thought that the Whorfians or neo-Whorfians have explored (e.g., Boroditsky 2003; Levinson 2003; Slobin 1996a; Whorf 1956). For example, speakers of languages that use both left-right-front-back and north-south-east-west frames of reference to refer to objects in space tend to organize a spatial array of nearby objects around their own left-right. On the other hand, speakers of languages that only use a north-south-east-west absolute reference frame also organize nearby objects using that absolute reference frame (Levinson 1996). Speakers of languages restricted to absolute reference frames are also better oriented when moving around in space (Levinson 2003). To explain these kinds of effects, Slobin proposed that the way language affects thought is by focusing attention on certain features of the world rather than others. The present results are consistent with that analysis, that language calls preferential attention to some aspects of experience at the expense of others. In the present cases, the shift in focus of attention is from more perceptual features to features that are more functional or abstract. Language can cause shifts of attention that have other cognitive consequences. The influences of language that we have found occur within a language and within an individual. They broaden Slobin's suggested mechanism for the Whorfian effects from "while speaking" to "while perceiving" and "while remembering," and from "the dimensions of experience that are enshrined in grammatical categories" to dimensions of experience "enshrined" in semantic categories and schemas. But more than that, the results highlight how language serves as a cognitive tool, a tool that can guide and craft perception, thought, and action. As such, simple describing is an elementary form of teaching.
11 Absent causes, present effects: How omissions cause events
Phillip Wolff, Matthew Hausknecht, and Kevin Holmes
1 Introduction
Causal relationships range from the physical to the abstract: from friction causing heat to stress causing forgetfulness. This broad spectrum of relationships motivates the question of what all causal relationships have in common. One approach has been to specify the conditions for causation in terms of the occurrence or non-occurrence of events or states, with no regard to processes that produce these events or states. Because these theories specify causation in terms of the effects of causation, they will be referred to as outcome theories. Outcome theories typically describe the conditions for causation in terms of probabilities, counterfactuals, first-order logic, or mental models. An alternative approach specifies the conditions for causation in terms of the processes that bring about outcomes; such accounts will be referred to as process theories. Process theories typically specify the conditions for causation in terms of the transmission of energy and force or their analogs in the social and psychological domains, for example, intentions and social pressure. The two kinds of theories sometimes address different questions about causation, making them, in some sense, complementary. However, they contrast sharply on the question of what counts as a causal event, in particular, the phenomenon of causation by omission. Causation by omission occurs when the absence of an influence brings about an effect. We say, for example, Not watering the plant caused it to wilt or Lack of clean air causes dizziness. Outcome theories view causation by omission as a fully legitimate kind of causation, a position that we will support in this chapter. Outcome theories hold this position because the criteria for causation in these theories do not depend on the underlying processes that give rise to events; they do not need to explain how the absence of an influence could cause something to happen. Process theories, on the other hand, deny that causation by omission is causation in the fullest sense of the concept. They are led to this position because, as currently formulated, process theories define causation in terms of the transmission of force, and plainly nothing can be transmitted by an absence.
In this chapter we will argue for the legitimacy of causation by omission, but against the view that causation is defined in terms of outcomes. We will argue instead that causation is fundamentally understood as specified in process theories. Although process theories, to date, have denied the legitimacy of causation by omission, we will argue that the rejection of this kind of causation does not necessarily follow from this approach to causation. We will then show how the phenomenon of causation by omission can be handled in process theories. We begin with a brief discussion of outcome theories and how they specify causation by omission. We also note several key challenges for outcome theories. We then provide a brief discussion of various process theories, including the two key challenges that have been raised against these theories. One challenge was originally raised by Hume (1978 [1739]), namely that accounts of causation based on notions such as force are circular. We address Hume's criticism with both logical and empirical evidence. We then take up the remaining challenge of explaining causation by omission in terms of a process. In particular, we will present a new process theory of causation, the force theory, which explains how causation by omission and commission can be given a unified characterization. In the last section, we describe empirical evidence in support of this new account of causation.
2 Defining causation in terms of outcomes
According to outcome theories, causal relations are specified in terms of the occurrence or non-occurrence of events or states, without regard to the nature of the process that produced those events or states (Ahn and Kalish 2000). Because the mechanism is left unspecified, such theories, in effect, avoid the problem raised by causation by omission since they do not attempt to specify the manner in which an effect might be caused by an absence. There are several classes of theories that specify causation in terms of outcomes. Here we quickly review some of these theories and how they account for causation by omission.
2.1 Probability raising models
According to probability raising models, a cause can be defined as an event that changes the probability of another event. The concept of CAUSE, in particular, as opposed to the concept of PREVENT, is usually associated with probability-changing events in which the probability of an effect in the presence of a cause, P(E|C), is noticeably greater than the probability of an effect in the absence of a cause, P(E|¬C) (Cheng and Novick 1991, 1992). When the probability of the effect in the presence of a cause is greater than in its absence, it is often said that the cause "raises the probability of the effect." For example, the probability
of traffic jams is greater in the presence of construction than in its absence, so according to the probability criterion of causation, construction causes traffic jams. In these theories, omissions are treated like commissions. For example, in probability-raising models, the statement Lack of water caused the plant to wilt – causation by omission – would entail that the probability of E given ¬C, P(E|¬C), is greater than the probability of E given C, P(E|C), that is, P(E|¬C) > P(E|C). Thus, causation by omission does not raise problems for these kinds of models.
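To make the criterion concrete, it can be restated as a short computation; the sketch below uses invented counts and helper names and is meant only to make the inequality explicit.

    # Sketch of the probability-raising criterion (invented counts).
    def probability_contrast(e_c, not_e_c, e_notc, not_e_notc):
        # P(E|C) - P(E|~C) estimated from co-occurrence counts.
        p_e_given_c = e_c / (e_c + not_e_c)
        p_e_given_notc = e_notc / (e_notc + not_e_notc)
        return p_e_given_c - p_e_given_notc

    # "Lack of water caused the plant to wilt": the omission itself is the
    # candidate cause, so here C stands for "watering absent".
    delta_p = probability_contrast(8, 2, 1, 9)   # 0.8 - 0.1 = 0.7
    print(delta_p > 0)                           # True: C raises P(E)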
2.2 Counterfactual theories of causation
Counterfactual theories offer a second type of outcome theory (Lewis 1973, 2000; see also Spellman and Mandel 1999). According to counterfactual theories, 'C causes E' holds if it is the case that if C had not occurred, E would not have occurred. Because this criterion is stated in terms of the occurrence and non-occurrence of events, it can be adapted to causation by omission. As argued by McGrath (2005), the meaning of the causal claim 'Not C causes E' would presumably map onto the conditional "if C had occurred, E would not have occurred" (e.g., If watering had occurred, wilting would not have occurred). As with probability raising models, counterfactual theories handle causation by omission without difficulty.
2.3 Mental model theory of causation
A third type of outcome theory is Goldvarg and Johnson-Laird's (2001) mental model theory. The model theory goes beyond other theories in not only characterizing CAUSE and PREVENT, but also distinguishing these two notions from ALLOW. According to the mental model theory, the notions of CAUSE, ALLOW, and PREVENT are associated with different combinations of possible co-occurrences. For example, a CAUSE relation is associated with a set of co-occurrences in which A is present and B is present (a b), A is absent and B is present (¬a b), and A is absent and B is absent (¬a ¬b). Applying NOT to the antecedent or consequent flips the states of affairs of the antecedents and consequents (respectively) in all of the possible co-occurrences. Thus, the meaning of "Not C causes E" would be given by a set of co-occurrences (¬a b), (a b), and (a ¬b). As with the previous outcome theories, causation by omission is handled quite easily by the mental model theory as simply a different set of possible co-occurring states.
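The model theory's treatment of omission can be made concrete with a small sketch; the encoding below (pairs of truth values for antecedent and consequent) is our illustration of the sets just described, not the authors' notation.

    # Sketch of the mental model sets (encoding is illustrative only).
    # Each possibility is a pair: (antecedent present?, consequent present?).
    CAUSE_MODELS = {(True, True), (False, True), (False, False)}  # (a b), (~a b), (~a ~b)

    def negate_antecedent(models):
        # Applying NOT to the antecedent flips its state in every possibility.
        return {(not a, b) for (a, b) in models}

    # "Not C causes E" comes out as (~a b), (a b), (a ~b):
    print(sorted(negate_antecedent(CAUSE_MODELS)))
    # [(False, True), (True, False), (True, True)]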
2.4 Causal Bayesian network theories
A fourth type of outcome theory is represented by causal Bayesian network theories of causation. In causal Bayesian networks, variables are connected
to one another with ‘arcs’, as in A → B. Each arrow in a causal Bayesian network is associated with a set of conditional probabilities in conjunction with assumptions about the effect of actual or hypothetical interventions (Woodward 2003, 2007; Sloman 2005; Schulz, Kushnir, and Gopnik 2007). A recent proposal by Sloman et al. (2009), the causal model theory, shows how a Bayesian network approach to causation might be applied to the representation of CAUSE, ALLOW, and PREVENT, including causation by omission. Sloman et al. (2009) frame their theory in terms of structural equations, which represent a particular way of instantiating a graph with arrows. For example, the graph A → B can be instantiated in a structural equation such as B := A (Sloman et al. 2009; see also Hitchcock 2001). According to their theory, the concept of ALLOW is associated with a different structural equation, namely, the claim A allows B is represented as B := A and X, in which the variable X is an assessor variable. The claim A prevents B is represented by the equation B := ¬A, which also represents causation by omission, that is, claims such as Not-A causes B. As with the other outcome theories, causation by omission falls out naturally from causal Bayesian network theories such as Sloman et al.’s (2009) causal model theory. 2.5
2.5 Challenges for outcome theories
Special assumptions are not required for the representation of causation by omission in outcome theories; hence, these theories provide a unified account of different kinds of causation. In addition, these theories define causation in terms that are relatively unambiguous rather than relying on constructs that are as undefined as the concept they seek to explain. Despite these strengths, there are several properties of causation that outcome theories cannot account for: temporal priority, mechanism, and spatio-temporal contiguity. Temporal priority is the property that causes must precede or occur simultaneously with their effects. In outcome theories, temporal priority needs to be stipulated (Hitchcock 2002) because the representations used in outcome theories do not require that the causal factors occur in a certain chronological order. In probability-raising theories, for example, a correlation between variables A and B can exist regardless of their temporal order. Outcome theories also do not explain the importance of mechanism in people’s causal knowledge, that is, why people expect non-contiguous events to be connected via a chain of intervening events. Correlations and counterfactual dependencies, for example, do not depend on the presence or absence of intervening links. Finally, outcome theories cannot motivate why spatio-temporal contiguity should be relevant to causation. Some researchers have suggested that spatio-temporal contiguity may be important to causation because of an innate bias to attend to these features (Schulz et al. 2007; see also Gopnik et al. 2004; Leslie and Keeble 1987), but such a proposal really offers no explanation, or actually concedes the
point: if attention to spatial and temporal contiguity is innate, it must be because it is an important property of causation. Outcome theories do not explain why this should be the case. In contrast to outcome theories, process theories are able to account for the properties discussed above. First, the property of temporal priority is motivated in process theories because a result can only occur after there has been an exchange of energy or force. Second, process theories motivate the need for mechanism. Forces impose a relatively "local" level of granularity. This local level of granularity implies that for indirect causal relationships, that is, for causal relationships between non-contiguous events, there must be a sequence of intermediate links, each contiguous to the next (Russell 1948). Thus, process theories entail the existence of causal mechanisms. This assumption has been strongly supported by work in psychology (Ahn and Bailenson 1996; Ahn and Kalish 2000; Ahn et al. 1995; see also Bullock, Gelman, and Baillargeon 1982; Shultz 1982). A third property of causation that is readily explained by process theories but not by outcome theories is the importance of spatio-temporal contiguity. The importance of spatial and temporal contiguity has been repeatedly observed in standard "billiard ball" type events in which one object hits and launches another into motion (Michotte 1963 [1946]). Much research has shown that the impression of causation is greatly diminished if the first object in the sequence does not make physical contact with the second object (Lesser 1977; Michotte 1963 [1946]; Oakes 1994; Spelke, Phillips, and Woodward 1995; Thinès, Costall, and Butterworth 1991), or if the second object moves after a significant delay (Kruschke and Fragassi 1996; Leslie 1984; Oakes 1994; Spelke et al. 1995; for a review, see Scholl and Tremoulet 2000). Process theories are able to explain the importance of spatio-temporal contiguity because they define causation in terms of forces, and physical forces require spatio-temporal contiguity (Wolff 2008). In sum, outcome theories offer unified accounts of both causation by commission and causation by omission, but they cannot explain the properties of temporal priority, mechanism, and spatio-temporal contiguity. Given that these properties are central to people's concept of causation, process approaches to causation remain viable, despite some limitations. Below, in addition to reviewing various process theories, we will propose how the limitations of process theories might be overcome.
3 Defining causation in terms of process
Process theories begin with the assumption that causation in the mind can be traced to the physical world (Dowe 2007; Wolff 2007). In almost all process theories described to date, causation involves a transmission of physical
quantities. For example, according to Aronson's (1971) transference theory, causation implies contact between two objects in which a quantity possessed by the cause (e.g., velocity, momentum, kinetic energy, heat, etc.) is transferred to the effect. Another transference theory is proposed by Fair (1979), who holds that causes are the source of physical quantities – energy and momentum – that flow from the cause to the effect. According to Salmon's (1994, 1998) invariant quantity theory, causation involves an intersection of world lines that results in the transmission of an invariant quantity. The proposals of Aronson, Fair, and Salmon come from philosophy. Similar proposals from psychology have been termed 'generative theories' of causation. According to Bullock et al. (1982), adults believe that causes bring about their effects by a transfer of causal impetus. Shultz (1982) suggests that causation is understood as a transmission between materials or events that results in an effect. According to Leslie (1984), physical causation is processed by a 'theory of bodies' that schematizes objects as bearers, transmitters, and recipients of a primitive notion of force. A recent proposal from philosophy breaks from earlier process models in not requiring that the transmission occur in only one direction. According to Dowe's conserved quantity theory (2000), there are two main types of causation: persistence (e.g., inertia causing a spacecraft to move through space) and interactions (e.g., the collision of billiard balls causing each ball to change direction). Causal interactions occur when the trajectories of two objects (essentially, Salmon's 'world lines') intersect and there is an exchange of conserved quantities (e.g., an exchange of momentum when two billiard balls collide). Unlike earlier theories, exchanges are not limited to a single direction (i.e., from cause to effect); however, even in the case of an exchange, at least part of the interaction involves a transmission from the cause to the effect. As discussed earlier, one of the strengths of process theories is that they explain why the properties of temporal priority, mechanism, and spatio-temporal contiguity are important to causation. However, these theories have been subject to at least two key criticisms. One of the criticisms was originally raised by Hume (1978 [1739]) and concerns the value of defining causation in terms of abstract notions such as force. The second criticism concerns the ability of process models to address causation by omission and related phenomena. In the following sections, we will address the challenges raised by these criticisms. With respect to Hume's challenge, we will discuss evidence that force is not necessarily abstract but rather something physical that people are able to apprehend through bodily senses. With respect to the problem of causation by omission, we will describe a new process theory of causation that explains how causation by omission can be specified in terms of causal processes.
4 Addressing Hume's challenge
An important potential criticism of the process theories of causation was originally raised by Hume (1978 [1739]), but is often repeated in current discussions of theories of causation (Cheng and Novick 1991, 1992; Cheng 1997; Schulz, Kushnir, and Gopnik 2007; Woodward 2007). According to Hume, defining causation in terms of force is circular because the notion of force cannot be defined without reference to causation itself. Further, the reason why the notion of force has no independent standing from the notion of cause is that forces cannot be directly observed. Below we provide evidence against Hume's criticism, first by showing that forces can be directly apprehended by our sensory systems. Second, we discuss behavioral and linguistic evidence that the notion of force does not depend on causation. Specifically, we review evidence that people are able to distinguish between events on the basis of forces. We conclude that people can represent the notion of force independent of the notion of CAUSE, and so Hume's argument that definitions of causation based on force are circular can be rejected. As argued by White (1999, 2006), causal understanding might begin in the somatosensory system, which includes kinesthesia, the sense of the body's position, weight, or movement, as well as skin pressure sensors. The somatosensory system allows for the direct detection of dynamic properties, such as mass, force, and energy (e.g., heat). In the motor control literature, this idea is reflected in what is referred to as an 'internal model,' that is, a representation that specifies the dynamics of the environment (Reinkensmeyer, Emken, and Crammer 2004; Kurtzer, Herter, and Scott 2005; Davidson and Wolpert 2004; Conditt, Gandolfo, and Mussa-Ivaldi 1997; Hinder and Milner 2003; Imamizu, Uno, and Kawato 1995; Kawato 1999; Ohta and Laboissière 2006; Milner and Franklin 2005; Papaxanthis, Pozzo, and McIntyre 2005; Shadmehr and Mussa-Ivaldi 1994). Our internal models are at work when, for example, we over-lift a suitcase that we think is full when it is, in fact, empty (Reinkensmeyer et al. 2004). One of the classic studies on internal models was conducted by Shadmehr and Mussa-Ivaldi (1994). Participants were instructed to move their hand from one point to another while holding onto a handle that was part of a robotic arm (i.e., a manipulandum). The robotic arm was programmed to generate forces that pushed the person's hand away from the target location. With repeated practice, people learned how to overcome the pressure of the robotic arm and to reach straight for the intended target. The key finding was the appearance of an after-effect once the force field (robotic arm) was removed: people's arm trajectories were distorted in the opposite direction of the previously applied force. The result can be explained as due to the formation of an internal model of the dynamics of the environment that continued to affect motion even after
the dynamics of the environment returned to normal. Similar findings have been observed in conditions of microgravity, that is, when people are asked to reach for targets when they are in weightless conditions of parabolic flight (Papaxanthis et al. 2005). Changes in the trajectories of their arms imply that people factor into their motion plans the effects of gravity and inertia. Several studies have also shown that expectations about the dynamic characteristics of the environment generalize to completely new movements (Conditt et al. 1997; Reinkensmeyer et al. 2004); such results are important because they suggest that the representation of the dynamics of the environment is not tied to a particular motor motion, but rather appears to exist as a central representation (Milner et al. 2007). Other research suggests that people are able to keep track of the dynamics associated with multiple objects in the same environment on the basis of visual or other sensory cues (Davidson and Wolpert 2004; Reinkensmeyer et al. 2004). This ability is implied by the ease with which people are able to switch from manipulating one object to another when performing everyday tasks (Milner et al. 2007). For example, raking leaves involves lifting up and pushing around leaves, branches, rakes, bags of leaves, wheelbarrows, etc. Not only are people able to represent the dynamics of individual objects, they are also able to combine them. It has been shown, for example, that people are able to anticipate the force needed to lift two objects together on the basis of having learned the force needed to lift each object individually (Davidson and Wolpert 2004). Further evidence for the independent representation of dynamic properties comes from research on the use of mental simulation in problem solving (Schwartz 1999; Schwartz and Black 1996). In Schwartz and Black (1996), participants were presented with two glasses of the same height, one narrow and the other wide. Each glass had a line on it to indicate a particular level of imaginary water. The level was at the same height for both glasses. In one condition, people were explicitly asked whether the water in the two glasses would pour out at the same or at different angles. In the second condition, people were asked to close their eyes and to tilt each glass until the point where they believed the water would reach the rim. The correct answer is that as the glasses are tilted, the water will pour out of the wide glass before it pours out of the narrow glass. In the explicit condition, people gave the correct answer only 15% of the time, whereas in the mental simulation condition, people were correct nearly 100% of the time. Amazingly, the process of mentally simulating the event led to more accurate knowledge of the world than predictions based on explicit prior beliefs. In Schwartz (1999), the task was modified slightly to address the question of whether people's simulations were based on kinematics or dynamics. In one condition, the participants were asked to imagine the glass was filled with water while in the other, they were asked to imagine the glass
Figure 11.1 Scene adapted from Freyd, Pantzer, and Cheng (1988) in which participants were asked to indicate whether the plant was located in the “same” position once a source of support was removed
was filled with molasses. If people's mental simulations of these events were based purely on kinematics, that is, purely in terms of the geometry of the glass and the liquids, their judgments about the angle at which the liquids should reach the rim should not differ. On the other hand, if their mental simulations of these events were based on dynamics, people should tilt the molasses glass farther than the water glass, since a greater angle is needed (in comparable amounts of time) to make the molasses move to the edge of the rim than is required for water. This is exactly what Schwartz found: people tilted the glass roughly 15° more in the molasses condition than in the water condition. Altogether, the results suggest that people are able to differentiate two events on the basis of dynamic quantities, supporting the view that dynamic quantities such as force can be represented independently of causation. One last piece of evidence for the independence of force and causation comes from a study in which no obvious causation was present, and yet people inferred the existence of forces. In this study, conducted by Freyd, Pantzer, and Cheng (1988), participants were presented with a scene depicting a potted plant sitting on a pedestal and positioned next to a window, which served as a point of reference (see fig. 11.1). The scene was then replaced by another scene in which the pedestal was removed, but the plant was in exactly the same position as in the earlier scene. This second scene was then replaced with a third scene in which the plant's position was shifted slightly (higher or lower) or remained exactly where it had been before. Freyd et al. (1988) reasoned that if people viewed the pedestal as exerting a force on the pot, then people might (implicitly) expect the plant to move downward due to the influence of gravity. The participants' task was to indicate whether the plant in the third display was in the same position as in the second. As predicted, participants were far more likely to report "same" to a shift in the downward than in the upward direction,
supporting the hypothesis that they viewed the pedestal as exerting a force on the potted plant, even in the absence of causation. In addition to behavioral studies, various patterns in language support the proposal that causation and force can be represented independently of each other. One such pattern is the existence of two relatively large classes of transitive verbs. Certain transitive verbs, lexical causatives, entail the occurrence of a causing and resulting event (e.g., bend, break, open, drain, melt, sink, bounce, roll, turn). Another class of transitive verbs, two-argument activities, also denote actions directed towards an entity (e.g., clobber, hammer, hit, jab, nudge, slam, shove, stroke, touch, wipe) but, unlike lexical causatives, they do not strictly entail the occurrence of a change of state or location (Levin and Rappaport Hovav 1994; Pinker 1989; Shibatani 1976b; Song 1996). The difference in meaning between these two classes is revealed when their (possible) results are explicitly denied (Shibatani 1976b). Whereas the possible result of a two-argument activity verb can be explicitly denied (e.g., John kicked the ice, but nothing happened to it), the possible result of a lexical causative verb cannot be denied without contradiction (e.g., *John melted the ice, but nothing happened to it). This difference in semantic entailment is often taken as evidence that lexical causatives encode for causation whereas two-argument activities do not (Shibatani 1976b; Wolff 2003). Nevertheless, the two verb classes share an important semantic component. As described by Levin (2007), the direct object of both types of verbs can be construed as instantiating the semantic role of force recipient, in other words, the target of a transmitted force. According to this analysis, two-argument activities encode for the transmission of force without coding for causation. This analysis is supported by intuition. When we say He hit the wall, this does not mean that something was caused; the wall likely remained the same as it was prior to the hitting. However, the sentence does imply that force was imparted since its meaning is quite different from the meaning of the sentence He touched the wall, another situation where nothing is caused. In sum, evidence from both behavioral studies and linguistic analysis indicates that people need not rely on an abstract notion of causation in order to understand the notion of force. Forces can be perceived directly through our senses, and the notions of force and causation appear to be conceptually separate from each other. As stated earlier, Hume's criticism that definitions of causation based on force are circular can be safely rejected.
5 Why causation by omission represents a problem for process theories, and a possible solution
A second major criticism that has been raised against process theories concerns their ability to address the phenomenon of causation by omission. Here we
first describe why this phenomenon is a problem for process theories. We then describe a new theory of causation that explains how causation by omission can be handled in terms of causal processes. Lastly, we review empirical evidence in support of this theory. The main criterion for causation in process theories is the transfer of energy or force. This assumption is clearly at odds with causation by omission. For example, when we say Lack of rainfall caused the drought, the cause in this claim, Lack of rainfall, is an absence, and obviously no force can be transmitted from an absence. The problem posed by causation by omission has led some philosophers to suggest that there may be two kinds of causation: productive, or positive, causation, in which there is a transfer of force or energy from the cause to the effect, and make-a-difference, or negative, causation, which preserves counterfactual dependencies but not the property of transitivity (Hall 2004; Godfrey-Smith forthcoming; Menzies 2004). Other philosophers have argued that causation by omission is not "really" causation (Beebee 2004; Dowe 2001). For example, Dowe (2001) views causation by omission as "quasi" causation because it does not involve an exchange of conserved quantities. In order to account for statements of causation by omission, Dowe (2001) adopts theoretical machinery from outcome theories, namely, counterfactuals. For other theorists, the inability of process theories to account for causation by omission indicates that such theories are fundamentally flawed (Schaffer 2000; Schulz et al. 2007; Woodward 2006). In contrast to this conclusion, we show that causation by omission can be specified in terms of causal processes. To account for causation by omission, we will adopt a proposal implied by several philosophers, namely that causation by omission can be handled in terms of double prevention (McGrath 2003; Foot 1967; McMahan 1993). To illustrate double prevention, imagine a situation in which a car is held up off the ground by a jack. A man pushes the jack aside, and the car falls to the ground. This scenario could be described as The man caused the car to fall to the ground (even though the force that causes the car to fall to the ground does not come from the force exerted by the man but rather from gravity). What transpires in this example is a sequence of PREVENT relations, or double prevention. First, the jack prevents the car from falling (due to gravity), and then the man prevents the jack from preventing the car from hitting the ground. Critically, instances of double prevention can be paraphrased as instances of causation by omission. As noted earlier, we can say the man caused the car to fall to the ground. Alternatively, instead of explicitly naming the man, we can refer to the prevention the man initiated: lack of a jack caused the car to fall to the ground. The way a double prevention is described will depend on whether people wish to focus on the entity that prevented the prevention or the absence caused by the prevention. The same idea can be illustrated with the example used earlier: Lack of rainfall caused the drought. According to our hypothesis,
claims of causation by omission imply double preventions. In this particular example, the implicit double prevention may have been that, for example, Climate change prevents rainfall and Rainfall prevents drought. Notice that this double prevention licenses the conclusion Climate change causes drought, or we could once again choose to highlight what was prevented by climate change and say Lack of rainfall causes drought. Once causation by omission is analyzed as double prevention, it is possible to specify how it might be represented in terms of forces, and hence by a process theory of causation, in particular, the force theory of causation. Our description of the force theory will have five main parts. First, we will introduce the force theory, including some of its assumptions. Second, we will describe how the theory represents individual causal relationships. Third, we will describe how the theory accounts for the joining of causal relations that allows for the generation of overarching relations, or relation composition. For example, in order to understand the meaning of a double prevention, two prevention relations must first be joined, which then results in a new causal relation. In a fourth section, we will explain how relation composition can sometimes lead to more than one conclusion, and how the theory explains the relative proportion of possible conclusions. Finally, we will describe in detail how the theory accounts for the representation of causation by omission and causation of an absence, in terms of the relation composition of PREVENT relations.
6 Force theory
According to the force theory, people specify causal relations in terms of configurations of forces, represented as vector quantities that can exert an influence. The forces may be physical, psychological (e.g., intentions), or social (e.g., peer pressure) (Wolff 2007). The force theory is primarily described at the algorithmic level; in other words, it is meant to account for the actual cognitive operations that people perform when they reason. However, certain aspects of the model are described at the computational level, that is, at a level that is intended simply to predict human performance while adopting computational procedures that are not psychologically plausible. We will explicitly note those parts of the theory that extend beyond the algorithmic to the computational level. We assume that the specification of forces must be partially symbolic, especially when the forces involved are psychological or social. They must also be symbolic in that the magnitudes of the vectors used in the representation of causal relations may be somewhat arbitrary (Wolff 2007). At the same time, it is also assumed that reasoning with forces is partially iconic. According to the force theory, vector representations in the mind support the same kinds
of processes that occur among forces in the world (e.g., vector addition, subtraction); thus, according to the theory, specifying causal relations involves a partial "re-enactment" of how forces interact in the world. The process of re-enactment in the mind begins with a specification of the quantities that produce causal relations, namely forces. A key test of the claim that the force theory specifies the quantities that produce causal relations is whether the representations specified by the model can be entered into a physics simulator to produce animations reflecting real-world events and whether the animations are recognized by people as causal in the ways specified by the theory. Such a test is described below. In being able to meet this test, the force theory provides an account of the representations that might drive the analog mental simulations that people "watch" in their minds to recognize and identify causal relations (see Hegarty 2004). In the force theory, uncertainty is built into the representation of causation because of uncertainty about the magnitudes of the vectors in a configuration of forces. For individual causal relations, this uncertainty will be of little or no consequence: while people may not know the precise magnitudes of the vectors involved, they can ascribe relative magnitudes, which is enough to classify a configuration of forces as causal, preventative, or allowing. However, as discussed below, when configurations of force are added together to form networks or chains of configurations, probabilistic outcomes will emerge from deterministic processes. Further important assumptions of the force theory are discussed in Wolff (2007).
6.1 Representing individual causal relations in the force theory
The force theory extends Wolff’s (2007) dynamics model of causation and Talmy’s (1988) theory of force dynamics in specifying how individual causal relations might be represented in configurations of force. According to the dynamics model, the concept of CAUSE and related concepts involve interactions between two main entities: an affector and a patient (the entity acted on by the affector). It holds that the different kinds of causal relationships can be specified in terms of three dimensions: (a) the tendency of the patient for an endstate, (b) the presence or absence of concordance between the affector and the patient, and (c) progress toward the endstate (essentially, whether the result occurs). Table 11.1 summarizes how these dimensions differentiate the concepts of CAUSE, HELP/ENABLE/ALLOW, and PREVENT. For example, according to the dynamics model, when we say High winds caused the tree to fall down, we mean that the patient (the tree) had no tendency to fall (Tendency = No), the affector (the wind) acted against the patient (Concordance = No) and the result (falling down) occurred (Endstate approached = Yes).
Table 11.1 Representations of several causal concepts

                       Patient tendency    Affector–patient    Endstate
                       for endstate        concordance         approached
CAUSE                  No                  No                  Yes
HELP/ENABLE/ALLOW      Yes                 Yes                 Yes
PREVENT                Yes                 No                  No
Figure 11.2 Configurations of forces associated with CAUSE, HELP/ENABLE/ALLOW, and PREVENT; A = the affector force, P = the patient force, R = the resultant force; E = endstate vector, which is a position vector, not a force
The dynamics model specifies how these three dimensions are captured in terms of configurations of force vectors. Sample configurations of forces for CAUSE, HELP/ENABLE/ALLOW, and PREVENT are depicted in fig. 11.2. As is customary, the free-body diagrams in fig. 11.2 show forces acting on only one object, in these cases, the patient entity. They do not show the location of the affector entity, only the direction and magnitude of the affector's force on the patient (i.e. A). Similarly, they do not show the location of the endstate, just the vector that points from the patient to the endstate (i.e. E). In each of the configurations shown in fig. 11.2, the patient entity is also associated with a force (i.e. P). If the patient were a boat, the patient force might correspond to the force generated by the boat's motor; if the patient were a rock on the ground, the patient force might correspond to the force of friction (or the tendency to resist movement in a particular direction due to frictional forces). When the patient has a tendency for the endstate, the patient vector, P, will point in the same direction as the endstate vector, E; when the patient does not have a tendency for the endstate, the patient vector will point in a different direction from E. When the patient and the affector are in concordance, their respective vectors will point in the same direction; otherwise, they will point in different directions. Finally, the patient entity will approach the endstate when the resultant of the A and P vectors, R, is in the same direction as the endstate vector, E. Support for the dynamics model's account of CAUSE, HELP/ENABLE/ALLOW, and PREVENT was provided in a series of experiments in which participants categorized 3-D animations of realistically rendered objects with trajectories that were wholly determined by the force vectors entered
into a physics simulator (Wolff 2007). (The animations can be viewed at http://userwww.service.emory.edu/~pwolff/CLSAnimations.htm.) In these experiments, the very same physical forces used to generate physical scenes were used as inputs into a computer model to predict how those scenes would be described. As reported in Wolff (2007) and Wolff and Zettergren (2002), the fit between the predictions of the model and people's descriptions of the events was strong.
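The classification criteria just described can be stated compactly in code. The sketch below is our reconstruction: it encodes the three dimensions as sign tests on 2-D vectors and maps the resulting pattern onto Table 11.1; the helper names and example numbers are invented.

    # Sketch of the dynamics model's classification rules (Table 11.1).
    def dot(u, v):
        return u[0] * v[0] + u[1] * v[1]

    def classify(A, P, E):
        tendency    = dot(P, E) > 0        # P points toward the endstate
        concordance = dot(A, P) > 0        # A and P point the same way
        R = (A[0] + P[0], A[1] + P[1])     # resultant of A and P
        approached  = dot(R, E) > 0        # R points toward the endstate
        return {(False, False, True): "CAUSE",
                (True, True, True):   "HELP/ENABLE/ALLOW",
                (True, False, False): "PREVENT"}.get(
                    (tendency, concordance, approached), "uncategorized")

    # "High winds caused the tree to fall": the tree (P) resists falling,
    # the wind (A) opposes it, and their resultant carries the tree over.
    print(classify(A=(3.0, 0.0), P=(-1.0, 0.0), E=(1.0, 0.0)))   # CAUSE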
6.2 Combining relations in the force theory
Whereas the dynamics model is an account of how people represent individual relations, the force theory is an account of how people generate new relations by combining them, a process referred to in mathematics as relation composition. For example, given that nerve damage causes pain and pain causes lost workdays, an overarching causal relation can be generated via the composition of the two cause relations to produce nerve damage causes lost workdays. In the following sections we show how these two parts of relation composition – the joining of relations and the generation of summary conclusions – are accomplished. In the force theory, the mechanism for combining relations depends on whether the initial relation in a pair of relations is generative or preventative. If the initial relation is generative (CAUSE, HELP or ALLOW), then the first relation is connected to the second relation by using the resultant of the first relation as the affector in the second relation. For example, as shown in fig. 11.3, if the causal chain involves a sequence of CAUSE relations, the resultant in the first premise (BA) is the affector in the second premise (BBA). This sequence can be exemplified by a chain of three marbles, A, B, and C, in which A first hits B, which in turn hits C. The force that sends B into motion is based on the resultant of the force acting on A moving towards B, minus the force slowing it down (i.e. friction); in other words, the force acting on B would be based on the resultant of the forces acting on A. The force acting on C would, in turn, be based on the resultant of the forces acting on B. When the initial relation in a pair of relations is preventative (PREVENT), the process of relation composition proceeds in a different manner. Note that if A first prevents B, B cannot act on C because B has been prevented. The way such chains are understood, then, is that an interaction first occurs between B and C, and then A acts on B. In terms of forces, as shown in fig. 11.3, when a PREVENT relation is followed by another relation (CAUSE, ALLOW, or PREVENT), the resultant of the second premise (CB) is the patient vector in the first premise (BCB). The intuition behind this way of connecting the premises can be illustrated with a real-world example of double prevention: pulling a plug to allow water to flow down the drain. This sequence of prevents begins
Figure 11.3 On the left side, two CAUSE relations are combined using the resultant force from the first cause relation (BA) as the affector force in the second cause relation (BBA). On the right side, a PREVENT relation is combined with another PREVENT relation using the resultant of the PREVENT relation in the second premise as the patient vector in the PREVENT relation in the first premise
Figure 11.4 The affector force in the conclusion, A, is the affector force in the first relation, A. The endstate in the conclusion is the endstate vector from the last premise. The patient force in the conclusion, C, is based on the vector addition of the patient forces, B and C in the premises
with the plug (B) preventing the water (C), that is, with the second premise in a double prevention. With this prevention in place, the next step is for someone (A) to pull the plug (B), which constitutes the first prevent relation. Thus, in double prevention, the order of causation, in a sense, occurs in reverse order of the premises. In the theory, this reversal is captured by using the resultant force in the second premise as the patient vector in the first premise.
6.3 Generating a conclusion
Regardless of the type of causal chain, the manner in which an overall conclusion is reached is the same. As depicted in fig. 11.4, the affector in the conclusion is the affector from the first premise; the endstate in the conclusion is the endstate from the last premise; and the patient in the conclusion is the resultant of the patient vectors in the premises. Intuitively, the patient vector specifies what would happen in the absence of the affector force since it is based on all of the forces in the chain except the affector force. The patient force in the conclusion allows people to evaluate the truth value of the counterfactual if A were not present, B would not occur. According to some theorists,
evaluating such a counterfactual is a key part of establishing a causal relation (see Lewis 1973, 2000; Mackie 1974; see also Mandel and Lehman 1996; Spellman and Mandel 1999; Spellman, Kincannon, and Stose 2005). The force theory specifies the knowledge that allows such counterfactuals to be evaluated.
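The conclusion-generation rule lends itself to a one-dimensional sketch: with the endstate placed in the +x direction, each force reduces to a signed magnitude, the conclusion takes its affector from the first premise and sums the patient vectors, and the resulting configuration is classified as in Table 11.1. The numbers and names below are invented for illustration.

    # Sketch of conclusion generation in one dimension (endstate = +x).
    def classify_1d(a, p):
        tendency    = p > 0        # patient tends toward the endstate
        concordance = a * p > 0    # affector and patient concord
        approached  = a + p > 0    # resultant points toward the endstate
        return {(False, False, True): "CAUSE",
                (True, True, True):   "HELP/ENABLE/ALLOW",
                (True, False, False): "PREVENT"}.get(
                    (tendency, concordance, approached), "uncategorized")

    def conclude(affector_first, patient_vectors):
        # Affector from the first premise; patient = sum of patient vectors.
        return classify_1d(affector_first, sum(patient_vectors))

    # Marble A strikes B, which strikes C; friction supplies the patient
    # forces opposing motion toward the endstate.
    print(conclude(3.0, [-1.0, -1.0]))   # CAUSE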
6.4 Accounting for multiple conclusions
Prior research indicates that people's representations of forces are underspecified with respect to magnitude (Wolff 2007). Not knowing the exact magnitude of the forces adds indeterminacy to people's representations of causation. The effects of this indeterminacy also emerge when configurations of force are combined: variations in the magnitudes of the forces can sometimes lead to more than one summary conclusion or to a conclusion of reduced strength due to the presence of un-categorizable summary configurations. One type of relation composition that can lead to multiple conclusions is that of double prevention. As argued by several researchers, the composition of two PREVENT relations can sometimes lead to a CAUSE relation and other times to an ALLOW relation (McGrath 2003; Barbey and Wolff 2006, 2007; Sloman et al. 2009; Chaigneau and Barbey 2008). The intuition behind this claim can be illustrated with everyday instances of double prevention. Imagine, for example, causing/allowing a pencil to fall to the floor by letting go of it. Initially, there is a PREVENT relationship between your hand and the pencil: your hand "prevents" the pencil from falling to the floor. With this pre-condition in place, you open your hand, thereby preventing the prevention. The magnitudes of the forces are unclear in this example and, as a consequence, the double prevention is open to either a CAUSE or ALLOW interpretation: we can say either I allowed the pencil to fall to the floor or I caused the pencil to fall to the floor. The force theory predicts that double preventions can lead to either CAUSE or ALLOW conclusions, depending on the magnitude of the patient vectors in the premises. As shown on the left side of fig. 11.5, double preventions lead to CAUSE interpretations when the BCB patient vector is greater in magnitude than the C patient vector; as a consequence, the patient vector in the conclusion points away from the endstate when these two vectors are added together. When the relative magnitude of these two vectors is reversed, as in the right side of fig. 11.5, the conclusion is ALLOW. Given that a sequence of PREVENT relations can lead to either a CAUSE or an ALLOW conclusion, this raises the more general question of which conclusion is more likely. One way to find out is to systematically vary the magnitudes of the vectors and then tally the number of conclusions that follow from the different combinations of magnitudes. A program has been written that conducts such a simulation process (http://userwww.service.emory.edu/~pwolff/Transitivedynamics.htm). One part of the program allows users to create chains
Figure 11.5 The composition of two PREVENT relations can lead to either a CAUSE or an ALLOW conclusion
of any length and of any combination of relations by manipulating the magnitudes of vectors with the mouse. A second part of the program implements a simulation process that exhaustively varies the magnitudes of the vectors (assuming a uniform distribution) within the constraints set by the relations in the premises. The program then tallies the percentages of conclusions that are generated from all of the possible combinations of magnitudes. Using this procedure, the program finds, for example, that the conclusions associated with double prevention are 62% ALLOW and 38% CAUSE. A second way to determine the proportions of different conclusions is to use integral calculus, as described in Barbey and Wolff (ms.).
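The exhaustive tally can be approximated by Monte Carlo sampling. The sketch below is our reconstruction of the procedure for double prevention, in one dimension with the endstate in the +x direction, sampling magnitudes uniformly as the text assumes; the particular constraint encoding is an assumption on our part.

    # Monte Carlo sketch of the double-prevention tally (our encoding).
    import random

    def tally_double_prevention(n=100_000):
        counts = {"CAUSE": 0, "ALLOW": 0}
        accepted = 0
        while accepted < n:
            a, b, c = (random.random() for _ in range(3))
            # Premise 2, "B prevents C": C (+c) tends toward the endstate,
            # B (-b) opposes it and wins, so b > c.
            if not b > c:
                continue
            # Premise 1, "A prevents B": the resultant of premise 2, c - b,
            # serves as the patient vector; A (+a) opposes it and wins.
            if not a + (c - b) > 0:
                continue
            # Conclusion patient = (c - b) + c (fig. 11.5); its sign decides
            # between ALLOW (toward the endstate) and CAUSE (away from it).
            counts["ALLOW" if (c - b) + c > 0 else "CAUSE"] += 1
            accepted += 1
        return {k: round(v / n, 3) for k, v in counts.items()}

    print(tally_double_prevention())
    # Under these assumptions the tallies converge near 62% ALLOW and
    # 38% CAUSE, the proportions reported for the program above.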
6.5 Representing ALLOW/ENABLE
We propose that the concepts of ALLOW and ENABLE are complex relations derived from the composition of two PREVENT relations. In other words, when the composition of two PREVENT relations results in a conclusion in which the affector and patient vectors both point toward the endstate, the resulting conclusion is interpreted as either ALLOW or ENABLE. The idea that ALLOW and ENABLE are based on a series of PREVENT relations is consistent with prior work in philosophy (McGrath 2003; Foot 1967; McMahan 1993), psychology (Barbey and Wolff 2006, 2007; Sloman et al. 2009; Chaigneau and Barbey 2008; Wolff 2007), and linguistics (Talmy 1988). One of the benefits of defining ALLOW/ENABLE in terms of double prevention is that it accounts for the intuition that the affector in an ALLOW relation is a necessary condition for the result (Reinhart 2002; Goldvarg and Johnson-Laird 2001). In generating the predictions for different kinds of causal compositions, ALLOW relations will be treated as double preventions.
7 Explaining causation by omission
As discussed earlier, the way the force theory handles causation by omission is in terms of double prevention. Consider, again, the example of double
prevention in which a person pulls a plug and allows water to flow down the drain. Instead of focusing on the entity that prevented the prevention, the same situation could be described in terms of the absence that was created, specifically, Absence of a plug allowed/caused water to flow down the drain. Similarly, in the pencil scenario, we could say Lack of support allowed/caused the pencil to fall to the floor. As discussed earlier, how a double prevention is described will depend on whether the speaker wishes to focus on the entity/event that prevented the prevention or the absence caused by the prevention.

As noted earlier, once causation by omission is analyzed as double prevention, it becomes possible to specify how it might be represented in terms of forces and, hence, by a process theory of causation. For example, consider the chain of forces and the frames from an animation generated from the forces in fig. 11.6. The chain of forces at the top of fig. 11.6 instantiates double prevention. Relevant information about these forces was entered into a physics simulator to produce the animation depicted in fig. 11.6. In the beginning of the animation (left panel), car C approaches the line. Car B then approaches car C and prevents it from crossing the line (middle panel). Car A then pulls car B away, preventing the prevention (middle panel). Finally, with car B out of the way, car C crosses the line (right panel). As with the examples above, this sequence of preventions leads to a positive outcome. However, in this case, it seems more natural to say Car A allowed car C to cross the line than to say Car A caused car C to cross the line.1 In any case, the double prevention depicted in fig. 11.6 can be described in two ways: Car A allowed car C to cross the line or The absence of car B allowed car C to cross the line.

Importantly, although car B exerts a force on car C, and car A exerts a force on car B, these two forces are in opposite directions. Because they are in opposite directions, the force car A exerts on car B is not transferred from car B to car C. Rather, the force that car A exerts on car B removes the force that car B exerts on car C. Further, note that there is no energy tradeoff between cars A and C. When energy is transferred, the energy gained by one entity is balanced by the energy lost by another. In the animation depicted in fig. 11.6, as car A accelerates, so does car C, indicating that kinetic energy is not transferred from car A to car C. These observations are important because they demonstrate that transmission of force is not a necessary condition of causation; causation can also come about from the removal of a force. The idea that causation need not involve a transmission of force conflicts with a major assumption of previous process theories.
1 In the literature on double prevention, it has been noted that the sequence resulting from a double prevention sometimes seems causal, while at other times it does not (Hall 2004; Livengood and Machery 2007). In those cases where it sounds odd to describe a double prevention in terms of “cause,” an “allow” paraphrase is often acceptable. Notice that Car A allowed car C to cross the line can also be described as The absence of car B allowed car C to cross the line.
Figure 11.6 The configuration of forces in the top panel, which depicts a PREVENT ◦ PREVENT composition, was entered into a physics simulator to produce the movements of the cars in the animation depicted in the still frames in the bottom panel. First, car C attempts to cross the line but is prevented by car B, which approaches car C. Then, car A pulls car B away from car C with a rope, preventing car B from preventing car C. Finally, with car B out of the way, car C crosses the line
The force theory explains how this assumption can be abandoned by process theories while maintaining the commitment to the centrality of energy and force in people’s representations of causation. As discussed earlier, the reason why causation by omission has been a problem for process theories is that these theories defined causation in terms of transmission. Removing this assumption makes it possible for process theories to account for causation by omission as well as several other phenomena, including the meaning of ALLOW and the causation of absences (see Barbey and Wolff ms.).
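The arithmetic behind the removal claim is worth making explicit. In the following sketch (Python; the numerical values are illustrative only, not the forces actually entered into the simulator), car A’s force never enters the sum of forces acting on car C; its sole effect is to take car B’s opposing force out of that sum.

# Net force on car C before and after car A pulls car B away.
# Positive values point toward the line (the endstate); all numbers
# are illustrative, not the simulator's actual inputs.
c_tendency = 4.0   # car C's own force toward the line
b_on_c = -6.0      # car B's opposing force on car C
a_on_b = 9.0       # car A's pull on car B; it acts on B, never on C

net_on_c_before = c_tendency + b_on_c   # -2.0: car C is prevented
net_on_c_after = c_tendency             # +4.0: B's opposition removed

# a_on_b appears in neither sum: no force is transmitted from A to C.
print(net_on_c_before, net_on_c_after)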
In the following section, we describe two lines of support for the force theory and its account of causation by omission. In one line, we show that the force theory is able to predict the conclusions people generate for a wide range of causal compositions, including those involving causation by omission. In a second line, we show that the force theory is able to account for the kinds of expressions people choose to use when describing animations of physical events.
8 Evidence in support of the force theory
The force theory is capable of explaining the relational composition of a wide range of causal chains. Some of the predictions of the theory were tested in an experiment reported in Barbey and Wolff (2006; ms.). In that experiment, participants (N = 40) read two-relation causal chains constructed from real-world causal statements found on the internet. For example, for the argument A causes B and B causes C, participants read sentences like Factories cause pollution, Pollution causes global warming. They also read sentences involving causation by omission. For example, for the argument not-A causes not-B and B causes C, people saw statements like Leaf loss causes lack of shade and Shade causes cooling. Six real-world instantiations were found for each of the thirty-two argument types shown in table 11.2, for a total of 192 arguments. In table 11.2, the columns show all possible 1st relations and the rows show the different possible 2nd relations. After reading the causal chain, participants chose the relation between the unconnected A and C terms that sounded most natural from a list of ten possible conclusions (A causes C, A allows C, A prevents C, A causes not-C, A allows not-C, A prevents not-C, not-A causes C, not-A allows C, not-A prevents C, or none of the above). In the actual experiment, the A, B, and C terms were filled in with the terms named in the causal chain.

The results in table 11.2 show that participants’ relational compositions were well explained by the force theory. As a measure of fit, the average Pearson correlation between the mean response to each item and the force theory was .86, a correlation that was significantly different from chance, p < .0001. According to the force theory, double preventions should lead to ALLOW responses 62% of the time and CAUSE responses 38% of the time. As shown in table 11.2, the results supported this prediction: the most frequent response to double preventions was ALLOW, and ALLOW responses were chosen more often than CAUSE responses, 39% to 8%. As shown in table 11.2, fourteen of the thirty-two causal chains involved causation by omission. The force theory correctly predicted the most frequent response for twelve of these chains, a result that was significantly greater than chance by a binomial test, p = .013. The results support the hypothesis that the force theory describes the processes that people use to compose causal relations, including causal compositions involving causation by omission.
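The reported binomial test is straightforward to reconstruct. The sketch below (our own verification, assuming a two-sided test against a chance level of .5) recovers the reported p-value for twelve correct predictions out of fourteen:

from math import comb

def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value: the summed probability of all
    outcomes no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return sum(q for q in pmf if q <= pmf[k] + 1e-12)

# 12 of the 14 omission chains correctly predicted, chance = .5 (assumed)
print(round(binomial_two_sided(12, 14), 3))  # 0.013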
Table 11.2 Results from Experiment 1 of Barbey and Wolff (ms.) showing the predicted (in bold) and observed percentages of response types. The columns give the 1st relation of each chain (A causes B, A allows B, A prevents B, ¬A causes B, A causes ¬B, A allows ¬B, A prevents ¬B, ¬A causes ¬B); the rows give the 2nd relation (B causes C, B allows C, B prevents C, ¬B causes C). Note: C = Cause, A = Allow, P = Prevent, ¬C = Not A causes B, ¬A = Not A allows B, ¬P = Not A prevents B; ∗ = missed prediction
The results described above show that the force theory is consistent with the way that people understand and express causal relations. However, one of the most important features of the force theory is that it offers an account of how people might be able to recover causal relations from the world on the basis of a single observation. This prediction was examined in detail – and supported – for single relations in both physical and social domains in Wolff (2007). The force theory also predicts that it should be possible to interpret causal chains on the basis of a single observation. In addition, the theory explains how the phenomenon of double prevention – and by extension, the concept of ALLOW and causation by omission – can be instantiated in physical processes. These predictions were tested in the following experiment.

Four 3D animations were made with an animation package called 3D Studio Max (ver. 8). Each animation involved three cars, labeled A, B, and C, acting on each other by pushing and pulling, and a line on the ground, which defined an endstate.
Table 11.3 Percentage of responses for four types of causal chains

                      CAUSE/CAUSE  CAUSE/PREVENT  P/P-CAUSE  P/P-ALLOW
A caused C                90%           –            47%        10%
A allowed C               6.7%          –            43%        70%
A prevented C             –             90%          –          –
None of the above         3%            –            10%        20%

B caused C                63%           –            7%         7%
B allowed C               27%           –            17%        27%
B prevented C             –             87%          13%        17%
None of the above         10%           13%          63%        50%

Not A caused C            3%            –            7%         –
Not A allowed C           7%            7%           7%         13%
Not A prevented C         3%            13%          17%        13%
None of the above         87%           80%          70%        73%

Not B caused C            23%           3%           7%         7%
Not B allowed C           13%           10%          50%        50%
Not B prevented C         10%           23%          3%         –
None of the above         52%           56%          40%        43%
The four types of animations were a CAUSE/CAUSE chain, a CAUSE/PREVENT chain, and two kinds of PREVENT/PREVENT chains. The directions and magnitudes of the cars’ movements were calculated using a physics simulator called Havok Reactor. The input into the physics simulator consisted of forces generated by the computer version of the force theory. The average animation lasted 8 seconds.

Each time an animation was shown, participants (N = 30) were presented with four possible descriptions. In condition 1, participants chose the best description of the animation from a list of four options: (a) A caused C to cross the line, (b) A allowed C to cross the line, (c) A prevented C from crossing the line, and (d) None of the sentences above are applicable to the scene. The options in condition 2 were the same as those in condition 1 except that all of the As were replaced with Bs; for example, for the CAUSE option, participants were given the sentence B caused C to cross the line. In condition 3, the sentences were also the same as in condition 1, except that the cause was described in terms of its absence; for example, for the CAUSE option, participants were given the sentence The absence of A’s influence caused C to cross the line. In condition 4, the sentences were the same as in condition 3, except that the As were replaced with Bs; hence, in this condition, the CAUSE option was The absence of B’s influence caused C to cross the line. The four conditions were run within participants; in other words, participants saw each animation four times, each time with a different set of options.
In the case of the CAUSE/CAUSE chain (C/C), car A hit car B, which then hit car C, pushing it over the line. For this animation, we predicted that people would choose the description Car A caused car C to cross the line. In the case of the CAUSE/PREVENT chain (C/P), car A hit car B, which then hit car C, blocking it from crossing over the line. For this animation, we predicted that people would choose the description Car A prevented car C from crossing the line. As stated above, there were two PREVENT/PREVENT chains. For the first of these chains, P/P-CAUSE, the relative magnitudes of the forces were such that they implied a CAUSE response (see left side of fig. 11.5). For the second PREVENT/PREVENT chain, P/P-ALLOW, the relative magnitudes of the forces were such that they implied an ALLOW response (see right side of fig. 11.5). The P/P-ALLOW animation was the same as the one depicted in the bottom of fig. 11.6. The key predictions with regard to the two PREVENT/PREVENT chains were that the proportion of CAUSE responses would be higher for the P/P-CAUSE chain than for the P/P-ALLOW chain, and that the proportion of ALLOW responses would be higher for the P/P-ALLOW chain than for the P/P-CAUSE chain. Finally, for the PREVENT/PREVENT chains, but not the other chains, we predicted that people would be willing to describe the causation in terms of absences, specifically, that for the two P/P chains, they would be willing to say The absence of B’s influence caused/allowed C to cross the line.

The results in table 11.3 show that these predictions were supported. For the C/C chain, people were very happy to say that Car A caused car C to cross the line as well as Car B caused car C to cross the line. For the C/P chain, participants indicated that Car A prevented car C from crossing the line, and that Car B prevented C from crossing the line. Of primary interest was how people described the causal chains implementing double prevention. As predicted, people preferred to describe the relationship between the A and C cars in the P/P-ALLOW chain with an ALLOW description, but they were also willing to use a CAUSE description. For the P/P-CAUSE chain, people split in their preference between CAUSE and ALLOW descriptions; importantly, though, the number of CAUSE descriptions for P/P-CAUSE chains was higher than for the P/P-ALLOW chains. Of particular interest, people were willing to describe the P/P chains – but not the others – in terms of the absence of an influence. For both types of P/P chains, participants were happy to say The absence of B’s influence allowed C to cross the line. The results support the hypothesis that people conceptualize causation by omission in terms of double prevention.
9 Summary
In this chapter we contrasted two general approaches to causation. We argued that outcome theories define causation in terms of the outward signs of causal
relationships, while process approaches define causation in terms of the processes that produce causal relationships in the world. We argued that the quantities that are essential to causation in the world are also central to causation in the mind. In support of this position, we discussed how several of the most prominent characteristics of causation – spatial and temporal contiguity, mechanism, and temporal priority – fall out naturally from representations of causation that are based on dynamics, but do not fall out of representations that specify the outward signs of causation, loosely speaking, their kinematics. In further support of process theories, we reviewed literature suggesting that people are able to represent the dynamic properties of the environment. Finally, we addressed the most serious challenges that have been raised against process approaches to causation, namely Hume’s criticism that theories based on force are circular and the problem of how to represent causation by omission. We showed that these challenges can be handled by thinking about causation in terms of configurations of forces. The key insight is that causation is based not only on the transmission of force, but also on its removal.

In support of this proposal, we described the results of two experiments. In one, we showed that a particular process model, the force theory, was able to predict the relational compositions associated with linguistic descriptions. In a second experiment, we demonstrated the ability of the force theory to predict the composition of relations from visually presented stimuli and, in particular, its ability to differentiate double preventions that are typically construed as CAUSE relations from those that are typically construed as ALLOW relations. In addition, we showed that both types of double prevention chains can be described in terms of the absence of a force.
10 Conclusions
Much of the recent research on causal cognition has been dominated by outcome theories of causation, that is, by the proposal that causal knowledge is specified in terms of actual or possible outcomes. The emphasis is understandable, since outcome theories have been able to address in a rigorous fashion some of the most interesting aspects of causal cognition, in particular the problems of how people put causal relationships together into larger structures and then reason over those structures. The force theory represents the first process theory to address these more demanding problems. In the force theory, we know the relevant entities (patient, affector, endstate), their assignment to representational elements of causal knowledge (configurations of force), their meaning (quantitative force vectors), and how their structure maps (sometimes directly) onto the structure of causal events in the world. The theory thus demonstrates for the first time how our understanding of forces in the world might provide the basis for our ability to represent and reason about causal relations.
References
Ahn, W. and Bailenson, J. 1996. ‘Mechanism-based explanations of causal attribution: An explanation of conjunction and discounting effect’, Cognitive Psychology 31: 82–123.
Ahn, W. and Kalish, C. W. 2000. ‘The role of mechanism beliefs in causal reasoning’, in F. C. Keil and R. A. Wilson (eds.), Explanation and Cognition, pp. 199–225. Cambridge, MA: MIT Press.
Ahn, W., Kalish, C. W., Medin, D. L., and Gelman, S. A. 1995. ‘The role of covariation versus mechanism information in causal attribution’, Cognition 54: 299–352.
Aikhenvald, A. Y. and Dixon, R. M. W. (eds.) 2006. Serial Verb Constructions: A Cross-linguistic Typology. Oxford: Oxford University Press.
Alibali, M. W., Kita, S., and Young, A. J. 2000. ‘Gesture and the process of speech production: We think, therefore we gesture’, Language and Cognitive Processes 15(6): 593–613.
Alsina, A., Bresnan, J., and Sells, P. (eds.) 1997. Complex Predicates. Stanford: CSLI Publications.
Ameka, F. K. 2005a. ‘Ewe serial verb constructions in their grammatical context’, in A. Y. Aikhenvald and R. M. W. Dixon (eds.), Serial Verb Constructions: A Cross-linguistic Typology, pp. 124–43. Oxford: Oxford University Press.
2005b. ‘Multiverb constructions on the West African littoral: Micro-variation and areal typology’, in M. Vulchanova and T. A. Åfarli (eds.), Grammar and Beyond: Essays in honor of Lars Hellan, pp. 15–42. Oslo: Novus Press.
Ameka, F. K. and Levinson, S. C. (eds.) 2007. ‘Special issue on locative predicates’, Linguistics 45(5/6).
Andersen, E. S. 1978. ‘Lexical universals in body-part terminology’, in J. H. Greenberg (ed.), Universals of Human Language, vol. 3, pp. 335–68. Stanford, CA: Stanford University Press.
Aronoff, M., Meir, I., and Sandler, W. 2005. ‘The paradox of sign language morphology’, Language 81(2): 301–44.
Aronoff, M., Meir, I., Padden, C., and Sandler, W. 2003. ‘Classifier constructions and morphology in two sign languages’, in K. Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages, pp. 53–84. Mahwah, NJ: Lawrence Erlbaum Associates.
Aronson, J. L. 1971. ‘On the grammar of “CAUSE”’, Synthese 22: 414–30.
Aske, J. 1989. ‘Path predicates in English and Spanish: A closer look’, Proceedings of the Fifteenth Annual Meeting of the Berkeley Linguistics Society, pp. 1–14. Berkeley, CA: Berkeley Linguistics Society.
Bach, E. 1981. ‘On time, tense, and aspect: An essay in English metaphysics’, in P. Cole (ed.), Radical Pragmatics, pp. 62–81. New York: Academic Press.
Baldwin, D. 2005. ‘Discerning intentions: Characterizing the cognitive system at play’, in B. D. Homer and C. S. Tamis-LeMonda (eds.), The Development of Social Cognition and Communication, pp. 117–44. Mahwah, NJ: Lawrence Erlbaum.
Barbey, A. K. and Wolff, P. 2006. ‘Causal reasoning from forces’, Proceedings of the 28th Annual Conference of the Cognitive Science Society, pp. 24–39. Mahwah, NJ: Lawrence Erlbaum.
2007. ‘Learning causal structure from reasoning’, Proceedings of the 29th Annual Conference of the Cognitive Science Society, pp. 713–18. Mahwah, NJ: Lawrence Erlbaum.
ms. ‘Composing causal relations in force dynamics’, manuscript, Emory University.
Barwise, J. and Perry, J. 1983. Situations and Attitudes. Cambridge, MA: MIT Press.
Bavelas, J. B., Kenwood, C., Johnson, T., and Phillips, B. 2002. ‘An experimental study of when and how speakers use gestures to communicate’, Gesture 2(1): 1–17.
Beattie, G. and Shovelton, H. 1999a. ‘Do iconic hand gestures really contribute anything to the semantic information conveyed by speech?’ Semiotica 123(1/2): 1–30.
1999b. ‘Mapping the range of information contained in the iconic hand gestures that accompany spontaneous speech’, Journal of Language and Social Psychology 18(4): 438–62.
2002. ‘An experimental investigation of some properties of individual iconic gestures that mediate their communicative power’, British Journal of Psychology 93(2): 179–92.
Beebee, H. 2004. ‘Causing and Nothingness’, in J. Collins, N. Hall, and L. Paul (eds.), Causation and Counterfactuals, pp. 291–308. Cambridge, MA: MIT Press.
Berlin, B. and Kay, P. 1969. Basic Color Terms. [Paperback edition, reprinted 1999] Berkeley, CA: University of California Press.
Berman, R. A. and Slobin, D. I. 1994a. ‘Filtering and packaging in narrative’, in R. Berman and D. I. Slobin (eds.), Relating Events in Narrative: A Crosslinguistic Developmental Study, pp. 515–54. Hillsdale, NJ: Lawrence Erlbaum.
1994b. Relating Events in Narrative: A Crosslinguistic Developmental Study. Hillsdale, NJ: Lawrence Erlbaum.
Berthele, R. 2004. ‘The typology of motion and posture verbs: A variationist account’, in B. Kortmann (ed.), Dialectology Meets Typology. Dialect Grammar from a Cross-linguistic Perspective, pp. 93–126. Berlin: Mouton de Gruyter.
2006. Ort und Weg. Die sprachliche Raumreferenz in Varietäten des Deutschen, Rätoromanischen und Französischen. Berlin: Mouton de Gruyter.
Biederman, I. 1972. ‘Perceiving real world scenes’, Science 177: 77–80.
1987. ‘Recognition-by-components: A theory of human image understanding’, Psychological Review 94: 115–47.
Biederman, I., Rabinowitz, J., Glass, A. L., and Stacy, E. W. Jr. 1974. ‘On the information extracted from a glance at a scene’, Journal of Experimental Psychology 103: 597–600.
Bock, K. and Levelt, W. J. M. 1994. ‘Language production. Grammatical encoding’, in M. A. Gernsbacher (ed.), Handbook of Psycholinguistics, pp. 405–52. New York: Academic Press.
Bock, K., Irwin, D. E., Davidson, D. J., and Levelt, W. J. M. 2003. ‘Minding the clock’, Journal of Memory and Language 48: 653–85.
Bohnemeyer, J. 1999. ‘A questionnaire on event integration’, in D. P. Wilkins (ed.), “Manual” for the 1999 Field Season, pp. 87–95. Nijmegen: Max Planck Institute for Psycholinguistics.
2003. ‘The unique vector constraint’, in E. van der Zee and J. Slack (eds.), Representing Direction in Language and Space, pp. 86–110. Oxford: Oxford University Press.
2004. ‘Split intransitivity, linking, and lexical representation: the case of Yukatek Maya’, Linguistics 42(1): 67–107.
Bohnemeyer, J. and Caelen, M. 1999. ‘The ECOM clips: A stimulus for the linguistic coding of event complexity’, in D. P. Wilkins (ed.), ‘Manual’ for the 1999 Field Season, pp. 74–86. Nijmegen: Max Planck Institute for Psycholinguistics.
Bohnemeyer, J. and Majid, A. 2002. ‘ECOM causality revisited version 4’, in S. Kita (ed.), 2002 Supplement (Version 3) for the ‘Manual’ for the Field Season 2001, pp. 35–58. Nijmegen: Max Planck Institute for Psycholinguistics.
Bohnemeyer, J., Eisenbeiß, S., and Narasimhan, B. 2006. ‘Ways to go: Methodological considerations in Whorfian studies on motion events’, Essex Research Reports in Linguistics 50: 1–20.
Bohnemeyer, J., Enfield, N., Essegbey, J., Ibarretxe, I., Kita, S., Lüpke, F., and Ameka, F. K. 2007. ‘Principles of event representation in language: The case of motion events’, Language 83(3): 495–532.
Boroditsky, L. 2001. ‘Does language shape thought? English and Mandarin speakers’ conceptions of time’, Cognitive Psychology 43(1): 1–22.
2003. ‘Linguistic relativity’, in L. Nadel (ed.), Encyclopedia of Cognitive Science, pp. 917–21. London: Macmillan Press.
Bower, G. H., Black, J. B., and Turner, T. J. 1979. ‘Scripts in memory for text’, Cognitive Psychology 11: 177–220.
Bowerman, M. 1978. ‘Systematizing semantic knowledge: Changes over time in the child’s organization of word meaning’, Child Development 49(4): 987–97.
1982. ‘Starting to talk worse: Clues to language acquisition from children’s late speech errors’, in S. Strauss and R. Stavy (eds.), U-shaped Behavioral Growth, pp. 101–46. New York: Academic Press.
2005. ‘Why can’t you “open” a nut or “break” a cooked noodle? Learning covert object categories in action word meanings’, in L. Gershkoff-Stowe and D. Rakison (eds.), Building Object Categories in Developmental Time, pp. 209–44. Mahwah, NJ: Lawrence Erlbaum.
Bowerman, M. and Choi, S. 2001. ‘Shaping meanings for language: universal and language-specific in the acquisition of spatial semantic categories’, in M. Bowerman and S. C. Levinson (eds.), Language Acquisition and Conceptual Development, pp. 475–511. Cambridge: Cambridge University Press.
Bowerman, M. and Pederson, E. ms. ‘Cross-linguistic perspectives on topological spatial relations’, Max Planck Institute for Psycholinguistics.
Bowerman, M., Brown, P., Eisenbeiß, S., Narasimhan, B., and Slobin, D. I. 2002. ‘Putting things in places. Developmental consequences of linguistic typology’, in E. V. Clark (ed.), Space In Language. Location, Motion, Path, and Manner. The
Proceedings of the 31st Stanford Child Language Research Forum, pp. S1–S122. Stanford, CA: CSLI Publications.
Braun, J. 2003. ‘Natural scenes upset the visual applecart’, Trends in Cognitive Sciences 7: 7–9.
Brennan, M. 1992. ‘The visual world of BSL: An introduction’, in D. Brien (ed.), Dictionary of British Sign Language/English, pp. 1–133. London: Faber and Faber.
Bresnan, J. 1982. ‘Polyadicity’, in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, pp. 149–72. Cambridge, MA: MIT Press.
Brown, C. H. 1976. ‘General principles of human anatomical partonomy and speculations on the growth of partonomic nomenclature’, American Ethnologist 3: 400–24.
Brown, P. 2006. ‘A sketch of the grammar of space in Tzeltal’, in S. C. Levinson and D. P. Wilkins (eds.), Grammars of Space: Explorations in Cognitive Diversity, pp. 230–72. Cambridge: Cambridge University Press.
2008. ‘Verb specificity and argument realization in Tzeltal child language’, in M. Bowerman and P. Brown (eds.), Crosslinguistic Perspectives on Argument Structure: Implications for Language Acquisition, pp. 167–89. Mahwah, NJ: Lawrence Erlbaum.
forthcoming. ‘To “put” or to “take”? Verb semantics in Tzeltal placement and removal expressions’, in A. Kopecka and B. Narasimhan (eds.), Events of “putting” and “taking”: A Crosslinguistic Perspective. Amsterdam: Benjamins.
Brown, R. W. and Lenneberg, E. H. 1958. ‘Studies in linguistic relativity’, in E. Maccoby, T. H. Newcomb, and E. L. Hartley (eds.), Readings in Social Psychology, 3rd edn, pp. 9–18. New York: Holt, Rinehart and Winston.
Bruce, L. 1986. The Alamblak Language of Papua New Guinea (East Sepik). Canberra: Pacific Linguistics.
1988. ‘Serialization: from syntax to lexicon’, Studies in Language 12(1): 19–49.
Bruce, V., Green, P. R., and Georgeson, M. A. 1996. Visual Perception: Physiology, Psychology, and Ecology, 3rd edn. Hove, UK: Psychology Press, Erlbaum.
Brugman, H. and Kita, S. 1995. ‘Impact of digital video technology on transcription: a case of spontaneous gesture transcription’, KODIKAS/CODE: Ars Semiotica. An International Journal of Semiotics 18: 95–112.
Bullock, M., Gelman, R., and Baillargeon, R. 1982. ‘The development of causal reasoning’, in W. Friedman (ed.), The Developmental Psychology of Time, pp. 209–55. London: Academic Press.
Carlson, G. 1984. ‘Thematic roles and their role in semantic interpretation’, Linguistics 22: 259–79.
Carroll, M. and von Stutterheim, C. 2003. ‘Typology and information organization: perspective taking and language-specific effects in the construal of events’, in A. G. Ramat (ed.), Typology and Second Language Acquisition, pp. 365–402. Berlin: Mouton de Gruyter.
Carroll, M., Murcia-Serra, J., Watorek, M., and Bendiscoli, A. 2000. ‘The relevance of information organization to second language acquisition studies: The descriptive discourse of advanced adult learners of German’, Studies in Second Language Acquisition 22(3): 441–66.
Carroll, M., Rossdeutscher, A., Lambert, M., and von Stutterheim, C. 2008. ‘Subordination in narratives and macrostructural planning: A comparative point of view’,
in C. Fabricius-Hansen and W. Ramm (eds.), Subordination versus Coordination in Sentence and Text – From a Crosslinguistic Perspective, pp. 161–84. Amsterdam: Benjamins.
1993. ‘The representation of spatial configurations in English and German and the grammatical structure of locative and anaphoric expressions’, Linguistics 31: 1011–41.
Casati, R. and Varzi, A. C. 1999. Parts and Places: The Structures of Spatial Representation. Cambridge, MA: MIT Press.
(eds.) 1996. Events. Brookfield, VT: Dartmouth.
Cassell, J., McNeill, D., and McCullough, K.-E. 1999. ‘Speech–gesture mismatches: Evidence for one underlying representation of linguistic and nonlinguistic information’, Pragmatics and Cognition 7(1): 1–33.
Chafe, W. 1979. ‘The flow of thought and the flow of language’, in T. Givón (ed.), Syntax and Semantics, vol. 12: Discourse and Syntax, pp. 159–81. New York: Academic Press.
1980. ‘The deployment of consciousness in the production of a narrative’, in W. Chafe (ed.), The Pear Stories: Cognitive, Cultural and Linguistic Aspects of Narrative Production. Norwood, NJ: Ablex.
1987. ‘Cognitive constraints on information flow’, in R. Tomlin (ed.), Coherence and Grounding in Discourse, pp. 21–51. Amsterdam and Philadelphia: Benjamins.
1994. Discourse, Consciousness and Time. The Flow and Displacement of Conscious Experience in Speaking and Writing. Chicago and London: University of Chicago Press.
Chaigneau, S. E. and Barbey, A. K. 2008. ‘Assessing psychological theories of causal meaning and inference’, in B. C. Love, K. McRae, and V. M. Sloutsky (eds.), Proceedings of the 30th Annual Conference of the Cognitive Science Society, pp. 1111–16. Austin, TX: Cognitive Science Society.
Chatterjee, S. H., Freyd, J. J., and Shiffrar, M. 1996. ‘Configural processing in the perception of apparent biological motion’, Journal of Experimental Psychology: Human Perception and Performance 22: 916–29.
Cheng, P. W. 1997. ‘From covariation to causation: A causal power theory’, Psychological Review 104: 367–405.
Cheng, P. W. and Novick, L. R. 1991. ‘Causes versus enabling conditions’, Cognition 40: 83–120.
1992. ‘Covariation in natural causal induction’, Psychological Review 99: 365–82.
Chenu, F. and Jisa, H. 2006. ‘Caused motion constructions and semantic generality in early acquisition of French’, in E. V. Clark and B. F. Kelly (eds.), Constructions in Acquisition, vol. 174, pp. 233–61. Stanford, CA: CSLI Publications.
Chierchia, G. 1998a. ‘Reference to kinds across languages’, Natural Language Semantics 6: 339–405.
1998b. ‘Plurality of mass nouns and the notion of semantic parameter’, in S. Rothstein (ed.), Events and Grammar, pp. 53–103. Dordrecht: Kluwer.
Choi, S., McDonough, M., Bowerman, M., and Mandler, J. 1999. ‘Early sensitivity to language-specific spatial categories in English and Korean’, Cognitive Development 14(2): 241–68.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chung, S. and Ladusaw, W. A. 2004. Restriction and Saturation. Linguistic Inquiry Monographs 42. Cambridge, MA: MIT Press.
Clahsen, H. 1982. Spracherwerb in der Kindheit: Eine Untersuchung zur Entwicklung der Syntax bei Kleinkindern. Tübingen: Narr.
Clark, H. H. and Gerrig, R. J. 1990. ‘Quotations as demonstrations’, Language 66: 764–805.
Clark, H. H. and Wilkes-Gibbs, D. 1986. ‘Referring as a collaborative process’, Cognition 22(1): 1–39.
Clark, H. H., Carpenter, P. A., and Just, M. A. 1973. ‘On the meeting of semantics and perception’, in W. G. Chase (ed.), Visual Information Processing, pp. 311–82. New York: Academic Press.
Cohen, C. E. and Ebbesen, E. B. 1979. ‘Observational goals and schema activation: a theoretical framework for behavior perception’, Journal of Experimental Social Psychology 15: 305–29.
Comrie, B. 1976. Aspect. Cambridge: Cambridge University Press.
1981. Language Universals and Linguistic Typology. Chicago, IL: University of Chicago Press.
Conditt, M. A., Gandolfo, F., and Mussa-Ivaldi, F. A. 1997. ‘The motor system does not learn the dynamics of the arm by rote memorization of past experience’, Journal of Neurophysiology 78: 554–60.
Croft, W. 1990. ‘Possible verbs and the structure of events’, in S. L. Tsohatzidis (ed.), Meanings and Prototypes: Studies in Linguistic Categorization, pp. 48–73. London: Routledge.
Dahl, Ö. 1985. Tense and Aspect Systems. Oxford: Oxford University Press.
Davenport, J. L. and Potter, M. C. 2004. ‘Scene consistency in object and background perception’, Psychological Science 15: 559–64.
David, C. 2003. Les “verbs of putting”: Typologie, schéma syntaxique et organisation sémantique des constructions prépositionnelles en anglais contemporain. Unpublished PhD dissertation, Université de Poitiers, Poitiers.
Davidson, D. 1967. ‘The logical form of action sentences’, in N. Rescher (ed.), The Logic of Decision and Action, pp. 81–95. Pittsburgh, PA: University of Pittsburgh Press.
Davidson, P. R. and Wolpert, D. M. 2004. ‘Internal models underlying grasp can be additively combined’, Experimental Brain Research 155: 334–40.
Davies, J. 1981. Kobon. Lingua Descriptiva Series 3. Amsterdam: North-Holland.
De Ruiter, J. P. 2000. ‘The production of gesture and speech’, in D. McNeill (ed.), Language and Gesture: Window into Thought and Action, pp. 284–311. Cambridge: Cambridge University Press.
2007. ‘Postcards from the mind: The relationship between speech, gesture and thought’, Gesture 7(1): 21–38.
Delorme, A., Richard, G., and Fabre-Thorpe, M. 2000. ‘Ultra-rapid categorization of natural scenes does not rely on color cues: a study in monkeys and humans’, Vision Research 40: 2187–200.
Deringil, S. 2002. İktidarın Sembolleri ve İdeoloji: II. Abdülhamid Dönemi (1876–1909) [The symbols of power and ideology: The era of Abdülhamid II (1876–1909)]. İstanbul: Yapı Kredi Press.
Deubel, H. and Schneider, W. X. 1996. ‘Saccade target selection and object recognition: Evidence for a common attentional mechanism’, Vision Research 36: 1827–37.
Dickinson, C. 2002. Complex predicates in Tsafiki. Unpublished PhD dissertation, University of Oregon.
Dixon, R. M. W. 1994. Ergativity. Cambridge: Cambridge University Press.
Dobel, C. ms. ‘Interaction of vision and language in the description of action events’. Westfälische Wilhelms-Universität Münster.
Dobel, C., Gumnior, H., Bölte, J., and Zwitserlood, P. 2007. ‘Describing scenes hardly seen’, Acta Psychologica 125: 129–43.
Dowe, P. 2000. Physical Causation. Cambridge: Cambridge University Press.
2001. ‘A counterfactual theory of prevention and “causation” by omission’, Australasian Journal of Philosophy 79: 216–26.
2007. ‘Causal processes’, in Stanford Encyclopedia of Philosophy. http://plato.stanford.edu.
Dowty, D. 1979. Word Meaning and Montague Grammar. Synthese Language Library. Dordrecht: Reidel.
1991. ‘Thematic roles and argument selection’, Language 67: 547–619.
Du Bois, J. 1987. ‘The discourse basis of ergativity’, Language 63(4): 805–55.
Duncan, S. 1996. Grammatical form and ‘thinking-for-speaking’ in Mandarin Chinese and English: An analysis based on speech-accompanying gesture. Unpublished PhD dissertation, University of Chicago, Chicago.
Durie, M. 1997. ‘Grammatical structure in verb serialization’, in A. Alsina, J. Bresnan, and P. Sells (eds.), Complex Predicates, pp. 289–354. Stanford, CA: CSLI Publications.
Emmorey, K. 2002. Language, Cognition, and the Brain: Insights from Sign Language Research. Mahwah, NJ: Lawrence Erlbaum.
(ed.) 2003. Perspectives on Classifier Constructions in Sign Languages. Mahwah, NJ: Lawrence Erlbaum.
Emmorey, K. and Falgier, B. 1999. ‘Talking about space with space: Describing environments in ASL’, in E. Winston (ed.), Storytelling and Conversation: Discourse in Deaf Communities, pp. 3–26. Washington, DC: Gallaudet University Press.
Engberg-Pedersen, E. 1993. Space in Danish Sign Language: The Semantics and Morphosyntax of the Use of Space in a Visual Language. Hamburg: Signum Press.
1995. ‘Point of view expressed through shifters’, in K. Emmorey and J. Reilly (eds.), Language, Gesture and Space, pp. 133–55. Hillsdale, NJ: Lawrence Erlbaum.
Fabre-Thorpe, M., Delorme, A., Marlot, C., and Thorpe, S. 2001. ‘A limit to the speed of processing in ultra-rapid visual categorization of novel natural scenes’, Journal of Cognitive Neuroscience 13: 171–80.
Fair, D. 1979. ‘Causation and the flow of energy’, Erkenntnis 14: 219–50.
Farr, C. 1999. The Interface Between Syntax and Discourse in Korafe, a Papuan Language of Papua New Guinea. Canberra: Pacific Linguistics.
Ferreira, F. 2000. ‘Syntax in language production: An approach using tree-adjoining grammars’, in L. Wheeldon (ed.), Aspects of Language Production, pp. 291–330. London: Psychology Press.
von Fieandt, K. and Gibson, J. J. 1959. ‘The sensitivity of the eye to two kinds of continuous transformation of a shadow pattern’, Journal of Experimental Psychology 57: 344–7.
Fillmore, C. J. 1968. ‘The case for case’, in E. Bach and R. T. Harms (eds.), Universals of Linguistic Theory, pp. 1–90. New York: Holt, Rinehart, and Winston.
1972. ‘Subjects, speakers and roles’, in D. Davidson and G. Harman (eds.), Semantics of Natural Language, pp. 1–24. Dordrecht: Reidel.
Finkbeiner, M., Nicol, J., Greth, D., and Nakamura, K. 2002. ‘The role of language in memory for actions’, Journal of Psycholinguistic Research 31(5): 447–57.
Fodor, J. 1970. ‘Three reasons not to derive “kill” from “cause to die”’, Linguistic Inquiry 1: 429–38.
Foley, W. and Olson, M. 1985. ‘Clausehood and verb serialization’, in J. Nichols and A. C. Woodbury (eds.), Grammar Inside and Outside the Clause, pp. 17–60. Cambridge: Cambridge University Press.
Foley, W. and Van Valin Jr., R. D. 1984. Functional Syntax and Universal Grammar. Cambridge: Cambridge University Press.
Foot, P. 1967. ‘The problem of abortion and the doctrine of double effect’, Oxford Review 5: 5–15. Reprinted in B. Steinbock and A. Norcross (eds.), Killing and Letting Die, 2nd edn, pp. 266–79. New York: Fordham University Press.
Frey, W. 2000. ‘Über die syntaktische Position des Satztopiks im Deutschen’, Issues on Topics. ZAS Papers in Linguistics 20: 137–72.
Freyd, J. J., Pantzer, T. M., and Cheng, J. L. 1988. ‘Representing statics as forces in equilibrium’, Journal of Experimental Psychology: General 117: 395–407.
Fridman-Mintz, B. and Liddell, S. K. 1998. ‘Sequencing mental spaces in an ASL narrative’, in J. P. Koenig (ed.), Discourse and Cognition: Bridging the Gap, pp. 255–68. Cambridge: Cambridge University Press.
Gennari, S. P., Sloman, S. A., Malt, B. C., and Fitch, W. 2002. ‘Motion events in language and cognition’, Cognition 83(1): 49–79.
Gernsbacher, M. A. 1997. ‘Coherence cues mapping during comprehension’, in J. Costermans and M. Fayol (eds.), Processing Interclausal Relationships: Studies in the Production and Comprehension of Text, pp. 3–22. Mahwah, NJ: Lawrence Erlbaum.
Givón, T. 1979. ‘From discourse to syntax: grammar as a processing strategy’, in T. Givón (ed.), Syntax and Semantics, vol. 12: Discourse and Syntax, pp. 81–112. New York: Academic Press.
1984. Syntax. A Functional-typological Introduction, vol. 1. Amsterdam: Benjamins.
1990. ‘Verb serialization in Tok Pisin and Kalam: a comparative study of temporal packaging’, in J. Verhaar (ed.), Melanesian Pidgin and Tok Pisin, pp. 19–56. Amsterdam: Benjamins.
1991a. ‘Serial verbs and event cognition in Kalam: an empirical study of cultural relativity’, in C. Lefebvre (ed.), Serial Verbs: Grammatical, Comparative and Universal Grammar, pp. 137–84. Amsterdam: Benjamins.
1991b. ‘Serial verbs and the mental reality of “event”’, in E. C. Traugott and B. Heine (eds.), Approaches to Grammaticalization, vol. 1, pp. 81–227. Amsterdam: Benjamins.
Goldin-Meadow, S. 2003. Hearing Gesture: How Our Hands Help Us Think. Cambridge, MA: Harvard University Press Goldman-Eisler, F. 1968. Psycholinguistics: Experiments in Spontaneous Speech. New York: Academic Press. Goldstein, E. B. 2002. Sensation and Perception. Wadsworth: Thomson Learning. Goldvarg, E. and Johnson-Laird, P. 2001. ‘Naive causality: A mental model theory of causal meaning and reasoning’, Cognitive Science 25: 565–610. Gopnik, A., Glymour, C., Sobel, D., Shulz, L., Kushnir, T., and Danks, D. 2004. ‘A theory of causal learning in children: Causal maps and Bayes nets’, Psychological Review 111: 1–31. Grace, G. 1981. An Essay on Language. Columbia, SC: Hornbeam Press. 1987. The Linguistic Construction of Reality. London/New York/Sydney: Croom Helm. Graham, J. A. and Argyle, M. 1975. ‘A cross-cultural study of the communication of extra-verbal meaning by gestures’, International Journal of Psychology 10(1): 56–67. Green, D. W. 1998. ‘Bilingualism and thought’, Psychologica Belgica 38(3/4): 251– 76. Griffin, Z. M. 2001. ‘Gaze durations during speech reflect word selection and phonological encoding’, Cognition 82: B1–B14. 2004. ‘Why look? Reasons for eye movements related to language production’, in J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual World, pp. 213–48. New York: Psychology Press. Griffin, Z. M. and Bock, K. 2000. ‘What the eyes say about speaking’, Psychological Science 4: 274–9. Guariglia, C. and Antonucci, G. 1992. ‘Personal and extrapersonal space: A case of neglect dissociation’, Neuropsychologia 30(11): 1001–9. Gullberg, M. 1998. Gesture as a Communication Strategy in Second Language Discourse. A Study of Learners of French and Swedish. Lund: Lund University Press. 2003. ‘Gestures, referents, and anaphoric linkage in learner varieties’, in C. Dimroth and M. Starren (eds.), Information Structure and the Dynamics of Language Acquisition, pp. 311–28. Amsterdam: Benjamins. 2006. ‘Handling discourse: Gestures, reference tracking, and communication strategies in early L2’, Language Learning 56(1): 155–96. 2009. ‘Why gestures are relevant to the multilingual mental lexicon’, in A. Pavlenko (ed.), The Multilingual Mental Lexicon, pp. 161–84. Clevedon: Multilingual Matters. ms. ‘What learners mean. Gestures and semantic reorganisation of placement verbs in advanced second language production’. Gullberg, M. and Burenhult, N. forthcoming. ‘Probing the linguistic encoding of placement and removal events in Swedish’, in A. Kopecka and B. Narasimhan (eds.), Events of “putting” and “taking”: A Crosslinguistic Perspective. Amsterdam: Benjamins. Gullberg, M. and Narasimhan, B. 2010. ‘What gestures reveal about the development of Dutch children’s placement verbs’, Cognitive Linguistics 21: 239–62. Gumperz, J. J. and Levinson, S. C. (eds.) 1996a. Rethinking Linguistic Relativity. Cambridge: Cambridge University Press.
Gumperz, J. J. and Levinson, S. C. 1996b. ‘Introduction: Linguistic relativity reexamined’, in J. J. Gumperz and S. C. Levinson (eds.), Rethinking Linguistic Relativity, pp. 1–18. Cambridge: Cambridge University Press. Hall, N. 2004. ‘Two concepts of causation’, in J. Collins, N. Hall, and L. Paul (eds.), Causation and Counterfactuals, pp. 225–76. Cambridge, MA: MIT Press. Hansson, K. and Bruce, B. 2002. ‘Verbs of placement in Swedish children with SLI’, International Journal of Communication Disorders 37(4): 401–14. Hard, B. M., Recchia, G., and Tversky, B. ms. ‘The shape of action’. Hard, B. M., Tversky, B., and Lang, D. S. 2006. ‘Making sense of abstract events: Building event schemas’, Memory and Cognition 34: 1221–35. Hardin, C. L. and Maffi, L. (eds.) 1997. Color Categories in Thought and Language. Cambridge: Cambridge University Press. Hasegawa, Y. 1996. ‘A study of Japanese clause linkage: the connective TE in Japanese’, Studies in Japanese Linguistics 5. Stanford: CSLI. Hayashi, A. 2003. Lexicalization of motion events in Japanese. Unpublished PhD dissertation, University of Oregon. Hazout, I. 2004. ‘The syntax of existential constructions’, Linguistic Inquiry 35: 393– 430. Heeschen, V. 2001. ‘Event-formulas: sentences as minimal narratives’, in A. Pawley, M. Ross, and D. Tryon (eds.), The Boy from Bundaberg: Studies in Melanesian Linguistics in Honour of Tom Dutton, pp. 155–73. Canberra: Pacific Linguistics. Hegarty, M. 2004. ‘Mechanical reasoning by mental simulation’, TRENDS in Cognitive Sciences 8: 280–5. Heider, F. and Simmel, M. 1944. ‘An experimental study of apparent behavior’, American Journal of Psychology 57: 243–59. Henderson, J. M. and Ferreira, F. 2004a. ‘Scene perception for psycholinguists’, in J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action: Eye Movements and the Visual Word, pp. 1–58. New York: Psychology Press. (eds.) 2004b. The Interface of Language, Vision, and Action: Eye Movements and the Visual Word. New York: Psychology Press. Hetzron, H. 1975. ‘The presentative movement, or why the ideal word order is V.S.O.P.’, in C. N. Li (ed.), Word Order and Word Order Change, pp. 345–88. Austin, TX: University of Texas Press. Heuer, H. and Keele, S. W. (eds.) 1996. Handbook of Perception and Action, vol. 2: Motor Skills. London: Academic Press. Hickmann, M. 2007. ‘Static and dynamic location in French: Developmental and crosslinguistic perspectives’, in M. Aurnague, M. Hickmann, and L. Vieu (eds.), The Categorization of Spatial Entities in Language and Cognition, pp. 205–31. Amsterdam: Benjamins. Hickmann, M. and Hendriks, H. 2006. ‘Static and dynamic location in French and English’, First Language 26(1): 103–35. Higginbotham, J. 2000. ‘On events in linguistic semantics’, in J. Higginbotham, F. Pianesi, and A. C. Varzi (eds.), Speaking of Events, pp. 49–79. Oxford: Oxford University Press. Higginbotham, J., Pianesi, F., and Varzi, A. C. 2000. Speaking of Events. Oxford: Oxford University Press.
Hinder, M. R. and Milner, T. E. 2003. ‘The case for an internal dynamics model versus equilibrium point control in human movement’, Journal of Physiology 549: 953–63. Hitchcock, C. 2001. ‘The intransitivity of causation revealed in equations and graphs’, Journal of Philosophy 98: 273–99. 2002. ‘Probabilistic causation’, Stanford Encyclopedia of Philosophy. http://plato. stanford.edu/entries/causation-probabilistic/ Hoffman, D. D. 2000. Visual Intelligence: How We Create What We See. New York: W.W. Norton and Company. Hoffman, D. D. and Richards, W. A. 1984. ‘Parts of recognition’, Cognition 18: 65–96. Hoffman, J. E. and Subramaniam, B. 1995. ‘The role of visual attention in saccadic eye movements’, Perception and Psychophysics 57: 787–95. Holler, J. and Beattie, G. 2003. ‘Pragmatic aspects of representational gestures. Do speakers use them to clarify verbal ambiguity for the listener?’, Gesture 3(2): 127–54. Hollingworth, A. and Henderson, J. 1998. ‘Does consistent scene context facilitate object perception?’, Journal of Experimental Psychology: General 127: 398–415. Hume, D. 1978 (1739). A Treatise of Human Nature, ed. by L. A. Selby-Bigge, 2nd edn, revised by P. H. Nidditch. Oxford: Oxford University Press. Imamizu, H., Uno, Y., and Kawato, M. 1995. ‘Internal representations of the motor apparatus: Implications from generalization in visuomotor learning’, Journal of Experimental Psychology: Human Perception and Performance 21: 1174–98. Irwin, D. E. 1992. ‘Memory for position and identity across eye movements’, Journal of Experimental Psychology: Learning, Memory, and Cognition 18: 307–17. Irwin, D. E. and Gordon, R. D. 1998. ‘Eye movements, attention and trans-saccadic memory’, Visual Cognition 5: 127–55. Jackendoff, R. 1983. Semantics and Cognition. Cambridge, MA: MIT Press. 1990. Semantic Structures. Cambridge, MA: MIT Press. Jenkins, J. J., Wald, J., and Pittenger, J. B. 1986. ‘Apprehending pictorial events’, in V. McCabe and G. J. Balzano (eds.), Event Cognition: An Ecological Perspective, pp. 117–33. Hillsdale, NJ: Lawrence Erlbaum. Jespersen, O. 1965. A Modern English Grammar on Historical Principles, vol. VI: Morphology. London: George Allen and Unwin Ltd. Johansson, G. 1973. ‘Visual perception of biological motion and a model for its analysis’, Perception and Psychophysics 14: 201–11. 1975. ‘Visual motion perception’, Scientific American 232(6): 76–88. Kamp, H. 1979. ‘Events, instants and temporal reference’, in R. Bauerle, U. Egli, and A. von Stechow (eds.), Semantics from Different Points of View, pp. 27–54. Berlin: Springer. Kamp, H. and Reyle, U. 1993. From Discourse to Logic: Introduction to Model Theoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Dordrecht: Kluwer. Kaufman, L. 1974. Sight and Mind. New York: Oxford University Press. Kawato, M. 1999. ‘Internal models for motor control and trajectory planning’, Current Opinion in Neurobiology 9: 718–27. Kay, P. and Kempton, W. 1984. ‘What is the Sapir-Whorf hypothesis?’, American Anthropologist 86(1): 65–79.
Kellerman, E. 1995. ‘Crosslinguistic influence: Transfer to nowhere?’, Annual Review of Applied Linguistics 15: 125–50. Kelly, S. D., Barr, D. J., Breckinridge Church, R., and Lynch, K. 1999. ‘Offering a hand to pragmatic understanding: The role of speech and gesture in comprehension and memory’, Journal of Memory and Language 40(4): 577–92. Kemmer, S. and A. Verhagen 1994. ‘The grammar of causatives and the conceptual structure of events’, Cognitive Linguistics 5: 115–56. Kendon, A. 1980. ‘Gesticulation and speech: Two aspects of the process of utterance’, in M. R. Key (ed.), The Relationship of Verbal and Nonverbal Communication, pp. 207–27. The Hague: Mouton de Gruyter. 2004. Gesture. Visible Action as Utterance. Cambridge: Cambridge University Press. Kita, S. 1999. ‘Japanese enter/exit verbs without motion semantics’, Studies in Language 23(2): 307–30. ¨ urek, ¨ Kita, S. and Ozy A. 2003. ‘What does cross-linguistic variation in semantic coordination of speech and gesture reveal?: Evidence for an interface representation of spatial thinking and speaking’, Journal of Memory and Language 48(1): 16–32. Kita, S., van Gijn, I., and van der Hulst, H. 1998. ‘Movement phases in signs and cospeech gestures, and their transcription by human coders’, in I. Wachsmuth and M. Fr¨ohlich (eds.), Gesture and Sign Language in Human–Computer Interaction, pp. 23–35. Berlin: Springer. Klein, W. 1994. Time in Language. London: Routledge. 2006. ‘On finiteness’, in V. van Geenhoven (ed.), Semantics in Acquisition, pp. 245–72. Dordrecht: Springer. Klein, W., Li, P., and Hendricks, H. 2000. ‘Aspect and assertion in Mandarin Chinese’, Natural Language and Linguistic Theory 18: 723–70. Kopecka, A. and Narasimhan, B. (eds.) forthcoming. Events of “putting” and “taking”: A Crosslinguistic Perspective. Amsterdam: Benjamins. Kosslyn, S. M. 1980. Image and Mind. Cambridge, MA: Harvard University Press. Krauss, R. K., Chen, Y., and Gottesman, R. F. 2000. ‘Lexical gestures and lexical access: a process model’, in D. McNeill (ed.), Language and Gesture, pp. 261–83. Cambridge: Cambridge University Press. Kreysa, H., Glanemann, R., B¨olte, J., Zwitserlood, P., and Dobel, C. ms. ‘Where is the action? An eyetracking study on the description of photorealistic events’. Krifka, M. 1998. ‘The origins of telicity’, in S. Rothstein (ed.), Events and Grammar, pp. 197–235. Dordrecht: Kluwer. Kruschke, J. K. and Fragassi, M. M. 1996. ‘The perception of causality: Feature binding in interacting objects’, in G. W. Cottrell (ed.), Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society, pp. 441–6. Hillsdale, NJ: Lawrence Erlbaum. ¨ Kuntay, A. and Slobin, D. I. 1996. ‘Listening to a Turkish mother: Some puzzles for acquisition’, in D. I. Slobin, J. Gerhardt, A. Kyratzis, and J. Guo (eds.), Social Interaction, Social Context, and Language: Essays in Honor of Susan Ervin-Tripp, pp. 265–86. Hillsdale, NJ: Lawrence Erlbaum. Kurby, C. A., Zacks, J. M., Shriver, S., Mehta, R., and Brewer, S. 2008. ‘Event memory and hierarchical segmentation in younger and older adults’, Cognitive Aging Conference, Atlanta, Georgia.
Kurtzer, I., Herter, T. M., and Scott, S. H. 2005. ‘Random change in cortical load representation suggests distinct control of posture and movement’, Nature Neuroscience 8: 498–504. La Heij, W. 2005. ‘Selection processes in monolingual and bilingual lexical access’, in J. F. Kroll and A. M. De Groot (eds.), Handbook of Bilingualism. Psycholinguistic Approaches, pp. 289–307. Oxford: Oxford University Press. Labov, W. 1973. Language in the Inner City. Philadelphia: University of Pennsylvania Press. Lambrecht, K. 1994. Information Structure and Sentence Form: Topic, Focus, and the Mental Representation of Discourse Referents. Cambridge: Cambridge University Press. Lane, J. 2007. Kalam Serial Verb Constructions. Canberra: Pacific Linguistics. Lascarides, A. 1992. ‘Knowledges, causality, and temporal representation’, Linguistics 30(5): 941–73. Lemmens, M. 2002a. ‘The semantic network of Dutch posture verbs’, in J. Newman (ed.), The Linguistics of Sitting, Standing, and Lying, pp. 103–39. Amsterdam: Benjamins. 2002b. ‘Tracing referent location in oral picture descriptions’, in A. Wilson, P. Rayson and T. McEnery (eds.), A Rainbow of Corpora. Corpus Linguistics and the Languages of the World, pp. 73–85. M¨unchen: Lincom-Europa. 2006. ‘Caused posture: experiential patterns emerging from corpus research’, in A. Stefanowitsch and S. Gries (eds.), Corpora in Cognitive Linguistics. Corpus-based Approaches to Syntax and Lexis, pp. 263–98. Berlin: Mouton de Gruyter. Leslie, A. M. 1984. ‘Spatiotemporal continuity and the perception of causality in infants’, Perception 13: 287–305. Leslie, A. M. and Keeble S. 1987. ‘Do six-month-old infants perceive causality?’, Cognition 25: 265–88. Lesser, H. 1977. ‘The growth of perceived causality in children’, The Journal of Genetic Psychology 130: 145–52. Levelt, W. J. M. 1989. Speaking: From Intention to Articulation. Cambridge, MA: Bradford Books/MIT Press. Levelt, W. J. M., Roelofs, A., and Meyer, A. S. 1999. ‘A theory of lexical access in speech production’, Behavioral and Brain Sciences 22: 1–75. Levin, B. 1993. English Verb Classes and Alternations. Chicago: University of Chicago Press. 2007. ‘The lexical semantics of verbs I: Introduction’, 2007 Summer Institute of Linguistics at Stanford University, manuscript handout. Levin, B. and Rappaport Hovav, M. 1994. ‘A preliminary analysis of causative verbs in English’, Lingua 92: 35–77. 1995. Unaccusativity: At the Syntax-Lexical Semantics Interface. Cambridge, MA: MIT Press. 1996. ‘From lexical semantics to argument realization’, in H. Borer (ed.), Handbook of Morphosyntax and Argument Structure. Dordrecht: Kluwer. Levinson, S. C. 1996. ‘Frames of reference and Molyneux’s question: Cross-linguistic evidence’, in P. Bloom, M. A. Peterson, L. Nadel, and M. Garrett (eds.), Space and Language, pp. 109–69. Cambridge, MA: MIT Press.
2000. 'Yélî Dnye and the theory of basic color terms', Journal of Linguistic Anthropology 10: 3–55.
2003. Space in Language and Cognition: Explorations in Cognitive Diversity. Cambridge: Cambridge University Press.
Levinson, S. C., Meira, S., and the Language and Cognition Group 2003. ''Natural concepts' in the spatial topological domain – adpositional meanings in crosslinguistic perspective: An exercise in semantic typology', Language 79(3): 485–516.
Levinson, S. C. and Wilkins, D. P. 2006a. 'The background to the study of the language of space', in S. C. Levinson and D. P. Wilkins (eds.), Grammars of Space: Explorations in Cognitive Diversity, pp. 1–23. Cambridge: Cambridge University Press.
(eds.) 2006b. Grammars of Space: Explorations in Cognitive Diversity. Cambridge: Cambridge University Press.
Levy, E. T. and McNeill, D. 1992. 'Speech, gesture, and discourse', Discourse Processes 15(3): 277–301.
Lewis, D. 1973. 'Causation', Journal of Philosophy 70: 556–67.
1986. On the Plurality of Worlds. Oxford: Blackwell.
2000. 'Causation as influence', Journal of Philosophy 97: 182–97.
Li, F. F., Van Rullen, R., Koch, C., and Perona, P. 2002. 'Rapid natural scene categorization in the near absence of attention', Proceedings of the National Academy of Sciences 99: 9596–601.
Liddell, S. K. 1994. 'Tokens and surrogates', in I. Ahlgren, B. Bergman, and M. Brennan (eds.), Perspectives on Sign Language Structure: Papers from the Fifth International Symposium on Sign Language Research, Salamanca, Spain, May 25–30, 1992, vol. 1, pp. 105–19. Durham: ISLA.
1995. 'Real, surrogate, and token space: Grammatical consequences in ASL', in K. Emmorey and J. Reilly (eds.), Language, Gesture, and Space, pp. 19–41. Hillsdale, NJ: Lawrence Erlbaum.
2003. Grammar, Gesture, and Meaning in American Sign Language. Cambridge: Cambridge University Press.
Liddell, S. K. and Metzger, M. 1998. 'Gesture in sign language discourse', Journal of Pragmatics 30: 657–97.
Lillo-Martin, D. 1995. 'The point of view predicate in American Sign Language', in K. Emmorey and J. Reilly (eds.), Language, Gesture, and Space, pp. 155–70. Hillsdale, NJ: Lawrence Erlbaum.
Lillo-Martin, D. and Klima, E. 1990. 'Pointing out differences: ASL pronouns in syntactic theory', in S. Fischer and P. Siple (eds.), Theoretical Issues in Sign Language Research, vol. 1, pp. 191–210. Chicago: University of Chicago Press.
Livengood, J. and Machery, E. 2007. 'The folk probably don't think what you think they think: Experiments on causation by absence', Midwest Studies in Philosophy 31: 107–27.
Loftus, E. 1980 [1996]. Eyewitness Testimony. Cambridge, MA: Harvard University Press.
Lucy, J. A. 1992. Language Diversity and Thought: A Reformulation of the Linguistic Relativity Hypothesis. Cambridge: Cambridge University Press.
Lucy, J. A. and Gaskins, S. 2001. 'Grammatical categories and the development of classification preferences: A comparative approach', in M. Bowerman and S. C. Levinson (eds.), Language Acquisition and Conceptual Development, pp. 257–83. Cambridge: Cambridge University Press.
Luhtala, A. 2002. 'On definitions in ancient grammar', in P. Swiggers and A. Wouters (eds.), Grammatical Theory and Philosophy of Language in Antiquity, pp. 257–85. Leuven: Peeters.
Mackie, J. L. 1974. The Cement of the Universe. Oxford: Oxford University Press.
Majid, A., van Staden, M., Boster, J., and Bowerman, M. 2004. 'Event categorization: A cross-linguistic perspective', in K. Forbus, D. Gentner, and T. Regier (eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society, pp. 885–90. Mahwah, NJ: Lawrence Erlbaum.
Majnep, I. S. and Bulmer, R. 1983. 'Some food plants in our Kalam forests', Department of Anthropology Working Papers no. 63, University of Auckland.
1990. Aps basd skop kmn ak pak ñbelgpal. Kalam hunting traditions, vols. 1–6, ed. by A. Pawley. Department of Anthropology Working Papers nos. 85–90, University of Auckland.
n.d. Kalam hunting traditions. Department of Anthropology Working Papers nos. 7–12, ed. by A. Pawley. Printout. Department of Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Majnep, I. S. and Pawley, A. n.d. Kalam plant lore. Computer printout. Department of Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Mandel, D. R. and Lehman, D. R. 1996. 'Counterfactual thinking and ascriptions of cause and preventability', Journal of Personality and Social Psychology 71: 450–63.
Margetts, A. and Austin, P. 2007. 'Three-participant events in the languages of the world: Towards a cross-linguistic typology', Linguistics 45: 393–451.
Matsumoto, Y. 2003. 'Typologies of lexicalization patterns and event integration: Clarifications and reformulations', in S. Chiba (ed.), Empirical and Theoretical Investigations into Language: A Festschrift for Masaru Kajita, pp. 403–18. Tokyo: Kaitakusha.
Mayberry, R. I. and Jaques, J. 2000. 'Gesture production during stuttered speech: Insights into the nature of gesture–speech integration', in D. McNeill (ed.), Language and Gesture, pp. 199–214. Cambridge: Cambridge University Press.
Mayberry, R. I. and Nicoladis, E. 2000. 'Gesture reflects language development: Evidence from bilingual children', Current Directions in Psychological Science 9(6): 192–6.
Mayer, M. 1969. Frog, Where Are You? New York: Dial Press.
McCabe, V. and Balzano, G. J. (eds.) 1986. Event Cognition: An Ecological Perspective. Hillsdale, NJ: Lawrence Erlbaum.
McClave, E. 2001. 'The relationship between spontaneous gestures of the hearing and American Sign Language', Gesture 1(1): 51–72.
McDonald, B. H. 1982. Aspects of the American Sign Language predicate system. Unpublished PhD dissertation, State University of New York, Buffalo.
McGrath, S. 2003. 'Causation and the making/allowing distinction', Philosophical Studies 114: 81–106.
2005. 'Causation by omission: A dilemma', Philosophical Studies 123: 125–48.
McMahan, J. 1993. 'Killing, letting die, and withdrawing aid', Ethics 103: 250–79.
McNally, L. 2009. 'Properties, entity correlates of properties, and existentials', in A. Giannakidou and M. Rathert (eds.), Quantification, Definiteness, and Nominalization, pp. 163–87. Oxford: Oxford University Press.
McNeill, D. 1985. 'So you think gestures are nonverbal?', Psychological Review 92(3): 271–95.
1992. Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
2000a. 'Analogic/analytic representations and cross-linguistic differences in thinking for speaking', Cognitive Linguistics 11(1/2): 43–60.
2000b. 'Growth points, catchments, and contexts', Cognitive Studies: Bulletin of the Japanese Cognitive Science Society 7(1): 22–36.
2005. Gesture and Thought. Chicago: University of Chicago Press.
McNeill, D. and Duncan, S. D. 2000. 'Growth points in thinking-for-speaking', in D. McNeill (ed.), Language and Gesture, pp. 141–61. Cambridge: Cambridge University Press.
McNeill, D. and Levy, E. T. 1982. 'Conceptual representations in language activity and gesture', in R. J. Jarvella and W. Klein (eds.), Speech, Place, and Action: Studies in Deixis and Related Topics, pp. 271–95. Chichester: John Wiley.
McNeill, D., Levy, E. T., and Cassell, J. 1993. 'Abstract deixis', Semiotica 95(1/2): 5–19.
Meier, R. P. 2002. 'Why different, why the same?', in R. P. Meier, K. Cormier, and D. Quinto-Pozos (eds.), Modality and Structure in Signed and Spoken Languages, pp. 1–25. Cambridge: Cambridge University Press.
Melinger, A. and Levelt, W. J. M. 2004. 'Gesture and the communicative intention of the speaker', Gesture 4(2): 119–41.
Menzies, P. 2004. 'Difference-making in context', in J. Collins, N. Hall, and L. Paul (eds.), Causation and Counterfactuals, pp. 139–80. Cambridge, MA: MIT Press.
Metzger, M. 1995. 'Constructed dialogue and constructed action in American Sign Language', in C. Lucas (ed.), Sociolinguistics in Deaf Communities, pp. 255–71. Washington, DC: Gallaudet University Press.
van der Meulen, F. F. 2001. Moving Eyes and Naming Objects. MPI Series in Psycholinguistics. Nijmegen: Max Planck Institute.
Meyer, A. S. 2004. 'The use of eye tracking in studies of sentence generation', in J. M. Henderson and F. Ferreira (eds.), The Interface of Language, Vision, and Action, pp. 191–212. Hove: Psychology Press.
Meyer, A. S. and Dobel, C. 2003. 'Application of eye tracking in speech production research', in J. Hyönä, R. Radach, and H. Deubel (eds.), The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research, pp. 253–72. Amsterdam: Elsevier.
Meyer, A. S. and van der Meulen, F. F. 2000. 'Phonological priming effects on speech onset latencies and viewing times in object naming', Psychonomic Bulletin and Review 7: 314–19.
Meyer, A. S., Sleiderink, A. M., and Levelt, W. J. M. 1998. 'Viewing and naming objects: Eye movements during noun phrase production', Cognition 66: B25–B33.
Michotte, A. E. 1963 [1946]. The Perception of Causality. New York: Basic Books.
Michotte, A. E. and Thinès, G. 1963. 'La causalité perceptive [Perceptual causality]', Journal de Psychologie Normale et Pathologique 60: 9–36. Reprinted [1991] in G. Thinès, A. Costall, and G. Butterworth (eds.), Michotte's Experimental Phenomenology of Perception, pp. 66–87. Hillsdale, NJ: Lawrence Erlbaum. (English translation by the editors.)
Miles, M. 2000. 'Signing in the Seraglio: Mutes, dwarfs and gestures at the Ottoman Court 1500–1700', Disability & Society 15(1): 115–34.
Milner, T. E. and Franklin, D. W. 2005. 'Impedance control and internal model use during the initial stage of adaptation to novel dynamics in humans', Journal of Physiology 567: 651–64.
Milner, T. E., Franklin, D. W., Imamizu, H., and Kawato, M. 2007. 'Central control of grasp: Manipulation of objects with complex and simple dynamics', Neuroimage 36: 388–95.
Morgan, G. 1999. 'Event packaging in BSL discourse', in E. Winston (ed.), Storytelling and Conversation: Discourse in Deaf Communities, pp. 27–58. Washington, DC: Gallaudet University Press.
Morgan, J. L. and Meyer, A. S. 2005. 'Processing of extrafoveal objects during multiple-object naming', Journal of Experimental Psychology: Learning, Memory, and Cognition 31: 428–42.
Morrison, J. B. and Tversky, B. 2005. 'Bodies and their parts', Memory and Cognition 33: 696–709.
Müller, C. 1994. 'Semantic structure of motional gestures and lexicalization patterns in Spanish and German descriptions of motion-events', in K. Beals, J. M. Denton, R. Knippen, L. Melnar, H. Suzuki, and E. Zeinfeld (eds.), Papers from the Annual Regional Meeting of the Chicago Linguistic Society: The Main Session, vol. 30, pp. 281–95. Chicago: Chicago Linguistic Society.
1998. Redebegleitende Gesten: Kulturgeschichte – Theorie – Sprachvergleich [Co-speech gestures: Cultural history, theory, language comparison]. Berlin: Berlin Verlag Arno Spitz GmbH.
Naigles, L. R. and Hoff-Ginsberg, E. 1998. 'Why are some verbs learned before other verbs? Effects of input frequency and structure on children's early verb use', Journal of Child Language 25: 95–120.
Naigles, L. R. and Terrazas, P. 1998. 'Motion-verb generalizations in English and Spanish: Influences of language and syntax', Psychological Science 9(5): 363–9.
Naigles, L. R., Eisenberg, A. R., Kako, E. T., Highter, M., and McGraw, N. 1998. 'Speaking of motion: Verb use in English and Spanish', Language and Cognitive Processes 13(5): 521–49.
Narasimhan, B. and Brown, P. 2008. 'Getting the INSIDE story: Learning to talk about containment in Tzeltal and Hindi', in V. C. Mueller Gathercole (ed.), Routes to Language: Studies in Honor of Melissa Bowerman, pp. 97–132. Mahwah, NJ: Lawrence Erlbaum.
Narasimhan, B. and Gullberg, M. 2006. 'Perspective-shifts in event descriptions in Tamil child language', Journal of Child Language 33(1): 99–124.
2010. 'The role of input frequency and semantic transparency in the acquisition of verb meaning: Evidence from placement verbs in Tamil and Dutch', Journal of Child Language doi: 10.1017/S0305000910000164.
Narasimhan, B., Eisenbeiß, S., and Brown, P. 2007a. 'Two's company, more is a crowd: The linguistic encoding of multiple-participant events', Linguistics 45(3): 383–92.
(eds.) 2007b. 'Special issue on the linguistic encoding of multiple-participant events', Linguistics 45(3): 383–681.
Neumann, O. and Sanders, A. F. (eds.) 1996. Handbook of Perception and Action, vol. 3: Attention. London: Academic Press.
Newman, J. 2002a. 'A cross-linguistic overview of the posture verbs "sit", "stand", and "lie"', in J. Newman (ed.), The Linguistics of Sitting, Standing, and Lying, pp. 1–24. Amsterdam: Benjamins.
(ed.) 2002b. The Linguistics of Sitting, Standing, and Lying. Amsterdam: Benjamins.
Newman, J. and Rice, S. 2004. 'Patterns of usage for English SIT, STAND, and LIE: A cognitively inspired exploration in corpus linguistics', Cognitive Linguistics 15(3): 351–96.
Newport, E. and Supalla, T. 2000. 'Sign language research at the millennium', in K. Emmorey and H. Lane (eds.), The Signs of Language Revisited: An Anthology to Honor Ursula Bellugi and Edward Klima, pp. 103–14. Mahwah, NJ: Lawrence Erlbaum.
Newtson, D. 1973. 'Attribution and the unit of perception of ongoing behavior', Journal of Personality and Social Psychology 28(1): 28–38.
Newtson, D. and Engquist, G. 1976. 'The perceptual organization of ongoing behavior', Journal of Experimental Social Psychology 12(5): 436–50.
Nyst, V. 2004. 'Verb series of non-agentive motion in Adamorobe Sign Language (Ghana)', poster presented at Theoretical Issues in Sign Language Research 8, Barcelona, Spain, Sept. 30–Oct. 2, 2004.
2007. A descriptive analysis of Adamorobe Sign Language (Ghana). Unpublished PhD dissertation, University of Amsterdam.
Oakes, L. M. 1994. 'The development of infants' use of continuity cues in their perception of causality', Developmental Psychology 30: 869–79.
Odlin, T. 2005. 'Crosslinguistic influence and conceptual transfer: What are the concepts?', Annual Review of Applied Linguistics 25: 3–25.
Oh, K. J. 2003. Language, cognition, and development: Motion events in English and Korean. Unpublished PhD dissertation, University of California, Berkeley.
Ohta, K. and Laboissière, R. 2006. 'Underlying principles of trajectory formation for human movement in dynamical environments', International Congress Series 1291: 97–100.
Oliva, A. and Torralba, A. 2001. 'Modeling the shape of the scene: A holistic representation of the spatial envelope', International Journal of Computer Vision 42: 145–75.
Oliva, A., Wolfe, J. M., and Arsenio, H. C. 2004. 'Panoramic search: The interaction of memory and vision in search through a familiar scene', Journal of Experimental Psychology: Human Perception and Performance 30: 1132–46.
Özyürek, A., Kita, S., Allen, S. E. M., Furman, R., and Brown, A. 2005. 'How does linguistic framing of events influence co-speech gestures? Insights from crosslinguistic variations and similarities', Gesture 5(1/2): 219–40.
Özyürek, A., Willems, R. M., Kita, S., and Hagoort, P. 2007. 'On-line integration of semantic information from speech and gesture: Insights from event-related brain potentials', Journal of Cognitive Neuroscience 19: 605–16.
Palmer, S., Rosch, E., and Chase, P. 1981. 'Canonical perspective and the perception of objects', in J. B. Long and A. D. Baddeley (eds.), Attention and Performance IX, pp. 135–51. Hillsdale, NJ: Lawrence Erlbaum.
Papafragou, A., Massey, C., and Gleitman, L. 2002. 'Shake, rattle, 'n' roll: The representation of motion in language and cognition', Cognition 84(2): 189–219.
Papaxanthis, C., Pozzo, T., and McIntyre, J. 2005. 'Kinematic and dynamic processes for the control of pointing movements in humans revealed by short-term exposure to microgravity', Neuroscience 135: 371–83.
Parsons, T. 1990. Events in the Semantics of English: A Study in Subatomic Semantics. Cambridge, MA: MIT Press.
Pauwels, P. 2000. Put, Set, Lay and Place: A Cognitive Linguistic Approach to Verbal Meaning. München: Lincom Europa.
Pavlenko, A. 1999. 'New approaches to concepts in bilingual memory', Bilingualism: Language and Cognition 2(3): 209–30.
Pawley, A. 1987. 'Encoding events in Kalam and English: Different logics for reporting experience', in R. S. Tomlin (ed.), Coherence and Grounding in Discourse, pp. 329–60. Amsterdam: Benjamins.
1993. 'A language which defies description by ordinary means', in W. Foley (ed.), The Role of Theory in Language Description, pp. 87–129. Berlin: Mouton de Gruyter.
2008. 'Compact versus narrative serial verb constructions in Kalam', in G. Senft (ed.), Serial Verb Constructions in Austronesian and Papuan Languages, pp. 171–202. Canberra: Pacific Linguistics.
Pawley, A. and Bulmer, R. 2003. A Dictionary of Kalam with Ethnographic Notes. Printout. Department of Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Pawley, A. and Lane, J. 1998. 'From event sequence to grammar: Serial verb constructions in Kalam', in A. Siewierska and S. J. Jung (eds.), Case, Typology and Grammar, pp. 201–27. Amsterdam: Benjamins.
Pawley, A. and Syder, F. 1983. 'Two puzzles for linguistic theory: Nativelike selection and nativelike fluency', in J. Richards and R. Schmidt (eds.), Language and Communication, pp. 191–225. London: Longman.
2000. 'The one clause at a time hypothesis', in H. Riggenbach (ed.), Perspectives on Fluency, pp. 163–99. Ann Arbor: University of Michigan Press.
Pederson, E. 1995. 'Language as context, language as means: Spatial cognition and habitual language use', Cognitive Linguistics 6(1): 33–62.
Pederson, E., Danziger, E., Levinson, S. C., Kita, S., Senft, G., and Wilkins, D. P. 1998. 'Semantic typology and spatial conceptualization', Language 74(3): 557–89.
Penfield, W. and Rasmussen, T. 1950. The Cerebral Cortex of Man. New York: Macmillan.
Perniss, P. 2007a. 'Achieving spatial coherence in German Sign Language narratives: The use of classifiers and perspective', Lingua 117: 1315–38.
2007b. 'Locative functions of simultaneous perspective constructions in German Sign Language narratives', in M. Vermeerbergen, L. Leeson, and O. Crasborn (eds.), Simultaneity in Signed Languages: Form and Function, pp. 27–55. Amsterdam: John Benjamins.
Perniss, P. and Özyürek, A. 2004. 'Differences in the expressions of spatial relationships in German (DGS) and Turkish (TİD) sign languages', poster presented at Theoretical Issues in Sign Language Research 8, Barcelona, Spain, Sept. 30–Oct. 2, 2004.
2008. 'Representations of action, motion and location in sign space: A comparison of German (DGS) and Turkish (TİD) sign language narratives', in J. Quer (ed.), Signs of the Time: Selected Papers from TISLR8. Hamburg: Signum Press.
Piaget, J. and Inhelder, B. 1956. The Child's Conception of Space. London: Routledge.
Pianesi, F. and Varzi, A. C. 2000. 'Events and event talk: An introduction', in J. Higginbotham, F. Pianesi, and A. C. Varzi (eds.), Speaking of Events, pp. 3–47. Oxford: Oxford University Press.
Pinker, S. 1989. Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Posner, M. I. 1980. 'Orienting of attention', Quarterly Journal of Experimental Psychology 32: 3–25.
Potter, M. C. 1975. 'Meaning in visual search', Science 187: 965–6.
Potter, M. C. and Levy, E. I. 1969. 'Recognition memory for a rapid sequence of pictures', Journal of Experimental Psychology 81: 10–15.
Prinz, W. 1997. 'Perception and action planning', European Journal of Cognitive Psychology 9: 129–54.
Prinz, W. and Bridgeman, B. (eds.) 1995. Handbook of Perception and Action, vol. 1: Perception. London: Academic Press.
Pulvermüller, F. 2005. 'Brain mechanisms linking language and action', Nature Reviews Neuroscience 6(7): 576–82.
Rayner, K. 1998. 'Eye movements in reading and information processing: 20 years of research', Psychological Bulletin 124: 372–422.
Reinhart, T. 2002. 'The theta system: An overview', Theoretical Linguistics 28: 229–90.
Reinkensmeyer, D. J., Emken, J. L., and Cramer, S. C. 2004. 'Robotics, motor learning, and neurological recovery', Annual Review of Biomedical Engineering 6: 497–525.
Rosch, E., Mervis, C. B., Gray, W., Johnson, D., and Boyes-Braem, P. 1976. 'Basic objects in natural categories', Cognitive Psychology 8: 382–439.
Rothstein, S. (ed.) 1998a. Events and Grammar. Dordrecht: Kluwer.
1998b. 'Introduction', in S. Rothstein (ed.), Events and Grammar, pp. 1–11. Dordrecht: Kluwer.
Russell, B. 1936. 'On order in time', Proceedings of the Cambridge Philosophical Society 32: 216–28.
1948. Human Knowledge. New York: Simon and Schuster.
Ryle, G. 1949. The Concept of Mind. London: Hutchinson.
Sachs, J. 1983. 'Talking about the there and then: The emergence of displaced reference in parent–child discourse', in K. E. Nelson (ed.), Children's Language, vol. 4, pp. 1–28. Hillsdale, NJ: Lawrence Erlbaum.
Sakel, J. 2004. A Grammar of Mosetén. Berlin: de Gruyter.
Salmon, W. 1994. 'Causality without counterfactuals', Philosophy of Science 61: 297–312.
1998. Causality and Explanation. Oxford: Oxford University Press.
Sasse, H.-J. 2002. 'Recent activity in the theory of aspect: Accomplishments, achievements, or just non-progressive state?', Linguistic Typology 6(2): 199–271.
Schaffer, J. 2000. 'Causation by disconnection', Philosophy of Science 67: 285–300.
Schank, R. C. and Abelson, R. P. 1977. Scripts, Plans, Goals, and Understanding: An Inquiry into Human Knowledge Structures. Hillsdale, NJ: Lawrence Erlbaum.
Schembri, A. 2003. 'Rethinking "classifiers" in signed languages', in K. Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages, pp. 3–34. Mahwah, NJ: Lawrence Erlbaum.
Schick, B. S. 1990. 'Classifier predicates in American Sign Language', International Journal of Sign Linguistics 1(1): 15–40.
Schlottmann, A. and Anderson, N. H. 1993. 'An information integration approach to phenomenal causality', Memory and Cognition 21(6): 785–801.
Scholl, B. J. and Tremoulet, P. D. 2000. 'Perceptual causality and animacy', Trends in Cognitive Sciences 4: 299–309.
Schulz, L., Kushnir, T., and Gopnik, A. 2007. 'Learning from doing: Intervention and causal inference', in A. Gopnik and L. Schulz (eds.), Causal Learning: Psychology, Philosophy, and Computation, pp. 67–85. Oxford: Oxford University Press.
Schwartz, D. L. 1999. 'Physical imagery: Kinematic versus dynamic models', Cognitive Psychology 38: 433–64.
Schwartz, D. L. and Black, J. B. 1996. 'Analog imagery in mental model reasoning: Depictive models', Cognitive Psychology 30: 154–219.
Serra Borneto, C. 1996. 'Liegen and stehen in German: A study in horizontality and verticality', in E. H. Casad (ed.), Cognitive Linguistics in the Redwoods, pp. 459–505. Berlin: Mouton de Gruyter.
Seyfeddinipur, M. 2006. Disfluency: Interrupting speech and gesture. Unpublished PhD dissertation, Radboud University, Nijmegen.
Shadmehr, R. and Mussa-Ivaldi, F. A. 1994. 'Adaptive representation of dynamics during learning of a motor task', Journal of Neuroscience 14: 3208–24.
Shibatani, M. 1976a. The Grammar of Causative Constructions. New York: Academic Press.
1976b. 'The grammar of causative constructions: A conspectus', in M. Shibatani (ed.), Syntax and Semantics, vol. 6: The Grammar of Causative Constructions, pp. 1–40. New York: Academic Press.
Shultz, T. R. 1982. 'Rules of causal attribution', Monographs of the Society for Research in Child Development 47: 1–51.
Sinha, C. and Kuteva, T. 1995. 'Distributed spatial semantics', Nordic Journal of Linguistics 18(2): 167–99.
Slobin, D. I. 1987. 'Thinking for speaking', Berkeley Linguistics Society, BLS 13: 435–54.
1991. 'Learning to think for speaking', Pragmatics 1: 7–25.
1996a. 'From "thought and language" to "thinking for speaking"', in J. J. Gumperz and S. C. Levinson (eds.), Rethinking Linguistic Relativity, pp. 70–96. Cambridge: Cambridge University Press.
1996b. 'Two ways to travel: Verbs of motion in English and Spanish', in M. Shibatani and S. A. Thompson (eds.), Grammatical Constructions: Their Form and Meaning, pp. 70–96. Oxford: Oxford University Press.
2000. 'Verbalized events: A dynamic approach to linguistic relativity and determinism', in S. Niemeier and R. Dirven (eds.), Evidence for Linguistic Relativity, pp. 107–38. Amsterdam: Benjamins.
2003. 'Language and thought online: Cognitive consequences of linguistic relativity', in D. Gentner and S. Goldin-Meadow (eds.), Language in Mind: Advances in the Study of Language and Thought, pp. 157–91. Cambridge, MA: MIT Press.
2004. 'How people move: Discourse effects of linguistic typology', in C. L. Moder and A. Martinovic-Zic (eds.), Discourse Across Languages and Cultures, pp. 195–210. Amsterdam: Benjamins.
2006. 'What makes manner of motion salient? Explorations of linguistic typology, discourse, and cognition', in M. Hickmann and S. Robert (eds.), Space in Languages: Linguistic Systems and Cognitive Categories, pp. 59–81. Philadelphia: John Benjamins.
Sloman, S. A. 2005. Causal Models: How People Think about the World and its Alternatives. Oxford: Oxford University Press.
Sloman, S. A., Barbey, A. K., and Hotaling, J. M. 2009. 'A causal model theory of the meaning of cause, enable, and prevent', Cognitive Science 33: 21–50.
Smith, C. S. 1978. 'Jespersen's "move and change" class and causative verbs in English', in M. A. Jazayery, E. C. Polomé, and W. Winter (eds.), Linguistic and Literary Studies in Honor of Archibald A. Hill, vol. 2: Descriptive Linguistics, pp. 101–9. The Hague: Mouton.
1991. The Parameter of Aspect (Studies in Linguistics and Philosophy). Dordrecht: Kluwer.
Song, G. 1996. 'Causation, adicity and lexical aspect', in M. Przezdziecki and L. Whaley (eds.), ESCOL '95, pp. 299–307. Ithaca, NY: CLC.
Spelke, E. S., Phillips, A. T., and Woodward, A. L. 1995. 'Infants' knowledge of object motion and human action', in D. Sperber, D. Premack, and A. Premack (eds.), Causal Cognition: A Multidisciplinary Debate, pp. 44–78. Oxford: Oxford University Press.
Spellman, B. A. and Mandel, D. R. 1999. 'When possibility informs reality: Counterfactual thinking as a cue to causality', Current Directions in Psychological Science 8: 120–3.
Spellman, B. A., Kincannon, A. P., and Stose, S. J. 2005. 'The relation between counterfactual and causal reasoning', in D. R. Mandel, D. J. Hilton, and P. Catellani (eds.), The Psychology of Counterfactual Thinking, pp. 28–43. London: Routledge Research.
van Staden, M. and Reesink, G. 2008. 'A functional approach to verb serialization', in G. Senft (ed.), Serial Verb Constructions in Austronesian and Papuan Languages, pp. 17–54. Canberra: Pacific Linguistics.
van Staden, M., Bowerman, M., and Verhelst, M. 2006. 'Some properties of spatial description in Dutch', in S. C. Levinson and D. P. Wilkins (eds.), Grammars of Space: Explorations in Cognitive Diversity, pp. 475–511. Cambridge: Cambridge University Press.
van Staden, M., Senft, G., Enfield, N., and Bohnemeyer, J. 2001. 'Staged events', in N. Enfield (ed.), 'Manual' for the Field Season 2001, pp. 115–25. Nijmegen: Max Planck Institute for Psycholinguistics.
Strawson, P. F. 1959. Individuals. London: Methuen.
von Stutterheim, C. and Klein, W. 2002. 'Quaestio and L-perspectivation', in C. F. Graumann and W. Kallmeyer (eds.), Perspective and Perspectivation in Discourse, pp. 59–88. Amsterdam: Benjamins.
von Stutterheim, C. and Nüse, R. 2003. 'Processes of conceptualisation in language production', Linguistics 41 (special issue: Perspectives in language production): 851–81.
von Stutterheim, C., Carroll, M., and Klein, W. 2003. 'Two ways of construing complex temporal structures', in F. Lenz (ed.), Deictic Conceptualization of Space, Time and Person, pp. 97–133. Berlin: de Gruyter.
von Stutterheim, C., Nüse, R., and Murcia-Serra, J. 2002. 'Cross-linguistic differences in the conceptualization of events', in H. Hasselgård, S. Johansson, B. Behrens, and C. Fabricius-Hansen (eds.), Information Structure in a Crosslinguistic Perspective, pp. 179–98. Amsterdam: Rodopi.
Supalla, T. 1986. 'The classifier system in American Sign Language', in C. Craig (ed.), Noun Classification and Categorization, pp. 181–213. Philadelphia: John Benjamins.
Supalla, T. and Webb, R. 1995. 'The grammar of international sign', in K. Emmorey and J. Reilly (eds.), Language, Gesture, and Space, pp. 333–54. Hillsdale, NJ: Lawrence Erlbaum.
Talmy, L. 1985. 'Lexicalization patterns: Semantic structure in lexical forms', in T. Shopen (ed.), Language Typology and Syntactic Description, vol. 3: Grammatical Categories and the Lexicon, pp. 57–149. Cambridge: Cambridge University Press.
1988. 'Force dynamics in language and cognition', Cognitive Science 12: 49–100.
1991. 'Path to realization: A typology of event conflation', Proceedings of the Berkeley Linguistics Society 17: 480–520.
2000a. Toward a Cognitive Semantics, vol. 1: Concept Structuring Systems. Cambridge, MA: MIT Press.
2000b. Toward a Cognitive Semantics, vol. 2: Typology and Process in Concept Structuring. Cambridge, MA: MIT Press.
2003a. 'Concept structuring systems in language', in M. Tomasello (ed.), The New Psychology of Language: Cognitive and Functional Approaches to Language Structure, vol. 2, pp. 15–46. Mahwah, NJ: Lawrence Erlbaum.
2003b. 'The representation of spatial structure in spoken and signed language', in K. Emmorey (ed.), Perspectives on Classifier Constructions in Sign Languages, pp. 169–96. Mahwah, NJ: Lawrence Erlbaum.
2008. 'Aspects of attention in language', in N. C. Ellis and P. Robinson (eds.), Handbook of Cognitive Linguistics and Second Language Acquisition, pp. 27–38. London: Routledge.
Taylor, H. A. and Tversky, B. 1992. 'Spatial mental models derived from survey and route descriptions', Journal of Memory and Language 31: 261–82.
Tenny, C. and Pustejovsky, J. 2000. 'A history of events in linguistic theory', in C. Tenny and J. Pustejovsky (eds.), Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax, pp. 3–37. Stanford, CA: CSLI Publications.
Thinès, G., Costall, A., and Butterworth, G. 1991. Michotte's Experimental Phenomenology of Perception. Hillsdale, NJ: Lawrence Erlbaum.
Thorpe, S., Fize, D., and Marlot, C. 1996. 'Speed of processing in the human visual system', Nature 381: 520–2.
Thorpe, S., Gegenfurtner, K. R., Fabre-Thorpe, M., and Bülthoff, H. H. 2001. 'Detection of animals in natural images using far peripheral vision', European Journal of Neuroscience 14: 869–76.
Tomlin, R. S. 1997. 'Mapping conceptual representations into linguistic representations: The role of attention in grammar', in J. Nuyts and E. Pederson (eds.), Language and Conceptualization, pp. 162–89. Cambridge: Cambridge University Press.
Treisman, A. and Gelade, G. 1980. 'A feature-integration theory of attention', Cognitive Psychology 12: 97–136.
Tversky, B. and Hemenway, K. 1984. 'Objects, parts, and categories', Journal of Experimental Psychology: General 113: 169–93.
Tversky, B. and Tuchin, M. 1989. 'A reconciliation of evidence on eyewitness testimony: Comments on McCloskey and Zaragoza (1985)', Journal of Experimental Psychology: General 118: 86–91.
Tversky, B., Lee, P., and Zacks, J. M. 2004. 'Events by hand and feet', Spatial Cognition and Computation 4: 5–14.
Tversky, B., Zacks, J. M., and Hard, B. M. 2008. 'The structure of experience', in T. Shipley and J. M. Zacks (eds.), Understanding Events: How Humans See, Represent, and Act on Events, pp. 436–65. Oxford: Oxford University Press.
Van Oosten, J. 1984. 'Sitting, standing and lying in Dutch: A cognitive approach to the distribution of the verbs zitten, staan, and liggen', in J. van Oosten and J. Snapper (eds.), Dutch Linguistics at Berkeley, pp. 137–60. Berkeley: University of California, Berkeley.
Van Rullen, R. and Thorpe, S. J. 2001. 'The time course of visual processing: From early perception to decision-making', Journal of Cognitive Neuroscience 13: 454–61.
Van Valin, R. D., Jr. and LaPolla, R. J. 1997. Syntax: Structure, Meaning and Function. Cambridge: Cambridge University Press.
Vendler, Z. 1957. 'Verbs and times', Philosophical Review 66: 143–60.
Verfaillie, K. and Daems, A. 1996. 'The priority of the agent in visual event perception: On the cognitive basis of grammatical agent–patient asymmetries', Cognitive Linguistics 7(2): 131–47.
Verhagen, A. and Kemmer, S. 1997. 'Interaction and causation: Causative constructions in modern standard Dutch', Journal of Pragmatics 27: 61–82.
Vogel, H. 1999. 'Geschichte der Gehörlosenbildung [The history of deaf education]', in A. Beecken, J. Keller, S. Prillwitz, and H. Zienert (eds.), Grundkurs Deutsche Gebärdensprache, Stufe I, Arbeitsbuch, pp. 46–9. Hamburg: Signum Press.
Volterra, V., Caselli, M. C., Capirci, O., and Pizzuto, E. 2005. 'Gesture and the emergence and development of language', in M. Tomasello and D. I. Slobin (eds.), Beyond Nature–Nurture: Essays in Honor of Elizabeth Bates, pp. 3–40. Mahwah, NJ: Lawrence Erlbaum.
Warren, W. H., Jr. and Shaw, R. E. (eds.) 1985. Persistence and Change: Proceedings of the First International Conference on Event Perception. Hillsdale, NJ: Lawrence Erlbaum.
Weinstein, S. 1968. 'Intensive and extensive aspects of tactile sensitivity as a function of body part, sex, and laterality', in D. R. Kenshalo (ed.), The Skin Senses, pp. 195–222. Springfield, IL: Charles C. Thomas.
White, P. A. 1999. 'Towards a causal realist theory of causal understanding', American Journal of Psychology 112: 605–42.
2006. 'Theoretical notes: A causal asymmetry', Psychological Review 113: 132–47.
Whorf, B. L. 1956. Language, Thought, and Reality, ed. by J. B. Carroll. Cambridge, MA: MIT Press.
Wierzbicka, A. 1980. Lingua Mentalis. Sydney: Academic Press.
Wolfe, J. M. 1998. 'Visual search', in H. Pashler (ed.), Attention, pp. 13–73. Hove: Psychology Press.
Wolfe, J. M., Klempen, N., and Dahlen, K. 2000. 'Postattentive vision', Journal of Experimental Psychology: Human Perception and Performance 26: 693–716.
Wolff, P. 2003. 'Direct causation in the linguistic coding and individuation of causal events', Cognition 88: 1–48.
2007. 'Representing causation', Journal of Experimental Psychology: General 136: 82–111.
2008. 'Dynamics and the perception of causal events', in T. Shipley and J. M. Zacks (eds.), Understanding Events: How Humans See, Represent, and Act on Events, pp. 555–88. Oxford: Oxford University Press.
Wolff, P. and Zettergren, M. 2002. 'A vector model of causal meaning', in W. D. Gray and C. D. Schunn (eds.), Proceedings of the Twenty-fourth Annual Conference of the Cognitive Science Society, pp. 944–9. Mahwah, NJ: Lawrence Erlbaum.
Woodward, J. 2003. Making Things Happen. Oxford: Oxford University Press.
2006. 'Sensitive and insensitive causation', Philosophical Review 115: 1–50.
2007. 'Interventionist theories of causation in psychological perspective', in A. Gopnik and L. Schulz (eds.), Causal Learning: Psychology, Philosophy, and Computation, pp. 19–36. Oxford: Oxford University Press.
Wray, A. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
Wunderlich, D. (ed.) 2006. Advances in the Theory of the Lexicon. Berlin: de Gruyter.
Yoshioka, K. and Kellerman, E. 2006. 'Gestural introduction of Ground reference in L2 narrative discourse', International Review of Applied Linguistics 44(2): 171–93.
Zacks, J. M. 2004. 'Using movement and intentions to understand simple events', Cognitive Science 28: 979–1008.
Zacks, J. M. and Swallow, K. M. 2007. 'Event segmentation', Current Directions in Psychological Science 16: 80–4.
Zacks, J. M. and Tversky, B. 2001. 'Event structure in perception and conception', Psychological Bulletin 127: 3–21.
Zacks, J. M., Tversky, B., and Iyer, G. 2001. 'Perceiving, remembering and communicating structure in events', Journal of Experimental Psychology: General 130: 29–58.
Zacks, J. M., Kumar, S., Abrams, R. A., and Mehta, R. 2009. 'Using movement and intentions to understand human activity', Cognition 112: 201–16.
Zeshan, U. 2002. 'Sign language in Turkey: The story of a hidden language', Journal of Turkic Languages 6(2): 229–74.
Zwitserlood, I. 2003. Classifying hand configurations in Nederlandse Gebarentaal. Unpublished PhD dissertation, University of Utrecht, LOT.
Index
accomplishment, 66, 221, 224; see also aspect
accusative case, 45, 136, 139, 157, 158
achievement, 66, 155, 217, 221; see also aspect
act: see action
action, 1–7, 9, 85–7, 89, 92, 95–9, 102–5, 118, 135–49, 152–5, 157, 164, 167–8, 169, 172, 178–81, 185, 190–1, 196, 200, 205–11, 214–15, 217, 221, 223–6, 237; see also event
  processing of, 1, 23
  sequence, 43–5, 46
action region, 191, 196–204, 207, 211, 212–15
activity, 18, 23, 27–9, 31, 43–6, 84–6, 91–103, 111–12, 131, 166–7, 174, 205, 221–3, 237; see also aspect
Adamorobe Sign Language, 90, 104
American Sign Language, 103–4
Apollonius Dyscolus, 2
apprehension phase (vision), 11, 190, 196, 204, 208, 213–14
Arabic, Modern Standard, 68
Aristotle, 4, 22
aspect, 4, 18–19, 22–4, 35, 38, 40, 68–9, 71; see also state, activity, accomplishment, achievement, event: linguistic representation of
  event structure, 30, 40, 41, 69
Atsugewi, 147
attention, 1, 10–11, 41–2, 44, 68–70, 83, 112–13, 128, 129, 133, 146–7, 149, 155–6, 159, 161, 167, 181, 185, 193–4, 196, 203–4, 213, 215, 216–17, 222, 227, 232
  focus of, 11, 39, 227
  shift of, 190–2, 195, 205–7, 211, 214, 227
  visual, 11, 128, 192
bootstrapping, syntactic, 160
BowPed stimulus, 52
break points: see event, segmentation of
categorization, 43, 113–33, 148–52, 193, 196, 219
  event, 9–10, 129
  linguistic, 1, 43–4, 108
  non-linguistic, 2, 108
category
  cognitive, 17
  conceptual, 43–4
  event, 43–6, 170, 187, 216
  grammatical, 14, 17, 25, 110, 113, 227
  semantic, 13–14, 22, 25, 41, 43–4, 134, 149–55, 164, 167, 227
causal chain, 9, 15, 28, 30, 44, 48–66, 242–3, 248–52
causality: see causation
causation, 1, 6, 9, 11, 48–57, 181, 191, 225, 246–52; see also force dynamics
  by omission, 12, 228–31, 232, 237–9, 246–52
  cause, concept of, 229–31, 234, 240–5
  directness of, 53–5
  double prevention, 238–9, 246–52
  expression of: see causative
  mediation, 53, 56–8, 60, 62
  prevent, concept of, 229–31, 246–51
  relation, causal, 15, 44, 47, 50, 52–3, 61–6, 69, 225, 228–9, 232, 239–44, 248–52
  relation composition, 239, 242–4
  temporal priority, 231–3, 252
causation, theory
  causal Bayesian network theory, 230–1
  Conserved Quantity Theory, 233
  counterfactual theory, 228, 230, 238, 244
  force theory, 12, 229, 246–52
  generative, 233
  Invariant Quantity Theory, 233
  mechanism, 229, 231–3, 242, 252
  mental model theory, 230
  outcome theory, 229–31
  probability raising model, 229–30
  process theory, 229, 233, 239, 246, 252
  Transference Theory, 233
causative (expression of causation), 24, 37, 51–3
  lexical, 237
  syntactic, 46, 51, 57–66, 123, 158, 170–1, 177
causative role, event participant
  affectee (AF), 53–7, 61–3
  causee (CE), 53–62, 65
  causer (CR), 53–8, 60–5
  instrument (IN), 53–6, 59–61, 64
change of location, 62, 86, 110, 112, 170–2, 223, 237; see also spatial description: caused motion
  resultative, 29, 37, 55–7, 63–4
  verb, caused-state-change, 51
change of state, 29, 54, 138, 189, 237
child language, 1, 10, 134, 143, 146–8, 163, 187
  input (language acquisition), 10, 140, 154, 156–7, 164, 170, 187
clause, 2, 7–8, 9, 15–17, 18–19, 21, 24, 39–40, 45, 49, 68, 75–83, 110–11, 129, 134, 170, 180
clause-chaining, 19, 23, 25, 30, 36–7, 46
  construction, chaining, 19, 30, 36, 46
  strategy, analytic (Kalam), 17
clause linkage
  dependent clause, 71, 74–83
  main clause, 60, 70–1, 74–82
  relative clause, 70, 81
  relative pronoun, 70, 75
Conceptual Semantics, 5
conceptualization, linguistic, 166, 169, 172–3, 183; see also thinking for speaking hypothesis
construction (morphosyntactic), 7–9, 13–16, 40, 43, 46–9, 52, 55–67, 71, 78–83, 88–9, 92–103, 105, 135–6, 149, 159–63, 177, 185, 203, 230; see also event, linguistic representation of
  clause structure, 15, 48, 68
  language, structure of, 3
containment (vs. support), 135–40, 144, 148–52, 171, 172
contiguity, spatio-temporal, 47, 54–7, 60–3, 231–3, 252
control, 53, 65, 170
Danish Sign Language, 103–4
dative case, 136–9, 149, 204
Davidson, Donald, 3–4
discourse, 4, 9, 36, 39, 65, 70, 82, 84–90, 102–5, 113, 130, 140, 161–4, 168
  discourse accent, 187
  discourse representation theory (DRT), 4–5
Dutch, 10–11, 50, 112, 168, 171–86, 197
ecological validity, 126
enabling causation, 54–7, 60, 63
  allow, concept of, 230–1, 246–52
  help, concept of, 240–2
English, 3, 7–10, 13–17, 22, 26–7, 29–30, 33, 35–42, 45–6, 49, 50, 52, 61, 66, 70–1, 74–9, 82, 110–24, 127–32, 134–6, 141–52, 156–62, 169–71
entailment, 38, 47–50, 61, 66, 237
episode, 15–17, 30–3, 35–40
  verb, episodic, 36
event: see also state, activity, accomplishment, achievement, mereology
  coherence, 6, 190–1, 193, 196–9, 205–8, 211–15, 222–3
  complexity, 7, 36–7, 40, 44–6, 49, 64, 173, 190–1, 200–7, 214
  ECOM movie clips, 49–55
  perception of, 5, 24, 191, 221, 223; see also perception, eye movement
  representation in sign language, 84, 91–4, 99
  segmentation of, 6, 9, 13, 17, 24–6, 38, 41–2, 43–50, 55, 60, 62, 64–6, 72, 112, 221–2, 224–6
event, cognitive representation of, 1–12, 26, 37–42, 66–7, 68–72, 78–80, 108–9, 112–15, 126, 132–3, 167, 187, 221, 223–5, 252
  event structure, 15, 196, 205
  language-specificity, 8–11, 45, 69
  mental state, 4
  perception and action, integrated account of, 5
  processing, event, 108, 114–15, 117, 129–30
  schema, 5–7, 14, 36–7, 40–1, 45, 72, 135, 157, 221, 227, 233
event, linguistic representation of, 1–5, 7–10, 12, 13–17, 26, 37–42, 44–8, 65–7, 68–83, 88, 93, 97, 103, 105–6, 108–9, 125–31, 167, 169, 172–4, 185–8
  construction, temporal properties, 48, 65, 81
  event encoding phrase, 47
  language-specificity, 8–11, 45, 69, 134, 155, 164, 167–74, 176–88
  representation, semantic, 44–7
event concept, 5–7, 26, 43, 47
event participant, 2–4, 11, 15, 30, 52–9, 65, 68–71, 79–83, 189, 197–208, 211–14; see also subject, object, primary object, secondary object
  agent, 29, 54, 86–8, 117, 133, 141, 170, 174, 176–7, 191, 197–208, 211–14
  patient, 3, 6, 197–9, 202, 205–12, 240–5, 252
  recipient, 2–4, 28, 133, 141, 175, 197–9, 202, 205–7, 211, 214, 233, 237
  role, semantic (thematic), 3–4, 6, 49, 53, 209, 237
  role identification, 210
  theme, 2–4, 50, 102, 197, 205, 212
Ewe, 9, 55–8, 62–4
eye movement, 11, 190–2, 195–6, 203–4, 207–9, 212, 214; see also event: perception of
  eye tracking, 167, 190–2, 195–7, 200, 204–7, 213
  fixation, 11, 191, 195–6, 199, 200–14
  gaze pattern, 190, 197, 212–15
  patient detection task (vision), 208–9
  preview phase, 191, 195–9, 211–13
  representation, visual, 187, 190
  saccade, 191, 208, 214
  vision, peripheral, 195, 204–5, 210–11
figure, 109–10, 119, 122, 133, 136–9, 141–9, 152, 157–9, 169–86
Finnish, 10, 134–8, 142–8, 151–5
first language acquisition: see child language
force: see force dynamics
  vector, 239–45, 252
force dynamics, 12, 54–6, 63, 240–3
  force transmission, 228, 232, 237, 246, 252
Frege, Gottlob, 4
French, 10–11, 110–12, 168, 171–87
Generative Semantics, 48
German, 9–10, 70, 74–82, 110–12, 134–40, 142–9, 152–5, 160–2, 200–5
German Sign Language (DGS), 9, 84–6, 89–93, 99–105
gesture (accompanying speech), 1, 10–11, 88, 167–9, 173–88, 210
  figure-incorporating, 176
  gesture chain, 181
  gesture pattern, language specific, 168, 184–5
  hand shape, lingering, 180–1
  object incorporation, 173, 174
  simple path, 176–84
  speech and gesture, link between, 168
Hebrew, 112
Hindi, 10, 134, 137–40, 144–52, 155–8, 165
iconicity, 4, 22, 87, 90, 167, 239; see also implicature: stereotypicality
implicature, 47, 50; see also inference
  manner, 36
  stereotypicality, 52
inference, 43, 47, 138–40, 150; see also implicature
information perspective, 9, 68, 71, 80, 82–3, 167, 169, 173, 176, 184
  existential, 9, 52, 68–70, 74–83
  focus, 11, 39, 46, 64, 73, 79, 83, 129, 146, 149, 167–8, 174, 178–86, 238, 246
  presentational, 68, 70–1, 75, 78
intention, 11, 43, 56, 108, 194, 199, 221–2, 225–6, 228, 239
  goal, 108, 217, 221–6
  subgoal, 221, 223–6
intonation, 8, 13, 19, 21–3, 26, 34–5, 39–41
Israeli Sign Language, 103
Italian, 78, 110, 112
Japanese, 9–10, 55–8, 60, 62–5, 115–16, 121–4, 128–9, 130–2, 168–9
Kalam, 7–8, 13–25, 27, 29–33, 35–8, 40–2, 45–6, 49, 55, 66
Korean, 128, 171
language–cognition interface, 38, 66, 109, 112–14, 118, 129–30, 133
  representation, linguistic, 13–14, 17, 25–7
language use, habitual, 114–15, 117–19, 126–8
Lao, 9, 55–66
lexicalization, 41, 44, 46, 64–6, 83, 108, 112, 116, 121, 125, 129–30, 146–8
lexicon, 24, 26, 36, 70, 131, 164, 187
linguistic relativity, 113–15, 117, 126, 129–30, 133, 166, 169
  universalism–relativism debate, 8
macro-event, 9, 38, 48, 62–7, 71–81
  in dependent clause, 74–7
  macro-event property (MEP), 9, 28, 38, 48, 56, 58, 60–7; see also typology: semantic
Mandarin, 112
Mayan: see Yukatek Mayan, Tzeltal
memory, 8, 24–6, 35–6, 39, 41, 68, 82, 113–16, 122–5, 127–8, 133, 160, 168, 174, 226
281 Russian, 10, 110–12, 134–6, 142–8, 156–8, 164 scope, 48, 72 second language acquisition, 187 secondary object, 3–4, 19 serial verb construction, 7–8, 13, 17–42, 49, 53, 55–7, 63–4; see also event: linguistic representation of, construction (morphosyntactic) compact, 8, 17, 27–30, 32–4, 41 narrative, 8, 17, 22, 29–38, 41 narrative serialization, 27 nuclear layer predicate, 27 scope, 22, 27–8, 33–4, 38 semantic structure, 25–6, 36, 41 strategy, analytic (Kalam), 16–17 sign language, 9–10, 84–106; see also Turkish, Adamorobe, American, Danish, German, Israeli sign languages classifier, 85–105 expression, linguistic, 10, 84–5 sign space, 85–8, 91–7 sign language perspective, 10, 106 character, 87–9, 92–104 fused, 93, 96, 100–2 observer, 88, 94, 103 signing, 87–8, 97 visual, 84, 87–8, 105 situation semantics, 4–5 Spanish, 10, 110–13, 115–22, 128, 130–2, 134–40, 144–52, 171 spatial description, 84, 106, 149 caused motion, 135, 139 in sign language, 84, 90–4, 99 information, spatial, 84–7, 135, 167, 169–70, 172–3, 179, 183–5, 187 locative, 34–5, 38, 84, 136, 140–6, 149, 157, 160, 162, 170–2 placement, 10–11, 134–64, 167, 169–88 posture, 11, 147, 152, 171–3, 177, 183–6 speech formula, 35, 39, 41 speech production, 11, 168, 187, 190–1, 195–7, 213 pause, 8, 13, 17, 21–6, 35, 39, 41, 46 speech onset latency, 190, 200 speech processing, 41 state, 5, 15, 27, 29, 44, 51–7, 63, 65, 69, 72–3, 138, 171, 186, 191, 193–4, 221, 228–30, 237, 240–1, 243–5, 249, 252; see also aspect subject, 2–4, 18–19, 22–3, 29, 33, 37, 59–61, 75, 82, 104; see also event participant Swedish, 110, 168
syntax–semantics interface, 30, 35, 48–9, 65, 67
  bi-uniqueness constraint, 49
Tairora, 22–4
Talmy, Leonard, 10–12, 48, 54–5, 108–12, 119, 124, 128–30, 133, 134–9, 146, 148–9, 240
Tamil, 117, 171
Thai, 112
thinking for speaking hypothesis, 113, 166, 172, 184–5; see also linguistic relativity
time
  temporal operator, 4, 9, 28, 48, 65, 71
  topic time, 71, 81
Tok Pisin, 8, 14, 22–4, 46
Tsou, 112
Turkish, 10, 112, 134, 137–40, 144–52, 156, 159, 165, 169
Turkish Sign Language (TİD), 9, 84, 90–106
typology
  linguistic, 7–8, 10–11, 57, 62, 66, 68–72, 114, 115, 125, 133, 134, 139–41, 147–9, 155, 164, 168, 170, 176–80, 187–8
  satellite-framed, 10, 117, 119, 122, 130–3, 134–49, 164, 186
  semantic, 10, 43–4, 49–50, 64–6, 108–11, 112, 114, 124, 128–9, 139, 148–9
  verb-framed, 10, 117–18, 121–3, 130–2, 134–41, 142, 144–7, 164, 186
Tzeltal (Mayan), 10, 134, 137–9, 144–9, 152–5, 171
verb
  level of specificity, 152, 186
  semantic richness, 148
visual processing: see eye movement
Whorf, 130, 166, 227; see also linguistic relativity
Yukatek Mayan, 9, 50–1, 55–7, 59–60, 61–4