The Vienna Series in Theoretical Biology
Evolution of Communication Systems A Comparative Approach edited by D. Kimbrough Oller and Ulrike Griebel
The Vienna Series in Theoretical Biology
Gerd B. Müller, Günter P. Wagner, and Werner Callebaut, editors
The Evolution of Cognition, edited by Cecilia Heyes and Ludwig Huber, 2000
Origination of Organismal Form: Beyond the Gene in Developmental and Evolutionary Biology, edited by Gerd B. Müller and Stuart A. Newman, 2003
Environment, Development, and Evolution: Toward a Synthesis, edited by Brian K. Hall, Roy D. Pearson and Gerd B. Müller, 2003
Evolution of Communication Systems: A Comparative Approach, edited by D. Kimbrough Oller and Ulrike Griebel, 2004
The MIT Press Cambridge, Massachusetts London, England
© 2004 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For information, please e-mail [email protected] or write to Special Sales Department, The MIT Press, 5 Cambridge Center, Cambridge, MA 02142.
This book was set in Times Roman by SNP Best-set Typesetter Ltd., Hong Kong.
Printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Evolution of communication systems : a comparative approach / edited by D. Kimbrough Oller and Ulrike Griebel.
p. cm.—(The Vienna series in theoretical biology)
Includes bibliographical references (p. ).
ISBN 0-262-15111-1 (alk. paper)
1. Communication—History. 2. Animal communication. 3. Human evolution. 4. Language and languages—Origin. I. Oller, D. Kimbrough. II. Griebel, Ulrike. III. Series.
P90.E86 2004
302.2′09—dc22 2004042788
10 9 8 7 6 5 4 3 2 1
Contents
Preface
List of Contributors
I INTRODUCTION
1 Theoretical and Methodological Tools for Comparison and Evolutionary Modeling of Communication Systems
D. Kimbrough Oller and Ulrike Griebel
II PHILOSOPHICAL ISSUES: CONCEPTIONS AND FOUNDATIONS
2 On Reading Signs: Some Differences Between Us and the Others
Ruth Garrett Millikan
3 Primitive Content, Translation, and the Emergence of Meaning in Animal Communication
William F. Harms
4 Underpinnings for a Theory of Communicative Evolution
D. Kimbrough Oller
III METHODOLOGICAL AND THEORETICAL DEVELOPMENTS FOR THE FUTURE OF EVOLUTIONARY STUDY OF COMMUNICATION SYSTEMS
5 Social and Cultural Learning in the Evolution of Human Communication
Luc Steels
6 The Role of Learning and Development in Language Evolution: A Connectionist Perspective
Morten H. Christiansen and Rick Dale
7 Repeated Patterns in Behavior and Other Biological Phenomena
Magnus S. Magnusson
IV ANIMAL COMMUNICATION SYSTEMS: A COMPARATIVE BASIS
8 Social Processes in the Evolution of Complex Cognition and Communication
Charles T. Snowdon
9 Human Infant Crying as an Animal Communication System: Insights from an Assessment/Management Approach
Donald H. Owings and Debra M. Zeifman
10 Evolution of Communication from an Avian Perspective
Irene M. Pepperberg
11 Cephalopod Skin Displays: From Concealment to Communication
Jennifer A. Mather
V PRIMITIVE COMMUNICATION SYSTEMS AND LANGUAGE
12 The Evolution of Language: From Signals to Symbols to System
Chris Sinha
13 Cooperation and the Evolution of Symbolic Communication
Peter Gärdenfors
14 Language, Music, and Laughter in Evolutionary Perspective
R. I. M. Dunbar
15 Kin Selection and “Mother Tongues”: A Neglected Component in Language Evolution
W. Tecumseh Fitch
16 Language beyond Our Grasp: What Mirror Neurons Can, and Cannot, Do for the Evolution of Language
James R. Hurford
17 How Far Is Language beyond Our Grasp? A Response to Hurford
Michael A. Arbib
VI CONCLUDING REMARKS
18 Directions for Research in Comparative Communication Systems
D. Kimbrough Oller and Ulrike Griebel
Index
Preface
The evolution of communication systems in a variety of organisms is a topic that has both fascinated and frustrated theorists and empirical scientists. There is growing optimism, however, that real progress is being made and that further progress can be made, even toward the lofty goal of providing a lasting characterization of the evolution of communication in the hominid line. Our interest in providing groundwork for the enterprise of comparative communication systems was focused from the outset on bringing together a truly interdisciplinary group of scientists. One hope was that such a gathering might make it possible to solve a key problem: Different scholars and different subfields of scholarship utilize different terminology to address questions about communicative evolution that have much in common. The differences in terminology were, we presumed, a major source of the difficulties and frustrations that may have kept the field from achieving more rapid progress. And so we gathered scientists from Europe and America, from a wide variety of disciplines. We set the goal of addressing these terminological differences. The outcome yielded no improvement that we could discern. Fortunately, there were additional goals. Perhaps we should not have been surprised at the intransigence of terminological differences. The issues that underlie the difficulty with terminological standardization represent real differences in how researchers in various disciplines have conceptualized or organized information about communication and its functions in a variety of species. The terminological differences actually reflect the wealth and diversity of ideas that are being explored both theoretically and empirically. The workshop that we held at the Konrad Lorenz Institute for Evolution and Cognition Research in Altenberg (near Vienna), Austria, resulted in an extremely lively exchange of these ideas, filled with probing questions and much laughter. 
Paradoxically, even though the terminological differences appear largely to remain in place, the sense of progress within the context of the gathering and its aftermath was truly exciting. Considerable rapprochement of ideas was achieved, as we hope will be evident to the readers of this work. The workshop that inspired development of this volume, and that provided its title, was part of a series that implements a major function of the Konrad Lorenz Institute: to foster and stimulate interdisciplinary development of theory and integration of empirical work in theoretical biology. Participants in these workshops, always small in size, are selected carefully for their potential contribution to interaction both at the workshop itself and in many communications both before and after the event. In this case, participants submitted papers ahead of time, presented and discussed them extensively in a three-day workshop, revised them shortly afterward, then engaged in a peer review process, coordinated by the editors, with all authors involved in reviewing and receiving reviews. A final round of revisions of the articles occurred after many communications among the authors and the editors, both formally and informally.
The activity of developing this workshop, as well as organizing and editing the volume, has been a delight from start to finish. The Konrad Lorenz Institute is founded upon an inspired idea: to bring together the sciences of biology at the highest level for interchanges that may lay the groundwork for significant growth of understanding. The Institute is housed in the mansion where Konrad Lorenz lived and where he was seated at his desk the day he received the call informing him that he was to share the 1973 Nobel Prize for physiology or medicine with Niko Tinbergen and Karl von Frisch for the development of the field of ethology. Indeed, Lorenz and his colleagues contributed fundamentally to development of many of the ideas that were richly entertained and elaborated in the context of our workshop. Participants in the workshop did not fail to notice that Lorenz’s books lined the shelves in the room where the primary meetings occurred, the room that was his library. The green landscape visible through the many windows of the facility provided a serene setting for reflections and illuminations. It was a rare opportunity, and we offer our heartfelt thanks to the administration of the Konrad Lorenz Institute for supporting it.
Contributors
Michael A. Arbib, University of Southern California, USC Brain Project, Los Angeles, CA, USA.
Morten H. Christiansen, Cornell University, Department of Psychology, Ithaca, NY, USA.
Rick Dale, Cornell University, Department of Psychology, Ithaca, NY, USA.
R. I. M. Dunbar, University of Liverpool, School of Biological Sciences, Liverpool, England.
W. Tecumseh Fitch, University of St. Andrews, School of Psychology, St. Andrews, Fife, UK.
Peter Gärdenfors, Lund University Cognitive Science, Kungshuset, Lund, Sweden.
Ulrike Griebel, The University of Memphis, Department of Biology, Memphis, TN, USA. Member of the Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria.
William F. Harms, Independent scholar, Seattle, WA, USA.
James R. Hurford, University of Edinburgh, Language Evolution and Computation Research Unit, Department of Theoretical and Applied Linguistics, Edinburgh, Scotland, UK.
Magnus S. Magnusson, University of Iceland, Human Behavior Laboratory, Reykjavik, Iceland.
Jennifer A. Mather, University of Lethbridge, Department of Psychology and Neuroscience, Lethbridge, AB, Canada.
Ruth Garrett Millikan, University of Connecticut, Department of Philosophy, Storrs, CT, USA.
D. Kimbrough Oller, The University of Memphis, School of Audiology and Speech-Language Pathology, Memphis, TN, USA. Member of the Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria.
Donald H. Owings, University of California, Department of Psychology, Davis, CA, USA.
Irene M. Pepperberg, Massachusetts Institute of Technology, The Media Lab, Cambridge, MA, USA.
Chris Sinha, University of Portsmouth, Department of Psychology, Portsmouth, England.
Charles T. Snowdon, University of Wisconsin, Department of Psychology, Madison, WI, USA.
Luc Steels, Vrije Universiteit Brussel, Artificial Intelligence Laboratory, Brussels, Belgium, and Sony Computer Science Laboratories, Paris, France.
Debra M. Zeifman, Vassar College, Department of Psychology, Poughkeepsie, NY, USA.
I
INTRODUCTION
1
Theoretical and Methodological Tools for Comparison and Evolutionary Modeling of Communication Systems
D. Kimbrough Oller and Ulrike Griebel
The Need for Framework Development in the Study of Evolution of Communication Systems
The study of communication and its origins has an illustrious and varied history. The inspiration for the effort has often been focused upon the possibility of illuminating the evolution of language in humans (Condillac, 1756; Locke, 1690; Aarsleff, 1976). But in the last few decades, there has been remarkable growth of understanding regarding communication systems of other species (see, e.g., review in Hauser, 1996). A comparative enterprise now holds the promise of richly clarifying both the similarities and the differences between our own communicative capabilities and those of other species. To pursue the comparative enterprise effectively, a genuinely interdisciplinary effort is required. One of the primary needs is to formulate a lasting framework of properties and principles that can be understood across disciplines, and that can be used as a set of standards for comparison among species.
Hockett’s Model
Groundwork toward this goal was laid in the pioneering effort of Charles Hockett and his colleagues (Hockett, 1960; Hockett and Altmann, 1968). However, it is clear the framework of “design features” for communication systems that was provided by Hockett needs drastic revision. The definitions of features within the framework formed a shaky foundation for comparison, perhaps in part because their formulation preceded much of the fundamental work that has illustrated communicative capabilities of nonhumans. The features were formulated with an eye always directed toward human language, and ignored many aspects of communication that are now evident from work with nonhumans. The features were characterized as binary properties, even though the authors acknowledged that they would at some point have to be reformulated as dimensions with many potential values.
Binarity imposes unacceptable limits on description of both human and nonhuman systems of communication. Perhaps the most fundamental problem with the Hockett framework is that it was formulated as a flat list of design features. Hockett understood that the concepts embodied in the model were actually related in many ways, and that in some cases the design features seemed to presuppose each other. Such relations among the features could, it appears, after restructuring and reformulation, reveal a hierarchical system indicating paths of potential evolution for communication systems. Lower levels of the hierarchy could be
expected to occur early in evolution and to form the foundation for elaboration toward higher levels of the hierarchy. The flat list of design features stood in the way of this more evolutionarily rich potential characterization of communication systems. Ultimately, it has become clear that many of the features formulated by Hockett were simply ill-defined, yielding unnecessary and confusing overlap among features, lack of clarity regarding boundaries implied by the definitions, and a failure to account for hierarchical relationships among features. The inadequacies of the Hockett framework are illustrated with disturbing clarity by the fact that, based on a review of recent empirical evidence, comparisons within the framework have not proven to be unambiguously capable of discriminating human language from nonhuman primate communication systems. Nonhuman primates can be portrayed as illustrating all of the Hockett features, given the way they are defined (see Snowdon, chapter 8 in this volume). Not a single Hockett feature appears to be unique to human language. This is surely not what Hockett intended. And it does not lay the sort of foundation that is needed for evolutionary speculations about communication systems.
A New Look at Frameworks and Empirical Accomplishments in Evolution of Communication Systems
This volume is the product of a splendidly fruitful interchange among researchers from Europe and North America who are dedicated to addressing the evolution of communication and to the hope of contributing to a more lasting framework for description and comparison across species. Language in humans is clearly a topic of enormous interest to the contributors to this volume, but it is also clear that the authors have taken the burgeoning literature in animal communication extremely seriously. Some of them have made major contributions to that animal literature, and their empirical work is reflected here. In addition, the volume reflects interactions among philosophers of language, linguists, developmental psychologists, evolutionary biologists, and specialists in the development of new technologies for the study of evolution and communicative systems. The chapters are organized in parts, but there is considerable overlap among the goals and focuses of authors across the parts. The following overview is intended to prepare the reader for a journey along new paths in communicative evolution.
Philosophical Framework Needs
In Part II, “Philosophical Issues: Conceptions and Foundations,” three chapters address critical matters regarding the evolution of signals and symbols, and all of them seek to develop a model that will characterize steps of evolution, the sorts of logically necessary steps that could provide an important update to the modeling work of Hockett.
The three chapters have much in common in terms of the subject of interest, and all arrive at a generally similar conclusion: In primitive animal communication, two aspects of the representation (or the signal) are coupled. These two aspects are those which can be interpreted as referring (to entities or events, or to concepts about entities or events) and those which can be interpreted as influencing responses in the receiver. And for all three of the authors, the decoupling of these elements of communication is taken to constitute a major necessary step in the direction of creating more complex and powerful representational and communicative devices. Ruth Garrett Millikan’s contribution extends her previous “functional semantics” work on the essential character of representation and meaning, formulating the idea of the “pushmi-pullyu representation” (PPR), a primitive representational form where both “what is the case” and “what to do about it” are transmitted simultaneously. PPRs in this treatment are representations that are said to face in both directions at once, toward indication and responsive action, “in one undifferentiated breath.” Even reflexes, in this formulation, can be thought of as PPRs. Millikan then outlines possible steps toward complex and “articulate” communication (for example, in human language), where the indicative face of representation is no longer held fast to influences on the receiver. William F. Harms addresses a similar issue. He asks whether animal signals such as warning calls provide indicative information or commands to action. He points out that the answer might be “both or neither.” The two aspects of the signal are bound together, but their messages are not easily translated into human language, he argues. 
Such signals possess what he calls “primitive content,” and he draws attention to the power of “functional semantics” (referring to Millikan’s line of research) in making it possible to bring primitive content into the domain of what has traditionally been called meaning. He presents a six-layer scheme in which primitive content can be seen to emerge from even simpler nonmeaningful representations or acts. The inherent coupling of extension (reference) and intension (procedural specification) in primitive content is undone at the higher levels of evolution for meaningful acts that Harms outlines. D. Kimbrough Oller offers a sketch of an “infrastructural” alternative to Hockett. He outlines steps of a “natural logic” indicating how foundations can be laid in primitive communicative systems, and how elaborations can be evolved to higher levels of complexity. The hierarchical scheme offers description in terms of many layers of communicative “properties” (which can be taken to be roughly equivalent to Hockett’s design features) for both signal complexity (“infraphonology”) and value/functional complexity (“infrasemiotics”). Like Millikan and Harms, Oller emphasizes coupling in primitive systems, noting that signals are permanently coupled to their values (functions, meanings, etc.) in primitive communications (such as the fixed signals exemplified by monkey alarm calls or distress cries), but that more intelligent animals often learn new pairings of signals
and functions. He also illustrates another way of looking at the indicative/procedural coupling in primitive representations that Millikan and Harms discuss, interpreting the procedural side of representations in terms of Austin’s notion of illocutionary force (Austin, 1962). Not surprisingly, the issues of coupling and decoupling raised by the three chapters in part II are treated from additional perspectives later in the volume, especially in chapter 12, by Chris Sinha, and chapter 13, by Peter Gärdenfors. From a variety of disciplines, then, it is clear that new approaches are being developed to characterize the essential qualities of more primitive and more elaborate forms of communication such that lasting comparative work can be pursued.
Methodological and Theoretical Developments
The study of the evolution of communication systems is being enriched enormously by new developments in tools of investigation. In the areas of artificial intelligence, connectionist modeling, neural network development, and pattern detection, growth of technology is extraordinary. The efforts are laying the groundwork for broad new methods to test theoretical approaches to evolution of learning systems. The simplest such systems merely acquire the ability to detect patterns, but work is underway to test artificial systems that, it is hoped, will one day be capable of learning human language. The work on these new methodologies clearly is not just tool development, however. The endeavor has been driven in many instances by fundamental new theoretical assumptions and by the desire to create ways to illustrate and simulate their implications. Luc Steels presents a review of dynamic new developments in robotics and accompanying software that have provided a new foundation for the study of language.
The work, to which he personally has contributed substantially, implements a theoretical perspective on the learning of and evolution of language, a perspective that assumes self-organization plays a significant role. Social and cultural interrelations are prominent in his simulation approach, which offers tests of game-theory models implemented in robotic and computer simulations. The workshop that inspired this volume did not include a presentation in a particular area of interest to the participants: connectionist modeling. Consequently, Morten H. Christiansen and Rick Dale were invited to provide a contribution even though they had not been present at the workshop. The efforts they review represent an exciting field of computer-based connectionist simulation that is offering new perspectives on learning and its role in language and the evolution of communication. The authors contend that much of language evolution may have been shaped, in fact, by development. Self-organizing systems, acting within simulated social contexts, can acquire information and structure of remarkable complexity. By using computer simulation models of neural networks, the
authors review information and present two new simulations to support their view of the evolution of language. In the final methodological article, Magnus S. Magnusson outlines a pattern detection scheme that promises to provide a powerful new method for discovering the organization of temporally systematic information. In particular, the algorithms implemented in the Theme software make it possible to determine the existence of hidden patterns in sequential data, patterns that are quite obvious when located but that can remain entirely hidden without the aid of the software tool. The algorithm is capable of locating even infrequent patterns in reasonably sized data sets. And perhaps most important, it can locate patterns of hierarchical character. Magnusson argues for the application of this sort of technique to a wide variety of problems in the area of communication and language, as well as to fields such as DNA sequencing. These new methodological tools form part of a rapidly growing trend in the study of the evolution of communication systems. High technology is offering whole new approaches to study based upon simulations that can test existence hypotheses and compare various possibilities in game-theory and neural network modeling of evolutionary possibilities.
Animal Communication Systems
As suggested above, there has been rapid progress in the development of new information about animal communication systems in recent years. While the present volume can provide only a sampling of those developments, it is an intriguing sampling indeed. Charles T. Snowdon provides a fascinating glimpse into the world of nonhuman primates and the relations between vocal communications found there and in human language. His approach offers a perspective on what the early environmental conditions may have been that led to the hominid communicative explosion.
In particular, Snowdon points out that while apes and monkeys in the Old World tend to be relatively silent creatures, the New World is home to monkeys, such as tamarins and marmosets, that vocalize more frequently, that show more richness of development and learning in their vocal patterns, and that appear to transmit more information with the sounds they produce than do any of the Old World primates. A key reason, he suggests, is cooperative breeding, which is found in the New World animals to a much greater extent than in the Old World monkeys and apes. The New World primates that he is studying live in circumstances where engaging in rich communicative exchange is advantageous, because parents (and alloparents) engage in cooperative rearing and need to communicate about it. This, Snowdon suggests, may have been a critical factor that differentiated the early hominids from their ape cousins.
Donald H. Owings and Debra M. Zeifman take the ethologist’s point of view when they study the human infant, just as they do in the study of other species. In particular, the authors look at the human infant cry from the perspective of assessment/management theory, a framework developed by Owings and his colleague Eugene Morton (Owings and Morton, 1998). This is an insightful approach, because it avoids the temptation to create inappropriately anthropomorphic comparisons between humans and nonhumans. The human infant’s cry has much in common with the vocal communications of other primates, in fact much more in common than does speech, even the speech of little children. Owings and Zeifman indicate enlightening parallels with nonhuman communication in illustrating how the human infant “manages” and the parent “assesses” in the context of crying. Irene M. Pepperberg provides a perspective on the remarkable minds of the members of the African Grey parrot species. By utilizing a specially designed training technique based on “model/rival” observation by the learner, she has been able to illustrate that these parrots can learn aspects of language that deserve substantial scrutiny, especially by those who might presume that only mammals have rich learning capabilities and communicative prowess. From the perspective of evolution of communication systems, the remarkable accomplishments of the parrot trainees include substantial vocabularies of intelligible words that are used in semantically creative ways. Further, the parrots show the apparent ability to use learned words to transmit multiple illocutionary forces (at least “identification” and “request”), a pattern that suggests the sort of decoupling (of different aspects of communicative function) that is so much the focus of articles in part II of this volume. In the final chapter on animal communication, Jennifer A. 
Mather surveys work on the skin communication systems of the cephalopods, especially certain species of squid. These animals are able to create detailed patterns on their skin with extraordinary speed, flickering at rates that challenge the flicker-fusion rate of the human eye. Their skin patterns are used both for extremely effective camouflage and to communicate with conspecifics, especially in the domains of courtship and aggression. A general theory of evolution of communication systems will need to account for a broad range of modes and styles of communication, and the cephalopods offer an important expansion of our viewpoint about possible ways that communication systems can play out, because their system is visual, whereas the focus in most research in communication evolution is on acoustic systems. Taken together, the articles on animal communication put much of what the rest of the book considers in a concrete perspective.
Primitive Communication and Language
The remaining contributed chapters of the volume are dedicated to direct consideration of the relations between animal communication systems, beginning with very primitive ones,
and human language. The focus of the chapters is also upon the conditions (both ecological and physiological) that may be required for a linguistic system to emerge in evolution from a less complex communicative background. In the first of these chapters, Chris Sinha outlines a proposal in which the more primitive signal systems of nonhumans are contrasted with human language in terms of a distinction between “signals” and “symbols.” His view incorporates self-organizational principles and his own notion of “epigenetic naturalism.” He argues that environment is “constructive” in its relation to self-organization and learning. As in the chapters by Millikan, Harms, and Oller, Sinha focuses on the growth of higher-order communicative structures from the primitive “signal” background seen in much of animal communication, but his view offers suggestions, based on epigenesis, about how the process of “elaboration” to higher-order “symbolic” structures occurs. He notes that two emergent properties are the product of symbolic elaboration: reference and construal. The former term requires joint attention by sender and receiver, and the latter, a more elaborate form of representation advocated by Langacker (1987). Peter Gärdenfors refers to Sinha’s work and modeling as he develops the idea that human language may have depended upon a strong tendency in the hominid line to plan into the future. This tendency to look forward, and to cooperate in social groups that look forward, may have been critical, in Gärdenfors’s view, to the emergence of language. No other animal shows nearly the degree of planfulness that humans do, and this, he argues, is the crux of the language need and a critical foundation for it.
The chapter provides a review of relevant animal literature, and notes in particular that “cued” representations (where communications are grounded in the here and now), which appear to be common in nonhumans, are less powerful and elaborate (and less capable of supporting planful behavior) than “detached” representations (where communications can refer to the present, the past, the future, the absent, or the imaginary). He exemplifies his approach with an outline of the underpinnings for names, nouns, and adjectives. The chapter by R. I. M. Dunbar elaborates upon his widely cited hypothesis that the evolution of human language was dependent upon an increase in group size of ancient hominids in comparison with their ape relatives. As social groups became larger, means of maintaining bonds had to be extended beyond those available to other apes. Because grooming could no longer do the job (there was not enough time in the day to groom so many group members), another mechanism had to take over, and that mechanism was vocal in nature. The use of vocalization as a bonding and affiliative device jump-started language, according to Dunbar’s hypothesis. His chapter in this volume reviews that hypothesis and offers the suggestion that both music and laughter may have played crucial roles in the early steps of the process by which hominids came to be vocally different from their ape cousins. Both music and laughter may have offered key social bonding devices,
with physiological rewards to maintain them. He reviews preliminary empirical evidence supporting the idea that physiological effects may be involved. In W. Tecumseh Fitch’s chapter the idea that social conditions may have spawned human language is formulated in the context of the Hamiltonian idea of inclusive fitness. The idea is that hominid societies may have offered circumstances where “cheap honest communication” could be advantageous because kin selection was able to drive evolution in the highly social hominid environment. Kin selection offers, in Fitch’s view, an escape from “the evolutionary traps of constant Machiavellian deceit, or wasteful Zahavian handicaps.” In this way, his proposal dovetails with other proposals in this volume, in particular with Dunbar’s notion that change in group size may have created a special environment that was conducive to the effects of kin selection for vocal communication, with Snowdon’s idea that cooperative breeding may have played a special role in language emergence, and with Gärdenfors’s idea that cooperation for future planning was the driving force in the origin of human language. The last two contributed chapters in the volume constitute a debate. James R. Hurford, one of the workshop participants, provides a critique of the “mirror neuron” concept and its applicability to the evolution of language. He communicated with Michael A. Arbib, a primary author of the mirror neuron work, and their discussions resulted in the written interchange published here. Both contributions were peer reviewed, and then rewritten by the authors. Arbib was not a participant in the workshop, but the questions evaluated in the debate regarding possible neural underpinnings for language were too engaging to ignore, and so he was invited to offer his response to Hurford’s critique for back-to-back publication. The articles in the final part offer a sampling of directions that the study of language evolution is taking. 
Both from the standpoint of social conditions and from the standpoint of physiological requirements, we are clearly entering a new era in research on the origin of complex natural communication systems.
Acknowledgment
Our thanks go to the Konrad Lorenz Institute for Evolution and Cognition Research for support of the workshop upon which this volume is largely based, and to the Plough Foundation for support of much of the work that went into this chapter.
References
Aarsleff H (1976) An outline of language origins theory since the Renaissance. The Origins and Evolution of Language. Ann NY Acad Sci 280: 4–13.
Austin JL (1962) How to Do Things with Words. London: Oxford University Press.
Condillac EB de (1756) An Essay on the Origin of Human Knowledge; Being a Supplement to Mr. Locke’s Essay on the Human Understanding. London: J. Nourse (translation of Essai sur l’Origine des Connaissances Humaines).
Hauser M (1996) The Evolution of Communication. Cambridge, Mass.: MIT Press.
Hockett C (1960) Logical considerations in the study of animal communication. In: Animal Sounds and Communication (Lanyon WE, Tavolga WN, eds.), 392–430. Washington, D.C.: American Institute of Biological Sciences.
Hockett CF, Altmann SA (1968) A note on design features. In: Animal Communication: Techniques of Study and Results of Research (Sebeok TA, ed.). Bloomington: Indiana University Press.
Langacker RW (1987) Foundations of Cognitive Grammar, vol. 1, Theoretical Prerequisites. Stanford, Calif.: Stanford University Press.
Locke J (1690) An Essay Concerning Human Understanding. Reprinted 1965, London: Dent.
Owings DH, Morton ES (1998) Animal Vocal Communication: A New Approach. Cambridge: Cambridge University Press.
II
PHILOSOPHICAL ISSUES: CONCEPTIONS AND FOUNDATIONS
2
On Reading Signs: Some Differences Between Us and the Others
Ruth Garrett Millikan
If there are certain kinds of signs that an animal cannot learn to interpret, that might be for any of a number of reasons. It might be, first, because the animal cannot discriminate the signs from one another. For example, although human babies learn to discriminate human speech sounds according to the phonological structures of their native languages very easily, it may be that few, if any, other animals are capable of fully grasping the phonological structures of human languages. If an animal cannot learn to interpret certain signs, it might be, second, because the decoding is too difficult for it. It could be, for example, that some animals are incapable of decoding signs that exhibit syntactic embedding, or signs that are spread out over time as opposed to over space. Problems of these various kinds might be solved by using another sign system: gestures rather than noises, or visual icons laid out in spatial order, or separating out embedded propositions and presenting each separately. But a more interesting reason that an animal might be incapable of understanding a sign would be that it lacked mental representations of the necessary kind. It might be incapable of representing mentally what the sign conveys. When discussing what signs animals can understand or might learn to understand, one question it may be important to have in mind concerns what kinds of mental representations these animals are likely to possess. To this end, a fairly explicit theory of mental representation, and of its various types, would be needed. In this chapter I am going, very quickly, to sketch a general theory of mental representation and of its most basic varieties. I will suggest some ways in which the most sophisticated kinds of mental representations that humans use seem to differ from those used by other animals. 
Mental representations are a species of what I will call “intentional signs.” Intentional signs must be distinguished, first, from “natural signs.” “Natural signs,” in general philosophical usage and in the usage of pragmatics, are signs that are not designed to be used as signs, and hence are not conventional (e.g., Augustine, 1986; Ockham, 1938, p. 19) and not voluntary (e.g., Ockham, 1938; Kant, 1978). Because they are not designed for use as signs, it makes no sense to attribute truth or falsehood to natural signs. Smoke means fire only when it has actually been caused by fire. Black clouds mean rain only when they actually produce rain. Red spots mean measles only when caused by measles. Natural signs themselves cannot be deceitful or wrong, though it is of course possible for an interpreter to make mistakes when trying to read them. In my usage, natural signs contrast with “intentional signs,” which, following Franz Brentano’s technical usage of the term “intentionality,” are signs that can be false or that may sometimes signify nothing real. By intentional signs I mean those that have been “designed,” in accordance with human or animal purposes, or by learning mechanisms, or
by natural selection, to be interpreted according to predetermined (semantic) rules to which targeted interpreters are cooperatively adjusted. Thus it is possible for intentional signs to be false or misleading. To remind us that this usage of “intentional” is technical, and to be sure that the intentional, in this sense, does not become mixed in our minds with the very different notion expressed by the philosopher’s terms “intension,” “intensional,” and “intensionality,” I will sometimes capitalize the T, thus: “intenTion,” “intenTional,” “intenTionality.” One kind of intenTional sign is a mental representation, as will become clearer below. In my terminology, “mental representation” does not imply consciousness. I am not going to talk about what is before or within an animal’s conscious mind. Mental representations have to do with the mechanics of behavior control and how this control is accomplished—presumably, neurologically. A place to start is with Gallistel’s usage of the term “representation,” which, as he says, is derived from the mathematical sense: “The brain is said to represent an aspect of the environment when there is a functioning isomorphism between some aspect of the environment and a brain process that adapts the animal’s behavior to it,” and later, “The exploitation of the correspondence to solve problems in the one domain using operations belonging to the other establishes a functional isomorphism: an isomorphism in which the capacity of one system to represent another is put to use” (1990, pp. 15–16). Gallistel seems to have in mind the classic twentieth-century view that putting a representation to use involves calculation, but this image is unnecessarily restrictive. Think, instead, of tracing a line with your finger, the visual representation of the line guiding the motion of your hand to conform to the contours of the line. 
This seems, anyway, to involve something more like translation (in the physicists’ sense) than calculation.1 Also, Gallistel probably has in mind a more restricted notion of a functioning isomorphism than I intend. Just as, strictly speaking, nothing is a quantity and zero is a number, so, strictly speaking, a sign system that maps times onto themselves and/or maps places onto themselves, as in simple signaling systems, exploits isomorphisms. The isomorphisms are what allow these simple systems to exhibit productivity. For example, a warning cry to conspecifics that tells when a predator has been sighted is a member of a potentially infinite set of such signals, each telling of a predator at a different time and/or in a different place. The set of possible signals is isomorphic to its corresponding set of possible signifieds. It is not the case, of course, that any intrinsic property of the cry is isomorphic to any intrinsic property of the predator. Similarly, no intrinsic property of the dot on the map that indicates the village of Storrs is isomorphic to any intrinsic property of Storrs. More accurately, any such isomorphisms are inoperative, nonfunctioning, within this system of representation. Every isomorphism other than the isomorphism that maps a domain onto itself by the identity function involves some arbitrary correspondences.
Gallistel is clear that what makes a brain state into a representation of some aspect of the environment is not just an isomorphism but an isomorphism that is used to adapt behaviors to this aspect. A mental representation is of whatever it is designed to be used as a representation of. This is the same as saying that mental representations are intenTional signs or intenTional representations. Perceptual or cognitive systems that produce intentional representations have been selected for producing representations to be used by targeted interpreters. If the representations they produced were never used as representations, these mechanisms could not have been selected for producing representations according to rules to which targeted interpreters are adjusted. They might be very efficient at producing natural signs, but natural signs are not intenTional representations. Gallistel’s description covers only representations of what is the case, that is, only “indicative” representations. It does not cover representations of what is to be done, or “imperative representations.” For example, it does not cover explicit intentions or goal representations. Paraphrasing Gallistel, I will say that the brain represents something to be done by the organism when there is a specific kind of isomorphism between a brain process and some aspect of the environment (or of the organism–environment relation) that this process functions to produce. The use or purpose of the brain process is to guide behavior so as to produce what it represents. Again, notice that being a mental representation depends on there being uses for it. Brain–environment correlations and covariations that are mere side effects of proper functioning (and there undoubtedly are many) do not count. Mental representations, then, can be used either to reflect states of affairs or to produce them. That representations can face in either of these directions is not news. Classic statements are in Anscombe (1957) and Searle (1983). 
What has not been generally recognized is that many representations face both ways at once. The principle is easiest to grasp in the case of simple external representations used for communication between nonhuman conspecifics. Does the dance of the honeybee tell where the nectar is, or does it tell worker bees where to go? Clearly, it does both. The genes for producing and responding to these dances have been selected because they result in dances that map nectar locations and also because they result in worker bees’ being guided to those locations. Similarly, alarm calls of the various species do not just represent present danger but also are signs directing conspecifics to run or to take cover. If beavers did not dive in response to the danger splashes of their conspecifics, the disposition to splash when sensing danger surely would not have been selected for. These calls and signals are intenTional signs or representations that are at once descriptive and directive. What, then, occurs in the head of a bee who understands another bee’s dance? Does the bee come to believe there is nectar at location L, desire to collect nectar, know that to collect nectar at L requires going to L, hence desire to go to L, and hence, no other desires
being stronger at the moment, decide to go to L, and proceed accordingly? Surely not. To posit anything more complicated than, as it were, a literal translation of the dance into bee mentalese is surely superfluous. The comprehending bee merely acquires an inner representation that is at the same time a picture, as it were, of the location of nectar (relative to its hive) and that guides the bee’s direction of flight. The very same representation tells in one breath both what is the case and what to do about it. I call representations having this sort of double aspect “pushmi-pullyu” representations (or PPRs), after Hugh Lofting’s charming two-headed, Janus-faced creature by that name. (For more details see Millikan, 1996.) J. J. Gibson (1966) and E. J. Gibson (1977) claimed that the direct objects of perception are affordances. That is, what an animal directly perceives is places to climb up on, things to sit on, places to hide, things to eat or to run from, and so forth. One way to understand this is that the natural signs in the ambient energy read via perception are translated directly into mental representations that face in two directions at once. They tell what is located in what regions nearby, and at the same time guide appropriate responses to this information. Contemporary Gibsonians postulate “perception-action” cycles whereby structures of ambient energy impinging on the organism and carrying information about the distal environment, added to information about the current configuration of the organism’s body, are directly translated into structured action that takes account of both these factors, directly producing behaviors that will be productive given these factors. But by “directly” they don’t, of course, mean without mediation by the nervous system. So this is tantamount to postulating basic perceptual representations as being PPRs. 
It seems clear that many primitive animal behaviors, even our own most primitive human behaviors, are controlled in this way. This becomes transparently clear if we remain strict in our mathematical reading of “isomorphism” in the definition of representation, recognizing time and place as significant variables in representations. Examples are everywhere. The neural signal that triggers the protective eye blink reflex is technically a PPR. It represents that something is approaching the eye too closely right here, right now, and gives the instruction to close the eye right here, right now. It does this by mapping time and place of the approach of the object onto time and place of the neural signal, and in turn onto time and place of the blink. Similarly, the neural signals that mediate between those environmental signals that are “behavior releasers” and the behaviors called “fixed action patterns” (Lorenz and Tinbergen, 1939; Tinbergen, 1951; McFarland, 1981, pp. 1990ff.; Gould, 1982) that are thereby released in many animals are PPRs. Very simple internal mechanisms that control tropistic behaviors in primitive animals employ PPRs. And if the Gibsonians are at least partly right, many more flexible behaviors, such as grasping, chasing, climbing, and so forth, may fit this pattern as well. One possibility is that the simplest animals—at the level of
insects, for example—may be governed almost entirely by a set of perception–action cycles arranged in a hierarchy that determines which shall take precedence over which, depending on need, or when more than one currently relevant affordance is perceived. Some animals may be pure pushmi-pullyu animals. Notice that for a PPR to serve as an unmediated guide to immediate action, its indicative face has to represent the relation of the affording situation or object to the perceiving animal. An animal’s action has to be initiated from the animal’s own location. So in order to act, the animal has to take account of how the things to be acted on are related to itself, not just how they are related to one another. In the simplest cases, the relevant relation may consist merely in the affording situation’s occurring in roughly the same location and at the same time as the animal’s perception and consequent action. More typically, it will include a more specific relation to an affording object, such as a spatial relation, or a size relative to the animal’s size, or a weight relative to the animal’s weight or strength, and so forth. That the indicative faces of PPRs have to show relevant relations of the affording situation or object to the animal does not make what the PPRs represent in any way “subjective,” however. PPRs must give objective information about perceiver-world relations. That the PPR must represent a relation or relations between the affording situation or object and the animal itself does not imply, however, that the PPR expresses a self-concept or contains an independent element or aspect that refers to the animal itself. Reference to the acting animal itself is not an articulated part of a PPR. To see this, we must look again at what is involved in the functioning or “significant” isomorphism between a representation and what it represents. 
A representation is, as such, a member of a representational system defined by an isomorphism between the domain of the signs and the domain of the signifieds. There will be certain significant mathematical transformations of any sign in the system that will yield other signs in the system, these transformations corresponding in a regular way to transformations of the states of affairs that would be signified. That is what defines the functioning or significant isomorphism and makes a sign system productive. This isomorphism will be defined by certain entirely definite relations among the signs in the system that correspond to definite relations among the correlative signifieds. What the signs say explicitly or articulately will be only what these relations show. For example, the commonest kind of bee dance contrasts significantly with other bee dances only along three dimensions. (See, for example, Gould and Gould, 1988.) One dimension shows direction of nectar location relative to the hive and the sun. A second dimension shows the rough distance of the nectar from the hive. A third dimension says when this is so, namely, at roughly the same time as the dance. There is no way to transform a bee dance so that it talks about peanut butter rather than nectar, or about the moon
rather than the sun, or about the tall oak tree rather than the hive, or about something that was the case last week. Exactly similarly, there is no way to transform the dance so that it tells not that the watching worker bees should fly off in a certain direction, but that just Susy bee should or Sally bee should or, of course, that the wasps should. Nor can we suppose that the mental representational system into which the bee translates a bee dance would allow it to think, alternatively, about the relation of nectar to the tall oak tree or of peanut butter to the moon last week, or that Susy bee, rather than she herself, should fly off in a certain direction. What use would a worker bee have for any such representations? Or consider the beaver splash. Its articulation is even more restricted than that of the bee dance, having only two dimensions of contrast: time and place. Nor can I see much motivation for supposing that the beaver understands the splash by translating it into a mental representation in a system also allowing representations of danger next week or of peace and quiet last week, or of what bears or other beavers should do now. For the same reason, the PPR that tells the relation of the perceiving animal to the affording object and directs the animal’s action toward that object does not need to represent the animal itself explicitly. There need not be transformations of it that would represent, instead, the relation of objects other than the perceiving animal to the affording object. Similarly, in the simplest cases, there is no reason to suppose that the affording situation or object represented by a PPR is explicitly represented. Consider any behavior triggered by an environmental releaser, such as the feeding behavior of the songbird triggered by the sight of the red inside of the open beak of its young. 
This behavior is mediated by a mental PPR whose indicative content is that a hungry baby of mine is right here and at this time needy and ready to receive food, and whose imperative content is the directive at this time drop food into this baby’s mouth—or something of that sort. But none of that complicated content is articulated, of course. This PPR need not contrast, for example, with any PPR that says anything about any non-babies-of-mine or about actions other than dropping food. Indeed, the bird may have, as we might say, “no idea” what it is doing, as we would conceive what it is doing. Similarly, it would surprise me if the beaver tail splash gave rise to anything we would consider “thought” at all in the beavers that hear it. But it doesn’t follow that the splash is not representational. The very simplest of inner representations, then, seem to have the following three characteristics, none of which appear to characterize the kinds of inner representations humans typically communicate using sentences. First, these representations tell in one undifferentiated breath both what the case is and what to do about it. Second, they represent the relation of the representing animal itself to whatever else they also represent. Third, they tend to be highly inarticulate, the representational systems in which they occur being devoted to highly specific tasks, so that very few contrasts in possible content are needed or possible.
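The bee dance case discussed above lends itself to a small illustrative sketch. The Python below is only a toy model under stated assumptions, not a claim about bee neurobiology or about how the dance is actually decoded: the class and function names, the use of a sun azimuth in degrees, and the scale factor METERS_PER_WAGGLE_SECOND are all hypothetical, introduced purely for illustration. It tries to show two things: that a transformation on the sign (rotating the dance) corresponds to a regular transformation on the signified (rotating the nectar bearing), and that one and the same reading can serve as both an indicative and a directive.

```python
from dataclasses import dataclass

# Hypothetical scale factor, chosen only for illustration; not an empirical value.
METERS_PER_WAGGLE_SECOND = 1000.0


@dataclass(frozen=True)
class Dance:
    """A sign that varies along only three dimensions of contrast."""
    angle_deg: float       # orientation of the dance (hypothetical encoding)
    waggle_seconds: float  # duration, standing in for rough distance
    time: float            # when the dance is performed


@dataclass(frozen=True)
class NectarState:
    """The signified: nectar at a bearing and distance from the hive, at a time."""
    bearing_deg: float
    distance_m: float
    time: float


def interpret(dance: Dance, sun_azimuth_deg: float) -> NectarState:
    """Map a sign onto its signified by one fixed rule (the assumed isomorphism)."""
    return NectarState(
        bearing_deg=(sun_azimuth_deg + dance.angle_deg) % 360.0,
        distance_m=dance.waggle_seconds * METERS_PER_WAGGLE_SECOND,
        time=dance.time,
    )


def rotate_dance(dance: Dance, delta_deg: float) -> Dance:
    """A transformation on signs that yields another sign in the system."""
    return Dance(dance.angle_deg + delta_deg, dance.waggle_seconds, dance.time)


def rotate_state(state: NectarState, delta_deg: float) -> NectarState:
    """The corresponding transformation on signifieds."""
    return NectarState(
        (state.bearing_deg + delta_deg) % 360.0, state.distance_m, state.time
    )


def read_as_ppr(dance: Dance, sun_azimuth_deg: float):
    """Return the two faces of a single reading: what is the case, and what to do."""
    state = interpret(dance, sun_azimuth_deg)
    indicative = state                                       # nectar is there, now
    directive = ("fly", state.bearing_deg, state.distance_m)  # go there, now
    return indicative, directive
```

Nothing in this toy system can be transformed so as to mention peanut butter, the moon, or a particular bee: the only contrasts available are angle, duration, and time. That is the sense in which such a representation is productive and yet highly inarticulate.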
The tendency for systems of inner representation to be devoted to highly specific tasks in most animals is evident from studies of animal learning. On this point, it may be sufficient to quote the Princeton ethologist James Gould on what he terms the “rigidly programmed plasticity” (Gould 1982, p. 268) characteristic of most animals:
. . . learning is adaptively programmed so that specific context, recognized by an animal’s neural circuitry on the basis of one or more specific cues, trigger specific learning programs. The programs themselves are constrained to a particular critical period, . . . and to a particular subset of possible cues. Nothing is left to chance, yet all the behavioral flexibility which learning makes possible is preserved. (Gould, 1982, p. 274)
Learning, even in higher vertebrates, seems less a general quality of intelligence and more a specific, goal-oriented tool of instinct. Bouts of learning such as food avoidance conditioning, imprinting, song learning, and so on, are specialized so as to focus on specific cues—releasers—during well-defined critical periods in particular contexts. Releasers trigger and direct the learning, and in general the learned material is thereafter used to replace the releaser in directing behavior. As a result animals know what in their busy and confusing world to learn and when, and what to do with the information once it has been acquired. Most learning, then, is as innate and preordained as the most rigid piece of instinctive behavior. (Gould 1982, p. 276)
In this preordained way, many animals learn either by trial and error or from conspecifics what to eat and what not to eat; some learn from others which local species are their predators; the European red squirrel laboriously learns how to open, specifically, hazelnuts; the oystercatcher laboriously learns to open oysters; and the chimp laboriously learns to open nuts by using a rock and an anvil. Speaking generally, what animals are capable of learning, hence, it is reasonable to suppose, what they are capable of developing representational systems to support, tends to be closely tied to specific skills or at least specific ends found to be useful in the past history of the animal’s species. Count this as a fourth typical characteristic of many inner representations in nonhuman animals. In contrast to these reflections on typical inner representations in other animals, we humans are capable, first, of having many beliefs that we know of no practical uses for. And we can have many explicit desires and goals that we do not know how to implement because we lack the relevant information. Clearly, indicative and imperative mental representations can occur quite independently in us, obliging us to use practical inference to join them together again in the production of action. We definitely are not pure pushmi-pullyu animals. Second, we are capable of having beliefs about things and affairs that are very distant from us and about things whose spatial and temporal relations to us we have no knowledge of at all. Our indicative representations do not, in general, represent relations of situations and objects to us. Third, on the assumption that beliefs and desires can be at least as articulate as sentences used to express them, they must have considerable inner articulation, allowing contrasts in at least subject and verb phrase, and often in direct and indirect object, prepositional phrases, and so forth. Fourth, humans seem to be capable
of learning many skills and of learning about many kinds of affairs that neither we ourselves nor our species has previously had any use for, and developing the necessary representations accordingly. We appear to be able to harbor many representations that are not dedicated to any particular practical purpose, but that instead remain quite uncommitted. Now I think there is no question but that we humans also use many forms of representation in perception, below the perceptual level, and so forth, that are PPRs; that show relations of ourselves to affording objects; that are inarticulate or, like food aversions, that are learned according to built-in triggers. And I think there is no question that many animals harbor inner representations that are not just PPRs; that show relations of objects to one another rather than merely to the animal; or that are somewhat articulate; and so forth. My point is merely that in exploring the question of what kinds of signs a particular animal might be capable of learning to interpret, we should explicitly take into consideration whatever we can discover about the kinds of inner representations the animal is capable of employing. For it seems clear that to comprehend a sign with a certain force, content, and articulation, the animal must be able to match it with an inner representation having similar force, content, and articulation. What, then, are the steps from, beginning at the bottom rung, the sort of inarticulate pushmi-pullyu comprehension the bee has and the dim sort of pushmi-pullyu comprehension that mediates responses to behavior releasers, to articulate, well-differentiated, and uncommitted human beliefs and desires at the top? Well, of course, I don’t really know, but here are a few speculations. 
One thing that apparently occurred with the evolutionary development of the forebrain is that much incoming perceptual information became divided into two somewhat independent channels, a dorsal channel that yields representations, for example, of direction, distance, angle, location, and size of objects relative to the perceiving organism, and a ventral channel that yields representations of objective or nonrelative shape, size, color, texture, and so forth, used for determining what object or objective kind of object is being perceived. (For a review, see, for example, Norman, 2003.) The capacity to represent objective, non-observer-relative properties of objects as distinguished from the effects these properties are currently having on the perceiver requires the development of what are called “perceptual constancies,” such as the ability to recognize the same size at various distances, the same shape at various angles, the same originating sound through various kinds of interfering noise, the same color under various lighting conditions, and so forth.2 This is not the ability to make anything like subject-predicate judgments, of course, but merely to represent observer-independent properties, certain configurations of which are then recognized as indicating certain objects or kinds of objects. Consider, for example, a connectionist net that has learned to recognize seven faces from any of various angles and at various distances, but which, if given a new face
to learn, has as hard a time as it did with the first (indeed, harder, because of interference). Suppose instead that it had somehow learned to recognize same-shape-again quite generally. If it could do that, the next face might be learned in one trial. Representations exist and show significant articulation only insofar as they are used as representations, and insofar as the contrasts corresponding to these articulations matter for these purposes. The other side of the division between the two kinds of representations must be the development of two kinds of uses for these representations. On the one hand, the animal develops general skills in navigating among and manipulating objects-in-general, skills that might be applied to any object whose shape, size, orientation, distance, and so forth, relative to the organism, are perceived. On the other hand, the animal develops the capacity to recognize various specific objects and specific kinds of objects, each from a variety of distances and perspectives, and through a variety of intervening media and different sensory modalities. These are objects and kinds of objects suitable to certain purposes, such as chasing, fleeing from, eating, nest building, and so forth, but that must be navigated among or manipulated in order to be used. Thus the animal perceives via the ventral system which kinds of objects to run from, which to approach, which to pick up, which to eat, which to climb up on, and so forth, while it perceives via the dorsal system the relations to itself of these objects, which relations must be taken into account to guide its movements with respect to the objects. How would this development affect the four aspects of mental representation mentioned above? First, representations that result from the achievement of perceptual constancies (representations of objective shape, color, size, and so forth) would seem to be intrinsically uncommitted representations. 
There is unlikely to be anything relevant to an animal’s immediate activities that follows from the presence, for example, of objective sizes or colors or shapes simply as such. Representations of these properties, say, in early vision, have no one particular use but any of many possible uses, depending on what kind of situation or object in the environment they help to identify. An indicative representation that is not dedicated to any particular use but has many uses is still a representation only because it has uses, but it is not a PPR. Or at least it is moving away from being a PPR. If it has an open-ended set of uses, as in the case of an animal that can learn to identify many new kinds of objects for use by first representing their properties, it certainly is not a PPR, but has a purely indicative character. Second, there seems no reason to suppose, on the other hand, that the separation of representations of objective properties from representations of relations of these objects to the perceiving animal would result, just as such, in replacement of pushmi-pullyu representations with independent indicatives and imperatives. Rather, the immediate result would seem to be the replacement of inarticulate PPRs with articulated ones that explicitly represent what objects are where (or otherwise significantly related to the animal), and
thus immediately guide the animal’s activity. PPRs of this kind would represent the kind of affordances Gibson had in mind when he said that apples are perceived as affording eating, mailboxes as affording letter posting, and so forth. Consider, for example, a cat frightened by an approaching dog. The dog affords (requires) escaping from, which can be done only if the direction of approach of the dog is part of what is represented in perception. The direction of the dog, combined with the direction of something perceived as affording cover, directs taking cover or hiding—rather than, for example, running (on some other occasion) to something perceived as climbable. Thus, although on a deep level, the animal now harbors some purely indicative representations, there is no reason to suppose that it harbors any purely imperative representations. Third, the articulate nature of the PPRs that results from the dorsal/ventral separation allows the decomposition of undifferentiated skills into subskills that may be learned or practiced within certain contexts and then recombined in new situations. The capacity to recognize a certain kind of affording object can be developed in some contexts but then reapplied in others. Likewise, the ability to manipulate or alter relations to objects can be developed or practiced in some contexts and then reapplied in others. Much playing in mammals seems to be devoted to developing such general skills. Fourth, the capacity to recognize and represent objects articulately as differentiated from their relations to the perceiving subject might naturally be applied to the learning of new and different affordances connected with those very same objects. For example, if the dog is good at recognizing its master as an afforder (if approached in the right way) of food, this ability can be put to good use in learning to recognize and then approach its master in the right way to get let outside. 
But if the same object in the same relation to the animal affords the animal different things on different occasions, it begins to look as if a purely indicative representation of the object bearing a certain relation to the animal may be emerging: Master is in such and such spatial relation to me. Here we should go slowly, however, for two reasons. The first is that the completed representation of most affordances may be considered as involving perception of the animal’s state of need or appetite as well. The more careful statement of the affordance the animal perceives will then be in terms of satisfaction of that need or appetite. The dog perceives a hunger satisfaction affordance or an exercise-need satisfaction affordance, and so forth. Lifting completely out of the domain in which pushmi-pullyu representations reign may not be so easily achieved. The second reason is that even though recognition of the same object or kind of object may in some cases be involved in the animal’s recognition of more than one kind of affordance, limitations on what the animal is (as Gould put it above) preordained to learn may be very strict indeed. Thus the animal’s perceptions of most situations and objects are likely to remain dedicated to quite specific kinds of tasks, the nature of which has been
dictated by past history of the species or, to some degree in more flexible animals, past history of the individual. Similarly, it is unlikely that an animal would learn to recognize any object or kind of object for which neither it nor its species has yet found any practical uses. Notice, last, that the separation of ventral and dorsal channels for perception of objects and their relations to perceivers has no tendency to free inner representations from representing only objects as currently related to their perceivers. Thus far we have no account, for example, of how an animal might come to represent objects distant from it in time or space without also representing its current relation to that distant time or place, or without these relations being immediately germane to current action. A step in that direction is in fact taken, however, by many animals—indeed, perhaps even by some that are relatively simple—as follows. Many animals apparently construct and use something like mental maps of the locales in which they live. Among these, perhaps, is the honeybee.3 If you trap a honeybee and release it in a locale with which it is familiar but from which the hive cannot be seen, it will fly up a bit, circle around as if to identify its current location, then fly off in a beeline for home. A number of things are very interesting about this development. First, these maps are not just representations of the relations of other objects to the perceiver. Relations of that sort keep changing, so there would be no obvious point in recording them for future use. Rather, these maps apparently represent relations of other objects, of various places, to one another. A second interesting thing is that unlike perceptual representations, these maps are constructed gradually over time and stored away for future use. 
As such they appear to be purely factual representations of what some part of the world is like, apart from any particular projects the organism currently has in progress. But perhaps the most interesting thing is that for any such representation to be used, it will have to be combined with or temporarily joined to a current perceptual representation that represents the animal’s current location and orientation within the domain mapped. Joining two representations in this manner to yield a representation of which way to go—that is, to yield a PPR—looks a lot like mediate practical inference. Indeed, there is even a middle term. The same location has to be represented twice: once in its relation to other things not currently perceived, and once in relation to the perceiver as where the perceiver is now. Do bees, then, actually make inferences? Perhaps so. Or perhaps the phenomenon is more parallel to the way a connectionist net may be able to fill in the rest of a configuration on which it has been well trained via Hebbian learning when presented with only a portion of that configuration. In either case, we should not let ourselves be carried away. That an animal can join one kind of representation with another or complete a partial representation for some specific kind of purpose does not make the animal rational. You are
able to join visual representations from one eye with those from the other, using the overlap as a middle term, and thus derive representations of depth, but that is not what makes you rational. Similarly, that an animal can collect and later use one kind of purely factual information, information about the space it lives in, has no implications for whether it can represent any other detached facts. That it collects and remembers information about local spaces depends on the fact that this kind of information has, often enough, been used during evolutionary history—used, indeed, in specific ways. Similarly, many species of birds can remember hundreds and even thousands of caching places in which they have left food for future use. It does not follow that they are capable of collecting and remembering any other kinds of facts. Nor does it follow that they can use knowledge of these facts for any purpose other than finding food when they are hungry. It is likely that the representations of fact that these animals collect are entirely dedicated to very specific uses. They are to be used for completing PPRs of very special predetermined kinds. Parallel to the way in which animals collect specified kinds of factual information for predetermined uses, they may also collect certain kinds of skills out of the context of serious use. Young mammals, in particular, do a lot of playing. But once again, the things that they play at are always closely related to future uses. Animal play develops not arbitrary skills but skills for which the species has historically had uses. Now it is true that through rigorous and careful, step-by-step training by humans, individuals of many higher species can laboriously be brought to recognize perceptual affordances of kinds quite remote from any they were specifically designed to learn. 
They have some capacity to recombine their abilities to learn to recognize objects and to remember successful perception-induced response sequences so as to produce behavior patterns of kinds fairly remote from any anticipated in the histories of their species. There are three things that I strongly suspect they are not able to do, however, or to do at all well, but that humans seem to do quite easily. The first is to represent pure facts that concern situations or objects of a sort that have not yet proved to be of use either to the animal or to prior members of its species. The second is to represent facts about world affairs that have entirely unknown relations to the animal. The third is to be motivated by representations that do not originate from the animal’s perception of its current needs and/or current environment. Concerning the third, notice that the motivating representations we have been discussing are all PPRs. Typically, the indicative faces of these PPRs represent facts about the animal’s current needs, coupled with facts that concern its immediate environment—as joined, perhaps, to some stored knowledge of the relation of the immediately present part of the environment to the wider environment which helps to fill out the animal’s perception of its current relation to more distant affording situations, places, or objects. Even our
most respected and intensively studied relatives, the monkeys and apes, seem to derive their motivation entirely from perception of the current situation. Thus, for example, Merlin Donald summarizes the literature on signing in apes: “. . . the ‘meaning’ of an ASL sign to an ape is simply the episodic representation of the events in which it has been rewarded . . .” (1991, p. 154) and “The use of signing in apes is restricted to situations in which the eliciting stimulus and the reward are clearly specified and present, or at least very close” (p. 152). No dog, I suspect, or even chimp, wonders where its next meal is coming from unless it is already hungry, nor does it wonder how it will cope next winter. Of course, appropriate migrating behaviors are elicited, in certain species, by natural signs that current food sources are running out, or by natural signs correlated with the imminent approach of winter. The indicative facets of the PPRs that are responses to these natural signs indeed do, though quite inarticulately, concern the future. These PPRs will produce appropriate behaviors only in the event that these future events are indeed imminent. What this shows, however, is only that animals are sometimes capable of perceiving the future, things temporally distal, just as they are capable of perceiving things spatially distal. Similarly, you must perceive the future in order to position yourself to catch a ball now in midair. It does not follow that you, or the animal, has left the level of PPRs. But some human mental representations seem to be free both from the yoke of historical usefulness and from the necessity of representing relations to self. And some motivating representations seem to be free of the bonds of currently perceived affordances. Unlike other animals, we represent and remember thousands of facts of kinds for which neither we nor our ancestors have yet found practical uses. 
The nonfiction sections of libraries are repositories, largely, for immense collections of such facts. We are able to interpret natural signs and also linguistic signs of world affairs that are distant from us both in time and in place. We think about both the past and the distant future. We interpret signs of distant affairs and remember these facts even when we have no idea what relations these affairs bear to us. I know, for example, that gerbils come from the desert, but I have no idea what desert, or what use my knowledge of this fact about gerbils might have. Humans are adept at learning to interpret new kinds of signs, not just human language signs: at learning how to read meters and scopes and information filtered through a multitude of other instruments. Apart from us, perhaps only apes can learn to interpret even visual information reflected from a mirror, and then only for guiding current activity. We notice and remember not just what we can cause, or what causes something we want, but what causes what, quite out of context. We also spend huge amounts of energy and time developing skills, both physical and intellectual, for which neither we nor our ancestors knew any practical uses. We practice bouncing balls, juggling, manipulating Rubik’s Cubes, riding skateboards, cracking our
knuckles, wiggling our ears, blowing bubbles, whistling through our teeth, spinning around to make ourselves dizzy (children often love this), and so forth and so forth. Similarly, we collect dreams of things we would like to do or have done, places we would like to go, things we would like to have or to be able to build, without having any notion how to fulfill these dreams. Certainly these dreams are not currently perceived affordances. Nor are they representations of currently perceived needs. In short, we appear to be compulsive collectors of all kinds of junk! Looking at the evolution of these strange capacities and behaviors, it is clear, of course, that although many or even most of them may never find uses, the general disposition to collect junk does find uses. If you have enough storage space and a good enough retrieval system, some pieces of that junk may well come in handy sometime, though there was perhaps no way to tell in advance which pieces. But it is not just that we have bigger storage barns than do neighboring species, bigger brains, although that may be part of it. What we really are alone in having, I suspect, is what Dennett (1996) likes to call “Popperian” minds. We have the capacity and disposition to play games in our minds, entirely divorced from current perception, tinkering with the collected junk to see what might be built out of it that would be useful or help fulfill otherwise empty dreams. We do trials and make errors in our heads. We learn in our heads. It is because we can do this that we can represent desires and goals of kinds that neither we nor our species have ever realized. These desires are imperative representations designed for a job: to become fulfilled someday by means of lucky tinkering. It is because dreams and desires of this kind are sometimes fulfilled that our cognitive mechanisms have been designed to produce them. Indeed, this is what makes them be (intenTional) representations. 
Without this they would have no biological uses, and hence could not be representations at all.

What, exactly, is the lesson, then? If an animal lacks the capacity to form mental representations having certain kinds of content, obviously it cannot learn to understand signs, either conventional or natural, that carry those contents. But perhaps most of what we humans convey with signs is of a kind that, for animals without Popperian powers, would be utterly useless for them to represent. And for an animal to represent what it can have no use for representing is actually a contradiction in terms.4

Notes

1. For a discussion of the difference between inference and translation, see Millikan (2004, chap. 9).
2. On perceptual constancies, see any elementary textbook on general psychology or perception and cognition.
3. See, for example, Gould (1986) and Gould and Gould (1988). For dissent, see Wehner and Menzel (1990) and Dyer (1996).
4. For a more detailed discussion of many of the matters addressed in this chapter, see Millikan (2004).
References

Anscombe GEM (1957) Intention. Ithaca, N.Y.: Cornell University Press.
Augustine, Saint (1986) On Christian Doctrine (Robertson DW, Jr., trans.), bk. II. New York: Macmillan. (Written c. 427.)
Dennett DC (1996) Kinds of Minds. New York: Basic Books.
Donald M (1991) Origins of the Modern Mind. Cambridge, Mass.: Harvard University Press.
Dretske F (1981) Knowledge and the Flow of Information. Cambridge, Mass.: MIT Press.
Dretske F (1991) Explaining Behavior. Cambridge, Mass.: MIT Press.
Dyer FC (1996) Spatial memory and navigation by honeybees on the scale of the foraging range. J Exper Bio 199: 147–154.
Gallistel CR (1990) The Organization of Learning. Cambridge, Mass.: MIT Press.
Gibson JJ (1966) The Senses Considered as Perceptual Systems. Boston: Houghton Mifflin.
Gibson JJ (1977) The theory of affordances. In: Perceiving, Acting, and Knowing (Shaw RE, Bransford J, eds.). Hillsdale, N.J.: Lawrence Erlbaum.
Gould J (1982) Ethology: The Mechanisms and Evolution of Behavior. New York: Norton.
Gould J (1986) The locale map of honeybees: Do insects have cognitive maps? Science 232: 861–863.
Gould J, Gould C (1988) The Honeybee. New York: Scientific American Library.
Grice P (1957) Meaning. Phil Rev 66: 377–388.
Kant I (1978) Anthropology from a Pragmatic Point of View (Dowdell VL, trans., Rudnick H., ed.), bk. I, sec. 29. Carbondale: Southern Illinois University Press.
Lorenz K, Tinbergen N (1939) Taxis und Instinkthandlung in der Eirollbewegung der Graugans. Tierpsychologie 2: 1–29.
McFarland D (ed.). (1981) The Oxford Companion to Animal Behavior. Oxford and New York: Oxford University Press.
Millikan RG (1996) Pushmi-pullyu representations. In: Philosophical Perspectives, vol. 9 (Tomberlin J, ed.), 185–200. Atascadero, Calif.: Ridgeview. Reprinted in Mind and Morals (May L, Friedman M, eds.), 145–161. Cambridge, Mass.: MIT Press.
Millikan RG (2004) Varieties of Meaning: The Jean Nicod Lectures 2002. Cambridge, Mass.: MIT Press.
Norman J (2003) Two visual systems and two theories of perception: An attempt to reconcile the constructivist and ecological approaches. Behav Brain Sci 24 (6): 73–144.
Ockham, William of (1938) Super quattuor libros Sententiarium subtilissimae earumdenque Decisiones (Tornay S, trans.), II, qu. 25. In: Ockham: Studies and Selections, by Tornay. La Salle: Open Court, 1938. (Written 1495).
Searle J (1983) Intentionality. Cambridge: Cambridge University Press.
Tinbergen N (1951) The Study of Instinct. Oxford: Clarendon Press.
Wehner R, Menzel R (1990a) Do insects have cognitive maps? Ann Rev Neurosci 13: 403–414.
Wehner R, Menzel R (1990b) Insect navigation: Use of maps or Ariadne’s thread? Ethol Ecol Evol 2: 27–48.
3
Primitive Content, Translation, and the Emergence of Meaning in Animal Communication
William F. Harms

Historically, most philosophers of the Western tradition have regarded human beings as being dramatically different from “mere” animals, particularly with regard to our mental and social lives—representation and communication. On this, both ancient Greek thought and Christianity have concurred. Since Darwin, there has been considerable shift in overt opinions on this question, especially given the continual secularization of academe. Nonetheless, the set of theoretical tools, concepts, and presumptions we bring to bear on the newly necessary comparative analysis of humans as animals echoes the old views, making it difficult to proceed. For instance, considerable progress has been made since the beginning of the 20th century toward developing mathematically rigorous models of the structure of language, meaning, and logic. These models have served well as the focus of systematic inquiry into the source and nature of meaning (a rather different question than that of its structure). However, current logic is a flawed tool for comparative analysis of human and animal communication because it begins by taking the most peculiar form of signaling behavior as the paradigm, and assessing all others in terms of their semblance to it. What the comparative analysis of human and animal communication requires is a basic vocabulary which is more informed by the historical emergence of meaning, from which the intricacies of human language and thought can be seen to emerge in their turn. I will suggest that recent developments in functional semantics provide an appropriate vocabulary for talking about meaning in both animals and humans. Certain cherished distinctions survive in modified form. Others are shown to be refinements peculiar to humans and our mental life. In particular, concepts akin to Frege’s (1980) “sense” and “reference” are necessary for a sign to have fully conventional meaning, and the emergence of these relations can be seen in clear steps. 
On the other hand, the atomistic assumption that the word is the basic unit of meaning, and that truth and falsity arise when meaningful words are combined according to rules of sentence construction, will be seen to be an anthropocentric projection onto the broader field of animal communication. The truth of representation, as it turns out, is prior to the reference of parts of the representation, at least as systems of representation emerge and evolve. In what follows, I will begin with a sketch of relevant aspects of the standard view of meaning as expressed in current formal logic and semantics. I will then present a game-theoretic model of the simplest kind of meaning—“primitive content”—and show how it can help us with two important tasks. The first is understanding precisely why we cannot typically translate animal communications, and how we can nonetheless specify the meaning of those communications. The second is establishing a clear sequence of stages
by which fully conventional meaning first emerges, and then is elaborated in the direction of the complexities evident in human thought and language.

The Standard View

Intentionality

To a large extent, the reason that language and mind remain part of the subject matter of philosophy (rather than being the exclusive province of linguistics and psychology) is what Brentano (1973) termed the “intentionality” of thoughts and linguistic representations— the way in which they stand for or are about something else. Intentional relationships have remained a philosophical puzzle because they do not appear to be like any of the physical relationships that material science trades in. For instance, what a word (“zebra”) stands for is only one of very many causal factors (zebras) contributing to the occurrence of the word. Picking out which contributing cause is the word’s reference has proved to be a surprisingly daunting task (Dretske, 1981). In Grice’s (1989) terms, physical processes tend to give us the “natural” meaning of signs. What we need is an account of conventional meaning. The formal models of reference which accompany formal logic simply stipulate the intentional “maps” between terms and objects (or sets of objects). These models do not provide an account of how these maps arise in the real world. Thus, the edifice of formal logic floats on a fundamental mystery—the source and nature of intentionality.

Truth and Reference

The standard formal semantics as given by Tarski (1956) has it that words (terms) refer to objects or classes of objects, and that the properties of truth and falsity pertain only to sentences of the indicative mood, such as statements of fact. Abstracting away from the myriad complexities of human language, predicate logic provides a simple two-part syntax for indicative sentences or “propositions.” A predicate, which refers to a class, is conjoined with a subject, which refers to an object or class of objects. 
The proposition is true exactly when the reference of the subject is a member or subset of the reference of the predicate. On Tarski’s scheme, the sentence “Snow is white” is true if and only if the reference of “snow” is a subset of the reference of “is white,” the latter being things which are white. The point here is that on this pervasive model, truth is not a property of the most basic units of meaning (words). Truth appears as a compound relationship which arises only when referring words are combined in a subject–predicate relationship. As a result, if animals do not communicate in words, their utterances cannot be true or false.
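The subset reading of truth described above can be made concrete in a few lines. What follows is only a toy sketch under stated assumptions: the object names, the `REFERENCE` table, and the `is_true` helper are invented for illustration and are not part of Tarski's formal apparatus.

```python
# Toy extensional model. Every name and object below is an illustrative
# assumption, not something taken from Tarski's own formalism.
REFERENCE = {
    "snow": frozenset({"snowflake_1", "snowdrift_2"}),
    "coal": frozenset({"coal_lump_1"}),
    "is white": frozenset({"snowflake_1", "snowdrift_2", "swan_1"}),
}


def is_true(subject: str, predicate: str) -> bool:
    """A subject-predicate sentence is true exactly when the reference of
    the subject is a subset of the reference of the predicate."""
    return REFERENCE[subject] <= REFERENCE[predicate]


print(is_true("snow", "is white"))  # True
print(is_true("coal", "is white"))  # False
```

Note that `is_true` takes two arguments: on this model a single term such as "snow" has a reference but nothing that could count as a truth value, which is the feature of the standard view at issue in what follows.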
Sense and Reference

Since Frege’s pioneering work (1980) we have understood that the meaning or content of each meaningful linguistic entity is the combination of two sorts of conventional or rule-governed relationships. Frege made this point with the following example. For centuries, sailors had navigated by reference to the morning star as well as the evening star, without ever realizing that they were one and the same as the planet Venus. In Frege’s terms, “morning star” and “evening star” share reference (Bedeutung), but clearly do not mean exactly the same thing. They differ in what he called their “sense” (Sinn), which is not a matter of external reference, but of the relations of the terms to other representations. For instance, the morning star can be seen in the morning, which is not true of the evening star, and so on. So generally, the intentional properties of any conventionally meaningful sign include both (1) relations regarding externally mapped object(s) or what a representation “corresponds” to and (2) relations regarding internally mapped representations or events, logical consequences, definitional implications, and the like. We have substantial vocabularies for both sorts of relationships. The former are variously referred to as “correspondence,” “reference,” “truth conditions,” or “extensional” rules or conventions. The latter include “intension,” “definition,” “connotation,” “implication,” “entailment,” and so on, depending on which part of human language we are talking about. Since there is, oddly, no standard pair of terms for these two sorts of conventions in general, we will find it convenient to distinguish (external) correspondence conventions from (internal) procedural conventions. For instance, class designators like “zebra” or predicates like “has spots” correspond (refer) to the classes which they designate—in this case, the classes of zebras and things with spots. 
The procedural conventions (sense) for words are a complex matter, but roughly, they are determined by the definitions of words and the concepts they express. So part of the procedural or interpretive meaning of “zebra” might be “mammalian quadruped with stripes, resembling a horse but with a disposition making it unsuitable for domestication.” Part of the procedural meaning of “has spots” might be something that has a visible pattern consisting of at least two colors, and that the spots have a certain size, spacing, and range of shapes. Roughly, what a term corresponds to is what it “points to” externally, and its procedural meaning is what follows from it within the representational system. The correspondence meanings of complete sentences are the states of affairs that make them true, or their “truth conditions.” Simple subject-predicate sentences attribute properties to objects, such as “That zebra has spots.” Again, the sentence is true exactly when the object named in the subject (that zebra) is in fact a member of the class designated by
the predicate (things with spots). So, as it is commonly put, “That zebra has spots” is true if and only if that zebra has spots. This is correct as far as it goes, but one might reasonably complain that it doesn’t go very far. The procedural rules for complete sentences are determined by the inferential relations within the language in which they occur. In some cases, these may specify those other representations which would “justify” the representation in question. In other cases, procedural relations involve the meaning-based or logical consequences of the representation, including those which depend conditionally on other representations. In the case of the propositional attitudes such as “hoping that” or “fearing that,” the attitude directed toward the proposition also determines what should follow from it in terms of further thoughts, feelings, or behavior—again, part of the full procedural meaning of the total cognitive state. Modes of presentation of propositions such as the imperative and the indicative are also a matter of procedure rather than of correspondence. Correspondence and procedural rules are not conjoined by necessity, as is evident in Frege’s evening star/morning star example. “Evening star” and “morning star” correspond to the same object—the planet Venus—but differ (procedurally) in their connotations. Exactly how they are conjoined is, however, one of the central mysteries of intentionality. Animal signals such as warning cries often appear meaningful, yet applying the standard view to their analysis makes them seem either more confusing than human sentences, or else “mere” vocalizations. The somewhat overworked example is the cry a vervet sentry gives upon seeing a leopard, which elicits a distinctive evasive maneuver. Does it mean “There is a leopard,” or is it a sort of command, like “Run up a tree”? The answer would seem to be both and neither. 
The conventional aspect of the signal seems to share with the statement (“There is a leopard”) its truth conditions, and with the command (“Run up a tree”) its normative consequences. As such, attempts to translate animal signals into any human sentence run afoul of the fact that human sentences, or at least those understood within the propositional paradigm, may indicate facts or command actions, but never both. This may make it seem as though animals not only do not name objects, but they do not state or command in the way we do. This makes it seem less and less likely that animal signals are meaningful according to anything but the weakest analogy with human utterances. Not all philosophers have been convinced that the dominant formal treatment of language gets to the heart of matters of meaning. Of particular relevance here, Austin (1962) pointed out how infrequently human communication consists simply of indicative statements. In his terms, every locutionary act or actual statement is accompanied by an illocutionary act (the intended effect) and a perlocutionary act (the actual effect). Austin’s threefold distinction has been quite influential in the analysis of human communication
because it encourages us to see the many ways in which human beings communicate with language opportunistically and creatively. However, Austin’s problematization of the standard view, as much as it helps in making sense of human communication, does so by adding complexity to the standard view rather than by simplifying it. Illocutionary acts, situational intentions, require both locutions (propositions) and the ability to have intentions regarding the reactions of others. Thus, Austin’s framework requires more representational complexity than the standard view. What we require for the purposes of comparing human and animal communication is models with less complexity. Consequently, if Austin’s distinction is to be relevant to animal communication, we would need some reason to think that animals were capable of something like indicative representation to start with. I will discuss that question at greater length later on.

The Lewis-Skyrms Meaning Game

One of the more interesting developments in philosophical semantics since the 1980s is the convergence of two rather different approaches to the theory of meaning on the same basic model for the emergence of meaning in adapted signaling systems. This model, which confounds the expectations of most philosophers in grounding a correspondence theory of meaning/truth, at the same time provides the tools required for bringing genuine semantic analysis into the field of animal communication. The two approaches are Ruth Millikan’s (1984, 1993) functional analysis and Brian Skyrms’s (1996) game-theoretic approach. We will focus on Skyrms’s approach here, since Millikan develops her views elsewhere in this volume. The primary item of interest here is the kind of content characteristic of simple adapted signaling systems in both Skyrms’s and Millikan’s models. Skyrms begins with a signaling game borrowed from philosopher David Lewis’s (1969) Convention. 
Lewis was investigating whether or not coordinating conventions might supply the fallible standards required for a theory of meaning. In his two-player pure coordination game, at each round one player is the sender and the other is the receiver. The sender perceives the state of the world and sends a signal to the receiver. The receiver chooses an action on the basis of the signal, and if it is appropriate for the state of the world, both sender and receiver get one point; otherwise, nothing. The interesting part is that the sender has to choose which signal to send in which state, and the receiver must decide what state the signal indicates. Moreover, the players cannot communicate outside the confines of the game. In the simplest case, the world has two states, T1 and T2. There are two acts, A1 and A2, and we stipulate that A1 gets the payoff in T1, and A2 in T2. There are also two messages, M1 and M2, which are not associated with either act or state. (Think of them as
blue flags and red flags, if you like.) The game itself can be made as much more complex as you like, with more states (which change in ordered ways), messages, and acts, more players, and more complicated payoffs simulating costs for senders. (Note that there don’t need to be equal numbers of states, messages, and acts.) Logically, there are only four sender strategies, S1–S4, and four receiver strategies, R1–R4, as follows.

S1: If T1, send M1; if T2, send M2.
S2: If T1, send M2; if T2, send M1. (opposite of S1)
S3: Send M1 always.
S4: Send M2 always.

R1: Receive M1, perform A1; receive M2, perform A2. (coordinates with S1)
R2: Receive M1, perform A2; receive M2, perform A1. (opposite of R1)
R3: Always perform A1.
R4: Always perform A2.

If each player has an equal chance at each round of being sender or receiver, then each player needs both a sender strategy and a receiver strategy. There are a total of 16 combined strategies from S1/R1 to S4/R4. Strategies S1/R1 and S2/R2 are distinguished as being capable of generating the full payoff by generating reliable coordination of receiver’s actions with the world states, termed by Lewis to be fully qualified “signaling systems.” All others either involve one of the nonresponsive substrategies (S3, S4, R3, R4), or else combine the responsive strategies S1 and S2 in uncoordinated ways, resulting in the “antisignaling strategies” S1/R2 and S2/R1, where the receiver always manages to do the wrong thing. This game has three equilibriums, one of which is the unstable combination of S1/R2 and S2/R1 (these two strategies do well against each other). The other two are Nash equilibriums of all S1/R1 and all S2/R2. If both players are playing one of the signaling system strategies, no advantage can be gained by any deviation. Games of this sort are important because they are the most precise tools we have to analyze the creation of conventions, and the meaning of signs is a matter of convention. 
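The payoff claims just made are small enough to check by brute force. The following is a minimal sketch, not Lewis's or Skyrms's own code. It assumes that the two states are equally likely (the text does not say) and that roles are assigned by a fair coin; the function names `success` and `payoff` are illustrative.

```python
from itertools import product

STATES = ("T1", "T2")
RIGHT_ACT = {"T1": "A1", "T2": "A2"}  # A1 pays off in T1, A2 in T2

SENDERS = {  # state -> message
    "S1": {"T1": "M1", "T2": "M2"},
    "S2": {"T1": "M2", "T2": "M1"},
    "S3": {"T1": "M1", "T2": "M1"},
    "S4": {"T1": "M2", "T2": "M2"},
}
RECEIVERS = {  # message -> act
    "R1": {"M1": "A1", "M2": "A2"},
    "R2": {"M1": "A2", "M2": "A1"},
    "R3": {"M1": "A1", "M2": "A1"},
    "R4": {"M1": "A2", "M2": "A2"},
}
STRATEGIES = list(product(SENDERS, RECEIVERS))  # the 16 combined strategies


def success(sender, receiver):
    """Chance that the receiver picks the right act (assumption: states equiprobable)."""
    hits = sum(RECEIVERS[receiver][SENDERS[sender][t]] == RIGHT_ACT[t] for t in STATES)
    return hits / len(STATES)


def payoff(a, b):
    """Expected payoff to combined strategy a when paired with b.
    Each player is sender half the time; both players share the point."""
    return 0.5 * success(a[0], b[1]) + 0.5 * success(b[0], a[1])


# Strategies that earn the full payoff when the whole population plays them.
full = [s for s in STRATEGIES if payoff(s, s) == 1.0]
print(full)  # [('S1', 'R1'), ('S2', 'R2')]

# Against either signaling system, every deviant strategy does strictly worse.
print(all(payoff(d, s) < payoff(s, s) for s in full for d in STRATEGIES if d != s))  # True

# The antisignaling strategies fail against themselves but succeed against each other.
print(payoff(("S1", "R2"), ("S1", "R2")))  # 0.0
print(payoff(("S1", "R2"), ("S2", "R1")))  # 1.0
```

Because the payoff is shared, `payoff(a, b)` equals `payoff(b, a)`: the game is one of pure coordination, which is why no player can gain by deviating from an established signaling system.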
The critical point here is that in the game state dominated by S1/R1, signal M1 has come to correspond to T1, the latter being very like the truth conditions of sentences, while M2 is “true” when T2. Similarly, it is plausible to say that M1 has acquired A1 as its conventionally specified procedural consequence. When strategy S2/R2 dominates, M1 is true when T2 and “implies” A2, while M2 is true when T1 and “implies” A1. Thus, which usage conventions obtain is determined by the equilibrium that play has brought the game to, and in order for a signal to function in a coordinating rule, it must acquire both kinds
Primitive Content
of conventions. The standards for correspondence and procedure are fallible in the sense that the signal still means what it does if the system accidentally deviates from its equilibrium. Lewis had intended the game to be one for ideally rational agents, showing how usage conventions could emerge from common knowledge of rationality of the other player and of the payoff structure. It does show how the meaning of communicated signals can emerge and stabilize in the absence of other communication about what the signals mean. On the other hand, rational choice games presuppose that the players are already having meaningful thoughts about each other. Thus, it cannot explain the meaningful communication in creatures lacking a model of mind. Skyrms (1996) was largely concerned to show that mindless evolutionary processes can often solve decision problems that idealized rational agents cannot. In Skyrms’s model, instead of strategies being chosen by agents with common knowledge of each other’s ideal rationality, strategies were inherited in a population under selection, relative reproductive rates being determined by success in the game. The result of numerous Monte Carlo computer simulations was that the system invariably equilibrates, with the entire population possessing one or the other of the two signaling system strategies. The good news for the study of animal communication is that the equilibrium states in which the signals acquire determinate meaning can be acquired even in the absence of learning, much less a model of mind. In order for a signal to have an objective conventional meaning, it is not necessary for the individual sending the signal to intend that it have a meaning, nor for the individual to intend that the receiver respond in some particular way. (Such a requirement makes it rather hard to see how meaning could ever get off the ground in the first place.) 
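The kind of simulation described here can be sketched in a few lines. What follows is a minimal illustration, not Skyrms's own code: it assumes discrete-time replicator dynamics over the 16 combined strategies, equally likely states, equal chances of playing each role, and randomly drawn starting frequencies. All function names are hypothetical.

```python
import random
from itertools import product

# States, messages and acts are coded 0/1; act i is the correct act in state i.
# A sender rule is (message in T1, message in T2).
# A receiver rule is (act on M1, act on M2).
SENDERS = [(0, 1), (1, 0), (0, 0), (1, 1)]    # S1..S4
RECEIVERS = [(0, 1), (1, 0), (0, 0), (1, 1)]  # R1..R4
STRATEGIES = list(product(range(4), range(4)))  # 16 combined strategies
N = len(STRATEGIES)


def success(s, r):
    """Share of the two (equiprobable) states in which the act is correct."""
    return sum(RECEIVERS[r][SENDERS[s][state]] == state for state in (0, 1)) / 2


def payoff(a, b):
    """Expected payoff to strategy a against b, roles assigned 50/50."""
    (sa, ra), (sb, rb) = a, b
    return 0.5 * success(sa, rb) + 0.5 * success(sb, ra)


PAYOFF = [[payoff(a, b) for b in STRATEGIES] for a in STRATEGIES]


def mean_fitness(freqs):
    """Average payoff in a randomly mixing population."""
    return sum(freqs[i] * PAYOFF[i][j] * freqs[j] for i in range(N) for j in range(N))


def replicator_step(freqs):
    """One generation: each strategy grows in proportion to its relative payoff."""
    fitness = [sum(PAYOFF[i][j] * freqs[j] for j in range(N)) for i in range(N)]
    average = sum(f * x for f, x in zip(fitness, freqs))
    return [x * f / average for x, f in zip(freqs, fitness)]


def simulate(seed, generations=2000):
    """Run the dynamics from random starting frequencies (a Monte Carlo draw)."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in STRATEGIES]
    total = sum(weights)
    freqs = [w / total for w in weights]
    for _ in range(generations):
        freqs = replicator_step(freqs)
    return freqs


def dominant(freqs):
    """Label and frequency of the most common strategy."""
    i = max(range(N), key=freqs.__getitem__)
    s, r = STRATEGIES[i]
    return f"S{s + 1}/R{r + 1}", freqs[i]


if __name__ == "__main__":
    for seed in range(5):
        print(seed, dominant(simulate(seed)))
```

Running the script prints, for each random start, the most common strategy and its share of the population after the chosen number of generations. Whether every run ends at S1/R1 or S2/R2, as Skyrms reports for his simulations, depends on the modeling assumptions stated above and has not been checked against his results.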
Rather, the meaning of simple signals depends on the stabilization of usage conventions, and these emerge from functional historical productivity of a set of signaling conventions. As a consequence, even the most instinctive, automatic, or reflexive production of and response to signals can imbue the mediating signal with meaning, as long as the production and response mechanisms are coadapted to coordinate their behavior on the basis of a potentially arbitrary signal. We shall return presently to the question of why meaning requires coadaptation. But what kind of meaning do such signals acquire?
Primitive Content
It should not be surprising to discover that the culprit in the inability to theoretically accommodate the meaning of animal communications is a certain anthropomorphism in our theory of meaning: the presumption of word reference as prior to the meaning of
state indicators. Still, we cannot solve the problem by abandoning all of our presumptions about meaning, for to do so would be to abandon meaning entirely. For instance, we can avoid anthropomorphism in the usual way by restricting ourselves to purely causal analyses of animal communications. But meaning is lost, since meaning is not a causal relation but requires externally identifiable conventions regarding correct occurrence and interpretation. The solution to this dilemma is rejecting the presumption of propositional structure and combinatorial syntax as a prerequisite for meaning while retaining the Fregean idea that meaning is the conjunction of correspondence and procedural rules. In so doing, the essentially conventional and two-part character of meaning is preserved while highly specific features derived from the detailed analysis of the communication of one species—human beings—are rejected. The Lewis-Skyrms game helps us here by giving us a picture of what meaning is like in systems whose signals do not express propositions. Leaving aside the details of human representation, the external meaning of a signal is what it refers to or denotes or what makes it true—what it corresponds to outside of the system in which it occurs. What makes this mapping noncausal is that even if the signal occurs in the absence of the state which makes it true, it still has that state as its (correspondence) meaning. The occurrence of the signal in the absence of the state constitutes a kind of failure of system, one which is typically called “falsehood.” In the case of the vervet’s leopard cry, we are inclined to say that it is false when no leopard is present, since it is obviously supposed to occur when leopards are in fact present. Procedural rules in propositional language typically involve other representations or concepts. The nonanthropomorphic question to ask is What does the signal mean in terms of what follows from it? 
When is the signal interpreted correctly, and when incorrectly? In the Lewis-Skyrms game, the correct interpretation of a signal is not another signal or an inference, but an action. In the vervet’s signaling system, the correct interpretation of the leopard cry is running up a tree—again, an action rather than another signal, concept, or representation. If standards for correspondence and procedures emerge as a matter of adapted function, then the simplest possible sort of content would attach to the functionally simplest signaling system—that with a single signal, resulting in a single response, coping with a single state of affairs. This kind of content, schematized in the Lewis-Skyrms game but evident in animal warning cries and, indeed, in internal control signals (as well as in simple control mechanisms like thermostats) should be taken as biologically primitive. Signals like this possess convention both for tracking external states and for direct motivation of behavior. Millikan (chapter 2 in this volume) calls these “pushmi-pullyu” representations, or PPRs, and concurs in taking them as semantic primitives.
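The two-part structure of primitive content can be written down as a small record. The sketch below is only an illustration: the class PrimitiveSignal and its field names are hypothetical, and simple string matching stands in for the adapted standards that, on the account given here, are fixed by selective history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveSignal:
    """A monolithic tracking-and-motivating signal (hypothetical illustration)."""

    name: str
    corresponds_to: str  # correspondence rule: the state the signal should track
    prescribes: str      # procedural rule: the action that counts as correct uptake

    def is_true(self, actual_state: str) -> bool:
        """The signal is 'false' when it occurs without its state."""
        return actual_state == self.corresponds_to

    def is_correctly_interpreted(self, action_taken: str) -> bool:
        """Correct interpretation is an action, not a further representation."""
        return action_taken == self.prescribes


leopard_cry = PrimitiveSignal(
    name="leopard cry",
    corresponds_to="leopard present",
    prescribes="run up a tree",
)

if __name__ == "__main__":
    print(leopard_cry.is_true("no leopard present"))               # a false alarm
    print(leopard_cry.is_correctly_interpreted("run up a tree"))   # correct uptake
```

The design point is that a single signal carries both fields at once; neither standard changes when a particular cry happens to be a false alarm or is wrongly responded to.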
The reason for choosing these adapted tracking-and-motivating signals as the point at which meaning has properly emerged is as follows. We imagine the typical evolutionary sequence for a control system to begin without any meaning or representation, with merely a number of local disturbances which are correlated with functionally relevant states. We can call these “unexploited natural signs.” The first stage is that some organism (or organ) develops a discriminating response to some disturbance. Such exploited natural signs are often referred to as “indexical” signals, or as “natural” meaning in ethnographic literature, though the term “indexical” has a rather different meaning in philosophy. Second, the presence of the adapted/discriminating response allows selection for adapted production of the disturbance/signal, either to exploit or to enhance the functional response. Only when coadapted production occurs to enhance the response process does the signal conventionally coordinate cooperative behavior, for only then is it possible for the signal to be replaced by one which is fully conventional. The reason for insisting that this is the point at which meaning emerges is that the adapted signal has acquired conventions governing both aspects of meaning (correspondence rules for the state that is tracked, procedural rules for the normative causal responses to the signal), and these conventions are the same for both sender and receiver. This is primitive content.
Capturing Primitive Content in Propositional Language
Much of animal communication consists of signals which exhibit primitive content, and even where signals demonstrate more elaborate context-sensitive structure and function, primitive content serves as an appropriate benchmark. It also serves as a useful tool for understanding the difficulty of translating animal communications.
Consider: If we try to translate the vervet’s cry as “There is a leopard here,” we are immediately aware of the inadequacy of that translation. The problem, or at least the theoretical problem, is not that we may not have correctly captured the correspondence meaning of the cry, for our language is sufficiently powerful and flexible that we can try again: “There is a hungry leopard here.” “There is a hungry female leopard within 100 yards.” Our own language, due to its elaborate syntax and large, open-ended vocabulary, possesses extraordinary referential flexibility. Perhaps we cannot exactly duplicate the correspondence of the leopard cry in some long sentence, but we can come very, very close (at least if we know a lot about when the signal is supposed to occur). Rather, the problem is that even if we duplicate the correspondence meaning of the signal exactly, we have completely missed its procedural component. On the one hand, our sentence “There is a leopard here etc.” is loaded with implications (e.g., “there is a mammal here”) that the vervet’s cry does not share, since those procedural consequences are a feature of human
language. On the other hand, our own semantically rich sentence nonetheless does not convey the procedural implication of the vervet’s cry: that one should be running up a tree now. The point is that the problem of translation is not that the vervets are somehow talking about a different world than we are. To be sure, they categorize states of their world via their own behavioral options, but the fine-grained discrimination that our own language possesses makes it possible to come arbitrarily close to duplicating that categorization referentially. What we cannot do is, in the same (noncompound) sentence, duplicate the procedural implications of the warning cry. This should not surprise us if correspondence and procedure are conjoined by function, as the Lewis-Skyrms game indicates. The moral is that you can translate only signals between systems with very similar functions. This is why you can translate (well enough) between German and English, but not between vervet and English. Fortunately, even though we can’t translate across species boundaries, it is still possible to capture the full meaning of animal signals in human language. Again, the extreme referential flexibility of our own language comes into play. Just as we can specify the correspondence meaning of the vervet’s signal, we also can referentially specify the procedural meaning as well. In fact, this is just what we are inclined to do. Such a cry is supposed to correspond to a leopard being near, and means, procedurally, to run up a tree. Include the fact that these two aspects of meaning are conjoined in the same monolithic signal, and you have said all that could have been said with a translation. This model of primitive content makes it clear why we can’t translate animal meaning, but that we can nonetheless capture that meaning completely—just not in a single signal or sentence. 
This takes a bit of self-awareness about our own language: that it is perhaps unmatched in referential discrimination but, despite this descriptive power, its translational capabilities are limited. This limitation is not distinctive of powerful language, but rather of the functional nature of meaning.
How Signaling Becomes Language: A Model
A naturalist theory of meaning such as the one discussed in this chapter is a quintessentially interdisciplinary enterprise, in that it must address the very old philosophical puzzles regarding truth, reference, and translation while providing a framework within which the comparative empirical study of communication systems can proceed. Though only gradually gaining broad acceptance, the teleofunctional theory of meaning as pioneered by Millikan (1984) has shown itself to be surprisingly, and perhaps uniquely, capable of addressing the philosophical problems. We shall see that it can do equally well in providing a framework for comparative empirical work.
The major conceptual reorientation required from the philosophical enterprise is twofold. First, we must no longer assume that the word is the basic unit of meaning and that truth and falsity emerge only when simple referring symbols are combined to form complete representations. Rather, something very like warning cries (Millikan’s PPRs) must be taken to be semantically basic, and increased semantic complexity arises by subdividing the function of signals exhibiting primitive content. The process of subdivision of the basic function of signals will become important in structuring the framework for comparative analysis (see table 3.1). Second, meaning derives from historical patterns of successful coordination and comes in as many varieties as there are functionally distinct signaling systems, rather than being an abstract universal (i.e., the proposition) which somehow attaches to signaling systems with sufficient complexity. From the scientific point of view, functional semantics entails not so much a reorientation as an expansion of the field of analysis. The outsider’s impression is that most of the literature devoted to the evolutionary analysis of signaling behavior tends to focus on the competitive economics of the exploitation of perception (Zahavi and Zahavi, 1997) rather than on the kind of predominantly cooperative signaling illustrated by the Lewis-Skyrms game and by Millikan’s approach. In particular, a great deal of attention has been directed toward the question of what sorts of mechanisms can enforce signal reliability in the presence of competitive interests. However, we shall see shortly that meaning, properly speaking, is fundamentally a cooperative phenomenon which requires some solution to the typical instability of cooperative behavior (Sober and Wilson, 1998). 
At this point, the most useful thing to do for the joint empirical and philosophical enterprise is to sketch the relationship between primitive content and common paradigms of communication. This sequence of development has been touched on above in the context of establishing primitive content as the point at which meaning properly emerges. Let us consider it in more detail. Table 3.1 provides a summary of the basic stages.
Stage 0: The analysis of the emergence of any phenomenon should begin with its absence. In the case of meaning, we understand the world to be a web of causal processes which leave exploitable information-bearing traces in their wake. In the absence of any perception, it is difficult to see in what sense these traces are meaningful. Presumably, bear tracks mean that a bear has been by only to a perceiver.
Stage 1: The emergence of meaning begins properly with the development of perception. In the most minimal sense, we understand perception to be an adapted response to some external stimulus which can be used to affect subsequent behavior. In simple perception there are emergent interpretation conventions on the part of the receiver. (Think of simple forms of predator detection.) However, natural signs do not have fully conventional meaning because, at this stage, what the sign corresponds to is ambiguous since production and perception are not coadapted.
Table 3.1 Stages in the emergence of meaning

Stage: 0 Causal Trace
Examples: Chemical diffusion, impact traces, light reflection
Semantic properties: None
Adaptations: None

Stage: 1 Perception
Examples: Olfaction, touch, sight
Semantic properties: Natural meaning, emergent correspondence, and procedural rules
Adaptations: Receiver only

Stage: 2 Primitive Content
Examples: Warning cries, signals mediating approach and avoidance reflexes, tracking-and-motivating PPRs (note 1)
Semantic properties: Emergent nonnatural meaning; unambiguous correspondence and procedural conventions
Adaptations: Sender and receiver coadaptation (Cooperative)

Stage: 2a Exploited Perception
Examples: Sexual signaling, predator/prey signals, feeding competition
Semantic properties: Ambiguous correspondence and procedural conventions
Adaptations: Sender and receiver coadaptation (Competitive)

Stage: 3 Pure Indication
Examples: Foraging control systems, rational choice, belief/desire architecture
Semantic properties: Multiple interdependent semantic subsystems
Adaptations: Sender and receiver cooperative coadaptation and coadapted subsystems

Stage: 4 Symbolic Reference/Propositional Contents
Examples: Human sentences with propositional subject/predicate structure, categorizing systems, associative learning, linguistic indexicality, naming
Semantic properties: Pure indicators with combinatorial syntax and possible truth-functional semantics
Adaptations: Sender and receiver cooperative coadaptation and coadapted subsystems

Stage: 5 Hierarchical Representation
Examples: Self-awareness, model of mind, deliberate lying, function stabilizing mechanisms, normative semantics (note 2), illocutionary force (note 3)
Semantic properties: Massive semantic complexity; competitive elements
Adaptations: Sender and receiver cooperative coadaptation and coadapted subsystems

Notes:
1. Millikan, chapter 2, this volume.
2. Harms (2000).
3. Oller, chapter 4, this volume.
Stage 2: This stage is primitive content. Fully conventional meaning first emerges when production and interpretation of signals are coadapted, when both sender and receiver have conventional strategies and where these strategies are designed to work together or coordinate, eliminating ambiguities in the usage conventions. So far, we have been focusing on the well-worn examples of warning cries as paradigms of the most basic kind of signals possessing fully conventional meaning, partially because the primary focus of this chapter is the comparative study of animal communication. But sender and receiver need not be separate organisms mediating their coordination. The analysis of usage conventions applies just as well to internal signaling in biological control systems as it does to external ones. A simple mechanism like the blink reflex can be fairly characterized as involving a neurological signal that mediates between a discriminating sensor and a motor response. Since
the selective history of that mechanism specifies external conditions under which response (blinking) to the mediating signal was productive, and which response to the signal was the correct one, then that history specifies correspondences and procedures, and thus primitive content for the signal. This is not to say that every adapted system possesses meaning, since adaptation itself does not necessarily require mediated signals which are assigned conventional coordinating roles. Rather, the point is that meaning is not exclusively the province of interorganism communication, but is a ubiquitous characteristic of all complex control systems, and in general one should expect that an organism’s internal representational system is far more elaborate than its external communications indicate. The functional analysis of meaning applies equally well to both, and thus legitimates the possibility that one of the ways interorganism communication can evolve is by the expression of meaningful internal representations. It is possible, for instance, that warning cries are external versions of much older internal warning signals, in which case the full semantic analysis requires consideration of the functional histories of both usages. For signals, whether internal or external, to have unequivocal meaning, they must mediate coordination that is fundamentally of a cooperative nature. As is well known, cooperation in evolutionary processes is fundamentally unstable, relying as it does on individual self-restraint in the interest of overall longer-term greater productivity or efficiency. This fundamental point was made forcefully by George C. Williams (1966) and brought to the attention of the general public by Richard Dawkins (1976). 
Only with a great deal of formal research in a variety of fields since the 1960s have we come to understand that while cooperation is fundamentally unstable, there are nonetheless a number of mechanisms which allow it to persist, from kin selection (Hamilton, 1964), short-term memory (Axelrod, 1984), and nonrandom assortment (Sober and Wilson, 1998), to spatial segregation effects (Harms, 2001). The consequence of this research trend is that while we can never assume that cooperation underlies some coordinated interaction without some specific stabilizing mechanism, we should not be surprised to find it. It is this general expectation, for instance, which allows the Lewis-Skyrms game simply to assume an underlying cooperative arrangement. On the other hand, we have also learned to expect that the ways in which cooperation can be undermined are as numerous as the ways in which it can be protected. Table 3.1 includes at the stage 2 level of complexity an alternative situation (2a) in which both sender and receiver have coadapted strategies and the signals have the monolithic tracking-and-motivating character, but where interests of sender and receiver are competitive. Such situations are of particular interest for us here, since they are the primary focus of much research on animal communication. Classic examples are communication between prey and predator, like the “stotting” or vertical leaps of gazelles, and sexual
signaling, like the excessive plumage of some birds. The fact that mechanisms like the handicap principle (Zahavi and Zahavi, 1997) allow such signals to be reliable means we cannot assume that reliable signals indicate an underlying cooperative arrangement, and thus that the signal has an unequivocal meaning. (Competitive interests create competing standards for when a signal should occur.) Moreover, the fact that cooperation is fundamentally unstable indicates that even in predominantly cooperative signaling systems we should expect to find competitive elements. The paradigm example of warning cries obviously invites cheating, and even where mechanisms exist to guard against cheating, the typically increased risk to senders creates an underlying conflict of interest. Consequently, rather than stages 2 and 2a being distinct, we should expect to see a full spectrum of situations in which interests of senders and receivers have both cooperative and competitive elements. In such cases, we can allow the cooperative element to specify the meaning of the signal, with the understanding that competition may affect reliability or even render the signal meaningless where competition becomes overwhelming in its effect. On the other hand, systems like predator–prey signaling may generally be completely competitive in nature, allowing no unequivocal assignment of meaning to the signal. In table 3.1, stages 0–2 represent the emergence of fully conventional and unequivocal meaning for signals. That meaning is present is due to unequivocal conventions for both correspondence and procedures. What makes it primitive is that the signaling system is as structurally simple as it can be. Stages 2–5 represent the transition from primitive meaning to various features that are thought to characterize human language and cognition. 
This transition results in greater information-processing capacity and precision in behavioral control by subdividing the complete function of the functionally monolithic signals at stage 2. The characterization of the stages is to a large extent driven by the contrast between primitive content and standard models of rational choice theory (stage 4), and self-awareness and model of mind (stage 5).
Stage 3: This stage introduces pure indication. The standard (economic) model of human action has it that behavior is rationally determined when the satisfaction of the individual’s preferences or desires is maximized, given the individual’s beliefs. On this model, action results not from a single signal but from at least two, and critical for our understanding of how human thought and language compare with those of animals is that the belief or external-state-representing component is stripped of its motivating force, allowing the “pure” indication of external states of affairs. In the simplest case, the representation of external conditions implies (procedurally) that if a certain outcome were desired, then a certain action would be effective. A second representation of internal states indicates that the outcome is in fact desired.
Together these determine the action. So, for instance, the belief that food is present in combination with hunger results in the attempt to eat. The example of feeding behavior is chosen deliberately here. There is good reason to think that feeding behavior in all animals above the unicellular level must be conditional on both internal and external sensors. This would make it seem that something like the belief/desire distinction will be evident in internal representation and control in all animals. One good empirical question is: At what point and how does it enter external communications between animals? There is some indication that some warning cries function more as state indicators than as monolithic tracking-and-motivating signals, since how such cries are responded to may depend on a variety of other factors (Cheney and Seyfarth, 1990).
Stage 4: At least as far as formal semantics go, human external representations (indicative sentences) are characterized by a subject/predicate or propositional structure, which attributes properties to objects (or classes of objects). At stage 3, the simple indication of the presence of food need not go so far as first to indicate food, and then indicate that it is present. Rather, there can simply be an external sensor which indicates a feeding opportunity, which says procedurally, if you are hungry, then eat. The most basic kind of subject/predicate structure would seem to involve the reidentification of objects, and placing those objects into categories. This is little more than a schematic for associative learning, and it seems unavoidable that any (for instance) social animal will learn (a) to reidentify individuals and (b) categorize them as allies or enemies, superior or subordinate, and so forth.
Consequently, the default presumption should be that the internal representations of higher animals are indeed capable of subject/predicate structure, though again, there is no reason to think that the terms or symbols involved are translatable into human terms.
Stage 5: Finally, the most common items of interest when comparing human and animal communication are whether or not animals are “aware” of the meaning of their signals and whether they “intend” that the hearer understand them in some particular way. The questions, in familiar terms, are whether animals are reflective in their internal representation and whether they possess a “model of mind”: a representation of their communicative partners as representing agents in their own right. These questions, while often taken as central to the comparison between human and animal representation, concern second-order representation, that is, representation of representation processes. As such, they constitute an additional elaboration of the control structure above and beyond the belief/desire and subject/predicate distinctions which constitute the logical differences between the primitive content model and the rational action model. This is not the place to discuss the true nature of human intelligence or self-aware representation, but a few points which arise directly from the above discussion are relevant here.
Short of a model of a representational system which is capable of representing itself, we may want to consider a hierarchically structured set of semantically endowed signaling/control systems. The simplest case would involve a signaling system that tracks the behavior of another signaling system. The literature on cooperation (e.g., Trivers, 1971) has made a truism of the idea that cooperative systems invite cheating, and cheating makes cheater detection mechanisms adaptive. A cheater detection mechanism represents the activity of a representing system: the mind of the cheater. However, cheater detection mechanisms as they are usually conceived of do not go so far as to describe the internal tendencies of the cheater, but simply respond to some stimulus (like cheating on the last round) and do so by not cooperating. The semantics of these mechanisms is typically primitive, tracking relevant states and responding without mediation from further representational inputs. The point is that just as the simplest emergent forms of communication lack belief/desire and subject/predicate structure, so the simplest emergent forms of hierarchical representation and control should also be expected to be primitive in this way. If minds consist not of a Cartesian unified seat of consciousness, but of a large collection of partially integrated signaling/control systems of varying sophistication and flexibility, then we must be prepared to ask each such subsystem what sort of content it possesses. Is it primitive? How cooperative and how competitive are the underlying economics of the interactions? Does the signal mirror the belief/desire distinction in having its output be contingent on input from multiple indicator systems? Is there an indicator type which allows associative learning via the categorization of reidentified objects via a subject/predicate structure?
The representation of an individual’s own representation and of other minds need not exhibit any more than primitive content, and one should probably expect to find primitive content first.
Conclusion
The general awkwardness that surrounds the attribution of meaning to animal communication has been shown to be the result of using an elaborate anthropocentric theory of meaning to constitute minimum requirements for legitimate semantic analysis. Recent developments in functional theories of meaning seem to solve old philosophical problems concerning how correspondence is established between representations and the world. That the functional approach to meaning is on the right track is further indicated by the facility with which it allows us to plot stages of the emergence and elaboration of meaning in adapted systems, rather than taking meaning itself as a primitive and rather mysterious relationship characteristic of an irreducible realm of “mind.” What seems to
be the case is that we are poised to break the theoretical logjam which has kept our understanding of mind separate from that of the physical world, and the study of meaning in animal communication and cognition must play a critical role in integrating mind into the world. Of what has been written here, a few points deserve to be insisted on, while others can be expected to evolve with the demands of a framework for comparative empirical analysis. Of the former, meaning must be a cooperative phenomenon because competitive interests introduce conflicts of purpose, and thus ambiguities in usage conventions. This means, among other things, that real-world semantics develops against a fundamentally game-theoretic backdrop. It is also worth insisting that primitive content is the point at which meaning can be properly said to emerge, since it possesses both the minimum required semantic properties (unequivocal correspondence and procedural conventions) and the absolutely minimum structure. On the other hand, we should expect to find that individual examples of signaling behavior are more complex in terms of both conventions and structure than we had thought, so that, for instance, the warning cries which inspired the concept of primitive content may turn out to have more elaborate meaning than that bare minimum. On the theoretical side, table 3.1 is structured to show relationships between a number of standard models of (0) causal processes, (1) perception, (2) simple signaling, (2a) competitive signaling, (3) pure indication, (4) basic sentence structure, and (5) model of mind. Missing from this scheme, for instance, are elaborations of stage 2/3 which involve adapted interpretations of signals of varying strength and, from stage 4, the explicit consideration of truth-functional syntaxes.
References
Austin JL (1962) How to Do Things with Words. London: Oxford University Press.
Axelrod R (1984) The Evolution of Cooperation. New York: Basic Books.
Brentano F (1973) Psychology from an Empirical Standpoint (Rancurello T, Terrell D, McAllister L, trans.). New York: Humanities Press. (First published 1874.)
Cheney DL, Seyfarth RM (1990) How Monkeys See the World. Chicago: University of Chicago Press.
Dawkins R (1976) The Selfish Gene. New York: Oxford University Press.
Dretske F (1981) Knowledge and the Flow of Information. Cambridge, Mass.: MIT Press.
Frege G (1980) Über Sinn und Bedeutung. In: Translations from the Philosophical Writings of Gottlob Frege (Geach PT, Black M, eds.). Oxford: Blackwell. (Essay first published 1892.)
Grice P (1989) Studies in the Way of Words. Cambridge, Mass.: Harvard University Press.
Hamilton WD (1964) The genetical evolution of social behavior I & II. J Theoret Biol 7: 1–52.
Harms WF (2000) Adaptation and moral realism. Biol Phil 15 (5): 699–712.
Harms WF (2001) Cooperative boundary populations: The evolution of cooperation on mortality risk gradients. J Theoret Biol 213: 299–313.
Lewis D (1969) Convention. Cambridge, Mass.: Harvard University Press.
Millikan RG (1984) Language, Thought, and Other Biological Categories: New Foundations for Realism. Cambridge, Mass.: MIT Press.
Millikan RG (1993) White Queen Psychology and Other Essays for Alice. Cambridge, Mass.: MIT Press.
Skyrms B (1996) Evolution of the Social Contract. New York: Cambridge University Press.
Sober E, Wilson DS (1998) Unto Others: The Evolution and Psychology of Unselfish Behavior. Cambridge, Mass.: Harvard University Press.
Tarski AA (1956) Logic, semantics, metamathematics; papers from 1923 to 1938. (Woodger JH, trans.). Oxford: Clarendon Press.
Trivers RL (1971) The evolution of reciprocal altruism. Quart Rev Biol 46 (3): 35–57.
Williams GC (1966) Adaptation and Natural Selection. Princeton, N.J.: Princeton University Press.
Zahavi A, Zahavi A (1997) The Handicap Principle: A Missing Piece of Darwin’s Puzzle. New York: Oxford University Press.
4
Underpinnings for a Theory of Communicative Evolution
D. Kimbrough Oller
Overview
In order for the study of communication evolution to proceed systematically, we need to determine a set of features in terms of which comparison can be made across species and across evolutionary time. I began working on such a framework in research on infant vocal development in the early 1970s (for a summary see Oller, 2000), and in that context considered the role that a theory of development might play in illuminating the evolution of language. This work led to formulation of a general framework of command “properties” for communication systems, an attempt to account for the capabilities that are required to implement communication at various levels of complexity. The system is currently designed to address vocal systems; of course an analogous system of gestural communicative properties is also possible, though it will not be considered here. The current framework development was partly inspired by efforts of Charles Hockett (Hockett, 1960; Hockett and Altmann, 1968), whose intent was to formulate a set of “design features” that could form the basis for comparison across species, across time, and across natural and artificial systems of information transmission. A key point in the development of such a system of comparison is that such design features have to be abstract. They cannot be formed from the concrete units of any particular functional communication system. The complexities of human language must be accounted for, but other communication systems cannot simply be shoehorned, as a means of comparison, into the characteristics of the human one. Hockett contended that the useful points of reference
cannot be such things as the word for “sky”; languages have such words, but gibbon calls do not involve words at all. Nor can they be even the signal for “danger”, which gibbons do have. Rather, they must be the basic features of design that can be present or absent in any communicative system of humans, of animals, or of machines. (Hockett 1960, p. 89)
On the basis of research on vocal development, I came to a similar conclusion about a basis for comparison between mature speech and infant vocalizations. The proper points of reference for infants in the first months of life cannot be concrete units of the mature system such as particular “phonemes” or “resonance patterns” or “intonations.” I refer to such elements as mature “operational-level,” “functional-level” or “common-sense” units. To try to compare the infant vocal system with the adult system by using mature operational-level units requires shoehorning of infant categories into frames where they do not fit, and it yields at worst meaningless, and at best misleading, comparisons. For example, it simply makes no sense to ask which phonetic segments from the list [r], [p],
[s], [n], [o], [u] are produced most commonly by a one-month-old infant. The question is meaningless because the infant does not produce well-formed syllables at all, and so has not even yet shown the infrastructure for phonetic segments. Instead, I have argued that the points of reference for the infant must be infrastructural properties of potential systems of communication (Oller, 2000) corresponding to action capabilities that can be used in communication. For example, instead of asking whether the infant can produce the speech segments [r] or [p], we might ask (a) whether the infant can produce any kind of sound freely, as evidenced by repetitive production of any particular sound in the absence of obvious stimulation, such as discomfort or social interaction; (b) whether the infant shows the ability freely to produce sounds with a substantial range of variation along multiple acoustic parameters (amplitude, duration, pitch, resonance); and if so, (c) whether the infant shows systematic production of polar opposite contrasts along those parameters (for example, contrastively producing repetitive sequences of high-pitched “squeals” followed by sequences of low-pitched “growls”). Infants produce vocalizations that appear to be primitive precursors to speechlike sounds and that meet the conditions (a), (b), and (c) during the first four months of life (see Oller, 1980). Because conditions (a), (b), and (c) imply capabilities of control over vocalization that are required in speech, they imply infrastructural points of reference for comparing the mature human vocal system with the infant system, and importantly they do not require that our reasoning be entangled in the shoehorning fallacy. A baby that performs these actions is a baby that has important foundations for speech. Further, the capabilities implied by (a), (b), and (c) can be used as a basis for comparison among species, again avoiding shoehorning. 
An animal that shows any of the capabilities implied by (a), (b), or (c) also shows important potential vocal communicative flexibility, independent of any direct relationship with speech syllables or segments.
Natural Logic and the Properties of Command
I refer to these capabilities formally as command “properties” of communication systems, a usage that invokes some but not all aspects of Hockett’s notion of “design features.” Ideally, as suggested by Hockett, the properties formulated within such a framework would point to a progression of potential evolution. In general I have attempted to account for progression in terms of a “natural logic” where properties are implemented step by step according to logical relationships among them. The goal of the effort is to provide a comprehensive treatment of the properties and their relationships, a characterization of the infrastructure of potential communication systems. The goal of this chapter is to sketch the model and exemplify its utility.
The natural logic presumes that capabilities can build upon each other in evolutionary steps. New capabilities emerge in a species as adaptations that confer selective advantages. Each newly evolved capability may serve as the foundation for subsequent development of a more elaborate capability that confers a further degree of advantage. The logic is designed to conform to the possibilities that communication systems may exploit, and since those possibilities are discoverable (according to my assumption), and form a sequence (although a somewhat complex sequence), I hypothesize that we should be able to determine the possible steps of growth in capability.
Power and Efficiency
It is a truism to say that systems of communication are naturally selected if they contribute to the survival of the species that utilizes them. There appear to exist two characteristics of systems that relate to each other in a dynamic tension, both of them contributing to survival of a communication system (actually survival of creatures that use it). One is power (the relative ability of the system to transmit the information or manage the social relationships that might be relevant to survival) and the other is efficiency (the relative ability to communicate rapidly and at low cost in energy and danger). Communication systems both across species and across individual human languages appear to have features that are naturally selected for power and efficiency. The two sometimes compete: power can in some cases be attained only if some degree of efficiency is sacrificed (long, complex messages may yield more information, but they take longer to transmit and understand, and cost more in energy), and efficiency sometimes can be attained only at a cost in power (fixed signals such as distress or alarm calls appear to be extremely efficient and quick, but their power is limited to constrained circumstances of interaction).
I do not refer to power and efficiency as “properties” of communication systems in the technical vocabulary of my proposal. One reason is that power and efficiency are not “prime properties”; all the proposed properties discussed here can be thought of as contributors to power and efficiency. Similarly I do not refer to concepts such as “flexibility” of action or “purposefulness” as properties, because all the properties contribute to flexibility and purposefulness. As systems become more elaborate (that is, incorporate higher-order properties), their users show systematic increases in the degree of flexibility of usage they display. The infrastructural properties model provides a characterization of ways in which a communication system can become more flexible and be used more purposefully. Hockett refers to “openness” and “productivity” of communication systems, attempting to capture a characteristic of human language. But again, these features are not prime properties. Many, if not all, of the properties to be discussed contribute to productivity and
openness because they contribute to the flexibility of systems to adapt in order to produce and receive messages with power and efficiency. In keeping with a distinction introduced by Ferdinand de Saussure (Saussure, 1968), I say that communication acts always encompass signal and value. I will refer to the domain of study for vocal signals as “infraphonology.” I will refer to the domain of study for value, which is also called “function” or “usage,” as “infrasemiotics.” Saussure’s terms for signal and value were signifiant and signifié (see figure 4.1). That there exist a signal and a value does not necessarily imply that both parties to a communicative act relate to the signal and/or value in the same way. It is clear that in many primitive communicative acts, the signaler may not even be aware of the fact that communication has occurred. Similarly, the response to the signaler’s act by the perceiver may not be conscious or intelligent. Thus the description of communicative acts from the perspective of the definition of signal and value leaves an enormous amount of room for different types of signals and values that may have an assessment/management flavor (Owings and Morton, 1998; Owings and Zeifman, chapter 9 in this volume), as well as for those that are freer and more languagelike. In primitive communication systems, it can be said the signal and value are “coupled.” Signals in such systems have fixed form and they transmit fixed values. As a system of communication becomes more powerful, it can acquire the ability to transmit signals and values more flexibly. The key first step in this process of adaptability in a communication system might be called “signal/value decoupling” or “functional decoupling.” Figure 4.2 indicates that when functional decoupling occurs in a system, the property I call “Contextual Freedom” enters the picture. 
Prior to that point, a system of communication can transmit values only in fixed ways, where signals and values are tied together inextricably within individual communicators.
Figure 4.1 The omnipresent distinction between signal and value. Signal (display, form, action) corresponds to Saussure’s signifiant and is infraphonological material; value (content, usage, function, message) corresponds to signifié and is infrasemiotic material. When signal and value are tied together (cannot be separated within individual members of a species), they can be said to be functionally “coupled.”
Figure 4.2 Early steps in communicative evolution. (a) Order of appearance of command properties: Indexicality (time 1); Specialization, reached through ritualization (time 2); Contextual Freedom, reached through functional decoupling (time 3). (b) Types of communicative acts occurring at each time: indexical acts (time 1); specialized communicative acts within indexical acts (time 2); contextually free acts within specialized communicative acts within indexical acts (time 3).
Early Steps of Communicative Evolution
Communication systems appear to begin with purely “Indexical” character. Indexical, in this usage, is related to Peirce’s (1934) concept but is not identical to it, so I now provide a technical definition: Indexicality is the property of a communication system that allows any sort of functional connection between signal and value. This property merely requires that there be a relationship such that the signal “points to” or “indexes” the value for some potential receiver, and there is no limit on how that might be done, as far as the “Indexicality” goes. Every communication, no matter how complex or how simple, is thus Indexical. However, the most primitive communicative acts might be said to be “purely indexical,” and in that case there are limits on how the connection is made. A purely Indexical communication is one that results from an action that is not adapted for communication, but has some other function, as in the case of coughing. A cough is communicative
because it indexes or points to a condition of the producer, but it is only because the hearer recognizes what coughing indexes (airway obstruction, tickling in the throat, etc.) that the act results in communication. Once an act has (by evolution or learning) become adapted for communication, it becomes “specialized,” and we say, utilizing the terminology of Hockett, that the system has a design feature (or property) of “Specialization”, that is, it possesses signal/value pairings that are specialized as potential communications. Lorenz (1963) and Tinbergen (1951) called the evolutionary process that produces specialized communications “ritualization.” In the middle of the 18th century Condillac (1756) referred to purely indexical acts as “accidental signs” and to specialized fixed signals (that is, functionally coupled signals) as “natural signs.” Although the terms were different, it appears that the distinction invoked here between Indexicality and Specialization has been recognized for at least two and a half centuries and perhaps for much longer. In general, it appears that in evolution of communicative systems, Specialization does not occur except where purely indexical communications are already in place. Similarly, it appears that Contextual Freedom in the utilization of potential communicative acts does not occur until specialized, but functionally coupled, communications are already in place. Further, the relationship among these properties appears to abide by a natural logic as portrayed in figure 4.2. At the earliest point in time, organisms command minimal Indexicality, and can produce purely indexical acts, which constitute the only form of communication at that primitive point of evolution or development—these acts are neither specialized nor contextually free. At the second point in time, Specialization is commanded and there exist communicative acts that are “designed” (by natural selection) as communications. 
The Venn diagram in figure 4.2b indicates that specialized communicative acts are included within the domain of indexical acts (since they are new kinds of indexical acts), and further that contextually free communicative acts, which enter the picture at time 3, are within the domain of both indexical acts and specialized communicative acts, and constitute new kinds of each. The first step departing from pure Indexicality is ritualization, and the second step is functional decoupling.
Further Elaborations of Functionally Decoupled Communication Systems
Functionally coupled, specialized communications appear to be extremely efficient in terms of speed of signal delivery and reception, and presumably also in terms of cost. However, a system of communication that is functionally coupled is severely limited in what it can do. It can transmit only the values that have been selected through evolution, and cannot be adapted within the individual to social or other environmental changes that
might call for new kinds of communication. Functional decoupling represents a foundation that is a sine qua non for adaptability of communication within individuals. At the same time, the first implementation of functional decoupling can seem inauspicious. The human infant shows a kind of decoupling in the vocal domain, typically in the first week of life, through production of “quasivowels.” These begin as low-intensity, short-duration, normally phonated sounds produced in circumstances that appear to include no external stimulus to account for their production and no obvious state expression, unless we wish to view lack of discomfort as a state deserving of expression. The infant appears to produce these sounds for no utilitarian reason. That parents and other caretakers may notice and respond to the sounds on some occasions may be important in motivating the infant to continue vocalizing, but the lack of immediate utilitarian effects suggests that the infant commands Contextual Freedom of vocal production, providing a foundation for flexible communicative action later. Figure 4.3 offers a perspective on additional steps of communicative elaboration that are possible once Contextual Freedom is in place. On the infrasemiotic side, it is possible
to use functionally decoupled actions freely to express multiple states, such as comfort, discomfort, or exultation. Certainly, even without functional decoupling, state expression through vocalization is possible, since fixed signals, such as crying, clearly express states. But this is not free state expression because crying is coupled to negative emotion. Free Expressivity is, then, a utilitarian step that builds upon Contextual Freedom, and is a kind of Contextual Freedom wherein different emotions can be voluntarily associated with individual sound types. Freely expressive acts are included within the domain of contextually free acts, as indicated in the figure. The empirical observations conform to the logical relationship of inclusion: primitive, nonutilitarian contextually free vocal acts have been observed to appear in the human infant prior to freely expressive acts. The first step of functional decoupling, then, appears to be that babies vocalize contextually freely, with no immediate utilitarian intent (unless interest in the sounds themselves can be viewed as utilitarian). Once vocalization is freely expressive, it can be directed freely to listeners. It appears that this pattern of growth in vocalization usage with freely expressive acts preceding freely directive ones occurs in human infants. By the second or third month of life, vocal acts can be directed by infants toward their caretakers (as indicated by eye contact; Trevarthen, 1979). By this point (time 5 in figure 4.3) all three types of decoupled acts (contextually free, freely expressive, freely directive) occur. The ordered appearance of these act types in development conforms with the naturally logical relationship that I posit to exist among the command properties: Contextual Freedom, Free Expressivity, and Free Directivity. In figure 4.4 the presumed order of development for additional command properties is indicated.
Figure 4.3 New communicative acts possible after functional decoupling. Infraphonological domain: contextually free acts (time 3); acts of signal analysis within contextually free acts (time 4); categorically adaptive acts within acts of signal analysis (time 5). Infrasemiotic domain: contextually free acts (time 3); freely expressive acts within contextually free acts (time 4); freely directive acts within freely expressive acts (time 5).
Minimally contextually free acts (repetitive vocal acts that are not produced with variable emotional expression) occur before freely expressive ones because, according to the proposed natural logic, freely expressive acts are kinds of contextually free acts and consequently presuppose the existence of prior command over acts of minimal contextual freedom. Similarly, freely directed acts presuppose freely expressive ones. The next posited property of infrasemiotic development indicated in figure 4.4 is Free Interactivity, presumed to be possible only after Free Directivity is under control. Again, research in child development supports this notion: Early vocal exchanges with infants where mutual eye contact is involved become elaborated by the fifth month or so to include systematically alternating vocal exchanges with adults, an indicator of Free Interactivity (Stern et al., 1975; Anderson et al., 1977; Beebe et al., 1979; Papoušek and Papoušek, 1979; Papoušek and Papoušek, 1989). Finally, as indicated in figure 4.4, once an infant is able to interact systematically, it may become possible for the infant to imitate vocally, a function which represents elaboration of Free Interactivity to include systematic matching of the vocalizations of one party with those of another. Again the data from human infancy suggest that the Free Imitation function appears after Free Interactivity (certain more primitive imitative functions are seen earlier [Meltzoff and Moore, 1977; Kessen et al., 1979], but should not, I contend, be counted as acts of “free” imitation).
Figure 4.4 Additional steps of potential evolution or development of properties after functional decoupling, shown across times 3 through 7. Infraphonological domain, in order: Contextual Freedom, Active Signal Analysis, Categorical Adaptation, Syllabic/Prosodic Decoupling, then Adaptable Prosody and Adaptable Syllabicity. Infrasemiotic domain, in order: Contextual Freedom, Free Expressivity, Free Directivity, Free Interactivity, Imitation.
On the infraphonological side, the systematic steps of potential evolution beyond Contextual Freedom begin with Active Signal Analysis (or Parameterization). The earliest contextually free utterances appear to be quasivowels, produced with uninterrupted low resonance, middle pitch, low amplitude, short duration, and normal phonation. With the appearance of sounds that vary by resonance, continuity, pitch, amplitude, duration, or phonation type, the infant shows that the potential signal can be differentiated in terms of specific acoustic parameters, revealing the property of Active Signal Analysis. Any of these variations appears to be possible, although resonance, continuity, pitch, and phonation-type variations appear to be the most common in the first month (Zlatin, 1975; Oller, 1980; Stark, 1980; Roug et al., 1989). Not long after early parametric variation is introduced, evidence of Categorical Adaptation appears when infants systematically contrast polar opposite vocalizations along selected parameters: in the domain of pitch, for example, we see infants produce systematically alternating sequences of very high pitch (squealing) and very low pitch (growling) (Oller, 1980). Among the earliest contextually free utterances we see in human infants are brief, rhythmically organized sequences of quasivowels (Lynch et al., 1995). Each element of such a sequence can be thought of as a primitive syllable. Once Categorical Adaptation has been
achieved, it is possible to produce different kinds of primitive syllables (differentiated, for example, by resonance or pitch, as in the case of the contrast between squeals and growls) in these rhythmically organized sequences. Similarly, infants may also introduce a primitive distinction between syllabic and prosodic information at this point, by using differing “intonational” or “voice quality” frames with the same primitive syllables. For example, an infant may produce a high-pitched squeal in several ways: with rising intonation it may indicate delight, whereas with rapidly falling intonation it may indicate frustration, and with harsh voice quality it may indicate high levels of arousal. In such cases the infant manifests Syllabic/Prosodic Decoupling, in which syllabic and prosodic material is presented simultaneously, woven together in what John Locke (1993) calls Hot-Cool Synthesis. Perhaps the most notable feature of the synthesis is that it represents systematic coordination of independent (i.e., decoupled) simultaneous features of the vocal signal. Having decoupled the prosodic and syllabic domains, it becomes possible for the infant to develop systematic elaborations in both domains, with the appearance of the command properties Adaptable Prosody and Adaptable Syllabicity. Both of these involve notable steps that cannot be described in this space, but suffice it to say that canonical forms of syllables and prosodic patterns come into play once Syllabic/Prosodic Decoupling is accomplished (Oller, 2000). It is important to emphasize that development or evolution of command properties beyond the point of functional decoupling always presupposes freedom of vocal communicative function. In fixed signal systems of nonhumans we certainly see examples of directivity of vocalizations (in threat displays, for example), of interactivity (in duetting bouts, for example), and of various kinds of emotional expressivity (in distress calls, for example). 
But note that if the signals are fixed, then they do not show freedom of function in any of these cases. Of course it is an empirical question whether they really are entirely fixed, but the literature generally suggests that in nonhumans, distress calls are always distress calls and threat calls are always threat calls. What is so apparently important about human infants is that in the first months of life, they already show primitive decoupling of vocal signals from values, followed systematically and soon thereafter by the appearance of additional kinds of contextually free acts, which appear to reach the level of imitation by early in the second half of the first year. In addition, it is worthy of note that functionally decoupled systems with many of the vocal command properties indicated in figure 4.4 have been evolved in nonhuman creatures, especially birds. Figures 4.5 and 4.6 offer a fuller sketch of the proposed infrastructural model, which cannot be explicated in the space available here. The arrows in these figures indicate naturally logical, presuppositional relationships among the proposed command properties. If
the arrows are reversed, they indicate order of development. The goal of the model, as noted above, is a comprehensive treatment of the character of potential communication systems. Consequently, the sketches attempt to treat possible elaborations of both signal and value systems, to account for changes from simple communicative systems with very limited power to systems of languagelike potency as evolution or development proceeds in accord with the naturally logical possibilities specified in the model. It should be noted that the relationships posited and symbolized in the figures as arrows are not the only relevant ones, because there are clearly many relations (at least of the weak logical sort symbolized by dotted lines) between properties of the infraphonological and infrasemiotic domains; for example, growth of power in what is to be transmitted in the infrasemiotic domain may exert selectional pressure to increase the power of an infraphonological system (if you have more to say, you may need better tools to say it with). Likewise, as greater power accrues in the infraphonological domain, growth in the infrasemiotic domain may be facilitated (if you have better tools, there is more you can say with them).
Figure 4.5 Hierarchy of infraphonological properties. Properties shown, listed from the base of the figure upward: Contextual Freedom; Active Signal Analysis; Categorical Adaptation; Source/Filter Decoupling; Syllabic/Prosodic Decoupling; Adaptable Syllabicity; Adaptable Prosody; Segmentation; Recombinability of Rhythmic Groups; Recombinability of Syllables; Phonotactic Elaboration; Adaptable Rhythmic Hierarchy.
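The presuppositional relationships among command properties can be read as a directed graph, in which reversing the arrows amounts to finding an order of development. The following Python sketch is offered only as one illustrative reading of that idea, not as part of the proposed model. It encodes only the links stated in the running text (the two chains summarized in figure 4.4) and omits the further properties of figures 4.5 and 4.6. The names `PRESUPPOSES`, `all_presupposed`, `development_order`, and `is_consistent` are hypothetical labels introduced for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumption: each property maps to the properties it directly presupposes.
# Only links described in the text are included; this is a simplification.
PRESUPPOSES = {
    "Contextual Freedom": set(),
    # Infrasemiotic chain
    "Free Expressivity": {"Contextual Freedom"},
    "Free Directivity": {"Free Expressivity"},
    "Free Interactivity": {"Free Directivity"},
    "Imitation": {"Free Interactivity"},
    # Infraphonological chain
    "Active Signal Analysis": {"Contextual Freedom"},
    "Categorical Adaptation": {"Active Signal Analysis"},
    "Syllabic/Prosodic Decoupling": {"Categorical Adaptation"},
    "Adaptable Prosody": {"Syllabic/Prosodic Decoupling"},
    "Adaptable Syllabicity": {"Syllabic/Prosodic Decoupling"},
}


def all_presupposed(prop):
    """Return every property that `prop` presupposes, directly or indirectly."""
    seen = set()
    stack = list(PRESUPPOSES[prop])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PRESUPPOSES[current])
    return seen


def development_order():
    """Return one possible order of development (presupposed properties first)."""
    return list(TopologicalSorter(PRESUPPOSES).static_order())


def is_consistent(commanded):
    """Check that a set of commanded properties includes all it presupposes."""
    return all(PRESUPPOSES[p] <= commanded for p in commanded)
```

Under these assumptions, a hypothetical profile that includes Free Directivity without Free Expressivity would be flagged as inconsistent with the natural logic, which is the kind of ordering claim that developmental or comparative data could confirm or contradict.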
Figure 4.6 Hierarchy of infrasemiotic properties. Properties shown, listed from the base of the figure upward: Contextual Freedom; Free Expressivity; Free Directivity; Free Interactivity; Free Dyadic Illocutionarity; Imitability; Conventionality; Free Triadic Illocutionarity (Designation); Arbitrarity; Semanticity (Semantic-Illocutionary Decoupling); Thematicity; Propositionality; Grammaticization; Propositional Conjunction; Displaceability; Flexible Prevarication; Metacommunication; Grammatical Movement; Embedding.
A Key Example: Semantic/Illocutionary Decoupling
In this limited space I choose to focus on just one of the distinctions among command properties that I deem particularly crucial in the comparative enterprise: The distinction is implied by, though not directly explicated in, Austin’s (1962) differentiation of two value types, “meaning” and “illocutionary force.” Within the model I propose, the systematic maintenance of a distinction between meaning and illocutionary force is termed “Semantic/Illocutionary Decoupling.” A communicative system without this property is more primitive than one that has it, and a system with it presupposes the more primitive form; any creature whose vocal acts simultaneously distinguish meanings and illocutionary forces can also communicate in the more primitive way, without the distinction. It is worth noting that both Millikan (chapter 2 in this volume) and Harms (chapter 3 in this volume) address what I understand to be this lack of distinction (or coupling) in primitive communications, although their focus is upon representational aspects of it, and they do not portray the coupling as involving illocutionary force and meaning, but rather two aspects of what they call meaning: an indicative (referential) and a procedural (instructional, action-influencing) aspect of meaning. To clarify my own view of the coupling, it will be necessary to explicate what I mean by illocutionary force. To a first approximation, illocutionary forces consist of the social acts performed in the act of saying things or communicating about things. So in the act of saying “I want a beer,” I may make a “request” (thus performing an illocutionary act). But with the same words (and “meaning,” in Austin’s terminology), I make reference to a state of desire within myself, without necessarily making a request (since I may have gone on a no-alcohol diet). Thus I might make this statement without intending to make a request, but rather to perform the simple descriptive act of indicating my desire for a beer. We might call the latter illocutionary force a “comment” or “statement.” “I want a beer” always directly bears the same core indicative semantic content, though it may imply other contents as well. The fact that the same meaning, the same semantic content, can be utilized in the service of a variety of illocutionary forces provides a clear illustration that meaning and illocutionary force are different in type, that they represent different layers of communication, simultaneously transmitted. It is also worthy of note at this juncture that a richly implemented Syllabic/Prosodic Decoupling in the infraphonological domain can greatly facilitate expressions that show Semantic/Illocutionary Decoupling (to a first approximation, syllabic content in natural human languages tends to express semantic value and prosodic content tends to express illocutionary value). The distinction between illocutionary force and meaning is critically important in cross-species comparisons of communication systems.
A variety of extant creatures have systems of communication less powerful than the mature human one, in that they can communicate an illocutionary force such as a warning, a comment, a threat, an invitation, or a greeting by producing one of the natural display signals in their species-specific repertoire. However, the nonhuman may not have the capability to utilize the same signal (or recombinable elements of the same signal) to communicate any other illocutionary force. Thus the vervet monkey can communicate a warning (even a predator-specific or at least a location-specific warning) with a particular vocalization (Struhsaker, 1967; Cheney and Seyfarth, 1990), but cannot use the same vocalization (or recombinable elements of that signal) to communicate an invitation or a comment, or any other illocutionary force. Each vervet signal has important communicative power and precision, but insufficient flexibility to allow it to communicate any illocutionary force other than that prescribed through natural selection. As a consequence the “semantic” value of a warning signal is bound to its illocutionary force, and in some sense it might be said there is no systematic distinction between semantics and illocutionary force for the vervet producer of the warning call. The two value types remain coupled. Mature humans, in contrast, because they command Semantic/Illocutionary Decoupling, are capable of issuing a warning with an enormous variety (in fact, an indefinitely large
class) of semantically endowed vocalizations. We can shout the name of any number of predators, for example, intending those names as warnings, or we can say “watch out” or “be careful” or “danger,” along with an indefinitely long list of other possible warnings. Equally important, each semantically endowed utterance in a system with Semantic/Illocutionary Decoupling can be utilized to transmit a variety of forces. The force of “warning” is just one among them. We can say, for example, the word “eagle,” intending to warn people that we should all duck our heads to avoid the talons of a swooping bird. On the other hand, with precisely the same segmental sequence, referring to precisely the same class of birds, one can also say “eagle” simply to comment upon the fact that a particular photograph is that of an eagle. Further, we can say “eagle” as an invitation to view one, as a correction of some prior statement that a particular animal was a crow, and so on.

In a typical act of communication within the mature human linguistic system, then, analytical reference is made to at least one class of entities, but at the same time, and in ways that are not determined by the nature of the reference, any one of a wide variety of particular illocutionary forces is also transmitted. The force-independent reference (or “meaning,” in Austin’s usage) is crisply distinct from the illocutionary force, and both the force and the meaning are transmitted simultaneously. In a communicative system that commands Semantic/Illocutionary Decoupling, communicative elements can be said to have illocutionary flexibility, whereas in systems without Semantic/Illocutionary Decoupling, communicative elements have semantic and illocutionary values that are in a fixed relationship. Semantic/Illocutionary Decoupling is a capability that is built upon a base where primitive illocutionary forces and meanings are already present, but always bound together.
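The contrast between a fixed relationship and illocutionary flexibility can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the signal names (call_A and so on), the three meanings, and the four forces are hypothetical labels chosen for the example, not data from any study cited here. It represents a coupled repertoire as a table that binds each signal to one meaning and one force, and a decoupled system as free combination of meanings with forces.

```python
from itertools import product

# Hypothetical labels, chosen only for illustration.
MEANINGS = ["eagle", "leopard", "snake"]
FORCES = ["warning", "comment", "invitation", "correction"]

# Coupled system: each signal is bound to exactly one (meaning, force) pair.
COUPLED_REPERTOIRE = {
    "call_A": ("eagle", "warning"),
    "call_B": ("leopard", "warning"),
    "call_C": ("snake", "warning"),
}


def coupled_acts(repertoire):
    """Return the (meaning, force) pairs a coupled repertoire can express."""
    return set(repertoire.values())


def decoupled_acts(meanings, forces):
    """Return every (meaning, force) pair when the two vary independently."""
    return set(product(meanings, forces))


def forces_available(acts, meaning):
    """List, in sorted order, the forces with which a meaning can be used."""
    return sorted(force for m, force in acts if m == meaning)


coupled = coupled_acts(COUPLED_REPERTOIRE)
decoupled = decoupled_acts(MEANINGS, FORCES)

print(len(coupled), len(decoupled))
print(forces_available(coupled, "eagle"))
print(forces_available(decoupled, "eagle"))
```

On these assumed labels, the coupled repertoire can express only as many communicative acts as it has signals, whereas the decoupled system can pair every meaning with every force.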
As far as I can tell, no nonhuman system of communication in the wild shows Semantic/Illocutionary Decoupling, although rich systems of illocutionary force transmission with some inherently referential character are seen in a wide variety of animals (see chapters 8 and 10 in this volume, for instance). Notably, Pepperberg (chapter 10 in this volume) has observed an apparent ability of the African Grey parrot to learn to use humanlike vocalizations with at least limited Semantic/Illocutionary Decoupling.

The development of linguistic capabilities in human infants and children shows the emergence of Semantic/Illocutionary Decoupling over the first two years of life. Infants begin with systems of vocal and gestural communication that involve rich illocutionary force transmission, but lack systematically distinct analytical meaning. Empirical research on early vocabulary learning in children suggests that Semantic/Illocutionary Decoupling does not emerge with the first “words” of children. In fact, illocutionary inflexibility is one of the hallmarks of early word use. When Semantic/Illocutionary Decoupling does emerge, it tends to be accompanied by rapid vocabulary growth (Carey, 1982), suggesting that the advantages of Semantic/Illocutionary Decoupling are associated with an emergent awareness (perhaps tacit) in the child that words can make analytical reference to entities independent of the illocutionary forces that might be transmitted in using the words.

Another advantage of a system that distinguishes illocutionary force and meaning may be in more efficient storage in memory: Any child who draws the distinction simplifies storage because s/he need not memorize each word in conjunction with each force it might be used to express. Instead, the child can have a lexicon of words with meanings independent of the stored elements of force, reducing the number of lexical items that need to be recorded in memory by a factor equal to the number of forces that might be transmitted for each meaning. Further, the achievement of Semantic/Illocutionary Decoupling allows every newly acquired element with semantic value to be expressed with numerous potential force values.

Where the Infrastructural Proposal Fits in the Recent Literature on Evolution of Language

Commonly, speculations about evolution of language tend to be focused almost exclusively upon properties that pertain to very elaborate systems, such as those displayed at the top in figures 4.5 and 4.6. Lower-order properties are typically not considered, even though they appear to provide the necessary foundations for the higher-order ones. It would appear that there must exist foundational capabilities observable in early stages, because systems must be evolvable and developable. And there is strong evidence that many levels of power of communicative systems can exist, especially as seen in various forms of animal communication and, perhaps even more interestingly, in the human infant, where useful and functionally decoupled communication is found in the first months of life, and where development proceeds through all the steps indicated in figures 4.5 and 4.6.
The infrastructural model is designed in such a way as to specify primarily the capabilities of the producer of vocal communications. I do not intend to imply that the receiver is not important, since receivers must be capable of interpreting the values of the signals produced, if those signals are to be selected and maintained. Since each step of communicative elaboration must be selectable, there must be logically related steps of reception and production at every point in evolution. Each one of the steps is of interest, and the nature of the relationship between reception and production may vary with the step. In the pursuit of the infrastructural model I am fascinated by the forces that seem to apply at many of the stages of potential evolution. For example, how a communication system makes the first major leap that I have posited, that of functional decoupling in Contextual Freedom, needs explanation because there appear to exist advantages of efficiency
and accuracy in coupled, specialized systems. (Remember that fixed signals are often extremely well-designed to their specific tasks, such as alarm or threat.) One possibility is that Contextual Freedom may introduce an advantage to the producer because contextually free vocalizations may elicit attention and caregiving within the context of a social group. Early hominids apparently were highly social, with large clans where mechanisms of obtaining caregiving may have been extremely important to infants and children, and where contextually free vocalizations may have played an important role in maintenance of social bonds (see chapter 14 in this volume), with especially high value placed on kin communication and cooperative breeding (see chapters 8 and 15 in this volume). Contextually free vocalizations may thus afford opportunities for learning and socialization even in very primitive cases of vocalizations such as the quasivowels of human infants, and perhaps in contextually free vocalizations that may have occurred in early hominid evolution where hominid social situations began to favor the use of vocalization to elicit attention.

Such primitive steps of growth in system need to be addressed, and in my opinion, it is important to do so in the context of a comprehensive attempt to specify the infrastructure of communicative systems that may culminate in languagelike potential. The evolution of language and other powerful systems of communication can be understood only in the context of a general theory of the underpinnings of communicative systems.

Acknowledgment

The author would like to express appreciation to the Konrad Lorenz Institute for Evolution and Cognition Research in Altenberg and to the Plough Foundation for supporting the work represented in this article.

References

Anderson BJ, Vietze P, Dokecki PR (1977) Reciprocity in vocal interactions of mothers and infants. Child Devel 48: 1676–1681.
Austin JL (1962) How to Do Things with Words. London: Oxford University Press.
Beebe B, Stern D, Jaffe J (1979) The kinesic rhythms of mother–infant interactions. In: Of Speech and Time (Siegman AW, Feldstein S, eds.). Hillsdale, NJ: Lawrence Erlbaum.
Carey S (1982) Semantic development: The state of the art. In: Language Acquisition: The State of the Art (Wanner E, Gleitman LR, eds.). Cambridge: Cambridge University Press.
Cheney DL, Seyfarth RM (1990) How Monkeys See the World: Inside the Mind of Another Species. Chicago: University of Chicago Press.
Condillac EB de (1756) An Essay on the Origin of Human Knowledge; Being a Supplement to Mr. Locke’s Essay on the Human Understanding. London: J. Nourse. (Translation of Essai sur l’Origine des Connaissances Humaines.)
Hockett CF (1960a) Logical considerations in the study of animal communication. In: Animal Sounds and Communication (Lanyon WE, Tavolga WN, eds.). Washington, D.C.: American Institute of Biological Sciences.
Hockett CF (1960b) The origin of speech. In: Human Communication: Language and Its Psychobiological Bases. Readings from Scientific American. San Francisco: W.H. Freeman.
Hockett CF, Altmann SA (1968) A note on design features. In: Animal Communication: Techniques of Study and Results of Research (Sebeok TA, ed.). Bloomington: Indiana University Press.
Kessen W, Levine J, Wendrich K (1979) The imitation of pitch by infants. Infant Behav Devel 2: 93–100.
Locke JL (1993) The Child’s Path to Spoken Language. Cambridge, Mass.: Harvard University Press.
Lorenz KZ (1963) Das Sogenannte Böse: Zur Naturgeschichte der Aggression [On Aggression]. Vienna: Dr. G Borotha-Schoeler Verlag.
Lynch MP, Oller DK, Steffens ML, Buder EH (1995) Phrasing in prelinguistic vocalizations. Devel Psychobiol 28: 3–23.
Meltzoff AN, Moore MK (1977) Imitation of facial and manual gestures by human neonates. Science 198: 75–78.
Oller DK (1980) The emergence of the sounds of speech in infancy. In: Child Phonology, vol. 1, Production (Yeni-Komshian G, Kavanagh J, Ferguson C, eds.), 93–112. New York: Academic Press.
Oller DK (2000) The Emergence of the Speech Capacity. Mahwah, NJ: Lawrence Erlbaum.
Owings DH, Morton ES (1998) Animal Vocal Communication. Cambridge: Cambridge University Press.
Papoušek H, Papoušek M (1979) The infant’s fundamental adaptive response system in social interaction. In: Origins of the Infant’s Social Responsiveness (Thoman EB, ed.), 175–208. Hillsdale, NJ: Lawrence Erlbaum.
Papoušek M, Papoušek H (1989) Forms and functions of vocal matching in interactions between mothers and their precanonical infants. First Lang 9: 137–158.
Peirce CS (1934) Pragmatism in retrospect: A reformulation. In: The Collected Papers of Charles Sanders Peirce, vol. 5 (Hartshorne C, Weiss P, eds.). Cambridge, Mass.: Harvard University Press.
Roug L, Landberg I, Lundberg L-J (1989) Phonetic development in early infancy: A study of four Swedish children during the first eighteen months of life. J Child Lang 16: 19–40.
Saussure F de (1968) Cours de Linguistique Générale. Paris: Payot.
Stark RE (1980) Stages of speech development in the first year of life. In: Child Phonology, vol. 1 (Yeni-Komshian G, Kavanagh J, Ferguson C, eds.), 73–90. New York: Academic Press.
Stern DN, Jaffe J, Beebe B, Bennett SL (1975) Vocalizing in unison and in alternation: Two modes of communication within the mother–infant dyad. Ann NY Acad Sci 263: 89–100.
Struhsaker TT (1967) Auditory communication among vervet monkeys (Cercopithecus aethiops). In: Social Communication Among Primates (Altmann SA, ed.), 281–324. Chicago: University of Chicago Press.
Tinbergen N (1951) The Study of Instinct. Oxford: Oxford University Press.
Trevarthen C (1979) Communication and cooperation in early infancy. A description of primary intersubjectivity. In: Before Speech: The Beginnings of Human Communication (Bullowa M, ed.), 321–347. London: Cambridge University Press.
Zlatin M (1975) Preliminary Descriptive Model of Infant Vocalization During the First 24 Weeks: Primitive Syllabification and Phonetic Exploratory Behavior. National Institutes of Health Research Grants, final report.
III
METHODOLOGICAL AND THEORETICAL DEVELOPMENTS FOR THE FUTURE OF EVOLUTIONARY STUDY OF COMMUNICATION SYSTEMS
5
Social and Cultural Learning in the Evolution of Human Communication
Luc Steels

Evolutionary Linguistics

In order to understand how human languages could have emerged and continue to evolve, we need, above all, explanations for the enormous increase in complexity compared with animal communication systems. This increase has taken place for all aspects of language.

Form: The repertoire of speech sounds used in human language is extraordinarily complex. It relies on an articulatory apparatus which needs to be controlled very fast and at a very fine-grained level. It requires the real-time processing of structured sounds despite noise and individual variation.

Meaning: An intricate system of conceptualization underlies language (Langacker, 1987). This system consists of a way to organize the world into different objects and events, a way to categorize them, and a way to introduce structure from the viewpoint of the speaker and the listener. For example, the sentence “The car is behind the tree” implies that the speaker and the listener view themselves as positioned on a line which goes from themselves to the tree and then to the car. Human conceptualization is not fixed, but expands and adapts to the needs of the community. It is partly grounded in the real world through a sensorimotor apparatus and partly based on purely symbolic relations between abstract concepts.

Lexicon: The lexicons of human languages are very large (an educated person uses on average 60,000 words and understands 100,000 words). Moreover, they are open ended. New words or new meanings are created even as conversations take place (H. Clark and Brennan, 1991). Natural lexicons exhibit homonyms, synonyms, and polysemy, which make communication ambiguous and very difficult to learn.

Grammar: Human languages use a wide variety of means to indicate the structure and function of words in a phrase (word order, intonation, tone, stress, morphological marking, word form variation, and others). Using grammar requires a complex planning process on the part of the speaker and a complex plan for the interpretation process on the part of the hearer.

Pragmatics: Human language is used in many different contexts and for many different purposes. Each context induces different registers of language and different dialogue patterns. New contexts arise continuously, and hence human language conventions need to adapt in order to remain flexible in achieving pragmatic goals.

There is probably not going to be a single simple mechanism to explain this incredible rise in complexity for all these various aspects, and there is not a single approach to study
it. Generally speaking, three types of questions have been asked, giving rise to three fields of inquiry. Linguists working in the area of historical linguistics have asked what changes have taken place in human language. They have amassed a remarkable set of facts documenting language change at all levels and have attempted to organize these facts in language typologies and laws of change (see, e.g., Traugott and Heine, 1991; Vogel and Comrie, 2000). Anthropologists and biologists have asked why human language might have evolved. Particularly the field of cultural anthropology has tried to identify changes in the ecology or the social organization of early hominids and relate that to the need for a more complex form of communication (see, e.g., Dunbar, 1994). A third type of question is how these developments took place—in other words, what are the causal mechanisms both at the level of individuals (their cognitive and bodily apparatus) and at the level of the group (their interactions)? This question is more recent and has been asked by people ranging from cognitive scientists to psychologists to linguists and to artificial intelligence researchers. Evolutionary linguistics, by analogy with evolutionary biology, is the field of study that attacks these three questions, particularly the third one. This chapter is intended as a contribution to the emerging field of evolutionary linguistics. The methodology of evolutionary linguistics is reminiscent of that of theoretical evolutionary biology: formal modeling and experiments with artificial systems. Such an approach makes it possible to examine the consequences of certain hypotheses and show with computer simulations or through mathematical analysis whether genes will spread in a population, given certain assumptions (Maynard Smith, 1975). 
In a similar spirit, my collaborators, students, and I have been conducting a variety of experiments with robots that engage in various forms of communication with languagelike features. The performance of “artificial linguistic agents” or robots in these experiments falls far short of human capabilities. But realism is not the point. The main goal is a precise and objective exploration of what factors could intervene in the origins and evolution of human communication systems (Steels, 2001c).

Issues and Hypotheses

Universals and Particulars

There has been an ongoing debate in linguistics between those emphasizing the universal character of language (Chomsky, 1975) and those emphasizing the uniqueness of each language. The latter group includes historical linguists (Vogel and Comrie, 2000) and researchers conducting empirical studies of language and language use (Labov, 1994).
At the level of sound systems, it has been pointed out that every possible sound which can be made by the human vocal tract is a potential element of a human language repertoire (Ladefoged and Maddieson, 1995). We therefore observe a bewildering variety, and speakers of one language are in general not even capable of perceiving the subtle sounds of another language or reproducing them, unless they have started at a very early age. The tones in Chinese or the clicks in K!ong are some obvious examples, but so are the v/b sounds in Spanish, which are subtly different from those used in English. Nevertheless, there are clear universal tendencies as well. For example, in the case of vowels, it is known that there is a progressive complexity in vowel systems (from three to four, to five, etc.) but that there is a very specific probability distribution for each size (Schwartz et al., 1997). Similar universal tendencies can be found for other aspects of language, such as syllable structure (Vennemann, 1988). It is also well established that there is a constant evolution in the sound systems of human languages, sometimes going very rapidly, and that this evolution also exhibits universal tendencies (Labov, 1994). Many similarities in the conceptualizations underlying different languages have been postulated (Wierzbicka, 1992). For example, the distinction between objects (things, people), on the one hand, and events, on the other hand, appears to be universal. Similarly, many categorial dimensions like space, time, aspect, countability, and kinship relations are lexicalized in almost all languages of the world, even though there may be profound differences in how this is done. For example, some languages lexicalize kinship relations as nouns (as in English: father) and others as verbs (Evans, 2000). But there are profound differences in the way different languages conceptualize reality (Talmy, 2000; Bowerman and Levinson, 2001). 
And this may impact other cognitive behavior, such as memory tests (Davidoff et al., 1999). For example, the conceptualization of the position of the car in “The car is behind the tree” is just the opposite in most African languages. The front of the tree is viewed as being in the same direction as the face of the speaker, and hence the car is conceptualized as in front of the tree as opposed to behind it (Heine, 1997). The lexicons of languages also differ profoundly because they use different word forms, even though it has been argued that there is a common core of words which is shared by all languages, pointing to a possible common Ursprache (Ruhlen, 1994). At the same time there are also profound differences in terms of which aspects of reality are lexicalized. For example, in Japanese the word -san (mister or miss/misses) is neutral with respect to male or female, whereas English forces us to make a gender distinction, and for females a further distinction based on marital status. The lexicon of a language clearly is in constant, rapid evolution. Finally, many linguists in the Chomskyan tradition of generative grammar have argued that all languages are variations on the same basic pattern, that of universal grammar. They
have tried to capture the universality as a set of principles and the variation as a set of parameters that have to be set in particular languages (Chomsky and Lasnik, 1993). Based on empirical evidence, others have argued that natural languages are in fact profoundly different. For example, nothing seems more universal and basic than the parts of speech (noun, verb, adjective, adverb, preposition, etc.). Nevertheless, there is irrefutable evidence that many languages do not share the parts of speech that seem so obvious for English. For example, Mundari, an Austro-Asiatic language, does not make a distinction between nouns and verbs (Bhat, 2000, p. 56). Any lexicalized predicate can be used as a verb, in the sense that it can be used as predication, taking tense and aspect markers, agreement, and voice, and as a noun, in which case it takes case markers and is used referentially. Words therefore denote both things and events. For example, lutur means both “ear” and “listen,” and kumRu, “thief” and “to steal.” Some languages do not have prepositions (but use double verb constructions instead); others, such as Boro, a Tibeto-Burman language, do not have adverbs but use morphological markers instead (Bhat, 2000, p. 59). Thus masa means “to dance” and masaglo, “to dance quickly.” There is a constant, significant evolution in the grammars of human languages. Thus a grammatical category similar to prepositions may evolve from double verb constructions or the category “auxiliary” may develop by differentiating the class of verbs (Traugott and Heine, 1991).

All this points to a first major puzzle: How can we explain that there are both universal tendencies in human languages and, at the same time, such a bewildering variety? How can we explain that there is an ongoing, profound change in human language at all levels?

Genetics versus Culture

The tension between universals and particulars is related to another debate: How is the human language system transmitted?
Is it primarily in a genetic fashion (i.e., through the human genome)? Or is it primarily in a cultural fashion (i.e., through learning)? The genetic approach to language evolution necessarily favors universals but has difficulty explaining strong variation and rapid evolution. The cultural approach favors particulars and has no difficulty explaining rapid evolution, but must still address why there are universal tendencies and how language is transmitted. Intermediate positions are possible. In particular, some researchers have argued for a strong Baldwin effect (Briscoe, 2000), which would imply, however, that there is a cultural evolution first and that the resulting linguistic behaviors are sufficiently stable to be genetically encoded later.

The genetic position is associated with researchers such as Pinker (1994), who have postulated a genetically encoded language acquisition device which embodies the constraints of universal grammar. Language acquisition in this framework is not really a learning process but a maturation process. Data present in the linguistic environment act as a way to set the parameters of universal grammar (Lightfoot, 1991). The genetic encoding of language requires that language evolve in a genetic fashion; that is, important changes in human language communication must have arisen from mutations and the propagation of these mutations in the human gene pool (Bickerton and Calvin, 2000).

The cultural position relies on learning as the way in which language is transmitted (Tomasello, 1999). It has been criticized by those insisting on innateness because they point to a poverty of stimuli and a lack of direct linguistic feedback given to children. But those emphasizing learning argue that there is in fact quite a lot of pragmatic feedback, and they have been trying to find alternative learning methods essentially by pursuing two approaches: individualistic learning and cultural learning.

In individualistic learning, the child is assumed to receive as input a large number of example cases where speech is paired with specific situations. She is assumed to extract through an inductive learning process what is essential and recurrent in these situations (in other words, to learn the appropriate categories underlying language) and then associate these categories with words. This viewpoint assumes a rather passive role of the language learner and no feedback from the speaker (see Fischer et al., 1994; E. Clark, 1987). It is widespread among researchers studying the acquisition of communication, and various attempts have been made to model it with neural networks or symbolic learning algorithms (Broeder and Murre, 2000). Induction by itself is a weak learning method that does not give identical results on the same data and may yield irrelevant clustering compared with human clustering.
To counter this argument, it is usually proposed that innate constraints help the learner zoom in on the important aspects of the environment (Smith, 2001). In the case of social learning, interaction with other human beings is considered crucial. The mediator could be a parent and the learner a child, but children (or adults) can, and do, teach each other just as well. The goal of the interaction is not really teaching but something of practical value in the world—for example, to identify an object or an action. The mediator helps to achieve the goal and is often the one who wants to see the goal achieved. The mediator has various roles: She sets constraints on the situation to make it more manageable (scaffolding), gives encouragement on the way, provides feedback, and acts upon the consequences of the learner’s actions. The feedback is not directly about language, and certainly not about the concepts underlying language. The latter are never visible. The learner cannot telepathically inspect the internal states of the speaker, and the mediator cannot know which concepts are already known by the learner. Instead, feedback is pragmatic, operating in terms of whether the goal has been realized or not. Consider a situation where the mediator says, “Give me that pen,” and the
learner picks up a piece of paper instead of the pen. The mediator might say, “No, not the paper, the pen,” and point to the pen. This is an example of pragmatic feedback. It not only is relevant to subsequent success in the task but also supplies the learner with information relevant for acquiring new knowledge. The learner can grasp the referent from the context and situation, hypothesize a classification of the referent, and store an association between the classification and the word for future use. While doing all this, the learner actively tries to guess the intentions of the mediator. The intentions are of two sorts. The learner must guess what the goal is that the mediator wants to see realized (e.g., “Pick up the pen on the table”) and also the way that the mediator has construed the world. Typically the learner uses herself as a model of how the mediator would make a decision and adapts this model when a discrepancy arises.

Social learning enables active learning. The learner can initiate a kind of experiment to test knowledge that is uncertain or to fill in gaps. The mediator is available to give direct, concrete feedback for the specific experiment done by the learner. This obviously speeds up the learning, compared with a passive learning situation where the learner simply has to wait until examples arise that will push the learning forward.

Coherence

All the members of a language community must share (at least to a large extent) the same language conventions and the same conceptualizations, otherwise communication is not really possible. A key puzzle of evolutionary linguistics is where this coherence may come from, and different explanations have been put forward along the lines of the universal/genetic versus cultural/learning debate. The genetic framework explains language coherence through gene sharing.
Genes of successful organisms (which presumably means, in the case of language, human beings who are better at producing and comprehending the language in the community) propagate in a population until there is complete homogeneity. It must be noted that this process is very slow (10,000 years in the case of a human gene for digestion of lactose which could be studied in this respect) and that there is never complete homogeneity because of natural variation in the gene pool. But perhaps the language genes postulated by Pinker and others are like the genes for determining the number of fingers on the hand. They are so widespread that differences are not noticed.

The genetic framework explains conceptual coherence by assuming that the concepts needed for language communication also evolve through mutation and subsequent propagation in the population. Once again, there is a difficulty with the speed of genetic evolution, which is too slow to explain the rapid rise and spread of new concepts in human populations. A good example is the many concepts associated with the Internet (browsing, home page, server, Internet provider, etc.) which are now common knowledge but did
not exist even in the mid-1980s. It seems difficult to maintain that these were all innate. Another issue concerns the question of storage. Worden (1998) has argued that there is simply not enough genetic storage space available for the massive number of concepts and linguistic specifications that are usually assumed to be innate, particularly when the human genome is compared with the genomes of its closest species, which do not have language. On the other hand, if different language users learn the language and the concepts underlying language through social and cultural learning, how can this become shared? Several researchers (see review in Steels, 1997a) have shown through a number of computer simulations that two principles may explain this, both coming from the study of biological systems: self-organization and structural coupling. Self-organization is clearly seen in path formation in ant societies and many other pattern formation and collective phenomena (Camazine et al., 2001). In the case of ants, there is random variation (all ants crawling around randomly) and a positive feedback loop that influences variation. Concretely, when an ant finds a path, it leaves behind a chemical trail (pheromone) which attracts other ants. When they also travel on the same path, the trail becomes stronger, and hence more ants are attracted. The end result is the complete self-organization of all ants on a path without a central coordinator or prior knowledge. The principle of self-organization can be applied to explain how certain conventions spread in a population. If speakers use the conventions that were most successful in the past as a guide, we get a positive feedback loop. The more conventions are used, the more successful they are, and so they are used even more. This leads to a winner-take-all effect in which one word dominates for the expression of a particular meaning (see figure 5.1) (Steels, 1996). 
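The positive feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in Steels (1996): the function name simulate_word_competition, the words “bamoru” and “tikado,” the population size, and the particular score updates are all hypothetical choices made for the example. Each agent keeps a score per word for a single meaning, speaks the word that has served it best so far, and adjusts its scores after every interaction.

```python
import random
from collections import Counter


def simulate_word_competition(num_agents=20,
                              words=("wogglesplat", "bamoru", "tikado"),
                              num_games=5000,
                              seed=0):
    """Return each word's share of agents preferring it after num_games."""
    rng = random.Random(seed)
    # Every agent starts with a random score for every competing word.
    scores = [{w: rng.random() for w in words} for _ in range(num_agents)]

    def preferred(agent):
        # An agent uses the word that has served it best so far.
        return max(scores[agent], key=scores[agent].get)

    for _ in range(num_games):
        speaker, hearer = rng.sample(range(num_agents), 2)
        word = preferred(speaker)
        if preferred(hearer) == word:
            # Success: both agents reinforce the shared word.
            scores[speaker][word] += 1.0
            scores[hearer][word] += 1.0
        else:
            # Failure: the speaker loses some confidence in its word,
            # and the hearer takes a step toward adopting it.
            scores[speaker][word] -= 0.5
            scores[hearer][word] += 1.0

    counts = Counter(preferred(a) for a in range(num_agents))
    return {w: counts[w] / num_agents for w in words}


shares = simulate_word_competition()
print(shares)
```

In runs of this kind one word typically ends up preferred by most or all agents, which is the winner-take-all pattern, although the outcome depends on the assumed update rule and on the random seed.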
Structural coupling, a concept introduced by Maturana and Varela (1998), is needed to explain how concepts may become shared without direct feedback. Structural coupling means that two systems develop independently but, because they provide inputs and feedback to each other, they become tightly coordinated. In this case, the learning system generating new meanings and the learning system lexicalizing these meanings receive feedback from each other. New concepts generate new words, and the successful use of a word gives a boost to the concept that was used with this word. It has been shown in computer simulations (Steels, 1997b) and in experiments with robots (Steels et al., 2002) that this indeed gives rise to coherence, not only for words but also for meanings.
Complexity
A final issue concerns the increase in complexity. Here, too, we get radically different explanations, depending on the position with respect to the genetic versus cultural debate. In a genetic framework, the complexity of language can fundamentally increase only as a result of genetic mutations or the reshuffling of parameter settings (Lightfoot, 1998). The
Luc Steels
[Figure 5.1 plot: vertical axis from 0 to 1 (share of use); one of the competing words is labeled “wogglesplat.”]
Figure 5.1 Results from an experiment in the emergence of a lexicon in a population of agents. The graph shows all the words for a given meaning and their percentage of use. A winner-take-all situation is clearly observed after a struggle in which several words compete for the same meaning.
introduction of a new part of speech presumably would require a change in the language genes. In a cultural framework, language users are viewed as active agents that shape and reshape their language. Complexity increases as a result of the growing needs of the language community, which push expressivity forward. These needs are met by the extension of the lexicon, which may lead to increases in the complexity of the sound system, so that words can still be well distinguished, or to the expansion of grammatical structures, so as to be able to say more with less. Many of the expansions are based on problem solving and make use of generic cognitive abilities. For example, analogy has been pointed out as playing an important role in the recruitment of existing words or constructions for new functions (Heine, 1997). Expressivity generally comes into conflict with learnability. And so there are also forces to simplify word forms or grammatical constructions. The messy features of a specific language are explained by the need to optimize the conflict between expressivity and economy.
Evolutionary Game Theory
It is possible to use the methodology of computer simulations and robotic experiments to examine any of the hypotheses mentioned here. For example, if one argues in favor of the principles and parameters theory, one can put forward a detailed universal grammar and show on a database of example sentences empirically collected from interactions between adults and children precisely how the parameters are set (see examples in Berwick, 1998; Briscoe, 2001). If one believes that “the learning bottleneck” (i.e., the fact that each generation has to learn the language of its predecessors) explains why languages have become compositional, one can make a precise model that gives “linguistic agents” a choice between compositional and holistic expression and see which choice is made if the language needs to be transmitted culturally (Kirby, 1999).
In my own work I favor the cultural approach to language. I view language as a complex adaptive system that has evolved in a cultural fashion under the natural constraints given by the human sensorimotor apparatus, the cognitive apparatus, and the environments and ecologies in which humans find themselves (Steels, 2000). Moreover, I support the hypothesis that the learning process, both for conceptualization and for language itself, is highly social. My experiments attempt to substantiate these hypotheses, partly by showing that the cultural self-organization of language is possible, given social learning, and partly by showing that other learning methods or innate schemata are too slow to adapt to changes, or are impossible due to the lack of quality data or the inherent combinatorial explosion hidden in language learning. The rest of the chapter provides a bit more detail on two example experiments: one focusing on the emergence of sound repertoires and one looking at the emergence of words and word meaning.
Imitation Games for Sound Repertoires
Kaneko and Suzuki (1994) showed how the framework of evolutionary game theory could be used to study the evolution of sound repertoires in birds. My group started to use the same framework for phonetics around 1995, and applied it to different aspects of sound systems: vowels (de Boer, 1997) and syllables (Steels and Oudeyer, 2000). All this builds further on earlier work by phoneticians to show that the sound systems of natural languages are not arbitrary, but the consequence of various sensorimotor and cognitive constraints (Lindblom et al., 1984). The rest of this section describes the work of de Boer (1997) as an example of how a repertoire of sounds may become agreed upon in a distributed group of robots. The question being addressed in these experiments is how robots can come to share a system of vowels without having been given a preprogrammed set and without central supervision, and how the universal tendencies found in human vowel systems can be explained through a process of cultural evolution under natural selection.
In the robotic simulations, the sensorimotor apparatus of the robots consists of an acoustic analyzer, which extracts the first formants from the signal, and an articulatory synthesizer, which models relevant aspects of the human vocal tract. The robots play an imitation game. One robot produces a random sound from its repertoire. The other robot (the imitator) recognizes it in terms of its own repertoire and then reproduces the sound. Then the first robot attempts to recognize the sound of the imitator, and if it is similar to its own, the game is a success; otherwise it is a failure. This setup therefore adopts the motor theory of perception whereby recognition of a sound amounts to the retrieval of a motor program that can reproduce it. To achieve this task, the robots in the De Boer experiment use two cognitive structures: The vowels are mapped as points into a space formed by the first, second, and third formants (see figure 5.2), and a nearest-neighbor algorithm is used to identify an incoming sound with the sounds already stored as prototypes. These prototypes have an associated motor program that can be used to reproduce the sound. When an imitation game succeeds, the score of the prototype goes up, which means that the certainty that the sound is in the repertoire increases. There are two types of failure. One possibility is that the incoming sound is nowhere near any of the sounds already in the repertoire. In that case it is added to the prototype space and the robot tries to find its corresponding motor program by producing and listening to itself, progressively adjusting the motor program until it produces the desired sound. Alternatively, the incoming sound is near an existing sound but the reproduction is rejected by the producing robot. This means that the imitator does not make sufficiently fine-grained distinctions. 
Consequently the failure can be repaired by adding this new incoming sound to the repertoire as a new prototype and associating it with a motor program learned again by hill-climbing. In order to get new sounds into the repertoire,
[Figure 5.2 panels: 10% noise after 20, 500, 1000, and 4000 games; axes F1 and F2.]
Figure 5.2 Example of the evolution of a vowel system. Vowels are represented in formant space (first and second formant). Each dot represents the vowel prototype for one agent. Prototypes progressively cluster into groups, and occasionally a new cluster appears.
robots occasionally “invent” a new sound by a random choice of the articulatory parameters and store its acoustic image in the prototype space. Sounds which have consistently low scores are thrown out, and two sounds that are very close together in the prototype space are merged. Quite remarkably, the following phenomena are perceived when a consecutive series of games is played by a population of robots: (1) a repertoire of shared sounds emerges through self-organization (see figure 5.2); (2) the repertoire keeps expanding as long as there is pressure to do so; (3) most interestingly, the kinds of vowel systems that emerge have the same characteristics as those of natural vowel systems (De Boer, 1997). The experiment therefore shows not only that the problem can be solved in a distributed fashion but also that it captures some essential properties of natural systems. Three principles have been used. First, reinforcement learning based on feedback after each game (Sutton and Barto, 1998) explains how individual agents may learn the vowels that are present in their environment. Second, self-organization, in the sense of Nicolis and Prigogine (1989), explains how a group of individuals arrives at a shared repertoire. It arises when there is a positive feedback loop in an open nonlinear system. Here there is a positive feedback between use and success. Sounds that are (culturally) successful propagate. The more a sound is used, the more success it has, and thus it will be used even more. Self-organization explains how the group reaches coherence, but not why these specific vowels, and not others, occur. For this we need a third principle, natural selection, familiar from Darwinian explanations of biological complexity. The scores of vowels that can be successfully distinguished and reproduced given a specific sensorimotor apparatus have a tendency to increase, and hence they survive in the population. 
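One round of the imitation game can be sketched as follows. This is a simplified illustration, not de Boer's implementation: sounds are plain points in a two-dimensional formant space, production is the stored point plus Gaussian noise (standing in for the articulatory synthesizer and the hill-climbing search for a motor program), and the constants and function names are assumptions. Merging of close prototypes and pruning of low-scoring ones are left out.

```python
import math
import random

DELTA = 0.1            # assumed score adjustment after each game
INITIAL_SCORE = 0.5    # assumed score of a newly stored prototype


def nearest(repertoire, sound):
    """Index of the stored prototype closest to a sound in formant space."""
    return min(range(len(repertoire)),
               key=lambda i: math.dist(repertoire[i]["pos"], sound))


def produce(prototype, noise, rng):
    """Emit a prototype's sound with Gaussian noise added to each formant."""
    return tuple(x + rng.gauss(0.0, noise) for x in prototype["pos"])


def imitation_game(initiator, imitator, rng, noise=0.0, target=None):
    """Play one game and return True on success.

    The initiator's repertoire must not be empty. The imitator's
    repertoire is updated in place.
    """
    if target is None:
        target = rng.randrange(len(initiator))
    signal = produce(initiator[target], noise, rng)
    if not imitator:  # nothing stored yet: adopt the incoming sound
        imitator.append({"pos": signal, "score": INITIAL_SCORE})
    guess = nearest(imitator, signal)            # recognition by nearest neighbor
    echo = produce(imitator[guess], noise, rng)  # the imitation
    success = nearest(initiator, echo) == target
    if success:
        imitator[guess]["score"] = min(1.0, imitator[guess]["score"] + DELTA)
    else:
        # Repair: lower the prototype that failed and store the incoming
        # sound as a new prototype, giving a finer-grained distinction.
        imitator[guess]["score"] = max(0.0, imitator[guess]["score"] - DELTA)
        imitator.append({"pos": signal, "score": INITIAL_SCORE})
    return success


if __name__ == "__main__":
    rng = random.Random(1)
    a = [{"pos": (3.0, 10.0), "score": 0.5}, {"pos": (7.0, 14.0), "score": 0.5}]
    b = [{"pos": (3.2, 10.1), "score": 0.5}]
    print(imitation_game(a, b, rng, target=1), len(b))
```

In the usage example the imitator knows only one vowel, so its imitation of the second vowel is rejected and the new sound is added to its repertoire. The next game on the same vowel then succeeds.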
Novel sounds or deviations from existing sounds (which automatically are produced due to the unavoidable stochasticity) create variation, and sensorimotor constraints select those that can be reproduced and recognized. The more closely we can model natural human sensorimotor behavior and environmental conditions, the more realistic the vowel systems become.
Guessing Games for Concepts and Words
The notion of a language game was promoted by Wittgenstein to emphasize that language and meaning are not based on context-independent abstractions but arise as part of concrete interactive situations. I have found language games to be an excellent vehicle for studying the emergence of the words and meanings in robotic experiments within the framework of social learning (Steels, 2001c). A language game is a situated interaction between two agents. The interaction involves not only language aspects (i.e., the parsing and producing of utterances) but also the grounding through sensory processing, execution of appropriate gestures or other actions in the world, and, most important, steps for learning new parts of language, if necessary:
new words, new meanings for existing words, new phrases, new pronunciations of known words. Complex dialogues involve multiple language games interlaced with each other. The meaning of a word or phrase comes from its role in a language game, just like the meaning of “queen” in chess. This explains why the meaning of a word cannot be defined easily in absolute terms but arises from the situation and context. It explains why humans have no trouble disambiguating words or phrases: The interpretation processes take place in the context of a situated language game which strongly restricts what is being talked about. I have used this framework of language games to study the origins of meanings and lexicons in groups of agents. A first example game that I have studied intensely is the guessing game (Steels, 1998). In the guessing game the speaker tries to draw the attention of the hearer to an object in the environment. For example, Mary sits at the table and asks her neighbor Pierre for the salt by saying “salt.” She perhaps points to the salt at the same time. The table, all the objects on it, and the people around the table and their actions form the context of the game. The salt is called the topic. We notice immediately that the word spoken, “salt,” is only a small part of what is going on. The hearer must also perceive and conceptualize the situation, interpret the gestures made by the speaker, guess what action the speaker may want, and so on. All of these are an intrinsic part of the language game. There are many ways the guessing game can fail. Pierre may incorrectly understand the word, or simply not know the word, or believe that the word has another meaning. This failure becomes obvious by subsequent action (for example, Pierre hands Mary the water instead of the salt). Every language game must contain provisions for detecting failure and repairing it. The speaker typically provides more information, possibly in a nonverbal way through additional gestures. 
If failure is due to lack of knowledge, the language game is an opportunity for learning. For example, when the hearer does not know the word “salt” (perhaps Pierre is French), he can use this example to acquire a new word. If he fails to conceptualize the scene (perhaps Pierre comes from a culture where salt never is purified to white grains), he may acquire or enrich his repertoire of concepts. Many variations can easily be imagined. But in any case, it is crucial, first, that speaker and hearer individually keep a score for each association between form and meaning in their lexicon. The score reflects the success of that association in their games. When the speaker needs to choose a form to express a particular meaning, naturally the form–meaning association with the highest score should be chosen because this has had the most success in the past. Second, because the hearer also considers associations with a higher score first (they are more likely to have been chosen by the speaker), there is a positive feedback loop between use and success: associations with a higher score increase their chance of being used, and they thus increase their potential for leading to a
successful communication in the future. And third, there is a strong structural coupling between concept formation and language, achieved when language gives feedback on the adequacy of concepts. I have formalized and implemented all aspects of the guessing game on simple robots equipped with pan-tilt cameras (figure 5.3). The context consists of a small area on the white board covered with geometric figures, and the topic is one figure—for example, a red square. A decision-tree-like algorithm was used for conceptualization. Input to the decision tree is output from a battery of statistical pattern recognition and computer vision algorithms. Thus, “left” meant that the x coordinate of the middle point of a figure was less than the average x coordinates of all figures in the context, and “right” meant that it was greater than this average; “large” meant that the size of the figure was greater than the average size of all figures; and so on. A selectionist learning method was used for concept acquisition: Decision trees grow in a random fashion when there is a failure to
Figure 5.3 Setup for the Talking Heads experiment with two pan-tilt cameras facing a white board on which colored geometric figures are pasted. The agents using these robotic bodies play a guessing game.
find a distinctive concept that distinguishes the topic from the other objects in the context, and branches are pruned when they are irrelevant or unsuccessful in subsequent language games (Steels, 1997b). To play a game, the robots capture an image, segment it, derive features using statistical pattern recognition techniques, and give pragmatic feedback by gesturing toward objects with their cameras. Utterances are single words, and the lexicon consists of associations between single words and visually grounded predicates. Each association is stored as a triple
⟨r, s, k⟩, where r is a visually grounded feature or combination of features, s is a symbol, and k is a score to reflect how successful this association has been in past games, and hence how successful it might be in the future. Each individual robot has its own lexicon. There is no global knowledge nor central control. A somewhat simplified version of the game involves the following steps (see figure 5.4):
1. Shared attention. By pointing, gazing, moving an object, or other means, the speaker draws the visual attention of the hearer to the topic or at least to a narrow context which includes the topic. The speaker may emit a word that facilitates shared attention, like “look,” and observes whether the hearer gazes toward the topic. Based on this activity, each agent can be assumed to have captured an image that reflects the shared context.
2. Speaker behavior. The speaker then conceptualizes the topic, yielding a representation r. Conceptualization means that a combination of concepts is found that distinguishes the topic from the other objects in the context. Let us assume, for the simple version of the game, that this is a single predicate which is true for the topic but not for the other objects. For example, if every object on the table is blue but the topic is white, then color is a good way to refer to the topic. The speaker then collects all associations ⟨r, s, k⟩ for this representation in his lexicon and picks out the one with the highest score k. The s from this association is the best word to
[Figure 5.4 diagram; box and arrow labels: world, sense, perception, conceptualize, concept, verbalize, utterance, analyze, apply, reference, de-reference.]
Figure 5.4 Left: processes carried out by the speaker. Right: processes carried out by the hearer. There are also feedback processes moving in alternate directions until the agents settle on coherent choices for all the stages.
communicate, from the speaker’s point of view. It is transformed into a speech signal and transmitted to the hearer.
3. Hearer behavior. The hearer receives the speech signal, recognizes the word s, and looks up all associations ⟨r′, s, m⟩ in his memory.
3.1. If s is a new word for the hearer, which implies that there is no association in lexical memory for s, then the hearer signals incomprehension and the speaker points to the topic. Given that the hearer then knows the topic, he can guess the meaning most plausibly used by the speaker by performing a discrimination game. The resulting representation r″ is then used to add a new association ⟨r″, s, i⟩ to the memory, with i being the initial default score.
3.2. If there are associations, the hearer applies each representation r′ to the current scene (perhaps starting with those with the highest score m) to see whether any one of them picks out a unique object. If that is the case, this is the topic. There may be ambiguity (i.e., more than one possible topic), in which case the referent picked out by the association with the highest score is chosen. The hearer then points to the topic.
4. Feedback. Suppose that the hearer finds a referent (step 3.2); then there are two outcomes.
4.1. The speaker agrees that this is the right referent (i.e., it was the topic she originally had in mind) and signals agreement. In that case both speaker and hearer increase the score of the association they used and decrease the score of competing associations. For the speaker, a competing association is one which involves the same meaning but a different word. For the hearer, a competing association is one which involves the same word but a different meaning.
4.2. If the speaker signals that the hearer failed to recognize the topic, then the score of the used associations is decreased by both speaker and hearer. The speaker gives additional feedback until speaker and hearer share the same topic.
The hearer can then conceptualize the topic from his point of view and either store a new word (as in 3.1) or increase the score of an existing association (as in 4.1).
5. Speaker or hearer may fail to conceptualize the scene, in which case a concept acquisition algorithm is triggered which should try to acquire a new conceptualization, using the current situation as a source for learning.
Here is a typical dialogue with the images and concept repertoires shown in figure 5.5. Both agents compute a number of features for each figure in the image: X, the horizontal position of the middle point of the figure; Y, the vertical position; H, the height; W, the width; A, the angularity (the number of angles); R, G, Y, and B, the amount of red, green, yellow, and blue in the figure; and L, the brightness:
Object 0 (0,1) X: 0.37, Y: 0.71, H: 0.48, W: 0.21, A: 0.45, R: 0.17, G: 0.0, Y: 0.0, B: 0.39, L: 0.28
Object 1 (1,0.96) X: 0.7, Y: 0.69, H: 0.38, W: 0.22, A: 0.45, R: 0.98, G: 0.0, Y: 0.52, B: 0.0, L: 0.36
Object 2 (0.42,0.0) X: 0.51, Y: 0.31, H: 0.21, W: 0.51, A: 0.70, R: 0.0, G: 0.99, Y: 0.73, B: 0.0, L: 0.46
The first object (with coordinates 0,1) is the topic. Based on the decision trees (shown in figure 5.5), the agent conceptualizes this object in terms of distinctions on the blue channel. A shade of blue (between 0.25 and 0.5) is distinctive for the topic but not for any
Figure 5.5 Example of a guessing game played by two robots. Left, the images captured by the speaker (top) and the hearer (bottom). Notice that they are not exactly the same. On the righthand side are the decision trees of the speaker (top) and the hearer (bottom). The repertoires are not the same, although the same distinction happens to be used by both agents in the game discussed in the text.
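The game around figure 5.5 can be traced in a short sketch built on the feature values quoted in the text. It is an illustration under stated assumptions rather than the Talking Heads code: a category is reduced to a single interval on one feature, the lexicon is a dictionary from (meaning, word) pairs to scores, the yellow channel is renamed "Yel" so that it does not clash with the vertical position Y, and the constants and function names are hypothetical.

```python
INITIAL_SCORE = 0.1  # assumed default score i for a newly stored association
DELTA = 0.1          # assumed size of each score adjustment

# Feature vectors of the three objects in the example; object 0 is the topic.
CONTEXT = [
    {"X": 0.37, "Y": 0.71, "H": 0.48, "W": 0.21, "A": 0.45,
     "R": 0.17, "G": 0.0, "Yel": 0.0, "B": 0.39, "L": 0.28},
    {"X": 0.7, "Y": 0.69, "H": 0.38, "W": 0.22, "A": 0.45,
     "R": 0.98, "G": 0.0, "Yel": 0.52, "B": 0.0, "L": 0.36},
    {"X": 0.51, "Y": 0.31, "H": 0.21, "W": 0.51, "A": 0.70,
     "R": 0.0, "G": 0.99, "Yel": 0.73, "B": 0.0, "L": 0.46},
]


def in_category(obj, category):
    """A category is (feature, low, high): true when low <= value < high."""
    feature, low, high = category
    return low <= obj[feature] < high


def discriminate(context, topic, repertoire):
    """Return the first category true for the topic and for no other object."""
    for category in repertoire:
        others = (o for i, o in enumerate(context) if i != topic)
        if in_category(context[topic], category) and not any(
                in_category(o, category) for o in others):
            return category
    return None  # failure: concept acquisition would be triggered here


def choose_word(lexicon, meaning):
    """Speaker: pick the word with the highest score for this meaning."""
    candidates = {w: k for (r, w), k in lexicon.items() if r == meaning}
    return max(candidates, key=candidates.get) if candidates else None


def interpret(lexicon, word, context):
    """Hearer: try associations for the word, best score first, and return
    (meaning, object index) for the first one that picks out a unique object."""
    known = sorted(((k, r) for (r, w), k in lexicon.items() if w == word),
                   reverse=True)
    for _, meaning in known:
        matches = [i for i, o in enumerate(context) if in_category(o, meaning)]
        if len(matches) == 1:
            return meaning, matches[0]
    return None  # unknown word or no unique referent


def reward(lexicon, meaning, word, role):
    """After success, raise the used association and lower its competitors:
    same meaning for the speaker, same word for the hearer."""
    for (r, w) in lexicon:
        if (r, w) == (meaning, word):
            lexicon[(r, w)] = min(1.0, lexicon[(r, w)] + DELTA)
        elif (role == "speaker" and r == meaning) or (
                role == "hearer" and w == word):
            lexicon[(r, w)] = max(0.0, lexicon[(r, w)] - DELTA)


if __name__ == "__main__":
    blue = ("B", 0.25, 0.5)
    repertoire = [("L", 0.0, 0.5), blue]
    speaker = {(blue, "XAGADUDE"): 0.1, (blue, "NIBIDESU"): 0.0,
               (blue, "TETIPI"): 0.0}
    hearer = {}
    meaning = discriminate(CONTEXT, 0, repertoire)
    word = choose_word(speaker, meaning)
    if interpret(hearer, word, CONTEXT) is None:  # "Huh?", so the speaker points
        hearer[(discriminate(CONTEXT, 0, repertoire), word)] = INITIAL_SCORE
    print(word, hearer)
```

With these values the brightness interval fails to single out the topic, the blue interval succeeds, and the speaker chooses XAGADUDE because it has the highest score. A hearer with a different repertoire could store the word with a different meaning, which is the divergence the text mentions.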
other object in the context. The speaker has three words in the lexicon for this: XAGADUDE (score 0.1), NIBIDESU (score 0.0), and TETIPI (score 0.0). The first word is chosen.
Speaker: XAGADUDE.
The hearer does not know this word, and therefore signals incomprehension.
Hearer: Huh?
The speaker now points to the topic, using the camera. The hearer then performs its own categorization, using its own decision trees, which happens to yield the same conceptualization. The hearer then adds a new association in the lexicon. Note that it could very well have happened that the hearer used another conceptualization for this scene (for example, based on brightness or height), in which case there would be a divergence in the lexicon. Such a divergence would show up when in a later game the same agents are confronted with a disambiguating situation.
I have done experiments with a growing population of up to 3,000 robots with multiple installations and sharing of bodies by more than one robot. The robots played in total 500,000 language games over a period of three months. These experiments show that (1) a lexicon indeed arises in the population from scratch (there was a core of a few hundred words and a total lexicon of 8,000 words); (2) the lexicon is transmitted as new generations come in; (3) there is a high degree of communicative success which is due to sufficient sharing of words for specific meanings (as shown in figure 5.1) and sufficient sharing of meanings. The three principles discussed earlier are at work: reinforcement learning, self-organization, and structural coupling. A broader discussion of this experiment and an analysis of the causal factors explaining its success can be found in Steels et al. (2002).
Classification Games with Autonomous Robots
More recently several experiments on the AIBO doglike robot were performed within the same language game framework (Steels and Kaplan, 2001). In many respects these experiments are a giant step beyond the previous one.
First of all, the AIBO is a fully autonomous mobile robot with more than a thousand behaviors, coordinated through a complex behavior-based motivational system (Fujita and Kitano, 1998). It follows that getting attention from the robot and sharing the same perspective on the world prior to a game is complex. The dialogue can be enhanced by gestures and movements by the robot, and onboard visual processing and sensing can be used. The experiments employed speech input and output, using off-the-shelf speech components. Obviously the use of spoken language increases still further the uncertainty of the communication. Second, the dialogue took place between a human and a robot. This introduced many additional
complexities but gave me the opportunity to design and test very concrete models of social learning. Nevertheless, several language games were successfully implemented, starting with the guessing game. Rather than using the decision trees discussed earlier, conceptualization was performed using a nearest-neighbor algorithm with a memory of stored object views (similar to the one used in Bartlett, 1997). The object memory is acquired by using instance-based learning. Every language game is an opportunity to acquire a new view of an object or to learn that there is a new class of objects of which the current example is a first instance. This experiment therefore showed that any kind of concept acquisition can be used. The topic can be a single object, an action, or a property of a situation. Other games focused on naming body parts and actions to be used in commands. An example dialogue is the following, starting when the experimenter shows a red ball (figure 5.6).
Figure 5.6 A guessing game between the AIBO and a human experimenter, Frederic Kaplan, involving the use and acquisition of a word for “ball.”
Human: Sit.
Human: Sit down.
AIBO has already acquired names of actions. Forcing the robot to sit down is a way to make it concentrate on the language game. The human now shows the ball to the robot.
Human: Look.
Human: Ball.
The word “look” helps to focus the robot on the guessing game based on visual input. The robot performs image capturing and segmentation. The game will be possible if a segment has been found. The robot tries to recognize the object, using a nearest-neighbor algorithm.
AIBO: Ball?
The robot asks for feedback on the word to make sure that the word has been understood. “Ball” is the word that will then be associated with the object.
Human: Yes.
There is positive feedback on the word pronounced. This feedback causes the rest of the guessing game to proceed (i.e., storage of a new word if not yet known, increase of the score, etc.).
I have shown that the visual data generated in this experiment are so confusing that an unsupervised learning algorithm cannot find the concepts that are required to use the (English) words correctly, thus suggesting that there must be a strong causal influence of language on concept formation (Steels and Kaplan, 2001). We also see evidence in the same experiment of why the learning of language has to be social. The mediator must restrict the attention of the learner to help focus on what is important, on what needs to be learned. The mediator must help to ensure that the learner has good and relevant data. When the role of the mediator is reduced, we get less adequate results both for concept acquisition and for word learning (Steels and Kaplan, 2001).
Conclusions
Evolutionary linguistics is concerned with explaining the causal factors underlying the evolution of humanlike communication. One way to test theories is to engage in computer simulations and robotic experiments. The chapter has discussed various issues in evolutionary linguistics, specifically how coherence and complexity may arise and how language is transmitted from one generation to the next. I have briefly introduced a few
experiments that show the importance of social and cultural learning as well as concrete models to test these hypotheses. Self-organization, structural coupling, and reinforcement learning were shown to be among the primary forces. There are obviously many open problems, particularly concerning the origins of grammar, but there is no doubt that a promising new avenue has been opened up to investigate a fascinating enigma of human evolution: the origins and evolution of language. Acknowledgment I am indebted to the members of the Sony Computer Science Laboratory in Paris, particularly Frederic Kaplan, Angus McIntyre, Eduardo Miranda, and Pierre-Yves Oudeyer, and of the VUB Artificial Intelligence laboratory in Brussels, in particular Paul Vogt, Joris van Looveren, Bart de Boer, Tony Belpaeme, and Edwin de Jong. References Bartlett M (1997) SEEMORE: Combining color, shape, and texture histogramming in a neurally-inspired approach to visual object recognition. Neur Comp 9: 777–804. Berwick R (1998) Language evolution and the minimalist program: The origins of syntax. In: Approaches to the Evolution of Language (Hurford J, Studdert-Kennedy M, Knight C, eds.), 320–340. Cambridge: Cambridge University Press. Bhat D (2000) Word classes and sentential functions. In: Approaches to the Typology of Word Classes (Vogel M, Comrie B, eds.), 47–63. Berlin: Mouton de Gruyter. Bickerton D, Calvin W (2000) Lingua ex Machina: Reconciling Darwin and Chomsky with the Human Brain. Cambridge, Mass.: MIT Press. Bowerman M, Levinson S (2001) Language Acquisition and Conceptual Development. Cambridge: Cambridge University Press. Briscoe E (2000) Grammatical acquisition: Inductive bias and coevolution of language and the language acquisition device. Language 76 (2): 245–296. Briscoe E (2001) Grammatical acquisition and linguistic selection. In: Linguistic Evolution Through Language Acquisition: Formal and Computational Models (Briscoe E, ed.). Cambridge: Cambridge University Press. 
Broeder P, Murre J (2000) Models of Language Acquisition: Inductive and Deductive Approaches. Oxford: Oxford University Press. Camazine S, Deneubourg J-L, Franks N, Sneyd J, Theraulaz G, Bonabeau E (2001) Self-Organization in Biological Systems. Princeton, N.J.: Princeton University Press. Chomsky N (1975) Reflections on Language. New York: Pantheon. Chomsky N, Lasnik H (1993) The theory of principles and parameters. In: Syntax: An International Handbook of Contemporary Research (Jacobs J, von Stechow A, Sternefeld W, Vennemann T, eds.), 506–569. Berlin: Walter de Gruyter. Clark E (1987) The principle of contrast: A constraint on language acquisition. In: Mechanisms of Language Acquisition (MacWhinney B, ed.). Hillsdale, N.J.: Lawrence Erlbaum. Clark H, Brennan S (1991) Grounding in communication. In: Perspectives on Socially Shared Cognition (Resnick L, Levine J, Teasley S, eds.), 127–149. Washington, D.C.: APA Books.
Evolution of Human Communication
89
6
The Role of Learning and Development in Language Evolution: A Connectionist Perspective
Morten H. Christiansen and Rick Dale

Introduction

Much ink has been spilled arguing over the idea that ontogeny recapitulates phylogeny. The discussions typically center on whether developmental stages reflect different points in the evolution of some specific trait, mechanism, or morphological structure. For example, the developmental trend from crawling to walking in human infants can be seen as recapitulating the evolutionary change from quadrupedalism to bipedalism in the hominid lineage. Closer to the area of the evolution of communication, casts have been taken to indicate that the vocal tract of newborn human infants more closely resembles those of australopithecines and extant primates than the adult human vocal tract, with the vocal tract of Neanderthals falling in between, roughly corresponding to that of a two-year-old human child (Lieberman, 1998). These data could suggest that the development of the vocal tract in human ontogeny is recapitulating the evolution of the vocal tract in hominid phylogeny. However, other researchers have strongly opposed such a perspective, arguing that evolution and development work along entirely different lines when it comes to language (Pinker and Bloom, 1990). In this chapter, we provide a different perspective on this discussion within the domain of linguistic communication, arguing that language evolution to a large extent has been shaped by language learning.

A rapidly growing body of work on the evolution of language is focusing on the role of learning, often in the guise of “cultural transmission,” in the evolution of linguistic communication (e.g., Batali, 1998; Christiansen, 1994; Deacon, 1997; Kirby and Hurford, 2002). Instead of concentrating on biological changes to accommodate language, this approach stresses the adaptation of linguistic structures to the biological substrate of the human brain.
Languages are viewed as dynamical systems of communication that are subject to selection pressures arising from limitations on human learning and processing. From this perspective, language evolution can be construed as being shaped by language development, rather than vice versa.1 Computational simulations have proved to be a useful tool for investigating the impact of learning on the evolution of language. Connectionist models (also sometimes referred to as “artificial neural networks” or “parallel distributed processing models”) provide a natural framework for exploring a learning-based perspective on language evolution because they have been applied extensively to model the development of language (see, e.g., Bates and Elman, 1993; MacWhinney, 2003; Plunkett, 1995; Seidenberg and MacDonald, 2001, for reviews). In this chapter, we show how language evolution may have been shaped by developmental constraints on language acquisition. First, we discuss
connectionist models in which the explanations of particular aspects of language evolution and linguistic change depend crucially on the learning properties of specific networks, properties that have also been pressed into service to explain similar aspects of language acquisition. We then present two simulations that directly demonstrate how network learning biases over generations can shape the language being learned. Finally, we conclude the chapter with a brief discussion of the possible theoretical advantages of approaching language evolution from a learning-based perspective.

Evolution Through Learning

Connectionist models can be thought of as a kind of “sloppy” statistical function approximator, learning from examples to map a set of input patterns onto a set of associated output patterns. The two most important constraints on network learning (at least for the purpose of this chapter) derive from the architecture of the network itself and the statistical makeup of the input–output examples. Differences in network configuration (such as learning algorithms, connectivity, number of unit layers, etc.; see Bishop, 1995; Smolensky et al., 1996) provide important constraints on what can be learned. For example, temporal processing of words in sentences is better captured by recurrent networks, in which previous states can affect current states, than by simple feed-forward networks, in which current states are unaffected by previous states. These architectural constraints interact with constraints inherent in the input–output examples from which the networks have to learn. In general, frequent patterns are more easily learned than infrequent patterns because repeated presentations of a given input–output pattern will strengthen the weights involved. For example, for a network learning the English past tense, frequently occurring mappings, such as go → went, are learned more easily than less frequent mappings, such as lie → lay.
However, low-frequency patterns may be more easily learned if they overlap in part with other patterns. This is because the weights involved in the overlapping features of such patterns will be strengthened by all the patterns that share those features, making it easier for the network to acquire the remaining unshared pattern features. In terms of the English past tense, this means that the partial overlap in the mappings from stem to past tense in sleep → slept, weep → wept, keep → kept (i.e., -eep → -ept) will make network learning of these mappings relatively easy even though none of the words has a particularly high frequency of occurrence. Importantly, these two factors, the frequency and the regularity (i.e., degree of partial overlap) of patterns, interact with each other. Thus, high-frequency patterns are easily learned independently of whether they are regular or not, whereas the learning of low-frequency patterns suffers if they are not regular (i.e., if they do not have partial overlap with other patterns).
This characteristic of learning in neural networks makes them suitable for capturing human language processing as many aspects of language acquisition and processing involve such frequency by regularity interactions (e.g., auditory word recognition, Lively et al., 1994; visual word recognition, Seidenberg, 1985; English past tense acquisition, Hare and Elman, 1995). The frequency by regularity interaction also comes into play when processing sequences of words. In English, for example, embedded subject relative clauses such as that attacked the reporter in the sentence The senator that attacked the reporter admitted the error have a regular ordering of the verb (attacked) and the object (the reporter)—it is similar to the ordering in simple transitive sentences (e.g., The senator attacked the reporter). Embedded object relative clauses, on the other hand, such as that the reporter attacked in the sentence The senator that the reporter attacked admitted the error have an irregular verb–object ordering with the object (the senator) occurring before the verb (attacked). The regular nature of subject relative clauses—their patterning with simple transitive sentences—makes them easy to learn and process relative to the irregular object relative clauses; this is reflected in the similar way in which both humans and networks deal with the two kinds of constructions (MacDonald and Christiansen, 2002). As we shall see next, the frequency by regularity interaction is also important for the connectionist learning-based approach to language evolution. From this perspective, structures that are either frequent or regular are more likely to be transferred from generation to generation of learners than structures that are irregular and have a low frequency of occurrence. 
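The frequency by regularity interaction can be made concrete with a toy model. The sketch below is not one of the networks from the studies cited above. It is a minimal linear associator trained with a batch delta rule, and the feature coding, the presentation frequencies, the learning rate, and the number of epochs are all illustrative assumptions. Each verb is coded as an onset feature plus a rime feature, and the three -eep verbs share both their rime feature and their output unit.

```python
from collections import defaultdict

# (verb, input features, target output unit, presentations per epoch).
# The feature coding and the frequencies are invented for illustration.
PATTERNS = [
    ("sleep", ("sl", "-eep"), "-ept", 1),  # low frequency, regular (shared rime)
    ("weep", ("w", "-eep"), "-ept", 1),
    ("keep", ("k", "-eep"), "-ept", 1),
    ("lie", ("l", "-ie"), "lay", 1),       # low frequency, no overlap
    ("go", ("g", "-o"), "went", 5),        # high frequency, no overlap
]


def train(patterns, epochs=20, lr=0.02):
    """Batch delta-rule training of a linear associator; returns the weights."""
    units = sorted({target for _, _, target, _ in patterns})
    weights = defaultdict(float)  # (feature, unit) -> weight, initially 0
    for _ in range(epochs):
        change = defaultdict(float)
        for _, features, target, freq in patterns:
            for unit in units:
                output = sum(weights[(f, unit)] for f in features)
                error = (1.0 if unit == target else 0.0) - output
                for f in features:
                    # More frequent patterns contribute more to each update.
                    change[(f, unit)] += freq * error
        for key, delta in change.items():
            weights[key] += lr * delta
    return weights


def remaining_error(weights, patterns):
    """Absolute error on each verb's target unit after training."""
    return {
        verb: abs(1.0 - sum(weights[(f, target)] for f in features))
        for verb, features, target, _ in patterns
    }


if __name__ == "__main__":
    errors = remaining_error(train(PATTERNS), PATTERNS)
    for verb, err in errors.items():
        print(f"{verb:6s} {err:.3f}")
```

Under these assumptions the error left after 20 epochs is smallest for go (about 0.01), intermediate for the three -eep verbs (about 0.19), and largest for lie (about 0.44). The shared rime weight is strengthened by all three -eep verbs, so low frequency hurts much less when a pattern overlaps with others, while high frequency compensates for a lack of overlap.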
Learning-based Morphological Change

Although the first example comes from the area of morphological change, we suggest that the same principles are likely to have played a role in the evolution of morphological systems as well. Connectionist networks have been applied widely to model the acquisition of past tense and other aspects of morphology (for an overview, see Christiansen and Chater, 2001). The networks’ sensitivity to the frequency by regularity interaction has proven crucial to this work. Simulations by Hare and Elman (1995) have demonstrated that these constraints on network learning can also help explain observed patterns of dramatic change in the English system of verb inflection over the past 1,100 years. The morphological system of Old English (ca. 870) was quite complex, involving at least ten different classes of verb inflection (with a minimum of six of these being “strong”). The simulations involved several “generations” of neural networks, each of which received as input the output generated by a trained network from the previous generation. The first network was trained on data representative of the verb classes from Old English. However, training was stopped before learning could reach optimal performance.
The imperfect output of the first network was used as input for a second-generation net. This reflected the causal role of imperfect transmission from learner to learner in language change. Training for the second-generation network also was halted before learning reached asymptote. Output from the second network was then given as input to a third network, and so on, until seven generations were trained. This training regime led to a gradual change in the morphological system. These changes can be explained by verb frequency in the training corpus and phonological regularity (i.e., phonological overlap between mappings, as in the -eep → -ept example above). As expected, given the frequency by regularity interaction, the results revealed that membership in small classes, irregular phonological characteristics, and low frequency all contributed to rapid morphological change. High-frequency and phonologically regular patterns were much less likely to change. As the morphological system changed through generations, the pattern of simulation results closely resembled the historical change in English verb inflection from a complex past tense system to a dominant “regular” class and small classes of “irregular” verbs.

These simulations demonstrate how constraints on network learning can result in morphological change over time. Although these models cannot address such powerful influences as borrowing from foreign languages or other kinds of social change, we suggest that these learning-based pressures may have been an important force in shaping the evolution of morphological systems more generally. Next, we shall see how similar considerations may help explain the existence of word-order universals.

Learning-based Constraints on Word Order

Despite the considerable diversity that can be observed across the languages of the world, it is also clear that languages share a number of relatively invariant features in the way words are put together to form sentences.
We propose that many of these invariant features—or linguistic universals—may derive from learning-based constraints, such as the frequency by regularity interaction. As an example consider the head of a phrase, the particular word in a phrase that determines the properties and meaning of the phrase as a whole (such as the noun boy in the noun phrase the boy with the bicycle). Across the world’s languages, there is a statistical tendency toward a basic format in which the head of a phrase consistently is placed in the same position—either first or last—with respect to the remaining clause material. English is considered to be a head-first language, meaning that the head is most frequently placed first in a phrase, as when the verb is placed before the object noun phrase in a transitive verb phrase such as eat curry. In contrast, speakers of Hindi would say the equivalent of curry eat, because Hindi is a head-last language. Christiansen and Devlin (1997) trained simple recurrent networks (SRN; Elman, 1990) on corpora generated by thirty-two different grammars that differed in the regularity of
their head ordering (i.e., irregular grammars would have a highly inconsistent mix of head-first and head-final phrases). The networks were trained to predict the next lexical category in a sentence. Importantly, these networks did not have built-in linguistic biases; rather, they were biased toward the learning of complex sequential structure. Nevertheless, the SRNs were sensitive to the amount of head-order regularity found in the grammars, such that there was a strong correlation between the degree of head-order regularity of a given grammar and the degree to which the network had learned to master the language. The more irregular a grammar was, the more erroneous the network performance it elicited. The sequential biases of the networks made the corpora generated by regular grammars considerably easier to acquire than the corpora generated from irregular grammars. Christiansen and Devlin further used frequency data on the world’s natural languages (gleaned from the FANAL database; Dryer, 1992) concerning the specific syntactic constructions used in the simulations. They found that languages incorporating fragments the networks found hard to learn tended to be less frequent than languages the network learned more easily. This suggests that constraints on basic word order may derive from nonlinguistic constraints on the learning and processing of complex sequential structure. Grammatical constructions with highly irregular head ordering may simply be too hard to learn, and would therefore tend to disappear.

In a similar vein, Van Everbroeck (1999) presented network simulations in support of an explanation for language type frequencies based on learning constraints. He trained recurrent networks (a variation on the SRN) to produce the correct grammatical role assignments (i.e., who does what to whom) for noun-verb-noun sentences, presented one word at a time.
Forty-two different language types were used to represent cross-linguistic variation in word order (e.g., subject-verb-object) and noun/verb inflection. Results of the simulations coincided with many observed trends in the distribution of the world’s languages. Subject-first languages, which make up the majority of language types (51% subject-object-verb and 23% subject-verb-object, respectively), were easily learned by the networks. Object-first languages, on the other hand, were not well learned, and have very low frequency in the world’s languages (object-verb-subject, 0.75%; object-subject-verb, 0.25%). Van Everbroeck argued that these results were a predictable product of network learning and processing constraints. However, not all of Van Everbroeck’s results were directly proportional to actual language type frequencies. For example, verb-subject-object languages account for only 10 percent of the world’s language types, but the model’s performance on them exceeded its performance on the more frequent subject-first languages. In more recent simulations, Lupyan and Christiansen (2002) were able to fit language type frequencies appropriately once they took case markings into account. More important, from the viewpoint of this chapter, they
were able to observe a frequency by regularity interaction when modeling the acquisition of English, Italian, Turkish, and Serbo-Croatian. English relies strongly on word order to signal who does what to whom, and thus has a very regular mapping from words to grammatical roles (e.g., the subject noun always comes before the verb in declarative sentences). Italian has a slightly less regular pattern of word order, but both English and Italian make little use of case. Turkish, although it has a flexible (or irregular) word order, nonetheless has a very regular use of case markings to signal grammatical roles. Serbo-Croatian, on the other hand, has both an irregular word order and a somewhat irregular use of case. Similar to children (Slobin and Bever, 1982), the networks initially showed the best performance on reversible transitive sentences in Turkish, with English and Italian quickly catching up, and with Serbo-Croatian lagging behind. Because of their regular use of case and word order, respectively, Turkish and English were more easily learned than Italian and, in particular, the highly irregular Serbo-Croatian. Of course, with repeated exposure the networks (and the children) learning Serbo-Croatian eventually caught up, as predicted by the frequency by regularity interaction. Together, the simulations by Christiansen and Devlin, Van Everbroeck, and Lupyan and Christiansen provide support for a connection between learnability and frequency in the world’s languages based on the learning and processing properties of connectionist networks. Languages that are more easily learned tend to proliferate, and we propose that such learning-based constraints are crucial to our understanding of how language may have evolved into its current form. However, one limitation regarding the three word-order models is that there is no actual transmission between generations of learners (as was the case in Hare and Elman, 1995). 
Next, we present a series of simulations in which we show how, through processes of linguistic adaptation, learning-based constraints on language acquisition can shape the language being learned.

The Evolutionary Emergence of Multiple-Cue Integration

An outstanding problem in developmental psycholinguistics is how children overcome initial hurdles in learning language. At first glance, these hurdles seem insurmountable: Children must disentangle a continuous stream of speech without any obvious information about syntactic structure. They have to learn to what grammatical categories words belong in their native language, and how to put those words together. However, grammatical categories and syntactic structure are not logically independent. A language’s syntax assumes grammatical categories, and grammatical categories assume a particular syntactic distribution. The task of acquiring language therefore presents a “bootstrapping” problem.
A possible solution to this problem has been proposed (Gleitman and Wanner, 1982; Morgan and Demuth, 1996; Christiansen and Dale, 2001), and argues that multiple probabilistic cues in speech provide the child’s entering wedge into syntax. Prosodic and phonological sensitivity emerges rapidly in children (for reviews, see Jusczyk, 1997; Kuhl, 1999), and this attunement offers opportunities for languages to contain prosodic and phonological information about linguistic structure. Christiansen and Dale (2001) offered computational support for the hypothesis that integrating multiple probabilistic cues (phonological, prosodic, and distributional) by perceptually attuned general-purpose learning mechanisms may hold the key to how children solve the bootstrapping problem. Multiple cues can provide reliable evidence about linguistic structure that is unavailable from any single source of information. Much evidence suggests that such cues are present cross-linguistically (see Kelly, 1992, for a review) and are manifested in different combinations or “cue constellations.” Our hypothesis is that in order for languages to increase their linguistic complexity without compromising learnability, they have evolved cue constellations that reflect their structure and cater to cognitive constraints imposed by the child’s learning mechanisms. Here, we consider the evolution of such cues from a computational perspective. After reviewing the cues available for syntax acquisition, we present two language evolution simulations in which we explore how and why cues may have arisen. In the first, we demonstrate the ways in which cues could have emerged, given a language that is growing in vocabulary size. In the second, we offer an illustration of how growing grammatical complexity can strengthen the importance of cues for language acquisition. 
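The claim that several partially reliable cues can jointly provide better evidence than any single cue can be illustrated with a small calculation. The sketch below is only an idealization: it assumes three statistically independent binary cues, invented reliabilities of 0.7, and a simple majority vote, none of which is taken from the simulations discussed in this chapter, where integration is carried out by a network rather than by voting.

```python
from itertools import product


def majority_accuracy(reliabilities):
    """Probability that a simple majority of independent binary cues is correct.

    Each entry is the (assumed) probability that one cue points to the right
    answer; the cues are treated as statistically independent.
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(reliabilities)):
        prob = 1.0
        for correct, reliability in zip(outcome, reliabilities):
            prob *= reliability if correct else 1.0 - reliability
        if 2 * sum(outcome) > len(outcome):  # strictly more than half are correct
            total += prob
    return total


if __name__ == "__main__":
    # Hypothetical reliabilities for a phonological, a prosodic,
    # and a distributional cue.
    print(majority_accuracy([0.7, 0.7, 0.7]))
```

With three independent cues that are each correct 70 percent of the time, the majority is correct 0.7^3 + 3(0.7^2)(0.3) = 78.4 percent of the time, which is better than any one of the cues alone.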
Cues Available for Syntax Acquisition

Three sources of information may guide syntax acquisition: innate knowledge in the form of linguistic universals; language-external information that supplies the relationship between language and world; and language-internal information, such as aspects of phonological, prosodic, and distributional patterns within a language. Although some kind of innate knowledge may play a role in language acquisition, it cannot solve the bootstrapping problem. Even with built-in abstract knowledge about grammatical categories and syntactic rules (e.g., Pinker, 1984), the bootstrapping problem remains formidable: Children must map the right sound strings onto the right grammatical categories while determining the specific syntactic relations between these categories in their native language. Moreover, there now exists strong experimental evidence that children do not initially use abstract linguistic categories, but instead employ novel words as concrete items, thereby challenging the usefulness of hypothesized innate grammatical categories (Tomasello, 2000).

Language-external information may contribute substantially to language acquisition. Correlations between environmental observations relating prior semantic categories (e.g.,
objects and actions) and grammatical categories (e.g., nouns and verbs) may furnish a “semantic bootstrapping” solution (Pinker, 1984). However, given that children acquire linguistic distinctions with no semantic basis (e.g., gender in French; Karmiloff-Smith, 1979), semantics cannot be the only source of information involved in solving the bootstrapping problem. Another extralinguistic factor is cultural learning, where children may imitate the pairing of linguistic forms and their conventional communicative functions (Tomasello, 2000). Nonetheless, to break down the linguistic forms into relevant units, it appears that cultural learning must be coupled with language-internal learning. Moreover, because the nature of both language-external and innate knowledge is difficult to assess, it is unclear how this knowledge could be quantified: There are no computational models of how such knowledge might be applied to learning basic grammatical structure. Though it is perhaps not the only source of information involved in bootstrapping the child into language, the potential contribution of language-internal information is more readily quantified. Phonological information—including stress, vowel quality, and duration—may help distinguish grammatical function words (e.g., determiners, prepositions, and conjunctions) from content words (nouns, verbs, adjectives, and adverbs) in English (e.g., Cutler, 1993). Phonological information may help distinguish between nouns and verbs. For example, nouns tend to be longer than verbs in English—a difference that even three-year-olds are sensitive to (Cassidy and Kelly, 1991). These and other phonological cues, such as differences in stress placement in multisyllabic words, have been found to exist cross-linguistically (see Kelly, 1992, for a review). Prosodic information provides cues for word and phrasal/clausal segmentation, and may help uncover syntactic structure (e.g., Morgan, 1996). 
Acoustic analyses suggest that differences in pause length, vowel duration, and pitch indicate phrase boundaries in both English and Japanese child-directed speech (Fisher and Tokura, 1996). Infants seem highly sensitive to such language-specific prosodic patterns (for reviews, see e.g., Jusczyk, 1997; Morgan, 1996)—a sensitivity that may start in utero (Mehler et al., 1988). Prosodic information also improves sentence comprehension in two-year-olds (Shady and Gerken, 1999). Results from an artificial language learning experiment with adults show that prosodic marking of syntactic phrase boundaries facilitates learning (Morgan et al., 1987). Unfortunately, prosody is also partly affected by a number of nonsyntactic factors, such as breathing patterns, resulting in an imperfect mapping between prosody and syntax (Fernald and McRoberts, 1996). Nonetheless, infants’ sensitivity to prosody provides a rich potential source of syntactic information (Morgan, 1996). None of these cues in isolation suffice to solve the bootstrapping problem; rather, they must be integrated to overcome the partial reliability of individual cues. Previous connectionist simulations by Christiansen, Allen, and Seidenberg (1998) have pointed to efficient and robust learning methods for multiple-cue integration in speech segmentation.
Integration of phonological (lexical stress), prosodic (utterance boundary), and distributional (phonetic segment sequences) information resulted in reliable segmentation, outperforming the use of individual cues. The efficacy of multiple-cue integration has also been confirmed in artificial language learning experiments (e.g., McDonald and Plauche, 1995).

By age one, children’s perceptual attunement is likely to allow them to utilize language-internal probabilistic cues (for reviews see, e.g., Jusczyk, 1997; Kuhl, 1999).2 For example, infants appear to be sensitive to the acoustic differences between function and content words (Shi et al., 1999) and to the relationship between function words and prosody in speech (Shafer et al., 1998). Young infants can detect differences in number of syllables among isolated words (Bijeljac et al., 1993), a possible cue to noun/verb differences. Moreover, infants are accomplished distributional learners (e.g., Saffran et al., 1996) and, importantly, they are capable of multiple-cue integration (Mattys et al., 1999). When solving the bootstrapping problem, children are also likely to benefit from specific properties of child-directed speech, such as the predominance of short sentences (Newport et al., 1977) and the cross-linguistically more robust prosody (Kuhl et al., 1997).

This review has indicated the range of language-internal cues available for language acquisition; that these cues affect learning and processing; and that mechanisms exist for multiple-cue integration. In an earlier paper (Christiansen and Dale, 2001), we reported on a series of simulations revealing the computational feasibility of the multiple-cue approach to syntax acquisition. SRNs that faced the task of learning grammatical structure and predicting cues actually benefited from the additional burden.
Despite previous theoretical reservations about the value of multiple-cue integration (Fernald and McRoberts, 1996), the analysis of network performance revealed that learning under multiple cues results in faster, better, and more uniform learning. In another simulation, SRNs were able to distinguish between relevant cues and distracting cues, and performance did not differ from networks that received just reliable cues. Overall, these simulations offer support for the multiple-cue integration hypothesis in language acquisition. They demonstrate that learners can benefit from multiple cues, and are not distracted by irrelevant language-internal information.

Though Christiansen and Dale (2001) offered computational support for the benefit of multiple cues, they did not investigate how these cues may have emerged in language. The following two simulations address this question and illustrate how learning-based constraints can impinge on the evolution of languages.

Simulation 1: Growing Vocabulary

The following simulation implements a system of language adaptation: Grammars mutate, and are selected on the basis of their learnability. This approach echoes observations by
Christiansen (1994) and Deacon (1997) that language changes much more rapidly than its neurobiological substrate, and that the child’s brain serves as a kind of habitat in which natural selection applies to individual languages. Languages that were difficult to learn were selected against, and languages that were more easily learned survived and propagated throughout a population of speakers. This method of simulating language change allows investigation into how cues evolved to contribute to this selection and benefit language learning. In what follows, we describe the networks and the language they learned, the conditions provided for transmitting language across generations, and the resulting patterns of cue constellations in the languages that evolved.
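The general logic of mutation and selection by learnability can be sketched in a few lines. Everything specific in the sketch below is an assumption made for illustration: a language is reduced to four head-order choices, the learnability score is a stand-in that simply rewards consistent head ordering (in the simulations themselves, learnability is measured by training networks), and selection is winner-take-all with the winner carried over unchanged.

```python
import random


def consistency(language):
    """Stand-in learnability score: share of rules that follow the majority head order."""
    head_first = sum(language)
    return max(head_first, len(language) - head_first) / len(language)


def mutate(language, rng):
    """Flip the head order of one randomly chosen rule."""
    i = rng.randrange(len(language))
    return language[:i] + (not language[i],) + language[i + 1:]


def evolve(population, score, generations, rng):
    """Each generation, keep the best-scoring language and refill with its mutants."""
    history = []
    for _ in range(generations):
        winner = max(population, key=score)
        history.append(score(winner))
        mutants = [mutate(winner, rng) for _ in range(len(population) - 1)]
        population = [winner] + mutants
    return max(population, key=score), history


if __name__ == "__main__":
    rng = random.Random(0)
    # Five random languages; True means the rule is head-first (hypothetical coding).
    start = [tuple(rng.random() < 0.5 for _ in range(4)) for _ in range(5)]
    best, history = evolve(start, consistency, generations=10, rng=rng)
    print(best, history)
```

Because the winner is retained in each new population, the best score can never decrease from one generation to the next; hard-to-learn variants are dropped and easier ones spread.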
Networks and Grammar

SRNs served as language learners in both simulations (see figure 6.1). This type of network has a context layer to which the activation of the hidden unit layer (the network’s current internal state) is copied and fed back to the network at the next time step. This provides the network with the ability to learn and process the grammatical structure inherent in sequences of words. Each SRN had its initial weights randomized within [-0.05, 0.05], with a learning rate of 0.1 and a momentum of 0. Input to the networks consisted of individual words in the form of localist representations (one unit was activated for each word). When presented with a word, networks were required to predict the following word in a sentence, along with its corresponding cues. Networks consisted of 12 or 24 word units (depending on the
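[Figure 6.1 diagram; layer labels: Input: current word (cat, dog, . . . , lex cue, const cue); hidden layer; context layer; Output: predict next word (cat, dog, . . . , lex cue, const cue).]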
Figure 6.1 Diagram of one SRN agent. Solid lines indicate full connectivity between layers of nodes, and dashed lines indicate one-to-one copy-back connections (lex cue = lexical cue marker; const cue = constituent cue marker).
Role of Learning and Development in Language Evolution
Table 6.1 The Phrase-Structure Grammar Used in Simulation 1

S → {NP VP}
NP → {PP N}
VP → {V NP}
PP → {P NP}

Note: Brackets indicate that the constituent order of these rules was permitted to change.
vocabulary size condition of the simulation) and two cue units, one representing a constituent cue (e.g., pauses) and another, activated conjointly with words, representing any lexical cue (e.g., primary stress). Each network had ten hidden units and ten context units.

Languages were defined by phrase-structure grammars, systems of rewrite rules defining how sentences are constructed. The phrase-structure grammar “template” used in this simulation is presented in table 6.1. Individual grammars had three changeable features allowing “mutation” with each generation. Head ordering was modified by shifting the constituent order of the four main rewrite rules: S(entence), N(oun)P(hrase), V(erb)P(hrase), and P(repositional)P(hrase). For example, a grammar with the rule PP → P NP, a head-first rule, could be made head-final by simply rewriting PP as NP P, with the head of the prepositional phrase in the final position. The constituent cue was permitted to mark the boundary of any of the four rewrite rules. This cue was modified by addition, deletion, and movement (from one rewrite rule to another). Finally, all words were permitted to be associated with the lexical cue. Cues could be added to words, deleted from them, or moved from one word to another. This process was applied across all words and was not specific to any particular grammatical category. The constituent cue was represented as a single unit activated separately after its corresponding phrase-structure rules. The lexical cue was a single unit coactivated with lexical items during training.

Two grammar templates were created for two separate sets of simulation runs. These grammars differed only in the size of their vocabulary, the first having half as many words (12) as the second (24).

Procedure

The grammar template was initially randomized to form five different languages, and each language was learned by five different networks (25 networks in total).
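Each of these learners is an SRN of the kind described under Networks and Grammar. As a rough illustration only, the forward pass of such a network might be sketched in Python as follows. The layer sizes and the initial weight range follow the text; the logistic activation function, the absence of bias units, the all-zero initial context, and every name used here (SimpleRecurrentNetwork, encode, step) are assumptions made for this sketch, and training by back-propagation is omitted.

```python
import math
import random


def make_weights(n_out, n_in, rng, scale=0.05):
    """Weight matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def sigmoid(x):
    """Logistic activation (an assumption; the text does not name the function)."""
    return 1.0 / (1.0 + math.exp(-x))


class SimpleRecurrentNetwork:
    """Forward pass of an Elman-style SRN; learning is not implemented."""

    def __init__(self, n_words=12, n_cues=2, n_hidden=10, seed=0):
        rng = random.Random(seed)
        self.n_io = n_words + n_cues      # word units plus lexical and constituent cue units
        self.n_hidden = n_hidden
        self.w_in = make_weights(n_hidden, self.n_io, rng)
        self.w_ctx = make_weights(n_hidden, n_hidden, rng)
        self.w_out = make_weights(self.n_io, n_hidden, rng)
        self.context = [0.0] * n_hidden   # assumed starting state of the context layer

    def step(self, inputs):
        """Process one time step and return the prediction for the next one."""
        hidden = [
            sigmoid(
                sum(w * x for w, x in zip(self.w_in[h], inputs))
                + sum(w * c for w, c in zip(self.w_ctx[h], self.context))
            )
            for h in range(self.n_hidden)
        ]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in self.w_out]
        self.context = list(hidden)       # one-to-one copy-back for the next time step
        return output


def encode(word_index, lexical_cue, n_words=12):
    """Localist input: one active word unit, plus the lexical cue unit if set."""
    vec = [0.0] * (n_words + 2)
    vec[word_index] = 1.0
    vec[n_words] = 1.0 if lexical_cue else 0.0   # the constituent cue unit stays off
    return vec
```

Calling `net.step(encode(3, lexical_cue=True))` returns one activation per word unit and cue unit, which is the form of output the networks were trained to produce: a prediction of the next word together with its cues.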
Networks were trained on 3,000 randomly generated sentences of their respective grammar (approximately 15,000 word presentations). The performance of each language’s five networks was averaged, and the language most easily learned produced linguistic “offspring” for the next generation of networks. Performance was based on a test corpus of 100 randomly generated sentences. The winning language, and four variations of it, served as the
[Figure 6.2 schematic: in each generation (1, 2, 3, . . .), a parent language (L1, L2, L3) and four variations on it are each learned by five networks (×5); the language with the lowest error becomes the parent for the next generation.]
Figure 6.2 Evolutionary simulation schematic of the first three generations of 500.
five languages for the next generation. Variations were formed by randomly selecting two of the three features of the grammar to modify (as described above). The simulation was halted after 500 generations. Ten differently seeded simulation runs were performed. Figure 6.2 illustrates the procedure.

Analysis

Head Order

Christiansen and Devlin (1997), as described previously, argued that head-order regularity is a consequence of learning constraints. SRNs in their simulation were better able to learn languages that had head-order regular rules. In this simulation we likewise tracked the head-order regularity of languages across generations. We associated with each winning grammar a score based on the proportion of rules that were consistently head-first or head-final.
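The generational procedure and the head-order score can be illustrated with a short Python sketch. It is not the code used in the simulations. The dictionary representation of a language, the function names, and the reading of the head-order score as the share of rules that follow the majority order are all assumptions. The `error_of` argument is a stand-in for the expensive step described in the text: training five SRNs on a language and averaging their error on a test corpus.

```python
import random

RULES = ("S", "NP", "VP", "PP")


def random_language(rng, n_words=12):
    """A randomized instance of the grammar template (hypothetical representation)."""
    return {
        "head_first": {r: rng.random() < 0.5 for r in RULES},
        "const_cue": {r for r in RULES if rng.random() < 0.5},
        "lex_cue": {w for w in range(n_words) if rng.random() < 0.5},
    }


def change_marks(marked, universe, rng):
    """Add a cue to, delete a cue from, or move a cue between items."""
    marked = set(marked)
    unmarked = [u for u in universe if u not in marked]
    ops = (["add"] if unmarked else []) + (["delete"] if marked else [])
    if marked and unmarked:
        ops.append("move")
    op = rng.choice(ops)
    if op in ("delete", "move"):
        marked.remove(rng.choice(sorted(marked)))
    if op in ("add", "move"):
        marked.add(rng.choice(unmarked))
    return marked


def mutate(lang, rng, n_words=12):
    """Modify two of the three features, chosen at random."""
    child = {
        "head_first": dict(lang["head_first"]),
        "const_cue": set(lang["const_cue"]),
        "lex_cue": set(lang["lex_cue"]),
    }
    for feature in rng.sample(["head_first", "const_cue", "lex_cue"], 2):
        if feature == "head_first":
            rule = rng.choice(RULES)
            child["head_first"][rule] = not child["head_first"][rule]
        elif feature == "const_cue":
            child["const_cue"] = change_marks(child["const_cue"], RULES, rng)
        else:
            child["lex_cue"] = change_marks(child["lex_cue"], range(n_words), rng)
    return child


def head_order_regularity(lang):
    """Share of rules following the majority head order (assumed scoring rule)."""
    n_first = sum(lang["head_first"].values())
    return max(n_first, len(RULES) - n_first) / len(RULES)


def evolve(error_of, generations=500, seed=0):
    """Each generation, the lowest-error language becomes the parent of the next."""
    rng = random.Random(seed)
    population = [random_language(rng) for _ in range(5)]
    for _ in range(generations):
        parent = min(population, key=error_of)
        population = [parent] + [mutate(parent, rng) for _ in range(4)]
    return min(population, key=error_of)
```

With a deterministic stand-in such as `lambda lang: 1.0 - head_order_regularity(lang)`, the winning error can never increase from one generation to the next, because the parent is carried over unchanged. With trained networks the error is a noisy estimate, so that guarantee would not hold.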
Constituent Cue

We observed the ways in which evolving languages incorporated the constituent cue, and its consonance with what is observed in child-directed speech.

Lexical Cue

Length, stress, and other lexical cues in language benefit the child to the extent that they delimit grammatical classes. To measure this in the simulation, we performed a simple comparison of how the lexical cue was associated with different classes. We used the magnitude of the maximum difference of association among grammatical categories. Formulaically, we measured cue relevance using

\[
\max_{i,j} \left| \frac{\sum x_{i+}}{\sum x_{i}} - \frac{\sum x_{j+}}{\sum x_{j}} \right|
\]
where $x_i$ denotes a grammatical class and $x_{i+}$ denotes words of that class that have an associated lexical cue, so that each ratio is the proportion of words in a class that carry the cue. This approach to measuring cue relevance is beneficial for two reasons. First, if the cue is unimportant and does not become associated with any words, or becomes associated with all of them, the value of cue relevance will be 0 (0 percent relevant). If the cue separates any two word classes completely, then cue relevance will be 1 (100 percent relevant). Second, this interval of [0, 1] allows us to graphically represent how the lexical cue becomes exploited across language generations.

Results

Even though all simulation runs started with random grammars without consistent head ordering or use of the constituent and lexical cues, we expected that coherent cue constellations would emerge over generations.

Head Order

Languages did not evolve head-ordering regularity in any run of the simulation, for either vocabulary size.

Constituent Cue

In all runs of the simulation, the constituent cue quickly delimited NP and VP rules, consistent with child-directed speech (Fisher and Tokura, 1996).

Lexical Cue

Only in the simulation runs with a large vocabulary did languages exploit the lexical cue. As seen in figure 6.3, languages with a larger vocabulary remained highly consistent in their use of the lexical cue, which clearly delimits two word classes across generations. Using the area under these graphs as a measure of cue consistency across generations, we found that languages with larger vocabularies were more consistent in their use of the lexical cue than those with small ones (p < .05).

Simulation 2: Growing Grammatical Complexity

Simulation 1 suffers from a few limitations. First, the grammatical template used was very simple, and may not fully capture the importance of cues in emerging syntactic structure. Second, the simulations were unable to settle on a particular grammar, but would continuously change back and forth between several possible grammars. Finally, in contrast to
[Figure 6.3 plots: two panels, “Small vocabulary” and “Large vocabulary”; x-axis: generations (1 to 500); y-axis: consistency (0 to 1).]
Figure 6.3 A small and large vocabulary run similarly seeded. Languages with larger vocabulary better exploit the lexical cue. The strength of a lexical cue varies from 0 to 1, and is computed by calculating the maximum extent to which a cue distinguishes any two lexical classes (0 = distinguishes no classes; 1 = signals with 100 percent probability two lexical classes in the vocabulary).
the simulation of Christiansen and Devlin (1997), we did not observe a strong effect of regular head ordering. Simulation 2 was intended to overcome these limitations.

Networks and Grammar

The networks in this simulation were the same as those in simulation 1, and the same learning parameters were used. The selection process in this simulation, however, was based on a considerably more complex grammar template with two additional rules that encoded a recursive possessive phrase (PossP; see table 6.2). This grammar template is the same as the phrase-structure grammar used in Christiansen and Devlin (1997).

Procedure

The procedure mirrored that of simulation 1, with 3,000 sentences for training and 100 for testing. Mutation of the languages was accomplished in the same way, and winning grammars were again selected on the basis of their learnability. Runs of this simulation were halted after the winning language remained the same for 50 generations.

Analysis

Head-order regularity, constituent cue use, and lexical cue consistency were measured as in simulation 1.
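Because the lexical cue measure is shared by both simulations, a small worked example may help. The Python sketch below computes cue relevance as defined for simulation 1: for each grammatical class it takes the proportion of words carrying the lexical cue, and it returns the largest difference between any two classes. The function name and the dictionary layout of the lexicon are assumptions made for illustration.

```python
def cue_relevance(lexicon):
    """Largest difference between classes in the share of cue-marked words.

    `lexicon` maps each grammatical class to a list of booleans, one per word,
    that is True when the word carries the lexical cue.
    """
    proportions = [sum(marks) / len(marks) for marks in lexicon.values() if marks]
    if len(proportions) < 2:
        return 0.0
    # The largest pairwise absolute difference equals the maximum minus the minimum.
    return max(proportions) - min(proportions)


# Cue on no words, or on all words: 0 percent relevant.
print(cue_relevance({"N": [False, False], "V": [False, False]}))   # 0.0
print(cue_relevance({"N": [True, True], "V": [True, True]}))       # 0.0
# Cue separates two classes completely: 100 percent relevant.
print(cue_relevance({"N": [True, True], "V": [False, False]}))     # 1.0
```

A partially consistent cue falls in between. For instance, a cue on two of four nouns, one of four verbs, and neither of two prepositions gives a relevance of 0.5.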
Table 6.2 The Phrase-Structure Grammar Used in Simulation 2

S → {NP VP}
NP → {PP N}
VP → {V NP}
PP → {P NP}
NP → {PossP N}
PossP → {Poss NP}

Note: Brackets indicate that the constituent order of these rules was permitted to change.
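To make the template in table 6.2 concrete, the Python sketch below generates sentences from one language defined over it. Several details are assumptions rather than facts about the original simulations: the table does not say how recursion was bounded, so the sketch lets an NP also be a bare noun and caps the depth of embedding. The rule names, the toy vocabulary (apart from "cat" and "dog", which label figure 6.1), and the "#" token standing in for the constituent cue are likewise hypothetical.

```python
import random

# Rules of table 6.2 in template order; the keys are hypothetical rule names.
RULES = {
    "S": ("NP", "VP"),
    "NP_pp": ("PP", "N"),
    "VP": ("V", "NP"),
    "PP": ("P", "NP"),
    "NP_poss": ("PossP", "N"),
    "PossP": ("Poss", "NP"),
}
# Toy vocabulary, for illustration only.
LEXICON = {"N": ["cat", "dog"], "V": ["sees", "chases"], "P": ["near"], "Poss": ["of"]}
BOUNDARY = "#"  # stands in for the constituent cue (e.g., a pause)


def expand(symbol, reversed_rules, cued_rules, rng, depth, max_depth):
    """Rewrite a symbol into a list of words and boundary markers."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    rule = symbol
    if symbol == "NP":
        # Assumption: an NP may be a bare noun, which also bounds the recursion.
        options = ["N"] if depth >= max_depth else ["N", "NP_pp", "NP_poss"]
        rule = rng.choice(options)
        if rule == "N":
            return expand("N", reversed_rules, cued_rules, rng, depth, max_depth)
    # A "mutated" rule has its constituent order reversed.
    parts = RULES[rule][::-1] if rule in reversed_rules else RULES[rule]
    words = []
    for part in parts:
        words += expand(part, reversed_rules, cued_rules, rng, depth + 1, max_depth)
    if rule in cued_rules:
        words.append(BOUNDARY)  # cue presented separately, after the phrase
    return words


def generate_sentence(reversed_rules=(), cued_rules=(), seed=0, max_depth=2):
    """One random sentence of the language defined by the given rule settings."""
    rng = random.Random(seed)
    return expand("S", reversed_rules, cued_rules, rng, 0, max_depth)
```

With `max_depth=0` every noun phrase is a bare noun, so the template order yields noun, verb, noun, whereas `reversed_rules=("VP",)` moves the verb to the end of the sentence. Passing `cued_rules=("VP",)` appends the boundary marker after the verb phrase.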
Results

Nine of our ten simulation runs stabilized on one particular language variation. For those nine, the following results were observed.

Head Order

All languages had highly regular head ordering (i.e., at least five of the six rules were consistently head-first or head-final).

Constituent Cue

As in simulation 1, the constituent cue consistently delimited plausible aspects of the grammar template. All runs of the simulation rapidly evolved languages delimiting NP boundaries, again consistent with child-directed speech.

Lexical Cue

All stable languages had perfectly consistent lexical cues. Interestingly, six of the nine languages that stabilized evolved lexical cues that separated function words from content words, much like English and other natural languages.

Summary of Simulations

These simulations explored two ways in which languages can evolve, and how these conditions influence the emergence of cues to service language acquisition. Simulation 1 revealed that constituent cues, such as pauses or pitch modulation, are highly important in initial syntactic structure and emerge quickly. A growing vocabulary, however, enabled languages to exploit subtler lexical cues, such as word length and lexical stress, to delimit grammatical classes. Simulation 2 revealed that growing grammatical complexity compels languages to incorporate both constituent and lexical cues for syntax acquisition. Together, these simulations illuminate how ontogenetic constraints can guide the evolution of languages. The learning-based constraints imposed by neural network learners shaped the form of the emerging languages across generations.

Conclusion

In this chapter, we have sought to turn the discussion of whether or not ontogeny recapitulates phylogeny on its head. At least when it comes to language, we have proposed
that development to a large extent has shaped the evolution of our linguistic abilities, rather than vice versa. Consequently, we have emphasized the role of learning-based constraints in the evolution of linguistic structure, instead of biological changes to accommodate language. Connectionism provides a natural framework for studying a learning-based approach to language evolution, given its widespread application to the modeling of language development. Indeed, we have seen that the specific network properties which have proven crucial for modeling developmental patterns in language acquisition, such as the frequency by regularity interaction, also provide a basis for explaining language evolution. We have presented two series of connectionist simulations in which learning biases over generations lead to the emergence of multiple-cue integration through linguistic adaptation. Importantly, the nature of the emergent cue systems was similar to the kind of cue systems that young infants have been shown to use in language acquisition. These cue systems appear to emerge to service growing linguistic structure. Fueled by constraints on learning, cue integration becomes a vehicle for facilitating the acquisition of complex linguistic structure. Languages employing cues become more likely to survive the processes of cultural transmission across generations, demonstrating how learning can shape evolution. On a more theoretical level, our learning-based approach to language evolution may allow us to deal productively with Lewontin’s (1998) scathing critique of evolutionary approaches to cognition, and to language evolution in particular: “Reconstructions of the evolutionary history and the causal mechanisms of the acquisition of linguistic competence . . . are nothing more than a mixture of pure speculation and inventive stories” (p. 111). 
He argues that we are unlikely to find solid evidence that there are heritable variations in linguistic abilities among individuals in the hominid lineage, and that these variations lead individuals with greater abilities to have more offspring. Lewontin’s main concern is that we simply cannot test the hypotheses put forward to explain language evolution because of our limited knowledge about hominid evolution in general. However, if, as we have suggested here, language has evolved largely through cultural transmission constrained by limitations on human learning and processing, we can test these hypotheses through computational simulations and human experimentation (Christiansen and Ellefson, 2002; Christiansen et al., 2002).

Acknowledgment

The research reported in this chapter was supported in part by a grant from the Human Frontiers Science Program to Morten H. Christiansen.
Notes

1. For a more detailed description of this approach, see Kirby and Christiansen (2003). For a review placing the cultural transmission approach in the context of contemporary theories of language evolution, see Christiansen and Kirby (2003).
2. Acoustic and articulatory speech science has provided a strong historical basis for these psycholinguistic analyses, including some early clues that prosodic information may point toward syntactic structure. Oller (2000; chapter 4, this volume) presents a theory of communicative evolution that has grown partly out of this tradition.
References Batali J (1998) Computational simulations of the emergence of grammar. In: Approaches to the Evolution of Language: Social and Cognitive Bases (Hurford JR, Studdert-Kennedy M, Knight C, eds.), 405–426. Cambridge: Cambridge University Press. Bates E, Elman JL (1993) Connectionism and the study of change. In: Brain Development and Cognition: A Reader (Johnson MH, ed.), 623–642. Oxford: Blackwell. Bijeljac R, Bertoncini J, Mehler J (1993) How do 4-day-old infants categorize multisyllabic utterances? Devel Psych 29: 711–721. Bishop CM (1995) Neural Networks for Pattern Recognition. New York: Oxford University Press. Cangelosi A (1999) Modeling the evolution of communication: From stimulus associations to grounded symbolic associations. In: Advances in Artificial Life (Proceedings ECAL99 European Conference on Artificial Life) (Floreano D, Nicoud J, Mondada F, eds.), 654–663. Berlin: Springer-Verlag. Cassidy KW, Kelly MH (1991) Phonological information for grammatical category assignments. J Mem Lang 30: 348–369. Christiansen MH (1994) Infinite languages and finite minds. Ph.D. thesis, University of Edinburgh. Christiansen MH, Allen J, Seidenberg MS (1998) Learning to segment speech using multiple cues: A connectionist model. Lang Cog Proc 13: 221–268. Christiansen MH, Chater N (2001) Connectionist psycholinguistics in perspective. In: Connectionist Psycholinguistics (Christiansen MH, Chater N, eds.), 19–75. Westport, Conn.: Ablex. Christiansen MH, Dale R (2001) Integrating distributional, prosodic and phonological information in a connectionist model of language acquisition. In: Proceedings of the 23rd Annual Conference of the Cognitive Science Society, 220–225. Mahwah, N.J.: Lawrence Erlbaum. Christiansen MH, Dale R, Ellefson MR, Conway CM (2002) The role of sequential learning in language evolution: Computational and experimental studies. In: Simulating the Evolution of Language (Cangelosi A, Parisi D, eds.), 165–187. London: Springer-Verlag. 
Christiansen MH, Devlin JT (1997) Recursive inconsistencies are hard to learn: A connectionist perspective on universal word order correlations. In: Proceedings of the 19th Annual Conference of the Cognitive Science Society, 113–118. Mahwah, N.J.: Lawrence Erlbaum. Christiansen MH, Ellefson MR (2002) Linguistic adaptation without linguistic constraints: The role of sequential learning in language evolution. In: Transitions to Language (Wray A, ed.), 335–358. Oxford: Oxford University Press. Christiansen MH, Kirby S (2003) Language evolution: Consensus and controversies. Trends in Cognitive Sciences 7: 300–307. Cutler A (1993) Phonological cues to open- and closed-class words in the processing of spoken sentences. J Psycholing Res 22: 109–131. Deacon TW (1997) The Symbolic Species: The Co-evolution of Language and the Brain. London: Penguin Press.
Dryer MS (1992) The Greenbergian word order correlations. Language 68: 81–138. Elman JL (1990) Finding structure in time. Cognit Sci 14: 179–211. Fernald A, McRoberts G (1996) Prosodic bootstrapping: A critical analysis of the argument and the evidence. In: From Signal to Syntax (Morgan JL, Demuth K, eds.), 365–388. Mahwah, N.J.: Lawrence Erlbaum. Fisher C, Tokura H (1996) Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Devel 67: 3192–3218. Gleitman L, Wanner E (1982) Language acquisition: The state of the state of the art. In: Language Acquisition: The State of the Art (Wanner E, Gleitman L, eds.), 3–48. Cambridge: Cambridge University Press. Hare M, Elman JL (1995) Learning and morphological change. Cognition 56: 61–98. Hebb DO (1949) Organization of Behavior: A Neuropsychological Theory. New York: John Wiley and Sons. Jusczyk PW (1997) The Discovery of Spoken Language. Cambridge, Mass.: MIT Press. Karmiloff-Smith A (1979) A Functional Approach to Child Language: A Study of Determiners and Reference. Cambridge: Cambridge University Press. Kelly MH (1992) Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psych Rev 99: 349–364. Kirby S, Christiansen MH (2003) From language learning to language evolution. In: Language Evolution (Christiansen MH, Kirby S, eds.), 272–294. New York: Oxford University Press. Kirby S, Hurford J (2002) The emergence of linguistic structure: An overview of the iterated learning model. In: Simulating the Evolution of Language (Cangelosi A, Parisi D, eds.), 121–148. London: Springer-Verlag. Kuhl PK (1999) Speech, language, and the brain: Innate preparation for learning. In: Neural Mechanisms of Communication (Konishi M, Hauser M, eds.), 419–450. Cambridge, Mass.: MIT Press. 
Kuhl PK, Andruski JE, Chistovich IA, Chistovich LA, Kozhevnikova EV, Ryskina VL, Stolyarova EI, Sundberg U, Lacerda F (1997) Cross-language analysis of phonetic units in language addressed to infants. Science 277: 684–686. Lewontin RC (1998) The evolution of cognition: Questions we will never answer. In: An Invitation to Cognitive Science, vol. 4: Methods, Models, and Conceptual Issues (Scarborough D, Sternberg S, eds.), 107–131. Cambridge, Mass.: MIT Press. Lieberman DE (1998) Sphenoid shortening and the evolution of modern human cranial shape. Nature 393: 158–162. Lively SE, Pisoni DB, Goldinger SD (1994) Spoken word recognition. In: Handbook of Psycholinguistics (Gernsbacher MA, ed.), 265–318. San Diego: Academic Press. Livingstone D, Fyfe C (1999) Modelling the evolution of linguistic diversity. In: Advances in Artificial Life, Proceedings of ECAL99 European Conference on Artificial Life (Floreano D, Nicoud J, Mondada F, eds.), 704–708. Berlin: Springer-Verlag. Lupyan G, Christiansen MH (2002) Case, word order, and language learnability: Insights from connectionist modeling. In: Proceedings of the 24th Annual Conference of the Cognitive Science Society, 596–601. Mahwah, N.J.: Lawrence Erlbaum. MacDonald MC, Christiansen MH (2002) Reassessing working memory: A comment on Just & Carpenter (1992) and Waters & Caplan (1996). Psych Rev 109: 35–54. MacWhinney B (2003) Language acquisition. In: The Handbook of Brain Theory and Neural Networks, 2nd ed. (Arbib MA, ed.), 600–603. Cambridge, Mass.: MIT Press. Mattys SL, Jusczyk PW, Luce P, Morgan JL (1999) Phonotactic and prosodic effects on word segmentation in infants. Cognit Psych 38: 465–494. McDonald JL, Plauche M (1995) Single and correlated cues in an artificial language learning paradigm. Lang Speech 38: 223–236. Mehler J, Jusczyk PW, Lambertz G, Halsted N, Bertoncini J, Amiel-Tison C (1988) A precursor of language acquisition in young infants. Cognition 29: 143–178.
Morgan JL (1996) Prosody and the roots of parsing. Lang Cog Proc 11: 69–106. Morgan JL, Demuth K (1996) From Signal to Syntax. Mahwah, N.J.: Lawrence Erlbaum. Morgan JL, Meier RP, Newport EL (1987) Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognit Psych 19: 498–550. Newport EL, Gleitman H, Gleitman LR (1977) Mother, I’d rather do it myself: Some effects and non-effects of maternal speech style. In: Talking to Children: Language Input and Acquisition (Snow CE, Ferguson CA, eds.), 109–149. Cambridge: Cambridge University Press. Oliphant M (1999) The learning barrier: Moving from innate to learned systems of communication. Adapt Behav 7: 371–384. Oller DK (2000) The Emergence of the Speech Capacity. Mahwah, N.J.: Lawrence Erlbaum. Pinker S (1984) Language Learnability and Language Development. Cambridge, Mass.: Harvard University Press. Pinker S, Bloom P (1990) Natural language and natural selection. Behavioral and Brain Sciences 13: 707–784. Plunkett K (1995) Connectionist approaches to language acquisition. In: Handbook of Child Language (Fletcher P, MacWhinney B, eds.), 36–72. Oxford: Blackwell. Saffran JR, Aslin RN, Newport EL (1996) Statistical learning by 8-month-old infants. Science 274: 1926–1928. Seidenberg MS (1985) The time course of phonological code activation in two writing systems. Cognition 19: 1–30. Seidenberg MS, MacDonald MC (2001) Constraint satisfaction in language acquisition and processing. In: Connectionist Psycholinguistics (Christiansen MH, Chater N, eds.), 281–318. Westport, CT: Ablex. Shady M, Gerken LA (1999) Grammatical and caregiver cues in early sentence comprehension. J Child Lang 26: 163–175. Shafer VL, Shucard DW, Shucard JL, Gerken LA (1998) An electrophysiological study of infants’ sensitivity to the sound patterns of English speech. J Speech, Lang Hearing Res 41: 874–886. 
Shi R, Werker JF, Morgan JL (1999) Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition 72: B11–B21. Slobin DI, Bever TG (1982) Children use canonical sentence schemas: A crosslinguistic study of word order and inflections. Cognition 12: 229–265. Smolensky P, Mozer MC, Rumelhart DE (eds.) (1996) Mathematical Perspectives on Neural Networks. Mahwah, N.J.: Lawrence Erlbaum. Tomasello M (2000) The item-based nature of children’s early syntactic development. Trends Cognit Sci 4: 156–163. Van Everbroeck E (1999) Language type frequency and learnability: A connectionist appraisal. In: Proceedings of the 21st Annual Conference of the Cognitive Science Society, 755–760. Mahwah, N.J.: Lawrence Erlbaum.
7
Repeated Patterns in Behavior and Other Biological Phenomena
Magnus S. Magnusson

Human environments consist to a large extent of repeated spatiotemporal patterns which are typically composed of simpler patterns. Most humans are thus surrounded by houses, streets, cars, shops, and omnipresent behavior patterns composed of verbal and nonverbal elements. Human individuals are, of course, themselves patterns of parts, such as trunk, head, arms, and legs, that again are composed of simpler parts, and so on, recursively, down to the infinitesimally small. The human individual thus appears as a particular type of repeated pattern immersed in endless numbers of types and instances of other patterns, some man-made and visible, but most neither. This view of human existence is thus in accordance with the words of Francis Crick, one of the discoverers of the structure of DNA: “Another key feature of biology is the existence of many identical examples of complex structures” (Crick, 1989, p. 138). Regarding behavior, the word identical above might preferably be replaced by the word similar, but molecules also have elasticity (Grosberg and Khokhlov, 1997).

Hidden Patterns

Clearly, the production of patterns and their detection in the behavior of others is essential for communication, and such abilities generally increase during both individual development and phylogenetic evolution.

The ability to recognize patterns in the environment is critical for an organism’s survival. It is a prerequisite for tasks including foraging, danger avoidance, mate selection, and, more generally, associating specific responses with particular events and objects (Sinha, 2002, p. 1093).

The following quotation thus concerns a characteristic of behavior which constitutes a difficult but possibly essential problem for behavioral research: “Behavior consists of patterns in time. Investigations of behavior deal with sequences that, in contrast to bodily characteristics, are not always visible” (Eibl-Eibesfeldt, 1970, p. 1; emphasis added).
In these opening words of his Ethology: The Biology of Behavior, Eibl-Eibesfeldt thus defines behavior as temporal patterns that may occur before the very eyes and ears of observers without being (consciously) noticed.

Emergence

Unexpected and hard-to-explain patterning in nature is receiving increased attention, and interdisciplinary studies of emergence and complexity have gained much momentum (see,
e.g., Holland, 1998). Emergence is often exemplified by Bénard cells, which are particular directly visible patterns that may form on the surface of a liquid that is enclosed in an open container and heated from below (see, e.g., Kelso, 1997, p. 7; Solé and Goodwin, 2000, p. 15). An important aspect of such patterns, which is also among the reasons for the existence of emergence studies, is that, given the available understanding of basic processes, they may be impossible to predict and/or explain. But while Bénard cells are visible, this is not necessarily the case for all emergent patterns. Or, in the words of James Crutchfield: “It is rarely, if ever, the case that the appropriate notion of pattern is extracted from the phenomenon itself using minimally biased procedures. Briefly stated, in the realm of pattern formation ‘patterns’ are guessed and then verified” (Crutchfield, 1993; quoted from Solé and Goodwin, 2000, p. 20). The discovery of patterning may thus require the creation of model patterns with corresponding detection procedures, as will be illustrated below.

The study of emergent patterns is closely related to that of self-organization, and emergent patterns in human behavior and interactions are examples of both par excellence. Obviously, before the function and evolution of any pattern, molecular or behavioral, can be understood, the pattern must first be discovered. Two pioneers of human interaction research have repeatedly reminded us that that task does not end with the discovery of any fixed number of fully specified patterns: “. . . a conversation, . . . a complex system of relationships which nonetheless may be understood in terms of general principles which are discoverable and generally applicable, even though the course of any specific encounter is unique (cf. Kendon 1963, Argyle and Kendon 1967)” (Kendon, 1990, p. 4; emphasis added).
Unending creativity and uniqueness must thus be expected, and this whirlwind of new combinations may be characteristic of life and the universe itself (Kaufman, 2000).

Hidden Context and Meaning

What if many complex, repeated behavioral patterns are still hidden from the eyes, ears, and tools of researchers? What if some are essential for the understanding of behavior and communication? Moreover, a hidden pattern could be the context which determines the meaning of the simplest elements.

Aiding the Senses

Only adequate models and tools allow the detection of such patterns, and below a pattern type, called the t-pattern, and a detection algorithm (Magnusson, 1996, 2000a) are briefly described, along with examples of discovered behavioral patterns. T-patterns in other
biological phenomena, such as brain-cell interactions, DNA, and memes, will be discussed briefly.

Toward a Model Pattern

What is a pattern? The broad meaning of the word pattern is indicated by the fact that most mathematicians now define mathematics as the science of patterns (Devlin, 1997). Considering behavior as repeated patterns is a long-standing tradition in the behavioral sciences. For example, linguists and ethologists traditionally deal with repeated temporal patterns in communicative behavior, and radical behaviorism deals with probabilistic real-time contingencies (patterns), also with a focus on repetition. Other branches of behavioral science, such as anthropology, social psychology, and sociology, deal with repeated patterns, such as scripts, plans, routines, strategies, rituals, and ceremonies. The importance of repeated temporal patterns in behavior, whether hidden or obvious, is thus widely accepted. But, more formally, what kinds of patterns are they?

Obvious Versus Hidden Patterns

The underlying hypothesis here is that many hidden behavioral patterns may be structurally similar to some obvious ones. Characteristics of well-known patterns have thus been combined to create a general, scale-independent pattern type. Well-known obvious examples follow:

1. “How are you?” This sequence of words is an intraindividual verbal pattern.
2. “How are you?” “Fine, thank you.” This is an interindividual verbal pattern.
3. Bill says, “Pass me the salt, Jack.” Jack passes him the salt. This is an interindividual mixed verbal and nonverbal pattern.
4. “If . . . then . . . else. . . .” This is a verbal pattern with time slots that may be filled in various ways.

A typical dinner is also such a pattern of acts which themselves are patterns, for example: “takes a seat at a table, takes an appetizer, then a main course, then a dessert, then coffee, and finally stands up.” As in “if . . . then . . .
else,” the number of other acts between the components may vary considerably. Other examples are rhythmic phrases, melodies, and musical themes. Characteristic of these patterns is the particular order of their components and the particular approximate time distances between them; if the distances are too short or too long,
the pattern disappears or becomes strange or even pathological. Whether a melody or a molecule, it can be squeezed and stretched only within critical limits. The limited flexibility is generally such that most well-known patterns would hardly ever recur by chance if their components were distributed randomly and independently, each with its own average frequency. This aspect is of essential importance here because hidden patterns are often impossible to detect on the basis of order alone, due to the great variation in the number of other behaviors occurring between their components. This is especially true if the pattern is complex and/or infrequent, occurring, for example, only twice in the data. But the common argument that only frequent behaviors should be studied seems to neglect the fact that the most important events tend to be rare.

Another important aspect for detection is hierarchical structure, often with many levels, since pattern components may themselves be patterns of still simpler patterns. For example, a common phrase is composed of words that are composed of syllables. And its words may occur in various other phrases and even alone. Similarly, its syllables may occur in other words and possibly alone. A multitude of common rituals, ceremonies, routines, conferences, classes, financial operations, and even genes and genomes seem to correspond to the defining characteristics of this general one-dimensional “flexible” pattern type.

T-Patterns Are Often Hard to See

The slightest presence of behaviors other than those pertaining directly to a t-pattern can make the most regular t-pattern invisible even in the simplest data, as is shown in figure 7.1. Similar difficulties are encountered when searching for such patterns in video recordings of behavior, even after they have been pointed out. Overlapping patterns unfolding over many time scales and modalities may simply be too much to follow. This, combined
Figure 7.1 T-patterns are easily overlooked. The letters a, b, c, d, and k represent occurrences of event types A, B, C, D, and K on a single dimension. The lower axis and its data are identical to the upper one except that occurrences of events of type K have been removed, making the two hidden occurrences of the simple t-pattern (A B C D) appear clearly.
Repeated Patterns in Behavior and Other Biological Phenomena
with the well-known human tendency to see patterns where there are none, calls for improved means of detection.

Derived Types and the T-System

The following are some of the terms which have been derived from the t-pattern type and together form the t-system:

The t-marker is a component of a t-pattern that rarely occurs independently of that pattern and thus indicates its presence (Magnusson and Beaudichon, 1997).

A t-associate (+/-) of a t-pattern Q is not a component of Q, but is a behavior (event type or pattern) that has a significant positive (+) or negative (-) tendency to occur (anywhere) during or near occurrences of Q. It may thus serve as an indicator of the occurrences of its associate pattern.

A t-satellite of Q is a positive t-associate that always and only occurs together with Q, while a t-taboo is a negative t-associate of Q that never occurs with Q.

T-drifters are behaviors belonging to none of the other categories of the system.

A t-pattern with its +/- associates is called a t-packet, and it has an attraction and a repulsion zone around it defined by the occurrence/nonoccurrence of its +/- associates.

T-coverage of a pattern is the total amount of time the pattern is in progress; it is called percent coverage when expressed as a percentage of total observation time.

T-composition is the set of alternating nonoverlapping patterns with the highest combined t-coverage in a given data set.

Origin of the T-System and Theme

The conceptual and algorithmic development behind the t-system and Theme (the t-pattern detection program, Magnusson, 1996, 2000) was initially stimulated by research regarding the structure of behavior and interactions with varying focus on real-time, probabilistic, and functional aspects, as well as hierarchical and syntactic structure, creativity, routines, and planning (notably, Chomsky, 1959, 1965; Cosnier, 1971; Dawkins, 1976; Duncan and Fiske, 1977; Miller et al., 1960; Montagner, 1978; Skinner, 1957; Tinbergen, 1963).
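The t-coverage and percent coverage terms defined in the t-system list above lend themselves to a small worked sketch. The code below is only an illustration, not part of Theme: the function names, the representation of each pattern occurrence as an inclusive (start, end) pair of discrete time units, and the choice to count overlapping occurrences only once are all assumptions made here.

```python
from typing import List, Tuple

# One pattern occurrence as (start, end) in basic time units, both inclusive.
# This representation is an assumption made for the sketch.
Interval = Tuple[int, int]


def t_coverage(occurrences: List[Interval]) -> int:
    """Number of time units during which the pattern is in progress.

    Overlapping occurrences are counted once (an assumption; the text
    does not say how overlaps should be treated).
    """
    total = 0
    covered_until = None  # last time unit already counted
    for start, end in sorted(occurrences):
        if covered_until is None or start > covered_until:
            total += end - start + 1
            covered_until = end
        elif end > covered_until:
            total += end - covered_until
            covered_until = end
    return total


def percent_coverage(occurrences: List[Interval], t1: int, t2: int) -> float:
    """T-coverage as a percentage of the observation period [t1, t2]."""
    return 100.0 * t_coverage(occurrences) / (t2 - t1 + 1)


if __name__ == "__main__":
    # Hypothetical occurrences, purely for illustration.
    example = [(1, 10), (5, 14), (21, 30)]
    print(t_coverage(example), percent_coverage(example, 1, 100))
```

Counting the observation period as t2 - t1 + 1 units mirrors the discrete time scale used in the Method section.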
Method

The t-pattern detection algorithm, which performs a fully automatic search for t-patterns, is based on a formal definition of t-patterns relative to a particular data structure, the t-data set.
T-Data Sets

Each of the behaviors or acts that may occur in a pattern is here called a behavior type. When the actor is also specified, together with whether the point is the beginning or the ending of the behavior, the term event type is used. For example, “Bill begins walking” (or, in short form: bill, b, walk) is an event type which may also be further qualified (e.g., bill, b, walk, fast). The behavior is coded in terms of the occurrence times of such beginnings and endings (points) on a discrete time scale. Each beginning and/or ending thus either occurs or does not occur at a discrete time point. Any number of event types (involving any number of actors) may occur at the same discrete time point (i.e., basic time unit). The occurrences of each event type within the continuous observation period(s) thus constitute a time point series or process (see, e.g., Daley and Vere-Jones, 1988). The real-time behavior record is thus a data set consisting exclusively of such series of occurrence times (i.e., a multivariate point process) and a specification of the observation period(s). Below, all definitions of t-patterns and any derived terms refer exclusively to such data sets (see example in figure 7.2). It goes without saying that all results still depend on insightful choice of categories and careful coding.

T-Pattern Definition

The following notation expresses more formally the general structure of any given t-pattern with $m$ components:

$$X_1 \approx dt_1 \; X_2 \approx dt_2 \; \cdots \; X_i \approx dt_i \; X_{i+1} \; \cdots \; X_{m-1} \approx dt_{m-1} \; X_m$$

The $X_1 \ldots X_m$ terms stand for pattern components, which may be either event types or other t-patterns (recursive definition). The $\approx dt_1 \ldots \approx dt_{m-1}$ terms stand for the approximate characteristic distances between the consecutive components. The general term $X_i \approx dt_i \; X_{i+1}$ thus means that component $X_i$ is followed within the approximate characteristic time distance $\approx dt_i$ by component $X_{i+1}$.
That is, over a given number of occurrences of a pattern within a given observation period, each $\approx dt_i$ varies within an interval given by its lowest and highest values, here noted as $[d_{1i}, d_{2i}]$. The general term $X_i \approx dt_i \; X_{i+1}$ may thus be rewritten as

$$X_i \, [d_{1i}, d_{2i}] \, X_{i+1}$$

which means that component $X_i$ is followed within time window $[d_{1i}, d_{2i}]$ by component $X_{i+1}$.

T-Patterns as Critical Interval Trees

For detection purposes, binary tree representations of any t-pattern can be obtained by splitting the t-pattern into two parts (left and right) and then, recursively, splitting each
side down to the terminal event type level. For longer patterns this can be done in numerous ways. (Note also that any subpattern or branch of a t-pattern may occur more frequently than [i.e., independently of] the full pattern.) The first rather loose t-pattern definition can then be replaced by a more restricted definition of the t-pattern as a binary tree of critical intervals, each relating a left (preceding) and a right (following or concurrent) part. In this way, any given t-pattern (and any of its subpatterns) can be written as a pair of components related by a characteristic (or critical) interval:

$$X_{\text{left}} \, [d_1, d_2] \, X_{\text{right}}$$

Here, $X_{\text{left}}$ stands for the first part, ending at $t$, which is followed within the critical interval $[t + d_1, t + d_2]$ by the beginning of the latter part, $X_{\text{right}}$, where $0 \le d_1 \le d_2$. (The $t$ is implicit in $[d_1, d_2]$, but omitted to simplify notation.)

The T-Pattern Search Algorithm

In behavior records of a moderate size (e.g., 100 event types, each occurring at least twice), the number of possible patterns involving, for example, ten event types is astronomical; since both sequence and interval length variation are considered, it is far greater than $100^{10}$. Even for much smaller data sets the number can be staggering. Trying out all possible sequences of all possible lengths is clearly not an option. Instead, the proposed search algorithm can be said to reverse the above top-down recursive splitting of a given t-pattern with known critical intervals, ending with event types as the (linear) string of leaves (or terminals) of a binary tree of critical intervals. The algorithm thus begins with only a data set of event type series possibly containing t-patterns, and it attempts to construct (detect) such binary t-pattern trees.
Rather than trying out all possible combinations, it works bottom-up, level by level, first searching for the simplest possible t-patterns, which at the lowest hierarchical level are pairs of directly coded event types having a critical interval relationship. This relationship, a case of $X_{\text{left}} \, [d_1, d_2] \, X_{\text{right}}$, is detected by a special algorithm which considers all possible pairs of components as possible $X_{\text{left}}$, $X_{\text{right}}$ parts. It thus measures the time distances from each occurrence of $X_{\text{left}}$ to the first following or concurrent occurrence of $X_{\text{right}}$. Using this distribution, it searches for the longest possible interval $[d_1, d_2]$ such that $X_{\text{left}}$ (ending at $t$) is, significantly more often than expected by $h_0$, followed within $[t + d_1, t + d_2]$ by the beginning of another component ($X_{\text{right}}$). Here $h_0$ is the null hypothesis that $X_{\text{right}}$ is independently and randomly distributed over the observation period $[t_1, t_2]$ with a constant probability per time unit, $N(X_{\text{right}})/(t_2 - t_1 + 1)$, where $N(X_{\text{right}})$ is the number of occurrences of $X_{\text{right}}$.
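The critical interval search just described can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the Theme implementation: the function names are hypothetical, the significance test is taken to be a one-sided binomial tail using the per-unit probability given in the text, candidate interval limits are restricted to observed distances, and a minimum of two supporting occurrences is required. The toy series at the end are invented only to exercise the code.

```python
from bisect import bisect_left
from math import comb
from typing import List, Optional, Tuple


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def first_following_distances(left: List[int], right: List[int]) -> List[int]:
    """Distance from each left occurrence to the first following or
    concurrent right occurrence (left occurrences with none are skipped)."""
    right_sorted = sorted(right)
    distances = []
    for t in left:
        i = bisect_left(right_sorted, t)
        if i < len(right_sorted):
            distances.append(right_sorted[i] - t)
    return distances


def find_critical_interval(
    left: List[int], right: List[int], t1: int, t2: int, alpha: float = 0.005
) -> Optional[Tuple[int, int, int, float]]:
    """Longest [d1, d2] in which left is followed by right significantly
    more often than expected under h0; returns (d1, d2, hits, p) or None."""
    p_unit = len(right) / (t2 - t1 + 1)  # h0: constant probability per time unit
    distances = first_following_distances(left, right)
    candidates = sorted(set(distances))  # assumption: limits are observed distances
    best = None
    for i, d1 in enumerate(candidates):
        for d2 in candidates[i:]:
            hits = sum(1 for d in distances if d1 <= d <= d2)
            # Probability of at least one right occurrence in a window of this length.
            p_window = 1 - (1 - p_unit) ** (d2 - d1 + 1)
            p_value = binomial_tail(len(left), hits, p_window)
            if hits >= 2 and p_value < alpha:
                if best is None or (d2 - d1) > (best[1] - best[0]):
                    best = (d1, d2, hits, p_value)
    return best


if __name__ == "__main__":
    # Hypothetical series: B tends to follow A after 3 or 4 time units.
    a_times = [10, 110, 210, 310, 410]
    b_times = [13, 114, 213, 314, 413]
    print(find_critical_interval(a_times, b_times, t1=1, t2=500))
```

The exhaustive double loop over candidate limits is chosen for readability; it grows quadratically with the number of distinct distances and is not meant to reflect how a production program would organize the search.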
When such critical interval relationships are found, the algorithm connects the critically related instances of each of the two components and adds them to the data as the occurrences of a newly detected t-pattern, which later in the process may become a (left or right) component in a more complex pattern. Gradually, longer patterns may thus be detected as patterns of already detected patterns. As indicated above, each t-pattern of some length (m > 2) may be represented as a binary tree in various ways: for example, ABCD as ((A (B C)) D), ((A B)(C D)), (((A B) C) D), and so on. Similarly, complex t-patterns existing in the data may be detected (constructed) in many different ways, and this can easily lead to numerous partial and/or redundant detections of the same underlying patterns. The primary objective here is to discover the most complete (most complex, and thus a priori most unlikely) t-patterns; therefore, all detected patterns are automatically compared with all the others, and patterns that occur only as parts of more complete (complex) patterns are dropped.

The detection process stops when no more critical relationships can be found, given the specified significance level. At very low significance levels none are found. At a higher level (an approximate “ideal” level, often near 0.005) all the most complex patterns are detected, and at still higher levels the same patterns are more redundantly discovered as more and more binary trees become significant for each underlying pattern (see Magnusson, 2000). Through this process of pattern growth (construction) and competition for maximum completeness, complex patterns often evolve. They constitute the output of the search process and are typically invisible to unaided observers.
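The remark above that a t-pattern with more than two components can be written as a binary tree in several ways can be made concrete with a short enumeration. This is only an illustrative sketch; the function name and the nested-tuple representation of trees are assumptions made here and say nothing about how Theme stores patterns.

```python
from typing import Iterator, Sequence


def binary_trees(components: Sequence[str]) -> Iterator[object]:
    """Yield every binary-tree bracketing of an ordered component sequence.

    A single component is yielded as itself; larger trees are yielded as
    nested (left, right) tuples that keep the original left-to-right order.
    """
    if len(components) == 1:
        yield components[0]
        return
    for split in range(1, len(components)):
        for left in binary_trees(components[:split]):
            for right in binary_trees(components[split:]):
                yield (left, right)


if __name__ == "__main__":
    for tree in binary_trees(("A", "B", "C", "D")):
        print(tree)
```

For the four-component pattern ABCD the enumeration yields five trees, among them the three written out in the text. The count grows quickly with pattern length, which is one way to see why the same underlying pattern can be detected redundantly.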
Statistical Methods and T-Patterns

The initial t-pattern algorithms (Magnusson, 1982, 1983, 1988) were developed after carefully considering the use of standard statistical methods for behavior analysis (see, e.g., Colgan, 1978; Monge and Cappella, 1980; Scherer and Ekman, 1982). Such methods, which are implemented in the major statistical software packages and in some specialized behavior analysis software (for example, Bakeman and Quera, 1995; Noldus, 1991), do not allow the detection of complex t-patterns and were not developed for that task. Actually, none of the following essential elements are provided: the t-pattern definition, automatic critical interval detection, multilevel bottom-up pattern construction, and completeness competition. The Theme t-pattern detection program (Magnusson, 1996, 2000) is thus quite different from these, but has some similarity with the so-called evolution programs (Michalewicz, 1996).

Research Application

T-pattern detection can have two quite different aims. One is to detect effects of external (experimental, independent) variables on behavior. It has been shown that various aspects
of t-patterns, such as the number and types of behaviors and actors involved, may vary strongly with independent variables even when no such effects on event type frequencies or durations are found using traditional statistical methods. A different use of t-pattern detection is aimed at the deepest possible understanding of the structure of each stream of behavior or interaction, but many studies have involved both approaches. (See, e.g., Beaudichon et al., 1991; Blanchet and Magnusson, 1988; De Roten, 1999; Grammer et al., 1998; Hirschenhauser et al., 2002; Lyon et al., 1994; Magnusson, 2000b, 2003; Magnusson and Beaudichon, 1997; Martaresche et al., 2000; Martinez et al., 1997; Merten, 2001; Montagner et al., 1990; Schwab, 2000; Sevre-Rousseau, 1999; Sigurdsson, 2000; Tardif and Plumet, 2000; Tardif et al., 1995.) In particular, t-patterns may reveal cycles not present in any of their component series (Magnusson, 1989). Attempts have been made to represent the structure of particular types of encounters in terms of a kind of (flowchart, graph) “grammar” based on the t-patterns detected within them (Duncan, 2000). (For other references and information concerning t-patterns and Theme, see www.hi.is/~msm, www.patternvision.com, and www.noldus.com).

Results

A few t-patterns detected in different types of human interactions will be presented here. The main purpose is to show that complex t-patterns may be hidden in behavior and that they can be detected with the t-pattern algorithm. All critical intervals of all presented patterns were significant at 0.005 or lower, and only far simpler nonsense patterns were found when the data were randomized before the search. The randomization of a whole data set here consists of simply replacing the occurrence series of each event type with a series containing the same number of points dispersed randomly over the observation period.
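The randomization control described above is simple enough to sketch directly. The code below is an illustration under assumptions, not the procedure as implemented in Theme: the function name and the dictionary layout (event type mapped to a list of occurrence times) are hypothetical, and points within one series are drawn without repetition because each event type either occurs or does not occur at a given discrete time point.

```python
import random
from typing import Dict, List, Optional


def randomize_data_set(
    data: Dict[str, List[int]], t1: int, t2: int, seed: Optional[int] = None
) -> Dict[str, List[int]]:
    """Replace each occurrence series with the same number of points
    dispersed uniformly at random over the observation period [t1, t2]."""
    rng = random.Random(seed)
    period = range(t1, t2 + 1)
    return {
        event_type: sorted(rng.sample(period, len(times)))
        for event_type, times in data.items()
    }


if __name__ == "__main__":
    # Hypothetical two-series data set, for illustration only.
    toy = {"bill,b,walk": [3, 40, 77], "ann,b,talk": [5, 42]}
    print(randomize_data_set(toy, t1=1, t2=100, seed=0))
```

Running the same search on such a randomized copy gives a baseline for how many, and how complex, patterns arise when only the event type frequencies are preserved.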
Reading the T-Pattern Diagrams

Figure 7.2 shows the data set in which the pattern presented in figure 7.4 was detected. Each horizontal line of points in figure 7.2 thus shows the occurrence times of one of its 53 event types. The pattern in figure 7.3 was detected in an equally opaque data set (not shown, to save space). The three-box t-pattern diagram, as shown in figures 7.3 and 7.4, was created for the visualization of various aspects of detected t-patterns, especially the way in which they were gradually detected, bottom-up and level by level. The focus is thus on the hierarchical critical interval relationships between the occurrence series of the event types that
[Figure 7.2 plot: vertical axis, Series (1 to 53); horizontal axis, Seconds (0 to 1506).]
Figure 7.2 This figure shows the real-time behavior record or data set with 53 occurrence time series, which resulted from the coding of just over 25 min of two children’s collaborative problem solving. Time is in seconds.
make up the t-pattern. It also shows the way in which particular points in each series are connected to form each instance of the pattern. There are three main boxes.

The Top-Left Box shows all the event types (i.e., $X_1 \cdots X_m$) of the pattern and how they are gradually connected, level by level, into the full binary tree t-pattern. For example, in figure 7.4, at the first level, (2) connects to (3), forming pattern (2 3), and (4) connects to (5), forming pattern (4 5). At the second level two patterns are also formed: (1) connects to pattern (2 3), forming pattern (1 (2 3)), and pattern (4 5) connects to (6), forming pattern ((4 5) 6). Finally, at the third level the patterns (1 (2 3)) and ((4 5) 6) are connected to form the full pattern shown in figure 7.4: (1 (2 3)) ((4 5) 6).

The Top-Right Box. Immediately to the right of each event type in the top-left box, the occurrence series (from the data set) is shown. Connection lines also reveal how the particular critically related occurrences of the event types and/or subpatterns are connected,
Figure 7.3 Interactive t-pattern detected in two five-year-olds, X and Y, playing for 13 min with a picture viewer. B = begins; E = ends. Behaviors are automanipulate = fiddle with something without watching it; haveviewer = have the viewer; order,viewer = order the other to give up the viewer; view,long = look in the viewer for >3 seconds; lookat,partner = looks at the other; lookat,picturecard = looks at a picture card that’s not in the viewer; manipulate,viewer = manipulates the viewer. Time is in video frames, 1/15 s.
level by level, to gradually form the complete pattern. (In this box, occurrences of subpatterns that sometimes occur outside the full pattern are also shown.)

The Lower Box shows the occurrences of the full t-pattern tree on the real-time axis in a manner similar to the lower part of figure 7.1, but without the letters. Note that when event types occur simultaneously within a pattern, lines overlap and the branching becomes invisible, but can still be seen in the top-right box.

Pattern Example 1

This interactive pattern (figure 7.3) was found in a 13-min dyadic interaction between two five-year-old children who took turns playing with a picture viewer and a few picture cards
Figure 7.4 Interactive t-pattern between two five-year-old children, E and N. Only beginnings (B) were coded. The event types in the pattern are (top-left box): (1) E gives an order (ORD) regarding the task (TAC); (2) N provides information (FOU) regarding the task, nonverbally (NV). (3) N provides information regarding the task, verbally (default). (4) N asks a question (QUE) regarding a solution rule (REG), nonverbally. (5) E makes a positive evaluation (EVP) regarding the task, talking to herself (S). (6) = (1). Time is in seconds.
(Magnusson, 1996). Their behavior was coded using a preexisting list of categories (McGrew, 1972) with a few additions related to the particular situation. Unexpectedly, a very regular t-pattern with 25 event types was found, and the total duration of its four occurrences was >90 percent of the total observation time. However, no verbal acts had then been coded. When the occurrences of the verbal act “(begins) order the other to give up the viewer” (i.e., b, order, viewer) were tentatively coded for both children, the t-pattern shown in figure 7.3 was discovered. (Only the beginning was coded, due to the short event duration.) It is not the most complex pattern detected in this interaction, but it is presented here in relation to the point made above: that the meaning or function of a simple behavior may depend on its relationship to a (here multimodal) hidden pattern. The “order, viewer” behavior of one of the two children is the fifth event type of this pattern, (5) in figure 7.3, and clearly may be left out without noticeably affecting this t-pattern or its detection. The causal effect of “order, viewer” is therefore in doubt, and each pattern occurrence is predictable from well before the “order, viewer” behavior occurs until the final event type (22). However, the event type “x, b, order, viewer” also occurs twice outside the pattern (see figure 7.3), in both cases possibly a bit too early to be effective. In any case, it seems likely that expectations are building up in each child relative to the “meaning” of each act performed by the other within this repetitive and highly patterned context.

Pattern Example 2

The interactive pattern example shown in figure 7.4 was found in one of the dyadic interactions coded in a study of children’s collaborative (dyadic) problem solving (see, notably, Beaudichon et al., 1991; Magnusson and Beaudichon, 1997). A total of 538 occurrences of 53 different verbal and nonverbal event types occurred in this particular 25:07 min dyadic interaction between children E and N (see the data set in figure 7.2). As can be seen in figure 7.4, the three occurrences of the pattern are in progress during most of this 25-min interaction. Each pattern occurrence consists of the following acts (where b stands for “begins”; only the beginning points of these brief acts were coded):

1. E, b, ord, tac: E gives an order (ord) concerning task (tac), and is followed within 5 to 7 s by
2. N, b, fou, tac, nv and, simultaneously,
3. N, b, fou, tac: N provides information (fou) concerning the task both nonverbally (nv) and verbally (default). Then, each time (!), 4:00 to 4:04 min later,
4. N, b, que, reg, nv: N asks a question (que) regarding a solution rule (reg) nonverbally (nv), followed 14–18 s later by
5. E, b, evp, s: E makes a positive evaluation (evp) of task performance, talking to herself (s). Finally, 2:02 to 2:05 min later,
6. E, b, ord, tac: E again, as in (1), gives an order concerning task execution (actually a series of orders, as can be seen in figure 7.4), and the whole pattern (again) follows.

(1) and (6) are thus the same, but different instances are involved. This pattern, which involves 5 of the 53 series in the data set, thus revealed a deep, unexpected, and invisible temporal structure in this complex 25-min encounter.

Discussion

Hidden t-patterns seem common in human behavior and interactions, and interacting humans tend to construct complex patterns and then repeat them in a similar way within
each interaction. The production and perceptual detection of such patterns may well constitute important social skills to be considered in studies of, for example, human social handicaps and “social” robotics.

Genes and Genomes as t-Patterns

The so-called backbone of the DNA molecule is a cyclical structure of alternating molecules (somewhat like a ticking clock or markings on a scale) with a base pair occurring at each cycle. Each gene is a particular pattern of such base pairs along the DNA molecule (but see, e.g., Keller, 2000), and in complex organisms such as Drosophila and humans, each gene is composed of two alternating patterns called introns and exons. When a gene is expressed as a protein, only the exons are used, but the noncoding introns define distances between the exons characteristic for the gene much as the characteristic distances ($\approx dt$) separate the components (event types or patterns) in t-patterns. The DNA gene sections are again separated by noncoding sections, so the whole genome can be seen as a massively repeated t-pattern within the organism, influencing all its functioning. Introns are not present in bacterial genes (Griffiths et al., 1999, p. 33), and it is tempting to ask whether an analogy might exist in the evolution of behavioral and communication patterns. A search for t-patterns in DNA, RNA, and proteins is now in progress in collaboration between the author and researchers at the Musée de l’Homme, Paris, Pattern Vision Ltd., and the University of Paris VII (Icelandic Research Center, grants: 013220001 and 013220002).

T-Patterns, Writing, and Memes

Writing transformed vocal verbal behavior into relatively durable objects independent of the producers, and thus created a revolution in human behavioral possibilities, especially after the invention of the printing press, without which modern science and technology would hardly exist.
Through writing, speech sound t-patterns are translated from the single dimension of time to that of a string of symbols on a page that, like the DNA molecule, are much more durable than the sounds. These types of relatively durable strings thus bring about, from within and from outside, some approximately predictable effects on individual behavior. A multitude of “cultural genes” or memes (notably, Blackmore, 1999) seems to depend on relatively durable t-patterns. Bibles, constitutions, and many other standard word sequences are examples of such “t-meme” objects that influence human communities in somewhat the same way as molecular sequences (genomes, genes, proteins, or pheromones) influence organisms or insect societies. Noticeably, like cells, different categories of human individuals are known to focus on or use different (sections of) standard texts.
Toward Pattern Bases

For studies of evolution, databases of detected behavioral patterns need to be easily accessible, for example through the Internet, as is already the case for molecular sequences within genetics (see, e.g., Attwood, 1999; Gibas and Jambeck, 2001). The creation of such a pattern base is in preparation in the context of a collaborative project between the author and Dr. Benjamin Isaac Arthur, Jr., at the Biology Department of Brandeis University, where numerous t-patterns have just been discovered in Drosophila courtship interactions.

Brain Behavior

Since the brain provides the moment-to-moment control of human behavior, it seems reasonable to guess that the temporal organization of its activity might be at least somewhat similar to that of behavior. Or, in the words of Scott Kelso, “In fact, the claim of the floor is that both overt behavior and brain behavior, properly construed, obey the same principles” (Kelso, 1997, p. 28). Recent technological developments now allow concurrent registration of multiple related brain cells (see, e.g., Rieke et al., 1997), and thus the possibility exists of finding t-patterns in cell interactions. The very first such search has been carried out. A multitude of complex intercell t-patterns was detected, but further systematic study is in progress (in collaboration between the author and Dr. Alister U. Nicol, Laboratory of Cognitive Neuroscience, Babraham Institute, Cambridge, U.K.).

Facing Behavioral Complexity

Behavioral scientists have had their hands full with the study of directly visible/audible behavior, and have paid less attention to hidden repeated patterns, probably in part due to the rarity of adequate models and tools (a kind of vicious circle). At least within psychology the situation has not been favorable: “Only about 8% of all psychological research is based on any kind of observation. A fraction of that is programmatic research.
And, a fraction of that is sequential in its thinking” (Bakeman and Gottman, 1997, p. 184). Within social psychology, a similar situation has prevailed regarding the temporal aspect of behavior (McGrath, 1988). And ethology students are still taught little about structural analysis except the simplest kinds of sequential analysis, which easily miss the rich complexity of behavior and often produce more frustration than insight. One wonders what might be the state of, for example, chemistry under similar constraints.

Computers versus Nervous Systems

Computers already turn out to be inferior or superior to humans, depending on the nature of the task. Thus, for example, highly regular t-patterns easily escape the attention of
humans, while a relatively simple special-purpose algorithm can find complex t-patterns even when large numbers of other behaviors occur in parallel. Still, the t-pattern type seems to correspond to a large class of biological patterns that are especially common in communicative behavior. Is the nervous system constantly more or less overloaded as it simultaneously considers too many possibilities? And how much is it possibly detecting at the unconscious level?

Conclusion

The creation of new model patterns with corresponding detection algorithms is needed to allow new insights into the hidden complexity of behavior and communication processes. This will undoubtedly continue to require considerable interdisciplinary collaboration, including the relatively new fields of complexity and bioinformatics. I hope that a kind of future “ethomatics” will help bring to light the true complexity and evolution of biological communication systems.

References

Attwood TK (1999) Introduction to Bioinformatics. New York: Prentice-Hall. Bakeman R, Gottman JM (1997) Observing Interaction: An Introduction to Sequential Analysis. Cambridge: Cambridge University Press. Bakeman R, Quera V (1995) Analyzing Interaction: Sequential Analysis with SIDS and GSEQ. New York: Cambridge University Press. Beaudichon J, Legros S, Magnusson MS (1991) Organisation des régulations inter et intrapersonnelles dans la transmission d’informations complexes organisées. Bull Psych 44 (399), spec iss: 110–120. Les Processus de Contrôle dans la Résolution de Tâches Complexes: Développement et Acquisition. Blackmore S (1999) The Meme Machine. New York: Oxford University Press. Blanchet A, Magnusson MS (1988) Processus cognitifs et programmation discursive dans l’entretien de recherche. Psych Française 33 (1/2): 91–98. Chomsky N (1959) Review of Skinner’s Verbal Behavior. Language 35: 26–58. Chomsky N (1965) Syntactic Structures. The Hague: Mouton. Colgan PW (ed.) (1978) Quantitative Ethology.
New York: John Wiley and Sons. Cosnier J (1971) Clefs pour la Psychologie. Paris: Seghers. Crick FHC (1988) What Mad Pursuit: A Personal View of Scientific Discovery. New York: Basic Books. Crutchfield J (1993) The calculi of emergence: Computation, dynamics and induction. Santa Fe Institute Working Paper no. 94-03-016. Daley DJ, Vere-Jones D (1988) An Introduction to the Theory of Point Processes. Berlin: Springer-Verlag. Dawkins R (1976) Hierarchical organisation: A candidate principle for ethology. In: Growing Points in Ethology (Bateson PP, Hinde RA, eds.). Cambridge: Cambridge University Press. de Roten Y (1999) L’Interaction mère–enfant dans la narration d’un événement d’ordre émotionel. Doctoral thesis, Psychology Department, Faculty of Psychology and Educational Sciences, University of Geneva.
Devlin KJ (1997) Mathematics, the Science of Patterns: The Search for Order in Life, Mind and the Universe. New York: Scientific American Library. Duncan S (2000) Analyzing family interaction in real time: Structure and strategy. Communication presented at the workshop “Behavior and Time,” University of Iceland, July 29–30, 2000. Duncan SD, Fiske DW (1977) Face-to-Face Interaction: Research, Methods and Theory. Hillsdale N.J.: Lawrence Erlbaum. Eibl-Eibesfeldt I (1970) Ethology: The Biology of Behavior. New York: Holt, Rinehart and Winston. Gibas C, Jambeck P (2001) Developing Bioinformatics Computer Skills. Sebastopol, Calif.: O’Reilly. Grammer K, Kruck KB, Magnusson MS (1998) The courtship dance: Patterns of nonverbal synchronization in opposite-sex encounters. J Nonverb Behav 22 (1): 3–29. Griffiths AJF, Gelbart WM, et al. (1999) Modern Genetic Analysis. New York: W.H. Freeman. Grosberg AI, Khokhlov AR (1997) Giant Molecules. New York: Academic Press. Hirschenhauser K, Frigerio D, Grammer K, Magnusson MS (2002). Monthly patterns of testosterone and behavior in prospective fathers. Hormones Behav 42: 172–181. Holland JH (1998) Emergence: From Chaos to Order. Reading, Mass.: Addison-Wesley. Kaufman S (2000) Investigations. Oxford: Oxford University Press. Keller EF (2000) The Century of the Gene. London: Harvard University Press. Kelso JAS (1997) Dynamic Patterns: The Self-Organization of Brain and Behavior. Cambridge, Mass.: MIT Press. Kendon A (1990) Conducting Interaction: Patterns of Behavior in Focused Encounters. Cambridge and New York: Cambridge University Press. Lyon M, Lyon N, Magnusson MS (1994) The importance of temporal structure in analyzing schizophrenic behavior: Some theoretical and diagnostic implications. Schiz Res 13: 45–56. Magnusson MS (1982) Temporal configuration analysis: Detection of an underlying meaningful structure through artificial categorization of a real-time behavioral stream. 
Paper presented at workshop on artificial intelligence, University of Uppsala. Magnusson MS (1983) Theme and syndrome: Two programs for behavior research. In: Symposium in Applied Statistics (Edwards D, Hoeskuldsson A, eds.). Copenhagen: NEUCC, RECKU, and RECAU. Magnusson MS (1988) Le Temps et les patterns syntaxiques du comportement humain: Modèle, méthode et programme THEME. Rev Cond Trav La qualité de lavie scolaire, 284–314. Hors série. Magnusson MS (1989) Structure syntaxique et rythmes comportementaux: Sur la détection de rythmes cachés. Sci Tech Anim Lab 14 (2): 143–147. Magnusson MS (1996) Hidden real-time patterns in intra- and inter-individual behavior: Description and detection. Eur J Psych Assess 12 (2): 112–123. Magnusson MS (2000a) Discovering hidden time patterns in behavior: T-patterns and their detection. Behav Res Meth Instr Comp 32 (1): 93–110. Magnusson MS (2000b) Diagnostic possibilities of behavioral time structure analysis: Discovering group differences through statistical analysis of detected T-patterns. Paper presented at “Measuring Behavior 2000,” 3rd International Conference on Methods and Techniques in Behavioral Research, Nijmegen, The Netherlands, August 15–18, 2000. Abstract downloaded on November 5, 2001, from www.noldus.webaxxs.net/events/mb2000/program/abstracts/magnusson2.html. Magnusson MS (2003) Analyzing complex real-time streams of behavior: Repeated patterns in behavior and DNA. In: L’Éthologie Appliquée Aujourd’hui (Baudoin C, ed.), vol. 3, Ethologie Humaine. Levallois-Perret, France: Editions ED. Magnusson MS, Beaudichon J (1997) Détection de “marqueurs” dans la communication référentielle entre enfants. In: Conversation, Interaction et Fonctionnement Cognitif (Bernicot J, Caron-Pargue J, Trognon A, eds.). Nancy: Presses Universitaires de Nancy.
128
Magnus S. Magnusson
Martaresche M, Le Fur C, Magnusson MS, Faure JM, Picard M (2000) Time structure of behavioral patterns related to feed pecking in chicks. Physiol Behav 70 (5): 443–451. Martinez M, Forns M, Boada H (1997) Estudio longitudinal de la comunicación referencial en niños de 4 a 8 anos. Anuario Psicol (75): 37–58. McGrath J (1988) The Social Psychology of Time. London: Sage. McGrew, WC (1972) An Ethological Study of Children’s Behavior. London: Academic Press. Merten J (2001) Beziehungsregulation in Psychotherapien. Maladaptive Beziehungsmuster und der Therapeutische Prozess. Stuttgart: Kohlhammer. Michalewicz Z (1996) Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag. Miller GA, Galanter E, Pribram KH (1960) Plans and the Structure of Behavior. New York: Henry Holt and Company. Monge PR, Cappella JN (eds.) (1980) Multivariate Techniques in Human Communication Research. New York: Academic Press. Montagner H (1978) L’Enfant et la Communication. Paris: Stock/Pernoud. Montagner H, Magnusson MS, Casagrande C, Restoin A, Bel JP, Hoang PNM, Ruiz V, Delcout S, Gauffier G, Epoulet B (1990) Une Nouvelle Méthode pour l’étude des organisateurs de comportement et systèmes d’interaction du jeune enfant. Psych Enfant 33 (2): 391–456. Noldus LPJJ (1991) The Observer: A software system for collection and analysis of observational data. Behav Res Meth Instr Comp 23: 415–429. Rieke F, Warland D, et al. (1997). Spikes: Exploring the Neural Code. London: MIT Press. Scherer KR, Ekman P (eds.) (1982) Handbook of Methods in Nonverbal Behavior Research. Cambridge and Paris: Cambridge University Press and Maison des Sciences de l’Homme. Schwab F (2000) Affektchoreographien. Eine evolutionspsychologische analyse von grundformen mimischaffektiver interaktionsmuster. Dissertation, Empirical Human Sciences, University of Saarland. 
Sevre-Rousseau S (1999) Les competences sociales des enfants sourds-aveugles: Influences de l’interlocuteur et du contexte sur les échanges interpersonnels. Doctoral thesis, Department of Developmental Psychology, University of Paris V—René Descartes, Human Sciences, Sorbonne. Sigurdsson T (2000) Relation de Tutelle entre Parents et Enfants Handicapés Mentaux de Quatre a Six Ans. Lille: Presses Universitaires de Septentrion. Sinha P (2002) Recognizing complex patterns. Nat Neurosci 5 (supp) (November): 1093–1097. Skinner BF (1957) Verbal Behavior. New York: Appleton-Century-Crofts. Solé R, Goodwin B (2000) Signs of Life: How Complexity Pervades Biology. New York: Basic Books. Tardif C, Plumet MH (2000) La Détection des répertoires d’interaction sociale propres à chaque enfant autiste: Enjeux pour la recherche et la clinique. In: Autisme: Perspectives Actuelles (Gérardin-Collet V, Riboni C, eds.). Nancy: L’Harmattan-IRTS. Tardif C, Plumet MH, Beaudichon J, Waller D, Bouvard M, Leboyer M (1995) A micro-analysis of social interactions between autistic children and normal adults in semi-structured play situations. Internat J Behav Devel 18: 727–747. Tinbergen N (1963) On the aims and methods of ethology. Zeit Tierpsych 20: 410–433.
IV
ANIMAL COMMUNICATION SYSTEMS: A COMPARATIVE BASIS
8
Social Processes in the Evolution of Complex Cognition and Communication
Charles T. Snowdon
Introduction
We readily accept many parallels between the behavior of ourselves and our nonhuman primate cousins. Chimpanzees, as well as lions and wolves, hunt cooperatively. Cultural behavior is evident in the differing patterns of tool use among chimpanzees, and no longer do we conceive of toolmaking and tool use as uniquely human. Apes, monkeys, and even sheep can reconcile after disputes, and many species appear to display empathy. Great apes show an understanding of the knowledge of companions and use this in teaching, deception, and manipulation of others. Yet language is different. Nothing that nonhuman animals can do approaches the complexity of our vocabulary; our grammar; the concepts and ideas, both concrete and abstract, that we can express; our playfulness and creativity as we devise new words like Xerox and fax; as we pun, write poetry or novels, or express our love for another. So is it not foolish even to think about language as a part of evolution? How can something so complex, so expressive, so unique have evolutionary origins? Surely it is language that defines us as human. Language is the vehicle that has made possible all other human accomplishments: social organization and social relationships, agriculture and industry, the fantastic buildings and other structures that we have created.
For an evolutionary biologist the fact of language creates a potential conundrum. Is language a special creation unique to our species (in which case we must accept the possibility of other forms of special creation), or can we develop and support arguments for language as an evolved behavior like all other behavior? Even if we can imagine some arguments favoring the possible evolution of language, how can we move plausibly from the relative simplicity of the communication of our nonhuman primate relatives to the complexity of human language? In some ways this is a false issue.
Each species has a unique set of signals that it uses for communication. Each species has a unique set of social and environmental circumstances to deal with, and therefore it should not be surprising to find that communication is species-unique. However, an understanding of evolution also leads to expectations of finding commonalities in how signals are formed: with respect to vocal communication, some form of vibrating organ in the throat, and influences of oral and nasal cavities and stoppage by tongue, teeth, or lips; how signals are transmitted through the environment and perceived by others; and commonalities in how different species use communication to manage social living. Evolutionary theory presents us with two types of models: diverging processes, by which argument those species closest in evolution should share the most features, and
converging processes, by which argument species that share similar environmental or social problems might have similar features. By the arguments of diverging evolution, the great apes are the nearest human neighbors, with chimpanzees and bonobos being closer than gorillas or orangutans. Let us first examine the communication of great apes.
The Silence of the Apes
In the South American rain forest, noise is everywhere. From insects to frogs to birds to monkeys, every organism communicates, many through elaborate vocalizations. I can locate a group of pygmy marmosets, the world’s smallest monkeys, weighing about 120 g fully-grown and with highly cryptic coloration, simply by listening for their distinctive vocalizations. On the other hand, when I visited Karisoke in Rwanda, the famous field site for research on mountain gorillas, I was struck by the silence. At the high elevations where mountain gorillas live (3,000 to 4,000 m) there are far fewer species of birds and insects than in the Amazon. The gorillas rarely called, and the sounds they produced had none of the apparent diversity and complexity of the calls of pygmy marmosets (Harcourt and Stewart, 2001; Pola and Snowdon, 1975). Nonetheless, Harcourt and Stewart (2001) argue that mountain gorillas have the most complex and frequent vocalizations of the great apes. Chimpanzees and bonobos are more conspicuous than gorillas when they vocalize, but they vocalize much less frequently than most of the Amazonian primates I have observed. Some chimpanzee calls, like the pant-hoot, do have a complex structure, and Mitani and Brandt (1994) have demonstrated population differences in pant-hoot structure across different locations in Central Africa. Still, Mitani (1996) concludes that there is little complexity in either the structure or the usage of vocalizations in chimpanzees.
Based on what we know, the natural vocal communication system of great apes offers little raw material from which to construct speech and language. One physical anthropologist has argued against a spoken language in apes because of the position of the larynx (Lieberman, 1975). The larynx is too high in the throat to allow the production of the extreme vowel sounds (as in “see,” “saw,” and “sue”), yet there are many other vowel sounds that a chimpanzee could make, and chimpanzees appear to have the vocal apparatus to produce nasals and most stopped consonants (with the possible exception of glottal stops, as in “ga” and “ka”). Although one might argue that not having the full capacity for producing all human speech sounds could be a limitation on the evolution of a spoken language, there do exist natural languages with very few phonemes (Rotokas and Mura use only 11 phonemes).
It should be possible for chimpanzees to imitate many human speech sounds, yet the studies of Hayes and Nissen (1971) with the chimpanzee Vicki found it was a struggle to teach her to imitate even three words. More recent studies (Hopkins and Savage-Rumbaugh, 1991) have reported somewhat more success with the bonobo Kanzi, but the vocal repertoire is still far short of what one might predict from vocal anatomy. Provine (2000) concluded from studies of laughter in chimpanzees and humans that a difference critical for language is the ability of humans to modulate breath exhalations. The evolution of bipedality, he argues, freed humans to use respiration for complex communication. We know from several studies involving either signing (Gardner and Gardner, 1969; Miles, 1983) or the use of computer-based symbols (Matsuzawa, 1996; Rumbaugh, 1977; Savage-Rumbaugh and Lewin, 1994) that apes can demonstrate a capacity for learning to use a large number of symbols and can use these symbols in complex ways. We also know that bonobos can understand completely novel spoken commands and follow the simple grammatical sequences of the commands (Savage-Rumbaugh et al., 1993), and that dolphins can do the same, using a visual communication system (Morrel-Samuels and Herman, 1993). Thus some of the cognitive abilities that we assume are essential for language appear to be present in great apes and dolphins. However, most great apes using an artificial language appear not to go beyond a few hundred symbols at best, and they rarely, if ever, engage in true conversation (e.g., asking how their trainers feel or what the trainers would like). The function of communication for the apes appears primarily to be the satisfaction of their own needs. The social functions of communication appear not to have emerged.
Basically this translates to a question of motivation: Humans enjoy communicating with each other; apes do not seem as interested in communication or are in less need of vocal communication (see Dunbar, chapter 14 in this volume). The comparative silence of the apes in their natural habitat and the factors that have appeared to limit progress in experimental training programs clearly place limitations on the usefulness of apes for understanding language origins. The disconnection between the capabilities of humans and of chimpanzees and bonobos is major, all the more so given the close genetic similarity (almost 99 percent of DNA in common) and relatively similar vocal and brain anatomy. All of these features might be characterized as physical structures shaped by evolution, perhaps for functions other than communication. An alternative approach is to look at species with social structures similar to those of humans and with similar developmental processes. In this scenario, species with similar social lives are assumed to have communication needs in common. However, because of great differences in body size, vocal production mechanisms, and other factors, we cannot expect similar acoustic structures, and perhaps not even similar cognitive abilities. We should find clear analogies, but not homologies.
Seeking Models Elsewhere
If great apes are not the best models, then where should we look? In a seminal paper Peter Marler (1970) argued that for good models of speech, we must look to highly vocal species. He stated that vocal development in songbirds shared many features with human vocal development. If we apply Marler’s argument to nonhuman primates, then forest-dwelling monkeys become very important. Unlike terrestrial or semiterrestrial species that can use gestures and other visual signals, arboreal species are more likely to use vocal signals. Although there are several forest-dwelling species in Africa and Asia, only recently have there been extensive studies of the vocal behavior of these primates (e.g., Zuberbühler 2000, 2002). Most attention has been given to mainly terrestrial species (baboons and vervet monkeys in Africa, macaques in Asia). All Neotropical primates are arboreal, living in rain forests, and some of the strongest evidence for vocal complexity has emerged in studies of these primates. Extending Marler’s argument, we could say that highly vocal species that have a social structure similar to that of humans might be productive models. Sarah Blaffer Hrdy has argued in Mother Nature (1999) that human mothers can rarely rear infants without support from either fathers or nonreproductive helpers. Thus, she argues that humans fit the definition of cooperative breeders, where infant care is distributed among parents and nonreproductive alloparents. The only other cooperatively breeding primates are the several species of marmosets and tamarins found throughout the neotropics. Given the presumption that forest-dwelling species might vocalize more in general, vocal communication in cooperatively breeding arboreal primates might be ideal for comparisons with human speech and language.
The Complexity of Animal Signals
The signals of many monkeys are quite complex, and we probably do not have a complete repertoire for any species.
Green (1975a) was the first to notice that sounds that seem similar to the human ear have different acoustic structures, and these different structures have different social functions. The “coo” vocalization of Japanese macaques has seven different variants that can be characterized in terms of a few distinctive features: whether the frequency modulation envelope is smooth or not, the location of the peak of frequency modulation early or late in the call, and whether there is single or double voicing of the call. Green (1975a) hypothesized that these coos varied with the intensity of social motivation, and placed them on a continuum. However, the calls also appeared to be specific to certain contexts: an infant approaching its mother gives a different coo than one comfortable alone; a dominant monkey gives a different form of coo to a subordinate than a subordinate gives to a dominant. Subsequent laboratory studies demonstrated that Japanese macaques could more easily discriminate between two variants of their coos than other species could (Zoloth et al., 1979), and that Japanese macaques showed a right ear advantage in discriminating call differences (Petersen et al., 1978). In a study of the related species of stump-tail macaques, Lillehei and Snowdon (1978) found similar structures in the coos of infants vocalizing alone versus vocalizing to their mothers. The significance of these studies is that signals heretofore classified by humans in a single call category may, in fact, have multiple variations, and that each variant may relate to a different context or function. Thus, primates have much larger vocal repertoires than expected. Unless we have done a careful acoustic and contextual analysis, we might be unaware of the complexity of primate signals. Pygmy marmosets have a repertoire of at least 25 different sounds (Pola and Snowdon, 1975), with one complex of calls (the trills) having four variants that are either responded to differently in playback trials (Snowdon and Pola, 1978) or used by animals when they are at different distances from each other in the field (Snowdon and Hodun, 1981; de la Torre and Snowdon, 2002). When each variant was played back in the natural habitat, each was distorted differently by the habitat and each call was maximally distorted at different distances (de la Torre and Snowdon, 2002). Marmosets used the call variants with the most rapid distortion at close distances and reserved variants with the least distortion for when they were farther from other group members. These variants can be classified using distinctive features of duration, envelope of frequency modulation, and rate of frequency modulation.
The cotton-top tamarin has at least 35 different calls or call combinations (Cleveland and Snowdon, 1982), and there are two clusters of calls that show subtle variants. I described eight types of chirp vocalizations—short, frequency-modulated calls—that could be distinguished from each other on the basis of a few features: duration, peak frequency, presence or absence of a chevron (or initial upward frequency modulation) (Snowdon, 1982). Each variant appeared to be specific to one context: for example, mobbing, mild alarm, severe alarm, a strange group, approaching an attractive food, eating that food, and low arousal within group activities. Notice that the contexts differ greatly from each other. The chirp variants cannot easily be placed on a continuum of increasing arousal as Green (1975a) could do with Japanese macaque coos. In playback studies, tamarins distinguished behaviorally between the two most similar versions of chirps (Bauers and Snowdon, 1990) and, when we created the appropriate context for each chirp type, adults produced only the type of chirp appropriate to that context (Castro and Snowdon, 2000). Thus, chirp variants are not figments of our imagination or our analysis methods, but have functional significance to tamarins.
There are at least three variants of long calls (two or three low-frequency, long notes of nearly a second each). One form is typically given in response to hearing strange monkeys; another is used within groups for cohesion or occasionally when a monkey is separated. A third form is typically given by nonreproductive animals in both contexts, but also by adults separated from the group. In playback studies tamarins distinguished between the different forms and could discriminate between calls of mates and of strangers (Snowdon et al., 1983; Weiss et al., 2001). Some New World primates show simple finite state grammars: titi monkeys (Robinson, 1979), cotton-top tamarins (Cleveland and Snowdon, 1982), and pygmy marmosets. These sequences do not appear to have duality of patterning (that is, changing the order of the sequence does not seem to produce new meanings), but in some cases the sequence appears to communicate more than the sum of each unit contributing to the call. In addition, there are several parallels in nonhuman animals to processes involved in speech perception and language learning. Species as different as Japanese quail, chinchillas, and macaques can discriminate human speech sounds in the same categorical fashion as humans do, suggesting that the perception of speech sounds (and presumably of the contrasts on which speech is based) has been built on perceptual processes with a long evolutionary history (Kluender, 1994). There are developmental parallels as well. I have observed a phenomenon that we call pygmy marmoset babbling (PMB), which appears to share many features with the babbling of human infants. Infant pygmy marmosets start producing long sequences of vocalizations, often lasting several minutes, a few weeks after birth. These sequences mainly include calls that are recognizable from the adult repertoire (as phonemes are in canonical babbling).
Oller (2000) interprets these as similar to sequences occurring in the expansion stage of human vocal development. Infants produce only a subset of about 60 percent of the adult repertoire. These calls are usually produced in sequences of two to five calls of one type before the infant switches to a different type (showing repetition and rhythmicity). The sequence of sounds produced bears no relationship to the infant’s ongoing behavior. A threat call will follow soon after an affiliative call, which in turn follows a fear call, which follows a food call. There appears to be no functional relevance to PMB, except that PMB increases social interactions between adults and infants. Babbling infant marmosets are more likely to be in contact with other group members when babbling than when not babbling (Elowson et al., 1998). We have observed babbling in wild pygmy marmosets and have used babbling to find groups in the forest. There are anecdotal reports of babbling in other species of marmosets. Babbling decreases as marmosets get older and the proportion of adult forms of vocalization increases. There is a clear progression toward more accurate forms of adult calls.
We measured changes in the structure of trill vocalizations, the most common call in the repertoire of adult marmosets as well as in the babbling of young marmosets. With increasing age pygmy marmosets produced more regular and accurate forms of trills, achieving accuracy in different parameters at different ages. Furthermore, we found a relationship between the amount and diversity of babbling in the first five months and the rate at which individuals developed adult structures, suggesting that babbling might function as a form of practice (Snowdon and Elowson, 2001). Marmosets changed the context of babbling as they got older, often using babbling in agonistic contexts to indicate submissiveness. Thus, much raw material for building a language exists in nonhuman species: complex vocalizations with subtle variations indicating different motivational states or contexts, simple finite state grammars, and even babbling behavior. Nonhuman species discriminate human speech sounds readily and perceive them in a humanlike way. It appears likely that language has been built upon features that appeared early in evolution. Documentation of this raw material is critical for any attempt to describe language as an evolutionary process.
Language: Instinct or Social Construction?
The issue of language development has led to great controversy with respect to both the processes of development within human infants and the relevance of studying other species for understanding language origins. Is language an “instinct” (defined in my dictionary as a response largely hereditary and unmodifiable) with specific brain modules for each aspect of language (Pinker, 1994), or is there plasticity and modifiability in language development? Are there critical periods for vocal development in nonhuman species or for language learning in humans, or can language skills be acquired throughout life? To what extent do social interactions and reinforcement shape language and communication processes?
Most of our current evidence suggests that language cannot be an “instinct” in the strictest form of the definition. Although language universals exist (Jakobson and Halle, 1956), there is also enormous variation. There are many different languages, each with its own set of phonemes, words, and grammar, so some learning must be involved. Human infants appear much more motivated to communicate vocally than their great ape cousins, and deaf children are equally motivated, as shown through studies of their spontaneous gestures (Goldin-Meadow, 1997). Bates and Marchman (1988) have argued that there are few universals in language development, especially when several different languages are examined. Children appear to acquire first the grammatical skills that are relevant to their own language. Biological constraints do exist on language development, but there is remarkable resiliency as well. Children with brain damage in language areas can recover language skills with sufficient training, and biological markers of brain development
associated with supposed critical periods have been difficult to discern (Bates et al., 2003). Adults can learn a second language, although perhaps not with the same proficiency as they learn a first language (Johnson and Newport, 1989), with an extreme example coming from an indigenous population in the northwest Amazon, where there are 25 different languages from 4 language families. Due to marriage outside of one’s own group, an infant is likely to learn both maternal and paternal languages, and when she or he marries, will learn the spouse’s language as well as those of the in-laws (Sorenson, 1967). The difficulties many of us have in learning a second language as adults may not relate to an innate critical period as much as to cognitive time-sharing with many other activities, and a much less intense degree of specific teaching and reinforcement (Snowdon, 1999). Infants can learn to group syllables into units (akin to words) through a process of statistical learning. An infant exposed to as little as two minutes of nonsense syllables sequenced so that some groups of syllables occur together with 50 percent or 70 percent probability, subsequently discriminates these clusters from other groups of syllables that did not occur together (Saffran et al., 1996). Cotton-top tamarins display similar statistical learning (Hauser et al., 2001).
Instinct or Social Construction of Animal Communication?
What do we know about vocal development in nonhuman animals? In birds, there is some evidence of a critical period with constraints on what is learned (Marler, 1970). But birds can learn songs of different species when housed with live tutors (Baptista and Gaunt, 1997), or learn song variants specific to a new breeding area (Payne and Payne, 1997), or alter call structure as a function of new social companions (Nowicki, 1989; Brown and Farabaugh, 1997; Hausberger, 1997).
Cowbirds reared with inappropriate social companions direct song to inappropriate targets rather than biologically relevant targets (West et al., 1997). Most evidence from nonhuman primates has been more supportive than studies of birdsong of the “language as instinct” model. Early studies on primates failed to find any evidence of modification of vocal production, though more recent studies report some modification in production (although nothing like learning a different language). There is some evidence for learning the contexts in which calls are to be used, and considerable support for primates being able to learn to understand calls, even those of other species (reviewed by Seyfarth and Cheney, 1997). Dialects within a species suggest that learning vocalizations might be possible, and the discovery of dialects in birdsong was a major force leading to studies of song learning. In primates we have little evidence so far for dialects, but Mitani and colleagues (Mitani and Brandt, 1994; Mitani et al., 1999) have demonstrated that different populations of chimpanzees have different forms of pant-hoots that cannot be accounted for by individual differences alone. De la Torre (1999) has reported differences in structures of both J-calls and long calls in populations of pygmy marmosets 120 km apart in the Amazon, and these differences remain after accounting for individual differences in call structure. In these studies genetic variation might be responsible for dialects. However, both Green (1975b) and Masataka (1992) have described dialects in calls of provisioned Japanese macaques and suggested that these dialects have developed through reinforcement by humans providing food. Thus some plasticity in vocal production does exist. Since few primatologists study more than one population of the same species, we may be underestimating the possibility of dialects. I have already described babbling in pygmy marmosets, noting that with increasing age, monkeys produce more calls that are recognizably adult and the quality of production of trill vocalizations improves. Similar results have been reported for scream vocalizations in macaques (Gouzoules and Gouzoules, 1989, 1997) and grunt vocalizations in vervet monkeys (Seyfarth and Cheney, 1986). In each of these cases infant calls were recognizable, but imperfect, versions of adult calls. Castro and Snowdon (2000) found that infant cotton-top tamarins (up to 20 weeks of age) rarely responded with the appropriate calls in situations that reliably elicited different forms of chirp vocalizations in adults. Rather, infants produced a “protochirp” that had the general structural form of the adult chirps but was not differentiated according to context. These protochirps were given in series of two to four notes, whereas adults typically produced single chirps. On some tests infants occasionally gave an appropriate chirp type, but this was not observed universally across all infants.
If an infant once gave an appropriate chirp in an appropriate context, it rarely gave that chirp in a subsequent test. Thus chirps were not universally elicited in infants and, once given, rarely appeared again in infancy, suggesting that neither innate nor maturational factors alone can account for chirp production. However, infants were sensitive to context and inhibited their production of “protochirps” on trials involving alarming stimuli. One chirp type, given in feeding contexts, appeared more often and more consistently than the others. Here there may be evidence of teaching. In cooperatively breeding marmosets and tamarins, fathers and older siblings not only assist in infant care but also provision infants with solid food at the time of weaning. At least five species, including pygmy marmosets and cotton-top tamarins, show active offering of food to the infant (Feistner and Price, 1991). Typically a male selects a piece of food and vocalizes with a more rapid and louder-than-normal series of the same chirps adults give when feeding (Roush and Snowdon, 2001). Interestingly, this feeding call was the one chirp type that most infants used and the one type that occurred most often over subsequent trials in our tests (Castro and Snowdon,
2000). Cotton-top tamarin infants can receive food from adults only when the adult vocalizes. A nonvocalizing adult rarely shares food. Frequently, other group members orient toward the animal sharing food and give the same rapid sequence of food calls. Infants that participate in food sharing at the earliest age feed independently and use appropriate vocalizations in their independent feeding at an earlier age (Roush and Snowdon, 2001). Rapaport (1999) has reported that captive golden lion tamarin adults more often share food that is rare, difficult to obtain, or unfamiliar to infants. More recently she has reported (Rapaport and Ruiz-Miranda, 2002) observations from the wild of adult lion tamarins giving intense food calls to older infants that approach and subsequently find an otherwise hidden prey animal located near the vocalizing adult. These last studies come very close to being examples of teaching infants not only what foods to eat but also what vocalizations to use. Furthermore, the examples from wild lion tamarins are suggestive of scaffolding as infants mature. No evidence for any sort of teaching infants about food has been seen in other monkeys (King, 1994), and we have only limited evidence of teaching in chimpanzees (Boesch, 1991). Tamarins also use social learning to avoid noxious foods. We selected two familiar foods with equally high preference (tuna and peaches), and then added white pepper to the tuna in a 1 : 40 ratio and presented this adulterated tuna once each week to groups of tamarins in alternation with unadulterated peaches. Over three weeks only 33 percent of monkeys ever sampled the previously highly preferred tuna, and after we returned to normal, unadulterated tuna in subsequent weeks, three out of eight groups failed to eat tuna again over at least 15 weeks. Interest in and consumption of peaches was as high as it had been in pilot tests. 
In additional studies we presented other monkeys with both regular and pepper-adulterated tuna that could be seen and smelled, but not tasted. The tamarins showed no differences in time spent with either type, suggesting that odor cues could not be involved in avoiding tuna (Snowdon and Boe, 2003). What led 67 percent of monkeys not even to sample a previously preferred food? The animals that sampled tuna decreased food calling and increased alarm calling, and also showed visual signs of disgust (head shaking, chin wiping, retching), so that both vocal and visual signals were involved. No other studies of monkeys have found evidence for learning to avoid noxious foods (Galef and LeFebvre, 2001), and a similar study using capuchin monkeys and pepper-adulterated food found no evidence of social learning. Each capuchin monkey sampled the food (Visalberghi and Addessi, 2000). Among the consequences of cooperative breeding may be greater attention to social cues provided by other group members, leading to both increased social learning among all group members, compared with other primates, and the potential for teaching infants through food sharing (Coussi-Korbel and Fragaszy, 1995).
Marmosets and tamarins also change vocal structure as adults. Studies in birds (e.g., Nowicki, 1989; Brown and Farabaugh, 1997), in dolphins (Tyack and Sayigh, 1997; McCowan and Reiss, 1997), and in bats (Boughman, 1997, 1998) have shown that adult animals can alter vocal structure, usually in response to changes in social environments, such as forming a new pair bond, joining a new social group, or forming a coalition with a new partner. The trill structure of pygmy marmosets is also responsive to social change. When we brought several new groups of marmosets into our colony, we recorded trills both during quarantine and after the new groups were housed in the same colony room as our original group. Both new and old colony members of all ages changed trill structure within six weeks after being housed in the same room (Elowson and Snowdon, 1994). We also formed new pairs, and found rapid convergence of trill structure within six weeks, with “his call” and “her call” converging into “their call.” We followed some pairs for an additional three years, and although various parameters of trills changed over those years, each member of a pair still produced trills similar to those of its mate (Snowdon and Elowson, 1999). We found similar social influences on food calls of young tamarins. Infant tamarins begin to produce food calls while feeding independently, and, as suggested above, these calls may emerge from the process of food sharing. Nonetheless, infant calls do not have the structural regularity of adult calls. We tested tamarins of different ages, from the end of infancy through puberty and beyond, expecting to see progression toward adult structure with increasing age. To our surprise, we found no developmental change through this period. From 4 to 28 months, tamarins continued to produce imperfect food calls, and in feeding tests also used many other call types not used by adults (Roush and Snowdon, 1994). 
Regardless of whether animals have reached puberty or not, offspring of cooperatively breeding monkeys remain socially subordinate and rarely reproduce while in their natal group. Perhaps the infantile calling in feeding contexts is not due to an inability to produce adult calls, but a means of communicating subordinate status. To test this hypothesis, we followed several tamarins from their natal group to their pairing with a new mate. We observed a very rapid decrease in nonfeeding calls during food tests, and a slower, but still rapid decrease in the proportion of imperfect food calls. Within three to eight weeks, all newly paired tamarins were using adult communication patterns in feeding (Roush and Snowdon, 1999). These results suggest that inhibitory processes arising from social interactions can influence vocal production. I have focused on the possible role of experience in shaping vocal production and vocal usage because it is in these areas that there has been the least evidence of plasticity. There is wide acceptance that primates are extremely flexible in responding to signals from others. Owren et al. (1992, 1993) conducted an important cross-fostering study where
infant Japanese macaques and rhesus macaques were placed with mothers of the opposite species. Although a prior study reported some subtle changes in vocal structure (Masataka and Fujita, 1989), Owren et al. (1992, 1993) found no evidence that infants of one species acquired the calls of their foster parents. However, foster mothers understood the calls of their infants and responded appropriately to their calls. Other studies have reported that vervet monkeys respond to alarm calls of superb starlings (Hauser, 1988) and ringtailed lemurs respond to calls of sifakas (Oba and Masataka, 1996), and that Diana’s monkeys and Campbell’s monkeys understand the meaning of each other’s predator calls (Zuberbühler, 2000, 2002). Thus primates can learn to respond to other calls. These examples indicate that nonhuman primates display flexibility in vocal structure, with change being facilitated by some social interactions and inhibited by others. Especially among the cooperatively breeding marmosets and tamarins, there are intriguing developmental parallels: babbling that appears to have some features of human infant babbling, and possible teaching about which foods to eat, the appropriate vocalizations to use, and where to find food. The relatively slow development of adult trill structure in pygmy marmosets, and of adult chirp structure and usage in cotton-top tamarins, implies that these monkeys do not have an innate ability to produce calls with adult structure in the appropriate context. Moving beyond development, we see that marmosets and tamarins are highly vocal and have an elaborate vocal repertoire that can be used flexibly—with alarm calls denoting both predators and noxious foods. They appear to understand their location relative to other group members—as when pygmy marmosets selectively use different calls at close distances and when they are farther from other group members. 
All of these findings argue against the idea that primate calls are stereotyped, innate, and reflexive; social factors clearly affect both development and adult usage of calls. How can the brain of a tamarin, which is the size of a Brazil nut, or the even smaller brain of the pygmy marmoset produce this degree of sophisticated communication? They do not have the complexity of brain structure of the great apes or humans; in fact, the cortex is smooth, with none of the convolutions of an ape or cetacean brain. In short, they do not have the neurological structures that are associated with language, and they probably lack important cognitive skills found in great apes. But I think they have some of the essential social components for language. That is, in the nature of social interactions involved in successful cooperative rearing of infants, there has emerged a complexity of vocal communication, an intense involvement by all group members—not just the mother—in caring for infants that requires greater sensitivity to signals from others, as well as a social flexibility that involves not just rapid changing of roles throughout the day but also flexible use of vocal signals. A pair bond is essential for successful infant rearing, and this, along with the need of each family to differentiate itself from others, leads to flexible changes in vocalizations
throughout life. Each territory has different fruit and insect resources, some of which can be toxic, which leads to the development of rudimentary instructional methods about appropriate foods.

How Might Language Have Evolved?

How do we move from primate communication to speech and language? Some of the basic framework for speech and language can be seen in one or another primate species. Various nonhuman species can quickly learn to discriminate human speech contrasts, and in the number of signals available and the ability to combine signals into sequences, nonhuman primates have the potential complexity of signals on which a language might be built. Nonhuman primates are sensitive to the location and activities of unseen group members and adjust their communication appropriately. There are important developmental parallels between cooperatively breeding primates and vocal development in human infants, to the point where tamarin parents display an apparent “teaching” of infants about what food to eat and how to communicate about it. The babbling of infant marmosets contains a variety of signals that would appear to be functionally inappropriate from an adult perspective, and the function of babbling appears to change over time from vocal practice to avoiding conflict by communicating subordinate status. Both primate communication and human speech and language are essentially social processes, and hence it is in social functions that evidence of language evolution should be found (Snowdon, 2001). Even if we accept that some of the foundations for language exist in other species, we still face formidable problems in understanding the processes leading to the evolution of language. First, we must move from the silence of the apes to the noisy human ape. Other apes have expressive faces and elaborate gestural communication systems that humans still retain (Van Hooff, 1972; Snowdon, 2002). 
These gestural systems might be sufficient for other apes with a few conspicuous vocalizations to use when individuals are out of sight of each other. Humans have also developed highly complex gestural systems, most notably in sign languages, that are used to communicate with complexity and efficiency. However, there are advantages of vocal communication. Both human and nonhuman apes have highly overlapping visual fields that provide great binocular resolution but limit the field of vision to 70–80 degrees on either side of the midline. With an effective visual field of only 140–160 degrees, we miss more than half of the visual world around us. By contrast, auditory communication can be perceived from all directions. Gestures and facial expressions are not easily perceived in the dark or in heavy forests or the thick grasses of the savannas. Furthermore, gesturing means that our hands cannot be used for other functions, such as carrying weapons or other goods or tools.
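As a quick check of the claim that a 140–160 degree field leaves more than half of the surroundings unseen, the arithmetic can be sketched as follows. The only assumption added here is that the full surround is treated as a 360-degree horizontal circle; the symbol f is introduced purely for illustration and does not appear in the original text.

```latex
% Assumption (not in the original): the surround is a full 360-degree horizontal circle.
% f = fraction of that circle lying outside the effective visual field.
\[
f_{160} = \frac{360^\circ - 160^\circ}{360^\circ} = \frac{200}{360} \approx 0.56,
\qquad
f_{140} = \frac{360^\circ - 140^\circ}{360^\circ} = \frac{220}{360} \approx 0.61
\]
```

On this reading, roughly 56 to 61 percent of the horizontal surround falls outside the field of view at any moment, which is consistent with "more than half."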
Those of our ancestors who could exploit the preexisting features of the auditory system to create a primitive vocal language would have gained adaptive significance over other apes by extending the range of habitat, carrying tools and other objects, and being active in the dark while still communicating. Individuals who could expand on vocal communication would be able to engage in more complex cognitive and social interactions with more individuals, which would, in turn, lead to increased cranial size and a more complex neocortex—as Dunbar (2001; chapter 14 in this volume) has argued—specialized for vocal production and auditory analysis. Words may have developed from two different sources. Some monkeys have referential signals that refer to high-quality food or different types of predators. Expanding beyond the ability to refer to predators and food sources to other objects in the day-to-day environment would have been relatively easy and adaptive, once the first referential signals emerged. A request for objects is another route toward words. Many apes and some monkeys point to objects, and tamarin vocalizations in food sharing may be vocal requests. Human infants use grunts to request objects and, according to McCune (1999), these grunts are subsequently replaced by words. Simple finite state grammars are found in several Neotropical primates: marmosets, tamarins, titi monkeys, and capuchin monkeys. In some cases the sequences represent the function of each call type, as when tamarins combine an alarm call and an affiliative call to tentatively signal the end of freezing after a threat (Cleveland and Snowdon, 1982), or when two forms of aggressive calls are sequenced together at the peak of a territorial encounter (McConnell and Snowdon, 1986). Other chapters in this volume also have focused on social processes in language evolution. 
Thus, kin selection is at the heart of honest animal communication and human language (see Fitch, chapter 15 in this volume), with kin selection becoming even more necessary in humans due to the extended period of infant dependence and cooperative care of infants. The idea that increased group size is related to increased brain size and that vocal gossip replaced grooming to maintain social relationships is intriguing (Dunbar, chapter 14 in this volume), but this alone does not explain the complex communicative and social skills shown by marmosets and tamarins, which have relatively small brains, live in small groups of five to ten, and both socially groom and keep in touch through antiphonal calling. Nor does the grooming hypothesis explain the abilities of Alex (Pepperberg, chapter 10 in this volume) and Aibo (Steels, chapter 5 in this volume) or the evolution of words and syntax. Social interactions have driven language evolution. Those ancestors who could communicate most effectively about food, shelter, predators, and location of good food resources may have had higher reproductive success, thus leaving behind more offspring. Those parents who could teach communication skills to their offspring would have higher
reproductive success than those unable to teach offspring. Within families, those with communication skills that could be used to coordinate infant care, food finding, and predator defense would have a higher reproductive success. Although there are also increased benefits to families that collect together into larger social groups for mutual defense, cooperative hunting, organization of foraging parties, and between-family division of labor, the essential unit for child rearing and language learning is the family, not the society. A key emerging construct is the importance of joint attention in language learning, developing about nine months of age in human infants (Tomasello, 1999; Sinha, chapter 12 in this volume). In children, the parrot Alex (Pepperberg, chapter 10 in this volume), the robot Aibo (Steels, chapter 5 in this volume) and the model I have presented for developing species-typical communication skills in cooperatively breeding monkeys, some aspect of joint attention is necessary. The development of joint attention requires a close social relationship between tutor(s) and learner that is more likely to emerge in families than in larger social groups. However, aggregation of families into groups composed of multiple families would create pressures for more effective management of social relationships within these groups. If increased complexity of communication led to increased success in cooperative hunting, cooperative child care, cooperative foraging, and cooperative defense, then those ancestors who had more effective communication skills would have had a clear advantage. Learning and plasticity are key features of this scenario. Learning is a much more efficient mechanism for promoting change than waiting for mutations to occur. Learned skills can pass relatively quickly through a population. 
Closed social groups are more likely to consist of related individuals (see Fitch, chapter 15 in this volume), and these groups would lead to divergence in communication signals that result in dialects or distinct languages. Closed social groups have value in defense against conspecifics. However, exogamy to avoid inbreeding places a high value on flexibility to communicate with different groups. Thus, an individual that can alter communication patterns as an adult will be more likely to find a mate in another group and to accommodate to the social patterns of the mate’s group. Although teaching is important, learning also emerges as a self-organizing process. Thus, although most songbirds require tutoring in order to learn song, a group of four naive zebra finches raised together will develop normal song without a tutor (Volman and Khanna, 1995). Deaf children with speaking parents develop complex gestural signals to communicate with their parents (Goldin-Meadow, 1997), and an isolated population of deaf people in Nicaragua created its own sign language system (Kegl et al., 1996). These last examples suggest the presence of a biological platform on which communication can be constructed, as well as the powerful motivation of social communication that leads severely disadvantaged birds or people to create an effective communication system.
As the needs of our ancestors became more efficiently met, and more spare time was available, speech could be used in a more playful, creative way, leading to puns, poems, and polemics. Miller (2000) has argued that creative and fluent use of language became a form of sexual selection, making those with high levels of language skills more attractive as mates than those less skilled. If language has become a critical component of sexual selection for humans, then the rapid increase in complexity of both vocabulary and grammar well beyond the satisfaction of basic needs becomes understandable, and to be maximally effective, one needs to start learning these skills before the age of mating. The data from nonhuman animals suggest that some of the basic foundations on which language is constructed have been present for a relatively long time in evolution. Furthermore, the requirements for managing social relationships and producing the joint attention needed for complex communication systems can be found in nonhuman and even nonanimate systems. Flexibility throughout life allows communication about novel events or objects, and also facilitates communication with other groups. Strangely, language is rarely treated as a social process, with much more study directed to formal structural and cognitive processes. The essential social component of language needs both greater study and incorporation into models of language evolution. I have argued here that productive models for understanding language origins can be found outside of the apes, especially in species with cooperative infant care and close-knit social groups that mimic those of modern humans.

Acknowledgments

My research and preparation of the chapter were supported by USPHS grants MH29775 and MH00177. I thank D. Kimbrough Oller, Ulrike Griebel, and W. Tecumseh Fitch for their careful reading and helpful critique of an earlier version of this chapter. 
References

Baptista LF, Gaunt SLL (1997) Social interaction and vocal development in birds. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 23–40. Cambridge: Cambridge University Press. Bates E, Marchman V (1988) What is and is not universal in language acquisition. In: Language Communication and the Brain (Plum F, ed.), 19–38. New York: Raven Press. Bates E, Thal D, Finlay B, Clancey B (2003) Early language development and its neural correlates. In: Handbook of Neuropsychology, vol. 8, Part II Child Neurology. Amsterdam: Elsevier. Bauers KA, Snowdon CT (1990) Discrimination of chirp variants in the cotton-top tamarin. Amer J Primatol 21: 53–60. Boesch C (1991) Teaching in wild chimpanzees. Anim Behav 41: 530–532. Boughman JW (1997) Greater spear-nosed bats give group-distinctive calls. Behav Ecol Sociobiol 40: 61–70.
Boughman JW (1998) Vocal learning by greater spear-nosed bats. Proc Roy Soc London B265: 227–233. Brown ED, Farabaugh SM (1997) What birds with complex social relationships can tell us about vocal learning: Vocal sharing in avian groups. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 98–127. Cambridge: Cambridge University Press. Castro NA, Snowdon CT (2000) Development of vocal responses in infant cotton-top tamarins. Behaviour 137: 629–646. Cleveland J, Snowdon CT (1982) The complex vocal repertoire of the adult cotton-top tamarin (Saguinus oedipus oedipus). Zeit Tierpsych 58: 231–270. Coussi-Korbel S, Fragaszy DM (1995) On the relation between social dynamics and social learning. Anim Behav 50: 1441–1453. De la Torre S (1999) Environmental correlates of vocal communication in wild pygmy marmosets, Cebuella pygmaea. Ph.D. dissertation, University of Wisconsin, Madison. De la Torre S, Snowdon CT (2002) Environmental correlates of vocal communication in wild pygmy marmosets, Cebuella pygmaea, Anim Behav 63: 847–856. Dunbar RIM (1997) Grooming, Gossip and the Origins of Language. Cambridge, Mass.: Harvard University Press. Dunbar RIM (2001) Brains on two legs: Group size and the evolution of intelligence. In: Tree of Origin (de Waal FBM, ed.), 173–191. Cambridge, Mass.: Harvard University Press. Elowson AM, Snowdon CT (1994) Pygmy marmosets, Cebuella pygmaea, modify vocal structure in response to changed social environment. Anim Behav 47: 1267–1277. Elowson AM, Snowdon CT, Lazaro-Perea C (1998) Infant “babbling” in a nonhuman primate: Complex vocal sequences with repeated call types. Behaviour 135: 643–664. Feistner ATC, Price EC (1991) Food offering in New World primates: Two species added. Fol Primatol 57: 165–168. Galef BG, Jr, LeFebvre L (2001) Social influences on foraging in vertebrates: Causal mechanisms and adaptive functions. Anim Behav 61: 3–15. Gardner RA, Gardner BT (1969) Teaching sign language to a chimpanzee. 
Science 165: 664–672. Goldin-Meadow S (1997) The resilience of language in humans. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 293–311. Cambridge: Cambridge University Press. Gouzoules H, Gouzoules S (1989) Design features and developmental modification of pigtail macaque, Macaca nemestrina, agonistic screams. Anim Behav 37: 383–401. Gouzoules H, Gouzoules S (1997) Recruitment screams of pigtail monkeys (Macaca nemestrina): Ontogenetic perspectives. Behaviour 132: 431–450. Green S (1975a) Variation of vocal pattern with social situation in the Japanese monkey (Macaca fuscata): A field study. In: Primate Behavior, vol. 4 (Rosenblum LA, ed.), 1–104, New York: Academic Press. Green S (1975b) Dialects in Japanese monkeys, vocal learning and cultural transmission of locale specific behavior? Zeit Tierpsych 38: 304–314. Harcourt AH, Stewart KJ (2001) Vocal relationships of wild mountain gorillas. In: Mountain Gorillas: Three Decades of Research at Karisoke (Robbins MM, Sicotte P, Stewart, KJ, eds.), 241–262. Cambridge: Cambridge University Press. Hausberger M (1997) Social influences on song acquisition and sharing in the European starling (Sturnus vulgaris). In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 128–156. Cambridge: Cambridge University Press. Hauser MD (1988) How vervet monkeys learn to recognize starling alarm calls: The role of experience. Behaviour 105: 187–201. Hauser MD, Newport EL, Aslin RN (2001) Statistical learning of the speech stream in a non-human primate: Statistical learning in cotton-top tamarins. Cognition 78: B53–B64.
Hayes KJ, Nissen CJ (1971) Higher mental functions of a home-raised chimpanzee. In: Behavior of Nonhuman Primates (Schreier AM, Stollnitz F, eds.), 59–115. New York: Academic Press. Hopkins WD, Savage-Rumbaugh ES (1991) Vocal communication as a function of differential rearing experience in Pan paniscus: A preliminary report. Internat J Primatol 2: 559–583. Hrdy SB (1999) Mother Nature. New York: Ballantine. Jakobson R, Halle M (1956) Fundamentals of Language. The Hague: Mouton. Johnson JS, Newport EL (1989) Critical period effects in second language learning: The influence of maturational state on the acquisition of English as a second language. Cognit Psych 21: 60–99. Kegl J, Senghas A, Coppola M (1996) Creation through contact: Sign language emergence and sign change in Nicaragua. In: Comparative Grammatical Change: The Intersection of Language Acquisition, Creole Genesis and Diachronic Syntax (de Graff M, ed.), Hillsdale, N.J.: Lawrence Erlbaum. King BJ (1994) The Information Continuum. Santa Fe, N.M.: School of American Research Press. Kluender KR (1994) Speech perception as a tractable problem in cognitive science. In: Handbook of Psycholinguistics (Gernsbacher MA, ed.), 173–217. San Diego: Academic Press. Lieberman P (1975) On the Origins of Language: An Introduction to the Evolution of Human Speech. New York: Macmillan. Lillehei RA, Snowdon CT (1978) Individual and situational differences in the vocalizations of young stumptail macaques, Macaca arctoides. Behaviour 65: 270–281. Marler P (1970) Birdsong and speech development: Could there be parallels? Amer Sci 58: 669–674. Masataka N (1992) Attempts by animal caretakers to condition Japanese macaque vocalizations result inadvertently in individual specific calls. In: Topics in Primatology, vol. 1 (Nishida T, McGrew WC, Marler P, Pickford M, deWaal FBM, eds.), 271–278. Tokyo: University of Tokyo Press. Masataka N, Fujita K (1989) Vocal learning of Japanese and rhesus monkeys. Behaviour 109: 191–199. 
Matsuzawa T (1996) Chimpanzee intelligence in nature and in captivity: Isomorphism of symbol use and tool use. In: Great Ape Societies (McGrew WC, Marchant LF, Nishida T, eds.), 196–209. Cambridge: Cambridge University Press. McConnell PB, Snowdon CT (1986) Vocal interactions between unfamiliar groups of captive cotton-top tamarins. Behaviour 97: 273–296. McCowan B, Reiss D (1997) Vocal learning in captive bottlenose dolphins: A comparison with human and nonhuman animals. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 178–207. Cambridge: Cambridge University Press. McCune L (1999) Children’s transitions to language: Human model for the development of the vocal repertoire in extant and ancestral primate species? In: The Origins of Language: What Nonhuman Primates Can Tell Us (King BJ, ed.), 269–306. Santa Fe, N.M.: School of American Research Press. Miles HM (1983) Apes and language. In: Language in Primates (de Luce J, Wilder HT, eds.), 43–61. New York: Springer-Verlag. Miller G (2000) The Mating Mind. New York: Doubleday. Mitani J (1996) Comparative studies of African ape vocal behavior. In: Great Ape Societies (McGrew WC, Marchant LF, Nishida T, eds.), 241–254. Cambridge: Cambridge University Press. Mitani JC, Brandt KL (1994) Social factors influence acoustic variability in the long distance calls of male chimpanzees. Ethology 96: 233–252. Mitani JC, Hunley KL, Murdoch ME (1999) Geographic variation in the calls of wild chimpanzees: A reassessment. Amer J Primatol 47: 133–151. Morrel-Samuels P, Herman LM (1993) Cognitive factors affecting comprehension of gesture language signs: A brief comparison of dolphins and humans. In: Language and Communication: A Comparative Perspective (Roitblatt HL, Herman LM, Nachtigall PE, eds.), 311–327. Hillsdale, NJ: Lawrence Erlbaum. Nowicki S (1989) Vocal plasticity in captive black-capped chickadees: The acoustic basis of call convergence. Anim Behav 37: 64–73.
Oba R, Masataka N (1996) Interspecific responses of ring tailed lemurs to playback of antipredator alarm calls given by Verreaux’s sifakas. Ethology 102: 441–453. Oller DK (2000) The Emergence of the Speech Capacity. Mahwah, N.J.: Lawrence Erlbaum. Owren MJ, Dieter JA, Seyfarth RM, Cheney DL (1992) “Food” calls produced by adult female rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques, their normally-raised offspring and offspring cross-fostered between species. Behaviour 120: 218–231. Owren MJ, Dieter JA, Seyfarth RM, Cheney DL (1993) Vocalizations of rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques cross-fostered between species show evidence of only limited modification. Devel Psychobiol 26: 389–406. Payne RB, Payne LL (1997) Field observations, experimental design and the time and place of learning bird songs. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 57–84. Cambridge: Cambridge University Press. Petersen MR, Beecher M, Zoloth S, Moody D, Stebbins W (1978) Neural lateralization of species-specific vocalizations by Japanese macaques (Macaca fuscata). Science 202: 324–327. Pinker S (1994) The Language Instinct. New York: William Morrow. Pola YV, Snowdon CT (1975) The vocalizations of pygmy marmosets (Cebuella pygmaea). Anim Behav 23: 826–842. Provine RR (2000) Laughter: A Scientific Investigation. New York: Viking Penguin. Rapaport LG (1999) Provisioning of young in golden lion tamarins (Callitrichidae, Leontopithecus rosalia): A test of the information hypothesis. Ethology 105: 619–636. Rapaport LG, Ruiz-Miranda C (2002) Tutoring in wild golden lion tamarins. Internat J Primatol 23: 1063–1070. Robinson JG (1979) An analysis of the organization of vocal communication in the titi monkey, Callicebus moloch. Z Tierpsychol 49: 381–405. Robinson JG (1984) Syntactic structures in the vocalizations of wedge-capped capuchin monkeys, Cebus olivaceus. Behaviour 90: 46–79. 
Roush RS, Snowdon CT (1994) Ontogeny of food-associated calls in cotton-top tamarins. Anim Behav 47: 263–273. Roush RS, Snowdon CT (1999) The effects of social status on food-associated calling behavior in captive cottontop tamarins. Anim Behav 58: 1299–1305. Roush RS, Snowdon CT (2001) Food transfers and the development of feeding behavior and food-associated vocalizations in cotton-top tamarins. Ethology 107: 415–429. Rumbaugh DM (ed.) (1977) Language Learning by a Chimpanzee. New York: Academic Press. Saffran JR, Aslin RN, Newport EL (1996) Statistical learning by 8-month-old infants. Science 274: 1926–1928. Savage-Rumbaugh ES, Lewin R (1994) Kanzi: The Ape on the Brink of Human Mind. New York: John Wiley. Savage-Rumbaugh ES, Murphy J, Sevcik RA, Brakke KE, Williams S, Rumbaugh DM (1993) Language comprehension in ape and child. Monog Soc Res Child Devel 58. Seyfarth RM, Cheney DL (1986) Vocal development in vervet monkeys. Anim Behav 34: 1640–1658. Seyfarth RM, Cheney DL (1997) Some general features of vocal development in nonhuman primates. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 249–273. Cambridge: Cambridge University Press. Snowdon CT (1982) Linguistic and psycholinguistic approaches to primate communication. In: Primate Communication (Snowdon CT, Brown CH, Petersen MR, eds.), 212–238. New York: Cambridge University Press. Snowdon CT (1999) An empiricist view of language evolution and development. In: The Origins of Language: What Nonhuman Primates Can Tell Us (King BJ, ed.), 79–114. Santa Fe, N.M.: School of American Research Press. Snowdon CT (2001) From primate communication to human language. In: Tree of Origin (De Waal FBM, ed.), 195–227. Cambridge, Mass.: Harvard University Press.
Snowdon CT (2002) Expression of emotion in nonhuman animals. In: Handbook of Affective Science (Davidson RJ, Scherer K, Goldsmith HH, eds.), 457–480. New York: Oxford University Press.
Snowdon CT, Boe CY (2003) Social communication about unpalatable foods in tamarins. J Comp Psych 117: 142–148.
Snowdon CT, Elowson AM (1999) Pygmy marmosets modify call structure when paired. Ethology 105: 893–908.
Snowdon CT, Elowson AM (2001) “Babbling” in pygmy marmosets: Development after infancy. Behaviour 138: 1239–1248.
Snowdon CT, French JA, Cleveland J (1983) Responses to context- and individual-specific cues in cotton-top tamarin long calls. Anim Behav 31: 92–101.
Snowdon CT, Hodun A (1981) Acoustic adaptations in pygmy marmoset contact calls: Locational cues vary with distances between conspecifics. Behav Ecol Sociobiol 9: 295–300.
Snowdon CT, Pola YV (1978) Interspecific and intraspecific responses to synthesized pygmy marmoset vocalizations. Anim Behav 26: 192–206.
Sorensen AP, Jr (1967) Multilingualism in the northwest Amazon. Amer Anthro 69: 670–685.
Tomasello M (1999) The Cultural Origins of Human Cognition. Cambridge, Mass.: Harvard University Press.
Tyack PL, Sayigh LS (1997) Vocal learning in cetaceans. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 208–233. Cambridge: Cambridge University Press.
van Hooff JARAM (1972) A comparative approach to the phylogeny of laughter and smiling. In: Nonverbal Communication (Hinde RA, ed.), 209–241. Cambridge: Cambridge University Press.
Visalberghi E, Addessi E (2000) Response to changes in food palatability in tufted capuchin monkeys, Cebus apella. Anim Behav 59: 231–238.
Volman SF, Khanna H (1995) Convergence of untutored song in group-reared zebra finches. J Comp Psych 109: 211–221.
Weiss D, Garibaldi B, Hauser M (2001) The production and perception of long calls by cotton-top tamarins (Saguinus oedipus): Acoustic analyses and playback experiments. J Comp Psych 115: 258–271.
West MJ, King AP, Freeberg TM (1997) Building a social agenda for the study of bird song. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 41–56. Cambridge: Cambridge University Press.
Zoloth SR, Petersen MR, Beecher MD, Green S, Marler P, Moody DB, Stebbins W (1979) Species-specific perceptual processing of vocal sounds by monkeys. Science 204: 870–872.
Zuberbühler K (2000) Interspecies semantic communication in two forest primates. Proc Roy Soc London B267: 713–718.
Zuberbühler K (2002) A syntactic rule in forest monkey communication. Anim Behav 63: 293–299.
9
Human Infant Crying as an Animal Communication System: Insights from an Assessment/Management Approach
Donald H. Owings and Debra M. Zeifman
Introduction
At five weeks old, my normally happy first child became a fussy baby. About midday one day, he began to cry. After trying all of my standard calming techniques to no avail, in frustration I finally told him, loudly, “STOP!” This stopped the crying, but I felt so guilty about yelling at my child that I began to cry, and called my mom to tell her what had happened. She reassured me that I wasn’t a bad mother. After several more days of this crankiness, my pediatrician diagnosed the problem as colic, and said that about all I could do about it was to hold my son as much as I could when he cried. I was very relieved that the problem wasn’t due to poor parenting on my part. Isaiah became progressively harder to please. So, I turned to a sling for carrying him, which kept him close to me at all times, provided easy nursing access, and freed my hands for other activities. He still cried intensely in the evenings, though, and I would initially walk, jiggle, and pat him, and then try calming him with a ride in his stroller. However, his crying would resume at the end of the ride, and I’d feel intense frustration and even anger at that time, and then guilt for the anger. I cried with him many times. A few times, I put him down in a safe place and left the room for a few seconds for some relief from the piercing screaming, but more often I used earplugs. The colic continued until Isaiah was right about 3 months old, when I finally got my happy baby back! (Anna Owings-Heidrick, personal communication)
The power of crying both to motivate caregiving and to distress mothers is illustrated by this narrative, which provides a typical description of a new mother’s experience with a colicky infant. However, crying’s evocative power is not confined to mothers. One of us (D.Z.), for example, has noted a reliable phenomenon when reviewing research audiotapes of infant crying in the lab: if the door is open even a crack and the sound of a crying baby escapes, students, colleagues, and secretaries rush in to check on the crying baby. The unique potency of human infant crying to mobilize a response by most adults is striking. Human infant crying is, to put it simply, a signal that cannot be ignored. Partly because of this potency, human infant crying has been studied intensively (see Zeifman, 2001b, for a review). One goal of this chapter is to identify insights from the crying literature that can enhance our understanding of animal communication. However, our primary goal is to explore the insights into crying provided by an assessment/management (A/M) approach to animal communication (Owings and Morton, 1998). We will begin with an introduction to A/M and its historical roots, which include attachment theory (AT). A/M and AT together provide a conceptual framework for discussing the attachment bond that develops between human infant and caregiver, and how crying functions in the context of that relationship. We will argue that efforts to understand crying by infants have paid insufficient attention to the role of caregivers. Caregivers are the intermediaries through whom crying must work, but they are also individuals with agendas
that extend beyond the realm of parenting. For crying to be effective, it must have the power to capitalize on the motivational and emotional systems that are central to structuring caregivers’ activities.
Historical Roots of an Assessment/Management Approach
According to von Uexküll, an organism’s Umwelt or “self-world” is founded on both its perceptual abilities and the repertoire of effector activities it uses to operate on its environment (von Uexküll, 1934/1957; Burghardt, 1998). A significant portion of the Umwelt consists of Kumpanen (companions) that have been evolutionarily significant to the species, such as enemies, parents, offspring, mates, and peers. Kumpanen are characterized not only by the stimulus patterns whereby they are recognized but also by the typical interactions for which they are available. Lorenz’s approach to the study of behavior shared much with von Uexküll’s (Lorenz, 1970a). His concepts of sign stimuli, releasers, innate releasing mechanisms, and fixed action patterns formalized von Uexküll’s Umwelt idea. Lorenz’s research included his studies of imprinting, an important process whereby animals develop the ability to recognize and become attached to companions. His imprinting studies in turn influenced John Bowlby, who developed attachment theory (AT) (Bowlby, 1969, 1982), an approach that today continues to guide the study not only of infant–parent relationships but also of a variety of other categories of social relationships (Cassidy and Shaver, 1999). Bowlby proposed the then revolutionary idea that human infants, like those of precocial species, possess evolved imprinting-like mechanisms that guide the formation of an infant–caregiver bond. Like those of von Uexküll and Lorenz, Bowlby’s approach implied that the perception and action systems of human infants need to be understood in terms of the properties of the infant’s most significant initial companion, typically its mother.
Bowlby also added a control-theory analogy that treated organisms as feedback-sensitive regulatory systems and, like von Uexküll’s conception, highlighted two-way interplay between organism and environment. According to Bowlby, feedback-sensitive “attachment behavior” serves to maintain the proximity or availability of the attachment figure. Bowlby’s view that social behavior involves regulating the behavior of others is shared by an assessment/management (A/M) approach to animal communication (Hennessy, 1981; Owings and Hennessy, 1984; Owings and Morton, 1998), and is readily applied to the problem of infant crying. According to A/M, communication is an interindividual process built upon two equally important categories of individual behavior, assessment and management (roughly equivalent to receiving and sending, or mind reading and manipulating: Hennessy, 1981; Owings and Hennessy, 1984; Owings and Morton, 1998).
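Bowlby's control-theory analogy can be made concrete with a toy feedback loop. The sketch below is only an illustration under stated assumptions: the class name `AttachmentLoop`, the numerical gains, and the use of a single "distance" variable are all hypothetical and do not come from Bowlby or from the A/M literature. It shows the one property the analogy stresses, namely that signaling is driven by the discrepancy between the current state and a set goal, and that the caregiver's response feeds back to reduce that discrepancy.

```python
from dataclasses import dataclass


@dataclass
class AttachmentLoop:
    """Toy proportional-feedback loop (illustrative assumption, not from the source)."""
    set_goal: float = 1.0        # hypothetical preferred caregiver distance (arbitrary units)
    infant_gain: float = 0.8     # hypothetical signal intensity per unit of excess distance
    caregiver_gain: float = 0.5  # hypothetical fraction of the signal converted into approach

    def signal(self, distance: float) -> float:
        """Management: attachment behavior grows with distance beyond the set goal."""
        return max(0.0, self.infant_gain * (distance - self.set_goal))

    def step(self, distance: float) -> float:
        """Caregiver response: approach in proportion to the signal received."""
        return distance - self.caregiver_gain * self.signal(distance)


def simulate(loop: AttachmentLoop, start_distance: float, steps: int) -> list:
    """Return the sequence of caregiver distances over the given number of steps."""
    distances = [start_distance]
    for _ in range(steps):
        distances.append(loop.step(distances[-1]))
    return distances


if __name__ == "__main__":
    for d in simulate(AttachmentLoop(), start_distance=5.0, steps=5):
        print(round(d, 3))
```

In this sketch the signal falls silent once the caregiver is at or inside the set-goal distance, which is the sense in which attachment behavior is "feedback-sensitive." A fuller model would let the set goal itself vary with context, as the chapter describes for age, state, and environmental threat.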
Assessment involves making self-interested adjustments to current circumstances based on the extraction of clues from other individuals and their contexts. Management involves self-interested efforts to maintain or change current circumstances by regulating (managing) the behavior of others. A key feature of this approach is the view that management works by capitalizing on the assessment systems of targets. In other words, crying works by capitalizing on the assessment systems of caregivers and, conversely, the assessment systems of caregivers are major determinants of the effectiveness of crying. Communicative systems originate and function through a dynamic interaction between self-interested management and self-interested assessment. Through processes of mutual regulation, each interactant plays both assessment and management roles in communication, and receives both proximate and ultimate payoffs. The novelty of such a regulatory approach to animal communication can best be understood in light of the history of animal communication theory. In the 1960s and 1970s, insights about behavior proliferated as a result of newly developed clarity about the logic of natural selection (Williams, 1966). The central insight of this newfound clarity was that natural selection favors ways of behaving that serve the behaver’s own reproductive interests, even when such behavior is detrimental to the other interactant. Two influential papers advanced our current understanding of the communicative process, one with a focus on signals (management: Dawkins and Krebs, 1978) and one emphasizing assessment (Zahavi, 1975). By referring to communication as “manipulation,” Dawkins and Krebs suggested that the essence of signaling was pragmatic, self-interested action rather than cooperative sharing of information. Zahavi raised the question of what limited manipulation, if exploitative deployment of signals is possible. 
His conclusion was that receivers are selected to make use only of reliable information, and that signals are more likely to be reliable if they are costly. Receivers, consequently, are selected to be responsive only to costly signals, which should impose a cost on (i.e., “handicap”) signalers. Thus, the combination of manipulation and handicapping introduced an ultimate framework for thinking about communication that took us beyond earlier ones and was more consistent with the logic of natural selection. A/M extended manipulation and handicapping in several ways (Hennessy, 1981; Owings and Hennessy, 1984). First, it synthesized the two approaches, pointing out how management and assessment mutually constrain and capitalize on one another. Second, the regulatory analogy central to A/M extended the pragmatic, self-interested logic of natural selection from ultimate to proximate time frames, thereby providing a more realistic view of the proximate dynamics of interacting than existing models did. Finally, A/M emphasized the active nature of assessment, a theme that has become increasingly visible in the communication literature (reviewed in Owings and Morton, 1998).
If we apply A/M to infant crying, our attention is directed to several avenues of research and theory that have been underexplored. For example, most work on the role of emotion in communication follows Darwin’s (1981/1871) emphasis on signal emission (management) rather than assessment (Scherer, 1992). However, emotion-mediated responses to signals (assessment) also have the potential to play a significant proximate and ultimate role in shaping signal structure and deployment (Owings, 1994). While AT leads us to consider the longer-term developmental effects of the social consequences of signaling, an understudied area in animal communication (West et al., 1994, 1997), A/M extends the scope to the less studied immediate effects of those social consequences. Finally, concerns about both immediate and developmental effects of social feedback highlight behavioral plasticity, an important but underexplored property of nonhuman communication (West et al., 1994; Snowdon, chapter 8 in this volume; Seyfarth and Cheney, 1997) and emotional communication by humans (Gustafson and Deconti, 1990; Gustafson and Green, 1991). Plasticity is central to the ontogeny of human infant crying (Ainsworth and Wittig, 1969; Bell and Ainsworth, 1972), but remains underexplored even in that literature. The above concepts imply that interactions involving crying, like other caregiver–infant interactions, are founded on mutual, self-interested regulatory processes whose mechanisms were naturally selected. These regulatory processes are sensitive to feedback in both immediate and developmental time frames, and each must be examined. Crying can be understood only in this social regulatory context, and any account of it must consider the motivational and emotional systems of caregivers through which crying achieves its social effects.
Crying and Attachment: The View through an A/M Lens
Crying is central to most conceptualizations of attachment between infant and caregiver (Bowlby, 1969).
The adaptive value of crying is perhaps most obvious for the infant: at a time when the infant is helpless to meet his or her own needs, crying is the “acoustical umbilical cord” that ties an infant to its source of sustenance (Ostwald, 1972). Crying provides a motivated context for the infant to associate the caregiver with rewarding transitions from distress to calm, and infants typically become attached to the individual who has most reliably responded to their cries. Separation protest, an infant’s crying at the departure of his or her primary caregiver, is widely considered the marker of a clear-cut attachment (Bowlby, 1969; Sroufe and Waters, 1977; Ainsworth et al., 1978). Furthermore, it is common to infer the existence of an attachment by placing the infant in a distressing situation and observing to whom the infant retreats for comfort (Harlow and Zimmermann, 1958). In contrast to peers and other social companions who provide stimulation and increase arousal, attachment figures are uniquely capable of alleviating distress and
reinstating calm. The capacity of the attachment figure to soothe distress sets him or her apart from other conspecifics and even from mother surrogates (Coe et al., 1983; Gandelman, 1992). According to Bowlby, the homeostatic “set goal” maintained by the attachment system centers on the infant’s need for security (or perceived safety: Bowlby, 1969). The infant’s sense of security varies not only with its degree of proximity to the caregiver but also with context, influenced by endogenous factors such as the infant’s age and state, as well as such exogenous factors as novelty or threat from the environment. The attachment system balances the infant’s need for security and desire for exploration by serving as a “safe haven” to which the infant can retreat when distressed, and as a “secure base” from which the infant can explore his or her environment. The presence of a concerned caregiver emboldens an infant to explore more freely, whereas the absence or inconsistent performance of an attachment figure inhibits exploration (Singh, 1975; Ainsworth et al., 1978). In fact, well-documented long-term consequences of depriving infant monkeys of a mother are timidity, fear of novelty, and inhibition of exploration (Kraemer, 1997; Suomi, 1997). Individual differences in infants’ use of caregivers to manage distress and facilitate exploration underlie Ainsworth’s description of patterns of attachment (Ainsworth et al., 1978). Following home observations, Ainsworth observed one-year-old infants in the “strange situation,” a laboratory procedure designed to activate the attachment system by exposing infants to a novel setting, the presence of a stranger, and the unexpected departure of the mother. Infants’ reactions in the laboratory reflected maternal responsiveness to infant signals, particularly crying, in the home. 
Mothers who had reliably responded to crying had “securely attached” infants who cried during separations but were easily comforted during reunions, and quickly returned to play. Mothers who repeatedly ignored crying had “avoidantly attached” infants who appeared unfazed by their departure, avoided contact with them upon reunion, and were equally friendly toward them and a stranger. Finally, mothers who responded inconsistently to infant crying had “ambivalently attached” infants who reacted to separation with distress and continued to cry even after the mother had returned. Upon being reunited, ambivalent infants vacillated between clinging to and resisting contact with their mothers, and hardly explored the laboratory environment. The degree of maternal responsiveness to crying, according to Ainsworth, largely predicts infants’ attachment classifications (Ainsworth et al., 1978). While Ainsworth viewed the secure attachment pattern as optimal, each attachment pattern might be viewed as an adaptive response to the relative availability of a primary caregiver. If the burden of providing safety and regulating infant distress is viewed as one that is shared between the infant and caregiver in a process of mutual self-regulation, dyads may differ in how the burden is apportioned. A secure infant who has experienced a
competent and reliable caregiver depends heavily on the caregiver for protection, and is therefore free to explore the environment unimpeded by constant vigilance. In contrast, an avoidant infant whose cries have been ignored has learned to rely on himself rather than on a rejecting caregiver. Finally, ambivalently attached infants who have sometimes received care and sometimes been ignored appear to shift back and forth between attempts at self-regulation and attempts at regulation through interaction with the caregiver. Such infants appear “preoccupied” with the presence of the caregiver, clinging to and monitoring the caregiver’s whereabouts even before her first departure and after her return. Infant adjustments to caregiver responsiveness support the idea of sensitivity of signalers to feedback from the environment, and the use of signals as probes, deployed as a means for assessing and adjusting to the social environment (Owings and Hennessy, 1984). Considering the infant’s inability to self-regulate emotionally or physiologically and his limited repertoire for avoiding danger, attachment behavior might be used not only to procure care and protection from adults but also as a probe for assessing the reliability of attachment figures. From an A/M perspective, what transpires between infant and caregiver reflects the interplay between the self-interested regulatory processes of both parties. Just as attachment outcomes can be viewed as adaptive adjustments to caregiver availability, so the relative availability of caregivers can be viewed as an adaptive caregiver adjustment to environmental conditions. Caregivers have agendas of their own, which include not only caregiving but also competing demands such as foraging (or its modern equivalents, working and shopping).
Researchers have reliably shown that caregiving behavior is sensitive to two kinds of environmental inputs: (a) the availability of resources, such as food, essential to survival, and (b) risk to infants of injury and mortality. For example, increased foraging demand produces a greater incidence of insecure attachment among bonnet macaque mother–infant dyads (Andrews and Rosenblum, 1991), presumably because mothers are forced to forage more and thus are less attentive to their offspring. Rhesus monkey mothers apparently adjust their maternal behavior to the perceived vulnerability of their infants. Maternal status in the troop determines the infant rhesus monkey’s proximity to the mother, with higher-status mothers allowing greater distance (Suomi, 1999). Such variation may reflect the fact that the offspring of higher-status females are less vulnerable to aggression. Parenting practices of humans also are influenced by ecological variables. LeVine (1977) noted that societies with the highest infant mortality rates engage in the most indulgent infant care practices, such as frequent feedings, nearly constant carrying, and immediate response to infant crying. Even in modern societies where, compared with our evolutionary past, infant mortality rates are low and predators are rare, mothers routinely respond to even slight increases in risk to their infants by keeping their infants closer. Compare, for example, the distance from her offspring a mother will tolerate in her own
home versus the distance that will cause her to panic in a shopping mall. However, the greater incidence of insecure attachment (Crockenberg, 1981), child abuse, and neglect (Pelton, 1978; Steinberg et al., 1981) in low socioeconomic status samples suggests that when parental resources are low, even if infant mortality rates are elevated, the quality (and quantity) of caregiving will be compromised. In both human and nonhuman animals, the relative availability of caregivers to infants appears to be a function of resource availability and perceived environmental threat. Attachment researchers have emphasized the adaptiveness of attachment behavior for the infant and have neglected the value of the infant’s attachment for the attachment figure. The adaptive value of an attachment is obvious for the infant because his very survival depends on the investment of others. However, an A/M model suggests that attachment must have adaptive value for the parent as well. Obviously, the parent is motivated to provide care to propagate her own genes, but why is it in the parent’s interest to have an infant attach to her? One possibility is that when an infant becomes attached, a parent is no longer fully responsible for ensuring the infant’s safety. Following, calling, crying, and other attachment behaviors assist the parent in protecting the young by having the infant participate in regulating proximity and detecting environmental threats that the infant cannot handle alone. Having an attached infant frees a mother to attend to other items on her agenda while her infant is keeping an eye on her.
Crying as a Reflection of Mutual Regulation
The attachment system of infants and the caregiving system of older individuals are, according to Bowlby (1969), coevolved and codeveloped behavioral systems (see also George and Solomon, 1999).
Early in an infant’s development, the outputs of the attachment and caregiving systems mutually comprise major components of the Umwelt of infant and caregiver. It is in this shared realm that caregivers and infants mutually regulate one another’s behavior and physiology (e.g., Hofer, 1987), and it is in this context that the structure and function of crying can be understood. Two additional features of the infant–caregiver relationship contribute to variation in the effectiveness of crying. First, whereas caregivers comprise almost the entire Umwelt of infants, infants are a much smaller part of the caregivers’ Umwelt, in which caregiving competes with other activities. Second, adults vary in the maturity and activation of the caregiving system, which mediates responsivity to crying.
Crying as a Reflection of Management by Infants
Crying is a highly variable signal (e.g., Wasz-Höckert et al., 1968; Wolff, 1969; Zeskind and Marshall, 1988; Gustafson and Green, 1989), a property that may reflect its variable
contexts and the adjustments necessary to maintain caregiver availability at a preferred value (cf. Hennessy, 1981; Owings and Hennessy, 1984). However, most studies of variation do not compare adjustments in crying by the same infant under differing conditions, but instead use between-subject comparisons to explore changes in crying associated with individual differences, such as in temperament (Lounsbury and Bates, 1982; Zeskind and Barr, 1997) or in perinatal and postnatal complications (Wasz-Höckert et al., 1968; Zeskind and Lester, 1978). Nevertheless, it has been shown that acoustic features of crying correlate with changes in an infant’s severity of distress, as predicted by a regulatory view (e.g., Wolff, 1987; Gustafson and Harris, 1990; Hopkins, 2000). The following description of human infant crying is based on the small number of reports of this latter sort, and deals with only a few of the many acoustic dimensions along which crying varies. This description also neglects nonacoustic features of crying, such as facial expressions and bodily movements (Wasz-Höckert et al., 1968; Wolff, 1987). Facial expressions undoubtedly work through visual channels, but bodily movements might well function through tactile channels for infants already in contact with caregivers (Barr, 1990b). This multimodal property of crying (Owings and Hennessy, 1984; Partan and Marler, 1999) is a rich area for future research. As the time for the next feeding approaches, infants become increasingly likely to cry (Bernal, 1972). Crying typically begins with “fussing,” consisting of intermittent moaning or crylike sounds more than 3 seconds apart, and emitted arhythmically. This crying is sensitive to immediate feedback; if caregiving follows, and especially if it involves holding and/or feeding, crying ceases more than 75 percent of the time (Bell and Ainsworth, 1972). 
If, however, caregiving is not forthcoming, the infant typically progresses to rhythmic “phonated” crying, involving harmonically structured vocalizations usually with a rise-fall melody; these cries are repeated rhythmically, less than three seconds apart. Continued absence of caregiver response is associated with shortening of individual cries, an overall downward shift in the spectral frequencies emphasized, and an increase in fundamental frequency. Crying may also become more “dysphonated,” as a result of forceful exhalations obscuring the harmonic structure of cries (Murray, 1979; Zeskind, 1985; Wolff, 1987; Zeskind et al., 1993; Green et al., 1998). In addition to the normal range of modifications, injury or other painful stimulation will extend the variation of cries further. Pain cries begin abruptly rather than with gradual escalation, and often involve “hyperphonated” cries, characterized by exceptionally high F0s. The initial pain-evoked cry can be exceptionally long, typically has a falling melody, and is followed by extended breath holding before production of subsequent cries. The early cries in pain- and hunger-induced crying bouts are most identifiable as arising from those specific causes, with later cries in the same bouts converging structurally and
becoming less distinctive than those initiating a bout (Wolff, 1987; Gustafson and Harris, 1990; Green et al., 1998). Variation in crying reflects the infant’s pragmatic attempts to deal with its own varying degree of disregulation, as well as with evidence about the caregiver’s proximity. For example, the more forceful exhalations produced as the infant progresses into dysphonated crying may generate higher-amplitude sounds that carry farther, thereby increasing the chances of detection by a distant caregiver. Other changes in cry structure appear to reflect adjustments to assessment systems of targets rather than physical factors such as caregiver distance. For example, the vocal changes associated with painful stimulation or extended “hunger-induced” crying may be designed to escalate pressure on the caregiver to respond by inducing greater caregiver discomfort. Increases in fundamental frequency, shifts in the spectral distribution of acoustic energy, and shortening of cries characteristic of heightened infant disregulation induce greater caregiver sympathetic nervous system arousal, and are perceived as more urgent (Boukydis and Burgess, 1982; Boukydis, 1985; Crowe and Zeskind, 1992; Zeskind and Barr, 1997; Dessureau et al., 1998). Such changes in cry acoustics make crying harder to ignore, and caregiver intervention more likely. A pragmatic view of crying also helps explain why crying is sometimes dissociated from other indices of infant distress (the “dissociation problem”: Barr, 1998; Gunnar and Donzella, 1999). The solution to the dissociation problem lies in understanding that the infant stress-response system consists of multiple components with different specializations (Sapolsky, 1992; Mendoza et al., 2000). Various painful and novel experiences, such as medical procedures, activate the stress-response system, which includes crying and the release of stress hormones (Gunnar et al., 1988; Gunnar and Donzella, 1999; Gormally et al., 2001). 
Although crying and physiological aspects of the stress response are positively related at the level of group averages, correlations between these two levels of the stress response are often modest at the intra-individual level. In addition, activities such as sucking a pacifier during a medical procedure reduce crying without reducing cortisol response (Gunnar et al., 1988; Gunnar and Donzella, 1999). The independence of various components of the stress response is less perplexing if crying is viewed as one component of a larger regulatory system. The infant stress-response system functions not only to draw the attention of caregivers but also to prepare infants to cope with impending threats through other means. Stress hormones, for example, prepare the body to deal with tissue damage and blood loss (Gunnar and Donzella, 1999; Mendoza et al., 2000). Some distress-induced changes are targeted internally (e.g., the release of cortisol), and others externally (e.g., crying). The decoupling of physiological and behavioral components can occur when progress is made on some fronts, such as recruitment of caregiving (as evidenced by the opportunity to suckle), but not on other fronts, such as termination of painful stimulation. Crying is the primary externally targeted
regulatory output available to very young infants, and it works to forestall further progression of disregulation by eliciting caregiving. This interpretation of the decoupling of stress-related changes is supported by studies of mother–infant separation in squirrel monkeys, a New World species that exhibits mother–infant attachment (Coe et al., 1983). When infants were separated from their mothers and placed either in adjacent cages, which preserved olfactory and vocal contact, or in total sensory isolation, infants in adjacent cages vocalized more than those completely separated from their mothers, suggesting that vocal response is sensitive to the utility (or futility) of using vocalizations to recruit caregiving. In contrast, cortisol responses were equal or greater in total isolation than in adjacent cages.
The Interplay of Infant Management and Caregiver Assessment
The effectiveness of crying in activating adequate caregiving depends on characteristics of both infant and caregiver. On the infant’s part, crying must be salient and precise enough to impel and guide caregiving. On the caregiver’s part, a perceptual readiness must be present, as well as a willingness to abandon other activities in favor of caregiving. Caregiving interventions result from the interplay between characteristics of infant and caregiver. A paradoxical feature of crying is that part of its power to activate caregiving lies in its noxiousness, and that this very noxiousness can also evoke abusive or avoidant responses by caregivers (Murray, 1979; Frodi, 1985). We will argue that such noxiousness may be adaptive in small doses, but becomes more costly to the infant when used excessively. Such excessive crying may be a relatively recent phenomenon in our evolutionary history, and may be a by-product of modern child care practices.
The Infant Side
The effectiveness of crying can be evaluated along two dimensions, precision and efficacy.
Precision is most closely linked to issues of cognition in communication, such as referentiality (as used in the animal communication literature; e.g., see Evans, 1997), and has to do with how precisely the infant’s crying can guide the caregiver to a remedy for distress. Efficacy is most closely linked to issues of motivation and emotion in communication (Owings, 1994), and has to do with the power of crying to compel caregivers to abandon other activities and attend to caregiving. The precision of crying seems surprisingly low, in that the sounds of cries do not allow inference of the exact source of the infant’s distress. Even though it is common to speak of different cry types (hunger, anger, pain, and birth cries), the overwhelming evidence is that cries do not represent a set of discrete semantic categories of causes (see Smith, 1965 for application of semiotics to animal signals). Instead, crying appears to provide graded cues about varying levels of infant distress (Murray, 1979; Zeskind et al., 1992; Green et al., 1998; Barr et al., 2000b; Zeifman, 2001b). For example, women are not generally
accurate when asked to identify the specific causes of cries they hear as playbacks, with only half scoring above chance levels, but are highly accurate when their judgments are scored with regard to infant distress level rather than specific eliciting stimuli (Gustafson and Harris, 1990). Such modest precision in cries is counterintuitive since the sophisticated cognitive systems of adults afford more refined feedback for shaping crying, including its precision. Nor can the absence of different cries for different elicitors be attributed to the cognitive limitations of infants; the emission of referentially specific signals does not require exceptional cognitive abilities (Evans, 1997). In reality, categorical distinctions among cries arising from different circumstances would not have much utility for guiding caregiver intervention, given the numerous potential sources of discomfort and pain. The relatively poor precision of crying may reflect the large number of potential sources of distress, as well as the presence of contextual information. Certainly if close contact between infant and caregiver was the historical norm for our species, the context for inferring the cause and responding appropriately would typically have been present at the onset of crying. Although the efficacy of crying determines the tempo of response, contextual cues may determine the nature of response. The efficacy of crying depends in part on its power to induce distress in caregivers, as illustrated by the opening narrative. Inconsolable crying (colic) is common in technologically developed Western countries (Barr, 1990b), and has spawned a broad range of solutions. The sheer number of proposed colic remedies provides some index of the mobilizing power of crying (see Lutz, 1999, pp. 110–111). 
The sound of infant crying activates the sympathetic nervous system in adults (Frodi, 1985; Crowe and Zeskind, 1992), and does so more effectively than sounds from other species and mechanical sources (Murray, 1985). Finally, the salience of crying is also indicated by the disproportionate volume of research on crying relative to other infant emotional signals. Apparently, crying motivates caregiving in part by inducing negative states in caregivers that they work to terminate (see Murray, 1979; Frodi, 1985; Gustafson and Green, 1989; Zeskind and Barr, 1997; Barr et al., 2000a). The noxiousness of crying may help explain developmental changes in the deployment and structure of the signal (for a review, see Zeifman, 2001a). As infants come to anticipate caregiving in response to crying during their first year, crying is more likely to occur in the presence of caregivers than in their absence (Bell and Ainsworth, 1972), and is more narrowly directed toward particular caregivers. Crying is coordinated with gaze and gesture (Gustafson and Green, 1991), and occurs less often as alternative methods of communication develop. In fact, the degree of reliance on crying among infants of the same chronological age is inversely related to the complexity of the infant’s linguistic and nonlinguistic managerial repertoire (Bell and Ainsworth, 1972; Kopp, 1992).
Such changes in managerial activities may well be facilitated by the contrast between negative caregiver reactions to crying and positive reactions to smiling, laughter, and first words (Locke, 1996). The general decline in crying may also reflect the less permissive contexts infants find themselves in as they mature. In comparison with parents, nonrelated adults and peers may be less inclined to tolerate a punishing signal and less motivated to provide care. The literature on emotion regulation in children suggests that children who inhibit negative emotions such as anger, disappointment, and distress are better liked than those who do not (Eisenberg et al., 1993). And even though crying continues in adulthood, the vocal component that carries the brunt of crying’s negative impact is replaced by a visual component, tearing (Zeifman, 2001a). In fact, most adult crying is merely tearing. The fact that crying by adults can induce negative social reactions among peers (Plas and Hoover-Dempsey, 1988) may account for the tendency of adults to seek privacy before crying (Becht and Vingerhoets, 1997). Negative feedback may favor modifications in the structure of the signal across the life span that reduce its most noxious element (the cry sound), as well as refinements that limit the contexts in which crying appears. Not all crying, however, is equal in its power to induce discomfort in listeners. For example, “difficult” or colicky infants distress their caregivers not only because they cry more but also because they engage in the more escalated forms of crying. Infants rated by their mothers as “difficult” produce hunger-associated cries with longer pauses within and between cry sounds and a higher fundamental frequency at peak intensity than infants judged to be less difficult (Lounsbury and Bates, 1982; see also Zeskind and Barr, 1997). 
The cries of “difficult” infants are rated by listeners as more grating, arousing, piercing, discomforting, and aversive (Boukydis and Burgess, 1982), and evoke more anger, irritation, and avoidance, as well as a greater perception that such cries are produced by spoiled infants (Lounsbury and Bates, 1982). These and additional findings indicate that the distressing effect of crying on listeners increases with the graded escalation of crying (Boukydis and Burgess, 1982; Zeskind, 1985; Crowe and Zeskind, 1992). In the circumstances in which crying evolved, infants probably were carried continuously, allowing rapid response to early, nonescalated signs of infant distress (Barr, 1990b). Among hunter-gatherer groups such as the !Kung San, where continuous carrying is the norm, caregivers respond to 92 percent of fussing and crying episodes within 15 seconds of their onset (Barr et al., 1991). In contrast, under conditions of reduced proximity between caregiver and infant typical of industrialized countries (e.g., the practice of placing infants in cribs), the potency of nonescalated crying is lost. During the first three months of their infants’ lives, for example, American mothers ignored 46 percent of their infants’ cries (range = 4–97), and even when they did respond, they delayed their responses by an average of 3.83 minutes (Bell and Ainsworth, 1972).
This reduced responsiveness may be a dysfunctional product of evolutionarily atypical distances between infant and caregiver, perhaps exacerbated by “experts’ ” admonitions that immediate responses will “spoil” the infant (Bell and Ainsworth, 1972). Experimental augmentation of caregiver–infant contact through carrying substantially reduces the amount of crying by infants (by 43 percent at the developmental peak in crying; Barr, 1990a). Alternatively, caregiver unresponsiveness could reflect a behavioral neophenotype (Kuo, 1967; West et al., 1994), a self-interested ontogenetic adaptation by caregivers to the unusual freedom from danger that infants experience in modern environments.

The Caregiver Side

Negative reactions to crying not only are judiciously regulated by infants, but also are tempered by the permissive context created by the caregiver system. While strangers are unlikely to approach an adult accompanied by an older child, they often stop to admire infants, especially newborns. This extreme attraction is not unique to humans; in Old World primate groups, the birth of an infant “invigorates” a matriline, bringing related individuals together in shared attraction to both infant and new mother (Berman, 1982; Suomi, 1999). Lorenz (1970b) identified the “babyish” features that seem to underlie the newborn’s appeal, including a relatively large head and eyes, small face, and short muzzle. Consistent with Lorenz’s claim, deviations from the norms of this infant schema have been implicated as a risk factor for child abuse in humans (McCabe, 1984). Adult attraction to infants is further enhanced by tonic activation of the caregiving system, and infants may play a significant role in that activation.
Work with sheep and rats indicates that maternal motivation is primed by the action of gonadal and pituitary hormones during pregnancy, activated by the infant’s stimulation of the birth canal during parturition, and sustained by continuing stimulation from infants during caregiving. Postpartum maintenance of maternal motivation by infants works by inducing maternal release of the neuropeptide oxytocin (Rosenblatt, 1992). Oxytocin, prolactin, and endogenous opioids appear to be primary mediators of the mood shifts that occur immediately after birth and facilitate the development of feelings of acceptance, nurturance, and love (Panksepp, 1998). Activation of caregiving depends on more than the hormonal priming of pregnancy, however; repeated contact with infants even in virgin female rats can induce caregiving (Rosenblatt, 1992), and human experience with infants is associated with positive changes in affective evaluations of crying sounds (Murray, 1985). Activation of the caregiving system through parturition or caregiving experience minimizes the negative impact of crying, and enhances the rewards associated with prompt responses. The termination of crying is a negative reinforcer, which is followed by positive reinforcement with restoration of positive emotional infant signals such as smiling, cooing, and becoming visually alert (Korner and Grobstein, 1966; Thoman et al., 1977).
When mothers breast-feed their infants, nursing alleviates the discomfort that comes with the milk-letdown response to hearing crying (Lind et al., 1971), and induces a positive affective state mediated by the release of oxytocin and endogenous opioids (Panksepp et al., 1997; cf. Dunbar, chapter 14 in this volume). Finally, a very generic payoff results from successful intervention. The sense of mastery that comes with being able to transform an infant’s state for the better (or helplessness in being unable to; see Donovan and Leavitt, 1985) is a prime example of contingency sensitivity, a trait that may be widespread among vertebrates (Mason, 1979a, 1979b). Nevertheless, there is a great deal of variation among adults in motivation and ability to provide appropriate care to a distressed infant. Fathers are less effective than mothers at distinguishing their own infant’s cries from those of other infants (Green and Gustafson, 1983), and have more hostile reactions than mothers to playbacks of crying. Sympathetic nervous system activation is strongest for primiparous parents, intermediate for nonparents, and lowest for multiparous parents (Boukydis and Burgess, 1982), suggesting that aversive arousal in response to crying is tempered by prior exposure. Compared with mothers, nonmothers engage in fewer soothing activities, such as ventral holding, tactile and vestibular stimulation, and talking in response to crying (Gustafson and Harris, 1990). Finally, among nonparents, women respond more discriminatively than men to variations in cry structure (Zeskind, 1985). Less skillful caregiving also may be responsible for crying escalating into more irritating and difficult-to-extinguish ranges. Just as more nurturant caregiving practices (like carrying) prevent crying, so less sensitive caregiving practices exacerbate crying. The unfortunate result is that noxious crying is more likely to occur with caregivers less inclined to tolerate crying in the first place. 
Although developmental trends indicate progressive infant inhibition of crying in nonpermissive contexts, very young infants cannot adjust their behavior to environmental conditions (Zeifman, 2001a). Consequently, crying in the presence of insensitive caregivers may evoke negative and even injurious reactions (Murray, 1979; Frodi, 1985), especially where extended family is not present to assist or to inhibit hostile responses.

The Evolutionary Origin and Maintenance of Crying

Crying’s potential to evoke hostile reactions may be a heretofore unappreciated cost of infant crying. Although the handicap principle has been fruitfully applied to the evolutionary stability of infant crying, the costs limiting the use of crying previously considered are expenditure of energy, increased conspicuousness to predators, and loss of credibility (Hauser, 1993; Kilner and Johnstone, 1997; Zeifman, 2001b). The potential of excessive crying to engender abuse has not previously been proposed as a cost constraining the deployment of care-soliciting signals.
The applicability of the handicap principle to crying illustrates the fact that parent-offspring conflict can coexist with mutuality of interest (Trivers, 1974), but it does not provide much insight into the evolutionary origin of crying. An evolutionary precursor to crying would have contained cues useful to caregivers in assessing the status of their infants and therefore could have generated the evolution of crying. Blumberg and Sokoloff (2001) present evidence that in rats, ultrasonic retrieval calls have originated as a by-product of an abdominal compression maneuver that serves to increase blood return to the heart when cardiovascular function is jeopardized by extreme cooling. According to these authors, maternal retrieval evoked by these sounds originated through a process of active caregiver assessment based on these sounds, and not active infant management. Nevertheless, subsequent selection for signal function could have emerged from maternal retrieval responses. This idea is consistent with Thompson et al.’s (1996) “respiratory melodrama” hypothesis that crying evolved through a process of capitalizing on the cues used by caregivers to detect infant respiratory distress (a common consequence of cardiovascular failure).

General Discussion and Future Research Directions

The purpose of this chapter has been to uncover insights from an A/M approach to animal communication that serve to clarify or resolve sticking points in the infant crying literature. In particular, an A/M approach directs our attention to caregivers as active perceivers of infant signals, and to the impact of ecological variables on signal design and deployment. For example, the negative responses of caregivers to the most noxious vocal element of crying may underlie its gradual tapering off over the course of early childhood in preference to other expressions of distress.
Viewing signaling behavior in its natural context highlights not only the infant-caregiver relationship but also the ways in which modern contexts for crying deviate from historical ones. Reduced infant-caregiver contact typical of Western societies may alter the structure of the effective cry signal by reducing the salience of its nonvocal aspects. For example, tactile stimulation provided by infant agitation and fussing that would have led to an intervention to avert crying if an infant were being held, would go undetected in a modern setting and likely result in escalated and protracted crying. Finally, viewing crying as based on mutual regulation resolves the problems of crying’s poor precision and its occasional uncoupling from other indices of infant distress. Although we have noted that the extensive crying literature is one of several manifestations of adults’ strong motivation to control a potent and punishing signal, many research questions remain unanswered. An emphasis on the caregiver side of the crying equation
and the importance of context suggest several avenues of research that are likely to be productive. For example, cry perception studies have, with few exceptions, employed short auditory playbacks of crying. Utilizing samples which simulate the length and progress of actual crying bouts, as well as characterizing the efficacy of nonacoustic cues associated with crying, would be productive avenues of exploration. Although numerous studies have examined the effect of various caregiving interventions on infant crying and soothing, few studies have explored the effects of infant crying on adult physiology, and none have demonstrated the effect of crying termination on sympathetic arousal. Finally, although a good deal is known about cry variation associated with differences between infants, not nearly enough is known about crying variation within individuals in different contexts. Research addressing the above questions would contribute greatly to our knowledge of human infant crying as well as enrich our understanding of other animal communication systems.

Acknowledgment

Preparation of this manuscript was facilitated by discussions with Phil Shaver and Gig Levine (DHO), a sabbatical leave provided by Vassar College (DMZ), and feedback from the editors as well as an anonymous reviewer.

References

Ainsworth MDS, Wittig BA (1969) Attachment and exploratory behavior of one-year-olds in a strange situation. In: Determinants of Infant Behaviour IV (Foss BM, ed.), 111–136. London: Methuen. Ainsworth MS, Blehar MC, Waters E, Wall S (1978) Patterns of Attachment: A Psychological Study of the Strange Situation. Potomac, Md.: Lawrence Erlbaum. Andrews MW, Rosenblum LA (1991) Attachment in monkey infants raised in variable- and low-demand environments. Child Devel 62: 686–693. Barr RG (1990a) The “colic” enigma: Prolonged episodes of a normal predisposition to cry. Infant Ment Health J 11: 340–348. Barr RG (1990b) The early crying paradox: A modest proposal. Hum Nat 1: 355–389.
Barr RG (1998) Reflections on measuring pain in infants: Dissociation in responsive systems and “honest signalling.” Arch Dis Childhd, Fetal Neonat Ed 79: F152–F156. Barr RG, Hopkins B, Green JA (2000a) Crying as a sign, a symptom and a signal: Evolving concepts of crying behavior. In: Crying as a Sign, a Symptom and a Signal (Barr RG, Hopkins B, Green JA, eds.), 1–7. London: Mac Keith Press. Barr RG, Hopkins B, Green JA (2000b) The crying infant and toddler: Challenges, emergent themes and promissory notes. In: Crying as a Sign, a Symptom and a Signal (Barr RG, Hopkins B, Green JA, eds.), 210–217. London: Mac Keith Press. Barr RG, Konner M, Bakeman R, Adamson L (1991) Crying in !Kung San infants: A test of the cultural specificity hypothesis. Devel Med Child Neurol 33: 601–610.
Becht M, Vingerhoets AJJM (1997, March) Why we cry and how it affects mood. In: Annual Meeting of the American Psychosomatic Society (Santa Fe, N.M.: abstracted in Psychosom Med, 59, 92). Bell SM, Ainsworth MD (1972) Infant crying and maternal responsiveness. Child Devel 43: 1171–1190. Berman CM (1982) The ontogeny of social relationships with group companions among free-ranging infant rhesus monkeys: II. Differentiation and attractiveness. Anim Behav 30: 163–170. Bernal J (1972) Crying during the first 10 days of life, and maternal responses. Devel Med Child Neurol 14: 362–372. Blumberg MS, Sokoloff G (2001) Do infant rats cry? Psych Rev 108: 83–95. Boukydis CFZ (1985) Perception of infant crying as an interpersonal event. In: Infant Crying: Theoretical and Research Perspectives (Lester BM, Boukydis CFZ, eds.), 187–215. New York: Plenum Press. Boukydis CZ, Burgess RL (1982) Adult physiological response to infant cries: Effects of temperament of infant, parental status, and gender. Child Devel 53: 1291–1298. Bowlby J (1969) Attachment and Loss, vol. 1, Attachment. London: Hogarth Press. Bowlby J (1982) Attachment. New York: HarperCollins: Basic Books. Burghardt GM (1998) Snake stories: From the additive model to ethology’s fifth aim. In: Responsible Conduct with Animals in Research (Hart LA, ed.), 77–95. New York: Oxford University Press. Cassidy J, Shaver PR (1999) Handbook of Attachment: Theory, Research, and Clinical Applications. New York: Guilford Press. Coe CL, Wiener SG, Levine SS (1983) Psychoendocrine responses of mother and infant monkeys to disturbance and separation. In: Symbiosis in Parent–Offspring Interactions (Rosenblum LA, Moltz H, eds.), 189–214. New York: Plenum Press. Crockenberg SB (1981) Infant irritability, mother responsiveness, and social support influences on the security of infant–mother attachment. Child Devel 52: 857–865.
Crowe HP, Zeskind PS (1992) Psychophysiological and perceptual responses to infant cries varying in pitch: Comparison of adults with low and high scores on the Child Abuse Potential Inventory. Child Abuse and Neglect 16: 19–29. Darwin C (1981) The Descent of Man, and Selection in Relation to Sex. Princeton, N.J.: Princeton University Press. (First published 1871.) Dawkins R, Krebs JR (1978) Animal signals: Information or manipulation? In: Behavioural Ecology: An Evolutionary Approach (Krebs JR, Davies NB, eds.), 282–309. Sunderland, Mass.: Sinauer. Dessureau BK, Kurowski CO, Thompson NS (1998) A reassessment of the role of pitch and duration in adults’ responses to infant crying. Infant Behav Devel 21: 367–371. Donovan WL, Leavitt LA (1985) Physiology and behavior: Parents’ response to the infant cry. In: Infant Crying: Theoretical and Research Perspectives (Lester BM, Boukydis CFZ, eds.), 241–261. New York: Plenum Press. Eisenberg N, Fabes RA, Bernzweig J, Karbon M, Poulin R, Hanish L (1993) The relation of emotionality and regulation to preschoolers’ social skills and sociometric status. Child Devel 64: 1418–1438. Evans CS (1997) Referential signals. In: Perspectives in Ethology, vol. 12, Communication (Owings DH, Beecher MD, Thompson NS, eds.), 99–143. New York: Plenum Press. Frodi A (1985) When empathy fails: Aversive infant crying and child abuse. In: Infant Crying: Theoretical and Research Perspectives (Lester BM, Boukydis CFZ, eds.), 263–277. New York: Plenum Press. Gandelman R (1992) Psychobiology of Behavioral Development. New York: Oxford University Press. George C, Solomon J (1999) Attachment and caregiving: The caregiving behavioral system. In: Handbook of Attachment: Theory, Research, and Clinical Applications (Cassidy J, Shaver PR, eds.), 649–670. New York: Guilford Press. Gormally S, Barr RG, Wertheim L, Alkawaf R, Calinoiu N, Young SN (2001) Contact and nutrient caregiving effects on newborn infant pain responses.
Devel Med Child Neurol 43: 28–38.
Green JA, Gustafson GE (1983) Individual recognition of human infants on the basis of cries alone. Devel Psychobiol 16: 485–493. Green JA, Gustafson GE, McGhie AC (1998) Changes in infants’ cries as a function of time in a cry bout. Child Devel 69: 271–279. Gunnar MR, Connors J, Isensee J, Wall L (1988) Adrenocortical activity and behavioral distress in human newborns. Devel Psychobiol 21: 297–310. Gunnar MR, Donzella B (1999) “Looking for the Rosetta Stone”: An essay on crying, soothing, and stress. In: Soothing and Stress (Lewis M, Ramsay D, eds.), 39–56. Mahwah, N.J.: Lawrence Erlbaum. Gustafson GE, Deconti KA (1990) Infants’ cries in the process of normal development. Early Child Devel Care 65: 45–56. Gustafson GE, Green JA (1989) On the importance of fundamental frequency and other acoustic features in cry perception and infant development. Child Devel 60: 772–780. Gustafson GE, Green JA (1991) Developmental coordination of cry sounds with visual regard and gestures. Infant Behav Devel 14: 51–57. Gustafson GE, Harris KL (1990) Women’s responses to young infants’ cries. Devel Psych 26: 144–152. Harlow HF, Zimmermann RR (1958) The development of affectional responses in infant monkeys. Proc Amer Phil Soc 102: 501–509. Hauser MD (1993) Do vervet monkey infants cry wolf? Anim Behav 45: 1242–1244. Hennessy DF, Owings DH, Rowe MP, Coss RG, Leger DW (1981) The information afforded by a variable signal: Constraints on snake-elicited tail flagging by California ground squirrels. Behaviour 78: 188–226. Hofer MA (1987) Early social relationships: A psychobiologist’s view. Child Devel 58: 633–647. Hopkins B (2000) Development of crying in normal infants: Method, theory and some speculations. In: Crying as a Sign, a Symptom and a Signal (Barr RG, Hopkins B, Green JA, eds.), 176–209. New York: Cambridge University Press. Kilner R, Johnstone RA (1997) Begging the question: Are offspring solicitation behaviours signals of need? Trends Ecol Evol 12: 11–15.
Kopp CB (1992) Emotional distress and control in young children. In: Emotion and Its Regulation in Early Development (Eisenberg N, Fabes RA, eds.), 41–56. San Francisco: Jossey-Bass. Korner AF, Grobstein R (1966) Visual alertness as related to soothing in neonates: Implications for maternal stimulation and early deprivation. Child Devel 37: 867–876. Kraemer GW (1997) Psychobiology of early social attachment in rhesus monkeys: Clinical implications. In: The Integrative Neurobiology of Affiliation (Carter CS, Lederhendler II, Kirkpatrick B, eds.), 401–418. New York: New York Academy of Sciences. Kuo ZY (1967) The Dynamics of Behavioral Development: An Epigenetic View. New York: Random House. LeVine RA (1977) Child rearing as a cultural adaptation. In: Culture and Infancy: Variations in the Human Experience (Leiderman PH, Tulkin SR, Rosenfeld A, eds.), 15–27. New York: Academic Press. Lind J, Vuorenkoski V, Wasz-Höckert O (1971) The effect of cry stimulus on the temperature of the lactating breast of primipara. In: Psychosomatic Medicine in Obstetrics and Gynaecology (Morris N, ed.), 293–295. Basel: S. Karger. Locke JL (1996) Why do infants begin to talk? Language as an unintended consequence. J Child Lang 23: 251–268. Lorenz K (1970a) Companions as factors in the bird’s environment. (Martin R., trans.). In: Konrad Lorenz: Studies in Animal and Human Behaviour (Martin RD, ed.), 101–258. Cambridge, Mass.: Harvard University Press. (First published 1935). Lorenz K (1970b) Part and parcel in animal and human societies: A methodological discussion (Martin R, trans.). In: Konrad Lorenz: Studies in Animal and Human Behaviour (Martin RD, ed.), 115–195. Cambridge, Mass.: Harvard University Press. (First published 1950).
Lounsbury ML, Bates JE (1982) The cries of infants of differing levels of perceived temperamental difficultness: Acoustic properties and effects on listeners. Child Devel 53: 677–686. Lutz T (1999) The Natural and Cultural History of Tears. New York: Norton. Mason WA (1979a) Ontogeny of social behavior. In: Handbook of Behavioral Neurobiology, vol. 3 (Marler P, Vandenbergh JG, eds.), 1–28. New York: Plenum Press. Mason WA (1979b) Wanting and knowing: A biological perspective on maternal deprivation. In: Origins of the Infant’s Social Responsiveness (Thoman E, ed.), 225–249. Hillsdale, N.J.: Lawrence Erlbaum. McCabe V (1984) Abstract perceptual information for age level: A risk factor for maltreatment? Child Devel 55: 267–276. Mendoza SP, Capitanio JP, Mason WA (2000) Chronic social stress: Studies in non-human primates. In: Biology of Animal Stress: Basic Principles and Implications for Animal Welfare (Moberg GP, Mench JA, eds.), 227–247. New York: CABI. Murray AD (1979) Infant crying as an elicitor of parental behavior: An examination of two models. Psych Bull 86: 191–215. Murray AD (1985) Aversiveness is in the mind of the beholder: Perception of infant crying by adults. In: Infant Crying: Theoretical and Research Perspectives (Lester BM, Boukydis CFZ, eds.), 217–239. New York: Plenum Press. Ostwald P (1972) The sounds of infancy. Devel Med Child Neurol 14: 350–361. Owings DH (1994) How monkeys feel about the world: A review of How Monkeys See the World. Lang Commun 14: 15–30. Owings DH, Hennessy DF (1984) The importance of variation in sciurid visual and vocal communication. In: The Biology of Ground-Dwelling Squirrels: Annual Cycles, Behavioral Ecology, and Sociality (Murie JO, Michener GR, eds.), 169–200. Lincoln: University of Nebraska Press. Owings DH, Morton ES (1998) Animal Vocal Communication: A New Approach. Cambridge: Cambridge University Press. Panksepp J (1998) Affective Neuroscience: The Foundations of Human and Animal Emotions. 
New York: Oxford University Press. Panksepp J, Nelson E, Bekkedal M (1997) Brain systems for the mediation of social separation-distress and social-reward. Evolutionary antecedents and neuropeptide intermediaries. In: The Integrative Neurobiology of Affiliation (Carter CS, Lederhendler II, Kirkpatrick B, eds.), 78–100. New York: New York Academy of Sciences. Partan S, Marler P (1999) Communication goes multimodal. Science 283: 1272–1273. Pelton LH (1978) Child abuse and neglect: The myth of classlessness. Amer J Orthopsych 48: 608–617. Plas JM, Hoover-Dempsey KV (1988) Working Up a Storm: Anger, Anxiety, Joy, and Tears on the Job. New York: Norton. Rosenblatt JS (1992) Hormone-behavior relations in the regulation of parental behavior. In: Behavioral Endocrinology (Becker JB, Breedlove SM, Crews D, eds.), 219–259. Cambridge, Mass.: MIT Press. Sapolsky R (1992) Neuroendocrinology of the stress-response. In: Behavioral Endocrinology (Becker JB, Breedlove SM, Crews D, eds.), 287–324. Cambridge, Mass.: MIT Press. Scherer KR (1992) Vocal affect expression as symptom, symbol, and appeal. In: Nonverbal Vocal Communication: Comparative and Developmental Approaches (Papousek H, Jurgens U, Papousek M, eds.), 43–60. Cambridge: Cambridge University Press. Seyfarth RM, Cheney DL (1997) Some general features of vocal development in nonhuman primates. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 249–273. Cambridge: Cambridge University Press. Singh M (1975) Mother-infant separation in rhesus monkey living in natural environment. Primates 16: 471–476. Smith WJ (1965) Message, meaning and context in ethology. Amer Nat 99: 405–409.
10
Evolution of Communication from an Avian Perspective
Irene M. Pepperberg

Introduction

Many studies on the evolution of communication devolve into treatises on human language evolution, focusing on primates. If, however, we truly wish to develop models about communication, we must also consider systems phylogenetically removed from humans. I describe an experimentally manipulated vocal system, that of the Grey parrot (Psittacus erithacus), and argue for broadening our bases for theories and models of communication.

Nonhuman Primates: Not Necessarily the Only Evolutionary Models

Many researchers, including contributors to this volume (e.g., Dunbar, Fitch, Snowdon), highlight nonhuman primate present-day social and cognitive skills that could be ancestral links to human language and then, using these links, build evolutionary theories of communication (e.g., Bickerton, 1990; cf. Lieberman, 2000). Although a common ancestor for nonhuman primates and humans is undeniable, as are many neurological, anatomical, and resultant behavioral parallels (e.g., Deacon, 1997), this strategy overlooks the likelihood that, through evolutionary pressures and the exploitation of different ecological niches, similar communicative abilities may evolve in different ways and some may have been lost in some lineages. I do not dispute that certain capacities, such as discrimination of individual vocalizations (e.g., Cheney and Seyfarth, 1999) and melodic and rhythmic auditory patterns (e.g., Ramus et al., 2000), or social skills such as cooperative hunting that likely require communicative competence (Boesch, 1994), may be central to the evolution of communication.
I do, however, question behavioral emphases on primates and arguments for primate-centric neurologically wired bases for such behavior patterns, because these skills and patterns also exist in avian and cetacean lines (e.g., Bednarz, 1988; Evans, 1987; Forestell and Herman, 1988; Hulse et al., 1984; Stoddard et al., 1991), that is, in creatures with different evolutionary histories and differently wired brains (e.g., McFarland and Morgane, 1966; Morgane et al., 1986; Nottebohm, 1980; Striedter, 1994). Thus, if we wish to examine the evolution of complex communication and build models to determine what abilities are necessary and sufficient for such communication, we will miss important insights, particularly with respect to vocal learning, by focusing solely on primates.

Birds as Communication Models

Although direct connections between avian and human communication systems are unlikely—other than in allegories of the African Bwiti tribe, who claim Grey parrots
brought language to humans as a gift from the gods (Fernandez, 1982)—most current studies of the evolution of communication lack multiple models. No longer, for example, are data on birdsong and human language parallels (e.g., issues of adequate input, presence of babbling or practice periods, learning appropriate context for specific vocalizations: Byers and Kroodsma, 1992; Marler, 1970, 1973; Nottebohm, 1970) considered central to communication studies; similarly, research on laboratory-based avian communicatory achievements (e.g., Pepperberg, 1999) may be considered artifactual. But, given our knowledge of avian vocal learning (Kroodsma and Miller, 1996), of how social interaction affects such learning (Baptista and Petrinovich, 1984, 1986; Kroodsma and Pickert, 1984a, 1984b; Todt and Hultsch, 1998; reviews in Pepperberg, 1985, 1997, 1999; Pepperberg and Schinke-Llano, 1991), and of birds’ advanced cognition (e.g., Balda et al., 1998; Peake et al., 2001; Pepperberg, 1999), we cannot ignore Aves in determining evolutionary pressures that affected how complex communication systems—particularly vocal learning systems—arose, and in developing testable theories and models. Although phylogenetically remote, Grey parrots and humans share several cognitive and communicative abilities. Greys learn simple vocal syntactic patterns and referential elements of human communication; and, despite walnut-sized brains organized differently from those of primates and even songbirds (Jarvis and Mello, 2000; Striedter, 1994), on certain tasks (e.g., label acquisition, categorization, numerical competence, relative size, conjunction, recursion) their processing abilities and learning strategies may parallel those of human children (Pepperberg, 1981, 1994, 1996, 1999; Pepperberg and Shive, 2001; Pepperberg and Wilcox, 2000). 
Like children, Grey parrots use sound play (phonetic “babbling” and recombination; Pepperberg et al., 1991) to produce new speech patterns from existent ones (Pepperberg, 1990), implying that they acoustically represent labels as humans do, and develop phonetic categories. Greys may use anticipatory coarticulation, that is, separate specific phonemes from the speech flow and produce these sounds so as to facilitate production of upcoming phonemes (Patterson and Pepperberg, 1998), which, along with sound play, is consistent with top-down processing (Ladefoged, 1982). Greys recombine labels in novel ways to respond to novel situations and transfer such use across contexts (Pepperberg and Brezinsky, 1991). They can learn from each other in the laboratory (Pepperberg et al., 2000), and, if their natural behavior resembles that of other parrots (Gnam, 1988; Levinson, 1980; Nottebohm, 1970; Wright, 1996; Wright and Dorin, 2001; Yamashita, 1987), they establish strong pair bonds, recognize specific individuals, show vocal sentinel behavior, perform complex pair-bond duets, have dialects, and likely alter calls when changing dialect areas. Long-lived, they reside in large groups whose social complexity may match that of primates. Such data suggest that parrots can provide important evolutionary insights into complex cognitive and communicative processes.
Psittacine Communication: Levels of Complexity

Within this discussion, the complexity of my Greys’ communication system needs clarification. Although Greys use elements of English speech referentially, their overall behavior (and that of other subjects in animal–human communication studies) is not truly referential. Referentiality, as defined by linguists (e.g., Bickerton, 1990), requires full abstract use of a symbol: to talk about qualities of the item, to talk about how you think about the item (the referent) in its absence, to talk about it in future and past tenses, and not simply, for example, to request something currently absent (Pepperberg, 1999). Nevertheless, these animals’ codes, if not referential in the strongest sense, qualify as advanced, complex communication. The way these animals use a sign as a symbol suggests that the symbol functions as a mental representation of an item (Pepperberg, 1999). Apes use particular signs as symbols for types of objects (e.g., for all apples) and actions (e.g., requests), and different signs as symbols for categories that include the specific objects (e.g., food) or actions (e.g., Gardner and Gardner, 1978; Savage-Rumbaugh et al., 1993); to a great extent, so do my birds (Pepperberg, 1999). Moreover, their symbol use is not only in the here-and-now. Like humans, they request absent objects or an action not currently being performed, and accept that object or action and no other (Pepperberg, 1999; see Hockett’s 1959 concept of displacement). They generally demonstrate both label comprehension and label production: Given the label, an animal can indicate the object in some manner (either by a point or by stating something unique about the indicated item; review in Pepperberg, 1999). These animals use labels to refer to similar but nonidentical items, for example, to identify the material of any colored or shaped piece of wood without additional training.
Similarly, they understand that the label “green” refers to the concept “greenness” (to beans as well as to training objects) and how the arbitrary label “green” is subsumed into a category whose arbitrary label is “color” (Pepperberg, 1996). Specifically, my oldest subject, Alex, labels more than 50 different objects, seven colors, five shapes, quantity to 6, and three categories (material/color/shape); he uses “no,” “come here,” “wanna go X,” and “want Y” (where X and Y are appropriate location and item labels). He combines labels to identify, classify, request, or refuse about 100 items and to alter his environment, and comprehends these labels. He processes queries to judge category, relative size, quantity, and presence or absence of similarity and difference in attributes. Like some other “language-trained” subjects, he can use symbols dispassionately, that is, separate identification of an object from a request for that item (Pepperberg, 1988; i.e., he separates illocutionary force from propositional content; see Oller, chapter 4 in this volume, for implications of such behavior). Details of this research have been summarized elsewhere (e.g., Pepperberg, 1999). Suffice it to say that such abilities were once presumed to be limited to humans and apes (Premack, 1978), and that Alex is not unique: Other Greys are replicating some
of his results (Pepperberg, 1999). Thus Greys qualify as good models for the evolution of complex communication skills. Moreover, they are also good models for the acquisition of such skills: Complex skills must be learned, and evolution also exerts pressures on learning processes.

How Greys Learn: Parallels with Humans

My Greys’ learning sometimes parallels human processes, suggesting a long evolutionary history for the acquisition of complex communication. Like young children (e.g., Hollich et al., 2000), parrots acquire communication skills most effectively when input is referential, contextually applicable (functional), and socially rich (Pepperberg, 1997; Pepperberg et al., 1998, 1999, 2000). Reference is an utterance’s meaning—the relationship between labels and objects to which they refer—and referential input is exemplified by our use of objects that the bird labels as rewards. Context/function involves the situation in which an utterance is used and the effects of its use; initial use of labels as requests for objects gives the bird reason to learn the unique, unfamiliar sets of sounds constituting English labels. Social interaction signals which environment components are important, emphasizes common attributes (and thus possible underlying rules) of diverse actions, and allows continuous adjustment of input to a learner’s level. Interaction engages subjects directly, provides contextual explanations for actions, and demonstrates actions’ consequences. I describe the primary training technique, then experiments to determine which input elements are necessary and sufficient to engender learning.

The Model/Rival (M/R) Training Technique

My model/rival (M/R) training system (Pepperberg, 1981) is based on studies by Todt (1975a) and Bandura (1971) on how social modeling affects learning by, respectively, parrots and humans. M/R training, involving three-way social interactions with two humans and a parrot, demonstrates targeted vocal behavior.
Typically, a parrot observes two humans talking about one or more items in which it has already shown interest: The trainer presents the item(s), queries another human about them (e.g., “What’s here?” “What color?”), and gives praise and the object(s) to reward correct answers referentially. Incorrect responses (like those that birds may make) are punished by scolding and temporarily removing the item(s) from sight. Thus the second human is a model for the parrot’s responses and its rival for the trainer’s attention, and illustrates effects of an error: S/he tries again or talks more clearly after a (deliberately) incorrect or garbled response, thereby demonstrating corrective feedback and the reason for learning the specific sounds of the
label. A bird is included in interactions and rewarded for successive approximations to a correct response; training is thereby adjusted to its level. Unlike Todt’s (and other researchers’) modeling procedures (see Pepperberg and Sherman, 2000), ours reverses roles of human trainer and model, and includes the parrot to emphasize that one being is not always the questioner and the other the respondent, and that the procedure causes environmental change. Role reversal also counteracts an issue in Todt’s work: His birds, whose trainers maintained their respective roles, responded only to the human posing questions; our birds, however, respond to, interact with, and learn from all trainers. M/R training uses intrinsic reinforcers exclusively: Reward for uttering “X” is the object X, to ensure the closest possible correlations between labels or concepts to be learned and their referents. Earlier unsuccessful programs for teaching birds to communicate with humans used extrinsic rewards (e.g., Mowrer, 1950). On the few occasions when those subjects correctly labeled any items, or responded appropriately to specific commands, they received a single, favored food that neither related to, nor varied with, labels or concepts being taught, thereby delaying acquisition by confounding the targeted label or concept with that of the food (Greenfield, 1978; Miles, 1983; Pepperberg, 1981; for data on dysfunctional children, see Pepperberg and Sherman, 2000). Also, de facto use of labels as requests demonstrates functionality. Because Alex sometimes fails to focus on targeted objects, we trained “I want X” (i.e., to separate labeling and requesting; Pepperberg, 1988). Reward becomes the right to request something more desirable than what he identifies, which provides flexibility but maintains referentiality: To receive X (e.g., treats) for identifying Y, Alex can toss Y and state “I want X”; trainers comply only after the appropriate prior task is completed. 
His labels thus are true identifiers, not mere emotional requests. Training “want” provides two other advantages (review in Pepperberg, 1999). First, trainers can distinguish incorrect labeling from appeals for other items, particularly during testing, when birds unable to use “want” might not be using the wrong label but, rather, be requesting treats, and low identification scores might thus be unrelated to competence. Second, birds may demonstrate low-level intentionality: If thwarted, Alex continues requesting specific desired items and rarely accepts substitutes. M/R training showed which input elements enabled acquisition of some level of allospecific communicative competence, not what input was necessary and sufficient. What if training lacked some elements? Answering that question required parrots uninfluenced by prior experience; Alex might cease learning because training changed, not because of how it changed. Thus I added the juveniles Kyaaro, Alo, and Griffin to test the importance of reference, context/function, and social interaction.
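The reward contingencies described above (the labeled object itself as reward, “I want X” as a separate request, and compliance with requests only after the identification task is done) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the laboratory's actual procedure: the function name `handle_utterance`, the three outcome names, and the exact string matching are hypothetical, and real sessions depend on trainers judging successive approximations rather than exact matches.

```python
from typing import Optional, Tuple


def handle_utterance(target: str, utterance: str,
                     identified: bool) -> Tuple[str, Optional[str], bool]:
    """Classify one utterance in a trial.

    Returns (kind, reward, identified), where kind is "identification",
    "request", or "error" (hypothetical names for this sketch).
    """
    words = utterance.lower().split()
    if len(words) > 2 and words[:2] == ["i", "want"]:
        wanted = " ".join(words[2:])
        # Trainers comply with a request only after the prior
        # identification task has been completed.
        reward = wanted if identified else None
        return "request", reward, identified
    if utterance.strip().lower() == target.lower():
        # Intrinsic reinforcer: the reward for uttering "X" is object X.
        return "identification", target, True
    # A wrong label earns no reward (in training, the item would be
    # removed from sight and the trainer would scold).
    return "error", None, identified


# Walk through one trial in which the target object is a key.
done = False
for said in ["I want nut", "cork", "key", "I want nut"]:
    kind, reward, done = handle_utterance("key", said, done)
    print(f"{said!r}: {kind}, reward={reward}")
```

In this toy trial the first “I want nut” is recognized as a request but not granted, “cork” is scored as a labeling error rather than as a request, “key” earns the key, and the second “I want nut” is granted. Separating the two utterance types in this way is what lets a low identification score be distinguished from a bird simply asking for treats.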
Eliminating Aspects of Input

Seven experiments were performed. First, Alo and Kyaaro received three input conditions contiguously: I, audiotapes of Alex’s M/R sessions, which were, for the juveniles, nonreferential, not contextually applicable, and noninteractive; II, videotapes of Alex’s M/R sessions, which were, for the juveniles, referential, minimally contextually applicable, and noninteractive; and III, standard M/R training (Pepperberg, 1994). In I and II, juveniles experienced tapes in social isolation. Condition I paralleled earlier allospecific song acquisition studies (Marler, 1970); II involved unresolved issues about avian vision and video (e.g., the flicker-fusion of the CRT screen, possibly birds’ UV vision: Bowmaker et al., 1994, 1996; Ikebuchi and Okanoya, 1999; Lea and Dittrich, 1999). I counterbalanced labels across birds and conditions, matching training time across sessions. Second, so lack of reward would not deter video learning, a socially isolated juvenile watched videos while a hidden student monitored its utterances through headphones and could deliver rewards remotely (Pepperberg et al., 1998). Third, because coviewers sometimes increased young children’s learning from video (e.g., Corder-Bolz and O’Bryant, 1978; Lemish and Rice, 1986; Lesser, 1974; Salomon, 1977; Watkins et al., 1980; but see Rice et al., 1990), a trainer provided social approbation for viewing, pointed to the screen with comments like “Look what Alex has!”, but did not repeat targeted labels, ask questions, or relate content to other training. The juveniles’ attempts at labels would garner only vocal praise. Social interaction thus was limited; reference and functionality matched earlier videotape sessions (Pepperberg et al., 1998). Fourth, because extent of coviewer interaction might affect video learning (St. Peters et al., 1989), the trainer now repeated labels and asked questions (Pepperberg et al., 1999).
Fifth, because juveniles might have habituated to the single videotape used per label (even though each tape depicted many different responses and Alex–trainer interactions), we used live video from Alex’s sessions (Pepperberg et al., 1999). Sixth, because labels were not acquired if adult–child duos failed to focus jointly on objects being labeled (e.g., D. A. Baldwin, 1995), a single trainer faced away from the juvenile (who was within reach of, e.g., a key), talked about the object (“Look, a shiny key!” “Do you want key?” etc.), but had no visual or physical contact with parrot or object; the juveniles’ labeling attempts would receive only vocal praise, thereby eliminating some functionality and considerable social interaction (Pepperberg and McLaughlin, 1996). The juveniles failed to learn referential label use in any non-M/R procedure, but succeeded in contiguous standard M/R sessions. Finally, we eliminated interactive aspects of modeling: A single student labeled objects, queried Griffin, and jointly attended to objects with him (Pepperberg et al., 2000). Griffin did not utter labels in 50 such sessions, but produced labels clearly after two or three subsequent M/R sessions. (Birds normally need about 20 M/R sessions to produce labels.) We suspected latent learning: Griffin apparently stored but did not
use labels until he observed their use modeled. We have replicated video studies using a liquid crystal display (Pepperberg and Wilkes, 2004) to see if cathode ray tube flicker-fusion, rather than lack of interaction, discourages learning (Ikebuchi and Okanoya, 1999); flicker-fusion does not, however, influence recognition, because Alex could respond appropriately to objects presented via a live video link (Rutledge and Pepperberg, 1988). In all cases, results so far emphasize the importance of reference, contextual use/functionality, and social interaction for training parrots to communicate meaningfully with humans.

Mutual Exclusivity: Studying Subtle Changes in Input

Another study showed input effects on Greys’ label learning that parallel children’s mutual exclusivity (ME) data (Pepperberg and Wilcox, 2000). ME refers to children’s brief assumption during early word acquisition that each object has one, and only one, label (e.g., Liittschwager and Markman, 1994; Merriman, 1991). Along with the whole-object assumption (that labels refer to entire objects, not some feature: Macnamara, 1982; Markman and Wachtel, 1988), ME supposedly guides children in initial label acquisition. ME may also help children interpret novel words as feature labels (i.e., overcome the whole-object assumption; Markman, 1990), but very young children may find second labels for items initially more difficult to acquire than the first, because the second label is viewed as an alternative (Liittschwager and Markman, 1994). Input, however, affects ME: Children (Gottfried and Tonks, 1996) and parrots like Alex, who receive explicit or even implicit inclusivity data (X is a kind of Y; color names taught as additional, not alternative, labels, e.g., “Here’s a key; it’s a green key”), generally accept multiple labels for items and form hierarchical relations. Thus, shown a wooden block, Alex answers “What color?” “What shape?” “What matter?” and “What toy?” (Pepperberg, 1990).
Parrots given colors or shapes as alternative labels (e.g., “Here’s key”; later, “It’s green”), like children, exhibit ME. Griffin, given the latter input, answered “What color?” with a previously learned object label in over 50 training sessions. Similarly, while learning an object label (“cup”), he answered “What toy?” with colors and had difficulty acquiring “cup.” Thus even small input differences affect label acquisition in parrots much as in young children (Pepperberg and Wilcox, 2000).

Sound Play/Babbling

Not only input, but also how humans and Grey parrots actively practice their communication code without overt social stimulation, influences acquisition (Kuczaj, 1983; Marler, 1970). Such monologue speech, although not essential for human language acquisition, exists for most children (Kuczaj, 1983; K. Nelson, 1989), and birds exhibit equivalent behavior (Marler, 1970). Monologue speech has two components: private speech produced in solitude, and social-context speech produced in the presence of potential receivers but
without obvious communicative purpose (e.g., undirected commentary while playing with toys: Fuson, 1979; Kuczaj and Bean, 1982). But why engage in vocal behavior bereft of communication? Why do babies babble and produce monologues in cribs (e.g., Weir, 1962)? Why do immature birds warble to themselves not only in laboratory isolation boxes (Marler, 1970) but also in nature (Baptista, 1983)? Why do Grey parrots practice privately before publicly emitting perfect English utterances (J. M. Baldwin, 1914; Pepperberg et al., 1991)? Although Alex’s labels usually appear in sessions initially as rudimentary patterns—first a vocal contour, then with vowels, finally with consonants (Patterson and Pepperberg, 1994, 1998)—completely formed new labels sometimes materialize after minimal training and without overt practice. Moreover, outside of sessions Alex often recombines labels or label parts in their corresponding orders (e.g., keeps appropriate label beginnings and endings); these innovations quickly become part of his repertoire if we provide acceptable corresponding objects (referential mapping: see below; see also Pepperberg, 1990). Such performance may be integral to development and, because it occurs across species, suggests an evolutionary theory of language play (Kuczaj, 1998). Reviewing childhood phenomena puts avian data in perspective. Monologue speech play permits practice that, like most play, facilitates learning by allowing experimentation with adult systems without consequences of failure. Unlike other vocal practice, monologues allow complete freedom to choose topics and contexts, attempt novel forms, and compare familiar and novel forms (review in Pepperberg, 1999). 
Development of some speakers being trained in adult usage through formal, programmed routines may suffer if practice is public—for example, the developmentally disabled (Fey, 1986), second-language learners (Krashen, 1976), or normal preverbal children being hurried to attain competence (Rogoff, 1990). Such speakers face consequences of incorrectly communicating, may be interrupted and corrected, or may be committed to possibly boring drills; conceivably, these negative consequences inhibit further practice, and hence delay development (e.g., Koegel et al., 1987; Krashen, 1976; Kuczaj, 1983; Rice, 1991; Salmon et al., 1998). Whether caretakers who recast children’s errors or expand elementary attempts at communication help or hinder development is unclear (e.g., Bloom et al., 1998; Bohannon et al., 1996; Morgan, 1996; K. E. Nelson et al., 1995). Even in ostensibly unstructured dialogues with caretakers, constraints of constructing replies with appropriate word meaning (semantics), grammar (syntax), and functional use (pragmatics) could be inhibitory (Feldman, 1989). Lack of negative feedback or the need to use correct forms in monologues might encourage practice and accelerate learning. If monologues enable children to integrate what they have recently heard, practice new forms, and thus progress toward adult communication, how might monologues work for parrots, who are not learning human language but are acquiring functional human speech?
As noted above, birds practice vocally: Even laboratory-raised, isolate oscines learning songs from audiotapes engage in solitary vocal practice (monologue-like routines) before developing adult forms (Hultsch, 1990; Marler and Peters, 1982). These birds, like their wild compatriots, may practice order of notes in songs and order of songs if they have a repertoire, recombine elements from different tutors if given multiple input sources, and practice different songs in different contexts (e.g., under different lighting conditions). If not reared as isolates, they may change song types practiced after noting how different songs affect other individuals (Hultsch, 1990; King et al., 1996; Kroodsma, 1988; Lemon, 1975; Margoliash et al., 1994; D. E. Nelson, 1992; Nordby et al., 2001; V. A. Smith et al., 2000). Parrots behave similarly when learning natural vocalizations (Nottebohm, 1970) or mimetic speech (J. M. Baldwin, 1914; Todt, 1975b). How parrots use this behavior in nature is unknown, but it might parallel human practice of syntax, semantics, and pragmatics (West and King, 1985). Interestingly, Alex demonstrated certain parallels with children’s practice in monologues, particularly for acquisition of “none,” “nail,” and “bread” (Pepperberg et al., 1991). Alex, much like children, practiced “none” in consequence-free monologues for weeks before using it socially. Had he transferred practice to social speech, as children would, he would have actively modified the existent label “one” or socially attempted “none” (e.g., used “nnnn-won,” “nnnnnn,” or some other permutation) with trainers. Instead, his first social try at “none” (in session) was “one.” He used “one,” not “none,” in training for another week, then used “one” and “none” interchangeably for four more weeks before using “none” with trainers.
Alex’s behavior differed for “nail.” Little monologue practice of “nail” existed prior to a social attempt; however, immediately after uttering “banail” socially, he began private practice, modifying this utterance toward “nail.” Private vocal strings (e.g., “banail chail mail”) were akin to those in Weir’s (1962) son’s monologues. Alex practiced privately for 10 days before socially producing “nail” reliably. After six more weeks he used “nail,” not variants, in monologues, producing “nail” more frequently than existent labels and as often as other novel vocalizations. Practice of “bread” paralleled that for “none” and “nail.” As with “none,” Alex practiced intermediate monologue forms (“braa”, “graed”), but for only two days before using them socially. As with “nail,” practice sessions were long, but for “bread” he modified intonation more than phonology. Unlike “nail,” which quickly matched standard English, “bread” stabilized halfway between “braa” and “graed.” Decreased interest in bread apparently affected motivation for acquiring its label; with muffins as the referent, however, “bread” improved quickly. Alex’s private practice also contained advanced behavior observed in children’s crib speech (Dore, 1987): private “dialogues” rather than monologues. One child, at 22–36
months old, did not reproduce caretakers’ utterances exactly, but re-created scenarios from interpersonal interactions. Although Alex did not integrate exact context or form of training dialogues into monologues, in both private and social-context speech he occasionally used questions and answers in one utterance set (e.g., “What’s same?” and “none” sequentially). Alex’s exposure to adult language, however, did not match the child’s, and he was not at the linguistic level where children employ private “dialogue” in monologue speech. Alex did reconstruct and reinvent scenarios not involved in formal training. Monologues included utterances from daily routines (e.g., “You go gym,” “Want some water”) and strings involving often heard patterns (e.g., “You be good, gonna go eat lunch, I’ll be back tomorrow”). Question-answer dialogues (e.g., “Snap, snap, snap” “How many?” “Three”) also emerged. Interestingly, Kyaaro, who was far less linguistically sophisticated than Alex, used entire dialogues in solitude, reproducing two different trainers’ voices, as well as his own and synthesizer sounds: for example, “Listen, Kyo” (trainer 1 voice). “Click, click, click, click” (synthesizer). “How many?” (trainer 2 voice). “Four” (his voice). “Good boy!!!” (trainer 1 or 2). Overall, Alex’s monologues were less like those of children deeply involved in sound play and more like those of children switching from sound play to private “dialogue” and social-context speech. He often, but not exclusively, used private, consequence-free contexts to practice vocalizations first introduced in sessions that carried consequences for errors. Percentage of sound play in monologues was small, as is true for children who have similarly acquired fairly large repertoires but not adult competence. And, like children in early stages of language acquisition, he began to reproduce training scenarios (e.g., imaginary “dialogues” with trainers) in monologues.
Such human-avian comparisons may exist for other birds (Pepperberg et al., 1991). Not only Alex, but also various songbirds, engage in seemingly comparable social-context monologues. Juvenile white-crowned sparrows and Bewick’s wrens may produce adult versions of song in the presence of adult conspecifics before adult functions of these songs could reasonably be used (Baptista, 1983; Kroodsma, 1974). How this behavior functions in nature is unknown. Given that, as noted above, many oscine birds practice and refine note order of their songs, as well as the structure of these notes and the order of their songs (e.g., Todt and Hultsch, 1998), and that for some birds note order affects meaning (see Pepperberg and Schinke-Llano, 1991), purported roles of monologues in human language acquisition (e.g., to analyze and refine syntax; Craig and Gallagher, 1979; Kuczaj, 1983) may provide a framework for examining avian practice (e.g., Kroodsma and Pickert, 1984b). Alex also produced new potential labels by combining parts of existing utterances. Parallels may exist with young children’s phonemic (re)combinations: Both normal and language-impaired children more likely spontaneously produce utterances combining existent
rather than new phonemes (Baddeley et al., 1998; Leonard et al., 1983; review in Pepperberg, 1999), and these productions often correlate little with comprehension levels. Alex’s spontaneous novel phonemic combinations often occurred socially outside of testing and training (Pepperberg, 1990); our juvenile Greys behave similarly (e.g., Neal, 1996). Such utterances thus appear in contexts reminiscent of children’s play (see above). These vocalizations were rarely, if ever, used by trainers, but resembled both existing labels and separate human vocalizations (e.g., “grain” from “grey”). Here trainers gave Alex seed (not normally available) and talked about and identified “grain” for one another; later we substituted sprouted legumes. Alex received a ring of paper clips for “chain,” the appropriate fruit for “grape,” and wire mesh (later a nutmeg grater) for trimming his beak after he uttered “grate.” “Cup” (from “up”) was mapped to metal cups and plastic mugs, “copper” (first produced as “cupper”) to pennies, and “block” to cubical wooden beads. “Chalk” (from “talk”) was mapped to variously colored blackboard chalk; “truck,” to toy cars and trucks. Thus, when we referentially mapped spontaneous utterances, Alex rapidly integrated these labels into his repertoire, using them routinely to identify or request appropriate items. We ignored “cane,” “shane,” and “cheenut”; he abandoned these utterances (Pepperberg, 1990). Importantly, Alex consistently combined label pieces in ways suggesting that he abstracted rules for utterances’ beginnings and endings. After analyzing over 22,000 English vocalizations, we never observed “backward” combinations (e.g., “percup” rather than “cupper”; Pepperberg et al., 1991), although our transcriptions are subjective. 
His behavior thus suggests (but cannot prove) that he parses human sound streams in humanlike ways, acoustically represents labels as humans do, and has similar phonetic categories (Patterson and Pepperberg, 1994, 1998); it is also consistent with top-down processing similar to humans’ (Ladefoged, 1982). Such behavior is unlikely to have arisen from instruction, suggesting a cognitive architecture analogous to that of humans. In sum, Alex’s vocal practice may have natural origins and demonstrate some parallels with young children’s behavior. Vocal play appears to be relatively widespread, and may be involved in acquiring many forms of communication (e.g., Kuczaj, 1998; Petitto and Marentette, 1991).
Mechanisms of Production
Of evolutionary interest, too, is how Grey parrots actually produce speech. Some researchers (e.g., Greenewalt, 1968; Lieberman, 1991) argue that birds are incapable of producing true speech, that avian “speech” depends on a bird’s using each half of its syrinx independently to produce a different sinusoidal, pure tone; each “tone is present at the formant frequencies of the original human speech sound that the bird is mimicking. The sinusoids are, in addition, interrupted at the rate of the fundamental frequency of the mimicked
human speech. . . . We perceive these nonspeech signals as speech because they have energy at the formant frequencies” (Lieberman, 1984, p. 156). This may be true for songbirds, but not for Greys. X-ray videotapes and acoustic analyses of Alex’s speech (Patterson and Pepperberg, 1994, 1998; Warren et al., 1996, and reviews of previous studies therein) show that, unlike songbirds, Alex has a single syringeal mechanism (not two independent halves), and that he uses suprasyringeal structures (his beak, tongue, larynx, glottis, etc.) to produce speech that we acoustically categorize (e.g., by formant structure, voice-onset timing) as equivalent to that of humans. We analyzed his vowel and stop consonant production. Comparisons of Alex’s vowel parameters with mine (Patterson and Pepperberg, 1994) demonstrated both differences (e.g., absolute values of first formant frequencies) and similarities (e.g., general values of second formant frequencies, separation of vowels into back and front categories with respect to tongue placement) in acoustic properties of avian and human speech. We also found similarities and differences in articulatory mechanisms: Parrots, for example, use their tongues in some but not all ways used by humans to produce vowels. Nevertheless, Alex’s sonagrams resemble those of humans (figure 10.1). He uses a two-tube system and frequency modulation the way humans do, but exactly reproduces neither human articulatory motions nor acoustic idiosyncrasies; his articulatory/acoustic constructs derive from his anatomy. Interestingly, budgerigars (Melopsittacus undulatus), unlike Greys and humans, produce human speech primarily via amplitude modulation (Banta Lavenex, 1999). Of equal importance is that Alex’s consonants resemble those of humans, even though many human consonants depend on teeth and lips, structures that parrots lack. 
Patterson and Pepperberg (1998) found similarities (category distinctions) and differences (predictive power of measures related to F1, F2; coherence of voicing/place subcategories) in Alex’s and human speech. For consonants, his F2 varies somewhat more than mine; he may use his esophagus to produce /p/ and /b/. His stops, however, exhibit voiced/voiceless, labial, alveolar, and velar groupings; thus such distinctions may be basic to vertebrates, not only to mammals. Alex’s data also suggest a phenomenon often considered uniquely human: anticipatory coarticulation, which occurs when a phoneme is produced so as to configure the vocal tract for a subsequent sound. Observations (X-ray videos, Warren, 1995; tracings of SVHS video stills of birds’ utterances, Patterson and Pepperberg, 1998) are consistent with anticipatory coarticulation. Beak openness differs during the burst of /k/ for “key” /ki/ versus “cork” /kork/; birds also protract their tracheas before an /f/ only when it is followed by /or/ (“four”), even if interrupted before finishing. Moreover, Alex’s VOT (voice-onset timing) for stop consonants is significantly affected by the identity of the following phoneme (Patterson and Pepperberg, 1998), which implies VOT adjustment in preparation for this phoneme.
Figure 10.1 Sonograms of Alex producing “pea” /pi/ and “pah” /pah/ for “pasta”
These data suggest, but do not prove, intentional use of preparatory strategies or top-down processing. Even researchers studying humans debate this issue. Ladefoged (1982) treats such behavior as evidence of top-down processing because the phenomenon apparently presupposes knowledge of sounds before they are produced, but Repp (1986, p. 1618) suggests that “Some forms of coarticulation are an indication of advanced speech production skills whereas others may be a sign of articulatory immaturity, and yet others are neither because they simply cannot be avoided.” Alex’s vocal tract is mature, but his anatomy might make anticipatory coarticulation unavoidable. Thus, whether Alex’s behavior reflects intention or simply anatomical constraints cannot be determined until we know more about his vocal mechanisms.
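For readers who want a concrete picture of the pure-tone account quoted earlier in this section from Lieberman (1984), the following minimal Python sketch builds a signal of the kind that account describes: two sinusoids placed at formant-like frequencies and switched on and off at the rate of a fundamental frequency. It is an illustration only, not an analysis from any of the studies cited here. The function name, the parameter names, and all numeric values (frequencies, sample rate, duty cycle) are hypothetical placeholders rather than measurements.

```python
import math


def gated_two_tone(f1_hz, f2_hz, f0_hz, duration_s=0.05, sample_rate=16000, duty=0.5):
    """Return samples of two summed pure tones interrupted at the rate f0_hz.

    Hypothetical illustration: f1_hz and f2_hz stand in for formant
    frequencies, f0_hz for the fundamental frequency of the mimicked speech.
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        # The gate is open for the first `duty` fraction of each fundamental period.
        gate = 1.0 if (t * f0_hz) % 1.0 < duty else 0.0
        # Two equal-amplitude sinusoids, one per assumed formant frequency.
        tone = 0.5 * math.sin(2 * math.pi * f1_hz * t) + 0.5 * math.sin(2 * math.pi * f2_hz * t)
        samples.append(gate * tone)
    return samples


# Placeholder values chosen only to make the example run; they are not data.
signal = gated_two_tone(f1_hz=300.0, f2_hz=2300.0, f0_hz=125.0)
```

A signal of this kind carries energy only at the two chosen frequencies, which is the sense in which the quoted account treats avian "speech" as a nonspeech signal. The findings summarized above for Alex (use of suprasyringeal structures and frequency modulation) describe a different production mechanism from the one sketched here.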
Models for the Evolution of Communication
I have described Grey parrots’ cognitive and communicative abilities; input needed to engender their referential, allospecific vocal learning; and parallels with humans in acquisition and use of learned vocalizations. These parallels were drawn to demonstrate birds’ importance as models for studying the evolution of complex communication, particularly in the vocal mode. I suggested that a parrot’s capacity to learn what I teach in the laboratory must be based on an existent cognitive architecture, but never intimated why this cognitive architecture should exist, or, from an evolutionary standpoint, what selection pressures may have shaped such an architecture, and hence Grey parrot behavior. Such hesitancy comes from my preference for proposing testable hypotheses; currently, I see no way to design rigorous tests. Parallels can be drawn to other creatures and theories, but resulting correlations are still untestable; furthermore, correlation does not establish causation. Given those strictures, however, speculation is possible, if only to compare theories about avian communication (and possibly intelligence) with those for other animals (Pepperberg, 1999). A starting point is the work of W. J. Smith (1997, p. 31), who suggests that communication is best understood as being “forged in the linking of different individuals,” and that by knowing both the information a recipient receives from a signal and its response to that information, we can derive its processing abilities. Of critical importance is Smith’s argument that both content and context (the source and circumstances surrounding a signal’s emission) are part and parcel of what recipients must process. Also, the result is a weighting of input before a reaction is produced; this weighting shows that processing has occurred, and provides details of how information hierarchies are formed. 
This argument suggests an additional contingency, hinted at earlier, concerning intelligence as a possible prerequisite for complex communicative abilities. Possibly intelligence was an evolutionary outcome of the need not only for memory and flexibility, but also for choosing what to ignore as well as what to process. Social primates must actively make such choices; animals unable to form information hierarchies would be unable to act. But parrots, unlike most animals, provide a clear observational case (by what is and is not vocally acquired) for what they treat as viable or nonviable input (Pepperberg, 1999). Specifically, wild Greys attend and react to a variety of inputs and could in principle reproduce environmental noises (e.g., sounds of a nearby stream) as well as conspecific and allospecific vocalizations, but in practice reproduce only the latter two (e.g., Cruickshank et al., 1993; May, 1995/1997). They may locate a stream by its sound, but rank the sound low with respect to any need for its reproduction (Pepperberg, 1999); some evolutionary pressure likely selected for such behavior. This choice of reproduction provides measurable evidence for hierarchical learning (determining what is important to learn and communicate) and cognitive processing, a biologically relevant ability (Balda et al., 1996). I will not belabor this point; such observations and correlations are not rigorously testable. Moreover, evolutionary pressures likely affect different communicative and cognitive abilities differently for different species in different habitats; also, neither intelligence nor communication is a unitary “thing” (Byrne, 1995). The data merely suggest that the combination of intelligence and advanced communication skills may have arisen not only in primates or even mammals, but also in birds, and that it directs not only learning processes but also what is appropriate to learn and communicate. Moreover, because what birds (and not only Greys) learn and communicate may involve allospecific codes, evolution of complex communication likely involves considerable but directed plasticity. A white-crowned sparrow, for example, will almost effortlessly learn its own song and reject other songs even in the most impoverished conditions (Petrinovich, 1985), but will learn, from live tutors, allospecific songs effective in eliciting responses of allospecific neighbors (Baptista and Catchpole, 1989), which suggests an underlying cognitive architecture for complex vocal acquisition.
Another issue, also speculative but related to human communicative learning, involves referential mapping (described above). Such emergent behavior can be fostered in humans and, apparently, in parrots. Both children’s and parrots’ spontaneous utterances, initially lacking language value (reference, intentionality), may acquire such value if caretakers interpret the utterances as meaningful and intentional, and subjects react positively to the interpretation (e.g., Furrow and Nelson, 1986; Lock, 1980; Pepperberg, 1990; Snow, 1979; Veneziano, 1988; cf. Gleitman et al., 1984). Repeated interactions then “conventionalize” the sounds (phonemic patterns) and sound–meaning connection (referential aspects) toward standard communication. 
Children may use incompletely understood terms to get adults to clarify exact meanings (Blank, 1974; Leonard et al., 1983), a strategy related to Brown’s (1958) “original word game,” in which children designate an object and caretakers provide the object label and correct children’s production (K. Nelson, 1996). My parrots also use humans to provide referential information for relatively novel labels. They see humans in this context during training, then build on the situation: They take a label used in a very specific context, such as “wool” for a woolen pom-pom, and pull at a trainer’s sweater while uttering that label. The likelihood of such an action happening by chance is slim; by then the bird usually has at least three to four other labels. Birds, like children (Brown, 1973), seem to be testing the situation. And our responses, of high affect and excitement, stimulate birds further, showing the power of their utterances and encouraging early categorization attempts. Even if birds err in initial categorizations, they are reinforced because we provide a correct, new label for the item: We state that an almond isn’t “cork,” but “cork nut” (Pepperberg, 1999). I suggest that parrots, like children, have a repertoire of desires and purposes driving them to form and test emerging ideas in dealing
with the world; these ideas may be the first stages of representation (categorization) in cognitive processing. In sum, I argue for examining many species for information on evolutionary pressures that helped shape existent systems. Such pressures were exerted not only on primates; hence the existence of analogous avian complex communication systems and their bases in analogous neural architectures. Moreover, complex communicative systems apparently require, or likely coevolved with, complex cognition: Although communication is functionally social, its complexity is based on the complexity of information communicated, processed, and received; thus contingencies that shape intelligence (social, ecological, etc.) likely shape communication. If intelligence is indeed a correlate of primates’ complicated social systems and long lives, and thus the outcome of selection processes favoring animals that flexibly transfer skills across distinct domains (Rozin, 1976) and remember and act upon knowledge of detailed intragroup social relations (Jolly, 1966; Humphrey, 1976), then these patterns might also drive parrot cognition and communication. Long-lived birds existing in complex social systems, not unlike those of some primates, may use abilities honed for social gains to direct other forms of information processing and vocal learning. Add the need for categorical classes (e.g., to distinguish neutral stimuli from predators), abilities both to recognize and to remember environmental regularities and to adapt to unpredictable environmental changes over extensive lifetimes, and a communication system that is primarily vocal; then parrots’ capacities are not surprising (Pepperberg, 1999). Marler (1996) has proposed similar parallels between birds and primates, although not specifically for parrots. 
Only by looking for essential commonalities across many species can we develop theories about behavioral elements that are essential to, and evolutionary pressures that have shaped, complex communication. Whether avian and human abilities evolved convergently (that is, whether similar adaptive responses evolved independently in association with similar environmental pressures) is unclear, but a common core of skills likely underlies complex cognitive and communicative behavior across species, even if specific skills manifest somewhat differently in each species.
Acknowledgment
This chapter was written with the support of the MIT Media Lab. Research was supported by grants from the National Science Foundation, the Harry Frank Guggenheim Foundation, the Pet Care Trust, the Kenneth Scott Charitable Trust, and donors to The Alex Foundation.
References
Baddeley A, Gathercole S, Papagno C (1998) The phonological loop as a language learning device. Psych Rev 105: 158–173. Balda RP, Kamil AC, Bednekoff PA (1996) Predicting cognitive capacity from natural history. In: Current Ornithology, 13 (Nolan V Jr, Ketterson ED, eds.), 33–66. New York: Plenum Press. Balda RP, Pepperberg IM, Kamil AC (eds.) (1998) Animal Cognition in Nature. London: Academic Press. Baldwin DA (1995) Understanding the link between joint attention and language. In: Joint Attention (Moore C, Dunham PJ, eds.), 131–158. Hillsdale, N.J.: Lawrence Erlbaum. Baldwin JM (1914) Deferred imitation in West African grey parrots. Proc IXth Intl Cong. Zool., 536. Bandura A (1971) Analysis of social modeling processes. In: Psychological Modeling (Bandura A, ed.), 1–62. Chicago: Aldine-Atherton. Baptista LF (1983) Song learning. In: Perspectives in Ornithology (Brush AH, Clark GA Jr, eds.), 500–506. Cambridge: Cambridge University Press. Baptista LF, Catchpole CK (1989) Vocal mimicry and interspecific aggression in songbirds: Experiments using white-crowned sparrow imitation of song sparrow song. Behaviour 109: 247–257. Baptista LF, Petrinovich L (1984) Social interaction, sensitive phases, and the song template hypothesis in the white-crowned sparrows. Anim Behav 32: 172–181. Baptista LF, Petrinovich L (1986) Song development in the white-crowned sparrow: Social factors and sex differences. Anim Behav 34: 1359–1371. Bednarz JC (1988) Cooperative hunting in Harris’ hawks (Parabuteo unicinctus). Science 239: 1525–1527. Bickerton D (1990) Language and Species. Chicago: University of Chicago Press. Blank M (1974) Cognitive function of language in the preschool years. Devel Psych 10: 229–245. Bloom L, Margulis C, Tinker E, Fujita N (1998) Early conversations and word learning: Contributions from child and adult. Child Devel 67: 3154–3175. Boesch C (1994) Cooperative hunting in wild chimpanzees. Anim Behav 48: 653–667. 
Bohannon JN III, Padgett RJ, Nelson KE, Melvin M (1996) Useful evidence on negative evidence. Devel Psych 32: 551–555. Bowmaker JK, Heath LA, Das D, Hunt DM (1994) Spectral sensitivity and opsin structure of avian rod and cone visual pigments. Invest Ophthal Vis Sci 35: 1708. Bowmaker JK, Heath LA, Wilkie SE, Das D, Hunt DM (1996) Middle-wave cone and rod visual pigments in birds: Spectral sensitivity and opsin structure. Invest Ophthal Vis Sci 37: S804. Brown R (1958) Words and Things. New York: Free Press. Brown R (1973) A First Language: The Early Stages. Cambridge, Mass.: Harvard University Press. Byers BE, Kroodsma DE (1992) Development of two song categories by chestnut-sided warblers. Anim Behav 44: 799–810. Byrne R (1995) The Thinking Ape: Evolutionary Origins of Intelligence. Oxford: Oxford University Press. Cheney DL, Seyfarth RM (1990) Precis of “How monkeys see the world.” Behav Brain Sci 15: 135–182. Cheney DL, Seyfarth RM (1999) Recognition of other individuals’ social relationships by female baboons. Anim Behav 58: 67–75. Corder-Bolz CR, O’Bryant S (1978) Teacher vs. program. J Commun 28: 97–103. Craig HK, Gallagher TM (1979) The structural characteristics of monologues in the speech of normal children: Syntactic nonconversational aspects. J Speech Hearing Res 22: 46–62. Cruickshank AJ, Gautier J-P, Chappuis C (1993) Vocal mimicry in wild African Grey Parrots Psittacus erithacus. Ibis 135: 293–299.
Deacon TW (1997) The Symbolic Species: The Co-evolution of Language and the Brain. New York: Norton. Dore J (1987) An analysis of crib monologues from 22 months to three years. Paper presented at the 12th Annual Boston University Conference on Language Development, October. Evans DL (1987) Dolphins as beaters for gulls? Bird Behav 7: 47–48. Feldman CF (1989) Monologue as problem-solving narrative. In: Narratives from the Crib (Nelson K, ed.), 98–119. Cambridge, Mass.: Harvard University Press. Fernandez J (1982) Bwiti: An Ethnography of the Religious Imagination in Africa. Princeton, N.J.: Princeton University Press. Fey ME (1986) Language Intervention with Young Children. San Diego: College-Hill Press. Forestell PH, Herman LM (1988) Delayed matching of visual materials by a bottlenosed dolphin aided by auditory symbols. Anim Learn Behav 16: 137–146. Furrow D, Nelson K (1986) A further look at the motherese hypothesis: A reply to Gleitman, Newport, and Gleitman. J Child Lang 13: 163–176. Fuson KC (1979) The development of self-regulating aspects of speech: A review. In: Development of Self-Regulation Through Private Speech (Zivin G, ed.), 135–217. New York: Wiley. Gardner RA, Gardner BT (1978) Comparative psychology and language acquisition. Ann NY Acad Sci 309: 37–76. Gleitman LR, Newport EL, Gleitman H (1984) The current status of the motherese hypothesis. J Child Lang 11: 43–79. Gnam R (1988) Preliminary results on the breeding biology of Bahama amazon. Parrot Letter 1: 23–26. Gottfried GM, Tonks JM (1996) Specifying the relation between novel and known: Input affects the acquisition of novel color terms. Child Devel 67: 850–866. Greenewalt CH (1968) Bird Song: Acoustics and Physiology. Washington, D.C.: Smithsonian Institution Press. Greenfield PM (1978) Developmental processes in the language learning of child and chimp. Behav Brain Sci 4: 573–574. Hockett C (1959) Animal “languages” and human language. Hum Biol 31: 32–39. 
Hollich GJ, Hirsh-Pasek K, Golinkoff RM (2000) Breaking the language barrier: An emergentist coalition model for the origins of word learning. Monog Soc Res Child Devel 262: 1–138. Hulse SH, Humpal J, Cynx JA (1984) Processing of rhythmic sound structures by birds. Ann NY Acad Sci 423: 407–419. Hultsch H (1990) Recombination of acquired songs as a correlate of package formation. In: Brain—Perception, Cognition (Elsner N, Roth G, eds.), 433. Stuttgart: Thieme Verlag. Humphrey NK (1976) The social function of intellect. In: Growing Points in Ethology (Bateson PPG, Hinde RA, eds.), 303–317. Cambridge: Cambridge University Press. Ikebuchi MK, Okanoya K (1999) Male zebra finches and Bengalese finches emit directed songs to the video images of conspecific females projected onto a TFT display. Zool Sci 16: 63–70. Jarvis ED, Mello CV (2000) Molecular mapping of brain areas involved in parrot vocal communication. J Comp Neurol 419: 1–31. Jolly A (1966) Lemur social behavior and primate intelligence. Science 153: 501–506. King AS, Freeberg TM, West MJ (1996) Social experience affects the process and outcome of vocal ontogeny in two populations of cowbirds (Molothrus ater). J Comp Psych 110: 276–285. Koegel RL, Dyer K, Bell LK (1987) The influence of child-preferred activities on autistic children’s social behavior. J Appl Behav Anal 20: 243–252. Krashen SD (1976) Formal and informal linguistic environments in language learning and language acquisition. TESOL Quart 10: 157–168. Kroodsma DE (1974) Song learning, dialects, and dispersal in the Bewick’s wren. Zeit Tierpsych 35: 352–380.
Kroodsma DE (1988) Song types and their use: Developmental flexibility of the male blue-winged warbler. Ethology 79: 235–247. Kroodsma DE, Miller EH (eds.) (1996) Ecology and Evolution of Acoustic Communication in Birds. Ithaca, N.Y.: Cornell University Press. Kroodsma DE, Pickert R (1984a) Repertoire size, auditory templates, and selective vocal learning in songbirds. Anim Behav 32: 395–399. Kroodsma DE, Pickert R (1984b) Sensitive phases for song learning: Effects of social interaction and individual variation. Anim Behav 32: 389–394. Kuczaj SA (1983) Crib Speech and Language Play. New York: Springer-Verlag. Kuczaj SA (1998) Is an evolutionary theory of language play possible? C Psych, Cognit 17: 135–154. Kuczaj SA, Bean A (1982) The development of non-communicative speech systems. In: Language Development: Language, Thought, and Culture (Kuczaj SA, ed.), 279–300. Hillsdale, N.J.: Lawrence Erlbaum. Ladefoged P (1982) A Course in Phonetics. San Diego: Harcourt Brace Jovanovich. Lea SEG, Dittrich WH (1999) What do birds see in moving video images? C Psych, Cognit 18: 765–803. Lemish D, Rice ML (1986) Television as a talking picture book: A prop for language acquisition. J Child Lang 13: 251–274. Lemon RE (1975) How birds develop song dialects. Condor 77: 385–406. Leonard L, Chapman K, Rowan L, Weiss A (1983) Three hypotheses concerning young children’s imitations of lexical items. Devel Psych 19: 591–601. Lesser GS (1974) Children and Television: Lessons from Sesame Street. New York: Random House. Levinson ST (1980) The social behavior of the white-fronted Amazon (Amazona albifrons). In: Conservation of New World Parrots (Pasquier RF, ed.), 403–417. ICBP Technical Publication no. 1. Washington, D.C.: Smithsonian Institution Press. Lieberman P (1984) The Biology and Evolution of Language. Cambridge, Mass.: Harvard University Press. Lieberman P (1991) Uniquely Human: The Evolution of Speech, Thought, and Selfless Behavior. Cambridge, Mass.: Harvard University Press. 
Lieberman P (2000) Human Language and Our Reptilian Brain. Cambridge, Mass.: Harvard University Press. Liittschwager JC, Markman EM (1994) Sixteen- and 24-month olds’ use of mutual exclusivity as a default assumption in second-label learning. Devel Psych 30: 955–968. Lock A (1980) The Guided Reinvention of Language. London: Academic Press. Macnamara J (1982) Names for Things: A Study of Human Learning. Cambridge, Mass.: MIT Press. Margoliash D, Staicer CA, Inoue SA (1994) Stereotyped and plastic song in adult indigo buntings, Passerina cyanea. Anim Behav 42: 367–388. Markman EM (1990) Constraints children place on word meaning. Cognit Sci 14: 57–77. Markman EM, Wachtel GF (1988) Children’s use of mutual exclusivity to constrain the meanings of words. Cognit Psych 20: 121–157. Marler P (1970) A comparative approach to vocal learning: Song development in white-crowned sparrows. J Comp Physiol Psych 71: 1–25. Marler P (1973) Speech development and bird song: Are there any parallels? In: Communication, Language, and Meaning (Miller GA, ed.), 73–83. New York: Basic Books. Marler P (1996) Social cognition: Are primates smarter than birds? In: Current Ornithology, 13 (Nolan V Jr, Ketterson ED, eds.), 1–32. New York: Plenum. Marler P, Peters S (1982) Subsong and plastic song: Their role in the vocal learning process. In: Acoustic Communication in Birds, vol. 2; Song Learning and Its Consequences (Kroodsma DE, Miller EH, eds.), 25–50. New York: Academic Press. May D (1995/1997) Studies of Grey Parrots in nature. Unpublished raw data.
McFarland WL, Morgane PJ (1966) Neurological, cardiovascular, and respiratory adaptations in the dolphin, Tursiops truncatus. Proc Annual Convention of the Amer. Psychol. Assoc: 167–168. Merriman WE (1991) The mutual exclusivity bias in children’s word learning: A reply to Woodward and Markman. Devel Rev 11: 164–191. Miles HL (1983) Apes and language. In: Language in Primates (de Luce J, Wilder HT, eds.), 43–61. New York: Springer-Verlag. Morgan JL (1996) Finding relations between input and outcome in language acquisition. Devel Psych 32: 556–559. Morgane PJ, Jacobs MS, Galaburda A (1986) Evolutionary morphology of the dolphin brain. In: Dolphin Cognition and Behavior: A Comparative Approach (Schusterman RJ, Thomas JA, Woods FG, eds.), 5–29. Hillsdale, N.J.: Lawrence Erlbaum. Mowrer OH (1950) Learning Theory and Personality Dynamics. New York: Ronald Press. Neal KB (1996) The development of a vocalization in an African grey parrot (Psittacus erithacus). Unpublished senior thesis, University of Arizona. Nelson DE (1992) Song overproduction and selective attrition lead to song sharing in the field sparrow. Behav Ecol Sociobiol 30: 415–424. Nelson K (1989) Monologues in the crib. In: Narratives from the Crib (Nelson K, ed.), 1–23. Cambridge, Mass.: Harvard University Press. Nelson K (1996) Language in Cognitive Development: The Emergence of the Mediated Mind. Cambridge: Cambridge University Press. Nelson KE, Welsh J, Camarata SM, Butkovsky L, Camarata M (1995) Available input for language-impaired children and younger children of matched language levels. First Lang 15: 1–17. Nordby JC, Campbell SE, Beecher MD (2001) Late song learning in song sparrows. Anim Behav 61: 835–846. Nottebohm F (1970) Ontogeny of bird song. Science 167: 950–956. Nottebohm F (1980) Brain pathways for vocal learning in birds: A review of the first ten years. Prog Psychobiol Physiol Psych 9: 85–124. Patterson DK, Pepperberg IM (1994) A comparative study of human and grey parrot phonation: I. 
Acoustic and articulatory correlates of vowels. J Acous Soc Amer 96: 634–648. Patterson DK, Pepperberg IM (1998) A comparative study of human and Grey parrot phonation: II. Acoustic and articulatory correlates of stop consonants. J Acous Soc Amer 103: 2197–2213. Peake TM, Terry AMR, McGregor PK, Dabelsteen T (2001) Male great tits eavesdrop on simulated male–male interactions. Proc Roy Soc London B268: 1183–1187. Pepperberg IM (1981) Functional vocalizations by an African Grey parrot (Psittacus erithacus). Zeit Tierpsych 55: 139–160. Pepperberg IM (1985) Social modeling theory: A possible framework for avian vocal learning. Auk 102: 854–864. Pepperberg IM (1988) An interactive modeling technique for acquisition of communication skills: Separation of “labeling” and “requesting” in a psittacine subject. Appl Psycholing 9: 59–76. Pepperberg IM (1990) Referential mapping: Attaching functional significance to the innovative utterances of an African Grey parrot. Appl Psycholing 11: 23–44. Pepperberg IM (1994) Vocal learning in African Grey parrots: Effects of social interaction. Auk 111: 300–313. Pepperberg IM (1996) Categorical class formation by an African Grey parrot (Psittacus erithacus). In: Stimulus Class Formation in Humans and Animals (Zentall TR, Smeets PR, eds.), 71–90. Amsterdam: Elsevier. Pepperberg IM (1997) Social influences on the acquisition of human-based codes in parrots and nonhuman primates. In: Social Influences on Vocal Development (Snowdon CT, Hausberger M, eds.), 157–177. Cambridge: Cambridge University Press. Pepperberg IM (1999) The Alex Studies. Cambridge, Mass.: Harvard University Press.
Pepperberg IM, Brese KJ, Harris BJ (1991) Solitary sound play during acquisition of English vocalizations by an African Grey parrot: Possible parallels with children’s monologue speech. Appl Psycholing 12: 151–177. Pepperberg IM, Brezinsky MV (1991) Relational learning by an African Grey parrot: Discriminations based on relative size. J Comp Psych 105: 286–294. Pepperberg IM, Gardiner LI, Luttrell LJ (1999) Limited contextual vocal learning in the grey parrot: The effect of co-viewers on videotaped instruction. J Comp Psych 113: 158–172. Pepperberg IM, McLaughlin MA (1996) Effect of avian–human joint attention on allospecific vocal learning by grey parrots. J Comp Psych 110: 286–297. Pepperberg IM, Naughton JR, Banta PA (1998) Allospecific vocal learning by Grey parrots (Psittacus erithacus): A failure of videotaped instruction under certain conditions. Behav Proc 42: 139–158. Pepperberg IM, Sandefer RM, Noel D, Ellsworth CP (2000) Vocal learning in the grey parrot: Effect of species identity and number of trainers. J Comp Psych 114: 371–380. Pepperberg IM, Schinke-Llano L (1991) Language acquisition and use in a bilingual environment: A framework for studying birdsong in zones of sympatry. Ethology 89: 1–28. Pepperberg IM, Sherman D (2000) Proposed use of two-part interactive modeling as a means to increase functional skills in children with a variety of disabilities. Teach Learn Med 12: 213–220. Pepperberg IM, Shive HA (2001) Hierarchical combinations by a grey parrot: Bottle caps, lids, and labels. J Comp Psych 115: 376–384. Pepperberg IM, Wilcox SE (2000) Evidence for a form of mutual exclusivity during label acquisition by grey parrots? J Comp Psych 114: 219–231. Pepperberg IM, Wilkes SR (2004) Lack of referential vocal learning from LCD video by Grey parrots (Psittacus erithacus). Interaction Studies 5: 75–97. Petitto LA, Marentette PF (1991) Babbling in the manual mode: Evidence for the ontogeny of language. Science 251: 1493–1496. 
Petrinovich L (1985) Factors influencing song development in white-crowned sparrows (Zonotrichia leucophrys). J Comp Psych 99: 15–29. Premack D (1978) On the abstractness of human concepts: Why it would be difficult to talk to a pigeon. In: Cognitive Processes in Animal Behavior (Hulse SH, Fowler H, Honig WK, eds.), 421–451. Hillsdale, N.J.: Lawrence Erlbaum. Ramus F, Hauser MD, Miller C, Morris D, Mehler J (2000) Language discrimination by human newborns and by cotton-top tamarin monkeys. Science 288: 349–351. Repp BH (1986) Some observations on the development of anticipatory coarticulation. J Acous Soc Amer 79: 1616–1619. Rice ML (1991) Children with specific language impairment: Toward a model of teachability. In: Biological and Behavioral Determinants of Language Development (Krasnegor NA, Rumbaugh DM, Schiefelbusch RL, Studdert-Kennedy M, eds.), 447–480. Hillsdale, N.J.: Lawrence Erlbaum. Rice ML, Huston AC, Truglio R, Wright J (1990) Words from “Sesame Street”: Learning vocabulary while viewing. Devel Psych 26: 421–428. Rogoff B (1990) Apprenticeship in Thinking. Oxford: Oxford University Press. Rozin P (1976) The evolution of intelligence and access to the cognitive unconscious. Prog Psychobiol Physiol Psych 6: 245–280. Rutledge D, Pepperberg IM (1988) Video studies of same/different. Unpublished raw data. St. Peters M, Huston AC, Wright JC (1989) Television and families: Parental coviewing and young children’s language development, social behavior, and television processing. Paper presented at Society for Research in Child Development, Kansas City, Kan., April. Salmon CM, Rowan LE, Mitchell PR (1998) Facilitating prelinguistic communication: Impact of adult prompting. Infant–Toddler Interven 8: 11–27.
Salomon G (1977) Effects of encouraging Israeli mothers to co-observe Sesame Street with their five-year olds. Child Devel 48: 1146–1151. Savage-Rumbaugh ES, Murphy J, Sevcik RA, Brakke KE, Williams SL, Rumbaugh DM (1993) Language comprehension in ape and child. Monog Soc Res Child Devel 233: 1–254. Smith VA, King AP, West MJ (2000) A role of her own: Female cowbirds, Molothrus ater, influence the development and outcome of song learning. Anim Behav 60: 599–609. Smith WJ (1997) The behavior of communicating, after twenty years. In: Perspectives in Ethology, 12 (Owings DH, Beecher MD, Thompson NS, eds.), 7–53. New York: Plenum. Snow CE (1979) The role of social interaction in language acquisition. In: Children’s Language and Communication (Collins WA, ed.), 157–182. Hillsdale, N.J.: Lawrence Erlbaum. Stoddard PK, Beecher MD, Horning CL, Campbell SE (1991) Recognition of individual neighbors by song in the song sparrow, a species with song repertoires. Behav Ecol Sociobiol 29: 211–215. Striedter G (1994) The vocal control pathways in budgerigars differ from those in songbirds. J Comp Neurol 343: 35–56. Todt D (1975a) Social learning of vocal patterns and models of their applications in Grey parrots. Zeit Tierpsych 39: 178–188. Todt D (1975b) Spontaneous recombinations of vocal patterns in parrots. Naturwissenschaften 62: 399–400. Todt D, Hultsch H (1998) Hierarchical learning, development and representation of song. In: Animal Cognition in Nature (Balda R, Pepperberg IM, Kamil AC, eds.), 275–303. London: Academic Press. Veneziano E (1988) Vocal–verbal interaction and the construction of early lexical knowledge. In: The Emergent Lexicon: The Child’s Development of a Linguistic Vocabulary (Smith MD, Locke JL, eds.), 109–147. Orlando, Fla.: Academic Press. Warren DK (1995) X-ray video of grey parrot phonation. Unpublished raw data. Warren DK, Patterson DK, Pepperberg IM (1996) Mechanisms of American English vowel production in a Grey Parrot (Psittacus erithacus). 
Auk 113: 41–58. Watkins B, Calvert S, Huston-Stein A, Wright JC (1980) Children’s recall of television material: Effects of presentation mode and adult labeling. Devel Psych 16: 672–674. Weir R (1962) Language in the Crib. The Hague: Mouton. West MJ, King AP (1985) Social guidance of vocal learning by female cowbirds: Validating its functional significance. Zeit Tierpsych 70: 225–235. Wright TF (1996) Regional dialects in the contact calls of a parrot. Proc Roy Soc London B263: 867–872. Wright TF, Dorin M (2001) Pair duets in the yellow-naped Amazon (Psittaciformes: Amazona auropalliata): Responses to playbacks of different dialects. Ethology 107: 111–124. Yamashita C (1987) Field observations and comments on the Indigo macaw (Anodorhynchus leari), a highly endangered species from northeastern Brazil. Wilson Bull 99: 280–282.
11
Cephalopod Skin Displays: From Concealment to Communication
Jennifer A. Mather
Introduction
At first glance, the complex cephalopod skin display system, which Packard (1995) describes as a neuromuscular image generator, looks like the ultimate flexible sender system for visual communication. It is matched by conspecific receivers’ high-acuity-lens eyes (Budelmann, 1994; Muntz, 1999), an excellent example of convergent evolution with the vertebrate eye. But a closer look shows that the sender–receiver match is not so simple. The skin display system apparently evolved as an avoidance communication to potential vertebrate predators (Packard, 1972). Since any sender–receiver system must match the receiver’s sensitivity (Endler, 1992, describes this as sensory drive), the sender system of the cephalopod skin evolved primarily to fit the receiver characteristics of the vertebrate and not the cephalopod visual system. Camouflage patterns are widespread across the group but intraspecies displays are much less so, and even species having them develop them late in ontogeny. Thus cephalopods appear to have adapted the particular characteristics of a system designed for one purpose (Packard, 1995) to another: communication to conspecifics.
Cephalopod Skin Display Systems
The first part of this chapter will describe the skin system itself: its structure, control systems, capabilities, and limitations. The cephalopod skin is a complex structure including chromatophores (Packard, 1988a), deeper layers of reflecting cells (Messenger, 1974), and muscles which allow the skin to be smoothed or raised in papillae. In addition, the tremendous flexibility of movement of the arms (Mather, 1998) aids cephalopods in changing body outline to alter the animals’ appearance.
The pigment-containing chromatophore sacs are the core of the cephalopod appearance system. However, they are not just pigment cells but units of a complex neuromuscular system.
Each pigment sac is surrounded by an elastic membrane which can be pulled outward by a set of 15–25 radial muscles to reveal the color of the pigment inside (Packard, 1985). Sacs can contain yellow, red, or brown/black pigmentary color. Only a few are present at hatching, but the number of chromatophores increases over the life span of an individual as it grows, with perhaps a “wave” of chromatophore production if the planktonic juvenile cephalopod becomes more benthic (Packard, 1985). New chromatophores are formed in spaces between present ones, and new ones contain yellow pigment, which changes to red and then to brown as the animals age. Thus chromatophore ontogeny results in local multicolored “fields.”
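The way this ontogeny produces mixed-color fields can be illustrated with a minimal sketch. It assumes a single row of chromatophores, one new chromatophore per gap at each growth step, and arbitrary age thresholds for the pigment changes; none of these numbers or function names come from the studies cited, they are hypothetical and only make the reasoning concrete.

```python
# Hypothetical age thresholds, in growth steps; the text gives no numbers.
RED_AGE = 1
BROWN_AGE = 2


def pigment_for_age(age):
    """Pigment follows age: yellow first, then red, then brown."""
    if age >= BROWN_AGE:
        return "brown"
    if age >= RED_AGE:
        return "red"
    return "yellow"


def grow(birth_steps, step):
    """Add one new chromatophore in every gap between existing neighbors."""
    grown = []
    for index, born in enumerate(birth_steps):
        grown.append(born)
        if index < len(birth_steps) - 1:
            grown.append(step)  # newcomer born at this growth step
    return grown


def simulate(initial_count, steps):
    """Return the pigment of each chromatophore along the row after `steps`."""
    birth_steps = [0] * initial_count  # chromatophores present at hatching
    for step in range(1, steps + 1):
        birth_steps = grow(birth_steps, step)
    return [pigment_for_age(steps - born) for born in birth_steps]
```

Under these assumptions, `simulate(3, 2)` gives a row in which the oldest (brown) chromatophores are separated by younger red and yellow ones, so every small stretch of skin contains all three pigments.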
Nerves run directly from the cephalopod brain to muscles that expand each chromatophore, and this neuronal control of the chromatophores gives the system tremendous spatiotemporal display flexibility. Each chromatophore can be expanded or contracted in milliseconds. A single dermal nerve innervates a motor field on an area of the skin, with an effective ratio of about 1 motoneuron per 100 chromatophores (Dubas and Boyle, 1985). Stimulation studies reveal that with increase in both voltage and frequency of stimulation, scattered chromatophores within this motor field are recruited—though not the same ones with each treatment. Each chromatophore is thus part of several motor units and each axon innervates many chromatophores scattered within a restricted area. This overlapping allows changes in coarseness of appearance by recruitment at a patch’s edge and of intensity of luminance by recruitment of chromatophores within an area, yielding a good capacity to produce the mottled camouflage employed by many octopods. Such recruitment seems to divide the skin into patches and grooves, which darken differentially (Packard, 1995). Nevertheless, Packard (1995) argues that while brain commands change these two characteristics by recruitment, it is the distribution of chromatophores which forms the basic skin pattern. A second communicative element within the cephalopod skin is the deeper reflective layer of iridocytes. These cells contain thin, electron-dense platelets alternating with cytoplasm, with the platelets often lined up parallel to the skin surface (Hanlon and Messenger, 1996). Whereas the chromatophores produce long-wavelength colors by pigments, the iridocytes assist with appearance by producing reflection of the ambient light intensity and wavelengths in the immediate environment. 
Some of these are broadband reflecting leucophores which, Messenger (1974) pointed out, allow the color-blind cephalopod (Messenger et al., 1973) to reflect dominant wavelengths of light in its environment upon its skin. Others are iridophores, which reflect short-wavelength blue or green light, often in restricted areas of the body surface. These iridophores have long been thought to be passive reflectors only, but there may be long-term changes in platelet density and structure by acetylcholine hormone modulation (Hanlon et al., 1990). Such changes would, however, be different from the instant ones of the complementary chromatophore system. An accessory element to the skin displays is the cephalopod ability to vary appearance by modulation of the surface texture of the skin and by variations in posture. Small muscles within the skin rise locally or generally across the surface in “papillae” (Packard, 1988a). Thus skin surface texture can match that of surrounding rocks, sand, or algae, or papillae can be raised in specific areas, such as just above the eyes. In addition, the arms have a hydrostatic rather than a fixed skeleton (Kier, 1988), and can move in patterns and assume postures that would be impossible in animals with solid endo- or exoskeletons (Mather, 1998), to match a cephalopod’s appearance to its background.
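The overlapping motor units described earlier in this section (Dubas and Boyle, 1985) can be sketched in a few lines. The wiring table below is purely hypothetical: six chromatophores and four axons were chosen so that each chromatophore belongs to two units, and the names are illustrative rather than anatomical. The sketch shows only that recruiting additional axons raises the number of expanded chromatophores, and that different axons expand different scattered subsets.

```python
# Hypothetical wiring: each axon drives a scattered set of chromatophore indices.
MOTOR_UNITS = {
    "axon_a": {0, 2, 4},
    "axon_b": {1, 2, 5},
    "axon_c": {0, 3, 5},
    "axon_d": {1, 3, 4},
}


def expanded_chromatophores(recruited_axons):
    """Union of all chromatophores driven by the recruited axons."""
    expanded = set()
    for axon in recruited_axons:
        expanded |= MOTOR_UNITS[axon]
    return expanded


def units_per_chromatophore():
    """Count how many motor units each chromatophore belongs to."""
    counts = {}
    for targets in MOTOR_UNITS.values():
        for index in targets:
            counts[index] = counts.get(index, 0) + 1
    return counts
```

In this toy wiring, recruiting one, two, and then three axons expands three, five, and six chromatophores respectively, a graded darkening of the patch produced without any one-to-one mapping between axons and chromatophores.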
Although the proficiency of any communication system must depend not only on its display structures but also on its control capacity, the control of appearance in cephalopods is poorly known, partly because recording from the central nervous system has been so difficult. Control of the chromatophore system apparently begins in the large optic lobe with its direct input from the eyes. Motor programs for particular displays may originate in this lobe, since stimulation has resulted in skin pattern changes (Chichery and Chanelet, 1976). In addition, injection of neurotransmitters such as acetylcholine, dopamine, and noradrenaline, which are thought to stimulate circuits in the optic lobe, causes global pattern changes in the skin (Andrews et al., 1981). From the optic lobe, connections pass to the motor systems in the lateral basal lobes of the brain, where Packard (1995) presumes overall patterns are assembled. Stimulation then passes to the chromatophore lobes, where chromatophore motoneurons are located. Packard (1995) suggests that banks of motoneurons are assembled here to form local components (see Packard, 1982). Whereas older research suggested a topographical organization of these motoneurons in this area, newer studies have not confirmed it for all species (Dubas et al., 1986). In addition, horseradish peroxidase injection showed that not all neurons in the chromatophore lobe are motor and not all motoneurons originate from the lobe itself. Thus even this lowest level of control does not appear to have a simple spatial organization.
Skin Patterns as Antipredator Devices
Any display system must have evolved to pass information to some species, so researchers should assess the structure and function of the skin display system in terms of what Guilford and Dawkins (1991) call the psychology of the receivers. In the case of cephalopods the receivers were apparently the bony fishes, a group which explosively radiated to dominate the world’s oceans.
Messenger (2001, p. 512) comments that “the body patterns of cephalopods have evolved to confound eyes: the eyes of visual predators.” Coleoid cephalopods are an unusual group of mollusks, quite unlike the typical clam and snail, and the group evolved at about the same time as bony fish. The cephalopod skin evolved during a competition for ecological niches between the two groups (Wells, 1994), and Packard (1972) has called fish the “designers of cephalopod skin.” If a communication system was evolutionarily designed for receivers of a different group than the senders, details of its production should be fitted to the sensory capacity of these receivers (Endler, 1992; Enquist and Arak, 1998). Packard (1988b) pointed out some of the general processes of visual reception by the vertebrate visual system, known in most detail for humans and their near relations (e.g., Matlin and Foley, 1997). Much of the extraction of the details about aspects of the environment—such as contrast, spatial pattern, and edge detection—is coded by bipolar, amacrine and ganglion cells in the vertebrate eye before being sent to the brain.
Thus the perceptual properties of this communication system are neither produced nor received by circuits in the animals’ brains but by structures of skin sending and eye receiving systems. Packard (1988b) also emphasizes that in message analysis there is an important role for attention, the step between sensory reception and cognitive evaluation of messages. The addition of learned features of prey that result in a formation of a predator’s search image (Curio, 1976) means that parts of visual displays can be expected either to focus on or to draw attention away from the prey’s identity. These features will be demonstrated with examples later on.
One of the paradoxes of the cephalopod skin display system, that a color-blind species produces colored displays, is easily explained if signals are prepared for non-cephalopod receivers. Cephalopods were suspected of being unable to discriminate the different wavelengths of visible light, and Messenger et al. (1973) showed that they could not learn a discrimination based on color and did not make an optomotor response in a moving drum with colored stripes. A further demonstration of this limitation was produced by Marshall and Messenger (1996), who showed that cuttlefish (Sepia officinalis) do not produce color-based camouflaging patterns on a background made up of gravels in contrasting colors but of the same luminance intensities. Physiological data confirmed that, unlike the multiple-pigmented eye of vertebrates, the eye of cephalopods (except that of Watasenia) has a single photopigment (see Hanlon and Messenger, 1996).
Given the tuning of cephalopod skin patterns to vertebrate receptors and the well-known functioning of the vertebrate visual system in decoding and constructing pattern (Matlin and Foley, 1997), it is reasonable to examine how these systems match for antipredator communication.
Cott’s (1940) pioneering work on visual concealing devices serves as a basis for this discussion; the only difference between those devices of cephalopods and other animals producing camouflaging displays is that most of the latter have a single pattern, whereas each cephalopod species may have several. Not all cephalopods have color- and pattern-changing abilities; some, especially nocturnal species, have a single disruptive pattern or a uniform skin color. The most straightforward concealment device of cephalopods, particularly those which move off the ocean floor, is countershading. In downward-radiating light, any animal will be illuminated from above and either show its silhouette to watchers or cast a shadow. Cott (1940) noted that darkening dorsally and paling ventrally with lateral gradations eliminates this problem to some extent, and such a device is used by cephalopods as well as many fish. Dorsal chromatophores are expanded, ventral ones retracted, and lateral ones kept in medium expansion. This effect is so important to concealment that several species have reflexive modulation of the differential expansion, controlled by a response to gravity and triggered by the statocyst system (Ferguson et al., 1994). Roll over a cuttlefish, and the ventral surface darkens while the dorsal one pales.
In effect, the countershading system helps a pelagic animal automatically disappear from view.
Another, more complex aspect of visual disappearance by cephalopods is background matching. Such matching cannot be described as the product of any low-level motor program because appearance is so variable, depending on the background to be matched. Many octopuses and squid produce a small-blotched mottle pattern that blends well with sand and rocks, and the chromatophore motor unit arrangement described by Dubas and Boyle (1985) for the octopus Eledone cirrhosa probably forms the basis for this type of response. Although mottle patterns are common in octopuses, they are subtly different for different species and the texture varies with different backgrounds. Their background matching is assisted by the reflection of the dominant wavelengths of ambient light by the leucophore system, which is visible where chromatophores are retracted. Postures and skin texture combinations such as the flamboyant (Packard and Sanders, 1971)—with raised skin papillae, mantle in an extended, uptilted posture, and the two dorsal arms raised, curled posteriorly and twisted at the distal ends—assist resemblance to background such as algae. Cuttlefish are famous for background matching on the dorsal mantle surface (Hanlon and Messenger, 1988), even being reputed to produce a checkerboard black-and-white pattern on a similar tiled background. Squid (Sepioteuthis sepioidea) use their elongate tentacles in V postures (Moynihan and Rodaniche, 1982) to assist skin patterns in breaking up outlines and blending with floating algae or branching gorgonians near which they rest. All these camouflage patterns have the simple goal of causing the cephalopod to disappear from the view of fish, to be indiscriminable from the background by the contrast, movement, and edge-detection assessment devices of the vertebrate eye.
(A good demonstration of their success at this is shown in Hanlon and Messenger, 1996, figure 5.2. Finding the cephalopod in the eight pictures is a challenge to most readers.) A second general device by which cephalopods can disappear from the perception of potential vertebrate predators is by disruptive coloration, again described by Cott (1940) and discussed in detail by Hanlon and Messenger (1996). Disruptive concealment is a result of being seen but not being recognized as an animal. Cuttlefish demonstrate an excellent example when placed on coarse gravel: a large, square middorsal white spot and wide white lateral bar over the eyes (Hanlon and Messenger, 1988). Such contrast elicits lateral inhibition, a processing specialty of many visual systems (Matlin and Foley, 1997) that makes edges more conspicuous and obscures other features, in this case the cuttlefish outline. Another concealment that functions by disruption is the aptly named plaid of juvenile squid (Moynihan and Rodaniche, 1982). Dark longitudinal stripes and lateral bands on a pale background are easily visible, but they break up the elongate outline of the squid and conceal its body shape.
These units are all good matches to known vertebrate feature detectors (see Matlin and Foley, 1997, for a discussion of vertebrate edge detection). As Packard (1988b) pointed out, they call attention to features of the cephalopod that are not recognizable as associated with animals. In addition, vertebrates have two visual analytical systems, the parvo and the magno, specializing in color reception and motion/location detection, respectively. Mixing the messages to the two systems can result in spatial confusion (Triesman, 1986). Such a means of concealment may also be tuned to vertebrate watchers of cephalopod disruptive patterns. Disruptive patterns which function for concealment may also be present only on part of the skin surface of a cephalopod sender. Packard (1988b) emphasized that attention modulates between the sender’s skin system and the receiver’s brain. Smith (1990) emphasized this process with reference to social communication: that individuals are always receiving information and making predictions for the future—what Owings and Morton (1998) see as continuous interchanges of information. Visual predators build search images that guide their recognition of prey, finely tuned as they learn prey identity and updated by results of previous searches, as in birds’ recognition of camouflaging moths (Kettlewell, 1973). One feature that cephalopods cannot conceal is the large eye (see Coss and Goldthwaite, 1995, for a discussion of eyes as visual schemata for predators). Several local display devices help to conceal its outline and keep the predator’s attention from it. One is the eye bar that octopuses (Packard and Sanders, 1971; Mather and Mather, 1994) use to distract the predator from recognition of the horizontally elongated slit pupil. A local patch of chromatophores darkens or pales in an equal-length extension from both lateral margins of the eye, turning the general appearance from that of a pupil slit to a bar.
Similarly, many octopuses have the ability to raise papillae around the eye, successfully changing its outline by blurring the circular shape that could be matched to a predator’s shape detection. The giant Pacific octopus (Enteroctopus dofleini) has such large dorsal papillae (up to a few centimeters in length) that they are often informally called horns. At the same time that attention must be drawn away from any feature that would allow a predator to recognize a cephalopod as an animal, attention can be drawn toward local features that would not. Thus Packard (1988b) points out that the white spots on the mantle of an octopus stand out from the skin surface. A potential visual predator would be drawn to look at what Packard (1988b) calls a foveal trap. Attention is drawn to the spot, which may flash on and off, and the high-acuity foveae of the predator’s eyes take in enough local information that it ceases to scan the rest of the area for cues, such as the real octopus eye, that might allow it to discover the animal. The conspicuous white square middorsal on the cuttlefish mantle (Hanlon and Messenger, 1988) may function similarly. In addition, the juxtaposition of chromatophores
that produce long-wavelength colors with areas of reflective short-wavelength ones will stimulate the color-based center-surround receptive fields of the vertebrate visual system (Matlin and Foley, 1997). As Enquist and Arak (1998) point out, such sensory recruitment will maximize the conspicuousness of the pattern—in this case, one which draws attention away from the outline of the animal itself. Another way in which attention is drawn by cephalopods to a feature that misinforms potential predators is by production of circular, dark eyelike spots. The dymantic display of octopuses is a complex combination of dark around the eyes, spread arms with web flared (see Mather, 1998, for postures), and lateral margins of arms darkened. Such a display is not concealing but attention-attracting and misinforming. Eye spots are common antipredator devices in many animals, including fish and butterflies (Coss and Goldthwaite, 1995), and give an illusory appearance of the head of a much larger animal. They are also “aimed” at the vertebrate visual system at the level of pattern perception, and the sudden appearance of eyelike stimuli may startle a predator and make it hesitate in its approach (Cott, 1940). Behavior assists in such misinformation, for the spread body and arms give the illusion that the octopus is much larger than it actually is. Interestingly, eyelike spots are common in the cephalopods. Some are relatively stable reflective circles at the base of the lateral arm webs in the octopuses. The greatest extent of eyelike spots is probably in the genus Hapalochlaena. These small octopuses have a venom that is deadly to many vertebrates, including humans (Hanlon and Messenger, 1996), and the blue rings that give them their common name of blue-ringed octopus are flashed on the surface of mantle and arms when the animal is disturbed. In this case the rings are clearly not to startle or bluff, but act as warning coloration; the octopus is deadly. 
But pigmented spots are also present in several unrelated members of the group, described by Moynihan (1975) as an example of conservatism of displays across a group. Although the general communication device is also found in cuttlefish (Hanlon and Messenger, 1988) and squid (Moynihan and Rodaniche, 1982), which are fairly distantly related to octopuses, the pattern is representative of phylogenetic conservatism of communication devices (Coss and Goldthwaite, 1995) only in a general sense. The dark patches of octopuses are below the eyes, whereas the circular ones of cuttlefish are on the lateral posterior mantle, and the circular ones of squid are on the lateral posterior mantle and on the fins. Thus no specific fixed motor program could have evolved for all three. Either a general startle/bluff visual display which similarly exploits vertebrate pattern perception has evolved separately in three groups, or the general program for the device, available in an area such as the optic or lateral basal lobe, has been adapted with specific motoneuron patterns in the chromatophore lobe output areas of the different groups. Another antipredator device widely employed by cephalopods uses the skin system’s capacity to change in minimal time. This device exploits the likelihood that a vertebrate
predator has built a search image of a prey’s appearance through learning. Despite the vertebrate’s learning update capacity, the cephalopod appearance changes faster than any search image can keep up with. Young cuttlefish (Hanlon and Messenger, 1988), adult sepiolids (Euprymna scolopes) (Anderson and Mather, 1996), and octopuses (Hanlon, Forsythe, et al., 1999) faced with a predator use not just one skin pattern but several. These include paling or darkening all the body, assuming the eye-spot dymantic, and showing various contrasting patterns. The concealment by change is accompanied by ejection of an ink cloud, either for concealment of the animal or as a “fake” cephalopod. Movement also plays a part, for sepiolids in shallow water may make an ink spot, pale, and jet away; jet to the water surface and take a concealing posture and pattern like a floating piece of algae; or sink to the sandy bottom and either adopt a background-matching pattern or dig into the sand (Anderson and Mather, 1996). Hanlon, Forsythe, et al. (1999) followed escaping octopuses and noted the lack of predictability in their sequences of whole-body appearance. Not only did the animals quickly change appearance and thus break any match to a predator’s search image, but there was little predictability in the temporal sequence that would allow cognitive computation and planning ahead by the predator (Smith, 1990). Such sequential production of concealing devices is a good example of the extension of cephalopod antipredator display systems from consistent automated displays (in response to specific stimuli) to intelligent decision making within what Owings and Morton (1998) describe as the flow of information exchange. Hanlon and Messenger (1996) point out that squid live in a tropical near-shore environment with a high potential of attack by fish predators. Besides the general cephalopod countershading, they use a variety of patterns and actions to avoid approach by these predators. 
Thus Mather (2000a) found that they have finely calculated responses to fish approaches depending on predator species, size, speed, and distance. Large herbivorous parrotfish could approach close to adult squid, whose usual response was an agonistic zebra display or an eye-spot display. Approaches of higher velocity or by predatory bar jack resulted in eye-spot dymantic displays, evasion, or short jets away. As the fast yellowtail snapper approached to 3 meters, squid would take evasive action, paling and jetting quickly, sometimes even before the observer saw the fish. Approaches to juvenile squid close to the bottom resulted in their sinking near the gravel with matching mottle or a disruptive plaid display; near algae or gorgonians, juveniles assumed a V posture and moved within the branches. This complex set of judgments revealed that cephalopod camouflage is not just a matter of producing the appropriate skin patterns, but also of having the sensory and cognitive ability to make choices, monitor results of decisions, and update the actions—and keep on doing so in a way that seems unpredictable to the observing predator until the threat
is evaded. This capacity may be one basis for adaptation of skin pattern production for conspecific communication.
Sepioteuthis sepioidea squid produce a range of concealing appearances, using skin colors and intensity gradients as well as a wide range of arm postures (Moynihan and Rodaniche, 1982; Griebel et al., 2002), but not much texture. One is the previously mentioned mottle, a variegated set of spots a few millimeters in diameter, in colors ranging from white through pale and warm brown to grays and black, of approximately the same size and with no apparent pattern. This is sometimes overlain by two 2-centimeter-diameter posterior-lateral dark dymantic dots on the fins. It is frequently accompanied by arm and tentacle postures including extended arms in corkscrew positions. A second is the plaid display, generally seen in juveniles but not adults. In this display there are two longitudinal stripes and two to four circular bars, both about a centimeter in width and pale-to-deep brown or black. The squid’s background for Plaid is pale with extensive leucophore-based reflection of light. Arm and tentacle postures are varied from curl and V to bend and even spread, including some tentacle bends and corkscrew twisting of the arms (Moynihan and Rodaniche, 1982).
Complex Non-Predator-Directed Communication
How might the visual display system that cephalopods honed in avoiding predators be adapted for conspecific-directed communication? Of course it cannot be argued that concealing signals are not communication. They obviously pass information to potential predators, yet the communication may be stereotyped and the information passage, if they are successful, is truncated. Unlike the complex interactive sender–receiver dialogue in social interactions (Smith, 1990; Owings and Morton, 1998), antipredator displays are simple.
Tapping into the sensory psychology of the receiver (Guilford and Dawkins, 1991), the cephalopods send displays indicating “not here, not an animal, not worth catching.” That ought to be the end of the interaction. The capacity of a display system such as the cephalopod skin should be adaptable to intraspecies signals, and Moynihan’s (1985) proposal that squid might make a language on their skin brought attention to this system. Moynihan and Rodaniche (1982) cataloged a large number of components on squid skin along with their co-occurrence. They theorized that these components could be the equivalent of nouns, verbs, adjectives, and adverbs in human sentences, and that cephalopods could produce a “language” on their skin, but their hypothesis ignored design features of language (Hockett, 1960) and a parallel with human language is very unlikely. Only one thread of evidence for an additive approach exists: the study of Adamo and Hanlon (1996), who demonstrated that dark face added to the zebra display of cuttlefish indicated a willingness to escalate an agonistic interaction.
Jennifer A. Mather
Many aspects of behavior have to be considered before such an assumption about the function of a communication can even be approached. First is the problem of domain specificity (Hirschfeld and Gelman, 1994). Behavior of animals, including humans, does not have an open-ended capacity spread evenly across all tasks (Dukas, 1998). Cognition, broadly conceived by Neisser (1976) as “all the processes by which the sensory input is transformed, reduced, elaborated, stored, recovered and used,” is not evenly available for all information processing that an animal undertakes. This is perhaps most obvious in the small arthropods with their ganglion-based nervous system. Hazlett (1995) notes that crustaceans do not use behavioral capacities which they demonstrate in one situation for an unrelated one, and even the feat of communication evidenced by bee dances has narrow and strict limitations (Dukas, 1998). Unlike these species, cephalopods show evidence of domain generality in their use of information and action. Clear modal action patterns with little variation are unusual in the group; the programmed sequence of digging into sand within the sepioid squid (Mather, 1986) is one exception. A good example of variety of action is the exhalation of water jets from the mantle through the flexible funnel. Intake of water for respiration and its return to the environment carrying wastes is a standard system of mollusks. The cephalopods have adapted this circulatory system for jet-propelled locomotion (O’Dor and Webber, 1986), and the octopuses use the jet in many other aspects of their life. They jet to assist in construction of sheltering “homes” (Mather, 1994), to repel scavenging fish (Mather, 1992), and even to manipulate objects in what appears to be play behavior (Mather and Anderson, 1999). Thus the cephalopod can use some behaviors across different domains. 
Another problem for evolutionary transformation of concealing signals to message-exchanging ones is the pressure to do so. Complex communication and intelligence are thought to occur in social species where nuances of relationships and a history of interactions shape animals’ lives (Jolly, 1966; Humphrey, 1976). Although many species remain uninvestigated, this type of organization does not appear to be true for cephalopods. Octopuses have a good capacity to learn both tactile and visual discriminations (Wells, 1978) but are generally solitary and asocial (Mather and O’Dor, 1991), as are sepiolids. Squid gather in groups of hundreds or even thousands (Hanlon and Messenger, 1996). If an individual’s identity is not available to conspecifics (see Boal and Marsh, 1998, for cuttlefish), sophisticated intraspecies communication may not usefully evolve. Squid, in contrast, are often held up as an example of a group where communication to conspecifics would be adaptive, for they are found nearly lifelong in groups. A third problem in adapting a concealing signal system for longer-term communication is the neural control system that would process and program the necessary information. Cephalopods are commonly assumed to have high intelligence (Mather, 1995b), and years
Cephalopod Skin Displays
of studies have demonstrated their excellent capacity to learn on the basis of both visual and tactile information (Wells, 1978). Nevertheless, their cognitive system does have its limitations. Surprisingly, much of the extensive neural control system of octopuses is in the chain of ganglia running along the dorsal arms. This arrangement is perhaps a necessity for control of the hydrostatic skeleton (Kier, 1988) and the complex set of actions carried out by the units of these limbs (Mather, 1998). Nevertheless, arm control is mostly local, by means of what Rowell (1963; 1966) termed reflexes. This limitation might account for researchers’ (Fiorito et al., 1990, 1998) difficulty in having octopuses solve problems that use arm manipulation as actions. The particular domains in which octopuses use their intelligence and learning capacity are still not well understood. If a system has evolved for one purpose (interspecies communication) and begins to be used for another (intraspecies communication), there might be changes in structure or use of different structures to fit a new receiver. This appears to be the case with the skin signal system of cephalopods. The cephalopod eye cannot discriminate different wavelengths of light but, thanks to the orthogonal arrangement of the microvilli of the receptor cells, it can discriminate its plane of polarization (Budelmann, 1994). This ability may have been useful in predator recognition, since light is polarized when it is reflected by fish scales. Similarly, discrimination of the plane of polarization of light might guide navigation (see Mather, 1991, for an example of this behavior in octopuses), as it does for insects (Dyer, 1998). Nevertheless, Shashar et al. (1996) and Shashar and Hanlon (1997) have demonstrated that polarized patterns may be visible in limited areas of the skin of several species. 
The extent to which these signals are used is not known, and the difficulties of recording and testing this ability are formidable. Observation of skin patterns of several cephalopod species suggests that part of their repertoire is addressed to conspecifics (see Hanlon and Messenger, 1996). Several patterns produced by Sepioteuthis sepioidea (Moynihan and Rodaniche, 1982) are explicitly used in sexual interactions, and a further investigation of some of them suggests how this display system can be used for intraspecies communication. Adult Sepioteuthis sepioidea have a repertoire of skin signals used throughout the approximately one-month period of reproductive maturity (Mather, 1999a). Squid gather in loose groups and form short-term male-female consortships, females apparently located in areas with substrate suitable for egg laying and males attracted to them. As they mature, both males and females display the dark-bar zebra pattern, though this apparently agonistic signal is used predominantly (2/3) in male-male competitive interactions. Females signal their reproductive maturity and interest with a pale-mantle saddle, and males commonly reply with longitudinal stripes. A male who is attempting to maintain consortship with a female and repel other males will direct a unilateral pale lateral silver at them but not to her. When a male is about to attempt to pass spermatophores to a female,
he will signal with a repeated all-body on-off paling called flicker. Males compete and females choose; spermatophore transfer is completed by the female. A comparison of the displays that are used for concealment by squid and those which are used to signal conspecifics is useful (see table 11.1). One obvious contrast is the range of colors. Only occasionally in the zebra display is there use of color, which is what one might expect if the receivers were color-blind. Most of the sexual displays are simple in structure, but the mottle and plaid concealing displays are complex. The displays to conspecifics have high-contrast transitions between dark and white, a feature not found in the concealing ones, and form large areas of high-intensity white reflectance. Each of the concealing displays mentioned, and most others, is accompanied by arm postures that would break a potential predator’s search image of a squid outline, whereas those to conspecifics have a simple outline. The zebra display (discussed later) appears to be an exception to the simplicity of displays to conspecifics. This may have come about because it is a highly graded display and the grading may communicate variation in status and aggression. Besides easy visibility, the displays of squid have a variance which aids in specificity of communication. Two examples will be discussed, the situation-specific extent of the saddle display of females and the ritualization of the high-intensity zebra contests of males. The female saddle is formed by the paling of all the mantle except an anterior margin of a few centimeters, which is brown dorsally and ventrally and is normally held for a few seconds, though sometimes up to 1 minute (this is the same pattern that Moynihan and Rodaniche, 1982, confusingly described as pied). Saddle is overwhelmingly (98 percent) a visual signal of the adult female and is displayed in several situations. 
It is produced as a rapprochement (Mather, 2001), the first apparent signal of sexual readiness or interest when females mature. A female rises in the water column above the group with arms bent ventrally and produces saddle, of longer duration if there is no apparent response.

Table 11.1
Perceptual characteristics of some concealing (interspecies) and communicating (intraspecies) skin displays of Sepioteuthis sepioidea

Display        Color   Outline   Spatiotemporal transitions   Pattern complexity   Pattern variability
Interspecies
  Mottle       Some    Complex   Gradual                      Complex              Medium
  Plaid        Some    Complex   Varied                       Complex              Medium
Intraspecies
  Saddle       None    Simple    Abrupt                       Simple               Medium
  Stripe       None    Simple    Abrupt                       Simple               Little
  Zebra        Some    Simple    Abrupt                       Complex              Wide
  Flicker      None    Simple    Abrupt (time)                Simple               Little

Saddle is often paired with male displays of stripe early in courtship, and this exchange is seen especially at the beginning or renewal of pairing. The saddle-stripe paired display appears nearly simultaneously in the two animals, and they produce this paired display when they are in specific relative positions, the female above the male in parallel. A reduced-area saddle (described as saddle spot) is displayed for longer duration as consort pairs swim together. Saddle is also displayed after a male flicker, apparently as a deescalation signal, because females subsequently do not encourage spermatophore transfer. Males sometimes respond to this with stripe, and saddle has never been observed to lead directly to an attempt to transfer spermatophores (Mather, 2001). Thus it is a sign of sexual interest but not a consent to mating. One interesting aspect of this display is that the proportion of a female’s mantle paled to form the pattern varies tremendously (Mather, 2001). At rapprochement, females almost instantly pale all the mantle but the anterior margin. The paling is usually total and accomplished within milliseconds early in courtship when it is paired with stripe. During consortship maintenance and after a male flicker, paling may encompass only half of the mantle and the white is visibly “peeled back” from the posterior margin in up to a second. In the position maintenance of consortship, the saddle is reduced to a saddle spot only a few centimeters in diameter but always on the side of the mantle facing the female’s consort. Variations in signal form are a common modulation of signal intensity (Smith, 1990), apparently here indicated by changes in area. Interestingly, intensity can be predicted by the situation in which a squid finds herself. It is interesting that the signal, while stereotyped in form, as reproductive displays would be expected to be (Enquist and Arak, 1998), is modulated in intensity, since motivation could be expected to decrease. 
Such changes may suggest a general hormonal modulation of reproductive drive, decreasing as the squid gains feedback information. While little is known about the hormonal state of mature female octopuses (Hanlon and Messenger, 1996), that is a possibility. However, saddle changes not only are modulated in intensity over time but also vary in direction. Females always produce a saddle spot toward their male consort, so directional selection of area to an intended receiver also plays a part in this modulation. Other cephalopods also use area modulation when sexual motivational state presumably changes. Octopus cyanea were confined in a small outdoor pond in Hawaii (Mather, 1995a), and we were able to keep two male-female pairs in the pond sequentially for a 10-day period. Octopuses are usually solitary and reproductive opportunities are sparse. The male immediately displayed an all-body raised papillae skin texture with the outer half of the papillae clear white and the rest of the body surface dark chocolate brown, called descriptively white paps, and approached the female for eventual mating. After mating, the female refused further male advances, although the male continued to display
white paps any time he was near her. As a week went on, it was notable to watchers that the area of the display became smaller and smaller. First it was unilateral, then only a few arms were involved, and finally only the reproductive third right arm was stretched toward her, raised white paps visible along its surface. Again intensity was modulated by decreasing the area of display, and again directionality was maintained. This pair of displays demonstrates the cephalopod control of spatial extent of skin signals (Packard, 1995), although the precise millisecond control that is possible for temporal modulation was not utilized in most displays. The reliance on positional information, possibly for eliciting the saddle-stripe interchange, and the location of saddle spots toward consorts underline that choice of area of signal production is by no means automatic. Directionality is an important aspect of signal production to aim communication at specific receivers (Bradbury and Vehrencamp, 1998), and the spatial control of the squid skin display system allows this quality to be expressed at the same time that motivational modulation is carried out. A second intraspecific display of squid is the agonistic zebra, which consists of alternating dark and light bars. Cuttlefish have a vivid zebra pattern, modulated from brown-on-cream in resting patterns to vivid black and white during interactions (Hanlon and Messenger, 1988; Adamo and Hanlon, 1996). Loligo plei squid have a horizontally oriented lateral flame male-male display (Hanlon, Maxwell, et al., 1999) with the same alternation of dark and white stripes. The zebra of Sepioteuthis sepioidea (Moynihan and Rodaniche, 1982) is vertically oriented, on arms and mantle at greatest extent, and forms only rough dark bars. Hanlon and Messenger (1996) discuss these three examples as parallel agonistic displays, but they are clearly not the result of common chromatophore motoneuron circuitry. 
Instead, they may be parallel production of highly visible abrupt luminance changes, discriminable as the medium spatial frequency bars whose easy visibility Packard (1988b) pointed out, and produced for the cephalopod visual system as receiver. Zebra is produced only by mature squid, but by both males and females. Female-female zebra displays are almost unknown (one in 1,200) and male-male ones are the majority (Mather, 2000b). Zebra appears in situations of antagonism (even to intruding fish) or of challenge, such as males competing for the consortship of a mature female. Given the fine control of the chromatophore system, Packard (1995) has pointed out that changes of intensity of a signal on the cephalopod skin can be of three types: modulation of the intensity of the components, variation of the spatial extent, and change of contrast by differences in background color. Using these three types of modulation allowed me (Mather, 2000b) to calculate a quantitative intensity score for each of the many zebra displays that are produced and to specify the differences in intensity that accompany the differences in situation and targets. The most intense zebra displays are reserved for what Smith (1990) calls a formalized interaction, the male-male contests that I designated as formal zebras.
Formal zebra contests are very conspicuous events and have been noted by Moynihan and Rodaniche (1982) and Hanlon (Hanlon and Messenger, 1996). In a formal zebra contest, one male assumes a position just above the other, usually parallel but sometimes rotated 90 degrees. Each assumes a dark zebra-striped pattern on the dorsal mantle and arms, though it is typically presented on brown by the squid above and on white by the squid below (Mather, 1999b). The ventral mantle of each is pale with scattered dark dots, and the fins assume a brown color while revealing a series of large pale spots along their midline. These fin spots are of particular interest to observers (and perhaps other squid) because they develop their conspicuousness when males mature and are sufficiently irregular that individuals can be identified by their pattern. During the formal zebra contest the arms of each squid are spread, although those of the dorsal animal typically spread to 60 degrees and those of the ventral one to 180–270 degrees. Sometimes contact is made between the posterior mantles of the pair, more often by an upward push of the lower squid. Formal zebra contests have all the characteristics of ritualized communication: warning of their occurrence, conspicuousness, redundancy, and stereotypy (Wiley, 1983). A pair of males in formal zebra commonly rise above the group and may continue the display for up to a minute, during which they are vulnerable to predators (I have seen one predation attempt during a formal zebra). In 56 contests, with the arm spread increasing the area of display, the zebra of the under squid was much more intense (mean score of 9/10) than that of the over one (mean score of 4/10). In addition, if winning a contest was indicated by retaining or assuming consortship of a “desirable” female, the under squid was usually the winner (84 percent). 
Such a ritualized contest between males is common in the sexual behavior of male competitors of many species across the animal kingdom (Krebs and Davies, 1987). It is interesting to record the antecedent behaviors of formal zebra contests, for any maximal agonistic contest should be the product of escalation (Krebs and Davies, 1987). Though adult males come and go from local female groups, some recognition and assessment of individuals is likely before a formal zebra. In fact, the display is often seen when a new male enters a group (25 percent) where one or several adult males are in consortship with adult females, or after a male-female saddle-stripe sexual display. Escalation in signal intensity is obvious, since immediately before a formal zebra contest a lower-intensity zebra by one or both contestants is common (42 percent) (Mather, 1999b). A diagonal rise approach (25 percent) might be an attempt by each to attain the superior under position, just as male ungulates jockey for position before an antler clash (Krebs and Davies, 1987). The formal zebra thus has most of the characteristics of ritualized contests for superiority: escalation, stereotyped roles, and a nondangerous manner of status signaling and contest settling.
The production of formal zebra displays by squid certainly demonstrates that cephalopods have the capacity to use the complex, fine-tuned skin visual system that was honed by predator threat (Packard, 1972) to display to conspecifics. The fine control of chromatophore expansion by the brain (Dubas and Boyle, 1985) has allowed the squid to modulate signals spatially. It is interesting that in both the saddle and the zebra, spatial modulation apparently is used to indicate motivational intensity, as one might find in other social signals such as limb movements or auditory calls (Smith, 1990). Endler (1992) points out that in many species greater intensity is better received and favored, and certainly the maximal luminance contrasts in zebra are easily received. Spatial modulation is also used to form directionality, for the saddle spot is always unilateral toward a consort receiver and low-intensity zebras are commonly unilateral. The ability of chromatophore muscles to contract and change the cephalopod appearance in milliseconds is not normally utilized in these communications, although it is in the male flicker. Endler (1992) also points out that too quickly changing signals may not be processed by the receiver if flicker fusion frequency (Matlin and Foley, 1997) blurs the luminance changes. Relative placement and arm position that so nicely help the camouflaged cephalopod to disappear before vertebrate eyes (Hanlon and Messenger, 1996) are added to the formal zebra displays, apparently also to modulate intensity. What do the uses of this skin system in displays for conspecifics tell us about the level of control of these behaviors or the awareness that squid might have about their actions? Regrettably, so far they tell us little. Part of this is lack of information about the sources of control of the antipredator skin displays (Packard, 1995). 
Only the general areas of the brain are known, and some control, such as that of camouflaging countershading, is known to be reflexive (Ferguson et al., 1994). Signals to conspecifics in sexual interactions must maximize clarity and also be fairly stereotyped in form so that they are easily understood (Wiley, 1983; Smith, 1990). Moynihan and Rodaniche’s (1982) inability to recognize individual squid hampered the generality of their observations, and only my studies have been able to identify and follow individuals and determine the long-term pattern of their behavior (Mather, 1999a). Much more remains to be understood through observation about the context and range of this intraspecies communication, and this must be followed by experimental studies of receiver responses (see Guilford and Dawkins, 1991, for a reminder that we must take receivers into account). That the cephalopods are known to be good at learning (Wells, 1978), and thus intelligent (Mather, 1995b), does not tell us how and in what domains this intelligence is used. Which of the aspects of behavior mentioned earlier might be limiting for the sophistication of cephalopod skin visual communication in its expansion to target conspecific receivers and its function within a different cognitive domain (sensu Hirschfeld and
Gelman, 1994)? The sophistication of the sender system is certainly no barrier. Cephalopods are justifiably known as the invertebrates with intelligence, and there is evidence that they are domain generalists for some behaviors. Perhaps the barrier to development of complex communication between cephalopod conspecifics is in the necessity to utilize it. Several authors have argued that acute intelligence has evolved only in animals with complex social organization (Jolly, 1966; Humphrey, 1976). Certainly, sophisticated communication and intelligence seem necessary only if animals are gathered in long-term groups and individuals play specialized roles (Byrne, 1994). Squid are likely the best candidates for development of individual discrimination and social roles, but no cooperation is evident. Group membership is fairly fluid (Boom et al., 2001). The groups disperse at night to hunt, and even daytime hunting is individual. Schools are commonly linear in the daytime, and casual observation has suggested “sentinels” on the ends of such a line (Hanlon and Messenger, 1996). O’Dor (personal communication) has suggested that physics rather than social organization dictates group arrangement, and squid have a well-developed lateral line analogue which enables them to sense water displacement (Budelmann, 1994) and judge relative position. In addition, Adamo and Weichelt (1999) analyzed flight from predators in another species of squid and found that the end “sentinel” is not the first to flee from predators, as one would expect from an animal designated as vigilant. All this suggests that squid do not use the excellent sender capacity to form anything similar to a human language on their skin, despite what Moynihan (1985) suggested. Hanlon and Messenger (1996) remind us that spatial signal complexity does not necessarily equate to cognitive sophistication. 
One of the major characteristics of a language must be openness, and the displays that have been evaluated in squid are apparently limited in meaning and ritualized—but that is what should be expected for reproductive displays (Wiley, 1983). Another characteristic of a languagelike communication should be that it “comments” on more than internal state. The squid displays appear to be interpreted as internally based, unlike the dance language of bees (Dukas, 1998), for instance. Human language contains much of its meaning in temporal sequences, and squid can also signal meaning this way. The squid skin system certainly has fine temporal as well as spatial control capacity, and Owings and Morton (1998) remind us that it is the flow of information exchange between individuals that tells us about their actions. Sequences, which can be seen in male-male interaction and in female responses to male signals of intent to pass spermatophores, are too few for analysis, but also do not appear to convey meaning. Much more remains to be observed, but the physical capacity of the cephalopod skin to produce complex displays for potential predators does not seem to have been translated into the cognitive complexity of a language to conspecifics.
References

Adamo SA, Hanlon RT (1996) Do cuttlefish (Cephalopoda) signal their intentions to conspecifics during agonistic encounters? Anim Behav 52: 73–81. Adamo SA, Weichelt KJ (1999) Field observations of schooling in the oval squid, Sepioteuthis lessoniana (Lesson, 1830). J Moll Stud 65: 377–380. Anderson RC, Mather JA (1996) Escape responses of Euprymna scolopes Berry 1912 (Cephalopoda: Sepiolidae). J Moll Stud 62: 543–545. Andrews PL, Messenger JB, Tansey E (1981) Colour changes in cephalopods after neurotransmitter injection into the cephalic aorta. Proc Roy Soc London B213: 93–99. Boal JG, Marsh SE (1998) Social recognition using chemical cues in cuttlefish (Sepia officinalis Linnaeus, 1758). J Exper Mar Biol Ecol 230: 183–192. Boom S, Byrne RA, Mather JA (2001) Schooling behavior of the Caribbean reef squid, Sepioteuthis sepioidea, in Bonaire. Presented at XXVII International Ethological Conference, Tübingen, August. Bradbury JW, Vehrencamp SL (1998) Principles of Animal Communication. Sunderland, Mass.: Sinauer. Budelmann BU (1994) Cephalopod sense organs, nerves and the brain: Adaptations for high performance and life style. Mar Behav Physiol 25: 13–33. Byrne RW (1994) The evolution of intelligence. In: Behaviour and Evolution (Slater PJB, Halliday TR, eds.), 223–265. Cambridge: Cambridge University Press. Chichery R, Chanelet J (1976) Motor and behavioural responses obtained by stimulation with chronic electrodes of the optic lobe of Sepia officinalis. Brain Res 105: 525–532. Coss RG, Goldthwaite RO (1995) The persistence of old designs for perception. In: Perspectives in Ethology, vol 11, Behavioral Design (Thompson NS, ed.), 83–148. New York: Plenum Press. Cott H (1940) Adaptive Colouration in Animals. London: Methuen. Curio E (1976) The Ethology of Predation. Berlin: Springer-Verlag. Dubas F, Boyle PR (1985) Chromatophore motor units in Eledone cirrhosa (Cephalopoda: Octopoda). J Exper Biol 117: 415–431. 
Dubas F, Leonard RB, Hanlon RT (1986) Chromatophore motoneurons in the brain of the squid, Lolliguncula brevis: An HRP study. Brain Res 374: 21–29. Dukas R (1998) Constraints on information processing and their effects on behavior. In: Cognitive Ecology: The Evolutionary Ecology of Information Processing and Decision Making (Dukas R, ed.), 89–174. Chicago: University of Chicago Press. Dyer FC (1998) Cognitive ecology of navigation. In: Cognitive Ecology: The Evolutionary Ecology of Information Processing and Decision Making (Dukas R, ed.), 201–260. Chicago: University of Chicago Press. Endler JA (1992) Signals, signal conditions, and the direction of evolution. Amer Nat 139S: 125–153. Enquist M, Arak A (1998) Neural representation and the evolution of signal form. In: Cognitive Ecology: The Evolutionary Ecology of Information Processing and Decision Making (Dukas R, ed.), 21–87. Chicago: University of Chicago Press. Ferguson GP, Messenger JB, Budelmann BU (1994) Gravity and light influence the countershading reflexes of cuttlefish. J Exper Biol 192: 195–203. Fiorito G, Biederman GB, Davey VA, Gherardi F (1998) The role of stimulus preexposure in problem solving by Octopus vulgaris. Anim Cog 1: 107–112. Fiorito G, Von Planta C, Scotto P (1990) Problem solving ability of Octopus vulgaris Lamarck (Mollusca, Cephalopoda). Behav Neur Biol 53: 217–230. Griebel U, Byrne RA, Mather JA (2002) Squid skin flicks—Sepioteuthis sepioidea display repertoire. Presented at Animal Behaviour Society Annual Meeting, Bloomington, Ind., July.
Guilford T, Dawkins M (1991) Receiver psychology and the evolution of animal signals. Anim Behav 42: 1–14. Hanlon RT, Cooper KM, Budelmann BU, Pappas TC (1990) Physiological color change in squid iridophores. I. Behavior, morphology and pharmacology in Lolliguncula brevis. Cell Tiss Res 259: 3–14. Hanlon RT, Forsythe JW, Joneschild DE (1999) Crypsis, conspicuousness, mimicry and polyphenism as antipredator defences of foraging octopuses on Indo-Pacific coral reefs, with a method of quantifying crypsis from video tapes. Biol J Linn Soc 66: 1–22. Hanlon RT, Maxwell MR, Shashar N, Loew ER, Boyle K-L (1999) An ethogram of body patterning behavior in the biomedically and commercially valuable squid Loligo pealei off Cape Cod, Massachusetts. Biol Bull 197: 49–62. Hanlon RT, Messenger JB (1988) Adaptive colouration in young cuttlefish (Sepia officinalis): The morphology and development of body patterns and their relation to behaviour. Phil Trans Roy Soc London B320: 437–487. Hanlon RT, Messenger JB (1996) Cephalopod Behaviour. Cambridge: Cambridge University Press. Hazlett BA (1995) Behavioral plasticity in crustacea: Why not more? J Exper Mar Biol Ecol 193: 57–66. Hirschfeld LA, Gelman SA (1994) Toward a topography of mind: An introduction to domain specificity. In: Mapping the Mind: Domain Specificity in Cognition and Culture (Hirschfeld LA, Gelman SA, eds.), 3–35. Cambridge: Cambridge University Press. Hockett CF (1960) Logical considerations in the study of animal communication. In: Animal Sounds and Communication (Lanyon WE, Tavolga WN, eds.), 392–430. Washington, D.C.: American Institute of Biological Sciences. Humphrey NK (1976) The social function of intellect. In: Growing Points in Ethology (Bateson PP, Hinde RA, eds.), 303–317. Cambridge: Cambridge University Press. Jolly A (1966) Lemur social behavior and primate intelligence. Science 153: 501–506. Kettlewell HBD (1973) The Evolution of Melanism. Oxford: Oxford University Press. 
Kier WM (1988) The arrangement and function of molluscan muscle. In: The Mollusca, Form and Function, vol. 2 (Trueman ER, Clarke MR, eds.), 211–252. New York: Academic Press. Krebs JR, Davies NB (1987) An Introduction to Behavioral Ecology, 2nd ed. Sunderland, Mass.: Sinauer. Marshall NJ, Messenger JB (1996) Colour-blind camouflage. Nature 382: 408–409. Mather JA (1986) Sand-digging in Sepia officinalis: Assessment of a cephalopod mollusc’s “fixed” behavior pattern. J Comp Psych 100: 315–320. Mather JA (1991) Navigation by spatial memory and use of visual landmarks in octopuses. J Comp Physiol A168: 491–497. Mather JA (1992) Interactions of juvenile Octopus vulgaris with scavenging and territorial fishes. Mar Behav Physiol 19: 175–182. Mather JA (1994) “Home” choice and modification by juvenile Octopus vulgaris (Mollusca: Cephalopoda): Specialized intelligence and tool use? J Zool (London) 233: 359–368. Mather JA (1995a) Influences on food intake and activity of Octopus cyanea. Final report to the National Geographic Society, Washington, D.C., December. Mather JA (1995b) Cognition in cephalopods. Adv Stud Behav 24: 316–353. Mather JA (1998) How do octopuses use their arms? J Comp Psych 112: 306–316. Mather JA (1999a) Mating games squid play. Presented at Animal Behaviour Society Annual Meeting, Lewistown, Pa., August. Mather JA (1999b) What do squid signal with a Zebra display? I. The formal challenge. Presented at the Annual Meeting of the American Malacological Society, Pittsburgh, Pa., June. Mather JA (2000a) I’m a lonely little squidlet in a sea of fish: Sepioteuthis fish avoidance strategies. Presented at Animal Behaviour Society Annual Meeting, Atlanta, August.
Mather JA (2000b) Do squid make a visual language on their skin: The case of the Zebra display. Presented at a joint BBCS/BPS meeting, Cambridge, July. Mather JA (2001) What does a female squid say with a saddle display? Presented at the Animal Behaviour Society Annual Meeting, Corvallis, Ore., July. Mather JA, Anderson RC (1999) Exploration, play and habituation in octopuses (Octopus dofleini). J Comp Psych 113: 1–6. Mather JA, Mather DL (1994) Skin colors and patterns of juvenile Octopus vulgaris in Bermuda. Vie Milieu 44: 267–272. Mather JA, O’Dor RK (1991) Foraging strategies and predation risk shape the natural history of the juvenile Octopus vulgaris. Bull Mar Sci 49: 256–269. Matlin MW, Foley HJ (1997) Sensation and Perception, 4th ed. Needham Heights, Mass.: Allyn and Bacon. Messenger JB (1974) Reflecting elements in cephalopod skin and their importance for camouflage. J Zool (London) 174: 387–395. Messenger JB (2001). Cephalopod chromatophores: Neurobiology and natural history. Biol Rev 76 (4): 473–528. Messenger JB, Wilson AP, Hedge A (1973) Some evidence for colour-blindness in Octopus. J Exper Biol 59: 77–94. Moynihan M (1975) Conservatism of displays and comparable stereotyped patterns among cephalopods. In: Function and Evolution in Behaviour. Essays in Honor of Professor Niko Tinbergen (Baerends FRS, Beer G, Manning A, eds.), 276–291. Oxford: Oxford University Press. Moynihan M (1985) Communication and Noncommunication in Cephalopods. Bloomington: Indiana University Press. Moynihan MH, Rodaniche AF (1982) The behaviour and natural history of the Caribbean reef squid Sepioteuthis sepioidea with a consideration of social, signal and defensive patterns for difficult and dangerous environments. Adv Ethol 125: 1–150. Muntz WRA (1999) Visual systems, behaviour and environments in cephalopods. In: Adaptive Mechanisms in the Ecology of Vision (Archer SN et al., eds.), 467–483. Dordrecht: Kluwer. Neisser U (1976) Cognitive Psychology. 
New York: Appleton-Century-Crofts. O’Dor RK, Webber DM (1986) The constraints on cephalopods: Why squid aren’t fish. Can J Zool 64: 1591–1605. Owings DH, Morton FS (1998) Animal Vocal Communication: A New Approach. Cambridge: Cambridge University Press. Packard A (1972) Cephalopods and fish: The limits of convergence. Biol Rev 47: 241–307. Packard A (1982) Morphogenesis of chromatophore patterns in cephalopods: Are morphological and physiological “units” the same? Malacologia 23: 193–201. Packard A (1985) Sizes and distribution of chromatophores during post-embryonic development in cephalopods. Vie Milieu 35: 285–298. Packard A (1988a) The skin of cephalopods (Coleoids): General and special adaptations. In: The Mollusca: Form and Function, vol. 11 (Wilbur KM, Clarke MR, eds.), 37–67. New York: Academic Press. Packard A (1988b) Reading your enemy right: Lessons from the octopus in the subtle art of self-defence. In: Fear and Defence (Brain PF, Parmigiani S, Blanchard RJ, Mainardi D, eds.), 24–39. London: Harwood. Packard A (1995) Organization of cephalopod chromatophore systems: A neuromuscular image-generator. In: Cephalopod Neurobiology (Abbott NJ, Williamson R, Maddock L, eds.), 331–367. Oxford: Oxford University Press. Packard A, Sanders GD (1971) Body patterns of Octopus vulgaris and maturation of the response to disturbance. Anim Behav 19: 780–790. Rowell CHF (1963) Excitatory and inhibitory pathways in the arm of Octopus. J Exper Biol 40: 257–270.
Rowell CHF (1966) Activity of interneurons in the arm of octopus in response to a tactile stimulation. J Exptl Biol 44: 589–605. Shashar N, Hanlon RT (1997) Squids (Loligo pealei and Euprymna scolopes) can exhibit polarized light patterns produced by their skin. Biol Bull 193: 207–208. Shashar N, Rutledge PS, Cronin TW (1996) Polarization vision in cuttlefish—a concealed communication channel? J Exper Biol 199: 2077–2084. Smith WJ (1990) Communication and expectations: A social process and the cognitive operations it depends upon and influences. In: Interpretation and Explanation in the Study of Animal Behavior, vol. 1, Interpretation, Intentionality, and Communication (Bekoff M, Jamieson D, eds.), 234–253. Boulder, Colo.: Westview Press. Triesman A (1986) Features and objects in visual processing. Sci Amer 254: 114–125. Wells MJ (1978) Octopus: Physiology and Behavior of an Advanced Invertebrate. London: Chapman and Hall. Wells MJ (1994) The evolution of a racing snail. In: Physiology of Cephalopod Molluscs (Pörtner HO, O’Dor RK, Macmillan DL, eds.), 1–12. Amsterdam: Overseas Publishers. Wiley RH (1983) The evolution of communication: Information and manipulation. In: Animal Behaviour, vol. 2 (Communication) (Halliday TR, Slater PJB, eds.), 156–189. Oxford: Blackwell Scientific.
V
PRIMITIVE COMMUNICATION SYSTEMS AND LANGUAGE
12
The Evolution of Language: From Signals to Symbols to System
Chris Sinha
Introduction
Human natural languages are communicative systems, and the primary use of language is to communicate. The precise nature of the relationship between the communicative functions and the systemic properties of natural languages may be disputed, but what cannot be disputed is that language is a vehicle for human communication. How different are natural languages, and their users, from other natural communication systems, and other species?
Studies of nonhuman communication systems have revealed not only the ubiquity of communication in the animal world but also unsuspected complexity in some naturally occurring systems of nonhuman communication. A now classic example is the communication system of the vervet monkeys studied by Cheney and Seyfarth (1981). These monkeys employ a system of warning calls in which each of three call types codes for the presence of a particular predator (snake, eagle, leopard). Animals hearing a call respond with behavior that is appropriate to the danger posed by the predator: hearing an eagle call, they descend from a tree; hearing a snake call, they stand and scan the ground; hearing a leopard call, they climb up in a tree.
The capacity for communicative use of elements of the lexicons, or elements corresponding to those of the lexicons, of human natural languages is certainly not unique to humans. People have communicated with domestic animals for countless generations. However, nonhuman animals can do more with human natural languages than respond to simple instructions. When raised in an environment broadly resembling the cultural and communicative settings in which human infants acquire language, bonobos (Pan paniscus) apparently can acquire extensive receptive and productive lexicons, use them combinatorially in ways which involve quite complex event characterization, and spontaneously teach such uses to their offspring (Savage-Rumbaugh and Fields, 2000). 
African gray parrots, when participating in structured communication settings, also can learn extensive vocabularies and employ them for cross-classification of objects according to different object attributes (Pepperberg, 1999, and chapter 10 in this volume). Human natural languages, and their human users, are the products of both biological and cultural evolution. If language really is unique, however, it is difficult to ascribe this uniqueness to dramatic differences between humans and other species in their genetic makeup. Humans share (on the most recent estimate) about 95 percent of their genetic material with their closest primate relatives, chimpanzees (Britten, 2002). Taken together with initial results of the human genome project, this suggests that the linguistic gulf separating the human species from other closely related species is not correlated with a
difference of orders of magnitude in the available quantity of genetic material for directly coding the language capacity. This does not mean that there is no genetic foundation for the human language capacity. It does mean that we should be cautious in ascribing differences between human languages and other natural communication systems to interspecies genetic differences alone. Language is the foundation of human societies as symbolically mediated orders and of human cultural transmission. It could be, and has been, argued that the uniqueness of language goes hand in hand with the uniqueness of culture. Culture is not, however, a uniquely human achievement. Culture can minimally be defined as the existence of intraspecies group differences in behavioral patterns and repertoires which are not directly determined by ecological circumstances (such as the availability of particular resources employed in the differing behavioral repertoires), and which are learned and transmitted across generations. On this definition, there is ample evidence of culture and cultural differences in foraging strategies, tool use, and social behaviors in chimpanzees (Whiten et al., 1999; de Waal, 2001). Such a definition will also qualify, for example, epigenetically learned intraspecies dialect differences between songbird communities as cultural and culturally transmitted behavior (Marler and Peters, 1982). Given these findings, would it be correct to conclude, as some have, that the human language capacity is, after all, not species unique? Such an argument would hold that the evident continuities we can observe between humans and nonhumans in genetic makeup, capacity for culture, and capacity to use languagelike signs communicatively justify the “gradualist” conclusion that the difference in complexity between human natural languages, and the communication systems and abilities of nonhuman animals, is nonqualitative. I will present some arguments why this is not the case. 
My argument will not focus solely or primarily upon the unique grammatical properties of human natural languages, although it is clear that these exist. My argument is rather that, in contrast to nonhuman signal systems of communication, human natural languages are symbol systems. The evolutionary transition from signal to symbol usage, and the extrasomatic, culturally driven elaboration of symbol usage into language, account for the unique complexity of human language (including grammar). This emergent complexity, I suggest, has in the course of evolution co-opted or captured a suite of cognitive capacities that are uniquely developed (but not unique) in humans. The account I offer below of the evolution of the human language capacity is neither nativist nor empiricist, but one based upon the epigenetic emergence and elaboration of symbolization. Each of these terms is technical, and all of them are disputed. Hence, I conclude this introduction by providing definitions of how I shall use the terms epigenesis (and epigenetic), emergence, elaboration, and symbolization.
Epigenesis
Contemporary theories of epigenesis in biological and psychological development build upon the pioneering accounts of Waddington (1975) and Piaget (1979). Epigenetic naturalism (Sinha, 1988) proposes a constructivist account of the interaction between genotype and somatic and extrasomatic environment in organismic development. The claim that such an interaction exists is, as such, trivial and undisputed, since everyone agrees that phenotype is codetermined by genes and environment. There are two particularly important characteristics of epigenesis that I wish to highlight here. The first is that the role of the environmental factors is constructive rather than being selective or in addition to being selective. Nativist approaches to the developmental interaction between genotype and environment stress the role of specific input either in permitting a developmental process to unfold or in parametrically selecting a particular variant of development. An example of the former would be phenomena such as “imprinting,” where an innate and fully endogenous process of development is “triggered” by an environmental event during a critical developmental window. An example of the latter would be the role hypothesized by generative linguists to be played by typological characteristics of target languages in setting parameters and thereby permitting the child noninductively to acquire the grammar of the target language (Chomsky, 2000). In neither of these cases does the environmental information add any higher level of organization to the genetically coded information. That is, the pathway along which the behavior develops, and its terminal structural complexity, are assumed already to be directly encoded in genes. By contrast, in epigenesis the developmental pathway and final structure of the behavior that develops are a consequence as much of the environmental information as of the genetically encoded information. 
For example, the development of birdsong seems to involve reproduction by imitative epigenetic learning rather than selection from among preestablished alternatives (Marler and Peters, 1982). Fledglings not exposed to a model do develop birdsong, but it is impoverished or unelaborated relative to that of individuals which develop in a normal environment where models are available. The second key characteristic of epigenesis is, accordingly, that a genetically specified developmental envelope or window specifies an initial behavioral (or perceptual) repertoire that is subsequently elaborated through experience of a relevant environment. This process of elaboration is directional (see below), and once it has taken place, the initial plasticity of the embryonic, or unelaborated, repertoire is largely (though not necessarily wholly) lost. A typical example is the development in human infancy of speech sound perception (Bohn, 2000; Kuhl, 2000; Oller, 2000), in which the “universal” initial processor is transformed into a “language-specific” processor in a process that probably is analogous with that of the development of birdsong.
We can note here that an epigenetic account of this process differs from a nativist, parameter-setting process inasmuch as no assumption is made that the human infant’s brain is innately equipped with an inventory of all possible natural language phonemes (the first characteristic, above). Equally, however, it differs from a classical learning account inasmuch as epigenesis depends upon the elaboration of an initial repertoire which itself is not learned, in a process which cannot be rerun: the initial, unelaborated capacity cannot be reaccessed after the epigenetic developmental process has taken place, as all second language learners rapidly come to realize. In other words, the process of developmental elaboration implies that in epigenetic development there is a transition from relative plasticity and informational openness to relative rigidity and informational closure. There are two other characteristics of epigenesis that are particularly relevant to human development. One is its neurobiological basis in “neural Darwinism,” the selective stabilization of synaptic connections during ontogenesis (Changeux, 1985). The other is the role of ontogenesis itself in canalizing phylogenesis, through “Baldwin effects” and genetic assimilation. Baldwin (1902) hypothesized that behavioral adaptations in individuals could track environmental changes, or increase the range and complexities of behavioral repertoires, before becoming genetically fixed (assimilated) by natural selection. The relevance of such a process to the elaboration of communication systems is obvious, and it should be noted that Baldwin supposed that such “organic selection” played an increasingly important role in the evolution of species with high degrees of neural and behavioral plasticity.
Emergence
The “emergentist” hypothesis has received considerable attention recently as an alternative (closely allied with epigenetic theories) to nativism (MacWhinney, 1999). 
I will use emergence to mean, quite widely, the development of new properties and/or levels of organization of behavioral and cognitive systems as a consequence of the operation or cooperation of simpler processes. Epigenesis is thus a special case of emergence. In this chapter, I focus on symbolization as a phylogenetically emergent property of communication, as well as upon its epigenetic development in infancy.
Elaboration
By elaboration I mean the process whereby development gives rise to increased complexity of organism, behavior, and cognition. Increase in complexity usually involves both form and function. A crucial distinction between Darwinian natural selection and epigenetic development is that the latter, but not the former, implies elaboration. In ontogenesis, some instances of elaboration are under more or less direct genetic control, others may be epigenetically driven, and still others may be emergent consequences of the elaboration of subsystems. I will not make a strong distinction between emergence (new properties) and elaboration (greater complexity), which I see as two aspects of the underlying directionality of developmental change (Valsiner, 2000). Although it is appropriate to reject teleological explanations for Darwinian evolution, and teleology is not inherent in emergence, teleology is inherent in elaboration as a directional process whose “aim” is the increase in the spatiotemporal extent of the lived and cognized environment.
Symbolization
Symbolization is the central topic of this chapter, and I shall restrict myself here to some brief remarks which I shall expand below. There is a large literature in linguistics, psychology, and semiotics on the defining characteristics of the concept of “symbol,” which I do not have the space to review in depth. Many treatments postulate arbitrariness and productivity as criterial for symbolicity. In line with much discussion of the importance of motivation in language structure (Lakoff, 1987), I consider the extent of arbitrariness in natural languages to have been overstated, although it is certainly characteristic of morphemes in spoken languages (Hurford, 1989). Arbitrariness is a bivalent relationship between symbol and symbolized, which does not adequately capture the communicative logic of symbolization. I prefer to focus on the notions of intentionality and conventionality, which directly implicate the users of symbols, and their psychological capacities and social relations; and reference, which similarly implicates the uses to which symbols lend themselves. Productivity is indeed criterial for natural languages and symbol systems, but not necessarily for symbolicity as such. 
If it were so, we should be forced to conclude that children in the early stages of language acquisition are using one-word utterances nonsymbolically, which is counterintuitive, since what they have not yet learned is the structural properties of the system, not the signifying properties of words. Symbols are contrasted, in Peirce’s semiology (Peirce, 1955), with both icons and indices. I have followed instead the binary distinction employed by Bühler (1990) between signals (which equate in most respects to Peirce’s indices) and symbols. My core hypothesis is that the epigenetic development of symbolization involves the emergence of symbol usage from communicative signal usage. Whereas a communicative signal can be viewed as an instruction (perhaps coded) to behave, the use of symbols involves two emergent properties, reference and construal. The latter are the basic functional components of the representational function of language, and the development of symbolization is essentially the process of elaborating the representational function.
Signals and Symbols
Signals and Signal Sensitivity
Sensitivity to signals is as basic a property of life as the ability to reproduce. All organisms are able to detect signals indicating (indexing) the presence of conditions hospitable to survival (including metabolism) and reproduction. The more complex the organism, the greater the range of signals to which it is sensitive, and the more complex its behaviors both in response to, and in the active search for, life-relevant signals. In the most general terms life might be defined as the possession by self-organizing systems of the dynamic and mutually influencing emergent properties of reproduction and signal sensitivity, which together provide the basic conditions for the organismic “value system.” The functional characterization of simple, noncommunicative signals is essentially identical to that of the stimulus-response link of classical learning theory, although the responsivity of the organism may be either innately determined or learned. It is diagrammed in figure 12.1.
Signals, in social animals, may also be used to communicate. The basic structure of the communicative signal is shown in figure 12.2. The communicative signal mediates between a noncommunicative signal picked up by one organism and the response produced by a second organism. The social exchange of communicative signals does not require intentionality. The sender does not have to emit the communicative signal purposively, since the signal may simply be an innate or learned response to a stimulus. The receiver does not have to direct its attention either to the sender or to the original stimulus (signal1) that causes the sender to emit the communicative signal, but only to the communicative signal emitted by the sender. The sender is not signifying or representing a “referent” for the receiver, and no mutual awareness of the cognitive viewpoint of sender and receiver is implied in the exchange.
Figure 12.1 A noncommunicative signal. [Diagram: Signal (Stimulus) → Organism → Response.]
Figure 12.2 A communicative signal. [Diagram: Signal1 (Stimulus) → Organism1 → Response1/Signal2 → Organism2 → Response2.]
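The chain diagrammed in figure 12.2 can be sketched, purely as an illustration, as two lookup tables applied in sequence. The Python sketch below is not drawn from any published model: the names `CALL_FOR_PREDATOR`, `RESPONSE_TO_CALL`, `sender`, `receiver`, and `exchange` are hypothetical, and the call and response pairings simply restate the vervet example given in the Introduction. What it is meant to show is that a coded signal exchange of this kind can run without either party representing a referent or the other's viewpoint.

```python
from typing import Optional

# Hypothetical illustration only; all names here are invented for this sketch.

# Signal1 (stimulus) -> Response1/Signal2: the sender's response to a
# predator is itself the call that the receiver picks up.
CALL_FOR_PREDATOR = {
    "snake": "snake_call",
    "eagle": "eagle_call",
    "leopard": "leopard_call",
}

# Signal2 -> Response2: the receiver's behavior on hearing each call,
# as described for vervet monkeys in the Introduction.
RESPONSE_TO_CALL = {
    "snake_call": "stand and scan the ground",
    "eagle_call": "descend from a tree",
    "leopard_call": "climb up in a tree",
}


def sender(stimulus: str) -> Optional[str]:
    """Organism1: emits a call as a fixed response to a stimulus, or nothing."""
    return CALL_FOR_PREDATOR.get(stimulus)


def receiver(call: Optional[str]) -> Optional[str]:
    """Organism2: reacts to the call alone; it has no access to the
    original stimulus or to the sender's state."""
    if call is None:
        return None
    return RESPONSE_TO_CALL.get(call)


def exchange(stimulus: str) -> Optional[str]:
    """The whole chain of figure 12.2: stimulus -> call -> response."""
    return receiver(sender(stimulus))


if __name__ == "__main__":
    for predator in ("snake", "eagle", "leopard"):
        print(predator, "->", exchange(predator))
```

Nothing in `receiver` takes the stimulus or the sender as an argument, which is the sense in which such a signal functions as a coded instruction to behave rather than as a representation of a shared referential situation.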
The communicative signal (as in the vervet monkey call) may, however, bear an arbitrary relationship to the noncommunicative signal (object, event) which triggers it. Does this mean that vervet monkey calls are symbolic in nature (as maintained, for example, by Hurford, personal communication)? Certainly, if learned, they display an embryonic conventionality as well as a primitive systematicity. Social, communicative signals thus may be in some degree systematic and coded, that is, the same communicative modality may support a variety of coded instructions, and it is even possible for them to support a simple “code-syntax.” But are they intentionally produced in order to refer to the dangers they signal? Or are they merely behavioral cues, instructions to behave in a certain way? Before exploring this issue further, let us introduce an analysis of fully developed symbolic communication based upon Karl Bühler’s (1990) organon model of language, first published in 1934.
Symbols and Symbolization
The conventionality of a true symbol rests upon the shared understanding by the communicating participants that the symbol is a token representing some referential class, and that the particular token represents a particular (aspect of a) shared situational context and, ultimately, a shared universe of discourse. Conventional symbol systems are therefore grounded in an intersubjective meaning field in which speakers represent, through
symbolic action, some segment or aspect of reality for hearers. This representational function is unique to symbolization, and is precisely what distinguishes a symbol from a signal. A signal can be regarded as a (possibly coded) instruction to behave in a certain way. A symbol, on the other hand, directs and guides not the behavior of the organism(s) receiving the signal but its (their) understanding (construal) or (minimally) its (their) attention, with respect to a shared referential situation. In this way, we can unpack and understand the concept of intentionality, widely understood to be intrinsic to symbol usage but used in several different ways. For current purposes I distinguish three meanings (or related aspects) of intentionality:
Intentionality1: Purposiveness or goal-directedness
Intentionality2: Orientation to others as “minded” beings
Intentionality3: Directedness to the world, or reference
I suggest that these different aspects of intentionality are interrelated in symbol usage, which involves the purposive use by a speaker of a symbolic sign to manipulate or direct the mental orientation (construal or, minimally, attention) of a hearer with respect to an intersubjectively shared aspect of reality (joint reference). Note that “speaker” and “hearer” should be understood as producer and interpreter, respectively, of a symbolic sign in any modality, and “reality” should be understood as any aspect of the shared universe of discourse. It is important to emphasize that symbolicity is defined here in terms of the semiotic and pragmatic logic of communicative representation, not of the specific typology, in Peirce’s sense, of the relationship between sign and object (Sinha, 1988). 
Even an indexical sign, such as simple pointing, provided it is intentionally produced in an intersubjective field of joint reference, can be regarded as a kind of “protosymbolic” communication, and the intentional and conventional production and comprehension of iconic representations such as maps clearly fall under this pragma-semiotic definition of symbolization. My claim here is that the first criterion for symbolization, or the existence of a symbolic capacity in any organism or simulated organism, is reference. It is, however, important to specify that reference, in this definition, is not a property of signs or symbols “in themselves”: symbols refer only by “inheriting” the referential function intended by their user (Sinha, 1999). Reference, however, is only the first of two criteria for fully developed, or “true,” symbolization. I will claim that joint reference is the criterial basis for the emergence of symbolization, while the second criterion, which I shall call, following Langacker (1987), construal, constitutes the set of cognitive operations which underpin the elaboration of protosymbolic joint reference into true symbolization.
Figure 12.3 Symbolic communication: a modified version of Bühler’s organon model of language. Broken lines represent joint attention. [Diagram labels: Referential Situation, Representation, Symbol, Expression, Appeal, Speaker, Hearer.]
Simple, unadorned joint reference, such as is implied by the production and comprehension of an indexical pointing gesture, serves to orient the attention of the receiver, but does not (in the general case) direct the receiver to any particular understanding or conceptualization of what is being referred to. The use of a truly symbolic sign, such as a word, however, at the very least implies a categorization of the referent, and may involve complex manipulations of perspective and figure-ground relations. This cognitive-functional analysis of symbol usage is essentially the same as that advanced by Bühler (figure 12.3).
The Emergence of Symbolization
It is possible to envisage an evolutionary scenario for the phylogenetic emergence of symbolic communication from signal communication. We may hypothesize the following four steps:
1. The receiver comes to pay attention to the sender as the source of communicative signals.
2. The sender comes to pay attention to the receiver as a recipient of communicative signals.
3. The receiver comes to pay attention to the evidential reliability of the sender’s communicative signals as a source of information, by checking what the sender is paying attention to or doing.
4. The sender comes to pay attention to the receiver’s readiness to act reliably upon the information communicated, by paying attention to what the receiver is paying attention to or doing.
The first two steps of this sequence do not involve the communicating organisms’ intersubjective “sharing” of a referential world, but they do require orientation toward, or social referencing of, a communication partner either as a source of information or as an actor whose behavior can be influenced. This level of communicative competence is probably widespread among mammals, underpinning complex signal-mediated social behaviors. Not only communication between conspecifics, but also communication between humans and domesticated or working animals such as dogs, horses, and elephants, often seems to involve an understanding on the part of the domesticated animal that the human can both send and receive signals. My young border collie, for example, brings a ball and nuzzles me with it, while looking at me, when she wants to play (an instance of step 2 above). This can be considered an elementary instance of communicative intentionality, in the sense that the dog is able to treat communication as a means of indirectly achieving goal-directed action (intentionality1).
Communicative signals, it has been suggested, may emerge from noncommunicative signals through a process of ritualization, in which the expression of an emotional-motivational state, or the initiating sequences of a social behavior, become stylized and acquire a communicative value (Huxley, 1966; Hauser, 1996). 
A communicative signal indexing a noncommunicative intention (such as a wish to engage in play, grooming, or any other social behavior) often has its origins in an initiatory segment of the behavior, which may be abbreviated or stylized in shifting its status from “just behavior” to signal. Ritualization and abbreviation are also observed in the development of dyadic mother–infant communication in humans (Lyra and Souza, in press). It is the understanding by each of the communication partners that the other can both send and receive such signals that constitutes the mastery of steps 1 and 2 above. Communication, with the achievement of steps 1 and 2, remains signal-based, but it implies the establishment of a first level of intersubjectivity, consisting of a recognition by each communication partner of the other as a communication partner, and the recognition by each partner of the other as an agent capable of acting as initiator or mediator of goal-directed action. In phylogenesis, then, the basis of intersubjectivity is (I hypothesize) constructed through the mediation of goal-directed social behaviors by signals and the understanding
Figure 12.4 Primary intersubjectivity: caretaker–neonate interaction from three weeks. [Diagram label: Controls Interaction.]
of the communicative partner as a potential agent. The ontogenesis of intersubjectivity in humans seems, however, to follow a different route: primary intersubjectivity has been claimed to be innate (Trevarthen and Hubley, 1978) (figure 12.4). Caretakers (usually mothers) and infants engage from very soon after the child’s birth in episodes of “communication” in which the bodily movements, facial expressions, and vocalizations of the two participants provide the signals necessary for the maintenance of the communicative channel or intersubjective “we” formed by the dyad. The real-time temporal meshing by the mother of her actions with those of the baby is of fundamental importance to the maintenance of intersubjectivity, indicating the emergence of a psychologically real “ontology of the social.” Whereas I suggested in the evolutionary scenario above that communicative intentions emerge phylogenetically from elaboration of the recognition and signaling of social intentions, in human ontogenesis communicative motives are innate (or at least intrinsic to the socio-emotional relation between mother and infant). Whether it is innate or not, the mutual gaze component (at least) of early mother-infant interactions appears not to be species specific, and its frequency in interaction is culturally variable in both humans and chimpanzees (Bard et al., 2002). In taking steps 3 and 4, the sender and/or receiver develops the capacity to understand that a signal indexes an intention rather than the action intended. With this, the possibility is opened for deception and suspicion regarding intentions. The most basic level of understanding of the communicative partner not just as a potential agent, but also as an experiential subject within the intersubjective field, is the ability to follow gaze, as evidenced by human infants from about six months of age (Butterworth and Jarrett, 1991) and by a number of other species (figure 12.5). 
Gaze following allows the receiver to monitor the activity and attention of the communicative partner, but not to manipulate (as sender) the attention of the partner to a specific object or referent. The ontogenetic development of this capacity has been well researched since the late twentieth century. From around nine or ten months of age, human infants
Figure 12.5 Gaze following: human infants (six months), chimps, dolphins, sheepdogs. Diagram label: Zone of Shared Attention.
begin to engage with adults in relatively extended bouts of joint attention to objects. . . . In these triadic interactions infants actively co-ordinate their visual attention to person and object, for example by looking to an adult periodically as the two of them play together with a toy, or by following the adult’s gaze. Infants also become capable at this age of intentionally communicating to adults their desire to obtain an object or to share attention to an object, usually through nonlinguistic gestures such as pointing or showing, often accompanied by gaze alternation between object and person. (Tomasello, 1996, p. 310; see also Franco and Butterworth, 1996)
The achievement of joint reference in human infancy establishes the “referential triangle” (figure 12.6), also referred to as “secondary intersubjectivity” (Trevarthen and Hubley, 1978). Spontaneous productive pointing in free-ranging nonhuman primates has been observed (Bard, 1992), although its extent and frequency in the wild are unclear. The emergence of the “referential triangle” marks the emergence of the first criterion for symbol usage, reference in the intersubjective field. From this point until about 14 months of age, infants increasingly mediate the manipulation of the field of joint attention by manipulating objects in give-and-take routines, and early in the second year of life they begin to demonstrate active mastery of the conventional or canonical usage of objects in
Figure 12.6 The referential triangle: joint reference in human infant (age nine to ten months). Diagram label: Figure in joint attention.
play situations, their usage of such objects being dominated by the cultural specification of conventional function until well into the third year of life (Sinha, 1988; Moro and Rodriguez, 1998; Sinha and Jensen de López, 2000). It seems to be a well-founded conclusion that by early in the second year of life, the basic foundations of symbolization in intersubjectivity, and in an understanding of conventionality, have been laid. At this point, we can return to the question asked above: Are vervet monkey calls symbolic, or at least protosymbolic? The basic model of the social exchange of signals, represented in figure 12.2, implicates neither intersubjectivity nor social convention. Instead, it involves simple coordination of individual organismic behavior (which may, indeed, be complex—arising, like many complex behaviors, from natural selection). This mechanism also lacks not only reference but even direct attentional coordination between the communicators. The vervet monkey system may well be more complex than this, especially inasmuch as it seems to cue visual attention as well as locomotion. The tendency to arbitrariness and systematicity displayed by the vervet monkey call system, a result of far-reaching ritualization, may serve to enhance the attentional orientation of the communicators both to each other and to the shared context of situation, facilitating the logic of emergence described above. If this is so, we should perhaps view the properties of arbitrariness and systematicity as possible prerequisites for, rather than criteria of, the emergence of symbolization from signals.
The Elaboration of Symbolization into Grammar
Fully developed symbol usage, I have argued, involves the mastery of a symbolic system with a representational function. The baseline of symbolization is, I also have argued, reference in an intersubjective field, exemplified at the most fundamental level by nonlinguistic means for sharing and manipulating joint attention. From here, it is only a relatively short step to a protosymbolic system of conventional word signs, but there still remains a long road to travel before we arrive at evolutionarily modern natural languages, with their multidimensional cognitive, grammatical, and pragmatic complexity. What drives the elaboration of symbolization into language? Does increase in structural complexity merely accompany the increasing differentiation and integration of cognitive and communicative functions, following its own autonomous developmental pathway? Or is structural elaboration motivated by, and interdependent with, functional elaboration? Since there is no consensual answer to this question in respect to the ontogenesis of language, consensus can hardly be expected in interpreting the meager evidential base for language evolution. What follows is therefore a speculative account, based upon a core thesis of cognitive-functional linguistics: that natural languages are complex, multilevel systems of mapping between linguistic conceptualization and linguistic expression (Sinha, 1999). Conceptualization in and through language involves manipulations of figure-ground relations, the adoption and shifting of perspectives, and the exploitation of the symbolic power of language to construct virtual realities, enabling speakers and hearers to share universes of discourse extending beyond the actual spatiotemporal frame of reference (Hockett, 1960; Oller, chapter 4 in this volume).
Grammar, in this view, is not confined to rules governing linguistic expression, but is the structural means for integrating conceptualization and expression in discourse contexts. The variety and power of linguistic constructions afford a rich flexibility of construal of actual and virtual referential situations. The notion of construal (Langacker, 1987) can be illustrated by example. Any referential situation which requires characterization in terms of the relationships obtaining between more than one entity may thus be characterized in more than one way. I can say, for example, that the cup is on the saucer, or that the saucer is under the cup. In the first case, the cup is the figure (or trajector), and the saucer is the ground (or landmark) in relation to which the location of the cup is specified. In the second case, these cognitive roles are reversed. Similarly, the lexicalization “father of” represents the same relationship as the lexicalization “child of,” but the two lexicalizations are from different perspectives. Construal in language also often involves the superimposition of virtual properties onto actual referential situations, as has been emphasized in the case of spatial conceptualizations by Talmy (1996), who designates usages such as “The tunnel goes from Dover to Calais” as instances of “fictive motion.”
The hypothesis that I advance is that the evolutionary elaboration of symbolization into grammar involved the construction of natural language subsystems that functionally subserve flexible construal. Probably this process of elaboration accompanied, and was led by, increasing sociocultural complexity, necessitating more complex perspectival coordinations and more complex discourse representations, including narrative representations consolidating group identity, planning of socially coordinated activity, and naming practices based upon increasingly complex kinship relations and growing social differentiation. Whatever the details, the hypothesis proposes that the emergence of grammar was not a fortunate accident, but a developmental process governed by a social cognitive logic of elaboration. Figure 12.7 diagrams the semiotic structure resulting from the elaboration of joint reference into linguistic (symbolic) conceptualization via the mastery of symbolic vehicles enabling flexible construal.
Figure 12.7 Semiotic mediation: linguistic conceptualization as symbolic construal. Diagram labels: Figure (Trajector), Ground (Landmark), Universe of Discourse, Referential Situation, Schematizes, Linguistic Expression.
Infancy, Evolution, and Culture
There is a common epigenetic logic to the phylogenetic and ontogenetic development of symbolization. The logic is one of process, from signals to the emergence and elaboration of symbols. This logic involves the following subprocesses, which significantly overlap temporally but emerge in the order below:
Intentionality, intersubjectivity, and reference
Conventionalization based on intersubjectivity
Structural elaboration yielding flexible construal.
Each of these subprocesses represents a contrast with specific cognitive characteristics of signal-based communication. Intentionality contrasts with stimulus dependence, conventionalization contrasts with (though perhaps emerges from) simple social coordination, and structural elaboration contrasts with code rigidity. The subprocesses of conventionalization and elaboration are dynamically coupled: novel, elaborated constructions are entrenched in usage, and can subsequently be recruited for further elaboration, eventuating in a “ratchet effect” (Tomasello, 1999) capable of producing potentially very rapid structural change and evolution. It should be emphasized that there is no claim in this model that ontogenesis necessarily involves, within any one of these processes, the recapitulation of stages passed through in phylogenesis. Although we can observe analogous phenomena in (for example) the communication strategies of human children and nonhuman primates, there are also many differences. We have seen, for example, that primary intersubjectivity appears to be innate (or intrinsic) in humans, and perhaps in some other primates, while it is hypothesized to be emergent in phylogenesis from the mediation by communicative signals of noncommunicative social behaviors.
Similarly, although it is plausible to draw very general analogies in terms of principles of motivation between grammaticalization processes in historical language change, and the acquisition by the child of the constructional resources of grammar, the stages and strategies characterizing each of these processes are very different (Slobin, 1997). Commonalities in developmental logic do not, therefore, imply that ontogenesis recapitulates phylogenesis. Instead, I would like to suggest that ontogenesis, and in particular the ecological niche of infancy, played a crucial role in the evolutionary development of the human symbolic capacity. Human infants, as has often been pointed out, are extraordinarily well adapted to the demands of enculturation and the acquisition of symbolic communication (Tomasello, 1999). I suggest that this is because, once established, the emergent social ontology of intersubjectivity and conventionalization sets up new parameters for the selection of context-sensitive and socially situated learning processes, rather than “content-dedicated” cognitive mechanisms. In such an evolutionary process, a major role might have been played by “Baldwin effects” that lend a teleological directionality to natural selection, mediated by the inherent teleology of the elaboration of symbolic communication.
The traditional and still dominant view of evolution and development is one in which the development of “higher” levels of organization is dependent upon prior developments in “lower” levels of organization. In particular, the priority of individual organismic properties is assumed to carry over from the level at which natural selection occurs to the level of psychological processes. Even if the existence of emergent, higher-level (sociocultural) properties is conceded, the autonomy of these levels is continually undermined by theories that reduce them to the causal properties of supposedly “more basic” levels. An alternative view, consistent with recent findings in developmental psychology and cultural primatology, proposes that an emergent sociocultural level of organization set the evolutionary stage for subsequent epigenetic development and genetic selection. This account stresses the emergence of the first foundation of symbolization and language not in individual cognition but in the quintessentially social space of intersubjectivity and normativity. It is this space that constituted the niche for the emergence of symbols from signals. Further epigenetic dynamics involved the evolution of infancy toward adaptation to intersubjective communication and conventionalized patterns of interaction, facilitating rapid ontogenetic acquisition of symbol systems.
I have proposed above a model for the further elaboration, in the context of increasing sociocultural complexity and, perhaps, group size (Dunbar, chapter 14 in this volume), of the symbol system into grammaticalized natural language, perhaps relatively recently in human evolution. It is possible, too, that the same epigenetic dynamics responsible for the adaptation of human infants to intersubjectivity produced adaptations to structurally complex symbolization. As yet, we have no decisive evidence to accept or reject hypotheses regarding the innateness of a specifically grammatical component of the human language faculty. It should, however, be emphasized that in an epigenetic perspective, any developmental predisposition for learning language is unlikely either to involve direct coding of, or to be dedicated exclusively to, linguistic structure (Mueller, 1996; Sinha, 1996).
Acknowledgments
I thank the editors and referees for their detailed and very helpful comments on an earlier version of this chapter.
References
Baldwin JM (1902) Development and Evolution. London: Macmillan.
Bard K (1992) Intentional behaviour and intentional communication in young free-ranging orangutans. Child Devel 63: 1186–1197.
Bard K, Myowa-Yamakoshi M, Quinn J, Tomonaga M, Matsuzawa T (2002) Cultural differences in mutual gaze between mother and infant chimpanzees. Paper presented to the XIXth Congress of the International Primatological Society, Beijing, August.
Bohn O-S (2000) Linguistic relativity in speech perception: An overview of the influence of language experience on the perception of speech sounds from infancy to adulthood. In: Evidence for Linguistic Relativity (Niemeier S, Dirven R, eds.). Amsterdam: John Benjamins.
Britten R (2002) Divergence between samples of chimpanzee and human DNA sequences is 5%, counting indels. Proc Natl Acad Sci USA 21: 13633–13635.
Bühler K (1990) Theory of Language: The Representational Function of Language. Amsterdam: John Benjamins. (First published 1934).
Butterworth G, Jarrett N (1991) What minds have in common is space: Spatial mechanisms serving joint visual attention in infancy. Brit J Devel Psych 9: 55–72.
Changeux J-P (1985) Neuronal Man: The Biology of Mind. Oxford: Oxford University Press.
Cheney DL, Seyfarth RM (1981) Selective forces affecting the predator alarm calls of vervet monkeys. Behaviour 76: 25–61.
Chomsky N (2000) The Architecture of Language (Mukherji N, Patnaik BN, Agnihotri RK, eds.). New Delhi: Oxford University Press.
Franco F, Butterworth G (1996) Pointing and social awareness: Declaring and requesting in the second year. J Child Lang 23: 307–336.
Hauser MD (1996) The Evolution of Communication. Cambridge, Mass.: MIT Press.
Hockett C (1960) Logical considerations in the study of animal communication. In: Animal Sounds and Communication (Lanyon WE, Tavolga WN, eds.), 392–430. Washington, D.C.: American Institute of Biological Sciences.
Hurford J (1989) Biological evolution of the Saussurian sign as a component of the language acquisition device. Lingua 77: 187–222.
Huxley J (1966) A discussion of ritualization of behaviour in animals and man. Phil Trans Roy Soc London 251: 273–284.
Kuhl P (2000) Language, mind and brain: Experience alters perception. In: The New Cognitive Neurosciences, 2nd ed. (Gazzaniga M, ed.). Cambridge, Mass.: MIT Press.
Lakoff G (1987) Women, Fire and Dangerous Things. Chicago: University of Chicago Press.
Langacker RW (1987) Foundations of Cognitive Grammar, vol. 1, Theoretical Prerequisites. Stanford, Calif.: Stanford University Press.
Lyra M, Souza M (in press) Dynamics of dialogue and emergence of self in early communication. In: Dialogicality in Development, vol. 5, Child Development in Culturally Structured Environments (Josephs IE, ed.). Stamford, Conn.: Greenwood.
MacWhinney B (ed.) (1999) The Emergence of Language. Mahwah, N.J.: Lawrence Erlbaum.
Marler P, Peters S (1982) Developmental overproduction and selective attrition: New processes in the epigenesis of birdsong. Devel Psychobiol 15: 369–378.
Moro C, Rodriguez C (1998) Towards a pragmatical conception of the object: The construction of the uses of objects by the baby in the prelinguistic period. In: Construction of Psychological Processes in Interpersonal Communication (Lyra M, Valsiner J, eds.). Stamford, Conn.: Ablex.
Mueller R-A (1996) Innateness, autonomy, universality? Neurobiological approaches to language. Behav Brain Sci 19: 611–675.
Oller DK (2000) The Emergence of the Speech Capacity. Mahwah, N.J.: Lawrence Erlbaum.
Peirce CS (1955) The Philosophical Writings of Peirce (Buchler J, ed.). New York: Dover Books.
Pepperberg I (1999) The Alex Studies. Cambridge, Mass.: Harvard University Press.
Piaget J (1979) Behaviour and Evolution. London: Routledge and Kegan Paul.
Savage-Rumbaugh ES, Fields WM (2000) Linguistic, cultural and cognitive capacities of bonobos (Pan paniscus). Culture Psych 6: 131–153.
Searle JR (1980) Minds, brains and programs. Behav Brain Sci 3: 417–424.
Sinha C (1988) Language and Representation: A Socio-Naturalistic Approach to Human Development. Hemel Hempstead: Harvester-Wheatsheaf.
Sinha C (1996) Autonomy and its discontents. Behav Brain Sci 19: 647–648.
Sinha C (1999) Grounding, mapping and acts of meaning. In: Cognitive Linguistics: Foundations, Scope and Methodology (Janssen T, Redeker G, eds.). Berlin: Mouton de Gruyter.
Sinha C, Jensen de López K (2000) Language, culture and the embodiment of spatial cognition. Cognit Ling 11: 17–41.
Slobin DI (1997) The origins of grammaticizable notions: Beyond the individual mind. In: The Crosslinguistic Study of Language Acquisition, vol. 5, Expanding the Contexts (Slobin DI, ed.). Mahwah, N.J.: Lawrence Erlbaum.
Talmy L (1996) Fictive motion in language and perception. In: Language and Space (Bloom P, Peterson M, Nadel L, Garrett M, eds.). Cambridge, Mass.: MIT Press.
Tomasello M (1996) The child’s contribution to culture: A commentary on Toomela. Culture Psych 2: 307–318.
Tomasello M (1999) The Cultural Origins of Human Cognition. Cambridge, Mass.: Harvard University Press.
Trevarthen C, Hubley P (1978) Secondary intersubjectivity: Confidence, confiding and acts of meaning in the first year. In: Action, Gesture and Symbol: The Emergence of Language (Lock A, ed.). London: Academic Press.
Valsiner J (2000) Culture and Human Development. London: Sage.
Waal F de (2001) The Ape and the Sushi Master. London: Allen Lane.
Waddington CH (1975) The Evolution of an Evolutionist. Edinburgh: Edinburgh University Press.
Whiten A, Goodall J, McGrew WC, Nishida T, Reynolds V, Sugiyama Y, Tutin CEG, Wrangham RW, Boesch C (1999) Cultures and chimpanzees. Nature 399: 682–685.
13
Cooperation and the Evolution of Symbolic Communication
Peter Gärdenfors
What Are the Evolutionary Roles of Language?
Homo sapiens is the only species with a symbolic language. According to evolutionary theory, there should be some selective advantage that has fostered the development of language among humans. There are many proposals for such an evolutionary force. Some of the major ideas have been (1) that language brings with it the ability to convey information about prey or other food or about dangers of different sorts; (2) that it is a result of sexual selection;1 (3) that language replaces the social grooming found in monkeys and apes as an instrument for building coalitions and other social bonds (the “gossip theory” proposed by Dunbar, 1996); or (4) that language is a “mother tongue” that evolved among kin for “honest” communication (Fitch, chapter 15 in this volume). However, despite all the merits of these proposals, they have problems explaining why language has not evolved among other primates or animals. I do not claim that there is a unique explanation for why language has evolved with humans. On the contrary, different aspects of language may fulfill different evolutionary needs. However, in this chapter I will propose another advantage of symbolic language that may be more important for the later stages of the evolution of communication than those previously suggested: (5) that language makes it possible to cooperate about future goals. I shall prepare the ground for this thesis by first arguing that humans are the only animals that can plan for future goals. If this is correct, then language would indeed be beyond the cognitive reach of other species. I will then argue that symbolic communication is necessary for advanced cooperation. Finally, as a paradigmatic example of communicating about future goals, I will analyze the cognitive and communicative prerequisites for different types of referential expressions.
The evolutionary gain of being able to communicate about referents that are not yet present is that more advanced forms of long-term planning become possible. However, the basis for it all is the notion of a representation. This will be the topic of the following section.
Cued and Detached Representations
In order to understand the functions of most of the higher forms of cognition, one must rely on an analysis of how animals represent various things, in particular the surrounding world and what it can offer. There is an extensive debate in the literature over what should be taken to be the appropriate meaning of “representation” in this context (see, e.g.,
Roitblat, 1982; Vauclair, 1990; Gärdenfors, 1996; Grush, 1997). Here I will not go into the intricacies of the debate, but only point out that there are different kinds of representations. In this chapter, the focus will be on the kinds of representations used in cooperative communication. A key point is that in order to give an accurate analysis of many phenomena in animal and human cognition, it is necessary to distinguish between two kinds of representations: cued and detached (Gärdenfors, 1996). A cued representation stands for something that is present in the current external situation of the representing organism. When, for example, a particular object is categorized as food, the animal will then act differently than if the same object had been categorized as a potential mate. I am not assuming that the animal is, in any sense, aware of the representation, only that there is some generalizing factor that determines its behavior. In general, the represented object need not be actually present in the actual situation, but it must have been triggered by something in a recent situation. Delayed responses in the behaviorist’s sense, according to this characterization, are also based on cued representations. In contrast, detached representations may stand for objects or events that are neither present in the current situation nor triggered by some recent situation. A memory of something that can be evoked independently of the context where the memory was created would be an example of a detached representation. Similarly, consider a chimpanzee that performs the following sequence of actions: walks away from a termite hill, breaks a twig, peels off its leaves, returns to the termite hill, and uses the stick to “fish” for termites. This behavior seems impossible to explain unless it is assumed that the chimp has a detached representation of a stick and its use. I am not claiming that it is possible to draw a sharp line between cued and detached representations. 
There are degrees of detachment. However, I still believe that the rough distinction between the two major kinds of representations is instrumental in that it directs our attention to key features of the representational forms.2 What is the main evolutionary advantage of detached representations in comparison to cued ones? In order to answer this question, I will elaborate an idea introduced by Craik: If the organism carries a “small-scale model” of external reality and of its own possible actions within its head, it is able to try out various alternatives, conclude which are the best of them, react to future situations before they arise, utilize the knowledge of past events in dealing with the present and future, and in every way to react in a much fuller, safer and more competent manner to the emergencies which face it. (1943, p. 61)
I will call this kind of “small-scale model” the “inner world.” The inner world is necessary for representing objects (like food and predators), places (where food or shelter can be found), actions (and their consequences), and so on, even when these things are not
perceptually present. The evolution of such a representational power will clearly increase the survival chances of the animal. As a tentative definition, the inner world of an animal will in this chapter be identified with the collection of all detached representations of the animal and their interrelations. It should be noted that I am not assuming that the animal is aware of its inner world, nor of the processes utilizing this construct. It seems that many animal species, in particular mammals, have inner worlds.3 For example, the searching behavior of rats is best explained if it is assumed that they have some form of “spatial maps” in their heads. Evidence for this, based on their abilities to find optimal paths in mazes, was collected by Tolman in the 1930s (Tolman, 1948). However, his results were swept under the carpet for many years because they were clear anomalies for the behaviorist paradigm.4
Anticipatory Planning
One of the main evolutionary advantages of an inner world is that it frees an animal that is seeking a solution to a problem from dangerous trial-and-error behavior. Jeannerod (1994) says that “actions are driven by an internally represented goal rather than directly by the external world.” By exploiting its inner world, the animal can simulate a number of different actions in order to “see” their consequences and evaluate them (also compare Grush, 1997; Barsalou, 1999). After these simulations, it can choose the most appropriate action to perform in the outer environment. Of course, the success of the simulations depends on how well the inner world is matched to the outer. Evolutionary selection pressures will, in the long run, result in a sufficient correspondence between the inner world and the outer world.
As the Norwegian poet Olav Haugen wrote, “Reality is a hard shore against which the wave-borne dreamer strands.” The ability to envision various actions and their consequences is a necessary requirement for an animal to be capable of planning. Following Gulz (1991, p. 46), I will use the following criterion: An animal is planning its actions if it has a representation of a goal and a start situation, and it is capable of generating a representation of a partially ordered set of actions for itself for getting from start to goal. This criterion presupposes representations of (1) goal and start situations, (2) sequences of actions, and (3) the outcomes of actions. The representations of the actions must be detached; otherwise it is not possible for the animal to choose different actions. In brief, planning presupposes an inner world. Ethologists, who study animal behavior, appear to be largely in agreement that certain animal species can plan in the sense defined here (see e.g., Ellen and Thinus-Blanc, 1987, chaps. 5, 7, 8, and 9; Gulz, 1991, pp. 58–61; and Hauser, 2002, chap. 4). Yet all examples
of planning among animals available in the ethological literature concern planning for current needs. Animals plan because they are hungry or thirsty, tired or frightened. Their motivation comes from the present state of the body. Oakley writes: Sultan, the chimpanzee observed by Kohler, was capable of improvising tools in certain situations. Tool making occurred only in the presence of a visible reward, and never without it. In the chimpanzee the mental range seems to be limited to present situations, with little conception of past or future. (1961, p. 187)
Man seems to be the only animal that can plan for future needs. We can foresee that we will be hungry tomorrow and put away some of our food; we realize that it will be cold and windy in the winter, so we build a shelter in good time. (Chimpanzees build night camps, but only for the coming night.) Gulz (1991) calls the capacity to plan for the future “anticipatory planning.” That apes and other animals are incapable of anticipatory planning is illustrated by an experiment with chimpanzees performed by Boysen and Berntson (1995). They put peanuts in two heaps of different size on a table out of reach of the apes. One ape was to point at one of the heaps; then that heap was given to the other ape, while the pointer got the one he did not point at. The result of the test was surprising. The chimpanzee repeatedly pointed at the bigger pile and was very disappointed when that pile was given to the other ape, and he himself received the smaller pile. The presence of the desired food seemed to make them incapable of imagining the near future, in which the other party received the pile that they chose and they were left with the other pile. Boysen and Berntson’s experiment clearly shows how difficult it is for chimpanzees to manage even the simplest form of planning for a future goal. Deacon (1997, p. 414) writes that the choice is difficult for the apes because the indirect solution (choosing the small pile) is overshadowed by the direct presence of a more attractive stimulus, the big pile. They cannot suppress their perception. If one performs the same kind of experiment with human children, they have no problem choosing the small pile—from the age of two years and up. They can imagine receiving the big pile when they point at the small one. When children are younger, they behave more like chimpanzees. Why is it cognitively more difficult to plan for future needs than for current ones? 
The answer has to do with the different representations that are required for the two types of planning. When planning in order to satisfy current needs, one must be able to represent actions and their consequences, and to determine the value of the consequences in relation to the needs one has at that moment. But no detached representation of that need is required. To plan for future needs, on the other hand, one must also be able to represent these potential needs (and to understand that some of them will arise).
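The contrast between the two kinds of planning can be sketched in code. The following is only an illustrative toy, not a model proposed in this chapter: the action names, the facts, and the helper `goals_for` are all hypothetical. It follows the planning criterion quoted from Gulz (a start situation, a goal, and an ordered set of actions connecting them), with the simplification that the plan is a plain sequence rather than a partial order. The point it illustrates is that a planner driven only by current needs generates no goal when the need is absent, whereas an anticipatory planner also takes a detached representation of a future need as input.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    needs: frozenset    # facts that must hold before the action
    adds: frozenset     # facts the action makes true
    removes: frozenset  # facts the action makes false


# Hypothetical toy repertoire of actions in the agent's "inner world".
ACTIONS = [
    Action("gather", frozenset(), frozenset({"holding_food"}), frozenset()),
    Action("eat", frozenset({"holding_food"}), frozenset({"fed"}),
           frozenset({"holding_food"})),
    Action("store", frozenset({"holding_food"}), frozenset({"food_cached"}),
           frozenset({"holding_food"})),
]


def plan(start, goal):
    """Simulate actions in the inner model (breadth-first search).

    Returns a shortest list of action names leading from start to a
    state that satisfies goal, or None if no such sequence exists.
    """
    start = frozenset(start)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for action in ACTIONS:
            if action.needs <= state:
                nxt = (state - action.removes) | action.adds
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [action.name]))
    return None


def goals_for(current_needs, anticipated_needs=()):
    """Translate needs into goal facts (hypothetical mapping).

    anticipated_needs stands in for detached representations of needs
    that are not felt now; a planner limited to current needs never
    receives this argument.
    """
    goal = set()
    if "hunger" in current_needs:
        goal.add("fed")
    if "hunger" in anticipated_needs:
        goal.add("food_cached")
    return frozenset(goal)


# Planning for a current need: hungry now, so find food and eat it.
print(plan(set(), goals_for({"hunger"})))
# No current need and no represented future need: nothing to plan.
print(plan(set(), goals_for(set())))
# Anticipatory planning: not hungry now, but future hunger is represented.
print(plan(set(), goals_for(set(), {"hunger"})))
```

In this sketch the search procedure is identical in both cases; what differs is only whether a representation of a not-yet-present need is available to set the goal, which is the extra representational demand described above.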
The available ethological evidence so far indicates that man is the only species with the ability to imagine future wishes and to plan and act accordingly (Gulz, 1991).5 Animals that gather food for the winter are not planning. For example, there is no evidence that the squirrel has an image of its cache or of its needs come winter. In support of this, Sjölander writes, “If you give a squirrel in a cage a standing tube with a hole at the bottom, and a nut, then the squirrel busies itself all day by putting the nut in the tube, where it falls out again, just to pick it up again, and put it back into the tube etc.” (Sjölander, 2002, p. 26; my translation).
Signals and Symbols
Humans, as well as other animals, can simulate sequences of actions in their inner worlds (Jeannerod, 1994). Such simulations are the core elements in planning activities. However, planning, including anticipatory planning, does not presume a language. There are many species that can plan in various ways but do not have any of the human linguistic capacities. Language is, in my opinion, a latecomer on the evolutionary scene. On the other hand, language presumes the existence of an intricate inner world. In order to make this clear, I will distinguish between signals and symbols. Both are tools of communication. The fundamental difference between them is that the reference of a symbol is a detached representation, while a signal refers to a cued representation. In other words, a signal refers to something in the outer environment or to the emotional state of the signaler, while a symbol refers to the inner world. (Emotions do not belong to what I call the inner world because they are not representations.) Sinha (chapter 12 in this volume) also makes the distinction between signal and symbol.
However, he has a slightly different, but compatible, view of their roles: “Whereas a communicative signal can be viewed as an instruction (perhaps coded) to behave, the use of symbols involves two emergent properties, reference and construal.” In this chapter, the role of symbols in establishing references to detached objects and in construing future goals will be in focus. Language consists of symbols—it can be used to talk about things not present in the current situation. This idea can be traced back to Hockett’s (1960) notion of “displacement.”6 Glasersfeld expresses the point as follows: [W]e can talk not only about things that are spatially or temporally remote, but also about things that have no location in space and never happen at all . . . in order to become a symbol, the sign must be detached from input. What the sign signifies, i.e., its meaning, has to be available, regardless of the contextual situation. (1977, p. 64)
With few exceptions, linguistic communication is achieved with the aid of symbols. Sjölander explains elegantly what is missing in animal communication:
Peter Gärdenfors
Clearly, if you live in the present, communicating mainly about how you feel and what you want to do in the moment, the biological signals inherent in each species are sufficient. A language is needed only to communicate your internal representation of what could be, what has been, and of those things and happenings that are not present in the vicinity. (1993, pp. 5–6)
Symbols referring to something in one person’s inner world can be used to communicate as soon as the listeners have, or are prepared to add, the corresponding references in their inner worlds.7 The actual conditions of the outer situation need not play any role for the communication to take place: Two prisoners can talk fervently about life on a sunny Pacific island in the pitch dark of their cell.

Many animals have intricate systems of signals, such as the dances of bees. However, even if their dances seem to have a kind of grammar, they still consist only of signals. The bees categorize, in a sophisticated way, places where nectar can be found. The crucial point is that they use their dances only in a cued manner, and thus the dances are not symbols according to my criterion.

In spite of all attempts to teach apes various forms of symbolic codes (see, e.g., Savage-Rumbaugh et al., 1998), humans seem to be the only animals that use language in a fully detached way. Even though the bonobo Kanzi’s performance is quite impressive, his use of symbols is dependent on the context: his productions mainly express requests to “direct teacher’s attention to places, things and activities” (Savage-Rumbaugh et al., 1985, p. 658). On one classification, 96 percent of his productions are requests. Human children, in contrast, at a very early stage use language outside the context of request, for example, in commenting or in narratives (Tomasello, 1999). Vauclair (1990, p. 319) notes that “the use of symbols by apes is closely tied to the achievement of immediate goals, because the referents occur in the context of behavior on their objects.” This is congenial with Gulz’s (1991) conclusion that only humans are anticipatory planners. My conjecture is that this capability is required for the complete detachment of language. We are still waiting for Kanzi to tell us a story by the campfire.
Cooperation and Communication by Symbols

Human beings as well as other animals cooperate in order to reach common goals. Even seemingly simple animals like ants and bees cooperate in building complex societies. However, their cooperation is instinctive: they have no detached representation of the goal their collaboration is aimed at. For lack of representations, they cannot create new goals of cooperation.8

Nevertheless, for many forms of cooperation among animals, it seems that representations are not needed. If the common goal is present in the actual environment (for example, food to be eaten or an antagonist to fight), the collaborators need not focus on a joint representation of it before acting. If, on the other hand, the goal is detached (distant in time or space), then a common representation of it must be produced before cooperative action can be taken. In other words, cooperation about detached goals requires that the inner worlds of the individuals be coordinated. It seems hard to explain how this can be done without invoking symbolic communication.

A problem concerning collaboration in order to reach a detached goal is that the value of the goal cannot be determined from the given environment, unlike a goal that is already present on the scene. The value of the future goal has to be estimated by each individual with regard to possible outcomes.

Communication by symbols is quite intricate, because the meanings of the symbols are general and defined by interrelation. As mentioned earlier, it has so far not been shown that apes can communicate in a fully symbolic way (Deacon, 1997; Tomasello, 1999). Rather, it seems that apes in their natural habitat mainly exploit indexicals in their signaling. Human language is the prototype example of a symbolic communication system. Clearly, human language paves the way for long-term cooperation and for cooperation toward future goals. As Boysen and Berntson’s (1995) experiment indicates, it may be hard to give up a good in possession for a future but more precious one.

An important feature of the use of symbols in cooperation is that they can free the cooperators from the goals that are available in the present environment. The detached goals and the means to reach them are picked out and externally shared through the linguistic medium. This kind of sharing gives humans an enormous advantage in cooperation in comparison to other species. I view this advantage as a strong evolutionary force behind the emergence of symbolic communication.
More precisely, I submit that there has been a coevolution of cooperation about future goals and symbolic communication (cf. the “ratchet effect” discussed in Tomasello, 1999, pp. 37–40). Language is based on the use of representations as stand-ins for entities, actual or imagined. Use of such representations replaces the use of environmental cues in communication. If I have an idea about a goal I wish to attain, I can use language to communicate my thoughts. In this way, language makes it possible for us to share visions. There are many kinds of visions. Some of them are about concrete goals. For instance, the chief of a village can try to convince the inhabitants that they should cooperate in digging a common well that everybody will benefit from or in building a defensive wall that will increase the security of everybody. The goal requires efforts by the members of the community, and it can have a positive net benefit for all involved. Other visions are more abstract and distant, and their potential values are hard to assess. Many religions promise a heaven after death, if you behave according to certain norms. Such a vision is attractive to many, even though it is impossible to know whether it can
be fulfilled. An eloquent leader can depict enticing goals and convince his supporters to make radical sacrifices, even though the visionary goals are extremely uncertain. The theory outlined here is compatible with the “mother tongue” hypothesis presented by Fitch (chapter 15 in this volume). Situations where common goals exist occur more frequently among kin than among non-kin. Such situations also foster “honest” communication. Therefore it is more likely that a system of symbolic references develops in a kin group than in a group of unrelated individuals. Another common point is that cooperation (rather than deception) is a crucial aspect of language evolution. However, a limitation of Fitch’s theory seems to be that it cannot explain why only humans have a symbolic language. There is nothing in the “mother tongue” hypothesis as presented by Fitch that precludes apes, for example, from having the capacity to develop a symbolic communication system. Similarly, Dunbar’s (1996) “gossip” theory builds on a correlation between the size of the cortex of different species of primates and their group size. The argument is that the larger the group, the more time and mental effort is required for social bonding. The big human brain is a result of the fact that the hominids were forced by ecological factors to live in larger groups. At a certain point in time (roughly when Homo sapiens appeared), grooming was no longer sufficient as a bonding device and language emerged as a more efficient mechanism. Dunbar writes (personal communication): “Chimpanzees have no pressure to evolve anything like language as a bonding device because they have no need to evolve larger groups than can be bonded using social grooming in the normal way.” I agree, but in order to explain why chimpanzees did not evolve language, it must also be explained why language is such a costly device, in terms of the required brain capacity. 
According to the theory outlined in this chapter, the fact that humans, but apparently no other species, can represent future goals and the inner worlds of others makes us uniquely prepared for symbolic communication. These features, which are necessary for the cooperative benefits of symbolic language, clearly presume a substantial brain capacity exceeding that of other primates.

What Aspects of the Evolution of Communication Should Be Explained First?

When an animal (or a human) communicates, it wants something from another individual (even if it is only recognition of its existence). In this sense, all communication is a sign of failure. If everybody is pleased with the situation, there is no need for communication. When communication first appears, it is the communicative act in itself and the context in which it occurs that are most important, not the expressive form of the act (Winter, 1998, intro.). As a consequence, the pragmatic aspects of language are the most fundamental from an evolutionary point of view. When communicative acts (later speech acts) in due time become more varied, and eventually conventionalized, and their contents are
detached from the immediate context, one can start analyzing the different meanings of the acts. Then semantic considerations become salient. Finally, when linguistic communication becomes even more conventionalized and combinatorially richer, certain markers, alias syntax, are used to disambiguate the communicative content when the context is not sufficient to do so. Thus syntax is required only for the subtlest aspects of communication—pragmatic and semantic features are more fundamental. This view on the evolutionary order of different linguistic functions stands in sharp contrast to mainstream contemporary linguistics. For followers of the Chomskyan school, syntax is the primary study object of linguistics; semantic features are added when grammar is not enough; and pragmatics is a wastebasket for what is left over (context, deixis, etc.). However, I believe that when the goal is to develop a theory of the evolution of communication, the converse order—pragmatics before semantics before syntax—is more appropriate. In other words, there is much to find out about the evolution of communication before we can understand the evolution of semantics and syntax. In support of the position that pragmatics is evolutionarily primary, I want to point out that most human cognitive functions had been chiseled out by evolution before the advent of language. I submit that language would not be possible without all these cognitive capacities, in particular having a theory of mind and being able to represent future goals. This position is not uncontested. Some researchers argue that human thinking cannot exist in its full sense without language (e.g., Dennett, 1991). Thus the emergence of language is seen as a cause of certain forms of thinking, such as concept formation. However, seeing language as a cause of human thinking is like seeing money as a cause of human economics (Tomasello, 1999, p. 94). Humans have been trading goods as long as they have existed. 
But when a monetary system emerges, it makes economic transactions more efficient. The same applies to language: Hominids were communicating long before they had a language, but language makes the exchange of knowledge more efficient. The analogy carries further: When money is introduced into a society, a relatively stable system of prices emerges. Similarly, when linguistic communication develops, individuals will come to share a relatively stable system of meanings (i.e., components in their inner worlds that communicators can exchange with each other). In this way, language fosters a common structure of the inner worlds of the individuals in a society. This view on the regulatory role of language gains additional support from a different direction. In a variety of computer simulations and robotic experiments (e.g., Hurford, 1999; Kirby, 1999; Steels, 1999, and chapter 5 in this volume; Kaplan, 2000), it has been shown that a stable communicative system can emerge as a result of iterated interactions between artificial agents, even though there is nobody who determines any “rules” for the communication. A general finding of the experiments is that the more “speakers” and “hearers” are involved in communication about the same outer world, the stronger the
convergence of the reference of the “words” that are used and the faster the convergence is attained. Still, different “dialects” in the simulated community often emerge.

The Evolution of Referential Expressions

I want to view semantics as conventionalized pragmatics. One important question then concerns what the cognitive structure of the semantic conventions is. Here, I believe that so-called cognitive semantics offers one part of the answer (Lakoff, 1987; Langacker, 1987). According to cognitive semantics, the meanings of words can be represented as “image schemas” in the heads of people. But a general problem for such a semantic theory is that if everybody has his or her own inner world, how can we then talk about a representation being the meaning of an expression? In other words, how can individual representations, cued or detached, become conventions? Therefore, the question in focus in this section will be how language can help us share our inner worlds.

The use of language has many facets, and it is impossible to cover all of them in this chapter. However, in cooperative communication about detached goals, a particularly important case of sharing inner worlds is to jointly refer to objects that are not present at the scene of communication. It can be an object that is distant, such as an animal to hunt or a tree containing honey, but it can also be a not yet existing object that is to be created by cooperation, such as a communal well. In contrast, indexical or deictic reference, such as pointing, is sufficient for identifying referents that are present in the environment. I will take this communicative problem as paradigmatic for an analysis of what is required for symbolic communication concerning cooperation about future goals.
In the computer simulations and robotic experiments performed by Steels and others, the typical communicative situation is a “guessing game” (Steels, chapter 5 in this volume) where the speaker, by uttering a word, tries to make the hearer identify a particular object in the environment. It should be noted that in such guessing games (as in Wittgenstein’s language games) the participants are concerned only with finding the appropriate referent among those that are on the scene. In contrast, communication about nonpresent referents, which will be in focus here, demands that the communicators have more advanced representational capacities. In this section I want to describe three stages of abstraction in the communication about referents and the establishing of the meanings of referential expressions (partly following Winter and Gärdenfors, 1998; Gärdenfors, 2000; also see Olson, 1970). At each stage I shall specify the assumptions concerning the sharing of inner worlds that it requires. The fitness variables driving the abstraction process could be the strain on memory as a cost, and efficiency of communication (in the present case, identifying a referent) as benefit.
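As a rough illustration of how a stable vocabulary can emerge from iterated interactions without anybody determining the rules, the following Python sketch implements a deliberately simplified naming game. It is not the model used by Steels or the other authors cited above; the function names, the object labels, the number of agents, and the rule that the hearer simply adopts the speaker's word are all assumptions made for this example. As in the guessing games just described, the referents are taken to be present on the scene, so the hearer can always find out which object was meant.

```python
import random


def shared_lexicon(lexicons, objects):
    """Return True when every agent uses the same word for every object."""
    for obj in objects:
        words = {lexicon.get(obj) for lexicon in lexicons}
        if len(words) != 1 or None in words:
            return False
    return True


def play_naming_game(n_agents=5, objects=("tree", "well", "deer"),
                     max_rounds=20000, seed=0):
    """Hypothetical, simplified naming game (an assumption of this sketch).

    Each round a random speaker names a random object for a random hearer.
    A speaker without a word for the object invents one; the hearer then
    adopts the speaker's word. Returns (rounds_needed, lexicons), or
    (None, lexicons) if no shared lexicon emerged within max_rounds.
    """
    rng = random.Random(seed)
    lexicons = [{} for _ in range(n_agents)]  # one private lexicon per agent
    invented = 0
    for round_no in range(1, max_rounds + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        obj = rng.choice(objects)
        if obj not in lexicons[speaker]:
            invented += 1
            lexicons[speaker][obj] = f"word{invented}"
        # The hearer aligns with the speaker, whether or not the guess matched.
        lexicons[hearer][obj] = lexicons[speaker][obj]
        if shared_lexicon(lexicons, objects):
            return round_no, lexicons
    return None, lexicons


if __name__ == "__main__":
    rounds, lexicons = play_naming_game()
    print("rounds until a shared lexicon:", rounds)
    print("shared lexicon:", lexicons[0])
```

Nothing in this sketch addresses nonpresent referents; it only shows convergence on names for objects that both agents can inspect, which is exactly the limitation of guessing games noted above.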
Names

The starting assumption is that each object that is perceived or communicated about is represented as a point in a conceptual space (as described in Gärdenfors, 2000). Conceptual spaces consist of a number of “quality dimensions” that represent various properties of the object, such as color, size, shape, texture, and sound. Conceptual spaces can be seen as providing the framework for the knowledge that is represented in the inner worlds of individuals. Different individuals may structure their spaces differently, so there may be no immediate way of comparing them. Properties of the objects may be changing, which means that the points representing them move around in the conceptual space, as indicated in figure 13.1a. Furthermore, objects come into existence and disappear, which means that points come and go in the representing space.

Now suppose each individual in a communicative dyad has his or her own set of representational points in a private conceptual space. How can we solve the paradigmatic communicative problem where the speaker wants to use symbolic language to make the hearer identify a particular object? At the lowest level of abstraction, this communicative task is achieved by names. A name picks out a particular object represented as a point in the conceptual space of an individual. In figure 13.1b, this identification is represented by circling the representation of an object. If both participants associate the same name with the same external object, then the hearer can identify the object that the speaker intends. It should be noted that the naming mechanism puts no requirement on the alignment of the conceptual spaces of the communicating individuals; it requires only that their inner worlds contain an appropriate referent for the name.

Even though this communicative mechanism in principle solves the task of identifying a common referent, it works only when both participants are acquainted with the named object and have associated the same name with it.
Furthermore, the mechanism is dependent on
Figure 13.1 (a) Points move around in the conceptual space. (b) A name singles out a unique referent.
a stable context in the sense that entities exist in the presence of the speaker and the hearer long enough for a name to be established (by deixis or some similar pragmatic mechanism). In an evolutionary setting, there are two kinds of entities that remain relatively stable and identifiable within a community: people and places. Thus one can speculate that the first stages of language contained names for people and places together with words denoting relations between such entities (Dunbar, 1996; Worden, 1996). Such a communicative system would be a protolanguage in the sense of Bickerton (1990).

Nouns

In the light of these assumptions, one should ask how objects that are not suitable for naming can be identified. To answer this question, we must enter the second level of abstraction within the set of points in a conceptual space. This level builds on a fundamental fact about the world around us: It is not random. In other words, properties of objects tend to go together. It is an interesting fact about the evolution of human thinking that, fortunately, our minds seem predisposed to detect such correlations of properties (Kornblith, 1993; Holland et al., 1995). A likely explanation of this capacity is that our perceptions of natural objects show correlations along several quality dimensions and, as a result of evolutionary pressures, we have developed a competence to detect these correlations. In conceptual spaces, correlations show up as clusters of points. Such a cluster is marked by a circle in figure 13.2. A paramount feature of clusters is that, unlike points representing single objects, they will remain stable even when objects change their properties somewhat or when new
Figure 13.2 A noun corresponds to a cluster of correlated properties.
objects come into existence or old ones disappear. Thus, clusters are much more reliable as references of words than are points representing single objects. Furthermore, even if two individuals are not acquainted with the same objects represented within a cluster, their clusters may still be sufficiently similar to be matched. For this to happen, it is sufficient that we interact with the same kinds of objects and have shared sociocultural practices. So if there is only one object from a given cluster that is salient in the cooperative context, it is sufficient that the communicators can identify the same cluster in their inner worlds for them to identify the object of collaboration. This level of abstraction thus puts some minimal constraints on the coordination of the conceptual spaces of the communicating individuals. The prime linguistic tool for referring to a cluster is a noun. Rather than referring to the entire cluster, a noun refers to a point (representing a possible object) that functions as a stand-in for the cluster. This stand-in point, a white star in figure 13.2, can be identified as the prototype of the cluster. This mechanism explains why nouns (noun phrases) have basically the same grammatical function as names. By using a noun, the speaker indicates that he or she is talking about one of the elements in the cluster, by default a prototypical element, which is often sufficient for the hearer to identify the appropriate object in the context.9 However, a fundamental difference between objects and prototypes is that there are, in principle, an infinite number of possible objects (with different combinations of properties), whereas we typically work with a small number of clusters and their representing prototypes. Focusing on nouns results in a discretization of the space (compare Petitot, 1989, p. 27).10 Such a discretization is also necessary for a finite vocabulary. The prototype need not represent any of the objects anybody has encountered. 
It is represented as a central point in the cluster associated with a noun, but no existing object needs to have its representation there. Nevertheless, since different regions of the space are correlated with different properties in other domains, the possible object represented by the prototypical point will, by default, be assigned a number of properties. For example, a bird normally is small, sings, flies, and builds nests in trees. These properties form the expectations generated by the mentioning of a noun. Among the objects represented in the conceptual space of an individual, there may be several layers of clusters, depending on how finely one wants to partition the space. However, there tends to be a privileged way of clustering the objects that will generate the basic categories in the sense of prototype theory (see, e.g., Rosch, 1978). This is the set of clusters that provides the most “economic” way of partitioning the world. What is “economic” depends, among other things, on the practices of the members of the community. Economy goes hand in hand with learnability: The basic categories are also those that are first learned by children.
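The idea that a noun refers to a prototype standing in for a cluster can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not part of the theory as presented: the two quality dimensions, the coordinates, the use of the arithmetic mean as the "central point", and the Euclidean nearest-prototype rule are all choices made for this example.

```python
from math import dist


def prototype(points):
    """Central point of a cluster: the mean along each quality dimension."""
    n = len(points)
    dimensions = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dimensions))


def nearest_noun(point, prototypes):
    """Pick the noun whose prototype lies closest to the given point."""
    return min(prototypes, key=lambda noun: dist(point, prototypes[noun]))


if __name__ == "__main__":
    # Hypothetical objects as points on two invented dimensions:
    # (body size, flying ability), both on an arbitrary 0-10 scale.
    birds = [(1, 9), (2, 8), (3, 10)]
    dogs = [(6, 1), (7, 0), (8, 2)]

    prototypes = {"bird": prototype(birds), "dog": prototype(dogs)}
    print(prototypes)

    # The prototype need not coincide with any object actually encountered.
    print(prototypes["bird"] in birds)

    # A newly encountered object is matched to the closest prototype.
    print(nearest_noun((3, 7), prototypes))
```

Because only the prototypes are stored and compared, two individuals who have met different birds can still end up with prototypes close enough to be matched, which is the stability property attributed to clusters above.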
Adjectives

Basic level nouns partition the conceptual space only in a rather coarse way. Using nouns presumes that the communicators have representations of the same clusters, which is a much less severe assumption than that they are acquainted with the same individuals. However, in some communicative contexts even this presumption delimits the communicative capacities. One example of such a context is when the speaker and hearer face a class of objects that all fall under the same noun and the speaker needs to identify one of the objects in the class, but has no name for it. There are two solutions to this referential problem.

The first is to introduce a finer level of granularity when identifying clusters. This strategy leads to the introduction of subordinate nouns (ostrich instead of bird, Volvo instead of car, etc.). The drawback, from the viewpoint of cognitive economy, is (as in the case of names) that learning a large number of subordinate nouns demands a rich memory. However, if a finer categorization helps you solve new problems, the cost of remembering additional nouns may be worth the benefits. (As a matter of fact, being an expert in an area involves having a large number of subordinate concepts, i.e., having a finely partitioned set of clusters.)

The second solution is to introduce a third level of abstraction. A fundamental strategy to distinguish points within a cluster that has been determined by correlated properties is to identify a feature that does not covary with other properties of the cluster. This is the basic mechanism for generating the dimensions of communication. For example, the color of an object often does not covary with other properties. In figure 13.3, the color dimensions are indicated (in one dimension only) by different shades of gray. Domains that are singled out by this process will be expressed by adjectives in natural language (see also Givón, 1984).
For example, to identify a particular car in a parking lot, one can say “the red car” (color domain) or “the big car” (size domain). The most useful adjectives are those that can be used with a large class of nouns, such as color or size words. In principle, adjectives can be used to refer without a noun. For example, you may use an expression such as “the red one” to identify an object that is present in the communicative context (where the noun phrase “one” serves as a placeholder for a noun).
Figure 13.3 Adjectives single out dimensions.
However, in most cases an adjective is used to give further information about a specific object. The combination of an adjective plus a noun allows you to identify a referent with a smaller burden on memory than subcategories of nouns. In elementary communication-economic terms, if you have a vocabulary with m nouns and n adjectives, you can use these m + n words to express m × n combinations. This multiplicativity of referential power does not apply to subcategories of nouns. Another aspect of communicative economy is that when you are faced with a situation where a noun covers several potential referents, you should select an adjective that picks out a maximally informative dimension within the cluster that represents the noun. Speakers are in general skilled at intuitively selecting the right dimension in a given communicative context.

These considerations show that adjectives contribute substantially to the cognitive economy of communication. The cost is that the use of adjectives presupposes that communicators share dimensions. This presupposition demands a rather strict alignment of the conceptual spaces of the communicators, which is why adjectives involve a higher level of abstraction and coordination than names and nouns. The thesis that adjectives are more abstract tools for communication than are names and nouns is supported by data from children’s language, as is witnessed by the following quotation from Smith:

[T]here is a dimensionalization of the knowledge system. . . . Children’s early word acquisitions suggest such a trend. Among the first words acquired by children are the names for basic categories – categories such as dog and chair, which seem well organized by overall similarities. Words that refer to superordinate categories (e.g., animal) are not well organized by overall similarity, and the words that refer to dimensional relations themselves (e.g., red or tall) appear to be understood relatively late. . . . (1989, p. 159)11
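The economy argument about adjectives can be checked with a few lines of Python. The sketch below is only an illustration: the word lists, the dictionary representation of objects, and the rule of choosing the dimension on which the fewest competing objects share the target's value are assumptions made for this example, not a procedure taken from the chapter.

```python
from itertools import product


def noun_phrases(adjectives, nouns):
    """All adjective-noun combinations expressible with the given words."""
    return [f"the {adj} {noun}" for adj, noun in product(adjectives, nouns)]


def most_informative_dimension(target, others):
    """Choose the dimension on which the target is least confusable.

    `target` and each item in `others` map dimension names to values.
    A dimension is better the fewer competitors share the target's value.
    """
    def confusable(dimension):
        return sum(1 for other in others
                   if other[dimension] == target[dimension])

    return min(target, key=confusable)


if __name__ == "__main__":
    nouns = ["car", "bird", "tree", "well"]   # m = 4
    adjectives = ["red", "big", "old"]        # n = 3
    phrases = noun_phrases(adjectives, nouns)
    print(len(nouns) + len(adjectives), "words give", len(phrases), "phrases")

    # Three cars in a parking lot; only color separates the target.
    target = {"color": "red", "size": "big"}
    others = [{"color": "blue", "size": "big"},
              {"color": "green", "size": "small"}]
    print(most_informative_dimension(target, others))
```

With m = 4 nouns and n = 3 adjectives, 7 stored words yield 12 referring phrases, whereas covering the same 12 cases with subordinate nouns would require 12 separate entries in memory.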
Social interactions will generate a need for representations where the dimensional structure is represented by a small number of values on each dimension. As a matter of fact, dimensional adjectives generally come in polarity pairs: heavy–light, tall–short, and so on. Freyd (1983) argues that knowledge about the world, by the fact that it is shared in a language community, imposes constraints on individual representations. She states that the structural properties of individuals’ knowledge domains have evolved because “they provide for the most efficient sharing of concepts,” and proposes that a dimensional structure with a small number of values on each dimension will be especially “shareable.” This process of creating shared meanings is continually ongoing: The interplay between individual and social structures is in eternal coevolution. The effects are magnified when communication takes place between many individuals (cf. the simulations by Steels and others).
It should also be noted that representational availability of a domain normally precedes explicit awareness of the domain. In other words, even if a domain is exploited in linguistic communication, the communicators often are not able to refer to the domain itself. Such a capacity would presume an even higher level of abstraction than the three levels discussed in this section. In support of this position, it can be noted that children learn to use color words before they can engage in abstract talk about color in general. A related phenomenon from children’s language is that adjectives that denote contrasts within one domain are often used for other domains. Thus, three- and four-year-olds confuse “high” with “tall,” “big” with “bright,” and so on (Carey, 1985). There is potentially an unlimited number of dimensions in conceptual spaces that are grounded in perception. This could be an insurmountable problem when coordinating the spaces of several individuals. However, even though the class of adjectives is open-ended, linguistic space has a limited number of dimensions. Furthermore, cooperative communication highlights the dimensions that are relevant (in a particular society). Which dimensions they are is to a large extent dependent on the practices of the society. Success in communicative tasks leads to a stabilization of the perceptual dimensions of the individuals and makes them shared in a community. Following an earlier analogy, it can be said that, like money, language is a social good. In this section I have modeled an abstraction process concerning communication about referents. The arguments suggest that common dimensional structures are likely to emerge as a consequence of the requirement that cooperation about future goals be highly dependent on shared knowledge. 
This stance on symbolic communication leads to a chicken-or-egg problem: Are conceptual spaces prerequisites for successful communication or are they emergent results of successful communication? The answer, it seems to me, is “both.” As is argued in Gärdenfors (2000), the dimensions in conceptual spaces have several origins. This section has added yet another: Communication is a catalyst for geometrically structured meanings. The analysis also indicates the semantic functions of different word classes (in contrast to traditional linguistic theory, which defines word classes in terms of syntactic features).

Conclusion: Cooperation Begat Language

Recent literature on animal cognition has, to a large extent, focused on social complexity and sophistication. As a litmus test, the deceptive capacities of different species (e.g., Whiten and Byrne, 1988; Byrne, 1995) have been studied, often in terms of so-called Machiavellian intelligence (Whiten and Byrne, 1997). This tendency has spilled over into the debate on the evolution of human cognition. However, a general conclusion to be
drawn from this chapter is that, as regards the human species, the development of advanced forms of cooperation is more important when explaining the evolution of language. Advanced cooperation demands access to detached representations and the capacity to communicate about such representations. Therefore, the efficiency of communication about a detached goal will be a bottleneck in changing the strategic situation of the group. The core argument of this chapter is that without the aid of symbolic communication, we would not be able to share visions about the future. We need it in order to convince each other that a future goal is worth striving for. The key question for cooperation on the basis of symbolic communication is thus how we communicate the detached representations of our inner worlds. In my opinion, the emergence of sharable conceptual spaces provides the first steps of an answer. I believe that the benefits of advanced cooperation are so extensive that they are the major evolutionary forces behind the emergence of symbolic language. In this sense, cooperation begets language. The theory presented in this chapter also explains why only humans have language. Being able to cooperate about future goals requires detached representations of goals as well as a theory of mind. As far as we know, both these cognitive capacities are uniquely human.
Acknowledgments
I want to thank the participants in the Altenberg conference The Evolution of Communication Systems, in particular Robin Dunbar, Ulrike Griebel, Kim Oller, and Tecumseh Fitch, as well as Ingar Brinck and David de Léon for their helpful comments.
Notes
1. The first version of a sexual selection theory was proposed by Darwin (1896, p. 87).
2. Another caveat concerning my use of the notion of representation is that I am not making any ontological claims: I am not proposing that representations are entities with some kind of reality status. Rather, I view representations as theoretical terms, in the way standardly conceived of in philosophy of science (e.g., Sneed, 1971). Representations are theoretical idealizations, similar to “forces” in Newtonian mechanics, that are introduced to predict and explain empirical generalizations (cf. Lachman and Lachman, 1982).
3. Animals with inner worlds correspond well to what Dennett (1996) calls Popperian beings (in contrast to Skinnerian beings, who learn by trial and error and conditioning).
4. Vauclair (1987) provides a more recent analysis of the notion of a “cognitive mapping.”
5. Byrne (1995, pp. 119–120) presents a case where a group of old male chimpanzees cornered a mother leopard and a cub in her narrow breeding cave. One brave chimp went into the cave and emerged with the cub, which was then bitten and kneaded until it was dying. Byrne writes that “in the absence of any immediate reward, their behaviour cannot be explained as conventional animal learning. . . . Perhaps the ‘least implausible’ explanation
is that the chimpanzees had an understanding of the likely effects of their actions in the future.” I don’t agree that this is the least implausible explanation. It is natural that chimps show more or less innate aggressive behavior against leopards, old and young, and if there was no danger, they would be happy to eliminate any leopard they encountered. In the observed case, directly attacking the mother would be too dangerous, but bringing the cub out of the cave turned out to be possible thanks to a foolhardy chimp. In my opinion, the event can thus be explained without any reference to the chimps imagining the future consequences of their hunt, and hence it is not a case of anticipatory planning.
6. However, my claim that symbols refer to detached representations in an inner world is not exactly the same as Hockett’s criterion. The reason is that he includes the following under “displacement”: “Any delay between the reception of a stimulus and the appearance of the response means that the former has been coded into a stable spatial array, which endures at least until it is read off in the response” (Hockett, 1960, p. 417). His description has a clear behavioristic ring to it, and it means that every signal that is not an immediate reaction to a stimulus would be counted as an example of “displacement” according to his criterion. There are, however, many examples of signals that derive from perceptions where there may be longer or shorter delays before the signal is emitted. This does not entail that the signal has any symbolic function whatsoever.
7. For a model theoretic account of how such communication can be established, see Gärdenfors (1993). A special case of the process will be discussed in the section “The Evolution of Referential Expressions.”
8. This section is based on material from Brinck and Gärdenfors (2003).
9. Some further aspects of referential communication, in particular the relevance of contrast classes, are treated in Winter and Gärdenfors (1998).
10. This process is related to the phenomenon of categorical perception.
11. Also see Smith and Sera (1992, p. 132).
References
Barsalou LW (1999) Perceptual symbol systems. Behav Brain Sci 22: 577–609.
Bickerton D (1990) Language and Species. Chicago: University of Chicago Press.
Boysen S, Berntson G (1995) Responses to quantity: Perceptual versus cognitive mechanisms in chimpanzees (Pan troglodytes). J Exper Psych Anim Behav Proc 21: 82–86.
Brinck I, Gärdenfors P (2003) Co-operation and communication in apes and humans. Mind and Language 18: 484–501.
Byrne R (1995) The Thinking Ape: Evolutionary Origins of Intelligence. Oxford: Oxford University Press.
Carey S (1985) Conceptual Change in Childhood. Cambridge, Mass.: MIT Press.
Craik K (1943) The Nature of Explanation. Cambridge: Cambridge University Press.
Darwin C (1896) The Descent of Man and Selection in Relation to Sex. London: William Cloves.
Deacon TW (1997) The Symbolic Species: The Co-evolution of Language and the Brain. New York: Norton.
Dennett D (1991) Consciousness Explained. Boston: Little, Brown.
Dennett D (1996) Kinds of Minds. New York: Basic Books.
Donald M (1991) Origins of the Modern Mind. Cambridge, Mass.: Harvard University Press.
Dunbar R (1996) Grooming, Gossip and the Evolution of Language. Cambridge, Mass.: Harvard University Press.
Ellen P, Thinus-Blanc C (eds.) (1987) Cognitive Processes and Spatial Orientation in Animal and Man, vol. 1, Experimental Animal Psychology and Ethology. Dordrecht: Martinus Nijhoff.
Freyd J (1983) Shareability: The social psychology of epistemology. Cognit Sci 7: 191–210.
Gärdenfors P (1993) The emergence of meaning. Ling Phil 16: 285–309.
Gärdenfors P (1996) Cued and detached representations in animal cognition. Behav Proc 36: 263–273.
Gärdenfors P (2000) Conceptual Spaces: The Geometry of Thought. Cambridge, Mass.: MIT Press.
Givón T (1984) Syntax: A Functional-Typological Introduction, vol. 1. Amsterdam: John Benjamins.
Glasersfeld E (1977) Linguistic communication: Theory and definition. In: Language Learning by a Chimpanzee: The LANA Project (Rumbaugh DM, ed.), 55–71. New York: Academic Press.
Grush R (1997) The architecture of representation. Phil Psych 10: 5–23.
Gulz A (1991) The Planning of Action as a Cognitive and Biological Phenomenon. Lund University Cognitive Studies 2. Lund.
Hauser M (2002) Wild Minds: What Animals Really Think. London: Allen Lane.
Hockett C (1960) Logical considerations in the study of animal communication. In: Animal Sounds and Communication (Lanyon WE, Tavolga WN, eds.), 392–430. Washington, DC: American Institute of Biological Sciences.
Holland JH, Holyoak KJ, Nisbett RE, Thagard PR (1995) Induction: Processes of Inference, Learning, and Discovery. Cambridge, Mass.: MIT Press.
Hurford J (1999) The evolution of language and languages. In: The Evolution of Culture (Dunbar R, Knight C, Power C, eds.), 173–193. Edinburgh: Edinburgh University Press.
Jeannerod M (1994) The representing brain, neural correlates of motor intention and imagery. Behav Brain Sci 17: 187–202.
Kaplan F (2000) L’émergence d’un lexique dans une population d’agents autonomes. Ph.D. thesis, Laboratoire d’Informatique de Paris 6.
Kirby S (1999) Function, Selection and Innateness: The Emergence of Language Universals. Oxford: Oxford University Press.
Kornblith H (1993) Inductive Inference and Its Natural Ground: An Essay in Naturalistic Epistemology. Cambridge, Mass.: MIT Press.
Lachman R, Lachman JL (1982) Memory representations in animals: Some metatheoretical issues. Behav Brain Sci 5: 380–381.
Lakoff G (1987) Women, Fire, and Dangerous Things. Chicago: University of Chicago Press.
Langacker RW (1987) Foundations of Cognitive Grammar, vol. 1. Stanford, Calif.: Stanford University Press.
Oakley KP (1961) On man’s use of fire, with comments on tool-making and hunting. In: Social Life of Early Man (Washburn SL, ed.), 176–193. Chicago: Aldine.
Olson DR (1970) Language and thought: Aspects of a cognitive theory of semantics. Psych Rev 77: 257–273.
Petitot J (1989) Morphodynamics and the categorical perception of phonological units. Theoretical Ling 15: 25–71.
Roitblat HL (1982) The meaning of representation in animal memory. Behav Brain Sci 5: 353–372.
Rosch E (1978) Prototype classification and logical classification: The two systems. In: New Trends in Cognitive Representation: Challenges to Piaget’s Theory (Scholnik E, ed.), 73–86. Hillsdale, N.J.: Lawrence Erlbaum.
Savage-Rumbaugh ES, Rumbaugh DM, McDonald K (1985) Language learning in two species of apes. Neurosci Biobehav Rev 9: 653–665.
Savage-Rumbaugh ES, Shanker SG, Taylor TJ (1998) Apes, Language and the Human Mind. Oxford: Oxford University Press.
Sjölander S (1993) Some cognitive breakthroughs in the evolution of cognition and consciousness, and their impact on the biology of language. Evol Cognit 3: 1–10.
Sjölander S (2002) Naturens Budbärare. Nora, Sweden: Nya Doxa.
Smith LB (1989) From global similarities to kinds of similarities: The construction of dimensions in development. In: Similarity and Analogical Reasoning (Vosniadou S, Ortony A, eds.), 146–178. Cambridge: Cambridge University Press.
Smith LB, Sera MD (1992) A developmental analysis of the polar structure of dimensions. Cognit Psych 24: 99–142.
Sneed J (1971) The Logical Structure of Mathematical Physics. Dordrecht: Reidel.
Steels L (1999) The Talking Heads Experiment. Antwerp: Laboratorium.
Tolman EC (1948) Cognitive maps in rats and men. Psych Rev 55: 189–208.
Tomasello M (1999) The Cultural Origins of Human Cognition. Cambridge, Mass.: Harvard University Press.
Vauclair J (1987) A comparative approach to cognitive mapping. In: Cognitive Processes and Spatial Orientation in Animal and Man, vol. 1, Experimental Animal Psychology and Ethology (Ellen P, Thinus-Blanc C, eds.), 89–96. Dordrecht: Martinus Nijhoff.
Vauclair J (1990) Primate cognition: From representation to language. In: Language and Intelligence in Monkeys and Apes (Parker ST, Gibson KR, eds.), 312–329. Cambridge: Cambridge University Press.
Von Glasersfeld E (1977) Linguistic communication: Theory and definition. In: Language Learning by a Chimpanzee: The LANA Project (Rumbaugh DM, ed.), 55–71. New York: Academic Press.
Whiten A, Byrne RW (1988) Tactical deception in primates. Behav Brain Sci 11: 233–273.
Whiten A, Byrne RW (eds.) (1997) Machiavellian Intelligence II: Evaluations and Extensions. Cambridge: Cambridge University Press.
Winter S (1998) Expectations and Linguistic Meaning. Lund University Cognitive Studies 71. Lund.
Winter S, Gärdenfors P (1998) Evolving Social Constraints on Individual Conceptual Representations. Lund University Cognitive Studies 69. Lund.
Worden RP (1996) Primate social intelligence. Cognit Sci 20: 579–616.
14
Language, Music, and Laughter in Evolutionary Perspective
R. I. M. Dunbar
Speech (and thus language) is unique to modern humans. The lack of comparative cases makes its origins and the selective forces favoring its evolution difficult to determine with any reliability. The result has been a plethora of rather speculative suggestions about the origins of language. Among these, for example, has been the suggestion that language evolved as a by-product of gestural forms of communication. However, since the 1990s, there have been a number of attempts to examine this problem in a more concerted way. Nowak and colleagues (Nowak and Krakauer, 1999; Nowak et al., 1999; Nowak and Komarova, 2001), for example, have developed mathematical models that attempt to explore the conditions under which referential utterances (nouns, verbs) and grammar (noun + verb complexes) might have evolved as mechanisms for facilitating cooperative relationships among members of social groups. It seems that the conditions under which this might occur are relatively benign: quite modest improvements in the reliability of communication (i.e., the extent to which transmission errors are minimized) are sufficient to promote these aspects of language in a social context.
Similarly, a strong case has been made for the suggestion that the principal selective advantage for the evolution of language was social rather than environmental or technical (Dunbar, 1993, 1996). This argument rests on the claim that language evolved to supplement (and ultimately largely to replace) grooming as the principal mechanism for social bonding within the later hominid lineage, once group sizes had begun to exceed those that could be sustained by the more conventional primate mechanism of social grooming. Although it has been possible to adduce a significant amount of evidence in support of this claim, there remain a number of anomalies that require explanation. 
One of these is the nature of the transition from social grooming to the earliest forms of language: Was this a step transition or did it, instead, evolve along some kind of continuum via intermediate phases? Another anomaly is the fact that the bonding mechanism among nonhuman primates (social grooming) seems to have a strong pharmacological component: Grooming is an extremely effective stimulus for the release of endogenous opioids (Keverne et al., 1989), as well as other endocrines (e.g., oxytocin: Uvnäs-Moberg, 1998) that seem to act as the primary reinforcers for affiliative social interaction. Although we don’t really understand how it works, this pharmacological underpinning for grooming seems to be crucial in facilitating social bonding, perhaps because it creates a sense of pharmacological “warmth” that facilitates intimacy and trust. If the conventional primate bonding process is of this kind, how does language bridge the pharmacological gap? My aim in this chapter is to address the first of these anomalies and then to suggest a plausible candidate for the second. First, however, I will briefly review the way social
grooming works to bond the social groups of nonhuman primates and the reasons why it has been suggested that language evolved to fill the same function in the hominid lineage.
Social Grooming as a Bonding Mechanism
Primates are highly social animals and, by comparison with all other taxa, devote an unusually large proportion of their day to social grooming. There is evidence from a number of field studies to suggest that grooming frequencies are correlated with willingness to provide coalitionary support (vervets: Seyfarth and Cheney, 1984; gelada: Dunbar, 1980, 1989), though some have questioned the generality of this claim (Henzi and Barrett, 1999). Irrespective of the fine details in this respect, it seems that the amount of time that Old World monkey and ape species devote to social grooming is correlated with social group size (Dunbar, 1991). It is difficult to escape the conclusion that social grooming is intimately involved in the creation and maintenance of coalitions, and that the effectiveness with which such coalitionary relationships work is a more or less linear function of the amount of grooming time that has been invested in the relationship.
We have next to no idea as to why grooming enables such relationships to be formed. One suggestion, however, is that grooming allows animals to relax in each other’s company. Grooming is an especially intense activity, requiring great concentration on the part of the groomer, and there is some (albeit questionable) evidence that it is among the most energetically expensive of all activities (Coelho, 1974). It is quite clear, however, that the effect of grooming on the groomee is physiologically relaxing: heart rate and behavioral measures of anxiety (e.g., scratching) decline when an animal is groomed (Goosen, 1981). Indeed, in some species, animals being groomed often fall asleep and need to be prompted into taking their turn to groom the partner. 
More important, there is solid experimental evidence that grooming releases endorphins (endogenous opioids): Animals that have been groomed have higher β-endorphin titers than those which have not (Keverne et al., 1989). In addition, Keverne et al. showed that animals that have been given opioids show a significantly reduced interest in grooming, whereas those that have been given opiate blockers (such as naloxone) exhibit increased tenseness and a desire to be groomed. However, endorphins may not be the only endocrines involved. There is a considerable body of experimental evidence to implicate oxytocin. Oxytocin appears to have many of the same kinds of psychopharmacological effects as endorphins, and its release seems to be triggered by many of the same kinds of stimuli (Carter, 1998; Uvnäs-Moberg, 1998). At present, it remains unclear which of these neuroendocrines is the operative mechanism generating feelings of well-being during intense social interactions, or, indeed, whether
they are both involved in some kind of endocrine cascade. There is experimental evidence to suggest that oxytocin injections may trigger the release of endorphins (Petersson et al., 1996). Between them, these findings suggest that grooming acts in such a way as to make animals feel more relaxed in the company of their regular grooming partners. One possible implication of this is that the pharmacological effects of grooming create the basis for the commitment (in a loose sense, trust?) between individuals that serves to make subsequent coalitionary support possible. Animals that feel more emotionally committed to each other are more likely to come to each other’s aid when one of them is under attack.
Social Bonding and the Evolution of Language
We have been able to show that in primates as a whole, social group size is a function of relative neocortex size (Dunbar, 1992a), probably because there is an information-processing constraint on the number of relationships that can be held in mind at any one time. This seems to be not so much a problem about memory capacity (i.e., the number of names that can be put to faces) as a problem about the manipulation of the animal’s knowledge about the state of a given (dyadic, perhaps even triadic) relationship, and how this is updated in light of the continuous flow of information about social events that an animal receives. We have used this relationship to predict the size of human groups, given the relative size of the human neocortex. The predicted value of about 150 turns out to be a common value among humans for groups that are composed of individuals with a particular relationship to each other (Dunbar, 1993). This relationship seems to reflect a degree of intimate personal knowledge between network members, perhaps associated with a level of trust and a sense of obligation. Hill and Dunbar (2003) used Christmas card lists as a means of estimating social network size in a sample of British households. 
The habit of sending Christmas cards to those who are considered important to oneself is a deeply embedded feature of British culture; almost everybody does it. It represents a once-a-year event when meaningful relationships are reinforced. The mean recipient group size (as indexed by the number of coresiding individuals in recipient households) was 153.5 ± 84.5. This is very close to the estimates of social network size based on “small world” experiments (~135: Killworth et al., 1984); the size of the smallest independent military units in modern armies (120–220: MacDonald, 1955); the mean size of hunter-gatherer clans/communities (~155: Dunbar, 1993); and the size at which Hutterites insist on splitting their communities (130–150: Muncy, 1973), among many other examples. It is important to appreciate just what is meant by “group” in this context. Essentially, it is the number of individuals with whom one has a specific relationship that is definable
in terms of the depth of knowledge about each individual, the relationships within which these individuals are embedded, and the levels of dependability (and trust?) that hold within those relationships. In effect, in human terms it is the set of individuals whom I know well enough to feel comfortable with them in a social context: I do not have to introduce myself or explain who I am, since all that is known. I may have to do some catching up on the details of recent history, but I know just how everyone in that set relates to the others and to me. In effect, I know that there is a sufficient basis for a relationship. This is not necessarily identical to the number of individuals I live with or the number of individuals I know by sight. Although, fortunately for the original analyses, these two definitions are more or less synonymous in primates (actual day-to-day group size does coincide with the limit on the number of relationships an individual has), this is not so for all species. Chimpanzees, spider monkeys, gelada and hamadryas baboons, as well as human hunter-gatherers, live in social systems that are characterized by considerable structural fluidity (so-called fission-fusion societies). Human hunter-gatherers, for example, commonly live in small (30–50 individuals) foraging camps. However, most such fission-fusion social systems are structured in a hierarchically inclusive way: the foraging units (such as hunter-gatherers’ camps) are joined into a more inclusive (but still, in terms of membership, exclusive) grouping (usually referred to as a clan or a community). Chimpanzee foraging parties (typically 1–15 individuals in size) consist only of individuals who belong to the same “community” (typically 50–100 animals); hunter-gatherers normally make camps and hunt only with members of the same “clan” (typically 100–200 individuals in size). 
This is not to say that strangers are never welcomed into a camp or foraging party of hunter-gatherers or chimpanzees: they clearly are, but the quality of the relationship with these individuals is very different (and perceived as being very different) from the relationships one has with those with whom one habitually lives. In all cases, it is these larger, more inclusive groupings that correspond to the size of group predicted by neocortex size for a particular species. The smaller groupings are ecological units into which the larger groupings are forced to disperse by environmental conditions; so far as we know, they have no implicit cognitive standing. Given that the natural “cognitive” groups of humans are of this order, it raises an interesting question: How do modern humans bond these large groups? The first step is to ask how much time would need to be invested in grooming if they did so in the conventional catarrhine primate manner. The equation for grooming time as a function of social group size in Old World monkeys and apes would predict that humans living in groups of 150 would need to spend about 43 percent of their total day grooming each other in order to bond these groups effectively (and hence ensure their cohesion and effectiveness through time). We know from studies of wild baboon populations, for example, that failure to
devote enough time to social grooming leads to significantly elevated rates of group fission and social fragmentation (Dunbar, 1992b), so these relationships have real ecological and demographic force. So large a proportion of time devoted to grooming, however, has serious implications for animals’ abilities to meet their nutritional requirements if they have to forage for food in the conventional way. Indeed, the predicted value of 43 percent is more than double the highest value yet recorded for time spent in social grooming by any primate group (Dunbar, 1991). Hence, we can conclude that if humans were to bond such large social groups effectively, they must use a bonding mechanism that is qualitatively different from that used by other primates—at least in the sense that it uses time more effectively than grooming does. Given that there appears to be an upper limit on the amount of time that primates can spend engaged in social interaction if they are to meet the other demands on their time, this bonding mechanism must be roughly twice as effective as primate grooming in its use of time. The suggestion, then, is that language evolved to bridge this time budgeting gap. This suggestion is reinforced by evidence from a variety of contemporary cultures that the mean amount of time devoted to social interaction (mainly conversation) by humans is exactly 20 percent of total available time (Dunbar, 2000)—in other words, exactly the limiting value found in wild primates. In effect, humans push what social time primates have available to its limit, but use that time more efficiently in terms of social bonding. 
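The time-budget argument above comes down to simple arithmetic, which the following minimal sketch makes explicit. It uses only the two figures quoted in the text (a predicted grooming requirement of about 43 percent for groups of 150, and a ceiling of about 20 percent on social time); the function and constant names are hypothetical, chosen here for illustration, and are not part of the original analysis.

```python
# Figures quoted in the text; all names below are illustrative assumptions.
PREDICTED_GROOMING_PCT = 43.0  # predicted grooming time for groups of about 150
SOCIAL_TIME_LIMIT_PCT = 20.0   # approximate ceiling on social time in primates


def required_efficiency(predicted_pct: float, limit_pct: float) -> float:
    """Return how many times more bonding per unit of time a mechanism must
    deliver than grooming for the predicted demand to fit within the limit."""
    if limit_pct <= 0:
        raise ValueError("limit_pct must be positive")
    return predicted_pct / limit_pct


def time_gap(predicted_pct: float, limit_pct: float) -> float:
    """Return the percentage points of the day that grooming alone cannot cover."""
    return max(0.0, predicted_pct - limit_pct)


if __name__ == "__main__":
    factor = required_efficiency(PREDICTED_GROOMING_PCT, SOCIAL_TIME_LIMIT_PCT)
    gap = time_gap(PREDICTED_GROOMING_PCT, SOCIAL_TIME_LIMIT_PCT)
    print(f"required efficiency factor: {factor:.2f}")
    print(f"unmet time budget: {gap:.0f} percentage points")
```

With the quoted figures the factor comes out a little above two, which is the sense in which the bonding mechanism must be "roughly twice as effective" as grooming.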
In principle, language is a good candidate for this role because it has three major advantages over conventional grooming as a mechanism for facilitating the building of relationships between individuals: (1) we can speak while engaged in other activities, such as feeding or walking (something that is not possible for grooming); (2) it allows a speaker to interact (“groom”) with several individuals at once (whereas grooming is a strictly one-on-one activity, even among modern humans); and (3) it allows an individual to obtain information about social events that it does not see (primates being limited in this respect to what they see with their own eyes).
Music and the Origins of Speech
Aiello and Dunbar (1993) used the relationships between neocortex volume, group size, and grooming time to explore the question of when language might have evolved within the human lineage. Using relationships between cranial volume and brain component volumes to map these relationships onto the hominid fossil record, they showed that group size and grooming time increased only gradually prior to one million years ago. After this point, grooming time requirement increased at an exponential rate, reaching its peak with modern humans and the Neanderthals. In order to determine the point at which language
would have been essential as a bonding device, it is necessary only to identify the threshold beyond which there would not have been sufficient time to bond the social group using conventional primate grooming. Since they were comparing two alternative views of when language might have evolved that had been the subject of intense debate in the literature (250,000 years ago versus 50,000 years ago), Aiello and Dunbar (1993) argued that the earlier date was the more likely, since grooming time requirements had already reached modern human levels (i.e., 40–45 percent) well before 50,000 years. More detailed consideration of the data, however, suggests that a date in the order of 500,000 years would be more realistic (Barrett et al., 2002). A date of around 500,000 years would have corresponded to a grooming time requirement approaching 30 percent of total daytime, which probably represents something of a Rubicon in terms of the extent to which time budgets could possibly be squeezed by increasing grooming to compensate for larger group sizes. My concern here is not with the exact date at which speech arose (though there is now additional anatomical evidence pointing to a date in the vicinity of 500,000 years: Kay et al., 1998; MacLarnon and Hewitt, 1999), but rather with the fact that, however one looks at it, there is a significant time gap between the point at which hominid grooming time requirements exceeded the 20 percent limit seen in modern primates (around two million years ago) and the likely earliest date for the emergence of speech and language (sometime around half a million years ago). More important, the anatomical evidence from hominid cranial volumes at no point suggests that there was any kind of step transition during this period, no Rubicon clearly separating a postlinguistic period from a prelinguistic past in hominid history. 
Rather, the evidence suggests that during this long critical period, group size and grooming time requirements rose steadily but inexorably on an exponential trend. This raises the question of what might have bridged the gap between the limiting value for grooming and the bonding investment time required at any given point. Aiello and Dunbar (1993) suggested that vocal exchanges analogous to those already seen in the contact calling of Old World monkeys and apes were initially added to more conventional grooming to achieve the desired investment. Contact calling exchanges are, in some sense, a natural candidate for this, since they are already used by a number of primate species (gelada, baboons, bonobos) for exactly this purpose, a form of grooming-at-a-distance. More important, perhaps, vocal exchanges have the advantage that they can be used while engaged in other activities (e.g., feeding or travel). In other words, contact calling exchanges allow the bonding process to be extended without necessarily taking time away from other essential activities. Aiello and Dunbar (1993) thus suggested a multistage sequence for the evolution of language in which, initially, grooming time was increased
as group size rose to the point at which time demands necessitated the addition of increasing amounts of chorusing to bridge the gap; eventually, once the upper limit at which chorusing could be used for these purposes had been breached (at perhaps 30 percent bonding time), speech and language would have evolved, although at this stage their form may well have been limited to the exchange of social information. Although there would have been no further increases in group size to demand further expansion in communication capacity after this point, a qualitative shift in how language was used may well have occurred in association with the Upper Paleolithic revolution of 50,000 years ago: This would not have been associated with any further changes in hardware, but rather with changes in how language was used: specifically, a shift from relatively simple exchanges of social information to exchange about more extensive cultural matters (ideas about worlds beyond the physical, philosophical justifications for ethics, religion, and political organization).
I suggest that the contact calling exchanges identified as bridging the gap between the upper limits of grooming and the first appearance of speech/language in fact developed into musical chorusing at quite an early stage in hominid evolutionary history, certainly by one million years ago, when Homo erectus group sizes would have required around 25 percent or more of daytime to be devoted to social bonding (i.e., around 5 percent more than the maximum that modern primates seem capable of supporting). Music has several key advantages that make it potentially an excellent prelinguistic bonding agent. First, its principal neural foci appear to be in the right hemisphere, suggesting that its origins and evolution have little to do with the evolution of speech as such (which is localized mainly in the left hemisphere). 
Second, we experience music as a deeply emotional phenomenon (the so-called tingle factor of music: Justin and Sloboda, 2001). Third, we clearly get most out of music precisely when we do it as a communal activity: Communal singing seems to have the same kinds of endorphin release that give grooming its pharmacological reinforcement (on the latter point, see Keverne et al., 1989). Hence, rather than seeing language entirely as an all-or-nothing affair that sprang spontaneously and uniquely from nowhere during the course of (presumably late) hominid evolution, we should see it as the culmination of a process of increasing diversification of social bonding mechanisms based on natural forms of communication. Its appearance as a means of communication (and, indeed, bonding device) may have been relatively quick even by the standards of hominid evolutionary history (though, even so, still strung out over a period on the order of tens of thousands of years), but it seems likely that it built on—and developed—anatomical and neural mechanisms (those required to support vocal singing) that had been developing increasing levels of sophistication over very much longer periods of time.
Laughter and Smiling as Bonding Agents

While language appears to solve rather neatly the general requirements for bonding large numbers of individuals, there remain important gaps in the argument. Language seems suited to its role in that it allows time-sharing of social and other activities, while at the same time making a wider information network possible (we can talk to more people at once than we can groom). At one level, of course, language might be seen as fulfilling at least one role that grooming has: the fact that, at base, it is simply a statement of intent or commitment (“I would rather be sitting here grooming with you than over there with Jemima”). In other words, it is the time investment and not the particular kind of interaction that is important for bonding. Thus language seems to solve the bonding problem at a cognitive level quite effectively. There remains, however, one crucial issue that this argument overlooks: the fact that, irrespective of its cognitive features, language fails to address one of the crucial mechanisms that seems to allow grooming to facilitate bonding. In primates, the bonding process has a distinctive emotional component in the form of the pharmacological kick associated with the release of endogenous opioids (Keverne et al., 1989) that makes grooming pleasurable and reinforces (in some way that we do not at present fully understand) the social relationships involved. Where in language-based interactions is the equivalent reinforcer? It might be possible to argue that, in addition to the simple time commitment involved, interactions through language produce the same kinds of emotional uplift that are generated by grooming: that sense of emotional frisson that comes through interacting with a particular individual. It is easy to see this in the context of a new (and perhaps especially a desirable) relationship, but it is not at all clear that this is enough to provide the basis for a continuing relationship.
If the opioid effect is integral to the process of bonding in primates, then something else is needed to provide the proximate basis for an enduring relationship in humans. It seems to me that the obvious (and perhaps the only) candidates are smiling and laughter. Laughter and smiling are used under broadly similar conditions during human communication. Both are strictly social in the sense that they are only rarely performed when alone (though this may be less true for smiling), and both are deliberately elicited in listeners by speakers (Provine, 2000). However, there is evidence to suggest that these two actions derive from different evolutionary origins. Van Hooff (1972) and Preuschoft (1995) have argued convincingly that laughter is morphologically related to the facial expressions given by chimpanzees during play (“round open-mouthed face,” or ROM, sometimes also known as the “play face”), whereas smiling is morphologically related to facial expressions associated with submissive behavior (“silent bared teeth face” or “fear grimace,” SBT).
It is worth noting in this respect that ROM is often associated with pantlike vocalizations (not unlike those given during laughter), whereas SBT is, as its name indicates, largely silent (or given in association with submissive fear calls). Waller (2001) has shown that these two facial expressions (and their associated vocalizations) are given under very different circumstances by chimpanzees. SBT is given in a wide range of social contexts (but, relative to ROM, especially those associated with aggression and/or fear) and is usually associated with a subsequent increase in either aggressive or affinitive interactions. In contrast, ROM is given primarily during play, with play bouts being significantly longer following bidirectional ROM than when ROM is not given or is given by only one interactant. SBT thus seems to have overtones of fearfulness, whereas ROM is associated with relaxed, often boisterous social encounters. The burden of these findings, then, is that, in nonhuman primates, the homologue of human smiling is a fear-submissive behavior, whereas that of laughter is a proactive social behavior intended to stimulate and/or encourage continued interaction. This seems to agree with the circumstances under which smiling and laughter are seen in humans: smiling has always been perceived as having overtones of nervousness (the tense or nervous smile), and is often associated with interactions with unfamiliar or more dominant individuals: Strangers and subordinates smile more than familiars and dominants (Coser, 1960). This being so, it would seem likely that even though smiling and laughter are often seen together, smiling has more to do with submissive behavior than with the kinds of situations that invite laughter. This being so, we can probably discount smiling as a possible bonding mechanism. In contrast, laughter seems like a more plausible candidate as a bonding mechanism. 
A possible relationship between endorphins and laughter has been mooted for some time. The basis for such a claim is principally casual observation that bouts of intense laughter leave us not just weak and breathless but also flooded with feelings of warmth and wellbeing. The suspicion that the latter sensations are in some way related directly to the effort of laughing is hard to escape, especially given our understanding that opioid production by the brain is readily triggered by many aspects of psychological and physiological stress (Dunbar, 1985). Routinized circuit training involving intense physical exertion on a daily basis, for example, raises serum β-endorphin and met-enkephalin levels in normal, healthy young women (Howlett et al., 1984). Attempts to demonstrate a direct causal relationship between laughter and endorphin production have met with only limited success. Berk et al. (1989) showed comedy videos to subjects and measured endogenous opioid titers in cerebrospinal fluid before and afterward. Although there was a slight increase in opioid levels after watching the video, the increase was not significant. Unfortunately, the procedures used in the experiments were themselves stressful: cerebrospinal fluid can be obtained only by spinal lumbar puncture,
and this procedure is both painful and prone to leave the patient with aftereffects (25 percent of the recipients of a lumbar puncture experience infection, nausea, vomiting, or leakage). Such a procedure is likely to raise opioid levels even before the experiment has begun. In addition, Berk et al. failed to record whether or not subjects actually laughed at the material on the video: Laughter is a social phenomenon, and people who watch humorous videos alone notoriously laugh a great deal less than those who watch them in company (Provine, 2000). However, a number of studies have yielded indirect evidence of an opiate effect from laughter. Most of these have relied on the assumption that opioid release serves to raise pain thresholds. Cogan et al. (1987), for example, used a standard blood pressure cuff to assess the pain tolerance of subjects who experienced four alternative conditions: an audiotape of a laughter-inducing comedy show, a relaxation tape, a neutral narrative, or silence, each presented for 20 minutes before testing with the cuff. They found that subjects from the first two conditions (laughter and relaxation) were able to endure higher cuff inflations than those subjected to the second two conditions (neutral story or silence). In this study, subjects were included in the laughter condition only if they had in fact laughed out loud. Zillman et al. (1993) compared pressure cuff tolerance before and after subjects watched different types of videos (including comedy, instructional, drama, or tragedy), and found that comedy and tragedy produced significant increases in pain threshold, but the other two did not. They suggested that the heightened emotional arousal generated by a tragedy (compared with a simple drama) may have been important here. Finally, Hudak et al. (1991) found that tolerance of a painful electrical stimulus applied to the hand was greater for subjects watching a comedy video than for those watching a documentary. 
In contrast, Nevo et al. (1993) found no difference in how long subjects could keep their hand in ice water between those who watched a comedy and those who watched a documentary. Rotton and Shats (1996) found no differences in self-reports of pain and discomfort between patients who had been allowed to watch comedy films after major orthopedic surgery compared with those who had been allowed to watch other kinds of films, although the comedy group did request fewer analgesics. Although these various studies have produced some evidence suggesting that laughter might be associated with endorphin release, all fall afoul of the problem that subjects were working in isolation during the experiment. In order to try to circumvent this problem, we used a similar design except that subjects were tested in groups of three to six. Eight females and ten males who stated on a pre-experiment questionnaire that they habitually laughed out loud when watching comedies on TV (the experimental group) were shown a comedy video for 15 minutes, while another group of eight females and ten males watched a video of a news, science, or documentary TV program (the control condition).
Pain tolerance before and after exposure was assessed by determining how long subjects could keep a “Rapid” wine-cooling sleeve (−16°C when frozen) on their arm. Subjects were asked to say when the cooler became too uncomfortable to continue (subject to a maximum of 180 seconds to prevent skin damage). Time spent laughing while viewing the video was recorded for each subject, using an instantaneous scan sample taken at 30-second intervals (30 scans per 15-minute viewing session). The results (figure 14.1) show that while control group subjects tended to be willing to keep the cooling sleeve on for less time on the second (postvideo) occasion (ratio of postvideo to prevideo tolerance below 1.0), those in the experimental group (who viewed a comedy video) tended to show an increase in pain tolerance (ratio of scores greater than 1.0). The difference between the control and experimental groups is significant (ANOVA, data normalized by ln-transformation: F(1, 34) = 17.86, P < 0.001). Since subjects did not differ in their pretest pain tolerance (means of 39.0 and 43.8 seconds for the control and experimental groups, respectively: data ln-transformed, F(1, 34) = 0.041, P = 0.842), this difference is unlikely to be due to any differences in how subjects were assigned to the two groups. More interesting, perhaps, when all subjects were pooled together, the data suggested a positive relationship between time spent laughing during video viewing and the ratio of pain thresholds (regression on ln-transformed data: r² = 0.316, t(34) = 3.96, P < 0.001). On balance, then, there is some convincing (if indirect) evidence to suggest that laughter generates opiate release. This being so, laughter might supply the missing pharmacological bonding agent for humans. This would provide both a proximate reward
Figure 14.1 Difference in pain tolerance between prevideo and postvideo conditions (indexed as the ratio of duration for which the wine-cooling sleeve could be kept on the arm) for control subjects (who watched a documentary video) and experimental subjects (who watched a comedy video), plotted against the number of scan samples (taken at 30-second intervals) during which they were laughing. Axes: difference (secs) against scans laughing. Source: Stowe (2000)
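The measures used in this experiment (the postvideo-to-prevideo tolerance ratio, its ln-transformation, the 180-second ceiling, and the scan-sample count) can be illustrated with a minimal Python sketch. The function names and the example durations are hypothetical and are not taken from Stowe (2000); only the 180-second ceiling, the 30-second scan interval, and the 15-minute session come from the text.

```python
import math

MAX_SECONDS = 180  # ceiling stated in the text (to prevent skin damage)


def cap_tolerance(seconds, ceiling=MAX_SECONDS):
    """Apply the maximum duration for which the sleeve could be worn."""
    return min(seconds, ceiling)


def tolerance_ratio(pre_seconds, post_seconds):
    """Postvideo / prevideo tolerance; above 1.0 means tolerance rose."""
    if pre_seconds <= 0:
        raise ValueError("prevideo tolerance must be positive")
    return cap_tolerance(post_seconds) / cap_tolerance(pre_seconds)


def ln_ratio(pre_seconds, post_seconds):
    """ln-transformed ratio: positive for an increase, negative for a decrease."""
    return math.log(tolerance_ratio(pre_seconds, post_seconds))


def scans_per_session(session_minutes, interval_seconds):
    """Number of instantaneous scan samples in one viewing session."""
    return (session_minutes * 60) // interval_seconds


def scans_laughing(scan_record):
    """Count the scans on which the subject was laughing.

    scan_record is a hypothetical list of booleans, one per scan.
    """
    return sum(1 for laughing in scan_record if laughing)


# Hypothetical subject: 40 s before the video, 60 s after it.
print(tolerance_ratio(40, 60), ln_ratio(40, 60))
print(scans_per_session(15, 30))
```

On this indexing, a control subject who tolerated the sleeve for less time after the video would have a ratio below 1.0 and a negative ln-ratio, while a subject whose tolerance rose would have a ratio above 1.0.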
for engaging in conversation and a mechanism for facilitating the social bonding of individuals. Such an argument would then explain a number of curious features of human conversational behavior that are otherwise difficult to explain because they seem to lack any functional relevance. These include (1) the fact that joking behavior is extremely common (and universal in all human cultures), even though it would seem to be functionally completely pointless (at least on the conventional view that language evolved to facilitate the exchange of technical knowledge); (2) the fact that we devote a great deal of effort during conversation to trying to stimulate laughter in those with whom we are interacting; and (3) the fact that we clearly find conversations with those who do not respond to our jokes or who fail to smile or laugh during conversations extremely hard going (and likely to precipitate an urgent desire to find a more congenial conversation partner). One implication of this is that conversations which involve a lot of laughter last longer, involve more individuals, and create a greater sense of well-being in the participants than those which involve little or no laughter. In an attempt to test the first prediction, Seepersand (1999) sampled 50 conversations between pairs of adults in public places (bars, cafés) for up to 30 minutes, recording the number of laughs of different kinds given in each minute interval and the main topic of conversation during that interval. The results show that with the exception of some notoriously humorless topics (politics, religion, and technical instruction), the amount of time devoted to a topic in 50 dyadic conversations is a more or less linear function of the number of laughs recorded (figure 14.2). 
More important, the length of time for which the pair continued to discuss a given topic was significantly greater following intervals when at least one of them laughed than following intervals that did not include a laugh (figure 14.3; Mann-Whitney test, z = -3.96, N = 54.75, P < 0.001). Laughter is, of course, a complex phenomenon, and it has been noted that speakers often laugh more than their listeners (Provine, 2000). Laughter thus might function simply as a socially facilitated opioid-release self-stimulator rather than as a mechanism for triggering opioid release in others. However, irrespective of whether or not this is the case, it is clear that we devote a great deal of effort during conversations to deliberately trying to make our listeners laugh (for example, by telling jokes). Moreover, laughter is a contagiously social phenomenon: People can be made to laugh merely by hearing someone else laughing, even when they have no idea why that person is laughing (Provine, 2000). Laughter may make the speaker feel better (perhaps even more relaxed), but it is equally noticeable that failure to elicit a response is not an ideal basis for a continuing relationship. Conversations with someone who does not laugh are hard to sustain, and they are even harder to sustain if we spend much of our speaking time laughing uproariously— nothing is quite as discouraging as laughing without eliciting laughter in response. The
Figure 14.2 Total number of minutes devoted to different topics by 50 conversational pairs (each sampled for approximately 30 minutes), plotted against the total number of laughs recorded while discussing that topic. Aside from those previously identified, the topics included celebrities, colleagues, personal concerns, family, friends, food, health, holidays, immediate surroundings, other people in the venue, leisure, objects, personal matters, personal experiences, personal relationships, future plans, shopping, sport, work. Axes: time on topic (mins) against number of laughs, both on logarithmic scales; the labeled points are Politics, Technical, and Religion. Source: Seepersand (1999)
issue, it seems, is to get others to laugh, and whether we do this by telling them jokes while maintaining a deadpan face or by socially facilitating laughter by laughing ourselves may not matter too much. As many children have found, playing the fool is as good a way of gaining acceptance by one’s peers as any. This emphasis on the use of jokes to stimulate laughter raises an interesting cognitive dimension. Joking is a sophisticated linguistic phenomenon, requiring advanced cognitive abilities (minimally second-order intentionality, but probably higher; see note 1). Jokes commonly depend on creating surprise either by the outcome or through a play on words (by exploiting our ability to comprehend metaphor or double meanings). This may explain why humans possess very much more sophisticated social cognition than monkeys and apes: Whereas chimpanzees can, at best, only aspire to second-order intentionality, humans can habitually cope with fourth-order intentionality (Kinderman et al., 1998; Dunbar, 1998,
Figure 14.3 Number of minutes for which conversational pairs continued to discuss the same topic following a point at which at least one of them laughs (solid bars) or no laugh had occurred in the previous five minutes (open bars). Sample is 33 dyadic conversations, each of 20–30 minutes duration. Axes: percent against time on topic (mins). Source: Seepersand (1999)
2002). Two additional orders of intentionality hardly seem necessary just for parsing speech and comprehending the factual meaning of utterances. However, if added to this is something additional in terms of understanding metaphor and hidden meaning, then demand for extra processing costs may be more plausible. It is not clear, however, whether joking is cause or consequence of humans’ advanced social cognitive abilities: The causal arrow can run in either direction. In this respect, it may be important not to confuse those features of laughter that are primitive (and were responsible for its evolution) with those that are derivative (having been exploited subsequent to its evolution). One final point perhaps requires comment. Laughter should not be seen as a mechanism for bonding groups of 150 individuals. Grooming does not function in this way among primates. Primates use grooming to reinforce those special relationships that lie at the core of their social systems. Kudo and Dunbar (2001) showed that grooming networks among primates (including humans) are very small (1–4 across a range of primate species, 12–15 in humans), and that these correlate both with neocortex size and with total group size. My interpretation of these findings is that the coalitions that grooming relationships make possible are crucial to the animals’ ability to live in groups because they provide the basis for buffering the individual against the stresses created by living in a large group. The larger the group, the larger the coalition (or grooming network) that is necessary to make that possible.
Conclusion

I have made three central claims. First, language evolved to allow humans to bond larger groups than they would otherwise be able to do using the more conventional mechanism of social grooming that underpins the social groups of our primate cousins. Language overcomes the time constraints that exist with grooming by increasing the broadcast network and by allowing us to acquire information on what is going on within our social network. Second, language was probably preceded by a (possibly lengthy) period in which music (in the form of communal singing) was used to extend grooming time into nonsocial activities, during which time the neural controls required for speech were laid down. Third, speech per se lacks some of the key proximate reinforcers that make social grooming an effective bonding agent (even in modern humans), namely, the psychopharmacological effects of endorphin (and perhaps oxytocin) release. I have suggested that the classic primate play invitation face (“round open mouth” display) and its associated vocalizations were elaborated to fill this need, presumably because of their natural endorphin-releasing properties. Joking developed as a linguistic means of stimulating laughter (and hence endorphin release) to provide the pharmacological reinforcer needed for servicing relationships.

Acknowledgments

I am grateful to Don Owings, Chuck Snowdon, and the editors for their helpful comments on the manuscript.

Note

1. Intentionality refers to the reflexive ability to understand states of mind (Dennett, 1981). Second-order intentionality (also known as theory of mind, or mind reading) is the ability to understand that someone else has a belief/desire/intention.
References

Aiello LC, Dunbar RIM (1993) Neocortex size, group size and the evolution of language. Curr Anthropol 34: 184–193.
Barrett L, Dunbar RIM, Lycett JE (2002) Human Evolutionary Psychology. Basingstoke, Hampshire: Palgrave/Macmillan, and Princeton, N.J.: Princeton University Press.
Berk LS, Tan SA, Fry WF, Napier BJ, Lee JW, Hubbard RW, Lewis JE, Eby WC (1989) Neuroendocrine and stress hormone changes during mirthful laughter. Amer J Med Sci 298: 390–396.
Carter CS (1998) Neuroendocrine perspectives on social attachment and love. Psychoneuroendocrinology 23: 779–818.
Coelho AM (1974) Socio-bioenergetics and sexual dimorphism in primates. Primates 15: 263–269.
Cogan R, Cogan D, Waltz W, McCue M (1987) Effects of laughter and relaxation on discomfort thresholds. J Behav Med 10: 139–144.
Coser R (1960) Laughter among colleagues: A study of the social functions of humor among the staff of a mental hospital. Psychiatry 23: 81–95.
Dennett D (1981) Intentional systems in cognitive ethology: The “Panglossian Paradigm” defended. Behav Brain Sci 6: 343–390.
Dunbar RIM (1980) Determinants and evolutionary consequences of dominance among female gelada baboons. Behav Ecol Sociobiol 7: 253–265.
Dunbar RIM (1985) Stress is a good contraceptive. New Sci 105: 16–18.
Dunbar RIM (1989) Reproductive strategies of female gelada baboons. In: Sociobiology of Sexual and Reproductive Strategies (Rasa A, Vogel C, Voland E, eds.), 74–92. London: Chapman and Hall.
Dunbar RIM (1991) Functional significance of social grooming in primates. Fol Primatol 57: 121–131.
Dunbar RIM (1992a) Neocortex size as a constraint on group size in primates. J Hum Evol 22: 469–493.
Dunbar RIM (1992b) Time: A hidden constraint on the behavioural ecology of baboons. Behav Ecol Sociobiol 31: 35–49.
Dunbar RIM (1993) Coevolution of neocortex size, group size and language in humans. Behav Brain Sci 16: 681–735.
Dunbar RIM (1996) Grooming, Gossip and the Evolution of Language. Cambridge, Mass.: Harvard University Press.
Dunbar RIM (1998) Theory of mind and the evolution of language. In: Approaches to the Evolution of Language (Hurford J, Studdart-Kennedy M, Knight C, eds.), 92–110. Cambridge: Cambridge University Press.
Dunbar RIM (2000) On the origin of the human mind. In: The Evolution of Mind (Carruthers P, Chamberlain A, eds.), 238–253. Cambridge: Cambridge University Press.
Dunbar RIM (2002) Why are apes so smart? In: Primate Life Histories (Perreira M, Kappeler P, eds.), Cambridge, Mass.: MIT Press.
Goosen C (1981) On the function of allogrooming in Old World monkeys. In: Primate Behaviour and Sociobiology (Chiarelli AB, Corruccini RS, eds.), 110–120. Berlin: Springer-Verlag.
Henzi SP, Barrett L (1999) The value of grooming to female primates. Primates 40: 47–59.
Hill RA, Dunbar RIM (2003) Social network size in humans. Hum Nat 14: 53–72.
Hooff JARAM van (1972) A comparative approach to the phylogeny of laughter and smiling. In: Nonverbal Communication (Hinde RA, ed.), 209–241. Cambridge: Cambridge University Press.
Howlett TA, Tomlin S, Ngahfoong L, Rees LH, Bullen BA, Skrinar GS, MacArthur JW (1984) Release of β-endorphin and met-enkephalin during exercise in normal women: Response to training. Brit Med J 288: 1950–1952.
Hudak DA, Dale A, Hudak MA, DeGood DE (1991) Effects of humorous stimuli and sense of humour on discomfort. Psych Rept 69: 779–786.
Justin PN, Sloboda JA (eds.) (2001) Music and Emotion. Oxford: Oxford University Press.
Kay RF, Cartmill M, Balow M (1998) The hypoglossal canal and the origin of human vocal behavior. Proc Nat Acad Sci USA 95: 5417–5419.
Keverne EB, Martensz N, Tuite B (1989) Beta-endorphin concentrations in cerebrospinal fluid of monkeys are influenced by grooming relationships. Psychoneuroendocrinology 14: 155–161.
Killworth PD, Bernard HP, McCarthy C (1984) Measuring patterns of acquaintanceship. Curr Anthropol 25: 385–397.
Kinderman P, Dunbar RIM, Bentall RP (1998) Theory-of-mind deficits and causal attributions. Brit J Psych 89: 191–204.
Kudo H, Dunbar RIM (2001) Neocortex size and social network size in primates. Anim Behav 62: 711–722.
MacDonald CB (1955) Company. In: Encyclopaedia Britannica, 14th ed. London: Encyclopaedia Britannica.
MacLarnon A, Hewitt G (1999) The evolution of human speech: The role of enhanced breathing control. Amer J Phys Anthropol 109: 341–363.
Muncy RL (1973) Sex and Marriage in Utopian Communities: 19th Century America. Bloomington: Indiana University Press.
Nevo O, Keinan G, Teshimovsky-Arditi M (1993) Humor and pain tolerance. Humor 6: 71–88.
Nowak MA, Komarova NL (2001) Towards an evolutionary theory of language. Trends Cognit Sci 5: 288–295.
Nowak MA, Krakauer DC (1999) The evolution of language. Proc Nat Acad Sci USA 96: 8028–8033.
Nowak MA, Krakauer DC, Dress A (1999) An error limit for the evolution of language. Proc Roy Soc London B266: 2131–2136.
Petersson M, Alster P, Lundeberg T, Unväs-Moberg K (1996) Oxytocin increases nociceptive pain threshold in a long-term perspective in female and male rats. Neuroscience 212: 87–90.
Preuschoft S (1995) “Laughter” and “smiling” in macaques: An evolutionary perspective. Ph.D. thesis, University of Utrecht.
Provine RR (2000) Laughter: A Scientific Investigation. London: Faber and Faber.
Rotton L, Shats M (1996) Effect of state humor, expectancies and choice on postsurgical mood and self-medication: A field experiment. J Appl Soc Psych 26: 1775–1794.
Seepersand F (1999) Laughter and language in evolution. M.Sc. thesis, University of Liverpool.
Seyfarth RM, Cheney DL (1984) Grooming, alliances and reciprocal altruism in vervet monkeys. Nature 308: 541–543.
Stowe J (2000) Investigation into the possible influence of laughter on endorphin release reflected through pain tolerance. M.Sc. thesis, University of Liverpool.
Unväs-Moberg K (1998) Oxytocin may mediate the benefits of positive social interaction and emotions. Psychoneuroendocrinology 23: 819–835.
Waller B (2001) Are there differential behavioural effects of “smiling” and “laughing” in chimpanzees (Pan troglodytes). M.Sc. thesis, University of Liverpool.
Zillman D, Rockwell S, Schweitzer K, Sundar S (1993) Does humor facilitate coping with physical discomfort? Motiv Emot 17: 1–21.
15
Kin Selection and “Mother Tongues”: A Neglected Component in Language Evolution
W. Tecumseh Fitch

In a famous passage, J. B. S. Haldane (1955) conveyed the seed of the idea of kin selection when he acknowledged the selective advantage of saving, at risk to his own life, drowning brothers or cousins, but not more distant relatives. In an odd turn for so insightful a biologist, he then concluded that it was highly unlikely for such logic to explain known examples of altruism. Haldane’s reasoning was logical: In a large population the average relatedness would be much smaller than the one in ten risk of drowning, and it would indeed be unprofitable, genetically speaking, to jump to the aid of a randomly chosen individual. What is peculiar is that Haldane overlooked the fact that a “gift” of altruism, if bestowed selectively to closely related kin, could easily be selected for, thus leaving it to Hamilton (1964a, b) to comprehend and formalize inclusive-fitness theory, and make the most significant contribution to evolutionary theory since Darwin. Indeed, such “altruistic” acts need only satisfy Hamilton’s famous inequality Br > C (the Benefit to kin, as diluted by their fraction of relatedness r, must exceed the Cost to self) for selection to favor the action. Perhaps Haldane’s reluctance to acknowledge this possibility was influenced by his own rather unusual experience of twice saving a drowning individual, reportedly giving no conscious thought to relatedness. The unsavory implications of doing otherwise were resurrected by the term “nepotism” associated with early experimental work on kin selection in the 1970s (e.g., Sherman, 1977). Perhaps distaste for nepotism partially accounts for the fact that kin selection has only slowly begun to be integrated into mainstream theory on the evolution of communication (Grafen, 1979; Maynard Smith, 1978, 1994; Johnstone and Grafen, 1992a; Godfray, 1991; Bergstrom and Lachmann, 1998b).
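Hamilton's inequality can be made concrete with a short Python sketch. The function names, the unit benefit (one life saved), and the cost of 0.1 (standing in for the one-in-ten risk of drowning mentioned above) are illustrative assumptions. The relatedness coefficients are the standard values for full siblings, first cousins, and second cousins.

```python
def inclusive_fitness_gain(benefit, relatedness, cost):
    """Net inclusive-fitness effect of an act: B * r - C."""
    return benefit * relatedness - cost


def hamilton_favors(benefit, relatedness, cost):
    """True when Hamilton's inequality B * r > C is satisfied."""
    return benefit * relatedness > cost


# Standard coefficients of relatedness for diploid relatives.
RELATEDNESS = {
    "brother": 1 / 2,
    "first cousin": 1 / 8,
    "second cousin": 1 / 32,
}

BENEFIT = 1.0  # assumed: one life saved
COST = 0.1     # assumed: a one-in-ten risk to the rescuer

for relative, r in RELATEDNESS.items():
    verdict = "favored" if hamilton_favors(BENEFIT, r, COST) else "not favored"
    print(f"rescuing a {relative} (r = {r}): {verdict}")
```

Under these assumed values the rule favors rescuing a brother or a first cousin but not a second cousin, which matches the pattern in Haldane's passage: close kin clear the threshold and more distant relatives do not.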
My aim in this chapter is to help move this process of integration forward, particularly in the context of the evolution of human language, where the intersection between kin selection and communication theory appears to have interesting unexplored implications. I will refer to systems of communication that have evolved in a context of kin selection as “mother tongues,” and will argue that such systems have very attractive theoretical properties for the evolution of rich communication systems like spoken language. Namely, mother tongues can be selected for accurate or “honest” communication (because senders and receivers often have each other’s best “genetic interests” at heart), and for semantic complexity (exchange of detailed information being thus valuable by increasing inclusive fitness to a theoretical limit set only by the complexity of senders’ and receivers’ mental structures). I will suggest that these dual virtues allow mother tongues to evade the evolutionary traps of constant Machiavellian deceit or wasteful Zahavian handicaps, which frequently bedevil communication systems among nonkin.
In this chapter I provide a brief introduction to some relevant evolutionary theory, review known examples of kin-specific communication in nature, and consider some of the selective forces proposed by previous scholars to underlie the evolution of human language. Then I will advance the argument for mother tongues. Kin selection has led to many of the honest low-cost signals believed to represent mother tongues in animals, including squirrel alarm calls, primate grunts, cat purring, signature whistles in dolphins, and many others. Despite their ubiquity, such systems appear to lack the structural complexity that would be necessary to convey arbitrarily complex thought, as human language does. Paradoxically, the most structurally complex communication systems in nature, other than human language, are probably the learned passerine bird and humpback whale “songs,” which as far as we know convey no propositional information at all. Such sexually selected systems are typically confined to males and appear at puberty, unlike human languages, which appear during infancy in both sexes. However, the massive transmission of information that occurs during the extended learning periods of childhood in many higher vertebrates can select for vocal systems which aid this exchange even slightly. I suggest that the combination of honest communication among kin and highly complex structure was adaptive in the context of the exchange of detailed information among kin (especially parents and offspring) over the extended human childhood, which accounts for both the precocity and the sexual equality of language. The mother tongue hypothesis bypasses many of the problems currently plaguing neo-Darwinian theories of the evolution of language, identifies numerous relevant points of contact between human language and communication systems in other animals, and makes a number of testable predictions. 
Background: Kin Selection, Signaling Theory, and the Evolution of Language
Altruism and Kin Selection
The notion of kin selection and the theory of inclusive fitness represent the most important contribution to evolutionary theory in the 20th century, and provide an explanation for “altruistic” phenomena, such as sociality in insects, that deeply troubled Darwin. The central notion, as intuited by Haldane (1955), is that fitness is not simply a factor of an individual’s survival or its success at producing offspring, but also of the reproductive success of all those relatives who share its genes (Hamilton, 1964a, 1964b). If an individual’s actions aid relatives’ survival or reproduction at a minor cost to itself, they will increase its overall inclusive fitness. Hamilton realized that from a gene’s perspective, the critical question is not the survival of a particular mortal body in which it finds itself, but
Kin Selection and “Mother Tongues”
the propagation of copies of itself in any body. This (in retrospect rather intuitive) concept of inclusive fitness has important implications for the evolution of social behavior. It provided an immediate solution to the problem of the “altruism” of female honeybees and ants, who cooperate to raise their sisters and protect them at the cost of their own lives, while bearing no young themselves. In the framework of inclusive fitness, eusociality follows from a genetic peculiarity of hymenopteran insects, which results in sisters being more closely related to one another than they are to their own offspring (Wilson, 1975).
A second theoretical explanation for apparently altruistic acts does not require kinship. This is known as “reciprocal altruism” (Trivers, 1971), in which unrelated individuals mutually benefit by taking turns exchanging resources at times when the cost to the donor is low and the benefit to the receiver is high. Such behavior requires a rather special set of circumstances, and empirical demonstrations of reciprocal altruism in nature are rare at best (e.g., the exchange of blood by vampire bats studied by Wilkinson [1984] was mostly among kin). Thus, despite the ubiquity of reciprocity in human society, the available evidence suggests that such systems are rare in animals, in sharp contrast to kin-selected systems, which are very common.
Honest Signaling Theory
“Honest” signals are those which accurately (though not necessarily perfectly) convey information about some relevant quality of the signaler (e.g., its species, sex, size, condition, etc.) or environment (Dawkins and Guilford, 1991). Like many terms in modern ethology, this one should be interpreted in this technical sense, not in terms of human honesty, which includes assumptions of self-knowledge, intention to communicate, and various other factors.
Thus, if a newborn baby cries only when needing care, this is an “honest” signal, despite the fact that few would attribute honesty, in the ordinary sense, to newborns. Below I will dispense with the quotation marks in the understanding that “honest” is being used in its technical sense throughout. A long tradition in ethology assumed that much communication evolved to facilitate honest communication, particularly among kin (Dawkins and Krebs, 1978; Hinde, 1981). This assumption came under attack in seminal papers by Amotz Zahavi (Zahavi, 1975, 1977), which spurred a large literature on honest signaling theory. The central claim of Zahavi’s attack, echoed in Dawkins and Krebs (1978), is that such honesty cannot be simply assumed. In fact, natural selection should in many cases favor dishonest behavior, if it leads to personal advantage and increased reproductive success. This is easily understood when we consider the advantages obtained by liars and cheats in human society if they escape detection and punishment. By highlighting the readiness with which Machiavellian deceit can destabilize honest signaling systems, this perspective suggests that the real question is why any signaling systems are honest (if indeed they are).
Zahavi’s proposed solution, the “handicap principle,” embodies a somewhat nonintuitive claim: that honest signals are possible only when the signaler pays a high cost when emitting the signal. According to Zahavi (1993), such costs are necessary if a signal is to stay honest and remain in circulation over evolutionary time. Despite early critiques of this idea from a mathematical viewpoint (e.g., Maynard Smith, 1976), the handicap principle received theoretical support from a complex mathematical model of signaling introduced by Grafen (1990a, 1990b), and has since generated a weighty volume of theoretical work, along with a less impressive body of empirical work. Recently some results of early theoretical work were discovered to be dependent on errors in the original papers (Siller, 1998). It is far beyond the scope of this chapter to review all of this literature, much of which is highly technical, and I confine myself to three major themes that have emerged. First, any signal bears some cost (even if only the time wasted not doing something else) (Maynard Smith and Harper, 1995). Unless the handicap principle requires more stringent conditions on signaling costs than their existence, it reduces to this obvious and uninformative fact. Empirical demonstrations of some cost of signal production thus provide no support for the handicap principle, which demands additional “strategic” costs over and above those necessary to produce a detectable signal. When the costs of signals have been measured, highly costly signals appear to be the exception rather than the rule, contrary to the predictions of the handicap principle. For example, vocal signals in many vertebrates appear to have surprisingly low metabolic costs (Chappell et al., 1995; Horn et al., 1995; McCarty, 1996) while still maintaining honesty; the same is true of human speech.
Various signals, such as piloerection (“raising the hackles”) or crest erection in birds, appear to have small physiological costs but are extremely common (Marler, 1968; Wilson, 1972). Furthermore, signaling systems evolve within a context of physical and physiological constraints which may make honesty difficult or impossible to escape (Maynard Smith and Harper, 1995). In such cases honesty is the default condition and it is dishonesty which demands an adaptive explanation (Fitch and Hauser, 2002). An example is formant cues to body size which are present in vocalizations of many birds and mammals. Formants are the resonances of the vocal tract, and their frequencies depend on the length of this tube of air in a manner that follows from acoustic laws. Because (in most cases) vocal tract length is itself necessarily correlated with body size, formants often provide a “free” cue to body size in any vocalizing vertebrate (Fitch, 1997; Fitch and Giedd, 1999). Such size information does not cost anything extra to encode: it is automatically present as a result of the physics and physiology of sound production. Although selection has acted in some cases to exaggerate formant cues by elongating the vocal tract beyond its normal anatomical confines (Fitch and Reby, 2001; Fitch, 1999), it is this deception which is an adaptation, not the original honest signal which existed by default.
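The dependence of formant frequencies on vocal tract length can be illustrated with a standard textbook idealization that is not part of the original argument: a uniform tube, closed at the glottis and open at the lips, resonates at odd multiples of a quarter wavelength, F_n = (2n - 1)c / 4L. The sketch below assumes this idealization and a speed of sound of 350 m/s. The function name and the example tract lengths are hypothetical, and real vocal tracts depart from a uniform tube.

```python
SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value for warm, humid air in the vocal tract


def tube_formants(tract_length_m, n_formants=3, c=SPEED_OF_SOUND_M_PER_S):
    """Resonances (Hz) of an idealized uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    This is a simplification, not a model of any particular species.
    """
    if tract_length_m <= 0:
        raise ValueError("tract length must be positive")
    return [(2 * n - 1) * c / (4.0 * tract_length_m)
            for n in range(1, n_formants + 1)]


# Hypothetical tract lengths: the longer tube gives lower, more closely spaced formants.
for length_m in (0.10, 0.175):
    print(length_m, [round(f) for f in tube_formants(length_m)])
```

Under these assumptions a 17.5 cm tube gives formants near 500, 1500, and 2500 Hz, and a shorter tube shifts all of them upward. This is the sense in which formant spacing is a "free" cue to the length of the tube, and hence often to body size.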
Second, an important limitation of much early signaling theory is that it did not include the costs and benefits to receivers, implicitly assuming such “assessment costs” to be at or near zero. Later work redressed this oversight by focusing on the costs to receivers of eliciting or evaluating signals (Dawkins and Guilford, 1991) or on the effects of less-than-perfect signal reception (Wiley, 1994; Johnstone and Grafen, 1992b). These results show that “conventional” signaling systems, which convey some information most of the time, but tolerate a low level of deceptive signaling, can be more stable evolutionarily than totally honest systems which exact a high price from both signalers and receivers. In general, threat displays should tolerate some level of bluffing, because the assessment costs to intruders of “probing” could be severe bodily injury if the signaler is not bluffing (Adams and Caldwell, 1990). Similarly, it may not pay a female choosing a mate to spend weeks evaluating the quality of her potential mates. Instead, the best strategy can be to choose one who looks or acts like a past good mate, or even simply to choose the male that other females are mating with (e.g., Losey et al., 1986). These issues are important, virtually ensuring that evolution will not fill the world with arbitrarily costly signaling systems. However, such considerations are less relevant in the quest for cheap, honest signaling systems that I am concerned with here. Their main importance in the context of the evolution of language is that the cost of discovering the truth provides a theoretical model for how a low level of dishonesty can persist indefinitely in a basically honest system. Such is the case for human language, and assessment costs may account for the obvious fact that despite its undeniable (and quite remarkable) capacity for honest transmission of information, language is not always used honestly.
Finally, the most significant modification of honest signaling theory comes from a consideration of communication among kin. This issue has been studied in the context of the Sir Philip Sidney game, introduced by Maynard Smith (1991), a stylized theoretical framework in which an individual must decide whether to donate some indivisible resource to a needy relative. Donation to a relative in this framework allows a potential inclusive fitness benefit to the donor at an immediate cost to the donor. Maynard Smith found that an honest signaling system can evolve in the presence of low cost to the signaler if sender and receiver interests do not conflict (technically, if the outcomes are ranked equivalently by both participants). Further models have extended this theory (e.g., Godfray, 1991; Johnstone and Grafen, 1992a; Bergstrom and Lachmann, 1998a), confirming the possibility of low or zero signaling costs for closely related individuals. This work has been tested empirically as well, and very low physiological costs have been measured in begging nestling birds (McCarty, 1996). Theorists thus agree that, given partially shared genetic interests among kin, cheap or free honest communication systems can evolve, and be evolutionarily stable.
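The logic of cheap honest signaling among kin can be made concrete with a simplified, additive game in the spirit of the one just described. The sketch is an illustration under stated assumptions, not a reproduction of Maynard Smith's (1991) model. A signaler is either needy or healthy and loses survival probability a (needy) or b (healthy, with b < a) if it does not receive the resource. The donor loses d by giving it away, the two are related by r, and producing the signal costs c. Each party is assumed to maximize its own survival plus r times the other's. All parameter names and numerical values are hypothetical.

```python
def honest_signaling_is_stable(r, a, b, d, c):
    """Check honest-equilibrium conditions in a simplified additive game.

    r: relatedness between donor and signaler
    a, b: survival loss of a needy / healthy signaler that gets no resource
    d: survival loss of the donor if it gives the resource away
    c: survival cost of producing the signal
    """
    donor_helps_needy = r * a >= d          # donating to a needy relative pays
    donor_refuses_healthy = r * b <= d      # donating to a healthy relative does not
    needy_signals = c <= a - r * d          # signaling pays for a needy signaler
    healthy_stays_silent = c >= b - r * d   # signaling does not pay for a healthy one
    return (donor_helps_needy and donor_refuses_healthy
            and needy_signals and healthy_stays_silent)


def minimum_honest_cost(r, b, d):
    """Smallest signal cost that keeps a healthy signaler from signaling."""
    return max(0.0, b - r * d)


# Hypothetical siblings (r = 0.5) whose interests do not conflict.
print(minimum_honest_cost(r=0.5, b=0.1, d=0.3))
print(honest_signaling_is_stable(r=0.5, a=0.9, b=0.1, d=0.3, c=0.0))

# Hypothetical siblings whose interests partly conflict.
print(minimum_honest_cost(r=0.5, b=0.3, d=0.4))
print(honest_signaling_is_stable(r=0.5, a=1.0, b=0.3, d=0.4, c=0.0))
```

With these hypothetical values, a zero-cost signal is stable in the first case because a healthy signaler gains nothing in inclusive-fitness terms by taking the resource (b is no greater than r times d). In the second case the interests partly conflict, and honesty requires a strictly positive signal cost.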
Nonhuman Mother Tongues: Kin Communication in Other Species
A voluminous literature demonstrates kin communication in animals, and I will briefly review only a few examples: food and route learning, alarm calls, and nestling begging calls. Many precocial birds such as grouse, ducks, and chickens must essentially feed themselves from a very early age. In such species, it is quite common for the young to follow their mother, creating an opportunity to learn by example from her behavior. The existence of food calls, emitted when sighting food or feeding on it, provides a nice example of a very simple form of kin communication that can transmit useful, learned information about what is (or is not) nutritious. Simply by feeding on a particular substance, and emitting food calls to attract the attention of her young, a mother provides an opportunity for her children to benefit from her past experience and thus bypass a certain amount of trial-and-error learning. This may be the simplest example of kin communication that helps transmit learned information. A similar example is provided by migrating birds. In many species juveniles accompany their parents on the first long-distance migration and thus learn safe migration routes and stopping points (Matthews, 1968). Alarm calls provide the most intensively studied example of mother tongues. Many birds and mammals emit characteristic calls at the appearance of predators, thus alerting conspecifics to the predator and often eliciting immediate escape reactions. The existence of such “alarm calls” in birds and mammals was early recognized as a problem by evolutionary theorists (Hamilton, 1964b; Maynard Smith, 1965). The problem is simple: From an individually selfish viewpoint, why should an organism spotting a predator vocalize, and thus call attention to itself, when it could just slink away, leaving its unsuspecting groupmates to be attacked?
Although various “selfish” proposals have been offered, such as that the call deters predators from repeated hunting at that site (Trivers, 1971; Sherman, 1985), these are seen as relatively implausible compared with the alternative: that calls serve to warn kin and thus increase inclusive fitness. During the 1970s, a wealth of comparative data was gathered on this topic, providing strong empirical support for the kin-communication hypothesis. In two seminal papers (Dunford, 1977; Sherman, 1977), ground squirrel females with kin present were found to be the predominant alarm callers. Males and transient females did little alarm calling. Sherman’s paper further demonstrated a cost to calling: Callers were significantly more likely to be killed than noncallers. Further data on other species substantiated these conclusions (Hoogland, 1983; Smith, 1978; Barash, 1976), with the interesting twist that in species where males participate in parental care and/or live among kin, males also call. Although it may be the case that much of the alarm calling serves to protect an individual’s offspring, Sherman (1980) provides data indicating that offspring are not the only
kin “protected” via alarm calls, and that alarm calls are therefore kin communication in the wider sense. Once a proclivity for alarm calling is established, kin selection may also act to increase the specificity of the calls, for example, to distinguish aerial predators from ground predators. This elaboration is observed in a variety of bird and mammal species (Klump and Shalter, 1984), including nonhuman primates (Seyfarth et al., 1980). The “honesty” of these differentiated signals is easy to explain in the context of kin selection: If there are different optimal escape strategies for different predators, an individual will increase its inclusive fitness by emitting calls different enough to allow listening kin to adopt the best escape strategy. Note that this does not require intent by the caller to “label” different predators: A difference in arousal caused by different predators that leads to discriminable differences in call acoustics would suffice (Owings and Hennessy, 1984; Seyfarth and Cheney, 1997). Screams of surprise might reliably signal less dangerous predators than screams of terror. Thus one does not have to posit intentional “predator labeling” to understand the adaptive value of signal elaboration in the context of kin-selected alarm calls. Of course, the number of differentiated signals that are necessary in this context will be limited by the types of danger that require different responses. Even in the case of highly preyed-upon species like vervet monkeys, with 16 known predators (Cheney and Seyfarth, 1981) or Belding’s ground squirrels, with nine known predators (Sherman, 1977, 1985), alarm calling alone will never lead to an infinitely extensible set of vocalizations. Nestling begging is a type of kin communication that has been studied from both theoretical (e.g., Godfray, 1991) and empirical perspectives (McCarty, 1996; Briskie et al., 1994; Haskell, 1994). 
Begging involves food as a benefit (not just information), and so differs considerably from the above two examples of mother tongues. Competition between nestmates for food can be extremely fierce in birds, sometimes leading to siblicide. Such sibling competition and/or parent/offspring conflict will be much less severe if the young are being “fed” information, at low physiological cost, by the parents. With food involved, we might expect some “dishonest” begging. However, the evidence from metabolic rates suggests that begging is not particularly costly (McCarty, 1996), and no clear examples of dishonesty are known, even in this highly competitive situation. To summarize, comparative data show that mother tongues provide, in many cases, the preconditions for the evolution of honest, low-cost communication systems. Such systems are common in mammals that live in kin groups. Thus the theory of mother tongues, by focusing on the genetic common interests of kin, seems to satisfy one of the primary desiderata for a theory of language evolution: honest communication without handicaps. I submit that this is an important, if relatively obvious, point in its own right. It is surprising that this hypothesis appears to have escaped detailed discussion by earlier
theorists, who typically focus on communication among nonkin adults when discussing language evolution.
Critical Hurdles in the Evolution of Language
From the viewpoint of natural selection, honesty is not always the best policy, and when honesty exists, it demands an explanation. This has some important implications for the evolution of human language. Spoken language is low-cost and has an unparalleled capacity to honestly convey detailed and arbitrarily complex information. Thus language is quite anomalous from the viewpoint of handicap theory (Zahavi, 1993), which has led recent writers to highlight this apparent discrepancy (e.g., Zahavi, 1993; Dessalles, 1998; Knight, 1998). However, this discrepancy is troubling only to the extent that the handicap principle is true. The point of the review above is that communication systems evolving among kin need not rely on handicaps (whether other systems do, is a separate question). Thus it seems worth considering the thesis of this chapter: that human language evolved in a context of communication among kin. Language is so different from the communication systems of other animals that the very comparison sometimes seems strained. However, it is clear that there are fundamental biological similarities between humans and animals, in terms of both neural functions (there are no new neurotransmitters, or new types of neurons, in humans) and genetics (virtually all of our genes are shared with mice, and the sequence similarity of these genes is around 99 percent between chimps and humans). Further, many aspects of human language are built on a foundation shared with other animals (these include, uncontroversially, most aspects of the vocal production and hearing apparatus, the system of long-term memory that must underlie the lexicon, and rather complex conceptual structures including memory of places, events, and individuals).
An important issue faced by any theory of language evolution is how to conceptualize the undeniable differences between language and other communication systems in a manner that neither neglects the similarities nor trivializes the differences, both of which are important. To my mind, a natural framework within which to understand language evolution is comparative, that is, in comparison to the many aspects of animal communication and cognition that are now reasonably well understood. A comparison of language and animal communication systems allows us to identify important differences along with key similarities, both homologies (probably present in our prelinguistic ancestors) and analogies (repeatedly evolved solutions to some common problem) (Hauser et al., 2002). This approach highlights difficult evolutionary problems that were solved, somehow, en route to modern language. These are, in no particular order:
1. Cheap honesty: an ability and propensity to communicate rich and accurate information, at low cost (“meaning”)
2. Vocal imitation: an ability to learn and reproduce arbitrary acoustic signals
3. Generativity (or “discrete infinity”): the ability to generate a complex, open-ended system of words and sentences (“syntax”)
While the combination of these capabilities appears unique to humans, each of these three basic capacities has parallels in the animal kingdom. While there are many examples of either information-poor systems (e.g., most birdsongs appear to have little meaning beyond “I’m a male of species X, males stay away, females approach”) or actively deceitful systems (some bird alarm calls are used deceptively as frequently as 60 percent of the time; Møller, 1988, 1990; Munn, 1986), there are also “honest” systems in nature that convey accurate information (e.g., mammal alarm calls), including information about events that are not perceptually present (e.g., honeybee dance). Vocal imitation is also well developed in other species, though apparently not in other primates (Janik and Slater, 1997; Fitch, 2000): most birds have a capacity for vocal learning, and some, such as mockingbirds, have a rich, unbounded capacity to imitate arbitrary sounds. Finally, the ability to recombine smaller units to generate an open-ended and potentially infinite variety of words (phonology) and sentences (syntax), seen in all human languages, has often seemed qualitatively different from the capacities of any other animal. This property follows from the manner in which languages generate their structures by recombination of a finite set of primitives (phonemes/syllables in phonology, words in syntax). Any word can be extended by adding various affixes (as in adding “non-” to “disestablishmentarianism”), and any sentence by adding new phrases (as in adding “Mary believes that” to any sentence).
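The sentence-embedding example can be written as a small recursive procedure, offered here only as an illustration. The function name and vocabulary are hypothetical, and no claim is made that this is how speakers compute sentences. The sketch shows how a finite set of elements plus one recombination rule yields a new, longer sentence at every depth of embedding.

```python
def embed(sentence, believers):
    """Wrap `sentence` in one "<name> believes that" clause per believer.

    The first name in `believers` heads the outermost clause.
    """
    if not believers:
        return sentence
    first, rest = believers[0], believers[1:]
    return f"{first} believes that {embed(sentence, rest)}"


base = "the food is safe"
for depth in range(3):
    names = ["Mary", "John"][:depth]
    print(embed(base, names))
```

Each pass through the rule produces a well-formed input for the next pass, so the only bound on sentence length comes from the resources available to whatever is running the procedure, not from the rule itself.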
The critical factor in both systems is flexible, open-ended generation of novelty by recombination of discrete elements (Studdert-Kennedy, 1998; Nowak et al., 1999); hence the term “discrete infinity.” This concept has occasionally been criticized because, in reality, we produce neither infinitely long words nor infinitely long sentences. These limitations appear to be a matter of implementation limitations (of memory, time, breath, etc.) rather than intrinsic limitation of the principles of phonology or syntax per se. I agree that such limitations play an important role in the neural implementation and evolution of language, and do not advocate ignoring them. However, these facts also do not justify neglect of the basic productivity of language. This productivity is central to language, and very different from most other communication systems. However, the songs of birds or humpback whales use recombination of basic units to form larger, more complex units, and there are no obvious limits on the variety of the units thus formed (Payne and
McVay, 1971). Although these larger units appear to be ends in themselves, rather than a vehicle for transmitting detailed messages, they are richly generative nonetheless. A similar point could be made about melody in music: there is no known limit to the number of possible melodies, although these variant structures convey no obvious propositional meanings. To summarize, the combination of honesty, vocal imitation, and generativity appears to be unique to Homo sapiens, despite the fact that analogues of each capability are observed in other species. That these abilities are necessary for language should be relatively uncontroversial; whether they are sufficient is certainly not. For instance, modern language relies heavily on a notion of the contents of other minds, often termed a “theory of mind.” More controversially, many authors suggest that a much more complex set of abilities, including detailed innate constraints on both phonology and syntax summed up by the phrase “universal grammar,” are also necessary (Pinker and Bloom, 1990; Jackendoff, 1999). I will make no commitment on these issues here (see, e.g., Jackendoff, 2002; Bickerton, 1995; Nowak et al., 2001; Hauser et al., 2002). All workers should agree on the necessity of the above three components in one form or another, and I shall focus on them.
Previous Theories for the Selective Value of Language
Any evolved capacity can be explained in terms of mechanisms (anatomy, neural circuitry, etc.), ontogeny (the developmental construction of the capacity), phylogeny (the history of its evolution), and function (the selective advantage of the capacity). These explanations are complementary, not alternatives (Tinbergen, 1963). While a complete understanding of a trait would necessitate answers to all of these questions, they are conceptually separate and can be treated alone. In this chapter I will discuss only the functional side of the components underlying language.
My goal is to explain the adaptive function of the key components of imitation, complexity, and honesty, making as few a priori assumptions as possible. I do not assume that the function of the three components was the same, nor that they evolved simultaneously, nor that their function today in modern language is necessarily the same as their original function. I will not discuss the evolution of the mechanisms of speech production or perception (for this, see Lieberman, 1984; MacNeilage, 1998; Liberman and Mattingly, 1985; Fitch, 2000) nor the phylogenetic history of language (in australopithecines, Homo erectus, or Neanderthals: Lieberman, 2000). The question posed here is what selective advantage language ability gave to its first users, whoever they were. Some have questioned the very notion that language is adaptive, suggesting that language is a “spandrel”—a nonadaptive by-product of some other trait, such as large brains. This viewpoint derives from a misinterpretation of Chomsky’s technical use of the term
“language” to pick out core aspects of syntax. Clearly, certain aspects of language will be nonadaptive by-products of other changes (as for virtually any trait), and it is likely that some aspects of syntax are among them. However, this cannot be the case for the language capacity in the broad sense, including phonetics, phonology, syntax, semantics, and pragmatics. This complex set of interacting subsystems could not result from genetic drift, physical constraints, or correlated change, but has all the earmarks of an adaptation (Lieberman, 1984; Pinker and Bloom, 1990; Jackendoff, 2002; Hauser et al., 2002). It is this whole that I aim to understand in a selective context. It has also been questioned whether the most important function of language is communication. While the powerful generativity and structure provided by language undeniably have value beyond communication (e.g., for more elaborate or articulated thought), such noncommunicative generativity is not by itself adequate to explain the generativity of human language. Humans clearly are able to use their generative capabilities in communication, and the phonological system which allows this is extraneous from the viewpoint of “pure thought.” Furthermore, even when modern humans use language to think privately, we “hear” the sounds of words in our heads (and can form rhymes, count syllables, etc.), meaning that even private language makes use of this external, socially shareable component of language. Thus, it is important to consider the evolutionary basis for such capabilities in the context of communication, and of animal communication systems. There are abundant hypotheses as to the selective value of at least some aspects of language, which I will now selectively review. Many earlier authors focused on the role of linguistic communication among adults, with very little comment on its role in transferring information between parents and offspring. 
In particular, a conviction that complex language serves to increase an individual’s mating success directly, by making the speaker attractive to potential mates, appears to be held by an otherwise diverse group of authors (e.g., Bickerton, 1998; Miller, 2001; Pinker and Bloom, 1990; Lightfoot, 1991). For example, “Females would surely have preferred mates whose communicative capacities so strikingly outclassed those of other available partners” (Bickerton, 1998, p. 353), and “that tribal chiefs are often both gifted orators and highly polygynous is a splendid prod to any imagination that cannot conceive of how linguistic skills could make a Darwinian difference” (Pinker and Bloom, 1990, p. 725). Despite the apparent appeal of a putative link between mating success and complex utterances to academics, I know of no data indicating a link between linguistic complexity and mating success in humans. It is thus surprising that so many scholars assume that “better grammar led to more sex” in our evolutionary history. Several authors have suggested that social intelligence played a key role in language evolution. Traditionally it has been assumed that the large brains of nonhuman primates, and probably of humans as well, result from selection for increased behavioral flexibility,
spatial memory, and other factors aiding survival by ecological generalists. The “Machiavellian intelligence” hypothesis (Humphrey, 1976; Byrne and Whiten, 1988) suggests, in contrast, that the increased social complexity of group living fueled the dramatic increase in neural horsepower that characterizes primate evolution. Stable groups pose serious information processing problems—just remembering identities of and past interactions with 20 or more group members will be challenging for the average mammal, and primates live in a social world where these interactions and relationships are crucial to survival. Dunbar (1993, 1996) has suggested that these social pressures increased to the breaking point in early hominids, when group size grew above a limit imposed by grooming time in other primates. Dunbar’s “gossip as grooming” hypothesis extends the Machiavellian intelligence notion, suggesting that language arose primarily as a solution to the problems for group cohesion created under these circumstances. By Dunbar’s hypothesis, language exists primarily to exchange information about other group members, and thus to establish and cement social relations within small coalitions or subgroups. A similar idea is proposed by Bickerton (1998), who suggests that the neural mechanisms initially evolved for social intelligence (particularly “theta analysis”—who did what to whom) were exapted (put to new evolutionary use) into the new realm of processing syntactic structure. Deacon (1997) has proposed that the selective value of language arose from its value in stabilizing the relationship between monogamous males and females, a relationship that became necessary as human children became an increasing burden, demanding more care than a single individual female could provide. By this hypothesis, the intrinsic instability of parental monogamy (which is extremely rare in mammals) required some stabilizing mechanism, and language arose to fill this need. 
Both the ability to have rational discussions of past and future between the mates, and the ability for other group members to act as gossips who report extra-pair dalliances, are suggested to have been important in the evolution of our unusual linguistic ability to discuss the past and future. To summarize, the proposed role of language in facilitating matings seems to be the most popular selective advantage ascribed to early language. First advanced by Darwin (1871), the idea that sexual selection played a key role in language evolution is appealing in its simplicity: In the contest for mates, there is a constant “arms race” among different displays. For example, a choosy female observing males might always select the most complex display, perhaps because it originally provided some indication of male quality (intellect, vigor, etc.). In such a situation, a male who could always “trump” a neighbor’s display by repeating it with an additional element would clearly achieve high reproductive success. Sexual selection thus provides the best explanation for the evolution of
complex displays in many contexts, especially birdsongs and the songs of humpback whales. This basic and well-known idea (Fisher, 1930) has been rediscovered with much fanfare by evolutionary psychologists (e.g., Miller, 2001). However, the sexual selection hypothesis of language explains only imitation and complexity, and offers no rationale for cheap honesty. Indeed, given the often opposing interests of males and females in the context of mate choice, it is difficult to see how cheap honesty could persist in such a system at all. Furthermore, there are two critical problems with the sexual selection model for the evolution of language. First, sexually selected traits are typically dimorphic, with the displaying sex expressing the traits to a much greater degree than the “choosing” sex. In most vertebrates, the displaying sex is male (rare exceptions include polyandrous birds or frogs), and the choosy sex is female. This pattern is nearly universal in mammals, including primates: Males are larger and more competitive in mating contexts, and express secondary sexual display characteristics to a greater degree than do females. If language originated in sexually selected displays, we would thus expect human males to have more highly developed linguistic capacities than females. In fact, just the opposite is the case: All available data suggest that where they differ, female linguistic abilities exceed those of males (see Henton, 1992, for a review). Language abilities develop sooner in girls than in boys; women have larger vocabularies than men, and surpass men at tongue twisters and other tests of speech abilities. Disorders that affect speech and language, such as stuttering, dyslexia, and autism, afflict males much more frequently than females. Second, and more glaringly, most sexual display characters in the animal kingdom arise at puberty, concomitant with their initial utility in sexual maturity and the onset of mating.
This is clearly not the case with human language, which is remarkable for its precocity. Human linguistic skills are already impressive at 1.5 years, and in fact begin a steady decline at puberty, quite the opposite of what sexual selection would predict (or what we see in songbirds or whales). These discrepancies between the predictions of sexual selection theory and the facts of human language suggest the need for an alternative hypothesis for the selective value of human language. We can, of course, posit a two-stage model in which complexity for its own sake was selected first (e.g., as “song” in the mate choice context: Darwin, 1871) and then honesty was added in a later stage of evolution (e.g., in the mother tongue context, as suggested here). Such a model seems both plausible and consistent with a fair amount of the evidence concerning human evolution. However, even if generative complexity was “jumpstarted” by sexual selection, it is unlikely that language achieved its present infant-onset, female-biased status due to sexual selection alone. These aspects of language are much more understandable if a key selective advantage of language is the transfer of information between kin, particularly parents and their offspring.
Language as a Kin-Selected Communication System
The capacity of human language to honestly communicate arbitrarily complex information seems to be unique among known life forms. I hypothesize that the conditions for the evolution of these unusual characteristics are best met by positing that human language evolved as a “mother tongue”—a communication system used among kin, especially (though not exclusively) between parents and their offspring. Because the physiological costs of communicating via speech are low, the benefits of the information shared need not be particularly high for such a system to satisfy Hamilton’s inequality and increase inclusive fitness. In turn, the use of a system to communicate among kin can avoid both runaway Machiavellian deceit and wasteful Zahavian handicaps. In animals with very long childhoods, such as humans, there is a potential value to the entire store of information that parents or other relatives have accumulated, meaning that there is continual pressure for increased complexity to transmit more detailed information. Thus, mother tongues could provide ideal circumstances for the three capabilities described above as central to human language.
From the viewpoint of kin selection theory, the evolution of honest communication between kin requires the satisfaction of Hamilton’s inequality, C < Br. The costs to the signaler must be less than the benefits to the receiver, discounted by the coefficient of relatedness. For the current discussion, benefits are in terms of the bits of information acquired by the recipient of a signal, relative to the cost of acquiring the same information by unaided trial-and-error learning. It should be noted that this is quite different from the benefits in some other communication systems, which are often in terms of noninformational resources like food (in the case of begging calls) or mating (in the case of advertisement).
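To make the arithmetic of Hamilton’s inequality concrete, the following is a minimal sketch in Python. The function names and the cost value are hypothetical and chosen only for illustration; they are not drawn from any study. Costs and benefits are assumed to be expressed in the same arbitrary fitness units, and the relatedness coefficients are the standard values for an outbred diploid population.

```python
# Illustrative sketch of Hamilton's inequality, C < B * r.
# All names and numeric values here are assumptions made for the example.

def signal_favored(cost, benefit, relatedness):
    """Return True when the signaler's cost is less than the
    receiver's benefit discounted by relatedness (C < B * r)."""
    return cost < benefit * relatedness


def break_even_benefit(cost, relatedness):
    """Benefit that must be exceeded for the inequality to hold (B > C / r)."""
    return cost / relatedness


# Standard coefficients of relatedness (outbred diploid population).
RELATEDNESS = {
    "offspring": 0.5,
    "full sibling": 0.5,
    "half sibling": 0.25,
    "first cousin": 0.125,
}

if __name__ == "__main__":
    cheap_signal_cost = 1.0  # arbitrary unit standing in for a low-cost signal
    for kin, r in RELATEDNESS.items():
        needed = break_even_benefit(cheap_signal_cost, r)
        print(f"{kin}: benefit must exceed {needed} units")
```

The sketch shows why a cheap signal matters: when the cost is small, the benefit needed to satisfy the inequality stays small even for fairly distant kin, whereas a costly signal would require a proportionally larger benefit at every level of relatedness.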
The physiological costs of human speech are so low as to be nearly unmeasurable (Russell et al., 1998). Given the physiology of human speech, this is not surprising: The motive power for speech is the elasticity of the lungs, which drives a stream of expired air that fuels vocal fold vibrations and the conversion of airflow into acoustic energy (Lieberman and Blumstein, 1988; Titze, 1994). Since air must be inspired into the lungs to sustain life, the source of energy for speech is part of resting metabolism, not a cost of vocalization. Thus, the physiological costs of spoken language are unlikely to be a major factor in the evolution of spoken language. In the evolution of language, perhaps the most significant cost was due to unintended sharing of information with competitors. While the benefit of informing one’s kin of the location of a new food source might be great, it could easily be offset if this information was also shared with many unrelated competitors. This would put pressure on signalers to discriminate kin from nonkin competitors. Earlier authors, starting with Darwin, remarked on the possibility that sexual selection might lead to complex “displays” like those of human languages, by analogy with
birdsong. Mother tongues provide an alternative selective force that could underlie the generation of complexity: the need to communicate arbitrarily complex ideas. We can assume our common ancestor with chimpanzees had a complex conceptual store (of a sort that is present in chimpanzees and many other primates, including information about past events, distant locations, both transitory and permanent characteristics of individuals, etc.). In the mother tongue context, it is also reasonable to assume that much of this information would be valuable to relatives if it could somehow be transmitted. Thus, once a communication system of this sort was in place, each small increment of complexity (e.g., enlarged lexical capacity, speed of transmission or acquisition, or syntactic disambiguation) would correspond to an increment in the efficacy of information transmission, and thus in inclusive fitness. The mother tongue hypothesis thus entails a selective force toward increasing complexity. The limits to which this selective force might push the system are determined again by Hamilton’s inequality, and as long as the costs are very low, even quite small and incremental benefits could be selected for (up to the theoretical limit of the complexity of conceptual structures of both communicators). In contrast to sexual selection, the predictions of this kin selection theory fit quite nicely with the facts of human language competence. First, it is clear that mother tongues select for early competence on the part of offspring: The earlier a child’s language competence comes on-line, the greater the benefit (both in immediate survival and in the sum total of information transferred during childhood). Thus the remarkably early age at which children begin acquiring language, which is sharply discrepant with the predictions of sexual selection theory, makes perfect sense from the kin communication viewpoint. 
Second, given the primary role of females as caretakers, it is unsurprising that language abilities should be more developed in females than in males. Of course, male children must communicate with their mothers, so we wouldn’t expect mother tongues to be exclusively female. Further, humans are unusual among primates in having a significant amount of paternal care, and thus adult male competence in language is necessary. However, if one sex were to selectively suffer language deficits, the theory predicts it would be males—as it is. Finally, the propensity of children to communicate with one another (particularly siblings) is again predicted by kin communication but not by sexual selection theory.
Reciprocal Altruism and Communication Among Nonkin
A glaringly obvious difficulty for the mother tongues hypothesis is that today, language is not used exclusively or even predominantly to communicate among kin. Although the hypothesis does a good job of accounting for the origin of language, it clearly cannot account for this facet of contemporary usage. I view the frequent exchange of
information among nonkin as an example of reciprocal altruism (Trivers, 1971). By this hypothesis, linguistic information exchange is but another example of the pervasive propensity for social exchange and monitoring that appears to typify the human species, making us the preeminent practitioners of reciprocal altruism. Once language capabilities had evolved, via kin selection, to the level where valuable information could be exchanged at low cost, it provided the additional possibility for such exchange among unrelated individuals who spoke similar dialects. Among total strangers, little exchange of useful or truthful information is predicted, and discussions of the current weather or other platitudes should abound. In contrast, in stable social groups and especially among mates, the requirements for adaptive social exchange among familiar nonrelatives are often met, and the attendant mechanisms for cheater detection and punishment (especially gossip and “reputation”: Dunbar, 1996) could develop via cultural evolution. No obvious genetic changes in the language capacity appear to be necessary for the transformation of language from kin-biased use to the abundant nonkin uses we observe today (where language is intimately intertwined with all aspects of human behavior, from aggression to courtship). Note that kin communication thus provides a plausible evolutionary route to reciprocal information sharing among nonkin, but the converse is not true.
Mother Tongues and Dialects
Finally, the mother tongues hypothesis provides an account for the otherwise curious fact that language seems more complex than necessary for communication, in the sense that our ability to recognize regional or class dialects far exceeds the needs of semantic communication.
Great English-language authors like Vladimir Nabokov or Joseph Conrad (born in Russia and Poland, respectively), despite enormous grasp and fluid command of the English language, still spoke with an accent and sounded detectably “foreign” to native English speakers. Highly sophisticated concepts are exchanged between native and nonnative speakers on a daily basis throughout the world, but these speakers nonetheless perceive each other’s speech as distinctly different. Thus, the characteristics of language are such that speech provides not just the intended information but also dialectal information about the provenance of the speaker. These everyday facts are inexplicable if human language evolved solely for purposes of semantic communication: Why should our phonological system be more complex than needed to efficiently transmit propositional information? I suggest that the existence of such extra-propositional dialectal variation increases the ability of kin to recognize each other. This enables even distant kin that have not met to recognize one another and to share information, allowing selective transmission of valuable information among distant kin.
By enlarging the store of knowledge available in the kin group, this provides positive feedback by increasing the benefit of the other two aspects of the system. Beyond a certain critical mass of communicators, the large shared pool of information made available to a widely extended kin group becomes a highly adaptive resource, and almost irresistibly selected for. However, once a kin-based system for distributing favors arises, it is always susceptible to cheaters who masquerade as kin, receive benefits, but do not return the favor. This is known as the “free rider” problem, and has been extensively discussed (Enquist and Leimar, 1993; Dunbar, 1996). I join previous researchers (Nettle and Dunbar, 1997) in suggesting that an ever-changing dialect, mastered at an early age, could provide a reasonably reliable (though not perfect) marker of distant kinship and thus help circumvent the free rider problem. The basic idea here is simple: If offspring slavishly imitate the details of their parents’ and siblings’ pronunciation, and cease imitating before “leaving home,” they will be branded for life with a mark of their family background. In the large, fluid societies we live in today, with huge numbers of people in constant contact, this “brand” is called a regional and/or class dialect. In the much smaller, more closely knit populations that characterized human evolution until 10,000 years ago, the information carried by a dialect would have been specific to the social group and probably to the specific kin group in which a child was raised. Later in life, individuals who could recognize their native dialect, and share information preferentially with those who spoke it, would often be behaving preferentially to kin. Thus, dialects could provide a means for kin recognition, boosting the power of kin selection. 
This could lead to preferential exchange of information (in the lowest-cost case), and perhaps preferential exchange of other resources (food, shelter, coalitionary aid, etc.) as well. A nonhuman example of kinship calls may be the signature whistle system of bottlenose dolphins (Sayigh et al., 1990). A dialectal indicator of kinship is admittedly imperfect: Unrelated orphans raised by a family would be treated as kin by this system. This is no different from the various mechanisms of kin recognition known in animals. For instance, mammalian mothers typically learn to recognize their infants immediately after birth by their smell, and can be duped into adopting a foreign infant by being presented with it just after birth (Klopfer and Klopfer, 1968). Young birds come to recognize their parents, and species, via imprinting, which can go awry when ethologists like Konrad Lorenz are the first organisms observed (Bolhuis, 1991). Mammals normally use proximity in early childhood as an indicator of kinship, refusing to mate with animals they were raised with, regardless of actual genetic relationship (Michener, 1974; Walters, 1987). None of these systems is perfect, but all work well enough that their benefits overwhelm occasional errors. By increasing the size of the kin group that can be preferentially communicated with, such a system of kin recognition extends the kin group to include half sibs, aunts and
uncles, grandparents, and others. Such a system would apply nicely to nomadic hominids who encountered each other for the first time, and reliably recognized each other’s dialect as familiar. This expansion would increase the store of information available to any speaker of a dialect, increase the value of the mother tongue in general, and put additional pressure for complexity and open-endedness on the system. Thus, if dialects are used to recognize kin, selective pressure for slavish imitation and generative complexity is increased. This aspect of the hypothesis predicts that informative language should be used preferentially among kin in hunter-gatherer societies or other traditional societies, and that in such traditional settings, people should interact more honestly and favorably with those who share their dialect. In what little data is available, this prediction appears to be confirmed (Nettle and Dunbar, 1997).
Conclusion
To summarize, I have argued that some of the most important characteristics of human language, the combination of which sets language apart from all other known communication systems, are explicable in the context of kin-selected communication systems, or mother tongues. The mother tongue hypothesis—that language developed primarily in a context of kin communication—provides both a good overall fit to much of the existing data and a solution to some serious problems left standing by other models. The advantage of this posited selective force over the sexual selection assumed by earlier workers is twofold. First, it accords with the fact that language learning begins in early childhood rather than at puberty, and second, it is expressed in both sexes rather than preferentially in males. Of course, no single hypothesized function will ever explain all aspects of human language.
I have suggested, for example, that the descent of the human larynx may have originally been driven by selection to exaggerate body size, and was only later exapted for use in expanding phonetic range (Fitch and Reby, 2001; Fitch, 2002). Thus, I do not offer the mother tongue hypothesis as a total functional explanation, and indeed I very much doubt that such a Holy Grail exists. I do think, however, that the theoretical advantages of a kin-selected communication system for explaining precisely those aspects of language that set it apart from other communication systems should be recognized and, I hope, further explored by others and incorporated into future theoretical treatments.
Acknowledgments
I thank Robin Dunbar, Marc Hauser, Rufus Johnstone, Kim Oller, Eric Nicolas, David Raubenheimer, and Charles Snowdon for their comments on earlier versions of this manuscript.
References Adams ES, Caldwell RL (1990) Deceptive communication in asymmetric fights of the stomatopod crustacean Gonodactylus bredini. Anim Behav 39: 706–716. Barash DP (1976) Social behavior and individual differences in free living alpine marmots (Marmota marmota). Anim Behav 24: 27–35. Bergstrom CT, Lachmann M (1998a) Signaling among relatives. III Talk is cheap. Proc Nat Acad Sci USA 95: 5100–5105. Bergstrom CT, Lachmann M (1998b) Signalling among relatives. I Is costly signalling too costly? Phil Trans Roy Soc London 352: 609–617. Bickerton D (1995) Language and Human Behavior. Seattle: University of Washington Press. Bickerton D (1998) Catastrophic evolution: The case for a single step from protolanguage to full human language. In: Approaches to the Evolution of Language (Hurford JR, Studdert-Kennedy M, Knight C, eds.), 341–358. New York: Cambridge University Press. Bolhuis JJ (1991) Mechanisms of avian imprinting: A review. Biol Rev 66: 303–345. Briskie JV, Naugler CT, Leech SM (1994) Begging intensity of nestling birds varies with sibling relatedness. Proc Roy Soc London B258: 73–78. Byrne RW, Whiten A (1988) Machiavellian Intelligence: Social Expertise and the Evolution of Intellect in Monkeys, Apes and Humans. Oxford: Clarendon Press. Chappell MA, Zuk M, Kwan TH, Johnsen TS (1995) Energy costs of an avian vocal display: Crowing in red junglefowl. Anim Behav 49: 255–257. Cheney DL, Seyfarth RM (1981) Selective forces affecting the predator alarm calls of vervet monkeys. Behaviour 76: 25–61. Darwin C (1871) The Descent of Man and Selection in Relation to Sex. London: John Murray. Dawkins MS, Guilford T (1991) The corruption of honest signalling. Anim Behav 41: 865–873. Dawkins R, Krebs JR (1978) Animal signals: Information or manipulation? In: Behavioural Ecology (Krebs JR, Davies NB, eds.), 282–309. Oxford: Blackwell Scientific Publications. Deacon TW (1997) The Symbolic Species: The Co-evolution of Language and the Brain. New York: Norton. 
Dessalles J-L (1998) Altruism, status and the origin of relevance. In: Approaches to the Evolution of Language (Hurford JR, Studdert-Kennedy M, Knight C, eds.), 130–147. New York: Cambridge University Press. Dunbar R (1996) Grooming, Gossip and the Evolution of Language. Cambridge, Mass.: Harvard University Press. Dunbar RIM (1993) Coevolution of neocortical size, group size and language in humans. Behav Brain Sci 16: 681–735. Dunford C (1977) Kin selection for ground squirrel alarm calls. Amer Nat 111: 782–785. Enquist M, Leimar O (1993) The evolution of cooperation in mobile organisms. Anim Behav 45: 747–757. Fisher RA (1930) The Genetical Theory of Natural Selection. Oxford: Clarendon Press. Fitch WT (1997) Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. J Acous Soc Amer 102: 1213–1222. Fitch WT (1999) Acoustic exaggeration of size in birds by tracheal elongation: Comparative and theoretical analyses. J Zool (London) 248: 31–49. Fitch WT (2000) The evolution of speech: A comparative review. Trends Cognit Sci 4: 258–267. Fitch WT (2002) Comparative vocal production and the evolution of speech: reinterpreting the descent of the larynx. In: The Transition to Language (Wray A, ed.), 21–45. Oxford: Oxford University Press. Fitch WT, Giedd J (1999) Morphology and development of the human vocal tract: A study using magnetic resonance imaging. J Acous Soc Amer 106: 1511–1522.
Fitch WT, Hauser MD (2002) Unpacking “honesty”: Vertebrate vocal production and the evolution of acoustic signals. In: Acoustic Communication (Simmons A, Fay RR, Popper AN, eds.), 65–137. New York: Springer-Verlag. Fitch WT, Reby D (2001) The descended larynx is not uniquely human. Proc Roy Soc London B268: 1669–1675. Godfray HCJ (1991) Signalling of need by offspring to their parents. Nature 352: 328–330. Grafen A (1979) The hawk–dove game played between relatives. Anim Behav 27: 905–907. Grafen A (1990a) Biological signals as handicaps. J Theoret Biol 144: 517–546. Grafen A (1990b) Sexual selection unhandicapped by the Fisher process. J Theoret Biol 144: 473–516. Haldane JBS (1955) Population genetics. New Biol 18: 34–51. Hamilton WD (1964a) The evolution of altruistic behavior. Amer Nat 97: 354–356. Hamilton WD (1964b) The genetical evolution of social behavior. J Theoret Biol 7: 1–52. Haskell D (1994) Experimental evidence that nestling begging incurs a cost due to nest predation. Proc Roy Soc London B257: 161–164. Hauser M, Chomsky N, Fitch WT (2002) The language faculty: What is it, who has it, and how did it evolve? Science 298: 1569–1579. Henton C (1992) The abnormality of male speech. In: New Departures in Linguistics (Wolf G, ed.). New York: Garland. Hinde RA (1981) Animal signals: Ethological and games-theory approaches are not incompatible. Anim Behav 29: 535–542. Hoogland JL (1983) Nepotism and alarm calling in the black-tailed prairie dog (Cynomys ludovicianus). Anim Behav 31: 472–479. Horn AG, Leonard ML, Weary DM (1995) Oxygen consumption during crowing by roosters: Talk is cheap. Anim Behav 50: 1171–1175. Humphrey NK (1976) The social function of intellect. In: Growing Points in Ethology (Bateson PPG, Hinde RA, eds.), 303–317. Cambridge: Cambridge University Press. Jackendoff R (1999) Possible stages in the evolution of the language capacity. Trends Cognit Sci 3: 272–279. Jackendoff R (2002) Foundations of Language. 
New York: Oxford University Press. Janik VM, Slater PB (1997) Vocal learning in mammals. Adv Study Behav 26: 59–99. Johnstone RA, Grafen A (1992a) The continuous Sir Philip Sidney game: A simple model of biological signalling. J Theoret Biol 156: 215–234. Johnstone RA, Grafen A (1992b) Error-prone signalling. Proc Roy Soc London B248: 229–233. Klopfer PH, Klopfer MS (1968) Maternal imprinting in goats: Fostering of alien young. Zeit Tierpsych 25: 862–866. Klump GM, Shalter MD (1984) Acoustic behaviour of birds and mammals in the predator context. I. Factors affecting the structure of alarm signals. II. The functional significance and evolution of alarm signals. Zeit Tierpsych 66: 189–226. Knight C (1998) Ritual/speech coevolution: A solution to the problem of deception. In: Approaches to the Evolution of Language (Hurford JR, Studdert-Kennedy M, Knight C, eds.), 68–91. New York: Cambridge University Press. Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised. Cognition 21: 1–36. Lieberman P (1984) The Biology and Evolution of Language. Cambridge, Mass.: Harvard University Press. Lieberman P (2000) Human Language and Our Reptilian Brain: The Subcortical Bases of Speech, Syntax and Thought. Cambridge, Mass.: Harvard University Press. Lieberman P, Blumstein SE (1988) Speech Physiology, Speech Perception, and Acoustic Phonetics. Cambridge: Cambridge University Press.
Lightfoot D (1991) Subjacency and sex. Lang Commun 11: 67–69. Losey GS, Stanton FG, Telecky TM, Tyler WA (1986) Copying others, an evolutionarily stable strategy for mate choice: A model. Amer Nat 128: 653–664. MacNeilage PF (1998) The frame/content theory of evolution of speech production. Behav Brain Sci 21: 499–546. Marler P (1968) Visual signals. In: Animal Communication: Techniques of Study and Results of Research (Sebeok TA, ed.). Bloomington: Indiana University Press. Matthews GVT (1968) Bird Navigation. Cambridge: Cambridge University Press. Maynard Smith J (1965) The evolution of alarm calls. Amer Nat 99: 59–63. Maynard Smith J (1976) Sexual selection and the handicap principle. J Theoret Biol 57: 239–242. Maynard Smith J (1978) Optimization theory in evolution. Ann Rev Ecol Systemat 9: 31–56. Maynard Smith J (1991) Honest signalling: The Philip Sidney game. Anim Behav 42: 1034–1035. Maynard Smith J (1994) Must reliable signals always be costly? Anim Behav 47: 1115–1120. Maynard Smith J, Harper DGC (1995) Animal signals: Models and terminology. J Theoret Biol 177: 305–311. McCarty JP (1996) The energetic costs of begging in nestling passerines. Auk 113: 178–188. Michener GR (1974) Development of adult–young identification in Richardson’s ground squirrel. Devel Psychobiol 7: 375–384. Miller GF (2001) The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature. New York: Doubleday. Møller AP (1988) False alarm calls as a means of resource usurpation in the great tit, Parus major. Ethology 79: 25–30. Møller AP (1990) Deceptive use of alarm calls by male swallows, Hirundo rustica: A new paternity guard. Behav Ecol 1: 1–16. Munn C (1986) Birds that “cry wolf.” Nature 319: 143–145. Nettle D, Dunbar R (1997) Social markers and the evolution of reciprocal exchange. Curr Anthropol 38: 93–99. Nowak M, Komarova NL, Niyogi P (2001) Evolution of universal grammar. Science 291: 114–118.
Nowak MA, Krakauer DC, Dress A (1999) An error limit for the evolution of language. Proc Roy Soc London B266: 2131–2136. Owings DH, Hennessy DF (1984) The importance of variation in sciurid visual and vocal communication. In: The Biology of Ground-Dwelling Squirrels (Murie JO, Michener GR, eds.), 169–200. Lincoln: University of Nebraska Press. Payne R, McVay S (1971) Songs of humpback whales. Science 173: 583–597. Pinker S, Bloom P (1990) Natural language and natural selection. Behav Brain Sci 13: 707–784. Russell BA, Cerny FJ, Stathopoulos ET (1998) Effects of varied vocal intensity on ventilation and energy expenditure in women and men. J Speech, Lang, Hearing Res 41: 239–248. Sayigh LS, Tyack PL, Wells RS, Scott MD (1990) Signature whistles of free-ranging bottlenose dolphins Tursiops truncatus: Stability and mother–offspring comparisons. Behav Ecol Sociobiol 26: 247–260. Seyfarth RM, Cheney DL (1997) Behavioral mechanisms underlying vocal communication in nonhuman primates. Anim Learn Behav 25: 249–267. Seyfarth RM, Cheney DL, Marler P (1980) Monkey responses to three different alarm calls: Evidence of predator classification and semantic communication. Science 210: 801–803. Sherman P (1977) Nepotism and the evolution of alarm calls. Science 197: 1246–1253. Sherman PW (1980) The meaning of nepotism. Amer Nat 116: 604–606. Sherman PW (1985) Alarm calls of Belding’s ground squirrels to aerial predators: Nepotism or self-preservation? Behav Ecol Sociobiol 17: 313–323.
Siller S (1998) A note on errors in Grafen’s strategic handicap models. J Theoret Biol 195: 413–417. Smith SF (1978) Alarm calls, their origin and use in Eutamias sonomae. J Mammal 59: 888–893. Studdert-Kennedy M (1998) The particulate origins of language generativity: From syllable to gesture. In: Approaches to the Evolution of Language (Hurford JR, Studdert-Kennedy M, Knight C, eds.), 202–221. New York: Cambridge University Press. Tinbergen N (1963) On aims and methods of ethology. Zeit Tierpsych 20: 410–433. Titze IR (1994) Principles of Voice Production. Englewood Cliffs, N.J.: Prentice-Hall. Trivers RL (1971) The evolution of reciprocal altruism. Quart Rev Biol 46: 35–57. Walters J (1987) Kin recognition in non-human primates. In: Kin Recognition in Animals (Fletcher DJC, Michener CD, eds.), 359–394. New York: Wiley. Wiley RH (1994) Errors, exaggeration, and deception in animal communication. In: Behavioral Mechanisms in Ecology (Real L, ed.), 157–189. Chicago: University of Chicago Press. Wilkinson GS (1984) Reciprocal food sharing in the vampire bat. Nature 308: 181–184. Wilson EO (1972) Animal communication. Sci Amer 227: 52–60. Wilson EO (1975) Sociobiology. Cambridge, Mass.: Harvard University Press. Zahavi A (1975) Mate selection: A selection for a handicap. J Theoret Biol 53: 205–214. Zahavi A (1977) The cost of honesty (further remarks on the handicap principle). J Theoret Biol 67: 603–605. Zahavi A (1993) The fallacy of conventional signalling. Proc Roy Soc London B340: 227–230.
16
Language beyond Our Grasp: What Mirror Neurons Can, and Cannot, Do for the Evolution of Language
James R. Hurford
Before trying to construct scenarios of language origin and evolution based on MNS [mirror neurons or mirror neuron system] we must take care to analyse properly the nature of MNS itself. —Stamenov (2002)
And, I would add, we must also take care to analyze properly the nature of language itself. Several recent papers (Rizzolatti and Arbib, 1998; Arbib, 2001, 2002) suggest that the discovery of mirror neurons helps us to understand in more detail how human language evolved. The present chapter explores and further develops the ideas in these papers. Two main issues raised by mirror neurons are addressed in the first two sections of the chapter. The first section, “Mirror Neurons and Linguistic Signs,” aims to correct a possibly widespread misunderstanding of the significance of mirror neurons, as reflected in the following journalistic passages.
USC’s Michael A. Arbib, Ph.D., says “the neurons, located in the premotor cortex just in front of the motor cortex, are a mechanism for recognizing the meaning of actions made by others. . . . For communication to succeed, both the individual sending a message and the individual receiving it must recognize the significance of the sender’s signal. Mirror neurons are thus the missing link in the evolution of language. They provide a mechanism for the sharing of meaning.” (ScienceDaily, August 20, 1998, quoted in USC Trojan Family, Spring 1999)
Rizzolatti and Arbib think that mirror neurons may have provided the bridge from “doing” to “communicating.” The relationship between actor and observer may have developed into one involving the sending and receiving of a message. In all communication the sender and receiver have to have a common understanding about what’s passing between them. Could mirror neurons explain how this is achieved? Rizzolatti and Arbib think the answer is yes. (New Scientist 169 [January 27, 2001]: 22)
Popular reporters often exaggerate and oversimplify. In academic publications, authors reporting work on mirror neurons are usually more cautious, avoiding direct claims about meaning and communication. But even in the academic literature one can find claims that mirror neurons provide access to meaning: “[A]udiovisual mirror neurons code abstract contents—the meanings of actions” (Kohler et al., 2002: 846). “[T]he mirror neuron system is preprogrammed (via values and biases) to be able to read [i.e., extract the meanings of] these emotional expressions [of loneliness and sadness]” (Wolf et al., 2001: 108). The first section begins by outlining areas of agreement with the main mirror neuron researchers on the significance of mirror neurons for the understanding of language. I then argue, however, that mirror neurons cannot give us any new insight into one of the most crucial features of language, the meanings of signs.
The second section, “Mirror Neuronlike Structures Are Probably Common,” also seeks to correct a widespread misconception which attributes very special status to the discovery of mirror neurons, as reflected in the “missing link” remark above, and in such claims as the following, by a distinguished neuroscientist (V. S. Ramachandran): “I predict that mirror neurons will do for psychology what DNA did for biology. They will provide a unifying framework and help explain a host of mental abilities that have hitherto remained mysterious” (New Scientist 169 [January 27, 2001]: 22). This section attempts to locate the concept of mirror neurons within the wider context of behavioral and brain mechanisms, only some of which may be involved in communication. It is argued that mirror neurons are simply a special case of mechanisms that are widespread and well known.
Mirror Neurons and Linguistic Signs
Definition
A mirror neuron is a neuron that fires both when performing an action and when observing the same action performed by another (possibly conspecific) creature. The classic case is that of neurons in a macaque which fire both when the monkey grasps a nut and when it sees a human grasp a nut. The discovery of mirror neurons has important implications for the evolution of language, suggesting preexisting brain structure which could have provided a basis for human language. I concede for present purposes that mirror neurons can plausibly be shown to provide a basis for linguistically exploitable representations of both sounds (or gestures) and meanings, although a great deal of work is still needed in fleshing out such a claim satisfactorily. But my argument in this section is that mirror neurons cannot, by their very nature, provide a basis for the central, essential structural relation in human language: the bidirectional arbitrary mapping between sounds and meanings inherent in the Saussurean sign, as traditionally diagrammed (see figure 16.1).
The relation between the signified concept and the signifying sound image is arbitrary; there is nothing in the pronunciation of the word that in any way resembles the denoted concept. Both ends of the sign relation are internal mental representations. The “meaning” end of the relation is a mental entity (a concept or a “sense”), not a referent object, action, or event in the real world. And the “sound” end of the relation is not an actual utterance or an articulatory/acoustic event located in space-time, but a schematic representation of a class of such events.
Language is naturally homogeneous: it is a system of signs in which the sole essential is the union of meaning and sound image, and in which both these parts of the sign are equally psychological in nature. (Saussure, 1916; my translation)
Figure 16.1 An example of a Saussurean sign, an arbitrary bidirectional mapping between a concept and a sound image. The upper part represents the concept of an apple, and the lower part the sound image for the word “apple.”
Descriptions in nonneural terms such as “meaning,” “concept,” “sense,” “representation,” and “sound image” can be interpreted in neural terms, as discussed below.
Mirror Neurons May Explain Speech Imitation
First, concerning the sound image, the existence of mirror neurons is consistent with two somewhat similar theories of speech perception, the motor theory of speech perception (Liberman, 1957; Liberman et al., 1967; Liberman and Mattingly, 1985) and the articulatory filter hypothesis, proposed by Vihman (1993, 2002). The motor theory of speech perception, originating long before mirror neurons were discovered, holds that the mental representation of perceived speech is in terms of motor articulatory categories, as opposed to acoustic categories (which might seem more likely, since the input to the ear is acoustic). If perception of grasping involves some neurons which are also involved in the performance of grasping, it lends some plausibility to the idea that perception of a particular spoken sound involves neurons which are also involved in the performance of speaking that sound. In these terms, the “sound” end of the sign relation can be conceived as the intersection of a motor schema and a sensory (auditory) schema. Motor schemata are configurations of neurons that, when activated, produce recognizable, specific bodily movements. And a sensory schema is a configuration which, when activated, produces an image of something in the mind. And activation can be
halfhearted, as when we just imagine hearing a word or imagine pronouncing it. So the motor theory of speech perception implies, on this “intersection” view, mirror neurons in the phonetic/phonological representations of words. A theory such as the motor theory of speech perception could, if true, solve a problem in language learning. Children are able to imitate the speech sounds they hear; that is, they somehow know how to configure their own vocal tracts so as to produce an auditory impression similar to what they hear, even though the raw information reaching the ear is purely acoustic and not articulatory. Given the motor theory of speech perception, one can understand how a prelinguistic child can so easily learn to imitate the sound of a word; the acoustic signal is transformed automatically by the ear and brain to a representation at least partly expressed in terms of the articulatory movements required for repronouncing the word. This link between mirror neurons and the motor theory of speech perception has been emphasized in the literature discussing the potential of the new discoveries to illuminate language evolution (Gallese et al., 1996; Rizzolatti and Arbib, 1998; Skoyles, 1998). The motor theory has certainly not gained wide acceptance, although neither has it been relegated to the historic wastebasket. Perception always has to start with a discriminatory event that is not motoric in nature, but it is conceivable that automatic motor responses to certain percepts could have evolved. A theory resembling the motor theory of speech perception, though different from it in detail, and possibly overcoming its problems, is the “articulatory filter hypothesis,” proposed by Vihman (1993). “On this account, the experience of frequently producing CV [consonant-vowel] syllables sensitizes infants to similar patterns in the input speech stream” (Vihman, 2002: 310). 
What is common to both theories is the idea that there is some articulatory (i.e., motor) component to children’s representations of speech sounds. Vihman (2002) notes the support given to her hypothesis by the discovery of mirror neurons. Liberman’s motor theory hypothesizes a strong innate component of the perceptuomotor representation of speech sounds, whereas according to Vihman’s articulatory filter hypothesis, the child acquires such representation through experience of its own babbling behavior. Westermann and Miranda (2002) provide an elegant computer model of the process whereby “mirror neurons responding to both auditory and visual stimuli can develop” (p. 275), based on such feedback from babbling. It is not known whether the mirror neurons in the original experimental monkeys develop in ontogeny influenced by the monkey’s experience of its own grasping gestures, or whether they are epigenetically programmed to develop in any case, regardless of experience. The innate/acquired issue will concern us again in the conclusion of this chapter.
Mirror Neurons May Aid Concept Representation Likewise the “sense,” concept, or meaning end of the sign relation is neurally the pattern of activation that constitutes the “bringing to mind” of a particular concept. Here, too, in the conceptual domain, there are probably aspects of mirror neuron organization. The central mirror neuron results can most obviously be applied to the mental representations of bodily actions. For instance, if humans are organized in this respect like macaques, the mental representation of the concept GRASP/GRASPING involves some neurons that are involved both in the act of grasping and in the observation of grasping. So thinking of grasping (either by oneself or by someone else) activates these mirror neurons. Similarly, it seems likely that a representation of the concept WALK/WALKING will involve mirror neurons involved both in the observation and the performance of walking. (See the discussion of spontaneous imitative responses in humans in the next section.) Mirror neurons are, by definition, involved only in the representations of actions, such as grasping and walking. Therefore, adhering to a narrow definition of mirror neuron, we cannot claim that the mental representations of objects, such as apples and screwdrivers, involve mirror neurons. Apples and screwdrivers are not actions. But it seems likely that representations of objects involve some congruence between motor and sensory neurons similar to that found in the representations of actions. Attending to or acting on a real apple in an appropriate way, or imagining an apple, involves bringing to mind the concept of an apple. The mental representations of tools involve areas of motor cortex appropriate for handling them, besides sensory information about what the tools look like (Martin et al., 1996). It is hard to dissociate the passive manual feel of an object from active knowledge of what to do with it. 
Similarly, one’s concept of, say, an apple includes motor information about how to hold it and bite it, as well as sensory information about what it looks/tastes/smells like (Fadiga et al., 2000; Murata et al., 1997). (You don’t have to buy this “mirror” aspect of representations of concepts for the further argument below to go through.)
Mirror Neurons Cannot Facilitate Sign Learning
So far, my point has been to agree that the discovery of mirror neurons is a step forward in our understanding of the evolution of human language. I have given a rough characterization in neural terms of the two ends of the Saussurean sign, the sound and the meaning, before even considering the bidirectional relation between them. We might express the Saussurean sign in neural terms as in figure 16.2. But what, in neural terms, might the bidirectional relation between the concept and the sound image be? The well-known “arbitrariness of the sign” implies that in the general case there is no overlap between the neurons involved in the representation of the meaning
Figure 16.2 A Saussurean sign represented in neural terms.
and those involved in the representation of the sound. The pronunciation of the word “apple” bears no resemblance whatsoever, in sensory or motor affordances, to apples. The prelinguistic child may well have a fairly solid concept of the category APPLE, being able to interact in appropriate ways with apples. This concept, of course, involves some neurons. And the child, as argued above, can of course also represent the sound image of the word “apple,” also using neurons. But before the learning of the sound-meaning connection, there is no preestablished overlap between the neurons involved in the concept and the neurons involved in the sound image. This holds generally for all words except marginal onomatopoeic words. So mirror neurons cannot be seen as helping to account for the extreme facility shown by humans in learning the vocabulary of their native language. There can be no doubt that the extreme facility for learning arbitrary sound-meaning mappings is a specifically human trait. Although trained apes can acquire vocabularies of a few hundred symbols, this is often achieved only with quite laborious training. Even when the learning is somewhat spontaneous, as in Kanzi’s case (Lyn and Savage-Rumbaugh, 2000; Savage-Rumbaugh et al., 1986), the ape’s eventual vocabulary is orders of magnitude smaller than an adult human’s. Adult humans typically have vocabularies in the tens of thousands. The process of a human infant’s acquisition of vocabulary has been labeled “fast mapping” (Carey, 1978; Carey and Bartlett, 1978), reflecting the fact that astonishingly few exposures (sometimes just one!) are needed to learn a word and its meaning. And, as argued above, it cannot be the case that preexisting mirror neurons facilitate this process.
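The structural point can be made concrete with a purely schematic sketch. Nothing in it is a neural model or is drawn from the works cited: the class `Lexicon`, its methods `learn`, `produce`, and `comprehend`, the feature labels, and the ASCII stand-in for the sound image are all hypothetical names chosen for illustration. The sketch shows only that a sign is a stored pairing usable in both directions, and that the two representations it pairs need share no features at all, so that nothing in either representation predicts the other before the pairing is learned.

```python
class Lexicon:
    """A toy store of Saussurean signs: concept <-> sound image."""

    def __init__(self):
        self._sound_for = {}    # concept label -> sound image
        self._concept_for = {}  # sound image -> concept label

    def learn(self, concept, sound_image):
        """Store one sign after a single exposure (compare 'fast mapping')."""
        self._sound_for[concept] = sound_image
        self._concept_for[sound_image] = concept

    def produce(self, concept):
        """Go from meaning to sound."""
        return self._sound_for[concept]

    def comprehend(self, sound_image):
        """Go from sound to meaning."""
        return self._concept_for[sound_image]


# Hypothetical feature sets standing in for the two representations.
apple_concept_features = frozenset({"round", "graspable", "edible", "sweet"})
apple_sound_features = frozenset({"vowel-initial", "bilabial-stop", "lateral-final"})

lexicon = Lexicon()
lexicon.learn("APPLE", "/apl/")  # crude ASCII stand-in for the sound image

# Arbitrariness: the two representations have no features in common.
shared = apple_concept_features & apple_sound_features
```

On these assumptions `shared` is empty, while `lexicon.produce("APPLE")` and `lexicon.comprehend("/apl/")` recover each end of the sign from the other. The link exists only because it was stored, which is the sense in which preexisting overlap between the two representations cannot be what makes the learning easy.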
Mirror Neuronlike Structures Are Probably Common
This section will argue that a wide range of animal behaviors probably involve arrangements more or less like mirror neurons, depending on how far one is prepared to stretch the term. It will become apparent that a natural definition of “mirror neuron” should be somewhat elastic or fuzzy. It is useful to regard mirror neurons as constituting a fuzzy set rather than a precisely defined class. There are prototypical, clear central cases of mirror neuronlike arrangements, and there are cases partially resembling them in relevant ways.
“Automatic” Behavior Possibly Reflecting Mirror Neuron Structure
It will be useful to begin by considering behaviors which are involuntary, either automatic, reflex, or innate. The focus will be on responses to perceived stimuli that are fast, robust, and hardly subject to suppression or inhibition. Because of the speed of the response to the perceived stimulus and the near impossibility of inhibition or suppression, these present clear cases where sensory and motor mechanisms are in a tight linkage. An immediate, automatic response to a stimulus is, by definition, an action performed when perceiving the stimulus. Action and perception are not absolutely instantaneous; each happens over some brief interval, which we shall call the perception interval and the motor interval. The onset of the perception interval slightly precedes the onset of the motor interval, but with rapid responses, the two intervals will overlap, and one can look for the possibility of the same neurons being involved in both the perception and the performance of “the same action.” Consider schooling fish and flocking birds. A school of fish appears to act as a single elastic body, with all the member fish swerving uniformly in the same direction: left, right, upward, or downward. We do not know what specific neurons in the fishes’ brains are involved.
Computer modeling of schooling and flocking behavior (Reynolds, 1987; Toner and Tu, 1998) shows that it is possible to account for it in terms of very simple perception-action responses in the individual animals. Many schooling fish use several sensory modalities to keep a constant spacing between individuals. Both vision and lateral lines down the sides of the fish’s body, sensitive to pressure, are used (Partridge, 1987). Interestingly, the simple basic behavioral principle underlying schooling can be expressed in different ways in English, one way suggesting that a mirror neuron mechanism is at work, and the other way not suggesting this. One can say that the schooling fish’s basic rule is that sensing a neighbor turn in a given direction automatically triggers the action of turning itself in that same direction. Sensing a left turn triggers a left turn; sensing an upward turn triggers an upward turn; and so forth. Given this constant fast and automatic linkage of perception of action to performance of
the same action, it seems almost inescapable that neurons fitting the definition of mirror neuron are involved. Almost certainly, there are neurons involved both in the perception of the neighbor’s turn and the immediate turning response. Alternatively, one can express the facts differently, in a way not suggesting a mirror neuron mechanism. In this version, the basic rule is “Keep a constant distance from your neighbors.” This rule is implemented by perception of a decrease in the distance to a neighbor triggering a movement away from the neighbor, and by perception of an increase in the distance triggering a move toward the neighbor. It seems wrong to let the issue hinge on the pseudo question of whether the turning fish acts in response to perception of a turn or to perception of a change in distance. The two are inseparable; a turn causes a change in distance, and a change in distance implies a turn. The “keep a constant distance” principle is very similar to that used much of the time by a person driving a car in freeway traffic. When the car in front slows down, slow down; when the car in front speeds up, speed up. But would one be tempted to suggest that mirror neurons are involved in this basic aspect of freeway driving? Almost certainly not, for several reasons. The car in front is not an animal of any kind; one may not even be able to see the driver. Although one assumes it is being driven by a creature with a brain, one is not reacting directly to the pressure of that other driver’s foot on the gas pedal and the brake pedal. It is not the case that seeing the other driver step on her brake pedal prompts one to step on one’s own brake pedal. The response is essentially to the perceived distance to the hard metal shell on wheels in front. 
There is, however, a way of seeing this freeway driving behavior as involving tightly linked perception and performance of “the same action.” As a thought experiment, consider the car as an outer shell of the driver’s body, and thus the whole car + driver ensemble as a single locomotive organism. Its wheels are its limbs; the driver’s foot and the pedals are inner working parts of this single organism. The driver’s brain is the brain of the whole car + driver creature. If the driver is well trained, to the point where she regularly brakes involuntarily on perceiving a looming bumper in front, then in terms of the whole car + driver creature, this translates to the rule “Observation of slowing automatically triggers slowing,” just as with the schooling fish. In fact, the imaginative thought experiment is not necessary, if one allows a functional conception of what counts as “the same action.” If one may count slowing as a recognizable action on the part of a driver, whatever muscles are used to bring it about, and count observation of a looming bumper in front as observation of slowing, then the statement “Observation of slowing automatically triggers slowing” is a fair description of the driver’s behavior. And to the extent that the slowing behavior is automatically associated with observation of slowing behavior, it is likely that some of the neurons involved in the observation are also involved in the action.
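The equivalence of the two descriptions can be illustrated with a toy simulation. What follows is only an illustrative sketch, not a model drawn from the works cited: the function `follow`, the parameter `target_gap`, the one-dimensional track, and all the numbers are assumptions made for the example. The follower is given a single rule, “keep a constant distance,” and reacts at each step to the gap it perceives at that moment.

```python
def follow(leader_speeds, target_gap=10.0, dt=1.0):
    """Simulate a follower whose only rule is 'keep a constant distance'.

    leader_speeds: the leader's speed at each time step (arbitrary units).
    Returns the follower's speed at each time step.
    """
    leader_pos = target_gap   # leader starts exactly one target gap ahead
    follower_pos = 0.0
    follower_speeds = []
    for v_lead in leader_speeds:
        # Perception: the follower senses only the current gap.
        gap = leader_pos - follower_pos
        # Action: move just enough to restore the target gap.
        v_follow = (gap - target_gap) / dt
        follower_pos += v_follow * dt
        follower_speeds.append(v_follow)
        # The leader then moves on, opening a new gap for the next step.
        leader_pos += v_lead * dt
    return follower_speeds


# The leader cruises at speed 5, then slows to speed 2.
speeds = follow([5, 5, 5, 2, 2, 2])
# speeds == [0.0, 5.0, 5.0, 5.0, 2.0, 2.0]
```

Although the rule mentions nothing but distance, the follower’s speed reproduces the leader’s speed one step later. An observer who saw only the two speed profiles could describe the follower, just as accurately, as obeying the rule “Observation of slowing automatically triggers slowing.”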
Similar to flocking is sudden takeoff triggered by observation of sudden takeoff by a bird nearby. “A pigeon that signals its intention generally departs without disturbing the others. If a pigeon sees a sign of danger, however, it flies off without giving any intention signals. The other pigeons then immediately take alarm and fly up also” (McFarland, 1987a: 13). In cases where birds and mammals browse the same area in flocks and herds, escape responses in the birds are sometimes triggered by perception of escape behavior in the mammals, and vice versa. Escape by birds involves taking to the air; escape by the mammals involves running. The events can be described either as “Perception of escape behavior triggers escape behavior,” suggesting the possibility of a process which might involve mirror neurons, or as “Perception of running triggers taking to the air.” Viewed functionally, taking to the air and running off are instances of “the same action,” but viewed purely in terms of what limbs are involved, they are not the same action. I turn now to another kind of behavior which prompts questions about whether mirror neurons, or something like them, are involved.
Some animals, notably the cuttlefish (Sepia officinalis) . . . are able to alter their coloration to match that of the background. (McFarland, 1987b: 54)
Neural control of the chromatophores enables a cephalopod to change its appearance almost instantaneously, a key feature in some escape behaviors and during agonistic signaling. Equally important, it enables cephalopods to generate the discrete patterns so essential for camouflage or for signaling. . . . The chromatophores are controlled by a set of lobes in the brain organized hierarchically. At the highest level, the optic lobes, acting largely on visual information, select specific motor programmes (i.e., body patterns); at the lowest level, motoneurons in the chromatophore lobes execute the programmes, their activity or inactivity producing the patterning seen in the skin. (Messenger, 2001: 473)
Although such color-changing behavior is not technically labeled “imitation” in the animal behavior literature, it is clearly similar to imitation. Rather than imitating a perceived action, the animal “imitates” a perceived brightness pattern or texture. Perceiving a stony pattern/texture triggers turning a stony pattern/texture; perceiving a sandy pattern/texture triggers turning a sandy pattern/texture. To the extent that this behavior is automatic, no doubt some particular neurons are involved both in the pattern perception and in the pattern-changing performance. Should we label these mirror neurons? The question is terminological, not empirical.
. . . caterpillars of some swallowtails (Papilio spp.) and cabbage white butterflies (Pieris brassicae) change into green pupae when there are many green leaves present, but into brown pupae when the leaves are dead or absent. (McFarland, 1987b: 122)
In the case of the caterpillars, it may well not be perception of color that triggers the color change. It is just as likely to be something about the comparative smells of green leaves and dead leaves. But if a particular smell is regularly associated with green leaves, it is not unreasonable to say that the caterpillar is, at least indirectly, detecting greenness. If, as seems likely, the neurons responsible for somehow detecting greenness are also involved in the color change to green, should one call them mirror neurons? Again, the question is terminological rather than empirical. The bittern (Botaurus lentiginosus) also imitates its surroundings, but in motion rather than color. A bittern hiding in reeds stretches its neck upward, lengthening and narrowing its profile, and sways in unison with the reeds as they are moved by the wind (McGowan, 1997; Barrows, 1913). A reasonable description of the bittern’s behavior is “On seeing leftward swaying [of the reeds], sway left; on seeing rightward swaying, sway right.” The main difference between this and the behavioral rule for schooling fish is that the bittern is “imitating” not another animal (let alone a conspecific) but a nonsentient organism in its environment. Another very common form of defense mechanism is freezing, standing absolutely still. Freezing is not necessarily a response to a predator; many animals hold perfectly still for periods between short bursts of activity. An animal freezing acts to match its body to the surroundings by its immobility. A freezing deer, by not moving, in some sense imitates the rocks around it. Freezing is a static form of the “Keep a constant distance” behavior. Despite this “imitative” component to the behavior, conceptually similar to the color-changing behavior of the cuttlefish, it would be hard to argue that freezing involves mirror neurons according to the classic definition.
What these examples illustrate is the problematic nature of the phrase “the same action” in the canonical definition of mirror neurons. One problem is with the term same. It is a matter of judgment whether what the animal observes is “the same action” as what it performs. There are borderline cases which can be argued either way. If we choose to describe the behavior of a well-trained freeway driver as “Hit the brake pedal when you see the rear bumper of the car in front looming close,” even if perception of the looming bumper and hitting the brake involve some common neurons, these fall outside the definition of mirror neuron. But if we describe the behavior as “Slow down when you see the car in front slow down,” this seems to be a case involving perception and performance of “the same action” (i.e., slowing down). Another problem is with the term “action,” which seems to draw the limits of the class of mirror neurons too narrowly. Intuitively, there is a continuum of related, broadly imitative behaviors stretching from imitation of action (as with fish schooling), through “imitation” of pattern or texture (e.g., by cuttlefish) and “imitation of movement of
background” (e.g., by bitterns), to defensive freezing, where the animal “imitates the immobility of its surroundings.” None of these comments are intended to diminish the significance of the experimental work that gave rise to the term mirror neuron. My suggestion is that mirror neurons occupy one corner of a continuous, extremely diverse, space of possible neuronal arrangements. Neural organization that is mirror neuronlike to various degrees can be found widely across many species. Natural selection has shaped schooling by fish and flocking by birds (Partridge, 1987; Krebs, 1987), color-changing in cuttlefish and some caterpillars, and defensive freezing, in many species. A fish which turned left when its neighbors (swimming in the same direction) turned right would have become an isolated easy target for a predator (Hamilton, 1971). The selective advantages of camouflaging color-change and freezing are obvious. Any involuntary behavior which increases the fitness of an individual is likely to have been naturally selected, with the necessary neurons getting hardwired during the individual’s development. Involuntary imitative behaviors are merely a subcase, and because some of the same neurons are involved in the perception and performance of what can be described as the same action, they can be accorded the special label “mirror neuron.”
Suppressible or Learned Imitative Behaviors
Expressed informally, what happens with the macaques involved in experiments is that on seeing a human grasping a nut, the monkey’s brain takes the first small step toward carrying out a grasping action, but the action is not completed. The action is suppressed, or inhibited. It is the firing of a neuron which normally fires during an action when that action is not being carried out, but merely observed, that attracts so much scholarly and journalistic attention to mirror neurons.
If the action were actually routinely completed, on observation of “the same action,” the case would not seem so interesting, and would be classified as a familiar instance of “reflex imitative action,” as with schooling fish. In humans, yawning and laughter are often triggered involuntarily by observation of other people yawning or laughing. With some effort of the will, one can resist the temptation to laugh on hearing another person laugh, and one has to be in the right mood for the automatic laugh mechanism to work fully. But there can be no doubt that, in the right circumstances, observing laughter triggers laughter. Entertainment companies boost the perceived funniness of their shows by introducing canned laughter. Yawning on seeing yawning is a weaker, less reliable response, but there is nevertheless an effect. In the cases where the laugh or yawn response is not inhibited, the neurons mediating between stimulus and response conform approximately to the definition of mirror neuron; they fire “when” observing the action and “when” carrying it out.
The scare quotes around “when” here acknowledge the slight delay between the observed laugh or yawn and the evoked laugh or yawn. Any mirror neurons involved presumably fire in the later stages of the observation of the event and in the preparatory stages of the triggered performance. If the response is inhibited, it is an empirical matter whether any such neurons fire. (Here an experimental problem arises, because deliberate [i.e., faked] yawning and laughter are not controlled by the same mechanisms as spontaneous yawning and laughter. But an experiment could be possible, in principle, along the following lines. Put a subject alone in a room, reading a funny book, and monitor brain activity when the subject laughs spontaneously. The empirical question is how far this activity resembles brain activity on hearing or seeing laughter.) In fact, a wide variety of actions can trigger spontaneous imitative responses in humans. Quite often, individuals mirror the behaviors of their conversational partners without having conscious intention of doing so (Condon and Ogston, 1967; Kendon, 1970). In an informal group, people may cross their legs at similar angles, hold their arms in similar positions, even simultaneously perform head or hand motions (Rotondo and Boker, 2002). Such imitative behaviors are suppressible, and Rotondo and Boker discuss cases of such “symmetry breaking.” To the extent that such imitative responses are not suppressed, it is a fair bet (though a proposition subject to empirical verification) that mirror neurons are involved. Even when the imitative behavior is suppressed, there could be some activation of the neurons involved in preparatory stages of such gestures. Here now is an example involving communication. “[Vervet] monkeys often grunt as they watch another animal, or as they themselves, initiate a group movement across an open plain” (Cheney and Seyfarth, 1990: 114). 
If this description of the vervets’ behavior is adequate, since this grunt must be initiated by some specific neural activity, this is another clear case of mirror neurons. These grunt-neurons fire both when the animal starts out across open terrain and when it observes another animal doing so. It can be inferred from the experimenters’ description (using “often”) that this behavior is susceptible to suppression. Other vervet vocalizations are clearly sensitive to differing circumstances, and are suppressible.
Implications
In the following sections, I will discuss where we should now take the arguments presented above. The next section will relate to the first section, “Mirror Neurons and Linguistic Signs,” and to the issue of whether mirror neurons are innate or acquired. The following section, picking up from the second section, “Mirror Neuronlike Structures Are Probably Common,” will ask whether the existence of coordinated perceptual and motor neurons is more significant than the fact that in higher animals many such coordinated
schemes are masked by heavy systems of more or less voluntary invocation and inhibition.
Learned or Innate?
The classical experiments revealing mirror neurons were conducted on adult macaques that had certainly observed and practiced grasping many times in their lives. One wants to know whether the same results could be obtained with very young macaques. Is the firing of a mirror neuron on observing grasping an evolved innate response, like the schooling fish’s turning response? Or is it a learned response? An adult macaque could have learned to associate the sight of its own hand grasping a nut with the grasping action, and generalized this association to include the sight of a human hand grasping. Michael Arbib (personal communication) reports recent studies by Luciano Fogassi in which some of the same neurons involved in breaking a peanut in half also fire when the monkey hears the sound of a peanut breaking. This suggests a learned response, since the sound of a peanut breaking is very specific and perhaps not likely to have been accurately targeted by natural selection. Again, there is an experiment demanding to be done (suggested to me by Arbib). Could a monkey be trained to associate some very different sound, artificially piped into its ears via headphones, with its action of breaking a nut? And would perception of this sound then activate some of the neurons activated in the act of breaking a peanut? (The peanut-breaking case is subtly different from the well-known nut-grasping case. Much of an observed grasping action precedes the actual taking of the nut, whereas the sound of a peanut breaking is simultaneous with, or slightly after, the center of the action.) Section 1 emphasized the arbitrariness of the Saussurean sign, and its consequence that the sound-image and the meaning, or concept, associated with a word are intrinsically not the same.
This lack of sameness is fatal to any straightforward idea that preexisting mirror neuron structure mediates humans’ impressively fast learning of arbitrary symbolic mappings. Some of the imitative behaviors discussed in section 2 are innate, while others are clearly learned (e.g., the well-trained freeway driver slowing upon observing slowing). “Unnatural” (i.e., noninnate, imitative) responses evidently can be instilled by training, just as nonimitative, in fact arbitrary, responses can be drummed in. Michael Arbib (personal communication), replying to my challenge about the arbitrary nature of the linguistic sign, made an insightful remark to the effect that the relation between a retinal image caused by observing grasping and a motor neuron firing somewhere else in the cortex is also “arbitrary.” The macaque’s brain is intricately wired up to connect a certain specific pattern of activation of cells of one sort in the retina to an equally specific pattern of activation in an array of cells of a quite different sort in a quite distant
part of the brain. The sameness between the observed and the performed action is external to the brain. Consider an analogy. The cities connected by a road are not “the same city.” There is nothing intrinsic to each city that in some sense demands that it be connected to the others, as opposed to similar cities across the sea. The connectedness of one city to another by a road is a contingent geographical fact, brought about by the usefulness of a road connection. Neural connections arise, in phylogeny or ontogeny, between intrinsically dissimilar substructures of the brain, for similar functional reasons. Stamenov argues similarly that the “sameness” of the observed action and the performed action are post hoc constructions of the experimenter. What happens in the macaque’s brain . . . is due to a resonance-based deictic (here-and-now) attunement of a quite peculiar sort. . . . The appearance of intersubjectivity of MNS, to my mind, is an artefact of the conceptual differentiation in its functioning of two separate and different entities—of “observer” and “agent”—that are afterwards identified with (or mapped onto) each other. It is their mapping that makes the way MNS functions as if “dialogically tuned” and potentially capable of supporting such high-level cognitive capacities like social learning and intersubjective sharing of experience. (Stamenov, 2002: 254)
What humans are amazingly good at is building such arbitrary neural connections. After learning, the association of a word with its meaning is automatic and reflexlike. To an English speaker, utterance of /dog/ brings to mind the concept DOG, and /kæt/ means CAT, and it takes physical injury to disrupt such connections. Facile acquisition of tens of thousands of such arbitrary symbols is not mediated by any preexisting connection, of a strict mirror neuronlike sort, between concepts and sound-images.

The firing of the macaque’s grasping mirror neurons may not be innate, but acquired through some sort of Hebbian learning on repeated perception of its own grasping. Then a human child’s acquisition of massive vocabulary could possibly be seen as differing only in degree from this. This overly simple story would attribute a child’s learning of each arbitrary word-meaning mapping to repeated hearing of the word coupled with a clear ostensive indication of its meaning. Very possibly, a few word-meaning mappings are acquired in this way. But it is clear that this cannot be the whole story of vocabulary acquisition. Much vocabulary (e.g., abstract words) cannot be directly tied to any class of percepts.

Inhibition Was the Major Step

Section 2 emphasized the widespread occurrence of automatic and semivoluntary responses to observation of an action by performance of the same action, arguing that mirror neuronlike organization is common, and often advantageous to the individual. Mirror neuron arrangements are merely a special case, made more interesting by the involvement of the predicate “same,” of close neural connections from perception to motor (or premotor) activation. Perception-to-motor linkages are the stuff of animal life. In lower
animals, they are largely genetically determined and not subject to inhibition. In higher animals, there is a greater potential for acquired perception-to-motor linkages (e.g., conditioned responses). In the higher animals, too, there is both higher incidence of inhibition of motor responses and greater freedom from immediate stimulus control, because concepts may be “brought to mind” without direct perceptual input. Using new technology to see something previously hidden is always exciting at first. But reflection may tell us that what we have seen was not, after all, unexpected. The discovery of mirror neurons in fellow primates has been taken to suggest that such linked perceptual and motor mechanisms evolved relatively late and, significantly, probably only in the lineage leading to humans. I have not seen discussion of mirror neurons in nonprimate species. Two features mark mirror neurons as discussed in the literature: the perceptuomotor linkage, and its hiddenness. We may ask which of these evolved first. We humans clothe our naked bodies; nakedness is the state of nature. The ability to mask the primeval state of nature with clothing came very late in the day. The data mentioned in the second section suggest that imitative perceptuomotor linkage of all sorts is common in nature. What is probably new in the recent evolutionary lineage of humans is the ability to inhibit or suppress some of the motor aspects of imitative perceptuomotor linkage. This is consistent with what we know about brain evolution and function. In particular, human brains differ most markedly from chimpanzee and other primate brains in having disproportionately large prefrontal cortex (Deacon, 1997: 219). Further, “Tasks sensitive to prefrontal damage . . . 
all have to do with using information about something you’ve just done or seen against itself, so to speak, to inhibit the tendency to follow up that correlation and instead shift attention and direct action to alternative associations” (Deacon, 1997: 263). Most pertinently, Deacon argues that “The ability to overcome the symbol-learning problem can be traced to the expansion of the prefrontal cortical region, and the preeminence of its projections in competition for synapses throughout the brain” (Deacon, 1997: 220).

In Homo sapiens, the supremely plastic and self-controlling animal, the voracious acquisition of arbitrary symbols certainly involves the creation of neural connections, but these are embedded so deeply in inhibiting systems, and also subject to such complex systems of “voluntary” evocation, that any behaviorist stimulus-response interpretation of the arbitrary sound-meaning relationship is wholly inappropriate.

Last Words

I will end as I began, with a quotation from Stamenov, because his conclusions are so parallel to mine, although we arrived at them from quite different directions, and citing different data.
. . . MNS does not perform the work the same way in monkeys and humans (if we assume a causal role of MNS for language origin). In the latter species it can apparently function not only as part of a local brain circuit, but also in an unencapsulated way as a component of the central system supporting the processing of speech and language. If this indeed turns out to be the case after further experimental verification—that the MNS in humans is a double-action system—this would entail both good news and bad news. The bad news would be that one and the same class of neurons functions in different ways in two biological species. This means that from studying monkeys’ brains we cannot infer for sure how . . . human brains perform even on the “low” level of the way classes of neurons function. This is definitely not . . . good news, as the majority of neurological studies of monkeys and primates are made with an eye that the human brain performs the same way. The good news would be rather more hypothetical in nature and consequences. It involves the construction of a controversial scenario involving the unencapsulation of the serial component of MNS on an evolutionary scale, and the generalization of its application to the nascent mechanisms of speech and language. (Stamenov, 2002: 269–270)
There is a long way to go from mirror neurons to language.

References

Arbib MA (2001) The mirror system hypothesis for the language-ready brain. In: Computational Approaches to the Evolution of Language and Communication (Cangelosi A, Parisi D, eds.). Berlin: Springer-Verlag.
Arbib MA (2002) The mirror system, imitation and the evolution of language. In: Imitation in Animals and Artifacts (Nehaniv C, Dautenhahn K, eds.). Cambridge, Mass.: MIT Press.
Barrows WB (1913) Concealing action of the bittern (Botaurus lentiginosus). Ibis 30: 187–190.
Carey S (1978) The child as word-learner. In: Linguistic Theory and Psychological Reality (Halle M, Bresnan J, Miller GA, eds.). Cambridge, Mass.: MIT Press.
Carey S, Bartlett E (1978) Acquiring a single new word. Papers Repts Child Lang Devel 15: 17–29.
Cheney D, Seyfarth R (1990) How Monkeys See the World: Inside the Mind of Another Species. Chicago: University of Chicago Press.
Condon WS, Ogston WD (1967) A segmentation of behavior. J Psychiat Res 5: 221–235.
Deacon T (1997) The Symbolic Species: The Co-evolution of Language and the Human Brain. London: Penguin Press.
Fadiga L, Fogassi L, Gallese V, Rizzolatti G (2000) Visuomotor neurons: Ambiguity of the discharge or “motor” perception? Internat J Psychophysiol 35 (2–3): 165–177.
Gallese V, Fadiga L, Fogassi L, Rizzolatti G (1996) Action recognition in the premotor cortex. Brain 119: 593–609.
Hamilton W (1971) Geometry for the selfish herd. J Theoret Biol 31 (2): 295–311.
Kendon A (1970) Movement coordination in social interaction: Some examples described. Acta Psych 32: 1–25.
Kohler E, Keysers C, Umiltà MA, Fogassi L, Gallese V, Rizzolatti G (2002) Hearing sounds, understanding actions: Action representation in mirror neurons. Science 297: 846–848.
Krebs JR (1987) Flocking in birds. In: The Oxford Companion to Animal Behaviour (McFarland D, ed.), 204–208. Oxford: Oxford University Press.
Liberman AM (1957) Some results of research on speech perception. J Acous Soc Amer 29: 117–123.
Liberman AM, Cooper FS, Shankweiler DP, Studdert-Kennedy M (1967) Perception of the speech code. Psych Rev 74: 431–461.
Liberman AM, Mattingly IG (1985) The motor theory of speech perception revised. Cognition 21: 1–36.
Lyn H, Savage-Rumbaugh S (2000) Observational word learning in two bonobos (Pan paniscus): Ostensive and non-ostensive contexts. Lang Commun 20 (3): 255–273.
Martin A, Wiggs CL, Ungerleider LG, Haxby JV (1996) Neural correlates of category-specific knowledge. Nature 379: 649–652.
McFarland D (1987a) Alarm responses. In: The Oxford Companion to Animal Behaviour (McFarland D, ed.), 13–14. Oxford: Oxford University Press.
McFarland D (1987b) Camouflage. In: The Oxford Companion to Animal Behaviour (McFarland D, ed.), 53–55. Oxford: Oxford University Press.
McGowan C (1997) The Raptor and the Lamb. London: Penguin Books.
Messenger JB (2001) Cephalopod chromatophores: Neurobiology and natural history. Biol Rev 76 (4): 473–528.
Murata A, Fadiga L, Fogassi L, Gallese V, Raos V, Rizzolatti G (1997) Object representation in the ventral premotor cortex (area F5) of the monkey. J Neurophysiol 78 (4): 2226–2230.
Partridge B (1987) Schooling. In: The Oxford Companion to Animal Behaviour (McFarland D, ed.), 490–494. Oxford: Oxford University Press.
Pellegrino GD, Fadiga L, Fogassi L, Gallese V, Rizzolatti G (1992) Understanding motor events. Exper Brain Res 91: 176–180.
Reynolds CW (1987) Flocks, herds, and schools: A distributed behavioral model. Comp Graph 21 (4): 25–34.
Rizzolatti G, Arbib MA (1998) Language within our grasp. Trends Neurosci 21 (5): 188–194.
Rizzolatti G, Fadiga L, Gallese V, Fogassi L (1996) Premotor cortex and the recognition of motor actions. Cognit Brain Res 3 (2): 131–141.
Rotondo JL, Boker SM (2002) Behavioral synchronization in human conversational interaction. In: Mirror Neurons and the Evolution of Brain and Language (Stamenov M, Gallese V, eds.), 151–162. Amsterdam: John Benjamins.
Saussure F de (1916) Cours de Linguistique Générale. Paris: Payot.
Saussure F de (1959) Course in General Linguistics (Baskin W, trans.). New York: The Philosophical Library.
Savage-Rumbaugh S, McDonald K, Sevcik RA, Hopkins WD, Rubert E (1986) Spontaneous symbol acquisition and communicative use by pygmy chimpanzees (Pan paniscus). J Exper Psych: Gen 115 (3): 211–235.
Skoyles J (1998) Speech phones are a replication code. Med Hypoth 50: 167–173. Also available at http://cogprints.soton.ac.uk/documents/disk0/00/00/07/82/index.html.
Stamenov ML (2002) Some features that make mirror neurons and human language faculty unique. In: Mirror Neurons and the Evolution of Brain and Language (Stamenov ML, Gallese V, eds.), 249–271. Amsterdam: John Benjamins.
Toner J, Tu YH (1998) Flocks, herds, and schools: A quantitative theory of flocking. Phys Rev E58 (4): 4828–4858.
Vihman MM (1993) Variable paths to early word production. J Phonet 21: 61–82.
Vihman MM (2002) The role of mirror neurons in the ontogeny of speech. In: Mirror Neurons and the Evolution of Brain and Language (Stamenov M, Gallese V, eds.), 305–314. Amsterdam: John Benjamins.
Westermann G, Miranda ER (2002) Integrating perception and production in a neural network model. In: Connectionist Models of Cognition and Perception (Bullinaria JA, Lowe W, eds.). London: World Scientific.
Wolf NS, Gales ME, Shane E, Shane M (2001) The developmental trajectory from amodal perception to empathy and communication: The role of mirror neurons in this process. Psychoanalyt Inquiry 21 (1): 94–112.
17
How Far Is Language beyond Our Grasp? A Response to Hurford
Michael A. Arbib

Understanding the Mirror System Hypothesis

Rizzolatti and Arbib (1998) developed the mirror system hypothesis (MSH), which holds that mirror neurons (for grasping) offer a neural missing link in the evolutionary development of brain mechanisms supporting human language. Hurford (chapter 16 in this volume) grounds his critique of this claim on press reports that oversimplify the work of Rizzolatti and Arbib. I agree with Hurford that “There is a long way to go from mirror neurons to language,” but fear that his article obscures the nature of the journey.

Humans have language and monkeys do not. To probe the differences and commonalities in brain mechanisms involved, Rizzolatti and Arbib focus on the macaque mirror neurons for grasping: neurons in the F5 area of premotor cortex that fire both when the monkey executes a specific type of grasp and when he sees a human or other monkey executing a more or less similar grasp. Rizzolatti and Arbib then argue for a four-stage progression:

1. Grasping
2. A mirror system for grasping
3. A manual-based communication system, breaking through the fixed repertoire of primate vocalizations to yield an open repertoire
4. Speech as a result of “invasion” of the vocal apparatus by collaterals from the manual/orofacial communication system.

Human brain imaging, which distinguishes brain regions rather than neuron-by-neuron encoding, has revealed that Broca’s area (a key human speech area) is the region in human frontal cortex that “lights up” during both grasping and observation of grasping when contrasted with simple observation of objects. The development of manual communication en route to speech along the hominid line is thus postulated to involve the evolution of Broca’s area from the F5-equivalent of the common ancestor of humans and monkeys.
Rizzolatti and Arbib give particular importance in stage (3) to the ability to pantomime, in which a hand movement becomes recognized as standing for something very different. This move anticipates and, to some extent, addresses the key element of Hurford’s critique (p. 298), that

mirror neurons cannot, by their very nature, provide a basis for the central, essential structural relation in human language: the bidirectional arbitrary mapping between sounds and meanings inherent in the Saussurean sign . . .
Hurford does see the existence of mirror neurons as lending partial support to the motor theory of speech perception (i.e., that the mental representation of perceived speech is in terms of motor articulatory categories). This is in itself a concession to MSH, since evidence for mirror neurons for grasping offers no direct support for mirror neurons for articulation; it is only via stage (4) of MSH that the articulation of words may be seen to parallel the grasping of objects. Many authors (most recently Stokoe, 2001) have argued for the parallelism of spoken and signed language, but something like the MSH is still needed to bridge from grasping in monkeys to language signs in humans.

Hurford (chapter 16 in this volume) reviews a wide range of animal behaviors (schooling fish, flocking birds, escape behavior, human yawning and laughter, and more) that may involve arrangements more or less like mirror neurons. I find some of the examples convincing, while others are a bit of a stretch, but I stress here that MSH does not state that mirror systems in general provide the kernel of a language system, so whether or not mirror systems are widespread is irrelevant to an assessment of MSH. What MSH does say is that a specific mirror system, the mirror system for grasping, is the shared heritage of humans and monkeys, and that in humans it evolved to provide core components for the language system. However, having a mirror system for grasping alone does not equip monkeys for language, and so current work on MSH does indeed go beyond the mirror, as Hurford advocates.

The Saussurean Sign: Is There a Mirror System for Concepts?

Figure 17.1 (a) is my diagram of Hurford’s view of the Saussurean sign. If we include recognition and production of sign language under “hear” and “say,” then the top row of
Figure 17.1 The sign relation links words and concepts. But is there (a) a mirror system for concepts as Hurford suggests, or (b) a schema network for concepts, as this chapter argues, with only some concepts having mirror neurons? [Panel labels. In both (a) and (b), the top row shows “Hear,” “Mirror for Words,” and “Say,” joined by “The Sign Relation” to a bottom row showing “Perceive” and “Act”; the bottom row is labeled “Mirror for Concepts” in (a) and “A Schema Network for Concepts” in (b).]
the figure seems to make explicit the progression within MSH, as follows (Arbib, 2002a, supplies details which go beyond Rizzolatti and Arbib, 1998), of mirror systems for

1. Grasping and manual pragmatic actions
2. Pantomime of grasping and manual pragmatic actions
3. Pantomime of actions outside the pantomimic’s own behavioral repertoire (e.g., flapping the arms to mime a flying bird)
4. Conventional gestures used to formalize and disambiguate pantomime (e.g., to distinguish “bird” from “flying”)
5. Protosign, comprising manual (and related orofacial) communicative gestures
6. Protospeech, a wide variety of vocal gestures.

The transition to pantomime of actions outside the pantomimic’s own behavioral repertoire is crucial in extending the range of communication beyond the animal’s own behavior. It also makes possible the extension of pantomime to objects (e.g., by pantomiming a typical use of the object or sketching its shape). However, the true transition comes with the mastery of conventional gestures by a community to formalize and disambiguate pantomime, for once the community has made this discovery, its members are then free to invent arbitrary gestures to communicate concepts for which pantomime is ill suited, yielding step (5). The development of mirror systems (5) and (6) may overlap in time, with the necessary brain mechanisms coevolving.

It is important here to distinguish speech from other vocal gestures for communication. Monkey vocalizations (such as the snake and leopard calls of vervet monkeys) are related to the cingulate cortex rather than the F5 homologue of Broca’s area. I think it likely (though empirical data are sadly lacking) that the primate cortex contains a mirror system for such vocal communications, and that a related mirror system persists in humans, but I suggest that it is a complement to, rather than an integral part of, the speech system that includes Broca’s area in humans.
Note that the mirror systems described above all relate to the top row of figure 17.1 (a). As for the bottom row, Hurford suggests (p. 301) that there may also be aspects of mirror neuron organization in the conceptual domain: . . . if humans are . . . like macaques, the mental representation of the concept GRASP/GRASPING involves some neurons that are involved both in the act of grasping and in the observation of grasping. . . . Similarly, it seems likely that a representation of the concept WALK/WALKING will involve mirror neurons involved both in the observation and the performance of walking. . . . [Moreover,] representations of objects [may] . . . involve some congruence between motor and sensory neurons similar to that found in the representations of actions. . . . one’s concept of, say, an
apple includes motor information about how to hold it and bite it, as well as sensory information about what it looks/tastes/smells like.
However, I do not agree with this suggestion that there is a mirror system for all concepts, and do not regard this notion as a necessary part of MSH. To explain this, I must detour briefly into schema theory as I see it (Arbib, 2002b). I distinguish between perceptual schemas and motor schemas. A perceptual schema not only determines whether a given “domain of interaction” is present in the environment but also can provide parameters concerning the current relationship of the organism with that domain. Motor schemas provide the control systems which can be coordinated to effect a wide variety of actions. More generally, cognitive psychology views schemas as cognitive structures built up in the course of interaction with the environment to organize experience. Shallice (1988: 308n) stresses that the schema “not only has the function of being an efficient description of a state of affairs . . . but also is held to produce an output that provides the immediate control of the mechanisms required in one cognitive or action operation.” This raises the question of why I do not combine perceptual and motor schemas into a single notion of schema that integrates sensory analysis with motor control. Indeed, there are cases where such a combination makes sense. However, recognizing an object (an apple, say) may be linked to many different courses of action (to place it in one’s shopping basket; to place it in a bowl; to pick it up; to peel it; to cook with it; to eat it; to discard a rotten apple, etc.). Of course, once one has decided on a particular course of action, then specific perceptual and motor subschemas must be invoked. But note that, in the list just given, some items are apple-specific whereas others invoke generic schemas for reaching and grasping. It was considerations like this that led me to separate perceptual and motor schemas—a given action may be invoked in a wide variety of circumstances; a given perception may precede many courses of action. 
There is no one grand “apple schema” which links all “apple perception strategies” to “every action that involves an apple.” Moreover, in the schema-theoretic approach, “apple perception” is not mere categorization—“this is an apple”—but may provide access to a range of parameters relevant to interaction with the apple at hand. Thus I reject the notion of a mirror system for concepts. Instead, I visualize the brain as encoding a varied network of perceptual and motor schemas. Only rarely (as in the case of certain basic actions) will the perceptual and motor schemas be integrated into a “mirror schema.” In general, a word may be linked to many schemas, with varying context-dependent activation strengths. I do not see a “concept” as corresponding to one word, but rather as being a graded set of activations of the schema network. See figure 17.1 (b).
The Plasticity of Mirror Systems

Hurford asks what, in neural terms, the sign relation (bidirectional arrow) might be in figure 17.1, stressing that the “arbitrariness of the sign” implies that there is generally no overlap between neurons involved in the representation of the sign’s meaning and those involved in representation of its sound. He thus concludes that mirror neurons do not help to account for the human facility in acquiring a large vocabulary and that “it cannot be the case that preexisting mirror neurons facilitate this process.” I agree to some extent, but it is important to correct a misleading viewpoint smuggled into the last quote by speaking of preexisting mirror neurons.

The classic papers on the mirror system for grasping in the monkey certainly focus on a repertoire of grasps that seems so basic that it is tempting to think of them as prewired. However, observation of human infants shows that many months pass before a human infant has in its motor repertoire the basic grasps (such as the precision pinch) for which mirror neurons have been observed in the monkey. Oztop, Bradley and Arbib (in press) thus argue that, in monkeys as well as humans, the basic repertoire of grasps is attained through sensorimotor feedback. They present the infant learning to grasp model (ILGM) that explains this process of grasp acquisition; a complementary model, the MNS1 model of Oztop and Arbib (2002), explains how mirror neurons may organize themselves to recognize grasps as they become added to the motor repertoire. Future modeling will address the issue of how the infant may eventually learn through observation, with mirror neurons and grasping circuitry developing in a synergistic manner.
Indeed, the Parma group has recently studied mirror neurons for actions which are accompanied by characteristic sounds, and found that a subset of these are activated by the sound of the action (e.g., breaking a peanut in half) as well as sight of the action (Kohler et al., 2002). This expands the point that perceptuomotor integration in mirror neurons may be highly plastic; it also offers possible mechanisms to be exploited in the transition from manual to vocal signing. Indeed, where Hurford stresses the arbitrary nature of the linguistic sign, I note that the relation between a retinal image caused by observing grasping and a pattern of motoneuron firing that yields a similar action is also “arbitrary.” The sameness between the observed and the performed action is external to the brain. Arbib and Rizzolatti (1997) sketched how to build on the work of Jordan and Rumelhart (1992) to distinguish the learning of how to perform an action (the “direct” model) from learning how to recognize that action (the idea being that the mirror system embodies an “inverse” model). Turning to detailed simulations, the MNS1 model (Oztop and Arbib, 2002) shows how neural plasticity can yield such connectivity through correlated experience rather than “prewiring.”
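The claim that correlated experience, rather than prewiring, can link a visual pattern to an unrelated motor pattern can be illustrated with a toy Hebbian associator. This is not the MNS1 model of Oztop and Arbib (2002), whose details are not given here; it is only a minimal sketch in Python, assuming a plain Hebbian rule in which each connection grows in proportion to the product of the activity on its two sides. The function names, the four-unit "visual" patterns, the two-unit "motor" patterns, and the learning rate are all hypothetical choices made for the illustration.

```python
# Toy Hebbian associator (illustrative assumption, not the MNS1 model).
# Visual and motor codes are arbitrary: nothing in one resembles the other.

def train_hebbian(pairs, n_visual, n_motor, rate=0.1, epochs=20):
    """Grow weights w[v][m] by rate * visual[v] * motor[m] for each paired experience."""
    w = [[0.0] * n_motor for _ in range(n_visual)]
    for _ in range(epochs):
        for visual, motor in pairs:
            for v in range(n_visual):
                for m in range(n_motor):
                    w[v][m] += rate * visual[v] * motor[m]
    return w

def motor_response(w, visual):
    """Motor activation produced by a visual pattern alone (no movement executed)."""
    n_motor = len(w[0])
    return [sum(visual[v] * w[v][m] for v in range(len(visual)))
            for m in range(n_motor)]

def best_motor_unit(w, visual):
    """Index of the motor unit most strongly driven by the visual pattern."""
    response = motor_response(w, visual)
    return max(range(len(response)), key=lambda m: response[m])

# Hypothetical patterns: what each grasp looks like, and which motor unit executes it.
PRECISION_PINCH_VIEW = [1, 0, 1, 0]
POWER_GRASP_VIEW = [0, 1, 0, 1]
PRECISION_PINCH_MOTOR = [1, 0]
POWER_GRASP_MOTOR = [0, 1]

# Self-observation: each grasp is executed while its appearance is seen.
EXPERIENCE = [
    (PRECISION_PINCH_VIEW, PRECISION_PINCH_MOTOR),
    (POWER_GRASP_VIEW, POWER_GRASP_MOTOR),
]

weights = train_hebbian(EXPERIENCE, n_visual=4, n_motor=2)
```

Before training, every weight is zero and a seen grasp drives no motor unit. After training, calling `best_motor_unit(weights, PRECISION_PINCH_VIEW)` selects the unit that was active whenever that view occurred, even though the visual and motor codes share no intrinsic similarity. The "sameness" lies in the pairing supplied by experience, which is the sense in which the mapping is arbitrary.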
A Measure of Agreement

Hurford is quite right when he concludes his article by stating, “There is a long way to go from mirror neurons to language.” The issue is whether to read this statement as saying “MSH is correct, but of course is only part of the story,” or “MSH is simply false, and contributes nothing to the understanding of language evolution.” Despite the controversialist stance of his article, I think Hurford subscribes to the former, more supportive view. However, he does assert that “Language follows from being amazingly good at building arbitrary neural connections,” rather than building on its inheritance of a mirror system for grasping. I would respond that the brain makes only those connections within a neural structure that are possible within the genetic bounds of its locale in the brain, its cellular morphology, and its plasticity. Such characteristics distinguish a human brain from a chimpanzee or monkey brain, but that in no way renders all learning tasks equally amenable for the human. It is “easy” for the child to learn to speak or to sign; it is hard for the child to learn to read and write.

Returning to figure 17.1, we need to explain why it is easy for humans to build a “mirror system” for “words,” and the MSH explains why, counterintuitively, protosign may have provided essential scaffolding for the emergence of protospeech and the language-ready brain. Hurford’s discussion of the Saussurean sign reminds us that we also need to understand how this mirror system can be linked to the system of concepts (though, as I have noted, the notion of “associated concept” needs to be reformulated in terms of schema theory). In each case, being amazingly good at building neural connections may be crucial, but I suggest that this is not a general property happily spread across the human brain; instead it involves different patterns of plasticity linked to specific brain mechanisms which evolved along the hominid line.
For this, the MSH is crucial; but we still face immense challenges as we build on the MSH to go beyond the mirror to build a new, action-oriented approach to linguistics.

Acknowledgment

Preparation of this article was supported in part by a fellowship of the Center for Interdisciplinary Research at the University of Southern California.

References

Arbib MA (2002a) The mirror system, imitation, and the evolution of language. In: Imitation in Animals and Artifacts (Nehaniv C, Dautenhahn K, eds.), 229–280. Cambridge, Mass.: MIT Press.
Arbib MA (2002b) Schema theory. In: The Handbook of Brain Theory and Neural Networks, 2nd ed. (Arbib MA, ed.). Cambridge, Mass.: MIT Press.
Arbib M, Rizzolatti G (1997) Neural expectations: A possible evolutionary path from manual skills to language. Commun Cognit 29: 393–424.
Jordan MI, Rumelhart DE (1992) Forward models: Supervised learning with a distal teacher. Cognit Sci 16: 307–354.
Kohler E, Keysers C, Umiltà MA, Fogassi L, Gallese V, Rizzolatti G (2002) Hearing sounds, understanding actions: Action representation in mirror neurons. Science 297: 846–848.
Oztop E, Arbib MA (2002) Schema design and implementation of the grasp-related mirror neuron system. Biol Cybernet 87 (2): 116–140.
Oztop E, Bradley NS, Arbib MA (in press) Infant grasp learning: A computational model.
Rizzolatti G, Arbib MA (1998) Language within our grasp. Trends Neurosci 21 (5): 188–194.
Shallice T (1988) From Neuropsychology to Mental Structure. Cambridge: Cambridge University Press.
Stokoe WC (2001) Language in Hand: Why Sign Came Before Speech. Washington, D.C.: Gallaudet University Press.
VI
CONCLUDING REMARKS
18
Directions for Research in Comparative Communication Systems
D. Kimbrough Oller and Ulrike Griebel

An Integrated View of Communication Evolution

The multifaceted interaction that produced this volume suggests at least two broad realms where major new achievements are on the horizon. First, based in part on the discussions among the authors and on their writings, the goal of formulating a workable new framework of general properties for potential communication systems (or “design features”) appears to be attainable, although there are hurdles ahead to reaching a consensus. A successful framework should make possible a set of general standards for comparison among communication systems of various species, and should form the basis for more productive speculations about evolutionary patterns for communication systems, including language. Second, the volume has offered an intriguing, and surprisingly integrated, view of the ecological conditions that may have led to the hominid explosion in the realm of communication. This view of early hominid ecology focuses on how complex social behavior in hominids may have created conditions under which vocal communication was especially advantageous.

A New Framework for Evolutionary Analysis of Communication and for Cross-species Comparisons

There are no fewer than five chapters in this volume that address the fundamental question of how the human linguistic system can be differentiated on principled grounds from more primitive forms of communication. The proposals in these chapters differ in terminology and in points of focus, but can be seen to be compatible in a number of regards. Taken as a whole, they offer a hopeful view toward the possibility that a new, generally acceptable “design features” approach may be on the horizon, extending the pioneering efforts of Hockett (1960).
Coupling of Primitive “Fixed Signals” and Decoupling in Language
First, it is clear that various authors in this volume (Millikan, Harms, Oller, Sinha, and Gärdenfors) view primitive communication systems as being relatable to language, if one looks at them through the lens of a sufficiently general framework. This is a view that contrasts sharply with the traditional Chomskyan perspective, wherein possible parallels between language and nonhuman communication systems have been treated as uninteresting (Chomsky, 1967).
But more important, all the authors present proposals in which general properties characterizing possible communication systems can be utilized to provide a standard of comparison for the degree of elaborateness and power of communication in differing species. In each case the proposals suggest a logic in which more primitive communication types provide a foundation upon which more elaborate communication types can be built. Also, the various proposals share the view that more primitive communications tend to be simpler than language in precisely the sense that the parts of primitive communications (or the representations upon which they depend) are bound in inarticulate wholes, grounded in the present, both temporally and spatially, with indicative aspects (references to entities or events) and response-influencing (effects of signaling) aspects that are coupled and inseparable. Language, on the other hand, offers the freedom to make the relation between reference and potential effects of communicative signals flexible and decoupled. An act of human language is unlike the fixed signal (or call) of an animal, which is always produced in the same general circumstances, thus referring (to the extent that a call can be said to refer) to the same entities or events on every occasion. Several of the authors use as an example a particular vervet monkey warning call (Cheney and Seyfarth, 1990) that can be thought (in the traditional anthropomorphic characterization) to “refer” to a leopard or to “danger from the ground.” Animal calls such as the vervet warning call also tend to result in the same effects on the receiver on each occasion of use. The vervet hearer tends to look around on the ground for danger or immediately to run up a tree. Acts of human language are enormously more flexible than such animal calls. 
Language allows us to decouple, to divorce reference from specific immediate conditions of occurrence of any referent entity, as well as from the specific effects that might be engendered by use of any particular word or sequence of words. We can talk about any entity, using a word to refer to it, whether or not the entity is present or nearby, and we can do so while intending a wide variety of different effects on different occasions of producing the same word. For example, if a human says “leopard,” in a forest where leopards are known to exist, a human listener might take the word as a warning. But we can also say “leopard” merely to bring the idea of leopards to mind, to initiate a comment upon their beauty, to begin to evaluate their hunting capabilities, or merely to talk about the meaning of the word. This decoupled flexibility of the human communicative capability (in contrast to the fixed signal limitation found in animal calls) is the key that all the chapters address, although in somewhat different ways. But then there is a wide variety of ways that flexibility is manifest in the human system of communication, and this fact seems to be at the root of both the richness of and the discrepancies among the portrayals found in these articles.
A Key Point of Difference among the Views Expressed in the Volume
Some of these discrepancies may well represent mere differing choices among the authors about what angle they use for viewing the fundamental differences between animal calls and language, and what terminology they use in describing them. But the proposals may not be, in every respect, merely terminological variants reflecting formally equivalent options for framework development. On one point we would like to venture an opinion about what may reflect a fundamental difference of formulation. This point concerns the characterization of primitive representations as found in, for example, the sorts of nonhuman primate calls discussed above. It is our opinion that the notion that there are two aspects (or “faces”) of representation in such cases requires, for optimal formulation, an explicit incorporation of Austin’s notion of “illocutionary force” (Austin, 1962), and a resistance to characterizing the primate calls in terms of the notion of “meaning” as Austin distinguishes it from illocutionary force. Here, our view stands in contrast to the specific formulations of Millikan and Harms. Returning to the vervet warning call as an example, we view it as confounding to assert that the vervet call (given its empirical descriptions to date) has “meaning” or, to put it another way, that the call says anything, such as “There is a leopard” or “Run up a tree.” There is no proper possible formulation in any natural language of a word or a sentence to mimic what the warning call constitutes, as Harms correctly notes in his argument about untranslatability of animal calls. But he claims that the Austinian notion of illocutionary force would complicate the characterization of animal calls, while we see it as the natural simplifying solution to that characterization. When the vervet call is issued, it constitutes a warning, and that is, we think, precisely what the call is.
It is not a word or sentence that “says” anything. In the Austinian notion of illocutionary force, a warning is a kind of illocutionary force, something that can be done (not before or after, but) in an act of communication. An animal call can transmit a warning, but it does not have a “meaning” in our interpretation of Austin’s usage because it does not contain any word (or anything approximating a word) with flexible referential character. Words, unlike animal calls, can be used to refer to entities in any circumstance and from any point of view, and they require no external stimulus for speakers to produce them. Neither the vervet alarm call nor any other primate vocal communication, to our knowledge, has referential flexibility. The vervet alarm constitutes a warning that is indeed produced in a particular circumstance (where danger from the ground is perceived), and it is clear that the call has been naturally selected to produce a response of self-protection from that potential danger in the hearer. But unlike language, where we can make any specification we wish with the words we choose, the vervet call is overwhelmingly
nonspecific. It does not indicate whether the danger is from a leopard or some other ground-living predator, or even from a mere rustling in the bushes, and it cannot by its nature provide any such specification. Further, it cannot suggest or command listeners to do anything specific. It cannot tell them to look for danger nor to take cover, nor to run, nor to hide, nor to climb. In contrast, of course, a language can do any of these things with words and sentences. By variations in intensity and abruptness, the primate call can offer an indication of the level of urgency of a warning, but this level of specification seems to be no more notable than the levels of urgency that a human newborn can express with variation in the intensity or abruptness of its cry (see chapter 9 in this volume). When all is said and done, the vervet call (as we understand it thus far from its empirical descriptions) is very unlike a word or a sentence, and very much more like a newborn infant’s cry. The vervet call has the distinction of having been naturally selected to occur in response to a general location (the ground) of potential danger, but such a distinction does not in any regard elevate it to the level of words. Neither alarm calls nor human infants’ cries are meaningful, as we interpret Austin’s sense of the term. So if it bears so little resemblance to a word, how is the vervet call in any way comparable to language? Our answer, again, is that what words, animal calls, and baby cries have in common is that all of them, when they are produced, constitute vocal acts with illocutionary forces, and the recognition of that shared characteristic provides a common ground in terms of which proper comparisons can be made across both species and ages. Every “specialized” communicative act (in Hockett’s sense, the term encompasses both fixed signals and words) can be characterized in terms of illocutionary force. 
The vervet alarm call is a warning (a term that specifies the call’s illocutionary force). The human infant’s cry is a distress call (a term that specifies its illocutionary force). And any word or sentence spoken in a natural human language also expresses some illocutionary force— but not necessarily the same one on each occasion of use. Any word can express a wide variety of illocutionary forces. A key difference between the primitive communication (the call) and the linguistic communication is that the linguistic communication (the word or sentence) can express both an illocutionary force and a meaning, and that the two are not coupled in the linguistic communication. The human speaker can use the same meaning (referring to the category of leopards by saying “leopard,” for example), but can do so with any number of different illocutionary forces on different occasions: The speaker can utter the word to call attention to a leopard, to request a picture of a leopard, to correct a misperception about a lion, to warn someone about a leopard, or merely to give an example in an article on the evolution of language. Nonhuman calls (and human fixed signals such as cries and shrieks) do not have this sort of decoupled flexibility.
To the extent that fixed signals may imply meanings (references), they do so with extremely weak specificity, and any greater specification must be supplied by an active hearer rather than by the signal itself. Further, whatever specificity is involved directly in the signal is predesignated by natural history. No such limitation is imposed upon meaning of the sort that Austin saw in language and distinguished from illocutionary force.
On the Future of Framework Development for Description of Evolution in Communication Systems
The optimal formulation of the relation between illocutionary force and meaning is one of the keys, we think, to establishing the basis for lasting, insightful comparison among communication systems of differing species. Humans clearly differ from nonhuman primates in their flexible ability to manipulate a communicative system that differentiates illocutionary force and meaning in every utterance. Such a capability is seen in human children by about two years of age. But in addition, there are many other ways that humans’ vocal capabilities are starkly distinct from, and more powerful than, those of any of their primate relatives, and many of these vocal advancements occur very early in the life of the human infant. By the first few months of human life, contextual freedom of vocal usage, free expressivity, and free directivity are all well in place (see chapter 4 in this volume and Oller, 2000). All of these are properties that, if they occur in nonhuman primates, do so only with weak and transitory character as far as the empirical evidence illustrates (see, e.g., chapter 8 in this volume). The development of a lasting model to replace the Hockett scheme will require a clear theoretical formulation of the steps that yield the human vocal progress in the first months of life, as well as comparative empirical demonstrations of vocal communication capabilities in both human infants and nonhuman primates.
Of course the model will need to account for many other species besides the primates, and it seems that in some cases, at least, other species (parrots and song birds provide good examples) may surpass the nonhuman primates in vocal command, including, perhaps, in the ability to decouple “words” from particular illocutionary forces, after “language” training (see chapter 10 in this volume). Much remains to be determined on this point, because little research has directly addressed the possibility of such decoupling in any species other than humankind. It is notable that many of the authors in the volume present lists of presumed steps or properties of communication systems, and in each case (Harms, Oller, Steels, Sinha, Gärdenfors, and Fitch), the goal is to provide a logical basis for cross-species comparison and evolutionary speculation. A consensus is not yet in place, but the goal of providing a new “design features” model is clearly on the research agenda for many scientists pursuing the elusive evolution of communication systems.
The Ecological Conditions That Led to the Hominid Explosion
Through the course of the workshop and the editing of this volume, one of the most exciting aspects of the experience was the recognition that progress is being made on characterization of the conditions that may have led to language. A convergence of new empirical information and increasingly plausible speculations based upon theoretical modeling is fostering confidence that the evolution of language is being fundamentally illuminated. The chapters in this volume that provide the new perspectives were inspired by various empirical and theoretical enterprises, all of which have focused upon changes in the social conditions of early hominids in comparison with their ape cousins. Dunbar (chapter 14) provides a key suggestion when he presents evidence that group size in ancient hominids moved beyond that of any other apes and has grown further in modern humans. He argues convincingly that the larger groups demanded greater capabilities to communicate in order to maintain social cohesion. Note that this argument emphasizes the dyadic character of communication, its function as a social bonding and maintenance mechanism, rather than the triadic character of communication that we see in communication between two parties about a third point in the triad: an entity, circumstance, or event. The growth of dyadic communication comes first in Dunbar’s scenario, and triadic growth later. Snowdon’s (chapter 8) primary thesis appears to be bolstered by the group size hypothesis of Dunbar. Snowdon argues that humans, and presumably early hominids, as cooperative breeders (unlike other apes), had especially strong social reasons to bond and to maintain social ties because they needed to work together for the survival of their own young. Larger groups could have provided more opportunity for cooperative effort in the rearing of young.
Snowdon’s thesis also supports Fitch’s (chapter 15) contention that language evolution was spurred on by kin selection, operating within groups that had reason to rear young cooperatively and to support each other to maximize survival of their gene lines. Fitch, utilizing Hamilton’s idea of inclusive fitness (Hamilton, 1963), emphasizes that kin selection offers the key to breaking the stifling cycle of Zahavian handicaps (Zahavi, 1975) and Machiavellian conflicts of interest that might hold communicative evolution back, because in the kin selection circumstance, the effects of communicative conflict (as emphasized by Zahavi) are minimized, while effects of cooperation are maximized. Peter Gärdenfors offers yet another set of reasons to think that it was cultural and social settings that drove the linguistic explosion in the hominids. He argues that humans, and by implication ancient hominids, have had a particularly notable cognitive capability to think ahead, to detach thoughts from the here and now, and to cooperate in the formulation of plans into the future, designed to enhance survival of the cooperating group. This view is also supported by Dunbar’s group size idea, since larger groups require more coordination for division of labor, provisioning, and maintenance of relationships. Gärdenfors
argues that nonhumans do not think beyond their perceived present (either internally or externally), and that because humans do, they have a special foundation, not found in other animals. Because humans have a thought capability that is detached from the here and now, they are able to communicate in ways that are also detached from the here and now. Many of the authors (including Gärdenfors, Sinha, Millikan, Harms, and Oller) argue that communication thus freed of the here and now provides a critical step up in the progression toward powerful communication. The idea that it was changes in the social conditions in which hominids lived that led to this capability, changes that included an increased social need to bond, and then to cooperate, provides a fundamental alternative to the traditional Darwinian interpretation of language as a product of sexual selection. Fitch offers empirical reasons to doubt that sexual selection played the primary driving role in the emergence of language, and the social argument that can be synthesized across the chapters in this volume, regarding group size growth in the hominids, cooperative breeding, kin selection, and cooperative communication, provides a compelling new way to look at the driving forces in language evolution. Steels’s (chapter 5) robotic and computer simulations also provide reasons to believe that once a varied social system is in place (the sort of system that would be necessary with large groups and cooperative breeding), much communicative evolution might take place through dynamic self-organizing processes in both dyadic and triadic communicative circumstances. Connectionist simulations such as those of Christiansen and Dale (chapter 6), and theoretical arguments from Chris Sinha (chapter 12) also suggest that self-organization in a “constructive” social context can form the basis for significant growth in a communicative system.
Of course, none of this rules out a role for sexual selection in the advancement of communication systems, but the overview does indicate that evidence is mounting to place more fundamental and general social changes in ancient humans at center stage in the evolution of language.
References
Austin JL (1962) How to Do Things with Words. London: Oxford University Press.
Cheney DL, Seyfarth RM (1990) How Monkeys See the World: Inside the Mind of Another Species. Chicago: University of Chicago Press.
Chomsky N (1967) The general properties of language. In: Brain Mechanisms Underlying Speech and Language (Darley FL, ed.). New York: Grune and Stratton.
Hamilton WD (1963) The evolution of altruistic behavior. Amer Nat 97: 354–356.
Hockett CF (1960) The origin of speech. In: Human Communication: Language and Its Psychobiological Bases. Readings from Scientific American. San Francisco: W.H. Freeman.
Oller DK (2000) The Emergence of the Speech Capacity. Mahwah, N.J.: Lawrence Erlbaum.
Zahavi A (1975) Mate selection: A selection for a handicap. J Theoret Biol 53: 205–214.
Index
Acoustic analysis formant, 78, 181–182, 278, 293 pitch, 50, 57–58, 65, 98, 105, 158–159, 162, 167–168, 170, 181, 242 resonance, 49–50, 57–58, 293, 310 Adaptation. See Learning Affordance, 18–19, 24, 26–29, 302 Ainsworth, M., 154–155, 158, 161–163, 166–167 Ant, 75, 242, 277 Attention gaze following, 227–228 joint, 9, 82, 145–146, 187, 191, 225, 228, 230 joint reference, 224–225, 228–229, 231 mutual gaze, 227, 234 pointing, 71, 82, 153, 224–225, 228, 234, 246, 262 shared, 82 Austin, J. L., 6, 11, 34–35, 47, 60–62, 64, 327–329, 331 Avian. See Bird Babbling, 136–137, 139, 142–143, 147, 150, 172, 177, 191, 300 canonical, 136 in the pygmy marmoset, 136 infant, 142 Baldwin, J. M., 72, 176, 178–179, 187, 220, 233–234 Bat, 141, 146–147, 277, 296 Behavior assessment and management of, 152–153 behaviorism, 113, 238–239, 311 mutual regulation of, 153, 165 Bird, 147, 171–172, 176, 178, 180–182, 184, 186–187, 190–191, 293 African grey parrot, 8, 62, 171–172, 177–178, 181, 184, 187, 190–192 avian communication, 184 birdsong, 138, 148, 172, 191, 219, 234, 289 bittern, 306–307, 312 budgerigar, 182, 192 oscine, 179–180 parrot, 8, 145, 172–179, 182, 184–190, 192, 217, 329 passerine, 276, 295 pigeon, 191, 305 songbird, 20, 134, 145, 172, 180, 182, 187, 189, 192, 218, 287 starling, 142, 147 white-crowned sparrow, 180, 185, 187, 189, 191 wren, 180, 188 Bonobo. See Primate Bowlby, J., 152, 154–155, 157, 167 Brain mirror neuron, 297–313, 315–317, 319–321 neocortex, 144, 259–261, 270–273
neuron, 10, 297–298, 301, 303–304, 306–307, 309–310, 315, 317, 321 plasticity, 21, 137, 139, 141, 145, 148, 154, 185, 211, 219–220, 319, 320 prefrontal cortex, 311 Bühler, K., 221, 223, 225, 234 Call alarm, 5, 17, 51, 140, 142, 144, 147, 149, 234, 276, 280–281, 283, 293–296, 327–328 animal, 326–328 contact, 150, 192, 262–263 distress, 58, 328 food call, 136, 140–141, 280 Catarrhine monkey. See Primate Cephalopod cuttlefish, 196–202, 206, 210–211, 213, 305–307 squid, 8, 197, 199–212 Cetacean, 142, 150, 171 dolphin, 133, 141, 148, 188, 190, 228, 276, 291, 295 whale, 276, 283, 287, 295 Child abuse, 157, 163, 167, 169–170 care, 145, 160 child-directed speech, 98–99, 103, 105 deafness, 137, 145 development, 56 emotion regulation in, 162 (see also Emotion) language, 221 (see also Language, acquisition of) learning, 62, 98 (see also Learning) prelinguistic, 300, 302 rearing, 145, 168 Chimpanzee. See Primate Chomsky, N., 70–72, 88, 115, 126, 219, 234, 245, 284, 294, 325, 331 Cognition animal, 252, 255 categorization, 40, 46, 85, 127, 172, 185–186, 225, 250, 318 concept formation, 81, 87, 245 conceptual space, 247–253 domain generality, 202 domain specificity, 202, 211 future goals, 237, 241, 243–246, 252–253 inner world, 238, 239, 241–247, 249, 253–254 planning, 10, 69, 115, 200, 231, 237, 239–241, 254–255 Communication affective, 23, 41, 44, 92, 99, 142, 150, 163, 164, 169, 176–177, 179, 185 affiliative, 9, 136, 144, 168–169, 257, 265 among kin, 64, 280–281, 289–290, 292 deception in, 10, 131, 227, 244, 256, 275, 277–278, 288, 294, 296
Communication (cont.) dyadic, 121, 123, 226, 259, 268, 270, 330–331 gestural, 49, 62, 143, 145, 148, 161, 225, 235, 257, 296 honest, 10, 276–277, 279, 281, 288 pragmatics of, 15, 29, 69, 73–74, 82, 153, 159, 178–179, 224, 230, 244–246, 248, 285, 317 triadic, 228, 259, 330–331 Communication systems infrastructure of, 50, 64 modeling of, 3–4, 6–7, 9, 70, 89, 96, 106–108, 174–176, 187, 190–191, 303, 319, 330 natural logic in evolution of, 5, 50–51, 54, 56 power of, 326 Concealment, 193, 196–198, 200, 204 Connectionism, 6, 22, 25, 91–93, 96, 98, 106–109, 313, 331. See also Neural networks; Dynamical systems; Self-organization Connotation, 33. See also Meaning Construal in human communication, 9, 221, 224, 230–232, 241 Constructivism, 29, 219 Crustacean, 202, 293 Darwin, C., 31, 48, 79, 88, 154, 167, 220–221, 253–254, 275–276, 285–288, 293, 331 Dawkins, R., 43, 47, 115, 126, 153, 167, 195, 201, 208, 211, 277, 279, 293 De Condillac, E. B., 3, 11, 54, 64 De Saussure, F., 52, 65, 298, 313 Design features of communication systems, 3–5, 11, 49–50, 54, 65, 147, 201, 325, 329. See also Properties of communication systems Dialect, 138–139, 145, 147, 172, 188, 189, 192, 218, 246, 290–292 Discretization, 249 Dog, 24, 27, 226, 251, 294, 310 Dolphin. See Cetacean Dynamical systems, 91. See also Neural networks; Connectionism; Self-organization Elman, J. L., 91, 93–94, 96, 107–108 Emergence, 9–10, 31, 35, 41–42, 44, 46, 62, 65, 76–77, 79, 89, 96, 105–109, 111–112, 126–127, 148–149, 190, 218, 220–221, 224–225, 227–229, 231–235, 243, 245, 253–255, 262, 320, 331 Emergentism, 9, 41–42, 46, 62, 106, 112, 166, 185, 192, 218, 220–222, 232–233, 241, 252. See also Self-organization Emotion aggression, 8, 65, 156, 187, 204, 265, 290 emotional expression, 56, 150, 297 expression of distress, 5, 51, 58, 151, 154–155, 158–162, 165, 168–169, 328
facial expression, 5, 17–19, 23, 71, 127, 143, 158, 163, 178, 201, 227, 238, 250, 264–265, 269, 271, 320 fussing, 151, 158, 162, 165 laughter, 9, 133, 149–150, 162, 257, 264–273, 307–308, 316 smile, 150, 162–265, 268, 272–273 temperament, 158, 167 Epigenesis, 9, 168, 218–221, 232–234 Evolution convergent, 193 language, 3, 6–7, 10, 49, 63–64, 72, 88, 91–93, 97, 99, 105–107, 108, 131, 143–144, 146, 149, 171, 230, 244, 253, 255, 257, 262, 271–273, 276, 279, 281–283, 285–288, 295, 297–298, 300, 312, 320, 328, 330–331 Finite state systems, 136–137, 144 Fish, 195–197, 199–200, 202–203, 206, 211–212, 238, 303–304, 306–307, 309, 316 Frege, G., 31, 33–34, 47 Frog, 132, 287 Gallistel, C. R., 16–17, 29 Game theory, 77 Genetics, 72–75, 127–128, 133, 139, 217–218, 220–221, 233, 275, 277, 279, 281, 285, 290–291 Genome, 72, 75, 124, 217 Gibson, J. J., 18, 24, 29, 256 Gorilla. See Primate Grice, P., 29, 32, 47 Group size in human evolution, 9–10, 144, 147, 233, 244, 257, 259–263, 270–272, 286, 293, 330–331 Hamilton, W. D., 43, 47, 275–276, 280, 288–289, 294, 307, 312, 330–331 Handicap principle. See Signal Hockett, C., 3–5, 11, 49–51, 54, 65, 173, 188, 201, 211, 230, 234, 241, 254–255, 325, 328–329, 331 Hominid evolution, 7, 9–10, 64, 70, 91, 106, 244–245, 257–258, 261–263, 286, 292, 315, 320, 325, 330–331 Hunting, 131, 145, 171, 187, 209, 246, 254–255, 260, 280, 326 Illocutionary force, 6, 8, 42, 60–63, 173, 327–329. See also Meaning Imprinting, 21, 152, 219, 291, 293–294 Inclusive fitness. See Natural selection Infancy, 57, 65, 139, 141, 150, 168–170, 219, 228, 232–234, 276 Infant cry, 8, 151–152, 154–156, 158, 161, 164–167, 169–170
crying and colic, 151, 161, 166, 170 development, 168, 220 vocalization, 49 Infraphonology, 5, 52, 57, 59, 61 Infrasemiotics, 5, 52, 55–56, 59–60 Innate, 21, 73, 75, 77, 97–98, 108–109, 138–139, 142, 152, 219, 222, 227, 232, 254, 284, 300, 303, 308–310 Insect bee, 17–20, 22, 25, 202, 209, 242 butterfly, 199, 305 caterpillar, 305–307 dances of bees, 19–20, 202, 242 Instinct, 21, 29, 65, 89, 137–138, 149 Intelligence, 21, 45, 88, 147–148, 184–188, 191, 202–203, 208–211, 254, 256, 285–286, 293 artificial intelligence, 6, 70, 127 machiavellian, 252, 286 Intentionality of communication. See Communication Invertebrate, 209, 213. See also Cephalopod; Insect Involuntary communication. See Communication Isomorphism, 16–19, 148 Kin selection. See Natural selection Langacker, R. W., 9, 11, 69, 89, 224, 230, 234, 246, 255 Language acquisition of, 72–73, 88–89, 91–93, 96–97, 99, 105–109, 146, 177, 180, 188–192, 221, 234 adjective, 9, 72, 98, 201, 250–252 change in, 70, 94, 100, 232 consonants, 132, 178, 182, 190, 300 detachment in communication, 238, 242 gossip, 89, 144, 147, 237, 244, 254, 272, 286, 290, 293 gossip as grooming, 286 grammar, 90, 232 head order in grammatical systems, 95, 101, 103–105 historical linguistics, 70 language game, 79–80, 82, 85–87, 90, 246 lexical characteristics of, 63, 69, 71, 76, 80, 82–83, 85, 95, 99–101, 103–105, 109, 189, 192, 230, 282, 289 noun, 9, 71–72, 94–96, 98–99, 201, 248–251, 257 parameter setting in grammar acquisition, 75 phonology, 15, 49–50, 65, 77, 94, 97–99, 107–109, 132, 136–137, 172, 181–182, 187, 189, 220, 255, 283, 285, 290, 292, 294, 300 recursion in, 104, 107, 116–117, 172 sentence, 31–34, 39–40, 47, 69, 93, 95, 98, 100, 109, 283, 327–328
sign language, 15–17, 19, 22, 27–28, 32, 36, 39, 54, 128, 140, 143, 145, 147–148, 162, 173, 218, 224, 230, 297–298, 308, 316 syntax, 15, 32, 38–39, 42, 88–90, 95–99, 103, 105, 107–109, 115, 126, 144, 148–150, 172, 178–180, 187, 223, 245, 252, 255, 283–286, 289, 294 universal grammar, 71, 73, 77, 284, 295 universals of language, 89, 137 Larynx, 132, 182, 292–294 Learning inhibition, 155, 164, 197, 303, 309–311 learnability, 76, 96–97, 99, 104, 108–109, 249 reinforcement, 79, 85, 88 scaffolding, 73, 140, 320 statistical, 109, 138, 147, 149 unsupervised, 87 Lewontin, R. C., 106, 108 Lorenz, K., 10, 18, 29, 54, 64–65, 152, 163, 168, 291 Mammal, 8, 24, 26, 39, 182, 185, 226, 239, 278, 280–281, 283, 286–287, 291, 294, 296, 305 Marler, P., 134, 138, 148, 150, 158, 169, 172, 176–179, 186, 189, 218–219, 234, 278, 295 Meaning. See also Connotation; Illocutionary force; Symbolization extension, 5, 76, 198, 200, 317 intension, 5, 16, 33 Memes, 113, 124, 126 Model of mind, 37, 42, 44–45, 47 Mollusc, 211, 213 Mortality, 47, 156–157 Mother tongue, 237, 244, 275–276, 280–281, 287–290, 292 Motivation, 20, 27, 38, 133–134, 145, 160, 163–165, 179, 205, 221, 232, 240 Multiple-cue integration, 98–99, 106 Music, 9, 257, 261, 263, 271–272, 284 Mutual exclusivity, 177, 189–191 Nativism, 220 Natural selection adaptation, 17, 43, 46–48, 52, 57, 65, 69, 77, 89, 91, 96, 99, 106–107, 109, 144, 147, 154–157, 160, 163, 168, 170, 186, 201–202, 210–212, 233, 276–278, 281, 284–285, 290–291 arms race, 286 inclusive fitness, 10, 275–277, 279–280, 281, 288–289, 330 kin selection, 10, 43, 144, 275–276, 281, 288–291, 293, 330–331 sexual selection, 146, 237, 253, 286–289, 292, 294–295, 331
Natural selection (cont.) survival, 51, 111, 156–157, 222, 239, 276, 286, 289, 330 Neanderthal, 91, 261, 284 Neophenotype, 163 Neural networks. See also Connectionism; Dynamical systems; Self-organization feedback in, 73–75, 79–83, 87, 152, 154, 156, 158, 161–162, 166, 174, 178, 205, 291, 300, 319 recombination, 172, 188, 283 simple recurrent networks, 94 Ontogeny, 91, 105, 147, 149, 154, 163, 167, 169, 188, 190–191, 193, 227, 232–233, 284, 300, 310, 313 Oscine. See Bird Parrot. See Bird Passerine. See Bird Pattern detection, 6, 7, 115, 118–119 hidden, 7, 112, 114, 122 temporal, 111, 113 Peirce, C. S., 53, 65, 221, 224, 235 Perception, 18–19, 22, 24–29, 41, 42, 47, 78, 136, 148, 150, 152, 162, 166–169, 188, 197, 199, 210, 212, 219, 234–235, 240, 252, 254–255, 284, 294, 299–300, 303–307, 309–313, 318 Phenotype, 219 Pheromones, 124 Phonation, 57, 190, 192 Phylogeny, 91, 105, 111, 150, 171–172, 199, 220, 225, 227, 232, 272, 284, 310 Piaget, J., 219, 235, 255 Pinker, S., 72, 74, 89, 91, 97–98, 109, 137, 149, 284–285, 295 Population, 37, 70, 74–76, 79, 85, 100, 132, 138–139, 145, 255, 275, 294 Posture, 194, 197, 200 Pragmatics of communication. See Illocutionary force Predator, 16, 21, 41–44, 61–62, 142, 144–145, 156, 164, 186, 193, 195–201, 203–204, 207–209, 217, 234, 238, 280–281, 293–295, 306–307, 328 Primate ape, 7, 9, 27, 131–134, 137, 142–144, 146, 148–149, 173, 187, 190, 192, 235, 237, 240, 242–244, 254–256, 258, 260, 262, 269, 272, 293, 302, 330 baboon, 134, 187, 260, 262, 272 bonobo, 132–133, 217, 235, 242, 262, 313 catarrhine monkey, 260 chimpanzee, 131–133, 140, 146–149, 187, 217–218, 227, 234–235, 238, 240, 244, 253–256, 260, 264–265, 269, 273, 289, 311, 313, 320
gorilla, 132, 147 lemur, 142, 149, 188, 211 macaque, 134–136, 139, 142, 147–149, 156, 273, 293, 298, 301, 307, 309–310, 315, 317 marmoset, 7, 132, 134–137, 139, 141–144, 147, 149–150 primatology, 148, 233 rhesus monkey, 142, 148–149, 156, 167–170, 293 sifaka monkey, 142, 149 spider monkey, 260 squirrel monkey, 160 tamarin, 7, 134–136, 138, 139–144, 146–150, 191 titi monkey, 136, 144, 149 vervet monkey, 34, 38–40, 61, 65, 134, 139, 142, 147, 149, 168, 217, 223, 229, 234, 258, 273, 281, 293, 308, 317, 326, 327–328 Properties of communication systems arbitrarity, 16, 26, 37, 77, 173, 221, 223, 229, 283, 298–302, 309–311, 315, 317, 319–320 conventionality or conventionalization, 15, 28, 31–34, 37–39, 41–44, 98, 221, 223–224, 228–230, 232, 253, 257, 260–262, 268, 271, 279, 296, 317 directivity, 56, 58, 329 displacement, 173, 209, 241, 254 economy, 76, 249–251 efficiency, 43, 51–52, 63, 143, 246, 253 expressivity, 56, 58, 76, 329 imitation, 56–58, 65, 77–78, 89, 98, 133, 187, 283–284, 287, 291–292, 299–300, 305–306, 312, 320 indexicality, 39, 42, 53–54, 159, 165, 221, 224–225, 246 intentionality, 15, 29, 32, 34, 175, 183, 185, 213, 221–222, 224, 226–227, 232, 255, 269–271, 277, 305, 308 involuntary, 303, 307 openness, 51–52, 182, 209, 220 productivity, 16, 37, 43, 51, 221, 283 purposefulness, 51 referentiality, 5, 9, 19, 31–33, 37, 39–40, 42, 49–50, 60–63, 74, 83, 108, 144, 160, 167, 172–179, 184–185, 190–191, 198, 221–230, 232, 237, 241–242, 246–247, 250–252, 254, 257, 298, 326–327 specialization, 54 voluntary, 15, 309, 311 Prototype, 78–79, 243, 249, 255 Psittacine. See Bird, parrot Psychology, 28, 32, 47–48, 113, 125–126, 128, 188, 195, 201, 211–212, 221, 233, 254, 256, 271, 298, 318 Recapitulation, 232 Reflex, 18, 37, 42, 142, 196, 208, 271, 303, 307
Representation
  cued, 238, 241
  detached, 238–242, 253–255
  mental, 15–18, 20–21, 23, 27–28, 173, 298–299, 301, 316–317
Robotics, 6, 78, 82, 85, 87, 89, 124, 145
Searle, J., 17, 29, 235
Self-organization, 6, 9, 75, 77, 79, 85, 88, 112, 331. See also Neural networks; Dynamical systems; Connectionism
Semantics, 5, 16, 31–32, 35, 38, 41–48, 60–64, 90–98, 150, 160, 178–179, 245–246, 252, 255, 275, 285, 290, 295
Semiotics, 160, 221, 224, 231
Signal
  communicative display, 51, 58, 61, 131, 138, 142–143, 177, 188, 193–196, 198–201, 203–212, 223, 271, 279, 286–288, 293
  communicative value of, 226
  conventional, 296
  coupling of signal and function, 5, 6, 60–61, 75, 325
  cue integration, 99, 106
  decoupling of signal from function, 5, 6, 8, 52, 54–58, 60–63, 159–160, 325, 329
  fixed, 5, 51, 54, 56, 58, 64, 326, 328–329
  fixed action pattern, 18, 152
  graded cue, 160
  handicap principle in evolution of, 44, 164–165, 278, 282, 295–296
  natural sign, 15, 17, 18, 27, 39, 41, 54
  playback, 135–136, 149–150
  request, 8, 61, 144, 173, 175, 181, 242, 266, 328
  ritualization of, 54, 204, 226, 229, 234
  threat, 58, 61, 64, 136, 144, 155, 157, 200, 208, 279
  warning, 5, 16, 34, 38, 40–45, 47, 61–62, 199, 207, 217, 326–328
Slobin, D. I., 96, 109, 232, 235
Sociality
  altruism, 48, 275–277, 289–290, 293
  attachment, 151–152, 154–157, 160, 166–168, 170, 271
  bonding, 9, 141, 142, 151–152, 172, 244, 257, 258–265, 267–268, 270–271, 330–331
  caregiving, 64, 151, 156–161, 163–164, 166–167
  coalition building, 141, 188, 258–259, 270, 291
  cooperation, 10, 43–44, 46–47, 65, 209, 220, 237, 242–244, 246, 252–253, 293, 330
  cooperative breeding, 7, 10, 64, 140, 331
  cultural transmission, 91, 106–107, 147, 218
  culture, 6, 69–70, 72–77, 80, 88, 90–91, 98, 106–107, 124, 131, 147, 150, 166, 168–169, 189, 211, 217–218, 229, 232–235, 255–256, 259, 263, 290, 330
  food sharing, 140–141, 144, 296
  grooming, 9, 89, 144, 147, 226, 237, 244, 254, 257–264, 270–273, 286, 293
  group size, 258–260
  intersubjectivity, 65, 226–229, 232–233, 235, 310
  of animals, 45, 222, 258
  reciprocal altruism, 273, 277, 290, 296
  social interaction, 50, 65, 128, 136–137, 141–142, 144, 146, 172, 174, 175–177, 187, 189–190, 192, 201, 251, 257–258, 261, 273, 312
  social learning, 73–74, 77, 79, 86, 89–90, 140, 147, 192, 310
  social structure, 133, 134, 251
Socioeconomic status, 157
Software, 6–7, 118, 128
Sound image, 298–299, 301–302
Speech
  articulation, 20–23, 69, 78–79, 107, 182–183, 190, 298–300, 316
  articulatory filter hypothesis, 299–300
  coarticulation, 172, 182–183, 191
  motor theory of speech perception, 294, 299–300, 313, 316
  prosody, 58, 61, 97–99, 107–109
  vowel, 71, 77–79, 89, 98, 132, 178, 182, 190, 192, 300
Squid. See Cephalopod
Structural coupling, 75, 81, 85, 88
Syllable, 50, 57–58, 71, 77, 90, 99, 114, 138, 283, 285, 296, 300
Symbolization, 9, 42, 69, 73, 82, 107, 148, 169, 173, 188, 218, 220–221, 223–225, 228–233, 235, 237, 241–244, 246–247, 252–254, 293, 309, 311–313
  icon, 15, 221
  symbolic elaboration, 9
Syrinx, 181–182
Theme software, 7
Theory of mind, 245, 253, 271–272, 284
Tinbergen, N., 18, 29, 54, 65, 115, 128, 170, 212, 284, 296
Tomasello, M., 73, 90, 97–98, 109, 145, 150, 228, 232, 235, 242–243, 245, 256
Trivers, R. L., 46, 48, 165, 170, 277, 280, 290, 296
Umwelt, 152, 157
Vertebrates, 21, 147, 164, 182, 193, 195–200, 208, 276, 278, 287, 294
Vocal
  development, 49, 134, 136–138, 143, 146, 149, 169
  expansion stage of vocal development, 136
  production, 55, 133, 138–139, 141, 144, 282, 293–294
  repertoire, 133, 135, 142, 147–148
Vocal (cont.)
  signals, 52, 58, 134, 142, 278
  tract, 71, 78, 91, 182–183, 278, 293, 300
  usage, 141, 329
Vocalization, 9, 50, 56, 61, 64–65, 134, 136, 170, 190, 288
Voice, 58, 72, 180, 182, 296
Voicing, 134, 182
Voluntary communication. See Communication
Waddington, C. H., 219, 235
Whale. See Cetacean
Yawning, 307–308, 316
Zahavi, A., 10, 41, 44, 48, 153, 170, 275, 277–278, 282, 288, 296, 330–331